PVS Retries and Quality-of-Service On Cisco UCS

Hello Everyone and Happy Tuesday!

I promised to write a full-blown article dedicated to troubleshooting Provisioning Services retries, but while that’s in the works I’ll share a solution to an issue I came across in a recent implementation of XenDesktop/PVS with VMware ESXi on Cisco UCS hardware. I’m sure most of you who work with PVS on a daily basis have seen at least some retries in your environment. While a certain amount (0-100) can be deemed acceptable, anything above that count is a cause for concern. As a quick refresher, let’s remind ourselves of what PVS retries are and why they occur.

Retries in PVS are a mechanism to track packet drops in the streaming traffic between a Provisioning Server and a target device. Because that traffic rides on the not-so-reliable (albeit optimized by Citrix) UDP protocol, it’s very important that we don’t put configurations in place that would strangle that traffic to death (surely you don’t want your users complaining about application slowness and session latency). So if one day you look at the Show Usage tab of your vDisk in the PVS Console and realize you have hundreds or thousands of retries on some or most of your targets, you know something is wrong in your environment and it has to be addressed immediately:

[Screenshot: vDisk Show Usage tab in the PVS Console showing retries]

Of course, starting at the physical side and working your way up to the virtual layer is a good approach, even though a lot of times the opposite happens because your network or storage teams will want hard evidence that it’s not your system at fault before they get involved. I recommend involving them from the very beginning: while you are looking at your PVS configuration, they can start investigating routers, switches, cables, storage arrays, and other equipment (it could be something as simple as a malfunctioning switch port, outdated firmware, or even a misconfiguration of the VMware vSwitch and NIC teaming settings). In this particular case, though, everything was configured correctly both on the Citrix side (2 PVS servers on Windows 2012 R2 and 100 Windows 7 SP1 VDA targets with Citrix best practices in place across the board) and on vSphere (6 ESXi hosts in a cluster with Standard vSwitches and virtual adapters dedicated to the PVS traffic). We even checked the firmware in UCS, which was slightly out of date, but updating it didn’t help either.
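While the infrastructure teams dig into their gear, there are a couple of quick counters you can check yourself on the PVS servers (and targets). Nothing PVS-specific here, just standard Windows tooling:

# UDP statistics: look at the datagram receive error counters
netstat -s -p udp

# Per-NIC counters: scan the output for dropped/discarded packets on the streaming adapter
Get-NetAdapterStatistics | Format-List *

If those counters keep climbing while targets are streaming, you have some supporting evidence to hand to the network team.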

So what ended up being the issue? QoS! Cisco UCS has Fabric Interconnects (FIs) that provide connectivity for blade/rack servers within your chassis. Just like regular switches, FIs have Quality-of-Service capability that prioritizes traffic based on system classes as shown in the following picture:

[Screenshot: QoS system classes in Cisco UCS Manager]

So what happens if the vNIC that carries the PVS traffic has a drop-eligible, low-priority, or best-effort weight assigned to it? Yes, traffic will certainly get dropped! As a result, you will see retries generated in the PVS Console and session latency is likely to occur on the target devices. The best thing you can do in this case is to DISABLE QoS for the PVS vNIC in UCS Manager and reboot all your PVS target VMs. Arguably, you could be fine just assigning that vNIC a higher priority in the QoS stack, but I personally haven’t tested that option and recommend disabling it, even if I have to take a bit of heat from the UCS gurus 🙂
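If you want to see what the fabric interconnects are doing before you change anything, Cisco’s UCS PowerTool can show you the QoS classes and which QoS policy (if any) your vNICs reference. Treat the snippet below as a sketch from memory: it assumes the UCS PowerTool module is installed, and the cmdlet/property names may differ slightly between PowerTool versions, so double-check against your environment:

# Connect to UCS Manager (will prompt for credentials)
Connect-Ucs ucsm.yourdomain.com -Credential (Get-Credential)

# Review the system QoS classes and their weights/drop settings
Get-UcsQosClass

# See which QoS policy each vNIC references (property name assumed from the UCS object model)
Get-UcsVnic | Select-Object Name, QosPolicyName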

As always, any questions or feedback are welcome in the comments section. I hope this helps those of you who are experiencing this issue or just want to be proactive about it!

PVS 7.6, Windows 2012, SMB 3.0, and Secure Negotiate

Folks,

I hope Y’ALL had a great weekend!
Going back to PVS, I want to share the resolution to an issue I came across recently during a client implementation. Instead of confusing you with one big giant paragraph, I’ll use one of my favorite templates from my years on the Citrix Escalation Team.

 

Environment:

Citrix Product: Provisioning Services 7.6
VHD Storage: EMC Isilon NAS (w/ CIFS shares)
PVS Server OS: Windows Server 2012 R2
VHD OS: Windows 7 SP1

 

Issue:

Attempting to create a vDisk on the shared storage failed with a Management Interface error (Management Interface: Operating System error occurred). The same error was thrown both when creating the vDisk from the PVS Console and from the target device using the Imaging Wizard. Also, when validating server paths in the vDisk Store Properties, a Path Not Found message was intermittently displayed.

[Screenshot: Management Interface error thrown during vDisk creation]

Resolution:

On all Provisioning Servers in the environment, run the following command in PowerShell as an Administrator to disable Secure Negotiate in Windows:

Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters" -Name RequireSecureNegotiate -Value 0 -Force
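To confirm the change took effect on each server, read the value back (0 means Secure Negotiate is disabled):

Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters" -Name RequireSecureNegotiate

You may also need to reboot the Provisioning Servers (or at least restart the Workstation service) for the setting to be picked up.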

 

Explanation:

This behavior can be caused by the Secure Negotiate (also known as Secure Dialect Negotiation) feature that Microsoft added in SMB 3.0 for Windows 2012, which requires that error responses from all SMBv2 servers (including dialects 2.0 and 2.1) be correctly signed. If a correctly signed response is not received back by the SMB client, the connection is cut off to prevent man-in-the-middle attacks. Some file servers don’t support this feature, and that’s where you will see the most failures. Check out the article on Secure Negotiation by Microsoft’s Open Specifications Support Team HERE (it’s pretty technical BTW! 😉 )
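By the way, if you want to see which SMB dialect your Provisioning Server actually negotiates with the file server, browse to the share from the server and then run the following in PowerShell (works on Windows 2012 and later):

Get-SmbConnection | Select-Object ServerName, ShareName, Dialect

The Dialect column will tell you whether you are talking 2.0, 2.1, or 3.x to the filer.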

Machine Creation Services (MCS) Fail to Create Catalog (Permissions)

Dear Readers,

I haven’t posted anything in such a long time mainly because I’ve been so busy with my new role as Consulting Architect and all the cool things I’m learning in the field. Anyway, the PVS Guy is NOT dead. In fact, he is more alive than ever 🙂 – and I am planning to revive the website starting NOW. I am launching a new category called XenDesktop to share some tips & tricks from my VDI projects.

Today we’ll talk about an error I came across recently that is often seen in recently upgraded XenDesktop environments (5.x to 7.x) and parallel implementations. As most of you are aware, when creating a new machine catalog with MCS, the Delivery Controller uses the vCenter host connection and service account configured at site setup to request actions from the VMware hypervisor. If you happen to use the same account on a 7.x DDC that you used in 5.6 without changing any permissions in vCenter, MCS will most likely fail to create the catalog. If you export the error details to a text file (as you always should), you will see the following exception:

 

Terminating Error:
An error occurred while preparing the image.
Stack Trace:
at Citrix.Console.PowerShellSdk.ProvisioningSchemeService.BackgroundTasks.ProvisioningSchemeTask.CheckForTerminatingError(SdkProvisioningSchemeAction sdkProvisioningSchemeAction)
at Citrix.Console.PowerShellSdk.ProvisioningSchemeService.BackgroundTasks.ProvisioningSchemeTask.WaitForProvisioningSchemeActionCompletion(Guid taskId, Action`1 actionResultsObtained)
at Citrix.Console.PowerShellSdk.ProvisioningSchemeService.BackgroundTasks.ProvisioningSchemeCreationTask.StartProvisioningAction()
at Citrix.Console.PowerShellSdk.ProvisioningSchemeService.BackgroundTasks.ProvisioningSchemeCreationTask.RunTask()
at Citrix.Console.PowerShellSdk.BackgroundTaskService.BackgroundTask.Task.Run()

DesktopStudio_ErrorId : UnknownError
ErrorCategory : NotSpecified
ErrorID : FailedToCreateImagePreparationVm
TaskErrorInformation : Terminated
InternalErrorMessage : Either the account is not granted sufficient privilege or disabled or username/password is incorrect Either the account is not granted sufficient privilege or disabled or username/password is incorrect Permission to perform this operation was denied.

That’s because XenDesktop 7.x requires two additional rights assigned in vCenter that were not required for 4.x and 5.x:

VirtualMachine.Config.AdvancedConfig ==> Virtual machine > Configuration > Advanced

VirtualMachine.Config.Settings ==> Virtual machine > Configuration > Settings

For a full list of VMware service account permissions for XenDesktop, click HERE for 7.x and HERE for 4.x and 5.x.
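If you prefer to check or add those privileges from PowerCLI instead of clicking through vCenter, something along these lines should do it. This is only a sketch: "XenDesktop_Service" is a hypothetical role name, so substitute the role your service account actually uses (and connect with Connect-VIServer first):

# List the VirtualMachine.Config privileges currently in the role
Get-VIRole "XenDesktop_Service" | Select-Object -ExpandProperty PrivilegeList | Where-Object { $_ -like "VirtualMachine.Config.*" }

# Add the two privileges XenDesktop 7.x needs if they are missing
Set-VIRole -Role (Get-VIRole "XenDesktop_Service") -AddPrivilege (Get-VIPrivilege -Id "VirtualMachine.Config.AdvancedConfig","VirtualMachine.Config.Settings")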

 

Ah!! Tricky, isn’t it?! 🙂

Hotfix PVS710TargetDeviceWX64002

Hotfix PVS710TargetDeviceWX64002 is the most recent patch for your provisioned images. It contains a fix for a known BSOD on CFsDep2.sys as well as a NIC teaming enhancement for HP Moonshot Broadcom NICs. It also replaces the previous public target-side release (PVS710TargetDeviceWX64001), which has the important “Cache on device RAM with overflow on HD” fix that enables the RAM portion. Get it here.

Why Is It Important to Be a Local Admin in PVS?

My Friends,

Today we are going to talk about permissions in PVS and why it is important for the Soap service user to be a member of Local Administrators on your Provisioning Servers.

For the most part in PVS you can get by with just letting the Configuration Wizard do its thing during initial setup. It enables the different services that make the PVS functionality possible (Soap, Stream, etc.) and turns on the necessary permissions on the database. For KMS, however, every time you switch modes from Private to Standard and select Key Management Service on the vDisk, PVS performs a volume operation on the server that requires elevated privileges, specifically the ability to perform volume maintenance tasks. If you are running Soap/Stream under, say, Network Service or a custom-made account, it will likely lack those rights. While you could grant the account the “Perform volume maintenance tasks” user right under Computer Configuration\Windows Settings\Security Settings\Local Policies\User Rights Assignment (via GPEDIT.msc or a GPO), you will definitely be better off just adding the Soap user to the Local Administrators group on all Provisioning Servers in the farm. You will save yourself a lot of headaches down the road – permissions are always tricky!
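If you are not sure which account Soap and Stream are actually running under, here is a quick check (the display-name filter is an assumption based on how the PVS services typically show up; adjust it if yours are named differently):

# Show the PVS services and the accounts they log on as
Get-CimInstance Win32_Service | Where-Object { $_.DisplayName -like "Citrix PVS*" } | Select-Object DisplayName, StartName, State

# Then confirm that account is a member of the local Administrators group
net localgroup Administrators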

Regards,

– The PVS Guy

Hotfix PVS710TargetDeviceWX64001 Is Out!

Folks, I hope you had a great holiday season! The New Year for PVS has started with some exciting news about the new write cache option – Cache on device RAM with overflow on hard disk. Introduced with PVS 7.1, it is designed to provide faster performance than Cache on device HD (which is the most popular method of caching these days) while also fixing an issue with Microsoft’s Address Space Layout Randomization (ASLR). The new cache uses a VHDX format, which takes care of the crashing applications and print drivers you may have experienced on your provisioned images due to a conflict with the ASLR technique – a defense Microsoft developed to randomize where code sits in memory and make it harder for attackers to predict the location of specific code within an application.

With the initial release of PVS 7.1, however, there was an issue with turning on the RAM portion of this hybrid write cache, so you would’ve seen some performance improvement but not what you expected with RAM. The new hotfix PVS710TargetDeviceWX64001 fixes this issue and is now publicly available for download at CTX139850. 

It is a target-side patch so you will need to reverse image your vDisk to do the install. Good luck!

Tip of the Day: Troubleshooting Storage

Hello again, my friends! A quick Saturday-night one-liner: if you run into slow performance or boot issues with your PVS targets and you are using shared storage, move the vDisk files (VHD, AVHD, and PVP) local to the Provisioning Server. That way your Stream Process won’t have to go over the wire to read VHD blocks of data! If that improves performance drastically, you will at least know to concentrate your troubleshooting efforts on the connectivity to the storage device.
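If you want to script the move, something like this does the trick (just a sketch: the share and local paths are made up, so plug in your own store locations and remember to update the store path in the PVS Console afterwards):

# Copy the vDisk files from the shared store to a local drive on the PVS server
robocopy "\\nas01\PVSStore" "D:\PVSStore" *.vhd *.avhd *.pvp /MT:8 /R:2 /W:5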