I’ve promised to write a full-blown article dedicated to troubleshooting Provisioning Services retries, but while that’s in the works I’ll share a solution to an issue that I came across in a recent implementation of XenDesktop/PVS with VMware ESXi on Cisco UCS hardware. I’m sure most of you who work with PVS on a daily basis have seen at least some retries in your environment. While a certain amount (0-100) can be deemed acceptable, anything above that count is a cause for concern. As a quick refresher, let’s remind ourselves of what PVS retries are and why they occur.
Retries in PVS are a mechanism to track packet drops in the streaming traffic between a Provisioning Server and a target device. Because that traffic is based on UDP, which is not a reliable protocol on its own (however well Citrix has optimized it), it’s very important that we don’t put configurations in place that would strangle that traffic to death (surely you don’t want your users complaining about application slowness and session latency). So if one day you look at the Show Usage tab of your vDisk in the PVS Console and realize you have hundreds or thousands of retries generated on some or most of your targets, you know that something is wrong in your environment and it has to be addressed immediately.
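As a rough illustration of that triage, the retry counts can be compared against the 0-100 rule of thumb mentioned earlier. This is only a sketch in Python: the device names and counts are made up, and the numbers are assumed to be copied by hand from the Show Usage tab, not pulled from any PVS API.

```python
from typing import Dict, List, Tuple

# Assumption: up to ~100 retries per target is tolerable (the rule of thumb above).
ACCEPTABLE_RETRIES = 100


def flag_high_retry_targets(
    retries_by_target: Dict[str, int],
    threshold: int = ACCEPTABLE_RETRIES,
) -> List[Tuple[str, int]]:
    """Return (target, retries) pairs above the threshold, worst first."""
    offenders = [
        (name, count)
        for name, count in retries_by_target.items()
        if count > threshold
    ]
    # Sort by retry count, descending, so the noisiest targets come first.
    return sorted(offenders, key=lambda pair: pair[1], reverse=True)


# Hypothetical numbers copied by hand from the Show Usage tab.
sample = {"VDA-001": 12, "VDA-002": 4310, "VDA-003": 100, "VDA-004": 865}
for name, count in flag_high_retry_targets(sample):
    print(f"{name}: {count} retries")
```

If most targets end up on the list, the problem is more likely in the shared path (network, hosts, fabric) than on any single device.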
Of course, starting at the physical layer and working your way up to the virtual layer is a good approach, even though a lot of times the opposite occurs because your network or storage teams will want hard evidence that your system is not at fault before they get involved. I recommend involving them from the very beginning: while you are looking at your PVS configuration, they can start investigating routers, switches, cables, storage arrays, and other equipment (it could be something as simple as a malfunctioning switch port, outdated firmware, or even a misconfiguration of the VMware vSwitch and the NIC teaming settings). In this particular case, though, everything was configured correctly both on the Citrix side (2 PVS servers on Windows Server 2012 R2 and 100 Windows 7 SP1 VDA targets with Citrix best practices in place across the board) and on vSphere (6 ESXi hosts in a cluster with Standard vSwitches and virtual adapters dedicated to the PVS traffic). We even checked the UCS firmware, which was slightly out-of-date, but updating it didn’t help either.
So what ended up being the issue? QoS! Cisco UCS has Fabric Interconnects (FIs) that provide connectivity for blade/rack servers within your chassis. Just like regular switches, FIs have Quality-of-Service capability that prioritizes traffic based on system classes as shown in the following picture:
So what if the vNIC that carries the PVS traffic has a drop-eligible, low-priority, or best-effort weight assigned to it? Yes, traffic will certainly get dropped! As a result, you will see retries generated in the PVS Console, and session latency is likely to occur on the target devices. The best thing you can do in this case is to DISABLE QoS for the PVS vNIC in UCS Manager and reboot all your PVS target VMs. Arguably, you could be fine by just assigning that vNIC a higher priority in the QoS stack, but I personally haven’t tested that option and recommend disabling QoS even if I have to take a bit of heat from the UCS gurus 🙂
As always, any questions or feedback are welcome in the comments section. I hope this helps those of you who are experiencing this issue or just want to be proactive about it!
In today’s HOW-TO edition we’ll cover a fairly simple but very important method: applying a cumulative hotfix that requires a full reinstall of PVS on your two HA configured Provisioning servers. Here are the steps:
2. Stop the Stream service on PVS01. Your targets should fail over to PVS02. Reinstall PVS Console and Server software from the hotfix package on PVS01.
3. Rerun the PVS Configuration Wizard. When presented with farm options, select “Farm is already configured.”
4. Breeze through the Config Wizard. Fortunately, the tool defaults to your previous configuration settings, so there is no need to change anything unless you really have to. If the process completes successfully, the Soap and Stream services will be restarted automatically.
5. Stop the Stream service on PVS02. Your targets should fail over to PVS01. Repeat the same procedure 1-3 on PVS02. You can then rebalance your targets manually if needed.
6. You are done.
Note: I generally recommend scheduling a maintenance window for updating your servers. Even though the procedure can be finished in 10 minutes, you don’t want to find yourself in a situation where HA failover doesn’t work as expected and you lose half of your connections when stopping the services.
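The ordering logic behind the steps above can be sketched as follows. This is a Python outline under assumptions, not a script that talks to PVS: `FakeFarm` and its methods are hypothetical stand-ins for whatever you actually use to stop the Stream service, count connected targets, and reinstall, and the failover itself is simulated.

```python
from typing import Dict, List


class FakeFarm:
    """Hypothetical in-memory stand-in for a two-server HA farm."""

    def __init__(self, targets: Dict[str, int], failover_works: bool = True):
        self.targets = dict(targets)  # server name -> connected targets
        self.failover_works = failover_works

    def stop_stream_service(self, server: str) -> None:
        # Simulated failover: connections move to the first other server.
        moved = self.targets[server]
        self.targets[server] = 0
        if self.failover_works:
            other = next(s for s in self.targets if s != server)
            self.targets[other] += moved

    def reinstall_and_configure(self, server: str) -> None:
        # Placeholder for the reinstall plus the Configuration Wizard run.
        pass


def rolling_hotfix(farm: FakeFarm, servers: List[str]) -> List[str]:
    """Update servers one at a time; stop if targets were lost on failover."""
    updated = []
    for server in servers:
        before = sum(farm.targets.values())
        farm.stop_stream_service(server)
        after = sum(farm.targets.values())
        if after < before:
            raise RuntimeError(
                f"{before - after} targets lost when stopping {server}"
            )
        farm.reinstall_and_configure(server)
        updated.append(server)
    return updated


farm = FakeFarm({"PVS01": 50, "PVS02": 50})
print(rolling_hotfix(farm, ["PVS01", "PVS02"]))
print(farm.targets)
```

The point of the check between the stop and the reinstall is the one made in the note: confirm that the connections really moved before touching the server, and bail out if they did not.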
As many of you have noticed, PVS 7.1 has a brand new cache type called “Cache in Device RAM with Overflow on Hard Disk.” This new feature of PVS is designed to provide better performance by combining the light speed of RAM with the efficiency of hard disk storage and at the same time avoiding previous hurdles such as unexpected BSOD when using RAM cache due to the memory getting filled up. The new differencing format of the file (VHDX) also resolves the issue when caching to device HD where applications accessing printer drivers would randomly crash.
As some of you have noticed, however, target device performance has not increased dramatically. In fact, some folks out there running Iometer have reported that IOPS have not improved at all with the new cache type. This is currently a known issue caused by a problem turning on the RAM portion of the cache, and I know for a fact that Citrix is working on fixing it in the next hotfix release for PVS 7.1. So stay excited!
The RAM portion of this cache type is fixed in CTX140338, which is a target device hotfix.
If you ever get locked out of your PVS Console because someone in your organization changed Active Directory membership groups around, you will need to find out what security groups have permission to access the farm. It’d be super-easy if you could just open your Console…Security tab under Farm Properties. But what if you lost access?
…Fortunately there is a way, because it’s all in the database. All you need to do is log in to your SQL server, launch SQL Server Management Studio, expand the PVS database, right-click on the dbo.AuthGroup table, choose Select Top 1000 Rows, and you will see a list of all the AD user groups that have permission to access the farm. Then you will most likely realize that your Console user is NOT a member of any of those groups!
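For those who prefer a query window to right-clicking, the same check can be sketched like this. The T-SQL string does what selecting the top 1000 rows does; the comparison helper is plain Python, and the group names below are invented for illustration. How the group names are stored in the table, and which column holds them, is something to confirm in your own database.

```python
from typing import Iterable, List

# Equivalent of right-clicking dbo.AuthGroup and selecting the top 1000 rows.
# Run it against the PVS database in a query window.
AUTH_GROUP_QUERY = "SELECT TOP (1000) * FROM dbo.AuthGroup;"


def matching_groups(
    farm_groups: Iterable[str], user_groups: Iterable[str]
) -> List[str]:
    """Return the farm-authorized groups the user belongs to (case-insensitive)."""
    user = {g.strip().lower() for g in user_groups}
    return [g for g in farm_groups if g.strip().lower() in user]


# Hypothetical values: farm_groups as read from the query results,
# user_groups as the Console user's current AD memberships.
farm_groups = ["corp.example/Groups/PVS-Admins"]
user_groups = ["corp.example/Users/Domain Users", "corp.example/Groups/Helpdesk"]

hits = matching_groups(farm_groups, user_groups)
print(hits if hits else "No matching group: this user is locked out of the farm.")
```

An empty result is the situation described above: the fix is then on the Active Directory side, by putting the user back into one of the listed groups.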
Reverse imaging can be a tedious procedure for some of us but is necessary to keep our vDisks up-to-date with hypervisor tools and cumulative hotfixes for PVS. Here are my 14 steps:
01. Boot your target device to the vDisk you want to reverse image with additional disk attached (same size or larger than the vDisk).
02. Make sure the new disk is visible in Windows Disk Management and mounted as a drive (e.g. D:\).
03. Use BNImage or XenConvert to copy the vDisk to the added drive (both tools are located under C:\Program Files\Citrix\Provisioning Services).
04. Make sure the new volume is set to Active in Disk Management.
05. Set the device to boot from Hard Disk in the PVS console.
06. Boot the device to Hard Drive by manually changing the boot order in BIOS.
07. Remove any antivirus software (Reboot).
08. Remove PVS target device software (Reboot).
09. Update hypervisor tools if necessary (Reboot).
10. Install latest PVS target device software (Reboot).
11. Run the PVS Imaging Wizard (the system will reboot automatically and continue the conversion process). Before the reboot, change the boot order in BIOS to Network boot.
12. Switch the device to boot from vDisk in the PVS console (in Private or Maintenance Mode).
13. Install Antivirus software.
14. You are done.
Citrix Provisioning Services is a UDP-based streaming technology designed to deliver an operating system (vDisk) to client devices over the network. PVS uses PXE protocol specs (UNDI) to boot a target device (PXE client) and deliver a bootfile program that contains the instructions necessary to log in to a Provisioning Server and start streaming the virtual disk over the network.
There are three really great things about PVS:
1. Single image management
Imagine you have a data center with 100 XenApp servers. Using traditional methods of server management, you would need to log in to each and every one of them to make changes such as application updates and Windows patches, or maybe use GPOs to enforce certain modifications.
With Citrix Provisioning Server (PVS) you can use a designated machine as a golden image, create a virtual disk from its hard drive, and assign it to hundreds or even thousands of servers for OS delivery. Since a vDisk has 2 modes – read/write and read-only, you can modify the image in read/write (Private mode) from one device and then stream to all your devices in read-only (Standard mode). That way all the changes made in Private mode update the VHD and can then be streamed to the rest of your devices in Standard mode propagating the changes you made instantaneously!
2. The Power of Read-Only
A read-only VHD is a truly powerful feature of PVS. Whenever a machine streams a virtual disk from PVS in Standard mode, any changes users make to the OS (outside of their roaming profiles) are flushed upon reboot! Say user John logs into a provisioned target device (e.g. XenApp server, XenDesktop, Windows endpoint, etc.) and messes with network adapter settings, the clock, the registry, etc. Those changes are gone once the machine is shut down. Also, think about viruses! 🙂
PVS is fully enterprise-ready. Not only do you have the option of adding existing machines to Device Collections in the PVS Console, but in a virtual environment you can spin them up yourself! The XenDesktop Setup Wizard and Streamed VM Setup Wizard are at your disposal to quickly create new VMs on the fly when you need them.
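To picture the Private versus Standard mode behaviour described under points 1 and 2, here is a toy Python model. It is not how PVS is implemented; the class and method names are invented, and a dictionary stands in for the disk. Writes in Standard mode land in a per-device cache that is thrown away on reboot, while writes in Private mode change the shared image.

```python
from typing import Dict


class ToyVDisk:
    """Toy shared image: block number -> content."""

    def __init__(self, blocks: Dict[int, str]):
        self.blocks = dict(blocks)


class ToyTarget:
    """Toy target device streaming from a ToyVDisk."""

    def __init__(self, vdisk: ToyVDisk, private_mode: bool = False):
        self.vdisk = vdisk
        self.private_mode = private_mode
        self.write_cache: Dict[int, str] = {}  # per-device, Standard mode only

    def write(self, block: int, data: str) -> None:
        if self.private_mode:
            self.vdisk.blocks[block] = data  # read/write: image is modified
        else:
            self.write_cache[block] = data   # read-only: change stays local

    def read(self, block: int) -> str:
        # Local changes shadow the shared image until the next reboot.
        if block in self.write_cache:
            return self.write_cache[block]
        return self.vdisk.blocks[block]

    def reboot(self) -> None:
        self.write_cache.clear()  # everything written in Standard mode is gone


image = ToyVDisk({0: "clock=correct"})
john = ToyTarget(image)           # Standard mode
john.write(0, "clock=wrong")
print(john.read(0))               # John sees his own change for now
john.reboot()
print(john.read(0))               # back to the content of the shared image
```

Creating a second `ToyTarget` with `private_mode=True` and writing through it changes what every Standard mode device reads, which is the single image management idea from point 1.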