CPU Ready: Overbuilt VM or Overutilized Host?
When talking performance in any virtualized server environment, CPU Ready is a key indicator of how well your VM is performing. However, it's not always cut and dried why your CPU Ready times are high.
To start, for those who aren't familiar, CPU Ready is:
The amount of time a virtual machine waits in the queue in a ready-to-run state before it can be scheduled on a CPU.
This means that a VM is ready to process something, but it has to wait because the CPU resources it requires are not available on the physical host.
Before we examine causes of high CPU Ready, let's look at what acceptable values for CPU Ready time are. Unfortunately, there is no hard-and-fast value that says 'Yep, your CPU Ready has crossed the "it's bad" threshold.' The general rule of thumb is that your CPU Ready time should not be higher than 200ms when checked in the vCenter performance charts, or 5% when checked using the esxtop command. Again, this isn't a hard-set value. Your VM's role may demand lower CPU Ready times for more critical functions, or it may tolerate longer CPU Ready times just fine. It all depends on your environment, and it's up to you as an admin to determine what works in yours.
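Since the two rules of thumb are expressed in different units, it helps to be able to convert between them. Here's a minimal sketch, assuming vCenter's default 20-second real-time sampling interval (the function name is mine for illustration, not a vSphere API):

```python
def ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert a CPU Ready summation value (in ms) from a vCenter
    performance chart into the percentage figure esxtop reports.

    vCenter's real-time charts sample over 20-second intervals; the
    summation value is per-VM, so divide by the vCPU count to compare
    against esxtop's per-vCPU %RDY column.
    """
    return (ready_ms / (interval_s * 1000)) * 100 / vcpus

# A 1-vCPU VM showing 1000 ms of CPU Ready in a real-time chart
print(ready_percent(1000))  # 5.0 -> right at the 5% rule of thumb
```

Note that a raw millisecond figure only means something relative to the chart's sampling interval: over a 20-second real-time interval, 200ms works out to 1%.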
So let's move on to common causes of high CPU Ready. The two biggest causes happen to be exact opposites of each other. The first and most common is overbuilt VMs. The second is an overutilized physical host. Both can be equally bad, although the latter usually has more of an impact across the environment.
Overbuilt VMs are a common problem for people and environments new to virtualization. Designing and provisioning physical and virtual machines are two different animals entirely. I'm not going to dig deep into this, as it's a topic I've already discussed multiple times and even written a series on. Instead, we will look at why overbuilt VMs cause a spike in CPU Ready time. The problem lies with the vCPU count. To understand this, we first have to look at how vCPUs relate to physical CPUs. For this topic, we will use the example of a dual quad-core physical host, giving us 8 physical cores to allocate.
vCPUs are tied directly to physical cores, but not to one specific core. If you create a VM with 1 vCPU, you are allocating 1 physical core to that VM whenever it needs to process something. Sticking with our example, this means that you can in theory create 8 single-vCPU VMs and never have any resource contention between them. Once you venture into multi-vCPU VMs, you start running into problems. VMs with multiple vCPUs require that all allocated cores be free before processing can begin (strictly true of ESX's original co-scheduler; newer versions relax this, but lagging vCPUs still pay a penalty). This means a 2-vCPU machine needs 2 physical cores available, a 4-vCPU machine needs 4 physical cores, and so on.
Most VMs can run as single-vCPU machines with no issues; I'd venture to say 50-80% of all multi-vCPU VMs are overbuilt. So, looking at our example of 8 single-vCPU VMs on a physical host with 8 cores, let's say someone misguidedly requested an increase to the vCPU count on one of the VMs. Going to 2 vCPUs means that not only must the one physical core it previously used be free, but another VM now has to sit idle and give up its core to the 2-vCPU VM. At this point, if all the VMs need to process something at the exact same time, one VM has to wait until the needed physical cores become available. So the more vCPUs you have allocated, the greater the chance of CPU contention and elevated CPU Ready times. The same idea applies to VMs with 4 vCPUs, except that in our example you now have to wait for half of the physical cores to become available before the VM can process.
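The walk-through above can be sketched as a toy simulation. This is an illustrative model under strict co-scheduling assumptions, not a representation of the real ESX scheduler, which is far more sophisticated:

```python
import random

def simulate(vcpu_counts, pcores=8, ticks=10_000, demand=0.6, seed=1):
    """Toy model of strict co-scheduling. Each tick, every VM wants
    CPU time with probability `demand`; a VM is scheduled only if
    enough free cores remain for ALL of its vCPUs at once. Returns
    the percentage of ticks each VM spent ready but waiting."""
    rng = random.Random(seed)
    waits = [0] * len(vcpu_counts)
    for _ in range(ticks):
        free = pcores
        order = list(range(len(vcpu_counts)))
        rng.shuffle(order)              # visit VMs in a fair, random order
        for i in order:
            if rng.random() < demand:   # this VM has work this tick
                if vcpu_counts[i] <= free:
                    free -= vcpu_counts[i]   # all vCPUs fit -> it runs
                else:
                    waits[i] += 1            # can't co-schedule -> CPU Ready
    return [100 * w / ticks for w in waits]

# Eight 1-vCPU VMs on 8 cores: nobody ever waits.
print(simulate([1] * 8))
# Upsize one VM to 4 vCPUs: it (and its neighbors) start accruing waits.
print(simulate([4] + [1] * 7))
```

Even in this crude model, the 4-vCPU VM racks up ready time whenever fewer than four cores are simultaneously free, which is exactly the contention pattern described above.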
With this information, it's easy to see why overbuilt VMs can create high CPU Ready times. If you have a VM that doesn't need multiple vCPUs but has them anyway, you have the potential to force that VM, or another VM, to wait long periods of time for the correct number of physical cores to become free.
The other main cause of high CPU Ready times is a host that is overutilized. Having too many VMs on a host is obviously a bad thing for multiple reasons beyond just high CPU Ready. Resource contention is the main problem, and CPU contention can be the one that impacts end users the most. The same relationship between vCPUs and physical cores applies here as well. While it is common to provision more vCPUs than there are cores available, it's a fine line to walk, and it can be hard to get right at first. Each environment will differ, but the key is to make sure that your VMs have adequate access to the resources they need to process their tasks. You don't want too many of your VMs waiting around to process.
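One quick sanity check on a possibly overutilized host is the vCPU-to-physical-core overcommitment ratio. A simple sketch (the VM inventory below is hypothetical, and what ratio counts as "safe" depends entirely on your environment and workloads):

```python
def vcpu_to_pcore_ratio(vm_vcpus, pcores):
    """Overcommitment ratio for a host: total vCPUs allocated
    across all VMs divided by physical cores available."""
    return sum(vm_vcpus) / pcores

# Hypothetical 8-core host running ten VMs totalling 14 vCPUs
ratio = vcpu_to_pcore_ratio([2, 2, 2, 2, 1, 1, 1, 1, 1, 1], 8)
print(f"{ratio:.2f}:1 vCPU:pCore")  # 1.75:1
```

A ratio above 1:1 isn't automatically a problem, but the higher it climbs, the more likely VMs are to queue for cores, so it's worth tracking alongside the CPU Ready counters themselves.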
In closing, these two scenarios are quite common, and they should be the first place you investigate when you notice high CPU Ready times. There are tools that can help with these issues as well, such as clustering, DRS, and resource pools, along with proper capacity planning and VM provisioning; all things that seasoned ESX admins know and use in depth. There are multiple white papers and blog posts from respected VM experts, so be sure to read up on these things if needed, and as always, feel free to post any questions or comments below.