Tag Archives: VDI

Which VDI User needs more CPU or RAM – Part 2?

In the previous post, I shared how we can quickly answer fundamental questions such as:

  • Are there any users out there who need more RAM or CPU?
  • If yes, who are they, and how short are they? When, and how often, did it happen?

We covered CPU; let’s cover RAM now 🙂

RAM is not so simple. As you can see here, Cached Memory and Free Memory are not visible outside the Guest OS. This means the counter you use should ideally come from the Guest, not from the Hypervisor. The post here shows that the two can differ.

The good thing about VDI is that Horizon View comes with the agent out of the box. The vRealize Operations for Horizon agent has been integrated into the base Horizon View agent, so there is no need to deploy the vRealize Operations End Point agent.

Now, there are 2 ways we can determine when a user needs more RAM:

  • RAM usage is high.
  • Available RAM is low.

I’m using the second one as it’s easier for you to see. If I show that RAM Usage is 13574 MB, you still need to know the total configured RAM (e.g. 16 GB) and subtract the usage from it. Well, that takes you to Available RAM 🙂
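The subtraction is trivial, but spelling it out shows why Available RAM is the more direct signal. A minimal Python sketch, assuming 1 GB = 1024 MB and using the example figures above (16 GB configured, 13574 MB used); the function names are hypothetical:

```python
# Hypothetical helpers: convert between RAM Usage and Available RAM.
MB_PER_GB = 1024

def available_ram_mb(configured_gb: int, usage_mb: int) -> int:
    """Available RAM = configured RAM (in MB) minus RAM Usage."""
    return configured_gb * MB_PER_GB - usage_mb

def memory_in_use_mb(configured_gb: int, available_mb: int) -> int:
    """The reverse direction: Memory In Use = Total RAM - Available RAM."""
    return configured_gb * MB_PER_GB - available_mb

print(available_ram_mb(16, 13574))  # 2810
print(memory_in_use_mb(16, 2810))   # 13574
```

Either number can be derived from the other, which is why showing just one of them (Available RAM) is enough.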

Since we have lots of VDI users, the first thing we need to do is ensure no one has utilization that is too high or runs out of available RAM. A super metric comes in handy here. To find out if anyone runs out of Available RAM, you can create the super metric below.

Available RAM

Once you do that, it’s a matter of showing them as a line chart on the dashboard.

Available RAM 2

You do the same for Committed Bytes. Why do I use Committed Bytes and not Memory In Use? Can you guess why?

Memory In Use is easy to derive: it is just Total RAM – Available RAM, so it adds nothing new.

Committed Bytes, on the other hand, do not always go hand in hand with Memory Usage. See this blog for the explanation. So we need to complement our Available RAM (MB) with Committed Memory (%). vRealize Operations for Horizon has this metric too.

The 2 super metrics provide a good overview of the entire environment. We can just look at the 2 line charts and know at a glance if everyone is doing well. If not, the list next to them tells us which user was affected. The list uses the standard View widget, which I covered in the previous post.

RAM 1
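In vRealize Operations this is done with super metrics evaluated across all the desktop VMs. The Python below is only a sketch of the same aggregation logic, not super metric syntax. It assumes one guest-side sample per VM per collection cycle; the user and VM names and the 512 MB threshold are hypothetical, while the 80% % Committed threshold follows the guideline in the Windows 7 memory post further down:

```python
from dataclasses import dataclass

@dataclass
class VmSample:
    user: str             # hypothetical user name (V4V can map it to the VM)
    vm: str               # hypothetical VM name
    available_mb: float   # guest Available RAM, MB
    committed_pct: float  # guest % Committed Bytes In Use

LOW_AVAILABLE_MB = 512    # hypothetical threshold, tune for your environment
HIGH_COMMITTED_PCT = 80   # "% Committed should not hit 80%" guideline

def fleet_summary(samples):
    """Mimic the two fleet-wide metrics for one collection cycle:
    the lowest Available RAM and the highest % Committed across all VMs,
    plus the users breaching either threshold (the list next to the charts)."""
    min_available = min(s.available_mb for s in samples)
    max_committed = max(s.committed_pct for s in samples)
    flagged = sorted(
        s.user
        for s in samples
        if s.available_mb < LOW_AVAILABLE_MB or s.committed_pct >= HIGH_COMMITTED_PCT
    )
    return min_available, max_committed, flagged

samples = [
    VmSample("alice", "vdi-001", 2810, 41.0),
    VmSample("bob", "vdi-002", 350, 62.5),
    VmSample("carol", "vdi-003", 1900, 83.0),
]
print(fleet_summary(samples))  # (350, 83.0, ['bob', 'carol'])
```

Plotting the first two values over time gives the 2 line charts; the third value is what the list widget surfaces.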

V4V 6.2 lets you map the user name to the VM name and the Windows name.

Memory

Hope that helps you in making sure your VDI users are happy, and productive! 🙂

Which VDI User needs more CPU or RAM?

VDI workloads differ from server workloads. As a result, we cannot use the same approach to right-size them. You probably know this well, so let me just highlight some of the differences:

  • Usage is spiky, not predictable.
    • Generally speaking, a server has a nicely predictable pattern over any given 5 minutes. Its CPU and RAM do not swing back and forth between 5% and 95% within 1 minute.
  • Humans do not work non-stop.
    • Typing time, thinking time, meeting, coffee break, travelling, public holiday, sick leave, etc. A server, being a machine, has none of this 🙂
    • In this age of mobile cloud, there are no fixed “working hours”. Each user has his or her own work schedule, so we cannot average across a long time period. 1 hour is probably the longest window you want when it comes to averaging.
  • There are non-user workloads.
    • Windows weekly AV scan, Windows update, Windows patch, Horizon events (e.g. recompose, rebalance), VSAN events (e.g. rebalance, staging)
    • Large application installs
  • Issues like a runaway process can chew up CPU for >10 minutes.
    • I’ve seen this on a web-based application.
  • Users do not want to wait for hours when a performance issue happens.
    • You are user too. How long are you willing to wait for an application to launch? Yup, 1 minute 🙂

For now, I’ll ignore Disk (IOPS) and Network, and just focus on Compute (CPU and RAM).

As I have shared in this blog, RAM behaves differently from CPU. As a result, we need different counters for CPU and RAM.

For CPU, we should use the data from outside the Guest. 
For RAM, we should use the data from inside the Guest.

Picking the right counter is critical. As you can see here, choosing the wrong counter can result in the wrong decision.

Setting aside the technology and tools, when should we give a user more CPU or more RAM?

  • Well… when she needs more.

How do we define “more”?

We certainly must look at her workload.

  • If we want to be less generous, we consider the workload in the past 1 week.
  • If we want to provide a high-performance, snappy VDI experience, then any given day is enough to warrant an upsize. We don’t wait for 1 week of unacceptable performance.

Ok, how do we get insight into her workload over the past day? There is no point in taking the average of the last 24 hours, as she likely only generates workload for 8 hours. Maybe even less, as she may have meetings or phone calls, or may not even be in the office. As we said, the average will be low.

What we need is the Max of any given 5 minutes. This tells us whether she demanded more resources. 5 minutes is a good, balanced window: going to 1 minute is too sensitive, while 10 minutes is too long for a user to wait.
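A small sketch of why the 24-hour average hides demand while the worst 5-minute window exposes it. The day of 1-minute CPU samples is hypothetical (idle at 5% except for one 30-minute burst at 90%), and the windowing is plain Python, not how vRealize Operations computes its Maximum internally:

```python
import statistics

def five_minute_averages(per_minute):
    """Average consecutive, non-overlapping 5-minute windows of 1-minute samples.

    Trailing samples that do not fill a whole window are ignored.
    """
    usable = len(per_minute) - len(per_minute) % 5
    return [statistics.mean(per_minute[i:i + 5]) for i in range(0, usable, 5)]

# Hypothetical day of 1-minute CPU samples (%): idle at 5%, one 30-minute burst at 90%.
day = [5.0] * 1440
day[600:630] = [90.0] * 30

windows = five_minute_averages(day)
print(round(statistics.mean(day), 2))  # 6.77  (24-hour average)
print(max(windows))                    # 90.0  (worst 5-minute window)
```

The daily average suggests the desktop is nearly idle, while the Max of the 5-minute windows shows she was pinned at 90% for half an hour.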

vRealize Operations provides this via its View widget. The following screenshot shows that it can display the Maximum during the sample period.

CPU 1

Besides the Maximum, what else do you notice?

Yup, I show the Standard Deviation. I’ve shown below how to add it. Michael Ryom has written an excellent explanation here. Please read it first.

CPU 2

I’ll use a simple example. Say user Marie’s average CPU Workload is 50% in the past day, and the standard deviation is 10%. If her workload is roughly normally distributed (a bell curve), about 95% of the data falls within 2 standard deviations of the mean, so 95% of her workload in the past 24 hours falls between 30% and 70%. If the max is 95%, it sits outside that band, and only about 5% of the day is spent outside the band. That’s still 72 minutes (5% of 1440), a long time from her viewpoint. 3 standard deviations takes us to 99.7%: 99.7% of the time, her CPU workload falls between 20% and 80%. The remaining 0.3% translates into about 4 minutes in the last 24 hours. So, as Michael said, the devil is in the detail, and now you have the details 🙂
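The arithmetic above, as a sketch. It assumes Marie’s samples are roughly normally distributed, since the 95% and 99.7% coverage figures come from the normal distribution; for other shapes they are only approximations. The function name is hypothetical:

```python
MINUTES_PER_DAY = 24 * 60  # 1440

def sigma_band(mean, stdev, k, coverage):
    """Return (low, high, minutes per day outside the band) for a k-sigma band.

    `coverage` is the share of samples expected inside the band
    (0.95 for 2 sigma, 0.997 for 3 sigma under a normal distribution).
    """
    low, high = mean - k * stdev, mean + k * stdev
    minutes_outside = (1 - coverage) * MINUTES_PER_DAY
    return low, high, minutes_outside

for k, coverage in [(2, 0.95), (3, 0.997)]:
    low, high, minutes = sigma_band(50, 10, k, coverage)
    print(f"{k} sigma: {low}%-{high}%, about {minutes:.1f} min/day outside")
# 2 sigma: 30%-70%, about 72.0 min/day outside
# 3 sigma: 20%-80%, about 4.3 min/day outside
```

The minutes are time outside the band on either side; if the distribution is symmetric, roughly half of that time is above the band.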

Let’s now take a real example. Notice that the first one has an average of 21.52% and a standard deviation of only 2.24%. The Maximum, however, is a whopping 96%, far outside that range, so we can quickly tell it is not normal. Since the sample period below is 24 hours (1440 minutes) and the standard deviation stays that low, this must be a one-off data point in vRealize Operations.

CPU 3

Zooming into the VM to plot the entire 24 hours, we can see it’s indeed a one-off. Bingo! 🙂

CPU 4
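One quick way to quantify “off the range” is the z-score of the Maximum, i.e. how many standard deviations it sits above the average. This is a standard statistic rather than something the View widget shows, so treat it as an illustrative sketch using the figures from the real example:

```python
def z_score(value, mean, stdev):
    """Number of standard deviations `value` sits above the mean."""
    return (value - mean) / stdev

# Real example above: average 21.52%, standard deviation 2.24%, maximum 96%.
z = z_score(96, 21.52, 2.24)
print(round(z, 2))  # 33.25
print(z > 3)        # True: far outside the 3-sigma (99.7%) band

# Marie's hypothetical max of 95% (average 50%, std dev 10%), for comparison.
print(z_score(95, 50, 10))  # 4.5
```

A Maximum tens of standard deviations out, combined with a tiny standard deviation, can only come from a handful of samples, which is what the zoomed-in chart confirms.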

Now that you have the insight, you can confidently decide whether that’s a one-off instance or something that really needs an upsize. BTW, since this is VDI, your starting line is probably 2 vCPU (I’d avoid 1 vCPU), and you should only increment 1 vCPU at a time. In other words, I won’t jump from 2 vCPU to 4, 6, 8. I’d go 2, 3, 4, 5, as bigger jumps hit my consolidation ratio.
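The sizing rule above (start at 2 vCPU, never 1, and grow 1 vCPU at a time) fits in a tiny helper. A sketch only; the function names are hypothetical:

```python
MIN_VCPU = 2  # starting line for VDI; avoid 1 vCPU

def next_vcpu(current: int) -> int:
    """Grow by exactly one vCPU, never below the 2 vCPU starting line."""
    return max(MIN_VCPU, current + 1)

def upsize_path(start: int, steps: int) -> list:
    """Sizes a desktop would go through over `steps` successive upsizes."""
    sizes = [max(MIN_VCPU, start)]
    for _ in range(steps):
        sizes.append(next_vcpu(sizes[-1]))
    return sizes

print(upsize_path(2, 3))  # [2, 3, 4, 5]
```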

Yes, that’s all you need to find out which user needs more CPU. Simple, yet accurate. Sometimes, as engineers, we over-engineer a solution 🙂

What about RAM?

Well… that’s a topic for another blog. I want you to review this first. Let me know which counter to use!

Hint: it is not Memory Usage, Memory Consumed, Memory Workload, or Memory Active.

You can find the answer here.

How to measure Windows 7 Memory Usage

I did a memory usage test of Windows 7 64-bit with 4 GB RAM. Since Windows 7 takes advantage of more RAM, let’s see the result when we test with 8 GB RAM. If you are using 8 GB RAM for your VDI sizing, this test is more applicable than the 4 GB test, because Windows 7 will make use of the extra RAM. It is smart enough to utilise the hardware it has been given.

BTW, if what you want is Windows 2008, see this.

Both tests (4 GB and 8 GB) were conducted on physical desktops. I ran them on physical machines so we know for sure there is no hypervisor affecting any reading here. I know the VMware VMkernel will not impact the reading, but to reassure some readers, I decided to eliminate that layer altogether. The 2 desktops are not identical, but they have the same set of applications. The 8 GB desktop drives a 4K monitor, while the 4 GB desktop drives a Full HD monitor. I’m not sure whether the 4K display impacts RAM, as I thought it uses the video RAM instead. One thing is for sure: it’s much easier to review on a 4K display!

In my system, since I use an SSD, SuperFetch is disabled. This informative post provides details on SuperFetch. BTW, if you are using VDI, the Horizon 6 manual recommends disabling it. The reason is: “By disabling the Windows prefetch and superfetch features, you can avoid generating prefetch files and the overhead associated with prefetch and superfetch operations. This action can reduce the growth of linked-clone machines and minimize IOPS on full virtual machines and linked clones”.

With that, let’s dive into the test.

Screenshot #1

11 starting

  • I boot up the machine, and let it idle for a few minutes to ensure all start up programs have finished running. I want them to “settle down”…
  • Windows 7 makes use of all the physical RAM. The 8 GB desktop takes 1 GB more RAM compared with the 4 GB desktop. I do not have any applications running and have listed all the processes shown in Task Manager. Windows 7 takes up 3 GB right away.

Screenshot #2

13 Free dropped to 0 for 1 second

  • I performed a similar test to the one I did on the 4 GB desktop. Essentially, I launched a lot of common apps, and opened lots of large files (>10 MB on average). For the video, I opened a 650 MB video.
  • I also forced PowerPoint to load all the slides by going into Slide Sorter and making it draw all the slides.
  • Naturally, CPU and Disk would spike, so I let them settle down first. My focus here is RAM.
  • The screenshot is taken after Windows 7 settles down. As you can see, CPU metrics have gone down for all processes. PowerPoint, Visio, Adobe, Word, etc. have gone down to 0%, as they are done opening files.
  • Surprisingly, Windows 7 still has 1.5 GB of free RAM. This tells me that 6.5 GB is comfortable.

Screenshot #3

13 Free dropped to 0 for 1 second

  • Just in case you think 8 GB RAM is too much, you can easily hit it by opening more applications and files. In the screenshot above, the Free memory dropped to 61 MB. It actually touched 0 MB for a second. Windows did not seem to like the 0 value and would move some pages out.
  • The screenshot was again taken after CPU and Disk stabilised. RAM is also stable at around 6 GB. The Windows “Used Physical Memory” graph at the top right does not take Standby Memory into account.
  • Should you take into account Standby? That seems debatable.

Screenshot #4

15 commit RAM also stable

  • There is another way to know if Windows needs more RAM. I recommend you read this excellent article by Ed Bott. The link to the screenshots is not working, which is one reason I recreated my own.
  • From PerfMon, I can see that my Commit Limit is 16 GB. Commit Limit = Physical RAM + Virtual RAM (i.e. the pagefile). In my case, that’s 8 GB of RAM and 8 GB of pagefile.sys, so 16 GB is all I have.
  • % Committed is what is currently committed / Commit Limit. In my case, Windows commits 6.2 GB. The value is stable, so the % Committed is stable at 38%.
  • Should you use your Virtual RAM? There are different opinions on the Internet. I personally prefer what I recommend at the end of this blog.
  • BTW, pagefile.sys is typically located in the root of C:\. It’s hidden, just like the hibernation file. By default, Windows 7 automatically manages the pagefile, and I notice the same behaviour in Windows 8.1. Both basically create a pagefile the size of your physical RAM. Whether a pagefile is good or bad is again debatable. My experience is that at 8 GB, you need it: Windows 7 performs worse without it.
  • One thing is for sure: a pagefile is not a swap file. A swap file is used when Windows runs out of physical RAM, whereas the pagefile is used proactively. Just because you see activity in pagefile.sys does not mean Windows is running out of RAM.
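A sketch of the two formulas above, using the figures from this test (8 GB RAM, 8 GB pagefile, 6.2 GB committed). PerfMon showed 38%; the small difference from the computed 38.75% is presumably rounding in the 6.2 GB figure. The status labels (based on the 80% / 90% guidance in the conclusion) and the no-pagefile comparison are hypothetical additions that show why the pagefile keeps % Committed low:

```python
def commit_limit_gb(physical_gb, pagefile_gb):
    """Commit Limit = physical RAM + pagefile size."""
    return physical_gb + pagefile_gb

def percent_committed(committed_gb, limit_gb):
    """% Committed = currently committed / Commit Limit."""
    return 100 * committed_gb / limit_gb

def commit_status(pct, warn=80, critical=90):
    """Hypothetical labels: stay below 80%; performance drops around 90%."""
    if pct >= critical:
        return "critical"
    if pct >= warn:
        return "warning"
    return "ok"

with_pagefile = percent_committed(6.2, commit_limit_gb(8, 8))  # limit 16 GB
no_pagefile = percent_committed(6.2, commit_limit_gb(8, 0))    # limit 8 GB
print(f"{with_pagefile:.2f} {commit_status(with_pagefile)}")   # 38.75 ok
print(f"{no_pagefile:.2f} {commit_status(no_pagefile)}")       # 77.50 ok
```

The same 6.2 GB of commit sits at under 40% with the pagefile, but close to the 80% line without it.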

Screenshot #5

16 impact of closing all apps

  • To complete the test, I closed all applications. You can see in the top right corner that the chart changes in tandem as Windows closes the applications.
  • Interestingly, the usage did not go back to the original level. Windows is using ~4.6 GB of RAM. I think some applications, such as Skype and Chrome, do not actually release all their RAM. I have seen Chrome take up >1 GB of RAM if you let it run for days.
  • BTW, Windows 7 keeps the pages in Standby memory and does not evict them after a few minutes. To me, this makes engineering sense. It is the same strategy adopted by the ESXi VMkernel, which is why you see the Memory Consumed counter being high.

Screenshot #6 (video)

22 normal playing does not result in growing RAM - it is steady with minimal page fault

  • I was curious about the impact of playing video on RAM, so I played a 650 MB video. Surprisingly, it did not occupy 650 MB; it occupied only 360 MB. I know this because the Free memory dropped by 360 MB. I played the video at full size (1:1), not full screen, which is why it did not fill the 4K display.
  • During the play, the memory counter did not slowly go up. It remains essentially the same, as you can see above.
  • I thought perhaps this was because I was simply watching the video normally (sequentially). So I jumped around the video, forcing it to play randomly: I would click toward the end, let it play, then immediately click somewhere near the beginning. It is a 27-minute video, so I had plenty of timeline to click. Interestingly, this random jumping did not result in page faults. It is as if the entire video was already in memory. The Hard Faults/sec counter barely moved. Perhaps it was reading from disk directly?

Conclusion

  • If you want the best performance, use Total – Free as the sizing basis.
  • If you want a more cost effective solution, use Total – Free – Standby.  This would result in around 1-3 GB less RAM.
  • Let Windows manage the pagefile. This is the default setting anyway. When I ran without a pagefile, I noticed visibly slower performance even though Windows was showing >1 GB of Free memory. In fact, Windows gave an error message, and some applications crashed.
  • The % Committed metric should not hit 80%. Performance drops when it hits 90%, as if it’s a hard threshold used by Windows. If you use a pagefile, you will not hit this limit.
  • In general, I’d size Windows 7 between 4 and 12 GB of RAM, depending on the user. I’d use the following guidelines:
    • 4 GB for light user
    • 8 GB for average
    • 12 GB for heavy. Yes, that’s 12 GB, as I’ve seen a customer hit near 0 Free Memory, and he is just a “normal, average user”. He is an IT Manager.
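The two sizing formulas and the guideline table, as a sketch. The 8 GB total and 1.5 GB free come from Screenshot #2; the 2 GB Standby figure is a hypothetical value, not a reading from the test, and the function name is hypothetical too:

```python
def ram_to_size_gb(total_gb, free_gb, standby_gb=0.0, cost_effective=False):
    """Total - Free for best performance; Total - Free - Standby for lower cost."""
    not_free_gb = total_gb - free_gb
    return not_free_gb - standby_gb if cost_effective else not_free_gb

# Guideline sizes from the list above.
GUIDELINE_GB = {"light": 4, "average": 8, "heavy": 12}

print(ram_to_size_gb(8, 1.5))                                     # 6.5
print(ram_to_size_gb(8, 1.5, standby_gb=2, cost_effective=True))  # 4.5
print(GUIDELINE_GB["average"])                                    # 8
```

The gap between the two results is the Standby memory, which is where the 1-3 GB saving of the cost-effective option comes from.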

Additional Resource

You might be wondering what those memory counters mean in Task Manager. Naveed Qadri explains it well here, so please read it.

  • Cached = Standby + Modified
  • Available = Standby + Free
  • Free = Free + Zero.

Cached and Available both include Standby; the difference is that Cached adds Modified, while Available adds Free. Available means exactly what the word says: it is the amount of physical memory immediately available for use.
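The three identities above, as a sketch. The page-list sizes are hypothetical numbers, not readings from this test, and the class is illustrative rather than any Windows API:

```python
from dataclasses import dataclass

@dataclass
class PageLists:
    """Sizes (MB) of the Windows page lists behind Task Manager's counters."""
    standby_mb: int
    modified_mb: int
    free_list_mb: int
    zeroed_mb: int

    @property
    def free(self) -> int:       # Task Manager "Free" = Free + Zero
        return self.free_list_mb + self.zeroed_mb

    @property
    def cached(self) -> int:     # Cached = Standby + Modified
        return self.standby_mb + self.modified_mb

    @property
    def available(self) -> int:  # Available = Standby + Free
        return self.standby_mb + self.free

pages = PageLists(standby_mb=2000, modified_mb=150, free_list_mb=40, zeroed_mb=460)
print(pages.free, pages.cached, pages.available)  # 500 2150 2500
```

Modified pages still hold data that has not been written out yet, so they count toward Cached but not toward Available.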

Another article on Windows 7 memory management that I found useful was one written by Brandon Paddock. You can find it here. He wrote a program that proves that Commit can go up without In Use going up. I’m going to quote a sentence, so you can find it in his blog.

“Notice how my physical memory usage is unchanged, despite the fact that Commit has now increased by the full 2.3GB of that file.”

Committed RAM can go beyond the physical RAM, as it takes into account pagefile.sys. The Commit Limit is typically 2x your physical RAM. In the example that Brandon gave:

“In fact, my commit value is now 6GB, even though I have only 4GB of physical memory and less than 3GB in use.”

In this TechNet blog, Mark Russinovich explains something you need to know. There is reserved memory, and then there is committed memory. Some applications like to have their committed memory in 1 long contiguous block, so they reserve a large chunk up front. Databases and the JVM come to mind as examples. This reserved memory does not actually store meaningful application data or executable code. Only when the application commits a page does it become used. Mark explains that “when a process commits a region of virtual memory, the OS guarantees that it can maintain all the data the process stores in the memory either in physical memory or on disk”.

Notice the words “on disk”. Yes, that’s where pagefile.sys comes in: Windows will use either physical memory or pagefile.sys.