
Right sizing VM Memory without using an agent

This post continues from the Operationalize Your World post. Do read it first so you get the context.

The much-needed visibility into Guest OS memory is finally possible in vSphere. As one of the new features in vR Ops 6.3, you can now get Guest OS RAM metrics without using an agent. As long as you have vSphere 6.0 U1 or later, and the VM is running Tools 10.0.0 or later, you are set. Thanks to Gavin Craig for pointing this out. The specific feature needed in Tools is called the Common Agent Framework, which removes the need for multiple agents in a VM.

As a result, we can now update the guidance for RAM Right Sizing:

For apps that manage their own RAM, use metrics from the apps.
For others, use metrics from the Guest OS.
Use vR Ops Demand if you have no Guest OS visibility. Do not use vCenter Active.
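
The guidance above is a simple decision order, and it can be sketched in a few lines of Python. This is only an illustration: the function name and its two boolean inputs are hypothetical, not something vR Ops exposes.

```python
def pick_memory_metric(app_manages_own_ram: bool, has_guest_os_metrics: bool) -> str:
    """Return which data source to use when right sizing a VM's RAM.

    Hypothetical helper that mirrors the three-step guidance:
    application metrics first, Guest OS second, vR Ops Demand last.
    """
    if app_manages_own_ram:
        # e.g. a JVM or a database that sizes its own heap or buffer pool
        return "application metrics"
    if has_guest_os_metrics:
        return "Guest OS metrics"
    # Fallback when there is no in-guest visibility; never vCenter Active
    return "vR Ops Demand"


print(pick_memory_metric(app_manages_own_ram=False, has_guest_os_metrics=True))
```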

Examples of applications that manage their own RAM are JVMs and databases. If you use the Guest OS counters for them, you can end up with the wrong sizing and make the situation worse. Manny Sidhu provides a real example here. The application vendor asked for 64 GB RAM when they were only actively using 16 GB, as he shared in the vCenter screenshot below.

For apps that do not manage their own RAM, you should use Guest OS data. The table below compares 63 VMs running a variety of Microsoft Windows versions. A good proportion of them are just idle, as this is a lab, not real-life production.

  1. What conclusion do you arrive at? I’ve added a summary at the bottom of the list.
  2. How do you think VM Consumed, VM Active and Guest OS Used compare?

comparison-windows

And the table below shows the same comparison for Linux.

What do you spot? What’s your conclusion? How does this change your capacity planning? 😉

comparison-linux

Here is the summary for both OSes. The total is 101 VMs, not a bad sample size. I’ve also added a comparison. Notice something that does not add up?

total

To help you compare further, here is a vR Ops heatmap showing all the VMs.

compare

I created a super metric that compares Guest OS metric with VM Active. As expected, Guest OS is higher as it takes into account cache. It’s not just Used, and Windows does use RAM as cache (I think Linux does too, but not 100% sure).

The super metric is a ratio: I divide Guest OS by VM Active. I set 0 as black, 5 as yellow, and 10 as red. Nothing is black, as VM Active is lower than Guest OS in all samples.
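
To make the ratio concrete, here is a minimal Python sketch of the same calculation outside vR Ops. The function names are hypothetical, the inputs are the sample averages from the conclusion of this post (69% for Guest OS Used + Cache, 15% for VM Active), and picking the nearest colour anchor is a simplification of how a heat map blends colours.

```python
# Colour anchors used in the heat map: 0 = black, 5 = yellow, 10 = red
HEATMAP_ANCHORS = {0: "black", 5: "yellow", 10: "red"}


def guest_to_active_ratio(guest_os_pct, vm_active_pct):
    """Ratio of the Guest OS memory metric to VM Active (both in %)."""
    if vm_active_pct <= 0:
        return None  # avoid dividing by zero for VMs with no Active value
    return guest_os_pct / vm_active_pct


def nearest_colour(ratio):
    """Simplification: return the colour of the closest anchor value."""
    anchor = min(HEATMAP_ANCHORS, key=lambda a: abs(a - ratio))
    return HEATMAP_ANCHORS[anchor]


ratio = guest_to_active_ratio(69, 15)
print(round(ratio, 1), nearest_colour(ratio))
```

A ratio above 1 means the Guest OS figure is higher than VM Active, which is what every VM in the sample showed.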

Conclusion

  • VM Consumed is always near 100%, even on VMs that have been idle for days. This is expected, given its nature as a cache. Do not use it for right sizing.
  • Windows memory management differs from Linux. Notice its VM Consumed is higher (94%) than Linux (82%). My guess is that Windows writing zeros during boot causes this.
  • VM Active can be too aggressive as it does not take cache into account. vR Ops adds the Demand counter, which makes the number less aggressive.
  • Guest OS Used + Cache is much greater than VM Active or VM Demand. It’s 69% vs 15% vs 31%.
  • Guest OS Used + Cache + Free does not add up to 100%. In the sample, it only adds up to 83%.

Based on the above data, I’d prefer to use the Guest OS metrics, as they take cache into account.

  • Side reading, if you need more info:
    Refer to this for Windows 7 metrics, and this for Windows 2008 metrics. 
    This is a simple test to understand Windows 7 memory behaviour.

You can develop a vR Ops dashboard like the one below to help you right size based on Guest OS. Notice it takes a similar approach to the dashboard for right sizing CPU.

vm-right-sizing-memory

The dashboard answers the following questions:

  • How many large VMs do I have? What’s the total RAM among them?
    • Answered by the scoreboard widget. It only shows large VMs (default is >24 GB RAM) that are powered on and have Guest OS metrics.
  • Are the large VMs utilizing the RAM given to them?
    • Answered by the 2 line charts:
      • Maximum Guest OS Used (%) in the group
      • Average Guest OS Used (%) in the group
    • In general, Guest OS Used can hit 100% as Windows/Linux takes advantage of the RAM as cache. Hence you see the peak of Used is high.
  • Where are these large VMs located?
    • Answered by the heat map.

The dashboard excludes all VMs that do not have Guest OS RAM data. Since not all VMs have Guest OS RAM data, the first step is to create a group that only contains VMs with the data. Use the example below.

group

You should also manually exclude apps that manage their own memory.

Notice the Group Type is VM Types. Follow that exactly, including the case!

Once you have created the group type and group, the next step is to download the following:

  • Super metrics. Don’t forget to enable them!
  • Views
  • Dashboard

You should download the dashboard, views and super metrics together with the rest of the Operationalize Your World package.

You can customize the dashboard. Do not be afraid to experiment with it. It does not modify any actual metric or object, as a dashboard is just a presentation layer.

Take the scoreboard, for example. We can add color coding to quickly tell you the amount of RAM wasted. If you have > 1 TB RAM wasted, you want it to show red.

customize

To do that, it’s a matter of editing the scoreboard widget. I’ve added thresholds, so it changes from green to yellow when I cross 500 GB, to orange when I cross 750 GB, and to red when I cross 1 TB.
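
The threshold logic is simple enough to sketch in Python, which can help when you decide on your own cut-off values. This is an illustration only, not how the widget is configured: the function name is hypothetical, and I assume 1 TB means 1024 GB and that a colour changes only once the value is strictly above the threshold.

```python
# Thresholds in GB of wasted RAM, checked from highest to lowest.
# Assumption: 1 TB is treated as 1024 GB.
THRESHOLDS_GB = [(1024, "red"), (750, "orange"), (500, "yellow")]


def wasted_ram_colour(wasted_gb):
    """Return the scoreboard colour for a given amount of wasted RAM."""
    for limit_gb, colour in THRESHOLDS_GB:
        if wasted_gb > limit_gb:
            return colour
    return "green"


for sample_gb in (400, 600, 800, 1500):
    print(sample_gb, wasted_ram_colour(sample_gb))
```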

scoreboard

Hope that helps. I’m keen to know how it helps you right size with confidence, now that you have in-guest visibility.

How to monitor Windows RAM usage with vRealize Operations 6.1

[4 Sep 2016: vRealize Operations 6.3 can retrieve Guest OS RAM without agent]

I covered how to monitor Windows RAM usage in earlier blog posts. I did Windows 7 x64 in this post and Windows 2008 R2 in this post. The information shows that the way Windows manages memory is not visible to the hypervisor. Some counters it uses are not visible, as they are acting as cache. BTW, you can use those counters when monitoring physical machines, useful in sizing them before you P2V them into a VM.

vRealize Operations 6.1 delivers a good enhancement in Guest OS visibility. It uses an agent inside the Guest OS, called the End Point agent. Having this in-guest visibility improves the accuracy of vRealize Operations.

Let’s take an example. I will use Windows 2012 this time around, as I have used Windows 2008 and Windows 7 in previous blogs.

The Windows VM runs a small MS AD with DNS, and supports the VMware ASEAN lab. It’s not providing other features or services. Let’s see the system configuration. The System Information shows it has 8 GB of Physical Memory + 1.25 GB of Virtual Memory. For a small MS AD doing just DNS and AD, 8 GB is more than enough. That probably explains why the pagefile.sys is only 1.25 GB.

Page file

What about the usage (utilization)? A quick snapshot using Resource Monitor shows that Available is hovering around 6.4 GB. This indicates I have plenty of RAM, as I have 8 GB of physical RAM. The RAM usage is also quite stable, with very few hard faults per second.

Resource Manager

I use the word “around” 6.4 GB, as there is a difference between how vRealize Operations and Windows sample the data.

Windows PerfMon and Resource Monitor report every second. This is different to vRealize Operations, which reports every 5 minutes. 5 minutes is 300 seconds, and the reported value is an average. So expect the data to be different, as the sampling length is different. vRealize Operations is meant to be a monitoring tool, while tools that can go down to a more granular level (Log Insight, Windows counters, esxtop) are more suitable for troubleshooting. The granular tools are not so suitable for overall monitoring, as you will induce a performance penalty while monitoring. In addition, you really do not want to react to every spike that lasts only a few seconds. There is a good chance it does not impact the business.
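
A small Python sketch shows why the two views differ. The numbers are made up for illustration: a steady 20% baseline with a 10-second spike to 95% inside one 5-minute (300-second) cycle.

```python
# Hypothetical per-second memory usage (%) over one 5-minute collection cycle:
# 290 seconds at a 20% baseline plus a 10-second spike to 95%.
per_second = [20] * 290 + [95] * 10

# What a 5-minute average would report for this cycle
five_minute_average = sum(per_second) / len(per_second)

# What a per-second tool would show at the worst moment
peak = max(per_second)

print(f"5-minute average: {five_minute_average}%")
print(f"1-second peak:    {peak}%")
```

The short spike barely moves the 5-minute average, which is the behaviour you want from a monitoring tool and the reason you go to a per-second tool when troubleshooting.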

With that, let’s take a look at the first metric. Commit Limit is important, as a growing value is an early warning sign. Remember Windows proactively increases its pagefile.sys if it’s under memory pressure.

I’m expecting the Commit Limit to be a constant 9.25 GB, as my AD is not under memory pressure at all. In bytes, this is 9,931,640,832 bytes. So that’s the value I’m expecting from vRealize Operations EP Agent.
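
If you want to check the conversion yourself, here is a minimal Python sketch. It assumes GB here means 1024³ bytes, which is how Windows reports memory; the helper name is hypothetical.

```python
GIB = 1024 ** 3  # bytes per GB as Windows counts it (assumption: binary units)


def bytes_to_gib(num_bytes):
    """Convert a raw byte counter into GB (1024^3 bytes)."""
    return num_bytes / GIB


commit_limit_bytes = 9_931_640_832  # value read inside Windows
commit_limit_gib = bytes_to_gib(commit_limit_bytes)

print(f"Commit Limit: {commit_limit_gib:.2f} GB")
```

The raw counter is a shade under an exact 9.25 GB, so round it before comparing it with what the agent reports.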

vR Ops Commit Limit

Bingo! The value matches. More importantly, it’s showing a constant value, meaning Windows 2012 is not under memory pressure. I need to maintain this value to be 16 GB or below. I’m surprised Windows 2012 uses a very low page file. Windows 2008 would have probably set it to 16 GB.

Tip: create a super metric that tracks the ratio of pagefile.sys to total RAM. 
If it is >1.0, you need to add RAM.
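
Here is the same check as a minimal Python sketch, using the 1.25 GB page file and 8 GB of RAM from this VM. The function names are hypothetical, and this is not super metric syntax, just the arithmetic behind it.

```python
def pagefile_ratio(pagefile_gb, total_ram_gb):
    """Ratio of pagefile.sys size to physical RAM."""
    return pagefile_gb / total_ram_gb


def needs_more_ram(pagefile_gb, total_ram_gb):
    """Rule of thumb from the tip: a ratio above 1.0 means add RAM."""
    return pagefile_ratio(pagefile_gb, total_ram_gb) > 1.0


ratio = pagefile_ratio(1.25, 8)
print(f"ratio = {ratio:.2f}, add RAM: {needs_more_ram(1.25, 8)}")
```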

The Commit Limit, while it is a good indicator, does not change frequently. It also does not tell you how much RAM is actually used, and how much RAM is free. In Windows (2012, 2008, 7 x64), the counter Available Memory tells you how much RAM is available. It consists of Standby RAM and Free RAM. It is possible for Free Memory to drop to near 0 for a short period. Windows manages its memory actively and will increase it again.

Performance Monitor shows in the following screenshot that I have around 6.4 GB of available RAM. This is consistent with the snapshot I saw in Resource Monitor.

Available and Free

Let’s now look at the data from the vRealize Operations 6.1 EP agent. Since it is a 5-minute average, it can be slightly different. The good thing is that business requirements do not demand accuracy to the nearest megabyte. In fact, sizing to the nearest GB is acceptable. vRealize Operations shows that I’m using around 20%. 80% of 8 GB is around 6.4 GB, which is shown by the Free Memory. Again, this matches what I saw inside Windows.

20% may seem to be on the low side for memory utilisation. As I shared earlier, Windows will take advantage of the RAM given to it. So you need to include the Standby RAM and Modified RAM too if you want to be more conservative in your sizing. If you add them, you will see higher utilisation. This means you have 2 choices of sizing: Conservative or Cost Effective.

Cost Effective: Memory Used.
Conservative: Memory Used + Cached Memory
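
The two choices can be expressed as a minimal Python sketch. The function name is hypothetical, and the 4.0 GB cached value is made up for illustration; only the 8 GB total and the roughly 20% used figure come from this VM.

```python
def sizing_utilisation(used_gb, cached_gb, total_gb):
    """Return (cost_effective_pct, conservative_pct) for a VM.

    Cost Effective counts only Memory Used.
    Conservative counts Memory Used + Cached Memory.
    """
    cost_effective_pct = used_gb / total_gb * 100
    conservative_pct = (used_gb + cached_gb) / total_gb * 100
    return cost_effective_pct, conservative_pct


# 1.6 GB used is about 20% of 8 GB; the cached figure is illustrative only
cost_effective, conservative = sizing_utilisation(used_gb=1.6, cached_gb=4.0, total_gb=8)
print(f"Cost Effective: {cost_effective:.0f}%  Conservative: {conservative:.0f}%")
```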

Following the KISS principle, I’d aim for 90% utilisation (Used + Cached) as the healthy range for both server workload and VDI workload. In this specific case of the VMware lab, we actually know that AD does not use 80% actively, as the cached size is large. In this case, I can probably live with 5 GB comfortably.

vR Ops Usage 2

For reporting convenience, vRealize Operations also provides the Free Memory counter. The 2 counters add up to 100%, as you can see below. The Free Memory only started to show up at 1 pm as it was not enabled by default. I enabled it manually around that time.

vR Ops Usage

Memory Usage value from outside the Guest OS

At this point you may ask what the value looks like from outside the Guest OS, i.e. from the hypervisor. The value is different. As you can see, the pattern is different. That said, the absolute value is similar, which means you can use it as a gauge.

There are other cases where the difference is not negligible. Let’s show some.

compare

I’ll show you an example where the difference is big enough that it resulted in a false alarm by the hypervisor. I use the word hypervisor, as it should apply to all hypervisors, not just vSphere.

In the chart below, vRealize Operations shows 3 values: A, B, and C.

A is the memory usage counter from the hypervisor. C is the Memory Used counter from inside the Guest OS. The delta is quite significant. The pattern is also different.

a Windows 2008 running MS AD - 7

The value in vCenter is hovering around 90%, and it actually triggered an alarm.

Windows 2008 running MS AD from hypervisor alarm

Let’s zoom in to see the various other counters, just in case we’re missing something. As you can see, there is no ballooning, swapping, or compression. Memory latency is also 0. In other words, there is no hypervisor-related factor that can impact the VM’s internal memory usage.

Windows 2008 running MS AD from hypervisor alarm 2

The above example shows a situation where the hypervisor was over-reporting. What about under-reporting? Can the hypervisor show a value that was too low?

Let’s look at the example below.

a comparison install EP agent

The counter from hypervisor reported that the VM above was doing <20%. The Memory Usage counter was stable, hovering around 15 – 20%.

What I did then was install the EP Agent. As you can see, the EP Agent started providing the metrics.

Interestingly, the Memory Usage counter rose from around 20% to 90%. Not only that, it stayed there. No, the agent does not consume excessive RAM. It’s just the way Active Memory is sampled. Please review this excellent article by Mark Achtemichuk.

Installing the EP agent had a significant impact on the Memory Usage counter. The value from vCenter moved from being lower to being higher. I let the VM stabilise for several hours. The values did not change. Both Memory Usage and Memory Used maintained their pattern.

I decided to trigger a long-running task. In this case, I decided to update Windows. This VM had not been patched for probably over 2 years, so it had a lot of patches. The entire patching took several hours. You can see the impact on the counters. The 2 counters each went their own way, and you can see that they do not match. You have both under-reporting and over-reporting scenarios.

compare 2

I hope the article has been useful. I do encourage you to install the EP agent. If you are using the Advanced edition or higher, it’s already licensed.

This blog article covers Windows, not Linux, because I do not want to make the assumption until I have tested it. In future, I hope to test Linux and share it here.

How to measure Windows 2008 Memory usage

The counters you use to monitor Windows 2008 R2 RAM are the same as the ones you use to monitor Win 7.

  • If you want the best performance, use Total – Free as the sizing.
  • If you want a more cost effective solution, use Total – Free – Standby.
  • Keep % Committed to below 80%
  • Ensure pagefile.sys is not larger than the physical RAM. If Windows increases it, that’s a sign it does not have enough RAM.
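
The four rules above can be put together in a minimal Python sketch. The function name, its parameters and the sample values are all hypothetical; plug in the numbers you read from Resource Monitor or PerfMon.

```python
def win2008_memory_check(total_mb, free_mb, standby_mb, committed_pct, pagefile_mb):
    """Apply the four sizing rules and return the results as a dict."""
    return {
        # Best performance: everything that is not Free counts as needed
        "best_performance_mb": total_mb - free_mb,
        # Cost effective: Standby (cache) is treated as reclaimable too
        "cost_effective_mb": total_mb - free_mb - standby_mb,
        # % Committed should stay below 80%
        "committed_ok": committed_pct < 80,
        # pagefile.sys should not be larger than physical RAM
        "pagefile_ok": pagefile_mb <= total_mb,
    }


# Illustrative values only
report = win2008_memory_check(
    total_mb=4096, free_mb=512, standby_mb=1024, committed_pct=65, pagefile_mb=2048
)
for rule, result in report.items():
    print(f"{rule}: {result}")
```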

To avoid repeating the earlier blog post, you can see the detailed recommendation here.

Windows 2008 R2 manages memory slightly differently to Windows 7 x64. While it is based on the same kernel, the difference is visible in a small test that I conducted. I encourage you to perform your own real-world test, not a synthetic one, in your environment.

Windows uses both physical memory and virtual memory concurrently. The challenge is that, if you look at the actual metrics, it is not clear how it prefers one over the other. In my case, Windows still had >300 MB of physical memory (on Standby and Free), yet it was already using the virtual memory extensively. The paging file usage was >50% and it had increased the pagefile.sys size by 10%. Performance, at least from the UI, was however still good. This is different to Windows 7 x64, where the performance visibly dropped. Windows 7 in fact gave warning messages. Windows 2008 R2 just continued running well, as if it had no memory shortage. The screenshot below shows that Standby actually dropped to 46 MB and Free dropped to a mere 8 MB in Windows 2008.

Windows 2008 running MS AD - 5 windows patch no warning whatsoever

Windows 7 64 bit, on the other hand, gave a warning when it still had plenty of RAM. See the screenshot below. From the Task Manager, you can see that it only used 6.15 GB, meaning I still had 1.85 GB of physical RAM. From the Resource Monitor, I had 1709 MB Available RAM (Standby RAM + Free RAM). Plenty of physical RAM, as I disabled the paging file. Yet, Windows already gave a warning message, suggesting that I close an application. It suggested that I close Windows Explorer.

0 Windows gave error message even though I have 1 GB free

I ignored it, since I had plenty of RAM. I opened a small application, which I knew would fit easily. I chose MS Paint. To my surprise, Windows 7 did not tolerate it. It opened MS Paint, but it gave an error. In fact, beyond giving a warning, some applications actually crashed. In my case it was both MS Word and Google Chrome.

0 Hit issue - Word hit issue, Chrome hit issue

Another difference between Windows 7 x64 and Windows 2008 R2 is that SuperFetch is disabled in the latter. In fact, the service is hidden from the Services list. In Windows 7, it is not hidden. It is automatically disabled when Windows 7 detects that it runs on SSD.

10 Windows7 superFetch disabled

Back to Windows 2008. Since the memory was running low, it made sense to increase it. I changed it from 3 GB to 4 GB, and continued the update/patching process. Since I had more RAM, I also updated Java at the same time. Notice the RAM is healthier now.

MS AD with 4 GB RAM - patch n Java

The next screenshot was taken after Java was installed, while Windows was installing updates. While the Free Memory was low, the Standby Memory was acceptable. If you want to be conservative, you can increase it to 5 GB. For my use case, 4 GB is good enough.

MS AD with 4 GB RAM - patch - better 2

I hope you find the information useful. Follow this to see how to use vRealize Operations 6.1 End Point to monitor Windows memory.