vSphere Capacity Reclamation

This post continues from the Operationalize Your World post. Do read it first so you get the context.

There are 5 Reclamation levels you can do. Start from the easiest one first.

Let’s go through the table above:

  • Non VM is the easiest, because these objects are not owned by someone else. They are yours! Non-VM objects, such as templates and ISOs, should be kept in 1 datastore per physical location. Naturally, you can only reclaim Disk, not CPU & RAM.
  • Orphaned VMs and orphaned VMDKs are next, as they are not even registered in vCenter. If they do appear, they may show up italicized, indicating something is wrong. They may not have owners either. Take note that vR Ops 6.4 cannot check for orphaned VMDKs.
  • Powered Off VM is harder, as there is an owner of the VM. You need to deal with the VM Owner before you delete it.
  • Idle VM is a great target, as you can reclaim CPU and RAM once you power it off. You cannot reclaim disk yet, as you are not deleting it yet.
  • Active VM is the hardest. Focus on large VMs. Take on CPU and RAM separately; it is easier to tackle when you split them. Divide and conquer.
  • Reclaiming CPU and RAM from small VMs can be futile, regardless of whether they are idle. An idle VM with 1 vCPU cannot be reduced further; it should be powered off. Powering it off first is the safer procedure, as you can simply power it back on if the VM is actually being used.
  • Snapshot. This is actually not as hard as CPU and RAM, hence we list it separately in the actual dashboard.

Why do cars have brakes?

So they can go faster!

Take advantage of Powered Off as the brake for your Idle VMs. If you treat Idle and Powered Off as one continuum, you can power off the Idle VMs earlier. You get the benefit of CPU and RAM reclamation.

What value is considered Idle?

  • It has to be defined, so it’s measurable and not subjective. Declare it as a formal policy, so you don’t end up arguing with your customers.
  • The default setting in the vR Ops policy is CPU Demand = 100 MHz. A VM using 100 MHz or less CPU will be considered Idle.
  • While a VM uses CPU, RAM, Disk and Network, we only use CPU in the definition of Idle. I think there is no need to consider all 4 and require all 4 to be idle, because they are inter-related. It takes CPU cycles to process network packets and perform disk activity. Data from the NIC and disk must also be copied to RAM, and the copying requires CPU cycles.
  • How long has it been under that threshold?
    A VM does not use CPU non-stop for months. There are times it's idle, and that's normal. A month-end VM that processes payroll can be idle for 29 days! The default value of 90% will wrongly flag this VM as idle.

Because of these month-end VMs, I recommend you change the definition from 90% to 99%. Even 99% over 30 days can still wrongly mark an active VM as Idle. 1% active means it's only active for a total of roughly 7.2 hours (0.3 days) in 30 days. Notice it's a total, not one continuous stretch. It's cumulative within the 30 days.

A VM that is idle for 30 days straight and then becomes active will need roughly 7.2 hours of cumulative activity to be marked as non-idle. A VM that does not accumulate that much activity will obviously need more time. The Idle decision logic runs only every 24 hours, so the VM may remain marked as idle for days after it has gone active.
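To make the arithmetic concrete, here is a minimal sketch of the idle classification described above, assuming 5-minute CPU Demand samples over a rolling 30-day window. It only illustrates the policy logic; it is not the actual vR Ops implementation.

```python
# Sketch of the idle-VM classification, assuming 5-minute CPU Demand samples (MHz)
# over a rolling 30-day window. Illustrative only, not the actual vR Ops code.

IDLE_THRESHOLD_MHZ = 100   # "CPU Demand = 100 MHz" from the policy
IDLE_PERCENTAGE = 99       # raised from the default 90% to catch month-end VMs

def is_idle(cpu_demand_samples_mhz):
    """Return True if the VM was under the threshold for >= 99% of the samples."""
    if not cpu_demand_samples_mhz:
        return False
    idle_samples = sum(1 for s in cpu_demand_samples_mhz if s <= IDLE_THRESHOLD_MHZ)
    return idle_samples / len(cpu_demand_samples_mhz) * 100 >= IDLE_PERCENTAGE

# With 5-minute samples, 30 days = 8,640 samples. At 99%, a VM can accumulate at
# most 86 active samples (about 7.2 hours in total) and still be classified as idle.
```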

The drawback of setting it at 99% is that we wait the full 30 days before deciding. In some corner cases, the VM may never be marked as idle. Take this scenario:

  • A VM was active and served its purpose for months.
  • After 2 years, the application is decommissioned as a new version is released.
  • As a result, the VM goes idle, as it is simply waiting to be deleted. But because we set the threshold at 99%, the logic will wait the full 30 days before deciding.
  • It's still consuming CPU/RAM during that period, as basic services like AV and OS patches still run. If this non-app workload adds up to more than ~7 hours in 30 days, the VM will never be marked as Idle.

Solution: increase the threshold from 100 MHz to a value you think is suitable. If possible, power off the VM if it's really not used.

Powered Off is simpler than Idle, as it’s binary.

A VM that has been powered off for at least 15 days will take up to 15 days of running before it is marked as Powered On again. This creates a problem, as a running VM is not one you can reclaim.

Solution: add “Is it Powered On now?” into the formula. If a VM is currently running, it is immediately excluded from the powered-off list.
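A sketch of the combined check, assuming we know the current power state and the percentage of the last 30 days the VM spent powered off (illustrative logic only, not the vR Ops formula syntax):

```python
POWERED_OFF_PERCENTAGE = 50   # lowered from the default 90%

def is_powered_off_candidate(is_powered_on_now, powered_off_percent_30d):
    """A powered-off reclamation candidate must not be running right now AND must
    have been powered off for at least half of the last 30 days."""
    if is_powered_on_now:
        return False   # a running VM is excluded immediately, regardless of history
    return powered_off_percent_30d >= POWERED_OFF_PERCENTAGE
```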

This is where the setting is in vR Ops 6.4.

You need to modify the value in your active policy:

  • Change idle from 90% to 99%
  • Change powered off from 90% to 50%

The above is the first of a set of vR Ops dashboards for Capacity Reclamation. I added a short Read Me for 2 reasons:

  • There are 4 dashboards.
    1. The dashboard above
    2. Idle VMs and Powered Off VMs. See below.
    3. Active VM: CPU. See this.
    4. Active VM: RAM. See this.
  • Reclamation is quite complex when you look at the details. There are many things we can reclaim.

You can replace the Read Me widget with a picture if you know the target screen resolution. I didn't use an image as it would make your import harder.

The above is the 2nd dashboard. It shows the Powered Off VMs and Idle VMs.

The summary at the top tells how much you can reclaim. The table shows where you can claim it.

For the powered off VMs, the widget gives the summary. It tells you how many VMs, and how much space. The table provides details.

The numbers will not be identical due to rounding; the summary is shown in TB while the table is in GB, just in case you're wondering. 3.7 TB is the correct rounding for 3,769.36 GB, as there are 1,024 GB in 1 TB: 3,769.36 / 1,024 is about 3.68 TB, which rounds up to 3.7 TB.
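If you want to verify the arithmetic yourself, a few lines will do:

```python
# Quick check of the rounding, using the 1 TB = 1,024 GB convention described above.
gb = 3769.36
tb = gb / 1024        # 3.6810...
print(round(tb, 1))   # 3.7 -> matches the summary widget
```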

Right-sizing Virtual Machines

This post is part of Operationalize Your World program. Do read it first to get the context.

Over-provisioning is a common malpractice in real-life SDDC. P2V is a common reason, as the VM was simply sized to match the physical server. Another reason is conservative sizing by the vendor, which is then padded further by the Application Team.

I’ve seen large enterprise customers try to do a mass adjustment, downsizing many VMs, only to have the effort backfire when performance suffers.

Since performance is critical, you should address right-sizing from this angle. Educate VM Owners that right-sizing actually improves performance. The carrot is a lot more effective than the stick, especially for those with money. Saving money is a weak argument in most cases, as VM Owners have already paid for their VMs.

Why oversized VMs are bad

Use the picture below to explain why they are bad for the VM Owner:

  • Boot time.
    • If a VM does not have a memory reservation, vSphere will create a swap file the size of the configured RAM. The bigger the VM, the longer it takes to create the file, especially on slow storage.
  • vMotion
    • The bigger the VM, the longer it takes to do vMotion and Storage vMotion.
  • NUMA
    • This happens when the VM cannot fit into a single socket.
    • It also happens when the VM's active vCPUs are more than what is available at that point in time. Example:
      • You have a 12-vCPU VM on a 12-core socket. This is fine if it's the only VM running on the box. In reality, you have other VMs competing for resources. If the 12 vCPUs want to run, but Socket 0 has 6 free cores and Socket 1 has 6 free cores, the VM will be spread across both sockets.
  • Co-Stop and Ready
    • It will experience higher Co-Stop and Ready time. Even if not all vCPUs are used by the application, the Guest OS will still demand that all the vCPUs be provided by the hypervisor.
  • Snapshot
    • It takes longer to snapshot, especially if the memory snapshot is included.
  • Processes
    • The Guest OS is not aware of the NUMA nature of the physical motherboard, and thinks it has a uniform structure. It may move processes between its own CPUs, as it assumes there is no performance impact. If the vCPUs are spread across different NUMA nodes, for example a 20-vCPU VM on a box with 2 sockets and 20 cores, processes can experience the ping-pong effect.
  • Visibility
    • There is a lack of performance visibility at the individual vCPU or virtual core level. The majority of the counters are at the VM level, which is an aggregate of all its vCPUs. It does not matter whether you use virtual sockets or virtual cores.

Impacts of Large VMs

  • Large VMs are also bad for other VMs, not just for themselves. They can impact other VMs, large or small. The ESXi VMkernel scheduler has to find available cores for all their vCPUs, even when those vCPUs are idle. Other VMs may be migrated from core to core, or socket to socket, as a result. There is a counter in esxtop that tracks this migration.
  • Large VMs tend to have slower performance, as ESXi may not be able to schedule all their vCPUs together. The CPU Co-Stop counter tracks this.
  • Large VMs reduce the consolidation ratio. You can pack more vCPUs with smaller VMs than with big VMs.

As a Service Provider, this actually hits your bottom line. Unless you have progressive pricing, you make more money with smaller VMs as you sell more vCPUs. For example, if you have 2 sockets, 20 cores and 40 threads, you can have either:

  • 1x 20-vCPU VM with moderately high utilization
  • 40x 1-vCPU VMs with moderately high utilization

In the above example, you sell 20 vCPUs vs 40 vCPUs.

Approach to Right-sizing

Focus on large VMs

  • Every downsize is a battle, because you are changing the paradigm to “Less is More”. Plus, it requires downtime.
  • Downsizing from 4 vCPU to 2 does not buy much nowadays with >20-core Xeons.
  • No one likes to give up what they are given, especially if they are given little. By focusing on the large ones, you spend 20% of the effort to get 80% of the result.

Focus on CPU first, then RAM

  • Do not change both at the same time.
    • It's hard enough to ask the apps team to reduce CPU, so asking for both will be even harder.
    • If there is a performance issue after you reduce both CPU and RAM, you have to bring both back up, even though the issue was caused by just one of them.
  • RAM in general is plentiful, as RAM is cheap.
  • RAM is hard to monitor, even with agents. If the app manages its own memory, you need application-specific counters. Apps like Java VMs and databases do not expose to Windows/Linux how they manage their RAM.

Disk right-sizing needs to be done at the Guest OS partition level

  • VM Owners won't agree to you resizing their partitions.
  • Windows and Linux have different partition names. This can make reporting across OSes difficult.

Technique

The technique we use for CPU and RAM is the same. I'll use CPU as the example.

The first thing you need to do is create a dynamic group that captures all the large VMs. Create one group for CPU, and one for RAM.
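As an illustration, here is a sketch of the membership logic. The 8 vCPU and 32 GB thresholds are assumptions for the example; in vR Ops you would express this as a dynamic group membership criterion rather than code.

```python
# "Large VM" grouping sketch. The thresholds and the dict keys are assumed values
# for illustration; pick whatever makes sense in your environment.

LARGE_VCPU_THRESHOLD = 8
LARGE_RAM_GB_THRESHOLD = 32

def large_cpu_group(vms):
    """VMs that belong to the CPU right-sizing group."""
    return [vm for vm in vms if vm["configured_vcpu"] >= LARGE_VCPU_THRESHOLD]

def large_ram_group(vms):
    """VMs that belong to the RAM right-sizing group."""
    return [vm for vm in vms if vm["configured_ram_gb"] >= LARGE_RAM_GB_THRESHOLD]
```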

Once you create a group, the next step is to create a super metric. You should create 2 super metrics:

  1. Maximum CPU Workload among these large VMs
    • You expect this number to hover around 80%, as it only takes 1 VM among all the large VMs for the line chart to spike.
    • If you have many large VMs, one of them tends to have high utilisation at any given time.
    • If this number is low, that means severe wastage!
  2. Average CPU Workload among these large VMs
    • You expect this number to hover around 40%, indicating sizing was done correctly.
    • If this chart is below 25% all the time for the entire month, then all the large VMs are oversized.

You do not need to create the Minimum. There is bound to be a VM that is idle at any given time.
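For clarity, here is a minimal sketch of what the two super metrics compute at each collection cycle, assuming each group member reports a CPU Workload (%) value. In vR Ops you would express the same calculation in the super metric editor.

```python
def supermetric_max_cpu_workload(workload_percent_by_vm):
    """Super metric 1: highest CPU Workload (%) among the large VMs."""
    return max(workload_percent_by_vm.values())

def supermetric_avg_cpu_workload(workload_percent_by_vm):
    """Super metric 2: average CPU Workload (%) across the large VMs."""
    values = list(workload_percent_by_vm.values())
    return sum(values) / len(values)

# Example at one collection cycle:
#   {"vm-a": 85, "vm-b": 30, "vm-c": 12}  ->  Max = 85, Avg ~ 42.3
# A consistently low Max means severe wastage; an Avg below ~25% for a whole
# month suggests the large VMs are oversized.
```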

In a perfect world, if all the large VMs are right sized, which scenario will you see?

All good. The 2 line charts show us the degree of over-provisioning. Can you spot a limitation?

It lies in the counter itself. We cannot distinguish whether the CPU usage is due to real demand or not. Real demand comes from the app. Non-real demand comes from the infra, such as:

  • Guest OS reboot
  • AV full scan.
  • Process hang. This can result in 100% CPU Demand. How do you distinguish a runaway process?

If your Maximum line is constantly ~100%, you may have a runaway process.

Now that you've got the technique, we are ready to implement it. Follow these 2 blogs:

  1. CPU right sizing.
  2. RAM right sizing

Presentation from VMware vForum Singapore

My good buddy Sunny Dua and I had the joy of co-presenting at VMware vForum Singapore. We had 2 sessions, but sadly he was only able to make it for 1 due to another engagement.

The first session was a 90-minute workshop with just 40 people, as the audience was capped at 40. In hindsight, we should have allocated more seats, as it filled up fast and folks were forming a long queue! The room was full.

The second session was a high-level session of just 30 minutes, with an audience of around 200. It was open to all.

There were a lot of questions during the 90-minute workshop, as the audience realized (pun intended) that they need to change their mindset. In all the years I've been doing performance and capacity management, it's amazing how many customers are still not clear on the difference. This is not surprising, because there is no difference between performance and capacity in HDDC.

You can get the full deck from here. It builds on our VMworld deck, and we added more depth as we had more time.

Feel free to use it, and let us know how it has helped you. One thing that keeps me going in sharing the knowledge is the many emails, WhatsApp and LinkedIn messages I get from customers/partners on how changing their paradigm has helped them manage their SDDC better. They had been managing it like an HDDC all along without realising it.

All the best in SDDC Operations!