Capacity Management: What’s wrong with these statements?

Can you figure out why the following statements are wrong? They are all well-meaning advice on the topic of Capacity Management, usually given in answer to questions like these:

  • How is my IaaS performing? What’s the performance of my VMware environment?
  • How healthy is it in terms of performance and capacity? Am I running a risk?

 

Regarding Cluster CPU:

  • Physical Core to Virtual CPU Ratio is high at 1:5 on cluster “XYZ” since this is an important production cluster.
  • The overcommit ratio on the rest of your clusters looks good at around 1:3. Maintain it below this ratio and you’re safe.
  • Keep the overcommit ratio to 1:4 for Tier 3 workloads.
  • CPU usage is around 70% on cluster “ABCDE”. Since they are UAT servers, don’t worry. You should get worried only when they reach 85%.
  • CPU utilization on the rest of your clusters is around 25%. This is good! You have plenty of capacity left.

Regarding Cluster RAM:

  • We recommend a 1:2 overcommit ratio between physical RAM and virtual RAM.
  • Memory Usage on most of your clusters is high, around 70%. You should aim for 50%.
  • Cluster “ABCD” is running peak at around 75%. CPU utilization should be less than 70%, so move some VMs out.
  • If you see that Active Mem% is high, then we should add more RAM to the cluster.
  • The counter Memory Active (%) should not exceed 50-60%.
  • Memory should be running at high state on each host.

I’m sure you have heard them, or even given them, in the past. In my 7+ years with VMware, I know I personally have both given and heard them.

The scope of the statements above is obviously a VMware vSphere Cluster. A cluster is the smallest logical building block, due to HA and DRS. So it is correct that we do capacity planning at the Cluster level, and not at the Host level or Data Center level.

I have highlighted the parts where the mistakes lie. Can you figure them out?

You should notice a trend by now. They have something in common.

The above statements are wrong because they focus on the wrong item. They look at the cluster when they should be looking at the VM. They look at the Supplier (Provider) when they should be looking at the Consumer (Customer). What’s important is your VM.
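
To make the contrast concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the cluster sizes, the VM names, the hypothetical per-VM `wait_pct` figure (standing in for whatever per-VM performance indicator you trust), and the arbitrary 5% threshold. It is not a recommendation of any particular counter or limit. It only shows that two clusters can report the same overcommit ratio while their VMs have very different experiences.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    vcpus: int
    # Hypothetical per-VM indicator: share of time the VM wanted CPU but had to wait.
    wait_pct: float


def overcommit_ratio(physical_cores: int, vms: list[VM]) -> float:
    """Provider-side view: total vCPUs divided by physical cores."""
    return sum(vm.vcpus for vm in vms) / physical_cores


def unhappy_vms(vms: list[VM], threshold_pct: float) -> list[str]:
    """Consumer-side view: names of VMs whose wait time exceeds the threshold."""
    return [vm.name for vm in vms if vm.wait_pct > threshold_pct]


# Two made-up clusters, each with 16 physical cores and 48 vCPUs.
quiet = [VM(f"quiet-{i}", 4, 0.5) for i in range(12)]
busy = [VM(f"busy-{i}", 4, 0.5) for i in range(10)] + [
    VM("busy-10", 4, 12.0),
    VM("busy-11", 4, 15.0),
]

for label, vms in (("quiet", quiet), ("busy", busy)):
    ratio = overcommit_ratio(16, vms)
    # The 5.0 threshold is an arbitrary value chosen for this example only.
    print(f"{label}: 1:{ratio:g} overcommit, unhappy VMs: {unhappy_vms(vms, 5.0)}")
```

Both clusters come out at the same ratio, so a rule based on the ratio alone cannot tell them apart. Only the per-VM view shows that two customers in the second cluster are having a bad day.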

Think of your IaaS business like a restaurant business. It has a dining area, where your customers sit, and a kitchen, where you prepare the food. Guess which one is more important?

You’re right. The dining area.

If everything runs smoothly in the dining area, with customers eating happily and being served on time with good quality food, it is a good day for the business. Whether you’re running around in the hot kitchen is a separate, internal matter. The customers need not know about it. The VM Owner does not care if you are firefighting in the data center.

You should focus on the Customer. Focus on the VM, not the IaaS. How do you do that? Review this solution and let me know your thoughts. Warning: it might make you rethink your architecture 🙂
