Monthly Archives: October 2014

Performance and Capacity Management in virtual datacenter

[30 Nov 2015: this was the reason why I wrote the 1st edition. While the 2nd edition has major changes, the high level goals of the 2nd edition remain the same]

Ever wonder what those counters mean in vCenter and vCenter Operations? For example, what does Memory Contention really mean, where does it get its value from, and what values should you expect in a healthy environment?

I was puzzled myself. I’ve been with VMware for 6+ years, and I have not seen them documented. Scott Drummonds did document some of them when he was working on performance. But I have not seen a complete reference, especially one written from the VMware Admin’s viewpoint (as opposed to a product viewpoint). I want to explain from the viewpoint of the person whose hands are on the keyboard when the VMware platform is not performing as expected.

I’m writing a book to address it. As a field person, I work closely with customers. I see first hand that there are related issues which make performance and capacity management difficult in the virtual world. The gaps explain why customers struggle with virtualisation. Because there are several gaps, I address them “top down“, starting from the high level and then moving to technical matters. There are >100 pages on counters, so this is going to get deep 🙂

Chapter 1 covers the big picture. I aim to correct some architecture and operations mistakes that customers make. It is very rare for customers to master the virtual world, both architecturally and operationally. Part of the reason is that the Software-Defined Data Center (SDDC) is not yet mature. The biggest reason, however, is the lack of deep understanding of what exactly virtualisation means to IT. As we all know, small leaks sink the ship, so I’m going to expose the details.

Chapter 2 continues the theme, but I dive into Capacity Management. I’m hoping to share how you do capacity management. Once you know what it takes, you can use a tool to automate it.

Chapter 3 dives into the metrics and how CPU, RAM, Disk and Network change once you add a layer called the hypervisor. You need to know this, as some behaviours change completely.

Chapters 4 to 7 document the counters in vCenter and vC Ops, with one chapter each for CPU, RAM, Network and Disk. The book goes beyond documenting the metrics. It also applies them so customers can operationalise their virtual platform.

Chapter 8 provides examples of dashboards. The book was practically born in the field, sitting down with customers to create all those dashboards and perform troubleshooting.

The book is with Packt Publishing. There are many reasons why I chose to work with them. The plan is to have the book out at the same time as the GA of vRealize Operations 6.

vRealize Operations: screen resolution requirement

The screenshots below are based on Full HD resolution. So hopefully this helps in justifying why you need a nice 24” monitor 🙂

I’ve been playing with the next release also. While the UI changes drastically, the screen real estate requirements do not change. You will be creating many custom dashboards, with interactions among the widgets, so having a Full HD resolution helps. The Line Chart will be easier to read if you need to show a long period of time (say more than 1 week). In a large environment, your Top N will also be easier to read if you do not have to scroll. The Heat Map will be clearer if you can give it more real estate.

[Screenshots: image001-758649, image002-755804]

The Chef and his cooking – Story of VMware Admin

I see a lot of VMware Admins/Engineers/Architects in end-user environments who do not extend their influence beyond architecture. I think that’s a lost opportunity because Operations and Architecture are like Yin and Yang. Or a Möbius strip.

I shared the idea that, as the creators of the platform, we have to take an interest in how it’s operated. It was an impromptu presentation at our VMUG Singapore back in mid 2014, hence no slides.

I used a restaurant analogy.

The restaurant business provides a good analogy to our Infra-as-a-Service business. We (Virtualisation Architect/Engineer/Admin) are the Chef. In the end-user environment where you work, you are the expert in producing what your customers want. You architect and design a solid platform, where your customers can confidently run their VMs. If there is an issue, you often get involved, restoring their confidence in your creation. You are seen as the VMware guy, or the virtualisation expert. Yes, you may engage VMware PSO or an SI, but they are not working for the company. You are the employee. As far as your customers are concerned, the buck stops with you.

You sell neither hardware nor software. You charge your customers per VM. In fact, to ensure that your customers order the right kind of VM, you need to charge per vCPU, per vRAM and per vDisk. The chargeback model is something that I very rarely see us discuss. We tend to stay in technical discussion. We need to realise we are no longer just a System Builder. We are a Service Provider. By not extending our circle of influence into how the App Team should pay for our service, we created the issues we have today (oversized VMs, dormant VMs, VM sprawl). We need to “step out from the kitchen” from time to time. We need to be like the Chef who steps out to the dining area, building relationships with his customers and explaining the reasons behind his cooking.
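To make the per-vCPU, per-vRAM and per-vDisk idea concrete, here is a minimal sketch of such a chargeback calculation. The rates, the `Rates` class and the `monthly_vm_charge` function are all hypothetical and chosen only for illustration; real rates would come from your own cost and capacity figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Monthly unit prices. All values used below are hypothetical placeholders."""
    per_vcpu: float      # charge per configured vCPU
    per_gb_vram: float   # charge per GB of configured vRAM
    per_gb_vdisk: float  # charge per GB of configured vDisk


def monthly_vm_charge(rates: Rates, vcpus: int, vram_gb: float, vdisk_gb: float) -> float:
    """Price a VM from its configured size, so a larger VM always costs more."""
    return (vcpus * rates.per_vcpu
            + vram_gb * rates.per_gb_vram
            + vdisk_gb * rates.per_gb_vdisk)


# Hypothetical rates, not real prices.
example_rates = Rates(per_vcpu=10.0, per_gb_vram=5.0, per_gb_vdisk=0.25)

right_sized = monthly_vm_charge(example_rates, vcpus=2, vram_gb=8, vdisk_gb=100)
oversized = monthly_vm_charge(example_rates, vcpus=8, vram_gb=32, vdisk_gb=100)
print(f"right-sized VM: ${right_sized:.2f}/month, oversized VM: ${oversized:.2f}/month")
```

Because the charge follows the configured size rather than a flat per-VM fee, the App Team sees the cost of an oversized request directly on the bill.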

As the Architects/Engineers, we are the best people to determine how much should be charged. We build this thing. We know the costs, and we know the capacity. Not convinced? Put it this way: would you rather someone else determine how much your creation is worth?

We all know that IT exists because of the Business. It starts with the Business. Some of the issues we have are caused by an unsuitable chargeback model and incorrect Service Tiering. A VM on the Tier 1 (mission critical) platform cannot cost the same as a VM on Tier 3 (non prod). I’d make sure there is a distinct difference in quality between Tier 1, Tier 2 and Tier 3, so it’s easy for the business to choose. Need a good example? Review this.

Using the restaurant analogy, say you cook fried rice. It’s your dish. You need to determine the price of the fried rice. You also need to be able to justify why you have normal fried rice and special fried rice, and why the special one costs a lot more for the same amount.

To me, the Chargeback model and the Service Tiering serve as Key Drivers to our Architecture. I will not consider my architecture complete unless I include these 2 in my design. We are architecting to meet the business requirements, which are “defined” in the chargeback model (e.g. the business wants a $100 VM per month, not a $100K VM per month), and service tiering (e.g. the business wants 99.999% and 3% CPU Contention).

As shared, I see a chance for us to STEP UP and STEP OUT.

Step out of the kitchen and network with your customers (the App team). Educate and fix the problem at the source. Step up from pure IT architecture to business architecture. Architect your pricing strategy and service tiering.

The good thing about pricing is… your benchmark is already set.

Azure, AWS, Google, and many SPs have, to a certain extent, set the benchmark. Your private cloud cannot be too far from it. Too low and you will likely make a loss (it’s almost impossible to beat their efficiency). Too high and you will get complaints. Another source of benchmarks is the physical world.

If you are pricing your VDI, the cost of a PC sets your benchmark. You can be higher, but not by a huge gap. A PC costs $800 with Windows + 3 year warranty + 17” monitor. Add your IT Desktop cost, and you have your benchmark.
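As a rough sketch of that arithmetic, the physical benchmark can be turned into a monthly figure to compare against a per-desktop VDI price. The $800 PC cost and the 3 year warranty come from the text above. The $1,000 IT Desktop cost, the choice to spread the cost over the warranty period, and the function name are assumptions made purely for illustration.

```python
def vdi_monthly_benchmark(pc_cost: float, it_desktop_cost: float, lifetime_months: int) -> float:
    """Spread the PC purchase cost plus IT Desktop cost evenly over the PC's lifetime."""
    if lifetime_months <= 0:
        raise ValueError("lifetime_months must be positive")
    return (pc_cost + it_desktop_cost) / lifetime_months


# $800 and 36 months (3 year warranty) are from the text.
# The 1000.0 IT Desktop cost is a hypothetical placeholder.
benchmark = vdi_monthly_benchmark(pc_cost=800.0, it_desktop_cost=1000.0, lifetime_months=36)
print(f"physical desktop benchmark: ${benchmark:.2f} per month")
```

A VDI price somewhat above this figure may still be defensible, but a large gap would be hard to justify to the business.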

Hope the above provides clarity. I’m keen to hear your thoughts.