Monthly Archives: March 2016

Released – VMware Performance and Capacity Management

Glad to share that the 2nd edition is finally out. It is now available for order. If you use Amazon, it’s here.

Judging by both the amount of effort and the resulting book, this feels more like 2.0 than 1.1 to me. At 500+ pages, it is double the length of the 1st edition. The existing 8 chapters have been expanded and reorganized into 15 chapters.

It now has 3 distinct parts, whereas the 1st edition had none. The 3 parts are ordered deliberately to make it easier for you to see the big picture. You will find the key changes versus the 1st edition below.

It’s surprising how much has changed in just 14 months. I certainly did not expect some of these changes back in Jan 2015!

  • Major improvement in monitoring beyond vSphere.
    • vRealize and its ecosystem have improved hugely in storage, network and application monitoring. This includes newer technologies such as VSAN and NSX.
    • Many adapters (management packs) and content packs were released for both vRealize Operations and vRealize Log Insight. I’m glad to see a thriving ecosystem. Blue Medora especially has moved ahead very fast.
  • Adoption of NSX and VSAN was so rapid that I had to add them. They were not part of the original plan for the 2nd edition.
  • Rapid adoption of VDI monitoring using vRealize. I had to include VDI use cases.
  • Adoption by customers, partners and internal teams has increased.
    • In the original plan, I wasn’t going to ask any partners to contribute, so I was surprised that 2 partners agreed right away.
    • It is much easier to ask for reviews, as people are interested and want to help.
  • vSphere 6.0, 6.0 U1 and U2 were released.
    • Since the book focuses on operations (and not architecture), the impact of these releases is minimal.
    • Very few counters have changed since vSphere 5.5.
  • vRealize Operations had 6.1 and 6.2 releases. Log Insight had many releases too.
    • Again, this has minimal impact, since the book is not a product book.

You can find more details of the book here.

The Manager of Managers

This post is part of the Operationalize Your World program. Do read that first to get the context.

Have you ever made this a requirement for your management vendors, or been given the idea that you need one? It’s certainly a logical requirement. As an IaaS service provider, you put together products from many vendors. No product can be monitored in isolation, as they impact one another.

The above requirement or idea goes by other names too. The popular ones are “single pane of glass” and “end to end visibility”.

I’m not implying it is a bad idea. It is in fact the holy grail of IT operations. The mistake is that the solution you have in mind is incomplete.

Think about it. What’s the reason behind the requirement? In other words, what’s the end goal you’re trying to achieve?

We can use different words to articulate the goal. They are essentially the same thing. Here are some examples:

  • You want to do your job well, which is to serve your customers. The business of IaaS requires you to ensure the IaaS platform delivers the performance and availability you promise in your SLA.
  • You want to be able to troubleshoot fast, and see problems before they become serious.
  • If you cannot troubleshoot the root cause, you need to at least prove it is not due to your IaaS.

On a lighter note, we can say the goal can be summed up in 1 word: TTI.

That’s Time to Innocence, not Time to Investigate 🙂

Ok, so we have seen here that there are 2 parties: You and Your Customer.

Let’s now look at 2 scenarios of opposite nature.

  • Scenario 1: You spot some errors in your storage, network and VMware. But your customers are all happy. None of their VMs are affected. Is there an active fire you need to put out right now? All the customers I asked told me the answer is no. Enterprise IT is practical because business is practical. There is no need to complicate matters. You can take your time to troubleshoot, as the business is not affected.
  • Scenario 2: Take the opposite situation. Every component of your IaaS is healthy. Your storage, network, servers and hypervisors are all doing well. You support 10,000 VMs. 9,999 are happy, but 1 is not. That 1 VM happens to be the CEO’s desktop, and he needs to use it right now. Is there a raging fire, and are you firefighting furiously? You bet!

The above 2 scenarios clarify that the single pane of glass you are looking for is not what your customers want you to look at. Your customers don’t care about what you care about. Your infrastructure is irrelevant as far as they are concerned.

There are clearly 2 sets of dashboards:

  1. The 1st set shows how VMs are being served by your environment. This is what your customers want you to see. We will dive into this later.
  2. The 2nd set shows your IaaS platform. It has information about your ESXi, cluster, datastore, distributed virtual switch, NSX, physical storage, physical network, FC fabric, etc. You want to see them all: how each performs, and how they relate to one another.

Let’s focus on the first set. Focus on something your customers care about, which is their VMs and how you serve them.

  • If you promise 0 dropped packets for network, then prove that not a single VM experiences a dropped packet.
  • If you promise 30 ms disk latency for Tier 3 service level, then prove that every VM has its IOs served within 30 ms.
  • If you promise that performance will be as good as physical for your Platinum service level, then show that not a single VM in the Platinum cluster is contending for CPU and RAM.
  • etc.
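
The checks above can be sketched in a few lines of Python. This is only an illustration, not how vRealize Operations does it: the `VmPeak` record, the `SLA` table and the `sla_breaches` helper are hypothetical names, and the only thresholds taken from the text are the 30 ms disk latency for Tier 3 and the 0 dropped packets. Every other number is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class VmPeak:
    """Worst value each counter reached for one VM in the reporting period."""
    name: str
    tier: str
    disk_latency_ms: float
    dropped_packets: int
    cpu_contention_pct: float
    ram_contention_pct: float


# Hypothetical SLA limits per tier. Only the 30 ms Tier 3 disk latency and the
# 0 dropped packets come from the text; the other values are placeholders.
SLA = {
    "Platinum": {
        "disk_latency_ms": 10.0,
        "dropped_packets": 0,
        "cpu_contention_pct": 0.0,
        "ram_contention_pct": 0.0,
    },
    "Tier 3": {
        "disk_latency_ms": 30.0,
        "dropped_packets": 0,
        "cpu_contention_pct": 5.0,
        "ram_contention_pct": 5.0,
    },
}


def sla_breaches(vms):
    """Return (vm, counter, peak, limit) for every counter above its tier limit."""
    breaches = []
    for vm in vms:
        for counter, limit in SLA[vm.tier].items():
            peak = getattr(vm, counter)
            if peak > limit:
                breaches.append((vm.name, counter, peak, limit))
    return breaches


# Made-up sample data: one healthy VM and one that misses its disk promise.
sample_vms = [
    VmPeak("app-01", "Tier 3", 25.0, 0, 1.0, 0.0),
    VmPeak("ceo-desktop", "Tier 3", 31.0, 0, 1.0, 0.0),
]

for name, counter, peak, limit in sla_breaches(sample_vms):
    print(f"{name}: {counter} peaked at {peak}, limit is {limit}")
```

The check runs per VM against the worst value, not against an average across all VMs. An average would hide the single unhappy VM from Scenario 2.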

The chart below shows a reasonable expectation of your IaaS from the CIO’s viewpoint.


How do we prove that not a single VM in any service tier fails the SLA threshold you promise for that tier? Since it is IaaS, that means CPU, RAM, Disk and Network, the 4 main components of Infrastructure.

Your CIO expects to see consistency in performance. So you need to show a monthly report.

Is it easy?

Let’s see. Assume you look after 4000 VMs.
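
To get a feel for the scale, here is a rough back-of-the-envelope sketch. It assumes one counter for each of the 4 components (CPU, RAM, Disk, Network), a sample every 5 minutes, and a 30-day month. The interval, the month length and the `monthly_samples` helper are assumptions for illustration, so adjust them to your own environment.

```python
def monthly_samples(vm_count, counters=4, interval_minutes=5, days=30):
    """Count the data points behind one monthly report.

    Assumes one counter per component (CPU, RAM, Disk, Network) and a
    fixed sampling interval. Both are illustrative assumptions.
    """
    samples_per_counter = days * 24 * 60 // interval_minutes
    return vm_count * counters * samples_per_counter


total = monthly_samples(4000)
print(f"{total:,} data points per month")
```

Under these assumptions the monthly report has to summarize well over a hundred million data points, which is why a raw line chart per VM will not work as a dashboard.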


Can you picture what the dashboard looks like?

That’s the dashboard you need to display as The First Dashboard. It is the first pane in your single pane of glass screen. Head to the Dining Area to find out.

ASEAN Partner Training 2016

For the attendees of the ASEAN partner training in Bali, you can download the material for my sessions from here.

I cover 2 sessions on SDDC Performance and Capacity Management:

  • Level 200
  • Level 300


As an attendee of the event, you know you have passed Level 100. You have several years of vSphere performance troubleshooting experience, and you have hands-on experience with vRealize Operations and Log Insight.

Plus, the allocated time does not permit us to cover product design, installation, and configuration.