Your vR Ops. A bespoke operations tool

As humans, we love tinkering. We customize our cars, our phones, etc. That’s why toys like Lego are hugely popular. In vR Ops, we recognise that each operations team is unique. 100 companies may use the identical architecture (e.g. they all use VMware on Amazon), but the way they run operations will differ. While they look similar, at the ground level you will have 100 different operations.

One key feature of vR Ops is customization. One cloud provider customized it by building their own UI. Another example of advanced customization is the vR Ops vCD Tenant App. You get a completely different app!

Now, you don’t have to spend that much engineering effort. Within a few minutes, you can customize vR Ops. I’ll show you one example, as this is a popular one among my customers.

In the above, my customers want a troubleshooting flow that lets you drill down by simply clicking on the object name. We leverage vR Ops’ ability to customize:

  • the Summary Page.
  • the out-of-the-box Dashboard.

So the implementation looks like this. We customize the Troubleshoot a Cluster Dashboard, and 3 Summary Pages.

Here is what the customized Troubleshoot a Cluster dashboard looks like. You can still access it via Getting Started, because it uses the same dashboard ID. Cool!

You can sort by any column. You can also change the time period (yes, no need to have Edit access anymore!). If you have hundreds of clusters, you can also filter to a specific vCenter. Yup, these are new features in vR Ops 7.0!

Once you find the cluster you need, simply click on it. It takes you to the Cluster Summary Page. I explained it here.

You can see whether the performance problem (read: high contention) is caused by high utilization or not. You can also see whether the problem is spread across multiple hosts. From here, you can drill down into the Host. I explain that dashboard here.

Finally, from the host, you can drill down into a VM.

The implementation uses neither Groups nor Policies. It is certainly heavy on super metrics, though.

You can download all the above dashboards from Sample Exchange. I hope that gives you some ideas for customizing your vR Ops to meet your unique operations. And have fun tinkering!

VM Performance dashboard

Continuing from the vSphere Cluster Performance dashboard and ESXi Performance dashboard, here is the VMware VM Performance dashboard:

The dashboard follows the same layout as the cluster and ESXi dashboards. If it’s not clear, let me know.

The dashboard implements the concept I shared here. Do read it, as most of my customers are surprised by the choice of counters. I’m not saying your counters are wrong. I’m saying you can be more accurate in certain cases.

As a VM is the smallest object, the dashboard does not show any child objects. Instead, I added a Trend line chart. This helps you check whether a spike is something expected.

Lastly, the dashboard is complemented with a List page. How do you know which VMs to look at first? Well, the list below should help you:

As usual, get the dashboards from VMware Sample Exchange. For installation instructions, read the ESXi Performance dashboard post.

vSphere Cluster Performance Dashboard

Continuing the blog on vSphere ESXi Performance dashboard, here is the Cluster Performance dashboard:

It’s designed to be similar to the ESXi dashboard, so it’s easier to learn. So do read the ESXi post first, as it’s a building block for the cluster.

While a cluster is technically a collection of ESXi hosts, it does have its own characteristics. So here are the changes:

  • Added VM Disk Latency. This covers the HCI scenario, or designs where the datastores do not span across clusters.
  • Catered for the scenario where the cluster is unbalanced. The root cause could be reservation, limit, VM affinity, etc. But the first thing is to determine whether there is an imbalance to begin with. So for CPU, I plot both the cluster average and the highest among its hosts. In a perfectly balanced cluster, the average and the highest will be very similar in both value and pattern. In an unbalanced cluster, their value, their pattern, or both will differ.
  • For RAM, since there are 2 counters (Consumed and Active), it would be confusing to plot the Average and Max for both, as you would end up with 4 line charts. So I simply plot Consumed (average) and Active (average).
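
To make the average-versus-highest idea concrete, here is a minimal Python sketch. It is not how vR Ops computes anything, and it is not a super metric. The function name balance_gap and the sample numbers are made up for illustration; the only idea taken from the text is comparing the cluster average with the highest host.

```python
from statistics import mean


def balance_gap(host_cpu_pct):
    """Compare the cluster average with its busiest host.

    host_cpu_pct: one CPU usage sample (%) per host, all taken at the same time.
    Returns (cluster_average, highest_host, gap), where gap = highest - average.
    """
    average = mean(host_cpu_pct)
    highest = max(host_cpu_pct)
    return average, highest, highest - average


# Hypothetical 4-host clusters (illustrative numbers, not real data).
balanced = [52.0, 55.0, 50.0, 53.0]
unbalanced = [20.0, 25.0, 22.0, 93.0]

for name, hosts in (("balanced", balanced), ("unbalanced", unbalanced)):
    average, highest, gap = balance_gap(hosts)
    print(f"{name}: average={average:.1f}% highest={highest:.1f}% gap={gap:.1f}")
```

In the balanced cluster the two numbers sit close together. In the unbalanced one the average looks comfortable while one host is nearly saturated, which is exactly what the average line alone would hide.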

The above dashboard helps you troubleshoot a specific cluster. If you have many clusters, how do you know which ones to look at first? You need a table listing all clusters. You want to compare their performance, not their utilization. The table below does not list their utilization, as it’s not primary information. It would clutter the table, and may even mislead you into looking at the wrong cluster.

The above is good if you have <100 clusters. What if you have a lot more? The View List lets you filter to a specific vCenter or Datacenter.

The above table is good, but what if you can’t look at it every 5 minutes? What if you look at it once a day? Or once a week? If you look at it on Sunday morning when there is no load, what data should it show?

  • We can show the current data, which may not reveal problems in the past.
  • We can show the average of the week, which will look good, because short peaks get averaged out.
  • We can show the worst of the week, which will look bad but may not be relevant, as it could be a one-time, 5-minute peak.

This is where Percentile comes in handy. It lets you ignore the outliers.
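
Here is a small Python sketch of why a percentile sits between the average and the worst case. The percentile function below uses the simple nearest-rank method, which is my assumption for illustration and not necessarily how vR Ops calculates it. The latency numbers are invented.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# One week of 5-minute samples: 7 days * 24 hours * 12 samples = 2016 points.
# Hypothetical disk latency in ms (illustrative, not real data).
spike_week = [2.0] * 2015 + [45.0]              # a single 5-minute peak
sustained_week = [2.0] * 1814 + [30.0] * 202    # about 10% of the week is bad

for name, week in (("one-time spike", spike_week), ("sustained", sustained_week)):
    print(
        f"{name}: average={sum(week) / len(week):.2f} "
        f"worst={max(week):.1f} p95={percentile(week, 95):.1f}"
    )
```

With the one-time spike, the worst value is 45 ms but the 95th percentile stays at 2 ms, so the outlier is ignored. With the sustained problem, the 95th percentile jumps to 30 ms, so a real issue still surfaces even if you only look once a week.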

Just like the ESXi dashboard, you can find this dashboard in VMware Sample Exchange.