Tag Archives: self monitoring

vRealize Operations 6.3 Self Monitoring

vRealize Operations 6.3 sports an enhanced self monitoring. This is covered by Michael Ryom on this blog as part of his what’s new in 6.3, so I will start where he left off. So do read his first.

The screenshots in this blog are taken from 6.4 release (see what’s new by Roshan). I do not have 6.3, but I think this self-monitoring feature is the same with 6.3.

vCenter Adapter Details

The vCenter Adapter collects data from vCenter. The dashboard helps answer collection questions from each vCenter, such as:

  • Is there anything wrong in collection? A big drop in the number of objects and metrics can give a clue, especially if you are not removing objects in the associated vCenter.
  • Is collection taking longer than usual?
  • Is collection failing to collect the new objects?

vcenter-adapter-details

The lab has ~300 VMs and 30 ESXi. I added the number of objects and metrics. On average, I get around 160 metrics per object.

As you can see from the above, I have customised it. It is safe to customize, and I do encourage you to do so. Best to follow these 2 rules:

  1. Do not use Admin account. You won’t be able to track what you have changed if you do.
  2. Do not modify the existing object. Clone them, prefix with your company name (e.g. MSFT)

If you want to know where the counters come from, go into edit mode. Notice you cannot edit if you are not using the built-in Admin account. That’s a protection, so you do not accidentally modify OOTB objects.

vcenter-adapter-details-where-it-comes-from

From the above, you can tell the metrics are coming from the vCenter object itself. vSphere World is chosen, as its children are vCenter objects.

Cluster Statistics

The dashboard provides aggregate information at cluster level, so you can see summary before going into each node. There are interesting counters such as Object, Metrics, Alarms and Alerts.

You can click on each scoreboard and the detail line chart will be automatically shown. For example, I clicked on Metric and can see that collection went up on 21 November. If this is not due to new VMs or vSphere infra, then it’s something I’d need to investigate.

self-cluster-statistics

You can also get usual information such as CPU, RAM, Disk and Network. I’ve selected the CPU Usage in the example below.

self-cluster-statistics-cpu

If your vR Ops is slow, you can use the Average IO Transaction Time to tell you if vR Ops is experiencing high disk latency. If the number is much higher here than what you see at the VM level, check if the IO is stuck in the Guest OS.

self-cluster-statistics-latency

We can also see the IOPS. From here we can see there is a daily pattern. There is a daily spike in Writes. The peak hit 4K IOPS sustained over 5 minutes. So the actual IOPS is higher as it is a 300 seconds average. There is also a daily spike in Reads, but at a different time.

daily-iops-patterns

Performance Details

The detail dashboard covers the individual node. The lab only has 1 node, which is what I’d recommend you to deploy. From what I see, a single node with 4 vCPU running the latest Intel Xeon should be able to handle up to 4000 VM. I’m assuming you only use the vSphere Adapters.

performance-node

Can you spot the customisation I made to the dashboard?

Yes, I’ve added extra column. This is how it’s done.

performance-customize

Notice you do not need more than 4 vCPU here.

performance-node-cpu

Can you guess the one time peak around 16 November? Yes, that’s when we upgraded it.

Customising

You might want to customize the dashboard further, or build your own. You may also want to setup new alerts. To do that, you need to know 2 info:

  1. The Relationships among objects, such as the hierarchy.
  2. The metrics and properties for each object.

One way to study is to click on a particular object and see the All Metric page. Below is an example. This one is for the Collector services. You can see the metrics and property you can get from this object.

collector

You can see the full list of metric & property here.

To create new alert and new symptom, it’s wise to check if existing alert has covered it. For example, here are the symptoms for collector. Notice there are different object type. You need to know that.

symptom

Hope you find it useful. I will expand this post next week once I finish my travel.