
Super Metrics bulk export and import

If you have a lot of super metrics, backing them up can be a challenge. There is no bulk export, and exporting each one manually makes version control error-prone.

Replicating them in another instance (e.g. your test/dev environment) is tedious, as you need to import them one by one.

A workaround is to use a Policy as the vehicle for bulk export and import.

  • For backup purposes, export is all you need.
  • For restoring into the same environment (where you exported it earlier), you can use the same XML file. You don’t need to customise it.
  • For replicating in another environment, you likely need to modify the XML file. This is because the policy file contains other settings, such as alerts. It’s safer if your exported policy does not contain those other settings.

I’ll show you how to trim the policy file so that it contains only super metrics. That way, it’s safe to import into any environment, as it won’t modify anything else. The XML file contains super metrics and nothing more.

The policy file is just a long XML file. In the example below, it has >5000 lines! Notice it has Alerts and Custom Profile.

To delete an entire section, simply highlight it with the keyboard and delete it. See below, where I selected Custom Profile.

I’ve also deleted the alert section. The file is shorter now 🙂

We still have some irrelevant lines (lines 4 to 1187 in my case). Delete them too. You’ll end up with something like this.

I’ll expand the content so you know exactly what the super metric section contains.

BTW, do not copy and paste your super metric from the vR Ops UI into the XML file. The expression shown in the UI is not 100% identical to the one stored in the XML.
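If you trim policy files often, the manual deletion above can be scripted. The Python sketch below is not a VMware tool. It assumes the super metric definitions sit under an element named SuperMetrics, which is a hypothetical tag name you must check against your own exported file. It keeps each kept section whole, keeps only the wrapper elements that lead to one, and drops everything else (such as alerts and custom profiles). Note that ElementTree may rewrite namespace prefixes, so import the trimmed file into a test instance before relying on it.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: hypothetical tag name. Open your exported policy and use the
# real name of the element that holds the super metric definitions.
KEEP_TAGS = {"SuperMetrics"}


def _prune(parent: ET.Element, keep_tags: set) -> bool:
    """Remove children of `parent` that neither are nor contain a kept section.

    Returns True if `parent` still holds at least one kept section.
    """
    found = False
    for child in list(parent):  # iterate over a copy, since we remove while looping
        if child.tag in keep_tags:
            found = True  # keep the whole super metric section untouched
        elif _prune(child, keep_tags):
            found = True  # a wrapper element on the path to a kept section
        else:
            parent.remove(child)  # e.g. alert or custom profile sections
    return found


def trim_policy(xml_text: str, keep_tags: set = KEEP_TAGS) -> str:
    """Return the policy XML reduced to the kept sections and their wrappers."""
    root = ET.fromstring(xml_text)
    _prune(root, keep_tags)
    return ET.tostring(root, encoding="unicode")


def trim_policy_file(src_path: str, dst_path: str) -> None:
    """Read an exported policy file and write the trimmed version next to it."""
    with open(src_path, encoding="utf-8") as f:
        trimmed = trim_policy(f.read())
    with open(dst_path, "w", encoding="utf-8") as f:
        f.write(trimmed)
```

For example, trim_policy_file("exported_policy.xml", "trimmed_policy.xml") writes a file that contains only the super metric section plus the elements wrapping it.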

Once done, you can import it safely into another vR Ops instance. The import is much faster too! Here is what it looks like:

And if you go to Super Metrics, they are all there 🙂

To enable them, go to your default policy (the one marked with a tiny D in the column), and edit it. Go to section 6, and find your super metrics. In the Operationalize Your World case, they are all prefixed with “Ops”.

Do not enable them for all objects. It will slow down your system, and it adds unnecessary complexity, as the formulas don’t apply to those objects.

What’s new with vRealize Operations 6.3

This blog complements the official blog that you can find here, and other articles such as this and this by Michael Ryom. Bill Roth and team have written a lot of articles there. You should also read the Release Notes. Take note if you have the EPO agent.

Super Metric

I use around 75 super metrics in a typical engagement. By and large, they are good enough for my customers’ requirements. On the other hand, I have seen folks like Ronald Buder and Brandon Gordon, who build very advanced formulas and would like to have more capabilities. Here are 3 enhancements that go a long way in making super metrics more useful:

  • Ability to specify a condition
    • Prior to 6.3, a super metric applies to every member of the group or the parent object. If you are counting the number of VMs in a cluster, it will give you all VMs.
    • With 6.3, you can add a condition. You can count only VMs that are powered on, or VMs with >8 vCPU. As another example, you can count how many VMs in a datastore have latency above a certain threshold.
  • Ability to have IF THEN ELSE
    • Prior to 6.3, a super metric consists of a single formula. You cannot apply Formula 1 when Condition A is met and Formula 2 when it is not. A use case here is checking VM uptime. If VMware Tools is running, you use the Tools heartbeat to decide that the VM is up. If VMware Tools is not running, you use VM utilization to decide.
    • The IF THEN ELSE can be combined with AND, OR and NOT. This enables you to build a more comprehensive logic.
    • You can chain it to create IF THEN ELSEIF.
  • Ability to combine expression
    • You can have AND, OR, NOT. Enough said 🙂
    • Ability to compare. You can have less than, less than or equal to, greater than, greater than or equal to.
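To make the three enhancements concrete, here is a plain Python model of their logic. It is not vR Ops syntax; the VM records, field names and the utilization cut-off are hypothetical stand-ins for real metrics. count_where mirrors counting only members that meet a condition, and vm_is_up mirrors the VM uptime IF THEN ELSE use case.

```python
from dataclasses import dataclass


# Hypothetical VM records; in vR Ops these values would come from metrics.
@dataclass
class VM:
    name: str
    powered_on: bool
    vcpu: int
    tools_running: bool
    tools_heartbeat: bool
    cpu_usage_pct: float


# Hypothetical cut-off: CPU usage above this is treated as "the VM is up".
UTILIZATION_UP_PCT = 1.0


def count_where(vms, condition):
    """Count only the members that satisfy a condition."""
    return sum(1 for vm in vms if condition(vm))


def vm_is_up(vm):
    """IF Tools is running THEN trust the heartbeat ELSE fall back to utilization."""
    if vm.tools_running:
        return vm.tools_heartbeat
    return vm.cpu_usage_pct > UTILIZATION_UP_PCT


vms = [
    VM("app01", True, 4, True, True, 12.0),
    VM("db01", True, 16, False, False, 35.0),
    VM("old01", False, 2, False, False, 0.0),
]
powered_on = count_where(vms, lambda vm: vm.powered_on)  # 2
big_vms = count_where(vms, lambda vm: vm.vcpu > 8)       # 1
up = [vm.name for vm in vms if vm_is_up(vm)]           # ['app01', 'db01']
```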

The where clause cannot point to another object, but it can point to a different metric in the same object. For example, you cannot count the number of VMs in a cluster with CPU Contention metric > SLA of that cluster. The phrase “SLA of that cluster” belongs to the cluster object, not the VM object.

The right operand must also be a number. It cannot be another super metric or a variable.

The where clause cannot be combined with AND, OR, NOT. This means you cannot have “where VM CPU > 4 and VM RAM > 16”. The reason is that the ‘where’ clause is calculated on the vR Ops node where the data is retrieved, while all the other operators (AND, OR, NOT) run on the node where the super metric expression is executed. Those other operators are executed after all the data has been retrieved, and the retrieved data does not contain metric values for each member object, only aggregated values of these objects.
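The explanation above can be illustrated with a toy Python model (not vR Ops code, with made-up VM data). The where filter runs per object, where the raw values live; the expression side only receives the aggregated results, so a per-VM AND across two metrics cannot be rebuilt from them.

```python
# Hypothetical per-VM data, held where the metrics are stored.
vms = [
    {"name": "vm1", "cpu": 8, "ram": 8},
    {"name": "vm2", "cpu": 2, "ram": 32},
]


def count_where(objects, metric, predicate):
    """'where' runs per object on the data side, before anything is aggregated."""
    return sum(1 for o in objects if predicate(o[metric]))


# The expression side only receives these two aggregated numbers.
big_cpu = count_where(vms, "cpu", lambda v: v > 4)   # 1 (vm1)
big_ram = count_where(vms, "ram", lambda v: v > 16)  # 1 (vm2)

# What "where VM CPU > 4 and VM RAM > 16" would mean: a per-VM test.
wanted = sum(1 for o in vms if o["cpu"] > 4 and o["ram"] > 16)  # 0
```

No combination of big_cpu and big_ram (both 1) can recover the correct answer of 0, because the information about which VM matched each condition is gone once the values are aggregated.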

As expected, you will find the new operators in the super metric editor, as shown below.


The following screenshot, courtesy of Brandon Gordon, shows a brief description of the operators:


Example of how to use the where clause

sum(${adaptertype=VMWARE, objecttype=VirtualMachine, attribute=mem|guest_provisioned, depth=5, where = "sys|poweredOn==1" })

Example of how to use IF THEN ELSE

${this, metric=diskspace|used}>1024 ? max(${this, attribute=virtualDisk|commandsAveraged_average} as IOPS) / ${this, metric=diskspace|used} * 1024 : max(IOPS)
Max([${this, metric=mem|host_contentionPct}>${this, metric=Super Metric|sm_4a3bd0c0-c897-4baf-a60e-4bea139e537b} ? 1 : 0, ${this, metric=cpu|capacity_contentionPct}>${this, metric=Super Metric|sm_20ff3c62-0185-47a8-9bdc-a96f3081a2a8} ? 1 : 0])

The [x,y,z] array has actually been available since an earlier release. What’s new is that x, y and z can be independent expressions, and all their results are put into the array. They are no longer limited to just a constant or a metric.
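To make the two ternary examples above easier to read, here is a plain Python transcription of their logic. It is not vR Ops code: the inputs are plain numbers standing in for the metrics, and treating the two referenced super metrics as thresholds is an assumption based on how they are compared. The second function also shows the array point: each element is an independent expression, and Max() is applied to the results.

```python
def disk_example(disk_used, iops_samples):
    """First example: scale max IOPS by used disk space when it exceeds 1024."""
    max_iops = max(iops_samples)  # max(... as IOPS)
    return max_iops / disk_used * 1024 if disk_used > 1024 else max_iops


def contention_flag(mem_contention, mem_threshold, cpu_contention, cpu_threshold):
    """Second example: an array of two independent ternaries, then Max()."""
    flags = [
        1 if mem_contention > mem_threshold else 0,
        1 if cpu_contention > cpu_threshold else 0,
    ]
    return max(flags)  # 1 if either contention exceeds its threshold, else 0
```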

Resource Alias

Resource names are rather long. If you have a lot of resources in the formula, the whole formula can be hard to read. You can now give a resource a name. Here is an example:

Before 6.3:

${adapterkind=VMWARE, resourcekind=HostSystem, attribute=cpu|demand|active_longterm_load,
depth=5, where=">=0"}) + 1)/
(max(${adapterkind=VMWARE, resourcekind=HostSystem, attribute=cpu|demand|active_longterm_load,
depth=5, where=">=0"}) + 1)"

We will name the resource CPUload by adding “as CPUload” to the formula. Once added, we can refer to it by that name elsewhere in the formula, resulting in a shorter formula.

${adapterkind=VMWARE, resourcekind=HostSystem, attribute=cpu|demand|active_longterm_load,
depth=5, where=">=0"} as CPUload) + 1)/
(max(CPUload) + 1)"

Notice that CPUload includes the depth clause and where clause, not just the metric.

Guest OS metrics

Having visibility inside the Guest is critical. I discuss the limitation in sizing VM RAM in this blog. In a nutshell, the hypervisor does not have visibility into how the Guest OS manages its RAM. Some applications, such as JVM and databases, manage their own RAM, and the Guest OS does not have visibility into how the app manages it. This is why RAM sizing is best done at the Guest OS and App levels.

vR Ops 6.3 brings Guest OS metrics. Yes, it is agentless! There is no need to deploy agents on every VM. How does it work then, if there is no network connection to the VM? VMware Tools comes to the rescue! vR Ops talks to vCenter, which in turn talks to ESXi via the management network. The new version of VMware Tools collects these additional counters. ESXi retrieves them and passes them to vCenter.

This feature has actually been available since vSphere 6.0 Update 1. You need a minimum of ESXi 6.0U1 and vCenter 6.0U1, and the VM must be running Tools from ESXi 6.0U1.

Always check the Tools Release Notes for enhancements & bug fixes!

The table shows a variety of VMs with the Guest OS data. I’ve added the Active RAM from hypervisor as a comparison.


Here is the list of metrics. I’m using the internal name as the table above already has the friendly name.

Internal name: Description
guest|mem.free_latest: This is one of the 3 major counters for capacity & performance monitoring. The other 2 counters are Page-in Rate and Commit Ratio.
In Windows, this is the Free Memory counter. This excludes the cached memory. If this number drops too low, Windows is running out of Free RAM. While the right number varies per application and use case, I’d generally keep this number > 500 MB for server VMs and > 100 MB for VDI VMs. I set a lower number for VDI because they add up: if you have 10K users, that’s 1 TB of RAM.
In Linux, read this good article: http://www.chrisjohnston.org/ubuntu/why-on-linux-am-i-seeing-so-much-ram-usage
guest|mem.needed_latest: The amount of memory needed by the Guest OS. Below this amount, the Guest OS may swap.
The formula for Linux is physicalMem - Maximum of (0, (memAvailable - 5% of physicalMem)).
The formula for Windows is memTotal - (coldStandby + free + reservation).
In Linux, memAvailable is an estimate of how much memory is available for starting new applications without swapping. It is calculated from MemFree, SReclaimable, the size of the file LRU lists, and the low watermarks in each zone. The estimate takes into account that the system needs some page cache to function well, and that not all reclaimable slab will be reclaimable, due to items being in use. Reference: https://superuser.com/questions/980820/what-is-the-difference-between-memfree-and-memavailable-in-proc-meminfo

The Standby memory (which can be significant) can be split into 3 parts: FreeAndZero, Cold and Hot. MemNeeded counts the hot part of the buffer cache as being required by the OS.
In Linux, review this: https://www.linuxatemyram.com/
In Tools, the counter is called guest.mem.needed

Example: Say you have 10 GB of RAM, so Physical RAM = 10 GB.

Situation 1: high memory utilization.

MemAvailable = 2 GB.
Tools will calculate MemNeeded as
= 10 GB - Maximum (0, 2 GB - 5% of 10 GB)
= 10 - Maximum (0, 1.5 GB)
= 10 - 1.5 GB
= 8.5 GB
You actually still have 2 GB available, but Tools first subtracts 5% of physical RAM (0.5 GB) from it, so MemNeeded comes out 0.5 GB higher than 10 GB - 2 GB.

Situation 2: low memory utilization.
MemAvailable = 8 GB.
Tools will calculate MemNeeded as
= 10 GB - Maximum (0, 8 GB - 5% of 10 GB)
= 10 - Maximum (0, 7.5 GB)
= 10 - 7.5 GB
= 2.5 GB
Again, Tools adds around 5% (0.5 GB) on top of 10 GB - 8 GB.
guest|page.inRate_latest: The rate per second at which the Guest OS brings memory back from disk into DIMM. In other words, the rate of reads going through the paging/cache system. It includes not just swapfile I/O, but cacheable reads as well (double pages/s). A page that was paged out earlier has to be brought back before it can be used. This creates a performance issue, as the application waits longer; disk is much slower than RAM.
The unit is number of pages, not MB. It's not possible to convert because of the mixed use of Large Pages (2 MB) and regular Pages (4 KB).
A process can have concurrent mixed usage of Large and non-Large pages in Windows. The page size isn’t a system-wide setting that all processes use. The same is likely true for Linux Huge Pages.
Linux:
$ cat /proc/vmstat | grep pgpgin
pgpgin 604222959257
Windows: Win32_PerfFormattedData_PerfOS_Memory::PagesInputPersec
guest|page.outRate_latest: The opposite of the above, and not as important. Just because a block of memory is moved to disk does not mean the application experiences a memory problem. In many cases, the page that was moved out is an idle page. Windows does not page out any Large Pages.
guest|page.size_latest: Size of the page. In Windows, this is 4 KB by default.
This is not the size of pagefile.sys in c:\.
guest|mem.physUsable_latest: Physically Usable Memory.
Based on a sample of 9 VMs (Windows and Linux), this looks like VM Configured RAM - Hardware Used. Since Hardware Used is near 0, this value is near the Configured RAM.
guest|swap.spaceRemaining_latest: The amount of swap space remaining, taking into account the possibility of swapfile growth where possible. A low remaining value will trigger paging. If the system is configured to run without a swapfile, this returns zero.
guest|hugePage.size_latest: Current size of Huge Page.
This should be 2 MB in Windows.
guest|hugePage.total_latest: Total number of Huge Pages.
This is Linux specific.
guest|mem.activeFileCache_latest: Active File Cache Memory. This is the actively in-use subset of the file cache. Unused file cache and non-file backed anonymous buffers (mallocs etc.) are not included.
This seems to be the Cache Bytes counter in Windows.
guest|contextSwapRate_latest: CPU context switch rate per second in Windows/Linux.
For details, see https://msdn.microsoft.com/en-us/library/aa394279(v=vs.85).aspx and

The last metric is a CPU metric. So now you can tell whether poor process performance is due to heavy context switching!
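The MemNeeded formulas and the Free Memory rule of thumb from the table above can be written out as small Python functions. This is a sketch that simply transcribes those formulas; it is not the actual VMware Tools code, and all sizes must be in the same unit (GB here). The two calls reproduce the worked example for a 10 GB VM.

```python
def mem_needed_linux(physical_mem, mem_available, buffer_fraction=0.05):
    """physicalMem - max(0, memAvailable - 5% of physicalMem)."""
    return physical_mem - max(0.0, mem_available - buffer_fraction * physical_mem)


def mem_needed_windows(mem_total, cold_standby, free, reservation):
    """memTotal - (coldStandby + free + reservation)."""
    return mem_total - (cold_standby + free + reservation)


def free_ram_ok(free_mb, is_vdi):
    """Author's rule of thumb: keep Free Memory > 500 MB (server) or > 100 MB (VDI)."""
    return free_mb > (100 if is_vdi else 500)


# The two situations from the worked example (10 GB VM).
print(mem_needed_linux(10, 2))  # 8.5
print(mem_needed_linux(10, 8))  # 2.5
```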

Let’s compare them with the RAM counters from Windows. The list below is from Windows 10 Performance Monitor.


I’m not sure if they are enabled by default. If not, you can enable them in the Policy, as shown below:


This is what it looks like in VM object. Finally! 🙂


Reduction in Metrics

This is one of my favourites, as I do have customers who struggle with the long list of metrics. This should also improve vR Ops scalability. The example below is from an ESXi Host. Quite a number of the capacity metrics are now hidden, as they are not needed by default.


The reduction can be seen in Self Monitoring, which has also improved a lot in 6.3. You can see the drop in the number of metrics in the following chart.

Reduced Metrics

The reduction translates into less resource utilisation (CPU, Disk, RAM). I’ve added CPU as an example. Notice the load is also less spiky.

CPU reduced in percentage

Drill down via Line chart

One popular use case is the ability to automatically plot the values of all the children when you select a parent. There are many examples of this, such as:

  • You select a cluster, and you want to automatically have a line chart of all its ESXi CPU Demand. If you have 8 hosts in that cluster, then you get 8 line charts.
  • You select a data center, and want to automatically have a line chart of the number of VMs in each of its clusters, to get a sense of VM growth across clusters.

See the following screenshot. Can you notice how it’s done?


Hint: it’s done differently than in other widgets.

The way you do this is by knowing the relationships among objects. You choose the metrics you want to display, not the parent. In the following example, I need to show the ESXi CPU contention on all ESXi hosts in a cluster. So I pick the ESXi object, not the cluster object.

You do not have to specify the relationship (parent, child, self, etc.). vRealize Operations automatically figures out the relationship. Unlike other widgets, where you must specify it, the View Widget has that intelligence built in. Nice!

Can you spot a performance issue that happened in the past in the selected cluster below?


The above screenshot shows that one of the ESXi hosts experienced a spike in CPU Contention. It touched 9%, which is a high number, as the value at ESXi level is the average of all its VMs. One of the VMs was likely experiencing a much higher number, as most VMs have low CPU Contention. Most have a low value because your ESXi has enough cores to serve quite a number of VMs.
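The reasoning about the hidden outlier is simple arithmetic. The sketch below assumes, as the text says, that the host value is a plain (unweighted) average over its VMs; the VM count and the other VMs' contention are hypothetical numbers chosen for illustration.

```python
def implied_outlier(host_avg, n_vms, others_avg):
    """Value the one remaining VM must have for the host average to be host_avg.

    Assumes the host value is a simple (unweighted) mean over its VMs.
    """
    return host_avg * n_vms - others_avg * (n_vms - 1)


# Hypothetical: 10 VMs on the host, 9 of them at 1% CPU Contention.
print(implied_outlier(9, 10, 1))  # 81
```

In this hypothetical, a 9% host average hides a single VM at 81%, which is why a seemingly moderate host number deserves a drill-down.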

Properties now accompany metrics

One widget customers use heavily is the Object List widget. It can list any object along with its metrics. In 6.3, you can now list its properties too. This makes it a lot more useful.


Heat Map: Zoom and Grouping

I use heat maps a lot, especially in Configuration and Capacity use cases. They are also useful in a NOC (big screen or projector). They are not so useful for performance, as they can only show the latest value. Since vR Ops collects data every 5 minutes, anything older than 5 minutes cannot be shown.

The other limitation of Heat Map, which is addressed in 6.3, is scalability. When you have lots of objects, it can be difficult to see. 6.3 groups the objects, and allows you to drill down.


I then drilled down into the selected group. It reveals a lot more objects.


Sportier looking

I’m a big fan of UI and UX. While underlying architecture matters, the human experience is what we see every time we deal with the system. There are 3 UI enhancements that I spotted as I compared 6.2.1 widgets with 6.3 widgets.

Scoreboard widget

The Scoreboard widget now provides more visual themes than just 2 themes. This is useful when you have multiple Scoreboard widgets in 1 screen. You can use 1 theme for VM and another theme for Infrastructure objects. They help in differentiating objects easily.


There is a small usability enhancement. When you choose Fixed View, the size controls do not appear, as they are not relevant. Choose Fixed Size and they will appear.


Scoreboard Health widget

Here is what it looks like in 6.2. Notice the font for the object name is not very clear. It does not work well if you need to show it in the NOC (big screen projector). The other problem is that long names are truncated. Some objects, such as Disk Device and NSX port group, have very long names.


Notice the border? Yup, I’m not a big fan either 🙂 Personally, I prefer not to see the border. I use this widget to see a lot of objects, so the border does get in the way.

Here is what it looks like in 6.3. I definitely find this more usable. Thank you UX team!


Forensic widget

I use the forensic widget to quickly see where an object spends 95% of its time. The chart below shows that the ESXi host has barely any CPU stress: 95% of the time, the value is not even 0.002%. Once you get used to this widget, it’s a great complement to other visualisations.
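"Where an object spends 95% of its time" is essentially a 95th percentile. The sketch below assumes that is what the widget's 95% marker reports (its exact calculation is not documented here) and uses the nearest-rank percentile method on hypothetical samples: a host that is almost always idle with a few short bursts still has a tiny 95th percentile.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


# Hypothetical CPU stress samples (%): mostly idle, a few short bursts.
cpu_stress = [0.001] * 96 + [5.0, 6.0, 7.0, 8.0]
print(percentile_nearest_rank(cpu_stress, 95))  # 0.001
```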


As you can see above, in 6.2 the UI is looking a little dated.

This is what it looks like in 6.3. Notice the grid lines make it easier to read. Peak and low values are also shown, so it’s easier to see the minimum and maximum.


GUI Editor for XML interaction

No more manually modifying XML file and figuring out what the metric names are! There is now a wizard that guides you along the way.


Once you select the Adapter Kind, the wizard automatically moves into the Resource Kind. No more typing!


Maintenance Schedule

The maintenance schedule has more flexibility. A few limitations in 6.2 that were addressed in 6.3:

  • You cannot specify the start date. You can only specify the start time.
  • You cannot specify the expiry date on this schedule. Often you want to schedule only for a fixed period, such as a few months or weeks.
  • You cannot specify the number of runs. Sometimes you want to specify that you only need to run this a few times.

As a comparison, here is what the maintenance schedule editor looks like in 6.2:


6.3 addresses the above limitations, as you can see in the following screenshot.


Note: The new Maintenance Scheduler is not backward compatible. All previously created maintenance schedules will no longer be available and should be created again.

New VM properties

VM folder and VM datastore are now available via the View widget. If a VM has more than one datastore, it will show all of them, separated by commas. If you have nested folders, it will show all of them too.


That’s all folks. Hope it helps and keep in touch at LinkedIn.

How to create a super metric in vRealize Operations 6

The steps to create a super metric have changed in version 6. I could not find a write-up on this, even on my favourite sites such as Sunny Dua’s and Lior Kamrat’s blogs, so I’m writing one as my customer asked for it.

If you are not sure what super metrics to create, here are some of my customers’ favourites.

Here is a short video.

If you need it in writing (e.g. for your documentation), here we go:

To begin, click on the Content icon, then choose Super Metrics from the side list. You get something like this below. I have a few super metrics created already in the example screenshot below.


Click on the green Plus icon to create a new one. A dialog box will pop up. It’s pretty easy once you get used to it.

See that big red no 1. Give it a name. I use a naming convention, as you can see in the previous screenshot. The naming convention I use is:

Function – Object – Element Type – Metric – in a Container.

  • Function can be Maximum, Minimum, Average, etc.
  • Object can be VM, ESXi, Cluster, etc
  • Element type is one of these: CPU, RAM, Disk, Network
  • Metric is the metric of the element. For example, a CPU can have Usage, Demand, Contention, etc.
  • “In a Container” means the container I’m applying the super metric at. I normally apply at the Cluster level.
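If you create many super metrics, the naming convention above can be applied mechanically. This is a hypothetical helper, not part of vR Ops: it joins the five parts with single spaces (the exact separator is my assumption) and rejects element types outside the four listed.

```python
ELEMENT_TYPES = {"CPU", "RAM", "Disk", "Network"}


def super_metric_name(function, obj, element, metric, container):
    """Build a name as: Function, Object, Element Type, Metric, in a Container."""
    if element not in ELEMENT_TYPES:
        raise ValueError(f"element type must be one of {sorted(ELEMENT_TYPES)}")
    return f"{function} {obj} {element} {metric} in a {container}"


print(super_metric_name("Maximum", "VM", "CPU", "Contention", "Cluster"))
# Maximum VM CPU Contention in a Cluster
```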

See that big red no 2. That’s where you choose the Object. You can choose any object from any adapter. Yes, your super metric is not limited to just vSphere objects. The list of objects is shown below the rectangle, hence the heading Object Types. Remember, an adapter brings many objects. The vSphere adapter gives you objects like VM, ESXi, and Cluster.

Once you choose an object, its metrics will appear in the big red no 3. It is blank now, as I have not chosen any adapter yet. vRealize Operations calls them Attributes. Don’t get confused by the term: metrics = counters = attributes.


See that big red no 4. That’s where your super metric will appear.

In the screenshot below, I’ve clicked on the Adapter Type drop-down box. You can see from here the universal nature of vRealize Operations as a data analytics tool. My friend Ronald Buder says it best: “It’s like a big data”.


I chose the vSphere Adapter from the list above. The list of objects is then filtered to just what the vSphere adapter brings. From the list, I then clicked on Virtual Machine. vRealize Operations automatically lists all the VM metrics in the Attributes Types area.


From here, we are ready to choose the metric. Since there are many of them, I normally just search. So I typed “contention” and you can see the list is narrowed down.


To bring the metric into the super metric formula, simply double-click on it. It will appear in the formula area. Naturally, you need to specify what transformation you want. You can choose from the list of Functions, as marked with the big red no 1. Most functions are pretty short, so I just type them. In this example, I typed Max() manually.

Here is a tricky part. You are applying your super metric to an object, and that object is normally a parent or child of the objects whose metric you are using. In my case, I’m applying the super metric to a Cluster. A cluster is the parent of ESXi Host, which in turn is the parent of VM, so the VMs are 2 levels below the cluster. So I have to manually change “depth=1” to “depth=2”. This tells vRealize Operations to look 2 levels down from the Cluster, to its VMs.
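The depth setting is easiest to picture as walking down the object tree. The Python model below is a toy, not vR Ops code: it assumes depth counts levels down from the object the super metric is applied to and selects objects exactly that many levels below, so check the vR Ops documentation for the exact semantics. The inventory and the metric key "cpu|contention" are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    name: str
    metrics: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def objects_at_depth(root, depth):
    """Objects exactly `depth` levels below root (depth=0 is root itself)."""
    level = [root]
    for _ in range(depth):
        level = [child for obj in level for child in obj.children]
    return level


def max_at_depth(root, metric, depth):
    """Roughly what Max(${..., metric, depth=N}) means when applied to root."""
    values = [o.metrics[metric] for o in objects_at_depth(root, depth)
              if metric in o.metrics]
    return max(values) if values else None


# Hypothetical inventory: Cluster, then 2 ESXi hosts, then 3 VMs.
cluster = Obj("cluster01", children=[
    Obj("esxi01", {"cpu|contention": 3.0}, [
        Obj("vm-a", {"cpu|contention": 2.0}),
        Obj("vm-b", {"cpu|contention": 7.0}),
    ]),
    Obj("esxi02", {"cpu|contention": 1.5}, [
        Obj("vm-c", {"cpu|contention": 4.0}),
    ]),
])
print(max_at_depth(cluster, "cpu|contention", 1))  # 3.0 (hosts)
print(max_at_depth(cluster, "cpu|contention", 2))  # 7.0 (VMs)
```

With depth=1 the Max only sees the ESXi hosts, which is why the formula must say depth=2 to reach the VMs.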

See the big red no 2. There are 3 icons there.

  • The 1st one is called This. This means you’re applying the formula to the object itself.
  • The 2nd one helps you visualize your formula. Click on it to see it in a more readable format. In my case, it shows the correct formula. vRealize Operations automatically color-codes it. Nice!
  • The 3rd one is my favourite. This button is called the Visualize button. It lets me go back in time, and also verify if I choose the right metrics, because the result will “make sense” when I choose the right one.

See the big red no 3. Since I’m applying my super metric to a Cluster, I’ve chosen the Cluster object. Notice vRealize Operations automatically lists my 2 clusters.


Click on the Visualize button. The bottom area of the dialog box is cleared, making room for a line chart to be displayed.


I clicked on one of the 2 clusters, and the line chart appeared.

Tip: be patient if the time range goes far into the past.

And that’s it! Save the dialog box. It will take you back to the main screen. From here, associate the super metric with the right object.


Is that all? Nope, you need to apply it to the objects. For that, you need to read this article.