Monthly Archives: July 2019

Operationalize Your World 7.5

Thank you for all the feedback. This particular release took a while as I wanted a longer validation period and more testing by customers. This release is based on vRealize Operations 7.5.

You’ll notice that the dashboards are neater. They are also visually consistent, which makes them easier to learn. The idea is that all dashboards, not just vSphere dashboards, can adopt the same design philosophy. After all, they all serve the 7 pillars of operations, which are:

  • Availability Management
  • Performance Management
  • Capacity Management
  • Configuration Management
  • Cost Management
  • Compliance Management
  • Inventory Management

This release also brings back SLA, as you guys kept asking for it. However, I vrealized you’re happy with a single threshold, as it’s better than nothing and easier for you to operationalize. Because your dev/test environment is benchmarked against the same performance standard as your mission-critical environment, its slower performance will show up in the numbers. So expect your cheaper environment to show a lower Performance (%) value.
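To make the single-threshold idea concrete, here is a minimal Python sketch. It assumes, purely for illustration, that Performance (%) is the share of samples meeting one shared threshold. The function name, the CPU Ready metric and the 2.5% limit are hypothetical, not vRealize Operations defaults or its actual formula.

```python
# Hypothetical sketch: Performance (%) as the share of samples that meet ONE
# shared threshold. Metric and threshold are illustrative assumptions only.

def performance_pct(samples, threshold):
    """Return the percentage of samples at or below the threshold (lower is better)."""
    if not samples:
        raise ValueError("need at least one sample")
    good = sum(1 for s in samples if s <= threshold)
    return 100.0 * good / len(samples)

# Hypothetical CPU Ready (%) limit, applied to every environment alike.
CPU_READY_THRESHOLD = 2.5

prod = [0.5, 1.0, 1.2, 0.8]
dev_test = [1.5, 3.0, 4.2, 2.0]
print(performance_pct(prod, CPU_READY_THRESHOLD))      # 100.0
print(performance_pct(dev_test, CPU_READY_THRESHOLD))  # 50.0
```

Because both environments are measured against the same limit, the slower dev/test environment naturally scores lower.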

Let’s tour the dashboards, then talk about importing them. Three dashboards are provided for each object type:

  • Performance
  • Utilization
  • Capacity

The Performance dashboard shows the last 1 day, not just the current data, because you need to see the pattern beyond the latest 5 minutes. The summary table therefore shows the peak (read: worst) value in the last 24 hours, so you only need to check it once a day.
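The "worst in the last 24 hours" idea can be sketched in a few lines of Python. This is a minimal illustration, assuming 5-minute samples stored as (timestamp, value) pairs and a metric where higher is worse (such as contention or latency); the function name is hypothetical.

```python
from datetime import datetime, timedelta

def peak_last_24h(samples, now):
    """Return the worst (max) value in the 24 hours ending at `now`.

    Assumes higher values are worse. Returns None if no sample falls
    inside the window.
    """
    window_start = now - timedelta(hours=24)
    in_window = [v for ts, v in samples if window_start < ts <= now]
    return max(in_window) if in_window else None

now = datetime(2019, 7, 1, 12, 0)
# 288 samples = 24 hours of 5-minute data, all at a calm 1.0
samples = [(now - timedelta(minutes=5 * i), 1.0) for i in range(288)]
samples[100] = (samples[100][0], 9.0)                 # one short spike
samples.append((now - timedelta(hours=30), 50.0))     # older than 24h, ignored
print(peak_last_24h(samples, now))  # 9.0
```

A single 5-minute spike is enough to surface in a once-a-day check, which is the point of showing the peak rather than the latest value.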

VM Dashboards

Utilization is shown separately as it impacts Performance and Capacity differently.

The dashboards answer these questions:

  • Are the VMs performing well? If not, which VMs are affected by what problems (CPU, Disk, RAM, Network)?
  • Is poor VM performance caused by the IaaS not serving it, or by contention within the Guest OS?
  • Are the VMs running high utilization? If yes, which VMs, how high, and what resource (CPU, RAM, Disk, Network)?
  • Is the utilization really high? (Sustained high utilization can strain the shared infrastructure!)
  • Do any VMs need to be right-sized? By how much and for which resource? For disk, we need to look inside each partition, not at the VM level.

You can also go back to any point in time and ask the same questions. This is important because by the time you get a chance to look at a problem, 5 minutes have passed, or the problem may no longer be happening.

The dashboards sport new counters that give you deeper insight:

  • Guest OS CPU Run Queue
  • Guest OS CPU Context Switch
  • Guest OS Disk Queue Length
  • VM CPU Overlap
  • VM CPU Co-Stop
  • Guest OS RAM Needed (for Capacity use case)
  • Guest OS RAM Free (for Performance use case)
  • Guest OS Page in and Page out

The counters are agentless, so you need VMware Tools 10.3.5 or higher and a vSphere version that supports it: vSphere 6.5 U3 or 6.7 U2, as these include the new hostd poller.
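If you want to sanity-check a list of VMs against these prerequisites, a small script helps. This is a minimal sketch, assuming you already have the Tools version string and the host's vSphere release and update level; the function names are hypothetical, and treating later updates of 6.5 or 6.7 as also supported is an assumption.

```python
# Minimum VMware Tools version for the agentless Guest OS counters.
MIN_TOOLS = (10, 3, 5)
# Minimum update level per vSphere release. Accepting later updates of the
# same release is an assumption; other releases are treated as unsupported.
MIN_VSPHERE_UPDATE = {(6, 5): 3, (6, 7): 2}

def parse_version(text):
    """Parse a dotted version such as '10.3.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

def tools_ok(tools_version):
    # Compare numerically so that '10.3.10' ranks above '10.3.5'.
    return parse_version(tools_version) >= MIN_TOOLS

def vsphere_ok(major, minor, update):
    required = MIN_VSPHERE_UPDATE.get((major, minor))
    return required is not None and update >= required

def counters_available(tools_version, major, minor, update):
    return tools_ok(tools_version) and vsphere_ok(major, minor, update)

print(counters_available("10.3.5", 6, 7, 2))   # True
print(counters_available("10.3.2", 6, 7, 2))   # False: Tools too old
print(counters_available("10.3.10", 6, 5, 1))  # False: 6.5 needs U3
```

Parsing into integer tuples matters: a plain string comparison would rank "10.3.10" below "10.3.5".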

VMs Performance: Can you guess why there is no utilization counter here?

Cluster Dashboards

You’ll notice the dashboards are similar to the VM ones:

  • Just like for VMs, the same 3 dashboards are provided: performance, capacity, utilization
  • They sport a consistent look

The questions being answered are also similar:

  • Are the clusters performing well? If not, which clusters are struggling to deliver what services (CPU, Disk, RAM, Network)? Is it because the cluster is running high utilization?
  • Do we have enough capacity? If not, which clusters are running out, and on what component? What’s the Time Remaining? VM Remaining?
  • Do any VMs need to be right-sized? By how much and for which resource? For disk, we need to look inside each partition, not at the VM level.
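Time Remaining and VM Remaining boil down to simple arithmetic once you know usable capacity, current usage and demand. The Python sketch below uses straight-line growth and an average VM size. It is a simplified illustration with hypothetical names and numbers, not the vRealize Operations capacity engine, which uses its own forecasting.

```python
import math

def vms_remaining(usable_capacity, used, avg_vm_demand):
    """How many more average-sized VMs fit in the remaining capacity."""
    if avg_vm_demand <= 0:
        raise ValueError("avg_vm_demand must be positive")
    return max(0, math.floor((usable_capacity - used) / avg_vm_demand))

def days_remaining(usable_capacity, used, growth_per_day):
    """Days until capacity runs out, assuming straight-line growth."""
    if growth_per_day <= 0:
        return math.inf  # no growth: never runs out under this simple model
    return max(0.0, (usable_capacity - used) / growth_per_day)

def tightest_component(components):
    """components: {name: (usable, used, avg_vm_demand)}.

    Returns (name, vms) for the component that runs out first.
    """
    results = {name: vms_remaining(*vals) for name, vals in components.items()}
    name = min(results, key=results.get)
    return name, results[name]

# Hypothetical cluster
print(vms_remaining(2048, 1536, 16))  # 32
print(days_remaining(2048, 1536, 8))  # 64.0
print(tightest_component({"CPU (GHz)": (400, 200, 2),
                          "RAM (GB)": (2048, 1536, 16)}))  # ('RAM (GB)', 32)
```

Taking the minimum across components answers "running out on what component": here RAM limits the cluster long before CPU does.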

ESXi Dashboards

The ESXi dashboards complement the cluster dashboards by providing host-level details. Imbalance can happen in large or stretched clusters.

Datastore & Datastore Clusters

There are 2 sets: 1 for datastores and 1 for datastore clusters. The tables show the worst (peak) value rather than the 99th percentile, because the data is already an average across all the VMs.
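Here is why a percentile on already-averaged data hides too much, as a minimal Python sketch. The three VM latency series and the nearest-rank percentile are hypothetical illustrations, not how vRealize Operations stores or computes datastore metrics.

```python
import math

def average_across_vms(per_vm_series):
    """Average the per-VM values at each sample time (a datastore-level view)."""
    return [sum(values) / len(values) for values in zip(*per_vm_series)]

def percentile_nearest_rank(values, pct):
    """Smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical latency (ms) for 3 VMs over 100 samples
vm_a = [2.0] * 100
vm_b = [2.0] * 100
vm_c = [2.0] * 100
vm_c[50] = 32.0  # one VM suffers a brief spike

datastore = average_across_vms([vm_a, vm_b, vm_c])
print(max(datastore))                          # 12.0
print(percentile_nearest_rank(datastore, 99))  # 2.0
```

Averaging has already diluted the 32 ms spike to 12 ms; taking the 99th percentile on top of that would drop it entirely, while the peak still shows it.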

Multi-Tier Application

I’ve updated this battle-tested dashboard. It now sports more metrics, and they are color coded.

Summary Pages

Operationalize Your World customizes the object summary pages. They cover performance, utilization and capacity. If you add configuration, please share it with me!

Imports

Download the deck to familiarize yourself.

  • Download the file and unzip it only once.
  • Import the super metrics.
  • Go to your Default Policy and enable them. Do not enable them on All Objects, or you will end up with super metrics on every single object type.
  • Import the views. Choose overwrite existing.
  • Import the dashboards. Choose overwrite existing.
  • Import the Summary Pages. Do not change your default pages yet.

Take a coffee break! Let the super metrics be calculated. To check whether they have been calculated, go to the Performance: Clusters dashboard. If it is filled with data, it’s time to change your default Summary Page.

To do it, follow these screenshots

Future Enhancements

VM Availability. This needs more thought, as it should only apply to production VMs.