Tag Archives: vCenter

Who powered off/on what VM and when in vSphere

The above question can be answered easily with Log Insight. I used the two built-in fields that ensure the log entry carries a user name and a VM name. To filter out all non-power activity, I specify the string "power". This matches both power on and power off.

The results included some irrelevant entries. Those were easy to filter out: just click on the entry and filter the text out, as I did below.

The result is a table showing who powered on or powered off which VM and when. You also get the host, cluster, data center, and vCenter. I hid the time stamp; you can easily bring it back by clicking the Columns link. Notice I hid 11 columns, so there is other information Log Insight can show.
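The filter-then-tabulate logic behind the query can be sketched outside Log Insight. Below is a minimal Python sketch; the log line layout and field names (`user=`, `vm=`, `event=`) are invented for illustration and are not the actual vCenter log format.

```python
import re

# Hypothetical vCenter-style log lines; the field layout is invented.
logs = [
    'user=root vm=web01 event="Power On virtual machine"',
    'user=alice vm=db01 event="Power Off virtual machine"',
    'user=root vm=web01 event="Reconfigure virtual machine"',
    'user=bob vm=app02 event="Power On virtual machine"',
]

# Keep only power activity, mirroring the string filter "power"
# used in the Log Insight query: it matches both power on and off.
power_events = [line for line in logs if "power" in line.lower()]

# Extract (user, vm, action) into a simple table.
pattern = re.compile(r'user=(\S+) vm=(\S+) event="(Power \w+)')
table = [pattern.match(line).groups() for line in power_events]

for user, vm, action in table:
    print(f"{user:6} {action:10} {vm}")
```

The single substring filter is why both power-on and power-off events land in the same table; the "Reconfigure" entry is dropped automatically.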


I grouped the above chart by the VM name. You can easily change it; below is how to do it. Notice I've grouped it by user name instead. This is just a lab, so I used root a lot (not good discipline!). I should have used a proper AD account.


For more Log Insight tips, I highly recommend Steven Flander’s blog.

Utilization: When is a peak not a true peak?

This blog post is adapted from my book, titled VMware vRealize Operations Performance and Capacity Management. It is published by Packt Publishing. You can buy it at Amazon or Packt.

Let’s elaborate on peaks. How do you define peak utilization or contention without being overly conservative or aggressive?

There are two dimensions to peaks: you can measure them across time or across members of the group. Let's take a cluster with 8 ESXi hosts as an example:

  1. You measure across members of the group. For each sample period, take the utilization from the host with the highest utilization. In our cluster example, let's say at 1:30 pm, host number 7 has the highest utilization among all hosts, hitting 80%. We then take it that the cluster peak utilization at 1:30 pm is also 80%. You repeat this process for each sample period. You may get different hosts at different times, so you will not know which host provides the peak value, as that varies from time to time. This method results in over-reporting, as it is the peak of a single member. You can technically argue that this is the true peak.
  2. You measure across time. You take the average utilization of the cluster, roll up the sample period to a longer time period, and take the peak of that longer period. For example, the cluster average utilization peaks at 80% at 1:30 pm. You roll up the data for one day; this means the peak utilization for that day is 80%. This is the most common approach. The problem with this approach is that it is actually an average. For the cluster to hit 80% average utilization, some hosts have to hit over 80%. That means you can't rule out the possibility that one host might hit near 100%. The same logic applies to a VM: if a VM with 16 vCPUs hits 80% utilization, some cores probably hit 100%. This method results in under-reporting, as it is an average.

The first approach is useful if you want detailed information: you retain the 5-minute granularity. With the second approach, you lose the granularity and each sample becomes one day (or one month, depending on your timeline), so you do not know what time of day the peak occurred. The first approach will result in a higher average than the second, because in most cases your cluster is not perfectly balanced (identical utilization across hosts). In a tier 1 cluster, where you do not oversubscribe, I'd recommend the first approach as it will capture the host with the highest peak. The first approach can be achieved by using super metrics in vRealize Operations. The second approach requires the View widget with data transformation.
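The difference between the two approaches is easy to see with numbers. A small Python sketch, using invented 5-minute utilization samples for a hypothetical 4-host cluster:

```python
# Hypothetical 5-minute CPU utilization samples for a 4-host cluster
# (rows = sample times, columns = hosts); all values are invented.
samples = [
    [40, 55, 60, 35],   # 1:00 pm
    [50, 70, 80, 45],   # 1:05 pm -- host 3 spikes to 80%
    [45, 60, 65, 40],   # 1:10 pm
]

# Approach 1: across members -- per sample, take the busiest host,
# then report the peak of those per-sample maxima.
peak_across_members = max(max(row) for row in samples)

# Approach 2: across time -- per sample, take the cluster average,
# then report the peak of those averages over the whole period.
peak_across_time = max(sum(row) / len(row) for row in samples)

print(peak_across_members)  # 80: the busiest host's spike is captured
print(peak_across_time)     # 61.25: the same spike is diluted by averaging
```

The gap between the two numbers (80% vs 61.25%) is exactly the over- versus under-reporting trade-off described above: the less balanced the cluster, the wider that gap.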

Does this mean you should always use the first approach? The answer is no. The first approach can be too aggressive when the number of members is high. If your data center has 500 hosts and you use the first approach, your overall data center peak utilization will always be high: all it takes is one host hitting a peak at any given time. The same situation applies to contention. All it takes is one big VM, which tends to have higher contention, to skew the peak contention figure for the cluster.

The first approach fits a use case where automatic load balancing should happen, so you expect an overall balanced distribution. A DRS cluster is a good example.

vCenter audits: who did what and when

As companies virtualize more, vCenter becomes more critical to the business. With a software-defined data center, changes can be made to the data center quite easily. A right click is essentially all it takes; no downtime required. In such a fluid environment, changes made in vCenter need to be tracked, so we know what changes were made and when.

vCenter tracks changes via its Tasks and Events. The problem is that it is hard to query the history. It's not like big data, where we can treat it as a giant database. This is where Log Insight comes in.

The simple query below gets me all the changes made in vCenter. In fact, this spans multiple vCenter servers.

[Note: notice in the query panel, I've deliberately omitted a task called "recompute virtual disk digest". There is a bug that results in excessive log entries.]

All vCenter events and tasks

From the result above, it looks like the main change is "reconfigure VM". Let's click on it to drill down and see who made the changes. In my case, it is root. So let's see which VMs the user root changed.

[Note: I need to figure out why it is root. I thought I did not really use it.]

Who reconfigure VM - root

I drilled down on the above, filtered it to show only root, then grouped the result by VM.
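This drill-down is essentially a filter plus a group-by over the event stream. A minimal Python sketch of the same logic; the event records and their field names are invented for illustration:

```python
from collections import Counter

# Hypothetical vCenter task records; the shape of these dicts is invented.
events = [
    {"task": "Reconfigure virtual machine", "user": "root",  "vm": "web01"},
    {"task": "Reconfigure virtual machine", "user": "root",  "vm": "web01"},
    {"task": "Reconfigure virtual machine", "user": "root",  "vm": "db01"},
    {"task": "Reconfigure virtual machine", "user": "alice", "vm": "app02"},
    {"task": "Power On virtual machine",    "user": "root",  "vm": "web01"},
]

# Filter to "Reconfigure virtual machine" performed by root,
# then group by VM -- the same drill-down done in Log Insight.
by_vm = Counter(
    e["vm"] for e in events
    if e["task"] == "Reconfigure virtual machine" and e["user"] == "root"
)

print(by_vm.most_common())  # [('web01', 2), ('db01', 1)]
```

Each extra drill-down in the UI just adds one more condition to the filter, which is why the query panel below the chart always shows the full set of filters in effect.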

Who reconfigure VM - root - on what VM

If I want to know the time a change was made to a specific VM, I can drill down to that VM, as in the example below. Notice the queries are all shown below the chart, so we always know exactly which filters we used.

Who reconfigure VM - root - on what VM - when - zoom

For more Log Insight tips, I highly recommend Steven Flander’s blog.