How healthy is your vSphere environment?

One common question I get from customers is how to prove that there are no hidden warnings lurking in the log files. As you know, vSphere produces a lot of logs. I previously shared how you can check for performance issues here, and this post complements that one.

Your first stop should be the General Problems dashboard in Log Insight. This dashboard checks the health of your vSphere environment using 8 queries. You want it to pass with flying colors, which here means every widget is blank, like this. A blank dashboard means vSphere has not logged any of the issues these queries look for.

1 good result
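To make the idea of "blank means healthy" concrete, here is a minimal Python sketch. It assumes each dashboard query can be modeled as a named predicate over raw log lines, and that a query passes when it matches nothing. The query names and match strings are hypothetical placeholders, not the 8 queries Log Insight actually ships, and none of this is Log Insight's API.

```python
from typing import Callable, Dict, List

# Hypothetical model: a "query" is a named predicate applied to raw log lines.
# This only mimics the dashboard idea; it is not how Log Insight runs queries.
Query = Callable[[str], bool]


def run_health_check(lines: List[str], queries: Dict[str, Query]) -> Dict[str, int]:
    """Return the number of matching log lines for each named query."""
    return {
        name: sum(1 for line in lines if predicate(line))
        for name, predicate in queries.items()
    }


def is_healthy(results: Dict[str, int]) -> bool:
    """Healthy means every query came back empty, like a blank dashboard."""
    return all(count == 0 for count in results.values())


if __name__ == "__main__":
    queries = {
        # Hypothetical match strings, for illustration only.
        "scsi_connection_loss": lambda line: "lost connectivity" in line,
        "ha_vm_restart": lambda line: "restarted by HA" in line,
    }
    logs = ["vmkernel: normal heartbeat", "vpxd: task completed"]
    results = run_health_check(logs, queries)
    print(results, is_healthy(results))
```

Modeling each query as a separate predicate keeps the per-widget counts visible, so a non-healthy result still tells you which check fired.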

Let’s look at some of the queries that Log Insight runs. The SCSI latency query uses a threshold of 1 second, which is 1,000,000 microseconds. Here is what the query looks like:

good result 1
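The sketch below mimics the threshold logic of the SCSI latency query outside Log Insight. It assumes VMkernel lines of the form "Device naa.… performance has deteriorated. I/O latency increased from average value of X microseconds to Y microseconds." Treat that wording as an assumption and check it against your own vmkernel.log. The threshold defaults to 1,000,000 microseconds (1 second) and can be lowered.

```python
import re
from typing import List, Optional

# Assumed message format, modeled on VMkernel "performance has deteriorated"
# entries; verify it against your own logs before relying on it.
LATENCY_RE = re.compile(
    r"Device (?P<device>\S+) performance has deteriorated\. "
    r"I/O latency increased from average value of (?P<avg>\d+) microseconds "
    r"to (?P<latest>\d+) microseconds"
)

ONE_SECOND_US = 1_000_000  # 1 second expressed in microseconds


def latency_us(line: str) -> Optional[int]:
    """Return the latest latency (microseconds) reported in a line, or None."""
    match = LATENCY_RE.search(line)
    return int(match.group("latest")) if match else None


def slow_lines(lines: List[str], threshold_us: int = ONE_SECOND_US) -> List[str]:
    """Keep lines whose reported latency is at or above the threshold."""
    result = []
    for line in lines:
        value = latency_us(line)
        if value is not None and value >= threshold_us:
            result.append(line)
    return result
```

Passing `threshold_us=20_000`, for example, would flag anything at 20 ms or more; that value is only an illustration, not a recommendation.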

One second is on the high side; you can change it to a lower number. Do note that this value comes from the VMkernel viewpoint and covers a single SCSI operation (1 read or 1 write), so it can be much higher than the average vCenter shows. I’ve seen a 12 ms value in vCenter (from the real-time chart, so it is a 20-second average) show up as 600 ms. For details, see this.
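To see why a single-operation latency can dwarf the chart value, here is a small worked example in Python. The latencies are made up, and it assumes the chart value behaves like a simple mean of per-operation latencies over the window, which is a simplification of how vCenter actually aggregates its counters.

```python
from statistics import mean

# Hypothetical latencies (ms) for the SCSI operations in one 20-second window:
# 99 fast operations plus a single slow one. Illustrative numbers only.
latencies_ms = [6.0] * 99 + [600.0]

window_average = mean(latencies_ms)   # roughly what an averaged chart shows
worst_single_op = max(latencies_ms)   # what a per-operation log entry can show

print(f"average: {window_average:.2f} ms, worst op: {worst_single_op:.0f} ms")
```

Here the average is 11.94 ms even though one operation took 600 ms, which is why a per-operation threshold in the logs catches spikes that the averaged chart smooths away.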

The SCSI latency query is pretty simple, as it looks for a specific item. Here is a much broader health check.

good result 2

The example below checks for any error in vCenter that is not already reported as an alarm.

good result 4
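Conceptually, this query is a filter with an exclusion list: keep error-level events, then drop anything an alarm already covers. The Python sketch below models that idea with a hypothetical `Event` record and a hypothetical set of alarmed event types; it does not reflect how Log Insight or vCenter actually store events or alarms.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Event:
    source: str      # hypothetical field, e.g. "vpxd"
    severity: str    # hypothetical levels: "error", "warning", "info"
    event_type: str  # hypothetical event identifier


def unalarmed_errors(events: Iterable[Event], alarmed_types: Set[str]) -> List[Event]:
    """Return error events whose type is not already covered by an alarm."""
    return [
        e for e in events
        if e.severity == "error" and e.event_type not in alarmed_types
    ]
```

Using a set for the alarmed types keeps each membership check cheap, which matters when the event stream is large.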

The query below checks for cluster imbalance.

good result 5

This query tracks VMs that were rebooted due to HA.

good result 6

OK, all of the above is what you want to see. In reality, your environment may not be 100% healthy. Let’s look at another example, this time with some errors.

bad 1

You can drill down into each widget. Log Insight then presents the Interactive Analysis screen, so called because you can perform analysis interactively on it.

bad 2

The above data gives you the relative distribution. You can drill down further by adding a time dimension. This lets you see whether the problem happens consistently. In the example below, the problems keep happening.

bad 3
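Adding a time dimension amounts to bucketing events by time before counting. This Python sketch buckets hypothetical `(timestamp, problem)` pairs by hour and counts how many distinct hours each problem appears in; a problem present in most buckets is recurring rather than a one-off spike. The hourly bucket size and the tuple format are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def counts_per_hour(events: Iterable[Tuple[datetime, str]]) -> Counter:
    """Count events per (hour, problem) bucket."""
    buckets: Counter = Counter()
    for ts, problem in events:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # truncate to the hour
        buckets[(hour, problem)] += 1
    return buckets


def hours_with_problem(buckets: Counter, problem: str) -> int:
    """Number of distinct hours with at least one occurrence of a problem."""
    return len({hour for (hour, p), n in buckets.items() if p == problem and n > 0})
```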

I can drill down into a specific problem. Let’s choose SCSI device connection loss. Once I narrow it down, I can group the information by device.
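Narrowing to one problem and grouping by device can be sketched the same way. The Python below assumes devices appear as NAA identifiers (for example `naa.600…`) in the message text, and that a hypothetical `problem_text` substring is enough to select the relevant lines. Log Insight does this with its own field extraction, so treat this only as an approximation.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: devices appear as NAA identifiers in the message text.
# Other naming schemes (t10., eui., mpx.) are not handled here.
NAA_RE = re.compile(r"\bnaa\.[0-9a-fA-F]+\b")


def count_by_device(lines: Iterable[str], problem_text: str) -> Counter:
    """Keep lines containing problem_text, then count them per device ID."""
    counts: Counter = Counter()
    for line in lines:
        if problem_text not in line:
            continue  # not the problem we narrowed down to
        match = NAA_RE.search(line)
        counts[match.group(0) if match else "<unknown>"] += 1
    return counts
```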

bad 4

vSphere logs appear to distinguish permanent device loss further. From the above, we can see there are multiple types. I did not know about this, but Log Insight shows it clearly, so I can probe further.

bad 5

We could go on with more examples, but I hope this has given you the idea that Log Insight is a good companion for your VMware logs. Since it is free for 25 sources per vCenter, give it a spin!
