Monthly Archives: March 2015

What are the ESXi Hosts that a VM has ever run on?

“Where has this VM been running in the past xxx months?” You may need to answer that question to satisfy an Oracle audit, by proving that the VM has never run on a host you have not licensed. The proof has to come from the vSphere logs, which can be difficult because vSphere produces a lot of logs.

Luckily, Log Insight can help. The following screenshot shows the query. It gets all the log entries that have a VM name, a host name and a cluster name. If you do not use clusters, just exclude the cluster name. I grouped the result by host name. The bar chart shows 3 ESXi hosts (all I had in my cluster), which became 2 when I put one host into maintenance mode to save power.

The table below the chart shows all the VM names. In my example, it happens to show an Oracle VM.


If this is the Oracle VM that is the subject of the audit, I can zoom into it. I simply used the VM name as a filter, and got the result below.

Bingo! All the ESXi hosts that have ever run the VM are shown.

11 - zoomed into a VM

The chart above may be a little too detailed if you need to go back far in time. You can change the level of granularity to daily, as shown below.

12 - day view and event type
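The same idea can be sketched outside Log Insight. The snippet below is a minimal illustration, assuming the log entries have already been parsed into (timestamp, VM name, host name) tuples; the VM names, host names and timestamps are made up for the example, and this is not how Log Insight itself runs the query.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical, already-parsed events: (timestamp, VM name, host name).
events = [
    ("2015-03-01T09:15:00", "oracle-db-01", "esxi-01.lab.local"),
    ("2015-03-01T17:40:00", "oracle-db-01", "esxi-02.lab.local"),
    ("2015-03-02T08:05:00", "oracle-db-01", "esxi-02.lab.local"),
    ("2015-03-02T11:30:00", "web-01", "esxi-03.lab.local"),
]


def hosts_by_day(events, vm_name):
    """Map each day to the set of hosts that logged an event for vm_name."""
    result = defaultdict(set)
    for timestamp, vm, host in events:
        if vm == vm_name:  # same role as filtering on the VM name
            day = datetime.fromisoformat(timestamp).date()
            result[day].add(host)  # same role as grouping by host, per day
    return dict(result)


per_day = hosts_by_day(events, "oracle-db-01")
all_hosts = set().union(*per_day.values())

for day in sorted(per_day):
    print(day, sorted(per_day[day]))
print("every host seen:", sorted(all_hosts))
```

The union of the daily sets is the answer an auditor would care about: every host the VM has been seen on in the period covered by the logs.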

For a video example, watch this by the VMware Spain team.

Log Insight 4.0 includes this dashboard as one of its out-of-the-box dashboards.

So that’s it! I hope it helps you prove your case in an Oracle audit.

vSphere Storage Latency – View from the vmkernel

The storage latency data that vCenter provides is a 20-second average. With vRealize Operations, and other management tools that keep the data much longer, it is a 5-minute average. 20 seconds may seem short, but if the storage is doing 10K IOPS, that is an average of 10,000 x 20 = 200,000 reads or writes. An average of 200,000 numbers will certainly hide the peak. If you have 100 reads or writes that experienced bad latency, but the remaining 199,900 operations returned fast, those 100 poor operations will be hidden.
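To make the arithmetic concrete, here is a small sketch. The operation counts come from the paragraph above; the two latency values (1 ms for a fast operation, 500 ms for a slow one) are assumptions picked purely for illustration.

```python
# Counts from the text: 10,000 IOPS sampled over a 20-second interval.
iops = 10_000
interval_s = 20
total_ops = iops * interval_s      # 200,000 operations in one sample

slow_ops = 100
fast_ops = total_ops - slow_ops    # 199,900 operations

# Assumed latencies, for illustration only.
fast_latency_ms = 1.0
slow_latency_ms = 500.0

average_ms = (fast_ops * fast_latency_ms + slow_ops * slow_latency_ms) / total_ops
peak_ms = max(fast_latency_ms, slow_latency_ms)

print(f"operations in sample: {total_ops:,}")
print(f"average latency: {average_ms:.2f} ms")
print(f"peak latency:    {peak_ms:.0f} ms")
```

With these assumed numbers the 20-second average stays at about 1.25 ms, even though 100 operations took 500 ms each.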

With esxtop, you can get down to 2 seconds. That’s great, but you need to do it per ESXi host. If you have a large farm, that means logging into every ESXi host. It also has an impact on performance, so you do not want to run it all the time, 24 x 7 x 365. There is also the issue of presenting all that data on 1 screen.

The ESXi vmkernel actually logs when the storage latency gets worse or improves. It does this out of the box, so there is nothing we need to enable. This log entry is not an average. No, it does not log every single IO the vmkernel issues; that would generate way too many logs. An entry is only written when the situation improves or deteriorates, which is what you want anyway. As a bonus, the data is in microseconds, so it’s also more granular if you need something more accurate than ms.

This is where vRealize Log Insight comes in. With just a few clicks, I got the screenshot below. Notice that Log Insight already has a variable (field) for VMware ESXi SCSI Latency. So all I needed to do was get all the log entries where this field exists. From there, it’s a matter of presentation. In the following screenshot, I grouped them by the Device ID (LUN). I only have 1 device that has had this issue in the past, so apologies for the poor example.

The line chart is an average. You can take the Max if you want to see the worst. I’ve zoomed the chart to a narrow window of just 2:15 minutes (from 18:06:00 to 18:08:15 hours), and each data point represents 1 second. So that is 1-second granularity, 20x finer than the 20-second average you get from vCenter.

Storage latency

What does it look like when you have multiple LUNs and a storage issue? Here is an example. I have grouped them by Device, as it’s easier to discuss with the storage admin if you can tell them the Device ID.

Max Storage Latency by LUN
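The difference between charting the average and the Max per LUN can be sketched as follows. This is only an illustration, assuming the latency values have already been extracted as (second, device ID, latency in microseconds) tuples; the device IDs and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical samples: (epoch second, device ID, latency in microseconds).
samples = [
    (100, "naa.aaa", 2_000),
    (100, "naa.aaa", 30_000),
    (100, "naa.bbb", 1_500),
    (101, "naa.aaa", 4_000),
]


def summarise(samples):
    """Group samples by (device, second) and compute the average and the max."""
    buckets = defaultdict(list)
    for second, device, latency_us in samples:
        buckets[(device, second)].append(latency_us)
    return {
        key: {"avg": sum(values) / len(values), "max": max(values)}
        for key, values in buckets.items()
    }


summary = summarise(samples)
for (device, second), stats in sorted(summary.items()):
    print(device, second, stats)
```

For the first device and the first second, the average is 16,000 microseconds while the Max is 30,000, which is why the Max view is the one to use when you are hunting for the worst case.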

Here is another example. This time the latency went up beyond 20,000 ms!

Max Storage Latency by LUN 2

Besides the line chart, you can also create a table. Log Insight automatically shows the fields above. All I did was hide some (14 to be exact, as shown next to the Columns). I find the table useful to complement the chart.

You may be curious about what a field is. Log Insight comes with many out-of-the-box fields, which are provided by the content packs. To see how a default field is defined, just duplicate it like I did below. Log Insight automatically highlights in green how it determines the field. In the example below, it parses the strings “microseconds to” and “microseconds”, and the value between them is extracted as the field.

You can also set the type of the field. In this case, the field is specified as Integer.

Storage latency - field
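The extraction described above can be approximated with a regular expression. The sketch below is not the content pack's actual field definition; the sample lines are assumptions modelled on the vmkernel "performance has deteriorated/improved" messages, with made-up device IDs and values.

```python
import re

# Assumed sample lines; device IDs and latency values are made up.
SAMPLE_LINES = [
    "Device naa.600a0b80001111 performance has deteriorated. "
    "I/O latency increased from average value of 1343 microseconds to 28045 microseconds.",
    "Device naa.600a0b80001111 performance has improved. "
    "I/O latency reduced from 28045 microseconds to 5512 microseconds.",
    "NMP: unrelated message that mentions latency but carries no value",
]

# Capture the digits between "microseconds to" and "microseconds".
LATENCY_RE = re.compile(r"microseconds to (\d+) microseconds")
DEVICE_RE = re.compile(r"Device (\S+) performance has")


def extract_latency(line):
    """Return (device, latency_us), or None when the line does not match."""
    latency = LATENCY_RE.search(line)
    device = DEVICE_RE.search(line)
    if latency is None or device is None:
        return None
    # Converting to int mirrors setting the field type to Integer.
    return device.group(1), int(latency.group(1))


parsed = [extract_latency(line) for line in SAMPLE_LINES]
for line, result in zip(SAMPLE_LINES, parsed):
    print(result, "<-", line[:40])
```

The third sample line contains the word latency but does not match, which is the same reason a plain text search for “latency” returns entries that are not relevant.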

You may think… why not search for the string “latency”, so we get everything? Well… this is what I got. There are many log entries containing the word latency that are not relevant. I have 169 entries below.

Storage latency - text latency

The following screenshot shows more examples of the log entries.

Storage latency - text latency 2

That’s all. Hope you find it useful. If you do, you may find this and this useful.

Blue Medora NetApp Management Pack: A deeper dive

The good folks at Blue Medora helped me upgrade the Management Pack to their latest release, which is 5.1. This release has both enhancements (2 new dashboards, as you can see below: IOPS and Historical Debugging) and fixes. It also has UI enhancements; for example, numbers are rounded to make them easier to read. The entire list of dashboards is shown below.


I will just cover some of them, as they are actually quite straightforward. The product will help the NetApp Storage Admin and the VMware Sys Admin sit down together and jointly customise the dashboards to their specific needs. It has a lot of data, so it increases visibility and transparency.

Below is the Cluster Systems Overview dashboard. It lists the systems being monitored. The Identifier-3 column is actually the DFM Server name. Clicking on any of the systems automatically shows its information on the right and below it. You can then choose a metric and plot it (apologies, it is not visible as it’s right at the bottom).


The NetApp Heatmaps dashboard drills down further. As shown below, you can see the heatmaps of Systems, Aggregates, Volumes, LUNs, etc. For each heatmap, you can choose a different Configuration from the drop-down. You can also add, remove or change configurations. The data on the right shows a bit more detail. It can be difficult to see which one is which. This is a limitation in vRealize Ops. As a workaround, you can mouse over.

00 heatmap

The Top-N NetApp Disk dashboard shows the physical spindles. It picks up all disks: data disks, parity disks and spare disks. From a VMware Admin’s viewpoint, being able to see the performance at spindle level is certainly reassuring.

You can easily customise the dashboard. I’ve shown some examples below, where I changed the latency widget to show the Top 15 instead of the Top 5. You can also change the time period. I normally like to see current data in Top-N. Looking at the past 30 days is not a good idea, as the value is an average.

My dashboard below looks different from yours, as I have also reordered the widgets. As I said in my book, do not be afraid to customise any dashboard. It does not change the way the product works. You will not “damage” it in any way.


How do I customise a widget? Simply click on the small edit icon (pencil). It brings up a dialog box. The one below is for the Top-N widget; other widgets have different dialog boxes. Any change you make here is safe. So go ahead and tailor the dashboards and widgets to your own needs.


Back to the Blue Medora management pack. You can see from the window below that they have added a lot of objects. For each object, they’ve added a lot of metrics. Great visibility into the NetApp array!


Version 5.1 of the Management Pack adds this IOPS dashboard. I like this one because, at a glance, I can easily see the IOPS at Volume level, Datastore level, and VM level. So I can see whether an issue is happening at the VM level, or whether it’s a more widespread issue.

The dashboard is interactive. You first select a volume, and it automatically lists the datastores on that volume. You select a datastore, and it automatically lists its VMs. The IOPS data is shown on the right automatically. Nice!

IOPS

All in all, I think it’s a product that gives both deep and complete visibility into the NetApp array. This makes collaboration between VMware Admin and NetApp Admin much easier, as there is a rich set of data supporting them!