Tag Archives: VMware vSphere

Is vSphere performing well?

In general, you know that you’ve done a good job with your vSphere IaaS because the VM Owners are happy with the performance of their VMs. Business is powered by the VMware infrastructure that you design and operate.

But….

What do the vSphere logs say? Is there anything lurking in the log files?

As VMware professionals, we know vSphere well and probably have years of experience with it. We can architect, design, implement, upgrade, and even troubleshoot it.

The same cannot be said about the logs. Generally speaking, deep knowledge of vSphere logs belongs to VMware GSS engineers, as they read logs on a daily basis while performing all kinds of troubleshooting. That knowledge has slowly been codified into Log Insight. I’m not sure which engineers are doing this great job, but Steven Flanders will be a good starting point. If you have any feedback on the Content Pack, drop him a note.

BTW, if you are new to Log Insight, there are many bloggers who have written good articles on Log Insight. Examples are VMware Arena (vSphere), VMguru (NSX integration), Cody (variety of articles). BlueShift has also written a good overview here.

Back to the question: what do the vSphere logs say about its health?

This is where the vSphere content pack comes in. As you can see from the following screenshot, it comes with many out-of-the-box dashboards, queries, alerts and fields. I’d encourage you to review their definitions, and not just look at the dashboard content.

12

Going through the above can be daunting, as it is indeed a lot. Yes, vSphere has a lot of logs, and Log Insight merely reflects that breadth of information. One approach that works for me is Search. I search for the particular thing I am after. For example, in the following screen, I searched for vMotion. The result told me what information I can get about vMotion.

13

You might be wondering how to use the variables, or Fields as Log Insight calls them. Well, you don’t have to do anything, as they automatically appear in the drop-down list. All you need to do is type the name. I type vmw_ in the following screenshot, as all the VMware-specific content packs follow this naming convention. Notice that each variable name has the Content Pack name in brackets. This lets you know which content pack is contributing that field. Nice!

14

You will eventually create your own fields. Please have a naming convention. I typically use the company name as a prefix.

Now that you understand the basics, let’s review some of the dashboards.

The first one answers these questions: What vSphere alarms are we getting? How often do they happen? When do they happen? Is there a pattern? Which hosts, clusters, etc. get them?

23

I plotted the above for just 1 day. We can tell which alarms hit my small environment, and when. There was a spike at 10 pm.

Do you notice how easy it was to produce the dashboard? All it took was 2 built-in variables. Log Insight has created the vc_event_type and vmw_vc_alarm_type fields. The dashboard was built by simply filtering on both fields with the “exists” operator. So long as there is a value, the field counts as existing.

The above chart was grouped by type of alarm. What if you are looking at your customers, and want the chart grouped by VM?

This is where another built-in field comes in. The field vm_vm_name identifies the VM name.
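If it helps to see the logic outside the UI, here is a minimal Python sketch of what the “exists” filter plus a group-by amounts to. It does not call Log Insight at all. The field names are the ones mentioned above, while the sample events and the count_by helper are made up purely for illustration.

```python
from collections import Counter

# The two fields the alarm chart filters on with "exists".
ALARM_FIELDS = ("vc_event_type", "vmw_vc_alarm_type")

def count_by(events, required_fields, group_field):
    """Count events where every required field has a value, grouped by group_field."""
    fields = list(required_fields) + [group_field]
    counts = Counter()
    for event in events:
        # "exists" simply means the field is present and has a value.
        if all(event.get(name) for name in fields):
            counts[event[group_field]] += 1
    return counts

# Hypothetical events; real events carry many more fields.
events = [
    {"vc_event_type": "AlarmStatusChangedEvent", "vmw_vc_alarm_type": "Host memory usage"},
    {"vc_event_type": "AlarmStatusChangedEvent", "vmw_vc_alarm_type": "Virtual machine CPU usage",
     "vm_vm_name": "web-01"},
    {"vc_event_type": "AlarmStatusChangedEvent", "vmw_vc_alarm_type": "Virtual machine CPU usage",
     "vm_vm_name": "web-01"},
    {"vc_event_type": "VmPoweredOnEvent", "vm_vm_name": "db-01"},  # not an alarm, so filtered out
    {"vc_event_type": "AlarmStatusChangedEvent", "vmw_vc_alarm_type": "Host memory usage"},
]

by_alarm = count_by(events, ALARM_FIELDS, "vmw_vc_alarm_type")
by_vm = count_by(events, ALARM_FIELDS, "vm_vm_name")

print(by_alarm)
print(by_vm)
```

Swapping the group field is the only difference between the "by alarm type" and "by VM" views, which is why the change is so quick to make in the UI.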

21

The preceding 2 screenshots are individual charts, taken in the Interactive Analytics mode of Log Insight. Notice that at the top of the screen there are Dashboards and Interactive Analytics. Dashboards is where you will go first in your day-to-day operations.

You will eventually build your own dashboard, picking what you like from the out-of-the-box dashboards. The good thing about your own dashboards is that they won’t be superseded during a Log Insight upgrade.

The dashboard below shows the vSphere alarms. From here, you can drill into any of the widgets. I normally use an existing dashboard as a starting point, drill down into one of the widgets, customise it, and convert the chart into a widget in my own custom dashboard.

00

Let’s move on to Storage, as that’s one area VMware Admins have to deal with. Regardless of storage platform, ESXi is the one executing the IO on the VM’s behalf. You can track errors by device, hostname, and path. You can also track the latency as seen by the VMkernel. Knowing the latency at the hypervisor level complements the info you have at the VM level and Guest OS level.

SCSI not all device

I put the word “Regardless” in red, because it is not actually regardless. There is a situation where ESXi cannot provide you with the above info. Can you figure out which situation that is?

Yup, it is the distributed storage architecture. Specifically, when the physical disk is passed through directly to the Storage VM as a PCI device (not RDM). In this case, the hypervisor does not see the IO. Only a monitoring tool that knows how that specific product works can monitor it. vCenter and vRealize Operations cannot help you. My point here is that you need to know how your things work first, before you can monitor them properly.

You might notice that the latency widget on the preceding dashboard shows no results. That’s because the filter was set at 1 second. That’s 1000 ms, or 1000000 in the microseconds the field uses. You can easily change it. In the dashboard below, I’ve changed the filter on the field vmw_esxi_scsi_latency from >1000000 to just exists. I also plot the Minimum and Maximum, so I can see the variation.

22

If I’m interested in just a specific device, I can filter on it. I can also change the chart type, and display only the Maximum so it’s easier to see.

222
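To make the units concrete, here is a small Python sketch of the filter change. It assumes the latency field is in microseconds, which is what a threshold of 1000000 for 1 second implies. The sample values are invented, not taken from my lab.

```python
US_PER_MS = 1_000
DEFAULT_THRESHOLD_US = 1_000_000  # the ">1000000" filter, i.e. 1 second

def to_ms(latency_us):
    """Convert a latency in microseconds to milliseconds."""
    return latency_us / US_PER_MS

# Hypothetical latency values in microseconds.
samples_us = [850, 12_400, 310_000, 1_250_000]

# Original filter: only events slower than 1 second are kept.
above_default = [v for v in samples_us if v > DEFAULT_THRESHOLD_US]

# "exists" filter: every event that carries a value is kept.
with_exists = [v for v in samples_us if v is not None]

print("threshold in ms:", to_ms(DEFAULT_THRESHOLD_US))
print("kept by >1000000:", above_default)
print("min/max in ms with exists:", to_ms(min(with_exists)), to_ms(max(with_exists)))
```

With a 1 second threshold, a healthy environment will rarely match anything, so an empty widget is actually good news.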

Ok, that was Storage.

Let’s move to DRS and HA. The dashboard below gives an overall picture of DRS and HA events. I can see that during the time period I specified, all my clusters were balanced. However, I have quite a number of VM heartbeat issues. [Yes, the product team is aware that the chart legend says red but it’s showing blue]

02

What about the speed of vMotion? How much bandwidth does it take? Did it utilize my 10Gb Ethernet well?

01 vMotion

From the above, my precopy stun time is on the high side. In a healthy environment, this number would be <300 ms. The bandwidth would also be much higher than that. This is a lab, and I know the physical connectivity.

I think you’ve got the idea of how Log Insight helps you. Let’s do just 1 example at the VM level. This dashboard answers a specific question: Did any VM hit high CPU Usage? If yes, which VM and when?

31

We covered quite a fair number of things. They are all familiar to you: Storage, Cluster, VM, etc.

Now…. your CIO may ask: what about something we don’t know? Are there any errors, warnings, timeouts, aborts, etc. that we need to know about?

Below is the built-in query that gives you that. It’s a pretty complex query 🙂

11

An example of the result is below. I have grouped the entries by ESXi host.

10

Hope you find it useful. Deploy it (free for 25 OSI), do the health check, and let me know what you find! You might be surprised, and have to cancel that vacation 🙂 🙂

How to redirect vSphere Platform Services Controller syslog

I use vRealize Log Insight as my log management platform, so I’d like all components of the SDDC to send their syslog to it. I am using an external Platform Services Controller (PSC), and I did not see Log Insight receiving log entries from the hostname (or source) of my PSC.

It turns out that you have to configure it. It is not automatically configured when you add a vCenter in Log Insight.

Log in to your vCenter with an account that has administrator privileges. I use Administrator@vSphere.local, as I have not given my vSphere admin account full privileges.

From there, go to Administration. Under the Deployment group, choose System Configuration. It will take you to the following screen. From there, click on Nodes. Your list of vCenter and PSC will be shown.

0

Double-click on the PSC that you want to configure. Click on Services, and you will see the Syslog Service. You can see that the Health is good and it’s running.

00

Click on the VMware Syslog Service. It will show you that it is not yet configured.

1

Simply configure it. Here is mine as an example.

2

It says a restart is required, so I restarted mine. Wait a few minutes, and the entries start showing up! In my case, the hostname is core-platform-sc-1, so all I need to do is filter Log Insight entries to show only those from this hostname.

3

If you have multiple PSCs, you need to repeat this for each one.
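If the entries do not show up after the restart, it can help to first confirm that your Log Insight server accepts syslog at all. Below is a minimal Python sketch that sends one test message from another machine, using the standard library's SysLogHandler. It assumes UDP on port 514 (the standard syslog port) is what you configured. The hostname in the usage comment is hypothetical, and this only tests the receiving side, not the PSC itself.

```python
import logging
import logging.handlers

def send_test_message(host, port=514, text="syslog forwarding test"):
    """Send one UDP syslog message to host:port and return the text that was sent."""
    # SysLogHandler defaults to UDP when given a (host, port) address.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    logger = logging.getLogger("psc-syslog-test")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    try:
        logger.info(text)
    finally:
        # Clean up so repeated calls do not send duplicates.
        logger.removeHandler(handler)
        handler.close()
    return text

# Usage (hypothetical hostname, replace with your own Log Insight server):
# send_test_message("loginsight.example.com")
```

After sending, search Log Insight for the test text. If it appears, the receiving side is fine and the PSC configuration is the place to look.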

Who snapshot what VM and when

I got a request from my customer to track the VM snapshot operations. They need to track creation and deletion. Basically, who snapshot what VM and when. So I tried in the lab. I simply created a snapshot. I waited for a few seconds, then proceeded to delete it. You can see the activity in the vSphere Web Client below.

Notice the snapshot name is not shown in the vCenter task list. In a production environment, you should have a meaningful snapshot name. If you have a naming pattern, you can actually build a Log Insight query based on it. Let’s see if Log Insight captures the name of the snapshot!

Who snapshot what VM and when: VM snapshot

Where do they show up? Well, the awesome folks at Log Insight have created an out-of-the-box dashboard for you. Just go to the “Virtual Machine – Snapshots” dashboard like I did below. Notice Log Insight has categorised the 2 events nicely.

VM snapshot - 1

You can drill down to the Interactive Analytics. Here is what they look like. In this example, I’ve modified the chart so it’s simpler for me.

VM snapshot - 2

If you want to know the actual query, here is what it looks like. Yup, just 2 variables are all you need. In the example below, I’ve also extended the timeline to the past 7 days, as I got curious whether anyone else had taken any snapshots. Good to know no one did.

VM snapshot - create 001

Now… can you guess the snapshot name? It’s in the log above. Hint: I was singing an old song by The Beatles. Ok, it wasn’t technically singing, it was a bad attempt at singing 🙂

Wait a minute! We have not shown the User who made the changes. To do that, you need to use the vc_username field, and add the word Snapshot in the text field. To make it easier to see, use the Field Table. I’ve provided an example below.

VM snapshot - 5
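For those who think in code, the filter above boils down to something like this Python sketch: keep the events where vc_username exists and the text contains the word Snapshot. The sample events and their text are invented for illustration and are not real vCenter log lines. Matching case-insensitively is also my assumption.

```python
def snapshot_activity(events):
    """Return (timestamp, user, text) rows for events that mention a snapshot."""
    rows = []
    for event in events:
        user = event.get("vc_username")
        text = event.get("text", "")
        # Keep the event only if a user is recorded and the text mentions "snapshot".
        if user and "snapshot" in text.lower():
            rows.append((event["timestamp"], user, text))
    return rows

# Hypothetical events, written by hand for this example.
events = [
    {"timestamp": "10:01:05", "vc_username": "alice", "text": "Create virtual machine snapshot on vm-a"},
    {"timestamp": "10:01:40", "vc_username": "alice", "text": "Remove snapshot on vm-a"},
    {"timestamp": "10:02:10", "vc_username": "bob", "text": "Power on vm-b"},
    {"timestamp": "10:03:00", "text": "Snapshot consolidation check"},  # no user recorded
]

rows = snapshot_activity(events)
for row in rows:
    print(row)
```

The Field Table in Log Insight gives you the same kind of rows, with the user, the VM and the time side by side.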

There you go. Now you know who snapshot what VM and when. Have fun combing the logs with Log Insight. Easier than grep right 😉 (just kidding!)