VMware vRealize Log Insight provides an out-of-the-box content pack for Horizon View. I find the content pack useful for both monitoring and troubleshooting. Naturally, there are cases where you need to build your own custom fields and dashboards. In this blog, I share one such example.
The example I share is a mass disconnect of users. This means many users were suddenly disconnected at the same time, causing disruption at work and complaints to IT.
Mass disconnect can certainly happen, and Horizon View Event Database records that. You will see the string “has disconnected from machine” in the View Event DB. Here is one such example. In this case, I’ve filtered to a particular user.
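If you export these events and want to pull out the user and machine yourself, a minimal Python sketch could look like the one below. The message shape ("User <name> has disconnected from machine <machine>") is my assumption based on the string above, and `parse_disconnect` is a hypothetical helper name, so adjust the pattern to whatever your event DB actually records.

```python
import re

# Assumed message shape; the exact wording in your environment may differ.
DISCONNECT_RE = re.compile(
    r"User (?P<user>\S+) has disconnected from machine (?P<machine>\S+)"
)

def parse_disconnect(message):
    """Return (user, machine) for a disconnect message, or None otherwise."""
    match = DISCONNECT_RE.search(message)
    if match is None:
        return None
    # Drop a trailing full stop in case the message ends the sentence with one.
    return match.group("user"), match.group("machine").rstrip(".")
```

This is roughly what an extracted field does in Log Insight: it turns a free-text message into values you can filter and group by.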
The first challenge is that the Horizon View Event DB does not distinguish between an abnormal disconnect and a normal one. You will see many such strings in the event DB. This is difficult to analyse in a large environment, as you cannot visualise the events.
This is where Log Insight helps. See the example below.
I queried the same string. With Log Insight, I can query all the View Connection Servers, not just one at a time. That’s another benefit of Log Insight.
From the above, you can see clearly that the string appears many times. I plotted 5 days of data, and we can see the pattern matches the working hours. So how do we spot an abnormal disconnect when the log does not distinguish it?
A mass disconnect hits many users at once. Within 1 minute, you will see many users affected. Log Insight enables us to zoom in. As you can see below, I zoomed in to 5-second granularity, and we can see there is a mass disconnect event within that 60-second window.
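The same idea can be sketched outside Log Insight. The Python below buckets disconnect events into fixed windows and flags any window in which many distinct users were disconnected. It is only an illustration, not how Log Insight works internally: the function name, the `(timestamp, user)` input shape, and the threshold of 10 users are my own assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone

def find_mass_disconnects(events, window_seconds=60, min_users=10):
    """Flag time windows in which many distinct users disconnected.

    events: iterable of (timezone-aware datetime, user) pairs.
    Returns a list of (window_start, distinct_user_count), oldest first.
    """
    users_per_window = defaultdict(set)
    for timestamp, user in events:
        epoch = int(timestamp.timestamp())
        # Round down to the start of the window this event falls into.
        window_start = epoch - (epoch % window_seconds)
        users_per_window[window_start].add(user)
    return [
        (datetime.fromtimestamp(start, tz=timezone.utc), len(users))
        for start, users in sorted(users_per_window.items())
        if len(users) >= min_users
    ]
```

Because the windows are fixed, a burst that straddles a window boundary is split across two buckets, so pick a threshold comfortably below the number of users you expect a real incident to hit.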
I masked out the user names. Yes, you can also show each user’s Microsoft AD ID in the table. I also masked the ESXi hosts. Yes, that means you can group the result by ESXi host. An example of such a chart is shown below. We can also show them by cluster.
We can also present the chart differently. In this chart, I group by ESXi host, as I want to know quickly how many users were hit on each host. From here I can tell the impact was quite well spread.
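For readers who prefer to see the grouping spelled out, here is a small Python sketch of the same count. It assumes you already have `(user, host)` pairs for the events in the mass disconnect window; the function name and input shape are hypothetical.

```python
from collections import defaultdict

def users_hit_per_host(events):
    """Count distinct users per ESXi host, largest count first.

    events: iterable of (user, host) pairs from the disconnect window.
    """
    users_by_host = defaultdict(set)
    for user, host in events:
        # A set avoids double counting a user who disconnected more than once.
        users_by_host[host].add(user)
    counts = ((host, len(users)) for host, users in users_by_host.items())
    # Sort by count descending, then host name, so ties come out in a stable order.
    return dict(sorted(counts, key=lambda item: (-item[1], item[0])))
```

If one host dominates the result, the problem is likely local to that host. An even spread, as in my case, points to something shared such as storage or network.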
Once I know the users, I can create a custom group in vRealize Operations. This has to be done manually. It’s a one-time effort, so it’s okay with me.
Once the custom group is created, I can run analysis on it. For example, I can check whether the disconnects were caused by disk. As you can see below, the disk latency rose to 543 ms at the time of the disconnect. It’s a one-time rise, and the time matches the mass disconnect time.
In vRealize Operations, we can zoom in to the specific time. Here, it’s clear that it’s a one-time spike.
Hope you find the example useful.