Monthly Archives: December 2015

NetFlow Logic Integrator for vRealize Operations Insight

NetFlow Logic extends the capability of both vRealize Operations and vRealize Log Insight. It is an analytics engine for network flow data (NetFlow, IPFIX, sFlow, etc.). It comes with a Log Insight Content Pack and a vRealize Operations Management Pack.

The installation has a few stages.

  1. Install and configure NetFlow Integrator.
  2. Configure vSphere, NSX and the physical switches to send NetFlow, sFlow, IPFIX, etc.
  3. Install the Log Insight Content Pack.
  4. Install the vRealize Operations Management Pack.

The manual is pretty straightforward, so I will only add items that I hope complement it.

Stage 1: NetFlow Integrator

Download it from the NetFlow Logic download page. There are actually 3 pieces of software you need to download:

  1. NetFlow Integrator 2.4.
    • I recommend you use the Windows version. I used the VM form factor, which requires some manual work on the Linux command line.
    • The VM only takes 2 vCPU, 4 GB RAM and 1 network.
  2. NFI Updater
    • This small component is installed on top of NFI. It provides NFI with information such as GeoIP, reputation, etc.
  3. NFI Operations Analytics.
    • This provides the integration with vRealize Operations and Log Insight, so there are multiple products to install once you unzip the files.
    • "TP2 Package" means this package is still in Tech Preview 2. The NetFlow Logic folks are working closely with the VMware team.

I installed the VM appliance, which needs some work on the Linux command line. Installation of the NFI Updater is also done via the CLI, as shown below.

NFI 17

Once installed, it’s time to configure it. There are a few things to do:

  • Input and Output
  • vSphere and NSX integration
  • Top of Rack
  • Additional NFI modules (optional)

To configure input and output, it’s a matter of specifying the ports. Add 9995 (NetFlow), 6343 (sFlow) and 2055 (IPFIX). I had to add 6343 because Arista uses sFlow.

NFI config input
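As a small planning aid, the three input ports above can be written down as a lookup table. This is only a sketch: the port numbers come from the text, but the names `FLOW_INPUT_PORTS`, `protocol_for_port` and `missing_inputs` are made up for illustration and are not part of NFI.

```python
# Input ports from the text above. The names below are illustrative
# helpers, not part of NetFlow Integrator itself.
FLOW_INPUT_PORTS = {
    9995: "NetFlow",
    6343: "sFlow",
    2055: "IPFIX",
}


def protocol_for_port(port):
    """Return the flow protocol expected on a port, or 'unknown'."""
    return FLOW_INPUT_PORTS.get(port, "unknown")


def missing_inputs(configured_ports):
    """Return the planned ports that have not been configured yet."""
    return sorted(set(FLOW_INPUT_PORTS) - set(configured_ports))


if __name__ == "__main__":
    print(protocol_for_port(6343))
    print(missing_inputs([9995]))
```

A check like `missing_inputs` is handy when you add a new switch vendor later and need to remember which input it expects.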

Next is the output. To configure the Log Insight integration, you just need to fill in the dialog box below. NFI already knows about Log Insight, as you can see from the drop-down entry for it!

NFI Log Insight

You need to configure the vSphere and NSX integration. The current version is limited to 1 vCenter per NFI. If you have multiple vCenters, install another NFI. Multiple NFIs can point to the same Log Insight.

NFI 19

To configure the Top of Rack switches, you just need to specify their IP addresses.

NFI TOR

To get the vRealize Operations integration, change the output method from the default 0 to 2.

NFI 1

Stage 2: vSphere, NSX & Physical Switch

There are many articles on how to configure NetFlow on the vSphere Distributed Switch and on physical switches. An example for Cisco is here, and for Arista is here.

In vCenter, the default collector port for NFI is 9995. You specify the NFI IP address (not hostname). In my example below, it is 172.16.101.90.

NFI 31
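Since the collector must be entered as an IP address rather than a hostname, it can be worth validating the value before typing it into several places. The sketch below uses Python's standard `ipaddress` module. The function names are hypothetical, and the default port of 9995 is simply the one mentioned above.

```python
import ipaddress


def is_ip_literal(value):
    """Return True if value is a literal IPv4/IPv6 address, not a hostname."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def collector_endpoint(address, port=9995):
    """Validate a collector address and port, returning an (address, port) pair.

    Hypothetical helper: the 9995 default mirrors the NFI collector port
    mentioned in the text.
    """
    if not is_ip_literal(address):
        raise ValueError(f"collector must be an IP address, got {address!r}")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return (address, port)


if __name__ == "__main__":
    print(collector_endpoint("172.16.101.90"))
```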

To configure IPFIX in NSX, go to Flow Monitoring and key in the NFI IP address. I use port 2055.

NFI NSX

On the physical switch, here is how to configure Cisco for SNMP v3:

Cisco SNMP v3

Stage 3: Log Insight

The Content Pack is not yet available in the Log Insight marketplace. Just upload it manually, as per the screenshot below.

NFI 20

Once uploaded, here is what you get:

NFI 21

You get additional information about the traffic.

NFI Log Insight 2

And who is talking to who in your network…

NFI Log Insight 3

You can drill down to see the details.

MFI 40

Stage 4: vRealize Operations

The Management Pack installation is similar to that of a typical Management Pack. The only thing you need to do is provide the URL.

NFI vR Ops

You also need to enable the collection of IP addresses.

NFI vR Ops 2

Is vSphere performing well?

In general, you know that you’ve done a good job with your vSphere IaaS because the VM Owners are happy with the performance of their VMs. Business is powered by the VMware infrastructure that you design and operate.

But….

What do the vSphere logs say? Is there anything lurking in the log files?

As VMware professionals, we know vSphere well and probably have years of experience with it. We can architect, design, implement, upgrade, and even troubleshoot it.

The same cannot be said of the logs. Generally speaking, deep knowledge of vSphere logs belongs to VMware GSS engineers, who read logs on a daily basis while performing all kinds of troubleshooting. That knowledge has been slowly codified into Log Insight. I’m not sure which engineers are doing this great job, but Steven Flanders is a good starting point. If you have any feedback on the Content Pack, drop him a note.

BTW, if you are new to Log Insight, there are many bloggers who have written good articles on Log Insight. Examples are VMware Arena (vSphere), VMguru (NSX integration), Cody (variety of articles). BlueShift has also written a good overview here.

Back to the question: what do the vSphere logs say about its health?

This is where the vSphere content pack comes in. As you can see from the following screenshot, it comes with many out-of-the-box dashboards, queries, alerts and fields. I’d encourage you to review their definitions, and not simply look at the dashboard content.

12

Going through the above can be daunting, as it is indeed a lot. Yes, vSphere has a lot of logs, and Log Insight merely reflects that breadth of information. One approach that works for me is Search: I search for the particular thing I am after. For example, in the following screen, I searched for vMotion. The result told me what information I can get about vMotion.

13

You might be wondering how to use the variables, or Fields as Log Insight calls them. Well, you don’t have to do anything, as they automatically appear in the drop-down list. All you need to do is type the name. I typed vmw_ in the following screenshot, as all the VMware-specific content packs follow this naming convention. Notice that each variable name is followed by the Content Pack name in brackets. This lets you know which content pack is contributing that field. Nice!

14

You will eventually create your own fields. Please have a naming convention. I typically use the company name as a prefix.

Now that you understand the basics, let’s review some of the dashboards.

The first one answers these questions: What vSphere alarms are we getting? How often do they happen? When do they happen? Is there a pattern? Which hosts, clusters, etc. get them?

23

I plotted the above for just 1 day. We can tell which alarms hit my small environment, and when. There was a spike at 10 pm.

Did you notice how easy it was to produce the dashboard? All it took was 2 built-in variables. Log Insight has created the vc_event_type and vmw_vc_alarm_type fields. The dashboard was built by simply requiring that they exist. So long as there is a value, the field counts as existing.

The above chart was grouped by type of alarm. What if you are looking after your customers, and want it grouped by VM?
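To make the "exists" filter concrete, here is a rough Python equivalent of what the chart does: keep only the events where both fields have a value, then count them per group. The two field names come from the text. The sample events, their values, and the `count_by_field` helper are invented for illustration, and this is not how Log Insight itself is implemented.

```python
from collections import Counter


def count_by_field(events, required_fields, group_by):
    """Count events in which every required field exists, grouped by one field.

    A field "exists" when it is present and non-empty, mirroring the
    'exists' filter described above. The group_by field must exist too.
    """
    needed = list(required_fields) + [group_by]
    counts = Counter()
    for event in events:
        if all(event.get(name) not in (None, "") for name in needed):
            counts[event[group_by]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample events; real values depend on your environment.
    events = [
        {"vc_event_type": "AlarmStatusChangedEvent", "vmw_vc_alarm_type": "Host memory usage"},
        {"vc_event_type": "AlarmStatusChangedEvent", "vmw_vc_alarm_type": "Host memory usage"},
        {"vc_event_type": "AlarmStatusChangedEvent", "vmw_vc_alarm_type": "Datastore usage on disk"},
        {"vc_event_type": "UserLoginSessionEvent"},  # no alarm field, so it is skipped
    ]
    print(count_by_field(events, ["vc_event_type", "vmw_vc_alarm_type"], "vmw_vc_alarm_type"))
```

Changing the `group_by` argument to a VM name field gives the per-VM view instead of the per-alarm view.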

This is where another built-in field comes in. The field vm_vm_name identifies the VM name.

21

The preceding 2 screenshots are individual charts, taken in the Interactive Analytics mode of Log Insight. Notice that at the top of the screen there are Dashboards and Interactive Analytics. The dashboards are where you will go first in your day-to-day operations.

You will eventually build your own dashboards, picking what you like from the out-of-the-box ones. The good thing about your own dashboards is that they won’t be overwritten during a Log Insight upgrade.

The dashboard below shows the vSphere alarms. From here, you can drill into any of the widgets. I normally use an existing dashboard as a starting point, drill down into one of the widgets, customise it, and convert the chart into a widget in my own custom dashboard.

00

Let’s move on to Storage, as that’s one area VMware Admins have to deal with. Regardless of the storage platform, ESXi is the one executing the IO on the VM’s behalf. You can track errors by device, hostname, and path. You can also track the latency as seen by the VMkernel. Knowing the latency at the hypervisor level complements the info you have at the VM level and Guest OS level.

SCSI not all device

I put the word “Regardless” in red, because it is not actually regardless. There is a situation where ESXi cannot provide you with the above info. Can you figure out what that situation is?

Yup, it is the distributed storage architecture. Specifically, when the physical disk is directly passed through to the Storage VM as a PCI device (not RDM). In this case, the hypervisor does not see it. Only a monitoring tool that knows how that specific product works can monitor it. vCenter and vRealize Operations cannot help you. My point here is that you need to know how your things work first, before you can monitor them properly.

You might notice that on the preceding dashboard the latency widget shows no result. That’s because the threshold was set at 1 second. That’s 1000 ms, or 1,000,000 microseconds. You can easily change it. In the dashboard below, I’ve changed the filter on the field vmw_esxi_scsi_latency from >1000000 to just exists. I also plot the Minimum and Maximum, so I can see the variation.

22
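The effect of that threshold change can be sketched in a few lines of Python. The sketch assumes the latency field is in microseconds, which is inferred from the 1 second threshold being written as 1000000. The function name and the sample values are hypothetical.

```python
def summarize_latency(samples_us, threshold_us=None):
    """Summarize latency samples given in microseconds (an assumption).

    With threshold_us set, only samples above the threshold are kept,
    like the '>1000000' filter. With threshold_us=None, every sample
    is kept, like the 'exists' filter.
    """
    selected = [s for s in samples_us if threshold_us is None or s > threshold_us]
    if not selected:
        return None  # the widget would show no result
    return {
        "count": len(selected),
        "min_ms": min(selected) / 1000,
        "max_ms": max(selected) / 1000,
    }


if __name__ == "__main__":
    samples = [1200, 45000, 800000]  # hypothetical values, all under 1 second
    print(summarize_latency(samples, threshold_us=1_000_000))
    print(summarize_latency(samples))
```

With the 1 second threshold nothing qualifies, which matches the empty widget. Dropping the threshold exposes the spread between the minimum and the maximum.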

If I’m interested in just a specific device, I can filter on it. I can also change the chart type, or display only the Maximum so it’s easier to see.

222

Ok, that was Storage.

Let’s move on to DRS and HA. The dashboard below gives an overall picture of DRS and HA events. I can see that during the time period I specified, all my clusters were balanced. However, I have quite a number of VM heartbeat issues. [Yes, the product team is aware that the chart legend says red but it’s showing blue]

02

What about the speed of vMotion? How much bandwidth does it take? Did it utilize my 10Gb Ethernet well?

01 vMotion

From the above, my precopy stun time is on the high side. In a healthy environment, this number would be <300 ms. The bandwidth would also be much higher than that. This is a lab, and I know the physical connectivity.

I think you’ve got the idea of how Log Insight helps you. Let’s do just 1 example at the VM level. This dashboard answers a specific question: Did any VM hit high CPU usage? If yes, which VM and when?

31

We have covered a fair number of things. They are all familiar to you: Storage, Cluster, VM, etc.

Now…. your CIO may ask: what about something we don’t know? Are there any errors, warnings, timeouts, aborts, etc. that we need to know about?

Below is the built-in query that gives you that. It’s a pretty complex query 🙂

11

An example of the result is below. I have grouped them by ESXi host.

10
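The idea behind that query can be illustrated outside Log Insight with a keyword match and a group-by. This is a much simplified stand-in, not the built-in query itself, which is considerably more elaborate. The keyword list is taken from the question above, while the sample log lines, host names and the `problems_by_host` helper are invented.

```python
import re
from collections import Counter

# Keywords from the question above; the built-in query is more elaborate.
KEYWORDS = ("error", "warning", "timeout", "abort")
PATTERN = re.compile("|".join(KEYWORDS), re.IGNORECASE)


def problems_by_host(log_lines):
    """Count log messages containing any keyword, grouped by host.

    log_lines is an iterable of (host, message) pairs.
    """
    counts = Counter()
    for host, message in log_lines:
        if PATTERN.search(message):
            counts[host] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log lines for illustration only.
    lines = [
        ("esx01", "SCSI command timeout on path"),
        ("esx01", "vMotion completed"),
        ("esx02", "WARNING: heartbeat lost"),
        ("esx01", "Abort issued for command"),
    ]
    print(problems_by_host(lines))
```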

Hope you find it useful. Deploy it (free for 25 OSI), do the health check, and let me know what you found! You might be surprised, and have to cancel that vacation 🙂 🙂

vRealize Operations upgrade path

vRealize Operations comes in 3 forms, which explains the 3 different downloads.

  1. Virtual Appliance.
  2. Linux installation.
  3. Windows installation.

The first one is what I recommend, hence I show it as a bold green line in the diagram below. The diagram shows the 3 choices of deployment. After you deploy 6.1, you should deploy the Hot Patch. Once you complete it, you should deploy your solutions (they are called Management Packs).

Fresh

The patch comes in 5 forms, hence there are 5 different installers to choose from. You just need to pick 1. Hopefully the above diagram helps you pick the right one.

By now, quite a number of customers are already using vRealize Operations. For them, the upgrade path is different. You first upgrade your management pack. The Horizon View 6.1 solution, for example, does not support vRealize Operations 6.1, 6.0.3 or 6.0.2, because it was released before any of these versions.

So you first upgrade your V4V from 6.1 to 6.2. V4V 6.2 is supported on vRealize Operations 6.0.

Upgrade

For those celebrating Christmas, have a blessed Christmas. For those who aren’t, have a well deserved break.