Monthly Archives: December 2016

Monitoring vSphere Replication

This blog is contributed by my friend Luciano Gomes, a VMware PSO Senior Consultant in Rio de Janeiro Area, Brazil. Thank you, Lucky!

In this post, I will share how you can monitor your vSphere Replication using vRealize Operations Manager. This can be done with a simple custom dashboard.

First, enable the vSphere Replication metrics in the Policy. (Yes, I agree, there are only 3 metrics coming from vSphere. More would be nice.)

repo01

vROps is now collecting these 3 metrics. You can see them on the ESXi object.

I’ve created a simple dashboard. You can import it to monitor your vSphere Replication.

rep02

You can grab the dashboard here.

The dashboard uses custom interaction, which is an XML file. Create a new XML file, then copy and paste the text below into it. The file name has to be replication.xml, as that’s what the imported dashboard expects.

<?xml version="1.0" encoding="UTF-8"?>
<AdapterKinds>
  <AdapterKind adapterKindKey="VMWARE">
    <ResourceKind resourceKindKey="HostSystem">
      <Metric attrkey="hbr|hbrNumVms_average" label="" unit="" yellow="" orange="" red=""/>
      <Metric attrkey="hbr|hbrNetRx_average" label="" unit="" yellow="" orange="" red=""/>
      <Metric attrkey="hbr|hbrNetTx_average" label="" unit="" yellow="" orange="" red=""/>
    </ResourceKind>
  </AdapterKind>
</AdapterKinds>

As long as the file name that the dashboard refers to matches the actual file name, you can name the file whatever you want.
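Copy and paste can easily damage an XML file (curly quotes, a missing closing tag), so it may be worth checking the file before you upload it. Here is a minimal sketch in Python, using only the standard library. The function name metric_keys is my own and is not part of vROps; the XML is the same text as above.

```python
import xml.etree.ElementTree as ET

# The custom interaction file from this post, embedded for a quick check.
REPLICATION_XML = """<?xml version="1.0" encoding="UTF-8"?>
<AdapterKinds>
  <AdapterKind adapterKindKey="VMWARE">
    <ResourceKind resourceKindKey="HostSystem">
      <Metric attrkey="hbr|hbrNumVms_average" label="" unit="" yellow="" orange="" red=""/>
      <Metric attrkey="hbr|hbrNetRx_average" label="" unit="" yellow="" orange="" red=""/>
      <Metric attrkey="hbr|hbrNetTx_average" label="" unit="" yellow="" orange="" red=""/>
    </ResourceKind>
  </AdapterKind>
</AdapterKinds>
"""


def metric_keys(xml_text):
    """Parse the XML text and return the attrkey of every Metric element.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed.
    """
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [metric.get("attrkey") for metric in root.iter("Metric")]


if __name__ == "__main__":
    for key in metric_keys(REPLICATION_XML):
        print(key)
```

If the parse fails, fix the file before importing the dashboard. This only checks that the XML is well-formed and lists the metric keys; it does not check anything on the vROps side.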

If you don’t know how to use custom interaction in vROps, take a look at this article.

I am thinking of blogging about Log Insight and vSphere Replication. Let me know if that would be useful for you.

Hope you find it useful. Do reach out via LinkedIn / Twitter. Thanks for reading!

What’s new with vRealize Operations 6.4

Apologies that this is coming late. Please read the following first, as I will start from where they end:

I’d focus on the new dashboards, since I was the one designing them. If the dashboards are not meeting your requirements, you now know exactly where to complain 😉 The dashboards were reviewed extensively by Product Managers (Monica Sharma, Ronit Halachmi Bekel) and Sunny Dua.

The dashboards in 6.4 are a subset of the dashboards in the Operationalize Your World program. Around 20% made it. They are also simplified. The reason is that we wanted the dashboards to pass The 5-second Test and be applicable to the SMB segment. We also wanted more feedback from real-life environments before bringing in additional and more advanced dashboards. So do let me know at e1@vmware.com.

The following screenshot shows the 2 sets when you are running 6.4. You end up with both sets. They can co-exist, and the cost is some metrics are duplicated. As part of porting the dashboards to 6.4, we converted the super metrics into regular metrics so it’s simpler for you.

2016-12-10

Did we remove the old dashboards from 6.3? Nope, we did not. We simply moved them. Can you guess where they are on the screenshot above?

Yes, we moved them under the “Other” folder. In the future, we might deprecate them as we enhance the UI and dashboards.

You will notice we have grouped the dashboards into Infrastructure and VM. This is in line with the Dining Area and Kitchen shared in Operationalize Your World. We want to draw your attention to the VMs first: ensure they are served well before you look at the kitchen.

We’ve placed some dashboards outside the folder for your convenience. We’ve also created a Read Me dashboard. We call it Getting Started. It explains the new dashboards.

getting-started

Technically, the dashboard has only 1 Text Widget. The adventurous among you will ask me if you can clone and tailor it for your company. The answer is yes, but it is not supported. All the texts and images are in this directory:

ContentPack/VMWARE/conf/pages/getting_started/index.html?locale=%locale%

Operations Overview dashboard

We designed this dashboard to answer a few frequently asked questions on your day to day operations.

  • What have we got? If this number changes, or is not what you expect, you want to probe why.
  • What’s the Health of my environment? Environments vary in size, so we group the objects by vSphere Data Center. The dashboard lists all your data centers. Select one, and you can see its uptime and alerts. You expect the uptime to be 100% and the alerts to be at or below your normal level.
  • Just because your vSphere is healthy does not mean the VMs are being served well. This is where the Top-N comes in. As this is your daily dashboard, the numbers should be within your expectations.

ops-overview

Cluster Performance dashboard

If the VMs are not being served well, you need to investigate why. This is where the following dashboard comes into play. It lets you see which clusters are not performing well. The heatmap shows the clusters by alert. Start with the reddest cluster.

Select the cluster you want to probe. Its performance counters will be automatically shown. You can see if it’s serving its VMs well. We are using a line chart, so you can see the past and check if there is any strange spike.

cluster-performance

Heavy Hitter VMs dashboard

One possible cause of a performance problem is a Villain VM. Your vSphere environment is a shared environment. It can take as few as 1-2 VMs to create a performance problem in a cluster with 500 VMs.

This dashboard answers whether any VM has generated an abnormal spike. It tracks both Storage and Network. From the following example, we can easily see there are both excessive storage throughput and very high network throughput. They happened at different times and were caused by different VMs. The dashboard quickly shows the 2 villain VMs. You can see their workload is more than 10x that of the second highest VM.

heavy-hitter-vms
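The "more than 10x the second highest VM" observation can be expressed as a simple check. The sketch below is my own illustration in Python, not how the dashboard works internally. The function name find_villain, the 10x default and the sample numbers are all assumptions.

```python
def find_villain(workloads, ratio=10.0):
    """Return the name of the top VM if its workload exceeds `ratio` times
    the second highest workload, otherwise None.

    `workloads` maps a VM name to one number (for example peak IOPS or
    peak network throughput over the period you care about).
    """
    if len(workloads) < 2:
        return None  # nothing to compare against
    ranked = sorted(workloads.items(), key=lambda item: item[1], reverse=True)
    (top_name, top_value), (_, second_value) = ranked[0], ranked[1]
    if top_value > ratio * second_value:
        return top_name
    return None


if __name__ == "__main__":
    # Hypothetical sample data: one VM dwarfs the rest.
    sample = {"vm-a": 12000, "vm-b": 900, "vm-c": 850}
    print(find_villain(sample))
```

Run it once per metric (storage, then network), since the villain can be a different VM for each.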

In Operationalize Your World, we enhanced this dashboard by adding details, and split it into 2 (Storage and Network). This is to facilitate collaboration with your peers.

Datastore Performance dashboard

Cluster covers compute. What about storage? The Datastore Performance dashboard lets you see the performance of all your datastores. You can use the view to select a datastore. Its performance charts will be automatically shown. From the line chart, you could see if the datastore was having difficulty serving its VMs.

You can drill down to see each VM in the datastore. Select a VM, and its IOPS and Latency will be automatically plotted.

datastore-performance

VM Performance Troubleshooting dashboard

Performance problems fall at 2 ends of a spectrum:

  • Whole house on fire
  • A small number of VMs were hit

The first few dashboards help you answer the first use case. This dashboard helps you look at a single VM. It’s a big dashboard, so I’ve added visual sections. It has 3 sections.

  • Section 1
    • This is where you select the VM. You can search, filter or simply browse.
    • The selected VM’s key properties, its alerts and how it fits into the larger environment are automatically shown. If a VM is part of a Resource Pool, check if the Resource Pool is limiting it.
  • Section 2
    • We display both the VM KPI and the IaaS KPI.
    • The IaaS KPI is the set of 4 key metrics that show how the IaaS serves this VM. If they are high, there is a good chance your IaaS capacity is full and it is struggling to serve all its VMs.
  • Section 3
    • You are verifying if the underlying IaaS was able to serve the VM.
    • Can you guess why we show Cluster instead of ESXi?
    • Hint: the performance problem may have happened in the past.

vm-troubleshoot

In Operationalize Your World, we expanded this dashboard into a set of dashboards.

VM Usage dashboard

A common request from a VM Owner is to see the VM’s utilization and properties. This is what this simple dashboard is for. You simply select the VM, and the key information is automatically shown. We are using a line chart again, as the data that the VM Owner wants to know may be in the past.

If you need something more advanced, with self service, review the tenant dashboard here.

vm-usage

Capacity Dashboards

The twin brother of performance is capacity. While they are different, they are closely related. We’ve provided 2 dashboards to get you going:

  • Cluster capacity
  • Datastore capacity

The Cluster capacity dashboard lets you see a cluster’s utilization in 3 areas: CPU, RAM and Disk.

The model we use here is based on utilization. It does not take into account Availability Policy and Performance SLA. It is also based on the Demand model, which is not suitable for a Tier 1 cluster.

capacity-overview

The Datastore capacity dashboard lets you quickly see which datastores are running out of space, and which datastores are hardly used. It uses red to show low capacity, and dark grey to show wastage. What you want to see is balanced usage across all datastores.

datastore-capacity
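To make the color logic concrete, here is a minimal sketch in Python of how a datastore could be bucketed by its used space. This is my own illustration, not the heatmap’s actual configuration. The function name classify_datastore, both thresholds and the "green" label for the healthy middle are assumptions; pick thresholds that suit your environment.

```python
def classify_datastore(used_pct, low_free_pct=10.0, wasted_used_pct=20.0):
    """Bucket a datastore by used space (in percent).

    "red"       -> running out of space (free space at or below low_free_pct)
    "dark grey" -> wastage (used space at or below wasted_used_pct)
    "green"     -> anything in between
    Both thresholds are hypothetical defaults, not vROps settings.
    """
    if not 0.0 <= used_pct <= 100.0:
        raise ValueError("used_pct must be between 0 and 100")
    free_pct = 100.0 - used_pct
    if free_pct <= low_free_pct:
        return "red"
    if used_pct <= wasted_used_pct:
        return "dark grey"
    return "green"


if __name__ == "__main__":
    # Hypothetical datastores and their used space in percent.
    for name, used in {"ds-01": 95.0, "ds-02": 5.0, "ds-03": 50.0}.items():
        print(name, classify_datastore(used))
```

Balanced usage, in these terms, means most datastores land in the middle bucket.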

Configuration Dashboards

The last set of dashboards covers Configuration. We focus on configuration that needs attention, rather than simply listing all configuration. Take the VM Configuration dashboard, shown below. It highlights whether you have large VMs, how large they are, and how many there are of each size.

You can also customize the filter. Simply edit the view widget.

It also highlights configuration that you need to watch. A VM with > 1 vNIC should get your attention, as it can bridge your networks.

vm-config

We apply the same principle to the ESXi Configuration dashboard. For example, it shows the BIOS version. You want to keep the versions consistent and the number of distinct versions minimal.

esxi-conf

The Cluster Configuration dashboard highlights inconsistent configuration among the member ESXi hosts in a cluster.

cluster-conf
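The idea of "inconsistent configuration among members" is simple to state in code: a setting is inconsistent when the hosts in a cluster report more than one distinct value for it. The sketch below is my own illustration in Python, not what vROps runs. The function name inconsistent_settings, the setting names and the sample hosts are all hypothetical.

```python
from collections import defaultdict


def inconsistent_settings(hosts):
    """Return the settings that have more than one distinct value.

    `hosts` maps a host name to a dict of setting name -> value.
    The result maps each inconsistent setting to its sorted distinct values.
    A setting missing on some hosts is compared only across the hosts
    that report it.
    """
    values = defaultdict(set)
    for settings in hosts.values():
        for key, value in settings.items():
            values[key].add(value)
    return {key: sorted(vals) for key, vals in values.items() if len(vals) > 1}


if __name__ == "__main__":
    # Hypothetical cluster: one host runs an older BIOS.
    cluster = {
        "esxi-01": {"bios_version": "2.1", "ntp_server": "10.0.0.1"},
        "esxi-02": {"bios_version": "2.1", "ntp_server": "10.0.0.1"},
        "esxi-03": {"bios_version": "1.9", "ntp_server": "10.0.0.1"},
    }
    print(inconsistent_settings(cluster))
```

An empty result means every reported setting is identical across the hosts you passed in.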

The Network Configuration dashboard lets your peers in the Network team quickly understand the virtual network. It lists all the distributed virtual switches. Once you select one, it automatically lists all the port groups and ESXi hosts in that switch. It also lists all the VMs. You can control and customise all these lists.

network-config

Hope you found them useful. We have intentionally kept them simple in 6.4. If you are running 6.3 or later, and you need more advanced dashboards, download them from here.

vROps dashboard best practices

This is a common request among vRealize Operations dashboard artists. To the few of us who love visualizing information with vR Ops, the dashboard is a canvas. Granted, the widgets have limitations, but that’s part of the art.

capture

The 5-second Test

  • Can the dashboard be understood within 5 seconds? If yes, you buy yourself a few more minutes. The user understands what the dashboard does, and is willing to spend more time mastering it.
  • Which information, object, metric can you take out from the dashboard?

The Executive Dashboard below has minimal info, and the content is mostly large numbers.

picture2

Purpose-Built

  • 2 dashboards can have an identical role, purpose, scope, etc. But if the size of the environment differs, the 2 dashboards will be different. An environment with 50K VMs is managed differently from an environment with 50 VMs.
  • An environment with 100 VMs on just 8 hosts in 1 cluster (hence 1 datacenter, 1 vCenter) needs fewer dashboards than an environment with 10K VMs spread over 800 ESXi hosts, 100 clusters, 10 datacenters and 3 vCenters.
  • Begin with the end in mind. Decide what the final outcome is, and focus on that.
  • A generic dashboard that tries to please everyone and cover everything ends up not being used.

Layout

  • Divide the screen into sections visually. This makes the dashboard easier to read.
  • Here are some examples of how you can divide the screen.

2

Here is a good example of layout. Notice how simple it is. This is built by Blue Medora.

picture1

From the above, it is clear that the dashboard has 4 layers, as the layout is consistent among them.

Color and Content

  • Take advantage of color.
  • There are different types of color spectrum you can choose. Here are 3 to consider:
    • Green –> Yellow –> Orange –> Red.
    • Black <– Green –> Red. We use this in Capacity as wastage (unused) is a bigger issue than over utilization.
    • Light Blue –> Dark Blue. We use this when trying to highlight difference in heatmap. Easier to see the boxes when they are not the same color! We use this in the Cluster Configuration dashboard.

In the Compliance dashboard below, color is used to quickly show the various levels of compliance. If all you see is Green, there is no need to look at the numbers and text!

picture3

More Examples

The next few dashboards show how the best practices are being applied.

The next dashboard covers Inventory. It answers questions such as “What have we got? Where are they?” Inventory is not Configuration. They are related but not identical.

picture4

The dashboard below is an overview dashboard for the Storage Team. It is purpose-built for them, so info that is not related is not shown. It is also an overview dashboard, so detailed information is not shown.

picture5

Once the Storage team understands the big picture, they are ready to tackle monitoring. One major part is Capacity Monitoring. The dashboard below is purpose-built for this use case.

picture6

This next dashboard uses line charts a lot. The purpose is for Help Desk to quickly determine whether a performance issue is due to the VM or the IaaS. Because of that, the performance threshold line is shown. Line charts are used because the performance issue might have happened in the past.

picture7

This next dashboard uses distribution charts as its top-line header. They attract attention and can easily tell you if you have configuration that can cause problems in your IaaS.

picture8

A dashboard can be visually simple yet carry a deep concept. When you need to change a paradigm, use the text widget to explain. You can also use an image. Upload your image into the View widget once you know the screen resolution you’re building it for.

picture9

The next dashboard helps to quickly determine the degree of VM CPU over-provisioning. Notice it has 3 questions. Each question is answered by 1 widget.

picture91

Hope you find it useful. I want to close by adding some limitations:

  • A widget cannot drive grandchildren
    • Use a parent widget in between.
    • Use auto-select 1st row to automate the driving. Collapse this parent widget if it is not required.
  • 6.3 and 6.4 cannot show 4-column layout
  • If usability is acceptable, drive another dashboard. Jumping to another dashboard gives you greater screen real estate, and each dashboard takes less time to load as it’s smaller.