Sunny Dua and Simon Eady events

Sunny Dua and Simon Eady have been running a monthly WebEx where they share their knowledge of VMware vRealize Operations. The latest one is coming this Thursday, 25th August, 1:30 PM – 2:45 PM Singapore time. I know it’s not a good time for certain cities. If you cannot make it, the session is recorded.

I’ll join them in the next session. We are hoping to answer questions like the following. We put some answers in light-hearted words, as you know it’s a serious question.

Capture

We live in an era where society is hypersensitive to people who are not sensitive. In the example above, I use her but I meant her/his/him.

The session aims to help you monitor performance and capacity. Hopefully, you gain a new perspective, and questions like the following will make sense:

2

3

You will also be able to answer questions like this:

4

See you next week!

Operationalize SDDC program

Callum Eade and Kenon Owens run a program called Operationalize Your World. Sunny and I provide the technical content. We help customers operationalize their VMware SDDC environment. It covers 4 areas in management (Availability, Performance, Capacity and Configuration). We map each area to both Consumer and Provider layers of your IaaS business.

blog 1

The set of dashboards also goes beyond the vSphere Admin, and provides dashboards for the Storage Team, Network Team and NOC Team.

blog 2

You can find the material here. 

They are in an editable format (ppt), not PDF. There are 3 ppt files and I use a restaurant analogy.

3 files

I’m providing them in PowerPoint as operations vary widely. Take what’s relevant to you, throw away what’s not, add your custom deck, and make it yours. When you share your deck with your peers or customers, let me know how it goes. I’m keen to hear your journey. It’s a journey because it will take you multiple rounds to enlighten your peers.

The tools we use to manage VMware SDDC are vRealize Operations and Log Insight. We did a live demo during the events and, in all situations, customers eagerly asked for a copy that they can import. This blog provides the steps to import ~40 dashboards, some of which are shown below.

The dashboards form 1 story. They are not 40 independent dashboards. They are complementary to one another.

Proposed Dashboards

You can find the steps to import to your environment here.

Future

I plan to keep the content and dashboards up to date with future releases of vRealize Operations and Log Insight.

Operationalize SDDC program: Import Steps

This blog explains the steps to import the dashboards, super metrics and views that we use in the Operationalize Your World event. Refer to this post for the reasons why the dashboards are needed.

Read the whole set of instructions first, before executing the steps. I have not tested what happens if you deviate.

Planning

  1. Decide which clusters and datastores are on which Tier. You choose per cluster, not per resource pool.
  2. Define Large VM. Normally, I’d classify large VMs as those with >8 vCPU or >24 GB RAM.
  3. Think of a way to exclude local datastores. I use a naming convention to mark them.
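
To make the Large VM definition in step 2 concrete, here is a minimal Python sketch. It is only an illustration of the planning rule, not something vRealize Operations runs: the `VM` record and the function names are hypothetical, and the thresholds are the >8 vCPU and >24 GB RAM values mentioned above.

```python
from dataclasses import dataclass

# Thresholds from the planning step; change them to suit your own definition.
LARGE_VCPU_THRESHOLD = 8      # "large" means MORE than this many vCPUs
LARGE_RAM_GB_THRESHOLD = 24   # "large" means MORE than this much RAM, in GB


@dataclass
class VM:
    # Hypothetical record; in practice the values come from your own inventory.
    name: str
    vcpu: int
    ram_gb: float


def is_large_cpu(vm: VM) -> bool:
    """True when the VM has more vCPUs than the threshold."""
    return vm.vcpu > LARGE_VCPU_THRESHOLD


def is_large_ram(vm: VM) -> bool:
    """True when the VM has more RAM than the threshold."""
    return vm.ram_gb > LARGE_RAM_GB_THRESHOLD


def is_large(vm: VM) -> bool:
    """A VM is large if it exceeds either threshold."""
    return is_large_cpu(vm) or is_large_ram(vm)
```

Note that the comparison is strictly greater than, so a VM with exactly 8 vCPU and 24 GB RAM is not classified as large.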

Prerequisite

  1. vRealize Operations 6.2 or higher. Ideally, a setup where you have not created any policies, super metrics or dashboards.
  2. Hands-on with vR Ops 6.2. I assume you know what you’re doing.
  3. Have an ID with admin privilege. No, you do not have to use Admin.
  4. Download the files here. I’ve provided them both as 1 zipped file and as individual files.

Steps (Summary)

The steps can be grouped into 3 parts:

  1. Part 1: Group
    1. Create the group types
    2. Create the groups
  2. Part 2: Policy and Metrics
    1. Import the Policies.
      • Importing Policy automatically imports super metrics.
      • Do not enable the policy.
      • Enable the super metrics in your active base policy
    2. Import the Performance SLA super metrics
      • They are imported manually because they are not applied to all objects.
  3. Part 3: View & Dashboards
    1. Import the Views
      • Do this prior to importing the dashboards, as there is a dependency
    2. Import the Dashboards.
      • Importing dashboard automatically creates the menu structure

To help you, the videos below map to the 3 parts. They give you an overview.

Steps (Details)

Read the steps, as they have more details than the videos.

Part 1: Group

Create 2 new group types, as marked by the arrow below. Name them Class of Service and VM Types. Do not deviate from the names, as the super metrics are tied to them.

Create the following groups. Follow their names closely, as they are hardcoded in the dashboards.

  • Under the group type Class of Service
    • Tier 1 (Gold)
    • Tier 2 (Silver)
    • Tier 3 (Bronze). Don’t forget to create its policy and super metrics. I do not have them in my example.
  • Under the group type Function
    • Datastores (shared)
    • Datastores (local)
  • Under the group type VM Types
    • Large VMs (CPU)
    • Large VMs (RAM)
    • Powered on VMs
    • Powered off VMs
    • VM with no VMware Tools
    • VM with VMware Tools
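
Because the group names are hardcoded in the dashboards, a small checklist can catch typos before you move on. The Python sketch below is only an illustration: it assumes you have typed or exported the groups you created into a plain dictionary (the `existing` argument is hypothetical and does not come from any vRealize Operations API), and it compares names exactly, including case.

```python
# Group names exactly as listed above, keyed by group type.
REQUIRED_GROUPS = {
    "Class of Service": [
        "Tier 1 (Gold)",
        "Tier 2 (Silver)",
        "Tier 3 (Bronze)",
    ],
    "Function": [
        "Datastores (shared)",
        "Datastores (local)",
    ],
    "VM Types": [
        "Large VMs (CPU)",
        "Large VMs (RAM)",
        "Powered on VMs",
        "Powered off VMs",
        "VM with no VMware Tools",
        "VM with VMware Tools",
    ],
}


def missing_groups(existing):
    """Return (group type, group name) pairs that are required but absent.

    `existing` maps a group type to the names you created under it.
    The comparison is exact and case-sensitive.
    """
    missing = []
    for group_type, names in REQUIRED_GROUPS.items():
        present = set(existing.get(group_type, []))
        for name in names:
            if name not in present:
                missing.append((group_type, name))
    return missing
```

An empty result means every expected name is present. A name such as "Tier 1 (gold)" would be reported as missing because of the lowercase "g".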

For each tier, ensure you select the right Cluster, VM and Datastore that you have planned earlier. Do not do impromptu planning. That’s an oxymoron 🙂

You need to select these 3 objects. For datastores, exclude local datastores unless they are part of your official Service Tier. If you do not select the object, you cannot apply the Performance SLA.

Group - Service Tier

For the datastores groups, make sure your selection formula is correct. If your local datastores have the default “datastore” name or you prefix them with “local”, then you can use this.

group - Local datastores
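
If you want to sanity-check your naming convention outside the product first, the Python sketch below splits a list of datastore names into local and shared. It is an assumption on my part that the rule is "name starts with datastore or local, ignoring case"; the function names and sample names are hypothetical, so adjust the prefixes to whatever convention you actually use.

```python
# Assumed convention: local datastores keep the default "datastore..." name
# or carry a "local" prefix. Adjust to your own environment.
LOCAL_PREFIXES = ("datastore", "local")


def is_local_datastore(name: str) -> bool:
    """True when the name starts with one of the local prefixes (case-insensitive)."""
    return name.strip().lower().startswith(LOCAL_PREFIXES)


def split_datastores(names):
    """Return (local, shared) lists, preserving the input order."""
    local = [n for n in names if is_local_datastore(n)]
    shared = [n for n in names if not is_local_datastore(n)]
    return local, shared
```

For example, `split_datastores(["datastore1", "local-esx01", "SAN-Gold-01"])` puts the first two names in the local list and the last one in the shared list.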

In any group, always do a preview before you save. It’s a good habit.

group - always do preview

For the Large VM groups, you can change the definition to suit your need. I’d recommend changing the value below from 4 to at least 7. If you have a lot of 8 vCPU VMs, then change it to 8 so they are not included.

Large VM - CPU

Make sure you only choose powered on VMs too! See below on how to add this condition. You can use this metric, or use the Summary | Running metric.

Large VM

For the Powered Off VMs group, I define them as VMs that were powered off >50% of the time in the past 30 days. This is very conservative, as a VM needs to be powered off for a total of more than 15 days in the past 30 days. You can pick your level. In the example below, it’s set to be super conservative at 95%.

Powered Off VM
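
The arithmetic behind the threshold is simple, and a short Python sketch makes it easy to try other levels. This is only an illustration of the percentages discussed above; the function names are hypothetical and nothing here talks to vRealize Operations.

```python
WINDOW_DAYS = 30  # look-back window used in the group definition


def days_required_off(threshold_pct: float, window_days: int = WINDOW_DAYS) -> float:
    """Days a VM must EXCEED being powered off to cross the threshold."""
    return window_days * threshold_pct / 100


def is_powered_off_vm(days_off: float, threshold_pct: float = 50,
                      window_days: int = WINDOW_DAYS) -> bool:
    """True when the VM was off for more than threshold_pct of the window."""
    return days_off / window_days * 100 > threshold_pct


print(days_required_off(50))  # 15.0 days for the 50% level
print(days_required_off(95))  # 28.5 days for the 95% level
```

At 50%, a VM that was off for exactly 15 days does not qualify, while one that was off for 16 days does. At 95%, the VM must have been off for more than 28.5 of the 30 days.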

For the group of VMs with VMware Tools and the group of VMs without VMware Tools, use the property shown below. I clone the group, and simply change from is to is not.

group - VMware Tools

Part 2: Policy and Metrics

Import the SDDC policy. Choose Skip import to ensure nothing is overwritten.

Policy import

It should take around 1-2 minutes. You will get this when done.

Policy import success

The purpose of the policy import is merely to import the super metrics. We have to enable them manually. If you are curious about the list of super metrics you are getting, it looks like this:

Super Metrics

Once imported, enable the super metrics in your base policy. Yes, you can bulk enable. Use the Actions menu.

enable super metrics

Import the Performance SLA super metrics. Do adjust the SLA accordingly.

super metrics

Create 1 policy for each Tier. This has to be based on your active policy. In the example below, my base policy is called OneCloud Default Policy. Make sure you choose the right one.

Enable the SLA for that tier. In the example below, I’m only enabling Tier 2. From the big red number 1, you can see I’m editing a policy named Tier 2. You can see it’s being selected in the background, behind the dialog box.

See the big red number 2: It shows the Performance SLA that should belong to Tier 2. As a result, I only enabled them (see the big red number 3). I do not enable the super metrics for Tier 1 (see the big red number 4).

Correct super metric for each policy

Click Save to end the editing.

Assign the policy to its associated group. The Tier 1 policy should be mapped to the Tier 1 group. Below is an example. Use the green plus sign, circled below:

Assign policy to Tier 1

You know you got the policy associated when it appears in the Active Policies. I only have Tier 1 shown here. If you have 3 tiers, then you will have Tier 1, Tier 2 and Tier 3.

active policy

Do the same steps for Tier 2 (Silver) and Tier 3 (Bronze).
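
Once all tiers are done, it is worth double-checking that each policy landed on its own group and nothing else. The Python sketch below shows the idea on plain data. It assumes the policies are named Tier 1, Tier 2 and Tier 3 and the groups use the names from Part 1; the `actual` dictionary is hypothetical input that you would fill in by reading the Active Policies screen, not something returned by an API.

```python
# Assumed naming: one policy per tier, each mapped to the matching group.
EXPECTED_ASSIGNMENT = {
    "Tier 1": "Tier 1 (Gold)",
    "Tier 2": "Tier 2 (Silver)",
    "Tier 3": "Tier 3 (Bronze)",
}


def check_assignments(actual):
    """Return a list of human-readable problems; an empty list means all good.

    `actual` maps a policy name to the list of groups it is assigned to.
    """
    problems = []
    for policy, group in EXPECTED_ASSIGNMENT.items():
        assigned = actual.get(policy, [])
        if group not in assigned:
            problems.append(f"{policy}: not assigned to {group}")
        extra = [g for g in assigned if g != group]
        if extra:
            problems.append(f"{policy}: unexpectedly assigned to {', '.join(extra)}")
    return problems
```

If you only run 2 tiers, remove the Tier 3 entry from `EXPECTED_ASSIGNMENT` so it is not reported as missing.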

Part 3: View and Dashboard

You can import them in any order.

view import

The list shown below is partial. There are ~100 in total. Yes, I use the View widget heavily 🙂

Views

Import the Dashboard. Again, you can import them in any order. When you are done, it looks something like this.

Dashboard

Once imported, take your well-deserved coffee break! If you have a large environment, it can take an hour for all the dashboards, super metrics, policies and groups to be applied. During the process, you may see the known error while trying to open a dashboard. Just wait an hour or so. So go ahead, catch some Pokémon 🙂

Limitations and Gotchas!

  • Import
    • You can make it work with an earlier version, but I highly recommend you use the latest.
    • A few super metrics may appear wrong in the View widget. When that happens, simply edit the view widget, remove the super metric, and add it again.
  • Policy
    • If you are doing IaaS business, you should have at least 2 policies. That’s the main reason why I have not considered a business scenario where you only have 1 policy.
    • The Policy is applied at cluster level. I do not use Resource Pool. It complicates matters operationally.
    • You might hit a known import error. Not all dashboards may make it. I’m checking internally on this.

When things go wrong

You should not need to do any of these things. But if things go wrong, there are a couple of things you can check. First, ensure each Policy actually applies to the correct object. For example, you can see below that I’ve applied the policy named Tier 2 to a group called Tier 2. Under the Assigned Groups column, it shows it’s being applied to 1 group and it impacts 302 objects.

Policy objects group

The same goes with super metrics. In the following example, a super metric is being applied to Tier 2 policy. It’s not applied to other policies, as it does not make sense.

Super Metric n policy

If the import fails, you will see the error message. Simply rename the duplicate object, then reimport.

import duplicate

Let me know how it has helped you, or about any problems you encounter.