Tag Archives: vCenter

Architecting VMware vSphere for Operations

As an Architect, you take into account many requirements when designing your VMware vSphere environment. As an Operations person, I shall not question your Architecture. I’m sure it is fast, highly available, right on budget, etc. My role is to help you prove to the CIO that what you architected actually lives up to expectations. “Plan meets Actual” is what an Architect wants, because that means your architecture delivers on its promise.

The plan of the Architecture exists in diagrams and documents. It's static.
The reality of the Architecture exists in the datacenter. It's live.

When proving, here are the questions we should be able to answer:

  • Availability: Does the IaaS deliver the promised Availability SLA? If not, what was the actual number, when was it breached, for how long, and which VMs were affected? For each VM, when exactly did the breach start and end?
  • Performance: Does the IaaS serve all its VMs well? An IaaS platform provides CPU, RAM, Disk and Network as services. It has to deliver these 4 resources whenever a VM asks for them, 24 x 7, 365 days a year. If not, which VMs were affected, by what, and when?

Those are simple questions, but they are very difficult to answer. Let’s say you have 10,000 VMs. How do you answer that? How do you provide the answer over time (e.g. monthly), proving you handle the peaks well?
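To make the Availability question concrete, here is a minimal Python sketch of the arithmetic behind it. It assumes you can already export, for each VM, a list of outage start and end times for the reporting period. The function names, the VM names and the 99.9% SLA are all hypothetical, and this is not how any particular product calculates it.

```python
from datetime import datetime, timedelta

def availability_percent(outages, period_start, period_end):
    """Return uptime as a percentage of the reporting period.

    outages: list of (start, end) datetime pairs, assumed not to overlap.
    """
    period = (period_end - period_start).total_seconds()
    down = 0.0
    for start, end in outages:
        # Clip each outage to the reporting period.
        start = max(start, period_start)
        end = min(end, period_end)
        if end > start:
            down += (end - start).total_seconds()
    return 100.0 * (period - down) / period

def sla_report(outages_by_vm, period_start, period_end, sla_percent):
    """Map each VM name to (actual availability, SLA met?)."""
    report = {}
    for vm, outages in outages_by_vm.items():
        actual = availability_percent(outages, period_start, period_end)
        report[vm] = (actual, actual >= sla_percent)
    return report

# Hypothetical data: a 30-day period and a 99.9% Availability SLA.
period_start = datetime(2016, 1, 1)
period_end = period_start + timedelta(days=30)
outages_by_vm = {
    "vm-a": [],
    "vm-b": [(datetime(2016, 1, 10, 2, 0), datetime(2016, 1, 10, 3, 0))],
}
report = sla_report(outages_by_vm, period_start, period_end, 99.9)
for vm, (actual, met) in report.items():
    print(f"{vm}: {actual:.3f}% {'met' if met else 'BREACHED'}")
```

A single one-hour outage in a 30-day month is already enough to breach a 99.9% SLA, which is why you need the exact start and end time per VM.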

To complicate matters, you need to be able to answer per Business Unit. Business Unit A will not care about other business units. Since a Business Unit has >1 apps, you also need to answer per application. An application owner only cares about her applications.

There are a few things you need to do so that you are in a position to prove it.

Step 1: Reflect the business in vSphere

Does your vCenter show all the business units? Can you show how the business is mapped into your vSphere environment? You design vSphere so the Business can run on it, so where is the Business? A company is made up of business units, which may have multiple departments. The structure below is a typical example.

An application typically has multiple tiers. Does your vSphere understand that?

Map the above as folders in your vCenter.

I see many naming conventions that are not operations-friendly. It’s impossible to guess what a name refers to. The names are very similar and hard to read, so it’s easy for operators to make mistakes! Such naming conventions typically originate from the mainframe or MS-DOS era, when you could not use spaces and had a limited number of characters. An example is SG1-D01-INS-0001W-PRD. Can you guess what on earth that is? You’re right, you can’t. Imagine there are 1000s of them like that, and you have new operators joining the help desk team.

Folder, Tags and Annotation

Have you seen a vSphere environment where there are tags and annotation everywhere?

It’s rare to meet customers with a 100% well thought-out and documented approach to the 3 features above. They may have general guidelines, but not explicit Do’s and Don’ts. As a result, these 3 features are used wrongly.

Use Tags when the values are discrete, ideally Yes/No. I’d use tags to mark the following:

  • VMs with RDM.
  • VMs with MSCS or Linux Clustering.

Do not use Tags when a value is unlikely to be shared by many VMs. Use annotations for these. Examples are VM Owner Name, Email Address, Mobile Number. In an environment where there are >10K VMs, there can be 1000 VM Owners.

Do not use Tags to tag Service Tier. For Infra objects such as Cluster and Datastore Cluster, that should be clearly reflected in the name itself. I’d prefix all Tier 1 clusters with Tier 1, so the chance of deploying into the wrong tier is minimized.

Step 2: Design Service Tiers into vSphere

Does your vSphere understand that there are different classes of service? Are Tier 1 clusters and datastores clearly labelled?

You should avoid mixing multiple classes of service in a single cluster or datastore. While it is technically possible to segregate them, it’s operationally challenging. Resource Pool shares are set on the pool and divided among the VMs inside it, so they only behave as intended when sibling pools hold a similar number of VMs.
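The resource pool point is easier to see with numbers. The Python sketch below is a simplification: it assumes every VM in a pool has the same configuration, is competing for CPU at the same time, and gets an equal slice of its pool's shares. The pool names, share values and VM counts are illustrative only.

```python
def shares_per_vm(pool_shares, vm_count):
    """Effective shares each VM gets if the pool's shares are split evenly."""
    return pool_shares / vm_count

# Illustrative sibling pools: (shares set on the pool, number of VMs inside).
pools = {
    "Tier 1": (8000, 40),
    "Tier 3": (2000, 5),
}

per_vm = {name: shares_per_vm(shares, count)
          for name, (shares, count) in pools.items()}

for name, value in per_vm.items():
    print(f"{name}: {value:.0f} shares per VM")
```

With these numbers, the Tier 1 pool has 4x the shares but 8x the VMs, so under contention each Tier 3 VM ends up with twice the entitlement of a Tier 1 VM. Keeping that balanced as VMs come and go is the operational challenge.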

For each tier, you need to have both an Availability SLA and a Performance SLA. For the Performance SLA, review this doc.

Step 3: Define and Map Tiers in vR Ops

Now that you’ve designed service tiers into your vSphere architecture, it’s time to show it. You cannot show it in vSphere, as vCenter does not understand Performance SLA and Availability SLA. You can use vR Ops to do this. Follow this step.

Step 4: Map Applications in vR Ops

Use custom groups to create applications. If you have a proper naming convention, it should not be difficult to select the members of each application. All you need is a query that says the name contains XYZ. There should be no need for regular expressions.
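To illustrate why a good naming convention makes a simple "name contains" rule sufficient, here is a small Python sketch. It is not vR Ops code. The VM names are hypothetical, and matching case-insensitively is an assumption of this sketch rather than a statement about how vR Ops evaluates the rule.

```python
def members(vm_names, contains):
    """Return the VM names that contain the given text (case-insensitive)."""
    needle = contains.lower()
    return [name for name in vm_names if needle in name.lower()]

# Hypothetical VM names that follow a readable naming convention.
vm_names = ["Payroll Web 01", "Payroll DB 01", "Intranet Web 01"]

payroll_app = members(vm_names, "payroll")
print(payroll_app)
```

With a cryptic convention such as SG1-D01-INS-0001W-PRD, the same selection would need pattern matching on positions within the name, which is exactly what you want to avoid.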

Once apps are mapped, you can do something like this.

Step 5: Consider Debug-ability

Things go wrong. Especially in production. Your architecture should lend itself to troubleshooting.

A major area is to ensure the counters are reliable, else it’s hard to troubleshoot performance. The CPU Contention counter, which is the main counter for IaaS Performance SLA, is greatly affected by Power Management. Ensure your ESXi power management follows this guide by Sunny Dua.

Once you have that in place, you will be able to prove that your Architecture lives up to expectations. Use the dashboards from Operationalize Your World to show that proof!

VMware vCenter Server 6.0 Update 1b

The VMware vSphere team released Update 1b a few days ago. The build number is 3343019.

As usual, it’s wise to review the Release Notes before making changes to a live environment. You need to review both the vCenter Server release notes and the ESXi release notes.

If you are already on Update 1, and you are using the vCenter appliance, the update to 1b is pretty straightforward. If you are not yet on Update 1, then there are more steps required. Others have documented the steps well, and some good examples are here and here.

The steps are identical for vCenter appliance and the PSC appliance.

Let’s start with the PSC. I don’t think the order matters since this is just a minor update. However, the manual says that “Before you update a vCenter Server with an external PSC, you must apply the patches to the PSC and its replicating partners, if any in the vCenter SSO domain.”

The VMware ASEAN Lab has 2 external PSC VMs in a single domain. Since this is the setup, let’s start with the first PSC, before we do its replicating partner.

You can do the update via the CLI or the UI. I’ll use the UI here so I can give you screenshots. Log in as root (not administrator@vsphere.local) at https://Your-PSC-address:5480/

PSC 01

When I updated to Update 1, I had configured my vCenter to check for updates automatically. So it’s a pleasant experience to see that it has detected the update. Notice the build number matches. There is a KB article linked to it, which gives a bit more info, such as the size of the update (1.5 GB). At 1.5 GB, this will take a while to complete.

To update, simply click on the Install Updates link and follow the wizard.

PSC 02

Examples of third-party products are JRE, tcServer, and SLES OS components. Proceed to update, and you will see the familiar progress below. Click on Show Details to see the actual commands executed. The latest status is shown at the top, so if you want to read from the beginning, scroll down. The Stage Packages step took 5 minutes in my case for a PSC and 28 minutes for vCenter. It is safe to click the browser refresh button.

PSC 03

The longest step is the Pre install scripts. In my case, this has been running for >10 minutes.

PSC 04

I had to go and pick up my wife at the airport, so I left the upgrade running. When I came back, it was already done. This is what it should look like. Notice the build number and release date match the release notes.

PSC 05

And that’s it! You then repeat for the other replicating PSC in the domain. Once done, you do the same steps for the vCenter.

You might be curious how the update impacts the load on the PSC VM. This VM has 2 vCPUs. As you can see below, the spike is minimal. The CPU Run counter hit 10348 ms in one 20-second sample, which is around 25%, as this is a 2 vCPU VM (the maximum is 40000 ms: 2 vCPUs x 20000 ms per sample).
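The conversion from CPU Run to a percentage is simple arithmetic, sketched below in Python. It assumes the counter is a millisecond summation over a 20-second sample, as in the numbers above. The function name is my own.

```python
def cpu_run_percent(run_ms, vcpus, interval_s=20):
    """Convert a CPU Run summation (ms) into a percentage of the VM's capacity."""
    # Each vCPU can run for at most interval_s * 1000 ms in one sample.
    max_ms = vcpus * interval_s * 1000
    return 100.0 * run_ms / max_ms

percent = cpu_run_percent(10348, vcpus=2)
print(f"{percent:.2f}%")
```

For the PSC above, that is 10348 out of a possible 40000 ms, or just under 26%.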

PSC 30

Let’s look at Storage. The spike at 5:20 pm is the time I did the update. It’s below 300 IOPS each for reads and writes.

PSC 31

If you want to configure the auto update, simply click on the Settings button. It checks for updates weekly, which is reasonable in most cases.

PSC 26

BTW, the PSC also has the https://Your-PSC-address/psc address, while the vCenter only has the https://Your-PSC-address:5480/ address. The /psc address requires administrator@vsphere.local, not root. When you log in, you get the screen below.

PSC 1

From here, there is a link to the https://Your-PSC-address:5480/ address

PSC 2

Happy updating! FYI: both vCenter Servers and both PSCs were updated successfully. I didn’t hit any errors.

VMware vCenter Support Assistant 6.0

There are many articles explaining this useful little product, so I will just focus on items that I was not able to find elsewhere. I’d recommend you read this great post by Vladan.

Installation is pretty straightforward as it is a virtual appliance. You know you’ve got the deployment right when the console screen looks like the one below. I was expecting it to show the FQDN instead of the IP.

11

The above should have stated that you should log in as root. This useful KB Article has a small mistake. The password is not specified during install. Rather, it is hard-coded to vmware. I learned this from a great post by Chris Wahl here.

Once you log in, you will see the following screen:

12

Enter your vSphere 6 Platform Service Controller address. It will automatically append the https, the port number and the rest of the URL. You just need to type the IP or FQDN.

13

Once it finds it, you need to register. I use the usual administrator@vsphere.local.

14

Click Finish in the above screen. You will see the screen below. I created a separate service account so I can tell when it is Support Assistant that logs in. There is a minor limitation here: it shows only 1 vCenter. There are actually 2 vCenter Servers in the lab, sharing the same PSC. I am not able to add the second one manually, as it says it cannot find it.

15

Click the Next button. The rest is pretty straightforward, and Chris Wahl has explained it here. As Chris said, the rest of the configuration is done in the vSphere Web Client. This is what mine looks like after I configured it. Remember I wrote that it is only a minor issue that it does not show my second vCenter? That’s because it actually does see it. The screenshot below shows that it automatically recognises both.

16

Besides sending the email notification, it actually integrates into your vSphere Web Client and puts all the issues it finds there. To see them, go to Monitor –> Issues. There is a new category under Triggered Alarms.

17

You can see the result of the scheduled collection in the Monitor tab. I scheduled mine, and the first run completed successfully, as you can see below.

21