Category Archives: Architecture

This category covers both architecture and engineering. It does not cover operations and strategy.

1000 VM per rack is the new minimum

The purpose of the eye-catching title is to drive home the point that you need to look at the entire SDDC, not just a component (e.g. Compute, Storage, Network, Security, Management, UPS). Once you look at the SDDC infrastructure in its entirety, you may be surprised that you can shrink your footprint.

The purpose is not to say that you must achieve 1000 VMs per rack. It is also possible that you can’t even achieve 100 VMs per rack (for example, if you are running all Monster VMs). I’m just using a visual number so it’s easier to see that there is a lot of inefficiency in a typical data center. 1000 VMs is simply an easy number to remember 🙂

If your entire data center shrinks into just 1 rack, what happens to the IT Organisation? You are right, it will have to shrink also.

  • You may no longer need 3 separate teams (Architect, Implement, Operate).
  • You may no longer need silos (Network, Server, Storage, Security).
  • You may no longer need the layers (Admin, Manager, Director, Head).

With fewer people, there is less politics and the whole team becomes more agile.

The above is not just my personal opinion. Ivan Pepelnjak, a networking authority, shared back in October 2014 that “2000 VMs can easily fit onto 40 servers”. I recommend you review his calculation in that blog article. I agree with Ivan that “All you need are two top-of-rack switches” for your entire data center. Being a networking authority, he elaborates from the networking angle. I’d like to complement it from the server angle.

Let’s do a quick calculation to see how many VMs we can place in a standard 42 RU rack. I’d use server VMs, not desktop VMs, as they demand more resources.

I’d use a 2 RU, 4-node ESXi form factor, as this is a popular form factor. You can find an example at the SuperMicro site. Each ESXi host has 2 Intel Xeon sockets and all-flash local SSDs running vSAN. With the Intel Xeon E5-2699, each ESXi host has 40 physical cores. Add a 25% Intel Hyper-Threading benefit and you can support ~30 VMs with 3 vCPU each on average, as there are enough cores to schedule the VMs: 90 vCPU in total divided by 50 logical cores (40 physical + HT). This number is even better with Xeon Platinum.

The above takes into account that a few cores are needed for:

  • VMkernel
  • NSX
  • vSAN
  • vSphere Replication
  • NSX services from partners, which take the form of a VM instead of a kernel module.

That’s 30 VMs per ESXi host, a 30:1 consolidation ratio, which is a reality today. You have 4 ESXi hosts in a 2 RU form factor, so 30 x 4 = 120 VMs fit into 2 RU of space. Let’s assume you standardise on an 8-node cluster and do N+1 for HA. That means a cluster with HA will house 7 ESXi x 30 VMs = 210 VMs. Each cluster occupies only 4 RU, and it comes with shared storage.
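If you want to replay the consolidation math with your own numbers, here is a minimal Python sketch. Every input is an assumption taken from this post (40 physical cores, 25% Hyper-Threading benefit, 3 vCPU per VM, a 2 RU 4-node chassis, an 8-node cluster with N+1); swap in your own hardware and sizing.

```python
# Consolidation math from this post. All inputs are assumptions; replace with your own.
physical_cores_per_host = 40      # 2 sockets per host (figure used in this post)
ht_uplift = 0.25                  # ~25% scheduling benefit from Hyper-Threading
vms_per_host = 30                 # target consolidation ratio (30:1)
vcpu_per_vm = 3                   # average server VM size
hosts_per_chassis = 4             # 2 RU, 4-node form factor
nodes_per_cluster = 8             # standard cluster size
ha_spare = 1                      # N+1 for HA

effective_cores = physical_cores_per_host * (1 + ht_uplift)        # 50 logical cores
overcommit = (vms_per_host * vcpu_per_vm) / effective_cores        # 90 / 50 = 1.8

vms_per_chassis = vms_per_host * hosts_per_chassis                 # 120 VMs in 2 RU
vms_per_cluster = vms_per_host * (nodes_per_cluster - ha_spare)    # 210 VMs per cluster

print(f"Overcommit: {overcommit:.1f} vCPU per logical core")
print(f"{vms_per_chassis} VMs per 2 RU chassis, {vms_per_cluster} VMs per 4 RU cluster")
```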

To hit ~1000 VMs, you just need 5 clusters (5 x 210 = 1050 VMs). In terms of rack space, that’s just 5 x 4 RU = 20 RU. Half a rack!

Let’s do ~1500 VMs. That takes 7 clusters (7 x 210 = 1470 VMs). If you run only 1000 VMs on that footprint, it means you can have larger VMs.


A standard rack has 42 RU. You still have 42 – 28 = 14 RU. That’s plenty of space for Networking, Internet connection, KVM, UPS, and Backup!
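Continuing the sketch above, here is the rack-level arithmetic. The 210 VMs and 4 RU per cluster come from the cluster math; the 42 RU rack is standard. Again, treat these as assumptions to adjust.

```python
# Rack-level view, continuing the consolidation sketch above.
vms_per_cluster = 210     # 7 usable hosts x 30 VMs (from the cluster math above)
ru_per_cluster = 4        # two 2 RU, 4-node chassis per 8-node cluster
rack_size_ru = 42         # standard rack

for clusters in (5, 7):
    vms = clusters * vms_per_cluster
    ru_used = clusters * ru_per_cluster
    ru_left = rack_size_ru - ru_used
    print(f"{clusters} clusters: ~{vms} VMs in {ru_used} RU, {ru_left} RU left")

# 5 clusters: ~1050 VMs in 20 RU, 22 RU left
# 7 clusters: ~1470 VMs in 28 RU, 14 RU left
```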

Networking will take only 2 x 2 RU. You can get 96 ports per 2 RU; Arista has models you can choose here. Yes, there is no need for a spine-leaf architecture. That simplifies networking a lot.

KVM will take only 1 RU. With iLO, some customers do not use KVM at all, as KVM encourages physical presence in the data center.

If you still need a physical firewall, there is space for it.

If you prefer external storage, you can easily put 1400 VMs into a 2 RU all-flash array. Tintri has an example here.

I’ve provided a sample rack design in this blog.

What do you think? How many racks do you still use to handle 1000 VMs?

Updates

  • [7 Nov 2015: Tom Carter spotted an area I overlooked. I forgot to take the power requirements into account! He was rightly disappointed, and this is certainly disappointing for me too, as I used to sell big boxes like the Sun Fire 15K and HDS 9990! On big boxes like these, I had to ensure that the customer’s data center had the correct CEE-form connectors. Beyond just the amperage, you need to know if they are single-phase or three-phase. So Tom, thank you for the correction! Tom provided his calculation in Ivan’s blog, so please review it]
  • [15 Nov 2015: Greg Ferro shared in his article that 1000 VMs per rack is certainly achievable. I agree with him that it’s a consideration, not a goal nor a limit. It all depends on your application and situation]
  • [27 Mar 2016: Intel Xeon E5-2699-V4 is delivering 22 cores per socket, up from 18 cores in v3]
  • [16 July 2018: vSAN has wide adoption. Xeon Platinum has even more cores, and the price of local SSDs has gone down]

75% discount code for the SDDC Performance & Capacity Management book

Code is TechSummit15

With the 75% discount, it’s only US$6.75.

The code works only at the publisher’s web site. It does not work at Amazon or other sites. The direct link to the book is this.

It is also for the soft copy only. Good for the environment, no shipping fee, and it’s an immediate download 🙂

For those who are not familiar, the title is rather misleading. As shared here, I wanted to name it SDDC Performance and Capacity Management. That was what I proposed to the publisher, as it is not actually a product book. It is more of an architecture or best-practice book, focused on performance and capacity. Out of the 260 pages, the bulk of the book is not about vRealize Operations. Product-wise, the book covers vSphere more than it covers vRealize Operations. So it’s actually relevant if you just want to master those vSphere counters.

Because it is not a product book, you can in fact apply a lot of the concepts even if you do not have vRealize Operations. If you are looking for a product book, the best one is here.

Multi-hypervisor consideration

My customer was considering adding a second hypervisor because the analysts say it is a common practice. My first thought as an IT architect: just because others are doing it does not mean it is a good idea. Even if it is a good idea, and even a best practice, that does not mean it is good for you. There are many factors that make your situation and conditions different from others’.

Before I proceed further, we need to be clear on the scope of the discussion. This is about multi-hypervisor vs single-hypervisor, not hypervisor A vs B. To me, you are better off running Hyper-V or Acropolis or vSphere completely than running more than one. At least you are not doubling complexity and having to master both. If you cannot even troubleshoot vSphere + NSX + vSAN properly, why add another platform into the mix?

To me, one needs to be grounded before making a decision. This allows us to be precise. Specific to the hypervisor, we need to know which cluster should run the 2nd hypervisor. Because of HA/DRS, a vSphere cluster is the smallest logical building block. I treat a cluster as 1 unit of computing. I make each member run the same ESXi version and patch level; hence running a totally different hypervisor in the same vSphere cluster is out of the question for me.

In order to pinpoint which cluster should run the 2nd hypervisor, you need to look at your overall SDDC architecture. This helps you ensure that the 2nd hypervisor fits well into your overall architecture. So start with your SDDC Architecture. You have that drawing, right? 😉

I have created a sample for 500 server VMs and 1000 VDI VMs. Review it and see where you can fit the 2nd hypervisor. For those with a larger deployment, the sample I provided scales to 2000 server VMs and 5000 VDI VMs. That’s large enough for most customers. If yours is larger, you can use it as a pod.

It’s a series of posts, and I go quite deep. So grab your coffee and review it carefully.

I am happy to wait 🙂

Done reviewing? Great!

What you need to do now is to come up with your own SDDC architecture. Likely, it won’t be as optimized and ideal as mine, as yours has to take brownfield reality into account.

You walk from where you stand. If you can't stand properly, don't walk.

Can you see where you can optimize and improve your SDDC? A lot of customers can improve their private cloud, gaining better capability while lowering cost and complexity, by adding storage virtualization and network virtualization. If what you have is just server consolidation, then it is not even an SDDC. If you already have an SDDC, but you’re far from AWS or Google levels of efficiency and effectiveness, then adding a 2nd hypervisor is not going to get you closer. Focus first on getting to an SDDC or private cloud.

Even if you have the complete architecture of SDDC, you can still lower cost by improving Operations. Review this material.

Have you designed your improved SDDC? If you have, there is a good chance that you have difficulty placing a 2nd hypervisor. The reason is that a 2nd hypervisor de-optimizes the environment. It actually makes the overall architecture more complex.


The hypervisor, as you quickly realized, is far from a commodity. Here is a detailed analysis on why it is not a commodity.

This additional complexity brings us to the very objective of a 2nd hypervisor. There are only 2 reasons why a customer adds a second vendor to their environment:

  • The first one does not work
  • The first one is too expensive

Optimizing SDDC Cost

In the case of VMware vSphere and SDDC, I think it is clear which one is the reason 🙂

So let’s talk about cost. With every passing year, IT has to deliver more with less. That’s the nature of the industry, hence your users expect it from you. You’re providing an IT service. Since your vendors and suppliers are giving you more with less, you have to pass this on to the business.

If you look at the total IT cost, the VMware cost is a small component. If it were a big component, VMware’s revenue would rival that of many IT giants. VMware’s revenue is much smaller, and that’s comparing against just the infrastructure revenue of these large vendors. For every dollar a CIO spends, perhaps less than $0.10 goes to VMware. While you can focus on reducing this $0.10 by adding a second hypervisor, there is an alternative. You can take the same virtualization technology that you’ve applied to the server and apply it to the other 2 pillars of the data center. Every infrastructure consists of just 3 large pillars: Server, Storage, and Network. Use the same principles and experience, and extend virtualization to the rest of your infrastructure. In other words, evolve from server consolidation to SDDC.

What if Storage and Network are not something you can improve? In a lot of cases, you can still optimize your Compute. If you are running 3-5 year old servers, moving to the latest Xeon will help you consolidate more. If your environment is small, you can consider single-socket servers; I wrote about it here. Reducing your socket count means fewer vSphere licenses. You can use the savings to improve your management capability with vRealize Operations Insight.
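To illustrate the socket-count point, here is a rough sketch with purely hypothetical figures. The vCPU demand, overcommit ratio, and per-socket core counts below are made-up assumptions for illustration only; vSphere was licensed per CPU socket at the time of writing, so fewer sockets means fewer licenses.

```python
import math

# Purely hypothetical figures to illustrate how newer CPUs shrink socket (license) counts.
total_vcpu_demand = 1200          # assumed aggregate vCPU demand of the estate
overcommit = 2.0                  # assumed vCPU : physical core ratio

old_cores_per_socket = 10         # assumption: a 3-5 year old Xeon
new_cores_per_socket = 22         # assumption: a then-current Xeon (e.g. E5-2699 v4)

def sockets_needed(cores_per_socket: int) -> int:
    cores_needed = math.ceil(total_vcpu_demand / overcommit)
    return math.ceil(cores_needed / cores_per_socket)

print(f"Old servers: {sockets_needed(old_cores_per_socket)} sockets to license")   # 60
print(f"New servers: {sockets_needed(new_cores_per_socket)} sockets to license")   # 28
```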

Even without this article, a lot of you already realized that adding a 2nd hypervisor is not the right thing to do. I heard it directly from almost every VMware architect, engineer, and administrator on the customer side. You’re just trading cost from one bucket to another. This is because the hypervisor is not merely a kernel that can run VMs. That small piece of software is at the core of your SDDC. Everything on top of it depends on it and leverages its API heavily. Everything below it is optimized for it. It is far from a commodity. If you have peers who still think it’s a commodity, I hope this short blog article helps.

Have an enjoyable journey toward your SDDC, whichever hypervisor it may be.