If you are to architect a virtual infrastructure for 500 VM, what will it look like? The minor details will certainly differ from one implementation to another, however the major building blocks will be similar. The same goes for say a 2000 VM class environment. Given the rapid improvement in hardware price/performance, I categorise 500 – 2500 VM as a medium SDDC. 2500 VM used to be a large farm, occupying rows of racks in a decent size data center. I think in 2016, it will be merely 1-3 racks, inclusive of network and storage! What used to be the whole data center has become the size of a small server room! Yes, a few experts are all it takes to manage 2500 VM.
Before you proceed, I recommend that you read the official VMware Validated Design. I've taken idea and solution from there and added my 2 cents. You should also review EVO SDDC, especially this interview with Raj Yavatkar, VMware Fellow and Lead Development Architect of EVO SDDC.
I’m privilege to server customers with >50K server VM, and even more desktop VM. For customers with >10K VM per physical data center, I see 2015-Q4 Pod to consist of 2 racks and house 2000 – 3000 VM. While data center power supply is reliable, most customers take into rack failure or rack maintenance. So the minimum size is 2 physical racks. The pod is complete and can stand on its own. It has server, storage, network and management. It is 1 logical unit, and managed as 1. They are patched, updated, secured and upgraded as 1. It may not have its own vCenter, as you do not want to have too many vCenter as that increases complexity. Because of this operation and management challenge, a pod has only 1 hypervisor. You either go with VMware SDDC Pod, or pod from other vendors. If you are going with multiple hypervisors, then you will create 2 independent pods. Each pod will host similar number of VMs. If your decision to go multiple hypervisors because you think they are commodity, read this blog.
I understand that most customers have <2500 VM. In fact, in my region, if you have >1000 VM, you are considered large. So what does a pod look like when you don’t even have enough workload for half a pod? You don’t have economics of scale, so how do you optimize a smaller infrastructure?
Architecture is far from trivial, so this will be a series of blogs.
- Part 1 (this blog) will set the stage, cover Requirements, provide Overall Architecture and summary.
- Part 2 covers Network Architecture
- Part 3 will cover Storage Architecture (coming after VMworld, as I need some details from Tintri!)
- Part 4 covers the Rack Design.
- Part 5 explains the design considerations I had when thinking through the solution. This is actually critical. If you have a different consideration, you will end up with a different design.
- Part 6 covers the methodology I used.
To enable us to discuss, we need to take an example. The following table scopes the size and requirements:
The SDDC will have to cater for both server VM and desktop VM. I call them VSI and VDI, respectively. To save cost, they share the same management cluster. They have their own vCenter as this allows Horizon to be upgraded independently of the server farm.
It has to have DR capability. I’m using VMware SRM and vSphere Replication. It has to support active-active application too. I consider VDI as an application, and in this architecture, it is active/active, hence DR is irrelevant. Because I have Active-Active at application layer already, I do not see a need to cater for Disaster Avoidance.
The server farm is further broken into Service Tiers. It is common to see multiple tiers, so I’m using 3. Gold is the highest and best tier. Service Tier is critical, and you can see this for more details. The 3 tier of service is defined below:
VMs are also split into different environments. For simplicity, I’d group them into Production and Non-Production. To make it easier to comply with audit and regulation, I am not mixing Production and Non-Production in the same vSphere cluster. Separate cluster also allows you to test SDDC changes in a less critical environment. The problem is the nasty issue typically happens at production. Just because it’s running smoothly in Non-Production for years does not mean it will be in Production.
For the VDI, I simply go with 500 VM per cluster, as 500 is an easy number to remember. In a specific customer environment, I normally refine the approach and use 10 ESXi host per cluster as the number of VDI VMs vary depending on the user profile.
SDN is a key component of SDDC. NSX best practices tells to have a dedicated cluster for Edge. I’m including a small cluster on each physical data center.
VMware best practice also recommends a separate cluster for management. I am extending this concept and call it IT + Management Cluster. It is not only for management, which is out of band. It is also for core or shared services that IT provides to business. These services are typically in-band.
Based on all of the above requirements, scope, and consideration, here is what I’d propose:
It has 2 physical data centers, as I have to cater for DR and active-active applications. It has 2 vCenters servers for the same reason. Horizon has its own vCenter for flexibility and simplicity.
The physical Data Center 1 serves the bulk of production workload. I’m not keen on splitting Production equally between 2 data centers as you need a lot of WAN bandwidth. As you can see in the diagram, I’m also providing active/active capability. Majority of traffic is East – West. I’ve seen customers who have big pipes and yet encounter latency issue even though the link is not saturated.
The other reason is to force the separation between Production and Non Production. Migration to Production should be controlled. If they are in the same physical DC, it can be tempting to shortcut the process.
- Tier 1. I further split the Gold tier into 2. This is to enable mission critical applications to have long distance vMotion. Out of 100 VM, I allocate 25% for active/active applications and Disaster Avoidance (DA).
- Tier 2. I split it into 2 physical data center, as I need to meet the DR requirements. Unlike Tier 1, I no longer provide DA and active/active application.
- Tier 3. I only make this available for Non-Production. An environment with just 300 Production VM is too small to have >2 tiers. In this example, I am actually providing 2+ tier, as the Gold Tier has option for active/active application and DA.
My sizing is based on the simple model of consolidation ratio. To me, they are just guidance. For proper sizing, review this link. You may wanna get yourself a good cup of coffee as that’s a 5-part series.
Let’s now add the remaining component to make the diagram a bit more complete. Here is what it looks like after I add Management, DR and ESXi.
DR with SRM. We use SRM to failover Tier 1 into Tier 3. During DR, we will freeze the Tier 3 VMs, so Tier 1 VMs can run with no performance impact. I’ve made the cluster size identical to ensure no performance loss. For Tier 2, it will failover to Tier 2. So there is 50% performance loss. I’m drawing the arrow one-way for both, in reality you can fail over in either direction.
ESXi Sizing. I have 2 sizes: 2 socket and single-socket. The bigger square is 2 sockets, and the smaller one is 1 socket. Please review this for why I use single socket. I’m trying to the cluster size 4 – 12 nodes, and I try not to have too many sizes. As you can see, I do have some small cluster as there is simply not enough workload to justify more node.
We’ve completed the overall architecture for 500 server VM and 1000 VDI. Can we scale this to 2000 server VM and 5000 VDI, with almost no re-architecting? The answer is yes. Here is the architecture. Notice how similar it is. This is why I wrote in the beginning that “the major building blocks will be similar”. In this case, I’ve shown that they are in fact identical. My little girl told me as I went back and forth between the 2 diagram…. “Daddy, why are drawing the same thing two times?” 🙂
The only changes above is just the ESXi sizes and cluster size. For examples:
- For the VDI, I have 5 clusters per site.
- For the Tier 1 server VM, I have 3 clusters. Each has 8 ESXi host. I keep all 8 to make it simpler.
- For the Tier 3 server VM, I have 2 clusters. Each has 12 ESXi hosts. Total is 24 hosts, so it’s enough to run all Tier 1 during DC-wide failover.
By now, you likely notice that I have omitted 2 large components of SDDC Architecture. Can you guess?
Yup, it’s Storage and Network.
I will touch Storage here, and will cover Network on a separate blog. I’m simplifying the diagram so we can focus on the storage subsystem:
I’m using 2 types of storage, although we can very well use VSAN all the way. I use VSAN for VDI and IaaS clusters (Management and Network Edge), and classic array for Server clusters (Tier 1, 2, and 3).
I’ve added the vSphere integration that Storage arrays typically have. All these integration need specific firmware level, and they also impact the way you architect, size and configure the array. vSphere is not simply a workload that needs a bunch of LUNs.
I’ve never seen an IT environment where the ground team is not stretched. The reality of IT support if you are under-staff, under-trained, lack of proper tools and bogged down by process and politics. There are often more managers than individual contributors.
As you can see from this article, the whole thing becomes very complex. Making the Architecture simple pays back in Operation. It is indeed not a simple matter. This is why I believe the hypervisor is not a commodity at all. It is your very data center. If you think adding Hyper-V is a simple thing, suggest you review this. That’s written by someone with actual production experience, not consultant who leave after project is over.
As Architect, we all know that it is one thing to build, it is another to operate. The above architecture requires a very different operation than classic, physical DC architecture. SDDC is not physical DC, virtualised. It needs a special team, led by the SDDC Architect.
In the above architecture, I see that adding a second hypervisor as “penny wise pound foolish”. If you think that results in a vendor lock in, kindly review this and share your analysis.
- Not able to do Disaster Avoidance. The main reason is I think it increases cost and complexity with minimal additional benefit. For critical applications, it is already protected with Active/Active at application layer, making DR and DA redundant. For the rest, it already has SRM.
BTW, if you want the editable diagram, you can get it here. Happy architecting! In the next post, I’ll cover Network architecture..