
Multi-hypervisor consideration

My customer was considering adding a second hypervisor because the analysts say it is common practice. My first thought as an IT Architect: just because others are doing it does not mean it is a good idea. Even if it is a good idea, and even a best practice, that does not mean it is good for you. There are many factors that make your situation different from others.

Before I proceed further, we need to be clear on the scope of the discussion. This is about multi-hypervisor vs single-hypervisor. This is not hypervisor A vs B. To me, you are better off running Hyper-V or Acropolis or vSphere completely than running more than one. At least you are not doubling the complexity and having to master two platforms. If you cannot even troubleshoot vSphere + NSX + VSAN properly, why add another platform into the mix?

To me, one needs to be grounded before making a decision. This allows us to be precise. Specific to the hypervisor, we need to know which cluster should run the 2nd hypervisor. Because of HA/DRS, a vSphere cluster is the smallest logical building block. I treat a cluster as 1 unit of computing. I keep every member on the same ESXi version and patch level; hence running a totally different hypervisor in the same vSphere cluster is out of the question for me.

To pinpoint which cluster should run the 2nd hypervisor, look at your overall SDDC architecture. This helps ensure that the 2nd hypervisor fits well into it. So start with your SDDC Architecture. You have that drawing, right? 😉

I have created a sample for 500 server VMs and 1000 VDI VMs. Review it and see where you can fit the 2nd hypervisor. For those with larger deployments, the sample scales to 2000 server VMs and 5000 VDI VMs. That’s large enough for most customers. If yours is larger, you can use it as a Pod.

It’s a series of posts, and I go quite deep. So take your coffee and carefully review it.

I am happy to wait 🙂

Done reviewing? Great!

What you need to do now is come up with your own SDDC Architecture. Likely, it won’t be as optimized and ideal as mine, as yours has to take brownfield reality into account.

You walk from where you stand. If you can't stand properly, don't walk.

Can you see where you can optimize and improve your SDDC? A lot of customers can improve their private cloud, gaining capability while lowering cost and complexity, by adding storage virtualization and network virtualization. If what you have is just server consolidation, then it is not even an SDDC. If you already have an SDDC but you’re far from AWS or Google levels of efficiency and effectiveness, then adding a 2nd hypervisor is not going to get you closer. Focus first on getting to SDDC or Private Cloud.

Even if you have the complete architecture of SDDC, you can still lower cost by improving Operations. Review this material.

Have you designed your improved SDDC? If you have, there is a good chance that you have difficulty placing a 2nd hypervisor. The reason is that a 2nd hypervisor de-optimizes the environment. It actually makes the overall architecture more complex.


The hypervisor, as you quickly realized, is far from a commodity. Here is a detailed analysis on why it is not a commodity.

This additional complexity brings us to the very objective of a 2nd hypervisor. There are only 2 reasons why a customer adds a second vendor to their environment:

  • The first one does not work
  • The first one is too expensive

Optimizing SDDC Cost

In the case of VMware vSphere and SDDC, I think it is clear which one is the reason 🙂

So let’s talk about cost. With every passing year, IT has to deliver more with less. That’s the nature of the industry, hence your users expect it from you. You’re providing IT service. Since your vendors & suppliers are giving you more with less, you have to pass on this to the business.

If you look at the total IT cost, the VMware cost is a small component. If it were a big component, VMware revenue would rival that of the IT giants. In fact it is much smaller than theirs, and I’m referring to just the Infrastructure revenue of these large vendors. For every dollar a CIO spends, perhaps less than $0.10 goes to VMware. While you can focus on reducing this $0.10 by adding a second hypervisor, there is an alternative. You can take the same virtualization technology that you’ve applied to Server and apply it to the other 2 pillars of the Data Center. Every infrastructure consists of just 3 large pillars: Server, Storage, and Network. Use the same principles and experience, and extend virtualization to the rest of your infrastructure. In other words, evolve from Server Consolidation to SDDC.

What if Storage and Network are not something you can improve? In a lot of cases, you can still optimize your Compute. If you are running 3-5 year old servers, moving to the latest Xeon will help you consolidate more. If your environment is small, you can consider single-socket servers. I wrote about it here. Reducing your socket count means fewer vSphere licenses. You can use the savings to improve your management capability with vRealize Operations Insight.

Even without this article, a lot of you have realized that adding a 2nd hypervisor is not the right thing to do. I have heard it directly from almost every VMware Architect, Engineer, and Administrator on the customer side. You’re moving cost from one bucket to another. This is because the hypervisor is not merely a kernel that can run VMs. That small piece of software is at the core of your SDDC. Everything else on top depends on it, and leverages its API heavily. Everything else below is optimized for it. It is far from a commodity. If you have peers who still think it’s a commodity, I hope this short blog article helps.

Have an enjoyable journey toward your SDDC, whichever hypervisor it may be.

SDDC Architecture Consideration

Recently, I published a sample architecture for an SDDC that is based on VMware technology. The post resonated well with readers (thank you!). The sample architecture is probably not what you expected, so I will share the considerations I had when thinking through the architecture.

A vSphere-based SDDC is very different from a physical Data Center. I covered that in depth in my book, so here I will just highlight the relevant points for this post:

  • It breaks best practice, as virtualisation is a disruptive technology and it changes the paradigm. Do not apply a physical-world paradigm to the virtual world. Many “best practices” in the physical world are caused by physical-world limitations. Once the limitation is removed, the best practice becomes a dated practice.
  • Take advantage of emerging technology to break away from constraints. Virtualization is innovating rapidly. Best practice means proven practice, and that might mean outdated practice in this rapidly changing landscape.
  • Consider unrequested requirements, as the business expects cloud to be agile. You have experienced VM sprawl, right? 🙂 Expect to have to adapt to changing requirements.

In the 2+ decades that I have worked in IT, I have been fortunate to learn from or see many great Architects and Engineers. I notice there is an element of style. Each architect does things a little differently. There are also principles that they adopt. One of my favourites is the KISS principle. Besides this, here is another one I hold dear:

Do not architect something you are not prepared to troubleshoot.

If you will not be staying post implementation, think of the support team. A good IT Architect does not set up potential risks for the support person down the line. I also tend to keep things simple and modular. Cost will certainly not be optimal, but it is worth the benefits. You are right, I’m applying the 80/20 rule.

Having said all the above, what do I consider in a vSphere-based architecture?

[Figure: Architecture consideration]


Why do I consider so many things? Because Enterprise IT is complex. I’m not here to paint it as simple. Chuck Hollis explains it very well here, so I won’t repeat it.


Upgradeability

  • This is unique to the virtual world. It is not something you consider in the physical world. A key aspect of SDDC that we don’t normally discuss is how to upgrade the SDDC itself. Upgrading the entire SDDC can be something like renovating your home while living in it.
  • When considering an upgrade, think beyond the next version. Generally speaking, it is safe to assume that there is an upgrade path from the current version to the next version. But do you always follow the latest? Normally you would skip versions, as an upgrade is an expensive operation. If you are upgrading every 3 years, you might be 3 versions behind. The upgrade path may be more complex than you assume.
  • Upgradeability is no 1 in my consideration because an SDDC consists of multiple products from multiple vendors. You need an approach for how you will upgrade your SDDC, as it is unique to your environment.
  • Your architecture will likely span 3 years. It will be operational for possibly many years beyond that. So ask your vendors for an NDA roadmap presentation. While you know the roadmap is not a guarantee, at least you know you’re not implementing something that your vendor has no intention of improving.
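
To show why skipping versions can complicate the upgrade, here is a minimal sketch that searches for the shortest upgrade path in a support matrix. The version labels and the matrix itself are hypothetical placeholders, not real product releases; in practice you would fill the matrix from each vendor's published upgrade and interoperability information.

```python
from collections import deque

# Hypothetical support matrix: each version maps to the versions it can be
# upgraded to directly. The labels are placeholders, not real releases.
DIRECT_UPGRADES = {
    "v1": ["v2"],
    "v2": ["v3", "v4"],
    "v3": ["v4"],
    "v4": [],
}


def upgrade_path(matrix, current, target):
    """Return the shortest chain of versions from current to target, or None."""
    queue = deque([[current]])
    seen = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in matrix.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no supported path exists


path = upgrade_path(DIRECT_UPGRADES, "v1", "v4")
print("Upgrade path:", " -> ".join(path))
print("Number of upgrade steps:", len(path) - 1)
```

In this made-up matrix, a site that is 3 versions behind cannot jump straight to the latest release and needs an intermediate stop. With multiple products in the stack, you would repeat the check per product and then reconcile the order of the steps.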


Debugability

  • How easy is it to troubleshoot the SDDC you architected?
  • Troubleshooting in a virtual environment is harder than in a physical one, as boundaries are blurred and physical resources are shared. So troubleshooting tools are a critical component of an SDDC.
  • For troubleshooting, log files are a rich source of information. I rarely see customers with a proper Log Management Platform (LMP). I shared the criteria in this blog so you can benchmark yours.
  • There are 3 types of troubleshooting:
    • Configuration. Generally this results in something being broken. It used to work normally, and it stops working as expected. The symptom and root cause can be unrelated.
    • Stability. A stability problem means something hangs, crashes (BSOD, PSOD, etc.) or becomes corrupted. This is typically due to a bug or incompatibility.
    • Performance. This can be really hard to solve if the slow performance is short-lived and the system performs well most of the time. You may need to create additional vCenter alarms to catch such infrequent performance issues.
  • For Tier 1 workloads, I’d add extra servers and storage. This means we can isolate the problematic VM while performing joint troubleshooting with the App team. The extra hardware is not wasted, as for Tier 1 I normally specify N+2 as the Availability Policy.
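
The N+2 policy mentioned above can be turned into a quick sizing check. This is a minimal sketch: the VM count and the consolidation ratio in the example are assumed figures for illustration, not numbers from the sample architecture.

```python
import math


def hosts_required(vm_count: int, vms_per_host: int, spare_hosts: int = 2) -> int:
    """Hosts needed to run vm_count VMs, plus spare_hosts extra hosts (N + spare)."""
    if vm_count < 0 or vms_per_host < 1 or spare_hosts < 0:
        raise ValueError("counts must be non-negative and vms_per_host at least 1")
    n = math.ceil(vm_count / vms_per_host)  # hosts needed to carry the load
    return n + spare_hosts


# Assumed example: 100 Tier 1 VMs at 15 VMs per host.
print("Hosts with N+2:", hosts_required(100, 15))
print("Hosts with N+1:", hosts_required(100, 15, spare_hosts=1))
```

The 2 spare hosts are what let you move a problematic VM onto otherwise idle hardware during joint troubleshooting without eating into the capacity reserved for a host failure.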

Manageability and Supportability

  • This is related to, but not the same as, Debugability.
  • This relates to things that make day-to-day management easier: monitoring counters, reading logs, setting up alerts, big screen projectors, etc. Make it easy for the front line help desk team to support your SDDC.
  • Moving toward “1 VM 1 OS 1 App”. In a physical data center, some physical servers serve multiple purposes. In the virtual world, we can afford to run 1 App per VM, and should.
  • A good design makes it harder for the Support team to make human errors. Virtualisation makes tasks easy, sometimes way too easy relative to the physical world. Consider these operational and psychological changes in your design. For example, I separate Production and Non Production into separate clusters. I also put Non Production into a separate physical Data Center, so the promotion to Production is a deliberate effort.
  • Supportability also means using components that are supported by the vendors. This should be obvious, as we should not deploy an unsupported configuration.


Cost

  • I hope you are not surprised that this appears as no 4 in the list, not no 1. I’m mindful of keeping cost low, as you can see in the choice of hardware and removal of certain features. Cost was a key factor for not having Disaster Avoidance and Active/Active at the Infrastructure layer.
  • The Secondary Site serves 3 purposes to reduce cost:
    • DR
    • Running Non Production and other workload
    • A test environment for the SDDC itself.
  • VMs from different Business Units are mixed in 1 cluster to avoid provisioning extra clusters. If they can share the same LAN and SAN, there is no technical reason why they cannot share the same hypervisor. So physical Network, Storage and Server are all shared.
  • Windows and Linux are mixed in 1 physical cluster. If you have a large number of RHEL VMs, separate them into a dedicated cluster to maximise your OS licenses.
  • If you have a large number of Oracle or MS SQL, or other software that charges per physical CPU, dedicating a cluster can result in savings. This is especially true when the software costs a lot more than the IaaS.
  • Business Units cannot buy an entire cluster, as this changes the business model from IaaS to Hosting. This increases complexity and prevents cost optimisation.
  • DMZ and non DMZ are mixed in 1 cluster to avoid provisioning a cluster for DMZ. I am using VMware NSX to achieve the isolation.


Testability

  • Software has bugs. Hardware has faults. We cater for hardware faults by having redundant hardware. What about software bugs? How do you cater for that when the entire data center is now defined in Software? 🙂
  • Because the key component is software, you can do a fair amount of testing in a virtual ESXi. So I’d build an identical stack in the Management Cluster. The reason for choosing the Management Cluster instead of the Non Production cluster is to keep the separation between IT and Business clean and clear.
  • One key reason for not having an active/active infrastructure is to enable testing of the SDDC at the Passive Data Center. When both vSphere environments are active/active, each serving 50% of the production workload, it gets difficult to test/patch/update vSphere. You don’t have a “test environment” as both are live.


Reliability

  • It is related to availability, but it is not the same. Availability is normally achieved by redundancy. Reliability is normally achieved by keeping things simple, using proven components, separating things, and standardising. Standardisation extends beyond technical components. You can and should standardise your process, chargeback model, etc.
  • One thing customers tend to standardise that I no longer believe in is VM size. I used to advocate standard sizes (small, medium, large) where each size is fixed. I learned from customers that having different sizes does not make operations more complex. So I’d allow “odd size” VMs, such as 3 vCPU, in order to optimize performance and minimize cost.


Performance

  • This should be obvious, so I just want to highlight that there are 2 dimensions to performance.
    • How fast can we do 1 transaction? Latency, clock speed, CPU cache size, and SSD quality matter here. One reason why I prefer CPUs with the largest cache is the performance of a single transaction.
    • How many transactions can we do within SLA? Throughput and scalability matter here.
  • Include headroom. Almost all companies need more servers, especially in non production. So when virtualisation happens, we get VM sprawl. As such, the design should have headroom. As one CIO told me, “You need to be ahead of the business.”

Existing Environment and People

  • Brown Field is certainly a major consideration. We walk from where we stand. There are 2 elements to the brown field:
    • People
    • Technology
  • A good CIO considers the skills of his IT team, since when the vendors leave, his team has to support it. Nowadays, most IT departments have complemented their staff with resident engineers from vendors. So the skills include both internal and external (a preferred vendor who complements the IT team).
  • In SDDC, it is impossible to be an expert in all areas. I’m sure you have heard the saying “jack of all trades, master of none”. Consider complementing the internal team by establishing a long-term partnership with an IT vendor. Having a vendor/vendee relationship saves cost initially, but in the long run there is a cost. You can negotiate hard with the vendor, but do not antagonize the human representing the vendor. There is a level of support above the highest level that the vendor provides. That level is called friendship.
  • How does the new component fit into the existing environment? E.g. adding a new Brand A server into a data center full of Brand B servers needs to take into account management and compatibility with common components.

Can you notice a missing consideration? Something we should always consider, that I have not listed.

You are right, it is Security. In some regulated industries, you need to include Compliance as well. This is a big topic, worth a blog by itself 🙂 One thing I want to quickly share here is that physical isolation is not good enough if it is the only solution. You need to complement it with logical isolation, just in case the physical isolation is bridged (intentionally or unintentionally).

VMware SDDC Architecture: Network

In the previous article, I covered the requirements and overall architecture. We have also covered the Compute architecture. To some extent, I’ve covered the Storage architecture as it’s using VSAN (we will dive deeper in a future blog). In this blog, I will cover the Network architecture. I’m not a Network Architect, and have benefited from great blogs by Ivan and Scott.

Logical Architecture

There are 4 kinds of network in VMware SDDC, namely:

  1. VMkernel network
  2. ESXi Agents network
  3. VM network
  4. iLO network

The VMkernel network has grown as VMware adds more capabilities into the ESXi kernel. In vSphere 6, you may need up to 8 IP addresses for each ESXi host. Since they are on the physical network, you will need a VLAN for each. While you can place the VMkernel network on VXLAN, for simplicity and cleaner separation, I’d put them on VLAN. Operationally, I’ve put them on 2-digit VLAN IDs, so it’s easier for anyone in the company to remember that VLAN 10 – 99 are for VMkernel.

[Figure: network 1]

There are many articles on the various VMkernel networks, so I will just touch on them briefly:

  • Management Network. This is predominantly for ESXi. I put all hosts on the same VLAN to prevent operations from becoming too complex. There is no need for isolation as this is out of band. VM traffic does not go here.
  • vMotion network. I keep a separate VLAN per cluster type, as there is no need to vMotion a VM across cluster types. For example, in the Network Edge Clusters, the only VMs living there will be NSX Edge VMs and NSX Distributed Router VMs. There is no need for other VMs to live in this cluster. To minimize human error and ensure segregation, the vMotion network does not span different types of clusters. Let’s take another example to make it clearer. For VDI, we will have 1 – 5 clusters per physical DC. VMs can vMotion among these 5 VDI clusters, so the 5 clusters form just 1 logical pool. There is no business need for a VDI VM to vMotion to the Management Cluster. Also, the VDI server infrastructure (e.g. Horizon View Connection Server) lives in the Management Cluster, so separation helps simplify operations. This interface needs 2 IP addresses if you are doing multi-NIC vMotion.
  • Fault Tolerance network. I apply the same restriction to prevent a VM and its shadow VM from spanning 2 different types of cluster.
  • Storage network. This can be NFS, iSCSI or VSAN. From the above diagram, you can see that I share the Storage network between Server workload, Desktop workload and Non Production. To keep things simpler, you can actually share it with the Management and Edge clusters also. This means you only have 1 VLAN. The reason it is safe to do that is that there is no live migration across cluster types. The VM has to be shut down as there is no shared vMotion network. You also cannot have an FT VM spanning cluster types, as there is no FT network across them.
  • vSphere Replication network. Having a separate network makes monitoring easier.
  • VXLAN network. This is the network where the VM traffic will be tunnelled. Having a separate network makes monitoring easier.

The above will need 6-7 IP addresses per ESXi host. Plan your IP addresses carefully. I personally prefer an easy correlation to my ESXi hosts. For example, ESXi-01 in my environment will have the x.x.x.1 address, and ESXi-99 will have the x.x.x.99 address.
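
The addressing convention above can be generated rather than typed by hand. Below is a minimal sketch using Python's standard `ipaddress` module. The specific VLAN IDs and the `10.10.<vlan>.0/24` subnet layout are my assumptions for illustration; only the 10 – 99 VLAN range and the "last octet equals host number" rule come from the text.

```python
import ipaddress

# Assumed VLAN IDs, all inside the 10-99 VMkernel range. The IDs and the
# 10.10.<vlan>.0/24 subnet layout are illustrative, not the original design.
VMKERNEL_VLANS = {
    "management": 10,
    "vmotion": 20,
    "fault_tolerance": 30,
    "storage": 40,
    "replication": 50,
    "vxlan": 60,
}


def vmkernel_plan(host_number: int, site_prefix: str = "10.10") -> dict:
    """Map each VMkernel network to the address of ESXi-<host_number>."""
    if not 1 <= host_number <= 99:
        raise ValueError("host number must be between 1 and 99")
    plan = {}
    for name, vlan in VMKERNEL_VLANS.items():
        if not 10 <= vlan <= 99:
            raise ValueError(f"VLAN {vlan} is outside the VMkernel range 10-99")
        subnet = ipaddress.ip_network(f"{site_prefix}.{vlan}.0/24")
        # The last octet mirrors the host number: ESXi-01 -> .1, ESXi-99 -> .99
        plan[name] = subnet.network_address + host_number
    return plan


for network, address in vmkernel_plan(1).items():
    print(f"ESXi-01 {network:16} VLAN {VMKERNEL_VLANS[network]}  {address}")
```

This sketch allocates one address per network, so multi-NIC vMotion would need a second vMotion entry. Passing a different `site_prefix` gives the second data center its own range.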

ESXi Agent Network

Data Center components are moving to the hypervisor. When they move, they either move into the kernel as a kernel module (e.g. VSAN), or they take the VM form factor. Examples of the VM form factor are the Nutanix Storage VM and TrendMicro Deep Security. Regardless, you need an IP address for every ESXi.

These networks are not VMkernel networks. They are VM networks. However, they are backed by VLAN, not VXLAN. That means they are on the physical network, with physical IP addresses. So you need to plan them.

Now that we’ve covered the ESXi networks, let’s move to the VM Network.

VM Network

All of them, without exception, will be on VXLAN. This decouples them from the physical DC network. In other words, we virtualize the VM Network. Defining it in software allows inter-DC mobility. There will be many VXLAN networks, so I need to plan them carefully too. In the above diagram, I have grouped them into 4 top-level groups. I’d give each group its own range, so I know what kind of workload is running from the VXLAN number alone. Here is an example:

  • VXLAN 10000 – 19999: Server Workload
  • VXLAN 20000 – 29999: DMZ Workload
  • VXLAN 30000 – 39999: Desktop Workload
  • VXLAN 40000 – 49999: Non Production Workload

I have a wide range for each group as I have sub-categories. Yes, this means you will have a lot more VXLANs than VLANs. This is a fundamental difference between networking in SDDC and networking prior to network virtualization. You do not want too many VLANs as that is operationally taxing. VXLAN does not have that issue. Networks become cheap. You can have lots of them. For example, the server workload is split per application. If I give each application up to 10 networks, I can have 1000 applications. With 10 networks per application, I can have a numbering convention. For example:

  • Web server network: xxxx1. Examples are 10001, 10011, 10021, and 10031 for the first 4 applications. This means I know that anything on 1xxx1 is a production web server network.
  • Application server network: xxxx2
  • DB server network: xxxx3

Last, but certainly not least, you should have the iLO network for lights-out management. This is the management network for the physical boxes.

Physical Architecture

[10 Nov 2015: I got a correction from Raj Yavatkar and T. Sridhar that we should not have the spine connect to the northbound switch. If you do that, it creates some interesting issues; spines should be devoted to carrying inter-rack, East-West traffic. I will update the diagrams once I have some time. Thanks Raj and Sridhar for the correction. Much appreciated.]

As an SDDC Architect, you need to know how to implement the above Logical Architecture. It is software defined, but the hardware plays an important role. If necessary, tap the expertise of a Network Architect. In my case, I asked YJ Huang from Arista to help me. I also benefited from Ivan Pepelnjak’s post here.

We start from the base connectivity. The diagram below shows 2 ESXi hosts, and how they are physically connected to network devices. I am using 2x 10 GE cables for the data network, and 1x 1 GE cable for iLO. There is no need for HA for the iLO network. In most cases, 2x 10 GE is more than enough. Know your workload before you decide on 4x 10 GE.

[Figure: network 1-1]

Now that we’ve covered the basics, let’s see what the overall picture looks like when we attach all the ESXi Hosts. The diagram below shows the 2 physical data centers. They have identical physical setups, but different IP addresses. Data Center 1 could have 10.10.x.x, while Data Center 2 has 20.20.x.x. By not extending the physical network, you contain the failure domain.
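
Since the two sites must not share physical address space, it is worth checking the plan for overlaps before anything is cabled. Here is a minimal sketch using the standard `ipaddress` module. The /16 mask is my reading of the 10.10.x.x and 20.20.x.x examples, and the site names are placeholders.

```python
import ipaddress
from itertools import combinations

# Site ranges from the example above; the /16 mask is assumed from "x.x".
SITE_NETWORKS = {
    "dc1": ipaddress.ip_network("10.10.0.0/16"),
    "dc2": ipaddress.ip_network("20.20.0.0/16"),
}


def overlapping_sites(sites: dict) -> list:
    """Return every pair of site names whose address ranges overlap."""
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(sorted(sites.items()), 2)
        if net_a.overlaps(net_b)
    ]


conflicts = overlapping_sites(SITE_NETWORKS)
print("Overlapping sites:", conflicts if conflicts else "none")
```

An empty result means each data center keeps its own physical range, which is what keeps the failure domain contained.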

The diagram shows how my 5 clusters are connected to the switches. I use a Spine-Leaf design as I want to be able to scale without major re-architecting. I’ve drawn the future switches in grey. They naturally come in pairs. I draw the spine-leaf connections thicker as they are 40G.

[Figure: network 2]

Let’s see how the architecture scales to the requirement, which is 2000 server VMs and 5000 VDI VMs. As you can see, it’s essentially an extension. Fundamentally, it remains the same. I do change the cluster placement to keep it simpler. This comes at the cost of re-cabling.

[Figure: Architecture 2000]

You may be wondering why I use 40G between spine and leaf. For the VM Network, 10G is more than sufficient. The reason is the VMkernel Network. The vMotion boundary cuts across pods.

[Figure: network 5]

I hope you find this useful. Keen to hear your thoughts!