The rise of single-socket ESXi Host

When I first started working with VMware back in mid-2008, it was common to see 4-socket ESXi hosts. A cluster of 8 nodes would have 32 sockets, which means 32 vSphere licenses. On top of that, customers also had to pay for the guest OS. Some customers had to pay for both Red Hat and Windows.

With each passing year, Intel delivered more and more cores. I have customers who have excess vSphere licenses, as they went from dual-core to 12-core CPUs over several years.

Fast forward to the current time. Intel launched the 18-core Xeon E5-2699 v3 in Q4 2014, followed by the Xeon E5-2699 v4 in Q1 2016, and then the Xeon Platinum 8176 in July 2017. The latter sports 28 cores! AMD has also joined in with Epyc.

The VMmark results show near-linear performance scaling compared with older Xeons. Some of my customers have managed to reduce the number of ESXi hosts. This is good news for them, as it means:

  • less power consumption
  • less UPS capacity
  • less data center space
  • less air-conditioning capacity
  • a smaller VMware environment (for large customers, this makes management easier)
  • fewer vSphere licenses (which means they can use the maintenance budget to get NSX, vSAN and vRealize)
  • fewer Windows Datacenter licenses, as each one gives unlimited VMs
    • Note: this does not apply to a single socket. See below.
  • fewer RHEL licenses, as each one gives unlimited VMs
  • fewer licenses for software that charges per physical socket. For example, if you run Oracle software or Microsoft SQL Server, the savings will be more than the infrastructure savings.

Going forward, I see customers with large ESXi farms further increasing their consolidation ratios in the next 1-2 years. I see 20:1 becoming common. This means:

  • 15:1 for Tier 1 workload
  • 30:1 for Tier 2 workload
  • 60:1 for Tier 3 workload (double of Tier 2, as price is also half)
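
To see how the per-tier ratios can blend into roughly 20:1 overall, here is a minimal Python sketch. The VM counts per tier are purely hypothetical, chosen only to illustrate a Tier 1 heavy mix, and the helper name `hosts_needed` is my own. It ignores spare capacity for HA, so treat the host counts as a lower bound.

```python
import math

# Consolidation ratios (VMs per host) quoted in the list above.
RATIOS = {"Tier 1": 15, "Tier 2": 30, "Tier 3": 60}


def hosts_needed(vm_count: int, ratio: int) -> int:
    """Hosts required to run vm_count VMs at `ratio` VMs per host, rounded up."""
    return math.ceil(vm_count / ratio)


# Hypothetical VM counts per tier (an assumption, not from a real environment).
vms = {"Tier 1": 150, "Tier 2": 30, "Tier 3": 60}

# Size each tier separately, then compute the overall VM-to-host ratio.
hosts = {tier: hosts_needed(vms[tier], RATIOS[tier]) for tier in RATIOS}
blended = sum(vms.values()) / sum(hosts.values())

print(hosts)
print(f"blended ratio {blended:.0f}:1")
```

With this particular mix, the farm needs 12 hosts for 240 VMs. A mix with more Tier 2 and Tier 3 VMs would push the blended ratio well above 20:1.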

At the other end of the scale, I see customers with a very small number of VMs going down to single-socket ESXi hosts. This opens up use cases that vSAN, NSX or vSphere could not address before due to cost. Remote office/branch office (ROBO) is one such use case. Here, a 4-node vSAN cluster of dual-socket hosts may not make financial sense, as it needs 8 licenses. By going single-socket, the same 4 nodes need only 4 licenses, so the license cost is reduced by 50%.
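
The license arithmetic above can be sketched in a few lines of Python. This is only an illustration, assuming one per-socket license for every populated socket. The function name `socket_licenses` is hypothetical and not part of any VMware tool.

```python
def socket_licenses(hosts: int, sockets_per_host: int) -> int:
    """Per-socket licenses needed, assuming one license per populated socket."""
    if hosts < 0 or sockets_per_host < 1:
        raise ValueError("hosts must be >= 0 and sockets_per_host >= 1")
    return hosts * sockets_per_host


# The ROBO example: a 4-node cluster, dual-socket versus single-socket hosts.
dual = socket_licenses(hosts=4, sockets_per_host=2)
single = socket_licenses(hosts=4, sockets_per_host=1)
saving = 1 - single / dual

print(dual, single, f"{saving:.0%}")
```

The same function reproduces the 2008 example from the start of this post: `socket_licenses(8, 4)` gives 32.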

Thanks to Patrick Terlisten (@PTerlisten) and Flemming Riis (@FlemmingRiis), who corrected me on Twitter: Windows Datacenter Edition comes with a 2-physical-socket entitlement, and it cannot be split across 2 separate physical servers. A Microsoft document titled “Volume Licensing reference guide Windows Server 2012 R2”, dated Nov 2013, states this clearly on page 10:

Can I split my Windows Server 2012 R2 license across multiple servers?
No. Each license can be assigned only to a single physical server.

The extra cores also support the converged architecture. Storage and network can now run in software. We can use the extra cores for services such as:

  • vSAN
  • vSphere Replication
  • vSphere Data Protection
  • NSX distributed router
  • NSX Edge
  • Trend Micro antivirus
  • F5 load balancer
  • Palo Alto Networks firewall
  • etc

With a single socket, the form factor has to be 1RU at most; 2RU would be considered too space-consuming. In some cases, such as VxRail, Supermicro and Nutanix, the 2RU form factor actually hosts 4 nodes, making each node 0.5RU, so to speak.

In the 1RU form factor, you do not have to compromise on storage. For example, the Dell PowerEdge R630 takes 24 x 1.8” hot-plug SATA SSDs of 0.96 TB each, giving you about 23 TB of raw capacity.

You also do not need to compromise on RAM. Review this post to see that it’s a common mistake to have ESXi with excess RAM.

We know that distributed storage works much better on 10Gb networking than on 1Gb networking. A pair of 24-port 10Gb switches can be cost-prohibitive. The good thing is that there are vendors who supply 12-port switches, such as Netgear and ZyXEL.

I’d like to end this post by getting the idea validated by practitioners, folks who actually run technology like vSAN in production, to see if single-socket makes sense. I discussed this with Neil Cresswell, CEO of Indonesian Cloud, and this is what he had to say:

“With the rapidly increasing performance of CPUs, once again the discussion of scale up (fewer big servers) or scale out (more small servers) comes back into play. I remember a client a few years ago that bought an IBM 8 Socket Server and were running 100 VMs on it; at that time I thought they were crazy; why do that vs having 2x 4 socket servers with 50 VMs each! Well, that same argument is now true for 2 socket vs single socket. You can have a dual socket server running 50VMs or 2x single socket servers each running 25 VMs.
As a big believer in risk mitigation, I for one would always prefer to have more servers to help spread my risk of failure (smaller failure domain).”

What’s your take? Do you see customers (or your own infrastructure) adopting single-socket ESXi hosts? Do you see distributed storage at ROBO becoming viable with this smaller config?

I’m keen to hear your thoughts.

10 thoughts on “The rise of single-socket ESXi Host”

    1. Iwan Rahabok Post author

      Thanks Krish for taking time to comment.
      For heavy applications, I’m assuming you mean heavy in terms of CPU and RAM, and not storage and network. In this case, I think 2 sockets, or even 4 sockets, will make more sense.
      In my experience, “heavy” tends to be subjective. Over-sizing is quite a widespread problem. I’ve seen a 16 vCPU VM which was basically idle for months. Over-sizing is common because we tend to follow what we know (a practice dating from the physical machine era). I’d measure all those heavy applications. In my book, I have an example where I track these large VMs. For a banking client, I tracked >100 VMs with 8 vCPU or more. Their utilisation over the past month averaged 3%. That’s 800 vCPU wasted.

  1. Jason Dowd

    Hi Iwan, good post and well written.

    You’ve made a good case for 1 socket CPU’s for the ROBO however I was wondering what your thoughts were on their use in SMB data centers. The idea of scaling out smaller systems instead of scaling up into fewer large blade and storage systems. This isn’t a new idea of course as it’s been made popular by Google and Facebook data centers. However they run custom software and in some cases custom hardware to achieve these economies of scale. Something out of the reach for even large companies as they lack the technical expertise.

    VSAN is a step in the right direction but of course it is limited in scale. Nutanix is interesting with their distributed file system as it is able to scale much more and can use SSD as actual storage not just as a cache as VSAN does. Scaling out with dense compute, storage, and networking just seems to make more sense for a lot of use cases as compared to investing quite heavily in large blade and storage systems.

    I’m sure this idea is not popular with big blade and storage vendors, as the margin on their hefty systems is much higher. With the relationship between EMC and VMware, I question how far VSAN will be developed. Do you see VSAN growing in scale and features to support SMB along the lines of Nutanix?

    Jason Dowd

    1. Iwan Rahabok Post author

      Thanks Jason for taking time to comment.
      The choice of hardware is always a fun thing to discuss. There are pluses and minuses for each. Some folks can get agitated over it though 🙂
      In 2015, I think there are now 4 design choices:
      – Blade
      – 2 sockets. An example is the HP DL 380, probably one of the best-selling ESXi hosts. I see a lot of these models at my customers (some are global banks)
      – 1 socket. This is the one I shared, because the 18-core Xeon 2699 VMmark result was a surprise to me.
      – Converged infrastructure. Nutanix, VSAN, Simplivity.
      If I were to architect a given solution, there are many things to consider. You mention technical expertise, which is a key point I’d discuss with the customer. Blade has its own set of features and implications that we need to be aware of.

      I see converged gaining momentum in 2015. I agree with you that converged makes economic sense in smaller setups. The setup probably cannot be too small, though, as you still need 2x 10 GE switches (I think the price will continue going down. You can tell me better here 🙂 ).

      I do not see EMC drawing the boundary of how far VSAN will be developed. You will see VSAN gaining a couple of interesting features this year. Work with your account SE to see the roadmap. There seems to be a perception that EMC is controlling VMware. I’ve been with VMware for coming up to 7 years now, and my personal experience tells me no. It has been great working with the EMC team.

      I’ve linked up with you at LinkedIn.

      All the best in 2015!

      1. Jason Dowd

        It’s good to hear from the inside that EMC is not influencing product development. Looking in from the outside it has always seemed EMC has been very hands off to the benefit of both companies. Sometimes I get a flare up of cynicism and conspiracies abound 🙂

        I think converged infrastructure will gain as well in 2015. It won’t be viable for larger scale deployments until there is a lot more commercial support. It’s a great idea but until software and hardware vendors are more fully behind it, it’s just not viable for an enterprise. It would be interesting to one day have the option of doing a large scale converged deployment.

        Thanks for linking up with me on LinkedIn. I look forward to more of your posts.

        Best Regards
