Monthly Archives: August 2015

VMware SDDC Architecture: Rack Design

This blog continues our example of VMware SDDC Architecture. I started with the requirements and overall architecture. In a follow up blog, I discussed the Network architecture.

Let’s now discuss the Rack design, as it’s closely related. Make that intertwined, as you will see later. The rack is where all the Server, Network and Storage have to live together, so it’s critical that everyone sits down together and plans this. By everyone I mean the Server Architect, Storage Architect and Network Architect. This should be led by the SDDC Architect. Not too many people, else you may have a committee instead 🙂

With new technology and form factors, the entire infrastructure fits inside a single rack. It is becoming common for customers to save space drastically when moving to a more efficient form factor. I think the era of gigantic and unique equipment is slowly coming to an end. I write that with a bit of sadness, as I used to sell Sun Fire E12K – 25K and HDS 9990. If you are supporting 2000 VM, you should aim for 2 racks of space as your benchmark.

It’s worth repeating: 2000 VM = 2 racks.

Rack Design for 500 VM

In the example below, I have chosen a 2RU 4-Node form factor. You can find many models from Super Micro that use this form factor. From their web site, you can tell that they have a lot of other form factors to choose from, so you do not have to follow this one. Whatever you choose, avoid choosing more than 2 form factors. Standardisation could be the difference between camping in the data center and spending time with your loved ones.

I chose this form factor as I have used it before. It saves space and power. It is easier to handle, as it is lighter than a bigger chassis with the same number of nodes and sockets. It is also not a blade. There is no backplane and there are no proprietary switches. As a result, I do not have to cater for chassis failure. There is no active component in the chassis that needs patches or replacement. It’s just a metal chassis with no electronics. Because of this, I do not have to span a VMware vSphere Cluster across chassis to cater for chassis issues.

What’s the drawback? Cabling. Relative to blade, you have more of them, and they are visible.

Let’s look at the design. I’ll let you digest it for 60 billion nanoseconds….

Architecture rack

What do you think? I hope now you see the complexity in rack design, and why I said it’s the work of the entire team.

I follow the standard placement of Network switches at the top, and Storage at the bottom. The servers fill the space in the middle, and are further split into 2 groups:

  1. Infrastructure:
    1. Network Edge
    2. Management
  2. Workload:
    1. VDI workload
    2. Server workload

You notice something missing? Yes, there are 2 components missing:

  1. UPS
  2. KVM

I’m not familiar with UPS, hence I’m unable to provide advice. What I know is that they can fit inside the rack. I’d place them at the bottom. This means Storage will be above them. In general, put heavy equipment at the bottom.

On KVM, I think there is little need since we have iLO. I’m also a fan of dark data center, and do not like to stand in front of rack working on console. I also think this improves security.

For Storage, I’m using Tintri 820. I will discuss the reason why in the Storage section, which I’m drafting with advice from Jason Stegeman, a Tintri SE based in Australia. The blog post will hopefully appear post VMworld. Tintri takes up 4RU, which is a good space saving.

For Compute, the space required is 14 RU. This gives us 28 ESXi hosts. I only need 25, so there will be empty slots in the chassis. I’m using the following logic in placement:

  • NSX Edge Cluster at the top. This is because the network switch is at the top. Depending on my expansion plan, I may leave space for this cluster to grow. In my diagram, I’m leaving 2 slots, as I’m expecting to grow to 4 nodes.
  • Management Cluster below the NSX Edge Cluster. This is because I’m not expecting it to grow beyond 4 nodes. Depending on my expansion plan, I may leave space for this cluster to grow.
  • VDI Cluster below the Management Cluster.
  • Gap. This is a common expansion area for VDI workload and Server workload. I’m not sure how much each will grow, so by sharing common expansion slots, I’m giving myself flexibility.
  • Server Cluster above the Storage box.

The above Rack Design is for Primary Data Center. It has a total of 25 ESXi hosts.
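The slot arithmetic behind the compute section can be sketched as follows. This is only an illustration of the numbers quoted above (14 RU, 2RU 4-Node chassis, 25 hosts needed); the function name `chassis_capacity` is a hypothetical helper, not something from any VMware or vendor tool.

```python
def chassis_capacity(compute_ru, chassis_ru=2, nodes_per_chassis=4):
    """Return (chassis count, host slots) for a given amount of compute space.

    Defaults model the 2RU 4-Node form factor used in this post.
    """
    chassis = compute_ru // chassis_ru
    return chassis, chassis * nodes_per_chassis


# Figures taken from the post: 14 RU of compute, 25 ESXi hosts required.
chassis, slots = chassis_capacity(14)
hosts_needed = 25
empty_slots = slots - hosts_needed

print(chassis, slots, empty_slots)  # 7 28 3
```

The 3 empty slots are what the placement logic above spreads across the Edge cluster and the shared expansion gap.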

The Secondary Data Center has 27 ESXi hosts. The Rack Diagram is very similar.

Architecture rack secondary DC

Notice that there is plenty of space left in the rack. This means I could have used a standard 1RU server instead of the 2RU 4-Node building block. Using a 1RU server gives you more choice of vendors and models.
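To put a rough number on that trade-off, here is a small sketch comparing the rack units consumed by the two form factors for the host counts in this post (25 and 27). The helper `rack_units_needed` is a hypothetical name used for illustration; it only counts compute space and ignores switches, storage and UPS.

```python
import math


def rack_units_needed(hosts, nodes_per_chassis, chassis_ru):
    """Rack units consumed by `hosts` servers in a given form factor."""
    chassis = math.ceil(hosts / nodes_per_chassis)
    return chassis * chassis_ru


for hosts in (25, 27):
    dense = rack_units_needed(hosts, nodes_per_chassis=4, chassis_ru=2)
    one_ru = rack_units_needed(hosts, nodes_per_chassis=1, chassis_ru=1)
    print(f"{hosts} hosts: 2RU 4-Node = {dense} RU, 1RU servers = {one_ru} RU")
```

Whether the extra rack units for 1RU servers are acceptable depends on the height of your rack and on what else has to share it.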

We have done the equipment placement. We know they will fit into the rack. But there are 2 more items we must consider. Can you guess?

Yes, power and cooling. Just because we can fit the equipment does not mean we can power it. Just because we can power it does not mean we can cool it. Also, we need to consider UPS. All of this depends on the Data Center facility. Generally speaking, you can expect 32 Ampere x 2 ceeform per rack. In an older data center, you may only have 16A x 4 cables. Plan your power carefully, as you do not want to hit the limit. Generally speaking, I’d buffer 10%. So if I’m given 16A, I’d only use up to 14.4A.
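The 10% buffer rule is easy to express as a quick check. The sketch below is an illustration only: `usable_amps` and `within_budget` are hypothetical helper names, and the 14.0A and 15.0A loads are made-up example values, not measurements of any equipment mentioned here.

```python
def usable_amps(feed_amps, buffer=0.10):
    """Current you plan to draw from a feed after keeping a safety buffer."""
    return feed_amps * (1 - buffer)


def within_budget(load_amps, feed_amps, buffer=0.10):
    """True if the planned load stays inside the buffered limit."""
    return load_amps <= usable_amps(feed_amps, buffer)


print(round(usable_amps(16), 2))   # 14.4
print(round(usable_amps(32), 2))   # 28.8

# Hypothetical planned loads on a 16A feed.
print(within_budget(14.0, 16))     # True
print(within_budget(15.0, 16))     # False
```

The real load per feed has to come from the vendors' power specifications and your facility team; the check itself stays the same.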

Scaling to 2000 VM

We’ve discussed the rack design for 500 Server VM + 1000 Desktop VM. What does it look like when we scale to 2000 server VM + 5000 Desktop VM?

As you can guess, because we are dealing with the physical world, the scaling cannot be done without physical re-wiring and relocation of equipment. This can certainly be difficult to execute in a production environment. This is why it is critical to plan ahead. You may even need to buy ahead, and live with extra equipment that you do not actually need yet. One way to protect yourself is to partner with vendors who are willing to invest in the future box, knowing that you will eventually buy it anyway.

Below is the draft Rack Design for 1000 Server VM + 2500 Desktop VM. This is for the Production Data Center, so it has around half the workload. This is an early draft, as I have not applied some of the principles I discussed above.

[Diagram: draft Rack Design for 1000 Server VM + 2500 Desktop VM]

From the above diagram, can you notice something missing? It’s a big component.

Yes, you are right. It’s the shared Storage. I have not included the shared storage. For that, you have to wait for the next blog 🙂 I’m planning to use VSAN and Tintri 820 as the examples. One idea I’m exploring is to have Tintri 820 per rack, serving the ESXi on that rack only. This minimises cabling.

If you are wondering why I end up with this architecture, the design consideration will help to explain.

Capacity Management: What’s wrong with these statements?

Can you figure out why the following statements are wrong? They are all well meaning advice on the topic of Capacity Management. Usually, the questions are:

  • How is my IaaS performing? What’s the performance of my VMware environment?
  • How healthy is it in terms of performance and capacity? Am I running a risk?

 

Regarding Cluster CPU:

  • Physical Core to Virtual CPU Ratio is high at 1:5 times on cluster “XYZ” since this is an important production cluster.
  • The rest of your clusters’ overcommit ratio looks good at around 1:3. Maintain it below this ratio and you’re safe.
  • Keep the over commitment ratio to 1:4 for Tier 3 workload.
  • CPU usage is around 70% on cluster “ABCDE”. Since they are UAT servers, don’t worry. You should get worried only when they reach 85%.
  • The rest of your clusters CPU utilization is around 25%. This is good! You have plenty of capacity left.

Regarding Cluster RAM:

  • We recommend 1:2 overcommit ratio between physical RAM and virtual RAM.
  • Memory Usage on most of your clusters is high, around 70%. You should aim for 50%
  • Cluster “ABCD” is running peak at around 75%. CPU utilization should be less than 70%, so move some VMs out.
  • If you see Active Mem% is high then we should add more RAM to cluster.
  • The counter Memory Active (%) should not exceed 50-60%.
  • Memory should be running at high state on each host.

I’m sure you have heard them, or even given them, in the past. In the past 7+ years with VMware, I know I personally have both given them and heard about them.

The scope of the statements above is obviously a VMware vSphere Cluster. The Cluster is the smallest logical building block, due to HA and DRS. So it is correct that we do capacity planning at Cluster level, and not at Host level or Data Center level.

I have highlighted the parts where the mistakes lie. Can you figure them out?

You should notice a trend by now. They have something in common.

The above statements are wrong as they focus on the wrong item. They look at the cluster, when they should be looking at the VM. They look at the Supplier (Provider), when they should be looking at the Consumer (Customer). What’s important is your VM.

Think of your IaaS business like a restaurant business. It has Dining Area, where your customers live, and Kitchen, where you prepare the food. Guess which one is more important?

You’re right. The dining area.

If everything runs smoothly in the dining area, customers are eating happily, and they are being served on time and on quality, it is a good day for the business. Whether you’re running around in the hot kitchen is a separate, internal matter. The customers need not know about it. The VM Owner does not care if you are fire fighting in the data center.

You should focus on the Customer. Focus on the VM, not the IaaS. How do you do that? Review this solution and let me know your thoughts. Warning, it might make you rethink your architecture 🙂

Meet your VMware CTO Ambassadors at VMworld

VMworld is a place to learn and network. So it is great to see that around half of the VMware CTO Ambassadors are sharing and contributing at VMworld. That’s very high for a group that is not part of any product Business Unit. We are field personnel and individual contributors. The high percentage shows the passion and capability of the group. Some of us are co-delivering sessions with R&D and BU, showing the relationship that the group has. Other than presenting and facilitating, you can also find us at the Office of the CTO booth.

The following VMware CTO Ambassadors will be there: Adam Osterholt, Aidan Dalgleish, Amanda Blevins, Amy Chalifoux, Andrew Murphy, Anoop Jalan, Ben Lin, Charles Saroka, Christopher Cullingford, Christopher Knowles, Dale Carter, Donald Schubot, Eamon Ryan, Ed Hoppitt, Edward (Allen) Shortnacy, Edward Blackwell, Emad Benjamin, Eric Hardcastle, Gary Blake, Greg Mulholland, Iwan Rahabok, Jeff Whitman, Jennifer Green, Jerry Johanes, Jodi Shely, Jonathan Cham, Jonathan McDonald, Josh Gwyther, Julienne Pham, Justin Jones, Kannan Mani, Kim Jahnz, Martijn Baecke, Martin Banda, Michael Francis, Mike O’Reilly, Mostafa Khalil, Patrick Daigle, Peter Bjork, Richard Damoser, Roman Tarnavski, Ryan Pletka, Scott Carpenter, Sid Smith, Sunny Dua, TJ Vatsa, Tomas Fojta, Travis Wood.

Here is the list of sessions and workshops that we’re delivering or facilitating. See you there!

Applications

CNA4859 – Agility in the Datacenter – Workflows and Tools to Speed Application Delivery

  • Roman Tarnavski, CTO Ambassador, VMware
  • Chris Sexsmith – Sr Manager of Field Enablement, Cloud-Native Apps, VMware

STO4525 – Architecting Disaster Recovery of Tier 1 Applications (SAP, Oracle, SQL & Exchange) using Site Recovery Manager and vSphere 6

  • Kannan Mani – Staff Solutions Architect – Data Platforms, VMware
  • GS Khalsa – Senior Technical Marketing Manager, VMware

VAPP4440 – Migrating Large Oracle Footprint to Vblock

  • Kannan Mani – Staff Solutions Architect – Data Platforms, VMware
  • Chandra Mukherjee, KBACE Technologies

VAPP4449 – How VMware Customers Build and Tune High Performance Application Platforms

  • Emad Benjamin – Principal Architect, VMware
  • Wendy Zhao – Global Head of Middleware Engineering, Societe Generale
  • Alessandro Quargnali-Linsley – Systems Engineer, Societe Generale

VAPP4732 – Enterprise Application Architecture Influence on SDDC

  • Emad Benjamin – Principal Architect, VMware
  • Jeff Quinn – Director of Virtualization & Cloud Converged Engineering, DTCC

Cloud Native Applications

CNA5379 – Panel: Enterprise architecture for Cloud-Native Applications

  • Martijn Baecke – Solutions Consultant, VMware
  • Joe Baguley – CTO EMEA, VMware
  • Robbie Jerrom – Senior Solutions Architect, VMware
  • Greg Andsager – VP, Cloud Native Applications, VMware
  • Chris Sexsmith – Sr Manager of Field Enablement, Cloud-Native Apps, VMware
  • Aaron Sweemer – Director of Field Strategy, Cloud-Native Apps, VMware

CNA5479 – Running Cloud-Native Apps on your Existing Infrastructure

  • Martijn Baecke – Solutions Consultant, VMware
  • Robbie Jerrom – Senior Solutions Architect, VMware

CTO6659 – Ask the Experts – Cloud Native Applications

  • Emad Benjamin – Principal Architect, VMware
  • Joe Baguley – CTO EMEA, VMware
  • Ed Hoppitt – CTO Ambassador, VMware
  • Robbie Jerrom – Senior Solutions Architect, VMware
  • Martijn Baecke – Solutions Consultant, VMware

vCloud Air

ELW-HBD-1681 – vCloud Air Workshop

  • Captains: Jodi Shely (CTO Ambassador), Cleavon Roberts, Tony Welsh

SPL-HBD-1681 – VMware vCloud® Air™ – Jump Start for vSphere Admins

  • Captains:  Cleavon Roberts, Jodi Shely, Patrick Mahoney

SDDC IaaS

SDDC5260 – Reducing Costs and Increasing Availability in Healthcare: Customer Stories in The Software-Defined Transformation

  • Scott Carpenter – Staff SE | CTO Ambassador, VMware
  • Jordan Wise – Architect, Lancaster General Health
  • Dave Miller – IT Architect, Baystate Health
  • Kevin Holland – Senior Systems Engineer, VMware

SPL-SDC-1606 – Cloud 101 – Deliver your Infrastructure as a Service

  • Captains: Andrew Murphy, Kelly Montgomery, Danny Farber

INF4712 – Just Because You COULD, Doesn’t Mean You SHOULD – vSphere 6.0 Architecture Considerations from Real World Experiences

  • Jonathan McDonald, Solutions Architect |CTO Ambassador, VMware

PAR6411 – PSE: SDDC Assess, Design and Deploy 2.0 – What’s New?

  • Jonathan McDonald, Solutions Architect |CTO Ambassador, VMware

PAR6412 – PSE: vSphere 6 Architectural Design and lessons learned

  • Jonathan McDonald, Solutions Architect |CTO Ambassador, VMware

Business Continuity and High Availability

ELW-SDC-1605 – Business Continuity and Disaster Recovery Workshop

  • Captains: Paul Irwin and Adam Osterholt

SPL-SDC-1605 – High Availability and Resilient Infrastructure.

  • Captains:  Adam Osterholt, Paul Irwin, Nick Fritsch

STO4510 – When it Rains it Pours: Protecting Your VMware Based Cloud.

  • Aidan Dalgleish, VMware UK
  • Matt Vandenbeld

SDDC Management and Operations

OPT5519 – Nimble Automation in a Regulated Environment: Good, Fast, and Cheap. Pick Any Two.

  • Mike O’Reilly – Staff System Engineer, VMware
  • Jase Machado – Architect, Infrastructure Automation, Blue Shield of CA
  • Jeff Shaw – IT Virtualization, Delta Dental

INF6108 – Something Broke, What Now? Managing and Troubleshooting OpenStack Environments

  • Jonathan Cham – Global Solutions Consultant | CTO Ambassador, VMware
  • Ben Lin – Solutions Architect | CTO Ambassador, VMware

 

NET5836 – OpenStack with NSX Architecture Deep Dive

  • Jonathan Cham – Global Solutions Consultant, VMware
  • Ben Lin – Solutions Architect | CTO Ambassador, VMware

ELW-SDC-1620 – OpenStack with VMware vSphere and NSX Workshop

  • Captains: Ed Shmookler, Marcos Hernandez, Jonathan Cham, and Hadar Freehling

SPL-SDC-1620 – OpenStack with VMware vSphere and NSX.

  • Captains:  Ed Shmookler, Marcos Hernandez, Jonathan Cham, Hadar Freehling

MGT5471 – How VMware and Partners Bring Actions to Enterprise Administrators with vRealize Operations

  • Eric Hardcastle – Principal SDE Solutions Engineer | CTO Ambassador, VMware
  • Phil Smith – Staff Engineer, VMware
  • Michael White – Director, DataGravity Labs & Customer, DataGravity
  • Mike Kelly – CTO, Blue Medora

MGT4973 – Mastering Performance Monitoring and Capacity Planning

  • Iwan Rahabok – CTO Ambassador | Staff SE, VMware
  • Sunny Dua – CTO Ambassador | Senior Consultant, VMware

Storage

ELW-SDC-1627 – Software Defined Storage Advanced Topics Workshop

  • Captains: Mousumi Mullick and Martin Banda (CTO Ambassador)

PAR6407-BC – vSAN Workshop

  • Noel Nguyen – Director of Systems Engineering, VMware
  • Bo Bolander – Senior Systems Engineer, VMware
  • Mostafa Khalil – Technical Director, VMware
  • Greg Mulholland – VSAN Specialist | CTO Ambassador, VMware

STO4572 – Conducting a Successful Virtual SAN Proof of Concept

  • Cormac Hogan – Corporate Storage Architect, VMware
  • Julienne Pham – Technical Solution Architect | CTO Ambassador, VMware

Network

MGT5973 – Automate the Deployment of NSX and Micro-Segmentation: A Deep Dive

  • Justin Jones – Consulting Architect, Integration and Automation, VMware
  • Mitesh Pancholy – Principal Architect | CTO Ambassador, VMware

INF4823 – Real World – Architecting a vCloud for NFV Platform for Success

  • Gary Blake – Senior Solutions Architect | CTO Ambassador, VMware UK Ltd
  • Niklas Kånge – Consulting Architect, VMware

NET4468 – Defining Your Future With NSX Certification

  • Ben Lin – Solutions Architect | CTO Ambassador, VMware
  • Chris McCain – Director NSBU, VMware

End User Computing

EUC5733 – Deep Dive on VMware Horizon 6 Cloud Pod Architecture Best Practices to Successfully Deploy a Highly Available Virtual Desktop Solution

  • Aaron Black – EUC Product Manager, VMware
  • TJ Vatsa – Principal Architect | CTO Ambassador, VMware

SPL-MBL-1653 – Advanced Concepts of VMware Workspace Portal

  • Captains: Peter Bjork, Karsten Giesse

EUC5909 – VMware’s End User Computing (EUC) Strategy into 2015 and Beyond

  • Shawn Bass – Sr. Director, Strategy & Planning, VMware
  • TJ Vatsa – Principal Architect | CTO Ambassador, VMware
  • Karthik Lakshminarayanan – Senior Director, Product Management, VMware
  • Harry Labana – VP Products, VMware

EUC5062 – Your Desktops Secured: What Can NSX Do for You?

  • Tristan Todd – EUC Architect, VMware
  • Jeff Whitman – Staff Systems Engineer | CTO Ambassador, VMware

EUC4509 – Architecting Horizon for VSAN, the VCDX way – VMware on VMware.

  • Simon Long – Cloud Architect, VMware
  • Travis Wood – Senior Solutions Architect | CTO Ambassador, VMware

EUC4630 – Managing Users: A Deep Dive Into VMware User Environment Manager

  • Michael Bradley – Senior Solutions Architect, VMware
  • Dale Carter – Senior Solutions Architect | CTO Ambassador, VMware

EUC5516 – Delivering the Next Generation of Hosted Applications

  • Justin Venezia – Sr. Solution Architect – VMware Alliance, F5 Networks
  • Nick Jeffries – Senior Solutions Architect, VMware
  • Dale Carter – Senior Solutions Architect | CTO Ambassador, VMware
  • Michael Bradley – Senior Solutions Architect, VMware
  • Mark Ewert – Architect – EUC Technical Competitive Team, VMware

PAR6426 – App Volumes Architecture and Delivery

  • Nick Jeffries – Senior Solutions Architect, VMware
  • Dale Carter – Senior Solutions Architect | CTO Ambassador, VMware