Monthly Archives: July 2015

On being a VMware CTO Ambassador

[Photo: CTO Ambassadors group photo, 2015 Summer Conference]

While there are many folks who know and value their CTO Ambassadors, there are even more people who aren’t aware of what the program is. If you are not familiar with it, do visit the official site here. Joe Baguley, VMware CTO for EMEA, talked about it in this VMworld 2015 video. Amanda, a fellow Ambassador, provides her thoughts here. In this blog, I want to give you my personal take on what being a VMware CTO Ambassador means to me.

Here is the definition from the official site, with added highlight from me: “The CTO Ambassador program is run by the VMware Office of the CTO. The CTO Ambassadors are members of a small group of our most experienced and talented customer facing, individual contributor technologists. They are pre-sales systems engineers (SEs), technical account managers (TAMs), professional services consultants, architects and global support services engineers. The ambassadors help to ensure a tight collaboration between R&D and our customers so that we can address current customer issues and future needs as effectively as possible.”

We are virtual members of the Office of the CTO, because our day jobs are in the field. This global group is led by Paul Strong (Global Field CTO), Matthew Stepanski (GTS), Joe Baguley, Chris Wolf (America CTO) and Shannon Klebart (Program Manager). They are joined on the Advisory Board by 5 CTO Ambassadors. Instead of calling them the VMware CTO Ambassador Advisory Board, I just call them the Jedi Council 🙂

The group is diverse, yet there is a strong commonality among the members. It’s easy to establish friendships as we are on the same “frequency”. Our thinking is driven by customer requirements. We are all passionate about VMware, and yet we see things from the customer’s point of view.

What do we do?

The ~100 CTO Ambassadors do a wide variety of things. Here, I’m listing the things that I do.

We collaborate with the product team. We provide feedback on upcoming features. The feedback varies from strategic direction to actual screenshots. For example, I spent an hour on just 1 widget with the Architect and Product Manager of vRealize Operations. We can engage R&D at the implementation level. Collectively, the Ambassadors bring both breadth and depth to R&D. A lot of us work with customers at the operations stage, not just the architecture stage, hence we know what works and what doesn’t. Some of us actually get seconded to R&D on a short-term basis.

We explain the VMware story. I am privileged to witness a once-in-a-lifetime change in the IT industry. There are megatrends, and it’s interesting to see them unfold as they overlap. VMware started, and became hugely successful, when it led the industry with x86 virtualization. As virtualization spreads into the rest of the data center, and interfaces with changes in the application stack, it becomes critical to explain the VMware story. VMware is far from the vSphere-only company that a lot of customers still perceive it to be. It is now a much larger company than it was 5 years ago. Naturally, the business scope becomes wider as the circle of influence increases. What was server virtualization becomes Mobile Cloud. This is a more complex story to tell, hence the need for CTO Ambassadors. Customers ask me about industry trends, and it is good to hear first-hand that the VMware story resonates. The complete story makes sense. It is just not what you think it is.

We specialize and take technical leadership. Many of us blog, and it is great to hear feedback that it is useful to colleagues, partners and customers. Take Sunny Dua, for example. His blog on vRealize Operations is probably #1 on the topic, and he is one of the authorities on the product. I wrote a book on SDDC Performance and Capacity Management, explaining the topic from the customer’s point of view as opposed to the product’s point of view. Contact me if you want to contribute to the second edition (due mid 2016). A lot of us are capable of delivering Level-300 training in our areas of specialization.

We participate in beta. For customers who are keen, we get them into the beta program. There is a certain level of maturity required in a beta program, and I’m glad to say my customers appreciate the closer feedback loop they experience during beta.

We evangelize to and socialize with the virtualization community, both in the physical world and the digital world. One beautiful thing about virtualization is that there is a strong community in this space, and friendships are born. We guide one another in our careers, regardless of where we work. In the digital world, you will find some of us on Twitter, LinkedIn, Facebook, VMTN, etc. On Facebook, I founded this group to facilitate networking and discussion among VMware users. I’m glad to see it has grown to become one of the largest VMware groups on Facebook. Feel free to copy the guiding principle that I created, as it will help you minimize public conflict.

Internally, the Ambassadors champion the use of Socialcast. We are strong advocates of collaboration. I see this as critical, because VMware is a multi-product organization. I used to be a big fan of mailing lists. With Socialcast, we have granular control. I am a member of >100 Socialcast groups and it does not require me to create 100 folders in my mailbox. I’ve disabled all push notifications, as the Socialcast online notification does a better job. Yes, no email at all. For search, I find the Socialcast search to be more powerful.

We bridge, both internally and externally. There are thousands of people in VMware, partner ecosystems and customers. It can be difficult to know who actually does what when you need help. Sure, there is an org chart. You know how accurate they are, right? 🙂 The bi-directional connection that the CTO Ambassadors have means we can link you. A typical request from a Product Manager or R&D to me is “I need a customer that meets this profile. Can you schedule a meeting with a person at the right level?” On the other hand, account teams or partners normally ask for a specific engineer or person in Palo Alto who can help with a given issue or opportunity.

How does being an Ambassador change me in what I do?

By and large, nothing. And that’s fundamental. To be an Ambassador of any company, one should be doing it already. The formal appointment significantly amplifies and enhances your capability, but it does not alter the core. In this way, you won’t be wondering what you should do, as you’re already doing it! People should be able to see the Ambassador quality in you because of what you’ve done. This will also make it easier for the Advisory Board to decide on your application. The bar to become a VMware CTO Ambassador is high.

I’m more aware of my role and duty. What I say, be it off-line or on-line, can be quoted. It can also be taken out of context. So before I reply to an on-line post, I ask myself “Would VMware say what I’m about to say?” I care about the image of VMware. Yes, I may be protected legally by all the disclaimers that this is my personal post. Practically, however, the damage is done, so I need to be careful. Some of my posts take a week because I let them soak and also ask close friends to review them. My little girl likes to say “Just in case!”.

I also see myself as an extension of the Product Team. When a product needs improvement, my feedback to them is followed by a solution. There is no value in a conversation if I merely criticize. It’s easy to criticize. It’s hard to provide a solution. I contribute IP back to the product team, and it is a privilege to see my work make it back into the product.

I hope that gives you a good summary of VMware CTO Ambassadors. If you are a technology vendor, I think you should establish such a program too. If you are a large, global IT organization, you should establish one as well. I’m happy to share our experience on it. For those of you based in Asia Pacific, do reach out to your local Ambassadors here.

Should our children do IT for a living?

[13 Feb 2017 update: a similar article appears at Wired for the Application Developer job]

Ask any Infrastructure Engineer or Software Developer whether their young kids should follow in their footsteps, and you may get a No. Maybe you get a Yes, but I think it is no longer as firm as it was 2 decades ago. Back in the early 90s, the answer was a resounding yes. As an Infrastructure Engineer myself, I do not want my kids (12 and 15) to follow in my footsteps. Don’t get me wrong. I love what I do. It’s just that I think the party is coming to an end. It will last long enough for us to enjoy, but not for our children. We are in a once-in-a-lifetime change in the IT industry. There are megatrends, and it’s interesting to see them unfold as they overlap.

If we look at IT, there are large sub-industries. Each has billion-dollar vendors (with market cap >$10b) jockeying for position as they all want to get bigger. The greedy nature of capitalism means growth is the only factor that matters. If your market cap is $10 billion, then grow to $50 billion. If you have reached $100 billion, then grow to $1 trillion. It is indeed greed. The father of Greed is Fear. The thinking goes something like “if you do not grow, you will die.” As humans, we know we cannot be satisfied with money, but our fear of losing the money makes us want more of it. This “dog eat dog” mindset contributes to the tectonic shifts we’re seeing in our industry.

We can categorize the IT Industry into Application and Infrastructure. In fact, the IT Department in some large organizations typically means just the Infrastructure or common applications (e.g. Database, Email, and Directory). They have also turned into internal Service Providers.

We can also categorize the IT Industry into Personal (Retail, Consumer, B2C) and Commercial (Enterprise, SMB, B2B). The Rise of the Individual helped propel Apple to be a giant in the industry, even though it had a relatively tiny presence in the commercial space. The Personal-Commercial categorization can also be viewed as a Core-Edge split. The Core physically resides in the Data Center. A small core may only be as big as a few servers inside a rack. The Edge physically resides with the Individual. The first instantiation was the PC, which has evolved into the notebook, tablet, smart phone and gadgets. More recently, it has started to show up on machines, giving birth to the Internet of Things.

Here are the forces that I see, in no particular order:

  • Cloud Computing
  • Mobile
  • DevOps
  • Container
  • Storage virtualization
  • Network virtualization
  • Edge Computing
  • Internet of Things
  • The Rise of the Individual

Here are the areas that are already declining as a result. Again, in no particular order:

  • UNIX
  • Mainframe
  • PC
  • FC and FCoE
  • Physical storage
  • RDBMS
  • Current UX or UI

I will provide my 2 cents on the above trends. We can analyze each of them and how they impact one another. But that’s a subject for another blog post 🙂 Here, I want to focus on our children, not us. My timeline here is 15-20 years, not 5-10 years. What does IT look like around the decade of 2030? A 15-year-old teenager today will be a 30-year-old adult by then. He is probably holding a mid-level position. What does the job look like? What does the industry look like? What jobs will be important in 2030 that do not even exist in 2015? Will he work as an independent freelancer? A 5-year-old kid will be 20 years old in 2030. Should she be pursuing an IT degree? If she does, can she utilize her hard-earned degree for 2 decades? That will take us to 2050!

You may say that 15 years is too long to predict. I agree. It’s much easier to predict 5 years. But that’s not actually a prediction. It’s just a projection. A projection has no or minimal element of surprise, as it just follows the trajectory. You have historical data, and you’re merely moving along the line. The other reason is that IT changes slowly, especially the mission-critical, core systems. The mainframe, the oldest of IT technologies, has already celebrated its 50th birthday and will still be around in 5 years. In fact, it will probably still be around in 15 years.

You want to know my prediction of IT in 2030-2050? At the rate innovation is going, I think there will be a super intelligent system. It spans the globe, connected by a high-speed network. A fault-tolerant distributed system that is always available. It is so life-critical that no administrator can bring it down. The good news is… there is a role for humans to play in 2050. The bad news is… We become the battery 🙂 🙂 No, don’t even think of getting out of your container. The robot dog and drone will kill you! 🙂 🙂

Let me know your thoughts!

VMware SDDC Architecture: Network

In the previous article, I covered the requirements and overall architecture. We have also covered the Compute architecture. To some extent, I’ve covered the Storage architecture as it’s using VSAN (we will dive deeper in a future blog). In this blog, I will cover the Network architecture. I’m not a Network Architect, and have benefited from great blogs by Ivan and Scott.

Logical Architecture

There are 4 kinds of network in VMware SDDC, namely:

  1. VMkernel network
  2. ESXi Agents network
  3. VM network
  4. iLO network

The VMkernel network has grown as VMware adds more capabilities into the ESXi kernel. In vSphere 6, you may need up to 8 IP addresses for each ESXi host. Since they are on the physical network, you will need a VLAN for each. While you can place the VMkernel network on VXLAN, for simplicity and cleaner separation, I’d put them on VLAN. Operationally, I’ve put them on 2-digit VLAN IDs, so it’s easier for anyone in the company to remember that VLAN 10 – 99 are for VMkernel.

network 1

There are many articles on the various VMkernel networks, so I will just touch on them briefly:

  • Management Network. This is predominantly for ESXi. I put all of them on the same VLAN to prevent operations from becoming too complex. There is no need for isolation as this is out of band. VM traffic does not go here.
  • vMotion network. I keep this on a separate VLAN per cluster type, as there is no need to vMotion a VM across cluster types. For example, in the Network Edge Clusters, the only VMs living there will be NSX Edge VMs and NSX Distributed Router VMs. There is no need for other VMs to live in this cluster. To minimize human error and ensure the segregation, the vMotion network does not go across different types of clusters. Let’s take another example to make it clearer. For VDI, we will have 1 – 5 clusters per physical DC. VMs can vMotion among these 5 VDI clusters, so the 5 clusters together form just 1 logical pool. There is no business need for a VDI VM to vMotion to the Management Cluster. Also, the VDI server infrastructure (e.g. Horizon View Connection Server) lives in the Management Cluster, so separation helps simplify operations. This interface needs 2 IP addresses if you are doing multi-NIC vMotion.
  • Fault Tolerance network. I apply the same restriction to prevent a VM and its shadow VM from spanning 2 different types of cluster.
  • Storage network. This can be NFS, iSCSI or VSAN. From the above diagram, you can see that I share the Storage network between Server workload, Desktop workload and Non Production. To keep things simpler, you can actually share it with the Management and Edge clusters also. This means you only have 1 VLAN. The reason it is safe to do that is there is no live migration across cluster types. The VM has to be shut down, as there is no shared vMotion network. You also cannot have an FT VM spanning cluster types, as there is no FT network across them.
  • vSphere Replication network. Having a separate network makes monitoring easier.
  • VXLAN network. This is the network where the VM traffic will be tunnelled. Having a separate network makes monitoring easier.

The above will need 6-7 IP addresses per ESXi host. Plan your IP addresses carefully. I personally prefer an easy correlation to my ESXi host names. For example, ESXi-01 in my environment will have the x.x.x.1 address, and ESXi-99 will have the x.x.x.99 address.
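The host-number-to-last-octet convention can be sketched in a few lines of Python. This is only an illustration: the network names, VLAN IDs and /24 subnets below are hypothetical values that I picked to fit the 2-digit VLAN rule, and the sketch ignores the second address that multi-NIC vMotion would need.

```python
import ipaddress

# Hypothetical VMkernel plan: one 2-digit VLAN ID and one /24 subnet per network.
# These VLAN IDs and subnets are illustrative assumptions, not part of the design.
VMKERNEL_NETWORKS = {
    "management": (10, "10.10.10.0/24"),
    "vmotion": (20, "10.10.20.0/24"),
    "fault_tolerance": (30, "10.10.30.0/24"),
    "storage": (40, "10.10.40.0/24"),
    "replication": (50, "10.10.50.0/24"),
    "vxlan": (60, "10.10.60.0/24"),
}


def host_ip_plan(host_number):
    """Return {network name: (vlan_id, ip_address)} for ESXi-<host_number>.

    The last octet of every address equals the host number, so ESXi-01
    gets x.x.x.1 and ESXi-99 gets x.x.x.99 on each VMkernel network.
    """
    if not 1 <= host_number <= 254:
        raise ValueError("host number must fit in the last octet of a /24")
    plan = {}
    for name, (vlan_id, subnet) in VMKERNEL_NETWORKS.items():
        network = ipaddress.ip_network(subnet)
        plan[name] = (vlan_id, str(network.network_address + host_number))
    return plan


if __name__ == "__main__":
    for name, (vlan_id, ip) in host_ip_plan(1).items():
        print(f"ESXi-01 {name}: VLAN {vlan_id}, {ip}")
```

Keeping the plan in one table like this makes it easy to check that every VMkernel VLAN stays inside the 10 – 99 range.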

ESXi Agent Network

Data Center components are moving to the hypervisor. When they move, they either move to the kernel as a kernel module (e.g. VSAN), or they take the VM form factor. Examples of the VM form factor are the Nutanix Storage VM and TrendMicro Deep Security. Regardless, you need an IP address for every ESXi host.

These networks are not VMkernel networks. They are VM networks. However, they are backed by VLAN, not VXLAN. That means they are on the physical network and are given physical IP addresses. So you need to plan them.

Now that we’ve covered the ESXi networks, let’s move to the VM Network.

VM Network

All of them, without exception, will be on VXLAN. This allows decoupling from the physical DC network. In other words, we virtualize the VM Network. Defining it in software allows inter-DC mobility. There will be many VXLAN networks, so I need to plan them carefully too. In the above diagram, I have grouped them into 4 top-level groups. I’d give each group its own range, so I know what kind of workload is running given a VXLAN number. Here is an example:

  • VXLAN 10000 – 19999: Server Workload
  • VXLAN 20000 – 29999: DMZ Workload
  • VXLAN 30000 – 39999: Desktop Workload
  • VXLAN 40000 – 49999: Non Production Workload
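A minimal sketch of how such a range plan lets you identify the workload from the VXLAN number alone. The ranges are the ones listed above; the function and variable names are mine and purely illustrative.

```python
# VXLAN number ranges copied from the list above (inclusive on both ends).
WORKLOAD_RANGES = [
    (10000, 19999, "Server Workload"),
    (20000, 29999, "DMZ Workload"),
    (30000, 39999, "Desktop Workload"),
    (40000, 49999, "Non Production Workload"),
]


def workload_group(vxlan_id):
    """Return the top-level workload group for a VXLAN number, or None if unplanned."""
    for low, high, group in WORKLOAD_RANGES:
        if low <= vxlan_id <= high:
            return group
    return None


if __name__ == "__main__":
    for vxlan_id in (10001, 25000, 39999, 50000):
        print(vxlan_id, "->", workload_group(vxlan_id))
```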

I have a wide range on each as I have sub-categories. Yes, this means you will have a lot more VXLANs than you do VLANs. This is a fundamental difference between networking in SDDC and networking prior to network virtualization. You do not want to have too many VLANs as it’s operationally taxing. VXLAN does not have that issue. Networks become cheap. You can have lots of them. For example, the server workload is split per application. If I give each application up to 10 networks, I can have 1000 applications. By having 10 networks per application, I can have a numbering convention. For example:

  • Web server network: xxxx1. Example is 10001, 10011, 10021, and 10031 for the first 4 applications. This means I know that anything on 1xxx1 is my production web servers.
  • Application server network: xxxx2
  • DB server network: xxxx3
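The numbering convention can be written down as a small formula: group base + 10 × application index + tier digit. Here is a hedged Python sketch of it. The tier digits come from the list above; the function name, the 0-based application index and the default base of 10000 (the server workload range) are my own assumptions for illustration.

```python
# Last digit of the VXLAN number per tier, as in the convention above.
TIER_DIGIT = {"web": 1, "app": 2, "db": 3}
NETWORKS_PER_APP = 10   # each application gets a block of 10 VXLAN numbers
APPS_PER_GROUP = 1000   # 10,000 numbers per group / 10 networks per application


def vxlan_id(app_index, tier, group_base=10000):
    """Return the VXLAN number for one tier of one application.

    app_index is 0-based (an assumption of this sketch): the first
    application is 0, the second is 1, and so on.
    """
    if not 0 <= app_index < APPS_PER_GROUP:
        raise ValueError("application index out of range for this group")
    return group_base + app_index * NETWORKS_PER_APP + TIER_DIGIT[tier]


if __name__ == "__main__":
    # Web networks of the first 4 applications: 10001, 10011, 10021, 10031
    print([vxlan_id(i, "web") for i in range(4)])
```

Going the other way, the last digit of any VXLAN number gives the tier, and the remaining digits give the application.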

Last, but certainly not least, you should have the iLO network for lights-out management. This is the management network for the physical boxes.

Physical Architecture

[10 Nov 2015: I got a correction from Raj Yavatkar and T. Sridhar that we should not have the spine connect to the northbound switch – if you do that, it creates some interesting issues; spines should be devoted to carrying inter-rack, East-West traffic. I will update the diagrams once I have some time. Thanks Raj and Sridhar for the correction. Much appreciated.]

As an SDDC Architect, you need to know how to implement the above Logical Architecture. It is software defined, but the hardware plays an important role. If necessary, tap the expertise of a Network Architect. In my case, I have requested YJ Huang from Arista to help me. I also benefited from Ivan Pepelnjak’s post here.

We start from the base connectivity. The diagram below shows 2 ESXi hosts, and how they are physically connected to network devices. I am using 2x 10 GE cables for the data network, and 1x 1 GE cable for iLO. There is no need for HA for the iLO network. In most cases, 2x 10 GE is more than enough. Know your workload before you decide on 4x 10 GE.

network 1-1

Now that we’ve covered the basics, let’s see what the overall picture looks like when we attach all the ESXi Hosts. The diagram below shows the 2 physical data centers. They have identical physical setups, but different IP addresses. Data Center 1 could have 10.10.x.x, while Data Center 2 has 20.20.x.x. By not extending the physical network, you contain the failure domain.

The diagram shows how my 5 clusters are connected to the switches. I use a spine-leaf design as I want to be able to scale without major re-architecting. I’ve drawn the future switches in grey. They naturally come in pairs. I draw the spine-leaf connections thicker as they are 40G.

network 2

Let’s see how the architecture scales to the requirements, which are 2000 server VMs and 5000 VDI VMs. As you can see, it’s essentially an extension. Fundamentally, it remains the same. I do change the cluster placement to keep it simpler. This comes at the cost of re-cabling.

Architecture 2000

You may be wondering why I use 40G between spine and leaf. For the VM Network, 10G is more than sufficient. The reason is the VMkernel Network. The vMotion boundary cuts across pods.

network 5

I hope you find this useful. Keen to hear your thoughts!