How to operate your virtualized data center

The article was first published on LinkedIn in July 2014. I received a lot of positive feedback, so I decided to update it and share it here.
24 Dec 2015: Is Gartner’s Bimodal IT harmful? A good read.

x86 virtualization has been around for more than a decade, and we now have plenty of first-hand experience of its ramifications in the data center. It has broken down the boundaries between many infrastructure silos. The line between the server team and the rest of the infrastructure teams (storage, network, security, Windows, Linux, BCDR, management) has become blurred. For example, virtualization has eliminated the network access switch, and the distributed virtual switch, with its hundreds of ports, is now the largest switch in the data center by port count. The diagram below shows that the infrastructure component teams must band together and reduce internal silos. I have also drawn each team smaller, because each will have fewer people: 10K physical servers virtualized onto 500 ESXi hosts will certainly require fewer people to run. Each person, however, will become far more experienced as the infrastructure becomes integrated.

Operations - Silo

Virtualization has also changed the relationship between the Infrastructure team and the Application team. Because virtualization is a shared service, the Application team becomes a customer, and a performance SLA becomes critical. The relationship between Developers and Operations is affected too. In fact, this has helped give rise to the DevOps concept, where the two teams work seamlessly as one. A similar, emerging concept is “Operations Engineering”, an idea worth exploring as it moves Engineering deeper into the world of Operations. In the diagram above, there is now a formal SLA, depicted by a thick red line.

IaaS requires the infrastructure team to transform into a fully-fledged service provider. In this short article, I will examine the Infrastructure team, including the Architecture, Engineering, Integration and Operations teams.

  • First, I will discuss the major challenges these teams face when a company attempts to implement IaaS without an organizational restructure.
  • Second, I will propose some ideas, based on my own observations and experiences, for how these teams can be restructured to best deliver IaaS.

Organizational Challenges in IaaS

A major change in Architecture certainly impacts Operations. We cannot operate two things that are architecturally different in the same manner. It does not work, and many customers have told me how things are broken operationally because their CIOs do not know how to restructure for the virtual data center.

In large enterprise IT, it is common for the IT team to be split into Architecture, Engineering, Integration, Operations and Project Management. The Architecture team typically looks at enterprise-wide standards and decides on the large building blocks. The Engineering team builds technical solutions that conform to the architecture and tests them in its lab. The Integration team implements the engineered solutions, with the help of Project Management. Finally, the solutions are handed over to the Operations team for ongoing support, and Operations escalates to Engineering when there is an issue.

Operations - Team - Physical

I have seen many jokes and complaints from one team about the other team. More often than not, the complaints on both sides have some truth behind them.

For this model to work when the data center is defined in software, certain things must change. A fundamental requirement when dealing with software is the need to be hands-on. In a software-defined world, you cannot get away with assuming things will just run, the way you sometimes can with hardware. Hands-on experience also differs drastically between a small engineering lab and a large, mission-critical production environment. It is easy to upgrade vCenter in the lab. Now try upgrading a vCenter that manages 10,000 virtual machines with complex dependencies.

Proposed Organizational Changes

To help companies move from a hardware-focused world to a software-defined data center, I propose merging the Architecture team with Engineering, and Integration with Operations. There will certainly be variations from customer to customer; what works well in one company may be a disaster in yours. These conclusions are drawn from my own experience.

Architect

To architect something well, you need to know the end state well, both architecturally and operationally. For Architects to have both depth and breadth of knowledge of their software-defined data center, they can no longer live in the engineering lab. The Architect has to be involved to get a real taste of the ground-level experience. They must log in to vCenter and vCenter Operations regularly to get a feel for the environment they designed. It is a feedback loop. If there is a widespread production issue, the hands on the keyboard performing complex troubleshooting belong to the Architect. You do not architect something you are not prepared to troubleshoot.

Engineer

Where does the Engineer fit in? I recommend that the Architect and the Engineer be the same person. Yes, the skills are different in a traditional environment. In a software-defined data center, however, there is no such separation, because what we architect is what we engineer. We can and will still have many types of Architect/Engineer: Network Architect/Engineer, Storage Architect/Engineer, the overall SDDC Architect and others, depending on the organization. The overall SDDC Architect has hands-on knowledge and is not a people manager. An Architect who is not hands-on becomes the weakest link in the chain. When decisions cannot be settled with technical facts, everything becomes debatable, and politics grows.

Operations

I have personally seen Operations teams become more technical than the Engineering team. They know how to upgrade an environment with 50,000 VMs globally. The kinds of questions and issues they discuss with me are not normally raised by the Engineering team, because Engineering does not support such a large infrastructure. Working in a production environment makes us wise.

Integration

What about the Integration team? Among the changes I propose, the Integration team would be absorbed into Operations. The Operations team faces the production environment directly, so they know best what exactly is broken and why. They need the technical capability and resources to make improvements. If an improvement is complex, the Architecture/Engineering team redesigns and certifies the new solution, then leads the implementation before handing it over to Integration/Operations.

Working Together to Deliver IaaS

Operations - Team - SDDC

Once the merging of the Architecture/Engineering teams and Integration/Operations teams has been achieved, I would also call out the need for a tighter integration between the two new hybrid teams. I propose they become a single, virtual team. Architecture/Engineering must actively include Integration/Operations in architectural planning and decisions. Integration/Operations should circle back to Architecture/Engineering to involve them in troubleshooting and infrastructure optimization exercises. While the two teams remain distinct in organizational structure, they act as a single team with a shared vision, set of responsibilities, and annual review criteria.

Every now and then, there should be job secondments between the Architecture/Engineering team and the Integration/Operations team. Take one person (or 5% of the team, if it is large) and send them to the other side for, say, one month, while the other team does the same. This helps each side appreciate their colleagues’ roles, builds mutual understanding of the challenges each team faces, and strengthens the sense of working as a single virtual team.

I am very interested in hearing your thoughts and experience on the operational implications of moving to a software-defined data center. There is no right or wrong answer. It all depends, says the Consultant 🙂

————————————————————-

It wouldn’t have appeared on your screen if not for the review by the following practitioners:
