Tag Archives: IaaS

The Rise and Fall of Infrastructure Architect

I’ve been in IT for almost 2.5 decades. We are fortunate to experience a once-in-a-lifetime journey of technology change. Technology has changed both work and life. Business now runs on IT; banks, airlines, and telcos as we know them practically depend on it. Within IT, applications run on infrastructure. This infrastructure has improved so drastically that it has become a commodity. With the arrival of cloud computing, it has become a utility too. When something becomes both a commodity and a utility, the value of the humans who know it follows as a consequence. The value of the Infrastructure Architect has diminished, as the technology has become good enough, simple enough, and cheap enough for most cases. Granted, mega infrastructures such as AWS and VMware on AWS are complex. But how many of us are working there?

Most of us aren’t building this kind of mega infrastructure. Most businesses have <10K VMs. At a 25:1 consolidation ratio, that’s <400 ESXi hosts. At around 12 ESXi hosts per cluster, that’s just 36 clusters, including HA. Space-wise, it will occupy just ~10 racks. 1,000 VMs per rack, covering compute, storage, and network, is doable.
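The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the function name and parameters are made up for illustration, and it assumes “including HA” means one host in each 12-host cluster is reserved for HA, which the post does not spell out.

```python
import math


def size_environment(vm_count, vms_per_host, hosts_per_cluster,
                     ha_hosts_per_cluster, vms_per_rack):
    """Rough sizing of hosts, clusters, and racks for a VM estate.

    Assumption (not from the post): HA is modelled as a fixed number of
    hosts per cluster that carry no workload.
    """
    # Hosts needed to carry the workload at the given consolidation ratio.
    workload_hosts = math.ceil(vm_count / vms_per_host)
    # Hosts per cluster that actually run VMs once HA capacity is set aside.
    usable_hosts_per_cluster = hosts_per_cluster - ha_hosts_per_cluster
    # Left as a fraction so the rounding decision stays visible.
    clusters = workload_hosts / usable_hosts_per_cluster
    racks = math.ceil(vm_count / vms_per_rack)
    return {"hosts": workload_hosts, "clusters": clusters, "racks": racks}


sizing = size_environment(
    vm_count=10_000,
    vms_per_host=25,
    hosts_per_cluster=12,
    ha_hosts_per_cluster=1,  # assumed
    vms_per_rack=1_000,
)
print(sizing)
```

With these inputs the host count is 400, the rack count is 10, and the cluster count lands a little above 36, in line with the figures quoted above.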

Compared with, say, 10 years ago, it’s much easier to architect and operate a VMware environment with just 10K VMs. It’s easier because there are many reference architectures, such as VMware Validated Design and VMware Cloud Foundation. For those using VMware on AWS, the design, implementation, upgrade and support are done by VMware.

So what can you do as an Infrastructure Architect?

If you are not moving into a managerial or sales position, you need to add skills that are valued by the CIO or the business. That means non-technical skills, as these folks care less about technical matters. The career progression runs from Infrastructure Architect to Service Architect to Business Architect.

Since Infrastructure is becoming a service, you need to know how to architect a service (e.g. IaaS, DBaaS, Desktop as a Service).

  • What services is the IaaS providing? How do you define a service?
  • What metrics do you use to quantify its quality?
  • How many services? How do you distinguish between a higher-class service and a normal one?

You also need to know what types of services are in demand. Yes, this requires you to go out and meet your customers. Understand their requirements. What price/performance levels are in demand? From there, you can architect the corresponding services.

I cover this in more detail in Purpose-driven Architecture, so I won’t repeat it.

Done reading it? Great!

The next step after Service Architect is Business Architect. This is especially valuable to the CIO, who runs the business of IT. It’s also important to a Cloud SP, whose business is actually selling the service.

For a start, know the business you are in. Below are the 2 main models. Be clear on the nuance, as Internal IT is morphing towards an internal Cloud Provider.

As a Business Architect, you know not just the cost of running the service, but also how and when to break even. You do not have to be responsible for P&L, as you’re not the CIO or the Cloud SP CEO, but you play a strategic role for them. You’re not merely a techie. You know what to price and how to price it, and you know your price is competitive.
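To make “how and when to break even” concrete, here is a minimal sketch of a break-even calculation. It assumes a very simple model that the post does not prescribe: a one-off upfront investment, a flat monthly running cost, and a flat price per VM per month. The function name and all figures are hypothetical.

```python
import math


def months_to_break_even(upfront_cost, monthly_run_cost,
                         price_per_vm_month, vms_sold):
    """Return the number of months needed to recover the upfront cost.

    Returns None when monthly revenue does not exceed the monthly running
    cost, because the service then never breaks even under this model.
    """
    monthly_margin = price_per_vm_month * vms_sold - monthly_run_cost
    if monthly_margin <= 0:
        return None
    return math.ceil(upfront_cost / monthly_margin)


# Hypothetical figures, for illustration only.
months = months_to_break_even(
    upfront_cost=1_200_000,
    monthly_run_cost=50_000,
    price_per_vm_month=100,
    vms_sold=1_000,
)
print(months)
```

A real pricing model would also account for things like depreciation, growth in VM count, and different service classes; the point here is only that the price, the volume, and the break-even date are tied together.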

The world of Cost and Price is a complex one. vRealize comes with a tool to help you manage this part.

Summary

  • Systems Architect needs to evolve, as infrastructure is becoming a commodity and a utility.
  • Service Architect and Business Architect are the next steps for Infrastructure Architect.

A test of your IaaS Operations maturity

What you architect is SDDC. What you hand over as a business result to the CIO is IaaS. We can assess whether the architecture is good or not based on the actual result in production. Does it result in fire-fighting and blame-storming? Or do you have peaceful operations?

The litmus test below helps you assess the maturity of your IaaS.

Do your customers blame your infrastructure?

  • If the answer is yes, take a step back and ask yourself why. There is a high chance you’re relying on complaints in your operations. So you actually encourage them. No complaint, no problem. A Complaint-based Operations.
  • The reason you rely on complaints is that you don’t have other means. You have not defined the performance of your IaaS.
  • A sign of mature operations is that you have a Performance SLA. It is per VM, measured every 5 minutes.
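As an illustration of what a per-VM Performance SLA measured every 5 minutes could look like, here is a minimal sketch. The metric, the threshold, and the compliance target are all hypothetical; the post does not say which counters or limits to use.

```python
from dataclasses import dataclass

SAMPLE_INTERVAL_MINUTES = 5  # one sample per VM every 5 minutes


@dataclass
class SlaResult:
    vm_name: str
    samples: int
    breaches: int
    compliance_pct: float
    met: bool


def evaluate_vm_sla(vm_name, samples, threshold, target_pct):
    """Check one VM's 5-minute samples against a performance threshold.

    A sample counts as a breach when it exceeds `threshold`. The SLA is
    met when the share of non-breaching samples is at least `target_pct`.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    breaches = sum(1 for value in samples if value > threshold)
    compliance_pct = 100.0 * (len(samples) - breaches) / len(samples)
    return SlaResult(vm_name, len(samples), breaches,
                     compliance_pct, compliance_pct >= target_pct)


# Hypothetical example: four 5-minute samples of some contention metric (%).
result = evaluate_vm_sla("vm-app-01", [1.0, 2.0, 6.0, 1.5],
                         threshold=5.0, target_pct=99.0)
print(result)
```

Because the check runs per VM and per interval, it can flag a problem before the VM Owner complains, which is the opposite of Complaint-based Operations.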

Is your IaaS cheaper than both VMware on Amazon and Amazon?

  • If not, your CIO may question your business value. The reason for having an in-house architect is that you can deliver a lower cost, even after taking your salary into account.

Does Help Desk provide a good first-level defense?

  • If Help Desk simply passes through to the next level, you need to look at why.
  • Help Desk is your first line of defense. They are not as technical as you are. Equip them with a simple dashboard so they can handle VM Owner complaints:
    • Is the problem caused by IaaS not serving the VM well?
    • If yes, which part of the Infra: CPU, RAM, Disk, Network?
    • If not, how to prove it convincingly?

Can you justify new infrastructure when utilization is not yet high?

  • This is not referring to the additional money that comes with a new project. This is referring to existing clusters/storage.
  • Capacity is measured on utilization and performance. A cluster’s capacity is full if it can’t serve its VMs well. Since it takes time to buy hardware, you need an early warning to detect this performance degradation.
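The idea that a cluster can be full on performance before it is full on utilization can be sketched as a simple status check. The thresholds and the “percentage of VMs not served well” input are hypothetical choices for illustration, not values from the post.

```python
def cluster_capacity_status(utilization_pct, vms_not_served_well_pct,
                            util_limit=80.0, perf_warn=1.0, perf_limit=5.0):
    """Classify a cluster as 'ok', 'warning', or 'full'.

    The cluster is 'full' when either utilization or the share of VMs
    not being served well crosses its limit. 'warning' is the early
    signal, raised on performance alone, to allow time to buy hardware.
    All thresholds are assumed defaults.
    """
    if vms_not_served_well_pct >= perf_limit or utilization_pct >= util_limit:
        return "full"
    if vms_not_served_well_pct >= perf_warn:
        return "warning"
    return "ok"


# Utilization is only 50%, yet performance already signals trouble.
print(cluster_capacity_status(50.0, 2.0))
```

A record of “warning” and “full” states at moderate utilization is the kind of evidence that can justify new infrastructure before utilization is high.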

Do you struggle with many over-provisioned VMs?

  • This is an indicator that you’re operating as a System Builder as opposed to a Service Provider.
  • As a System Builder, you’re meddling with each System (read: Application). You size them, and argue with the application team.
  • As a Service Provider, you’re not “in the way”. IT simply uses an effective pricing model to drive the right behaviour. Does AWS block you from buying a 40-CPU EC2 VM when you only need 2 CPUs?

Does Troubleshooting mean all hands on deck?

  • Do you have a process that is followed by all teams (network, storage, server, OS, application)? Does that process end with Root Cause Analysis?
  • As part of RCA, do you set up alerts so the issue can be detected faster if it happens again?

There are more questions, but I thought we would start with those. If you want to see details, download this.

Operationalize SDDC program

This post continues from the Operationalize Your World post. Do read it first so you get the context.

Callum Eade and Kenon Owens created a program called Operationalize Your World. Sunny and I provide the technical content. Many folks, both internal and external, have reviewed the materials over the past several years. I was cleaning up my files and was surprised to see that decks from early 2011 contain old versions of the slides you’re seeing today.

If you only have 10 minutes, below is a 7-minute introduction to what you get in the 1-day workshop. Sunny & I delivered that at VMworld 2016. We benefited a lot from the community, so we immediately said yes when Alastair and vBrownbag invited us to share.

In 2017, they invited us again. This time, we were given 30 minutes, so you get some of the solution in this talk.

The 1-day workshop actually has 2 days’ worth of material. Hence there is flexibility in what is delivered on the day, and it’s driven by the audience.

We use a restaurant analogy to raise awareness that your IaaS business should be operated differently. There are 4 main ppt files.

You can find the material here. They are in an editable format (ppt), not PDF.

We provide them in PowerPoint because operations vary widely. Take what’s relevant to you, throw away what’s not, add your custom deck, and make it yours. When you share your deck with your peers or customers, let me know how it goes. I’m keen to hear about your journey. It’s a journey because it will take you multiple rounds to enlighten your peers.

The workshop covers 4 areas in management (Availability, Performance, Capacity and Configuration). We map each area to both Consumer and Provider layers of your IaaS business.

Hope you find the material useful. If you do, go back to the Main Page.