Category Archives: People

Cover things such as career, soft skills, and community.

A new Adventure!

I’ve joined the Product Management team as a PM. As members of a small team (Sunny and a few others), we are given the privilege to plan and drive vR Ops to the next level.

Why did I change jobs?

As human beings, there are 3 levels of what we do: Job, Career and Calling. Out of 7 billion people, most of us have a job. The luckier ones have a career. A few have a calling. IMHO, a calling is when you have a balance among the 3M (Money, Meaning, Merry) of what you do. A calling is not perfect, as it’s a trade-off among the 3 corners of a triangle. Ideally, the triangle is as small as possible, so you’re close to all 3. To read more, follow this.

Some folks asked how I could get a Product Manager job out of Singapore, since there is no R&D, QA, UX, Tech Marketing, Product Marketing, and Management here.

If you want to know, here is a short story.

This job took me >5 years to vrealize. I’ve been doing vR Ops since 1.0, when it was released back in early 2011. I was one of the first to get trained in Asia Pacific. I remember when David Lavigna trained us in Sydney. I saw how I could apply super metrics and custom dashboards to help customers monitor and troubleshoot. Instead of spending a lot of time with vCenter performance tabs, I could simply slice and dice the whole environment.

By 2014, I’d already spent a few years on the product. Customers taught me the things they need to monitor or troubleshoot. It’s amazing how much you can learn in a production environment vs a lab. Real problems, real people. I compiled these lessons learned, gave them a structure, and published my first book in Dec 2014.

In 2014, VMware elected me as a member of the CTO Ambassador program. In my 1+ decade at VMware, this is the best “training” program. It opens doors. It gave me trips to Palo Alto and RADIO, where I could develop the relationship with R&D.

Sunny and I brought the material to the world at VMworld 2015. We did 2 sessions, with ~600 attendees. The feedback told me we were on the right path. That was the turning point to start packaging the dashboards into an integrated suite.

I continued enhancing the material, and published a second edition of my book in March 2016. The Product Management team, who had been super supportive of my work, invited me to Palo Alto, VMware HQ in Silicon Valley. They paid for my first Take 2, and I spent 2 weeks with the R&D team in March 2016.

Kenon took the material and turned it into a program in June 2016. He called it Operationalize Your World. He worked with all the regions in Asia Pacific. Both of us traveled heavily and met many customers and partners. He secured the travel funding and worked with the local teams to get the events going. I have averaged 150 – 200 days of travel a year since then.

VMworld 2016 was another success. I met even more customers, who convinced me that there is a big market for me to focus on. Post VMworld, the Product Management team decided to bring on board part of Operationalize Your World. vRealize Operations 6.4 was the first release where we replaced the bulk of the existing dashboards. It was released in Nov 2016, and the feedback was very positive. Since then I have been privileged to get involved with the releases, giving feedback as basically Customer[0].

By this time, Sunny had moved to Palo Alto. That changed a lot of things for me, and I benefited from that close partnership. In life, Sunny gave me an experience that 1+1=3. Each of us will have a solution, and after some fight, we end up with a 3rd and better solution.

In June 2017, I was given the chance to spend 2 weeks with R&D. It gave me the chance to meet more developers. Their eagerness to make the products better, and most importantly how they treated me like a member of the family, convinced me that this is where I wanted to focus. Since then, I’ve been back 2 more times, for a total of 4 weeks. All were kindly paid for by CMBU. Yes, they really treated me like a member of the team.

At VMworld 2017, Product Marketing got me to speak at both the US and Europe events. That was my first time meeting EMEA customers. Glad to know Operationalize Your World was resonating. In fact, it resonated better than in the US.

I got a chance to participate in the 6.5, 6.6 and 6.7 releases. My main focus was on the ability to customise. If you compare 6.7 vs 6.4, you notice it’s easier to work with the widgets. They have better controls, and look more pleasing too. You also have far fewer metrics, hence it’s easier to know what to pick. We also added a lot of properties.

In March 2018, R&D invited me to do a Take 3. It is a 3-month secondment where I was part of the Product Management team. Upon completion of the Take 3, they worked with my CS management to transfer me. I’m grateful to my management, who gave their blessing and did the transfer with my interest at heart.

Throughout all these years, the feedback from customers and partners has been clear: vR Ops and Log Insight are useful to them, and they want to use vRealize even more. At the end of the day, it is this assurance from them that made me jump into the PM role. I’m blessed to have met probably a thousand customers since 1.0 in 2011. Collectively, they educate me, using their production environments as real examples. Their feedback shapes my thinking, and gives me clear guidance on where we should take the products.

The Rise and Fall of Infrastructure Architect

I’ve been in IT for almost 2.5 decades. We are fortunate to experience a once-in-a-lifetime journey of technology change. Technology has changed both work and life. Business now runs on IT, and what we know as banks, airlines and telcos practically depend on IT. Within IT, applications run on infrastructure. This infrastructure has improved so drastically that it has become a commodity. With the arrival of cloud computing, it has become a utility too. When something becomes both a commodity and a utility, the people who know it follow as a consequence. The value of the Infrastructure Architect has diminished, as the technology has become good enough, simple enough, and cheap enough for most cases. Granted, mega infrastructure such as AWS and VMware on AWS is complex. But how many of us are working there?

Most of us aren’t doing this mega infrastructure. Most businesses have <10K VMs. At a 25:1 consolidation ratio, that’s <400 ESXi hosts. At around 12 ESXi hosts per cluster, that’s just 36 clusters, including HA. Space-wise, it will occupy just ~10 racks. 1000 VMs per rack for all compute + storage + network is doable.
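The back-of-envelope sizing above can be sketched in a few lines of Python. This is only an illustration: the function name and parameters are made up for this example, and how many hosts per cluster are set aside for HA is an assumption (the post only says the cluster count includes HA), so the cluster figure lands in the mid-30s rather than on one exact number.

```python
import math


def estimate_footprint(vm_count, vms_per_host=25, hosts_per_cluster=12,
                       ha_hosts_per_cluster=0, vms_per_rack=1000):
    """Rough sizing using the ratios quoted in the post.

    ha_hosts_per_cluster is a hypothetical knob: the number of hosts per
    cluster reserved for HA and therefore not counted as carrying workload.
    """
    usable_hosts = hosts_per_cluster - ha_hosts_per_cluster
    if usable_hosts <= 0:
        raise ValueError("a cluster needs at least one host carrying workload")

    workload_hosts = math.ceil(vm_count / vms_per_host)
    clusters = math.ceil(workload_hosts / usable_hosts)
    racks = math.ceil(vm_count / vms_per_rack)
    return {"workload_hosts": workload_hosts, "clusters": clusters, "racks": racks}


no_ha = estimate_footprint(10_000)
with_ha = estimate_footprint(10_000, ha_hosts_per_cluster=1)
print(no_ha)    # 400 hosts, 34 clusters, 10 racks
print(with_ha)  # 400 hosts, 37 clusters, 10 racks
```

Either way, the point stands: a 10K VM environment is a few dozen clusters in about 10 racks.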

Compared with say 10 years ago, it’s much easier to architect and operate a VMware environment with just 10K VMs. It’s easier because there are many reference architectures, such as VMware Validated Design and VMware Cloud Foundation. For those using VMware on AWS, the design, implementation, upgrade and support are done by VMware.

So what can you do as an Infrastructure Architect to progress your career?

If you are not moving into a managerial or sales position, you need to add skills that are valued by the CIO or the Business. That means non-technical skills, as these folks care less about technical matters. The following diagram shows the career progression:

Since Infrastructure is becoming a service, you need to know how to architect a service (e.g. IaaS, DBaaS, Desktop as a Service).

  • What are the services the IaaS is providing? How do you define a service?
  • What metrics do you use to quantify its quality?
  • How many services? How do you distinguish between a higher class of service and a normal one?

You also need to know what types of services are in demand. Yes, this requires you to go out and meet your customers. Understand their requirements. What Price/Performance is in demand? From there, you can architect the corresponding services.

I cover this in more detail in Purpose-driven Architecture, so I won’t repeat it.

Done reading it? Great!

The next step after Service Architect is Business Architect. This is especially valuable to the CIO, who runs the business of IT. It’s also important to a Cloud SP, whose business is actually selling the service.

For a start, know the business you are in. Below are the 2 main models. Be clear on the nuance, as Internal IT is morphing towards internal Cloud Provider.

As a Business Architect, you not only know the cost of running the service, but also know how & when to break even. You do not have to be responsible for P&L, as you’re not the CIO or Cloud SP CEO, but you play a strategic role for them. You’re not merely a techie. You know what to price and how to price, and your price is competitive.
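To make "how & when to break even" concrete, here is a minimal sketch of the arithmetic in Python. The cost model (a fixed monthly cost, a flat price per VM, a variable cost per VM, and an upfront investment) and every number in it are assumptions for illustration only, not figures from any real environment or from vRealize.

```python
import math


def break_even_vms(fixed_cost_per_month, price_per_vm, variable_cost_per_vm):
    """How many VMs must be sold per month before the service stops losing money."""
    margin = price_per_vm - variable_cost_per_vm
    if margin <= 0:
        raise ValueError("price must exceed variable cost to ever break even")
    return math.ceil(fixed_cost_per_month / margin)


def months_to_recover(upfront_cost, fixed_cost_per_month, price_per_vm,
                      variable_cost_per_vm, vm_count):
    """When the upfront investment is paid back; None if it never is."""
    monthly_profit = vm_count * (price_per_vm - variable_cost_per_vm) - fixed_cost_per_month
    if monthly_profit <= 0:
        return None
    return math.ceil(upfront_cost / monthly_profit)


# Hypothetical numbers: 50K fixed cost per month, 120 price and 20 variable
# cost per VM per month, 300K upfront investment, 800 VMs sold.
print(break_even_vms(50_000, 120, 20))                  # 500
print(months_to_recover(300_000, 50_000, 120, 20, 800))  # 10
```

The first function answers "how" (the volume you need at a given price), the second answers "when".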

The world of Cost and Price is a complex one. vRealize comes with a tool to help you manage this part.

Summary

  • Systems Architect needs to evolve, as infrastructure is becoming a commodity and a utility.
  • Service Architect and Business Architect are the next steps for Infrastructure Architect.

The next post, VCDX meets VCOX, discusses this further.

SDDC Operations Dashboards for SMB environment

This post continues from the Operationalize Your World post. Do read it first so you get the context.

The SMB segment is a world of its own. There are things that are mandatory in the Enterprise segment, but not relevant in the SMB segment. As a result, products should be tailored for that market segment.

IMHO, there are actually 4 different market segments when it comes to SDDC Operations. I use the number of VMs as the marker for each segment. Each of the following segments requires different dashboards and reports:

  1. 100 VM
  2. 1000 VM
  3. 10000 VM
  4. 100000 VM

Now, it will be difficult to create a product with 4 sets of vROps dashboards & reports. I make a compromise on the above, and use this one instead:

  1. 400 VM: SMB market
  2. 4000 VM: Enterprise market
  3. 40000 VM: <give me a name here folks> market

I hope the above is acceptable. As the above ranges are very wide, I’d take the following reference points:

  1. SMB market: 250 VM
  2. Enterprise market: 2500 VM
  3. Huge Private Cloud: 25000 VM
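If you want to place your own environment against these reference points, one simple way is to pick the nearest one on a logarithmic scale, since the segments are a factor of 10 apart. This is just one possible reading of the segments, sketched in Python; the function name is made up for the example.

```python
import math

# Reference points from the list above: segment name -> typical number of VMs.
REFERENCE_POINTS = {
    "SMB": 250,
    "Enterprise": 2500,
    "Huge Private Cloud": 25000,
}


def nearest_segment(vm_count):
    """Return the segment whose reference point is closest on a log10 scale."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    return min(
        REFERENCE_POINTS,
        key=lambda name: abs(math.log10(vm_count) - math.log10(REFERENCE_POINTS[name])),
    )


print(nearest_segment(300))     # SMB
print(nearest_segment(3000))    # Enterprise
print(nearest_segment(40000))   # Huge Private Cloud
```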

Let’s dive into the 250 VM segment. What are the unique characteristics?

  • 1-2 guys doing everything. No silos in the team. You and your best friends take care of the whole darn IT.
  • You only have a few clusters. Each cluster only has a few ESXi hosts.
  • You know your environment very well because it’s small. They all fit into 1 rack. Architecture is simple. You have a mental picture of it in your head.
  • You don’t buy hardware or VMware every quarter. Likely it’s every 2 years. Capacity planning and monitoring are simple.
  • The workload is quite stable. You are not adding/removing/changing VM every day.
  • Service Tier is overkill as you only have 1-2 clusters for all workloads.

Which of the above points apply to a large environment?

You are right. None.

As a result, SMB needs purpose-built dashboards. They cover the following:

  1. Availability
  2. Performance
  3. Capacity
  4. Reclaimable Capacity
  5. Compliance
  6. When a VM Owner complains

Home

Your main dashboard. It’s the first dashboard you check, likely on a daily basis as part of your cadence. It answers the no 1 question: is everything healthy?

This is what it looks like in vR Ops 6.3. I’ve added explanations so you can easily see that it’s layered into 4 areas.

[Screenshot: Home dashboard]

Availability

The first element of Health is Availability. If a VM or ESXi is down, there is no need to talk about performance or capacity as the damn thing is dead 🙂

The Availability dashboard gives you detailed info. You can answer questions such as “When did it go down? For how long?”

[Screenshot: Availability dashboard]

The dashboard is also useful when you need to report uptime. You do need to create a report and customize it though. If you need it, email me your requirements.

Performance

Just because something is up does not mean it’s fast. The Performance dashboard provides the info here. The dashboard sports the new concept of Performance, which you can review here. It does not apply a formal SLA, as that’s not applicable in SMB. Even without an SLA, you can use it to prove your innocence, or justify a new hardware purchase.

Line charts are used because a performance problem might have started earlier, or it may no longer be happening and you’re doing a root cause analysis.

If the performance issue is caused by a villain VM, the dashboard lets you find the VM. Change the timeline in the Top-N widget to the time when the performance problem occurred.

BTW, if you like the ability to find out which VM was causing the problem, send your thank you to Matthew Hurley.

Capacity

Generally speaking, a performance problem happens because demand is not being met by supply. The Capacity dashboard gives detailed info on the supply side. As there are only a few clusters, capacity management is much simpler.

[Screenshot: Capacity dashboard]

Notice it takes into account performance.

If you mix Prod and Non Prod, capacity management becomes harder. Since the hardware is shared, we need to monitor at the overall cluster level. Since the Production VMs have a more stringent SLA, naturally their numbers reflect that. As a result, we need to show Prod and Non Prod differently. Let me know if you need it, as to me that complicates operations. This is another reason why I advocate separate clusters for Prod and Non Prod.

One common issue in a virtual environment is VM sprawl. Some of these VMs end up not being used. You can reclaim CPU, RAM and Disk from these VMs.

  • The easiest to reclaim is from orphaned VMs, as they are not even registered in vCenter.
  • The second easiest is snapshots. You should only keep a snapshot for 1 day or less.

Once the above is reclaimed, you need to look at Powered Off VMs and Idle VMs.

  • CPU and RAM are reclaimed from running VMs, as powered off VMs are no longer consuming the resource.
  • CPU: reclaim from large VMs (e.g. 8 vCPU or more). Avoid reclaiming from 2 vCPU VMs unless you’ve completed the large VMs.
  • RAM: reclaim from large VMs (e.g. 16 GB RAM or more) that have Guest OS metrics. They’re more accurate than hypervisor metrics.
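The reclamation order above can be sketched as a small prioritisation routine in Python. Treat it as an illustration under assumptions: the `VM` record and its fields are hypothetical (not vR Ops object names), the thresholds are the examples quoted above, and limiting CPU/RAM reclamation to idle running VMs is my reading of the bullets.

```python
from dataclasses import dataclass


@dataclass
class VM:
    # Hypothetical inventory record for this sketch.
    name: str
    vcpu: int
    ram_gb: int
    orphaned: bool = False
    snapshot_age_days: float = 0.0  # 0 means no snapshot
    powered_on: bool = True
    idle: bool = False
    has_guest_metrics: bool = False


def reclaim_plan(vms):
    """Return (priority, vm_name, action) tuples, easiest reclamation first."""
    plan = []
    for vm in vms:
        if vm.orphaned:
            plan.append((1, vm.name, "delete orphaned VM"))
            continue
        if vm.snapshot_age_days > 1:
            plan.append((2, vm.name, "remove snapshot older than 1 day"))
        if not vm.powered_on:
            # Powered off VMs hold disk only; no CPU or RAM to reclaim.
            plan.append((3, vm.name, "review powered off VM"))
        elif vm.idle:
            if vm.vcpu >= 8:
                plan.append((4, vm.name, "reduce vCPU"))
            if vm.ram_gb >= 16 and vm.has_guest_metrics:
                plan.append((4, vm.name, "reduce RAM"))
    return sorted(plan)


vms = [
    VM("big-idle", 8, 32, idle=True, has_guest_metrics=True),
    VM("ghost", 2, 4, orphaned=True),
    VM("snap", 2, 4, snapshot_age_days=3),
    VM("off", 4, 8, powered_on=False),
    VM("small-idle", 2, 4, idle=True),
]
for priority, name, action in reclaim_plan(vms):
    print(priority, name, action)
```

Small idle VMs such as `small-idle` are deliberately left alone until the large ones are done.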

The Reclaimable dashboard lists all the VMs that have been idle or powered off. It also lists the orphaned VMs and large snapshots.

[Screenshot: Reclaimable dashboard]

Configuration

If you configure the vSphere hardening guide, and your Infra and VMs comply with it, you will see all green in the dashboard below. If not, you can see exactly which VM or infra is not complying. You can customize the default threshold, although it’s better that you customize the symptoms & alerts instead.

You can see compliance for Network and vCenter too, under the vSphere Compliance widget. There is a drop-down there that is not shown.

IaaS

Last but not least, your job is actually about making sure the VM is being served well. It’s a service. Your customers don’t care about your infrastructure. So when they complain that their VM has a problem, you need a dashboard that quickly proves whether the problem is at your end or their end. TTI is not Time to Investigate, but Time to Innocence 😉

The Troubleshoot a VM dashboard is built exactly for that!

[Screenshot: Troubleshoot a VM dashboard]

This dashboard is quite long, as it lets you check the underlying ESXi and datastore. You can collapse the widgets, as shown below, to see more.

[Screenshot: Troubleshoot a VM dashboard with widgets collapsed]

Hope you find the material useful. If you do, go back to the Main Page. It gives you the big picture so you can see how everything fits together. If you already know how it all fits, you can go straight to download here.