Monthly Archives: September 2017

WaveFront – How to deploy and use – PART I

This blog is contributed by my friend Luciano Gomes, a VMware Technical Account Manager in the Rio de Janeiro area, Brazil. Thank you, Lucky!

Today, I am super excited to write about Wavefront, a new acquisition by VMware. I’ve been playing with it, and it’s been awesome!

First, what is Wavefront?

It is a cloud-hosted service, not a product you install on your premises. You send your time-series (metric) data to it from CollectD, StatsD, JMX, Ruby’s logger, AWS, basically anything.

You can then perform arbitrary mathematical operations on any number of those series, render charts to see anomalies or KPI dashboards, and ultimately create truly intelligent alerts to proactively watch over your entire stack.

It scales seamlessly, it’s reliable, it’s feature complete, and the support is great. It’s everything you have always wanted but never had. Find more details here.

How to use Wavefront

I will share instructions on how you can set up your environment and start monitoring it today. Yeah, today! Wavefront is a SaaS solution, so you only need to set up a proxy on-premises and connect it to Wavefront in the cloud.

Step 0: Prerequisites

  1. Set up your account in Wavefront (you can use a trial here).
  2. A Linux (or Windows) machine to be used as the Wavefront proxy (it needs Internet access).
  3. A Linux or Windows machine that you want to monitor (it needs Internet access, at least for installing the agent).
    • If you want to see application monitoring, like a MySQL database, you need this machine.
    • If you want to explore integrations, like Zabbix (yeah, we have it OOTB), you need this machine.

Step 1: Install Proxy

Log in to your Wavefront account, click Browse -> Proxies, choose Linux, and copy the code.

Go to the Linux machine you chose as your Wavefront proxy and paste the code (remember, it needs Internet access).

Tip: to avoid errors, paste the code as one single line, like below:

sudo bash -c "$(curl -sL <install-script-URL>)" -- install --proxy --wavefront-url <your-Wavefront-URL> --api-token 09090099-7405993030033-a403904930907d

If you are already root, you don’t need sudo.

After the installation completes, go back to Browse -> Proxies and check that your proxy is listed.

If the proxy does not appear, check /var/log/wavefront/wavefront.log to verify the installation.

Step 2: Install Agents

Once your proxy is set up, it’s time to install an agent on the machines you want to monitor.

Click Integration, and then click Linux Host.

After that, click Setup

If your Wavefront Proxy is not reachable with the hostname that appears in the code, change it before pasting.
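If you are unsure whether the agent machine can reach the proxy, a quick TCP check before running the installer can save you a failed install. The sketch below is a minimal Python example, assuming only that the proxy listens on port 2878 (the port used in the agent command in this post). `proxy_reachable` is a hypothetical helper, not part of Wavefront, and a successful connection only shows that something is listening on that port, not that it is a healthy Wavefront proxy.

```python
import socket


def proxy_reachable(host: str, port: int = 2878, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it does not speak the Wavefront protocol.
    """
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, refused connections and timeouts all derive from OSError.
        return False
```

For example, `proxy_reachable("ahutchings")` returns False if that name does not resolve from the agent machine or the port is closed; in that case, put a reachable hostname or IP address into the command before pasting it.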

Tip: put the code on a single line, like below:

bash -c "$(curl -sL <install-script-URL>)" -- install --agent --proxy-address ahutchings --proxy-port 2878

After this command completes successfully, click the Metrics tab and check that your metrics are available in the awesome dashboard.

This concludes the first part of a series of blog posts about Wavefront.

Hope you find it useful. Do reach out via LinkedIn and Twitter. Thanks for reading!

VMworld and your presentation skills

The VMworld scoring system is a proven measurement. It has been battle tested for years, and I find it a great reflection of your presentation skills. I explained the system here. While there are cultural differences between US and European audiences, the ratings are very consistent. You get what you deserve to get.

It’s hard to get a high score at VMworld. You’re presenting on the world stage, so expectations are high. If you are not from the US or Europe, you’re presenting to people from a different culture. So if you get a high score, you can be proud of your presentation skills and hard work.

Don’t be obsessed with the final score itself. Use the details to improve your presentation skills; they are a great measure of them.

My presentations at the last 4 VMworlds (2015, 2016, 2017 US, 2017 Europe) scored from 4.3 to 4.8. Having done more than 10 presentations there, I have a good feel for what the audience values.

The website to review the survey and feedback is slow. It also does not render properly. Luckily, you can export the details to Excel. Here are the areas assessed:

I highlighted in red the areas that need improvement in my case.

  • As a presenter, I get too excited about the product, as I’m deeply involved in it. The audience loves to see a passionate presentation, but I need to be extra sensitive that it does not come across as selling. One attendee actually gave me a “strongly disagree”. Now, the feedback system is anonymous, so you cannot reach out to the person to find out why.
  • The audience loves technical depth. It’s hard to compress everything into 1 hour while remaining technical. I thought I had given it enough technical depth. From the feedback, it’s clear they want more.

Green marks the areas where the score is >4.8. I use 4.8 as the benchmark that I’ve done well, and I need to ensure I maintain it.

  • Audience wants to be respected and engaged. I encourage them to move to the front seats, ask questions, and provide product feedback before the session starts. Normally, you have a good 10 minutes before the actual start time. I use it for Q&A, sending a signal that I value their questions more than my presentation. If I run a small workshop and have more time, I may even extend the Q&A until the audience is happy.
  • Audience wants someone who truly knows the subject. If you’re presenting a product, it helps if you are directly involved in the product design. They want to know they can give direct feedback to the person doing the work.
  • Audience wants to be entertained. Let’s face it: datacenter infrastructure is dry and boring. We love the technology, but we’d rather watch movies and play games than sit in a windowless room watching a presentation. I use humour a lot. Having done hundreds of presentations, I know entertainment complements education.
  • Audience wants something practical that they can apply. If you download my deck, you’ll notice there is not a single marketecture slide. I don’t even dwell on the product editions, as that’s something I can mention in passing. Every slide has to be about the audience.

The audience also provides raw feedback. This is important, as they have to type it rather than simply select from a drop-down, so I value it more than the score. Some type from their mobiles, so you get short feedback. Here are some of them. The audience certainly appreciates your effort and rewards you well:

  • e1 was the best speaker I have experienced in all the VMworlds I have attended. He was so excellent that you needed an extra option on the rating scale.
  • Most interesting session I have ever been to after 3 years of VMworld.
  • Best session I have attended
  • Best session overall.
  • This is possibly the best presentation I’ve seen this year. There is a big gap of useful vRealize demos
  • Presenter was excellent (e1)

There is no shortage of generic tips on how to deliver a great presentation. Here is a short post I wrote on it. That’s what I follow, and I hope it works for you too.

May the force be with you in your presentation.


Capacity Management: it’s not what you think!

If you struggle with Capacity Management, then you’ve approached it with the wrong understanding. The issue is not your technical skills. The issue is that you don’t look at it from your customers’ viewpoint.

Let’s check your technical skills if you don’t trust me 😊

  1. Can you architect a cluster where the performance matches physical? Easy, just don’t overcommit.
  2. Can you architect a cluster that can handle monster VMs? Easy, just get lots of cores per socket.
  3. Can you architect for very high availability? Easy, just have more HA hosts, a higher vSAN FTT, and more failure domains.
  4. Can you architect a cluster that can run lots of VMs? Easy, just get lots of big hosts.
  5. Can you optimize the performance? Sure, follow performance best practices and configure for performance.
  6. Can you squeeze the cost? Sure, minimize the hardware and CPU sockets, and choose the best bang for the buck. You know all the vendors and their technology. You know the pros and cons of each.

You see, it’s not your technical skills. It’s how you present your solution. Remember this?

“Customers want it good, cheap, and fast. Let them pick any 2”

In the IaaS business, this translates into:

  • Good = high performance, high availability, deep monitoring.
  • Cheap = low $$
  • Fast = soon. How quickly you can get the service.

You want high performance at a cheap price? Wait until the next generation of Xeon and NVM arrives.

In IaaS, it is a service. Customers should not care about the underlying hardware model and architecture. Whether you’re using NSX or not, they should not and do not care.

So, present the following table. It provides 4 sample tiers for CIO & customers to choose from. Tell them the hardware & software are identical.

You should always start your presentation by explaining Tier 1. That’s the tier they expect for performance. They want it as good as physical. Give customers what they want to hear, or they will go to someone else’s cloud (e.g. Amazon or Azure).

Tier 1 sports a performance guarantee. This is only possible because you do not overcommit. To the VM, it’s as good as running alone on the box. There is no contention, no need for reservations, and every VM can run at 100% all day long.

What’s the catch?

Obviously, just like a first-class seat, Tier 1 is expensive. It’s suitable only for latency-sensitive apps.

Show them the price for Tier 1. If they are happy, end of discussion. You architect for Tier 1, as that’s the requirement. If your customers want to fly first class, you should not stop them.

What if VM owners want something much cheaper and don’t mind a small drop in performance?

You then offer Tier 2 and Tier 3. Explain that you can cut the cost by any discount they want, but the overcommit ratio has to match it. If they want a 50% discount, it’s a 2:1 overcommit. If they want a 67% discount, it’s a 3:1 overcommit. It’s that simple.

Any fresh IT graduate can do the above 🙂 There is no need for a seasoned IT professional with one or two decades of combat experience.
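The discount-to-overcommit rule is just 1 - 1/N for an N:1 overcommit, measured against Tier 1 at 1:1. A minimal Python sketch, assuming price scales with the share of physical capacity each vCPU gets (the function names are hypothetical); exact fractions show that a 67% discount is really 2/3.

```python
from fractions import Fraction


def discount_for_overcommit(ratio: Fraction) -> Fraction:
    """Discount vs Tier 1 (1:1) implied by an N:1 vCPU overcommit: 1 - 1/N.

    Assumes price scales with the share of physical capacity per vCPU.
    """
    if ratio < 1:
        raise ValueError("overcommit ratio must be >= 1 (1:1 is Tier 1)")
    return 1 - 1 / ratio


def overcommit_for_discount(discount: Fraction) -> Fraction:
    """Inverse: the overcommit ratio needed to offer a given discount."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 / (1 - discount)


for n in (1, 2, 3):
    d = discount_for_overcommit(Fraction(n))
    print(f"{n}:1 overcommit -> {float(d):.0%} discount")
```

This prints 0%, 50% and 67% for 1:1, 2:1 and 3:1. It only describes the price side; how much performance actually drops is a separate question.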

Your professionalism comes in here: performance does not drop as much as the price does. You can give a 50% discount with a performance drop of less than 50%.

How is that possible?

Two factors drive demand: VM size and VM utilization.

You control the VM size. By not allowing monster VMs in the lower tiers, the IaaS has a higher chance of giving good performance to everyone.

BTW, this is also your solution to avoid over-provisioning in the first place.

From experience, we know VMs don’t run at 100% most of the time. This low utilization, combined with controlled VM size, helps deliver good performance.

So we know that at a 2:1 overcommit, the performance degradation will not be 50%. But what will it be? 10%? 30%?

BTW, 10% means that the resource is not available immediately 10% of the time. It does not mean that it’s never available. It’s just that there is a latency in getting the resource (e.g. CPU).

We can’t predict what the degradation will be, as it depends on the total utilization of the VMs, which is not in your control. However, we can monitor the degradation experienced by each VM.
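To see why the degradation depends on total utilization rather than on the overcommit ratio alone, here is a deliberately simplified Python model, not how vSphere computes CPU Contention. It counts the sampling intervals in which the VMs’ combined demand exceeds the host’s capacity, which matches the reading of “10%” above as the resource not being immediately available 10% of the time. The utilization figures are made up for illustration.

```python
from typing import Sequence


def contended_fraction(vm_demand: Sequence[Sequence[float]], capacity: float) -> float:
    """Share of sampling intervals in which the VMs' combined CPU demand exceeds capacity.

    vm_demand[i][t] is the demand of VM i in interval t, in physical cores.
    Toy model only: it ignores scheduler details, HT and per-VM fairness.
    """
    intervals = list(zip(*vm_demand))  # one tuple of per-VM demand per interval
    if not intervals:
        raise ValueError("need at least one interval of demand data")
    contended = sum(1 for demand in intervals if sum(demand) > capacity)
    return contended / len(intervals)


# Made-up data: four 4-vCPU VMs (16 vCPU) on an 8-core host, i.e. 2:1 overcommit.
# Each VM mostly uses 1-2 cores; interval 2 is a burst where two VMs peak together.
vms = [
    [1, 2, 4, 1, 1, 2, 1, 1, 2, 1],
    [1, 1, 4, 2, 1, 1, 1, 2, 1, 1],
    [2, 1, 2, 1, 1, 1, 2, 1, 1, 1],
    [1, 1, 1, 1, 2, 1, 1, 1, 1, 2],
]
print(f"Contended {contended_fraction(vms, capacity=8):.0%} of the time")
```

With this data, 16 vCPU on 8 cores is contended in only 1 of 10 intervals, so it prints `Contended 10% of the time`. Raise the VMs’ utilization and the same 2:1 ratio gets worse, which is why you monitor degradation rather than predict it.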

This is where you tell CIO: “What is your comfort level?”

Now, we don’t know the impact on the application when there is latency in the infrastructure. That depends on the application. Even with identical software, e.g. SQL Server 2016, the impact may differ, as it depends on how you use that software. Different kinds of business workload (e.g. batch vs OLTP) are impacted differently, even on an identical version of the software.

The good thing is we’re not measuring the application. We are measuring infrastructure. Infrastructure that takes the shape of a service (meaning VM owners don’t really care about the spec) cannot be measured by the hardware spec, as that’s irrelevant. So you track how well each VM is served by the IaaS instead.

For a Tier 1 VM, what the VM gets will be very close to what it wants. For example, CPU Contention will be below 0.3%, while Memory Contention will simply be 0%. Disk Latency may be 5 ms (you need to specify a value, as it can’t be 0 ms).

A Tier 3 VM, on the other hand, will have a weaker SLA. The CPU Contention may be 15% (you decide with the CIO), and the Disk Latency may be 40 ms (again, this is a business decision).

An SLA isn’t useful if it’s not tracked per VM. You track it every 5 minutes for every single VM. This is part of Operationalize Your World.
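A per-VM SLA check boils down to comparing each 5-minute sample with its tier’s thresholds. The Python sketch below uses the example numbers above (Tier 1: CPU Contention 0.3%, Memory Contention 0%, Disk Latency 5 ms; Tier 3: CPU Contention 15%, Disk Latency 40 ms). No Memory Contention figure is given for Tier 3, so it is left unset. `TierSla`, `Sample` and `breaches` are hypothetical names, not part of vRealize Operations or Operationalize Your World, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TierSla:
    cpu_contention_pct: float                   # max CPU Contention per sample
    disk_latency_ms: float                      # max Disk Latency per sample
    mem_contention_pct: Optional[float] = None  # None: no target set for this tier


# Example thresholds; in practice these are business decisions.
SLAS: Dict[str, TierSla] = {
    "tier1": TierSla(cpu_contention_pct=0.3, disk_latency_ms=5.0, mem_contention_pct=0.0),
    "tier3": TierSla(cpu_contention_pct=15.0, disk_latency_ms=40.0),
}


@dataclass(frozen=True)
class Sample:
    """One 5-minute measurement for one VM."""
    vm: str
    cpu_contention_pct: float
    mem_contention_pct: float
    disk_latency_ms: float


def breaches(sample: Sample, sla: TierSla) -> List[str]:
    """Return the metrics in this sample that exceed the tier's thresholds."""
    failed = []
    if sample.cpu_contention_pct > sla.cpu_contention_pct:
        failed.append("cpu_contention")
    if sla.mem_contention_pct is not None and sample.mem_contention_pct > sla.mem_contention_pct:
        failed.append("mem_contention")
    if sample.disk_latency_ms > sla.disk_latency_ms:
        failed.append("disk_latency")
    return failed


samples = [
    Sample("web01", cpu_contention_pct=0.2, mem_contention_pct=0.0, disk_latency_ms=3.1),
    Sample("db01", cpu_contention_pct=0.9, mem_contention_pct=0.0, disk_latency_ms=6.5),
]
for s in samples:
    print(s.vm, breaches(s, SLAS["tier1"]) or "within SLA")
```

Run against every VM every 5 minutes, the list of breaches per VM is what you report back against the SLA you agreed with the CIO.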

I talked earlier about controlling demand by limiting the VM size. You specify the VM size limit for each tier. For example, no VM in Tier 2 may span a physical socket, as such a VM would impact the hypervisor scheduler. A customer who wants a monster VM is more than welcome to move to a higher tier. You do not act as gatekeeper or government. It’s their money, and they don’t appreciate you playing parent.
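The size limit per tier is easy to encode as a policy check at request time. This Python sketch assumes the limit is “fits within one physical socket” counted in cores (whether hyper-threads count is not stated), applies it to Tier 2 and below (only Tier 2 is given as the example, so extending it to Tier 3 is an assumption), and leaves Tier 1 uncapped. The 14-core socket is hypothetical.

```python
from typing import Optional


def max_vcpus(tier: int, cores_per_socket: int) -> Optional[int]:
    """Largest VM (in vCPU) allowed in a tier; None means no cap.

    Assumed policy: Tier 1 is uncapped (monster VMs belong there), while Tier 2
    and lower tiers must fit within one physical socket.
    """
    if tier < 1 or cores_per_socket < 1:
        raise ValueError("tier and cores_per_socket must be positive")
    return None if tier == 1 else cores_per_socket


def size_allowed(tier: int, vcpus: int, cores_per_socket: int) -> bool:
    """True if a VM with `vcpus` fits the size limit of the requested tier."""
    cap = max_vcpus(tier, cores_per_socket)
    return cap is None or vcpus <= cap


# Hypothetical host with 14-core sockets.
for tier, vcpus in [(1, 16), (2, 16), (2, 8)]:
    verdict = "allowed" if size_allowed(tier, vcpus, cores_per_socket=14) else "move up a tier"
    print(f"Tier {tier}, {vcpus} vCPU: {verdict}")
```

Note the check never refuses a request outright; an oversized VM is simply pointed at a higher tier, in line with letting customers spend their own money.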

How do you encourage right-sizing?

By money.

Not by “we try to save company money” motherhood talk. This is business, and both the Apps team and the Infra team are professionals. Avoid playing the government in internal IT. The Application Team expects to be treated as customers, not just colleagues.

Develop a pricing model that makes it compelling for them to go small. Use this as a best practice:

The above uses Discount and Tax. As a result, it’s much cheaper to go small: a 32 vCPU VM costs 32x as much as a 4 vCPU VM, not 8x.
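The pricing table itself isn’t reproduced here, but the effect of Discount and Tax can be sketched with a simple curve. The Python example below is one hypothetical way to get the stated ratio (a 32 vCPU VM costs 32x a 4 vCPU VM): price grows with the 5/3 power of VM size. It is not the author’s model, and the base price of 100 is an arbitrary unit.

```python
def vm_price(vcpus: int, price_4vcpu: float) -> float:
    """Hypothetical price curve anchored on a 4 vCPU VM.

    price = price_4vcpu * (vcpus / 4) ** (5 / 3)
    The exponent 5/3 is picked only because it reproduces the stated ratio:
    a 32 vCPU VM is 8x the size of a 4 vCPU VM, and 8^(5/3) = 32.
    Below 4 vCPU the per-vCPU rate falls (discount); above it, it rises (tax).
    """
    if vcpus < 1:
        raise ValueError("vcpus must be positive")
    return price_4vcpu * (vcpus / 4) ** (5 / 3)


for n in (1, 2, 4, 8, 16, 32):
    total = vm_price(n, price_4vcpu=100.0)
    print(f"{n:>2} vCPU: total {total:8.1f}, per vCPU {total / n:6.1f}")
```

The per-vCPU column is what makes going small compelling: it rises steadily with VM size, so the same total vCPU count is cheaper when split into smaller VMs.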

The above gets you the model. How about the actual price?

In business, there is no point if you can’t put a price on it.

Bad news: your price has been set by the leading cloud players (AWS and Azure). It’s a commodity business using commodity software and hardware. DC co-location prices and admin salaries are pretty similar too.

All this means prices can’t differ that much. To use the airline analogy, prices among airlines are similar too.

Here is the price from AWS (as of 31 July 2017).

I use the M4 series, as it gives a balanced combination of CPU and RAM. Other series are cheaper, but they use older hardware and do not provide a balanced combination.

From the above, I take the 1-year contract, 100% paid up front, for comparison. In Enterprise IT, you may get budget annually, and the budget can be transferred up front at the start of the fiscal year.

The price above covers compute only; it excludes storage and network. It also excludes support, monitoring, reporting, guest OS updates, backup, and security.

It includes: DC facility rent, IaaS software + maintenance.

How do you calculate your price from the above? Take a comparable configuration and make it apples to apples.

I took 50 VMs with 4 vCPU and 25 VMs with 8 vCPU, and calculated the 3-year price.

To convert to a private cloud, use a 2:1 overcommit. Note that AWS counts each hyper-thread (HT) as a core.

Based on the above, you can see the AWS price is high, as it works out to more than $100K per ESXi host.

To determine your VM cost, start by determining your total cost. I set it at half of the AWS price, as I still think that’s reasonable: $396K for 7 ESXi hosts still gives you room for IT team salaries, DC colocation, etc.
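The conversion from AWS instances to ESXi hosts can be sketched in Python. This is one possible reading, not a stated method: since AWS counts each hyper-thread as a vCPU, the AWS vCPUs are compared with host threads after applying the 2:1 overcommit. The host spec (2 sockets x 8 cores with HT, 32 threads) is hypothetical, as none is given; with it, the 400 vCPU mix happens to need 7 hosts, the same count used for the cost comparison in this post.

```python
import math
from typing import Dict


def total_vcpu(vm_mix: Dict[int, int]) -> int:
    """Sum of vCPUs for a mix given as {vcpus_per_vm: vm_count}."""
    return sum(size * count for size, count in vm_mix.items())


def hosts_needed(vm_mix: Dict[int, int], overcommit: float, threads_per_host: int) -> int:
    """ESXi hosts needed, comparing AWS vCPUs (hyper-threads) with host threads.

    Ignores HA spare capacity, memory, and per-host limits.
    """
    if overcommit <= 0 or threads_per_host <= 0:
        raise ValueError("overcommit and threads_per_host must be positive")
    threads_required = total_vcpu(vm_mix) / overcommit
    return math.ceil(threads_required / threads_per_host)


# The mix from this post: 50 VMs x 4 vCPU + 25 VMs x 8 vCPU = 400 vCPU.
mix = {4: 50, 8: 25}
# Hypothetical host: 2 sockets x 8 cores x 2 threads = 32 threads.
print(total_vcpu(mix), hosts_needed(mix, overcommit=2.0, threads_per_host=32))
```

This prints `400 7`: 400 vCPU at 2:1 needs 200 threads, or 6.25 hosts of 32 threads, rounded up. Plug in your own host spec and overcommit ratio.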
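The arithmetic behind these figures is easy to rerun with your own numbers. This Python sketch only uses values from this post: 7 ESXi hosts, 400 vCPU (50 x 4 + 25 x 8), a 3-year term, and a private cloud price of $396K at half of AWS, which implies an AWS total of about $792K (inferred, not stated). The per-vCPU figure is a flat split for illustration, not a recommended price, and `cost_breakdown` is a hypothetical helper.

```python
from typing import Dict


def cost_breakdown(aws_3yr: float, share_of_aws: float, hosts: int,
                   total_vcpu: int, months: int = 36) -> Dict[str, float]:
    """Split a 3-year private cloud budget, set as a share of the AWS price."""
    private_3yr = aws_3yr * share_of_aws
    return {
        "aws_per_host": aws_3yr / hosts,
        "private_3yr": private_3yr,
        "private_per_host": private_3yr / hosts,
        # Flat split across all vCPUs, before any size-based discount or tax.
        "private_per_vcpu_month": private_3yr / total_vcpu / months,
    }


# AWS 3-year total inferred: $396K is "half of AWS", so roughly $792K.
costs = cost_breakdown(aws_3yr=792_000, share_of_aws=0.5, hosts=7, total_vcpu=400)
for name, value in costs.items():
    print(f"{name}: ${value:,.2f}")
```

With these inputs, AWS works out to about $113K per host, consistent with “more than $100K per ESXi”, and the private cloud to $27.50 per vCPU per month.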

The above gives you your price for the equivalent AWS or Azure machine.

You should run your own calculation. Use your own overcommit ratio and VM size.

Once done, take this price to your internal customers and have a pricing discussion. When you order food at a restaurant, does the price matter to you?

As the architect of the platform, you know the value of your creation best.

I hope this blog gives you food for thought. Capacity Management does not start when the VM is deployed, or when the hardware is purchased. It starts much earlier than that. Go back to that point, as that’s how you get out of the never-ending right-sizing and arguments over capacity.