My web site was down and I called GoDaddy. Over the past few days, I worked with 4-5 engineers (I think they were all different). My problem is solved, but I basically solved it myself. The reason for calling Tech Support is to have a technical issue addressed.
I sincerely hope that GoDaddy reviews the recordings after reading this. This is by no means a complaint about GoDaddy. I plan to stay with GoDaddy for the foreseeable future. It’s feedback as part of a long-term business relationship.
All in all, I sincerely hope GoDaddy improves its customer service. There are a few areas that I hope GoDaddy management will review:
- Reduce the pressure on support engineers to end the call. From the calls, it is obvious they are measured on how fast they close a call. A better way would be to measure customer satisfaction at the end of the call.
- Expert-level training. The engineers are knowledgeable. I use the phrase “expert-level” because some problems require both broader and deeper knowledge. The breadth matters because sometimes the real problem is not what it appears to be.
- Give the engineers time to solve the real problem. As part of the support process, they need time to build context on the situation. I had to make my questions specific, meaning I was doing the overall troubleshooting myself. I can’t imagine the situation for customers who do not understand how Internet infrastructure works.
- Be polite. I was having a bad day and yet I remained polite. We are all human beings. One of the engineers was rude, which is actually bad for his own health.
As I said, I plan to continue with GoDaddy. This one incident does not constitute an overall weakness. I think they have one of the best services, at a price I consider reasonable. I use both their domain and hosting services.
This blog post is adapted from my book, titled VMware vRealize Operations Performance and Capacity Management. It is published by Packt Publishing. You can buy it at Amazon or Packt.
Let’s elaborate on peaks. How do you define peak utilization or contention without being overly conservative or aggressive?
There are 2 dimensions of peaks. You can measure them across time or members of the group. Let’s take a cluster with 8 ESXi hosts as an example:
- You measure across the members of the group. For each sample period, take the utilization from the host with the highest utilization. In our cluster example, let’s say at 1:30 pm, host number 7 has the highest utilization among all hosts, hitting 80%. We then take the cluster peak utilization at 1:30 pm to be 80%. You repeat this process for each sample period. You may get different hosts at different times, so you will not know which host provided the peak value, as that varies from time to time. This method results in over-reporting, as it takes the peak of a single member. You could technically argue that this is the true peak.
- You measure across time. You take the average utilization of the cluster, roll the data up to a longer time period, and take the peak of that longer period. For example, say the cluster average utilization peaks at 80% at 1:30 pm, and you roll the data up to 1 day. The peak utilization for that day is then 80%. This is the most common approach. The problem with this approach is that the value is still an average. For the cluster to hit 80% average utilization, some hosts have to exceed 80%, so you can’t rule out the possibility that one host came near 100%. The same logic applies within a VM: if a VM with 16 vCPUs hits 80% utilization, some cores probably hit 100%. This method results in under-reporting, as it is an average.
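To make the difference concrete, here is a small Python sketch of both roll-ups. The utilization samples are made up purely for illustration; they stand in for the 5-minute samples you would actually pull from your monitoring tool:

```python
# Made-up 5-minute utilization samples (percent) for a cluster of
# 4 hosts. Rows = sample periods, columns = hosts.
samples = [
    [40, 55, 60, 35],   # 1:00 pm
    [50, 70, 65, 45],   # 1:05 pm
    [45, 80, 50, 40],   # 1:10 pm
]

# Approach 1: peak across members. For each sample period, take the
# busiest host. Retains the 5-minute granularity; tends to over-report.
peak_across_members = [max(row) for row in samples]
# -> [60, 70, 80]

# Approach 2: peak across time. Average the cluster per sample period,
# then take the peak of those averages over the whole window. Loses
# granularity; tends to under-report because each value is an average.
cluster_avg = [sum(row) / len(row) for row in samples]
peak_across_time = max(cluster_avg)
# -> 57.5 (while a member actually hit 80)
```

Note how the second approach reports 57.5% even though one host hit 80%, which is exactly the under-reporting described above.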
The first approach is useful if you want detailed information, since you retain the 5-minute granularity. With the second approach, you lose that granularity and each sample becomes one day (or one month, depending on your timeline), so you do not know at what time of day the peak occurred. The first approach will also produce a higher average than the second, because in most cases your cluster is not perfectly balanced (identical utilization on every host). In a tier 1 cluster, where you do not oversubscribe, I’d recommend the first approach as it will capture the host with the highest peak. The first approach can be achieved with super metrics in vRealize Operations; the second requires the View widget with data transformation.
Does this mean you should always use the first approach? No. It can be too aggressive when the number of members is high. If your data center has 500 hosts and you use the first approach, your overall data center peak utilization will always be high: all it takes is one host hitting a peak at any given time. The same applies to contention. All it takes is one big VM, which tends to have higher contention, to skew the peak contention figure for the cluster.
The first approach fits use cases where automatic load balancing should happen, so you expect an overall balanced distribution. A DRS cluster is a good example.
I had the above situation on 2 of the 3 vCenter Appliances in the lab. All 3 are running the latest 5.5 updates. I found a useful article here, which then links to this great article.
As you can see below, the Coredumps partition is full; it shows 100% usage.
To empty it, simply log in to the appliance via SSH. I use Bitvise, a great utility, in the example below.
From there, just browse to the directory where we need to delete the files. As shown below, I have a lot of files that I no longer require.
It’s a matter of deleting them. The result is a clean directory 🙂
And the core dump now shows a healthy usage.
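The steps above can be sketched in Python. This is only an illustrative helper, demonstrated against a scratch directory rather than the real appliance; the actual coredump directory varies by appliance version, so verify the path on your own system (and that the dumps are no longer needed) before deleting anything:

```python
import os
import tempfile

def purge_dir(path):
    """Delete regular files in `path` and return space freed in MB.

    Illustrative only -- on a real vCenter Appliance, confirm the
    coredump directory first and that VMware Support no longer
    needs the dump files before removing them.
    """
    freed = 0
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            freed += os.path.getsize(full)
            os.remove(full)
    return freed / (1024 * 1024)

# Demo against a temporary directory, not the appliance:
demo = tempfile.mkdtemp()
with open(os.path.join(demo, "core.1234"), "wb") as f:
    f.write(b"\0" * 1024 * 1024)    # fake 1 MB core file

freed_mb = purge_dir(demo)
print(freed_mb)                      # -> 1.0
print(os.listdir(demo))              # -> [] (directory now clean)
```

The same effect on the appliance itself is simply deleting the files in the full directory over SSH, as shown in the screenshots above.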
Speaking of vCenter, I’d recommend you deploy the vCenter Support Assistant. You should also upgrade your vCenter independently of your ESXi. I have provided some articles on vSphere 6.0, which I hope you find useful:
- How to redirect the vSphere 6.0 Platform Services Controller log
- vSphere 6 Update 1 appliance installation error
- vSphere 6 enhancements. A tour of the web client
- Features that are now global (cross vCenter Servers) in vSphere 6