Datastore Capacity Management

This post is part of Operationalize Your World. Do read that post first to get the context.

This is the 2nd installment of Storage Capacity Management. The previous post covers the overall storage capacity management, where you can see the big picture and know which datastores are low in capacity. This post drills further and lets you analyze a specific datastore.

Datastore capacity is driven by 2 factors:

  • Performance: If the datastore is unable to serve its existing VMs, are you going to add more VMs? You are right: the datastore is full, regardless of how much space it has left.
  • Utilization: How much capacity is left? Thin provisioning makes this challenging.

This is what the dashboard looks like.

You start by selecting a datastore you want to check. This step is actually optional, as you would have come from the overall dashboard.

When you select a datastore, its Performance and Utilization are automatically shown.

  • Performance
    • Both actual and SLA are shown.
    • You just need to ensure that actual does not breach SLA.
  • Utilization
    • This shows the total capacity, the provisioned capacity (configured to the VM), and what’s actually used (thin provisioned).
    • You want to be careful with thin provisioning, as the VM can consume the space since it’s already allocated to it. The line chart has a 30-day projection to help you plan.

The 2 line charts are all you need. They are simple enough, yet detailed enough, and they give you room to make the judgement call. You can decide to ignore a spike because you know it was a special event.
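To make the idea of a 30-day projection concrete, here is a minimal Python sketch. It assumes a plain least-squares straight line through daily "used space" samples, which is my simplification and not necessarily how the product computes its projection. The function names and the sample numbers are hypothetical.

```python
from typing import Sequence, Tuple


def linear_fit(values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through (0, values[0]), (1, values[1]), ...

    Returns (slope, intercept). Assumes one sample per day.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project_used_gb(daily_used_gb: Sequence[float], days_ahead: int = 30) -> float:
    """Extend the fitted line `days_ahead` days past the last sample."""
    slope, intercept = linear_fit(daily_used_gb)
    last_day = len(daily_used_gb) - 1
    return intercept + slope * (last_day + days_ahead)


if __name__ == "__main__":
    # Hypothetical samples: used space grows by 10 GB per day.
    history = [100.0, 110.0, 120.0, 130.0]
    print(round(project_used_gb(history), 1))
```

A straight line will overreact to a one-off spike, which is exactly why the judgement call above still matters.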

If you want to analyse further, you can see the individual VMs. The heatmap shows the distribution of VMs. Large VMs stand out because their boxes are bigger. You can see if any VM is running out of capacity, or if any VM is wasting its allocated capacity.

The heatmap configuration below shows how it’s done.

You can also check if there are VMs that you can delete. Reclamation will give you extra space. The heatmap has a filter for powered off VMs, so only powered off VMs are shown.

From there, you can drill further to check that the VM has indeed met your Powered Off definition. It’s showing the VM powered off time (%) in the past 30 days. I’ve set the threshold to be 99%. Green means the VM is at least 99% powered off in the past 30 days.
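The powered-off check above boils down to simple arithmetic. The sketch below is only an illustration in Python, not the dashboard's implementation. It assumes you have one boolean sample per collection interval for the past 30 days, and the function names are hypothetical.

```python
from typing import Sequence


def powered_off_percent(samples: Sequence[bool]) -> float:
    """Percentage of samples in which the VM was powered off.

    `samples` holds one boolean per collection interval (True = powered off).
    """
    if not samples:
        raise ValueError("need at least one sample")
    return 100.0 * sum(samples) / len(samples)


def meets_powered_off_definition(samples: Sequence[bool],
                                 threshold_pct: float = 99.0) -> bool:
    """True when the VM was powered off for at least `threshold_pct` of the period."""
    return powered_off_percent(samples) >= threshold_pct


if __name__ == "__main__":
    # Hypothetical VM: powered on for 5 out of 1000 intervals.
    mostly_off = [True] * 995 + [False] * 5
    print(powered_off_percent(mostly_off), meets_powered_off_definition(mostly_off))
```

The 99% default mirrors the threshold I use. Change it to match your own Powered Off definition.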


I hope you agree by now that datastore performance is measured by how well it serves its VMs. We can track this by plotting a line chart showing the maximum storage latency experienced by any VM in the datastore. This maximum has to stay below the SLA you promise at all times.
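As a rough illustration of that logic, the Python sketch below takes a latency series per VM, keeps the worst value in each time slot, and lists the slots where that worst value is not below the SLA. The data layout, function names, and the 30 ms SLA are assumptions for the example, not values from the dashboard.

```python
from typing import Dict, List


def max_latency_series(vm_latency_ms: Dict[str, List[float]]) -> List[float]:
    """For each time slot, return the highest latency seen by any VM.

    Assumes every VM has one sample per slot, in the same order.
    """
    return [max(slot) for slot in zip(*vm_latency_ms.values())]


def sla_breaches(vm_latency_ms: Dict[str, List[float]], sla_ms: float) -> List[int]:
    """Indexes of the time slots where the worst VM latency is not below the SLA."""
    worst_per_slot = max_latency_series(vm_latency_ms)
    return [i for i, worst in enumerate(worst_per_slot) if worst >= sla_ms]


if __name__ == "__main__":
    # Hypothetical latency samples in milliseconds.
    samples = {"vm-a": [5.0, 12.0, 8.0], "vm-b": [7.0, 9.0, 31.0]}
    print(max_latency_series(samples))      # [7.0, 12.0, 31.0]
    print(sla_breaches(samples, sla_ms=30))  # [2]
```

Taking the maximum rather than the average is deliberate: a single VM with poor latency is enough to say the datastore is not serving its VMs well.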

For Utilization, we will plot a line chart showing the disk capacity left in the datastore cluster.

You should be using Datastore Cluster. Other than the benefits that you get from using it, it also makes capacity management easier.

  • You need not manually exclude local datastores.
  • You need not manually group the shared datastores, which can be complex if you have multiple clusters.

With vSAN, you only have 1 datastore per cluster and need not exclude local datastores manually. This means it’s even simpler in vSAN.

Include a buffer for snapshots. This can be 20%, depending on your environment. This is why I’m not a fan of many small datastores, as you end up with pockets of unusable capacity. The buffer does not have to be hardcoded in your super metric, but you have to be mentally aware of it.
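The arithmetic behind the buffer is straightforward. Here is a small Python sketch, assuming the buffer is taken as a flat percentage of total capacity. The function names and the 1000 GB datastore are hypothetical.

```python
def usable_capacity_gb(total_gb: float, snapshot_buffer_pct: float = 20.0) -> float:
    """Capacity left for VMs after reserving a snapshot buffer."""
    if not 0 <= snapshot_buffer_pct < 100:
        raise ValueError("buffer must be between 0 and 100 percent")
    return total_gb * (100.0 - snapshot_buffer_pct) / 100.0


def free_after_buffer_gb(total_gb: float, used_gb: float,
                         snapshot_buffer_pct: float = 20.0) -> float:
    """Space still available once the buffer and current usage are subtracted.

    A negative result means usage has already eaten into the buffer.
    """
    return usable_capacity_gb(total_gb, snapshot_buffer_pct) - used_gb


if __name__ == "__main__":
    # Hypothetical 1000 GB datastore with 650 GB used and a 20% buffer.
    print(usable_capacity_gb(1000.0))
    print(free_after_buffer_gb(1000.0, 650.0))
```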

Super Metrics

The screenshot below shows the super metric formula to get the maximum latency across all the VMs in the cluster. I’ve chosen the Virtual Disk level, so it does not matter whether the storage is VMFS, NFS or vSAN.

super metric - vDisk

You can copy paste the formula below:

Max ( ${adapterkind=VMWARE, resourcekind=VirtualMachine, attribute=virtualDisk|totalLatency, depth=2 } )

The screenshot below shows the super metric formula to get the total disk capacity left in the cluster. This is based on Thin Provisioning consumption.

You can copy paste the formula below:

sum( ${adapterkind=VMWARE, resourcekind=Datastore, attribute=capacity|available_space, depth=1} )

For Thick Provision, use the following super metric:

super metric - Disk - space left in datastore cluster - thick

You can copy paste the formula below:

sum( ${adapterkind=VMWARE, resourcekind=Datastore, attribute=capacity|total_capacity, depth=1} )
-
sum( ${adapterkind=VMWARE, resourcekind=Datastore, attribute=capacity|consumer_provisioned, depth=1} )
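To see how the thin and thick views differ, here is a minimal Python sketch of the same arithmetic outside the product. The `Datastore` class and its field names are hypothetical stand-ins for the metrics, and I am assuming available space equals total capacity minus used space.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Datastore:
    name: str
    total_gb: float        # total capacity
    used_gb: float         # space actually consumed (thin view)
    provisioned_gb: float  # space configured for the VMs (thick view)


def thin_space_left_gb(datastores: Sequence[Datastore]) -> float:
    """Sum of available space, based on what is actually consumed."""
    return sum(ds.total_gb - ds.used_gb for ds in datastores)


def thick_space_left_gb(datastores: Sequence[Datastore]) -> float:
    """Total capacity minus total provisioned space.

    A negative result means the datastores are overcommitted.
    """
    total = sum(ds.total_gb for ds in datastores)
    provisioned = sum(ds.provisioned_gb for ds in datastores)
    return total - provisioned


if __name__ == "__main__":
    # Hypothetical datastore cluster with two members.
    cluster = [
        Datastore("ds-01", total_gb=1000.0, used_gb=400.0, provisioned_gb=900.0),
        Datastore("ds-02", total_gb=2000.0, used_gb=500.0, provisioned_gb=2400.0),
    ]
    print(thin_space_left_gb(cluster))   # 2100.0
    print(thick_space_left_gb(cluster))  # -300.0
```

In this example the thin view still shows plenty of room, while the thick view is already negative. That gap is the risk you carry with thin provisioning.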

Hope you find it useful. Just in case you’re not aware, you don’t have to implement all these manually. You can import this dashboard, together with 50+ others, from this set.
