Understanding Capacity in vSAN

Overview

Recently a customer posed a question about what they were seeing in vSAN when they moved a virtual machine over to a vSAN datastore.  Essentially, they were seeing vCenter report the VMDK as consuming more space than they had expected.  I began to compose a response, but then realized I might be missing some things in my explanation myself, so I took the opportunity to leverage our Hands-On-Labs to do a little experimenting.

By the way, if you don’t already use the HOL, it’s a great resource for learning, experimenting, and understanding behavior. There are a ton of products you can play with, and it is all available through your web browser: https://hol.vmware.com

So I wanted to answer the following questions for my customer:

  • How does the reporting of free / used space differ between vCenter and the guest operating system of a virtual machine?
  • What is the difference – if any – between the utilization of capacity between a normal datastore and a vSAN datastore?
  • What should we expect to see when the protection scheme used by vSAN changes?

I used the following methodology to answer these questions – screen shots and descriptions follow in subsequent sections:

  1. Create a new virtual machine on an NFS datastore, using thin provisioning.
    1. Compare reported disk utilization between guest operating system and vCenter as a baseline.
  2. Create a large file in the guest.
    1. Compare reported disk utilization between guest operating system and vCenter.
  3. Delete the large file in the guest.
    1. Compare reported disk utilization between guest operating system and vCenter.
  4. Migrate the guest from NFS to vSAN.
    1. Compare vSAN space utilization with previous NFS.
    2. This test leveraged the default vSAN Storage Policy of FTT=1 / RAID 1.
  5. Change the policy to FTT=1 / RAID5.
    1. Compare vSAN space utilization against RAID 1.
  6. Change the policy to FTT=2 / RAID6.
    1. Compare vSAN space utilization against RAID 1.

New VM on NFS

I began by downloading the vSphere OVA for Photon OS (found here:  https://vmware.github.io/photon/). I deployed it to a datastore mounted over NFS to my cluster in the HOL, ensuring it was thin provisioned.  I took screen shots of the datastore listing while powered off, and then again while powered on.

You can see from the above images that the provisioned VMDK is 15.625 GB, but it is only consuming about 556 MB after being powered on.

Impact of Large File in Guest

Once the virtual machine was booted and running, I logged into the guest file system to run a quick df -h command to compare what I was seeing in vCenter with what the virtual machine thinks is going on.

[Screenshot: df -h output in the Photon OS guest]

You can see from the Used column that the virtual machine thinks there is about 406-407MB in use, so right off the bat there is a discrepancy of about 150MB between vCenter and the virtual machine itself.  Some of this might be chalked up to disk geometry, as well as perhaps some VMDK file headers, and so on (note – I have a request into VMware to see if I can get better detail on what makes up that difference).
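
If you would rather capture the guest-side numbers from a script than read them off df -h, a minimal sketch in Python follows. It assumes Python 3 is available in the guest (not something I verified on the Photon OVA), and the function name guest_usage_mb is just an illustrative choice.

```python
import shutil

MIB = 1024 * 1024


def guest_usage_mb(path="/"):
    """Return (total, used, free) in MB for the filesystem that holds `path`.

    This is the guest's view of the disk; vCenter's view of the VMDK
    has to be read separately from the datastore browser.
    """
    usage = shutil.disk_usage(path)
    return tuple(round(v / MIB, 1) for v in (usage.total, usage.used, usage.free))


if __name__ == "__main__":
    total, used, free = guest_usage_mb("/")
    print(f"total={total} MB  used={used} MB  free={free} MB")
```

Comparing the used figure here with the VMDK size shown in vCenter gives the same kind of gap described above.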

Then I went ahead and created a 1GB binary file inside the guest; I wanted to see if that 150 MB discrepancy was simply overhead in the VMDK itself, or if it was compounded once the operating system began to chew up space.  I then compared the space reported in the guest with the space reported by vCenter. As you can see in the screen shots below:

  • the guest shows a 1GB file
  • vCenter shows the VMDK is now 1,565,720 KB (about 1.5GB), having grown by about 992,832 KB (just shy of 1GB).
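
The unit conversions behind those two bullets can be checked with a few lines of Python. This is only a worked-arithmetic sketch using the KB figures quoted above, assuming vCenter's KB are 1024-byte units; the helper names are mine.

```python
KB_PER_MB = 1024
KB_PER_GB = 1024 * 1024


def kb_to_mb(kb):
    return kb / KB_PER_MB


def kb_to_gb(kb):
    return kb / KB_PER_GB


vmdk_after_kb = 1_565_720   # VMDK size in vCenter after creating the 1GB file
growth_kb = 992_832         # growth reported for the VMDK
vmdk_before_kb = vmdk_after_kb - growth_kb  # implied size before the file existed

print(f"after:  {kb_to_gb(vmdk_after_kb):.2f} GB")
print(f"growth: {kb_to_gb(growth_kb):.2f} GB ({kb_to_mb(growth_kb):.0f} MB)")
print(f"before: {kb_to_mb(vmdk_before_kb):.0f} MB")
```

The implied starting size lands within a few MB of the roughly 556 MB seen right after power-on, so the growth is essentially the 1GB file and nothing more.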

I then deleted the 1GB file, and though the VM shows used space shrinking, the VMDK does not. 

A df -h command within the guest shows used space returning to ~407MB, yet vCenter continues to show ~1.5GB.

What’s going on here?

Essentially, vSphere can only tell whether a block has been touched, not whether it is still in use. In the example above, the VMDK grows by 1 GB, as one would expect.  vSphere recognizes that those blocks have been claimed by the guest, and allocates them to the guest as appropriate.  Even though we then deleted the file, effectively releasing 1GB of space inside the guest, all vSphere knows is that the guest has touched those blocks, not that they have since been released.  Unless you run tools to both release the file space within the virtual machine and then reclaim the space on the datastore, vSphere still counts that 1GB as consumed.  vSAN will automatically reclaim space when a virtual machine is deleted (unlike VMFS datastores on FC), but if you are looking to reclaim space that has been released within a guest, some extra steps are necessary.
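
To make the "touched versus in use" distinction concrete, here is a toy model in Python. It is only an illustration of the behavior described above, not how VMFS or vSAN is actually implemented; the ThinDisk class and its reclaim method are hypothetical names, and the block counts are arbitrary.

```python
class ThinDisk:
    """Toy thin-provisioned disk: tracks what the guest uses vs. what is allocated."""

    def __init__(self, provisioned_blocks):
        self.provisioned_blocks = provisioned_blocks
        self.allocated = set()  # blocks the hypervisor has backed with real storage
        self.in_use = set()     # blocks the guest filesystem considers live

    def guest_write(self, blocks):
        blocks = set(blocks)
        if any(b < 0 or b >= self.provisioned_blocks for b in blocks):
            raise ValueError("write outside the provisioned size")
        self.in_use |= blocks
        self.allocated |= blocks  # first touch allocates the block

    def guest_delete(self, blocks):
        # Only the guest's own bookkeeping changes; the hypervisor is not told.
        self.in_use -= set(blocks)

    def reclaim(self):
        # Stand-in for the "extra steps": release blocks the guest no longer uses.
        self.allocated &= self.in_use


disk = ThinDisk(provisioned_blocks=16_000)
disk.guest_write(range(0, 400))       # operating system files
disk.guest_write(range(400, 1_424))   # the large test file
disk.guest_delete(range(400, 1_424))  # delete it inside the guest

print(len(disk.in_use), len(disk.allocated))  # guest view vs. datastore view
disk.reclaim()
print(len(disk.in_use), len(disk.allocated))
```

Until reclaim runs, the guest's count drops while the allocated count stays where the large file left it, which is the pattern seen in the screen shots.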

Migrate VM to vSAN

I then powered off the virtual machine (to ensure the guest operating system wasn’t doing anything I needed to account for), and migrated the VM to vSAN.  In the screen captures below, the “Virtual SAN Default Storage Policy” is in effect, which protects virtual machines by keeping mirrored replicas of the VMDK on 2 ESXi hosts.

Once the virtual machine was migrated, I checked its properties:

[Screenshot: Photon VM settings after migration to vSAN]

You can see from the properties that the VM is now listed as consuming 3.22 GB, but is allocated 31.25 GB.  This is further confirmed by examining the vSAN datastore:

You can see the placement of the replica copies on hosts 1 and 6, and the datastore now lists the size of the VMDK as 3375104KB, or 3.2 GB.  This is only about 200 MB larger than 2x the individual VMDK when it was homed on the NFS datastore.  This is most likely due to the witness component, which contains a small amount of metadata to identify the hosts participating in the RAID1 mirror for the VMDK.

Change vSAN Policy to RAID5

vSAN 6.2 introduced Erasure Coding, which essentially allows us to treat the hosts in a cluster as members of a disk pool and perform something like RAID5 or RAID6 protection on the virtual machines. These protection schemes are exposed as policies, which may be applied to individual virtual machines or to groups of them.  Since we never want to risk data loss, if you change the policy applied to a virtual machine, vSAN will first apply the new protection scheme to the VM before eliminating the previous protection structures.  This means that during the re-protection period, the affected virtual machines will effectively consume the capacity required for both the original protection scheme AND the new protection scheme.  For that reason, VMware suggests customers leave approximately 30% free space on the vSAN datastore.
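
As a rough sketch of what that means for capacity planning, the Python below adds the old and new footprints to get the temporary peak. The two KB values are the RAID 1 and RAID 5 sizes reported for this VM elsewhere in this post; the function names and the simple "old plus new" model are my own simplification, not a vSAN formula.

```python
RAID1_KB = 3_375_104  # footprint reported under FTT=1 / RAID 1
RAID5_KB = 2_355_200  # footprint reported under FTT=1 / RAID 5


def peak_footprint_kb(old_kb, new_kb):
    """Capacity held while both the old and the new protection structures exist."""
    return old_kb + new_kb


def has_headroom(free_kb, new_kb):
    """True if the datastore can hold the new structures before the old ones go away."""
    return free_kb >= new_kb


peak = peak_footprint_kb(RAID1_KB, RAID5_KB)
print(f"peak during policy change: {peak:,} KB ({peak / 1024 / 1024:.2f} GB)")
```

For a single small VM the peak is trivial, but the same sum applied to every VM covered by a policy change is why the free-space guidance matters.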

In the screens below, you can see that I created a new Storage Policy leveraging FTT=1 / RAID5, which I then applied to the virtual machine.  vSAN created the new protection structures while still protecting the virtual machine with RAID 1.

You can see in the settings of the virtual machine that it predicts capacity utilization will drop by 10.4 GB of allocated space and 1.06 GB of used space.  Sure enough, in the final 2 shots above, you can see that the virtual machine has now been protected across 4 hosts, and the capacity utilization reported by vCenter has dropped to 2355200KB, or 2.25GB.

  • the original VMDK size was 1.5GB
  • it grew to 3.2 GB on vSAN, protected by RAID1
  • it is now 2.25GB on vSAN, protected by RAID5.
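
The predicted 10.4 GB drop in allocated space can be reproduced with a little arithmetic. The sketch below assumes a RAID 5 layout of 3 data components plus 1 parity component, which is consistent with the 4 hosts shown above but is my assumption rather than something the screen shots state; allocated_gb is an illustrative helper.

```python
PROVISIONED_GB = 15.625  # provisioned size of the VMDK


def allocated_gb(provisioned_gb, data_parts, redundancy_parts):
    """Allocated capacity when `redundancy_parts` are added per `data_parts` of data."""
    return provisioned_gb * (data_parts + redundancy_parts) / data_parts


raid1 = allocated_gb(PROVISIONED_GB, data_parts=1, redundancy_parts=1)  # full mirror
raid5 = allocated_gb(PROVISIONED_GB, data_parts=3, redundancy_parts=1)  # assumed 3+1

print(f"RAID 1: {raid1:.2f} GB  RAID 5: {raid5:.2f} GB  drop: {raid1 - raid5:.2f} GB")
```

The RAID 1 figure matches the 31.25 GB allocation reported after the migration, and the difference comes out at roughly the 10.4 GB that the settings dialog predicted.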

Change vSAN Policy to RAID6

Following the same methodology as above, I created a new storage policy for FTT=2 / RAID6.  I then applied the policy to the virtual machine to observe changes in capacity utilization.

As you can see, once again the virtual machine settings warn of a change in utilization – this time an increase in capacity utilization, as we are now demanding the solution be capable of accommodating 2 failures (FTT=2).  Once again, you can see that vSAN maintains the RAID5 protection while it is creating the RAID6 protection.  Finally, once the protection is complete, you can see the storage utilization is 2666496KB, or 2.54GB.

 

  • The original VMDK size was 1.5GB
  • It grew to 3.2 GB on vSAN, protected by RAID1 – slightly more than 2x the original.
    • The extra 200MB is most likely the ‘Witness’ component, used to track the replicas.
  • It shrank to 2.25GB on vSAN, protected by RAID5 – about 1.5x the original.
    • The extra overhead on the VMDK is most likely the ‘parity’ information used for protection.
  • It grew to 2.54GB on vSAN, protected by RAID6 – about 1.7x the original.
    • The extra overhead on the VMDK is most likely the ‘parity’ information used for protection.
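
The ratios in that summary can be recomputed from the KB values reported by vCenter. In the sketch below, the nominal multipliers (2x for RAID 1, 4/3 for a 3+1 RAID 5, 1.5x for a 4+2 RAID 6) are assumptions taken from the commonly documented vSAN layouts, not measurements from this lab, and the "residual" is simply whatever the nominal multiplier does not explain.

```python
BASELINE_KB = 1_565_720  # VMDK size on the NFS datastore

OBSERVED_KB = {
    "RAID 1 (FTT=1)": 3_375_104,
    "RAID 5 (FTT=1)": 2_355_200,
    "RAID 6 (FTT=2)": 2_666_496,
}

# Assumed nominal capacity multipliers for each protection scheme.
NOMINAL = {
    "RAID 1 (FTT=1)": 2.0,
    "RAID 5 (FTT=1)": 4 / 3,
    "RAID 6 (FTT=2)": 1.5,
}


def summarize():
    """Return (policy, observed ratio, residual in MB beyond the nominal multiplier)."""
    rows = []
    for policy, kb in OBSERVED_KB.items():
        ratio = kb / BASELINE_KB
        residual_mb = (kb - NOMINAL[policy] * BASELINE_KB) / 1024
        rows.append((policy, round(ratio, 2), round(residual_mb)))
    return rows


for policy, ratio, residual_mb in summarize():
    print(f"{policy}: {ratio}x of baseline, about {residual_mb} MB beyond nominal")
```

Under these assumptions the unexplained portion stays in a similar range of a few hundred MB for all three policies, which may point to a roughly fixed per-object overhead on top of the mirror or parity data; I have not confirmed the cause.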

Conclusion

While I didn’t perform an in-depth analysis of the storage profiles available to virtual machines in vCenter and vSAN, hopefully you can see the power of using storage profiles to alter the protection and behavior of your virtual machines.  Furthermore, I hope this post serves to help explain how vSAN consumes capacity under different protection schemes, and how to make sense of what you are seeing in vCenter.

 

6 comments

  1. “vSAN will automatically reclaim space when a virtual machine is deleted.” Does this conclusion depend on the vSAN version? I know that vSAN doesn’t support VAAI because it is built into the hypervisor, so there is no need to offload. So I think vSAN does not have the VAAI UNMAP capability.

      1. Can “vSAN will automatically reclaim space when a virtual machine is deleted” be taken as a final conclusion? The storage backend of the HOL environment may not be real vSAN.

  2. Well, in point of fact, vSAN will reclaim space when a virtual machine is deleted. I should be more clear – TRIM and UNMAP when data is deleted from within the VM are not yet supported, however. In other words, as a virtual machine grows, the VMDK grows. If the virtual machine or the VMDK is deleted, then vSAN will reclaim the space. If data is deleted from within the virtual machine, however, we do not yet support automatic reclaim of space if the datastore is vSAN. On the other hand, vSphere 6.5 absolutely supports TRIM and UNMAP when data is deleted from within the virtual machine on a FC or iSCSI datastore.

    As for your final comment – “Storage backend of HOL environment may not be real vSAN” – I’m not sure what you mean here.

  3. You mentioned “but if you are looking to reclaim space that has been released within a guest, some extra steps are necessary”, so what are these extra steps that could help to reclaim the space if it is deleted inside the guest OS?
