Phantom Snapshots: Resolving Orphaned Snapshot Issues
Recently, we had an issue with phantom snapshots. We had a VM with multiple snapshot VMDK files (named like vmname_01-000001.vmdk), and by multiple I mean 26 snapshot VMDK files per VM hard disk. It was quite a nightmare and was taking up a lot of space on our datastore. Unfortunately, they did not show up in Snapshot Manager or on our morning snapshot report, so they went unnoticed. Luckily, we caught them and were able to get the issue resolved.
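Since Snapshot Manager and the report both missed these, one way to catch them is to look at the file names on the datastore directly. Here is a minimal Python sketch, assuming you already have a plain list of file names from the VM's folder (how you obtain that listing is up to you). The function names and the threshold are my own placeholders, and the pattern only assumes the six-digit naming shown above, plus the matching -delta.vmdk files that usually sit next to each snapshot descriptor.

```python
import re
from collections import defaultdict

# Matches snapshot disk files such as vmname_01-000001.vmdk and
# vmname_01-000001-delta.vmdk. Base disks (vmname_01.vmdk, vmname_01-flat.vmdk)
# do not match because they lack the six-digit sequence number.
SNAPSHOT_RE = re.compile(r"^(?P<base>.+)-(?P<seq>\d{6})(?:-delta)?\.vmdk$")


def count_snapshot_disks(filenames):
    """Return {base disk name: number of distinct snapshot sequence numbers}."""
    sequences = defaultdict(set)
    for name in filenames:
        match = SNAPSHOT_RE.match(name)
        if match:
            sequences[match.group("base")].add(match.group("seq"))
    return {base: len(seqs) for base, seqs in sequences.items()}


def flag_suspects(filenames, threshold=3):
    """Return base disks with at least `threshold` snapshot files (hypothetical cutoff)."""
    counts = count_snapshot_disks(filenames)
    return sorted(base for base, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    listing = [
        "vmname_01.vmdk",
        "vmname_01-flat.vmdk",
        "vmname_01-000001.vmdk",
        "vmname_01-000001-delta.vmdk",
        "vmname_01-000002.vmdk",
        "vmname_01-000003.vmdk",
    ]
    print(count_snapshot_disks(listing))
    print(flag_suspects(listing))
```

If a disk shows up here but Snapshot Manager shows nothing for that VM, you are probably looking at the same kind of phantom snapshot described in this post.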
The first problem was that we could not ‘Delete the snapshot’ from Snapshot Manager, because it wasn’t listed there. The simple fix was to clone the machine during off hours. The cloning process consolidates the snapshots and leaves only one VMDK file per VM hard disk. Once the Line of Business confirmed the new machine was functional, I proceeded to delete the old VM, thinking my problems were over. Boy, was I wrong.
The delete ran perfectly fine, yet when I went back to check whether the files had actually been removed, a few VMDK files were still there. Every time I tried to delete them from the datastore, I received an error. A quick trip to the blogs and the VMware Community Forums showed I was not alone. Others had experienced this, and the fix was to restart the hostd service on the ESX host that last owned the VM, or to reboot that host. The problem was, I hadn’t noted which host this VM was last on.
Luckily, we have a daily health report that lists all the major tasks performed in the past 24 hours, so my delete was listed along with the host I deleted it from. I evacuated all VMs from the host with the help of DRS and placed the host in maintenance mode. I decided a reboot would be the simplest solution, and any time you get the chance to reboot a host, it’s never a bad thing in my book.
Post reboot, the files could be deleted with no issues, and all was well in the world of VMware here in my data center. I hope having all this information in one spot will help future admins with this issue. Having files eat up disk space is never a good thing, and being able to resolve it quickly is a big help. If you run into this problem in the future, the steps to fix it are:
- Clone your VM to consolidate snapshots
- Note which ESX/ESXi Host your problem VM is running on
- Delete the problem VM from inventory
- Evacuate other VMs from identified ESX/ESXi Host
- Restart the hostd service or reboot the host
- Delete leftover VMDK files from datastore
- Have a Coke and a Smile
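I went with the reboot, so I did not restart hostd myself. For anyone who prefers the restart, here is a small Python sketch that just looks up the console command for each host type. The commands are the ones commonly documented for restarting the management agent (`service mgmt-vmware restart` on classic ESX, `/etc/init.d/hostd restart` on ESXi), but treat them as an assumption and check VMware's documentation for your version before running anything. The helper name and the "esx"/"esxi" labels are mine.

```python
# Commonly documented hostd restart commands, keyed by host type.
# Assumption: verify these against VMware's documentation for your release.
HOSTD_RESTART_COMMANDS = {
    "esx": ["service", "mgmt-vmware", "restart"],    # classic ESX service console
    "esxi": ["/etc/init.d/hostd", "restart"],         # ESXi shell
}


def hostd_restart_command(host_type):
    """Return the command (as an argument list) to restart hostd on the given host type."""
    key = host_type.strip().lower()
    if key not in HOSTD_RESTART_COMMANDS:
        raise ValueError(f"unknown host type: {host_type!r} (expected 'esx' or 'esxi')")
    # Return a copy so callers cannot modify the lookup table.
    return list(HOSTD_RESTART_COMMANDS[key])


if __name__ == "__main__":
    for kind in ("esx", "esxi"):
        print(kind, "->", " ".join(hostd_restart_command(kind)))
```

This only prints the command. Run it on the host's console yourself, ideally after the host is evacuated and in maintenance mode as described above.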