
How to resignature or force mount a snapshot LUN

In some situations, ESX will detect a LUN as a snapshot. This is not related to the VM snapshot functionality; it is a condition at the storage level, whether the storage system is external or internal.

What happens here is the following:

  1. When ESX creates a new VMFS, it takes the LUN identifier (NAA, T10, etc.) and a randomly generated UUID and puts them in the metadata of the new filesystem.
  2. Every time we try to mount the filesystem, we compare the LUN identifier coming from the storage system to the one we have in our metadata:
    1. If the LUN Ids match, we mount.
    2. If the LUN Ids do NOT match, we refuse to mount and declare a snapshot LUN condition.
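
The check in step 2 boils down to a string comparison. A minimal sketch in shell, purely for illustration: mount_decision is a hypothetical helper, not an ESX command, the NAA values are made up, and the real check happens inside ESX itself.

```shell
# Hypothetical helper: illustrates the decision only; not part of ESX.
# $1 = LUN identifier stored in the VMFS metadata
# $2 = LUN identifier currently reported by the storage system
mount_decision() {
    if [ "$1" = "$2" ]; then
        echo "mount"
    else
        echo "snapshot"
    fi
}

# Made-up identifiers: same ID on both sides, then a changed ID.
mount_decision naa.600a0b80001111 naa.600a0b80001111   # prints: mount
mount_decision naa.600a0b80001111 naa.600a0b80002222   # prints: snapshot
```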

Several things can cause this condition, including:

  1. RAID rebuild
  2. Firmware upgrade on the array that causes the NAAs to change
  3. Presentation changes on your storage
  4. The LUN is a replica of another LUN, i.e. an actual snapshot LUN.

One thing I noticed is that a LUN can be in snapshot condition for a very long time without anyone noticing, until something forces a reboot and then the fun begins. So even if you need to know when it happened, it might not be possible to find out.

Regardless of what caused it, this page is intended to help you get out of this situation.

Up until ESX 3.5, the only method we had was to resignature the LUN, meaning modifying the metadata and putting the new NAA in there. This has interesting side effects: resignaturing also changes the UUID of the filesystem, so you end up with a whole bunch of VMs in an inaccessible state. That is easily solved by unregistering and re-registering the VMs.

Starting with 4.x, however, VMware introduced a new mechanism where we can force-mount the LUNs using the esxcfg-volume command.

The procedure is as follows:

  1. Find which volumes are in snapshot condition: esxcfg-volume -l
  2. Force mount, either temporarily (the mount does not persist across a reboot): esxcfg-volume -m VMFS_NAME, or permanently: esxcfg-volume -M VMFS_NAME
  3. Alternatively, resignature the LUN: esxcfg-volume -r VMFS_NAME
  4. You can resignature a force-mounted LUN as well, but you have to unmount it first and then resignature.
  5. Or do it from the vSphere Client GUI: run “Add Storage” and select either “Keep existing signature” or “Create new signature”. It does the same thing but for all the ESX hosts in the cluster, which might be easier since esxcfg-volume operates on a single ESX host.
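
As a rough sketch of steps 2 and 3 on a single host, the commands can be wrapped so that they are printed before anything is actually run. The run wrapper, the DRY_RUN variable, the handle_snapshot_volume function and the datastore name datastore1 are all assumptions made for illustration; only the esxcfg-volume flags come from the steps above.

```shell
# DRY_RUN=1 (the default here) only prints commands; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# $1 = action: mount | mount-persistent | resignature
# $2 = VMFS label or UUID as shown by `esxcfg-volume -l`
handle_snapshot_volume() {
    case "$1" in
        mount)            run esxcfg-volume -m "$2" ;;
        mount-persistent) run esxcfg-volume -M "$2" ;;
        resignature)      run esxcfg-volume -r "$2" ;;
        *) echo "unknown action: $1" >&2; return 1 ;;
    esac
}

# datastore1 is a made-up label; in dry-run mode this only prints the command.
handle_snapshot_volume mount-persistent datastore1
```

Reviewing the printed commands first is cheap insurance, since a resignature is not something you want to run against the wrong volume.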

If you want to do it from the command line, for example because you have a large number of LUNs in that condition, here is a small one-liner that will do it for you. If you are doing a force mount, you will need to run it on each ESX host in the cluster, but it will still save you a lot of time.

for i in $(esxcfg-volume -l | grep VMFS | cut -d ' ' -f 3 | cut -d '/' -f 2); do esxcfg-volume -M "$i"; done
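
If some labels contain spaces, working with the UUID half of the same field is a bit safer. The sketch below pulls the parsing step into a function so it can be tried on captured output first. It assumes, as the one-liner above does, that each volume shows up on a line of the form "VMFS... UUID/label: <uuid>/<label>"; the function name and the sample values are made up.

```shell
# Reads `esxcfg-volume -l` style output on stdin and prints one VMFS UUID per line.
# Assumes lines like "VMFS3 UUID/label: <uuid>/<label>" (format inferred from the one-liner).
snapshot_uuids() {
    grep VMFS | cut -d ' ' -f 3 | cut -d '/' -f 1
}

# Try it on made-up sample output before pointing it at a real host:
printf '%s\n' \
    'VMFS3 UUID/label: 4a0c1d2e-11111111-2222-001122334455/datastore1' \
    'Can mount: Yes' \
    'Can resignature: Yes' |
    snapshot_uuids

# On a host, something like:
#   esxcfg-volume -l | snapshot_uuids | while read -r u; do esxcfg-volume -M "$u"; done
```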

Enjoy

-nick

Update

With the introduction of esxcli and localcli, which replace multiple esxcfg-* commands, the same operations can be done with the following instead:

  1. Get the list: localcli storage vmfs snapshot list
  2. Mount “-m”: localcli storage vmfs snapshot mount --no-persist --volume-label=VMFS_NAME
  3. Mount “-M”: localcli storage vmfs snapshot mount --volume-uuid=VMFS_UUID
  4. Resignature: localcli storage vmfs snapshot resignature --volume-label=VMFS_NAME
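
The bulk force mount can be sketched with the newer commands too. This assumes the snapshot list output contains one "VMFS UUID: <uuid>" line per volume, which is worth confirming on your build before relying on it; the function name and the sample text are illustrative only.

```shell
# Reads `localcli storage vmfs snapshot list` style output on stdin and
# prints the value of each "VMFS UUID:" line (assumed output format).
snapshot_list_uuids() {
    sed -n 's/^[[:space:]]*VMFS UUID:[[:space:]]*//p'
}

# Made-up sample output to check the parsing:
printf '%s\n' \
    '4a0c1d2e-11111111-2222-001122334455' \
    '   Volume Name: datastore1' \
    '   VMFS UUID: 4a0c1d2e-11111111-2222-001122334455' \
    '   Can mount: true' |
    snapshot_list_uuids

# On a host (persistent mount, the equivalent of -M), something like:
#   localcli storage vmfs snapshot list | snapshot_list_uuids |
#       while read -r u; do localcli storage vmfs snapshot mount --volume-uuid="$u"; done
```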

-nick

vmware/resignlun.txt · Last modified: 2014/09/10 15:36 by naccad