David Andino
2014-05-25 22:37:35 UTC
Hello everyone,
I am having serious trouble with something that happened recently, and this is our story.
We have 12 nodes and 1 engine server. Our infrastructure is based on iSCSI and NFS (for the Export and ISO domains). Recently my partner made some mistakes on our NFS server, and we noticed in the engine that all of our nodes were contending for the SPM and never agreed on which one would take it. The logs pointed to metadata corruption in the NFS domains. After hours of this contention, we tried to stabilize the cluster.
To stabilize the cluster, we shut down our NFS server, but nothing changed: the cluster was still contending for the SPM. After many tests (putting the nodes in maintenance mode, rebooting, shutting down the whole cluster, etc.), the last thing we did was destroy the Export and ISO domains from the configuration (using the web interface), leaving the Data Domain intact, in an attempt to bring our guests back online. That was (I think) our big mistake, because now our Data Domain will not come online: vdsm on every node is still looking for the NFS domains we already deleted.
We used vdsClient to query the information vdsm has, and it reports 1 Storage Pool and 3 Storage Domains: our iSCSI domain and the 2 NFS domains that were deleted.
We left only one node active, with the rest in maintenance, and the message is that it is still contending for SPM. The vdsm.log says that the node cannot find the NFS storage domains. We tried to delete these domains using vdsClient, but it says the node must hold the SPM to do that, and the node cannot take the SPM because it cannot find the NFS domains, so we are stuck in a circle.
So the question is: is there any way to delete these domains from the vdsm configuration? What do I have to do to break this circle, so that the node stops contending for the SPM and our Data Domain activates again?
I appreciate any comments and help you can share with me.
Regards
David