Discussion:
[vdsm] Rv: Re: Problems with vdsm deleted Storages Domains
David Andino
2014-05-26 12:53:52 UTC
Hello Federico,
These are the installed vdsm packages:
# rpm -aq | grep vdsm
vdsm-python-zombiereaper-4.14.6-0.el6.noarch
vdsm-xmlrpc-4.14.6-0.el6.noarch
vdsm-4.14.6-0.el6.x86_64
vdsm-python-4.14.6-0.el6.x86_64
vdsm-cli-4.14.6-0.el6.noarch
vdsm-hook-hostusb-4.14.6-0.el6.noarch
Thread-14::INFO::2014-05-26 07:01:01,287::logUtils::44::dispatcher::(wrapper) Run and protect: getTaskStatus(taskID='45cbd9d4-7a1c-47ff-abb2-92c0dc539554', spUUID=None, options=None)
Thread-14::INFO::2014-05-26 07:01:01,288::logUtils::47::dispatcher::(wrapper) Run and protect: getTaskStatus, Return response: {'taskStatus': {'code': 358, 'message': 'Storage domain does not exist', 'taskState': 'finished', 'taskResult': 'cleanSuccess', 'taskID': '45cbd9d4-7a1c-47ff-abb2-92c0dc539554'}}
Thread-14::INFO::2014-05-26 07:01:01,296::logUtils::44::dispatcher::(wrapper) Run and protect: getSpmStatus(spUUID='00000002-0002-0002-0002-00000000030b', options=None)
Thread-14::INFO::2014-05-26 07:01:01,317::logUtils::47::dispatcher::(wrapper) Run and protect: getSpmStatus, Return response: {'spm_st': {'spmId': -1, 'spmStatus': 'Free', 'spmLver': -1}}
Thread-14::INFO::2014-05-26 07:01:01,388::logUtils::44::dispatcher::(wrapper) Run and protect: clearTask(taskID='45cbd9d4-7a1c-47ff-abb2-92c0dc539554', spUUID=None, options=None)
Thread-14::INFO::2014-05-26 07:01:01,388::logUtils::47::dispatcher::(wrapper) Run and protect: clearTask, Return response: None
Thread-17::ERROR::2014-05-26 07:01:07,404::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain 0ae15b52-821f-438a-9602-2494eac3ac5b
Thread-17::ERROR::2014-05-26 07:01:07,405::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain 0ae15b52-821f-438a-9602-2494eac3ac5b
Thread-17::WARNING::2014-05-26 07:01:07,672::lvm::377::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] ['  Volume group "0ae15b52-821f-438a-9602-2494eac3ac5b" not found']
Thread-17::ERROR::2014-05-26 07:01:07,680::sdc::143::Storage.StorageDomainCache::(_findDomain) domain 0ae15b52-821f-438a-9602-2494eac3ac5b not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'0ae15b52-821f-438a-9602-2494eac3ac5b',)
Thread-17::ERROR::2014-05-26 07:01:07,681::domainMonitor::239::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 0ae15b52-821f-438a-9602-2494eac3ac5b monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 204, in _monitorDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'0ae15b52-821f-438a-9602-2494eac3ac5b',)
Thread-16::ERROR::2014-05-26 07:01:10,508::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain ba26bc14-3aff-48eb-816c-e02e81e5fd63
Thread-16::ERROR::2014-05-26 07:01:10,509::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain ba26bc14-3aff-48eb-816c-e02e81e5fd63
Thread-16::WARNING::2014-05-26 07:01:10,829::lvm::377::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] ['  Volume group "ba26bc14-3aff-48eb-816c-e02e81e5fd63" not found']
Thread-16::ERROR::2014-05-26 07:01:10,836::sdc::143::Storage.StorageDomainCache::(_findDomain) domain ba26bc14-3aff-48eb-816c-e02e81e5fd63 not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'ba26bc14-3aff-48eb-816c-e02e81e5fd63',)
Thread-16::ERROR::2014-05-26 07:01:10,837::domainMonitor::239::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain ba26bc14-3aff-48eb-816c-e02e81e5fd63 monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 204, in _monitorDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'ba26bc14-3aff-48eb-816c-e02e81e5fd63',)
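(The tracebacks above correspond to vdsm's storage-domain cache handing out lazy proxies that only resolve the real domain on use: produce → getRealDomain → _realProduce → _findDomain. As orientation only, here is a simplified, hypothetical Python sketch of that flow; the names mirror sdc.py, but this is not vdsm's actual code.)

```python
class StorageDomainDoesNotExist(Exception):
    """Raised when no backend (LVM VG, NFS dir, ...) matches the UUID."""


class DomainProxy:
    """Lazy handle; resolving the real domain is deferred until use."""
    def __init__(self, cache, sd_uuid):
        self._cache = cache
        self._sdUUID = sd_uuid

    def getRealDomain(self):
        return self._cache._realProduce(self._sdUUID)


class StorageDomainCache:
    def __init__(self, backends):
        # Hypothetical backend map standing in for LVM/NFS discovery.
        self._backends = backends

    def produce(self, sd_uuid):
        domain = DomainProxy(self, sd_uuid)
        # Resolve eagerly so a missing domain fails here, as in the logs.
        domain.getRealDomain()
        return domain

    def _realProduce(self, sd_uuid):
        return self._findDomain(sd_uuid)

    def _findDomain(self, sd_uuid):
        try:
            return self._backends[sd_uuid]
        except KeyError:
            # vdsm logs "domain ... not found" at this point, then raises.
            raise StorageDomainDoesNotExist(sd_uuid)


if __name__ == "__main__":
    cache = StorageDomainCache({"3e7aa4fd": "iscsi-master-domain"})
    try:
        cache.produce("ba26bc14")  # a deleted NFS domain: lookup fails
    except StorageDomainDoesNotExist as exc:
        print("domain monitor would log:", exc)
```

This is why the monitor threads keep erroring: every monitoring cycle calls produce() for the deleted domains and the lookup raises again.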
Thread-14::INFO::2014-05-26 07:01:11,489::logUtils::44::dispatcher::(wrapper) Run and protect: getSpmStatus(spUUID='00000002-0002-0002-0002-00000000030b', options=None)
Thread-14::INFO::2014-05-26 07:01:11,511::logUtils::47::dispatcher::(wrapper) Run and protect: getSpmStatus, Return response: {'spm_st': {'spmId': -1, 'spmStatus': 'Free', 'spmLver': -1}}
Thread-14::INFO::2014-05-26 07:01:11,516::logUtils::44::dispatcher::(wrapper) Run and protect: getAllTasksStatuses(spUUID=None, options=None)
Thread-14::ERROR::2014-05-26 07:01:11,516::task::866::TaskManager.Task::(_setError) Task=`74051611-a84b-43d6-8cb4-5e148dd59eed`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2114, in getAllTasksStatuses
    allTasksStatus = sp.getAllTasksStatuses()
  File "/usr/share/vdsm/storage/securable.py", line 73, in wrapper
    raise SecureError("Secured object is not in safe state")
SecureError: Secured object is not in safe state
Thread-14::INFO::2014-05-26 07:01:11,517::task::1168::TaskManager.Task::(prepare) Task=`74051611-a84b-43d6-8cb4-5e148dd59eed`::aborting: Task is aborted: u'Secured object is not in safe state' - code 100
Thread-14::ERROR::2014-05-26 07:01:11,518::dispatcher::68::Storage.Dispatcher.Protect::(run) Secured object is not in safe state
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/dispatcher.py", line 60, in run
    result = ctask.prepare(self.func, *args, **kwargs)
  File "/usr/share/vdsm/storage/task.py", line 103, in wrapper
    return m(self, *a, **kw)
  File "/usr/share/vdsm/storage/task.py", line 1176, in prepare
    raise self.error
SecureError: Secured object is not in safe state
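(The SecureError above comes from vdsm's "securable object" pattern: storage-pool methods are wrapped so they refuse to run unless the pool is secured, i.e. this host actually holds the SPM. A minimal, hypothetical sketch of that wrapper, modelled loosely on securable.py and not the real implementation:)

```python
from functools import wraps


class SecureError(Exception):
    pass


def secured(method):
    """Allow the call only while the object is in the safe (SPM) state."""
    @wraps(method)
    def wrapper(self, *args, **kwargs):
        if not getattr(self, "_secured", False):
            raise SecureError("Secured object is not in safe state")
        return method(self, *args, **kwargs)
    return wrapper


class StoragePool:
    def __init__(self):
        self._secured = False  # flipped to True once the SPM is acquired

    def setSecure(self):
        self._secured = True

    @secured
    def getAllTasksStatuses(self):
        # Placeholder payload; the real method queries the task manager.
        return {"allTasksStatus": {}}


if __name__ == "__main__":
    pool = StoragePool()
    try:
        pool.getAllTasksStatuses()  # SPM not acquired: rejected
    except SecureError as exc:
        print(exc)
    pool.setSecure()
    print(pool.getAllTasksStatuses())
```

So the error in the log is not itself the bug; it just says the pool never reached the secured state because spmStart kept failing on the deleted domains.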
Thread-14::INFO::2014-05-26 07:01:11,610::logUtils::44::dispatcher::(wrapper) Run and protect: getSpmStatus(spUUID='00000002-0002-0002-0002-00000000030b', options=None)
Thread-14::INFO::2014-05-26 07:01:11,644::logUtils::47::dispatcher::(wrapper) Run and protect: getSpmStatus, Return response: {'spm_st': {'spmId': -1, 'spmStatus': 'Free', 'spmLver': -1}}
Thread-14::INFO::2014-05-26 07:01:11,679::logUtils::44::dispatcher::(wrapper) Run and protect: spmStart(spUUID='00000002-0002-0002-0002-00000000030b', prevID=-1, prevLVER='-1', maxHostID=250, domVersion='3', options=None)
Thread-14::INFO::2014-05-26 07:01:11,681::logUtils::47::dispatcher::(wrapper) Run and protect: spmStart, Return response: None
7b861db2-05c7-44b1-aac7-b4ea018120cf::INFO::2014-05-26 07:01:11,988::clusterlock::184::SANLock::(acquireHostId) Acquiring host id for domain 3e7aa4fd-fb27-47a3-9ddb-d7a97027723d (id: 12)
7b861db2-05c7-44b1-aac7-b4ea018120cf::WARNING::2014-05-26 07:01:11,989::fileUtils::167::Storage.fileUtils::(createdir) Dir /rhev/data-center/mnt/blockSD/3e7aa4fd-fb27-47a3-9ddb-d7a97027723d/images already exists
7b861db2-05c7-44b1-aac7-b4ea018120cf::WARNING::2014-05-26 07:01:11,990::fileUtils::167::Storage.fileUtils::(createdir) Dir /rhev/data-center/mnt/blockSD/3e7aa4fd-fb27-47a3-9ddb-d7a97027723d/dom_md already exists
7b861db2-05c7-44b1-aac7-b4ea018120cf::WARNING::2014-05-26 07:01:12,012::persistentDict::256::Storage.PersistentDict::(refresh) data has no embedded checksum - trust it as it is
Thread-14::INFO::2014-05-26 07:01:12,329::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-14::INFO::2014-05-26 07:01:12,339::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {'code': 0, 'version': 3, 'delay': '0.00607831', 'lastCheck': '4.7', 'valid': True}, {'code': 358, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '1.5', 'valid': False}, {'code': 358, 'version': -1, 'acquired': False, 'delay': '0', 'valid': False}}
7b861db2-05c7-44b1-aac7-b4ea018120cf::INFO::2014-05-26 07:01:12,344::clusterlock::235::SANLock::(acquire) Acquiring cluster lock for domain 3e7aa4fd-fb27-47a3-9ddb-d7a97027723d (id: 12)
Thread-14::INFO::2014-05-26 07:01:12,739::logUtils::44::dispatcher::(wrapper) Run and protect: getTaskStatus(taskID='7b861db2-05c7-44b1-aac7-b4ea018120cf', spUUID=None, options=None)
Thread-14::INFO::2014-05-26 07:01:12,742::logUtils::47::dispatcher::(wrapper) Run and protect: getTaskStatus, Return response: {'taskStatus': {'code': 0, 'message': 'Task is initializing', 'taskState': 'running', 'taskID': '7b861db2-05c7-44b1-aac7-b4ea018120cf'}}
7b861db2-05c7-44b1-aac7-b4ea018120cf::INFO::2014-05-26 07:01:13,347::sp::431::Storage.StoragePool::(_upgradePool) Trying to upgrade master domain `3e7aa4fd-fb27-47a3-9ddb-d7a97027723d`
7b861db2-05c7-44b1-aac7-b4ea018120cf::WARNING::2014-05-26 07:01:13,615::persistentDict::256::Storage.PersistentDict::(refresh) data has no embedded checksum - trust it as it is
Thread-14::INFO::2014-05-26 07:01:13,758::logUtils::44::dispatcher::(wrapper) Run and protect: getTaskStatus(taskID='7b861db2-05c7-44b1-aac7-b4ea018120cf', spUUID=None, options=None)
Thread-14::INFO::2014-05-26 07:01:13,759::logUtils::47::dispatcher::(wrapper) Run and protect: getTaskStatus, Return response: {'taskStatus': {'code': 0, 'message': 'Task is initializing', 'taskState': 'running', 'taskID': '7b861db2-05c7-44b1-aac7-b4ea018120cf'}}
7b861db2-05c7-44b1-aac7-b4ea018120cf::INFO::2014-05-26 07:01:13,878::sd::374::Storage.StorageDomain::(_registerResourceNamespaces) Resource namespace 3e7aa4fd-fb27-47a3-9ddb-d7a97027723d_imageNS already registered
7b861db2-05c7-44b1-aac7-b4ea018120cf::INFO::2014-05-26 07:01:13,878::sd::382::Storage.StorageDomain::(_registerResourceNamespaces) Resource namespace 3e7aa4fd-fb27-47a3-9ddb-d7a97027723d_volumeNS already registered
7b861db2-05c7-44b1-aac7-b4ea018120cf::INFO::2014-05-26 07:01:13,879::blockSD::456::Storage.StorageDomain::(_registerResourceNamespaces) Resource namespace 3e7aa4fd-fb27-47a3-9ddb-d7a97027723d_lvmActivationNS already registered
Thread-228151::ERROR::2014-05-26 07:01:14,188::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain ba26bc14-3aff-48eb-816c-e02e81e5fd63
Thread-228152::ERROR::2014-05-26 07:01:14,191::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain 0ae15b52-821f-438a-9602-2494eac3ac5b
Thread-228151::ERROR::2014-05-26 07:01:14,192::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain ba26bc14-3aff-48eb-816c-e02e81e5fd63
Thread-228152::ERROR::2014-05-26 07:01:14,195::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain 0ae15b52-821f-438a-9602-2494eac3ac5b
7b861db2-05c7-44b1-aac7-b4ea018120cf::WARNING::2014-05-26 07:01:14,201::fileUtils::167::Storage.fileUtils::(createdir) Dir /rhev/data-center/mnt/blockSD/3e7aa4fd-fb27-47a3-9ddb-d7a97027723d/master already exists
Thread-228152::WARNING::2014-05-26 07:01:14,513::lvm::377::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] ['  Volume group "0ae15b52-821f-438a-9602-2494eac3ac5b" not found']
Thread-228151::WARNING::2014-05-26 07:01:14,513::lvm::377::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] ['  Volume group "ba26bc14-3aff-48eb-816c-e02e81e5fd63" not found']
Thread-228152::ERROR::2014-05-26 07:01:14,553::sdc::143::Storage.StorageDomainCache::(_findDomain) domain 0ae15b52-821f-438a-9602-2494eac3ac5b not found
Regards
--------------------------------------------
On Mon, 5/26/14, Federico Simoncelli <fsimonce at redhat.com> wrote:
Subject: [vdsm] Problems with vdsm deleted Storages Domains
To: "David Andino" <david_andino at yahoo.com>
Cc: vdsm-devel at lists.fedorahosted.org
Date: Monday, May 26, 2014, 04:54 AM
Can you paste the exact traceback that you see in the vdsm log?
Also, what vdsm version is it?
Thanks,
--
Federico
----- Original Message -----
From: "David Andino" <david_andino at yahoo.com>
To: vdsm-devel at lists.fedorahosted.org
Sent: Monday, May 26, 2014 12:37:35 AM
Subject: [vdsm] Problems with vdsm deleted Storages Domains
Hello everyone,

I have serious trouble with something that happened recently, and this is our story.

We have 12 nodes and 1 engine server. Our infrastructure is based on iSCSI and NFS (for Exports and ISOs).

Recently my partner mishandled our NFS server, and one thing we noticed in the engine was that all of our nodes were contending for the SPM and never agreed on which one should take it. The logs pointed to metadata corruption in the NFS domains, and after hours of this contention we tried to stabilize things.

Trying to stabilize our cluster, we shut down our NFS server and nothing changed; the cluster was still contending for the SPM. After many tests, like putting the nodes in maintenance mode, rebooting, shutting down the whole cluster, etc., the last thing we did was destroy the Export and ISO domains from the configuration (using the web interface), leaving the Data domain intact, in order to bring our guests back online. That was (I think) our big mistake, because now we have a serious situation: our Data domain will not come online because vdsm on every node keeps looking for the NFS domains we already deleted.

We used vdsClient to check the information vdsm holds, and it says that we have 1 storage pool and 3 storage domains: our iSCSI domain and the 2 NFS domains that were deleted.

We left only one node active (the rest are in maintenance), and the message is that it is still contending for the SPM. The vdsm.log says that the node cannot find the NFS storages.

We tried to delete these domains using vdsClient, but it says the node has to hold the SPM, and it can't take the SPM because it can't find the NFS domains, so it is a vicious circle.

Now the question would be: is there any way to delete these domains from the vdsm configuration? What do I have to do to break this circle, so the node stops contending for the SPM and our Data domain comes back online?

I appreciate all the comments and help you can share with me.

Regards,
David
_______________________________________________
vdsm-devel mailing list
vdsm-devel at lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Federico Simoncelli
2014-05-26 13:20:45 UTC
----- Original Message -----
From: "David Andino" <david_andino at yahoo.com>
To: vdsm-devel at lists.fedorahosted.org, fsimonce at redhat.com
Sent: Monday, May 26, 2014 2:53:52 PM
Subject: Rv: Re: [vdsm] Problems with vdsm deleted Storages Domains
Hello Federico,
These are the installed vdsm packages:
# rpm -aq | grep vdsm
vdsm-python-zombiereaper-4.14.6-0.el6.noarch
vdsm-xmlrpc-4.14.6-0.el6.noarch
vdsm-4.14.6-0.el6.x86_64
vdsm-python-4.14.6-0.el6.x86_64
vdsm-cli-4.14.6-0.el6.noarch
vdsm-hook-hostusb-4.14.6-0.el6.noarch
Some more lines are missing; I need to see the traceback of this task:

7b861db2-05c7-44b1-aac7-b4ea018120cf
--
Federico
Federico Simoncelli
2014-05-26 14:27:46 UTC
----- Original Message -----
From: "David Andino" <david_andino at yahoo.com>
To: "Federico Simoncelli" <fsimonce at redhat.com>
Sent: Monday, May 26, 2014 4:09:44 PM
Subject: Re: Rv: Re: [vdsm] Problems with vdsm deleted Storages Domains
Hello Federico,
I have attached the entire file so you can read it.
Let me know if you need anything else.
Thanks,
David
The relevant traceback is:

7b861db2-05c7-44b1-aac7-b4ea018120cf::ERROR::2014-05-26 07:01:15,376::sp::329::Storage.StoragePool::(startSpm) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sp.py", line 296, in startSpm
    self._updateDomainsRole()
  File "/usr/share/vdsm/storage/securable.py", line 75, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 205, in _updateDomainsRole
    domain = sdCache.produce(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'ba26bc14-3aff-48eb-816c-e02e81e5fd63',)


The problem was fixed (and backported to 3.4) in:

http://gerrit.ovirt.org/25424
http://gerrit.ovirt.org/27194

That is going to be available in vdsm-4.14.8

With that build you'll be able to start the SPM and later on use
forcedDetachStorageDomain to remove the old domain.
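(For later readers: once a host runs the fixed build, the recovery described above would look roughly like the following from the host's shell. This is a sketch, not a verified procedure: check the exact argument order with `vdsClient --help`; the UUIDs are the ones from this thread.)

```shell
# UUIDs taken from the log excerpts in this thread.
SP_UUID="00000002-0002-0002-0002-00000000030b"   # storage pool
SD_UUID="ba26bc14-3aff-48eb-816c-e02e81e5fd63"   # one of the deleted NFS domains

# With vdsm >= 4.14.8 the SPM can start despite the missing domains;
# the stale domain can then be force-detached from the pool:
#   vdsClient -s 0 forcedDetachStorageDomain <sdUUID> <spUUID>
cmd="vdsClient -s 0 forcedDetachStorageDomain $SD_UUID $SP_UUID"
echo "$cmd"   # printed instead of executed: this needs a live vdsm host
```

The same call would be repeated for the second deleted domain (0ae15b52-821f-438a-9602-2494eac3ac5b).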
--
Federico