canonical-server team mailing list archive
Message #00061
[Bug 334994] Re: Degraded RAID boot fails: kobject_add_internal failed for dev-sda1 with -EEXIST, don't try to register things with the same name in the same directory
I should also note that the kernel is not lying; these files are visibly
present in sysfs:
(initramfs) ls /sys/devices/virtual/block/md0/md
dev-sda1 safe_mode_delay resync_start raid_disks
reshape_position new_dev component_size layout
array_state metadata_version chunk_size level
(initramfs) ls /sys/devices/virtual/block/md1/md
dev-sda5 safe_mode_delay resync_start raid_disks
reshape_position new_dev component_size layout
array_state metadata_version chunk_size level
(initramfs)
Note the dev-sda1 in the md0/md directory in sysfs, and the dev-sda5 in
the md1/md directory. These are the ones it complains about on
insertion:
[ 35.023792] WARNING: at /build/buildd/linux-2.6.28/fs/sysfs/dir.c:462 sysfs_add_one+0x4c/0x50()
[ 35.023794] sysfs: duplicate filename 'dev-sda1' can not be created
[...]
[ 35.074528] WARNING: at /build/buildd/linux-2.6.28/fs/sysfs/dir.c:462 sysfs_add_one+0x4c/0x50()
[ 35.074529] sysfs: duplicate filename 'dev-sda5' can not be created
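The warning above is sysfs refusing a second entry whose name already exists in the same directory. A minimal userspace sketch of that rule follows. The names toy_dir, toy_dir_add and toy_dir_del are hypothetical and chosen for illustration only; this is not the kernel's sysfs implementation.

```c
#include <errno.h>
#include <string.h>

#define TOY_MAX_ENTRIES 16
#define TOY_NAME_LEN    32

/* Hypothetical stand-in for one sysfs directory such as md0/md. */
struct toy_dir {
    char names[TOY_MAX_ENTRIES][TOY_NAME_LEN];
    int  used[TOY_MAX_ENTRIES];
};

/* Return the slot holding `name`, or -1 if no such entry exists. */
int toy_dir_find(const struct toy_dir *d, const char *name)
{
    for (int i = 0; i < TOY_MAX_ENTRIES; i++)
        if (d->used[i] && strcmp(d->names[i], name) == 0)
            return i;
    return -1;
}

/* Add `name`; a duplicate is rejected with -EEXIST, as in the bug title. */
int toy_dir_add(struct toy_dir *d, const char *name)
{
    if (strlen(name) >= TOY_NAME_LEN)
        return -ENAMETOOLONG;
    if (toy_dir_find(d, name) >= 0)
        return -EEXIST;
    for (int i = 0; i < TOY_MAX_ENTRIES; i++) {
        if (!d->used[i]) {
            strcpy(d->names[i], name);
            d->used[i] = 1;
            return 0;
        }
    }
    return -ENOSPC;
}

/* Remove `name`; only after this can the same name be added again. */
int toy_dir_del(struct toy_dir *d, const char *name)
{
    int i = toy_dir_find(d, name);
    if (i < 0)
        return -ENOENT;
    d->used[i] = 0;
    return 0;
}
```

With a zero-initialised struct toy_dir, adding "dev-sda1" twice in a row returns 0 and then -EEXIST. The second add only succeeds once toy_dir_del() has removed the first entry.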
Whatever registered this directory seems to have done it properly; it has the appropriate links etc. internally:
(initramfs) ls -l /sys/devices/virtual/block/md0/md/dev-sda1
lrwxrwxrwx 1 0 0 0 block -> ../../../../../pci0000:00/0000:00:01.1/host0/target0:0:0/0:0:0:0/block/sda/sda1
-rw-r--r-- 1 0 0 4096 size
-rw-r--r-- 1 0 0 4096 offset
-rw-r--r-- 1 0 0 4096 slot
-rw-r--r-- 1 0 0 4096 errors
-rw-r--r-- 1 0 0 4096 state
OK, so where do these come from? They are made by bind_rdev_to_array() and undone by unbind_rdev_from_array(). From the logs we can see that the kernel is basically making, unmaking, and remaking the array to degrade it:
[ 3.371474] md: bind<sda1>
[ 3.381990] md: bind<sda5>
[...]
[ 35.003029] md: md0 stopped.
[ 35.003043] md: unbind<sda1>
[ 35.020198] md: export_rdev(sda1)
[ 35.023745] md: bind<sda1>
[ 35.023787] ------------[ cut here ]------------
[ 35.023792] WARNING: at /build/buildd/linux-2.6.28/fs/sysfs/dir.c:462 sysfs_add_one+0x4c/0x50()
[ 35.023794] sysfs: duplicate filename 'dev-sda1' can not be created
If we look at the unbind_rdev_from_array() call it uses delayed work to
remove the actual entries:
static void unbind_rdev_from_array(mdk_rdev_t * rdev)
{
	[...]
	synchronize_rcu();
	INIT_WORK(&rdev->del_work, md_delayed_delete);
	kobject_get(&rdev->kobj);
	schedule_work(&rdev->del_work);
}
And it appears that this is what finally removes the objects:
static void md_delayed_delete(struct work_struct *ws)
{
	mdk_rdev_t *rdev = container_of(ws, mdk_rdev_t, del_work);
	kobject_del(&rdev->kobj);
	kobject_put(&rdev->kobj);
}
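For readers unfamiliar with the container_of() idiom used here, the following userspace sketch shows how a callback that receives only the embedded work item can recover the enclosing structure. The toy_ names are hypothetical; the integer fields merely stand in for the kobject's sysfs presence and reference count and do not reproduce kobject semantics.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct work_struct. */
struct toy_work {
    int pending;
};

/* Hypothetical stand-in for mdk_rdev_t with an embedded work item. */
struct toy_rdev {
    int refcount;              /* stands in for the kobject reference count */
    int in_sysfs;              /* 1 while the dev-XXX directory exists */
    struct toy_work del_work;
};

/* Same shape as md_delayed_delete(): only the work item is passed in. */
void toy_delayed_delete(struct toy_work *ws)
{
    struct toy_rdev *rdev = toy_container_of(ws, struct toy_rdev, del_work);

    rdev->in_sysfs = 0;        /* stands in for kobject_del() */
    rdev->refcount--;          /* stands in for kobject_put() */
}
```

The extra reference taken before the work is scheduled (kobject_get() in the real code) is what keeps the object alive until the callback drops it again.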
So if this was not waited for appropriately, we might sometimes get back to binding the new one before the delayed delete has run. This being a race would also fit with the transient nature of the issue.
Will patch this to wait for the pending work and see if that resolves
the issue or not.
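The suspected ordering problem, and the effect of waiting for pending work, can be sketched as a deterministic userspace model. Everything below is hypothetical (the toy_ names, and the single flag standing in for the work queue); it models only the ordering of events and is not the actual patch.

```c
#include <errno.h>

/* Hypothetical model of one array with a single member device. */
struct toy_array {
    int entry_present;   /* 1 while dev-sda1 exists in the md directory */
    int delete_pending;  /* 1 while the deferred delete has not run yet */
};

/* Like bind_rdev_to_array(): creating the entry fails if it still exists. */
int toy_bind(struct toy_array *a)
{
    if (a->entry_present)
        return -EEXIST;
    a->entry_present = 1;
    return 0;
}

/* Like unbind_rdev_from_array(): removal is only queued, not performed. */
void toy_unbind(struct toy_array *a)
{
    a->delete_pending = 1;
}

/* Run the queued delete, as the worker would eventually do. */
void toy_flush_pending(struct toy_array *a)
{
    if (a->delete_pending) {
        a->entry_present = 0;
        a->delete_pending = 0;
    }
}

/* Unbind then bind again; wait_first selects whether we flush in between. */
int toy_rebind(struct toy_array *a, int wait_first)
{
    toy_unbind(a);
    if (wait_first)
        toy_flush_pending(a);
    return toy_bind(a);
}
```

After an initial successful toy_bind(), calling toy_rebind(&a, 0) returns -EEXIST because the old entry is still present, while toy_rebind(&a, 1) returns 0. In the real kernel the worker runs concurrently, so the unflushed case fails only some of the time, which matches the transient behaviour described above.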
--
Degraded RAID boot fails: kobject_add_internal failed for dev-sda1 with -EEXIST, don't try to register things with the same name in the same directory
https://bugs.launchpad.net/bugs/334994
You received this bug notification because you are a member of Canonical
Server Team, which is a bug assignee.