
canonical-server team mailing list archive

[Bug 334994] Re: Degraded RAID boot fails: kobject_add_internal failed for dev-sda1 with -EEXIST, don't try to register things with the same name in the same directory

 

I should also note that the kernel is not lying; these files are visibly
present in sysfs:

    (initramfs) ls /sys/devices/virtual/block/md0/md
    dev-sda1          safe_mode_delay   resync_start      raid_disks
    reshape_position  new_dev           component_size    layout
    array_state       metadata_version  chunk_size        level
    (initramfs) ls /sys/devices/virtual/block/md1/md
    dev-sda5          safe_mode_delay   resync_start      raid_disks
    reshape_position  new_dev           component_size    layout
    array_state       metadata_version  chunk_size        level
    (initramfs) 

Note the dev-sda1 in the md0/md directory in sysfs, and the dev-sda5 in
the md1/md directory.  These are the ones it complains about on
insertion:

    [   35.023792] WARNING: at /build/buildd/linux-2.6.28/fs/sysfs/dir.c:462 sysfs_add_one+0x4c/0x50()
    [   35.023794] sysfs: duplicate filename 'dev-sda1' can not be created
    [...]
    [   35.074528] WARNING: at /build/buildd/linux-2.6.28/fs/sysfs/dir.c:462 sysfs_add_one+0x4c/0x50()
    [   35.074529] sysfs: duplicate filename 'dev-sda5' can not be created
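To make the complaint concrete, here is a minimal user-space sketch of the rule sysfs is enforcing: a directory refuses a second entry with the same name and reports -EEXIST. It is a model, not the kernel's sysfs_add_one(); struct toy_dir, toy_add_one() and toy_remove_one() are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space model of one sysfs directory (e.g. md0/md).
 * NOT the kernel's sysfs code: just enough to show the duplicate check. */
#define TOY_MAX_ENTRIES 16
#define TOY_NAME_LEN    32

struct toy_dir {
    char names[TOY_MAX_ENTRIES][TOY_NAME_LEN];
    int  count;
};

/* Same idea as the check that fires in sysfs_add_one(): no duplicate names. */
static int toy_add_one(struct toy_dir *dir, const char *name)
{
    assert(dir != NULL && name != NULL);
    for (int i = 0; i < dir->count; i++) {
        if (strcmp(dir->names[i], name) == 0) {
            fprintf(stderr, "toy sysfs: duplicate filename '%s' can not be created\n", name);
            return -EEXIST;
        }
    }
    if (dir->count == TOY_MAX_ENTRIES)
        return -ENOSPC;
    snprintf(dir->names[dir->count], TOY_NAME_LEN, "%s", name);
    dir->count++;
    return 0;
}

/* Remove an entry by name; returns -ENOENT if it is not present. */
static int toy_remove_one(struct toy_dir *dir, const char *name)
{
    for (int i = 0; i < dir->count; i++) {
        if (strcmp(dir->names[i], name) == 0) {
            dir->count--;
            /* Move the last entry into the freed slot (skip if it was the last). */
            if (i != dir->count)
                memcpy(dir->names[i], dir->names[dir->count], TOY_NAME_LEN);
            return 0;
        }
    }
    return -ENOENT;
}
```

Calling toy_add_one(&dir, "dev-sda1") twice on the same directory returns 0 and then -EEXIST; it only succeeds again once toy_remove_one() has dropped the old entry.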

Whatever registered the dev-sda1 directory seems to have done it properly; it has the appropriate links etc. internally:

    (initramfs) ls -l  /sys/devices/virtual/block/md0/md/dev-sda1
    lrwxrwxrwx    1 0        0               0 block -> ../../../../../pci0000:00/0000:00:01.1/host0/target0:0:0/0:0:0:0/block/sda/sda1
    -rw-r--r--    1 0        0            4096 size
    -rw-r--r--    1 0        0            4096 offset
    -rw-r--r--    1 0        0            4096 slot
    -rw-r--r--    1 0        0            4096 errors
    -rw-r--r--    1 0        0            4096 state


OK, so where do these come from?  They are made by bind_rdev_to_array() and undone by unbind_rdev_from_array().  From the logs we can see that the kernel is basically making, unmaking, and remaking the array in order to degrade it:

    [    3.371474] md: bind<sda1>
    [    3.381990] md: bind<sda5>
    [...]
    [   35.003029] md: md0 stopped.
    [   35.003043] md: unbind<sda1>
    [   35.020198] md: export_rdev(sda1)
    [   35.023745] md: bind<sda1>
    [   35.023787] ------------[ cut here ]------------
    [   35.023792] WARNING: at /build/buildd/linux-2.6.28/fs/sysfs/dir.c:462 sysfs_add_one+0x4c/0x50()
    [   35.023794] sysfs: duplicate filename 'dev-sda1' can not be created

If we look at the unbind_rdev_from_array() call it uses delayed work to
remove the actual entries:

    static void unbind_rdev_from_array(mdk_rdev_t * rdev)
    {
        [...]
        synchronize_rcu();
        INIT_WORK(&rdev->del_work, md_delayed_delete);
        kobject_get(&rdev->kobj);
        schedule_work(&rdev->del_work);
    }

And it appears to be this that finally removes the objects:

    static void md_delayed_delete(struct work_struct *ws)
    {
        mdk_rdev_t *rdev = container_of(ws, mdk_rdev_t, del_work);
        kobject_del(&rdev->kobj);
        kobject_put(&rdev->kobj);
    }
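The kobject_get() in unbind_rdev_from_array() and the kobject_put() in md_delayed_delete() pin the rdev until the queued work runs, so its dev-sda1 entry outlives the unbind. Below is a rough user-space sketch of that lifetime. struct toy_kobj and the toy_* helpers are hypothetical stand-ins, not the real kobject API, and the sketch assumes the rdev's original reference is released separately, around the export_rdev() step seen in the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the kobject lifetime rules used by
 * unbind_rdev_from_array()/md_delayed_delete(). NOT the real kobject API. */
struct toy_kobj {
    int   refcount;    /* like the kref inside a kobject */
    bool  in_sysfs;    /* true until the kobject_del() step */
    bool *freed_flag;  /* set when the last reference is dropped (for demos) */
};

static struct toy_kobj *toy_kobj_create(bool *freed_flag)
{
    struct toy_kobj *k = malloc(sizeof *k);
    if (k == NULL)
        return NULL;
    k->refcount = 1;          /* the owner's reference */
    k->in_sysfs = true;       /* registered, e.g. as md0/md/dev-sda1 */
    k->freed_flag = freed_flag;
    *freed_flag = false;
    return k;
}

static void toy_kobj_get(struct toy_kobj *k)
{
    k->refcount++;
}

static void toy_kobj_put(struct toy_kobj *k)
{
    assert(k->refcount > 0);
    if (--k->refcount == 0) {
        *k->freed_flag = true;
        free(k);
    }
}

/* Unlink from the (modelled) sysfs tree; memory stays until the last put. */
static void toy_kobj_del(struct toy_kobj *k)
{
    k->in_sysfs = false;
}

/* Body of the deferred work, mirroring md_delayed_delete(): del, then put. */
static void toy_delayed_delete(struct toy_kobj *k)
{
    toy_kobj_del(k);
    toy_kobj_put(k);
}
```

In the sketch the entry is still marked in_sysfs after both the unbind pin and the owner's own put, and it only disappears when toy_delayed_delete() finally runs. That matches the log above: export_rdev(sda1) is printed, yet dev-sda1 is still present when bind<sda1> runs.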


So if this deferred deletion is not waited for appropriately, we might sometimes get back to binding the new device before the old entry has been removed.  A race like this would also fit with the transient nature of the issue.
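The suspected race can be reproduced deterministically with a single-threaded model in which queued work only runs when we say so, which lets us pick the interleaving by hand instead of relying on scheduler timing. This is a sketch under that assumption, not md code: toy_schedule_work(), toy_run_pending_work(), toy_unbind(), toy_bind() and toy_restart() are hypothetical names standing in for schedule_work(), the workqueue thread, unbind_rdev_from_array() and bind_rdev_to_array().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical single-threaded model, NOT kernel code. Queued work only
 * runs when toy_run_pending_work() is called. */
struct toy_md_dir {
    char entry[32];               /* member name, e.g. "dev-sda1"; "" if none */
};

typedef void (*toy_work_fn)(struct toy_md_dir *dir);

#define TOY_QUEUE_LEN 8
static toy_work_fn        toy_queue_fn[TOY_QUEUE_LEN];
static struct toy_md_dir *toy_queue_arg[TOY_QUEUE_LEN];
static int                toy_queued;

static void toy_schedule_work(toy_work_fn fn, struct toy_md_dir *dir)
{
    assert(toy_queued < TOY_QUEUE_LEN);
    toy_queue_fn[toy_queued] = fn;
    toy_queue_arg[toy_queued] = dir;
    toy_queued++;
}

/* Plays the role of the workqueue thread finally getting to run. */
static void toy_run_pending_work(void)
{
    for (int i = 0; i < toy_queued; i++)
        toy_queue_fn[i](toy_queue_arg[i]);
    toy_queued = 0;
}

/* Deferred step, like md_delayed_delete(): remove the sysfs name. */
static void toy_delayed_delete(struct toy_md_dir *dir)
{
    dir->entry[0] = '\0';
}

/* Like unbind_rdev_from_array(): only queues the removal. */
static void toy_unbind(struct toy_md_dir *dir)
{
    toy_schedule_work(toy_delayed_delete, dir);
}

/* Like bind_rdev_to_array(): the name must not already exist. */
static int toy_bind(struct toy_md_dir *dir, const char *name)
{
    if (dir->entry[0] != '\0' && strcmp(dir->entry, name) == 0)
        return -EEXIST;
    snprintf(dir->entry, sizeof dir->entry, "%s", name);
    return 0;
}

/* Stop and re-assemble the array; worker_first chooses who wins the race. */
static int toy_restart(struct toy_md_dir *dir, const char *name, int worker_first)
{
    toy_unbind(dir);
    if (worker_first)
        toy_run_pending_work();      /* deletion happens before the rebind */
    int ret = toy_bind(dir, name);   /* otherwise the rebind sees the old name */
    toy_run_pending_work();          /* any still-pending deletion runs now */
    return ret;
}
```

With worker_first set, the deletion runs before the rebind and toy_restart() returns 0; without it, the rebind sees the stale dev-sda1 entry and gets -EEXIST, the same error as in the bug title.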

Will patch this to wait for the pending work and see if that resolves
the issue or not.
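One way to express "wait for the pending work" is sketched below as a user-space model with a real worker thread: the restart path queues the deletion, blocks until the queue has drained, and only then rebinds. Everything here is hypothetical (toy_start(), toy_unbind(), toy_flush(), toy_bind(), toy_restart_array()). In the kernel the closest counterpart of toy_flush() would presumably be flush_scheduled_work(), since the deletion is queued with schedule_work(), but that is an assumption, not the actual patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical user-space model of the fix; none of this is kernel API. */
static pthread_mutex_t lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  work_cv = PTHREAD_COND_INITIALIZER; /* work queued or stopping */
static pthread_cond_t  idle_cv = PTHREAD_COND_INITIALIZER; /* queue drained */
static pthread_t       worker_thread;
static int             pending;        /* queued deletions not yet done */
static bool            stopping;
static char            entry[32] = ""; /* member name in md0/md, "" if none */

/* Stands in for the shared workqueue thread running the deferred delete. */
static void *worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&lock);
    for (;;) {
        while (pending == 0 && !stopping)
            pthread_cond_wait(&work_cv, &lock);
        if (pending == 0)
            break;                               /* stopping, nothing left */
        pthread_mutex_unlock(&lock);
        struct timespec delay = { 0, 1000000L }; /* run 1 ms late on purpose */
        nanosleep(&delay, NULL);
        pthread_mutex_lock(&lock);
        entry[0] = '\0';                         /* the kobject_del() step */
        assert(pending > 0);
        if (--pending == 0)
            pthread_cond_broadcast(&idle_cv);
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

static int toy_start(void)
{
    return pthread_create(&worker_thread, NULL, worker, NULL);
}

static void toy_stop(void)
{
    pthread_mutex_lock(&lock);
    stopping = true;
    pthread_cond_signal(&work_cv);
    pthread_mutex_unlock(&lock);
    pthread_join(worker_thread, NULL);
}

/* Like unbind_rdev_from_array(): queue the deletion and return at once. */
static void toy_unbind(void)
{
    pthread_mutex_lock(&lock);
    pending++;
    pthread_cond_signal(&work_cv);
    pthread_mutex_unlock(&lock);
}

/* The proposed wait: block until every queued deletion has completed. */
static void toy_flush(void)
{
    pthread_mutex_lock(&lock);
    while (pending > 0)
        pthread_cond_wait(&idle_cv, &lock);
    pthread_mutex_unlock(&lock);
}

/* Like bind_rdev_to_array(): refuse a name that is still registered. */
static int toy_bind(const char *name)
{
    int ret = 0;
    pthread_mutex_lock(&lock);
    if (entry[0] != '\0' && strcmp(entry, name) == 0)
        ret = -EEXIST;
    else
        snprintf(entry, sizeof entry, "%s", name);
    pthread_mutex_unlock(&lock);
    return ret;
}

/* Stop and re-assemble, waiting for the deferred delete before rebinding. */
static int toy_restart_array(const char *name)
{
    toy_unbind();
    toy_flush();
    return toy_bind(name);
}
```

Because toy_flush() returns only once the worker has finished every queued deletion, toy_restart_array("dev-sda1") succeeds however late the worker runs; the nanosleep() just widens the window that would otherwise produce -EEXIST. Flushing is a blunt tool, since it waits for all queued work rather than only this rdev's deletion.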

-- 
https://bugs.launchpad.net/bugs/334994
You received this bug notification because you are a member of Canonical
Server Team, which is a bug assignee.