Jul 30th 2010, 0:42:41
Hi all, sorry about the major disaster we just had. Here's what happened, FYI.
I'm planning on moving the server this weekend, so in preparation for that I wanted to do a quick test to make sure the server would, in fact, come back up after I moved it.
Unfortunately, as you can see, it did not.
Why not?
A while back I forgot some important little details to do with RAID arrays. We had a disk failure, and I hot-swapped in a new drive with no problem; the game kept on running and hardly anybody noticed anything ;) Later I added a 3rd drive to the array, to make sure we had an extra mirror just in case.
However, I forgot to rebuild the mdadm.conf file and update the initramfs, which is basically what lets the kernel know which drives belong to which array at boot time.
So when I rebooted, it was expecting different partitions on different drives, and got totally confused. This was at 11pm my time.
I suspect I might have figured it out last night, except that each boot attempt took nearly 10 minutes to fail and drop me to a BusyBox prompt, which dragged out every test. I ended up booting to a LiveCD about 10 times, verifying the RAID was good, and looking stuff up online trying to figure out why the heck the boot sequence couldn't work out what the drives were =/
Anyway, I went to bed at 4am, got up at 7am to go to work, looked up a few things there and built a list of commands to try. I got home at 6pm, thought of the solution while it was booting, fixed it, and here we are.... The reboot also forced a check of all the drives in the system, as it had been 240 days without a check (the system had been online for 192 days). That check took 30 or 40 minutes.
So the lesson of the day:
If you ever change a RAID array (especially with a hot-swap), update your mdadm.conf AND initramfs RIGHT THEN AND THERE, because if you reboot without doing so, everything will be totally fubared.
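Here's a rough sketch of a check you could run before any planned reboot, assuming a Debian-style system where the config lives at /etc/mdadm/mdadm.conf. It compares the ARRAY lines that `mdadm --detail --scan` reports for the running arrays against the ones recorded in mdadm.conf, and flags any array that is missing from the config or whose recorded attributes (for example num-devices after adding a 3rd mirror) disagree. The function names are hypothetical and this is just an illustration, not what was run on our server. The actual fix is still to regenerate the ARRAY lines and then rebuild the initramfs (e.g. `update-initramfs -u` on Debian/Ubuntu).

```python
import subprocess
from pathlib import Path


def parse_array_lines(text: str) -> dict[str, dict[str, str]]:
    """Map each ARRAY device (e.g. /dev/md0) to its key=value attributes.

    Lines starting with whitespace continue the previous line and '#'
    starts a comment, following mdadm.conf conventions.
    """
    logical: list[str] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]
        if not line.strip():
            continue
        if line[0].isspace() and logical:
            logical[-1] += " " + line.strip()  # continuation line
        else:
            logical.append(line.strip())

    arrays: dict[str, dict[str, str]] = {}
    for line in logical:
        tokens = line.split()
        if len(tokens) < 2 or tokens[0].upper() != "ARRAY":
            continue
        attrs: dict[str, str] = {}
        for tok in tokens[2:]:
            key, sep, value = tok.partition("=")
            if sep:
                attrs[key.lower()] = value
        arrays[tokens[1]] = attrs
    return arrays


def stale_arrays(conf_text: str, scan_text: str) -> list[str]:
    """Return running arrays that mdadm.conf is missing or describes wrongly.

    An attribute only counts as a mismatch if mdadm.conf records it,
    since the config is allowed to omit keys.
    """
    recorded = parse_array_lines(conf_text)
    live = parse_array_lines(scan_text)
    stale = []
    for dev, attrs in live.items():
        conf_attrs = recorded.get(dev)
        if conf_attrs is None or any(
            k in conf_attrs and conf_attrs[k] != v for k, v in attrs.items()
        ):
            stale.append(dev)
    return sorted(stale)


if __name__ == "__main__":
    # Assumption: Debian/Ubuntu path; other distros often use /etc/mdadm.conf.
    conf_text = Path("/etc/mdadm/mdadm.conf").read_text()
    scan = subprocess.run(
        ["mdadm", "--detail", "--scan"],
        capture_output=True, text=True, check=True,
    ).stdout
    bad = stale_arrays(conf_text, scan)
    if bad:
        print("mdadm.conf is out of date for:", ", ".join(bad))
        print("Update mdadm.conf, then rebuild the initramfs before rebooting.")
    else:
        print("mdadm.conf matches the running arrays.")
```

Only stale entries are reported, so an empty result means the config at least agrees with what's running; it doesn't prove the initramfs copy was rebuilt.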
Finally did the signature thing.