The bootleg server, orchid, that I run at work finally died. It ran Debian Squeeze on very cheap AMD hardware, hand-built in 2003. One day I could no longer connect to it via the VPN. This could have been caused by someone accidentally unplugging its ethernet cable, but when I finally got in front of the machine, I found it continually looping in the BIOS startup. I powered off and tried to power on: lights came on, but nothing appeared on the screen. Ah well, nine years of continuous running from a cheap consumer-grade PC was pretty good.
Through the good offices of a colleague in the IT department, I was able to take possession of a no-longer-used Dell server. This gave me a two-core 2.8GHz Pentium with 2GB of main memory. Luxury! There was only one slight snag: it supported SATA disks, but the disks from orchid were PATA. However, it did have one IDE PATA connector for the CD-ROM drive, so I attached the orchid drives to that. The IDE ribbon cable was not long enough to allow the drives to sit in the enclosures, so I had to leave them loose on a convenient shelf inside the case.
Acid test time: would it boot? The first attempt hung in the BIOS, which complained about a missing disk (the existing SATA drive I had removed). I turned the SATA drive off in the BIOS and, lo, orchid lived again.
The PATA disks were now the oldest things in the machine, and likely the next to go. I therefore bought two 320GB SATA drives with the aim of having mirrored disks, replacing the existing PATA drives (which totalled 240GB).
First, I installed mdadm; the installation also triggered a rebuild of the initrd, so that the mdadm support would be available during kernel boot.
I decided I would use the ability of md devices to support partitions to create one big mirror and then carve it up into the root, boot, usr, etc. partitions. A single partition was created on each of the two disks using cfdisk. The result is shown below, in parted output format.
orchid:~# parted /dev/sdc
GNU Parted 2.3
Using /dev/sdc
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Model: ATA Maxtor 7L320S0 (scsi)
Disk /dev/sdc: 324GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End    Size   Type     File system  Flags
 1      32.3kB  324GB  324GB  primary  ext3         raid

(parted)
orchid:~# parted /dev/sdd
GNU Parted 2.3
Using /dev/sdd
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Model: ATA Maxtor 7L320S0 (scsi)
Disk /dev/sdd: 324GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End    Size   Type     File system  Flags
 1      32.3kB  324GB  324GB  primary  ext3         raid
Next step was to create the mirror (RAID1) array:
orchid:~# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
orchid:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Wed Feb 29 15:30:27 2012
     Raid Level : raid1
     Array Size : 316334723 (301.68 GiB 323.93 GB)
  Used Dev Size : 316334723 (301.68 GiB 323.93 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Fri Mar  2 08:34:15 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : orchid:0  (local to host orchid)
           UUID : 482da84a:8fdc30de:b2acf8c8:fc8e97d6
         Events : 46

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
Or another way of looking at the array:
orchid:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdc1[0] sdd1[1]
      316334723 blocks super 1.2 [2/2] [UU]

unused devices: <none>
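The [UU] marker in that output is worth knowing: it is the quickest health check for a mirror. A minimal sketch of an automated check, using a stubbed copy of /proc/mdstat (on the live system you would read /proc/mdstat directly):

```shell
# A fully-synced two-disk mirror shows [UU] in /proc/mdstat;
# [U_] or [_U] means a member is missing or failed.
# A here-doc stands in for the real /proc/mdstat below.
cat > /tmp/mdstat <<'EOF'
md0 : active raid1 sdc1[0] sdd1[1]
      316334723 blocks super 1.2 [2/2] [UU]
EOF
if grep -q '\[UU\]' /tmp/mdstat; then
    echo "mirror healthy"
else
    echo "mirror degraded"
fi
```

On the real machine, substitute /proc/mdstat for the stub file; the same grep makes a serviceable cron check.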
The array was then partitioned using cfdisk. The resulting set of partitions is shown below (using parted, as it provides a more pleasing listing format):
orchid:~# parted /dev/md0
GNU Parted 2.3
Using /dev/md0
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Model: Linux Software RAID Array (md)
Disk /dev/md0: 324GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type      File system     Flags
 1      2048B   500MB   500MB   primary   ext3            boot
 2      500MB   1524MB  1024MB  primary   ext3
 3      1524MB  2548MB  1024MB  primary   linux-swap(v1)
 4      2548MB  324GB   321GB   extended
 5      2548MB  3048MB  500MB   logical   ext3
 6      3048MB  4072MB  1024MB  logical   ext3
 7      4072MB  10.2GB  6144MB  logical   ext3
 8      10.2GB  133GB   123GB   logical   ext3
 9      133GB   324GB   191GB   logical   ext3
The third step was to copy the existing partition data on the PATA drives to the new mirrored version. I wrote a shell script to automate this process, shown below:
#!/bin/sh
#
# NAME
#
#   clone-parts.sh - Script to automate cloning of existing disk
#   partitions to a new device
#
# SYNOPSIS
#
#   sh clone-parts.sh -t target [-n] [-l label-prefix] [-p parts]
#
#   Switches:
#       -t target        Set target device name, e.g. sdc, md or md0p
#       -l label-prefix  Set label prefix for partitions on new device
#       -p parts         Set partition-id/mount-point relationship
#       -n               Just show what would be done
#
# DESCRIPTION
#
#   clone-parts.sh provides a capability to clone the contents of
#   mounted partitions to another disk. It accepts a mapping from the
#   current mount points to partition identifiers (partids); it is
#   possible to override the built-in default mapping using the -p
#   argument.
#
#   Partition mappings are specified by a string of the following
#   format:
#
#       "<partition-id>=<mount-point> [<partition-id>=<mount-point>] ..."
#
#   The default partition mappings are specified in the shell script,
#   i.e.:
#
#       PARTS="1=/boot 2=/ 3=swap 5=/tmp 6=/var 7=/usr 8=/home 9=/rep"
#
#   This default can be overridden on the command line using the "-p"
#   switch.
#
#   An ext3 filesystem is created on each partition. The new
#   filesystems are labelled according to a prefix (default "raid")
#   followed by the mount-point name (e.g. raid-boot). A mount point
#   of "/" is converted to "root". Leading "/" characters are removed
#   from mount-point names when using them as a component of partition
#   labels. The prefix and postfix are separated by "-".
#
#   The rsync utility is invoked to copy the contents of the source
#   partition (as identified by the <mount-point> token) to the target
#   partition. Each new partition is mounted on the /mnt mount point.
#
#   The "-n" switch requests clone-parts.sh to emit the generated
#   commands to stdout, rather than actually executing them.
#
# MPW 3rd March, 2012

EXEC=eval
SCRIPT=${0##*/}
TARGET=""
PARTS="1=/boot 2=/ 3=swap 5=/tmp 6=/var 7=/usr 8=/home 9=/rep"
LABPREFIX="raid"

query()
{
    echo -n "$SCRIPT: $1 OK to continue? (y/n): "
    read cont
    if [ $cont != "y" ]; then
        echo "${SCRIPT}: Terminated."
        exit 1
    fi
}

die()
{
    echo "${SCRIPT}: $1 Terminating..."
    exit 1
}

while [ $# -gt 0 ]; do
    case $1 in
    -t) TARGET=$2
        shift
        ;;
    -p) PARTS=$2
        shift
        ;;
    -n) EXEC=echo
        ;;
    -l) LABPREFIX=$2
        shift
        ;;
    *)  die "unrecognised switch: $1."
        ;;
    esac
    shift
done

if [ "${TARGET}" = "" ]; then
    die "no target specified."
fi

for part in $PARTS
do
    partid=${part%=*}
    mntpoint=${part#*=}

    if [ ${mntpoint} = "swap" ]; then
        echo "${SCRIPT}: making swap space on ${TARGET}${partid}"
        ${EXEC} mkswap /dev/${TARGET}${partid}
        continue
    fi

    if [ ${mntpoint} = "/" ]; then
        labelname="root"
    else
        labelname=`echo ${mntpoint} | sed -e 's/\///g'`
    fi

    ${EXEC} mkfs -t ext3 -L ${LABPREFIX}-${labelname} /dev/${TARGET}${partid}
    if [ $? -ne 0 ]; then
        die "mkfs error."
    fi

    ${EXEC} mount -t ext3 /dev/${TARGET}${partid} /mnt
    if [ $? -ne 0 ]; then
        die "unable to mount ${TARGET}${partid}."
    fi

    echo "${SCRIPT}: rsyncing ${mntpoint} to ${TARGET}${partid} ..."
    # rsync flags: -a archive, -u update, -H preserve hard links,
    # -x don't cross file system boundaries
    ${EXEC} rsync -auHx --exclude=/proc/* --exclude=/sys/* ${mntpoint}/ /mnt
    if [ $? -ne 0 ]; then
        die "error rsyncing partition ${mntpoint}."
    fi

    ${EXEC} umount /mnt
done
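The slightly cryptic parameter expansions in the script's main loop do the partid/mount-point split and derive the filesystem label. A standalone sketch of just that logic (the partition list here is illustrative):

```shell
# Split "partid=mountpoint" tokens the way clone-parts.sh does, and
# derive the filesystem label from the mount point ("/" becomes
# "root", other leading slashes are stripped).
PARTS="1=/boot 2=/ 8=/home"
LABPREFIX="raid"
for part in $PARTS; do
    partid=${part%=*}      # text before the '='
    mntpoint=${part#*=}    # text after the '='
    if [ "${mntpoint}" = "/" ]; then
        labelname="root"
    else
        labelname=$(echo ${mntpoint} | sed -e 's/\///g')
    fi
    echo "partition ${partid}: label ${LABPREFIX}-${labelname}"
done
```

This prints "partition 1: label raid-boot", "partition 2: label raid-root" and "partition 8: label raid-home", matching the labels used in the fstab later on.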
This was run as follows (the defaults conveniently match my use case):
sh clone-parts.sh -t md0p
The /etc/fstab file on the RAID was changed to reflect the mirrored disks:
orchid:~# more /mnt/etc/fstab
# /etc/fstab: static file system information.
#
# <file system>   <mount point>   <type>      <options>          <dump> <pass>
proc              /proc           proc        defaults           0      0
LABEL=raid-root   /               ext3        errors=remount-ro  0      1
LABEL=raid-boot   /boot           ext3        defaults           0      2
LABEL=raid-home   /home           ext3        defaults           0      2
LABEL=raid-rep    /rep            ext3        defaults           0      2
LABEL=raid-tmp    /tmp            ext3        defaults           0      2
LABEL=raid-usr    /usr            ext3        defaults           0      2
LABEL=raid-var    /var            ext3        defaults           0      2
/dev/md3          none            swap        sw                 0      0
/dev/cdrom        /media/cdrom0   udf,iso9660 user,noauto        0      0
/dev/fd0          /media/floppy0  auto        rw,user,noauto     0      0
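Mounting by LABEL= rather than by device name means the fstab keeps working however the disks end up being enumerated. As a quick sanity check, the label/mount-point pairs can be pulled out with awk; a sketch using a stubbed fstab (read /etc/fstab on the live system):

```shell
# Extract the LABEL -> mountpoint pairs from an fstab; the here-doc
# stands in for /etc/fstab here.
cat > /tmp/fstab <<'EOF'
proc            /proc  proc defaults          0 0
LABEL=raid-root /      ext3 errors=remount-ro 0 1
LABEL=raid-boot /boot  ext3 defaults          0 2
EOF
awk '$1 ~ /^LABEL=/ { sub("LABEL=", "", $1); print $1, "->", $2 }' /tmp/fstab
```

This prints "raid-root -> /" and "raid-boot -> /boot"; comparing that list against the labels actually written by clone-parts.sh catches typos before a reboot does.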
Now we enter black-art territory. There seems to be no formal documentation on how to boot RAID disks using GRUB2 (apart from the statement in the GNU GRUB2 manual that "GRUB 2 can read files directly from LVM and RAID devices.").
I re-configured grub-pc so that I could install it on all the hard drives, via:
dpkg-reconfigure grub-pc
That seemed to work, and it also auto-generated an entry for the kernel on the mirrored partition. However, the stanza was not what I wanted, as it passed the existing PATA-based root partition to the kernel, not the mirrored version.
So, from reading various web pages, here's how I set up the GRUB2 stanza for booting from the mirror, in the /etc/grub.d/40_custom file, designed for that very purpose:
menuentry "Debian GNU/Linux, with Linux 2.6.32-5-686 (RAID1 /dev/md0)" {
    insmod raid
    insmod mdraid
    insmod part_msdos
    insmod ext2
    set root='(md0,1)'
    search --no-floppy --label --set raid-boot
    linux /vmlinuz-2.6.32-5-686 root=LABEL=raid-root
    initrd /initrd.img-2.6.32-5-686
}
The insmod lines ensure that the mdraid-aware code is available to grub. The search command populates the root environment variable with the grub disk name corresponding to the device labelled "raid-boot", which kind of makes the preceding set command a bit superfluous.
I performed a test boot off the raid array, which worked, but showed I had buggered up the rsync copy to the mirror (a missing "/" after ${mntpoint} in the rsync command, if you must know). I rebooted into the normal system to find that the mirror was marked as "(auto-read-only)" and "resync=PENDING". I suspect this was due to my using the reboot command, which did not allow the array to shut down cleanly. The resync can be restarted by issuing the command:
mdadm --readwrite /dev/md0
I fixed the files on the mirror and then fiddled around with the grub defaults. I invoked update-grub to generate a new grub.cfg file:
orchid:~# update-grub
Generating grub.cfg ...
Found linux image: /boot/vmlinuz-2.6.32-5-686
Found initrd image: /boot/initrd.img-2.6.32-5-686
Found Debian GNU/Linux (6.0.4) on /dev/md0p2
/usr/sbin/grub-probe: error: no such disk.
/usr/sbin/grub-probe: error: no such disk.
done
Not good. As an attempt at a fix, I ran the following command:
orchid:~# grub-mkdevicemap --no-floppy
Without much success. Well, it was different, I suppose.
orchid:~# update-grub
Generating grub.cfg ...
Found linux image: /boot/vmlinuz-2.6.32-5-686
Found initrd image: /boot/initrd.img-2.6.32-5-686
Found Debian GNU/Linux (6.0.4) on /dev/md0p2
/usr/sbin/grub-probe: error: unknown filesystem.
done
This error has been reported as Debian bug 567618 and was supposedly fixed in grub-pc 1.98+20100804-14+squeeze1, so maybe this is a different problem. Further research indicated it was caused by problems grub has with md partitions. So, back to square one. I decided to use a different mirroring scheme: create multiple partitions on each physical device, then mirror each partition.
OK, back to the beginning. First, I had to stop and destroy the current array:
mdadm --stop /dev/md0
mdadm --zero-superblock /dev/sdc1 /dev/sdd1
I created a bunch of partitions on /dev/sdc, as shown below:
orchid:~# parted /dev/sdc
GNU Parted 2.3
Using /dev/sdc
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Model: ATA Maxtor 7L320S0 (scsi)
Disk /dev/sdc: 324GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type      File system  Flags
 1      32.3kB  502MB   502MB   primary   ext3         boot
 2      502MB   1522MB  1020MB  primary                raid
 3      1522MB  2542MB  1020MB  primary                raid
 4      2542MB  324GB   321GB   extended
 5      2542MB  3043MB  502MB   logical                raid
 6      3043MB  4063MB  1020MB  logical                raid
 7      4063MB  10.2GB  6144MB  logical                raid
 8      10.2GB  133GB   123GB   logical                raid
 9      133GB   324GB   191GB   logical                raid
The partition setup was cloned to the other disk (/dev/sdd) using sfdisk:
orchid:~# sfdisk -d /dev/sdc | sfdisk /dev/sdd
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdd: 39382 cylinders, 255 heads, 63 sectors/track
Old situation:
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sdd1          0+  39381   39382- 316335883+  fd  Linux raid autodetect
/dev/sdd2          0       -       0          0    0  Empty
/dev/sdd3          0       -       0          0    0  Empty
/dev/sdd4          0       -       0          0    0  Empty
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot     Start        End   #sectors  Id  System
/dev/sdd1   *         63     979964     979902  83  Linux
/dev/sdd2         979965    2972024    1992060  fd  Linux raid autodetect
/dev/sdd3        2972025    4964084    1992060  fd  Linux raid autodetect
/dev/sdd4        4964085  632671829  627707745   5  Extended
/dev/sdd5        4964148    5944049     979902  fd  Linux raid autodetect
/dev/sdd6        5944113    7936109    1991997  fd  Linux raid autodetect
/dev/sdd7        7936173   19936664   12000492  fd  Linux raid autodetect
/dev/sdd8       19936728  259931699  239994972  fd  Linux raid autodetect
/dev/sdd9      259931763  632671829  372740067  fd  Linux raid autodetect
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
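It's worth double-checking that the clone really produced identical layouts on both disks. One hedged approach: dump both tables with sfdisk -d, normalise away the device names, and diff. The here-doc below stands in for real `sfdisk -d` output so the sketch is self-contained:

```shell
# Verify two disks carry the same partition layout by diffing their
# sfdisk dumps with the device name normalised. Stub data stands in
# for `sfdisk -d /dev/sdc` and `sfdisk -d /dev/sdd`.
cat > /tmp/sdc.dump <<'EOF'
/dev/sdc1 : start=       63, size=    979902, Id=83, bootable
/dev/sdc2 : start=   979965, size=   1992060, Id=fd
EOF
sed 's/sdc/sdd/' /tmp/sdc.dump > /tmp/sdd.dump   # simulate the clone
sed 's/sd[cd]/sdX/' /tmp/sdc.dump > /tmp/a
sed 's/sd[cd]/sdX/' /tmp/sdd.dump > /tmp/b
diff /tmp/a /tmp/b && echo "partition tables match"
```

On the real system, replace the here-doc and the simulated clone with the two `sfdisk -d` invocations.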
This time, I planned to leave the boot partition un-mirrored, to steer clear of GRUB2's problems with md devices. I created a new script, mkmirror.sh, to automate the creation of the RAID1 arrays:
#!/bin/sh
#
# NAME
#
#   mkmirror.sh - Script to automate creation of raid1 (mirror) disks
#
# SYNOPSIS
#
#   sh mkmirror.sh [-d1 dev] [-d2 dev] [-p "partition list"] [-n]
#
#   Switches:
#       -d1 dev              Set first device name, default /dev/sdc
#       -d2 dev              Set second device name, default /dev/sdd
#       -p "partition list"  Define list of partition numbers
#       -n                   Just show what would be done
#
# DESCRIPTION
#
#   mkmirror.sh simplifies the creation of multiple RAID devices via
#   linux's mdadm utility. md devices are created with the same number
#   as the source partition id. The script assumes that each source
#   device already contains partitions matching the partition id list.
#   The script contains an in-built list of partition ids.
#
#   Partition ids are specified by space-separated numbers, enclosed
#   in double quotes. The in-built default is:
#
#       "2 3 5 6 7 8 9"
#
#   This default can be overridden on the command line using the "-p"
#   switch.
#
#   The "-n" switch requests mkmirror.sh to emit the generated
#   commands to stdout, rather than actually executing them.
#
# MPW 3rd March, 2012

SCRIPT=${0##*/}
DEV1="/dev/sdc"
DEV2="/dev/sdd"
PARTS="2 3 5 6 7 8 9"
EXEC=eval

die()
{
    echo "${SCRIPT}: $1 Terminating..."
    exit 1
}

while [ $# -gt 0 ]; do
    case $1 in
    -d1) DEV1=$2
         shift
         ;;
    -d2) DEV2=$2
         shift
         ;;
    -p)  PARTS=$2
         shift
         ;;
    -n)  EXEC=echo
         ;;
    *)   die "unrecognised switch: $1."
         ;;
    esac
    shift
done

for partid in $PARTS
do
    echo "${SCRIPT}: making /dev/md${partid} ..."
    # yes used to cater for mdadm's sanity check
    ${EXEC} "yes | /sbin/mdadm --create /dev/md${partid} --level=1 \
        --raid-devices=2 ${DEV1}${partid} ${DEV2}${partid}"
    if [ $? -ne 0 ]; then
        die "unable to create /dev/md${partid}."
    fi
done
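The `yes |` in front of mdadm deserves a note: mdadm asks for confirmation when it finds an old filesystem signature on a member partition, and piping yes into it answers every prompt with "y". The mechanism in isolation, with a stub function standing in for mdadm's prompt:

```shell
# Simulate a tool that asks for confirmation on stdin, the way mdadm
# does when it spots an existing filesystem on a partition.
confirm_and_create() {
    read answer
    if [ "$answer" = "y" ]; then
        echo "array created"
    else
        echo "aborted"
    fi
}
yes | confirm_and_create
```

When the reader exits, yes is killed by SIGPIPE, so the pipeline terminates cleanly; the pipeline's exit status is that of the right-hand command.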
The RAID1 arrays were then created and populated using the two scripts:
# omit boot partition from the setup ...
sh mkmirror.sh -d1 /dev/sdc -d2 /dev/sdd -p "2 3 5 6 7 8 9"
sh clone-parts.sh -t md -p "2=/ 3=swap 5=/tmp 6=/var 7=/usr 8=/home 9=/rep"
/etc/fstab on the new mirrored root partition was modified to mount the mirrored disks:
orchid:~# mount -t ext3 /dev/md2 /mnt
orchid:~# cat /mnt/etc/fstab
# /etc/fstab: static file system information.
#
# <file system>   <mount point>   <type>      <options>          <dump> <pass>
proc              /proc           proc        defaults           0      0
LABEL=raid-root   /               ext3        errors=remount-ro  0      1
LABEL=raid-boot   /boot           ext3        defaults           0      2
LABEL=raid-home   /home           ext3        defaults           0      2
LABEL=raid-rep    /rep            ext3        defaults           0      2
LABEL=raid-tmp    /tmp            ext3        defaults           0      2
LABEL=raid-usr    /usr            ext3        defaults           0      2
LABEL=raid-var    /var            ext3        defaults           0      2
/dev/md3          none            swap        sw                 0      0
/dev/cdrom        /media/cdrom0   udf,iso9660 user,noauto        0      0
/dev/fd0          /media/floppy0  auto        rw,user,noauto     0      0
I populated the boot partitions on both /dev/sdc and /dev/sdd manually (the label is the same on both partitions, as I don't really care which one we boot from):
for disk in c d; do
    mkfs -t ext3 -L raid-boot /dev/sd${disk}1
    mount -t ext3 /dev/sd${disk}1 /mnt
    rsync -auHx /boot/ /mnt
    umount /mnt
done
With a non-raid boot partition, the GRUB2 boot stanza looks as follows:
menuentry "Debian GNU/Linux, with Linux 2.6.32-5-686 (RAID1 /dev/sdc)" {
    insmod part_msdos
    insmod ext2
    set root='(hd2,msdos1)'
    search --no-floppy --label --set raid-boot
    linux /vmlinuz-2.6.32-5-686 root=LABEL=raid-root
    initrd /initrd.img-2.6.32-5-686
}
OK, let's run update-grub again...
orchid:~# update-grub
Generating grub.cfg ...
Found linux image: /boot/vmlinuz-2.6.32-5-686
Found initrd image: /boot/initrd.img-2.6.32-5-686
done
That's much better. Booting GRUB2 from /dev/sda and choosing to boot from the mirrored disks worked. I then re-installed GRUB2 on /dev/sdc and /dev/sdd, re-writing /boot/grub/grub.cfg in the process:
orchid:~# grub-install /dev/sdc
orchid:~# grub-install /dev/sdd
orchid:~# update-grub
Generating grub.cfg ...
Found linux image: /boot/vmlinuz-2.6.32-5-686
Found initrd image: /boot/initrd.img-2.6.32-5-686
Found Debian GNU/Linux (6.0.4) on /dev/sda7
done
As /dev/sdc1 was the partition mounted as /boot, I had to copy the new grub.cfg file to /dev/sdd1. I then rebooted and changed the boot order so that the SATA disks preceded the PATA disks. Yes, it booted, although off /dev/sdd rather than /dev/sdc, as I had expected.
Orchid is now happily running on mirrored disks.
I decided I didn't like the unmirrored boot partitions (not sufficiently uniform with the other partitions?), so I took the following steps to move to a fully mirrored environment (the boot partitions' type was first changed to "Linux raid autodetect" using cfdisk):

mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc1 missing
mkfs -t ext3 -L raid-boot /dev/md1
mdadm --add /dev/md1 /dev/sdd1
The reboot failed, with GRUB2 complaining it could not find its files (error: no such disk). I was presented with the GRUB2 rescue prompt. Well, I suppose GRUB2 provides a good field surgical kit for when you shoot yourself in the foot.
After a bit of panicked research on the internet, I typed ls at the rescue prompt. That gave me a list of drives and partitions, e.g.:
(hd0) (hd0,msdos9) (hd0,msdos8)...
In all, three drives were seen by GRUB2: hd0, hd1 and hd2. Hmm, which was /dev/sda, the old PATA drive? To find out, I used the following commands, trying each drive in turn:
set prefix=(hd0,msdos1)/grub
set root=(hd0,msdos1)
insmod normal
Naturally, the last one I tried, hd2, was the one that worked, i.e. did not give an error on the insmod normal command. Inserting the normal module provides more facilities at the grub prompt; invoke it by typing normal at the rescue prompt. If you are lucky (as I was) you will get the GRUB2 menu from the boot (/dev/sda1) partition.
So, what had I forgotten to do in the setup of the mirrored disks? Maybe I should actually install GRUB2 on the boot mirror partition. I entered the following command:
# grub-install /dev/md1
which completed successfully. On reboot, GRUB2 greeted me with the menu from grub/grub.cfg on /dev/md1. Hurrah!
I had to perform the initial boot from the mirrored disk by choosing one of the custom boot stanzas I'd written into the /etc/grub.d/40_custom file (see above), as the default stanza, created by update-grub, still referred to the old /dev/sdd1 UUID (which of course no longer existed). Once I had the mirrored version up, I could issue update-grub again, which rewrote the default boot stanza with the correct /dev/md1 UUID as the value of --fs-uuid in the search command. This is what the result looks like:
menuentry 'Debian GNU/Linux, with Linux 2.6.32-5-686' --class debian --class gnu-linux --class gnu --class os {
    insmod raid
    insmod mdraid
    insmod part_msdos
    insmod part_msdos
    insmod ext2
    set root='(md/1)'
    search --no-floppy --fs-uuid --set 629db29e-e0b6-4ecb-b688-e34db2e6b363
    echo 'Loading Linux 2.6.32-5-686 ...'
    linux /vmlinuz-2.6.32-5-686 root=UUID=a1a08a38-878e-4709-a040-69fcd383e095 ro printk.time=0 quiet
    echo 'Loading initial ramdisk ...'
    initrd /initrd.img-2.6.32-5-686
}
And now, df -h gives me:
Filesystem            Size  Used Avail Use% Mounted on
/dev/md2              958M  120M  790M  14% /
tmpfs                1013M     0 1013M   0% /lib/init/rw
udev                 1008M  244K 1008M   1% /dev
tmpfs                1013M     0 1013M   0% /dev/shm
/dev/md1              464M   27M  413M   7% /boot
/dev/md8              113G   60G   48G  57% /home
/dev/md9              175G   33G  134G  20% /rep
/dev/md5              464M   11M  430M   3% /tmp
/dev/md7              5.7G  930M  4.5G  17% /usr
/dev/md6              958M  201M  709M  23% /var
Now everything is much more regular.