With 320GB SATA drives on sale with free shipping, I decided to increase my workstation’s RAID capacity for the last time. I say for the last time because I am now literally out of 3.5″ drive bays, even though two SATA ports remain free. With the new drive, tuna now has one system drive and five 320GB drives in a RAID 5. Plus, I’m probably set for capacity for now!
One nice thing about SATA drives with Linux these days is that you don’t ever need to shut the system down. In fact, with SATA, Linux software RAID, and the XFS filesystem I was able to add 320GB of capacity to the RAID filesystem without even unmounting it. During the entire time the system was completely usable, including all data on the RAID volume.
First, I took the case side off and carefully removed the drive cage which has an empty slot. There was already another drive in this cage, so I had to set it down carefully without pulling out its SATA or power connection. Next, I screwed the rail kit onto the new drive, which then slides into the cage. Once the drive was in the cage I carefully plugged a new SATA cable into the SiI3114 controller on my motherboard. As soon as you connect the other end of the SATA cable to the drive, and connect power to the drive, dmesg fills with kernel messages recording the hotplug event:
ata8: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0x2 frozen
ata8: hard resetting port
ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata8.00: ATA-7, max UDMA/133, 625142448 sectors: LBA48 NCQ (depth 0/32)
ata8.00: configured for UDMA/100
ata8: EH complete
scsi 7:0:0:0: Direct-Access ATA ST3320620AS 3.AA PQ: 0 ANSI: 5
SCSI device sdf: 625142448 512-byte hdwr sectors (320073 MB)
sdf: Write Protect is off
sdf: Mode Sense: 00 3a 00 00
SCSI device sdf: drive cache: write back
SCSI device sdf: 625142448 512-byte hdwr sectors (320073 MB)
sdf: Write Protect is off
sdf: Mode Sense: 00 3a 00 00
SCSI device sdf: drive cache: write back
sdf: unknown partition table
sd 7:0:0:0: Attached scsi disk sdf
sd 7:0:0:0: Attached scsi generic sg5 type 0
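As a quick sanity check that the kernel found the drive you expect, the sector count in the log can be converted to megabytes by hand. The helper below is only a sketch: the name sectors_to_mb is made up for illustration, and it assumes 512-byte sectors and decimal megabytes rounded to the nearest whole number.

```shell
# Hypothetical helper: convert a count of 512-byte sectors into
# decimal megabytes (10^6 bytes), rounded to the nearest megabyte.
sectors_to_mb() {
    # $1 = number of 512-byte sectors, as printed in the kernel log
    echo $(( ($1 * 512 + 500000) / 1000000 ))
}
```

Running sectors_to_mb 625142448 prints 320073, which agrees with the "(320073 MB)" figure the kernel reported for sdf.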
At this point I very carefully slid the cage back into the chassis and organized the cables. Next, I buttoned up the case and ran cfdisk on the new drive to create partitions:
root@tuna:~# cfdisk /dev/sdf
I created a single partition the size of the entire drive, then set the partition type to Linux raid autodetect (code “fd”). Once you tell cfdisk to write the partition table, the kernel logs a message showing the new partition layout:
SCSI device sdf: 625142448 512-byte hdwr sectors (320073 MB)
sdf: Write Protect is off
sdf: Mode Sense: 00 3a 00 00
SCSI device sdf: drive cache: write back
sdf: sdf1
With the new sdf1 partition available for use, I added /dev/sdf1 into the RAID 5 as a hot spare.
root@tuna:~# mdadm --add /dev/md/0 /dev/sdf1
mdadm: added /dev/sdf1
root@tuna:~#
At this point md0 has four drives, with a fifth available as a hot spare:
root@tuna:~# mdadm --detail /dev/md/0
/dev/md/0:
Version : 00.90.03
Creation Time : Wed Jan 3 20:23:10 2007
Raid Level : raid5
Array Size : 937705728 (894.27 GiB 960.21 GB)
Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Wed Jul 11 18:45:40 2007
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
UUID : e741b1af:83f74428:24e3f15c:4a59a3ce
Events : 0.207866
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
2 8 49 2 active sync /dev/sdd1
3 8 65 3 active sync /dev/sde1
4 8 81 - spare /dev/sdf1
root@tuna:~#
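The Array Size reported above follows from the usual RAID 5 rule that one device’s worth of space goes to parity, so usable capacity is (devices - 1) times the per-device size. Here is a small sketch of that arithmetic; the function name raid5_usable_kib is hypothetical, and sizes are taken to be in KiB as mdadm prints them.

```shell
# Hypothetical helper: usable RAID 5 capacity in KiB.
# One device's worth of space is consumed by parity.
raid5_usable_kib() {
    # $1 = number of active RAID devices
    # $2 = "Used Dev Size" of each device, in KiB
    echo $(( ($1 - 1) * $2 ))
}
```

With four active devices, raid5_usable_kib 4 312568576 prints 937705728, matching the Array Size line. With five, it prints 1250274304, which is the size to expect once the spare becomes an active member.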
To grow the array by turning the spare into an active component, you need to use mdadm --grow. There is a short delay while it copies a critical section, then reconstruction begins. The entire array must be reconstructed to stripe data and parity information across all five drives.
root@tuna:~# mdadm --grow /dev/md/0 -n 5
mdadm: Need to backup 768K of critical section..
mdadm: ... critical section passed.
root@tuna:~#
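One way to make sense of the 768K figure, offered as my own reading rather than a statement of how mdadm computes it: it equals the chunk size multiplied by the least common multiple of the old and new data-disk counts (three and four here), which is the smallest span where both layouts line up on a stripe boundary. The function names below are hypothetical.

```shell
# Greatest common divisor by Euclid's algorithm.
gcd() {
    a=$1; b=$2
    while [ "$b" -ne 0 ]; do
        t=$(( a % b )); a=$b; b=$t
    done
    echo "$a"
}

# Assumption: critical section = chunk size * lcm(old data disks, new data disks).
# RAID 5 has (devices - 1) data disks per stripe.
critical_section_kib() {
    # $1 = chunk size in KiB, $2 = old device count, $3 = new device count
    old_data=$(( $2 - 1 ))
    new_data=$(( $3 - 1 ))
    g=$(gcd "$old_data" "$new_data")
    echo $(( $1 * old_data * new_data / g ))
}
```

For this array, critical_section_kib 64 4 5 prints 768, in line with the message from mdadm.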
These kernel messages are printed:
md: bind<sdf1>
RAID5 conf printout:
--- rd:5 wd:5
disk 0, o:1, dev:sdb1
disk 1, o:1, dev:sdc1
disk 2, o:1, dev:sdd1
disk 3, o:1, dev:sde1
disk 4, o:1, dev:sdf1
md: reshape of RAID array md0
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
md: using 128k window, over a total of 312568576 blocks.
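The minimum and maximum speeds in that log correspond to the md tunables speed_limit_min and speed_limit_max under /proc/sys/dev/raid. If you are curious what your system is set to, something like the following sketch will print them. The function name md_speed_limits and its optional directory argument are my own additions so the function can be pointed at a different directory.

```shell
# Hypothetical helper: print the md resync/reshape speed limits (KB/sec).
md_speed_limits() {
    # $1 = directory holding the tunables; defaults to the usual /proc location
    dir=${1:-/proc/sys/dev/raid}
    for name in speed_limit_min speed_limit_max; do
        # Skip quietly if the file is missing (e.g. md support not loaded)
        if [ -r "$dir/$name" ]; then
            echo "$name=$(cat "$dir/$name")"
        fi
    done
}
```

Called with no arguments on a machine with the default settings logged above, this would be expected to show 1000 and 200000.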
The file /proc/mdstat gives the status of the reshaping, including a time remaining estimate. In my experience, the time estimate was pretty reasonable. It took around 7 hours to reshape my array from four drives to five. You can use the “watch” command to get continuous status updates.
root@tuna:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdf1[4] sdb1[0] sde1[3] sdd1[2] sdc1[1]
937705728 blocks super 0.91 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
[>....................] reshape = 0.5% (1749580/312568576) finish=440.0min speed=11770K/sec
unused devices: <none>
root@tuna:~# watch -n 30 cat /proc/mdstat
root@tuna:~# mdadm -D /dev/md/0
/dev/md/0:
Version : 00.91.03
Creation Time : Wed Jan 3 20:23:10 2007
Raid Level : raid5
Array Size : 937705728 (894.27 GiB 960.21 GB)
Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Wed Jul 11 18:55:33 2007
State : clean, recovering
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Reshape Status : 2% complete
Delta Devices : 1, (4->5)
UUID : e741b1af:83f74428:24e3f15c:4a59a3ce
Events : 0.212058
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
2 8 49 2 active sync /dev/sdd1
3 8 65 3 active sync /dev/sde1
4 8 81 4 active sync /dev/sdf1
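The finish estimate in /proc/mdstat is easy to reproduce: it is roughly the blocks still to be processed divided by the current speed. The sketch below pulls those three numbers out of the reshape line and prints whole minutes remaining. The function name reshape_eta_min is hypothetical, and the sed pattern assumes the line format shown in my output above.

```shell
# Hypothetical helper: read /proc/mdstat-style text on stdin and print
# the estimated whole minutes remaining for a reshape.
reshape_eta_min() {
    # Extract "(done/total)" and "speed=NNNK/sec" from the progress line
    sed -n 's/.*(\([0-9]*\)\/\([0-9]*\)).*speed=\([0-9]*\)K\/sec.*/\1 \2 \3/p' |
    while read -r copied total speed; do
        # Avoid dividing by zero if the reshape is momentarily stalled
        [ "$speed" -gt 0 ] || continue
        # Remaining 1K blocks / (K per second) = seconds; / 60 = minutes
        echo $(( (total - copied) / speed / 60 ))
    done
}
```

Feeding it the progress line from my array (1749580 of 312568576 blocks at 11770K/sec) prints 440, the same as the finish=440.0min that the kernel reported. On a live system you would run it as reshape_eta_min < /proc/mdstat.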
About 7 hours later the reshaping finished. At this point, I ran xfs_growfs on the filesystem’s mount point. With no size options, this command grows the existing filesystem to the maximum available size. You do not need to unmount the filesystem, and the grow process completes relatively quickly. Note that this is for SGI’s XFS filesystem. For other filesystems such as ext3, consult the manpages to find out how to grow the filesystem.
root@tuna:~# xfs_growfs /mnt/albacore
meta-data=/dev/md/0 isize=256 agcount=48, agsize=4883888 blks
= sectsz=4096 attr=0
data = bsize=4096 blocks=234426432, imaxpct=25
= sunit=16 swidth=32 blks, unwritten=1
naming =version 2 bsize=4096
log =internal bsize=4096 blocks=32768, version=2
= sectsz=4096 sunit=1 blks
realtime =none extsz=131072 blocks=0, rtextents=0
data blocks changed from 234426432 to 312568576
root@tuna:~# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 138G 127G 12G 92% /
udev 1007M 2.7M 1004M 1% /dev
/dev/md/0 1.2T 853G 340G 72% /mnt/albacore
shm 1007M 0 1007M 0% /dev/shm
mackerel:/export 232G 150G 82G 65% /mnt/mackerel
anchovy:/export 149G 3.5G 145G 3% /mnt/anchovy
As you can see, the capacity of /mnt/albacore is now up to 1.2TB, without ever turning the computer off or even unmounting the filesystem!
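If you want to cross-check the numbers, the block counts printed by xfs_growfs convert directly to sizes using the 4096-byte block size (bsize) shown in its output. This is just a sketch of the arithmetic; the function name fs_blocks_to_gib is hypothetical and it truncates to whole GiB.

```shell
# Hypothetical helper: convert a filesystem block count to whole GiB.
fs_blocks_to_gib() {
    # $1 = number of filesystem blocks
    # $2 = block size in bytes (bsize in the xfs_growfs output)
    echo $(( $1 * $2 / 1073741824 ))
}
```

Before the grow, fs_blocks_to_gib 234426432 4096 prints 894, in line with the 894.27 GiB array size mdadm reported for four drives. Afterwards, fs_blocks_to_gib 312568576 4096 prints 1192, or about 1.16 TiB, which df -h displays as 1.2T.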