Mini-HOWTO: Disks and Partitions larger than 2TB under LINUX
using an Adaptec 2820SE RAID card
Tilo Sloboda, March 2007
If you're just looking for information on creating large partitions >2TB under LINUX,
then please skip to the second part.
Outline:
I built a couple of systems which use an Adaptec 2820SE 8-channel SATA
card, and which use eight 750 GB SATA drives.
Nice hardware, but how do you configure it properly so that LINUX can
handle partitions larger than 2TB? Some of the LINUX tools, like "fdisk"
or the native partitioning tools on Fedora Core or RedHat, won't even
work on a drive larger than 2TB.
RAID Card Firmware/BIOS
First, I recommend upgrading your RAID controller to the latest firmware.
In my case, for the Adaptec 2820SE, that's BIOS V5.2-0 build number 11564.
Check the manufacturer's web site and compare it against what you have.
Kernel Driver for your RAID Card
Second, please double-check whether the manufacturer of your RAID card
supports the LINUX distribution which you are using. If not, you may have
to compile a driver yourself, and for that you will have to install the
kernel sources. For example, the manufacturer does not support the
Adaptec 2820SE card on Fedora Core.
I installed several Adaptec 2820SE cards on Fedora Core: FC4, FC5 and FC6.
On FC6 you don't need a special driver - the one coming with the distribution
works.
On FC4 and FC5 my systems hung randomly, without any error message,
whenever they came under load. The remedy is to compile and install
the correct driver.
The kernel driver installation procedure for the Adaptec 2820SE is:
- when running Fedora Core 4 or 5, you need to install a hand-compiled DKMS kernel module:
  - download the latest version of the Dynamic Kernel Module Source Code from:
    http://www.adaptec.com/en-US/speed/raid/aac/linux/aacraid-dkms-1_1_5-2433_tgz.htm
  - unpack it (the commands below expect the downloaded .tgz file in /tmp):
    # cd /tmp
    # mkdir aacraid-dkms-1.1.5-2433
    # cd aacraid-dkms-1.1.5-2433
    # tar -zxf ../aacraid-dkms-1.1.5-2433.tgz
  - install the RPMs:
    # rpm -Uhv *.rpm
    NOTE: the source rpm places the driver source in /usr/src/aacraid-1.1.5.2433
    in case manual build intervention is required.
  - make sure the kernel sources are installed, e.g. "yum install kernel-devel"
  - use the dkms commands to build and install the driver:
    # dkms add -m aacraid -v 1.1.5.2433
    # dkms build -m aacraid -v 1.1.5.2433
    # dkms install -m aacraid -v 1.1.5.2433
    Running module version sanity check.
    aacraid.ko:
     - Original module
       - Found /lib/modules/2.6.17-1.2142_FC4smp/kernel/drivers/scsi/aacraid//aacraid.ko
       - Storing in /var/lib/dkms/aacraid/original_module/2.6.17-1.2142_FC4smp/x86_64/
       - Archiving for uninstallation purposes
     - Installation
       - Installing to /lib/modules/2.6.17-1.2142_FC4smp/kernel/drivers/scsi/aacraid//
    depmod.....
    Saving old initrd as /boot/initrd-2.6.17-1.2142_FC4smp_old.img
    Making new initrd as /boot/initrd-2.6.17-1.2142_FC4smp.img
    (If next boot fails, revert to the _old initrd image)
    mkinitrd......
    DKMS: install Completed.
    This keeps backup copies of the old driver and the old initrd, so you
    can revert in case something goes wrong.
  - reboot!
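After the reboot it can be worth checking which aacraid module the running kernel actually finds. The following is only a minimal sketch: it assumes "modinfo" is installed, the helper name aacraid_version is made up for this example, and since the exact version string reported by the driver is not documented here, the script just prints whatever it gets.

```shell
#!/bin/sh
# Hypothetical helper: print the version string of the aacraid module
# that modinfo finds for the running kernel, or "unknown" if modinfo
# is missing or the module cannot be found.
aacraid_version() {
    v=$(modinfo -F version aacraid 2>/dev/null) || v=""
    if [ -n "$v" ]; then
        echo "$v"
    else
        echo "unknown"
    fi
}

echo "running kernel: $(uname -r)"
echo "aacraid module: $(aacraid_version)"
```

"dkms status" should additionally list the aacraid module as installed for the running kernel.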
LINUX with Partitions >2TB
Partitioning anything larger than 2TB is very problematic with the
standard tools.
Although the kernels of FC4, FC5, FC6 all come pre-compiled and
pre-configured with support for >2TB drives and partitions, a couple of system
components have a 2TB limit:
- grub or lilo only understand 'msdos' disk labels - this means that you can not boot from a GPT disk label.
- 'msdos' disk labels can not handle partitions or drives larger than 2TB
- fdisk can not handle drives or partitions larger than 2TB -- even
if you manage to partition a >2TB drive with parted, don't use fdisk
to display it - fdisk will lie about the partitioning.
- LVM can not handle logical volumes larger than 2TB
- the LINUX installer(s) typically can not handle "gpt" disk
labels, only "msdos" disk labels - both because of "grub" and because of the partitioning tools used
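The 2TB figure for 'msdos' disk labels comes from simple arithmetic: the partition table stores sector numbers as 32-bit values, so with 512-byte sectors nothing beyond 2^32 * 512 bytes can be addressed. A small sketch of that calculation follows; the function name is made up for illustration, and the 512-byte sector size is an assumption about the drives discussed here.

```shell
#!/bin/sh
# Largest size (in bytes) addressable with a 32-bit sector count.
# $1 = sector size in bytes (defaults to 512, an assumption).
msdos_limit_bytes() {
    sector_size=${1:-512}
    echo $(( 4294967296 * sector_size ))   # 4294967296 = 2^32
}

limit=$(msdos_limit_bytes 512)
echo "msdos label limit: $limit bytes"                  # 2199023255552
echo "in GB (binary): $(( limit / 1024 / 1024 / 1024 ))"  # 2048
```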
The following system components do not have the 2TB limit:
- the kernel can usually handle >2TB without problems
- "gpt" disk labels can handle >2TB drives and partitions
- "parted" can handle >2TB drives and partitions -- but be
careful: parted writes directly to the disk!!! RTFM!
So with a little planning and trickery you can create and use >2TB
partitions.
Once you have your system running with the correct firmware/BIOS and
the correct drivers, you need to decide whether you want to use Software or Hardware RAID.
Then you need to create and carve out the logical disk drives for your
system, so you can install your OS and the data partitions.
Carving out Logical Drives
I chose to use Hardware RAID6 on my Adaptec 2820SE because it has a battery backup for its cache,
which gives it an advantage over software RAID6. The hardware RAID is also faster.
What worked best for me was to set-up all the Logical Drives as RAID6,
distributed over the maximum number of physical drives (max spindles).
- for new systems it's a good idea to use RAID6 if the controller
supports it, so it can tolerate 2-disk failures -- e.g. the Adaptec 2820SE does.
- try to carve out:
- one RAID6-volume for the operating system - e.g.
30..100 GB - it will get a 'msdos' disk label
- one RAID6-volume with the remaining space for data - it will get a 'gpt' disk label
this way you will have:
- /dev/sda as the system drive, 30..100GB large
- /dev/sdb as the drive holding the data, e.g. >4TB in my case.
IMPORTANT NOTE: The RAID layout above will cause problems with the Adaptec 2820SE controller. Its firmware has certain limitations, which make it work reliably with only a single RAID volume per controller card.
Detailed explanation: Assume you use two RAID volumes as described above, and one of your hard drives (drive 1) fails. On top of that, you get some bad blocks in the first RAID array on disk 2 and some bad blocks in the second RAID array on disk 3. Those secondary errors are not on the same drive, so each RAID6 array on its own has lost no more than two members. But the 2820SE firmware incorrectly assumes that you have only one RAID volume, so to the controller this scenario looks like a three-disk failure -- unrecoverable! All your data will be lost!
So if you're using an Adaptec 2820SE, do not create more than one RAID volume per controller card!
Instead, put your data partition on a RAID6 volume, and put your root partition on a separate controller card or on a software RAID.
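To get a feeling for how much space a RAID6 volume leaves you: RAID6 spends the capacity of two drives on parity, so the usable size is roughly (number of drives - 2) x drive size. The sketch below just does that arithmetic; the function name and the minimum of four drives are assumptions for this example, and your controller may reserve some additional space.

```shell
#!/bin/sh
# Rough usable capacity of a RAID6 volume.
# $1 = number of drives, $2 = size of one drive in GB
raid6_usable_gb() {
    drives=$1
    size_gb=$2
    if [ "$drives" -lt 4 ]; then
        # assumed minimum; check your controller's documentation
        echo "RAID6 needs at least 4 drives" >&2
        return 1
    fi
    echo $(( (drives - 2) * size_gb ))
}

# eight 750 GB drives, as in the systems described here
raid6_usable_gb 8 750    # prints 4500
```

4500 GB (decimal) is well above the 2TB limit, which is why the data drive needs a 'gpt' disk label.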
- When you install LINUX, e.g. Fedora Core, let the installer only
use /dev/sda, and do not touch /dev/sdb - e.g. de-select it.
The installer will create an "msdos" disk label for /dev/sda and
because the size of that logical drive is smaller than 2TB there will
be no problems.
You will need a /boot partition of ~100MB, a swap partition (2x physical memory),
and a root partition / on /dev/sda.
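If you don't know the amount of physical memory offhand, the "2x physical memory" rule for the swap partition can be worked out from /proc/meminfo. This is a minimal sketch; the function name is made up, and it assumes the usual "MemTotal: <n> kB" line.

```shell
#!/bin/sh
# Suggested swap size in MB, following the 2x physical memory rule.
# $1 = meminfo file to read (defaults to /proc/meminfo)
swap_size_mb() {
    meminfo=${1:-/proc/meminfo}
    # MemTotal is given in kB; double it and convert to MB
    awk '/^MemTotal:/ { printf "%d\n", $2 * 2 / 1024 }' "$meminfo"
}

if [ -r /proc/meminfo ]; then
    echo "suggested swap size: $(swap_size_mb) MB"
fi
```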
- After installation of the OS, you can use "parted" -- be very
careful there!! -- to partition the huge drive and then create the
filesystem on it.
"parted" warnings:
- "parted" writes directly to the disk -- it does not wait for you to say 'write', like "fdisk" does.
- "parted" uses Megabytes, not blocks or cylinders, to determine the partition boundaries.
- be prepared to lose all your data if you use "parted" on a disk with live data...!
- do a "man parted" and read the instructions before you attempt to use parted.
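Because parted writes immediately, a quick check that nothing on the target drive is mounted can save you from an accident. The following is only a sketch under assumptions: the function name is made up, it only looks at /proc/mounts (so it will not notice swap or LVM use), and the prefix match would also treat a drive like /dev/sdba as part of /dev/sdb.

```shell
#!/bin/sh
# Succeeds (exit 0) if any mounted device name starts with the given disk.
# $1 = disk, e.g. /dev/sdb    $2 = mounts file (defaults to /proc/mounts)
disk_in_use() {
    disk=$1
    mounts=${2:-/proc/mounts}
    # the first field of each line is the mounted device
    awk -v d="$disk" 'index($1, d) == 1 { found = 1 } END { exit !found }' "$mounts"
}

if [ -r /proc/mounts ]; then
    if disk_in_use /dev/sdb; then
        echo "/dev/sdb has mounted partitions -- do NOT run parted on it"
    else
        echo "nothing from /dev/sdb is mounted"
    fi
fi
```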
- run "parted <device> print" to check whether the drive already has a
disk label, and compare the output with that of fdisk, e.g.:
# fdisk -l /dev/sda
Disk /dev/sda: 101.8 GB, 101860769792 bytes
255 heads, 63 sectors/track, 12383 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14       12383    99362025   8e  Linux LVM
# parted /dev/sda print
Disk geometry for /dev/sda: 0.000-97142.000 megabytes
Disk label type: msdos
Minor    Start       End     Type      Filesystem  Flags
1          0.031    101.975  primary   ext3        boot
2        101.975  97135.202  primary               lvm
Information: Don't forget to update /etc/fstab, if necessary.
From the output of parted you can see that the system drive has an msdos
disk label -- that's good for the system drive, but NOT what you want for your data drive!
Your data drive should look somewhat like this -- all empty:
# parted /dev/sdb print
Disk geometry for /dev/sdb: 0.000-4194614.000 megabytes
Disk label type: none
Minor    Start       End     Filesystem  Name                  Flags
Please note that "parted" uses Megabytes for the partition boundaries, and not cylinders.
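Judging from the listings above, parted's "megabytes" are binary ones: 97142 x 1048576 is exactly the 101860769792 bytes which fdisk reports for /dev/sda. In these units the 2TB limit is 2097152 megabytes, so the size from the "disk geometry" line tells you which disk label a drive needs. A small sketch follows; the function name is made up for this example.

```shell
#!/bin/sh
# Pick a disk label type from the drive size in parted's (binary) megabytes.
# $1 = size in megabytes, e.g. 4194614 or 4194614.000
label_for_mb() {
    size_mb=${1%%.*}        # drop the decimals parted prints
    limit_mb=2097152        # 2TB = 2 * 1024 * 1024 megabytes
    if [ "$size_mb" -gt "$limit_mb" ]; then
        echo gpt
    else
        echo msdos
    fi
}

label_for_mb 97142.000      # system drive -> msdos
label_for_mb 4194614.000    # data drive   -> gpt
```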
- run parted interactively on your new and empty data drive, e.g. /dev/sdb,
create a GPT disk label, and then specify one partition using the size of
your drive in Megabytes from the "disk geometry" listing:
# parted /dev/sdb
(parted) mklabel gpt
(parted) print
Disk geometry for /dev/sdb: 0.000-4194614.000 megabytes
Disk label type: gpt
Minor    Start       End     Filesystem  Name                  Flags
(parted) mkpart primary ext3 0 4194614
(parted) print
Disk geometry for /dev/sdb: 0.000-4194614.000 megabytes
Disk label type: gpt
Minor    Start       End     Filesystem  Name                  Flags
1          1.000 4194613.983  ext3
Information: Don't forget to update /etc/fstab, if necessary.
(parted) quit
#
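If you have to repeat this on several machines, the same steps can be passed to parted on the command line using its script mode ("-s"). The sketch below deliberately only prints the command instead of running it; the function name is made up, and newer parted versions may interpret plain numbers in different units (see parted's "unit" command), so check the result with "print" before you create a filesystem.

```shell
#!/bin/sh
# Build (but do not run!) the parted command line for one big GPT partition.
# $1 = disk, e.g. /dev/sdb
# $2 = end of the partition in megabytes, from the "disk geometry" line
parted_cmd() {
    disk=$1
    end_mb=$2
    echo "parted -s $disk mklabel gpt mkpart primary ext3 0 $end_mb"
}

# Review the printed command carefully, then run it by hand as root.
parted_cmd /dev/sdb 4194614
```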
- create your filesystem on /dev/sdb1
If you are using an ext3 file system, I recommend using
the option -m0 so that the standard 5% of the space is not reserved for root -- no reason to waste that space!
# mkfs.ext3 -m0 /dev/sdb1
# mkdir /data
# mount -t ext3 /dev/sdb1 /data
# df -h /data
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             3.9T  195M  3.9T   1% /data
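To see how much space the default 5% reservation would have cost on a drive of this size, here is the arithmetic as a small sketch. The function name is made up, and the result is only approximate because it ignores the filesystem's own overhead.

```shell
#!/bin/sh
# Space (in megabytes) that a root reservation would take.
# $1 = partition size in megabytes, $2 = reserved percentage (default 5)
reserved_mb() {
    echo $(( $1 * ${2:-5} / 100 ))
}

mb=$(reserved_mb 4194614)
echo "reserved with -m5: $mb MB (about $(( mb / 1024 )) GB)"
echo "reserved with -m0: $(reserved_mb 4194614 0) MB"
```

For the 4194614 megabyte drive above this comes to 209730 MB, a bit over 200 GB. If you change your mind later, the percentage can be adjusted on the existing filesystem with "tune2fs -m".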
- make the appropriate entry in your /etc/fstab to mount
/dev/sdb1 as ext3 on /data
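The /etc/fstab entry could look like the line printed by this sketch. The helper name is made up, and the "defaults" mount options and the "1 2" dump/fsck fields are the usual values for a non-root filesystem, not something specific to this setup.

```shell
#!/bin/sh
# Print an /etc/fstab line.
# $1 = device, $2 = mount point, $3 = filesystem type, $4 = options (optional)
fstab_line() {
    printf '%s\t%s\t%s\t%s\t%s %s\n' "$1" "$2" "$3" "${4:-defaults}" 1 2
}

# Review the output, then append it to /etc/fstab by hand.
fstab_line /dev/sdb1 /data ext3
```

After adding the line, "mount /data" should work without any further arguments.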
et voilà!