Using FusionIO drives on Red Hat Enterprise Linux 5
%{color:red}Update:% Please note that this post is getting a bit old. Currently I am running these IBM FusionIO drives on RHEL 6. I’ll be posting about that and a few other PCIe-SSD subjects in the next short while. - 24 Apr 2012
FusionIO IODrive Overview
So at work we have a rather large IBM x3850 X5 server. It has four sockets, each with six cores and hyperthreading (not that I’m necessarily a fan of hyperthreading; really I haven’t done enough research to make up my mind), which ends up with RHEL5 seeing 48 CPUs.
$ cat /proc/cpuinfo | grep proc | wc -l
48
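As an aside, grep can do the counting itself, so the cat/wc pipeline above collapses into a single command:

```shell
# Count logical CPUs without the cat | grep | wc pipeline
grep -c ^processor /proc/cpuinfo
```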
Fun.
But the important part of this post is that this server also has three 640GB FusionIO drives, which I have installed and configured as a volume group called fio.
$ ls /dev/fio
fio/ fioa fiob fioc fiod fioe fiof
$ vgs fio
VG #PV #LV #SN Attr VSize VFree
fio 6 4 0 wz--n- 1.76T 1.08T
and where fio[a,b,c,d,e,f]
are the drives, with each 640GB card actually appearing as two 320GB disks.
$ dmesg |grep -i "found device"
fioinf IBM 640GB High IOPS MD Class PCIe Adapter 0000:89:00.0: Found device 0000:89:00.0
fioinf IBM 640GB High IOPS MD Class PCIe Adapter 0000:8a:00.0: Found device 0000:8a:00.0
fioinf IBM 640GB High IOPS MD Class PCIe Adapter 0000:93:00.0: Found device 0000:93:00.0
fioinf IBM 640GB High IOPS MD Class PCIe Adapter 0000:94:00.0: Found device 0000:94:00.0
fioinf IBM 640GB High IOPS MD Class PCIe Adapter 0000:98:00.0: Found device 0000:98:00.0
fioinf IBM 640GB High IOPS MD Class PCIe Adapter 0000:99:00.0: Found device 0000:99:00.0
Resources
The most important resource for using these FusionIO drives is the official knowledge base, which has several articles specifically for Linux. I would suggest reading all of them. :)
Install
Once the cards were put into the server (somewhat harrowing given their individual cost) and the server was booted, I installed the software drivers downloaded from the IBM website. This server runs RHEL5
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)
as that is the RHEL version for which IBM supports the drivers.
$ rpm -i iodrive-driver-1.2.7.5-1.0_2.6.18_164.el5.x86_64.rpm \
iodrive-firmware-1.2.7.6.43246-1.0.noarch.rpm \
iodrive-jni-1.2.7.5-1.0.x86_64.rpm \
iodrive-snmp-1.2.7.5-1.0.x86_64.rpm \
iodrive-util-1.2.7.5-1.0.x86_64.rpm
Currently I am using the drivers as they were downloaded, which means running the specific kernel they were built against. The drivers do come with a source RPM so that you can rebuild them for your latest kernel, but I have opted not to do that yet. So install the matching kernel
$ yum install kernel-2.6.18-164.el5
and reboot.
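For reference, the rebuild route would look roughly like this. This is just a sketch; the src.rpm filename is my assumption based on the binary package names above, not something I have verified:

```
# Hypothetical filename – adjust to the src.rpm IBM actually ships
rpmbuild --rebuild iodrive-driver-1.2.7.5-1.0.src.rpm
# On RHEL5, rebuilt packages land under /usr/src/redhat/RPMS
rpm -i /usr/src/redhat/RPMS/x86_64/iodrive-driver-*.rpm
```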
However, I am also using the amazing ksplice service to ensure that, despite the fact that I am running a rather old kernel to match the FusionIO drivers, the kernel is still up to date in terms of security fixes:
$ uptrack-uname -r
2.6.18-238.12.1.el5
$ uname -r
2.6.18-164.el5
The @uptrack-uname -r@ command asks uptrack which kernel version the running kernel is the security equivalent of. Great stuff, that ksplice.
Once the drivers are installed we can load the modules
$ modprobe fio-driver
and now we can see the drives
$ ls /dev/fio*
fioa fiob fioc fiod fioe fiof
and at this point we can configure the drives.
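If you want to double-check that the module actually loaded before poking at the devices, lsmod will show it (the exact module name in the listing may differ by driver version):

```
# Confirm the driver module is resident
lsmod | grep -i fio
```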
Worker processes
Once the drivers are installed there is a /etc/init.d/iodrive
startup script. One of the things this script does is start up some worker
processes, which I believe are used to move data around the FusionIO drives to ensure their performance and longevity.
$ chkconfig --list iodrive
iodrive 0:off 1:on 2:on 3:on 4:on 5:on 6:off
$ ps ax | grep worker
5271 ? S< 1169:51 [fct0-worker]
5588 ? S< 1168:07 [fct1-worker]
5593 ? S< 359:01 [fct2-worker]
5598 ? R< 206:02 [fct3-worker]
5603 ? S< 203:15 [fct4-worker]
5608 ? S< 203:12 [fct5-worker]
20921 pts/2 S+ 0:00 grep worker
These processes will take up some CPU time. Frankly, because there are 48 CPUs in this server, using up one to run these workers is OK. But it was a little confusing at first to see all this activity: one worker process for each card.
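If you are curious where those workers are actually running, ps can show the processor column (psr is the CPU each thread last ran on):

```
# List the worker threads with the CPU they last ran on
ps -eo pid,psr,pcpu,comm | grep fct
```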
Configuration
Given that we are going to manage the FusionIO drives via LVM, we will need to configure LVM to allow it. See this knowledge base article.
$ grep fio /etc/lvm/lvm.conf
types = [ "fio", 16 ]
Then add each /dev/fio*
drive as a physical volume and then add them to a volume group.
$ pvs | grep fio
/dev/fioa fio lvm2 a- 300.31G 320.00M
/dev/fiob fio lvm2 a- 300.31G 100.31G
/dev/fioc fio lvm2 a- 300.31G 100.31G
/dev/fiod fio lvm2 a- 300.31G 300.31G
/dev/fioe fio lvm2 a- 300.31G 300.31G
/dev/fiof fio lvm2 a- 300.31G 300.31G
$ vgs fio
VG #PV #LV #SN Attr VSize VFree
fio 6 4 0 wz--n- 1.76T 1.08T
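For completeness, the LVM setup amounts to something like the following. This is a destructive sketch, not a transcript: the device and VG names are the ones from this post, but the LV name and size are made-up examples.

```
pvcreate /dev/fioa /dev/fiob /dev/fioc /dev/fiod /dev/fioe /dev/fiof
vgcreate fio /dev/fioa /dev/fiob /dev/fioc /dev/fiod /dev/fioe /dev/fiof
lvcreate -L 200G -n vault1 fio   # example size; this post's LV is fio-vault1
```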
fio-status
This is a useful way to check the status of the FusionIO drives.
$ fio-status
Found 6 ioDrives in this system with 3 ioDrive Duos
Fusion-io driver version: 1.2.7.5
Adapter: ioDrive Duo
IBM 640GB High IOPS MD Class PCIe Adapter, Product Number:68Y7381 SN:59518
PCIE Power limit threshold: 24.75W
Connected ioDimm modules:
fct0: IBM 640GB High IOPS MD Class PCIe Adapter, Product Number:68Y7381 SN:77479
fct1: IBM 640GB High IOPS MD Class PCIe Adapter, Product Number:68Y7381 SN:77478
fct0 Attached as 'fioa' (block device)
IBM 640GB High IOPS MD Class PCIe Adapter, Product Number:68Y7381 SN:77479
Alt PN:68Y7382
Located in 0 Upper slot of ioDrive Duo SN:59518
Firmware v43246
322.46 GBytes block device size, 396 GBytes physical device size
Internal temperature: avg 56.6 degC, max 59.6 degC
Media status: Healthy; Reserves: 100.00%, warn at 10%
fct1 Attached as 'fiob' (block device)
IBM 640GB High IOPS MD Class PCIe Adapter, Product Number:68Y7381 SN:77478
Alt PN:68Y7382
Located in 1 Lower slot of ioDrive Duo SN:59518
Firmware v43246
322.46 GBytes block device size, 396 GBytes physical device size
Internal temperature: avg 61.0 degC, max 63.0 degC
Media status: Healthy; Reserves: 100.00%, warn at 10%
Adapter: ioDrive Duo
IBM 640GB High IOPS MD Class PCIe Adapter, Product Number:68Y7381 SN:59507
PCIE Power limit threshold: 24.75W
Connected ioDimm modules:
fct2: IBM 640GB High IOPS MD Class PCIe Adapter, Product Number:68Y7381 SN:77143
fct3: IBM 640GB High IOPS MD Class PCIe Adapter, Product Number:68Y7381 SN:77144
fct2 Attached as 'fioc' (block device)
IBM 640GB High IOPS MD Class PCIe Adapter, Product Number:68Y7381 SN:77143
Alt PN:68Y7382
Located in 0 Upper slot of ioDrive Duo SN:59507
Firmware v43246
322.46 GBytes block device size, 396 GBytes physical device size
Internal temperature: avg 62.0 degC, max 65.5 degC
Media status: Healthy; Reserves: 100.00%, warn at 10%
fct3 Attached as 'fiod' (block device)
IBM 640GB High IOPS MD Class PCIe Adapter, Product Number:68Y7381 SN:77144
Alt PN:68Y7382
Located in 1 Lower slot of ioDrive Duo SN:59507
Firmware v43246
322.46 GBytes block device size, 396 GBytes physical device size
Internal temperature: avg 64.0 degC, max 66.4 degC
Media status: Healthy; Reserves: 100.00%, warn at 10%
Adapter: ioDrive Duo
IBM 640GB High IOPS MD Class PCIe Adapter, Product Number:68Y7381 SN:100366
PCIE Power limit threshold: 24.75W
Connected ioDimm modules:
fct4: IBM 640GB High IOPS MD Class PCIe Adapter, Product Number:68Y7381 SN:77344
fct5: IBM 640GB High IOPS MD Class PCIe Adapter, Product Number:68Y7381 SN:77345
fct4 Attached as 'fioe' (block device)
IBM 640GB High IOPS MD Class PCIe Adapter, Product Number:68Y7381 SN:77344
Alt PN:68Y7382
Located in 0 Upper slot of ioDrive Duo SN:100366
Firmware v43246
322.46 GBytes block device size, 396 GBytes physical device size
Internal temperature: avg 68.9 degC, max 71.9 degC
Media status: Healthy; Reserves: 100.00%, warn at 10%
fct5 Attached as 'fiof' (block device)
IBM 640GB High IOPS MD Class PCIe Adapter, Product Number:68Y7381 SN:77345
Alt PN:68Y7382
Located in 1 Lower slot of ioDrive Duo SN:100366
Firmware v43246
322.46 GBytes block device size, 396 GBytes physical device size
Internal temperature: avg 63.0 degC, max 66.0 degC
Media status: Healthy; Reserves: 100.00%, warn at 10%
XFS
Prior to finding out about the official knowledge base, I had decided to purchase a subscription from Red Hat for the XFS file system. Then, upon reading this kb article, I found that they heavily recommend XFS as the file system to run on top of a FusionIO drive:
XFS is currently the recommended filesystem. It can achieve up to 3x
the performance of a tuned ext2/ext3 solution. At this time, there is
no known additional tuning for running XFS in a single- or multi-ioDrive
configuration
so that is the file system we use.
$ mount | grep fio
/dev/mapper/fio-vault1 on /var/lib/vault1 type xfs (rw)
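Creating and mounting that file system is the usual routine; here is a sketch using the names from the mount output above:

```
mkfs.xfs /dev/fio/vault1
mkdir -p /var/lib/vault1
mount -t xfs /dev/fio/vault1 /var/lib/vault1
```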
Mounting drives after a reboot
I’ll admit I hadn’t thought of this during the initial installation. After a few days we moved the server to a new location, which required a power-down and restart.
While the server was restarting, and I was standing in the cold, loud server room because the new room didn’t have any networking for IPMI (which is not good), I noticed that it took a very long time to get past the udev portion of the boot, and in fact the FusionIO drives failed to mount from fstab. Of course there is a logical reason for that; read about it here.
Because we are using the 1.2 driver, I followed the straightforward instructions here.
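I won’t reproduce the kb instructions here, but the general shape of the problem is that fstab is processed before the ioDrives are attached. One common workaround (my summary, not a quote from the kb) is to keep the fstab entry from blocking boot and mount the drives later, once the driver is up:

```
# /etc/fstab – noauto so boot doesn't block waiting on the ioDrives;
# an init script ordered after iodrive then runs: mount /var/lib/vault1
/dev/mapper/fio-vault1  /var/lib/vault1  xfs  defaults,noauto  0 0
```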
Performance testing
Performance testing is hard. Maybe it’s just me. But testing superdisks like these FusionIO drives on a server with 48 CPUs and 64 GB of main memory is not easy. Again, I will admit I took a shot at benchmarking the FusionIO disks before reading the kb. I messed around with Bonnie++, io-whatever, but nothing quite came out right, partly because I didn’t put a lot of time into it, and partly because the server has so much memory that it is hard to beat the cache (I did try to reduce the memory the OS could see via kernel configuration, but didn’t have much luck with that).
Finally I read this kb article, which suggested using the fio utility (which I don’t believe is a utility put out by FusionIO; rather, it is just aptly named).
The fio tool is not in the RHEL repositories but it is in rpmforge/repoforge.
$ cd /var/tmp
$ wget http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.2-2.el5.rf.x86_64.rpm
$ rpm -Uvh rpmforge-release-0.5.2-2.el5.rf.x86_64.rpm
$ yum repolist | grep forge
rpmforge RHEL 5Server - RPMforge.net - enabled: 10,636
$ yum search fio | grep -i benchmark
fio.x86_64 : I/O benchmark and stress/hardware verification tool
Here are a couple of example runs. Please note that at this point I do not know much about fio. Benchmarking disk is a highly technical thing to do, and getting tests right would take a lot of research and consideration, which I have not done.
It seems that the fio
benchmark utility supports direct=1
which means non-buffered I/O, thereby skipping memory caching and going straight to the disk.
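Incidentally, the same job can also be expressed entirely on the fio command line, no job file needed (same parameters as the job file used in the runs that follow):

```
fio --name=randwrite --rw=randwrite --direct=1 --bs=1m --size=5G \
    --numjobs=4 --runtime=10 --group_reporting \
    --directory=/mnt/fio-test-xfs
```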
$ cat fio-randwrite.fio
[randwrite]
direct=1
rw=randwrite
bs=1m
size=5G
numjobs=4
runtime=10
group_reporting
directory=/mnt/fio-test-xfs
$ fio fio-randwrite.fio
randwrite: (g=0): rw=randwrite, bs=1M-1M/1M-1M, ioengine=sync, iodepth=1
...
randwrite: (g=0): rw=randwrite, bs=1M-1M/1M-1M, ioengine=sync, iodepth=1
fio 1.55
Starting 4 processes
randwrite: Laying out IO file(s) (1 file(s) / 5120MB)
randwrite: Laying out IO file(s) (1 file(s) / 5120MB)
randwrite: Laying out IO file(s) (1 file(s) / 5120MB)
randwrite: Laying out IO file(s) (1 file(s) / 5120MB)
Jobs: 4 (f=4): [wwww] [100.0% done] [0K/522.8M /s] [0 /510 iops] [eta 00m:00s]
randwrite: (groupid=0, jobs=4): err= 0: pid=28487
write: io=4556.0MB, bw=466161KB/s, iops=455 , runt= 10008msec
clat (msec): min=1 , max=1692 , avg= 9.83, stdev=22.04
lat (msec): min=1 , max=1692 , avg= 9.84, stdev=22.04
bw (KB/s) : min= 559, max=264126, per=24.79%, avg=115540.55, stdev=20377.90
cpu : usr=0.10%, sys=14.85%, ctx=59071, majf=0, minf=92
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=0/4556/0, short=0/0/0
lat (msec): 2=0.53%, 4=1.27%, 10=97.17%, 20=0.59%, 50=0.09%
lat (msec): 100=0.18%, 250=0.15%, 2000=0.02%
Run status group 0 (all jobs):
WRITE: io=4556.0MB, aggrb=466161KB/s, minb=477349KB/s, maxb=477349KB/s,
mint=10008msec, maxt=10008msec
Disk stats (read/write):
dm-11: ios=0/158802, merge=0/0, ticks=0/55956241, in_queue=55915327,
util=66.05%, aggrios=0/159667, aggrmerge=0/0, aggrticks=0/55932489,
aggrin_queue=55785218, aggrutil=65.96%
fioc: ios=0/159667, merge=0/0, ticks=0/55932489, in_queue=55785218,
util=65.96%
And then a similar test using RAID10 SAS disks formatted ext3.
$ cat fio-randwrite.fio
[randwrite]
direct=1
rw=randwrite
bs=1m
size=5G
numjobs=4
runtime=10
group_reporting
directory=/mnt/sas-test
$ fio fio-randwrite.fio
randwrite: (g=0): rw=randwrite, bs=1M-1M/1M-1M, ioengine=sync, iodepth=1
...
randwrite: (g=0): rw=randwrite, bs=1M-1M/1M-1M, ioengine=sync, iodepth=1
fio 1.55
Starting 4 processes
randwrite: Laying out IO file(s) (1 file(s) / 5120MB)
randwrite: Laying out IO file(s) (1 file(s) / 5120MB)
randwrite: Laying out IO file(s) (1 file(s) / 5120MB)
randwrite: Laying out IO file(s) (1 file(s) / 5120MB)
Jobs: 4 (f=4): [wwww] [1200.0% done] [0K/0K /s] [0 /0 iops] [eta
1158050441d:07h:00m:05sJobs: 4 (f=4): [wwww] [inf% done] [0K/0K /s]
[0 /0 iops] [eta 1158050441d:07h:00m:04s] Jobs: 4 (f=4): [wwww]
[1300.0% done] [0K/0K /s] [0 /0 iops] [eta 1158050441d:07h:00m:04sJobs:
4 (f=4): [wwww] [inf% done] [0K/0K /s] [0 /0 iops]
[eta 1158050441d:07h:00m:03s] Jobs: 1 (f=1): [___w] [66.1% done]
[0K/0K /s] [0 /0 iops] [eta 00m:19s]
randwrite: (groupid=0, jobs=4): err= 0: pid=28586
write: io=4096.0KB, bw=112369 B/s, iops=0 , runt= 37326msec
clat (usec): min=12140K, max=37183K, avg=32696578.04, stdev= 0.00
lat (usec): min=12140K, max=37183K, avg=32696579.88, stdev= 0.00
bw (KB/s) : min= 27, max= 83, per=31.61%, avg=34.46, stdev= 0.00
cpu : usr=0.00%, sys=51.90%, ctx=9598, majf=0, minf=102
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=0/4/0, short=0/0/0
lat (msec): >=2000=100.00%
Run status group 0 (all jobs):
WRITE: io=4096KB, aggrb=109KB/s, minb=112KB/s, maxb=112KB/s,
mint=37326msec, maxt=37326msec
Disk stats (read/write):
dm-12: ios=128/4721384, merge=0/0, ticks=5582/602531980, in_queue=602926524,
util=97.85%, aggrios=129/87424, aggrmerge=0/4634618, aggrticks=5631/10828734,
aggrin_queue=10826088, aggrutil=98.01%
sdb: ios=129/87424, merge=0/4634618, ticks=5631/10828734, in_queue=10826088,
util=98.01%
That’s a pretty big difference: io=4556.0MB
for the FusionIO drives versus io=4096.0KB
for the SAS RAID10. I’m going to have to look into this more! :)
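Putting a number to “pretty big”: since the two runs had different runtimes, the fairer comparison is the reported aggregate bandwidth (466161KB/s vs 109KB/s), and shell arithmetic gives the ratio directly:

```shell
# FusionIO aggrb vs SAS RAID10 aggrb, both in KB/s
echo $(( 466161 / 109 ))
# → 4276, i.e. roughly a 4276x difference on this (admittedly naive) test
```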
PS. I found this list of device bandwidths interesting.