Monday, July 27, 2009

How to attach a 3par volume to a CentOS 5.2 server

    This post walks through making a 3PAR SAN volume usable by a Linux host. Not much of it is 3PAR specific, so it should extend easily to most other types of arrays. The major prerequisites are:

  1. The host is already zoned to the array.
  2. The host object has been created on the array.
  3. A volume has been created on the array.
  4. The volume has been exported to this host.

    The equipment involved here is a Dell 1955 blade server with a dual-port QLogic 2342 fibre channel card, connected to a 3PAR S400 array via Brocade switches (full topology details at another time). The OS is CentOS 5.2 x86_64, which matters only because it uses kernel 2.6.18 and is sysfs based rather than proc based.


Step 1 - Use dmsetup to get a quick baseline

    Being clear on how things look before and after is key, so use dmsetup (device-mapper setup) to check out what the disk devices look like now.

root@server:~$ dmsetup ls
RootDisk-swap   (253, 1)
RootDisk-root   (253, 0)
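
    If you save that baseline to a file, spotting the new device later becomes a one-liner instead of an eyeball comparison. A minimal sketch, assuming you capture 'dmsetup ls' output before and after the rescan; the function name and the /tmp file names are made up for illustration.

```shell
# Print device-mapper names that appear in the "after" listing but not in
# the "before" listing. Both arguments are files of saved `dmsetup ls` output.
new_dm_devices() {
    before=$1
    after=$2
    # The first column of `dmsetup ls` is the device name.
    awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }
         !($1 in seen)       { print $1 }' "$before" "$after"
}

# Typical use, as root on the host (file names are hypothetical):
#   dmsetup ls > /tmp/dm.before
#   ... rescan the HBA ports ...
#   dmsetup ls > /tmp/dm.after
#   new_dm_devices /tmp/dm.before /tmp/dm.after
```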

Step 2 - Create your base multipath.conf

    This is probably the only part that is 3par specific. If you are using another type of array, change the contents of the 'devices' block as appropriate; the right settings for most arrays are easy to find online. For my setup, I do the following:

root@server:~$ cat > /etc/multipath.conf
defaults {
  user_friendly_names yes
}
blacklist {
  devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
  devnode "^hd[a-z]"
  devnode "^sda$"
  devnode "^sda[0-9]"
}
devices {
  device {
          vendor "3PARdata"
          product "VV"
          path_grouping_policy multibus
          path_checker tur
          no_path_retry 60
  }
}
^D
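
    A typed-in config is easy to get subtly wrong, and a missing closing brace is the usual culprit. Here is a rough pre-flight check that only counts braces. It is not a real parser and knows nothing about multipath keywords, and check_braces is a name I made up.

```shell
# Report whether the { and } characters in a multipath.conf-style file balance.
# Prints "balanced" or "unbalanced" and returns non-zero when they do not.
# This only counts characters; it does not validate keywords or values.
check_braces() {
    awk '
        { opens += gsub(/[{]/, "{"); closes += gsub(/[}]/, "}") }
        END {
            if (opens == closes) { print "balanced" }
            else { print "unbalanced"; exit 1 }
        }
    ' "$1"
}

# Typical use:
#   check_braces /etc/multipath.conf
```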

Step 3 - Rescan your hba ports & verify the volume is seen.

    This host has a single 2 port hba, which appears in sysfs as 'host1' and 'host2' ('host0' is the on-board raid controller). Trigger a re-scan of the fc ports so that they will see the new device.

root@server:~$ echo "- - -" > /sys/class/scsi_host/host1/scan
root@server:~$ echo "- - -" > /sys/class/scsi_host/host2/scan

    Use dmsetup to see if this worked, and in this case it did.

root@server:~$ dmsetup ls
RootDisk-swap   (253, 1)
RootDisk-root   (253, 0)
mpath0  (253, 2)                           <--------- yep

    That worked here, which is great and all, but a lot of the time it doesn't. If it didn't work, reinitialize each fibre port, scan again, and then check dmsetup (and hope). To reinitialize a port, use issue_lip:

root@server:~$ echo 1 > /sys/class/fc_host/host1/issue_lip
root@server:~$ echo 1 > /sys/class/fc_host/host2/issue_lip
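
    On a host with more ports, typing the host numbers by hand gets old. The rescan (and optional reinitialize) can be sketched as a loop over whatever shows up under fc_host. The function name and the SYSFS_ROOT override are my own inventions; the override exists only so the loop can be tried against a fake directory tree. After a LIP you may need to wait a few seconds and scan again, which this sketch does not do for you.

```shell
# Rescan every SCSI host that is also a fibre channel host.
# Pass --lip to reinitialize each port (issue_lip) before scanning it.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
rescan_fc_hosts() {
    root=${SYSFS_ROOT:-/sys}
    for fc in "$root"/class/fc_host/host*; do
        [ -d "$fc" ] || continue                  # no fc hosts at all
        host=${fc##*/}                            # e.g. "host1"
        [ -d "$root/class/scsi_host/$host" ] || continue
        if [ "${1:-}" = "--lip" ]; then
            echo 1 > "$fc/issue_lip"
        fi
        echo "- - -" > "$root/class/scsi_host/$host/scan"
        echo "rescanned $host"
    done
}

# Typical use, as root:
#   rescan_fc_hosts          # scan only
#   rescan_fc_hosts --lip    # reinitialize each port, then scan
```

    Because it walks fc_host rather than scsi_host, the on-board raid controller ('host0' on this box) is left alone.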

For a ton of detail on fibre channel, including lips, loops, zones, and other tantalizing topics, read the very outdated-but-still-good Interop Library Fibre Channel Tutorial.

Step 4 - Start, Manage, and Verify multipathd

    Now that we've verified the host sees the volume, start multipathd, chkconfig it on, and if the output is similar to what follows you are in good shape.

root@server:~$ /etc/init.d/multipathd start
Starting multipathd daemon:                                [  OK  ]
root@server:~$ chkconfig multipathd on
root@server:~$ multipath -l
mpath0 (350002ac16bf80456) dm-2 3PARdata,VV
[size=75G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:1:124 sdb 8:16  [active][undef]
\_ 2:0:1:124 sdc 8:32  [active][undef]

mpath0 is a bad name

    Grab the wwid from the multipath output (the number in parentheses) and add a block to multipath.conf giving the volume a friendly name. If you skip this, things will work just fine until it's the middle of the night a year later, you have 3 san attached volumes that showed up in a different order after a reboot, and you need to figure out what's going on. Of course your documentation is perfect and would have all this stuff written down, but DO IT ANYWAY... The stanza you need will look something like this, and should come right after the devices block.

multipaths {
  multipath {
          wwid    350002ac16bf80456
          alias   mt-prod-rodb-p2
  }
}
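
    Copying a wwid by hand is a fine way to introduce a typo. A small sketch that pulls the name and wwid out of 'multipath -l' output and prints a stanza like the one above; both function names are made up, and the parsing assumes the header line format shown in this post (name, wwid in parentheses, dm-N).

```shell
# Read `multipath -l` output on stdin and print "name wwid" for each map.
# Map header lines look like: mpath0 (350002ac16bf80456) dm-2 3PARdata,VV
list_wwids() {
    awk '$2 ~ /^\([0-9a-fA-F]+\)$/ && $3 ~ /^dm-[0-9]+$/ {
        gsub(/[()]/, "", $2)
        print $1, $2
    }'
}

# Print a multipaths stanza for a single wwid/alias pair.
# With several volumes, put one inner "multipath { }" block per volume
# inside a single "multipaths { }" block instead.
multipath_stanza() {
    printf 'multipaths {\n  multipath {\n          wwid    %s\n          alias   %s\n  }\n}\n' "$1" "$2"
}

# Typical use, as root:
#   multipath -l | list_wwids
#   multipath_stanza 350002ac16bf80456 mt-prod-rodb-p2
```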

multipathd has a shell?

    Oh yeah it does. Now that you've appropriately named your volume (and if you chose 'mpath0' and work with me, I will hunt you down), you need to tell multipathd to pick up the changes. Use the super secret interactive multipathd shell, which is only super secret because I didn't know about it for a long time.

root@server:~$ multipathd -k
multipathd> reconfigure
ok

    If you have the time, hit '?' for a list of options. There is a ton of stuff you can do here, such as administratively shutting down paths if you need to. To do this, use 'del path' (and 'add path' to restore), not 'fail path'. Perhaps more on this another time.

multipathd> list paths
hcil      dev dev_t pri dm_st   chk_st  next_check
1:0:1:124 sdb 8:16  1   [active][ready] XXXXXXXXXX 20/20
2:0:1:124 sdc 8:32  1   [active][ready] XXXXXXXXXX 20/20
multipathd> del path sdc
ok
multipathd> list paths
hcil      dev dev_t pri dm_st   chk_st  next_check
1:0:1:124 sdb 8:16  1   [active][ready] .......... 1/20
multipathd> add path sdc
ok
multipathd> ^D
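
    The path listing is also easy to check mechanically. A minimal sketch that reads a saved copy of the 'list paths' output and prints any path that is not [active][ready]. The function name is hypothetical, and the column layout is assumed to match the listing above (status in the fifth column).

```shell
# Read `list paths` output on stdin and print "dev status" for every path
# whose combined dm_st/chk_st column is anything other than [active][ready].
# Prints nothing when all paths are healthy.
unhealthy_paths() {
    awk '$1 ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/ && $5 != "[active][ready]" {
        print $2, $5
    }'
}

# Typical use (paths.txt is a hypothetical file holding the saved listing):
#   unhealthy_paths < paths.txt
```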

    Getting back on track, verify that both dmsetup and multipath see the device by its proper name and if they do, move along to the next part.

root@server:~$ multipath -l
mt-prod-rodb-p2 (350002ac16bf80456) dm-2 3PARdata,VV
[size=75G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:1:124 sdb 8:16  [active][undef]
\_ 2:0:1:124 sdc 8:32  [active][undef]

root@server:~$ dmsetup ls
RootDisk-swap   (253, 1)
RootDisk-root   (253, 0)
mt-prod-rodb-p2 (253, 2)

Step 5 - Partitioning and Filesystem nonsense

    Relax, the hard part is over. The next steps are to create a partition, get the OS to see the partition, drop a filesystem on it, and mount it. For the most part this is really standard stuff.

Partition the disk

    Create a partition on the disk using plain old fdisk. Nothing fancy here, just "n,p,1,enter,enter,w,enter"

root@server:~$ fdisk /dev/mapper/mt-prod-rodb-p2
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.


The number of cylinders for this disk is set to 9790.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Command action
e   extended
p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-9790, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-9790, default 9790):
Using default value 9790

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 22: Invalid argument.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.

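Ignore that warning. I do, and everything is just fine after kpartx. fdisk can't get the kernel to re-read a partition table on a device-mapper device, so kpartx creates the partition mapping instead.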

Use kpartx to update the device maps

    This is the only remaining step that you wouldn't normally do for a plain local disk. The command you'll use is 'kpartx -a /dev/mapper/volume'

root@server:~$ ls -l /dev/mapper/ |grep mt-prod
brw-rw----  1 root disk 253,  2 Jul 25 13:24 mt-prod-rodb-p2

root@server:~$ kpartx -l /dev/mapper/mt-prod-rodb-p2
mt-prod-rodb-p2p1 : 0 157276287 /dev/mapper/mt-prod-rodb-p2 63

root@server:~$ kpartx -a /dev/mapper/mt-prod-rodb-p2

root@server:~$ ls -l /dev/mapper/ |grep mt-prod
brw-rw----  1 root disk 253,  2 Jul 25 13:24 mt-prod-rodb-p2
brw-rw----  1 root disk 253,  3 Jul 25 13:26 mt-prod-rodb-p2p1

Create the filesystem & mount it

    Smooth sailing from this point forward. Filesystem:

root@server:~$ mkfs.ext3 /dev/mapper/mt-prod-rodb-p2p1
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
9830400 inodes, 19659535 blocks
982976 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
600 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
  32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
  4096000, 7962624, 11239424

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 22 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
root@server:~$
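
    A quick sanity check ties the numbers together: the 157276287 that 'kpartx -l' printed earlier is the partition length in 512-byte sectors, and mkfs reports 4096-byte blocks, so the two should agree. A small sketch of the arithmetic (the function name is made up):

```shell
# Convert a count of 512-byte sectors into whole 4 KiB blocks and whole MiB.
# Integer division drops any partial block, just like mkfs does.
sectors_to_sizes() {
    sectors=$1
    echo "blocks_4k=$((sectors / 8)) mib=$((sectors / 2048))"
}

sectors_to_sizes 157276287
# prints: blocks_4k=19659535 mib=76795
```

    19659535 matches the block count in the mkfs output, and 76795 MiB is just shy of the 75G that multipath reported for the volume.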

   Mounting:

root@server:~$ mkdir -p /path/to/mountpoint
root@server:~$ echo "/dev/mapper/mt-prod-rodb-p2p1 /path/to/mountpoint   ext3 defaults,noatime 0 0" >> /etc/fstab  
root@server:~$ mount /path/to/mountpoint
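
    One caution on the 'echo >> /etc/fstab' trick: run it twice and you get two entries. A minimal sketch of a safer append that first checks whether the mount point is already listed. The function name and argument order are my own, and the dump/pass fields are hard-coded to '0 0' as in the line above.

```shell
# Append an fstab line only if no uncommented entry already uses that mount point.
# Usage: add_fstab_entry FSTAB_FILE DEVICE MOUNTPOINT FSTYPE OPTIONS
add_fstab_entry() {
    fstab=$1 dev=$2 mnt=$3 fstype=$4 opts=$5
    # awk exits 0 when an uncommented line has MOUNTPOINT in column 2.
    if awk -v m="$mnt" '$1 !~ /^#/ && $2 == m { found = 1 }
                        END { exit !found }' "$fstab"; then
        echo "entry for $mnt already present in $fstab"
        return 0
    fi
    printf '%s %s %s %s 0 0\n' "$dev" "$mnt" "$fstype" "$opts" >> "$fstab"
}

# Typical use, as root:
#   add_fstab_entry /etc/fstab /dev/mapper/mt-prod-rodb-p2p1 \
#       /path/to/mountpoint ext3 defaults,noatime
```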

   Done! For reference, check out the official "Using Device-Mapper Multipath" docs.

Sunday, July 26, 2009

and so it has come to this


    Yeap, first post, so welcome to my blog! (shoots self in face). The intent of this blog is to share technical info that might just help others avoid the troubles I've run into. Or to put it another way, to post what I wish I'd had in front of me while I stumbled through it the first time. The plan is for this to be more technical than not, focused on Virtualization, Networks, Storage, and Linux, but let's not set expectations too high - expect crap. Feedback is welcome and I hope that something useful will come out of this; only time will tell.