Monday, August 14, 2006

Solaris Live Upgrade (on an SVM mirror set)

Many of you have probably heard of Sun's Live Upgrade feature by now. Live Upgrade essentially lets you upgrade your system from one Solaris release to another with minimal downtime. If done right, the only downtime you need to suffer is the time required to reboot your server.

Live Upgrade works like this:
  • Create a "boot environment" (BE) representing your current system
  • Create an "alternate boot environment" (ABE) which is a clone of your BE
  • Run a Solaris upgrade against the ABE
  • Switch the active "boot environment" to the ABE
  • Reboot
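In command form, that boils down to something like the sketch below. The BE names and device paths here are made up for illustration; check the lucreate(1M) man page and Sun's LU docs for the real details:

# lucreate -c sol10_fcs -n sol10_u2 \
    -m /:/dev/dsk/c0t1d0s0:ufs -m /var:/dev/dsk/c0t1d0s4:ufs
# luupgrade -u -n sol10_u2 -s /cdrom/cdrom0    (point -s at your install media)
# luactivate sol10_u2
# init 6    (use init or shutdown after luactivate, not reboot)
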
Seems simple enough, right? Well, on the surface it does seem simple. There's just one problem. You need an empty partition to use as the ABE! You know what really annoyed me though? Most of the LU examples and docs I've read seem to involve using some random extra scrap of a partition for the ABE. Well, as you can probably guess, the ABE becomes your system boot partition at the end of this process. Do you really want some random scrap partition to be your system partition for the long term? I certainly didn't.

This whole procedure is also easier if you separate your system partitions from your data ones. Yes, I know this is normally a good practice. However, I've grown to just use a huge "/" and smaller "/var" on most of my machines these days. It's just easier, and I still have "/home" on an external file server.

So what was I to do? The Solaris 10 6/06 DVD set was here, and I wanted to upgrade. (my server was still running the original Solaris 10 release) I needed something large to make my ABE on, but it also had to be somewhere I was comfortable using as my long-term boot drive. I also wanted to avoid involving anything beyond the server itself. Then it occurred to me... the "system disk" of my server was actually an SVM mirror set!

In short form, here was my plan of action:
  • Make a backup (thankfully this machine has a DDS3 drive installed in it)
  • Remove the second disk from the mirror and unconfigure its meta devices
  • Run live upgrade, using that second disk as the ABE
  • Switch the default BE to the one on the second disk
  • Boot off the second disk, into the new version of Solaris
  • Make sure the server is still working correctly
  • Unconfigure the mirror devices in SVM
  • Recreate the meta devices on the second disk, mirrors containing them, run metaroot, etc.
  • Reboot again
  • Add the first drive back into the mirrors
Seems simple enough, right? ;-) When all is said and done, the goal was to end up with the same drive configuration as before. The only differences would be that my mirror components would be reversed, and I'd be running a newer version of Solaris.
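
To make that concrete, here's a rough sketch of the SVM side. I'm assuming a root mirror d10 with submirrors d11 (on c0t0d0s0) and d12 (on c0t1d0s0); those names are illustrative, so check your own metastat output first:

# metadetach d10 d12              (pull the second submirror out of the mirror)
# metaclear d12                   (free up the slice so lucreate can use it)
...run lucreate/luupgrade/luactivate against c0t1d0s0, then init 6...
(now booted into the new BE, running off the bare slice on disk two)
# metaclear d10                   (tear down the old mirror...)
# metaclear d11                   (...and its remaining submirror)
# metainit -f d12 1 1 c0t1d0s0    (new metadevice over the current root slice)
# metainit d10 -m d12             (one-way mirror on top of it)
# metaroot d10                    (fix up /etc/vfstab and /etc/system)
# init 6
# metainit d11 1 1 c0t0d0s0       (rebuild a metadevice on the first disk)
# metattach d10 d11               (attach it and let the resync run)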

This is where I'd normally show a complete walkthrough of what I did, but a full post-mortem reconstruction would be rather tedious. Besides, if you're familiar with SVM and can read through Sun's LU docs, following my strategy should be straightforward and simple. (yes, it does work) Just remember to install the recommended patches before using LU, or it'll fail.

Also, I strongly recommend mounting the upgraded ABE before that first reboot. You should then check the "/var/sadm/system/data/upgrade_cleanup" file for any changes of interest the upgrade made. I failed to do this myself, and wound up having sendmail misconfigured for several hours. On the bright side, the upgrade does make backup copies of any configuration files it changes.
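
Checking it only takes a minute. Something like this, assuming the new BE is named sol10_u2 as in my earlier sketch:

# lumount sol10_u2 /mnt
# more /mnt/var/sadm/system/data/upgrade_cleanup
# luumount sol10_u2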

Good luck!

Fun with Solaris 10 6/06 and ZFS

The 6/06 release of Solaris 10 finally incorporated ZFS as part of the operating system. This is quite exciting, because now we can start using ZFS without having to run a Solaris Express or OpenSolaris distribution. As such, I was itching to try it out. I started by ordering the "Solaris Enterprise System" DVD stack from Sun. Sure, I could have downloaded it, but it's nicer to have a whole set of media already there for me.

Now I needed a test system... So I dug out my older Ultra 60 workstation, hooked up a DVD drive, and a few hours later I was good to go. Thus far, the only real change I noticed from the original Solaris 10 release was a newer, nicer-looking login screen.

Time to hook up a boatload of hard drives! I had an expansion box from my since-decommissioned CLARiiON FC RAID monster, loaded with 10x36GB 10krpm FC hard drives. All I needed to do was connect them and reformat them with a normal block size (the CLARiiON controller had formatted them with 520-byte sectors instead of the normal 512). Unfortunately, all I had to connect them to was a QLogic QLA2100 FC HBA. The QLA2100 isn't supported past Solaris 8, or so they'd lead you to believe. Thankfully, you just have to get the Solaris driver, unpack it from the package stream QLogic provides, modify the package so it doesn't complain about your Solaris version, and install it. As expected, it then worked just fine.
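
If you find yourself doing the same dance, it goes roughly like this. The file and package names below are made up, and the version test may live in a different install script in your copy of the driver:

$ pkgtrans qla2100drv.pkg /tmp/qladrv all        (unpack the datastream into directory form)
$ vi /tmp/qladrv/QLAdrv/install/checkinstall     (relax or remove the Solaris version check)
# pkgadd -d /tmp/qladrv QLAdrv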

To fix the block size on the drives, I got the "scu" utility from here, and then followed the instructions on this page. All pretty straightforward, but it did take about an hour per drive. It doesn't really do much I/O to the drive from your system, though, so doing all the drives at once does speed things up.

Finally, I went through "format" on each drive to fix the annoying "bad magic" messages. Now I had 10 drives off the end of an FC link, all set and good to go!
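
The fix itself is nothing fancy; you just write a fresh Sun label onto each disk from within the format utility:

# format    (pick the complaining disk from the menu)
format> label
Ready to label disk, continue? y
format> quit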

Setting up ZFS was really easy. If you haven't done so yet, I strongly recommend going here to review the documentation and screencasts. The specific commands are really easy to figure out, and that site walks you through them. Essentially, with ZFS, you make a pool out of mirrors, RAID-Z sets, or individual disks. You can then chop up the pool however you see fit.
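
As a taste, here's roughly what the two-by-five-drive layout from my last test below looks like. The pool and dataset names, and the controller/target numbers, are made up:

# zpool create tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0
# zpool add tank raidz c2t5d0 c2t6d0 c2t7d0 c2t8d0 c2t9d0
# zfs create tank/scratch
# zfs set mountpoint=/export/scratch tank/scratch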

In any case, I tried a few configurations and ran some benchmarks. Keep in mind that testing with "dd" and a large block size will ALWAYS yield better numbers than you'll ever see from a real benchmark program. (I think I got up to 80MB/s with "dd" at some point) Running multiple benchmark programs or "dd" sessions in parallel can also raise total throughput. FYI, I was connecting to all 10 drives over a single 100MB/s FC link. A sample "dd" invocation follows, and then on with the results!
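
Something like this, with the file name and sizes being illustrative (and keep the file comfortably larger than RAM, or caching will flatter your numbers):

$ dd if=/dev/zero of=bigfile bs=1024k count=2048    (sequential write, 1MB blocks, 2GB total)
$ dd if=bigfile of=/dev/null bs=1024k               (sequential read of the same file)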

One 10-drive RAID-Z set

$ bonnie++ -d . -s 2G
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
proxima          2G 14685  91 35308  48 23733  49 13301  92 52396  50 512.1  13
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  4798  99 +++++ +++  7455  99  5230  99 +++++ +++  7350  97
proxima,2G,14685,91,35308,48,23733,49,13301,92,52396,50,512.1,13,16,4798,99,+++++,+++,7455,99,5230,99,+++++,+++,7350,97


One 5-drive RAID-Z set

$ bonnie++ -d . -s 2G
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
proxima          2G 15241  94 32991  44 24989  45 13676  93 58862  52 550.2  10
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  4821  97 +++++ +++  7509  99  5190  98 +++++ +++  7849  99
proxima,2G,15241,94,32991,44,24989,45,13676,93,58862,52,550.2,10,16,4821,97,+++++,+++,7509,99,5190,98,+++++,+++,7849,99


Two 5-drive RAID-Z sets

$ bonnie++ -d . -s 2G
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
proxima          2G 15051  92 30531  41 26045  47 14018  93 57507  56 864.6  12
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  4908  99 +++++ +++  6800  95  4868  99 +++++ +++  5847  81
proxima,2G,15051,92,30531,41,26045,47,14018,93,57507,56,864.6,12,16,4908,99,+++++,+++,6800,95,4868,99,+++++,+++,5847,81