Future of OpenSolaris Boot Environment management
I was quite happy to see this recent post from Ethan Quach proposing an efficient method for sharing the variable parts of /var. It bears a striking resemblance to something that I suggested and clarified in the past.
Correction June 6, 2009: Links to mail archives at opensolaris.org seem not to be stable. The same messages are available at the following: My initial suggestion and clarification.
But why does this matter? When you are making significant changes to the system, such as during a periodic patch cycle or upgrade, it is generally desirable to...
- be able to do so without taking the system down for the duration of the process
- be able to abort the operation if you have a change of heart
- be able to fail back if you realize that newer isn't better
- Mail boxes: If the machine is a mail server (using sendmail et al.), there is a pretty good chance that users have their active mail boxes at /var/mail.
- In-flight mail messages: Most machines process some email. For example, if a cron job generates output, it is sent to the user via email. Many non-web mail clients invoke /bin/mail or /usr/lib/sendmail to cause mail to be sent. Each message spends somewhere between a few milliseconds and a few days in /var/spool/mqueue or /var/spool/clientmqueue.
- Print jobs: If the machine acts as a print server (even for a directly attached printer), each print job spends a bit of time in /var/spool/lp.
- Logs: When something goes wrong, it is often useful to look in log messages to figure out why it went wrong. Those are often found under /var/adm.
- Temporary files that may not be temporary: It is rather common for people to stick stuff in /var/tmp and expect to be able to find it sometime in the future.
- DHCP: If a machine is a DHCP server, it will store configuration and/or state information in /var/dhcp.
- ...
/var/mail                 OVERWRITE
/var/spool/mqueue         OVERWRITE
/var/spool/cron/crontabs  OVERWRITE
/var/dhcp                 OVERWRITE
/etc/passwd               OVERWRITE
/etc/shadow               OVERWRITE
/etc/opasswd              OVERWRITE
/etc/oshadow              OVERWRITE
/etc/group                OVERWRITE
/etc/pwhist               OVERWRITE
/etc/default/passwd       OVERWRITE
/etc/dfs                  OVERWRITE
/var/log/syslog           APPEND
/var/adm/messages         APPEND

Notice that the default configuration loses your in-flight print jobs because /var/spool/lp is not copied. Suppose you have a mail server with a few gigs of mail at /var/mail. Is it a good use of time or disk space to copy /var/mail between boot environments?
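To make the point about /var/spool/lp concrete, here is a minimal Python sketch that parses the entries listed above and checks whether a given path is covered by them. The names `parse_synclist` and `is_covered` are made up for illustration, and the parser assumes nothing beyond one whitespace-separated path and action per line.

```python
# Entries copied from the listing above; the parsing code is illustrative only.
SYNCLIST_TEXT = """\
/var/mail                 OVERWRITE
/var/spool/mqueue         OVERWRITE
/var/spool/cron/crontabs  OVERWRITE
/var/dhcp                 OVERWRITE
/etc/passwd               OVERWRITE
/etc/shadow               OVERWRITE
/etc/opasswd              OVERWRITE
/etc/oshadow              OVERWRITE
/etc/group                OVERWRITE
/etc/pwhist               OVERWRITE
/etc/default/passwd       OVERWRITE
/etc/dfs                  OVERWRITE
/var/log/syslog           APPEND
/var/adm/messages         APPEND
"""


def parse_synclist(text):
    """Return a list of (path, action) tuples, skipping blank lines."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        path, action = fields
        entries.append((path, action))
    return entries


def is_covered(path, entries):
    """True if path is a listed entry or lies beneath one."""
    for listed, _action in entries:
        if path == listed or path.startswith(listed.rstrip("/") + "/"):
            return True
    return False


entries = parse_synclist(SYNCLIST_TEXT)
for candidate in ("/var/mail", "/var/spool/mqueue", "/var/spool/lp", "/var/tmp"):
    status = "synchronized" if is_covered(candidate, entries) else "NOT synchronized"
    print(f"{candidate}: {status}")
```

Running it reports /var/spool/lp and /var/tmp as not synchronized, which is the gap described above.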
A much better solution seems to be to make those directories shared between the boot environments. The way to do this in Live Upgrade, and presumably in the future, is to remove them from (or not add them to) /etc/lu/synclist and allocate separate file systems. However, do you really want a file system for /, /var/mail, /var/spool/mqueue, /var/spool/clientmqueue, /var/spool/lp, /var/adm, /var/tmp, /var/dhcp, ...? What if someone told you that you had to monitor every file system on every machine for being out of space? How big would you make all of those file systems so that your monitoring didn't wake you up in the middle of the night?
In the future, it looks as though OpenSolaris will use ZFS to store each boot environment. Among the features of ZFS that make this desirable are snapshots, clones, and rethinking the boundary between disk slices (or volumes) and file systems. If the organization of /var is changed just a bit...
/var/adm -> share/adm
/var/dhcp -> share/dhcp
/var/mail -> share/mail
/var/spool -> share/spool
/var/tmp -> share/tmp
/var/share/adm
/var/share/dhcp
/var/share/mail
/var/share/spool
/var/share/tmp

Then you can get by with having two ZFS file systems: / and /var/share. The Snap Upgrade process would then likely do the following:
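For anyone who wants to see that layout on disk before committing to it, the following Python sketch builds it in a scratch directory rather than in the real /var. The helper name `build_var_layout` is hypothetical; the directory names come from the listing above. The links use relative targets, exactly as written there, so they keep working no matter where the boot environment is mounted.

```python
import os
import tempfile

# Directories that move under /var/share in the proposed layout.
SHARED = ["adm", "dhcp", "mail", "spool", "tmp"]


def build_var_layout(var_root):
    """Create var_root/share/<name> and a relative symlink var_root/<name>."""
    share = os.path.join(var_root, "share")
    for name in SHARED:
        os.makedirs(os.path.join(share, name), exist_ok=True)
        link = os.path.join(var_root, name)
        if not os.path.lexists(link):
            # Relative target, e.g. "share/mail", not an absolute path.
            os.symlink(os.path.join("share", name), link)


with tempfile.TemporaryDirectory() as scratch:
    var_root = os.path.join(scratch, "var")
    build_var_layout(var_root)
    targets = {name: os.readlink(os.path.join(var_root, name)) for name in SHARED}
    resolves = all(os.path.isdir(os.path.join(var_root, name)) for name in SHARED)

for name in SHARED:
    print(f"/var/{name} -> {targets[name]}")
```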
- Take a snapshot of /, clone it, then mount it somewhere usable in subsequent steps (e.g. /mnt/abe)
- Do whatever is needed on the alternate boot environment mounted at /mnt/abe.
- Unmount the alternate boot environment
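As a rough sketch of what those steps could look like with plain ZFS commands, the Python below only builds and prints the command lines; it does not run them. The dataset names (rpool/ROOT/be1, rpool/ROOT/be2), the snapshot name, and the function names are all assumptions made for illustration, and whatever Snap Upgrade actually does may well differ.

```python
import shlex


def prepare_alternate_be(current_ds, new_ds, snapshot_name, mountpoint):
    """Commands to snapshot the current root, clone it, and mount the clone."""
    snapshot = f"{current_ds}@{snapshot_name}"
    return [
        ["zfs", "snapshot", snapshot],
        # -o sets the clone's mountpoint at creation, so it mounts at /mnt/abe.
        ["zfs", "clone", "-o", f"mountpoint={mountpoint}", snapshot, new_ds],
    ]


def release_alternate_be(new_ds):
    """Command to unmount the alternate boot environment when done."""
    return [["zfs", "unmount", new_ds]]


# Hypothetical dataset names; substitute whatever your pool actually uses.
plan = prepare_alternate_be("rpool/ROOT/be1", "rpool/ROOT/be2", "pre-upgrade", "/mnt/abe")
plan += release_alternate_be("rpool/ROOT/be2")

for command in plan:
    print(shlex.join(command))
```

The patching or upgrade work itself would happen between the clone and the unmount, against the tree mounted at /mnt/abe.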
This means that activating a boot environment would look like:
- Bring the system into single-user mode
- Mount the alternate boot environment
- Synchronize those files that need to be synchronized
- Take a snapshot of /var/share
- Set the boot loader to boot from the new boot environment and offer a failback option to the old boot environment
- Reboot
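The synchronize step is the only one in that list that touches file contents, so it is worth a small illustration. The Python sketch below works on two scratch directories standing in for the old and new boot environments. It assumes that OVERWRITE means "replace the new copy with the old one" and that APPEND means "add the old copy's contents to the end of the new one"; those semantics, the helper names, and the sample file contents are assumptions for illustration, and only regular files are handled.

```python
import os
import shutil
import tempfile


def sync_file(old_root, new_root, path, action):
    """Carry one regular file from the old BE into the new BE."""
    src = os.path.join(old_root, path.lstrip("/"))
    dst = os.path.join(new_root, path.lstrip("/"))
    if not os.path.isfile(src):
        return False  # nothing to carry over; directories are out of scope
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    if action == "OVERWRITE":
        shutil.copy2(src, dst)
    elif action == "APPEND":
        # Naive: appends the whole source file, with no duplicate detection.
        with open(src, "rb") as fin, open(dst, "ab") as fout:
            shutil.copyfileobj(fin, fout)
    else:
        raise ValueError(f"unknown action: {action}")
    return True


def write(root, path, text):
    full = os.path.join(root, path.lstrip("/"))
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "w") as handle:
        handle.write(text)


def read(root, path):
    with open(os.path.join(root, path.lstrip("/"))) as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as scratch:
    old_be = os.path.join(scratch, "old")
    new_be = os.path.join(scratch, "new")
    write(old_be, "/etc/passwd", "root:x:0:0\nalice:x:100:10\n")
    write(new_be, "/etc/passwd", "root:x:0:0\n")
    write(old_be, "/var/log/syslog", "logged on the old BE\n")
    write(new_be, "/var/log/syslog", "logged during the upgrade\n")

    sync_file(old_be, new_be, "/etc/passwd", "OVERWRITE")
    sync_file(old_be, new_be, "/var/log/syslog", "APPEND")

    passwd_after = read(new_be, "/etc/passwd")
    syslog_after = read(new_be, "/var/log/syslog")

print(passwd_after, end="")
print(syslog_after, end="")
```

The fewer paths that need this treatment, the shorter the time spent in single-user mode, which is the argument for moving the bulky directories under /var/share instead.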
Now suppose this system is a bit more complicated and has 20 zones on it. Have you ever patched a system with 20 zones on it? Did you start on Friday and finish on Monday? How happy were the users with the "must install in single-user mode" requirement? This same technique should allow you to have two file systems per non-global zone: one for the zone root and one for /var/share in the zone. Supposing that the reboot processing takes 5 seconds per zone, you are looking at an extra minute or two to reboot rather than a weekend of downtime.
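The back-of-the-envelope arithmetic, assuming the zones are processed one after another (an assumption; parallel processing would take less time), works out as follows:

```python
def extra_reboot_seconds(zones, seconds_per_zone):
    """Added reboot time if each zone is processed serially."""
    return zones * seconds_per_zone


extra = extra_reboot_seconds(zones=20, seconds_per_zone=5)
print(f"{extra} seconds, or about {extra / 60:.1f} minutes")  # 100 seconds
```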
Without Live Upgrade or Snap Upgrade, what would backout look like? After you had the system down for patching for a couple of days, you could take it down again for a couple of days to back the patches out. Or you could go to tape. Neither is an attractive option. With Snap Upgrade you should be able to fail back with your normal reboot time plus a minute or two.
1 comment:
I love this idea.
I've got /var/cores split out to a separate file system (I blogged about it recently), which is simple because it doesn't need synchronizing during activation of the new BE.
I've also moved /var/crash into /var/cores, with a soft-link from /var/crash to /var/cores/crash to "support" existing utilities looking for the files in /var/crash.