
Jun 22 2013

ZFS: some day my prince will come…

Lost

Many of us believe that filesystem integrity is the single most important component of computer systems. Disk drives fail, computer systems are upgraded, networks improve and morph from wired to wireless – but the data files that represent the accumulated work and knowledge of users remain. It’s bad enough that, as applications become outdated, proprietary file formats become unreadable: who now can read an old WordStar doc? (Even NASA has learned, to its chagrin, that failure to have a policy regarding collected data can result in historic losses. For example, the original video transmission tapes from the Apollo 11 moon landing are gone, nowhere to be found; and telemetry from the first lunar orbiter satellites was stored in an uncertain file format on tapes that could only be played on machines no longer made.) We will probably never entirely overcome these kinds of issues. But it would be absolutely stupid to compound them by using filesystems inherently vulnerable to data loss.

Anyone who has used Macs or PCs for more than a minute has erased a file they wished they’d kept. Or had a disk crash and become unreadable. Or discovered that a long-unread but valued image file now contains junk.

Over the years, OS designers have come up with various methods to prevent this from becoming a disastrous problem for users. Macs have had a two-step file erasure process since the beginning: first you put files in the trash; then you empty the trash. Hopefully, sometime before the second step, you recognize that you threw something away you didn’t mean to, and have a chance to recover it.

More recently, Apple included the “Time Machine” service in the OS X operating system, which by default backs up your entire system to another disk. It’s brain-dead simple to use: everything gets copied unless you manually exclude it. For most users, this is probably a pretty good solution.

The true problem lies at a deeper level. Apple’s HFS+ filesystem is an old and out-of-date technology. It was designed for small-sized disks, not the terabyte-sized hard drives many of us now use. File corruption is not detectable, at least not in normal use; so several utilities have been designed to inspect and repair what can be repaired (Disk Utility, Disk Warrior, etc.). But in truth these only work on the metadata (directories, file blocks, etc.) and don’t repair damage to files themselves. This means you could have a corrupt file that gets copied by Time Machine into your backups, and you never know until it’s far too late.

How big a problem is this? Not much of one if we’re only talking about images or videos, which are often compressed with lossy techniques anyway; but it’s a disaster for spreadsheets, financial files, and so forth.

Recently, RAID systems have come into vogue, either to provide faster disk access (by treating a series of disks as one) or redundancy (by having a group of disks act as mirrors of each other). Many users believe that RAID mirrors act as effective backups, but in fact the only thing they insure against is the failure of an entire disk: they provide no protection whatsoever against corruption of a file. RAID mirrors are most definitely not backups: since all disks in the mirror are immediately written with new data, if something happens to corrupt a file while you’re writing it, all disks in the RAID mirror get the corrupted file instantly. Even if you regularly back up your RAID mirror, you have the same problem underlying most backups: by the time you discover that a file is corrupted, it has probably already been copied into your backup system, and the original uncorrupted version may have been deleted.

Here comes the Sun

To address this, many years ago the engineers at Sun wrote an entirely new filesystem, designed for enormous disks, with complete integrity checks throughout, the ability to generate “snapshots” of an entire filesystem instantly, and so forth. They called it the Zettabyte File System, or ZFS: a zettabyte is 1 billion terabytes. (For comparison, according to Wikipedia, “As of 2009, the entire World Wide Web was estimated to contain close to 500 exabytes. This is one half zettabyte.” [1] )

ZFS is an amazing thing. Every file – in fact every block of every file – is checksummed as it is written to disk (in other words, the data in each block is treated as if it contained numbers, and a calculation is performed to produce a single result that can be recomputed every time the block is read in the future and compared to the original to discover any change). Better yet, the metadata (directory information, ownership, etc.) is also checksummed. All these checksum results are stored within the filesystem: they are computed for every block as it is written, and verified for every block as it is read. The performance overhead to do so is slight given the speed of current computers, but the gain in verifying the integrity of a filesystem is enormous.
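To make the checksum idea concrete, here is a toy sketch in Python. It is not how ZFS is actually implemented: the 4096-byte block size, the choice of SHA-256, and the function names `write_blocks` and `verify_blocks` are all assumptions made up for illustration. It only shows the principle of storing one checksum per block at write time and recomputing it at read time.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, an assumption for this sketch


def write_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into blocks and record a checksum for each block."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    checksums = [hashlib.sha256(block).digest() for block in blocks]
    return blocks, checksums


def verify_blocks(blocks, checksums):
    """Return the indexes of blocks whose contents no longer match
    the checksum recorded when they were written."""
    return [
        index
        for index, (block, expected) in enumerate(zip(blocks, checksums))
        if hashlib.sha256(block).digest() != expected
    ]


if __name__ == "__main__":
    blocks, checksums = write_blocks(b"a" * 10000)
    # Simulate silent corruption: one byte changes in the second block.
    blocks[1] = b"b" + blocks[1][1:]
    print("Corrupted blocks:", verify_blocks(blocks, checksums))
```

The point of the sketch is that a single changed byte is enough to make the recomputed checksum disagree with the stored one, so the damage is noticed at read time instead of months later.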

Better yet, if you make a RAID mirror of several disks using ZFS, the system will repair errors as it finds them. Each time a block of data is read from one of the disks, if the checksum computation reveals an error, ZFS will automatically go to a mirror disk, find the same block without the error, and correct the first disk. Put another way, ZFS is self-healing. This all happens invisibly to the user.
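The self-healing read described above can be sketched the same way. Again, this is a toy model under stated assumptions, not ZFS code: the `ToyMirror` class, its in-memory "disks" (plain dictionaries), and the use of SHA-256 are all hypothetical stand-ins.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    """Checksum used by this sketch (SHA-256 is an illustrative choice)."""
    return hashlib.sha256(block).digest()


class ToyMirror:
    """A pretend mirror: each 'disk' is a dict of block_id -> bytes."""

    def __init__(self, copies: int = 2):
        self.disks = [dict() for _ in range(copies)]
        self.checksums = {}  # block_id -> checksum recorded at write time

    def write(self, block_id, data: bytes) -> None:
        """Record the checksum, then write the block to every disk."""
        self.checksums[block_id] = checksum(data)
        for disk in self.disks:
            disk[block_id] = data

    def read(self, block_id) -> bytes:
        """Return a verified copy of the block, repairing bad copies."""
        expected = self.checksums[block_id]
        good = None
        for disk in self.disks:
            if checksum(disk[block_id]) == expected:
                good = disk[block_id]
                break
        if good is None:
            raise IOError("no intact copy of block %r" % (block_id,))
        # Overwrite any copy that differs from the verified one.
        for disk in self.disks:
            if disk[block_id] != good:
                disk[block_id] = good
        return good


if __name__ == "__main__":
    mirror = ToyMirror()
    mirror.write(0, b"ledger")
    mirror.disks[0][0] = b"ledgar"  # simulate corruption on the first disk
    print(mirror.read(0))           # reads the intact copy
    print(mirror.disks[0][0])       # the first disk has been repaired
```

Note what makes this different from a plain RAID mirror: without the stored checksum, the system would have two disagreeing copies and no way to know which one is right.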

ZFS has many other remarkable features, such as snapshots, clones, and so forth. So it’s the future for Mac filesystems everywhere, right? You’d think. But you’d think wrong.

Apple suggested it would offer ZFS for its OS X 10.6 Snow Leopard server software in 2009 – and then, for reasons no one has fully understood, abruptly and silently canceled it. Perhaps as Apple reconsidered its entrée into the enterprise server market (the company killed its Xserve product line in 2010), ZFS seemed less important. Perhaps licensing issues with Oracle, which had bought Sun, became insurmountable. No one is sure. But it meant that OS X continued then, and continues now, to rely on a very hoary old filesystem technology for many of the features users are becoming dependent on: file versioning, Time Machine, ACLs, etc. While some of these new features provide functions that resemble those found in ZFS (like snapshots and versioning), they are far inferior since the integrity of the underlying filesystem is always in doubt.

MacZFS and ZEVO

Into this gap stepped two groups. MacZFS, an open source collaborative project, grabbed the source code that Apple released in 2009, as well as the previously available source code from Sun, and began an effort to port a version of ZFS for users. Then Apple’s filesystem engineer, Don Brady, left Apple to start a company to port ZFS under the product name ZEVO, which was ultimately bought by GreenBytes and released as a free plugin. Both MacZFS and ZEVO are available now, although MacZFS is using older code and ZEVO’s future is uncertain. In neither case, very sadly, is Apple providing any known support, and neither version of ZFS can be used to format a boot disk, which makes them effectively useless for nearly all single-drive-based Macs.

So where are we?

Advanced Mac users are doing what they can with MacZFS and ZEVO for secondary storage. Server administrators are running ZFS on Linux. But the average user, ignorant of the issues and the risks, continues to rely on vulnerable systems, sooner or later to be bitten badly when a crucial file can no longer be found or read accurately. As we accumulate terabytes of data, this problem will get worse and worse, and half-step solutions such as Time Machine will not solve it. Apple really needs to get behind ZFS. Developers should be demanding it, and users should get educated. “Good enough” isn’t really good enough.

____

Update, November 3, 2013

Sadly it appears that ZEVO is gone. GreenBytes has essentially orphaned it, apparently discontinuing any further support despite some tweets that suggested that a Mavericks port was forthcoming. So I have removed it from my systems, and instead installed MacZFS. We’ll see how useful this is as a replacement.

Oh, and I purchased a standalone NAS box called a “FreeNAS Mini”, from iXsystems, with 12 TB of drives, running FreeNAS (a FreeBSD variant with a nice web GUI to control the underlying ZFS file system). So far so good. Very nice external network storage. I may buy another for home.