Inspecting btrfs, Linux’s always half-finished filesystem
We do not recommend allowing btrfs to directly manage a complex array of disks – floppy or otherwise.
Btrfs – short for “B-Tree File System” and often pronounced “butter” or “butter eff ess” – is the most advanced filesystem available in the mainline Linux kernel. In some ways, btrfs simply seeks to supplant ext4, the default filesystem for most Linux distributions. But btrfs also aims to provide next-generation features that break the simple “filesystem” mold, combining the functionality of a RAID array manager, a volume manager, and more.
We have good news and bad news about this. First, btrfs is a perfectly cromulent single-disk ext4 replacement. However, if you want to replace ZFS – or a more complex stack built on discrete RAID management, volume management, and a simple filesystem – the picture isn’t quite as rosy. Although the btrfs project has fixed many of the glaring problems it launched with in 2009, other problems remain essentially unchanged 12 years later.
Chris Mason is the founding developer of btrfs, which he began working on in 2007 while employed at Oracle. This leads many people to believe that btrfs is an Oracle project – it is not. The project belonged to Mason, not to his employer, and it remains a community project unencumbered by corporate ownership to this day. In 2009, btrfs 1.0 was accepted into the mainline Linux kernel 2.6.29.
Although btrfs entered the mainline in 2009, it wasn’t really production-ready. For the next four years, creating a btrfs filesystem would confront any administrator who dared to mkfs a btrfs with the following deliberately scary message, and it required a non-default Y to proceed:
Btrfs is a new filesystem with extents, writable snapshotting, support for multiple devices and many more features. Btrfs is highly experimental, and THE DISK FORMAT IS NOT YET FINALIZED. You should say N here unless you are interested in testing Btrfs with non-critical data.
Linux users being Linux users, many chose to ignore this warning – and, unsurprisingly, a lot of data was lost as a result. This four-year-long wide beta may have had a lasting impact on the btrfs developer community, which in my experience tended to fall back on “well, it’s all beta anyway” whenever user-reported problems cropped up. This kept happening well after mkfs.btrfs lost its scare dialog in late 2013.
It has now been almost eight years since the “experimental” tag was removed, but many of btrfs’ age-old problems remain unaddressed and effectively unchanged. So we’ll say it once more: as a single-disk filesystem, btrfs has been stable and, for the most part, performant for years. But the deeper you get into the new features btrfs offers, the shakier the ground you walk on – and that’s what we’re focusing on today.
Btrfs has only one real competitor among Linux and BSD filesystems: OpenZFS. It is almost impossible to avoid comparing and contrasting btrfs with OpenZFS, since the Venn diagram of their respective feature sets is little more than a single, slightly lumpy circle. But we will try to avoid directly comparing and contrasting the two as much as possible. If you’re an OpenZFS administrator, you already know; and if you’re not an OpenZFS administrator, the comparisons aren’t really helpful.
In addition to being a simple single-disk filesystem, btrfs offers multiple-disk topologies (RAID), volume-managed storage (compare Linux’s Logical Volume Manager), atomic copy-on-write snapshots, asynchronous incremental replication, automatic healing of corrupt data, and on-disk compression.
Compared to legacy storage
If you want to build a btrfs- and ZFS-free system with similar functionality, you’ll need a stack of discrete layers – mdraid at the bottom for RAID, LVM next for snapshots, and then a filesystem such as ext4 or xfs on top of the stack.
Unfortunately, an mdraid + LVM + ext4 storage stack still misses some of btrfs’ theoretically most compelling features. LVM provides atomic snapshots, but no direct snapshot replication. Neither ext4 nor xfs offers inline compression. And while mdraid can offer data healing if you enable the dm-integrity target, it kind of sucks.
The dm-integrity target defaults to an extremely weak crc32 hash algorithm that is prone to collision; it requires target devices to be completely overwritten on initialization; and it also requires every block of a replaced disk to be fully rewritten after a failure – above and beyond the full drive write required during initialization.
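To put a rough number on why a 32-bit checksum is considered weak, here is a minimal sketch of the standard birthday-bound approximation. It says nothing about dm-integrity’s internals; it only assumes that checksum values are uniformly distributed, and the function name `collision_probability` is our own.

```python
import math

def collision_probability(n_blocks, hash_bits):
    """Approximate chance that at least two of n_blocks distinct blocks
    share a checksum, assuming uniformly distributed hash_bits-bit values
    (the usual birthday-bound approximation)."""
    space = 2 ** hash_bits
    exponent = n_blocks * (n_blocks - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-exponent)

# Chance that one specific corrupted block still matches its stored checksum.
per_block_miss_32 = 2.0 ** -32

p32 = collision_probability(77_163, 32)    # 32-bit checksum
p256 = collision_probability(77_163, 256)  # 256-bit hash, for contrast

print(f"32-bit, 77,163 blocks:  {p32:.3f}")
print(f"256-bit, 77,163 blocks: {p256:.3e}")
print(f"per-block miss chance at 32 bits: {per_block_miss_32:.3e}")
```

Under these assumptions, a 32-bit checksum reaches roughly even odds of at least one colliding pair after only tens of thousands of blocks, while a 256-bit hash stays negligibly close to zero at the same scale.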
In short, you cannot replicate all of the promised functionality of btrfs on a legacy storage stack. To get the full bundle you will need either btrfs or ZFS.
Btrfs multiple-disk topologies
Now that we’ve covered where things go wrong with a legacy storage stack, it is time to look at where btrfs itself falls down. To do that, let’s start with btrfs’ multiple-disk topologies.
Btrfs offers five different disk topologies: btrfs-raid0, btrfs-raid1, btrfs-raid10, btrfs-raid5, and btrfs-raid6. Although the documentation tends to refer to these topologies more simply – e.g. only as raid1 and not as btrfs-raid1 – we strongly recommend keeping the prefix in mind. In some cases, these topologies can be very different from their traditional counterparts.
|Topology|Conventional version|Btrfs version|
|---|---|---|
|RAID0|Simple stripe – lose any disk, lose the array|Simple stripe – lose any disk, lose the array|
|RAID1|Simple mirror – all data blocks on disk A and disk B are identical|Guaranteed redundancy – copies of all blocks are saved on two separate devices|
|RAID10|Striped mirror sets – e.g., a stripe across three mirrored pairs of disks|Striped mirror sets – e.g., a stripe across three mirrored pairs of disks|
|RAID5|Diagonal parity RAID – single parity (one parity block per stripe), fixed stripe width|Diagonal parity RAID – single parity (one parity block per stripe), variable stripe width|
|RAID6|Diagonal parity RAID – double parity (two parity blocks per stripe), fixed stripe width|Diagonal parity RAID – double parity (two parity blocks per stripe), variable stripe width|
As you can see in the table above, btrfs-raid1 differs quite a bit from its conventional analogue. To understand how, let’s think about a hypothetical collection of “mutt” drives of mismatched sizes. If we have one 8T disk, three 4T disks, and one 2T disk, it is difficult to make a useful conventional RAID array out of them – for example, a RAID5 or RAID6 would have to treat them all as 2T disks (producing only 10T of raw storage before parity, out of 22T installed).
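The arithmetic for a fixed-width parity array over those five drives can be sketched as follows. This is a simplified model that only assumes every member is truncated to the size of the smallest disk; the helper name `conventional_raid_capacity` is ours, not part of any RAID tool.

```python
def conventional_raid_capacity(disks_tb, parity_disks):
    """Return (raw_tb, usable_tb) for a fixed-stripe-width parity array.

    Simplifying assumption: every member contributes only as much space
    as the smallest disk in the set.
    """
    smallest = min(disks_tb)
    raw = smallest * len(disks_tb)
    usable = smallest * (len(disks_tb) - parity_disks)
    return raw, usable

# The hypothetical "mutt" collection: one 8T, three 4T, one 2T.
disks = [8, 4, 4, 4, 2]

raid5_raw, raid5_usable = conventional_raid_capacity(disks, parity_disks=1)
raid6_raw, raid6_usable = conventional_raid_capacity(disks, parity_disks=2)

print(f"installed: {sum(disks)}T")
print(f"RAID5: {raid5_raw}T raw, {raid5_usable}T usable")
print(f"RAID6: {raid6_raw}T raw, {raid6_usable}T usable")
```

With these inputs, 22T of installed disk yields 10T raw, of which 8T is usable under single parity and 6T under double parity.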
However, btrfs-raid1 offers a very interesting premise. Since it does not marry disks together in fixed pairs, it can use the entire collection of disks without waste. Every time a block is written to btrfs-raid1, it is written identically to two separate disks – any two separate disks. Since there are no fixed pairings, btrfs-raid1 is free to simply fill all of the disks at roughly the same rate, proportional to their free capacity.
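A toy simulation makes the “any two separate disks” idea concrete. This is a sketch under stated assumptions, not btrfs’ actual allocator: we assume each chunk goes to the two devices with the most free space at that moment, and we use 1T chunks purely to keep the numbers readable. The function name `raid1_usable_tb` is hypothetical.

```python
def raid1_usable_tb(disks_tb, chunk_tb=1):
    """Simulate two-copy allocation over mismatched disks.

    Assumption: each chunk is mirrored onto the two devices that
    currently have the most free space. Returns (data stored, leftover
    free space per disk).
    """
    if len(disks_tb) < 2:
        raise ValueError("two-copy redundancy needs at least two devices")
    free = list(disks_tb)
    stored = 0
    while True:
        # Rank devices by remaining free space, largest first.
        order = sorted(range(len(free)), key=lambda i: free[i], reverse=True)
        first, second = order[0], order[1]
        if free[second] < chunk_tb:
            break  # no second device left to hold the mirror copy
        free[first] -= chunk_tb
        free[second] -= chunk_tb
        stored += chunk_tb
    return stored, free

stored, leftover = raid1_usable_tb([8, 4, 4, 4, 2])
print(f"data stored: {stored}T, leftover per disk: {leftover}")

# A lopsided set strands space on the big disk once the small ones fill.
lopsided_stored, lopsided_leftover = raid1_usable_tb([8, 2, 2])
print(f"data stored: {lopsided_stored}T, leftover per disk: {lopsided_leftover}")
```

In this model the 8T/4T/4T/4T/2T collection stores 11T of data with nothing left over, exactly half of the 22T installed. The second call shows the limit of the approach: when one disk is larger than all the others combined, the excess has no partner for its second copy.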
The btrfs-raid5 and btrfs-raid6 topologies are similar to btrfs-raid1 in that, unlike their traditional counterparts, they are able to handle mismatched drive sizes by dynamically changing the stripe width as smaller drives fill. However, neither btrfs-raid5 nor btrfs-raid6 should be used in production for reasons that we will discuss in a moment.
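The effect of a shrinking stripe width on capacity can be sketched with a simple phase-by-phase model. This is a capacity estimate only, not a description of the real allocator: we assume each stripe spans every device that still has free space, and that the minimum device counts are 2 for single parity and 3 for double parity. The name `variable_stripe_usable_tb` and both parameters are our own.

```python
def variable_stripe_usable_tb(disks_tb, parity, min_devices):
    """Estimate usable space when stripes narrow as small disks fill.

    Assumptions: each stripe spans all devices with free space left, and
    allocation stops once fewer than min_devices still have room.
    """
    if min_devices <= parity:
        raise ValueError("min_devices must exceed the parity count")
    free = list(disks_tb)
    usable = 0
    while True:
        active = [f for f in free if f > 0]
        if len(active) < min_devices:
            break
        step = min(active)            # write until the smallest active disk fills
        data_devices = len(active) - parity
        usable += step * data_devices
        free = [f - step if f > 0 else 0 for f in free]
    return usable

disks = [8, 4, 4, 4, 2]
single_parity = variable_stripe_usable_tb(disks, parity=1, min_devices=2)
double_parity = variable_stripe_usable_tb(disks, parity=2, min_devices=3)
print(f"single parity: {single_parity}T usable")
print(f"double parity: {double_parity}T usable")
```

For the 8T/4T/4T/4T/2T collection, this model gives 14T with single parity and 10T with double parity. The last 4T of the largest disk goes unused in both cases, because no other device remains to stripe with it.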
The btrfs-raid10 and btrfs-raid0 topologies are much closer to their traditional counterparts, and for most purposes these can be viewed as direct replacements with the same strengths and weaknesses.