UNIX and Linux File Systems

What is a file system? A file system basically organises data on a disk into files and directories and gives each file a name, which must be unique in each directory. It usually associates access permissions with each file too. More sophisticated file systems have facilities to recover data from system crashes.
You do not have to define file systems. It is possible to define and access disks as 'raw' partitions, and some database applications claim to prefer raw disks, but most users and applications need a file system to store data.
Linux has its own file systems, such as ext2 and ReiserFS, but it is unusual because it has a virtual file system layer. That layer allows it to support file systems developed for other operating systems, such as Windows FAT and NTFS, and UNIX JFS.

Simple File Systems


The basic file system for UNIX is the UNIX File System (UFS), sometimes called the Berkeley Fast File System. It uses inodes, which contain the metadata that describes a file. Every inode describes one file. UFS is generally considered to be outdated now, partly because it was optimised for traditional spinning disks.
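To make the inode idea concrete, here is a minimal Python sketch that reads some of the metadata a file system keeps for a file. It uses the standard os.stat call; the file names and the describe helper are made up for illustration. It also creates a hard link to show that two directory entries can point at the same inode.

```python
import os
import stat
import tempfile


def describe(path):
    """Return a few fields of the inode metadata for path (illustrative helper)."""
    info = os.stat(path)
    return {
        "inode": info.st_ino,                       # inode number
        "links": info.st_nlink,                     # directory entries pointing at it
        "size": info.st_size,                       # file size in bytes
        "permissions": stat.filemode(info.st_mode),
    }


with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "report.txt")  # hypothetical file name
    with open(original, "w") as handle:
        handle.write("hello")

    alias = os.path.join(workdir, "alias.txt")
    os.link(original, alias)  # a second name for the same inode

    first = describe(original)
    second = describe(alias)
    print(first)
    print("same inode:", first["inode"] == second["inode"])
```

The file name lives in the directory entry, not in the inode, which is why both names report the same inode number and a link count of 2.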


The Linux default file system is ext2. It can support partitions of up to 4 Terabytes in size and single files up to 2 Gigabytes. However, you may not achieve these full sizes, as some Linux kernels will only support 2TB partitions. A Linux filename can be up to 255 characters long.

As it is a simple file system, ext2 has no automated recovery from system crashes, but it has an e2fsck command which you run to check the validity of the file system after a crash. This program also runs automatically once every so many boot-ups to check the file system, even if the system was shut down correctly.

With the ext2 file system you can undelete files which were deleted by mistake. This is not the same as the Windows recycle bin, which simply moves deleted files to a different folder for a while. When a file is physically deleted the directory entry is removed, but the underlying data remains intact on the disk until it is overwritten. An ext2 undelete will check to see if the data still exists and will replace the directory entry if the data is intact. The ext2 file system also has a secure deletion option, which will delete the underlying data if that is a compliance requirement. It can also mark files so they cannot be updated, or can only have data appended to them.

Windows Systems

Most UNIX variants do not support Windows file systems, but Linux supports Windows FAT16 and FAT32 and OS/2 simple file space partitions. This would typically be used to share files between Linux and Windows. Some Linux implementations will also support Windows NTFS in read-only mode.

Journalled File Systems

If the power fails on a computer, or if it is shut down by simply switching the power off, then it is possible that data on the disk has been corrupted, as some of the data may have been written out from memory, and some not. With a simple file system, the disks must be checked for errors after a failure using a 'scandisk' or 'fsck' utility. A journalled file system avoids this problem by initially writing updates to a journal file on the disk, then writing out the data to the file areas as a background task. Once these writes are successful the updates are marked as complete and the journal entries are deleted. If the system crashes, then on reboot the journal area is scanned for pending updates and these are written out to the disk, without the requirement to scan the entire file system.
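The journalling sequence described above can be sketched as a toy model in Python. This is only an illustration of the idea, not how any real file system is implemented: the JournalledStore class, its in-memory dictionary and list, and the crash_before_apply flag are all invented for the example.

```python
class JournalledStore:
    """Toy model: 'blocks' and 'journal' stand in for areas on the disk."""

    def __init__(self):
        self.blocks = {}   # the file areas: block number -> data
        self.journal = []  # updates logged but not yet marked complete

    def write(self, block, data, crash_before_apply=False):
        # Step 1: log the intended update in the journal first.
        self.journal.append((block, data))
        if crash_before_apply:
            return  # simulated power loss: the file area is never updated
        # Step 2: write to the file area, then drop the completed entry.
        self.blocks[block] = data
        self.journal.remove((block, data))

    def recover(self):
        """Replay pending journal entries and return how many were replayed."""
        replayed = 0
        while self.journal:
            block, data = self.journal.pop(0)
            self.blocks[block] = data
            replayed += 1
        return replayed


store = JournalledStore()
store.write(1, "first update")
store.write(2, "second update", crash_before_apply=True)
print(store.blocks)     # block 2 is missing after the simulated crash
print(store.recover())  # only the journal is scanned, not every block
print(store.blocks)
```

The point of the design is that recovery time depends on the size of the journal, not on the size of the file system.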


JFS (The Journalled File System) is a 64-bit journaling filesystem created by IBM. Don't confuse it with HP JFS which is a variant of Veritas Software's VxFS.
The original JFS was very much tied to the AIX operating system but the second version, sometimes called JFS2, can be used by Linux.

One of the biggest causes of poor I/O performance is fragmentation, where data from a single file is scattered around a disk. JFS carves up the available disk space into Allocation Groups (AG) to help avoid this. JFS will try to keep all the disk blocks and disk inodes for a file in the same AG. When a file is opened, JFS will lock the AG that file resides in, so only that open file can allocate more space in the AG if it grows.

JFS will normally lock out a file that is being updated, but this can be overridden by a Concurrent Input / Output (CIO) option. This is typically used where a database manages concurrent updates to a file and serialises access at record level.

JFS just journals metadata, which means that metadata will remain consistent but user files may be corrupted after a crash or power loss. The journal itself can be up to 128MB. Inodes are allocated dynamically as required. Each inode is 512 Bytes and 32 Inodes are allocated on a 16KB Extent.
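The inode figures quoted above are consistent with each other, as a few lines of Python show. The constants come from the text; the inode_extents_needed helper is a hypothetical name used only to illustrate how dynamic allocation in whole extents would round up.

```python
INODE_SIZE_BYTES = 512   # from the text above
INODES_PER_EXTENT = 32   # from the text above

extent_bytes = INODE_SIZE_BYTES * INODES_PER_EXTENT
print(extent_bytes, "bytes =", extent_bytes // 1024, "KB")


def inode_extents_needed(file_count):
    """Whole inode extents needed for file_count files (illustrative only)."""
    return -(-file_count // INODES_PER_EXTENT)  # ceiling division


print(inode_extents_needed(1000))
```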

JFS under Linux is generally considered to be reliable and an efficient CPU user, and it delivers consistently good performance for various types of data and I/O access patterns.


IBM's General Parallel File System (GPFS) allows applications hosted on different nodes to access files simultaneously, provided these nodes have the GPFS file systems mounted. GPFS supports AIX, Linux and Windows. As well as operating system images, nodes can be Network Shared Disks, provided they have been created and maintained by the NSD component of GPFS. A collection of GPFS nodes is called a GPFS cluster and it is also possible to share data between GPFS clusters.

GPFS is a powerful file system with lots of commands and facilities for managing it. It has a built in snapshot capability for backups, as by its nature the filespaces can grow to multi-terabytes. It also has a built in backup facility called mmbackup, which interfaces between snapshots and TSM.

Data integrity is managed by embedding GPFS commands in the application rather than at file system level.


The VERITAS File System (VxFS) is the default file system for HP-UX systems, and can run on AIX, Solaris and SCO. It is considered to be much faster for building large file systems than some other file systems, because it uses extent based allocation and its extents can span blocks, so big files can occupy contiguous space.
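To illustrate why extent based allocation suits big files, the Python sketch below collapses a list of block numbers into (start, length) extents. This is a generic illustration of the extent idea, not VxFS's actual on-disk format, and the blocks_to_extents function is a hypothetical helper that assumes each block number appears only once.

```python
def blocks_to_extents(block_numbers):
    """Collapse unique block numbers into (start_block, length) extents."""
    extents = []
    for block in sorted(block_numbers):
        if extents and extents[-1][0] + extents[-1][1] == block:
            # The block directly follows the last extent, so extend it.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # A gap in the block numbers starts a new extent.
            extents.append((block, 1))
    return extents


# A contiguous 10,000 block file needs one extent record instead of 10,000 pointers.
print(blocks_to_extents(range(100, 10100)))
# A fragmented file needs one record per contiguous run.
print(blocks_to_extents([5, 6, 7, 20, 21]))
```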

The basic VxFS includes extent based allocation, fast filesystem recovery and large filesystems support.
The advanced product also includes online resizing of filesystems and snapshot mounts for online backup.

VxFS can run in parallel mode, which allows for multiple servers to simultaneously access the same file system. This version of VxFS is called the Veritas Cluster File System.

While VxFS allows much shorter recovery times in the event of system failure and handles big files much better, the downside is that it requires more memory than simple file systems.


Ext3 is essentially ext2 with journaling support. There are three types of journaling:

  • Journal: both metadata and file contents are written to the journal before they are written to disk. This is secure but has a performance overhead, as all the data is written out twice.
  • Writeback: just the metadata is journalled, so it is faster, but the data writes could be written out in the wrong order, which could corrupt the end of a file after a crash.
  • Ordered: just the metadata is journalled, but each data component must be written out successfully before the metadata is committed. This forces the data writes to be in order. Ordered is faster than Journal and more secure than Writeback, so it is usually the default journaling scheme.
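On Linux these schemes are selected with the ext3 mount option data=journal, data=ordered or data=writeback. As a small sketch, the Python function below picks the scheme out of a comma separated mount option string of the kind you see in /etc/fstab. The journal_mode name is hypothetical, and the fallback to "ordered" is an assumption based on the usual default described above.

```python
def journal_mode(mount_options, default="ordered"):
    """Return the data= journaling mode named in a mount option string."""
    for option in mount_options.split(","):
        key, _, value = option.partition("=")
        if key.strip() == "data":
            return value.strip()
    # No data= option given: assume the usual default scheme.
    return default


print(journal_mode("rw,noatime,data=journal"))
print(journal_mode("rw,relatime"))
```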

Ext3 is fully backwards compatible with ext2. Ext2 partitions can be converted to ext3 and vice versa without reformatting the partition, and an ext3 partition can be mounted by an older kernel with no ext3 support, where it is just seen as a normal ext2 partition. Ext3 supports online filespace resizing, but it does not have a defragmentation facility, nor can deleted files be recovered, as it zeroes out inodes when a file is deleted.


ReiserFS was introduced with version 2.4.1 of the Linux kernel and was the first journaling file system available for Linux. It was developed by Hans Reiser and uses a balanced tree structure instead of the traditional i-file block based structure. As well as introducing journaling, ReiserFS is designed to be much faster than ext2, and the later versions allow the file system to be expanded online, and also to shrink offline.

It is generally considered to be reliable, be space efficient and deliver good performance. It does not have a defragmentation facility, though this is thought to be superfluous as the files are stored in a balanced tree format.

ReiserFS is supplied as the default filesystem for many Linux distributions, including SuSE, Xandros, Yoper, Linspire, Kurumin Linux, FTOSX and Libranet Linux.


Btrfs is a copy on write (CoW) filesystem for Linux and was created to address the lack of pooling, snapshots, checksums, and integrated multi-device spanning in Linux file systems. The plan is that it will work equally well for petabyte systems and massive block devices all the way down to mobile phones. All reads are checksum-verified, which should ensure that backups are not corrupt. It is more scalable than ext4, supporting file and partition sizes up to sixteen times larger.
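The idea behind checksum-verified reads can be sketched in a few lines of Python. This is a conceptual illustration only: it uses zlib.crc32 from the standard library as a stand-in checksum, which is not necessarily the algorithm Btrfs uses, and the store_block and read_block names are invented for the example.

```python
import zlib


def store_block(data):
    """Keep a checksum alongside the data when the block is written."""
    return {"data": data, "checksum": zlib.crc32(data)}


def read_block(block):
    """Recompute the checksum on every read and refuse corrupt data."""
    if zlib.crc32(block["data"]) != block["checksum"]:
        raise OSError("checksum mismatch: block is corrupt")
    return block["data"]


block = store_block(b"payroll records")
print(read_block(block))

block["data"] = b"payroll recordz"  # simulate silent corruption on disk
try:
    read_block(block)
except OSError as error:
    print(error)
```

Because the corruption is caught at read time, a backup program reading through the file system gets an error instead of silently copying bad data.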

It is expected that Btrfs will eventually replace ext4 as the default file system for Linux. However, while it has been in development since 2007, is GPL-licensed and Linux is often shipped with Btrfs as an option, it is considered to be unstable at present and so not suitable for business use.

If you decide to use it in a test environment, then you might consider the following:

  • Use two (ideally three) equally sized disks, partition them identically, and add each partition to a btrfs raid1 profile volume.
  • Alternatively, dedicate one disk for holding backups, because not much benefit in throughput or iops is yet gained by using btrfs raid1.
  • Do not enable or use transparent filesystem compression with a mount option, in fstab, with "chattr +c", or with "btrfs filesystem defrag".
  • Do not use quotas/qgroups.
  • Keep regular backups and use a backup program that supports deduplication (eg: borgbackup).
  • Do not enable mount -o discard, autodefrag, or space_cache=v2.
  • Overprovision your SSD when partitioning so periodic trim won't be needed.
  • Periodically run btrfs defrag against source subvolumes.
  • Never run btrfs defrag against a child subvolume (eg: snapshots).
  • Ensure that the number of snapshots per volume/filesystem never exceeds 12; two or three times that might not cause ill effects, but keeping it under this number provides the greatest odds for avoiding morbid performance issues and out of space conditions. On the upside, many more btrfs snapshots can be taken before performance crashes when compared to LVM snapshots, where a single snapshot can introduce a performance crash.
  • Take care to not fill the volume beyond 90%. If this occurs it may become necessary to run periodic balances to consolidate free space into contiguous chunks. Also, performance will become less predictable.
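Two of the guidelines above, the 90% fill level and the limit of 12 snapshots, are easy to check from a script. The Python sketch below is one hedged way to do it: check_volume and its constants are hypothetical names, the snapshot count has to be supplied by you, and shutil.disk_usage reports general file system usage, which may only approximate how Btrfs itself accounts for free space.

```python
import shutil

FILL_LIMIT = 0.90    # from the guideline above
SNAPSHOT_LIMIT = 12  # from the guideline above


def check_volume(total_bytes, used_bytes, snapshot_count):
    """Return a list of warnings for a volume (illustrative helper)."""
    warnings = []
    if used_bytes / total_bytes > FILL_LIMIT:
        warnings.append("volume is more than 90% full")
    if snapshot_count > SNAPSHOT_LIMIT:
        warnings.append("more than 12 snapshots on this volume")
    return warnings


# Check the file system holding the current directory. The snapshot count
# is passed in by hand here, as counting snapshots is outside this sketch.
usage = shutil.disk_usage(".")
print(check_volume(usage.total, usage.used, snapshot_count=0))
```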

XFS

Silicon Graphics originally developed XFS for its own IRIX servers. It can handle file sizes of millions of Terabytes and was originally designed for high performance applications like computer graphics. It uses a B+ tree, a bit like ReiserFS, but has a lot of flexibility in the type of data it can store in each file.
