Linux File Systems: What You Need to Know

Earlier this year, we entered a new era of hard disk storage when manufacturers rolled out the first single hard disk drive with a capacity of 1TB.

One hard disk drive manufacturer is planning to deliver 4TB hard disk drives in 2011. While we all expect our personal libraries of music, videos, and documents to grow over the next few years, the entertainment industry expects a 1000% increase in total digital storage. According to analysis by Coughlin Associates, more than six exabytes of digital storage will be used for archiving, content conversion, and content preservation by 2012. In decimal terms, an exabyte is a billion gigabytes. Another way to visualise this is in terms of DVDs: one exabyte is equivalent to 250 million 4GB DVDs, so six exabytes would equal 1.5 billion DVDs.
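
The DVD comparison is simple decimal arithmetic. The short Python sketch below reproduces it. It assumes decimal (SI) units and uses the paragraph's rounded figure of 4GB per disc. A real single-layer DVD holds about 4.7GB, so the disc counts are approximate. The helper name dvds_needed is illustrative only.

```python
# Decimal (SI) units, as used in the paragraph above.
GB = 10**9   # bytes in a gigabyte
EB = 10**18  # bytes in an exabyte

DVD_BYTES = 4 * GB  # the article's rounded 4GB per disc


def dvds_needed(total_bytes: int, dvd_bytes: int = DVD_BYTES) -> int:
    """Return how many whole discs are needed to hold total_bytes."""
    return -(-total_bytes // dvd_bytes)  # integer ceiling division


print(EB // GB)             # gigabytes per exabyte -> 1000000000
print(dvds_needed(EB))      # -> 250000000
print(dvds_needed(6 * EB))  # -> 1500000000
```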

Transmitted computer data isn’t far behind. Networking giant Cisco Systems, Inc. has published a forecast of the amount of data that will flow through the Internet in the very near future. The report projects that total Internet traffic will nearly double every two years. It also projects that consumer IP traffic will surpass business traffic, reaching 18 exabytes per month by 2011. Global Internet video (excluding peer-to-peer usage) was estimated at approximately 120 petabytes per month in 2006.

As IP traffic grows, storage needs will grow too. We’ll move from the terabyte era to the exabyte era in a matter of years.

Your users or clients are dealing with an explosion of data growth. The challenge for IT professionals right now is managing all of this data while improving file access performance. Why do we say "improving"? Because maintaining the status quo is not good enough. Large data storage devices are storing millions or even billions of files. With data storage systems growing exponentially, the tools we use to access that data need to progress as well. Beyond simply storing large data sets, what file system options are available? And when an unforeseen data loss occurs on one of these behemoths, who can provide data recovery?

Over the years, Linux has become the operating system of choice for many IT professionals, and many different file systems are available in the Linux environment. With so many choices, selecting the right file system for users or clients can be challenging. Read on to explore some of the things to consider.

Linux File Systems

In the past, there wasn’t a lot of choice when it came to file systems. The operating system only offered one or two choices for a file system and the file system was usually so transparent that it was taken for granted.

Over the years, programmers have contributed to the development of new and existing file systems for Linux. As a result, Linux operating systems now offer a variety of choices for the organization and management of data files on hard disk drives. By design, file systems are interchangeable in Linux, and this is part of the operating system's portability. The interchangeability comes from the Virtual File System (VFS), a layer inside the Linux kernel (the fundamental core of the operating system). The VFS gives applications one common interface to files, regardless of which file system actually stores them. There is a great deal of discussion in Linux communities about the strengths and weaknesses of each file system type. The following table lists, in no particular order, the basic choices of Linux file systems and their commonly used aliases. At the end of this article there are links to information that describes these file systems.

Alias      Linux File System
EXT2       Second Extended File System
EXT3       Third Extended File System
XFS        File system for Silicon Graphics’ IRIX operating system
JFS        Journaling File System for IBM’s AIX operating system
JFS2       64-bit Journaling File System for IBM’s AIX operating system
ReiserFS   Journaling file system from Namesys
FAT        FAT12, FAT16 and FAT32, owned by Microsoft
UFS        Unix File System; similar to the Berkeley Fast File System (BSD FFS)

The following is a list of special Linux file systems that require additional configuration or that are owned by specific companies:

Alias      Linux File System
NTFS       New Technology File System, owned by Microsoft
VxFS       Veritas File System, owned by Veritas/Symantec
OCFS2      Oracle Cluster File System, owned by Oracle
GFS        Global File System, used for Linux cluster computing
GPFS       General Parallel File System, developed by IBM for clustered computing
NSS        Novell Storage Services, owned by Novell and ported to SUSE Linux
ZFS        Zettabyte File System, developed by Sun Microsystems for Solaris
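
All of these file systems plug into the kernel through the VFS. This means a running Linux system can report which ones are currently registered, whether they are built in or loaded as modules. The Python sketch below is only an illustration. It assumes a Linux host that exposes the standard /proc/filesystems file. In that file, each line holds a file system name preceded by a tab, and file systems that need no block device are also preceded by the word "nodev". The helper name parse_proc_filesystems is hypothetical.

```python
from pathlib import Path


def parse_proc_filesystems(text: str) -> dict[str, bool]:
    """Map each registered file system name to True if it is backed by a
    block device, or False if the kernel marks it "nodev" (virtual)."""
    registered = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each line is "<flag>\t<name>", where <flag> is "nodev" or empty.
        flag, _, name = line.partition("\t")
        registered[name.strip()] = flag.strip() != "nodev"
    return registered


if __name__ == "__main__":
    proc = Path("/proc/filesystems")  # present on Linux hosts with /proc mounted
    if proc.exists():
        for name, on_disk in sorted(parse_proc_filesystems(proc.read_text()).items()):
            print(f"{name:12} {'block device' if on_disk else 'nodev (virtual)'}")
```

A file system missing from the output may simply be a module that has not been loaded yet. Its absence therefore does not prove that the kernel lacks support for it.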

There are many Linux file systems to choose from for workstations and servers. Where does one start? Here are some things to consider.

  1. Determine your file system needs by reviewing your users’ or customers’ environment. Here are a few business IT requirements to consider:
    • File system recoverability
    • Security requirements
    • Database file support
    • File server
  2. Will the data be stored as part of a high-performance computing operation? Examples of high-performance computing servers would be weather modeling systems, molecular modeling databases, or human genome databases. These types of high-end systems require a lot of processing power and memory and also a database and file system that stores massive amounts of raw information.
  3. Finally, choose the file system for user workstations so that business productivity is maximised. Because of the portability of Linux, different file systems can be used to meet different business requirements. For example, a company’s video production unit may require vast amounts of storage space for editing, whereas the business administration side of the company is unlikely to need that level of performance from its file system.

File System Testing

The best way to answer the above questions is through research and testing. The goal should be to determine the performance and reliability of each file system under consideration. Use applications that test and benchmark the candidate file systems. Then begin using the system normally, logging timing and performance. One writer for the Linux Gazette has benchmarked the most popular file systems and published his findings.
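
The article does not name specific benchmarking utilities, so the following is only a rough Python sketch of the "log the timing" idea. It times a sequential write to a directory on the file system under test. It calls fsync before stopping the clock, so that the page cache does not hide the real disk speed. The function name and the mount points in the usage loop are hypothetical. A dedicated benchmark tool will give far more representative results.

```python
import os
import tempfile
import time


def sequential_write_mib_per_s(directory: str, total_mib: int = 64, block_kib: int = 1024) -> float:
    """Time a sequential write of total_mib MiB into a temporary file in
    directory and return the observed throughput in MiB/s."""
    block = os.urandom(block_kib * 1024)  # incompressible test data
    blocks = (total_mib * 1024) // block_kib
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push data to the disk, not just the page cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # leave the file system under test as we found it
    return (blocks * block_kib / 1024) / max(elapsed, 1e-9)


if __name__ == "__main__":
    # Hypothetical mount points; substitute the file systems being compared.
    for mount in ("/mnt/test_ext3", "/mnt/test_xfs"):
        if os.path.isdir(mount):
            print(f"{mount}: {sequential_write_mib_per_s(mount):.1f} MiB/s")
```

Repeating each run several times and logging the results alongside the file system type gives the kind of timing record the paragraph above recommends.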

Other recommended tests involve simulating high-volume file environments and then reproducing power failures. How long does it take for the volume to become ready, or 'mount'? How long does the file system check utility (fsck) take to work through the file system when there are errors? To test file data integrity, use an MD5 hash generator to record a signature for a group of files. Then perform the above tests and make sure the files remain the same. An MD5 hash generator applies the MD5 algorithm to create a practically unique signature, or "fingerprint", of a file or set of files. If a file's contents change, its fingerprint almost certainly changes too, which shows whether any file suffered internal data corruption.
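
The integrity check described above can be sketched with Python's standard hashlib module. The idea is to build a manifest of MD5 fingerprints before the power-failure test, rebuild it after the volume has been remounted and checked, and then compare the two. The function names are illustrative only, and storing the "before" manifest (for example with json.dump) is left to the reader. MD5 is adequate for spotting accidental corruption but not deliberate tampering. Where that matters, hashlib.sha256 is a drop-in replacement.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every regular file under root (by relative path) to its MD5 digest."""
    return {
        str(p.relative_to(root)): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Report files that changed, went missing, or appeared between two runs."""
    return {
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
        "missing": sorted(before.keys() - after.keys()),
        "added": sorted(after.keys() - before.keys()),
    }
```

After a simulated power failure, any entry under "changed" or "missing" points to data that the file system did not preserve.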

Testing the storage and performance of large files is important because nearly all Linux file systems fragment files to some degree as they are stored. Benchmarking large-file storage helps determine which file system best handles user or client needs.

The tests suggested above simulate extreme cases, and users or systems may never reach those limits. However, to make the best choice of Linux file system, each candidate must be tested to learn what it can and cannot handle.

The Leader in Linux File System Recoveries

Users or clients may not even realise they are using a version of Linux. For example, many digital video recorders (DVRs) use a variant of a Linux file system. A small network attached storage (NAS) device for a home or small office network may also run a version of Linux. Future mobile phones may run a Linux operating system simply because of its ease of design and flexibility. In short, software developers are using elements of different Linux file systems in new products.

The proliferation of Linux file systems is due to their open-source nature and the General Public License (GPL) terms under which they are released. No one person or company owns them, so their growth and improvement are effectively limitless.

Despite these improvements, there will always be unforeseen data loss events. In some, the hard disk drive malfunctions or crashes. In others, data corruption leaves the file system unmountable. This is where a professional data recovery service is needed.

Kroll Ontrack has been successfully recovering data from Unix and Linux file systems for many years and our unique approach sets us apart from other data recovery companies.

The Kroll Ontrack Edge

What makes Kroll Ontrack the choice for data disasters? Companies choose us because of our experience, our dedication to research and development, and the quality of our recoveries. We know that data recovery is a science: a discipline that requires trained experts. Using a company that claims to specialise in data recovery but relies on off-the-shelf recovery tools does not guarantee success. Software developers are also needed on staff to customise the recovery tools for your specific file system, because Unix/Linux file system variants are very common.

We research and study these file systems and design a suite of tools to recover the data. We take the customer’s side and do all we can to recover quality data, providing the best solution to data loss.


© 2007 KrollOntrack