Linux File System

 

            

 

What are filesystems?

 

A filesystem is the methods and data structures that an operating system uses to keep track of files on a disk or partition; that is, the way the files are organised on the disk. The word is also used to refer to a partition or disk that is used to store the files, or to the type of the filesystem. Before a partition or disk can be used as a filesystem, it needs to be initialised, and the bookkeeping data structures need to be written to the disk. This process is called making a filesystem.

 

The central concepts are superblock, inode, data block, directory block, and indirection block. The superblock contains information about the filesystem as a whole, such as its size (the exact information here depends on the filesystem). An inode contains all information about a file, except its name. The name is stored in the directory, together with the number of the inode. A directory entry consists of a filename and the number of the inode which represents the file. The inode contains the numbers of several data blocks, which are used to store the data in the file. There is space only for a few data block numbers in the inode, however, and if more are needed, more space for pointers to the data blocks is allocated dynamically. These dynamically allocated blocks are indirect blocks; the name indicates that in order to find the data block, one has to find its number in the indirect block first.
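The split between a name (kept in the directory) and an inode (which holds everything else) can be observed from the shell. The following is a minimal sketch, assuming GNU stat with its -c option and a writable temporary directory; the helper names inode_of and link_count are invented for this illustration. It gives one file a second name with ln and shows that both names lead to the same inode.

```shell
# Sketch: a file name is only a directory entry that points at an inode.
# Assumes GNU coreutils (stat -c); all paths are temporary.

# Print the inode number that a path refers to.
inode_of() {
    stat -c %i "$1"
}

# Print how many directory entries (hard links) refer to that inode.
link_count() {
    stat -c %h "$1"
}

demo_dir=$(mktemp -d)
echo "some data" > "$demo_dir/original"
ln "$demo_dir/original" "$demo_dir/alias"    # a second name for the same inode

echo "original: inode $(inode_of "$demo_dir/original")"
echo "alias:    inode $(inode_of "$demo_dir/alias")"
echo "links:    $(link_count "$demo_dir/original")"

rm -r "$demo_dir"
```

Both names should report the same inode number, and the link count should be 2, because the inode is referenced from two directory entries.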

 

 

 

Types

 

Linux supports several types of filesystems:

 

minix

The oldest, presumed to be the most reliable, but quite limited in features (some time stamps are missing, at most 30 character filenames) and restricted in capabilities (at most 64 MB per filesystem).

 

 

ext2

The most featureful of the native Linux filesystems, currently also the most popular one. It is designed to be easily upwards compatible, so that new versions of the filesystem code do not require re-making the existing filesystems.

 

ext

An older version of ext2 that wasn't upwards compatible. It is hardly ever used in new installations any more, and most people have converted to ext2.

 

In addition, support for several foreign filesystems exists, to make it easier to exchange files with other operating systems.

 

 

msdos

Compatibility with MS-DOS (and OS/2 and Windows NT) FAT filesystems.

 

vfat

This is an extension of the msdos FAT support that adds long file names and handles FAT32, which allows larger disks than the older FAT variants. Most MS Windows disks are vfat.

iso9660

The standard CD-ROM filesystem; the popular Rock Ridge extension to the CD-ROM standard that allows longer file names is supported automatically.

nfs

A networked filesystem that allows sharing a filesystem between many computers to allow easy access to the files from all of them.

smbfs

A network filesystem which allows sharing of a filesystem with an MS Windows computer. It is compatible with the Windows file sharing protocols.

 

The choice of filesystem to use depends on the situation. If compatibility or other reasons make one of the non-native filesystems necessary, then that one must be used. If one can choose freely, then it is probably wisest to use ext2, since it has all the features but does not suffer from lack of performance.

 

·        Creating a file system

 

 

Filesystems are created, i.e., initialised, with the mkfs command. There is actually a separate program for each filesystem type. mkfs is just a front end that runs the appropriate program depending on the desired filesystem type. The type is selected with the -t fstype option.

To create an ext2 filesystem on a floppy, one would give the following command:

$ mkfs -t ext2 -c /dev/fd0H1440

mke2fs 0.5a, 5-Apr-94 for EXT2 FS 0.5, 94/03/10

360 inodes, 1440 blocks

72 blocks (5.00%) reserved for the super user

First data block=1

Block size=1024 (log=0)

Fragment size=1024 (log=0)

1 block group

8192 blocks per group, 8192 fragments per group

360 inodes per group

Checking for bad blocks (read-only test): done

Writing inode tables: done

Writing superblocks and filesystem accounting information: done

$

 

 

 

 

FILE SYSTEM STRUCTURE

 

 

             

·         Overview of the File System Hierarchy

This section is loosely based on the Filesystem Hierarchy Standard (FHS) version 2.1, which attempts to set a standard for how the directory tree in a Linux system is organised.

The full directory tree is intended to be breakable into smaller parts, each capable of being on its own disk or partition, to accommodate disk size limits and to ease backup and other system administration tasks. The major parts are the root (/), /usr, /var, and /home filesystems (see Figure 1). Each part has a different purpose. The directory tree has been designed so that it works well in a network of Linux machines which may share some parts of the filesystems over a read-only device (e.g., a CD-ROM), or over the network with NFS.

 

Figure 1

 

The root filesystem is specific for each machine and contains the files that are necessary for booting the system up, and to bring it up to such a state that the other filesystems may be mounted. The contents of the root filesystem will therefore be sufficient for the single user state. It will also contain tools for fixing a broken system, and for recovering lost files from backups.

 

The root filesystem should generally be small, since it contains very critical files and a small, infrequently modified filesystem has a better chance of not getting corrupted. A corrupted root filesystem will generally mean that the system becomes unbootable except with special measures (e.g., from a floppy), so you don't want to risk it.

 

 

The /usr filesystem contains all commands, libraries, manual pages, and other unchanging files needed during normal operation. No files in /usr should be specific for any given machine, nor should they be modified during normal use. This allows the files to be shared over the network, which can be cost-effective since it saves disk space (there can easily be hundreds of megabytes, increasingly multiple gigabytes, in /usr). Having /usr network mounted can also make administration easier, since only the master /usr needs to be changed when updating an application, not each machine separately. Even if the filesystem is on a local disk, it could be mounted read-only, to lessen the chance of filesystem corruption during a crash.

 

The /var filesystem contains files that change, such as spool directories (for mail, news, printers, etc), log files, formatted manual pages, and temporary files. Traditionally everything in /var has been somewhere below /usr, but that made it impossible to mount /usr read-only.

 

The /home filesystem contains the users' home directories, i.e., all the real data on the system. Separating home directories to their own directory tree or filesystem makes backups easier; the other parts often do not have to be backed up, or at least not as often, as they seldom change. A big /home might have to be broken across several filesystems, which requires adding an extra naming level below /home, for example /home/students and /home/staff.

 

 

The /etc directory contains the configuration files specific to the machine. Many networking configuration files are in /etc as well.

 

The /dev directory contains the special device files for all the devices. The device files are named using special conventions. The device files are created during installation, and later with the /dev/MAKEDEV script. The /dev/MAKEDEV.local is a script written by the system administrator that creates local-only device files or links (i.e. those that are not part of the standard MAKEDEV, such as device files for some non-standard device driver).

 

The /proc directory contains an illusory filesystem. It does not exist on a disk. Instead, the kernel creates it in memory. It is used to provide information about the system.
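One example of the information /proc provides is /proc/filesystems, which lists the filesystem types the running kernel currently knows about. The sketch below reads that list; the function names fs_types and supports_fs are made up for the example, and the optional file argument exists only so the functions can be tried on a sample file.

```shell
# Sketch: query the kernel's list of known filesystem types.
# Each line of /proc/filesystems is "[nodev] <type>", so the type name
# is always the last field on the line.

# Print the type names, one per line.
fs_types() {
    awk 'NF { print $NF }' "${1:-/proc/filesystems}"
}

# Succeed if the given type (first argument) appears in the list.
supports_fs() {
    fs_types "${2:-/proc/filesystems}" | grep -qx -- "$1"
}

if [ -r /proc/filesystems ]; then
    fs_types | head -n 5
    if supports_fs proc; then
        echo "proc is available"
    fi
fi
```

A type that is built as a module may be missing from the list until the module is loaded, so an absent entry does not necessarily mean the filesystem is unsupported.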

 

 

The files in a file system can be classified along two independent distinctions:

 

Shareable vs. unshareable files

 Variable vs. static files

 

 

·          Shareable vs unshareable

Shareable files are those that can be accessed by various hosts;

unshareable files are not available to any other hosts.

 

·          Static vs variable

 

Variable files can change at any time without any intervention;

static files, such as read-only documentation and binaries, do not change without an action from the system administrator or an agent that the system administrator has placed in motion to accomplish that task.

 

The reason for looking at files in this way is to help you understand the type of permissions given to the directory that holds them. The way in which the operating system and its users need to use the files determines the directory where those files should be placed, whether the directory is mounted read-only or read-write, and the level of access allowed on each file. The top level of this organization is crucial, as the access to the underlying directories can be restricted or security problems may manifest themselves if the top level is left disorganized or without a widely-used structure.

 

 

 

 Mounting and unmounting

 

Before one can use a filesystem, it has to be mounted. Since all files in UNIX are in a single directory tree, the mount operation will make it look like the contents of the new filesystem are the contents of an existing subdirectory in some already mounted filesystem.

 

 

For example, Figure 2 shows three separate filesystems, each with its own root directory. When the last two filesystems are mounted below /home and /usr, respectively, on the first filesystem, we get a single directory tree, as in Figure 3.

 

 

Figure 2. Three separate filesystems.

 

Figure 3. /home and /usr have been mounted.

 

 

 

The mounts could be done as in the following example:

$ mount /dev/hda2 /home

$ mount /dev/hda3 /usr

$

 

To mount an MS-DOS floppy, you could use the following command:

$ mount -t msdos /dev/fd0 /floppy

$

 

When a filesystem no longer needs to be mounted, it can be unmounted with umount.

To unmount the directories of the previous example, one could use the commands:

$ umount /dev/hda2

$ umount /usr

$
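To check what is currently mounted where, the kernel's mount table can be read from /proc/mounts, where each line has the form "device mountpoint type options dump pass". The following is a small sketch; the function name mounted_on is invented, and the sample table (including the /dev/hda1 root entry) is a hypothetical stand-in that mirrors the mounts used in the example above.

```shell
# Sketch: look up which device and filesystem type are mounted on a directory.
# The second argument is a table in /proc/mounts format (default: /proc/mounts).
mounted_on() {
    awk -v dir="$1" '$2 == dir { print $1, $3 }' "${2:-/proc/mounts}"
}

# Hypothetical mount table matching the mount commands shown earlier.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/dev/hda1 / ext2 rw 0 0
/dev/hda2 /home ext2 rw 0 0
/dev/hda3 /usr ext2 ro 0 0
EOF

mounted_on /home "$sample"      # prints: /dev/hda2 ext2
mounted_on /floppy "$sample"    # prints nothing, since nothing is mounted there

rm "$sample"
```

On a live system, calling mounted_on with only a directory argument reads the real /proc/mounts. An empty result after umount indicates that the directory is no longer a mount point.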

 

Linux file permissions & ownership

 

In a multi-user system you need a way to protect each user from the others. One of the reasons is that a user could abuse the system for his own needs, or be able to read, modify or delete another user's work. Even if you're using your Linux box in single-user mode, you need to protect yourself from making deadly mistakes that can damage your system.

 

 

 

The Chmod Command

 What is Chmod?

chmod is used to change the access permissions of a named file, directory, device or program. Permissions can be set for three different classes of user: the owner, the group, and the world. Each of these classes can be given permission to read, write or execute the file, depending on your preference.


Permissions & Values


In Linux, every file and directory has three sets of access permissions: those applied to the owner of the file, those applied to the group the file belongs to, and those applied to all other users on the system.

You can see these permissions when you do an ls -l.
The output will look like:

$ ls -l
total 16
drwx------   2 ty   ty   4096 Jun  9 00:01 mail
-rw-------   1 ty   ty    557 Jul  4 12:22 mbox
drwx------   2 ty   ty   4096 Apr  5 20:55 nsmail
drwx---r-x   4 ty   ty   4096 Jun 11 21:34 public_html



What does all this mean?

The first column of the listing shows the permissions of the file.

drwx---r-x

The first character represents the type of file.
The 'd' means directory.

drwx---r-x

The next nine characters are the file permissions.
The first three characters represent the permissions held by the file's owner (ty), the second three are for the group the file belongs to, and the last three are the world permissions.



The following letters are used to represent those permissions:

Letter   Meaning
r        Read
w        Write
x        Execute


Each permission has a corresponding value:

Read    = 4
Write   = 2
Execute = 1


When you combine attributes, you add their values.

Permission   Value   Meaning
---          0       No permissions
r--          4       Read only
rw-          6       Read and Write
rwx          7       Read, Write and Execute
r-x          5       Read and Execute
--x          1       Execute
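The addition can be written out as a small shell sketch. The function names triplet_value and mode_digits are invented for this illustration; the input is the nine permission characters from an ls -l listing without the leading file type character.

```shell
# Sketch: turn permission letters into the numeric values used by chmod.

# Convert one triplet such as "r-x" into its value (r=4, w=2, x=1, added up).
triplet_value() {
    value=0
    case "$1" in r??) value=$((value + 4)) ;; esac
    case "$1" in ?w?) value=$((value + 2)) ;; esac
    case "$1" in ??x) value=$((value + 1)) ;; esac
    echo "$value"
}

# Convert nine characters such as "rwxr-xr-x" into three digits:
# one for the owner, one for the group, one for the world.
mode_digits() {
    perms=$1
    owner=$(printf '%s' "$perms" | cut -c1-3)
    group=$(printf '%s' "$perms" | cut -c4-6)
    world=$(printf '%s' "$perms" | cut -c7-9)
    echo "$(triplet_value "$owner")$(triplet_value "$group")$(triplet_value "$world")"
}

mode_digits rw-r--r--    # prints 644
mode_digits rwxr-xr-x    # prints 755
```

Each digit is computed independently, which is why the three digits of a mode can be read as owner, group and world without any further arithmetic.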


Other combinations exist, but these are the ones you will need most often. When you combine these values, you get three digits that make up the file's permissions. Here are some examples:

Permission   Value   Meaning
-rw-------   600     The owner has read and write permissions. Nobody else has privileges. This is what you'll want to set for the majority of your files.
-rw-r--r--   644     The owner has read and write permissions. The group and world have read-only permissions. Use this when you're sure you want to let others read this file.
-rw-rw-rw-   666     *THIS IS BAD* Everybody has read and write permissions. You don't want people to be allowed to change your files.
-rwx------   700     The owner has read, write and execute permissions. This is what you'd use for programs you'll want to run.
-rwxr-xr-x   755     The owner has read, write and execute permissions. The group and the rest of the world have read and execute.
-rwxrwxrwx   777     *THIS IS BAD* Everyone has read, write and execute permissions. Allowing people to edit your files is just asking for trouble.
-rwx--x--x   711     The owner has read, write and execute permissions. The group and the rest of the world have execute-only permissions. This is perfect for letting others run programs, but not copy them.
drwx------   700     This is a directory. Only the owner can read and write to it. (Note: a directory needs its execute bit set before anyone can enter it or reach the files inside it.)
drwxr-xr-x   755     This directory can be changed only by the owner, but everyone else can view its contents.
drwx--x--x   711     This is perfect when others need to reach files inside a directory but should not be able to list its contents. Only if they know the name of the file they're looking for will they be allowed to read it.



 Chmod Usage


 To change the permissions on a file, enter the following as the file's owner or as root:

# chmod permissions filename


Here permissions is a numeric value (the three digits described above) and filename is the name of the file you want to affect.

For example, to make the ty.html file readable and writeable by the owner, while allowing the group and world read access only, the command would be:

# chmod 644 ty.html


To recursively change the permissions on all the files in a specific directory, use the -R option in the command. For example, to set all the files in /home/ty/html to the permission 755, you would enter:

# chmod -R 755 /home/ty/html
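The effect of these commands can be tried safely in a scratch directory. The following is a minimal sketch, assuming GNU stat with its -c option; the helper name mode_of and the temporary paths are made up for the example and only mirror the ty.html and html names used above.

```shell
# Sketch: apply numeric modes with chmod and read them back.
# Assumes GNU coreutils (stat -c); works entirely in a temporary directory.

# Print the permissions of a path as digits, e.g. 644.
mode_of() {
    stat -c %a "$1"
}

work_dir=$(mktemp -d)
mkdir "$work_dir/html"
touch "$work_dir/html/ty.html"

chmod 644 "$work_dir/html/ty.html"
echo "after chmod 644:    $(mode_of "$work_dir/html/ty.html")"

chmod -R 755 "$work_dir/html"
echo "after chmod -R 755: $(mode_of "$work_dir/html") $(mode_of "$work_dir/html/ty.html")"

rm -r "$work_dir"
```

Note that the recursive form applies 755 to regular files as well as to directories, so plain data files such as ty.html end up executable too.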



 

The File Systems Table

 

fstab stands for File System TABle. It is where the system administrator can tell the OS about any filesystems the machine may have access to. It also allows default parameters to be provided for each filesystem.

A typical fstab looks something like the following:

#
# /etc/fstab
#
# <device>   <mountpoint>   <filesystemtype>  <options>       <dump>  <fsckorder>

/dev/hdb5    /              ext2              defaults        1       1
/dev/hdb2    /home          ext2              defaults        1       2
/dev/hdc     /mnt/cdrom     iso9660           noauto,ro,user  0       0
/dev/hda1    /mnt/dos/c     msdos             defaults        0       0
/dev/hdb1    /mnt/dos/d     msdos             defaults        0       0
/dev/fd0     /mnt/floppy    ext2              noauto,user     0       0
/dev/hdb4    none           ignore            defaults        0       0

none         /proc          proc              defaults
/dev/hdb3    none           swap              sw

Note that this system has two ext2 partitions on an IDE disk, one of which is used as / and the other as /home. It also has two DOS partitions, which are mounted under /mnt. Note the user option provided for the CD-ROM and the floppy drive. This is one of the many options you can specify for a filesystem. In this case it means that any user can mount a CD-ROM or a floppy disk.
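Because fstab is a plain whitespace-separated table, it is easy to query from a script. The sketch below lists the mount points whose option field contains user; the function name user_mountable is invented for the example, and the sample file is a shortened copy of the table above.

```shell
# Sketch: list the mount points that ordinary users are allowed to mount.
# The argument is a file in fstab format (default: /etc/fstab).
user_mountable() {
    awk '
        /^[[:space:]]*#/ || NF < 4 { next }    # skip comments and incomplete lines
        {
            n = split($4, opts, ",")           # the 4th field holds the options
            for (i = 1; i <= n; i++)
                if (opts[i] == "user") { print $2; break }
        }
    ' "${1:-/etc/fstab}"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
# <device>  <mountpoint>  <filesystemtype>  <options>  <dump>  <fsckorder>
/dev/hdb5   /             ext2     defaults        1  1
/dev/hdc    /mnt/cdrom    iso9660  noauto,ro,user  0  0
/dev/fd0    /mnt/floppy   ext2     noauto,user     0  0
EOF

user_mountable "$sample"

rm "$sample"
```

For the sample table this prints /mnt/cdrom and /mnt/floppy. The options are split on commas and compared exactly, so that an option which merely contains the word user is not mistaken for it.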

 

 

 

 

 

 

File System Types

 

 

Ext2

 

·         Motivation of the second extended file system

The Second Extended File System has been designed and implemented to fix some problems present in the first Extended File System. Ext2fs was designed to offer excellent performance, to be robust enough to reduce the risk of data loss under intensive use, and to include provision for extensions, so that users can benefit from new features without reformatting their filesystems.

“Standard” Ext2fs features

Ext2fs supports the standard Unix file types: regular files, directories, device special files and symbolic links. Ext2fs is able to manage filesystems created on really big partitions. While the original kernel code restricted the maximal filesystem size to 2 GB, recent work in the VFS layer has raised this limit to 4 TB. Thus, it is now possible to use big disks without the need to create many partitions.

Ext2fs provides long file names. It uses variable length directory entries. The maximal file name size is 255 characters. This limit could be extended to 1012 if needed.

Ext2fs reserves some blocks for the super user (root). Normally, 5% of the blocks are reserved. This allows the administrator to recover easily from situations where user processes fill up filesystems.
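The size of the reserve is simple integer arithmetic. As a small sketch (the function name reserved_blocks is made up for the example), the following reproduces the "72 blocks (5.00%) reserved for the super user" line that mke2fs printed for the 1440-block floppy earlier in this document.

```shell
# Sketch: number of blocks set aside for the super user.
# Arguments: total block count, and the reserved percentage (default 5).
reserved_blocks() {
    total=$1
    percent=${2:-5}
    echo $(( total * percent / 100 ))    # integer division rounds down
}

reserved_blocks 1440        # prints 72
reserved_blocks 1440 10     # prints 144
```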

 

“Advanced” Ext2fs features

Ext2fs allows the administrator to choose the logical block size when creating the filesystem. Block sizes can typically be 1024, 2048 and 4096 bytes. Using big block sizes can speed up I/O since fewer I/O requests, and thus fewer disk head seeks, need to be done to access a file.
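The effect of the block size on the number of blocks a file occupies can be estimated with a rounded-up division. This is only a sketch: the function name blocks_needed and the 10000-byte file size are chosen for illustration, and inode and indirect-block overhead are ignored.

```shell
# Sketch: how many data blocks a file of a given size occupies.
# Arguments: file size in bytes, block size in bytes.
blocks_needed() {
    size=$1
    block=$2
    echo $(( (size + block - 1) / block ))    # division rounded up
}

for block in 1024 2048 4096; do
    count=$(blocks_needed 10000 "$block")
    echo "block size $block: $count blocks, $(( count * block - 10000 )) bytes unused"
done
```

For a 10000-byte file this gives 10, 5 and 3 blocks respectively. Fewer blocks means fewer requests, but the last block is only partly filled, so larger block sizes can leave more space unused at the end of each file.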

 

 

Ext2fs implements fast symbolic links. A fast symbolic link does not use any data block on the filesystem. The target name is not stored in a data block but in the inode itself. This policy can save some disk space (no data block needs to be allocated) and speeds up link operations (there is no need to read a data block when accessing such a link).

 

Ext2fs keeps track of the filesystem state. A special field in the superblock is used by the kernel code to indicate the status of the file system. When a filesystem is mounted in read/write mode, its state is set to “Not Clean”. When it is unmounted or remounted in read-only mode, its state is reset to “Clean”. At boot time, the filesystem checker uses this information to decide if a filesystem must be checked. The kernel code also records errors in this field. When an inconsistency is detected by the kernel code, the filesystem is marked as “Erroneous”. The filesystem checker tests this to force the check of the filesystem regardless of its apparently clean state.

Always skipping filesystem checks may sometimes be dangerous, so Ext2fs forces checks at regular intervals.

Performance optimizations

The Ext2fs kernel code contains many performance optimizations, which tend to improve I/O speed when reading and writing files.

 Ext2fs takes advantage of the buffer cache management by performing readaheads: when a block has to be read, the kernel code requests the I/O on several contiguous blocks. This way, it tries to ensure that the next block to read will already be loaded into the buffer cache. Readaheads are normally performed during sequential reads on files and Ext2fs extends them to directory reads.

 

Ext2fs also contains many allocation optimizations. Block groups are used to cluster together related inodes and data: the kernel code always tries to allocate data blocks for a file in the same group as its inode. This is intended to reduce the disk head seeks made when the kernel reads an inode and its data blocks.

 

When writing data to a file, Ext2fs preallocates up to 8 adjacent blocks when allocating a new block. Preallocation hit rates are around 75% even on very full filesystems. This preallocation achieves good write performance under heavy load. It also allows contiguous blocks to be allocated to files, which speeds up later sequential reads.

 

What are the advantages of ext3?

Why would you want to migrate from ext2 to ext3? There are four main reasons: availability, data integrity, speed, and easy transition.

 

Availability

After an unclean system shutdown (unexpected power failure, system crash), each ext2 file system cannot be mounted until the e2fsck program has checked its consistency. The amount of time that the e2fsck program takes is determined primarily by the size of the file system, and for today's relatively large (many tens of gigabytes) file systems, this takes a long time. Also, the more files you have on the file system, the longer the consistency check takes. File systems that are several hundreds of gigabytes in size may take an hour or more to check. This severely limits availability.

By contrast, ext3 does not require a file system check, even after an unclean system shutdown, except for certain rare hardware failure cases (e.g. hard drive failures). This is because the data is written to disk in such a way that the file system is always consistent. The time to recover an ext3 file system after an unclean system shutdown does not depend on the size of the file system or the number of files; rather, it depends on the size of the "journal" used to maintain consistency. The default journal size takes about a second to recover (depending on the speed of the hardware).

Data Integrity
Using the ext3 file system can provide stronger guarantees about data integrity in case of an unclean system shutdown. You choose the type and level of protection that your data receives. You can choose to keep the file system consistent, but allow for damage to data on the file system in the case of unclean system shutdown; this can give a modest speed up under some but not all circumstances. Alternatively, you can choose to ensure that the data is consistent with the state of the file system; this means that you will never see garbage data in recently-written files after a crash. The safe choice, keeping the data consistent with the state of the file system, is the default.

Speed
Despite writing some data more than once, ext3 is often faster (higher throughput) than ext2 because ext3's journaling optimizes hard drive head motion. You can choose from three journaling modes to optimize speed, optionally choosing to trade off some data integrity.

 

 

·         Tuning Suggestions

 

Most Linux block device drivers use a generic tunable "elevator" algorithm for scheduling block I/O. The /sbin/elvtune program can be used to trade off between throughput and latency.

 

 

 

References:

 

http://www.linuxlookup.com

http://www.redhat.com/docs/manuals/linux/

http://www.redhat.com/docs/manuals/linux/RHL-8.0-Manual/getting-started-guide/

www.pathname.com/fhs/

www.124.ibm.com/developerworks/oss/jfs/

www.jan.joh.cam.ac.uk/~adm36/StegFS

www.perso.wanadoo.fr/matthieu.willm/ext2-os2

www.kalamazoolinux.org/presentations/19981015