A history of S_IFMT
In Unix, S_IFMT is a mask identifying the bits of an inode's mode that indicate the file's type, i.e. whether it is a directory, a symbolic link, a socket, and so on. It is conventionally 0170000, which corresponds to the top 4 bits of a 16-bit mode.
I saw someone asking the other day why 4 bits are used when POSIX only defines 7 types, and so could be stored just as well in 3 bits. The straightforward answer is that it allows room for expansion, and indeed many Unixes define several more. Solaris, for example, has an additional 3 types: doors, event ports, and ACL shadows (though the latter is not exposed in userspace).
But that's not the whole story. The question I'm going to answer in this post is not why 4 bits are used, but why they're used the way they are. If you have a look at the standard file types, their values seem pretty arbitrary, when you might expect a simple count upwards.
Symbol | Octal | Bits | Type |
---|---|---|---|
S_IFMT | 170000 | 1111 | |
S_IFIFO | 010000 | 0001 | Named pipe |
S_IFCHR | 020000 | 0010 | Character special |
S_IFDIR | 040000 | 0100 | Directory |
S_IFBLK | 060000 | 0110 | Block special |
S_IFREG | 100000 | 1000 | Regular file |
S_IFLNK | 120000 | 1010 | Symbolic link |
S_IFSOCK | 140000 | 1100 | Socket |
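In practice the mask is applied by ANDing it with a mode and comparing the result with one of the type constants. Here is a minimal C sketch, assuming a POSIX system whose sys/stat.h uses the conventional values above. type_letter is a hypothetical helper that returns the character ls -l prints for each type:

```c
#define _XOPEN_SOURCE 700 /* expose the S_IF* constants under strict ISO C modes (e.g. glibc) */
#include <assert.h>
#include <sys/stat.h>

/* The mask is conventionally the top 4 bits of a 16-bit mode. */
static_assert(S_IFMT == 0170000, "unexpected S_IFMT value");

/* Return the type letter that ls -l prints for a mode.
 * Only the S_IFMT bits are compared; permission bits are masked off. */
static char type_letter(mode_t mode)
{
    switch (mode & S_IFMT) {
    case S_IFIFO:  return 'p';
    case S_IFCHR:  return 'c';
    case S_IFDIR:  return 'd';
    case S_IFBLK:  return 'b';
    case S_IFREG:  return '-';
    case S_IFLNK:  return 'l';
    case S_IFSOCK: return 's';
    default:       return '?'; /* e.g. a vendor-specific type */
    }
}
```

POSIX also defines macros such as S_ISDIR(m) and S_ISREG(m) that perform the same mask-and-compare for a single type.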
I saw some patterns in there, but I couldn't work it out, so I had a look at some historical manuals and header files.
1st Edition UNIX
1st Edition UNIX (1971) had no type field as such. The top 4 bits of the mode had the following layout, where a dot (.) means that the bit's value doesn't matter.
Octal | Bit | Meaning |
---|---|---|
100000 | 1... | Inode is allocated |
040000 | .1.. | Directory |
020000 | ..1. | Has been modified |
010000 | ...1 | Large file storage |
We can see the origin of S_IFDIR here, but the other bits had completely different meanings. In fact, the 1st Edition had a very different layout for the mode in general. For one thing, groups had yet to be introduced. The bottom 6 bits were used, from highest to lowest, to mean: setuid, executable, owner-read, owner-write, other-read, and other-write. And so 1st Edition ls might write --xrwr- to mean something like -rwxr-xr-x today.
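To make the 1st Edition layout concrete, here is a small C sketch that renders a mode in the 7-character style of the --xrwr- example. The V1_ constant names and the v1_mode_string helper are hypothetical labels for the bits in the table above, and printing s for a set setuid bit is an assumption, since the text doesn't say what ls showed there:

```c
#include <assert.h>

/* 1st Edition flag bits, from the table above. The V1_* names are
 * hypothetical labels for this sketch, not 1st Edition symbols. */
#define V1_ALLOC 0100000 /* inode is allocated */
#define V1_DIR   0040000 /* directory */
#define V1_MOD   0020000 /* has been modified */
#define V1_LARGE 0010000 /* large file storage */

/* The four flags occupy exactly the bits that S_IFMT covers today. */
static_assert((V1_ALLOC | V1_DIR | V1_MOD | V1_LARGE) == 0170000,
              "flags should fill the top 4 bits");

/* Render a mode as: type, setuid, executable, owner read/write,
 * other read/write. Using 's' for setuid is an assumption. */
static void v1_mode_string(unsigned mode, char out[8])
{
    out[0] = (mode & V1_DIR) ? 'd' : '-';
    out[1] = (mode & 040) ? 's' : '-'; /* setuid */
    out[2] = (mode & 020) ? 'x' : '-'; /* executable */
    out[3] = (mode & 010) ? 'r' : '-'; /* owner read */
    out[4] = (mode & 004) ? 'w' : '-'; /* owner write */
    out[5] = (mode & 002) ? 'r' : '-'; /* other read */
    out[6] = (mode & 001) ? 'w' : '-'; /* other write */
    out[7] = '\0';
}
```

With V1_ALLOC | V1_MOD | 036 (allocated, modified, executable, owner read/write, other read) it produces --xrwr-.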
Bit 020000 was apparently always set to 1, and so was likely just ignored by the time of the 1st Edition. Bit 100000 was also always set to 1 for allocated inodes, but unlike bit 020000 it served a purpose: it allowed the file system to distinguish between an unallocated inode and a regular file with no permissions (-------).
4th Edition UNIX
The mode layout changed in 4th Edition UNIX (1973), coinciding with the addition of groups and a switch to the modern -rwxrwxrwx layout for the file permissions. This was the first Unix to have a mask for these inode types, though it was only 2 bits wide, taking the place of the directory bit and modification bit.
Symbol | Octal | Bits | Type |
---|---|---|---|
IFMT | 060000 | 0110 | |
| | 000000 | .00. | Regular file |
IFCHR | 020000 | .01. | Character special |
IFDIR | 040000 | .10. | Directory |
IFBLK | 060000 | .11. | Block special |
The allocation bit (IALLOC) and large file bit (ILARG) were still used as in the 1st Edition.
7th Edition UNIX
The next change happened in 7th Edition UNIX (1979), when the mask was widened to the present 4 bits by adding a single bit at each end, displacing IALLOC and ILARG. Yet each existing bit kept its absolute position in the mode, which is why the earliest types (character special, directory, block special) are 0010, 0100 and 0110 rather than counting up from 0001. In addition, regular files kept their highest bit set (as it would have been when IALLOC was in use), so as to distinguish between an unallocated inode (stored with a fully zeroed mode) and a regular file with no permissions (----------).
Also added were two short-lived types, multiplexed character and block special files, which had the same codes as their non-multiplexed counterparts but with the lowest bit of the type field set.
Symbol | Octal | Bits | Type |
---|---|---|---|
S_IFMT | 170000 | 1111 | |
S_IFCHR | 020000 | 0010 | Character special |
S_IFMPC | 030000 | 0011 | Multiplexed character special |
S_IFDIR | 040000 | 0100 | Directory |
S_IFBLK | 060000 | 0110 | Block special |
S_IFMPB | 070000 | 0111 | Multiplexed block special |
S_IFREG | 100000 | 1000 | Regular file |
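The claims about the 7th Edition layout can be checked mechanically. This sketch restates the 4th and 7th Edition values from the tables as C constants (the V4_ and V7_ prefixes are hypothetical, added only to keep the two sets apart) and uses C11 static_assert to confirm that the wider mask simply absorbed IALLOC and ILARG, that the older codes kept their absolute positions, and that the multiplexed types are their counterparts with the lowest type bit set:

```c
#include <assert.h>

/* 4th Edition values from the table above (originally IFMT, IFCHR,
 * IFDIR, IFBLK, IALLOC, ILARG; the V4_ prefix is added here). */
#define V4_IFMT   0060000
#define V4_IFCHR  0020000
#define V4_IFDIR  0040000
#define V4_IFBLK  0060000
#define V4_IALLOC 0100000
#define V4_ILARG  0010000

/* 7th Edition values from the table above (V7_ prefix added here). */
#define V7_IFMT   0170000
#define V7_IFCHR  0020000
#define V7_IFMPC  0030000
#define V7_IFDIR  0040000
#define V7_IFBLK  0060000
#define V7_IFMPB  0070000
#define V7_IFREG  0100000

/* Widening by one bit at each end absorbs IALLOC and ILARG. */
static_assert(V7_IFMT == (V4_IFMT | V4_IALLOC | V4_ILARG),
              "mask grew by one bit in each direction");

/* The older codes kept their absolute positions. */
static_assert(V7_IFCHR == V4_IFCHR && V7_IFDIR == V4_IFDIR &&
              V7_IFBLK == V4_IFBLK, "4th Edition codes unchanged");

/* Regular files keep the old IALLOC bit, so a regular file with no
 * permissions (0100000) never looks like a free inode (mode 0). */
static_assert(V7_IFREG == V4_IALLOC, "regular file reuses the IALLOC bit");

/* Multiplexed types are their counterparts with the lowest type bit set. */
static_assert(V7_IFMPC == (V7_IFCHR | 0010000) &&
              V7_IFMPB == (V7_IFBLK | 0010000),
              "multiplexed = counterpart | 010000");
```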
System III
System III (1982) added named pipes, taking the lowest nonzero code that the wider mask had made available, 010000.
Symbol | Octal | Bits | Type |
---|---|---|---|
S_IFIFO | 010000 | 0001 | Named pipe |
4.3BSD
4.3BSD (1986) added symbolic links and sockets, also counting upwards, but only within the top 3 bits of the field (mask 160000), presumably so as not to step on AT&T's toes by using the lowest bit.
Symbol | Octal | Bits | Type |
---|---|---|---|
S_IFLNK | 120000 | 1010 | Symbolic link |
S_IFSOCK | 140000 | 1100 | Socket |
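The BSD additions can be checked the same way. In this sketch the T_ constant names are hypothetical and the values come from the tables above. The assertions show that the two new codes continue upwards from the regular-file code in steps of 020000, staying inside the 160000 mask and leaving clear the lowest type bit, which the System III and 7th Edition codes (named pipe, multiplexed character and block) all have set:

```c
#include <assert.h>

/* Codes from the tables above; T_* names are hypothetical. */
#define T_IFIFO      0010000 /* System III */
#define T_IFMPC      0030000 /* 7th Edition */
#define T_IFMPB      0070000 /* 7th Edition */
#define T_IFREG      0100000
#define T_IFLNK      0120000 /* BSD */
#define T_IFSOCK     0140000 /* BSD */
#define TOP3_MASK    0160000 /* top 3 bits of the 4-bit type field */
#define LOW_TYPE_BIT 0010000 /* lowest bit of the type field */

/* The BSD codes count up from the regular-file code in steps of 020000. */
static_assert(T_IFLNK == T_IFREG + 020000 && T_IFSOCK == T_IFLNK + 020000,
              "BSD codes count upwards");

/* They fit entirely within the top 3 bits... */
static_assert((T_IFLNK & ~TOP3_MASK) == 0 && (T_IFSOCK & ~TOP3_MASK) == 0,
              "BSD codes stay inside 160000");

/* ...whereas the AT&T additions all use the lowest type bit. */
static_assert((T_IFIFO & LOW_TYPE_BIT) && (T_IFMPC & LOW_TYPE_BIT) &&
              (T_IFMPB & LOW_TYPE_BIT), "AT&T codes use the lowest bit");
```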
Enumerating S_IFMT
Something interesting (to me) about how this layout came about is that, if you twiddle the bits a little, you can end up with a reasonably chronological numbering of the types. Specifically, in code:
#include <sys/stat.h>
int fmt_index(mode_t mode) {
    unsigned fmt = (mode & S_IFMT) >> 12;  // drop file permissions, leaving IFMT
    if (fmt == 010) return 0;              // regular file: only the old IALLOC bit is set, so clear it
    return ((fmt >> 1) | (fmt << 2)) & 07; // fold rightmost bit onto leftmost bit
}
And this gives us:
# | Type |
---|---|
0 | Regular file |
1 | Character special |
2 | Directory |
3 | Block special |
4 | Named pipe |
5 | Symbolic link |
6 | Socket |
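As a quick check of the table, this sketch wraps the same folding in a function and confirms that each of the seven POSIX types lands on its own index regardless of permission bits. The fmt_index, by_index and table_matches names are hypothetical, and it assumes the conventional S_IF* values from sys/stat.h:

```c
#define _XOPEN_SOURCE 700 /* expose the S_IF* constants under strict ISO C modes (e.g. glibc) */
#include <assert.h>
#include <sys/stat.h>

/* The folding from the text, as a function. */
static int fmt_index(mode_t mode)
{
    unsigned fmt = (mode & S_IFMT) >> 12;  /* keep only the 4 type bits */
    if (fmt == 010) return 0;              /* regular file: clear the old IALLOC bit */
    return ((fmt >> 1) | (fmt << 2)) & 07; /* fold rightmost bit onto leftmost bit */
}

/* The seven POSIX types in the order of the table (index 0 to 6). */
static const mode_t by_index[7] = {
    S_IFREG, S_IFCHR, S_IFDIR, S_IFBLK, S_IFIFO, S_IFLNK, S_IFSOCK,
};

/* Return 1 if every type maps to its own table position. */
static int table_matches(void)
{
    for (int i = 0; i < 7; i++)
        if (fmt_index(by_index[i] | 0644) != i) /* permission bits must not matter */
            return 0;
    return 1;
}
```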
So anyway, those are the reasons for the unusual S_IFMT values.