= Implementation and design of the file system layer =

Unlike in monolithic designs, the file system functionality is spread across several components in HelenOS:

 * [#FSDesignLibc standard library support],
 * [#FSDesignVFS VFS server],
 * [#FSDesignFS endpoint file system servers], and
 * [#FSDesignLibFS libfs library].

== Standard library support == #FSDesignLibc

The standard library translates the more or less POSIX file system requests made by the user application into the VFS server frontend protocol and passes them to ''VFS''. The library emulates some calls, such as ''opendir()'', ''readdir()'', ''rewinddir()'' and ''closedir()'', using other calls; in this case ''open()'', ''read()'', ''lseek()'' and ''close()''.

The VFS server accepts only absolute file paths, so the standard library takes care of providing the ''getcwd()'' and ''chdir()'' interfaces and translates all relative paths to absolute ones. Passing absolute paths may not always be optimal, but it simplifies the design of the VFS server and the libfs algorithms. In addition, thanks to this feature, the ''dot-dot'' file path components can be processed lexically, which leads to further simplifications.

The standard library forwards all other requests, which it is unable to handle itself, to the VFS server and does not contribute to the file system functionality by any other means. Each file system request forwarded to VFS is composed of one or more IPC phone calls.

== VFS server == #FSDesignVFS

The VFS server is the focal point and also the most complex element of the file system support in the HelenOS operating system. It exists as a standalone user task. We talk about the VFS frontend and the VFS backend.

=== VFS frontend ===

The frontend is responsible for accepting requests from the client tasks. For each client, VFS spawns a dedicated connection fibril which handles the connection. Arguments of the incoming requests are absolute file paths, file handles of already opened files, or, in some special cases, VFS triplets (see below). Regardless of their type, the arguments typically reference some file and, as we will see later, the frontend always converts this reference to an internal representation called a VFS node. Each request understood by the frontend has a symbolic name, which starts with the VFS_IN prefix.

==== Paths as Arguments ====

If the argument is a file path, VFS uses the ''vfs_lookup_internal()'' function to translate the path into the so called lookup result, represented by the ''vfs_lookup_res_t'' type. The lookup result predominantly contains a VFS triplet, which is an ordered triplet composed of a global handle of the file system instance, a global device handle and a file index. A VFS triplet thus uniquely identifies a file on some file system instance. An example VFS triplet could look like this:

{{{
(2, 1, 10)
}}}

In the above example, the VFS triplet describes a file on a file system instance which was assigned number 2 by the VFS service, located on a device which was assigned number 1 by the DEVMAP service; the file itself has the file-system specific index 10. The last number is also known as the i-node number in other operating systems.

VFS keeps information about each referenced file in an abstraction called a VFS node, for which there is the ''vfs_node_t'' type. A VFS node thus represents some file which is referenced by VFS. VFS nodes are first class entities in the VFS server, because most operations need the corresponding VFS node to proceed.
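For illustration, the following simplified sketch shows how the types described in this section could fit together. It is only a model built from the description above; the field and type names are assumptions and differ in detail from the actual HelenOS sources:

{{{
/* Simplified model of the types described above; names are assumptions. */
typedef int fs_handle_t;           /* file system instance handle (VFS) */
typedef int dev_handle_t;          /* device handle (DEVMAP) */
typedef unsigned long fs_index_t;  /* file-system specific file index */

/* Uniquely identifies a file on some file system instance. */
typedef struct {
    fs_handle_t fs_handle;    /* e.g. 2 in the example above */
    dev_handle_t dev_handle;  /* e.g. 1 in the example above */
    fs_index_t index;         /* e.g. 10, akin to an i-node number */
} vfs_triplet_t;

/* Lookup result; predominantly contains a VFS triplet. */
typedef struct {
    vfs_triplet_t triplet;
    /* ... further information about the looked-up file ... */
} vfs_lookup_res_t;

/*
 * A VFS node represents a file currently referenced by VFS. Nodes are
 * organized in a hash table keyed by the VFS triplet.
 */
typedef struct {
    vfs_triplet_t triplet;  /* hash table search key */
    unsigned refcnt;        /* references; dropped by vfs_node_put() */
    size_t size;            /* current file size maintained by VFS */
    /* ... hash table link, synchronization, etc. ... */
} vfs_node_t;
}}}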
The VFS server calls the ''vfs_node_get()'' function in order to get a VFS node for the corresponding lookup result. This function creates a new VFS node or adds a reference to an existing one. VFS nodes are organized in a hash table with the VFS triplet as a search key. The following example illustrates how the VFS server obtains the VFS node in the implementation of the unlink operation:

{{{
int rc;
int lflag = ...;
char *path = ...;    /* file path */
...
vfs_lookup_res_t lr;

rc = vfs_lookup_internal(path, lflag | L_UNLINK, &lr, NULL);
if (rc != EOK) {
    /* handle error */
    ...
}

vfs_node_t *node = vfs_node_get(&lr);

/* now we have a reference to the node and work with it */
...

vfs_node_put(node);
}}}

The example is simplified and does not show all the details (e.g. it omits all synchronization), but it shows the main idea. Note the trailing ''vfs_node_put()'' function, which drops a reference to a VFS node. If the last reference to a node is dropped, ''vfs_node_put()'' removes it from the hash table and cleans it up.

==== Handles as Arguments ====

The VFS server understands file handles and can accept them as arguments of VFS requests made by the client. Each client uses its own private set of file handles to refer to its open files. VFS maintains each client's open files in a table of open files, which is local to the servicing connection fibril. The table is composed of ''vfs_file_t'' pointers and is indexed by the file handles. The associated connection fibril does not need to synchronize accesses to the table of open files, because it is its exclusive owner. Each ''vfs_file_t'' structure tracks things like how many file handles reference it, the current position in the open file and the corresponding VFS node.

The transition from a file handle to a VFS node is therefore straightforward and is best shown in the following example:

{{{
int fd;    /* file handle */
...
/* Lookup the file structure corresponding to the file descriptor. */
vfs_file_t *file = vfs_file_get(fd);
...
/*
 * Lock the open file structure so that no other thread can manipulate
 * the same open file at a time.
 */
fibril_mutex_lock(&file->lock);
...
/*
 * Lock the file's node so that no other client can read/write to it at
 * the same time.
 */
if (read)
    fibril_rwlock_read_lock(&file->node->contents_rwlock);
else
    fibril_rwlock_write_lock(&file->node->contents_rwlock);
}}}

In the above code snippet, the ''vfs_rdwr()'' function first translates the file handle to a ''vfs_file_t'' structure using the ''vfs_file_get()'' interface and then locks the result. The VFS node is directly accessed in the two RW-lock operations at the end of the example.

=== VFS backend ===

As soon as the VFS server knows the VFS node associated with the request, it either asks one of the endpoint file system servers to carry out the operation for it or, when it has enough information, completes the operation itself. For example, VFS handles the VFS_IN_SEEK request, which corresponds to the POSIX call ''lseek()'', entirely on its own, because it merely manipulates the current position pointer within the respective ''vfs_file_t'' structure. In the worst case, when seeking relative to the end of the file, VFS needs to know the size of the file, but this is not a problem, because the server maintains the current file size in each VFS node.
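A hypothetical handler for such a self-contained request could look as follows. This is only a sketch derived from the description above; the ''pos'' and ''size'' members, the SEEK_* constants and the overall structure are assumptions rather than the actual VFS code:

{{{
/* Sketch of handling VFS_IN_SEEK entirely within VFS; names are assumed. */
static int vfs_seek_sketch(int fd, off_t offset, int whence, off_t *newpos)
{
    /* Translate the client's file handle to the open file structure. */
    vfs_file_t *file = vfs_file_get(fd);
    if (file == NULL)
        return EBADF;

    fibril_mutex_lock(&file->lock);
    switch (whence) {
    case SEEK_SET:
        file->pos = offset;
        break;
    case SEEK_CUR:
        file->pos += offset;
        break;
    case SEEK_END:
        /* The current file size is maintained in each VFS node. */
        file->pos = file->node->size + offset;
        break;
    }
    *newpos = file->pos;
    fibril_mutex_unlock(&file->lock);

    return EOK;
}
}}}

No communication with an endpoint file system server is needed at any point, which is exactly why VFS can complete this request on its own.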
We refer to the part which communicates with the endpoint file system servers as the VFS backend. The backend knows the handle of the endpoint file system (and also of the underlying device) from the VFS node, so it can use it to obtain an IPC phone and communicate with the respective server. The set of calls that VFS can make to an endpoint file system server defines the VFS output protocol, because all potential endpoint file system servers need to understand it and implement it in some way. The symbolic names of requests in the VFS output protocol are prefixed with VFS_OUT.

=== PLB and canonical file paths ===

VFS and the endpoint file system servers cooperate in resolving file system paths to VFS triplets. Roughly speaking, VFS consults the file systems mounted along the given path. Each of them resolves as much of the yet unresolved portion of the path as it can, until it either reaches a mount point or the end of the path. Eventually, the last file system server manages to resolve the path and replies to the VFS server by sending the resulting VFS triplet.

One of the design goals of the HelenOS file system layer is to avoid the situation in which a path or its portion would be repeatedly copied back and forth between VFS and each endpoint file system server. In order to meet this design criterion, VFS allocates and maintains a ring buffer in which it stores all looked-up paths. Owing to its use, the buffer is called the Pathname Lookup Buffer, or PLB, and each endpoint file system server shares it read-only with VFS. The paths are placed into the buffer by the above mentioned ''vfs_lookup_internal()'' function.

To maximally ease the process of path resolution, the PLB is expected to contain only paths that are in the canonical form, which can be defined as follows:

 1. the path is absolute (i.e. a/b/c is not canonical),
 2. there is no trailing slash in the path (i.e. /a/b/c/ is not canonical),
 3. there is no extra slash in the path (i.e. /a//b/c is not canonical),
 4. there is no ''dot'' component in the path (i.e. /a/./b/c is not canonical),
 5. there is no ''dot-dot'' component in the path (i.e. /a/b/../c is not canonical).

The standard library contains the ''canonify()'' function, which checks whether a path is canonical and possibly converts a non-canonical path to a canonical one.

In a more detailed view, the path translation starts by ''vfs_lookup_internal()'' storing a canonical path into the PLB. VFS then contacts the file system server which is mounted under the file system root and sends it the VFS_OUT_LOOKUP request along with the indices of the first and last characters of the path in the PLB. After the root file system server resolves its part of the path, it does not necessarily reply back to VFS. If there is still a portion of the path to be resolved, it forwards the VFS_OUT_LOOKUP request to the file system which is mounted under the mount point where the resolution stopped. At the same time, it modifies the argument of the forwarded call, which contains the PLB index of the path's first character, to index the first character of the yet unresolved portion of the path. The resolution continues in the same spirit until one of the file system servers reaches the end of the path. This file system completes the path resolution by specifying the VFS triplet of the resulting node in an answer to the VFS_OUT_LOOKUP request. The answer goes directly to the originator of the request, which is the VFS server.
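The following sketch outlines the shape such a VFS_OUT_LOOKUP handler might take inside an endpoint file system server. All helper functions and names here are hypothetical and error handling is omitted; the sketch only illustrates the resolve-then-forward mechanism and the shifting of the PLB first-character index described above:

{{{
/* Hypothetical sketch of VFS_OUT_LOOKUP handling; all names are made up. */
static void fs_lookup_sketch(ipc_call_t *request)
{
    /* PLB indices of the first and last characters of the path. */
    int first = lookup_request_first(request);
    int last = lookup_request_last(request);

    /*
     * Resolve path components by reading them from the read-only
     * shared PLB, until a mount point or the end of the path.
     */
    fs_node_t *cur = fs_root_node();
    while (first <= last && !fs_is_mount_point(cur))
        cur = fs_resolve_component(cur, plb, &first);

    if (first > last) {
        /*
         * The whole path is resolved; answer directly to VFS with
         * the resulting VFS triplet.
         */
        answer_with_triplet(request, fs_handle, dev_handle,
            fs_node_index(cur));
    } else {
        /*
         * Resolution stopped at a mount point; forward the request
         * to the file system mounted there, updating the index of
         * the first yet unresolved character.
         */
        forward_lookup(request, mount_point_phone(cur), first, last);
    }
}
}}}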
== Endpoint file system servers == #FSDesignFS

As mentioned above, each endpoint file system server needs to implement the VFS output protocol. Through the polymorphism this offers, HelenOS currently supports the following file system types (and we believe that more can be added):

 * TMPFS --- a custom memory based file system without an on-disk format and permanent storage,
 * FAT16 --- a well known, non-Unix-like file system with a simple on-disk format,
 * DEVFS --- a custom pseudo file system.
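As an illustration of this polymorphism, the main connection loop of an endpoint file system server might dispatch the VFS_OUT requests roughly as follows. Only VFS_OUT_LOOKUP and the VFS_OUT prefix are taken from the text above; the other request names, the dispatch structure and the helpers are assumptions:

{{{
/* Sketch of a VFS output protocol dispatch loop; names are illustrative. */
static void fs_connection_sketch(void)
{
    for (;;) {
        ipc_call_t call;
        receive_call(&call);  /* hypothetical IPC receive helper */

        switch (request_method(&call)) {
        case VFS_OUT_LOOKUP:
            /* Resolve (a portion of) a path stored in the PLB. */
            fs_lookup(&call);
            break;
        case VFS_OUT_READ:
            /* Read from a file identified by its file index. */
            fs_read(&call);
            break;
        case VFS_OUT_WRITE:
            fs_write(&call);
            break;
        /* ... remaining VFS_OUT operations ... */
        default:
            answer_error(&call, ENOTSUP);
        }
    }
}
}}}

Each of the servers listed above implements these operations in its own way, for example TMPFS purely in memory and FAT16 against its on-disk format, which is what makes the file system types interchangeable from the point of view of VFS.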