= IPC for Dummies =

Understanding HelenOS IPC is essential for the development of HelenOS userspace servers and services and, to a much lesser extent, for the development of any HelenOS userspace code. This document attempts to concisely explain how to use the HelenOS IPC. It doesn't aspire to be exhaustive nor to cover the implementation details of the IPC subsystem itself, which are dealt with in chapter 8 of the HelenOS design [http://www.helenos.eu/doc/design.pdf documentation].

 * [#IpcIntroRT Introduction to the runtime environment]
 * [#IpcIntroIPC Basics of IPC communication]
 * [#IpcConnect Connecting to another task]
 * [#IpcShortMsg Passing short IPC messages]
 * [#IpcDataCopy Passing large data via IPC]
 * [#IpcShareMem Sharing memory via IPC]
 * [#IpcSkeletonSrv Writing a skeleton server]

== Introduction to the runtime environment == #IpcIntroRT

The HelenOS kernel maintains a hospitable environment for running instances of user programs called ''tasks''. Each task runs in a separate address space, so one task cannot access another task's address space, but the kernel provides means of inter-task communication (IPC).

In order to exploit the parallelism of today's processors, each task is broken down into one or more independently scheduled ''threads''. In userspace, each thread executes by means of lightweight execution entities called ''fibrils''. The distinction between threads and fibrils is that the kernel schedules threads and is completely unaware of fibrils. The standard library cooperatively schedules fibrils and lets them run on behalf of the underlying thread. Due to this cooperative way of scheduling, a fibril will run uninterrupted until completion unless:

 * it explicitly yields the processor to another fibril,
 * it waits for an IPC reply that has not arrived yet,
 * it requests an IPC operation which results in the underlying thread being blocked, or
 * the underlying thread is preempted by the kernel.

Fibrils were introduced especially to facilitate more straightforward IPC communication.

== Basics of IPC communication == #IpcIntroIPC

Because tasks are isolated from each other, they need to use the kernel's syscall interface to communicate with the rest of the world. In the last generation of microkernels, the emphasis was put on synchronous IPC communication. In HelenOS, both synchronous and asynchronous communication is possible, but it could be concluded that the HelenOS IPC is primarily asynchronous.

The concept of and the terminology used in HelenOS IPC is based on the natural abstraction of a telephone dialogue between a man on one side of the connection and an answerbox on the other. The presence of a passive answerbox determines the asynchronous nature of the communication: a call cannot be answered immediately, but needs to be first picked up from the answerbox by the second party.

In HelenOS, the IPC communication goes like in the following example. A userspace fibril uses one of its ''phones'', which is connected to the callee task's ''answerbox'', and makes a short ''call''. The caller fibril can either make another call or wait for the answer. The callee task now has a missed call stored in its answerbox. Sooner or later, one of the callee task's fibrils will pick the call up, process it and either answer it or forward it to a third party's answerbox. Under all circumstances, the call will eventually be answered and the answer will be stored in the answerbox of the caller task.
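
To make the telephone abstraction concrete, here is a minimal sketch of a single call/answer exchange, using interfaces that are introduced in detail later in this document. The open ''phone'' and the protocol-defined method number ''MY_METHOD'' are illustrative assumptions, not part of any real protocol:

{{{
#include <ipc/ipc.h>
#include <async.h>
#include <errno.h>

#define MY_METHOD  1024  /* hypothetical protocol-defined method number */

int phone;  /* an open phone to the callee task (see below) */
...
/* Caller side: place a call on the phone, then wait for the answer. */
aid_t req = async_send_0(phone, MY_METHOD, NULL);
/* ... the caller may do other useful work here ... */
ipcarg_t retval;
async_wait_for(req, &retval);

/* Callee side: a fibril picks the call up from the answerbox and answers it. */
ipc_call_t call;
ipc_callid_t callid = async_get_call(&call);
if (IPC_GET_METHOD(call) == MY_METHOD)
    ipc_answer_0(callid, EOK);
}}}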

=== Asynchronous framework ===

If a task is multithreaded, or even if it has only one thread but several fibrils, the idea of a connection is jeopardized. How can the task tell which of its fibrils should pick up the next call from the answerbox so that the right fibril receives the right data? One approach would be to allow the first available fibril to pick it up, but then we could not talk about a connection, and if we tried to preserve the concept of a connection anyway, the code handling incoming calls would most likely become full of state automata and callbacks.

In HelenOS, there is a specialized piece of software called the asynchronous framework, which forms a layer above the low-level IPC library functions. The asynchronous framework does all the state automata and callback dirty work itself and hides the implementation details from the programmer. The asynchronous framework makes extensive use of fibrils; in fact, it was the asynchronous framework which justified the existence of HelenOS fibrils. With the asynchronous framework in place, there are two kinds of fibrils:

 * manager fibrils and
 * worker fibrils.

Manager fibrils pick up calls from answerboxes and, according to their internal routing tables, pass them to the respective worker fibrils, which handle particular connections. If a worker fibril decides to wait for an answer which has not arrived yet, the asynchronous framework will register all necessary callbacks and switch to another runnable fibril. The framework will switch back to the original fibril only after the answer has arrived. If there are no runnable fibrils to switch to, the asynchronous framework will block the entire thread. The benefit of using the asynchronous framework and fibrils is that the programmer can do without callbacks and state automata and still use asynchronous communication.

=== Capabilities of HelenOS IPC ===

The capabilities of HelenOS IPC can be summarized in the following list:

 * short calls, consisting of one argument for the method number and five arguments of payload,
 * answers, consisting of one argument for the return code and five arguments of payload,
 * sending large data to another task,
 * receiving large data from another task,
 * sharing memory from another task,
 * sharing memory to another task,
 * interrupt notifications for userspace device drivers.

The first two items can be considered basic building blocks. Using short calls and answers, copying larger blocks of data and sharing memory between address spaces is possible (and also elegant) thanks to the kernel monitoring the situation. The kernel basically snoops on the communication between the negotiating tasks and takes care of the data transfer or the memory sharing when the two parties agree on it.

== Connecting to another task == #IpcConnect

A HelenOS task can only communicate with another task to which it has an open phone. When created, each task has one open phone to start with. This initial phone is always connected to the naming service. The naming service is a system task at which other services register and which can connect clients to the registered services. The following snippet demonstrates how a task asks the naming service to connect it to the VFS server:

{{{
#include <ipc/ipc.h>
#include <ipc/services.h>

...
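int vfs_phone;  /* declaration elided in the original snippet; it is an int, as in the later examples */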
vfs_phone = ipc_connect_me_to(PHONE_NS, SERVICE_VFS, 0, 0);
if (vfs_phone < 0) {
    /* handle error */
}
}}}

The naming service simply forwards the ''IPC_M_CONNECT_ME_TO'' call, which is marshalled by ipc_connect_me_to(), to the destination service, provided that such a service exists. Note that the service to which you intend to connect will create a new fibril for handling the connection from your task. The newly created fibril in the destination task will receive the ''IPC_M_CONNECT_ME_TO'' call and will be given a chance to either accept or reject the connection. In the snippet above, the client doesn't make use of the two server-defined connection arguments.

If the connection is accepted, a new non-negative phone number will be returned to the client task. From that time on, the task can use that phone for making calls to the service. The connection exists until either side closes it. The client uses the ''ipc_hangup(int phone)'' interface to close the connection.

== Passing short IPC messages == #IpcShortMsg

On the lowest level, tasks communicate by making calls to other tasks to which they have an open phone. Each call is a data structure accommodating six native arguments (i.e. six 32-bit arguments on 32-bit systems or six 64-bit arguments on 64-bit systems). The first argument of the six is interpreted as a method number in requests and as a return code in answers. The method is either a system method or a protocol-defined method. System method numbers range from 0 to 1023; protocol-defined method numbers start at 1024. In the case of system methods, the payload arguments have a predefined meaning and are interpreted by the kernel. In the case of protocol-defined methods, the payload arguments are defined by the protocol in question.

Even though a call can be made by using the low-level IPC primitives, doing so is strongly discouraged (unless you know what you are doing) in favor of using the asynchronous framework. Making an asynchronous request via the asynchronous framework is fairly easy, as can be seen in the following example:

{{{
#include <ipc/ipc.h>
#include <async.h>

...
int vfs_phone;
aid_t req;
ipc_call_t answer;
ipcarg_t rc;
...
req = async_send_3(vfs_phone, VFS_OPEN, lflag, oflag, 0, &answer);
...
async_wait_for(req, &rc);
...
if (rc != EOK) {
    /* handle error */
}
}}}

In the example above, the standard library is making an asynchronous call to the VFS server. The method number is ''VFS_OPEN'', and ''lflag'', ''oflag'' and 0 are three payload arguments defined by the VFS protocol. Note that the number of payload arguments figures in the numeric suffix of the async_send_3() function name. There are analogous interfaces which take from zero to five payload arguments. In this example, there are no payload return arguments except for the return value. If there were some return arguments of interest, the client could access them using the ''IPC_GET_ARG1()'' through ''IPC_GET_ARG5()'' macros on the ''answer'' variable.

The advantage of the asynchronous call is that the client doesn't block during the send operation and can do some fruitful work before it starts to wait for the answer.
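
For example, a client could have two independent requests in flight at once and only collect the answers afterwards. The following is a hedged sketch; the ''phone'' and the ''METHOD_A''/''METHOD_B'' method numbers are hypothetical:

{{{
#include <async.h>

int phone;  /* an open phone, hypothetical */
...
aid_t req1 = async_send_1(phone, METHOD_A, arg1, NULL);
aid_t req2 = async_send_1(phone, METHOD_B, arg2, NULL);
/* both calls are now in flight; other useful work can happen here */
ipcarg_t rc1, rc2;
async_wait_for(req1, &rc1);
async_wait_for(req2, &rc2);
}}}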

If there is nothing to be done before sending the message and waiting for the answer, it is better to perform a synchronous call. Using the asynchronous framework, this is achieved in the following way:

{{{
#include <ipc/ipc.h>
#include <async.h>

...
int vfs_phone;
int fildes;
ipcarg_t rc;
...
rc = async_req_1_0(vfs_phone, VFS_CLOSE, fildes);
if (rc != EOK) {
    /* handle error */
}
}}}

The example above illustrates how the standard library synchronously calls the VFS server and asks it to close a file descriptor passed in the ''fildes'' argument, which is the only payload argument defined for the ''VFS_CLOSE'' method. The interface encodes the numbers of input and return arguments in the function name, so there are variants which take or return different numbers of arguments. Note that contrary to the asynchronous example above, the return arguments would be stored directly to pointers passed to the function.

The interface for answering calls is ''ipc_answer_n()'', where ''n'' is the number of return arguments. This is how the VFS server answers the ''VFS_OPEN'' call:

{{{
ipc_answer_1(rid, EOK, fd);
}}}

In this example, ''rid'' is the hash of the received call, ''EOK'' is the return value and ''fd'' is the only return argument.

== Passing large data via IPC == #IpcDataCopy

Passing five words of payload in a request and five words of payload in an answer is not very suitable for larger data transfers. Instead, the application can use these building blocks to negotiate the transfer of a much larger block of data (currently there is a hard limit of 64 KiB). The negotiation has three phases:

 * the initial phase, in which the client announces its intention to copy memory to or from the recipient,
 * the receive phase, in which the server learns about the bid, and
 * the final phase, in which the server either accepts or rejects the bid.

We use the terms client and server instead of the terms sender and recipient, because a client can be both the sender and the recipient and a server can be both the recipient and the sender, depending on the direction of the data transfer. In the following text, we'll cover both directions.

In theory, the programmer can use the low-level short IPC messages to implement all three phases himself. However, this can be tedious and error-prone, and therefore the standard library offers convenience wrappers for each phase instead.

=== Sending data ===

When sending data, the client is the sender and the server is the recipient. The following snippet illustrates the initial phase on the example of the libc ''open()'' call, which transfers the path name to the VFS server. The initial phase is also the only step needed on the sender's side:

{{{
#include <ipc/ipc.h>

...
int vfs_phone;
int rc;
char *pa;
size_t pa_len;
...
rc = ipc_data_write_start(vfs_phone, pa, pa_len);
if (rc != EOK) {
    /* an error or the recipient denied the bid */
}
}}}

The ''pa'' and ''pa_len'' arguments specify the source address and the suggested number of bytes to transfer, respectively. The recipient will be able to determine the size parameter of the transfer in the receive phase:

{{{
#include <ipc/ipc.h>

...
ipc_callid_t callid;
size_t len;
...
if (!ipc_data_write_receive(&callid, &len)) {
    /* protocol error - the sender is not sending data */
}
/* success, the receive phase is complete */
}}}

After the receive phase, the recipient knows - from the ''len'' variable - how many bytes the sender is willing to send. So far, no data has been transferred. The separation of the receive and the final phase is important, because it gives the recipient a chance to get ready for the transfer (e.g. allocate the required amount of memory).
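
For instance, the recipient might now allocate a buffer of the suggested size, or bail out early on an oversized bid by answering the call with an error code (rejecting a bid this way is covered below). This is only an illustrative sketch; the ''PATH_MAX'' limit and the choice of error codes are assumptions:

{{{
#include <errno.h>
#include <stdlib.h>

char *buf;

if (len > PATH_MAX) {
    ipc_answer_0(callid, ELIMIT);  /* reject the bid: too big */
} else {
    buf = malloc(len + 1);
    if (!buf)
        ipc_answer_0(callid, ENOMEM);  /* reject the bid: out of memory */
}
}}}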

Now the recipient is at a crossroads. It can do one of three things: answer the call with a non-zero return code, accept the transfer but restrict its size, or accept the transfer including the suggested size. The latter two options are achieved like this:

{{{
char *path;
...
(void) ipc_data_write_finalize(callid, path, len);
}}}

After this call, the data transfer of ''len'' bytes to the address ''path'' will be realized. The operation can theoretically fail, so you should check the return value of ''ipc_data_write_finalize()''. If it is non-zero, then there was an error.

=== Accepting data ===

When accepting data, the client is the recipient and the server is the sender. The situation is similar to the previous one; the only difference is that the client specifies the destination address and the largest possible size of the transfer. The server can send less data than requested. In the following example, the ''read()'' function in the standard library is requesting ''nbyte'' worth of data to be read from a file system into the ''buf'' buffer:

{{{
#include <ipc/ipc.h>

...
int vfs_phone;
void *buf;
size_t nbyte;
ipcarg_t rc;
...
rc = ipc_data_read_start(vfs_phone, buf, nbyte);
if (rc != EOK) {
    /* handle error */
}
}}}

Now the file system, say it is TMPFS, receives the request like this:

{{{
#include <ipc/ipc.h>

...
ipc_callid_t callid;
size_t len;
...
if (!ipc_data_read_receive(&callid, &len)) {
    /* protocol error - the recipient is not accepting data */
}
/* success, the receive phase is complete */
}}}

After the receive phase is over, ''len'' holds the maximum size of data the client is willing to accept. The sender can only restrict this value. Until the final phase is over, no data is transferred. The final phase follows:

{{{
(void) ipc_data_read_finalize(callid, dentry->data + pos, bytes);
}}}

Here the sender specifies the source address and the actual number of bytes to transfer. After the function call completes, the data has been transferred to the recipient. Note that the return value of ''ipc_data_read_finalize()'' is, maybe unjustly, ignored.

== Sharing memory via IPC == #IpcShareMem

In HelenOS, tasks can share memory only via IPC, as the kernel does not provide dedicated system calls for memory sharing. Instead, the tasks negotiate much like in the case of [#IpcDataCopy passing large data]. The negotiation has three phases and is very similar to the previous case:

 * the initial phase, in which the client announces its intention to share memory to or from the recipient,
 * the receive phase, in which the server learns about the bid, and
 * the final phase, in which the server either accepts or rejects the bid.

The semantics of the client and the server also remain the same. Note that the direction of sharing is significant here, just as it is during data copying.

=== Sharing address space area out ===

When sharing an address space area to other tasks, the client is the sender and the server is the recipient. The client offers one of its address space areas to the server for sharing. The following code snippet illustrates libblock's ''block_init()'' function offering a part of its address space starting at ''com_area'' to a block device associated with the ''dev_phone'' phone handle:

{{{
#include <ipc/ipc.h>

...
int rc;
int dev_phone;
void *com_area;
...
rc = ipc_share_out_start(dev_phone, com_area, AS_AREA_READ | AS_AREA_WRITE);
if (rc != EOK) {
    /* handle error */
}
}}}
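
Naturally, the offered area must already exist in the client's address space. The following is only a rough sketch; it assumes a libc ''as_area_create()'' interface taking an address, a size and area flags - consult ''<as.h>'' for the actual interface and its failure convention:

{{{
#include <as.h>

void *addr;      /* desired placement of the area, assumption */
void *com_area;
size_t com_size;
...
/* assumption: create an anonymous, readable and writable area */
com_area = as_area_create(addr, com_size, AS_AREA_READ | AS_AREA_WRITE);
if (!com_area) {
    /* handle error (the exact failure convention may differ) */
}
}}}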

This is how the RD server receives the address space area offer made above:

{{{
#include <ipc/ipc.h>

...
ipc_callid_t callid;
size_t maxblock_size;
int flags;
...
if (!ipc_share_out_receive(&callid, &maxblock_size, &flags)) {
    /* handle error */
}
}}}

After the offer is received, the server has a chance to reject it by answering ''callid'' with an error code distinct from EOK. The reason for denial can be an inappropriate ''maxblock_size'' or unsuitable address space area flags in the ''flags'' variable. If the offer looks good to the server, it will accept it like this:

{{{
void *fs_va;
...
(void) ipc_share_out_finalize(callid, fs_va);
}}}

Note that the return value of ''ipc_share_out_finalize()'' is, maybe unjustly, ignored here. The kernel will attempt to create the mapping only after the server calls ''ipc_share_out_finalize()''.

=== Sharing address space area in ===

When sharing memory from other tasks, the client is the recipient and the server is the sender. The client asks the server to provide an address space area. In the following example, the libfs library asks the VFS server to share the Path Lookup Buffer:

{{{
#include <ipc/ipc.h>

...
fs_reg_t *reg;
int vfs_phone;
int rc;
...
rc = ipc_share_in_start_0_0(vfs_phone, reg->plb_ro, PLB_SIZE);
if (rc != EOK) {
    /* handle error */
}
}}}

The VFS server learns about the request by performing the following code:

{{{
#include <ipc/ipc.h>

...
ipc_callid_t callid;
size_t size;
...
if (!ipc_share_in_receive(&callid, &size)) {
    /* handle error */
}
}}}

The server now has a chance to react to the request: if ''size'' does not meet the server's requirements, the server will reject the offer (see the sketch at the end of this section); otherwise it will accept it. Note that so far, the address space area flags have not been specified. That happens in the final phase:

{{{
...
uint8_t *plb;

(void) ipc_share_in_finalize(callid, plb, AS_AREA_READ | AS_AREA_CACHEABLE);
}}}

Again, the kernel will not create the mapping before the server completes the final phase of the negotiation via ''ipc_share_in_finalize()''.
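
As mentioned above, the server rejects an unacceptable request simply by answering it with an error code. A minimal sketch, assuming the VFS server insists on sharing exactly ''PLB_SIZE'' bytes (the choice of ''EINVAL'' is illustrative):

{{{
#include <errno.h>

if (size != PLB_SIZE) {
    ipc_answer_0(callid, EINVAL);  /* reject the share-in request */
}
}}}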