Changes between Initial Version and Version 1 of IPC

Timestamp: 2009-04-09T00:17:26Z
Author: trac

== IPC for Dummies ==

Understanding HelenOS IPC is essential for the development of HelenOS userspace servers and services and,
to a much lesser extent, for the development of any HelenOS userspace code. This document attempts to concisely explain how
to use HelenOS IPC. It aspires neither to be exhaustive nor to cover the implementation details of the IPC
subsystem itself, which are dealt with in chapter 8 of the HelenOS design [http://www.helenos.eu/doc/design.pdf documentation].

 * [#IpcIntroRT Introduction to the runtime environment]
 * [#IpcIntroIPC Basics of IPC communication]
 * [#IpcConnect Connecting to another task]
 * [#IpcShortMsg Passing short IPC messages]
 * [#IpcDataCopy Passing large data via IPC]
 * [#IpcShareMem Sharing memory via IPC]
 * [#IpcSkeletonSrv Writing a skeleton server]

=== Introduction to the runtime environment === #IpcIntroRT

The HelenOS kernel maintains a hospitable environment for running instances of user programs called ''tasks''.
Each task runs in its own address space, so one task cannot access another task's address space, but the
kernel provides means of inter-task communication (IPC). In order to exploit the parallelism of today's processors,
each task consists of one or more independently scheduled ''threads''.

In userspace, each thread executes by means of lightweight execution entities called ''fibrils''.
The distinction between threads and fibrils is that the kernel schedules threads and is completely unaware of fibrils.

The standard library schedules fibrils cooperatively and lets them run on behalf of the underlying thread. Due to this
cooperative way of scheduling, a fibril runs uninterrupted until completion unless:

 * it explicitly yields the processor to another fibril,
 * it waits for an IPC reply that has not arrived yet,
 * it requests an IPC operation which results in the underlying thread being blocked, or
 * the underlying thread is preempted by the kernel.

Fibrils were introduced primarily to make IPC communication more straightforward.
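To make the cooperative model concrete, here is a small user-level sketch of two fibril-like execution contexts that run until they explicitly yield. It is not HelenOS code: it uses the POSIX <ucontext.h> functions (marked obsolescent by POSIX but still provided by glibc), and the names coop_yield(), coop_run() and trace are hypothetical. A simple round-robin loop plays the role of the standard library's fibril scheduler, and all contexts share a single kernel thread, which never learns about them.

{{{
#include <stddef.h>
#include <ucontext.h>

#define NCTX       2
#define STACK_SIZE (64 * 1024)

static ucontext_t sched_ctx;               /* the scheduler's own context */
static ucontext_t fib_ctx[NCTX];           /* one context per "fibril" */
static char fib_stack[NCTX][STACK_SIZE];   /* each "fibril" needs its own stack */
static int fib_done[NCTX];
static int current;

char trace[8];                             /* order in which the bodies ran */
static size_t trace_len;

/* Explicit yield: save this context and switch back to the scheduler. */
static void coop_yield(void)
{
    swapcontext(&fib_ctx[current], &sched_ctx);
}

static void fib_body(void)
{
    int id = current;                      /* lives on this context's stack */
    trace[trace_len++] = (char)('A' + id);
    coop_yield();                          /* nothing else ran before this point */
    trace[trace_len++] = (char)('a' + id);
    fib_done[id] = 1;
}                                          /* returning resumes uc_link (the scheduler) */

/* Round-robin over the contexts until all of them have finished. */
void coop_run(void)
{
    for (int i = 0; i < NCTX; i++) {
        getcontext(&fib_ctx[i]);
        fib_ctx[i].uc_stack.ss_sp = fib_stack[i];
        fib_ctx[i].uc_stack.ss_size = STACK_SIZE;
        fib_ctx[i].uc_link = &sched_ctx;
        makecontext(&fib_ctx[i], fib_body, 0);
    }
    int remaining = NCTX;
    while (remaining > 0) {
        for (int i = 0; i < NCTX; i++) {
            if (fib_done[i])
                continue;
            current = i;
            swapcontext(&sched_ctx, &fib_ctx[i]);
            if (fib_done[i])
                remaining--;
        }
    }
    trace[trace_len] = '\0';
}
}}}

Calling coop_run() once leaves "ABab" in trace: each context runs uninterrupted until its coop_yield(), and only then does the other one get the processor.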

=== Basics of IPC communication === #IpcIntroIPC

Because tasks are isolated from each other, they need to use the kernel's syscall interface for communication with the rest of
the world. In the last generation of microkernels, the emphasis was put on synchronous IPC. In HelenOS, both
synchronous and asynchronous communication are possible, but HelenOS IPC is primarily asynchronous.

The concepts and terminology of HelenOS IPC are based on the natural abstraction of a telephone dialogue between a person on one
side of the connection and an answerbox on the other. The presence of a passive answerbox determines the asynchronous nature of the communication:
a call cannot be answered immediately, but first needs to be picked up from the answerbox by the other party.

In HelenOS, IPC communication proceeds as in the following example. A userspace fibril uses one of its ''phones'', which is connected to the
callee task's ''answerbox'', and makes a short ''call''. The caller fibril can then either make another call or wait for the answer. The callee task
now has a missed call stored in its answerbox. Sooner or later, one of the callee task's fibrils will pick the call up, process it, and either answer
it or forward it to a third party's answerbox. In all circumstances, the call eventually gets answered, and the answer is stored in the answerbox
of the caller task.
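The call flow just described can be modelled with a few plain data structures. The sketch below is a single-process illustration, not the kernel's implementation: call_t, answerbox_t, phone_t, phone_call(), abox_pickup() and call_answer() are hypothetical names, a call is six native-width words (as described in the section on short messages), and forwarding to a third party is left out.

{{{
#include <stddef.h>
#include <stdint.h>

#define ABOX_SLOTS 8

/* A call: one method/return-code word plus five payload words. */
typedef struct call {
    uintptr_t args[6];
    struct answerbox *caller_box;   /* where the answer must be delivered */
} call_t;

/* An answerbox: a FIFO of calls (or answers) waiting to be picked up. */
typedef struct answerbox {
    call_t *slots[ABOX_SLOTS];
    size_t head, count;
} answerbox_t;

/* A phone is simply a connection to some task's answerbox. */
typedef struct phone {
    answerbox_t *callee_box;
} phone_t;

static int abox_put(answerbox_t *box, call_t *c)
{
    if (box->count == ABOX_SLOTS)
        return -1;                  /* answerbox full */
    box->slots[(box->head + box->count) % ABOX_SLOTS] = c;
    box->count++;
    return 0;
}

/* Pick up the oldest waiting call; NULL if the answerbox is empty. */
call_t *abox_pickup(answerbox_t *box)
{
    if (box->count == 0)
        return NULL;
    call_t *c = box->slots[box->head];
    box->head = (box->head + 1) % ABOX_SLOTS;
    box->count--;
    return c;
}

/* Caller side: leave the call in the callee's answerbox and return at once. */
int phone_call(phone_t *phone, answerbox_t *my_box, call_t *c)
{
    c->caller_box = my_box;
    return abox_put(phone->callee_box, c);
}

/* Callee side: overwrite args[0] with a return code and deliver the answer. */
int call_answer(call_t *c, uintptr_t retval, uintptr_t ret1)
{
    c->args[0] = retval;
    c->args[1] = ret1;
    return abox_put(c->caller_box, c);
}
}}}

Note that phone_call() never waits for the callee: the "missed call" simply sits in the callee's answerbox until some fibril picks it up, and the answer later lands in the caller's own answerbox.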

==== Asynchronous framework ====

If a task is multithreaded, or even if it has only one thread but several fibrils, the idea of a connection is jeopardized. How can the task tell which
of its fibrils should pick up the next call from the answerbox so that the right fibril receives the right data? One approach would be to let the first
available fibril pick it up, but then we could no longer talk about a connection; and if we tried to preserve the concept of a connection, the code handling
incoming calls would most likely become full of state automata and callbacks. HelenOS solves this with a specialized piece of software called the asynchronous
framework, which forms a layer above the low-level IPC library functions. The asynchronous framework does all the state-automata and callback dirty work
itself and hides the implementation details from the programmer.

The asynchronous framework makes extensive use of fibrils; in fact, it was the asynchronous framework that justified the existence of HelenOS fibrils.
With the asynchronous framework in place, there are two kinds of fibrils:

 * manager fibrils and
 * worker fibrils.

Manager fibrils pick up calls from answerboxes and, according to their internal routing tables, pass them to the respective worker fibrils, which handle particular
connections. If a worker fibril decides to wait for an answer which has not arrived yet, the asynchronous framework registers all necessary callbacks and switches to another runnable fibril. The framework switches back to the original fibril only after the answer has arrived. If there are no runnable fibrils to switch to, the asynchronous framework blocks the entire thread.

The benefit of using the asynchronous framework and fibrils is that the programmer can do without callbacks and state automata and still use asynchronous communication.
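The manager's routing step can be pictured as a table lookup keyed by the connection a call arrived on. The sketch is only a model: msg_t, connection_t and manager_route() are hypothetical names, and for brevity it opens a new connection for any call from an unknown phone hash, whereas a real framework would do so only for a connection request. Each connection's queue stands in for the worker fibril that would consume it.

{{{
#include <stddef.h>
#include <stdint.h>

#define MAX_CONNS 4
#define QUEUE_LEN 4

typedef struct {
    uintptr_t in_phone_hash;   /* identifies the connection the call arrived on */
    uintptr_t method;
} msg_t;

typedef struct {
    int used;
    uintptr_t in_phone_hash;
    msg_t queue[QUEUE_LEN];    /* calls waiting for this connection's worker */
    size_t count;
} connection_t;

static connection_t conns[MAX_CONNS];

/* Route one call picked up from the answerbox to its connection.
 * Returns the connection index, or -1 if the table or a queue is full. */
int manager_route(const msg_t *m)
{
    int free_slot = -1;

    for (int i = 0; i < MAX_CONNS; i++) {
        if (conns[i].used && conns[i].in_phone_hash == m->in_phone_hash) {
            if (conns[i].count == QUEUE_LEN)
                return -1;                   /* worker's queue is full */
            conns[i].queue[conns[i].count++] = *m;
            return i;                        /* known connection */
        }
        if (!conns[i].used && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;                           /* no room for a new connection */

    /* Unknown sender: open a new connection (a new worker, in a real framework). */
    conns[free_slot].used = 1;
    conns[free_slot].in_phone_hash = m->in_phone_hash;
    conns[free_slot].queue[0] = *m;
    conns[free_slot].count = 1;
    return free_slot;
}
}}}

Because every call carrying the same phone hash lands in the same queue, the worker behind that queue can be written as straight-line code that handles one connection from start to finish.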

==== Capabilities of HelenOS IPC ====

The capabilities of HelenOS IPC can be summarized in the following list:

 * short calls, consisting of one argument for the method number and five arguments of payload,
 * answers, consisting of one argument for the return code and five arguments of payload,
 * sending large data to another task,
 * receiving large data from another task,
 * sharing memory from another task,
 * sharing memory to another task,
 * interrupt notifications for userspace device drivers.

The first two items can be considered the basic building blocks.

Using short calls and answers, copying larger blocks of data and sharing memory between address spaces are possible (and also elegant) thanks to the kernel monitoring the situation. The kernel basically snoops on the communication between the negotiating tasks and, once the two parties agree
on a data transfer or on memory sharing, carries it out.

=== Connecting to another task === #IpcConnect

A HelenOS task can only communicate with another task to which it has an open phone. When created, each task has one open phone to start with. This initial phone is always connected to the naming service. The naming service is a system task at which other services register and which can connect clients to other registered services. The following snippet demonstrates how a task asks the naming service to connect it to the VFS server:

{{{
#include <ipc/ipc.h>
#include <ipc/services.h>
...
        int vfs_phone;
...
        vfs_phone = ipc_connect_me_to(PHONE_NS, SERVICE_VFS, 0, 0);
        if (vfs_phone < 0) {
                /* handle error */
        }
}}}

The naming service simply forwards the ''IPC_M_CONNECT_ME_TO'' call, which is marshalled by ipc_connect_me_to(),
to the destination service, provided that such a service exists. Note that the service to which you intend to connect will create
a new fibril for handling the connection from your task. The newly created fibril in the destination task will receive the
''IPC_M_CONNECT_ME_TO'' call and will be given a chance either to accept or to reject the connection. In the snippet above, the
client doesn't make use of the two server-defined connection arguments and passes 0 for both. If the connection is accepted, a new non-negative phone
number will be returned to the client task. From that time on, the task can use that phone for making calls to the service.
The connection exists until either side closes it.

The client uses the ''ipc_hangup(int phone)'' interface to close the connection.
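The naming-service round trip and the later hang-up can be modelled with a small registry and a per-client phone table. Everything here is hypothetical (ns_register(), connect_me_to(), hangup() and the ERR_* codes are not HelenOS APIs); the sketch only mirrors the rules stated above: the service must be registered, its new fibril may refuse the connection, and an accepted connection yields a new non-negative phone number.

{{{
#include <stddef.h>

#define MAX_SERVICES 4
#define MAX_PHONES   8
#define ERR_NOENT    (-1)   /* hypothetical error codes for this sketch */
#define ERR_REFUSED  (-2)
#define ERR_LIMIT    (-3)

typedef struct {
    int service_id;         /* plays the role of e.g. SERVICE_VFS */
    int accepts;            /* would the server's new fibril accept the call? */
} service_t;

static service_t registry[MAX_SERVICES];   /* kept by the naming service */
static size_t nservices;

/* The client's phone table; slot 0 stands for the initial phone to the
 * naming service (PHONE_NS), so new connections get slots 1 and up. */
static const service_t *phones[MAX_PHONES];

int ns_register(int service_id, int accepts)
{
    if (nservices == MAX_SERVICES)
        return ERR_LIMIT;
    registry[nservices].service_id = service_id;
    registry[nservices].accepts = accepts;
    nservices++;
    return 0;
}

/* Roughly what the IPC_M_CONNECT_ME_TO round trip achieves. */
int connect_me_to(int service_id)
{
    const service_t *srv = NULL;

    for (size_t i = 0; i < nservices; i++) {
        if (registry[i].service_id == service_id) {
            srv = &registry[i];
            break;
        }
    }
    if (srv == NULL)
        return ERR_NOENT;       /* no such service is registered */
    if (!srv->accepts)
        return ERR_REFUSED;     /* the server rejected the connection */
    for (int p = 1; p < MAX_PHONES; p++) {
        if (phones[p] == NULL) {
            phones[p] = srv;
            return p;           /* new non-negative phone number */
        }
    }
    return ERR_LIMIT;           /* the client has no free phone */
}

/* Counterpart of ipc_hangup(): release the phone slot. */
void hangup(int phone)
{
    if (phone > 0 && phone < MAX_PHONES)
        phones[phone] = NULL;
}
}}}

With service 5 registered as accepting, connect_me_to(5) returns phone 1 in this model (phone 0 being the naming service), and after hangup(1) the same slot is handed out again.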

=== Passing short IPC messages === #IpcShortMsg

On the lowest level, tasks communicate by making calls to other tasks to which they have an open phone. Each call is a data structure
accommodating six native arguments (i.e. six 32-bit arguments on 32-bit systems or six 64-bit arguments on 64-bit systems). The first of the
six arguments is interpreted as the method number in requests and as the return code in answers.

A method is either a system method or a protocol-defined method. System method numbers range from 0 to 1023; protocol-defined method numbers start
at 1024. In the case of system methods, the payload arguments have a predefined meaning and are interpreted by the kernel. In the case of
protocol-defined methods, the meaning of the payload arguments is defined by the protocol in question.
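The layout of a call and the split between system and protocol-defined methods can be expressed directly in C. This is an illustrative sketch with hypothetical names (call_words_t, FIRST_PROTOCOL_METHOD, is_system_method(), get_payload()); it uses uintptr_t because it has the native word width on common 32-bit and 64-bit platforms, and it does not reproduce HelenOS's own ipc_call_t.

{{{
#include <stdint.h>

#define FIRST_PROTOCOL_METHOD 1024   /* system methods are 0..1023 */

/* Six native-width words; args[0] is the method (request) or return code (answer). */
typedef struct {
    uintptr_t args[6];
} call_words_t;

/* Nonzero if the request carries a system method interpreted by the kernel. */
int is_system_method(const call_words_t *call)
{
    return call->args[0] < FIRST_PROTOCOL_METHOD;
}

/* Payload word n (1..5), numbered like IPC_GET_ARG1() .. IPC_GET_ARG5(); 0 if out of range. */
uintptr_t get_payload(const call_words_t *call, int n)
{
    if (n < 1 || n > 5)
        return 0;
    return call->args[n];
}
}}}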

Even though a call can be made using the low-level IPC primitives, doing so is strongly discouraged (unless you know what you are doing) in favor of
the asynchronous framework. Making an asynchronous request via the asynchronous framework is fairly easy, as can be seen in the following example:

{{{
#include <ipc/ipc.h>
#include <async.h>
...
        int vfs_phone;
        int lflag, oflag;
        aid_t req;
        ipc_call_t answer;
        ipcarg_t rc;
...
        req = async_send_3(vfs_phone, VFS_OPEN, lflag, oflag, 0, &answer);
...
        async_wait_for(req, &rc);
...
        if (rc != EOK) {
                /* handle error */
        }
}}}

In the example above, the standard library makes an asynchronous call to the VFS server.
The method number is ''VFS_OPEN'', and ''lflag'', ''oflag'' and 0 are the three payload arguments defined
by the VFS protocol. Note that the number of payload arguments is reflected in the numeric suffix of the async_send_3()
function name. There are analogous interfaces which take from zero to five payload arguments.

In this example, there are no return arguments apart from the return value. If there were some return arguments of interest,
the client could access them using the ''IPC_GET_ARG1()'' through ''IPC_GET_ARG5()'' macros on the ''answer'' variable.
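The split between async_send_3() and async_wait_for() is essentially a future: start the request, do other work, then collect the answer. The sketch below shows that pattern with POSIX threads standing in for the server, which is not how HelenOS implements it (there, the asynchronous framework switches fibrils rather than creating threads); request_t, send_2(), wait_for() and fake_server() are hypothetical names.

{{{
#include <pthread.h>
#include <stdint.h>

typedef struct {
    pthread_t thread;
    uintptr_t args[6];          /* request words in, answer words out */
} request_t;

/* Stand-in for the server: returns 0 (success) and arg1 + arg2 as the first return argument. */
static void *fake_server(void *p)
{
    request_t *r = p;
    uintptr_t sum = r->args[1] + r->args[2];
    r->args[0] = 0;             /* return code */
    r->args[1] = sum;           /* first return argument */
    return NULL;
}

/* Like async_send_2(): start the request and return without waiting. */
int send_2(request_t *r, uintptr_t method, uintptr_t a1, uintptr_t a2)
{
    r->args[0] = method;
    r->args[1] = a1;
    r->args[2] = a2;
    return pthread_create(&r->thread, NULL, fake_server, r);
}

/* Like async_wait_for(): block until the answer is available. */
uintptr_t wait_for(request_t *r, uintptr_t *ret1)
{
    pthread_join(r->thread, NULL);
    if (ret1 != NULL)
        *ret1 = r->args[1];
    return r->args[0];
}
}}}

Between send_2() and wait_for() the caller is free to do other work; wait_for() returns the return code and also hands back the first return argument, much like reading IPC_GET_ARG1() from the answer.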

The advantage of the asynchronous call is that the client doesn't block during the send operation and can do some useful work
before it starts to wait for the answer. If there is nothing to be done between sending the message and waiting for the answer,
it is better to perform a synchronous call. Using the asynchronous framework, this is achieved in the following way:

{{{
#include <ipc/ipc.h>
#include <async.h>
...
        int vfs_phone;
        int fildes;
        ipcarg_t rc;
...
        rc = async_req_1_0(vfs_phone, VFS_CLOSE, fildes);
        if (rc != EOK) {
                /* handle error */
        }
}}}

The example above illustrates how the standard library synchronously calls the VFS server and asks it to close the file descriptor passed
in the ''fildes'' argument, which is the only payload argument defined for the ''VFS_CLOSE'' method. The function name encodes the number of input and return arguments (here one and zero), so there are variants that take or return different numbers of arguments. Note that, unlike in the asynchronous example above, the return arguments would be stored directly into locations pointed to by pointers passed to the function.

The interface for answering calls is ''ipc_answer_n()'', where ''n'' is the number of return arguments. This is how the VFS server answers the ''VFS_OPEN'' call:

{{{
        ipc_answer_1(rid, EOK, fd);
}}}

In this example, ''rid'' is the hash of the received call, ''EOK'' is the return value and ''fd'' is the only return argument.

=== Passing large data via IPC === #IpcDataCopy

Passing five words of payload in a request and five words of payload in an answer is not well suited for larger data transfers. Instead, the application can use these
building blocks to negotiate the transfer of a much larger block (currently there is a hard limit of 64 KiB). The negotiation has three phases:

 * the initial phase, in which the client announces its intention to copy memory to or from the server,
 * the receive phase, in which the server learns about the bid, and
 * the final phase, in which the server either accepts or rejects the bid.

We use the terms client and server instead of sender and recipient, because either party can be the sender or the recipient, depending on the direction of the data transfer. In the following text, we'll cover both directions.

In theory, the programmer can implement all three phases using low-level short IPC messages. However, this can be tedious and error-prone, and therefore the standard library offers convenience wrappers for each phase instead.

==== Sending data ====
When sending data, the client is the sender and the server is the recipient. The following snippet illustrates the initial phase using the example of the libc ''open()'' call, which transfers the path name to the VFS server. The initial phase is also the only step needed on the sender's side.

{{{
#include <ipc/ipc.h>
...
int vfs_phone;
int rc;
char *pa;
size_t pa_len;
...
        rc = ipc_data_write_start(vfs_phone, pa, pa_len);
        if (rc != EOK) {
                /* an error or the recipient denied the bid */
        }
}}}

The ''pa'' and ''pa_len'' arguments specify the source address and the suggested number of bytes to transfer, respectively.
The recipient will be able to determine the size parameter of the transfer in the receive phase:

{{{
#include <ipc/ipc.h>
...
ipc_callid_t callid;
size_t len;
...
        if (!ipc_data_write_receive(&callid, &len)) {
                /* protocol error - the sender is not sending data */
        }
        /* success, the receive phase is complete */
}}}

After the receive phase, the recipient knows from the ''len'' variable how many bytes the sender is willing to send. So far, no data has been transferred.
The separation of the receive phase and the final phase is important, because it lets the recipient get ready for the transfer (e.g. allocate the required amount of memory).

Now the recipient is at a crossroads. It can do one of three things: it can answer the call with a non-zero return code, it can accept the transfer and restrict its
size, or it can accept the transfer including the suggested size. The latter two options are achieved like this:

{{{
char *path;
int rc;
...
        /* pass a value smaller than len to restrict the transfer */
        rc = ipc_data_write_finalize(callid, path, len);
        if (rc != EOK) {
                /* the transfer failed */
        }
}}}

After this call, ''len'' bytes will have been transferred to the address ''path''. The operation can theoretically fail, which is why the return value
of ''ipc_data_write_finalize()'' is checked above; a non-zero value indicates an error.
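The three phases of a write can be modelled as a small state machine, which also makes the recipient's options explicit. This is a sketch under stated assumptions, not the kernel's code: xfer_t, write_start(), write_receive(), write_finalize() and write_reject() are hypothetical names; the 64 KiB limit is checked in the initial phase, and a finalize request larger than the offered size is treated as an error because the recipient may only restrict the size.

{{{
#include <stddef.h>
#include <string.h>

#define DATA_XFER_LIMIT (64 * 1024)   /* the hard limit mentioned above */

typedef enum { XFER_OFFERED, XFER_RECEIVED, XFER_DONE, XFER_REJECTED } xfer_state_t;

typedef struct {
    const void *src;        /* sender's buffer, announced in the initial phase */
    size_t len;             /* size suggested by the sender */
    xfer_state_t state;
} xfer_t;

/* Initial phase: the sender announces the bid. */
int write_start(xfer_t *x, const void *src, size_t len)
{
    if (len > DATA_XFER_LIMIT)
        return -1;
    x->src = src;
    x->len = len;
    x->state = XFER_OFFERED;
    return 0;
}

/* Receive phase: the recipient learns the size; nothing is copied yet.
 * Returns 1 on success, 0 if no write bid is pending (protocol error). */
int write_receive(xfer_t *x, size_t *len)
{
    if (x->state != XFER_OFFERED)
        return 0;
    x->state = XFER_RECEIVED;
    *len = x->len;
    return 1;
}

/* Final phase: accept, possibly restricting the size, and copy the data. */
int write_finalize(xfer_t *x, void *dst, size_t size)
{
    if (x->state != XFER_RECEIVED || size > x->len)
        return -1;          /* wrong phase, or an attempt to enlarge the transfer */
    memcpy(dst, x->src, size);
    x->len = size;
    x->state = XFER_DONE;
    return 0;
}

/* Final phase, alternative: refuse the bid. */
void write_reject(xfer_t *x)
{
    x->state = XFER_REJECTED;
}
}}}

Only write_finalize() copies any bytes; until then the recipient has merely learned the size, which is exactly what gives it the chance to allocate memory or refuse.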

==== Accepting data ====

When accepting data, the client is the recipient and the server is the sender. The situation is similar to the previous one; the only difference is that the client
specifies the destination address and the largest possible size of the transfer. The server can send less data than requested. In the following example, the ''read()''
function in the standard library requests ''nbyte'' bytes of data to be read from a file system into the ''buf'' buffer:

{{{
#include <ipc/ipc.h>
...
int vfs_phone;
void *buf;
size_t nbyte;
ipcarg_t rc;
...
        rc = ipc_data_read_start(vfs_phone, buf, nbyte);
        if (rc != EOK) {
                /* handle error */
        }
}}}

Now the file system, say TMPFS, receives the request like this:

{{{
#include <ipc/ipc.h>
...
ipc_callid_t callid;
size_t len;
...
        if (!ipc_data_read_receive(&callid, &len)) {
                /* protocol error - the client is not requesting data */
        }
        /* success, the receive phase is complete */
}}}

After the receive phase is over, ''len'' is the maximum size of data the client is willing to accept. The sender can only restrict this value.
No data is transferred until the final phase is over. The final phase follows:

{{{
        /* send 'bytes' (at most len) bytes from offset 'pos' of the file's data */
        (void) ipc_data_read_finalize(callid, dentry->data + pos, bytes);
}}}

Here the sender specifies the source address and the actual number of bytes to transfer. After the function call completes, the data has been transferred to the recipient.
Note that the return value of ''ipc_data_read_finalize()'' is, perhaps unjustly, ignored here.
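On the sender's side of a read, the only arithmetic is choosing how many bytes to send, never more than the client's ''len''. The sketch below assumes a hypothetical in-memory file (file_t) and a hypothetical helper read_finalize_sketch(); it does not reproduce TMPFS, but it shows how a value like ''bytes'' in the snippet above can be derived from the requested size and the file position.

{{{
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory file standing in for a file system node's data. */
typedef struct {
    const char *data;
    size_t size;
} file_t;

/* 'len' is the most the client will accept (learned in the receive phase).
 * The sender may only shrink it, here to what is left in the file after 'pos'.
 * Returns the number of bytes actually copied into client_buf. */
size_t read_finalize_sketch(const file_t *f, size_t pos, size_t len, void *client_buf)
{
    size_t avail = (pos < f->size) ? f->size - pos : 0;   /* bytes left after pos */
    size_t bytes = (len < avail) ? len : avail;           /* never exceed len */
    if (bytes > 0)
        memcpy(client_buf, f->data + pos, bytes);         /* the final phase */
    return bytes;
}
}}}

Reading 10 bytes at offset 2 of a 6-byte file therefore transfers only 4 bytes, and reading past the end transfers nothing.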

=== Sharing memory via IPC === #IpcShareMem

In HelenOS, tasks can share memory only via IPC, as the kernel does not provide dedicated system calls for memory sharing. Instead, the tasks negotiate much like in the case of [#IpcDataCopy passing large data]. The negotiation has three phases and is very similar to the previous case:

 * the initial phase, in which the client announces its intention to share memory to or from the server,
 * the receive phase, in which the server learns about the bid, and
 * the final phase, in which the server either accepts or rejects the bid.

The roles of the client and the server also remain the same. Note that, just as with data copying, the direction of sharing is significant.

==== Sharing address space area out ====

When sharing an address space area with other tasks, the client is the sender and the server is the recipient. The client offers one of its address space areas to the server for sharing.
The following code snippet illustrates libblock's ''block_init()'' function offering the part of its address space starting at ''com_area'' to a block device associated with the ''dev_phone'' phone handle:

{{{
#include <ipc/ipc.h>
...
        int rc;
        int dev_phone;
        void *com_area;
...
        rc = ipc_share_out_start(dev_phone, com_area,
            AS_AREA_READ | AS_AREA_WRITE);
        if (rc != EOK) {
                /* handle error */
        }
}}}

This is how the RD server receives the address space area offer made above:

{{{
#include <ipc/ipc.h>
...
        ipc_callid_t callid;
        size_t maxblock_size;
        int flags;
...
        if (!ipc_share_out_receive(&callid, &maxblock_size, &flags)) {
                /* handle error */
        }
}}}

After the offer is received, the server has a chance to reject it by answering ''callid'' with an error code other than EOK. The reason for denial can be an inappropriate ''maxblock_size'' or unsuitable address space area flags in the ''flags'' variable. If the offer looks good to the server, it accepts it like this:

{{{
        void *fs_va;
        ...
        /* fs_va: the address at which the area is to be mapped in the server */
        (void) ipc_share_out_finalize(callid, fs_va);
}}}

Note that the return value of ''ipc_share_out_finalize()'' is, perhaps unjustly, ignored here.

The kernel will attempt to create the mapping only after the server calls ''ipc_share_out_finalize()''.
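A server's decision in the receive phase of a share-out boils down to a few checks on the offered size and flags. The sketch below is hypothetical: the AREA_* bits stand in for the real AS_AREA_* flags (whose values are not reproduced here), and the criteria (room for at least one block, read-write access) are an assumed policy for a block-device-like server, not the RD server's actual code.

{{{
#include <stddef.h>

/* Hypothetical flag bits standing in for AS_AREA_READ, AS_AREA_WRITE, ... */
#define AREA_READ      0x1
#define AREA_WRITE     0x2
#define AREA_CACHEABLE 0x8

#define EOK_SKETCH     0
#define EINVAL_SKETCH  (-1)   /* hypothetical error code */

/* Decide how to answer a share-out offer: EOK_SKETCH means "go on and
 * finalize", anything else is the error code to answer the call with. */
int check_share_out_offer(size_t area_size, int flags, size_t block_size)
{
    if (area_size < block_size)
        return EINVAL_SKETCH;                 /* area too small for one block */
    if ((flags & (AREA_READ | AREA_WRITE)) != (AREA_READ | AREA_WRITE))
        return EINVAL_SKETCH;                 /* server needs read-write access */
    return EOK_SKETCH;
}
}}}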

==== Sharing address space area in ====

When sharing memory from other tasks, the client is the recipient and the server is the sender. The client asks the server to provide an address space area. In the following example, the libfs library asks the VFS server to share the Path Lookup Buffer:

{{{
#include <ipc/ipc.h>
...
        fs_reg_t *reg;
        int vfs_phone;
        int rc;
...
        rc = ipc_share_in_start_0_0(vfs_phone, reg->plb_ro, PLB_SIZE);
        if (rc != EOK) {
                /* handle error */
        }
}}}

The VFS server learns about the request by executing the following code:

{{{
#include <ipc/ipc.h>
...
        ipc_callid_t callid;
        size_t size;
...
        if (!ipc_share_in_receive(&callid, &size)) {
                /* handle error */
        }
}}}

The server now has a chance to react to the request. If ''size'' does not meet the server's requirements, the server will reject the request. Otherwise, the server will accept it. Note that so far, the address space area flags have not been specified. That will happen in the final phase:

{{{
        uint8_t *plb;
...
        /* plb: the start of the server's Path Lookup Buffer */
        (void) ipc_share_in_finalize(callid, plb, AS_AREA_READ | AS_AREA_CACHEABLE);
}}}

Again, the kernel will not create the mapping before the server completes the final phase of the negotiation via ''ipc_share_in_finalize()''.
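In the share-in direction it is the server, as the sender, that picks the access flags in the final phase. The sketch below models that decision with hypothetical names (share_in_decision_t, decide_share_in(), AREA_*); the policy of requiring an exact size match and granting read-only, cacheable access is an assumption chosen to resemble the PLB example above.

{{{
#include <stddef.h>

#define AREA_READ      0x1   /* hypothetical stand-ins for AS_AREA_* */
#define AREA_WRITE     0x2
#define AREA_CACHEABLE 0x8

typedef struct {
    int accepted;   /* 0: answer with an error code, 1: finalize */
    int flags;      /* access rights chosen by the server (the sender) */
} share_in_decision_t;

/* The requested size must match the buffer the server is willing to export;
 * the server alone picks the flags, here read-only so the client cannot
 * modify the buffer. */
share_in_decision_t decide_share_in(size_t requested, size_t buffer_size)
{
    share_in_decision_t d = { 0, 0 };
    if (requested != buffer_size)
        return d;
    d.accepted = 1;
    d.flags = AREA_READ | AREA_CACHEABLE;
    return d;
}
}}}

Compare this with the share-out direction, where the flags arrive with the client's offer and the server can only accept or refuse them.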