= Sysel =

An effort to design a high-level programming language for writing HelenOS servers and applications.

 * [https://launchpad.net/sysel Sysel project at launchpad]
 * [http://trac.helenos.org/trac.fcgi/browser/head/uspace/dist/src/sysel/demos Examples of real (working) Sysel code]

Note that Sysel syntax is not finalized. Some important language features are missing at the moment (especially visibility control and packaging), so the examples presented here will need to change when these are implemented.

This article currently serves several purposes. First, as a memo, so that ideas and elaborations are not forgotten. Second, as a temporary source of information for anyone who wants to learn about Sysel (and the plans for it). Third, to share the plans and thus allow discussion and brainstorming.

== Roadmap ==

|| '''Sub-project name''' || '''Status''' || '''Description''' ||
|| Sysel Bootstrap Interpreter (SBI) || In progress || Interpreter of Sysel written in C. Runs in HelenOS and POSIX. ||
|| Sysel Compiler Toolkit (NNPS) || Not started || Modular compiler of Sysel written in Sysel itself. Intended to produce C and/or LLVM IR. ||

== SBI ==

SBI is an interpreter of Sysel that is currently in development. It is available stand-alone for POSIX or bundled with HelenOS (only in the ''Bazaar repository'', not yet in a stable release).

You can run it with the command "`sbi `''source_file.sy''". Demos that you can run are available in `/src/sysel/demos`. Source files comprising the library are in `/src/sysel/lib`. You can also run `sbi` without parameters to enter interactive mode.
=== Synopsis of current SBI features ===

 * Primitive types: `bool`, `char`, `int`, `string`
 * Compound types: class, multi-dimensional array
 * Other types: delegates, enumerations
 * Object-oriented features: constructors, inheritance, grandfather class, static and non-static method invocation
 * Syntactic sugar: variadic functions, accessor methods (named and indexed properties), autoboxing
 * Arithmetic: big integers, addition, subtraction, multiplication, boolean operators
 * Static type checking (mostly), generic classes (unconstrained), exception handling
 * Bindings: text file I/O, `WriteLine`, `Exec`

=== Missing SBI features ===

 * division
 * structs
 * interfaces
 * builtin object methods/properties
 * static class variables
 * visibility control
 * working with binary data
 * generic type constraints
 * method and operator overloading
 * code organization (packages and modules)

== Ideas for Sysel ==

Notes on features which are almost certain to appear in Sysel in one way or another.

=== Code organization ===

Sysel shall employ ''packages'' and ''modules''. Together, these two constructs provide full information about the organization of the codebase and allow a certain degree of freedom in how finely the code is partitioned, both in terms of namespace and code volume.

==== Packages ====

Packages provide two main features: a namespace and visibility controls. Packages thus provide a greater level of isolation than mere classes and allow safe composition of code developed by different (uncoordinated) teams. Packages can have a well-defined API/ABI and can be delivered in compiled form via libraries.

Each package has a name which must be fully qualified. Within a package, all symbol references only need to be qualified relative to the package. To reference symbols outside of the current package, they must either be imported or the reference must be fully qualified. (TODO: Should we enforce explicit import of all symbols?)
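The lookup rules just described can be sketched as a small resolver. This is a minimal Python illustration, not SBI code: the symbol table, the package and symbol names, and the `resolve` function are all hypothetical, and treating a name that matches more than one candidate as an error is an assumption (the article does not say how such a clash would be handled).

```python
# Hypothetical symbol table: every symbol in the program, keyed by its
# fully qualified name. Package and symbol names are made up.
GLOBAL_SYMBOLS = {
    "helenos.console.IConsole",
    "helenos.console.IVC",
    "app.editor.Buffer",
    "app.editor.Editor",
}


def resolve(name, current_package, imports):
    """Resolve a symbol reference made inside current_package.

    imports maps an unqualified name to the fully qualified name of an
    individually imported symbol. Returns the fully qualified name.
    """
    candidates = set()
    # 1. Reference qualified relative to the current package.
    local = current_package + "." + name
    if local in GLOBAL_SYMBOLS:
        candidates.add(local)
    # 2. Symbol explicitly imported into the current package.
    if name in imports:
        candidates.add(imports[name])
    # 3. Fully qualified reference to a symbol in any package.
    if name in GLOBAL_SYMBOLS:
        candidates.add(name)
    if not candidates:
        raise NameError("unresolved symbol: " + name)
    if len(candidates) > 1:
        # Assumption: a clash is reported instead of silently picking one.
        raise NameError("ambiguous symbol: " + name)
    return candidates.pop()
```

For example, inside package `app.editor` the name `Buffer` resolves to `app.editor.Buffer`, while `IConsole` from another package is only found if it was imported or written as `helenos.console.IConsole`.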
Symbols can only be imported individually or in a qualified manner. This ensures that there can be no collisions between symbols from different namespaces (which need not be under the control of the same entity). When importing symbols, the symbols being imported must be specified using their fully qualified names.

==== Modules ====

Modules provide a complementary, finer-grained means of decomposition. Usually each source file corresponds to exactly one module. For each module we define its (unqualified) name and the fully qualified name of the package it belongs to (which 'anchors' it in the code base). Conversely, each package specifies all modules it consists of. Consequently, for each module we can determine which package it belongs to, and for each package we can determine all modules (and thus all symbols) it consists of.

As explained above, modules allow the source code to be broken into separate files and at the same time tie it together in a formal manner. When building a package or program, there is thus no need to list all its source files informally in a makefile. It is sufficient to point the compiler to the directories where it should look for source files and tell it which package we want built.

Modules do not represent a namespace. Any symbols defined or imported in one module will be accessible (unqualified) in any other module within the same package. Names of global symbols in all modules of a package must therefore be coordinated. Note that due to the object-oriented nature of the language there are usually not very many global symbols defined in a package, and packages are assumed to be under the control of a single entity.

Definitions of classes can be split across multiple modules (but not packages). Thus large classes can be split across multiple source files.
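Because each module names its package and each package lists its modules, a build tool can collect the sources of a package without a makefile by scanning the source directories and cross-checking the two directions of this link. The Python sketch below assumes that scanning and parsing of module headers has already happened; the `Module` record and `sources_for_package` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    path: str     # source file the module was read from
    name: str     # unqualified module name
    package: str  # fully qualified name of the owning package


def sources_for_package(package, declared, modules):
    """Return the source files that make up `package`.

    declared maps a package name to the module names the package lists;
    modules are all modules found while scanning the source directories.
    Both directions of the module <-> package link are checked, so no
    file is silently left out of, or pulled into, the build.
    """
    found = {m.name: m for m in modules if m.package == package}
    listed = set(declared.get(package, ()))
    missing = listed - set(found)
    if missing:
        raise ValueError("listed but not found: " + ", ".join(sorted(missing)))
    unlisted = set(found) - listed
    if unlisted:
        raise ValueError("not listed by package: " + ", ".join(sorted(unlisted)))
    return sorted(found[name].path for name in listed)
```

A module that claims a package without being listed by it (or vice versa) is reported rather than guessed at, which is one way of making the "formal" tie between files and packages enforceable.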
=== Dynamic linking ===

It should be possible to use, with similar simplicity and the same level of static type checking, not only ''compulsory libraries'', but also ''optional libraries'' and ''plugin libraries''.

Compulsory libraries are those required every time the executable is invoked (equivalent to `gcc -lname`).

Optional libraries are only loaded once the application touches some symbol from the library. This is a very useful feature that allows building binaries with all optional dependencies enabled, yet the user need not install all these libraries if they do not want to. This helps avoid ''dependency avalanches''.

Plugin libraries are those where multiple libraries can exist, written against some common plugin interface. One possibility is to have ''packages'' implement ''package interfaces''. A package could be loaded at run time and a reference to it stored in a variable whose type is the ''package interface'' type. Then it would be possible to refer to symbols within the dynamic package using standard qualified names (e.g. `P.symbol`). This enables full static type checking / interface checking for both the implementor and the user of the plugin.

=== Remote objects ===

==== Basics ====

HelenOS IPC is usually employed in an RPC-like style. Remote objects would support asynchronous messaging in the language itself.

Remote object classes (and interfaces) form a hierarchy of inheritance separate from that of the ''local'' classes and interfaces. Remote interfaces are equivalent to the IPC interfaces now usually defined in HelenOS in `uspace/lib/c/include/ipc`. They would naturally support (multiple) inheritance. Servers contain remote classes which implement these interfaces.

When a client wants to use some service, it is given a reference to a remote object. This reference identifies not only the server which we talk to, but possibly also the individual resource within the server that we are accessing.
For a contrived example, a console server might provide the following two interfaces:

{{{
interface IConsole, remote is
	fun GetVC(vc_index : int) : IVC;
end

interface IVC, remote is
	fun GotoXY(x, y : int);
	fun Write(s : string);
end
}}}

When we invoke the GetVC() method, the console server will pass us a reference to the remote object implementing the requested VC. Then we can work with this particular VC using that reference:

{{{
var C : IConsole;
var VC : IVC;

C = NameService.GetConnection("console") as IConsole;
VC = C.GetVC(2);
VC.GotoXY(10, 10);
VC.Write("Hello World!");
}}}

Connection creation and termination, as well as transaction management (identifying the objects being worked with), are handled automatically by the language run-time. Also handled automatically is the creation of threads and fibrils within a server. A server can potentially handle any number of parallel requests (though it might be possible to limit this with some quota, if required). Concurrent access to remote objects is possible (and often desired).

==== Remote invocation ====

When a method of a remote object is invoked, the method ID and its parameters are serialized and the resulting message is sent to the server. On the server, the method ID and arguments are de-serialized and the implementation of the method is invoked. When the method returns, the return value (and possibly output arguments) are serialized and sent back to the client. At the client, the return value(s) are de-serialized and returned to the caller.

Some notes:
 * Multiple threads/fibrils may use the same remote object in parallel without fear of blocking each other (as long as the server is properly implemented).
 * Stateful services can be implemented by the server handing out state objects (such as an open-file object on a file server).
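The serialize / invoke / de-serialize round trip can be made concrete with a minimal single-process sketch in Python, modelled on the console example. Everything here is an assumption for illustration: the method IDs, the wire format (little-endian header of object ID and method ID, followed by tagged int/string fields), and the names `Server`, `call`, `encode` and `decode`. A direct function call stands in for HelenOS IPC, and there are no threads or fibrils, so the concurrency notes above are not modelled. The object ID in each request plays the role of the part of a remote reference that identifies a resource inside the server.

```python
import struct

# Hypothetical method IDs; a Sysel compiler would assign these.
CONSOLE_GET_VC = 1
VC_GOTO_XY = 1
VC_WRITE = 2


def encode(values):
    """Serialize ints and strings as tagged, length-prefixed fields."""
    out = bytearray()
    for v in values:
        if isinstance(v, int):
            out += b"i" + struct.pack("<q", v)
        else:  # str
            data = v.encode("utf-8")
            out += b"s" + struct.pack("<I", len(data)) + data
    return bytes(out)


def decode(buf):
    """Inverse of encode()."""
    values, pos = [], 0
    while pos < len(buf):
        tag, pos = buf[pos:pos + 1], pos + 1
        if tag == b"i":
            values.append(struct.unpack_from("<q", buf, pos)[0])
            pos += 8
        else:
            (n,) = struct.unpack_from("<I", buf, pos)
            values.append(buf[pos + 4:pos + 4 + n].decode("utf-8"))
            pos += 4 + n
    return values


class Server:
    """Holds exported objects and runs the de-serialize/invoke/serialize step."""

    def __init__(self):
        self.objects = {}

    def export(self, obj):
        object_id = len(self.objects) + 1
        self.objects[object_id] = obj
        return object_id

    def handle(self, request):
        object_id, method_id = struct.unpack_from("<II", request, 0)
        obj = self.objects[object_id]
        method = getattr(obj, obj.METHODS[method_id])
        result = method(*decode(request[8:]))
        return encode([] if result is None else [result])


class VC:
    METHODS = {VC_GOTO_XY: "goto_xy", VC_WRITE: "write"}

    def __init__(self):
        self.pos, self.text = (0, 0), ""

    def goto_xy(self, x, y):
        self.pos = (x, y)

    def write(self, s):
        self.text += s


class Console:
    METHODS = {CONSOLE_GET_VC: "get_vc"}

    def __init__(self, server, count):
        # Each VC is exported separately, so it gets its own object ID.
        self.vc_ids = [server.export(VC()) for _ in range(count)]

    def get_vc(self, vc_index):
        return self.vc_ids[vc_index]


def call(server, object_id, method_id, *args):
    """Client stub: serialize the call, 'send' it, de-serialize the reply."""
    request = struct.pack("<II", object_id, method_id) + encode(args)
    reply = decode(server.handle(request))
    return reply[0] if reply else None
```

Following the Sysel snippet above, `call(srv, console_id, CONSOLE_GET_VC, 2)` returns the object ID of VC number 2, which then serves as the target of the subsequent `VC_GOTO_XY` and `VC_WRITE` calls.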
==== Promises ====

[http://en.wikipedia.org/wiki/Futures_and_promises Promises] can be used to express asynchronous behavior and potentially allow for [http://en.wikipedia.org/wiki/Promise_pipelining#Promise_pipelining promise pipelining] (a form of optimization). In our case it would suffice to have a specialized form of promise, one that promises some data to be delivered from a remote object.

Promises would be declared using a prefix type operator `future`. As long as the data received from a remote object stays in a type that is `future`, it is handled in an asynchronous fashion. Once the data is converted to a non-future type, the execution blocks until the data is received.

Example:

{{{
interface IAsyncIO, remote is
	fun AReadBlock(addr : int) : future Block;
end
}}}

{{{
fun ReadBlocksParallel(start_addr, count : int) : Block[] is
	var fblock : (future Block)[];

	fblock = new (future Block)[count];
	for i in range(0, count) do
		-- This does not block
		fblock[i] = AReadBlock(start_addr + i);
	end

	-- All reads are now being executed in parallel.
	-- Each array element is implicitly converted from future Block to Block.
	-- This blocks until all data has been received.
	return fblock;
end
}}}

=== String language specification ===

It has been suggested by Pavel Rimsky that string literals in a program very often contain data in some machine-readable language (e.g. format strings, SQL statements) or references to external resources. It might be useful to be able to specify this in the program, so that external tools could recognize and work with these strings for purposes such as syntax checking, refactoring, etc.

Note: This means ''identifying'' the language the string contains. Defining any ''properties'' (e.g. syntax) of the language the string contains is out of scope!

One typical example here is a formatting function. The format string argument is in a well-defined language. Here it would be useful to specify the language of the formal argument.
With any use of such a function, the compilation tools could then try to verify the actual argument. Similarly, we might want to specify the language of a member variable.

A different approach is specifying the language of a string literal in situ. This is reminiscent of (X)HTML, which allows embedding pieces of code written in different languages (e.g. CSS, ECMAScript) while specifying the external language using its MIME type, or of language constructs such as `extern "C"`.

Both approaches could be combined.

TODO: Consider where language annotations would be useful and how they should be realized lexically and syntactically. (Must look pretty!)

=== Miscellaneous ideas ===

These ideas are being considered for inclusion (but need not be included). They need elaborating.

==== Member pointers ====

Delegates identify the object instance and the method to be called (but not the arguments). Conversely, member pointers identify the method to be called, but neither the arguments nor the object instance (a member pointer can be invoked on any object which is an instance of a given class). This feature comes from C++.

==== True inner classes ====

A true inner class is non-static in the sense that any instance of this class implicitly contains a reference to some instance of the outer class. Thus the inner class is constructed in a non-static context (in the context of an object instance), and the outer object can be referenced via a keyword.

==== Output function arguments ====

Semantically equivalent to additional return values of a function. A simple way to return multiple values (especially since Sysel does not have tuples).

==== Built-in associative arrays ====

Maps and sets are so commonly used and so immensely useful that it might be worth incorporating them into the language core. This could bring greater ease of use and optimization opportunities.