
Version 24 (modified by Jiri Svoboda, 11 years ago)


Sysel

An effort to design a high-level programming language for writing HelenOS servers and applications.

Note that Sysel syntax is not finalized. Some important language features are missing at the moment (especially visibility control and packaging) so the examples presented will need to change when these are implemented.

This article currently serves several purposes. First, it is a memo, so that ideas and elaborations are not forgotten. Second, it is a temporary source of information for anyone who wants to learn about Sysel (and the plans for it). Third, sharing the plans allows discussion and brainstorming.

Roadmap

Sub-project name                    Status        Description
Sysel Bootstrap Interpreter (SBI)   In progress   Interpreter of Sysel written in C. Runs in HelenOS and POSIX.
Sysel Compiler Toolkit (NNPS)       Not started   Modular compiler of Sysel written in Sysel itself. To produce C and/or LLVM IR.

SBI

SBI is an interpreter of Sysel currently in development. It is available stand-alone for POSIX or bundled with HelenOS (only in the Bazaar repository, not yet in a stable release). You can run it with the command "sbi source_file.sy". Runnable demos are available in /src/sysel/demos. The source files comprising the library are in /src/sysel/lib.

You can also run sbi without parameters to enter interactive mode.

Synopsis of current SBI features

  • Primitive types: bool, char, int, string
  • Compound types: class, multi-dimensional array
  • Other types: delegates, enumerations
  • Object-oriented features: constructors, inheritance, grandfather class, static and non-static method invocation
  • Syntactic sugar: variadic functions, accessor methods (named and indexed properties), autoboxing
  • Arithmetic: big integers, addition, subtraction, multiplication, boolean operators
  • Static type checking (mostly), generic classes (unconstrained), exception handling
  • Bindings: Text file I/O, WriteLine, Exec

Missing SBI features

  • division
  • structs
  • interfaces
  • builtin object methods/properties
  • static class variables
  • visibility control
  • working with binary data
  • generic type constraints
  • method and operator overloading
  • code organization (packages and modules)

Ideas for Sysel

Notes on features which are almost certain to appear in Sysel in one way or another.

Code organization

Sysel shall employ packages and modules. Together, these two constructs provide full information about organization of the codebase and allow for a certain degree of freedom in how finely the code is partitioned, both in terms of namespace and code volume.

Packages

Packages provide two main features: a namespace and visibility controls. Packages thus provide a greater level of isolation than mere classes and allow safe composition of code developed by different (uncoordinated) teams. Packages can have a well defined API/ABI and can be delivered in compiled form via libraries. Each package has a name which must be fully qualified.

Within a package all symbol references only need to be qualified relative to the package. To reference symbols outside of the current package they must be either imported or the reference must be fully qualified. (TODO Should we enforce explicit import of all symbols?) Symbols can only be imported individually or in a qualified manner. This ensures that there can be no collisions of symbols from different namespaces (which need not be under the control of the same entity). When importing symbols the symbols being imported must be specified using their fully qualified names.
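The resolution rules above can be modelled in a few lines. The following is a minimal sketch in Python rather than Sysel (whose import syntax is not yet defined); the class `Package`, its methods, and the package names used with it are hypothetical and serve only to illustrate the three ways a reference can be resolved and why individual imports rule out silent collisions.

```python
class Package:
    """Toy model of a package namespace (illustrative only, not Sysel)."""

    def __init__(self, name, symbols):
        self.name = name              # fully qualified package name
        self.symbols = set(symbols)   # symbols defined in this package
        self.imports = {}             # local name -> fully qualified name

    def import_symbol(self, qualified_name):
        """Import a single symbol, given its fully qualified name."""
        local = qualified_name.rsplit(".", 1)[-1]
        if local in self.symbols or local in self.imports:
            # Individual imports make every collision visible at import time.
            raise NameError(f"'{local}' collides with an existing symbol")
        self.imports[local] = qualified_name

    def resolve(self, ref):
        """Return the fully qualified name that a reference denotes."""
        if ref in self.symbols:       # qualified relative to this package
            return f"{self.name}.{ref}"
        if ref in self.imports:       # individually imported symbol
            return self.imports[ref]
        if "." in ref:                # assumed to be fully qualified already
            return ref
        raise NameError(f"unresolved symbol '{ref}'")
```

For instance, a package named `org.helenos.app` (an invented name) that imports `org.helenos.net.Socket` can then refer to `Socket` directly, while a later attempt to import `com.vendor.Socket` is rejected instead of shadowing the first import.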

Modules

Modules provide a complementary and finer-grained means of decomposition. Usually each source file corresponds to exactly one module. For each module we define its (unqualified) name and fully qualified name of the package it belongs to (which 'anchors' it in the code base). Conversely, each package specifies all modules it consists of. Consequently, for each module we can determine which package it belongs to and for each package we can determine all modules (and thus all symbols) it consists of.

As we explained, modules allow the source code to be broken into separate files and at the same time tie it together in a formal manner. When building a package or program, there is thus no need to specify all its source files informally in a makefile. It is sufficient to point the compiler to directories where it should look for source files and tell it which package we want built.
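As a rough illustration of how a build tool could exploit this, the Python sketch below scans search directories for `.sy` files and collects the modules anchored in a requested package. The header line `module <name> of <package>;` is purely hypothetical (Sysel has no module syntax yet), as are the function names.

```python
import re
from pathlib import Path

# Hypothetical header line; Sysel has no finalized module syntax.
HEADER = re.compile(r"^\s*module\s+(\w+)\s+of\s+([\w.]+)\s*;")


def find_modules(search_dirs, package):
    """Map module name -> source file for every module anchored in `package`."""
    found = {}
    for directory in search_dirs:
        for path in sorted(Path(directory).rglob("*.sy")):
            with path.open(encoding="utf-8") as source:
                match = HEADER.match(source.readline())
            if match and match.group(2) == package:
                found[match.group(1)] = path
    return found


def check_package(declared_modules, found):
    """Compare the package's own module list with the discovered modules."""
    missing = sorted(set(declared_modules) - set(found))
    undeclared = sorted(set(found) - set(declared_modules))
    return missing, undeclared
```

Because each module names its package and each package lists its modules, the two directions can be cross-checked: `check_package` reports modules the package declares but the search did not find, and modules that claim membership without being declared.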

Modules do not represent a namespace. Any symbols defined or imported in one module will be accessible (unqualified) in any other module within the same package. Names of global symbols in all modules of a package must therefore be coordinated. Note that due to the object-oriented nature of the language there are usually not very many global symbols defined in a package, and packages are assumed to be under the control of a single entity.

Definitions of classes can be split across multiple modules (but not packages). Thus large classes can be split across multiple source files.

Dynamic linking

It should be possible to use, with similar simplicity and the same level of static type checking, not only compulsory libraries, but also optional libraries and plugin libraries.

Compulsory libraries are those required every time the executable is invoked (equivalent to gcc -lname). Optional libraries are only loaded once the application touches some symbol from the library. This is a very useful feature that allows building binaries with all optional dependencies enabled, yet the user need not install all these libraries if they do not want to. This helps avoid dependency avalanches.
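The load-on-first-touch behavior of optional libraries can be imitated in Python, shown here only as an analogy. The class `OptionalLibrary` is a hypothetical name, and unlike the Sysel proposal this sketch provides no static type checking; it only demonstrates the timing of the load.

```python
import importlib


class OptionalLibrary:
    """Defers loading a module until one of its symbols is first touched."""

    def __init__(self, module_name):
        self._module_name = module_name
        self._module = None

    @property
    def loaded(self):
        return self._module is not None

    def __getattr__(self, symbol):
        # Invoked only for names not found on the wrapper itself,
        # that is, for symbols that belong to the wrapped library.
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return getattr(self._module, symbol)
```

Creating `OptionalLibrary("json")` loads nothing; the import happens on the first access such as `lib.dumps(...)`. A wrapper around a library that is not installed is harmless until a symbol from it is actually used, at which point the import fails.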

Plugin libraries are those where multiple libraries can exist, each written against some common plugin interface. One possibility is to have packages implement package interfaces. A package could be loaded at run time and a reference to it stored in a variable whose type is the package interface type. Then it would be possible to refer to symbols within the dynamic package using standard qualified names (e.g. P.symbol). This enables full static type checking / interface checking for both the implementor and the user of the plugin.
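A dynamic approximation of this idea, again in Python and only as a sketch: a plugin is loaded by name at run time and checked against an interface before a reference to it is handed out. The interface is reduced to a tuple of required callable names, which is an assumption made for brevity; `SERIALIZER_INTERFACE` and `load_plugin` are hypothetical names, and the check happens at load time rather than statically as proposed for Sysel.

```python
import importlib

# Hypothetical plugin interface: the symbols every plugin must provide.
SERIALIZER_INTERFACE = ("dumps", "loads")


def load_plugin(module_name, interface):
    """Load a plugin at run time and verify that it implements `interface`."""
    plugin = importlib.import_module(module_name)
    missing = [name for name in interface
               if not callable(getattr(plugin, name, None))]
    if missing:
        raise TypeError(
            f"{module_name} does not implement: {', '.join(missing)}")
    return plugin
```

The standard `json` and `pickle` modules both happen to satisfy this interface, so `P = load_plugin("json", SERIALIZER_INTERFACE)` followed by `P.loads(P.dumps([1, 2]))` works with either, while `load_plugin("math", SERIALIZER_INTERFACE)` is rejected.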

Remote objects

Basics

HelenOS IPC is usually employed in an RPC-like style. Remote objects would support asynchronous messaging in the language itself. Remote object classes (and interfaces) form an inheritance hierarchy separate from that of the local classes and interfaces. Remote interfaces are equivalent to the IPC interfaces now usually defined in HelenOS in uspace/lib/c/include/ipc. They would naturally support (multiple) inheritance. Servers contain remote classes which implement these interfaces.

When a client wants to use some service, it is given a reference to a remote object. This reference identifies not only the server being talked to, but possibly also the individual resource within the server that is being accessed. For a contrived example, a console server might provide the following two interfaces:

interface IConsole, remote is
        fun GetVC(vc_index : int) : IVC;
end

interface IVC, remote is
        fun GotoXY(x, y : int);
        fun Write(s : string);
end

When we invoke the GetVC() method, the console server will pass us a reference to the remote object implementing the requested VC. Then we can work with this particular VC using that reference:

var Con : IConsole;
var VC : IVC;

Con = NameService.GetConnection("console") as IConsole;
VC = Con.GetVC(2);
VC.GotoXY(10, 10);
VC.Write("Hello World!");

Connection creation and termination, as well as transaction management (identifying the objects being worked with), are handled automatically by the language run-time. The creation of threads and fibrils within a server is also handled automatically. A server can potentially handle any number of parallel requests (though it might be possible to limit this with some quota, if required). Concurrent access to remote objects is possible (and often desired).

Remote invocation

When a method of a remote object is invoked, the method ID and its parameters are serialized and the resulting message is sent to the server. On the server the method ID and arguments are de-serialized and the implementation of the method is invoked. When the method returns, the return value (and possibly output arguments) are serialized and sent back to the client. At the client the return value(s) are de-serialized and returned to the caller.

Some notes:

  • Multiple threads/fibrils may use the same remote object in parallel without fear of blocking each other (as long as the server is properly implemented)
  • Stateful services can be implemented by the server handing out state objects (such as open-file object on a file server).
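The round trip described above can be sketched as follows. This is a Python illustration, not HelenOS IPC: JSON stands in for the wire format, a plain function call stands in for the transport, and the method ID numbering, the class names and the character count returned by `write` are all assumptions made so that the return path has something to carry.

```python
import json


class VirtualConsole:
    """Server-side object (hypothetical stand-in for a remote class)."""

    def __init__(self):
        self.x, self.y, self.output = 0, 0, []

    def goto_xy(self, x, y):
        self.x, self.y = x, y

    def write(self, s):
        self.output.append((self.x, self.y, s))
        return len(s)  # assumed return value, to exercise the reply path


# Method ID -> method name (assumed numbering).
METHODS = {1: "goto_xy", 2: "write"}


def serve(obj, request):
    """Server side: de-serialize, invoke the method, serialize the result."""
    message = json.loads(request)
    method = getattr(obj, METHODS[message["method"]])
    result = method(*message["args"])
    return json.dumps({"result": result}).encode()


class Proxy:
    """Client side: turns a method call into a request/reply exchange."""

    def __init__(self, transport):
        self._transport = transport  # callable: request bytes -> reply bytes

    def _call(self, method_id, *args):
        request = json.dumps({"method": method_id, "args": args}).encode()
        return json.loads(self._transport(request))["result"]

    def goto_xy(self, x, y):
        return self._call(1, x, y)

    def write(self, s):
        return self._call(2, s)
```

With `vc = VirtualConsole()` and `proxy = Proxy(lambda request: serve(vc, request))`, calls such as `proxy.goto_xy(10, 10)` look like local calls to the caller while every argument and result passes through serialization.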

Promises

Promises can be used to express asynchronous behavior and potentially allow for promise pipelining (a form of optimization). In our case it would suffice to have a specialized form of promise, one that promises some data to be delivered from a remote object. Promises would be declared using a prefix type operator future.

As long as the data received from a remote object stays in a type that is future, it is handled in an asynchronous fashion. Once the data is converted to a non-future type, the execution blocks until the data is received.

Example:

interface IAsyncIO is
        fun AReadBlock(addr : int) : future Block;
end

fun ReadBlocksParallel(start_addr, count : int) : Block[] is
        var fblock : (future Block)[];

        for i in range(0, count) do
                -- This does not block
                fblock[i] = AReadBlock(start_addr + i);
        end

        -- All reads are now being executed in parallel.

        -- Each array element is implicitly converted from future Block to Block.
        -- This blocks until all data has been received.
        return fblock;
end
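For comparison, roughly the same pattern can be written today with Python's `concurrent.futures`. This is an analogy only: `read_block` is a hypothetical stand-in for a remote read, and the conversion from future to value is explicit (`result()`) rather than implicit as proposed for Sysel.

```python
from concurrent.futures import ThreadPoolExecutor


def read_block(addr):
    """Stand-in for a slow remote block read (hypothetical)."""
    return bytes([addr % 256]) * 4


def read_blocks_parallel(pool, start_addr, count):
    # submit() returns a Future immediately; this loop does not block.
    fblocks = [pool.submit(read_block, start_addr + i) for i in range(count)]

    # All reads have now been handed to the pool.

    # result() blocks until the data is available. It is the explicit
    # counterpart of converting `future Block` to `Block`.
    return [fblock.result() for fblock in fblocks]
```

A call such as `read_blocks_parallel(pool, 5, 3)` with `pool = ThreadPoolExecutor(max_workers=4)` returns the blocks in request order, regardless of the order in which the individual reads complete.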

String language specification

It has been suggested by Pavel Rimsky that very often string literals in a program contain data in some machine readable language (e.g. format strings, SQL statements) or references to external resources. It might be useful to be able to somehow specify this in the program, so that external tools could recognize and work with these for purposes such as syntax checking, refactoring, etc.

Note: That means identifying the language the string contains. Defining any properties (e.g. syntax) of the language the string contains is out of scope!

One typical example here is a formatting function. The format string argument is in a well-defined language. Here it would be useful to specify the language of the formal argument. For every use of such a function the compilation tools could then try to verify the actual argument. Similarly, we might want to specify the language for a member variable.

A different approach is specifying the language of a string literal in situ. This is reminiscent of (X)HTML, which allows embedding pieces of code written in different languages (e.g. CSS, ECMAScript) while specifying the external language using its MIME type, or of language constructs such as extern "C".

Both approaches could be combined.
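To make the first approach concrete, here is how a language tag on a formal argument might look in Python using `typing.Annotated`. This is a sketch under assumptions: the `Language` marker, the tag name `python-format` and the helper functions are all hypothetical. In keeping with the note above, the annotation only identifies the language; checking the literal is left to an external tool, played here by `check_format_literal`.

```python
import string
from typing import Annotated, get_type_hints


class Language:
    """Marker naming the language a string is written in (hypothetical)."""

    def __init__(self, name):
        self.name = name


def format_message(fmt: Annotated[str, Language("python-format")], *args) -> str:
    return fmt.format(*args)


def annotated_languages(func):
    """Return {parameter name: language name} for tagged string parameters."""
    hints = get_type_hints(func, include_extras=True)
    languages = {}
    for parameter, hint in hints.items():
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, Language):
                languages[parameter] = meta.name
    return languages


def check_format_literal(literal):
    """External-tool stand-in: does the literal parse as a format string?"""
    try:
        list(string.Formatter().parse(literal))
        return True
    except ValueError:
        return False
```

A tool would first call `annotated_languages(format_message)` to learn that the `fmt` argument holds a format string, and could then run a checker such as `check_format_literal` on each literal passed in that position.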

TODO Consider where language annotations would be useful and how they should be realized lexically and syntactically. (Must look pretty!)

Miscellaneous ideas

These ideas are being considered for inclusion but need elaboration:

  • Member pointers
  • True inner classes
  • Output function arguments
  • Built-in associative arrays