Ideas for Sysel

This article currently serves several purposes. First, it is a memo so that ideas and their elaborations are not forgotten. Second, it is a temporary source of information for anyone who wants to learn about the plans for Sysel. Third, sharing the plans invites discussion and brainstorming. Your comments and ideas are appreciated!

Unless stated otherwise, the ideas presented here are planned for inclusion, although they are likely to evolve as the implementation progresses.

Code organization

Note: This section might be out of date.

Sysel shall employ packages and modules. Together, these two constructs fully describe the organization of the code base and allow a certain degree of freedom in how finely the code is partitioned, both in terms of namespace and code volume.

Packages

Note: This section might be out of date.

Packages provide two main features: a namespace and visibility control. They thus provide a greater level of isolation than mere classes and allow safe composition of code developed by different (uncoordinated) teams. A package can have a well-defined API/ABI and can be delivered in compiled form as a library. Each package has a name, which must be fully qualified.

Within a package, symbol references only need to be qualified relative to the package. To reference a symbol outside the current package, it must either be imported or referenced by its fully qualified name. (TODO Should we enforce explicit import of all symbols?) Symbols can only be imported individually or in a qualified manner. This ensures that symbols from different namespaces (which need not be under the control of the same entity) cannot collide. When importing a symbol, it must be specified by its fully qualified name.
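The lookup rules above can be modelled in a few lines. This is a minimal sketch, not Sysel's compiler: the Package class, the resolve() function and the package names are hypothetical. A reference is tried as a package-relative name, then as an individually imported symbol, then as a fully qualified name; because imports are individual and explicit, a name clash is reported at import time.

```python
# Hypothetical model of the package lookup rules; all names are illustrative.

class Package:
    def __init__(self, qname, symbols):
        self.qname = qname            # fully qualified package name
        self.symbols = set(symbols)   # symbols defined in this package
        self.imports = {}             # unqualified alias -> fully qualified symbol

    def import_symbol(self, fq_symbol):
        """Import a single symbol, given by its fully qualified name."""
        alias = fq_symbol.rpartition(".")[2]
        if alias in self.symbols or alias in self.imports:
            # Explicit per-symbol imports make every clash visible here.
            raise NameError(f"import of {fq_symbol} collides with {alias}")
        self.imports[alias] = fq_symbol


def resolve(pkg, name, universe):
    """Resolve a reference made inside pkg to a fully qualified name.

    universe maps fully qualified package names to Package objects.
    """
    if name in pkg.symbols:                      # package-relative reference
        return f"{pkg.qname}.{name}"
    if name in pkg.imports:                      # individually imported symbol
        return pkg.imports[name]
    pkg_name, _, sym = name.rpartition(".")      # fully qualified reference
    if pkg_name in universe and sym in universe[pkg_name].symbols:
        return name
    raise NameError(f"unresolved symbol {name} in {pkg.qname}")
```

In this model, importing org.example.lib.Stack into a package that already defines Stack fails immediately instead of silently shadowing one of the two.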

Modules

Modules provide a complementary, finer-grained means of decomposition. Usually each source file corresponds to exactly one module. Each module declares its (unqualified) name and the fully qualified name of the package it belongs to (which 'anchors' it in the code base). Conversely, each package specifies all modules it consists of. Consequently, for each module we can determine which package it belongs to, and for each package we can determine all modules (and thus all symbols) it consists of.

As explained above, modules allow the source code to be broken into separate files while tying it together in a formal manner. When building a package or program, there is thus no need to list all its source files informally in a makefile. It is sufficient to point the compiler to the directories where it should look for source files and tell it which package to build.
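Because module and package declarations point at each other, a compiler can collect a package's sources from a directory scan and cross-check them. A minimal sketch, assuming each scanned file has already yielded a (module name, package name) pair and each package lists its modules in a manifest; both representations and the collect_package() helper are hypothetical.

```python
# Hypothetical sketch: locating and cross-checking the sources of a package
# without a makefile. The declaration formats are illustrative assumptions.

def collect_package(sources, package, manifest):
    """Return {module name: path} for `package`, checking both directions.

    sources:  {path: (module_name, package_qname)} found by scanning directories
    manifest: set of module names the package declares it consists of
    """
    found = {}
    for path, (module, pkg) in sorted(sources.items()):
        if pkg != package:
            continue                      # file belongs to another package
        if module in found:
            raise ValueError(f"module {module} defined in {found[module]} and {path}")
        if module not in manifest:
            raise ValueError(f"{path}: module {module} not listed by {package}")
        found[module] = path
    missing = manifest - set(found)       # listed by the package, but no source
    if missing:
        raise ValueError(f"{package}: no source for modules {sorted(missing)}")
    return found
```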

Modules do not represent a namespace. Any symbol defined or imported in one module is accessible (unqualified) in every other module of the same package. Names of global symbols must therefore be coordinated across all modules of a package. Note that due to the object-oriented nature of the language, a package usually defines few global symbols, and a package is assumed to be under the control of a single entity.

Definitions of classes can be split across multiple modules (but not across packages). Thus large classes can be split across multiple source files.
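The shared module namespace and split class definitions can be sketched as a single merge step. A minimal sketch, assuming each module contributes a table of global symbols where a class entry carries its member names; merge_modules() and the table layout are hypothetical. Parts of the same class are combined, while any other duplicate global name is an error.

```python
# Hypothetical sketch: merging the global symbols of all modules of one
# package into one namespace. Class parts are combined; other duplicates fail.

def merge_modules(modules):
    """modules: {module name: {symbol: ("class", {members}) or ("global", None)}}"""
    namespace = {}
    for mod_name, symbols in sorted(modules.items()):
        for name, (kind, members) in symbols.items():
            if name not in namespace:
                namespace[name] = (kind, set(members or ()), mod_name)
                continue
            old_kind, old_members, old_mod = namespace[name]
            if kind == old_kind == "class":
                clash = old_members & set(members)
                if clash:
                    raise NameError(f"class {name}: {sorted(clash)} defined twice "
                                    f"({old_mod}, {mod_name})")
                old_members |= set(members)   # another part of the same class
            else:
                raise NameError(f"{name} defined in both {old_mod} and {mod_name}")
    return {name: (kind, members) for name, (kind, members, _) in namespace.items()}
```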

Dynamic linking

It should be possible to use not only compulsory libraries but also optional libraries and plugin libraries, with similar simplicity and the same level of static type checking.

Compulsory libraries are those required every time the executable is invoked (equivalent to gcc -lname). Optional libraries are only loaded once the application touches some symbol from the library. This is a very useful feature: binaries can be built with all optional dependencies enabled, yet users need not install the libraries they do not want. This helps avoid dependency avalanches.
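The lazy loading behaviour of optional libraries can be sketched with a proxy that loads the library on the first symbol access. In this sketch Python modules stand in for shared libraries and importlib.import_module stands in for the dynamic linker; the OptionalLibrary class is hypothetical, and the sketch shows only the deferred loading, not the static type checking the text asks for.

```python
import importlib


class OptionalLibrary:
    """Defers loading a library until one of its symbols is first used."""

    def __init__(self, name):
        self._name = name
        self._module = None

    @property
    def loaded(self):
        return self._module is not None

    def __getattr__(self, symbol):
        # Only called for attributes not found normally, i.e. library symbols.
        if self._module is None:
            # A missing library raises ImportError only now, when it is needed.
            self._module = importlib.import_module(self._name)
        return getattr(self._module, symbol)
```

A program that never touches a symbol of an absent library runs normally; only code paths that use it fail.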

Plugin libraries are those where multiple libraries can exist, written against some common plugin interface. One possibility is to have packages implement package interfaces. A package could be loaded at run time and a reference to it stored in a variable whose type is the package interface type. It would then be possible to refer to symbols within the dynamically loaded package using standard qualified names (e.g. P.symbol). This enables full static type checking / interface checking for both the implementor and the user of the plugin.
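A rough analogue of package interfaces: a plugin is loaded by name into a variable typed by the interface, and the caller then uses qualified names such as P.encode. In this sketch the ICodec interface, the two codec plugins and the AVAILABLE_PLUGINS registry are hypothetical, and the registry replaces actually loading libraries from disk. Python checks the interface when the plugin class is instantiated (abstract methods must be implemented), whereas Sysel would check it statically.

```python
from abc import ABC, abstractmethod


class ICodec(ABC):
    """Hypothetical package interface that every codec plugin implements."""

    @abstractmethod
    def encode(self, data: bytes) -> bytes: ...

    @abstractmethod
    def decode(self, data: bytes) -> bytes: ...


class ReverseCodec(ICodec):
    def encode(self, data: bytes) -> bytes:
        return data[::-1]

    def decode(self, data: bytes) -> bytes:
        return data[::-1]


class XorCodec(ICodec):
    KEY = 0x5A

    def encode(self, data: bytes) -> bytes:
        return bytes(b ^ self.KEY for b in data)

    def decode(self, data: bytes) -> bytes:
        return bytes(b ^ self.KEY for b in data)


# Stand-in for plugin libraries found at run time.
AVAILABLE_PLUGINS = {"reverse": ReverseCodec, "xor": XorCodec}


def load_plugin(name: str) -> ICodec:
    plugin = AVAILABLE_PLUGINS[name]()
    if not isinstance(plugin, ICodec):
        raise TypeError(f"plugin {name} does not implement ICodec")
    return plugin
```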

Remote objects

Note: This section might be out of date.

Basics

HelenOS IPC is usually employed in an RPC-like style. Remote objects would support asynchronous messaging in the language itself. Remote object classes (and interfaces) form an inheritance hierarchy separate from that of local classes and interfaces. Remote interfaces are equivalent to the IPC interfaces now usually defined in HelenOS in uspace/lib/c/include/ipc. They would naturally support (multiple) inheritance. Servers contain remote classes which implement these interfaces.

When a client wants to use some service, it is given a reference to a remote object. This reference identifies not only the server we talk to, but possibly also the individual resource within the server that we are accessing. For a contrived example, a console server might provide the following two interfaces:

interface IConsole, remote is
        fun GetVC(vc_index : int) : IVC;
end

interface IVC, remote is
        fun GotoXY(x, y : int);
        fun Write(s : string);
end

When we invoke the GetVC() method, the console server will pass us a reference to the remote object implementing the requested VC. Then we can work with this particular VC using that reference:

var C : IConsole;
var VC : IVC;

C = NameService.GetConnection("console") as IConsole;
VC = C.GetVC(2);
VC.GotoXY(10, 10);
VC.Write("Hello World!");

Connection creation and termination, as well as transaction management (identifying the objects being worked with), are handled automatically by the language run-time. Also handled automatically is the creation of threads and fibrils within a server. A server can potentially handle any number of parallel requests (though it might be possible to limit this with some quota, if required). Concurrent access to remote objects is possible (and often desired).
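One way to picture a remote reference is as a (server, object ID) pair: the server keeps a table of exported objects, and an object returned by a remote method travels back to the client as a new reference rather than as a copy. This is a minimal in-process sketch; RemoteRef, Server, Servant and the object-ID scheme are hypothetical, and real IPC, threads and fibrils are left out. The method names mirror the IConsole/IVC example.

```python
import itertools


class RemoteRef:
    """Client-side proxy identifying a server and an object inside it."""

    def __init__(self, server, object_id):
        self._server, self._object_id = server, object_id

    def __getattr__(self, method):
        def call(*args):
            return self._server.invoke(self._object_id, method, args)
        return call


class Server:
    """Server-side object table mapping object IDs to local objects."""

    def __init__(self):
        self._objects = {}
        self._ids = itertools.count(1)

    def export(self, servant):
        object_id = next(self._ids)
        self._objects[object_id] = servant
        return RemoteRef(self, object_id)

    def invoke(self, object_id, method, args):
        result = getattr(self._objects[object_id], method)(*args)
        # Objects handed out as results are passed back by reference.
        if isinstance(result, Servant):
            return self.export(result)
        return result


class Servant:
    """Marker base class for objects exported by reference."""


class VC(Servant):
    def __init__(self, index):
        self.index, self.x, self.y, self.text = index, 0, 0, ""

    def GotoXY(self, x, y):
        self.x, self.y = x, y

    def Write(self, s):
        self.text += s


class Console(Servant):
    def __init__(self):
        self.vcs = [VC(i) for i in range(4)]

    def GetVC(self, vc_index):
        return self.vcs[vc_index]
```

The same mechanism covers stateful services: a file server would export an open-file object in exactly the way GetVC exports a VC.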

Remote invocation

When a method of a remote object is invoked, the method ID and its parameters are serialized and the resulting message is sent to the server. On the server the method ID and arguments are de-serialized and the implementation of the method is invoked. When the method returns, the return value (and possibly output arguments) are serialized and sent back to the client. At the client the return value(s) are de-serialized and returned to the caller.
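The round trip described above can be sketched as a client stub and a server dispatcher sharing a method-ID table. In this sketch JSON stands in for the wire format and a plain function call stands in for the IPC transport; IVC_METHODS, VCStub, VCServant and the extra GetPos method (added only to show a return value) are hypothetical.

```python
import json

# Hypothetical method-ID table shared by the client stub and the server,
# e.g. generated from the interface definition.
IVC_METHODS = ["GotoXY", "Write", "GetPos"]


def marshal_call(method, args):
    return json.dumps({"id": IVC_METHODS.index(method), "args": list(args)}).encode()


def dispatch(servant, request):
    """Server side: decode the request, invoke the method, encode the reply."""
    msg = json.loads(request)
    result = getattr(servant, IVC_METHODS[msg["id"]])(*msg["args"])
    return json.dumps({"result": result}).encode()


def unmarshal_reply(reply):
    return json.loads(reply)["result"]


class VCServant:
    def __init__(self):
        self.pos = (0, 0)

    def GotoXY(self, x, y):
        self.pos = (x, y)

    def Write(self, s):
        pass

    def GetPos(self):
        return list(self.pos)


class VCStub:
    """Client side: each call is a serialize/send/receive/deserialize trip."""

    def __init__(self, transport):
        self._transport = transport   # callable: request bytes -> reply bytes

    def __getattr__(self, method):
        return lambda *args: unmarshal_reply(self._transport(marshal_call(method, args)))
```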

Some notes:

  • Multiple threads/fibrils may use the same remote object in parallel without fear of blocking each other (as long as the server is properly implemented)
  • Stateful services can be implemented by the server handing out state objects (such as open-file object on a file server).

Promises

Promises can be used to express asynchronous behavior and potentially allow for promise pipelining (an optimization in which a call can be issued on a promised result before that result arrives, saving round trips). In our case it would suffice to have a specialized form of promise, one that promises some data to be delivered from a remote object. Promises would be declared using the prefix type operator future.

As long as the data received from a remote object stays in a future type, it is handled asynchronously. Once the data is converted to a non-future type, execution blocks until the data has been received.

Example:

interface IAsyncIO, remote is
        fun AReadBlock(addr : int) : future Block;
end

fun ReadBlocksParallel(start_addr, count : int) : Block[] is
        var fblock : (future Block)[];

        for i in range(0, count) do
                -- This does not block
                fblock[i] = AReadBlock(start_addr + i);
        end

        -- All reads are now being executed in parallel.

        -- Each array element is implicitly converted from future Block to Block.
        -- This blocks until all data has been received.
        return fblock;
end
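The same pattern can be written with Python's concurrent.futures. The main difference is that Python requires an explicit result() call where Sysel would convert future Block to Block implicitly. In this sketch read_block is a hypothetical stand-in for a remote read, and a thread pool replaces asynchronous IPC.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_block(addr):
    """Stand-in for a remote block read; the delay simulates server latency."""
    time.sleep(0.1)
    return f"block {addr}"


def read_blocks_parallel(executor, start_addr, count):
    # Issuing the requests does not block: submit() returns a Future at once.
    fblocks = [executor.submit(read_block, start_addr + i) for i in range(count)]
    # All reads are now in flight. result() blocks until each one completes,
    # the explicit counterpart of the implicit future-to-Block conversion.
    return [f.result() for f in fblocks]
```

With eight workers, eight reads take roughly as long as one.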

String language specification

Pavel Rimsky has pointed out that string literals in a program very often contain data in some machine-readable language (e.g. format strings, SQL statements) or references to external resources. It might be useful to be able to specify this in the program, so that external tools could recognize and work with these strings for purposes such as syntax checking, refactoring, etc.

Note: That means identifying the language the string contains. Defining any properties (e.g. syntax) of the language the string contains is out of scope!

One typical example is a formatting function, whose format string argument is in a well-defined language. Here it would be useful to specify the language of the formal argument. At each call of such a function, the compilation tools could then try to verify the actual argument. Similarly, we might want to specify the language of a member variable.

A different approach is specifying the language of a string literal in situ. This is reminiscent of (X)HTML, which allows embedding pieces of code written in different languages (e.g. CSS, ECMAScript) while identifying the embedded language by its MIME type, or of language constructs such as extern "C" in C++.

Both approaches could be combined.
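Python's typing.Annotated offers a rough analogue of both approaches. A formal argument can carry a language tag that an external tool reads with get_type_hints(include_extras=True), and a literal can be tagged in place. In this sketch the Lang marker, TaggedStr, query and argument_languages are hypothetical. As the note above requires, the sketch only names the language (by MIME type) and says nothing about its syntax.

```python
from typing import Annotated, get_type_hints


class Lang:
    """Hypothetical marker naming the language of a string by its MIME type."""

    def __init__(self, mime):
        self.mime = mime


# Approach 1: annotate the formal argument.
def query(db, sql: Annotated[str, Lang("application/sql")]):
    ...


# Approach 2: tag a string literal in situ.
class TaggedStr(str):
    def __new__(cls, mime, text):
        obj = super().__new__(cls, text)
        obj.lang = Lang(mime)
        return obj


STYLE = TaggedStr("text/css", "body { color: black }")


def argument_languages(func):
    """What an external tool could do: list parameters with a declared language."""
    hints = get_type_hints(func, include_extras=True)
    return {name: meta.mime
            for name, hint in hints.items()
            for meta in getattr(hint, "__metadata__", ())
            if isinstance(meta, Lang)}
```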

TODO Consider where language annotations would be useful and how they should be realized lexically and syntactically. (Must look pretty!)

Miscellaneous ideas

These ideas are considered for inclusion (but need not be included).

Member pointers

Delegates identify the object instance and the method to be called (but not the arguments). Member pointers, by contrast, identify only the method to be called, neither the arguments nor the object instance (a member pointer can be invoked on any object which is an instance of the given class). This feature comes from C++.
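Python happens to provide both flavours, which makes the distinction easy to see: a bound method behaves like a delegate, and a function taken from the class behaves like a member pointer, with the instance supplied at each call (in C++, (obj.*ptr)()). The Counter class is a hypothetical example.

```python
class Counter:
    def __init__(self, start):
        self.value = start

    def inc(self):
        self.value += 1
        return self.value

    def dec(self):
        self.value -= 1
        return self.value


a, b = Counter(10), Counter(20)

# Delegate: instance and method are both fixed.
delegate = a.inc

# Member pointer: only the method is fixed; the instance is chosen per call.
member_ptr = Counter.dec
```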

True inner classes

A true inner class is non-static in the sense that any instance of this class implicitly contains a reference to some instance of the outer class. Thus the inner class is constructed in a non-static context (in the context of an object instance), and the outer object can be referenced via a keyword.
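Python's nested classes are static, so the outer instance has to be passed explicitly. The sketch below spells out what a true inner class would do implicitly: capture the outer object at construction and reach it later, playing the role of the keyword (Window.this in Java). Window, Button and new_button are hypothetical names.

```python
class Window:
    def __init__(self, title):
        self.title = title

    class Button:
        # A true inner class would capture the outer instance implicitly;
        # here it is an explicit constructor argument.
        def __init__(self, outer, label):
            self.outer = outer          # plays the role of Window.this
            self.label = label

        def caption(self):
            return f"{self.outer.title}: {self.label}"

    def new_button(self, label):
        # Construction happens in the context of an outer instance.
        return Window.Button(self, label)
```

Because the button holds a reference rather than a copy, it sees later changes to its window.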

Explicit interface implementation

This allows a class to optionally specify explicitly which interface a method implementation belongs to. The benefit is that if a class implements two interfaces IA and IB that both require a method named foo (a name conflict), the class can provide a different implementation for each interface. This feature is present in C#.
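Python has no explicit interface implementation. The effect can be approximated with one adapter view per interface, each carrying its own foo. In C# the same would be written directly as string IA.foo() { ... } and string IB.foo() { ... }. Impl, as_ia and as_ib are hypothetical names used only for this sketch.

```python
from abc import ABC, abstractmethod


class IA(ABC):
    @abstractmethod
    def foo(self) -> str: ...


class IB(ABC):
    @abstractmethod
    def foo(self) -> str: ...


class Impl:
    """Offers a different foo() per interface through adapter views."""

    class _AsIA(IA):
        def __init__(self, owner):
            self.owner = owner

        def foo(self):
            return f"IA.foo on {self.owner.name}"

    class _AsIB(IB):
        def __init__(self, owner):
            self.owner = owner

        def foo(self):
            return f"IB.foo on {self.owner.name}"

    def __init__(self, name):
        self.name = name

    def as_ia(self) -> IA:
        return Impl._AsIA(self)

    def as_ib(self) -> IB:
        return Impl._AsIB(self)
```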

Output function arguments

Semantically equivalent to additional return values of a function. A convenient way to return multiple values (especially since Sysel does not have tuples).
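To illustrate the semantics: an output argument is a slot the callee writes and the caller reads after the call returns. Python would normally return a tuple, so this sketch uses a hypothetical mutable Out cell to mimic an output argument; div_mod is likewise only an example.

```python
class Out:
    """Hypothetical cell standing in for an output argument."""

    def __init__(self):
        self.value = None


def div_mod(a, b, remainder: Out):
    # The primary result is returned; the second one goes out via the argument.
    remainder.value = a % b
    return a // b
```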

Built-in maps/sets/relations

Maps, sets and relations are very useful constructs (similar to their set-theory counterparts). Without such types, relating two sets of objects means putting links in the objects themselves for each relation the objects participate in (similar to HelenOS ADT's link_t), which can be highly inconvenient.

Without support in the language core these can be difficult to implement. In C#, for example, they are based on a (library-implementable) hash table and rely on a hash function. This is not a very good design: providing a hash function based on object identity that is stable for the life of a process is problematic. It cannot be based on the object address, since that would prohibit a moving garbage collector. Mono generates a hash upon object allocation and stores it along with the object, which wastes memory.

With maps, sets or relations in the language core, on the contrary, identity can be established solely upon relationships between the objects. No stable hashes are needed. A moving GC is not a problem, because when it moves an object it updates all references to that object.

This feature is present in D language core, for example.
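The programming model a built-in relation would offer can be sketched as a container that keeps the links outside the related objects, so that Student and Course need no link fields. This sketch relies on Python dictionaries and sets, which key objects by their default identity hash. That is exactly the implementation technique the text argues against, so only the interface is illustrated here, not the proposed implementation. Relation, Student and Course are hypothetical names.

```python
from collections import defaultdict


class Relation:
    """Many-to-many relation kept outside the related objects."""

    def __init__(self):
        self._fwd = defaultdict(set)   # left object -> related right objects
        self._rev = defaultdict(set)   # right object -> related left objects

    def add(self, left, right):
        self._fwd[left].add(right)
        self._rev[right].add(left)

    def remove(self, left, right):
        self._fwd[left].discard(right)
        self._rev[right].discard(left)

    def right_of(self, left):
        return set(self._fwd.get(left, ()))

    def left_of(self, right):
        return set(self._rev.get(right, ()))


class Student:
    def __init__(self, name):
        self.name = name


class Course:
    def __init__(self, title):
        self.title = title
```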

Last modified on 2011-03-28T19:46:30Z