Version 40 (modified by Jiri Svoboda, 13 years ago): Small clarification

Sysel

An effort to design a high-level programming language for writing HelenOS servers and applications.

Note that Sysel syntax is not finalized. Some important language features are missing at the moment (especially visibility control and packaging) so the examples presented will need to change when these are implemented.

Roadmap

Sub-project name	Status	Description
Sysel Bootstrap Interpreter (SBI)	Mostly	Interpreter of Sysel written in C. Runs in HelenOS and POSIX.
Sysel Compiler Toolkit (NNPS)	In progress	Modular compiler of Sysel written in Sysel itself. To produce C and/or LLVM IR.

SBI

SBI is an interpreter of Sysel. It is available stand-alone for POSIX or bundled with HelenOS (only in Bazaar repository, not yet in a stable release). You can run it with the command "sbi source_file.sy". Demos that you can run are available in /src/sysel/demos. Source files comprising the library are in /src/sysel/lib.

You can also run sbi without parameters to enter interactive mode.

SBI still has some missing features, but covers enough of the language to start development of NNPS.

Synopsis of current SBI features

Primitive types: bool, char, int, string
Compound types: class, multi-dimensional array
Other types: delegates, enumerations
Object-oriented features: constructors, inheritance, grandfather class, static and non-static method invocation
Interfaces
Static functions, static member variables, static properties
Syntactic sugar: variadic functions, accessor methods (named and indexed properties), autoboxing
Arithmetic: big integers, addition, subtraction, multiplication, boolean operators
Static type checking (mostly), generic classes (unconstrained), exception handling
Bindings: Text file I/O, WriteLine, Exec

Missing SBI features

More important:

Access control
~~Method overloading~~ (rejected)
Code organization (packages and modules)
Explicit overriding (virtual, override)
Property overriding

Less important:

Division
Structs
Working with binary data
Generic type constraints
Operator overloading

Janitorial tasks

Add cspan to all error and warning messages.
Most run-time errors should have been caught during static checking. They need to be reviewed, effectiveness of static checking verified and run-time errors converted to asserts.
All errors should be handled gracefully. Calls to exit() must be eliminated.

How SBI works

SBI first takes all the code in the library and all the source files provided on the command-line and pre-processes them in several stages:

Parsing	Lex and parse source files to produce a syntax tree
Ancestry resolution	Determine ancestry of classes and interfaces
Typing	Annotate syntax tree with static types and make all type conversions explicit

The result is a syntax tree with symbol references resolved, annotated with static types and augmented so that all type conversions are explicit. This syntax tree is considered the program; it is treated as read-only for the purpose of execution.
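As a rough illustration of what the typing stage does, the sketch below annotates a toy syntax tree with static types and wraps a node in an explicit conversion where the types differ. It is written in Python and is not SBI code: the node classes `IntLit`, `Assign` and `Box`, the functions `annotate` and `convert`, and the choice of int-to-object boxing as the example conversion are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntLit:
    """Toy integer literal node (hypothetical)."""
    value: int
    static_type: Optional[str] = None


@dataclass
class Box:
    """Explicit conversion node inserted by the typing pass (hypothetical)."""
    operand: object
    static_type: Optional[str] = None


@dataclass
class Assign:
    """Toy assignment to a destination whose declared type is target_type."""
    target_type: str
    source: object
    static_type: Optional[str] = None


def convert(node, wanted):
    """Return node unchanged if types match, else wrap it in an explicit conversion."""
    if node.static_type == wanted:
        return node
    if node.static_type == "int" and wanted == "object":
        return Box(operand=node, static_type="object")
    raise TypeError(f"cannot convert {node.static_type} to {wanted}")


def annotate(node):
    """Annotate the tree with static types and make conversions explicit."""
    if isinstance(node, IntLit):
        node.static_type = "int"
        return node
    if isinstance(node, Assign):
        node.source = convert(annotate(node.source), node.target_type)
        node.static_type = node.target_type
        return node
    raise TypeError(f"unknown node {node!r}")
```

After `annotate(Assign("object", IntLit(5)))`, the source of the assignment is a `Box` node wrapping the literal, so a later execution stage never has to guess whether a conversion is needed.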

SBI has a concept of a runner object (run_t), which is roughly analogous to a process. It holds a reference to the code that should be executed, to global/shared state (i.e. the heap) and to the thread(s). (There is currently only one thread.) A thread has its own private state consisting of a stack (of procedure activation records) and error/exception state.
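The relationships described above can be pictured with the following Python sketch. Only the name run_t comes from SBI; the class names, field names and the `enter`/`leave` helpers here are hypothetical and chosen for readability, and the real structures are C structs whose layout may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ActivationRecord:
    """One procedure activation: the procedure name and its local variables."""
    proc_name: str
    local_vars: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Thread:
    """Private per-thread state: a stack of activation records and error state."""
    stack: List[ActivationRecord] = field(default_factory=list)
    pending_exception: Optional[Any] = None


@dataclass
class Runner:
    """Rough analogue of run_t: code, shared state and threads."""
    program: Any                                         # read-only syntax tree
    heap: Dict[int, Any] = field(default_factory=dict)   # global/shared state
    threads: List[Thread] = field(default_factory=list)


def new_runner(program):
    """Create a runner with a single thread, matching the current situation."""
    return Runner(program=program, threads=[Thread()])


def enter(thread, proc_name):
    """Push an activation record when a procedure is invoked."""
    record = ActivationRecord(proc_name)
    thread.stack.append(record)
    return record


def leave(thread):
    """Pop the activation record when the procedure returns."""
    return thread.stack.pop()
```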

Data is managed using a system of interlinked structures called rdata nodes. rdata_var nodes are used to implement both variables (addressable memory nodes that can be read and written) and values. Values are immutable, so they can be copied just by copying the pointer to them. Values can be written to or read from variables.
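The following Python sketch shows why immutability makes copying by pointer safe. In SBI the same rdata_var node type serves for both variables and values; the sketch splits them into two hypothetical classes, `Value` and `Variable`, purely to make the distinction visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """Immutable value: once created, its payload cannot change."""
    payload: object


class Variable:
    """Addressable memory node that holds a reference to a value."""

    def __init__(self, initial):
        self._value = initial

    def read(self):
        # Reading hands out the same Value object; no deep copy is needed.
        return self._value

    def write(self, value):
        # Writing replaces the reference; the old Value is left untouched.
        self._value = value
```

Two variables may share one `Value` object. Writing to one of them only swaps its reference, so the other variable never observes the change.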

The equivalent of a data pointer in the rdata system is an address. An address can refer either to a variable or to a property. Reading from or writing to a property (using its address) causes its getter or setter to be invoked. The equivalent of a code pointer is a delegate (this is not the same as the delegate language construct), which refers to a symbol and, optionally, to an object instance on which the symbol should be invoked.
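A minimal Python sketch of these two kinds of reference follows. The class names `Variable`, `Property`, `Address` and `Delegate` and their methods are hypothetical stand-ins for the corresponding rdata structures, and Python callables stand in for symbols.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union


@dataclass
class Variable:
    """Plain storage cell."""
    value: Any


@dataclass
class Property:
    """Storage accessed only through a getter and a setter."""
    getter: Callable[[], Any]
    setter: Callable[[Any], None]


@dataclass
class Address:
    """Data reference: points at either a variable or a property."""
    target: Union[Variable, Property]

    def read(self):
        if isinstance(self.target, Property):
            return self.target.getter()      # invokes the getter
        return self.target.value

    def write(self, value):
        if isinstance(self.target, Property):
            self.target.setter(value)        # invokes the setter
        else:
            self.target.value = value


@dataclass
class Delegate:
    """Code reference: a symbol plus an optional object instance."""
    symbol: Callable[..., Any]
    instance: Optional[Any] = None

    def invoke(self, *args):
        if self.instance is None:
            return self.symbol(*args)                 # static invocation
        return self.symbol(self.instance, *args)      # invocation on an instance
```

Code that holds an `Address` does not need to know which kind of target it has; the getter or setter runs transparently on access.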

L-values and R-values are implemented using items. An item is the result of evaluating an expression, and it can be either an address item (L-value) or a value item (R-value).
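The toy evaluator below, in Python, shows how the two kinds of item might interact. The tuple-based expression forms, the use of a dictionary as the variable store, and the names `AddressItem`, `ValueItem`, `to_value` and `evaluate` are assumptions for this sketch and do not reflect SBI's actual data structures.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class AddressItem:
    """L-value: names a storage location (here, a key in a dictionary)."""
    env: Dict[str, Any]
    name: str


@dataclass
class ValueItem:
    """R-value: carries a value with no associated location."""
    value: Any


def to_value(item):
    """Turn any item into a plain value, reading through an address if needed."""
    if isinstance(item, AddressItem):
        return item.env[item.name]
    return item.value


def evaluate(expr, env):
    """Evaluate ("lit", v), ("var", name), ("add", a, b) or ("assign", dst, src)."""
    kind = expr[0]
    if kind == "lit":
        return ValueItem(expr[1])
    if kind == "var":
        return AddressItem(env, expr[1])
    if kind == "add":
        left = to_value(evaluate(expr[1], env))
        right = to_value(evaluate(expr[2], env))
        return ValueItem(left + right)
    if kind == "assign":
        target = evaluate(expr[1], env)
        if not isinstance(target, AddressItem):
            raise TypeError("left side of assignment is not an L-value")
        value = to_value(evaluate(expr[2], env))
        target.env[target.name] = value
        return ValueItem(value)
    raise ValueError(f"unknown expression kind {kind!r}")
```

A variable reference evaluates to an address item, so it can appear on either side of an assignment, while a literal or a sum evaluates to a value item and is rejected as an assignment target.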

NNPS

NNPS (Nativní Nástroje pro Překlad Syslu, en: Native Sysel Compilation Toolkit) is a prospective toolkit, written in Sysel itself, that should allow compiling Sysel to binary form (machine code). It is currently in development. The current plan is to implement it only as a front end that transforms Sysel into a low-level but machine-neutral IR. Most likely the first available output option will be C (used as if it were a machine-independent assembly language) and the second LLVM IR. The "native" in NNPS means that it is written in Sysel itself (i.e. it should also be self-hosting).

Ideally NNPS should compile natively in POSIX, cross-compile from POSIX to HelenOS and eventually compile natively in HelenOS. The "eventually" is there because an appropriate backend (i.e. a C compiler) needs to be ported to HelenOS before native compilation is feasible.

NNPS will be bootstrapped using SBI. That is, by running SBI(NNPS(NNPS)) we will obtain a binary version of NNPS. This process will presumably require 'significant' computing resources, since SBI is rather slow and consumes a lot of memory. Once compiled to binary form, NNPS should be much more modest.

Currently the NNPS lexer and a skeleton parser have been implemented (the parser verifies that the input is syntactically valid, but nothing else). I am now focusing on developing NNPS while simultaneously improving SBI where needed.

NNPS shall process the code in several separate stages. The first few are common with SBI:

Parsing	Lex and parse source files to produce a syntax tree
Ancestry resolution	Determine ancestry of classes and interfaces
Typing	Annotate syntax tree with static types and make all type conversions explicit

The remaining stages are specific to NNPS:

Code lowering	A.k.a. code generation. Produce CFG with linear blocks of instructions. Implements/eliminates structured code and OO features.
Data lowering	Implements/eliminates structured data, strings, big integers. We get a CFG again, but with a different instruction set.
Output translation	Conversion to the desired output format (LLVM IR, C). Straightforward.

From the code lowering phase we obtain a CFG where the instructions operate on structured data (objects, arrays), but the code is strictly procedural (functions, but no methods and no inheritance). The data lowering phase translates these instructions into another instruction set that is more like an abstract CPU instruction set (or LLVM IR). Thus in this phase we need to implement the objects, arrays, strings and big integers. The output translation should be a simple 1-1 translation.
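To give a feel for the "eliminates structured code" part of code lowering, the Python sketch below turns a structured while loop into a CFG of linear blocks joined by jumps and branches. It is a generic textbook-style lowering, not NNPS's design: the statement forms `("do", text)` and `("while", cond, body)`, the opaque instruction strings, and the names `Block`, `Lowerer` and `lower_program` are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Block:
    """Linear block: straight-line instructions ending in one terminator."""
    label: str
    instrs: List[str] = field(default_factory=list)
    # ("jump", label), ("branch", cond, then_label, else_label) or ("ret",)
    terminator: Optional[Tuple] = None


class Lowerer:
    def __init__(self):
        self.blocks: List[Block] = []
        self.current = self._new_block()

    def _new_block(self):
        block = Block(f"b{len(self.blocks)}")
        self.blocks.append(block)
        return block

    def lower(self, stmts):
        for stmt in stmts:
            if stmt[0] == "do":
                # Simple statement: append to the current linear block.
                self.current.instrs.append(stmt[1])
            elif stmt[0] == "while":
                _, cond, body = stmt
                head = self._new_block()
                body_block = self._new_block()
                exit_block = self._new_block()
                # Fall into the loop head, which tests the condition.
                self.current.terminator = ("jump", head.label)
                head.terminator = ("branch", cond, body_block.label, exit_block.label)
                # Lower the body, then jump back to the head.
                self.current = body_block
                self.lower(body)
                self.current.terminator = ("jump", head.label)
                # Code after the loop continues in the exit block.
                self.current = exit_block
            else:
                raise ValueError(f"unknown statement {stmt!r}")


def lower_program(stmts):
    """Lower a statement list to a list of blocks ending in a return."""
    lowerer = Lowerer()
    lowerer.lower(stmts)
    lowerer.current.terminator = ("ret",)
    return lowerer.blocks
```

Lowering a program with one assignment, one loop and one trailing statement yields four blocks: the entry block, the loop head holding the conditional branch, the loop body, and the exit block. No structured control flow remains.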