Sysel
An effort to design a high-level programming language suitable for writing HelenOS servers and applications.
Note: Although the wiki is hosted here at helenos.org, Sysel is an independent and purely experimental project developed by Jiří Svoboda. While code from the Sysel project runs on HelenOS, the HelenOS project has made, as of the time of writing, no decision or commitment to use Sysel for any particular purpose. The implementation language for HelenOS remains C. HelenOS does not and should not depend on Sysel for its functionality.
Quick Links
- Sysel project at Launchpad
- Sysel demos in HelenOS repo (real working Sysel code)
- WIP NNPS sources at Launchpad (real working Sysel code)
- Description of planned features and ideas
Note that Sysel syntax is not finalized. Some important language features are missing at the moment (especially visibility control and packaging) so the examples presented will need to change when these are implemented.
Roadmap
Sub-project name | Status | Description |
Sysel Bootstrap Interpreter (SBI) | Mostly | Interpreter of Sysel written in C. Runs in HelenOS and POSIX. |
Sysel Compiler Toolkit (NNPS) | In progress | Modular compiler of Sysel written in Sysel itself. To produce C and/or LLVM IR. |
SBI
SBI is an interpreter of Sysel. It is available stand-alone for POSIX or bundled with HelenOS (only in Bazaar repository, not yet in a stable release). You can run it with the command "sbi source_file.sy". Demos that you can run are available in /src/sysel/demos. Source files comprising the library are in /src/sysel/lib.
You can also run sbi without parameters to enter interactive mode.
SBI still has some missing features, but covers enough of the language to start development of NNPS.
Synopsis of current SBI features
- Primitive types: bool, char, int, string
- Compound types: class, multi-dimensional array
- Other types: delegates, enumerations
- Object-oriented features: constructors, inheritance, grandfather class, static and non-static method invocation
- Interfaces
- Static functions, static member variables, static properties
- Syntactic sugar: variadic functions, accessor methods (named and indexed properties), autoboxing
- Arithmetic: big integers, addition, subtraction, multiplication, boolean operators
- Static type checking (mostly), generic classes (unconstrained), exception handling
- Bindings: Text file I/O, WriteLine, Exec
Missing SBI features
More important:
- Access control
- Method overloading (rejected)
- Code organization (packages and modules)
- Explicit overriding (virtual, override)
- Property overriding
Less important:
- Division
- Structs
- Working with binary data
- Generic type constraints
- Operator overloading
Janitorial tasks
- Add cspan to all error and warning messages.
- Most run-time errors should have been caught during static checking. They need to be reviewed, effectiveness of static checking verified and run-time errors converted to asserts.
- All errors should be handled gracefully. Calls to exit() must be eliminated.
How SBI works
SBI first takes all the code in the library and all the source files provided on the command-line and pre-processes them in several stages:
Parsing | Lex and parse source files to produce a syntax tree |
Ancestry resolution | Determine ancestry of classes and interfaces |
Typing | Annotate syntax tree with static types and make all type conversions explicit |
The result is a syntax tree with symbol references resolved, annotated with static types and augmented so that all type conversions are explicit. This syntax tree is considered the program; it is treated as read-only for the purpose of execution.
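As a rough illustration of how such staged pre-processing can be chained, here is a minimal Python sketch. It is not SBI's code (SBI is written in C). The toy input syntax ("class Name : Base"), the dictionary-based tree and all function names are assumptions made for this example only.

```python
from types import MappingProxyType


def parse(sources):
    """Toy parsing stage. Each line is 'class Name' or 'class Name : Base'.

    This input syntax is invented for the sketch; it is not Sysel syntax.
    """
    classes = {}
    for text in sources:
        for line in text.splitlines():
            words = line.replace(":", " ").split()
            if len(words) >= 2 and words[0] == "class":
                base = words[2] if len(words) > 2 else None
                classes[words[1]] = {"base": base}
    return classes


def resolve_ancestry(classes):
    """Ancestry resolution stage: record the chain of ancestors of each class."""
    for name, info in classes.items():
        chain, seen, base = [], {name}, info["base"]
        while base is not None:
            if base not in classes:
                raise ValueError(f"unknown base class {base!r}")
            if base in seen:
                raise ValueError(f"circular ancestry at {base!r}")
            seen.add(base)
            chain.append(base)
            base = classes[base]["base"]
        info["ancestors"] = tuple(chain)
    return classes


def annotate_types(classes):
    """Placeholder for the typing stage: only marks each class as processed."""
    for info in classes.values():
        info["typed"] = True
    return classes


def preprocess(library_sources, user_sources):
    """Run all stages over library and command-line sources, in order."""
    tree = parse(list(library_sources) + list(user_sources))
    tree = resolve_ancestry(tree)
    tree = annotate_types(tree)
    # Read-only view of the top level, echoing the idea that the finished
    # tree is not modified during execution (nested dicts stay mutable here).
    return MappingProxyType(tree)
```

Calling preprocess(["class Object"], ["class A : Object"]) returns a mapping in which the entry for A lists Object as its only ancestor.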
SBI has a concept of a runner object (run_t), which is somewhat similar to a process. It has a reference to the code that should be executed, to global/shared state (i.e. the heap) and to the thread(s). (There is currently only one thread.) A thread has its own private state consisting of a stack (of procedure activation records) and error/exception state.
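The relationship between the runner, its thread and the stack of activation records can be pictured with a small Python model. This is only a sketch of the structure described above. The class and field names are hypothetical and do not mirror the actual fields of run_t.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ActivationRecord:
    """State of one procedure invocation (hypothetical layout)."""
    proc_name: str
    local_vars: dict = field(default_factory=dict)


@dataclass
class Thread:
    """Private per-thread state: a call stack plus error/exception state."""
    stack: list = field(default_factory=list)
    pending_exception: Optional[Any] = None


@dataclass
class Runner:
    """Process-like object: code, shared state (heap) and a single thread."""
    program: Any
    heap: dict = field(default_factory=dict)
    thread: Thread = field(default_factory=Thread)

    def call(self, proc_name, body):
        """Push an activation record, run the body, then pop the record."""
        record = ActivationRecord(proc_name)
        self.thread.stack.append(record)
        try:
            return body(self, record)
        finally:
            self.thread.stack.pop()
```

A procedure body here is any callable taking the runner and its own activation record; nested calls through Runner.call grow and shrink the stack accordingly.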
Data is managed using a system of interlinked structures called rdata nodes. rdata_var nodes are used to implement both variables (addressable memory nodes that can be read and written) and values. Values are immutable, so they can be copied just by copying the pointer to them. Values can be written to or read from variables.
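The reason immutability makes copying cheap can be shown with a short Python sketch. The Value and Var classes are hypothetical stand-ins; in SBI both roles are played by rdata_var nodes in C.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """Immutable value node: safe to share between any number of holders."""
    payload: object


class Var:
    """Variable: an addressable cell that currently holds some value."""

    def __init__(self, value):
        self._value = value

    def read(self):
        return self._value

    def write(self, value):
        self._value = value


def copy_var(source, target):
    """Copy a value between variables by copying the reference only."""
    target.write(source.read())
```

After copy_var(a, b), both variables refer to the same Value object. Writing a new value to b later replaces the reference held by b and leaves a untouched, so sharing is never observable.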
The equivalent of a data pointer in the rdata system is an address. An address can refer either to a variable or to a property. Reading from or writing to a property (using its address) causes its getter or setter to be invoked.
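A minimal Python sketch of the two kinds of address follows. The class names and the Thermometer example are invented for illustration and are not taken from SBI. The point is that code holding an address can read and write through it without knowing whether a plain variable or a pair of accessors is behind it.

```python
class Var:
    """A readable and writable memory cell."""

    def __init__(self, value):
        self.value = value


class VarAddress:
    """Address of a variable: access goes straight to the cell."""

    def __init__(self, var):
        self.var = var

    def read(self):
        return self.var.value

    def write(self, value):
        self.var.value = value


class PropAddress:
    """Address of a property: access is routed through getter and setter."""

    def __init__(self, obj, getter, setter):
        self.obj = obj
        self.getter = getter
        self.setter = setter

    def read(self):
        return self.getter(self.obj)

    def write(self, value):
        self.setter(self.obj, value)


class Thermometer:
    """Hypothetical object with a Fahrenheit property over a Celsius field."""

    def __init__(self):
        self.celsius = Var(0.0)

    def get_fahrenheit(self):
        return self.celsius.value * 9 / 5 + 32

    def set_fahrenheit(self, fahrenheit):
        self.celsius.value = (fahrenheit - 32) * 5 / 9


def increment(address, amount):
    """Works on any address, whatever it refers to."""
    address.write(address.read() + amount)
```

For example, increment(VarAddress(t.celsius), 1.0) updates the field directly, while increment(PropAddress(t, Thermometer.get_fahrenheit, Thermometer.set_fahrenheit), 1.8) invokes the getter and then the setter.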
The equivalent of a code pointer is a delegate (this is not the same delegate as the language construct), which refers to a symbol and, optionally, to an object instance on which the symbol should be invoked.
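The pairing of a symbol with an optional object instance can be sketched in a few lines of Python. The Delegate class and the Greeter example are hypothetical. SBI's internal delegate is a C structure whose layout is not shown in this document.

```python
class Delegate:
    """A code reference: a symbol plus, optionally, an object to invoke it on."""

    def __init__(self, symbol, obj=None):
        self.symbol = symbol
        self.obj = obj

    def invoke(self, *args):
        if self.obj is None:
            # Static symbol: call it with the arguments only.
            return self.symbol(*args)
        # Non-static symbol: pass the bound object as the receiver.
        return self.symbol(self.obj, *args)


class Greeter:
    """Hypothetical class used to demonstrate a bound delegate."""

    def __init__(self, name):
        self.name = name

    def greet(self, greeting):
        return f"{greeting}, {self.name}"
```

Delegate(Greeter.greet, Greeter("HelenOS")).invoke("Hello") calls the method on the stored instance, while Delegate(len).invoke("abc") calls a free function with no instance.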
The result of evaluating an expression is an item. To distinguish L-values from R-values, an item can be either an address item (L-value) or a value item (R-value).
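How address items and value items interact during evaluation can be illustrated with a toy evaluator in Python. The tuple-based expression encoding and all names here are assumptions made for this sketch and say nothing about SBI's real syntax tree.

```python
from dataclasses import dataclass


@dataclass
class ValueItem:
    """R-value: carries a value only."""
    value: object


@dataclass
class AddressItem:
    """L-value: names a storage location (here a slot in an environment)."""
    env: dict
    name: str


def to_value(item):
    """Convert any item to a plain value, reading through an address."""
    if isinstance(item, AddressItem):
        return item.env[item.name]
    return item.value


def evaluate(expr, env):
    """Evaluate a tuple-encoded expression and return an item."""
    kind = expr[0]
    if kind == "lit":
        return ValueItem(expr[1])
    if kind == "name":
        return AddressItem(env, expr[1])
    if kind == "add":
        left = to_value(evaluate(expr[1], env))
        right = to_value(evaluate(expr[2], env))
        return ValueItem(left + right)
    if kind == "assign":
        target = evaluate(expr[1], env)
        if not isinstance(target, AddressItem):
            raise TypeError("assignment target is not an L-value")
        value = to_value(evaluate(expr[2], env))
        target.env[target.name] = value
        return ValueItem(value)
    raise ValueError(f"unknown expression kind {kind!r}")
```

Evaluating ("assign", ("name", "x"), ("add", ("name", "x"), ("lit", 2))) updates x in the environment, whereas using a literal as the assignment target is rejected because it evaluates to a value item.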
NNPS
NNPS (Nativní Nástroje pro Překlad Syslu, en: Native Sysel Compilation Toolkit) is a prospective toolkit written in Sysel itself that should allow compiling Sysel to binary form (machine code). Currently it is in development. The current plan is to only implement it as a front end, transforming Sysel into low-level but machine-neutral IR. Most likely the first available output option should be C (used as if it were a machine-independent assembly) and the second LLVM IR. The "native" in NNPS means it is written in Sysel itself (i.e. it should also be self-hosting).
Ideally NNPS should compile natively in POSIX, cross-compile from POSIX to HelenOS and eventually compile natively in HelenOS. The "eventually" is there because an appropriate backend (i.e. a C compiler) needs to be ported to HelenOS before native compilation is feasible.
NNPS will be bootstrapped using SBI. That is, by running SBI(NNPS(NNPS)) we will obtain a binary version of NNPS. This process will presumably require 'significant' computing resources since SBI is rather slow and consumes a lot of memory. Once compiled to binary form, NNPS should be much more modest.
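The shape of the SBI(NNPS(NNPS)) step can be mimicked with a loose Python analogy. Everything in it is a stand-in: Python's exec plays the role of the interpreter, a one-function "compiler" plays NNPS, and Python bytecode plays the binary. It only shows the order of operations, not how SBI or NNPS work.

```python
# Toy "compiler", kept as source text so that it can be given to itself.
COMPILER_SOURCE = '''
def compile_source(source):
    # The toy "binary" is simply a Python code object.
    return compile(source, "<toy>", "exec")
'''


def interpret(source, entry, argument):
    """Stand-in for the interpreter: run a program from source, call its entry."""
    namespace = {}
    exec(source, namespace)
    return namespace[entry](argument)


def run_binary(binary, entry, argument):
    """Run an already compiled program without going through its source."""
    namespace = {}
    exec(binary, namespace)
    return namespace[entry](argument)


def bootstrap():
    # Stage 1: the interpreter runs the compiler, which compiles its own source.
    stage1 = interpret(COMPILER_SOURCE, "compile_source", COMPILER_SOURCE)
    # Stage 2: the compiled compiler compiles its own source once more.
    stage2 = run_binary(stage1, "compile_source", COMPILER_SOURCE)
    return stage1, stage2
```

Comparing the bytecode of the two stages (stage1.co_code and stage2.co_code) is a common sanity check for a self-hosting compiler: once the compiled compiler reproduces itself, the interpreter is no longer needed for the build.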
Currently the NNPS lexer and skeleton parser have been implemented (the parser verifies that input is syntactically valid, but nothing else). I am now focusing on developing NNPS while simultaneously improving SBI where needed.
NNPS shall process the code in several separate stages. The first few are common with SBI:
Parsing | Lex and parse source files to produce a syntax tree |
Ancestry resolution | Determine ancestry of classes and interfaces |
Typing | Annotate syntax tree with static types and make all type conversions explicit |
The remaining stages are specific to NNPS:
Code lowering | A.k.a. code generation. Produce CFG with linear blocks of instructions. Implements/eliminates structured code and OO features. |
Data lowering | Implements/eliminates structured data, strings, big integers. We get a CFG again, but with a different instruction set. |
Output translation | Conversion to the desired output format (LLVM IR, C). Straightforward. |
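To make the code lowering stage in the table above more concrete, here is a small Python sketch that turns a structured while loop into a CFG of linear blocks ending in jumps or branches. The statement encoding, instruction names and block labels are invented for this example. NNPS itself is written in Sysel and its actual IR is not specified here.

```python
class CFGBuilder:
    """Builds basic blocks: label -> list of instructions."""

    def __init__(self):
        self.blocks = {}
        self.counter = 0
        self.current = self.new_block("entry")

    def new_block(self, hint):
        """Create an empty block with a unique label and return the label."""
        label = f"{hint}{self.counter}"
        self.counter += 1
        self.blocks[label] = []
        return label

    def emit(self, instruction):
        """Append an instruction to the block currently being filled."""
        self.blocks[self.current].append(instruction)

    def lower(self, statements):
        """Lower structured statements into the current and new blocks."""
        for statement in statements:
            if statement[0] == "while":
                _, condition, body = statement
                head = self.new_block("head")
                body_label = self.new_block("body")
                exit_label = self.new_block("exit")
                self.emit(("jump", head))
                self.current = head
                self.emit(("branch", condition, body_label, exit_label))
                self.current = body_label
                self.lower(body)
                self.emit(("jump", head))  # back edge of the loop
                self.current = exit_label
            else:
                # Simple statements pass through unchanged.
                self.emit(statement)


def lower_program(statements):
    """Return the CFG (as a dict of blocks) for a list of statements."""
    builder = CFGBuilder()
    builder.lower(statements)
    builder.emit(("ret",))
    return builder.blocks
```

Lowering an assignment followed by a loop yields four blocks: an entry block that jumps to the loop head, a head block that branches on the condition, a body block that jumps back to the head, and an exit block.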
From the code lowering phase we obtain a CFG where the instructions operate on structured data (objects, arrays), but the code is strictly procedural (functions, but no methods, no inheritance). The data lowering phase translates these instructions into another instruction set that is more like an abstract CPU instruction set (or LLVM IR). Thus in this phase, we need to implement the objects, arrays, strings and big integers. The output translation should be a simple one-to-one translation.
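As one example of what data lowering involves, a big-integer addition has to be expressed using fixed-width machine operations. The Python sketch below does this with 32-bit limbs and an explicit carry. The limb width, the little-endian layout and the restriction to non-negative numbers are assumptions of the sketch, not decisions documented for NNPS.

```python
LIMB_BITS = 32
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(number):
    """Split a non-negative integer into little-endian 32-bit limbs."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    limbs = []
    while True:
        limbs.append(number & LIMB_MASK)
        number >>= LIMB_BITS
        if number == 0:
            return limbs


def from_limbs(limbs):
    """Reassemble an integer from little-endian limbs."""
    return sum(limb << (LIMB_BITS * index) for index, limb in enumerate(limbs))


def add_limbs(a, b):
    """Add two limb arrays using only limb-sized adds and a carry."""
    result, carry = [], 0
    for index in range(max(len(a), len(b))):
        left = a[index] if index < len(a) else 0
        right = b[index] if index < len(b) else 0
        total = left + right + carry
        result.append(total & LIMB_MASK)
        carry = total >> LIMB_BITS
    if carry:
        result.append(carry)
    return result
```

Each loop iteration corresponds to what a lowered instruction sequence would do per limb: two loads, an add with carry, and a store.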
Other Links
Interesting reading material: