
Version 41 (modified by Jiri Svoboda, 13 years ago)

Add a big disclaimer to stop all the confusion.

Sysel

An effort to design a high-level programming language suitable for writing HelenOS servers and applications.

Note: Although the wiki is hosted here at helenos.org, Sysel is an independent and purely experimental project developed by Jiří Svoboda. While code from the Sysel project runs on HelenOS, the HelenOS project has made, as of the time of writing, no decision or commitment to use Sysel for any particular purpose. The implementation language for HelenOS remains C. HelenOS does not and should not depend on Sysel for its functionality.

Note that Sysel syntax is not finalized. Some important language features are missing at the moment (especially visibility control and packaging) so the examples presented will need to change when these are implemented.

Roadmap

|| Sub-project name || Status || Description ||
|| Sysel Bootstrap Interpreter (SBI) || Mostly || Interpreter of Sysel written in C. Runs in HelenOS and POSIX. ||
|| Sysel Compiler Toolkit (NNPS) || In progress || Modular compiler of Sysel written in Sysel itself. To produce C and/or LLVM IR. ||

SBI

SBI is an interpreter of Sysel. It is available stand-alone for POSIX or bundled with HelenOS (only in the Bazaar repository, not yet in a stable release). You can run it with the command "sbi source_file.sy". Runnable demos are available in /src/sysel/demos. The source files that make up the library are in /src/sysel/lib.

You can also run sbi without parameters to enter interactive mode.

SBI still has some missing features, but covers enough of the language to start development of NNPS.

Synopsis of current SBI features

  • Primitive types: bool, char, int, string
  • Compound types: class, multi-dimensional array
  • Other types: delegates, enumerations
  • Object-oriented features: constructors, inheritance, grandfather class, static and non-static method invocation
  • Interfaces
  • Static functions, static member variables, static properties
  • Syntactic sugar: variadic functions, accessor methods (named and indexed properties), autoboxing
  • Arithmetic: big integers, addition, subtraction, multiplication, boolean operators
  • Static type checking (mostly), generic classes (unconstrained), exception handling
  • Bindings: Text file I/O, WriteLine, Exec

Missing SBI features

More important:

  • Access control
  • Method overloading (rejected)
  • Code organization (packages and modules)
  • Explicit overriding (virtual, override)
  • Property overriding

Less important:

  • Division
  • Structs
  • Working with binary data
  • Generic type constraints
  • Operator overloading

Janitorial tasks

  • Add cspan to all error and warning messages.
  • Most run-time errors should already have been caught during static checking. The run-time errors need to be reviewed, the effectiveness of static checking verified, and the run-time errors converted to asserts.
  • All errors should be handled gracefully. Calls to exit() must be eliminated.

How SBI works

SBI first takes all the code in the library and all the source files provided on the command line and pre-processes them in several stages:

|| Parsing || Lex and parse source files to produce a syntax tree ||
|| Ancestry resolution || Determine ancestry of classes and interfaces ||
|| Typing || Annotate syntax tree with static types and make all type conversions explicit ||

The result is a syntax tree with symbol references resolved, annotated with static types and augmented so that all type conversions are explicit. This syntax tree is considered the program; it is treated as read-only for the purpose of execution.
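As an illustration of what "making type conversions explicit" can mean, here is a minimal Python sketch of a typing pass over a toy syntax tree. It is not taken from SBI (which is written in C); the node classes IntLit, Box and Assign, the function type_expr and the type names "int" and "object" are all hypothetical, and autoboxing is used only as an example of an implicit conversion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntLit:
    """Integer literal (hypothetical node type)."""
    value: int
    static_type: Optional[str] = None


@dataclass
class Box:
    """Explicit conversion node that the typing pass inserts (hypothetical)."""
    operand: object
    static_type: Optional[str] = None


@dataclass
class Assign:
    """Assignment to a destination with a declared static type (hypothetical)."""
    target_type: str
    source: object
    static_type: Optional[str] = None


def type_expr(node):
    """Annotate node with a static type and make implicit conversions explicit."""
    if isinstance(node, IntLit):
        node.static_type = "int"
        return node
    if isinstance(node, Assign):
        node.source = type_expr(node.source)
        src_type = node.source.static_type
        if src_type != node.target_type:
            if src_type == "int" and node.target_type == "object":
                # The implicit boxing becomes a node of its own in the tree.
                node.source = Box(node.source, static_type="object")
            else:
                raise TypeError(f"cannot convert {src_type} to {node.target_type}")
        node.static_type = node.target_type
        return node
    raise NotImplementedError(type(node).__name__)


tree = type_expr(Assign("object", IntLit(7)))
print(type(tree.source).__name__)  # the source is now wrapped in a Box node
```

After such a pass, a later stage never has to guess whether a conversion is needed: every conversion is a visible node in the tree.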

SBI has a concept of a runner object (run_t) which is sort of similar to a process. It has a reference to the code that should be executed, to global/shared state (i.e. the heap) and to the thread(s). (There is only one thread currently, anyway.) A thread has its own private state consisting of a stack (of procedure activation records) and error/exception state.
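The relationship between the runner, its shared state and its threads can be sketched roughly as follows. This is a Python illustration only; apart from the idea of a runner (run_t in SBI), every class, field and method name here is an assumption made for the example and does not reflect SBI's actual C structures.

```python
from dataclasses import dataclass, field


@dataclass
class ProcActivation:
    """Procedure activation record: one entry on a thread's stack."""
    proc_name: str
    local_vars: dict = field(default_factory=dict)


@dataclass
class ThreadState:
    """Private per-thread state: a stack plus error/exception state."""
    stack: list = field(default_factory=list)
    pending_exception: object = None


@dataclass
class Runner:
    """Rough analogue of a process: code, shared state and threads."""
    program: object                                  # read-only syntax tree
    heap: dict = field(default_factory=dict)         # global/shared state
    threads: list = field(default_factory=lambda: [ThreadState()])

    def enter(self, thread, proc_name, **args):
        """Push an activation record when a procedure is invoked."""
        thread.stack.append(ProcActivation(proc_name, dict(args)))

    def leave(self, thread):
        """Pop the activation record when the procedure returns."""
        return thread.stack.pop()


runner = Runner(program={"main": []})
main_thread = runner.threads[0]
runner.enter(main_thread, "main")
runner.enter(main_thread, "helper", count=3)
print([a.proc_name for a in main_thread.stack])
```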

Data is managed using a system of interlinked structures called rdata nodes. rdata_var nodes are used to implement both variables (addressable memory nodes that can be read and written) and values. Values are immutable, so they can be copied just by copying the pointer to them. Values can be written to or read from variables.

The equivalent of a data pointer in the rdata system is an address. An address can refer either to a variable or to a property. Reading from or writing to a property (using its address) causes its getter or setter to be invoked. The equivalent of a code pointer is a delegate (this is not the same delegate as the language construct), which refers to a symbol and, optionally, to an object instance on which the symbol should be invoked.

To implement L-values and R-values, evaluating an expression produces an item. An item can be either an address item (L-value) or a value item (R-value).
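The interplay of values, variables, addresses and items described above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not SBI's rdata implementation: all class and function names (Value, Var, VarAddress, PropAddress, AddressItem, ValueItem, address_read, address_write, item_to_value, assign) are hypothetical, and delegates are left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Value:
    """Immutable value; sharing a reference is as good as copying it."""
    payload: object


class Var:
    """Addressable memory node that holds a reference to a Value."""
    def __init__(self, value: Value):
        self.value = value


@dataclass
class VarAddress:
    """Address that refers to a variable."""
    var: Var


@dataclass
class PropAddress:
    """Address that refers to a property, i.e. a getter/setter pair."""
    getter: Callable[[], Value]
    setter: Callable[[Value], None]


def address_read(addr) -> Value:
    if isinstance(addr, VarAddress):
        return addr.var.value
    return addr.getter()              # reading a property runs its getter


def address_write(addr, value: Value) -> None:
    if isinstance(addr, VarAddress):
        addr.var.value = value        # store the shared reference, no deep copy
    else:
        addr.setter(value)            # writing a property runs its setter


@dataclass
class AddressItem:
    """L-value: the expression evaluated to an address."""
    address: object


@dataclass
class ValueItem:
    """R-value: the expression evaluated to a value."""
    value: Value


def item_to_value(item) -> Value:
    """Convert any item to an R-value; address items are read through."""
    if isinstance(item, ValueItem):
        return item.value
    return address_read(item.address)


def assign(dest_item, src_item) -> None:
    """Assignment requires an L-value on the left and any item on the right."""
    if not isinstance(dest_item, AddressItem):
        raise TypeError("left-hand side is not an L-value")
    address_write(dest_item.address, item_to_value(src_item))


x = Var(Value(1))
assign(AddressItem(VarAddress(x)), ValueItem(Value(5)))
print(x.value.payload)
```

The point of the design, as sketched here, is that the code performing an assignment does not need to know whether the destination is a plain variable or a property; the address hides the difference.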

NNPS

NNPS (Nativní Nástroje pro Překlad Syslu, en: Native Sysel Compilation Toolkit) is a prospective toolkit, written in Sysel itself, that should allow compiling Sysel to binary form (machine code). It is currently in development. The current plan is to implement it only as a front end that transforms Sysel into a low-level but machine-neutral IR. Most likely the first available output option will be C (used as if it were a machine-independent assembly) and the second LLVM IR. The "native" in NNPS means that it is written in Sysel itself (i.e. it should also be self-hosting).

Ideally NNPS should compile natively in POSIX, cross-compile from POSIX to HelenOS and eventually compile natively in HelenOS. The "eventually" is there because an appropriate backend (i.e. a C compiler) needs to be ported to HelenOS before native compilation is feasible.

NNPS will be bootstrapped using SBI. That is, by running SBI(NNPS(NNPS)) we will obtain a binary version of NNPS. This process will presumably require 'significant' computing resources, since SBI is rather slow and consumes a lot of memory. Once compiled to binary form, NNPS should be much more modest in its requirements.

Currently the NNPS lexer and a skeleton parser have been implemented (the parser verifies that the input is syntactically valid, but nothing else). I am now focusing on developing NNPS while simultaneously improving SBI where needed.

NNPS shall process the code in several separate stages. The first few are common with SBI:

|| Parsing || Lex and parse source files to produce a syntax tree ||
|| Ancestry resolution || Determine ancestry of classes and interfaces ||
|| Typing || Annotate syntax tree with static types and make all type conversions explicit ||

The remaining stages are specific to NNPS:

|| Code lowering || A.k.a. code generation. Produce CFG with linear blocks of instructions. Implements/eliminates structured code and OO features. ||
|| Data lowering || Implements/eliminates structured data, strings, big integers. We get a CFG again, but with a different instruction set. ||
|| Output translation || Conversion to the desired output format (LLVM IR, C). Straightforward. ||

The code lowering phase yields a CFG whose instructions operate on structured data (objects, arrays), but whose code is strictly procedural (functions, but no methods and no inheritance). The data lowering phase translates these instructions into another instruction set that is more like that of an abstract CPU (or LLVM IR). Thus in this phase we need to implement objects, arrays, strings and big integers. The output translation should be a simple one-to-one translation.
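To make "eliminating structured code" more concrete, the following Python sketch lowers a toy while loop into a CFG of linear blocks. It is only an illustration of the general technique and says nothing about how NNPS is or will be implemented: the tuple-based statement format, the Block and Lowerer classes, the lower_proc function and the "br", "cbr" and "ret" terminators are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Linear block: straight-line instructions plus one terminator."""
    label: str
    instrs: list = field(default_factory=list)
    term: tuple = None


class Lowerer:
    def __init__(self):
        self.blocks = []
        self.cur = self.new_block("entry")

    def new_block(self, hint):
        block = Block(f"{hint}{len(self.blocks)}")
        self.blocks.append(block)
        return block

    def lower(self, stmts):
        for stmt in stmts:
            if stmt[0] == "while":
                _, cond, body = stmt
                head = self.new_block("head")
                body_block = self.new_block("body")
                exit_block = self.new_block("exit")
                self.cur.term = ("br", head.label)
                # The loop condition becomes a conditional branch.
                head.term = ("cbr", cond, body_block.label, exit_block.label)
                self.cur = body_block
                self.lower(body)
                self.cur.term = ("br", head.label)   # back edge to the test
                self.cur = exit_block
            else:
                # Any other statement is kept as a straight-line instruction.
                self.cur.instrs.append(stmt)


def lower_proc(stmts):
    """Lower a list of structured statements into a list of linear blocks."""
    lowerer = Lowerer()
    lowerer.lower(stmts)
    lowerer.cur.term = ("ret",)
    return lowerer.blocks


program = [
    ("assign", "i", "0"),
    ("while", "i < 3", [("assign", "i", "i + 1")]),
    ("call", "WriteLine", "i"),
]
for block in lower_proc(program):
    print(block.label, block.instrs, block.term)
```

In this sketch the instructions inside the blocks are still high-level; a subsequent data lowering step would replace them with instructions from a lower-level instruction set while keeping the same block structure.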

