wiki:Bithenge/Syntax

Version 1 (modified by Sean Bartell, 12 years ago) ( diff )

Bithenge Syntax

This page describes the syntax and semantics of Bithenge scripts.

Simple example

transform item(scale) = struct {
    .name_len <- uint8;
    .name <- ascii <- known_length(.name_len);
    .value <- (in * scale) <- uint32le;
};

transform main = repeat {item(100)};

More complicated examples can be found in uspace/dist/src/bithenge.

Tokenization

Bithenge scripts consist of ASCII text. When a script is read, it is first divided into pieces known as tokens. As in most programming languages, the longest possible token is read, so == is always a single token rather than two. Whitespace (space, form‐feed, newline, carriage return, horizontal tab, and vertical tab) and comments (# until newline) are allowed between tokens; both LF (Unix) and CRLF (DOS) newline conventions can be used.

There are several types of tokens:

Symbols
&& ++ == >= // <- <= != || < > = + - * % ; : { } ( ) .
Keywords
do else false if in partial repeat struct switch transform true while
Identifiers
Any sequence of letters, digits, and underscores, beginning with a letter, that is not a keyword.
Integer literals
A sequence of decimal digits; negative literals are not yet possible.

Keywords and identifiers are case‐sensitive. Tokens can be at most 256 characters long.

Scripts

  • script → *definition
  • definition → "transform" identifier [parameters] "=" transform ";"
  • parameters → "(" [identifier *("," identifier)] ")"

A Bithenge script consists of a series of transform definitions. It must define a transform named main, which is applied when the script is used. Transforms may be defined with parameters, in which case corresponding arguments must be provided when the transform is invoked.

Examples:

  • transform main = uint8; defines an extremely simple main transform.
  • transform is_odd(val) = (val % 2 == 1); defines a transform that checks whether its argument is odd.

Transform invocation

  • transform → identifier [arguments]
  • arguments → "(" [ expression *("," expression) ] ")"

A built‐in or previously defined transform can be invoked, with arguments if necessary.

Examples:

  • is_odd(17) is a transform that applies is_odd with 17 as the argument.

Transform composition

  • transform → transform "<-" transform

A transform can be created by composing two existing transforms. It will work by first applying the right transform to the input, then applying the left transform to the result.

Examples:

  • ascii <- zero_terminated decodes a zero‐terminated ASCII string.
  • repeat{uint32le} <- known_length(8) decodes two 32‐bit integers.

Expression transforms

  • transform → "(" expression ")"

Expression transforms produce their output by calculating an expression. If "in" is used in the expression, it refers to the expression transform’s input; otherwise, the input must be an empty blob. In either case, the parentheses are mandatory.

Examples:

  • (in + 1) adds 1 to its input.
  • (.bytes_per_sector * .sectors_per_cluster) calculates the number of bytes per cluster.

Struct transforms

  • transform → "struct" "{" *struct-field "}"
  • struct-field → [ "." identifier ] "<-" transform ";"

A struct transform applies each of its subtransforms in sequence to the input blob. The result is an internal node, with the specified key used for the result of each subtransform. If a subtransform has no corresponding key, its result must be an internal node; the result’s keys will be merged into the struct transform’s result.

It must be possible to determine the size of the subblob each subtransform consumes. Bithenge currently only handles the simple cases: it knows uint32le needs 4 bytes, but not that known_length(2) <- repeat{uint32le} needs 8 bytes.

Example:

struct {
    .min <- uint32le;
    .max <- uint32le;
    .mid <- ((.min + .max) // 2);
    <- repeat(3) {uint8};
}

One possible output is {"min": 0, "max": 8, "mid": 4, 0: 127, 1: 126, 2: 125}.

Partial transforms

  • transform → "partial" [ "(" expression ")" ] "{" transform "}"

A partial transform applies its subtransform to a prefix of a blob. An expression can be given to specify the offset within the blob at which the subtransform is applied.

Examples:

  • partial {uint8} decodes the first byte of the arbitrary‐length input as an integer.
  • partial(1) {uint8} decodes the second byte of the arbitrary‐length input as an integer.

Conditional transforms

  • transform → "if" "(" expression ")" "{" transform "}" "else" "{" transform "}"
  • struct-field → "if" "(" expression ")" "{" *struct-field "}" [ "else" "{" *struct-field "}" ]

An if transform applies its first subtransform if the expression evaluates to true, and the second subtransform if the expression evaluates to false. A second form can be used in struct transforms: struct { ... if (...) {...} ... } is equivalent to struct { ... <- if (...) { struct { ... } } else { struct { } }; ... }.

  • transform → "switch" "(" expression ")" "{" *( (expression/"else") ":" transform ";" ) "}"
  • struct-field → "switch" "(" expression ")" "{" *( (expression/"else") ":" "{" *struct-field "}" ";" ) "}"

A switch transform compares the first expression to each of the case expressions in turn, stopping when they compare equal and using that subtransform. The last case can be "else", which will always be used if none of the previous cases matched. The second form can be used in struct transforms, much like the second form of "if".

Examples:

  • if (little_endian) { uint32le } else { uint32be } decodes a little‐ or big‐endian integer, depending on little_endian.
  • struct { switch (.type) { 0: {.x <- uint8; .y <- uint8;}; 1: {.val <- uint16;}; } } decodes two 8‐bit integers or one 16‐bit integer depending on the value of .type.

Repetition transforms

  • transform → "repeat" [ "(" expression ")" ] "{" transform "}"

A repeat transform applies its subtransform repeatedly to sequential subblobs of the input. The output is an internal node with a member for each repetition, with keys starting at 0. If an expression is given, it determines the number of times to repeat; otherwise, the subtransform is applied until it fails or there is no input remaining.

  • transform → "do" "{" transform "}" "while" "(" expression ")"

A do/while transform applies its subtransform repeatedly until the expression evaluates to false. The output has the same format as repeat transform output. Scope member expressions should be used in the expression to access the output of the subtransform and determine when to stop.

Examples:

  • repeat{uint8} decodes each byte as an integer.
  • repeat(256) {uint32le} decodes an array of 256 integers.
  • do { struct { .type <- uint8; .value <- uint32le; } } while (.type != 0) decodes types and values, stopping after a type of 0 is seen.

Expressions

  • expression → "true"
  • expression → "false"
  • expression → integer

Expressions can be boolean or integer literals.

  • expression → identifier

Expressions can use parameters.

  • expression → "in"

In an expression transform, "in" refers to the transform’s input. Otherwise, it refers to the input of the whole transform in the definition.

  • expression → "." identifier

A scope member expression looks in each containing struct transform for a member with the given key. It starts from the innermost struct transform and searches as far as the whole transform being defined.

  • expression → expression "." identifier
  • expression → expression "[" expression "]"

An expression can look for a member of another expression with a given key.

  • expression → expression "[" expression ":" "]"
  • expression → expression "[" expression ":" expression "]"
  • expression → expression "[" expression "," expression "]"

An expression can create a subblob of another expression. The first expression specifies the blob and the second expression specifies the start offset within the blob. If a comma is used, the third expression specifies the length of the subblob; otherwise it specifies the end offset. If no third expression is given, the subblob extends to the end of the blob.

  • expression → "(" expression ")"
  • expression → expression operator expression

Expressions can, of course, use binary operators. The operators and their precedences are as follows:

Operator Precedence Associativity Description
"." 5 Left‐to‐right Member (see above)
"[]" 5 Left‐to‐right Member or subblob (see above)
"*" 4 Left‐to‐right Integer multiplication
"//" 4 Left‐to‐right Floored/euclidean integer division; divisor must be positive
"%" 4 Left‐to‐right Floored/euclidean modulo operation; divisor must be positive
"+" 3 Left‐to‐right Integer addition
"-" 3 Left‐to‐right Integer subtraction
"++" 3 Left‐to‐right Blob concatenation
"<" 2 Left‐to‐right Integer less‐than
"<=" 2 Left‐to‐right Integer less‐than‐or‐equal
">" 2 Left‐to‐right Integer greater‐than
">=" 2 Left‐to‐right Integer greater‐than‐or‐equal
"==" 1 Left‐to‐right Equal‐to (not supported for internal nodes)
"!=" 1 Left‐to‐right Unequal‐to (not supported for internal nodes)
"&&" 0 Left‐to‐right Logical or
"||" 0 Left‐to‐right Logical and

Built‐in transforms

These transforms are implemented in C and included with Bithenge. Note that precise names are preferred; scripts can define shorter aliases if necessary.

name input output description example
ascii byte blob node string decodes some bytes as ASCII characters hex:6869 becomes "hi"
bit 1‐bit blob node boolean decodes a single bit 1 becomes true
bits_be byte blob node bit blob node decodes bytes as bits, starting with the most‐significant bit hex:0f becomes bit:00001111
bits_le byte blob node bit blob node decodes bytes as bits, starting with the least‐significant bit hex:0f becomes bit:11110000
known_length(len) blob node blob node requires the input to have a known length
nonzero_boolean integer boolean decodes a boolean where nonzero values are true 0 becomes false
uint8 1‐byte blob node integer node decodes a 1‐byte unsigned integer hex:11 becomes 17
uint16be 2‐byte blob node integer node decodes a 2‐byte big‐endian unsigned integer hex:0201 becomes 513
uint16le 2‐byte blob node integer node decodes a 2‐byte little‐endian unsigned integer hex:0101 becomes 257
uint32be 4‐byte blob node integer node decodes a 4‐byte big‐endian unsigned integer hex:00000201 becomes 513
uint32le 4‐byte blob node integer node decodes a 4‐byte little‐endian unsigned integer hex:01010000 becomes 257
uint64be 8‐byte blob node integer node decodes a 8‐byte big‐endian unsigned integer hex:0000000000000201 becomes 513
uint64le 8‐byte blob node integer node decodes a 8‐byte little‐endian unsigned integer hex:0101000000000000 becomes 257
uint_be(len) bit blob node integer node decodes bits as an unsigned integer, starting with the most‐significant bit
uint_le(len) bit blob node integer node decodes bits as an unsigned integer, starting with the least‐significant bit
zero_terminated byte blob node byte blob nodetakes bytes up until the first 00 hex:7f0400 becomes hex:7f04
Note: See TracWiki for help on using the wiki.