| 1 | = Bithenge Syntax = |
| 2 | |
| 3 | [[PageOutline(2-3)]] |
| 4 | |
| 5 | This page describes the syntax and semantics of [[Bithenge]] scripts. |
| 6 | |
| 7 | == Simple example == |
| 8 | |
| 9 | {{{ |
| 10 | transform item(scale) = struct { |
| 11 | .name_len <- uint8; |
| 12 | .name <- ascii <- known_length(.name_len); |
| 13 | .value <- (in * scale) <- uint32le; |
| 14 | }; |
| 15 | |
| 16 | transform main = repeat {item(100)}; |
| 17 | }}} |
| 18 | |
| 19 | More complicated examples can be found in |
| 20 | [https://bazaar.launchpad.net/~wtachi/helenos/bithenge/files/head:/uspace/dist/src/bithenge uspace/dist/src/bithenge]. |
| 21 | |
| 22 | == Tokenization == |
| 23 | |
| 24 | Bithenge scripts consist of ASCII text. When a script is read, it is first |
| 25 | divided into pieces known as ''tokens''. As in most programming languages, the |
| 26 | longest possible token is read, so `==` is always a single token rather than |
| 27 | two. Whitespace (space, form‐feed, newline, carriage return, horizontal tab, |
| 28 | and vertical tab) and comments (`#` until newline) are allowed between tokens; |
| 29 | both LF (Unix) and CRLF (DOS) newline conventions can be used. |
| 30 | |
| 31 | There are several types of tokens: |
| 32 | Symbols:: `&& ++ == >= // <- <= != || < > = + - * % ; : { } ( ) .` |
| 33 | Keywords:: `do else false if in partial repeat struct switch transform true while` |
| 34 | Identifiers:: Any sequence of letters, digits, and underscores, beginning with a letter, that is not a keyword. |
| 35 | Integer literals:: A sequence of decimal digits; negative literals are not yet possible. |
| 36 | |
| 37 | Keywords and identifiers are case‐sensitive. Tokens can be at most |
| 38 | 256 characters long. |
| 39 | |
| 40 | == Scripts == |
| 41 | |
| 42 | - script → *definition |
| 43 | - definition → "`transform`" identifier [parameters] "`=`" transform "`;`" |
| 44 | - parameters → "`(`" [identifier *("`,`" identifier)] "`)`" |
| 45 | |
| 46 | A Bithenge script consists of a series of transform definitions. It must define |
| 47 | a transform named `main`, which is applied when the script is used. Transforms |
| 48 | may be defined with parameters, in which case corresponding arguments must be |
| 49 | provided when the transform is invoked. |
| 50 | |
| 51 | Examples: |
| 52 | - `transform main = uint8;` defines an extremely simple main transform. |
| 53 | - `transform is_odd(val) = (val % 2 == 1);` defines a transform that checks |
| 54 | whether its argument is odd. |
| 55 | |
| 56 | == Transform invocation == |
| 57 | |
| 58 | - transform → identifier [arguments] |
| 59 | - arguments → "`(`" [ expression *("`,`" expression) ] "`)`" |
| 60 | |
| 61 | A built‐in or previously defined transform can be invoked, with arguments if |
| 62 | necessary. |
| 63 | |
| 64 | Examples: |
| 65 | - `is_odd(17)` is a transform that applies `is_odd` with 17 as the argument. |
| 66 | |
| 67 | == Transform composition == |
| 68 | |
| 69 | - transform → transform "`<-`" transform |
| 70 | |
| 71 | A transform can be created by composing two existing transforms. It will work |
| 72 | by first applying the right transform to the input, then applying the left |
| 73 | transform to the result. |
| 74 | |
| 75 | Examples: |
| 76 | - `ascii <- zero_terminated` decodes a zero‐terminated ASCII string. |
| 77 | - `repeat{uint32le} <- known_length(8)` decodes two 32‐bit integers. |
| 78 | |
| 79 | == Expression transforms == |
| 80 | |
| 81 | - transform → "`(`" expression "`)`" |
| 82 | |
| 83 | Expression transforms produce their output by calculating an expression. If |
| 84 | "`in`" is used in the expression, it refers to the expression transform’s |
| 85 | input; otherwise, the input must be an empty blob. In either case, the |
| 86 | parentheses are mandatory. |
| 87 | |
| 88 | Examples: |
| 89 | - `(in + 1)` adds 1 to its input. |
| 90 | - `(.bytes_per_sector * .sectors_per_cluster)` calculates the number of bytes |
| 91 | per cluster. |
| 92 | |
| 93 | == Struct transforms == |
| 94 | |
| 95 | - transform → "`struct`" "`{`" *struct-field "`}`" |
| 96 | - struct-field → [ "`.`" identifier ] "`<-`" transform "`;`" |
| 97 | |
| 98 | A struct transform applies each of its subtransforms in sequence to the input |
| 99 | blob. The result is an internal node, with the specified key used for the |
| 100 | result of each subtransform. If a subtransform has no corresponding key, its |
| 101 | result must be an internal node; the result’s keys will be merged into the |
| 102 | struct transform’s result. |
| 103 | |
| 104 | It must be possible to determine the size of the subblob each subtransform |
| 105 | consumes. Bithenge currently only handles the simple cases: it knows `uint32le` |
| 106 | needs 4 bytes, but not that `known_length(2) <- repeat{uint32le}` needs 8 |
| 107 | bytes. |
| 108 | |
| 109 | Example: |
| 110 | {{{ |
| 111 | struct { |
| 112 | .min <- uint32le; |
| 113 | .max <- uint32le; |
| 114 | .mid <- ((.min + .max) // 2); |
| 115 | <- repeat(3) {uint8}; |
| 116 | } |
| 117 | }}} |
| 118 | One possible output is `{"min": 0, "max": 8, "mid": 4, 0: 127, 1: 126, 2: 125}`. |
| 119 | |
| 120 | == Partial transforms == |
| 121 | |
| 122 | - transform → "`partial`" [ "`(`" expression "`)`" ] "`{`" transform "`}`" |
| 123 | |
| 124 | A partial transform applies its subtransform to a prefix of a blob. An |
| 125 | expression can be given to specify the offset within the blob at which the |
| 126 | subtransform is applied. |
| 127 | |
| 128 | Examples: |
| 129 | - `partial {uint8}` decodes the first byte of the arbitrary‐length input as an |
| 130 | integer. |
| 131 | - `partial(1) {uint8}` decodes the second byte of the arbitrary‐length input as |
| 132 | an integer. |
| 133 | |
| 134 | == Conditional transforms == |
| 135 | |
| 136 | - transform → "`if`" "`(`" expression "`)`" "`{`" transform "`}`" "`else`" "`{`" transform "`}`" |
| 137 | - struct-field → "`if`" "`(`" expression "`)`" "`{`" *struct-field "`}`" [ "`else`" "`{`" *struct-field "`}`" ] |
| 138 | |
| 139 | An if transform applies its first subtransform if the expression evaluates to |
| 140 | true, and the second subtransform if the expression evaluates to false. A |
| 141 | second form can be used in struct transforms: |
| 142 | `struct { ... if (...) {...} ... }` is equivalent to |
| 143 | `struct { ... <- if (...) { struct { ... } } else { struct { } }; ... }`. |
| 144 | |
| 145 | - transform → "`switch`" "`(`" expression "`)`" "`{`" *( (expression/"`else`") "`:`" transform "`;`" ) "`}`" |
| 146 | - struct-field → "`switch`" "`(`" expression "`)`" "`{`" *( (expression/"`else`") "`:`" "`{`" *struct-field "`}`" "`;`" ) "`}`" |
| 147 | |
| 148 | A switch transform compares the first expression to each of the case |
| 149 | expressions in turn, stopping when they compare equal and using that |
| 150 | subtransform. The last case can be "`else`", which will always be used if none |
| 151 | of the previous cases matched. The second form can be used in struct |
| 152 | transforms, much like the second form of "`if`". |
| 153 | |
| 154 | Examples: |
| 155 | - `if (little_endian) { uint32le } else { uint32be }` decodes a little‐ or |
| 156 | big‐endian integer, depending on `little_endian`. |
| 157 | - `struct { switch (.type) { 0: {.x <- uint8; .y <- uint8;}; 1: {.val <- uint16;}; } }` |
| 158 | decodes two 8‐bit integers or one 16‐bit integer depending on the value of |
| 159 | `.type`. |
| 160 | |
| 161 | == Repetition transforms == |
| 162 | |
| 163 | - transform → "`repeat`" [ "`(`" expression "`)`" ] "`{`" transform "`}`" |
| 164 | |
| 165 | A repeat transform applies its subtransform repeatedly to sequential subblobs |
| 166 | of the input. The output is an internal node with a member for each repetition, |
| 167 | with keys starting at `0`. If an expression is given, it determines the number |
| 168 | of times to repeat; otherwise, the subtransform is applied until it fails or |
| 169 | there is no input remaining. |
| 170 | |
| 171 | - transform → "`do`" "`{`" transform "`}`" "`while`" "`(`" expression "`)`" |
| 172 | |
| 173 | A do/while transform applies its subtransform repeatedly until the expression |
| 174 | evaluates to `false`. The output has the same format as repeat transform |
| 175 | output. Scope member expressions should be used in the expression to access the |
| 176 | output of the subtransform and determine when to stop. |
| 177 | |
| 178 | Examples: |
| 179 | - `repeat{uint8}` decodes each byte as an integer. |
| 180 | - `repeat(256) {uint32le}` decodes an array of 256 integers. |
| 181 | - `do { struct { .type <- uint8; .value <- uint32le; } } while (.type != 0)` |
| 182 | decodes types and values, stopping after a type of 0 is seen. |
| 183 | |
| 184 | == Expressions == |
| 185 | |
| 186 | - expression → "`true`" |
| 187 | - expression → "`false`" |
| 188 | - expression → integer |
| 189 | |
| 190 | Expressions can be boolean or integer literals. |
| 191 | |
| 192 | - expression → identifier |
| 193 | |
| 194 | Expressions can use parameters. |
| 195 | |
| 196 | - expression → "`in`" |
| 197 | |
| 198 | In an expression transform, "`in`" refers to the transform’s input. Otherwise, |
| 199 | it refers to the input of the whole transform in the definition. |
| 200 | |
| 201 | - expression → "`.`" identifier |
| 202 | |
| 203 | A scope member expression looks in each containing struct transform for a |
| 204 | member with the given key. It starts from the innermost struct transform and |
| 205 | searches as far as the whole transform being defined. |
| 206 | |
| 207 | - expression → expression "`.`" identifier |
| 208 | - expression → expression "`[`" expression "`]`" |
| 209 | |
| 210 | An expression can look for a member of another expression with a given key. |
| 211 | |
| 212 | - expression → expression "`[`" expression "`:`" "`]`" |
| 213 | - expression → expression "`[`" expression "`:`" expression "`]`" |
| 214 | - expression → expression "`[`" expression "`,`" expression "`]`" |
| 215 | |
| 216 | An expression can create a subblob of another expression. The first expression |
| 217 | specifies the blob and the second expression specifies the start offset within |
| 218 | the blob. If a comma is used, the third expression specifies the length of the |
| 219 | subblob; otherwise it specifies the end offset. If no third expression is |
| 220 | given, the subblob extends to the end of the blob. |
| 221 | |
| 222 | - expression → "`(`" expression "`)`" |
| 223 | - expression → expression operator expression |
| 224 | |
| 225 | Expressions can, of course, use binary operators. The operators and their |
| 226 | precedences are as follows: |
| 227 | |
| 228 | ||= Operator =||= Precedence =||= Associativity =||= Description =|| |
| 229 | || "`.`" || 5 || Left‐to‐right || Member (see above) || |
| 230 | || "`[]`" || 5 || Left‐to‐right || Member or subblob (see above) || |
| 231 | || "`*`" || 4 || Left‐to‐right || Integer multiplication || |
| 232 | || "`//`" || 4 || Left‐to‐right || [https://en.wikipedia.org/wiki/Modulo_operation Floored/euclidean] integer division; divisor must be positive || |
| 233 | || "`%`" || 4 || Left‐to‐right || [https://en.wikipedia.org/wiki/Modulo_operation Floored/euclidean] modulo operation; divisor must be positive || |
| 234 | || "`+`" || 3 || Left‐to‐right || Integer addition || |
| 235 | || "`-`" || 3 || Left‐to‐right || Integer subtraction || |
| 236 | || "`++`" || 3 || Left‐to‐right || Blob concatenation || |
| 237 | || "`<`" || 2 || Left‐to‐right || Integer less‐than || |
| 238 | || "`<=`" || 2 || Left‐to‐right || Integer less‐than‐or‐equal || |
| 239 | || "`>`" || 2 || Left‐to‐right || Integer greater‐than || |
| 240 | || "`>=`" || 2 || Left‐to‐right || Integer greater‐than‐or‐equal || |
| 241 | || "`==`" || 1 || Left‐to‐right || Equal‐to (not supported for internal nodes) || |
| 242 | || "`!=`" || 1 || Left‐to‐right || Unequal‐to (not supported for internal nodes) || |
| 243 | || "`&&`" || 0 || Left‐to‐right || Logical or || |
| 244 | || "`||`" || 0 || Left‐to‐right || Logical and || |
| 245 | |
| 246 | == Built‐in transforms == |
| 247 | |
| 248 | These transforms are implemented in C and included with Bithenge. Note that |
| 249 | precise names are preferred; scripts can define shorter aliases if necessary. |
| 250 | |
| 251 | ||= name =||= input =||= output =||= description =||= example =|| |
| 252 | ||ascii ||byte blob node ||string ||decodes some bytes as ASCII characters || `hex:6869` becomes `"hi"` || |
| 253 | ||bit ||1‐bit blob node ||boolean ||decodes a single bit || `1` becomes `true` || |
| 254 | ||bits_be ||byte blob node ||bit blob node ||decodes bytes as bits, starting with the most‐significant bit || `hex:0f` becomes `bit:00001111` || |
| 255 | ||bits_le ||byte blob node ||bit blob node ||decodes bytes as bits, starting with the least‐significant bit || `hex:0f` becomes `bit:11110000` || |
| 256 | ||known_length(len) ||blob node ||blob node ||requires the input to have a known length || || |
| 257 | ||nonzero_boolean ||integer ||boolean ||decodes a boolean where nonzero values are true || `0` becomes `false` || |
| 258 | ||uint8 ||1‐byte blob node ||integer node ||decodes a 1‐byte unsigned integer || `hex:11` becomes `17` || |
| 259 | ||uint16be ||2‐byte blob node ||integer node ||decodes a 2‐byte big‐endian unsigned integer || `hex:0201` becomes `513` || |
| 260 | ||uint16le ||2‐byte blob node ||integer node ||decodes a 2‐byte little‐endian unsigned integer || `hex:0101` becomes `257` || |
| 261 | ||uint32be ||4‐byte blob node ||integer node ||decodes a 4‐byte big‐endian unsigned integer || `hex:00000201` becomes `513` || |
| 262 | ||uint32le ||4‐byte blob node ||integer node ||decodes a 4‐byte little‐endian unsigned integer || `hex:01010000` becomes `257` || |
| 263 | ||uint64be ||8‐byte blob node ||integer node ||decodes a 8‐byte big‐endian unsigned integer || `hex:0000000000000201` becomes `513` || |
| 264 | ||uint64le ||8‐byte blob node ||integer node ||decodes a 8‐byte little‐endian unsigned integer || `hex:0101000000000000` becomes `257` || |
| 265 | ||uint_be(len) ||bit blob node ||integer node ||decodes bits as an unsigned integer, starting with the most‐significant bit || || |
| 266 | ||uint_le(len) ||bit blob node ||integer node ||decodes bits as an unsigned integer, starting with the least‐significant bit || || |
| 267 | ||zero_terminated ||byte blob node ||byte blob node||takes bytes up until the first `00` || `hex:7f0400` becomes `hex:7f04` || |