
Changes between Version 11 and Version 12 of StructuredBinaryData

2012-06-29T21:48:56Z
Sean Bartell

elaborate on future language ideas


  • StructuredBinaryData

    v11 v12  
    197 197 There are some example files in `uspace/dist/src/bithenge`.
    199 == Future features ==
    201 * Parameters for transforms
    202   * Keyword parameters only?
    203 * Expressions depending on previously decoded values
    204 * Enumerations
    205 * Variables
    206 * Transforming internal nodes
    207 * Assertions
    208   * Transforms that return their input
    209   * Different levels (expected, required, mandatory)
    210 * Error handling
    211 * Hidden fields
    212 * Iteration/recursion/repetition
    213 * Seeking and detecting position
    214 * Checking alignment
    215 * Reference to structures at other offsets
    216   * How to know what blob to go within?
    217   * How to know current offset within that blob?
    218   * Could be relative to multiple things at once...
    219   * Blob node can be an inherited parameter
    220     * This is also useful for endianness
    221   * Offset could be an automatically incremented parameter
    222 * Ad hoc tweaks at runtime
     199== Future language ideas ==
     201In approximate order of priority.
     203 Transform parameters:: Currently, a transform can only have one input. This
     204   makes various things impossible, particularly cases where the way a field is
     205   decoded depends on previous fields. Parameters will allow a transform to use
     206   multiple inputs, allowing things like
     207   `.len <- uint32le; .str <- ascii <- known_length(.len);`. It still needs to
     208   be determined whether parameters should be named or just positional.
     209 Simple expressions:: These are needed to determine the value of parameters.
     210   Simple expressions will allow transforms like `known_length(.len)` or
     211   `known_length(8)`; more complicated arithmetic expressions will be developed
     212   later.
     213 Conditional transforms:: A way to apply different transforms depending on an
     214   expression. For example, something like:
     215   `if (.has_extra) { struct { .extra <- uint32le; } }`.
     216 Repetition:: Transforms may need to be repeated a known number of times, until
     217   the end of the data, or until the transform indicates that repetition should
     218   stop. For instance, `repeat(.len) {uint32le;}`. The result could be a tree
     219   like `{0: 1351, 1: 17}`.
     220 Subblobs:: When there are pointers to other offsets in the blob, the script
     221   could pass the whole blob as a parameter and apply transforms to subblobs.
     222   This is essential for non‐sequential blobs like filesystems.
     223 Bitfields:: `struct` will be extended to work with bits instead of just bytes.
     224 Assertions:: These could be implemented as transforms that don't actually
     225   change the input. There could be multiple levels, ranging from “warning” to
     226   “fatal error”.
     227 Enumerations:: An easier way to handle many constant values, like
     228   `enum { 0: "none", 1: "file", 2: "directory", 3: "symlink" }`.
     229 Transforming internal nodes:: After binary data is decoded into a tree, it
     230   should be possible to apply additional transforms that interpret the
     231   decoded tree. For instance, after the FAT and directory entries of a FAT
     232   filesystem have been decoded, a further transform could determine the data
     233   for each file.
     234 Hidden fields:: Some fields, such as length fields, are no longer interesting
     235   after the data is decoded, so they should be hidden by default.
     236 Search:: Decoding may require searching for a fixed sequence of bytes in the
     237   data.
     238 Automatic parameters:: It could be useful to automatically pass some
     239   parameters rather than computing and passing them explicitly. For instance,
     240   a version number that affects the format of many different parts of the file
     241   could be passed automatically, without having to write it out every time. A
     242   more advanced automatic parameter could keep track of the current offset
     243   being decoded within a blob.
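
As a rough illustration of how three of the ideas above (transform parameters, repetition, and enumerations) might behave once implemented, here is a minimal Python sketch. It is not Bithenge code and does not reflect the actual implementation. The record layout and the helper names `known_length_ascii`, `repeat`, `KIND`, and `decode_record` are assumptions made up for this example; only `uint32le`, `known_length`, `ascii`, and the enumeration values come from the text above.

```python
import struct


def uint32le(data, offset):
    """Decode a little-endian 32-bit unsigned integer; return (value, new_offset)."""
    (value,) = struct.unpack_from("<I", data, offset)
    return value, offset + 4


def known_length_ascii(length):
    """Build a transform that reads `length` bytes as ASCII.

    Models `ascii <- known_length(.len)`: the length is a parameter
    taken from a previously decoded field. (Hypothetical helper.)
    """
    def transform(data, offset):
        end = offset + length
        if end > len(data):
            raise ValueError("not enough data for a string of the given length")
        return data[offset:end].decode("ascii"), end
    return transform


def repeat(count, transform):
    """Build a transform that applies `transform` a known number of times.

    The result is a tree keyed by index, e.g. {0: 1351, 1: 17}.
    (Hypothetical helper.)
    """
    def run(data, offset):
        items = {}
        for i in range(count):
            items[i], offset = transform(data, offset)
        return items, offset
    return run


# Enumeration: maps constant values to readable names.
KIND = {0: "none", 1: "file", 2: "directory", 3: "symlink"}


def decode_record(data, offset=0):
    """Decode an invented record: len, str, kind, count, values."""
    node = {}
    node["len"], offset = uint32le(data, offset)
    node["str"], offset = known_length_ascii(node["len"])(data, offset)
    raw_kind, offset = uint32le(data, offset)
    node["kind"] = KIND.get(raw_kind, raw_kind)  # unknown values stay numeric
    node["count"], offset = uint32le(data, offset)
    node["values"], offset = repeat(node["count"], uint32le)(data, offset)
    return node, offset


# Invented sample data matching the layout above.
sample = (struct.pack("<I", 5) + b"hello"
          + struct.pack("<I", 1)
          + struct.pack("<I", 2) + struct.pack("<II", 1351, 17))
tree, end_offset = decode_record(sample)
print(tree)
```

For the sample bytes, the `values` node comes out as `{0: 1351, 1: 17}`, the tree shape suggested for repetition above. Here `len` and `count` remain visible in the result; with hidden fields they would be dropped from the default output.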
    224 245 === Constraint‐based version ===
    377 398 [ Space‐filling curves]
    378 399 look cool, but this project is about ''avoiding'' looking at raw binary data.
    382 The next step is to design and implement the domain-specific language. I
    383 will do this incrementally: start with basic features, design them,
    384 implement them, and make an example, then move on to more advanced
    385 features, and so on. I will post an update after each step, especially
    386 after each part of the design. This is different from the schedule I
    387 gave in my proposal, but my goal for July 1st is the same: a program
    388 that can use a format specification file to interpret data and dump it
    389 in JSON format.