Changes between Version 11 and Version 12 of StructuredBinaryData

Timestamp:: 2012-06-29T21:48:56Z
Author:: Sean Bartell
Comment:: elaborate on future language ideas

StructuredBinaryData

-              v11
+              v12
 There are some example files in `uspace/dist/src/bithenge`.
+== Future features ==
+* Parameters for transforms
+  * Keyword parameters only?
+* Expressions depending on previously decoded values
+* Enumerations
+* Variables
+* Transforming internal nodes
+* Assertions
+  * Transforms that return their input
+  * Different levels (expected, required, mandatory)
+* Error handling
+* Hidden fields
+* Iteration/recursion/repetition
+* Seeking and detecting position
+* Checking alignment
+* Reference to structures at other offsets
+  * How to know what blob to go within?
+  * How to know current offset within that blob?
+  * Could be relative to multiple things at once...
+  * Blob node can be an inherited parameter
+    * This is also useful for endianness
+  * Offset could be an automatically incremented parameter
+* Ad hoc tweaks at runtime
+== Future language ideas ==
+In approximate order of priority.
+ Transform parameters:: Currently, a transform can only have one input. This
+   makes various things impossible, particularly cases where the way a field is
+   decoded depends on previous fields. Parameters will allow a transform to use
+   multiple inputs, allowing things like
+   `.len <- uint32le; .str <- ascii <- known_length(.len);`. It still needs to
+   be determined whether parameters should be named or just positional.
+ Simple expressions:: These are needed to determine the value of parameters.
+   Simple expressions will allow transforms like `known_length(.len)` or
+   `known_length(8)`; more complicated arithmetic expressions will be developed
+   later.
+ Conditional transforms:: A way to apply different transforms depending on an
+   expression. For example, something like:
+   `if (.has_extra) { struct { .extra <- uint32le; } }`.
+ Repetition:: Transforms may need to be repeated a known number of times, until
+   the end of the data, or until the transform indicates that repetition should
+   stop. For instance, `repeat(.len) {uint32le;}`. The result could be a tree
+   like `{0: 1351, 1: 17}`.
+ Subblobs:: When there are pointers to other offsets in the blob, the script
+   could pass the whole blob as a parameter and apply transforms to subblobs.
+   This is essential for non‐sequential blobs like filesystems.
+ Bitfields:: `struct` will be extended to work with bits instead of just bytes.
+ Assertions:: These could be implemented as transforms that don't actually
+   change the input. There could be multiple levels, ranging from “warning” to
+   “fatal error”.
+ Enumerations:: An easier way to handle many constant values, like
+   `enum { 0: "none", 1: "file", 2: "directory", 3: "symlink" }`.
+ Transforming internal nodes:: After binary data is decoded into a tree, it
+   should be possible to apply additional transforms that interpret the
+   decoded values. For instance, after the FAT and directory entries of a FAT
+   filesystem have been decoded, a further transform could determine the data
+   for each file.
+ Hidden fields:: Some fields, such as length fields, are no longer interesting
+   after the data is decoded, so they should be hidden by default.
+ Search:: Decoding may require searching for a fixed sequence of bytes in the
+   data.
+ Automatic parameters:: It could be useful to automatically pass some
+   parameters rather than computing and passing them explicitly. For instance,
+   a version number that affects the format of many different parts of the file
+   could be passed automatically, without having to write it out every time. A
+   more advanced automatic parameter could keep track of the current offset
+   being decoded within a blob.
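The transform-parameter and repetition ideas above can be sketched in Python to show what "decoding depends on a previously decoded field" means in practice. This is only an illustration under stated assumptions, not Bithenge code: the helper names `known_length_ascii`, `repeat`, `decode_string`, and `decode_list` are hypothetical, and each transform is modeled as a function that takes a blob and an offset and returns the decoded value plus the new offset.

```python
import struct


def uint32le(blob, offset):
    """Decode a little-endian unsigned 32-bit integer at `offset`."""
    (value,) = struct.unpack_from("<I", blob, offset)
    return value, offset + 4


def known_length_ascii(length):
    """Hypothetical parameterized transform: decode `length` bytes as ASCII."""
    def transform(blob, offset):
        end = offset + length
        if end > len(blob):
            raise ValueError("blob is shorter than the declared length")
        return blob[offset:end].decode("ascii"), end
    return transform


def repeat(count, transform):
    """Hypothetical repetition: apply `transform` a known number of times."""
    def repeated(blob, offset):
        tree = {}
        for index in range(count):
            tree[index], offset = transform(blob, offset)
        return tree, offset
    return repeated


def decode_string(blob):
    """Roughly `.len <- uint32le; .str <- ascii <- known_length(.len);`."""
    length, offset = uint32le(blob, 0)
    # The second field's transform is built from the first field's value.
    text, _ = known_length_ascii(length)(blob, offset)
    return {"len": length, "str": text}


def decode_list(blob):
    """Roughly `.len <- uint32le; .items <- repeat(.len) {uint32le;}`."""
    count, offset = uint32le(blob, 0)
    items, _ = repeat(count, uint32le)(blob, offset)
    return {"len": count, "items": items}


if __name__ == "__main__":
    print(decode_string(struct.pack("<I5s", 5, b"hello")))
    # {'len': 5, 'str': 'hello'}
    print(decode_list(struct.pack("<III", 2, 1351, 17)))
    # {'len': 2, 'items': {0: 1351, 1: 17}}
```

In this sketch a parameter is simply a closure argument. That avoids the named-versus-positional question, which remains open for the real language.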
 === Constraint‐based version ===
 …
 [http://corte.si/posts/visualisation/binvis/index.html Space‐filling curves]
 look cool, but this project is about ''avoiding'' looking at raw binary data.
-The next step is to design and implement the domain-specific language. I
-will do this incrementally: start with basic features, design them,
-implement them, and make an example, then move on to more advanced
-features, and so on. I will post an update after each step, especially
-after each part of the design. This is different from the schedule I
-gave in my proposal, but my goal for July 1st is the same: a program
-that can use a format specification file to interpret data and dump it
-in JSON format.