|Version 1 (modified by 11 years ago) ( diff ),|
Bithenge consists of a library and tools that make working with structured binary data easier. The project was started as part of Google Summer of Code 2012 to address #317. The code is at lp:~wtachi/helenos/bithenge and periodic updates are posted to HelenOS-devel.
Bithenge is intended to be a general‐purpose library with a variety of uses, but only one tool has been implemented so far. It is a program that reads a script file describing a binary format, applies it to binary data, and prints the decoded result.
The Bithenge source code is in
in the bithenge branch. It is automatically built in HelenOS to create the
bithenge application. For Linux, it can be built by running
make -f Makefile.linux, first in
uspace/lib/bithenge and then in
Bithenge represents all data in the form of a data structure called a “tree,” similar to the data structure represented by JSON. A tree consists of a boolean node, integer node, string node, or blob node, or an internal node with children. A boolean node holds a boolean value, an integer node holds a signed integer, and a string holds a Unicode string.
A blob node represents an arbitrary sequence of raw bytes or bits. Blob nodes are polymorphic, allowing any source of raw binary data to be used. Bithenge includes blob node implementations for in‐memory buffers, files, and block devices. An implementation has also been written that reads another task’s virtual memory, but it hasn’t been committed because it’s unclear whether it will be useful in its current form.
An internal node has an arbitrary number of children, each with a unique key. The key can be any node other than an internal node. Arrays can be represented by internal nodes with integer keys starting at 0. The tree node can provide children in an arbitrary order; the order will be used when displaying the tree, but should have no semantic significance. Internal nodes are polymorphic and can delay creation of child nodes until necessary, so keeping the whole tree in memory can be avoided.
Internal nodes are currently responsible for freeing their own children. In the future, it should be possible for there to be multiple references to the same node, but it isn’t clear whether this should be implemented with symbolic links, an acyclic graph with reference counting, or a full graph.
Note that all interpreted data is represented in Bithenge with nodes. Therefore, the word “blob” usually refers to a blob node, and so on.
A decoded tree for a FAT filesystem might look something like this:
○───bits─▶16 │ ├───fat──▶○ │ ├───0───▶0xfff0 │ ├───1───▶0xffff │ └───2───▶0x0000 │ └───root──▶○ ├───0───▶○ │ ├───name───▶README.TXT │ └───size───▶0x1351 │ └───1───▶○ ├───name───▶KERNEL.ELF └───size───▶0x38e9a2
A transform is a function from an input tree and zero or more argument trees to
an output tree. One example is
uint32le, which takes a 4‐byte blob node as
the input tree and provides an integer node as the output tree. Another example
uint_le(len), which takes a bit blob node as the input tree and an integer
node as an argument tree, and produces an integer node as the output tree.
Several transforms, including
uint_le(len), are built in to
Bithenge. More complicated transforms can be created by a format provider, as
explained below. For instance, the
file contains a
fat_filesystem transform that takes a byte node as the input
tree and produces a complex output tree with various decoded information about
Generally speaking, a data provider is anything that provides data from outside Bithenge in the form of a tree. The Bithenge library includes functions for various data providers, and can automatically create a tree from a prefixed string:
| ||Byte blob node|| ||Read the contents of a file. This is the default if no prefix is used.|
| ||Byte blob node|| ||Read the contents of a block device. (HelenOS only.)|
| ||Byte blob node|| ||Use a string of hexadecimal characters to create a blob node.|
These prefixes can also be used with the
A format provider creates transforms based on external data, or vice versa. The only implemented format provider is the script parser, which creates a transform from a file written in the Bithenge scripting language. A future idea is the DWARF format provider, which would read DWARF information from an executable and create a transform that could be applied to a task memory dump.
The Bithenge library includes APIs for working with trees and creating and applying transforms. It also contains the built‐in transforms, data providers, and format providers. The library overview describes the API and conventions used; detailed documentation is included in Doxygen comments in the source.
A client is any program that provides a user interface to the Bithenge library.
The only implemented client is the simple
bithenge program described above.
In the future, there could be an interactive browser for decoded trees, or a
program to generate C code based on a script file.