wiki:Bithenge

Version 2 (modified by Sean Bartell, 11 years ago) ( diff )

Mention new design

Bithenge

Bithenge consists of a library and tools that make working with structured binary data easier. The project was started by Sean Bartell as part of Google Summer of Code 2012 to address #317. The code is at lp:~wtachi/helenos/bithenge and periodic updates are posted to HelenOS-devel.

Bithenge is intended to be a general‐purpose library with a variety of uses, but only one tool has been implemented so far. It is a program that reads a script file describing a binary format, applies it to binary data, and prints the decoded result.

Quick start

The Bithenge source code is in uspace/app/bithenge and uspace/lib/bithenge in the bithenge branch. It is automatically built in HelenOS to create the bithenge application. For Linux, it can be built by running make -f Makefile.linux, first in uspace/lib/bithenge and then in uspace/app/bithenge.

The program can be run with bithenge <script file> <binary file> to decode a file. There are some example files in uspace/dist/src/bithenge; see the syntax documentation.

Overview

Trees

Bithenge represents all data in the form of a data structure called a “tree,” similar to the data structure represented by JSON. A tree consists of a boolean node, integer node, string node, or blob node, or an internal node with children. A boolean node holds a boolean value, an integer node holds a signed integer, and a string holds a Unicode string.

A blob node represents an arbitrary sequence of raw bytes or bits. Blob nodes are polymorphic, allowing any source of raw binary data to be used. Bithenge includes blob node implementations for in‐memory buffers, files, and block devices. An implementation has also been written that reads another task’s virtual memory, but it hasn’t been committed because it’s unclear whether it will be useful in its current form.

An internal node has an arbitrary number of children, each with a unique key. The key can be any node other than an internal node. Arrays can be represented by internal nodes with integer keys starting at 0. The tree node can provide children in an arbitrary order; the order will be used when displaying the tree, but should have no semantic significance. Internal nodes are polymorphic and can delay creation of child nodes until necessary, so keeping the whole tree in memory can be avoided.

Internal nodes are currently responsible for freeing their own children. In the future, it should be possible for there to be multiple references to the same node, but it isn’t clear whether this should be implemented with symbolic links, an acyclic graph with reference counting, or a full graph.

Note that all interpreted data is represented in Bithenge with nodes. Therefore, the word “blob” usually refers to a blob node, and so on.

A decoded tree for a FAT filesystem might look something like this:

○───bits─▶16
│
├───fat──▶○
│         ├───0───▶0xfff0
│         ├───1───▶0xffff
│         └───2───▶0x0000
│
└───root──▶○
           ├───0───▶○
           │        ├───name───▶README.TXT
           │        └───size───▶0x1351
           │
           └───1───▶○
                    ├───name───▶KERNEL.ELF
                    └───size───▶0x38e9a2

Transforms

A transform is a function from an input tree and zero or more argument trees to an output tree. One example is uint32le, which takes a 4‐byte blob node as the input tree and provides an integer node as the output tree. Another example is uint_le(len), which takes a bit blob node as the input tree and an integer node as an argument tree, and produces an integer node as the output tree.

Several transforms, including uint32le and uint_le(len), are built in to Bithenge. More complicated transforms can be created by a format provider, as explained below. For instance, the fat.bh file contains a fat_filesystem transform that takes a byte node as the input tree and produces a complex output tree with various decoded information about the filesystem.

Data providers

Generally speaking, a data provider is anything that provides data from outside Bithenge in the form of a tree. The Bithenge library includes functions for various data providers, and can automatically create a tree from a prefixed string:

Prefix Tree Type Example Description
file: Byte blob node file:/textdemo Read the contents of a file. This is the default if no prefix is used.
block: Byte blob node block:bd/initrd Read the contents of a block device. (HelenOS only.)
hex: Byte blob node hex:01000000 Use a string of hexadecimal characters to create a blob node.

These prefixes can also be used with the bithenge program.

Format providers

A format provider creates transforms based on external data, or vice versa. The only implemented format provider is the script parser, which creates a transform from a file written in the Bithenge scripting language. A future idea is the DWARF format provider, which would read DWARF information from an executable and create a transform that could be applied to a task memory dump.

Library

The Bithenge library includes APIs for working with trees and creating and applying transforms. It also contains the built‐in transforms, data providers, and format providers. The library overview describes the API and conventions used; detailed documentation is included in Doxygen comments in the source.

Clients

A client is any program that provides a user interface to the Bithenge library. The only implemented client is the simple bithenge program described above. In the future, there could be an interactive browser for decoded trees, or a program to generate C code based on a script file.

New design

I am currently working on a new constraint‐based design, where the format specification consists of statements that must be true about the raw and interpreted data, and the program figures out how to solve these constraints. This design should make a number of things possible, particularly editing and bidirectional encoding/decoding. As of December 2012, I have a proof‐of‐concept that generates Python code to encode or decode a trivial structure, and I’m working on improving it.

More information

Note: See TracWiki for help on using the wiki.