
Version 6 (modified by Sean Bartell, 7 years ago)

Requirements and Existing Tools

Structured Binary Data

This page will document my thoughts and design ideas for the structured binary data project. The project aims to address #317; a description of my overall approach can be found on the GSoC project page.

Requirements

  • View data on different levels; for instance, view a string as a whole or, if necessary, as the integer and sequence of bytes that comprise it.
  • Check whether files are consistent.
  • Handle broken files.
  • Don’t try to read the whole file at once.
  • Allow full modifications. Ideally, allow creation of a whole filesystem from scratch.
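
As a rough illustration of the first, third, and fourth requirements, the sketch below reads a single string field from a seekable stream and exposes it on three levels. The length-prefixed layout, the function name `read_string_views`, and the dictionary keys are all assumptions made for the example, not part of any existing design.

```python
import io


def read_string_views(stream, offset):
    """Return three views of a hypothetical length-prefixed UTF-8 string.

    Only the bytes belonging to the field are read, so the rest of the
    file never has to be loaded.
    """
    stream.seek(offset)  # jump straight to the field
    raw_length = stream.read(1)
    if len(raw_length) != 1:
        raise ValueError("truncated file: missing length byte")
    length = raw_length[0]

    raw_bytes = stream.read(length)
    if len(raw_bytes) != length:
        raise ValueError("truncated file: string shorter than its length")

    return {
        # Lowest level: the raw bytes of the whole field.
        "raw": raw_length + raw_bytes,
        # Middle level: the integer and the byte sequence.
        "parts": {"length": length, "bytes": raw_bytes},
        # Highest level: the decoded text; invalid bytes are replaced
        # instead of aborting, so a damaged file can still be shown.
        "value": raw_bytes.decode("utf-8", errors="replace"),
    }


# Two unrelated bytes, then the field, then one trailing byte.
example = io.BytesIO(b"\x00\x00\x05hello\xff")
views = read_string_views(example, 2)
```

Here `views["value"]` is the decoded text, while `views["parts"]` and `views["raw"]` expose the lower levels of the same field.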

Existing Tools

I am researching existing tools related to my project, so they can be used for inspiration.

Construct

A Python library for creating declarative structure definitions. Each instance of the Construct class has a name and knows how to read from a stream, write to a stream, and determine its length. Some predefined Construct subclasses use an arbitrary Python function evaluated at runtime, or behave differently depending on whether sub‐Constructs raise exceptions. Const wraps a sub‐Construct and checks that the value is correct. Lazy Constructs are also available.

Unfortunately, if you change the size of a structure, you still have to update everything that depends on it manually.
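
To make the description concrete, here is a minimal sketch of the same idea in plain Python. It does not use Construct itself: the classes `Field`, `UInt16`, `Const`, and `Struct` are hypothetical stand-ins that only mirror the concepts described above, and the real library's API may differ.

```python
import io
import struct


class Field:
    """Hypothetical base class: a named unit that can parse, build, and size itself."""

    def __init__(self, name):
        self.name = name

    def parse(self, stream):
        raise NotImplementedError

    def build(self, value, stream):
        raise NotImplementedError

    def sizeof(self):
        raise NotImplementedError


class UInt16(Field):
    """Big-endian unsigned 16-bit integer."""

    def parse(self, stream):
        return struct.unpack(">H", stream.read(2))[0]

    def build(self, value, stream):
        stream.write(struct.pack(">H", value))

    def sizeof(self):
        return 2


class Const(Field):
    """Wraps a sub-field and checks that the parsed value is the expected one."""

    def __init__(self, sub, expected):
        super().__init__(sub.name)
        self.sub = sub
        self.expected = expected

    def parse(self, stream):
        value = self.sub.parse(stream)
        if value != self.expected:
            raise ValueError(f"{self.name}: expected {self.expected!r}, got {value!r}")
        return value

    def build(self, value, stream):
        # The caller's value is ignored; the constant is always written.
        self.sub.build(self.expected, stream)

    def sizeof(self):
        return self.sub.sizeof()


class Struct(Field):
    """A sequence of named sub-fields, parsed and built in order."""

    def __init__(self, name, *fields):
        super().__init__(name)
        self.fields = fields

    def parse(self, stream):
        return {f.name: f.parse(stream) for f in self.fields}

    def build(self, value, stream):
        for f in self.fields:
            f.build(value.get(f.name), stream)

    def sizeof(self):
        return sum(f.sizeof() for f in self.fields)


# Declarative definition of a made-up header: a magic number, then a length.
header = Struct("header", Const(UInt16("magic"), 0xCAFE), UInt16("length"))

parsed = header.parse(io.BytesIO(b"\xca\xfe\x00\x10"))

out = io.BytesIO()
header.build({"length": 16}, out)
```

Nothing in this sketch ties the `length` value to the data it describes, so a caller who resizes that data must still update `length` by hand, which is the limitation noted above.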

TODO look at issues and forks.

BinData

Makes good use of Ruby syntax; mostly has the same features as Construct.

Imperative DSLs

DSLs in this category are used in an obvious, deterministic manner, and complex structures can’t be edited. They are simple imperative languages in which fields, structures, bitstructures, and arrays can be defined. The length, decoded value, and presence of fields can be determined by expressions using any previously decoded field, and structures can use if/while/continue/break and similar statements. Structures can inherit from other structures, meaning that the parent’s fields are present at the beginning of the child. Statements can move to a different offset in the input data. There may be a real programming language that can be used along with the DSL.
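
The imperative style can be sketched in ordinary Python, with plain control flow standing in for the DSL statements. The record layout below (a type byte, a little-endian length, a payload, and an optional checksum byte) and the function names are invented for illustration and do not come from any of the tools listed here.

```python
import struct


def parse_record(data, offset=0):
    """Decode one hypothetical record, field by field, starting at `offset`."""
    fields = {}

    # Fixed-size field: a one-byte record type.
    fields["type"] = data[offset]
    offset += 1

    # The length of `payload` is given by a previously decoded field.
    (fields["length"],) = struct.unpack_from("<H", data, offset)
    offset += 2
    fields["payload"] = data[offset:offset + fields["length"]]
    offset += fields["length"]

    # The presence of `checksum` depends on a previously decoded field.
    if fields["type"] == 2:
        fields["checksum"] = data[offset]
        offset += 1

    return fields, offset


def parse_all(data):
    """Decode records back to back until the input is exhausted."""
    records = []
    offset = 0
    while offset < len(data):
        record, offset = parse_record(data, offset)
        records.append(record)
    return records


sample = (
    bytes([1]) + struct.pack("<H", 3) + b"abc"
    + bytes([2]) + struct.pack("<H", 2) + b"hi" + bytes([0x7F])
)
records = parse_all(sample)
```

Decoding is easy to follow because each step depends only on what has already been read, but the code says nothing about how to write a modified record back, which is one way to see why complex structures are hard to edit with this approach.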

PyFFI
  Lets you create or modify files instead of just reading them. Fields can refer to blocks of data elsewhere in the file. Uses an XML format.
Synalyze It!
  Not completely imperative; if you declare optional structs where part of the data is constant, the correct struct will be displayed. Has a Graphviz export of file structure. Uses an XML format.
Other free tools
  Wireshark Generic Dissector.
Other proprietary tools
  Hex Editor Neo.

Less interesting tools

Simple formats in hex editors
  These support static fields and dynamic lengths only: FlexHex, HexEdit, Hex Workshop, and Okteta.
Simple formats elsewhere
  ffe, Node Packet, and Scapy can only handle trivial structures. Python’s struct and VStruct use concise string formats to describe simple structures. Hachoir uses Python for most things.
Protocol definition formats
  ASN.1, MIDL, Piqi, and other IPC implementations go in the other direction: they generate a binary format from a text description of a structure. ASN.1 in particular has many features.
Wireshark and tcpdump
  As the Construct wiki notes, you would expect these developers to have some sort of DSL, but they just use C for everything. Wireshark does use ASN.1, Diameter, and MIDL for protocols developed with them.
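
For reference, the "concise string formats" used by Python’s struct module, mentioned under simple formats above, look like the following. The header layout and field names are made up for the example; only the struct calls themselves are standard library behavior.

```python
import struct

# "<"  little-endian, no padding
# "4s" four raw bytes
# "H"  unsigned 16-bit integer
# "I"  unsigned 32-bit integer
HEADER = struct.Struct("<4sHI")

packed = HEADER.pack(b"DATA", 1, 1024)
magic, version, size = HEADER.unpack(packed)
```

The whole layout fits in one short string, but the format cannot express a field whose length or presence depends on another field, which is why such tools only cover simple structures.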