= Structured Binary Data =
[[PageOutline(2-3)]]

This page will document my thoughts and design ideas for the structured binary data project. The project aims to address #317; a description of my overall approach can be found on the [https://www.google-melange.com/gsoc/project/google/gsoc2012/wtachi/46005 GSoC project page].

== Requirements ==
 * View on different levels; for instance, view the integer and sequence of bytes comprising a string if necessary.
 * Check whether files are consistent.
 * Handle broken files.
 * Don’t try to read the whole file at once.
 * Allow full modifications. Ideally, allow creation of a whole filesystem from scratch.

== Existing Tools ==
I am researching existing tools related to my project, so they can be used for inspiration.

=== [http://construct.wikispaces.com/ Construct] ===
A Python library for creating declarative structure definitions. Each instance of the `Construct` class has a name, and knows how to read from a stream, write to a stream, and determine its length. Some predefined `Construct` subclasses use an arbitrary Python function evaluated at runtime, or behave differently depending on whether sub‐`Construct`s throw exceptions. `Const` uses a sub‐`Construct` and makes sure the value is correct. Also has lazy `Construct`s. Unfortunately, if you change the size of a structure, you still have to change everything else manually. TODO: look at issues and forks.

=== [http://bindata.rubyforge.org/ BinData] ===
Makes good use of Ruby syntax; mostly has the same features as Construct.

=== Imperative DSLs ===
DSLs in this category are used in an obvious, deterministic manner, and complex structures can’t be edited. They are simple imperative languages in which fields, structures, bitstructures, and arrays can be defined. The length, decoded value, and presence of fields can be determined by expressions using any previously decoded field, and structures can use `if`/`while`/`continue`/`break` and similar statements.
Structures can inherit from other structures, meaning that the parent’s fields are present at the beginning of the child. Statements can move to a different offset in the input data. There may be a real programming language that can be used along with the DSL.

 [http://pyffi.sourceforge.net/ PyFFI]::
   Lets you create or modify files instead of just reading them. Fields can refer to blocks of data elsewhere in the file. Uses an XML format.
 [http://www.synalysis.net/ Synalyze It!]::
   Not completely imperative; if you declare optional structs where part of the data is constant, the correct struct will be displayed. Has a Graphviz export of file structure. Uses an XML format.
 Other free::
   [http://wsgd.free.fr/ Wireshark Generic Dissector].
 Other proprietary::
   [http://www.hhdsoftware.com/doc/hex-editor/language-reference-overview.html Hex Editor Neo].

=== Less interesting tools ===
 Simple formats in hex editors::
   These support static fields and dynamic lengths only: [http://www.flexhex.com/ FlexHex], [http://hexedit.com/ HexEdit], [http://www.hexworkshop.com/ Hex Workshop], and [http://kde.org/applications/utilities/okteta/ Okteta].
 Simple formats elsewhere::
   [http://ff-extractor.sourceforge.net/ ffe], [http://bigeasy.github.com/node-packet/ Node Packet], and [https://www.secdev.org/projects/scapy/ Scapy] can only handle trivial structures. [http://docs.python.org/library/struct.html Python’s struct] and [https://github.com/ToxicFrog/vstruct VStruct] use concise string formats to describe simple structures. [https://bitbucket.org/haypo/hachoir Hachoir] uses Python for most things.
 Protocol definition formats::
   [https://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One ASN.1], [https://en.wikipedia.org/wiki/Microsoft_Interface_Definition_Language MIDL], [http://piqi.org/ Piqi], and other IPC implementations go in the other direction: they generate a binary format from a text description of a structure. ASN.1 in particular has many features.
 [https://www.wireshark.org/ Wireshark] and [http://www.tcpdump.org/ tcpdump]::
   As the Construct wiki notes, you would expect these developers to have some sort of DSL, but they just use C for everything. Wireshark does use ASN.1, Diameter, and MIDL for protocols developed with them.
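
To make the “concise string format” approach of Python’s struct (listed above) concrete, here is a minimal sketch. The record layout, the `DEMO` magic value, and the helper names `pack_record` and `unpack_record` are invented for illustration and do not come from any real file format. It also shows the limit noted above: the fixed-size header fits in one format string, but the variable-length payload has to be handled by hand.

```python
import struct

# Hypothetical record layout, invented for this example:
#   magic    4 bytes, must equal b"DEMO"
#   version  unsigned 16-bit integer, little-endian
#   length   unsigned 32-bit integer, little-endian (payload size in bytes)
# "<" selects little-endian byte order with no padding between fields.
HEADER = struct.Struct("<4sHI")


def pack_record(version, payload):
    """Serialize a header followed by the raw payload bytes."""
    return HEADER.pack(b"DEMO", version, len(payload)) + payload


def unpack_record(data):
    """Parse one record; raise ValueError if it is inconsistent."""
    magic, version, length = HEADER.unpack_from(data, 0)
    if magic != b"DEMO":
        raise ValueError("bad magic")
    # The format string cannot express "length bytes follow",
    # so the payload is sliced out manually.
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return version, payload


if __name__ == "__main__":
    record = pack_record(2, b"hello")
    print(HEADER.size)            # size of the fixed header in bytes
    print(unpack_record(record))  # round-trips the version and payload
```

Changing the payload size here means recomputing the length field in `pack_record`, which is a small instance of the “change everything else manually” problem described for Construct.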