Changes between Version 6 and Version 7 of StructuredBinaryData


Timestamp:
2012-05-15T04:45:53Z (12 years ago)
Author:
Sean Bartell
Comment:

Add requirements, interesting formats, misc. ideas, and more existing tools.

  • StructuredBinaryData

    v6 v7  
    1010== Requirements ==
    1111
    12 * View on different levels; for instance, view the integer and sequence of
    13   bytes comprising a string if necessary.
    14 * Check whether files are consistent.
    15 * Handle broken files.
    16 * Don’t try to read the whole file at once.
    17 * Allow full modifications. Ideally, allow creation of a whole filesystem from scratch.
     12* Work in HelenOS—this means the code must be in C and/or an easily ported
     13  language like Lua.
     14* View on different layers; for instance, switch between viewing the formatted
     15  date and time for a FAT directory entry, the integers, and the original
     16  bytes.
     17* Check whether data is valid; handle broken data reasonably well.
     18* Parse pieces of the data lazily; don’t try to read everything at once.
     19* Work in both directions (parsing and building) without requiring too much
     20  extra effort.
     21* Support full modifications. Ideally, allow creation of a whole filesystem
     22  from scratch.
     23
     24== Interesting formats ==
     25
     26These formats will be interesting and/or difficult to handle. I will keep them
     27in mind when designing the library.
     28
     29* Filesystem allocation tables, which should be kept consistent with the actual
     30  usage of the disk.
     31* Filesystem logs, which should be applied to the rest of the disk before
     32  interpreting it.
     33* Formats where the whole file can have either endianness depending on a field
     34  in the header.
     35* The [http://www.blender.org/development/architecture/blender-file-format/ Blender file format]
     36  is especially dynamic. When Blender saves a file, it just copies the
     37  structures from memory and translates the pointers. Since each Blender
     38  version and architecture will have different structures, the output file
     39  includes a header describing the fields and binary layout of each structure.
     40  When the file is loaded, the header is read first and the structures will be
     41  translated as necessary.
     42* If the language is powerful enough, it might be possible to have a native
     43  description of Zlib and other compression formats.
     44* It could be interesting to parse ARM or x86 machine code.
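The "either endianness" case in the list above can be sketched briefly. This is only an illustration under stated assumptions: the 8-byte header layout and the field names `version` and `offset` are hypothetical, and the `II`/`MM` byte-order marker is borrowed from the convention TIFF uses.

```python
import struct

def parse_header(data: bytes) -> dict:
    """Parse a hypothetical 8-byte header whose first field selects endianness."""
    marker = data[:2]
    if marker == b"II":
        prefix = "<"   # little-endian
    elif marker == b"MM":
        prefix = ">"   # big-endian
    else:
        raise ValueError(f"unknown byte-order marker: {marker!r}")
    # Every later field is decoded with the byte order chosen above.
    version, offset = struct.unpack_from(prefix + "HI", data, 2)
    return {"byte_order": prefix, "version": version, "offset": offset}

# The same logical header, stored in both byte orders.
little = b"II" + struct.pack("<HI", 42, 8)
big = b"MM" + struct.pack(">HI", 42, 8)
```

Both `parse_header(little)` and `parse_header(big)` yield the same `version` and `offset`, even though the stored bytes differ. A description language would need some way to carry the chosen byte order through to every nested structure.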
    1845
    1946== Existing Tools ==
    2047
    21 I am researching existing tools related to my project, so they can be used for inspiration.
     48I am researching existing tools related to my project, so they can be used for
     49inspiration.
    2250
    2351=== [http://construct.wikispaces.com/ Construct] ===
     
    3462everything else manually.
    3563
    36 TODO: look at issues and forks.
    37 
    3864=== [http://bindata.rubyforge.org/ BinData] ===
    3965
     
    4369
    4470DSLs in this category are used in an obvious, deterministic manner, and complex
    45 structures can’t be edited. They are simple imperative languages in which
    46 fields, structures, bitstructures, and arrays can be defined. The length,
    47 decoded value, and presence of fields can be determined by expressions using
    48 any previously decoded field, and structures can use
    49 `if`/`while`/`continue`/`break` and similar statements. Structures can inherit
    50 from other structures, meaning that the parent’s fields are present at the
    51 beginning of the child. Statements can move to a different offset in the input
    52 data. There may be a real programming language that can be used along with the
    53 DSL.
     71edits (changing the length of a structure) are difficult or impossible. They
     72are simple imperative languages in which fields, structures, bitstructures, and
     73arrays can be defined. The length, decoded value, and presence of fields can be
     74determined by expressions using any previously decoded field, and structures
     75can use `if`/`while`/`continue`/`break` and similar statements. Structures can
     76inherit from other structures, meaning that the parent’s fields are present at
     77the beginning of the child. Statements can move to a different offset in the
     78input data. There may be a real programming language that can be used along
     79with the DSL.
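To make the imperative style concrete, here is a rough Python equivalent of what such a DSL description does. The record layout is made up for illustration (it is not taken from any of the tools listed here): the length of `name` comes from a previously decoded field, and the presence of `extra` depends on a flag.

```python
import struct

def parse_record(data: bytes, offset: int = 0):
    """Decode one record of a made-up format; return (fields, next offset).

    Layout (little-endian, hypothetical):
      u8   flags      bit 0 set means a u16 'extra' field follows the name
      u8   name_len
      u8[] name       name_len bytes
      u16  extra      present only if flags & 1
    """
    flags, name_len = struct.unpack_from("<BB", data, offset)
    offset += 2
    # The length of this field depends on a previously decoded field.
    name = data[offset:offset + name_len]
    if len(name) != name_len:
        raise ValueError("truncated name")
    offset += name_len
    fields = {"flags": flags, "name": name}
    # The presence of this field depends on a previously decoded field.
    if flags & 1:
        (fields["extra"],) = struct.unpack_from("<H", data, offset)
        offset += 2
    return fields, offset

def parse_all(data: bytes) -> list:
    """Decode records back to back until the input is exhausted."""
    records = []
    offset = 0
    while offset < len(data):
        fields, offset = parse_record(data, offset)
        records.append(fields)
    return records

sample = b"\x01\x02hi\x34\x12" + b"\x00\x03abc"
records = parse_all(sample)
```

Because each offset is computed from the values decoded before it, changing the length of `name` in an existing file means shifting everything after it, which is why length-changing edits are hard in this category.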
    5480
     81 [http://dwarfstd.org/ DWARF]::
     82  Uses a simple stack‐based VM to calculate variable locations.
     83 [http://ijdc.net/index.php/ijdc/article/view/207 “Grammar‐based specification and parsing of binary file formats”]::
     84  Actually uses an attribute grammar, but it isn’t terribly different from an
     85  imperative language.
    5586 [http://pyffi.sourceforge.net/ PyFFI]::
    5687  Lets you create or modify files instead of just reading them. Fields can
    5788  refer to blocks of data elsewhere in the file. Uses an XML format.
     89 [http://aluigi.altervista.org/quickbms.htm QuickBMS]::
     90  A popular tool for extracting files from video game archives. Its main
      91  strength is the large number of compression formats supported. It can put
     92  modified files back in the archive in trivial situations.
    5893 [http://www.synalysis.net/ Synalize It!]::
    5994  Not completely imperative; if you declare optional structs where part of the
     
    6196  of file structure. Uses an XML format.
    6297 Other free::
    63   [http://wsgd.free.fr/ Wireshark Generic Dissector].
     98  [http://www-old.bro-ids.org/wiki/index.php/BinPAC_Userguide BinPAC],
     99  [https://metacpan.org/module/Data::ParseBinary Data::ParseBinary],
     100  [http://datascript.berlios.de/DataScriptLanguageOverview.html DataScript],
     101  [http://www.dataworkshop.de/ DataWorkshop],
     102  [http://wsgd.free.fr/ Wireshark Generic Dissector],
     103  [http://metafuzz.rubyforge.org/binstruct/ Metafuzz BinStruct], and
     104  [http://www.padsproj.org/ PADS].
    64105 Other proprietary::
    65   [http://www.hhdsoftware.com/doc/hex-editor/language-reference-overview.html Hex Editor Neo].
     106  [http://www.sweetscape.com/010editor/#templates 010 Editor],
     107  [http://www.nyangau.org/be/be.htm Andys Binary Folding Editor],
     108  [https://www.technologismiki.com/prod.php?id=31 Hackman Suite],
     109  [http://www.hhdsoftware.com/doc/hex-editor/language-reference-overview.html Hex Editor Neo],
     110  [http://apps.tempel.org/iBored/ iBored], and
     111  [https://www.x-ways.net/winhex/templates.html#User_Templates WinHex].
    66112
    67113=== Less interesting tools ===
     
    71117  [http://www.flexhex.com/ FlexHex],
    72118  [http://hexedit.com/ HexEdit],
     119  [http://sourceforge.net/projects/hexplorer/ Hexplorer],
    73120  [http://www.hexworkshop.com/ Hex Workshop], and
    74121  [http://kde.org/applications/utilities/okteta/ Okteta].
    75122 Simple formats elsewhere::
     123  [http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/ctf.h CTF],
    76124  [http://ff-extractor.sourceforge.net/ ffe],
    77   [http://bigeasy.github.com/node-packet/ Node Packet], and
    78   [https://www.secdev.org/projects/scapy/ Scapy]
     125  [http://bigeasy.github.com/node-packet/ Node Packet],
     126  [https://www.secdev.org/projects/scapy/ Scapy], and
     127  [http://sourceware.org/gdb/current/onlinedocs/stabs/ stabs]
    79128  can only handle trivial structures.
    80   [http://docs.python.org/library/struct.html Python’s struct] and
     129  [http://www.tecgraf.puc-rio.br/~lhf/ftp/lua/#lpack lpack],
     130  [http://perldoc.perl.org/functions/pack.html Perl’s pack],
     131  [http://docs.python.org/library/struct.html Python’s struct], and
    81132  [https://github.com/ToxicFrog/vstruct VStruct]
    82133  use concise string formats to describe simple structures.
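As a quick illustration of the string-format approach, here is Python's `struct` module describing a small fixed-size structure. The three-field layout is an arbitrary example chosen for this sketch.

```python
import struct

# "<" = little-endian without padding, "H" = unsigned 16-bit,
# "I" = unsigned 32-bit, "4s" = four raw bytes.
FORMAT = "<HI4s"

packed = struct.pack(FORMAT, 7, 1024, b"DATA")
unpacked = struct.unpack(FORMAT, packed)
size = struct.calcsize(FORMAT)
```

Here `size` is 10 and `unpacked` is `(7, 1024, b"DATA")`. The format string is fixed before any data is read, so a field whose length depends on an earlier field has to be handled by separate code.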
     
    94145  sort of DSL, but they just use C for everything. Wireshark does use ASN.1,
    95146  Diameter, and MIDL for protocols developed with them.
     147
     148== Miscellaneous ideas ==
     149
     150=== Code exporter ===
     151
     152A tool could generate C code to read and write data given a specification. A
     153separate file could be used to specify which types should be used and which
     154things should be read lazily or strictly.
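A very small sketch of the exporter idea, assuming a specification that is just a list of `(field name, type)` pairs. The names `C_TYPES` and `export_struct` and the type names `u8`/`u16`/`u32` are hypothetical. Only a C type declaration is emitted here; a real exporter would also have to generate the reading and writing functions.

```python
# Hypothetical mapping from specification types to C fixed-width types.
C_TYPES = {"u8": "uint8_t", "u16": "uint16_t", "u32": "uint32_t"}

def export_struct(name, fields):
    """Return C source declaring a struct with the given (name, type) fields."""
    lines = ["#include <stdint.h>", "", f"struct {name} {{"]
    for field_name, field_type in fields:
        lines.append(f"\t{C_TYPES[field_type]} {field_name};")
    lines.append("};")
    return "\n".join(lines) + "\n"

spec = [("magic", "u32"), ("version", "u16"), ("flags", "u8")]
source = export_struct("header", spec)
```

Printing `source` gives a `struct header` declaration with one member per specification entry. The separate file mentioned above could be modelled as an override for `C_TYPES` plus per-field lazy/strict settings.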
     155
     156=== Diff ===
     157
     158A diff tool could show differences in the interpreted data.
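A minimal sketch of that idea, assuming a hypothetical three-field header (`magic`, `version`, `size`): both inputs are interpreted first, and the comparison is done on field values rather than on bytes.

```python
import struct

# Hypothetical fixed layout: 4-byte magic, u16 version, u32 size.
FIELDS = ("magic", "version", "size")
FORMAT = "<4sHI"

def interpret(data: bytes) -> dict:
    """Decode the raw bytes into named fields."""
    return dict(zip(FIELDS, struct.unpack_from(FORMAT, data)))

def diff_fields(old: bytes, new: bytes) -> dict:
    """Return {field: (old value, new value)} for every field that differs."""
    a, b = interpret(old), interpret(new)
    return {name: (a[name], b[name]) for name in FIELDS if a[name] != b[name]}

old = struct.pack(FORMAT, b"DEMO", 1, 100)
new = struct.pack(FORMAT, b"DEMO", 2, 100)
changes = diff_fields(old, new)
```

Here `changes` is `{"version": (1, 2)}`, which says more than a byte-level report that offset 4 changed from `01` to `02`.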
     159
     160=== Space‐filling curves ===
     161
     162[http://corte.si/posts/visualisation/binvis/index.html Space‐filling curves]
     163look cool, but this project is about ''avoiding'' looking at raw binary data.