| 12 | | * View on different levels; for instance, view the integer and sequence of |
| 13 | | bytes comprising a string if necessary. |
| 14 | | * Check whether files are consistent. |
| 15 | | * Handle broken files. |
| 16 | | * Don’t try to read the whole file at once. |
| 17 | | * Allow full modifications. Ideally, allow creation of a whole filesystem from scratch. |
| | 12 | * Work in HelenOS—this means the code must be in C and/or an easily ported |
| | 13 | language like Lua. |
| | 14 | * View on different layers; for instance, switch between viewing the formatted |
| | 15 | date and time for a FAT directory entry, the integers, and the original |
| | 16 | bytes. |
| | 17 | * Check whether data is valid; handle broken data reasonably well. |
| | 18 | * Parse pieces of the data lazily; don’t try to read everything at once. |
| | 19 | * Work in both directions (parsing and building) without requiring too much |
| | 20 | extra effort. |
| | 21 | * Support full modifications. Ideally, allow creation of a whole filesystem |
| | 22 | from scratch. |
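The layered view in the requirements above can be illustrated with the FAT directory entry timestamp. The following is only a minimal Python sketch, not the planned C or Lua implementation; the function name `fat_timestamp_layers` and the returned dictionary keys are hypothetical. It assumes the standard FAT layout: a little-endian 16-bit time word followed by a 16-bit date word.

```python
import struct
from datetime import datetime


def fat_timestamp_layers(raw: bytes) -> dict:
    """Return three views of a 4-byte FAT time+date pair (hypothetical helper)."""
    # Layer 2: the two little-endian 16-bit integers stored on disk.
    time_word, date_word = struct.unpack("<HH", raw)
    # Layer 3: the formatted value, decoded from the packed bit fields.
    # datetime() raises ValueError for impossible values such as month 0,
    # which doubles as a basic validity check.
    formatted = datetime(
        1980 + (date_word >> 9),      # bits 15-9: years since 1980
        (date_word >> 5) & 0x0F,      # bits 8-5: month
        date_word & 0x1F,             # bits 4-0: day
        time_word >> 11,              # bits 15-11: hours
        (time_word >> 5) & 0x3F,      # bits 10-5: minutes
        (time_word & 0x1F) * 2,       # bits 4-0: seconds divided by two
    )
    # Layer 1 is the original bytes, returned unchanged.
    return {"bytes": raw, "integers": (time_word, date_word), "formatted": formatted}
```

A viewer could then switch between `layers["bytes"]`, `layers["integers"]`, and `layers["formatted"]` for the same field.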
| | 23 | |
| | 24 | == Interesting formats == |
| | 25 | |
| | 26 | These formats will be interesting and/or difficult to handle. I will keep them |
| | 27 | in mind when designing the library. |
| | 28 | |
| | 29 | * Filesystem allocation tables, which should be kept consistent with the actual |
| | 30 | usage of the disk. |
| | 31 | * Filesystem logs, which should be applied to the rest of the disk before |
| | 32 | interpreting it. |
| | 33 | * Formats where the whole file can have either endianness depending on a field |
| | 34 | in the header. |
| | 35 | * The [http://www.blender.org/development/architecture/blender-file-format/ Blender file format] |
| | 36 | is especially dynamic. When Blender saves a file, it just copies the |
| | 37 | structures from memory and translates the pointers. Since each Blender |
| | 38 | version and architecture will have different structures, the output file |
| | 39 | includes a header describing the fields and binary layout of each structure. |
| | 40 | When the file is loaded, the header is read first and the structures are |
| | 41 | translated as necessary. |
| | 42 | * If the language is powerful enough, it might be possible to have a native |
| | 43 | description of Zlib and other compression formats. |
| | 44 | * It could be interesting to parse ARM or x86 machine code. |
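As an illustration of the header-selected endianness case listed above, here is a minimal Python sketch. The format is hypothetical: a two-byte marker (`II` for little-endian, `MM` for big-endian, loosely modelled on the TIFF byte-order marker) followed by a 32-bit count. The function name `read_header` is also an assumption.

```python
import struct


def read_header(data: bytes) -> dict:
    """Parse a hypothetical header whose first field selects the byte order."""
    marker = data[:2]
    if marker == b"II":
        prefix = "<"   # little-endian
    elif marker == b"MM":
        prefix = ">"   # big-endian
    else:
        raise ValueError(f"unknown byte-order marker: {marker!r}")
    # Every later field is decoded with the byte order chosen above.
    (count,) = struct.unpack_from(prefix + "I", data, 2)
    return {"byte_order": "little" if prefix == "<" else "big", "count": count}
```

The point of the sketch is that the byte order is a value decoded from the data, so a description language needs some way to pass it down to every nested field.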
| 45 | | structures can’t be edited. They are simple imperative languages in which |
| 46 | | fields, structures, bitstructures, and arrays can be defined. The length, |
| 47 | | decoded value, and presence of fields can be determined by expressions using |
| 48 | | any previously decoded field, and structures can use |
| 49 | | `if`/`while`/`continue`/`break` and similar statements. Structures can inherit |
| 50 | | from other structures, meaning that the parent’s fields are present at the |
| 51 | | beginning of the child. Statements can move to a different offset in the input |
| 52 | | data. There may be a real programming language that can be used along with the |
| 53 | | DSL. |
| | 71 | edits (changing the length of a structure) are difficult or impossible. They |
| | 72 | are simple imperative languages in which fields, structures, bitstructures, and |
| | 73 | arrays can be defined. The length, decoded value, and presence of fields can be |
| | 74 | determined by expressions using any previously decoded field, and structures |
| | 75 | can use `if`/`while`/`continue`/`break` and similar statements. Structures can |
| | 76 | inherit from other structures, meaning that the parent’s fields are present at |
| | 77 | the beginning of the child. Statements can move to a different offset in the |
| | 78 | input data. There may be a real programming language that can be used along |
| | 79 | with the DSL. |
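To make the description above concrete, here is a minimal Python sketch of the kind of imperative parsing such a DSL expresses. It is not taken from any of the tools discussed. The record layout (a flags byte, a length byte, a payload, and an optional 16-bit checksum) and the name `parse_record` are hypothetical.

```python
import struct


def parse_record(data: bytes, offset: int = 0):
    """Parse one hypothetical record; return the fields and the next offset."""
    flags, length = struct.unpack_from("<BB", data, offset)
    offset += 2
    # The payload length comes from a previously decoded field.
    payload = data[offset:offset + length]
    if len(payload) != length:
        raise ValueError("payload is truncated")
    offset += length
    record = {"flags": flags, "length": length, "payload": payload}
    # The checksum field is present only when bit 0 of flags is set.
    if flags & 0x01:
        (record["checksum"],) = struct.unpack_from("<H", data, offset)
        offset += 2
    return record, offset
```

Because the layout is only implied by the control flow, it is easy to read data this way but hard to run the same description backwards to rebuild the bytes after an edit that changes a length.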
| 63 | | [http://wsgd.free.fr/ Wireshark Generic Dissector]. |
| | 98 | [http://www-old.bro-ids.org/wiki/index.php/BinPAC_Userguide BinPAC], |
| | 99 | [https://metacpan.org/module/Data::ParseBinary Data::ParseBinary], |
| | 100 | [http://datascript.berlios.de/DataScriptLanguageOverview.html DataScript], |
| | 101 | [http://www.dataworkshop.de/ DataWorkshop], |
| | 102 | [http://wsgd.free.fr/ Wireshark Generic Dissector], |
| | 103 | [http://metafuzz.rubyforge.org/binstruct/ Metafuzz BinStruct], and |
| | 104 | [http://www.padsproj.org/ PADS]. |
| 65 | | [http://www.hhdsoftware.com/doc/hex-editor/language-reference-overview.html Hex Editor Neo]. |
| | 106 | [http://www.sweetscape.com/010editor/#templates 010 Editor], |
| | 107 | [http://www.nyangau.org/be/be.htm Andys Binary Folding Editor], |
| | 108 | [https://www.technologismiki.com/prod.php?id=31 Hackman Suite], |
| | 109 | [http://www.hhdsoftware.com/doc/hex-editor/language-reference-overview.html Hex Editor Neo], |
| | 110 | [http://apps.tempel.org/iBored/ iBored], and |
| | 111 | [https://www.x-ways.net/winhex/templates.html#User_Templates WinHex]. |
| | 147 | |
| | 148 | == Miscellaneous ideas == |
| | 149 | |
| | 150 | === Code exporter === |
| | 151 | |
| | 152 | A tool could generate C code to read and write data given a specification. A |
| | 153 | separate file could be used to specify which types should be used and which |
| | 154 | things should be read lazily or strictly. |
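A very small part of such an exporter might look like the following Python sketch. It only emits a C structure declaration, not the reading and writing code. The function name `export_c_struct` and the list of (field name, C type) pairs standing in for the separate type-mapping file are assumptions made for illustration.

```python
def export_c_struct(name, fields):
    """Emit a C struct declaration from (field_name, c_type) pairs (hypothetical)."""
    lines = [f"struct {name} {{"]
    for field_name, c_type in fields:
        # The C type for each field would come from the separate mapping file.
        lines.append(f"\t{c_type} {field_name};")
    lines.append("};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(export_c_struct("fat_dir_time", [("time", "uint16_t"), ("date", "uint16_t")]))
```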
| | 155 | |
| | 156 | === Diff === |
| | 157 | |
| | 158 | A diff tool could show differences in the interpreted data. |
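One possible shape for this, assuming each file has already been interpreted into a flat mapping of field names to values, is sketched below in Python. The function name `diff_fields` and the flat-dictionary representation are assumptions; real interpreted data would be nested.

```python
def diff_fields(old: dict, new: dict) -> list:
    """List (name, before, after) for every field whose interpreted value differs."""
    changes = []
    for name in sorted(old.keys() | new.keys()):
        # A field missing on one side is reported as None on that side.
        before, after = old.get(name), new.get(name)
        if before != after:
            changes.append((name, before, after))
    return changes
```

Comparing interpreted values rather than bytes means a change is reported as, for example, a different cluster count instead of a differing byte at some offset.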
| | 159 | |
| | 160 | === Space‐filling curves === |
| | 161 | |
| | 162 | [http://corte.si/posts/visualisation/binvis/index.html Space‐filling curves] |
| | 163 | look cool, but this project is about ''avoiding'' looking at raw binary data. |