Simutrans: De-mystifying the PAK format

Let's de-mystify the .pak file format. Paks are actually fairly simple data files, although the details can certainly be complex.

In besch/reader/obj_reader.cc, obj_reader_t::read_file() opens a file and calls read_nodes(), in that same .cc file, to read each node. The file begins with the version information as plain text, terminated with a Ctrl+Z (0x1A) byte:

<code class="bbc_code" style="font-size: 66%;">53 69 6d 75 74 72 61 6e  73 20 6f 62 6a 65 63 74  |Simutrans object|
20 66 69 6c 65 0a 43 6f  6d 70 69 6c 65 64 20 77  | file.Compiled w|
69 74 68 20 53 69 6d 4f  62 6a 65 63 74 73 20 30  |ith SimObjects 0|
2e 31 2e 33 65 78 70 0a  1a eb 03 00 00 52 4f 4f  |.1.3exp......ROO|
54 01 00 00 00 42 55 49  4c 26 00 25 00 08 80 03  |T....BUIL&.%....|</code>
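To make the header layout concrete, here is a minimal standalone sketch (not the code in obj_reader.cc) that scans a buffer for the 0x1A terminator and reads the four bytes right after it as a little-endian number. It assumes the whole file has already been loaded into memory, and the names PakHeader and read_pak_header are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper type, not part of Simutrans.
struct PakHeader {
    std::string text;        // human-readable banner before the 0x1A byte
    uint32_t version;        // four bytes after 0x1A, read little-endian
    std::size_t node_start;  // offset of the first node tag
};

// Sketch: find the Ctrl+Z (0x1A) terminator, then read the version field.
PakHeader read_pak_header(const std::vector<uint8_t>& buf) {
    std::size_t i = 0;
    while (i < buf.size() && buf[i] != 0x1A) {
        ++i;
    }
    if (i == buf.size()) {
        throw std::runtime_error("no 0x1A terminator found");
    }
    PakHeader h;
    h.text.assign(reinterpret_cast<const char*>(buf.data()), i);

    std::size_t v = i + 1;  // version starts right after the terminator
    if (v + 4 > buf.size()) {
        throw std::runtime_error("file too short for version field");
    }
    // Little-endian: least significant byte comes first.
    h.version = static_cast<uint32_t>(buf[v]) |
                static_cast<uint32_t>(buf[v + 1]) << 8 |
                static_cast<uint32_t>(buf[v + 2]) << 16 |
                static_cast<uint32_t>(buf[v + 3]) << 24;
    h.node_start = v + 4;
    return h;
}
```

Fed the bytes from the dump above, text comes back as the two-line banner, version as 1003 (0x03EB), and node_start points at the "ROOT" tag.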

After the 0x1A byte come four bytes of Pak-File version (eb 03 00 00 above, which read little-endian is 0x000003EB, or 1003), and then a series of nodes until the end of the file. Each node is processed by its appropriate reader, found in the besch/reader/ subdirectory. In the file, each node begins with four characters describing the node type, as defined in besch/objversion.h:

<code class="bbc_code">enum obj_type
{
        obj_bridge      = C4ID('B','R','D','G'),
        obj_building    = C4ID('B','U','I','L'),
        // ... further node types ...
};</code>

and then a 2-byte (16-bit) child count and a 2-byte (16-bit) data block size. If the data block is too large for the 16-bit size field, 0xFFFF is stored as the size, followed by the real size as a four-byte (32-bit) value. After that come the actual data block bytes, followed by any additional nodes in this same format. Like the version number, these counts and sizes are stored little-endian.
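A node header can be decoded along the same lines. This sketch assumes nodes are packed back to back with no padding, and that Simutrans' C4ID macro puts the first character in the lowest byte, which is what lets the bytes 'B','U','I','L' read as a little-endian 32-bit value compare equal to obj_building. The c4id function, NodeHeader struct and LARGE_SIZE_MARKER constant are hypothetical stand-ins, not Simutrans code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of a node header as described in the text.
struct NodeHeader {
    uint32_t type;         // four ASCII characters packed into 32 bits
    uint16_t children;     // number of child nodes belonging to this node
    uint32_t size;         // size of this node's data block in bytes
    std::size_t data_off;  // offset of the data block within the buffer
};

// Bounds-checked little-endian reads; vector::at throws std::out_of_range.
static uint16_t get_u16(const std::vector<uint8_t>& b, std::size_t off) {
    return static_cast<uint16_t>(b.at(off) | b.at(off + 1) << 8);
}
static uint32_t get_u32(const std::vector<uint8_t>& b, std::size_t off) {
    return static_cast<uint32_t>(get_u16(b, off)) |
           static_cast<uint32_t>(get_u16(b, off + 2)) << 16;
}

// Assumed to match C4ID: first character in the lowest byte.
constexpr uint32_t c4id(char a, char b, char c, char d) {
    return static_cast<uint32_t>(static_cast<uint8_t>(a)) |
           static_cast<uint32_t>(static_cast<uint8_t>(b)) << 8 |
           static_cast<uint32_t>(static_cast<uint8_t>(c)) << 16 |
           static_cast<uint32_t>(static_cast<uint8_t>(d)) << 24;
}

// Size value meaning "the real 32-bit size follows".
constexpr uint16_t LARGE_SIZE_MARKER = 0xFFFF;

NodeHeader read_node_header(const std::vector<uint8_t>& b, std::size_t off) {
    NodeHeader n;
    n.type = get_u32(b, off);
    n.children = get_u16(b, off + 4);
    uint16_t short_size = get_u16(b, off + 6);
    if (short_size == LARGE_SIZE_MARKER) {
        n.size = get_u32(b, off + 8);  // large block: 32-bit size follows
        n.data_off = off + 12;
    } else {
        n.size = short_size;
        n.data_off = off + 8;
    }
    return n;
}
```

Applied to the ROOT and BUIL bytes from the dump, it reports ROOT with 1 child and 0 data bytes, then BUIL with 0x26 children and 0x25 data bytes.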

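The child count indicates how many of the following nodes are considered to belong to (be "inside") the current node. Each child can have children of its own, which come right after it, so the nodes form a tree stored depth-first. In the example, the ROOT node has one child, the BUIL node, which in turn has 0x0026 (38) child nodes. This is how, for example, a single pak file can contain multiple objects, with each object containing several child nodes.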
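Putting the pieces together, a recursive walk is enough to list a file's node tree. This is a sketch under the assumption, consistent with the dump above, that a node's children are stored directly after its own data block, each child followed by its whole subtree before the next sibling starts. walk() and VisitedNode are hypothetical names, and unlike read_nodes() this only records type names instead of handing data blocks to readers.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// One entry per node visited (hypothetical helper type).
struct VisitedNode {
    int depth;         // 0 for the top-level node, 1 for its children, ...
    std::string type;  // four-character tag, e.g. "ROOT" or "BUIL"
};

// Bounds-checked little-endian 16-bit read (vector::at throws if out of range).
static uint32_t le16(const std::vector<uint8_t>& b, std::size_t off) {
    return static_cast<uint32_t>(b.at(off)) |
           static_cast<uint32_t>(b.at(off + 1)) << 8;
}

// Reads the node at `off`, skips its data, then recurses into its children.
// Assumes children follow the parent's data block (depth-first order).
// Returns the offset just past the node's entire subtree.
std::size_t walk(const std::vector<uint8_t>& b, std::size_t off, int depth,
                 std::vector<VisitedNode>& out) {
    std::string type;
    for (std::size_t i = 0; i < 4; ++i) {
        type += static_cast<char>(b.at(off + i));
    }
    uint32_t children = le16(b, off + 4);
    uint32_t size = le16(b, off + 6);
    off += 8;
    if (size == 0xFFFF) {  // large block: real size is the next 32 bits
        size = le16(b, off) | le16(b, off + 2) << 16;
        off += 4;
    }
    if (off + size > b.size()) {
        throw std::out_of_range("data block runs past end of buffer");
    }
    out.push_back({depth, type});
    off += size;  // skip this node's data block
    for (uint32_t c = 0; c < children; ++c) {
        off = walk(b, off, depth + 1, out);
    }
    return off;
}
```

Started at the offset just after the version field, the first node it visits is ROOT at depth 0, with BUIL at depth 1 beneath it.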

Note that read_nodes() picks the reader for each node from its four-character type, using the following line of code:

<code class="bbc_code">        obj_reader_t *reader = obj_reader->get(static_cast<obj_type>(node.type));
</code>

How exactly that works, turning a four-character tag in the file into a somewhat abstract C++ type, is left to the student as an exercise.
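For anyone who wants to check their answer, here is one way to picture it, as a hypothetical sketch rather than the actual obj_reader container. Assuming C4ID packs the four characters into a 32-bit integer in the same byte order they appear in the file, the tag never has to be handled as text: the four bytes are read as an integer, cast to the enum, and used as a key into a table of readers. ReaderRegistry, ReaderFn and NodeType are illustrative names only.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Assumed C4ID-style packing: first character in the lowest byte, so four
// tag bytes read from the file as a little-endian uint32 equal c4id(...).
constexpr uint32_t c4id(char a, char b, char c, char d) {
    return static_cast<uint32_t>(static_cast<uint8_t>(a)) |
           static_cast<uint32_t>(static_cast<uint8_t>(b)) << 8 |
           static_cast<uint32_t>(static_cast<uint8_t>(c)) << 16 |
           static_cast<uint32_t>(static_cast<uint8_t>(d)) << 24;
}

// Hypothetical mirror of two entries from besch/objversion.h.
enum class NodeType : uint32_t {
    Bridge   = c4id('B', 'R', 'D', 'G'),
    Building = c4id('B', 'U', 'I', 'L'),
};

// Hypothetical reader signature: takes a node's data block, returns a summary.
using ReaderFn = std::function<std::string(const std::vector<uint8_t>&)>;

// Table mapping node types to readers (illustrative, not Simutrans code).
class ReaderRegistry {
public:
    void add(NodeType type, ReaderFn fn) { readers_[type] = std::move(fn); }

    // Returns nullptr when no reader is registered for this type.
    const ReaderFn* get(NodeType type) const {
        auto it = readers_.find(type);
        return it == readers_.end() ? nullptr : &it->second;
    }

private:
    std::map<NodeType, ReaderFn> readers_;
};
```

Looking up the raw bytes 'B','U','I','L' from the dump (0x4C495542 as a little-endian integer) via static_cast<NodeType> finds a registered building reader, while an unregistered type such as NodeType::Bridge comes back as nullptr.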