cicadas wrote: around 0:45 you mentioned tools to "unpack and re-pack [...] one-off proprietary custom archives."
this caught my attention since the game i'm most interested in trying to translate extracts mostly into a single 948 MB DATA.IMG file.
it sounds like extracting this sort of archive is case-by-case, but i've found it difficult to find relevant answers just from searching around.
would you mind touching briefly on your experience with opening this sort of custom archive?
You're correct that it's often case-by-case. However, many Dreamcast games use the AFS container format, for which there are already plenty of extract/rebuild tools available. That being said, this DATA.IMG file is more-or-less your standard packed archive. It's actually many nested archives inside possibly many other archives, all residing in the top-level archive that is DATA.IMG.
Below, we see the file signature PACK (#1), followed by a slew of pointers storing the addresses of the files contained within the archive (#2 and #3, for example).
Interestingly, this file stores its pointers in big-endian format, which is extremely atypical for the Dreamcast, since the console runs its Hitachi SH-4 in little-endian mode. You can do a quick Google search to understand the difference in depth. For quick-and-dirty purposes, let's say we want to store the decimal number 415 (hexadecimal 0x19F) in four bytes.
• Little-endian: 9F 01 00 00
• Big-endian: 00 00 01 9F
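If you'd rather see the two byte orders from code than from a hex editor, here is a minimal Python sketch. It only uses the standard `struct` module and `int.from_bytes`; the variable names are mine, and the `00 00 F8 00` value is one of the offsets quoted further down in this post.

```python
import struct

value = 415  # 0x19F

little = struct.pack("<I", value)  # little-endian unsigned 32-bit
big = struct.pack(">I", value)     # big-endian unsigned 32-bit

print(little.hex(" ").upper())  # 9F 01 00 00
print(big.hex(" ").upper())     # 00 00 01 9F

# Reading a stored pointer back: the same four bytes give very
# different numbers depending on which byte order you assume.
raw = bytes.fromhex("00 00 F8 00")
print(hex(int.from_bytes(raw, "big")))     # 0xf800
print(hex(int.from_bytes(raw, "little")))  # 0xf80000
```

A pointer that decodes to something larger than the file itself is usually a good hint that you picked the wrong byte order.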
"Mercurius Pretty" here is using BE for its stored offsets/pointers inside this master PACK file, so if we look at #2 and #3 from the above image, we get some example offsets:
• 00 00 F8 00
• 00 1C 19 E0
• 00 1D 13 40
• 00 2A 10 E0
If we open DATA.IMG inside a hex editor and go to each of those addresses (e.g., 0x0000f800), we see the start of contained files (in this case, yet another PACK archive).
What about some of these other pointers?
• 00 06 CB E0
• 00 06 F5 A0
Going to those addresses reveals contained files with a different signature, GN.
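Jumping to each address by hand gets old quickly, so here is a rough Python sketch of the same check. `peek_signatures` is a hypothetical helper name of mine, and the tiny `fake` buffer is made up purely so the snippet runs without the game; only the quoted pointer bytes come from the file itself.

```python
import io

def peek_signatures(stream, offsets, length=4):
    """Return {offset: first `length` bytes found at that offset}."""
    found = {}
    for offset in offsets:
        stream.seek(offset)
        found[offset] = stream.read(length)
    return found

# Pointers quoted above, decoded from their big-endian byte form.
quoted = ["00 00 F8 00", "00 1C 19 E0", "00 06 CB E0", "00 06 F5 A0"]
offsets = [int.from_bytes(bytes.fromhex(q), "big") for q in quoted]

# Tiny made-up stand-in for DATA.IMG (NOT real game data).
fake = bytearray(0x40)
fake[0x10:0x14] = b"PACK"
fake[0x20:0x22] = b"GN"

sigs = peek_signatures(io.BytesIO(bytes(fake)), [0x10, 0x20])
for offset, sig in sigs.items():
    print(f"0x{offset:08X}: {sig!r}")
```

Against the real file, you would open DATA.IMG in binary mode and pass that file object along with `offsets` instead of the fake buffer.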
From here, one would either use something like QuickBMS to write an extractor script, or use a programming language of choice to write an extractor/rebuilder. This involves reading in pointer tables, extracting individual files, dealing with nested containers, etc. For the rebuild, this involves adjusting pointer tables (and any other size-specific data) to reflect the new size/location of modified files.
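To make that workflow a bit more concrete, here is a minimal Python sketch of the extract/rebuild cycle. Big caveat: the header layout is an assumption for illustration, not the confirmed DATA.IMG format. I'm pretending an archive is the `PACK` signature, a big-endian 32-bit entry count, then that many big-endian 32-bit offsets measured from the start of that archive, with each file running up to the next offset. The function names are hypothetical too.

```python
import struct

MAGIC = b"PACK"

def unpack(blob):
    """Split an archive blob into member files (ASSUMED layout, see above)."""
    if blob[:4] != MAGIC:
        raise ValueError("not a PACK archive")
    (count,) = struct.unpack_from(">I", blob, 4)
    offsets = struct.unpack_from(f">{count}I", blob, 8)
    # Each file ends where the next one starts; the last ends at EOF.
    ends = offsets[1:] + (len(blob),)
    return [blob[start:end] for start, end in zip(offsets, ends)]

def repack(files):
    """Rebuild an archive, recomputing every pointer from the new sizes."""
    cursor = 8 + 4 * len(files)  # first file starts right after the header
    offsets = []
    for data in files:
        offsets.append(cursor)
        cursor += len(data)
    header = MAGIC + struct.pack(f">I{len(files)}I", len(files), *offsets)
    return header + b"".join(files)

def unpack_nested(blob):
    """Recursively unpack; nested PACK members become nested lists."""
    return [unpack_nested(m) if m[:4] == MAGIC else m for m in unpack(blob)]

# Round trip on made-up data: a nested archive plus one loose file.
inner = repack([b"GN-one", b"GN-two"])
outer = repack([inner, b"loose file"])
tree = unpack_nested(outer)

# Swap in a larger file and rebuild; the pointers shift automatically.
files = unpack(outer)
files[1] = b"a much longer replacement file"
rebuilt = repack(files)
```

The offsets quoted earlier are all multiples of 0x20, so a real rebuilder would probably also need to preserve alignment padding between files, which this sketch ignores. Note too that when a nested archive changes size, every parent archive above it has to be rebuilt as well.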
However, I must say that at a quick-ish glance, I don't see any easily recognizable texture or text data stored anywhere in this game. And when I say quick, I mean quick. I spent the majority of my time on this post wanting to explain the basics of containers/archives. On the one hand, you can consider yourself lucky that these are all absolute pointers with a very basic indexing layout, rather than a jumbled mess of relative pointers and "pagination" (for lack of a better term). Although, the nested archive stuff does get hairy...
On the other hand, there appear to be no off-the-shelf PVR textures. They could be headerless PVRs, or they could be compressed. This could make doing texture modifications difficult without time and experience. That said, I'm sure 1ST_READ.BIN has some Shift-JIS text strings embedded in it here and there, or perhaps the entire game uses a custom character encoding.
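If you want to repeat the "any off-the-shelf PVRs?" check yourself, a crude signature scan is enough. Standard PVR texture files normally carry a `PVRT` chunk, often preceded by a `GBIX` chunk, so the sketch below just hunts for those byte strings. `find_all` is a hypothetical helper of mine and the sample buffer is made up.

```python
def find_all(data, needle):
    """Return every offset at which `needle` occurs in `data`."""
    hits = []
    pos = data.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = data.find(needle, pos + 1)
    return hits

# Made-up sample buffer (NOT real game data); swap in the bytes of
# DATA.IMG or an extracted member file to try it for real.
sample = b"\x00" * 16 + b"GBIX" + b"\x00" * 12 + b"PVRT" + b"\x00" * 32

for magic in (b"GBIX", b"PVRT"):
    print(magic, [hex(offset) for offset in find_all(sample, magic)])
```

Zero hits doesn't prove there are no textures. It's simply what you'd expect to see if they are headerless or compressed.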