I got a chance to look at the European release of ICO, and pretty much immediately noticed the new files SRCFILE.TXT and TRFILE.TXT.
SRCFILE is the complete 'objdump -d' output of the game, with the debugging line numbers, and TRFILE is the complete linker log.
Which includes these function names:
00136cc0:0003215616:0710:ffff:huft_build():fumi/ios/inflate.c:119
00137488:0003244212:0160:ffff:inflate_codes():fumi/ios/inflate.c:335
00137bd0:0003268411:00c0:ffff:inflate_stored():fumi/ios/inflate.c:439
00137ef0:0003278744:04d0:ffff:inflate_fixed():fumi/ios/inflate.c:485
00138150:0003288348:05e0:ffff:inflate_dynamic():fumi/ios/inflate.c:549
00138a68:0003319614:ffff:ffff:inflate_start():fumi/ios/inflate.c:706
00138ab8:0003321500:0030:ffff:close_inflate_handler():fumi/ios/inflate.c:750
00138b80:0003324593:00d0:ffff:inflate():fumi/ios/inflate.c:772
00139048:0003340442:0040:ffff:open_inflate_handler():fumi/ios/inflate.c:730
001390d8:0003343118:0060:ffff:fill_inbuf():fumi/ios/inflate.c:887
001391b8:0003346411:0020:ffff:huft_free():fumi/ios/inflate.c:309
00139568:0003361590:0040:ffff:new_mblock_node():fumi/ios/mblock.c:16
00139668:0003365214:ffff:ffff:reuse_mblock1():fumi/ios/mblock.c:95
00139690:0003365880:ffff:ffff:init_mblock():fumi/ios/mblock.c:12
001396a0:0003366175:0030:ffff:new_segment():fumi/ios/mblock.c:72
00139748:0003369314:0030:ffff:reuse_mblock():fumi/ios/mblock.c:105
001397a0:0003370659:0060:ffff:strdup_mblock():fumi/ios/mblock.c:123
This matches perfectly with all the stuff down here. I'm going to stop looking now, in case the Japanese release has Fumito Ueda's credit card numbers on it or something.
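For anyone who wants to sift through TRFILE.TXT themselves, here is a rough C sketch that splits one entry of the colon-separated form quoted above into its parts. Only the address, function name, source path and line number are interpreted. What the second, third and fourth fields mean is not established here, so they are kept as raw text. TrEntry and parse_tr_entry() are names made up for this example; they do not come from the game or from libarc.

```c
#include <stdio.h>

/* Hypothetical record type for this sketch; the field names are invented. */
typedef struct {
    unsigned long address;  /* first field, read as hexadecimal */
    char raw2[16];          /* second field, meaning not established */
    char raw3[8];           /* third field, meaning not established */
    char raw4[8];           /* fourth field, meaning not established */
    char function[64];      /* function name without the trailing "()" */
    char file[128];         /* source path as written in the log */
    int line;               /* source line number */
} TrEntry;

/* Parse one "addr:num:xxxx:xxxx:name():path:line" entry.
   Returns 1 if all seven fields were read, 0 otherwise. */
int parse_tr_entry(const char *text, TrEntry *out)
{
    int n = sscanf(text,
                   "%lx:%15[^:]:%7[^:]:%7[^:]:%63[^(]():%127[^:]:%d",
                   &out->address, out->raw2, out->raw3, out->raw4,
                   out->function, out->file, &out->line);
    return n == 7;
}
```

Feeding it each entry and comparing the file field against "fumi/ios/inflate.c" or "fumi/ios/mblock.c" is enough to pull out the list shown above.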
I haven't succeeded in contacting anyone about this; SCEI and ONICOS/Izumo don't read their email. Someone who speaks better Japanese than me should try writing them a letter.
ICO, a video game by Sony Computer Entertainment for the PlayStation 2, seems to be using parts of the GPL library libarc for compressed data handling. It doesn't credit the author or mention libarc or the GPL.
This isn't a big problem in terms of code: the two files from libarc used are under 1500 lines put together, and one is a heavily edited copy of the old inflate.c by Mark Adler, which its own header marks as not copyrighted. But it's a GPL violation, and they need to fix it.
To follow along with this, you'll need:
ios/inflate.c
 incomplete literal tree
 incomplete distance tree
ios/mblock.c
ICO, helpfully, has all its debug logging still in the release binary. Here we can see the names of two files from libarc. Note the space before "incomplete" in both strings; this indicates a really old version of zlib. Even find-zlib, which claims to go back to zlib 0.1, doesn't have these. (It also doesn't find any data tables.)
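Finding those strings needs nothing fancier than a byte search over the executable once it is loaded into memory. A minimal sketch in C follows. find_bytes() is a helper written for this example, and how the binary gets into the buffer is left out. The leading space is part of the search pattern, since that is the detail that matters.

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of the NUL-terminated
   string needle inside the raw buffer buf, or -1 if it is absent.
   The buffer itself may contain NUL bytes, as a binary image does. */
long find_bytes(const unsigned char *buf, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    size_t i;

    if (n == 0 || n > len)
        return -1;
    for (i = 0; i + n <= len; i++) {
        if (memcmp(buf + i, needle, n) == 0)
            return (long)i;
    }
    return -1;
}
```

Searching for " incomplete literal tree\n" and "ios/mblock.c" this way gives offsets that can then be looked up in a disassembler.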
From inflate.c:
/*
    Copyright (C) 2000 Masanao Izumo <mo@goice.co.jp>

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/
/* inflate.c -- Not copyrighted 1992 by Mark Adler
   version c10p1, 10 January 1993 */
/* You can do whatever you like with this source file, though I would
   prefer that if you modify it and redistribute it that you include
   comments to that effect with your name and the date. Thank you.
   [The history has been moved to the file ChangeLog.]
 */
And:
/* build the decoding tables for literal/length and distance codes */
bl = lbits;
i = huft_build(ll, nl, 257, cplens, cplext, &tl, &bl, &decoder->pool);
if(bl == 0) /* no literals or lengths */
    i = 1;
if(i)
{
    if(i == 1)
        fprintf(stderr, " incomplete literal tree\n");
    reuse_mblock(&decoder->pool);
    return -1; /* incomplete code set */
}
bd = dbits;
i = huft_build(ll + nl, nd, 0, cpdist, cpdext, &td, &bd, &decoder->pool);
if(bd == 0 && nl > 257) /* lengths but no distances */
{
    fprintf(stderr, " incomplete distance tree\n");
    reuse_mblock(&decoder->pool);
    return -1;
}
if(i == 1) {
#ifdef PKZIP_BUG_WORKAROUND
    i = 0;
#else
    fprintf(stderr, " incomplete distance tree\n");
#endif
}
if(i)
{
    reuse_mblock(&decoder->pool);
    return -1;
}
libarc uses a very old copy of INFLATE with the same
error messages.
Now that we've seen that, it's time for MIPS assembly!
I'll be using ps2dis's output here.
The equivalent to fprintf() is located at 0x001A6E28 in the
binary. It's been simplified - the first argument is missing, but
I'll use the same name for clarity.
Searching for those error strings finds this:
jal $001a6e28 # 0013531c:0c069b8a v fprintf
addiu a0, a0, $6b10 # 00135320:24846b10 a0=" incomplete literal tree\n"
beq zero, zero, $0013540c # 00135324:10000039 v __0013540c
daddu a0, s0, zero # 00135328:0200202d
__0013532c: #
addiu v0, zero, $0006 # 0013532c:24020006 v0=$00000006
lui a3, $0028 # 00135330:3c070028 a3=$00280000
lui t0, $0028 # 00135334:3c080028 t0=$00280000
sll a0, v1, 2 # 00135338:00032080
lw a1, $0514(sp) # 0013533c:8fa50514
sw v0, $04fc(sp) # 00135340:afa204fc
addu a0, sp, a0 # 00135344:03a42021
addiu a3, a3, $0b20 # 00135348:24e70b20 a3=$00280b20
addiu t0, t0, $0b60 # 0013534c:25080b60 t0=$00280b60
daddu a2, zero, zero # 00135350:0000302d
addiu t1, sp, $04f8 # 00135354:27a904f8
addiu t2, sp, $04fc # 00135358:27aa04fc
jal $001336c0 # 0013535c:0c04cdb0 ^ FNC_001336c0
daddu t3, s0, zero # 00135360:0200582d
lw v1, $04fc(sp) # 00135364:8fa304fc
bne v1, zero, $00135394 # 00135368:1460000a v __00135394
daddu s4, v0, zero # 0013536c:0040a02d s4=$00000006
lw v1, $0510(sp) # 00135370:8fa30510
sltiu v0, v1, $0102 # 00135374:2c620102
bne v0, zero, $00135398 # 00135378:14400007 v __00135398
addiu v0, zero, $0001 # 0013537c:24020001 v0=$00000001
lui a0, $0055 # 00135380:3c040055 a0=$00550000
jal $001a6e28 # 00135384:0c069b8a v fprintf
addiu a0, a0, $6b30 # 00135388:24846b30 a0=" incomplete distance tree\n"
beq zero, zero, $0013540c # 0013538c:1000001f v __0013540c
daddu a0, s0, zero # 00135390:0200202d
__00135394: #
addiu v0, zero, $0001 # 00135394:24020001 v0=$00000001
__00135398: #
bne s4, v0, $001353a8 # 00135398:16820003 v __001353a8
lui a0, $0055 # 0013539c:3c040055 a0=$00550000
jal $001a6e28 # 001353a0:0c069b8a v fprintf
addiu a0, a0, $6b30 # 001353a4:24846b30 a0=" incomplete distance tree\n"
This calls fprintf three times, as you can see.
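If you want to check the disassembler's annotations by hand, the call targets and string addresses fall straight out of the instruction words in the listing. The sketch below uses the standard MIPS encodings: jal keeps a 26-bit word index in its low bits, and a lui/addiu pair builds a 32-bit address with the addiu immediate sign-extended. The helper names are made up for this example.

```c
#include <stdint.h>

/* A jal instruction has the value 3 in its top six bits. */
int is_jal(uint32_t insn)
{
    return (insn >> 26) == 3;
}

/* Target of a jal at address pc: the low 26 bits of the instruction
   are a word index, and the top 4 bits come from the address of the
   delay slot (pc + 4). */
uint32_t jal_target(uint32_t pc, uint32_t insn)
{
    return ((pc + 4) & 0xF0000000u) | ((insn & 0x03FFFFFFu) << 2);
}

/* Address built by "lui reg, hi" followed by "addiu reg, reg, lo".
   addiu sign-extends its 16-bit immediate before adding. */
uint32_t lui_addiu(uint16_t hi, uint16_t lo)
{
    uint32_t ext = (lo & 0x8000u) ? (0xFFFF0000u | lo) : lo;
    return (uint32_t)(((uint32_t)hi << 16) + ext);
}
```

For the first line of the listing, jal_target(0x0013531c, 0x0c069b8a) gives 0x001a6e28, the fprintf equivalent, and lui_addiu(0x0055, 0x6b10) gives 0x00556b10, the address of the " incomplete literal tree\n" string.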
Now, let's do a Google Code Search for the errors. Almost all of these are the same — they're either commented out or there's only one call to each. The only different one is TiMidity++, which turns out to use libarc!
After the error message, all three paths jump to here:
__0013540c: #
jal $00136140 # 0013540c:0c04d850 v FNC_00136140
nop # 00135410:00000000
beq zero, zero, $00135434 # 00135414:10000007 v __00135434
addiu v0, zero, $ffff # 00135418:2402ffff v0=$ffffffff
which goes to:
FNC_00136140: #
addiu sp, sp, $ffd0 # 00136140:27bdffd0
sd s1, $0010(sp) # 00136144:ffb10010
sd ra, $0020(sp) # 00136148:ffbf0020
daddu s1, a0, zero # 0013614c:0080882d
sd s0, $0000(sp) # 00136150:ffb00000
lw s0, $0000(s1) # 00136154:8e300000
beq s0, zero, $00136184 # 00136158:1200000a v __00136184
ld ra, $0020(sp) # 0013615c:dfbf0020
daddu a0, s0, zero # 00136160:0200202d
nop # 00136164:00000000
__00136168: #
jal $00136060 # 00136168:0c04d818 ^ FNC_00136060
lw s0, $000c(s0) # 0013616c:8e10000c
bne s0, zero, $00136168 # 00136170:1600fffd ^ __00136168
daddu a0, s0, zero # 00136174:0200202d
[...]
and further to:
FNC_00136060: #
lw v0, $0004(a0) # 00136060:8c820004
sltiu v0, v0, $2001 # 00136064:2c422001
bne v0, zero, $00136078 # 00136068:14400003 v __00136078
lw v0, $9758(gp) # 0013606c:8f829758 v0=$00632048
j $00139598 # 00136070:0804e566 v FNC_00139598
lw a0, $0000(a0) # 00136074:8c840000
__00136078: #
sw a0, $9758(gp) # 00136078:af849758 [00632048]
jr ra # 0013607c:03e00008
sw v0, $000c(a0) # 00136080:ac82000c
nop # 00136084:00000000
__00136088: #
sw zero, $0004(a0) # 00136088:ac800004
jr ra # 0013608c:03e00008
sw zero, $0000(a0) # 00136090:ac800000
nop # 00136094:00000000
and finally:
FNC_00139598: #
addiu sp, sp, $fb70 # 00139598:27bdfb70
sd s5, $0450(sp) # 0013959c:ffb50450
daddu s5, a0, zero # 001395a0:0080a82d
sd ra, $0480(sp) # 001395a4:ffbf0480
lui a0, $0055 # 001395a8:3c040055 a0=$00550000
sd s7, $0470(sp) # 001395ac:ffb70470
sd s6, $0460(sp) # 001395b0:ffb60460
addiu a0, a0, $72d8 # 001395b4:248472d8 a0="mem:free "
sd s4, $0440(sp) # 001395b8:ffb40440
sd s3, $0430(sp) # 001395bc:ffb30430
sd s2, $0420(sp) # 001395c0:ffb20420
sd s1, $0410(sp) # 001395c4:ffb10410
jal $001a6e28 # 001395c8:0c069b8a v fprintf
sd s0, $0400(sp) # 001395cc:ffb00400
bne s5, zero, $00139618 # 001395d0:16a00011 v __00139618
addiu s1, s5, $fff0 # 001395d4:26b1fff0
lui a0, $0055 # 001395d8:3c040055 a0=$00550000
jal $001a6e28 # 001395dc:0c069b8a v fprintf
addiu a0, a0, $72e8 # 001395e0:248472e8 a0="null memory pointer\n"
break (00000) # 001395e4:0000000d
lui s0, $0055 # 001395e8:3c100055 s0=$00550000
lui a2, $0055 # 001395ec:3c060055 a2=$00550000
addiu s0, s0, $70e0 # 001395f0:261070e0 s0="ios/memory.c"
addiu a2, a2, $7300 # 001395f4:24c67300 a2="IOSFREE(): NULL MEMORY POINTER\n"
daddu a0, s0, zero # 001395f8:0200202d a0="ios/memory.c"
jal $001ad748 # 001395fc:0c06b5d2 v FNC_001ad748
addiu a1, zero, $0334 # 00139600:24050334 a1=$00000334
lui a2, $0063 # 00139604:3c060063 a2=$00630000
daddu a0, s0, zero # 00139608:0200202d a0="ios/memory.c"
addiu a2, a2, $20b8 # 0013960c:24c620b8 a2=$006320b8
beq zero, zero, $001399b8 # 00139610:100000e9 v __001399b8
addiu a1, zero, $0334 # 00139614:24050334 a1=$00000334
[...]
That last function sure looks like free() to me.
From mblock.c:
static void reuse_mblock1(MBlockNode *p)
{
    if(p->block_size > MIN_MBLOCK_SIZE)
        free(p);
    else /* p->block_size <= MIN_MBLOCK_SIZE */
    {
        p->next = free_mblock_list;
        free_mblock_list = p;
    }
}

void reuse_mblock(MBlockList *mblock)
{
    MBlockNode *p;

    if((p = mblock->first) == NULL)
        return; /* There is nothing to collect memory */

    while(p)
    {
        MBlockNode *tmp;

        tmp = p;
        p = p->next;
        reuse_mblock1(tmp);
    }
    init_mblock(mblock);
}
Assuming you can read MIPS assembly (and you can, right?) they're obviously the same code. The memory management code here (reuse_mblock()) is entirely original; nothing that uses zlib compression would have this, unless it used libarc.
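For readers who would rather not read MIPS, here is FNC_00136060 from the listing above transcribed into C as a small model. The struct layout and every name in it are guesses from the offsets the code touches ($0000, $0004 and $000c), not anything taken from the binary's symbols, and the free()-like routine is replaced by a counter so the sketch stands alone. The threshold comes from the sltiu against $2001, which means sizes up to 0x2000 go on the free list.

```c
#include <stddef.h>

/* Hypothetical view of the node. Offsets in the comments are the ones
   used in the PS2 binary; a host compiler may lay this out differently. */
typedef struct Node {
    void *mem;              /* +0x00: loaded into a0 before the jump to FNC_00139598 */
    unsigned block_size;    /* +0x04: compared against 0x2001 */
    unsigned unknown;       /* +0x08: not touched in the listing */
    struct Node *next;      /* +0x0c: written when the node is kept */
} Node;

static Node *free_list;     /* stands in for the word at $9758(gp) */
static int freed_count;     /* counts what would be jumps to FNC_00139598 */

/* Model of FNC_00136060: small nodes are pushed onto a free list,
   larger ones are handed to the free()-like routine. */
void recycle_node(Node *p)
{
    if (p->block_size < 0x2001u) {
        p->next = free_list;    /* sw v0, $000c(a0) */
        free_list = p;          /* sw a0, $9758(gp) */
    } else {
        freed_count++;          /* j $00139598 in the binary */
    }
}
```

Put next to reuse_mblock1(), the shape is the same: one size comparison, one branch that frees, one branch that pushes onto a global list.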
I could go further, but pointing out more of the same control flow in a bunch of assembly text isn't really needed.
Instead, I wrote a tool to decompress ICO's data archive,
using libarc. libarc's compressor (in deflate.c) uses the same
DEFLATE algorithm as gzip, but doesn't store a gzip or zip
header. Nevertheless, it decompresses all the files perfectly*
without needing any changes to the compressed stream. Get it in
the links below.
("advertise.pss" is an MPEG-2 video and will play in VLC, although it won't have sound.)
* It doesn't have a checksum, so it might not actually be perfect, but it doesn't error at least!
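To make "no gzip or zip header" concrete, here is a small C sketch of what a tool would otherwise look for at the start of a compressed stream, plus the simplest thing a bare DEFLATE stream can begin with: a stored block. This is not the extraction tool mentioned above and does not use libarc. It relies only on the well-known gzip, zip and zlib signatures and the DEFLATE stored-block layout, and all names in it are invented for the example. The zlib check is a heuristic, since a bare stream could match it by chance.

```c
#include <stddef.h>
#include <string.h>

typedef enum { STREAM_GZIP, STREAM_ZIP, STREAM_ZLIB, STREAM_UNKNOWN } StreamKind;

/* Guess the container from the first bytes. A bare DEFLATE stream has
   no signature at all, so it normally ends up as STREAM_UNKNOWN. */
StreamKind sniff_container(const unsigned char *p, size_t n)
{
    if (n >= 2 && p[0] == 0x1f && p[1] == 0x8b)
        return STREAM_GZIP;
    if (n >= 4 && memcmp(p, "PK\003\004", 4) == 0)
        return STREAM_ZIP;
    if (n >= 2 && (p[0] & 0x0f) == 8 && ((p[0] << 8) | p[1]) % 31 == 0)
        return STREAM_ZLIB;
    return STREAM_UNKNOWN;
}

/* Copy the payload of a stored block (block type 0) that sits at the
   very start of a bare DEFLATE stream. Returns the number of bytes
   copied, or -1 if the input is not a valid stored block or does not
   fit. *is_final receives the "last block" flag. */
long copy_stored_block(const unsigned char *in, size_t in_len,
                       unsigned char *out, size_t out_cap, int *is_final)
{
    unsigned len, nlen;

    if (in_len < 5)
        return -1;
    *is_final = in[0] & 1;              /* bit 0: last block flag */
    if (((in[0] >> 1) & 3) != 0)        /* bits 1-2: block type */
        return -1;
    len  = in[1] | (in[2] << 8);        /* length, little-endian */
    nlen = in[3] | (in[4] << 8);        /* one's complement of length */
    if ((len ^ 0xffffu) != nlen)
        return -1;
    if (len > in_len - 5 || len > out_cap)
        return -1;
    memcpy(out, in + 5, len);
    return (long)len;
}
```

The length and its complement are the only self-check a stored block carries, and compressed blocks carry none, which is why a clean decode without a container checksum is encouraging but not proof.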
Shadow of the Colossus, the "sequel" to ICO, doesn't seem to
use any other code. I haven't disassembled it, but it's even more
helpful: function names aren't stripped at all!
All of them look safe to me, aside from being as unorganized as
any game code.
I tried contacting Masanao Izumo, the author of libarc, but one of his emails (mo@goice.co.jp) stopped working and I haven't received a response on the other (iz@onicos.co.jp). Maybe he can be reached through ONICOS?
Thanks !WAHa.06x36 for helping me with the format of DATA.DF.
(Why are the default colors for code2html so ugly? Why does tidy destroy text with CSS white-space: pre?)