Yeah, exactly. This is how i was able to encrypt the bootblock of "my bios replacement".
OK, let's start with the "nasty decryption logic bug". This, combined with the properties of the XOR encryption you described, makes the whole encryption useless (for nintendo), and implementing a new bios is a straightforward task (provided that "high speed" (30MHz) programmable logic with infinite memory attached to it is available. as this is not the case (at least for me), the thing is a bit complicated, but in theory it's easy).
The Bios chip (some name it IPL, some name it BS/BS2, it just doesn't matter since the name of the device probably isn't part of the encryption), which btw includes sram and rtc (but that won't matter here), is attached to the EXI0 bus.
The EXI bus (nothing new here, just a refresher) is an SPI-like bus. SPI is nothing complicated, just four interesting lines: CS (used mainly for syncing, since you need a defined start point, and you can easily attach multiple devices (memory card, ...) to the same bus with separate CS lines), SI (aka MOSI, master out, slave in - the CPU is always master, the IPL chip is slave. so SI is gamecube -> device), SO (device -> gamecube, tristated when a device is not active), and CLK (generated by the master).
a transfer is basically:
- lower CS (it's low active)
for every bit do:
- set SI bit
- clock
- read SO bit
then:
- put CS high again.
(the exact timing (WHEN to sample SO, clock polarity) is different for different SPI modes, and the one described here is not necessarily the one used in the GC. anyway, it doesn't matter here)
so, based on that, we can transfer n-bit messages in BOTH DIRECTIONS.
technically this is implemented with a 32bit shift register: with every clock cycle one bit is shifted out (to SI), and one bit is shifted in (from SO). so after n clock cycles, you have n new bits in the shift register and have shifted n bits out.
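to make the shift register idea concrete, here is a minimal python sketch of one full-duplex exchange. it is a toy model, not the real EXI hardware: the class name ShiftRegister32, the helper exchange(), the MSB-first bit order and the example values are all assumptions made for illustration.

```python
class ShiftRegister32:
    """Toy 32-bit shift register (hypothetical model, MSB-first assumed)."""

    def __init__(self, value=0):
        self.value = value & 0xFFFFFFFF

    def msb(self):
        # the bit currently driven onto the output line
        return (self.value >> 31) & 1

    def shift_in(self, bit):
        # drop the MSB, append the incoming bit as the new LSB
        self.value = ((self.value << 1) | (bit & 1)) & 0xFFFFFFFF


def exchange(master, slave, nbits=32):
    """Clock nbits: master drives SI, slave drives SO, both shift at once."""
    for _ in range(nbits):
        si = master.msb()
        so = slave.msb()
        master.shift_in(so)
        slave.shift_in(si)


master = ShiftRegister32(0x00004000)   # e.g. a command word
slave = ShiftRegister32(0xDEADBEEF)    # made-up device data
exchange(master, slave)
print(f"{master.value:08X} {slave.value:08X}")  # DEADBEEF 00004000
```

after 32 clocks the two registers have simply swapped contents, which is all "full duplex" means here.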
the protocol used on the bus is in most cases very simple but device-dependent. In the case of the IPL chip, it's the following:
GC -> IPL
1 bit read/write (0 for read, 1 for write, the latter only valid for RTC/Sram of course)
1 unknown bit
1 bit selection (0 for ROM, 1 for RTC/Sram)
23 bits address
6 bits dummy
after that, the data transfer starts. the 6 dummy cycles are mainly to give the IPL time to read out the first byte.
so you send 32 bits of data (the "address"), and start receiving the ROM bytes.
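the field layout above can be packed into a 32bit word like this. a small sketch only: the function name ipl_command and its parameters are made up, and it assumes the fields go out in the listed order with the first one in the most significant bit (which agrees with the 00 00 40 00 example for address 0x100 in the sniffer log further down).

```python
def ipl_command(address, write=False, rtc_sram=False, unknown=0):
    """Pack the 32-bit command word sent to the IPL chip (assumed MSB-first).

    Layout: 1 bit read/write, 1 unknown bit, 1 bit ROM/RTC-Sram select,
    23 bits address, 6 dummy bits (left as zero here).
    """
    if not 0 <= address < (1 << 23):
        raise ValueError("address must fit in 23 bits")
    return (
        (int(write) << 31)
        | ((unknown & 1) << 30)
        | (int(rtc_sram) << 29)
        | (address << 6)
    )


print(f"{ipl_command(0x100):08X}")  # 00004000
```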
but hey - we said the SPI bus always transfers 2 bits per clock cycle (in marketing terms), since it's full duplex (in technical terms). we transfer one bit TO the device, and one BACK. we HAVE to. there's no way to NOT send a bit - but it doesn't matter, since for example the bits sent from the IPL to the GC during the first 32 bits are just ignored - they would most probably contain only zeros, ones, or the bus might be tristate. it's simply not defined, so there's no data to be expected.
the same goes for the transfer of the data. the IPL chip sets the correct data at the SO line, but the gamecube - well, sends dummy bits, too.
normally you would send zeros, ones, or whatever. it's ignored by the IPL chip anyway (unless it's a write, that would turn the whole thing upside down)
now, i said that technically the SPI port is implemented by a shift register of 32bit length. after transferring 32 bits, we would have to read out the new value, store it into memory, and .. well, "start the next transfer", the nintendo/artx/whatever engineer thought. (don't tell me you didn't think that, too).
but what about CLEARING the register first? well, they didn't. so in the next transfer, the last 32 bits received are shifted out as the dummy bits.
well, one might say, it's just the data just shifted in, so it's completely uninteresting.
yes, BUT: the decryption of the loader is done in hardware. it sits between the SO line of the IPL and the DI port of the shift register. (the decryption logic is built into the flipper, so there's no way to intercept the content AFTER decryption)... well ... no way?
well, there is one. because the (decrypted) data just shifted in (and stored into memory) is shifted out again - we got the decrypted data.
hey. weren't they stupid?
if you sniff the SI line to the IPL chip, you will get a log like this:
00 00 40 00 (address written to the IPL, in this case: 0x100)
FF FF FF FF (well just dummy data)
xx xx xx xx (the data from the last 4 bytes, DECRYPTED)
...
...
xx xx xx xx (the data from the n-1 transfer, decrypted)
so in the end you get every 32bit word except the last one. For every transferred block you miss 32 bits of plaintext, but you get the rest. This should be enough to decrypt huge parts of the bios.
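the leak can be modelled at word level in a few lines of python. this is only a sketch of the behaviour described above: run_transfer is a made-up name, the plaintext and keystream values are invented, and the 0xFFFFFFFF register content after the command phase is assumed so that it matches the "FF FF FF FF" line in the log.

```python
def run_transfer(command, ciphertext, keystream, idle=0xFFFFFFFF):
    """Word-level toy model of one block read with the uncleared register.

    Returns (si_log, memory): the words a sniffer sees on SI, and the
    decrypted words the gamecube stores to memory.
    """
    si_log = [command]       # first 32 clocks: the command goes out
    shift_reg = idle         # assumed register content after the command
    memory = []
    for ci, k in zip(ciphertext, keystream):
        si_log.append(shift_reg)   # stale register content leaks onto SI
        shift_reg = ci ^ k         # decrypted word is shifted in
        memory.append(shift_reg)
    return si_log, memory


plaintext = [0x11111111, 0x22222222, 0x33333333, 0x44444444]  # made up
keystream = [0xA5A5A5A5, 0x0F0F0F0F, 0x12345678, 0xCAFEBABE]  # made up
ciphertext = [p ^ k for p, k in zip(plaintext, keystream)]

si_log, memory = run_transfer(0x00004000, ciphertext, keystream)
for word in si_log:
    print(f"{word:08X}")
```

the log shows the command, the dummy word, and then every plaintext word except the last one (0x44444444 never appears on SI).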
now, let's take a look at the bootprocess of the gamecube:
there's the Gekko. It's a standard processor, made by IBM, nothing special with it (just paired singles, write pipes etc, but still a standard ppc).
then there's the Flipper. every transfer to the outside memory goes through the flipper. The flipper has some (more) hardware stuff at 0x0C000000 (the addresses you usually know as 0xCC000000, because of BAT translation). then there's the memory at 0, the EFB at 0x05000000 (i believe, not sure), but - where does the gekko boot from?
it boots from 0x100, that's what you read in almost any ppc instruction manual - the reset vector. well, this isn't the complete truth - it boots from its exception base + 0x100. And the exception base is normally zero, BUT, as the ppc manual (dunno which one it was exactly) states: there's a bit (IP, the exception prefix bit in the MSR) which moves the exception base to 0xFFF00000. and this bit is "set usually at boot time".
So the processor starts to fetch instructions at 0xFFF00100. If you read a bit further, you'll notice that the CPU always reads 64bits at once for code.
The memory at 0xFFF00000 is mapped inside the flipper to an automated exi transfer (with that shift register), with the decryption logic active.
so the processor starts executing the decrypted instructions, reading 8 bytes at a time, of which we get 4 bytes in plain - not much, (although enough to make some funny experiments, but that's another topic).
Luckily, the IPL itself (the cube menu) isn't executed this way. (that wouldn't be possible thanks to the "dumb" decryption logic)
The code in the first ~0x800 bytes starts reading data out of the IPL chip and storing it to memory (still using the hardware decryption logic), and then jumps there. it reads 1024 bytes at once. Well - now we know 1020 bytes of each transfer, enough to have a complete block of code we can exchange (we have the ciphertext Ci = Cl^K on SO, and the plaintext Cl (delayed by 32 bits) on SI, and can XOR them to get Ci^Cl = Cl^K^Cl = K. now we can encrypt our own code with K).
so now we can make a small piece of code which just dumps the whole IPL - well, to the EXI bus or wherever you can receive it. I received it using my sniffer.
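the XOR step is small enough to show as a python sketch. nothing here is real IPL data: the keystream, the "original" plaintext and the replacement payload are made-up 16-byte strings, and xor_bytes is just a helper name chosen for the example.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return bytes(x ^ y for x, y in zip(a, b))


keystream = bytes(range(16, 32))          # made-up K, unknown to the attacker
original_plain = b"original loader!"      # Cl, sniffed from SI
rom_cipher = xor_bytes(original_plain, keystream)   # Ci, sniffed from SO

# K = Ci ^ Cl
recovered = xor_bytes(rom_cipher, original_plain)

# encrypt our own code with the recovered K
own_code = b"my own code here"
my_cipher = xor_bytes(own_code, recovered)

# what the hardware produces when it decrypts our replacement block
print(xor_bytes(my_cipher, keystream))
```

the last line prints the replacement payload unchanged, i.e. the hardware happily "decrypts" whatever we encrypted with the recovered K.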
Now we have all Cl, and thus we can compute all K, thus we can get the complete Plaintext of all available IPLs.
Well, that's it basically.
i replaced the code with a small loader which reads a block from memory card and jumps there; that block on the memory card reads the other blocks from the memory card and jumps there, giving me almost 512k of own code to execute. this is (more than) enough to make a bios replacement, which is what i did - i can now boot homebrew stuff (and for example the IPL, or anything i like..) without ever needing PSO again.
OK that was a huge posting, sorry for that.
a small note on why i wasn't able to recover the plaintext of the original loader:
The decryption logic is, whatever it is, a PRNG. It generates a keystream ("K"), which has random properties (non-repeating, at least not within a range of some MB), but is always the same.
it is incremented with every EXI-transfer. the address is NOT used in the calculation. thus reading from 0xFFF00100 more than one time will give you each time another result. the first time you get Ci(0)^K(0) (the correct result), the second time you get Ci(0)^K(1) etc., i.e. wrong results. Since we never get the K(n) for odd n, i see no chance of recovering it this way, even if we can read at 0xFFF00000+x (and we can do this if we don't set a specific bit to disable the logic).
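a toy model of that behaviour, in python. the class ToyDecryptor and all values are invented for illustration; the real keystream generator is unknown, so the keystream here is just a fixed list that is stepped once per transfer, independent of the address.

```python
class ToyDecryptor:
    """Hypothetical model: keystream position depends on transfer count only."""

    def __init__(self, rom, keystream):
        self.rom = rom              # ciphertext words, indexed by address
        self.keystream = keystream  # fixed K(0), K(1), ...
        self.transfers = 0

    def read(self, index):
        k = self.keystream[self.transfers]  # address is NOT used here
        self.transfers += 1
        return self.rom[index] ^ k


plain = [0x11111111, 0x22222222]                    # made up
keys = [0xA5A5A5A5, 0x5A5A5A5A, 0x0F0F0F0F]         # made up
rom = [p ^ k for p, k in zip(plain, keys)]          # Ci(n) = Cl(n) ^ K(n)

dec = ToyDecryptor(rom, keys)
first = dec.read(0)    # Ci(0) ^ K(0): the correct plaintext
second = dec.read(0)   # Ci(0) ^ K(1): garbage
print(f"{first:08X} {second:08X}")
```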
but in the end we don't really need that code, since it wouldn't be very interesting anyway, because, as said, the encryption logic is done in hardware (anyone who knows undocumented EXI registers? *g*)
all we would probably recover with luck would be an initialisation seed, not the algorithm itself. But maybe not even that.
Oh, and: yes, i plan to release my "modchip" (i hate that word). And no, you still can't read from self-made DVDs or stuff. It hasn't got ANYTHING to do with that. And no, i will not build a solution without memory card, just because you need both slots to be successful in any game you could probably play in some way. After all, it will be a DEVELOPER solution, not a "i'd like to play warez without having to buy PSO"-kiddie solution. (i know it will probably develop in that direction, but concerning the new developments of the scene, i don't care much).
Currently i'm not able to boot my games anymore, since the PAL IPL refuses to load and the NTSC IPL refuses to load my pal games *g*. I don't care, i never really played.
Oh, hardware used: (just if you care)
- Xilinx CPLD XCR3064XL (tiny, cheap thing. the biggest one you can get in a PLCC housing.) - for intercepting the IPL rom, connection to the FX2. < $10 i think.
- Cypress FX2 USB2.0 chip - for sniffing. $20 or so? (my (3rd party) development board costs $90, and is nothing more than the FX2 + voltage regulation + usb plug)
- an Action Replay memory card - the only one i managed to both read AND write. (can PLEASE someone reverse the stupid "card unlocking process"?). $40 but you get an action replay "for free".
- a TDS224 digital oscilloscope (4 channels, 100MHz, nothing special, the cheapest one (well, except the 2 channel devices) from tektronix at that time. now they have color models for the same price.). $3000, but it's worth it unless you have to pay for it yourself *g* (but even then it might be worth it).
- USB 2.0 card for my laptop - $30 maybe
- the gamecube itself - $100
- a pen to open the gamecube (the "melting plastic"-trick) - $1