I was not completely satisfied with my first encoding of Bad Apple 3D: the colors were pretty washed out and, of course, it only contained half of the video.
I had a few ideas in mind to make a better version:
- Use the two SMS hardware palettes, to up the max colors on screen.
- Use horizontal and vertical mirrors on the tiles to improve size/quality ratio.
- Have a global tile repository. Prior versions had a per-keyframe repository, which made tiles a lot less reusable.
- Make it compatible with 60Hz hardware.
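To illustrate why mirrors help the size/quality ratio, here is a minimal Python sketch (not the encoder's actual code; `h_mirror`, `v_mirror` and `canonical` are names made up for this example). It assumes a tile is an 8x8 grid of color indices and shows that four mirrored variants can share a single stored tile plus two flip flags:

```python
def h_mirror(tile):
    """Flip a tile (list of rows) left-right."""
    return [row[::-1] for row in tile]


def v_mirror(tile):
    """Flip a tile (list of rows) top-bottom."""
    return tile[::-1]


def canonical(tile):
    """Return (key, hflip, vflip).

    key is the smallest of the four mirrored variants, so every variant
    of the same tile maps to the same key. Applying the returned flips
    to key gives back the original tile, because flips undo themselves.
    """
    best = None
    for hflip in (False, True):
        for vflip in (False, True):
            t = tile
            if hflip:
                t = h_mirror(t)
            if vflip:
                t = v_mirror(t)
            key = tuple(tuple(row) for row in t)
            if best is None or key < best[0]:
                best = (key, hflip, vflip)
    return best
```

Only `key` would need to go in the tile repository; the two flags fit in the tilemap entry.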
So, first I had to rewrite the encoder because the encoding pipeline would be different.
v6 was: load -> keyframe tiling -> frame tiling -> dithering -> reindex -> save.
Tiling was done on truecolor tiles, which would no longer work if the goal was to make tiles reusable no matter the palette.
I came up with this pipeline for v7: load -> dithering -> make unique -> global tiling -> frame tiling -> smoothing -> reindex -> save.
- I used a much better dithering algorithm from
http://bisqwit.iki.fi/story/howto/dither/jy/ which improved color even with a single palette. This step also tries to find the best keyframe palettes.
- "make unique" is just a simple algorithm that merges strictly similar tiles to make tiling faster.
- As tiling is now done on colorless 4-bit tiles, KMeans wouldn't work: color indices are not continuous, so e.g. indices 1 and 2 are just as far apart as indices 1 and 15. In statistics terms, each tile has 64 nominal factors. I therefore used KModes, which is a bit like KMeans but for nominal data.
Global tiling uses KModes to reduce the overall number of tiles to a fixed number, and frame tiling uses DCT (discrete cosine transform) and KModes to find the best matching tiles for each frame, accounting for palettes and H/V mirrors.
Global tiling in particular is really time-consuming, as there are often more than 1 million tiles to reduce, so I used SIMD optimisation to speed it up.
- Smoothing uses DCT to tag tiles that are roughly the same as in the preceding frame, so that only part of the tilemap has to be updated each video frame.
- Save compresses tile indexes and tilemaps into byte command streams (basically encoding index differences for tile indexes, and a dictionary with skips and repeats for the tilemap).
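For readers unfamiliar with KModes, here is a rough Python sketch of the idea (a naive illustration, not the encoder's SIMD implementation; `hamming` and `kmodes` are names chosen for this example). It assumes each tile is a tuple of 64 color indices, measures distance as the number of differing pixels, and updates each cluster center to the per-pixel most frequent index:

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of positions where two tiles differ (indices are nominal)."""
    return sum(x != y for x, y in zip(a, b))


def assign(tiles, modes):
    """Index of the nearest mode for every tile."""
    return [min(range(len(modes)), key=lambda c: hamming(t, modes[c]))
            for t in tiles]


def kmodes(tiles, k, iterations=10, seed=0):
    """Reduce tiles to k representative tiles (modes)."""
    rng = random.Random(seed)
    modes = rng.sample(tiles, k)  # naive initialisation
    for _ in range(iterations):
        assignment = assign(tiles, modes)
        for c in range(k):
            members = [t for t, a in zip(tiles, assignment) if a == c]
            if members:
                # New mode: most frequent color index at each pixel.
                modes[c] = tuple(Counter(col).most_common(1)[0][0]
                                 for col in zip(*members))
    return modes, assign(tiles, modes)
```

Note that `hamming((1, 2), (1, 15))` and `hamming((1, 2), (1, 3))` are both 1, which is exactly the "same distance" property that rules out KMeans here.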
I then wrote the player naively and it was 2x too slow, even without sound. The main optimisations I made were:
- Use jump tables to handle the tile index and tilemap byte command streams. They are cheap on the Z80, and aligning jump tables and jump targets on 256 bytes and storing only the high address bytes makes them super fast (e.g. code table).
- Use 256-byte-aligned lookup tables: e.g. you load H with the upper byte of the LUT address and L with the byte to be looked up, and the lookup is then a simple LD A, (HL).
- Remove as many conditional branches as possible and use the main jump table instead. The best of those was to make the encoder handle the switch between fast and slow VRAM tile uploads, depending on VBlank or active display, by counting the Z80 cycles and the corresponding scanlines at encode time.
- Systematically abuse the stack as a general purpose data pointer and consequently use no interrupts.
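The encode-time cycle counting can be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the real encoder: the command names and the two upload costs are hypothetical, and the timing constants assume a 50Hz machine with 228 Z80 cycles per scanline, 313 scanlines per frame and 192 active lines.

```python
CYCLES_PER_LINE = 228      # assumed Z80 cycles per scanline
LINES_PER_FRAME = 313      # assumed 50Hz frame height
ACTIVE_LINES = 192         # assumed active display height
FAST_UPLOAD_CYCLES = 600   # hypothetical cost of a fast tile upload
SLOW_UPLOAD_CYCLES = 1000  # hypothetical cost of a slow tile upload


def in_active_display(cycle):
    """True if the given absolute cycle falls on an active scanline."""
    line = (cycle // CYCLES_PER_LINE) % LINES_PER_FRAME
    return line < ACTIVE_LINES


def schedule_uploads(tile_count, start_cycle=0):
    """Pick a fast or slow upload command for each tile at encode time."""
    cycle = start_cycle
    commands = []
    for _ in range(tile_count):
        end = cycle + FAST_UPLOAD_CYCLES - 1
        # Fast uploads are only safe if they start and end outside
        # the active display.
        if in_active_display(cycle) or in_active_display(end):
            commands.append("UPLOAD_SLOW")
            cycle += SLOW_UPLOAD_CYCLES
        else:
            commands.append("UPLOAD_FAST")
            cycle += FAST_UPLOAD_CYCLES
    return commands, cycle
```

Because the decision is baked into the command stream, the player never has to test where the beam is; it just follows the jump table.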
Sound was another huge problem. Counting cycles and adding PlaySample macros every X CPU cycles wasn't possible with jump tables as you don't have a fixed execution flow anymore.
My first idea was to use the Z80 R register to keep track of time, but it didn't work well (it's incremented each op, but op timings vary widely on the Z80) and was not cheap in CPU time.
I finally came up with a macro that was able to handle "partial" samples. That is, use some kind of fixed point math for the sample index so you can use it in any function and just pass any cycle count as parameter (code). You end up calling it more often than a simple PlaySample but it's very good at keeping a low sample jitter no matter the code.
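The "partial" sample idea can be modelled in Python like this. It is a sketch under assumptions, not the actual Z80 macro: the 16 fractional bits, the 8000Hz sample rate, the clock value and the `SamplePlayer` class are all chosen for illustration, and a list stands in for the sound chip writes.

```python
FRAC_BITS = 16         # assumed fixed-point precision
CPU_HZ = 3_579_545     # assumed Z80 clock
SAMPLE_RATE = 8_000    # hypothetical sample rate
# Fixed-point fraction of a sample that elapses per CPU cycle.
STEP = round(SAMPLE_RATE * (1 << FRAC_BITS) / CPU_HZ)


class SamplePlayer:
    def __init__(self, samples):
        self.samples = samples
        self.pos = 0       # fixed-point sample index
        self.output = []   # stands in for writes to the sound chip

    def play_sample(self, elapsed_cycles):
        """Call from anywhere, passing the cycles spent since the last call."""
        old_index = self.pos >> FRAC_BITS
        self.pos += elapsed_cycles * STEP
        new_index = self.pos >> FRAC_BITS
        # Only output when the integer part of the index has moved on.
        if new_index != old_index and new_index < len(self.samples):
            self.output.append(self.samples[new_index])
```

On real hardware the cycle count would be a constant known at assembly time, so the multiplication would cost nothing at run time. Calling `play_sample` with uneven cycle counts still steps through the samples in order, as long as each call covers less than one sample period.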
Anyway, I think the end result is better than v6 and probably more maintainable too. In the end I didn't do 60Hz compatibility, but at least it's doable now, as tilemap updates are done in both active display and VBlank.
The full source code is available online at:
https://github.com/gligli/tiler and encoder binaries are included in
http://www.smspower.org/uploads/Homebre ... S-7.00.zip