About the demo #
Inside the Machine is a demo for the Amiga OCS platform, released by Desire at Deadline Party in Berlin in October 2024. We’re pleased to say that we won first place in the Old School category. Big thanks to the organisers for an amazing party!
The people involved in this production were:
- Me (gigabates): Code
- Steffest: Graphics
- Pellicus: 3D, Tooling
- Ma2e: Music
- Platon: Framework
- RamonB5: Support
The source code is available on GitHub.
Before we get into things, here’s a capture of the demo:
The idea #
This demo started from a conversation with Pellicus at Revision. I showed him some stuff I’d been doing with simple HAM7 table effects, and we started thinking about how we could use UV data from real 3D scenes, rather than the usual tunnel style deformations. Building on tooling he’d built for some AGA demos, he set about building a data exporter for Unity. The goal we had was to make something that feels like an AGA demo, or an old school PC demo on OCS.
The concept #
Initially we started trying to incorporate this tech into an existing design concept that our group had been considering for our next production. Relatively quickly we realised that it wasn’t a good fit, and that we needed a new concept that would fit this kind of 3D visuals.
The idea was initially a metaphor for being ‘in the zone’ with our demomaking, and being transported into a computer world, filled with demoish visuals. We built on this concept to include the idea that the four main contributors to this demo are all inside this world, initially isolated, navigating our way through the space before eventually meeting and being united. In reality we’re all from different parts of the world, and different backgrounds, and this Amiga stuff has brought us together through our shared passion. The metaphor of connections, both human and electronic, was a great fit too. In the greetings parts you see connections branching out to our friends and acquaintances that we’ve met through the scene.
We’d suggested drum & bass as a potential music style, and Maze came up with a fantastic track. The upbeat feel was not necessarily what I was expecting, but it really contributed to the positive vibe of the demo. The track is available on SoundCloud for your listening pleasure.
The end result #
Originally the demo was intended to be released at Nova party in the UK. We got to the party with all of the scenes working in some shape or form, but no amount of energy drinks and party coding was enough to assemble everything into a track loader before the competition, so we had to admit defeat. I’d definitely underestimated the complexity of this part, having never released a trackmo before. Things like memory management and sync are difficult to get right, and I had made our lives particularly difficult by making so many of the effects heavy on memory usage and precalc!
With several lessons learnt, we went away and made some plans around which party would be best for a release. We wanted a party with a dedicated old school category where several of us could attend in person. Deadline ended up being the perfect choice, and I’m really glad we chose to release at this party. Myself, Steffest and Pellicus were able to attend, and we had a fantastic time.
In the end, having more time to work on the demo definitely paid off. We took this time to refine the scenes, tweak the compositions and better consider the choice of textures. This led to a much better end result in my opinion. The other place where we invested time is in learning how to properly utilise Platon’s PLaTOS framework to manage background tasks for loading and precalc. This resulted in better flow in the demo without obvious delays between parts. More on PLaTOS later…
Uh oh spaghettios… #
Despite testing on our own hardware, we had a nasty surprise running the demo on the compo machine the first time. It would crash intermittently at different points in a way we’d never seen before. This is pretty much the worst type of issue to have. It only happens on real hardware, on someone else’s machine, and can’t be reproduced consistently. What’s more, the iteration time in trying potential fixes was huge. We had to pin down Insane, one of the very busy (and very helpful and patient!) organiser team, get him to write a real floppy disk and run the demo through multiple times.
So with the clock ticking and faced with the prospect of leaving a second party without releasing our prod, we needed a plan. Given the iteration time, it seemed like the only option was the “debug the code in your head” approach. This was clearly a timing related issue, given the intermittent nature. This pointed to it being interrupt related. Pellicus spotted another clue – a few corrupted pixels on the part before the issue most frequently happened – and this part did sometimes happen in the emulator. Through the novel technique of “reading the code” I spotted a potential cause. I wasn’t stashing all of the used registers in one of the VBlank interrupts 🤦. We fixed this issue, Insane tested the new disk and it ran three times without crashing. For good measure we tested it on Charlie’s A500 – the legendary Revision compo machine. All good. Problem solved. Right?
FFFFFUUUUUUUU… #
Well no. The demo crashed on the big screen. Twice. I still don’t know for sure exactly what the cause was, but we’ve since fixed several potential race conditions and the demo seems stable now.
On the positive side it at least reached the final part, and people must have still liked what they saw because they voted for us. Thank you! And thanks to all the people who reassured us at the time, and helped with testing since, especially Bifat and h0ffman.
PLaTOS Framework #
This demo wouldn’t have been possible without Platon’s awesome demo framework PLaTOS aka Pitiful Loader and Trackmo Operating System. The source code is available along with his HAMazing demo which also has a detailed write-up. As well as Platon’s own productions, it’s now been used on New Art by AttentionWhore, which was also released at Deadline. So what’s good about it, and why should you use it?
Multitasking #
The killer feature that allowed this demo to work is the idea of background tasks. With most frameworks if you want to do stuff while an effect is running, like loading and precalcing the next part, your only option is to put your effect code in the vertical blank interrupt. But what if your effect doesn’t run in a single frame? What if you’re using blitter and copper interrupts too?
PLaTOS introduces the concept of task switching, where instead of your main loop busy-waiting until the next vsync once a frame is done, it backs up and replaces the stack to jump into another routine. When the vblank interrupt fires, it restores the original stack before the start of the next frame. Platon explains it better than I can. It supports multiple tasks with priorities, and a whole bunch of stuff I didn’t even touch!
Compression #
PLaTOS includes a number of options for compression, including some with in-place decompression. In this demo we mainly used zx0. One thing I found invaluable was the choice between fully loading and unpacking the next part, or deferring the unpacking until later where we didn’t have sufficient RAM.
Hard drive installable #
It allows you to create a hard disk installable version of your demo for higher end Amigas basically for free. At least it does if you follow the example properly, unlike me!
Batteries included #
The framework includes several utility features that can be optionally enabled, including:
- Trig tables
- Scripting
- Blitter queue
- Palette lerping
Thanks again to Platon for the framework, and for his ongoing support.
Common techniques used #
Before I get into the technical details of each of the effects, there are some fundamental building blocks that we need to know about. Some or all of this is probably well known to experienced Amiga demo coders, but I’ll start from the beginning anyway.
7 bitplane trick / HAM7 mode #
On OCS there’s a trick where you request a screen with 7 bitplanes, and you get a broken/undocumented mode that can be exploited. Obviously 7 is not a valid number of bitplanes on OCS! The maximum is 5 for a normal screen, or 6 in dual playfield, HAM or EHB modes. What you actually get instead is 6 bitplanes, but planes 5/6 have DMA switched off, and just repeat a static value from the `bpldat` registers. This means that if you can make use of a repeating pattern in those planes, you get them for free with zero DMA overhead.
One good use of this is in HAM mode. For each pixel in a HAM6 image, bitplanes 5/6 contain the control bits that specify what the other bits represent. They can either select a colour index from the palette, or modify one of the RGB components of the current colour.
| code | operation |
|---|---|
| %00 | Set a colour from the 16 colour palette |
| %01 | Modify Blue |
| %10 | Modify Red |
| %11 | Modify Green |
While normal HAM images choose the control bits dynamically for each pixel to best represent an original true colour image, we can make use of the static `bpldat` registers to have a repeating sequence of control codes. If we just repeatedly set the RGB values one after the other, we effectively get a lower resolution true colour mode. The pattern needs to repeat in 16 bits, so we settle on the sequence `RGBB`, setting the blue value twice so that each four pixels represent a single RGB value.
| bpl | px0 | px1 | px2 | px3 |
|---|---|---|---|---|
| 1 | r0 | g0 | b0 | b0 |
| 2 | r1 | g1 | b1 | b1 |
| 3 | r2 | g2 | b2 | b2 |
| 4 | r3 | g3 | b3 | b3 |
| 5 | 0 | 1 | 1 | 1 |
| 6 | 1 | 1 | 0 | 0 |

Reading the control bits from planes 6/5 down each column gives %10 (set red), %11 (set green), %01 (set blue) and %01 (set blue again!).
The constant values we can put in `bpl5dat` and `bpl6dat` are `%0111011101110111 = $7777` and `%1100110011001100 = $cccc`.
This gives us a slightly blurry 4x1 virtual resolution. By repeating lines with the copper, we can make this 4x4 where that’s more useful. This is ideal for us to transform our chunky pixel data into via chunky-to-planar, which we’ll get onto later.
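To sanity check the pattern, here’s a small JavaScript sketch (my own illustration, not the demo’s code) that derives those two constants from the `RGBB` control sequence, assuming bit 0 of each control code lands in plane 5 and bit 1 in plane 6:

```javascript
// R G B B control codes per 4-pixel group: %10 red, %11 green, %01 blue, %01 blue
const codes = [0b10, 0b11, 0b01, 0b01];
let bpl5dat = 0, bpl6dat = 0;
for (let x = 0; x < 16; x++) {
  const c = codes[x % 4];
  bpl5dat = (bpl5dat << 1) | (c & 1);        // low control bit -> plane 5
  bpl6dat = (bpl6dat << 1) | ((c >> 1) & 1); // high control bit -> plane 6
}
// bpl5dat === 0x7777, bpl6dat === 0xcccc
```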
Unfortunately for us, 7 is a valid number of bitplanes on AGA, which means that this trick won’t work, and instead we get exactly what we ask for! This is easy to work around though, because we can just detect AGA and take a different code path which opens a normal 6 bitplane screen and uses real bitplane data filled with the two repeating patterns. AGA machines will have enough of a speed advantage that we don’t need the free DMA anyway.
Incidentally there was another production at Deadline that made use of HAM7 mode: Therican by Greippi and Slumgud. Rather than chunky effects, this demo exploits the mode to do some interesting experimental stuff with blitter objects and dithering. I love it!
Blitter C2P #
Going right back to basics, graphics on the Amiga are ‘planar’ in that they use multiple bitplanes, which are combined to produce the colour value for each individual pixel. Chunky modes on the other hand have a single value for each single pixel. The upshot of this is that to set the value of a single pixel on Amiga, we need to set bits in multiple places in memory, rather than writing a single byte or word. This is not at all practical for PC style effects like rotozoomers, tunnels etc, where we want to update each pixel on the screen in real-time.
This problem is typically solved by writing the effect in chunky format in memory, and later converting it to planar data that the Amiga can display. On AGA demos this conversion is done by the CPU with a chunky-to-planar routine such as these ones by Kalms. This is really a type of rotation where you swap groups of bits in decreasing sized blocks as described by Scout here. As you can imagine this is very CPU intensive, and not practical for real-time use on base OCS machines.
A much better solution on the A500 is to do the chunky-to-planar conversion using the blitter. With a few compromises we can get this fast enough for real-time, and the big win is that it can run in parallel with the CPU (DMA contention notwithstanding) freeing it up to actually render the effect.
The blitter has everything we need to do the C2P operation, using masking, modulo and shift. One non-obvious thing is that although it can seemingly only shift in one direction, you can use descending mode to shift in reverse. Because this operation will require multiple passes of the blitter, and we’re planning to do CPU work in parallel, we’ll want to use blitter interrupts to chain the operations together in the background. The number of operations is higher still, because we’ll be exceeding the maximum blit height on OCS (1024 lines), so while a single blit would be possible on ECS/AGA, on OCS we’ll need to split these up.
In this demo we use two varieties of blitter C2P. I’ll detail the implementation of both of them, along with the limitations. This is possibly excessive, but here you go!
HAM7 C2P #
This is the operation to convert a chunky buffer to the 4px HAM7 layout we described earlier. It’s relatively straightforward in that the pixels are written in order as individual words.
Input: Chunky buffer (consecutive words, each containing the 4 bit RGB values for a single pixel)
[Ar3 Ag3 Ab3 Ab3 Ar2 Ag2 Ab2 Ab2 Ar1 Ag1 Ab1 Ab1 Ar0 Ag0 Ab0 Ab0]
[Br3 Bg3 Bb3 Bb3 Br2 Bg2 Bb2 Bb2 Br1 Bg1 Bb1 Bb1 Br0 Bg0 Bb0 Bb0]
[Cr3 Cg3 Cb3 Cb3 Cr2 Cg2 Cb2 Cb2 Cr1 Cg1 Cb1 Cb1 Cr0 Cg0 Cb0 Cb0]
[Dr3 Dg3 Db3 Db3 Dr2 Dg2 Db2 Db2 Dr1 Dg1 Db1 Db1 Dr0 Dg0 Db0 Db0]
...
The RGB values are pre-scrambled to simplify the operation. This isn’t a problem for the type of effects we’re doing, because we just apply this scramble once to the texture data. The bits for the RGB components are interleaved, and we repeat the blue component to match our output format.
Output: 4 bitplanes of RGBB values for our HAM7 mode (non-interleaved)
[Ar0 Ag0 Ab0 Ab0 Br0 Bg0 Bb0 Bb0 Cr0 Cg0 Cb0 Cb0 Dr0 Dg0 Db0 Db0]... bpl1
[Ar1 Ag1 Ab1 Ab1 Br1 Bg1 Bb1 Bb1 Cr1 Cg1 Cb1 Cb1 Dr1 Dg1 Db1 Db1]... bpl2
[Ar2 Ag2 Ab2 Ab2 Br2 Bg2 Bb2 Bb2 Cr2 Cg2 Cb2 Cb2 Dr2 Dg2 Db2 Db2]... bpl3
[Ar3 Ag3 Ab3 Ab3 Br3 Bg3 Bb3 Bb3 Cr3 Cg3 Cb3 Cb3 Dr3 Dg3 Db3 Db3]... bpl4
Blitter operations
- 8x2 swap chunky -> tmp
((a >> 8) & 0x00FF) | (b & 0xFF00)
a = chunkyStart + 2 words
b = chunkyStart
blit width = 2 words
modulo = 2 words
[Ar3 Ag3 Ab3 Ab3 Ar2 Ag2 Ab2 Ab2 Cr3 Cg3 Cb3 Cb3 Cr2 Cg2 Cb2 Cb2]
[Br3 Bg3 Bb3 Bb3 Br2 Bg2 Bb2 Bb2 Dr3 Dg3 Db3 Db3 Dr2 Dg2 Db2 Db2]
[ ]
[ ]
- 8x2 swap chunky -> tmp (DESC)
((a << 8) & 0xFF00) | (b & 0x00FF)
descending mode for right shift
a = chunkyEnd - 3 words
b = chunkyEnd - 1 word
blit width = 2 words
modulo = 2 words
[Ar3 Ag3 Ab3 Ab3 Ar2 Ag2 Ab2 Ab2 Cr3 Cg3 Cb3 Cb3 Cr2 Cg2 Cb2 Cb2]
[Br3 Bg3 Bb3 Bb3 Br2 Bg2 Bb2 Bb2 Dr3 Dg3 Db3 Db3 Dr2 Dg2 Db2 Db2]
[Ar1 Ag1 Ab1 Ab1 Ar0 Ag0 Ab0 Ab0 Cr1 Cg1 Cb1 Cb1 Cr0 Cg0 Cb0 Cb0]
[Br1 Bg1 Bb1 Bb1 Br0 Bg0 Bb0 Bb0 Dr1 Dg1 Db1 Db1 Dr0 Dg0 Db0 Db0]
- tmp → bpl4
((a >> 4) & 0x0F0F) | (b & 0xF0F0)
a = tmpStart + 1 word
b = tmpStart
blit width = 1 word
modulo = 3 words (d=0)
[Ar3 Ag3 Ab3 Ab3 Br3 Bg3 Bb3 Bb3 Cr3 Cg3 Cb3 Cb3 Dr3 Dg3 Db3 Db3]
- tmp → bpl2
((a >> 4) & 0x0F0F) | (b & 0xF0F0)
a = tmpStart + 3 words
b = tmpStart + 2 words
blit width = 1 word
modulo = 3 words (d=0)
[Ar1 Ag1 Ab1 Ab1 Br1 Bg1 Bb1 Bb1 Cr1 Cg1 Cb1 Cb1 Dr1 Dg1 Db1 Db1]
- tmp → bpl3
((a << 8) & ~0x0F0F) | (b & 0x0F0F)
descending mode for right shift
a = tmpEnd - 4 words
b = tmpEnd - 3 words
blit width = 1 word
modulo = 3 words (d=0)
[Ar2 Ag2 Ab2 Ab2 Br2 Bg2 Bb2 Bb2 Cr2 Cg2 Cb2 Cb2 Dr2 Dg2 Db2 Db2]
- tmp → bpl1
((a << 8) & ~0x0F0F) | (b & 0x0F0F)
descending mode for right shift
a = tmpEnd - 2 words
b = tmpEnd - 1 word
blit width = 1 word
modulo = 3 words (d=0)
[Ar0 Ag0 Ab0 Ab0 Br0 Bg0 Bb0 Bb0 Cr0 Cg0 Cb0 Cb0 Dr0 Dg0 Db0 Db0]
4 bitplane C2P #
We used another C2P mode for the non-HAM scenes, which offers a 16 colour mode at a better resolution of 2x2 at 25Hz. However this has some more significant restrictions.
In addition to the pixel values being scrambled, the order of the pixels in the chunky buffer is also out of sequence. By incorporating this into the effect code we can skip a swap operation.
The bits of the pixel values need to be scrambled differently for odd and even pixels (`3 2 _ _ 1 0 _ _` vs `_ _ 3 2 _ _ 1 0`), and these need to be ORed together into the chunky buffer, so each byte contains two pixels. All of this is fine in a table effect, but annoying for something like a texture mapper, where you want to jump to and write specific pixels.
In practical terms this means you need two versions of your pre-scrambled texture. On some table scenes we went a step further for optimisation, and created additional texture variants for repeated pixels (`a3 a2 a3 a2 a1 a0 a1 a0`) and consecutive pixels (`a3 a2 b3 b2 a1 a0 b1 b0`). This avoids some moves but takes time to pre-calculate and takes up a lot of memory.
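As a sketch of the per-byte scramble (my own illustration, not the demo’s code), packing an even/odd pair of 4-bit pixels into one chunky byte could look like this:

```javascript
// Even pixel bits land at byte positions 3 2 _ _ 1 0 _ _ (bits 7..0),
// odd pixel bits at _ _ 3 2 _ _ 1 0, then the two are ORed together.
function scramblePair(even, odd) {
  const e = ((even & 0b1100) << 4) | ((even & 0b0011) << 2);
  const o = ((odd  & 0b1100) << 2) |  (odd  & 0b0011);
  return e | o;
}
// scramblePair(0b1111, 0b0000) === 0b11001100
```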
Input: Each group of 8 pixels already in scrambled order A,B,E,F,C,D,G,H across two words
[ A3 A2 B3 B2 A1 A0 B1 B0 E3 E2 F3 F2 E1 E0 F1 F0 ]
[ C3 C2 D3 D2 C1 C0 D1 D0 G3 G2 H3 H2 G1 G0 H1 H0 ]
...
Output: Non-interleaved bitplanes with 3/2 swapped. Each bit is repeated to halve horizontal resolution.
[ A0 A0 B0 B0 C0 C0 D0 D0 E0 E0 F0 F0 G0 G0 H0 H0 ]... bpl1
[ A2 A2 B2 B2 C2 C2 D2 D2 E2 E2 F2 F2 G2 G2 H2 H2 ]... bpl3
[ A1 A1 B1 B1 C1 C1 D1 D1 E1 E1 F1 F1 G1 G1 H1 H1 ]... bpl2
[ A3 A3 B3 B3 C3 C3 D3 D3 E3 E3 F3 F3 G3 G3 H3 H3 ]... bpl4
Blitter operations
- Swap to bpls 1/3 (start of planar buffer)
(a & 0x0F0F) | ((b >> 4) & ~0x0F0F)
a = chunkyStart
b = chunkyStart + 1 word
blit width = 1 word
modulo = 1 word
[ A3 A2 B3 B2 C3 C2 D3 D2 E3 E2 F3 F2 G3 G2 H3 H2 ]
- Swap to bpls 2/4 (end of planar buffer)
((a << 4) & 0x0F0F) | (b & ~0x0F0F)
descending mode for right shift
a = chunkyEnd - 2 words
b = chunkyEnd - 1 word
blit width = 1 word
modulo = 1 word
[ A1 A0 B1 B0 C1 C0 D1 D0 E1 E0 F1 F0 G1 G0 H1 H0 ]
- Copy to bpls 2/4 and apply pixel doubling
(a & 0xAAAA) | ((b >> 1) & ~0xAAAA)
a = bpl1Start
b = bpl1Start
d = bpl2Start
blit width = 1 word
modulo = 0
[ A1 A1 B1 B1 C1 C1 D1 D1 E1 E1 F1 F1 G1 G1 H1 H1 ]... bpl2
[ A3 A3 B3 B3 C3 C3 D3 D3 E3 E3 F3 F3 G3 G3 H3 H3 ]... bpl4
- Apply pixel doubling in place on bpls 1/3
((a << 1) & 0xAAAA) | (b & ~0xAAAA)
descending mode for right shift
a = bpl3End - 1 word
b = bpl3End - 1 word
blit width = 1 word
modulo = 0
[ A0 A0 B0 B0 C0 C0 D0 D0 E0 E0 F0 F0 G0 G0 H0 H0 ]... bpl1
[ A2 A2 B2 B2 C2 C2 D2 D2 E2 E2 F2 F2 G2 G2 H2 H2 ]... bpl3
Table effects #
Ok, so now we have a couple of chunky modes available to us, what can we do with them? The type of oldschool PC effect that’s at the center of this demo is the table effect aka ‘UV table’, ‘move table’, ‘offset table’, and probably a bunch of other things.
So what’s the idea behind this? Basically for every pixel on the screen, we pre-calculate which pixel to use from a texture image (i.e. the UV coordinate of a ’texel’). This can be generated offline, and the classic usage you’ll see in demos is for something like a tunnel effect. These UV coordinates translate into a byte offset in the texture data e.g.
screen pixel | texture pixel | byte offset |
---|---|---|
(0,0) | (4,6) | 4+6*TEX_W |
(1,0) | (5,7) | 5+7*TEX_W |
(2,0) | (2,5) | 2+5*TEX_W |
This gives us a single pre-calculated frame of a texture mapped 3D scene, which we can render through a series of moves. The trick that allows this to fool the viewer into thinking they’re seeing real-time 3D, is that although the scene can’t move, the texture can. By having a repeated texture, and moving the start pointer for UV (0,0), we can move the texture on both axes and the scene appears to be animated.
Here’s the effect in action with a simple tunnel. IQ has a great article on the kind of effects you can do using this technique with simple plane deformation formulas. We can do something more interesting though, as we’ll get onto shortly.
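The core loop of a table effect can be sketched in JavaScript like this (illustrative only; `renderFrame` and `TEX_W` are my own names, and I’m assuming a power-of-two texture so wrapping is a simple mask):

```javascript
const TEX_W = 256, TEX_H = 256; // assumed power-of-two repeating texture

// uvTable holds a precalculated byte offset into the texture per screen
// pixel; animating the texture origin makes the static scene appear to move.
function renderFrame(screen, uvTable, texture, scrollU, scrollV) {
  const origin = (scrollV * TEX_W + scrollU) & (TEX_W * TEX_H - 1);
  for (let i = 0; i < uvTable.length; i++) {
    screen[i] = texture[(uvTable[i] + origin) & (TEX_W * TEX_H - 1)];
  }
}
```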
Speed code #
If you’re used to working in high level languages, you might imagine that the code to render a table effect would be a simple loop that reads an array of offsets and then looks up the texture data. Something like:
```
for (i=0; i<offsets.length; i++) {
  offset = offsets[i]
  color = texture[offset]
  screen[i] = color
}
```
This isn’t going to cut it for our purposes though. The offset lookups and looping would make it far too slow. To avoid these we could instead just have a big chunk of code containing a completely unrolled sequence of writes with the offsets baked in:
```
screen[i++] = texture[0x23f5]
screen[i++] = texture[0x52a0]
screen[i++] = texture[0x32f1]
...
```
or in assembly
```
move.w $23f5(a0),(a1)+ ; 32e8 23f5
move.w $52a0(a0),(a1)+ ; 32e8 52a0
move.w $32f1(a0),(a1)+ ; 32e8 32f1
...
```
We can transform our offsets array into code in precalc. The idea that instructions are just binary data, and having code that generates code is key to a lot of demo effects.
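Sketching that precalc step in JavaScript (illustrative only; the `rts` terminator is my addition): each table entry becomes the opcode word `$32E8` followed by its 16-bit offset, matching the encodings in the comments above.

```javascript
// Generate speed code from an offset table: one
// `move.w offset(a0),(a1)+` per entry, terminated with rts.
function generateSpeedCode(offsets) {
  const code = new Uint16Array(offsets.length * 2 + 1);
  offsets.forEach((off, i) => {
    code[i * 2] = 0x32e8;            // move.w d16(a0),(a1)+
    code[i * 2 + 1] = off & 0xffff;  // the baked-in texture offset
  });
  code[offsets.length * 2] = 0x4e75; // rts, so it can be called as a subroutine
  return code;
}
```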
The Effects #
Motherboard #
Nothing too technical on this one. Mainly just Steffest’s amazing artwork! There’s some scripting to pan around the giant bitmap, with some blitting here and there. The line plotting is just a simple `bset` that follows the pan position, with a sprite for the spark. There’s some palette lerping for the fade in and out, and the change to the red colours when the processor heats up, and finally some cycling for the chrome effect, which is going to be a recurring theme!
My previous collaborations with Steffest have been on size restricted productions, so it was really nice to see what he came up with given the freedom to use large graphics and animation. We use this scene to preload the music, and the PCB line burning effect works as a pseudo progress bar, even though we only realised this after the fact.
Amiga chrome #
Now that the Amiga has been powered up with Desire magic, it starts to spread, transforming the user into a chrome Silver Surfer, and transporting them into the computer world.
Probably pretty obvious, but what’s happening here is that we’re gradually replacing the initial image, blitting in a chrome version set up for colour cycling. There’s a separate mask for each part of the Amiga and user, and we intersect each of these with a moving wobbly line to give it an organic feeling animated mask.
The Amiga image itself started out as a 3D rendered image (not cheating–it’s necessary for the effect!). I completely re-pixeled the 2D version, hand dithering everything. For the chrome version, the colour indexes map to the V component of the surface normals, so that the cycled colours give the impression of lighting moving in one dimension. This is a taste of things to come in the chunky parts.
Title #
More huge bitmap graphics from Steffest with this crazy impressive logo. It was originally made in true colour, with the intention to render it as HAM. With time pressure mounting he made a 32 colour version to simplify things, and honestly I think it looks better! Thanks to his fantastic conversion it looks nice and sharp with really clean dithering, and we didn’t even need to resort to any copper tricks.
2D Tunnel #
Still 2D, but ramping things up a bit to build to the drop. It might be ‘only’ colour cycling, but it’s a pretty complex one, with multiple colour groups, fading and cycling at different rates. Luckily for me, like several of the effects, Steffest provided a fully working prototype in JavaScript. The running figure is rendered using sprites, and the main image is completely static. We increase the rate of the flashing in sync with the building of the music.
I think it was a nice idea to have a 2D version of the effect, before the transition into the 3D world, and the reveal of the chunky tech.
Face lights #
Ok, now for the first chunky effect. This uses the 2x2 16 colour mode we described earlier. The illusion we’re going for here is a lighting effect, with multiple light sources. Of course this isn’t really what’s happening. It’s a table effect, environment mapping a texture consisting of radial gradients, giving a ‘fake phong’ effect. The way this works is that the UV values in our table use the surface normal of the object i.e. the direction it’s facing on that screen pixel.
While Pellicus was busy building the scene exporter for Unity, I figured out a crude way to get UV data out of Blender for single object scenes like this.
I render an image using a probe texture (left) such that on the resulting RGB output, the red component maps to U and green maps to V. I also overlay a mask in blue to indicate background pixels which should always remain black. These pixels also give us a performance boost because there’s no texture lookup.
I built a script in JS to convert the png output into binary offset tables. For optimal compression, we export separate delta tables for the U and V components.
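As an illustration of that conversion (a sketch; `pixelToOffset`, `toDeltas` and `TEX_W` are my own names, not the actual tool), mapping probe pixels to offsets and delta-encoding the result might look like:

```javascript
const TEX_W = 256; // assumed texture width

// Red encodes U, green encodes V, blue marks background pixels
// which stay black and skip the texture lookup entirely.
function pixelToOffset(r, g, b) {
  if (b > 0) return -1;   // background marker
  return g * TEX_W + r;   // V * width + U
}

// Separate delta tables for U/V compress better than raw offsets.
function toDeltas(values) {
  const out = [values[0]];
  for (let i = 1; i < values.length; i++) out.push(values[i] - values[i - 1]);
  return out;
}
```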
The texture looks like this, giving the effect of multiple light sources. The limited palette doesn’t give much scope for smooth gradients, but I quite like the posterised look it gives. We move sprites around to simulate the light sources, aligning them to the texture offset and gradient positions.
We raised a few eyebrows with the palette choices, and I appreciate that bright colours like this have an association with ‘coder colours’ in the demoscene i.e. placeholder colours with no artistic input. This was a deliberate choice by us though, as part of the concept, and I stand by it :-)
Rotozoomer #
I couldn’t resist trying out our new true colour chunky mode with this classic effect. It runs at 50Hz, and could be full screen if we weren’t reserving some raster lines for background tasks. The main technique at play here is self modifying code. On each frame, once we’ve done the necessary calculations for the current angle and scale, we generate speed code to render a single horizontal line. We can then just call it repeatedly, adjusting the start offset for each line.
The maths is probably better explained elsewhere, but I’ll give it a go. The calculation for the rotation gives four variables that describe the gradients: for each whole pixel step we take on the screen x or y, the corresponding steps we take along the original image.
```
dudy = sin(a) / scale; // horizontal step on image per vertical step on screen
dvdy = cos(a) / scale; // vertical step on image per vertical step on screen
dudx = dvdy;           // horizontal step on image per horizontal step on screen
dvdx = -dudy;          // vertical step on image per horizontal step on screen
```
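Putting the four gradients to work, here’s a JavaScript sketch of an affine rotozoomer (illustrative only; the real thing generates speed code per line instead of looping, and uses 16.16 fixed point as below so per-pixel stepping is just adds):

```javascript
// Fixed point affine rotozoom over a power-of-two repeating texture.
function rotozoom(screen, W, H, tex, TEX_W, TEX_H, angle, scale) {
  const FP = 1 << 16;
  const dudy = Math.round(Math.sin(angle) / scale * FP);
  const dvdy = Math.round(Math.cos(angle) / scale * FP);
  const dudx = dvdy, dvdx = -dudy; // rotation matrix relationships from above
  let rowU = 0, rowV = 0;
  for (let y = 0; y < H; y++) {
    let u = rowU, v = rowV;
    for (let x = 0; x < W; x++) {
      const tu = (u >> 16) & (TEX_W - 1); // wrap on both axes
      const tv = (v >> 16) & (TEX_H - 1);
      screen[y * W + x] = tex[tv * TEX_W + tu];
      u += dudx; v += dvdx;
    }
    rowU += dudy; rowV += dvdy; // step down the left edge of the screen
  }
}
```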
We use sprites to add the face graphic on the left hand side, and add some movement to the palette using Perlin noise. This carries the narrative through, and also has the added benefit of hiding some jaggies on the left hand side from scroll offsets.
Circuit greets #
Now we’re back to 2D for this recurring greetings effect, which we’ll use to break up the chunky parts. This is good not only for pacing, but actually essential to allow space for loading and precalcing the resource intensive parts.
This is another effect where Steffest’s prototype was invaluable. This one even includes a live editor, and outputs coordinate data that I was easily able to convert to ASM. It defines an array of lines to draw, each of which can start and end with a dot image of varying size. There’s also an additional array of stand-alone dots.
```javascript
let lines = [
  {coordinates:[[64,199],[74,189],[110,189],[118,181],[156,181],[170,195],[238,195]],start:2,end:1,speed:4},
  {coordinates:[[59,196],[59,185],[63,181],[72,181]],start:1,end:2,speed:2},
  {coordinates:[[61,174],[64,177]],start:2,end:2,speed:2},
  ...
];
let dots = [[78,61],[80,57],[82,54],[84,50],[87,45],[86,54],[75,118],[75,122],[75,126],[79,122],[71,120],[71,124],[66,188],[77,105],[76,109],[79,126],[63,143],[74,78],[74,83],[74,88],[71,134]];
```
The code iterates over the lines, plotting them a pixel at a time and blitting the dots until they’re flagged as complete. Once a line is complete, the code will periodically trace over it to add a highlight by plotting on another bitplane. The tail of each highlight is cleared after a given number of pixels, to give a short dash. The dots are also highlighted in a similar way by blitting the image to another plane.
In addition to all of this there’s some colour lerping going on, and we fade in a background image containing the greeting text and face image.
3D objects #
Hey, not everything’s faked with table effects in this demo! These parts are legitimate real-time 3D. The two torus scenes both use a texture mapped polygon routine, rendering to our HAM7 chunky mode.
The first scene uses a table effect as a background, aiming for a high impact by dropping two new effects together. We’ll skip over the background effect for now, as we’ll cover that soon. In actual fact, rendering the table effect isn’t that much worse than clearing the chunky buffer with the CPU anyway, and we’re already down to 25Hz for complex objects. I did have some simpler texture mapped scenes running at 50Hz, like a cube or a simple icosphere, but decided the impact of the torus justified the lower frame rate.
So let’s focus on the texture mapper for now. I really have spkr to thank for piquing my interest in texture mappers. He pointed me in the direction of this classic textfile.
I won’t go over the whole implementation but my main takeaways are that:
- For non-perspective correct (affine) texture mapping, the whole triangle uses a constant gradient, just like a rotozoomer
- You can use self-modifying code in much the same way
- To draw a triangle you sort the points vertically and split it into two sections, top and bottom, with different cases where the mid-point is on the right or the left. You need to handle the case when the top or bottom is flat too.
- It’s then a matter of interpolating the x coordinates, and the UVs for each line.
For the self modifying code, you just need to generate code to write the longest horizontal span, which will be the line between the mid point and the interpolated point on the long side. This generated code should write in reverse, right to left, finishing on a zero offset. You can then determine the length of each line you draw by where you jump into the code.
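A JavaScript sketch of the jump-into-code idea (modelling the unrolled writes as an array of closures rather than real instructions; names are my own):

```javascript
// Unrolled writes for the maximum span, right to left, ending at offset 0.
const MAX_SPAN = 16;
const ops = [];
for (let x = MAX_SPAN - 1; x >= 0; x--) {
  ops.push((screen, base, color) => { screen[base + x] = color; });
}

// A span of length n starts n ops from the end of the unrolled sequence:
// entering later draws a shorter span.
function drawSpan(screen, xStart, length, color) {
  for (let i = ops.length - length; i < ops.length; i++) {
    ops[i](screen, xStart, color);
  }
}
```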
This is mostly it, but I spent a lot of time optimising this routine. A few techniques used were:
- Convert divs to muls using a table of fixed point fractions. Rather than divide by 6, multiply by `(1/6)<<15`, and eventually convert back.
- Use the `addx` trick for interpolation. Combine the U fractional part with V, to get an integer UV with no shifting:
```
; d0, d2 ------UU
; d1, d3 uu--VVvv
  add.l  d3,d1
  addx.b d2,d0
  move.b d0,d1 ; VVUU
```
- When we know that faces form pairs that make up quads, they share some calculations e.g. the dot product for back-face culling
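The div-to-mul trick can be sketched like this in JavaScript (my own illustration, along the lines of the `(1/6)<<15` example above; the table size and rounding are my choices, not the demo’s actual code):

```javascript
// Table of 1.15 fixed point reciprocals for small divisors.
const RECIP = [0];
for (let d = 1; d <= 64; d++) {
  // round up so exact divisions of small numerators stay exact
  RECIP[d] = Math.ceil((1 << 15) / d);
}

// Multiply by the reciprocal, then convert back with a shift.
function fixedDiv(n, d) {
  return (n * RECIP[d]) >> 15; // approximates Math.floor(n / d)
}
```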
As for the objects themselves, another trick I owe to spkr is presorting faces in a torus. You’d think that because a torus isn’t convex, you’d need to sort the transformed faces by Z distance to render correctly, but actually drawing from the inside out is enough i.e. the faces closest to the center are drawn first. Along with simple backface culling, the painter’s algorithm ensures that the correct faces are visible. We just sort the faces in the object data and don’t have to worry about this again.
The first torus is flat shaded. The brightness levels are just multiple versions of the texture, generated in precalc, and the brightness of a face (which texture variant to use) is determined by its dot product. Values aren’t normalised, so this also has the effect that objects get darker as they get further from the camera, which is good actually.
The second torus is a fair bit more complex. It has 32 faces, compared to 24 on the previous one, and also includes normals which need to be transformed along with the vertices. This effect uses env map fake lighting, a lot like the Face Lights part. We replicated the same palette for continuity, but this time in true colour with smooth gradients. Like on the table effect, we do this by using the object’s normals as texture UVs.
So aside from losing the background, how do we squeeze out enough performance to do all of this on 25Hz? The main trick is that the screen size is set dynamically per frame, making it as small as it can be to fit the object bounds. This minimises the size of the chunky buffer which needs to be cleared and later converted to planar by the blitter, as well as reducing DMA usage, where bitplane DMA is off.
Here I’ve set color00 so you can see the screen resizing in action.
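The bounds-fitting idea can be sketched like this (Python, with invented names; the real code works from the transformed vertices and programs the display window via the copper, and I've assumed word alignment on x so the planar conversion stays blitter-friendly):

```python
def screen_window(points, align=16):
    """Fit a per-frame display window to the projected object bounds.
    `points` is a list of (x, y) screen positions; x is aligned down/up
    to `align` pixels. Returns (x, y, width, height)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0 = (min(xs) // align) * align          # align left edge down
    x1 = -(-max(xs) // align) * align        # align right edge up (ceil)
    y0, y1 = min(ys), max(ys) + 1
    return x0, y0, x1 - x0, y1 - y0
```

A smaller window means fewer chunky pixels to clear, convert and display, which is where the speed comes from.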
The next trick is the use of a ring buffer. Now more than ever, some frames are much quicker to draw than others. We buffer up to 20 frames ahead of the visible one, and use the spare time on the ’easy frames’ to write ahead and fill the buffer.
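The buffering scheme boils down to something like this (a minimal Python sketch; the slot count matches the article, the class and method names are mine):

```python
from collections import deque

class FrameRing:
    """Write-ahead frame ring: render into free slots whenever there is
    spare time, while display consumes from the front in order."""
    def __init__(self, slots=20):
        self.slots = slots
        self.ready = deque()

    def can_render_ahead(self):
        return len(self.ready) < self.slots

    def push_rendered(self, frame):
        assert self.can_render_ahead()
        self.ready.append(frame)

    def pop_for_display(self):
        # Returns None if rendering fell behind (a dropped/held frame)
        return self.ready.popleft() if self.ready else None
```

Cheap frames fill the ring, expensive frames drain it, and as long as it never empties the visible output stays smooth.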
Finally, one thing you might have noticed with the screen resizing is that we’re now putting sprites in the borders. While you can set a flag to do this on ECS, it isn’t possible on OCS, right? Well, luckily for us there’s a way. If you write to the bpl1dat register, sprite DMA is kicked into action for the rest of the line. We add this to our copper list on every line and boom, it works.
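Generating that copper list fragment might look something like this (a Python sketch: the wait/move word encodings follow the hardware reference manual, $110 is the bpl1dat register offset, and the horizontal wait position is illustrative; lines beyond 255 would need the usual copper handling, which I've left out):

```python
BPL1DAT = 0x110  # custom chip register offset of bpl1dat

def border_sprite_copperlist(first_line, last_line, hpos=0x07):
    """Emit copper instruction words that, on every line, wait near the
    start of the line and then write to bpl1dat, retriggering sprite DMA
    for the rest of that line even with bitplane DMA off."""
    words = []
    for line in range(first_line, last_line + 1):
        words += [((line & 0xFF) << 8) | hpos | 1, 0xFFFE]  # WAIT line,hpos
        words += [BPL1DAT, 0x0000]                          # MOVE #0,bpl1dat
    return words
```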
HAM7 Table scenes #
Finally we get to the type of scene that the whole concept was built around. This is where Pellicus’s tooling comes into play. Using an extension written in C++ with Dear ImGui, he was able to compose scenes in Unity and export them in our custom binary format, ready to drop into the Amiga routines. The tool allows scenes to be previewed as they would be rendered on the Amiga, trying out texture images and animation. It also exports png images for the sprite overlays. There’s a lot more he could tell us about the tooling, but maybe we’ll save that for a follow-up article.
On the Amiga side, the implementation is a table effect using the HAM7 C2P as we described previously, but we’ve done a few things to make it more interesting.
Luminance levels #
As well as a texture UV offset, each pixel has a brightness value of 0-15. In terms of the data, this is just another delta table, but how do we render it? We certainly can’t fade the RGB values in real-time, but as with the flat shaded texture mapper we can do this in precalc, and generate 16 variants of the texture. This takes a while and uses a lot of RAM, but we can manage it. Now, how does this translate into our draw code? Remember, our speed code looks something like this:
```
move.w offset1(a1),(a0)+
move.w offset2(a1),(a0)+
move.w offset3(a1),(a0)+
```
Maybe we could just treat it as one really long texture and add TEXTURE_SIZE*color to the offset? Unfortunately the 16-bit displacement in this addressing mode can only cover a 64KB range, and with the repetition we need for the texture movement, each variant will be 16KB. Luckily we have multiple registers available. We can reach 3 variants per register, and with 6 registers available that’s more than enough.
```
uvOffset-TEXTURE_SIZE(a1)  ; shade 0
uvOffset(a1)               ; shade 1
uvOffset+TEXTURE_SIZE(a1)  ; shade 2
uvOffset-TEXTURE_SIZE(a2)  ; shade 3
uvOffset(a2)               ; shade 4
uvOffset+TEXTURE_SIZE(a2)  ; shade 5
uvOffset-TEXTURE_SIZE(a3)  ; shade 6
uvOffset(a3)               ; shade 7
uvOffset+TEXTURE_SIZE(a3)  ; shade 8
...
```
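The mapping from brightness level to register and displacement can be sketched like this (a Python model; register numbering follows the listing above, TEXTURE_SIZE is the 16KB variant size, and the assert checks we stay inside the signed 16-bit range of the 68k d16(An) addressing mode):

```python
TEXTURE_SIZE = 0x4000  # 16 KB per pre-faded texture variant

def shade_operand(shade):
    """Return the (address register, displacement) pair for a brightness
    level: three variants per register, at -16K, 0 and +16K."""
    reg = 1 + shade // 3                     # a1, a2, a3, ...
    disp = (shade % 3 - 1) * TEXTURE_SIZE    # -16K, 0, +16K
    assert -0x8000 <= disp <= 0x7FFF         # fits d16(An)
    return f"a{reg}", disp
```

With 6 registers at 3 variants each we could reach 18 shades, comfortably covering the 16 we need.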
Fading #
We’re able to adjust the global brightness level of the textures to fade the scene in and out, and to add the synced flashes. This is almost straightforward! We can just adjust the initial offset of each of the 6 address registers. This doesn’t exactly fade so much as shift the RGB values, clipping against the maximum and minimum. The annoying caveat is that because we’re sharing 3 texture variants on each register, we need the first and last variants to be repeated x3 to avoid out-of-range offsets, using up even more RAM!
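The net effect on a single pixel can be modelled like this (Python; a behavioural model only — the real code never touches individual pixels, it just moves the 6 base registers, and the x3 duplicated end variants are what make the overshooting offsets read the clipped copy):

```python
NUM_SHADES = 16

def variant_after_fade(shade, fade):
    """Which pre-faded texture variant a pixel of brightness `shade`
    ends up reading after a global fade of `fade` levels: shifted, then
    clipped at black and full brightness by the duplicated end variants."""
    return max(0, min(NUM_SHADES - 1, shade + fade))
```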
Panning #
You’ll notice that these scenes scroll around horizontally and vertically within the viewport. Given the way table scenes work with a fixed sequence of hard-coded moves, this takes some thinking about. The trick here is to generate code that would write to a much bigger screen, and then modify it for each frame.
We jump into the code at an offset to determine the first pixel, and therefore x and y pan, but first we need to modify the code to prevent it overflowing the chunky buffer.
So say our screen (and chunky buffer) is 80x50 chunky pixels, and the table data for a scene is 110x110. After 80 moves from our entry point, we need to overwrite the next instruction with a bra to jump ahead and skip 30 moves, which happens to be 120 bytes. This works just like the modulo on the Amiga’s planar screen. We repeat this for each of our 50 screen lines and write an rts on the last one.
Once we’ve executed our modified code we need to restore the original instructions, which we backed up to the stack.
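The bookkeeping behind this self-modifying pan can be sketched as follows (a Python model using the article's 80x50-in-110x110 numbers; 4 bytes per move matches a `move.w d16(An),(A0)+` instruction, and the function name is mine):

```python
MOVE_BYTES = 4              # each generated move.w d16(An),(A0)+ is 4 bytes
TABLE_W, TABLE_H = 110, 110 # table data dimensions in chunky pixels
SCREEN_W, SCREEN_H = 80, 50 # screen / chunky buffer dimensions

def pan_patches(pan_x, pan_y):
    """For a given pan, return: the byte offset of the entry point into
    the generated code, the offsets of the instructions to overwrite
    with a bra, the bra's skip distance, and the offset for the rts."""
    entry = (pan_y * TABLE_W + pan_x) * MOVE_BYTES
    row_end = [entry + (row * TABLE_W + SCREEN_W) * MOVE_BYTES
               for row in range(SCREEN_H)]
    skip_bytes = (TABLE_W - SCREEN_W) * MOVE_BYTES  # 120 bytes, the 'modulo'
    return entry, row_end[:-1], skip_bytes, row_end[-1]
```

Every patched location gets its original instruction restored (from the stack backup) once the frame has been drawn.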
This all works, but so far it only moves within the 4x4 chunky resolution, and looks very jerky. We can improve this to 1x1 pixel resolution by adding the x remainder via the bitplane scroll registers and adjusting the height of the first row in the copper.
Vertical resolution doubling #
With this mode we can get 50Hz at 4x4 chunky resolution, and 25Hz at 4x2. On these parts I tried to get the best of both worlds by alternating odd and even lines like an interlaced screen mode. The table data contains twice as many lines, as if the resolution were 4x2. We skip lines using an additional modulo, and shift the screen ±2px vertically on each frame. The advantage of this is that the texture movement and scrolling stay at 50Hz, and the temporal blur it adds actually works well to hide the blockiness.
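Which lines of the doubled table data get drawn each frame comes down to something like this (a trivial Python sketch; in the real code the skipping is done with the extra modulo and the vertical nudge in the copper):

```python
def field_lines(frame, rows=50):
    """Lines of the doubled-resolution table data drawn this frame:
    even frames take even lines, odd frames take odd lines."""
    phase = frame & 1
    return [2 * r + phase for r in range(rows)]
```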
Sprites #
The human figures are added as animated sprites, using the same normal mapped colour cycling we used on the Amiga chrome effect. I tried to choose palettes for the cycling colours that matched the textures used on each scene to give the impression of moving reflections. The position of the sprites needs to be adjusted to match the panning on the table effect to keep them aligned.
4 Bitplane Table scenes #
This is another variety of table effect that we use to render Unity scenes. It has different pros and cons. We get 2x2 resolution at 25Hz, but in only 16 colours, and with more complex drawing code. This means that luminance and panning go out of the window, but instead we can have multiple textures which move independently. Using the same data structure, we now store a texture index instead of a luminance level. We can reference a maximum of 3 textures, which have to share a common palette. The limiting factor here is that in the 2x2 mode we need two versions of each texture, scrambled for odd and even pixels. These use up a lot of RAM, and each one needs a separate register.
To portray the narrative of the characters coming together, we have a split screen view containing four individual scenes. I’m sure this was challenging to do on the PC tooling side, but as far as the Amiga is concerned it’s still a single table scene with multiple textures, so there’s really nothing different about it! The only fiddly part is having the four 16 colour sprites on screen together. To achieve this they’re just laid out in a way that none of them overlap horizontally.
Credits #
The credits have the big reveal that the four characters are actually us, as they transform from chrome back into human form. It’s another fake lighting effect, this time in true colour, and for this to work we needed 3D models of our heads. Without access to a 3D scanner, I looked for a solution to generate models from 2D images. FaceBuilder for Blender seemed to be the best option. I got everyone to supply photos from a variety of angles, and it did a great job of creating models that were good enough for our purposes.
Like with the other Blender scenes, I rendered a png with probe textures to export the normals, and this time we also need the colour data.
The colour image needed to be reduced to a common 16 colour palette. I tried various options for this and got the best results by first adjusting each face to be closer in hue and brightness, quantising using https://squoosh.app/, and finally doing some manual repixeling. The generated textures unavoidably contain some baked in lighting from the original photos, and the effect works much better if we flatten out highlights and shadows.
The way we implement the lighting effect in multiple colours is pretty similar to how we added luminance levels on the previous scenes. Instead of having 16 variants of the same texture at different brightness levels, we have one texture per colour index, mapped to registers and offsets in the same way. The textures themselves consist of a radial gradient from each colour fading to black, similar to the fake phong texture we used before.
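Generating one of those per-colour lighting textures might look like this in precalc (a Python sketch; the size and the linear falloff are illustrative, not taken from the demo source):

```python
def colour_light_texture(rgb, size=65):
    """A radial gradient texture: the palette colour at full brightness in
    the centre, fading linearly to black at the edges. Returns a size x
    size grid of (r, g, b) tuples."""
    cx = cy = (size - 1) / 2
    radius = size / 2
    tex = []
    for y in range(size):
        row = []
        for x in range(size):
            d = ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
            k = max(0.0, 1.0 - d / radius)          # linear falloff
            row.append(tuple(int(c * k) for c in rgb))
        tex.append(row)
    return tex
```

One such texture per palette index, wired to the registers and displacements exactly like the 16 brightness variants were.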
The next part of this effect is the transition from multiple colours to a single chrome texture. The way we do this is by modifying the speed code one pixel-move at a time, changing the source register to one pointing at the chrome texture. The order in which we change the pixels is generated offline and included as a table. It’s a combination of the x position and the surface normal, so it goes from left to right weighted by the shape of the faces.
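Generating that order table offline could be sketched like this (Python; the weighting factor and the exact combination of x and normal are my guesses at the idea, not the demo's actual formula):

```python
def chrome_transition_order(pixels, weight=0.3):
    """Order in which pixel-moves get repointed at the chrome texture:
    sorted by x position, biased by the surface normal's x component so
    the left-to-right sweep follows the shape of the faces.
    `pixels` is a list of (index, x, nx) with nx in [-1, 1]."""
    return [i for i, x, nx in sorted(pixels, key=lambda p: p[1] + weight * p[2])]
```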
The text is overlaid as sprites, and the scroll movement is at 50Hz, even though the C2P is at 25Hz.
Fin #
So that’s it. We had planned a more complex end screen, but the cut to a simple logo that I’d initially added as a placeholder gave a nice mic drop moment and a fitting ending.
This is the place where I should probably write some sort of meaningful conclusion, but three weeks on my brain is still pretty fried! I’m really pleased with the reception the demo has had, and I want to thank everyone who’s taken the time to give us any sort of feedback. I’m happy that we achieved what we set out to do with this demo, even though I think we’ve all had enough of table effects for the foreseeable future! I think the thing I’m most pleased with is how we managed to incorporate a real narrative and a positive message that seemed to really resonate with people. This is something I’m keen to nail with future productions too.
I’m really looking forward to working with this team again, and excited to see where we go next. Oh, and I’m sure there are some lessons to be learnt about time management and testing, but we’ll forget about that for now.