Flegma blog header photo
Flegma's c-blog
Fronts 3Posts 945Blogs 93Following 0Followers 20



Fledgling retrodev: Balancing space/time 2 - space


Space. The long-bygone frontier. These are the chronicles of Fleggy and its on-and-off journey to discover the pains of creating games on old hardware, to go where the pioneers went over thirty years ago.

Today... If you're a consumer and need space, you just buy a multi-terabyte external HDD and you're good to go, unless you're using a Switch, a 3DS or a Vita. It's not like anyone uses Doublespace or comparable nowadays either. If you're a dev and need space, let the customer take care of the need for extra space. Even when you're doing a physical release and not just a download code in the box, it's not like you can't just put in a Steam client on the disc and let the user download the game. Or just hide the game (or most of it) as an update. No, I'm not serious - don't do it.

Some time ago, I wrote a cblargh on how I started creating an on-rails shooter for an old microcomputer and how I had trouble with getting it to run fast. This time, I'm going to look at the other side of the coin - making the game small enough.

Just saying "there's not enough space" isn't always a valid reason to skip adding something; it may just take more time and effort to squeeze it to a smaller space in a space/time tradeoff of a different kind. Just look at this Bad Apple animation played back on C64 and the writeup included in the comments.

And now... on with the show. As a heads-up, a "frame" will mean a cutout through which the the player will fly. Think "Hole in the Wall" (the TV series) and you get the idea. I also seem prone to mistaking horizonal and vertical, and not just left and right. I meant that each 8*8 pixel character was split with a horizontal line into two 4-pixel-tall "subcharacters".

Using precomputed results

When you're lacking in CPU power to do the same calculations over and over, use precomputed results. Taking each output 'pixel' and seeing where the ray traced through it would hit in each frame would be just that and as such, a prime candidate to be stored in advance. We have 32*40 'pixels', eight "depths" and need to store the ray's X and Y coordinate in the frame. This means rows*columns*depths*(coordinate pair) = 32*40*8*2=20480 bytes. Did I mention I'm trying to stay under 16KB of ROM, although I think 48KB would still be acceptable? 

As for how much RAM available? 16KB. 

Okay, time to rethink how to store that information. Let's start with not storing X and Y for each pixel, but map column to column and row to row. For instance, column 5 could map in the first frame to column 5, column 2 in the second frame, -1 in the third and so on. From what I can tell, these columns and rows were constant regardless, except now we need to compute the memory addresses for those. We're down to (32 + 40)*8=576 bytes. Much better, and what I should've done right at start, rather than the many bad ideas I had before this.

What about the frames? Each is 32*40=1280 'pixels' in size, and flying through just one or two frames repeated ad infinitum wasn't in my plans. Storing one pixel per byte is not the way to go -- that'd mean wasting 7/8 of space. Using one bit to represent one pixel would take 40*32/8=40*4=160 bytes, which is more space-efficient.

I didn't go with that solution, though, which I may come to regret. Instead, I tried to compromise between time and space. Given how the renderer would work -- no resampling or antialiasing at all and at such a low resolution -- any details would get lost, so they're better avoided. We also planned to make the world monochrome, so everything is fine for using run-length encoding (RLE).

RLE means giving instructions like "repeat 123 zeroes, then 5 ones, then 189 zeroes", rather than spending 123+5+189 bytes to store it unencoded. Here, we use RLE to tell which pixels are walls that will both obstruct the view and mean death for speeding dragons.

We can almost check quickly if a pixel is lit or not. To make it faster, I chose to represent blocks as runs of blocking sections. For example, 0, 5, 14, 19, 20, 32 would mean "block columns 0 to 4, block columns 14 to 18, block columns 20 to 31". Given how few columns the display will have and how quickly the details are lost in the background, I chose to have exactly three blocking runs per row. That ought to be enough... I hope. I'm spending two bytes more on each row, but the resulting code is more legible (for me).

 An example of a frame and how one scanline would be depicted in it.

Data compression

Need more space? Well, just zip a folder and you're already better off. And that's what I'll do.

For a layman, compressing data is often just a question of speed, how many megabytes the resulting file has and how much data is lost like in using JPEG instead of a BMP.

An oft-ignored side is that you should also factor in the size of the program used to decompress the code. Otherwise, you could set the first byte to 0, and the decompressor will know to produce a specific several megabytes large file embedded in the decompressor. In this case, the compressor size can't be ignored.

Are there ready code snippets for compressing data on Z80? Yes, and they probably do better job than I can. But if I only use other people's code, I won't learn to code such things myself.

Huffman codes are a simple way of compressing data that assigns shorter codes for values that occur frequently and longer codes for the rarer ones. Your regular ZIP file may very well have been compressed with it.

In my case, I need to store the decoder and the compressed data in the memory. The original compression I can easily do on PC. And if I make a few (valid) assumptions of the data, we can avoid certain checks and get the decoder to use under 100 bytes in ROM and only a few bytes of RAM.

Huffman codes work better when the values in the input do not occur at similar frequencies. (For more details, see also entropy in information theory if you want to.) We can use this information to produce data that compresses better, although the theoretical limit will still be one bit per byte. This means shrinking to 12.5% in size will be the (unachievable) target without any other tricks I won't be implementing. So, if I use, say, only even numbers to describe the runs instead of both odd and even, the compression ratio will improve. I don't know if this'll be necessary, but that can wait until I actually start creating content.

However, I may have broken the rules of optimization: don't optimize, and don't optimize yet. If one frame takes 40*6=240 bytes, that's actually not all that much space taken. Even worse, I might be more constrained with RAM than I am with ROM, unless I go crazy in the form of different levels or graphics, so I might be better off storing the frames in their RLE-format in ROM.

So, I can now store a fair number of different frames in the ROM and extract them to the RAM... in which can't store the whole level as a full list of frames. Instead, I've implemented a one-level codebook: the level is an array that tells which frames in a codebook I'll use.

Space and time working together

In May 2017, I rewrote the renderer as the so-called Painter's Algorithm, which is just a fancy way of saying "Paint the closest items last over the more distant items you painted first". It's also a whole lot faster... for two reasons.

The first simpler reason is that I took cue of Wolfenstein 3D and Doom that let the player reduce the viewport size or resolution. I did just that, and ditched the vertically splitted characters. The rendering area is now 24*20 pixels instead of 32*40. If I got spare clock cycles in the end, it'll be easy enough to update that to back to 32x20. As a bonus, the frames now take only half of the space they used to.

  Right: viewport maximized (or close to it) in Wolfenstein 3D. Left: the ultra cinematic version.

The more complex second reason is reducing repetition. "But you now might redraw the same pixel multiple times, how is that reducing repetition?" Because now I don't repeatedly recompute the memory addresses for the frames and the row I'm looking at. Basic optimization that I hadn't done for the previous version.

Current version of the engine (reduced resolution, Painter's algorithm) running at its top speed in an emulator.

As I was writing this cblog, I also noticed I'm doing something not much unlike SEGA did with their Space Harrier on Sega Master System: the incoming obstacles and enemies are background tiles rather than sprites. At this point, though, I have to say that while Master System has the same CPU (at a higher clock rate) than MSX, its video chip is far better, meaning I'll fall short of that game.

Cart before the horse

At this point I realized "Ok, I have a good part of the engine done. What about the game and gameplay?"


Until next time: designing the game with the controller in mind.

Login to vote this up!


Kevin Mersereau   27
Wes Tacos   17
Uber Mashu   8
Dinosir   7



Please login (or) make a quick account (free)
to view and post comments.

 Login with Twitter

 Login with Dtoid

Three day old threads are only visible to verified humans - this helps our small community management team stay on top of spam

Sorry for the extra step!


About Flegmaone of us since 11:34 PM on 01.17.2015

Very much unprofessional writer, don't take anything I write without a truckload of salt.

On a hopefully long-term break from saying anything.