Sixty Frames per Second or Bust
In 1989, when I was still in high school, I released Zenix, a video game for the TRS-80 Color Computer 3 (known as the Coco). Soon thereafter, I released another video game called the Crystal City. I had been programming in Basic since I was 11, and programming in assembly language and publishing articles in the Rainbow magazine since I was 13*.
I loved video games, and I knew I wanted to write my own. And if I was going to write a game, I knew I wanted it to be smooth, fast, and play music in the background. The only way to get both smooth and fast was to make it run at the TV’s native frame rate of 60 frames per second. Fast, smooth, music, and sound effects. If I couldn’t achieve that, I wouldn’t bother. Sixty frames per second or bust.
* My dad wrote the text of the articles since I was only interested in programming. Thanks, Dad!
The source code for Zenix and the Crystal City is located at github.com/gosub-com/Coco and is freely available under the MIT license. Included in the package is a C# application called CocoDisk, which can read virtual Coco disk files. Also included is a utility called CocoCom, which can be used to transfer files and disks from a Coco via RS-232.
My old Coco disk drives were broken, so I bought a 3.5″ floppy drive, hooked it up, and amazingly, it worked. Even more amazing, these disks had been sitting unused for over 28 years, many of those years in a non-climate controlled storage unit. There were a few errors, but most of them were on unused portions of the disk.
The Coco runs at 2 megahertz, but the shortest instruction (a nop) takes two cycles, making the absolute maximum speed one million instructions per second. And those instructions are not very powerful. The most that can be done in two cycles is an 8 bit add of a register and a constant number between 0 and 255.
A sixteen bit add takes 4 cycles, and if you want to add something from memory, you are up to 6 or more cycles depending on the addressing mode. Code to compute ‘a = a + b’ when both parameters are 16 bits long and in memory takes a minimum of 16 cycles. At 2,000,000 cycles per second, the Coco can execute 125,000 such computations per second, which works out to just 2083 per frame.
That’s what you have to work with, and if you can’t generate sound, erase, and draw everything you need to in that time, you don’t get 60 frames per second. Every cycle counts.
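As a quick check of the arithmetic above, here is a minimal Python sketch. The 2 MHz clock, the 60 frames per second, and the 16-cycle cost come from the text; the function name is made up for illustration.

```python
CPU_HZ = 2_000_000        # Coco 3 clock speed, from the text
FRAMES_PER_SECOND = 60    # TV frame rate, from the text

def operations_per_frame(cycles_per_operation):
    """How many operations of the given cycle cost fit in one video frame."""
    cycles_per_frame = CPU_HZ // FRAMES_PER_SECOND   # 33333 cycles
    return cycles_per_frame // cycles_per_operation

print(CPU_HZ // 16)              # 125000 sixteen-cycle adds per second
print(operations_per_frame(16))  # 2083 of them per frame
```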
The original Coco3 came with 128Kb of memory, which gets mapped into the CPU’s 64Kb address space in 8Kb chunks. Zenix uses 64Kb for a single frame buffer, leaving no possibility of double buffering. Here is a memory map of Zenix on 11/4/1988, and a discontinued map of Crystal City on 1/14/1990:
The Coco doesn’t have any fancy sound generation hardware. All you can do is poke a 6 bit number to $FF20 at exactly the right time to make whatever noise you want. The fast IRQ (FIRQ) is set up to trigger 3623 times per second (about 60x per frame) and calculate the value to store in the DAC. For every cycle used in the sound routine, you lose 60 cycles per frame. In the speed example above, if the sound routine takes 120 cycles, that leaves us with processing power to calculate ‘a=a+b’ only 1633 times per frame. To get the best performance possible, the sound routine was placed in the direct page and self-modifying code was used.
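The cost of the sound interrupt can be worked out the same way. This is a rough Python sketch; the 120-cycle routine is the hypothetical figure from the paragraph above, not a measurement of the real FIRQ, and the function names are invented for the example.

```python
CPU_HZ = 2_000_000
FRAMES_PER_SECOND = 60
FIRQ_HZ = 3623            # sound interrupt rate, from the text

def adds_left_per_frame(sound_cycles, firqs_per_frame=60, add_cycles=16):
    """16-cycle adds left in a frame once the sound routine has taken its share.

    Uses the rounded figure of 60 interrupts per frame, as the text does.
    """
    cycles_per_frame = CPU_HZ // FRAMES_PER_SECOND
    sound_overhead = sound_cycles * firqs_per_frame
    return (cycles_per_frame - sound_overhead) // add_cycles

def sound_cpu_fraction(sound_cycles):
    """Share of all CPU cycles spent in the sound routine at the exact FIRQ rate."""
    return sound_cycles * FIRQ_HZ / CPU_HZ

print(adds_left_per_frame(120))            # 1633
print(round(sound_cpu_fraction(120), 3))   # 0.217
```

In other words, a 120-cycle routine would eat a bit more than a fifth of the machine, which is why shaving even one cycle from it matters.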
Take a look at GAME1.TXT and scroll down until you see “***FIRQ***”. If you don’t know the code is self-modifying, it looks nonsensical, like ‘V1=0+0’. But if you examine the assembly code, you’ll see that the second 0 is replaced by the store instruction, turning the code into ‘V1=0+V1’. Way down in the slow 60x per second interrupt (search for NOTE1), the first zero is periodically replaced by a value from a new note, turning the equation into ‘V1=Frequency+V1’, thus generating a 16 bit counter that runs continuously at any frequency we care to calculate.
Down below, at WAVE3, there is code that looks like ‘B=N+N+N’. But search the code for WAVE1, WAVE2, and WAVE3 and you’ll see that it’s modified to become ‘B=WaveTable1[V1/256] + WaveTable2[V2/256]+WaveTable3[V3/256]’. This allows the FIRQ to be a fast wave table sound generator using less than 25% of the CPU processing power. The complex calculations (note generation, wave table changes, etc.) are done in the slow IRQ, only 60x per second.
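For readers who don’t want to dig through the assembly, the idea can be sketched in Python. This is only a model of the technique, not a translation of GAME1.TXT: the sine tables, the class and function names, and the choice to scale each table to 0..21 so that three voices sum to a 6 bit value are all my assumptions here.

```python
import math

SAMPLE_RATE = 3623   # FIRQ rate, from the text
TABLE_SIZE = 256     # one table entry per value of the counter's high byte

def make_sine_table(amplitude):
    """Hypothetical wave table: 256 unsigned samples in the range 0..amplitude."""
    return [round((math.sin(2 * math.pi * i / TABLE_SIZE) + 1) / 2 * amplitude)
            for i in range(TABLE_SIZE)]

def phase_increment(hz):
    """16-bit value to add per sample so the counter wraps hz times per second."""
    return round(hz * 65536 / SAMPLE_RATE)

class Voice:
    def __init__(self, table, frequency=0):
        self.table = table
        self.frequency = frequency   # patched in by the slow 60x per second IRQ
        self.phase = 0               # the 16-bit counter (V1, V2, V3 in the text)

    def step(self):
        """V = Frequency + V, then look up WaveTable[V / 256]."""
        self.phase = (self.phase + self.frequency) & 0xFFFF
        return self.table[self.phase >> 8]

def firq_sample(voices):
    """One pass of the fast interrupt: sum one table lookup per voice."""
    return sum(voice.step() for voice in voices)

table = make_sine_table(21)   # 3 voices * 21 = 63, the largest 6-bit value
voices = [Voice(table, phase_increment(hz)) for hz in (220, 277, 330)]
samples = [firq_sample(voices) for _ in range(SAMPLE_RATE)]  # one second of sound
```

The fast path is nothing but an add, a table lookup, and a sum per voice. Everything that needs real thought, like choosing `frequency` or swapping tables, happens outside `firq_sample`.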
The graphics mode used by Zenix uses 256 bytes per line, but only 160 of them are shown on the screen. The unused bytes are used to crop the icons on the left and right. By leaving unused space above the frame buffer, the hardware can crop the top of the sprites. That leaves the bottom of the screen, which can’t be cropped in the same way because of the score and shield status line. Instead, the frame buffer is positioned so that the top of the status line is at an 8Kb boundary. This allows the MMU to be used to crop the bottom of the screen above the status line.
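The left and right cropping falls out of simple address arithmetic. Here is a small Python model of it, using the 256-byte line and 160 visible bytes from the text; the function names are invented, and the MMU trick for the bottom of the screen is not modeled.

```python
STRIDE = 256    # bytes per line in the frame buffer, from the text
VISIBLE = 160   # bytes per line that actually appear on screen, from the text

def byte_address(x, y):
    """Linear frame-buffer offset of byte column x on line y (x may be negative)."""
    return y * STRIDE + x

def is_visible(address):
    """True if a frame-buffer offset falls in the displayed part of its line."""
    return address % STRIDE < VISIBLE

def visible_columns(x, y, width):
    """Which byte columns of one sprite row, drawn at (x, y), land on screen."""
    return [col for col in range(width) if is_visible(byte_address(x + col, y))]

print(visible_columns(156, 10, 8))  # right edge: [0, 1, 2, 3]
print(visible_columns(-3, 10, 8))   # left edge:  [3, 4, 5, 6, 7]
```

The drawing code never tests for the screen edge. Bytes that fall off the right side land in the hidden margin of the same line, and bytes that fall off the left side land in the hidden margin of the line above.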
Since Crystal City uses the hardware to move the frame buffer, creating the horizontal scroll effect, the MMU can’t be used to crop the bottom of the screen. Instead, the FIRQ interrupt is used to calculate when the raster hits the status line. Then the video frame buffer address is changed so it points to the status line. This method works reasonably well on the actual Coco hardware, but fails pretty badly on the emulator. If I were to do it again, I would use the HSYNC interrupt to count scan lines and run the sound routine every 4th line. But there would be more CPU overhead, so I can’t say for sure if this is a good idea.
There isn’t enough memory to have two frame buffers so double buffering is out of the question. No matter how fast you can erase and redraw a sprite, you’ll always have some flickering. And the worst case is when the sprite is moving horizontally across the screen, hits the raster over and over, and is partially or even completely invisible. Aside from the ugliness, having an invisible insect shoot at you is not fair.
The solution is to check where the raster is before drawing. If the raster is just above where the sprite is, queue the sprite up to be drawn later. Search the code for HLINE and you can see how this is done.
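The decision itself is tiny. Below is a Python sketch of the idea, not the HLINE code: the 8-line margin, the rule that a raster already inside the sprite also counts as a conflict, and all the names are assumptions made for illustration.

```python
def raster_conflicts(raster_line, sprite_top, sprite_height, margin=8):
    """True if the raster is inside the sprite or within `margin` lines above it."""
    sprite_bottom = sprite_top + sprite_height - 1
    return sprite_top - margin <= raster_line <= sprite_bottom

def schedule(sprites, raster_line, margin=8):
    """Split (name, top, height) sprites into ones to draw now and ones to defer."""
    draw_now, draw_later = [], []
    for name, top, height in sprites:
        if raster_conflicts(raster_line, top, height, margin):
            draw_later.append(name)   # the beam would catch it half drawn
        else:
            draw_now.append(name)
    return draw_now, draw_later

sprites = [("bug", 100, 16), ("ship", 180, 16)]
print(schedule(sprites, raster_line=95))   # (['ship'], ['bug'])
```

By the time the deferred sprites are drawn, the raster has moved past them, so the erase and redraw happen where the beam can’t see them.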
Erasing and drawing sprites is the biggest bottleneck in a Coco video game. To get the best performance possible, sprites are converted to assembly language. No loops, just load a register and store directly to video memory. The first sprites were created manually by tediously hand coding the assembly language from pictures drawn on graph paper. Later on, I wrote a sprite generator in assembly language, SPRITE.TXT. Here are examples of the sprite drawing assembly language:
- Erase a sprite: CBBUG.TXT
- Draw a sprite with a single color: FASTBUG1.TXT
- Draw a sprite using palette: OBUG1.TXT
- Generated code: OSHIP.TXT
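To give a feel for what a sprite generator does, here is a toy version in Python. It is not a port of SPRITE.TXT. It assumes byte value 0 means transparent, emits one 8 bit load and store per byte, and uses generic 6809-style mnemonics with X pointing at the sprite’s top-left byte. A real generator would use 16 bit stores and reuse register values where it could.

```python
STRIDE = 256   # bytes per frame-buffer line, from the Zenix graphics mode

def compile_sprite(rows, label="DRAW"):
    """Turn rows of sprite bytes into straight-line assembly text with no loops."""
    lines = [label]
    for y, row in enumerate(rows):
        for x, value in enumerate(row):
            if value == 0:
                continue                     # assumed transparent: emit nothing
            lines.append(f"    LDA #${value:02X}")
            lines.append(f"    STA {y * STRIDE + x},X")
    lines.append("    RTS")
    return "\n".join(lines)

print(compile_sprite([[0x12, 0x00],
                      [0x00, 0x34]]))
```

The loop and the transparency test run once, at build time. What ships in the game is just the list of loads and stores.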
Two other tricks were used. Sprites in the squadron are not moved every frame. They go slow enough that they don’t look blurry or jerky when they move. And the boss ships are never erased. They are drawn with an outline around them and never move faster than the outline would allow, so each redraw covers the previous image. Hand assembled sprites:
Loading and saving from disk was way too slow, not to mention that the disks didn’t hold very much data. So, I wrote a ram drive, JMSDOS.TXT, that could read and write 3.5″ 80 track floppy disks. Building Zenix and Crystal City on the ram drive was orders of magnitude faster than using the floppy disk drive. I replaced the Radio Shack disk controller ROM with my own EPROM so the Coco would boot up ready to go. My EPROM had built in utilities to edit files ‘JMS E’, speed up the computer ‘JMS F’, back up disks ‘JMS B’, and various other niceties.
I copied the EPROM from my old Coco and tried to get it to work with the emulator, but I wasn’t successful. Without that EPROM, it’s impossible to mount my old .DSK files on the emulator.
I started coding Zenix with Edtasm. It soon ran out of memory, and my parents bought me “The Worlds Best Assembler” by “The Micro Works”, AS.BAS. That assembler ran slower and slower and also ran out of memory. So I wrote “The Worlds Fastest Assembler”, ASSEM.BAS, which uses a hash table to look up op codes and a binary search to find symbols. It even had a recursive descent parser.
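The two lookup strategies can be sketched in a few lines of Python. This is an illustration of the idea rather than anything taken from ASSEM.BAS: a Python dict stands in for the hash table, the `bisect` module does the binary search, and the class name and the three sample op codes (6809 inherent instructions) are chosen just for the example.

```python
import bisect

# Hash table: mnemonic -> op code byte, looked up in constant time on average.
OPCODES = {"NOP": 0x12, "RTS": 0x39, "CLRA": 0x4F}

class SymbolTable:
    """Symbols kept in sorted order so a lookup is a binary search."""

    def __init__(self):
        self._names = []
        self._values = []

    def define(self, name, value):
        i = bisect.bisect_left(self._names, name)
        if i < len(self._names) and self._names[i] == name:
            raise ValueError(f"duplicate symbol: {name}")
        self._names.insert(i, name)     # keep both lists in step
        self._values.insert(i, value)

    def lookup(self, name):
        i = bisect.bisect_left(self._names, name)
        if i < len(self._names) and self._names[i] == name:
            return self._values[i]
        raise KeyError(name)

symbols = SymbolTable()
for name, address in [("WAVE3", 0x0E40), ("HLINE", 0x2000), ("NOTE1", 0x0F10)]:
    symbols.define(name, address)       # addresses here are made up

print(hex(OPCODES["RTS"]))              # 0x39
print(hex(symbols.lookup("HLINE")))     # 0x2000
```

The op code set is fixed and small, so hashing it is cheap. The symbol table grows with the program, and a binary search keeps each lookup to a handful of comparisons even with thousands of labels.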
Alas, had I only known what a linker was, I wouldn’t have had to write an assembler. In the end, it was the BINF instruction (Binary Include File) that made Zenix and Crystal City assemble so fast.