Re: Gens4All with Z80 Emulation
Posted: Wed Sep 01, 2021 12:11 am
Progress update:
Rendering overhead has been reduced to around 2.0-2.2 ms. Originally, when creating the Genesis VRAM texture, it would write to main RAM, then DMA it into the DC's video memory. It would have to wait for DMA to complete, since it interfered with writing to the TA. Now it writes directly to VRAM using SQs. When I first wrote it, trying to SQ to VRAM was much slower than writing to main RAM then DMAing to VRAM, but I've learned that it's possible to SQ to video RAM at DMA speeds. You can't use the address KOS returns when allocating VRAM; you have to convert it to the address used for DMA (basically, replace the top byte of the pointer with 0x11), set some of the DMA registers, then SQ there.
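As a rough illustration of the address conversion only, here is a minimal sketch in C. The function and macro names are made up for this example; the only detail taken from the post is that the top byte of the pointer becomes 0x11. The DMA register setup and the SQ write itself are not shown, since the post doesn't spell them out.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define VRAM_OFFSET_MASK 0x00FFFFFFu  /* keep the low 24 bits (offset into VRAM) */
#define VRAM_DMA_AREA    0x11000000u  /* top byte 0x11, as described in the post */

/* Replace the top byte of a VRAM pointer with 0x11, leaving the offset intact. */
uint32_t vram_to_dma_addr(uint32_t vram_addr)
{
    return (vram_addr & VRAM_OFFSET_MASK) | VRAM_DMA_AREA;
}
```

For example, a pointer of 0xA5123456 would become 0x11123456 under this scheme.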
Partial support for color change raster effects has been added. See the first row of screenshots (left image is before, right is after). It mostly works, but at the moment, the color change happens on a per-tile basis. In the screenshots, the color change lines up closely with the tile map tiles, so it's not too noticeable there; the cyan dot on the left marks where the color change is supposed to be. For tiles that cross the color change line, I'll need to cut the quad in two at the line and draw each half with the correct palette. Only one color change is supported per frame.
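The planned quad split could look something like the sketch below. This is not the emulator's code; the struct, the function name, and the 0/1 palette numbering are assumptions for illustration. It only works out which scanlines each half covers and which palette it gets, assuming a single color change line per frame.

```c
/* One vertical piece of a tile quad. Hypothetical type for illustration. */
typedef struct {
    int y_top;     /* first scanline covered (inclusive) */
    int y_bottom;  /* one past the last scanline covered (exclusive) */
    int palette;   /* 0 = palette before the color change, 1 = palette after */
} TileSlice;

/* Split the span [y_top, y_bottom) at change_line.
   Writes one or two slices to out and returns how many were written. */
int split_tile_at_line(int y_top, int y_bottom, int change_line, TileSlice out[2])
{
    if (change_line <= y_top) {
        /* Whole tile is below the change line. */
        out[0] = (TileSlice){ y_top, y_bottom, 1 };
        return 1;
    }
    if (change_line >= y_bottom) {
        /* Whole tile is above the change line. */
        out[0] = (TileSlice){ y_top, y_bottom, 0 };
        return 1;
    }
    /* Tile crosses the line: cut it in two. */
    out[0] = (TileSlice){ y_top, change_line, 0 };
    out[1] = (TileSlice){ change_line, y_bottom, 1 };
    return 2;
}
```

Each slice's texture rows follow directly from its scanlines (row = y - y_top of the original tile), so the two halves would sample the same tile texture with different palettes.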
There's now Smash Pack style line scroll faking. It looks at the line scroll values for the top and bottom rows of a tile and skews the tiles to fake line scroll. See the second row of images. Left is before, right is with the line scroll faking.
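A minimal sketch of the skew idea, under the assumption that the top vertices simply take the first row's scroll value and the bottom vertices take the last row's. The type and function names are hypothetical, and the real renderer may place the vertices differently.

```c
#define TILE_SIZE 8

/* X coordinates of the four corners of one skewed tile quad.
   Hypothetical type for illustration. */
typedef struct {
    float x_top_left, x_top_right;
    float x_bottom_left, x_bottom_right;
} SkewedTileX;

/* base_x:  left edge of the tile before any line scroll is applied.
   hscroll: line scroll values for the tile's 8 rows, top to bottom. */
SkewedTileX skew_tile(float base_x, const int hscroll[TILE_SIZE])
{
    SkewedTileX q;
    /* Top edge follows the scroll value of the tile's first row. */
    q.x_top_left  = base_x + (float)hscroll[0];
    q.x_top_right = q.x_top_left + TILE_SIZE;
    /* Bottom edge follows the scroll value of the tile's last row. */
    q.x_bottom_left  = base_x + (float)hscroll[TILE_SIZE - 1];
    q.x_bottom_right = q.x_bottom_left + TILE_SIZE;
    return q;
}
```

The rows in between are never looked at, which is why this only looks right when they change at a roughly steady rate.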
There are some problems, though. It works well for something like the floor in a fighting game, since the lines in between the top and bottom of a tile change by similar amounts. The stars in Gunstar Heroes' space areas would have problems since each line has a completely different scroll value. Another place with issues is with infinitely scrolling parallax, like Sonic 2's Emerald Hill background. Eventually, it wraps around and the rendering code thinks it's supposed to be skewed the other way. The floor in Toy Story or the icebergs in S3's Ice Cap wouldn't have this problem because they don't really scroll forever. They scroll one repetition of the pattern, then jump back to keep it from scrolling too far.
The 3rd row of screenshots shows pretty much every flaw with this method of faking line scroll. The area around the rings is skewed one way in the left shot, but in the right shot, after scrolling a bit more, the skew direction switches.
Something I wasn't expecting with tile skewing is that at certain steep angles of skew, there are incorrect pixels drawn. (See the out of place pixels in the red box in the last image.) I think the PVR's texturing math has precision issues and it's sampling texels from an adjacent tile. It might be possible to hide or work around this. Maybe detect problem skew values and replace them with similar ones without the problem?
The line scroll method probably needs to be selectable by the user, or the renderer needs to scan the line scroll values and try to pick the best option. Subdividing tiles (i.e. rendering 8x4 quads) for greater line scroll resolution might help.
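One way the scanning might be done is to measure how far a tile's in-between rows stray from a straight line drawn from the top row's scroll value to the bottom row's. This is only a sketch of that idea, not something the emulator does yet; the function names and the pixel threshold are assumptions, and scroll wraparound is ignored.

```c
#define TILE_ROWS 8

/* Largest deviation of the middle rows from a straight line between the
   first and last row's scroll values. The result is scaled by
   (TILE_ROWS - 1) so the math stays in integers. */
int skew_error_scaled(const int hscroll[TILE_ROWS])
{
    int top = hscroll[0];
    int span = hscroll[TILE_ROWS - 1] - top;
    int worst = 0;
    for (int i = 1; i < TILE_ROWS - 1; i++) {
        /* (actual - expected) * 7, where expected = top + span * i / 7 */
        int err = (hscroll[i] - top) * (TILE_ROWS - 1) - span * i;
        if (err < 0)
            err = -err;
        if (err > worst)
            worst = err;
    }
    return worst;
}

/* Nonzero if skewing this tile keeps every row within max_error_pixels. */
int skew_is_good_fit(const int hscroll[TILE_ROWS], int max_error_pixels)
{
    return skew_error_scaled(hscroll) <= max_error_pixels * (TILE_ROWS - 1);
}
```

A fighting game floor with evenly stepped values would pass with zero error, while a star field with unrelated per-line values would fail and could fall back to another method, such as subdivided quads.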
I'm going to work on getting save states working, since that will help with testing. It looks like when switching to the SH4 68K core, the method for reading and writing 68K state changed, but the save state code was #if 0'ed out instead of being updated.
I think I've figured out a possible improvement on the sound handling by skipping SDL. Currently, sound works like this:
- The Genesis emulation writes 32-bit samples generated in one frame to two buffers (one per channel)
- At the end of the frame, these samples are clamped to 16 bits and written to a single buffer, with channels interleaved
- The SDL thread periodically checks the AICA playback position
- When the AICA has played enough, it requests samples from the buffer
- SDL de-interleaves the samples when writing to sound RAM
The new method would work like this:
- The Genesis emulation writes 32-bit samples generated in one frame to two buffers (one per channel)
- At the end of the frame, DMA these buffers to sound RAM and let the ARM deal with it
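For reference, the clamp-and-interleave step in the current sound path (the one the new method would hand off to the ARM) could be sketched roughly like this. The function names and the left-then-right sample order are assumptions for illustration, not the emulator's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturate a 32-bit sample to the signed 16-bit range. */
int16_t clamp16(int32_t s)
{
    if (s > INT16_MAX)
        return INT16_MAX;
    if (s < INT16_MIN)
        return INT16_MIN;
    return (int16_t)s;
}

/* Combine two per-channel 32-bit buffers into one interleaved 16-bit buffer.
   out must have room for 2 * count samples. Left-first order is assumed. */
void clamp_and_interleave(const int32_t *left, const int32_t *right,
                          int16_t *out, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        out[2 * i]     = clamp16(left[i]);
        out[2 * i + 1] = clamp16(right[i]);
    }
}
```

Since SDL then de-interleaves the samples again when writing to sound RAM, the interleave is wasted work on the SH4, which is the point of skipping it.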