Update to Nareez's Dreamcast port.
Ian Micheal: Changes I made to improve speed: 1: Optimized and fixed PVR DMA rendering; it is now 32-bit aligned and working, and I added an SH4 math header. 2: I have been optimizing this, fixed the 32-bit DMA rendering alignment, and added SH4 math functions. 3: Replaced the inner for loop with a while loop that terminates early when the projected height falls below zero or above the screen height. 4: Reordered the heightMap and pixelmap arrays to improve cache locality, so that adjacent memory elements correspond to adjacent points in the world, resulting in fewer cache misses and better performance.
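Item 4 does not say exactly how the arrays were reordered. One common way to get the same effect, sketched here as an assumption rather than the demo's actual code, is to interleave each cell's height and color into one small struct, so a lookup at (x, y) touches one cache line instead of one line in each of two separate arrays. The 1024x1024 map size, the 16-bit RGB565 color map and the names map_cell, interleave_maps and map_lookup are all hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <assert.h>

#define MAP_SHIFT 10                      /* hypothetical 1024x1024 map */
#define MAP_SIZE  (1u << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1u)

/* Height and color of one world cell stored side by side (4 bytes). */
typedef struct {
    uint8_t  height;
    uint8_t  pad;     /* keeps color 16-bit aligned */
    uint16_t color;   /* assumed RGB565 */
} map_cell;

/* Build the interleaved map from two separate row-major arrays. */
map_cell *interleave_maps(const uint8_t *height_map, const uint16_t *pixel_map)
{
    assert(height_map && pixel_map);
    size_t n = (size_t)MAP_SIZE * MAP_SIZE;
    map_cell *cells = malloc(sizeof(map_cell) * n);
    if (!cells)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        cells[i].height = height_map[i];
        cells[i].pad = 0;
        cells[i].color = pixel_map[i];
    }
    return cells;
}

/* Wrap-around lookup: the map repeats in both directions. */
static inline map_cell map_lookup(const map_cell *cells, int x, int y)
{
    return cells[(((uint32_t)y & MAP_MASK) << MAP_SHIFT) | ((uint32_t)x & MAP_MASK)];
}
```

Whether this beats two separate arrays on the SH4 depends on how the renderer walks the map; it is one option to measure, not a guaranteed win.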
Now running at 21 fps on a real Dreamcast, almost 10 fps faster than before, and 38 fps on Dreamcast emulators.
Dreamcast Voxel engine
Forum rules
Please check the other forums in the Dreamcast section before posting here to see if your topic would fit better in those categories. Example: A new game/homebrew release would go in the New Releases/Homebrew/Emulation section: http://dreamcast-talk.com/forum/viewforum.php?f=5 or if you're having an issue with getting your Dreamcast to work or a game to boot it would go in the Support section: http://dreamcast-talk.com/forum/viewforum.php?f=42
- Ian Micheal
- Developer
- Posts: 6280
- Location: USA
Re: Dreamcast Voxel engine
Another update: now 35 fps on hardware.
Here are the changes:
- 1: Optimized and fixed PVR DMA rendering; it is now 32-bit aligned and working. Added an SH4 math header.
- 2: I have been optimizing this, fixed the 32-bit DMA rendering alignment, and added SH4 math functions.
- 3: Replaced the inner for loop with a while loop that terminates early when the projected height falls below zero or above the screen height.
- 4: Reordered the heightMap and pixelmap arrays to improve cache locality, so that adjacent memory elements correspond to adjacent points in the world, resulting in fewer cache misses and better performance.
- 5: Precompute values that are used in the loop, such as sh4FSCARadianSine, sh4FSCARadianCosine, plx, ply, prx and pry, instead of recomputing them in every iteration.
- 6: Unroll the loop over z by a factor of two: the loop body is written out twice, for z and z + 1, so z is incremented by 2 per iteration.
- 7: Compute the rx and ry variables outside the z loop and increment them inside it instead of recomputing them in every iteration. This eliminates one multiplication and one addition per iteration of the z loop.
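A minimal sketch of how items 3, 5 and 7 fit together in a voxel-space renderer, assuming the usual front-to-back algorithm with a per-column y-buffer. It is not the demo's code: the camera struct, the names render and draw_span, the 320x240 16-bit frame buffer and the 1024x1024 wrap-around map are hypothetical, and sine/cosine are passed in already computed (the demo uses its sh4FSCARadianSine/Cosine helpers for that). Item 6 (unrolling the z loop) is left out to keep it short.

```c
#include <stdint.h>
#include <assert.h>

#define SCREEN_W  320
#define SCREEN_H  240
#define MAP_SHIFT 10                       /* hypothetical 1024x1024 map */
#define MAP_MASK  ((1 << MAP_SHIFT) - 1)

typedef struct {
    float x, y;          /* camera position on the map */
    float height;        /* camera altitude */
    float horizon;       /* screen row of the horizon */
    float scale;         /* vertical projection scale */
    float sin_a, cos_a;  /* computed once per frame by the caller */
} camera;

/* Fill column x from row top down to (not including) row bottom.
   Clips to the screen and returns before touching memory when nothing
   is visible (item 3). */
static void draw_span(uint16_t *fb, int x, int top, int bottom, uint16_t color)
{
    assert(x >= 0 && x < SCREEN_W);
    if (top < 0) top = 0;
    if (bottom > SCREEN_H) bottom = SCREEN_H;
    if (top >= bottom) return;
    uint16_t *p = fb + top * SCREEN_W + x;
    while (top < bottom) {
        *p = color;
        p += SCREEN_W;
        top++;
    }
}

void render(uint16_t *fb, const uint8_t *heights, const uint16_t *colors,
            const camera *c, int max_z)
{
    assert(fb && heights && colors && c && max_z > 1);
    int ybuffer[SCREEN_W];              /* topmost row drawn per column */
    for (int i = 0; i < SCREEN_W; i++) ybuffer[i] = SCREEN_H;

    /* Frustum edge offsets per unit of distance, computed once per frame
       instead of once per z (item 5). */
    const float plx = -c->cos_a - c->sin_a, ply =  c->sin_a - c->cos_a;
    const float prx =  c->cos_a - c->sin_a, pry = -c->sin_a - c->cos_a;

    /* Left edge at z = 1, then advanced by (plx, ply) each step instead of
       being recomputed as z * plx (item 7). */
    float lx = c->x + plx, ly = c->y + ply;

    for (int z = 1; z < max_z; z++, lx += plx, ly += ply) {
        const float dx = (prx - plx) * z / SCREEN_W;
        const float dy = (pry - ply) * z / SCREEN_W;
        const float inv_z = c->scale / z;
        float mx = lx, my = ly;
        for (int i = 0; i < SCREEN_W; i++, mx += dx, my += dy) {
            unsigned idx = (((unsigned)(int)my & MAP_MASK) << MAP_SHIFT)
                         | ((unsigned)(int)mx & MAP_MASK);
            int top = (int)((c->height - heights[idx]) * inv_z + c->horizon);
            if (top < ybuffer[i]) {     /* only draw what rises above nearer terrain */
                draw_span(fb, i, top, ybuffer[i], colors[idx]);
                ybuffer[i] = top < 0 ? 0 : top;
            }
        }
    }
}
```

The y-buffer is what makes the early exit pay off: once nearer terrain has covered a column, farther samples in that column produce empty spans and draw_span returns immediately.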
- Attachments
-
voxelspaceengine2.rar
- (1.97 MiB) Downloaded 215 times
- Ian Micheal
- Developer
- Posts: 6280
- Location: USA
Re: Dreamcast Voxel engine
I have been trying to use your modified_sq_cpy_pvr32. First up, when you compile it there is this error:
dreamroq-player.c: In function 'modified_sq_cpy_pvr32':

TapamN wrote:
Are you talking about the demo from boob or SRR? I looked at the boob demo a very long time ago, so I don't remember much of it. But SRR does some weirdness when the camera goes below the water, and the waterfall stage has some weird glitches that make me think it might work like the Flipcode hardware-accelerated version.

cloofoofoo wrote:
It's polygon based. It's grid-like near the player and simpler from a distance.
I noticed you're using sq_cpy to update the frame buffer in VRAM. This is much better than trying to use video RAM directly, but it's still pretty slow. There are faster ways to update a VRAM buffer.

Nareez wrote:
Hi everyone, I'm really happy that some people are interested in the Voxel Engine on the Dreamcast. I didn't expect anyone to care about that.
Unfortunately the demo still doesn't work on real hardware. Ian Micheal showed me the error and I will work to correct it.
Many improvements still need to be made; as I learn how to get the best out of the Dreamcast hardware, I will improve the engine.
I'll show the progress on my twitter @NaReeZ
I got the following timings for copying a 640x480 16-bit frame buffer from main RAM to video RAM:
- memcpy: 20.80 ms
- KOS sq_cpy: 8.24 ms
- Modified sq_cpy: 3.89 ms
- DMA (includes optimized cache flush, waits for DMA to complete): 1.98 ms
- DMA (includes optimized cache flush, DMA works in background): 0.08 ms
For a 320x240 screen, these would take about 1/4 of the time. If you let the DMA work in the background, you wouldn't actually get all of the 1.90 ms saved by not waiting, since the DMA will slow down CPU memory access. You might only save something like 1 ms total in a real game.
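The arithmetic behind these figures, as a small sketch: a 640x480 16-bit buffer is 614,400 bytes and a 320x240 one is 153,600 bytes, exactly a quarter. Assuming copy time scales linearly with size (which ignores any fixed per-call overhead), the modified sq_cpy works out to roughly 158 MB/s, and memcpy's 20.80 ms would drop to about 5.2 ms at 320x240. The helper names are hypothetical.

```c
#include <stddef.h>
#include <assert.h>

/* Bytes in a frame buffer of the given size and pixel depth. */
static size_t frame_bytes(unsigned w, unsigned h, unsigned bytes_per_pixel)
{
    return (size_t)w * h * bytes_per_pixel;
}

/* Effective throughput in MB/s (10^6 bytes/s) for copying `bytes` in `ms`. */
static double throughput_mb_s(size_t bytes, double ms)
{
    assert(ms > 0.0);
    return (double)bytes / (ms * 1000.0);   /* bytes per microsecond */
}

/* Estimated time for a different buffer size, assuming linear scaling. */
static double scaled_ms(double ms, size_t from_bytes, size_t to_bytes)
{
    assert(from_bytes > 0);
    return ms * (double)to_bytes / (double)from_bytes;
}
```

At 60 fps a frame has about 16.7 ms in total, which is why the gap between memcpy and the faster copies matters so much.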
It looks like you tried to use DMA but had trouble, since KOS's DMA isn't designed to update the frame buffer. Video RAM DMA will only work if you enable the 3D driver. Also, KOS's cache flush function is partially broken: it does flush the cache, but it's much slower than it needs to be. Using KOS's dcache_flush_range would add an extra 1.84 ms to the DMA timings I listed.
I don't have complete, sharable code for using DMA to the frame buffer without the PVR driver right now, but you can speed up sq_cpy by using this function instead of regular sq_cpy:
Code:
void modified_sq_cpy_pvr32(void *dst, void *src, size_t len) {
    //Set PVR DMA register
    (volatile int *)0xA05F6888 = 1;
    //Convert read/write area pointer to DMA write only area pointer
    void *dmaareaptr = ((uintptr_t)dst & 0xffffff) | 0x11000000;
    sq_cpy(dmaareaptr, src, len);
}
dreamroq-player.c:72:31: error: lvalue required as left operand of assignment
dreamroq-player.c:75:23: warning: initialization makes pointer from integer without a cast [enabled by default]
So I corrected the compile error:
Code:
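#include <stdint.h>
#include <stddef.h>
#include <dc/sq.h>   //KOS: sq_cpy

void modified_sq_cpy_pvr32(void *dst, void *src, size_t len) {
    //Set PVR DMA register (fixed: dereference the pointer so the assignment has an lvalue)
    //Note: TapamN later corrected this address to 0xA05F6884
    *(volatile int *)0xA05F6888 = 1;
    //Convert read/write area pointer to DMA write only area pointer
    //(fixed: keep the address as an integer, then cast it back to a pointer)
    uintptr_t dmaareaptr = ((uintptr_t)dst & 0xffffff) | 0x11000000;
    sq_cpy((void *)dmaareaptr, src, len);
}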
Thanks, TapamN. You're one of my coding heroes; I follow and study all your code, trying to step up my understanding of the DC hardware.
-
- drunken sailor
- Posts: 160
Re: Dreamcast Voxel engine
It looks like another error in the example is that I put the wrong address there; it should be 0xA05F6884.

Ian Micheal wrote:
But when I use it on hardware now, I only get a black screen. I have talked to the Flycast and Redream developers, and I need a working example so they can make it work in their emulators. As I think you told me, this does not work in them; it never did when I tested. I can't remember how I got this working when you posted it in the gens4all thread. Can you explain it more for them, or give a simple working example that uses it, like a CDI? I can send it to them so Redream and Flycast can get it working. And when I fixed the compile error, did I break the function?
I worked on some speedups for the voxel demo myself but never posted them. Here's my modified demo, which uses either SQ or DMA to copy the framebuffer to video RAM (32-bit area); it can be used as a reference. I left out the romdisk directory, so you'll need to add your own.
In display_flip_framebuffer of display.c, there's an if block that you can use to switch between SQ or DMA. It defaults to SQ, but DMA is slightly faster. It uses a small library for framebuffer DMA that doesn't use the PVR driver at all. I'm not sure if I ever posted it.
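With the corrected register address from TapamN's reply, the SQ copy function could look like this untested sketch. It is an assumption, not TapamN's code: the pure address conversion is split out so it can be checked off the console, the hardware part is only compiled for KOS builds (which define _arch_dreamcast), and the 32-byte alignment check reflects the store queues writing 32 bytes at a time. The names PVR_DMA_MODE_REG, to_dma_write_area and sq_args_ok are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Register address as corrected by TapamN (the first snippet used 0xA05F6888). */
#define PVR_DMA_MODE_REG 0xA05F6884u

/* Same conversion as the snippet: keep the low 24 bits of the VRAM
   address and move it into the 0x11000000 write-only area. */
static inline uint32_t to_dma_write_area(uintptr_t vram_addr)
{
    return (uint32_t)((vram_addr & 0xffffffu) | 0x11000000u);
}

/* Assumption: store queues write whole 32-byte blocks, so the destination
   should be 32-byte aligned and the length a multiple of 32. */
static inline int sq_args_ok(uintptr_t dst, size_t len)
{
    return (dst & 31u) == 0 && (len & 31u) == 0;
}

#ifdef _arch_dreamcast
#include <dc/sq.h>

void modified_sq_cpy_pvr32(void *dst, void *src, size_t len)
{
    assert(sq_args_ok((uintptr_t)dst, len));
    /* Set the PVR register before writing through the 0x11000000 area. */
    *(volatile uint32_t *)PVR_DMA_MODE_REG = 1;
    sq_cpy((void *)(uintptr_t)to_dma_write_area((uintptr_t)dst), src, len);
}
#endif
```

For example, a VRAM address of 0xA5000000 maps to 0x11000000, and a 320x240 16-bit frame (153,600 bytes) is a whole number of 32-byte blocks.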
- Attachments
-
voxel.7z
- (65.23 KiB) Downloaded 195 times
- Ian Micheal
- Developer
- Posts: 6280
- Location: USA
Re: Dreamcast Voxel engine
Thank you, yeah, I could not work out why the address was like that. I thought it must be TapamN magic, lol.