
Gens4All with Z80 Emulation
- TapamN
- drunken sailor
- Posts: 160
Re: Gens4All with Z80 Emulation
Ian Micheal wrote:
Do you have an example of this very simple i can learn off..
I have to have this dumbed down for me (basically, replace the top byte of the pointer with 0x11), set some of the DMA registers, then SQ there.
Or maybe where it is in this emulator.. I cant seem to find that part..

Here's a texture memory memset function. It only works on 32 byte blocks.
Code: Select all
void pvr_tex_lmemset32(pvr_ptr_t dst, int l, size_t len) {
    len /= 32;
    //Set PVR DMA registers
    volatile int *pvrdmacfg = (int*)0xA05F6888;
    pvrdmacfg[1] = pvrdmacfg[0] = 0;
    //Set QACR registers
    volatile int *qacr = (int*)0xFF000038;
    qacr[1] = qacr[0] = 0x11;
    //Get SQ area address for the texture location
    volatile int *sq = (int*)(0xe1000000 | ((uintptr_t)dst & 0xffffff));
    //Initialize store queues
    sq[0] = l; sq[1] = l; sq[2] = l; sq[3] = l;
    sq[4] = l; sq[5] = l; sq[6] = l; sq[7] = l;
    sq[8] = l; sq[9] = l; sq[10] = l; sq[11] = l;
    sq[12] = l; sq[13] = l; sq[14] = l; sq[15] = l;
    //Write to texture
    while(len--) {
        __asm__ __volatile__("pref @%0" : : "r" (sq) : "memory");
        sq += 8;
    }
}
Some comments on the code you posted:
There's no need to try to wait for SQs to finish. SQs always start immediately (if the CPU can't start the SQ right away, because some kind of memory transfer is already happening, the CPU will stall until it can start the SQ). And while the SQ is happening, it's not possible to access the memory bus (if you try to, the CPU will stall until the SQ transfer completes). So there's never a point you can access RAM or an external hardware register before the SQ write completes.
Also, there's no performance advantage to using OCBI. As long as you don't write to a cache line, it can be unloaded for free automatically when something else needs to be loaded in. OCBI is really only for reading from memory that has been changed by an outside device (e.g. DMA wrote something to RAM, and you want to read the new data and not old cached data).
Using SQs to write textures would really only be noticeably faster than DMA if you can do something while the SQ is being sent. Like if you're decoding video, using SQs while converting YUV to RGB makes sense (e.g. convert some YUVs from cache, write a bit to texture memory with SQs, repeat), but doing all YUV conversion at once then doing all texture writes with SQs at once is unlikely to be faster than DMA.
Actually, now that I think about it, SQs might be faster for small transfers. From what I've noticed, DMA seemed to have a bit of a startup delay. Using SQs might be faster for certain transfer sizes (like less than one or two kilobytes), even if you can't find something else to do. This needs to be benchmarked to know for certain, and for what sizes.
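The "convert a bit, write a bit, repeat" pattern described above can be sketched on a host machine as follows. This is only an illustration under stated assumptions: `sq_flush32` is a hypothetical stand-in (a plain `memcpy`) for the real hardware step, which on the SH4 would be eight 32-bit stores into the store-queue window followed by `pref`. The names `yuv_to_rgb565` and `yuv422_to_texture`, the planar 4:2:2 input layout, and the integer conversion constants are likewise assumptions for the example, not code from this emulator.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Clamp an intermediate colour value to 0..255. */
static int clamp8(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

/* One YUV sample -> RGB565, using an integer approximation (assumed constants). */
uint16_t yuv_to_rgb565(int y, int u, int v) {
    int r = clamp8(y + (359 * (v - 128)) / 256);
    int g = clamp8(y - (88 * (u - 128) + 183 * (v - 128)) / 256);
    int b = clamp8(y + (454 * (u - 128)) / 256);
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Hypothetical stand-in for the hardware write: on a Dreamcast this would be
   stores into the store queue plus a pref; here it is just a 32-byte copy. */
static void sq_flush32(uint16_t *dst, const uint16_t block[16]) {
    memcpy(dst, block, 32);
}

/* Convert 16 pixels (32 bytes of RGB565), send that block, repeat.
   Like the memset above, it only handles whole 32-byte blocks; any tail of
   fewer than 16 pixels is left untouched. Returns the pixels written. */
size_t yuv422_to_texture(uint16_t *tex, const uint8_t *y, const uint8_t *u,
                         const uint8_t *v, size_t pixels) {
    uint16_t block[16];
    size_t base = 0;
    for (; base + 16 <= pixels; base += 16) {
        for (size_t i = 0; i < 16; i++) {
            size_t p = base + i;
            /* In 4:2:2, two horizontally adjacent pixels share one U and one V. */
            block[i] = yuv_to_rgb565(y[p], u[p / 2], v[p / 2]);
        }
        sq_flush32(tex + base, block);
    }
    return base;
}
```

On real hardware the point of this shape is that the conversion of the next block can proceed from cache while the previous 32-byte burst is still going out; the host version only shows the loop structure, not the timing.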
- Ian Micheal
- Developer
- Posts: 6277
- Location: USA
- Contact:
Re: Gens4All with Z80 Emulation
Thank you.

You have opened my eyes to all of this, for sure. I also always thought you had to wait for the SQ to complete.
- Ian Micheal
- Developer
- Posts: 6277
- Location: USA
- Contact:
Re: Gens4All with Z80 Emulation
So here is yours first, with the normal one below it. I can't seem to get yours to work as a replacement for the old one.
Do I have to include your sh4lib or headers as well? I'm only getting a black screen.
Code: Select all
void pvr_tex_lmemset32(pvr_ptr_t dst, int l, size_t len) {
    len /= 32;
    //Set PVR DMA registers
    volatile int *pvrdmacfg = (int*)0xA05F6888;
    pvrdmacfg[1] = pvrdmacfg[0] = 0;
    //Set QACR registers
    volatile int *qacr = (int*)0xFF000038;
    qacr[1] = qacr[0] = 0x11;
    //Get SQ area address for the texture location
    volatile int *sq = (int*)(0xe1000000 | ((uintptr_t)dst & 0xffffff));
    //Initialize store queues
    sq[0] = l; sq[1] = l; sq[2] = l; sq[3] = l;
    sq[4] = l; sq[5] = l; sq[6] = l; sq[7] = l;
    sq[8] = l; sq[9] = l; sq[10] = l; sq[11] = l;
    sq[12] = l; sq[13] = l; sq[14] = l; sq[15] = l;
    //Write to texture
    while(len--) {
        __asm__ __volatile__("pref @%0" : : "r" (sq) : "memory");
        sq += 8;
    }
}
Code: Select all
/* copies n bytes from src to dest, dest must be 32-byte aligned */
void * sq_cpy2(void *dest, const void *src, int n) {
    unsigned int *d = (unsigned int *)(void *)
                      (0xe0000000 | (((unsigned long)dest) & 0x03ffffe0));
    const unsigned int *s = src;
    /* Set store queue memory area as desired */
    QACR0 = ((((unsigned int)dest) >> 26) << 2) & 0x1c;
    QACR1 = ((((unsigned int)dest) >> 26) << 2) & 0x1c;
    /* fill/write queues as many times necessary */
    n >>= 5;
    while(n--) {
        __asm__("pref @%0" : : "r"(s + 8)); /* prefetch 32 bytes for next loop */
        d[0] = *(s++);
        d[1] = *(s++);
        d[2] = *(s++);
        d[3] = *(s++);
        d[4] = *(s++);
        d[5] = *(s++);
        d[6] = *(s++);
        d[7] = *(s++);
        __asm__("pref @%0" : : "r"(d));
        d += 8;
    }
    /* Wait for both store queues to complete */
    d = (unsigned int *)0xe0000000;
    d[0] = d[8] = 0;
    return dest;
}
- Ian Micheal
- Developer
- Posts: 6277
- Location: USA
- Contact:
Re: Gens4All with Z80 Emulation
Yours is the first block below.
The second block is what we're using everywhere in KOS. Can you improve this and submit a fix for KOS?
With all you have done and found out with this emulator and your past sh4lib, could we not just include your work in KOS?
How does your QACR setup relate to the other snippets further below: the one in your sh4lib, the KOS standard one, and the one I'm using?
Using sq_cpy works, and so does DreamHAL, but I can't get your pvr_tex_lmemset32 to work; I just get a black screen. I know I'm doing something stupid here.
Sorry to hijack the thread. I just don't get to talk to someone who knows Dreamcast hardware as well as you, now that moop is gone.
Code: Select all
void pvr_tex_lmemset32(pvr_ptr_t dst, int l, size_t len) {
    len /= 32;
    //Set PVR DMA registers
    volatile int *pvrdmacfg = (int*)0xA05F6888;
    pvrdmacfg[1] = pvrdmacfg[0] = 0;
    //Set QACR registers
    volatile int *qacr = (int*)0xFF000038;
    qacr[1] = qacr[0] = 0x11;
    //Get SQ area address for the texture location
    volatile int *sq = (int*)(0xe1000000 | ((uintptr_t)dst & 0xffffff));
    //Initialize store queues
    sq[0] = l; sq[1] = l; sq[2] = l; sq[3] = l;
    sq[4] = l; sq[5] = l; sq[6] = l; sq[7] = l;
    sq[8] = l; sq[9] = l; sq[10] = l; sq[11] = l;
    sq[12] = l; sq[13] = l; sq[14] = l; sq[15] = l;
    //Write to texture
    while(len--) {
        __asm__ __volatile__("pref @%0" : : "r" (sq) : "memory");
        sq += 8;
    }
}
Code: Select all
/* copies n bytes from src to dest, dest must be 32-byte aligned */
void * sq_cpy(void *dest, const void *src, int n) {
    unsigned int *d = (unsigned int *)(void *)
                      (0xe0000000 | (((unsigned long)dest) & 0x03ffffe0));
    const unsigned int *s = src;
    /* Set store queue memory area as desired */
    QACR0 = ((((unsigned int)dest) >> 26) << 2) & 0x1c;
    QACR1 = ((((unsigned int)dest) >> 26) << 2) & 0x1c;
    /* fill/write queues as many times necessary */
    n >>= 5;
    while(n--) {
        __asm__("pref @%0" : : "r"(s + 8)); /* prefetch 32 bytes for next loop */
        d[0] = *(s++);
        d[1] = *(s++);
        d[2] = *(s++);
        d[3] = *(s++);
        d[4] = *(s++);
        d[5] = *(s++);
        d[6] = *(s++);
        d[7] = *(s++);
        __asm__("pref @%0" : : "r"(d));
        d += 8;
    }
    /* Wait for both store queues to complete */
    d = (unsigned int *)0xe0000000;
    d[0] = d[8] = 0;
    return dest;
}
Code: Select all
//Set QACR registers
volatile int *qacr = (int*)0xFF000038;
qacr[1] = qacr[0] = 0x11;
Code: Select all
/* Set store queue memory area as desired */
QACR0 = ((((unsigned int)sbuf->vramData)>>26)<<2)&0x1c;
QACR1 = ((((unsigned int)sbuf->vramData)>>26)<<2)&0x1c;
Code: Select all
#define NONCACHED(a) (typeof (&(a)[0]))(((unsigned int)(a)) | (1 << 29))
#define CACHED(a) (typeof (&(a)[0]))(((unsigned int)(a)) & ~(1 << 29))
#define OCI_BANK0(a) (typeof (&(a)[0]))(((unsigned int)(a)) & ~(1 << 25))
#define OCI_BANK1(a) (typeof (&(a)[0]))(((unsigned int)(a)) | (1 << 25))
Code: Select all
/** \brief Store Queue 0 access register */
#define QACR0 (*(volatile unsigned int *)(void *)0xff000038)
/** \brief Store Queue 1 access register */
#define QACR1 (*(volatile unsigned int *)(void *)0xff00003c)
Code: Select all
void StreamRender_DisplayFrame( StreamBuffer * sbuf )
{
    while(!sbuf->frames)
        thd_pass();
    //printf("RenderFrame: Buffer Contains %i frames\n", sbuf->frames );
    while(sbuf->locked)
        thd_pass();
    sbuf->locked = 1;
    StreamTexturePVR( sbuf );
    sbuf->locked = 0;
    pvr_wait_ready();
    pvr_scene_begin();
    pvr_list_begin( PVR_LIST_OP_POLY );
    sq_cpy( (void*)0x10000000, VERTEX_BUFFER, VERTEX_COUNT * 32 );
    pvr_list_finish();
    pvr_scene_finish();
}
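One way to see how the hard-coded `0x11` / `0xe1000000` in pvr_tex_lmemset32 relates to the computed QACR values in sq_cpy is to do the address arithmetic on a host machine. The sketch below is only that arithmetic, not hardware code. The function names are made up for the example, the pointer value `0xa4123460` used afterwards is a hypothetical texture pointer, and `sq_target` encodes my understanding of the SH4 store queues: external address bits 28..26 come from QACR bits 4..2, and bits 25..5 come from the store-queue address.

```c
#include <stdint.h>

/* QACR value as sq_cpy computes it: address bits 28..26 moved to bits 4..2. */
uint32_t qacr_for(uint32_t dest) {
    return ((dest >> 26) << 2) & 0x1cu;
}

/* Store-queue window address as sq_cpy computes it. */
uint32_t sq_window_for(uint32_t dest) {
    return 0xe0000000u | (dest & 0x03ffffe0u);
}

/* Store-queue window address as pvr_tex_lmemset32 computes it:
   keep the low 24 bits of the pointer and force bit 24 on. */
uint32_t lmemset_window_for(uint32_t dst) {
    return 0xe1000000u | (dst & 0x00ffffffu);
}

/* External address a burst lands on (assumed SH4 behaviour, see lead-in):
   bits 28..26 from QACR bits 4..2, bits 25..5 from the store-queue address. */
uint32_t sq_target(uint32_t sq_addr, uint32_t qacr) {
    return ((qacr & 0x1cu) << 24) | (sq_addr & 0x03ffffe0u);
}
```

With the hypothetical pointer `0xa4123460`, `sq_target(lmemset_window_for(p), 0x11)` gives `0x11123460`, which is the "replace the top byte of the pointer with 0x11" effect. Passing the same pointer through the sq_cpy formulas gives QACR `0x04` and a target of `0x04123460`, while passing `0x11123460` to them gives QACR `0x10` and the same `0xe1123460` window as the hard-coded version. So the two approaches use the same mechanism and differ only in which destination address they are handed.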
- KmusDC
- fire
- Posts: 82
- Dreamcast Games you play Online: Worms World Party, Phantasy Star Online v2, POD Speed Zone, Alien Front Online.
Re: Gens4All with Z80 Emulation
Hello bro, excellent contribution, good to see updates for this magnificent emulator. I would like to know how to convert it to an iso for dreamshell. Greetings
- Ian Micheal
- Developer
- Posts: 6277
- Location: USA
- Contact:
Re: Gens4All with Z80 Emulation
KmusDC wrote:
Hello bro, excellent contribution, good to see updates for this magnificent emulator. I would like to know how to convert it to an iso for dreamshell. Greetings

You would need to take the build I did (1st_read.bin) and unscramble it. It was released further in this thread, or linked here, below:

WedgeStratos wrote:
I am currently going through a TOSEC full-set to apply only the games that work with this emulator. A number of issues exist, particularly with raster graphics, so there are NO racing games that work, and it causes graphical issues in a number of games like Ecco or Comix Zone. With that said?
Gens4SSP: The Sega Smash Pack Stand-In. Download here.
This uses a rebuilt ELF provided by Ian Michael (thanks m8) and contains 22 games collected from the 5 separate Sega Smash Pack releases across PC, Dreamcast and GBA, plus stand-ins for the games that don't work.
This has only been tested via Demul and Redream in emulation, and via GDemu on retail hardware, so I cannot be held liable for excess coasters.
Again, I am testing the TOSEC for valid games. It will be a few days, but I have already completed everything from A-E in the games offered in the US NTSC catalog. A game has to be playable with no major graphical defects that hinder the experience, as well as not having sound issues.

Use the ISO-making tools as normal; that should work. Once you get the files from above, unscramble the 1st_read.bin. Normal stuff.
-
- shadow
- Posts: 6
- Dreamcast Games you play Online: Quake 3 Arena
Re: Gens4All with Z80 Emulation

I have a bug. My specs:
Dreamcast PAL with GDEMU clone
VGA2HDMI Cable Bitfunx
16:9 720P TV
A green bar appears in the bottom-left part of the screen, and the screen edges are flickering. Please fix it.
- Ian Micheal
- Developer
- Posts: 6277
- Location: USA
- Contact:
Re: Gens4All with Z80 Emulation
The debug profiling bars are supposed to be there; this is a test version. For the flicker on the left-hand edge in some games, you will have to wait for TapamN.
The bar at the bottom left is not a bug.