Gens4All with Z80 Emulation

kremiso
Rank 9
Posts: 966

Re: Gens4All with Z80 Emulation

Post#71 » Wed Oct 20, 2021 12:14 pm

Finally I can pump up the volume playing Genesis games :)

TapamN
letterbomb
Posts: 149

Re: Gens4All with Z80 Emulation

Post#72 » Sat Oct 23, 2021 6:29 pm

Ian Micheal wrote:Do you have an example of this very simple i can learn off..

I have to have this dumbed down for me (basically, replace the top byte of the pointer with 0x11), set some of the DMA registers, then SQ there.

Or maybe where it is in this emulator.. I cant seem to find that part..


Here's a texture memory memset function. It only works on 32-byte blocks.

Code: Select all

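#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uintptr_t */
#include <dc/pvr.h>   /* pvr_ptr_t (KallistiOS) */

/* Fill len bytes of texture memory at dst with the 32-bit value l.
   dst should be 32-byte aligned; len is rounded down to a multiple of 32. */
void pvr_tex_lmemset32(pvr_ptr_t dst, int l, size_t len) {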
   len /= 32;
   
   //Set PVR DMA registers
   volatile int *pvrdmacfg = (int*)0xA05F6888;
   pvrdmacfg[1] = pvrdmacfg[0] = 0;
   
   //Set QACR registers
   volatile int *qacr = (int*)0xFF000038;
   qacr[1] = qacr[0] = 0x11;
   
   //Get SQ area address for the texture location
   volatile int *sq = (int*)(0xe1000000 | ((uintptr_t)dst & 0xffffff));
   
   //Initialize store queues
   sq[0] = l; sq[1] = l; sq[2] = l; sq[3] = l;
   sq[4] = l; sq[5] = l; sq[6] = l; sq[7] = l;
   sq[8] = l; sq[9] = l; sq[10] = l; sq[11] = l;
   sq[12] = l; sq[13] = l; sq[14] = l; sq[15] = l;
   
   //Write to texture
   while(len--) {
      __asm__ __volatile__("pref @%0" : : "r" (sq) : "memory");
      sq += 8;
   }
}


Some comments on the code you posted:

There's no need to try to wait for SQs to finish. SQs always start immediately (if the CPU can't start the SQ right away, because some kind of memory transfer is already happening, the CPU will stall until it can start the SQ). And while the SQ is happening, it's not possible to access the memory bus (if you try to, the CPU will stall until the SQ transfer completes). So there's never a point you can access RAM or an external hardware register before the SQ write completes.

Also, there's no performance advantage to using OCBI. As long as you don't write to a cache line, it can be unloaded for free automatically when something else needs to be loaded in. OCBI is really only for reading from memory that has been changed by an outside device (e.g. DMA wrote something to RAM, and you want to read the new data and not old cached data).

Using SQs to write textures would really only be noticeably faster than DMA if you can do something while the SQ is being sent. Like if you're decoding video, using SQs while converting YUV to RGB makes sense (e.g. convert some YUVs from cache, write a bit to texture memory with SQs, repeat), but doing all YUV conversion at once then doing all texture writes with SQs at once is unlikely to be faster than DMA.
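
The "convert a little, write a little, repeat" pattern could be sketched roughly like this. This is only an illustration that runs on any host: the names yuv_to_rgb565, emit_block and yuv_stream_to_texture are made up for the example, the input is assumed to be planar 4:4:4 YUV, and emit_block is a plain memcpy standing in for the real store queue step (filling the SQ window and issuing pref).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_PIXELS 16  /* 16 RGB565 pixels = 32 bytes, the size of one store queue */

static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Convert one YUV sample to RGB565 using an integer BT.601-style approximation. */
uint16_t yuv_to_rgb565(uint8_t y, uint8_t u, uint8_t v)
{
    int d = (int)u - 128;
    int e = (int)v - 128;
    uint8_t r = clamp_u8(y + (359 * e) / 256);
    uint8_t g = clamp_u8(y - (88 * d + 183 * e) / 256);
    uint8_t b = clamp_u8(y + (454 * d) / 256);
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Hypothetical stand-in for the SQ write: on the Dreamcast this is where
   the 32 bytes would go out through a store queue instead of memcpy. */
void emit_block(uint16_t *dst, const uint16_t *block)
{
    memcpy(dst, block, BLOCK_PIXELS * sizeof(uint16_t));
}

/* Convert one 32-byte block, send it, then convert the next one.
   pixels is assumed to be a multiple of BLOCK_PIXELS. */
void yuv_stream_to_texture(uint16_t *dst, const uint8_t *y, const uint8_t *u,
                           const uint8_t *v, size_t pixels)
{
    uint16_t block[BLOCK_PIXELS];

    for (size_t i = 0; i < pixels; i += BLOCK_PIXELS) {
        for (size_t j = 0; j < BLOCK_PIXELS; j++)
            block[j] = yuv_to_rgb565(y[i + j], u[i + j], v[i + j]);
        emit_block(dst + i, block);
    }
}
```

The point of the structure is that the conversion of the next block overlaps with the previous block's transfer, instead of converting the whole frame first and writing it all afterwards.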

Actually, now that I think about it, SQs might be faster for small transfers. From what I've noticed, DMA seemed to have a bit of a startup delay. Using SQs might be faster for certain transfer sizes (like less than one or two kilobytes), even if you can't find something else to do. This needs to be benchmarked to know for certain, and for what sizes.
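
A benchmark for this could start from something like the sketch below. It is a generic timing harness, not Dreamcast code: time_copy and copy_memcpy are names invented for the example, clock() is used only because it is portable (a finer hardware timer would be needed for transfers this small), and copy_memcpy is a placeholder where the SQ and DMA routines would be plugged in.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Any routine with this shape can be timed: SQ copy, DMA copy, memcpy... */
typedef void (*copy_fn)(void *dst, const void *src, size_t len);

/* Average seconds per call when copying len bytes, measured over reps calls. */
double time_copy(copy_fn fn, void *dst, const void *src, size_t len, int reps)
{
    clock_t start = clock();

    for (int i = 0; i < reps; i++)
        fn(dst, src, len);

    return (double)(clock() - start) / CLOCKS_PER_SEC / reps;
}

/* Placeholder routine so the harness can be tried on any machine. */
void copy_memcpy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}
```

Running time_copy for each routine over a sweep of sizes (32 bytes, 64 bytes, and so on up to a few kilobytes) would show where, if anywhere, the SQ version overtakes DMA.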

Ian Micheal
Developer
Posts: 5994

Re: Gens4All with Z80 Emulation

Post#73 » Sat Oct 23, 2021 9:30 pm

Thank you :)

You have opened my eyes to all of this, for sure. I always thought you had to wait for the SQ to complete.

Ian Micheal
Developer
Posts: 5994

Re: Gens4All with Z80 Emulation

Post#74 » Sun Oct 24, 2021 10:31 am

So here is yours, with the normal one below it. I can't seem to get yours to work as a replacement for the old one.
Do I have to include your sh4lib or headers as well? I'm only getting a black screen.

Code: Select all

void pvr_tex_lmemset32(pvr_ptr_t dst, int l, size_t len) {
   len /= 32;
   
   //Set PVR DMA registers
   volatile int *pvrdmacfg = (int*)0xA05F6888;
   pvrdmacfg[1] = pvrdmacfg[0] = 0;
   
   //Set QACR registers
   volatile int *qacr = (int*)0xFF000038;
   qacr[1] = qacr[0] = 0x11;
   
   //Get SQ area address for the texture location
   volatile int *sq = (int*)(0xe1000000 | ((uintptr_t)dst & 0xffffff));
   
   //Initialize store queues
   sq[0] = l; sq[1] = l; sq[2] = l; sq[3] = l;
   sq[4] = l; sq[5] = l; sq[6] = l; sq[7] = l;
   sq[8] = l; sq[9] = l; sq[10] = l; sq[11] = l;
   sq[12] = l; sq[13] = l; sq[14] = l; sq[15] = l;
   
   //Write to texture
   while(len--) {
      __asm__ __volatile__("pref @%0" : : "r" (sq) : "memory");
      sq += 8;
   }
}


Normal

Code: Select all

/* copies n bytes from src to dest, dest must be 32-byte aligned */
void * sq_cpy2(void *dest, const void *src, int n) {
    unsigned int *d = (unsigned int *)(void *)
                      (0xe0000000 | (((unsigned long)dest) & 0x03ffffe0));
    const unsigned int *s = src;

    /* Set store queue memory area as desired */
    QACR0 = ((((unsigned int)dest) >> 26) << 2) & 0x1c;
    QACR1 = ((((unsigned int)dest) >> 26) << 2) & 0x1c;

    /* fill/write queues as many times necessary */
    n >>= 5;

    while(n--) {
        __asm__("pref @%0" : : "r"(s + 8));  /* prefetch 32 bytes for next loop */
        d[0] = *(s++);
        d[1] = *(s++);
        d[2] = *(s++);
        d[3] = *(s++);
        d[4] = *(s++);
        d[5] = *(s++);
        d[6] = *(s++);
        d[7] = *(s++);
        __asm__("pref @%0" : : "r"(d));
        d += 8;
    }

    /* Wait for both store queues to complete */
    d = (unsigned int *)0xe0000000;
    d[0] = d[8] = 0;

    return dest;
}

Ian Micheal
Developer
Posts: 5994

Re: Gens4All with Z80 Emulation

Post#75 » Sun Oct 24, 2021 12:49 pm

Yours

Code: Select all

void pvr_tex_lmemset32(pvr_ptr_t dst, int l, size_t len) {
   len /= 32;
   
   //Set PVR DMA registers
   volatile int *pvrdmacfg = (int*)0xA05F6888;
   pvrdmacfg[1] = pvrdmacfg[0] = 0;
   
   //Set QACR registers
   volatile int *qacr = (int*)0xFF000038;
   qacr[1] = qacr[0] = 0x11;
   
   //Get SQ area address for the texture location
   volatile int *sq = (int*)(0xe1000000 | ((uintptr_t)dst & 0xffffff));
   
   //Initialize store queues
   sq[0] = l; sq[1] = l; sq[2] = l; sq[3] = l;
   sq[4] = l; sq[5] = l; sq[6] = l; sq[7] = l;
   sq[8] = l; sq[9] = l; sq[10] = l; sq[11] = l;
   sq[12] = l; sq[13] = l; sq[14] = l; sq[15] = l;
   
   //Write to texture
   while(len--) {
      __asm__ __volatile__("pref @%0" : : "r" (sq) : "memory");
      sq += 8;
   }
}

This is what we're using in KOS everywhere. Can you improve this and submit a fix for KOS?

Code: Select all

/* copies n bytes from src to dest, dest must be 32-byte aligned */
void * sq_cpy(void *dest, const void *src, int n) {
    unsigned int *d = (unsigned int *)(void *)
                      (0xe0000000 | (((unsigned long)dest) & 0x03ffffe0));
    const unsigned int *s = src;

    /* Set store queue memory area as desired */
    QACR0 = ((((unsigned int)dest) >> 26) << 2) & 0x1c;
    QACR1 = ((((unsigned int)dest) >> 26) << 2) & 0x1c;

    /* fill/write queues as many times necessary */
    n >>= 5;

    while(n--) {
        __asm__("pref @%0" : : "r"(s + 8));  /* prefetch 32 bytes for next loop */
        d[0] = *(s++);
        d[1] = *(s++);
        d[2] = *(s++);
        d[3] = *(s++);
        d[4] = *(s++);
        d[5] = *(s++);
        d[6] = *(s++);
        d[7] = *(s++);
        __asm__("pref @%0" : : "r"(d));
        d += 8;
    }

    /* Wait for both store queues to complete */
    d = (unsigned int *)0xe0000000;
    d[0] = d[8] = 0;

    return dest;
}


With all you have done and found out with this emulator and your past sh4lib, could we not just include your work in KOS?

Code: Select all

   //Set QACR registers
   volatile int *qacr = (int*)0xFF000038;
   qacr[1] = qacr[0] = 0x11;


How does this relate to

Code: Select all

    /* Set store queue memory area as desired */
    QACR0 = ((((unsigned int)sbuf->vramData)>>26)<<2)&0x1c;
    QACR1 = ((((unsigned int)sbuf->vramData)>>26)<<2)&0x1c;
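
One way to compare the two is to run the numbers. The sketch below only does the address arithmetic on a host machine; sq_qacr_value and sq_window_addr are names made up for the example, and the expressions are the ones from sq_cpy. The assumption is that the QACR area field is bits 4..2, which is what the 0x1c mask in sq_cpy selects.

```c
#include <stdint.h>

/* Value sq_cpy writes to QACR0/QACR1: bits 28..26 of the destination
   address, placed in bits 4..2 of the register. */
uint32_t sq_qacr_value(uint32_t dest)
{
    return ((dest >> 26) << 2) & 0x1c;
}

/* Address inside the store queue window (0xe0000000 and up) that sq_cpy
   writes through: the low 26 bits of dest, rounded down to 32 bytes. */
uint32_t sq_window_addr(uint32_t dest)
{
    return 0xe0000000u | (dest & 0x03ffffe0u);
}
```

With these, both 0x10000000 (the destination in the sq_cpy call) and 0x11000000 (the "top byte replaced with 0x11" case) give a QACR value of 0x10. The hard-coded 0x11 has the same bits under the 0x1c mask, so it selects the same area; its extra bit 0 lies outside that field. Likewise, sq_window_addr(0x11000000) comes out as 0xe1000000, the same base the hard-coded version ORs into the pointer.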


In your sh4 lib

Code: Select all

#define NONCACHED(a) (typeof (&(a)[0]))(((unsigned int)(a)) |  (1 << 29))
#define CACHED(a)    (typeof (&(a)[0]))(((unsigned int)(a)) & ~(1 << 29))
#define OCI_BANK0(a) (typeof (&(a)[0]))(((unsigned int)(a)) & ~(1 << 25))
#define OCI_BANK1(a) (typeof (&(a)[0]))(((unsigned int)(a)) |  (1 << 25))


KOS standard

Code: Select all

/** \brief  Store Queue 0 access register */
#define QACR0 (*(volatile unsigned int *)(void *)0xff000038)

/** \brief  Store Queue 1 access register */
#define QACR1 (*(volatile unsigned int *)(void *)0xff00003c)


I'm using

Code: Select all

void StreamRender_DisplayFrame( StreamBuffer * sbuf )
{
    while(!sbuf->frames)
        thd_pass();
   
    //printf("RenderFrame: Buffer Contains %i frames\n", sbuf->frames );
   
    while(sbuf->locked)
        thd_pass();
    sbuf->locked = 1;
   
    StreamTexturePVR( sbuf );
   
    sbuf->locked = 0;
   
    pvr_wait_ready();
   
    pvr_scene_begin();
   
    pvr_list_begin( PVR_LIST_OP_POLY );

    sq_cpy( (void*)0x10000000, VERTEX_BUFFER, VERTEX_COUNT * 32 );
   
    pvr_list_finish();

    pvr_scene_finish();
}

Using sq_cpy it works, and so does DreamHAL, but I can't get your pvr_tex_lmemset32 to work; I just get a black screen. I know I'm doing something stupid here..

Sorry to hijack the thread. I just don't get to talk to someone who knows Dreamcast hardware as well as you, now that Moop is gone..

KmusDC
fire
Posts: 82

Re: Gens4All with Z80 Emulation

Post#76 » Sun Oct 24, 2021 2:47 pm

Hello bro, excellent contribution, good to see updates for this magnificent emulator. I would like to know how to convert it to an iso for dreamshell. Greetings

Ian Micheal
Developer
Posts: 5994

Re: Gens4All with Z80 Emulation

Post#77 » Sun Oct 24, 2021 3:03 pm

KmusDC wrote:Hello bro, excellent contribution, good to see updates for this magnificent emulator. I would like to know how to convert it to an iso for dreamshell. Greetings


You would need to take the 1st_read.bin from the build I did and unscramble it. See below:
WedgeStratos wrote:I am currently going through a TOSEC full-set to apply only the games that work with this emulator. A number of issues exist, particularly with raster graphics, so there are NO racing games that work, and it causes graphical issues in a number of games like Ecco or Comix Zone. With that said?

Gens4SSP: The Sega Smash Pack Stand-In. Download here.

This uses a rebuilt ELF provided by Ian Michael (thanks m8) and contains 22 games collected from the 5 separate Sega Smash Pack releases across PC, Dreamcast and GBA, plus stand-ins for the games that don't work.



This has only been tested via Demul and Redream in emulation, and via GDemu on retail hardware, so I cannot be held liable for excess coasters.

Again, I am testing the TOSEC for valid games. It will be a few days, but I have already completed everything from A-E in the games offered in the US NTSC catalog. A game has to be playable with no major graphical defects that hinder the experience, as well as not having sound issues.

Released further in this thread, or linked here


That should work as normal: once you get the files from above, unscramble the 1st_read.bin and use the ISO-making tools. Normal stuff.

SODIX
shadow
Posts: 6

Re: Gens4All with Z80 Emulation

Post#78 » Sun Oct 24, 2021 5:57 pm

Image

I have a bug. My specs:
Dreamcast PAL with GDEMU clone
Bitfunx VGA2HDMI cable
16:9 720p TV

A green bar appears in the bottom-left part of the screen, and the screen edges are flickering. Please fix it.

Ian Micheal
Developer
Posts: 5994

Re: Gens4All with Z80 Emulation

Post#79 » Sun Oct 24, 2021 6:08 pm

The debug profiling bars are supposed to be there; this is a test version. For the flicker on the left-hand edge in some games, you will have to wait for TapamN.

The bar at the bottom left is not a bug.

SODIX
shadow
Posts: 6

Re: Gens4All with Z80 Emulation

Post#80 » Sun Oct 24, 2021 6:11 pm

Thanks, waiting for a final version. This is bomb!
