GTA 3 Port by SKMP
-
- noob
- Posts: 2
- Dreamcast Games you play Online: Currently only Phantasy Star and Quake. I'm running on a BBA.
- lozz
- blackout!
- Posts: 133
- Dreamcast Games you play Online: Phantasy Star Online, Quake III, Starlancer, 4x4 Evo, Maximum Pool, Sega Swirl.
Re: GTA 3 Port by SKMP
Arnold.D wrote: ↑Thu Feb 27, 2025 6:07 pm
I can't get DreamSDK to download from GitHub. At the bottom of the page I only see the SDK R3 setup ISO. Is that what I need? What program do I open it with? My computer does not know. I purchased GTA 3 from Rockstar and want to make a CDI, but I found a CDI someone made online in the archives. That one is old and freezes on the car crusher stuff. (I've been playing the game for a few weeks. I HATE the radio situation, but GTA 2 was like that too, so I'm not crying...) I'm trying to make the latest and greatest CDI but am having issues even getting started.

If I recall correctly, you can unzip the ISO. Once you've done that, you'll find an installer in the folder, which you just run.
- Arnold.D
- fire
- Posts: 73
- Dreamcast Games you play Online: ALL OF THEM!!! DREAMPI BABY!
Re: GTA 3 Port by SKMP
OK, groovy, I'll do that this weekend. Thank you kindly! All the missions I'm on right now around the city pretty much involve car crushing. This weekend I may be able to progress a little more.
- mistamontiel
- Shark Patrol
- Posts: 2151
- Dreamcast Games you play Online: Errythan except Tetris o.0
- Location: Miami, FL, CUBA
- Contact:
-
- noob
- Posts: 4
- Dreamcast Games you play Online: none
Re: GTA 3 Port by SKMP
I asked Grok to analyse the last 10 posts by @falco_girgis on X related to GTA 3,
and to optimize with full SH-4 asm, SIMD, and mathematical approximations for speed and cycle count.
I don't think this code works, but I can't test it, so maybe it can inspire someone.
; Optimized Vertex Transformation for GTA III on Dreamcast
; Includes reciprocal table generation and SIMD matrix multiplication
; Author: Grok (inspired by @falco_girgis's work)
; Date: March 06, 2025
; Inputs:
; r1 = Pointer to vertex buffer (x1, y1, z1, x2, y2, z2, ...), 4 vertices at a time
; r2 = Pointer to 3x3 matrix (m00-m22, aligned for SIMD)
; r3 = Pointer to translation vector (tx, ty, tz)
; r4 = Pointer to output buffer for transformed vertices (x', y', z')
; r15 = End address of vertex buffer (for loop termination)
; Outputs:
; Transformed vertices written to r4
.section .data
.align 4
recip_table: ; Reciprocal lookup table (4096 entries, 16 KB, filled by init_recip_table)
.space 4096*4 ; Reserve 16 KB for 4096 single-precision floats (1/w values)
.section .text
.align 4 ; Align code for optimal instruction fetch
; Subroutine to initialize the reciprocal table (call once at startup)
init_recip_table:
MOV #0x3F000000, r5 ; r5 = 0.5 (start of range, IEEE 754)
MOV #4096, r6 ; r6 = number of entries
MOV #recip_table, r7; r7 = pointer to table
MOV #0x3A4, r8 ; r8 = step size (~0.000366, for 0.5 to 2.0 range)
.init_loop:
MOV.L r5, @r7 ; Store current value (w)
FCNVSD r5, fr0 ; Convert w to float in fr0
FSRRA fr0, fr1 ; fr1 = 1/sqrt(w) (approx), then refine
FMUL fr1, fr1, fr1 ; fr1 = 1/w (square the reciprocal sqrt)
MOV.L fr1, @(r7,4) ; Store 1/w in table
FADD r8, r5, r5 ; Increment w by step size
ADD #4, r7 ; Move to next table entry
DT r6 ; Decrement counter
BF .init_loop ; Loop until all 4096 entries are filled
RTS ; Return
NOP ; Delay slot
; Main vertex transformation routine
transform_vertices:
MOV #recip_table, r10 ; r10 = pointer to precomputed reciprocal table
.loop:
PREF @r1 ; Preload vertex data into SH-4's 16 KB cache
; Load 3x3 matrix into vector registers (128-bit, 4 floats each)
FMOV.DR @r2+, DR4 ; DR4 = {m00, m01, m02, m10} - First row + part of second
FMOV.DR @r2+, DR6 ; DR6 = {m11, m12, m20, m21} - Rest of second + third
FMOV.DR @r2, DR8 ; DR8 = {m22, pad, pad, pad} - Last element (padded)
; Load 4 vertices (x, y, z) into vector registers
FMOV.DR @r1+, DR0 ; DR0 = {x1, x2, x3, x4} - 4 x-coordinates
FMOV.DR @r1+, DR2 ; DR2 = {y1, y2, y3, y4} - 4 y-coordinates
FMOV.DR @r1+, DR10 ; DR10 = {z1, z2, z3, z4} - 4 z-coordinates
; Compute x' for 4 vertices in parallel
FMAC DR0, DR4, DR12 ; DR12 = x * m00 (SIMD multiply-accumulate)
FMAC DR2, DR5, DR12 ; DR12 += y * m01
FMAC DR10, DR6, DR12; DR12 += z * m02
FMOV.DR @r3, DR14 ; DR14 = {tx, tx, tx, tx} - Load translation x
FADD DR12, DR14, DR0; DR0 = x' = x*m00 + y*m01 + z*m02 + tx
; Compute y' for 4 vertices
FMAC DR0, DR4+1, DR12; DR12 = x * m10
FMAC DR2, DR6, DR12 ; DR12 += y * m11
FMAC DR10, DR6+1, DR12; DR12 += z * m12
FMOV.DR @(4,r3), DR14; DR14 = {ty, ty, ty, ty} - Load translation y
FADD DR12, DR14, DR1; DR1 = y' = x*m10 + y*m11 + z*m12 + ty
; Compute z' for 4 vertices
FMAC DR0, DR4+2, DR12; DR12 = x * m20
FMAC DR2, DR6+1, DR12; DR12 += y * m21
FMAC DR10, DR8, DR12 ; DR12 += z * m22
FMOV.DR @(8,r3), DR14; DR14 = {tz, tz, tz, tz} - Load translation z
FADD DR12, DR14, DR2; DR2 = z' = x*m20 + y*m21 + z*m22 + tz
; Fast perspective division (1/w) using lookup table
; Example for first vertex (repeat for others in DR0-DR2)
FMOV FR0, r5 ; r5 = w1 (assuming w from projection or vertex data)
SHLR r5, r6, #9 ; r6 = w >> 9 = index in table (10-bit precision)
AND #0x0080, r7, r5 ; r7 = fraction (7 bits for interpolation)
MOV.L @(r6,r10), r8 ; r8 = 1/w from table
ADD #4, r6 ; Next entry
MOV.L @(r6,r10), r9 ; r9 = 1/w[i+1]
FSUB r9, r8, r11 ; r11 = delta = 1/w[i+1] - 1/w
FMUL r11, r7, r11 ; r11 = delta * fraction
FADD r8, r11, FR0 ; FR0 = 1/w = 1/w + delta * fraction (~8 cycles)
; Apply 1/w to x', y', z' for first vertex (repeat for others)
FMUL FR0, FR0, FR0 ; x' = x' * (1/w)
FMUL FR1, FR0, FR1 ; y' = y' * (1/w)
FMUL FR2, FR0, FR2 ; z' = z' * (1/w)
; Store results without cache latency
MOVCA.L DR0, @r4+ ; Write 4 transformed x' coordinates
MOVCA.L DR1, @r4+ ; Write 4 transformed y' coordinates
MOVCA.L DR2, @r4+ ; Write 4 transformed z' coordinates
; Loop control
ADD #12, r1 ; Next group of 4 vertices (3 floats per vertex)
CMP/HS r1, r15 ; End of vertex buffer?
BT .loop ; Continue if more vertices
.end:
RTS ; Return
NOP ; Delay slot
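Since the listing above can't be tested as posted (as far as I know, SH-4's FMAC is a scalar single-precision instruction with FR0 as an implicit operand, and the 4-wide operations are FIPR and FTRV rather than FMAC on DR registers), here is a small reference model of the math it describes. It is only a sketch in Python that runs on any PC: a 3x3 matrix plus translation, followed by a 1/w lookup with linear interpolation over the 0.5 to 2.0 range mentioned in the comments. All names (TABLE_SIZE, build_recip_table, recip_lookup, transform_vertex, project) are made up for illustration and are not taken from the DCA3 source.

```python
# Reference model of the math in the listing above (not SH-4 code).
# Assumptions: w lies in [0.5, 2.0] and the table has 4096 buckets,
# as stated in the listing's comments. All names are hypothetical.

TABLE_SIZE = 4096
W_MIN, W_MAX = 0.5, 2.0
STEP = (W_MAX - W_MIN) / TABLE_SIZE  # ~0.000366, as in the listing


def build_recip_table():
    """Return 1/w samples; one extra entry closes the last bucket."""
    return [1.0 / (W_MIN + i * STEP) for i in range(TABLE_SIZE + 1)]


def recip_lookup(table, w):
    """Approximate 1/w by linear interpolation between two table entries."""
    if not (W_MIN <= w <= W_MAX):
        raise ValueError("w outside the table range")
    pos = (w - W_MIN) / STEP
    i = min(int(pos), TABLE_SIZE - 1)  # bucket index
    frac = pos - i                     # position inside the bucket, 0..1
    return table[i] + (table[i + 1] - table[i]) * frac


def transform_vertex(m, t, v):
    """Apply v' = M * v + t for a 3x3 matrix m and translation t."""
    x, y, z = v
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + t[r]
                 for r in range(3))


def project(table, v, w):
    """Multiply a transformed vertex by the approximated 1/w."""
    inv_w = recip_lookup(table, w)
    return tuple(c * inv_w for c in v)
```

Comparing recip_lookup against a plain 1.0 / w over the whole range is a quick way to see how much error the table introduces before anyone spends time hand-writing the assembly.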
-
- noob
- Posts: 2
Re: GTA 3 Port by SKMP
Hey y'all, I know being new and asking questions sucks, but I couldn't find this anywhere else.
So, what settings do I need to run this GTA 3 port on DreamShell/Retrodream? I tried both an ISO and a CDI and couldn't get past the loader saying "Executing", no matter what. Does it need any specific settings to run that way? I saw people running it via serial, but I can't do that since I don't have a serial cable (I got this console from a friend, pre-modded with DreamShell/Retrodream and an SD card).
The ISO images themselves are fine, since they worked on another friend's GDEMU and emulators can load and read them.
Thanks in advance.
-
- shadow
- Posts: 7
Re: GTA 3 Port by SKMP
Could anyone describe the exact process for updating to newer builds? I deleted the "dca3-game" folder, re-cloned the repo with "git clone --branch alpha https://gitlab.com/skmp/dca3-game.git", then downloaded the latest "build-dreamcast-liberty" from here, unpacked it to "dca3-game\dreamcast", renamed the newer ELF to "dca3.elf", and rebuilt the image. After booting the CDI on GDEMU, I got a crash at the main screen (right before the main menu options). Following the guide for the original build, the game loads and plays fine.
- mistamontiel
- Shark Patrol
- Posts: 2151
- Dreamcast Games you play Online: Errythan except Tetris o.0
- Location: Miami, FL, CUBA
- Contact:
Re: GTA 3 Port by SKMP
DCA is no longer in alpha! First step: just don't key in --branch alpha
ELFs mustn't be renamed
I also got VC going (two-disc, unpatched v1.0), and that takes a few more attempts: close DreamSDK, resume, and proceed again. Oh, and that one instead uses:
Code: Select all
make FOR_DISC=2 cdi-prebuilt
It's not for disc by any means; it just uses that command to cook it. A CDI a lil north of a gig will spit out