From d7a025fd08ed0a1a3179a4c839e86407275ddeb2 Mon Sep 17 00:00:00 2001
From: slederer
Date: Mon, 13 Oct 2025 23:33:30 +0200
Subject: [PATCH 01/24] update documentation for October 2025 update

---
 README.md  | 21 ++++++++++++++++++++-
 doc/mem.md |  1 +
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index be77d37..758c36b 100644
--- a/README.md
+++ b/README.md
@@ -41,6 +41,25 @@ Other inspirations were, among others, in no particular order:
 - the Magic-1 by Bill Buzbee
 - the OPC by revaldinho
 
+## October 2025 Update
+This update introduces a data cache for the Tridora-CPU. It is similar to the instruction cache
+as it caches the 16 bytes coming from the DRAM memory controller. It is a write-back cache, i.e.
+when a word inside the cached area is written, it updates the cache instead of invalidating it.
+
+This is important because there are many idioms in the stack machine assembly language where you
+store a local variable and then read it again (e.g. updating a loop variable).
+
+Since for most programs, the user stack and parts of the heap are inside the DRAM area, the data cache
+has a more noticeable impact. In the benchmark program that was already used for the last update,
+the data cache results in a 50% improvement for the empty loop test. This is in comparison to the version
+without data cache but with the instruction cache, both running code out of DRAM.
+
+It is also noticeable for compile times: With the data cache, compiling and assembling the
+"hello,world" program takes 16 seconds instead of 20. With a little tweak of the SD-Card controller
+that slightly increased the data transfer rate, the build time goes down to 15 seconds.
+
+Also, an audio controller was added that allows interrupt-driven sample playback via an AMP2 PMOD.
+
 ## April 2025 Update
 The clock has been reduced to 77 MHz from 83 MHz. Apparently the design was at the limit and
 timing problems were cropping up seemingly at random. Reducing the clock speed made some
@@ -62,7 +81,7 @@ on the emulator image.
 - the [Hackaday project](https://hackaday.io/project/198324-tridora-cpu) (mostly copy-paste from this README)
 - the [YouTube channel](https://www.youtube.com/@tridoracpu/videos) with some demo videos
 - the [emulator](https://git.insignificance.de/slederer/-/packages/generic/tridoraemu/0.0.5/files/12) (source and windows binary)
-- the [FPGA bitstream](https://git.insignificance.de/slederer/-/packages/generic/tdr-bitstream/0.0.2/files/14) for the Arty-A7-35T board
+- the [FPGA bitstream](https://git.insignificance.de/slederer/-/packages/generic/tdr-bitstream/0.0.3/files/15) for the Arty-A7-35T board
 - an [SD-card image](https://git.insignificance.de/slederer/-/packages/generic/tdr-cardimage/0.0.4/files/13)
 
 Contact the author here: tridoracpu [at] insignificance.de
diff --git a/doc/mem.md b/doc/mem.md
index e24fbe2..f7dbc2b 100644
--- a/doc/mem.md
+++ b/doc/mem.md
@@ -34,3 +34,4 @@ Currently, only I/O slots 0-3 are being used.
 | 1 | $880 | SPI-SD |
 | 2 | $900 | VGA |
 | 3 | $980 | IRQC |
+| 4 | $A00 | TDRAUDIO |

From 0f72080c56072136be3301530256a2c0ba5bfc3e Mon Sep 17 00:00:00 2001
From: slederer
Date: Sun, 26 Oct 2025 00:27:34 +0200
Subject: [PATCH 02/24] tridoracpu: experimented with synthesis options again

- workaround for an apparent bug with LOAD address generation at offsets >= 3584
- updated bitstream URL
---
 README.md                 |  2 +-
 tridoracpu/tridoracpu.xpr | 24 ++++++++++++------------
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/README.md b/README.md
index 758c36b..fe930a7 100644
--- a/README.md
+++ b/README.md
@@ -81,7 +81,7 @@ on the emulator image.
- the [Hackaday project](https://hackaday.io/project/198324-tridora-cpu) (mostly copy-paste from this README) - the [YouTube channel](https://www.youtube.com/@tridoracpu/videos) with some demo videos - the [emulator](https://git.insignificance.de/slederer/-/packages/generic/tridoraemu/0.0.5/files/12) (source and windows binary) -- the [FPGA bitstream](https://git.insignificance.de/slederer/-/packages/generic/tdr-bitstream/0.0.3/files/15) for the Arty-A7-35T board +- the [FPGA bitstream](https://git.insignificance.de/slederer/-/packages/generic/tdr-bitstream/0.0.4/files/16) for the Arty-A7-35T board - an [SD-card image](https://git.insignificance.de/slederer/-/packages/generic/tdr-cardimage/0.0.4/files/13) Contact the author here: tridoracpu [at] insignificance.de diff --git a/tridoracpu/tridoracpu.xpr b/tridoracpu/tridoracpu.xpr index 3767063..30d168a 100644 --- a/tridoracpu/tridoracpu.xpr +++ b/tridoracpu/tridoracpu.xpr @@ -356,12 +356,15 @@ - + - - Vivado Synthesis Defaults + + Performs general area optimizations including changing the threshold for control set optimizations, forcing ternary adder implementation, applying lower thresholds for use of carry chain in comparators and also area optimized mux optimizations. - + + + + @@ -378,14 +381,14 @@ - + - - Similar to Performance_ExplorePostRoutePhysOpt, but enables logic optimization step (opt_design) with the ExploreWithRemap directive. + + Uses multiple algorithms for optimization, placement, and routing to get potentially better results. 
- + @@ -396,12 +399,9 @@ - - - - + From 87ec71bd6de7a6a649c10be06d16d09b3e264414 Mon Sep 17 00:00:00 2001 From: slederer Date: Wed, 5 Nov 2025 00:30:49 +0100 Subject: [PATCH 03/24] align _END label, add ALIGN directive to assembler - fixes failing memory allocator when _END label is not aligned --- pcomp/emit.pas | 4 +++- pcomp/sasm.pas | 3 +++ 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/pcomp/emit.pas b/pcomp/emit.pas index a201714..d440951 100644 --- a/pcomp/emit.pas +++ b/pcomp/emit.pas @@ -324,7 +324,9 @@ begin rewindStringList(usedUnits); while nextStringListItem(usedUnits, unitName) do emitInclude(unitName + UnitSuffix2); - + (* _END label needs to be word-aligned because + it is used as the start of the heap *) + emitIns('.ALIGN'); emitLabelRaw('_END'); end; diff --git a/pcomp/sasm.pas b/pcomp/sasm.pas index 1858f11..2af55c3 100644 --- a/pcomp/sasm.pas +++ b/pcomp/sasm.pas @@ -2056,6 +2056,9 @@ begin operandValue := 0; emitBlock(count, operandValue); end + else + if lastToken.tokenText6 = '.ALIGN' then + alignOutput(wordSize) else errorExit2('Unrecognized directive', lastToken.tokenText); end; From 8f4d0176683bef38852d5a2893e651a59666b0eb Mon Sep 17 00:00:00 2001 From: slederer Date: Sun, 30 Nov 2025 23:49:44 +0100 Subject: [PATCH 04/24] sasm: fix typo error; examples: add fire demo --- examples/fastfire.inc | 5 + examples/fastfire.s | 326 ++++++++++++++++++++++++++++++++++++++++++ examples/fire.pas | 76 ++++++++++ examples/fire2.pas | 84 +++++++++++ pcomp/sasm.pas | 2 +- 5 files changed, 492 insertions(+), 1 deletion(-) create mode 100644 examples/fastfire.inc create mode 100644 examples/fastfire.s create mode 100644 examples/fire.pas create mode 100644 examples/fire2.pas diff --git a/examples/fastfire.inc b/examples/fastfire.inc new file mode 100644 index 0000000..bf0dce6 --- /dev/null +++ b/examples/fastfire.inc @@ -0,0 +1,5 @@ +const FIREWIDTH = 319; FIREHEIGHT = 79; (* keep in sync with fastfire.s! 
*) +type FireBuf = array [0..FIREHEIGHT, 0..FIREWIDTH] of integer; + +procedure FastFireUpdate(var f:FireBuf); external; +procedure FastFireDraw(var f:FireBuf;screenx, screeny:integer); external; diff --git a/examples/fastfire.s b/examples/fastfire.s new file mode 100644 index 0000000..f0e10e4 --- /dev/null +++ b/examples/fastfire.s @@ -0,0 +1,326 @@ + ; width and height of the fire cell matrix + ; Be sure to sync this with fastfire.inc! + .EQU FIREWIDTH 319 + .EQU FIREHEIGHT 79 + + ; + ; The cell matrix actually has one column + ; and one row more than FIREWIDTH and + ; FIREHEIGHT to handle the negative + ; X offsets when calculating new + ; cell values. + ; Likewise, there is one more row. + ; So rows are processed from 0 to FIREHEIGHT - 2 + ; and columns from 1 to FIREWIDTH - 1. + + ; cells considered for calculating new + ; value for cell O (reference cells): + ; .....O...... + ; ....123..... + ; .....4...... + +; args: pointer to fire cell buffer + .EQU FF_ROW_COUNT 0 + .EQU FF_COL_COUNT 4 + .EQU FF_ROW_OFFS 8 + .EQU FF_OFFS1 12 + .EQU FF_OFFS2 16 + .EQU FF_OFFS3 20 + .EQU FF_OFFS4 24 + .EQU FF_CELL_PTR 28 + .EQU FF_FS 32 +FASTFIREUPDATE: + FPADJ -FF_FS + STORE FF_CELL_PTR + LOADC FIREHEIGHT-1 + STORE FF_ROW_COUNT + + ; calculate offsets for reference cells + LOADC FIREWIDTH+1 + SHL 2 + DUP + STORE FF_ROW_OFFS ; offset to next row: WIDTH*4 + DEC 4 ; offset to cell 1: row offset - 4 + DUP + STORE FF_OFFS1 + INC 4 + DUP + STORE FF_OFFS2 ; offset to cell 2: + 4 + INC 4 + STORE FF_OFFS3 ; offset to cell 3: + 4 + LOAD FF_ROW_OFFS + SHL 1 ; offset to cell 4: row offset * 2 + STORE FF_OFFS4 + + ; start at column 1 + LOAD FF_CELL_PTR + INC 4 + STORE FF_CELL_PTR +FF_ROW: + LOADC FIREWIDTH-1 + STORE FF_COL_COUNT + +FF_COL: + LOAD FF_CELL_PTR + LOAD FF_OFFS1 + ADD + LOADI + + LOAD FF_CELL_PTR + LOAD FF_OFFS2 + ADD + LOADI + + LOAD FF_CELL_PTR + LOAD FF_OFFS3 + ADD + LOADI + + LOAD FF_CELL_PTR + LOAD FF_OFFS4 + ADD + LOADI + + ADD + ADD + ADD + + SHR + SHR + + ; if new
cell value > 0, subtract 1 to cool down + DUP + CBRANCH.Z FF_SKIP + DEC 1 +FF_SKIP: + LOAD FF_CELL_PTR ; load cell ptr + SWAP ; swap with new value + STOREI 4 ; store with postincrement + STORE FF_CELL_PTR ; save new ptr value + + LOAD FF_COL_COUNT ; decrement column count + DEC 1 + DUP + STORE FF_COL_COUNT + CBRANCH.NZ FF_COL ; loop if col count <> 0 + + ; at the end of a row, go to next row + ; by adding 8 to the cell pointer, + ; skipping the first cell of the next row + LOAD FF_CELL_PTR + INC 8 + STORE FF_CELL_PTR + + LOAD FF_ROW_COUNT ; decrement row count + DEC 1 + DUP + STORE FF_ROW_COUNT + CBRANCH.NZ FF_ROW ; loop if row count <> 0 + +FF_EXIT: + FPADJ FF_FS + RET + +; framebuffer controller registers + .EQU FB_RA $900 + .EQU FB_WA $901 + .EQU FB_IO $902 + .EQU FB_PS $903 + .EQU FB_PD $904 + .EQU FB_CTL $905 + .EQU WORDS_PER_LINE 80 + +; fire width in vmem words (strict left-to-right evaluation) + .EQU FFD_ROW_WORDS 1 + FIREWIDTH / 8 + +; draw all fire cells +; args: pointer to fire cell buffer, screen x, screen y + .EQU FFD_CELL_PTR 0 + .EQU FFD_X 4 + .EQU FFD_Y 8 + .EQU FFD_ROW_COUNT 12 + .EQU FFD_ROW_WORDCOUNT 16 + .EQU FFD_VMEM_PTR 20 + .EQU FFD_FS 24 +FASTFIREDRAW: + FPADJ -FFD_FS + STORE FFD_Y + STORE FFD_X + STORE FFD_CELL_PTR + + ; calculate video memory addr + ; addr = y * 80 + X / 8 + LOAD FFD_Y + SHL 2 ; y * 16 + SHL 2 + DUP + SHL 2 ; + y * 64 + ADD ; = y * 80 + + LOAD FFD_X + SHR + SHR + SHR + ADD ; + x / 8 + + DUP + STORE FFD_VMEM_PTR + LOADC FB_WA ; set vmem write address + SWAP + STOREI + DROP + + LOADC FIREHEIGHT + 1 + STORE FFD_ROW_COUNT +FFD_ROW: + LOADC FFD_ROW_WORDS + STORE FFD_ROW_WORDCOUNT + + LOADC FB_WA ; set vmem write address + LOAD FFD_VMEM_PTR + STOREI + DROP + +FFD_WORD: + LOAD FFD_CELL_PTR ; load cell ptr + LOADC 0 ; vmem word, start with 0 + + ; leftmost pixel (0) + OVER ; [ cptr, vmemw, cptr ] + LOADI ; load cell value [ cptr, vmemw, cellval ] + SHR ; scale it down (from 7 bits to 4) + SHR + SHR ; [ cptr, vmemw, cellval shr 3 
] + OR ; [ cptr, vmemw ] + SWAP ; [ vmemw, cptr ] + INC 4 ; increment cell ptr on stack [ vmemw, cptr + 4 ] + SWAP ; [ cptr + 4, vmemw ] + + SHL 2 ; move bits to left for next pixel + SHL 2 + + ; pixel 1 + OVER + LOADI ; load cell value + SHR ; scale it down (from 7 bits to 4) + SHR + SHR + OR + SWAP + INC 4 ; increment cell ptr on stack + SWAP + + SHL 2 ; move bits to left for next pixel + SHL 2 + + ; pixel 2 + OVER + LOADI ; load cell value + SHR ; scale it down (from 7 bits to 4) + SHR + SHR + OR + SWAP + INC 4 ; increment cell ptr on stack + SWAP + + SHL 2 ; move bits to left for next pixel + SHL 2 + + ; pixel 3 + OVER + LOADI ; load cell value + SHR ; scale it down (from 7 bits to 4) + SHR + SHR + OR + SWAP + INC 4 ; increment cell ptr on stack + SWAP + + SHL 2 ; move bits to left for next pixel + SHL 2 + + ; pixel 4 + OVER + LOADI ; load cell value + SHR ; scale it down (from 7 bits to 4) + SHR + SHR + OR + SWAP + INC 4 ; increment cell ptr on stack + SWAP + + SHL 2 ; move bits to left for next pixel + SHL 2 + + ; pixel 5 + OVER + LOADI ; load cell value + SHR ; scale it down (from 7 bits to 4) + SHR + SHR + OR + SWAP + INC 4 ; increment cell ptr on stack + SWAP + + SHL 2 ; move bits to left for next pixel + SHL 2 + + ; pixel 6 + OVER + LOADI ; load cell value + SHR ; scale it down (from 7 bits to 4) + SHR + SHR + OR + SWAP + INC 4 ; increment cell ptr on stack + SWAP + + SHL 2 ; move bits to left for next pixel + SHL 2 + + ; pixel 7 + OVER + LOADI ; load cell value + SHR ; scale it down (from 7 bits to 4) + SHR + SHR + OR + SWAP + INC 4 ; increment cell ptr on stack + SWAP + + ; store word to vmem + ; vmem write addr will autoincrement + LOADC FB_IO + SWAP + STOREI + DROP + + STORE FFD_CELL_PTR + + ; prepare for next word + LOAD FFD_ROW_WORDCOUNT + DEC 1 + DUP + STORE FFD_ROW_WORDCOUNT + CBRANCH.NZ FFD_WORD + + ; prepare for next row + LOAD FFD_VMEM_PTR + LOADC WORDS_PER_LINE + ADD + STORE FFD_VMEM_PTR + + LOAD FFD_ROW_COUNT + DEC 1 + DUP + STORE 
FFD_ROW_COUNT + CBRANCH.NZ FFD_ROW +FFD_EXIT: + FPADJ FFD_FS + RET diff --git a/examples/fire.pas b/examples/fire.pas new file mode 100644 index 0000000..22f0217 --- /dev/null +++ b/examples/fire.pas @@ -0,0 +1,76 @@ +{$H1} +{$S2} +program fire; +const MAXX = 30; + MAXY = 50; +var firebuf: array [0..MAXY, 0..MAXX] of integer; + + firepalette: array [0..15] of integer = + ( $FFA, $FF8, $FF4, $FF0, $FE0, $FD0, $FA0, $F90, + $F00, $E00, $D00, $A00, $800, $600, $300, $000); + x,y:integer; + +procedure createPalette; +var i:integer; +begin + for i := 15 downto 0 do + setpalette(15 - i, firepalette[i]); +end; + +procedure fireItUp; +var x,y:integer; +begin + y := MAXY - 1; + for x := 1 to MAXX - 1 do + firebuf[y, x] := random and 127; +end; + +procedure updateFire; +var i,x,y:integer; +begin + for y := 0 to MAXY - 2 do + for x := 1 to MAXX - 1 do + begin + i := + ((firebuf[y + 1, x - 1] + + firebuf[y + 1, x] + + firebuf[y + 1, x + 1] + + firebuf[y + 2, x]) + ) shr 2; + if i > 0 then + i := i - 1; + firebuf[y, x] := i; + end; +end; + +procedure drawFire; +var x, y, col:integer; +begin + for y := 0 to MAXY - 1 do + begin + x := 0; + for col in firebuf[y] do + begin + putpixel(300 + x, 150 + y, col shr 3); + x := x + 1; + end; + end; +end; + +begin + randomize; + initgraphics; + createPalette; + while not conavail do + begin + fireItUp; + updateFire; + drawFire; + end; + + for y := 0 to MAXY do + begin + x := firebuf[y, 10]; + drawline(0, y, x, y, 1); + end; +end. 
diff --git a/examples/fire2.pas b/examples/fire2.pas new file mode 100644 index 0000000..72fb254 --- /dev/null +++ b/examples/fire2.pas @@ -0,0 +1,84 @@ +{$H1} +{$S1} +program fire2; +uses fastfire; + +const MAXX = FIREWIDTH; + MAXY = FIREHEIGHT; + +var firecells: FireBuf; + + firepalette: array [0..15] of integer = + { ( $FFA, $FF8, $FF4, $FF0, $FE0, $FD0, $FA0, $F90, + $F00, $E00, $D00, $A00, $800, $600, $300, $000); } + ( $FFA, $FFA, $FFA, $FFA, $FF0, $FF0, $FF0, $FF0, + $FF0, $FD0, $FA0, $C00, $A00, $700, $400, $000); + x,y:integer; + +procedure createPalette; +var i:integer; +begin + for i := 15 downto 0 do + setpalette(15 - i, firepalette[i]); +end; + +procedure fireItUp; +var x,y:integer; +begin + y := MAXY - 1; + for x := 1 to MAXX - 1 do + firecells[y, x] := random and 127; +end; + + +procedure updateFire; +var i,x,y:integer; +begin + for y := 0 to MAXY - 2 do + for x := 1 to MAXX - 1 do + begin + i := + ((firecells[y + 1, x - 1] + + firecells[y + 1, x] + + firecells[y + 1, x + 1] + + firecells[y + 2, x]) + ) shr 2; + if i > 0 then + i := i - 1; + firecells[y, x] := i; + end; +end; + +procedure drawFire; +var x, y, col:integer; +begin + for y := 0 to MAXY - 1 do + begin + x := 0; + for col in firecells[y] do + begin + putpixel(100 + x, 150 + y, col shr 3); + x := x + 1; + end; + end; +end; + +begin + randomize; + initgraphics; + createPalette; + while not conavail do + begin + fireItUp; + FastFireUpdate(firecells); + { updateFire; } + FastFireDraw(firecells, 160, 100); + { drawFire; } + end; + + for y := 0 to MAXY do + begin + x := firecells[y, 10]; + drawline(0, y, x, y, 1); + end; +end. 
diff --git a/pcomp/sasm.pas b/pcomp/sasm.pas index 2af55c3..d032748 100644 --- a/pcomp/sasm.pas +++ b/pcomp/sasm.pas @@ -2057,7 +2057,7 @@ begin emitBlock(count, operandValue); end else - if lastToken.tokenText6 = '.ALIGN' then + if lastToken.tokenText = '.ALIGN' then alignOutput(wordSize) else errorExit2('Unrecognized directive', lastToken.tokenText); From 0016d4ea25d95614417f1334c58fd9571052a809 Mon Sep 17 00:00:00 2001 From: slederer Date: Fri, 5 Dec 2025 00:58:15 +0100 Subject: [PATCH 05/24] utils/serload: add interactive mode xfer: reset block count on transfer start --- progs/xfer.pas | 1 + utils/serload.py | 151 ++++++++++++++++++++++++++++++++++------------- 2 files changed, 110 insertions(+), 42 deletions(-) diff --git a/progs/xfer.pas b/progs/xfer.pas index 13a7cc2..0d871d2 100644 --- a/progs/xfer.pas +++ b/progs/xfer.pas @@ -226,6 +226,7 @@ begin if not invalid then begin open(xferFile, filename, ModeOverwrite); + blockNo := 0; done := false; repeat serReadBlock(ok); diff --git a/utils/serload.py b/utils/serload.py index 6ccc4a6..e69837f 100644 --- a/utils/serload.py +++ b/utils/serload.py @@ -16,6 +16,7 @@ # limitations under the License. 
import sys +import os import serial import time import random @@ -41,30 +42,6 @@ def get_default_device(): return '/dev/ttyUSB1' -def serwrite_slow(databytes, ser): - total = len(data) - count = 1 - for d in data: - sys.stdout.write("writing {0:02x} {1:04d}/{2:04d}\r".format(ord(d), count, total)) - ser.write(bytes(d,"utf8")) - count += 1 - time.sleep(0.020) - print() - - -def serwrite(datafile, ser): - with open(datafile) as f: - data = f.read() - total = len(data) - count = 1 - for d in data: - sys.stdout.write("writing {0:02x} {1:04d}/{2:04d}\r".format(ord(d), count, total)) - ser.write(bytes(d,"utf8")) - count += 1 - time.sleep(0.020) - print() - - def checksum(databytes): i = 0 cksum = 0 @@ -85,10 +62,26 @@ def sendchar(char, ser): ser.write(char.to_bytes(1, 'big')) -def sendcommand(ser, cmd=b'L'): +def sendcommand(ser, cmd=b'L', verbose=False): + verbose = True ser.write(cmd) resp = ser.read_until() - print(cmd,"sent, response:", str(resp)) + if verbose: + print(cmd,"sent, response:", str(resp)) + return resp + + +# send command and wait for echo +def commandwait(ser, cmd): + resp = sendcommand(ser, cmd, verbose=False) + if len(resp) == 0: + print("timeout sending '{}' command".format(cmd)) + return None + + if resp != bytearray(cmd + b"\r\n"): + print("invalid response to '{}' command".format(cmd)) + return None + return resp @@ -153,6 +146,8 @@ def serload_bin(datafile, ser): data += bytearray(pad) + print("{} total blocks".format((len(data) + blocksize - 1) // blocksize)) + if not send_size_header(ser, filesize): print("Error sending size header.") return @@ -279,18 +274,8 @@ def serdownload(fname, ser): def mput(filenames, ser): for f in filenames: - f_encoded = f.encode('utf8') - print("Setting filename", f) - resp = sendcommand(ser, b'S') - if len(resp) == 0: - print("timeout sending 'S' command") - return - if resp != b'S\r\n' and resp != b'> S\r\n': - print("unrecognized response to 'S' command, aborting") - return - resp = sendcommand(ser, f_encoded 
+ b'\r') - if not f_encoded in resp: - print("unrecognized response to filename, aborting") + resp = set_filename(f, ser) + if resp is None: return serload_bin(f, ser) @@ -299,12 +284,92 @@ def mput(filenames, ser): time.sleep(2) +def set_filename(f, ser): + f_encoded = f.encode('utf8') + print("Setting filename", f) + resp = commandwait(ser, b'S') + if resp is None: + return None + resp = sendcommand(ser, f_encoded + b'\r') + if not f_encoded in resp: + print("unrecognized response to filename, aborting") + return None + return resp + + +def getnamedfile(filename, ser): + resp = set_filename(filename, ser) + if resp is None: + return None + serdownload(filename, ser) + + +def putnamedfile(filename, ser): + resp = set_filename(filename, ser) + if resp is None: + return None + serload_bin(filename, ser) + print("Remote status:") + showdata(ser) + + +def showdata(ser): + + promptseen = False + + while not promptseen: + c = ser.read(1) + if c == b'>': + promptseen = True + else: + print(c.decode('utf8'), end='') + rest = ser.read(1) + + +def localdir(): + result = os.walk(".") + for dirpath, dirnames, filenames in os.walk("."): + for f in filenames: + print(f) + break + + +def interactive(ser): + done = False + while not done: + args = input("> ").strip().split() + if len(args) > 0: + cmd = args[0] + args.pop(0) + if cmd == 'dir': + if commandwait(ser, b'Y') is None: + return + showdata(ser) + elif cmd == 'get': + if len(args) > 1: + print("exactly one argument required (filename)") + else: + getnamedfile(args[0], ser) + elif cmd == 'put': + if len(args) > 1: + print("exactly one argument required (filename)") + else: + putnamedfile(args[0], ser) + elif cmd == 'ldir': + if len(args) > 0: + print("superfluous argument") + else: + localdir() + else: + print("Unknown command. 
Valid commands are: dir get ldir put") + + if __name__ == "__main__": argparser = argparse.ArgumentParser( description='transfer files from/to the Tridora-CPU') argparser.add_argument('-d', '--device', help='serial device', default=get_default_device()) - argparser.add_argument('command', choices=['get', 'put', 'mput']) - argparser.add_argument('filename', nargs='+') + argparser.add_argument('command', choices=['get', 'put', 'mput', 'interactive']) + argparser.add_argument('filename', nargs='*') args = argparser.parse_args() cmd = args.command @@ -319,8 +384,10 @@ if __name__ == "__main__": serload_bin(filenames[0], ser) elif cmd == 'mput': mput(filenames, ser) + elif cmd == 'interactive': + interactive(ser) else: print("should not get here") - if cmd is not None: - ser.close() + #if cmd is not None: + # ser.close() From d2f3b09e72e1990dc38c5c643b9cf7446c0f70b7 Mon Sep 17 00:00:00 2001 From: slederer Date: Mon, 15 Dec 2025 00:53:36 +0100 Subject: [PATCH 06/24] tridoracpu: cleaned up top a bit, removed some warnings --- .../tridoracpu.srcs/Arty-A7-35-Master.xdc | 6 ++-- tridoracpu/tridoracpu.srcs/stackcpu.v | 23 +++++-------- tridoracpu/tridoracpu.srcs/tdraudio.v | 22 +++++++++---- tridoracpu/tridoracpu.srcs/top.v | 21 ++++++------ tridoracpu/tridoracpu.xpr | 33 +++++++------------ 5 files changed, 47 insertions(+), 58 deletions(-) diff --git a/tridoracpu/tridoracpu.srcs/Arty-A7-35-Master.xdc b/tridoracpu/tridoracpu.srcs/Arty-A7-35-Master.xdc index 2a33ae0..d2c3160 100644 --- a/tridoracpu/tridoracpu.srcs/Arty-A7-35-Master.xdc +++ b/tridoracpu/tridoracpu.srcs/Arty-A7-35-Master.xdc @@ -8,8 +8,8 @@ set_property -dict {PACKAGE_PIN E3 IOSTANDARD LVCMOS33} [get_ports clk] create_clock -period 10.000 -name sys_clk_pin -waveform {0.000 5.000} -add [get_ports clk] ## Switches -set_property -dict {PACKAGE_PIN A8 IOSTANDARD LVCMOS33} [get_ports sw0] -set_property -dict {PACKAGE_PIN C11 IOSTANDARD LVCMOS33} [get_ports sw1] +#set_property -dict {PACKAGE_PIN A8 IOSTANDARD 
LVCMOS33} [get_ports sw0] +#set_property -dict {PACKAGE_PIN C11 IOSTANDARD LVCMOS33} [get_ports sw1] #set_property -dict { PACKAGE_PIN C10 IOSTANDARD LVCMOS33 } [get_ports { sw[2] }]; #IO_L13N_T2_MRCC_16 Sch=sw[2] #set_property -dict { PACKAGE_PIN A10 IOSTANDARD LVCMOS33 } [get_ports { sw[3] }]; #IO_L14P_T2_SRCC_16 Sch=sw[3] @@ -34,7 +34,7 @@ set_property -dict {PACKAGE_PIN T9 IOSTANDARD LVCMOS33} [get_ports led2] set_property -dict {PACKAGE_PIN T10 IOSTANDARD LVCMOS33} [get_ports led3] ## Buttons -set_property -dict {PACKAGE_PIN D9 IOSTANDARD LVCMOS33} [get_ports btn0] +#set_property -dict {PACKAGE_PIN D9 IOSTANDARD LVCMOS33} [get_ports btn0] #set_property -dict { PACKAGE_PIN C9 IOSTANDARD LVCMOS33 } [get_ports { btn1 }]; #IO_L11P_T1_SRCC_16 Sch=btn[1] #set_property -dict { PACKAGE_PIN B9 IOSTANDARD LVCMOS33 } [get_ports { btn[2] }]; #IO_L11N_T1_SRCC_16 Sch=btn[2] #set_property -dict { PACKAGE_PIN B8 IOSTANDARD LVCMOS33 } [get_ports { btn[3] }]; #IO_L12P_T1_MRCC_16 Sch=btn[3] diff --git a/tridoracpu/tridoracpu.srcs/stackcpu.v b/tridoracpu/tridoracpu.srcs/stackcpu.v index 33b58ec..1d929f7 100644 --- a/tridoracpu/tridoracpu.srcs/stackcpu.v +++ b/tridoracpu/tridoracpu.srcs/stackcpu.v @@ -16,11 +16,11 @@ module stackcpu #(parameter ADDR_WIDTH = 32, WIDTH = 32, output wire write_enable, input wire mem_wait, - output wire led1, - output wire led2, - output wire led3 + output wire debug1, + output wire debug2, + output wire debug3 ); - + localparam EVAL_STACK_INDEX_WIDTH = 6; wire reset = !rst; @@ -90,7 +90,6 @@ module stackcpu #(parameter ADDR_WIDTH = 32, WIDTH = 32, wire mem_write; wire x_is_zero; - // wire [WIDTH-1:0] y_plus_operand = Y + operand; wire x_equals_y = X == Y; wire y_lessthan_x = $signed(Y) < $signed(X); @@ -105,16 +104,10 @@ module stackcpu #(parameter ADDR_WIDTH = 32, WIDTH = 32, assign write_enable = mem_write_enable; // debug output ------------------------------------------------------------------------------------ - assign led1 = reset; - assign 
led2 = ins_loadc; - assign led3 = ins_branch; -// assign debug_out1 = { mem_read_enable, mem_write_enable, x_is_zero, -// ins_branch, ins_aluop, y_lessthan_x, x_equals_y, {7{1'b0}}, seq_state}; -// assign debug_out2 = data_in; -// assign debug_out3 = nX; -// assign debug_out4 = nPC; -// assign debug_out5 = ins; -// assign debug_out6 = IV; + assign debug1 = reset; + assign debug2 = ins_loadc; + assign debug3 = ins_branch; + //-------------------------------------------------------------------------------------------------- // instruction decoding diff --git a/tridoracpu/tridoracpu.srcs/tdraudio.v b/tridoracpu/tridoracpu.srcs/tdraudio.v index 0cc055e..1629e31 100644 --- a/tridoracpu/tridoracpu.srcs/tdraudio.v +++ b/tridoracpu/tridoracpu.srcs/tdraudio.v @@ -7,7 +7,7 @@ module wavegen #(DATA_WIDTH=32, CLOCK_DIV_WIDTH=22, input wire reset, input wire [1:0] reg_sel, output wire [DATA_WIDTH-1:0] rd_data, - input wire [DATA_WIDTH-1:0] wr_data, + input wire [AMP_WIDTH-1:0] wr_data, input wire rd_en, input wire wr_en, @@ -20,6 +20,9 @@ module wavegen #(DATA_WIDTH=32, CLOCK_DIV_WIDTH=22, localparam TDRAU_REG_CLK = 1; /* clock divider register */ localparam TDRAU_REG_AMP = 2; /* amplitude (volume) register */ + /* avoid warning about unconnected port */ + (* keep="soft" *) wire _unused = rd_en; + reg channel_enable; reg [CLOCK_DIV_WIDTH-1:0] clock_div; reg [CLOCK_DIV_WIDTH-1:0] div_count; @@ -29,12 +32,12 @@ module wavegen #(DATA_WIDTH=32, CLOCK_DIV_WIDTH=22, wire fifo_wr_en; wire fifo_rd_en, fifo_full, fifo_empty; - wire [DATA_WIDTH-1:0] fifo_rd_data; + wire [AMP_WIDTH-1:0] fifo_rd_data; fifo #(.ADDR_WIDTH(4), .DATA_WIDTH(16)) sample_buf( clk, reset, fifo_wr_en, fifo_rd_en, - wr_data, fifo_rd_data, + wr_data[AMP_WIDTH-1:0], fifo_rd_data, fifo_full, fifo_empty ); @@ -166,9 +169,14 @@ module tdraudio #(DATA_WIDTH=32) ( localparam AMP_BIAS = 32768; localparam DAC_WIDTH = 18; + /* avoid warning about unconnected port */ + (* keep="soft" *) wire [DATA_WIDTH-1:AMP_WIDTH] _unused = 
wr_data[DATA_WIDTH-1:AMP_WIDTH]; + wire [4:0] chan_sel = io_addr[6:2]; wire [1:0] reg_sel = io_addr[1:0]; + wire [AMP_WIDTH-1:0] amp_wr_data = wr_data[AMP_WIDTH-1:0]; + wire [AMP_WIDTH-1:0] chan0_amp; wire [DATA_WIDTH-1:0] chan0_rd_data; wire chan0_running; @@ -210,25 +218,25 @@ module tdraudio #(DATA_WIDTH=32) ( {DATA_WIDTH{1'b1}}; wavegen chan0(clk, reset, reg_sel, - chan0_rd_data, wr_data, + chan0_rd_data, amp_wr_data, chan0_rd_en, chan0_wr_en, chan0_amp, chan0_running, chan0_irq); wavegen chan1(clk, reset, reg_sel, - chan1_rd_data, wr_data, + chan1_rd_data, amp_wr_data, chan1_rd_en, chan1_wr_en, chan1_amp, chan1_running, chan1_irq); wavegen chan2(clk, reset, reg_sel, - chan2_rd_data, wr_data, + chan2_rd_data, amp_wr_data, chan2_rd_en, chan2_wr_en, chan2_amp, chan2_irq, chan2_running); wavegen chan3(clk, reset, reg_sel, - chan3_rd_data, wr_data, + chan3_rd_data, amp_wr_data, chan3_rd_en, chan3_wr_en, chan3_amp, chan3_running, chan3_irq); diff --git a/tridoracpu/tridoracpu.srcs/top.v b/tridoracpu/tridoracpu.srcs/top.v index 6a70ef0..a4533d2 100644 --- a/tridoracpu/tridoracpu.srcs/top.v +++ b/tridoracpu/tridoracpu.srcs/top.v @@ -15,9 +15,6 @@ module top( input wire clk, input wire rst, - input wire btn0, - input wire sw0, - input wire sw1, output wire led0, output wire led1, output wire led2, @@ -229,6 +226,15 @@ module top( assign uart_rd_data = { {WIDTH-10{1'b1}}, uart_rx_avail, uart_tx_busy, uart_rx_data }; wire audio_irq; + + buart #(.CLKFREQ(`clkfreq)) uart0(`clock, rst, + uart_baud, + uart_txd_in, uart_rxd_out, + uart_rx_clear, uart_tx_en, + uart_rx_avail, uart_tx_busy, + uart_tx_data, uart_rx_data); + + // audio controller `ifdef ENABLE_TDRAUDIO wire [WIDTH-1:0] tdraudio_wr_data; wire [WIDTH-1:0] tdraudio_rd_data; @@ -273,13 +279,6 @@ module top( `endif -1; - buart #(.CLKFREQ(`clkfreq)) uart0(`clock, rst, - uart_baud, - uart_txd_in, uart_rxd_out, - uart_rx_clear, uart_tx_en, - uart_rx_avail, uart_tx_busy, - uart_tx_data, uart_rx_data); - // CPU 
----------------------------------------------------------------- stackcpu cpu0(.clk(`clock), .rst(rst), .irq(irq), .addr(mem_addr), @@ -287,7 +286,7 @@ module top( .read_ins(dram_read_ins), .data_out(mem_write_data), .write_enable(mem_write_enable), .mem_wait(mem_wait), - .led1(led1), .led2(led2), .led3(led3)); + .debug1(led1), .debug2(led2), .debug3(led3)); // Interrupt Controller irqctrl irqctrl0(`clock, irq_in, irqc_cs, mem_write_enable, diff --git a/tridoracpu/tridoracpu.xpr b/tridoracpu/tridoracpu.xpr index 30d168a..a9dc20f 100644 --- a/tridoracpu/tridoracpu.xpr +++ b/tridoracpu/tridoracpu.xpr @@ -356,15 +356,12 @@ - + - - Performs general area optimizations including changing the threshold for control set optimizations, forcing ternary adder implementation, applying lower thresholds for use of carry chain in comparators and also area optimized mux optimizations. + + Vivado Synthesis Defaults - - - - + @@ -381,26 +378,18 @@ - + - - Uses multiple algorithms for optimization, placement, and routing to get potentially better results. + + Default settings for Implementation. 
- - - + - - - + - - - - - - + + From a9412d1339d16114e85b75c1478cac4fcf2ccb57 Mon Sep 17 00:00:00 2001 From: slederer Date: Thu, 1 Jan 2026 02:07:36 +0100 Subject: [PATCH 07/24] tdraudio: fix wiring for channel 2, irqctrl: increase delay --- tridoracpu/tridoracpu.srcs/irqctrl.v | 2 +- tridoracpu/tridoracpu.srcs/tdraudio.v | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tridoracpu/tridoracpu.srcs/irqctrl.v b/tridoracpu/tridoracpu.srcs/irqctrl.v index b71df60..5608440 100644 --- a/tridoracpu/tridoracpu.srcs/irqctrl.v +++ b/tridoracpu/tridoracpu.srcs/irqctrl.v @@ -1,6 +1,6 @@ `timescale 1ns / 1ps -module irqctrl #(IRQ_LINES = 3, IRQ_DELAY_WIDTH = 4) ( +module irqctrl #(IRQ_LINES = 3, IRQ_DELAY_WIDTH = 8) ( input wire clk, input wire [IRQ_LINES-1:0] irq_in, input wire cs, diff --git a/tridoracpu/tridoracpu.srcs/tdraudio.v b/tridoracpu/tridoracpu.srcs/tdraudio.v index 1629e31..4ad978d 100644 --- a/tridoracpu/tridoracpu.srcs/tdraudio.v +++ b/tridoracpu/tridoracpu.srcs/tdraudio.v @@ -233,7 +233,7 @@ module tdraudio #(DATA_WIDTH=32) ( chan2_rd_data, amp_wr_data, chan2_rd_en, chan2_wr_en, chan2_amp, - chan2_irq, chan2_running); + chan2_running, chan2_irq); wavegen chan3(clk, reset, reg_sel, chan3_rd_data, amp_wr_data, From caa07474f8574ad7cc088587bf50326ab79a0c53 Mon Sep 17 00:00:00 2001 From: slederer Date: Thu, 1 Jan 2026 02:09:02 +0100 Subject: [PATCH 08/24] minor comment/documentation cleanups --- doc/uart.md | 2 +- tridoracpu/tridoracpu.srcs/stackcpu.v | 5 ++++- tridoracpu/tridoracpu.srcs/top.v | 7 +++---- tridoracpu/tridoracpu.xpr | 12 +++--------- 4 files changed, 11 insertions(+), 15 deletions(-) diff --git a/doc/uart.md b/doc/uart.md index b349eb4..6a6d191 100644 --- a/doc/uart.md +++ b/doc/uart.md @@ -37,7 +37,7 @@ It uses a fixed serial configuration of 115200 bps, 8 data bits, 1 stop bit, no ## Notes -A 16 byte FIFO is used when receiving data. +A 64 byte FIFO is used when receiving data. 
When reading data, each byte needs to be acknowledged by writing the _C_ flag to the UART register. diff --git a/tridoracpu/tridoracpu.srcs/stackcpu.v b/tridoracpu/tridoracpu.srcs/stackcpu.v index 1d929f7..b8ef78c 100644 --- a/tridoracpu/tridoracpu.srcs/stackcpu.v +++ b/tridoracpu/tridoracpu.srcs/stackcpu.v @@ -399,7 +399,10 @@ module stackcpu #(parameter ADDR_WIDTH = 32, WIDTH = 32, // process irq always @(posedge clk) begin - if(seq_state == MEM && irq_pending && !(ins_xfer & xfer_r2p)) // in FETCH state, clear irq_pending. + // in MEM state, clear irq_pending, when nPC has been set to IV + // RET instruction is a special case because we need to use + // the new PC that is in mem_data + if(seq_state == MEM && irq_pending && !(ins_xfer && xfer_r2p)) irq_pending <= 0; else irq_pending <= irq_pending || irq; // else set irq_pending when irq is high diff --git a/tridoracpu/tridoracpu.srcs/top.v b/tridoracpu/tridoracpu.srcs/top.v index a4533d2..0dc3346 100644 --- a/tridoracpu/tridoracpu.srcs/top.v +++ b/tridoracpu/tridoracpu.srcs/top.v @@ -278,6 +278,9 @@ module top( (io_slot == 4) ? tdraudio_rd_data: `endif -1; + irqctrl irqctrl0(`clock, irq_in, irqc_cs, mem_write_enable, + irqc_seten, irqc_rd_data0, + irq); // CPU ----------------------------------------------------------------- stackcpu cpu0(.clk(`clock), .rst(rst), .irq(irq), @@ -288,10 +291,6 @@ module top( .mem_wait(mem_wait), .debug1(led1), .debug2(led2), .debug3(led3)); - // Interrupt Controller - irqctrl irqctrl0(`clock, irq_in, irqc_cs, mem_write_enable, - irqc_seten, irqc_rd_data0, - irq); // count clock ticks // generate interrupt every 20nth of a second diff --git a/tridoracpu/tridoracpu.xpr b/tridoracpu/tridoracpu.xpr index a9dc20f..a3dd3f6 100644 --- a/tridoracpu/tridoracpu.xpr +++ b/tridoracpu/tridoracpu.xpr @@ -358,9 +358,7 @@ - - Vivado Synthesis Defaults - + @@ -380,9 +378,7 @@ - - Default settings for Implementation. 
- + @@ -391,9 +387,7 @@ - - - + From 7751d8576520d55d6d6afeeb8d6f89f3c96a1828 Mon Sep 17 00:00:00 2001 From: slederer Date: Wed, 31 Dec 2025 13:24:20 +0100 Subject: [PATCH 09/24] pcomp: Makefile bugfixes --- pcomp/Makefile | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/pcomp/Makefile b/pcomp/Makefile index e183f73..4afc3ef 100644 --- a/pcomp/Makefile +++ b/pcomp/Makefile @@ -13,7 +13,7 @@ LSYMGEN=./lsymgen .pas: fpc -Mobjfpc -gl $< -all: pcomp sasm sdis lsymgen shortgen nativeprogs +all: libs pcomp sasm sdis lsymgen shortgen nativecomp nativeprogs libs: pcomp sasm lsymgen shortgen $(SASM) ../lib/coreloader.s @@ -22,9 +22,9 @@ libs: pcomp sasm lsymgen shortgen $(SASM) ../lib/stdlibwrap.s ../lib/stdlib.lib $(LSYMGEN) ../lib/stdlibwrap.sym ../lib/stdlib.lsym -test: sasm.s pcomp.s lsymgen.s shortgen.s +test: libs sasm.s pcomp.s lsymgen.s shortgen.s -testprgs: sasm.prog pcomp.prog lsymgen.prog shortgen.prog +testprgs: libs sasm.prog pcomp.prog lsymgen.prog shortgen.prog nativecomp: libs pcomp.prog sasm.prog lsymgen.prog shortgen.prog @@ -41,4 +41,5 @@ examples: nativecomp ../tests/readtest.prog ../tests/readchartest.prog ../tests/ -$(MAKE) -C ../rogue -f Makefile.tridoracpu clean: - rm -f pcomp sasm sdis libgen lsymgen *.o *.s *.prog + rm -f pcomp sasm sdis libgen lsymgen shortgen *.o *.s *.prog \ + ../lib/stdlib.s ../lib/stdlib.lib ../lib/stdlib.lsym From 79baf3cef534aa310f9f1d29da4cfaf5bd9ea16e Mon Sep 17 00:00:00 2001 From: slederer Date: Fri, 2 Jan 2026 22:49:54 +0100 Subject: [PATCH 10/24] serload: add exit command, correctly parse prompt after command --- utils/serload.py | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/utils/serload.py b/utils/serload.py index e69837f..0ee6962 100644 --- a/utils/serload.py +++ b/utils/serload.py @@ -78,6 +78,9 @@ def commandwait(ser, cmd): print("timeout sending '{}' command".format(cmd)) return None + if resp.startswith(b"> "): + resp = resp[2:] + if resp != bytearray(cmd +
b"\r\n"): print("invalid response to '{}' command".format(cmd)) return None @@ -323,7 +326,7 @@ def showdata(ser): promptseen = True else: print(c.decode('utf8'), end='') - rest = ser.read(1) + rest = ser.read(1) # read trailing space of prompt def localdir(): @@ -360,8 +363,10 @@ def interactive(ser): print("superfluous argument") else: localdir() + elif cmd == 'exit' or cmd == 'x': + done = True else: - print("Unknown command. Valid commands are: dir get ldir put") + print("Unknown command. Valid commands are: dir get ldir put exit") if __name__ == "__main__": From 11814cd24fa147d278c6a648e7e36ee461e07e47 Mon Sep 17 00:00:00 2001 From: slederer Date: Fri, 2 Jan 2026 22:56:39 +0100 Subject: [PATCH 11/24] pcmaudio: bugfix corrupted audio, loop mode, adjust examples --- examples/pcmtest.pas | 53 +++++---- examples/pcmtest2.pas | 2 +- examples/xmas25.pas | 251 ++++++++++++++++++++++++++++++++++++++++++ lib/pcmaudio.inc | 2 +- lib/pcmaudio.s | 101 +++++++++++++---- 5 files changed, 363 insertions(+), 46 deletions(-) create mode 100644 examples/xmas25.pas diff --git a/examples/pcmtest.pas b/examples/pcmtest.pas index 423faaf..5122219 100644 --- a/examples/pcmtest.pas +++ b/examples/pcmtest.pas @@ -1,32 +1,21 @@ -{$H1536} -program pcmtest; +{$H2560} +program pcmtest2; uses pcmaudio; var filename:string; buf:SndBufPtr; - f:file; - size:integer; - i:integer; - c:char; sampleRate:integer; err:integer; + done:boolean; + c:char; + +function readAudioFile(fname:string):SndBufPtr; +var i,size:integer; + c:char; + buf:SndBufPtr; + f:file; begin - if ParamCount > 0 then - filename := ParamStr(1) - else - begin - write('Filename> '); - readln(filename); - end; - - err := 1; - if ParamCount > 1 then - val(ParamStr(2),sampleRate, err); - - if err <> 0 then - sampleRate := 16000; - - open(f, filename, ModeReadOnly); + open(f, fname, ModeReadOnly); size := FileSize(f); new(buf, size); @@ -41,6 +30,26 @@ begin close(f); + readAudioFile := buf; +end; + +begin + if ParamCount > 0 then + 
filename := ParamStr(1) + else + begin + write('Filename> '); + readln(filename); + end; + + err := 1; + if ParamCount > 1 then + val(ParamStr(2), sampleRate, err); + if err > 0 then + sampleRate := 22050; + + buf := readAudioFile(filename); + PlaySample(buf, sampleRate); dispose(buf); diff --git a/examples/pcmtest2.pas b/examples/pcmtest2.pas index f72e5e6..b5bf47a 100644 --- a/examples/pcmtest2.pas +++ b/examples/pcmtest2.pas @@ -50,7 +50,7 @@ begin buf := readAudioFile(filename); - SampleQStart(buf, sampleRate); + SampleQStart(buf, false, sampleRate); write('Press ESC to stop> '); done := false; diff --git a/examples/xmas25.pas b/examples/xmas25.pas new file mode 100644 index 0000000..1a7d8b5 --- /dev/null +++ b/examples/xmas25.pas @@ -0,0 +1,251 @@ +{$H2560} +{$S8} +program xmas252; +uses pcmaudio, fastfire, tiles; + +const MAXX = FIREWIDTH; + MAXY = FIREHEIGHT; + +(* type PixelData = array[0..31999] of integer; *) + +type Picture = record + magic:integer; + mode:integer; + palette: array[0..15] of integer; + pixels: PixelData; + end; + +var firecells: FireBuf; + + firepalette: array [0..15] of integer = + { ( $FFA, $FF8, $FF4, $FF0, $FE0, $FD0, $FA0, $F90, + $F00, $E00, $D00, $A00, $800, $600, $300, $000); } + { ( $FFA, $FFA, $FFA, $FFA, $FF0, $FF0, $FF0, $FF0, } + ( $00F, $00F, $00F, $00F, $00F, $00F, $00F, $00F, + $FF0, $FD0, $FA0, $C00, $A00, $700, $400, $000); + x,y:integer; + infile:file; + pic:^Picture; + tilesheet:^Picture; + animationTick:integer; + animationHold:integer; + animationState:integer; + + filename: string; + + audiodata: SndBufPtr; + +procedure createPalette; +var i:integer; +begin + for i := 15 downto 0 do + setpalette(15 - i, firepalette[i]); +end; + +procedure fireItUp; +var x,y:integer; +begin + y := MAXY - 1; + for x := 1 to MAXX - 1 do + firecells[y, x] := random and 127; +end; + + +procedure updateFire; +var i,x,y:integer; +begin + for y := 0 to MAXY - 2 do + for x := 1 to MAXX - 1 do + begin + i := + ((firecells[y + 1, x - 1] + + 
firecells[y + 1, x] + + firecells[y + 1, x + 1] + + firecells[y + 2, x]) + ) shr 2; + if i > 0 then + i := i - 1; + firecells[y, x] := i; + end; +end; + +procedure drawFire(startX,startY:integer); +var x, y, col, col2:integer; +begin + for y := 0 to MAXY - 1 do + begin + x := 0; + for col in firecells[y] do + begin + { scale and clamp color value } + col2 := col shr 3; + if col2 > FIREMAXCOLOR then col2 := FIREMAXCOLOR; + + putpixel(startX + x, startY + y, col2); + x := x + 1; + end; + end; +end; + +procedure readBackgroundPic(filename:string); +var i:integer; +begin + open(infile, filename, ModeReadonly); + read(infile, pic^); + close(infile); + + for i := 0 to 15 do + SetPalette(i, pic^.palette[i]); + + PutScreen(pic^.pixels); +end; + +procedure animate; +var tileSrcX,tilesrcY:integer; +begin + animationTick := animationTick + 1; + + if animationHold = 0 then + animationHold := 40; + + if animationTick < animationHold then + exit; + + animationTick := 0; + + case animationState of + 0: begin + tileSrcX := 0; + tileSrcY := 0; + animationHold := 40; + end; + 1: begin + tileSrcX := 19; + tileSrcY := 0; + animationHold := 20; + + if random and 7 > 4 then + animationState := -1; + end; + 2: begin + tileSrcX := 38; + tileSrcY := 0; + animationHold := 2; + end; + 3: begin; + tileSrcX := 57; + tileSrcY := 0; + animationHold := 2; + end; + 4: begin + tileSrcX := 0; + tileSrcY := 13; + animationHold := 15; + end; + 5: begin + tileSrcX := 57; + tileSrcY := 0; + animationHold := 2; + end; + 6: begin + tileSrcX := 38; + tileSrcY := 0; + animationHold := 2; + end; + 7: begin + tileSrcX := 0; + tileSrcY := 0; + animationHold := 2; + animationState := -1; + end; + end; + + CopyTilesScr(tilesheet^.pixels, + tileSrcX, tileSrcY, + 34,34, + 19,13); + + animationState := animationState + 1; +end; + + +procedure readTilesheet; +var filename:string; + i:integer; +begin + filename := 'tilesheet.pict'; + open(infile, filename, ModeReadonly); + read(infile, tilesheet^); + close(infile); 
+end; + +function newAudioData(fname:string):SndBufPtr; +var i,size:integer; + c:char; + buf:SndBufPtr; + f:file; +begin + open(f, fname, ModeReadOnly); + size := FileSize(f); + new(buf, size); + + buf^ := ''; + write('Reading ', size, ' bytes...'); + for i := 1 to size do + begin + read(f,c); + AppendChar(buf^,c); + end; + writeln; + + close(f); + + newAudioData := buf; +end; + + +begin + if ParamCount > 0 then + filename := ParamStr(1) + else + filename := 'xmas25bg.pict'; + + Randomize; + + audiodata := newAudioData('fireplace-loop.tdrau'); + + InitGraphics; + + new(pic); + readBackgroundPic(filename); + + new(tilesheet); + readTilesheet; + + SampleQStart(audiodata, true, 22050); + + while not ConAvail do + begin + fireItUp; + FastFireUpdate(firecells); + { updateFire; } + FastFireDraw(firecells, 216, 165); + { drawFire(216, 165); } + animate; + end; + + SampleQStop; + + for y := 0 to MAXY do + begin + x := firecells[y, 10]; + drawline(0, y, x, y, 1); + + end; + + InitGraphics; + + dispose(tilesheet); + dispose(pic); + dispose(audiodata); +end. 
diff --git a/lib/pcmaudio.inc b/lib/pcmaudio.inc index 4c3cdb3..dc1dbba 100644 --- a/lib/pcmaudio.inc +++ b/lib/pcmaudio.inc @@ -2,6 +2,6 @@ type SndBuf = string[32768]; type SndBufPtr = ^SndBuf; procedure PlaySample(buf:SndBufPtr;sampleRate:integer); external; -procedure SampleQStart(buf:SndBufPtr;sampleRate:integer); external; +procedure SampleQStart(buf:SndBufPtr;loop:boolean;sampleRate:integer); external; procedure SampleQStop; external; function SampleQSize:integer; external; diff --git a/lib/pcmaudio.s b/lib/pcmaudio.s index d1add4f..530f52f 100644 --- a/lib/pcmaudio.s +++ b/lib/pcmaudio.s @@ -1,25 +1,25 @@ .EQU AUDIO_BASE $A00 .EQU IRQC_REG $980 .EQU IRQC_EN $80 + .EQU CPU_FREQ 77000000 ; args: sample rate START_PCMAUDIO: ; calculate clock divider - LOADCP 77000000 + LOADCP CPU_FREQ SWAP LOADCP _DIV CALL LOADC AUDIO_BASE + 1 SWAP ; put clock divider on ToS -; LOADCP 4812 ; clock divider for 16KHz sample rate -; LOADCP 2406 ; clock divider for 32KHz sample rate STOREI 1 LOADCP 32768 ; set amplitude to biased 0 STOREI DROP + LOADC AUDIO_BASE - LOADC 17 ; enable channel, enable interrupt + LOADC 1 ; enable channel STOREI DROP RET @@ -101,18 +101,14 @@ PLAY1_L0: DROP RET -; start interrupt-driven sample playback -; args: pointer to pascal string, sample rate -SAMPLEQSTART: - LOADCP START_PCMAUDIO - CALL - +; set sample queue count and pointer from string header +; args: pointer to string/SndBufPtr +_STR2SMPLQPTR: LOADCP SMPLQ_COUNT OVER LOADI ; get string size from header SHR ; divide by 4 to get word count SHR - STOREI DROP @@ -121,6 +117,38 @@ SAMPLEQSTART: INC 8 ; skip rest of header STOREI ; store sample data pointer DROP + RET + +; start interrupt-driven sample playback +; args: pointer to pascal string, loop flag, sample rate +SAMPLEQSTART: + LOADCP START_PCMAUDIO ; sample rate is on ToS as arg to subroutine + CALL + + SWAP ; swap loop flag and buf ptr + + LOADCP _STR2SMPLQPTR + CALL + + ; loop flag is now on ToS + CBRANCH.Z SQ_S_1 + ; if nonzero, set loop 
ptr + LOADCP SMPLQ_PTR + LOADI + DEC 8 ; subtract offset for string header again + BRANCH SQ_S_0 +SQ_S_1: + LOADC 0 +SQ_S_0: + LOADCP SMPLQ_NEXT + SWAP + STOREI + DROP + + LOADC AUDIO_BASE + LOADC 17 ; enable channel, enable interrupt + STOREI + DROP LOADCP SMPLQ_ISR ; set interrupt handler STOREREG IV @@ -154,6 +182,7 @@ SAMPLEQSIZE: SMPLQ_PTR: .WORD 0 SMPLQ_COUNT: .WORD 0 +SMPLQ_NEXT: .WORD 0 SMPLQ_ISR: LOADC IRQC_REG @@ -170,7 +199,7 @@ SMPLQ_I_L: DROP BRANCH SMPLQ_I_XT ; if null, end interrupt routine SMPLQ_I_B: - LOADI ; load next word + LOADI ; load next word which contains two samples DUP BROT ; get high half-word @@ -205,23 +234,42 @@ SMPLQ_I_B: STOREI DROP - ; check if fifo is full - LOADC AUDIO_BASE - LOADI - LOADC 8 ; fifo_full + ; put up to 16 samples into the sample queue + LOADCP SMPLQ_COUNT + LOADI ; load word counter again + LOADC 7 ; check if count modulo 8 = 0 AND - CBRANCH.Z SMPLQ_I_L ; next sample if not full + CBRANCH.NZ SMPLQ_I_L ; if not, next two samples - LOADC AUDIO_BASE - LOADC 17 ; re-enable channel interrupt - STOREI + ; check if fifo is full + ; does not work reliably when running in DRAM, + ; maybe because at least one sample has already played + ; since start of ISR?
+; LOADC AUDIO_BASE +; LOADI +; LOADC 8 ; fifo_full +; AND +; CBRANCH.Z SMPLQ_I_L ; next sample if not full + + BRANCH SMPLQ_I_XT + + ; end of sample buffer, check for next +SMPLQ_I_END: DROP + DROP + + LOADCP SMPLQ_NEXT ; skip to end + LOADI ; if NEXT ptr is zero + DUP + CBRANCH.Z SMPLQ_I_END1 + + LOADCP _STR2SMPLQPTR + CALL BRANCH SMPLQ_I_XT ; end playback, set ptr and counter to zero -SMPLQ_I_END: - DROP +SMPLQ_I_END1: DROP LOADCP SMPLQ_PTR LOADC 0 @@ -238,7 +286,16 @@ SMPLQ_I_END: STOREI DROP + ; exit without enabling interrupts for this channel + BRANCH SMPLQ_I_XT2 + SMPLQ_I_XT: + LOADC AUDIO_BASE + LOADC 17 ; re-enable channel interrupt + STOREI + DROP + +SMPLQ_I_XT2: LOADC IRQC_REG ; re-enable interrupts LOADC IRQC_EN STOREI From d17c4c41fd2b27fd94b2c28986626bb8f5a8387c Mon Sep 17 00:00:00 2001 From: slederer Date: Sun, 25 Jan 2026 23:23:22 +0100 Subject: [PATCH 12/24] docs: add section about units to the pascal programming guide --- doc/pascalprogramming.md | 79 ++++++++++++++++++++++++++++++++++++++++ lib/corelib.s | 2 +- 2 files changed, 80 insertions(+), 1 deletion(-) diff --git a/doc/pascalprogramming.md b/doc/pascalprogramming.md index df5d3fd..454b46b 100644 --- a/doc/pascalprogramming.md +++ b/doc/pascalprogramming.md @@ -235,6 +235,85 @@ In Wirth Pascal, labels must be numbers. Other Pascal dialects also allow normal Tridora-Pascal only allows identifiers as labels. +## Units +Units are the method to create libraries in Tridora-Pascal, that is, code modules that can +be reused in other programs. + +Tridora-Pascal follows the unit syntax that has been established in UCSD-Pascal and is also +used in Turbo Pascal. + +Units are imported with the *USES* keyword, right after the *PROGRAM* statement. +Multiple units can be imported by separating the unit names with commas. + +There are some differences: In Tridora-Pascal, the unit file does not contain the interface +section, only the implementation section. The interface section is instead placed into a
The interface section is instead placed into a +separate file with the extension *.inc*, without any *UNIT* or *INTERFACE* keywords. + +This file will be included by the compiler and should contain +procedure or function declarations (as *EXTERNAL*). It can also contain *TYPE*, +*CONST* and *VAR* statements. + +All Pascal symbols of the unit are imported into the main program. There +is no separate namespace for units. + +### Using an Existing Unit +An existing unit is imported with the *USES* statement that must be placed +immediately after the *PROGRAM* statement. + +The compiler will look for an include file with the unit name and an *.inc* extension. +It will also +tell the assembler to include an assembly language file for each +unit. The filename must be the unit name plus an *.s* extension. + +Since there is no linker in Tridora-Pascal, all imported units will be +assembled together with the main program. + +The compiler looks for unit *.inc* and *.s* files in the current volume or +in the *SYSTEM* volume. + +### Compiling a Unit +A unit implementation file should start with a *UNIT* statement instead of a *PROGRAM* +statement. + +It should be compiled, not assembled. + +When building a program that uses units, the assembler will include an assembly language +file for each unit. + +It is possible to write units in assembly language. This is done by +directly providing the *.s* file and creating an *.inc* file with +the *EXTERNAL* declarations matching the assembly language +file. +#### Example +```pascal +(* UnitExamples.pas *) +program UnitExample; +uses hello; + +begin + sayHello('unit'); +end. +``` + +#### Example Unit Implementation File +```pascal +(* hello.pas *) +unit hello; + +implementation + +procedure sayHello(s:string); +begin + writeln('hello, ', s); +end; + +end. 
+``` +#### Example Unit Include File +```pascal +(* hello.inc *) +procedure sayHello(s:string); external; +``` ## Compiler Directives Tridora-Pascal understands a small number of compiler directives which are introduced as usual with a comment and a dollar-sign. Both comment styles can be used. diff --git a/lib/corelib.s b/lib/corelib.s index 8b8f403..d147934 100644 --- a/lib/corelib.s +++ b/lib/corelib.s @@ -822,7 +822,7 @@ PUTPIXEL_4BPP: SHL 2 ; * 16 DUP SHL 2; * 64 - ADD ; x*16 + x*64 + ADD ; y*16 + y*64 ADD ; add results together for vmem addr From 248c9ae919f9fe64adc294daa3baead6b35695c7 Mon Sep 17 00:00:00 2001 From: slederer Date: Mon, 26 Jan 2026 02:03:28 +0100 Subject: [PATCH 13/24] vgafb: first attempt at shifter/masker acceleration functionality --- tridoracpu/tridoracpu.srcs/top.v | 4 +- tridoracpu/tridoracpu.srcs/vgafb.v | 94 ++++++++++++++++++++++++++++++ tridoracpu/tridoracpu.xpr | 3 +- 3 files changed, 97 insertions(+), 4 deletions(-) diff --git a/tridoracpu/tridoracpu.srcs/top.v b/tridoracpu/tridoracpu.srcs/top.v index 0dc3346..bf3bea8 100644 --- a/tridoracpu/tridoracpu.srcs/top.v +++ b/tridoracpu/tridoracpu.srcs/top.v @@ -137,7 +137,7 @@ module top( assign fb_wr_data = mem_write_data; vgafb vgafb0(`clock, pixclk, rst, - mem_addr[3:0], fb_rd_data, fb_wr_data, + mem_addr[5:2], fb_rd_data, fb_wr_data, fb_rd_en, fb_wr_en, VGA_HS_O, VGA_VS_O, VGA_R, VGA_G, VGA_B); `endif @@ -247,7 +247,7 @@ module top( assign tdraudio_wr_data = mem_write_data; tdraudio tdraudio0(`clock, ~rst, - mem_addr[6:0], + mem_addr[8:2], tdraudio_rd_data, tdraudio_wr_data, tdraudio_rd_en, diff --git a/tridoracpu/tridoracpu.srcs/vgafb.v b/tridoracpu/tridoracpu.srcs/vgafb.v index f87e514..408079a 100644 --- a/tridoracpu/tridoracpu.srcs/vgafb.v +++ b/tridoracpu/tridoracpu.srcs/vgafb.v @@ -1,6 +1,9 @@ `timescale 1ns / 1ps `default_nettype none +// enable shifter/masker registers +`define ENABLE_FB_ACCEL + // Project F: Display Timings // (C)2019 Will Green, Open Source Hardware 
released under the MIT License // Learn more at https://projectf.io @@ -126,6 +129,14 @@ module vgafb #(VMEM_ADDR_WIDTH = 15, VMEM_DATA_WIDTH = 32) ( localparam REG_PAL_SLOT = 3; localparam REG_PAL_DATA = 4; localparam REG_CTL = 5; +`ifdef ENABLE_FB_ACCEL + localparam REG_SHIFTER = 6; + localparam REG_SHIFTCOUNT = 7; + localparam REG_SHIFTERM = 9; + localparam REG_SHIFTERSP = 10; + localparam REG_MASKGEN = 11; +`endif + localparam COLOR_WIDTH = 12; localparam PALETTE_WIDTH = 4; @@ -145,12 +156,30 @@ module vgafb #(VMEM_ADDR_WIDTH = 15, VMEM_DATA_WIDTH = 32) ( wire pix_rd; wire [VMEM_DATA_WIDTH-1:0] status; +`ifdef ENABLE_FB_ACCEL + reg [VMEM_DATA_WIDTH-1:0] acc_shifter_in; + reg [(VMEM_DATA_WIDTH*2)-1:0] acc_shifter_out; + reg [2:0] acc_shift_count; + reg acc_start_shift; + reg [VMEM_DATA_WIDTH-1:0] acc_mask_in; + wire [VMEM_DATA_WIDTH-1:0] acc_mask_out; + wire [VMEM_DATA_WIDTH-1:0] acc_shifter_mask; + wire [VMEM_DATA_WIDTH-1:0] acc_shifter_out_h = acc_shifter_out[(VMEM_DATA_WIDTH*2)-1:VMEM_DATA_WIDTH]; + wire [VMEM_DATA_WIDTH-1:0] acc_shifter_out_l = acc_shifter_out[VMEM_DATA_WIDTH-1:0]; + `endif + assign vmem_rd_en = rd_en; assign vmem_wr_en = (reg_sel == REG_VMEM) && wr_en; assign rd_data = (reg_sel == REG_VMEM) ? vmem_rd_data : (reg_sel == REG_RD_ADDR) ? cpu_rd_addr : (reg_sel == REG_WR_ADDR) ? cpu_wr_addr : (reg_sel == REG_CTL) ? status : +`ifdef ENABLE_FB_ACCEL + (reg_sel == REG_SHIFTER) ? acc_shifter_out_h: + (reg_sel == REG_SHIFTERM) ? acc_shifter_mask : + (reg_sel == REG_SHIFTERSP) ? acc_shifter_out_l : + (reg_sel == REG_MASKGEN) ? acc_mask_out : + `endif 32'hFFFFFFFF; wire [VMEM_ADDR_WIDTH-1:0] cpu_addr = vmem_wr_en ? 
cpu_wr_addr : cpu_rd_addr; @@ -271,6 +300,71 @@ module vgafb #(VMEM_ADDR_WIDTH = 15, VMEM_DATA_WIDTH = 32) ( if(rd_en && reg_sel == REG_VMEM) cpu_rd_addr <= cpu_rd_addr + 1; // auto-increment read addr on read end +`ifdef ENABLE_FB_ACCEL + // + // shifter/masker registers + // + always @(posedge cpu_clk) + begin + if(wr_en && reg_sel == REG_SHIFTER) + acc_shifter_in <= { wr_data, {32{1'b0}}}; + end + + always @(posedge cpu_clk) + begin + if(wr_en && reg_sel == REG_SHIFTCOUNT) + begin + acc_shift_count <= wr_data[2:0]; + acc_start_shift <= 1; + end + + if(acc_start_shift) + acc_start_shift <= 0; + end + + always @(posedge cpu_clk) + begin + if (acc_start_shift) + begin + acc_shifter_out <= {acc_shifter_in, {VMEM_DATA_WIDTH{1'b0}}} >> acc_shift_count; + end + end + + // mask register + always @(posedge cpu_clk) + begin + if (wr_en && reg_sel == REG_MASKGEN) + begin + acc_mask_in <= wr_data; + end + end + + assign acc_mask_out = { + {4{|{acc_mask_in[31:28]}}}, + {4{|{acc_mask_in[27:24]}}}, + {4{|{acc_mask_in[23:20]}}}, + {4{|{acc_mask_in[19:16]}}}, + {4{|{acc_mask_in[15:12]}}}, + {4{|{acc_mask_in[11:8]}}}, + {4{|{acc_mask_in[7:4]}}}, + {4{|{acc_mask_in[3:0]}}} + }; + + assign acc_shifter_mask = { + {4{|{acc_shifter_out_h[31:28]}}}, + {4{|{acc_shifter_out_h[27:24]}}}, + {4{|{acc_shifter_out_h[23:20]}}}, + {4{|{acc_shifter_out_h[19:16]}}}, + {4{|{acc_shifter_out_h[15:12]}}}, + {4{|{acc_shifter_out_h[11:8]}}}, + {4{|{acc_shifter_out_h[7:4]}}}, + {4{|{acc_shifter_out_h[3:0]}}} + }; +`endif + + // + // shifting pixels at pixel clock + // always @(posedge pix_clk) begin if(scanline || shift_count == MAX_SHIFT_COUNT) // before start of a line diff --git a/tridoracpu/tridoracpu.xpr b/tridoracpu/tridoracpu.xpr index a3dd3f6..2926f59 100644 --- a/tridoracpu/tridoracpu.xpr +++ b/tridoracpu/tridoracpu.xpr @@ -376,7 +376,7 @@ - + @@ -389,7 +389,6 @@ - From 937369f60b5349fe7d8f71b6f6f4b3d4190e3e9f Mon Sep 17 00:00:00 2001 From: slederer Date: Wed, 28 Jan 2026 01:15:16 +0100 
Subject: [PATCH 14/24] lib,examples: changes for new register address mapping --- examples/fastfire.s | 10 +++++----- examples/sprites.s | 10 +++++----- lib/corelib.s | 10 +++++----- lib/pcmaudio.s | 10 +++++----- tridoracpu/tridoracpu.srcs/vgafb.v | 10 +++------- tridoracpu/tridoracpu.xpr | 3 ++- 6 files changed, 25 insertions(+), 28 deletions(-) diff --git a/examples/fastfire.s b/examples/fastfire.s index f0e10e4..63ace51 100644 --- a/examples/fastfire.s +++ b/examples/fastfire.s @@ -123,11 +123,11 @@ FF_EXIT: ; framebuffer controller registers .EQU FB_RA $900 - .EQU FB_WA $901 - .EQU FB_IO $902 - .EQU FB_PS $903 - .EQU FB_PD $904 - .EQU FB_CTL $905 + .EQU FB_WA $904 + .EQU FB_IO $908 + .EQU FB_PS $90C + .EQU FB_PD $910 + .EQU FB_CTL $914 .EQU WORDS_PER_LINE 80 ; fire width in vmem words (strict left-to-right evaluation) diff --git a/examples/sprites.s b/examples/sprites.s index 3391339..6962eda 100644 --- a/examples/sprites.s +++ b/examples/sprites.s @@ -3,9 +3,9 @@ .EQU WORDS_PER_LINE 80 .EQU FB_RA $900 - .EQU FB_WA $901 - .EQU FB_IO $902 - .EQU FB_PS $903 + .EQU FB_WA $904 + .EQU FB_IO $908 + .EQU FB_PS $90C ; calculate mask for a word of pixels ; args: word of pixels with four bits per pixel @@ -95,7 +95,7 @@ PS_LOOP1: ; in the vga controller LOADC FB_RA ; read address register LOAD PS_VMEM_ADDR - STOREI 1 ; use autoincrement to get to the next register + STOREI 4 ; use autoincrement to get to the next register LOAD PS_VMEM_ADDR STOREI DROP @@ -322,7 +322,7 @@ UD_S_L1: ; store vmem offset into write addr reg LOADCP FB_WA LOAD UD_S_OFFSET - STOREI 1 ; ugly but fast: reuse addr + STOREI 4 ; ugly but fast: reuse addr ; with postincrement to ; get to FB_IO for STOREI below diff --git a/lib/corelib.s b/lib/corelib.s index d147934..a21b95c 100644 --- a/lib/corelib.s +++ b/lib/corelib.s @@ -701,11 +701,11 @@ CMPWORDS_XT2: ; --------- Graphics Library --------------- ; vga controller registers .EQU FB_RA $900 - .EQU FB_WA $901 - .EQU FB_IO $902 - .EQU FB_PS $903 - 
.EQU FB_PD $904 - .EQU FB_CTL $905 + .EQU FB_WA $904 + .EQU FB_IO $908 + .EQU FB_PS $90C + .EQU FB_PD $910 + .EQU FB_CTL $914 ; set a pixel in fb memory ; parameters: x,y - coordinates PUTPIXEL_1BPP: diff --git a/lib/pcmaudio.s b/lib/pcmaudio.s index 530f52f..ebe812a 100644 --- a/lib/pcmaudio.s +++ b/lib/pcmaudio.s @@ -11,9 +11,9 @@ START_PCMAUDIO: LOADCP _DIV CALL - LOADC AUDIO_BASE + 1 + LOADC AUDIO_BASE + 4 SWAP ; put clock divider on ToS - STOREI 1 + STOREI 4 LOADCP 32768 ; set amplitude to biased 0 STOREI DROP @@ -95,7 +95,7 @@ PLAY1_L0: AND CBRANCH.NZ PLAY1_L0 ; loop if fifo is full - LOADC AUDIO_BASE+2 ; store amplitude value + LOADC AUDIO_BASE+8 ; store amplitude value SWAP STOREI DROP @@ -207,7 +207,7 @@ SMPLQ_I_B: LOADCP $FFFF AND - LOADC AUDIO_BASE+2 + LOADC AUDIO_BASE+8 SWAP STOREI ; write sample, keep addr @@ -281,7 +281,7 @@ SMPLQ_I_END1: DROP ; set amplitude out to zero (biased) - LOADC AUDIO_BASE+2 + LOADC AUDIO_BASE+8 LOADCP 32768 STOREI DROP diff --git a/tridoracpu/tridoracpu.srcs/vgafb.v b/tridoracpu/tridoracpu.srcs/vgafb.v index 408079a..2d6bc55 100644 --- a/tridoracpu/tridoracpu.srcs/vgafb.v +++ b/tridoracpu/tridoracpu.srcs/vgafb.v @@ -132,9 +132,9 @@ module vgafb #(VMEM_ADDR_WIDTH = 15, VMEM_DATA_WIDTH = 32) ( `ifdef ENABLE_FB_ACCEL localparam REG_SHIFTER = 6; localparam REG_SHIFTCOUNT = 7; - localparam REG_SHIFTERM = 9; - localparam REG_SHIFTERSP = 10; - localparam REG_MASKGEN = 11; + localparam REG_SHIFTERM = 8; + localparam REG_SHIFTERSP = 09; + localparam REG_MASKGEN = 10; `endif localparam COLOR_WIDTH = 12; @@ -325,18 +325,14 @@ module vgafb #(VMEM_ADDR_WIDTH = 15, VMEM_DATA_WIDTH = 32) ( always @(posedge cpu_clk) begin if (acc_start_shift) - begin acc_shifter_out <= {acc_shifter_in, {VMEM_DATA_WIDTH{1'b0}}} >> acc_shift_count; - end end // mask register always @(posedge cpu_clk) begin if (wr_en && reg_sel == REG_MASKGEN) - begin acc_mask_in <= wr_data; - end end assign acc_mask_out = { diff --git a/tridoracpu/tridoracpu.xpr 
b/tridoracpu/tridoracpu.xpr index 2926f59..a3dd3f6 100644 --- a/tridoracpu/tridoracpu.xpr +++ b/tridoracpu/tridoracpu.xpr @@ -376,7 +376,7 @@ - + @@ -389,6 +389,7 @@ + From 042a18fc9b7ecd5b93b88a1f1a3ea4b633302b4b Mon Sep 17 00:00:00 2001 From: slederer Date: Thu, 29 Jan 2026 01:53:35 +0100 Subject: [PATCH 15/24] vgafb: bugfixes, change synthesis optimization settings --- tridoracpu/tridoracpu.srcs/vgafb.v | 6 +++--- tridoracpu/tridoracpu.xpr | 34 ++++++++++++++++++++++-------- 2 files changed, 28 insertions(+), 12 deletions(-) diff --git a/tridoracpu/tridoracpu.srcs/vgafb.v b/tridoracpu/tridoracpu.srcs/vgafb.v index 2d6bc55..411e956 100644 --- a/tridoracpu/tridoracpu.srcs/vgafb.v +++ b/tridoracpu/tridoracpu.srcs/vgafb.v @@ -159,7 +159,7 @@ module vgafb #(VMEM_ADDR_WIDTH = 15, VMEM_DATA_WIDTH = 32) ( `ifdef ENABLE_FB_ACCEL reg [VMEM_DATA_WIDTH-1:0] acc_shifter_in; reg [(VMEM_DATA_WIDTH*2)-1:0] acc_shifter_out; - reg [2:0] acc_shift_count; + reg [4:0] acc_shift_count; reg acc_start_shift; reg [VMEM_DATA_WIDTH-1:0] acc_mask_in; wire [VMEM_DATA_WIDTH-1:0] acc_mask_out; @@ -307,14 +307,14 @@ module vgafb #(VMEM_ADDR_WIDTH = 15, VMEM_DATA_WIDTH = 32) ( always @(posedge cpu_clk) begin if(wr_en && reg_sel == REG_SHIFTER) - acc_shifter_in <= { wr_data, {32{1'b0}}}; + acc_shifter_in <= wr_data; end always @(posedge cpu_clk) begin if(wr_en && reg_sel == REG_SHIFTCOUNT) begin - acc_shift_count <= wr_data[2:0]; + acc_shift_count <= { wr_data[2:0], 2'b0}; acc_start_shift <= 1; end diff --git a/tridoracpu/tridoracpu.xpr b/tridoracpu/tridoracpu.xpr index a3dd3f6..4d21f83 100644 --- a/tridoracpu/tridoracpu.xpr +++ b/tridoracpu/tridoracpu.xpr @@ -356,10 +356,16 @@ - + - - + + Performs optimizations which creates alternative logic technology mapping, including disabling LUT combining, forcing F7/F8/F9 to logic, increasing the threshold of shift register inference. + + + + + + @@ -376,16 +382,26 @@ - + - + + Best predicted directive for place_design. 
+ - + + + - + + + - - + + + + + + From 8900eb90be47d2d3eceb5b2e2b5417ce58b24993 Mon Sep 17 00:00:00 2001 From: slederer Date: Sat, 31 Jan 2026 02:31:00 +0100 Subject: [PATCH 16/24] corelib: new putpixel routine using shifter/maskgen --- lib/corelib.s | 190 +++++++++----------------------------------------- 1 file changed, 34 insertions(+), 156 deletions(-) diff --git a/lib/corelib.s b/lib/corelib.s index a21b95c..b228d20 100644 --- a/lib/corelib.s +++ b/lib/corelib.s @@ -706,108 +706,32 @@ CMPWORDS_XT2: .EQU FB_PS $90C .EQU FB_PD $910 .EQU FB_CTL $914 -; set a pixel in fb memory -; parameters: x,y - coordinates -PUTPIXEL_1BPP: - ; calculate vmem address: - OVER ; duplicate x - ; divide x by 32 - SHR - SHR - SHR - SHR - SHR - SWAP - ; multiply y by words per line - SHL 2 - SHL 2 - SHL + .EQU FB_SHIFTER $918 + .EQU FB_SHIFTCOUNT $91C + .EQU FB_SHIFTERM $920 + .EQU FB_SHIFTERSP $924 + .EQU FB_MASKGEN $928 - ADD ; add results together for vmem addr +; draw a single pixel +; args: x, y, color - DUP - LOADCP FB_WA - SWAP - STOREI ; store to framebuffer write addr register - DROP - LOADCP FB_RA ; and to framebuffer read addr register - SWAP - STOREI - DROP - - ; x is now at top of stack - ; get bit value from x modulo 32 - LOADC 31 - AND - SHL 2 ; (x & 31) * 4 = offset into table - LOADCP INT_TO_PIX_TABLE - ADD - LOADI - - LOADCP FB_IO - ; read old vmem value - LOADCP FB_IO - LOADI - ; or in new bit - OR - ; write new value - STOREI - DROP - - RET - -INT_TO_PIX_TABLE: - .WORD %10000000_00000000_00000000_00000000 - .WORD %01000000_00000000_00000000_00000000 - .WORD %00100000_00000000_00000000_00000000 - .WORD %00010000_00000000_00000000_00000000 - .WORD %00001000_00000000_00000000_00000000 - .WORD %00000100_00000000_00000000_00000000 - .WORD %00000010_00000000_00000000_00000000 - .WORD %00000001_00000000_00000000_00000000 - .WORD %00000000_10000000_00000000_00000000 - .WORD %00000000_01000000_00000000_00000000 - .WORD %00000000_00100000_00000000_00000000 - .WORD 
%00000000_00010000_00000000_00000000 - .WORD %00000000_00001000_00000000_00000000 - .WORD %00000000_00000100_00000000_00000000 - .WORD %00000000_00000010_00000000_00000000 - .WORD %00000000_00000001_00000000_00000000 - .WORD %00000000_00000000_10000000_00000000 - .WORD %00000000_00000000_01000000_00000000 - .WORD %00000000_00000000_00100000_00000000 - .WORD %00000000_00000000_00010000_00000000 - .WORD %00000000_00000000_00001000_00000000 - .WORD %00000000_00000000_00000100_00000000 - .WORD %00000000_00000000_00000010_00000000 - .WORD %00000000_00000000_00000001_00000000 - .WORD %00000000_00000000_00000000_10000000 - .WORD %00000000_00000000_00000000_01000000 - .WORD %00000000_00000000_00000000_00100000 - .WORD %00000000_00000000_00000000_00010000 - .WORD %00000000_00000000_00000000_00001000 - .WORD %00000000_00000000_00000000_00000100 - .WORD %00000000_00000000_00000000_00000010 - .WORD %00000000_00000000_00000000_00000001 - -PUTMPIXEL: - LOADC 1 -; set a pixel in fb memory -; parameters: x,y,color - coordinates, color value (0-15) PUTPIXEL: PUTPIXEL_4BPP: .EQU PUTPIXEL_X 0 .EQU PUTPIXEL_Y 4 .EQU PUTPIXEL_COLOR 8 - .EQU PUTPIXEL_PIXPOS 12 + .EQU PUTPIXEL_BPSAV 12 .EQU PUTPIXEL_FS 16 FPADJ -PUTPIXEL_FS - STORE PUTPIXEL_COLOR STORE PUTPIXEL_Y STORE PUTPIXEL_X + LOADREG BP + STORE PUTPIXEL_BPSAV + LOADC 0 + STOREREG BP ; calculate vmem address: (x / 8) + (y * 80) LOAD PUTPIXEL_X @@ -826,83 +750,37 @@ PUTPIXEL_4BPP: ADD ; add results together for vmem addr - LOADCP FB_WA - OVER - STOREI ; store to framebuffer write addr register - DROP - LOADCP FB_RA ; and to framebuffer read addr register - SWAP ; swap addr and value for STOREI - STOREI - DROP - - LOAD PUTPIXEL_X - ; |0000.0000|0000.0000|0000.0000|0000.1111| - LOADC 7 - AND ; calculate pixel position in word - LOADC 7 - SWAP - SUB ; pixpos = 7 - (x & 7) - STORE PUTPIXEL_PIXPOS + DUP + STORE.B FB_WA ; set as write and read addresses + STORE.B FB_RA + ; create pixel data from color value in + ; leftmost pixel data bits 
(31-28) LOAD PUTPIXEL_COLOR - LOAD PUTPIXEL_PIXPOS - SHR ; rcount = pixpos / 2 -ROTLOOP_: - DUP ; exit loop if rcount is 0 - CBRANCH.Z ROTLOOP_END - SWAP ; pixel value is now on top of stack - BROT ; value = value << 8 - SWAP ; rcount is now on top of stack - DEC 1 ; rcount = rcount - 1 - BRANCH ROTLOOP_ -ROTLOOP_END: - DROP ; drop rcount - ; shifted pixel value is now at top of stack - LOAD PUTPIXEL_PIXPOS - LOADC 1 - AND - CBRANCH.Z EVEN_PIXPOS - SHL 2 ; if pixpos is odd, shift by 4 bits + BROT + BROT + BROT SHL 2 -EVEN_PIXPOS: - LOAD PUTPIXEL_X - ; get bit value from x modulo 8 - LOADC 7 - AND - SHL 2 ; (x & 7) * 4 = offset into table - LOADCP INT_TO_MASK_TABLE - ADD - LOADI + SHL 2 + STORE.B FB_SHIFTER ; store pixel into shifter - ; read old vmem value - LOADCP FB_IO - LOADI - ; mask bits - AND - ; or in shifted pixel value - OR + LOAD PUTPIXEL_X ; use x coord as shift count + STORE.B FB_SHIFTCOUNT ; writing triggers shifting - ; write new value - LOADCP FB_IO - SWAP - STOREI - DROP + LOAD.B FB_SHIFTERM ; get shift result as mask + LOAD.B FB_IO ; get background pixel data + AND ; remove bits for new pixel from bg + + LOAD.B FB_SHIFTER ; load shifted pixel + OR ; OR in new pixel bits + STORE.B FB_IO ; write new pixel data word to vmem + + LOAD PUTPIXEL_BPSAV + STOREREG BP FPADJ PUTPIXEL_FS RET - .CPOOL - -INT_TO_MASK_TABLE: - .WORD %00001111_11111111_11111111_11111111 - .WORD %11110000_11111111_11111111_11111111 - .WORD %11111111_00001111_11111111_11111111 - .WORD %11111111_11110000_11111111_11111111 - .WORD %11111111_11111111_00001111_11111111 - .WORD %11111111_11111111_11110000_11111111 - .WORD %11111111_11111111_11111111_00001111 - .WORD %11111111_11111111_11111111_11110000 - ; draw a line between two points ; parameters: x0, y0, x1, y1, color .EQU DL_X0 0 From 1e56251fc1417ff53f2444c578e4a3323400c0c9 Mon Sep 17 00:00:00 2001 From: slederer Date: Sat, 31 Jan 2026 17:24:36 +0100 Subject: [PATCH 17/24] vgafb: buffer maskgen outputs to avoid timing problems --- 
lib/corelib.s | 5 ++- tridoracpu/tridoracpu.srcs/vgafb.v | 57 +++++++++++++++++------------- tridoracpu/tridoracpu.xpr | 8 ++--- 3 files changed, 37 insertions(+), 33 deletions(-) diff --git a/lib/corelib.s b/lib/corelib.s index b228d20..93dc81f 100644 --- a/lib/corelib.s +++ b/lib/corelib.s @@ -756,10 +756,9 @@ PUTPIXEL_4BPP: ; create pixel data from color value in ; leftmost pixel data bits (31-28) + LOADC 0 LOAD PUTPIXEL_COLOR - BROT - BROT - BROT + BPLC SHL 2 SHL 2 STORE.B FB_SHIFTER ; store pixel into shifter diff --git a/tridoracpu/tridoracpu.srcs/vgafb.v b/tridoracpu/tridoracpu.srcs/vgafb.v index 411e956..fd42627 100644 --- a/tridoracpu/tridoracpu.srcs/vgafb.v +++ b/tridoracpu/tridoracpu.srcs/vgafb.v @@ -162,10 +162,12 @@ module vgafb #(VMEM_ADDR_WIDTH = 15, VMEM_DATA_WIDTH = 32) ( reg [4:0] acc_shift_count; reg acc_start_shift; reg [VMEM_DATA_WIDTH-1:0] acc_mask_in; - wire [VMEM_DATA_WIDTH-1:0] acc_mask_out; - wire [VMEM_DATA_WIDTH-1:0] acc_shifter_mask; + reg [VMEM_DATA_WIDTH-1:0] acc_mask_buf; + reg [VMEM_DATA_WIDTH-1:0] acc_shiftmask_buf; + wire [VMEM_DATA_WIDTH-1:0] acc_shifter_mask = acc_shiftmask_buf; wire [VMEM_DATA_WIDTH-1:0] acc_shifter_out_h = acc_shifter_out[(VMEM_DATA_WIDTH*2)-1:VMEM_DATA_WIDTH]; wire [VMEM_DATA_WIDTH-1:0] acc_shifter_out_l = acc_shifter_out[VMEM_DATA_WIDTH-1:0]; + `endif assign vmem_rd_en = rd_en; @@ -176,9 +178,9 @@ module vgafb #(VMEM_ADDR_WIDTH = 15, VMEM_DATA_WIDTH = 32) ( (reg_sel == REG_CTL) ? status : `ifdef ENABLE_FB_ACCEL (reg_sel == REG_SHIFTER) ? acc_shifter_out_h: - (reg_sel == REG_SHIFTERM) ? acc_shifter_mask : + (reg_sel == REG_SHIFTERM) ? acc_shiftmask_buf : (reg_sel == REG_SHIFTERSP) ? acc_shifter_out_l : - (reg_sel == REG_MASKGEN) ? acc_mask_out : + (reg_sel == REG_MASKGEN) ? 
acc_mask_buf : `endif 32'hFFFFFFFF; @@ -335,27 +337,34 @@ module vgafb #(VMEM_ADDR_WIDTH = 15, VMEM_DATA_WIDTH = 32) ( acc_mask_in <= wr_data; end - assign acc_mask_out = { - {4{|{acc_mask_in[31:28]}}}, - {4{|{acc_mask_in[27:24]}}}, - {4{|{acc_mask_in[23:20]}}}, - {4{|{acc_mask_in[19:16]}}}, - {4{|{acc_mask_in[15:12]}}}, - {4{|{acc_mask_in[11:8]}}}, - {4{|{acc_mask_in[7:4]}}}, - {4{|{acc_mask_in[3:0]}}} - }; + // mask output is buffered to avoid timing problems + always @(posedge cpu_clk) + begin + acc_mask_buf <= { + {4{~|{acc_mask_in[31:28]}}}, + {4{~|{acc_mask_in[27:24]}}}, + {4{~|{acc_mask_in[23:20]}}}, + {4{~|{acc_mask_in[19:16]}}}, + {4{~|{acc_mask_in[15:12]}}}, + {4{~|{acc_mask_in[11:8]}}}, + {4{~|{acc_mask_in[7:4]}}}, + {4{~|{acc_mask_in[3:0]}}} + }; + end - assign acc_shifter_mask = { - {4{|{acc_shifter_out_h[31:28]}}}, - {4{|{acc_shifter_out_h[27:24]}}}, - {4{|{acc_shifter_out_h[23:20]}}}, - {4{|{acc_shifter_out_h[19:16]}}}, - {4{|{acc_shifter_out_h[15:12]}}}, - {4{|{acc_shifter_out_h[11:8]}}}, - {4{|{acc_shifter_out_h[7:4]}}}, - {4{|{acc_shifter_out_h[3:0]}}} - }; + always @(posedge cpu_clk) + begin + acc_shiftmask_buf = { + {4{~|{acc_shifter_out_h[31:28]}}}, + {4{~|{acc_shifter_out_h[27:24]}}}, + {4{~|{acc_shifter_out_h[23:20]}}}, + {4{~|{acc_shifter_out_h[19:16]}}}, + {4{~|{acc_shifter_out_h[15:12]}}}, + {4{~|{acc_shifter_out_h[11:8]}}}, + {4{~|{acc_shifter_out_h[7:4]}}}, + {4{~|{acc_shifter_out_h[3:0]}}} + }; + end `endif // diff --git a/tridoracpu/tridoracpu.xpr b/tridoracpu/tridoracpu.xpr index 4d21f83..a088319 100644 --- a/tridoracpu/tridoracpu.xpr +++ b/tridoracpu/tridoracpu.xpr @@ -358,9 +358,7 @@ - - Performs optimizations which creates alternative logic technology mapping, including disabling LUT combining, forcing F7/F8/F9 to logic, increasing the threshold of shift register inference. - + @@ -384,9 +382,7 @@ - - Best predicted directive for place_design. 
- + From c119a2a5bb25a12f6fc13ca1b9f0f43c9ba8703e Mon Sep 17 00:00:00 2001 From: slederer Date: Sat, 31 Jan 2026 17:26:13 +0100 Subject: [PATCH 18/24] add line/points drawing benchmark --- examples/graphbench.pas | 92 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 92 insertions(+) create mode 100644 examples/graphbench.pas diff --git a/examples/graphbench.pas b/examples/graphbench.pas new file mode 100644 index 0000000..327e72e --- /dev/null +++ b/examples/graphbench.pas @@ -0,0 +1,92 @@ +program graphbench; +var starttime,endtime:DateTime; + +procedure startBench(name:string); +begin + write(name:20, ' '); + starttime := GetTime; +end; + +procedure endBench; +var secDelta, minDelta, hourDelta:integer; + procedure write2Digits(i:integer); + begin + if i < 10 then + write('0'); + write(i); + end; +begin + endTime := GetTime; + + hourDelta := endtime.hours - starttime.hours; + minDelta := endtime.minutes - starttime.minutes; + secDelta := endtime.seconds - starttime.seconds; + + if secDelta < 0 then + begin + secDelta := 60 + secDelta; + minDelta := minDelta - 1; + end; + + if minDelta < 0 then + begin + minDelta := 60 + minDelta; + hourDelta := hourDelta - 1; + end; + + write2Digits(hourDelta); + write(':'); write2Digits(minDelta); + write(':'); write2Digits(secDelta); + writeln; +end; + +function randint(lessthan:integer):integer; +var r:integer; +begin + r := random and 511; + if r >= lessthan then + r := r - lessthan; + randint := r; +end; + +procedure drawlines(count:integer); +var i,col,x1,y1,x2,y2:integer; +begin + col := 1; + for i := 1 to count do + begin + x1 := randint(500); + y1 := randint(400); + x2 := randint(500); + y2 := randint(400); + DrawLine(x1,y1,x2,y2,col); + col := col + 1; + if col > 15 then col := 1; + end; +end; + +procedure drawpoints(count:integer); +var i,col,x,y:integer; +begin + col := 1; + for i := 1 to count do + begin + x := randint(500); + y := randint(400); + PutPixel(x,y,col); + col := col + 1; + if col > 15 then col := 
1; + end; +end; + +begin + InitGraphics; + startBench('200K points'); + drawpoints(200000); + endBench; + + InitGraphics; + startBench('10K lines'); + drawlines(10000); + endBench; +end. From 66a50d5ea86bb28891476fd41b485480cfefac31 Mon Sep 17 00:00:00 2001 From: slederer Date: Sun, 1 Feb 2026 00:44:34 +0100 Subject: [PATCH 19/24] update sprites unit to use shifter/maskgen --- examples/graphbench.pas | 39 +++++++- examples/sprites.s | 204 ++++++++++------------------------------ 2 files changed, 86 insertions(+), 157 deletions(-) diff --git a/examples/graphbench.pas b/examples/graphbench.pas index 327e72e..9abbfba 100644 --- a/examples/graphbench.pas +++ b/examples/graphbench.pas @@ -1,5 +1,17 @@ program graphbench; +uses sprites; + var starttime,endtime:DateTime; + spriteData:SpritePixels; + +procedure readSpriteData(filename:string); +var f:file; +begin + open(f,filename,ModeReadOnly); + seek(f,8); (* skip file header *) + read(f,spriteData); + close(f); +end; procedure startBench(name:string); begin @@ -13,7 +25,7 @@ var secDelta, minDelta, hourDelta:integer; begin if i < 10 then write('0'); - write(i); + write(i); end; begin endTime := GetTime; @@ -49,6 +61,20 @@ begin randint := r; end; +procedure drawsprites(count:integer); +var i,col,x,y:integer; +begin + col := 1; + for i := 1 to count do + begin + x := randint(350); + y := randint(350); + PutSprite(x,y,spriteData); + col := col + 1; + if col > 15 then col := 1; + end; +end; + procedure drawlines(count:integer); var i,col,x1,y1,x2,y2:integer; begin @@ -80,13 +106,20 @@ begin end; begin + readSpriteData('rocket.sprt'); + InitGraphics; - startBench('200K points'); + startBench('points 200K'); drawpoints(200000); endBench; InitGraphics; - startBench('10K lines'); + startBench('lines 10K'); drawlines(10000); endBench; + + InitGraphics; + startBench('sprites 50K'); + drawsprites(50000); + endBench; end. 
diff --git a/examples/sprites.s b/examples/sprites.s index 6962eda..ab2e580 100644 --- a/examples/sprites.s +++ b/examples/sprites.s @@ -6,28 +6,13 @@ .EQU FB_WA $904 .EQU FB_IO $908 .EQU FB_PS $90C - -; calculate mask for a word of pixels -; args: word of pixels with four bits per pixel -; returns: value that masks out all pixels that are set -CALC_MASK: - LOADC $F ; pixel mask -C_M_L0: - SWAP ; swap mask and pixels value - AND.S1.X2Y ; isolate one pixel, keep args - CBRANCH.Z C_M_L1 ; if pixel is zero, dont set mask bits - OVER ; copy current mask - OR ; or into pixels value -C_M_L1: - SWAP ; swap back, ToS is now mask bits - SHL 2 ; shift mask for next pixel to the left - SHL 2 - - DUP - CBRANCH.NZ C_M_L0 ; if mask is zero, we are done - DROP ; remove mask bits - NOT ; invert result - RET + .EQU FB_PD $910 + .EQU FB_CTL $914 + .EQU FB_SHIFTER $918 + .EQU FB_SHIFTCOUNT $91C + .EQU FB_SHIFTERM $920 + .EQU FB_SHIFTERSP $924 + .EQU FB_MASKGEN $928 ; calculate vmem address from coordinates ; args: x,y @@ -67,13 +52,19 @@ CALC_VMEM_ADDR: .EQU PS_SHIFT_C 20 .EQU PS_SPILL 24 .EQU PS_STRIPE_C 28 - .EQU PS_FS 32 + .EQU PS_BPSAVE 32 + .EQU PS_FS 36 PUTSPRITE: FPADJ -PS_FS STORE PS_SPRITE_DATA STORE PS_Y STORE PS_X + LOADREG BP + STORE PS_BPSAVE + LOADC 0 + STOREREG BP + ; calculate vmem address LOAD PS_X LOAD PS_Y @@ -81,11 +72,6 @@ PUTSPRITE: CALL STORE PS_VMEM_ADDR - LOAD PS_X ; shift count = x mod 8 - LOADC 7 - AND - STORE PS_SHIFT_C - LOADC SPRITE_HEIGHT STORE PS_SPRITE_LINES @@ -93,12 +79,10 @@ PUTSPRITE: PS_LOOP1: ; set read and write address ; in the vga controller - LOADC FB_RA ; read address register LOAD PS_VMEM_ADDR - STOREI 4 ; use autoincrement to get to the next register - LOAD PS_VMEM_ADDR - STOREI - DROP + DUP + STORE.B FB_RA + STORE.B FB_WA LOAD PS_SPRITE_DATA ; address of sprite data DUP @@ -106,61 +90,19 @@ PS_LOOP1: STORE PS_SPRITE_DATA ; and store it again LOADI ; load word from orig. 
address + ; ------- one word of sprite pixels on stack - LOADC 0 - STORE PS_SPILL + STORE.B FB_SHIFTER + LOAD PS_X + STORE.B FB_SHIFTCOUNT - ; loop to shift pixel data to right - LOAD PS_SHIFT_C ; load shift count -PS_LOOP2: - DUP ; test it for zero - CBRANCH.Z PS_LOOP2_X + LOAD.B FB_SHIFTERM ; get shifted mask + LOAD.B FB_IO ; and background pixel data + AND ; remove foreground pixels - SWAP ; swap count with pixels - - ; save the pixel that is shifted out - LOADC $F ; mask the four bits - AND.S0 ; keep original value on stack - BROT ; and move them to MSB - BROT - BROT - SHL 2 - SHL 2 ; shift by 28 in total - - LOAD PS_SPILL ; load spill bits - SHR ; shift by four to make space - SHR - SHR - SHR - OR ; or with orig value - STORE PS_SPILL ; store new value - - SHR ; shift pixels right - SHR ; four bits per pixel - SHR - SHR - - SWAP ; swap back, count now ToS - DEC 1 - BRANCH PS_LOOP2 -PS_LOOP2_X: - DROP ; remove shift count, shifted pixels now in ToS - - DUP - LOADCP CALC_MASK ; calculate sprite mask for this word - CALL - - LOADCP FB_IO ; address of the i/o register - LOADI ; read word from video mem - - AND ; and word with mask - - OR ; OR sprite data with original pixels - - LOADCP FB_IO - SWAP - STOREI ; store result into i/o reg - DROP + LOAD.B FB_SHIFTER ; get shifted pixels + OR ; combine with background + STORE.B FB_IO ; store into vmem ; set counter for remaining stripes LOADC SPRITE_STRIPES - 1 @@ -170,8 +112,8 @@ PS_LOOP2_X: ; process spilled bits and next vertical stripe of sprite data ; PS_NEXT_STRIPE: - ; put spill bits on stack for later - LOAD PS_SPILL + ;use spill bits from first column + LOAD.B FB_SHIFTERSP LOAD PS_SPRITE_DATA ; address of sprite data DUP @@ -179,65 +121,20 @@ PS_NEXT_STRIPE: STORE PS_SPRITE_DATA ; and store it again LOADI ; load word from orig. 
address - ; reset spill bits - LOADC 0 - STORE PS_SPILL - - ; last spill bits are on ToS now - - ; shift pixel data to right - LOAD PS_SHIFT_C ; load shift count -PS_LOOP3: ; test it for zero + STORE.B FB_SHIFTER ; store into shifter + LOAD PS_X + STORE.B FB_SHIFTCOUNT ; shift stuff + LOAD.B FB_SHIFTER ; get shifted pixels + OR ; combine with spill bits (see above) DUP - CBRANCH.Z PS_LOOP3_X + STORE.B FB_MASKGEN ; store to mask reg to get new mask - SWAP ; swap count with pixels + LOAD.B FB_MASKGEN ; get mask for spill bits + shifted pixels + LOAD.B FB_IO ; get vmem data + AND ; remove foreground pixels from bg - ; save the pixel that is shifted out - LOADC $F ; mask the four bits - AND.S0 ; keep original value on stack - BROT ; and move them to MSB - BROT - BROT - SHL 2 - SHL 2 ; shift by 28 in total - - LOAD PS_SPILL ; load spill bits - SHR ; shift by four to make space - SHR - SHR - SHR - OR ; or with orig value - STORE PS_SPILL ; store new value - - SHR ; shift pixels right - SHR ; four bits per pixel - SHR - SHR - - SWAP ; swap back, count now ToS - DEC 1 - BRANCH PS_LOOP3 -PS_LOOP3_X: - DROP ; remove shift count, shifted pixels now in ToS - - OR ; or together with spill bits - - DUP - LOADCP CALC_MASK ; calculate sprite mask - CALL - - LOADCP FB_IO ; load original pixels - LOADI - - AND ; and with mask - - OR ; or together with original pixels - - LOADCP FB_IO - SWAP - STOREI - DROP + OR ; combine with shifted pixels + STORE.B FB_IO ; write to vmem LOAD PS_STRIPE_C ; decrement stripe count DEC 1 @@ -246,22 +143,18 @@ PS_LOOP3_X: CBRANCH.NZ PS_NEXT_STRIPE ; if non-zero, next stripe ; write spilled bits of the last stripe into next vmem word - LOAD PS_SPILL ; get spill bits + LOAD.B FB_SHIFTERSP ; get spill bits DUP - LOADCP CALC_MASK ; calculate sprite mask for spill bits - CALL + STORE.B FB_MASKGEN + LOAD.B FB_MASKGEN ; get sprite mask for spill bits - LOADCP FB_IO - LOADI ; load next vmem word + LOAD.B FB_IO ; load next vmem word AND ; apply sprite mask OR ; 
OR in spill bits - LOADCP FB_IO - SWAP ; swap pixels and addr - STOREI ; write back - DROP - + STORE.B FB_IO ; write to vmem + LOAD PS_SPRITE_LINES ; decrement lines count DEC 1 DUP @@ -275,7 +168,10 @@ PS_LOOP3_X: BRANCH PS_LOOP1 PS_L_XT: DROP - + + LOAD PS_BPSAVE + STOREREG BP + FPADJ PS_FS RET From bf813fac1d43250d26eac49c1a2847507fee2919 Mon Sep 17 00:00:00 2001 From: slederer Date: Sun, 1 Feb 2026 11:52:16 +0100 Subject: [PATCH 20/24] corelib: revert PUTPIXEL changes - changes to corelib made sdcard i/o unstable for unknown reasons and the performance improvement for PUTPIXEL was only about 10% --- lib/corelib.s | 189 +++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 156 insertions(+), 33 deletions(-) diff --git a/lib/corelib.s b/lib/corelib.s index 93dc81f..a21b95c 100644 --- a/lib/corelib.s +++ b/lib/corelib.s @@ -706,32 +706,108 @@ CMPWORDS_XT2: .EQU FB_PS $90C .EQU FB_PD $910 .EQU FB_CTL $914 - .EQU FB_SHIFTER $918 - .EQU FB_SHIFTCOUNT $91C - .EQU FB_SHIFTERM $920 - .EQU FB_SHIFTERSP $924 - .EQU FB_MASKGEN $928 +; set a pixel in fb memory +; parameters: x,y - coordinates +PUTPIXEL_1BPP: + ; calculate vmem address: + OVER ; duplicate x + ; divide x by 32 + SHR + SHR + SHR + SHR + SHR + SWAP + ; multiply y by words per line + SHL 2 + SHL 2 + SHL -; draw a single pixel -; args: x, y, color + ADD ; add results together for vmem addr + DUP + LOADCP FB_WA + SWAP + STOREI ; store to framebuffer write addr register + DROP + LOADCP FB_RA ; and to framebuffer read addr register + SWAP + STOREI + DROP + + ; x is now at top of stack + ; get bit value from x modulo 32 + LOADC 31 + AND + SHL 2 ; (x & 31) * 4 = offset into table + LOADCP INT_TO_PIX_TABLE + ADD + LOADI + + LOADCP FB_IO + ; read old vmem value + LOADCP FB_IO + LOADI + ; or in new bit + OR + ; write new value + STOREI + DROP + + RET + +INT_TO_PIX_TABLE: + .WORD %10000000_00000000_00000000_00000000 + .WORD %01000000_00000000_00000000_00000000 + .WORD %00100000_00000000_00000000_00000000 + 
.WORD %00010000_00000000_00000000_00000000 + .WORD %00001000_00000000_00000000_00000000 + .WORD %00000100_00000000_00000000_00000000 + .WORD %00000010_00000000_00000000_00000000 + .WORD %00000001_00000000_00000000_00000000 + .WORD %00000000_10000000_00000000_00000000 + .WORD %00000000_01000000_00000000_00000000 + .WORD %00000000_00100000_00000000_00000000 + .WORD %00000000_00010000_00000000_00000000 + .WORD %00000000_00001000_00000000_00000000 + .WORD %00000000_00000100_00000000_00000000 + .WORD %00000000_00000010_00000000_00000000 + .WORD %00000000_00000001_00000000_00000000 + .WORD %00000000_00000000_10000000_00000000 + .WORD %00000000_00000000_01000000_00000000 + .WORD %00000000_00000000_00100000_00000000 + .WORD %00000000_00000000_00010000_00000000 + .WORD %00000000_00000000_00001000_00000000 + .WORD %00000000_00000000_00000100_00000000 + .WORD %00000000_00000000_00000010_00000000 + .WORD %00000000_00000000_00000001_00000000 + .WORD %00000000_00000000_00000000_10000000 + .WORD %00000000_00000000_00000000_01000000 + .WORD %00000000_00000000_00000000_00100000 + .WORD %00000000_00000000_00000000_00010000 + .WORD %00000000_00000000_00000000_00001000 + .WORD %00000000_00000000_00000000_00000100 + .WORD %00000000_00000000_00000000_00000010 + .WORD %00000000_00000000_00000000_00000001 + +PUTMPIXEL: + LOADC 1 +; set a pixel in fb memory +; parameters: x,y,color - coordinates, color value (0-15) PUTPIXEL: PUTPIXEL_4BPP: .EQU PUTPIXEL_X 0 .EQU PUTPIXEL_Y 4 .EQU PUTPIXEL_COLOR 8 - .EQU PUTPIXEL_BPSAV 12 + .EQU PUTPIXEL_PIXPOS 12 .EQU PUTPIXEL_FS 16 FPADJ -PUTPIXEL_FS + STORE PUTPIXEL_COLOR STORE PUTPIXEL_Y STORE PUTPIXEL_X - LOADREG BP - STORE PUTPIXEL_BPSAV - LOADC 0 - STOREREG BP ; calculate vmem address: (x / 8) + (y * 80) LOAD PUTPIXEL_X @@ -750,36 +826,83 @@ PUTPIXEL_4BPP: ADD ; add results together for vmem addr - DUP - STORE.B FB_WA ; set as write and read addresses - STORE.B FB_RA + LOADCP FB_WA + OVER + STOREI ; store to framebuffer write addr register + DROP + 
LOADCP FB_RA ; and to framebuffer read addr register + SWAP ; swap addr and value for STOREI + STOREI + DROP + + LOAD PUTPIXEL_X + ; |0000.0000|0000.0000|0000.0000|0000.1111| + LOADC 7 + AND ; calculate pixel position in word + LOADC 7 + SWAP + SUB ; pixpos = 7 - (x & 7) + STORE PUTPIXEL_PIXPOS - ; create pixel data from color value in - ; leftmost pixel data bits (31-28) - LOADC 0 LOAD PUTPIXEL_COLOR - BPLC + LOAD PUTPIXEL_PIXPOS + SHR ; rcount = pixpos / 2 +ROTLOOP_: + DUP ; exit loop if rcount is 0 + CBRANCH.Z ROTLOOP_END + SWAP ; pixel value is now on top of stack + BROT ; value = value << 8 + SWAP ; rcount is now on top of stack + DEC 1 ; rcount = rcount - 1 + BRANCH ROTLOOP_ +ROTLOOP_END: + DROP ; drop rcount + ; shifted pixel value is now at top of stack + LOAD PUTPIXEL_PIXPOS + LOADC 1 + AND + CBRANCH.Z EVEN_PIXPOS + SHL 2 ; if pixpos is odd, shift by 4 bits SHL 2 - SHL 2 - STORE.B FB_SHIFTER ; store pixel into shifter +EVEN_PIXPOS: + LOAD PUTPIXEL_X + ; get bit value from x modulo 8 + LOADC 7 + AND + SHL 2 ; (x & 7) * 4 = offset into table + LOADCP INT_TO_MASK_TABLE + ADD + LOADI - LOAD PUTPIXEL_X ; use x coord as shift count - STORE.B FB_SHIFTCOUNT ; writing triggers shifting + ; read old vmem value + LOADCP FB_IO + LOADI + ; mask bits + AND + ; or in shifted pixel value + OR - LOAD.B FB_SHIFTERM ; get shift result as mask - LOAD.B FB_IO ; get background pixel data - AND ; remove bits for new pixel from bg - - LOAD.B FB_SHIFTER ; load shifted pixel - OR ; OR in new pixel bits - STORE.B FB_IO ; write new pixel data word to vmem - - LOAD PUTPIXEL_BPSAV - STOREREG BP + ; write new value + LOADCP FB_IO + SWAP + STOREI + DROP FPADJ PUTPIXEL_FS RET + .CPOOL + +INT_TO_MASK_TABLE: + .WORD %00001111_11111111_11111111_11111111 + .WORD %11110000_11111111_11111111_11111111 + .WORD %11111111_00001111_11111111_11111111 + .WORD %11111111_11110000_11111111_11111111 + .WORD %11111111_11111111_00001111_11111111 + .WORD %11111111_11111111_11110000_11111111 + .WORD 
%11111111_11111111_11111111_00001111 + .WORD %11111111_11111111_11111111_11110000 + ; draw a line between two points ; parameters: x0, y0, x1, y1, color .EQU DL_X0 0 From f90d52926f7a90f52a6a47e858b570ec99a063fe Mon Sep 17 00:00:00 2001 From: slederer Date: Sun, 1 Feb 2026 22:08:06 +0100 Subject: [PATCH 21/24] vgafb: simplify maskgen a bit to avoid timing problems --- examples/sprites.s | 3 +++ tridoracpu/tridoracpu.srcs/vgafb.v | 32 ++++++++++++++-------------- tridoracpu/tridoracpu.xpr | 34 ++++++++++++------------------ utils/tdrimg.py | 1 + 4 files changed, 33 insertions(+), 37 deletions(-) diff --git a/examples/sprites.s b/examples/sprites.s index ab2e580..5f50081 100644 --- a/examples/sprites.s +++ b/examples/sprites.s @@ -97,6 +97,7 @@ PS_LOOP1: STORE.B FB_SHIFTCOUNT LOAD.B FB_SHIFTERM ; get shifted mask + NOT LOAD.B FB_IO ; and background pixel data AND ; remove foreground pixels @@ -130,6 +131,7 @@ PS_NEXT_STRIPE: STORE.B FB_MASKGEN ; store to mask reg to get new mask LOAD.B FB_MASKGEN ; get mask for spill bits + shifted pixels + NOT LOAD.B FB_IO ; get vmem data AND ; remove foreground pixels from bg @@ -147,6 +149,7 @@ PS_NEXT_STRIPE: DUP STORE.B FB_MASKGEN LOAD.B FB_MASKGEN ; get sprite mask for spill bits + NOT LOAD.B FB_IO ; load next vmem word AND ; apply sprite mask diff --git a/tridoracpu/tridoracpu.srcs/vgafb.v b/tridoracpu/tridoracpu.srcs/vgafb.v index fd42627..49dad2d 100644 --- a/tridoracpu/tridoracpu.srcs/vgafb.v +++ b/tridoracpu/tridoracpu.srcs/vgafb.v @@ -341,28 +341,28 @@ module vgafb #(VMEM_ADDR_WIDTH = 15, VMEM_DATA_WIDTH = 32) ( always @(posedge cpu_clk) begin acc_mask_buf <= { - {4{~|{acc_mask_in[31:28]}}}, - {4{~|{acc_mask_in[27:24]}}}, - {4{~|{acc_mask_in[23:20]}}}, - {4{~|{acc_mask_in[19:16]}}}, - {4{~|{acc_mask_in[15:12]}}}, - {4{~|{acc_mask_in[11:8]}}}, - {4{~|{acc_mask_in[7:4]}}}, - {4{~|{acc_mask_in[3:0]}}} + {4{|{acc_mask_in[31:28]}}}, + {4{|{acc_mask_in[27:24]}}}, + {4{|{acc_mask_in[23:20]}}}, + {4{|{acc_mask_in[19:16]}}}, + 
{4{|{acc_mask_in[15:12]}}}, + {4{|{acc_mask_in[11:8]}}}, + {4{|{acc_mask_in[7:4]}}}, + {4{|{acc_mask_in[3:0]}}} }; end always @(posedge cpu_clk) begin acc_shiftmask_buf = { - {4{~|{acc_shifter_out_h[31:28]}}}, - {4{~|{acc_shifter_out_h[27:24]}}}, - {4{~|{acc_shifter_out_h[23:20]}}}, - {4{~|{acc_shifter_out_h[19:16]}}}, - {4{~|{acc_shifter_out_h[15:12]}}}, - {4{~|{acc_shifter_out_h[11:8]}}}, - {4{~|{acc_shifter_out_h[7:4]}}}, - {4{~|{acc_shifter_out_h[3:0]}}} + {4{|{acc_shifter_out_h[31:28]}}}, + {4{|{acc_shifter_out_h[27:24]}}}, + {4{|{acc_shifter_out_h[23:20]}}}, + {4{|{acc_shifter_out_h[19:16]}}}, + {4{|{acc_shifter_out_h[15:12]}}}, + {4{|{acc_shifter_out_h[11:8]}}}, + {4{|{acc_shifter_out_h[7:4]}}}, + {4{|{acc_shifter_out_h[3:0]}}} }; end `endif diff --git a/tridoracpu/tridoracpu.xpr b/tridoracpu/tridoracpu.xpr index a088319..5d8ff88 100644 --- a/tridoracpu/tridoracpu.xpr +++ b/tridoracpu/tridoracpu.xpr @@ -356,14 +356,12 @@ - + - - - - - - + + Vivado Synthesis Defaults + + @@ -380,24 +378,18 @@ - + - + + Default settings for Implementation. 
+ - - - + - - - + - - - - - - + + diff --git a/utils/tdrimg.py b/utils/tdrimg.py index b7ce4cb..4eeaead 100644 --- a/utils/tdrimg.py +++ b/utils/tdrimg.py @@ -614,6 +614,7 @@ def create_image_with_stuff(imgfile): slotnr = putfile("../examples/benchmarks.pas", None , f, part, partstart, slotnr) slotnr = putfile("../examples/animate.pas", None , f, part, partstart, slotnr) + slotnr = putfile("../examples/graphbench.pas", None , f, part, partstart, slotnr) slotnr = putfile("../examples/sprites.inc", None , f, part, partstart, slotnr) slotnr = putfile("../examples/sprites.s", None , f, part, partstart, slotnr) slotnr = putfile("../examples/background.pict", None , f, part, partstart, slotnr) From 885e50c1c09838ca19f8560774cf225011f169f4 Mon Sep 17 00:00:00 2001 From: slederer Date: Sun, 1 Feb 2026 22:46:18 +0100 Subject: [PATCH 22/24] corelib: restore new PUTPIXEL implementation --- lib/corelib.s | 190 +++++++++----------------------------------------- 1 file changed, 34 insertions(+), 156 deletions(-) diff --git a/lib/corelib.s b/lib/corelib.s index a21b95c..c57a94e 100644 --- a/lib/corelib.s +++ b/lib/corelib.s @@ -706,108 +706,32 @@ CMPWORDS_XT2: .EQU FB_PS $90C .EQU FB_PD $910 .EQU FB_CTL $914 -; set a pixel in fb memory -; parameters: x,y - coordinates -PUTPIXEL_1BPP: - ; calculate vmem address: - OVER ; duplicate x - ; divide x by 32 - SHR - SHR - SHR - SHR - SHR - SWAP - ; multiply y by words per line - SHL 2 - SHL 2 - SHL + .EQU FB_SHIFTER $918 + .EQU FB_SHIFTCOUNT $91C + .EQU FB_SHIFTERM $920 + .EQU FB_SHIFTERSP $924 + .EQU FB_MASKGEN $928 - ADD ; add results together for vmem addr +; draw a single pixel +; args: x, y, color - DUP - LOADCP FB_WA - SWAP - STOREI ; store to framebuffer write addr register - DROP - LOADCP FB_RA ; and to framebuffer read addr register - SWAP - STOREI - DROP - - ; x is now at top of stack - ; get bit value from x modulo 32 - LOADC 31 - AND - SHL 2 ; (x & 31) * 4 = offset into table - LOADCP INT_TO_PIX_TABLE - ADD - LOADI - - LOADCP 
FB_IO - ; read old vmem value - LOADCP FB_IO - LOADI - ; or in new bit - OR - ; write new value - STOREI - DROP - - RET - -INT_TO_PIX_TABLE: - .WORD %10000000_00000000_00000000_00000000 - .WORD %01000000_00000000_00000000_00000000 - .WORD %00100000_00000000_00000000_00000000 - .WORD %00010000_00000000_00000000_00000000 - .WORD %00001000_00000000_00000000_00000000 - .WORD %00000100_00000000_00000000_00000000 - .WORD %00000010_00000000_00000000_00000000 - .WORD %00000001_00000000_00000000_00000000 - .WORD %00000000_10000000_00000000_00000000 - .WORD %00000000_01000000_00000000_00000000 - .WORD %00000000_00100000_00000000_00000000 - .WORD %00000000_00010000_00000000_00000000 - .WORD %00000000_00001000_00000000_00000000 - .WORD %00000000_00000100_00000000_00000000 - .WORD %00000000_00000010_00000000_00000000 - .WORD %00000000_00000001_00000000_00000000 - .WORD %00000000_00000000_10000000_00000000 - .WORD %00000000_00000000_01000000_00000000 - .WORD %00000000_00000000_00100000_00000000 - .WORD %00000000_00000000_00010000_00000000 - .WORD %00000000_00000000_00001000_00000000 - .WORD %00000000_00000000_00000100_00000000 - .WORD %00000000_00000000_00000010_00000000 - .WORD %00000000_00000000_00000001_00000000 - .WORD %00000000_00000000_00000000_10000000 - .WORD %00000000_00000000_00000000_01000000 - .WORD %00000000_00000000_00000000_00100000 - .WORD %00000000_00000000_00000000_00010000 - .WORD %00000000_00000000_00000000_00001000 - .WORD %00000000_00000000_00000000_00000100 - .WORD %00000000_00000000_00000000_00000010 - .WORD %00000000_00000000_00000000_00000001 - -PUTMPIXEL: - LOADC 1 -; set a pixel in fb memory -; parameters: x,y,color - coordinates, color value (0-15) PUTPIXEL: PUTPIXEL_4BPP: .EQU PUTPIXEL_X 0 .EQU PUTPIXEL_Y 4 .EQU PUTPIXEL_COLOR 8 - .EQU PUTPIXEL_PIXPOS 12 + .EQU PUTPIXEL_BPSAV 12 .EQU PUTPIXEL_FS 16 FPADJ -PUTPIXEL_FS - STORE PUTPIXEL_COLOR STORE PUTPIXEL_Y STORE PUTPIXEL_X + LOADREG BP + STORE PUTPIXEL_BPSAV + LOADC 0 + STOREREG BP ; calculate vmem 
address: (x / 8) + (y * 80) LOAD PUTPIXEL_X @@ -826,83 +750,37 @@ PUTPIXEL_4BPP: ADD ; add results together for vmem addr - LOADCP FB_WA - OVER - STOREI ; store to framebuffer write addr register - DROP - LOADCP FB_RA ; and to framebuffer read addr register - SWAP ; swap addr and value for STOREI - STOREI - DROP - - LOAD PUTPIXEL_X - ; |0000.0000|0000.0000|0000.0000|0000.1111| - LOADC 7 - AND ; calculate pixel position in word - LOADC 7 - SWAP - SUB ; pixpos = 7 - (x & 7) - STORE PUTPIXEL_PIXPOS + DUP + STORE.B FB_WA ; set as write and read addresses + STORE.B FB_RA + ; create pixel data from color value in + ; leftmost pixel data bits (31-28) + LOADC 0 LOAD PUTPIXEL_COLOR - LOAD PUTPIXEL_PIXPOS - SHR ; rcount = pixpos / 2 -ROTLOOP_: - DUP ; exit loop if rcount is 0 - CBRANCH.Z ROTLOOP_END - SWAP ; pixel value is now on top of stack - BROT ; value = value << 8 - SWAP ; rcount is now on top of stack - DEC 1 ; rcount = rcount - 1 - BRANCH ROTLOOP_ -ROTLOOP_END: - DROP ; drop rcount - ; shifted pixel value is now at top of stack - LOAD PUTPIXEL_PIXPOS - LOADC 1 - AND - CBRANCH.Z EVEN_PIXPOS - SHL 2 ; if pixpos is odd, shift by 4 bits + BPLC SHL 2 -EVEN_PIXPOS: - LOAD PUTPIXEL_X - ; get bit value from x modulo 8 - LOADC 7 - AND - SHL 2 ; (x & 7) * 4 = offset into table - LOADCP INT_TO_MASK_TABLE - ADD - LOADI + SHL 2 + STORE.B FB_SHIFTER ; store pixel into shifter - ; read old vmem value - LOADCP FB_IO - LOADI - ; mask bits - AND - ; or in shifted pixel value - OR + LOAD PUTPIXEL_X ; use x coord as shift count + STORE.B FB_SHIFTCOUNT ; writing triggers shifting - ; write new value - LOADCP FB_IO - SWAP - STOREI - DROP + LOAD.B FB_SHIFTERM ; get shift result as mask + NOT ; invert to get background mask + LOAD.B FB_IO ; get background pixel data + AND ; remove bits for new pixel from bg + + LOAD.B FB_SHIFTER ; load shifted pixel + OR ; OR in new pixel bits + STORE.B FB_IO ; write new pixel data word to vmem + + LOAD PUTPIXEL_BPSAV + STOREREG BP FPADJ PUTPIXEL_FS RET - 
.CPOOL - -INT_TO_MASK_TABLE: - .WORD %00001111_11111111_11111111_11111111 - .WORD %11110000_11111111_11111111_11111111 - .WORD %11111111_00001111_11111111_11111111 - .WORD %11111111_11110000_11111111_11111111 - .WORD %11111111_11111111_00001111_11111111 - .WORD %11111111_11111111_11110000_11111111 - .WORD %11111111_11111111_11111111_00001111 - .WORD %11111111_11111111_11111111_11110000 - ; draw a line between two points ; parameters: x0, y0, x1, y1, color .EQU DL_X0 0 From 4ad879ba68b4153d83df94f08c9d372c34679a27 Mon Sep 17 00:00:00 2001 From: slederer Date: Sun, 1 Feb 2026 23:27:25 +0100 Subject: [PATCH 23/24] Update documentation --- LICENSE.md | 2 +- doc/mem.md | 5 ++-- doc/tdraudio.md | 10 ++++---- doc/vga.md | 68 ++++++++++++++++++++++++++++++++++++++++++++----- 4 files changed, 70 insertions(+), 15 deletions(-) diff --git a/LICENSE.md b/LICENSE.md index 3755dbb..6392510 100644 --- a/LICENSE.md +++ b/LICENSE.md @@ -4,7 +4,7 @@ All files, except where explicitly stated otherwise, are licensed according to t ------------------------------------------------------------------------------ -Copyright 2024 Sebastian Lederer +Copyright 2024-2026 Sebastian Lederer Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: diff --git a/doc/mem.md b/doc/mem.md index f7dbc2b..29177b2 100644 --- a/doc/mem.md +++ b/doc/mem.md @@ -22,11 +22,12 @@ The _BSEL_ and _BPLC_ instructions are designed to assist with accessing bytes w The byte ordering is big-endian. ## Accessing the I/O Area -The I/O area organizes memory slightly different. Here, pointing out individual bytes is not very useful, so the I/O controllers use register addresses with increments of one. In practice, there is only the VGA framebuffer controller which uses multiple registers. +The I/O area uses the same word addressing in increments of four to access the registers of the I/O controllers. 
In practice, only the VGA framebuffer controller and the audio controller use multiple registers. +For the other controllers, there is a single 32-bit register that is mirrored across the entire address space of the corresponding I/O slot. The individual I/O controllers each have a memory area of 128 bytes, so there is a maximum number of 16 I/O controllers. -Currently, only I/O slots 0-3 are being used. +Currently, only I/O slots 0-4 are being used. |I/O slot| Address | Controller | |--------|---------|------------| diff --git a/doc/tdraudio.md b/doc/tdraudio.md index 999ebfc..5d8b22f 100644 --- a/doc/tdraudio.md +++ b/doc/tdraudio.md @@ -10,12 +10,12 @@ For the first channel the register addresses are: |Address|Description| |-------|-----------| | $A00 | Control Register | -| $A01 | Clock Divider Register | -| $A02 | Amplitude Register | +| $A04 | Clock Divider Register | +| $A08 | Amplitude Register | -The register addresses for the second channel start at $A04, -the third channel at $A08 -and the fourth channel at $A0C. +The register addresses for the second channel start at $A10, +the third channel at $A20 +and the fourth channel at $A30.
## Reading the control register diff --git a/doc/vga.md b/doc/vga.md index b53f56d..76520f2 100644 --- a/doc/vga.md +++ b/doc/vga.md @@ -4,13 +4,16 @@ Registers |Name|Address|Description| |----|-------|-----------| |_FB_RA_ | $900 | Read Address | -|_FB_WA_ | $901 | Write Address | -| _FB_IO_ | $902 | I/O Register | -| _FB_PS_ | $903 | Palette Select | -| _FB_PD_ | $904 | Palette Data | -| _FB_CTL_ | $905 | Control Register | - - +|_FB_WA_ | $904 | Write Address | +| _FB_IO_ | $908 | I/O Register | +| _FB_PS_ | $90C | Palette Select | +| _FB_PD_ | $910 | Palette Data | +| _FB_CTL_ | $914 | Control Register | +| _FB_SHIFTER_ | $918 | Shift Assist Register | +| _FB_SHIFTCOUNT_ | $91C | Shift Count Register | +| _FB_SHIFTERM_ | $920 | Shifter Mask Register | +| _FB_SHIFTERSP_ | $924 | Shifter Spill Register | +| _FB_MASKGEN_ | $928 | Mask Generator Register | ## Pixel Data Pixel data is organized in 32-bit-words. With four bits per pixel, one word @@ -81,3 +84,54 @@ The control register contains status information. It can only be read. The _m_ field indicates the current graphics mode. At the time of writing, it is always 1 which denotes a 640x400x4 mode. The _vb_ bit is 1 when the video signal generator is in its vertical blank phase. + +## Shift Assist Register +The *shift assist register* can be used to accelerate shifting pixel/bitmap data. +Writing a word of pixel data to this register initialises the shifting process. + +After writing to the shift count register (see below), reading the shift assist +register retrieves the shifted pixel data. + +Writing to the shift assist register resets the shift count. + +## Shift Count Register +Writing a number from 0 to 7 to the *shift count register* triggers shifting the +contents of the shift assist register. The pixel data is shifted to the right +by four bits (one pixel) per unit of shift count. Bits 31-3 of the shift count are ignored, so you can +directly write a horizontal screen coordinate to the register.
+ +This register cannot be read. + +## Shifter Mask Register +The *shifter mask register* contains the shifted pixel data converted into +a mask. See the *mask generator register* for an +explanation of the mask. + +## Shifter Spill Register +The *shifter spill register* contains the pixel data that has +been shifted out to the right. For example, if the shift count is two, +the spill register contains the two rightmost pixels (bits 7-0) of +the original pixel data, placed into the two leftmost pixels (bits 31-24). + +The rest of the register is set to zero. + +## Mask Generator Register +The *mask generator register* creates a mask from pixel data. +For each pixel (four bits), the corresponding four mask bits +are all set to one if the pixel value is not zero, and to zero otherwise. + +This can be used to combine foreground and background pixel data, +treating a foreground pixel value of zero as transparent. + +Usually, the mask is inverted with a *NOT* instruction +and then applied to the background with an *AND* instruction, +which clears the background pixels that are set in the foreground, +before foreground and background are combined with *OR*.
+ +Example in hexadecimal, each digit is a pixel: +| Pixel Data | Mask | +|------------|------| +| $00000000 | $00000000 | +| $00000001 | $0000000F | +| $0407000F | $0F0F000F | +| $1234ABC0 | $FFFFFFF0 | From 4d103f99ec041a5e50ec5fbf64b8dcdac144d7ec Mon Sep 17 00:00:00 2001 From: slederer Date: Mon, 2 Feb 2026 00:33:50 +0100 Subject: [PATCH 24/24] corelib: PUTPIXEL can draw color 0 again --- lib/corelib.s | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/lib/corelib.s b/lib/corelib.s index c57a94e..1ac12e9 100644 --- a/lib/corelib.s +++ b/lib/corelib.s @@ -754,6 +754,9 @@ PUTPIXEL_4BPP: STORE.B FB_WA ; set as write and read addresses STORE.B FB_RA + LOAD PUTPIXEL_COLOR + CBRANCH.Z PUTPX_CLR ; color 0 is special case + ; create pixel data from color value in ; leftmost pixel data bits (31-28) LOADC 0 @@ -775,12 +778,29 @@ PUTPIXEL_4BPP: OR ; OR in new pixel bits STORE.B FB_IO ; write new pixel data word to vmem +PUTPX_XT: LOAD PUTPIXEL_BPSAV STOREREG BP FPADJ PUTPIXEL_FS RET +PUTPX_CLR: + LOADCP $F0000000 ; mask for leftmost pixel + STORE.B FB_SHIFTER ; shift accordingly + LOAD PUTPIXEL_X + STORE.B FB_SHIFTCOUNT + + LOAD.B FB_SHIFTER ; get shifted value + NOT ; invert for real mask + LOAD.B FB_IO ; get background pixels + AND ; clear pixel with mask + STORE.B FB_IO ; no need to OR in new pixel, just store to vmem + + BRANCH PUTPX_XT + + + ; draw a line between two points ; parameters: x0, y0, x1, y1, color .EQU DL_X0 0