Processor Branches and RAM¶
Overview¶
This lecture covers the last two major pieces needed to finish the single-cycle RISC-V processor: conditional branches and data memory (RAM). We start by inventorying every instruction the processor must still support, grouped into data processing, control, and memory categories. We then build the Branch Unit (BU) that compares two register values to decide whether a branch is taken, and we work out how the PCsel logic chooses between PC+4 and the branch/jump target address. Finally we add a Digital RAM component for data memory (the stack), and we compute how to size it and convert byte addresses into doubleword addresses. These pieces complete the data path and control path for Project 6.
Learning Objectives¶
- Enumerate the remaining RISC-V instructions (data processing, control, memory) the processor must support
- Describe the two-step branch mechanism: compute the branch target address (BTA), then decide whether to take the branch
- Compute `BTA = PC + imm-b` and explain why the immediate is a sign-extended B-type immediate
- Design a Branch Unit that selects among `=`, `!=`, `<`, and `>=` comparisons using a `BUOp` control line
- Combine `PCsel` and the branch outcome (`PCbr`) to conditionally update the PC
- Size a Digital RAM data memory and convert a byte address into a doubleword address
- Connect data memory to the ALU and instruction decoder, including the helper logic for `lw`/`sw` and `lb`/`sb`
- Explain why programs need explicit initialization (set up `sp`, registers, and an `unimp` end marker)
Prerequisites¶
- RISC-V instruction formats (R, I, S, B, J types) and opcodes (Project 4, Lab 03)
- The Part 1 / Part 2 processor: PC, instruction memory, register file, ALU, and the three decoders (RegDecoder, ImmDecoder, InstDecoder)
- The instruction decoder spreadsheet methodology and ROM-based control lines (Lecture 10)
- Multiplexers, comparators, splitters, and the Digital RAM component (Lab 09, Lab 10)
- Binary/hexadecimal conversion and bit masking/shifting (Project 1, Project 3)
- JAL/JALR support and the `PCsel` MUX between `PC+4` and the jump target address
1. Where We Are: The Remaining Instructions¶
By this point the processor can already execute I-type and R-type data processing instructions, plus jal and jalr for calls and returns. Today we finish the instruction set. The instructions still to be handled fall into three groups.
| Data Processing | Control | Memory |
|---|---|---|
| `addi`, `add`, `sub`, `mul` | `jal`, `jalr` | `lb`, `sb` |
| `sll`, `srl`, `slli`, `srai` | `beq`, `bne`, `blt`, `bge` | `lw`, `sw` |
| | | `ld`, `sd` |
The data processing and shift instructions are handled by the ALU and are mostly already in place. jal and jalr were added in Part 2 (calls and returns). The two genuinely new capabilities are:
- Conditional branches (`beq`, `bne`, `blt`, `bge`): choose between `PC+4` and a target address based on a comparison of two registers.
- Data memory (`lb`/`sb`, `lw`/`sw`, `ld`/`sd`): load and store values from a RAM that holds the program's stack.
flowchart LR
A[Instruction Word IW] --> B[InstDecoder]
B --> C[Data Processing - ALU]
B --> D[Control - PC update]
B --> E[Memory - RAM]
D --> F[Branch Unit + PCsel]
E --> G[Load/Store helper logic]
The InstDecoder's job stays the same: decode the instruction word and set control lines. The new behavior is encapsulated in dedicated components (the Branch Unit and the data memory helper circuits) so the decoder does not grow unwieldy.
Lab 10 versus Project 6¶
This material is exercised first in Lab 10 and then completed in Project 6:
- Lab 10 Part 1: `addi` (`li`), `add`, `unimp`.
- Lab 10 Part 2: adds `jal` (`call`, `j`) and `jalr` (`ret`) for function calls and returns.
Most components are shared between the two parts, but Part 1 and Part 2 need separate instruction decoders because they produce different sets of control-line outputs. The submission file names matter for autograding:
You also submit a PDF of the control spreadsheet: delete most unused rows, choose Workbook and All sheets, then Download as PDF. In Project 6 the final circuit lives in a final/ subdirectory, with part1, part2, part3 showing incremental development. Note that Digital recursively searches subdirectories for components, so keep components isolated to avoid name conflicts and duplicates.
2. The Branch Mechanism¶
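A conditional branch has the form `beq rs1, rs2, label`, where `rs1` and `rs2` are the registers to compare and `label` is the destination.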
The RISC-V assembler computes a PC-relative offset to label and encodes it as the B-type immediate. The branch is a two-step process:
1. **Compute the branch target address (BTA):** `BTA = PC + imm-b`. Here `imm-b` is the 64-bit sign-extended immediate from the B-type instruction format. It comes from your `ImmDecoder`. Because it is sign-extended, branches can jump both forward (positive offset) and backward (negative offset); backward branches are how loops work.
2. **Determine whether to take the branch.** Compare `rs1` and `rs2` (the register file's `RD0` and `RD1`). Then update the PC to BTA conditionally: if the comparison is true, `PC = BTA`; otherwise `PC = PC + 4`.
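The two steps above can be sketched as a small Python model. This is a software illustration, not the circuit: the names `decode_imm_b` and `branch_target` are made up for this example, and the bit positions assume the standard RISC-V B-type layout that the `ImmDecoder` is expected to implement.

```python
MASK64 = (1 << 64) - 1  # keep results within 64 bits, like the hardware PC


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_imm_b(iw):
    """Reassemble the 13-bit B-type immediate from a 32-bit instruction word."""
    imm = (
        (((iw >> 31) & 0x1) << 12)    # imm[12]   from bit 31
        | (((iw >> 7) & 0x1) << 11)   # imm[11]   from bit 7
        | (((iw >> 25) & 0x3F) << 5)  # imm[10:5] from bits 30..25
        | (((iw >> 8) & 0xF) << 1)    # imm[4:1]  from bits 11..8
    )                                 # imm[0] is always 0
    return sign_extend(imm, 13)


def branch_target(pc, imm_b):
    """Step 1 of the branch mechanism: BTA = PC + imm-b."""
    return (pc + imm_b) & MASK64


# 0xFE0008E3 encodes "beq x0, x0, -16": a backward branch of 16 bytes.
offset = decode_imm_b(0xFE0008E3)
print(offset, hex(branch_target(0x40, offset)))
```

A negative offset only works because the immediate is sign-extended before the addition; without that step the same bits would be read as a large forward offset.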
This is the crucial difference from jal/jalr. Jumps always redirect the PC to the target; conditional branches redirect only if the comparison succeeds.
| Instruction class | PC update |
|---|---|
| Sequential (e.g., `add`, `addi`) | always `PC + 4` |
| `jal` / `jalr` (jumps) | always the target address |
| `beq` / `bne` / `blt` / `bge` (branches) | target if comparison true, else `PC + 4` |
The Four Branch Comparisons¶
| Instruction | Meaning | Comparison of `rs1`, `rs2` |
|---|---|---|
| `beq` | branch if equal | `rs1 == rs2` |
| `bne` | branch if not equal | `rs1 != rs2` |
| `blt` | branch if less than | `rs1 < rs2` |
| `bge` | branch if greater or equal | `rs1 >= rs2` |
Note that bgt rs1, rs2, label is a pseudo-instruction implemented as blt rs2, rs1, label — swapping the operands turns "greater than" into "less than," so the hardware only needs the four comparisons above.
flowchart TD
A[Fetch branch instruction] --> B["Compute BTA = PC + imm-b (ALU)"]
A --> C["Read rs1, rs2 from RegFile"]
C --> D{"Branch Unit: comparison true?"}
D -- yes --> E[PC = BTA]
D -- no --> F[PC = PC + 4]
We must do three things in hardware: compute the BTA, do the comparison, and update the PC based on the result. The BTA computation reuses the ALU (we already use the ALU to compute the jump target for jal/jalr), so the comparison logic must live somewhere else — in a dedicated Branch Unit.
3. The Branch Unit (BU)¶
Because the ALU is already busy computing the BTA, the comparisons go into a separate Branch Unit. The BU takes the two register values and a control line that selects which comparison to perform; it outputs a single bit, take_branch (also called PCbr), that says whether the branch should be taken.
Inputs
- `A` (64 bits): the value of `rs1` (the register file's `RD0`).
- `B` (64 bits): the value of `rs2` (the register file's `RD1`).
- `BUOp` (2 bits): selects which comparison to apply.

Output
- `take_branch` / `PCbr` (1 bit): high when the branch should be taken.
Internally, the BU feeds A and B into four parallel comparators — =, !=, <, and >= — and uses a 4-input MUX driven by BUOp to select the result of the comparison that matches the current branch instruction.
BUOp (2)
|
A (64) ----+----[ = ]----0 \ |
| | \
+----[ != ]---1 | MUX --> take_branch (PCbr)
| | /
B (64) ----+----[ < ]----2 | /
| |/
+----[ >= ]---3
Mapping the BUOp selector to the comparison (one natural ordering):
| `BUOp` | Comparison | Branch |
|---|---|---|
| `00` | `A == B` | `beq` |
| `01` | `A != B` | `bne` |
| `10` | `A < B` | `blt` |
| `11` | `A >= B` | `bge` |
flowchart LR
A["A = rs1 (64)"] --> EQ["="]
B["B = rs2 (64)"] --> EQ
A --> NE["!="]
B --> NE
A --> LT["<"]
B --> LT
A --> GE[">="]
B --> GE
EQ --> M["MUX (BUOp)"]
NE --> M
LT --> M
GE --> M
M --> O["take_branch (PCbr)"]
Design Notes from Class¶
- **Don't decode the `funct3` directly inside the BU.** You could drive the comparison selection straight from the instruction's `funct3` field, but RISC-V branch `funct3` codes are not contiguous (`beq=000`, `bne=001`, `blt=100`, `bge=101`), which would force awkward MUX wiring or dummy inputs. It is cleaner to define a tidy 2-bit `BUOp` control line in the InstDecoder spreadsheet and map each branch instruction to it.
- **Add a "BU off" state.** Non-branch instructions should not accidentally signal a taken branch. The control spreadsheet should set things up so that for non-branch instructions the BU output is forced to 0 (or `PCsel` is 0, which has the same effect; see the next section). One option is to encode a "branch unit disabled" mode in the control lines.
- **One comparator can do it all.** A single subtractor/comparator can in principle produce equal, not-equal, less-than, and greater-or-equal flags simultaneously, which avoids four separate comparator components. Either approach is acceptable; the four-comparator version is easiest to read.
The BU is a combinational component — given A, B, and BUOp, it produces take_branch immediately, with no clock needed.
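As a rough behavioral model of the Branch Unit described above, the sketch below evaluates all four comparisons and lets `BUOp` pick one, mirroring the four comparators feeding a MUX. The function and table names are illustrative. It treats `blt` and `bge` as signed comparisons, which is how RISC-V defines them (the unsigned variants are `bltu`/`bgeu`); the `FUNCT3_TO_BUOP` table is the kind of mapping the InstDecoder spreadsheet would hold, using the lecture's ordering.

```python
MASK64 = (1 << 64) - 1

# Non-contiguous branch funct3 codes mapped onto the tidy 2-bit BUOp.
FUNCT3_TO_BUOP = {0b000: 0b00,  # beq
                  0b001: 0b01,  # bne
                  0b100: 0b10,  # blt
                  0b101: 0b11}  # bge


def to_signed64(x):
    """Reinterpret a 64-bit register value as a signed integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def branch_unit(a, b, bu_op):
    """Return take_branch (PCbr): 1 if the selected comparison is true."""
    sa, sb = to_signed64(a), to_signed64(b)
    # All four comparators run in parallel; BUOp acts as the MUX selector.
    comparators = [sa == sb, sa != sb, sa < sb, sa >= sb]
    return int(comparators[bu_op & 0b11])


print(branch_unit(3, 0, FUNCT3_TO_BUOP[0b000]))  # beq with 3 vs 0
print(branch_unit(MASK64, 0, 0b10))              # blt with -1 vs 0
```

Like the circuit, the model is purely combinational: the output depends only on the current `A`, `B`, and `BUOp`.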
4. PC Selection: Combining PCsel and the Branch Outcome¶
The processor already has a PCsel MUX that chooses between PC+4 and the target address (used for jal/jalr). For branches we need that choice to depend on both the decoder (PCsel) and the runtime comparison result (PCbr). The agreed policy:
- `PCsel = 1` for branch instructions (and jumps), `PCsel = 0` for non-branch instructions such as `add`.
- When `PCsel = 0`, the PC always advances to `PC+4`, regardless of any branch/jump signals.
- When `PCsel = 1`, whether we use the target depends on the branch outcome.
Two implementations were discussed in class.
Option 1: A Pre-MUX Selected by PCbr¶
Use a small inner MUX driven by PCbr to choose between PC+4 and BTA, then feed that into the main PCsel MUX (which also selects the jump target, JTA). This keeps PCsel as the top-level "are we redirecting the PC?" signal and lets the branch outcome decide the branch target separately.
PCsel (selects PC+4 / JTA / branch-result)
|
PC+4 ----------0 |
JTA ----------1 MUX ----> PC
(branch result) 2 |
|
PCbr |
| |
PC+4 -0 \ |
MUX ------+ (inner MUX: PCbr picks PC+4 or BTA)
BTA -1 /
Option 2 (rejected): Gate PCsel with PCbr¶
Kevin's simpler-looking idea was to AND PCsel with PCbr and use that to drive a MUX choosing among PC+4, JTA, and BTA. In class this option was crossed out — the gating interacts badly with the always-redirect jumps, so the cleaner pre-MUX approach (Option 1) is preferred.
flowchart TD
PCbr["PCbr (from Branch Unit)"] --> IMUX["inner MUX"]
PC4a["PC+4"] --> IMUX
BTA["BTA (from ALU)"] --> IMUX
IMUX --> PMUX["PCsel MUX"]
PC4b["PC+4"] --> PMUX
JTA["JTA"] --> PMUX
PCsel["PCsel (from InstDecoder)"] --> PMUX
PMUX --> PC["PC register"]
The key behavioral guarantee: if PCsel = 0, the next PC is PC+4 no matter what the Branch Unit or jump logic says. This is what makes ordinary sequential instructions correct. The InstDecoder spreadsheet must therefore add the new control bits (PCsel, BUOp) and set PCsel = 0 for every existing non-branch instruction.
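The Option 1 wiring can be modeled in a few lines of Python. This sketch assumes the three-input numbering drawn in the ASCII diagram (0 selects `PC+4`, 1 selects `JTA`, 2 selects the branch result); the lecture's tables abbreviate both redirect cases as `PCsel = 1`, so treat the constants below as one possible encoding rather than the required one.

```python
# Assumed PCsel encoding, following the diagram's MUX input numbers.
PCSEL_SEQ, PCSEL_JUMP, PCSEL_BRANCH = 0, 1, 2


def next_pc(pc, pc_sel, pc_br, jta, bta):
    """Compute the next PC using the inner (PCbr) and outer (PCsel) MUXes."""
    pc_plus_4 = pc + 4
    # Inner MUX: the Branch Unit output picks PC+4 or BTA.
    branch_result = bta if pc_br else pc_plus_4
    # Outer MUX: the decoder's PCsel picks among the three candidates.
    outer_inputs = [pc_plus_4, jta, branch_result]
    return outer_inputs[pc_sel]


# With PCsel = 0 the PC advances to PC+4 even if PCbr happens to be 1.
print(hex(next_pc(0x100, PCSEL_SEQ, 1, jta=0x200, bta=0x300)))
```

Because `PCbr` only reaches the PC through the branch input of the outer MUX, a stray 1 from the Branch Unit cannot disturb sequential instructions or jumps.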
New Control Lines¶
| Signal | Width | Meaning |
|---|---|---|
| `PCsel` | 1 (or wider) | Selects whether PC is redirected; chooses PC+4 vs. target |
| `BUOp` | 2 | Selects the Branch Unit comparison (beq/bne/blt/bge) |
| `PCbr` / `take_branch` | 1 | Branch Unit output: 1 if the comparison succeeded |
5. Worked Example: A Loop Using a Branch¶
To see the branch path exercised end-to-end, consider a countdown loop. RISC-V assembly:
main:
li t0, 3 # counter = 3
li t1, 0 # accumulator = 0
loop:
beq t0, zero, done # if counter == 0, exit loop
add t1, t1, t0 # acc += counter
addi t0, t0, -1 # counter -= 1
jal loop # unconditional jump back (j loop)
done:
add a0, t1, zero # a0 = acc (result)
unimp # end marker
This computes 3 + 2 + 1 = 6 into a0. Trace the relevant control decisions:
| Instruction | `PCsel` | `BUOp` | `PCbr` | Next PC |
|---|---|---|---|---|
| `beq t0, zero, done` (t0 = 3) | 1 | `00` (==) | 0 (3 != 0) | PC + 4 |
| `add t1, t1, t0` | 0 | x | x | PC + 4 |
| `addi t0, t0, -1` | 0 | x | x | PC + 4 |
| `jal loop` | 1 | x (jump) | n/a | JTA (`loop`) |
| ... (after t0 reaches 0) | | | | |
| `beq t0, zero, done` (t0 = 0) | 1 | `00` (==) | 1 (0 == 0) | BTA (`done`) |
The backward jal loop works because the J-type immediate (and the B-type immediate for branches) is sign-extended, so the offset can be negative. The loop exits exactly when beq finally sees t0 == 0 and PCbr goes high, redirecting the PC to done.
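To check the trace by hand, the loop can be stepped through in a tiny Python simulation. The addresses are an assumption made for this sketch (the program starts at 0x00 and each `li` assembles to a single instruction), and `run_countdown` is a made-up name; only the PC decisions from the table are modeled.

```python
def run_countdown():
    """Step through the countdown loop, returning final registers and PC trace."""
    regs = {"t0": 0, "t1": 0, "a0": 0}
    pc, trace = 0x00, []
    while True:
        trace.append(pc)
        if pc == 0x00:            # li t0, 3
            regs["t0"] = 3
            pc += 4
        elif pc == 0x04:          # li t1, 0
            regs["t1"] = 0
            pc += 4
        elif pc == 0x08:          # loop: beq t0, zero, done
            pc_br = regs["t0"] == 0          # Branch Unit, BUOp = 00
            pc = 0x18 if pc_br else pc + 4   # BTA (done) or PC + 4
        elif pc == 0x0C:          # add t1, t1, t0
            regs["t1"] += regs["t0"]
            pc += 4
        elif pc == 0x10:          # addi t0, t0, -1
            regs["t0"] -= 1
            pc += 4
        elif pc == 0x14:          # jal loop: always redirect to JTA
            pc = 0x08
        elif pc == 0x18:          # done: add a0, t1, zero
            regs["a0"] = regs["t1"]
            pc += 4
        else:                     # unimp: stop
            break
    return regs, trace


regs, trace = run_countdown()
print(regs["a0"], trace.count(0x08))
```

The `beq` at `loop` executes four times: it falls through for `t0` = 3, 2, and 1, and is taken once `t0` reaches 0.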
6. Data Memory: Adding RAM¶
Programs need somewhere to store and load data. For our processor this is the stack (arrays, strings, saved registers, and the calling convention), though a heap could be added the same way. The processor must support ld/sd (doubleword), lw/sw (word), and lb/sb (byte).
We use Digital's RAM (Separated Ports) component. You configure two things:
- Data bits: the width of each stored element. We use 64 data bits so each cell holds a doubleword. This makes `ld`/`sd` trivial.
- Address bits: the width of the address input, which sets the number of elements (cells); n address bits give 2^n cells.
The completed processor now has all the major sub-circuits side by side:
+----+ +--------+ +--------+ +-----+ +-----------+
| PC | | Inst | | Reg | | ALU | | Data Mem |
| | | Mem | | File | | BU | | RAM |
+----+ +--------+ +--------+ +-----+ +-----------+
^
|
stack
Sizing the RAM¶
Worked example from class. We want a 1024-byte data memory built from 64-bit cells.
64 = 2^6 bits per cell
2^3 bytes per cell = 8 bytes (64 bits / 8 = 8)
How many cells (n) for 1024 bytes?
2^3 (bytes/cell) * n = 1024
n = 1024 / 8 = 128 = 2^7
So: 2^3 * 2^7 = 2^10 = 1024 bytes
That means the RAM needs 7 address bits (2^7 = 128 cells) and 64 data bits per cell, for a total of 1024 bytes. In the test programs the stack pointer is initialized near the top of this region, for example li sp, 1024.
| Quantity | Value |
|---|---|
| Bytes per cell | 8 (2^3) |
| Bits per cell | 64 (2^6) |
| Number of cells | 128 (2^7) |
| Address bits | 7 |
| Total size | 1024 bytes (2^10) |
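The same sizing arithmetic can be written as a short helper, which is handy for checking other memory sizes. `size_ram` is a hypothetical name, and the sketch assumes the total size is a whole number of cells.

```python
def size_ram(total_bytes, data_bits=64):
    """Return (number of cells, address bits) for a RAM of total_bytes."""
    bytes_per_cell = data_bits // 8          # 64 bits -> 8 bytes per cell
    cells = total_bytes // bytes_per_cell    # e.g. 1024 / 8 = 128
    addr_bits = (cells - 1).bit_length()     # bits needed to index 0..cells-1
    return cells, addr_bits


print(size_ram(1024))
```

For the class example this gives 128 cells and 7 address bits, matching the table.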
Byte Address vs. Doubleword Address¶
The ALU computes the target memory address as a byte address, because all addresses live in registers as byte addresses. But the RAM's A (ADDR) input expects a doubleword (DW) address, that is, an index into 8-byte cells. We must convert: `DW_addr = byte_addr >> 3` (equivalently, `byte_addr / 8` with the remainder discarded).
In hardware you do this with a splitter: drop the low 3 bits of the byte address and feed the remaining high bits into the RAM's address input. (The low 3 bits are the byte offset within a doubleword.)
byte_addr (from ALU):
bit: ... 9 8 7 6 5 4 3 | 2 1 0
\-----------/ \---/
DW address byte-in-DW offset
(to RAM ADDR) (discarded for ld/sd)
flowchart LR
ALU["ALU result (byte address, 64b)"] --> SP["splitter: drop low 3 bits"]
SP --> RAM["RAM ADDR (DW address)"]
RAM -->|D out 64b| LD["load logic"]
SI["store logic"] -->|D in 64b| RAM
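The splitter's job can be sketched as a shift and a mask. The helper name `split_byte_address` is made up for illustration, and the `addr_bits` mask is an assumption that the splitter forwards only as many bits as the RAM has address lines (7 for the 1024-byte example).

```python
def split_byte_address(byte_addr, addr_bits=7):
    """Split a byte address into (DW address for RAM ADDR, byte offset in the DW)."""
    byte_offset = byte_addr & 0b111                       # low 3 bits
    dw_addr = (byte_addr >> 3) & ((1 << addr_bits) - 1)   # remaining bits, RAM width
    return dw_addr, byte_offset


print(split_byte_address(1016))  # last doubleword of a 1024-byte RAM
print(split_byte_address(0x2C))  # byte 4 inside doubleword 5
```

For `ld`/`sd` the byte offset is simply discarded; the sub-word loads and stores use it to pick a slice of the cell.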
7. Connecting RAM and Supporting Sub-Word Access¶
The RAM connects to the ALU (which computes the address) and to the InstDecoder (which provides control lines). Loads route the RAM's D output back to the register file; stores route a register value into the RAM's Din.
New control lines from the InstDecoder for memory operations:
| Signal | Meaning |
|---|---|
| `LD` (`ld`) | RAM read enable (load) |
| `ST` (`str`) | RAM write enable (store) |
| `MSZ` | Memory size: byte / word / doubleword |
| `M2R` (or expanded `WDsel`) | Selects RAM output to write back to RegFile |
For loads, you either expand the existing WDsel MUX or add a new two-input M2R MUX that selects between the ALU result and the RAM output and feeds the register file's write-data input.
ld/sd (doubleword) — the easy case¶
Because each cell is 64 bits, ld reads a full cell at the DW address and sd writes a full cell. No sub-word logic is needed.
lw/sw (word) — read-modify for stores¶
The RAM stores 64-bit cells but a word is only 32 bits, so we add helper logic. Crucially, keep the RAM component at the top level of the processor so you can open it during simulation; add load logic after the RAM and store logic before the RAM.
Load word. The ALU computes a 4-byte-aligned byte address; the splitter converts it to a DW address. Read the 64-bit cell, split it into the lower 32 bits (0-31) and upper 32 bits (32-63), and use a MUX to pick which half. The selector is bit 2 of the byte address (the word index within a doubleword — bits 0-1 are the byte index). Sign-extend the chosen 32-bit value to 64 bits.
byte_addr bit 2 = word index inside the doubleword
bit 2 = 0 -> lower word (bits 0..31)
bit 2 = 1 -> upper word (bits 32..63)
Store word. Stores are harder: we must preserve the other 32 bits of the cell. Set both ld and str high so we read the current 64-bit value (D64cur) and write back in the same clock cycle. Take the update value from RD1 (D64in) and extract its lower 32 bits (Wnew). Using splitters, extract W0 (bits 0-31) and W1 (bits 32-63) from D64cur, then use mergers to build the two candidate cells, written here low half first: Wnew:W1 (replace the lower word, keep W1) and W0:Wnew (keep W0, replace the upper word). A MUX selected by bit 2 of the byte address picks between them: bit 2 = 0 selects Wnew:W1 and bit 2 = 1 selects W0:Wnew. The result feeds the MSZ MUX and then the RAM's Din.
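The word load and the read-modify-write store can be modeled on a single 64-bit cell as follows. This is a sketch of the behavior, not the circuit; `load_word` and `store_word` are illustrative names, while `d64cur`, `d64in`, `w0`, `w1`, and `wnew` follow the signal names used above.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = (1 << 64) - 1


def load_word(cell, byte_addr):
    """lw: pick the half selected by byte-address bit 2, sign-extend to 64 bits."""
    upper = (byte_addr >> 2) & 1
    word = ((cell >> 32) if upper else cell) & MASK32
    if word & 0x8000_0000:        # sign bit of the 32-bit word
        word -= 1 << 32
    return word & MASK64


def store_word(d64cur, d64in, byte_addr):
    """sw: replace one 32-bit half of the current cell, keep the other half."""
    wnew = d64in & MASK32
    w0 = d64cur & MASK32              # bits 0-31
    w1 = (d64cur >> 32) & MASK32      # bits 32-63
    if (byte_addr >> 2) & 1:
        return (wnew << 32) | w0      # bit 2 = 1: new upper word, keep W0
    return (w1 << 32) | wnew          # bit 2 = 0: new lower word, keep W1


cell = 0x1111_1111_2222_2222
print(hex(load_word(cell, 0x2C)))
print(hex(store_word(cell, 0xDEAD_BEEF, 0x2C)))
```

The value returned by `store_word` is the full 64-bit cell to write back, which is why the store needs the current contents (`D64cur`) in the same cycle.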
lb/sb (byte)¶
Derive byte support the same way as word support, but operate on 8-bit slices using bits 0-2 of the byte address to pick which byte of the cell.
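A byte-sized version of the same idea might look like the sketch below, where bits 0-2 of the byte address select one of the eight bytes in the cell. The function names are hypothetical, and the shift-and-mask form stands in for the splitters, mergers, and 8-input MUX a circuit would use.

```python
MASK64 = (1 << 64) - 1


def load_byte(cell, byte_addr):
    """lb: extract the byte selected by address bits 0-2, sign-extend to 64 bits."""
    shift = 8 * (byte_addr & 0b111)
    byte = (cell >> shift) & 0xFF
    if byte & 0x80:               # sign bit of the 8-bit value
        byte -= 0x100
    return byte & MASK64


def store_byte(d64cur, d64in, byte_addr):
    """sb: overwrite one byte of the current cell, preserving the other seven."""
    shift = 8 * (byte_addr & 0b111)
    cleared = d64cur & ~(0xFF << shift) & MASK64
    return cleared | ((d64in & 0xFF) << shift)


cell = 0x8877_6655_4433_2211
print(hex(load_byte(cell, 7)))
print(hex(store_byte(cell, 0xAB, 2)))
```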
MSZ Encoding¶
Following the RISC-V funct3 low bits, the data size control values are:
| Operation | `MSZ` | Width |
|---|---|---|
| `lb` / `sb` | `00` | 8 bits, sign-extended to 64 |
| `lw` / `sw` | `10` | 32 bits, sign-extended to 64 |
| `ld` / `sd` | `11` | 64 bits |
The final load-side MUX selects among the full 64-bit cell (ld), the sign-extended 32-bit word (lw), and the sign-extended 8-bit byte (lb), driven by MSZ.
flowchart TD
RAMOUT["RAM D out (64b)"] --> LDD["ld: full 64b"]
RAMOUT --> SPW["split words, MUX by bit 2"]
SPW --> SXW["sign-extend 32->64"]
RAMOUT --> SPB["split bytes, MUX by bits 0-2"]
SPB --> SXB["sign-extend 8->64"]
LDD --> MSZMUX["MSZ MUX"]
SXW --> MSZMUX
SXB --> MSZMUX
MSZMUX --> WB["to WDsel / M2R MUX -> RegFile"]
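The final selection step can be sketched as a lookup keyed by `MSZ`. The names below are illustrative; the derivation of `MSZ` from the low two bits of `funct3` relies on the standard RISC-V encodings (`lb`/`sb` = 000, `lw`/`sw` = 010, `ld`/`sd` = 011). The value `01` would correspond to halfword accesses, which this processor does not support, so the sketch deliberately has no entry for it.

```python
MSZ_BYTE, MSZ_WORD, MSZ_DWORD = 0b00, 0b10, 0b11


def msz_from_funct3(funct3):
    """MSZ is the low two bits of the load/store funct3 field."""
    return funct3 & 0b11


def msz_mux(msz, byte_sx, word_sx, dword):
    """Pick the load result: sign-extended byte, sign-extended word, or full cell."""
    inputs = {MSZ_BYTE: byte_sx, MSZ_WORD: word_sx, MSZ_DWORD: dword}
    return inputs[msz]  # raises KeyError for the unsupported value 01


# lw has funct3 = 010, so the MUX forwards the sign-extended word.
print(hex(msz_mux(msz_from_funct3(0b010), 0x11, 0x2222_2222, 0x1111_1111_2222_2222)))
```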
8. Program and Code Initialization¶
The processor starts in a blank state: all registers are 0 and memory is uninitialized. So every test program needs explicit setup before it can run. The conventions for making a program runnable on the processor:
- Add an assembly `main` that sets up parameters (Project 6 expects at least five parameters for your functions).
- Initialize the stack pointer, e.g. `li sp, 1024`, so loads/stores hit valid RAM.
- Remove any `.global` directives.
- Use `jal` instead of `call` for function calls.
- End the program with `unimp`, the marker that tells the processor to stop fetching.
main:
li sp, 1024 # set up the stack pointer
li a0, 5 # parameter 1
li a1, 10 # parameter 2
jal myfunc # call (use jal, not call)
unimp # end marker -> processor halts
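If you want to catch convention mistakes before assembling, a small script can scan the source text for them. The checker below is purely hypothetical (it is not part of the course tooling) and only does simple text matching on the conventions listed above.

```python
def check_program(asm_text):
    """Return a list of convention problems found in asm_text (empty if none)."""
    # Drop comments and blank lines; keep labels, directives, and instructions.
    lines = [ln.split("#", 1)[0].strip() for ln in asm_text.splitlines()]
    lines = [ln for ln in lines if ln]
    mnemonics = [ln.split()[0] for ln in lines]

    problems = []
    if ".global" in mnemonics or ".globl" in mnemonics:
        problems.append("remove .global directives")
    if "call" in mnemonics:
        problems.append("use jal instead of call")
    if "unimp" not in mnemonics:
        problems.append("end the program with unimp")
    # Accept any "li sp, <value>" as stack pointer initialization.
    sets_sp = any(
        ln.split()[0] == "li" and len(ln.split()) > 1
        and ln.split()[1].rstrip(",") == "sp"
        for ln in lines
    )
    if not sets_sp:
        problems.append("initialize sp (e.g. li sp, 1024)")
    return problems


good = """
main:
    li sp, 1024   # set up the stack pointer
    li a0, 5
    jal myfunc
    unimp
"""
print(check_program(good))
```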
When you simulate the circuit, you press play, select the program (PROG) value, then toggle EN to 1 so execution begins — EN defaults to disabling writes to the PC and register file so you have time to choose the program.
ROM Programming Recap¶
The instruction decoder's control bits are stored in a ROM keyed by INUM. Earlier approaches (a Python script, or a hand-derived binary-to-hex equation) work for small data, but direct pasting becomes unreliable for large datasets because of formatting issues. The recommended, scalable approach is to generate a .hex file with the required prefix and load it directly into the ROM — this works at any size and is the same idea used for instruction memory (via makerom3.py).
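A generator for such a file can be very small. The sketch below is an assumption-heavy illustration: the header string `v2.0 raw` (the Logisim-style raw hex header) is a guess at the "required prefix", the control-word values are invented, and `to_hex_text` / `write_hex_file` are hypothetical names. Check `makerom3.py` or Digital's documentation for the exact format your ROM expects.

```python
def to_hex_text(words, header="v2.0 raw"):
    """Format a list of integers as hex-file text: header line, then one word per line."""
    lines = [header] + [format(w, "x") for w in words]
    return "\n".join(lines) + "\n"


def write_hex_file(path, words, header="v2.0 raw"):
    """Write the control words to path so the file can be loaded into the ROM."""
    with open(path, "w") as f:
        f.write(to_hex_text(words, header))


# Invented control words, indexed by INUM (row order of the spreadsheet).
control_words = [0x1, 0xA3, 0x0]
print(to_hex_text(control_words), end="")
```

Generating the file from the spreadsheet's rows keeps the ROM contents reproducible, which is the main advantage over pasting values by hand.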
Key Concepts¶
| Concept | Definition | Example |
|---|---|---|
| BTA | Branch Target Address, where a taken branch goes | BTA = PC + imm-b |
| imm-b | Sign-extended B-type immediate (PC-relative branch offset) | negative for backward loop branches |
| Branch Unit (BU) | Component comparing `rs1`, `rs2` to decide if a branch is taken | outputs `take_branch` / `PCbr` |
| BUOp | Control line selecting the BU comparison | `00`: `==`, `01`: `!=`, `10`: `<`, `11`: `>=` |
| PCsel | Control bit deciding whether the PC is redirected | 1 for branch/jump, 0 for sequential |
| PCbr | Runtime signal: 1 when the branch comparison succeeds | gates use of BTA |
| Data memory | RAM holding the program's stack | 64-bit cells, used by ld/sd, etc. |
| Byte vs. DW address | Registers hold byte addresses; RAM is indexed by 8-byte cells | DW = byte >> 3 |
| MSZ | Memory-size control for load/store width | 00=byte, 10=word, 11=doubleword |
| M2R / WDsel | Selects RAM output (vs. ALU result) for register write-back | needed for loads |
Practice Problems¶
Problem 1: Branch Target Address¶
A beq instruction is at PC address 0x40 and the assembler computed a B-type immediate of -16 (decimal). What is the branch target address (BTA)?
Click to reveal solution
`BTA = PC + imm-b = 0x40 + (-16) = 0x40 - 0x10 = 0x30`.

The negative immediate makes this a **backward** branch (target is before the branch), which is exactly how a loop's "branch back to the top" works. This is why `imm-b` must be **sign-extended** to 64 bits before adding it to the PC.

Problem 2: BUOp Selection¶
For each branch instruction, give the BUOp value (using the lecture's ordering `00` for `==`, `01` for `!=`, `10` for `<`, `11` for `>=`) and state when the branch is taken.
Click to reveal solution
| Instruction | `BUOp` | Branch taken when |
|-------------|--------|-------------------|
| `beq` | `00` | `rs1 == rs2` |
| `bne` | `01` | `rs1 != rs2` |
| `blt` | `10` | `rs1 < rs2` |
| `bge` | `11` | `rs1 >= rs2` |

`bgt rs1, rs2, label` is not a separate hardware case: the assembler turns it into `blt rs2, rs1, label`, so only the four comparisons above are needed in the Branch Unit.

Problem 3: PC Update Logic¶
Fill in the next PC for each row, given the policy that PCsel = 0 forces PC+4. Assume each instruction is 4 bytes and PC = 0x100.
| Instruction | `PCsel` | `PCbr` | Next PC |
|---|---|---|---|
| `add t0,t1,t2` | 0 | x | ? |
| `beq` taken | 1 | 1 | ? |
| `beq` not taken | 1 | 0 | ? |
| `jal label` | 1 | n/a | ? |
Click to reveal solution
| Instruction | `PCsel` | `PCbr` | Next PC |
|-------------|---------|--------|---------|
| `add t0,t1,t2` | 0 | x | `PC+4 = 0x104` |
| `beq` taken | 1 | 1 | `BTA` |
| `beq` not taken | 1 | 0 | `PC+4 = 0x104` |
| `jal label` | 1 | n/a | `JTA` |

The rule: when `PCsel = 0` the PC always advances to `PC+4` regardless of the Branch Unit. When `PCsel = 1`, a branch uses `BTA` only if `PCbr = 1`; a jump always uses its target. This is why the inner MUX (selected by `PCbr`) feeds the outer `PCsel` MUX (Option 1).

Problem 4: Sizing the RAM¶
You need a data memory of 2048 bytes built from 64-bit cells. How many cells are there, and how many address bits does the RAM need?
Click to reveal solution
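Each 64-bit cell holds 8 bytes (2^3), so the number of cells is 2048 / 8 = 256 = 2^8. Indexing 256 cells takes 8 address bits.

So the RAM is configured with **8 address bits** and **64 data bits**.

Problem 5: Byte Address to Doubleword Address¶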
A program executes ld t0, 16(sp) with sp = 1024. What byte address does the ALU compute, and what DW address goes to the RAM's A input?
Click to reveal solution
The ALU computes the byte address `sp + 16 = 1024 + 16 = 1040`. The DW address is `1040 >> 3 = 130`.

In hardware, a splitter drops the low 3 bits of `1040` (`0b10000010000`): the low 3 bits (`000`) are the byte-within-doubleword offset and are discarded for `ld`; the remaining high bits form `130`, the DW index sent to the RAM. Because `1040` is a multiple of 8, this is a valid doubleword-aligned `ld`.

Problem 6: Load Word Half Selection¶
A lw reads from byte address 0x2C. After converting to a DW address and reading the 64-bit cell, which 32-bit half (lower bits 0-31 or upper bits 32-63) does the load logic select, and why?
Click to reveal solution
`0x2C = 0b101100`, so bit 2 of the byte address is 1, and the DW address is `0x2C >> 3 = 5`.

Bit 2 is the word-index selector. Since **bit 2 = 1**, the load logic selects the **upper word (bits 32-63)**. That 32-bit value is then sign-extended to 64 bits before going to the `MSZ` MUX. (If bit 2 were 0, it would select the lower word, bits 0-31.) Note `0x2C` is 4-byte aligned, as required for `lw`.

Further Reading¶
- Source notes: "/notes/CS315-01 2025-11-06 Processor Branches RAM.pdf"
- Project 6 specification: /assignments/project06/
- RISC-V Unprivileged ISA Specification
- RISC-V Branch and Jump Instructions reference (RISC-V Reader)
- Digital logic simulator (hneemann/Digital)
Summary¶
- The remaining instructions split into data processing (ALU), control (`jal`/`jalr` plus the new branches), and memory (`lb`/`sb`, `lw`/`sw`, `ld`/`sd`); branches and data memory are the genuinely new work.
- A conditional branch is a two-step process: compute `BTA = PC + imm-b` (reusing the ALU), then compare `rs1` and `rs2` to decide whether to redirect the PC to the BTA or fall through to `PC+4`.
- The Branch Unit encapsulates the four comparisons (`=`, `!=`, `<`, `>=`), selected by a clean 2-bit `BUOp` control line, and outputs `take_branch` / `PCbr`, keeping non-contiguous `funct3` decoding out of the data path.
- PC selection combines `PCsel` and `PCbr`: an inner MUX driven by `PCbr` chooses `PC+4` vs. `BTA`, feeding the outer `PCsel` MUX (Option 1). When `PCsel = 0`, the PC always advances to `PC+4`.
- Data memory uses a Digital RAM with 64-bit cells for the stack; a 1024-byte memory needs 128 cells (7 address bits). Registers hold byte addresses, so a splitter converts `byte_addr >> 3` into the doubleword address for the RAM.
- Sub-word access needs helper logic: keep the RAM at the top level, add load logic after it and store logic before it; use byte-address bit 2 to pick the word half, sign-extend results, and use `MSZ` to select byte/word/doubleword. Stores do a read-modify-write so the rest of the cell is preserved.
- Programs require explicit initialization: set up `sp` (e.g., `li sp, 1024`), provide a `main`, drop `.global`, use `jal` instead of `call`, and end with `unimp`. Decode-ROM contents are best generated as a `.hex` file rather than pasted by hand.