Programs and Data Memory¶
Overview¶
This lecture prepares us to finish the single-cycle RISC-V processor by adding the last major component: data memory. We first cover how to package a RISC-V function so that it runs standalone on our Digital processor (writing an assembly main, dropping .global, and using jal/unimp). We then review the disciplined recipe for adding any new instruction to the datapath, and finally we design the data-memory subsystem for ld/sd, working out the byte-address-to-doubleword-address conversion, the load path, the store (read-modify-write) path, and the control lines that drive it all. The handwritten notes also cover Project 6 logistics and the Lab 10 spreadsheet that seeds the instruction decoder.
Learning Objectives¶
- Package a RISC-V function so it runs standalone on the Digital processor (assembly main, no .global, jal not call, unimp end marker)
- Initialize the stack pointer correctly given the size of the Data Memory RAM
- Compute the relationship between RAM data bits, address bits, and total capacity
- Apply the five-step recipe for adding a new instruction to the processor (pick, components, datapath, decoder, test)
- Explain why a byte address must be converted to a doubleword (DW) address before indexing the RAM, and do the conversion with a shift/splitter
- Design the ld/sd datapath, including the M2R / WDsel write-back MUX and the ld/str/clk control lines
- Explain the read-modify-write pattern required to implement sw/sb on a 64-bit-wide RAM
- Build Project 6 incrementally with part1–final directories and a four-sheet decoder spreadsheet
Prerequisites¶
- RISC-V assembly: registers, calling convention, the stack (Lab 01, Lab 03)
- Two's complement and sign extension (Lab 02)
- The single-cycle processor datapath: PC, Instruction Memory, Register File, ALU (Processor Guide Part 1)
- The instruction decoder spreadsheet methodology and control lines (Processor Guide Part 2)
- Digital components: registers, ROM, RAM (separated ports), MUX, splitter, comparator
- Byte-addressable memory and word layout
1. Project 6 Logistics and Incremental Development¶
Before diving into data memory, the session opened with how Project 6 should be organized. The single most important idea is incremental development: do not try to build the entire instruction decoder and full datapath at once. Start from the Lab 10 spreadsheet (which already maps a handful of instructions to control lines) and grow the circuit one instruction (or group of instructions) at a time, testing after every step.
Directory layout¶
Project 6 requires at least three partial implementations plus a final one, each in its own directory and each containing the complete set of circuit files for that stage:
project06-<github_userid>/
├── part1/ # *.dig files + *.hex files
├── part2/ # *.dig files + *.hex files
├── part3/ # *.dig files + *.hex files
└── final/ # *.dig files + *.hex files
This snapshots your progress so the graders can see evidence of incremental work (worth 10 points of the rubric). You may add more than three partial stages if you like.
Running the autograder¶
The autograder runs per directory. From inside a partial directory you run:
grade test -p project06
The -p project06 flag tells the grader which test set to use. (One of the action items from this session was fixing -p so it works correctly inside subdirectories.)
Lab 10 observations and the spreadsheet¶
Two practical observations carried over from Lab 10:
- Use a Digital Register for state elements (the PC, and any pipeline-style holding registers) — this improves robustness and avoids subtle latch issues.
- Spreadsheets drive the instruction decoder. For Project 6 you maintain one workbook with four sheets named part1, part2, part3, and final. Each sheet is the decoder ROM contents for that stage. You submit both the .xlsx and a PDF export so the incremental decoder development is visible.
A potential redundancy was flagged in the Lab 10 spreadsheet — two control columns (informally "PC Select" and "WD Select") may always carry the same value, in which case one could be derived from the other. This is the kind of optimization you can make once a stage is working, never before.
2. Preparing RISC-V Functions to Run on the Processor¶
Our processor is not an operating system. There is no loader, no _start, no C runtime, and no ret address magically sitting in ra when execution begins. So a function that compiled and ran fine under the emulator or on Linux needs a small amount of wrapping before it can run on the bare Digital processor.
There are four rules.
Rule 1 — Write an assembly main¶
Execution begins at the first instruction in Instruction Memory. You must provide a main that sets up arguments and then calls the function under test. The main is responsible for putting parameters into a0, a1, ... before the call.
Rule 2 — No .global¶
.global is a directive for the linker, telling it which symbols are visible to other translation units. We assemble a single self-contained program and there is no linker step that needs it, so drop all .global directives from your test programs.
Rule 3 — Use jal, not call¶
call is a pseudo-instruction that the assembler may expand into a two-instruction sequence (auipc + jalr) so it can reach symbols anywhere in a 32-bit address space. We do not yet support auipc, and our programs are tiny, so use the real instruction jal (jump-and-link) which fits our datapath directly.
Rule 4 — End with unimp¶
There is no return to an OS. We mark the end of the program with unimp, which our processor recognizes as the end marker and uses to stop fetching/executing.
Putting it together¶
main:
# initialize the stack pointer
li sp, 1024 # top of the RAM (see Section 3)
jal swap_s # call the function under test
unimp # end marker -- stop the processor
swap_s:
# ... the function we actually want to exercise ...
ret
Compare this with a function written for the emulator, which would use .global swap_s and be invoked with call swap_s. The handwritten notes show exactly this skeleton: li sp, 1024, jal swap_s, unimp, then the swap_s: label.
flowchart TD
A["main:"] --> B["li sp, 1024 (init stack pointer)"]
B --> C["jal swap_s (link return addr into ra, jump)"]
C --> D["swap_s: body runs..."]
D --> E["ret (jalr to ra, back into main)"]
E --> F["unimp (end marker: processor halts)"]
3. Sizing the RAM and Initializing the Stack Pointer¶
Our test programs use Data Memory as a stack. The stack pointer sp must start at a valid top-of-memory address. To pick that number we need to know how big the RAM is.
Capacity arithmetic¶
The RAM we use is configured with two parameters:
- data bits: the width of one memory cell. We use 64 so that a single cell holds one doubleword (perfect for ld/sd).
- address bits: how many cells the RAM has. With n address bits the RAM has 2^n cells.
The total number of bytes is (number of cells) × (bytes per cell). With 64-bit cells, each cell is 8 bytes = 2^3 bytes.
The handwritten notes work a concrete example: we want 1024 bytes of stack memory.
We want 1024 bytes of stack.
1024 = 2^10 bytes
Each cell holds 8 bytes = 2^3 bytes
Number of cells = 2^10 / 2^3 = 2^7 = 128 cells
=> address bits = 7
So: data bits = 64
addr bits = 7 (2^7 = 128 cells x 8 bytes = 1024 bytes)
| RAM parameter | Value | Meaning |
|---|---|---|
| data bits | 64 | one cell = one doubleword (8 bytes) |
| address bits | 7 | 2^7 = 128 cells |
| cells | 128 | 2^7 doublewords |
| total bytes | 1024 | 128 × 8 = 2^7 × 2^3 = 2^10 |
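The capacity arithmetic above can be checked with a short Python sketch. The helper name `ram_config` and its parameters are illustrative only (they are not part of Digital or the course tooling), and the sketch assumes the requested size works out to a power-of-two number of cells.

```python
def ram_config(total_bytes: int, data_bits: int = 64) -> tuple[int, int]:
    """Return (address_bits, cells) for a RAM holding total_bytes.

    Hypothetical helper: assumes total_bytes divides evenly into cells
    and that the cell count is a power of two.
    """
    bytes_per_cell = data_bits // 8            # 64 data bits -> 8 bytes
    cells, remainder = divmod(total_bytes, bytes_per_cell)
    if remainder or cells <= 0 or cells & (cells - 1):
        raise ValueError("size must be a power-of-two number of cells")
    address_bits = cells.bit_length() - 1      # log2(cells)
    return address_bits, cells


addr_bits, cells = ram_config(1024)
print(addr_bits, cells)   # 7 128
```

Calling `ram_config` with a different stack size gives the address-bits setting to enter in the RAM's configuration.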
Where the stack pointer starts¶
The stack grows downward (toward lower addresses) in RISC-V. So sp is initialized to the top of the RAM region — the byte address just past the last usable byte. In the notes sp => points to the top of the RAM box, and li sp, 1024 sets it there.
byte addr
  1024   <- sp starts here (top); stack grows DOWN
        +-----------------------+
        |                       |
        |          RAM          |  128 cells x 64 bits
        |      (the stack)      |
        |                       |
     0  +-----------------------+
A function that allocates a frame does addi sp, sp, -N (N a multiple of 16), uses the space, then addi sp, sp, N on the way out. Because we initialize sp to the very top, the first push lands at the highest valid doubleword.
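Note: a real machine would not put sp at byte 1024 of a tiny RAM, but for our simulated stack this is exactly the right idea. The only requirement is that every address actually loaded from or stored to through sp stays within 0 .. 1023; sp itself starts one byte past that range and is decremented before its first use.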
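As a quick check on the stack layout, the following Python sketch models a function prologue and lists the doubleword slots it creates. `frame_slots` and `RAM_BYTES` are hypothetical names introduced for illustration, and the model assumes the 1024-byte RAM sized above.

```python
RAM_BYTES = 1024          # 128 cells x 8 bytes, as sized above


def frame_slots(sp: int, frame_bytes: int) -> list[int]:
    """Byte addresses of the doubleword slots in a new stack frame.

    Models `addi sp, sp, -frame_bytes` followed by accesses at
    0(sp), 8(sp), ... Hypothetical helper for illustration only.
    """
    if frame_bytes % 16:
        raise ValueError("frame size should be a multiple of 16")
    new_sp = sp - frame_bytes
    slots = list(range(new_sp, sp, 8))
    if not all(0 <= a <= RAM_BYTES - 8 for a in slots):
        raise ValueError("frame falls outside the RAM")
    return slots


print(frame_slots(1024, 16))   # [1008, 1016]
```

With sp starting at 1024, a 16-byte frame uses byte addresses 1008 and 1016, the two highest doublewords in the RAM.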
4. The Recipe for Adding a New Instruction¶
The bulk of Project 6 is repeatedly extending the processor to support more instructions. The lecture gave a five-step recipe that you apply every time. Internalize it — every new instruction (including the data-memory ones we design below) follows it.
flowchart TD
S1["1. Pick an instruction (or group of instructions)"] --> S2
S2["2. Modify or add components"] --> S3
S3["3. Modify or add to the datapath"] --> S4
S4["4. Extend the instruction decoder / control lines"] --> S5
S5["5. Test"]
S5 -.->|"repeat for next instruction"| S1
Step 1 — Pick an instruction or group¶
Choose one instruction, or a tightly related group (e.g. all the conditional branches, or ld+sd together). Small steps make debugging tractable.
Step 2 — Modify or add components¶
Decide what hardware the instruction needs. Maybe the ALU needs a new operation, maybe you need a new sub-circuit (like a Branch Unit), or a RAM, or a sign-extender. Add it as a clean, labeled component.
Step 3 — Modify or add to the datapath¶
Connect the new component's data wires. Usually you will need to extend or add MUXes — whenever a value can now come from more than one place, a MUX selects between them. The key principle the instructor underlined:
Always add new datapath wires to new MUX inputs.
That is, when you widen a MUX, the existing input(s) keep their original meaning and the new source goes on a new input line. This keeps every previously-working instruction working without change.
Step 4 — Extend the instruction decoder / control lines¶
New components and MUXes need new control signals. The workflow is:
- Update the spreadsheet: add the new instruction row(s) and any new control-output column(s), filling in values for the new instruction and setting the new columns to their no-op value (almost always 0) for every existing instruction.
- Update the instruction decoder: regenerate the ROM hex from the spreadsheet, then expand the priority-encoder inputs and the ROM output split so the new control bits reach their MUXes/components.
Because new MUX inputs are added as higher-numbered inputs and the original source is input 0, existing instructions can usually leave the new control line at 0 and behave exactly as before.
Step 5 — Test¶
Run the autograder and any hand tests for just this instruction before moving on. The session emphasized single-stepping and probes as the core debugging techniques: add probes on the datapath to watch intermediate values, single-step the clock, and compare against an objdump of the test program so you can match each executing instruction to its expected behavior.
5. Data Memory: ld / sd¶
Now we add the Data Memory subsystem. We start with the doubleword instructions ld (load doubleword) and sd (store doubleword) because the RAM is 64 bits wide, so a doubleword maps to exactly one cell — no extraction or masking needed yet.
The instruction form¶
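Both instructions name a base register and a signed 12-bit offset, as in ld t0, 8(sp) or sd t0, 8(sp), and both compute their target address the same way.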
The target address is base + offset. We already have an adder that can do this: the ALU. So ld/sd reuse the ALU (with an add operation) to compute the memory address, where input A is the base register value (RD0) and input B is the sign-extended I-type immediate (imm-I) for loads, or the S-type immediate for stores.
flowchart LR
RD0["RD0 (base, e.g. sp)"] --> ALUA["ALU input A"]
IMM["imm-I / imm-S (offset)"] --> ALUB["ALU input B"]
ALUA --> ALU["ALU (add)"]
ALUB --> ALU
ALU --> TA["target byte address"]
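The address computation can be modeled in a few lines of Python. This is a sketch of the arithmetic only, not of the Digital circuit; the function names are made up for illustration, and the 12-bit immediate is passed in as its raw bit pattern.

```python
MASK64 = (1 << 64) - 1


def sign_extend_12(imm: int) -> int:
    """Interpret the low 12 bits of imm as a signed two's-complement value."""
    imm &= 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm


def target_address(base: int, imm12: int) -> int:
    """ALU add for ld/sd: base register plus sign-extended offset, mod 2^64."""
    return (base + sign_extend_12(imm12)) & MASK64


print(target_address(1000, 8))       # 1008
print(target_address(1024, 0xFF0))   # 1008  (0xFF0 encodes -16)
```

The second call shows why the immediate must be sign-extended before the add: a negative offset is stored as a large 12-bit pattern.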
The byte-address to doubleword-address problem¶
Here is the crucial subtlety the lecture spent time on. All addresses in registers are byte addresses. The ALU therefore produces a byte address. But our RAM is indexed by cell (doubleword) number — its A (ADDR) input expects a DW address, not a byte address.
Two facts make the fix simple:
- Convert byte address to DW address. Each cell is 8 bytes, so the DW address is byte_addr / 8, i.e. a right shift by 3 (>> 3). Equivalently, you drop the low 3 bits with a splitter.
- We only need 7 bits. The RAM has 2^7 = 128 cells, so the DW address is only 7 bits wide. The ALU emits a 64-bit byte address; we take bits 3..9 (the 7 bits above the low 3) to get the 7-bit DW index.
The handwritten diagram shows this as a hatched conversion block sitting between the 64-bit ALU output and the 7-bit RAM A input, annotated 3-9 (take bits 3 through 9) and labeled:
1) byte addr -> dw addr (shift right by 3, or drop low 3 bits)
2) only need 7 bits (because 2^7 = 128 cells)
ALU result (64-bit byte address)
|
v
bits [9:3] <- splitter: take bits 3..9 (drop low 3, keep 7)
|
v (7 bits = DW address)
RAM "A" (ADDR) input
In Digital you implement this with a splitter: input is the 64-bit byte address, and you wire out bits 3 through 9 as a 7-bit bus into the RAM A input. (Bits 0–2 are the byte offset within a doubleword and are ignored for a ld/sd, which must be 8-byte aligned.)
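The splitter's behavior can be expressed as a shift and a mask. The Python sketch below is only a model of that wiring; `byte_to_dw` and `DW_ADDR_BITS` are illustrative names, not Digital identifiers.

```python
DW_ADDR_BITS = 7                      # 2^7 = 128 cells


def byte_to_dw(byte_addr: int) -> int:
    """Model the splitter: keep bits 3..9 of the byte address."""
    return (byte_addr >> 3) & ((1 << DW_ADDR_BITS) - 1)


print(byte_to_dw(16))     # 2
print(byte_to_dw(1016))   # 127
print(byte_to_dw(1024))   # 0  (bit 10 is dropped, so the index wraps)
```

The last case is worth noticing when debugging: because only bits 3..9 reach the RAM, an out-of-range byte address does not fail, it aliases onto a valid cell.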
The RAM component and its ports¶
We use Digital's RAM (Separated Ports) component, configured with data bits = 64 and address bits = 7. Its ports, as drawn in the notes:
| Port | Width | Direction | Purpose |
|---|---|---|---|
| A (ADDR) | 7 | in | DW address to read/write |
| Din | 64 | in | data to write (for store) |
| Dout | 64 | out | data read at A |
| str | 1 | in | store enable (write) |
| ld | 1 | in | load enable (read) |
| clk | 1 | in | clock |
Important: keep the RAM at the top level of your processor circuit. That is the only way to open and inspect the RAM contents during simulation, which is invaluable for debugging the stack.
Routing the load result back: the M2R / WDsel MUX¶
For a load, the value read from Dout must be written into the register file (t0 in ld t0, 8(sp)). But the register file's write-data line (WD) is already driven by the existing WDsel MUX, which selects between the ALU result (for arithmetic) and PC+4 (for jal). So we need another MUX input to choose the write-back source.
The guide gives two options, both shown in the notes as the "RegFile WDsel MUX":
- Expand the WDsel MUX to add a third input (the RAM Dout), controlled by a wider WDsel (2-bit) selector.
- Add a new two-input M2R (Memory-to-Register) MUX that chooses between the ALU result and RAM Dout based on a new M2R control line, then feed its output into the existing WDsel MUX.
Either way the principle from Section 4 holds: the RAM output goes on a new MUX input; the ALU result stays where it was.
                         +-----------+
ALU result ------------> | 0         |
                         |  M2R MUX  |--+
RAM Dout (load) -------> | 1         |  |     +-----------+
                         +-----------+  +---> | 0         |
                               ^              | WDsel MUX |--> RegFile WD
                              M2R     PC+4 -> | 1         |
                                              +-----------+
                                                    ^
                                                  WDsel
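The two-MUX arrangement (the second option) can be modeled as two nested selections. The Python below is a behavioral sketch under the assumption, taken from the diagram, that PC+4 sits on input 1 of the WDsel MUX; the function names are hypothetical.

```python
def mux2(sel: int, in0: int, in1: int) -> int:
    """Two-input MUX: sel=0 picks in0, sel=1 picks in1."""
    return in1 if sel else in0


def regfile_write_data(alu: int, ram_dout: int, pc_plus_4: int,
                       m2r: int, wdsel: int) -> int:
    """M2R MUX feeding input 0 of the WDsel MUX."""
    m2r_out = mux2(m2r, alu, ram_dout)
    return mux2(wdsel, m2r_out, pc_plus_4)


# arithmetic: m2r=0, wdsel=0 -> ALU result
print(regfile_write_data(alu=7, ram_dout=99, pc_plus_4=0x14, m2r=0, wdsel=0))  # 7
# ld: m2r=1, wdsel=0 -> RAM Dout
print(regfile_write_data(alu=7, ram_dout=99, pc_plus_4=0x14, m2r=1, wdsel=0))  # 99
```

Because the ALU result stays on input 0 of the new MUX, every instruction that leaves M2R at 0 behaves exactly as it did before the RAM was added.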
Putting the ld/sd datapath together¶
flowchart LR
RD0["RD0 (base addr)"] --> ALU
IMM["imm (offset)"] --> ALU
ALU["ALU (add)"] --> CONV["byte->DW: bits[9:3]"]
CONV --> A["RAM A (7-bit DW addr)"]
RD1["RD1 (value to store)"] --> DIN["RAM Din (64)"]
A --> RAM["RAM 64x128"]
DIN --> RAM
RAM --> DOUT["RAM Dout (64)"]
DOUT --> M2R["M2R / WDsel MUX"]
M2R --> WD["RegFile WriteData"]
Control lines for ld and sd¶
Two RAM control inputs drive the operation: ld (read enable) and str (write enable). Plus the write-back selector (WDsel, 2 bits, or M2R, 1 bit). The handwritten control-line table:
| inst | WDsel (2) or M2R (1) | ld | str |
|---|---|---|---|
| ld | select RAM Dout | 1 | 0 |
| sd | (don't care: no write to RF) | 0 | 1 |
So a ld reads the RAM (ld=1, str=0) and routes Dout to the register file; an sd writes the RAM (ld=0, str=1) and does not write the register file (RFW=0, so WDsel/M2R is a don't-care). These are exactly the new control-output columns you add to the decoder spreadsheet in Step 4 of the recipe.
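To see how such rows could become decoder ROM contents, here is a small Python sketch that packs the control values into one word per instruction. The control values come from the table above; the bit positions in `BIT` are an assumption made up for this example, since the real layout is whatever column order your own spreadsheet uses.

```python
# Control values for the two new instructions, taken from the table above.
# RFW (register-file write) and M2R use the names from the text;
# None marks a don't-care.
CONTROL = {
    "ld": {"RFW": 1, "M2R": 1,    "ld": 1, "str": 0},
    "sd": {"RFW": 0, "M2R": None, "ld": 0, "str": 1},
}

# Hypothetical bit positions within a decoder ROM word (an assumption).
BIT = {"RFW": 0, "M2R": 1, "ld": 2, "str": 3}


def rom_word(inst: str) -> int:
    """Pack one instruction's control values into an integer ROM word."""
    word = 0
    for name, value in CONTROL[inst].items():
        word |= (value or 0) << BIT[name]     # don't-care is emitted as 0
    return word


for inst in CONTROL:
    print(inst, hex(rom_word(inst)))
```

Emitting don't-cares as 0 matches the convention that 0 is the no-op value for a control column.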
6. lw / sw and lb / sb: Sub-Doubleword Access¶
Once ld/sd work, we extend to smaller sizes: word (lw/sw, 32 bits) and byte (lb/sb, 8 bits). The challenge: our RAM cell is 64 bits, but a word is only half a cell and a byte is one-eighth. We must extract or insert the right slice after (for loads) or before (for stores) the RAM, keeping the RAM itself unchanged at 64-bit width and at the top level.
The handwritten page for lw/sw draws a 64-bit cell being split into two 32-bit halves and selected by a MUX — that is the load path.
Word addressing inside a doubleword¶
A 64-bit doubleword holds two 32-bit words. Given a byte address:
- bits 0..1 = byte index within a word
- bit 2 = word index within the doubleword (0 = lower word, 1 = upper word)
- bits 3.. = doubleword (cell) index
So once we read the 64-bit cell (using bits [9:3] for the DW address as before), bit 2 of the byte address tells us which 32-bit half we want.
64-bit cell (one RAM doubleword)
+---------------------+---------------------+
| upper word (32) | lower word (32) |
| bits 63..32 | bits 31..0 |
+---------------------+---------------------+
word index 1 word index 0
^
selected by byte-address bit 2
Load word (lw) — extract then sign-extend¶
1. ALU computes target byte address (4-byte aligned for lw).
2. Splitter bits[9:3] -> DW address -> RAM A; RAM ld=1.
3. Split RAM Dout (64) into W0 = bits[31:0] and W1 = bits[63:32].
4. MUX selects W0 or W1 using byte-address bit 2 (2-2) as selector.
5. Sign-extend the chosen 32-bit value to 64 bits.
6. A memory-size (MSZ) MUX selects the final 64-bit value:
lb -> 8-bit sign-extended to 64
lw -> 32-bit sign-extended to 64
ld -> full 64-bit value
7. That value goes to the M2R / WDsel MUX -> RegFile.
Following the RISC-V funct3 ordering, the MSZ (memory size) encodings are: lb = 0b00, lw = 0b10, ld = 0b11.
flowchart LR
DOUT["RAM Dout (64)"] --> SPL["split: W0[31:0], W1[63:32]"]
SPL --> WMUX["word MUX (sel = byte addr bit 2)"]
WMUX --> SX["sign-extend 32 -> 64"]
SX --> MSZ["MSZ MUX (lb / lw / ld)"]
DOUT --> MSZ
MSZ --> M2R["to M2R / WDsel MUX -> RegFile"]
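The load path can be modeled behaviorally in Python. This is a sketch, not the circuit: `load` and `sign_extend` are illustrative names, the MSZ constants use the encodings given above, and byte 0 is taken to be the least-significant byte of the cell, consistent with word index 0 being bits 31..0.

```python
MASK64 = (1 << 64) - 1
MSZ_LB, MSZ_LW, MSZ_LD = 0b00, 0b10, 0b11     # encodings from the text


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value to a 64-bit pattern."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):                   # sign bit set
        value -= 1 << bits
    return value & MASK64


def load(cell: int, byte_addr: int, msz: int) -> int:
    """Model the load path after RAM Dout for lb / lw / ld."""
    if msz == MSZ_LD:
        return cell & MASK64
    if msz == MSZ_LW:
        word_index = (byte_addr >> 2) & 1     # byte-address bit 2
        return sign_extend(cell >> (32 * word_index), 32)
    if msz == MSZ_LB:
        byte_index = byte_addr & 0b111        # byte-address bits 2..0
        return sign_extend(cell >> (8 * byte_index), 8)
    raise ValueError("unsupported MSZ encoding")


cell = 0x8000_0001_0000_0002
print(hex(load(cell, 0, MSZ_LW)))   # 0x2
print(hex(load(cell, 4, MSZ_LW)))   # 0xffffffff80000001
```

The two calls read the same cell; only byte-address bit 2 changes which half is selected before sign extension.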
Store word (sw) — read-modify-write¶
Storing a word is harder than loading because we can only write a whole 64-bit cell, but we want to change just one 32-bit half and leave the other half untouched. The answer is the read-modify-write pattern the session introduced:
- Read the current 64-bit cell (D64cur) at the target DW address.
- Modify only the selected 32-bit half, inserting the new word (Wnew from RD1).
- Write the merged 64-bit value back to the same cell.
Because we read and write the same cell in one clock cycle, you must set both RAM control lines: ld = 1 and str = 1.
The splitter/merger construction from the guide:
D64cur (current cell) --split--> W0 (bits 31..0), W1 (bits 63..32)
D64in (RD1) --split--> Wnew (low 32 bits we want to write)
Build two candidate 64-bit values with mergers:
option A (write lower word): { W1 : Wnew } keep upper, replace lower
option B (write upper word): { Wnew : W0 } keep lower, replace upper
word MUX selects A or B using byte-address bit 2
|
v
MSZ MUX selects sb / sw / sd value
|
v
RAM Din (with ld=1, str=1)
flowchart TD
CUR["RAM Dout = D64cur"] --> S1["split: W0, W1"]
RD1["RD1 = D64in"] --> S2["take Wnew = low 32"]
S1 --> M1["merge {W1:Wnew}"]
S2 --> M1
S1 --> M2["merge {Wnew:W0}"]
S2 --> M2
M1 --> WMUX["word MUX (sel = bit 2)"]
M2 --> WMUX
WMUX --> MSZ["MSZ MUX (sb/sw/sd)"]
MSZ --> DIN["RAM Din (str=1, ld=1)"]
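The merge for sw can be written out in Python to make the two candidate values concrete. This is a behavioral sketch of the splitter/merger construction above; `store_word` is an illustrative name, and the variable names follow the labels in the text.

```python
MASK32 = (1 << 32) - 1


def store_word(d64cur: int, rd1: int, byte_addr: int) -> int:
    """Return the 64-bit value to drive on RAM Din for sw."""
    w0 = d64cur & MASK32                 # lower word of the current cell
    w1 = (d64cur >> 32) & MASK32         # upper word of the current cell
    wnew = rd1 & MASK32                  # low 32 bits of the register
    option_a = (w1 << 32) | wnew         # {W1 : Wnew}, replace lower word
    option_b = (wnew << 32) | w0         # {Wnew : W0}, replace upper word
    word_index = (byte_addr >> 2) & 1    # byte-address bit 2
    return option_b if word_index else option_a


cur = 0xAAAA_AAAA_BBBB_BBBB
print(hex(store_word(cur, 0x1111_1111, 0)))   # 0xaaaaaaaa11111111
print(hex(store_word(cur, 0x1111_1111, 4)))   # 0x11111111bbbbbbbb
```

Both candidates are always computed, just as both mergers are always wired in the circuit; the word MUX only decides which one reaches Din.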
lb / sb by analogy¶
Byte access works the same way at finer granularity. For lb, use byte-address bits 0..2 to select one of the eight bytes in the cell, then sign-extend 8 -> 64. For sb, read-modify-write the chosen byte, keeping the other seven. The lecture noted that once you have lw/sw working, lb/sb follow the same structure with a byte-selecting MUX instead of a word-selecting one.
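The byte case follows the same pattern with a 3-bit selector. The Python sketch below assumes byte 0 is the least-significant byte of the cell (consistent with word index 0 being bits 31..0); `load_byte` and `store_byte` are illustrative names.

```python
def load_byte(cell: int, byte_addr: int) -> int:
    """lb: pick one of eight bytes, then sign-extend 8 -> 64."""
    index = byte_addr & 0b111                     # byte-address bits 2..0
    byte = (cell >> (8 * index)) & 0xFF
    if byte & 0x80:
        byte |= 0xFFFF_FFFF_FFFF_FF00             # replicate the sign bit
    return byte


def store_byte(d64cur: int, rd1: int, byte_addr: int) -> int:
    """sb: replace one byte of the current cell, keep the other seven."""
    shift = 8 * (byte_addr & 0b111)
    cleared = d64cur & ~(0xFF << shift) & 0xFFFF_FFFF_FFFF_FFFF
    return cleared | ((rd1 & 0xFF) << shift)


cell = store_byte(0x0102_0304_0506_0708, 0xFE, 1)
print(hex(cell))                    # 0x10203040506fe08
print(hex(load_byte(cell, 1)))      # 0xfffffffffffffffe
```

In hardware the mask-and-or in `store_byte` becomes eight merged candidates and an eight-input MUX, the byte-sized analogue of the two-candidate word merge.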
Why a 64-bit value, not just 32¶
The notes' rightmost sketch reinforces the abstraction: a word is conceptually one unit, but in our 64-bit memory it lives inside a 64-bit cell. When we load a word we get back a 64-bit value (the cell), pick the relevant 32 bits, and sign-extend back to a full 64-bit register value — because RISC-V registers are 64 bits wide and signed loads sign-extend.
7. Memory Sizes, Alignment, and Sign Extension Recap¶
A consolidated view of the size hierarchy our memory subsystem must handle:
| Mnemonic | Size | Bits | Alignment (byte addr multiple of) | Selector inside cell |
|---|---|---|---|---|
| lb/sb | byte | 8 | 1 | byte-addr bits 2..0 |
| lw/sw | word | 32 | 4 | byte-addr bit 2 |
| ld/sd | doubleword | 64 | 8 | none (whole cell) |
Key invariants to keep straight:
- Addresses in registers are always byte addresses. Every conversion to a cell index happens just before the RAM A input.
- The DW (cell) index = byte address >> 3 = byte-address bits [9:3] for our 128-cell RAM.
- Loads sign-extend the loaded value to 64 bits (these are the signed lb/lw/ld). To sign-extend an n-bit value to 64 bits you replicate the top (sign) bit. In bit-shift terms: shift the value all the way left, then shift-right-arithmetic all the way back (the same trick from Lab 02).
- Stores below 64 bits require read-modify-write because the RAM is written one full cell at a time.
Sign extension example (8 -> 64), the lb case:
byte read = 0b1111_1110 (= -2 as a signed byte)
sign bit = 1
result = 0xFFFF_FFFF_FFFF_FFFE (= -2 as a signed 64-bit value)
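The shift-left, shift-right-arithmetic trick can be tried out in Python. Python integers are unbounded, so this sketch has to reinterpret the shifted value as a signed 64-bit number by hand before the right shift; `sign_extend_shift` is an illustrative name.

```python
MASK64 = (1 << 64) - 1


def sign_extend_shift(value: int, bits: int) -> int:
    """Sign-extend by shifting left, then arithmetic-shifting right."""
    shift = 64 - bits
    x = (value << shift) & MASK64          # move the sign bit to bit 63
    if x >> 63:                            # reinterpret as signed 64-bit
        x -= 1 << 64
    return (x >> shift) & MASK64           # Python >> on ints is arithmetic


print(hex(sign_extend_shift(0b1111_1110, 8)))   # 0xfffffffffffffffe
```

In the Digital circuit no shifting is needed: a sign-extender simply copies the top bit of the narrow value into every higher bit.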
Key Concepts¶
| Concept | Definition | Example |
|---|---|---|
| Standalone program | A RISC-V program wrapped to run on the bare processor: assembly main, no .global, jal not call, unimp end marker | main: li sp,1024; jal swap_s; unimp |
| unimp end marker | Instruction the processor recognizes to stop executing | terminates the program after the last call returns |
| Byte address | An address that counts individual bytes; what all registers hold | sp + 8 from ld t0, 8(sp) |
| Doubleword (DW) address | A cell index into 64-bit-wide RAM = byte address >> 3 | byte addr 16 -> DW addr 2 |
| byte->DW conversion | Drop low 3 bits (÷8) and keep enough high bits for the RAM | splitter taking bits [9:3] (7 bits) |
| RAM data/addr bits | Cell width and number of cells; capacity = 2^addr × (data/8) bytes | 64 data bits, 7 addr bits → 1024 bytes |
| M2R / WDsel MUX | MUX that selects the register write-back source (ALU result vs. RAM Dout) | M2R=1 routes Dout to WD for ld |
| ld / str lines | RAM read-enable and write-enable control lines | ld: ld=1,str=0; sd: ld=0,str=1 |
| Read-modify-write | Read a 64-bit cell, replace a sub-field, write it back | sw/sb set both ld=1 and str=1 |
| Word index bit | Byte-address bit 2 selects which 32-bit half of a cell | bit2=0 lower word, bit2=1 upper word |
| MSZ (memory size) | Control selecting byte/word/doubleword path | lb=0b00, lw=0b10, ld=0b11 |
Practice Problems¶
Problem 1: Size the RAM and place the stack pointer¶
You want 2 KiB (2048 bytes) of stack memory in a RAM with 64 data bits. How many address bits does the RAM need, how many cells does it have, and what value should li sp, ... use to put sp at the top?
Solution
Each cell is 64 bits = 8 bytes = `2^3` bytes, and 2048 bytes = `2^11` bytes, so the RAM needs `2^11 / 2^3 = 2^8 = 256` cells. So configure the RAM with **data bits = 64, address bits = 8** (256 cells). The stack grows downward, so initialize `sp` to the top byte address with `li sp, 2048`. The first push (`addi sp, sp, -16`) then lands inside the valid range `0..2047`.
Problem 2: Convert a byte address to a DW address¶
A program executes ld t0, 24(sp) where sp = 1000. What is the target byte address, and what DW (cell) index goes into the RAM A input for our 7-bit-address, 128-cell RAM?
Solution
The target byte address is `sp + 24 = 1000 + 24 = 1024`. Note that `1024` is one past the top of a 1024-byte RAM, so in practice `sp` would be lower; treat this as pure arithmetic. The DW index is the byte address divided by 8 (shift right 3): `1024 >> 3 = 128`. `128` does not fit in 7 bits (valid DW indices are `0..127`), which signals an out-of-range access; a splitter that keeps only bits `[9:3]` would drop bit 10 and silently wrap to cell `0`. For an in-range example, `ld t0, 16(sp)` with `sp = 1000` gives byte addr `1016`, DW addr `1016 >> 3 = 127`, the last valid cell. The splitter wires byte-address bits `[9:3]` to the 7-bit `A` input; bits `0..2` (here `0`) are ignored because `ld` is 8-byte aligned.
Problem 3: Control-line table¶
Fill in the RAM control lines (ld, str) and whether the register file is written (RFW) for each of: ld, sd, lw, sw.
Solution
| inst | `ld` (read) | `str` (write) | `RFW` | notes |
|------|-------------|---------------|-------|-------|
| `ld` | 1 | 0 | 1 | read cell, write result to RF |
| `sd` | 0 | 1 | 0 | write whole cell, no RF write |
| `lw` | 1 | 0 | 1 | read cell, extract+sign-ext word to RF |
| `sw` | 1 | 1 | 0 | read-modify-write: read cell, merge word, write back |

The surprise is `sw`: because we change only 32 of the 64 bits, we must **read** the current cell *and* **write** the merged value in the same cycle, so both `ld=1` and `str=1`.
Problem 4: Why read-modify-write?¶
The RAM is 64 bits wide. Explain why sw t0, 0(sp) cannot simply write t0's low 32 bits into the RAM, and describe the merge for storing into the upper word of a cell whose current value is 0xAAAA_AAAA_BBBB_BBBB with t0 = 0x1111_1111.
Solution
The RAM has no way to write *just* 32 bits: its `Din` is a full 64-bit cell and a write replaces the entire cell. If we wrote only the low 32 bits, the upper 32 bits would have to come from somewhere; whatever we drove on the rest of `Din` would clobber the other word in the cell. So we must preserve the half we are not changing. For the upper word (byte-address bit 2 = 1) we keep the **lower** word `W0 = 0xBBBB_BBBB` and replace the **upper** word with `Wnew = 0x1111_1111`, giving `{ Wnew : W0 } = 0x1111_1111_BBBB_BBBB`. This merged value is written back with `ld=1, str=1` in the same cycle. The lower word `0xBBBB_BBBB` is preserved; only the upper word changed.
Problem 5: Trace a standalone program¶
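Suppose swap_s is written as it was for the emulator: it is declared with .global swap_s, has no main in front of it, and ends with ret. What is wrong with running that program as-is on the Digital processor, and how do you fix it so it runs standalone?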
Solution
Four problems: there is no `main` to set up arguments and start execution; it uses `.global`; there is no end marker; and `ra` is never initialized, so `ret` has nowhere valid to return to. Fixed standalone version:
main:
li sp, 1024 # top of RAM; gives swap_s a place to read/write
li a0, 992 # address of first value on the stack
li a1, 1000 # address of second value
jal swap_s # link return into ra, jump
unimp # end marker -- processor halts here
swap_s:
ld t0, 0(a0)
ld t1, 0(a1)
sd t1, 0(a0)
sd t0, 0(a1)
ret # jalr to ra -> back into main, then unimp
Problem 6: Apply the five-step recipe¶
You are about to add lw/sw to a processor that already supports ld/sd. Walk through the five-step recipe and name the concrete change at each step.
Solution
1. **Pick**: choose the group `lw` + `sw` (related word-size memory ops).
2. **Components**: add a 32-bit sign-extender for the load path; reuse the existing RAM (no width change).
3. **Datapath**: on the load side, split `Dout` into two 32-bit halves and add a word-select MUX (selector = byte-addr bit 2), feed into a new MSZ MUX. On the store side, add the read-modify-write merge (splitters + mergers + word-select MUX + MSZ MUX) before `Din`. All new sources go on *new* MUX inputs.
4. **Decoder**: add `lw` and `sw` rows to the spreadsheet; add/extend the `MSZ` control column (`lb=0b00, lw=0b10, ld=0b11`); set `sw` to `ld=1, str=1` (read-modify-write) and `lw` to `ld=1, str=0`. Regenerate the ROM hex and widen the priority encoder and ROM output split.
5. **Test**: run `grade test -p project06` in the relevant directory; single-step a failing word load/store with probes on the word-select MUX, the sign-extender, and `Din`/`Dout`.
Further Reading¶
- Processor Guide Part 1: /guides/processor-part-1/ (major components, ALU, register file)
- Processor Guide Part 2: /guides/processor-part-2/ (register/immediate/instruction decoders and the spreadsheet methodology)
- Processor Guide Part 3: /guides/processor-part-3/ (conditional branching and data memory; the source for the ld/sd, lw/sw design above)
- Project 6 spec: /assignments/project06/
- Instructor handwritten notes: /notes/CS315-01 2025-11-11 Progs Data Memory.pdf
- RISC-V Instruction Set Manual, Volume I (load/store, immediates)
- Digital simulator documentation
Summary¶
- Standalone packaging: to run a function on the bare processor, write an assembly main, remove .global, use jal instead of call, and end with unimp.
- Stack pointer init depends on RAM size: with 64 data bits and 7 address bits the RAM holds 2^7 × 8 = 1024 bytes, so li sp, 1024 places sp at the top of the downward-growing stack.
- Capacity arithmetic: total bytes = 2^(addr bits) × (data bits / 8); choose addr bits to get the stack size you need.
- Five-step recipe for every new instruction: pick it, add/modify components, extend the datapath (usually new MUX inputs), update the decoder spreadsheet and ROM, then test, always incrementally.
- Byte vs. DW address: registers hold byte addresses; the RAM is indexed by doubleword. Convert with a right-shift-by-3 (a splitter taking bits [9:3] for our 128-cell RAM).
- ld/sd datapath: the ALU computes base + offset; loads route RAM Dout back to the register file through the M2R/WDsel MUX (ld=1, str=0); stores write the whole cell (ld=0, str=1).
- Sub-doubleword access: lw/lb extract a slice with the word-index bit (byte-addr bit 2) and sign-extend to 64 bits; sw/sb need read-modify-write (ld=1 and str=1) to change part of a cell without clobbering the rest.
- Build and debug incrementally: keep part1–final directories and a four-sheet decoder spreadsheet, keep the RAM at the top level so you can inspect it, and use single-stepping and probes against an objdump to find faults.