# Lab: Data Memory (LW and SW)

## CS 315 Computer Architecture

---

## What We Are Adding Today

Our single-cycle processor already handles:

- Arithmetic, logic, shifts
- Jumps, branches
- Register reads and writes

**Missing:** the ability to read from and write to **data memory**

Today: add a RAM component and wire `ld`/`sd`, then `lw`/`sw`

---

## RISC-V Load / Store Instructions

| Mnemonic | Size | Action |
|----------|------|--------|
| `ld` / `sd` | 64-bit | `rd = mem[rs1+imm]` / `mem[rs1+imm] = rs2` |
| `lw` / `sw` | 32-bit | load sign-extended word / store low 32 bits |
| `lb` / `sb` | 8-bit | load sign-extended byte / store low byte |

**Lab strategy:** get `ld`/`sd` working first, then add wrapper logic for `lw`/`sw`.

---

## High-Level Datapath

flowchart LR RF["Register File\nbase addr / store val"] --> ALU["ALU\nTA = base + imm"] IMM["ImmDecoder\nimm-i / imm-s"] --> ALU ALU --> SP["splitter\nbyte to DW addr"] SP --> RAM["RAM\n64-bit cells"] RF --> RAM RAM --> LL["Load logic\nselect + sign-ext"] LL --> MUX["Write-back MUX\nM2R / WDsel"] MUX --> RF

---

## The Digital RAM Component

Use **RAM, Separated Ports** in Digital

| Setting | Value | Why |
|---------|-------|-----|
| **Data bits** | 64 | One cell = one RISC-V doubleword |
| **Addr bits** | 7 | 2^7 = 128 cells × 8 bytes = **1024 bytes** |

| Pin | Direction | Meaning |
|-----|-----------|---------|
| `A` | in | 7-bit doubleword address |
| `Din` | in | 64-bit value to write |
| `Dout` | out | 64-bit value read |
| `str` | in | write enable (`MST`) |
| `ld` | in | read enable (`MLD`) |
| `clk` | in | clock |

---

## Keep the RAM at the Top Level

**Critical lab tip:** place the RAM component at the **top level** of your processor circuit, not inside a sub-circuit. This is the only way Digital lets you open the RAM during simulation to inspect and pre-load its contents.

- Put **load logic after** the RAM
- Put **store logic before** the RAM
- The RAM itself stays exposed at the top level

---

## Byte Addresses vs. Doubleword Addresses

Registers always hold **byte addresses**. The RAM is indexed by **doubleword (DW) addresses**.

```text
DW address = byte address / 8 = byte address >> 3
```

In hardware: use a **splitter** to extract bits `[9:3]` of the 64-bit ALU output.

```text
bit:  ... 9 8 7 6 5 4 3 | 2 | 1 0
          \_ DW addr _/ |wrd|byte
            (7 bits)    |sel|sel
            -> RAM A
```

**Common mistake:** shifting by 2 (word divide) instead of 3 (doubleword divide). The RAM holds **64-bit** cells, so the address must be a **doubleword** address: take bits `[9:3]`.
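
The address split above can be modelled in a few lines of Python. This is a minimal sketch and not part of the lab circuit: `split_address` is a hypothetical helper name, and its masks simply mirror the `[9:3]`, `[2]`, and `[1:0]` bit fields described on this slide.

```python
def split_address(ta: int) -> tuple[int, int, int]:
    """Split a byte address TA into (dw_addr, word_sel, byte_sel)."""
    dw_addr = (ta >> 3) & 0x7F   # TA[9:3]: 7-bit RAM cell index
    word_sel = (ta >> 2) & 0x1   # TA[2]: 0 = lower word, 1 = upper word
    byte_sel = ta & 0x3          # TA[1:0]: byte within the selected word
    return dw_addr, word_sel, byte_sel


print(split_address(1016))  # (127, 0, 0)
print(split_address(28))    # (3, 1, 0)

# The shift-by-2 mistake: byte address 16 lands in cell 4 instead of cell 2.
print((16 >> 2) & 0x7F, (16 >> 3) & 0x7F)  # 4 2
```

The last line shows why the wrong shift is easy to miss: the circuit still produces a valid 7-bit address, just not the right one.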

---

## Memory Layout: Three Views

```text
byte addr   word addr   dword addr
---------   ---------   ----------
16..19      4           2  (lower word, TA[2] = 0)
12..15      3           1  (upper word, TA[2] = 1)
 8..11      2           1  (lower word, TA[2] = 0)
 4..7       1           0  (upper word, TA[2] = 1)
 0..3       0           0  (lower word, TA[2] = 0)
```

Word address = byte address >> 2; doubleword address = byte address >> 3.

**Selector bits inside each 64-bit cell:**

- Bit 2 = **word selector** (0 → lower 32 bits, 1 → upper 32 bits)
- Bits [1:0] = **byte selector** within a word

---

## Step 1: Implementing `ld` / `sd`

For 64-bit loads/stores the RAM cell **is** the doubleword, so no sub-word logic is needed.

**Datapath connections:**

1. ALU computes `TA = base + imm`
2. Splitter extracts `TA[9:3]` → RAM `A` (7 bits)
3. For `sd`: `RD1` → `Din`; assert `MST = 1`
4. For `ld`: assert `MLD = 1`; route `Dout` → write-back MUX

flowchart LR RD0["RD0 (base)"] --> ALU["ALU"] IMM["imm-i / imm-s"] --> ALU ALU -->|"TA (64-bit)"| SP["splitter\nTA[9:3]"] SP -->|"A (7-bit)"| RAM["RAM"] RD1["RD1 (store val)"] -->|"Din"| RAM RAM -->|"Dout (64-bit)"| MUX["M2R MUX"] ALU --> MUX MUX --> WD["WDsel MUX\n→ RegFile WD"]
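
The Step 1 connections can be checked against a small software model before wiring anything. The sketch below is an assumption-laden stand-in for the circuit: `DataRAM` and its `sd`/`ld` methods are hypothetical names (not Digital's API), and the clock edge is modelled as an immediate list write.

```python
MASK64 = (1 << 64) - 1


class DataRAM:
    """Hypothetical model of the 128-cell, 64-bit RAM (not Digital's API)."""

    def __init__(self):
        self.cells = [0] * 128           # 2^7 cells of 64 bits each

    def sd(self, base: int, imm: int, value: int) -> None:
        ta = (base + imm) & MASK64       # ALU: TA = base + imm
        a = (ta >> 3) & 0x7F             # splitter: TA[9:3] -> RAM A
        self.cells[a] = value & MASK64   # MST = 1: Din written to the cell

    def ld(self, base: int, imm: int) -> int:
        ta = (base + imm) & MASK64
        a = (ta >> 3) & 0x7F
        return self.cells[a]             # MLD = 1: Dout -> M2R MUX


ram = DataRAM()
ram.sd(64, 8, 0x1122334455667788)        # TA = 72, cell 9
print(hex(ram.ld(64, 8)))                # 0x1122334455667788
print(ram.cells.index(0x1122334455667788))  # 9
```

A store followed by a load at the same `base + imm` should return the stored value, which is the same round trip the lab asks you to test in hardware.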

---

## The M2R Write-Back MUX

Loads must write **memory output** into the register file, not the ALU result.

flowchart LR ALU["ALU result"] --> M2R{"M2R MUX\nsel = M2R"} RAM["RAM Dout"] --> M2R M2R --> WD["WDsel MUX"] PC4["PC + 4"] --> WD WD --> RF["RegFile WD"]

- `M2R = 0` → use ALU result (arithmetic instructions)
- `M2R = 1` → use RAM `Dout` (any load)

---

## Worked Example: `sd` then `ld`

```asm
li sp, 1024      # sp = 1024 (byte addr)
li t0, 0xABCD    # value to store
sd t0, -8(sp)    # mem[1016] = t0
ld t1, -8(sp)    # t1 = mem[1016]
```

Trace of `sd t0, -8(sp)`:

| Step | Value |
|------|-------|
| base = `sp` | `1024` |
| imm-s | `-8` |
| `TA = 1024 + (-8)` | `1016` |
| DW addr = `1016 >> 3` | `127` → `A = 0b1111111` |
| `Din = t0` | `0xABCD` |
| `MST` | `1` (write on clock edge) |

The matching `ld` recomputes the same address, asserts `MLD`, and `0xABCD` flows back into `t1`.

---

## Step 2: Load Word (`lw`), Load Logic After RAM

The RAM returns a full 64-bit cell; `lw` wants only **32 bits, sign-extended**.

**Four sub-steps after `Dout`:**

1. Read the 64-bit cell (same as `ld`, assert `MLD`)
2. **Split** `Dout` into `W0 = Dout[31:0]` and `W1 = Dout[63:32]`
3. **Select** with `TA` bit 2 → 2-input MUX (bit 2 = 0: lower, bit 2 = 1: upper)
4. **Sign-extend** 32 → 64 bits

---

## Load Logic Diagram

flowchart TD RAM["RAM Dout (64)"] --> SPLIT["Splitter"] SPLIT --> W0["W0 = Dout[31:0]"] SPLIT --> W1["W1 = Dout[63:32]"] W0 --> WS{"Word-select MUX\nsel = TA bit 2"} W1 --> WS WS --> SX["Sign-extend 32 to 64"] SX --> MSZ{"MSZ MUX"} RAM --> MSZ MSZ --> OUT["to M2R / WDsel"]

`MSZ` selects load width: `00` = byte, `10` = word, `11` = dword

---

## Why Sign-Extend?

`lw` loads a **signed** 32-bit value into a 64-bit register. If bit 31 of the word is 1 (negative), the upper 32 bits of the register must all be 1.

| Word value | Bit 31 | 64-bit result |
|------------|--------|---------------|
| `0x00000007` | 0 | `0x0000000000000007` |
| `0xFFFF8000` | 1 | `0xFFFFFFFFFFFF8000` |

The sign extender copies bit 31 into bits `[63:32]`.

---

## Worked Example: `lw` Word Selection

Cell at DW addr 0 holds `0xFFFF8000_00000007`.

| Instruction | `TA` | bit 2 | Selected word | 64-bit result |
|-------------|------|-------|---------------|---------------|
| `lw rd, 0(x0)` | `0` | `0` | `0x00000007` | `0x0000000000000007` |
| `lw rd, 4(x0)` | `4` | `1` | `0xFFFF8000` | `0xFFFFFFFFFFFF8000` |

**Bug check:** if `lw` returns wrong values for offsets 4, 12, 20 but correct values for 0, 8, 16, the word-select MUX is using the wrong selector bit (it should be `TA[2]`).
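
The `lw` load logic (split, word-select, sign-extend) can be written out as a reference model to compare against your circuit's output. This is a sketch under stated assumptions: `sign_extend32` and `load_word` are hypothetical helper names, and results are returned as unsigned 64-bit patterns.

```python
def sign_extend32(word: int) -> int:
    """Copy bit 31 of a 32-bit word into bits [63:32]."""
    word &= 0xFFFFFFFF
    if word & 0x80000000:
        return word | 0xFFFFFFFF00000000
    return word


def load_word(dout: int, ta: int) -> int:
    """Model of the load logic after the RAM for lw."""
    w0 = dout & 0xFFFFFFFF              # W0 = Dout[31:0]
    w1 = (dout >> 32) & 0xFFFFFFFF      # W1 = Dout[63:32]
    word = w1 if (ta >> 2) & 1 else w0  # word-select MUX, sel = TA[2]
    return sign_extend32(word)


cell = 0xFFFF8000_00000007
print(hex(load_word(cell, 0)))  # 0x7
print(hex(load_word(cell, 4)))  # 0xffffffffffff8000
```

The two printed values match the rows of the worked example table, so a mismatch in simulation points at the splitter, the MUX selector, or the sign extender.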

---

## Step 3: Store Word (`sw`), Store Logic Before RAM

`sw` must change only **32 bits** and leave the other 32 unchanged. The solution: **read-modify-write** in a single clock cycle.

**Key control trick:** for `sw`, assert **both** `str = 1` and `ld = 1`. With `ld = 1` the RAM presents the current cell on `Dout` (so we can preserve the unchanged half). With `str = 1` it writes our assembled `Din` on the same clock edge.

---

## Store Logic: Read-Modify-Write Recipe

1. Read current cell `D64cur` from `Dout` (because `ld = 1`)
2. Extract `Wnew = RD1[31:0]` (the value to store)
3. Split `D64cur` into `W0 = cur[31:0]` and `W1 = cur[63:32]`
4. Build two 64-bit candidates:
   - Replace lower: `[W1 | Wnew]`
   - Replace upper: `[Wnew | W0]`
5. Pick with `TA` bit 2 (0 → lower, 1 → upper)
6. `MSZ` store MUX selects final `Din`

---

## Store Logic Diagram

flowchart TD RD1["RD1 (store value)"] --> WN["Wnew = RD1[31:0]"] DOUT["RAM Dout = D64cur"] --> S2["Splitter"] S2 --> W0["W0 = cur[31:0]"] S2 --> W1["W1 = cur[63:32]"] WN --> M1["merge W1 and Wnew"] W1 --> M1 W0 --> M2["merge Wnew and W0"] WN --> M2 M1 --> DW{"DWn MUX\nsel = TA bit 2"} M2 --> DW DW --> MSZ{"MSZ store MUX"} RD1 --> MSZ MSZ --> DIN["RAM Din"]

---

## Worked Example: `sw` into Lower Half

Cell at DW addr 0: `0xDEADBEEF_11112222`

Execute `sw rs2, 0(x0)` with `rs2 = 0x00000099`

| Step | Value |
|------|-------|
| `TA = 0`, bit 2 = `0` | replace **lower** word |
| `W1 = cur[63:32]` (preserve) | `0xDEADBEEF` |
| `Wnew = rs2[31:0]` | `0x00000099` |
| assembled `Din` | `0xDEADBEEF_00000099` |

Only the lower 32 bits changed. The upper word is preserved because `ld = 1` drove `Dout`.

**Common mistake:** forgetting to assert `ld` during a store. If `ld = 0`, `Dout` is not driven and you clobber the neighboring word.
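
The read-modify-write recipe for `sw` can be expressed as one function that assembles `Din` from the current cell and the store value. This is a minimal sketch: `store_word_din` is a hypothetical name, and the final call models an undriven `Dout` as `0`, which is only an assumption about what the merge logic would see (the simulator may show a different value).

```python
def store_word_din(dout_cur: int, rd1: int, ta: int) -> int:
    """Assemble the 64-bit Din for sw from the current cell and RD1."""
    wnew = rd1 & 0xFFFFFFFF              # Wnew = RD1[31:0]
    w0 = dout_cur & 0xFFFFFFFF           # W0 = cur[31:0]
    w1 = (dout_cur >> 32) & 0xFFFFFFFF   # W1 = cur[63:32]
    replace_lower = (w1 << 32) | wnew    # [W1 | Wnew]
    replace_upper = (wnew << 32) | w0    # [Wnew | W0]
    return replace_upper if (ta >> 2) & 1 else replace_lower  # sel = TA[2]


cur = 0xDEADBEEF_11112222
print(hex(store_word_din(cur, 0x99, 0)))  # 0xdeadbeef00000099
print(hex(store_word_din(cur, 0x99, 4)))  # 0x9911112222

# Assumption: with ld = 0 the merge logic sees 0 instead of the real cell,
# so the upper word 0xDEADBEEF is lost.
print(hex(store_word_din(0, 0x99, 0)))    # 0x99
```

The first result matches the worked example; the last one illustrates how the neighboring word gets clobbered when the current cell is not read back.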

---

## New Control Lines

| Line | Width | Purpose | Set when... |
|------|-------|---------|-------------|
| `MLD` | 1 | RAM `ld`: drive `Dout` | any load **and** any sub-word store |
| `MST` | 1 | RAM `str`: write on edge | any store |
| `MSZ` | 2 | memory size select | loads and stores |
| `M2R` | 1 | write-back = memory (not ALU) | any load |

`MSZ` encoding follows RISC-V `funct3`:

| Op | `MSZ` |
|----|-------|
| `lb` / `sb` | `0b00` |
| `lw` / `sw` | `0b10` |
| `ld` / `sd` | `0b11` |

---

## Decoder Spreadsheet Rows

| Instr | `M2R` | `MLD` | `MST` | `MSZ` |
|-------|-------|-------|-------|-------|
| `ld` | 1 | 1 | 0 | 11 |
| `lw` | 1 | 1 | 0 | 10 |
| `sd` | 0 | 0 | 1 | 11 |
| `sw` | 0 | 1 | 1 | 10 |

**Note:** `sw` sets **both** `MLD = 1` and `MST = 1` (read-modify-write), while `sd` only needs `MST = 1` because it overwrites the whole 64-bit cell.
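
The decoder rows follow a simple rule that can be written down and checked. The sketch below is one way to express it, assuming `MSZ` is taken from the low two bits of `funct3` and covering only the signed loads and the stores listed in these slides; `mem_controls` is a hypothetical helper name, not part of the lab's decoder spreadsheet.

```python
def mem_controls(is_load: bool, is_store: bool, funct3: int) -> dict:
    """Derive the memory control lines for one load or store instruction."""
    msz = funct3 & 0b11                       # 00 = byte, 10 = word, 11 = dword
    sub_word_store = is_store and msz != 0b11
    return {
        "M2R": int(is_load),                    # write-back comes from memory
        "MLD": int(is_load or sub_word_store),  # read for loads and for RMW stores
        "MST": int(is_store),                   # write on the clock edge
        "MSZ": msz,
    }


# funct3 values from the RISC-V base ISA: w = 0b010, d = 0b011
print(mem_controls(True, False, 0b011))   # ld
print(mem_controls(True, False, 0b010))   # lw
print(mem_controls(False, True, 0b011))   # sd
print(mem_controls(False, True, 0b010))   # sw
```

Comparing the four printed dictionaries with the spreadsheet rows is a quick way to catch a transposed `MLD`/`MST` column.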

---

## Top-Level View

flowchart LR ALU["ALU TA (64, byte addr)"] --> SP["splitter\nTA[9:3]"] SP --> RAMA["RAM A (7, DW addr)"] subgraph STORE["Store logic"] RD1["RD1"] --> SL["read-modify-write\nMSZ store MUX"] end DOUT2["RAM Dout"] --> SL SL --> DIN["RAM Din"] RAMA --> RAM["RAM 64-bit\nstr=MST ld=MLD"] DIN --> RAM RAM --> DOUT["RAM Dout"] RAM --> DOUT2 subgraph LOAD["Load logic"] DOUT --> LL["word-select bit 2\nsign-extend + MSZ"] end LL --> M2R{"M2R MUX"} ALU --> M2R M2R --> WD["WDsel MUX\n→ RegFile WD"]

---

## Build Order

1. Add the RAM (64-bit data, 7 addr bits) at the **top level**; wire `CLK`
2. Add the **address splitter** (`TA[9:3]` → `A`)
3. Wire `ld`/`sd`: `RD1`→`Din`, `MST`→`str`, `MLD`→`ld`; add `M2R` MUX
4. **Test `ld`/`sd` end-to-end** with a stack program before continuing
5. Add **load logic** (split, word-select on bit 2, sign-extend, `MSZ` MUX) for `lw`
6. Add **store logic** (read-modify-write, DWn MUX on bit 2, `MSZ` MUX) for `sw`
7. Derive `lb`/`sb` using bits `[1:0]` as byte selector

---

## `lb` / `sb`: Same Pattern, Byte Granularity

The selector hierarchy mirrors the address bit structure:

```text
64-bit cell (selected by TA[9:3])
 |
 +-- TA bit 2          → picks 32-bit word (lw / sw)
      |
      +-- TA bits [1:0] → picks byte within word (lb / sb)
```

- **`lb`**: select byte with bits `[2:0]`, sign-extend 8 → 64, `MSZ = 0b00`
- **`sb`**: read-modify-write at byte granularity, preserve 7 bytes, `MSZ = 0b00`, assert both `str` and `ld`

---

## Key Concepts Summary Table

| Concept | Key Point |
|---------|-----------|
| **Byte address** | What registers hold; `TA = base + imm` |
| **DW address** | `TA >> 3`; goes to RAM `A` input |
| **Address splitter** | Takes `TA[9:3]` → 7 bits → RAM `A` |
| **Word selector** | `TA` bit 2: 0 = lower word, 1 = upper word |
| **Byte selector** | `TA` bits `[1:0]`: which of 4 bytes |
| **Sign extension** | Copies bit 31 (or bit 7) into upper bits |
| **Read-modify-write** | `sw`/`sb`: assert both `ld` and `str` |
| **`MSZ`** | `00`=byte, `10`=word, `11`=dword |

---

## Practice: Byte → DW Address

`ld t0, 16(s0)` with `s0 = 1000`

- `TA = 1000 + 16 = 1016`
- `DW = 1016 >> 3 = 127`
- `A = 0b1111111` (7 bits)

The 1024-byte RAM covers DW addresses 0–127 (byte addresses 0–1023). A stack initialized to 1024 should grow **downward** so accesses land in range.
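
A quick range check makes the stack advice concrete. The sketch below assumes the splitter wiring described in these slides (only `TA[9:3]` reaches the RAM); `in_range` and `ram_index` are hypothetical helper names. Under that assumption, an access at byte address 1024 does not fault but wraps around to cell 0.

```python
RAM_BYTES = 128 * 8  # 128 cells of 8 bytes each = 1024 bytes


def in_range(ta: int) -> bool:
    """True when the byte address falls inside the 1024-byte RAM."""
    return 0 <= ta < RAM_BYTES


def ram_index(ta: int) -> int:
    """Cell actually selected when only TA[9:3] is wired to RAM A."""
    return (ta >> 3) & 0x7F


for ta in (1016, 1024):
    print(ta, in_range(ta), ram_index(ta))
# 1016 True 127
# 1024 False 0
```

This is why the first stack slot is `-8(sp)` rather than `0(sp)` when `sp` starts at 1024.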

---

## Practice: `lw` Word Selection

Cell at DW addr 3 holds `0xCAFEF00D_DEADBEEF`

| `TA` | DW addr | bit 2 | Selected word | Result (after sign-ext) |
|------|---------|-------|---------------|-------------------------|
| 24 | `24>>3 = 3` | `0` | `0xDEADBEEF` | `0xFFFFFFFF_DEADBEEF` |
| 28 | `28>>3 = 3` | `1` | `0xCAFEF00D` | `0xFFFFFFFF_CAFEF00D` |

Both words have bit 31 = 1, so sign extension fills the upper half with `F`s.

---

## Practice: `sw` Read-Modify-Write

Cell at DW addr 2: `0x00000000_FFFFFFFF`

Execute `sw rs2, 16(x0)` with `rs2 = 0x80000001`

- `TA = 16`, `DW = 2`, bit 2 of 16 = **0** → replace lower word
- `W1 = 0x00000000` (preserve)
- `Wnew = 0x80000001`
- `Din = 0x00000000_80000001`

Upper word `0x00000000` is preserved because `ld = 1` drove `Dout`.

---

## Common Mistakes

| Mistake | Symptom | Fix |
|---------|---------|-----|
| Shift by 2 instead of 3 | Addresses are off by 2x | Use bits `[9:3]`, not `[8:2]` |
| Wrong word-select bit | `lw` correct at offsets 0, 8, 16 (lower words), wrong at 4, 12, 20 (upper words) | MUX selector must be `TA[2]` |
| Forgetting `ld` on `sw` | Neighboring word corrupted | Set both `MLD=1` and `MST=1` for `sw` |
| RAM inside a sub-circuit | Cannot inspect RAM during simulation | Keep RAM at top level |

---

## Summary

1. **64-bit Digital RAM** (data=64, addr=7) gives 1024 bytes; keep it at the top level
2. **Registers hold byte addresses**: shift right 3 (`TA[9:3]`) to get the 7-bit DW address for RAM `A`
3. **Low address bits are selectors**: bit 2 = word (lower/upper), bits `[1:0]` = byte within word
4. **`ld`/`sd` are straightforward**: ALU → splitter → RAM, `Dout` → `M2R` → register file
5. **`lw` adds load logic after RAM**: split, word-select (bit 2), sign-extend, `MSZ` MUX
6. **`sw` adds store logic before RAM**: read-modify-write with both `ld=1` and `str=1`
7. **New decoder columns**: `MLD`, `MST`, `MSZ`, `M2R`; the `MSZ` encoding matches RISC-V `funct3`
8. **`lb`/`sb` follow the same pattern** at byte granularity using bits `[2:0]` as selectors
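
Summary item 8 can be sketched in the same style as the word-level logic. This is an illustrative model only: `load_byte` and `store_byte_din` are hypothetical names, and the code assumes little-endian byte order inside the cell (byte 0 is bits `[7:0]`), which is consistent with bit 2 = 0 selecting the lower word.

```python
MASK64 = (1 << 64) - 1


def load_byte(dout: int, ta: int) -> int:
    """lb: pick one of 8 bytes with TA[2:0], then sign-extend 8 -> 64."""
    shift = (ta & 0x7) * 8
    byte = (dout >> shift) & 0xFF
    if byte & 0x80:
        return byte | 0xFFFFFFFFFFFFFF00
    return byte


def store_byte_din(dout_cur: int, rd1: int, ta: int) -> int:
    """sb: replace one byte of the current cell, preserve the other seven."""
    shift = (ta & 0x7) * 8
    mask = 0xFF << shift
    return (dout_cur & ~mask & MASK64) | ((rd1 & 0xFF) << shift)


cell = 0xCAFEF00D_DEADBEEF
print(hex(load_byte(cell, 24)))  # 0xffffffffffffffef
print(hex(load_byte(cell, 28)))  # 0xd
print(hex(store_byte_din(0xDEADBEEF_11112222, 0xAB, 1)))  # 0xdeadbeef1111ab22
```

In hardware the shift becomes a byte-select MUX, but the expected values are the same, so the model can serve as a checklist for step 7 of the build order.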