# Programs and Data Memory

## CS 315 Computer Architecture

---

## Overview

- Wrap a RISC-V function to run **standalone** on the Digital processor
- **Size the RAM** and initialize the stack pointer
- Apply the **five-step recipe** for adding any new instruction
- Design the **data memory subsystem** for `ld` / `sd`
- Extend to sub-doubleword: `lw` / `sw` and `lb` / `sb`

---

## Running Functions on the Bare Processor

The Digital processor runs with **no** OS: there is no loader, `_start`, or C runtime. Four rules to wrap any function:

| Rule | Action |
|------|--------|
| 1. Assembly `main` | Provide entry point; set up arguments |
| 2. No `.global` | Drop linker directives |
| 3. `jal` not `call` | `call` may expand to `auipc`+`jalr` |
| 4. `unimp` end marker | Halts the processor |

---

## Standalone Skeleton

```asm
main:
    li   sp, 1024      # top of the RAM
    li   a0, ...       # set up arguments
    jal  swap_s        # link ra, jump
    unimp              # processor halts here

swap_s:
    # ... function body ...
    ret
```
`unimp` is the end marker: execution stops when the processor fetches it.

---

## Execution Flow
flowchart TD
    A["main:"] --> B["li sp, 1024"]
    B --> C["jal swap_s (ra = PC+4)"]
    C --> D["swap_s: body runs"]
    D --> E["ret (jalr ra → back to main)"]
    E --> F["unimp (processor halts)"]

---

## Why `jal`, Not `call`?

- `call` is a **pseudo-instruction**: the assembler may expand it to `auipc` + `jalr`
- Our processor does **not yet support `auipc`**
- `jal` is a single real instruction that fits our datapath directly

For tiny programs that fit entirely in Instruction Memory, every label is reachable with `jal`.
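A rough way to see why reachability is not a concern here: `jal` encodes a signed 21-bit, 2-byte-aligned offset, so it reaches about ±1 MiB around the instruction. The Python sketch below checks that; `jal_reachable` and the example addresses are illustrative assumptions, not part of the course tooling.

```python
def jal_reachable(pc: int, target: int) -> bool:
    """Return True if a jal at byte address pc can jump directly to target."""
    offset = target - pc
    # J-type immediate: 21-bit signed value whose bit 0 is always zero.
    return offset % 2 == 0 and -(1 << 20) <= offset < (1 << 20)


print(jal_reachable(0x08, 0x10))      # True
print(jal_reachable(0x0, 0x200000))   # False
```

Any program small enough for this lecture's Instruction Memory stays far inside that range.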

---

## Sizing the RAM

We want **1024 bytes** of stack. The RAM has **64-bit cells**:

```text
1024 bytes   = 2^10 bytes
each cell    = 8 bytes = 2^3 bytes
cells        = 2^10 / 2^3 = 2^7 = 128 cells
address bits = 7
```

| RAM parameter | Value | Meaning |
|---------------|-------|---------|
| data bits | 64 | one cell = one doubleword |
| address bits | 7 | 2^7 = 128 cells |
| total bytes | 1024 | 128 × 8 |

---

## Stack Pointer Initialization

The stack grows **downward** in RISC-V, so `sp` starts at the **top** of RAM.

```text
byte addr
  1024  +-----------------------+  <-- sp starts here
        |          RAM          |  128 cells × 64 bits
        |      (the stack)      |
        |                       |
     0  +-----------------------+
```

```asm
li sp, 1024    # top-of-RAM byte address
```

First push: `addi sp, sp, -16` → lands at byte 1008 (valid). Byte 1024 itself is one past the last RAM byte (1023); nothing is stored there because `sp` is decremented before the first store.

---

## Capacity Formula

$$\text{total bytes} = 2^{\text{addr bits}} \times \frac{\text{data bits}}{8}$$

Examples:

| data bits | addr bits | cells | total bytes |
|-----------|-----------|-------|-------------|
| 64 | 7 | 128 | 1024 |
| 64 | 8 | 256 | 2048 |
| 64 | 10 | 1024 | 8192 |

---

## The Five-Step Recipe

Apply this **every time** you add a new instruction:
flowchart TD
    S1["1. Pick an instruction (or group)"] --> S2
    S2["2. Add / modify components"] --> S3
    S3["3. Extend the datapath (add MUX inputs)"] --> S4
    S4["4. Update decoder spreadsheet + ROM"] --> S5
    S5["5. Test"]
    S5 -.->|"next instruction"| S1

---

## Recipe: Key Principles

- **Step 3, MUX inputs**: new data sources go on *new* MUX inputs; existing inputs are unchanged
- **Step 4, spreadsheet first**: add the new instruction row AND set new control columns to `0` for all existing instructions
- **Step 5, incremental**: run the autograder after *each* instruction group before moving on

Keeping the **RAM at the top level** of the circuit lets you inspect its contents during simulation.

---

## Project 6 Layout

```text
project06-/
├── part1/    # *.dig + *.hex  (snapshot 1)
├── part2/    # *.dig + *.hex  (snapshot 2)
├── part3/    # *.dig + *.hex  (snapshot 3)
└── final/    # *.dig + *.hex  (finished)
```

- Run autograder **per directory**: `grade test -p project06`
- Decoder spreadsheet: **one workbook, four sheets** (`part1`, `part2`, `part3`, `final`)
- Submit `.xlsx` and PDF export

---

## `ld` and `sd` Instructions

```asm
ld t0, 8(sp)    # t0 = memory[sp + 8]   (64-bit load)
sd t0, 8(sp)    # memory[sp + 8] = t0   (64-bit store)
```

**Address computation**: `target_addr = base + offset`

- Reuses the **ALU** (add operation)
- Input A = base register (`RD0`)
- Input B = sign-extended immediate (`imm-I` for loads, `imm-S` for stores)

---

## The Byte-Address Problem

Registers hold **byte addresses**. The RAM is indexed by **doubleword (cell) number**.

Fix: drop the low 3 bits (divide by 8):

```text
DW address = byte_address >> 3
           = byte_address bits [9:3]    (7 bits for 128-cell RAM)
```

Implement in Digital with a **splitter**: wire bits `[9:3]` of the 64-bit ALU result to the 7-bit RAM `A` input.

---

## Byte → DW Address Conversion

flowchart LR
    ALU["ALU result\n(64-bit byte addr)"] --> SPL["splitter:\nbits 9..3"]
    SPL --> RAMA["RAM A\n(7-bit DW addr)"]

- Bits `0..2` are the byte offset *within* a doubleword; `ld`/`sd` ignore them because their addresses are 8-byte aligned
- Bits `3..9` are the cell index (0..127)

---

## RAM (Separated Ports) Ports

| Port | Width | Dir | Purpose |
|------|-------|-----|---------|
| `A` (ADDR) | 7 | in | DW address |
| `Din` | 64 | in | data to write |
| `Dout` | 64 | out | data read |
| `str` | 1 | in | store (write) enable |
| `ld` | 1 | in | load (read) enable |
| `clk` | 1 | in | clock |

Configure: **data bits = 64**, **address bits = 7**

---

## Write-Back: M2R / WDsel MUX

For a load, `Dout` must reach the register file, but `WD` already carries the ALU result.

```text
ALU result --> | 0         |
               |  M2R MUX  | --> | 0         |
RAM Dout   --> | 1         |     | WDsel MUX | --> RegFile WD
                     ^           | 1  PC+4   |
                    M2R          +-----------+
                                       ^
                                     WDsel
```

New sources always go on **new MUX inputs**, so existing instructions are unaffected.

---

## `ld` / `sd` Datapath

flowchart LR
    RD0["RD0\n(base)"] --> ALU["ALU\n(add)"]
    IMM["imm\n(offset)"] --> ALU
    ALU --> CONV["bits 9..3\n(splitter)"]
    CONV --> RAMA["RAM A\n7-bit"]
    RD1["RD1\n(store value)"] --> DIN["RAM Din"]
    RAMA --> RAM["RAM\n64×128"]
    DIN --> RAM
    RAM --> DOUT["RAM Dout"]
    DOUT --> M2R["M2R/WDsel\nMUX"]
    M2R --> WD["RegFile\nWriteData"]

---

## Control Lines: `ld` and `sd`

| inst | `ld` (read) | `str` (write) | `M2R` | `RFW` |
|------|-------------|---------------|-------|-------|
| `ld` | 1 | 0 | 1 (RAM Dout) | 1 |
| `sd` | 0 | 1 | don't care | 0 |

- `ld`: read RAM, route `Dout` to register file
- `sd`: write whole cell; no register file write

These become new columns in the decoder spreadsheet (Step 4).

---

## Sub-Doubleword: `lw` / `sw`

A 64-bit cell holds **two 32-bit words**:

```text
 63              32 31               0
+------------------+------------------+
| upper word (W1)  | lower word (W0)  |
+------------------+------------------+
    word index 1       word index 0
      ^--- selected by byte-address bit 2
```

- **Load** (`lw`): read cell, select half, sign-extend 32→64
- **Store** (`sw`): **read-modify-write** the cell

---

## Load Word Path

flowchart LR
    DOUT["RAM Dout\n(64-bit)"] --> SPL["split:\nW0 bits 31..0\nW1 bits 63..32"]
    SPL --> WMUX["word MUX\n(sel = bit 2)"]
    WMUX --> SX["sign-extend\n32 → 64"]
    SX --> MSZ["MSZ MUX\nlb/lw/ld"]
    DOUT --> MSZ
    MSZ --> M2R["to M2R/WDsel\n→ RegFile"]
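
The load path above can be modeled in a few lines of Python as a sanity check before wiring it in Digital. This is a sketch under assumptions: the names `load_word` and `sign_extend` are hypothetical, and sign extension uses an XOR-and-subtract idiom, which is only one of several equivalent ways to do it.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend the low `bits` of value to a 64-bit pattern."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK64


def load_word(dout: int, byte_addr: int) -> int:
    """Model lw: pick W0 or W1 from the 64-bit cell, then sign-extend."""
    w0 = dout & 0xFFFF_FFFF            # bits 31..0
    w1 = (dout >> 32) & 0xFFFF_FFFF    # bits 63..32
    word = w1 if (byte_addr >> 2) & 1 else w0   # word MUX, sel = bit 2
    return sign_extend(word, 32)


cell = 0x8000_0000_0000_0001
print(hex(load_word(cell, 0)))   # 0x1
print(hex(load_word(cell, 4)))   # 0xffffffff80000000
```
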

MSZ encodings (the low two bits of `funct3`): `lb = 00`, `lw = 10`, `ld = 11`

---

## Read-Modify-Write for `sw`

The RAM can only write **full 64-bit cells**, so we must preserve the half we are not changing.

```text
1. READ current cell D64cur (ld=1)
2. MODIFY:
     option A (write lower): { W1 : Wnew }
     option B (write upper): { Wnew : W0 }
     word MUX selects A or B via byte-addr bit 2
3. WRITE merged value back (str=1)
```
For `sw`: set **both** `ld=1` and `str=1` simultaneously.
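
The merge step can be sketched in Python to check expected `Din` values against the circuit. The function name `store_word` is an assumption for illustration; `D64cur`, `D64in`, `Wnew`, `W0`, and `W1` follow the names used above.

```python
MASK32 = 0xFFFF_FFFF


def store_word(d64cur: int, d64in: int, byte_addr: int) -> int:
    """Return the 64-bit value to drive on RAM Din for sw."""
    wnew = d64in & MASK32            # low 32 bits of RD1
    w0 = d64cur & MASK32             # current lower word
    w1 = (d64cur >> 32) & MASK32     # current upper word
    option_a = (w1 << 32) | wnew     # write lower: { W1 : Wnew }
    option_b = (wnew << 32) | w0     # write upper: { Wnew : W0 }
    # Word MUX: byte-address bit 2 picks which half is replaced.
    return option_b if (byte_addr >> 2) & 1 else option_a


cur = 0x1111_2222_3333_4444
new = 0xAAAA_BBBB_CCCC_DDDD
print(hex(store_word(cur, new, 0)))   # 0x11112222ccccdddd
print(hex(store_word(cur, new, 4)))   # 0xccccdddd33334444
```

In both cases the untouched half of the cell comes back unchanged, which is the point of reading before writing.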

---

## Store Word Path

flowchart TD
    CUR["RAM Dout\nD64cur"] --> S1["split: W0, W1"]
    RD1["RD1 = D64in"] --> S2["Wnew = low 32 bits"]
    S1 --> M1["merge\nW1 : Wnew"]
    S2 --> M1
    S1 --> M2["merge\nWnew : W0"]
    S2 --> M2
    M1 --> WMUX["word MUX\n(sel = bit 2)"]
    M2 --> WMUX
    WMUX --> MSZ["MSZ MUX\nsb/sw/sd"]
    MSZ --> DIN["RAM Din\n(str=1, ld=1)"]

---

## `lb` / `sb` by Analogy

Same pattern, finer granularity:

| op | granularity | selector bits | sign-ext |
|----|-------------|---------------|----------|
| `ld`/`sd` | 64-bit cell | none | no |
| `lw`/`sw` | 32-bit half | bit 2 | 32→64 |
| `lb`/`sb` | 8-bit byte | bits 2..0 | 8→64 |

Once `lw`/`sw` works, `lb`/`sb` follows the same structure with an 8-way byte-select MUX.

---

## Control Lines: Full Table

| inst | `ld` | `str` | `RFW` | notes |
|------|------|-------|-------|-------|
| `ld` | 1 | 0 | 1 | read cell → RF |
| `sd` | 0 | 1 | 0 | write whole cell |
| `lw` | 1 | 0 | 1 | read, extract, sign-ext |
| `sw` | 1 | 1 | 0 | read-modify-write |
| `lb` | 1 | 0 | 1 | read, extract byte, sign-ext |
| `sb` | 1 | 1 | 0 | read-modify-write (byte) |

---

## Memory Size Hierarchy

| Mnemonic | Bits | Alignment | Selector within cell |
|----------|------|-----------|----------------------|
| `lb`/`sb` | 8 | 1 byte | addr bits `2..0` |
| `lw`/`sw` | 32 | 4 bytes | addr bit `2` |
| `ld`/`sd` | 64 | 8 bytes | none (whole cell) |

**Key invariant**: addresses in registers are always **byte** addresses. The DW cell index = byte address >> 3.
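
As an illustration of that invariant, the sketch below splits one byte address into the three fields the memory subsystem uses. The function name and dictionary keys are hypothetical; the bit positions assume the 128-cell, 64-bit RAM from this lecture.

```python
def decode_address(byte_addr: int) -> dict:
    """Split a byte address into the fields used by the data memory."""
    return {
        "cell": (byte_addr >> 3) & 0x7F,  # bits 9..3: RAM A input (DW index)
        "word": (byte_addr >> 2) & 0x1,   # bit 2: word select for lw/sw
        "byte": byte_addr & 0x7,          # bits 2..0: byte select for lb/sb
    }


print(decode_address(1008))   # {'cell': 126, 'word': 0, 'byte': 0}
print(decode_address(1021))   # {'cell': 127, 'word': 1, 'byte': 5}
```
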

---

## Sign Extension Recap

Loads sign-extend the loaded value to 64 bits:

```text
lb example:  read byte = 0b1111_1110            (= -2 signed)
             sign bit  = 1
             result    = 0xFFFF_FFFF_FFFF_FFFE  (= -2 as 64-bit)
```

Trick: shift left until the sign bit sits at the MSB (bit 63), then arithmetic-shift right by the same amount.

---

## Debugging Tips

- Keep the **RAM at the top level**; only then can you open it and inspect the stack during simulation
- **Single-step** the clock; use **probes** on intermediate wires
- Compare against `objdump` output: match each instruction to its expected behavior
- Add a new partial directory (`part2/`, `part3/`) **before** making the next change

---

## Summary

1. **Standalone packaging**: assembly `main`, no `.global`, `jal` not `call`, `unimp` end marker
2. **Stack pointer**: `li sp, 1024`, the top of a 1024-byte RAM (64 data bits, 7 addr bits)
3. **Capacity**: total bytes = 2^(addr bits) × (data bits / 8)
4. **Five-step recipe**: pick → components → datapath (new MUX inputs) → decoder → test
5. **Byte → DW address**: splitter takes bits `[9:3]` of the 64-bit ALU result
6. **`ld`/`sd`**: ALU computes address; M2R MUX routes `Dout` to RF for loads
7. **Sub-doubleword**: `lw`/`sw` use word-index bit; `sw`/`sb` need read-modify-write (`ld=1, str=1`)
8. **Incremental builds**: `part1`–`final` directories; four-sheet decoder spreadsheet
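
The sizing arithmetic from items 2, 3, and 5 can be double-checked with a short Python sketch. The helper names are assumptions for illustration, and `addr_bits_for` assumes the cell count is a power of two.

```python
def ram_bytes(addr_bits: int, data_bits: int) -> int:
    """Capacity formula: 2^(addr bits) cells times bytes per cell."""
    return (1 << addr_bits) * data_bits // 8


def addr_bits_for(total_bytes: int, data_bits: int) -> int:
    """Address bits needed for a RAM of total_bytes with data_bits-wide cells."""
    cells = total_bytes // (data_bits // 8)
    return (cells - 1).bit_length()


print(ram_bytes(7, 64))         # 1024
print(addr_bits_for(1024, 64))  # 7
print(ram_bytes(10, 64))        # 8192
```
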