Programs and Data Memory¶

Overview¶

This lecture prepares us to finish the single-cycle RISC-V processor by adding the last major component: data memory. We first cover how to package a RISC-V function so that it runs standalone on our Digital processor (writing an assembly main, dropping .global, and using jal/unimp). We then review the disciplined recipe for adding any new instruction to the datapath, and finally we design the data-memory subsystem for ld/sd, working out the byte-address-to-doubleword-address conversion, the load path, the store (read-modify-write) path, and the control lines that drive it all. The handwritten notes also cover Project 6 logistics and the Lab 10 spreadsheet that seeds the instruction decoder.

Learning Objectives¶

Package a RISC-V function so it runs standalone on the Digital processor (assembly main, no .global, jal not call, unimp end marker)
Initialize the stack pointer correctly given the size of the Data Memory RAM
Compute the relationship between RAM data bits, address bits, and total capacity
Apply the five-step recipe for adding a new instruction to the processor (pick, components, datapath, decoder, test)
Explain why a byte address must be converted to a doubleword (DW) address before indexing the RAM, and do the conversion with a shift/splitter
Design the ld/sd datapath, including the M2R / WDsel write-back MUX and the ld/str/clk control lines
Explain the read-modify-write pattern required to implement sw/sb on a 64-bit-wide RAM
Build Project 6 incrementally with part1–final directories and a four-sheet decoder spreadsheet

Prerequisites¶

RISC-V assembly: registers, calling convention, the stack (Lab 01, Lab 03)
Two's complement and sign extension (Lab 02)
The single-cycle processor datapath: PC, Instruction Memory, Register File, ALU (Processor Guide Part 1)
The instruction decoder spreadsheet methodology and control lines (Processor Guide Part 2)
Digital components: registers, ROM, RAM (separated ports), MUX, splitter, comparator
Byte-addressable memory and word layout

1. Project 6 Logistics and Incremental Development¶

Before diving into data memory, the session opened with how Project 6 should be organized. The single most important idea is incremental development: do not try to build the entire instruction decoder and full datapath at once. Start from the Lab 10 spreadsheet (which already maps a handful of instructions to control lines) and grow the circuit one instruction (or group of instructions) at a time, testing after every step.

Directory layout¶

Project 6 requires at least three partial implementations plus a final one, each in its own directory and each containing the complete set of circuit files for that stage:

project06-<github_userid>/
├── part1/      # *.dig files + *.hex files
├── part2/      # *.dig files + *.hex files
├── part3/      # *.dig files + *.hex files
└── final/      # *.dig files + *.hex files

This snapshots your progress so the graders can see evidence of incremental work (worth 10 points of the rubric). You may add more than three partial stages if you like.

Running the autograder¶

The autograder runs per directory. From inside a partial directory you run:

cd part1
grade test -p project06

The -p project06 flag tells the grader which test set to use. (One of the action items from this session was fixing -p so it works correctly inside subdirectories.)

Lab 10 observations and the spreadsheet¶

Two practical observations carried over from Lab 10:

Use a Digital Register for state elements (the PC, and any pipeline-style holding registers) — this improves robustness and avoids subtle latch issues.
Spreadsheets drive the instruction decoder. For Project 6 you maintain one workbook with four sheets named part1, part2, part3, and final. Each sheet is the decoder ROM contents for that stage. You submit both the .xlsx and a PDF export so the incremental decoder development is visible.

A potential redundancy was flagged in the Lab 10 spreadsheet — two control columns (informally "PC Select" and "WD Select") may always carry the same value, in which case one could be derived from the other. This is the kind of optimization you can make once a stage is working, never before.

2. Preparing RISC-V Functions to Run on the Processor¶

Our processor is not an operating system. There is no loader, no _start, no C runtime, and no ret address magically sitting in ra when execution begins. So a function that compiled and ran fine under the emulator or on Linux needs a small amount of wrapping before it can run on the bare Digital processor.

There are four rules.

Rule 1 — Write an assembly `main`¶

Execution begins at the first instruction in Instruction Memory. You must provide a main that sets up arguments and then calls the function under test. The main is responsible for putting parameters into a0, a1, ... before the call.

Rule 2 — No `.global`¶

.global is a directive for the linker, telling it which symbols are visible to other translation units. We assemble a single self-contained program and there is no linker step that needs it, so drop all .global directives from your test programs.

Rule 3 — Use `jal`, not `call`¶

call is a pseudo-instruction that the assembler may expand into a two-instruction sequence (auipc + jalr) so it can reach symbols anywhere in a 32-bit address space. We do not yet support auipc, and our programs are tiny, so use the real instruction jal (jump-and-link) which fits our datapath directly.

Rule 4 — End with `unimp`¶

There is no return to an OS. We mark the end of the program with unimp, which our processor recognizes as the end marker and uses to stop fetching/executing.

Putting it together¶

main:
        # initialize the stack pointer
        li    sp, 1024        # top of the RAM (see Section 3)

        jal   swap_s          # call the function under test

        unimp                 # end marker -- stop the processor

swap_s:
        # ... the function we actually want to exercise ...
        ret

Compare this with a function written for the emulator, which would use .global swap_s and be invoked with call swap_s. The handwritten notes show exactly this skeleton: li sp, 1024, jal swap_s, unimp, then the swap_s: label.

flowchart TD
    A["main:"] --> B["li sp, 1024  (init stack pointer)"]
    B --> C["jal swap_s  (link return addr into ra, jump)"]
    C --> D["swap_s:  body runs..."]
    D --> E["ret  (jalr to ra, back into main)"]
    E --> F["unimp  (end marker: processor halts)"]
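
The control flow in the diagram can be traced with a toy Python model. This is only a sketch under stated assumptions: the tuple-based instruction format and the `run` helper are invented for illustration, each instruction is assumed to occupy 4 bytes, and the body of swap_s is reduced to a bare ret.

```python
# Toy model: each instruction occupies 4 bytes, so byte address = list index * 4.
# The tuple format is hypothetical; it is not how the real processor encodes anything.
PROGRAM = [
    ("li", "sp", 1024),   # 0x00  main: initialize the stack pointer
    ("jal", 12),          # 0x04  jal swap_s  (swap_s sits at byte address 12)
    ("unimp",),           # 0x08  end marker
    ("ret",),             # 0x0c  swap_s: (body omitted)
]


def run(program):
    """Execute the toy program and return (registers, trace of (pc, mnemonic))."""
    regs = {"ra": 0, "sp": 0}
    pc, trace = 0, []
    while True:
        inst = program[pc // 4]
        trace.append((pc, inst[0]))
        if inst[0] == "li":
            regs[inst[1]] = inst[2]
            pc += 4
        elif inst[0] == "jal":
            regs["ra"] = pc + 4      # link: return address is the next instruction
            pc = inst[1]
        elif inst[0] == "ret":
            pc = regs["ra"]          # ret jumps to the address held in ra
        elif inst[0] == "unimp":
            return regs, trace       # end marker: stop fetching


regs, trace = run(PROGRAM)
print(trace)   # [(0, 'li'), (4, 'jal'), (12, 'ret'), (8, 'unimp')]
```

The trace shows why jal matters: it leaves ra = 8, so ret lands on the unimp that follows the call.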

3. Sizing the RAM and Initializing the Stack Pointer¶

Our test programs use Data Memory as a stack. The stack pointer sp must start at a valid top-of-memory address. To pick that number we need to know how big the RAM is.

Capacity arithmetic¶

The RAM we use is configured with two parameters:

data bits — the width of one memory cell. We use 64 so that a single cell holds one doubleword (perfect for ld/sd).
address bits — how many cells the RAM has. With n address bits the RAM has 2^n cells.

The total number of bytes is (number of cells) × (bytes per cell). With 64-bit cells, each cell is 8 bytes = 2^3 bytes.

The handwritten notes work a concrete example: we want 1024 bytes of stack memory.

We want 1024 bytes of stack.
1024 = 2^10 bytes
Each cell holds 8 bytes = 2^3 bytes
Number of cells = 2^10 / 2^3 = 2^7 = 128 cells
=> address bits = 7

So:   data bits  = 64
      addr bits  = 7      (2^7 = 128 cells x 8 bytes = 1024 bytes)

RAM parameter	Value	Meaning
data bits	64	one cell = one doubleword (8 bytes)
address bits	7	`2^7 = 128` cells
cells	128	`2^7` doublewords
total bytes	1024	`128 × 8 = 2^7 × 2^3 = 2^10`
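
The same arithmetic can be checked with a short Python sketch. The function name `ram_address_bits` is made up for illustration, and the sketch assumes the capacity is a power-of-two number of whole cells.

```python
def ram_address_bits(total_bytes, data_bits=64):
    """Return how many RAM address bits are needed to hold total_bytes."""
    bytes_per_cell = data_bits // 8            # 64 data bits -> 8 bytes per cell
    cells = total_bytes // bytes_per_cell      # 1024 bytes -> 128 cells
    # Assumption: the capacity is an exact power-of-two number of cells.
    is_power_of_two = cells > 0 and (cells & (cells - 1)) == 0
    if not is_power_of_two or cells * bytes_per_cell != total_bytes:
        raise ValueError("capacity must be a power-of-two number of whole cells")
    return cells.bit_length() - 1              # log2(cells)


print(ram_address_bits(1024))   # 7
```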

Where the stack pointer starts¶

The stack grows downward (toward lower addresses) in RISC-V. So sp is initialized to the top of the RAM region: the byte address just past the last usable byte. In the notes, an "sp =>" arrow points to the top of the RAM box, and li sp, 1024 sets it there.

byte addr
 1024  <- sp starts here (top); stack grows DOWN
       +-----------------------+
       |                       |
       |        RAM            |   128 cells x 64 bits
       |     (the stack)       |
       |                       |
    0  +-----------------------+

A function that allocates a frame does addi sp, sp, -N (N a multiple of 16), uses the space, then addi sp, sp, N on the way out. Because we initialize sp to the very top, the first push lands at the highest valid doubleword.

Note: a real machine would not put sp at byte 1024 of a tiny RAM, but for our simulated stack this is exactly the right idea. sp itself may equal 1024 (one past the last byte), as long as every address the program actually loads from or stores to stays within 0 .. 1023.
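
As a quick check of the first push, the Python sketch below lists the doubleword slots a frame creates. The helper name `frame_slots` is hypothetical, and the sketch assumes the 1024-byte RAM sized above.

```python
RAM_BYTES = 1024   # 128 cells x 8 bytes, as sized above


def frame_slots(sp, frame_bytes):
    """Byte addresses of the doubleword slots created by addi sp, sp, -frame_bytes."""
    new_sp = sp - frame_bytes
    return [new_sp + offset for offset in range(0, frame_bytes, 8)]


slots = frame_slots(1024, 16)     # first push from the initial sp
in_range = all(0 <= addr <= RAM_BYTES - 8 for addr in slots)
print(slots, in_range)            # [1008, 1016] True
```

Both slots fall inside the RAM even though sp started one byte past its end.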

4. The Recipe for Adding a New Instruction¶

The bulk of Project 6 is repeatedly extending the processor to support more instructions. The lecture gave a five-step recipe that you apply every time. Internalize it — every new instruction (including the data-memory ones we design below) follows it.

flowchart TD
    S1["1. Pick an instruction (or group of instructions)"] --> S2
    S2["2. Modify or add components"] --> S3
    S3["3. Modify or add to the datapath"] --> S4
    S4["4. Extend the instruction decoder / control lines"] --> S5
    S5["5. Test"]
    S5 -.->|"repeat for next instruction"| S1

Step 1 — Pick an instruction or group¶

Choose one instruction, or a tightly related group (e.g. all the conditional branches, or ld+sd together). Small steps make debugging tractable.

Step 2 — Modify or add components¶

Decide what hardware the instruction needs. Maybe the ALU needs a new operation, maybe you need a new sub-circuit (like a Branch Unit), or a RAM, or a sign-extender. Add it as a clean, labeled component.

Step 3 — Modify or add to the datapath¶

Connect the new component's data wires. Usually you will need to extend or add MUXes — whenever a value can now come from more than one place, a MUX selects between them. The key principle the instructor underlined:

Always add new datapath wires to new MUX inputs.

That is, when you widen a MUX, the existing input(s) keep their original meaning and the new source goes on a new input line. This keeps every previously-working instruction working without change.

Step 4 — Extend the instruction decoder / control lines¶

New components and MUXes need new control signals. The workflow is:

Update the spreadsheet — add the new instruction row(s) and any new control-output column(s), filling in values for the new instruction and setting the new columns to their no-op value (almost always 0) for every existing instruction.
Update the instruction decoder — regenerate the ROM hex from the spreadsheet, expand the priority-encoder inputs and the ROM output split so the new control bits reach their MUXes/components.

Because new MUX inputs are added as higher-numbered inputs and the original source is input 0, existing instructions can usually leave the new control line at 0 and behave exactly as before.
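
The spreadsheet update can be mimicked in Python to show why existing rows keep working. This is a sketch only: the `add` and `addi` rows and both helper functions are hypothetical stand-ins, not the contents of the Lab 10 spreadsheet.

```python
# Hypothetical decoder table: instruction -> {control line: value}.
decoder = {
    "add":  {"RFW": 1},
    "addi": {"RFW": 1},
}


def add_control_column(table, name, noop=0):
    """Add a control column, set to its no-op value for every existing row."""
    for controls in table.values():
        controls.setdefault(name, noop)


def add_instruction(table, inst, **controls):
    """Add a row; any control line not mentioned defaults to 0."""
    columns = {column for row in table.values() for column in row}
    table[inst] = {column: 0 for column in columns}
    table[inst].update(controls)


for column in ("ld", "str", "M2R"):
    add_control_column(decoder, column)
add_instruction(decoder, "ld", RFW=1, ld=1, M2R=1)
add_instruction(decoder, "sd", str=1)

print(decoder["add"])   # existing row: new columns are all 0
```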

Step 5 — Test¶

Run the autograder and any hand tests for just this instruction before moving on. The session emphasized single-stepping and probes as the core debugging techniques: add probes on the datapath to watch intermediate values, single-step the clock, and compare against an objdump of the test program so you can match each executing instruction to its expected behavior.

5. Data Memory: `ld` / `sd`¶

Now we add the Data Memory subsystem. We start with the doubleword instructions ld (load doubleword) and sd (store doubleword) because the RAM is 64 bits wide, so a doubleword maps to exactly one cell — no extraction or masking needed yet.

The instruction form¶

ld   t0, 8(sp)        # t0 = memory[sp + 8]   (64-bit load)
sd   t0, 8(sp)        # memory[sp + 8] = t0   (64-bit store)

Both instructions compute their memory address the same way, as the notes spell out:

ld t0, 8(sp)
   ^   ^ ^
   |   | base register (address)
   |   offset
   dest

target_addr = base + offset      (= sp + 8)

The target address is base + offset. We already have an adder that can do this: the ALU. So ld/sd reuse the ALU (with an add operation) to compute the memory address, where input A is the base register value (RD0) and input B is the sign-extended I-type immediate (imm-I) for loads, or the S-type immediate for stores.

flowchart LR
    RD0["RD0 (base, e.g. sp)"] --> ALUA["ALU input A"]
    IMM["imm-I / imm-S (offset)"] --> ALUB["ALU input B"]
    ALUA --> ALU["ALU (add)"]
    ALUB --> ALU
    ALU --> TA["target byte address"]
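
The address calculation can be sketched in Python as follows. The function names are illustrative; the sketch assumes the offset arrives as a raw 12-bit immediate field (both I-type and S-type immediates are 12 bits) and that the ALU wraps at 64 bits.

```python
MASK64 = (1 << 64) - 1


def sign_extend_12(imm12):
    """Interpret a raw 12-bit immediate field as a signed value."""
    imm12 &= 0xFFF
    return imm12 - 0x1000 if imm12 & 0x800 else imm12


def target_address(rd0, imm12):
    """ALU add: base register value plus sign-extended offset, wrapped to 64 bits."""
    return (rd0 + sign_extend_12(imm12)) & MASK64


print(target_address(1000, 8))       # 1008  (ld t0, 8(sp) with sp = 1000)
print(target_address(1000, 0xFF0))   # 984   (offset field 0xFF0 encodes -16)
```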

The byte-address to doubleword-address problem¶

Here is the crucial subtlety the lecture spent time on. All addresses in registers are byte addresses. The ALU therefore produces a byte address. But our RAM is indexed by cell (doubleword) number — its A (ADDR) input expects a DW address, not a byte address.

Two facts make the fix simple:

Convert byte address to DW address. Each cell is 8 bytes, so the DW address is byte_addr / 8, i.e. a right shift by 3 (>> 3). Equivalently, you drop the low 3 bits with a splitter.
We only need 7 bits. The RAM has 2^7 = 128 cells, so the DW address is only 7 bits wide. The ALU emits a 64-bit byte address; we take bits 3..9 (the 7 bits above the low 3) to get the 7-bit DW index.

The handwritten diagram shows this as a hatched conversion block sitting between the 64-bit ALU output and the 7-bit RAM A input, annotated 3-9 (take bits 3 through 9) and labeled:

1) byte addr -> dw addr      (shift right by 3, or drop low 3 bits)
2) only need 7 bits          (because 2^7 = 128 cells)

 ALU result (64-bit byte address)
        |
        v
  bits [9:3]            <- splitter: take bits 3..9 (drop low 3, keep 7)
        |
        v  (7 bits = DW address)
   RAM "A" (ADDR) input

In Digital you implement this with a splitter: input is the 64-bit byte address, and you wire out bits 3 through 9 as a 7-bit bus into the RAM A input. (Bits 0–2 are the byte offset within a doubleword and are ignored for a ld/sd, which must be 8-byte aligned.)
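
The splitter's behavior is equivalent to a shift and a mask, sketched here in Python. The name `dw_address` is made up for illustration.

```python
def dw_address(byte_addr, addr_bits=7):
    """Model the splitter: drop the low 3 bits, keep the next addr_bits bits."""
    return (byte_addr >> 3) & ((1 << addr_bits) - 1)


print(dw_address(16))     # 2
print(dw_address(1016))   # 127  (last valid cell)
print(dw_address(1024))   # 0    (bit 10 is dropped, so the index wraps)
```

The last line is worth noticing while debugging: an address one past the end of the RAM does not fail, it quietly aliases cell 0.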

The RAM component and its ports¶

We use Digital's RAM (Separated Ports) component, configured with data bits = 64 and address bits = 7. Its ports, as drawn in the notes:

Port	Width	Direction	Purpose
`A` (ADDR)	7	in	DW address to read/write
`Din`	64	in	data to write (for store)
`Dout`	64	out	data read at `A`
`str`	1	in	store enable (write)
`ld`	1	in	load enable (read)
`clk`	1	in	clock

Important: keep the RAM at the top level of your processor circuit. That is the only way to open and inspect the RAM contents during simulation, which is invaluable for debugging the stack.

Routing the load result back: the M2R / WDsel MUX¶

For a load, the value read from Dout must be written into the register file (t0 in ld t0, 8(sp)). But the register file's write-data line (WD) is already driven by the WDsel MUX, which chooses between the ALU result (for arithmetic) and PC+4 (for jal). So we need a way to select the RAM output as a write-back source as well.

The guide gives two options, both shown in the notes as the "RegFile WDsel MUX":

Expand the WDsel MUX to add a third input (the RAM Dout), controlled by a wider WDsel (2-bit) selector.
Add a new two-input M2R (Memory-to-Register) MUX that chooses between the ALU result and RAM Dout based on a new M2R control line, then feed its output into the existing WDsel MUX.

Either way the principle from Section 4 holds: the RAM output goes on a new MUX input; the ALU result stays where it was.

                          +-----------+
   ALU result ----------> | 0         |
                          |  M2R MUX  |--+
   RAM Dout (load) -----> | 1         |  |     +-----------+
                          +-----------+  +---> | 0         |
                            ^                  | WDsel MUX |--> RegFile WD
                            M2R                | 1  PC+4   |
                                               +-----------+
                                                  ^
                                                  WDsel
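
The two-MUX arrangement in the diagram behaves like the Python sketch below. The function names and argument order are assumptions made for illustration; the input numbering follows the diagram.

```python
def mux(select, *inputs):
    """Return the input chosen by the select value (input 0 is listed first)."""
    return inputs[select]


def write_data(alu_result, ram_dout, pc_plus_4, m2r, wdsel):
    """Value that reaches the register file's WD input."""
    m2r_out = mux(m2r, alu_result, ram_dout)   # input 0 = ALU result, input 1 = RAM Dout
    return mux(wdsel, m2r_out, pc_plus_4)      # input 0 = M2R output, input 1 = PC+4


print(write_data(10, 20, 30, m2r=0, wdsel=0))   # 10  arithmetic
print(write_data(10, 20, 30, m2r=1, wdsel=0))   # 20  load
print(write_data(10, 20, 30, m2r=0, wdsel=1))   # 30  jal
```

With both selectors at 0 the ALU result passes straight through, which is why instructions that worked before the RAM was added need no decoder changes.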

Putting the `ld`/`sd` datapath together¶

flowchart LR
    RD0["RD0 (base addr)"] --> ALU
    IMM["imm (offset)"] --> ALU
    ALU["ALU (add)"] --> CONV["byte->DW: bits[9:3]"]
    CONV --> A["RAM A (7-bit DW addr)"]
    RD1["RD1 (value to store)"] --> DIN["RAM Din (64)"]
    A --> RAM["RAM 64x128"]
    DIN --> RAM
    RAM --> DOUT["RAM Dout (64)"]
    DOUT --> M2R["M2R / WDsel MUX"]
    M2R --> WD["RegFile WriteData"]

Control lines for `ld` and `sd`¶

Two RAM control inputs drive the operation: ld (read enable) and str (write enable). A third control output, the write-back selector (WDsel, 2 bits, or M2R, 1 bit), picks what the register file receives. The handwritten control-line table:

inst	`WDsel` (2) or `M2R` (1)	`ld`	`str`
`ld`	select RAM `Dout`	1	0
`sd`	(don't care — no write to RF)	0	1

So a ld reads the RAM (ld=1, str=0) and routes Dout to the register file; an sd writes the RAM (ld=0, str=1) and does not write the register file (RFW=0, so WDsel/M2R is a don't-care). These are exactly the new control-output columns you add to the decoder spreadsheet in Step 4 of the recipe.
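
A toy Python model of the RAM makes the two control lines concrete. It is a sketch under assumptions: the class `DataMemory` is invented here, the write-enable parameter is spelled `str_` to avoid shadowing Python's built-in `str`, and an inactive output (ld=0) is modeled as `None`.

```python
MASK64 = (1 << 64) - 1


class DataMemory:
    """Toy model of a 64-bit-wide RAM with 2**addr_bits cells."""

    def __init__(self, addr_bits=7):
        self.addr_bits = addr_bits
        self.cells = [0] * (1 << addr_bits)

    def clock(self, byte_addr, din, ld, str_):
        """One clock cycle: return Dout (None when ld=0), then write if str_=1."""
        index = (byte_addr >> 3) & ((1 << self.addr_bits) - 1)
        dout = self.cells[index] if ld else None   # read the value before any write
        if str_:
            self.cells[index] = din & MASK64
        return dout


mem = DataMemory()
mem.clock(1008, 0x1234, ld=0, str_=1)            # sd: ld=0, str=1
value = mem.clock(1008, 0, ld=1, str_=0)         # ld: ld=1, str=0
print(hex(value))                                # 0x1234
```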

6. `lw` / `sw` and `lb` / `sb`: Sub-Doubleword Access¶

Once ld/sd work, we extend to smaller sizes: word (lw/sw, 32 bits) and byte (lb/sb, 8 bits). The challenge: our RAM cell is 64 bits, but a word is only half a cell and a byte is one-eighth. We must extract or insert the right slice after (for loads) or before (for stores) the RAM, keeping the RAM itself unchanged at 64-bit width and at the top level.

The handwritten page for lw/sw draws a 64-bit cell being split into two 32-bit halves and selected by a MUX — that is the load path.

Word addressing inside a doubleword¶

A 64-bit doubleword holds two 32-bit words. Given a byte address:

bits 0..1 = byte index within a word
bit 2 = word index within the doubleword (0 = lower word, 1 = upper word)
bits 3.. = doubleword (cell) index

So once we read the 64-bit cell (using bits [9:3] for the DW address as before), bit 2 of the byte address tells us which 32-bit half we want.

 64-bit cell (one RAM doubleword)
 +---------------------+---------------------+
 |  upper word (32)    |  lower word (32)    |
 |   bits 63..32       |   bits 31..0        |
 +---------------------+---------------------+
       word index 1          word index 0
                  ^
       selected by byte-address bit 2

Load word (`lw`) — extract then sign-extend¶

1. ALU computes target byte address (4-byte aligned for lw).
2. Splitter bits[9:3] -> DW address -> RAM A; RAM ld=1.
3. Split RAM Dout (64) into W0 = bits[31:0] and W1 = bits[63:32].
4. MUX selects W0 or W1 using byte-address bit 2 (2-2) as selector.
5. Sign-extend the chosen 32-bit value to 64 bits.
6. A memory-size (MSZ) MUX selects the final 64-bit value:
      lb  -> 8-bit  sign-extended to 64
      lw  -> 32-bit sign-extended to 64
      ld  -> full 64-bit value
7. That value goes to the M2R / WDsel MUX -> RegFile.

The MSZ (memory size) encodings follow the low two bits of the RISC-V funct3 field: lb = 0b00, lw = 0b10, ld = 0b11. (0b01 is the halfword size, lh/sh, which this section does not cover.)

flowchart LR
    DOUT["RAM Dout (64)"] --> SPL["split: W0[31:0], W1[63:32]"]
    SPL --> WMUX["word MUX (sel = byte addr bit 2)"]
    WMUX --> SX["sign-extend 32 -> 64"]
    SX --> MSZ["MSZ MUX (lb / lw / ld)"]
    DOUT --> MSZ
    MSZ --> M2R["to M2R / WDsel MUX -> RegFile"]
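
The lw extraction path can be sketched in Python. The function names are illustrative, and the sketch covers only the word path (steps 3 to 5 above), not the MSZ MUX.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def sign_extend(value, bits):
    """Sign-extend a bits-wide value to an unsigned 64-bit pattern."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return ((value ^ sign) - sign) & MASK64


def load_word(cell, byte_addr):
    """Pick the 32-bit half selected by byte-address bit 2, then sign-extend."""
    w0 = cell & MASK32                 # bits 31..0
    w1 = (cell >> 32) & MASK32         # bits 63..32
    word = w1 if (byte_addr >> 2) & 1 else w0
    return sign_extend(word, 32)


cell = 0x8000_0001_0000_0002
print(hex(load_word(cell, 0)))   # 0x2
print(hex(load_word(cell, 4)))   # 0xffffffff80000001
```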

Store word (`sw`) — read-modify-write¶

Storing a word is harder than loading because we can only write a whole 64-bit cell, but we want to change just one 32-bit half and leave the other half untouched. The answer is the read-modify-write pattern the session introduced:

Read the current 64-bit cell (D64cur) at the target DW address.
Modify only the selected 32-bit half, inserting the new word (Wnew from RD1).
Write the merged 64-bit value back to the same cell.

Because we read and write the same cell in one clock cycle, you must set both RAM control lines: ld = 1 and str = 1.

The splitter/merger construction from the guide:

D64cur (current cell) --split--> W0 (bits 31..0), W1 (bits 63..32)
D64in  (RD1)          --split--> Wnew (low 32 bits we want to write)

Build two candidate 64-bit values with mergers:
    option A (write lower word):  { W1 : Wnew }   keep upper, replace lower
    option B (write upper word):  { Wnew : W0 }   keep lower, replace upper

word MUX selects A or B using byte-address bit 2
        |
        v
   MSZ MUX selects sb / sw / sd value
        |
        v
   RAM Din  (with ld=1, str=1)

flowchart TD
    CUR["RAM Dout = D64cur"] --> S1["split: W0, W1"]
    RD1["RD1 = D64in"] --> S2["take Wnew = low 32"]
    S1 --> M1["merge {W1:Wnew}"]
    S2 --> M1
    S1 --> M2["merge {Wnew:W0}"]
    S2 --> M2
    M1 --> WMUX["word MUX (sel = bit 2)"]
    M2 --> WMUX
    WMUX --> MSZ["MSZ MUX (sb/sw/sd)"]
    MSZ --> DIN["RAM Din (str=1, ld=1)"]
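
The merge step of the read-modify-write can be sketched in Python. The function name `store_word_merge` is made up for illustration; the variable names follow the construction above.

```python
MASK32 = (1 << 32) - 1


def store_word_merge(d64cur, byte_addr, d64in):
    """Return the 64-bit value to drive on RAM Din for a sw."""
    w0 = d64cur & MASK32               # current lower word
    w1 = (d64cur >> 32) & MASK32       # current upper word
    wnew = d64in & MASK32              # low 32 bits of RD1
    option_a = (w1 << 32) | wnew       # keep upper, replace lower
    option_b = (wnew << 32) | w0       # keep lower, replace upper
    return option_b if (byte_addr >> 2) & 1 else option_a


cur = 0xAAAA_AAAA_BBBB_BBBB
print(hex(store_word_merge(cur, 0, 0x1111_1111)))   # 0xaaaaaaaa11111111
print(hex(store_word_merge(cur, 4, 0x1111_1111)))   # 0x11111111bbbbbbbb
```

Both candidates are always built; byte-address bit 2 only decides which one reaches Din, mirroring the word MUX in the diagram.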

`lb` / `sb` by analogy¶

Byte access works the same way at finer granularity. For lb, use byte-address bits 0..2 to select one of the eight bytes in the cell, then sign-extend 8 -> 64. For sb, read-modify-write the chosen byte, keeping the other seven. The lecture noted that once you have lw/sw working, lb/sb follow the same structure with a byte-selecting MUX instead of a word-selecting one.
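
A Python sketch of the byte-granularity versions is shown below. It assumes little-endian byte numbering within the cell (byte index 0 is bits 7..0, matching word index 0 being the lower word); the function names are illustrative.

```python
MASK64 = (1 << 64) - 1


def load_byte(cell, byte_addr):
    """Select one of eight bytes with byte-address bits 2..0, sign-extend 8 -> 64."""
    shift = (byte_addr & 0b111) * 8
    byte = (cell >> shift) & 0xFF
    return byte | (MASK64 & ~0xFF) if byte & 0x80 else byte


def store_byte_merge(cell, byte_addr, rd1):
    """Replace the selected byte of the current cell, keeping the other seven."""
    shift = (byte_addr & 0b111) * 8
    keep = cell & ~(0xFF << shift) & MASK64
    return keep | ((rd1 & 0xFF) << shift)


cell = 0x1122_3344_5566_7788
print(hex(load_byte(cell, 0)))                 # 0xffffffffffffff88
print(hex(load_byte(cell, 7)))                 # 0x11
print(hex(store_byte_merge(cell, 1, 0xAB)))    # 0x112233445566ab88
```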

Why a 64-bit value, not just 32¶

The notes' rightmost sketch reinforces the abstraction: a word is conceptually one unit, but in our 64-bit memory it lives inside a 64-bit cell. When we load a word we get back a 64-bit value (the cell), pick the relevant 32 bits, and sign-extend back to a full 64-bit register value — because RISC-V registers are 64 bits wide and signed loads sign-extend.

7. Memory Sizes, Alignment, and Sign Extension Recap¶

A consolidated view of the size hierarchy our memory subsystem must handle:

Mnemonic	Size	Bits	Alignment (byte addr multiple of)	Selector inside cell
`lb`/`sb`	byte	8	1	byte-addr bits `2..0`
`lw`/`sw`	word	32	4	byte-addr bit `2`
`ld`/`sd`	doubleword	64	8	none (whole cell)

Key invariants to keep straight:

Addresses in registers are always byte addresses. Every conversion to a cell index happens just before the RAM A input.
The DW (cell) index = byte address >> 3 = byte-address bits [9:3] for our 128-cell RAM.
Loads sign-extend the loaded value to 64 bits (these are the signed lb/lw/ld). To sign-extend an n-bit value to 64 bits you replicate the top (sign) bit. In bit-shift terms: shift the value all the way left, then shift-right-arithmetic all the way back (the same trick from Lab 02).
Stores below 64 bits require read-modify-write because the RAM is written one full cell at a time.

Sign extension example (8 -> 64), the lb case:
  byte read = 0b1111_1110   (= -2 as a signed byte)
  sign bit  = 1
  result    = 0xFFFF_FFFF_FFFF_FFFE   (= -2 as a signed 64-bit value)
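
The shift-left, shift-right-arithmetic trick can be reproduced in Python. Python integers are unbounded, so this sketch emulates a 64-bit register by masking and by reinterpreting bit 63 as the sign before the right shift; the function name is illustrative.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Sign-extend a bits-wide value to 64 bits using shift left, then arithmetic right."""
    shifted = (value << (64 - bits)) & MASK64    # move the sign bit up to bit 63
    if shifted >> 63:                            # reinterpret as a signed 64-bit value
        shifted -= 1 << 64
    # Python's >> on a negative int is an arithmetic shift (it replicates the sign).
    return (shifted >> (64 - bits)) & MASK64


print(hex(sign_extend(0b1111_1110, 8)))   # 0xfffffffffffffffe
print(hex(sign_extend(0x7F, 8)))          # 0x7f
```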

Key Concepts¶

Concept	Definition	Example
Standalone program	A RISC-V program wrapped to run on the bare processor: assembly `main`, no `.global`, `jal` not `call`, `unimp` end marker	`main: li sp,1024; jal swap_s; unimp`
`unimp` end marker	Instruction the processor recognizes to stop executing	terminates the program after the last call returns
Byte address	An address that counts individual bytes; what all registers hold	`sp + 8` from `ld t0, 8(sp)`
Doubleword (DW) address	A cell index into 64-bit-wide RAM = byte address >> 3	byte addr `16` -> DW addr `2`
byte->DW conversion	Drop low 3 bits (÷8) and keep enough high bits for the RAM	splitter taking bits `[9:3]` (7 bits)
RAM data/addr bits	Cell width and number of cells; capacity = `2^addr × (data/8)` bytes	64 data bits, 7 addr bits → 1024 bytes
M2R / WDsel MUX	MUX that selects the register write-back source (ALU result vs. RAM `Dout`)	`M2R=1` routes `Dout` to `WD` for `ld`
`ld` / `str` lines	RAM read-enable and write-enable control lines	`ld`: `ld=1,str=0`; `sd`: `ld=0,str=1`
Read-modify-write	Read a 64-bit cell, replace a sub-field, write it back	`sw`/`sb` set both `ld=1` and `str=1`
Word index bit	Byte-address bit 2 selects which 32-bit half of a cell	`bit2=0` lower word, `bit2=1` upper word
MSZ (memory size)	Control selecting byte/word/doubleword path	`lb=0b00`, `lw=0b10`, `ld=0b11`

Practice Problems¶

Problem 1: Size the RAM and place the stack pointer¶

You want 2 KiB (2048 bytes) of stack memory in a RAM with 64 data bits. How many address bits does the RAM need, how many cells does it have, and what value should li sp, ... use to put sp at the top?

Solution

Each cell is 64 bits = 8 bytes = `2^3` bytes.

total bytes = 2048 = 2^11
cells       = 2^11 / 2^3 = 2^8 = 256 cells
address bits = 8

So configure the RAM with **data bits = 64, address bits = 8** (256 cells). The stack grows downward, so initialize `sp` to the top byte address:

li sp, 2048

The first push (`addi sp, sp, -16`) then lands inside the valid range `0..2047`.

Problem 2: Convert a byte address to a DW address¶

A program executes ld t0, 24(sp) where sp = 1000. What is the target byte address, and what DW (cell) index goes into the RAM A input for our 7-bit-address, 128-cell RAM?

Solution

target byte address = base + offset = sp + 24 = 1000 + 24 = 1024

Byte address `1024` is one past the top of a 1024-byte RAM (valid bytes are `0..1023`), so a real program would use a lower `sp`; treat this as pure arithmetic. The DW index is the byte address divided by 8 (shift right by 3):

DW address = 1024 >> 3 = 128

`128` does not fit in 7 bits (valid DW indices are `0..127`), so this access is out of range. The hardware does not flag it: the splitter keeps only bits `[9:3]`, which are all 0 for byte address `1024`, so the access would silently wrap around to cell 0. For an in-range example, `ld t0, 16(sp)` with `sp = 1000` gives byte addr `1016`, DW addr `1016 >> 3 = 127`, the last valid cell. The splitter wires byte-address bits `[9:3]` to the 7-bit `A` input; bits `0..2` (here `0`) are ignored because `ld` is 8-byte aligned.

Problem 3: Control-line table¶

Fill in the RAM control lines (ld, str) and whether the register file is written (RFW) for each of: ld, sd, lw, sw.

Solution

| inst | `ld` (read) | `str` (write) | `RFW` | notes |
|------|-------------|---------------|-------|-------|
| `ld` | 1 | 0 | 1 | read cell, write result to RF |
| `sd` | 0 | 1 | 0 | write whole cell, no RF write |
| `lw` | 1 | 0 | 1 | read cell, extract+sign-ext word to RF |
| `sw` | 1 | 1 | 0 | read-modify-write: read cell, merge word, write back |

The surprise is `sw`: because we change only 32 of the 64 bits, we must **read** the current cell *and* **write** the merged value in the same cycle, so both `ld=1` and `str=1`.

Problem 4: Why read-modify-write?¶

The RAM is 64 bits wide. Explain why sw t0, 0(sp) cannot simply write t0's low 32 bits into the RAM, and describe the merge for storing into the upper word of a cell whose current value is 0xAAAA_AAAA_BBBB_BBBB with t0 = 0x1111_1111.

Solution

The RAM has no way to write *just* 32 bits — its `Din` is a full 64-bit cell and a write replaces the entire cell. If we wrote only the low 32 bits, the upper 32 bits would have to come from somewhere; whatever we drove on the rest of `Din` would clobber the other word in the cell. So we must preserve the half we are not changing. For the upper word (byte-address bit 2 = 1) we keep the **lower** word `W0` and replace the **upper** word with `Wnew`:

D64cur = 0xAAAA_AAAA_BBBB_BBBB
W1 (upper) = 0xAAAA_AAAA   (discarded)
W0 (lower) = 0xBBBB_BBBB   (kept)
Wnew (low 32 of t0) = 0x1111_1111

merged = { Wnew : W0 } = 0x1111_1111_BBBB_BBBB

This merged value is written back with `ld=1, str=1` in the same cycle. The lower word `0xBBBB_BBBB` is preserved; only the upper word changed.

Problem 5: Trace a standalone program¶

What is wrong with running this program on the Digital processor, and how do you fix it so it runs standalone?

.global swap_s
swap_s:
        ld   t0, 0(a0)
        ld   t1, 0(a1)
        sd   t1, 0(a0)
        sd   t0, 0(a1)
        ret

Solution

Four problems: there is no `main` to set up arguments and start execution; it uses `.global`; there is no end marker; and `ra` is never initialized, so `ret` has nowhere valid to return to. Fixed standalone version:

main:
        li    sp, 1024        # top of RAM; gives swap_s a place to read/write
        li    a0, 992         # address of first value on the stack
        li    a1, 1000        # address of second value
        jal   swap_s          # link return into ra, jump
        unimp                 # end marker -- processor halts here

swap_s:
        ld    t0, 0(a0)
        ld    t1, 0(a1)
        sd    t1, 0(a0)
        sd    t0, 0(a1)
        ret                   # jalr to ra -> back into main, then unimp

We removed `.global`, added a `main` that initializes `sp` and the argument registers, used `jal` to set `ra`, and ended with `unimp`. Now `ret` returns into `main` and the next instruction (`unimp`) stops the processor.

Problem 6: Apply the five-step recipe¶

You are about to add lw/sw to a processor that already supports ld/sd. Walk through the five-step recipe and name the concrete change at each step.

Solution

1. **Pick**: choose the group `lw` + `sw` (related word-size memory ops).
2. **Components**: add a 32-bit sign-extender for the load path; reuse the existing RAM (no width change).
3. **Datapath**: on the load side, split `Dout` into two 32-bit halves and add a word-select MUX (selector = byte-addr bit 2), feed into a new MSZ MUX. On the store side, add the read-modify-write merge (splitters + mergers + word-select MUX + MSZ MUX) before `Din`. All new sources go on *new* MUX inputs.
4. **Decoder**: add `lw` and `sw` rows to the spreadsheet; add/extend the `MSZ` control column (`lb=0b00, lw=0b10, ld=0b11`); set `sw` to `ld=1, str=1` (read-modify-write) and `lw` to `ld=1, str=0`. Regenerate the ROM hex and widen the priority encoder and ROM output split.
5. **Test**: `grade test -p project06` in the relevant directory; single-step a failing word load/store with probes on the word-select MUX, the sign-extender, and `Din`/`Dout`.

Summary¶

Standalone packaging: to run a function on the bare processor, write an assembly main, remove .global, use jal instead of call, and end with unimp.
Stack pointer init depends on RAM size: with 64 data bits and 7 address bits the RAM holds 2^7 × 8 = 1024 bytes, so li sp, 1024 places sp at the top of the downward-growing stack.
Capacity arithmetic: total bytes = 2^(addr bits) × (data bits / 8); choose addr bits to get the stack size you need.
Five-step recipe for every new instruction: pick it, add/modify components, extend the datapath (usually new MUX inputs), update the decoder spreadsheet and ROM, then test — incrementally.
Byte vs. DW address: registers hold byte addresses; the RAM is indexed by doubleword. Convert with a right-shift-by-3 (a splitter taking bits [9:3] for our 128-cell RAM).
ld/sd datapath: the ALU computes base + offset; loads route RAM Dout back to the register file through the M2R/WDsel MUX (ld=1, str=0); stores write the whole cell (ld=0, str=1).
Sub-doubleword access: lw/lb extract a slice with the word-index bit (byte-addr bit 2) and sign-extend to 64 bits; sw/sb need read-modify-write (ld=1 and str=1) to change part of a cell without clobbering the rest.
Build and debug incrementally: keep part1–final directories and a four-sheet decoder spreadsheet, keep the RAM at the top level so you can inspect it, and use single-stepping and probes against an objdump to find faults.
