Processor Components¶
Overview¶
This lecture marks the transition from digital design fundamentals to building a real processor. We pull together everything covered so far — C coding, data representation, memory, RISC-V assembly and machine code, the emulator, cache design, and digital logic — and use it to design the hardware that actually executes machine instructions. We define the Instruction Set Architecture (ISA) as the hardware/software interface, distinguish it from the microarchitecture that implements it, and then walk through the major components of a single-cycle RISC-V processor: the PC register, instruction memory, decoders, the register file, the ALU, and data memory. We finish with the detailed design of the register file, including how to add a CLEAR (CLR) input to Digital's register and how to wire 31 physical registers (X0 is hardwired to 0) into a two-read, one-write structure.
Learning Objectives¶
- Distinguish the Instruction Set Architecture (ISA) from the microarchitecture that implements it
- Explain the hardware/software interface and where the ISA sits in it
- State Moore's Law and relate transistor growth to processor complexity (caches, multi-core, GPUs)
- Identify the major components of a single-cycle processor and describe the data flow through one clock cycle
- Describe the register file interface (read ports, write port, control signals) and why X0 is hardwired to 0
- Wrap a Digital register with a MUX to add synchronous CLEAR (CLR) functionality
- Implement a register file with two read ports and one write port using MUX trees and a decoder-with-enable
- Connect the register-file design to the broader Lab 9 and Project 6 roadmap (single-cycle → multi-cycle → pipelined)
Prerequisites¶
- RISC-V assembly and machine-code instruction formats (Lectures 3–6)
- The RISC-V emulator / fetch-decode-execute cycle
- Digital design fundamentals: MUXes, decoders, encoders, and the ALU
- Sequential logic: D flip-flops, registers, clocks (rising edge), CLK/EN/CLR control signals
- Familiarity with the Digital simulator (Project 5, Labs 7–8)
1. Where We Are: The Course Arc¶
Computer architecture is the study of how a processor is built and how software talks to it. By this point in CS 315 we have climbed the whole stack from C source down to gates, and we are about to put the pieces together into a working processor.
Topics covered so far that all feed into processor design:
- C coding — the high-level language whose compiled output we will execute
- Data representation — integers, two's complement, ASCII (IEEE 754 floating point is still ahead of us)
- Memory — byte addressing, word/double-word layout, alignment
- RISC-V assembly — the human-readable instruction language
- RISC-V machine code — the 32-bit binary encoding the hardware actually reads
- RISC-V emulator — a software model of fetch-decode-execute
- Cache design — how memory is made fast
- Digital design — gates, MUXes, decoders, latches, flip-flops, registers
flowchart TD
A[C coding] --> P[Processor Design]
B[Data representation] --> P
C[Memory] --> P
D[RISC-V Assembly] --> P
E[RISC-V Machine Code] --> P
F[RISC-V Emulator] --> P
G[Cache Design] --> P
H[Digital Design] --> P
style P fill:#f9f,stroke:#333,stroke-width:2px
The emulator we wrote earlier was a software implementation of the fetch-decode-execute loop. Now we build the hardware version: the same loop, but realized with registers, memory, decoders, and an ALU wired together in the Digital simulator.
A note on incremental development¶
Greg opened with a theme that applies to both Project 5 (software) and the upcoming hardware labs: build and test in small, verifiable steps. Implementing a large system all at once and then debugging it is far slower than wiring up one component, testing it, then adding the next. Incremental development takes upfront planning but dramatically reduces debugging time and deepens understanding. For Project 6 this is even a graded requirement — you must submit part1, part2, part3, and final versions to show incremental progress.
2. Instruction Set Architecture vs. Microarchitecture¶
The single most important conceptual distinction in this lecture is ISA vs. microarchitecture.
The ISA is an interface¶
The Instruction Set Architecture (ISA) is the contract between software and hardware. It specifies the observable behavior of instructions — what instructions exist, how they are encoded, what registers are visible, and what each instruction does — without saying how the hardware accomplishes it.
Software is written to the ISA; hardware is built to satisfy the ISA. As long as both sides honor the contract, the same program runs on any conforming processor.
flowchart TB
SW["Software<br/>(compilers, assemblers, programs)"]
ISA["ISA<br/>(the interface / contract)"]
HW["Hardware<br/>(the processor)"]
SW -->|emits machine code defined by| ISA
ISA -->|implemented by| HW
HW -->|executes per| ISA
The microarchitecture is an implementation¶
The microarchitecture is the specific hardware implementation of an ISA — the digital-logic design that realizes the contract. One ISA can have many microarchitectures: a tiny single-cycle teaching processor and a 64-core server chip can implement the same RISC-V ISA with wildly different internal designs.
flowchart TD
CA["Computer Architecture"]
CA --> ISA["Instruction Set Architecture<br/>(specification)"]
CA --> MA["Microarchitecture"]
ISA --> MC["Machine Code"]
ISA --> RM["Registers / Memory model"]
MC --> AC["Assemblers / Compilers"]
MA --> DD["Digital Design"]
DD --> SE["Schematic entry (visual)"]
DD --> HDL["HDL (Hardware Description Language)"]
HDL --> V["Verilog"]
HDL --> VHDL["IEEE VHDL"]
Two ways to do digital design¶
The microarchitecture is realized with digital design, and there are two broad approaches:
| Approach | What it is | Example |
|---|---|---|
| Schematic entry (visual) | Draw the circuit by placing and wiring components | What we do in Digital |
| HDL (Hardware Description Language) | Describe the hardware in code, then synthesize it | Verilog, IEEE VHDL |
In this course we use schematic entry in the Digital simulator — we draw and wire the components directly, which makes the data path easy to see and reason about. Industry typically uses HDLs (Verilog, VHDL) so designs can be parameterized, version-controlled, and synthesized to real silicon, but the underlying concepts are identical.
| | ISA | Microarchitecture |
|---|---|---|
| Nature | Specification / interface | Implementation |
| Answers | What does the instruction do? | How is it carried out in hardware? |
| Visible to software? | Yes | No (hidden) |
| Examples | RISC-V, x86, ARM | single-cycle, multi-cycle, pipelined, superscalar |
| Can change without breaking software? | No — would break the contract | Yes — many designs satisfy one ISA |
3. Moore's Law and Why Processors Got Complex¶
Moore's Law: the number of transistors on a processor doubles roughly every 1.5 years.
(Moore's own 1975 projection was a doubling about every two years; the 18-month figure is the version most often quoted, and ~1.5 years is the number Greg used.)
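To get a feel for what that doubling rate implies, here is a small sketch. It assumes an idealized exponential model; the function name and the default 1.5-year doubling time are illustrative choices based on the figure quoted in lecture, not measured data.

```python
def growth_factor(years: float, doubling_time: float = 1.5) -> float:
    """Return how many times the transistor count multiplies over `years`,
    assuming it doubles every `doubling_time` years (an idealized model)."""
    return 2.0 ** (years / doubling_time)


if __name__ == "__main__":
    for years in (1.5, 3, 15):
        print(f"{years:>4} years -> x{growth_factor(years):,.0f}")
```

Under this model, 15 years is ten doublings, which is roughly a thousandfold increase in transistor budget.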
The key insight: as transistor budgets exploded, designers had room to add complexity that improves performance. That extra silicon went into things like:
- Caches (multiple levels) to hide slow memory
- Multi-core designs to run several instruction streams at once
- GPUs for massive data-parallel work
- Neural engines / accelerators for machine-learning workloads
flowchart LR
M["Moore's Law:<br/>2x transistors / ~1.5 yr"] --> C["More transistors available"]
C --> A["Caches"]
C --> B["Multi-core"]
C --> D["GPUs"]
C --> E["Neural engines"]
For CS 315, all of this complexity is out of scope. We deliberately start with the simplest possible microarchitecture: a single-cycle processor, where every instruction completes in exactly one clock cycle. It is slow and inefficient, but it is the clearest way to see the data path. Later designs improve on it:
- Single-cycle processor — one instruction per clock cycle (our starting point)
- Multi-cycle processor — break an instruction into stages, reuse hardware across cycles
- Pipelined processor — overlap stages of multiple instructions to increase throughput
The roadmap of assignments mirrors this progression:
- Lab 9 — build the core components (PC register, register file, ALU); due next Monday (Nov 3)
- Lab 10 — extend toward a working processor; due the following Monday
- Project 6 — full single-cycle RISC-V processor with interactive grading (due Nov 17, grading Nov 18)
4. The Single-Cycle Processor: Major Components¶
A single-cycle processor executes one complete instruction per clock cycle. On each rising clock edge the PC moves to the next instruction and all state updates (register writes, memory writes) take effect. Within a cycle, signals flow combinationally through the components.
Here is the data path sketched in lecture, redrawn as a block diagram:
                               +----------------+
                               |  Inst Decode   |---> control lines
                               +----------------+
                                        ^
                                        |
+-------+  ADDR  +--------+  IW         |   +-----------+   +----------+   +-----+   +-----+
|  PC   |------->|  ROM   |-------------+-->|  Reg/Imm  |-->| Register |-->| ALU |-->| RAM |
|  Reg  |        | (Inst  |                 |  Decode   |   |   File   |   |     |   |     |
| 64bit |        |  Mem)  |                 +-----------+   +----------+   +-----+   +-----+
+-------+        +--------+                       |              ^            |         |
  ^   |                                           v              |            |         |
  |   v                                        +-----+           +------------+---------+
  |  +---+                                     | Imm |   (RD0, RD1 feed ALU; ALU result
  +--| + |<---- 4                              +-----+    and RAM data write back to RegFile)
     +---+
The components, in data-flow order:
| Component | Role |
|---|---|
| PC register (64-bit) | Holds the address of the instruction currently executing |
| +4 adder | Computes PC + 4 to advance to the next instruction |
| Instruction memory (ROM) | Stores the 32-bit machine-code program; returns the instruction word (IW) at the PC address |
| Instruction decoder | Reads the IW and produces control lines for every component |
| Register decoder | Extracts the 5-bit register fields (rs1, rs2, rd) from the IW |
| Immediate decoder (Imm) | Extracts and sign-extends immediate values from the IW |
| Register file | Two read ports + one write port over 32 logical registers |
| ALU | Performs add, sub, mul, sll, srl; also computes branch/memory target addresses |
| Data memory (RAM) | Read/write data for loads/stores (the simulated stack) |
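The register decoder and immediate decoder in the table are essentially bit-field extraction. The sketch below assumes the standard RISC-V field positions (opcode in bits 6:0, rd in 11:7, rs1 in 19:15, rs2 in 24:20, I-type immediate in 31:20). The function names are hypothetical, and only the I-type immediate is handled.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Extract word[hi:lo] (inclusive), like a splitter in Digital."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of `value` as a two's-complement number."""
    sign_bit = 1 << (width - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_fields(iw: int) -> dict:
    """Register decoder plus I-type immediate decoder for a 32-bit instruction word."""
    return {
        "opcode": bits(iw, 6, 0),
        "rd": bits(iw, 11, 7),
        "rs1": bits(iw, 19, 15),
        "rs2": bits(iw, 24, 20),
        "imm_i": sign_extend(bits(iw, 31, 20), 12),
    }


if __name__ == "__main__":
    print(decode_fields(0x00B50633))  # encoding of: add a2, a0, a1
```

In hardware none of this costs logic gates beyond wiring: each field is just a fixed slice of the instruction word, and sign extension replicates the top immediate bit.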
One clock cycle, step by step¶
flowchart LR
A["PC selects<br/>instruction"] --> B["Fetch IW<br/>from ROM"]
B --> C["Decode IW<br/>(control + reg + imm)"]
C --> D["Read registers<br/>RD0, RD1"]
D --> E["ALU computes<br/>result / address"]
E --> F["Access RAM<br/>(load/store)"]
F --> G["Write back<br/>on rising edge"]
G --> H["PC <- PC+4<br/>(or branch target)"]
Importantly, the register and memory writes, plus the PC update, all happen on the rising clock edge — that is the moment the instruction "completes." Everything between edges is combinational settling.
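The cycle described above can be sketched in software, much like the emulator. This is a simplified model under stated assumptions: instructions are stored already decoded as hypothetical tuples `(op, rd, rs1, src2)`, only `add` and `addi` are handled, and data memory and branches are left out.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide


def step(pc: int, regs: list, rom: dict) -> int:
    """Run one clock cycle and return the next PC; `regs` is updated in place.

    Everything before the final update models combinational settling;
    the update itself models the rising clock edge.
    """
    op, rd, rs1, src2 = rom[pc]              # fetch (decode is done ahead of time here)
    a = regs[rs1]                            # read port 0
    b = regs[src2] if op == "add" else src2  # read port 1, or the immediate for addi
    result = (a + b) & MASK64                # ALU
    next_pc = pc + 4                         # +4 adder

    # rising clock edge: state updates take effect
    if rd != 0:                              # x0 stays hardwired to 0
        regs[rd] = result
    return next_pc


if __name__ == "__main__":
    program = {
        0: ("addi", 10, 0, 5),    # addi a0, x0, 5
        4: ("addi", 11, 0, 3),    # addi a1, x0, 3
        8: ("add", 12, 10, 11),   # add  a2, a0, a1
    }
    regs, pc = [0] * 32, 0
    while pc in program:
        pc = step(pc, regs, program)
    print(regs[10], regs[11], regs[12], pc)
```

The point of the sketch is the ordering: all values are computed from the old state first, and only then do the register write and PC update happen together.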
The clock and "complete instruction"¶
In the lecture's clock-waveform sketch, each rising edge of the clock marks the point where the current instruction completes and the next begins. Between two rising edges, the data path has one full clock period to settle: the PC drives the ROM, the IW propagates through the decoders, register values flow to the ALU, the ALU result (or RAM data) settles at the write-back input — and then the rising edge latches everything.
      ____      ____      ____
     |    |    |    |    |    |
_____|    |____|    |____|    |____
     ^         ^         ^
     |         |         |
 complete  complete  complete
  instr     instr     instr
Because one clock period must be long enough for the slowest instruction's entire path to settle, the single-cycle design is simple but not fast — every instruction pays for the worst case. That limitation is exactly what multi-cycle and pipelined designs later address.
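A rough way to see the worst-case argument is to add up component delays along each instruction's path and take the maximum. The delay numbers and path lists below are made up purely for illustration; they are not measurements of any real circuit.

```python
# Hypothetical propagation delays in nanoseconds (illustrative numbers only).
DELAY_NS = {"rom": 2.0, "decode": 1.0, "regfile_read": 1.0, "alu": 2.0, "ram": 3.0}

# Components each instruction class must pass through within one cycle (assumed).
PATHS = {
    "add": ["rom", "decode", "regfile_read", "alu"],
    "load": ["rom", "decode", "regfile_read", "alu", "ram"],
}


def path_delay(path: list) -> float:
    """Total settling time along one instruction's path."""
    return sum(DELAY_NS[component] for component in path)


def min_clock_period(paths: dict) -> float:
    """The clock period must cover the slowest path of any instruction."""
    return max(path_delay(p) for p in paths.values())


if __name__ == "__main__":
    for name, path in PATHS.items():
        print(name, path_delay(path))
    print("minimum clock period:", min_clock_period(PATHS))
```

With these assumed numbers, an `add` would be done after 6 ns but still has to wait out the 9 ns period dictated by the load path.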
5. The Register File¶
The register file is the processor's fast on-chip storage for the 32 architectural registers. Its job: let the data path read two registers and write one register, all within a single clock cycle.
Specification¶
- 32 64-bit registers: X0, X1, …, X31
- Read up to two register values in a single clock cycle
- Write to a single register in a clock cycle
- X0 (zero) is always 0 — reads of X0 always return 0, and writes to X0 are discarded
That last point is a RISC-V ISA rule. Because X0 can never hold anything but zero, we do not build a physical flip-flop register for it — we just hardwire its output to constant 0. So the register file has 32 logical registers but only 31 physical registers (X1–X31).
Interface (block diagram)¶
      5 +-----------------------------+ 64
RR0 --/-| RR0                     RD0 |-/--->
      5 |                             | 64
RR1 --/-| RR1                     RD1 |-/--->
        |                             |
      5 |                             |
WR  --/-| WR                          |
     64 |                             |
WD  --/-| WD                          |
      1 |                             |
WE  --/-| WE                          |
      1 |                             |
CLK --/-| CLK                         |
      1 |                             |
CLR --/-| CLR                         |
        +-----------------------------+
Signal glossary¶
| Signal | Width | Meaning |
|---|---|---|
| RR0 | 5 | Read register 0 — selects which register to output on RD0 |
| RR1 | 5 | Read register 1 — selects which register to output on RD1 |
| RD0 | 64 | Read data 0 — value of the register named by RR0 |
| RD1 | 64 | Read data 1 — value of the register named by RR1 |
| WR | 5 | Write register — selects the destination register to update |
| WD | 64 | Write data — the value to write into WR |
| WE | 1 | Write enable — only write when WE = 1 |
| CLK | 1 | Clock — writes happen on the rising edge |
| CLR | 1 | Clear — synchronously reset register values to 0 |
The selectors are 5 bits because we have 32 registers and 2^5 = 32. The data buses are 64 bits because RV64 registers are 64 bits wide.
Why two read ports and one write port?¶
Look at a typical R-type instruction:
add a2, a0, a1     # a2 = a0 + a1
In a single cycle we must read two source registers (a0, a1) and write one destination register (a2). That is exactly two read ports and one write port, the minimum a single-cycle data path needs.
flowchart LR
RR0["RR0 = a0"] --> RF[Register File]
RR1["RR1 = a1"] --> RF
RF --> RD0["RD0 = value of a0"]
RF --> RD1["RD1 = value of a1"]
RD0 --> ALU
RD1 --> ALU
ALU --> WD["WD = a0 + a1"]
WD --> RF
WR["WR = a2"] --> RF
WE["WE = 1"] --> RF
Reads are combinational (asynchronous): change RR0/RR1 and RD0/RD1 follow immediately. Writes are synchronous: WD lands in register WR only on the rising clock edge, and only if WE = 1.
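The interface behavior can be captured in a small behavioral model. This is a sketch, not the Digital circuit: the class and method names are hypothetical, and the list keeps a slot for X0 only for indexing convenience (it is never written, so it always reads 0).

```python
class RegisterFile:
    """Behavioral model: 32 logical registers, 2 read ports, 1 write port."""

    MASK64 = (1 << 64) - 1

    def __init__(self):
        self.regs = [0] * 32  # index 0 is never written, so it stays 0

    def read(self, rr0: int, rr1: int):
        """Combinational read: return (RD0, RD1) for the current state."""
        return self.regs[rr0 & 31], self.regs[rr1 & 31]

    def clock_edge(self, wr: int, wd: int, we: int, clr: int = 0):
        """Rising clock edge: apply CLR, or else a single enabled write."""
        if clr:
            self.regs = [0] * 32
        elif we and (wr & 31) != 0:  # writes to x0 are discarded
            self.regs[wr & 31] = wd & self.MASK64


if __name__ == "__main__":
    rf = RegisterFile()
    rf.clock_edge(wr=10, wd=5, we=1)
    rf.clock_edge(wr=11, wd=3, we=1)
    print(rf.read(10, 11))
```

Note that `read` never changes state and needs no clock, while nothing is stored unless `clock_edge` is called, which mirrors the combinational-read / synchronous-write split.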
6. Adding CLEAR (CLR) to a Digital Register¶
There is one snag: Digital's built-in register has D, CLK, and EN inputs, but no CLR (clear) input. We need CLR so we can reset all registers (and the PC) to a known state at the start of execution. The fix is to wrap the built-in register with a little logic — a classic incremental-design move.
The idea¶
CLR should synchronously force the stored value to 0 on the next clock edge. We achieve this with two small additions:
- A 2-to-1 MUX on the D input that selects between the real data and constant 0. CLR is the MUX selector: when CLR = 1, the register loads 0; when CLR = 0, it loads the real D.
- An OR gate that combines the original EN with CLR to drive the wrapped register's enable. This guarantees that even if the caller did not assert EN, a CLR still forces a write of 0 on the clock edge.
                    +-------+              +-----------------+
D ----------------->| 0     |              |                 |
                    |  MUX  |------------->| D               |
0 (const) --------->| 1     |              |     64-bit      |
                    +-------+              |    Register     |---> Q
                        ^ (sel)            |                 |
CLR -------+------------+                  |                 |
           |        +------+               |                 |
           +------->|      |               |                 |
                    |  OR  |-------------->| EN              |
EN ---------------->|      |               |                 |
                    +------+               |                 |
CLK -------------------------------------->| CLK             |
                                           +-----------------+
Wrapping logic, expressed as boolean equations:
D_in = CLR ? 0 : D # MUX: load 0 when CLR is asserted
EN_in = EN OR CLR # force a load when clearing, even if EN was low
Behavior table¶
| CLR | EN | On rising CLK edge, register loads... |
|---|---|---|
| 0 | 0 | (nothing — holds current value) |
| 0 | 1 | D (normal write) |
| 1 | 0 | 0 (cleared) |
| 1 | 1 | 0 (clear wins, because MUX forces 0) |
This wrapped 64-bit register is used in two places in the design: as the PC register (with EN tied high so it advances every cycle) and as the storage cell for each physical register X1–X31 in the register file.
flowchart LR
D["D (data in)"] --> MUX{MUX}
Z["0 (constant)"] --> MUX
CLR --> MUX
MUX --> RD["D"]
EN --> OR{OR}
CLR --> OR
OR --> RE["EN"]
CLK --> RC["CLK"]
subgraph REG["Wrapped 64-bit Register"]
RD
RC
RE
Q["Q (data out)"]
end
Q --> OUT["output"]
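The wrapper's behavior table can be checked with a few lines of Python. This is a minimal sketch of the MUX and OR logic as described above; the function name is hypothetical, and `q` stands for the value the register currently holds.

```python
def wrapped_register_next(q: int, d: int, en: int, clr: int) -> int:
    """Return the value held after the next rising edge of the wrapped register."""
    d_in = 0 if clr else d       # 2-to-1 MUX, with CLR as the select
    en_in = en | clr             # OR gate feeding the built-in register's EN
    return d_in if en_in else q  # the built-in register loads only when enabled


if __name__ == "__main__":
    for clr in (0, 1):
        for en in (0, 1):
            print(clr, en, wrapped_register_next(q=42, d=7, en=en, clr=clr))
```

Dropping the OR gate from this sketch (using `en` alone) would make the CLR = 1, EN = 0 row hold the old value, which is exactly the case the OR gate exists to fix.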
7. Register File Implementation¶
Now we assemble 31 wrapped registers (X1–X31) plus the hardwired X0 into the full register file. There are three sub-problems: the read path, the write path, and X0.
X0: hardwired zero¶
In the lecture sketch, the X0 register is drawn and then crossed out in red — a deliberate reminder that X0 is not a physical register. Its output is simply the constant 0. There is no flip-flop, no D input, no enable. Reads of X0 yield 0; writes to X0 are silently ignored (the write decoder never selects a non-existent X0 register).
The write path: a decoder with enable¶
Only one register can be written per cycle, and only when WE = 1. We use a 5-to-32 decoder with an enable input:
- Input: the 5-bit WR (write register number)
- Enable: WE
- Output: 32 one-hot lines, where line i goes high only if WR == i and WE == 1
Each one-hot line drives the EN input of the corresponding physical register. So at most one register gets enabled per cycle, and only when WE is asserted. The shared WD (write data) bus and CLK feed every register; the decoder picks which one actually latches.
                 +------------------+
WR (5) --------->|                  |--> EN_x1  ---> X1.EN
                 |     5-to-32      |--> EN_x2  ---> X2.EN
WE (1) --------->|     Decoder      |--> ...
     (enable)    |   with Enable    |--> EN_x31 ---> X31.EN
                 +------------------+
                 (X0 has no enable line - it is hardwired 0)
WD (64) --------> D input of every register (shared bus)
CLK ------------> CLK input of every register (shared)
Because the decoder has an enable, when WE = 0 all output lines are 0 and no register is written — exactly the behavior we want.
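As a quick sanity check of the one-hot behavior, here is a sketch of a decoder with enable. The function name and the `n_sel_bits` parameter are illustrative; in the register file, output line 0 would simply have nothing to drive, since X0 has no physical register.

```python
def decoder_with_enable(wr: int, we: int, n_sel_bits: int = 5) -> list:
    """Return the output lines (0 or 1) of an n-to-2^n decoder with enable."""
    return [1 if (we and wr == i) else 0 for i in range(1 << n_sel_bits)]


if __name__ == "__main__":
    lines = decoder_with_enable(wr=12, we=1)
    print(lines.index(1), sum(lines))
    print(sum(decoder_with_enable(wr=12, we=0)))
```

At most one line is ever high, so at most one register's EN is asserted per cycle.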
The read path: two independent MUX trees¶
To read two registers in the same cycle, we need two independent 32-to-1 multiplexers, each 64 bits wide:
- RD0 MUX: selector = RR0, inputs = {X0, X1, …, X31}, output = RD0
- RD1 MUX: selector = RR1, inputs = {X0, X1, …, X31}, output = RD1
Each MUX takes the outputs (Q) of all 32 registers as its data inputs and uses the 5-bit read-register number as its selector. Two separate MUXes mean RR0 and RR1 can name different registers and both are read simultaneously — combinationally, with no clock involved.
flowchart TD
subgraph REGS["Physical Registers"]
X0["X0 = 0 (hardwired)"]
X1["X1"]
X2["X2"]
XN["... X31"]
end
X0 --> M0[RD0 MUX 32:1]
X1 --> M0
X2 --> M0
XN --> M0
X0 --> M1[RD1 MUX 32:1]
X1 --> M1
X2 --> M1
XN --> M1
RR0["RR0 (sel)"] --> M0
RR1["RR1 (sel)"] --> M1
M0 --> RD0["RD0 (64)"]
M1 --> RD1["RD1 (64)"]
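One way to picture a 32-to-1 read MUX is as five layers of 2-to-1 MUXes, one layer per selector bit. The sketch below assumes that tree structure for illustration only; in Digital a single multiplexer component configured with 5 selector bits can play the same role. Function names are hypothetical.

```python
def mux2(sel: int, in0: int, in1: int) -> int:
    """2-to-1 MUX: pass in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


def mux_tree(sel: int, inputs: list) -> int:
    """Select inputs[sel] using layers of 2-to-1 MUXes (length must be a power of 2)."""
    level, bit = list(inputs), 0
    while len(level) > 1:
        s = (sel >> bit) & 1  # this layer is steered by one selector bit
        level = [mux2(s, level[i], level[i + 1]) for i in range(0, len(level), 2)]
        bit += 1
    return level[0]


if __name__ == "__main__":
    regs = [0] + [100 + i for i in range(1, 32)]  # X0 input is the constant 0
    rd0 = mux_tree(10, regs)  # RR0 = 10
    rd1 = mux_tree(11, regs)  # RR1 = 11, an independent second tree
    print(rd0, rd1)
```

Because the two calls share nothing but their inputs, they correspond to two separate MUX trees reading the same register outputs at the same time.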
Full register-file structure¶
flowchart LR
RR0in["RR0 (5)"] --> RM0[RD0 MUX]
RR1in["RR1 (5)"] --> RM1[RD1 MUX]
WRin["WR (5)"] --> DEC["Decoder + EN"]
WEin["WE (1)"] --> DEC
WDin["WD (64)"] --> BUS["shared D bus"]
CLKin["CLK"] --> CLKBUS["shared CLK"]
DEC --> EN1["EN to X1..X31"]
BUS --> REGS["Registers X1..X31"]
CLKBUS --> REGS
EN1 --> REGS
X0c["X0 = const 0"] --> RM0
X0c --> RM1
REGS --> RM0
REGS --> RM1
RM0 --> RD0out["RD0 (64)"]
RM1 --> RD1out["RD1 (64)"]
Wires vs. tunnels¶
A practical Digital tip from lecture: for a circuit this dense, prefer explicit wires over tunnels for clarity. Tunnels (named nets that connect without a drawn line) reduce visual clutter and repetition, but in a complex circuit they can hide connectivity and make bugs hard to find. Reserve tunnels for the dashboard (probes showing register state) where repetition would be overwhelming, and use real wires for the data path so you can literally trace the signal flow.
Why this is reusable¶
A nice payoff: unlike Project 5 — where you built primitive components (adders, comparators) from gates — for the processor you can lean on Digital's existing library components (MUXes, decoders, registers, RAM, ROM). The register file is built almost entirely from a 5-to-32 decoder, two 32-to-1 MUXes, and 31 wrapped registers. This is the value of a component library: you assemble proven blocks instead of reinventing them.
8. The ALU and Data Memory (Preview)¶
The lecture diagram also showed the ALU and RAM at the right end of the data path. These get full treatment in later sessions, but here is the shape of them so the data path makes sense.
ALU (combinational)¶
The ALU is combinational logic — it has no state and therefore no clock input. It takes two 64-bit operands and an operation selector and produces a 64-bit result:
| Signal | Width | Meaning |
|---|---|---|
| A | 64 | First operand (input) |
| B | 64 | Second operand (input) |
| ALUop | 3 | Operation selector (input) |
| R | 64 | Result (output) |
For Lab 9, the ALU must support these operations:
| ALUop | Operation |
|---|---|
| 0b000 | add |
| 0b001 | sub |
| 0b010 | mul |
| 0b011 | sll (shift left logical) |
| 0b100 | srl (shift right logical) |
The ALU does double duty: it computes data-processing results and computes branch/memory target addresses (e.g., PC + immediate, base + offset), since address arithmetic is just addition.
# A - B in the ALU (ALUop = 0b001)
li t0, 1 # t0 = 1 (pseudo: addi t0, x0, 1)
li t1, 1 # t1 = 1
sub t0, t0, t1 # t0 = t0 - t1 = 0 -> ALU computes A - B
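The ALUop table translates almost directly into a software model. This sketch assumes RV64 semantics (results wrap to 64 bits, shifts use the low 6 bits of B, and mul keeps the low 64 bits of the product); the function name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def alu(a: int, b: int, alu_op: int) -> int:
    """Combinational ALU model: no state, output depends only on the inputs."""
    a &= MASK64
    b &= MASK64
    shamt = b & 0x3F  # assumed RV64 behavior: shift amount is the low 6 bits of B
    if alu_op == 0b000:
        r = a + b
    elif alu_op == 0b001:
        r = a - b
    elif alu_op == 0b010:
        r = a * b
    elif alu_op == 0b011:
        r = a << shamt
    elif alu_op == 0b100:
        r = a >> shamt
    else:
        raise ValueError(f"unsupported ALUop: {alu_op:#05b}")
    return r & MASK64  # keep only the 64 result bits


if __name__ == "__main__":
    print(alu(1, 1, 0b001))       # the sub example above
    print(hex(alu(0, 1, 0b001)))  # 0 - 1 wraps around in 64 bits
```

Masking the result is what makes 0 - 1 come out as 0xFFFFFFFFFFFFFFFF, the 64-bit two's-complement encoding of -1.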
Data memory (RAM)¶
Loads and stores read and write the data memory, implemented with Digital's RAM component. For our programs this holds the stack (function parameters, preserved registers). The address arriving from the ALU is a byte address, but the RAM is addressed by 64-bit double-words — so we shift right by 3 (divide by 8) to convert a byte address to a double-word index before indexing the RAM. (Detailed load/store-byte and load/store-word logic comes in Part 3.)
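The address conversion is a shift and a mask. A minimal sketch, with a hypothetical function name; the byte offset is returned as well because it is what byte and word accesses would later need to pick a position inside the double-word.

```python
def ram_index(byte_addr: int):
    """Split a byte address into (double-word index, byte offset within it)."""
    return byte_addr >> 3, byte_addr & 0b111


if __name__ == "__main__":
    for addr in (0, 8, 0x1C):
        print(hex(addr), ram_index(addr))
```

For example, byte address 0x1C (28) falls in double-word 3, at byte offset 4 within it.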
Note: Data memory and the immediate extender are explicitly out of scope for Lab 9 — Lab 9 is the PC register, register file, and ALU only. They arrive in Lab 10 / Project 6.
9. Connecting to Lab 9 and Project 6¶
The lecture's component designs map directly onto the deliverables.
Lab 9 (due Mon Nov 3)¶
Build and test the core components, then a partial top-level circuit:
- A 64-bit PC register with CLR (use your wrapped register or the Digital register)
- A register file: 32 logical / 31 physical 64-bit registers, two reads + one write per cycle
- An ALU supporting addi/add, sub, mul, sll, srl
- A top-level dashboard (splitters, tunnels, probes) showing register and PC state
The autograder names matter exactly:
- Main circuit lab09.dig with inputs CLK, CLR, RR0, RR1, WR, WE, ALUSrcB, ALUOp, Imm and outputs T0, T1
- ALU circuit alu.dig with inputs A, B, ALUOp and output R
You should be able to execute these small programs by manually driving the inputs:
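# Each program assumes the registers were cleared (CLR) first, so t0 starts at 0
addi t0, t0, 1     # T0 = 1
li t1, 2           # T1 = 2
# Separate program: t0 must be 0 beforehand for the result to be -1
addi t0, t0, -1    # T0 = -1 (0xFFFFFFFFFFFFFFFF)
# Separate program
li t0, 1           # T0 = 1
li t1, 1           # T1 = 1
sub t0, t0, t1     # T0 = 0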
Project 6 (due Mon Nov 17, interactive grading Nov 18)¶
Extend the components into a complete single-cycle RISC-V processor with the instruction/register/immediate decoders, branch control unit, and data memory wired together. Required: incremental submissions in part1, part2, part3, final, plus the instruction-decoder ROM spreadsheets. The full build details are in the processor guides; this lecture established the foundational components those guides build on.
Key Concepts¶
| Concept | Definition | Example |
|---|---|---|
| ISA | The hardware/software interface — what instructions exist and what they do | RISC-V, x86, ARM |
| Microarchitecture | A specific hardware implementation of an ISA | single-cycle, pipelined |
| Single-cycle processor | One instruction completes per clock cycle | our Lab 9 / Project 6 design |
| Moore's Law | Transistor count doubles ~every 1.5 years for processors | enabled caches, multi-core, GPUs |
| PC register | 64-bit register holding the current instruction's address | advances to PC + 4 each cycle |
| Register file | Fast storage with 2 read ports + 1 write port | reads a0,a1; writes a2 for add |
| X0 hardwiring | X0 always reads 0; not a physical register | 32 logical, 31 physical registers |
| CLR wrapper | MUX + OR logic added around a Digital register for synchronous clear | D_in = CLR ? 0 : D, EN_in = EN OR CLR |
| Decoder with enable | One-hot output gated by an enable line | selects write register only when WE=1 |
| Read MUX tree | 32-to-1 MUX selecting one register's value | RD0 = MUX(RR0, X0..X31) |
| ALU | Combinational unit for arithmetic/logic and address math | add, sub, mul, sll, srl |
| Schematic entry vs HDL | Visual circuit drawing vs. textual hardware description | Digital vs. Verilog/VHDL |
Practice Problems¶
Problem 1: ISA or microarchitecture?¶
Classify each statement as describing the ISA or the microarchitecture:
- "The add instruction is encoded as an R-type with opcode 0110011."
- "We implement the register file with two 32-to-1 MUXes for the read ports."
- "X0 always reads as zero."
- "Our processor completes one instruction per clock cycle."
Solution
1. **ISA**: instruction encoding is part of the contract software relies on.
2. **Microarchitecture**: an implementation choice; another design could use a different structure.
3. **ISA**: a behavioral rule visible to software (RISC-V guarantees X0 = 0).
4. **Microarchitecture**: "single-cycle" is an implementation property; a pipelined chip implementing the same ISA behaves differently internally.
The dividing line: if changing it would break a correctly written program, it is the **ISA**. If you can change it and programs still produce the same results, it is the **microarchitecture**.
Problem 2: Why 5-bit selectors and 64-bit data?¶
The register file's RR0, RR1, and WR are each 5 bits, but RD0, RD1, and WD are 64 bits. Explain both widths.
Solution
- **Selectors are 5 bits** because there are 32 registers and `2^5 = 32`. Five bits is exactly enough to name any one of X0–X31.
- **Data buses are 64 bits** because this is an RV64 design: each register holds a 64-bit value, so the value read out (RD0/RD1) or written in (WD) must carry all 64 bits.
In general: **selector width = log2(number of registers)**; **data width = register width**.
Problem 3: Hand-trace the CLR wrapper¶
Given the wrapper logic D_in = CLR ? 0 : D and EN_in = EN OR CLR, fill in what the register loads on the next rising clock edge for each input combination:
| CLR | EN | D | Register loads? |
|---|---|---|---|
| 0 | 0 | 7 | ? |
| 0 | 1 | 7 | ? |
| 1 | 0 | 7 | ? |
| 1 | 1 | 7 | ? |
Solution
| CLR | EN | D | Register loads? |
|---|---|---|---|
| 0 | 0 | 7 | holds old value (EN_in = 0, no write) |
| 0 | 1 | 7 | **7** (EN_in = 1, MUX passes D) |
| 1 | 0 | 7 | **0** (EN_in = 0 OR 1 = 1; MUX forces 0) |
| 1 | 1 | 7 | **0** (CLR wins: EN_in = 1, MUX forces 0) |
The two jobs of the wrapper: the **OR gate** makes sure a clear actually triggers a write even when EN is low, and the **MUX** makes sure the value written during a clear is 0 rather than D.
Problem 4: Why two read MUXes instead of one?¶
A student proposes saving hardware by using a single 32-to-1 read MUX and reading the two source registers in sequence — RR0 first, then RR1. Why does this break the single-cycle design?
Solution
A single MUX can only present **one** register's value at a time on its output. Reading two registers would require **two passes** (two selector settings), which means two sub-steps within what is supposed to be one clock cycle. That violates the single-cycle premise that an `add a2, a0, a1` reads *both* sources and computes the result in *one* cycle.
Two independent MUXes let RR0 and RR1 select *different* registers and present *both* values **simultaneously and combinationally** on RD0 and RD1, so the ALU has both operands at once. The hardware cost (a second 64-bit 32-to-1 MUX) is the price of true single-cycle execution.
Problem 5: Trace one cycle of add a2, a0, a1¶
Assume a0 = 5 (X10), a1 = 3 (X11), destination a2 (X12), WE = 1. List the values on RR0, RR1, RD0, RD1, the ALU inputs/output, WR, and WD, and say when X12 actually updates.
Solution
RR0 = 10 (a0 = x10) RR1 = 11 (a1 = x11)
RD0 = 5 (value of a0) RD1 = 3 (value of a1) <- combinational reads
ALU: A = RD0 = 5, B = RD1 = 3, ALUop = 0b000 (add)
ALU result R = 5 + 3 = 8
WR = 12 (a2 = x12)
WD = 8 (ALU result routed to write data)
WE = 1
The write decoder asserts EN only on line 12 (because WR=12 and WE=1).
X12 latches 8 on the RISING CLOCK EDGE.
Problem 6: Counting physical registers¶
RISC-V has 32 logical registers but we build only 31 physical registers. (a) Which register is missing and why? (b) If we instead built a hypothetical ISA with 64 logical registers where two of them were hardwired constants, how many physical registers and how many selector bits would the register file need?
Solution
**(a)** **X0** is omitted as a physical register because the ISA guarantees it always reads 0. Building a flip-flop for it would be wasted hardware (and would risk accidentally storing a nonzero value). Instead X0's output is wired to constant 0, and the write decoder never selects it.
**(b)** With 64 logical registers, selectors need `log2(64) = 6` bits. If two registers are hardwired constants, you build `64 - 2 = 62` physical registers, but you **still need 6 selector bits** because the selectors must be able to *name* all 64 logical registers (including the hardwired ones, whose reads return their constants via the MUX inputs).
Key takeaway: hardwired registers reduce **physical storage** but not **selector width**, which depends on the logical register count.
Further Reading¶
- Lecture source notes: "/notes/CS315-01 2025-10-28 Processor Components.pdf"
- Processor design guides: /guides/processor-part-1/ , /guides/processor-part-2/ , /guides/processor-part-3/
- Lab 9 spec: /assignments/lab09/
- Project 6 spec: /assignments/project06/
- RISC-V Specifications (official)
- Moore's Law (Wikipedia)
- Register file (Wikipedia)
- Single-cycle datapath overview (Wikipedia: Datapath)
- Patterson & Hennessy, Computer Organization and Design: RISC-V Edition, Chapter 4 (The Processor)
Summary¶
- The ISA is the hardware/software interface: it specifies what instructions do and how they are encoded, forming the contract software is written against.
- The microarchitecture is the implementation: one ISA (e.g., RISC-V) can be realized by many microarchitectures, from a simple single-cycle design to a pipelined, multi-core chip.
- Moore's Law (transistor count doubling ~every 1.5 years) provided the silicon budget for caches, multi-core, GPUs, and accelerators; we deliberately start with the simplest single-cycle design.
- A single-cycle processor runs one instruction per clock cycle, flowing PC → instruction memory → decoders → register file → ALU → data memory, with all state updates latched on the rising clock edge.
- The register file provides two read ports and one write port over 32 logical registers; reads are combinational and the write is synchronous (only when WE = 1, on the rising edge).
- X0 is hardwired to 0, so only 31 physical registers (X1–X31) are built; its output is constant 0 and it is never written.
- A CLR input is added to Digital's register by wrapping it: a MUX forces the data input to 0 when CLR is asserted, and an OR of EN and CLR guarantees the clear takes effect on the clock edge. This wrapped register serves as both the PC and each physical register.
- The implementation reuses library components (two 32-to-1 read MUXes, a 5-to-32 decoder-with-enable for writes, and wrapped registers), and these foundational pieces lead directly into Lab 9, Lab 10, and Project 6.