Processor Components¶

Overview¶

This lecture marks the transition from digital design fundamentals to building a real processor. We pull together everything covered so far — C coding, data representation, memory, RISC-V assembly and machine code, the emulator, cache design, and digital logic — and use it to design the hardware that actually executes machine instructions. We define the Instruction Set Architecture (ISA) as the hardware/software interface, distinguish it from the microarchitecture that implements it, and then walk through the major components of a single-cycle RISC-V processor: the PC register, instruction memory, decoders, the register file, the ALU, and data memory. We finish with the detailed design of the register file, including how to add a CLEAR (CLR) input to Digital's register and how to wire 31 physical registers (X0 is hardwired to 0) into a two-read, one-write structure.

Learning Objectives¶

Distinguish the Instruction Set Architecture (ISA) from the microarchitecture that implements it
Explain the hardware/software interface and where the ISA sits in it
State Moore's Law and relate transistor growth to processor complexity (caches, multi-core, GPUs)
Identify the major components of a single-cycle processor and describe the data flow through one clock cycle
Describe the register file interface (read ports, write port, control signals) and why X0 is hardwired to 0
Wrap a Digital register with a MUX to add synchronous CLEAR (CLR) functionality
Implement a register file with two read ports and one write port using MUX trees and a decoder-with-enable
Connect the register-file design to the broader Lab 9 and Project 6 roadmap (single-cycle → multi-cycle → pipelined)

Prerequisites¶

RISC-V assembly and machine-code instruction formats (Lectures 3–6)
The RISC-V emulator / fetch-decode-execute cycle
Digital design fundamentals: MUXes, decoders, encoders, and the ALU
Sequential logic: D flip-flops, registers, clocks (rising edge), CLK/EN/CLR control signals
Familiarity with the Digital simulator (Project 5, Labs 7–8)

1. Where We Are: The Course Arc¶

Computer architecture is the study of how a processor is built and how software talks to it. By this point in CS 315 we have climbed the whole stack from C source down to gates, and we are about to put the pieces together into a working processor.

Topics covered so far that all feed into processor design:

C coding — the high-level language whose compiled output we will execute
Data representation — integers, two's complement, ASCII (IEEE 754 floating point is still ahead of us)
Memory — byte addressing, word/double-word layout, alignment
RISC-V assembly — the human-readable instruction language
RISC-V machine code — the 32-bit binary encoding the hardware actually reads
RISC-V emulator — a software model of fetch-decode-execute
Cache design — how memory is made fast
Digital design — gates, MUXes, decoders, latches, flip-flops, registers

flowchart TD
    A[C coding] --> P[Processor Design]
    B[Data representation] --> P
    C[Memory] --> P
    D[RISC-V Assembly] --> P
    E[RISC-V Machine Code] --> P
    F[RISC-V Emulator] --> P
    G[Cache Design] --> P
    H[Digital Design] --> P
    style P fill:#f9f,stroke:#333,stroke-width:2px

The emulator we wrote earlier was a software implementation of the fetch-decode-execute loop. Now we build the hardware version: the same loop, but realized with registers, memory, decoders, and an ALU wired together in the Digital simulator.

A note on incremental development¶

Greg opened with a theme that applies to both Project 5 and the upcoming processor labs: build and test in small, verifiable steps. Implementing a large system all at once and then debugging it is far slower than wiring up one component, testing it, then adding the next. Incremental development takes upfront planning but dramatically reduces debugging time and deepens understanding. For Project 6 this is even a graded requirement: you must submit part1, part2, part3, and final versions to show incremental progress.

2. Instruction Set Architecture vs. Microarchitecture¶

The single most important conceptual distinction in this lecture is ISA vs. microarchitecture.

The ISA is an interface¶

The Instruction Set Architecture (ISA) is the contract between software and hardware. It specifies the observable behavior of instructions — what instructions exist, how they are encoded, what registers are visible, and what each instruction does — without saying how the hardware accomplishes it.

            SW (software)
   ----------------------------------   <- interface = ISA
            HW (hardware)
            processor

Software is written to the ISA; hardware is built to satisfy the ISA. As long as both sides honor the contract, the same program runs on any conforming processor.

flowchart TB
    SW["Software<br/>(compilers, assemblers, programs)"]
    ISA["ISA<br/>(the interface / contract)"]
    HW["Hardware<br/>(the processor)"]
    SW -->|emits machine code defined by| ISA
    ISA -->|implemented by| HW
    HW -->|executes per| ISA

The microarchitecture is an implementation¶

The microarchitecture is the specific hardware implementation of an ISA — the digital-logic design that realizes the contract. One ISA can have many microarchitectures: a tiny single-cycle teaching processor and a 64-core server chip can implement the same RISC-V ISA with wildly different internal designs.

flowchart TD
    CA["Computer Architecture"]
    CA --> ISA["Instruction Set Architecture<br/>(specification)"]
    CA --> MA["Microarchitecture"]
    ISA --> MC["Machine Code"]
    ISA --> RM["Registers / Memory model"]
    MC --> AC["Assemblers / Compilers"]
    MA --> DD["Digital Design"]
    DD --> SE["Schematic entry (visual)"]
    DD --> HDL["HDL (Hardware Description Language)"]
    HDL --> V["Verilog"]
    HDL --> VHDL["IEEE VHDL"]

Two ways to do digital design¶

The microarchitecture is realized with digital design, and there are two broad approaches:

Approach	What it is	Example
Schematic entry (visual)	Draw the circuit by placing and wiring components	What we do in Digital
HDL (Hardware Description Language)	Describe the hardware in code, then synthesize it	Verilog, IEEE VHDL

In this course we use schematic entry in the Digital simulator — we draw and wire the components directly, which makes the data path easy to see and reason about. Industry typically uses HDLs (Verilog, VHDL) so designs can be parameterized, version-controlled, and synthesized to real silicon, but the underlying concepts are identical.

	ISA	Microarchitecture
Nature	Specification / interface	Implementation
Answers	What does the instruction do?	How is it carried out in hardware?
Visible to software?	Yes	No (hidden)
Examples	RISC-V, x86, ARM	single-cycle, multi-cycle, pipelined, superscalar
Can change without breaking software?	No — would break the contract	Yes — many designs satisfy one ISA

3. Moore's Law and Why Processors Got Complex¶

Moore's Law: The number of transistors on a processor chip doubles roughly every 1.5 years.

(The classic statement is "every 18–24 months." Greg quoted ~1.5 years, which is the common doubling-time figure.)

The key insight: as transistor budgets exploded, designers had room to add complexity that improves performance. That extra silicon went into things like:

Caches (multiple levels) to hide slow memory
Multi-core designs to run several instruction streams at once
GPUs for massive data-parallel work
Neural engines / accelerators for machine-learning workloads

flowchart LR
    M["Moore's Law:<br/>2x transistors / ~1.5 yr"] --> C["More transistors available"]
    C --> A["Caches"]
    C --> B["Multi-core"]
    C --> D["GPUs"]
    C --> E["Neural engines"]

For CS 315, all of this complexity is out of scope. We deliberately start with the simplest possible microarchitecture: a single-cycle processor, where every instruction completes in exactly one clock cycle. It is slow and inefficient, but it is the clearest way to see the data path. Later designs improve on it:

Single-cycle processor — one instruction per clock cycle (our starting point)
Multi-cycle processor — break an instruction into stages, reuse hardware across cycles
Pipelined processor — overlap stages of multiple instructions to increase throughput

The roadmap of assignments mirrors this progression:

Lab 9  ->  Lab 10  ->  Project 6

Lab 9 — build the core components (PC register, register file, ALU); due next Monday (Nov 3)
Lab 10 — extend toward a working processor; due the following Monday
Project 6 — full single-cycle RISC-V processor with interactive grading (due Nov 17, grading Nov 18)

4. The Single-Cycle Processor: Major Components¶

A single-cycle processor executes one complete instruction per clock cycle. On each rising clock edge the PC moves to the next instruction and all state updates (register writes, memory writes) take effect. Within a cycle, signals flow combinationally through the components.

Here is the data path sketched in lecture, redrawn as a block diagram:

                    +----------------+
                    | Inst Decode    |---> control lines
                    +----------------+
                          ^
   +-----+   ADDR   +-----+   IW   +-----------+   +----------+   +-----+   +-----+
   | PC  |--------->| ROM |------->| Reg/Imm   |-->| Register |-->| ALU |-->| RAM |
   | Reg |          |(Inst|        | Decode    |   |  File    |   |     |   |     |
   |64bit|          | Mem)|        +-----------+   +----------+   +-----+   +-----+
   +-----+          +-----+              |             ^   |         ^         |
      ^                                  v             |   |         |         |
      |  +---+                        +-----+          +---+---------+---------+
      +--| + |<---- 4                 | Imm |        (RD0, RD1 feed ALU; ALU result
         +---+                        +-----+         and RAM data write back to RegFile)

The components, in data-flow order:

Component	Role
PC register (64-bit)	Holds the address of the instruction currently executing
+4 adder	Computes `PC + 4` to advance to the next instruction
Instruction memory (ROM)	Stores the 32-bit machine-code program; returns the instruction word (IW) at the PC address
Instruction decoder	Reads the IW and produces control lines for every component
Register decoder	Extracts the 5-bit register fields (rs1, rs2, rd) from the IW
Immediate decoder (Imm)	Extracts and sign-extends immediate values from the IW
Register file	Two read ports + one write port over 32 logical registers
ALU	Performs add, sub, mul, sll, srl; also computes branch/memory target addresses
Data memory (RAM)	Read/write data for loads/stores (the simulated stack)
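
To make the register decoder's job concrete, here is a minimal Python sketch that slices the fields out of a 32-bit instruction word. The bit positions follow the standard RISC-V R-type layout; the function name `decode_rtype` is made up for this example, and in the Digital circuit the same job is done by splitting the IW wires rather than by code.

```python
def decode_rtype(iw: int) -> dict:
    """Slice the fixed-position fields out of a 32-bit R-type instruction word."""
    return {
        "opcode": iw & 0x7F,          # bits 6:0
        "rd":     (iw >> 7) & 0x1F,   # bits 11:7
        "funct3": (iw >> 12) & 0x7,   # bits 14:12
        "rs1":    (iw >> 15) & 0x1F,  # bits 19:15
        "rs2":    (iw >> 20) & 0x1F,  # bits 24:20
        "funct7": (iw >> 25) & 0x7F,  # bits 31:25
    }

# add a2, a0, a1 assembles to 0x00B50633 (a0 = x10, a1 = x11, a2 = x12)
fields = decode_rtype(0x00B50633)
print(fields["rs1"], fields["rs2"], fields["rd"])  # 10 11 12
```

The three 5-bit fields are exactly the values that drive the register file's selector inputs.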

One clock cycle, step by step¶

flowchart LR
    A["PC selects<br/>instruction"] --> B["Fetch IW<br/>from ROM"]
    B --> C["Decode IW<br/>(control + reg + imm)"]
    C --> D["Read registers<br/>RD0, RD1"]
    D --> E["ALU computes<br/>result / address"]
    E --> F["Access RAM<br/>(load/store)"]
    F --> G["Write back<br/>on rising edge"]
    G --> H["PC <- PC+4<br/>(or branch target)"]

Importantly, the register and memory writes, plus the PC update, all happen on the rising clock edge — that is the moment the instruction "completes." Everything between edges is combinational settling.
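
The "settle, then latch" rhythm can be sketched in a few lines of Python, using only the PC as state. This is an illustrative model, not the Digital circuit: `next_pc` is a hypothetical name for the combinational logic feeding the PC register's D input, and the branch inputs are placeholders.

```python
MASK64 = (1 << 64) - 1

def next_pc(pc: int, take_branch: bool = False, target: int = 0) -> int:
    """Combinational: the value waiting at the PC register's D input."""
    return target if take_branch else (pc + 4) & MASK64

pc = 0                      # state after CLR
trace = []
for _ in range(3):          # three rising clock edges
    d_input = next_pc(pc)   # settles between edges
    pc = d_input            # the rising edge latches it
    trace.append(pc)
print(trace)                # [4, 8, 12]
```

Nothing in `next_pc` stores anything; state changes only on the line that models the clock edge.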

The clock and "complete instruction"¶

In the lecture's clock-waveform sketch, each rising edge of the clock marks the point where the current instruction completes and the next begins. Between two rising edges, the data path has one full clock period to settle: the PC drives the ROM, the IW propagates through the decoders, register values flow to the ALU, the ALU result (or RAM data) settles at the write-back input — and then the rising edge latches everything.

        ____      ____      ____
       |    |    |    |    |    |
  _____|    |____|    |____|    |____
       ^         ^         ^
       |         |         |
   complete  complete  complete
   instr     instr     instr

Because one clock period must be long enough for the slowest instruction's entire path to settle, the single-cycle design is simple but not fast — every instruction pays for the worst case. That limitation is exactly what multi-cycle and pipelined designs later address.
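
As a rough worked example of "every instruction pays for the worst case", the sketch below adds up component delays along two instruction paths and takes the maximum. The delay numbers and the path lists are invented purely for illustration; they are not measurements of any real or simulated circuit.

```python
# Made-up component delays in nanoseconds, for illustration only.
DELAY_NS = {"rom": 2.0, "decode": 1.0, "regfile_read": 1.0, "alu": 2.0, "ram": 3.0}

# Which components each kind of instruction passes through (simplified).
PATHS = {
    "add":  ["rom", "decode", "regfile_read", "alu"],
    "load": ["rom", "decode", "regfile_read", "alu", "ram"],
}

path_delay = {name: sum(DELAY_NS[c] for c in comps) for name, comps in PATHS.items()}
min_clock_period = max(path_delay.values())   # the clock must fit the slowest path
print(path_delay, min_clock_period)
```

With these assumed numbers the add path needs 6 ns but the clock period must still be 9 ns to accommodate the load.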

5. The Register File¶

The register file is the processor's fast on-chip storage for the 32 architectural registers. Its job: let the data path read two registers and write one register, all within a single clock cycle.

Specification¶

32 64-bit registers: X0, X1, …, X31
Read up to two register values in a single clock cycle
Write to a single register in a clock cycle
X0 (zero) is always 0 — reads of X0 always return 0, and writes to X0 are discarded

That last point is a RISC-V ISA rule. Because X0 can never hold anything but zero, we do not build a physical flip-flop register for it — we just hardwire its output to constant 0. So the register file has 32 logical registers but only 31 physical registers (X1–X31).

Interface (block diagram)¶

       5  +-----------------------------+   64
  RR0 --/-| RR0                     RD0 |-/--->
       5  |                             |   64
  RR1 --/-| RR1                     RD1 |-/--->
          |                             |
       5  |                             |
   WR --/-| WR                          |
      64  |                             |
   WD --/-| WD                          |
       1  |                             |
   WE --/-| WE                          |
       1  |                             |
  CLK --/-| CLK                         |
       1  |                             |
  CLR --/-| CLR                         |
          +-----------------------------+

Signal glossary¶

Signal	Width	Meaning
RR0	5	Read register 0 — selects which register to output on RD0
RR1	5	Read register 1 — selects which register to output on RD1
RD0	64	Read data 0 — value of the register named by RR0
RD1	64	Read data 1 — value of the register named by RR1
WR	5	Write register — selects the destination register to update
WD	64	Write data — the value to write into WR
WE	1	Write enable — only write when WE = 1
CLK	1	Clock — writes happen on the rising edge
CLR	1	Clear — synchronously reset register values to 0

The selectors are 5 bits because we have 32 registers and 2^5 = 32. The data buses are 64 bits because RV64 registers are 64 bits wide.
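
The selector-width rule generalizes to any register count. A small Python sketch (the helper name `selector_bits` is made up for this example):

```python
def selector_bits(num_registers: int) -> int:
    """Bits needed to name any one of num_registers registers: ceil(log2(n))."""
    return (num_registers - 1).bit_length()

print(selector_bits(32))  # 5
print(selector_bits(64))  # 6
```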

Why two read ports and one write port?¶

Look at a typical R-type instruction:

add a2, a0, a1    # a2 = a0 + a1

In a single cycle we must read two source registers (a0, a1) and write one destination register (a2). That is exactly two read ports and one write port — the minimum a single-cycle data path needs.

flowchart LR
    RR0["RR0 = a0"] --> RF[Register File]
    RR1["RR1 = a1"] --> RF
    RF --> RD0["RD0 = value of a0"]
    RF --> RD1["RD1 = value of a1"]
    RD0 --> ALU
    RD1 --> ALU
    ALU --> WD["WD = a0 + a1"]
    WD --> RF
    WR["WR = a2"] --> RF
    WE["WE = 1"] --> RF

Reads are combinational (asynchronous): change RR0/RR1 and RD0/RD1 follow immediately. Writes are synchronous: WD lands in register WR only on the rising clock edge, and only if WE = 1.
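
A behavioral model can help pin down these rules before drawing any wires. The Python sketch below is one possible model under the specification above; the class and method names are invented for illustration, and the assumption that CLR takes priority over a write mirrors the "clear wins" behavior of the wrapped register rather than anything the lab mandates.

```python
MASK64 = (1 << 64) - 1

class RegisterFile:
    """Behavioral model: 32 logical registers, x0 hardwired to 0."""

    def __init__(self):
        self.regs = [0] * 32              # index 0 is never written

    def read(self, rr0: int, rr1: int):
        """Combinational read ports: returns (RD0, RD1)."""
        return self.regs[rr0 & 0x1F], self.regs[rr1 & 0x1F]

    def rising_edge(self, wr: int, wd: int, we: int, clr: int = 0):
        """Synchronous behavior on one rising clock edge."""
        if clr:
            self.regs = [0] * 32          # clear wins over a write
        elif we and (wr & 0x1F) != 0:     # writes to x0 are discarded
            self.regs[wr & 0x1F] = wd & MASK64

rf = RegisterFile()
rf.rising_edge(wr=10, wd=5, we=1)     # a0 = 5
rf.rising_edge(wr=11, wd=3, we=1)     # a1 = 3
rf.rising_edge(wr=0, wd=99, we=1)     # attempt to write x0
rf.rising_edge(wr=12, wd=7, we=0)     # WE = 0, so no write
print(rf.read(10, 11), rf.read(0, 12))  # (5, 3) (0, 0)
```

`read` touches no state and can be called any number of times between edges, which is the software analogue of a combinational read port.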

6. Adding CLEAR (CLR) to a Digital Register¶

There is one snag: Digital's built-in register has D, CLK, and EN inputs, but no CLR (clear) input. We need CLR so we can reset all registers (and the PC) to a known state at the start of execution. The fix is to wrap the built-in register with a little logic — a classic incremental-design move.

The idea¶

CLR should synchronously force the stored value to 0 on the next clock edge. We achieve this with two small additions:

A 2-to-1 MUX on the D input that selects between the real data and constant 0. CLR is the MUX selector: when CLR = 1, the register loads 0; when CLR = 0, it loads the real D.
An OR gate that combines the original EN with CLR to drive the wrapped register's enable. This guarantees that even if the caller did not assert EN, a CLR still forces a write of 0 on the clock edge.

                  +-----+             +-----------------+
   D ------------>| 0   |             |                 |
                  | MUX |------------>| D               |
   0 (const) ---->| 1   |             |     64-bit      |
                  +-----+             |    Register     |---> Q
                     ^ (sel)          |                 |
   CLR ----+---------+                |                 |
           |     +----+               |                 |
           +---->| OR |-------------->| EN              |
   EN ---------->|    |               |                 |
                 +----+               |                 |
   CLK ------------------------------>| CLK             |
                                      +-----------------+

Wrapping logic, expressed as boolean equations:

D_in   = CLR ? 0 : D        # MUX: load 0 when CLR is asserted
EN_in  = EN OR CLR          # force a load when clearing, even if EN was low

Behavior table¶

CLR	EN	On rising CLK edge, register loads...
0	0	(nothing — holds current value)
0	1	D (normal write)
1	0	0 (cleared)
1	1	0 (clear wins, because MUX forces 0)
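
The behavior table can be reproduced from the two wrapper equations. Here is a minimal Python sketch; the function name `wrapped_register_next` is made up for this example, and `q` stands for the value currently stored in the register.

```python
def wrapped_register_next(q: int, d: int, en: int, clr: int) -> int:
    """Value held after the next rising edge, given the current value q."""
    d_in = 0 if clr else d        # 2-to-1 MUX on the D input; CLR is the selector
    en_in = en | clr              # OR gate on the enable
    return d_in if en_in else q   # the built-in register loads only when enabled

for clr in (0, 1):
    for en in (0, 1):
        print(clr, en, wrapped_register_next(q=42, d=7, en=en, clr=clr))
# 0 0 42
# 0 1 7
# 1 0 0
# 1 1 0
```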

This wrapped 64-bit register is used in two places in the design: as the PC register (with EN tied high so it advances every cycle) and as the storage cell for each physical register X1–X31 in the register file.

flowchart LR
    D["D (data in)"] --> MUX{MUX}
    Z["0 (constant)"] --> MUX
    CLR --> MUX
    MUX --> RD["D"]
    EN --> OR{OR}
    CLR --> OR
    OR --> RE["EN"]
    CLK --> RC["CLK"]
    subgraph REG["Wrapped 64-bit Register"]
        RD
        RC
        RE
        Q["Q (data out)"]
    end
    Q --> OUT["output"]

7. Register File Implementation¶

Now we assemble 31 wrapped registers (X1–X31) plus the hardwired X0 into the full register file. There are three sub-problems: the read path, the write path, and X0.

X0: hardwired zero¶

In the lecture sketch, the X0 register is drawn and then crossed out in red, a deliberate reminder that X0 is not a physical register. Its output is simply the constant 0. There is no flip-flop, no D input, no enable. Reads of X0 yield 0; writes to X0 are silently ignored because no physical register is connected to the write decoder's output for register 0.

The write path: a decoder with enable¶

Only one register can be written per cycle, and only when WE = 1. We use a 5-to-32 decoder with an enable input:

Input: the 5-bit WR (write register number)
Enable: WE
Output: 32 one-hot lines, where line i goes high only if WR == i and WE == 1

Output lines 1 through 31 each drive the EN input of the corresponding physical register; line 0 is left unconnected because X0 has no physical register. So at most one register gets enabled per cycle, and only when WE is asserted. The shared WD (write data) bus and CLK feed every register; the decoder picks which one actually latches.

                  +------------------+
   WR (5) ------->|                  |--> EN_x1  ---> X1.EN
                  |   5-to-32        |--> EN_x2  ---> X2.EN
   WE (1) ------->|   Decoder        |--> ...
        (enable)  |   with Enable    |--> EN_x31 ---> X31.EN
                  +------------------+
                  (X0 has no enable line - it is hardwired 0)

   WD (64) --------> D input of every register (shared bus)
   CLK ------------> CLK input of every register (shared)

Because the decoder has an enable, when WE = 0 all output lines are 0 and no register is written — exactly the behavior we want.
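
The decoder's truth table is small enough to model directly. The Python sketch below is illustrative only; `decoder_5to32` is a made-up name, and dropping output 0 is one way to express that X0 has no enable line.

```python
def decoder_5to32(wr: int, we: int) -> list:
    """Return 32 output lines; line i is 1 only if we == 1 and wr == i."""
    return [1 if (we and wr == i) else 0 for i in range(32)]

lines = decoder_5to32(wr=12, we=1)
register_enables = lines[1:]            # output 0 is unused: X0 has no register
print(lines.index(1), sum(lines))       # 12 1
print(sum(decoder_5to32(wr=12, we=0)))  # 0
```

With `wr=0` and `we=1` the decoder does raise line 0, but since that line is not wired to anything, no register is written.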

The read path: two independent MUX trees¶

To read two registers in the same cycle, we need two independent 32-to-1 multiplexers, each 64 bits wide:

RD0 MUX: selector = RR0, inputs = {X0, X1, …, X31}, output = RD0
RD1 MUX: selector = RR1, inputs = {X0, X1, …, X31}, output = RD1

Each MUX takes the outputs (Q) of all 32 registers as its data inputs and uses the 5-bit read-register number as its selector. Two separate MUXes mean RR0 and RR1 can name different registers and both are read simultaneously — combinationally, with no clock involved.
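
To see why this is called a MUX tree, the sketch below builds a 32-to-1 selection out of five levels of 2-to-1 MUXes, one level per selector bit. This is an illustrative Python model with invented names (`mux2`, `mux32`); in Digital you would normally just place a single multiplexer component with 5 selector bits.

```python
def mux2(sel: int, a: int, b: int) -> int:
    """2-to-1 MUX: output a when sel == 0, b when sel == 1."""
    return b if sel else a

def mux32(sel: int, inputs: list) -> int:
    """32-to-1 MUX built as five levels of 2-to-1 MUXes."""
    assert len(inputs) == 32
    level = list(inputs)
    for bit in range(5):                  # selector bit 0 drives the first level
        s = (sel >> bit) & 1
        level = [mux2(s, level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# X0 is the constant 0; X1..X31 hold arbitrary example values.
regs = [0] + [100 * i for i in range(1, 32)]
rd0 = mux32(10, regs)   # RR0 = 10
rd1 = mux32(11, regs)   # RR1 = 11
print(rd0, rd1)         # 1000 1100
```

Two calls with different selectors over the same `regs` list correspond to the two independent read ports sharing the same register outputs.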

flowchart TD
    subgraph REGS["Physical Registers"]
        X0["X0 = 0 (hardwired)"]
        X1["X1"]
        X2["X2"]
        XN["... X31"]
    end
    X0 --> M0[RD0 MUX 32:1]
    X1 --> M0
    X2 --> M0
    XN --> M0
    X0 --> M1[RD1 MUX 32:1]
    X1 --> M1
    X2 --> M1
    XN --> M1
    RR0["RR0 (sel)"] --> M0
    RR1["RR1 (sel)"] --> M1
    M0 --> RD0["RD0 (64)"]
    M1 --> RD1["RD1 (64)"]

Full register-file structure¶

flowchart LR
    RR0in["RR0 (5)"] --> RM0[RD0 MUX]
    RR1in["RR1 (5)"] --> RM1[RD1 MUX]
    WRin["WR (5)"] --> DEC["Decoder + EN"]
    WEin["WE (1)"] --> DEC
    WDin["WD (64)"] --> BUS["shared D bus"]
    CLKin["CLK"] --> CLKBUS["shared CLK"]

    DEC --> EN1["EN to X1..X31"]
    BUS --> REGS["Registers X1..X31"]
    CLKBUS --> REGS
    EN1 --> REGS

    X0c["X0 = const 0"] --> RM0
    X0c --> RM1
    REGS --> RM0
    REGS --> RM1

    RM0 --> RD0out["RD0 (64)"]
    RM1 --> RD1out["RD1 (64)"]

Wires vs. tunnels¶

A practical Digital tip from lecture: for a circuit this dense, prefer explicit wires over tunnels for clarity. Tunnels (named nets that connect without a drawn line) reduce visual clutter and repetition, but in a complex circuit they can hide connectivity and make bugs hard to find. Reserve tunnels for the dashboard (probes showing register state) where repetition would be overwhelming, and use real wires for the data path so you can literally trace the signal flow.

Why this is reusable¶

A nice payoff: unlike Project 5 — where you built primitive components (adders, comparators) from gates — for the processor you can lean on Digital's existing library components (MUXes, decoders, registers, RAM, ROM). The register file is built almost entirely from a 5-to-32 decoder, two 32-to-1 MUXes, and 31 wrapped registers. This is the value of a component library: you assemble proven blocks instead of reinventing them.

8. The ALU and Data Memory (Preview)¶

The lecture diagram also showed the ALU and RAM at the right end of the data path. These get full treatment in later sessions, but here is the shape of them so the data path makes sense.

ALU (combinational)¶

The ALU is combinational logic — it has no state and therefore no clock input. It takes two 64-bit operands and an operation selector and produces a 64-bit result:

Input	Width	Meaning
A	64	First operand
B	64	Second operand
ALUOp	3	Operation selector
Output R	64	Result

For Lab 9, the ALU must support these operations:

ALUOp	Operation
`0b000`	add
`0b001`	sub
`0b010`	mul
`0b011`	sll (shift left logical)
`0b100`	srl (shift right logical)

The ALU does double duty: it computes data-processing results and computes branch/memory target addresses (e.g., PC + immediate, base + offset), since address arithmetic is just addition.

# A - B in the ALU (ALUOp = 0b001)
li   t0, 1        # t0 = 1   (pseudo: addi t0, x0, 1)
li   t1, 1        # t1 = 1
sub  t0, t0, t1   # t0 = t0 - t1 = 0   -> ALU computes A - B
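
A software reference model is handy for checking the ALU circuit against expected values. The Python sketch below follows the operation table above and wraps every result to 64 bits. Taking the shift amount from the low 6 bits of B matches how RV64 sll and srl behave; whether the lab's ALU is expected to do the same is an assumption here, and the function name `alu` is just for illustration.

```python
MASK64 = (1 << 64) - 1

def alu(a: int, b: int, alu_op: int) -> int:
    """Combinational 64-bit ALU model: no state, result depends only on inputs."""
    a &= MASK64
    b &= MASK64
    shamt = b & 0x3F                  # assumption: shift amount is the low 6 bits of B
    if alu_op == 0b000:
        r = a + b                     # add
    elif alu_op == 0b001:
        r = a - b                     # sub
    elif alu_op == 0b010:
        r = a * b                     # mul (low 64 bits of the product)
    elif alu_op == 0b011:
        r = a << shamt                # sll
    elif alu_op == 0b100:
        r = a >> shamt                # srl (a is non-negative, so zeros shift in)
    else:
        raise ValueError(f"unsupported ALUOp {alu_op:#05b}")
    return r & MASK64                 # wrap to 64 bits, like the hardware bus

print(alu(1, 1, 0b001))               # 0
print(hex(alu(0, 1, 0b001)))          # 0xffffffffffffffff
```

The second call shows two's-complement wraparound: 0 - 1 comes out as all ones on a 64-bit bus.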

Data memory (RAM)¶

Loads and stores read and write the data memory, implemented with Digital's RAM component. For our programs this holds the stack (function parameters, preserved registers). The address arriving from the ALU is a byte address, but the RAM is addressed by 64-bit double-words — so we shift right by 3 (divide by 8) to convert a byte address to a double-word index before indexing the RAM. (Detailed load/store-byte and load/store-word logic comes in Part 3.)
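
The byte-address to double-word-index conversion is a single shift. A small Python sketch (helper names are made up for this example):

```python
def dword_index(byte_addr: int) -> int:
    """Index of the 64-bit double-word containing byte_addr (divide by 8)."""
    return byte_addr >> 3

def byte_offset(byte_addr: int) -> int:
    """Position of the byte within its double-word (the 3 bits shifted out)."""
    return byte_addr & 0x7

print(dword_index(24), dword_index(27), byte_offset(27))  # 3 3 3
```

The three low bits that the shift discards are what byte- and word-sized accesses would need to pick a position inside the double-word.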

Note: Data memory and the immediate extender are explicitly out of scope for Lab 9 — Lab 9 is the PC register, register file, and ALU only. They arrive in Lab 10 / Project 6.

9. Connecting to Lab 9 and Project 6¶

The lecture's component designs map directly onto the deliverables.

Lab 9 (due Mon Nov 3)¶

Build and test the core components, then a partial top-level circuit:

A 64-bit PC register with CLR (use your wrapped register or the Digital register)
A register file: 32 logical / 31 physical 64-bit registers, two reads + one write per cycle
An ALU supporting addi/add, sub, mul, sll, srl
A top-level dashboard (splitters, tunnels, probes) showing register and PC state

The autograder names matter exactly:

Main circuit lab09.dig with inputs CLK, CLR, RR0, RR1, WR, WE, ALUSrcB, ALUOp, Imm and outputs T0, T1
ALU circuit alu.dig with inputs A, B, ALUOp and output R

You should be able to execute these small programs by manually driving the inputs:

addi t0, t0, 1     # T0 = 1
li   t1, 2         # T1 = 2
addi t0, t0, -1    # T0 = 1 + (-1) = 0

li   t0, 1         # T0 = 1
li   t1, 1         # T1 = 1
sub  t0, t0, t1    # T0 = 0

Project 6 (due Mon Nov 17, interactive grading Nov 18)¶

Extend the components into a complete single-cycle RISC-V processor with the instruction/register/immediate decoders, branch control unit, and data memory wired together. Required: incremental submissions in part1, part2, part3, final, plus the instruction-decoder ROM spreadsheets. The full build details are in the processor guides; this lecture established the foundational components those guides build on.

Key Concepts¶

Concept	Definition	Example
ISA	The hardware/software interface — what instructions exist and what they do	RISC-V, x86, ARM
Microarchitecture	A specific hardware implementation of an ISA	single-cycle, pipelined
Single-cycle processor	One instruction completes per clock cycle	our Lab 9 / Project 6 design
Moore's Law	Transistor count doubles ~every 1.5 years for processors	enabled caches, multi-core, GPUs
PC register	64-bit register holding the current instruction's address	advances by `PC + 4` each cycle
Register file	Fast storage with 2 read ports + 1 write port	reads `a0`,`a1`; writes `a2` for `add`
X0 hardwiring	X0 always reads 0; not a physical register	32 logical, 31 physical registers
CLR wrapper	MUX + OR logic added around a Digital register for synchronous clear	`D_in = CLR ? 0 : D`, `EN_in = EN OR CLR`
Decoder with enable	One-hot output gated by an enable line	selects write register only when WE=1
Read MUX tree	32-to-1 MUX selecting one register's value	RD0 = MUX(RR0, X0..X31)
ALU	Combinational unit for arithmetic/logic and address math	add, sub, mul, sll, srl
Schematic entry vs HDL	Visual circuit drawing vs. textual hardware description	Digital vs. Verilog/VHDL

Practice Problems¶

Problem 1: ISA or microarchitecture?¶

Classify each statement as describing the ISA or the microarchitecture:

"The add instruction is encoded as an R-type with opcode 0110011."
"We implement the register file with two 32-to-1 MUXes for the read ports."
"X0 always reads as zero."
"Our processor completes one instruction per clock cycle."

Solution:

1. ISA: instruction encoding is part of the contract software relies on.
2. Microarchitecture: an implementation choice; another design could use a different structure.
3. ISA: a behavioral rule visible to software (RISC-V guarantees X0 = 0).
4. Microarchitecture: "single-cycle" is an implementation property; a pipelined chip implementing the same ISA behaves differently internally.

The dividing line: if changing it would break a correctly written program, it is the ISA. If you can change it and programs still produce the same results, it is the microarchitecture.

Problem 2: Why 5-bit selectors and 64-bit data?¶

The register file's RR0, RR1, and WR are each 5 bits, but RD0, RD1, and WD are 64 bits. Explain both widths.

Solution:

Selectors are 5 bits because there are 32 registers and `2^5 = 32`. Five bits is exactly enough to name any one of X0–X31.
Data buses are 64 bits because this is an RV64 design: each register holds a 64-bit value, so the value read out (RD0/RD1) or written in (WD) must carry all 64 bits.

In general: selector width = log2(number of registers); data width = register width.

Problem 3: Hand-trace the CLR wrapper¶

Given the wrapper logic D_in = CLR ? 0 : D and EN_in = EN OR CLR, fill in what the register loads on the next rising clock edge for each input combination:

CLR	EN	D	Register loads?
0	0	7	?
0	1	7	?
1	0	7	?
1	1	7	?

Solution:

CLR	EN	D	Register loads?
0	0	7	holds old value (EN_in = 0, no write)
0	1	7	7 (EN_in = 1, MUX passes D)
1	0	7	0 (EN_in = 0 OR 1 = 1; MUX forces 0)
1	1	7	0 (CLR wins: EN_in = 1, MUX forces 0)

The two jobs of the wrapper: the OR gate makes sure a clear actually triggers a write even when EN is low, and the MUX makes sure the value written during a clear is 0 rather than D.

Problem 4: Why two read MUXes instead of one?¶

A student proposes saving hardware by using a single 32-to-1 read MUX and reading the two source registers in sequence — RR0 first, then RR1. Why does this break the single-cycle design?

Solution:

A single MUX can only present one register's value at a time on its output. Reading two registers would require two passes (two selector settings), which means two sub-steps within what is supposed to be one clock cycle. That violates the single-cycle premise that an `add a2, a0, a1` reads both sources and computes the result in one cycle.

Two independent MUXes let RR0 and RR1 select different registers and present both values simultaneously and combinationally on RD0 and RD1, so the ALU has both operands at once. The hardware cost (a second 64-bit 32-to-1 MUX) is the price of true single-cycle execution.

Problem 5: Trace one cycle of `add a2, a0, a1`¶

Assume a0 = 5 (X10), a1 = 3 (X11), destination a2 (X12), WE = 1. List the values on RR0, RR1, RD0, RD1, the ALU inputs/output, WR, and WD, and say when X12 actually updates.

Solution:

RR0 = 10  (a0 = x10)      RR1 = 11  (a1 = x11)
RD0 = 5   (value of a0)   RD1 = 3   (value of a1)   <- combinational reads

ALU: A = RD0 = 5, B = RD1 = 3, ALUOp = 0b000 (add)
ALU result R = 5 + 3 = 8

WR  = 12  (a2 = x12)
WD  = 8   (ALU result routed to write data)
WE  = 1

The write decoder asserts EN only on line 12 (because WR=12 and WE=1).
X12 latches 8 on the RISING CLOCK EDGE.

Reads (RR0/RR1 → RD0/RD1) and the ALU computation are combinational and settle during the cycle. The actual state change (X12 ← 8) happens only at the rising edge; that edge is the moment the instruction "completes."

Problem 6: Counting physical registers¶

RISC-V has 32 logical registers but we build only 31 physical registers. (a) Which register is missing and why? (b) If we instead built a hypothetical ISA with 64 logical registers where two of them were hardwired constants, how many physical registers and how many selector bits would the register file need?

Solution:

(a) X0 is omitted as a physical register because the ISA guarantees it always reads 0. Building a flip-flop for it would be wasted hardware (and would risk accidentally storing a nonzero value). Instead X0's output is wired to constant 0, and no register is connected to the write decoder's output for it.

(b) With 64 logical registers, selectors need `log2(64) = 6` bits. If two registers are hardwired constants, you build `64 - 2 = 62` physical registers, but you still need 6 selector bits because the selectors must be able to name all 64 logical registers (including the hardwired ones, whose reads return their constants via the MUX inputs).

Key takeaway: hardwired registers reduce physical storage but not selector width, which depends on the logical register count.

Summary¶

The ISA is the hardware/software interface — it specifies what instructions do and how they are encoded, forming the contract software is written against.
The microarchitecture is the implementation — one ISA (e.g., RISC-V) can be realized by many microarchitectures, from a simple single-cycle design to a pipelined, multi-core chip.
Moore's Law (transistor count doubling ~every 1.5 years) provided the silicon budget for caches, multi-core, GPUs, and accelerators; we deliberately start with the simplest single-cycle design.
A single-cycle processor runs one instruction per clock cycle, flowing PC → instruction memory → decoders → register file → ALU → data memory, with all state updates latched on the rising clock edge.
The register file provides two read ports and one write port over 32 logical registers; reads are combinational and the write is synchronous (only when WE = 1, on the rising edge).
X0 is hardwired to 0, so only 31 physical registers (X1–X31) are built — its output is constant 0 and it is never written.
A CLR input is added to Digital's register by wrapping it: a MUX forces the data input to 0 when CLR is asserted, and an OR of EN and CLR guarantees the clear takes effect on the clock edge. This wrapped register serves as both the PC and each physical register.
The implementation reuses library components — two 32-to-1 read MUXes, a 5-to-32 decoder-with-enable for writes, and wrapped registers — and these foundational pieces lead directly into Lab 9, Lab 10, and Project 6.
