Key Concepts from Labs and Projects
Lab01 - Hello World in the RISC-V Dev Env
- Console based editing: micro or vim
- git and GitHub usage
- Makefiles
- Hello World in C
- The Autograder
Project01 - RISC-V VM Access and Number Conversion in C
- Dev Environment Setup
- Local RISC-V VM using qemu-system-riscv64
- Remote RISC-V VM on euryale
- ssh keys, ssh config, ssh authorized_keys
- Shell configuration
- bash: ~/.profile, ~/.bash_profile, ~/.bashrc, ~/.bash_aliases
- zsh: ~/.zprofile, ~/.zshrc
- C Basics
- Functions
- Data (global, stack, heap)
- Statements and expressions
- Data types
- Primitives - int, char, float, uint8_t, uint32_t, int32_t, uint64_t, int64_t
- Composite - arrays and structs
- Pointers - int *, char *, uint32_t *, uint64_t *
- Pointer operations
- & address of, e.g., int x; int *p; p = &x;
- * dereference, e.g., x = *p;
- Strings in C
- Array of chars (bytes)
- Null \0 (0) terminated
- ASCII character codes
- The C stack
- Local variables
- Allocated on function entry, deallocated on function exit (return)
- Signed (int, int32_t) vs unsigned (unsigned int, uint32_t) values
- Command line arguments with int argc and char *argv[]
- Layout of argv array in memory
- Array of pointers first, argv[0], argv[1], argv[2], NULL (0)
- String arrays second, ./foo + \0, -p + \0, etc.
- Parsing command line arguments
- Number systems
- Decimal (base 10, digits: 0,1,2,3,4,5,6,7,8,9)
- Binary (base 2, digits: 0,1, C literal prefix: 0b)
- Hexadecimal, "hex" (base 16, digits: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F, C literal prefix: 0x)
- A = 10, B = 11, C = 12, D = 13, E = 14, F = 15
- base conversions
- C string as decimal, binary, hexadecimal to int
- Place value: 2 * (10^2) + 5 * (10^1) + 4 * (10^0)
- int to C string as decimal, binary, hexadecimal
- Modulus to find digit
- Divide to get value for next digit
- Repeat until divide gives 0
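The two conversion directions above can be sketched in C as follows. This is a minimal illustration, not the project's reference solution: the function names str_to_uint32 and uint32_to_str, the error convention, and the 33-byte buffer requirement are all assumptions made for this example.

```c
#include <stdint.h>

/* Value of one digit character, or -1 if it is not a valid digit. */
static int digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* String to integer (hypothetical helper): accumulate place values left
   to right. Multiplying the running total by the base moves earlier digits
   up one place, which is the same as summing digit * base^position.
   Returns 0 on success, -1 on an empty string or invalid digit.
   Overflow is not checked. */
int str_to_uint32(const char *s, int base, uint32_t *out) {
    uint32_t value = 0;
    if (*s == '\0') return -1;
    for (; *s != '\0'; s++) {
        int d = digit_value(*s);
        if (d < 0 || d >= base) return -1;
        value = value * (uint32_t)base + (uint32_t)d;
    }
    *out = value;
    return 0;
}

/* Integer to string (hypothetical helper): modulus finds the lowest digit,
   divide gives the value for the next digit, repeat until the value is 0.
   Digits come out least significant first, so the buffer is reversed.
   buf must hold at least 33 bytes (32 binary digits plus '\0'). */
void uint32_to_str(uint32_t value, int base, char *buf) {
    const char *digits = "0123456789ABCDEF";
    int len = 0;
    do {
        buf[len++] = digits[value % (uint32_t)base];
        value /= (uint32_t)base;
    } while (value != 0);
    buf[len] = '\0';
    for (int i = 0, j = len - 1; i < j; i++, j--) {
        char tmp = buf[i];
        buf[i] = buf[j];
        buf[j] = tmp;
    }
}
```

The do/while form makes sure that a value of 0 still produces the single digit "0".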
Lab02 - Introduction to RISC-V Assembly Programming
- Instructions, registers, labels, directives
- See RISC-V Assembly Guides
- Assembling vs compiling
- Assembling (as) assembly source (.s) to object code (.o)
- Compiling (gcc) C source (.c) to object code (.o)
- The object files must be linked, we can use gcc for this
- as -o foo.o foo.s
- gcc -o main main.c foo.o
- The C compiler can also assemble
- gcc -o main main.c main.s
- Instructions have a name (mnemonic), like add or mul
- Assembly programming model
- A RISC-V processor has 32 registers plus the PC register
- x0, x1, ... x31
- ABI names: sp, ra, a0, a1, ..., t0, t1, ..., s0, s1, ...
- The PC points to the next instruction in memory to execute
- Load an instruction from memory into the processor
- Execute the instruction, usually updates one or more register values, but can also update memory
- Update PC, often just the next instruction PC + 4, but can also be a different address in the case of a branch or jump (control instruction)
- Most instructions have three operands
- One target (destination)
- Two source
- add t0, t1, t2 means t0 = t1 + t2
- On 64-bit RISC-V, registers are 64 bits (8 bytes)
- Data processing instructions
- Immediate values
- Control instructions
- conditional branches: beq, bne, blt, bge
- jumps: j
- return: ret
- Assembly arguments passed in a0, a1, a2, a3, etc.
- Return value is put into a0
- Basic assembly function
- if/then/else in assembly
- C:
- Asm: Assume a0 - int x, a1 - int y, t0 - int r
- for loops in assembly
- C:
- Assume t0 - int r, t1 - int i, t2 - int n
- Array access in assembly
- lw (load word)
- lw t0, (a0) means t0 = *a0
- Load word at address a0 into register t0
- Pointer based array access
- To get to next element in array of words (ints): addi a0, a0, 4
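One way to see how if/then/else, for loops, and pointer based array access map onto branches and jumps is to write the C with goto, so that each line corresponds to roughly one instruction. The functions below (max_goto, sum_below_goto, sum_array_ptr) are made-up examples, and the assembly in the comments is one possible translation under the register assumptions listed above, not the course's official listing.

```c
/* if/then/else: r = (x > y) ? x : y.
   Assume a0 = x, a1 = y, t0 = r. The branch tests the inverted condition
   so that the "then" part falls through. */
int max_goto(int x, int y) {
    int r;
    if (y >= x) goto else_part;   /* bge a1, a0, else_part */
    r = x;                        /* mv  t0, a0            */
    goto done;                    /* j   done              */
else_part:
    r = y;                        /* mv  t0, a1            */
done:
    return r;                     /* mv  a0, t0 ; ret      */
}

/* for loop: sum of 0 .. n-1. Assume t0 = r, t1 = i, t2 = n. */
int sum_below_goto(int n) {
    int r = 0;                    /* li   t0, 0        */
    int i = 0;                    /* li   t1, 0        */
loop:
    if (i >= n) goto done;        /* bge  t1, t2, done */
    r = r + i;                    /* add  t0, t0, t1   */
    i = i + 1;                    /* addi t1, t1, 1    */
    goto loop;                    /* j    loop         */
done:
    return r;
}

/* Pointer based array access: walk the array by advancing the pointer. */
int sum_array_ptr(const int *p, int n) {
    int r = 0;
    int i = 0;
loop:
    if (i >= n) goto done;
    r = r + *p;                   /* lw t3, (a0) ; add t0, t0, t3 */
    p = p + 1;                    /* addi a0, a0, 4 (one int is 4 bytes) */
    i = i + 1;
    goto loop;
done:
    return r;
}
```

Labels in C are scoped to a function, which is why loop and done can be reused here; in a single assembly file each label would need a unique name.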
Project02 - RISC-V Assembly Language
- RISC-V ABI Function Calling Conventions
- Only one set of registers (32), so we need a way to coordinate usage between functions
- Caller-saved registers (a0, a1, ..., a7, t0, t1, ..., t6, ra)
- These must be preserved on the stack by the caller of a function
- You only need to preserve the registers you will be using after the function call
- Callee-saved registers (sp, s0, s1, ..., s11)
- These must be preserved on the stack by the callee, on entry to the function.
- They should be restored on exit.
- The stack pointer is preserved by subtracting (stack allocation) on function entry and adding (stack deallocation) on function exit the same number of bytes.
- The sp should be a multiple of 16 (for performance and compatibility with some instructions)
- Functions that don't call other functions usually do not need to allocate stack space.
- They only need stack space if they use callee-saved registers
- Function call template when using stack:
- Index-based array access
- Assume t0 is int i, a0 is base address of array int arr[]
- Alternatively use a shift instead of multiply
- Recursive functions
- Nothing special, just that the func calls itself
- Same rules apply
- Preserve any caller-saved registers on stack before the recursive call
- Restore any caller-saved registers from stack after the recursive call
- If a recursive function simply returns after the recursive call, then it is called tail recursive and the recursive call can be turned into a jump.
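The index-based address computation can be modeled in C to show why the multiply and the shift give the same result. This is only a sketch: the helper names elem_addr_mul, elem_addr_shift, and load_elem are hypothetical, and the assembly in the comments assumes t0 holds i and a0 holds the base address, as in the notes.

```c
#include <stdint.h>

/* Address of arr[i] using a multiply: base + i * 4. */
uintptr_t elem_addr_mul(uintptr_t base, uint64_t i) {
    /* li t1, 4 ; mul t1, t0, t1 ; add t1, a0, t1 */
    return base + (uintptr_t)(i * sizeof(int32_t));
}

/* Same address using a shift: i << 2 equals i * 4. */
uintptr_t elem_addr_shift(uintptr_t base, uint64_t i) {
    /* slli t1, t0, 2 ; add t1, a0, t1 */
    return base + (uintptr_t)(i << 2);
}

/* Load arr[i] through the computed address, as lw t2, (t1) would. */
int32_t load_elem(const int32_t *arr, uint64_t i) {
    uintptr_t addr = elem_addr_shift((uintptr_t)arr, i);
    return *(const int32_t *)addr;
}
```

The shift only works because the element size is a power of two (4 bytes is a shift by 2, 8 bytes would be a shift by 3).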
Project03 - RISC-V Assembly Language Part 2
- Working with C strings in assembly
- Use lb (load byte) and sb (store byte) to access characters in a string
- Calling C functions from Assembly
- Just add a .global func_name for the function you want to use
- E.g., .global strlen
- Then set arg registers and use call instruction as normal
- Make sure to preserve any caller-saved registers before call
- Memory
- Byte addressable
- Smallest addressable unit is a byte (8 bits)
- An int (word) is 4 bytes, need to decide how to lay out a word in memory
- Two byte orderings
- Big endian (put the most significant byte first)
- Little endian (put the least significant byte first)
- Two's Complement (2's Complement)
- Need a way to represent negative integer values
- To get the two's complement negative representation of a positive value:
- twos = invert_bits(val) + 1
- 3-bit two's complement values
- Two's complement properties
- The msb (most significant bit) is the sign bit (0 is pos, 1 is neg)
- Only one representation of 0
- Because of this, there is one more neg value than pos values
- Simple grade school arithmetic works on two's complement numbers
- Bit manipulation and bitwise operators
- Zero (0) / One (1)
- Other names: unset/set, low/high, off/on, false/true
- A byte is 8 bits
- msb - most significant bit
- lsb - least significant bit
- A word is 4 bytes or 32 bits
- Bitwise Operators (op, C, ASM)
- AND, &, and/andi
- OR, |, or/ori
- NOT, ~
- XOR, ^, xor/xori
- Shift Left Logical
- Shift x to the left by n bits
- Fill lower bits with 0
- uint32_t x, x << n, sll/slli
- Shift Right Logical
- Shift x to the right by n bits
- Fill upper bits with 0
- uint32_t x, x >> n, srl/srli
- Shift Right Arithmetic
- Shift x to the right by n bits
- Fill upper bits with sign bit value
- int32_t x, x >> n, sra/srai
- Masking
- It is often necessary to pull out a sequence of bits from a larger value
- To get a sequence of bits:
- Shift the bits to the far right
- Mask the bits to zero out any upper bits we don't want
- Example: assume we want the middle 4 bits of an 8 bit value (bits 2-5)
- Combine individual bit sequences with left shift and OR (|)
- Sign extension
- Assume you have a 4-bit two's complement number you want to sign extend to be a 32-bit two's complement number.
- Shift all the way to the left and then do shift right arithmetic all the way back
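The two's complement, masking, and sign extension steps above can be sketched in C. The name get_bits() appears later in these notes, but its signature is not given, so the parameter lists below are assumptions; twos_negate is a made-up name for this example.

```c
#include <stdint.h>

/* Two's complement negation: invert the bits and add 1. */
uint32_t twos_negate(uint32_t val) {
    return ~val + 1u;
}

/* Extract count bits of num starting at bit start (bit 0 is the lsb).
   Shift the field to the far right, then mask off the upper bits. */
uint64_t get_bits(uint64_t num, uint32_t start, uint32_t count) {
    uint64_t mask = (count >= 64) ? ~0ULL : ((1ULL << count) - 1);
    return (num >> start) & mask;
}

/* Sign extend a value whose sign bit is at index sign_bit (must be 0..63).
   Shift the sign bit all the way left to bit 63, then shift right
   arithmetically all the way back.
   Note: >> on a negative signed value is implementation-defined in C;
   gcc and clang perform an arithmetic shift. */
int64_t sign_extend(uint64_t num, uint32_t sign_bit) {
    uint32_t shift = 63 - sign_bit;
    return (int64_t)(num << shift) >> shift;
}
```

For the middle 4 bits of an 8-bit value, get_bits(value, 2, 4) shifts right by 2 and masks with 0b1111. For a 4-bit two's complement number the sign bit is bit 3, so the call would be sign_extend(value, 3).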
Lab03 - RISC-V Machine Code Emulation
- RISC-V Machine Code and Emulation
- Instructions are 32 bits (1 word), there is also a 16 bit compressed format
- Each instruction encodes source register(s), a destination register, and a target address or offset
- We decode the iw (instruction word) in steps
- Look at opcode to determine instruction format type
- Decode the iw based on opcode
- Further decode iw by looking at other fields such as funct3 and funct7
- The base RISC-V emulator includes the emulated state as a struct:
- NREGS is 32 (because RISC-V has 32 registers)
- The pc is the program counter
- STACK_SIZE is 8192, but for programs with lots of function calls and/or local vars this may need to be bigger.
- Basic emulation
- We will emulate assembly programs that have been linked with the emulator code
- Initialize the rv_state with
- A pointer to the function we want to emulate
- The first 4 arguments as uint64_t values to send to the function
- Get iw: iw = *pc
- Get opcode from iw
- Based on opcode call emulation function, like: emu_r_type(rsp, iw)
- rsp is struct rv_state *rsp
- The emu functions will further decode the iw
- Use shift and mask (get_bits()) to get each field
- Update the state (registers and memory) based on opcode, funct3, and funct7
- Update the pc, usually pc += 4 to go to the next instruction
- For control instructions, we will compute either an offset and add to pc
- Or, for jalr use a register value to update the pc
- RISC-V Instruction Formats
- R-type
- |funct7[31:25]|rs2[24:20]|rs1[19:15]|funct3[14:12]|rd[11:7]|opcode[6:0]|
- For register instructions like add, sub, sll
- I-type
- |imm_11_0[31:20]|rs1[19:15]|funct3[14:12]|rd[11:7]|opcode[6:0]|
- For immediate instructions like addi, slli, lw, lb
- Immediate must be constructed from parts and sign extended
- int64_t imm = sign_extend(imm_11_0, 11)
- B-type
- |imm_12[31]|imm_10_5[30:25]|rs2[24:20]|rs1[19:15]|funct3[14:12]|imm_4_1[11:8]|imm_11[7]|opcode[6:0]|
- For branches, like beq, bne, blt, bge
- int64_t imm = sign_extend(imm_12 << 12 | imm_11 << 11 | imm_10_5 << 5 | imm_4_1 << 1, 12)
- If branch taken pc += imm, if not taken pc += 4
- J-type
- |imm_20[31]|imm_10_1[30:21]|imm_11[20]|imm_19_12[19:12]|rd[11:7]|opcode[6:0]|
- For jal (jump and link), also j
- int64_t imm = sign_extend(imm_20 << 20 | imm_19_12 << 12 | imm_11 << 11 | imm_10_1 << 1, 20)
- pc += imm
- RISC-V Pseudo Instructions
- The assembler provides many convenience instructions that do not exist as real instructions
- Examples
- li a0, 99 is addi a0, zero, 99
- j offset is jal zero, offset
- call offset is jal ra, offset
- bgt r1, r2, offset is blt r2, r1, offset
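The field layouts for the R, I, B, and J formats can be turned into decode helpers along the following lines. This is a sketch rather than the lab's emulator code: struct rtype, decode_r, imm_i, imm_b, and imm_j are names invented here, and the helper signatures (including a second sign_extend argument giving the sign bit index) are assumptions.

```c
#include <stdint.h>

/* Shift and mask: count bits of iw starting at bit start (count < 32). */
static uint32_t get_bits(uint32_t iw, uint32_t start, uint32_t count) {
    return (iw >> start) & ((1u << count) - 1u);
}

/* Sign extend from bit sign_bit. Relies on arithmetic >> for signed
   values, which gcc and clang provide. */
static int64_t sign_extend(uint64_t num, uint32_t sign_bit) {
    uint32_t shift = 63 - sign_bit;
    return (int64_t)(num << shift) >> shift;
}

/* Hypothetical container for the R-type fields. */
struct rtype {
    uint32_t opcode, rd, funct3, rs1, rs2, funct7;
};

struct rtype decode_r(uint32_t iw) {
    struct rtype f;
    f.opcode = get_bits(iw, 0, 7);
    f.rd     = get_bits(iw, 7, 5);
    f.funct3 = get_bits(iw, 12, 3);
    f.rs1    = get_bits(iw, 15, 5);
    f.rs2    = get_bits(iw, 20, 5);
    f.funct7 = get_bits(iw, 25, 7);
    return f;
}

/* I-type: 12-bit immediate in bits 31:20. */
int64_t imm_i(uint32_t iw) {
    return sign_extend(get_bits(iw, 20, 12), 11);
}

/* B-type: 13-bit offset assembled from four pieces, bit 0 is always 0. */
int64_t imm_b(uint32_t iw) {
    uint64_t imm_12   = get_bits(iw, 31, 1);
    uint64_t imm_10_5 = get_bits(iw, 25, 6);
    uint64_t imm_4_1  = get_bits(iw, 8, 4);
    uint64_t imm_11   = get_bits(iw, 7, 1);
    return sign_extend(imm_12 << 12 | imm_11 << 11 | imm_10_5 << 5 | imm_4_1 << 1, 12);
}

/* J-type: 21-bit offset assembled from four pieces, bit 0 is always 0. */
int64_t imm_j(uint32_t iw) {
    uint64_t imm_20    = get_bits(iw, 31, 1);
    uint64_t imm_10_1  = get_bits(iw, 21, 10);
    uint64_t imm_11    = get_bits(iw, 20, 1);
    uint64_t imm_19_12 = get_bits(iw, 12, 8);
    return sign_extend(imm_20 << 20 | imm_19_12 << 12 | imm_11 << 11 | imm_10_1 << 1, 20);
}
```

As a quick check, 0x007302B3 is the encoding of add t0, t1, t2 (rd = 5, rs1 = 6, rs2 = 7), and 0x06300513 is addi a0, zero, 99, the expansion of li a0, 99.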
Project04 - RISC-V Emulation - Analysis - Cache Simulation
- Additional RISC-V Instruction Formats
- S-type
- |imm_11_5[31:25]|rs2[24:20]|rs1[19:15]|funct3[14:12]|imm_4_0[11:7]|opcode[6:0]|
- For store instructions, like sb, sw, sd
- Immediate must be constructed from parts and sign extended
- int64_t imm = sign_extend(imm_11_5 << 5 | imm_4_0, 11)
- U-type
- |imm_31_12[31:12]|rd[11:7]|opcode[6:0]|
- For lui (load upper immediate) and auipc (add upper immediate to pc)
- Our code doesn't use these
- Note that load instructions are an I-type form
- Dynamic Analysis
- Because we emulate each instruction we can do the following
- Count the total number of instructions executed
- Count the different types of instructions executed
- I/R count, Loads, Stores, Jumps, Branches Taken, Branches Not Taken
- Counts added to struct rv_state as a new struct, struct rv_analysis
- Need to update counts accurately in the emulation functions
- Cache Memory and Simulation
- A Cache is a small and fast memory that holds recently accessed data from main memory
- Main memory is slow relative to CPU speed
- A cache can work because program execution adheres to two principles of locality
- Principle of spatial locality - If code accesses a value, there is good chance it will access nearby values
- Principle of temporal locality - If code accesses a value now there is a good chance it will do so again soon
- We send an address to a cache and request a value back
- Hit - value is in the cache for the requested address
- Miss - value is not in the cache for the requested address
- Hit Rate = #hits / #refs
- Miss Rate = #misses / #refs
- In modern processors caches can achieve very high hit rates, > 95%
- Main cache ideas
- For a given memory address, how to locate value in the cache?
- How to replace a value?
- How many bytes do we load at a time into the cache (block size)?
- Three main cache designs
- Direct Mapped
- Fully Associative
- Set Associative
- Direct Mapped
- Each address maps to one slot in the cache
- On miss, just replace slot value with new value from memory
- Fully Associative
- Each address can map to any slot in the cache
- Use a form of LRU (least recently used) to choose which slot to evict on miss
- Set Associative
- An address maps to a set of slots, and can map to any slot in the set
- Use LRU within the slots in a set to choose a slot to evict
- Block Size
- To support spatial locality a cache will read in more than the word requested
- We can think of memory as an array of blocks
- We need to find the block for a given address
- Find the base of the block
- Then load the entire block into the cache slot
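A direct mapped cache with a block size can be simulated with very little state, since only hits and misses need to be tracked, not the data. The sketch below is an illustration under assumed parameters: NUM_SLOTS, BLOCK_SIZE, and every struct and function name are hypothetical and are not taken from the project starter code.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_SLOTS  4      /* assumed number of slots */
#define BLOCK_SIZE 16     /* assumed block size in bytes (power of two) */

struct cache_slot {
    bool valid;
    uint64_t tag;
};

struct cache {
    struct cache_slot slots[NUM_SLOTS];
    uint64_t refs, hits, misses;
};

/* Base address of the block that contains addr. */
uint64_t block_base(uint64_t addr) {
    return addr & ~(uint64_t)(BLOCK_SIZE - 1);
}

/* Look up addr. Returns true on a hit. On a miss the one slot this
   address maps to is simply replaced (direct mapped). */
bool cache_access(struct cache *c, uint64_t addr) {
    uint64_t block = addr / BLOCK_SIZE;   /* memory as an array of blocks */
    uint64_t index = block % NUM_SLOTS;   /* which slot */
    uint64_t tag   = block / NUM_SLOTS;   /* identifies the block in that slot */
    struct cache_slot *s = &c->slots[index];

    c->refs++;
    if (s->valid && s->tag == tag) {
        c->hits++;
        return true;
    }
    c->misses++;
    s->valid = true;
    s->tag = tag;
    return false;
}

/* Hit Rate = #hits / #refs */
double hit_rate(const struct cache *c) {
    return c->refs ? (double)c->hits / (double)c->refs : 0.0;
}
```

With these parameters, addresses 0x100 and 0x140 map to the same slot, so alternating between them misses every time, while 0x100 followed by 0x104 hits because both fall in the same 16-byte block.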
Lab05 - Introduction to Digital Design
- Digital Design Fundamentals
- Analog circuits vs. Digital circuits
- Analog: continuous voltage signals (0V to 5V)
- Digital: discrete voltage levels (high/1 and low/0)
- Digital logic levels
- High: represented as 1 (approximately 5V)
- Low: represented as 0 (approximately 0V)
- Binary state system
- Hardware Abstraction Hierarchy
- Wires and devices abstracted into gates
- Gates combined into larger components
- Hierarchical design approach (building complex circuits from simpler components)
- Basic Logic Gates
- AND gate
- Symbol and circuit representation
- Truth table: output 1 only when all inputs are 1
- OR gate
- Symbol and circuit representation
- Truth table: output 1 when any input is 1
- NOT gate (inverter)
- Symbol and circuit representation
- Truth table: inverts the input
- XOR (Exclusive OR)
- XNOR (Exclusive NOR)
- Boolean Algebra and Logic Notation
- C code notation: r = a & b (AND), r = a | b (OR), r = ~a (NOT)
- Boolean algebra notation: r = a · b (AND), r = a + b (OR), r = ā (NOT)
- Logic notation: r = a ∧ b (AND), r = a ∨ b (OR), r = ¬a (NOT)
- Circuit Composition
- Building complex expressions from basic gates
- Example: r = (a · b) + (c · d) - combining AND and OR operations
- Circuits are directed acyclic graphs (DAGs) - no feedback loops in combinational logic
Lab08 - Combinational Logic
- Sum-of-Products Method
- Systematic approach to designing combinational logic circuits
- Three-step process:
- Create truth table
- Identify rows with output '1'
- Build equation from those rows
- Product terms: AND operations combining inputs (with inversions as needed)
- Sum: OR operation combining all product terms
- Example: Even/odd parity detector
- Input: 3-bit number (n₂, n₁, n₀)
- Output: Determines if number of '1' bits is even or odd
- Arithmetic Circuits - Adders
- 1-bit Half Adder
- Two inputs (a, b)
- Two outputs: sum and carry out (cout)
- Basic building block for addition
- 1-bit Full Adder
- Three inputs: a, b, and carry in (cin)
- Two outputs: sum and carry out (cout)
- Used for multi-bit addition
- 8-bit Ripple-Carry Adder
- Combines eight 1-bit full adders
- Carry propagates through the chain
- Cin initially set to 0
- Produces 8-bit sum output plus final cout
- Subtraction using Adders
- A - B = A + (-B)
- Using two's complement: invert B then add 1
- Invert B and set initial carry-in to 1
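The adder circuits above can be modeled bit by bit in C, which makes it easy to check the ripple-carry and subtraction ideas against ordinary arithmetic. The function names are made up for this sketch, and the full adder equations used here (sum = a XOR b XOR cin, cout = majority of the three inputs) are the standard ones rather than something quoted from the lab.

```c
#include <stdint.h>

/* 1-bit full adder: all inputs and outputs are 0 or 1. */
void full_adder(unsigned a, unsigned b, unsigned cin,
                unsigned *sum, unsigned *cout) {
    *sum  = a ^ b ^ cin;
    *cout = (a & b) | (a & cin) | (b & cin);
}

/* 8-bit ripple-carry adder: eight full adders in a chain, where the
   carry out of bit i becomes the carry in of bit i + 1. */
uint8_t ripple_add8(uint8_t a, uint8_t b, unsigned cin, unsigned *cout) {
    uint8_t result = 0;
    unsigned carry = cin;
    for (int i = 0; i < 8; i++) {
        unsigned s;
        full_adder((a >> i) & 1u, (b >> i) & 1u, carry, &s, &carry);
        result |= (uint8_t)(s << i);
    }
    *cout = carry;
    return result;
}

/* Subtraction with the same adder: A - B = A + (~B) + 1,
   so invert B and set the initial carry-in to 1. */
uint8_t ripple_sub8(uint8_t a, uint8_t b) {
    unsigned cout;
    return ripple_add8(a, (uint8_t)~b, 1u, &cout);
}
```

Adding 200 and 100 gives 44 with cout set, since 300 does not fit in 8 bits, and ripple_sub8(3, 5) gives 254, which is the 8-bit two's complement encoding of -2.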
Lab09 - Digital Logic Components
- Comparators
- Equality comparator: checks if a == b
- Equation: eq = (ā·b̄) + (a·b)
- 1-bit, 2-bit, 8-bit, and N-bit comparators
- Built hierarchically using smaller comparators
- Multiplexers (MUX)
- 1-bit 2-input MUX: selects between two 1-bit inputs based on select signal
- 32-bit 2-input MUX: selects between two 32-bit inputs
- 1-bit 4-input MUX: selects one of four inputs using 2 select signals (S₁, S₀)
- Select encoding: 00→a, 01→b, 10→c, 11→d
- Implementation options:
- Direct implementation using AND-OR logic
- Tree-based implementation using smaller MUXes
- ALU (Arithmetic Logic Unit)
- Combines multiple operations (addition, subtraction, multiplication, etc.)
- Uses MUXes to select between different operation results
- Inputs: A and B (operands), ALU-op (select signal for operation)
- Outputs: Result of selected operation
- Decoders
- 2-to-4 Decoder
- Takes 2 input bits (S1, S0)
- Produces 4 output lines (r0, r1, r2, r3)
- Only one output is active (high) at a time
- Truth table relationships:
- r0 = (S̄1·S̄0), r1 = (S̄1·S0), r2 = (S1·S̄0), r3 = (S1·S0)
- Encoders
- Basic Encoder: Converts one-hot input to binary output
- Priority Encoder: Handles cases where multiple inputs may be active
- Assigns priority to inputs (higher numbered inputs have priority)
- Outputs binary encoding of highest priority active input
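As a rough software model of these components, the sketch below builds a 4-input MUX as a tree of 2-input MUXes, implements the 2-to-4 decoder from its product terms, and implements a 4-input priority encoder. All names are hypothetical, and the decoder packs r0..r3 into bits 0..3 of its return value purely for convenience.

```c
/* 1-bit 2-input MUX using AND-OR logic: s = 0 selects a, s = 1 selects b. */
unsigned mux2(unsigned a, unsigned b, unsigned s) {
    return ((~s & 1u) & a) | ((s & 1u) & b);
}

/* 1-bit 4-input MUX as a tree of three 2-input MUXes.
   Select encoding: 00 -> a, 01 -> b, 10 -> c, 11 -> d. */
unsigned mux4(unsigned a, unsigned b, unsigned c, unsigned d,
              unsigned s1, unsigned s0) {
    return mux2(mux2(a, b, s0), mux2(c, d, s0), s1);
}

/* 2-to-4 decoder: exactly one output is high.
   Bit i of the result is output r_i. */
unsigned decoder2to4(unsigned s1, unsigned s0) {
    unsigned n1 = ~s1 & 1u, n0 = ~s0 & 1u;
    unsigned r0 = n1 & n0;
    unsigned r1 = n1 & s0;
    unsigned r2 = s1 & n0;
    unsigned r3 = s1 & s0;
    return r0 | (r1 << 1) | (r2 << 2) | (r3 << 3);
}

/* 4-input priority encoder: returns the index of the highest numbered
   active input. *valid is 0 when no input is active. */
int priority_encode4(unsigned in, unsigned *valid) {
    *valid = (in & 0xFu) != 0;
    for (int i = 3; i >= 0; i--) {
        if ((in >> i) & 1u) {
            return i;
        }
    }
    return 0;
}
```

Note that decoder2to4 and a basic encoder are inverses of each other: the decoder turns a binary select into a one-hot value, and the encoder turns a one-hot value back into binary.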
Lab09 - Sequential Logic
- Sequential Logic Fundamentals
- Circuits that maintain state over time
- Combine state storage with combinational logic
- Clock Signals
- Clock period: time duration of one complete cycle
- Frequency measured in Hz (cycles per second)
- Rising edge: transition from low (0) to high (1)
- Falling edge: transition from high (1) to low (0)
- Storage Elements Hierarchy
- SR Latch → D Latch → D Flip-Flop → 1-bit Register → N-bit Register → Counter
- SR Latch (Set/Reset Latch)
- Stores a 1-bit value
- Two inputs: S (Set) and R (Reset)
- Two outputs: Q and Q̄ (complement)
- Built using cross-coupled NOR gates
- Truth table:
- R=0, S=0: Hold current state
- R=0, S=1: Set Q to 1
- R=1, S=0: Reset Q to 0
- R=1, S=1: Undefined/invalid state
- D Latch
- Level-sensitive storage
- Single data input (D) and clock input (CLK)
- Passes D to Q when CLK is high (transparent)
- D Flip-Flop
- Edge-triggered storage
- Constructed from two D Latches in master-slave configuration
- Captures input only on clock edge (rising or falling)
- More reliable than latches for synchronous circuits
- N-Bit Register
- Multiple 1-bit registers operating in parallel
- Shared control signals: CLK, CLR, EN
- All bits update synchronously on same clock edge
- Counter Design
- 8-bit register stores current count value
- 8-bit adder increments the count
- Feedback loop: adder output connects to register input
- Increments on each clock cycle when enabled
Project05 - Instruction Memory and Decoding
- ROM (Read-Only Memory)
- Memory that can only be read from, not written to
- Uses address bits (A) to select memory locations
- Outputs instruction words (IW) of 32 bits
- Example: 8-bit address can access 2^8 = 256 memory locations
- Instruction Memory Architecture
- Program Counter (PC) contains 64-bit address pointing to current instruction
- PC feeds into instruction memory
- Instruction word (IW) output is 32 bits
- Address Calculation
- Memory addressed by bytes (byte addressing)
- Words consist of 4 bytes
- Word address = byte address / 4
- addr_word = addr_byte >> 2 (divide by 4 using right shift by 2)
- Instruction Decoder (InstDecoder)
- Analyzes 32-bit instruction word and determines instruction type
- Input: 32-bit instruction word (IW)
- Output: Instruction number (inum) indicating instruction type
- Instruction Type Mapping:
- 0: i-type, 1: r-type, 2: load, 3: s-type
- 4: b-type, 5: jal, 6: j, 7: jalr
- Instruction Decoding Process
- Bit Splitter: Extracts specific bit ranges from 32-bit instruction
- Opcode Recognition using equality comparators
- Example opcodes: 0010011 (i-type), 0110011 (r-type)
- Priority encoder for instruction type selection
Lab10 - Processor Components
- Computer Architecture Layers
- Instruction Set Architecture (ISA)
- Software-hardware interface
- Specification of machine code, registers, and memory
- Microarchitecture
- Processor implementation of the ISA
- Digital logic implementation
- Digital Design
- Schematic entry (visual) or HDL (Hardware Description Language)
- Languages: Verilog, VHDL (IEEE standards)
- Moore's Law
- The number of transistors doubles every 1.5 years for processors
- Processor Types
- Single-Cycle Processor: Each instruction completes in one clock cycle
- Multi-Cycle Processor: Instructions broken into multiple cycles
- Pipelined Processor: Multiple instructions execute simultaneously
- Single-Cycle Processor Components
- Program Counter (PC): 64-bit register, adder for PC+4
- Instruction Memory (ROM): Stores program instructions
- Instruction Decode: Decodes fetched instructions
- Register File: Connected to ALU
- Immediate value (Imm): Extracted from instruction
- Data Memory (RAM): For load/store operations
- Register File Architecture
- 32 registers, each 64-bit wide
- Register names: X0, X1, ..., X31
- X0 (zero register) always contains zero
- Operations:
- Read up to two register values in a single clock cycle
- Write to a single register in a clock cycle
- Interface Signals:
- RR0, RR1: Read register selectors (5 bits each)
- RD0, RD1: Read data outputs (64 bits each)
- WR: Write register selector (5 bits)
- WD: Write data input (64 bits)
- WE: Write enable (control signal)
- CLK: Clock signal, CLR: Clear signal
- Decoder with Enable
- Select signal controls which output is activated
- Enable signal (WE) gates the decoder operation
- Converts binary select input to one-hot encoded output
- ALU Structure
- Takes two 64-bit inputs: A and B
- ALU-op control signal (3 bits) selects operation
- Produces 64-bit result
- Internal functional units: ADD, SUB, MUL
- Final multiplexer selects output based on ALU-op
Project06 - Processor Instruction Decoding
- Register Decoder (RegDecoder)
- Takes instruction word (IW) as input
- Outputs register addresses: rs1, rs2, rd
- Bit fields:
- rd: bits 11-7
- rs1: bits 19-15
- rs2: bits 24-20
- Immediate Decoder (ImmDecoder)
- Takes instruction word (IW) as input
- Outputs all immediate types (I-type, S-type, B-type, U-type, J-type)
- S-Type Immediate construction:
- Bits 11-7 for imm[4:0] and bits 31-25 for imm[11:5]
- Sign extension required
- Sign Extension Hardware
- Converts 12-bit immediate to 64-bit signed value
- Uses multiplexer to select between 0 and -1 for upper bits
- Sign bit determines fill value for upper bits
- Instruction Detection Logic
- ADDI: opcode 0010011 (7-bit comparator)
- ADD: opcode 0110011, funct3 000, funct7 0000000
- AND gates combine comparator outputs to identify specific instructions
- Control Signals Generated
- RFW (Register File Write): Controls whether to write to register file
- ALUOp (ALU Operation): Specifies which ALU operation to perform
- ALUSrcB (ALU Source B): Selects second operand for ALU
- 0: use register value, 1: use immediate value
- ROM for Control Signals
- Input: 3-bit encoded instruction type (inum)
- Output: Control signal bundle (ALUSrcB, ALUOp, RFW)
- Adding New Instructions (5-Step Process)
- Pick an instruction or group of instructions
- Modify or add components to processor
- Modify or add to the datapath (usually need to extend or add muxes)
- Extend instruction decoder control lines
- Test the implementation
Project06 - Control Flow Instructions (JAL, JALR)
- JAL (Jump and Link) Instruction
- J-type format
- Register Update: rd = PC + 4 (stores return address)
- PC Update: PC = PC + imm (PC-relative addressing)
- JALR (Jump and Link Register) Instruction
- I-type format
- Register Update: rd = PC + 4 (stores return address)
- PC Update: PC = rs1 + imm (register-based addressing)
- Special Case - RET: imm is 0, rd is x0, jumps to address in rs1
- PCsel Signal (PC Select)
- Multiplexer selects between PC+4 (sequential) and JTA (jump target address)
Project06 - Branch Instructions
- Branch Mechanics
- Syntax: beq rs1, rs2, label
- Two-step process:
- Compute branch target address (BTA = PC + imm-b)
- Determine if branch should be taken based on comparison
- Branch Unit Architecture
- Comparison logic: equality (=), inequality (!=), less than (<), greater than or equal (>=)
- Outputs: take_branch signal, PCbr (branch target address)
- Branch Instructions Supported
- beq: Branch if equal
- bne: Branch if not equal
- blt: Branch if less than
- bge: Branch if greater than or equal
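The branch unit's two jobs, deciding whether to take the branch and producing the next PC, can be modeled in C as below. This is a behavioral sketch, not the circuit: the enum encoding and the names take_branch and next_pc are assumptions for illustration, and the comparisons are signed, as they are for blt and bge.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical encoding of the four supported branches. */
enum branch_kind { BR_BEQ, BR_BNE, BR_BLT, BR_BGE };

/* Comparison logic: returns the take_branch signal. */
bool take_branch(enum branch_kind kind, int64_t rs1, int64_t rs2) {
    switch (kind) {
    case BR_BEQ: return rs1 == rs2;
    case BR_BNE: return rs1 != rs2;
    case BR_BLT: return rs1 <  rs2;
    case BR_BGE: return rs1 >= rs2;
    }
    return false;
}

/* PC select: branch target address (PC + imm) if taken, else PC + 4. */
uint64_t next_pc(uint64_t pc, int64_t imm_b, bool taken) {
    return taken ? pc + (uint64_t)imm_b : pc + 4;
}
```

In hardware both candidate addresses are computed every cycle and a multiplexer driven by take_branch picks one; the ternary expression plays the role of that multiplexer here.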
Project06 - Data Memory
- Memory Architecture
- RAM for data storage
- Memory size calculation: 2^address_bits × bytes_per_word
- Example: 7 address bits, 8 bytes per word = 1024 bytes
- Load/Store Instructions
- ld (load doubleword): loads 64-bit data from memory
- sd (store doubleword): stores 64-bit data to memory
- lw (load word): loads 32-bit data from memory
- sw (store word): stores 32-bit data to memory
- lb (load byte), sb (store byte)
- Memory Address Calculation
- Target address = base register + offset
- Example: ld t0, 8(sp) - load from stack pointer + 8 byte offset
- RAM Interface Signals
- Din (Data In): 64-bit input for write operations
- Dout (Data Out): 64-bit output for read operations
- str (store signal): control for write enable
- ld (load signal): control for read enable
- clk (clock): synchronous operation
- Control Signals for Memory
- WDsel or Mem2Reg: selects memory output to write back to register file
- MSZ (Memory Size): controls data size for load/store operations
- Preparing Programs for Processor Execution
- Remove ".global" directives
- Use jal instead of call for function calls
- Must include a main function
- Initialize SP register (e.g., li sp, 1024)
- Use unimp instruction to halt execution
Project07 - Processor Pipelining
- Pipelining Fundamentals
- Multiple instructions execute simultaneously in different stages
- Laundry analogy: overlapping wash, dry, fold, put away operations
- Theoretical speedup: n-stage pipeline can speed up by factor of n
- Five-Stage RISC-V Pipeline
- IF (Instruction Fetch): Fetch instruction from memory using PC
- DR/DE (Decode and Register File Read): Decode instruction, read registers
- EX (Execute): Execute operation using ALU, Branch Unit
- MEM (Memory): Memory store/load operations
- WB (Write Back): Write results back to Register File
- Pipeline Registers
- Placed between stages (IF/DE, DE/EX, EX/MEM, MEM/WB)
- Hold intermediate values
- Synchronize pipeline flow
- Pipeline Performance
- Total time = (stages × stage_time) + ((tasks - 1) × stage_time)
- First item latency + (n-1) additional items at stage intervals
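The pipeline timing formula can be checked with a few lines of C. The function names and the sequential comparison are additions for this sketch, and it assumes every stage takes the same time and that there are no stalls.

```c
#include <stdint.h>

/* Total time = (stages * stage_time) + ((tasks - 1) * stage_time).
   The first task pays the full latency; each later task finishes
   one stage time after the previous one. */
uint64_t pipelined_time(uint64_t stages, uint64_t tasks, uint64_t stage_time) {
    if (tasks == 0) {
        return 0;
    }
    return stages * stage_time + (tasks - 1) * stage_time;
}

/* Without pipelining, every task takes all stages back to back. */
uint64_t sequential_time(uint64_t stages, uint64_t tasks, uint64_t stage_time) {
    return stages * tasks * stage_time;
}
```

For 10 instructions on a 5-stage pipeline this gives 14 stage times instead of 50. The ratio only approaches the theoretical factor of 5 as the number of instructions grows.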
Project07 - Pipeline Hazards
- Types of Pipeline Hazards
- Data Hazards: Instructions depend on results from previous instructions
- Control Hazards: Caused by jumps and branches
- Data Hazard Solutions
- Inverting the Clock to the RegFile
- DR (Data Read) and WB (Write Back) timing optimization
- Eliminates one NOP instruction
- Reduces cycles (e.g., 10 cycles to 9 cycles)
- Forwarding (Data Forwarding)
- Forward data directly between pipeline stages
- Bypass register file for data dependencies
- Further cycle reduction (e.g., 10 cycles to 6 cycles)
- Load Stalling (LD Stall)
- Specific handling for load-use dependencies
- Pipeline stall insertion when necessary
- Hazard detection unit implementation
- Enable signal control for stalling
- Control Hazard Solutions
- Branch and jump handling
- Managing control flow changes in the pipeline
- Hazard Detection Unit
- ld-stall detection
- PC enable control during stalls
- Combinational logic: (ld-stall OR en) = stl-en
- Pipeline Stage Tracking
- Visualizing concurrent instruction execution
- Identifying which stage each instruction occupies per cycle
- Counting total cycles for instruction sequences
Advanced Architecture Topics
- Speculative Execution
- Executing instructions before confirming they're needed
- Related to branch prediction
- Out-of-Order Execution
- Executing instructions in non-sequential order for performance
- Allows independent instructions to execute while waiting for dependencies
- Example: ld t0, (a0) followed by independent add t1, t2, t3
- Majority Function Circuit
- 3-input majority function: output is 1 when majority of inputs are 1
- SOP expression: majority = (a·b·c̄) + (a·b̄·c) + (ā·b·c) + (a·b·c)
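The SOP expression for the majority function can be checked against a direct count of the inputs. The sketch below uses invented function names and treats each input as a single bit (0 or 1).

```c
/* Majority from the sum-of-products expression:
   one product term per truth table row whose output is 1. */
unsigned majority_sop(unsigned a, unsigned b, unsigned c) {
    unsigned na = ~a & 1u, nb = ~b & 1u, nc = ~c & 1u;
    return (a & b & nc) | (a & nb & c) | (na & b & c) | (a & b & c);
}

/* Reference definition: 1 when at least two of the three inputs are 1. */
unsigned majority_count(unsigned a, unsigned b, unsigned c) {
    return (a + b + c) >= 2;
}

/* Returns 1 if both forms agree on all 8 rows of the truth table. */
int majority_forms_agree(void) {
    for (unsigned row = 0; row < 8; row++) {
        unsigned a = (row >> 2) & 1u;
        unsigned b = (row >> 1) & 1u;
        unsigned c = row & 1u;
        if (majority_sop(a, b, c) != majority_count(a, b, c)) {
            return 0;
        }
    }
    return 1;
}
```

Walking all rows of the truth table like this is a handy way to validate any small SOP equation before building the circuit.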