Skip to content

Key Concepts from Labs and Projects

Lab01 - Hello World in the RISC-V Dev Env

  • Console based editing: micro or vim
  • git and GitHub usage
  • Makefiles
  • Hello World in C
  • The Autograder

Project01 - RISC-V VM Access and Number Conversion in C

  • Dev Environment Setup
  • Local RISC-V VM using qemu-system-riscv64
  • Remote RISC-V VM on euryale
  • ssh keys, ssh config, ssh authorized_keys
  • Shell configuration
    • bash: ~/.profile, ~/.bash_profile, ~/.bashrc, ~/.bash_aliases
    • zsh: ~/.zprofile, ~/.zshrc
  • C Basics
  • Functions
  • Data (global, stack, heap)
  • Statements and expressions
  • Data types
    • Primitives - int, char, float, uint8_t, uint32_t, int32_t, uint64_t, int64_t
    • Composite - arrays and structs
    • Pointers - int *, char *, uint32_t *, uint64_t *
  • Pointer operations
    • & address of, e.g., int x; int *p; p = &x;
    • * dereference, e.g., x = *p;
  • Strings in C
    • Array of chars (bytes)
    • Null \0 (0) terminated
    • ASCII character codes
  • The C stack
    • Local variables
    • Allocated on function entry, deallocated on function exit (return)
  • Signed (int, int32_t) vs unsigned (unsigned int, uint32_t) values
  • Command line arguments with int argc and char *argv
  • Layout of argv array in memory
  • Array of pointers first, argv[0], argv[1], argv[2], NULL (0)
  • String arrays second, ./foo + \0, -p + \0, etc.
  • Parsing command line arguments
  • Number systems
  • Decimal (base 10, digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
  • Binary (base 2, digits: 0, 1, C literal prefix: 0b)
  • Hexadecimal, "hex" (base 16: digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, C literal prefix: 0x)
    • A = 10, B = 11, C = 12, D = 13, E = 14, F = 15
  • base conversions
  • C string as decimal, binay, hexadecimal to int
    • Place value: 2 * (10^2) + 5 * (10^1) + 4 * (10^0)
  • int to C string as decimal, binary, hexadecimal
    • Modulus to find digit
    • Divide to get value for next digit
    • Repeat until divide gives 0

Lab02 - Introduction to RISC-V Assembly Programming

  • Instructions, registers, labels, directives
  • See RISC-V Assembly Guides
  • Assembling vs compiling
  • Assembling (as) assembly source (.s) to object code (.o)
  • Compiling (gcc) c source (.c) to object code (.o)
  • The object files must be linked, we can use gcc for this
    • as -o foo.o foo.s
    • gcc -o main main.c foo.o
  • The C compiler can also assemble
    • gcc -o main main.c main.s
  • Instructions have a name (mnemonic), like add or mul
  • Assembly programming model
  • A RISC-V processor has 32 registers plus the PC registers
    • x0, x1, ... x31
    • ABI names: sp, ra, a0, a1, ..., t0, t1, ..., s0, s1, ...
  • The PC points to next instruction in memory to execute
  • Load an instruction from memory into processor
  • Execute the instruction, usually updates one or more register values, but can also update memory
  • Update PC, often just the next instruction PC + 4, but can also be a diffenent address in the case of a branch or jump (control instruction)
  • Most instructions have three operands
  • One target (destination)
  • Two source
  • add t0, t1, t1 means t0 = t1 + t2
  • On RISC-V 64 bit registers are 64 bits (8 bytes)
  • Data processing instructions
  • Immediate values
  • Control instructions
  • conditional branchs: beq, bne, blt, bge
  • jumps: j
  • return: ret
  • Assembly arguments passed in a0, a1, a2, a3, etc.
  • Return value is put into a0
  • Basic assembly function
    .global func_name
    func_name:
        add a0, a0, a1
        ret
    
  • if/then/else in assembly
  • C:
    int x, y, r;
    r = 0;
    if (x > y) {
       r = r + 1;
    } else {
       r = r + 99;
    }
    
  • Asm: Assume a0 - int x, a1 - int y, t0 - int r
        li t0, 0
        bgt a0, a1, else
        addi t0, t0, 1
        j endif
    else:
        addi t0, t0, 99
    endif:
    
  • for loops in assembly
  • C:
    int r, i, n;
    r = 0;
    n = 10
    for (i = 0; i < n; i++) {
        r = r + i;
    }
    
  • Assume t0 - int r, t1 - int i, t2 - int n
        li r, 0
        li n, 10
        li i, 0
    loop:
        bge t1, t2, loopend
        addi t0, t0, t1
        addi t1, t1, 1
        j loop
    loopend:
    
  • Array access in assembly
  • lw (load word) lw t0, (a0) means t0 = *a0
  • Load word at address a0 into register t0
  • Pointer based array access
    • To get to next element in array of words (ints): addi a0, a0, 4

Project02 - RISC-V Assembly Language

  • RISC-V ABI Function Calling Conventions
  • Only one set of registers (32), so we need a way to coordinate usage between functions
  • Caller-saved registers (a0, a1, ..., a7, t0, t, ... t6, ra)
    • These must be preserved on the stack by the caller of a function
    • You only need to preserve the registers you will be using after the function call
  • Callee-saved registers (sp, s0, s1, ... s11)
    • These must be preserved on the stack by the callee, on entry to the function.
    • They should be restored on exit.
    • The stack pointer is preserved by subtracting (stack allocation) on function entry and adding (stack deallocation) on function exit the name number of bytes.
    • The sp should be a multiple of 16 (for performance compatibility with some instructions)
  • Functions that don't call other functions do not need to allocate stack space.
  • Only need stack space if the function uses caller-saved registers
  • Function call template when using stack:
    .global func_name
    func_name:
        addi sp, sp, -N    # Allocate N bytes on stack, N must be a multiple of 16
        sd ra, (sp)         # Save ra onto stack
                            # Save any callee-saved registers if needed
        ...
        ld ra, (sp)
        addi sp, sp, +N
        ret
    
  • Indexed-based array access
  • Assume t0 is int i, a0 is base address of array int arr[]
    li t1, 4
    mul t1, t1, t0
    add t1, a0, t1
    lw t2, (t1)
    
  • Altneratively use a shift instead of multiply
    slli t1, t0, 2
    add t1, a0, t1
    lw t2, (t1)
    
  • Recursive functions
  • Nothing special, just that the func calls itself
  • Same rules apply
  • Presere any caller-saved registers on stack before the recursive call
  • Restore any caller-saved registers from stack after the recursive call
  • If a recursive function simply returns after the recursive call, then it is called tail recursive and the recursive call can be turned into a jump.

Project03 - RISC-V Assembly Language Part 2

  • Working with C strings in assembly
  • Use lb (load byte) and sb (store byte) to access characters in a string
  • Calling C functions from Assembly
  • Just add a .global func_name for the function you want to use
  • E.g., .global strlen
  • Then set arg registers and use call instruction as normal
  • Make sure to preserve any caller-saved registers before call
  • Memory
  • Byte addressable
  • Smallest addressable unit is a byte (8 bits)
  • A int (word) is 4 bytes, need to decide how to layout a word in memory
  • Two byte orderings
    • Big endian (put the most significant byte first)
    • Little endian (put the least significant byte first)
      int x = 0xFFAA1122
      4 |    |   4 |    |
      3 | 22 |   3 | FF |
      2 | 11 |   2 | AA |
      1 | AA |   1 | 11 |
      0 | FF |   0 | 22 |
      Big        Little
      Endian     Endian
      
  • Two's Complement (2's Complement)
  • Need a way to represent negative integer values
  • To get the two's complement negative representation of a positive value:
    • twos = invert_bits(val) + 1
  • 3-bit two's complement values
    Bin Dec
    000   0
    001   1
    010   2
    011   3
    100  -4
    101  -3
    110  -2
    111  -1
    
  • Two's complement properties
    • The msb (most significant bit) is the sign bit (0 is pos, 1 is neg)
    • Only one representation of 0
    • Because of this, there is one more neg value than pos values
    • Simple grade school arithmetic work on two's complement numbers
  • Bit manipulation and bitwise operators
  • Zero (0) / One (1)
    • Other names: unset/set, low/higg, off/on, false/true
  • A byte is 8 bits
    msb | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | lsb
    
    • msb - most significant bit
    • lsb - least significant bit
  • A word is 4 bytes or 32 bits
    msb | 31      24|23      16|15       8|7        0| lsb
        |   byte 3  |  byte 2  |  byte 1  |  byte 0  |
    
  • Bitwise Operaters (op, C, ASM)
    • AND , & , and/andi
    • OR , | , or/ori
    • NOT , ~
    • XOR , ^ ,
    • Shift Left Logical
    • Shift x to the left by n bits
    • Fill lower bits with 0
    • uint32_t x
    • x << n
    • sll/slli
    • Shift Right Logical
    • Shift x to the right by n bits
    • Fill upper bits with 0
    • uint32_t x
    • x >> n
    • srl/srli
    • Shift Right Arithmetic
    • Shift x to the right by n bits
    • Fill upper bits with sign bit value
    • int32_t x
    • x >> n
    • sra/srai
  • Masking
    • It is often need to pull out a sequence of bits from a larger value
    • To get a sequence of bits:
    • Shift the bits to the far right
    • Mask the bits to zero out any upper bits we don't want
    • Example: assume we want the middle 4 bits of an 8 bit value (bits 2-5)
      uint8_t x = 0b11011010;
      uint32_t v = x >> 2     // Shift the 4 bits to the beginning
      uint32_t v = v & 0b1111 // Mask off upper bits remaining in v
      
  • Combine individual bit sequences with right shift and OR (|)
  • Sign extension
    • Assume you have a 4-bit two's complement number you want to sign extend to be a 32-bit two's complement number.
      int32_t v = 0b1110 // -2 in 4 bits
      v = (v << 28) >> 28;
      
    • Shift all the way to the left and then do shift right arithmetic all the way back

Lab03 - RISC-V Machine Code Emulation

  • RISC-V Machine Code and Emulation
  • Instructions are 32 bits (1 word), there is also a 16 bit compressed format
  • Each instruction encodes source register(s), a destination registers, and target address or offset
  • We decode the iw (instruction word) in steps
    • Look at opcode to determine instruction format type
    • Decode the iw based on opcode
    • Further decode iw by looking at other fields such as funct3 and funct7
  • The base RISC-V emulator includes the emulated state as a struct:
    struct rv_state {
        uint64_t regs[NREGS];
        uint64_t pc;
        uint8_t stack[STACK_SIZE];
    };
    
    • NREGS is 32 (because RISC-V has 32 registers)
    • The pc is the program counter
    • STACK_SIZE is 8192, but for prorgrams with lots of function calls and/or local vars this may need to be bigger.
  • Basic emulation
    • We will emulate assembly programs that have been linked with the emulator code
    • Initialize the rv_state with
    • A pointer to the function we want to emulate
    • The first 4 arguments as uint64_t values to send to the function
    • Get iw: iw = *pc
    • Get opcode from iw
    • Based on opcode call emulation function, like:
    • emu_r_type(rsp, iw)
    • rsp is struct rv_state *rsp
    • The emu functions will further decode the iw
    • Use shift and mask (get_bits()) to get each field
    • Update the state (registers and memory based on opcode, funct3, and funct7)
    • Update the pc, usually pc += 4 to go to the next instruction
      • For control instructions, we will compute either an offset and add to pc
      • Or, for jalr use a register value to update the pc
  • RISC-V Instruction Formats
  • R-type |funct7[31:25]|rs2[24:20]|rs1[19:15]|funct3[14:12]|rd[11:7]|opcode[6:0]|
    • For register instructions like add, sub, sll
  • I-type |imm_11_0[31:20]|rs1[19:15]|funct3[14:12]|rd[11:7]|opcode[6:0]|
    • For immediate instruction like addi, slli, lw, lb
    • Immediate must be constructed from parts and sign extended
    • int64_t imm = sign_extend(imm_11_0, 11)
  • B-type |imm_12[31]|imm_10_5[30:25]|rs2[24:20]|rs1[19:15]|funct3[14:12]|imm_4_1[11:8]|imm_11[7]|opcode[6:0]|
    • For branches, like beq, bne, blt, bge
    • int64_t imm = sign_extend(imm_12 << 12 | imm_11 << 11 | imm_10_5 << 5 | imm_4_1 << 1)
    • If branch taken pc += imm, if not taken pc += 4
  • J-type |imm_20[31]|imm_10_1[30:21]|imm_11[20]|imm_19_12[19:12]|rd[11:7]|opcode|
    • For 'jal' jump and link, also 'j'
    • int64_t imm = sign_extend(imm_20 << 20 | imm_19_12 << 12 | imm_11 < 11 | imm_10_1 << 1)
    • pc += imm
  • RISC-V Pseudo Instructions
  • The assembler provide many convenience instructions that do not exist as real instructions
  • Examples
    • li a0, 99 is addi a0, zero, 99
    • j offset is jal zero, offset
    • call offset is jal ra, offset
    • bgt r1, r2, offset is blt r2, r1, offset

Project04 - RISC-V Emulation - Analysis - Cache Simulation

  • Additional RISC-V Instruction Formats
  • S-type |imm_11_5[31:25]|rs2[24:20]|rs1[19:15]|funct3[14:12]|imm_4_0[11:7]|opcode[6:0]|
    • For store instuctions, like sb, sw, sd
    • Immediate must be constructed from parts and sign extended
    • int64_t imm = sign_extend(imm_11_5 << 5 | imm_4_0, 11)
  • U-type |imm_31_12[31:12]|rd[11:7]|opcode[6:0]|
    • For lui (load upper immediate) and auipc add upper immediate to pc
    • Our code dosn't use these
  • Note that load instructions are an I-type form
  • Dynamic Analysis
  • Because we emulate each instruction we can do the following
    • Count the total number of instructions executed
    • Count the different types of instructions executed
    • I/R count, Loads, Stores, Jumps, Branches Taken, Branches Not Taken
    • Counts added to struct rv_state as a new struct struct rv_analysis
    • Need to update counts accurately in the emulation functions
  • Cache Memory and Simulation
  • A Cache is a small and fast memory that holds recently accessed data from main memory
  • Main memory is slow relative to CPU speed
  • A cache can work because program exeuction adheres to two principles of locality
    • Principle of spatial locality - If code accesses a value, there is good chance it will access nearby values
    • Principle of temporal locality - If code accesses a value now there is a good chance it will do so again soon
  • We send an address to a cache an request a value back
    • Hit - value is in the cache for the requested address
    • Miss - value is not in the cache for the requested address
    • Hit Rate = #hits / #refs
    • Miss Rate = #misses / #refs
    • In modern processes caches can achieve very high hit rates, > 95%
  • Main cache ideas
    • For a given memory address, how to locate value in the cache?
    • How to replace a value?
    • How many bytes do we load at at time into the cache (block size)?
  • Three main cache designs
    • Direct Mapped
    • Fully Associative
    • Set Associative
  • Direct Mapped
    • Each address maps to one slot in the cache
    • On miss, just repace slot value with new value from memory
  • Fully Associative
    • Each address can map to any slot in the cache
    • Use a form of LRU (least recently used) to choose which slot to evict on miss
  • Set Associative
    • An address maps to a set of slots, and can map to any slot in the set
    • Use LRU within the slots in a set to choose a slot to evict
  • Block Size
    • To support spatial locatlity a cache will read in more than the word requested
    • We can think of memory as an array of blocks
    • We need to find the block for a give address
    • Find the base of the block
    • The load the entire block into the cache slot

Lab05 - Introduction to Digital Design

  • Digital Design Fundamentals
  • Analog circuits vs. Digital circuits
    • Analog: continuous voltage signals (0V to 5V)
    • Digital: discrete voltage levels (high/1 and low/0)
  • Digital logic levels
    • High: represented as 1 (approximately 5V)
    • Low: represented as 0 (approximately 0V)
    • Binary state system
  • Hardware Abstraction Hierarchy
  • Wires and devices abstracted into gates
  • Gates combined into larger components
  • Hierarchical design approach (building complex circuits from simpler components)
  • Basic Logic Gates
  • AND gate
    • Symbol and circuit representation
    • Truth table: output 1 only when all inputs are 1
  • OR gate
    • Symbol and circuit representation
    • Truth table: output 1 when any input is 1
  • NOT gate (inverter)
    • Symbol and circuit representation
    • Truth table: inverts the input
  • XOR (Exclusive OR)
  • XNOR (Exclusive NOR)
  • Boolean Algebra and Logic Notation
  • C code notation: r = a & b (AND), r = a | b (OR), r = ~a (NOT)
  • Boolean algebra notation: r = a · b (AND), r = a + b (OR), r = ā (NOT)
  • Logic notation: r = a ∧ b (AND), r = a ∨ b (OR), r = ¬a (NOT)
  • Circuit Composition
  • Building complex expressions from basic gates
  • Example: r = (a · b) + (c · d) - combining AND and OR operations
  • Circuits are directed acyclic graphs (DAGs) - no feedback loops in combinational logic

Lab08 - Combinational Logic

  • Sum-of-Products Method
  • Systematic approach to designing combinational logic circuits
  • Three-step process:
    1. Create truth table
    2. Identify rows with output '1'
    3. Build equation from those rows
  • Product terms: AND operations combining inputs (with inversions as needed)
  • Sum: OR operation combining all product terms
  • Example: Even/odd parity detector
    • Input: 3-bit number (n₂, n₁, n₀)
    • Output: Determines if number of '1' bits is even or odd
  • Arithmetic Circuits - Adders
  • 1-bit Half Adder
    • Two inputs (a, b)
    • Two outputs: sum and carry out (cout)
    • Basic building block for addition
  • 1-bit Full Adder
    • Three inputs: a, b, and carry in (cin)
    • Two outputs: sum and carry out (cout)
    • Used for multi-bit addition
  • 8-bit Ripple-Carry Adder
    • Combines eight 1-bit full adders
    • Carry propagates through the chain
    • Cin initially set to 0
    • Produces 8-bit sum output plus final cout
  • Subtraction using Adders
  • A - B = A + (-B)
  • Using two's complement: invert B then add 1
  • Invert B and set initial carry-in to 1

Lab09 - Digital Logic Components

  • Comparators
  • Equality comparator: checks if a == b
    • Equation: eq = (ā·b̄) + (a·b)
  • 1-bit, 2-bit, 8-bit, and N-bit comparators
  • Built hierarchically using smaller comparators
  • Multiplexers (MUX)
  • 1-bit 2-input MUX: selects between two 1-bit inputs based on select signal
  • 32-bit 2-input MUX: selects between two 32-bit inputs
  • 1-bit 4-input MUX: selects one of four inputs using 2 select signals (S₁, S₀)
    • Select encoding: 00→a, 01→b, 10→c, 11→d
  • Implementation options:
    • Direct implementation using AND-OR logic
    • Tree-based implementation using smaller MUXes
  • ALU (Arithmetic Logic Unit)
  • Combines multiple operations (addition, subtraction, multiplication, etc.)
  • Uses MUXes to select between different operation results
  • Inputs: A and B (operands), ALU-op (select signal for operation)
  • Outputs: Result of selected operation
  • Decoders
  • 2-to-4 Decoder
    • Takes 2 input bits (S1, S0)
    • Produces 4 output lines (r0, r1, r2, r3)
    • Only one output is active (high) at a time
    • Truth table relationships:
    • r0 = (S̄1·S̄0), r1 = (S̄1·S0), r2 = (S1·S̄0), r3 = (S1·S0)
  • Encoders
  • Basic Encoder: Converts one-hot input to binary output
  • Priority Encoder: Handles cases where multiple inputs may be active
    • Assigns priority to inputs (higher numbered inputs have priority)
    • Outputs binary encoding of highest priority active input

Lab09 - Sequential Logic

  • Sequential Logic Fundamentals
  • Circuits that maintain state over time
  • Combine state storage with combinational logic
  • Clock Signals
  • Clock period: time duration of one complete cycle
  • Frequency measured in Hz (cycles per second)
  • Rising edge: transition from low (0) to high (1)
  • Falling edge: transition from high (1) to low (0)
  • Storage Elements Hierarchy
  • SR Latch → D Latch → D Flip-Flop → 1-bit Register → N-bit Register → Counter
  • SR Latch (Set/Reset Latch)
  • Stores a 1-bit value
  • Two inputs: S (Set) and R (Reset)
  • Two outputs: Q and Q̄ (complement)
  • Built using cross-coupled NOR gates
  • Truth table:
    • R=0, S=0: Hold current state
    • R=0, S=1: Set Q to 1
    • R=1, S=0: Reset Q to 0
    • R=1, S=1: Undefined/invalid state
  • D Latch
  • Level-sensitive storage
  • Single data input (D) and clock input (CLK)
  • Passes D to Q when CLK is high (transparent)
  • D Flip-Flop
  • Edge-triggered storage
  • Constructed from two D Latches in master-slave configuration
  • Captures input only on clock edge (rising or falling)
  • More reliable than latches for synchronous circuits
  • N-Bit Register
  • Multiple 1-bit registers operating in parallel
  • Shared control signals: CLK, CLR, EN
  • All bits update synchronously on same clock edge
  • Counter Design
  • 8-bit register stores current count value
  • 8-bit adder increments the count
  • Feedback loop: adder output connects to register input
  • Increments on each clock cycle when enabled

Project05 - Instruction Memory and Decoding

  • ROM (Read-Only Memory)
  • Memory that can only be read from, not written to
  • Uses address bits (A) to select memory locations
  • Outputs instruction words (IW) of 32 bits
  • Example: 8-bit address can access 2^8 = 256 memory locations
  • Instruction Memory Architecture
  • Program Counter (PC) contains 64-bit address pointing to current instruction
  • PC feeds into instruction memory
  • Instruction word (IW) output is 32 bits
  • Address Calculation
  • Memory addressed by bytes (byte addressing)
  • Words consist of 4 bytes
  • Word address = byte address / 4
  • addr_word = addr_byte >> 2 (divide by 4 using right shift by 2)
  • Instruction Decoder (InstDecoder)
  • Analyzes 32-bit instruction word and determines instruction type
  • Input: 32-bit instruction word (IW)
  • Output: Instruction number (inum) indicating instruction type
  • Instruction Type Mapping:
    • 0: i-type, 1: r-type, 2: load, 3: s-type
    • 4: b-type, 5: jal, 6: j, 7: jalr
  • Instruction Decoding Process
  • Bit Splitter: Extracts specific bit ranges from 32-bit instruction
  • Opcode Recognition using equality comparators
    • Example opcodes: 0010011 (i-type), 0110011 (r-type)
  • Priority encoder for instruction type selection

Lab10 - Processor Components

  • Computer Architecture Layers
  • Instruction Set Architecture (ISA)
    • Software-hardware interface
    • Specification of machine code, registers, and memory
  • Microarchitecture
    • Processor implementation of the ISA
    • Digital logic implementation
  • Digital Design
    • Schematic entry (visual) or HDL (Hardware Description Language)
    • Languages: Verilog, VHDL (IEEE standards)
  • Moore's Law
  • The number of transistors doubles every 1.5 years for processors
  • Processor Types
  • Single-Cycle Processor: Each instruction completes in one clock cycle
  • Multi-Cycle Processor: Instructions broken into multiple cycles
  • Pipelined Processor: Multiple instructions execute simultaneously
  • Single-Cycle Processor Components
  • Program Counter (PC): 64-bit register, adder for PC+4
  • Instruction Memory (ROM): Stores program instructions
  • Instruction Decode: Decodes fetched instructions
  • Register File: Connected to ALU
  • Immediate value (Imm): Extracted from instruction
  • Data Memory (RAM): For load/store operations
  • Register File Architecture
  • 32 registers, each 64-bit wide
  • Register names: X0, X1, ..., X31
  • X0 (zero register) always contains zero
  • Operations:
    • Read up to two register values in a single clock cycle
    • Write to a single register in a clock cycle
  • Interface Signals:
    • RR0, RR1: Read register selectors (5 bits each)
    • RD0, RD1: Read data outputs (64 bits each)
    • WR: Write register selector (5 bits)
    • WD: Write data input (64 bits)
    • WE: Write enable (control signal)
    • CLK: Clock signal, CLR: Clear signal
  • Decoder with Enable
  • Select signal controls which output is activated
  • Enable signal (WE) gates the decoder operation
  • Converts binary select input to one-hot encoded output
  • ALU Structure
  • Takes two 64-bit inputs: A and B
  • ALU-op control signal (3 bits) selects operation
  • Produces 64-bit result
  • Internal functional units: ADD, SUB, MUL
  • Final multiplexer selects output based on ALU-op

Project06 - Processor Instruction Decoding

  • Register Decoder (RegDecoder)
  • Takes instruction word (IW) as input
  • Outputs register addresses: rs1, rs2, rd
  • Bit fields:
    • rd: bits 11-7
    • rs1: bits 19-15
    • rs2: bits 24-20
  • Immediate Decoder (ImmDecoder)
  • Takes instruction word (IW) as input
  • Outputs all immediate types (I-type, S-type, B-type, U-type, J-type)
  • S-Type Immediate construction:
    • Bits 11-7 for imm[4:0] and bits 31-25 for imm[11:5]
    • Sign extension required
  • Sign Extension Hardware
  • Converts 12-bit immediate to 64-bit signed value
  • Uses multiplexer to select between 0 and -1 for upper bits
  • Sign bit determines fill value for upper bits
  • Instruction Detection Logic
  • ADDI: opcode 0010011 (7-bit comparator)
  • ADD: opcode 0110011, funct3 000, funct7 0000000
  • AND gates combine comparator outputs to identify specific instructions
  • Control Signals Generated
  • RFW (Register File Write): Controls whether to write to register file
  • ALUOp (ALU Operation): Specifies which ALU operation to perform
  • ALUSrcB (ALU Source B): Selects second operand for ALU
    • 0: use register value, 1: use immediate value
  • ROM for Control Signals
  • Input: 3-bit encoded instruction type (inum)
  • Output: Control signal bundle (ALUSrcB, ALUOp, RFW)
  • Adding New Instructions (5-Step Process)
  • Pick an instruction or group of instructions
  • Modify or add components to processor
  • Modify or add to the datapath (usually need to extend or add muxes)
  • Extend instruction decoder control lines
  • Test the implementation

Project06 - Control Flow Instructions (JAL, JALR)

  • JAL (Jump and Link) Instruction
  • J-type format
  • Register Update: rd = PC + 4 (stores return address)
  • PC Update: PC = PC + imm (PC-relative addressing)
  • JALR (Jump and Link Register) Instruction
  • I-type format
  • Register Update: rd = PC + 4 (stores return address)
  • PC Update: PC = rs1 + imm (register-based addressing)
  • Special Case - RET: imm is 0, rd is x0, jumps to address in rs1
  • PCsel Signal (PC Select)
  • Multiplexer selects between PC+4 (sequential) and JTA (jump target address)

Project06 - Branch Instructions

  • Branch Mechanics
  • Syntax: beq rs0, rs1, label
  • Two-step process:
    1. Compute branch target address (BTA = PC + imm-b)
    2. Determine if branch should be taken based on comparison
  • Branch Unit Architecture
  • Comparison logic: equality (=), inequality (!=), less than (<), greater than or equal (>=)
  • Outputs: take_branch signal, PCbr (branch target address)
  • Branch Instructions Supported
  • beq: Branch if equal
  • bne: Branch if not equal
  • blt: Branch if less than
  • bge: Branch if greater than or equal

Project06 - Data Memory

  • Memory Architecture
  • RAM for data storage
  • Memory size calculation: 2^address_bits × bytes_per_word
  • Example: 7 address bits, 8 bytes per word = 1024 bytes
  • Load/Store Instructions
  • ld (load doubleword): loads 64-bit data from memory
  • sd (store doubleword): stores 64-bit data to memory
  • lw (load word): loads 32-bit data from memory
  • sw (store word): stores 32-bit data to memory
  • lb (load byte), sb (store byte)
  • Memory Address Calculation
  • Target address = base register + offset
  • Example: ld t0, 8(sp) - load from stack pointer + 8 byte offset
  • RAM Interface Signals
  • Din (Data In): 64-bit input for write operations
  • Dout (Data Out): 64-bit output for read operations
  • str (store signal): control for write enable
  • ld (load signal): control for read enable
  • clk (clock): synchronous operation
  • Control Signals for Memory
  • WDsel or Mem2Reg: selects memory output to write back to register file
  • MSZ (Memory Size): controls data size for load/store operations
  • Preparing Programs for Processor Execution
  • Remove ".global" directives
  • Use jal instead of call for function calls
  • Must include a main function
  • Initialize SP register (e.g., li sp, 1024)
  • Use unimp instruction to halt execution

Project07 - Processor Pipelining

  • Pipelining Fundamentals
  • Multiple instructions execute simultaneously in different stages
  • Laundry analogy: overlapping wash, dry, fold, put away operations
  • Theoretical speedup: n-stage pipeline can speed up by factor of n
  • Five-Stage RISC-V Pipeline
  • IF (Instruction Fetch): Fetch instruction from memory using PC
  • DR/DE (Decode and Register File Read): Decode instruction, read registers
  • EX (Execute): Execute operation using ALU, Branch Unit
  • MEM (Memory): Memory store/load operations
  • WB (Write Back): Write results back to Register File
  • Pipeline Registers
  • Placed between stages (IF/DE, DE/EX, EX/MEM, MEM/WB)
  • Hold intermediate values
  • Synchronize pipeline flow
  • Pipeline Performance
  • Total time = (stages × stage_time) + ((tasks - 1) × stage_time)
  • First item latency + (n-1) additional items at stage intervals

Project07 - Pipeline Hazards

  • Types of Pipeline Hazards
  • Data Hazards: Instructions depend on results from previous instructions
  • Control Hazards: Caused by jumps and branches
  • Data Hazard Solutions
  • Inverting the Clock to the RegFile
    • DR (Data Read) and WB (Write Back) timing optimization
    • Eliminates one NOP instruction
    • Reduces cycles (e.g., 10 cycles to 9 cycles)
  • Forwarding (Data Forwarding)
    • Forward data directly between pipeline stages
    • Bypass register file for data dependencies
    • Further cycle reduction (e.g., 10 cycles to 6 cycles)
  • Load Stalling (LD Stall)
    • Specific handling for load-use dependencies
    • Pipeline stall insertion when necessary
    • Hazard detection unit implementation
    • Enable signal control for stalling
  • Control Hazard Solutions
  • Branch and jump handling
  • Managing control flow changes in the pipeline
  • Hazard Detection Unit
  • ld-stall detection
  • PC enable control during stalls
  • Combinational logic: (ld-stall OR en) = stl-en
  • Pipeline Stage Tracking
  • Visualizing concurrent instruction execution
  • Identifying which stage each instruction occupies per cycle
  • Counting total cycles for instruction sequences

Advanced Architecture Topics

  • Speculative Execution
  • Executing instructions before confirming they're needed
  • Related to branch prediction
  • Out-of-Order Execution
  • Executing instructions in non-sequential order for performance
  • Allows independent instructions to execute while waiting for dependencies
  • Example: ld t0, (a0) followed by independent add t1, t2, t3
  • Majority Function Circuit
  • 3-input majority function: output is 1 when majority of inputs are 1
  • SOP expression: majority = (a·b·c̄) + (a·b̄·c) + (ā·b·c) + (a·b·c)