Skip to content

Key Concepts from Labs and Projects

Lab01 - Hello World in the RISC-V Dev Env

  • Console based editing: micro or vim
  • git and GitHub usage
  • Makefiles
  • Hello World in C
  • The Autograder

Project01 - RISC-V VM Access and Number Conversion in C

  • Dev Environment Setup
  • Local RISC-V VM using qemu-system-riscv64
  • Remote RISC-V VM on euryale
  • ssh keys, ssh config, ssh authorized_keys
  • Shell configuration
    • bash: ~/.profile, ~/.bash_profile, ~/.bashrc, ~/.bash_aliases
    • zsh: ~/.zprofile, ~/.zshrc
  • C Basics
  • Functions
  • Data (global, stack, heap)
  • Statements and expressions
  • Data types
    • Primitives - int, char, float, uint8_t, uint32_t, int32_t, uint64_t, int64_t
    • Composite - arrays and structs
    • Pointers - int *, char *, uint32_t *, uint64_t *
  • Pointer operations
    • & address of, e.g., int x; int *p; p = &x;
    • * dereference, e.g., x = *p;
  • Strings in C
    • Array of chars (bytes)
    • Null \0 (0) terminated
    • ASCII character codes
  • The C stack
    • Local variables
    • Allocated on function entry, deallocated on function exit (return)
  • Signed (int, int32_t) vs unsigned (unsigned int, uint32_t) values
  • Command line arguments with int argc and char *argv
  • Layout of argv array in memory
  • Array of pointers first, argv[0], argv[1], argv[2], NULL (0)
  • String arrays second, ./foo + \0, -p + \0, etc.
  • Parsing command line arguments
  • Number systems
  • Decimal (base 10, digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
  • Binary (base 2, digits: 0, 1, C literal prefix: 0b)
  • Hexadecimal, "hex" (base 16: digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, C literal prefix: 0x)
    • A = 10, B = 11, C = 12, D = 13, E = 14, F = 15
  • base conversions
  • C string as decimal, binay, hexadecimal to int
    • Place value: 2 * (10^2) + 5 * (10^1) + 4 * (10^0)
  • int to C string as decimal, binary, hexadecimal
    • Modulus to find digit
    • Divide to get value for next digit
    • Repeat until divide gives 0

Lab02 - Introduction to RISC-V Assembly Programming

  • Instructions, registers, labels, directives
  • See RISC-V Assembly Guides
  • Assembling vs compiling
  • Assembling (as) assembly source (.s) to object code (.o)
  • Compiling (gcc) c source (.c) to object code (.o)
  • The object files must be linked, we can use gcc for this
    • as -o foo.o foo.s
    • gcc -o main main.c foo.o
  • The C compiler can also assemble
    • gcc -o main main.c main.s
  • Instructions have a name (mnemonic), like add or mul
  • Assembly programming model
  • A RISC-V processor has 32 registers plus the PC registers
    • x0, x1, ... x31
    • ABI names: sp, ra, a0, a1, ..., t0, t1, ..., s0, s1, ...
  • The PC points to next instruction in memory to execute
  • Load an instruction from memory into processor
  • Execute the instruction, usually updates one or more register values, but can also update memory
  • Update PC, often just the next instruction PC + 4, but can also be a diffenent address in the case of a branch or jump (control instruction)
  • Most instructions have three operands
  • One target (destination)
  • Two source
  • add t0, t1, t1 means t0 = t1 + t2
  • On RISC-V 64 bit registers are 64 bits (8 bytes)
  • Data processing instructions
  • Immediate values
  • Control instructions
  • conditional branchs: beq, bne, blt, bge
  • jumps: j
  • return: ret
  • Assembly arguments passed in a0, a1, a2, a3, etc.
  • Return value is put into a0
  • Basic assembly function
    .global func_name
    func_name:
        add a0, a0, a1
        ret
    
  • if/then/else in assembly
  • C:
    int x, y, r;
    r = 0;
    if (x > y) {
       r = r + 1;
    } else {
       r = r + 99;
    }
    
  • Asm: Assume a0 - int x, a1 - int y, t0 - int r
        li t0, 0
        bgt a0, a1, else
        addi t0, t0, 1
        j endif
    else:
        addi t0, t0, 99
    endif:
    
  • for loops in assembly
  • C:
    int r, i, n;
    r = 0;
    n = 10
    for (i = 0; i < n; i++) {
        r = r + i;
    }
    
  • Assume t0 - int r, t1 - int i, t2 - int n
        li r, 0
        li n, 10
        li i, 0
    loop:
        bge t1, t2, loopend
        addi t0, t0, t1
        addi t1, t1, 1
        j loop
    loopend:
    
  • Array access in assembly
  • lw (load word) lw t0, (a0) means t0 = *a0
  • Load word at address a0 into register t0
  • Pointer based array access
    • To get to next element in array of words (ints): addi a0, a0, 4

Project02 - RISC-V Assembly Language

  • RISC-V ABI Function Calling Conventions
  • Only one set of registers (32), so we need a way to coordinate usage between functions
  • Caller-saved registers (a0, a1, ..., a7, t0, t, ... t6, ra)
    • These must be preserved on the stack by the caller of a function
    • You only need to preserve the registers you will be using after the function call
  • Callee-saved registers (sp, s0, s1, ... s11)
    • These must be preserved on the stack by the callee, on entry to the function.
    • They should be restored on exit.
    • The stack pointer is preserved by subtracting (stack allocation) on function entry and adding (stack deallocation) on function exit the name number of bytes.
    • The sp should be a multiple of 16 (for performance compatibility with some instructions)
  • Functions that don't call other functions do not need to allocate stack space.
  • Only need stack space if the function uses caller-saved registers
  • Function call template when using stack:
    .global func_name
    func_name:
        addi sp, sp, -N    # Allocate N bytes on stack, N must be a multiple of 16
        sd ra, (sp)         # Save ra onto stack
                            # Save any callee-saved registers if needed
        ...
        ld ra, (sp)
        addi sp, sp, +N
        ret
    
  • Indexed-based array access
  • Assume t0 is int i, a0 is base address of array int arr[]
    li t1, 4
    mul t1, t1, t0
    add t1, a0, t1
    lw t2, (t1)
    
  • Altneratively use a shift instead of multiply
    slli t1, t0, 2
    add t1, a0, t1
    lw t2, (t1)
    
  • Recursive functions
  • Nothing special, just that the func calls itself
  • Same rules apply
  • Presere any caller-saved registers on stack before the recursive call
  • Restore any caller-saved registers from stack after the recursive call
  • If a recursive function simply returns after the recursive call, then it is called tail recursive and the recursive call can be turned into a jump.

Project03 - RISC-V Assembly Language Part 2

  • Working with C strings in assembly
  • Use lb (load byte) and sb (store byte) to access characters in a string
  • Calling C functions from Assembly
  • Just add a .global func_name for the function you want to use
  • E.g., .global strlen
  • Then set arg registers and use call instruction as normal
  • Make sure to preserve any caller-saved registers before call
  • Memory
  • Byte addressable
  • Smallest addressable unit is a byte (8 bits)
  • A int (word) is 4 bytes, need to decide how to layout a word in memory
  • Two byte orderings
    • Big endian (put the most significant byte first)
    • Little endian (put the least significant byte first)
      int x = 0xFFAA1122
      4 |    |   4 |    |
      3 | 22 |   3 | FF |
      2 | 11 |   2 | AA |
      1 | AA |   1 | 11 |
      0 | FF |   0 | 22 |
      Big        Little
      Endian     Endian
      
  • Two's Complement (2's Complement)
  • Need a way to represent negative integer values
  • To get the two's complement negative representation of a positive value:
    • twos = invert_bits(val) + 1
  • 3-bit two's complement values
    Bin Dec
    000   0
    001   1
    010   2
    011   3
    100  -4
    101  -3
    110  -2
    111  -1
    
  • Two's complement properties
    • The msb (most significant bit) is the sign bit (0 is pos, 1 is neg)
    • Only one representation of 0
    • Because of this, there is one more neg value than pos values
    • Simple grade school arithmetic work on two's complement numbers
  • Bit manipulation and bitwise operators
  • Zero (0) / One (1)
    • Other names: unset/set, low/higg, off/on, false/true
  • A byte is 8 bits
    msb | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | lsb
    
    • msb - most significant bit
    • lsb - least significant bit
  • A word is 4 bytes or 32 bits
    msb | 31      24|23      16|15       8|7        0| lsb
        |   byte 3  |  byte 2  |  byte 1  |  byte 0  |
    
  • Bitwise Operaters (op, C, ASM)
    • AND , & , and/andi
    • OR , | , or/ori
    • NOT , ~
    • XOR , ^ ,
    • Shift Left Logical
    • Shift x to the left by n bits
    • Fill lower bits with 0
    • uint32_t x
    • x << n
    • sll/slli
    • Shift Right Logical
    • Shift x to the right by n bits
    • Fill upper bits with 0
    • uint32_t x
    • x >> n
    • srl/srli
    • Shift Right Arithmetic
    • Shift x to the right by n bits
    • Fill upper bits with sign bit value
    • int32_t x
    • x >> n
    • sra/srai
  • Masking
    • It is often need to pull out a sequence of bits from a larger value
    • To get a sequence of bits:
    • Shift the bits to the far right
    • Mask the bits to zero out any upper bits we don't want
    • Example: assume we want the middle 4 bits of an 8 bit value (bits 2-5)
      uint8_t x = 0b11011010;
      uint32_t v = x >> 2     // Shift the 4 bits to the beginning
      uint32_t v = v & 0b1111 // Mask off upper bits remaining in v
      
  • Combine individual bit sequences with right shift and OR (|)
  • Sign extension
    • Assume you have a 4-bit two's complement number you want to sign extend to be a 32-bit two's complement number.
      int32_t v = 0b1110 // -2 in 4 bits
      v = (v << 28) >> 28;
      
    • Shift all the way to the left and then do shift right arithmetic all the way back

Lab03 - RISC-V Machine Code Emulation

  • RISC-V Machine Code and Emulation
  • Instructions are 32 bits (1 word), there is also a 16 bit compressed format
  • Each instruction encodes source register(s), a destination registers, and target address or offset
  • We decode the iw (instruction word) in steps
    • Look at opcode to determine instruction format type
    • Decode the iw based on opcode
    • Further decode iw by looking at other fields such as funct3 and funct7
  • The base RISC-V emulator includes the emulated state as a struct:
    struct rv_state {
        uint64_t regs[NREGS];
        uint64_t pc;
        uint8_t stack[STACK_SIZE];
    };
    
    • NREGS is 32 (because RISC-V has 32 registers)
    • The pc is the program counter
    • STACK_SIZE is 8192, but for prorgrams with lots of function calls and/or local vars this may need to be bigger.
  • Basic emulation
    • We will emulate assembly programs that have been linked with the emulator code
    • Initialize the rv_state with
    • A pointer to the function we want to emulate
    • The first 4 arguments as uint64_t values to send to the function
    • Get iw: iw = *pc
    • Get opcode from iw
    • Based on opcode call emulation function, like:
    • emu_r_type(rsp, iw)
    • rsp is struct rv_state *rsp
    • The emu functions will further decode the iw
    • Use shift and mask (get_bits()) to get each field
    • Update the state (registers and memory based on opcode, funct3, and funct7)
    • Update the pc, usually pc += 4 to go to the next instruction
      • For control instructions, we will compute either an offset and add to pc
      • Or, for jalr use a register value to update the pc
  • RISC-V Instruction Formats
  • R-type |funct7[31:25]|rs2[24:20]|rs1[19:15]|funct3[14:12]|rd[11:7]|opcode[6:0]|
    • For register instructions like add, sub, sll
  • I-type |imm_11_0[31:20]|rs1[19:15]|funct3[14:12]|rd[11:7]|opcode[6:0]|
    • For immediate instruction like addi, slli, lw, lb
    • Immediate must be constructed from parts and sign extended
    • int64_t imm = sign_extend(imm_11_0, 11)
  • B-type |imm_12[31]|imm_10_5[30:25]|rs2[24:20]|rs1[19:15]|funct3[14:12]|imm_4_1[11:8]|imm_11[7]|opcode[6:0]|
    • For branches, like beq, bne, blt, bge
    • int64_t imm = sign_extend(imm_12 << 12 | imm_11 << 11 | imm_10_5 << 5 | imm_4_1 << 1)
    • If branch taken pc += imm, if not taken pc += 4
  • J-type |imm_20[31]|imm_10_1[30:21]|imm_11[20]|imm_19_12[19:12]|rd[11:7]|opcode|
    • For 'jal' jump and link, also 'j'
    • int64_t imm = sign_extend(imm_20 << 20 | imm_19_12 << 12 | imm_11 < 11 | imm_10_1 << 1)
    • pc += imm
  • RISC-V Pseudo Instructions
  • The assembler provide many convenience instructions that do not exist as real instructions
  • Examples
    • li a0, 99 is addi a0, zero, 99
    • j offset is jal zero, offset
    • call offset is jal ra, offset
    • bgt r1, r2, offset is blt r2, r1, offset

Project04 - RISC-V Emulation - Analysis - Cache Simulation

  • Additional RISC-V Instruction Formats
  • S-type |imm_11_5[31:25]|rs2[24:20]|rs1[19:15]|funct3[14:12]|imm_4_0[11:7]|opcode[6:0]|
    • For store instuctions, like sb, sw, sd
    • Immediate must be constructed from parts and sign extended
    • int64_t imm = sign_extend(imm_11_5 << 5 | imm_4_0, 11)
  • U-type |imm_31_12[31:12]|rd[11:7]|opcode[6:0]|
    • For lui (load upper immediate) and auipc add upper immediate to pc
    • Our code dosn't use these
  • Note that load instructions are an I-type form
  • Dynamic Analysis
  • Because we emulate each instruction we can do the following
    • Count the total number of instructions executed
    • Count the different types of instructions executed
    • I/R count, Loads, Stores, Jumps, Branches Taken, Branches Not Taken
    • Counts added to struct rv_state as a new struct struct rv_analysis
    • Need to update counts accurately in the emulation functions
  • Cache Memory and Simulation
  • A Cache is a small and fast memory that holds recently accessed data from main memory
  • Main memory is slow relative to CPU speed
  • A cache can work because program exeuction adheres to two principles of locality
    • Principle of spatial locality - If code accesses a value, there is good chance it will access nearby values
    • Principle of temporal locality - If code accesses a value now there is a good chance it will do so again soon
  • We send an address to a cache an request a value back
    • Hit - value is in the cache for the requested address
    • Miss - value is not in the cache for the requested address
    • Hit Rate = #hits / #refs
    • Miss Rate = #misses / #refs
    • In modern processes caches can achieve very high hit rates, > 95%
  • Main cache ideas
    • For a given memory address, how to locate value in the cache?
    • How to replace a value?
    • How many bytes do we load at at time into the cache (block size)?
  • Three main cache designs
    • Direct Mapped
    • Fully Associative
    • Set Associative
  • Direct Mapped
    • Each address maps to one slot in the cache
    • On miss, just repace slot value with new value from memory
  • Fully Associative
    • Each address can map to any slot in the cache
    • Use a form of LRU (least recently used) to choose which slot to evict on miss
  • Set Associative
    • An address maps to a set of slots, and can map to any slot in the set
    • Use LRU within the slots in a set to choose a slot to evict
  • Block Size
    • To support spatial locatlity a cache will read in more than the word requested
    • We can think of memory as an array of blocks
    • We need to find the block for a give address
    • Find the base of the block
    • The load the entire block into the cache slot