Key Concepts from Labs and Projects¶
Lab01 - Hello World in the RISC-V Dev Env¶
- Console based editing: micro or vim
- git and GitHub usage
- Makefiles
- Hello World in C
- The Autograder
Project01 - RISC-V VM Access and Number Conversion in C¶
- Dev Environment Setup
- Local RISC-V VM using qemu-system-riscv64
- Remote RISC-V VM on euryale
- ssh keys, ssh
config, sshauthorized_keys - Shell configuration
- bash:
~/.profile,~/.bash_profile,~/.bashrc,~/.bash_aliases - zsh:
~/.zprofile,~/.zshrc
- bash:
- C Basics
- Functions
- Data (global, stack, heap)
- Statements and expressions
- Data types
- Primitives -
int,char,float,uint8_t,uint32_t,int32_t,uint64_t,int64_t - Composite - arrays and structs
- Pointers -
int *,char *,uint32_t *,uint64_t *
- Primitives -
- Pointer operations
&address of, e.g.,int x; int *p; p = &x;*dereference, e.g.,x = *p;
- Strings in C
- Array of chars (bytes)
- Null
\0(0) terminated - ASCII character codes
- The C stack
- Local variables
- Allocated on function entry, deallocated on function exit (return)
- Signed (
int,int32_t) vs unsigned (unsigned int,uint32_t) values - Command line arguments with
int argcandchar *argv - Layout of
argvarray in memory - Array of pointers first,
argv[0],argv[1],argv[2],NULL (0) - String arrays second,
./foo+\0,-p+\0, etc. - Parsing command line arguments
- Number systems
- Decimal (base 10, digits:
0,1,2,3,4,5,6,7,8,9) - Binary (base 2, digits:
0,1, C literal prefix:0b) - Hexadecimal, "hex" (base 16: digits:
0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F, C literal prefix:0x)A= 10,B= 11,C= 12,D=13, E =14, F =15
- base conversions
- C string as decimal, binay, hexadecimal to int
- Place value:
2 * (10^2) + 5 * (10^1) + 4 * (10^0)
- Place value:
- int to C string as decimal, binary, hexadecimal
- Modulus to find digit
- Divide to get value for next digit
- Repeat until divide gives 0
Lab02 - Introduction to RISC-V Assembly Programming¶
- Instructions, registers, labels, directives
- See RISC-V Assembly Guides
- Assembling vs compiling
- Assembling (as) assembly source (.s) to object code (.o)
- Compiling (gcc) c source (.c) to object code (.o)
- The object files must be linked, we can use gcc for this
as -o foo.o foo.sgcc -o main main.c foo.o
- The C compiler can also assemble
gcc -o main main.c main.s
- Instructions have a name (mnemonic), like
addormul - Assembly programming model
- A RISC-V processor has 32 registers plus the
PCregistersx0,x1, ...x31- ABI names:
sp,ra,a0,a1, ...,t0,t1, ...,s0,s1, ...
- The
PCpoints to next instruction in memory to execute - Load an instruction from memory into processor
- Execute the instruction, usually updates one or more register values, but can also update memory
- Update
PC, often just the next instructionPC + 4, but can also be a diffenent address in the case of a branch or jump (control instruction) - Most instructions have three operands
- One target (destination)
- Two source
add t0, t1, t1meanst0 = t1 + t2- On RISC-V 64 bit registers are 64 bits (8 bytes)
- Data processing instructions
- Immediate values
- Control instructions
- conditional branchs:
beq,bne,blt,bge - jumps:
j - return:
ret - Assembly arguments passed in
a0,a1,a2,a3, etc. - Return value is put into
a0 - Basic assembly function
- if/then/else in assembly
- C:
- Asm: Assume
a0 - int x, a1 - int y, t0 - int r - for loops in assembly
- C:
- Assume
t0 - int r, t1 - int i, t2 - int n - Array access in assembly
- lw (load word)
lw t0, (a0)meanst0 = *a0 - Load word at address
a0into registert0 - Pointer based array access
- To get to next element in array of words (ints):
addi a0, a0, 4
- To get to next element in array of words (ints):
Project02 - RISC-V Assembly Language¶
- RISC-V ABI Function Calling Conventions
- Only one set of registers (32), so we need a way to coordinate usage between functions
- Caller-saved registers (
a0,a1, ...,a7,t0,t, ...t6,ra)- These must be preserved on the stack by the caller of a function
- You only need to preserve the registers you will be using after the function call
- Callee-saved registers (
sp,s0,s1, ...s11)- These must be preserved on the stack by the callee, on entry to the function.
- They should be restored on exit.
- The stack pointer is preserved by subtracting (stack allocation) on function entry and adding (stack deallocation) on function exit the name number of bytes.
- The
spshould be a multiple of 16 (for performance compatibility with some instructions)
- Functions that don't call other functions do not need to allocate stack space.
- Only need stack space if the function uses caller-saved registers
- Function call template when using stack:
- Indexed-based array access
- Assume
t0isint i,a0is base address of arrayint arr[] - Altneratively use a shift instead of multiply
- Recursive functions
- Nothing special, just that the func calls itself
- Same rules apply
- Presere any caller-saved registers on stack before the recursive call
- Restore any caller-saved registers from stack after the recursive call
- If a recursive function simply returns after the recursive call, then it is called tail recursive and the recursive call can be turned into a jump.
Project03 - RISC-V Assembly Language Part 2¶
- Working with C strings in assembly
- Use
lb(load byte) andsb(store byte) to access characters in a string - Calling C functions from Assembly
- Just add a
.global func_namefor the function you want to use - E.g.,
.global strlen - Then set arg registers and use
callinstruction as normal - Make sure to preserve any caller-saved registers before call
- Memory
- Byte addressable
- Smallest addressable unit is a byte (8 bits)
- A int (word) is 4 bytes, need to decide how to layout a word in memory
- Two byte orderings
- Big endian (put the most significant byte first)
- Little endian (put the least significant byte first)
- Two's Complement (2's Complement)
- Need a way to represent negative integer values
- To get the two's complement negative representation of a positive value:
- twos = invert_bits(val) + 1
- 3-bit two's complement values
- Two's complement properties
- The msb (most significant bit) is the sign bit (0 is pos, 1 is neg)
- Only one representation of 0
- Because of this, there is one more neg value than pos values
- Simple grade school arithmetic work on two's complement numbers
- Bit manipulation and bitwise operators
- Zero (0) / One (1)
- Other names: unset/set, low/higg, off/on, false/true
- A byte is 8 bits
- msb - most significant bit
- lsb - least significant bit
- A word is 4 bytes or 32 bits
- Bitwise Operaters (op, C, ASM)
AND,&,and/andiOR,|,or/oriNOT,~XOR,^,- Shift Left Logical
- Shift x to the left by n bits
- Fill lower bits with 0
uint32_t xx << nsll/slli- Shift Right Logical
- Shift x to the right by n bits
- Fill upper bits with 0
uint32_t xx >> nsrl/srli- Shift Right Arithmetic
- Shift x to the right by n bits
- Fill upper bits with sign bit value
int32_t xx >> nsra/srai
- Masking
- It is often need to pull out a sequence of bits from a larger value
- To get a sequence of bits:
- Shift the bits to the far right
- Mask the bits to zero out any upper bits we don't want
- Example: assume we want the middle 4 bits of an 8 bit value (bits 2-5)
- Combine individual bit sequences with right shift and OR (
|) - Sign extension
- Assume you have a 4-bit two's complement number you want to sign extend to be a 32-bit two's complement number.
- Shift all the way to the left and then do shift right arithmetic all the way back
Lab03 - RISC-V Machine Code Emulation¶
- RISC-V Machine Code and Emulation
- Instructions are 32 bits (1 word), there is also a 16 bit compressed format
- Each instruction encodes source register(s), a destination registers, and target address or offset
- We decode the
iw(instruction word) in steps- Look at
opcodeto determine instruction format type - Decode the
iwbased on opcode - Further decode
iwby looking at other fields such asfunct3andfunct7
- Look at
- The base RISC-V emulator includes the emulated state as a struct:
NREGSis 32 (because RISC-V has 32 registers)- The
pcis the program counter STACK_SIZEis 8192, but for prorgrams with lots of function calls and/or local vars this may need to be bigger.
- Basic emulation
- We will emulate assembly programs that have been linked with the emulator code
- Initialize the
rv_statewith - A pointer to the function we want to emulate
- The first 4 arguments as uint64_t values to send to the function
- Get
iw:iw = *pc - Get
opcodefromiw - Based on
opcodecall emulation function, like: emu_r_type(rsp, iw)rspisstruct rv_state *rsp- The emu functions will further decode the
iw - Use shift and mask (
get_bits()) to get each field - Update the state (registers and memory based on
opcode,funct3, andfunct7) - Update the
pc, usuallypc += 4to go to the next instruction- For control instructions, we will compute either an
offsetand add topc - Or, for
jalruse a register value to update thepc
- For control instructions, we will compute either an
- RISC-V Instruction Formats
- R-type
|funct7[31:25]|rs2[24:20]|rs1[19:15]|funct3[14:12]|rd[11:7]|opcode[6:0]|- For register instructions like
add,sub,sll
- For register instructions like
- I-type
|imm_11_0[31:20]|rs1[19:15]|funct3[14:12]|rd[11:7]|opcode[6:0]|- For immediate instruction like
addi,slli,lw,lb - Immediate must be constructed from parts and sign extended
int64_t imm = sign_extend(imm_11_0, 11)
- For immediate instruction like
- B-type
|imm_12[31]|imm_10_5[30:25]|rs2[24:20]|rs1[19:15]|funct3[14:12]|imm_4_1[11:8]|imm_11[7]|opcode[6:0]|- For branches, like
beq,bne,blt,bge int64_t imm = sign_extend(imm_12 << 12 | imm_11 << 11 | imm_10_5 << 5 | imm_4_1 << 1)- If branch taken
pc += imm, if not takenpc += 4
- For branches, like
- J-type
|imm_20[31]|imm_10_1[30:21]|imm_11[20]|imm_19_12[19:12]|rd[11:7]|opcode|- For 'jal' jump and link, also 'j'
int64_t imm = sign_extend(imm_20 << 20 | imm_19_12 << 12 | imm_11 < 11 | imm_10_1 << 1)pc += imm
- RISC-V Pseudo Instructions
- The assembler provide many convenience instructions that do not exist as real instructions
- Examples
li a0, 99isaddi a0, zero, 99j offsetisjal zero, offsetcall offsetisjal ra, offsetbgt r1, r2, offsetisblt r2, r1, offset
Project04 - RISC-V Emulation - Analysis - Cache Simulation¶
- Additional RISC-V Instruction Formats
- S-type
|imm_11_5[31:25]|rs2[24:20]|rs1[19:15]|funct3[14:12]|imm_4_0[11:7]|opcode[6:0]|- For store instuctions, like
sb,sw,sd - Immediate must be constructed from parts and sign extended
int64_t imm = sign_extend(imm_11_5 << 5 | imm_4_0, 11)
- For store instuctions, like
- U-type
|imm_31_12[31:12]|rd[11:7]|opcode[6:0]|- For
lui(load upper immediate) andauipcadd upper immediate to pc - Our code dosn't use these
- For
- Note that load instructions are an I-type form
- Dynamic Analysis
- Because we emulate each instruction we can do the following
- Count the total number of instructions executed
- Count the different types of instructions executed
- I/R count, Loads, Stores, Jumps, Branches Taken, Branches Not Taken
- Counts added to
struct rv_stateas a new structstruct rv_analysis - Need to update counts accurately in the emulation functions
- Cache Memory and Simulation
- A Cache is a small and fast memory that holds recently accessed data from main memory
- Main memory is slow relative to CPU speed
- A cache can work because program exeuction adheres to two principles of locality
- Principle of spatial locality - If code accesses a value, there is good chance it will access nearby values
- Principle of temporal locality - If code accesses a value now there is a good chance it will do so again soon
- We send an address to a cache an request a value back
- Hit - value is in the cache for the requested address
- Miss - value is not in the cache for the requested address
- Hit Rate = #hits / #refs
- Miss Rate = #misses / #refs
- In modern processes caches can achieve very high hit rates, > 95%
- Main cache ideas
- For a given memory address, how to locate value in the cache?
- How to replace a value?
- How many bytes do we load at at time into the cache (block size)?
- Three main cache designs
- Direct Mapped
- Fully Associative
- Set Associative
- Direct Mapped
- Each address maps to one slot in the cache
- On miss, just repace slot value with new value from memory
- Fully Associative
- Each address can map to any slot in the cache
- Use a form of LRU (least recently used) to choose which slot to evict on miss
- Set Associative
- An address maps to a set of slots, and can map to any slot in the set
- Use LRU within the slots in a set to choose a slot to evict
- Block Size
- To support spatial locatlity a cache will read in more than the word requested
- We can think of memory as an array of blocks
- We need to find the block for a give address
- Find the base of the block
- The load the entire block into the cache slot