# CSE4303 Introduction to Computer Security (Lecture 18)

> Due to lack of my attention, this lecture note is generated by AI to create continuations of the previous lecture note. I kept this warning because the note was created by AI.

#### Software security

### Overview

#### Outline

- Context
- Prominent software vulnerabilities and exploits
- Buffer overflows
  - Background: C code, compilation, memory layout, execution
  - Baseline exploit
  - Challenges
  - Defenses, countermeasures, counter-countermeasures

### Buffer overflows

#### All programs are stored in memory

- The process's view of memory is that it owns all of it.
- For a `32-bit` process, the virtual address space runs from:
  - `0x00000000`
  - to `0xffffffff`
- In reality, these are virtual addresses.
  - The OS and CPU map them to physical addresses.

#### The instructions themselves are in memory

- Program text is also stored in memory.
- The slide shows instructions such as:

```asm
0x4c2 sub $0x224,%esp
0x4c1 push %ecx
0x4bf mov %esp,%ebp
0x4be push %ebp
```

- Important point:
  - code and data are both memory-resident
  - control flow therefore depends on values stored in memory

#### Data's location depends on how it's created

- Static initialized data example

```c
static const int y = 10;
```

- Static uninitialized data example

```c
static int x;
```

- Command-line arguments and environment are set when the process starts.
- Stack data appears when functions run.

```c
int f() {
    int x;
    ...
}
```

- Heap data appears at runtime.

```c
malloc(sizeof(long));
```

- Summary from the slide
  - Known at compile time
    - text
    - initialized data
    - uninitialized data
  - Set when process starts
    - command line and environment
  - Runtime
    - stack
    - heap

#### We are going to focus on runtime attacks

- Stack and heap grow in opposite directions.
- Compiler-generated instructions adjust the stack size at runtime.
- The stack pointer tracks the active top of the stack.
- Repeated `push` instructions place values onto the stack.
- The slides use the sequence:
  - `push 1`
  - `push 2`
  - `push 3`
  - `return`
- Heap allocation is apportioned by the OS and managed in-process by `malloc`.
- The lecture focuses on the stack for now.

```text
0x00000000                                                            0xffffffff
Heap ---------------------------------> <--------------------------------- Stack
```

#### Stack layout when calling functions

Questions asked on the slide:

- What do we do when we call a function?
  - What data need to be stored?
  - Where do they go?
- How do we return from a function?
  - What data need to be restored?
  - Where do they come from?

Example used in the slide:

```c
void func(char *arg1, int arg2, int arg3) {
    char loc1[4];
    int loc2;
    int loc3;
}
```

Important layout points:

- Arguments are pushed in the reverse of the order in which they appear in the code.
- Local variables are pushed in the same order as they appear in the code.
- The slide then introduces two unknown slots between locals and arguments.

#### Accessing variables

Example:

```c
void func(char *arg1, int arg2, int arg3) {
    char loc1[4];
    int loc2;
    int loc3;
    ...
    loc2++;
    ...
}
```

Question from the slide:

- Where is `loc2`?

Step-by-step answer developed in the slides:

- Its absolute address is undecidable at compile time.
- We do not know exactly where `loc2` is in absolute memory.
- We do not know how many arguments there are in general.
- But `loc2` is always a fixed offset before the frame metadata.
- This motivates the frame pointer.

Definitions from the slide:

- Stack frame
  - the current function call's region on the stack
- Frame pointer
  - `%ebp`
- Example answer
  - `loc2` is at `-8(%ebp)`

#### Notation

- `%ebp`
  - a memory address stored in the frame-pointer register
- `(%ebp)`
  - the value at memory address `%ebp`
  - like dereferencing a pointer

The slide sequence then shows:

```asm
pushl %ebp
movl %esp, %ebp
```

- Meaning:
  - first save the old frame pointer on the stack
  - then set the new frame pointer to the current stack pointer

#### Returning from functions

Example caller:

```c
int main() {
    ...
    func("Hey", 10, -3);
    ...
}
```

Questions from the slides:

- How do we restore `%ebp`?
- How do we resume execution at the correct place?

Slide answers:

- Push `%ebp` before locals.
- Set `%ebp` to current `%esp`.
- Set `%ebp` to `(%ebp)` at return.
- Push the next `%eip` (the return address) before jumping to the callee; on x86 the `call` instruction does this push itself.
- Set `%eip` to `4(%ebp)` at return.

#### Stack and functions: Summary

- Calling function
  - push arguments onto the stack in reverse order
  - push the return address
    - the address of the instruction that should run after control returns
  - jump to the function's address
- Called function
  - push old frame pointer `%ebp` onto the stack
  - set frame pointer `%ebp` to current `%esp`
  - push local variables onto the stack
  - access locals as offsets from `%ebp`
- Returning function
  - reset previous stack frame
    - `%ebp = (%ebp)`
  - jump back to return address
    - `%eip = 4(%ebp)`

#### Quick overview (again)

- Buffer
  - contiguous set of a given data type
  - common in C
    - all strings are buffers of `char`
- Overflow
  - put more into the buffer than it can hold
- Question
  - where does the extra data go?
- Slide answer
  - now that we know memory layouts, we can reason about where the overwrite lands

#### A buffer overflow example

Example 1 from the slide:

```c
#include <string.h>

void func(char *arg1) {
    char buffer[4];
    strcpy(buffer, arg1);   /* no bounds check: copies 8 bytes into 4 */
    ...
}

int main() {
    char *mystr = "AuthMe!";
    func(mystr);
    ...
}
```

Step-by-step effect shown in the slides:

- Initial stack region includes:
  - `buffer`
  - saved `%ebp`
  - saved `%eip`
  - `&arg1`
- First 4 bytes copied:
  - `A u t h`
- Remaining bytes continue writing:
  - `M e ! \0`
- Because `strcpy` keeps copying until it sees `\0`, bytes go past the end of the buffer and land in the saved `%ebp` slot.
- In the example, upon return:
  - `%ebp` becomes `0x0021654d`
    - these are the bytes `M e ! \0` (`0x4d 0x65 0x21 0x00`) read as a little-endian word
- Result:
  - segmentation fault
  - shown as `SEGFAULT (0x00216551)` in the slide sequence
    - `0x00216551` is `0x0021654d + 4`, that is, `4(%ebp)`, where the return address is fetched

#### A buffer overflow example: changing control data vs. changing program data

Example 2 from the slide:

```c
#include <string.h>

void func(char *arg1) {
    int authenticated = 0;
    char buffer[4];
    strcpy(buffer, arg1);   /* overflow spills into authenticated */
    if (authenticated) {
        ...
    }
}

int main() {
    char *mystr = "AuthMe!";
    func(mystr);
    ...
}
```

Step-by-step effect shown in the slides:

- Initial stack contains:
  - `buffer`
  - `authenticated`
  - saved `%ebp`
  - saved `%eip`
  - `&arg1`
- Overflow writes:
  - `A u t h` into `buffer`
  - `M e ! \0` into `authenticated`
- Result:
  - code still runs
  - `authenticated` is now nonzero, so the user appears "authenticated"

Important lesson:

- A buffer overflow does not need to crash.
- It may silently change program data or logic.

#### `gets` vs `fgets`

Unsafe function shown in the slide (`gets` has no way to limit how much it writes, and was removed from the language in C11):

```c
void vulnerable() {
    char buf[80];
    gets(buf);
}
```

Safer version shown in the slide:

```c
void safe() {
    char buf[80];
    fgets(buf, 64, stdin);
}
```

Even safer pattern from the next slide, which ties the bound to the buffer's actual size:

```c
void safer() {
    char buf[80];
    fgets(buf, sizeof(buf), stdin);
}
```

Reference from slide:

- [List of vulnerable C functions](https://security.web.cern.ch/security/recommendations/en/codetools/c.shtml)

#### User-supplied strings

- In the toy examples, the strings are constant.
- In reality they come from users in many ways:
  - text input
  - packets
  - environment variables
  - file input
- Validating assumptions about user input is extremely important.

#### What's the worst that could happen?

Using:

```c
char buffer[4];
strcpy(buffer, arg1);
```

- `strcpy` will let you write as much as you want until a `\0`.
- If attacker-controlled input is long enough, the memory past the buffer becomes "all ours" from the attacker's perspective.
- That raises the key question from the slide:
  - what could you write to memory to wreak havoc?

#### Code injection

- Title-only transition slide.
- It introduces the move from accidental overwrite to deliberate attacker payloads.

#### High-level idea

Example used in the slide:

```c
void func(char *arg1) {
    char buffer[4];
    sprintf(buffer, arg1);
    ...
}
```

Two-step plan shown in the slides:

1. Load my own code into memory.
2. Somehow get `%eip` to point to it.

The slide sequence draws this as:

- vulnerable buffer on stack
- attacker-controlled bytes placed in memory
- `%eip` redirected toward those bytes

#### This is nontrivial

- Pulling off this attack requires getting a few things really right, and some things only sorta right.
- The lecture says to think about what is tricky about the attack.
- Main security idea:
  - the key to defending against it is to make the hard parts really hard

#### Challenge 1: Loading code into memory

- The attacker payload must be machine-code instructions.
  - already compiled
  - ready to run
- We have to be careful in how we construct it.
  - It cannot contain all-zero bytes.
    - otherwise string routines such as `sprintf` and `strcpy` stop copying at the first `\0`
    - input routines have their own forbidden bytes: `gets` stops at a newline and `scanf("%s")` at whitespace
  - It cannot make use of the loader.
    - because we are injecting the bytes directly
  - It cannot use the stack.
    - because we are in the process of smashing it
- The lecture then gives the name:
  - shellcode

#### What kind of code would we want to run?

- Goal: full-purpose shell
  - code to launch a shell is called shellcode
  - it is nontrivial to write shellcode that works as injected code
    - no zeroes
    - cannot use the stack
    - no loader dependence
  - there are many shellcodes already written
  - there are even competitions for writing the smallest shellcode
- Goal: privilege escalation
  - ideally, attacker goes from guest or non-user to root

#### Shellcode

High-level C version shown in the slides:

```c
#include <unistd.h>   /* declares execve and defines NULL */

int main() {
    char *name[2];
    name[0] = "/bin/sh";
    name[1] = NULL;
    execve(name[0], name, NULL);
}
```

Assembly version shown in the slides:

```asm
xorl %eax, %eax
pushl %eax
pushl $0x68732f2f
pushl $0x6e69622f
movl %esp, %ebx
pushl %eax
...
```

Machine-code bytes shown in the slides:

```text
"\x31\xc0"
"\x50"
"\x68""//sh"
"\x68""/bin"
"\x89\xe3"
"\x50"
...
```

Important point from the slide:

- those machine-code bytes can become part of the attacker's input

#### Challenge 2: Getting our injected code to run

- We cannot insert a fresh "jump into my code" instruction.
- We must use whatever code is already running.

#### Hijacking the saved `%eip`

- Strategy:
  - overwrite the saved return address
  - make it point into the injected bytes
- Core idea:
  - when the function returns, the CPU loads the overwritten return address into `%eip`

Question raised by the slides:

- But how do we know the address?

Failure mode shown in the slide sequence:

- if the guessed address is wrong, the CPU tries to execute data bytes
- this is most likely not valid code
- result:
  - invalid instruction
  - CPU "panic" / crash

#### Challenge 3: Finding the return address

- If we do not have the code, we may not know how far the buffer is from the saved `%ebp`.
- One approach:
  - try many different values
- Worst case:
  - `2^32` possible addresses on `32-bit`
  - `2^64` possible addresses on `64-bit`
- But without address randomization:
  - the stack always starts from the same fixed address
  - the stack grows, but usually not very deeply unless heavily recursive

#### Improving our chances: nop sleds

- `nop` is a single-byte instruction.
- Definition:
  - it does nothing except move execution to the next instruction
- NOP sled idea:
  - put a long sequence of `nop` bytes before the real malicious code
  - now jumping anywhere in that region still works
  - execution slides down into the payload

Why this helps:

- it increases the chance that an approximate address guess still succeeds
- the slides explicitly state:
  - now we improve our chances of guessing by a factor of `#nops`

```text
[padding][saved return address guess][nop nop nop ...][malicious code]
```

#### Putting it all together

- Payload components shown in the slides:
  - padding
  - guessed return address
  - NOP sled
  - malicious code
- Constraint noted by the lecture:
  - input has to start wherever the vulnerable `gets` / similar function begins writing

#### Buffer overflow defense #1: use secure bounds-checking functions

- User-level protection
- Replace unbounded routines with bounded ones.
- Prefer memory-safe languages where possible:
  - Java
  - Rust
  - etc.

#### Buffer overflow defense #2: Address Space Layout Randomization (ASLR)

- Randomize starting address of program regions.
- Goal:
  - prevent attacker from guessing / finding the correct address to put in the return-address slot
- OS-level protection

#### Buffer overflow counter-technique: NOP sled

- Counter-technique against uncertain addresses
- By jumping somewhere into a wide sled, exact address knowledge becomes less necessary

#### Buffer overflow defense #3: Canary

- Put a guard value between vulnerable local data and control-flow data.
- If overflow changes the canary, the program can detect corruption before returning.
- OS-level / compiler-assisted protection in the lecture framing

#### Buffer overflow defense #4: No-execute bits (NX)

- Mark the stack as not executable.
- Requires hardware support.
- OS / hardware-level protection

#### Buffer overflow counter-technique: ret-to-libc and ROP

- Code in the C library is already stored at consistent addresses.
- Attacker can find code in the C library that has the desired effect.
  - possibly heavily fragmented
- Then return to the necessary address or addresses in the proper order.
- This is the motivation behind:
  - `ret-to-libc`
  - Return-Oriented Programming (ROP)

We will continue from defenses / exploitation follow-ups in the next lecture.
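
As a recap of defense #3, the following is a minimal sketch of how a canary check catches the `"AuthMe!"` overflow from the earlier example. It is a model, not how a compiler implements it: the frame is a flat byte array so that writing "past the buffer" stays well-defined C, and the layout, the names (`canary_after_copy`, `canary_intact`) and the fixed `CANARY_VALUE` are all assumptions made for illustration. Real canaries are typically chosen at random when the program starts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte model of a stack frame, lowest address first:
 *   buffer[4] | canary | saved %ebp | saved %eip
 * A flat byte array stands in for the real stack, so the simulated
 * overflow never leaves the object it writes to. */
enum { BUF_OFF = 0, BUF_LEN = 4, CANARY_OFF = 4, FRAME_SIZE = 16 };

/* Assumed guard value, fixed here for illustration only. */
#define CANARY_VALUE 0xdeadc0deU

/* Read a 32-bit word in little-endian byte order, as x86 would. */
static uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit word in little-endian byte order. */
static void store_le32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Copy input into the modeled buffer and return the canary slot afterwards.
 * bounded == 0 mimics strcpy: copy through the NUL, ignoring BUF_LEN.
 * bounded != 0 truncates to BUF_LEN - 1 bytes plus a NUL. */
static uint32_t canary_after_copy(const char *input, int bounded) {
    unsigned char frame[FRAME_SIZE] = {0};
    size_t n = strlen(input);

    store_le32(frame + CANARY_OFF, CANARY_VALUE);  /* "prologue": plant canary */

    if (bounded && n > (size_t)BUF_LEN - 1) {
        n = (size_t)BUF_LEN - 1;
    }
    assert(BUF_OFF + n + 1 <= FRAME_SIZE);  /* keep the simulation in the model */
    memcpy(frame + BUF_OFF, input, n);
    frame[BUF_OFF + n] = '\0';

    return load_le32(frame + CANARY_OFF);
}

/* "Epilogue" check: 1 if the canary survived, 0 if an overflow was detected. */
static int canary_intact(const char *input, int bounded) {
    return canary_after_copy(input, bounded) == CANARY_VALUE;
}
```

In this model, `canary_intact("Hey", 0)` returns 1 because the string fits, while `canary_intact("AuthMe!", 0)` returns 0: the canary slot now holds `0x0021654d`, the same `M e ! \0` bytes that corrupted `%ebp` in Example 1. With the bounded copy, `canary_intact("AuthMe!", 1)` returns 1, which is defense #1 preventing the overflow in the first place.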