update

2026-04-02 15:17:50 -05:00
parent 96f5304400
commit d6bc8375ce
3 changed files with 1025 additions and 2 deletions
--- a/content/CSE4303/CSE4303_L18.md
+++ b/content/CSE4303/CSE4303_L18.md
@@ -0,0 +1,594 @@
+# CSE4303 Introduction to Computer Security (Lecture 18)
+
+> Because I was not paying full attention in class, this lecture note was generated by AI as a continuation of the previous lecture note. I am keeping this warning because the note was created by AI.
+
+#### Software security
+
+### Overview
+
+#### Outline
+
+- Context
+- Prominent software vulnerabilities and exploits
+- Buffer overflows
+  - Background: C code, compilation, memory layout, execution
+  - Baseline exploit
+  - Challenges
+  - Defenses, countermeasures, counter-countermeasures
+
+### Buffer overflows
+
+#### All programs are stored in memory
+
+- The process's view of memory is that it owns all of it.
+- For a `32-bit` process, the virtual address space runs from:
+  - `0x00000000`
+  - to `0xffffffff`
+- In reality, these are virtual addresses.
+  - The OS and CPU map them to physical addresses.
+
+#### The instructions themselves are in memory
+
+- Program text is also stored in memory.
+- The slide shows instructions such as the following (listed from highest address to lowest, so execution runs bottom to top):
+
+```asm
+0x4c2 sub $0x224,%esp
+0x4c1 push %ecx
+0x4bf mov %esp,%ebp
+0x4be push %ebp
+```
+
+- Important point:
+  - code and data are both memory-resident
+  - control flow therefore depends on values stored in memory
+
+#### Data's location depends on how it's created
+
+- Static initialized data example
+
+```c
+static const int y = 10;
+```
+
+- Static uninitialized data example
+
+```c
+static int x;
+```
+
+- Command-line arguments and environment are set when the process starts.
+- Stack data appears when functions run.
+
+```c
+int f() {
+    int x;
+    ...
+}
+```
+
+- Heap data appears at runtime.
+
+```c
+malloc(sizeof(long));
+```
+
+- Summary from the slide
+  - Known at compile time
+    - text
+    - initialized data
+    - uninitialized data
+  - Set when process starts
+    - command line and environment
+  - Runtime
+    - stack
+    - heap
+
+#### We are going to focus on runtime attacks
+
+- Stack and heap grow in opposite directions.
+- Compiler-generated instructions adjust the stack size at runtime.
+- The stack pointer tracks the active top of the stack.
+- Repeated `push` instructions place values onto the stack.
+- The slides use the sequence:
+  - `push 1`
+  - `push 2`
+  - `push 3`
+  - `return`
+- Heap allocation is apportioned by the OS and managed in-process by `malloc`.
+- The lecture focuses on the stack for now.
+
+```text
+0x00000000                              0xffffffff
+Heap  --------------------------------->     <--------------------------------- Stack
+```
+
+#### Stack layout when calling functions
+
+Questions asked on the slide:
+
+- What do we do when we call a function?
+  - What data need to be stored?
+  - Where do they go?
+- How do we return from a function?
+  - What data need to be restored?
+  - Where do they come from?
+
+Example used in the slide:
+
+```c
+void func(char *arg1, int arg2, int arg3)
+{
+    char loc1[4];
+    int loc2;
+    int loc3;
+}
+```
+
+Important layout points:
+
+- Arguments are pushed in reverse order of code.
+- Local variables are pushed in the same order as they appear in the code.
+- The slide then introduces two unknown slots between locals and arguments; later slides identify them as the saved `%ebp` and the return address (saved `%eip`).
+
+#### Accessing variables
+
+Example:
+
+```c
+void func(char *arg1, int arg2, int arg3)
+{
+    char loc1[4];
+    int loc2;
+    int loc3;
+    ...
+    loc2++;
+    ...
+}
+```
+
+Question from the slide:
+- Where is `loc2`?
+
+Step-by-step answer developed in the slides:
+
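+- Its absolute address cannot be determined at compile time.
+- We do not know exactly where `loc2` will sit in absolute memory, because that depends on how deep the stack is when `func` is called.
+- We do not know how many arguments there are in general.
+- But `loc2` is always at a fixed offset below the frame metadata (the saved `%ebp` and the return address).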
+- This motivates the frame pointer.
+
+Definitions from the slide:
+
+- Stack frame
+  - the current function call's region on the stack
+- Frame pointer
+  - `%ebp`
+- Example answer
+  - `loc2` is at `-8(%ebp)`
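
The fixed-offset idea can be checked with a small model. The sketch below is only an illustration under stated assumptions: `struct frame_model` and its field names are hypothetical, it assumes a 32-bit frame with 4-byte slots laid out as the slides draw it, and a real compiler is free to reorder or pad locals.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of func's frame on 32-bit x86, lowest address first.
   Field names are illustrative; a real compiler may reorder or pad locals. */
struct frame_model {
    int32_t  loc3;       /* -12(%ebp) */
    int32_t  loc2;       /*  -8(%ebp) */
    char     loc1[4];    /*  -4(%ebp) */
    uint32_t saved_ebp;  /*   0(%ebp): the slot %ebp points at */
    uint32_t ret_addr;   /*   4(%ebp) */
    uint32_t arg1;       /*   8(%ebp): a 32-bit pointer */
    int32_t  arg2;       /*  12(%ebp) */
    int32_t  arg3;       /*  16(%ebp) */
};

/* Offset of a field relative to the slot %ebp points at. */
#define EBP_OFFSET(field) \
    ((long)offsetof(struct frame_model, field) - \
     (long)offsetof(struct frame_model, saved_ebp))

long loc2_offset(void)     { return EBP_OFFSET(loc2); }
long ret_addr_offset(void) { return EBP_OFFSET(ret_addr); }
long arg1_offset(void)     { return EBP_OFFSET(arg1); }
```

In this model `loc2_offset()` gives `-8`, matching the `-8(%ebp)` answer, no matter where the frame itself ends up in memory.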
+
+#### Notation
+
+- `%ebp`
+  - a memory address stored in the frame-pointer register
+- `(%ebp)`
+  - the value stored at the memory address held in `%ebp`
+  - like dereferencing a pointer
+
+The slide sequence then shows:
+
+```asm
+pushl %ebp
+movl %esp, %ebp
+```
+
+- Meaning:
+  - first save the old frame pointer on the stack
+  - then set the new frame pointer to the current stack pointer
+
+#### Returning from functions
+
+Example caller:
+
+```c
+int main()
+{
+    ...
+    func("Hey", 10, -3);
+    ...
+}
+```
+
+Questions from the slides:
+
+- How do we restore `%ebp`?
+- How do we resume execution at the correct place?
+
+Slide answers:
+
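+- Push the old `%ebp` before the locals.
+- Set `%ebp` to the current `%esp`.
+- At return, set `%ebp` back to `(%ebp)`, the saved value.
+- Push the next `%eip` (the return address) when calling; on x86 the `call` instruction does this itself.
+- At return, set `%eip` to `4(%ebp)`, where `%ebp` still means the callee's frame pointer.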
+
+#### Stack and functions: Summary
+
+- Calling function
+  - push arguments onto the stack in reverse order
+  - push the return address
+    - the address of the instruction that should run after control returns
+  - jump to the function's address
+- Called function
+  - push old frame pointer `%ebp` onto the stack
+  - set frame pointer `%ebp` to current `%esp`
+  - push local variables onto the stack
+  - access locals as offsets from `%ebp`
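+- Returning function
+  - reset previous stack frame
+    - `%esp = %ebp`, then `%ebp = (%ebp)` (x86 `leave`)
+  - jump back to return address
+    - `%eip = 4(%ebp)`, measured from the callee's `%ebp` before it is restored (x86 `ret` pops this slot)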
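
The call, prologue, and return steps summarized above can be traced with a toy machine. This is a sketch, not how any real CPU is implemented: `struct machine` and the `do_*` helpers are hypothetical names, and the stack is indexed in whole words, so `4(%ebp)` corresponds to index `ebp + 1`.

```c
#include <stdint.h>

/* Toy 32-bit machine: a small stack plus the three registers the notes use.
   All names here are illustrative, not part of the lecture. */
enum { STACK_WORDS = 64 };

struct machine {
    uint32_t stack[STACK_WORDS]; /* word-indexed for simplicity */
    uint32_t esp;                /* index of the current top of stack */
    uint32_t ebp;                /* index of the saved-%ebp slot */
    uint32_t eip;                /* "address" of the next instruction */
};

void stack_push(struct machine *m, uint32_t v) { m->stack[--m->esp] = v; }
uint32_t stack_pop(struct machine *m)          { return m->stack[m->esp++]; }

/* call: push the return address, then jump to the target. */
void do_call(struct machine *m, uint32_t target, uint32_t return_to) {
    stack_push(m, return_to);
    m->eip = target;
}

/* Prologue: pushl %ebp; movl %esp, %ebp; then reserve space for locals. */
void do_prologue(struct machine *m, uint32_t local_words) {
    stack_push(m, m->ebp);
    m->ebp = m->esp;
    m->esp -= local_words;
}

/* Epilogue: leave (movl %ebp, %esp; popl %ebp), then ret (pop into %eip). */
void do_return(struct machine *m) {
    m->esp = m->ebp;
    m->ebp = stack_pop(m);
    m->eip = stack_pop(m);
}
```

Starting with `esp = STACK_WORDS`, a `do_call` followed by `do_prologue` and `do_return` leaves `%eip` at the pushed return address and `%ebp` at its old value. Whatever sits in those two stack slots at return time decides where execution goes next.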
+
+#### Quick overview (again)
+
+- Buffer
+  - contiguous block of memory holding elements of a given data type
+  - common in C
+    - all strings are NUL-terminated buffers of `char`
+- Overflow
+  - put more into the buffer than it can hold
+- Question
+  - where does the extra data go?
+- Slide answer
+  - now that we know memory layouts, we can reason about where the overwrite lands
+
+#### A buffer overflow example
+
+Example 1 from the slide:
+
+```c
+void func(char *arg1)
+{
+    char buffer[4];
+    strcpy(buffer, arg1);  /* no bounds check: "AuthMe!" plus its NUL is 8 bytes */
+    ...
+}
+
+int main()
+{
+    char *mystr = "AuthMe!";
+    func(mystr);
+    ...
+}
+```
+
+Step-by-step effect shown in the slides:
+
+- Initial stack region includes:
+  - `buffer`
+  - saved `%ebp`
+  - saved `%eip`
+  - `&arg1`
+- First 4 bytes copied:
+  - `A u t h`
+- Remaining bytes continue writing:
+  - `M e ! \0`
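+- Because `strcpy` keeps copying until it sees `\0`, the last 4 bytes land past the end of `buffer`, in the saved `%ebp` slot.
+- In the example, upon return:
+  - `%ebp` becomes `0x0021654d`, which is `M e ! \0` (bytes `4d 65 21 00`) read as a little-endian word
+- Result:
+  - segmentation fault
+  - shown as `SEGFAULT (0x00216551)` in the slide sequence, which equals the corrupted `%ebp` plus 4, the `4(%ebp)` return-address slot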
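
The two hex values in this example can be reproduced by hand. The helper names below are hypothetical; the sketch assumes only that x86 reads a 32-bit word in little-endian order, and it builds the word with shifts so the result does not depend on the machine running it.

```c
#include <stdint.h>

/* Interpret 4 bytes as a little-endian 32-bit word, as x86 does. */
uint32_t le32(const unsigned char b[4]) {
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* The 4 bytes that spill past buffer[4] when "AuthMe!" is copied. */
uint32_t clobbered_ebp(void) {
    const unsigned char spill[4] = { 'M', 'e', '!', '\0' };
    return le32(spill);
}
```

`clobbered_ebp()` evaluates to `0x0021654d`, and adding 4 gives the `0x00216551` reported with the segmentation fault.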
+
+#### A buffer overflow example: changing control data vs. changing program data
+
+Example 2 from the slide:
+
+```c
+void func(char *arg1)
+{
+    int authenticated = 0;
+    char buffer[4];
+    strcpy(buffer, arg1);
+    if (authenticated) { ... }
+}
+
+int main()
+{
+    char *mystr = "AuthMe!";
+    func(mystr);
+    ...
+}
+```
+
+Step-by-step effect shown in the slides:
+
+- Initial stack contains:
+  - `buffer`
+  - `authenticated`
+  - saved `%ebp`
+  - saved `%eip`
+  - `&arg1`
+- Overflow writes:
+  - `A u t h` into `buffer`
+  - `M e ! \0` into `authenticated`
+- Result:
+  - code still runs
+  - `authenticated` now holds `0x0021654d`, which is nonzero, so `if (authenticated)` succeeds and the user appears "authenticated"
+
+Important lesson:
+- A buffer overflow does not need to crash.
+- It may silently change program data or logic.
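
The silent overwrite can be reproduced without undefined behavior by simulating the frame as a byte array. This is a sketch under assumptions: the layout (4 bytes of `buffer` followed directly by 4 bytes of `authenticated`) is taken from the slide's diagram, the names are hypothetical, and a real compiler may place the locals differently.

```c
#include <stdint.h>
#include <string.h>

/* Simulated slice of func's frame: buffer occupies bytes 0..3 and
   authenticated occupies bytes 4..7 (layout assumed from the slide). */
enum { BUF_OFF = 0, AUTH_OFF = 4, FRAME_SIZE = 8 };

/* Copy input the way strcpy would, but into a region large enough to hold
   the spill, then report the resulting value of authenticated. */
int32_t authenticated_after_copy(const char *input) {
    unsigned char frame[FRAME_SIZE] = {0};   /* authenticated starts at 0 */
    size_t n = strlen(input) + 1;            /* strcpy also copies the NUL */
    if (n > FRAME_SIZE) n = FRAME_SIZE;      /* keep the simulation in bounds */
    memcpy(frame + BUF_OFF, input, n);

    int32_t authenticated;
    memcpy(&authenticated, frame + AUTH_OFF, sizeof authenticated);
    return authenticated;
}
```

A 3-character input such as `"abc"` leaves the simulated flag at 0, while `"AuthMe!"` leaves it nonzero, with no crash in between.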
+
+#### `gets` vs `fgets`
+
+Unsafe function shown in the slide (`gets` has no way to know the size of `buf`, and it was removed from the C standard in C11):
+
+```c
+void vulnerable()
+{
+    char buf[80];
+    gets(buf);
+}
+```
+
+Safer version shown in the slide:
+
+```c
+void safe()
+{
+    char buf[80];
+    fgets(buf, 64, stdin);
+}
+```
+
+Even safer pattern from the next slide, which ties the limit to the buffer's declared size so the two cannot drift apart:
+
+```c
+void safer()
+{
+    char buf[80];
+    fgets(buf, sizeof(buf), stdin);
+}
+```
+
+Reference from slide:
+- [List of vulnerable C functions](https://security.web.cern.ch/security/recommendations/en/codetools/c.shtml)
+
+#### User-supplied strings
+
+- In the toy examples, the strings are constant.
+- In reality they come from users in many ways:
+  - text input
+  - packets
+  - environment variables
+  - file input
+- Validating assumptions about user input is extremely important.
+
+#### What's the worst that could happen?
+
+Using:
+
+```c
+char buffer[4];
+strcpy(buffer, arg1);
+```
+
+- `strcpy` will let you write as much as you want until a `\0`.
+- If attacker-controlled input is long enough, the memory past the buffer becomes "all ours" from the attacker's perspective.
+- That raises the key question from the slide:
+  - what could you write to memory to wreak havoc?
+
+#### Code injection
+
+- Title-only transition slide.
+- It introduces the move from accidental overwrite to deliberate attacker payloads.
+
+#### High-level idea
+
+Example used in the slide:
+
+```c
+void func(char *arg1)
+{
+    char buffer[4];
+    sprintf(buffer, arg1);  /* unbounded write; arg1 is also used as the format string */
+    ...
+}
+```
+
+Two-step plan shown in the slides:
+
+1. Load my own code into memory.
+2. Somehow get `%eip` to point to it.
+
+The slide sequence draws this as:
+- vulnerable buffer on stack
+- attacker-controlled bytes placed in memory
+- `%eip` redirected toward those bytes
+
+#### This is nontrivial
+
+- Pulling off this attack requires getting a few things really right, and some things only sorta right.
+- The lecture says to think about what is tricky about the attack.
+- Main security idea:
+  - the key to defending against it is to make the hard parts really hard
+
+#### Challenge 1: Loading code into memory
+
+- The attacker payload must be machine-code instructions.
+  - already compiled
+  - ready to run
+- We have to be careful in how we construct it.
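+  - It cannot contain any all-zero (NUL) bytes.
+    - otherwise string routines such as `strcpy` and `sprintf` stop copying at the first one; `gets` and `scanf` instead stop at a newline or whitespace, so those bytes must be avoided when they are the entry point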
+  - It cannot make use of the loader.
+    - because we are injecting the bytes directly
+  - It cannot use the stack.
+    - because we are in the process of smashing it
+- The lecture then gives the name:
+  - shellcode
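
The no-zero-bytes constraint is easy to check mechanically. The sketch below uses hypothetical helper names and two standard x86 encodings: `movl $0, %eax` (`b8` followed by a 4-byte immediate) and `xorl %eax, %eax` (`31 c0`). Both zero the register, but only one survives a string copy.

```c
#include <stddef.h>

/* Return 1 if any byte in code[0..len) is zero, else 0. */
int has_nul_byte(const unsigned char *code, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (code[i] == 0x00) return 1;
    }
    return 0;
}

/* movl $0, %eax stores its 32-bit immediate literally: four zero bytes. */
static const unsigned char mov_zero[] = { 0xb8, 0x00, 0x00, 0x00, 0x00 };
/* xorl %eax, %eax has the same effect on %eax with no zero bytes. */
static const unsigned char xor_zero[] = { 0x31, 0xc0 };

int mov_has_nul(void) { return has_nul_byte(mov_zero, sizeof mov_zero); }
int xor_has_nul(void) { return has_nul_byte(xor_zero, sizeof xor_zero); }
```

This is one reason injected code tends to zero registers with `xorl` rather than loading a literal 0.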
+
+#### What kind of code would we want to run?
+
+- Goal: general-purpose shell
+  - code to launch a shell is called shellcode
+  - it is nontrivial to write shellcode that works as injected code
+    - no zeroes
+    - cannot use the stack
+    - no loader dependence
+  - there are many shellcodes already written
+  - there are even competitions for writing the smallest shellcode
+- Goal: privilege escalation
+  - ideally, attacker goes from guest or non-user to root
+
+#### Shellcode
+
+High-level C version shown in the slides:
+
+```c
+#include <stdio.h>   /* NULL */
+#include <unistd.h>  /* execve */
+int main() {
+    char *name[2];
+    name[0] = "/bin/sh";
+    name[1] = NULL;
+    execve(name[0], name, NULL);  /* replace this process with /bin/sh */
+}
+```
+
+Assembly version shown in the slides:
+
+```asm
+xorl %eax, %eax
+pushl %eax
+pushl $0x68732f2f
+pushl $0x6e69622f
+movl %esp, %ebx
+pushl %eax
+...
+```
+
+Machine-code bytes shown in the slides:
+
+```text
+"\x31\xc0"
+"\x50"
+"\x68""//sh"
+"\x68""/bin"
+"\x89\xe3"
+"\x50"
+...
+```
+
+Important point from the slide:
+- those machine-code bytes can become part of the attacker's input
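
To see why the assembly pushes `0x68732f2f` and `0x6e69622f`, the two immediates can be decoded back into the bytes they leave in memory. This is a small illustrative sketch with hypothetical function names; it assumes only little-endian storage and that the later push ends up at the lower address.

```c
#include <stdint.h>

/* Store a 32-bit value at out[0..3] in little-endian order, the byte order
   pushl leaves on an x86 stack. */
static void store_le32(unsigned char *out, uint32_t v) {
    out[0] = (unsigned char)(v & 0xff);
    out[1] = (unsigned char)((v >> 8) & 0xff);
    out[2] = (unsigned char)((v >> 16) & 0xff);
    out[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Rebuild the string formed by the two pushes. The second push lands at the
   lower address, so its bytes come first in memory. out must hold 9 bytes. */
void pushed_string(char out[9]) {
    store_le32((unsigned char *)out,     0x6e69622fu); /* pushed second */
    store_le32((unsigned char *)out + 4, 0x68732f2fu); /* pushed first  */
    out[8] = '\0'; /* stands in for the zero word pushed from %eax */
}
```

The result is the string `/bin//sh`. The doubled slash pads the path to exactly 8 bytes, so neither immediate contains a zero byte, and the terminator comes from the zeroed `%eax` pushed earlier.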
+
+#### Challenge 2: Getting our injected code to run
+
+- We cannot insert a fresh "jump into my code" instruction.
+- We must use whatever code is already running.
+
+#### Hijacking the saved `%eip`
+
+- Strategy:
+  - overwrite the saved return address
+  - make it point into the injected bytes
+- Core idea:
+  - when the function returns, the CPU loads the overwritten return address into `%eip`
+
+Question raised by the slides:
+- But how do we know the address?
+
+Failure mode shown in the slide sequence:
+- if the guessed address is wrong, the CPU tries to execute data bytes
+- this is most likely not valid code
+- result:
+  - invalid instruction
+  - the process crashes (the slide calls this a CPU "panic")
+
+#### Challenge 3: Finding the return address
+
+- If we do not have the code, we may not know how far the buffer is from the saved `%ebp`.
+- One approach:
+  - try many different values
+- Worst case:
+  - `2^32` possible addresses on `32-bit`
+  - `2^64` possible addresses on `64-bit`
+- But without address randomization:
+  - the stack always starts from the same fixed address
+  - the stack grows, but usually not very deeply unless heavily recursive
+
+#### Improving our chances: nop sleds
+
+- `nop` is a single-byte instruction (`0x90` on x86).
+- Definition:
+  - it does nothing except move execution to the next instruction
+- NOP sled idea:
+  - put a long sequence of `nop` bytes before the real malicious code
+  - now jumping anywhere in that region still works
+  - execution slides down into the payload
+
+Why this helps:
+- it increases the chance that an approximate address guess still succeeds
+- the slides explicitly state:
+  - now we improve our chances of guessing by a factor of `#nops`
+
+```text
+[padding][saved return address guess][nop nop nop ...][malicious code]
+```
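
The "factor of `#nops`" claim can be checked with a harmless simulation that counts how many guessed start positions end up at the payload. Everything here is a toy model with hypothetical names: memory is a 256-byte array, the payload is a single placeholder byte, and "execution" just steps over `0x90` bytes.

```c
#include <stddef.h>

enum { NOP = 0x90 };

/* Return 1 if starting at index start reaches payload_start by stepping
   over single-byte NOPs only, else 0. */
int reaches_payload(const unsigned char *mem, size_t len,
                    size_t start, size_t payload_start) {
    size_t pc = start;
    while (pc < len && pc != payload_start && mem[pc] == NOP) pc++;
    return pc == payload_start;
}

/* Count the start positions in mem[0..len) that reach the payload. */
size_t count_good_guesses(const unsigned char *mem, size_t len,
                          size_t payload_start) {
    size_t good = 0;
    for (size_t s = 0; s < len; s++)
        good += (size_t)reaches_payload(mem, len, s, payload_start);
    return good;
}

/* Toy region: 16 junk bytes, then sled_len NOPs, then one payload byte. */
size_t good_guesses_with_sled(size_t sled_len) {
    unsigned char mem[256];
    size_t junk = 16, payload_start = junk + sled_len;
    if (payload_start >= sizeof mem) return 0;
    for (size_t i = 0; i < sizeof mem; i++) mem[i] = 0x00; /* junk, not a NOP */
    for (size_t i = junk; i < payload_start; i++) mem[i] = NOP;
    mem[payload_start] = 0xEB; /* placeholder for the first payload byte */
    return count_good_guesses(mem, sizeof mem, payload_start);
}
```

With no sled only one guess works. With a 100-byte sled, 101 guesses work: any sled byte or the payload start itself.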
+
+#### Putting it all together
+
+- Payload components shown in the slides:
+  - padding
+  - guessed return address
+  - NOP sled
+  - malicious code
+- Constraint noted by the lecture:
+  - input has to start wherever the vulnerable `gets` / similar function begins writing
+
+#### Buffer overflow defense #1: use secure bounds-checking functions
+
+- User-level protection
+- Replace unbounded routines with bounded ones.
+- Prefer memory-safe languages where possible:
+  - Java
+  - Rust
+  - etc.
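
As one possible shape for a bounded replacement, the sketch below wraps the standard `snprintf` so the caller always passes the destination size and learns whether truncation happened. `copy_bounded` is a hypothetical helper name, not something from the lecture.

```c
#include <stdio.h>
#include <stddef.h>

/* Copy src into dst without writing more than dst_size bytes.
   Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src) {
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

With a 4-byte destination, `copy_bounded(buf, sizeof buf, "AuthMe!")` stores `"Aut"` plus the terminator and returns 0, so the caller can reject the input instead of corrupting the stack.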
+
+#### Buffer overflow defense #2: Address Space Layout Randomization (ASLR)
+
+- Randomize starting address of program regions.
+- Goal:
+  - prevent attacker from guessing / finding the correct address to put in the return-address slot
+- OS-level protection
+
+#### Buffer overflow counter-technique: NOP sled
+
+- Counter-technique against uncertain addresses
+- By jumping somewhere into a wide sled, exact address knowledge becomes less necessary
+
+#### Buffer overflow defense #3: Canary
+
+- Put a guard value between vulnerable local data and control-flow data.
+- If overflow changes the canary, the program can detect corruption before returning.
+- OS-level / compiler-assisted protection in the lecture framing
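
The canary check can be sketched as a simulation. This is a toy model with hypothetical names, not what a compiler actually emits: the frame is a 12-byte array laid out as buffer, canary, return address, and the "vulnerable" copy is clamped so the simulation itself stays in bounds.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CANARY_OFF = 4, FRAME_LEN = 12 };

/* Simulate an unchecked copy into a frame laid out as
   [buffer: 4 bytes][canary: 4 bytes][return address: 4 bytes].
   Returns 1 if the canary survived, 0 if the overflow was detected. */
int canary_intact_after_copy(const unsigned char *input, size_t len,
                             uint32_t canary) {
    unsigned char frame[FRAME_LEN] = {0};
    memcpy(frame + CANARY_OFF, &canary, sizeof canary); /* set on entry */

    if (len > FRAME_LEN) len = FRAME_LEN;  /* keep the simulation in bounds */
    memcpy(frame, input, len);             /* the "vulnerable" copy */

    /* Check before returning: has the guard value changed? */
    return memcmp(frame + CANARY_OFF, &canary, sizeof canary) == 0;
}
```

An input that fits in the 4-byte buffer leaves the canary intact. An input long enough to reach the return-address slot has to pass through the canary first, so the check fails before the corrupted return address is ever used.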
+
+#### Buffer overflow defense #4: No-execute bits (NX)
+
+- Mark the stack as not executable.
+- Requires hardware support.
+- OS / hardware-level protection
+
+#### Buffer overflow counter-technique: ret-to-libc and ROP
+
+- Without address randomization, code in the C library is loaded at consistent addresses, and it is already executable, so NX does not block it.
+- Attacker can find code in the C library that has the desired effect.
+  - possibly heavily fragmented
+- Then return to the necessary address or addresses in the proper order.
+- This is the motivation behind:
+  - `ret-to-libc`
+  - Return-Oriented Programming (ROP)
+
+We will continue from defenses / exploitation follow-ups in the next lecture.