# CSE4303 Introduction to Computer Security (Lecture 18)

> Due to my lack of attention, this lecture note was generated by AI as a continuation of the previous lecture note. I have kept this warning because the note was created by AI.

#### Software security

### Overview

#### Outline

- Context
- Prominent software vulnerabilities and exploits
  - Buffer overflows
    - Background: C code, compilation, memory layout, execution
    - Baseline exploit
    - Challenges
    - Defenses, countermeasures, counter-countermeasures

### Buffer overflows

#### All programs are stored in memory

- Each process behaves as if it owns all of memory.
- For a `32-bit` process, the virtual address space runs:
  - from `0x00000000`
  - to `0xffffffff`
- In reality, these are virtual addresses.
  - The OS and CPU map them to physical addresses.

#### The instructions themselves are in memory

- Program text (the compiled instructions) is also stored in memory.
- The slide shows instructions such as the following. They are listed from higher addresses (top) to lower addresses (bottom), so execution runs from the bottom line upward:

```asm
0x4c2 sub $0x224,%esp
0x4c1 push %ecx
0x4bf mov %esp,%ebp
0x4be push %ebp
```

- Important point:
  - code and data are both memory-resident
  - control flow therefore depends on values stored in memory

#### Data's location depends on how it's created

- Static initialized data example:

```c
static const int y = 10;
```

- Static uninitialized data example:

```c
static int x;
```

- Command-line arguments and environment are set when the process starts.
- Stack data appears when functions run:

```c
int f(void) {
    int x = 0;  /* lives in f's stack frame; gone once f returns */
    return x;
}
```

- Heap data appears at runtime:

```c
long *p = malloc(sizeof(long));  /* inside a function; needs <stdlib.h>, lives until free(p) */
```

- Summary from the slide:
  - Known at compile time:
    - text
    - initialized data
    - uninitialized data
  - Set when process starts:
    - command line and environment
  - Runtime:
    - stack
    - heap
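
The summary above can be illustrated with a small C sketch that touches one object from each region and prints its address. The helper name `region_demo` is hypothetical, and the section names in the comments (`.rodata`, `.bss`) are the usual GCC/ELF choices rather than something the slide specifies.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static const int y = 10;  /* static initialized data (a const object like this usually lands in .rodata) */
static int x;             /* static uninitialized data (.bss); C guarantees it starts at 0 */

/* Hypothetical helper: prints one address per region and returns y + x + local,
   computed through a heap cell. */
int region_demo(void) {
    int local = 1;                      /* stack: exists only while region_demo runs */
    long *cell = malloc(sizeof *cell);  /* heap: allocated at runtime */
    assert(cell != NULL);

    printf("static init   y    : %p\n", (void *)&y);
    printf("static uninit x    : %p\n", (void *)&x);
    printf("stack         local: %p\n", (void *)&local);
    printf("heap          cell : %p\n", (void *)cell);

    *cell = y + x + local;              /* 10 + 0 + 1 */
    int result = (int)*cell;
    free(cell);
    return result;
}
```

Calling `region_demo()` returns 11. On a typical Linux build the stack address printed is far above the others, but the exact values vary between runs when address randomization is enabled.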

#### We are going to focus on runtime attacks

- Stack and heap grow in opposite directions.
- Compiler-generated instructions adjust the stack size at runtime.
- The stack pointer (`%esp`) tracks the current top of the stack.
- Repeated `push` instructions place values onto the stack.
  - The slides use the sequence:
    - `push 1`
    - `push 2`
    - `push 3`
    - `return`
- Heap memory is apportioned by the OS and managed within the process by `malloc`.
- The lecture focuses on the stack for now.

```text
0x00000000                                                          0xffffffff
Heap --------------------------------->   <--------------------------------- Stack
```
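
To make the push sequence concrete, here is a toy model (not real CPU code) of a downward-growing stack. A `uint32_t` array stands in for memory, and `esp` is an index that starts one past the top and is decremented before each write, mirroring x86 `push`. The type and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Toy stack: mem[0] is the lowest address, mem[STACK_WORDS - 1] the highest. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    size_t esp;  /* index of the current top; STACK_WORDS means "empty" */
} toy_stack;

void toy_init(toy_stack *s) { s->esp = STACK_WORDS; }

/* push: move the stack pointer down one slot, then store (like x86 push). */
void toy_push(toy_stack *s, uint32_t value) {
    assert(s->esp > 0);  /* a real stack would run into other memory here */
    s->esp -= 1;
    s->mem[s->esp] = value;
}

/* pop: read the top slot, then move the stack pointer back up. */
uint32_t toy_pop(toy_stack *s) {
    assert(s->esp < STACK_WORDS);
    return s->mem[s->esp++];
}
```

After `push 1`, `push 2`, `push 3`, the stack pointer sits three slots below where it started, and popping yields `3` first: the values come back in reverse order, which is why returning simply undoes the pushes.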

#### Stack layout when calling functions

Questions asked on the slide:

- What do we do when we call a function?
  - What data need to be stored?
  - Where do they go?
- How do we return from a function?
  - What data need to be restored?
  - Where do they come from?

Example used in the slide:

```c
void func(char *arg1, int arg2, int arg3)
{
    char loc1[4];
    int loc2;
    int loc3;
}
```

Important layout points:

- Arguments are pushed in reverse order of their appearance in the code (`arg3` first, `arg1` last).
- Local variables are pushed in the same order as they appear in the code (this is the slide's simplified model; real compilers may lay locals out differently).
- The slide then introduces two unknown slots between the locals and the arguments; the following slides reveal them as the saved `%ebp` and the return address (saved `%eip`).

#### Accessing variables

Example:

```c
void func(char *arg1, int arg2, int arg3)
{
    char loc1[4];
    int loc2 = 0;  /* initialized so that loc2++ is well-defined */
    int loc3;
    /* ... */
    loc2++;
    /* ... */
}
```

Question from the slide:

- Where is `loc2`?

Step-by-step answer developed in the slides:

- Its absolute address cannot be known at compile time.
  - We do not know exactly where `loc2` will be in memory.
  - In general, we do not know how many arguments there are.
- But `loc2` is always at a fixed offset (8 bytes) before the frame metadata.
- This motivates the frame pointer.

Definitions from the slide:

- Stack frame
  - the current function call's region on the stack
- Frame pointer
  - `%ebp`
- Example answer
  - `loc2` is at `-8(%ebp)`
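
Under the slide's simplified 32-bit model (every argument and local takes 4 bytes, arguments pushed right to left, locals laid out in declaration order), each name in `func` sits at a fixed offset from `%ebp`. The enum below records those offsets, and `slot_address` turns one into an absolute address once `%ebp` is known at runtime. The names and the sample `%ebp` value in the usage note are hypothetical, and a real compiler may choose a different layout.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets from %ebp for func(char *arg1, int arg2, int arg3) in the slide's model. */
enum frame_offset {
    OFF_ARG3      = 16,   /* pushed first by the caller, so highest */
    OFF_ARG2      = 12,
    OFF_ARG1      =  8,
    OFF_RET_ADDR  =  4,   /* saved %eip, pushed by call */
    OFF_SAVED_EBP =  0,   /* (%ebp) */
    OFF_LOC1      = -4,   /* char loc1[4] occupies -4..-1 */
    OFF_LOC2      = -8,   /* int loc2 */
    OFF_LOC3      = -12   /* int loc3 */
};

/* Absolute address of a slot; unsigned arithmetic wraps, so negative offsets subtract. */
uint32_t slot_address(uint32_t ebp, enum frame_offset off) {
    return ebp + (uint32_t)(int32_t)off;
}
```

For example, with `%ebp` equal to `0xbfff0100`, `loc2` lives at `0xbfff00f8`. Wherever the frame lands, `loc2` stays 8 bytes below `%ebp`, which is why the compiler can always emit `-8(%ebp)`.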

#### Notation

- `%ebp`
  - a memory address stored in the frame-pointer register
- `(%ebp)`
  - the value at memory address `%ebp`
  - like dereferencing a pointer
- `n(%ebp)`, such as `-8(%ebp)` or `4(%ebp)`
  - the value at memory address `%ebp + n`

The slide sequence then shows:

```asm
pushl %ebp          # save the caller's frame pointer on the stack
movl %esp, %ebp     # new frame pointer = current stack pointer
```

- Meaning:
  - first save the old frame pointer on the stack
  - then set the new frame pointer to the current stack pointer
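
The two prologue instructions can be simulated on a word-addressed array. In this sketch `esp` and `ebp` hold slot indices rather than byte addresses (one slot stands for one 4-byte word), and `toy_cpu` and `prologue` are made-up names. It only shows that after the prologue, `(%ebp)` (here `mem[ebp]`) holds the caller's frame pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS 32

typedef struct {
    uint32_t mem[MEM_WORDS];  /* word-addressed stand-in for the stack */
    size_t esp;               /* stack pointer (index of the top word) */
    size_t ebp;               /* frame pointer (index) */
} toy_cpu;

/* pushl %ebp ; movl %esp, %ebp */
void prologue(toy_cpu *c) {
    assert(c->esp > 0);
    c->esp -= 1;
    c->mem[c->esp] = (uint32_t)c->ebp;  /* save the caller's frame pointer */
    c->ebp = c->esp;                    /* new frame pointer = current stack pointer */
}
```

Starting from, say, `esp = 20` and `ebp = 28`, the prologue leaves both registers at `19`, and `mem[19]` holds the old value `28`, ready to be restored at return.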

#### Returning from functions

Example caller:

```c
void func(char *arg1, int arg2, int arg3);

int main(void)
{
    /* ... */
    func("Hey", 10, -3);
    /* ... */
    return 0;
}
```

Questions from the slides:

- How do we restore `%ebp`?
- How do we resume execution at the correct place?

Slide answers:

- Push `%ebp` before locals.
- Set `%ebp` to current `%esp`.
- Set `%ebp` to `(%ebp)` at return.
- Push the address of the next instruction (the next `%eip`) before jumping to the function; the `call` instruction does this.
- Set `%eip` to `4(%ebp)` at return.

#### Stack and functions: Summary

- Calling function
  - push arguments onto the stack in reverse order
  - push the return address
    - the address of the instruction that should run after control returns
  - jump to the function's address (`call` performs this push and jump together)
- Called function
  - push old frame pointer `%ebp` onto the stack
  - set frame pointer `%ebp` to current `%esp`
  - push local variables onto the stack (in practice, space is reserved by decrementing `%esp`, as in `sub $0x224,%esp` earlier)
  - access locals as offsets from `%ebp`
- Returning function
  - reset previous stack frame
    - `%ebp = (%ebp)`
  - jump back to return address
    - `%eip = 4(%ebp)`
  - both offsets refer to the callee's `%ebp`; in real x86 code, `leave` (`movl %ebp, %esp` then `popl %ebp`) restores `%ebp` and leaves `%esp` pointing at the saved return address, which `ret` then pops into `%eip`
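
The whole call/return protocol can be traced with the same kind of toy model. This sketch rests on stated assumptions: slots are word indices instead of byte addresses, the function names are made up, the "addresses" `0x105` and `0x400` are arbitrary, and the epilogue is written the way x86 `leave; ret` behaves (`%esp = %ebp`, pop `%ebp`, pop `%eip`). In index terms, `4(%ebp)` is `mem[ebp + 1]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS 64

typedef struct {
    uint32_t mem[MEM_WORDS];
    size_t esp, ebp;   /* word indices */
    uint32_t eip;      /* "address" of the next instruction */
} toy_cpu;

static void push(toy_cpu *c, uint32_t v) { assert(c->esp > 0); c->mem[--c->esp] = v; }
static uint32_t pop(toy_cpu *c) { assert(c->esp < MEM_WORDS); return c->mem[c->esp++]; }

/* Caller: push args right to left, then push the return address and jump (like call). */
void call_func(toy_cpu *c, const uint32_t *args, size_t nargs,
               uint32_t return_addr, uint32_t func_addr) {
    for (size_t i = nargs; i > 0; i--)
        push(c, args[i - 1]);
    push(c, return_addr);   /* saved %eip */
    c->eip = func_addr;     /* jump to the function */
}

/* Callee prologue: save old %ebp, set %ebp = %esp, reserve slots for locals. */
void enter_func(toy_cpu *c, size_t nlocals) {
    push(c, (uint32_t)c->ebp);
    c->ebp = c->esp;
    assert(c->esp >= nlocals);
    c->esp -= nlocals;      /* like sub $N, %esp */
}

/* Epilogue as leave; ret: drop locals, %ebp = (%ebp), %eip = old 4(%ebp). */
void return_from_func(toy_cpu *c) {
    c->esp = c->ebp;
    c->ebp = pop(c);
    c->eip = pop(c);
}
```

Tracing `func("Hey", 10, -3)` this way, the return address ends up at `mem[ebp + 1]` and `arg1` at `mem[ebp + 2]` (that is, `8(%ebp)`). After the return, `%eip` and `%ebp` hold the caller's values again, and the three argument slots are still on the stack for the caller to discard.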

#### Quick overview (again)

- Buffer
  - contiguous set of a given data type
  - common in C
    - all strings are buffers of `char`
- Overflow
  - put more into the buffer than it can hold
- Question
  - where does the extra data go?
- Slide answer
  - now that we know memory layouts, we can reason about where the overwrite lands

#### A buffer overflow example

Example 1 from the slide:

```c
#include <string.h>

void func(char *arg1)
{
    char buffer[4];
    strcpy(buffer, arg1);  /* no bounds check: copies until '\0' */
    /* ... */
}

int main(void)
{
    char *mystr = "AuthMe!";  /* 7 characters + '\0' = 8 bytes */
    func(mystr);
    /* ... */
    return 0;
}
```

Step-by-step effect shown in the slides:

- Initial stack region (from lower to higher addresses) includes:
  - `buffer`
  - saved `%ebp`
  - saved `%eip`
  - `&arg1`
- First 4 bytes copied into `buffer`:
  - `A u t h`
- Remaining bytes continue writing over the saved `%ebp`:
  - `M e ! \0`
- Because `strcpy` keeps copying until it sees `\0`, bytes go past the end of the buffer.
- In the example, upon return:
  - `%ebp` becomes `0x0021654d` (the bytes `M e ! \0` read as a little-endian 32-bit value)
- Result:
  - segmentation fault
  - shown as `SEGFAULT (0x00216551)` in the slide sequence; note that `0x00216551 = 0x0021654d + 4`, an access relative to the corrupted frame pointer
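
The `%ebp` value in the slide follows from x86 being little-endian: the four bytes `'M' 'e' '!' '\0'` that overwrote the saved `%ebp` are read back with the first byte as the least significant. A small helper (the name `le32` is hypothetical) reproduces that arithmetic without performing any real overflow, assuming ASCII character codes.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret 4 bytes as a little-endian 32-bit value (first byte = least significant). */
uint32_t le32(const unsigned char b[4]) {
    return (uint32_t)b[0]
         | (uint32_t)b[1] << 8
         | (uint32_t)b[2] << 16
         | (uint32_t)b[3] << 24;
}
```

Applied to `{ 'M', 'e', '!', '\0' }` (bytes `0x4d 0x65 0x21 0x00`), it gives `0x0021654d`, and adding 4 gives the `0x00216551` shown in the slide's fault message.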

#### A buffer overflow example: changing control data vs. changing program data

Example 2 from the slide:

```c
#include <string.h>

void func(char *arg1)
{
    int authenticated = 0;
    char buffer[4];
    strcpy(buffer, arg1);  /* in the slide's layout, excess bytes land in authenticated */
    if (authenticated) { /* ... privileged path ... */ }
}

int main(void)
{
    char *mystr = "AuthMe!";
    func(mystr);
    /* ... */
    return 0;
}
```

Step-by-step effect shown in the slides:

- Initial stack contains (from lower to higher addresses):
  - `buffer`
  - `authenticated`
  - saved `%ebp`
  - saved `%eip`
  - `&arg1`
- Overflow writes:
  - `A u t h` into `buffer`
  - `M e ! \0` into `authenticated`
- Result:
  - code still runs (the saved `%ebp` and `%eip` are untouched)
  - `authenticated` is now nonzero, so the user appears "authenticated"

Important lesson:

- A buffer overflow does not need to crash.
- It may silently change program data or logic.
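
The effect can be reproduced safely by modelling the two adjacent locals as one 8-byte region, with `buffer` at offset 0 and `authenticated` at offset 4 as in the slide's layout. This is a simulation, not the real stack: `simulate_auth` is a hypothetical name, and the copy is deliberately unchecked against the 4-byte `buffer` (while staying inside the 8-byte model) so that it spills into `authenticated` the way `strcpy` would.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of func's locals: bytes 0-3 = buffer, bytes 4-7 = authenticated. */
uint32_t simulate_auth(const char *input) {
    unsigned char frame[8] = {0};     /* authenticated starts as 0 */
    size_t n = strlen(input) + 1;     /* strcpy copies the terminating '\0' too */
    assert(n <= sizeof frame);        /* keep the simulation itself in bounds */
    memcpy(frame, input, n);          /* like strcpy(buffer, input): no check against 4 */

    uint32_t authenticated;
    memcpy(&authenticated, frame + 4, sizeof authenticated);
    return authenticated;
}
```

`simulate_auth("Abc")` fits in `buffer` and returns 0, while `simulate_auth("AuthMe!")` returns a nonzero value. Any nonzero value counts as true in C, so `if (authenticated)` passes.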

#### `gets` vs `fgets`

Unsafe function shown in the slide:

```c
#include <stdio.h>

void vulnerable(void)
{
    char buf[80];
    gets(buf);  /* no size argument: any line longer than 79 characters overflows buf */
}
```

- `gets` cannot know how large `buf` is; it was removed from the C standard in C11.

Safer version shown in the slide:

```c
#include <stdio.h>

void safe(void)
{
    char buf[80];
    fgets(buf, 64, stdin);  /* stores at most 63 characters + '\0' */
}
```

- This is safe only because the hard-coded `64` happens to be no larger than the 80-byte buffer; the bound is not tied to the buffer.

Even safer pattern from the next slide, which ties the bound to the buffer's actual size:

```c
#include <stdio.h>

void safer(void)
{
    char buf[80];
    fgets(buf, sizeof(buf), stdin);  /* bound always matches the buffer */
}
```

Reference from slide:

- [List of vulnerable C functions](https://security.web.cern.ch/security/recommendations/en/codetools/c.shtml)
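
A small wrapper shows the `sizeof(buf)` pattern in use. `read_line` is a hypothetical helper: it passes the real buffer size to `fgets`, strips the trailing newline if one was read, and reports whether a full line fit. Note that `fgets` stores at most `size - 1` characters followed by `'\0'`, so it never writes past the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one line into buf (size bytes).
   Returns 1 if a newline was read (complete line, newline removed),
   0 if no newline fit (line truncated, or input ended without one),
   -1 on EOF or error. */
int read_line(char *buf, int size, FILE *in) {
    assert(size > 1);
    if (fgets(buf, size, in) == NULL)
        return -1;
    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';  /* drop the newline */
        return 1;
    }
    return 0;
}
```

Typical use is `char buf[80]; read_line(buf, sizeof buf, stdin);`. With a 5-byte buffer and the input line `AuthMe!`, the first call stores `"Auth"` and returns 0 instead of overflowing; the rest of the line stays in the stream for the next call.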

#### User-supplied strings

- In the toy examples, the strings are constant.
- In reality, they come from users in many ways:
  - text input
  - packets
  - environment variables
  - file input
- Validating assumptions about user input is extremely important.

#### What's the worst that could happen?

Using:

```c
char buffer[4];
strcpy(buffer, arg1);  /* copies until the first '\0' in arg1, however far that is */
```

- `strcpy` will keep writing as much as you give it, stopping only at a `\0`.
- If attacker-controlled input is long enough, the memory past the buffer becomes "all ours" from the attacker's perspective.
- That raises the key question from the slide:
  - what could you write to memory to wreak havoc?

#### Code injection

- Title-only transition slide.
- It marks the move from accidental overwrites to deliberate attacker payloads.

#### High-level idea

Example used in the slide:

```c
#include <stdio.h>

void func(char *arg1)
{
    char buffer[4];
    sprintf(buffer, arg1);  /* unbounded write; arg1 is also (unsafely) used as the format string */
    /* ... */
}
```

Two-step plan shown in the slides:

1. Load my own code into memory.
2. Somehow get `%eip` to point to it.

The slide sequence draws this as:

- vulnerable buffer on stack
- attacker-controlled bytes placed in memory
- `%eip` redirected toward those bytes

#### This is nontrivial

- Pulling off this attack requires getting a few things exactly right, and others only roughly right.
- The lecture asks us to think about what makes the attack tricky.
- Main security idea:
  - the key to defending against it is to make the hard parts really hard

#### Challenge 1: Loading code into memory

- The attacker payload must be machine-code instructions:
  - already compiled
  - ready to run
- We have to be careful in how we construct it:
  - It cannot contain any zero bytes.
    - otherwise string routines such as `sprintf` and `strcpy` stop copying at the first `\0` (the slide also lists `gets` and `scanf`; strictly, `gets` stops at a newline and `scanf("%s")` at whitespace, so for those it is those bytes that must be avoided)
  - It cannot make use of the loader.
    - because we are injecting the bytes directly, nothing will link or relocate them for us
  - It cannot use the stack.
    - because we are in the process of smashing it
- The lecture then gives the name:
  - shellcode

#### What kind of code would we want to run?

- Goal: full-purpose shell
  - code to launch a shell is called shellcode
  - it is nontrivial to write shellcode that works as injected code
    - no zeroes
    - cannot use the stack
    - no loader dependence
  - there are many shellcodes already written
  - there are even competitions for writing the smallest shellcode
- Goal: privilege escalation
  - ideally, the attacker goes from guest or non-user to root

#### Shellcode

High-level C version shown in the slides:

```c
#include <unistd.h>  /* execve */

int main(void) {
    char *name[2];
    name[0] = "/bin/sh";
    name[1] = NULL;
    execve(name[0], name, NULL);  /* on success, the process becomes /bin/sh */
    return 1;                     /* reached only if execve fails */
}
```

Assembly version shown in the slides:

```asm
xorl %eax, %eax        # %eax = 0 without a zero byte in the encoding
pushl %eax             # push 0: terminator for the string pushed next
pushl $0x68732f2f      # "//sh" (little-endian)
pushl $0x6e69622f      # "/bin"
movl %esp, %ebx        # %ebx -> "/bin//sh"
pushl %eax             # push another 0
...
```

Machine-code bytes shown in the slides:

```text
"\x31\xc0"      xorl %eax, %eax
"\x50"          pushl %eax
"\x68""//sh"    pushl $0x68732f2f
"\x68""/bin"    pushl $0x6e69622f
"\x89\xe3"      movl %esp, %ebx
"\x50"          pushl %eax
...
```

Important points:

- those machine-code bytes can become part of the attacker's input
- none of the bytes shown is zero: `xorl %eax, %eax` creates a 0 at runtime instead of embedding one, and the doubled slash pads `/sh` to a full 4-byte word (`/bin//sh` still names `/bin/sh`)
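
The "no zero bytes" constraint from Challenge 1 is easy to check mechanically. The sketch below uses hypothetical helper names: `has_zero_byte` scans a byte string for `0x00`, and `store_le32` lays out a 32-bit immediate the way a little-endian `pushl` stores it in memory. It is meant to be applied only to the prefix of the shellcode shown above, since the rest is elided on the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if any byte in p[0..n) is zero, i.e. a strcpy-style copy would stop early. */
int has_zero_byte(const unsigned char *p, size_t n) {
    return memchr(p, 0, n) != NULL;
}

/* Writes v in little-endian order, as pushl $v would store it in memory. */
void store_le32(unsigned char out[4], uint32_t v) {
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(v >> (8 * i));
}
```

Run over the 16 bytes shown (`\x31\xc0`, `\x50`, `\x68//sh`, `\x68/bin`, `\x89\xe3`, `\x50`), `has_zero_byte` finds no zero. `store_le32` confirms that `0x68732f2f` and `0x6e69622f` are stored as the characters `//sh` and `/bin`.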

#### Challenge 2: Getting our injected code to run

- We cannot insert a fresh "jump into my code" instruction.
- We must redirect control using whatever code is already running.

#### Hijacking the saved `%eip`

- Strategy:
  - overwrite the saved return address (the saved `%eip`)
  - make it point into the injected bytes
- Core idea:
  - when the function returns, the CPU loads the overwritten return address into `%eip`

Question raised by the slides:

- But how do we know the address?

Failure mode shown in the slide sequence:

- if the guessed address is wrong, the CPU tries to execute whatever data bytes are there
  - this is most likely not valid code
- result:
  - invalid instruction
  - CPU "panic" / crash

#### Challenge 3: Finding the return address

- If we do not have the code, we may not know how far the buffer is from the saved `%ebp`.
- One approach:
  - try many different values
- Worst case:
  - `2^32` possible addresses on `32-bit`
  - `2^64` possible addresses on `64-bit`
- But without address randomization:
  - the stack always starts from the same fixed address
  - the stack grows, but usually not very deeply unless the code is heavily recursive
  - so in practice only a fairly small range of addresses near the stack's start needs to be searched

#### Improving our chances: nop sleds

- `nop` is a single-byte instruction (`0x90` on x86).
- Definition:
  - it does nothing except move execution to the next instruction
- NOP sled idea:
  - put a long sequence of `nop` bytes before the real malicious code
  - now jumping anywhere in that region still works
  - execution slides down into the payload

Why this helps:

- it increases the chance that an approximate address guess still succeeds
- the slides explicitly state:
  - now we improve our chances of guessing by a factor of `#nops`

```text
[padding][saved return address guess][nop nop nop ...][malicious code]
```
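
The "factor of `#nops`" claim is simple arithmetic. Assume the attacker only knows that the sled lies somewhere in a window of `W` candidate addresses, and that landing anywhere in an `N`-byte sled succeeds. Then guesses spaced `N` apart cannot skip over the sled, so at most about `W / N` guesses are needed instead of `W`. The helper below is a hypothetical illustration of that count, not a model of any particular attack tool.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case guesses to sweep `window` candidate addresses when any address
   inside a `sled_len`-byte NOP sled succeeds. Stepping by sled_len means no
   sled position can be skipped. */
uint64_t guesses_needed(uint64_t window, uint64_t sled_len) {
    assert(sled_len > 0);
    return (window + sled_len - 1) / sled_len;  /* ceiling division */
}
```

With a 4096-address window, having no sled (`sled_len = 1`, so the exact address is required) means up to 4096 guesses, while a 256-byte sled cuts that to 16.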

#### Putting it all together

- Payload components shown in the slides, in order:
  - padding
  - guessed return address
  - NOP sled
  - malicious code
- Constraint noted by the lecture:
  - the input has to start wherever the vulnerable `gets` (or similar) function begins writing, i.e. at the start of the buffer
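
The component order can be turned into offset arithmetic. The sketch below only computes where each piece starts inside the input; it generates no bytes. `payload_layout`, `layout_for`, the 4-byte return-address slot, and the example sizes are assumptions for a 32-bit target matching the slide's diagram.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t ret_addr;  /* offset of the bytes that overwrite the saved %eip */
    size_t sled;      /* offset where the NOP sled begins */
    size_t code;      /* offset where the injected code begins */
    size_t total;     /* total input length */
} payload_layout;

/* padding = bytes from the start of the buffer up to the saved %eip
   (in the slide's model: the buffer itself plus the saved %ebp). */
payload_layout layout_for(size_t padding, size_t sled_len, size_t code_len) {
    payload_layout l;
    l.ret_addr = padding;
    l.sled     = l.ret_addr + 4;  /* 32-bit return address */
    l.code     = l.sled + sled_len;
    l.total    = l.code + code_len;
    return l;
}
```

For `char buffer[4]`, the padding is 8 bytes (buffer plus saved `%ebp`), so `layout_for(8, 16, 23)` places the return-address bytes at offset 8, the sled at 12, and the code at 28, for 51 bytes in total. The guessed return address should point somewhere into the sled's range in memory, so that any landing spot slides into the code.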

#### Buffer overflow defense #1: use secure bounds-checking functions

- User-level protection
- Replace unbounded routines with bounded ones.
- Prefer memory-safe languages where possible:
  - Java
  - Rust
  - etc.
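
As one example of replacing an unbounded routine, the `strcpy(buffer, arg1)` pattern from the earlier examples can be swapped for `snprintf`. It never writes more than the given size and returns the length it would have needed, so truncation can be detected. `copy_bounded` is a hypothetical helper name. Passing `"%s"` as the format also avoids the separate problem of treating user input as a format string, as `sprintf(buffer, arg1)` does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies src into dst (capacity dst_size), always NUL-terminating.
   Returns 1 if the whole string fit, 0 if it was truncated (or on error). */
int copy_bounded(char *dst, size_t dst_size, const char *src) {
    assert(dst_size > 0);
    int needed = snprintf(dst, dst_size, "%s", src);  /* "%s": src is data, never a format */
    return needed >= 0 && (size_t)needed < dst_size;
}
```

With `char buffer[4]`, copying `"AuthMe!"` now yields `"Aut"` and a return value of 0 instead of overwriting the saved `%ebp`, while `"Hey"` fits and returns 1.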

#### Buffer overflow defense #2: Address Space Layout Randomization (ASLR)

- Randomize the starting addresses of program regions (such as the stack, heap, and libraries).
- Goal:
  - prevent the attacker from guessing or finding the correct address to put in the return-address slot
- OS-level protection

#### Buffer overflow counter-technique: NOP sled

- Counter-technique against uncertain addresses
- By jumping somewhere into a wide sled, the attacker needs much less exact knowledge of the address

#### Buffer overflow defense #3: Canary

- Put a guard value (the canary) between vulnerable local buffers and control-flow data such as the saved `%ebp` and return address.
- If an overflow changes the canary, the program detects the corruption before returning.
- The lecture frames this as an OS-level / compiler-assisted protection; in practice the compiler inserts the check (for example, GCC's `-fstack-protector` options).
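
A simplified simulation of the check models the frame as `buffer[4]`, then a 4-byte canary, then the saved return address. It copies the input without a bound, as `strcpy` would, and compares the canary before "returning". The layout, the constant canary and return-address values, and the name `canary_intact` are assumptions for illustration. Real compilers use a random canary value chosen at process start and insert the comparison into the function epilogue.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CANARY 0x5a17c3e9u  /* illustrative constant; real canaries are random */

/* Returns 1 if the simulated canary survives copying input, 0 if it was clobbered. */
int canary_intact(const char *input) {
    unsigned char frame[12];            /* [0,4) buffer, [4,8) canary, [8,12) saved %eip */
    uint32_t canary = CANARY, ret = 0x08048abcu;
    memcpy(frame + 4, &canary, 4);
    memcpy(frame + 8, &ret, 4);

    size_t n = strlen(input) + 1;       /* strcpy copies the '\0' too */
    assert(n <= sizeof frame);          /* keep the simulation itself in bounds */
    memcpy(frame, input, n);            /* unchecked against the 4-byte buffer */

    uint32_t after;
    memcpy(&after, frame + 4, 4);
    return after == CANARY;             /* checked before the return address is used */
}
```

`"Hey"` leaves the canary intact, while `"AuthMe!"` already overwrites it. A linear overflow that reaches the saved return address must pass through the canary first, so the tampering is caught before `ret` would use the corrupted address.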

#### Buffer overflow defense #4: No-execute bits (NX)

- Mark the stack as non-executable, so injected bytes there cannot run as code.
- Requires hardware support.
- OS / hardware-level protection

#### Buffer overflow counter-technique: ret-to-libc and ROP

- Without address randomization, code in the C library is already stored at consistent addresses.
- The attacker can find code in the C library that has the desired effect.
  - possibly heavily fragmented
- Then return to the necessary address or addresses in the proper order.
  - No injected code ever executes, which sidesteps NX.
- This is the motivation behind:
  - `ret-to-libc`
  - Return-Oriented Programming (ROP)
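
The chaining idea can be illustrated with a toy interpreter that involves no real exploitation. The "stack" is an array of pointers to small, already-existing functions ("gadgets"), and each simulated `ret` takes the next one and runs it. This is a conceptual model with hypothetical names, not how real gadgets are found or encoded. It only shows that the order of addresses on the stack, rather than any injected code, decides what runs.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long acc; } machine;
typedef void (*gadget)(machine *);

/* Small pieces of existing code that a chain can reuse. */
void load_five(machine *m)  { m->acc = 5; }
void add_three(machine *m)  { m->acc += 3; }
void double_acc(machine *m) { m->acc *= 2; }

/* Each iteration plays the role of `ret`: take the next address off the stack and run it. */
long run_chain(const gadget *stack, size_t len) {
    machine m = { 0 };
    for (size_t sp = 0; sp < len; sp++)
        stack[sp](&m);
    return m.acc;
}
```

The chain `{ load_five, add_three, double_acc }` computes 16, while reordering the same gadgets as `{ load_five, double_acc, add_three }` computes 13: the same existing code produces different behaviour purely through the order of return addresses.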

We will continue with defenses and exploitation follow-ups in the next lecture.