# CSE4303 Introduction to Computer Security (Lecture 17)

## Software security

> Due to my lack of attention, this lecture note was generated by AI as a continuation of the previous lecture note. I have kept this warning because the note was created by AI.

### Administrative notes

#### Project details

- Project plan
  - Thursday, `4/9` at the end of class
  - `5%`
- Written document and presentation recording
  - Thursday, `4/30` at `11:30 AM`
  - `15%`
- View peer presentations and provide feedback
  - Wednesday, `5/6` at `11:59 PM`
  - `5%`

#### Upcoming schedule

- This week (`3/20`)
  - software security lecture
  - studio
    - some time for studio on Tuesday
- Next week (`4/6`)
  - fuzzing
  - some time to discuss project ideas
- `4/13`
  - web security
- `4/20`
  - privacy and ethics overview
  - time to work on projects
  - course wrap-up

### Overview

#### Outline

- Context
- Prominent software vulnerabilities and exploits
- Buffer overflows
  - Background: C code, compilation, memory layout, execution
  - Baseline exploit
  - Challenges
  - Defenses, countermeasures, counter-countermeasures


Sources:

- SEED lab book
- Goodrich/Tamassia book
- Slides from Bryant/O'Hallaron (CMU), Dan Boneh (Stanford), Michael Hicks (UMD)

### Context

#### Context: computing stack (informal)

| Layer | Example |
| --- | --- |
| Application | web server, standalone app |
| Compiler / assembler | `gcc`, `clang` |
| OS: syscalls | `execve()`, `setuid()`, `write()`, `open()`, `fork()` |
| OS: processes, memory layout | Linux virtual memory layout |
| Architecture (ISA, execution) | x86, x86_64, ARM |
| Hardware | Intel Skylake processor |

- User control is strongest near the application / compiler level.
- System control becomes more important as we move down toward the OS, architecture, and hardware.

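
To make the application/OS boundary in the table concrete, here is a minimal sketch, assuming a POSIX system such as Linux, of an application handing bytes to the kernel through the `write()` system call. The helper name `write_all` is hypothetical; it loops because `write()` is allowed to accept fewer bytes than requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write every byte of msg to fd using the write()
 * system call. Returns 0 on success, -1 on error (errno is set). */
int write_all(int fd, const char *msg) {
    size_t left = strlen(msg);
    while (left > 0) {
        ssize_t n = write(fd, msg, left); /* crosses into the kernel */
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal before writing: retry */
            return -1;
        }
        msg += n;          /* skip the bytes the kernel already accepted */
        left -= (size_t)n;
    }
    return 0;
}
```

For example, `write_all(STDOUT_FILENO, "hello\n")` prints a line without going through `stdio` buffering.
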
### Prominent software vulnerabilities and exploits

#### Software security: categories

- Race conditions
- Privilege escalation
- Path traversal
- Environment variable modification
- Language-specific vulnerabilities
  - Format string attack
  - Buffer overflows

#### Buffer Overflows (BoFs)

- A buffer overflow is a bug that affects low-level code, typically in C and C++, with significant security implications.
- Normally, a program with this bug simply crashes.
- But an attacker can craft the input that triggers the bug so that the program does something much worse:
  - Steal private information
    - e.g., Heartbleed (a buffer over-read in OpenSSL)
  - Corrupt valuable information
  - Run code of the attacker's choice

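
A minimal sketch of the bug class, with hypothetical function names and a hypothetical 16-byte buffer: the unchecked version copies caller-supplied input into a fixed-size stack buffer, so any input of 16 or more characters writes past the end of `buf`, which is undefined behavior in C. The checked version bounds the copy and reports truncation instead.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16 /* hypothetical buffer size */

/* UNSAFE: strcpy() does not know how big buf is. An input of NAME_LEN or
 * more characters overflows buf and can clobber adjacent stack data. */
void greet_unsafe(const char *input) {
    char buf[NAME_LEN];
    strcpy(buf, input);
    printf("hello, %s\n", buf);
}

/* Safer: copy at most NAME_LEN - 1 characters and always NUL-terminate.
 * Returns 1 if the input had to be truncated, 0 otherwise. */
int copy_name(char out[NAME_LEN], const char *input) {
    int written = snprintf(out, NAME_LEN, "%s", input);
    return written >= NAME_LEN;
}
```

`greet_unsafe` is shown only to illustrate the pattern and should never receive untrusted input; `copy_name` trades silent corruption for an explicit truncation signal that the caller can check.
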
#### Application behavior

- The slide contains only a figure.
- Intended point: normal application behavior can become attacker-controlled if input handling is unsafe.

#### BoFs: why do we care?

- Reference from slide: [IEEE Spectrum top programming languages 2025](https://spectrum.ieee.org/top-programming-languages-2025)

#### Critical systems in C/C++

- Most OS kernels and utilities
  - `fingerd`
  - X Window System server
  - shell
- Many high-performance servers
  - Microsoft IIS
  - Apache `httpd`
  - `nginx`
  - Microsoft SQL Server
  - MySQL
  - `redis`
  - `memcached`
- Many embedded systems
  - Mars rover
  - industrial control systems
  - automobiles

A successful attack on these systems can be particularly dangerous.

#### Morris Worm

- The slide contains only a figure / historical reference.
- It is included as an example of how memory-corruption vulnerabilities mattered in practice: the 1988 Morris worm spread in part by overflowing a stack buffer in `fingerd`, which read its input with `gets()`.

#### Why do we still care?

- The slide references the NVD search page: [NVD vulnerability search](https://nvd.nist.gov/vuln/search)
- Why the drop?
  - Memory-safe languages
    - Rust
    - Go
  - Stronger defenses
  - Fuzzing
    - find bugs before release
  - Change in development practices
    - code review
    - static analysis tools
    - related engineering improvements

#### MITRE Top 25 2025

- Reference from slide: [MITRE CWE Top 25](http://cwe.mitre.org/top25/)

### Buffer overflows

#### Outline

- System basics
  - Application memory layout
  - How function calls work under the hood
    - `32-bit x86` only
    - `64-bit x86_64` is similar, but with important differences
- Buffer overflow
  - Overwriting the return address pointer
  - Pointing it to injected shellcode

#### Buffer Overflows (BoFs)

- A 2-minute version first, then the full version with all background.

#### Process memory layout: virtual address space

- Slide reference: [virtual address space reference](https://hungys.xyz/unix-prog-process-environment/)

#### Process memory layout: function calls

- Slide reference: [Tenouk function call figure 1](http://www.tenouk.com/Bufferoverflowc/Bufferoverflow2.html)
- Slide reference: [Tenouk function call figure 2](http://www.tenouk.com/Bufferoverflowc/Bufferoverflow4.html)

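
A small experiment to accompany the function-call figures, assuming x86/x86-64, where the stack grows toward lower addresses, and a low optimization level (aggressive inlining could merge frames). Each nested call gets its own frame, so a local variable in a deeper call should sit at a lower address than one in its caller. The function name `record_frames` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical demo: record the address of a local variable at each
 * recursion depth. The outermost call fills out[depth] and the innermost
 * call fills out[0]. */
void record_frames(int depth, uintptr_t *out) {
    volatile char marker = 0;        /* lives in this call's stack frame */
    out[depth] = (uintptr_t)&marker;
    if (depth > 0)
        record_frames(depth - 1, out);
    marker = 1; /* touching marker after the call keeps this frame live (no tail call) */
}
```

After `record_frames(2, addrs)`, the addresses should satisfy `addrs[0] < addrs[1] < addrs[2]`: the deepest frame is lowest on the stack.
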
#### Process memory layout: compromised frame

- Slide reference: [Tenouk compromised frame figure](http://www.tenouk.com/Bufferoverflowc/Bufferoverflow4.html)

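
The compromised-frame picture can be modeled safely in C. This is a simplified simulation, not a real exploit: a stack frame is treated as a byte array in which a 16-byte local buffer sits directly below an 8-byte saved return address, roughly as on x86-64, ignoring the saved `%rbp`, stack canaries, and padding. All names and sizes are hypothetical. An unchecked copy of 24 attacker bytes overwrites the saved address; a bounds-checked copy refuses.

```c
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 16               /* hypothetical local buffer size */
#define FRAME_SIZE (BUF_SIZE + 8) /* buffer + 8-byte saved return address */

typedef struct {
    unsigned char bytes[FRAME_SIZE]; /* [0..15]: buf, [16..23]: return address */
} sim_frame;

/* Read the simulated saved return address stored just above the buffer. */
uint64_t saved_ret(const sim_frame *f) {
    uint64_t ret;
    memcpy(&ret, f->bytes + BUF_SIZE, sizeof ret);
    return ret;
}

void set_saved_ret(sim_frame *f, uint64_t ret) {
    memcpy(f->bytes + BUF_SIZE, &ret, sizeof ret);
}

/* Models strcpy-style copying: trusts len, so it runs past the buffer.
 * Clamped to the frame so that the simulation itself stays well-defined. */
void copy_unchecked(sim_frame *f, const unsigned char *src, size_t len) {
    if (len > FRAME_SIZE)
        len = FRAME_SIZE;
    memcpy(f->bytes, src, len);
}

/* Bounds-checked copy: refuses anything that does not fit in the buffer. */
int copy_checked(sim_frame *f, const unsigned char *src, size_t len) {
    if (len > BUF_SIZE)
        return -1;
    memcpy(f->bytes, src, len);
    return 0;
}
```

Copying 24 bytes of `'A'` (`0x41`) with `copy_unchecked` leaves `saved_ret()` equal to `0x4141414141414141`, the same kind of value a crashing program shows in its instruction pointer after a real stack overflow.
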
#### Computer System

High-level examples used in the slide:

```c
car *c = malloc(sizeof(car));
c->miles = 100;
c->gals = 17;
float mpg = get_mpg(c);
free(c);
```

```java
Car c = new Car();
c.setMiles(100);
c.setGals(17);
float mpg = c.getMPG();
```

Assembly-language example used in the slide:

```asm
get_mpg:
    pushq   %rbp
    movq    %rsp, %rbp
    ...
    popq    %rbp
    ret
```

- The same computation can be viewed at multiple levels:
  - C / Java source
  - assembly language
  - machine code
  - operating system context

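
The C fragment from the slide is not runnable on its own. Below is a self-contained sketch; the `car` struct layout, the body of `get_mpg()`, and the wrapper `demo_mpg()` are hypothetical stand-ins for code the slide does not show. It also checks the result of `malloc()`, which the slide omits.

```c
#include <stdlib.h>

/* Hypothetical definition: the slide does not show the struct. */
typedef struct {
    int miles;
    int gals;
} car;

/* Hypothetical implementation: miles per gallon as a float. */
float get_mpg(const car *c) {
    return (float)c->miles / (float)c->gals;
}

/* The slide's sequence, wrapped in a function so it can run. */
float demo_mpg(void) {
    car *c = malloc(sizeof(car));
    if (c == NULL)
        return -1.0f; /* allocation failed */
    c->miles = 100;
    c->gals = 17;
    float mpg = get_mpg(c);
    free(c);
    return mpg;
}
```

`demo_mpg()` returns roughly `5.88` (100 / 17). Compiling this file with `gcc -Og -S` produces the assembly-level view of the same computation.
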
#### Little Theme 1: Representation

- All digital systems represent everything as `0`s and `1`s.
  - The `0` and `1` are really two different voltage ranges in wires.
  - Or magnetic positions on a disk, hole depths on a DVD, or even DNA.
- "Everything" includes:
  - numbers
    - integers and floating point
  - characters
    - building blocks of strings
  - instructions
    - directives to the CPU that make up a program
  - pointers
    - addresses of data objects stored in memory
- These encodings are stored throughout the computer system.
  - registers
  - caches
  - memories
  - disks
- They all need addresses in order to:
  - find an item
  - find a place for a new item
  - reclaim memory when data is no longer needed

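
To see that a number is stored as nothing more than a bit pattern, this sketch copies a `float`'s bytes into a `uint32_t` with `memcpy()` and inspects the bits. It assumes `float` is a 32-bit IEEE 754 single-precision value, which holds on x86 and x86-64; the helper names `float_bits` and `float_fields` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit float");

/* Return the raw 32-bit pattern used to store f. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits); /* reinterpret the bytes without aliasing UB */
    return bits;
}

/* Split an IEEE 754 single into its three fields. */
void float_fields(float f, unsigned *sign, unsigned *exponent, uint32_t *fraction) {
    uint32_t b = float_bits(f);
    *sign = b >> 31;               /* 1 bit */
    *exponent = (b >> 23) & 0xFFu; /* 8 bits, biased by 127 */
    *fraction = b & 0x7FFFFFu;     /* 23 bits */
}
```

For example, `1.0f` is stored as `0x3F800000` and `-2.0f` as `0xC0000000` (sign 1, biased exponent 128, fraction 0). A character such as `'A'` is likewise just the byte `0x41` in ASCII.
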
#### Little Theme 2: Translation

- There is a big gap between how we think about programs / data and the `0`s and `1`s of computers.
- We need languages to describe what we mean.
- These languages must be translated one level at a time.
- Example point from the slide:
  - we know Java as a programming language
  - but we must work down to the `0`s and `1`s of computers
  - we try not to lose anything in translation
  - along the way we encounter Java bytecode, C, assembly, and machine code

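
As a small illustration of one translation step, here is a C function whose body is the `*dest = t;` statement used in the machine-instruction example later in this lecture. The function name `store_t` and its unused first parameter are hypothetical; the parameter order is chosen so that, under the x86-64 System V calling convention (first three integer arguments in `%rdi`, `%rsi`, `%rdx`), `t` arrives in `%rsi` and `dest` in `%rdx`. The assembly in the comment is what `gcc -Og -S` should roughly produce, but exact output varies by compiler and version.

```c
/* Hypothetical function: x is unused and only shifts t and dest into the
 * second and third argument registers.
 *
 * Expected x86-64 output with gcc -Og (may vary by compiler/version):
 *     store_t:
 *         movq    %rsi, (%rdx)    # *dest = t
 *         ret
 */
void store_t(long x, long t, long *dest) {
    (void)x;
    *dest = t;
}
```
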
#### Little Theme 3: Control Flow

- How do computers orchestrate everything they are doing?
- Within one program:
  - How are `if/else`, loops, and switches implemented?
  - How do we track nested procedure calls?
  - How do we know what to do upon `return`?
- At the operating-system level:
  - library loading
  - sharing system resources
    - memory
    - I/O
    - disks

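
To connect `if/else` and loops to the jumps a CPU actually executes, the sketch below rewrites two small functions in a "goto style" that mirrors how a compiler typically lowers them to compare-and-branch instructions. The function names are hypothetical, and real compiler output may arrange the branches differently.

```c
/* Structured version. */
long absdiff(long x, long y) {
    if (x < y)
        return y - x;
    else
        return x - y;
}

/* Same logic with explicit jumps, close to the shape of the assembly:
 * compare, conditional jump, fall through. */
long absdiff_goto(long x, long y) {
    long result;
    if (x >= y)         /* e.g. cmpq + jge */
        goto else_branch;
    result = y - x;     /* "then" block */
    goto done;          /* unconditional jmp */
else_branch:
    result = x - y;
done:
    return result;
}

/* Sum 1..n as a loop lowered to a guard plus a test at the bottom. */
long sum_to_goto(long n) {
    long sum = 0, i = 1;
    if (i > n)
        goto done;      /* initial guard: skip the loop entirely */
loop:
    sum += i;
    i++;
    if (i <= n)
        goto loop;      /* conditional backward jump */
done:
    return sum;
}
```

Both versions of `absdiff` return `4` for `(3, 7)` and `(7, 3)`; `sum_to_goto(10)` returns `55`.
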
#### HW/SW Interface: Code / Compile / Run Times

- Code time
  - user program in C
  - `.c` file
- Compile time
  - C compiler
  - assembler
- Run time
  - executable file (`.exe` on Windows)
  - hardware executes it
- Note from slide:
  - the compiler and assembler are themselves just programs developed using this same process

#### Assembly Programmer's View

- Programmer-visible CPU state
  - Program counter
    - address of the next instruction
    - called `RIP` in x86-64
  - Named registers
    - heavily used program data
    - together called the register file
  - Condition codes
    - store status information about the most recent arithmetic operation
    - used for conditional branching
- Memory
  - byte-addressable array
  - contains code and user data
  - includes the stack for supporting procedures

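
The condition codes are easiest to understand by computing them by hand. This sketch models the four flags that a 32-bit x86 `add` sets: carry `CF`, zero `ZF`, sign `SF`, and overflow `OF`. The struct and function names are hypothetical, and real hardware sets additional flags that are not modeled here.

```c
#include <stdint.h>

typedef struct {
    int cf; /* carry: unsigned overflow out of the top bit */
    int zf; /* zero: result == 0 */
    int sf; /* sign: top bit of the result */
    int of; /* overflow: signed (two's-complement) overflow */
} flags_t;

/* Compute the result of a 32-bit add and the flags it would set. */
uint32_t add_with_flags(uint32_t a, uint32_t b, flags_t *f) {
    uint32_t r = a + b;     /* unsigned arithmetic wraps mod 2^32 */
    f->cf = r < a;          /* wrapped around => carry out */
    f->zf = r == 0;
    f->sf = (r >> 31) & 1u;
    /* signed overflow: both operands share a sign that the result lacks */
    f->of = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;
    return r;
}
```

For example, `0xFFFFFFFF + 1` gives `0` with `CF = ZF = 1`, while `0x7FFFFFFF + 1` gives `0x80000000` with `SF = OF = 1`. A following `jz`, `jc`, `js`, or `jo` branches on exactly these bits.
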
#### Turning C into Object Code

- Code in files `p1.c` and `p2.c`
- Compile with:

```bash
gcc -Og p1.c p2.c -o p
```

- Notes from the slide
  - `-Og` applies basic optimizations that keep the code easy to debug
  - the resulting machine code goes into file `p`
- Translation chain
  - C program (`p1.c p2.c`) -> assembly program (`p1.s p2.s`) -> object program (`p1.o p2.o`) -> executable program (`p`)
- Associated tools
  - compiler: C -> assembly
  - assembler: assembly -> object code
  - linker: object code plus static libraries (`.a`) -> executable

#### Machine Instruction Example

- C code

```c
*dest = t;
```

- Meaning
  - store value `t` where designated by `dest`
- Assembly

```asm
movq %rsi, (%rdx)
```

- Interpretation
  - move an 8-byte value to memory
  - operands
    - `t` is in register `%rsi`
    - `dest` is in register `%rdx`
    - `*dest` means memory `M[%rdx]`
- Object code

```text
0x400539: 48 89 32
```

- It is a 3-byte instruction stored at address `0x400539`.

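
The three bytes `48 89 32` can be decoded by hand: `0x48` is a REX prefix with the W bit set (64-bit operand size), `0x89` is the opcode for `MOV r/m64, r64` (register to register-or-memory), and `0x32` is the ModR/M byte, whose three bit fields select the operands. The sketch below, with hypothetical helper names, pulls those fields apart. It handles only this simple case: REX.R and REX.B are assumed to be 0, and SIB bytes and displacements are not handled.

```c
#include <stdint.h>

/* x86-64 register numbers 0-7 (without REX extension bits). */
static const char *const reg64[8] = {
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"
};

typedef struct {
    unsigned mod; /* bits 7-6: addressing mode (00 = memory via register, 11 = register) */
    unsigned reg; /* bits 5-3: register operand */
    unsigned rm;  /* bits 2-0: register, or base register for memory */
} modrm_t;

modrm_t decode_modrm(uint8_t byte) {
    modrm_t m;
    m.mod = (byte >> 6) & 0x3u;
    m.reg = (byte >> 3) & 0x7u;
    m.rm = byte & 0x7u;
    return m;
}

/* Name for a 3-bit register field (REX.R / REX.B assumed to be 0). */
const char *reg_name(unsigned field) {
    return reg64[field & 0x7u];
}
```

`0x32` is `00 110 010` in binary: `mod = 0`, `reg = 6` (`rsi`), `rm = 2` (`rdx`). With `mod = 00`, the `rm` register is used as a memory address, which gives the AT&T form `movq %rsi, (%rdx)`.
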
#### IA32 Registers - 32 bits wide

General-purpose register families shown in the slide, with the role highlighted for each:

| 32-bit | 16-bit | 8-bit (high, low) | Role |
| --- | --- | --- | --- |
| `%eax` | `%ax` | `%ah`, `%al` | accumulate |
| `%ecx` | `%cx` | `%ch`, `%cl` | counter |
| `%edx` | `%dx` | `%dh`, `%dl` | data |
| `%ebx` | `%bx` | `%bh`, `%bl` | base |
| `%esi` | `%si` | | source index |
| `%edi` | `%di` | | destination index |
| `%esp` | `%sp` | | stack pointer |
| `%ebp` | `%bp` | | base pointer |

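
The 16- and 8-bit names are views of the same 32-bit register: `%ax` is the low 16 bits of `%eax`, `%al` its low byte, and `%ah` bits 8-15. The sketch below, with hypothetical helper names, extracts those views with shifts and masks, so it behaves the same regardless of the host's byte order.

```c
#include <stdint.h>

/* Views of a 32-bit register value such as %eax. */
uint16_t low16(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }     /* %ax */
uint8_t high8(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); } /* %ah */
uint8_t low8(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }         /* %al */

/* Writing %al replaces only the low byte; the upper 24 bits are preserved. */
uint32_t write_low8(uint32_t eax, uint8_t al) {
    return (eax & 0xFFFFFF00u) | al;
}
```

With `%eax = 0x12345678`, the views are `%ax = 0x5678`, `%ah = 0x56`, and `%al = 0x78`.
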
#### Data Sizes

- The slide is primarily a figure summarizing common integer widths and sizes.

#### Assembly Data Types

- "Integer" data of `1`, `2`, `4`, or `8` bytes
  - data values
  - addresses / untyped pointers
- No aggregate types such as arrays or structures at the assembly level
  - just contiguously allocated bytes in memory
- Two common syntaxes
  - `AT&T`
    - used in the course, slides, textbook, and GNU tools
  - `Intel`
    - used in Intel documentation and Intel tools
  - You need to know which syntax you are reading because the operand order is reversed: AT&T writes `source, destination`, while Intel writes `destination, source`.

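
In AT&T syntax the operand size is usually spelled as an instruction suffix: `b` (1 byte), `w` (2 bytes, "word"), `l` (4 bytes, "long"), `q` (8 bytes, "quad"). This sketch, with a hypothetical helper name, maps an operand size to that suffix. The comments give the usual x86-64 Linux sizes of the C types, which are platform assumptions (for example, `long` is 4 bytes on 64-bit Windows).

```c
#include <stddef.h>

/* AT&T size suffix for an integer operand of nbytes bytes, or '?' if none. */
char att_suffix(size_t nbytes) {
    switch (nbytes) {
    case 1: return 'b'; /* char -> movb */
    case 2: return 'w'; /* short -> movw */
    case 4: return 'l'; /* int -> movl */
    case 8: return 'q'; /* long, pointers on x86-64 Linux -> movq */
    default: return '?';
    }
}
```

The same 8-byte store written both ways shows the reversed operand order: AT&T `movq %rsi, (%rdx)` versus Intel `mov QWORD PTR [rdx], rsi`.
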
#### Three Basic Kinds of Instructions

- Transfer data between memory and register
  - load
    - `%reg = Mem[address]`
  - store
    - `Mem[address] = %reg`
- Perform arithmetic on register or memory data
  - examples: addition, shifting, bitwise operations
- Control flow
  - unconditional jumps to / from procedures
  - conditional branches

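
The three instruction kinds can be seen together in a toy interpreter. The instruction set below (opcodes `LOAD`, `STORE`, `ADD`, `SUBI`, `JNZ`, `HALT`, four registers, 16 words of memory) is entirely hypothetical and far simpler than x86; it exists only to show a program counter stepping through data-transfer, arithmetic, and control-flow instructions.

```c
#include <stdint.h>

/* Hypothetical toy ISA: not x86. */
typedef enum { LOAD, STORE, ADD, SUBI, JNZ, HALT } opcode;

typedef struct {
    opcode op;
    int a, b; /* operand meaning depends on op (see run()) */
} insn;

typedef struct {
    int64_t reg[4];  /* register file */
    int64_t mem[16]; /* word-addressed data memory */
    int pc;          /* program counter: index of the next instruction */
} machine;

/* Execute until HALT or max_steps; returns the number of steps executed. */
int run(machine *m, const insn *prog, int max_steps) {
    int steps = 0;
    while (steps < max_steps) {
        insn i = prog[m->pc++]; /* fetch, then advance pc */
        steps++;
        switch (i.op) {
        case LOAD:  m->reg[i.a] = m->mem[i.b]; break;   /* reg = Mem[addr] */
        case STORE: m->mem[i.b] = m->reg[i.a]; break;   /* Mem[addr] = reg */
        case ADD:   m->reg[i.a] += m->reg[i.b]; break;  /* arithmetic */
        case SUBI:  m->reg[i.a] -= i.b; break;          /* subtract immediate */
        case JNZ:   if (m->reg[i.a] != 0) m->pc = i.b; break; /* conditional branch */
        case HALT:  return steps;
        }
    }
    return steps;
}
```

For example, the seven-instruction program `LOAD r0,[0]; LOAD r1,[1]; ADD r1,r0; SUBI r0,1; JNZ r0,2; STORE r1,[1]; HALT`, run with `mem[0] = 4`, leaves `4 + 3 + 2 + 1 = 10` in `mem[1]`.
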
#### Abstract Memory Layout

```text
High addresses
    Stack          <- local variables, procedure context
    Dynamic Data   <- heap, new / malloc
    Static Data    <- globals / static variables
    Literals       <- large constants such as strings
    Instructions
Low addresses
```

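
A quick way to check the abstract layout on a real machine is to take one address from each region. This sketch assumes a typical Linux x86-64 system with the GNU toolchain; the exact addresses change from run to run because of ASLR, and other platforms may order the regions differently. All names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

int global_counter = 1;                /* static data (.data) */
static const char *greeting = "hello"; /* the string literal is read-only data */

typedef struct {
    uintptr_t code, literal, global, heap, stack;
} layout_sample;

/* Record one address from each region of the abstract memory layout. */
layout_sample sample_layout(void) {
    layout_sample s;
    int local = 0;                     /* stack */
    int *heap = malloc(sizeof *heap);  /* heap (small allocation) */
    s.code = (uintptr_t)&sample_layout; /* instructions */
    s.literal = (uintptr_t)greeting;
    s.global = (uintptr_t)&global_counter;
    s.heap = (uintptr_t)heap;
    s.stack = (uintptr_t)&local;
    free(heap);
    return s;
}
```

Printing the fields, for example with `printf("%#jx\n", (uintmax_t)s.stack)`, should show the stack near the top of the user address space and the code near the bottom.
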
#### The ELF File Format

- ELF = Executable and Linkable Format
  - One of the most widely used binary object formats
  - ELF is architecture-independent
- ELF file types
  - Relocatable
    - must be fixed by the linker before execution
  - Executable
    - ready for execution
  - Shared
    - shared libraries with linking information
  - Core
    - core dumps created when a program terminates with a fault
- Tools mentioned on the slide
  - `readelf`
  - `file`
  - `objdump -D`

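
Every ELF file starts with a 16-byte identification array (`e_ident`), followed by a 2-byte `e_type` field at offset 16 that distinguishes the file types above: `1` relocatable (`ET_REL`), `2` executable (`ET_EXEC`), `3` shared object (`ET_DYN`), `4` core (`ET_CORE`). Modern position-independent executables also report `ET_DYN`. The sketch below checks the magic number `0x7f 'E' 'L' 'F'` and decodes these fields from a byte buffer; the function names are hypothetical, and a real tool such as `readelf -h` validates much more.

```c
#include <stddef.h>
#include <stdint.h>

/* e_type values from the ELF specification */
enum { ET_NONE = 0, ET_REL = 1, ET_EXEC = 2, ET_DYN = 3, ET_CORE = 4 };

/* Returns 1 if buf starts with the ELF magic number 0x7f 'E' 'L' 'F'. */
int is_elf(const unsigned char *buf, size_t len) {
    return len >= 4 && buf[0] == 0x7f && buf[1] == 'E' && buf[2] == 'L' && buf[3] == 'F';
}

/* Returns 32 or 64 from e_ident[EI_CLASS] (byte 4), or 0 if unknown. */
int elf_class_bits(const unsigned char *buf) {
    return buf[4] == 1 ? 32 : buf[4] == 2 ? 64 : 0;
}

/* Reads e_type (2 bytes at offset 16) using the byte order given by
 * e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
 * Returns -1 if the header is too short, not ELF, or has an unknown order. */
int elf_type(const unsigned char *buf, size_t len) {
    if (len < 18 || !is_elf(buf, len))
        return -1;
    if (buf[5] == 1)
        return buf[16] | (buf[17] << 8);
    if (buf[5] == 2)
        return (buf[16] << 8) | buf[17];
    return -1;
}
```

To try it on a real file, read its first 18 bytes into a buffer and compare the result with the `Type:` line printed by `readelf -h`.
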
#### Process Memory Layout (32-bit x86 machine)

- This slide is primarily a diagram.
- Key idea: a `32-bit x86` process has a standard virtual memory layout with code, static data, heap, and stack arranged in distinct regions.

We continue with the concrete runtime layout and the actual overflow mechanics in Lecture 18.