CSE4303 Introduction to Computer Security (Lecture 17)
Because I was not paying full attention, this lecture note was generated by AI as a continuation of the previous lecture note. I am keeping this warning because the note was created by AI.
Software security
Administrative notes
Project details
- Project plan
  - Thursday, 4/9 at the end of class (5%)
- Written document and presentation recording
  - Thursday, 4/30 at 11:30 AM (15%)
- View peer presentations and provide feedback
  - Wednesday, 5/6 at 11:59 PM (5%)
Upcoming schedule
- This week (3/20)
  - software security lecture
  - studio
  - some time for studio on Tuesday
- Next week (4/6)
  - fuzzing
  - some time to discuss project ideas
- 4/13
  - Web security
- 4/20
  - Privacy and ethics overview
  - time to work on projects
  - course wrap-up
Overview
Outline
- Context
- Prominent software vulnerabilities and exploits
- Buffer overflows
- Background: C code, compilation, memory layout, execution
- Baseline exploit
- Challenges
- Defenses, countermeasures, counter-countermeasures
Sources:
- SEED lab book
- Gilbert/Tamassia book
- Slides from Bryant/O'Hallaron (CMU), Dan Boneh (Stanford), Michael Hicks (UMD)
Context
Context: computing stack (informal)
| Layer | Example |
|---|---|
| Application | web server, standalone app |
| Compiler / assembler | gcc, clang |
| OS: syscalls | execve(), setuid(), write(), open(), fork() |
| OS: processes, mem layout | Linux virtual memory layout |
| Architecture (ISA, execution) | x86, x86_64, ARM |
| Hardware | Intel Sky Lake processor |
- User control is strongest near the application / compiler level.
- System control becomes more important as we move down toward OS, architecture, and hardware.
Prominent software vulnerabilities and exploits
Software security: categories
- Race conditions
- Privilege escalation
- Path traversal
- Environment variable modification
- Language-specific vulnerabilities
- Format string attack
- Buffer overflows
Buffer Overflows (BoFs)
- A buffer overflow is a bug that affects low-level code, typically in C and C++, with significant security implications.
- Normally, a program with this bug will simply crash.
- But an attacker can craft input that makes the program do something much worse:
  - Steal private information
    - e.g. Heartbleed
  - Corrupt valuable information
  - Run code of the attacker's choice
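The bug described above usually comes from copying input into a fixed-size buffer without checking its length. The following is a minimal sketch in C, not taken from the slides; the helper name `copy_checked` is hypothetical. It shows the length check that an unsafe call such as `strcpy(dst, src)` omits.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the lecture): copy src into dst only if it
 * fits, including the terminating NUL byte.
 * Returns 0 on success, -1 if src would overflow dst. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;   /* bytes required, with NUL */

    if (needed > dst_size) {
        return -1;                     /* refuse instead of overflowing */
    }
    memcpy(dst, src, needed);
    return 0;
}

/* The unsafe pattern, shown only as a comment:
 *     char buf[8];
 *     strcpy(buf, attacker_input);   // no length check: may write past buf
 */
```

With an 8-byte buffer, a 7-character string fits (7 characters plus the NUL byte), while an 8-character string is rejected.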
Application behavior
- Slide contains a figure only.
- Intended point: normal application behavior can become attacker-controlled if input handling is unsafe.
BoFs: why do we care?
- Reference from slide: IEEE Spectrum top programming languages 2025
Critical systems in C/C++
- Most OS kernels and utilities
  - fingerd
  - X windows server
  - shell
- Many high-performance servers
  - Microsoft IIS
  - Apache httpd
  - nginx
  - Microsoft SQL Server
  - MySQL
  - redis
  - memcached
- Many embedded systems
  - Mars rover
  - industrial control systems
  - automobiles
A successful attack on these systems can be particularly dangerous.
Morris Worm
- Slide contains a figure / historical reference only.
- It is included as an example of how memory-corruption vulnerabilities mattered in practice.
Why do we still care?
- The slide references the NVD search page: NVD vulnerability search
- Why the drop?
  - Memory-safe languages
    - Rust
    - Go
  - Stronger defenses
  - Fuzzing
    - find bugs before release
  - Change in development practices
    - code review
    - static analysis tools
    - related engineering improvements
MITRE Top 25 2025
- Reference from slide: MITRE CWE Top 25
Buffer overflows
Outline
- System Basics
  - Application memory layout
  - How a function call works under the hood
  - 32-bit x86 only
  - 64-bit x86_64 is similar, but with important differences
- Buffer overflow
  - Overwriting the return address pointer
  - Pointing it to injected shellcode
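To make "overwriting the return address" concrete before the full background, here is a small simulation in C. It is only a sketch under stated assumptions: the frame is modeled as a 16-byte array (an 8-byte buffer followed by an 8-byte return-address slot), and all names (`sim_frame`, `buggy_copy`, and so on) are hypothetical. A real stack frame has more fields and a compiler-dependent layout.

```c
#include <stdint.h>
#include <string.h>

#define BUF_SIZE   8                      /* simulated local buffer */
#define FRAME_SIZE (BUF_SIZE + 8)         /* buffer + 8-byte return slot */

/* Simulated frame: bytes [0, 8) are the buffer, bytes [8, 16) hold the
 * saved return address. */
struct sim_frame {
    unsigned char bytes[FRAME_SIZE];
};

/* Zero the buffer and record the saved return address after it. */
void frame_init(struct sim_frame *f, uint64_t return_addr)
{
    memset(f->bytes, 0, sizeof f->bytes);
    memcpy(f->bytes + BUF_SIZE, &return_addr, sizeof return_addr);
}

/* Buggy copy: it checks only against the whole frame, not against
 * BUF_SIZE, so long input spills from the buffer into the return slot.
 * (Clamping to FRAME_SIZE keeps this simulation well-defined C.) */
void buggy_copy(struct sim_frame *f, const unsigned char *input, size_t len)
{
    if (len > FRAME_SIZE) {
        len = FRAME_SIZE;
    }
    memcpy(f->bytes, input, len);
}

/* Read back the value currently stored in the return-address slot. */
uint64_t frame_return_addr(const struct sim_frame *f)
{
    uint64_t addr;
    memcpy(&addr, f->bytes + BUF_SIZE, sizeof addr);
    return addr;
}
```

Copying 8 bytes leaves the saved return address intact. Copying 16 bytes of `'A'` replaces it with `0x4141414141414141`, which is the kind of attacker-chosen value the outline refers to.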
Buffer Overflows (BoFs)
- 2-minute version first, then all background / full version
Process memory layout: virtual address space
- Slide reference: virtual address space reference
Process memory layout: function calls
- Slide reference: Tenouk function call figure 1
- Slide reference: Tenouk function call figure 2
Process memory layout: compromised frame
- Slide reference: Tenouk compromised frame figure
Computer System
High-level examples used in the slide (first C, then the equivalent Java):
car *c = malloc(sizeof(car));
c->miles = 100;
c->gals = 17;
float mpg = get_mpg(c);
free(c);
Car c = new Car();
c.setMiles(100);
c.setGals(17);
float mpg = c.getMPG();
Assembly-language example used in the slide:
get_mpg:
pushq %rbp
movq %rsp, %rbp
...
popq %rbp
ret
- The same computation can be viewed at multiple levels:
- C / Java source
- assembly language
- machine code
- operating system context
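The C fragment from the slide does not compile on its own, because the slide does not show the `car` type or `get_mpg`. The sketch below fills those in so the fragment can be run; the struct fields, the body of `get_mpg`, and the wrapper `mpg_example` are assumptions made for illustration only.

```c
#include <stdlib.h>

/* Assumed definition: the slide shows only how `car` is used. */
typedef struct {
    int miles;
    int gals;
} car;

/* Assumed implementation: miles per gallon as a float. */
float get_mpg(const car *c)
{
    return (float)c->miles / (float)c->gals;
}

/* The slide's fragment wrapped in a function, with a malloc check added.
 * Returns -1.0f if the allocation fails. */
float mpg_example(void)
{
    car *c = malloc(sizeof(car));
    if (c == NULL) {
        return -1.0f;
    }
    c->miles = 100;
    c->gals = 17;
    float mpg = get_mpg(c);
    free(c);
    return mpg;
}
```

Under these assumptions `mpg_example()` returns roughly 5.88 (100 miles over 17 gallons).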
Little Theme 1: Representation
- All digital systems represent everything as 0s and 1s.
  - The 0 and 1 are really two different voltage ranges in wires.
  - Or magnetic positions on a disk, hole depths on a DVD, or even DNA.
- "Everything" includes:
  - numbers
    - integers and floating point
  - characters
    - building blocks of strings
  - instructions
    - directives to the CPU that make up a program
  - pointers
    - addresses of data objects stored in memory
- These encodings are stored throughout the computer system.
- registers
- caches
- memories
- disks
- They all need addresses.
- find an item
- find a place for a new item
- reclaim memory when data is no longer needed
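The representation theme can be observed directly from C by looking at the bytes behind a value. This is a minimal sketch, not from the slides: the helper names are hypothetical, and `float_bits` assumes a 32-bit IEEE 754 `float`, which holds on common platforms but is not guaranteed by the C standard.

```c
#include <stdint.h>
#include <string.h>

/* Raw bit pattern of a float (assumes a 32-bit IEEE 754 float). */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copy bytes, no numeric conversion */
    return bits;
}

/* Byte i of any object, viewed as plain memory. */
unsigned char byte_at(const void *obj, size_t i)
{
    return ((const unsigned char *)obj)[i];
}

/* Numeric code of a character (65 for 'A' in ASCII). */
int char_code(char c)
{
    return (int)(unsigned char)c;
}
```

For example, `float_bits(1.0f)` is `0x3f800000` under the IEEE 754 assumption, and `byte_at` on a 4-byte integer holding 1 finds the 1 in the first byte on a little-endian machine and in the last byte on a big-endian one.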
Little Theme 2: Translation
- There is a big gap between how we think about programs / data and the 0s and 1s of computers.
- We need languages to describe what we mean.
- These languages must be translated one level at a time.
- Example point from the slide:
  - we know Java as a programming language
  - but we must work down to the 0s and 1s of computers
  - we try not to lose anything in translation
  - we encounter Java bytecode, C, assembly, and machine code
Little Theme 3: Control Flow
- How do computers orchestrate everything they are doing?
- Within one program:
  - How are if/else, loops, and switches implemented?
  - How do we track nested procedure calls?
  - How do we know what to do upon return?
- At the operating-system level:
  - library loading
  - sharing system resources
    - memory
    - I/O
    - disks
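One way to see how loops are implemented is to rewrite a loop by hand using only a test and jumps, which is close to what a compiler produces. The sketch below is illustrative and not from the slides; both function names are hypothetical.

```c
/* Structured version: what the programmer writes. */
long sum_to_n(long n)
{
    long sum = 0;
    for (long i = 1; i <= n; i++) {
        sum += i;
    }
    return sum;
}

/* "Lowered" version: the same loop expressed as a test plus jumps. */
long sum_to_n_goto(long n)
{
    long sum = 0;
    long i = 1;
loop_test:
    if (i > n) goto done;     /* conditional branch out of the loop */
    sum += i;
    i++;
    goto loop_test;           /* unconditional jump back to the test */
done:
    return sum;
}
```

Both functions compute the same result for every `n`; the second one simply makes the conditional branch and the backward jump explicit.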
HW/SW Interface: Code / Compile / Run Times
- Code time
  - user program in C
  - .c file
- Compile time
  - C compiler
  - assembler
- Run time
  - executable
    - .exe file
  - hardware executes it
- Note from slide:
  - the compiler and assembler are themselves just programs developed using this same process
Assembly Programmer's View
- Programmer-visible CPU / memory state
  - Program counter
    - address of next instruction
    - called RIP in x86-64
  - Named registers
    - heavily used program data
    - together called the register file
  - Condition codes
    - store status information about most recent arithmetic operation
    - used for conditional branching
- Memory
  - byte-addressable array
  - contains code and user data
  - includes the stack for supporting procedures
Turning C into Object Code
- Code in files p1.c and p2.c
- Compile with:
gcc -Og p1.c p2.c -o p
- Notes from the slide
  - -Og uses basic optimizations
  - resulting machine code goes into file p
- Translation chain
  - C program -> assembly program -> object program -> executable program
- Associated tools
  - compiler
  - assembler
  - linker
  - static libraries (.a)
Machine Instruction Example
- C code
*dest = t;
- Meaning
  - store value t where designated by dest
- Assembly
movq %rsi, (%rdx)
- Interpretation
  - move 8-byte value to memory
  - operands
    - t is in register %rsi
    - dest is in register %rdx
    - *dest means memory M[%rdx]
- Object code
0x400539: 48 89 32
- It is a 3-byte instruction stored at address 0x400539.
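The example above can be tied back to C and to the individual bytes. The sketch below is not from the slides and rests on two assumptions: a System V x86-64 target (where the second and third integer arguments arrive in `%rsi` and `%rdx`), and hypothetical names `store_value` and `decode_modrm`. The exact instructions a compiler emits depend on the compiler and flags.

```c
#include <stdint.h>

/* Hypothetical function: under the System V x86-64 calling convention the
 * second argument arrives in %rsi and the third in %rdx, so a compiler may
 * emit `movq %rsi, (%rdx)` for the store below. */
void store_value(long unused, long t, long *dest)
{
    (void)unused;
    *dest = t;
}

/* Fields of an x86 ModRM byte: mod (2 bits), reg (3 bits), rm (3 bits). */
struct modrm {
    unsigned mod;
    unsigned reg;
    unsigned rm;
};

/* Split a ModRM byte into its three fields. */
struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (byte >> 6) & 0x3u;
    m.reg = (byte >> 3) & 0x7u;
    m.rm  = byte & 0x7u;
    return m;
}
```

For the object code `48 89 32`: `48` is a REX prefix selecting 64-bit operands, `89` is the opcode for a register-to-memory `mov`, and `decode_modrm(0x32)` gives mod 0, reg 6 (`%rsi`), rm 2 (`%rdx`), that is, a store of `%rsi` to the address held in `%rdx`.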
IA32 Registers - 32 bits wide
- General-purpose register families shown in the slide
  - %eax, %ax, %ah, %al
  - %ecx, %cx, %ch, %cl
  - %edx, %dx, %dh, %dl
  - %ebx, %bx, %bh, %bl
  - %esi, %si
  - %edi, %di
  - %esp, %sp
  - %ebp, %bp
- Roles highlighted in the slide (in the same order as the families above)
  - accumulate
  - counter
  - data
  - base
  - source index
  - destination index
  - stack pointer
  - base pointer
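The names within one family (for example `%eax`, `%ax`, `%ah`, `%al`) refer to overlapping parts of the same 32-bit register. A minimal C sketch of that relationship, using masks and shifts on a 32-bit value; the function names are hypothetical.

```c
#include <stdint.h>

/* %ax: the low 16 bits of %eax. */
uint16_t low16(uint32_t eax)
{
    return (uint16_t)(eax & 0xFFFFu);
}

/* %ah: bits 8..15 of %eax (the high byte of %ax). */
uint8_t high8(uint32_t eax)
{
    return (uint8_t)((eax >> 8) & 0xFFu);
}

/* %al: bits 0..7 of %eax (the low byte of %ax). */
uint8_t low8(uint32_t eax)
{
    return (uint8_t)(eax & 0xFFu);
}
```

If `%eax` holds `0x12345678`, then `%ax` is `0x5678`, `%ah` is `0x56`, and `%al` is `0x78`.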
Data Sizes
- Slide is primarily a figure summarizing common integer widths and sizes.
Assembly Data Types
- "Integer" data of 1, 2, 4, or 8 bytes
  - data values
  - addresses / untyped pointers
- No aggregate types such as arrays or structures at the assembly level
  - just contiguous bytes in memory
- Two common syntaxes
  - AT&T: used in the course, slides, textbook, GNU tools
  - Intel: used in Intel documentation and Intel tools
- Need to know which syntax you are reading because the operand order is reversed: AT&T puts the source first, Intel puts the destination first.
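Since assembly has no arrays or structures, every element or field access turns into address arithmetic on contiguous bytes. The sketch below shows that arithmetic in C; `struct point` and both helper names are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Address of element i in an array whose elements are elem_size bytes:
 * base + i * elem_size. */
const unsigned char *element_addr(const void *base, size_t i, size_t elem_size)
{
    return (const unsigned char *)base + i * elem_size;
}

/* Hypothetical struct used only for illustration. */
struct point {
    int x;
    int y;
};

/* Address of a field: start of the struct plus the field's byte offset. */
const unsigned char *field_addr(const void *base, size_t offset)
{
    return (const unsigned char *)base + offset;
}
```

For an `int arr[4]`, `element_addr(arr, 3, sizeof arr[0])` equals `&arr[3]`, and `field_addr(&p, offsetof(struct point, y))` equals `&p.y`.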
Three Basic Kinds of Instructions
- Transfer data between memory and register
  - load: %reg = Mem[address]
  - store: Mem[address] = %reg
- Perform arithmetic on register or memory data
  - examples: addition, shifting, bitwise operations
- Control flow
  - unconditional jumps to / from procedures
  - conditional branches
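The three kinds of instructions can be mirrored in C by writing a single statement as explicit steps. This is an illustrative sketch with hypothetical function names; a local variable stands in for a register, and an actual compiler may combine or reorder these steps.

```c
/* *p = *p + delta written as three explicit steps. */
void add_in_memory(long *p, long delta)
{
    long reg = *p;        /* load:  reg = Mem[p]      */
    reg = reg + delta;    /* arithmetic on a register */
    *p = reg;             /* store: Mem[p] = reg      */
}

/* Control flow: a conditional branch chooses which value is returned. */
long max_of(long a, long b)
{
    if (a > b) {
        return a;
    }
    return b;
}
```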
Abstract Memory Layout
High addresses
Stack <- local variables, procedure context
Dynamic Data <- heap, new / malloc
Static Data <- globals / static variables
Literals <- large constants such as strings
Instructions
Low addresses
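A program can sample one address from each region of this layout. The sketch below is not from the slides; the names are hypothetical, and the cast from a function pointer to an integer is implementation-defined, although it works on common platforms such as Linux.

```c
#include <stdint.h>
#include <stdlib.h>

int global_counter = 1;                  /* static data */
const char *greeting = "hello";          /* points at a string literal */

struct region_samples {
    uintptr_t stack;        /* address of a local variable */
    uintptr_t heap;         /* address returned by malloc */
    uintptr_t static_data;  /* address of a global variable */
    uintptr_t literal;      /* address of a string literal */
    uintptr_t code;         /* address of a function */
};

/* Collect one sample address per region. The heap block is freed before
 * returning; only its numeric address is kept. */
struct region_samples sample_regions(void)
{
    struct region_samples s;
    int local = 0;
    void *block = malloc(16);

    s.stack = (uintptr_t)&local;
    s.heap = (uintptr_t)block;
    s.static_data = (uintptr_t)&global_counter;
    s.literal = (uintptr_t)greeting;
    s.code = (uintptr_t)&sample_regions;   /* implementation-defined cast */

    free(block);
    return s;
}
```

Printing these values on a typical Linux system tends to show the stack address highest and the code address lowest, but the exact values and even the ordering are not guaranteed, for example because of address-space randomization.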
The ELF File Format
- ELF = Executable and Linkable Format
- One of the most widely used binary object formats
- ELF is architecture-independent
- ELF file types
  - Relocatable
    - must be fixed by the linker before execution
  - Executable
    - ready for execution
  - Shared
    - shared libraries with linking information
  - Core
    - core dumps created when a program terminates with a fault
- Tools mentioned on slide
  - readelf
  - file
  - objdump -D
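Every ELF file starts with the same four magic bytes, `0x7f 'E' 'L' 'F'`, which is one of the things tools like `file` look for. A minimal sketch of that check in C; the function name `has_elf_magic` is hypothetical, and the sketch inspects a byte buffer instead of opening a file.

```c
#include <stddef.h>

/* Returns 1 if buf starts with the ELF magic bytes 0x7f 'E' 'L' 'F',
 * and 0 otherwise (including when buf is shorter than 4 bytes). */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < sizeof magic) {
        return 0;
    }
    for (size_t i = 0; i < sizeof magic; i++) {
        if (buf[i] != magic[i]) {
            return 0;
        }
    }
    return 1;
}
```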
Process Memory Layout (32-bit x86 machine)
- This slide is primarily a diagram.
- Key idea: a 32-bit x86 process has a standard virtual memory layout with code, static data, heap, and stack arranged in distinct regions.
We continue with the concrete runtime layout and the actual overflow mechanics in Lecture 18.