# CSE4303 Introduction to Computer Security (Lecture 9)

## Cryptographic Hash Functions

### What is a Hash Function

A hash function maps a variable-length input to a fixed-length output.

$h : X \to Y$

Typical examples:

- Java `hashCode()`: input is an Object, output is a 4-byte integer.
- String polynomial hash example: $h("cs433s") = 'c' \cdot 31^5 + 's' \cdot 31^4 + \dots + 's'$

Key property:

- The domain size $|X|$ is much larger than the range size $|Y|$.
- Collisions are unavoidable since $|X| > |Y|$ (pigeonhole principle).

Main uses:

- Compact numerical representation
- Hash tables (Set, Map, dictionaries)
- Object comparison
- Integrity checking (fingerprint)

### Security Properties

Let $h : X \to Y$.

1. Preimage Resistance (One-way)
   Given $y \in Y$, it is computationally infeasible to find $x \in X$ such that $h(x) = y$.
2. Second Preimage Resistance (Weak collision resistance)
   Given a specific $x \in X$, it is computationally infeasible to find $x' \neq x$ such that $h(x') = h(x)$.
3. Collision Resistance (Strong collision resistance)
   It is computationally infeasible to find any two distinct values $x, x' \in X$ such that $h(x) = h(x')$.

Adversarial definition:

Let $H : M \to T$ where $|M|$ is much larger than $|T|$.

$H$ is collision resistant if for all efficient algorithms $A$:

$Adv_{CR}[A, H] = \Pr[A \text{ outputs a collision for } H]$

is negligible.

### Generic Collision Attack (Birthday Attack)

Let $H : M \to \{0,1\}^n$.

Generic algorithm to find a collision in time on the order of $2^{n/2}$:

1. Choose $2^{n/2}$ random messages $m_1, \dots, m_{2^{n/2}}$.
2. Compute $t_i = H(m_i)$.
3. Look for $t_i = t_j$ with $i \neq j$.

Birthday phenomenon: If the output space has size $B$, about $1.2\sqrt{B}$ random samples already contain a collision with probability greater than $50\%$.
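
The generic attack above can be tried on a deliberately weakened hash. The sketch below is only an illustration under stated assumptions: `truncated_hash` and `find_collision` are hypothetical helper names, the toy hash keeps just the first `n_bits` of SHA-256, and the messages are successive counters rather than random strings.

```python
import hashlib

def truncated_hash(message: bytes, n_bits: int) -> int:
    """Toy hash (assumption): the first n_bits of SHA-256(message) as an integer."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - n_bits)

def find_collision(n_bits: int):
    """Hash distinct messages until two share a tag; return (m1, m2, tries)."""
    seen = {}  # maps each tag seen so far to the message that produced it
    counter = 0
    while True:
        # Counters stand in for the random messages of step 1 (assumption).
        message = counter.to_bytes(8, "big")
        tag = truncated_hash(message, n_bits)
        if tag in seen:
            return seen[tag], message, counter + 1
        seen[tag] = message
        counter += 1

m1, m2, tries = find_collision(24)
print(f"collision after {tries} hashes; 2^(n/2) = {2 ** 12}")
```

With a 24-bit tag, a collision is expected after a few thousand hashes, on the order of $2^{12}$, far fewer than the $2^{24}$ possible tags.
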
Thus:

- A 128-bit hash admits a collision attack in about $2^{64}$ hash evaluations.
- A 256-bit hash admits a collision attack in about $2^{128}$ hash evaluations.

### Practical Hash Functions

From the performance and security table (AMD Opteron 2.2 GHz):

- MD5: 128 bits, completely broken since 2004
- SHA-1: 160 bits, practical collision attack demonstrated
- SHA-256: 256 bits
- SHA-512: 512 bits
- Whirlpool: 512 bits

SHA-1 collision example: SHAttered attack (Google and CWI). Two different PDF files were produced with identical SHA-1 hashes.

## Construction of Cryptographic Hash Functions

### Merkle-Damgard Construction

Given a compression function:

$h : T \times X \to T$

We build:

$H : X^{\le L} \to T$

Process:

- Apply padding: append a $1$ bit, then $0$ bits ($1000\ldots0$), then the message length as a 64-bit value. If no space remains in the last block, add another block.
- Split the padded message into blocks $m[0], m[1], \dots, m[L-1]$.
- Use a fixed initialization vector $IV$.
- Iterate chaining:

  $H_0 = IV$

  $H_1 = h(H_0, m[0])$

  $H_2 = h(H_1, m[1])$

  $\dots$

  $H_L = h(H_{L-1}, m[L-1])$

- Output $H(m) = H_L$.

Theorem: If the compression function $h$ is collision resistant, then $H$ is collision resistant.

### Davies-Meyer Compression from Block Cipher

Given a block cipher:

$E : K \times \{0,1\}^n \to \{0,1\}^n$

Define the compression function:

$h(H, m) = E(m, H) \oplus H$

Here the message block $m$ is used as the cipher key and the chaining value $H$ as the plaintext.

If $E$ behaves like an ideal cipher, finding a collision in $h$ takes about $2^{n/2}$ evaluations. This matches the generic birthday bound, so it is the best possible for an $n$-bit output.

### Example: SHA-256

Built using:

- Merkle-Damgard construction
- Davies-Meyer style compression
- Block cipher-like core: SHACAL-2

Structure:

- 512-bit message block
- 256-bit chaining value
- 256-bit output

## Applications for Integrity and Authentication

### Standalone Usage: Message Integrity

#### Application 1: Delayed Knowledge Verification

Idea: Publish $h(secret)$ first. Later reveal the secret. Anyone can recompute the hash and verify.

Justification: Preimage resistance ensures the secret stays hidden until it is revealed. Collision resistance prevents the publisher from later revealing a different secret with the same hash.

Example: Stock market prediction commitment.
Example for delayed knowledge verification:

1. Publish $H("Stock will rise on May 1")$.
2. On May 1, reveal the prediction string.
3. Anyone computes the hash and checks equality.
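
The three steps above can be sketched as follows. This is a minimal illustration, not a vetted commitment scheme: `commit` and `verify` are hypothetical names, SHA-256 stands in for $H$, and the random nonce is an added assumption (the notes hash only the string). The nonce is included because a short, guessable prediction could otherwise be recovered by hashing candidate strings.

```python
import hashlib
import hmac
import secrets

def commit(prediction: str):
    """Step 1: return (digest to publish, nonce to keep private until reveal)."""
    nonce = secrets.token_bytes(16)  # assumption: random nonce, not in the notes
    digest = hashlib.sha256(nonce + prediction.encode("utf-8")).hexdigest()
    return digest, nonce

def verify(digest: str, nonce: bytes, prediction: str) -> bool:
    """Step 3: recompute the hash from the revealed values and compare."""
    recomputed = hashlib.sha256(nonce + prediction.encode("utf-8")).hexdigest()
    return hmac.compare_digest(recomputed, digest)

# Step 1: publish `published`; step 2: later reveal `nonce` and the string.
published, nonce = commit("Stock will rise on May 1")
print(verify(published, nonce, "Stock will rise on May 1"))
```

Revealing a different string than the one committed to makes `verify` return `False`, unless the publisher can find a collision in the hash.
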
#### Application 2: Password Storage

Model: The system must verify a password but not store the plaintext.

Solution: Store a hash of the password.

During login:

- Hash the input
- Compare with the stored value

Example: Linux stores hashed passwords in the `/etc/shadow` file. Each entry includes:

- Salt
- Password hash
- Metadata

Security relies on:

- One-way property
- Salting to prevent precomputed attacks

#### Application 3: Trusted Timestamping and Blockchains

Goal: Prove that a document existed before a given date.

Methods:

- Publish the document hash in a newspaper.
- A Time Stamping Authority signs the hash.
- Publish the hash in a blockchain block.

Blockchain relies on:

- One-way hash functions
- Linking blocks via hash pointers

#### Application 4: Software Integrity with Secure Read-Only Space

Context: Trusted read-only public space (for example, an official website).

Process:

1. Publisher computes $H(F_1), H(F_2), \dots, H(F_n)$.
2. Publisher posts the hashes in the read-only space.
3. User downloads file $F_i$ and verifies its hash.

If $H$ is collision resistant, an attacker cannot modify a file without detection.

No encryption is required. Public verifiability works as long as the read-only space is trusted.

## Symmetric Crypto Authentication: MACs and AE

This section can also be found here: [CSE442T Introduction to Cryptography (Lecture 18)](https://notenextra.trance-0.com/CSE442T/CSE442T_L18/#chapter-5-authentication)

### Message Authentication Codes (MACs)

Definition: A MAC $I = (S, V)$ over $(K, M, T)$ consists of:

- $S(k, m) \to t$
- $V(k, m, t) \to \{\text{yes}, \text{no}\}$

Security model:

The attacker can query $S(k, m_i)$ for messages $m_i$ of its choice.

Goal: produce a new pair $(m, t)$, not previously seen, such that $V$ accepts.

$Adv_{MAC}[A, I]$ must be negligible.

### MAC from PRF

Given a PRF:

$F : K \times X \to Y$

Define the MAC:

$S(k, m) = F(k, m)$

$V(k, m, t)$ accepts if $t = F(k, m)$

Theorem: If $F$ is a secure PRF and $|Y|$ is large, then the derived MAC is secure.

Condition: $1 / |Y|$ must be negligible.

Example: $|Y| = 2^{80}$.
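
The PRF-based MAC above can be sketched in a few lines. This is an illustration under assumptions: `prf`, `sign`, and `verify` are hypothetical names, and HMAC-SHA256 is swapped in as a stand-in for the abstract PRF $F$ (the notes do not name a concrete PRF). Its 256-bit output makes $1/|Y|$ negligible.

```python
import hashlib
import hmac
import secrets

def prf(key: bytes, message: bytes) -> bytes:
    """Stand-in for F(k, m); HMAC-SHA256 is an assumption, not from the notes."""
    return hmac.new(key, message, hashlib.sha256).digest()

def sign(key: bytes, message: bytes) -> bytes:
    """S(k, m) = F(k, m)."""
    return prf(key, message)

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """V(k, m, t): accept iff t = F(k, m), compared in constant time."""
    return hmac.compare_digest(tag, prf(key, message))

key = secrets.token_bytes(32)
tag = sign(key, b"transfer 10 to alice")
print(verify(key, b"transfer 10 to alice", tag))
print(verify(key, b"transfer 99 to mallory", tag))
```

The comparison uses `hmac.compare_digest` rather than `==` so that verification time does not depend on how many leading tag bytes match.
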