# CSE5313 Coding and information theory for data science (Lecture 1)
## Example problem

Compute the sum of the data held on a set of servers even when any one of the servers is down.

-> Gradient coding for distributed machine learning.
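A minimal sketch of the idea, assuming the classical 3-worker gradient coding construction (the coefficients below are illustrative and not taken from the lecture): each worker sends one fixed linear combination of its local partial results, chosen so that the total sum can be decoded from any 2 of the 3 replies.

```python
import numpy as np

# Illustrative 3-worker scheme: each worker holds two of the three partial results
# (e.g. partial gradients) and sends one fixed linear combination of them.
rng = np.random.default_rng(0)
g1, g2, g3 = rng.standard_normal(4), rng.standard_normal(4), rng.standard_normal(4)

replies = {
    1: 0.5 * g1 + g2,   # worker 1 holds g1, g2
    2: g2 - g3,         # worker 2 holds g2, g3
    3: 0.5 * g1 + g3,   # worker 3 holds g1, g3
}

# For every pair of surviving workers there are coefficients recovering the full sum.
decoders = {
    (1, 2): {1: 2.0, 2: -1.0},   # 2*w1 -  w2  = g1 + g2 + g3
    (1, 3): {1: 1.0, 3: 1.0},    #   w1 +  w3  = g1 + g2 + g3
    (2, 3): {2: 1.0, 3: 2.0},    #   w2 + 2*w3 = g1 + g2 + g3
}

true_sum = g1 + g2 + g3
for survivors, coeffs in decoders.items():
    estimate = sum(coeffs[w] * replies[w] for w in survivors)
    assert np.allclose(estimate, true_sum)   # the sum survives any single failure
```

Any single straggling or failed server can therefore be ignored without losing the sum.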
A second example: two shares together reveal everything, while one share alone reveals nothing.

-> Scheme: let $p(x) = s + rx$, where $s$ is the secret key and $r$ is a random number. Share 1 is $p(1)$ and share 2 is $p(2)$. (This is the Shamir secret sharing scheme.)
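A minimal sketch of this two-share scheme over a prime field (the modulus and helper names are illustrative assumptions, not part of the lecture):

```python
import secrets

P = 2**61 - 1  # a prime modulus; the size is an arbitrary illustrative choice

def share(s: int) -> tuple[int, int]:
    """Split secret s into shares p(1), p(2) of the random line p(x) = s + r*x mod P."""
    r = secrets.randbelow(P)
    return (s + r) % P, (s + 2 * r) % P

def reconstruct(p1: int, p2: int) -> int:
    """Two points determine the line, and the secret is p(0) = 2*p(1) - p(2) mod P."""
    return (2 * p1 - p2) % P

secret = 123456789
s1, s2 = share(secret)
assert reconstruct(s1, s2) == secret
# A single share s + r (or s + 2r) is uniform over the field, so it reveals nothing about s.
```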
## Montage of topics

### Storage systems in the wild
- Huge amount of data worldwide.
- Grows exponentially.
- Must be stored reliably.
- Must be accessed easily.

Challenge 1: reconstructing the data from a subset of servers (a sketch follows below).

Challenge 2: repairing and maintaining the data.

Challenge 3: data access efficiency.
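To make Challenge 1 concrete, here is a minimal sketch of the simplest possible redundancy scheme, a single parity server (an illustrative example, not a construction from the lecture): any one server may be down and the data is still reconstructible.

```python
import numpy as np

# Illustrative setup: k data servers plus one parity server holding the XOR of all blocks.
k = 4
rng = np.random.default_rng(1)
blocks = [rng.integers(0, 256, size=8, dtype=np.uint8) for _ in range(k)]
parity = np.bitwise_xor.reduce(blocks)   # stored on the extra (k+1)-th server

# Server 2 goes down: XOR-ing the parity with the surviving blocks reconstructs it.
lost = 2
survivors = [b for i, b in enumerate(blocks) if i != lost]
recovered = np.bitwise_xor.reduce(survivors + [parity])
assert np.array_equal(recovered, blocks[lost])
```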
### Private information retrieval

Retrieving data from a database without revealing which data item is retrieved.

Silly solution: download everything.
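By contrast, here is a minimal sketch of the classical 2-server XOR scheme (assuming two non-colluding servers holding identical copies of the database; the variable names are illustrative): each server sees only a uniformly random subset of indices, yet the XOR of the two answers is exactly the requested record.

```python
import secrets

db = [7, 42, 13, 99, 5, 21, 8, 64]   # the same database is replicated on both servers
n = len(db)

def server_answer(query: set) -> int:
    """Each server returns the XOR of the records whose indices appear in the query."""
    ans = 0
    for j in query:
        ans ^= db[j]
    return ans

def retrieve(i: int) -> int:
    q1 = {j for j in range(n) if secrets.randbits(1)}   # uniformly random subset
    q2 = q1 ^ {i}                                       # same subset with index i toggled
    # Each query on its own is a uniformly random subset, so neither server learns i.
    return server_answer(q1) ^ server_answer(q2)

assert all(retrieve(i) == db[i] for i in range(n))
```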
### Coding for distributed computation

Problems in distributed computation:

- Stragglers: roughly 5% of the servers are 5x-8x slower than the rest (see the sketch after this list).
- Privacy of the data.
- Hardware failures.
- Adversaries: a single adversary can ruin the convergence of the computation.
- Privacy of the computation.
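The sketch below illustrates the straggler point with a toy coded matrix-vector product (a generic 3-worker example, not a scheme defined in the lecture): one redundant worker lets the master finish after hearing from any 2 of the 3.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
x = rng.standard_normal(4)

A1, A2 = A[:3], A[3:]                  # split the rows between workers 1 and 2
tasks = {1: A1, 2: A2, 3: A1 + A2}     # worker 3 computes a redundant combination
replies = {w: M @ x for w, M in tasks.items()}

# The master needs only the fastest two replies; any single straggler is ignored.
assert np.allclose(np.concatenate([replies[1], replies[2]]), A @ x)               # 3 is slow
assert np.allclose(np.concatenate([replies[1], replies[3] - replies[1]]), A @ x)  # 2 is slow
assert np.allclose(np.concatenate([replies[3] - replies[2], replies[2]]), A @ x)  # 1 is slow
```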
### Network coding

Passing information over a network of nodes.

### DNA storage

Data is stored as strings over the nucleotide alphabet {G, T, A, C}; a toy encoding sketch appears at the end of this subsection.

- Super-dense
- Super-durable
- Non-volatile (no extra energy required to maintain the data)
- Future proof (may be of interest to future generations)

Challenges:

- DNA synthesis is hard; only short strings (about 1000 nucleotides) can be synthesized.
- Very noisy.
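As a toy illustration of the storage idea (the 2-bits-per-nucleotide mapping below is just one possible convention, not necessarily the one used in this course):

```python
# Illustrative 2-bits-per-nucleotide mapping; real systems also enforce constraints
# (e.g. no long runs of the same base) and add heavy error correction on top.
TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
TO_BITS = {base: bits for bits, base in TO_BASE.items()}

def encode(data: bytes) -> str:
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand: str) -> bytes:
    bits = "".join(TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = encode(b"hi")
assert decode(strand) == b"hi"   # strand == "CGGACGGC"
```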
## Topics covered in this course

Part 1: Foundations

Foundations:

- Finite fields, linear codes, information theory

Distributed storage:

- Regenerating codes, locally recoverable codes

Privacy:

- Secret sharing, multi-party computation, private information retrieval

Midterm: (short and easy... I guess)
Part 2: Contemporary topics

Computation:

- Coded computation, gradient coding, private computation

Emerging/advanced topics:

- DNA storage
## Administration

Grading:

- 3-5 HW assignments (50%)
- Midterm (25%)
- Final assignment (25%)
  - For a recent paper: read, summarize, criticize, and propose future work.

Textbooks:

- Introduction to Coding Theory, R. M. Roth.
- Elements of Information Theory, T. M. Cover and J. A. Thomas.

The slides for every lecture will be made available online.
### Clarification about GenAI

The use of generative artificial intelligence tools (GenAI) is permitted. However, students are not permitted to ask GenAI for complete solutions, and must limit their interaction with GenAI to light document editing (grammar, typos, etc.), understanding background information, and seeking alternative explanations for course material. In any case, submission of AI-generated text is prohibited, and up to light editing, students must write all submitted text themselves.
IMPORTANT:

- In all submitted assignments and projects, students must **include a "Use Of GenAI" paragraph**, which briefly summarizes their use of GenAI in preparing this assignment/project.
- Failure to include this paragraph, or including untruthful statements, will be considered a violation of academic integrity.
- The course staff reserves the right to summon any student for an oral exam regarding the content of any submitted assignment or project, and adjust their grade accordingly. The exam will focus on explaining the reasoning behind the work, and no memorization will be required. Students will be summoned at random, and not necessarily on the basis of any accusation or suspicion.
## Channel coding

Input alphabet $F$, output alphabet $\Phi$.

e.g. $F=\{0,1\}$ or $F=\mathbb{R}$.

The channel introduces noise: $\operatorname{Pr}(c'\text{ received} \mid c\text{ transmitted})$.

We use $u$ to denote the information word to be transmitted, $c$ the transmitted codeword, $c'$ the received word given to the decoder, and $u'$ the decoded information word.

A decoding error occurs if $u' \neq u$.

A channel is defined by the triple $(F, \Phi, \operatorname{Pr}(c'|c))$.
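A minimal end-to-end illustration of these definitions, assuming a binary symmetric channel and a 3-fold repetition code (this specific channel and code are examples chosen here, not yet defined in the lecture): the encoder maps $u$ to $c$, the channel produces $c'$, and the decoder outputs $u'$.

```python
import random

random.seed(0)
p = 0.1                             # BSC crossover probability Pr(symbol flipped)

def encode(u: int) -> list:         # information bit u -> codeword c
    return [u, u, u]

def channel(c: list) -> list:       # flips each symbol independently with probability p
    return [bit ^ (random.random() < p) for bit in c]

def decode(c_prime: list) -> int:   # majority vote -> decoded bit u'
    return int(sum(c_prime) >= 2)

trials, errors = 100_000, 0
for _ in range(trials):
    u = random.getrandbits(1)
    u_prime = decode(channel(encode(u)))
    errors += int(u_prime != u)
print(errors / trials)              # close to 3*p**2 - 2*p**3 = 0.028, well below p = 0.1
```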