Commit 4ed190aab8 (parent 4c964e13f9) by Trance-0, 2025-08-26 14:18:11 -05:00.
2 changed files with 157 additions and 1 deletion.

# CSE5313 Coding and information theory for data science (Lecture 1)
## Example problem
Get the sum of the original server data when any one of the servers is down.
-> Gradient coding for distributed machine learning.
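A minimal sketch of the "sum survives one failure" idea (a hypothetical three-server parity layout for illustration, not the gradient coding scheme itself): servers 1 and 2 hold data blocks $x_1$ and $x_2$, and server 3 holds the parity $x_1+x_2$, so the sum is recoverable from any two of the three servers.

```python
# Hypothetical layout: server 1 holds x1, server 2 holds x2,
# server 3 holds the parity x1 + x2.
def recover_sum(available):
    """`available` maps server id -> stored value; any 2 of 3 servers suffice."""
    parity = available.get(3)
    if parity is not None:
        return parity                      # server 3 alone already stores the sum
    return available[1] + available[2]     # both data servers are up

assert recover_sum({1: 5, 2: 7}) == 12     # server 3 down
assert recover_sum({1: 5, 3: 12}) == 12    # server 2 down
assert recover_sum({2: 7, 3: 12}) == 12    # server 1 down
```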
Two shares together reveal everything, and one share alone reveals nothing.
-> Scheme: let $p(x)=s+rx$, where $s$ is the secret (the final key) and $r$ is a uniformly random number. Share 1 is $p(1)$ and share 2 is $p(2)$. (The Shamir secret sharing scheme.)
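A minimal sketch of this 2-out-of-2 scheme over a prime field (the prime $P$ and the helper names are illustrative choices, not part of the lecture):

```python
import random

P = 2**61 - 1  # a large prime; all arithmetic is modulo P (illustrative choice)

def share(s):
    """Split secret s into two shares of p(x) = s + r*x over GF(P)."""
    r = random.randrange(P)            # uniformly random coefficient
    return (1, (s + r) % P), (2, (s + 2 * r) % P)

def reconstruct(share1, share2):
    """Recover s = p(0) by Lagrange interpolation through the two shares."""
    (x1, y1), (x2, y2) = share1, share2
    inv = pow(x2 - x1, -1, P)          # modular inverse of (x2 - x1)
    l1 = (x2 * inv) % P                # Lagrange basis l1 evaluated at 0
    l2 = (-x1 * inv) % P               # Lagrange basis l2 evaluated at 0
    return (y1 * l1 + y2 * l2) % P

s = 42
a, b = share(s)
assert reconstruct(a, b) == s          # both shares together recover the secret
```

A single share such as $(1, s+r)$ is uniformly distributed because $r$ is uniform, which is exactly the "one share gives nothing" property.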
## Montage of topics
### Storage system in the Wild
- Huge amount of data worldwide.
- Grows exponentially.
- Must be stored reliably
- Must be accessed easily
Challenge 1: reconstructing the data from subset of servers.
Challenge 2: repair and maintain the data.
Challenge 3: data access efficiency.
### Private information retrieval
Retrieving data from a database without revealing which item is retrieved.
Trivial solution: download the entire database.
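One classical improvement over downloading everything is the 2-server XOR scheme, sketched below under the assumption of two non-colluding servers (function names are illustrative): the user sends a uniformly random index set to one server and the same set with the desired index flipped to the other; each server answers with the XOR of the requested records, and the two answers XOR to the desired record. Each server alone sees a uniformly random set, so neither learns which index was queried.

```python
import random

def xor_subset(db, indices):
    """Server side: XOR together the requested records."""
    out = 0
    for j in indices:
        out ^= db[j]
    return out

def pir_retrieve(db, i):
    """2-server PIR: each server sees a uniformly random subset of indices."""
    n = len(db)
    S1 = {j for j in range(n) if random.random() < 0.5}
    S2 = S1 ^ {i}                   # symmetric difference: flip membership of i
    a1 = xor_subset(db, S1)         # answer from server 1
    a2 = xor_subset(db, S2)         # answer from server 2
    return a1 ^ a2                  # everything cancels except db[i]

db = [7, 13, 99, 4]
assert pir_retrieve(db, 2) == 99
```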
### Coding for distributed computation
Problems in distributed computation:
- Stragglers: roughly 5% of the servers are 5x-8x slower than the rest.
- Privacy of the data
- Hardware failure
- Adversaries: a single adversary can corrupt the result of the entire computation.
- Privacy of computation
### Network coding
Passing information over a network of nodes.
### DNA storage
Alphabet: the four nucleotides G, T, A, C.
- Super-dense
- Super-durable
- Non-volatile (no extra energy required to maintain the data)
- Future-proof (future generations will still be able to read it)
Challenges:
- DNA synthesis is hard; only short strings (about 1000 nucleotides) can be synthesized.
- Very noisy.
## Topics covered in this course
Part 1: Foundations
Foundations:
- Finite fields, linear codes, information theory
Distributed storage:
- Regenerating codes, locally recoverable codes
Privacy:
- secret sharing, multi-party computation, private information retrieval
Midterm: (short and easy... I guess)
Part 2: Contemporary topics
Computation:
- coded computation, gradient coding, private computation
Emerging/advanced topics:
- DNA storage
## Administration
Grading:
3-5 HW assignments (50%)
Midterm (25%)
Final assignment (25%)
- For a recent paper: read, summarize, criticize, and propose future work.
Textbooks:
- Introduction to Coding Theory, R. M. Roth.
- Elements of Information Theory, T. M. Cover and J. A. Thomas.
The slides for every lecture will be made available online.
### Clarification about Gen AI
The use of generative artificial intelligence tools (GenAI) is permitted. However, students are
not permitted to ask GenAI for complete solutions, and must limit their interaction with GenAI
to light document editing (grammar, typos, etc.), understanding background information, and to
seeking alternative explanations for course material. In any case, submission of AI generated
text is prohibited, and up to light editing, students must write all submitted text themselves.
IMPORTANT:
- In all submitted assignments and projects, students must **include a "Use Of GenAI" paragraph**, which briefly summarizes their use of GenAI in preparing this assignment/project.
- Failure to include this paragraph, or including untruthful statements, will be considered a violation of academic integrity.
- The course staff reserves the right to summon any student for an oral exam regarding the content of any submitted assignment or project, and adjust their grade accordingly. The exam will focus on explaining the reasoning behind the work, and no memorization will be required. Students will be summoned at random, and not necessarily on the basis of any accusation or suspicion.
## Channel coding
Input alphabet $F$, output alphabet $\Phi$.
e.g. $F=\{0,1\}$ or $F=\mathbb{R}$.
The channel introduces noise: $\operatorname{Pr}(c'\text{ received}\mid c\text{ transmitted})$.
Notation:
- $u$ denotes the information word to be transmitted.
- $c$ is the transmitted codeword.
- $c'$ is the received word, which is given to the decoder.
- $u'$ is the decoded information word.
An error occurs if $u' \neq u$.
A channel is defined by the triple $(F, \Phi, \operatorname{Pr}(c'|c))$.
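A small simulation of these definitions for the binary symmetric channel with a 3-fold repetition code ($F=\Phi=\{0,1\}$; the code choice and helper names are illustrative, not from the lecture):

```python
import random

def bsc(bits, p):
    """Binary symmetric channel: flip each bit independently with probability p."""
    return [b ^ (random.random() < p) for b in bits]

def encode(u, n=3):
    """Repetition code: the codeword c repeats each information bit n times."""
    return [b for b in u for _ in range(n)]

def decode(c_prime, n=3):
    """Majority vote over each block of n received bits."""
    return [int(sum(c_prime[k:k + n]) > n // 2)
            for k in range(0, len(c_prime), n)]

u = [1, 0, 1, 1]                   # information word
c_prime = bsc(encode(u), p=0.05)   # received word c' after channel noise
u_prime = decode(c_prime)          # decoded information word
# an error (u_prime != u) requires at least 2 flips within some block of 3
```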

# CSE510 Deep Reinforcement Learning (Lecture 1)
# CSE5519 Lecture 1
## What is the class about?
Recent research trends and how they work.
What is important to work on, and how that relates to what you are working on.
Engage in deep technical discussion about computer vision.
How does the CV community work?
## Grading
Topic presentation: 25%
you are encouraged to discuss the topic you are presenting with others, but you are responsible for creating the presentation materials and presenting the paper. If you use outside materials, they must be fully acknowledged.
Example notebooks: 15%
Two notebooks, done independently. Implement
Course project: 50%
Team of 2-3 students.
Participation: 10%