This is an overview of the mandatory readings for the exam. The syllabus consists of the lecture slides, the weekly exercises, and the mandatory assignments, along with the additional readings described below.
Jurafsky & Martin 3rd ed. (August 2024 version):
- Chapter 2 (text normalization): except 2.1 and 2.8
- Chapter 3 (n-gram LMs): except 3.7
- Chapter 4 (Naïve Bayes): except 4.9
- Chapter 5 (logistic regression): except 5.10
- Chapter 6 (vectors and embeddings): except 6.6, 6.10, 6.12
- Chapter 7 (neural networks): except 7.5 and 7.7
- Chapter 9 (Transformers): except 9.3
- Chapter 10 (LLMs): whole chapter
- Chapter 12 (Model Alignment, Prompting, and In-Context Learning): whole chapter
- Chapter 13 (machine translation): except 13.1, 13.4.1
- Chapter 15 (chatbots and dialogue systems): whole chapter
- Appendix H (Phonetics) and Chapter 16 (ASR and TTS): all except 16.4.1-16.4.4 and 16.6.2-16.6.3. You also don't need to know the details of articulatory phonetics (just what has been covered in the lecture).
- Chapter 17 (sequence labeling): except 17.7
Other obligatory readings:
- On Zipf's law, collocations, type-token ratio, etc.:
- Manning & Schütze, Foundations of Statistical Natural Language Processing, chapter 1 (PDF on the website).
- On ranking (covered in the first lecture on dialogue systems):
- Ransaka Ravihara, What Is Learning to Rank: A Beginner’s Guide to Learning to Rank Methods, Towards Data Science.
- On decoding (covered in the second lecture on dialogue systems):
- Fabio Chiusano, Most used Decoding Methods for Language Models, Medium.
- On MDPs:
- Section 24.6 from the Dialogue chapter of the 2nd edition of Jurafsky & Martin.
- On fairness:
- Ziyuan Zhong, "A tutorial on Fairness in Machine Learning", Towards Data Science. NB: you can skip Section 5 of the text.
- On privacy:
- Chapter 2 of Domingo-Ferrer, J., Sánchez, D., & Soria-Comas, J. (2016). Database anonymization: privacy models, data utility, and microaggregation-based inter-model connections. Synthesis Lectures on Information Security, Privacy, & Trust, 8(1), 1-136. NB: You can skip the technical details on measuring information loss.
Formulas:
We expect you to know the formulas listed below. Most important, however, is to understand the logic behind them and to be able to explain how they should be applied and what they are used for. A compact reference sketch of several of them is given after the list.
- Zipf’s laws, type-token ratio
- Language model training, additive smoothing, interpolation, perplexity
- Accuracy, precision, recall, F-measure, micro- and macro-averaging
- Bayes’ theorem, Naïve Bayes training and prediction formulas
- Softmax, logistic regression update rule
- HMM training formula, greedy inference formula
- Cosine similarity, TF-IDF weighting
- Sigmoid function, ReLU, cross-entropy loss
- Self-attention
- Bellman equation (and the definition of MDPs)
- BLEU score
- Word error rate
- Formulas for group fairness
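For reference, the sketch below shows how several of these formulas are commonly written, roughly in the notation of Jurafsky & Martin (3rd ed.). Conventions such as the logarithm base, the smoothing constant, or the exact weighting scheme may differ slightly from the lecture slides, so treat this as a reminder rather than the definitive exam formulation.

% Type-token ratio and perplexity of a language model
\[
\mathrm{TTR} = \frac{|\text{types}|}{|\text{tokens}|},
\qquad
\mathrm{PP}(w_1 \dots w_N) = P(w_1 \dots w_N)^{-1/N}
\]
% Additive smoothing, illustrated for the bigram case
\[
P(w_i \mid w_{i-1}) = \frac{C(w_{i-1} w_i) + \alpha}{C(w_{i-1}) + \alpha\,|V|}
\]
% Classification evaluation
\[
P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \quad F_1 = \frac{2PR}{P + R}
\]
% Naive Bayes prediction (argmax over classes)
\[
\hat{c} = \arg\max_{c} \; P(c) \prod_{i} P(w_i \mid c)
\]
% Softmax, sigmoid, and ReLU
\[
\mathrm{softmax}(\mathbf{z})_i = \frac{e^{z_i}}{\sum_j e^{z_j}},
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}},
\qquad
\mathrm{ReLU}(z) = \max(0, z)
\]
% Cosine similarity and tf-idf weighting
\[
\cos(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert},
\qquad
w_{t,d} = \mathrm{tf}_{t,d} \cdot \log \frac{N}{\mathrm{df}_t}
\]
% Scaled dot-product self-attention
\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
\]
% Bellman equation for the optimal value function of an MDP
\[
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\big[ R(s, a, s') + \gamma\, V^{*}(s') \big]
\]
% Word error rate: substitutions, deletions, insertions over reference length
\[
\mathrm{WER} = \frac{S + D + I}{N}
\]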
Other useful links for exam preparation: