Winter 2023
Stanford University
CS 224N provides an in-depth introduction to neural networks for NLP, focusing on end-to-end neural models. The course covers topics including word vectors, recurrent neural networks, and transformer models.
Natural language processing (NLP) is a crucial part of artificial intelligence (AI), modeling how people share information. In recent years, deep learning approaches have obtained very high performance on many NLP tasks. In this course, students gain a thorough introduction to cutting-edge neural networks for NLP.
Proficiency in Python: All class assignments will be in Python (using NumPy and PyTorch). If you need to remind yourself of Python, or you're not very familiar with NumPy, you can come to the Python review session in week 1 (listed in the schedule). If you have a lot of programming experience but in a different language (e.g. C/C++/MATLAB/Java/JavaScript), you will probably be fine.
College Calculus, Linear Algebra (e.g. MATH 51, CME 100): You should be comfortable taking (multivariable) derivatives and understanding matrix/vector notation and operations.
Basic Probability and Statistics (e.g. CS 109 or equivalent): You should know the basics of probabilities, Gaussian distributions, mean, standard deviation, etc.
Foundations of Machine Learning (e.g. CS221, CS229, CS230, or CS124): We will be formulating cost functions, taking derivatives and performing optimization with gradient descent. If you already have basic machine learning and/or deep learning knowledge, the course will be easier; however, it is possible to take CS224N without it. There are many introductions to ML, in webpage, book, and video form. One approachable introduction is Hal Daumé’s in-progress A Course in Machine Learning. Reading the first 5 chapters of that book would be good background. Knowing the first 7 chapters would be even better!
Natural language processing (NLP) or computational linguistics is one of the most important technologies of the information age. Applications of NLP are everywhere because people communicate almost everything in language: web search, advertising, emails, customer service, language translation, virtual agents, medical reports, politics, etc. In the last decade, deep learning (or neural network) approaches have obtained very high performance across many different NLP tasks, using single end-to-end neural models that do not require traditional, task-specific feature engineering. In this course, students will gain a thorough introduction to cutting-edge research in Deep Learning for NLP. Through lectures, assignments and a final project, students will learn the necessary skills to design, implement, and understand their own neural network models, using the PyTorch framework.
“Take it. CS221 taught me algorithms. CS229 taught me math. CS224N taught me how to write machine learning models.” – A CS224N student on Carta
The following texts are useful, but none are required. All of them can be read free online.
If you have no background in neural networks but would like to take the course anyway, you might well find one of these books helpful to give you more background: