CS 124: From Languages to Information

Winter 2023

Stanford University

This course is centered on extracting information from unstructured data in language and social networks using machine learning tools. It covers techniques like sentiment analysis, chatbot development, and social network analysis.

14 covered concepts

Slides / notes available

No videos available

Assignments available

Other resources available

Overview

The online world has a vast array of unstructured information in the form of language and social networks. Learn how to make sense of it using neural networks and other machine learning tools, and how to interact with humans via language, from answering questions to giving advice!

Prerequisites

CS106B. CS 107 can be helpful, but it's fine if you haven't had it; we'll cover the required UNIX material. Math 51 can also be helpful but isn't required, since we will introduce the basic vector knowledge we need in class.

Learning objectives

The course covers extracting meaning, information, and structure from human language text, speech, web pages, and social networks. It introduces methods (string algorithms, edit distance, language modeling, machine learning, logistic regression, neural networks, neural embeddings, inverted indices, collaborative filtering, PageRank), applications (chatbots, sentiment analysis, information retrieval, text classification, social networks, recommender systems), and ethical issues.

Textbooks and other notes

Textbook

There is no required textbook, but I'll expect you to know the textbook/reading material listed above, and will test it on the midterms.

New online chapters from Jurafsky and Martin. Speech and Language Processing, third edition (in progress).
Chapters from Manning, Raghavan, and Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press. You can buy the book or get it from the library; it's also available online.

Is 106B the only prereq? Do I need 109 or 221 before I take CS124?: 106B is the only prereq. Taking the course as a sophomore is recommended, but we also get lots of juniors and a reasonable number of frosh; the course is designed to be taken early in your Stanford career. It will help if you have done at least some programming beyond 106B, and it is also useful to have had 107 or Math 51, but neither is required; we'll try to give you pointers to places to make up missing background.

Can I take this course as a non-CS grad student?: Yes, although this course is not appropriate for CS grad students (there are graduate versions of all the material in this course), it's very commonly taken by PhD students in the social sciences or humanities who plan to use text processing methods in their research.

Courseware availability

Lecture slides available at Schedule

No videos available

Homework available at Schedule

Readings available at Schedule

Covered concepts

Chatbots; Information Retrieval; Language Modeling; Logistic Regression; Minimum Edit Distance; Naive Bayes; Neural network; Regular Expressions; Sentiment Analysis; Sequence Labelling; Social Networks; Text classification; Vector Semantics and Embeddings; Web graphs, Links, and PageRank

Other courses in Natural Language Processing

11-411/611 Natural Language Processing

CSE 447 and 517 Natural Language Processing

CS 224N: Natural Language Processing with Deep Learning

COS 484: Natural Language Processing
