Syllabus

Course description

The course will cover fundamental concepts and algorithms in computational linguistics and natural language processing. We will explore computational analysis from the word level to the sentence level, covering text classification, part-of-speech tagging, probabilistic parsing, distributional lexical semantics, and neural networks. The lab component of the course will introduce the necessary mathematical concepts (linear algebra, differential and vector calculus) and computational libraries (NumPy and PyTorch). The assignments will be done in Python. If you do not already know how to program in Python, you will have to learn quickly during the first week of class.

Learning Goals

Upon completion of the course, students will understand the fundamentals of processing natural language using some of the most important machine learning algorithms in NLP and AI. These include Naive Bayes and logistic regression classifiers; perceptrons, multilayer perceptrons, and structured perceptrons; and recurrent neural network architectures. Students will also learn how to create word embeddings that can be used in language models for a wide variety of NLP applications. Students will become proficient in NumPy for implementing these algorithms and will become very familiar with the standard neural network and machine learning libraries and development platforms.

Grading

There will be one midterm quiz and a final exam.

Assignments are worth 75% of your grade. The quizzes are worth 20%. Class participation is worth 5%.

If you are a student with a documented disability on record at Brandeis University and wish to have reasonable accommodations made for you in this class, please see me immediately.

Success in this 4-credit-hour course is based on the expectation that students will spend a minimum of 9 hours of study time per week in preparation for class (readings, papers, discussion sections, preparation for exams, etc.).

Prerequisites: COSI 114a, plus at least one of LING 160b, MATH 10a, MATH 10b, MATH 15a, MATH 20a, MATH 22a, or equivalent knowledge.

This course is required for Computational Linguistics MS students.