Course Description

This course covers the fundamentals of structured and unstructured data analysis for text and multimedia content exploration, with an emphasis on vector space representations and deep learning models. It focuses on machine learning and algorithms suitable for these tasks, and covers both applications and scholarship. Students are expected to have previous exposure to data mining and machine learning, and to be comfortable programming in Python.

Instructor

Dr. Andrew Gardner (andywocky@gmail.com)

TAs & Office Hours

OH Location: CSE common area (next to KLAUS 1324)

OH Time: W 10:00AM - 12:00PM

Textbook

There is no formal textbook for this class. We will provide links for materials as part of lectures.

Piazza & T-square

We will use Piazza for discussion and all announcements. Post your questions there. Our teaching staff and your fellow classmates will help answer them quickly. You can also use Piazza to find project teammates.

T-square will be used for submission of assignments and projects.

Collaboration

We welcome everyone to share their experiences in tackling issues and helping each other out, but please do not post your answers, as that may affect the learning experience of your fellow classmates.

You are allowed to loosely collaborate with fellow students, but your assignments should be worked on independently. Do not share answers with others. All GT students must observe the honor code. Any suspected plagiarism or academic misconduct will be reported and directly handled by the Office of Student Integrity (OSI).

Grading

Homework (50% of Grade)

We plan to have 5 homework assignments in total (HW 1 - 5), consisting of a mix of exercises, programming, and other activities. Each homework will be due approximately two weeks after release.

Projects (30% of Grade)

We will assign one major project for the course, due before Spring Break. It will be more significant in scope than the homework assignments and will require substantial programming effort. For example, you may be asked to create a web application for a movie recommendation system, or an active learning image annotation tool.

Project teams of four will be randomly assigned. More details on project grading will be provided when the project is released.

Research Paper (20% of Grade)

One of the key skills a graduate student should master is the ability to conduct scholarly research on a topic. Each student will be required to prepare a research paper of 10-15 pages in length. Topics and format will be discussed after Spring Break.

Late Submission Policy

Homework can be submitted up to 1 week late at a penalty of 10% per 24-hour period overdue. Projects can be submitted up to 1 week late at a flat penalty of 30%. After 1 week the project will be scored as 0%.
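As a rough illustration of the arithmetic, the policy above could be sketched in Python as follows. This is only one possible reading, not an official grading script. It assumes that any started 24-hour period counts in full, that penalties are subtracted as percentage points from the raw score, and that homework more than 1 week late also receives no credit; the syllabus does not state these details, so check with the teaching staff. The function names are hypothetical.

```python
import math

WEEK_HOURS = 7 * 24


def late_penalty_points(hours_late: float, kind: str) -> float:
    """Return the late penalty in percentage points (100 means no credit)."""
    if kind not in ("homework", "project"):
        raise ValueError("kind must be 'homework' or 'project'")
    if hours_late <= 0:
        return 0.0
    if hours_late > WEEK_HOURS:
        # Stated for projects; assumed here to apply to homework as well.
        return 100.0
    if kind == "project":
        return 30.0  # flat penalty within the first week
    # Assumption: a partially elapsed 24-hour period counts as a full one.
    periods = math.ceil(hours_late / 24)
    return 10.0 * periods


def adjusted_score(raw_score: float, hours_late: float, kind: str) -> float:
    """Apply the penalty to a raw score out of 100, never going below zero."""
    return max(0.0, raw_score - late_penalty_points(hours_late, kind))
```

Under these assumptions, a homework scored 90 and submitted 30 hours late would lose 20 points, while a project submitted 2 days late would lose 30.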

Schedule

Week  Date    Topic                                          Event
1     Jan 09  Abbreviated course (syllabus out)
      Jan 11  Course Overview and Linear Algebra Review
2     Jan 16  MLK Holiday – No Class
      Jan 18  Data representations & text encoding
3     Jan 23  Embeddings 1
      Jan 25  Embeddings 2
4     Jan 30  Machine Learning Review
      Feb 01  Recommender Systems 1
5     Feb 06  Recommender Systems 2
      Feb 08  Deep learning concepts & overview              HW 1 due
6     Feb 13  Guest lecture 1: TensorFlow 1 & autoencoders
      Feb 15  Guest lecture 2: TensorFlow 2
7     Feb 20  Training & (hyper) parameter tuning
      Feb 22  Experiment design and performance measurement
8     Feb 27  Deep learning: CNNs
      Mar 01  Deep learning: LSTMs 1
9     Mar 06  Deep learning: LSTMs 2
      Mar 08  Guest lecture 3: Bayesian techniques
10    Mar 13  Topic modeling
      Mar 15  Sentiment analysis                             Project due
11    Mar 20  Spring Break
      Mar 22
12    Mar 27  Guest lecture 4: Clustering                    Paper out
      Mar 29  Text clustering                                HW 5 out
13    Apr 03  Exploration vs. exploitation & bandits
      Apr 05  Probabilistic data structures
14    Apr 10  Count-min sketch                               HW 5 due
      Apr 12  Advanced Topics TBD
15    Apr 17  Advanced Topics TBD                            Paper due
      Apr 19  Advanced Topics TBD
16    Apr 24  Guest lecture 5: TBD
      Apr 26  No class

Advanced topics lectures will be facilitated, interactive classroom discussions of advanced papers related to text mining and natural language processing, deep learning and artificial intelligence.