This course will cover the fundamentals of structured and unstructured data analysis for text and multimedia content exploration, with an emphasis on vector space representations and deep learning models. It will focus on machine learning and algorithms suitable for these tasks, and cover both applications and scholarship. Students are expected to have previous exposure to data mining and machine learning, and to be comfortable programming in Python.
Dr. Andrew Gardner andywocky@gmail.com
OH Location: CSE common area (next to KLAUS 1324)
OH Time: W 10:00AM - 12:00PM
There is no formal textbook for this class. We will provide links for materials as part of lectures.
We will use Piazza for discussion and all announcements. Post your questions there. Our teaching staff and your fellow classmates will help answer them quickly. You can also use Piazza to find project teammates.
T-square will be used for submission of assignments and projects.
We welcome everyone to share their experiences in tackling issues and helping each other out, but please do not post your answers, as that may affect the learning experience of your fellow classmates.
You are allowed to loosely collaborate with fellow students, but your assignments should be worked on independently. Do not share answers with others. All GT students must observe the honor code. Any suspected plagiarism or academic misconduct will be reported and directly handled by the Office of Student Integrity (OSI).
We plan to have 5 homework assignments in total, HW 1 - 5, which will consist of a mix of exercises, programming, and other activities. Each homework will be due approximately two weeks after release.
We will assign one major project for the course, due before Spring Break. This will be more significant in scope than the homeworks, and will require substantial programming effort. For example, you may be asked to create a web application for a movie recommendation system, or an active learning image annotation tool.
Project teams of four will be randomly assigned. More details on project grading will be provided when the project is released.
One of the key skills a graduate student should master is the ability to conduct scholarly research on a topic. Each student will be required to prepare a research paper 10-15 pages in length. Topics and format will be discussed after Spring Break.
Homework can be submitted up to 1 week late at a penalty of 10% per 24-hour period overdue. Projects can be submitted up to 1 week late at a flat penalty of 30%. After 1 week, the project will be scored as 0%.
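As a rough illustration of the late policy, the following Python sketch computes the percentage of credit retained. It rests on assumptions the policy does not state: the function name `late_percent_kept` is hypothetical, any partial 24-hour period counts as a full one, the penalty is subtracted in percentage points, and homework more than 1 week late receives 0% (the policy says this explicitly only for projects). The teaching staff's actual calculation may differ.

```python
import math

WEEK_HOURS = 7 * 24  # late submissions are accepted for up to 1 week


def late_percent_kept(hours_late: float, kind: str) -> int:
    """Return the percentage of credit kept (0-100) under the assumed reading."""
    if kind not in ("homework", "project"):
        raise ValueError("kind must be 'homework' or 'project'")
    if hours_late <= 0:
        return 100  # submitted on time
    if hours_late > WEEK_HOURS:
        # Stated for projects; assumed here for homework as well.
        return 0
    if kind == "project":
        return 70  # flat 30% penalty
    # Homework: 10% per 24-hour period overdue.
    # Assumption: a partial period counts as a full period.
    periods = math.ceil(hours_late / 24)
    return 100 - 10 * periods


if __name__ == "__main__":
    for hours in (0, 5, 30, 168, 169):
        print(hours, late_percent_kept(hours, "homework"), late_percent_kept(hours, "project"))
```

Under these assumptions, a homework submitted 30 hours late falls in its second overdue period and keeps 80% of its credit, while a project submitted at any point within the week keeps 70%.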
| Week | Date | Topic | Event |
|---|---|---|---|
| 1 | Jan 09 | Abbreviated course (syllabus out) | |
| | Jan 11 | Course overview and linear algebra review | |
| 2 | Jan 16 | MLK Holiday – No class | |
| | Jan 18 | Data representations & text encoding | |
| 3 | Jan 23 | Embeddings 1 | |
| | Jan 25 | Embeddings 2 | |
| 4 | Jan 30 | Machine learning review | |
| | Feb 01 | Recommender systems 1 | |
| 5 | Feb 06 | Recommender systems 2 | |
| | Feb 08 | Deep learning concepts & overview | HW 1 due |
| 6 | Feb 13 | Guest lecture 1: TensorFlow 1 & autoencoders | |
| | Feb 15 | Guest lecture 2: TensorFlow 2 | |
| 7 | Feb 20 | Training & (hyper)parameter tuning | |
| | Feb 22 | Experiment design and performance measurement | |
| 8 | Feb 27 | Deep learning: CNNs | |
| | Mar 01 | Deep learning: LSTMs 1 | |
| 9 | Mar 06 | Deep learning: LSTMs 2 | |
| | Mar 08 | Guest lecture 3: Bayesian techniques | |
| 10 | Mar 13 | Topic modeling | |
| | Mar 15 | Sentiment analysis | Project due |
| 11 | Mar 20 | Spring Break | |
| | Mar 22 | Spring Break | |
| 12 | Mar 27 | Guest lecture 4: Clustering | Paper out |
| | Mar 29 | Text clustering | HW 5 out |
| 13 | Apr 03 | Exploration vs. exploitation & bandits | |
| | Apr 05 | Probabilistic data structures | |
| 14 | Apr 10 | Count-min sketch | HW 5 due |
| | Apr 12 | Advanced topics TBD | |
| 15 | Apr 17 | Advanced topics TBD | Paper due |
| | Apr 19 | Advanced topics TBD | |
| 16 | Apr 24 | Guest lecture 5: TBD | |
| | Apr 26 | No class | |
Advanced topics lectures will be facilitated, interactive classroom discussions of advanced papers related to text mining and natural language processing, deep learning and artificial intelligence.