Machine Learning for Large-Scale Data Analysis and Decision Making (MATH80629A): Fall 2021

Course Description

Welcome to MATH80629A, a graduate-level introduction to machine learning at HEC Montreal. This is the English edition of the course; for the French edition, please check here. In this course, we will study machine learning models, a type of statistical analysis that focuses on prediction, for analyzing very large datasets (“big data”). We will survey different machine learning techniques (supervised, unsupervised, and reinforcement learning) as well as some applications (e.g., recommender systems). We will also study large-scale machine learning and discuss distributed computational frameworks (Hadoop and Spark).

Course Format

Due to the hybrid nature of the semester, this course will be given as a flipped classroom, an instructional strategy in which students learn the material before they come to class. The material will be a mix of readings and video capsules. Class time is reserved for active work such as problem solving, demonstrations, and question answering. Each class will also include a short summary of the week’s material.

Time & room

Mondays 8:30 am - 11:30 am
Room: Manuvie. This classroom is located on the 1st floor of the Côte-Sainte-Catherine building.
Online: Zoom link. Rooms are password-protected. Reach out to me by email if you need the password.

Feedback

Please use this form to provide feedback about the course.

Prerequisites

Mathematical maturity and basic knowledge of statistics and probability will be assumed. For the programming assignments and the project, Python programming will be assumed. If you do not know Python, here are a few ways to learn the basics:

Data Camp: Complete Chapters 1, 2, and 3 (sign in using this link with your @hec.ca email address to access Chapters 2 and 3). This option is recommended.
HEC CAM offers introductory Python courses in September (currently only in French). Register at CAM registration.
Fall 2018 tutorial. This will give you an idea of the level that is expected for this course.

In addition, a machine learning tutorial using Python will be provided in week 5.

Grading

Your final score for the course will be computed using the following weights:

Homework (20%)
Capsule quizzes (10%)
Project (30%)
Project presentation (10%)
Final Exam (30%)
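
As a rough illustration of how these weights combine, here is a minimal Python sketch. It assumes every component is graded out of 100, which the syllabus does not state; the names `WEIGHTS` and `final_score` and the example scores are hypothetical and serve only to show the arithmetic.

```python
# Weights copied from the grading list above; they sum to 100%.
WEIGHTS = {
    "homework": 0.20,
    "capsule_quizzes": 0.10,
    "project": 0.30,
    "project_presentation": 0.10,
    "final_exam": 0.30,
}


def final_score(component_scores):
    """Return the weighted final score.

    Assumption: each component score is expressed out of 100.
    """
    missing = set(WEIGHTS) - set(component_scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {
    "homework": 85,
    "capsule_quizzes": 90,
    "project": 80,
    "project_presentation": 75,
    "final_exam": 70,
}

print(round(final_score(example), 1))  # 78.5
```

Note that the project and its presentation together account for 40% of the final score, more than the final exam alone.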

ATTENTION regarding fraud and plagiarism: HEC Montreal has a strict policy on fraud and plagiarism. If an infraction is found, the professor is required to report it to the director of the department. An administrative procedure is then automatically triggered, with the following consequences: the offense is noted in your file, and a sanction is decided (which can be serious, up to dismissal for repeat offenses). It is important that you do the work yourself!

Reading

The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition. Trevor Hastie, Robert Tibshirani, Jerome Friedman. 2009 [ESL]
Deep Learning. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. [DL]
Reinforcement Learning: An Introduction. Richard S. Sutton, Andrew G. Barto. A Bradford Book. 2nd edition [RL-Sutton-Barto]
Machine Learning. Kevin Murphy. MIT Press. 2012. [ML-Murphy]
Recommender Systems Handbook, Ricci, F., Rokach, L., Shapira, B., Kantor, P.B. 2011. [RSH]
Data Algorithms: Recipes for Scaling Up with Hadoop and Spark. 1st Edition. Mahmoud Parsian. O’Reilly. 2015 [DA]
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. Wes McKinney. O’Reilly. 2012 [PDA]
Pattern Recognition and Machine Learning. Christopher Bishop. 2006 [PRML]
Advanced Analytics with Spark. O’Reilly. Second Edition. 2017

Acknowledgement

I thank Prof. Laurent Charlin for sharing his slides and video capsules with me. Most of the materials for this course are based on previous editions taught by him.