Learn how to build machine learning pipelines that work at any scale using Apache Spark.
Study online but work in a group.
Get help from a local expert.
Meet fellow minds over free pizza.
NOTE: There is a large overlap between weeks 1 & 2 of this course and the previous coached MOOC, “Introduction to Big Data with Spark (CS100.1x)”. In fact, week 2 of this MOOC is identical to week 2 of the previous MOOC. We’ll go over all of the material, but mostly emphasize what is new (NumPy, linear algebra in PySpark). First labs are due on July 11.
Why we coach MOOCs
The European Data Innovation Hub is partnering with top experts to offer MOOC participants the possibility to follow these online courses in a group. For the duration of the MOOC, participants are welcome to come to the Hub to work and to go through exercises with other participants. On specific days, one or more domain experts will be present to coach the students.
About this course
Machine learning aims to extract knowledge from data and enables a wide range of applications. With datasets rapidly growing in size and complexity, learning techniques are fast becoming a core component of large-scale data processing pipelines. This course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including feature extraction, supervised learning, model evaluation, and exploratory data analysis. Students will gain hands-on experience applying these principles by using Apache Spark to implement several scalable learning pipelines.
Prerequisites: programming background; comfort with mathematical and algorithmic reasoning; familiarity with basic machine learning concepts; exposure to algorithms, probability, linear algebra and calculus; experience with Python (or the ability to learn it quickly). All exercises will use PySpark, but previous experience with Spark or distributed computing is NOT required.
Meet the online instructor: Anthony D. Joseph
Meet the coach: Patrick Varilly