Coached Mooc – Introduction to Big Data with Apache Spark


  • Learn how to apply data science techniques using parallel programming in Apache Spark to explore big (and small) data.
  • Study online but work in a group
  • Get help from a local expert

Why we coach MOOCs

The European Data Innovation Hub is partnering with top experts to offer MOOC participants the possibility of taking these online courses in a group. For the duration of the MOOC, participants are welcome to come to the Hub in Brussels to work through the exercises with other participants. On specific days, one or more domain experts will be present to coach the students.


  1. Sign up to this course here
  2. Join the meetup group here

About this course

Organizations use their data for decision support and to build data-intensive products and services, such as recommendation, prediction, and diagnostic systems. The collection of skills required by organizations to support these functions has been grouped under the term Data Science. This course will attempt to articulate the expected output of Data Scientists and then teach students how to use PySpark (part of Apache Spark) to deliver against these expectations. The course assignments include Log Mining, Textual Entity Recognition, and Collaborative Filtering exercises that teach students how to manipulate data sets using parallel processing with PySpark.

This course covers advanced undergraduate-level material. It requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (part of Apache Spark), but previous experience with Spark or distributed computing is NOT required. Students should take this Python mini-quiz before the course and take this Python mini-course if they need to learn Python or refresh their Python knowledge.

What you’ll learn

  • Use Apache Spark to perform data analysis
  • Use parallel programming to explore data sets
  • Apply Log Mining, Textual Entity Recognition and Collaborative Filtering to real world data questions
  • Prepare for the Spark Certified Developer exam

Meet the online instructor:

Anthony D. Joseph

Meet the coach:

Kris Peeters from Dataminded


Pursue a Verified Certificate to highlight the knowledge and skills you gain ($50)

View a PDF of a sample edX certificate
  • Official and Verified

    Receive a credential signed by the instructor, with the institution logo to verify your achievement and increase your job prospects

  • Easily Shareable

    Add the certificate to your CV, resume or post it directly on LinkedIn

  • Proven Motivator

    Get the credential as an incentive for your successful course completion

Job opportunities?

Click here for data-related job offers.
Join our community on LinkedIn and attend our meetups.
Follow our Twitter account: @datajobsbe

Have you been to our Meetups yet?

Each month we organize a Meetup in Brussels focused on a specific data science topic.

Starting free hands-on MOOC Coaching!

Hi everyone!

We’re excited to announce our coaching for the most popular Massive Open Online Course: Machine Learning by Andrew Ng! We’ve got some beautiful new office space thanks to our buddies at AXA Belgium, where we’ll be holding meetups to discuss and work through the course materials. We’ll start on Monday, the 4th of May, around 7 pm, so keep an eye on our communication channels! Here’s the event with address details, a Calendar file and so on.

Andrew Ng’s ‘Machine Learning’ was one of the first courses on Coursera and has grown amazingly popular, and rightfully so! This course covers ‘how to make computers act without explicitly programming them’, as Andrew puts it, by explaining concepts like multivariate regression, neural networks, support vector machines and much more. This information is invaluable for many branches of data science and gives a good look at what’s ahead for those willing to get their hands dirty. You don’t have to be an expert programmer for it either. Everything Andrew does is in Octave, but to make our learning experience even more exciting we’ll be repeating the Octave exercises in R, a very common language among people who work with data or statistics. R is a great language to learn if you’re looking to go further with (online) courses in data science.

With our group, you’ll be guided in understanding the concepts and assignments given to you in this course, giving you valuable experience in what Machine Learning is and what can be accomplished with it. We’ll also give a little more background on some of the topics Ng talks about so that everyone can keep their head above water.

And, of course, it’s free! We want to stimulate a learning environment and attract enthusiasts on all levels, so feel free to join in. After our first meetup we’ll hook a camera up with a Google Hangouts group so that you can follow online.

The first ‘in-real-life’ meeting will take place on Monday, May 4th, and from then on we’ll get together every Thursday (except on main meetup days, about once per month). Edward and I will coach, though enthusiasts are always welcome to help out or hang around.

See you there!

Sneak preview – Mooc – Bart Baesens – Credit Risk Analytics

I had a nice lunch with Prof. Dr Bart Baesens today at the MIM to discuss his recent book ‘Analytics in a Big Data World: The Essential Guide to Data Science and its Applications’.
One topic we discussed was knowledge transfer and certification.
In addition to his recorded presentations, which are already available, the professor told me that his new course on Credit Risk Analytics would soon be released. Here, as a sneak preview, is the content of this course, which he has put together with SAS. The course will be available in mid-November 2014.

New e-learning course Credit Risk Analytics by professor Bart Baesens

The outline of the course is as follows:
Lesson 1: Introduction to Credit Scoring
Lesson 2: The Basel Capital Accords
Lesson 3: Preparing the data for credit scoring
Lesson 4: Classification for credit scoring
Lesson 5: Measuring the Performance of Credit Scoring Classification Models
Lesson 6: Variable Selection for Classification
Lesson 7: Issues in Scorecard Construction
Lesson 8: Defining Default Ratings and Calibrating PD
Lesson 9: LGD modeling
Lesson 10: EAD modeling
Lesson 11: Validation of Credit Risk Models
Lesson 12: Low Default Portfolios
Lesson 13: Stress testing
If you are interested in more information, you are invited to send an email.

Coursera – Process Mining – TU Eindhoven – starts Nov 12th



Process Mining: Data science in Action

Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy-to-use software, the course provides data science knowledge that can be applied directly to analyze and improve processes in a variety of domains.

Course at a Glance

4-6 hours of work / week
English subtitles

How to Become a Data Scientist by Kunal Jain

Full article by Kunal Jain: How Do I Become a Data Scientist?

Last week, Kunal Jain shared a framework to help you answer the question, “Should I become a data scientist (or business analyst)?” For those who clear the cut-offs, the next obvious question is “How do I become a data scientist?”

Having said that, if he were starting his career today, would he choose the same path? The answer is NO.

Step 1: Graduate from a top tier university in a quantitative discipline

Step 2: Take up a lot of MOOCs on the subject – but do them one at a time

  • Python:
    • Introduction to Computer Science and Programming using Python –
    • Intro to Data Science – Udacity
    • Workshop videos from Pycon and SciPy – some of them are mentioned here
    • Selectively pick from the vast range of tutorials available on the net in the form of IPython notebooks
  • R:
    • The Analytics Edge –
    • Pick out a few courses from Data Science specialization to complement Analytics Edge
  • Other courses (applicable for both the stacks):
    • Machine Learning from Andrew Ng – Coursera
    • Statistics course on Udacity
    • Introduction to Hadoop and MapReduce on Udacity

Step 3: Take a couple of internships / freelancing jobs

Step 4: Participate in data science competitions

Step 5: Take up the right job which provides awesome experience

Coursera starts a free Mooc called Mining of Massive Datasets from Stanford University.

This is a popular course at Stanford and goes along with the book by the same name.

The FREE course starts September 29, 2014, and runs for 7 weeks.

The prerequisites are some SQL, algorithms, and data structures knowledge.