My name’s Alexander Chituc, and I’ll be your foreign correspondent in Brussels, regularly reporting on the diHub and the data science community here in Belgium. I’m an American, I studied philosophy at Yale, and I’m one of the seventeen boot-campers for the di-Academy.
We started the second week of the Data Science bootcamp developing some more practical skills. The first day was devoted to learning about building predictive models using R with Nele Verbiest, a Senior Analyst from Python Predictions. The second day, we worked with Xander Steenbrugge, a data analyst from Datatonic, learning about Data Visualization using Tableau Software.
Day 1: Predictive modeling
Nele told us to think of predictive modeling as the use of all available information to predict future events to optimize decision making. Just making predictions isn’t enough, she said, if there’s no action to take.
The analogy used throughout the training was that developing a predictive model was like cooking. We can think of cooking for a restaurant as having five general steps: take the order, prepare the ingredients, determine the proportion of ingredients to use and how to cook them, taste and approve the dish, and finally, serve the dish and check in with the customer. We can translate this into five analogous steps for preparing a predictive model: project definition, data preparation, model building, model validation, and model usage.
We were given a lab in predictive modeling in R, which gave us hands-on experience with the methodology and techniques. Using a sample dataset, the lab walked us step by step through developing a model that identifies which predictors drive the likelihood that a customer will churn (for those outside the biz, a churn rate is the rate at which individuals leave a community over time; in this case, that means canceling a subscription with a telecom provider). The lab took us through all five steps of the process. Along the way we cleaned data, replaced outliers, went over the basics of model building, and discussed the danger of over-fitting a model and how to simplify a model to prevent it. The analogy here was recording a concert: you want to record the music, not the sound of the audience, the conductor’s baton, or pages turning. We also covered decision trees, linear regression, logistic regression, variable selection, and how to evaluate a model.
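To make two of those ideas concrete, here is a minimal sketch in Python rather than the lab's R code, which isn't reproduced in this post. It computes a churn rate and replaces outliers by capping them. The function names, the toy numbers, and the three-standard-deviation cutoff are all assumptions chosen for illustration, not necessarily what the lab used.

```python
from statistics import mean, stdev


def churn_rate(churned_flags):
    """Share of customers who left during the period (1 = churned, 0 = stayed)."""
    return sum(churned_flags) / len(churned_flags)


def cap_outliers(values, n_sd=3.0):
    """Clip values lying more than n_sd sample standard deviations from the mean.

    The cutoff of 3 is an illustrative assumption; other rules (percentiles,
    interquartile range) are equally common.
    """
    mu, sd = mean(values), stdev(values)
    lower, upper = mu - n_sd * sd, mu + n_sd * sd
    return [min(max(v, lower), upper) for v in values]


# Toy data (hypothetical): four customers, two of whom cancelled.
flags = [1, 0, 0, 1]
rate = churn_rate(flags)

# Toy monthly bills (hypothetical) with one extreme value.
bills = [10.0] * 20 + [1000.0]
capped = cap_outliers(bills)
```

Capping keeps the extreme customer in the dataset while limiting how much a single value can pull on the fitted model, which is one simple way to "replace" an outlier instead of dropping the row.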
There’s obviously a lot more detail I could get into here, but if I had to write about all of it, I’d never get the chance to write about day two.
Day 2: Data Visualization using Tableau Software
The second day, we immediately jumped into how to use Tableau software. Considering just how much it’s possible to do with this program, I was surprised by how intuitive and easy to use it was. Managing data is extremely simple, and to create a graph you simply set the parameters, select the graph type, assign data to the columns and rows, set any filters you might want, and choose which data you want to visually represent by color, size, or label.
Xander walked us through how to create the dashboard below, which maps the sales of a sample superstore geographically and shows which quarters and departments had the most sales, as well as the average shipping delay for each category and subcategory.
After lunch, we were given a dataset and an image of a dashboard, and asked to recreate it ourselves in Tableau. After learning the basics with Xander, it was nice to be tossed into the pool to get some real practice swimming:
If you’re interested in seeing more of what Tableau software is capable of, here’s an example of an interactive graph from their website, where you can explore Global Nuclear Energy Use. You can explore the entire gallery here.
Thanks again to Nele Verbiest and Xander Steenbrugge for being such great teachers, and expect a post on week 3 soon.