Training – Executive Class – Small talk on Big Data by Corina Ciechanow


This is an informative talk aimed at managers and decision makers about what Big Data is and how it can be used for business.

Target audience

All managers of functional departments (marketing, operations, HR, strategists…), decision makers, project managers, CXOs; limited to 8–12 participants.

Details

  • Duration: one afternoon seminar (3h)
  • When: Thursday 29/10
  • Location: European Data Innovation Hub @ AXA, Vorstlaan 23, 1170 Brussel
  • Price: 270€ per manager

Motivation

Everybody is talking about Big Data, but mainly in technical terms and around the IT department.  What is Big Data about, and why should you care if you are not a technical person?  ‘Data is the new Gold’, says the EU; how could you do business with this new resource?

Overview

This seminar provides you with an overview of these questions.  We will discuss the impact of riding the Big Data wave, the impact of not doing so, and why managers need to embrace this new paradigm to succeed in the Big Data era.

Additionally, it will provide you with the technical keywords to follow future developments of this trend and make informed decisions.

Topics

1. What the term Big Data really means

2. Big Data means Big Business

3. An aperçu of where the technology stands

4. Main players

5. Big Data issues

6. The societal impacts (and how to be prepared)

About the speaker


Corina Ciechanow is a dynamic presenter with a solid technical background as well as a long track record of making technology trends accessible to managers and decision-makers.
A former professor of Machine Learning, she has extensive experience as a consultant (www.waterloohills.com), managing IT projects for large companies as well as for international organizations.
Since 2012, Corina has also been VP Women & Technology at Professional Women International (www.pwi.be), where she spreads the word on new Internet trends.

#datascience training programmes

Why not join one of our #datascience trainings to sharpen your skills?

Special rates apply if you are a job seeker.

Here are some training highlights for the coming months:

Check out the full agenda here.

Have you been to our Meetups yet?

Each month we organize a Meetup in Brussels focused on a specific DataScience topic.

Brussels Data Science Meetup

Brussels, BE
1,333 Business & Data Science pro’s

The Brussels Data Science Community. Mission: Our mission is to educate, inspire and empower scholars and professionals to apply data sciences to address humanity’s grand cha…

Next Meetup

Event – SAPForum – Sept9 @Tour & Taxis, Brussels

Wednesday, Sep 9, 2015, 12:00 AM
2 Attending

Check out this Meetup Group →

Training – Spark4Devs – by Andy Petrella + Xavier Tordoir from Data Fellas



Woot Woot – The Hub is hosting the Spark4Devs training from Data Fellas

Why you should come to Spark4 Devs

Why now

Nowadays, one of the rare areas where developers still have a lot to discover and to do is Big Data, or more precisely, Distributed Engineering.

This is an area where a lot of developers have been around for a while, but it is also shifting towards Distributed Computing on Distributed Datasets.

Not only is Machine Learning important in this area; core development capabilities are also required to make production processes Scalable and Highly Available whilst keeping results Reliable and Accurate.

Clearly Scala has its role to play there. Typesafe, a precursor in this field, is spreading the word, and Data Fellas and BoldRadius Solutions are on the same page: Scala is to be the natural successor of Python, R or even Julia in Data Science, with the introduction of the Distributed facet.

That’s why you, devs, and of course data scientists too, will want to come to this training on Apache Spark, focused on the Scala language.

Why Spark 4 Devs

Because the Apache Spark project is THE project you need for all your data projects, even if the data is not distributed!

The training runs three days. The first day tackles the core concepts and batch processing: we will start by introducing why Spark is the way to go, then explain how to hit this road by hacking some datasets with it.
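In plain Python, the kind of batch job day one builds up to can be sketched as follows. This is a toy stand-in for illustration only: `ToyRDD` and its chained methods are hypothetical re-implementations mimicking the Spark API, not the real thing (which the training drives from Scala):

```python
# Toy, in-memory imitation of Spark's RDD API (hypothetical class, NOT
# real pyspark): it mimics the flatMap / map / reduceByKey chain of the
# classic word-count batch job.
class ToyRDD:
    def __init__(self, items):
        self.items = list(items)

    def flatMap(self, f):
        # apply f to each item and flatten the resulting sequences
        return ToyRDD(x for item in self.items for x in f(item))

    def map(self, f):
        return ToyRDD(f(item) for item in self.items)

    def reduceByKey(self, f):
        # combine all values sharing the same key with f
        acc = {}
        for key, value in self.items:
            acc[key] = f(acc[key], value) if key in acc else value
        return ToyRDD(acc.items())

    def collect(self):
        return self.items


lines = ToyRDD(["big data is big", "data is the new gold"])
counts = (lines
          .flatMap(str.split)               # split lines into words
          .map(lambda w: (w, 1))            # pair each word with a 1
          .reduceByKey(lambda a, b: a + b)  # sum the 1s per word
          .collect())
print(dict(counts))  # → {'big': 2, 'data': 2, 'is': 2, 'the': 1, 'new': 1, 'gold': 1}
```

In real Spark the same chain runs partitioned across a cluster; here everything stays in one local list, which is exactly the conceptual bridge the first day builds.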

The second day will, in the same pragmatic and concrete manner, introduce the Spark Streaming library. Of course, we’ll see how to crack some tweets, but we will also show how a real broker, namely Apache Kafka, can be consumed.
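The micro-batch idea behind Spark Streaming can likewise be sketched in plain Python. Again a toy simulation, not any real API: `micro_batches` is a made-up helper, and the running-totals update loosely mirrors what Spark Streaming's stateful operations do:

```python
# Toy simulation of the micro-batch streaming model: the incoming stream
# is cut into small batches, each batch is processed like a tiny batch
# job, and running state is carried across batches.
from collections import Counter


def micro_batches(stream, batch_size):
    """Cut a stream of records into fixed-size batches (made-up helper)."""
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit a final, possibly short, batch
        yield batch


# Pretend these records arrive over time from a broker such as Kafka.
incoming = ["spark", "kafka", "spark", "streaming", "kafka", "spark"]

totals = Counter()  # running state kept across micro-batches
for batch in micro_batches(incoming, batch_size=2):
    totals.update(batch)  # process this micro-batch
    print(dict(totals))   # running word counts so far
```

The last line printed is the final running count, `{'spark': 3, 'kafka': 2, 'streaming': 1}`; in the training the same pattern is applied to live tweets and Kafka topics instead of a hard-coded list.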

Not only will you be on track to use Apache Spark in your projects, but since the Spark Notebook will be used as the main driver for our analyses, you’ll also be on track to use Apache Spark interactively and proficiently, with a shortened development lifecycle.

Title

Spark 4 Devs

Subtitle

Discover and learn how to manage Distributed Computing on Distributed Datasets in a Big Data environment driven by Apache Spark.

Date

23, 24, 25 September

Target audience

Mainly developers who want to learn about distributed computing and data processing

Duration

3 days

Location

European Data Innovation Hub @ AXA, Vorstlaan 23, 1170 Brussel

Price

500€ per day per trainee

Audience

Minimum 8, maximum 15 people

Registration

via http://spark4devs.data-fellas.guru/

Motivation

http://blog.bythebay.io/post/125621089861/why-you-should-come-to-spark4-devs

Overview

via http://spark4devs.data-fellas.guru/

Learning Objectives

via http://spark4devs.data-fellas.guru/

Topics

Distributed Computing, Spark, Spark Streaming, Kafka, Hands on, Scala

Prerequisites

Development skills in common languages

About the trainers:


Andy is a mathematician turned distributed-computing entrepreneur.
Besides being a Scala/Spark trainer, Andy has also participated in many projects built with Spark, Cassandra and other distributed technologies, in various fields including geospatial, IoT, automotive and smart-city projects.
He is the creator of the Spark Notebook (https://github.com/andypetrella/spark-notebook), the only reactive and fully Scala notebook for Apache Spark.
In 2015, Xavier Tordoir and Andy founded Data Fellas (http://data-fellas.guru), working on:
* the Distributed Data Science toolkit, Shar3
* the Distributed Genomics product, Med@Scale

After completing a PhD in experimental atomic physics, Xavier focused on the data-processing part of the job, with projects in finance, genomics and software development for academic research. During that time, he worked on time series, on prediction of biological molecular structures and interactions, and applied Machine Learning methodologies. He developed solutions to manage and process data distributed across data centers. Since leaving academia a couple of years ago, he has provided services and developed products related to data exploitation in distributed computing environments, embracing functional programming, Scala and Big Data technologies.


Blog – Predictive Analytics – a Soup Story by Geert Verstraeten


Predictive Analytics – a Soup Story

A simple metaphor for projects in predictive analytics 

By: Geert Verstraeten, Predictive Analytics advocate, Managing Partner and Professional Trainer, Python Predictions

The analytical scene has recently been dominated by the prediction that we would soon experience an important shortage of analytical talent. As a response, academic programs and massive open online courses (MOOCs) have sprung up like mushrooms after the rain, all with the purpose of developing skills for the analyst or their more modern counterpart, the data scientist. However, in the original McKinsey article, the shortage of analytics-oriented managers was predicted to become ten times more important than the shortage of analysts[1]. But how do we offer relevant concepts and tools to managers without drowning our ‘sweet victims’ in technology and jargon?

For managers, most analytics training falls short in a critical way. The vast majority of newfound analytics training focuses on core analytics algorithms and model building, not on the organizational process needed to apply them. In my opinion, the single most important tool for any manager lies in understanding the process of what should be managed. The absolute essence when asked to supervise predictive analytical developments lies in having a solid understanding of the main project phases. Obviously, we are not the first to realize that this is vital. Tools have been developed to describe the process methodology for developing predictive models[2]. However, it is difficult for non-experts to become excited about these tools, as they describe phases in a rather dry way.

We have experimented with different ways to present process methodology in a more fun and engaging way. Today, we no longer experiment: in our meetings and trainings with managers, we present the development of analytical models simply as the process of making soup in a soup bar.

Project definition

This first phase is concerned with understanding the organization’s needs, priorities, desires and resources. Taking the order basically means we should start by carefully exploring what it is that we need to predict. Do we want to predict who will leave our organization in the next year, and if so, how will we define this concretely? Once the order becomes clear, it is time to check the stock to make sure we will be able to cook the desired dish. This is equivalent to checking data availability. Additionally, it is important to have an idea about timing: will our client need to leave in time to catch the latest movie? This is pretty similar to drawing up a project plan.

Data preparation

The second phase deals with preparing all useful data so that they are ready to be used subsequently in the analysis. For those not familiar with (French) cooking jargon, mise en place is a term used in professional kitchens to refer to organizing and arranging the ingredients (e.g. freshly chopped vegetables, spices and other components) that a cook will require for his shift[3]. Data are for predictive analytics what ingredients are for making soup. In predictive analytics, data are gathered, cleaned and often sliced and diced so that they are ready to be used in a later analytical stage.

Model building

The main task in cooking the soup lies in choosing exactly those ingredients that blend into a great result. This is no different in predictive modeling, where the absolute essence lies in selecting those variables that are jointly capable of predicting the event of interest. One does not make a great soup with only onions. Obviously, not only the presence of ingredients is relevant, but also the proportions in which they are used; compare this to the parameters of predictors: not every predictor is equally important for obtaining a high-quality result. Finally, cooking techniques matter just as much as algorithms do in predictive analytics; they represent essentially different ways to combine the same data into the best soup.

Model validation

In cooking it is crucial to taste a dish before it is served. This is very similar to model validation in predictive model building. Both technical and business-relevant measures can be used to objectively determine whether a model built on a specific data set will hold true for new data. As long as the soup does not taste right, we can iterate back to cooking, until the final soup is approved, i.e. the champion model is selected.
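The tasting step can be made concrete in a few lines of plain Python. Everything here is made up for illustration: the (feature, outcome) pairs are invented, and the "model" is deliberately the simplest possible one (always predict the average outcome), not any real technique from the article:

```python
# Minimal illustration of model validation: fit on one part of the data,
# then taste (evaluate) on a held-out part that the model never saw.

def fit_mean_model(training_rows):
    """'Train' the simplest possible model: predict the average outcome."""
    outcomes = [y for _, y in training_rows]
    mean = sum(outcomes) / len(outcomes)
    return lambda x: mean  # ignores the input, always predicts the mean


def mean_absolute_error(model, rows):
    """Average absolute gap between prediction and actual outcome."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)


# Made-up (feature, outcome) pairs, e.g. tenure in years vs. churn score.
data = [(1, 0.9), (2, 0.8), (3, 0.4), (4, 0.3), (5, 0.2), (6, 0.1)]
train, holdout = data[:4], data[4:]  # keep the last rows aside for tasting

model = fit_mean_model(train)
print("train MAE:  ", round(mean_absolute_error(model, train), 3))
print("holdout MAE:", round(mean_absolute_error(model, holdout), 3))
# If the holdout error is much worse than the training error, the soup
# needs more work: iterate back to model building.
```

The gap between the two numbers is the whole point of the phase: a model is only approved, and a champion selected, when it still tastes right on data it was not cooked from.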

Model usage

This phase is all about presentation and professional serving. A great soup served in an awful bowl may not be fully appreciated. The same holds true for predictive models: a model with fantastic performance may fail to convince potential users when key insights are missing. Drawing a colorful profile of the results may prove instrumental in convincing the audience of the model’s merit. If done successfully, this will likely result in an in-field experiment, for example designing a set of retention campaigns targeting those with the highest potential to leave. At that point, the engaged analyst should check in on whether the meal is being enjoyed.

Conclusion

This simple, intuitive process has been important in allowing managers to engage in the process in a fun way. Presenting the process in a non-technical way makes it digestible (to be fair, I’ve stolen this phrase from my friend Andrew Pease, Global Practice Analytics Lead at SAS, because it makes such great sense in this context). However, it should remain clear that it is only a metaphor. At some point, building predictive models is obviously also different from making soup. Every phase, especially project definition, involves many more components than those where a link with soup can be found. But the metaphor gets us where we want to be: a point where a discussion is possible on what is needed to develop predictive models, and where a minimum of trust can exist. It ensures that we get on speaking terms with decision makers and all those who will be impacted by the models developed.

Notes and further reading

We fully realize this is not completely different from CRISP-DM, the Cross-Industry Standard Process for Data Mining, which was developed in 1996 and is still the leading process methodology, used by 43% of analysts and data scientists. However, unless you are a veteran and/or an analyst, it is difficult to get really excited about CRISP-DM or its typical visualization. For those looking for a more in-depth understanding of the process, I recommend reading the modern answer to CRISP-DM, the Standard Methodology for Analytical Models (by Olav Laudy, Chief Data Scientist, IBM).

[1] In a previous post, we have also argued that the analytics-oriented manager is the main lever for success with predictive analytics.

[2] For the sake of clarity: a predictive model is a representation of the way we understand a phenomenon – or, if you will, a formulaic way to combine predictive information so as to optimally predict future behavior or events.

[3] See the Wikipedia definition of mise en place.

About Geert

Geert Verstraeten is Managing Partner at Python Predictions, a niche player in the domain of Predictive Analytics. He has over 10 years of hands-on experience in Predictive Analytics and in training predictive analysts and their managers. His main interest lies in enabling clients to take their adoption of analytics to the next level. His next training will be organised in Brussels on October 1st, 2015.

 

Gratitude goes to Eric Siegel, Andrew Pease and our team at Python Predictions for delivering great suggestions on an earlier version of this article. All remaining errors are my own.

Link to the next training details from Geert.

Video

Job – NG-Data – Big Data Scientist – US and Belgium


Jo Buyl shared this job opportunity with us for a Big Data Scientist.

Job Description

In the era of Big Data, data is not useful until we identify patterns and apply context and intelligence. The data scientist, as an emerging career path, is at the core of organizational success with Big Data, humanizing the data to help businesses better understand their consumers.

As a data scientist, you sift through the explosion of data to discover what the data is telling you. You figure out “what questions to ask” so that relevant information hidden in the large volumes and varieties of data can be extracted. The Data Scientist will be responsible for designing and implementing processes and layouts for complex, large-scale data sets used for modeling, data mining, and research purposes.

Opportunities

  • Be a true partner in defining the solutions, have and develop business acumen and bring technical perspective in furthering the product and business;
  • Aggregate data from various sources;
  • Help define, design, and build projects that leverage our data;
  • Develop computational algorithms and statistical methods that find patterns and relationships in large volumes of data;
  • Determine and implement mechanisms to improve our data quality;
  • Deliver clear, well-communicated and complete design documents;
  • Work in a team as well as independently and deliver on aggressive goals;
  • Exhibit creativity and resourcefulness at problem solving while collaborating and working effectively with best-in-class designers, engineers of different technical backgrounds, architects and product managers.

Personal Skills

  • You have a logical approach to problem solving and good conceptual ability and analytical skills;
  • You have the ability to integrate research and best practices into problem avoidance and continuous improvement;
  • You possess good interpersonal skills;
  • You are self-reliant and capable of working both independently and as a member of a team;
  • You are persistent, accurate and imaginative;
  • You are able, and have the discipline, to document and record results;
  • You are customer-service oriented;
  • You are open-minded and solution oriented;
  • You enjoy constantly expanding your knowledge base;
  • You are willing to travel up to five days per month.

Technical Background

The successful candidate should have 5+ years’ experience in large-scale software development, with at least 3 years in Hadoop, a strong cross-functional technical background, excellent written and oral communication skills, and a willingness and capacity to expand their leadership and technical skills.

  • BS/MS in Computer Science;
  • Strong understanding of data mining and machine learning algorithms, data structures and related core software engineering concepts;
  • Understanding of the concepts of Hadoop, HBase and other big data technologies;
  • Understanding of marketing processes in the financial and/or retail markets;
  • Sound knowledge of SPSS and SQL.

Apply:

Make sure that you are a member of the Brussels Data Science Community linkedin group before you apply. Join  here.

Please note that we also manage other vacancies that are not public, if you want us to bring you in contact with them too, just send your CV to datasciencebe@gmail.com .

Apply Today!

Upload your resume or send it to jobs@ngdata.com. We look forward to your application!

Job – Datashift – Data Analytics Consultant


Sandra asked us to share this opportunity with  you.

At Data Shift, we are always looking for skilled consultants, and right now we want to hire a Data Analytics Consultant.

Have you graduated recently with a master’s degree, are you eager to have an everlasting impact on clients and are you willing to be challenged intellectually and personally? Are you ready to shift gears?

You have the opportunity to join us and help us shift organizations towards data-driven decision-making.

You will be part of an entrepreneurial team that focuses on providing high-quality skills, technologies and advisory services to our clients. These can include data analytics strategy definition, statistical and quantitative analysis, explanatory and predictive modelling, data modelling and visualisation. By offering analytics-enabled solutions, you will help clients turn data into meaningful insights.

As a Data Analytics Consultant you will:

  • Extract, clean and sample data to get suitable output for further analysis;
  • Integrate, validate and manipulate data;
  • Make performance reports and data models to help make CXO-level decisions;
  • Participate in the development of strategies, roadmaps and business cases for Big data, Analytics and BI solutions;
  • Build and maintain client relationships.

 We ask for the following basic qualifications:

  • Master’s degree in Business Economics, Civil Engineering or Business Analytics;
  • Natural feeling for predictive and/or descriptive analytics;
  • Proven ability to apply analytical and creative thinking;
  • Able to deliver high profile activities against tight deadlines;
  • Proven success in contributing to a team-oriented environment;
  • Excellent leadership, communication (written and oral) and interpersonal skills;
  • Fluent in Dutch and/or French, and English.

We offer a no-nonsense working environment with excellent terms of employment and fringe benefits, including continuous training that builds and extends professional, technical and management skills in all areas. You will have the opportunity to grow together with dataSHIFT in terms of capabilities as well as financial benefits.

Apply:

Make sure that you are a member of the Brussels Data Science Community linkedin group before you apply. Join  here.

Please note that we also manage other vacancies that are not public, if you want us to bring you in contact with them too, just send your CV to datasciencebe@gmail.com .

In case you are willing (= believe in a Data Shift) and able (= have this level of expertise and professionalism),
feel free to reach out to +32 473 863 126 (Sandra) or +32 476 89 30 69 (Nico) to grab a coffee and an interview.

Job – Artycs – Data Engineer


DATA ENGINEER – Freelance and Permanent position

About us: ARTYCS is a start-up providing advisory services on Big Data strategy, full end-to-end management and delivery of data analytics projects, and sourcing of data science profiles.

For one of our clients, a key player in financial services based in Brussels, we are looking for a Data Engineer. This is an exciting and challenging position as the candidate will join a dynamic, successful and rapidly growing team responsible for handling a large variety of Big Data business use cases. The candidate will be key in initiating and facilitating the business use cases from a data extraction and data management perspective.

Not only is this position very interesting for someone able to extract and prepare data for analytical purposes, it also allows the candidate to step into the data science field, as some selected business use cases will be allocated to the data engineer. Last but not least, the team where the position sits is known for its innovative mindset, and the data engineer is encouraged to test new products and state-of-the-art methods.

Job description

  • Extract and prepare data from corporate systems to initiate the analytical process
  • Actively support data scientists in the data discovery and data preparation process
  • Work under the supervision of a senior data engineer
  • Design, build and launch new data ETL/ELT processes in production for pilot use cases selected for operationalization
  • Design, build and test Hadoop-based applications for analyzing datasets and making the results available to business users
  • Process unstructured data into a form suitable for analysis, utilizing custom applications
  • Productionize prototypes and support existing processes running in production
  • Work with the data infrastructure team to triage infrastructure issues and drive them to resolution
  • Perform advanced data analysis on a selection of business use cases, supported by data scientists

Skills

  • Able to extract, transform, compress and load data into HDFS
  • Knowledge of different Hadoop file formats such as Parquet and ORC
  • Broad understanding of the Hadoop ecosystem
  • Experience with high-level programming languages, preferably Java, Scala, Python and R
  • Experience with open-source big data technologies such as Spark, Pig, Hive, HBase, Kafka and Storm
  • Ability to write efficient SQL statements
  • Able, or eager to learn, to write map-reduce and Spark jobs
  • Experience with Red Hat Linux and Linux scripting
  • Experience with data flows, data architecture, ETL/ELT and the processing of structured and unstructured data
  • Knowledge of Cloudera is a plus
  • Knowledge of IBM mainframe is considered as a plus
  • Ability to analyse data and identify deliverables, gaps and inconsistencies
  • Professional experience delivering scripts
  • Experience of traditional data warehouse systems and RDBMS considered a plus
  • Knowledge of statistics, data mining, machine learning and predictive modelling, data visualization (JavaScript) and information discovery techniques considered a plus
  • Knowledge of and experience with classic and new/emerging business intelligence methodologies considered a plus
  • Experience working with customers to identify and clarify requirements and ways to meet needs
  • Must have strong verbal and written communication skills, good customer relationship skills

Apply:

Make sure that you are a member of the Brussels Data Science Community linkedin group before you apply. Join  here.

Please note that we also manage other vacancies that are not public, if you want us to bring you in contact with them too, just send your CV to datasciencebe@gmail.com .

For more information about this opportunity, please contact Laurent Fayet, lbfayet@yahoo.fr, Tel: +32.476.79.46.28

Book – Fraud Analytics by Veronique, Bart and Wouter available on Amazon

Fraud Analytics

Using descriptive, predictive and social network techniques.

by Veronique Van Vlasselaer, Bart Baesens and Wouter Verbeke.

We are pleased to announce that the Fraud Analytics book is now available for purchase on Amazon.

Here is the full video of this presentation:


Reserve your startup and co-working space at the Hub


Whether you want to grow an existing small company or build a company based on an innovative idea, the European Data Innovation Hub can support you as long as data is key in your project.

If your focus is on data, data science, analytics, big data, open data, the Internet of Things or machine learning, then you should join the datascience den at the European Data Innovation Hub, where you are invited to network and connect with your peers, and to co-work, share and learn.

We have 400m² of work and meeting space at your disposal.

From September 1, we open our doors at Boulevard du Souverain 23 in Brussels (1170 Watermaal-Bosvoorde).

Apply now for your flexible desk(s) or meeting space and

  • get your space at no cost for the first month
  • get free wifi and cold/warm drinks
  • get advice from our experienced data scientist in residence
  • receive interesting discounts on trainings and events
  • network with peers in the same domain of interest

Spaces are limited, so reserve your seat now.

Contact Philippe via pvanimpe@datainnovationhub.eu or call him at +32 477 23 78 42

Here is the nice 360° video of the Hub:

https://www.bubl.io/embed/7ca38302-53e8-4055-bdd7-60657924d529

Don’t forget to start the video.

Job – Real Impact Analytics – Head of Software & Lead Architect


Hi Philippe,

Thanks a lot for your message, always happy to hear from you! Please find the links below, as both roles have a Big Data component:

Head of Software: https://realimpactanalytics.com/en/team/job/head-of-engineering

Lead Architect: https://realimpactanalytics.com/en/team/job/lead-software-architect

Thank you very much!

Best,

Mona Lazar

RealImpact Analytics Global Recruiter

www.realimpactanalytics.com

Apply:

Make sure that you are a member of the Brussels Data Science Community linkedin group before you apply. Join  here.

Please note that we also manage other vacancies that are not public, if you want us to bring you in contact with them too, just send your CV to datasciencebe@gmail.com .

Please apply directly on the RIA website, and don’t forget to mention that you got the lead from us.

Event – SAPForum – Sept9 @Tour & Taxis, Brussels


We will be present with a booth at the SAP event on September 9th in Tour & Taxis.

We are also managing a topic lunch where we will discuss the following topic:

Recruiting a Data Scientist today in Belgium (Brussels Data Science Community)
In this session you will discover what a Data Scientist is and does; on top of that, you will also receive tips & tricks on how to find an Analytics expert.
– What’s the definition of a Data Scientist?
– How to find an Analytics expert?
Philippe Van Impe, Brussels Data Science Community

You can register here.