Summer Data Science activities in Belgium.

summer edition

We wish you happy holidays! In case you get bored, check out our educational channel on YouTube.

The European Data Innovation Hub is active during the summer.
Here is a short update on what to expect in the coming weeks:

Thank you for supporting the European Data Innovation Hub; we had a great academic year.

Philippe Van Impe
@pvanimpe
www.di-academy.com

Please forward the information about the data science bootcamp to your peers and friends.


Data Science Meetup about the Panama Papers with Mar Cabra


A big crowd attended the inspiring talk by Mar Cabra from the ICIJ last Thursday at the Data Science Meetup at the VUB in Brussels.

Mar gave a whole new meaning to Messi data. The data was originally obtained from an anonymous source by reporters at the German newspaper Süddeutsche Zeitung, who asked the ICIJ to organize a global reporting collaboration to analyze the files.

More than 370 reporters in nearly 80 countries probed the files for a year. Their investigations uncovered the secret offshore holdings of 12 world leaders, more than 128 other politicians and scores of fraudsters, drug traffickers and other criminals whose companies had been blacklisted in the US and elsewhere.

Here is the link to her presentation.

The data is available and can be downloaded! Users can now search through the data and visualize the networks around thousands of offshore entities, including, when available, Mossack Fonseca’s internal records of the company’s true owners. The interactive database also includes information about more than 100,000 additional companies that were part of the 2013 ICIJ Offshore Leaks investigation.

Try it yourself and download the database:

Some interesting links:

We were very happy that she could come and give this talk to our community. We did not record the presentation, but here are two related videos:

How the ICIJ Used Neo4j to Unravel the Panama Papers – Mar Cabra
https://www.youtube.com/watch?v=S20XMQyvANY
(very similar to the talk at our Meetup; recorded at GraphConnect Europe in London on 26th April)

The Making of a Scoop – The Panama Papers (W. Krach, Süddeutsche Zeitung & K. Auletta) | DLDnyc 16
https://www.youtube.com/watch?v=_Yfq1gwAQZE

The Panama Papers is a global investigation into the sprawling, secretive industry of offshore that the world’s rich and powerful use to hide assets and skirt rules by setting up front companies in far-flung jurisdictions.
Based on a trove of more than 11 million leaked files, the investigation exposes a cast of characters who use offshore companies to facilitate bribery, arms deals, tax evasion, financial fraud and drug trafficking.
Behind the email chains, invoices and documents that make up the Panama Papers are often unseen victims of wrongdoing enabled by this shadowy industry.

Data Science Trainings Belgium

The European Data Innovation Hub facilitates a full series of Data Science and Big Data training programmes organized by its partners.

You can expect:

  • a series of executive trainings to help your management understand the benefits of analytics
  • a series of coached MOOCs on machine learning and big data technology
  • a series of hands-on trainings on the different data science technologies

All members of the European Data Science and Big Data communities are welcome to use our professional facilities in Brussels to give their trainings. The members of the hub will promote your training and include it on our e-learning platform for further use.

The full list is available here.

Here are some highlights for the coming months:

Check out the full agenda here.

How to get the best price:

You can always use Eventbrite to order and pay for your ad-hoc trainings, but if you want to benefit from volume discounts, contact Philippe at 0477/23.78.42 or pvanimpe@dihub.eu.

Have you been to our Meetups yet?

Each month we organize a Meetup in Brussels focused on a specific data science topic.

Brussels Data Science Meetup

Brussels, BE
1,608 Business & Data Science pros

The Brussels Data Science Community: Mission: Our mission is to educate, inspire and empower scholars and professionals to apply data sciences to address humanity’s grand cha…

Next Meetup

IBM Bluemix and Analytics – Introduction

Tuesday, Feb 9, 2016, 6:30 PM
22 Attending


Graphs for HR Analytics by Rik Van Bruggen

Yesterday, I had the pleasure of giving a talk at the Brussels Data Science Meetup. There were some really cool people there, with interesting things to say. My talk was about how graph databases like Neo4j can contribute to HR analytics. Here are the slides of the talk:

I truly had a lot of fun delivering the talk, but probably even more preparing for it.

The basic points that I wanted to get across were these:

  • The HR function could really benefit from a more real-world understanding of how information flows in its organization. Information flows through the *real* social network of people in your organization, independent of your “official” hierarchical / matrix-shaped org chart. It therefore follows that the HR function would really benefit from understanding and analysing this information flow through social network analysis.
  • In recruitment, there is a lot to be said for integrating social network information into your recruitment process. This is logical: the social network tells us something about the social, friendly ties between people, and that tells us something about how likely they are to form good, well-performing teams. Several online recruitment platforms are starting to use this (e.g. Glassdoor uses Neo4j to store more than 70% of the Facebook sociogram) to really differentiate themselves. They want to suggest and recommend the jobs that people really want.
  • In competence management, large organizations can gain a lot by accurately understanding the different competencies that people have or want to have. When putting together multi-disciplinary, often global teams, this can be a huge time-saver for the project offices chartered to do this.

For all three of these points, a graph database like Neo4j can really help. So I put together a sample dataset to illustrate this. Broadly speaking, the queries I ran against it fall into three categories:

  1. “Deep queries”: these are queries that perform complex pattern matches on the graph. An example would be: “Find me a friend-of-a-friend of Mike who has the same competencies as Mike, has worked or is working at the same company as Mike, but is currently not working together with Mike.” In Neo4j Cypher, that would look something like this:
 match (p1:Person {first_name:"Mike"})-[:HAS_COMPETENCY]->(c:Competency)<-[:HAS_COMPETENCY]-(p2:Person),  
 (p1)-[:WORKED_FOR|WORKS_FOR]->(co:Company)<-[:WORKED_FOR|WORKS_FOR]-(p2)  
 where not((p1)-[:WORKS_FOR]->(:Company)<-[:WORKS_FOR]-(p2))  
 with p1,p2,c,co  
 match (p1)-[:FRIEND_OF*2..2]-(p2)  
 return p1.first_name+' '+p1.last_name as Person1, p2.first_name+' '+p2.last_name as Person2, collect(distinct c.name) as Competencies, collect(distinct co.name) as Companies;  
  2. “Pathfinding queries”: these allow you to explore the paths from a certain person to other people, and see how they are connected to each other. For example, if I wanted to find paths between two people, I could run:
 match p=allShortestPaths((n:Person {first_name:"Mike"})-[*]-(m:Person {first_name:"Brandi"}))  
 return p;  

and get this:

This is a truly interesting and meaningful representation in many cases.
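To make the two query types above concrete without a database, here is a minimal plain-Python sketch on a tiny, made-up HR graph. The people, skills and companies in `FRIENDS`, `COMPETENCIES`, `WORKED_FOR` and `WORKS_FOR` are hypothetical and are not the original sample dataset. `deep_query` mirrors the friend-of-a-friend pattern match, and `all_shortest_paths` plays the role of Cypher's `allShortestPaths` using a breadth-first search that keeps every predecessor on a shortest path. It illustrates what the queries compute, not how Neo4j executes them.

```python
from collections import deque

# Hypothetical toy HR data (illustrative only, not the original sample dataset).
FRIENDS = {("Mike", "Anna"), ("Mike", "Tom"), ("Anna", "Brandi"),
           ("Tom", "Brandi"), ("Anna", "Lisa"), ("Brandi", "Lisa")}
COMPETENCIES = {"Mike": {"Python", "Neo4j"}, "Anna": {"Excel"}, "Tom": {"SQL"},
                "Brandi": {"Neo4j", "R"}, "Lisa": {"Python"}}
WORKED_FOR = {"Mike": {"Acme"}, "Brandi": {"Acme"}, "Lisa": {"Acme"}}  # past employers
WORKS_FOR = {"Mike": "Globex", "Anna": "Globex", "Tom": "Initech",
             "Brandi": "Initech", "Lisa": "Globex"}                    # current employer


def adjacency(edges):
    """Treat FRIEND_OF as undirected, like the -[:FRIEND_OF]- patterns above."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def employers(person):
    """Past and current companies of a person."""
    current = {WORKS_FOR[person]} if person in WORKS_FOR else set()
    return WORKED_FOR.get(person, set()) | current


def deep_query(person, adj):
    """Friends-of-friends of `person` who share a competency and a past or current
    employer with `person`, but do not currently work at the same company."""
    # Exactly two FRIEND_OF hops away (cf. [:FRIEND_OF*2..2]).
    fof = {z for x in adj.get(person, ()) for z in adj.get(x, ()) if z != person}
    results = []
    for other in sorted(fof):
        skills = COMPETENCIES.get(person, set()) & COMPETENCIES.get(other, set())
        companies = employers(person) & employers(other)
        colleagues_now = (person in WORKS_FOR
                          and WORKS_FOR.get(other) == WORKS_FOR[person])
        if skills and companies and not colleagues_now:
            results.append((other, sorted(skills), sorted(companies)))
    return results


def all_shortest_paths(adj, source, target):
    """Every shortest path from source to target (BFS keeping all predecessors)."""
    dist, preds = {source: 0}, {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt], preds[nxt] = dist[node] + 1, [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                preds[nxt].append(node)
    if target not in dist:
        return []

    def build(node):
        # Walk the predecessor lists back to the source.
        if node == source:
            return [[source]]
        return [path + [node] for p in preds[node] for path in build(p)]

    return sorted(build(target))


adj = adjacency(FRIENDS)
print(deep_query("Mike", adj))
print(all_shortest_paths(adj, "Mike", "Brandi"))
```

On this toy data, `deep_query("Mike", adj)` returns only Brandi, who shares the Neo4j competency and a past employer (Acme) with Mike. Lisa is also a friend-of-a-friend with a shared competency, but she is filtered out because she currently works at the same company as Mike. `all_shortest_paths(adj, "Mike", "Brandi")` returns the two length-2 paths, via Anna and via Tom.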

  3. “Graph analysis queries”: these are queries that look at some really interesting graph metrics that could help us better understand our HR network. There are some really interesting measures out there, such as degree centrality, betweenness centrality, PageRank, and triadic closures. Below are some of the queries that implement these (note that I have also done some of these for the Dolphin Social Network). Please be aware that these queries are often “graph global” queries that can consume quite a bit of time and resources. I would not run them on truly large datasets, but in the HR domain the datasets are often quite limited anyway, so we can consider them valid examples.
 //Degree centrality  
 match (n:Person)-[r:FRIEND_OF]-(m:Person)  
 return n.first_name, n.last_name, count(r) as DegreeScore  
 order by DegreeScore desc  
 limit 10;  
   
 //Betweenness centrality  
 MATCH p=allShortestPaths((source:Person)-[:FRIEND_OF*]-(target:Person))  
 WHERE id(source) < id(target) and length(p) > 1  
 UNWIND nodes(p)[1..-1] as n  
 RETURN n.first_name, n.last_name, count(*) as betweenness  
 ORDER BY betweenness DESC;  
   
 //Missing triadic closures  
 MATCH path1=(p1:Person)-[:FRIEND_OF*2..2]-(p2:Person)  
 where not((p1)-[:FRIEND_OF]-(p2))  
 return path1  
 limit 50;  
   
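 //Approximate a PageRank-like score: in each of 10 rounds, sample ~10% of people and count the FRIEND_OF paths (1 to 10 hops) that reach each person  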
 UNWIND range(1,10) AS round  
 MATCH (n:Person)  
 WHERE rand() < 0.1 // 10% probability  
 MATCH (n:Person)-[:FRIEND_OF*..10]->(m:Person)  
 SET m.rank = coalesce(m.rank,0) + 1;  
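For readers who want to see what these metrics compute, here is a minimal plain-Python sketch over a hypothetical five-person friendship graph (the names and the `FRIENDS` edge set are made up). `path_betweenness` reproduces what the Cypher betweenness query counts, namely the number of shortest paths each person sits on, without the division by the number of shortest paths per pair that textbook betweenness uses. It gets these counts from BFS path counts instead of enumerating paths. For PageRank the sketch swaps in the standard power-iteration algorithm with a damping factor of 0.85, rather than the random-sampling approximation in the query, and it assumes every person has at least one friend. Like the Cypher versions, these are all-pairs computations meant for small graphs.

```python
from collections import deque
from itertools import combinations

# Hypothetical five-person friendship network (illustrative data only).
FRIENDS = {("Ann", "Bob"), ("Ann", "Cleo"), ("Bob", "Cleo"),
           ("Cleo", "Dirk"), ("Dirk", "Eva")}


def adjacency(edges):
    """Undirected FRIEND_OF adjacency sets."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_centrality(adj):
    """Number of FRIEND_OF relationships per person (cf. count(r))."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def bfs_counts(adj, source):
    """BFS distances and number of shortest paths from source to each reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w], sigma[w] = dist[v] + 1, 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
    return dist, sigma


def path_betweenness(adj):
    """For each person, how many shortest paths between other pairs pass through
    them (unnormalized, like the UNWIND/count(*) query above)."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {v: 0 for v in adj}
    for s, t in combinations(sorted(adj), 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t are in different connected components
        for v in adj:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            # v lies on a shortest s-t path iff the distances add up.
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_v[t]
    return score


def missing_triadic_closures(adj):
    """Pairs of people with a common friend who are not friends themselves."""
    open_pairs = set()
    for middle in adj:
        for a, b in combinations(sorted(adj[middle]), 2):
            if b not in adj[a]:
                open_pairs.add((a, b))
    return sorted(open_pairs)


def pagerank(adj, damping=0.85, iterations=100, tol=1e-12):
    """Standard power-iteration PageRank on the undirected graph.
    Assumes every person has at least one friend (no dangling nodes)."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in adj}
        for v, nbrs in adj.items():
            share = damping * rank[v] / len(nbrs)
            for w in nbrs:
                new[w] += share
        delta = sum(abs(new[v] - rank[v]) for v in adj)
        rank = new
        if delta < tol:
            break
    return rank


adj = adjacency(FRIENDS)
print(sorted(degree_centrality(adj).items(), key=lambda kv: -kv[1]))
print(path_betweenness(adj))
print(missing_triadic_closures(adj))
print(pagerank(adj))
```

On this toy graph Cleo, who links the Ann/Bob/Cleo triangle to Dirk and Eva, scores highest on degree (3), path betweenness (4) and PageRank, while Ann/Dirk, Bob/Dirk and Cleo/Eva come out as the missing triadic closures.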

I am sure you could come up with plenty of other examples. Just to make the point clear, I also made a short movie about it:

The queries for this entire demonstration are on GitHub. I hope you like it, and that everyone understands that graph databases can truly add value in an HR analytics context.

Feedback, as always, much appreciated.

Rik