A big crowd attended an inspiring talk by Mar Cabra from ICIJ last Thursday at the Data Science Meetup at the VUBrussels.
Mar gave a whole new meaning to Messi data. This data was originally obtained from an anonymous source by reporters at the German newspaper Süddeutsche Zeitung, who asked ICIJ to organize a global reporting collaboration to analyze the files.
More than 370 reporters in nearly 80 countries probed the files for a year. Their investigations uncovered the secret offshore holdings of 12 world leaders, more than 128 other politicians and scores of fraudsters, drug traffickers and other criminals whose companies had been blacklisted in the US and elsewhere.
The data is available for download! Users can now search through the data and visualize the networks around thousands of offshore entities, including, when available, Mossack Fonseca’s internal records of the company’s true owners. The interactive database also includes information about more than 100,000 additional companies that were part of the 2013 ICIJ Offshore Leaks investigation.
We were very happy that she could come to do this for our community. Although we did not record the presentation, here are two videos for info:
How the ICIJ Used Neo4j to Unravel the Panama Papers – Mar Cabra https://www.youtube.com/watch?v=S20XMQyvANY
(very similar to last night, from GraphConnect Europe in London on 26th April);
The Panama Papers is a global investigation into the sprawling, secretive offshore industry that the world’s rich and powerful use to hide assets and skirt rules by setting up front companies in far-flung jurisdictions.
Based on a trove of more than 11 million leaked files, the investigation exposes a cast of characters who use offshore companies to facilitate bribery, arms deals, tax evasion, financial fraud and drug trafficking.
Behind the email chains, invoices and documents that make up the Panama Papers are often unseen victims of wrongdoing enabled by this shadowy industry.
The European Data Innovation Hub facilitates a full series of Data Science and Big Data training programmes organized by its partners.
You can expect
a series of executive training sessions to help your management understand the benefits of analytics
a series of coached MOOCs on machine learning and big data technology
a series of hands-on training sessions on the different data science technologies
All members of the European Data Science and Big Data communities are welcome to use our Brussels based professional facilities to give their training. The members of the hub will promote your training and include it on our e-learning platform for further use.
You can always use Eventbrite to order and pay for your ad-hoc training sessions, but if you want to benefit from volume discounts you can contact Philippe on 0477/23.78.42 | pvanimpe@dihub.eu.
Have you been to our Meetups yet?
Each month we organize a Meetup in Brussels focused on a specific data science topic.
Yesterday, I had the pleasure of giving a talk at the Brussels Data Science meetup. There were some really cool people there, with interesting things to say. My talk was about how graph databases like Neo4j can contribute to HR Analytics. Here are the slides of the talk:
I truly had a lot of fun delivering the talk, but probably even more preparing for it.
The basic points that I wanted to get across were these:
The HR function could really benefit from a more real-world understanding of how information flows in its organization. Information flows through the *real* social network of people in your organization, independent of your “official” hierarchical / matrix-shaped org chart. It therefore makes sense for the HR function to understand and analyse this information flow through social network analysis.
In recruitment, there is a lot to be said for integrating social network information into your recruitment process. This is logical: the social network will tell us something about the social, friendly ties between people, and that will tell us something about how likely they are to form good, performing teams. Several online recruitment platforms are starting to use this to really differentiate themselves (e.g. Glassdoor uses Neo4j to store more than 70% of the Facebook sociogram). They want to suggest and recommend the jobs that people really want.
In competence management, large organizations can gain a lot by accurately understanding the different competencies that people have or want to have. When putting together multi-disciplinary, often global teams, this can be a huge time-saver for the project offices chartered to do this.
For all three of these points, a graph database like Neo4j can really help. So I put together a sample dataset and a set of queries to demonstrate this. Broadly speaking, the queries fall into three categories:
“Deep queries”: these are queries that perform complex pattern matches on the graph. An example would be something like: “Find me a friend-of-a-friend of Mike that has the same competencies as Mike, has worked or is working at the same company as Mike, but is currently not working together with Mike.” In Neo4j Cypher, that would be something like this:
match (p1:Person {first_name:"Mike"})-[:HAS_COMPETENCY]->(c:Competency)<-[:HAS_COMPETENCY]-(p2:Person),
(p1)-[:WORKED_FOR|:WORKS_FOR]->(co:Company)<-[:WORKED_FOR]-(p2)
where not((p1)-[:WORKS_FOR]->(co)<-[:WORKS_FOR]-(p2))
with p1,p2,c,co
match (p1)-[:FRIEND_OF*2..2]-(p2)
return p1.first_name+' '+p1.last_name as Person1, p2.first_name+' '+p2.last_name as Person2, collect(distinct c.name) as Competencies, collect(distinct co.name) as Company;
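To make the logic of this pattern match easier to follow without a Neo4j instance, here is a minimal sketch of the same conditions in plain Python. The people, companies and dictionary names are all hypothetical toy data, not the sample dataset from the talk, and the sketch only mirrors the query's conditions rather than how Neo4j evaluates them.

```python
# Hypothetical toy data: every name below is made up for illustration.
friends = {
    "Mike": {"Ann"},
    "Ann": {"Mike", "Bob", "Cleo"},
    "Bob": {"Ann"},
    "Cleo": {"Ann"},
}
competencies = {
    "Mike": {"Java", "SQL"},
    "Ann": {"SQL"},
    "Bob": {"Java"},
    "Cleo": {"Cobol"},
}
works_for = {"Mike": {"Acme"}, "Ann": {"Acme"}, "Bob": {"Globex"}, "Cleo": {"Acme"}}
worked_for = {"Mike": {"Initech"}, "Ann": set(), "Bob": {"Acme", "Initech"}, "Cleo": set()}


def deep_query(person):
    """Friends-of-friends sharing a competency and a (former) employer with `person`."""
    # Everyone reachable in exactly two FRIEND_OF hops, excluding the person.
    fof = {
        p2
        for friend in friends.get(person, ())
        for p2 in friends.get(friend, ())
        if p2 != person
    }
    # Companies the person works for or has worked for.
    own_companies = works_for.get(person, set()) | worked_for.get(person, set())
    results = []
    for p2 in sorted(fof):
        shared_skills = competencies.get(person, set()) & competencies.get(p2, set())
        # Keep companies p2 worked for, unless both currently work there.
        shared_companies = {
            co
            for co in own_companies & worked_for.get(p2, set())
            if not (co in works_for.get(person, set()) and co in works_for.get(p2, set()))
        }
        if shared_skills and shared_companies:
            results.append((p2, sorted(shared_skills), sorted(shared_companies)))
    return results


print(deep_query("Mike"))
```

On this toy data only Bob qualifies: he is two hops from Mike, shares a competency, and has worked at companies Mike knows, without the two currently being colleagues.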
“Pathfinding queries”: these allow you to explore the paths from a certain person to other people, and see how they are connected to each other. For example, if I wanted to find paths between two people, I could do
match p=AllShortestPaths((n:Person {first_name:"Mike"})-[*]-(m:Person {first_name:"Brandi"}))
return p;
and get this:
Which is a truly interesting and meaningful representation in many cases.
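For intuition about what allShortestPaths computes, the idea can be sketched as a breadth-first search that remembers every parent lying on a shortest route. This is only an illustration under simplifying assumptions: the graph is a hypothetical undirected adjacency list with a single relationship type, whereas the Cypher query above follows any relationship.

```python
from collections import deque


def all_shortest_paths(graph, start, goal):
    """Return every shortest path from start to goal in an undirected graph."""
    if start == goal:
        return [[start]]
    dist = {start: 0}
    parents = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break  # all shortest-path parents of goal are known by now
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                parents[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                parents[nxt].append(node)  # another equally short route
    if goal not in dist:
        return []

    def build(node):
        # Walk the parent links backwards to rebuild each path.
        if node == start:
            return [[start]]
        return [path + [node] for parent in parents[node] for path in build(parent)]

    return build(goal)


# Hypothetical toy network, not the dataset used in the talk.
graph = {
    "Mike": ["Ann", "Bob"],
    "Ann": ["Mike", "Brandi"],
    "Bob": ["Mike", "Brandi"],
    "Brandi": ["Ann", "Bob"],
}
for path in all_shortest_paths(graph, "Mike", "Brandi"):
    print(" - ".join(path))
```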
Graph analysis queries: these queries look at graph metrics that could help us better understand our HR network. There are some really interesting measures out there, for example degree centrality, betweenness centrality, PageRank, and triadic closures. Below are some of the queries that implement these (note that I have also done some of these for the Dolphin Social Network). Please be aware that these are often “graph global” queries that can consume quite a bit of time and resources. I would not run them on truly large datasets, but in the HR domain the datasets are often quite limited anyway, so we can consider them valid examples.
//Degree centrality
match (n:Person)-[r:FRIEND_OF]-(m:Person)
return n.first_name, n.last_name, count(r) as DegreeScore
order by DegreeScore desc
limit 10;
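As a rough cross-check of what the degree query counts, here is a small Python sketch. It assumes an undirected friendship list with made-up names, and simply counts how many friendships touch each person.

```python
from collections import Counter

# Hypothetical friendships; each pair is one undirected FRIEND_OF relationship.
friendships = [("Mike", "Ann"), ("Mike", "Bob"), ("Ann", "Bob"), ("Bob", "Cleo")]


def degree_scores(edges):
    """Count relationships per person, highest degree first."""
    scores = Counter()
    for a, b in edges:
        scores[a] += 1
        scores[b] += 1
    return scores.most_common()


for name, score in degree_scores(friendships):
    print(name, score)
```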
//Betweenness centrality
MATCH p=allShortestPaths((source:Person)-[:FRIEND_OF*]-(target:Person))
WHERE id(source) < id(target) and length(p) > 1
UNWIND nodes(p)[1..-1] as n
RETURN n.first_name, n.last_name, count(*) as betweenness
ORDER BY betweenness DESC;
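The betweenness query counts, for every pair of people, how often someone sits in the interior of a shortest path between them. Here is a minimal Python sketch of that same counting, assuming a small hypothetical undirected graph. Like the query, it is an unnormalized path count rather than the textbook betweenness formula, and the path enumeration is only suitable for toy graphs.

```python
from collections import Counter, deque
from itertools import combinations


def shortest_paths(graph, start, goal):
    """Enumerate all shortest simple paths with a breadth-first search over paths."""
    best, found = None, []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break  # every remaining path is longer than the shortest one
        if path[-1] == goal:
            best = len(path)
            found.append(path)
            continue
        for nxt in graph[path[-1]]:
            if nxt not in path:
                queue.append(path + [nxt])
    return found


def path_count_betweenness(graph):
    """Count how often each node is an interior node of a shortest path."""
    scores = Counter()
    # combinations() visits each unordered pair once, like id(source) < id(target).
    for source, target in combinations(sorted(graph), 2):
        for path in shortest_paths(graph, source, target):
            scores.update(path[1:-1])  # interior nodes only
    return scores


# Hypothetical toy network.
graph = {
    "Ann": ["Bob"],
    "Bob": ["Ann", "Cleo", "Dan"],
    "Cleo": ["Bob", "Eve"],
    "Dan": ["Bob", "Eve"],
    "Eve": ["Cleo", "Dan"],
}
for name, score in path_count_betweenness(graph).most_common():
    print(name, score)
```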
//Missing triadic closures
MATCH path1=(p1:Person)-[:FRIEND_OF*2..2]-(p2:Person)
where not((p1)-[:FRIEND_OF]-(p2))
return path1
limit 50;
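A missing triadic closure is simply two people who share a friend but are not friends themselves. The sketch below shows the idea on hypothetical data. Unlike the query, which returns each open path from both ends, it reports every open triad once.

```python
from itertools import combinations

# Hypothetical undirected friendships, stored as adjacency sets.
friends = {
    "Mike": {"Ann", "Bob"},
    "Ann": {"Mike", "Bob"},
    "Bob": {"Mike", "Ann", "Cleo"},
    "Cleo": {"Bob"},
}


def missing_closures(graph):
    """Return (p1, via, p2) triples where p1 and p2 share friend `via` but are not friends."""
    open_triads = []
    for via in sorted(graph):
        for p1, p2 in combinations(sorted(graph[via]), 2):
            if p2 not in graph[p1]:
                open_triads.append((p1, via, p2))
    return open_triads


for p1, via, p2 in missing_closures(friends):
    print(f"{p1} and {p2} both know {via} but not each other")
```

In a recruitment or team-building setting, these are the introductions that would be easiest to make.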
//Calculate the pagerank
UNWIND range(1,10) AS round
MATCH (n:Person)
WHERE rand() < 0.1 // 10% probability
MATCH (n:Person)-[:FRIEND_OF*..10]->(m:Person)
SET m.rank = coalesce(m.rank,0) + 1;
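The query above approximates a ranking by sampling starting nodes and following FRIEND_OF paths. For comparison, here is a sketch of the standard power-iteration formulation of PageRank in plain Python. This is a different technique from the sampling query, and the graph, damping factor and iteration count are assumptions chosen for illustration only.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed graph given as {node: [targets]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same baseline share each round.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            # A node without outgoing links spreads its rank over all nodes.
            targets = graph[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical directed FRIEND_OF relationships.
graph = {
    "Mike": ["Ann"],
    "Ann": ["Bob"],
    "Bob": ["Ann"],
    "Cleo": ["Ann"],
}
ranks = pagerank(graph)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(name, round(ranks[name], 3))
```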
I am sure you could come up with plenty of other examples. Just to make the point clear, I also made a short movie about it:
The queries for this entire demonstration are on GitHub. I hope you like it, and that everyone understands that graph databases can truly add value in an HR Analytics context.