Datasciencebe ranks second (by betweenness centrality) in the study by @marc_smith using a NodeXL SNA map

Selection criteria: “data science OR #datascience” Twitter NodeXL SNA map and report for Tuesday, 08 July 2014 at 17:54 UTC

From: marc_smith,  Uploaded on: July 08, 2014
Description:
  • The graph represents a network of 6,564 Twitter users whose tweets in the requested range contained “data science OR #datascience”, or who were replied to or mentioned in those tweets. The network was obtained from the NodeXL Graph Server on Tuesday, 08 July 2014 at 17:59 UTC.
  • The requested start date was Tuesday, 08 July 2014 at 23:59 UTC and the maximum number of tweets (going backward in time) was 10,000.
  • The tweets in the network were tweeted over the 17-day, 1-hour, 40-minute period from Friday, 20 June 2014 at 21:48 UTC to Monday, 07 July 2014 at 23:28 UTC.
  • There is an edge for each “replies-to” relationship in a tweet, an edge for each “mentions” relationship in a tweet, and a self-loop edge for each tweet that is not a “replies-to” or “mentions”.
  • The graph is directed.
  • The graph’s vertices were grouped by cluster using the Clauset-Newman-Moore cluster algorithm.
  • The graph was laid out using the Harel-Koren Fast Multiscale layout algorithm.
  • Edge color, width and opacity are based on edge weight values; vertex size and opacity are based on follower counts. (A minimal code sketch showing how comparable network metrics could be recomputed appears after the betweenness ranking below.)

Overall Graph Metrics:

  • Vertices: 6564
  • Unique Edges: 7487
  • Edges With Duplicates: 4294
  • Total Edges: 11781
  • Self-Loops: 5169
  • Reciprocated Vertex Pair Ratio: 0.0284219703574542
  • Reciprocated Edge Ratio: 0.0552729738894541
  • Connected Components: 2411
  • Single-Vertex Connected Components: 1890
  • Maximum Vertices in a Connected Component: 3054
  • Maximum Edges in a Connected Component: 7070
  • Maximum Geodesic Distance (Diameter): 19
  • Average Geodesic Distance: 5.396585
  • Graph Density: 0.000136909565312827
  • Modularity: 0.537045
  • NodeXL Version: 1.0.1.331

Top 10 Vertices, Ranked by Betweenness Centrality:

  1. kirkdborne
  2. datasciencebe
  3. kdnuggets
  4. analyticbridge
  5. jackwmson
  6. wsj
  7. datasciencedojo
  8. coursera
  9. zeynep
  10. data_nerd
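As a rough companion to the metrics and ranking above, here is a minimal sketch, assuming the report's edges were exported to a CSV file, of how comparable numbers could be recomputed with Python and networkx. The file name and column names are hypothetical, and networkx's greedy modularity routine is used as a stand-in for NodeXL's Clauset-Newman-Moore clustering, so the results will not match the report exactly.

```python
# Minimal sketch, not NodeXL itself. Assumes a hypothetical CSV edge list
# ("datascience_edges.csv" with "source" and "target" columns) exported from the report.
import csv

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.DiGraph()
with open("datascience_edges.csv", newline="") as f:
    for row in csv.DictReader(f):
        G.add_edge(row["source"], row["target"])

print("Vertices:", G.number_of_nodes())
print("Total edges:", G.number_of_edges())
print("Graph density:", nx.density(G))
print("Connected components:", nx.number_weakly_connected_components(G))

# Top 10 users by betweenness centrality, the ranking criterion used above.
bc = nx.betweenness_centrality(G)
print(sorted(bc, key=bc.get, reverse=True)[:10])

# Cluster the undirected projection (self-loops dropped); greedy modularity is
# networkx's Clauset-Newman-Moore-style community detection.
H = G.to_undirected()
H.remove_edges_from(nx.selfloop_edges(H))
communities = greedy_modularity_communities(H)
print("Clusters:", len(communities))
```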

Job – RealDolmen – Senior BI Consultant

Original post here

 

You don't become that just like that. You need to have something to offer. First of all: an undeniable passion for ICT. And in addition, the ambition to make it happen at one of Belgium's leading ICT players. Because that is RealDolmen. RealDolmen helps translate business strategies into efficient and reliable ICT solutions that really work. We do this according to the principle of plan | build | operate. What does that mean? That we cover the entire ICT life cycle, from analysis through project implementation to training and maintenance.

Our 1,800 employees already make this happen every day. But to give our growth strategy an extra boost, we are looking for a (m/f):

RealDolmen – Senior BI Consultant

Are you the experienced BI Consultant with a passion for our field? Then it is time to get acquainted with the market leader in PM and BI.

Role

  • Together with inspiring colleagues, you work on Performance Management and Business Intelligence projects.
  • As a Business Intelligence Consultant you operate at the intersection of technology and business.
  • You have frequent client contact and are therefore an important pillar of the project.
  • To let our clients make optimal use of the vast possibilities of BI, you work hands-on on a variety of projects.
  • The diversity of the projects allows for a flexible approach: you can be deployed as a Data Warehouse Architect, but also more broadly as a Business Intelligence Architect, Reporting Specialist or ETL Specialist.

Profile

  • At least a Bachelor's degree
  • Experience with Business Intelligence (design and implementation of data warehouses, reporting and analysis environments)
  • A broad vision of and interest in BI, and the ability to convey it at different levels
  • Knowledge of and experience with one or more BI tools
  • Good communication skills, both oral and written
  • Analytically strong with a practical mindset
  • Strong problem-solving ability
  • Able to work independently as well as in a team

What we offer

  • An engaging and challenging job
  • A working environment centred on a “no-nonsense” culture, where knowledge sharing is an essential value, not only within RealDolmen but also towards the broader community
  • Working in a team with extensive BI experience
  • A competitive salary
  • Training based on a personal development plan (PoP)
  • Continuous opportunities for further training through the RealDolmen Academy

RealDolmen explicitly encourages candidates aged 45 and over to apply as well.

How Sears Became a Real-Time Digital Enterprise Due to Big Data

Original post here

Sears is a large US retailer that has been a true Big Data pioneer for quite a few years. It has learned, made mistakes and achieved success through hands-on effort, and it currently operates a very large enterprise deployment of Hadoop.

Sears was founded in 1893 and started as a mail-order company. In 2005 it was acquired by Kmart, but it continued to operate under its own brand. In 2013 it had 798 stores and revenue of over $21 billion, making it the fourth-largest department store chain in the US, and it offers millions of products across its stores. Sears holds data on over 100 million customers, which it analyses to make real-time, relevant offers to those customers. The company is deep into Big Data and combines massive amounts of data to become a real-time digital enterprise.

 

Sears was ahead of its time, and of its competitors, regarding Big Data. Already in 2010 it had a 10-node Hadoop cluster, a size Walmart only reached in 2012. These days, Sears runs a 300-node Hadoop cluster populated with over 2 petabytes of structured customer transaction data, sales data and supply chain data. It used to keep data in silos in many locations, but the objective now is to get all data in one place in order to achieve a single point of truth about the customer. And that's not all; Sears also applies Big Data to combat fraud, track the effectiveness of marketing campaigns, and optimize (personal) pricing, the supply chain and promotional campaigns.

Personalized Pricing

Sears combines and mixes vast amounts of data to help set (personal) prices in near real-time. Data on product information, local economic conditions, competitor prices and so on are combined and analysed using a price elasticity algorithm, which enables Sears to find the best price for the right product at the right moment and location via customized coupons. These coupons are given to loyal shoppers and are also used to move inventory when necessary. Just a few years ago this would still have been a dream scenario, as legacy systems meant it used to take Sears up to 8 weeks to find the best price; nowadays this can be done almost in real time.
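The article does not spell out how Sears' pricing algorithm works, but the underlying idea of price elasticity can be illustrated with a minimal sketch. The constant-elasticity demand model, the sample figures and the function names below are assumptions made for illustration, not Sears' implementation.

```python
# Illustrative sketch only: the article does not describe Sears' actual algorithm,
# so the data, field names and constant-elasticity model here are assumptions.

def price_elasticity(q_old, q_new, p_old, p_new):
    """Arc price elasticity of demand: % change in quantity / % change in price."""
    dq = (q_new - q_old) / ((q_new + q_old) / 2)
    dp = (p_new - p_old) / ((p_new + p_old) / 2)
    return dq / dp

def best_price(base_price, base_qty, elasticity, candidate_prices):
    """Pick the candidate price maximising revenue under a constant-elasticity
    demand curve q = base_qty * (p / base_price) ** elasticity."""
    def revenue(p):
        return p * base_qty * (p / base_price) ** elasticity
    return max(candidate_prices, key=revenue)

# Example: one product in one region, observed at two price points.
e = price_elasticity(q_old=120, q_new=95, p_old=19.99, p_new=22.99)
print(best_price(19.99, 120, e, [17.99, 18.99, 19.99, 20.99, 21.99]))
```

In a set-up like the one described, such estimates would presumably be computed per product, region and time window, and fed into coupon targeting rather than shelf prices.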

In recent years, Sears has moved from nationwide pricing strategies to regional and now also personal pricing. The coupons that customers receive are based on where they live, the number of products available, the products that need to go, and the products Sears believes the customer will like and consequently buy.

Shop Your Way Rewards loyalty program

In 2011, Sears launched a new loyalty program, Shop Your Way Rewards. This program also runs on Hadoop, which enables Sears to make use of 100% of the data that is collected and results in better targeting of customers for certain online and mobile scenarios.

The key for Sears is to maximize multi-channel customer engagement through the loyalty program. Customers provide their personal data in return for relevant interaction through the right channel, according to Dr. Phil Shelley, CTO of Sears Holdings Corporation, in an interview with Forbes.

Sears’ Big Data platform

In the past, Sears used many different tools on data that sat in silos across the organisation. These legacy systems prevented Sears from offering the right product at the right moment for the right price. Sears started by experimenting and innovating with Big Data, exactly as companies should when getting started. They began with a Hadoop cluster running on a netbook computer and experimented from there. They learned the hard way, through trial and error, partly because there were few outside Big Data experts who could guide them with the platform. They have managed to build a large centralized platform where all data is stored. The platform uses a variety of (open-source) tools such as Hive, Pig, HBase, Solr, Lucene and MapReduce. This gives them every possibility to have personalized interactions with the customer, as well as to use their data for different applications across the company.

Sears Big Data platform

 

Besides Hadoop, Sears also uses Datameer, a data exploration tool that enables visualization directly on top of Hadoop, for ad-hoc queries without the need to involve IT. Previously such jobs required ETL work that could take up to a few weeks. At the moment, Sears gives its users access to Hadoop data only via Datameer.

Sears started using Big Data because of declining sales, while major competitors such as Amazon kept growing. In the past years the company has managed to move rapidly into the Big Data era and is turning itself into a real-time digital enterprise. A great achievement for a company that is over a century old.

7 HABITS OF HIGHLY EFFECTIVE TWEETERS: A Longitudinal Study of Follow Predictors on Twitter

A 15 MONTH STUDY OF 500K TWEETS HIGHLIGHTS 7 HABITS OF HIGHLY EFFECTIVE TWEETERS

I just came across this scientific study explaining the most effective habits of successful Twitter users. It is the first study I have read that examines which factors are associated with an increased follower count on Twitter over an extended period of time. The 2013 study analysed 507 Twitter users and half a million of their tweets over 15 months.

The 7 habits of highly effective tweeters are:

  1. Always send out positive messages
    • People are attracted to people who talk positively
  2. Always talk about others
    • Self-obsessed people gain fewer followers
    • Informers: 20% share information and reply to other users.
    • Meformers: 80% mostly send out information about themselves.
  3. Create ‘Social Proof’
    • Being retweeted is a sign to others that you are worth following
    • Those who gave a web address, a location and a long description were also more likely to attract followers
  4. Stay on topic (in our case, Data Science)
    • The study found that people who remained more ‘on-topic’ tended to attract more followers.
  5. Check spelling and limit hashtag use
    • People hate the #random #use #of #hashtags.
  6. Short bursts of activity are OK
    • Twitter users who were bursty (10 tweets in 1 hour) from time to time tended to attract more followers.
  7. Increase direct mentions and reduce broadcast tweets
    • It pays, in terms of more followers, to increase the proportion of directed tweets and decrease broadcasts.

 

These points might just be timely reminders of good Twitter etiquette; nevertheless, it is quality advice for every tweeter.
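To make the study's predictors a little more concrete, here is a minimal sketch that approximates a few of them (directed tweets, mentions of others, hashtag density, ‘meformer’ tweets) from raw tweet text. The heuristics and regular expressions are illustrative assumptions, not the feature definitions used in the paper.

```python
# Illustrative sketch only: rough approximations of a few follow predictors from the
# study (directed vs. broadcast tweets, talking about others, hashtag use, "meformer"
# tweets). The heuristics below are assumptions, not the paper's code.

import re

def tweet_features(tweets):
    """Compute simple ratios over a list of tweet texts."""
    n = len(tweets)
    directed = sum(1 for t in tweets if t.startswith("@"))          # replies / direct mentions
    mentions = sum(1 for t in tweets if "@" in t)                    # talks about others
    hashtags = sum(len(re.findall(r"#\w+", t)) for t in tweets)      # hashtag density
    meformer = sum(1 for t in tweets if re.search(r"\b(I|me|my)\b", t, re.I))
    return {
        "directed_ratio": directed / n,
        "mention_ratio": mentions / n,
        "hashtags_per_tweet": hashtags / n,
        "meformer_ratio": meformer / n,
    }

sample = [
    "@kirkdborne great overview of #datascience tooling, thanks!",
    "I just ate the best sandwich of my life",
    "New post: 7 habits of highly effective tweeters",
]
print(tweet_features(sample))
```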

A Longitudinal Study of Follow Predictors on Twitter

Thank you CJ, Sarita & Eric for this excellent study. The study report is available here
C.J. Hutto
School of Interactive Computing 
Georgia Institute of Technology 
cjhutto@gatech.edu

Sarita Yardi
School of Information 
University of Michigan 
yardi@umich.edu

Eric Gilbert
School of Interactive Computing 
Georgia Institute of Technology 
gilbert@cc.gatech.edu

Job – KBC – Data Application Manager

Original Version here


 

KBC Data Application Manager,

Reference code B06RMD0321EXT002

Job description (incl. context of the activities)

Welcome to the very exciting and rapidly changing world of internet and mobile. Within the Belgium Division, the Direct Channels directorate, and more specifically the ‘Click’ department (DKC), is responsible for managing and expanding the direct digital channels between KBC and our clients.

Context of the activities

The Click data team is situated within this department.

This dynamic team has a threefold mission:

  • acting as an analytical competence centre for e-channel-related matters,
  • delivering management information via reporting and dashboards,
  • managing the applications for supporting web analytics tools, such as the SAS Web Analytics (CxA) tool, and looking after data collection.

The goal of the data team is to support the other Click staff with analyses and figures, so that they can substantiate and strengthen their decisions and/or uncover possible optimisation opportunities (data-driven working). For this we rely, among other things, on web analytics tools such as Google Analytics, SAS CxA and Adobe Analytics.

There is a need for a functional expert who manages the tools and acts as a bridge between the end users of the application on the one hand and ICT on the other.

Your assignment

We are looking for an application manager with a passion for the operational management of a number of web analytics applications, including their data sources, and with a passion for project work.
You are an application manager who wants to take a leading role in meeting our clients' needs in a service-oriented way:

  • You act as SPOC for the web analytics application(s) and fulfil the role of access manager;
  • You investigate, coordinate and handle communication regarding new releases of the web analytics applications in use;
  • You determine the measurement set-up in consultation with the end users and document, organise and coordinate the related tests;
  • You monitor the data collection process and the correctness of the collected data;
  • You manage the budget for requested changes and act as the bridge between end users and ICT;
  • You follow trends related to data and data collection tools;
  • You actively contribute to projects within direct channels for the website, application processes and so on.

This means that you are co-responsible for both the technical and the functional analysis phase of the measurement set-up within these domains;

  • You are a linchpin in the handling of incidents;
  • You take initiatives to acquire in-depth knowledge of the fascinating digital world in which we live and work today, so that you can guarantee optimal service to our end users;
  • You are co-responsible for achieving the direct and indirect (appointments/lead generation) sales targets.

All of this happens in close collaboration with the reporting staff, the data scientists, the data architects and the functional end users.

Place of employment: Leuven
Employment percentage: 100%

Profile & Training

  • You have a bachelor's or master's degree.
  • You are analytically minded and have good synthesis skills.
  • You can work independently and methodically, and at the same time you are a team player.
  • You deliver quality work and guarantee timely delivery.
  • You are flexible in taking on urgent assignments and changes to the planning.
  • You can convey ideas clearly, both in writing and orally. Your written documentation is structured, clear and in understandable language.
  • You can persuade on the basis of clear arguments and positions.
  • You are open to the opinions of colleagues and put the team result above individual interest.
  • Client focus is central to you.
  • You quickly get to grips with new subject matter; you are eager to learn and looking for new challenges.
  • You have an affinity with ICT and project-based working.
  • You are stress-resistant.
  • You can communicate in English (both in writing and orally).

Training:

You will be part of an enthusiastic team.
Training happens on the job and under guidance.

More information.

At KBC you can count on:

  • active guidance throughout your career,
  • an exceptional range of training and development opportunities,
  • various opportunities for advancement,
  • a permanent contract,
  • a competitive remuneration package, supplemented with a broad package of additional benefits and staff conditions on our banking and insurance products,
  • opportunities to balance your work and private life,
  • a dynamic working environment with an open culture and a pleasant atmosphere.

How to apply?

Apply online using the application form on this website.  here

 

Big Data from the bottom up by Nick Couldry & Alison Powell

Original Version here

Big Data from the bottom up

Abstract

This short article argues that an adequate response to the implications for governance raised by ‘Big Data’ requires much more attention to agency and reflexivity than theories of ‘algorithmic power’ have so far allowed. It develops this through two contrasting examples: the sociological study of social actors’ uses of analytics to meet their own social ends (for example, by community organisations) and the study of actors’ attempts to build an economy of information more open to civic intervention than the existing one (for example, in the environmental sphere). The article concludes with a consideration of the broader norms that might contextualise these empirical studies, and proposes that they can be understood in terms of the notion of voice, although the practical implementation of voice as a norm means that voice must sometimes be considered via the notion of transparency.

 

Introduction

We are living through a transformation of governance – both its mechanisms and reference-points – which is likely to have profound implications for practical processes of government and everyday understandings of the social world. A shift is under way from discrete forms of intervention in social space based on intermittent and/or specific information-gathering to continuous processes of management based on total and unremitting surveillance (Ruppert, 2011). Both management and government increasingly are becoming predicated upon the continuous gathering and analysis of dynamically collected, individual-level data about what people are, do and say (‘Big Data’). However misleading or mythical some narratives around Big Data (Boyd and Crawford, 2011; Couldry, 2013), the actual processes of data-gathering, data-processing and organisational adjustment associated with such narratives are not mythical; they constitute an important, if highly contested, ‘fact’ with which all social actors must deal. This article will offer a social approach to the construction and use of such data and related analytics.

The possibility of such a social approach to Big Data has, until now, been obscured by unnecessarily generalised readings of the consequences of these broad changes. Without a doubt, the information types that management and governance take as their starting-point have changed: it is digital infrastructures of collection, transmission, analysis and presentation that have made possible continuous data-mining. Compared to representative sampling, such new approaches to data collection are totalising; they are also characterised by the aggregation of multiple data sets through the use of calculation algorithms. This seemingly increased role for algorithms has led some commentators to focus on the dominance of ‘algorithmic power’ (Lash, 2007), an approach that leaves no room for agency or reflexivity on the part of ‘smaller’ actors. We posit that emerging cultures of data collection deserve to be examined in a way that foregrounds the agency and reflexivity of individual actors as well as the variable ways in which power and participation are constructed and enacted.

This more agent-focussed inquiry into the consequences of algorithmic calculation’s deep embedding in everyday life has been foreshadowed in some earlier debates, notably Beer’s (2009) response to Lash’s (2007) argument that ‘algorithmic power’ has changed the nature of hegemony. As Beer (2009: 999) noted, sociology must also ‘focus… on those who engage with the software in their everyday lives’. Such a focus does not come naturally within Lash’s broadly philosophical formulations of issues in social theory which foreground ‘a collapse of ontology and epistemology’ (Lash, 2006: 581), and a new power-laden regime of ‘facticity’ (Lash, 2007: 56) in which ‘there is no time, nor space… for reflection’ (Lash, 2002: 18). If that were right, why pay close attention to what actors say when they ‘reflect’ on their position in the social world? But this analytic closure is unhelpful.

Needed instead is a more open enquiry into what actual social actors, and groups of actors, are doing under these conditions in a variety of places and settings. Without denying of course the ‘generative’ importance of algorithms (Lash, 2007: 71) when embedded in modes of calculation, processing and rule, we need to remember that social actors are often themselves aware of being classified. Even if they are not aware of the details of when, by whom, and how they have been classified, that this has happened is something of which they are aware, and indeed one of the main ‘facts’ they have to deal with as social actors. We need to become sensitive to what Beer (2009: 998) has called people’s ‘classificatory imagination’ and, over the longer-term, the wider ‘social imaginaries’ (Mansell, 2012; Taylor, 2005) that may be emerging around these new cultures of data collection.

Beer goes on helpfully to distinguish three levels of resulting empirical research: first, regarding the ‘organizations that establish and activate Web 2.0 applications’; second, regarding the ‘software infrastructures and their applications on the web’ and third, regarding how the first two levels ‘play out in the lives of those that use (or do not use) particular web applications’ (2009: 998). We would like in this short article to build particularly on Beer’s third level, and on the lessons of our own empirical researches, to map out some more detailed and concrete ways of researching the everyday uses of data and analytics from a social perspective. The result is to open up a much wider and more varied space of agency and reflexivity than allowed for in philosophical accounts. The likely outcome may be no less critical of Big Data’s implications, but will develop critique through a more nuanced characterisation of ‘Big Data’ as a variegated space of action, albeit one very different from the spaces in which pre-digital social actors operated.

Doing social analytics

Our first example of a more agent-focussed account of Big Data is what has been called ‘social analytics’ (see Couldry et al., (forthcoming) for a much more detailed account). A social analytics approach is an explicitly sociological treatment of how analytics get used by a range of social actors. Such an approach aims to capture how particular actors reflect upon, and adjust, their online presence and the actions that feed into it, through the use of ‘analytics’. ‘Analytics’ here is used broadly to cover both basic analytics (the automated measurement and counting installed within the operation of digital platforms and associated websites, apps and tools) and the adjustments made by actors themselves in response to such measurement and counting operations. Platforms that count and sort online data, such as Google and Facebook, work automatically via algorithms, often allowing users only limited degrees of manual adjustment (van Dijck, 2013). Other adjustments around those operations may take direct digital form (a website redesign) or organisational form (an adjustment in an organisation’s management of its resources). In all these cases, the variable use of analytics is a social process involving reflection, monitoring and adjustment.

By ‘social actors’ we mean actors with social ends over and above the basic aim of generating and analysing data (usually for profit): that basic aim in itself is of little sociological interest. The broader sociological interest starts when there is some tension, actual or potential, between the aims that social actors are trying to achieve and the interpretations of their activities that analytics generate. This use of the term ‘social analytics’ encompasses, but goes beyond, the everyday ‘technical’ use of the term ‘analytics’ to mean the measurement and reporting of Internet data. The mutual intertwining of human and material agency is hardly a new insight (Pickering, 1995: 15–20), but it acquires a special interest when analytics’ operations are opaque to non-expert social actors who must work hard to acquire control over them.

One key variable in such research is what is measured and analysed, the ‘object’ of analytics. The underlying data’s relationship to an organisation’s online presence may be more or less direct: direct if the data is literally about that organisation’s online presence (numbers of unique users, their characteristics, types of interaction with online content); or indirect if the data is not about an organisation’s online presence, but is generated or presented online, becoming part of how that organisation is judged by online visitors (online reviews, debates). The closeness, or distance, of the relation between the object of data analysis and the general aims and practice of social actors clearly will shape the degree of tension and reflexivity that exists over the implementation of analytics. At one end of the spectrum will be cases where analytics are used directly to support other mechanisms of power (e.g. performance management); at the other end will be cases where what is at stake in the use of analytics is the broad redefinition of an organisation’s aims and performance, with no direct impact on the evaluation or management of individuals. In the former case, social analytics may merge into the study of management and power; in the latter case, social analytics may be something closer to a phenomenology of how social actors and organisations with social aims appear to themselves, and to the world, under digital conditions.

Other variables when doing social analytics will include the degree of technical expertise of the actors involved, including the degree to which they can go beyond merely using off-the-shelf analytics to customising them, or perhaps even developing their own analytic tools and data-collection designs. Financial and other resources will also affect how far the processes which social analytics studies can develop, or get blocked, for example, if the staff to do the analytic work that would enable a richer re-evaluation of an organisation’s digital presence cease to be available. Expertise and resources are, of course, variables in any fieldwork setting.

Within these basic parameters, however, social analytics promises a rich vein of inquiry into the conditions of data use and analytics use, from the perspective of social actors who are not principally experts in relation to data or algorithms, but who look to them to do certain work towards other ends. It has so far been explored in the context of community and civic activism, but it has the potential to be expanded to many more areas.

 

Data as media

For media scholars more generally, the shift to a data rich environment poses challenges for a robust understanding of how agency and expression might still work within that environment. The critical tradition in media and communications has largely been concerned with the operation of power in the construction of systems of symbolic mediation – for example, the function of ideological systems (in the Marxist tradition) or the Gramscian concept of hegemony. These strategies have allowed media and communication scholars to ‘work backwards’ through systems of symbolic mediation in order to understand the process and initial starting points of mediated ‘messages’. This focus on the symbolic quality of media messages allows us to examine power relationships from several different vantage points. Within traditional broadcast media forms we can observe how the symbolic control of mediated messages solidifies control and results in things like propaganda, but we can also see how alternative media producers can wrest control of ideas and their representation to challenge that kind of hegemony.

Broadcast models have however been overtaken, for important purposes, by models of mass self-communication. Whereas institutionalised mass media is structured to disseminate messages from one to many, mass self-communication is structured to invite continual input of data by individuals. This reorganisation of media production initially seemed to promise a reconfiguration of the top-down production of ideology and the bottom-up resistance to it, but as political–economic analyses have developed, we are beginning to see how such shifts have also led to the production of data replacing the production of audiences.

If the exemplary product of institutionalised mass media is propaganda, the exemplary product of mass self-communication is data. A mass media apparatus requests information to be disseminated from the one to the many; its economic model uses this information to generate an audience whose attention can be sold to an advertiser. In the mass self-communication model individuals are still part of an aggregate product to be sold, but instead of their attention on a single message produced for broadcast, it is their individual acts of communication that comprise the ‘Big Data’ and drive much media value-extraction.

Early critics of mass self-communication noted that the model encouraged individuals to create ‘content’ that was then sold to others in order to capture their attention (Terranova, 2000; van Dijck, 2013). However, ‘content’ is still expressive, even when it is sold to capture attention. A more complicated issue concerns the data that is produced, often unwittingly, which now generates much of the value in the newest iteration of the contribution economy. Many everyday activities now produce data without requiring human meaning-construction (or even basic consent). The rise of sensor networks has meant that increasingly individuals are producing not ‘content’ composed of messages containing intrinsic or constructed meaning, but mere data – temperature readings, status updates, location coordinates, tracks, traces and check-ins. Not one of these individual data-types is necessarily meaningful in itself – but taken together, either through aggregation, correlation or calculation, such data provide large amounts of information. The difference between this and the ‘content’ that mass self-communication promises to distribute is that the meaning of data is made not semantically (through expression and interpretation) but through processing – especially the matching of metadata (Boellstorf, 2013). Big Data sets are composed of numerous pieces of information that can be cross-compared, aggregated and disaggregated and made very finely grained, not things whose creators necessarily endowed them with meaning. In mining the data, more insights are made available about more aspects of everyday life but no opportunity is provided for these insights to be folded back into the experience of everyday life. In this context, is there any scope, as Boellstorf urges, for integrating the epistemic perspectives of ethnography back into the calculative logic of meta-data?

All along, the political economy of personal data, as anticipated by Gandy (1993), has been concerned with value created through the aggregation and calculation of individual traces. Even if we leave aside the expressive quality of individual acts of communication online, the production of data as a by-product of everyday life practices enacts a particular political economics of media, undertaken within a situation of pervasive surveillance and generalised authoritarianism (Cohen 2012). But the potential disconnect between system and experience, phenomenology and political economy, can be overcome by examining on the ground agents’ strategies for building alternative economies of information. Such alternative economies are being developed in several areas related to environment and sustainability, including projects that use data sources to make provenance and supply chains visible, and those that encourage individuals and communities to collect data as a means to make environmental issues visible by challenging conventional data collection.

Academic projects like Wikichains (Graham, 2010) and start-up companies like Provenance.it (2013) aggregate various forms of data about the production, distribution and supply chains of manufactured objects as a means of drawing attention to their long-term ecological and economic costs. While Provenance.it remains anchored in a consumer-based economic model, it does illustrate how alternative modes of data collection and analysis could shift agency and representation, especially if it permitted greater reflexivity. Similarly, NGOs like Mapping for Change (2013) have supported individuals and community groups in gathering environmental data (like air quality and noise) as a means of engaging with gaps and flaws in official data. These actions intervene in efforts to use such environmental data within top-down governance processes. As Gabrys (2014) identifies, such citizen science efforts must be enfolded and imagined in processes of environmental governance or ‘biopolitics 2.0’. These examples illustrate two ways that an alternative economics of information might employ calculation of multiple data sources or generation of alternative sources to illustrate or critique power relations, although they also illustrate the ambiguity of accountability within these processes.

Voice, transparency and power

The rise of analytics presents a significant normative challenge for scholars, activists and others who seek to understand how humanity, sociability and experience are represented. The daily practices of grappling with data and with the consequences of data analyses generate new questions about what and whose power gets exercised through such practices, and to what degree such exercises of power are satisfactorily made accountable. One approach to these challenges is through attention to problems of voice (Couldry, 2010). Voice, understood as a value for social organisation (Couldry, 2010: chapter 1), involves taking into account agents’ practices of giving an account of themselves and their conditions of life. The value of voice is essential to the workings of any models so far developed of democratic institutions, but it is not immediately compatible with a world saturated with the automated aggregation of analytic mechanisms that are not, even in principle, open to any continuous human interpretation or review.

While the notion of voice insists upon organisational processes being accountable to the subjectivities and expressiveness of all, the movement towards more casual, automatic sensing and its calculative rather than epistemic logic seems to eliminate this accountability. Yet clearly something similar to ‘voice’ is required in this new world, and this is not just a matter of democracy: ‘we have no idea’ wrote Paul Ricoeur ‘what a culture would be where no one any longer knew what it meant to narrate things’ (Couldry, 2010: 1, quoting Ricoeur, 1984: 29). At present, the proxy for voice in the algorithmic domain is the notion that data gathering processes ought to be transparent, and the logic of calculation revealed. A focus on transparency could begin to foreground notions of accountability in data calculation, ownership and use.

Notions of transparency have been discussed with respect to government production and use of data (Tkacz, 2012). Yet despite pledging to make public data collection transparent, governments like the US and the UK in fact collect much more information via surveillance projects and partnerships with information technology companies. With the reform of the USA’s National Security Agency, perhaps more attention will begin to be paid to the data collection practices of the technology sector, making more of them visible. This kind of transparency goes part of the way to establishing accountability, but it still fails to address accountability and reflexivity. A refined concept of transparency that is sensitive to the meaning that data trails might form (even if it cannot be sensitive to the meaning inherent in their production) might go some way to addressing this. This is a tricky proposal: unless and until the unconscious production of data can be conceived of as a form of expression, the philosophical basis for such an expansive transparency will be difficult to establish. One possible way to proceed might be to highlight not just the risks of creating and sharing data, but the opportunities as well. The practices of social analytics and citizen science have the potential to establish these opportunities, ambiguous as they may be.

We hope that, as the debates about Big Data and society continue and their democratic stakes become clearer, the values implicit in the terms ‘voice’ and ‘transparency’ will themselves begin to converge in more satisfying ways than is at present possible.

 

Declaration of conflicting interest

The authors declare that there is no conflict of interest.

 

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

This article is distributed under the terms of the Creative Commons Attribution 3.0 License (http://www.creativecommons.org/licenses/by/3.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (http://www.uk.sagepub.com/aboutus/openaccess.htm).

Waar is de Data Scientist? (Where is the Data Scientist?) by Edwin van Unen

Original Version: http://business-analytics.biz/analytics/761#comments
Where is the Data Scientist?

You read it everywhere in the media: “Data is the new oil of the 21st century”, “Data Scientist is the profession of the future”. Big Data is hot. To mine that Big Data and actually extract business value from it, you need people with the right skills and tools. Many companies, however, struggle with how to begin: how do I gain new insights from social media and website data, but also by combining sensor data and transactional data? What does that require, and where do I find the people who can do it?

What is a Data Scientist?
If you ask people for the definition of a Data Scientist, you will not get a unanimous answer. The emphasis differs, but a common thread is: someone who is able to process data from various (un)structured sources, internal and external, analyse it with advanced algorithms, interpret the results, translate them into a business context and communicate about them with (senior) management. They combine knowledge of mathematics and statistics with hands-on computing skills and good communication skills. That sounds like the proverbial five-legged sheep. A whizz-kid with a smooth pitch and golden hands? Or a researcher who thinks innovatively and is always looking for new structures and unexpected relationships? Where do we find them?

Innovative education
Prospective students are advised to choose a programme in which data analysis is central, and colleges and universities are paying more and more attention to this. Over the past year I have introduced various groups of students to SAS for data mining and visualisation. I see a lot of enthusiasm; students say they would like to use it in their internship or graduation project, and SAS offers various possibilities for that. But there is also a gap between students' theoretical knowledge and the skills to apply it in practice. Sometimes affinity with business processes is lacking, or the mathematical depth is insufficient. Fortunately, universities are responding to this better and better. The Data Science Center in Eindhoven, launched last year, is a good example of how the future need can be met.

Who fills this gap?
Although Big Data and the Data Scientist are seen as phenomena of recent years, and both concepts rank high in the Gartner Hype Cycle, it is partly old wine in new bottles. SAS has been analysing data, often in large volumes, for more than 38 years. Roles such as data or business analyst, data miner or simply statistician have existed for many years. What may be new to these people is the sheer amount of available data, from multiple sources and often unstructured. This requires new skills, such as working with machine learning techniques or Hadoop, but they are usually well able to learn them. A genuine interest in the latest techniques, and the ability to apply them yourself, is important.

In addition, data science can also be tackled as a team, with specialists deployed effectively in their own areas. The five-legged sheep is, after all, not commonplace. Software that supports good collaboration then becomes crucial: it is important to build on each other's results and to share findings quickly. So do not look for your new Data Scientist only outside your organisation. In an analytically driven organisation, with the right processes, resources and culture, the real Data Scientist will flourish by itself.

Masterclasses Data Science
Would you like to know what Data Science can mean for your organisation? Or do you want to develop further as a Data Scientist? This autumn SAS is organising a series of masterclasses in which, over a few months, you will be brought fully up to date on developments in Data Science. For more information: SAS Training department, 035-699 6999 / opleidingen@sas.com.

“Do you need a Masters Degree to become a Data Scientist?” Read practical tips and interesting commentary.

Leading analytics experts answer the question: “Do you need a Masters Degree to become a Data Scientist?” Read practical tips and interesting commentary.

By Gregory Piatetsky, @kdnuggets, Jun 27, 2014.

The KDnuggets Analytics, Data Mining, and Data Science LinkedIn Group has many active discussions, and recently one such discussion was prompted by a question from Alok Sharma:

Is it necessary to have Masters Degree to become a Data Scientist? Or are there any certificate courses that can help me to become a Data Scientist?

This discussion has now been going for 4 months and has drawn responses from many leading data scientists and professors, including Mark A. Biernbaum, Goutam Chakraborty, Michael Fahy, Myles Gartland, Vincent Granville, Daniel Dean Gutierrez, Steven Miller, Greta Roberts, and myself.

The consensus seems to be that good practical skills can take the place of an MS degree, but there are many interesting comments and practical tips – see below.

In case you are interested in a Masters, here are Analytics and Data Science Education options.

Here is a selection of the most interesting answers from the discussion, in response to the question:
Is it necessary to have Masters Degree to become a Data Scientist? Or are there any certificate courses that can help me to become a Data Scientist?

Steven Miller, Data Maestro, Talent & Skills Ecosystem at IBM
To learn data science — absolutely. A few schools are building undergraduate programs that will be akin to a computer science degree. You will learn core skills, but they won’t make you a scientist who will be advancing the field.

Sal DiStefano, Developer at Restaurant Technologies, Inc.
Remember that Data Scientist is just a title (a media-hyped title). Some give themselves or hold this title because that is the work they do, not because they have a particular degree. Some may hold degrees in Statistics, Mathematics or Computer Science; the disciplines vary.

You can learn data science anywhere. No single Masters Program could cover all the disciplines needed in significant depth for one to be an expert in all these areas. Selecting an area or two or three and having depth and expertise in those is common. Many companies do not have just a “Data Scientist” but teams composed of experts from the different disciplines.

While some institutions are offering or creating Masters Programs with this title, most of the current field of Data Scientists have no such degree. Check out the following link en.wikibooks.org/wiki/Data_Science:_An_Introduction/A_Mash-up_of_Disciplines to see a list of disciplines considered within the Data Science area.

The new Columbia Data Science Institute is offering a new Masters in Data Science Program idse.columbia.edu/masters as well as a certificate program for those who already hold advanced degrees.

There is the Johns Hopkins University Online Data Science Certificate program, available on Coursera: bit.ly/1dTkXju.

Experience and doing, in my opinion, are the best way to become a “Data Scientist”. There are many ways to do this with or without an advanced degree program. There are many people doing great “Data Science” work under other titles.

Joyce Crum, MBA, P.E., Operations & Business Manager
I agree, Data Scientist is just a title. A good Masters to get you in the arena is Operations Research. My experience was, I had to understand the system to apply the math or tools, not just how to do the math or work the tool. Good luck.

Gregory Piatetsky-Shapiro, Analytics/Data Mining Expert, KDnuggets President
There are many analytics certificates – see www.kdnuggets.com/education/analytics-data-mining-certificates.html. I also recommend you take part in some Kaggle competitions – a good result there shows your competence (but it is not easy – competition is stiff!).

Greta Roberts, Co-founder and CEO, Talent Analytics, Corp.
You may want to check out the hands-on / practical certificate that comes with a SAS certification at the end (a very valuable certification). All done online, in your own time: math.kennesaw.edu/academics/certificate/sas-dm/index.html

Nidhi Kohli, Online Marketing, content writer at Jigsaw Academy
Big data knowledge is not very difficult to obtain, and anyone with the needed prerequisites, such as existing knowledge of statistics, programming and database concepts, can become a big data professional.

Based on job requirements, the skills in most demand are Hadoop/Big Data, tools including R and SAS, and some domain knowledge. Theory is assumed as a prerequisite, but usually good data selection and engineering is more important than advanced algorithms.

However, there is a strong demand for analytic talent and a shortfall in supply. If you have a master's degree, it will be an added advantage, but if you don't, many companies will overlook this as long as you have the right skills.

Please check the link below for India's top 10 Analytics institutes: education.sulekha.com/top-10-analytics-training-institutes-in-india_602070_blog

Daniel Dean Gutierrez, Data Scientist at AMULET Analytics
I was at a Meetup last night where a guy, a “programmer for 31 years,” said that last year he decided to call himself a data scientist. He wanted to take advantage of the new hype. He said it was really easy to become a data scientist. He started by taking Andrew Ng's class in machine learning through Coursera, but was “destroyed” by the class and had to drop it. Then he took the Coursera “Computing for Data Analysis” class, 4 weeks to basically learn R. Then he took an expensive on-premise Data Science class. And voilà! A data scientist is born.

Having an academic background in data science, I'm hard pressed to call this gentleman a data scientist. I think it takes more than a couple of MOOC classes, and more time, to take on that moniker.

Gregory Piatetsky-Shapiro, Analytics/Data Mining Expert, KDnuggets President
The person Daniel refers to may be a good data analyst/coder, but not yet a Data Scientist. Knowledge of R, Python or other tools is secondary to knowing how to approach the data, how to ask the right questions, and having good intuition about what works and what does not. Those skills are critical to a good data scientist, but take more than a few weeks to develop.

Jim Lola, Entrepreneur, Sr. Manager, Technologist, Architect, & Author
So what if you have undergrad degrees in History, Statistics, and CS, and then advanced degrees in CS and/or SE. Experience doing actuarial analysis, financial fraud analysis, failure analysis, BioStatistics, HUMINT, OSINT, and organizational theory analysis. And experience developing software for HPC systems, information management systems, DBMS (so a lot of information theory app), etc. in languages like C/C++, PHP, Java, Python, R, and Julia. And, of course, a natural curiosity on how things work and the ability to hire and manage other folks who also have a passion for information. And finally, have run a business and made business decisions. So does someone like this qualify as a Data Scientist? Just curious…

Matthew O’Connor, LTC Analyst
I think people are really getting way too hung up on the term “data scientist”. Because it is completely nebulous as to what exactly it is at this point, I would say it is merely a buzzword that may or may not end up getting a hard definition in the future. Speaking as someone who has minimal experience in the field and is currently enrolled in Northwestern's MSPA program, I can confidently say that when I complete my degree I will NOT be a data scientist (if I had years of experience prior to beginning the program I might be singing a different tune though). However, I do believe it will give me a solid foundation to build upon.

In the end, I believe it requires a combination of years of experience and a relevant degree. Furthermore, I would say a certain amount of aptitude is also required. I'm sure there are people with a B.S. in Computer Science who have been doing analysis for over ten years who can and should be called data scientists, just as there are PhDs out there who should not.

So to conclude I believe it is wise for newcomers like myself just to become excellent at data analysis/mining and not worry about monikers – those will come naturally once a certain degree of success has been achieved. That’s my two cents at any rate.

Daniel Dean Gutierrez, Data Scientist at AMULET Analytics
Jim, what you describe is a “unicorn,” something many companies are seeking when hiring a data scientist.
Matthew … as one long-time “data scientist” I love the new term for what I do. I think it aptly describes what I do, what I’ve always done, with data.

Vincent Granville, Data Scientist, Startup Entrepreneur
Masters programs will eventually change and adapt. I wouldn't be surprised if some organizations/companies soon offer a solid master's, at almost no cost, online and on-demand. We are actually working on delivering such high-quality training to practitioners with a quant background. The idea is to help interested candidates acquire all the useful experience and knowledge I have gathered over my 25-year career, spanning multiple continents and various data science roles (Visa, Microsoft, eBay, Wells Fargo and start-ups), in a compact format delivered online, on demand, in less than six months.

Aatash Shah, Founder & CEO at Edvancer Eduventures Pvt. Ltd.
One doesn’t become a data scientist overnight. You need to take it step by step especially if you are new to the whole analytics/data science show. To become a data scientist you need to be hands-on on various tools and technologies and these vary right from the basic MS Excel and SQL to statistical software like SAS/R/SPSS, languages like Python, Perl, C++, Java etc. and technologies that can handle Big Data like the Hadoop ecosystem. Apart from this you would be expected to have good knowledge of business to be able to eventually bring out the insights from the data.

To take a step-by-step approach, an expensive Master's degree may not be the best solution, as no degree will cover all these requisites and there will always be newer tech coming up. The best way would be to take a modular approach, learning all this through short, inexpensive certificate courses. After all, it is the knowledge that matters, and certificate courses probably provide more hands-on, practical knowledge at a cheaper price than a Master's.

Check out some certificate course providers in analytics here: analyticsindiamag.com/top-8-analytics-training-institutes-in-india/

Alok Sharma, Programer Analyst at BitWise Inc
Thank you Sal, Greg, Daniel, Vincent, Aatash & others. Your comments were really insightful. I think I will start off by taking Data Analysis courses on Coursera, followed by an industry-relevant course in analytics, and work on developing my knowledge until I land a relevant job.

Myles Gartland, Professor and Director of Graduate Business Programs at Rockhurst University; Chief Analyst at Insightful Analytics
To me it is also like asking whether you need a graduate degree to be a CEO. Well, no. A data scientist does not require any licensure, so technically you need no specific credential. Your degree usually gets you in the door, and your skills let you keep and excel at your job. All that said, you do not need a graduate degree to DO the job, but you might need one to GET the job (look at many of the job postings and their requirements).

Vincent Granville, Data Scientist, Startup Entrepreneur
If you don’t need to sell something to someone (a real human – like selling yourself to get a job), but instead generate revenue via automated data science systems that do not require human interactions (stock trading, various arbitraging systems including keyword bidding, sport bets, data science publisher generating revenue via Google Adwords, some types of hacking), then you don’t need any diploma or certifications. Not even high school, not even primary school.

Goutam Chakraborty, Professor (Marketing) at Oklahoma State University and Management Consultant
This is a great discussion. I have taught, advised and counseled more than 500 students (in the last 10+ years) in our graduate certificate program in data mining at Oklahoma State University (analytics.okstate.edu). Most of our students work in the field of data mining, predictive analytics, marketing analytics, web analytics, marketing science, data science… After having talked to hundreds of major corporations and employers, I feel a data scientist (as wanted by a company) is someone with a “multiple personality disorder” who can still function well! This person has knowledge and abilities in

1. Programming (SQL, Python, Java, …) and exposure to big data via Hadoop, MapReduce…
2. Statistical and numerical models, along with the ability to do visualization and optimization (using multiple software platforms, including proprietary such as SAS and open-source such as R)
3. Domain expertise to understand how all these apply in the context of a business to create value
4. Good communication, so that the person can explain the models to users who are unlikely to accept the models if they do not understand them
5. Curiosity, determination, team player, leader… you name it.

So, can you develop all these skills in one course? Or a short program? NO!

How about through a series of well-designed courses (not 1 or 2, but perhaps 4 or more, spread over a span of 1-2 years, so you have time to assimilate knowledge and put it to work) that build on each other, plus hands-on experience in working with complex data and models? Yes (that is what we do in our graduate certificate program for working professionals taking courses online). But, of course, I am biased because I run the program.

Atul Thatte, Senior Manager, Advanced Transaction & Consumption Analytics

I completely agree that “Data Scientist” is just a title. A Master's degree can be beneficial; however, a set of courses that provides a balance between theoretical depth and practical breadth would be ideal. I would highly recommend Dr. Chakraborty's Graduate Data Mining Certificate program at Oklahoma State University. Having completed it myself, I know first-hand that the program provides a solid theoretical foundation, a lot of experience with publishing and presenting in industry fora, and competing in industry-sponsored competitions such as the annual SAS analytics shootout. It is available online, so it's quite practical for full-time professionals. Hope this helps answer your question.

Michael Fahy, Associate Dean, School of Computational Sciences, Chapman University
You need a combination of Mathematics, Statistics and Computer Science.

Here is a sample list of courses from our MS degree in Computational and Data Sciences at Chapman University:

  • CS 510 Foundations of Scientific Computing
  • CS 520 Mathematical Modeling
  • CS 530 Data Mining
  • CS 540 High-Performance Computing
  • CS 555 Multivariate Data Analysis
  • CS 595 Computational Science Seminar
  • CS 611 Time Series Analysis
  • CS 612 Advanced Numerical Methods
  • CS 613 Machine Learning
  • CS 614 Interactive Data Analysis
  • CS 615 Digital Image Processing

Myles Gartland, PhD, Professor and Director of Graduate Business Programs at Rockhurst University; Chief Analyst at Insightful Analytics
To pile on to Michael's and Daniel's lists: let's not forget context, domain and communication. A few classes in communication and basic business (assuming they will work for one) will go a long way too. People sitting in cubes writing models without an understanding of business questions and the ability to communicate their results lose some of their value.

Mark A. Biernbaum, PhD Researcher- 25 years experience; Children’s Institute, Clinical/Social Psychology, University of Rochester
The current crop of Data Scientists has learned the work on the job. A lot of great research work is learned on the job. A Master's or credential program could create problems for the person obtaining it, once they get on the job and find that the work is as much experience as it is education.

Vincent Granville, Data Scientist, Startup Entrepreneur
If you are a self-funded entrepreneur, you don't even need a primary school education, not even kindergarten: your education and job title do not matter, only your capacity to generate value and profits. In my case, though I have a PhD, I never mention my degree – nobody (except time-wasters) asks anyway. Indeed, ignoring people asking about my (real!) credentials has been one of the best strategies to stop wasting time in discussions going nowhere.

Romakanta Irungbam, Analytics Consultant, Predictive Modeler
Except for companies that have just started doing a few things in analytics and dream of hiring a non-existent master of all trades, experience is more valuable than a Masters or any other degree.

When I started out sometime in 2003 (in India), it was called Research and Analytics because most of the analytics work was linked to marketing research studies and surveys. Quantum and SPSS were very popular software then, while sampling and significance tests were the most used analytical techniques. I learned all of these on the job.

Sometime around 2007, I had to learn SAS programming because the new company I joined used SAS for all its analytics projects. I also learned techniques like logistic regression, cluster analysis and factor analysis at this company.

From 2009 till 2012, I worked on Netezza and Teradata to extract and acquire the data I needed for my predictive modeling and other analytical projects. While I continued using SAS, I had to learn SPSS Modeler because one of my biggest clients used SPSS. I also became very good at a lot of statistical/data mining techniques – Decision Trees, Regression Models, Time Series, Mixed Models, etc.

And finally, in my current role (starting 2012), I learned MS SQL and Tableau. I also wrote my first, very challenging SAS macro program, which is more than 800 lines of code and accomplishes in about a day the same tasks that used to take weeks.

All the software, programming and statistical/data mining techniques were not learned in college or in a formal coaching environment. Most of the time, it was searching and reading on Google, a good discussion with the team, and in a few cases a colleague or someone senior who would help when the right questions were asked. What I want to say is: the media, some companies and their ‘executives’ try to make everything sound very technical and critical/essential – R, Hadoop, Big Data, MongoDB, Deep Learning, etc., etc. – but at the end of the day, you can and will learn anything when there is a requirement. And sometimes there will be someone who will push you into the water, someone who will make you take the first step when you have your doubts.

So, my answer is no. 🙂