Training: Introductory statistics with R – 24 August 2015


  • Exploratory data analysis with R. Simple and inferential statistics with a focus on graphics, both for exploration and interpretation. All topics will be illustrated using R, where possible with the R standard example data sets. The training will take one day, and consist of 6-8 sessions, each starting with an introduction to some of the issues, followed by hands-on exercises.



  • 1. Descriptive statistics
    Types of variables: nominal/categorical (including binary), ranked, interval, ratio; discrete vs continuous. Examples from standard R datasets: mtcars. How to represent as variables (which type/class?). ‘Factors’ in R: PITA or blessing?

    • 1.1. Univariate
      Numeric summaries: parametric (mean, variance, standard deviation, coefficient of variation), non-parametric (median, quartiles, quantiles), mode & range. 5-number summaries.
      Graphs: box plot, bar graph, histogram (and how the two differ), rug plot. Symmetrical vs skewed distributions. Difference between mean and median in skewed distributions. Outliers.
    • 1.2. Bivariate: two nominal variates
      Numeric summaries: contingency tables; contingency coefficients: measure of association (e.g. hair colour vs eye colour)
      Graphs: (heat maps); mosaic plots
    • 1.3. Bivariate: nominal vs numerical
      Numeric summaries: differences in the mean/median…
      Graphs: overlapping histograms or bar graphs; box plot
    • 1.4. Bivariate: numerical vs numerical
      Numeric summaries: covariance, correlation
      Graphs: heat map, scatter plots
    • 1.5. Multivariate
      Numeric summaries: correlation matrix
      Graphs: ‘pairs’ (matrix of ‘scatterplots’). 3d graphs are difficult to interpret, better inspect individual bivariate relationships separately. Possibility to use colour, symbol type and size to vary with different variables.
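As a rough illustration of the descriptive topics above, here is a minimal sketch using the `mtcars` data set named in the outline. The choice of variables (`mpg`, `wt`, `am`, `cyl`) and the helper names (`am_f`, `mpg_cv`, and so on) are assumptions made for this example, not part of the course material.

```r
# Sketch only: mtcars ships with base R; variable choices are illustrative.
if (!interactive()) pdf(NULL)  # suppress plot files when run as a script

data(mtcars)

# Represent a nominal (binary) variable as a factor: am is 0 = automatic, 1 = manual
mtcars$am_f <- factor(mtcars$am, levels = c(0, 1),
                      labels = c("automatic", "manual"))

# Univariate numeric summaries for miles per gallon
mpg_mean   <- mean(mtcars$mpg)
mpg_median <- median(mtcars$mpg)
mpg_sd     <- sd(mtcars$mpg)
mpg_cv     <- mpg_sd / mpg_mean    # coefficient of variation
mpg_five   <- fivenum(mtcars$mpg)  # min, lower hinge, median, upper hinge, max

# Two nominal variates: contingency table of transmission vs cylinders
tab <- table(mtcars$am_f, mtcars$cyl)

# Nominal vs numerical: median mpg per transmission type
mpg_by_am <- tapply(mtcars$mpg, mtcars$am_f, median)

# Two numerical variates: correlation between mpg and weight
r_mpg_wt <- cor(mtcars$mpg, mtcars$wt)

# Graphs: histogram with rug, grouped box plot, mosaic plot, scatterplot matrix
hist(mtcars$mpg); rug(mtcars$mpg)
boxplot(mpg ~ am_f, data = mtcars)
mosaicplot(tab)
pairs(mtcars[, c("mpg", "wt", "hp", "disp")])
```

Storing `am` as a factor rather than as 0/1 keeps R from treating it as a number in summaries and plots, which is one side of the "PITA or blessing" question.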
  • 2. Inferential statistics
    R basics to allow simulations: random numbers, set.seed. Importance of simulated data in understanding concepts and models

    • 2.1. Concepts needed for inferential stats:
      Law of large numbers; simulation demonstrating how sums of i.i.d. variates approach a normal distribution (central limit theorem); form and variance of the distribution of the mean; not everything is normally distributed (e.g. multiplicative rather than additive processes, where the log is normally distributed, hence the need for transformations)
      Probability distributions, normal distribution; Z-scores, basis of testing; errors of type I and II; Z test; confidence intervals; significance and power; dangers of repeated testing
    • 2.2. Simple statistical tests
      2.2.1. t-tests: compensating for uncertainty in the estimate of the variance. Different forms of t-test: one- and two-sample; paired sample; equal and unequal variance. ANOVA as an extension of the t-test
      2.2.2. Linear regression: predicting one variate from another. Significance of correlation, intercept, slope. Confidence bands. Diagnostic plots.
      2.2.3. Contingency tables, Chi squared
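The inferential topics above can be sketched in a few lines of base R. This is an illustrative example under stated assumptions, not the course's own exercises: the seed, the sample sizes, the exponential distribution, and the variable names (`sample_means`, `tt`, `fit`, `chi`) are all arbitrary choices, and `HairEyeColor` is used because the outline mentions hair colour vs eye colour.

```r
# Sketch only: all numeric settings below are arbitrary illustrative choices.
if (!interactive()) pdf(NULL)  # suppress plot files when run as a script

set.seed(42)  # fixes the random number stream so results are reproducible

# Distribution of the mean: draw n values from a skewed exponential
# distribution (mean 1, sd 1) many times and keep each sample mean
n      <- 30
n_sims <- 5000
sample_means <- replicate(n_sims, mean(rexp(n, rate = 1)))

mean(sample_means)  # should be close to 1
sd(sample_means)    # should be close to 1 / sqrt(n)
hist(sample_means)  # roughly bell-shaped despite the skewed source

# Two-sample t-test; R's default is the Welch (unequal variance) form
tt <- t.test(mpg ~ am, data = mtcars)

# Linear regression: predict mpg from weight, with coefficient tests
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)$coefficients  # estimates, standard errors, t and p values
confint(fit)               # 95% confidence intervals for intercept and slope

# Chi-squared test on a hair colour by eye colour contingency table
hair_eye <- margin.table(HairEyeColor, c(1, 2))
chi <- chisq.test(hair_eye)
```

Rerunning the simulation with a larger `n` shows the spread of `sample_means` shrinking in proportion to `1 / sqrt(n)`, which is the point of the "variance of the distribution of the mean" item.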


  • Edward Vanden Berghe holds a PhD in Science, with a background in Marine Biodiversity. He has been actively using R for several years, and teaching statistics for several decades.


  • Duration: 8h
  • Date and time: 24 August 2015, start at 9 am
  • Location: European Data Innovation Hub @ AXA, Vorstlaan 23, 1170 Brussel
  • Price: 300€


Please register via the eventbrite web site, on

