Register for this course

Data Science using R - summer course at AAU

A one-week course in Data Science at Aalborg University, held from August 20 to August 24, 2018, in Aalborg, Denmark. The course is offered by the Department of Mathematical Sciences, Aalborg University.

This course uses R and RStudio. Both are open source and freely available.

Who attends?

The course is intended for individuals with experience in

  • data analysis using tools other than R;
  • programming and/or databases, but limited or no experience with data analysis and modelling; and/or
  • using R, and an interest in being updated on both R and statistics.

Note that the course uses R as a programming language: data science tasks are performed by issuing commands and programming routines. Hence, participants should be willing to program and interested in programming for data science.

R is a programming language available on most operating systems (OS X, Windows and Linux). Data analysis and visualisations in R can be embedded in several data analytical tools (e.g. Microsoft Power BI) and database systems (e.g. Microsoft SQL Server R Services). RStudio is the state-of-the-art IDE for R that integrates scripting, syntax completion, visualisation, support for version control and much more.

What can you expect?

After completing the course, participants

  • will be able to streamline data import and data handling through the use of the tidyverse
  • can make presentation-ready graphics to visualise their own data using ggplot2
  • can fit, interpret and present various statistical models for both numeric and categorical data using methodologies like linear regression (lm), also in penalised form using the LASSO (glmnet), support vector machines (svm), the naive Bayes classifier (naiveBayes), classification and regression trees, CART (rpart), etc.
  • can reduce the dimensionality of their data by principal components analysis (princomp) for visualisation and modelling
  • can perform cluster analysis using \(K\)-means (kmeans) and hierarchical clustering (hclust), which is used to produce dendrograms and heatmap graphics
  • have acquired the skills to present their analysis in stand-alone documents using rmarkdown and knitr
  • will be able to produce simple shiny applications and dashboards for interactive demonstrations of data and models through the use of responsive graphics and tables
  • can use GitHub for version control and for sharing their work with others.

These skills will enable you to efficiently wrangle your data into the desired format for further analysis. This includes the ability to aggregate, summarise and visualise the data at various steps in the data analysis. As R is a scripting language, you are free from the constraints of a typical spreadsheet program like Excel. The scripts also serve as a transparent and reproducible framework for re-running your analysis, as well as for re-using essential parts in other analyses. The graphics produced by R, and in particular ggplot2, are used professionally by academics, data visualisation communities and data scientists. Because intelligent use of graphics can say more than a thousand words, bringing your data, models and insight into a visual format is a key point in data analysis. RStudio makes it easy to integrate your scripts, tables and graphics into interactive output using either Shiny or dashboards that can be shared with your organisation.

Topics

  • Import of data from different sources (e.g. files, databases)
  • Data management and handling
  • Graphics and knowledge discovery
  • Interactive presentations with dashboards
  • Reproducible analyses with R Markdown
  • Regression and classification methods
  • Supervised and unsupervised learning
  • Dimension reduction
  • Version control and collaborative work using GitHub

The topics provide an essential toolbox for data science and data analytics. Furthermore, R provides a rich ecosystem that eases the data scientist's workflow, with reproducibility and communication of the analysis supported by R Markdown, Shiny and interactive dashboards.

Type of instruction

The course will be taught in English, with the possibility of getting help and asking questions in Danish. Each day will be divided into lectures and hands-on sessions in which the participants use R to solve tasks related to the topic of the day.

Solutions to the exercises and other scripts will be made available to the participants for review and inspiration after each session.

The course will be based on the book “R for Data Science” by Garrett Grolemund and Hadley Wickham. This book covers the fundamental parts of data manipulation and data science. Additional course topics will be covered through the use of free online materials and course notes.

We also invite the participants to bring their own project challenges and data. When time permits, we will provide specific guidance in solving challenges related to these projects and directions for further work.

Faculty

Associate Professors Mikkel Meyer Andersen, Torben Tvedebrink and Søren Højsgaard.

  • Mikkel Meyer Andersen has 15 years of experience in programming, including computational work in applied statistics and IT technologies such as web technologies and databases. He has authored and co-authored several R-packages.
  • Torben Tvedebrink has 15 years of experience using R. He chaired the organising committee for useR! 2015 in Aalborg. He has contributed to several R-packages and uses R as his primary tool for modelling, data handling and visualisation.
  • Søren Højsgaard has 20 years of experience in applied statistics. He has authored and co-authored several R-packages. He is an author of the book “Graphical Models with R”.

Price and registration

The course fee is DKK 19,000 (plus VAT) for participants from industry; participants from academia pay half price.

A group discount (four participants for the price of three) is given to registrations from the same organisation, provided that the billing information is the same. Please register individually and we will ensure that the group discount is applied.

The fee covers teaching material, food and drinks during the course, and a course dinner on Wednesday evening.

You can register here.

Register for this course

The precise venue and itinerary will be announced soon.

Need more information?

Contact course director Torben Tvedebrink: tvede@math.aau.dk

Preparation

Participants must bring a laptop with R (minimum version 3.3.3 - “Another Canoe”) and RStudio (minimum version 1.0.136) installed.
