The course is intended for individuals with experience in
Note that the course uses R as its programming language: data science tasks are performed by writing commands and programming small routines. Participants should therefore be willing to, and interested in, programming for data science.
R is a programming language available on most operating systems (OS X, Windows and Linux). Data analysis and visualisations in R can be embedded in several data analytical tools (e.g. Microsoft Power BI) and database systems (e.g. Microsoft SQL Server R Services). RStudio is the state-of-the-art IDE for R that integrates scripting, syntax completion, visualisation, support for version control and much more.
After the completion of the course, the participants
lm (also in penalised form using the LASSO, glmnet), support vector machines (svm), naive Bayes classifier (naiveBayes), classification and regression trees (CART)
princomp for visualisation and modelling
kmeans and hierarchical clustering (hclust), used to produce dendrograms and in heatmap graphics
dashboards for interactive demonstrations of data and models through the use of responsive graphics and tables
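To give a flavour of the functions named above, the following is a minimal sketch and not actual course material. It uses only functions that ship with base R (lm, princomp, kmeans, hclust); the built-in mtcars and iris datasets, the chosen variables and the number of clusters are assumptions made purely for illustration.

```r
# Built-in datasets only; no extra packages are needed for this sketch.
data(mtcars)
data(iris)

# Linear regression: fuel economy as a function of weight and horsepower.
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(coef(fit))

# Principal component analysis on the four numeric iris measurements.
measurements <- iris[, 1:4]
pca <- princomp(measurements, cor = TRUE)
print(summary(pca))

# k-means clustering with three clusters (seed fixed for reproducibility).
set.seed(1)
km <- kmeans(scale(measurements), centers = 3, nstart = 20)
print(table(cluster = km$cluster, species = iris$Species))

# Hierarchical clustering; plot() draws the dendrogram.
hc <- hclust(dist(scale(measurements)), method = "complete")
plot(hc, labels = FALSE, main = "Dendrogram of iris measurements")
```

The penalised regression, support vector machine and naive Bayes functions mentioned above live in add-on packages and are left out here so that the sketch runs on a plain R installation.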
These skills will enable you to efficiently wrangle your data into a desired format for further analysis. This includes the ability to aggregate, summarise and visualise the data at various steps in the data analysis. As R is a scripting language, you are free from the constraints of a conventional spreadsheet program like Excel. The scripts also serve as a transparent and reproducible framework for re-running your analysis, as well as for re-using essential parts in other analyses. The graphics produced by R, and in particular ggplot2, are used professionally by academics, data visualisation communities and data scientists. Since intelligent use of graphics can say more than a thousand words, bringing your data, models and insight into a visual format is a key point in data analysis. RStudio makes it easy to integrate your scripts, tables and graphics into an interactive output using either Shiny or dashboards that can be shared with your organisation.
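As a small illustration of a scripted wrangle, aggregate and visualise workflow, here is a hedged sketch. It assumes the built-in mtcars dataset and uses base R graphics instead of ggplot2 so that it runs without installing any packages; the column choices are illustrative only.

```r
data(mtcars)

# Wrangle: keep the columns of interest and treat cylinder count as a factor.
cars <- mtcars[, c("mpg", "wt", "cyl")]
cars$cyl <- factor(cars$cyl)

# Aggregate: mean fuel economy and weight per cylinder group.
by_cyl <- aggregate(cbind(mpg, wt) ~ cyl, data = cars, FUN = mean)
print(by_cyl)

# Visualise: one box per cylinder group.
boxplot(mpg ~ cyl, data = cars,
        xlab = "Cylinders", ylab = "Miles per gallon")
```

Because every step is in the script, re-running it on updated data reproduces the table and the plot without any manual spreadsheet work.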
The topics will provide an essential toolbox for data science and data analytics. Furthermore, R provides a rich ecosystem that eases the workflow for the data scientist, where reproducibility and communication of the analysis are optimised through R Markdown, Shiny and interactive dashboards.
The course will be taught in English, but with the possibility of getting help and asking questions in Danish. Each day will be divided into lectures and hands-on sessions, where the participants will solve relevant tasks related to the specific topic in R.
Solutions to the exercises and other scripts will be made available to the participants for review and inspiration after each session.
The course will be based on the book “R for Data Science” by Garrett Grolemund and Hadley Wickham. This book covers the fundamental parts of data manipulation and data science. Additional course topics will be covered through the use of free online materials and course notes.
We also invite the participants to bring their own project challenges and data. When time permits, we will provide specific guidance in solving challenges related to these projects and directions for further work.
The course fee is 19.000 DKK (plus VAT) for participants from industry; half price for participants from academia.
A group discount (3 for 4) is given to registrations from the same organisation (provided that the billing information is the same). Please register individually and we will ensure that the group discount is given.
The fee covers teaching material, food and drinks during the course, and a course dinner on Wednesday evening.
You can register here.
The precise venue and itinerary will be announced soon.
Contact course director Torben Tvedebrink: email@example.com