Introduction

These notes are intended as a practical introduction to using the R environment for data analysis and graphics to work with epidemiological data. Topics covered include univariate statistics, simple statistical inference, charting data, two-by-two tables, stratified analysis, chi-square test for trend, logistic regression, survival analysis, computer-intensive methods, and extending R using user-provided functions. You should be able to follow the material if you are reasonably familiar with the mechanics of statistical estimation (e.g. calculation of odds ratios and confidence intervals) and require a system that can perform simple or complex analyses to your exact specifications.

These notes are split into ten sections:

 

Introduction: You are reading this section now!

 

Introducing R: Some information about the R system, the way the R system works, how to get a copy of R, and how to start R.

 

Exercise 1: Read a dataset, producing descriptive statistics, charts, and perform simple statistical inference. The aim of the exercise is for you to become familiar with R and some basic R functions and objects.

 

Exercise 2: In this exercise we explore how to manipulate R objects and how to write functions that can manipulate and extract data and information from R objects and produce useful analyses.

 

Exercise 3: In this exercise we explore how R handles generalised linear models using the example of logistic regression as well as seeing how R can perform stratified (i.e. Mantel-Haenszel) analysis as well as analysing data arising from matched case-control studies.

 

Exercise 4: In this exercise we use R to analyse a small dataset using the methods introduced in the previous exercises.

 

Exercise 5: In this exercise we explore how R can be extended using add-in packages. Specifically, we will use an add-in package to perform a survival analysis.

 

Exercise 6: In this exercise we explore how to make your own R functions behave like R objects so that they return a data-structure that can be manipulated or interrogated by other R functions.

 

Exercise 7: In this exercise we explore how you can use R to produce custom graphical functions.

 

Exercise 8: In this exercise we explore some more graphical functions and create custom graphical functions that produce two variable plots, pyramid charts, Pareto charts, charts with error bars, and simple mesh-maps.

 

Exercise 9: In this exercise we explore ways of implementing computer-intensive methods, such as the bootstrap and computer based simulation, using standard R functions.

 

If you are interested in a system that is flexible, can be tailored to produce exactly the analysis you want, provides modern analytical facilities, and have a basic understanding of the mechanics of hypothesis testing and estimation then you should consider following this material.