Exercise 6 Making your own objects behave like R objects

In the previous exercises we concentrated on writing functions that take some input data, analyse it, and display the results of the analysis. The standard R functions we have used all do this. The fisher.test() function, for example, takes a table object (or the names of two variables) as input and calculates and displays the p- value for Fisher’s exact test and the odds ratio and associated confidence interval for two-by-two tables:

fem <- read.table("fem.dat", header = TRUE)
attach(fem)

## The following objects are masked from fem (pos = 7):
## 
##     AGE, ANX, DEP, ID, IQ, LIFE, SEX, SLP, WT

fisher.test(SEX, LIFE)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  SEX and LIFE
## p-value = 0.03175
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##   1.080298 14.214482
## sample estimates:
## odds ratio 
##   3.620646

The results of the fisher.test() function may also be saved for later use:

ft <- fisher.test(SEX, LIFE)
ft

## 
##  Fisher's Exact Test for Count Data
## 
## data:  SEX and LIFE
## p-value = 0.03175
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##   1.080298 14.214482
## sample estimates:
## odds ratio 
##   3.620646

The fisher.test() function returns an object of the class htest:

class(ft)

## [1] "htest"

which is a list containing the output of the fisher.test() function. Each item of output is stored as a different named item in the list:

names(ft)
str(ft)

## [1] "p.value"     "conf.int"    "estimate"    "null.value"  "alternative"
## [6] "method"      "data.name"

## List of 7
##  $ p.value    : num 0.0318
##  $ conf.int   : num [1:2] 1.08 14.21
##   ..- attr(*, "conf.level")= num 0.95
##  $ estimate   : Named num 3.62
##   ..- attr(*, "names")= chr "odds ratio"
##  $ null.value : Named num 1
##   ..- attr(*, "names")= chr "odds ratio"
##  $ alternative: chr "two.sided"
##  $ method     : chr "Fisher's Exact Test for Count Data"
##  $ data.name  : chr "SEX and LIFE"
##  - attr(*, "class")= chr "htest"

Each of these items can be referred to by name:

ft$estimate
ft$conf.int

## odds ratio 
##   3.620646

## [1]  1.080298 14.214482
## attr(,"conf.level")
## [1] 0.95

When you display the output of the fisher.test() function either by calling the function directly:

fisher.test(SEX, LIFE)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  SEX and LIFE
## p-value = 0.03175
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##   1.080298 14.214482
## sample estimates:
## odds ratio 
##   3.620646

or by typing the name of an object created using the fisher.test() function:

ft

## 
##  Fisher's Exact Test for Count Data
## 
## data:  SEX and LIFE
## p-value = 0.03175
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##   1.080298 14.214482
## sample estimates:
## odds ratio 
##   3.620646

The print() function takes over and formatted output is produced. The print() function knows about htest class objects and produces output of the correct format for that class of object. This means that any function that produces an htest object (or any other standard R object) does not need to include R commands to produce formatted output.

All hypothesis testing functions supplied with R produce objects of the htest class and use the print() function to produce formatted output. For example:

tt <- t.test(WT ~ LIFE)
class(tt)
tt

## [1] "htest"

## 
##  Welch Two Sample t-test
## 
## data:  WT by LIFE
## t = 0.60608, df = 98.866, p-value = 0.5459
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
##  -0.3326225  0.6251763
## sample estimates:
## mean in group 1 mean in group 2 
##       0.7867213       0.6404444

You can use this feature of R in your own functions. We will explore this by writing a function to test the null hypothesis that the variance to mean ratio of a vector of numbers is equal to one. Such a test might be used to investigate the spatial distribution (e.g. over natural sampling units such as households) of cases of a disease.

Create a new function using the function() function:

v2m.test <- function(data) {}

And start the function editor:

fix(v2m.test)

Now edit this function to make it do something useful:

function(data) {
  nsu <- length(data)
  obs <- sum(data)
  m <- obs / nsu
  v <- var(data)
  vmr <- v / m
  chi2 <- sum((data - m)^2) / m
  df <- nsu - 1
  p <- 1 - pchisq(chi2, df)
  names(chi2) <- "Chi-square"
  names(df) <- "df"
  names(vmr) <- "Variance : mean ratio"
  v2m <- list(method = "Variance to mean test",
              data.name = deparse(substitute(data)),
              statistic = chi2,
              parameter = df,
              p.value = p,
              estimate = vmr)
  class(v2m) <- "htest"
  return(v2m)
}

Once you have made the changes shown above, check your work, save the file, and quit the editor.

Before proceeding we should examine the v2m.test() function to make sure we understand what is happening:

The first eight lines after the opening curly bracket ({) contain the required calculations.
The next three lines use the names() function to give our variables names that will make sense in formatted output.
The next line creates a list of items that the function returns using some of the names used by htest class objects.
The next line tells R that the list object called v2m is of the class htest.
The next line causes the function to return the v2m object (i.e. a list of class htest containing the named items method, data.name, statistic, parameter, p.value, and estimate).
The final line ends the function definition.

Note that objects of class htest may contain items with the following names:

Item	Usage
method	Text description of the test used to title output
data.name	Name(s) of data or variables used for the test
null.value	The null value
statistic	Value of test statistic
parameter	A test parameter such as the degrees of freedom of the test statistic
p.value	The p-value of the test
estimate	An estimate (e.g. the mean)
conf.int	Confidence interval of estimate
alternative	Text describing the alternative hypothesis
note	Text note

We are now ready to test the v2m.test() function. This table:

Number of cases :       0  1  2  3  4  6
Number of households : 24 29 26 14  5  2

shows the number of cases of chronic (stunting) undernutrition found in a random sample of 100 households.

We can reproduce the data behind this table using a combination of the c() and rep() functions:

stunt <- c(rep(0,24), rep(1,29), rep(2,26), rep(3,14), rep(4,5),
           rep(5,0), rep(6,2))
table(stunt)

## stunt
##  0  1  2  3  4  6 
## 24 29 26 14  5  2

And use it to test our new v2m.test() function:

v2m.test(stunt)

Which should produce the following output:

v2m.test(stunt)

## 
##  Variance to mean test
## 
## data:  stunt
## Chi-square = 110.16, df = 99, p-value = 0.2083
## sample estimates:
## Variance : mean ratio 
##               1.11274

If your vm2.test() function does not produce this output then use the fix() function:

fix(v2m.test)

to check and edit the vm2.test() function and try again.

The important thing to note from this exercise is that R allows us to specify a class for the output of our functions. This means that we can use standard R classes and functions to (e.g.) produce formatted output without us having to write commands to format the output ourselves.

More importantly, it also means that we can write functions that return values when we need them to return values but can also produce formatted output when we need them to produce formatted output.

Our v2m.test() function can produce values for later use:

vm <- v2m.test(stunt)
vm$p.value

## [1] 0.2083442

or produce formatted output:

v2m.test(stunt)

## 
##  Variance to mean test
## 
## data:  stunt
## Chi-square = 110.16, df = 99, p-value = 0.2083
## sample estimates:
## Variance : mean ratio 
##               1.11274

This way of working is not limited to using standard R classes and functions.

R also allows us to define our own classes. We will explore this by defining functions and a new class to deal with two-by-two tables.

We need to create two functions:

One function will handle the calculations.
A second function function will produce formatted output when required.

Create a new function using the function() function:

rr22 <- function(exposure, outcome) {}

And start the function editor:

fix(rr22)

Now edit this function to make it do something useful:

function(exposure, outcome) {
  tab <- table(exposure, outcome)
  a <- tab[1,1]
  b <- tab[1,2]
  c <- tab[2,1]
  d <- tab[2,2]
  rr <- (a / (a + b)) / (c / (c + d))
  se.log.rr <- sqrt((b / a) / (a + b) + (d / c) / (c + d))
  lci <- exp(log(rr) - 1.96 * se.log.rr)
  uci <- exp(log(rr) + 1.96 * se.log.rr)
  rr22.output <- list(estimate = rr, ci = c(lci, uci))
  class(rr22.output) <- "rr22"
  return(rr22.output)
}

Once you have made the changes shown above, save the file and quit the editor.

The rr22() function is similar to the tab2by2() function that you created in the second exercise of this tutorial except that the function now returns a list of values instead of formatted output:

fem <- read.table("fem.dat", header = TRUE)
attach(fem)
rr22.test <- rr22(SEX, LIFE)
names(rr22.test)
rr22.test$estimate
rr22.test$conf.int
rr22.test$conf.int[1]
rr22.test$conf.int[2]

## The following objects are masked from fem (pos = 3):
## 
##     AGE, ANX, DEP, ID, IQ, LIFE, SEX, SLP, WT

## The following objects are masked from fem (pos = 8):
## 
##     AGE, ANX, DEP, ID, IQ, LIFE, SEX, SLP, WT

## [1] "estimate" "ci"

## [1] 2.054167

## NULL

## NULL

## NULL

The function returns a list of class rr22:

class(rr22.test)

## [1] "rr22"

The displayed output from the rr22() function is, however, not pretty:

print(rr22.test)
rr22(SEX, LIFE)

## $estimate
## [1] 2.054167
## 
## $ci
## [1] 0.966417 4.366232
## 
## attr(,"class")
## [1] "rr22"

## $estimate
## [1] 2.054167
## 
## $ci
## [1] 0.966417 4.366232
## 
## attr(,"class")
## [1] "rr22"

This can be fixed by creating a new function:

print.rr22 <- function(x) {}

And start the function editor:

fix(print.rr22)

Now edit this function to make it do something useful:

function(x) {
  cat("RR     : ", x$estimate, "\n",
      "95% CI : ", x$ci[1], "; ", x$ci[2], "\n", sep = "")
}

Once you have made the changes shown above, check your work, save the file, and quit the editor.

The function name print.rr22() indicates that this function contains the print method for objects of class rr22. All objects of class rr22 will use the function print.rr22() instead of the standard R print() function to produce formatted output:

rr22(SEX, LIFE)
rr22.test <- rr22(SEX, LIFE)
rr22.test
print(rr22.test)

## RR     : 2.054167
## 95% CI : 0.966417; 4.366232

## RR     : 2.054167
## 95% CI : 0.966417; 4.366232

## RR     : 2.054167
## 95% CI : 0.966417; 4.366232

Note that we can still extract returned values from an rr22 class object:

rr22.test$estimate

The print.rr22() function only controls the way an entire rr22 object is displayed.

You might like to use the save() function to save the v2m.test(), rr22(), and print.rr22() functions before quitting R. We can now quit R:

q()

For this exercise there is no need to save the workspace image so click the No or Don’t Save button (GUI) or enter n when prompted to save the workspace image (terminal).

6.1 Summary

R objects can be assigned a class or type.
Objects of a specific class or type may share functions that extract and manipulate data common to members of that class. This allows you to write functions that handle data that is common to all members of that class (e.g. to produce formatted output for hypothesis testing functions).
R provides a set of ready-made classes (e.g. htest) which can be used by standard R functions such as the print() and summary() functions.
R allows you to create new classes and class-specific functions that can extract and manipulate data common to the new classes.
Classes allows you to create versatile functions that return values when you need them to return values but can also produce formatted output when you need them to produce formatted output.
Classes allow you to write functions that can be chained together so that the output of one function is the input of another function.