Exercise 6 Making your own objects behave like R objects
In the previous exercises we concentrated on writing functions that take some input data, analyse it, and display the results of the analysis. The standard R
functions we have used all do this. The fisher.test()
function, for example, takes a table
object (or the names of two variables) as input and calculates and displays the p- value for Fisher’s exact test and the odds ratio and associated confidence interval for two-by-two tables:
<- read.table("fem.dat", header = TRUE)
fem attach(fem)
## The following objects are masked from fem (pos = 7):
##
## AGE, ANX, DEP, ID, IQ, LIFE, SEX, SLP, WT
fisher.test(SEX, LIFE)
##
## Fisher's Exact Test for Count Data
##
## data: SEX and LIFE
## p-value = 0.03175
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 1.080298 14.214482
## sample estimates:
## odds ratio
## 3.620646
The results of the fisher.test()
function may also be saved for later use:
<- fisher.test(SEX, LIFE)
ft ft
##
## Fisher's Exact Test for Count Data
##
## data: SEX and LIFE
## p-value = 0.03175
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 1.080298 14.214482
## sample estimates:
## odds ratio
## 3.620646
The fisher.test()
function returns an object of the class htest
:
class(ft)
## [1] "htest"
which is a list containing the output of the fisher.test()
function. Each item of output is stored as a different named item in the list:
names(ft)
str(ft)
## [1] "p.value" "conf.int" "estimate" "null.value" "alternative"
## [6] "method" "data.name"
## List of 7
## $ p.value : num 0.0318
## $ conf.int : num [1:2] 1.08 14.21
## ..- attr(*, "conf.level")= num 0.95
## $ estimate : Named num 3.62
## ..- attr(*, "names")= chr "odds ratio"
## $ null.value : Named num 1
## ..- attr(*, "names")= chr "odds ratio"
## $ alternative: chr "two.sided"
## $ method : chr "Fisher's Exact Test for Count Data"
## $ data.name : chr "SEX and LIFE"
## - attr(*, "class")= chr "htest"
Each of these items can be referred to by name:
$estimate
ft$conf.int ft
## odds ratio
## 3.620646
## [1] 1.080298 14.214482
## attr(,"conf.level")
## [1] 0.95
When you display the output of the fisher.test()
function either by calling the function directly:
fisher.test(SEX, LIFE)
##
## Fisher's Exact Test for Count Data
##
## data: SEX and LIFE
## p-value = 0.03175
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 1.080298 14.214482
## sample estimates:
## odds ratio
## 3.620646
or by typing the name of an object created using the fisher.test()
function:
ft
##
## Fisher's Exact Test for Count Data
##
## data: SEX and LIFE
## p-value = 0.03175
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 1.080298 14.214482
## sample estimates:
## odds ratio
## 3.620646
The print()
function takes over and formatted output is produced. The print()
function knows about htest
class objects and produces output of the correct format for that class of object. This means that any function that produces an htest
object (or any other standard R
object) does not need to include R
commands to produce formatted output.
All hypothesis testing functions supplied with R
produce objects of the htest class and use the print()
function to produce formatted output. For example:
<- t.test(WT ~ LIFE)
tt class(tt)
tt
## [1] "htest"
##
## Welch Two Sample t-test
##
## data: WT by LIFE
## t = 0.60608, df = 98.866, p-value = 0.5459
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## -0.3326225 0.6251763
## sample estimates:
## mean in group 1 mean in group 2
## 0.7867213 0.6404444
You can use this feature of R
in your own functions. We will explore this by writing a function to test the null hypothesis that the variance to mean ratio of a vector of numbers is equal to one. Such a test might be used to investigate the spatial distribution (e.g. over natural sampling units such as households) of cases of a disease.
Create a new function using the function()
function:
<- function(data) {} v2m.test
And start the function editor:
fix(v2m.test)
Now edit this function to make it do something useful:
function(data) {
<- length(data)
nsu <- sum(data)
obs <- obs / nsu
m <- var(data)
v <- v / m
vmr <- sum((data - m)^2) / m
chi2 <- nsu - 1
df <- 1 - pchisq(chi2, df)
p names(chi2) <- "Chi-square"
names(df) <- "df"
names(vmr) <- "Variance : mean ratio"
<- list(method = "Variance to mean test",
v2m data.name = deparse(substitute(data)),
statistic = chi2,
parameter = df,
p.value = p,
estimate = vmr)
class(v2m) <- "htest"
return(v2m)
}
Once you have made the changes shown above, check your work, save the file, and quit the editor.
Before proceeding we should examine the v2m.test()
function to make sure we understand what is happening:
The first eight lines after the opening curly bracket (
{
) contain the required calculations.The next three lines use the
names()
function to give our variables names that will make sense in formatted output.The next line creates a list of items that the function returns using some of the names used by
htest
class objects.The next line tells
R
that the list object calledv2m
is of the classhtest
.The next line causes the function to return the
v2m
object (i.e. a list of classhtest
containing the named itemsmethod
,data.name
,statistic
,parameter
,p.value
, andestimate
).The final line ends the function definition.
Note that objects of class htest may contain items with the following names:
Item | Usage |
---|---|
method | Text description of the test used to title output |
data.name | Name(s) of data or variables used for the test |
null.value | The null value |
statistic | Value of test statistic |
parameter | A test parameter such as the degrees of freedom of the test statistic |
p.value | The p-value of the test |
estimate | An estimate (e.g. the mean) |
conf.int | Confidence interval of estimate |
alternative | Text describing the alternative hypothesis |
note | Text note |
We are now ready to test the v2m.test()
function. This table:
Number of cases : 0 1 2 3 4 6
Number of households : 24 29 26 14 5 2
shows the number of cases of chronic (stunting) undernutrition found in a random sample of 100 households.
We can reproduce the data behind this table using a combination of the c()
and rep()
functions:
<- c(rep(0,24), rep(1,29), rep(2,26), rep(3,14), rep(4,5),
stunt rep(5,0), rep(6,2))
table(stunt)
## stunt
## 0 1 2 3 4 6
## 24 29 26 14 5 2
And use it to test our new v2m.test()
function:
v2m.test(stunt)
Which should produce the following output:
v2m.test(stunt)
##
## Variance to mean test
##
## data: stunt
## Chi-square = 110.16, df = 99, p-value = 0.2083
## sample estimates:
## Variance : mean ratio
## 1.11274
If your vm2.test()
function does not produce this output then use the fix()
function:
fix(v2m.test)
to check and edit the vm2.test()
function and try again.
The important thing to note from this exercise is that R
allows us to specify a class for the output of our functions. This means that we can use standard R
classes and functions to (e.g.) produce formatted output without us having to write commands to format the output ourselves.
More importantly, it also means that we can write functions that return values when we need them to return values but can also produce formatted output when we need them to produce formatted output.
Our v2m.test()
function can produce values for later use:
<- v2m.test(stunt)
vm $p.value vm
## [1] 0.2083442
or produce formatted output:
v2m.test(stunt)
##
## Variance to mean test
##
## data: stunt
## Chi-square = 110.16, df = 99, p-value = 0.2083
## sample estimates:
## Variance : mean ratio
## 1.11274
This way of working is not limited to using standard R
classes and functions.
R
also allows us to define our own classes. We will explore this by defining functions and a new class to deal
with two-by-two tables.
We need to create two functions:
One function will handle the calculations.
A second function function will produce formatted output when required.
Create a new function using the function()
function:
<- function(exposure, outcome) {} rr22
And start the function editor:
fix(rr22)
Now edit this function to make it do something useful:
function(exposure, outcome) {
<- table(exposure, outcome)
tab <- tab[1,1]
a <- tab[1,2]
b <- tab[2,1]
c <- tab[2,2]
d <- (a / (a + b)) / (c / (c + d))
rr <- sqrt((b / a) / (a + b) + (d / c) / (c + d))
se.log.rr <- exp(log(rr) - 1.96 * se.log.rr)
lci <- exp(log(rr) + 1.96 * se.log.rr)
uci <- list(estimate = rr, ci = c(lci, uci))
rr22.output class(rr22.output) <- "rr22"
return(rr22.output)
}
Once you have made the changes shown above, save the file and quit the editor.
The rr22()
function is similar to the tab2by2()
function that you created in the second exercise of this
tutorial except that the function now returns a list of values instead of formatted output:
<- read.table("fem.dat", header = TRUE)
fem attach(fem)
<- rr22(SEX, LIFE)
rr22.test names(rr22.test)
$estimate
rr22.test$conf.int
rr22.test$conf.int[1]
rr22.test$conf.int[2] rr22.test
## The following objects are masked from fem (pos = 3):
##
## AGE, ANX, DEP, ID, IQ, LIFE, SEX, SLP, WT
## The following objects are masked from fem (pos = 8):
##
## AGE, ANX, DEP, ID, IQ, LIFE, SEX, SLP, WT
## [1] "estimate" "ci"
## [1] 2.054167
## NULL
## NULL
## NULL
The function returns a list of class rr22
:
class(rr22.test)
## [1] "rr22"
The displayed output from the rr22()
function is, however, not pretty:
print(rr22.test)
rr22(SEX, LIFE)
## $estimate
## [1] 2.054167
##
## $ci
## [1] 0.966417 4.366232
##
## attr(,"class")
## [1] "rr22"
## $estimate
## [1] 2.054167
##
## $ci
## [1] 0.966417 4.366232
##
## attr(,"class")
## [1] "rr22"
This can be fixed by creating a new function:
<- function(x) {} print.rr22
And start the function editor:
fix(print.rr22)
Now edit this function to make it do something useful:
function(x) {
cat("RR : ", x$estimate, "\n",
"95% CI : ", x$ci[1], "; ", x$ci[2], "\n", sep = "")
}
Once you have made the changes shown above, check your work, save the file, and quit the editor.
The function name print.rr22()
indicates that this function contains the print method for objects of class rr22
. All objects of class rr22
will use the function print.rr22()
instead of the standard R
print()
function to produce formatted output:
rr22(SEX, LIFE)
<- rr22(SEX, LIFE)
rr22.test
rr22.testprint(rr22.test)
## RR : 2.054167
## 95% CI : 0.966417; 4.366232
## RR : 2.054167
## 95% CI : 0.966417; 4.366232
## RR : 2.054167
## 95% CI : 0.966417; 4.366232
Note that we can still extract returned values from an rr22
class object:
$estimate rr22.test
The print.rr22()
function only controls the way an entire rr22
object is displayed.
You might like to use the save()
function to save the v2m.test()
, rr22()
, and print.rr22()
functions before quitting R
. We can now quit R
:
q()
For this exercise there is no need to save the workspace image so click the No or Don’t Save button (GUI) or enter n
when prompted to save the workspace image (terminal).
6.1 Summary
R
objects can be assigned a class or type.Objects of a specific class or type may share functions that extract and manipulate data common to members of that class. This allows you to write functions that handle data that is common to all members of that class (e.g. to produce formatted output for hypothesis testing functions).
R
provides a set of ready-made classes (e.g.htest
) which can be used by standard R functions such as theprint()
andsummary()
functions.R
allows you to create new classes and class-specific functions that can extract and manipulate data common to the new classes.Classes allows you to create versatile functions that return values when you need them to return values but can also produce formatted output when you need them to produce formatted output.
Classes allow you to write functions that can be chained together so that the output of one function is the input of another function.