R Vocabulary - Part 2

This is the second part of the series of articles on R vocabulary. In this series, we explore most of the functions mentioned in Chapter 2 of the book Advanced R. The first part of the series can be read here.

The keyword function is used to define what is technically a closure in R. A closure has three components: its formals (arguments), its body and its environment. It returns the value of the last expression evaluated in its body. A function can also return a value explicitly by calling return(), which need not be at the end of the function.

f <- function(x, y) x + y + 1
formals(f)
## $x
## 
## 
## $y
f(3.2, 1.7)
## [1] 5.9
g <- function(x, y) {
  if (x > y) {
    return("greater")
  } else {
    return("less than or equal to")
  }
}
g(3.2, 1.7)
## [1] "greater"
g(1, 5)
## [1] "less than or equal to"
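The other two components of a closure can be inspected in the same way as the formals. The sketch below uses two hypothetical functions, add_one and sign_label, which are not part of the original examples, to show body, environment and an early return().

```r
# Hypothetical function used only for illustration
add_one <- function(x, y) x + y + 1

# The three components of a closure
names(formals(add_one))   # the argument names
## [1] "x" "y"
body(add_one)             # the expression that is evaluated on each call
## x + y + 1
environment(add_one)      # the environment in which add_one was defined

# return() can leave the function before the last expression is reached
sign_label <- function(x) {
  if (x < 0) return("negative")
  "non-negative"
}
sign_label(-2)
## [1] "negative"
sign_label(3)
## [1] "non-negative"
```

The printed environment depends on where the function is defined, so no output is shown for that call.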

The function missing can be used to test whether a value was specified as an argument to a function. The function on.exit registers an expression to be executed when the function exits. This is useful for performing clean-up actions or restoring global options.

incrementx <- function(x, y) {
  
  on.exit(print("I am exiting"))
  
  if (missing(y)) {
    y <- 1
  }
  x + y
}
incrementx(2)
## [1] "I am exiting"
## [1] 3
incrementx(2, 3)
## [1] "I am exiting"
## [1] 5
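As a sketch of the "restore global options" use case, the hypothetical helper print_rounded below (not from the original article) changes the digits option temporarily and relies on on.exit to put the old value back.

```r
# Hypothetical helper: print a number with a temporary digits setting
print_rounded <- function(x, digits = 3) {
  old <- options(digits = digits)  # options() returns the previous values
  on.exit(options(old))            # restore them however the function exits
  print(x)
  invisible(x)
}

before <- getOption("digits")
print_rounded(pi)
## [1] 3.14
getOption("digits") == before      # the global option is unchanged afterwards
## [1] TRUE
```

Because the restore step is registered with on.exit, it also runs if print fails with an error part-way through.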

The function invisible is used to return a value which can be assigned to another variable, but which is not printed when the function is called without an assignment.

f <- function(n) {
  invisible(rnorm(n) * rnorm(n))
}
f(100)
x <- f(100)
str(x)
##  num [1:100] 0.125 0.086 0.474 1.036 -0.931 ...

The logical operators are ! (not), & (and), &&, | (or), || and xor. The & and | operators work like arithmetic operators and do an element-wise comparison on vectors. The && and || operators are meant for single values and are most appropriately used in if clauses. In older versions of R they silently examined only the first element of each vector; since R 4.3.0 they raise an error when an operand has more than one element. all and any check whether all of the values or any of the values in a logical vector are true, respectively.

x <- sample(c(TRUE, FALSE), 5, replace = TRUE)
y <- sample(c(TRUE, FALSE), 5, replace = TRUE)
x
## [1]  TRUE  TRUE  TRUE  TRUE FALSE
y
## [1] FALSE FALSE FALSE FALSE FALSE
x & y
## [1] FALSE FALSE FALSE FALSE FALSE
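x[1] && y[1] # && expects single values, so compare the first elements explicitly
## [1] FALSE
x | y
## [1]  TRUE  TRUE  TRUE  TRUE FALSE
x[1] || y[1] # || likewise expects single values
## [1] TRUE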
!x
## [1] FALSE FALSE FALSE FALSE  TRUE
xor(x, y)
## [1]  TRUE  TRUE  TRUE  TRUE FALSE
all(x)
## [1] FALSE
any(x)
## [1] TRUE
all(c(TRUE, NA))
## [1] NA
all(c(TRUE, NA), na.rm = TRUE)
## [1] TRUE
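One reason && suits if clauses is that it short-circuits: the right-hand side is evaluated only when the left-hand side does not already decide the result. A minimal sketch, using a hypothetical function first_is_positive:

```r
# Hypothetical check: is the first element of a vector positive?
first_is_positive <- function(v) {
  # The right-hand side is evaluated only when length(v) > 0 is TRUE,
  # so v[1] is never inspected for an empty vector
  length(v) > 0 && v[1] > 0
}
first_is_positive(c(2, -1))
## [1] TRUE
first_is_positive(c(-3, 5))
## [1] FALSE
first_is_positive(numeric(0))
## [1] FALSE
```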

intersect, union, setdiff, setequal and is.element together form the set operation functions. As they treat their arguments as sets, union, intersect and setdiff discard any duplicate values in their results.

x <- c(1, seq(1, 5, 1))
y <- seq(3, 10, 2)
x
## [1] 1 1 2 3 4 5
y
## [1] 3 5 7 9
union(x, y) # Note that the duplicate 1 is discarded
## [1] 1 2 3 4 5 7 9
intersect(x, y)
## [1] 3 5
setdiff(x, y)
## [1] 1 2 4
setequal(x, y)
## [1] FALSE
setequal(c(1, 2), c(2, 1))
## [1] TRUE
is.element(1, x)
## [1] TRUE
is.element(1, y)
## [1] FALSE

which takes a condition and returns the indices where the condition is true. With arr.ind = TRUE it returns array indices for a matrix or array, with one column per dimension.

x <- 1:10
which(x > 5)
## [1]  6  7  8  9 10
x <- array(1:20, dim = c(2, 2, 5))
which(x > 18, arr.ind = TRUE)
##      dim1 dim2 dim3
## [1,]    1    2    5
## [2,]    2    2    5

Next we have functions which primarily operate on vectors and matrices, and are sometimes applicable to data frames. length can also be used on the left-hand side of the assignment operator to either truncate or lengthen a vector.

x <- rnorm(10)
length(x)
## [1] 10
dim(x)
## NULL
length(cars) # returns the number of columns of the data frame cars
## [1] 2
nrow(cars)
## [1] 50
ncol(cars)
## [1] 2
dim(cars)
## [1] 50  2
x <- 1:5
length(x) <- 3
x
## [1] 1 2 3
length(x) <- 5
x
## [1]  1  2  3 NA NA

cbind and rbind are used to combine R objects by columns or rows.

d1 <- data.frame(x = rnorm(5))
d2 <- data.frame(y = rnorm(5))
d3 <- data.frame(x = rnorm(3))
cbind(d1, d2)
##            x          y
## 1 -1.2743918 -1.2858313
## 2 -0.5634582  0.1697238
## 3  1.3310640  0.1051012
## 4  1.5500924 -0.5670996
## 5  0.5549560  0.6578452
rbind(d1, d3)
##            x
## 1 -1.2743918
## 2 -0.5634582
## 3  1.3310640
## 4  1.5500924
## 5  0.5549560
## 6 -0.1633991
## 7  0.5065965
## 8  1.9580916
rbind(d2, d3)
## Error in match.names(clabs, names(xi)): names do not match previous names
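The error from rbind(d2, d3) arises because rbind matches the columns of data frames by name, and d2 has a column y while d3 has a column x. One possible workaround, sketched below with small fixed data frames (e2 and e3 are hypothetical names chosen so that the output is reproducible), is to align the names first.

```r
# Hypothetical data frames with mismatched column names
e2 <- data.frame(y = c(1, 2))
e3 <- data.frame(x = c(3, 4))

# Give e2 the same column names as e3 before binding the rows
names(e2) <- names(e3)
combined <- rbind(e2, e3)
combined
##   x
## 1 1
## 2 2
## 3 3
## 4 4
```

Renaming is only appropriate when the two columns really hold the same variable.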

The function names is used to retrieve the names of an object. The functions rownames and colnames are used to retrieve the row or column names of an object like a data frame or a matrix. They can also be used to assign names to an object.

cars_mdl <- lm(speed ~ dist, data = cars)
names(cars_mdl)
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"
colnames(cars)
## [1] "speed" "dist"
rownames(cars)
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14"
## [15] "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28"
## [29] "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42"
## [43] "43" "44" "45" "46" "47" "48" "49" "50"
d <- data.frame(x = rnorm(5))
rownames(d) <- c("A", "B", "C", "D", "E")
d
##            x
## A -1.3864713
## B -1.6658348
## C  0.6022367
## D  0.3243157
## E -0.8350832
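Assignment works the same way for colnames and names. The short sketch below uses a hypothetical matrix and vector, which are not part of the original examples, and then uses the new names for indexing.

```r
# Assign column and row names to a matrix
m <- matrix(1:6, nrow = 2)
colnames(m) <- c("a", "b", "c")
rownames(m) <- c("r1", "r2")
m
##    a b c
## r1 1 3 5
## r2 2 4 6
m["r2", "b"]          # names can then be used for indexing
## [1] 4

# Assign names to a plain vector
v <- c(10, 20)
names(v) <- c("first", "second")
v[["second"]]
## [1] 20
```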

t calculates the transpose of a matrix. diag can be used to retrieve the diagonal elements of a matrix, construct a diagonal matrix from a vector or even replace the diagonal elements of a matrix.

m <- matrix(1:4, 2, 2)
m
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
t(m)
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
diag(m)
## [1] 1 4
diag(m) <- c(7, 8)
m
##      [,1] [,2]
## [1,]    7    3
## [2,]    2    8
diag(c(1, 2, 3), 3, 3)
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    2    0
## [3,]    0    0    3

data.matrix is used to convert all variables in a data frame to numbers and return them as a matrix. Factors are replaced with their numeric codes.

d <- data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = TRUE)
str(d)
## 'data.frame':    2 obs. of  2 variables:
##  $ x: int  1 2
##  $ y: Factor w/ 2 levels "a","b": 1 2
data.matrix(d)
##      x y
## [1,] 1 1
## [2,] 2 2

Next we look at a set of functions whose output is typically a vector. rep and rep_len are used to replicate the elements of a vector. seq, seq_along and seq_len are used to create sequences. rev is used to reverse the elements of a vector.

rep(c(1, 2), each = 2)
## [1] 1 1 2 2
rep_len(3, length.out = 5)
## [1] 3 3 3 3 3
seq(1, 5)
## [1] 1 2 3 4 5
seq(1, 5, by = 2)
## [1] 1 3 5
seq_len(7)
## [1] 1 2 3 4 5 6 7
seq_along(letters)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26
rev(letters)
##  [1] "z" "y" "x" "w" "v" "u" "t" "s" "r" "q" "p" "o" "n" "m" "l" "k" "j"
## [18] "i" "h" "g" "f" "e" "d" "c" "b" "a"

We have used the function sample in quite a few of the examples in this series of articles. It is used to sample a vector for a specified number of elements, with or without replacement.

sample(letters, 3)
## [1] "m" "x" "u"
sample(1:3, 10, replace = TRUE)
##  [1] 1 2 2 2 2 2 3 3 1 2

The is.<> and as.<> functions can be used to test whether a vector belongs to a particular type and coerce the vector to a particular type respectively.

x <- c(TRUE, FALSE, TRUE)
is.numeric(x)
## [1] FALSE
is.logical(x)
## [1] TRUE
as.numeric(x)
## [1] 1 0 1
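Coercion does not always succeed for every element. As a small sketch with a hypothetical character vector, values that cannot be interpreted as numbers become NA, and R issues a warning, which is suppressed here.

```r
x <- c("1", "2", "three")
is.character(x)
## [1] TRUE
# "three" cannot be parsed as a number, so it is coerced to NA
y <- suppressWarnings(as.numeric(x))
y
## [1]  1  2 NA
```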

We now look at a few functions which operate primarily on lists and data frames. We have already looked at list to create lists. The function unlist can be used to simplify a list into a vector. In the example below, note how the first call results in a numeric vector while the second call results in a character vector.

l <- list(x = 1, y = 2)
unlist(l)
## x y 
## 1 2
l <- list(x = 1, y = 2, z = "a")
unlist(l)
##   x   y   z 
## "1" "2" "a"

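data.frame is used to create a new data frame. Note that in versions of R before 4.0.0, character variables were automatically converted to factors by default; from 4.0.0 onwards this happens only when stringsAsFactors = TRUE is supplied, as in the example below. as.data.frame is used to coerce an object to a data frame, if possible.

d <- data.frame(x = c(1, 2), y = c("a", "b"), stringsAsFactors = TRUE)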
str(d)
## 'data.frame':    2 obs. of  2 variables:
##  $ x: num  1 2
##  $ y: Factor w/ 2 levels "a","b": 1 2
l <- list(x = c(1, 2), y = c(3, 4), z = "a")
as.data.frame(l)
##   x y z
## 1 1 3 a
## 2 2 4 a

split is useful to divide a data frame by groups of a particular variable. A function is typically applied to the resulting list to calculate the results by each group. In the example below, we first split the data frame d into three different groups and then calculate the mean y for each group.

d <- data.frame(x = sample(letters[1:3], 10, replace = TRUE),
                y = rnorm(10))
s <- split(d, d$x)
s
## $a
##    x        y
## 9  a 1.173482
## 10 a 1.790276
## 
## $b
##   x          y
## 1 b -0.7675610
## 2 b  0.3367396
## 3 b  0.4592502
## 4 b  0.1788512
## 6 b -0.5316457
## 
## $c
##   x          y
## 5 c 0.26159210
## 7 c 0.65306719
## 8 c 0.05526093
sapply(s, function(df) mean(df$y))
##           a           b           c 
##  1.48187892 -0.06487315  0.32330674

expand.grid is useful to create a data frame using all combinations of the vectors provided as arguments.

x <- c("a", "b")
y <- c("p", "q", "r")
z <- c("m", "n")
expand.grid(x, y, z)
##    Var1 Var2 Var3
## 1     a    p    m
## 2     b    p    m
## 3     a    q    m
## 4     b    q    m
## 5     a    r    m
## 6     b    r    m
## 7     a    p    n
## 8     b    p    n
## 9     a    q    n
## 10    b    q    n
## 11    a    r    n
## 12    b    r    n

We will not be looking at details of the control flow operations in this article. These include if, for, while, next, break, switch and ifelse. These are primarily used to implement loops and execute different code based on different conditions.
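For orientation only, here is a brief sketch of a few of these constructs. The variable total and the function describe are hypothetical and chosen purely for illustration.

```r
# for, next and break: sum the odd numbers from 1 to 9, stopping above 7
total <- 0
for (i in 1:9) {
  if (i %% 2 == 0) next   # skip even numbers
  if (i > 7) break        # leave the loop entirely
  total <- total + i
}
total
## [1] 16

# ifelse is vectorised and returns one result per element
ifelse(c(1, 6, 3) > 2, "big", "small")
## [1] "small" "big"   "big"

# switch selects a branch by name; the unnamed last value is the default
describe <- function(type) switch(type, a = "apple", b = "banana", "unknown")
describe("b")
## [1] "banana"
describe("z")
## [1] "unknown"
```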

The apply functions are explained in great detail in the chapter on ‘Functionals’ in the same book, and we will not look at them here. It is also recommended that you look at the apply functions in the plyr package, which provides a consistent interface between different types of objects (lists, arrays and data frames).