
Programming and scripts

Overview

In this lesson we learn R programming tools such as “if”, “for” and “apply”.

Objectives

After completing this lesson, students should be able to:

  1. Create and use “if” conditionals.
  2. Create and use “for” loops.
  3. Apply “apply” for faster vector operations.

References

This is a source for you to review and practice your scripting/programming knowledge of R: https://www.datacamp.com/community/tutorials/tutorial-on-loops-in-r

Programming

The other important tools for writing functions and manipulating data are control statements and loops. This too will be familiar to those with some programming background.

For those with no programming experience, the key idea is that a program or script is executed by R one line at a time, often with generous use of curly brackets { } and parentheses to let the computer know where chunks start and end.

Learning to program in R is much like learning a new language, and R is only one of many programming languages you could pick.

Key tools

These are the key scripting/programming statements that will allow you to write nuanced and complex functions:

  • If
  • Logical operators
  • Else
  • Loops: For and While
  • Apply

If

We have seen conditional statements implicitly before, but if allows us more control:

if(3 < 4){
  print("yup")
}
[1] "yup"

Again, like all scripts, this is executed by R like we read it – from the top down, one line at a time. If takes a truth condition as its input, and executes the stuff in the brackets {} if the input truth condition is TRUE.

Here’s another example:

if(3 == 4){
  print("yup")
}

Note the lack of output now. This is because 3==4 is a logical test that returns FALSE. When encountering a FALSE input, if() will ignore the code within curly brackets. When encountering a TRUE input, if() will execute the code within curly brackets.

Logical operators

  • x < y, TRUE if x is less than y

  • x <= y, TRUE if x is less than or equal to y

  • x == y, TRUE if x equals y

  • x != y, TRUE if x does not equal y

  • x >= y, TRUE if x is greater than or equal to y

  • x > y, TRUE if x is greater than y

  • x %in% c(a, b, c), TRUE if x is in the vector c(a, b, c)

For example:

if((3 < 4) & (3 != 2)){
  print("yup")
}
[1] "yup"

Note the use of parentheses to make sure the logic is clear; you can sometimes get away with fewer parentheses, but it’s good practice to be explicit.
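The comparison operators listed above can be combined with & (and), | (or) and ! (not). Here is a small sketch; the variables pet and age are made up for illustration only:

```r
# 'pet' and 'age' are hypothetical values chosen for illustration
pet <- "frog"
age <- 3

# %in% tests membership in a vector
is_known <- pet %in% c("frog", "dog")

# | is TRUE if either side is TRUE; ! negates a logical value
either <- (age > 5) | (pet == "frog")
not_dog <- !(pet == "dog")

if(is_known & not_dog){
  print("known pet that is not a dog")
}
```

For single TRUE/FALSE values inside if(), R also offers && and ||, which stop evaluating as soon as the answer is known.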

Else

If can also do alternative actions when the input is false, via else. Here is an illustration in a function using if and else:

isitthree <- function(x){
  if(x == 3){
    return(TRUE)
  }else{
    return(FALSE)
  } 
}
isitthree(3)
[1] TRUE
#Or: 

isitthree(2+2)
[1] FALSE

You can also use a shorter function, ifelse, that combines the two:

isitless <- ifelse(3<4,1,0)
isitless
[1] 1

where the first argument is the test, the second is the output for if it is true, and the third is the output for if the test is false.
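Unlike if, ifelse works on a whole vector at once: it tests every element and returns a vector of the same length. A minimal sketch, assuming a made-up vector called scores:

```r
# 'scores' is a made-up vector for illustration
scores <- c(2, 5, 3, 8)

# ifelse() tests every element and returns a vector of the same length
labels <- ifelse(scores < 4, "low", "high")
print(labels)
```

This is often a convenient way to create a new variable from an existing column without writing a loop.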

Loops: For

R is notoriously slow for scripts that repeatedly loop through data, but often speed is not an issue or you just need to write a loop to generate or manipulate your dataset.

The most common loop function is for:

for(i in 1:3){
  print(2*i)
}
[1] 2
[1] 4
[1] 6

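Here i is the loop variable: it takes each value of the vector after “in” in turn. Unlike a variable inside a function, i is not local to the loop; it still exists after the loop ends, holding the last value it took. After the “in” comes a vector (e.g., 1:3); it can be any vector, including a column of data.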

for(j in c("frog","dog")){
  print(j)
}
[1] "frog"
[1] "dog"
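As a sketch of looping over a column of data, the example below adds up one column of a small data frame. The data frame animals and its columns are hypothetical, invented only for this illustration:

```r
# 'animals' is a hypothetical data frame made up for this example
animals <- data.frame(name = c("frog", "dog", "bird"),
                      legs = c(4, 4, 2))

total_legs <- 0
for(n in animals$legs){          # n takes each value of the legs column in turn
  total_legs <- total_legs + n
}
print(total_legs)
```

In practice sum(animals$legs) does the same job; the loop is shown only to illustrate how the “in” vector can be a column.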

You can set the for loop in a function:

# First attempt (look closely at the vector after "in"):
doubleit <- function(x){
  for(i in length(x)){
    print(2*i)
    }
}

x <- (1:5)
doubleit(x)
## [1] 10
#Or (spot the difference):
doubleit <- function(x){
  for(i in x){
    print(2*i)
    }
}

x <- (1:5)
doubleit(x)
## [1] 2
## [1] 4
## [1] 6
## [1] 8
## [1] 10
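The first version loops over the single number length(x), which is 5, so it prints only once. If you need the positions of a vector rather than its values, seq_along(x) gives 1, 2, ..., length(x). A sketch, where doubleit_by_index is a hypothetical name used only here:

```r
# doubleit_by_index is a hypothetical name used only in this sketch
doubleit_by_index <- function(x){
  out <- numeric(length(x))      # pre-allocate the result vector
  for(i in seq_along(x)){        # i takes the values 1, 2, ..., length(x)
    out[i] <- 2*x[i]
  }
  return(out)
}

doubleit_by_index(c(10, 20, 30))
```

This version returns the doubled values instead of printing them, so the result can be stored and reused.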

Loops: While

While is another loop function, one that’s pretty self-explanatory:

i <- 1
while(i <= 2){
  print(3*i)
  i <- i + 1
}
[1] 3
[1] 6

What happens if we run the following?

#i <- 1
#while(i == i){
#  print(i)
#}

So, be careful with while loops: if you set one up wrong, it can run forever because its condition never becomes FALSE. If possible, it is better to use a for loop, and be clever with your “in” vector if need be. One option is to use a break, which interrupts the loop if some condition is met partway:

for(i in 1:3){
  if(i == 2){
    break
  }
  print(i)
}
[1] 1
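The same break can guard a while loop against running forever. The sketch below doubles a number until it exceeds 50, but also stops after a fixed number of steps; max_steps is a hypothetical safety limit chosen for illustration:

```r
# max_steps is a hypothetical safety limit chosen for illustration
max_steps <- 100
i <- 1
steps <- 0
while(TRUE){
  steps <- steps + 1
  if(i > 50 | steps >= max_steps){
    break                # leave the loop once the target or the limit is reached
  }
  i <- i * 2
}
print(i)
```

Even if the first condition were written incorrectly, the step counter would still end the loop.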

Apply

A more compact way to step through and manipulate a vector or list of data is using R’s various apply functions, which like loops will apply any function iteratively to a set of data. Apply is usually shorter to write than a for loop and can be faster, although the speed gain is often small because apply still calls the function once per element, row or column. It can be trickier to conceptualize how best to use it, and sometimes for loops are easier and more flexible.

Apply is best suited to matrices or data frames, and is generally used with functions that take as their input rows or columns. For example, instead of writing a for loop to step through each row or column, you can use apply to apply a function to all of those rows or columns at once.

Apply to take the mean

Consider the following matrix:

m <- matrix(1:6,nrow=2,ncol=3)
m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

Say we want the mean of each column or row (eg, we might want the mean of each variable in a dataset). We could do this via:

# Mean of each column: 
apply(m,2,mean)
[1] 1.5 3.5 5.5
# Mean of each row: 
apply(m,1,mean)
[1] 3 4

where m is the input data, 2 means we apply the function over columns (1 is rows), and mean is the function we are applying. We can use any function we want here, including those of our own creation:

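# doubleit must return a value here, so redefine it to return 2*x instead of printing
doubleit <- function(x) 2*x
apply(m,2,doubleit)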
     [,1] [,2] [,3]
[1,]    2    6   10
[2,]    4    8   12

Apply can also be applied to each element of the data individually, ie over both rows and columns with the setting 1:2:

apply(m,1:2,doubleit)
     [,1] [,2] [,3]
[1,]    2    6   10
[2,]    4    8   12
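To see what apply is doing, it can help to write the equivalent for loop. The sketch below computes the column means of the same matrix both ways and checks that they agree; the names col_means and col_means_apply are made up for this example:

```r
m <- matrix(1:6, nrow=2, ncol=3)

# Column means with an explicit loop
col_means <- numeric(ncol(m))
for(j in seq_len(ncol(m))){
  col_means[j] <- mean(m[, j])
}

# The same result with apply and an anonymous function
col_means_apply <- apply(m, 2, function(col) mean(col))

print(col_means)
print(all.equal(col_means, col_means_apply))
```

The apply version replaces the pre-allocation and the loop with a single line, which is the main practical benefit.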

There are many other apply functions designed for different input data.

Lapply, sapply, vapply

lapply applies a function to a list or vector x.

n <- (1:5)
lapplydoubleit <- lapply(n, doubleit)
lapplydoubleit 
[[1]]
[1] 2

[[2]]
[1] 4

[[3]]
[1] 6

[[4]]
[1] 8

[[5]]
[1] 10

Note: lapply always returns a list of the same length as x.

sapply is the user-friendly version of lapply because it returns a vector or matrix by default.

sapplydoubleit <- sapply(n, doubleit, simplify=TRUE)
sapplydoubleit
## [1]  2  4  6  8 10
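vapply, named in the heading above, works like sapply but asks you to state the type and length of each result, so R stops with an error if a result does not match. A minimal sketch; double_value is a hypothetical helper defined here so the example runs on its own:

```r
# double_value is a hypothetical helper used only in this sketch
double_value <- function(x) 2*x

n <- 1:5
# numeric(1) declares that every call must return a single number
vapplydouble <- vapply(n, double_value, numeric(1))
vapplydouble
```

Because the expected result is declared up front, vapply is a reasonable choice inside functions where an unexpected return type should fail loudly.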

Other ways to Apply

Of course, R also has shortcuts for such common things as calculating row or column means:

colMeans(m)
[1] 1.5 3.5 5.5
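The related base functions rowMeans, colSums and rowSums follow the same pattern. A short sketch showing that they give the same values as the corresponding apply calls:

```r
m <- matrix(1:6, nrow=2, ncol=3)

rowMeans(m)   # same values as apply(m, 1, mean)
colSums(m)    # same values as apply(m, 2, sum)
rowSums(m)    # same values as apply(m, 1, sum)
```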

And R also has base functions for getting more detailed summaries of variables:

summary(m)
       V1             V2             V3      
 Min.   :1.00   Min.   :3.00   Min.   :5.00  
 1st Qu.:1.25   1st Qu.:3.25   1st Qu.:5.25  
 Median :1.50   Median :3.50   Median :5.50  
 Mean   :1.50   Mean   :3.50   Mean   :5.50  
 3rd Qu.:1.75   3rd Qu.:3.75   3rd Qu.:5.75  
 Max.   :2.00   Max.   :4.00   Max.   :6.00  

Exercise

Write a function that:

  • Calculates and returns the sum of any vector with 30 numbers;
  • Prints the elements of a string vector until it encounters a string that starts with the letter m.