
R Basics

Overview

In this lesson we learn how to install R and understand its basic philosophy.

Objectives

After completing this lesson, students should be able to:

  1. Install and run R and RStudio.
  2. Understand and use the various RStudio panes.
  3. Set the working directory and RStudio preferences.

What is R?

  • R is what’s known as a statistical programming language. As the name suggests, it can do a lot of things: programming, statistics, data storage and manipulation, graphics, etc. It can even scrape websites or Twitter, interact with databases, run other languages within it or be run by other languages, and produce pretty documents in PDF or HTML formats – such as these very slides, which were written (in part) in R.

  • Fundamentally, R is a command-based language – that is, it takes in lines of text as input, and spits out formatted text as output. While it can also do lots of other things (e.g., reading in data or outputting graphs), the text interface is central. Thus data analysis in R takes place in a “console”: you write commands to R, and it prints replies.

  • Generally, you use a graphical user interface (GUI) of some sort to handle this interaction with R and to organize your workspace. The GUI we will be using in this course is RStudio, which is an additional program separate from R itself, but which runs R in the background and helps you interact with R, with data sets, and with R’s outputs.

Installing R and RStudio

Before going into further detail about how R and RStudio work, it is best to install both. It will be essential throughout these lectures to have a copy of RStudio running at all times, to try out for yourself the various bits of code we will be exploring.

First we install R, and then RStudio.

  1. To install R, go here and follow the instructions:

https://rweb.crmda.ku.edu/cran/

For Windows you want the base install; for Mac the first pkg file should be fine.

  2. To install RStudio, go here and follow the instructions:

http://www.rstudio.com/products/rstudio/download/

In both cases, installation should work just like installing any other application. Please stick to the default installation locations and settings to avoid mishaps.

Using RStudio

Once R and RStudio are installed, you shouldn’t need to worry about running R directly – RStudio will take care of that. So we just need to launch RStudio, and familiarize ourselves with it.

When you launch RStudio, you will see a large window with four “panes”. Each of these panes has tabs at the top, each with a different view.

Console

The most important pane is the “Console” – this is where the main interaction with R occurs. If you type 2+3 at the > prompt and hit return, R should immediately return 5.

>2+3
[1] 5

You could do all of your R work just in the Console window, but this is a bad idea because it is difficult to retain a record of what you’ve done.

Source pane

Instead, it is better to write and save all your commands in a text file, and execute lines of commands as needed. The Source pane is just a very basic text editor. To start a new file, go to File -> New File -> R script. This R script is just a plain text file where you can write whatever you want.

To run the line that the cursor is currently on (i.e., send it to the Console), you can hit the “Run” button in the upper right of the Source pane. You can also go to Code -> Run Line(s). As a shortcut, you can also just hit command-return (Mac) or control-enter (Windows) to execute that line. To execute multiple lines, just highlight those lines first before doing the above.

As you can see in the Code -> Run Region menu, you can also run everything up until the current line, or after it, or other variants.

To add a comment line that doesn’t get run in R (such as notes to yourself), just preface each line you don’t want to run with a #.

# What is 2+3?
2+3

The Working Directory

To save an R script or to open a data file, you need to tell R where to save or open it. You can do this with the GUI in RStudio each time, or you can run an R command to set this “Working Directory” location. Being able to do this from within your R script is important because you might want to save or open files in various locations.

To set the working directory, you issue something like the following command:

setwd("F:/2. NEU/Courses/PPUA 5301 - Computational Statistics")

If you are unfamiliar with how Windows or Mac machines encode directory information, you can use RStudio to set the working directory via Session -> Set Working Directory -> Choose Directory. This will let you pick your folder in the usual graphical way, but will also print the R command in the Console, which you can then copy to your R script and paste in at the top so that every time that script is run, it first sets R to the correct working directory.
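
As a minimal sketch of the same idea done entirely in code: getwd() reports the current working directory and setwd() changes it. Here tempdir() is only a stand-in for your own course folder, and "survey.csv" is a hypothetical file name used for illustration.

```r
# Remember where we started so we can return there afterwards
old_dir <- getwd()

# tempdir() is a stand-in here; in practice you would use your own folder
setwd(tempdir())
getwd()   # prints the new working directory

# file.path() joins folder and file names with the correct separator.
# "data" and "survey.csv" are hypothetical names.
data_path <- file.path(getwd(), "data", "survey.csv")

# Return to the original working directory
setwd(old_dir)
```

Building paths with file.path() rather than typing slashes by hand helps the same script run on both Windows and Mac.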

(Note that RStudio does almost everything via sending commands to the Console. The RStudio GUI is basically just a set of shortcuts for sending text commands to the R Console.)

RStudio

Other important RStudio panes include “History,” “Plots,” “Files,” “Help,” “Environment,” and “Packages.”

History shows a list of every command you’ve issued to R via the Console. To reissue a command, you can double-click it and hit return. You can also paste it into your Source file using the button in the upper left of the History pane.

Plots shows the results of graphs and other plots in R. We will return to this in Module 2.

Files shows all the files in the current Working Directory.

Help shows the outputs from issuing help commands. For instance, if you want the documentation on a function like mean you would type help(mean) or ?mean and the documentation appears in the Help window. We will return to this in Module 2.

Environment shows all the data currently loaded into R: both their names and (where possible) their values. We’ll return to variables in Module 1.3.
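
As a small sketch of what the Environment pane tracks (the object names x and course are made up for illustration): creating an object adds it to the pane, ls() lists the same names in the Console, and rm() removes an object again.

```r
x <- 2 + 3             # creates an object named x; it appears in the Environment pane
course <- "R Basics"   # a second object, holding text

ls()                   # lists the names of the objects currently defined

rm(x)                  # removes x; it disappears from the Environment pane
ls()
```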

Packages shows all the packages you’ve installed. Packages are collections of specialized R functions that don’t come bundled with R but are installed by the user. For instance, if you want to import Stata data files in .dta format, you would install the foreign package and load it, which loads the functions needed to read these files. We will return to Packages in Module 2.
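
A rough sketch of that install-then-load workflow, assuming the foreign package is available from your CRAN mirror (many R installations already include it). The small data set and the temporary file are made up for illustration; in practice you would pass read.dta() the name of your own .dta file.

```r
# install.packages("foreign")   # one-time download; only needed if the package is missing

# Check whether the package is installed before trying to use it
has_foreign <- requireNamespace("foreign", quietly = TRUE)

if (has_foreign) {
  library(foreign)   # load the package for this session

  # Write a tiny made-up data set to a temporary .dta file, then read it back
  scores <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))
  dta_file <- tempfile(fileext = ".dta")
  write.dta(scores, dta_file)

  scores_back <- read.dta(dta_file)
  print(scores_back)
}
```

Installing happens once per computer, while library() has to be run in every new R session that uses the package.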

Preferences

RStudio has various preferences that allow you to customize the layout and appearance of these panes.

In particular, the “Appearance” tab in the Preferences allows you to set how code is colored in the Source window; these colors help the eye pick out names, functions, commands, etc., but different people have different color preferences.

The “Panes” tab lets you arrange the panes in RStudio. I personally prefer the Source to be in the upper right and the Console in the upper left. The window icons in the upper-right-most corner of each pane allow you to shrink that pane down to a mere title-bar; thus most of the time I have the bottom left and bottom right panes shrunk, and the RStudio screen is thus mostly just the Source on the right and the Console on the left. But that’s just personal preference. Additionally, you can

Help with R

Help within R:

help(mean)
# or
?mean

Help outside R:

  • Google
  • R Tutorials online
  • Stackoverflow.com (among others)
  • Books (often helpful for in-depth explanations)

Tips:

  • Don’t worry: It can often be an exercise in frustration to try to figure out what the function name for something should be (e.g., calculating the standard deviation), how its options work, and why it’s not working for you – despite the fact that you are surely doing everything correctly!

  • Sometimes you will encounter an unhelpful answer from a testy statistician; the best strategy is to not waste too much time on it but to move quickly on to a better answer. The internet is filled with plenty of better answers.
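
As a sketch of one way to hunt for a function name from inside R, using the standard deviation example from the tips above: apropos() searches the names of available objects, and help.search() (or its shorthand ??) searches the text of the documentation. The numbers passed to sd() are made up.

```r
# apropos() lists objects whose names match a pattern;
# "^sd$" matches a name that is exactly "sd"
matches <- apropos("^sd$")
matches

# help.search() searches the documentation text; it is only run here in an
# interactive session, since it opens a results view.
# The shorthand is: ??"standard deviation"
if (interactive()) help.search("standard deviation")

# Once you know the name, read its help page with ?sd, then try it out
values <- c(2, 4, 4, 4, 5, 5, 7, 9)
sd(values)
```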

RMarkdown

Generating PDFs

When you create a new RMarkdown document, RStudio asks you what format you want your final output to be in: PDF, Word, or HTML. Unlike HTML or Word formats, generating PDFs with RMarkdown requires one additional step. You have a few different options for getting PDFs to work in RStudio:

  1. To generate PDFs directly with RStudio, you first need to install another program called “LaTeX.” On a Mac, install MacTeX from https://tug.org/mactex/; on a Windows machine, install MiKTeX from http://miktex.org/download. Once installed, this allows you to knit your .Rmd directly into PDF format. This method is strongly preferred.

  2. If you have trouble installing LaTeX or it doesn’t work, you can knit the .Rmd as HTML, and then open the HTML in your browser and print the page to a PDF. All Macs can do this, and in Windows, the Chrome browser has this ability built in. Note that this should be a stop-gap in case you can’t get method 1 to work immediately, but (1) should be your main approach once you have it up and running.

  3. A final fallback option, similar to (2), is to “knit” the .Rmd as a Word document, and then open it in Word and Save As a PDF. This too should only be a stop-gap until you get LaTeX working and can knit directly to PDF.
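
The choice between options (1) and (2) can also be sketched in code. This is only an illustration under some assumptions: it treats LaTeX as installed if a pdflatex program is found on the system PATH (which may miss some installations), it uses the render() function from the rmarkdown package in place of the Knit button, and "homework1.Rmd" is a hypothetical file name.

```r
# Sys.which() returns an empty string when a program is not found on the PATH
has_latex <- nzchar(unname(Sys.which("pdflatex")))

# Prefer PDF when LaTeX appears to be installed; otherwise fall back to HTML
output_format <- if (has_latex) "pdf_document" else "html_document"
output_format

# "homework1.Rmd" is a hypothetical file; nothing is rendered unless it exists
# and the rmarkdown package is installed
if (requireNamespace("rmarkdown", quietly = TRUE) && file.exists("homework1.Rmd")) {
  rmarkdown::render("homework1.Rmd", output_format = output_format)
}
```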

Cheatsheet: https://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf

Writing equations
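
As a brief sketch of how the commands in the table below are used: in the text of an RMarkdown document, a command placed between single dollar signs is rendered inline, and one placed between double dollar signs is rendered as a displayed equation on its own line. The particular sentence and formula here are only an illustration.

```latex
The sample mean estimates $\mu$ and the sample standard deviation estimates $\sigma$.

$$
z = \frac{x - \mu}{\sigma}
$$
```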

Name        Symbol              RMarkdown
alpha       \(\alpha\)          \alpha
beta        \(\beta\)           \beta
gamma       \(\gamma\)          \gamma
delta       \(\delta\)          \delta
epsilon     \(\epsilon\)        \epsilon
(epsilon)   \(\varepsilon\)     \varepsilon
eta         \(\eta\)            \eta
theta       \(\theta\)          \theta
iota        \(\iota\)           \iota
kappa       \(\kappa\)          \kappa
lambda      \(\lambda\)         \lambda
mu          \(\mu\)             \mu
nu          \(\nu\)             \nu
xi          \(\xi\)             \xi
omicron     \(\omicron\)        \omicron
pi          \(\pi\)             \pi
rho         \(\rho\)            \rho
sigma       \(\sigma\)          \sigma
tau         \(\tau\)            \tau
upsilon     \(\upsilon\)        \upsilon
phi         \(\phi\)            \phi
chi         \(\chi\)            \chi
psi         \(\psi\)            \psi