19 R Glossary

19.1 Arguments

Inputs supplied to a function. In a function call, arguments can be matched either by their position in the function declaration (“positional argument”; e.g., for function (x, y, z = 0), the first value supplied to the function is given the name x, and the second one the name y) or by a tag / name (e.g., writing z = 5 in the call links the value 5 explicitly to z, regardless of its position). An argument with a default value, such as z in function (x, y, z = 0), may be left out of the call: if no value is linked to z, z defaults to 0.
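As a minimal sketch of the difference (the function volume() and its argument values are hypothetical, chosen only for illustration), the following shows matching by position, matching by name, and falling back to a default value:

```r
# Hypothetical function: x and y have no default, z defaults to 1
volume <- function(x, y, z = 1) {
  x * y * z
}

by_position <- volume(2, 3)          # x = 2, y = 3, z falls back to its default
by_name     <- volume(y = 3, x = 2)  # named arguments may come in any order
with_z      <- volume(2, 3, z = 10)  # z is linked explicitly by name

print(by_position)  # 6
print(by_name)      # 6
print(with_z)       # 60
```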

19.2 Block

A sequence of statements, grouped between curly braces.

19.3 Console pane

19.4 CRAN

Comprehensive R Archive Network. The official repository from which releases of R and the packages published there can be downloaded. Use install.packages("package_name") to download and install a new package from CRAN.

19.5 Environment Pane

19.6 Evaluate

19.7 Expression

R code consists of expressions. There are multiple types of expressions:

  • constant (that is, a character string or a number). E.g.,

> 123
[1] 123
> "abc"
[1] "abc"
  • operator expression (that is, every expression that contains one of R’s operators):
> 2 + 3
[1] 5
> a <- "abc"
> 2 > 3
[1] FALSE
  • index constructions (that is, extracting elements from a vector or list using numerical or name indices):
> c(1, 2, 3)[-1]
[1] 2 3
> fruit <- list(apple = 5, pear = 2)
> fruit[["apple"]]
[1] 5
> fruit$pear
[1] 2
  • flow control element: loops, conditional expressions,…
> if (x %% 2 == 1) print("odd") else print("even")
> for (i in 1:10) print(i)
  • compound expression (“block”): a series of expressions grouped between curly braces and separated by semicolons or new lines:
> {x <- 1; x <- x + 5}
> {
    x <- 1
    x <- x + 5
  }
  • function definition:
> function(x, y) x + y
> function(x,y) {
    x + y
    }
  • function call:
> print(5)
[1] 5

19.8 File pane

19.9 Function:

A function is a statement, or a sequence of statements, that is not evaluated immediately when it is “declared” (that is, created/written), but only when it is “called” / “invoked” (that is, when you tell R to evaluate it). Functions have their own environment of variables; if you assign a value inside a function to a symbol that is already used in another environment, the variable in that other environment is left untouched.

When you call a function, you can pass values to that function through its list of arguments, which is a comma-separated list between parentheses following the function’s name.

In order to “declare” (create) a function, the keyword function is needed, followed by the list of arguments (between parentheses) and the body of the function (usually between curly braces). Functions are usually assigned to a variable name (functions that are not linked to a name are called anonymous functions).

Functions in R are objects, which means they can be passed to other functions as arguments, placed in lists, etc.
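A short sketch of what this means in practice (the names square and operations are hypothetical): a function is stored in a list, passed to sapply() as an argument, and written inline as an anonymous function.

```r
# A named function
square <- function(x) x^2

# Functions can be stored in a list like any other object
operations <- list(square = square, negate = function(x) -x)

# sapply() takes a function as an argument and applies it to each element
squares <- sapply(c(1, 2, 3), square)

# An anonymous function passed directly as an argument
halves <- sapply(c(2, 4, 6), function(x) x / 2)

print(squares)               # 1 4 9
print(halves)                # 1 2 3
print(operations$negate(5))  # -5
```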

19.10 Global variable:

A variable defined outside the scope of a function.

19.11 Import:

see Load.

19.12 Load:

Bring data or packages from the computer’s (passive) storage into its (active) memory so they can be used in the current R session. Loading a package makes the names of its functions and other objects available in the current session (R adds the package to its search path). Use library("package_name") to import/load a package into R for the current session (in a script, group all these imports at the top).

19.13 Local variable

A variable defined inside a function. Outside that function, the variable’s name will not be bound to the value it was bound to within the function’s scope.

19.14 Package:

A collection of functions and datasets developed by a user to extend R. Packages can be published on CRAN, or distributed on GitHub or other repositories.

If you want to use a package, it must first be installed and then be loaded into your current R session. Installation must be done only once, loading every time you start a new R session.

  • Use install.packages("package_name") to download and install a new package from CRAN.
  • Use library("package_name") to import/load an installed package into R for the current session (in a script, group all imports at the top of a script).
  • Use installed.packages() to get a list of all packages installed for your R version.
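One possible way to combine these steps in a script is sketched below. The package name "stats" is only a stand-in (it ships with R, so nothing is downloaded here); requireNamespace() is used to check whether a package is installed before trying to install it.

```r
# "stats" ships with R; it stands in for any package name you want to use
pkg <- "stats"

# requireNamespace() returns TRUE if the package is installed, without attaching it
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)  # only reached when the package is missing
}

# character.only = TRUE lets library() accept a name stored in a variable
library(pkg, character.only = TRUE)

# search() lists the attached packages as "package:<name>"
is_attached <- paste0("package:", pkg) %in% search()
print(is_attached)  # TRUE
```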

19.15 R:

A computer language developed for statistical analysis and graphics.

19.16 RStudio:

An Integrated Development Environment (IDE) for writing, executing and debugging R code. R comes with its own IDE but most users prefer RStudio for its user-friendliness.

19.17 Pane:

The RStudio window is divided into four quadrants called panes: the Source pane, the Console pane, the Environment Pane and the File pane. The first two are for writing code, the latter two contain a number of tabs with useful resources.

You can minimise and maximise the size of each pane by using the icons in the top right of every pane.

To switch the order of the panes, use RStudio’s “Pane layout” dialog, which you can find in the Options dialog (Tools > Options; RStudio > Preferences on Mac).

19.18 Prompt:

19.19 Run:

Execute R code.

19.20 Scope:

Symbols in R are bound to a specific value only within a specific environment or “scope”. E.g., the symbol of a variable defined within a function will be bound to that variable’s value only within that function (such a variable is called a local variable); outside of the function, it will be considered unbound (that is, not connected to a value):

> f <- function (x) {
    doubled = x * 2
    }
> doubled
Error: object 'doubled' not found

Variables declared outside a function are considered global variables.

If during evaluation of a function a symbol is encountered that is not in the local environment (that is, a symbol that was not in that function’s list of arguments and that was not defined inside the function), R will search for this symbol in the environment in which the function was defined (R uses lexical scoping), then in that environment’s parent, and so on until the global environment is reached.
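The lookup rules can be illustrated with a small sketch (all function names are hypothetical). Note that show_x() finds the global x even when it is called from inside caller(), which has its own local x: the lookup follows the place where the function was defined.

```r
x <- 10  # global variable

show_x <- function() {
  x  # not defined locally, so R looks in the environment where show_x was defined
}

shadow_x <- function() {
  x <- 99  # creates a local variable; the global x is left untouched
  x
}

caller <- function() {
  x <- 1000  # local to caller(); show_x() does not see this value
  show_x()
}

print(show_x())    # 10
print(shadow_x())  # 99
print(caller())    # 10
print(x)           # 10
```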

19.21 Session:

19.22 Source pane:

19.23 Symbol:

The name component in a variable: a variable is a value assigned to a symbol. E.g.,

x <- 3

(x is the symbol/name of the variable, 3 its value).

19.24 Variable:

A symbol (name) linked with a value that this symbol represents. Linking a symbol with a value is called assigning; this is done using an assignment operator (<-, ->, =).
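A brief sketch of the three assignment operators (the variable names are arbitrary):

```r
a <- 1     # leftward assignment, the most common form
2 -> b     # rightward assignment: value first, symbol second
c_val = 3  # = also assigns when used as a statement of its own

total <- a + b + c_val
print(total)  # 6
```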

19.25 Vector:

A vector is the main data structure in R: vectors are collections of data. Usually, the term vector is used as shorthand for a specific type of vector in R, so-called atomic vectors; they are so called because every element in an atomic vector is of the same data type.

There are 5 “modes” of atomic vectors, based on the data type of their elements; only the first three are directly relevant for us:

  • character vector: all elements are text strings (data type: character).

    > v <- c("a", "100", "ألف", "vector elements can be very long strings")
    > typeof(v)
    [1] "character"
    > mode(v)
    [1] "character"
  • numeric vector: all elements are of the integer type (whole numbers, both positive and negative: 1, 2, -137, …), or of the double type (“double precision floating point numbers”: 1.2345, -125.8, pi, …) E.g.,

    > v <- c(1, -300, 18.5, pi)
    > v
    [1]    1.000000 -300.000000   18.500000    3.141593
    > typeof(v)
    [1] "double"
    > mode(v)
    [1] "numeric"
  • logical vector: all elements are one of the boolean values TRUE or FALSE:

    > v <- c(TRUE, TRUE, FALSE, TRUE)
    > typeof(v)
    [1] "logical"
    > mode(v)
    [1] "logical"
  • complex vector: all elements are complex numbers (numbers that have a real and an imaginary part)

  • raw vector: all elements are raw byte objects

The c() function is often used to create vectors with multiple elements. But even if you assign a single string or number to a variable, the variable will be a vector:

> a <- "This is a string"
> class(a)
[1] "character"  # a is a character vector!
> a[1]
[1] "This is a string"  # our string is the first (and only) element of that vector!

Vectors have no dimensions (vs. for example tables, which have 2 dimensions: rows and columns).

19.26 Relevant R functions

19.26.1 class()

The class function is used to display the class of an R object.

The function has one argument: the object you want to know the class of.

> a = 15
> class(a)
[1] "numeric"
> b = "15"
> class(b)
[1] "character"
> class(class)
[1] "function"

19.26.2 ls()

The ls function (for “list”) lists all the objects we have created in the current R session. You will find the same information in the Environment tab in RStudio.

The ls() function does not require any arguments.

> ls()
[1] "a" "b"

19.26.3 c()

The c() function (for “combine”) combines multiple values into a single vector object.

> character_vector <- c('a', 'b', 'c')
> character_vector
[1] "a" "b" "c"

> numeric_vector <- c(1,2,3)
> numeric_vector 
[1] 1 2 3

> logical_vector <- numeric_vector >= 2
> logical_vector
[1] FALSE  TRUE  TRUE

Note that all objects inside a vector must be of the same type (character/numeric/logical). If they are of different types, R will “coerce” them into the same type.

> mixed_vector <- c(1, "2", "three", TRUE)
> mixed_vector
[1] "1"     "2"     "three" "TRUE"  # R has converted all elements into strings!
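Coercion always moves towards the more flexible type: logical becomes integer, integer becomes double, and anything becomes character. A small sketch (variable names are arbitrary) that checks the resulting type with typeof():

```r
logical_and_integer <- c(TRUE, 2L)  # TRUE becomes 1L
integer_and_double  <- c(2L, 3.5)   # 2L becomes 2
double_and_string   <- c(3.5, "a")  # 3.5 becomes "3.5"

print(typeof(logical_and_integer))  # "integer"
print(typeof(integer_and_double))   # "double"
print(typeof(double_and_string))    # "character"
```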

19.26.4 length()

The length() function will display the number of elements in a vector.

> my_vector <- c("a", "bb", "ccc")
> length(my_vector)
[1] 3
> length("A longer character string")
[1] 1

19.26.5 paste()

The paste() function concatenates two or more character vectors. By default, it will add a space between two strings:

> paste('a', "b")
[1] "a b"
> paste('a', "b", "c")
[1] "a b c"

If you want another character to be used to separate the two strings, the function provides an additional argument called “sep”:

> paste('a', "b", sep=",")
[1] "a,b"
> paste(c('a', 'b', 'c'), "d", sep='/')
[1] "a/d" "b/d" "c/d"
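Two related tools are worth knowing, sketched here with arbitrary example values: the collapse argument, which joins all elements of the result into one single string, and paste0(), which is paste() with an empty separator.

```r
words <- c("a", "b", "c")

# sep goes between the arguments; the result still has one string per element
pairs <- paste(words, 1:3, sep = "-")

# collapse joins the elements of the result into one single string
joined <- paste(words, collapse = ", ")

# paste0() is shorthand for paste(..., sep = "")
glued <- paste0("file", 1:2, ".csv")

print(pairs)   # "a-1" "b-2" "c-3"
print(joined)  # "a, b, c"
print(glued)   # "file1.csv" "file2.csv"
```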

19.26.6 nchar()

The nchar() function (for “number of characters”) returns the number of characters in each string in a character vector.

> nchar("banana")
[1] 6
> test_vector <- c("apple", "pear", "banana")
> nchar(test_vector)
[1] 5 4 6

19.26.7 substr()

The substr() function returns substrings of character vectors using character offsets of each string in the vector. The function takes three arguments:

  • the character vector from which you want to extract a substring
  • start: the index of the first character of the substring inside each string in the vector
  • stop: the index of the last character of the substring inside each string in the vector

> substr("Banana", start=2, stop=5)
[1] "anan"
> test_vector <- c("apple", "pear", "banana")
> substr(test_vector, start=1, stop=3)
[1] "app" "pea" "ban"

NB: note that in R (in contrast to many other programming languages) the first index of an object is 1, not 0; and that the stop index is inclusive (e.g., if stop is set to 5, the substring will end after the fifth value, not before it).

If a string inside a character vector is shorter than the stop value, the substr function will return the string from the start value until its last character:

> test_vector <- c("apple", "pear", "banana")
> substr(test_vector, start=1, stop=5)
[1] "apple" "pear"  "banan"

You can use the nchar() function to return all characters after an index position until the end of the string for each string in a vector, or only the last n characters in each string:

> test_vector <- c("apple", "pear", "banana")
> substr(test_vector, start=2, stop=nchar(test_vector))
[1] "pple"  "ear"   "anana"
> substr(test_vector, start=nchar(test_vector)-2, stop=nchar(test_vector))
[1] "ple" "ear" "ana"
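The "last n characters" pattern can be wrapped in a small helper. This is only a sketch: last_chars() is a hypothetical name, and pmax() is used so that the start index never drops below 1 for strings shorter than n.

```r
# Hypothetical helper: return the last n characters of each string in x
last_chars <- function(x, n) {
  len <- nchar(x)
  # pmax() keeps the start index at 1 or above for strings shorter than n
  substr(x, start = pmax(len - n + 1, 1), stop = len)
}

result <- last_chars(c("apple", "pear", "banana", "fig"), 4)
print(result)  # "pple" "pear" "nana" "fig"
```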

19.26.8 grep()

The grep() function (short for “Global Regular Expressions Print”) returns the indices of all strings inside a character vector that match a given pattern.

grep() requires two arguments (in addition, there are a number of optional arguments):

  • pattern: the regex pattern a string must match
  • x: the character vector containing the string(s)
> test_vector <- c("apple", "pear", "banana")
> grep('a', test_vector)
[1] 1 2 3
> grep('p', test_vector)
[1] 1 2
> grep('(\\w)\\1', test_vector) # strings that have a duplicated character in them
[1] 1

NB: to escape a character in regular expressions in R, you need to use double backslashes instead of single ones: e.g., use \\n for a new line character. To match a literal backslash, you will need four backslashes! \\\\
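If you need the matching strings themselves, or a TRUE/FALSE answer per string, rather than indices, two variants may help: the optional argument value = TRUE and the related function grepl(). A brief sketch, which also shows the double backslash used to match a literal period:

```r
test_vector <- c("apple", "pear", "banana")

matching_strings <- grep("an", test_vector, value = TRUE)  # the strings, not their indices
is_match         <- grepl("an", test_vector)               # one TRUE/FALSE per string
literal_dot      <- grepl("\\.", c("a.b", "ab"))           # escaped "." matches a literal period

print(matching_strings)  # "banana"
print(is_match)          # FALSE FALSE TRUE
print(literal_dot)       # TRUE FALSE
```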

19.26.9 Casting functions: as.numeric(), as.character()

Casting functions explicitly convert an R object of one type into an object of another type. You will probably most frequently use as.numeric() to turn character vectors into numeric vectors, and as.character() to do the opposite.

> a <- "123"
> a
[1] "123"
> a + 4
Error in a + 4 : non-numeric argument to binary operator
> as.numeric(a) + 4
[1] 127
> b <- 123
> b
[1] 123
> as.character(b)
[1] "123"

There are many more casting functions (try writing “as.” in the RStudio console and a popup will appear with dozens of other casting functions).
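When a string cannot be interpreted as a number, as.numeric() returns NA (a missing value) for it and issues a warning. A minimal sketch with made-up input values; suppressWarnings() is used here only to keep the output quiet:

```r
raw_values <- c("12", "3.5", "n/a")

# "n/a" cannot be converted, so it becomes NA;
# suppressWarnings() hides the "NAs introduced by coercion" warning
numbers <- suppressWarnings(as.numeric(raw_values))

print(numbers)         # 12.0 3.5 NA
print(is.na(numbers))  # FALSE FALSE TRUE
```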

19.26.10 matrix()

The matrix() function creates a matrix object. A matrix is a two-dimensional data structure, similar to a table with rows and columns. All elements in a matrix must be of the same type (string/numeric/…)

The matrix() function requires three arguments:

  • data: a vector containing the data that should be put into the matrix
  • nrow: the number of rows the matrix should have
  • ncol: the number of columns the matrix should have
# create a matrix with 3 rows and 4 columns containing the numbers from 1 to 12:
> m <- matrix(data=1:12, nrow=3, ncol=4) 
> m
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

The optional argument byrow defines whether the data should be fed into the matrix by row (byrow=TRUE) or by column (byrow=FALSE); default is FALSE.

> m <- matrix(data=1:12, nrow=3, ncol=4, byrow=TRUE)
> m
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
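Elements of a matrix are addressed with two indices, row first and column second. A short sketch using the column-wise matrix from the first example above:

```r
m <- matrix(data = 1:12, nrow = 3, ncol = 4)

dims       <- dim(m)   # number of rows, number of columns
one_cell   <- m[2, 3]  # row 2, column 3
second_row <- m[2, ]   # leaving the column index empty selects the whole row
third_col  <- m[, 3]   # leaving the row index empty selects the whole column

print(dims)        # 3 4
print(one_cell)    # 8
print(second_row)  # 2 5 8 11
print(third_col)   # 7 8 9
```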

19.26.11 data.frame()

The data.frame() function creates a dataframe object, which is similar to a spreadsheet or a table in a database. Unlike a matrix, a dataframe can contain columns of different types (character, numeric, …).

> df <- data.frame(fruit=c("apple", "banana", "pear"), stock=c(15, 20, 3))
> df
   fruit stock
1  apple    15
2 banana    20
3   pear     3

Every dataframe has 3 attributes:

  • dim: its dimension (number of rows and columns)
  • colnames: the names of its columns
  • rownames: the names of its rows
> dim(df)
[1] 3 2
> colnames(df)
[1] "fruit" "stock"
> rownames(df)
[1] "1" "2" "3"

The name of a column can be used to get the data from that column:

> df$fruit
[1] "apple"  "banana" "pear"  
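Columns can also be used to select rows and to add new data. The sketch below reuses the fruit dataframe; the price values are invented for illustration only.

```r
df <- data.frame(fruit = c("apple", "banana", "pear"), stock = c(15, 20, 3))

# Rows where the condition is TRUE, all columns
low_stock <- df[df$stock < 10, ]

# The stock value of one particular fruit
apple_stock <- df$stock[df$fruit == "apple"]

# Assigning to a new column name adds a column (hypothetical prices)
df$price <- c(0.5, 0.25, 0.75)

print(as.character(low_stock$fruit))  # "pear"
print(apple_stock)                    # 15
print(colnames(df))                   # "fruit" "stock" "price"
```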

19.26.12 setwd() and getwd()

R needs a specific location on your computer where it can write data, and from where it can read data. This location is called the “working directory”.

The setwd() function (short for “set working directory”) sets the working directory to the path that is passed as an argument to the function. The getwd() function displays the current working directory.

> setwd("~")
> getwd()
[1] "C:/Users/peter/Documents"
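If a script changes the working directory, it can be useful to remember the old one and switch back afterwards. A sketch, using R's per-session temporary folder (tempdir()) as a stand-in for a real project folder:

```r
old_wd <- getwd()  # remember where we started

setwd(tempdir())   # tempdir() is a scratch directory that R creates for each session
# normalizePath() resolves both paths to the same canonical form before comparing
in_temp <- normalizePath(getwd()) == normalizePath(tempdir())

setwd(old_wd)      # go back, so later relative paths still work

print(in_temp)     # TRUE
```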

19.26.13 dir()

The dir() function (short for “directory”) lists the contents (files and folders) of a directory (by default: the working directory).
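A small sketch of dir() in use. The folder and file names are hypothetical and are created inside R's temporary directory so the example does not touch your own files; the optional pattern argument takes a regular expression to filter the listing.

```r
# Hypothetical scratch folder with three empty files
folder <- file.path(tempdir(), "dir_demo")
dir.create(folder, showWarnings = FALSE)
file.create(file.path(folder, c("a.csv", "b.csv", "notes.txt")))

all_files <- dir(folder)                       # everything in the folder
csv_files <- dir(folder, pattern = "\\.csv$")  # only names ending in .csv

print(all_files)  # "a.csv" "b.csv" "notes.txt"
print(csv_files)  # "a.csv" "b.csv"
```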

19.26.14 read.csv()

CSV (“Comma-Separated Values”) files are plain text files used to hold structured data. Data is written in tabular form, with commas delimiting columns and new lines delimiting rows.

An example of the contents of a csv file:

word,frequency
apple,15
banana,20
pear,5

The read.csv() function loads a csv file into an R dataframe object:

> freq <- read.csv(file="word_frequency.csv", as.is=TRUE)
> freq
    word frequency
1  apple        15
2 banana        20
3   pear         5
> class(freq)
[1] "data.frame"

Sometimes, csv data uses other delimiters than comma to separate columns; the tab character (symbolized in R by \t) is often used (such files are often called TSV files, for “tab-separated values”).

word    frequency
apple   15
banana  20
pear    5

The read.csv function’s optional argument sep (for “separator”; default: “,”) can be set to \t (that is, tab) to parse a tsv file:

> freq2 <- read.csv(file="word_frequency.csv", as.is=TRUE, sep="\t")
> freq2
    word frequency
1  apple        15
2 banana        20
3   pear         5
> class(freq2)
[1] "data.frame"
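To try read.csv() without preparing a file by hand, you can write the example data to a temporary file first. This is only a sketch: tempfile() generates a throwaway file name, and writeLines() writes one line of text per element.

```r
# Write the example data to a temporary csv file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("word,frequency", "apple,15", "banana,20", "pear,5"), csv_path)

# Read it back into a dataframe; as.is = TRUE keeps text columns as character
freq <- read.csv(file = csv_path, as.is = TRUE)

print(class(freq))          # "data.frame"
print(freq$word)            # "apple" "banana" "pear"
print(sum(freq$frequency))  # 40
```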

19.26.15 order()

The order() function can be used to sort a vector or another R data structure like a dataframe. Its output is a vector that contains indices for the sort order of the object you passed to the function.

> test_vector <- c("apple","pear", "banana")
> index <- order(test_vector)
> index
[1] 1 3 2

You can use the output of the order() function to sort the original object:

> index <- order(test_vector)
> test_vector[index]
[1] "apple"  "banana" "pear" 
> df <- data.frame(word=c("apple", "banana", "pear"), frequency=c(15, 20, 3))
> index <- order(df$frequency)
> index
[1] 3 1 2
> df[index,]
    word frequency
3   pear         3
1  apple        15
2 banana        20

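To sort from high to low values, set the optional argument decreasing to TRUE. The sort() function, which returns the sorted vector directly instead of an index, accepts the same argument: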

> test_vector <- c("apple","pear", "banana")
> sort(test_vector, decreasing=TRUE)
[1] "pear"   "banana" "apple"
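order() itself accepts decreasing as well, and it can take more than one vector, in which case later vectors are used to break ties in earlier ones. A sketch with invented data (two words share the frequency 15):

```r
df <- data.frame(word = c("fig", "banana", "pear", "apple"),
                 frequency = c(15, 20, 3, 15))

# Highest frequency first
by_freq <- df[order(df$frequency, decreasing = TRUE), ]

# Highest frequency first, ties broken alphabetically by word;
# the minus sign reverses the order of a numeric column
by_freq_then_word <- df[order(-df$frequency, df$word), ]

print(as.character(by_freq$word[1]))            # "banana"
print(as.character(by_freq_then_word$word))     # "banana" "apple" "fig" "pear"
```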