# 19 R Glossary

## 19.1 Arguments

Inputs supplied to a function. In a function call, a value can be matched to an argument either by its position (“positional argument”; e.g., with `function (x, y, z = 0)`, the first value supplied to the function will be given the name `x`, and the second one the name `y`) or by its tag / name (e.g., `z = 5`: the value is explicitly linked to `z`). An argument that has a default value in the function declaration (here: `z = 0`) can be left out of the call: if no value is linked to `z`, `z` will default to 0.
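
As a minimal sketch of both ways of matching arguments (the function `describe` is a made-up name used only for this illustration):

```r
# Hypothetical function: x and y have no default value, z defaults to 0
describe <- function(x, y, z = 0) {
  paste("x =", x, "| y =", y, "| z =", z)
}

describe(1, 2)          # matched by position; z keeps its default value 0
describe(1, 2, z = 3)   # z is linked to a value by name
describe(y = 2, x = 1)  # named arguments may be given in any order
```

The first and the last call return the same string, because `x` and `y` end up with the same values in both.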

## 19.2 Block

A sequence of statements, grouped between curly braces.

## 19.3 Console pane

## 19.4 CRAN

Comprehensive R Archive Network. The official repository from which R itself and the packages published there can be downloaded. Use `install.packages("package_name")` to download and install a new package from CRAN.

## 19.7 Expression

R code consists of expressions. There are multiple types of expressions:

• constant (that is, a character string or a number). E.g.,

``````> 123
[1] 123``````
``````> "abc"
[1] "abc"``````
• operator expression (that is, every expression that contains one of R’s operators):
``````> 2 + 3
[1] 5``````
``````> a <- "abc"``````
``````> 2 > 3
[1] FALSE``````
• index constructions (that is, extracting elements from a vector or list using numerical or name indices):
``````> c(1, 2, 3)[-1]
[1] 2 3``````
``````> fruit <- list(apple = 5, pear = 2)
> fruit[["apple"]]
[1] 5``````
``````> fruit$pear
[1] 2``````
• flow control element: loops, conditional expressions,…
``> if (x %% 2 == 1) print("odd") else print("even")``
``> for (i in 1:10) print(i)``
• compound expression (“block”): a series of expressions grouped between curly braces and separated by semicolons or new lines:
``> {x <- 1; x <- x + 5}``
``````> {
x <- 1
x <- x + 5
}``````
• function definition:
``> function(x, y) x + y``
``````> function(x,y) {
x + y
}``````
• function call:
``````> print(5)
[1] 5``````

## 19.9 Function:

A function is a statement or a sequence of statements that is not evaluated immediately when it is “declared” (that is, created/written), but only when it is “called” / “invoked” (that is, when you tell R to evaluate it). Functions have their own environment of variables; if you assign a value inside a function to a symbol that is already used in another environment, the variable in that other environment will be left untouched.

When you call a function, you can pass values to that function through its list of arguments, which is a comma-separated list between parentheses following the function’s name.

In order to “declare” (create) a function, the keyword `function` is needed, followed by the list of arguments (between parentheses) and the body of the function (usually between curly braces). Functions are usually assigned to a variable name (functions that are not linked to a name are called anonymous functions).

Functions in R are objects, which means they can be passed to other functions as arguments, placed in lists, etc.
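
The following sketch puts these points together; the names `add`, `squares` and `operations` are made up for the illustration:

```r
# Declare a function and assign it to the name `add`
add <- function(x, y) {
  x + y
}

# Call the function: the arguments go between parentheses after its name
add(2, 3)

# Functions are objects: an anonymous function is passed as an argument to sapply()
squares <- sapply(c(1, 2, 3), function(n) n^2)
squares

# Functions can also be stored in a list and called from there
operations <- list(plus = add, minus = function(x, y) x - y)
operations$minus(5, 2)
```

Nothing is computed when `add` is declared; the body only runs at the moment of the call `add(2, 3)`.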

## 19.10 Global variable:

A variable defined outside the scope of a function.

## 19.11 Import:

Bring data or packages from the computer’s (passive) storage into its (active) memory so they can be used in the current R session. Loading a package makes the functions and other objects from that package available by name in the current session. Use `library("package_name")` to import/load a package into R for the current session (in a script, group all these imports at the top).

## 19.13 Local variable

A variable defined inside a function. Outside that function, the variable’s name will not be bound to the value it was bound to within the function’s scope.
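
A small sketch of the difference between a local and a global variable; the names `total`, `add_to_total` and `local_note` are invented for this example, and it assumes no other object called `local_note` exists in your session:

```r
total <- 100                    # global variable

add_to_total <- function(x) {
  total <- total + x            # creates a new local variable that is also named total
  local_note <- "only visible inside the function"
  total
}

add_to_total(5)                 # the function works with its own local total
total                           # the global variable has not been changed
exists("local_note")            # checks whether the local name is bound outside the function
```

The call returns 105, while the global `total` is still 100 afterwards and `local_note` cannot be found outside the function.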

## 19.14 Package:

A collection of functions and datasets developed by a user to extend R. Packages can be published on `CRAN`, or distributed on GitHub or other repositories.

If you want to use a package, it must first be installed and then be loaded into your current R session. Installation must be done only once, loading every time you start a new R session.

• Use `install.packages("package_name")` to download and install a new package from `CRAN`.
• Use `library("package_name")` to import/load an installed package into R for the current session (in a script, group all imports at the top of a script).
• Use `installed.packages()` to get a list of all packages installed for your R version.

## 19.15 R:

A computer language developed for statistical analysis and graphics.

## 19.16 RStudio:

An Integrated Development Environment (IDE) for writing, executing and debugging R code. R comes with its own IDE but most users prefer RStudio for its user-friendliness.

## 19.17 Pane:

The RStudio window is divided into four quadrants called panes: the Source pane, the Console pane, the Environment pane and the File pane. The first two are for writing code, the latter two contain a number of tabs with useful resources.

You can minimise and maximise the size of each pane by using the icons in the top right of every pane.

To switch the order of the panes, use RStudio’s “Pane layout” dialog, which you can find in the `Options` dialog (`Tools > Options`; `RStudio > Preferences` on Mac).

Execute R code.

## 19.20 Scope:

Symbols in R are bound to a specific value only within a specific environment or “scope”. E.g., the symbol of a variable defined within a function will be bound to that variable’s value only within that function (such a variable is called a local variable); outside of the function, it will be considered unbound (that is, not connected to a value):

``````> f <- function (x) {
doubled = x * 2
}
> f(4)
> doubled
Error: object 'doubled' not found``````

Variables declared outside a function are considered global variables.

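If during evaluation of a function a symbol is encountered that is not in the local environment (that is, a symbol that was not in that function’s list of arguments and that was not defined inside the function), R will search for this symbol in the environment in which the function was defined (its enclosing environment), then in that environment’s own enclosing environment, and so on until the global environment is reached. Note that this is the environment where the function was defined, not the one from which it was called.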
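
A minimal sketch of this lookup; R resolves a non-local symbol through the environment in which the function was defined. The function names `show_x` and `caller` are made up for the illustration:

```r
x <- "global"

show_x <- function() {
  x                        # x is not local here, so R looks where show_x() was defined
}

caller <- function() {
  x <- "local to caller"   # local variable of caller(); show_x() does not see it
  show_x()
}

caller()
```

The call returns `"global"`: the local `x` inside `caller()` plays no role, because `show_x()` was defined at the top level.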

## 19.23 Symbol:

The name component in a variable: a variable is a value assigned to a symbol. E.g.,

``x <- 3``

(x is the symbol/name of the variable, 3 its value).

## 19.24 Variable:

A symbol (name) linked with a value that this symbol represents. Linking a symbol with a value is called assigning; this is done using an assignment operator (`<-`, `->`, `=`).

## 19.25 Vector:

A vector is the main data structure in R: vectors are collections of data. Usually, the term vector is used as shorthand for a specific type of vector in R, so-called atomic vectors; they are called atomic because every element in an atomic vector is of the same data type.

There are 5 “modes” of atomic vectors, based on the data type of their elements; only the first three are directly relevant for us:

• character vector: all elements are text strings (data type: `character`).

``````> v <- c("a", "100", "ألف", "vector elements can be very long strings")
> typeof(v)
[1] "character"
> mode(v)
[1] "character"``````
• numeric vector: all elements are of the `integer` type (whole numbers, both positive and negative: 1, 2, -137, …), or of the `double` type (“double precision floating point numbers”: 1.2345, -125.8, pi, …) E.g.,

``````> v <- c(1, -300, 18.5, pi)
> v
[1]    1.000000 -300.000000   18.500000    3.141593
> typeof(v)
[1] "double"
> mode(v)
[1] "numeric"``````
• logical vector: all elements are one of the boolean values TRUE or FALSE:

``````> v <- c(TRUE, TRUE, FALSE, TRUE)
> typeof(v)
[1] "logical"
> mode(v)
[1] "logical"``````
• complex vector: all elements are complex numbers (numbers that have a real and an imaginary part)

• raw vector: all elements are raw byte objects

The `c()` function is often used to create vectors with multiple elements. But even if you assign a single string or number to a variable, the variable will be a vector:

``````> a <- "This is a string"
> class(a)
[1] "character"  # a is a character vector!
> a
[1] "This is a string"  # our string is the first (and only) element of that vector!``````

Vectors have no dimensions (vs. for example tables, which have 2 dimensions: rows and columns).
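
As a brief sketch of the difference between the two numeric types, and of the fact that a vector has a length but no dimensions (the variable names are made up for the illustration):

```r
int_vector <- c(1L, 2L, -137L)   # the L suffix creates integer values
typeof(int_vector)               # "integer"
mode(int_vector)                 # "numeric"

dbl_vector <- c(1, 2, -137)      # without the suffix, numbers are stored as doubles
typeof(dbl_vector)               # "double"

dim(dbl_vector)                  # NULL: a vector has no dimensions
length(dbl_vector)               # 3
is.vector("a single string")     # TRUE: a single value is a vector of length 1
```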

## 19.26 Relevant R functions

### 19.26.1 class()

The class function is used to display the class of an R object.

The function has one argument: the object you want to know the class of.

``````> a = 15
> class(a)
[1] "numeric"
> b = "15"
> class(b)
[1] "character"
> class(class)
[1] "function"``````

### 19.26.2 ls()

The `ls` function (for “list”) lists all the objects we have created in the current R session. You will find the same information in the `Environment` tab in RStudio.

The `ls()` function does not require any arguments.

``````> ls()
[1] "a" "b"``````

### 19.26.3 c()

The `c()` function (for “combine”) combines multiple values into a single vector object.

``````> character_vector <- c('a', 'b', 'c')
> character_vector
[1] "a" "b" "c"

> numeric_vector <- c(1,2,3)
> numeric_vector
[1] 1 2 3

> logical_vector <- numeric_vector >= 2
> logical_vector
[1] FALSE  TRUE  TRUE``````

Note that all elements of a vector must be of the same type (character/numeric/logical). If they are of different types, R will “coerce” them into the same type.

``````> mixed_vector <- c(1, "2", "three", TRUE)
> mixed_vector
[1] "1"     "2"     "three" "TRUE"   # R has converted all elements into strings!``````

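Coercion always goes towards the more flexible type: logical values can become integers, integers can become doubles, and everything can become a character string. A short sketch of this order:

```r
typeof(c(TRUE, 2L))     # "integer": TRUE is converted to 1L
typeof(c(2L, 3.5))      # "double": the integer is converted to a double
typeof(c(3.5, "a"))     # "character": the number is converted to a string
c(TRUE, FALSE, 2)       # 1 0 2: the logical values become numbers
```
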
### 19.26.4 length()

The `length()` function will display the number of elements in a vector.

``````> my_vector <- c("a", "bb", "ccc")
> length(my_vector)
[1] 3
> length("A longer character string")
[1] 1``````

### 19.26.5 paste()

The `paste()` function concatenates two or more character vectors. By default, it will add a space between two strings:

``````> paste('a', "b")
[1] "a b"
> paste('a', "b", "c")
[1] "a b c"``````

If you want another character to be used to separate the two strings, the function provides an additional argument called “sep”:

``````> paste('a', "b", sep=",")
[1] "a,b"
> paste(c('a', 'b', 'c'), "d", sep='/')
[1] "a/d" "b/d" "c/d"``````

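Two related tools are worth knowing, sketched here as a supplement: the `paste0()` function, which pastes without any separator, and the `collapse` argument of `paste()`, which glues the elements of the result into one single string.

```r
paste0("a", "b")                                    # "ab"
paste(c("a", "b", "c"), collapse = "-")             # "a-b-c"
paste(c("a", "b"), "x", sep = "_", collapse = "+")  # "a_x+b_x"
```

In the last call, `sep` is applied first (within each pair of strings), and `collapse` afterwards (between the resulting strings).
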
### 19.26.6 nchar()

The `nchar()` function (for “number of characters”) returns the number of characters in each string in a character vector.

``````> nchar("banana")
[1] 6
> test_vector <- c("apple", "pear", "banana")
> nchar(test_vector)
[1] 5 4 6``````

### 19.26.7 substr()

The `substr()` function returns substrings of the strings in a character vector, using character offsets within each string. The function takes three arguments:

• the character vector from which you want to extract a substring
• `start`: the index of the first character of the substring inside each string in the vector
• `stop`: the index of the last character of the substring inside each string in the vector

``````> substr("Banana", start=2, stop=5)
[1] "anan"
> test_vector <- c("apple", "pear", "banana")
> substr(test_vector, start=1, stop=3)
[1] "app" "pea" "ban"``````

NB: note that in R (in contrast to many other programming languages) the first index of an object is 1, not 0; and that the stop index is inclusive (e.g., if `stop` is set to 5, the substring will end after the fifth value, not before it).

If a string inside a character vector is shorter than the `stop` value, the `substr` function will return the string from the `start` value until its last character:

``````> test_vector <- c("apple", "pear", "banana")
> substr(test_vector, start=1, stop=5)
[1] "apple" "pear"  "banan"``````

You can use the `nchar()` function to return all characters after an index position until the end of the string for each string in a vector, or only the last n characters in each string:

``````> test_vector <- c("apple", "pear", "banana")
> substr(test_vector, start=2, stop=nchar(test_vector))
[1] "pple"  "ear"   "anana"
> substr(test_vector, start=nchar(test_vector)-2, stop=nchar(test_vector))
[1] "ple" "ear" "ana"``````

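If you need the last n characters often, the pattern can be wrapped in a small helper function. This is only a sketch; `last_n` is a made-up name, not a built-in R function:

```r
# Hypothetical helper: return the last n characters of every string in a vector
last_n <- function(strings, n) {
  len <- nchar(strings)
  substr(strings, start = len - n + 1, stop = len)
}

test_vector <- c("apple", "pear", "banana")
last_n(test_vector, 3)   # "ple" "ear" "ana"
last_n(test_vector, 2)   # "le" "ar" "na"
```
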
### 19.26.8 grep()

The `grep()` function (short for “Global Regular Expressions Print”) returns the indices of all strings inside a character vector that match a given pattern.

`grep()` requires two arguments (in addition, there are a number of optional arguments):

• `pattern`: the regex pattern a string must match
• `x`: the character vector containing the string(s)
``````> test_vector <- c("apple", "pear", "banana")
> grep('a', test_vector)
[1] 1 2 3
> grep('p', test_vector)
[1] 1 2
> grep('(\\w)\\1', test_vector) # strings that have a doubled character in them
[1] 1``````

NB: to escape a character in regular expressions in R, you need to use double backslashes instead of single ones: e.g., use `\\n` for a new line character. To match a literal backslash, you will need four backslashes! `\\\\`
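
Some of the optional arguments, and the closely related `grepl()` function, are sketched below as a supplement to the examples above:

```r
test_vector <- c("apple", "pear", "banana")

grep("an", test_vector, value = TRUE)      # return the matching strings instead of their indices
grepl("p", test_vector)                    # one TRUE/FALSE per string: TRUE TRUE FALSE
test_vector[grepl("^[ab]", test_vector)]   # keep the strings that start with a or b

# fixed = TRUE treats the pattern as plain text, so the dot is not a regex wildcard
grep(".", c("a.b", "ab"), fixed = TRUE)    # 1
```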

### 19.26.9 Casting functions: as.numeric(), as.character()

Casting functions explicitly convert an R object of one type into an object of another type. You will probably most frequently use `as.numeric()` to turn character vectors into numeric vectors, and `as.character()` to do the opposite.

``````> a <- "123"
> a
[1] "123"
> a + 4
Error in a + 4 : non-numeric argument to binary operator
> as.numeric(a) + 4
[1] 127``````
``````> b <- 123
> b
[1] 123
> as.character(b)
[1] "123"``````

There are many more casting functions (try writing “as.” in the RStudio console and a popup will appear with dozens of other casting functions).
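
A short sketch of what happens when a value cannot be converted, and of two other casting functions. Strings that do not look like a number become `NA` (with a warning, which is suppressed here to keep the example quiet):

```r
converted <- suppressWarnings(as.numeric(c("1.5", "2", "three")))
converted                               # 1.5 2.0 NA
is.na(converted)                        # FALSE FALSE TRUE

as.integer(3.9)                         # 3: the decimal part is cut off, not rounded
as.logical(c("TRUE", "false", "yes"))   # TRUE FALSE NA
```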

### 19.26.10 matrix()

The `matrix()` function creates a matrix object. A matrix is a two-dimensional data structure, similar to a table with rows and columns. All elements in a matrix must be of the same type (string/numeric/…)

The `matrix()` function requires three arguments:

• `data`: a vector containing the data that should be put into the matrix
• `nrow`: the number of rows the matrix should have
• `ncol`: the number of columns the matrix should have
``````# create a matrix with 3 rows and 4 columns containing the numbers from 1 to 12:
> m <- matrix(data=1:12, nrow=3, ncol=4)
> m
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12``````

The optional argument `byrow` defines whether the data should be fed into the matrix by row (`byrow=TRUE`) or by column (`byrow=FALSE`); default is `FALSE`.

``````> m <- matrix(data=1:12, nrow=3, ncol=4, byrow=TRUE)
> m
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12``````
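
Elements of a matrix are extracted with two indices between square brackets: first the row, then the column. A minimal sketch, using the column-wise matrix (the default, `byrow=FALSE`):

```r
m <- matrix(data = 1:12, nrow = 3, ncol = 4)

dim(m)      # 3 4: number of rows and number of columns
m[2, 3]     # 8: the element in row 2, column 3
m[1, ]      # 1 4 7 10: the entire first row
m[, 4]      # 10 11 12: the entire fourth column
t(m)        # the transposed matrix: rows and columns are swapped
```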

### 19.26.11 data.frame()

The `data.frame()` function creates a dataframe object, which is similar to a spreadsheet or a table in a database. Contrary to a matrix, a dataframe can contain objects of different types (character, numeric, …)

``````> df <- data.frame(fruit=c("apple", "banana", "pear"), stock=c(15, 20, 3))
> df
   fruit stock
1  apple    15
2 banana    20
3   pear     3``````

Every dataframe has 3 properties that can be queried with the functions of the same name:

• `dim`: its dimension (number of rows and columns)
• `colnames`: the names of its columns
• `rownames`: the names of its rows
``````> dim(df)
[1] 3 2
> colnames(df)
[1] "fruit" "stock"
> rownames(df)
[1] "1" "2" "3"``````

The name of a column can be used to get the data from that column:

``````> df$fruit
[1] "apple"  "banana" "pear"  ``````

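Rows of a dataframe can be selected in the same way as rows of a matrix, with an index before the comma. The sketch below also adds a column; the `price` values are invented for the illustration:

```r
df <- data.frame(fruit = c("apple", "banana", "pear"), stock = c(15, 20, 3))

df[df$stock > 10, ]              # only the rows with more than 10 items in stock
df[["stock"]]                    # another way to extract a column as a vector

df$price <- c(0.5, 0.25, 0.4)    # add a new column (hypothetical prices)
nrow(df)                         # 3
ncol(df)                         # 3
```
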
### 19.26.12 setwd() and getwd()

R needs a specific location on your computer where it can write data, and from where it can read data. This location is called the “working directory”.

The `setwd()` function (short for “set working directory”) sets the working directory to the path that is passed as an argument to the function. The `getwd()` function displays the current working directory.

``````> setwd("~")
> getwd()
[1] "C:/Users/peter/Documents"``````

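If you change the working directory only temporarily, it can be useful to remember the previous one so you can switch back. A minimal sketch, using R's temporary directory as a stand-in for a real project folder:

```r
old_dir <- getwd()        # remember the current working directory
setwd(tempdir())          # switch to the temporary directory of this R session
getwd()                   # the path that is printed depends on your computer
setwd(old_dir)            # switch back to the original working directory
identical(getwd(), old_dir)
```
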
### 19.26.13 dir()

The `dir()` function (short for “directory”) lists the contents (files and folders) of a directory (by default: the working directory).
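
A small sketch of `dir()` in action. To keep it self-contained, it first creates a throwaway folder with two empty files (the folder and file names are made up for the example); the `pattern` argument restricts the listing to file names that match a regular expression:

```r
demo_dir <- tempfile("dir_demo")     # a unique path inside R's temporary directory
dir.create(demo_dir)
file.create(file.path(demo_dir, c("a.csv", "b.txt")))

dir(demo_dir)                        # "a.csv" "b.txt"
dir(demo_dir, pattern = "\\.csv$")   # only the csv file: "a.csv"
```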

### 19.26.14 read.csv()

CSV (“Comma-Separated Values”) files are plain text files used to hold structured data. Data is written in tabular form, with commas delimiting columns and new lines delimiting rows.

An example of the contents of a csv file:

``````word,frequency
apple,15
banana,20
pear,5``````

The `read.csv()` function loads a csv file into an R dataframe object:

``````> freq <- read.csv(file="word_frequency.csv", as.is=TRUE)
> freq
    word frequency
1  apple        15
2 banana        20
3   pear         5
> class(freq)
[1] "data.frame"``````

Sometimes, csv data uses other delimiters than comma to separate columns; the tab character (symbolized in R by `\t`) is often used (such files are often called TSV files, for “tab-separated values”).

``````word    frequency
apple   15
banana  20
pear    5``````

The `read.csv` function’s optional argument `sep` (for “separator”; default: “,”) can be set to `\t` (that is, tab) to parse a tsv file:

``````> freq2 <- read.csv(file="word_frequency.csv", as.is=TRUE, sep="\t")
> freq2
    word frequency
1  apple        15
2 banana        20
3   pear         5
> class(freq2)
[1] "data.frame"``````

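The following sketch lets you try both variants without having the files on disk: it first writes the example data to temporary files (the paths are generated by `tempfile()` and are not part of the original example) and then reads them back in.

```r
# Write the comma-separated example data to a temporary file and read it
csv_path <- tempfile(fileext = ".csv")
writeLines(c("word,frequency", "apple,15", "banana,20", "pear,5"), csv_path)
freq <- read.csv(file = csv_path, as.is = TRUE)
str(freq)                 # shows the type of every column

# Same data, but tab-separated
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("word\tfrequency", "apple\t15", "banana\t20", "pear\t5"), tsv_path)
freq2 <- read.csv(file = tsv_path, as.is = TRUE, sep = "\t")

identical(freq$frequency, freq2$frequency)   # both files yield the same numbers
```
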
### 19.26.15 order()

The `order()` function can be used to sort a vector or another R data structure like a dataframe. Its output is a vector that contains indices for the sort order of the object you passed to the function.

``````> test_vector <- c("apple","pear", "banana")
> index <- order(test_vector)
> index
[1] 1 3 2``````

You can use the output of the `order()` function to sort the original object:

``````> index <- order(test_vector)
> test_vector[index]
[1] "apple"  "banana" "pear"  ``````
``````> df <- data.frame(word=c("apple", "banana", "pear"), frequency=c(15, 20, 3))
> index <- order(df$frequency)
> index
[1] 3 1 2
> df[index,]
    word frequency
3   pear         3
1  apple        15
2 banana        20``````
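
`order()` also accepts more than one vector: the second one is then used to break ties in the first. A minimal sketch with invented data, in which two words share the same frequency:

```r
# Hypothetical data: apple and fig have the same frequency
df <- data.frame(word = c("pear", "fig", "banana", "apple"),
                 frequency = c(3, 15, 20, 15))

# Sort by frequency first; rows with equal frequency are sorted alphabetically by word
index <- order(df$frequency, df$word)
index          # 1 4 2 3
df[index, ]
```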

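In order to sort from high to low values, set the optional argument `decreasing` to `TRUE`. The related `sort()` function, which returns the sorted values themselves instead of their indices, takes the same argument:

``````> test_vector <- c("apple","pear", "banana")
> sort(test_vector, decreasing=TRUE)
[1] "pear"   "banana" "apple" ``````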