Easy R Programming Basics

Previously, we described how to install R/RStudio as well as how to launch R/RStudio and set up your working directory.

Here, we describe the basics you should know about R programming, including:

  • Performing basic arithmetic operations and using basic arithmetic functions
  • Creating and subsetting basic data types in R

Basic arithmetic operations

R can be used as a calculator.

The basic arithmetic operators are:

  1. + (addition)
  2. - (subtraction)
  3. * (multiplication)
  4. / (division)
  5. ^ (exponentiation)
  6. %% (modulo: the remainder of a division)

Type the commands below directly in the console:

# Addition
3 + 7
[1] 10
# Subtraction
7 - 3
[1] 4
# Multiplication
3 * 7
[1] 21
# Division
7 / 3
[1] 2.333333
# Exponentiation
2 ^ 3
[1] 8
# Modulo: returns the remainder of the division of 8/3
8 %% 3
[1] 2

Note that, in R, ‘#’ is used for adding comments to explain what the R code is about.

Basic arithmetic functions

  1. Logarithms and exponentials:
log2(x) # logarithm base 2 of x
log10(x) # logarithm base 10 of x
exp(x) # exponential of x
  2. Trigonometric functions:
cos(x) # cosine of x
sin(x) # sine of x
tan(x) # tangent of x
acos(x) # arc-cosine of x
asin(x) # arc-sine of x
atan(x) # arc-tangent of x
  3. Other mathematical functions:
abs(x) # absolute value of x
sqrt(x) # square root of x
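
As a quick illustration of the functions listed above, the sketch below applies a few of them to sample values. The variable name x and the numbers used are assumptions chosen for this example only.

```r
# x is an arbitrary sample value chosen for this illustration
x <- 8
log2_x  <- log2(x)      # base-2 logarithm: 2^3 = 8, so the result is 3
log10_v <- log10(100)   # base-10 logarithm: 10^2 = 100, so the result is 2
exp_0   <- exp(0)       # e^0 is 1
sqrt_16 <- sqrt(16)     # the square root of 16 is 4
abs_m5  <- abs(-5)      # the absolute value of -5 is 5
cos_0   <- cos(0)       # the cosine of 0 radians is 1

# Print the results
c(log2_x, log10_v, exp_0, sqrt_16, abs_m5, cos_0)
```

Note that the trigonometric functions in R expect angles in radians, not degrees.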

Assigning values to variables

A variable can be used to store a value.

For example, the R code below will store the price of a lemon in a variable, say “lemon_price”:

# Price of a lemon = 2 euros
lemon_price <- 2
# or use this
lemon_price = 2

Note that, it’s possible to use <- or = for variable assignments.

Note that, R is case-sensitive. This means that lemon_price is different from Lemon_Price.

To print the value of the created object, just type its name:

lemon_price
[1] 2

or use the function print():

print(lemon_price)
[1] 2

R saves the object lemon_price (also known as a variable) in memory. You can then perform operations with it.

# Multiply lemon price by 5
5 * lemon_price
[1] 10

You can change the value of the object:

# Change the value
lemon_price <- 5
# Print again
lemon_price
[1] 5

The following R code creates two variables holding the width and the height of a rectangle. These two variables will be used to compute the area of the rectangle.

# Rectangle height
height <- 10
# rectangle width
width <- 5
# compute rectangle area
area <- height * width
area
[1] 50

The function ls() can be used to see the list of objects we have created:

ls()
[1] "area"        "height"      "info"        "lemon_price" "PACKAGES"    "R_VERSION"  
[7] "width"      

The collection of objects currently stored is called the workspace.

Note that each variable takes up some space in the computer’s memory. If you work on a big project, it’s good to clean up your workspace.

To remove a variable, use the function rm():

# Remove height and width variable
rm(height, width)
# Display the remaining variables
ls()
[1] "area"        "info"        "lemon_price" "PACKAGES"    "R_VERSION"  

Basic data types

Basic data types are numeric, character and logical.

# Numeric object: How old are you?
my_age <- 28
# Character  object: What's your name?
my_name <- "Nicolas"
# logical object: Are you a data scientist?
# (yes/no) <=> (TRUE/FALSE)
is_datascientist <- TRUE

Note that a character vector can be created using double (") or single (') quotes. If your text contains the same kind of quote that delimits the string, escape it with a backslash (\), as follows.

'My friend\'s name is "Jerome"'
[1] "My friend's name is \"Jerome\""
# or use this
"My friend's name is \"Jerome\""
[1] "My friend's name is \"Jerome\""

It’s possible to use the function class() to see what type a variable is:

class(my_age)
[1] "numeric"
class(my_name)
[1] "character"

You can also use the functions is.numeric(), is.character(), is.logical() to check whether a variable is numeric, character or logical, respectively. For instance:

is.numeric(my_age)
[1] TRUE

If you want to change the type of a variable to another one, use the as.* functions, including: as.numeric(), as.character(), as.logical(), etc.

my_age
[1] 28
# Convert my_age to a character variable
as.character(my_age)
[1] "28"

Note that converting a character value that does not represent a number (for example, "Nicolas") to numeric outputs NA (for not available), along with a warning. R doesn’t know how to convert such a character value to a number.
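
The short sketch below illustrates when a character-to-numeric conversion succeeds and when it yields NA. The variable names age_text, age_number and not_a_number are made up for this example.

```r
# A string that looks like a number converts cleanly
age_text <- "28"
age_number <- as.numeric(age_text)
is.numeric(age_number)   # TRUE

# A string that is not a number becomes NA. R also emits a warning,
# which is silenced here with suppressWarnings()
not_a_number <- suppressWarnings(as.numeric("Nicolas"))
is.na(not_a_number)      # TRUE
```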


Vectors

A vector is a combination of multiple values (numeric, character or logical) in the same object. Depending on the type of its values, it is a numeric vector, a character vector or a logical vector.

Create a vector

A vector is created using the function c() (for concatenate), as follows:

# Store your friends' ages in a numeric vector
friend_ages <- c(27, 25, 29, 26) # Create
friend_ages # Print
[1] 27 25 29 26
# Store your friends' names in a character vector
my_friends <- c("Nicolas", "Thierry", "Bernard", "Jerome")
my_friends
[1] "Nicolas" "Thierry" "Bernard" "Jerome" 
# Store your friends' marital status in a logical vector
# Are they married? (yes/no <=> TRUE/FALSE)
are_married <- c(TRUE, FALSE, TRUE, TRUE)

It’s possible to give a name to the elements of a vector using the function names().

# Vector without element names
friend_ages
[1] 27 25 29 26
# Vector with element names
names(friend_ages) <- c("Nicolas", "Thierry", "Bernard", "Jerome")
friend_ages
Nicolas Thierry Bernard  Jerome 
     27      25      29      26 
# You can also create a named vector as follows
friend_ages <- c(Nicolas = 27, Thierry = 25, 
                 Bernard = 29, Jerome = 26)
friend_ages
Nicolas Thierry Bernard  Jerome 
     27      25      29      26 

Note that a vector can only hold elements of the same type. For example, you cannot have a vector that contains both characters and numeric values.
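
If you do mix types inside c(), R does not raise an error. Instead, it silently converts all elements to a common type. The sketch below shows this behavior; the variable names mixed and flags are made up for the example.

```r
# Mixing numbers and text: every element becomes a character string
mixed <- c(27, "Nicolas", 25)
class(mixed)    # "character"
mixed           # "27" "Nicolas" "25"

# Mixing logical and numeric values: TRUE/FALSE become 1/0
flags <- c(TRUE, 3, FALSE)
class(flags)    # "numeric"
flags           # 1 3 0
```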

  • Find the length of a vector (i.e., the number of elements in a vector)
# Number of friends
length(my_friends)
[1] 4

Case of missing values

I know that some of my friends (Nicolas and Thierry) have two children. But this information is not available (NA) for the remaining friends (Bernard and Jerome).

In R, missing values (or missing information) are represented by NA:

have_child <- c(Nicolas = "yes", Thierry = "yes", 
                Bernard = NA, Jerome = NA)
have_child
Nicolas Thierry Bernard  Jerome 
  "yes"   "yes"      NA      NA 

It’s possible to use the function is.na() to check whether a vector contains missing values. The result of is.na() is a logical vector in which the value TRUE indicates that the corresponding element of the input is NA.

# Check if have_child contains missing values
is.na(have_child)
Nicolas Thierry Bernard  Jerome 
  FALSE   FALSE    TRUE    TRUE 

Note that there is a second type of missing value named NaN (“Not a Number”). It is produced when a mathematical operation has no defined result, for example 0/0 = NaN.

Note also that, the function is.na() is TRUE for both NA and NaN values. To differentiate these, the function is.nan() is only TRUE for NaNs.
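
The difference between is.na() and is.nan() can be checked on a small vector. This is a minimal sketch; the vector name values is an assumption made for the example.

```r
# A numeric vector holding a regular value, an NA and a NaN
values <- c(1, NA, 0/0)   # 0/0 produces NaN

na_flags  <- is.na(values)    # FALSE  TRUE  TRUE: TRUE for both NA and NaN
nan_flags <- is.nan(values)   # FALSE FALSE  TRUE: TRUE for NaN only

na_flags
nan_flags
```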

Get a subset of a vector

Subsetting a vector consists of selecting a part of your vector.

  • Selection by positive indexing: select an element of a vector by its position (index) in square brackets
# Select my friend number 2
my_friends[2]
[1] "Thierry"
# Select my friends number 2 and 4 
my_friends[c(2, 4)]
[1] "Thierry" "Jerome" 
# Select my friends number 1 to 3
my_friends[1:3]
[1] "Nicolas" "Thierry" "Bernard"

Note that R indexes from 1, NOT 0. So the first element of a vector is at [1] and not [0].

If you have a named vector, it’s also possible to use the name for selecting an element:

friend_ages["Bernard"]
Bernard 
     29 

  • Selection by negative indexing: Exclude an element
# Exclude my friend number 2
my_friends[-2]
[1] "Nicolas" "Bernard" "Jerome" 
# Exclude my friends number 2 and 4
my_friends[-c(2, 4)]
[1] "Nicolas" "Bernard"
# Exclude my friends number 1 to 3
my_friends[-(1:3)]
[1] "Jerome"
  • Selection by logical vector: only the elements for which the corresponding value in the selecting vector is TRUE are kept in the subset.
# Select only married friends
my_friends[are_married == TRUE]
[1] "Nicolas" "Bernard" "Jerome" 
# Friends with age >=27
my_friends[friend_ages >= 27]
[1] "Nicolas" "Bernard"
# Friends with age different from 27
my_friends[friend_ages != 27]
[1] "Thierry" "Bernard" "Jerome" 

If you want to remove missing data, use this:

# Data with missing values
have_child
Nicolas Thierry Bernard  Jerome 
  "yes"   "yes"      NA      NA 
# Keep only values different from NA (!is.na())
have_child[!is.na(have_child)]
Nicolas Thierry 
  "yes"   "yes" 
# Or, replace NA values by "NO" and then print
have_child[is.na(have_child)] <- "NO"
have_child
Nicolas Thierry Bernard  Jerome 
  "yes"   "yes"    "NO"    "NO" 

Note that, the “logical” comparison operators available in R are:

  • <: for less than
  • >: for greater than
  • <=: for less than or equal to
  • >=: for greater than or equal to
  • ==: for equal to each other
  • !=: not equal to each other

Calculations with vectors

Note that, all the basic arithmetic operators (+, -, *, / and ^ ) as well as the common arithmetic functions (log, exp, sin, cos, tan, sqrt, abs, …), described in the previous sections, can be applied on a numeric vector.

If you perform an operation with vectors, the operation will be applied to each element of the vector. An example is provided below:

# My friends' salaries in dollars
salaries <- c(2000, 1800, 2500, 3000)
names(salaries) <- c("Nicolas", "Thierry", "Bernard", "Jerome")
salaries
Nicolas Thierry Bernard  Jerome 
   2000    1800    2500    3000 
# Multiply salaries by 2
salaries * 2
Nicolas Thierry Bernard  Jerome 
   4000    3600    5000    6000 

As you can see, R multiplies each element in the salaries vector by 2.

Now, suppose that you want to multiply the salaries by different coefficients. The following R code can be used:

# Create a coefs vector with the same length as salaries
coefs <- c(2, 1.5, 1, 3)
# Multiply salaries by coefs
salaries * coefs
Nicolas Thierry Bernard  Jerome 
   4000    2700    2500    9000 

Note that the calculation is done element-wise. The first element of salaries vector is multiplied by the first element of coefs vector, and so on.

Compute the square root of a numeric vector:

my_vector <- c(4, 16, 9)
sqrt(my_vector)
[1] 2 4 3

Other useful functions are:

max(x) # Get the maximum value of x
min(x) # Get the minimum value of x
# Get the range of x. Returns a vector containing
# the minimum and the maximum of x
range(x)
length(x) # Get the number of elements in x
sum(x) # Get the total of the elements in x
prod(x) # Get the product of the elements in x
# The mean value of the elements in x
# sum(x)/length(x)
mean(x)
sd(x) # Standard deviation of x
var(x) # Variance of x
# Sort the elements of x in ascending order
sort(x)

For example, if you want to compute the total sum of salaries, type this:

sum(salaries)
[1] 9300

Compute the mean of salaries:

mean(salaries)
[1] 2325

The range (minimum, maximum) of salaries is:

range(salaries)
[1] 1800 3000
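
As a further illustration, the sketch below sorts the same salaries and looks up who earns the most. The vector is re-created here so that the example is self-contained; the names sorted_up, sorted_down and top_earner are assumptions made for the example.

```r
# Re-create the named salaries vector used in this section
salaries <- c(Nicolas = 2000, Thierry = 1800, Bernard = 2500, Jerome = 3000)

# Sort in ascending order (the default), then in descending order
sorted_up   <- sort(salaries)
sorted_down <- sort(salaries, decreasing = TRUE)

# Name of the friend with the highest salary
top_earner <- names(salaries)[salaries == max(salaries)]

sorted_up
sorted_down
top_earner   # "Jerome"
```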


Matrices

A matrix is like an Excel sheet containing multiple rows and columns. It’s used to combine vectors of the same type, which can be either numeric, character or logical. Matrices are used to store a data table in R. The rows of a matrix are generally individuals/observations and the columns are variables.

Create and name a matrix

To easily create a matrix, use the function cbind() or rbind() as follows:

# Numeric vectors
col1 <- c(5, 6, 7, 8, 9)
col2 <- c(2, 4, 5, 9, 8)
col3 <- c(7, 3, 4, 8, 7)
# Combine the vectors by column
my_data <- cbind(col1, col2, col3)
my_data
     col1 col2 col3
[1,]    5    2    7
[2,]    6    4    3
[3,]    7    5    4
[4,]    8    9    8
[5,]    9    8    7
# Change row names
rownames(my_data) <- c("row1", "row2", "row3", "row4", "row5")
my_data
     col1 col2 col3
row1    5    2    7
row2    6    4    3
row3    7    5    4
row4    8    9    8
row5    9    8    7

  • cbind(): combine R objects by columns
  • rbind(): combine R objects by rows
  • rownames(): retrieve or set row names of a matrix-like object
  • colnames(): retrieve or set column names of a matrix-like object

If you want to transpose your data, use the function t():

t(my_data)
     row1 row2 row3 row4 row5
col1    5    6    7    8    9
col2    2    4    5    9    8
col3    7    3    4    8    7

Note that it’s also possible to construct a matrix using the function matrix().

The simplified format of matrix() is as follows:

matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE,
       dimnames = NULL)

  • data: an optional data vector
  • nrow, ncol: the desired number of rows and columns, respectively.
  • byrow: logical value. If FALSE (the default) the matrix is filled by columns, otherwise the matrix is filled by rows.
  • dimnames: A list of two vectors giving the row and column names respectively.

In the R code below, the input data has length 6. We want to create a matrix with two rows. You don’t need to specify the number of columns (here ncol = 3). R will infer this automatically. By default (byrow = FALSE), the matrix is filled column by column. Here, we use byrow = TRUE to fill the matrix by rows.

mdat <- matrix(
           data = c(1,2,3, 11,12,13), 
           nrow = 2, byrow = TRUE,
           dimnames = list(c("row1", "row2"), c("C.1", "C.2", "C.3"))
           )
mdat
     C.1 C.2 C.3
row1   1   2   3
row2  11  12  13
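
To see the effect of the byrow argument, the sketch below builds two matrices from the same six values, once with the default column-wise filling and once with row-wise filling. The names values, by_col and by_row are assumptions made for the example.

```r
values <- c(1, 2, 3, 11, 12, 13)

# Default: byrow = FALSE, values are placed column by column
by_col <- matrix(values, nrow = 2)
# byrow = TRUE: values are placed row by row
by_row <- matrix(values, nrow = 2, byrow = TRUE)

by_col[1, ]   # first row when filled by columns: 1 3 12
by_row[1, ]   # first row when filled by rows:    1 2 3
dim(by_col)   # 2 rows and 3 columns in both cases
```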

Dimensions of a matrix

The R functions nrow() and ncol() return the number of rows and columns present in the data, respectively.

ncol(my_data) # Number of columns
[1] 3
nrow(my_data) # Number of rows
[1] 5
dim(my_data) # Number of rows and columns
[1] 5 3

Get a subset of a matrix

  • Select rows/columns by positive indexing

Rows and/or columns can be selected as follows: my_data[row, col]

# Select row number 2
my_data[2, ]
col1 col2 col3 
   6    4    3 
# Select row number 2 to 4
my_data[2:4, ]
     col1 col2 col3
row2    6    4    3
row3    7    5    4
row4    8    9    8
# Select multiple rows that aren't contiguous
# e.g.: rows 2 and 4 but not 3
my_data[c(2,4), ]
     col1 col2 col3
row2    6    4    3
row4    8    9    8
# Select column number 3
my_data[, 3]
row1 row2 row3 row4 row5 
   7    3    4    8    7 
# Select the value at row 2 and column  3
my_data[2, 3]
[1] 3
  • Select by row/column names
# Select column 2
my_data[, "col2"]
row1 row2 row3 row4 row5 
   2    4    5    9    8 
# Select by index and names: row 3 and column 2
my_data[3, "col2"]
[1] 5
  • Exclude rows/columns by negative indexing
# Exclude column 1
my_data[, -1]
     col2 col3
row1    2    7
row2    4    3
row3    5    4
row4    9    8
row5    8    7
  • Selection by logical: In the R code below, we want to keep only rows where col3 >=4:
col3 <- my_data[, "col3"]
my_data[col3 >= 4, ]
     col1 col2 col3
row1    5    2    7
row3    7    5    4
row4    8    9    8
row5    9    8    7

Calculations with matrices

  • It’s also possible to perform simple operations on matrices. For example, the following R code multiplies each element of the matrix by 2:
my_data * 2
     col1 col2 col3
row1   10    4   14
row2   12    8    6
row3   14   10    8
row4   16   18   16
row5   18   16   14

Or, compute the log2 values:

log2(my_data)
         col1     col2     col3
row1 2.321928 1.000000 2.807355
row2 2.584963 2.000000 1.584963
row3 2.807355 2.321928 2.000000
row4 3.000000 3.169925 3.000000
row5 3.169925 3.000000 2.807355
  • rowSums() and colSums() functions: Compute the total of each row and the total of each column, respectively.
# Total of each row
rowSums(my_data)
row1 row2 row3 row4 row5 
  14   13   16   25   24 
# Total of each column
colSums(my_data)
col1 col2 col3 
  35   28   29 

If you are interested in row/column means, you can use the function rowMeans() and colMeans() for computing row and column means, respectively.
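
A minimal sketch of these two functions is shown below. The matrix is re-created with the same values as my_data so that the example runs on its own; the names row_avg and col_avg are assumptions made for the example.

```r
# Same values as the my_data matrix used in this section
my_data <- cbind(col1 = c(5, 6, 7, 8, 9),
                 col2 = c(2, 4, 5, 9, 8),
                 col3 = c(7, 3, 4, 8, 7))

row_avg <- rowMeans(my_data)   # one mean per row
col_avg <- colMeans(my_data)   # one mean per column: 7.0, 5.6, 5.8

row_avg
col_avg
```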

Note that it’s also possible to use the function apply() to apply any statistical function to the rows/columns of a matrix.

The simplified format of apply() is as follows:

apply(X, MARGIN, FUN)
  • X: your data matrix
  • MARGIN: possible values are 1 (for rows) and 2 (for columns)
  • FUN: the function to apply on rows/columns

Use apply() as follows:

# Compute row means
apply(my_data, 1, mean)
    row1     row2     row3     row4     row5 
4.666667 4.333333 5.333333 8.333333 8.000000 
# Compute row medians
apply(my_data, 1, median)
row1 row2 row3 row4 row5 
   5    4    5    8    8 
# Compute column means
apply(my_data, 2, mean)
col1 col2 col3 
 7.0  5.6  5.8 
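
The FUN argument is not limited to built-in functions: you can also pass a function that you write yourself. The sketch below, given as an illustration only, computes the spread (maximum minus minimum) of each column. The matrix is re-created with the same values as my_data, and the name col_spread is an assumption made for the example.

```r
# Same values as the my_data matrix used in this section
my_data <- cbind(col1 = c(5, 6, 7, 8, 9),
                 col2 = c(2, 4, 5, 9, 8),
                 col3 = c(7, 3, 4, 8, 7))

# Apply an anonymous function to each column (MARGIN = 2)
col_spread <- apply(my_data, 2, function(v) max(v) - min(v))
col_spread   # col1 = 4, col2 = 7, col3 = 5
```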


Factors

Factor variables represent categories or groups in your data. The function factor() can be used to create a factor variable.

Create a factor

# Create a factor variable
friend_groups <- factor(c(1, 2, 1, 2))
friend_groups
[1] 1 2 1 2
Levels: 1 2

The variable friend_groups contains two categories of friends: 1 and 2. In R terminology, categories are called factor levels.

It’s possible to access the factor levels using the function levels():

# Get group names (or levels)
levels(friend_groups)
[1] "1" "2"
# Change levels
levels(friend_groups) <- c("best_friend", "not_best_friend")
friend_groups
[1] best_friend     not_best_friend best_friend     not_best_friend
Levels: best_friend not_best_friend

Note that R orders factor levels alphabetically by default. If you want a different order, specify the levels argument in the factor() function as follows.

# Change the order of levels
friend_groups <- factor(friend_groups, 
                      levels = c("not_best_friend", "best_friend"))
# Print
friend_groups
[1] best_friend     not_best_friend best_friend     not_best_friend
Levels: not_best_friend best_friend
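
The effect of the levels argument can be checked side by side. The sketch below is self-contained; the names groups, default_order and custom_order are assumptions made for the example.

```r
groups <- c("best_friend", "not_best_friend", "best_friend", "not_best_friend")

# Without the levels argument, levels are sorted alphabetically
default_order <- factor(groups)
levels(default_order)   # "best_friend" "not_best_friend"

# With the levels argument, you choose the order yourself
custom_order <- factor(groups, levels = c("not_best_friend", "best_friend"))
levels(custom_order)    # "not_best_friend" "best_friend"

# The values themselves are unchanged, only the level order differs
as.character(custom_order)
```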

Note that:

  • The function is.factor() can be used to check whether a variable is a factor. Results are TRUE (if factor) or FALSE (if not factor)
  • The function as.factor() can be used to convert a variable to a factor.

# Check if friend_groups is a factor
is.factor(friend_groups)
[1] TRUE
# Check if "are_married" is a factor
is.factor(are_married)
[1] FALSE
# Convert "are_married" to a factor
as.factor(are_married)

Calculations with factors

  • If you want to know the number of individuals in each level, use the function summary():
summary(friend_groups)
not_best_friend     best_friend 
              2               2 
  • In the following example, I want to compute the mean salary of my friends by groups. The function tapply() can be used to apply a function, here mean(), to each group.
# Salaries of my friends
salaries
Nicolas Thierry Bernard  Jerome 
   2000    1800    2500    3000 
# Friend groups
friend_groups
[1] best_friend     not_best_friend best_friend     not_best_friend
Levels: not_best_friend best_friend
# Compute the mean salaries by groups
mean_salaries <- tapply(salaries, friend_groups, mean)
mean_salaries
not_best_friend     best_friend 
           2400            2250 
# Compute the size/length of each group
tapply(salaries, friend_groups, length)
not_best_friend     best_friend 
              2               2 
  • It’s also possible to use the function table() to create a frequency table, also known as a contingency table of the counts at each combination of factor levels.
table(friend_groups)
not_best_friend     best_friend 
              2               2 
# Cross-tabulation between 
# friend_groups and are_married variables
table(friend_groups, are_married)
friend_groups     FALSE TRUE
  not_best_friend     1    1
  best_friend         0    2

Data frames

A data frame is like a matrix but can have columns with different types (numeric, character, logical). Rows are observations (individuals) and columns are variables.

Create a data frame

A data frame can be created using the function data.frame(), as follows:

# Create a data frame
friends_data <- data.frame(
  name = my_friends,
  age = friend_ages,
  height = c(180, 170, 185, 169),
  married = are_married
)
# Print
friends_data
           name age height married
Nicolas Nicolas  27    180    TRUE
Thierry Thierry  25    170   FALSE
Bernard Bernard  29    185    TRUE
Jerome   Jerome  26    169    TRUE

To check whether an object is a data frame, use the is.data.frame() function. It returns TRUE if the object is a data frame:

is.data.frame(friends_data)
[1] TRUE

The object “friends_data” is a data frame, but not the object “my_data”. We can convert it to a data frame using the as.data.frame() function:

# What is the class of my_data? --> matrix
# (R 4.0.0 and later print "matrix" "array")
class(my_data)
[1] "matrix"
# Convert it to a data frame
my_data2 <- as.data.frame(my_data)
# Now, the class is data.frame
class(my_data2)
[1] "data.frame"

As described in the matrix section, you can use the function t() to transpose a data frame:

t(friends_data)


Subset a data frame

To select just certain columns from a data frame, you can either refer to the columns by name or by their location (i.e., column 1, 2, 3, etc.).

  1. Positive indexing by name and by location
# Access the data in the 'name' column
# using the dollar sign
friends_data$name
[1] Nicolas Thierry Bernard Jerome 
Levels: Bernard Jerome Nicolas Thierry
# or use this
friends_data[, 'name']
[1] Nicolas Thierry Bernard Jerome 
Levels: Bernard Jerome Nicolas Thierry
# Subset columns 1 and 3
friends_data[ , c(1, 3)]
           name height
Nicolas Nicolas    180
Thierry Thierry    170
Bernard Bernard    185
Jerome   Jerome    169
  2. Negative indexing
# Exclude column 1
friends_data[, -1]
        age height married
Nicolas  27    180    TRUE
Thierry  25    170   FALSE
Bernard  29    185    TRUE
Jerome   26    169    TRUE
  3. Index by characteristics

We want to select all friends with age >= 27.

# Identify rows that meet the condition
friends_data$age >= 27
[1]  TRUE FALSE  TRUE FALSE

TRUE specifies that the row contains a value of age >= 27.

# Select the rows that meet the condition
friends_data[friends_data$age >= 27, ]
           name age height married
Nicolas Nicolas  27    180    TRUE
Bernard Bernard  29    185    TRUE

The R code above tells R to get all rows from friends_data where age >= 27, and then to return all the columns.

If you don’t want to see all the column data for the selected rows but are just interested in displaying, for example, friend names and age for friends with age >= 27, you could use the following R code:

# Use column locations
friends_data[friends_data$age >= 27,  c(1, 2)]
           name age
Nicolas Nicolas  27
Bernard Bernard  29
# Or use column names
friends_data[friends_data$age >= 27, c("name", "age")]
           name age
Nicolas Nicolas  27
Bernard Bernard  29

If you’re finding that your selection statement is starting to be inconvenient, you can put your row and column selections into variables first, such as:

age27 <- friends_data$age >= 27
cols <- c("name", "age")

Then you can select the rows and columns with those variables:

friends_data[age27, cols]
           name age
Nicolas Nicolas  27
Bernard Bernard  29

It’s also possible to use the function subset() as follows.

# Select friends data with age >= 27
subset(friends_data, age >= 27)
           name age height married
Nicolas Nicolas  27    180    TRUE
Bernard Bernard  29    185    TRUE

Another option is to use the functions attach() and detach(). The function attach() takes a data frame and makes its columns accessible by simply giving their names.

The functions attach() and detach() can be used as follows:

# Attach a data frame
attach(friends_data)
# === Data manipulation ====
friends_data[age>=27, ]
# === End of data manipulation ====
# Detach the data frame
detach(friends_data)

Extend a data frame

Add new column in a data frame

# Add group column to friends_data
friends_data$group <- friend_groups
friends_data
           name age height married           group
Nicolas Nicolas  27    180    TRUE     best_friend
Thierry Thierry  25    170   FALSE not_best_friend
Bernard Bernard  29    185    TRUE     best_friend
Jerome   Jerome  26    169    TRUE not_best_friend

It’s also possible to use the functions cbind() and rbind() to extend a data frame.

cbind(friends_data, group = friend_groups)

Calculations with data frame

With a numeric data frame, you can use the functions rowSums(), colSums(), colMeans(), rowMeans() and apply() as described in the matrix section.
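
As a minimal sketch, the code below applies these functions to a small data frame. The data frames scores and people, and all their values, are hypothetical and only serve this example. When a data frame also contains non-numeric columns, one option is to keep only the numeric columns first.

```r
# Hypothetical all-numeric data frame
scores <- data.frame(math = c(12, 15, 9), physics = c(14, 10, 11))
score_means <- colMeans(scores)   # mean of each column
score_totals <- rowSums(scores)   # total of each row: 26 25 20

# Hypothetical data frame mixing character and numeric columns
people <- data.frame(name = c("A", "B"),
                     age = c(27, 25),
                     height = c(180, 170))
# Identify the numeric columns, then compute their means
numeric_cols <- sapply(people, is.numeric)
people_means <- colMeans(people[, numeric_cols])   # age = 26, height = 175

score_means
score_totals
people_means
```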


Lists

A list is an ordered collection of objects, which can be vectors, matrices, data frames, etc. In other words, a list can contain all kinds of R objects.

Create a list

# Create a list
my_family <- list(
  mother = "Veronique", 
  father = "Michel",
  sisters = c("Alicia", "Monica"),
  sister_age = c(12, 22)
)
# Print
my_family
$mother
[1] "Veronique"

$father
[1] "Michel"

$sisters
[1] "Alicia" "Monica"

$sister_age
[1] 12 22

# Names of elements in the list
names(my_family)
[1] "mother"     "father"     "sisters"    "sister_age"
# Number of elements in the list
length(my_family)
[1] 4

The list object “my_family” contains four components, which may be individually referred to as my_family[[1]], my_family[[2]] and so on.

Subset a list

It’s possible to select an element, from a list, by its name or its index:

  • my_family$mother is the same as my_family[[1]]
  • my_family$father is the same as my_family[[2]]
# Select by name (1/2)
my_family$father
[1] "Michel"
# Select by name (2/2)
my_family[["father"]]
[1] "Michel"
# Select by index
my_family[[1]]
[1] "Veronique"
my_family[[3]]
[1] "Alicia" "Monica"
# Select a specific element of a component
# select the first ([1]) element of my_family[[3]]
my_family[[3]][1]
[1] "Alicia"

Extend a list

Note that, it’s possible to extend an original list.

In the R code below, we want to add the components “grand_father” and “grand_mother” to my_family list object:

# Extend the list
my_family$grand_father <- "John"
my_family$grand_mother <- "Mary"
# Print
my_family
$mother
[1] "Veronique"

$father
[1] "Michel"

$sisters
[1] "Alicia" "Monica"

$sister_age
[1] 12 22

$grand_father
[1] "John"

$grand_mother
[1] "Mary"

You can also concatenate two or more lists as follows:

list_abc <- c(list_a, list_b, list_c)

The result is a list also, whose components are those of the argument lists joined together in sequence.
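
A runnable version of this concatenation might look as follows. The contents of list_a, list_b and list_c are hypothetical; they are defined here only so that the example can be executed as is.

```r
# Three small lists with hypothetical contents
list_a <- list(mother = "Veronique")
list_b <- list(father = "Michel")
list_c <- list(sisters = c("Alicia", "Monica"))

# Concatenate them into a single list
list_abc <- c(list_a, list_b, list_c)

names(list_abc)    # "mother" "father" "sisters"
length(list_abc)   # 3
class(list_abc)    # "list"
```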


This analysis has been performed using R software (ver. 3.2.3).
