ggplot2 barplots : Quick start guide - R software and data visualization
This R tutorial describes how to create a barplot using R and the ggplot2 package.
The function geom_bar() is used throughout.
Basic barplots
Data
The data are derived from the ToothGrowth data set, which describes the effect of vitamin C on tooth growth in guinea pigs.
df <- data.frame(dose=c("D0.5", "D1", "D2"),
len=c(4.2, 10, 29.5))
head(df)
## dose len
## 1 D0.5 4.2
## 2 D1 10.0
## 3 D2 29.5
- len : Tooth length
- dose : Dose in milligrams (0.5, 1, 2)
Create barplots
library(ggplot2)
# Basic barplot
p<-ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity")
p
# Horizontal bar plot
p + coord_flip()
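In ggplot2 2.2.0 and later, geom_col() is a shortcut for geom_bar(stat="identity"). The sketch below assumes such a version is installed; the object name p_col is only illustrative and is not used elsewhere in this tutorial.

```r
library(ggplot2)

# Same data as above
df <- data.frame(dose = c("D0.5", "D1", "D2"),
                 len = c(4.2, 10, 29.5))

# geom_col() takes the bar heights directly from the y values
p_col <- ggplot(data = df, aes(x = dose, y = len)) +
  geom_col()
p_col
```

Both forms should give the same plot, so the choice is mostly a matter of readability.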
Change the width and the color of bars :
# Change the width of bars
ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity", width=0.5)
# Change colors
ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity", color="blue", fill="white")
# Minimal theme + blue fill color
p<-ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity", fill="steelblue")+
theme_minimal()
p
Choose which items to display :
p + scale_x_discrete(limits=c("D0.5", "D2"))
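Setting limits hides the other bars at the scale level, and ggplot2 will typically warn about the rows it removed. An alternative is to subset the data before plotting. This is only a sketch; keep and df_sub are hypothetical names introduced for the example.

```r
library(ggplot2)

# Same data as above
df <- data.frame(dose = c("D0.5", "D1", "D2"),
                 len = c(4.2, 10, 29.5))

# Keep only the items to display
keep <- c("D0.5", "D2")
df_sub <- df[df$dose %in% keep, ]

ggplot(data = df_sub, aes(x = dose, y = len)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  theme_minimal()
```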
Bar plot with labels
# Outside bars
ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity", fill="steelblue")+
geom_text(aes(label=len), vjust=-0.3, size=3.5)+
theme_minimal()
# Inside bars
ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity", fill="steelblue")+
geom_text(aes(label=len), vjust=1.6, color="white", size=3.5)+
theme_minimal()
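The labels above are printed as the values are stored, so 10 appears without a decimal. If a fixed number of decimals is preferred, one option is to format the values inside the label aesthetic with the base R function sprintf(). This is a sketch with one decimal chosen arbitrarily; p_lab is an illustrative name.

```r
library(ggplot2)

# Same data as above
df <- data.frame(dose = c("D0.5", "D1", "D2"),
                 len = c(4.2, 10, 29.5))

# Format each label with exactly one decimal
p_lab <- ggplot(data = df, aes(x = dose, y = len)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_text(aes(label = sprintf("%.1f", len)), vjust = -0.3, size = 3.5) +
  theme_minimal()
p_lab
```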
Barplot of counts
In the R code above, we used the argument stat = "identity" to make barplots, so that the bar heights are taken directly from the y values. In ggplot2 2.0.0 and later, the default value of the argument stat in geom_bar() is "count" (it was "bin" in ggplot2 1.0.0, the version used for this tutorial). In this case, the height of each bar represents the number of cases in each category.
To make a barplot of counts, we will use the mtcars data set :
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
# Don't map a variable to y
# (use stat="bin" instead only with ggplot2 versions before 2.0.0)
ggplot(mtcars, aes(x=factor(cyl)))+
geom_bar(stat="count", width=0.7, fill="steelblue")+
theme_minimal()
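A barplot of counts is equivalent to tabulating the variable first and then plotting the table with stat = "identity". The sketch below illustrates this with base R's table(); cyl_counts is a hypothetical name introduced for the example.

```r
library(ggplot2)

# Count the number of cars for each number of cylinders
cyl_counts <- as.data.frame(table(cyl = mtcars$cyl))
# cyl_counts has two columns: cyl (a factor) and Freq (the counts)

# Plot the precomputed counts as bar heights
ggplot(cyl_counts, aes(x = cyl, y = Freq)) +
  geom_bar(stat = "identity", width = 0.7, fill = "steelblue") +
  theme_minimal()
```

This form can be handy when the counts are already available in a summary table.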
Change barplot colors by groups
Change outline colors
Barplot outline colors can be automatically controlled by the levels of the variable dose :
# Change barplot line colors by groups
p<-ggplot(df, aes(x=dose, y=len, color=dose)) +
geom_bar(stat="identity", fill="white")
p
It is also possible to change barplot line colors manually using the functions :
- scale_color_manual() : to use custom colors
- scale_color_brewer() : to use color palettes from the RColorBrewer package
- scale_color_grey() : to use grey color palettes
# Use custom color palettes
p+scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# Use brewer color palettes
p+scale_color_brewer(palette="Dark2")
# Use grey scale
p + scale_color_grey() + theme_classic()
Read more on ggplot2 colors here : ggplot2 colors
Change fill colors
In the R code below, barplot fill colors are automatically controlled by the levels of dose :
# Change barplot fill colors by groups
p<-ggplot(df, aes(x=dose, y=len, fill=dose)) +
geom_bar(stat="identity")+theme_minimal()
p
It is also possible to change barplot fill colors manually using the functions :
- scale_fill_manual() : to use custom colors
- scale_fill_brewer() : to use color palettes from the RColorBrewer package
- scale_fill_grey() : to use grey color palettes
# Use custom color palettes
p+scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# use brewer color palettes
p+scale_fill_brewer(palette="Dark2")
# Use grey scale
p + scale_fill_grey()
Use black outline color :
ggplot(df, aes(x=dose, y=len, fill=dose))+
geom_bar(stat="identity", color="black")+
scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
theme_minimal()
Change the legend position
# Change bar fill colors to blues
p <- p+scale_fill_brewer(palette="Blues")
p + theme(legend.position="top")
p + theme(legend.position="bottom")
# Remove legend
p + theme(legend.position="none")
The allowed values for the argument legend.position are : "left", "top", "right", "bottom" and "none" (to remove the legend).
Read more on ggplot legend : ggplot2 legend
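Change the order of items
The function scale_x_discrete() can be used to change the order of the bars along the x-axis to "D2", "D0.5", "D1". Note that this reorders the axis only; the order of the legend keys is set by the fill scale.
p + scale_x_discrete(limits=c("D2", "D0.5", "D1"))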
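Another way to control the order is to convert dose to a factor with explicit levels. Both the x-axis and the legend then follow the level order. This is a sketch; df_ord is a hypothetical name introduced for the example.

```r
library(ggplot2)

# Same data as above
df <- data.frame(dose = c("D0.5", "D1", "D2"),
                 len = c(4.2, 10, 29.5))

# Set the display order through the factor levels
df_ord <- df
df_ord$dose <- factor(df_ord$dose, levels = c("D2", "D0.5", "D1"))

ggplot(df_ord, aes(x = dose, y = len, fill = dose)) +
  geom_bar(stat = "identity") +
  scale_fill_brewer(palette = "Blues") +
  theme_minimal()
```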
Barplot with multiple groups
Data
The data are derived from the ToothGrowth data set, which describes the effect of vitamin C on tooth growth in guinea pigs. Three dose levels of vitamin C (0.5, 1, and 2 mg) with each of two delivery methods [orange juice (OJ) or ascorbic acid (VC)] are used :
df2 <- data.frame(supp=rep(c("VC", "OJ"), each=3),
dose=rep(c("D0.5", "D1", "D2"),2),
len=c(6.8, 15, 33, 4.2, 10, 29.5))
head(df2)
## supp dose len
## 1 VC D0.5 6.8
## 2 VC D1 15.0
## 3 VC D2 33.0
## 4 OJ D0.5 4.2
## 5 OJ D1 10.0
## 6 OJ D2 29.5
- len : Tooth length
- dose : Dose in milligrams (0.5, 1, 2)
- supp : Supplement type (VC or OJ)
Create barplots
A stacked barplot is created by default. You can use the function position_dodge() to place the bars side by side instead. The barplot fill color is controlled by the levels of supp :
# Stacked barplot with multiple groups
ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity")
# Use position=position_dodge()
ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", position=position_dodge())
Change the color manually :
# Change the colors manually
p <- ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", color="black", position=position_dodge())+
theme_minimal()
# Use custom colors
p + scale_fill_manual(values=c('#999999','#E69F00'))
# Use brewer color palettes
p + scale_fill_brewer(palette="Blues")
Add labels
Add labels to a dodged barplot :
ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", position=position_dodge())+
geom_text(aes(label=len), vjust=1.6, color="white",
position = position_dodge(0.9), size=3.5)+
scale_fill_brewer(palette="Paired")+
theme_minimal()
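Add labels to a stacked barplot : 3 steps are required
- Sort the data by dose and supp : the package plyr is used
- Calculate the cumulative sum of the variable len for each dose
- Create the plot
The label positions computed this way assume the stacking order of ggplot2 1.0.0, the version used here. Since ggplot2 2.2.0, groups are stacked in the reverse order, so with a recent version the data should be sorted by decreasing supp within each dose, for example with arrange(df2, dose, desc(supp)), before computing the cumulative sum.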
library(plyr)
# Sort by dose and supp
df_sorted <- arrange(df2, dose, supp)
head(df_sorted)
## supp dose len
## 1 OJ D0.5 4.2
## 2 VC D0.5 6.8
## 3 OJ D1 10.0
## 4 VC D1 15.0
## 5 OJ D2 29.5
## 6 VC D2 33.0
# Calculate the cumulative sum of len for each dose
df_cumsum <- ddply(df_sorted, "dose",
transform, label_ypos=cumsum(len))
head(df_cumsum)
## supp dose len label_ypos
## 1 OJ D0.5 4.2 4.2
## 2 VC D0.5 6.8 11.0
## 3 OJ D1 10.0 10.0
## 4 VC D1 15.0 25.0
## 5 OJ D2 29.5 29.5
## 6 VC D2 33.0 62.5
# Create the barplot
ggplot(data=df_cumsum, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity")+
geom_text(aes(y=label_ypos, label=len), vjust=1.6,
color="white", size=3.5)+
scale_fill_brewer(palette="Paired")+
theme_minimal()
If you want to place the labels in the middle of the bars, you have to modify the cumulative sum as follows :
df_cumsum <- ddply(df_sorted, "dose",
transform,
label_ypos=cumsum(len) - 0.5*len)
# Create the barplot
ggplot(data=df_cumsum, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity")+
geom_text(aes(y=label_ypos, label=len), vjust=1.6,
color="white", size=3.5)+
scale_fill_brewer(palette="Paired")+
theme_minimal()
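With ggplot2 2.2.0 and later, the label positions of a stacked barplot can also be computed by ggplot2 itself, using position_stack() in geom_text(). The sketch below assumes such a version is installed; vjust = 0.5 places each label in the middle of its bar segment, and p_mid is an illustrative name.

```r
library(ggplot2)

# Same data as above
df2 <- data.frame(supp = rep(c("VC", "OJ"), each = 3),
                  dose = rep(c("D0.5", "D1", "D2"), 2),
                  len = c(6.8, 15, 33, 4.2, 10, 29.5))

# position_stack() stacks the labels the same way as the bars,
# so no sorting or cumulative sum is needed
p_mid <- ggplot(data = df2, aes(x = dose, y = len, fill = supp)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = len), position = position_stack(vjust = 0.5),
            color = "white", size = 3.5) +
  scale_fill_brewer(palette = "Paired") +
  theme_minimal()
p_mid
```

Because the labels follow whatever stacking order the installed version uses, this approach does not depend on how the data are sorted.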
Barplot with a numeric x-axis
If the variable on the x-axis is numeric, it can be treated either as a continuous variable or as a factor, depending on what you want to show :
# Create some data
df2 <- data.frame(supp=rep(c("VC", "OJ"), each=3),
dose=rep(c("0.5", "1", "2"),2),
len=c(6.8, 15, 33, 4.2, 10, 29.5))
head(df2)
## supp dose len
## 1 VC 0.5 6.8
## 2 VC 1 15.0
## 3 VC 2 33.0
## 4 OJ 0.5 4.2
## 5 OJ 1 10.0
## 6 OJ 2 29.5
# x axis treated as continuous variable
df2$dose <- as.numeric(as.vector(df2$dose))
ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", position=position_dodge())+
scale_fill_brewer(palette="Paired")+
theme_minimal()
# Axis treated as discrete variable
df2$dose<-as.factor(df2$dose)
ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", position=position_dodge())+
scale_fill_brewer(palette="Paired")+
theme_minimal()
Barplot with error bars
The helper function below will be used to calculate the mean and the standard deviation, for the variable of interest, in each group :
#+++++++++++++++++++++++++
# Function to calculate the mean and the standard deviation
# for each group
#+++++++++++++++++++++++++
# data : a data frame
# varname : the name of a column containing the variable
#   to be summarized
# groupnames : vector of column names to be used as
#   grouping variables
data_summary <- function(data, varname, groupnames){
  require(plyr)
  # Mean and standard deviation of column col in data frame x
  summary_func <- function(x, col){
    c(mean = mean(x[[col]], na.rm=TRUE),
      sd = sd(x[[col]], na.rm=TRUE))
  }
  # Apply summary_func to each group
  data_sum<-ddply(data, groupnames, .fun=summary_func,
                  varname)
  # Name the mean column after the summarized variable
  data_sum <- rename(data_sum, c("mean" = varname))
  return(data_sum)
}
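If plyr is not available, a similar summary can be obtained with base R's aggregate(). The function below is only a sketch of that alternative, not the helper used in the rest of this tutorial; the name data_summary_base is hypothetical.

```r
# Base R sketch: mean and standard deviation of varname per group
data_summary_base <- function(data, varname, groupnames) {
  # Build a formula such as len ~ supp + dose
  f <- reformulate(groupnames, response = varname)
  means <- aggregate(f, data = data, FUN = mean, na.rm = TRUE)
  sds <- aggregate(f, data = data, FUN = sd, na.rm = TRUE)
  # Rename the standard deviation column before merging
  names(sds)[names(sds) == varname] <- "sd"
  merge(means, sds, by = groupnames)
}

head(data_summary_base(ToothGrowth, varname = "len",
                       groupnames = c("supp", "dose")))
```

As with data_summary(), the mean column keeps the name of the summarized variable, so the result can be plotted in the same way.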
Summarize the data :
df3 <- data_summary(ToothGrowth, varname="len",
groupnames=c("supp", "dose"))
# Convert dose to a factor variable
df3$dose <- as.factor(df3$dose)
head(df3)
## supp dose len sd
## 1 OJ 0.5 13.23 4.459709
## 2 OJ 1 22.70 3.910953
## 3 OJ 2 26.06 2.655058
## 4 VC 0.5 7.98 2.746634
## 5 VC 1 16.77 2.515309
## 6 VC 2 26.14 4.797731
The function geom_errorbar() can be used to produce a bar graph with error bars :
# Use the standard deviation as error bars
p <- ggplot(df3, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", position=position_dodge()) +
geom_errorbar(aes(ymin=len-sd, ymax=len+sd), width=.2,
position=position_dodge(.9))
p + scale_fill_brewer(palette="Paired") + theme_minimal()
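The error bars above show the standard deviation. If the standard error of the mean is wanted instead, it can be computed as sd / sqrt(n), where n is the number of observations in each group. The sketch below does this with base R's aggregate(); the names tg_mean, tg_sd, tg_n and df_se are hypothetical and introduced only for the example.

```r
library(ggplot2)

# Mean, standard deviation and group size for each supp/dose combination
tg_mean <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
tg_sd <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)
tg_n <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = length)

# Standard error of the mean = sd / sqrt(n)
df_se <- data.frame(supp = tg_mean$supp,
                    dose = as.factor(tg_mean$dose),
                    len = tg_mean$len,
                    se = tg_sd$len / sqrt(tg_n$len))

ggplot(df_se, aes(x = dose, y = len, fill = supp)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  geom_errorbar(aes(ymin = len - se, ymax = len + se), width = .2,
                position = position_dodge(.9)) +
  scale_fill_brewer(palette = "Paired") +
  theme_minimal()
```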
Customized barplots
# Add a title and axis labels
# Change fill colors by groups
p + labs(title="Plot of length per dose",
x="Dose (mg)", y = "Length")+
scale_fill_manual(values=c('black','lightgray'))+
theme_classic()
Change fill colors using brewer palettes :
# Greens
p + scale_fill_brewer(palette="Greens") + theme_minimal()
# Reds
p + scale_fill_brewer(palette="Reds") + theme_minimal()
Infos
This analysis has been performed using R software (ver. 3.1.2) and ggplot2 (ver. 1.0.0)