
Facilitating Exploratory Data Visualization: Application to TCGA Genomic Data

kassambara | 31/08/2017

In genomic fields, it’s very common to explore the gene expression profile of one or a list of genes involved in a pathway of interest. Here, we present some helper functions in the ggpubr R package to facilitate exploratory data analysis (EDA) for life scientists.

Standard graphical techniques used in EDA include:

Box plot
Violin plot
Stripchart
Dot plot
Histogram and density plots
ECDF plot
Q-Q plot

All these plots can be created using the ggplot2 R package, which is highly flexible.

However, the syntax needed to customize a ggplot can appear opaque to beginners, which raises the barrier for researchers without advanced R programming skills.

Here, we present the ggpubr package, a wrapper around ggplot2 that provides easy-to-use functions for creating ggplot2-based, publication-ready plots. We’ll use the ggpubr functions to visualize gene expression profiles from TCGA genomic data sets.

Contents:

Prerequisites
- ggpubr package
- TCGA data
Gene expression data
Box plots
Violin plots
Stripcharts and dot plots
Density plots
Histogram plots
Empirical cumulative distribution function
Quantile - Quantile plot

Prerequisites

ggpubr package

Required R package: ggpubr.

Install from CRAN as follows:

install.packages("ggpubr")

Or, install the latest development version from GitHub as follows:

if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")

Load ggpubr:

library(ggpubr)

TCGA data

The Cancer Genome Atlas (TCGA) is a publicly available resource containing clinical and genomic data across 33 cancer types. These data include gene expression, CNV profiling, SNP genotyping, DNA methylation, miRNA profiling, exome sequencing, and other types of data.

The RTCGA R package, by Marcin Kosinski et al., provides convenient access to the clinical and genomic data available in TCGA. Each data type is distributed as a separate package, which must be installed (once) individually.

The following R code installs the core RTCGA package as well as the clinical and mRNA gene expression data packages.

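# Install the Bioconductor package manager if it is not available yet.
# (The older biocLite() installer is no longer supported by current Bioconductor releases.)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
# Install the main RTCGA package
BiocManager::install("RTCGA")
# Install the clinical and mRNA gene expression data packages
BiocManager::install("RTCGA.clinical")
BiocManager::install("RTCGA.mRNA")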

To see the type of data available for each cancer type, use this:

library(RTCGA)
infoTCGA()

## # A tibble: 38 x 13
##   Cohort    BCR Clinical     CN   LowP Methylation   mRNA mRNASeq    miR
## 1    ACC     92       92     90      0          80      0      79      0
## 2   BLCA    412      412    410    112         412      0     408      0
## 3   BRCA   1098     1097   1089     19        1097    526    1093      0
## 4   CESC    307      307    295     50         307      0     304      0
## 5   CHOL     51       45     36      0          36      0      36      0
## 6   COAD    460      458    451     69         457    153     457      0
## # ... with 32 more rows, and 4 more variables: miRSeq, RPPA,
## #   MAF, rawMAF

More information about the disease names can be found at: http://gdac.broadinstitute.org/

Gene expression data

The R function expressionsTCGA() [in RTCGA package] can be used to easily extract the expression values of genes of interest in one or multiple cancer types.

In the following R code, we start by extracting the mRNA expression for five genes of interest - GATA3, PTEN, XBP1, ESR1 and MUC1 - from 3 different data sets:

Breast invasive carcinoma (BRCA),
Ovarian serous cystadenocarcinoma (OV) and
Lung squamous cell carcinoma (LUSC)

library(RTCGA)
library(RTCGA.mRNA)
expr <- expressionsTCGA(BRCA.mRNA, OV.mRNA, LUSC.mRNA,
                        extract.cols = c("GATA3", "PTEN", "XBP1","ESR1", "MUC1"))
expr

## # A tibble: 1,305 x 7
##            bcr_patient_barcode   dataset GATA3  PTEN  XBP1   ESR1  MUC1
## 1 TCGA-A1-A0SD-01A-11R-A115-07 BRCA.mRNA  2.87 1.361  2.98  3.084  1.65
## 2 TCGA-A1-A0SE-01A-11R-A084-07 BRCA.mRNA  2.17 0.428  2.55  2.386  3.08
## 3 TCGA-A1-A0SH-01A-11R-A084-07 BRCA.mRNA  1.32 1.306  3.02  0.791  2.99
## 4 TCGA-A1-A0SJ-01A-11R-A084-07 BRCA.mRNA  1.84 0.810  3.13  2.495 -1.92
## 5 TCGA-A1-A0SK-01A-12R-A084-07 BRCA.mRNA -6.03 0.251 -1.45 -4.861 -1.17
## 6 TCGA-A1-A0SM-01A-11R-A084-07 BRCA.mRNA  1.80 1.311  4.04  2.797  3.53
## # ... with 1,299 more rows

To display the number of samples in each data set, type this:

nb_samples <- table(expr$dataset)
nb_samples

## 
## BRCA.mRNA LUSC.mRNA   OV.mRNA 
##       590       154       561

We can simplify data set names by removing the “mRNA” tag. This can be done using the R base function gsub().

expr$dataset <- gsub(pattern = ".mRNA", replacement = "", x = expr$dataset, fixed = TRUE)
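
Note that gsub() interprets its pattern as a regular expression by default, so an unescaped dot matches any character. For the data set names used here this makes no difference, but the short sketch below (using made-up strings, not the TCGA data) shows how fixed = TRUE restricts the match to the literal text:

```r
# Made-up names for illustration only
x <- c("BRCA.mRNA", "BRCAxmRNA")

# Default: "." is a regex wildcard, so both strings are shortened
regex_result <- gsub(pattern = ".mRNA", replacement = "", x = x)

# fixed = TRUE: only the literal ".mRNA" suffix is removed
literal_result <- gsub(pattern = ".mRNA", replacement = "", x = x, fixed = TRUE)

regex_result
literal_result
```

Escaping the dot, as in pattern = "\\.mRNA", would have the same effect as fixed = TRUE in this case.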

Let’s also simplify the patient barcode column. The following R code changes the barcodes into BRCA1, BRCA2, …, OV1, OV2, …, etc. The hard-coded ranges (590, 561 and 154) follow the row order of expr: BRCA first, then OV, then LUSC.

expr$bcr_patient_barcode <- paste0(expr$dataset, c(1:590, 1:561, 1:154))
expr

## # A tibble: 1,305 x 7
##   bcr_patient_barcode dataset GATA3  PTEN  XBP1   ESR1  MUC1
## 1               BRCA1    BRCA  2.87 1.361  2.98  3.084  1.65
## 2               BRCA2    BRCA  2.17 0.428  2.55  2.386  3.08
## 3               BRCA3    BRCA  1.32 1.306  3.02  0.791  2.99
## 4               BRCA4    BRCA  1.84 0.810  3.13  2.495 -1.92
## 5               BRCA5    BRCA -6.03 0.251 -1.45 -4.861 -1.17
## 6               BRCA6    BRCA  1.80 1.311  4.04  2.797  3.53
## # ... with 1,299 more rows
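
The hard-coded ranges used above depend on the exact number and order of samples. As a sketch of a more general approach, the base R function ave() can number the samples within each data set; the toy vector below is a hypothetical stand-in for expr$dataset:

```r
# Toy stand-in for expr$dataset (hypothetical values, not the TCGA data)
dataset <- c("BRCA", "BRCA", "OV", "LUSC", "OV", "BRCA")

# Running index within each data set, in order of appearance
idx <- ave(seq_along(dataset), dataset, FUN = seq_along)

# Build the simplified barcodes
barcode <- paste0(dataset, idx)
barcode
```

With the real data, the same idea would read expr$bcr_patient_barcode <- paste0(expr$dataset, ave(seq_along(expr$dataset), expr$dataset, FUN = seq_along)), which works regardless of how the rows are ordered.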

The above (expr) data set has been saved at https://raw.githubusercontent.com/kassambara/data/master/expr_tcga.txt. These data are required to practice the R code provided in this tutorial.

If you experience issues installing the RTCGA packages, you can simply load the data as follows:

expr <- read.delim("https://raw.githubusercontent.com/kassambara/data/master/expr_tcga.txt",
                   stringsAsFactors = FALSE)

Box plots

Create a box plot of a gene expression profile, colored by groups (here data set/cancer type):

library(ggpubr)
# GATA3
ggboxplot(expr, x = "dataset", y = "GATA3",
          title = "GATA3", ylab = "Expression",
          color = "dataset", palette = "jco")
# PTEN
ggboxplot(expr, x = "dataset", y = "PTEN",
          title = "PTEN", ylab = "Expression",
          color = "dataset", palette = "jco")

Note that the argument palette is used to change color palettes. Allowed values include:

"grey" for grey color palettes;
brewer palettes, e.g. "RdBu", "Blues", … (to view all of them, type this in R: RColorBrewer::display.brewer.all());
custom color palettes, e.g. c("blue", "red") or c("#00AFBB", "#E7B800");
scientific journal palettes from the ggsci R package, e.g. "npg", "aaas", "lancet", "jco", "ucscgb", "uchicago", "simpsons" and "rickandmorty".

Instead of repeating the same R code for each gene, you can create a list of plots at once, as follows:

# Create a  list of plots
p <- ggboxplot(expr, x = "dataset", 
               y = c("GATA3", "PTEN", "XBP1"),
               title = c("GATA3", "PTEN", "XBP1"),
               ylab = "Expression", 
               color = "dataset", palette = "jco")
# View GATA3
p$GATA3
# View PTEN
p$PTEN
# View XBP1
p$XBP1

Note that when the argument y contains multiple variables (here multiple gene names), the arguments title, xlab and ylab can also be character vectors of the same length as y.
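
A list of plots like this can also be laid out on a single page with ggarrange() from ggpubr. The following is a minimal sketch on a toy data frame (random values standing in for expr, not TCGA data); the object names toy, plots and figure are chosen here for illustration:

```r
library(ggpubr)

# Hypothetical toy data standing in for `expr` (random values)
set.seed(123)
toy <- data.frame(
  dataset = rep(c("BRCA", "OV", "LUSC"), each = 20),
  GATA3 = rnorm(60), PTEN = rnorm(60), XBP1 = rnorm(60)
)

# One box plot per gene, returned as a list
plots <- ggboxplot(toy, x = "dataset",
                   y = c("GATA3", "PTEN", "XBP1"),
                   title = c("GATA3", "PTEN", "XBP1"),
                   ylab = "Expression",
                   color = "dataset", palette = "jco")

# Arrange the three plots in one row with a shared legend
figure <- ggarrange(plotlist = plots, ncol = 3, nrow = 1,
                    common.legend = TRUE)
figure
```

Because each plot keeps its own axes, this layout can be convenient when the variables are on different scales.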

To add p-values and significance levels to the boxplots, read our previous article: Add P-values and Significance Levels to ggplots. Briefly, you can do this:

my_comparisons <- list(c("BRCA", "OV"), c("OV", "LUSC"))
ggboxplot(expr, x = "dataset", y = "GATA3",
          title = "GATA3", ylab = "Expression",
          color = "dataset", palette = "jco")+
  stat_compare_means(comparisons = my_comparisons)
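
The test and the label shown on the brackets can be changed through the method and label arguments of stat_compare_means(). The sketch below uses a toy data frame (random values, not TCGA data) and switches to t-tests with significance stars:

```r
library(ggpubr)

# Hypothetical toy data: three groups with different means
set.seed(123)
toy <- data.frame(
  dataset = rep(c("BRCA", "OV", "LUSC"), each = 20),
  GATA3 = c(rnorm(20, mean = 2), rnorm(20, mean = 0), rnorm(20, mean = 0.5))
)

my_comparisons <- list(c("BRCA", "OV"), c("OV", "LUSC"))

p <- ggboxplot(toy, x = "dataset", y = "GATA3",
               color = "dataset", palette = "jco") +
  stat_compare_means(comparisons = my_comparisons,
                     method = "t.test",    # use t-tests for the pairwise comparisons
                     label = "p.signif")   # show significance stars instead of p-values
p
```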

For each of the genes, you can compare the different groups as follows:

compare_means(c(GATA3, PTEN, XBP1) ~ dataset, data = expr)

## # A tibble: 9 x 8
##      .y. group1 group2         p     p.adj p.format p.signif   method
## 1  GATA3   BRCA     OV 1.11e-177 3.34e-177  < 2e-16     **** Wilcoxon
## 2  GATA3   BRCA   LUSC  6.68e-73  1.34e-72  < 2e-16     **** Wilcoxon
## 3  GATA3     OV   LUSC  2.97e-08  2.97e-08  3.0e-08     **** Wilcoxon
## 4   PTEN   BRCA     OV  6.79e-05  6.79e-05  6.8e-05     **** Wilcoxon
## 5   PTEN   BRCA   LUSC  1.04e-16  3.13e-16  < 2e-16     **** Wilcoxon
## 6   PTEN     OV   LUSC  1.28e-07  2.56e-07  1.3e-07     **** Wilcoxon
## # ... with 3 more rows
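
compare_means() also accepts a method and a p.adjust.method argument. As a hedged sketch on toy data (random values, not TCGA data), the following runs pairwise t-tests with a Bonferroni correction:

```r
library(ggpubr)

# Hypothetical toy data: three groups with different means
set.seed(123)
toy <- data.frame(
  dataset = rep(c("BRCA", "OV", "LUSC"), each = 20),
  GATA3 = c(rnorm(20, mean = 2), rnorm(20, mean = 0), rnorm(20, mean = 0.5))
)

# All pairwise comparisons between the three groups
res <- compare_means(GATA3 ~ dataset, data = toy,
                     method = "t.test",
                     p.adjust.method = "bonferroni")
res
```

The result has one row per pair of groups, with the same columns as the table shown above.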

If you want to select items (here cancer types) to display or to remove a particular item from the plot, use the argument select or remove, as follows:

# Select BRCA and OV cancer types
ggboxplot(expr, x = "dataset", y = "GATA3",
          title = "GATA3", ylab = "Expression",
          color = "dataset", palette = "jco",
          select = c("BRCA", "OV"))
# or remove BRCA
ggboxplot(expr, x = "dataset", y = "GATA3",
          title = "GATA3", ylab = "Expression",
          color = "dataset", palette = "jco",
          remove = "BRCA")

To change the order of the data sets on the x axis, use the argument order. For example, order = c("LUSC", "OV", "BRCA"):

# Order data sets
ggboxplot(expr, x = "dataset", y = "GATA3",
          title = "GATA3", ylab = "Expression",
          color = "dataset", palette = "jco",
          order = c("LUSC", "OV", "BRCA"))
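
Another option is to set the order in the data itself by converting the grouping column to a factor with explicit levels; ggplot2-based plots then follow the level order. A minimal sketch on a toy data frame (hypothetical values):

```r
# Toy stand-in for the `dataset` column of expr
toy <- data.frame(dataset = c("BRCA", "OV", "LUSC", "OV"),
                  stringsAsFactors = FALSE)

# Fix the display order: LUSC, then OV, then BRCA
toy$dataset <- factor(toy$dataset, levels = c("LUSC", "OV", "BRCA"))

levels(toy$dataset)
```

The advantage of this approach is that the order is set once and applies to every subsequent plot of the same data.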

To create horizontal plots, use the argument rotate = TRUE:

ggboxplot(expr, x = "dataset", y = "GATA3",
          title = "GATA3", ylab = "Expression",
          color = "dataset", palette = "jco",
          rotate = TRUE)

To combine the three gene expression plots into a multi-panel plot, use the argument combine = TRUE:

ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          ylab = "Expression",
          color = "dataset", palette = "jco")

You can also merge the 3 plots using the argument merge = TRUE or merge = “asis”:

ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          merge = TRUE,
          ylab = "Expression", 
          palette = "jco")

In the plot above, it’s easy to visually compare the expression levels of the different genes within each cancer type.

But you might want to put the genes (y variables) on the x axis, in order to compare the expression levels across the different cancer types.

In this situation, the y variables (i.e.: genes) become x tick labels and the x variable (i.e.: dataset) becomes the grouping variable. To do this, use the argument merge = “flip”.

ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          merge = "flip",
          ylab = "Expression", 
          palette = "jco")

You might want to add jittered points to the box plot. Each point corresponds to an individual observation. To add jittered points, use the argument add = "jitter" as shown below. To customize the added elements, specify the argument add.params.

ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          color = "dataset", palette = "jco",
          ylab = "Expression", 
          add = "jitter",                              # Add jittered points
          add.params = list(size = 0.1, jitter = 0.2)  # Point size and the amount of jittering
          )

Note that when using ggboxplot(), sensible values for the argument add are "jitter" and "dotplot". If you use add = "dotplot", you can adjust dotsize and binwidth when the dot plot is very dense.

You can add and adjust a dotplot as follows:

ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          color = "dataset", palette = "jco",
          ylab = "Expression", 
          add = "dotplot",                              # Add dotplot
          add.params = list(binwidth = 0.1, dotsize = 0.3)
          )

You might want to label the boxplot by showing the names of samples with the top n highest or lowest values. In this case, you can use the following arguments:

label: the name of the column containing point labels.
label.select: can be of two formats:
- a character vector specifying some labels to show.
- a list containing one or the combination of the following components:
  - top.up and top.down: to display the labels of the top up/down points. For example, label.select = list(top.up = 10, top.down = 4).
  - criteria: to filter, for example, by x and y variable values, use this: label.select = list(criteria = "`y` > 3.9 & `y` < 5 & `x` %in% c('BRCA', 'OV')").

For example:

ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          color = "dataset", palette = "jco",
          ylab = "Expression", 
          add = "jitter",                               # Add jittered points
          add.params = list(size = 0.1, jitter = 0.2),  # Point size and the amount of jittering
          label = "bcr_patient_barcode",                # column containing point labels
          label.select = list(top.up = 2, top.down = 2),# Select some labels to display
          font.label = list(size = 9, face = "italic"), # label font
          repel = TRUE                                  # Avoid label text overplotting
          )

More complex labeling criteria can be specified as follows:

label.select.criteria <- list(criteria = "`y` > 3.9 & `x` %in% c('BRCA', 'OV')")
ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          color = "dataset", palette = "jco",
          ylab = "Expression", 
          label = "bcr_patient_barcode",              # column containing point labels
          label.select = label.select.criteria,       # Select some labels to display
          font.label = list(size = 9, face = "italic"), # label font
          repel = TRUE                                # Avoid label text overplotting
          )
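
To preview which samples a criteria string like the one above would pick out, the same condition can be evaluated directly on the data. This is only a sketch: it assumes that `x` stands for the dataset column and `y` for the plotted gene, and it uses a small toy data frame with made-up values:

```r
# Toy stand-in for expr (made-up values)
toy <- data.frame(
  bcr_patient_barcode = c("BRCA1", "BRCA2", "OV1", "OV2", "LUSC1"),
  dataset = c("BRCA", "BRCA", "OV", "OV", "LUSC"),
  GATA3 = c(4.2, 1.1, 4.0, 3.5, 4.5),
  stringsAsFactors = FALSE
)

# Same condition as the criteria string, written for one gene (GATA3)
selected <- subset(toy, GATA3 > 3.9 & dataset %in% c("BRCA", "OV"))
selected$bcr_patient_barcode
```

Here LUSC1 is excluded despite its high value, because its data set is not in the allowed list.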

Other types of plots, with the same arguments as the function ggboxplot(), are available, such as stripchart and violin plots.

Violin plots

The following R code draws violin plots with box plots inside:

ggviolin(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE, 
          color = "dataset", palette = "jco",
          ylab = "Expression", 
          add = "boxplot")

Instead of adding a box plot inside the violin plot, you can add the median + interquartile range as follows:

ggviolin(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE, 
          color = "dataset", palette = "jco",
          ylab = "Expression", 
          add = "median_iqr")

When using the function ggviolin(), sensible values for the argument add include: “mean”, “mean_se”, “mean_sd”, “mean_ci”, “mean_range”, “median”, “median_iqr”, “median_mad”, “median_range”.
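
To see the numbers behind a summary such as "median_iqr", the statistics can be computed by hand in base R. The sketch below assumes that "median_iqr" marks the median with bars at the first and third quartiles, and uses toy data (random values, not TCGA data):

```r
# Hypothetical toy data standing in for expr
set.seed(123)
toy <- data.frame(
  dataset = rep(c("BRCA", "OV", "LUSC"), each = 20),
  GATA3 = rnorm(60)
)

# Median and quartiles of GATA3 within each data set
stats <- sapply(split(toy$GATA3, toy$dataset), function(v) {
  c(lower  = unname(quantile(v, 0.25)),
    median = median(v),
    upper  = unname(quantile(v, 0.75)))
})

# One row per data set
t(stats)
```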

You can also add “jitter” points and “dotplot” inside the violin plot as described previously in the box plot section.

Stripcharts and dot plots

To draw a stripchart, type this:

ggstripchart(expr, x = "dataset",
             y = c("GATA3", "PTEN", "XBP1"),
             combine = TRUE, 
             color = "dataset", palette = "jco",
             size = 0.1, jitter = 0.2,
             ylab = "Expression", 
             add = "median_iqr",
             add.params = list(color = "gray"))

For a dot plot, use this:

ggdotplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE, 
          color = "dataset", palette = "jco",
          fill = "white",
          binwidth = 0.1,
          ylab = "Expression", 
          add = "median_iqr",
          add.params = list(size = 0.9))

Density plots

To visualize the distribution as a density plot, use the function ggdensity() as follows:

# Basic density plot
ggdensity(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..density..",
       combine = TRUE,                  # Combine the 3 plots
       xlab = "Expression", 
       add = "median",                  # Add median line. 
       rug = TRUE                       # Add marginal rug
)

# Change color and fill by dataset
ggdensity(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..density..",
       combine = TRUE,                  # Combine the 3 plots
       xlab = "Expression", 
       add = "median",                  # Add median line. 
       rug = TRUE,                      # Add marginal rug
       color = "dataset", 
       fill = "dataset",
       palette = "jco"
)

# Merge the 3 plots
# and use y = "..count.." instead of "..density.."
ggdensity(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..count..",
       merge = TRUE,                    # Merge the 3 plots
       xlab = "Expression", 
       add = "median",                  # Add median line. 
       rug = TRUE ,                     # Add marginal rug
       palette = "jco"                  # Change color palette
)

# color and fill by x variables
ggdensity(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..count..",
       color = ".x.", fill = ".x.",     # color and fill by x variables
       merge = TRUE,                    # Merge the 3 plots
       xlab = "Expression", 
       add = "median",                  # Add median line. 
       rug = TRUE ,                     # Add marginal rug
       palette = "jco"                  # Change color palette
)

# Facet by "dataset"
ggdensity(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..count..",
       color = ".x.", fill = ".x.", 
       facet.by = "dataset",            # Split by "dataset" into multi-panel
       merge = TRUE,                    # Merge the 3 plots
       xlab = "Expression", 
       add = "median",                  # Add median line. 
       rug = TRUE ,                     # Add marginal rug
       palette = "jco"                  # Change color palette
)
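
The two y scales used above differ only by a constant factor: in ggplot2, the count scale is the density multiplied by the number of observations. A base R sketch on random toy values illustrates the relationship:

```r
# Toy values (random, for illustration only)
set.seed(123)
x <- rnorm(200)

# Kernel density estimate
d <- density(x)

# On the density scale, the area under the curve is approximately 1
# (trapezoidal rule over the evaluation grid)
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

# On the count scale, the curve is rescaled by the number of observations
count_scale <- d$y * length(x)

area
max(count_scale) / max(d$y)
```

The density scale is therefore better suited to comparing groups of different sizes, while the count scale reflects how many samples each group contributes.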

Histogram plots

To visualize the distribution as a histogram plot, use the function gghistogram() as follows:

# Basic histogram plot 
gghistogram(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..density..",
       combine = TRUE,                  # Combine the 3 plots
       xlab = "Expression", 
       add = "median",                  # Add median line. 
       rug = TRUE                       # Add marginal rug
)

# Change color and fill by dataset
gghistogram(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..density..",
       combine = TRUE,                  # Combine the 3 plots
       xlab = "Expression", 
       add = "median",                  # Add median line. 
       rug = TRUE,                      # Add marginal rug
       color = "dataset", 
       fill = "dataset",
       palette = "jco"
)

# Merge the 3 plots
# and use y = "..count.." instead of "..density.."
gghistogram(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..count..",
       merge = TRUE,                    # Merge the 3 plots
       xlab = "Expression", 
       add = "median",                  # Add median line. 
       rug = TRUE ,                     # Add marginal rug
       palette = "jco"                  # Change color palette
)

# color and fill by x variables
gghistogram(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..count..",
       color = ".x.", fill = ".x.",     # color and fill by x variables
       merge = TRUE,                    # Merge the 3 plots
       xlab = "Expression", 
       add = "median",                  # Add median line. 
       rug = TRUE ,                     # Add marginal rug
       palette = "jco"                  # Change color palette
)

# Facet by "dataset"
gghistogram(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..count..",
       color = ".x.", fill = ".x.", 
       facet.by = "dataset",            # Split by "dataset" into multi-panel
       merge = TRUE,                    # Merge the 3 plots
       xlab = "Expression", 
       add = "median",                  # Add median line. 
       rug = TRUE ,                     # Add marginal rug
       palette = "jco"                  # Change color palette
)

Empirical cumulative distribution function

# Basic ECDF plot 
ggecdf(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       combine = TRUE,                 
       xlab = "Expression", ylab = "F(expression)"
)

# Change color  by dataset
ggecdf(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       combine = TRUE,                 
       xlab = "Expression", ylab = "F(expression)",
       color = "dataset", palette = "jco"
)

# Merge the 3 plots and color by x variables
ggecdf(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       merge = TRUE,                 
       xlab = "Expression", ylab = "F(expression)",
       color = ".x.", palette = "jco"
)

# Merge the 3 plots and color by x variables
# facet by "dataset" into multi-panel
ggecdf(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       merge = TRUE,                 
       xlab = "Expression", ylab = "F(expression)",
       color = ".x.", palette = "jco",
       facet.by = "dataset"
)
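
As a reminder of what these curves show: the ECDF value F(x) is the proportion of observations less than or equal to x. A small base R sketch with made-up values:

```r
# Made-up expression values
x <- c(1.2, 0.4, 2.5, 0.4, 3.1)

# ecdf() returns a function that can be evaluated at any point
Fn <- ecdf(x)

f_low  <- Fn(0.4)   # 2 of 5 values are <= 0.4
f_mid  <- Fn(2.5)   # 4 of 5 values are <= 2.5
f_high <- Fn(5)     # all values are <= 5

c(f_low, f_mid, f_high)
```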

Quantile - Quantile plot

# Basic Q-Q plot
ggqqplot(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       combine = TRUE, size = 0.5
)

# Change color  by dataset
ggqqplot(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       combine = TRUE, color = "dataset", palette = "jco",
       size = 0.5
)

# Merge the 3 plots and color by x variables
ggqqplot(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       merge = TRUE,  
       color = ".x.", palette = "jco"
)

# Merge the 3 plots and color by x variables
# facet by "dataset" into multi-panel
ggqqplot(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       merge = TRUE, size = 0.5,
       color = ".x.", palette = "jco",
       facet.by = "dataset"
)
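
A Q-Q plot compares the sample quantiles with the quantiles expected under a theoretical distribution, here assumed to be the normal distribution. The underlying coordinates can be obtained in base R with qqnorm(); the sketch below uses random toy values:

```r
# Toy values drawn from a normal distribution (illustration only)
set.seed(123)
x <- rnorm(100, mean = 2, sd = 1.5)

# Theoretical (x) and sample (y) quantiles, without drawing the plot
qq <- qqnorm(x, plot.it = FALSE)

# For roughly normal data the points lie close to a straight line,
# so the correlation between the two sets of quantiles is close to 1
r <- cor(qq$x, qq$y)
r
```

Strong curvature or heavy tails in the plot would lower this correlation and suggest a departure from normality.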

Last update : 31/08/2017

Comments


Comment

Bennet Anry

Member

#854 05/06/2020 at 14h17

Thanks

Comment

RadekJ

Member

#831 07/18/2019 at 15h08

Just for the record, I figured out that to allow each facet to have its own y axis with a different scale, one can set scale="free" to the ggboxplot(..., combine=T), however axes of individual facets will not be labelled differently. So it is better to produce a list of plots with combine=F, and then use ggarrange(plotlist=your.list.of.plots)

Comment

RadekJ

Member

#827 07/18/2019 at 11h29

Hello, first thanks for very useful resource.
I have a simple question. I want to facet grouped boxplots, but each facet needs to have a different y scale (because it is different variable with different units for example, not like gene expression where all variables have same units). How can I achieve that? I thought that similarly to having a vector with multiple name for the ylab, I could supply multiple ylim parameters, but this is not possible.

Thanks

Comment

Gilles

Visitor

#629 10/15/2018 at 13h04

Hi, first I thank you so much for this very usefull package.
I don't understand why the lines go beyond the highest values, and the same with add="median_iqr".

mesure<-c(1,3,5,1,2,2,1,5,8)
modalité<-as.vector(c(rep("A",3), rep("B",3), rep("C",3)))
ABC<-data.frame(mesure, modalité)

library(ggpubr)
ggstripchart(ABC, x="modalité", y ="mesure",
add="median_range", color="modalité",
title="ggstripchart ABC add='median_range'")

Comment

kassambara

Administrator

#501 05/26/2018 at 10h45

You need to read the documentation of the RTCGA package

Comment

Visitor

#500 05/25/2018 at 14h37

Hi,

is it possible to do this with other cancer types as well? In the list it says there are counts for all cancer types but I cannot extract the expression from other cancer types apart from the ones you used here.

Best
Caro

Comment

kassambara

Administrator

#470 05/19/2018 at 11h48

You can change ylab as follows:

Code R :

 
library(ggpubr)
expr <- read.delim("https://raw.githubusercontent.com/kassambara/data/master/expr_tcga.txt",stringsAsFactors = FALSE)
ggscatter(expr, x = "GATA3", 
          y = c("PTEN", "XBP1"),
          ylab = c("PTEN.New.Name", "XBP1.New.Name"),
          color = "dataset", palette = "jco")

To easily change graphical parameters, read this: http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/82-ggplot2-easy-way-to-change-graphical-parameters/

Comment

Visitor

#443 04/20/2018 at 00h29

Hi!
If I have multiple variables in y leading to a few panels (scatterplots):
ggscatter(dat2, x= "X", y=c("Y1", "Y2"),add="reg.line", conf.int=T, cor.coef=F, combine=T, color="black", fill="#00BFC4", palette="blues", size=4, shape=21)
How can I change the names of Y1 and Y2 and the font size?

Thanks!
Louis

Comment

Ahsan

Member

#350 01/13/2018 at 20h30

Wow !!
Thats so cool !!
Thanks a lot for the nice solution.

Comment

kassambara

Administrator

#348 01/12/2018 at 23h19

I would proceed as follows:

Code R :

 
library(readr)
df <- read_csv(file.choose())
# Change the order of groups
df$cell <- factor(df$cell, levels = c("wt", "mut_1", "mut_2"))
df$time <- factor(df$time, levels = c("0", "4", "8"))
 
# Create the box plot
library(ggpubr)
ggboxplot(
  df, x = "cell", y = "foci",
  color = "cell", palette = "npg",
  ylab = "Number of foci per nucleus", xlab = "Genotype",
  add="jitter", add.params = list(size = 1, jitter = 0.2),
  ggtheme = theme_bw(), ylim = c(0, 45)
)+
  facet_wrap(~time) +
  stat_compare_means(comparisons = list(c("wt", "mut_1"), c("mut_2", "wt")))

The output looks like this:

Comment

Ahsan

Member

#339 12/27/2017 at 15h31

Thanks for the prompt reply.
I don't know how to reply my previous comment.
My working data structure is little bit different than this example, that's why I still facing some problem.

This is the link to my data , data structure is like below:
> head(df)
cell time foci
1 wt 0 6
2 wt 0 7
3 wt 0 8
4 wt 0 6
5 wt 0 6
6 wt 0

I run the following code :

ggboxplot(df, x = "time" ,
y = "foci",
color = "cell",
palette = "npg",
ylab = "Number of foci per nucleus",
xlab = "Time of drug exposure",
add="jitter", add.params = list(size = 0.1, jitter = 0.2)
)

for the first part I get a graph, like below

1) But I want the order like wt,mut_1,mut_2 or in another case I want to change the time order 4, 0 , 8 and cell order mut_1 , wt and mut_2.
For that what do I have to write?

2) when I add the following segment
+
stat_compare_means(comparisons = list(c("wt", "mut_1"), c("mut_2", "wt"))
)
It shows me error:

Warning message:
Computation failed in `stat_signif()`:
missing value where TRUE/FALSE needed

Actually I wanted to make p value comparison (and t test also) between wt at 0 hr vs wt at 4 hr, mut_1 at 4 hr vs mut_1 at 8 hr and mut_1 at 0 hr vs mut_2 at 8 hr
How to do that?

Sorry for such long post.

Comment

kassambara

Administrator

#338 12/27/2017 at 09h34

1) Reorder the data set. New order: LUSC OV BRCA.

Code R :

 
# Reorder the data set
expr$dataset <- factor(
  expr$dataset, levels = c("LUSC", "OV", "BRCA")
  )
 
# Create the box plot
ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          merge = "flip",
          ylab = "Expression",
          palette = "jco")

2)

- You want to compare the different data set for each gene: the plot should be reorganized so that you have the data set name on the x-axis
- You still want to combine multiple genes ==> use combine = TRUE instead of merge = TRUE

Code R :

 
ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          color = "dataset", palette = "jco",
          ylab = "Expression"
          )+
  stat_compare_means(
    comparisons = list(c("BRCA", "OV"), c("OV", "LUSC"))
  )

Comment

Ahsan

Visitor

#336 12/27/2017 at 06h37

Dear Dr. Alboukadel Kassambara
Thanks a lot for the nice blogs to teach people data visualization.
For a newbie like me this is a huge treasure.

for the code below
ggboxplot(expr, x = "dataset",
y = c("GATA3", "PTEN", "XBP1"),
merge = "flip",
ylab = "Expression",
palette = "jco")
the image is like below

1. If I want to swap the position of BRCA OV LUSC what I have to add ?
for example I want LUSC OV BRCA order or LUSC BRCA OV this order.
2. For above case , may be I only want to cpmpare p Value between only GATA3-BRCA to PTEN-OV , PTEN-LUSC to XBP1-LUSC and PTEN-BRCA to PTEN-OV,
What will be the code or how to modify the following code ?

my_comparisons <- list(c("BRCA", "OV"), c("OV", "LUSC"))
ggboxplot(expr, x = "dataset", y = "GATA3",
title = "GATA3", ylab = "Expression",
color = "dataset", palette = "jco")+
stat_compare_means(comparisons = my_comparisons)