Add Text Labels to Histogram and Density Plots

kassambara | 02/09/2017 | ggpubr: Publication Ready Plots

In this article, we’ll explain how to create histograms/density plots with text labels using the ggpubr package.

I used this type of plot in my recent scientific publication, “Global miRNA expression analysis identifies novel key regulators of plasma cell differentiation and malignant plasma cell”, in the journal Nucleic Acids Research, where I wanted to visualize the distribution of the citation index of some key genes (Figure 4A, A. Kassambara et al., NAR 2017). The plot was generated using the ggpubr package.

In the examples presented here, we’ll use the demo data set gene_citation [in ggpubr]. It contains the mean citation index of 66 genes, obtained by searching PubMed abstracts and annotations with two keyword queries: i) gene name + b cell differentiation and ii) gene name + plasma cell differentiation. The citation index of each gene is the average number of citations returned by the two queries. Only genes with a mean citation index >= 3 are kept in the data.
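The computation described above can be sketched in base R. This is only an illustration: the gene names, the column names (b_cell_hits, plasma_cell_hits) and the hit counts are hypothetical and are not taken from the gene_citation data set.

```r
# Hypothetical PubMed hit counts for the two keyword queries.
# Values and names are illustrative only, not from gene_citation.
hits <- data.frame(
  gene = c("GENE_A", "GENE_B", "GENE_C"),
  b_cell_hits = c(10, 1, 4),
  plasma_cell_hits = c(6, 2, 2),
  stringsAsFactors = FALSE
)

# Citation index = average number of citations over the two queries
hits$citation_index <- (hits$b_cell_hits + hits$plasma_cell_hits) / 2

# Keep only genes with a mean citation index >= 3
kept <- hits[hits$citation_index >= 3, c("gene", "citation_index")]
print(kept)
```

Here GENE_B is dropped because its mean citation index (1.5) is below the threshold of 3.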

Bar plot of the 20 most cited genes, sorted by citation index in descending order:

library(ggpubr)
# Load data
data(gene_citation)
head(gene_citation)

##      gene citation_index
## 2   CASP3           68.0
## 4    CDK6           10.5
## 7   CCND2           10.0
## 8     SCD            8.5
## 10 SLAMF6            4.5
## 11 BCL2L1           56.5

ggbarplot(gene_citation, x = "gene", y = "citation_index",
          fill = "lightgray", 
          xlab = "Gene name", ylab = "Citation index",
          sort.val = "desc", # Sort in descending order
          top = 20,          # Select the 20 most cited genes
          x.text.angle = 45  # x axis text rotation angle
          )
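To make the effect of sort.val = "desc" and top more concrete, here is a rough base R approximation applied to the six rows printed by head(gene_citation). It is a sketch of the idea (order, then keep the first rows), not ggpubr's internal code, and it keeps 3 rows instead of 20 because only six rows are available here.

```r
# The six rows shown by head(gene_citation) above
gc_head <- data.frame(
  gene = c("CASP3", "CDK6", "CCND2", "SCD", "SLAMF6", "BCL2L1"),
  citation_index = c(68.0, 10.5, 10.0, 8.5, 4.5, 56.5),
  stringsAsFactors = FALSE
)

# Sort by citation index in descending order (cf. sort.val = "desc")
sorted <- gc_head[order(gc_head$citation_index, decreasing = TRUE), ]

# Keep the first n rows (cf. top = 20; n = 3 for this small example)
top_genes <- head(sorted, 3)
print(top_genes)
```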

The plot below shows the distribution of the citation index. Some key genes known to be involved in plasma cell differentiation are highlighted.

# Some key genes of interest to be highlighted
key.gns <- c("MYC", "PRDM1", "CD69", "IRF4", "CASP3",
             "BCL2L1", "MYB",  "BACH2", "BIM1",  "PTEN",
             "KRAS", "FOXP1", "IGF1R", "KLF4", "CDK6", "CCND2",
             "IGF1", "TNFAIP3", "SMAD3", "SMAD7",
             "BMPR2", "RB1", "IGF2R", "ARNT")
        
# Histogram distribution
gghistogram(gene_citation, x = "citation_index", y = "..count..",
            xlab = "Number of citations",
            ylab = "Number of genes",
            binwidth = 5, 
            fill = "lightgray", color = "black",
            label = "gene", label.select = key.gns, repel = TRUE,
            font.label = list(color = "citation_index"),
            xticks.by = 20, # Break x ticks by 20
            gradient.cols = c("blue", "red"),
            legend = c(0.7, 0.6),                                 
            legend.title = ""       # Hide legend title
            )
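The two ideas behind the histogram call, binning with binwidth = 5 and labelling only the genes listed in label.select, can be illustrated in base R on the six rows printed by head(gene_citation). This is a sketch under assumptions: the bin edges (0, 5, 10, ...) are chosen for illustration and may not match the bin placement ggplot2 uses by default, and only a subset of key.gns is used.

```r
# The six rows shown by head(gene_citation) above
gc_head <- data.frame(
  gene = c("CASP3", "CDK6", "CCND2", "SCD", "SLAMF6", "BCL2L1"),
  citation_index = c(68.0, 10.5, 10.0, 8.5, 4.5, 56.5),
  stringsAsFactors = FALSE
)

# A subset of the key genes defined in the article
key.gns <- c("CASP3", "BCL2L1", "CDK6", "CCND2")

# Bins of width 5 (cf. binwidth = 5); edges are an illustrative choice
breaks <- seq(0, 70, by = 5)
gc_head$bin <- cut(gc_head$citation_index, breaks = breaks,
                   include.lowest = TRUE)

# Number of genes per bin, i.e. the bar heights
bin_counts <- table(gc_head$bin)
print(bin_counts[bin_counts > 0])

# Only genes listed in key.gns would receive a label (cf. label.select)
labelled <- gc_head[gc_head$gene %in% key.gns, c("gene", "bin")]
print(labelled)
```

SCD and SLAMF6 are still counted in their bins, but they are not labelled because they are not in key.gns.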

# Density distribution
ggdensity(gene_citation, x = "citation_index", y = "..count..",
          xlab = "Number of citations",
          ylab = "Number of genes",
          fill = "lightgray", color = "black",
          label = "gene", label.select = key.gns, repel = TRUE,
          font.label = list(color = "citation_index"),
          xticks.by = 20, # Break x ticks by 20
          gradient.cols = c("blue", "red"),
          legend = c(0.7, 0.6),
          legend.title = ""       # Hide legend title
          )
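With y = "..count..", the density curve is drawn on a count scale rather than a probability scale: ggplot2 documents the computed count variable as the density multiplied by the number of points. The base R sketch below illustrates that relationship on the six values printed by head(gene_citation). It uses stats::density() with its default bandwidth, which is an assumption and may differ from the settings used in the plot.

```r
# Citation index values from the head(gene_citation) output above
x <- c(68.0, 10.5, 10.0, 8.5, 4.5, 56.5)
n <- length(x)

# Kernel density estimate; the area under the curve is close to 1
d <- density(x)

# Count scale: density multiplied by the number of observations
count_curve <- d$y * n

# Trapezoid rule to check the area under each curve
trapezoid <- function(xs, ys) {
  sum(diff(xs) * (head(ys, -1) + tail(ys, -1)) / 2)
}
area_density <- trapezoid(d$x, d$y)
area_count <- trapezoid(d$x, count_curve)

print(c(density = area_density, count = area_count))
```

The area under the count-scaled curve is roughly the number of genes, which is why the y axis can be read as "Number of genes".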