1 Introduction

This vignette illustrates how to use a MAD-FC transform to visualize fold change measurements in a volcano plot. Volcano plots are scatterplots that visualize statistical significance versus effect size and enable the quick identification of genes that exhibit both high statistical significance and large fold change. MAD-FC is a new transform for fold changes that distorts linear fold change measurements to match the scale of positive fold change measurements. For more information, please refer to the publication.

B. A. Corliss, Y. Wang, F. P. Driscoll, H. Shakeri, P. E. Bourne, MAD-FC: A Fold Change Visualization with Readability, Proportionality, and Symmetry (2023), doi:10.48550/arXiv.2303.10829.

2 Dataset and Code Sources

Dataset Availability: This vignette uses a dataset that is included with DeseQ2 package. Airway https://pubmed.ncbi.nlm.nih.gov/24926665/.

Code Reference: This tutorial uses code from the introductory vignette for DESeq2. “Analyzing RNA-seq data with DESeq2” by Michael I. Love, Simon Anders, and Wolfgang Huber.

3 Package Dependencies

Primary Packages: tidyverse, BiocManager, cowplot, magrittr

Supporting Packages: DESeq2, airway, EnhancedVolcano

The MAD-FC source file is also required (mirrored_axis_distortion.R)

# Install required Base Packages
base_packages <- c("magrittr", "tidyverse","BiocManager")
install.packages(setdiff(base_packages, rownames(installed.packages())))  
# Install required Bioconductor Packages
biocm_packages <- c("DESeq2","airway","EnhancedVolcano")
bioc_installs <- setdiff(biocm_packages, rownames(installed.packages()))
if (length(bioc_installs)) {BiocManager::install(bioc_installs) }

# Load base packages
lapply(base_packages, library, character.only = TRUE)
# Load Bioconductor packages packages
lapply(biocm_packages, library, character.only = TRUE)
source(paste0(dirname(getwd()), "/R/mirrored_axis_distortion.R"))

4 Data Analysis Steps

We processing the data with the following steps:

  1. Loading the airway dataset.
  2. Convert Airway dataset to a DESeq dataset (Enforces non-negative integer values in counts matrix, and an experiment formula).
  3. Perform differential expression analysis based on the Negative Binomial distribution.
  4. Extract results.
  5. Perform shrinkage of effect size, which reduced noise in lowly expressed genes with high variability. Calculating log2 fold changes, and then refining with shrinkage of effect size.
# Load data
data('airway',results='hide')

# Converts airway dataset from RangedSummarizedExperiment to DESeqDataSet class
dds <- DESeqDataSet(airway, design = ~ cell + dex)
# Differential expression analysis based on the Negative Binomial (Gamme-Poisson) distribution
dds <- DESeq(dds)
# Extract results from DESeq analysis
res <- results(dds, contrast = c('dex','trt','untrt'))
#Shrink fold changes associated with a condition
res <- lfcShrink(dds, contrast = c('dex','trt','untrt'), res=res, type = 'normal')

5 Datapoint Transforms and Annotation

Next, the results table is converted to a data frame for plotting with ggplot2. The datapoints are designated as having significantly up-regulated, down-regulated expression, or neither.

# COnvert result table to dataframe
df_res <- as.data.frame(res)
# Anottate fold change measurements based on pvaue and statistical significance.
df_res$Change <- "NS"
# if log2Foldchange > 0.6 and pvalue < 0.05, set as "UP" 
df_res$Change[df_res$log2FoldChange > 1 & df_res$padj < 0.1] <- "Up"
# if log2Foldchange < -0.6 and pvalue < 0.05, set as "DOWN"
df_res$Change[df_res$log2FoldChange < -1 & df_res$padj < 0.1] <- "Down"
df_res$Change <- factor(df_res$Change, ordered = TRUE, levels = c("Down", "NS", "Up"))
# Calculate raw fold change from log2 fold change values
df_res$FoldChange <- 2^df_res$log2FoldChange
# Calculate MAD-FC values
df_res$MAD_FoldChange <- contract1(fc_to_mfc(df_res$FoldChange))

6 Visualization

6.1 Volcano Plots

We now will visualize the same dataset with a volcano plot where fold changes are linear, log2, and MAD-FC transformed. Each type of transform has unique characteristics for how it displays the fold change values.

6.1.1 Linear Volcano Plot

We first produce a volcano plot where fold changes are mapped to a linear scale (raw fold change units, y-axis). Notice that with this plot, negative fold changes are compressed between [0,1) and positive fold changes are between (1,inf). This limits the linear plot’s usefulness in comparing the magnitude of the positive and negative fold change values.

# Fold Change Linear Plot
g1 <- ggplot(data=df_res, aes(x=FoldChange, y=-log10(pvalue))) +
  geom_vline(xintercept = 1, color = "gray") +
  geom_point(aes(col=Change), size = 0.5, alpha = 0.6, shape = 16) +
  scale_color_manual(values=c("blue", "black", "red"), name = "Change") +
  scale_y_continuous(expand = c(0,0)) +
  theme_classic(base_size = 16)   +
  ylab(expression(-log[10]~(`p-value`))) +  xlab("FC") +
  theme(legend.position = "none")     
g1

6.1.2 Log Volcano plot

Next we produce a volcano plot of the same data where fold changes are mapped to a log2 scale (raw fold change units). Log plots exhibit symmetry with positive and negative fold changes, allowing them to be compared based on their relative position to zero on the log scale (the point of no change). However, the non-linearity of the log transform makes it difficult to compare the magnitude among collections of positive fold change data points and separately, negative fold changes. It is also difficult to recover the original fold change value before the log transform.

# Log2 Plot
g2 <- ggplot(data=df_res, aes(x=log2FoldChange, y=-log10(pvalue))) +
  geom_vline(xintercept = 0, color = "gray") +
  geom_point(aes(col=Change), size = 1, alpha = 0.6, shape = 16) +
  scale_color_manual(values=c("blue", "black", "red"), name = "Change") +
  scale_y_continuous(expand = c(0,0)) +
  theme_classic(base_size = 16) + 
  ylab(expression(-log[10]~(`p-value`))) +  xlab(expression(log[2]~(FC))) +
  theme(legend.position = "none")     
g2

6.1.3 MAD Volcano Plot

Finally we produce a volcano plot of the same data where fold changes are mapped to a MAD-FC scale (raw fold change units with negative fold change scale distorted to match the positive fold change scale). MAD plots are symmetrical and linear by design, making it easier to compare datapoints with different fold change direction, datapoints with same direction. The linear mapping makes it easy to recover the original datapoint position from the plot.

To create a MAD plot, first create a linear plot like above, and then call the function. gg_rev_axis_mfc(gg, axes, num_format) Where gg is the ggplot object, axis is the axis to be transformed, and num_format is the format used for the plot tick labels. This function extracts the axis labels from the gg plot object and reverses the MAD-FC transform to display the original FC units.

# MAD Fold Change Linear Plot
g3 <- ggplot(data=df_res, aes(x=MAD_FoldChange, y=-log10(pvalue))) +
  geom_vline(xintercept = 0, color = "gray") +
  geom_point(aes(col=Change), size = 1, alpha = 0.6, shape = 16) +
  scale_color_manual(values=c("blue", "black", "red"), name = "Change") +
  theme_classic(base_size = 16) +
  ylab(expression(-log[10]~(`p-value`))) +  xlab("FC") +
  theme(legend.position = "none") +
  scale_x_continuous(breaks=c(c(-9, - 4, 0, seq(4,30,5))), expand = c(0,0)) +
  scale_y_continuous(expand = c(0,0))

g3 <- gg_revaxis_mfc(g3,'x', num_format = "fraction")
g3

6.2 MA Plots

We now will visualize the same dataset with a MA plot where fold changes are linear, log2, and MAD-FC transformed. Each type of transform has unique characteristics for how it displays the fold change values.

6.2.1 Linear MA Plot

We first produce a MA plot where fold changes are mapped to a linear scale (raw fold change units, y-axis). Notice that with this plot, negative fold changes are compressed between [0,1) and positive fold changes are between (1,inf). This limits the linear plot’s usefulness in comparing the magnitude of the positive and negative fold change values.

g1 <- ggplot(data = df_res, aes(x = baseMean, y = FoldChange)) +
  geom_hline(yintercept = 1, color = "grey") +
  geom_point(aes(col=Change), size = 0.4, alpha = 0.6, shape = 16) +
  scale_x_log10() +
  scale_color_manual(values=c("blue", "black", "red"), name = "Change") +
  theme_classic(base_size = 16) + 
  ylab("FC") +  xlab("Normalized Mean Count") +
  theme(legend.position = "none")     
g1

6.2.2 Log MA plot

Next we produce a MA plot of the same data where fold changes are mapped to a log2 scale (raw fold change units). Log plots exhibit symmetry with positive and negative fold changes, allowing them to be compared based on their relative position to zero on the log scale (the point of no change). However, the non-linearity of the log transform makes it difficult to compare the magnitude among collections of positive fold change data points and separately, negative fold changes. It is also difficult to recover the original fold change value before the log transform.

# Log2 Plot
g2 <- ggplot(data = df_res, aes(x = baseMean, y = log2FoldChange)) +
  geom_hline(yintercept = 0, color = "grey") +
  geom_point(aes(col=Change), size = 0.4, alpha = 0.6, shape = 16) +
  scale_x_log10() +
  scale_color_manual(values=c("blue", "black", "red"), name = "Change") +
  theme_classic(base_size = 16) + 
  ylab(expression(log[2]~(FC))) +  xlab("Normalized Mean Count") +
  theme(legend.position = "none")     
g2

6.2.3 MAD MA Plot

Finally we produce a volcano plot of the same data where fold changes are mapped to a MAD-FC scale (raw fold change units with negative fold change scale distorted to match the positive fold change scale). MAD plots are symmetrical and linear by design, making it easier to compare datapoints with different fold change direction, datapoints with same direction. The linear mapping makes it easy to recover the original datapoint position from the plot.

To create a MAD plot, first create a linear plot like above, and then call the function. gg_rev_axis_mfc(gg, axes, num_format) Where gg is the ggplot object, axis is the axis to be transformed, and num_format is the format used for the plot tick labels. This function extracts the axis labels from the gg plot object and reverses the MAD-FC transform to display the original FC units.

# Produce a MA plot with MAD-FC transformed fold changes
g3 <- ggplot(data = df_res, aes(x = baseMean, y = MAD_FoldChange)) +
  geom_hline(yintercept = 0, color = "grey") +
  geom_point(aes(col=Change), size = 0.4, alpha = 0.6, shape = 16) +
  scale_x_log10() +
  scale_color_manual(values=c("blue", "black", "red"), name = "Change") +
  scale_y_continuous(breaks = c(-10, 0, 10, 20)) +
  theme_classic(base_size = 16) + 
  ylab("FC") +  xlab("Normalized Mean Count") +
  theme(legend.position = "none")     
g3

g3 <- gg_revaxis_mfc(g3,'y', num_format = "fraction")
g3


7 Session Information

sessionInfo()
## R version 4.3.1 (2023-06-16 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 11 x64 (build 22621)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] EnhancedVolcano_1.18.0      ggrepel_0.9.3              
##  [3] airway_1.20.0               DESeq2_1.40.2              
##  [5] SummarizedExperiment_1.30.2 Biobase_2.60.0             
##  [7] MatrixGenerics_1.12.3       matrixStats_1.0.0          
##  [9] GenomicRanges_1.52.0        GenomeInfoDb_1.36.3        
## [11] IRanges_2.34.1              S4Vectors_0.38.2           
## [13] BiocGenerics_0.46.0         BiocManager_1.30.22        
## [15] lubridate_1.9.2             forcats_1.0.0              
## [17] stringr_1.5.0               dplyr_1.1.3                
## [19] purrr_1.0.2                 readr_2.1.4                
## [21] tidyr_1.3.0                 tibble_3.2.1               
## [23] ggplot2_3.4.3               tidyverse_2.0.0            
## [25] magrittr_2.0.3             
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.4            xfun_0.40               bslib_0.5.1            
##  [4] lattice_0.21-8          tzdb_0.4.0              vctrs_0.6.3            
##  [7] tools_4.3.1             bitops_1.0-7            generics_0.1.3         
## [10] parallel_4.3.1          fansi_1.0.4             pkgconfig_2.0.3        
## [13] Matrix_1.6-1.1          lifecycle_1.0.3         GenomeInfoDbData_1.2.10
## [16] farver_2.1.1            compiler_4.3.1          munsell_0.5.0          
## [19] codetools_0.2-19        htmltools_0.5.6.1       sass_0.4.7             
## [22] RCurl_1.98-1.12         yaml_2.3.7              pillar_1.9.0           
## [25] crayon_1.5.2            jquerylib_0.1.4         BiocParallel_1.34.2    
## [28] DelayedArray_0.26.7     cachem_1.0.8            abind_1.4-5            
## [31] locfit_1.5-9.8          tidyselect_1.2.0        digest_0.6.33          
## [34] stringi_1.7.12          labeling_0.4.3          fastmap_1.1.1          
## [37] grid_4.3.1              colorspace_2.1-0        cli_3.6.1              
## [40] S4Arrays_1.0.6          utf8_1.2.3              withr_2.5.1            
## [43] scales_1.2.1            timechange_0.2.0        rmarkdown_2.25         
## [46] XVector_0.40.0          hms_1.1.3               evaluate_0.22          
## [49] knitr_1.44              rlang_1.1.1             Rcpp_1.0.11            
## [52] glue_1.6.2              rstudioapi_0.15.0       jsonlite_1.8.7         
## [55] R6_2.5.1                zlibbioc_1.46.0

Bruce Corliss, 10/16/2023