1 Introduction

This vignette illustrates how to use a MAD-FC transform to visualize fold change measurements in heatmaps. We will compare visualizing the same data with a linear, log2, and MAD-FC transform. Each of these transforms exhibit different characteristics and emphasizes different aspects of the data, so much so that at first glance it appears that each of these plots are of different datasets.

Reference for MAD-FC

B. A. Corliss, Y. Wang, F. P. Driscoll, H. Shakeri, P. E. Bourne, MAD-FC: A Fold Change Visualization with Readability, Proportionality, and Symmetry (2023), doi:10.48550/arXiv.2303.10829.

2 Dataset and Code Sources

Dataset Availability: This vignette uses from the UbiLength dataset in the DEP package from Zhang et all.

Zhang, Smits, van Tilburg, et al (2017). An interaction landscape of ubiquitin signaling. Molecular Cell 65(5): 941-955. doi: 10.1016/j.molcel.2017.01.004.

3 Package Dependencies

Primary Packages: tidyverse, BiocManager, DESeq2, DEP

The MAD-FC source file is also required (mirrored_axis_distortion.R)

# Install required Base Packages
base_packages <- c("tidyverse","BiocManager")
install.packages(setdiff(base_packages, rownames(installed.packages())))  
# Install required Bioconductor Packages
biocm_packages <- c("DESeq2", "DEP")
bioc_installs <- setdiff(biocm_packages, rownames(installed.packages()))
if (length(bioc_installs)) {BiocManager::install(bioc_installs) }

# Load base packages
lapply(base_packages, library, character.only = TRUE)
# Load Bioconductor packages packages
lapply(biocm_packages, library, character.only = TRUE)
source(paste0(dirname(getwd()), "/R/mirrored_axis_distortion.R"))

4 Data Analysis Steps

We processing the data with the following steps:

  1. Loading the UbiLength dataset.
  2. Analyze dataset with Differential Enrichment analysis of Proteomics data (standard workflow in DEP package).
  3. Plot data in default heatmap that identifies interesting genes.
  4. Extract which sub group of genes and their order from the plot.
# The data is provided with the package 
data <- UbiLength
experimental_design <- UbiLength_ExpDesign

# The wrapper function performs the full analysis on dataset
data_results <- LFQ(data, experimental_design, fun = "MinProb", 
                    type = "control", control = "Ctrl", alpha = 0.05, lfc = 1)
## [1] 0.2608395
# Extract the results table from the LFQ analysis
results_table <- data_results$results

# Extract the sign object
full_data <- data_results$dep
# Used a gene expression heatmap function with specialized algorithm for visualizing most interesting genes.
heatmap_fc <- plot_heatmap(full_data, type = "contrast", kmeans = TRUE,
             k = 6, col_limit = 4, show_row_names = FALSE)

# Extract fold changes from heatmap
def_heatmap <- heatmap_fc@ht_list$`log2 Fold change`@matrix
  1. Obtain the data log2 fold change data from same genes, directly from experiment data
  2. Convert the data frame to long format for ggplot2.
  3. Label in 1 in 10 genes on the y axis.
  4. Define a function that saturates data values for plotting purposes.
# Extracted log fold changes from results
df_expr <- results_table %>% select(ends_with("_ratio"))
# Added gene names as seprate column
df_expr$Gene <- results_table$name
# Removed extra text from sample columns
colnames(df_expr) <- str_replace(colnames(df_expr),"_vs_Ctrl_ratio","")
# Discarded genes not included in default heatmap
df_sub <- df_expr[df_expr$Gene %in% rownames(def_heatmap),]
# Converted genes to factor to ensure same gene order as in default heatmap
df_sub$Gene <- factor(df_sub$Gene, ordered = TRUE, levels = rownames(def_heatmap))
# COnverted wide dataset to long
df_sub <- pivot_longer(df_sub,cols = -Gene, names_to = "Sample", values_to = "log2FC")

# Added columsn for FC and madFC
df_sub$FC <- 2^ df_sub$log2FC
df_sub$madFC <- contract1(fc_to_mfc(df_sub$FC))

# Only label 1 in 10 genes
gene_list <- rownames(def_heatmap)
gene_sublist <- rep(" ",length(gene_list))
gene_sublist[seq(1,length(gene_list), 10)] <- gene_list[seq(1,length(gene_list), 10)]

saturate <- function(df, colname,lo, hi) {
  df[[colname]][df[[colname]]<lo] <- lo
  df[[colname]][df[[colname]]>hi] <- hi
  return(df)
}

5 Visualization

We now will visualize the same dataset with heatmaps where fold changes are linear, log2, and MAD-FC transformed. Each type of transform has unique characteristics for how it displays the fold change values.

5.1 Linear Errorbar Plot

We first produce a heatmap where fold changes are mapped to a linear scale (raw fold change units, y-axis). Notice that with this plot, negative fold changes are compressed between [0,1) and positive fold changes are between (1,inf). This limits the linear plot’s usefulness in comparing the datapoints and intervals between positive and negative fold change values.

# Fold Change Linear Plot
# Log2 FC plot
g1 <- ggplot(data = saturate(df_sub,'log2FC',-4,4), aes(x=Sample, y= Gene, fill = log2FC)) + 
  geom_tile() +
  scale_y_discrete("", labels = gene_sublist) +
  scale_fill_gradientn(name = expression(log[2](FC)),limits = c(-4,4),
                       colors=c("#377eb8", "white", "#e41a1c"),
                       # breaks = c(-15,-7, 0, 7, 15), labels = mad_fc_labeller,
                       guide = guide_colorbar(title.position = "top", title.hjust = 0.5,
                                              barwidth = grid::unit(.6, "npc"),
                                              barheight = grid::unit(.025, "npc")),) +
  theme_classic(base_size = 8) +
  theme(legend.position="top")
g1

5.2 Log Errorbar plot

Next we produce a heatmap of the same data where fold changes are mapped to a log2 scale (raw fold change units). Log plots exhibit symmetry with positive and negative fold changes, allowing them to be compared based on their relative position to zero on the log scale (the point of no change). However, the non-linearity of the log transform makes it difficult to compare the magnitude of datapoints and intervals among collections of positive fold change data points (and separately, negative fold changes). It is also difficult to recover the original fold change value prior to the log transform.

# Fold Change Log2 Plot
g2 <- ggplot(data = saturate(df_sub,'FC',1/16,16), aes(x=Sample, y= Gene, fill = FC)) + 
  geom_tile() +
  scale_y_discrete("", labels = gene_sublist) + 
  scale_fill_gradientn(name = "FC", limits = c(0,16),
                       colors=c("#377eb8", "white", "#e41a1c"), values=c(0, .75/16, 1.25/16, 1),
                       breaks = c(0,-1/16,1, 4,8,12,16),
                       guide = guide_colorbar(title.position = "top", title.hjust = 0.5,
                                              barwidth = grid::unit(.6, "npc"),
                                              barheight = grid::unit(.025, "npc")),) +
  theme_classic(base_size = 8) +
  theme(legend.position="top")
g2

5.3 MAD Errorbar Plot

Finally we produce a heatmap of the same data where fold changes are mapped to a MAD-FC scale (linear fold change units with negative fold change scale distorted to match the scale of positive fold changes). MAD plots are symmetrical and linear by design, making it easier to compare datapoints and intervals regardless of fold change direction. The linear mapping also makes it easy to recover the original datapoint position from the plot.

To create a MAD plot, first create a linear plot like above, and then call the function. gg_rev_axis_mfc(gg, axes, num_format) Where gg is the ggplot object, axis is the axis to be transformed, and num_format is the format used for the plot tick labels. This function extracts the axis labels from the gg plot object and reverses the MAD-FC transform to display the original FC units.

# MAD Fold Change Plot
g3 <- ggplot(data = saturate(df_sub,'madFC',-16,16), aes(x=Sample, y= Gene, fill = madFC)) + 
  geom_tile() +
  scale_y_discrete("", labels = gene_sublist) + 
  scale_fill_gradientn(name = "FC", limits = c(-16,16), 
                       colors=c("#377eb8", "white", "#e41a1c"),
                       breaks = c(-15,-7, 0, 7, 15), labels = mad_fc_labeller,
                       guide = guide_colorbar(title.position = "top", title.hjust = 0.5,
                                               barwidth = grid::unit(.6, "npc"), 
                                              barheight = grid::unit(.025, "npc")),) +
  theme_classic(base_size = 8) +
  theme(legend.position="top")
g3

6 Session Information

sessionInfo()
## R version 4.3.1 (2023-06-16 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 11 x64 (build 22621)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] DEP_1.22.0                  DESeq2_1.40.2              
##  [3] SummarizedExperiment_1.30.2 Biobase_2.60.0             
##  [5] MatrixGenerics_1.12.3       matrixStats_1.0.0          
##  [7] GenomicRanges_1.52.0        GenomeInfoDb_1.36.3        
##  [9] IRanges_2.34.1              S4Vectors_0.38.2           
## [11] BiocGenerics_0.46.0         BiocManager_1.30.22        
## [13] lubridate_1.9.2             forcats_1.0.0              
## [15] stringr_1.5.0               dplyr_1.1.3                
## [17] purrr_1.0.2                 readr_2.1.4                
## [19] tidyr_1.3.0                 tibble_3.2.1               
## [21] ggplot2_3.4.3               tidyverse_2.0.0            
## 
## loaded via a namespace (and not attached):
##   [1] bitops_1.0-7            fdrtool_1.2.17          sandwich_3.0-2         
##   [4] rlang_1.1.1             magrittr_2.0.3          shinydashboard_0.7.2   
##   [7] clue_0.3-65             GetoptLong_1.0.5        compiler_4.3.1         
##  [10] png_0.1-8               vctrs_0.6.3             ProtGenerics_1.32.0    
##  [13] pkgconfig_2.0.3         shape_1.4.6             crayon_1.5.2           
##  [16] fastmap_1.1.1           ellipsis_0.3.2          XVector_0.40.0         
##  [19] labeling_0.4.3          utf8_1.2.3              promises_1.2.1         
##  [22] rmarkdown_2.25          tzdb_0.4.0              preprocessCore_1.62.1  
##  [25] xfun_0.40               zlibbioc_1.46.0         cachem_1.0.8           
##  [28] jsonlite_1.8.7          gmm_1.8                 later_1.3.1            
##  [31] DelayedArray_0.26.7     BiocParallel_1.34.2     parallel_4.3.1         
##  [34] cluster_2.1.4           R6_2.5.1                bslib_0.5.1            
##  [37] stringi_1.7.12          RColorBrewer_1.1-3      limma_3.56.2           
##  [40] jquerylib_0.1.4         assertthat_0.2.1        Rcpp_1.0.11            
##  [43] iterators_1.0.14        knitr_1.44              zoo_1.8-12             
##  [46] httpuv_1.6.11           Matrix_1.6-1.1          timechange_0.2.0       
##  [49] tidyselect_1.2.0        rstudioapi_0.15.0       abind_1.4-5            
##  [52] yaml_2.3.7              doParallel_1.0.17       codetools_0.2-19       
##  [55] affy_1.78.2             lattice_0.21-8          plyr_1.8.9             
##  [58] shiny_1.7.5             withr_2.5.1             tmvtnorm_1.5           
##  [61] evaluate_0.22           norm_1.0-11.1           circlize_0.4.15        
##  [64] pillar_1.9.0            affyio_1.70.0           DT_0.30                
##  [67] foreach_1.5.2           MSnbase_2.26.0          MALDIquant_1.22.1      
##  [70] ncdf4_1.21              generics_0.1.3          RCurl_1.98-1.12        
##  [73] hms_1.1.3               munsell_0.5.0           scales_1.2.1           
##  [76] xtable_1.8-4            glue_1.6.2              tools_4.3.1            
##  [79] mzID_1.38.0             vsn_3.68.0              mzR_2.34.1             
##  [82] locfit_1.5-9.8          imputeLCMD_2.1          mvtnorm_1.2-3          
##  [85] XML_3.99-0.14           grid_4.3.1              impute_1.74.1          
##  [88] MsCoreUtils_1.12.0      colorspace_2.1-0        GenomeInfoDbData_1.2.10
##  [91] cli_3.6.1               fansi_1.0.4             S4Arrays_1.0.6         
##  [94] ComplexHeatmap_2.16.0   pcaMethods_1.92.0       gtable_0.3.4           
##  [97] sass_0.4.7              digest_0.6.33           farver_2.1.1           
## [100] htmlwidgets_1.6.2       rjson_0.2.21            htmltools_0.5.6.1      
## [103] lifecycle_1.0.3         mime_0.12               GlobalOptions_0.1.2    
## [106] MASS_7.3-60

Bruce Corliss, 10/16/2023