3.1 Allohexaploid C. flexuosa

3.1.1 Introduction

Cardamine flexuosa (2n = 4x = 32, HHAA), wavy bittercress, is an allopolyploid species that originated from two diploid progenitors, Cardamine hirsuta (2n = 2x = 16, HH) and Cardamine amara (2n = 2x = 16, AA) (Mandáková et al. 2014). While C. hirsuta is typically adapted to dry habitats such as roadsides, C. amara inhabits wet environments such as riversides or running streams. The allopolyploid C. flexuosa thrives across a wide range of ecological habitats.

Akiyama et al. conducted an RNA-Seq experiment using leaf samples of C. flexuosa collected from wet and dry habitats across three different days (Akiyama et al. 2021). Each habitat–date combination included two or three biological replicates. In the original study, homeolog expression levels were quantified using read counts processed through the HomeoRoq sorting pipeline (Akama et al. 2014). Analysis of homeolog expression ratios revealed that a small percentage of homeologs in C. flexuosa exhibit shifts in expression depending on environmental differences, wet and dry.

3.1.2 Data Preparation

In this example, we demonstrate how to use HOBIT and HomeoRoq to analyze the C. flexuosa dataset. To reduce computation time, we focus on samples collected on May 16, 2013, and randomly select 500 homeolog pairs (c_flexuosa.0516.mini.txt.gz). Users can, however, analyze the full dataset or samples from other dates. These datasets are available in the data directory, and the same workflow can be applied.

gexp <- read.table("../data/c_flexuosa.0516.mini.txt.gz",
                   header = TRUE, sep = "\t", row.names = 1)
head(gexp)

##               wet_1 wet_2 wet_3 dry_1 dry_2 dry_3
## CARHR000190_H   126   109   150   175   113   124
## CARHR000660_H     3     4     8    10     2     5
## CARHR000770_H   129    95   170   233   316   238
## CARHR000890_H    45    32   117     5     6     1
## CARHR001940_H    23    22    73   176    46   108
## CARHR003740_H   541   436   576   552   344   284

group <- c("wet", "wet", "wet", "dry", "dry", "dry")

Next, load the homeolog mapping table, which links gene expression values to their corresponding homeologs. This table is a tab-separated file in which the first and second columns represent gene names from C. hirsuta and C. amara, respectively.

mapping_table <- read.table("../data/c_flexuosa.homeolog.txt.gz",
                            header = TRUE, sep = "\t")
head(mapping_table)

##       C_hirsuta       C_amara
## 1 CARHR000010_H CARHR000010_A
## 2 CARHR000060_H CARHR000060_A
## 3 CARHR000070_H CARHR000070_A
## 4 CARHR000090_H CARHR000090_A
## 5 CARHR000110_H CARHR000110_A
## 6 CARHR000120_H CARHR000120_A

We then use the newExpMX() function to organize the gene expression matrix (gexp) into a homeolog expression matrix using the mapping table (mapping_table) and store the result in an ExpMX class object.

x <- newExpMX(gexp, group, mapping_table)
x

## # 2 subgenome sets (C_hirsuta, C_amara)
## # 500 homeolog tuples
## ---------------------
## Experiment Design:
##   group
## 1   wet
## 2   wet
## 3   wet
## 4   dry
## 5   dry
## 6   dry
## ---------------------
## > subgenome: C_hirsuta 
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]  126  109  150  175  113  124
## [2,]    3    4    8   10    2    5
## [3,]  129   95  170  233  316  238
## [4,]   45   32  117    5    6    1
## [5,]   23   22   73  176   46  108
## [6,]  541  436  576  552  344  284
## +++++++++++++++++++++
## > subgenome: C_amara 
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]  238  141   82  136   63   88
## [2,]   14   10   24   23    6   13
## [3,]   20    9   17   26   40   22
## [4,]   23   19   73    2    5    2
## [5,]   15   17   54  209   36   70
## [6,]  798  610  880  958  969  567
## ---------------------

Before performing the test, we normalize the raw read counts using the TMM method (Robinson and Oshlack 2010) to adjust for differences in library size. However, if the expression data (gexp) has already been normalized (e.g. FPKM), this step can be skipped.

x <- norm_counts(x)

To visualize the distributions of homeolog expression ratios, use the plot_HER_distr() function. This function returns a list of histograms, one for each condition, showing the distribution of homeolog expression ratios.

distr_plots <- plot_HER_distr(x)
names(distr_plots)

## [1] "wet" "dry"

By default, the expression ratio is calculated as the proportion contributed by the first subgenome relative to all subgenomes. In this dataset, the first subgenome is derived from C. hirsuta, as indicated by the first column in the mapping table (mapping_table).

To display the distribution of homeolog expression ratios for each experimental group, run the following code.

library(ggplot2)
library(gridExtra)
grid.arrange(distr_plots[["wet"]] + ggtitle("wet"),
             distr_plots[["dry"]] + ggtitle("dry"),
             ncol = 2)

Figure 3.1: Distribution of homeolog expression ratios in C. flexuosa under wet and dry habitats.

After visually inspecting the distributions, if no irregularities are observed, users can proceed to apply the statistical test.

3.1.3 HOBIT

HOBIT can be executed using the hobit() function, which performs a statistical test to detect homeologs with shifts in expression ratios between wet and dry habitats. This step takes approximately five minutes using eight threads and may vary depending on hardware performance.

x_output <- hobit(x)

The order of the test output matches that of the input.

head(x_output)

##            gene    pvalue qvalue raw_pvalue raw_qvalue D__C_hirsuta__(dry-wet) D__C_amara__(dry-wet) OR__C_hirsuta__(dry/wet) OR__C_amara__(dry/wet)        Dmax    ORmax theta0__C_hirsuta theta0__C_amara theta1__C_hirsuta__dry theta1__C_amara__dry theta1__C_hirsuta__wet theta1__C_amara__wet logLik_H0 logLik_H1
## 1 CARHR000190_H 0.1233256      1 0.06541479  0.7656442             0.135262370          -0.135262370                1.7271228              0.5789976 0.135262370 1.727123         0.5139389       0.4860611              0.5834637            0.4165363              0.4460479            0.5539521 -64.49956 -62.84965
## 2 CARHR000660_H 1.0000000      1 1.00000000  1.0000000             0.053684855          -0.053684855                1.3411438              0.7456322 0.053684855 1.341144         0.2533909       0.7466091              0.2769508            0.7230492              0.2212910            0.7787090 -31.81579 -32.06581
## 3 CARHR000770_H 1.0000000      1 1.00000000  1.0000000            -0.002602245           0.002602245                0.9721639              1.0286332 0.002602245 1.028633         0.8930171       0.1069829              0.8936688            0.1063312              0.8960575            0.1039425 -53.30633 -53.69221
## 4 CARHR000890_H 0.7696578      1 0.72625945  1.0000000            -0.084993555           0.084993555                0.6983480              1.4319519 0.084993555 1.431952         0.5863827       0.4136173              0.5442493            0.4557507              0.6292444            0.3707556 -41.19201 -41.26375
## 5 CARHR001940_H 1.0000000      1 1.00000000  1.0000000            -0.048105955           0.048105955                0.8195665              1.2201572 0.048105955 1.220157         0.5520396       0.4479604              0.5294441            0.4705559              0.5774278            0.4225722 -57.14303 -57.51374
## 6 CARHR003740_H 0.3542458      1 0.26803105  1.0000000            -0.089427170           0.089427170                0.6749970              1.4814881 0.089427170 1.481488         0.3601068       0.6398932              0.3143448            0.6856552              0.4038831            0.5961169 -75.39992 -74.78379

To rank the output in descending order of p-values, use the following code.

head(x_output[order(x_output$pvalue), ])

##              gene       pvalue       qvalue   raw_pvalue   raw_qvalue D__C_hirsuta__(dry-wet) D__C_amara__(dry-wet) OR__C_hirsuta__(dry/wet) OR__C_amara__(dry/wet)      Dmax      ORmax theta0__C_hirsuta theta0__C_amara theta1__C_hirsuta__dry theta1__C_amara__dry theta1__C_hirsuta__wet theta1__C_amara__wet  logLik_H0  logLik_H1
## 318 CARHR189440_H 1.829199e-13 9.145996e-11 1.360237e-18 6.801183e-16              -0.2404259             0.2404259              0.007678317           130.23691441 0.2404259 130.236914         0.8762089       0.1237911              0.7564332           0.24356680             0.99756518          0.002434821 -122.41584  -83.64062
## 299 CARHR177020_H 6.117493e-07 1.529373e-04 2.474391e-09 6.185977e-07              -0.4707073             0.4707073              0.065687423            15.22361436 0.4707073  15.223614         0.6862210       0.3137790              0.4511883           0.54881169             0.92454972          0.075450275  -65.14355  -47.45449
## 118 CARHR064090_H 6.761393e-06 1.126899e-03 7.383501e-08 1.230584e-05              -0.3467176             0.3467176              0.172119224             5.80992628 0.3467176   5.809926         0.3204976       0.6795024              0.1438408           0.85615924             0.49473466          0.505265340 -116.26714 -101.90238
## 460 CARHR262780_H 1.178066e-05 1.472582e-03 1.616641e-07 2.020801e-05              -0.5174396             0.5174396              0.086974815            11.49758531 0.5174396  11.497585         0.4220346       0.5779654              0.1566324           0.84336759             0.68466764          0.315332355  -85.88928  -72.37795
## 442 CARHR252830_H 3.645239e-05 3.645239e-03 7.949838e-07 7.949838e-05               0.4967808            -0.4967808             49.779939767             0.02008841 0.4967808  49.779940         0.7163901       0.2836099              0.9772524           0.02274755             0.46677192          0.533228085  -54.15785  -42.16888
## 52  CARHR027460_H 5.811810e-05 4.843175e-03 1.533691e-06 1.278076e-04               0.1625840            -0.1625840              5.711463046             0.17508650 0.1625840   5.711463         0.1262953       0.8737047              0.2070179           0.79298212             0.04360943          0.956390565  -69.64253  -58.25526

To visualize homeologs with shifts in expression ratios between wet and dry habitats, use the plot_HER() function. This function generates a scatter plot comparing the expression ratios calculated for the wet and dry groups. By default, the expression ratio is defined as the proportion contributed by the first subgenome (i.e., the subgenome derived from C. hirsuta) relative to the total expression across all subgenomes. In the example below, homeologs with q-values less than 0.01 are highlighted.

is_sig <- ifelse(x_output$qvalue < 0.01, "q<0.01", "q>=0.01")
plot_HER(x, label = is_sig)

Changes in homeolog expression ratios between wet (x-axis) and dry (y-axis) habitats. Orange points indicate homeologs with significant shifts detected by HOBIT, while gray points represent homeologs without significant changes.

Figure 3.2: Changes in homeolog expression ratios between wet (x-axis) and dry (y-axis) habitats. Orange points indicate homeologs with significant shifts detected by HOBIT, while gray points represent homeologs without significant changes.

3.1.4 HomeoRoq

For allopolyploids composed of two subgenomes under two different conditions, HomeoRoq is also a suitable option for detecting homeologs with shifts in expression ratios. It can be executed using the homeoroq() function in the same way as hobit(), as shown below.

x_homeoroq <- homeoroq(x)

head(x_homeoroq[order(x_homeoroq$pvalue), ])

##              gene pvalue qvalue sumexp__wet__C_hirsuta sumexp__wet__C_amara sumexp__dry__C_hirsuta sumexp__dry__C_amara ratio__wet__C_hirsuta ratio__dry__C_hirsuta   ratio_sd
## 52  CARHR027460_H      0      0               41.57096            909.51163             102.245685             409.2047             0.0437091            0.10105752 0.07791076
## 68  CARHR034170_H      0      0               55.84898            207.90394               4.376849             213.3613             0.2117473            0.02061821 0.13132195
## 118 CARHR064090_H      0      0             2317.24730           2368.74631             658.880578            4368.6664             0.4945050            0.21762278 0.21895848
## 211 CARHR120540_H      0      0               58.59614             35.50623              21.889335               0.0000             0.6226851            0.38137679 0.18669063
## 299 CARHR177020_H      0      0              173.87466             14.80519             152.537840             173.6894             0.9215327            0.91152790 0.22859512
## 307 CARHR184910_H      0      0               60.88312            274.91276               5.699047             801.7122             0.1813099            0.02030936 0.12764240

The overlap of homeologs with shifts in expression ratios detected by HOBIT and HomeoRoq can be visualized using an UpSet plot, which is implemented in the UpSetR package (Conway et al. 2017).

library("UpSetR")
sig_homeologs <- list(
    HOBIT = x_output$gene[x_output$qvalue < 0.01],
    HomeoRoq = x_homeoroq$gene[x_homeoroq$qvalue < 0.01]
)
upset(fromList(sig_homeologs))

Figure 3.3: Overlap of homeologs with shifted expression ratios detected by HOBIT and HomeoRoq.

Additionally, homeologs with shifts in expression ratios detected by both methods can be visualized using the plot_HER() function.

is_sig <- rep("n.s.", length = nrow(x_output))
is_sig[x_output$qvalue < 0.01] <- "HOBIT"
is_sig[x_homeoroq$qvalue < 0.01] <- "HomeoRoq"
is_sig[x_output$qvalue < 0.01 & x_homeoroq$qvalue < 0.01] <- "HOBIT&HomeoRoq"
is_sig <- factor(is_sig, levels = c("n.s.", "HOBIT", "HomeoRoq", "HOBIT&HomeoRoq"))
plot_HER(x, label = is_sig)

Comparison of homeologs with changes in expression ratios detected by HOBIT and HomeoRoq. Each point represents a homeolog expression ratio in wet (x-axis) and dry (y-axis) conditions. Points labeled as n.s. indicate homeologs without significant changes detected by either method; other points represent homeologs identified as having significant changes by the respective method.

Figure 3.4: Comparison of homeologs with changes in expression ratios detected by HOBIT and HomeoRoq. Each point represents a homeolog expression ratio in wet (x-axis) and dry (y-axis) conditions. Points labeled as n.s. indicate homeologs without significant changes detected by either method; other points represent homeologs identified as having significant changes by the respective method.

3.1.5 Analysis Environment

sessionInfo()

## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8   
##  [6] LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C           LC_TELEPHONE=C        
## [11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] UpSetR_1.4.0    gridExtra_2.3   ggplot2_4.0.0   hespresso_1.0.1
## 
## loaded via a namespace (and not attached):
##  [1] tensorA_0.36.2.1     sass_0.4.10          future_1.67.0        generics_0.1.4       lattice_0.22-7      
##  [6] hms_1.1.3            listenv_0.9.1        digest_0.6.37        magrittr_2.0.4       evaluate_1.0.5      
## [11] grid_4.5.1           RColorBrewer_1.1-3   bookdown_0.44        fastmap_1.2.0        plyr_1.8.9          
## [16] jsonlite_2.0.0       processx_3.8.6       progress_1.2.3       backports_1.5.0      limma_3.64.3        
## [21] ps_1.9.1             scales_1.4.0         codetools_0.2-20     jquerylib_0.1.4      abind_1.4-8         
## [26] cli_3.6.5            rlang_1.1.6          crayon_1.5.3         cmdstanr_0.9.0       parallelly_1.45.1   
## [31] future.apply_1.20.0  splines_4.5.1        withr_3.0.2          cachem_1.1.0         yaml_2.3.10         
## [36] tools_4.5.1          parallel_4.5.1       checkmate_2.3.3      locfit_1.5-9.12      globals_0.18.0      
## [41] vctrs_0.6.5          posterior_1.6.1      R6_2.6.1             lifecycle_1.0.4      edgeR_4.6.3         
## [46] pkgconfig_2.0.3      progressr_0.16.0     pillar_1.11.1        bslib_0.9.0          gtable_0.3.6        
## [51] Rcpp_1.1.0           data.table_1.17.8    glue_1.8.0           statmod_1.5.0        xfun_0.53           
## [56] tibble_3.3.0         knitr_1.50           farver_2.1.2         htmltools_0.5.8.1    labeling_0.4.3      
## [61] rmarkdown_2.29       compiler_4.5.1       prettyunits_1.2.0    S7_0.2.0             distributional_0.5.0