3.1 Allohexaploid C. flexuosa
3.1.1 Introduction
Cardamine flexuosa (2n = 4x = 32, HHAA), wavy bittercress, is an allopolyploid species that originated from two diploid progenitors, Cardamine hirsuta (2n = 2x = 16, HH) and Cardamine amara (2n = 2x = 16, AA) (Mandáková et al. 2014). While C. hirsuta is typically adapted to dry habitats such as roadsides, C. amara inhabits wet environments such as riversides or running streams. The allopolyploid C. flexuosa thrives across a wide range of ecological habitats.
Akiyama et al. conducted an RNA-Seq experiment using leaf samples of C. flexuosa collected from wet and dry habitats across three different days (Akiyama et al. 2021). Each habitat–date combination included two or three biological replicates. In the original study, homeolog expression levels were quantified using read counts processed through the HomeoRoq sorting pipeline (Akama et al. 2014). Analysis of homeolog expression ratios revealed that a small percentage of homeologs in C. flexuosa exhibit shifts in expression depending on environmental differences, wet and dry.
3.1.2 Data Preparation
In this example, we demonstrate how to use HOBIT and HomeoRoq to analyze the C. flexuosa dataset.
To reduce computation time, we focus on samples collected on May 16, 2013,
and randomly select 500 homeolog pairs (c_flexuosa.0516.mini.txt.gz
).
Users can, however, analyze the full dataset or samples from other dates.
These datasets are available in the data directory,
and the same workflow can be applied.
gexp <- read.table("../data/c_flexuosa.0516.mini.txt.gz",
header = TRUE, sep = "\t", row.names = 1)
head(gexp)
## wet_1 wet_2 wet_3 dry_1 dry_2 dry_3
## CARHR000190_H 126 109 150 175 113 124
## CARHR000660_H 3 4 8 10 2 5
## CARHR000770_H 129 95 170 233 316 238
## CARHR000890_H 45 32 117 5 6 1
## CARHR001940_H 23 22 73 176 46 108
## CARHR003740_H 541 436 576 552 344 284
Next, load the homeolog mapping table, which links gene expression values to their corresponding homeologs. This table is a tab-separated file in which the first and second columns represent gene names from C. hirsuta and C. amara, respectively.
mapping_table <- read.table("../data/c_flexuosa.homeolog.txt.gz",
header = TRUE, sep = "\t")
head(mapping_table)
## C_hirsuta C_amara
## 1 CARHR000010_H CARHR000010_A
## 2 CARHR000060_H CARHR000060_A
## 3 CARHR000070_H CARHR000070_A
## 4 CARHR000090_H CARHR000090_A
## 5 CARHR000110_H CARHR000110_A
## 6 CARHR000120_H CARHR000120_A
We then use the newExpMX()
function
to organize the gene expression matrix (gexp
) into a homeolog expression matrix
using the mapping table (mapping_table
) and store the result in an ExpMX
class object.
## # 2 subgenome sets (C_hirsuta, C_amara)
## # 500 homeolog tuples
## ---------------------
## Experiment Design:
## group
## 1 wet
## 2 wet
## 3 wet
## 4 dry
## 5 dry
## 6 dry
## ---------------------
## > subgenome: C_hirsuta
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 126 109 150 175 113 124
## [2,] 3 4 8 10 2 5
## [3,] 129 95 170 233 316 238
## [4,] 45 32 117 5 6 1
## [5,] 23 22 73 176 46 108
## [6,] 541 436 576 552 344 284
## +++++++++++++++++++++
## > subgenome: C_amara
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 238 141 82 136 63 88
## [2,] 14 10 24 23 6 13
## [3,] 20 9 17 26 40 22
## [4,] 23 19 73 2 5 2
## [5,] 15 17 54 209 36 70
## [6,] 798 610 880 958 969 567
## ---------------------
Before performing the test,
we normalize the raw read counts using the TMM method (Robinson and Oshlack 2010)
to adjust for differences in library size.
However, if the expression data (gexp
) has already been normalized (e.g. FPKM),
this step can be skipped.
To visualize the distributions of homeolog expression ratios,
use the plot_HER_distr()
function.
This function returns a list of histograms, one for each condition,
showing the distribution of homeolog expression ratios.
## [1] "wet" "dry"
By default, the expression ratio is calculated as the proportion contributed
by the first subgenome relative to all subgenomes.
In this dataset, the first subgenome is derived from C. hirsuta,
as indicated by the first column in the mapping table (mapping_table
).
To display the distribution of homeolog expression ratios for each experimental group, run the following code.
library(ggplot2)
library(gridExtra)
grid.arrange(distr_plots[["wet"]] + ggtitle("wet"),
distr_plots[["dry"]] + ggtitle("dry"),
ncol = 2)

Figure 3.1: Distribution of homeolog expression ratios in C. flexuosa under wet and dry habitats.
After visually inspecting the distributions, if no irregularities are observed, users can proceed to apply the statistical test.
3.1.3 HOBIT
HOBIT can be executed using the hobit()
function, which performs a statistical test
to detect homeologs with shifts in expression ratios between wet and dry habitats.
This step takes approximately five minutes
using eight threads and may vary depending on hardware performance.
The order of the test output matches that of the input.
## gene pvalue qvalue raw_pvalue raw_qvalue D__C_hirsuta__(dry-wet) D__C_amara__(dry-wet) OR__C_hirsuta__(dry/wet) OR__C_amara__(dry/wet) Dmax ORmax theta0__C_hirsuta theta0__C_amara theta1__C_hirsuta__dry theta1__C_amara__dry theta1__C_hirsuta__wet theta1__C_amara__wet logLik_H0 logLik_H1
## 1 CARHR000190_H 0.1233256 1 0.06541479 0.7656442 0.135262370 -0.135262370 1.7271228 0.5789976 0.135262370 1.727123 0.5139389 0.4860611 0.5834637 0.4165363 0.4460479 0.5539521 -64.49956 -62.84965
## 2 CARHR000660_H 1.0000000 1 1.00000000 1.0000000 0.053684855 -0.053684855 1.3411438 0.7456322 0.053684855 1.341144 0.2533909 0.7466091 0.2769508 0.7230492 0.2212910 0.7787090 -31.81579 -32.06581
## 3 CARHR000770_H 1.0000000 1 1.00000000 1.0000000 -0.002602245 0.002602245 0.9721639 1.0286332 0.002602245 1.028633 0.8930171 0.1069829 0.8936688 0.1063312 0.8960575 0.1039425 -53.30633 -53.69221
## 4 CARHR000890_H 0.7696578 1 0.72625945 1.0000000 -0.084993555 0.084993555 0.6983480 1.4319519 0.084993555 1.431952 0.5863827 0.4136173 0.5442493 0.4557507 0.6292444 0.3707556 -41.19201 -41.26375
## 5 CARHR001940_H 1.0000000 1 1.00000000 1.0000000 -0.048105955 0.048105955 0.8195665 1.2201572 0.048105955 1.220157 0.5520396 0.4479604 0.5294441 0.4705559 0.5774278 0.4225722 -57.14303 -57.51374
## 6 CARHR003740_H 0.3542458 1 0.26803105 1.0000000 -0.089427170 0.089427170 0.6749970 1.4814881 0.089427170 1.481488 0.3601068 0.6398932 0.3143448 0.6856552 0.4038831 0.5961169 -75.39992 -74.78379
To rank the output in descending order of p-values, use the following code.
## gene pvalue qvalue raw_pvalue raw_qvalue D__C_hirsuta__(dry-wet) D__C_amara__(dry-wet) OR__C_hirsuta__(dry/wet) OR__C_amara__(dry/wet) Dmax ORmax theta0__C_hirsuta theta0__C_amara theta1__C_hirsuta__dry theta1__C_amara__dry theta1__C_hirsuta__wet theta1__C_amara__wet logLik_H0 logLik_H1
## 318 CARHR189440_H 1.829199e-13 9.145996e-11 1.360237e-18 6.801183e-16 -0.2404259 0.2404259 0.007678317 130.23691441 0.2404259 130.236914 0.8762089 0.1237911 0.7564332 0.24356680 0.99756518 0.002434821 -122.41584 -83.64062
## 299 CARHR177020_H 6.117493e-07 1.529373e-04 2.474391e-09 6.185977e-07 -0.4707073 0.4707073 0.065687423 15.22361436 0.4707073 15.223614 0.6862210 0.3137790 0.4511883 0.54881169 0.92454972 0.075450275 -65.14355 -47.45449
## 118 CARHR064090_H 6.761393e-06 1.126899e-03 7.383501e-08 1.230584e-05 -0.3467176 0.3467176 0.172119224 5.80992628 0.3467176 5.809926 0.3204976 0.6795024 0.1438408 0.85615924 0.49473466 0.505265340 -116.26714 -101.90238
## 460 CARHR262780_H 1.178066e-05 1.472582e-03 1.616641e-07 2.020801e-05 -0.5174396 0.5174396 0.086974815 11.49758531 0.5174396 11.497585 0.4220346 0.5779654 0.1566324 0.84336759 0.68466764 0.315332355 -85.88928 -72.37795
## 442 CARHR252830_H 3.645239e-05 3.645239e-03 7.949838e-07 7.949838e-05 0.4967808 -0.4967808 49.779939767 0.02008841 0.4967808 49.779940 0.7163901 0.2836099 0.9772524 0.02274755 0.46677192 0.533228085 -54.15785 -42.16888
## 52 CARHR027460_H 5.811810e-05 4.843175e-03 1.533691e-06 1.278076e-04 0.1625840 -0.1625840 5.711463046 0.17508650 0.1625840 5.711463 0.1262953 0.8737047 0.2070179 0.79298212 0.04360943 0.956390565 -69.64253 -58.25526
To visualize homeologs with shifts in expression ratios between wet and dry habitats,
use the plot_HER()
function.
This function generates a scatter plot comparing the expression ratios
calculated for the wet and dry groups.
By default, the expression ratio is defined as the proportion contributed
by the first subgenome (i.e., the subgenome derived from C. hirsuta)
relative to the total expression across all subgenomes.
In the example below, homeologs with q-values less than 0.01 are highlighted.

Figure 3.2: Changes in homeolog expression ratios between wet (x-axis) and dry (y-axis) habitats. Orange points indicate homeologs with significant shifts detected by HOBIT, while gray points represent homeologs without significant changes.
3.1.4 HomeoRoq
For allopolyploids composed of two subgenomes under two different conditions,
HomeoRoq is also a suitable option for detecting homeologs with shifts in expression ratios.
It can be executed using the homeoroq()
function in the same way as hobit()
,
as shown below.
## gene pvalue qvalue sumexp__wet__C_hirsuta sumexp__wet__C_amara sumexp__dry__C_hirsuta sumexp__dry__C_amara ratio__wet__C_hirsuta ratio__dry__C_hirsuta ratio_sd
## 52 CARHR027460_H 0 0 41.57096 909.51163 102.245685 409.2047 0.0437091 0.10105752 0.07791076
## 68 CARHR034170_H 0 0 55.84898 207.90394 4.376849 213.3613 0.2117473 0.02061821 0.13132195
## 118 CARHR064090_H 0 0 2317.24730 2368.74631 658.880578 4368.6664 0.4945050 0.21762278 0.21895848
## 211 CARHR120540_H 0 0 58.59614 35.50623 21.889335 0.0000 0.6226851 0.38137679 0.18669063
## 299 CARHR177020_H 0 0 173.87466 14.80519 152.537840 173.6894 0.9215327 0.91152790 0.22859512
## 307 CARHR184910_H 0 0 60.88312 274.91276 5.699047 801.7122 0.1813099 0.02030936 0.12764240
The overlap of homeologs with shifts in expression ratios detected by HOBIT and HomeoRoq can be visualized using an UpSet plot, which is implemented in the UpSetR package (Conway et al. 2017).
library("UpSetR")
sig_homeologs <- list(
HOBIT = x_output$gene[x_output$qvalue < 0.01],
HomeoRoq = x_homeoroq$gene[x_homeoroq$qvalue < 0.01]
)
upset(fromList(sig_homeologs))

Figure 3.3: Overlap of homeologs with shifted expression ratios detected by HOBIT and HomeoRoq.
Additionally, homeologs with shifts in expression ratios detected by both methods
can be visualized using the plot_HER()
function.
is_sig <- rep("n.s.", length = nrow(x_output))
is_sig[x_output$qvalue < 0.01] <- "HOBIT"
is_sig[x_homeoroq$qvalue < 0.01] <- "HomeoRoq"
is_sig[x_output$qvalue < 0.01 & x_homeoroq$qvalue < 0.01] <- "HOBIT&HomeoRoq"
is_sig <- factor(is_sig, levels = c("n.s.", "HOBIT", "HomeoRoq", "HOBIT&HomeoRoq"))
plot_HER(x, label = is_sig)

Figure 3.4: Comparison of homeologs with changes in expression ratios detected by HOBIT and HomeoRoq. Each point represents a homeolog expression ratio in wet (x-axis) and dry (y-axis) conditions. Points labeled as n.s. indicate homeologs without significant changes detected by either method; other points represent homeologs identified as having significant changes by the respective method.
3.1.5 Analysis Environment
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8
## [6] LC_MESSAGES=C.UTF-8 LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] UpSetR_1.4.0 gridExtra_2.3 ggplot2_4.0.0 hespresso_1.0.1
##
## loaded via a namespace (and not attached):
## [1] tensorA_0.36.2.1 sass_0.4.10 future_1.67.0 generics_0.1.4 lattice_0.22-7
## [6] hms_1.1.3 listenv_0.9.1 digest_0.6.37 magrittr_2.0.4 evaluate_1.0.5
## [11] grid_4.5.1 RColorBrewer_1.1-3 bookdown_0.44 fastmap_1.2.0 plyr_1.8.9
## [16] jsonlite_2.0.0 processx_3.8.6 progress_1.2.3 backports_1.5.0 limma_3.64.3
## [21] ps_1.9.1 scales_1.4.0 codetools_0.2-20 jquerylib_0.1.4 abind_1.4-8
## [26] cli_3.6.5 rlang_1.1.6 crayon_1.5.3 cmdstanr_0.9.0 parallelly_1.45.1
## [31] future.apply_1.20.0 splines_4.5.1 withr_3.0.2 cachem_1.1.0 yaml_2.3.10
## [36] tools_4.5.1 parallel_4.5.1 checkmate_2.3.3 locfit_1.5-9.12 globals_0.18.0
## [41] vctrs_0.6.5 posterior_1.6.1 R6_2.6.1 lifecycle_1.0.4 edgeR_4.6.3
## [46] pkgconfig_2.0.3 progressr_0.16.0 pillar_1.11.1 bslib_0.9.0 gtable_0.3.6
## [51] Rcpp_1.1.0 data.table_1.17.8 glue_1.8.0 statmod_1.5.0 xfun_0.53
## [56] tibble_3.3.0 knitr_1.50 farver_2.1.2 htmltools_0.5.8.1 labeling_0.4.3
## [61] rmarkdown_2.29 compiler_4.5.1 prettyunits_1.2.0 S7_0.2.0 distributional_0.5.0