4.2 HomeoRoq

HomeoRoq (Akama et al. 2014) estimates the probability that homeolog expression ratios remain constant between two experimental groups using RNA-seq read count data for each homeolog. Since this probability cannot be derived analytically, HomeoRoq adopts a Bayesian framework to approximate the posterior distribution of homeolog expression ratios under the null hypothesis of no change between conditions. The homeoroq() function provides a complete interface for detecting homeologs whose expression ratios differ between two conditions.

By default, it performs 10,000 sampling iterations (iter_sampling = 1e4) per probability calculation and repeats the calculation four times (chians = 4). However, for robust and reproducible estimation of statistical significance (i.e., p-values), it is recommended to run multiple independent sampling replicates, each with a sufficiently large number of iterations. To reproduce the settings used in the original HomeoRoq implementation (Akama et al. 2014), set iter_sampling = 1e4 and chains = 10.

4.2.1 Inputs

The homeoroq() function requires an input object of class ExpMX, same to the hobit() function.

gexp <- read.table("../data/c_flexuosa.0516.mini.txt.gz",
                   header = TRUE, sep = "\t", row.names = 1)
group <- c("wet", "wet", "wet", "dry", "dry", "dry")
mapping_table <- read.table("../data/c_flexuosa.homeolog.txt.gz",
                            header = TRUE, sep = "\t")
x <- newExpMX(gexp, group, mapping_table)

4.2.2 Test

Once an ExpMX object is prepared, normalize the counts if the data have not already been normalized, then run the test using the homeoroq() function.

x <- norm_counts(x)
x_output <- homeoroq(x)

4.2.3 Outputs

The result (x_output) is a data frame with one row per homeolog, summarizing the statistical test results.

head(x_output)
##            gene   pvalue qvalue sumexp__wet__C_hirsuta sumexp__wet__C_amara sumexp__dry__C_hirsuta sumexp__dry__C_amara ratio__wet__C_hirsuta ratio__dry__C_hirsuta   ratio_sd
## 1 CARHR000190_H 0.241925      1              383.26142            475.04089              419.43533            285.21986             0.4465343             0.4689173 0.11000959
## 2 CARHR000660_H 0.942875      1               14.57256             46.19000               15.95469             40.22082             0.2398279             0.2567345 0.04073867
## 3 CARHR000770_H 0.990150      1              385.98379             44.96460              850.62370             95.35341             0.8956613             0.9497932 0.01737532
## 4 CARHR000890_H 0.520575      1              181.89090            107.32037               12.38467             10.08555             0.6289205             0.1034599 0.12263357
## 5 CARHR001940_H 0.978275      1              110.98129             81.00309              318.05281            285.09804             0.5780746             0.7970132 0.05006071
## 6 CARHR003740_H 0.197475      1             1549.17077           2270.28481             1179.77571           2596.99673             0.4055999             0.3419580 0.05277229
  • gene: Homeolog ID. The gene names defined in the first column of the mapping table.
  • pvalue: p-value calculated from sampling.
  • qvalue: Adjusted p-value using the Benjamini-Hochberg method.
  • sumexp__*__$: Total read counts for subgenome $ under condition *.
  • ratio__*__$: Homeolog expression ratios for subgenome $ under condition *.
  • ratio_sd: Standard deviation of homeolog expression ratios calculated from observed counts.

The total read counts (sumexp__*__$) and homeolog expression ratios (ratio__*__$) are computed directly from the observed read counts. Note that this differs from the hobit() function, which calculates its statistics from MCMC samples.

Furthermore, the pvalue and qvalue columns in x_output may contain NA values. This happens when at least one gene in a homeolog tuple is not expressed in one or more conditions, making it impossible to calculate expression ratios. As a result, the statistical test cannot be performed for those tuples. These NA values may affect sorting or counting the number of significant homeolog tuples. Therefore, we recommend replacing NA with 1 (i.e., non-significant) before performing downstream analyses. For example:

x_output$pvalue[is.na(x_output$pvalue)] <- 1
x_output$qvalue[is.na(x_output$qvalue)] <- 1

4.2.4 Options

The number of iterations and number of samples drawn per iteration can be adjusted using the chains and iter_sampling arguments, respectively. For example:

x_output <- homeoroq(x, chains = 8, iter_sampling = 1e5)

Additionally, the n_threads option can be used to run iterations in parallel.

x_output <- homeoroq(x, chains = 8, iter_sampling = 1e5, n_threads = 8)