4.2 HomeoRoq

HomeoRoq (Akama et al. 2014) estimates the probability that homeolog expression ratios remain constant between two experimental groups using RNA-Seq read count data for each homeolog. Since this probability cannot be derived analytically, HomeoRoq adopts a Bayesian framework to approximate the posterior distribution of homeolog expression ratios under the null hypothesis of no change between conditions. The homeoroq() function provides a complete interface for detecting homeologs whose expression ratios differ between two conditions.

By default, it performs 10,000 sampling iterations (iter_sampling = 1e4) per probability calculation and repeats the calculation four times (chians = 4). However, for robust and reproducible estimation of statistical significance (i.e., p-values), it is recommended to run multiple independent sampling replicates, each with a sufficiently large number of iterations. To reproduce the settings used in the original HomeoRoq implementation (Akama et al. 2014), set iter_sampling = 1e4 and chains = 10.

4.2.1 Inputs

The homeoroq() function requires an input object of class ExpMX, same to the hobit() function.

gexp <- read.table("../data/c_flexuosa.0516.mini.txt.gz",
                   header = TRUE, sep = "\t", row.names = 1)
group <- c("wet", "wet", "wet", "dry", "dry", "dry")
mapping_table <- read.table("../data/c_flexuosa.homeolog.txt.gz",
                            header = TRUE, sep = "\t")
x <- newExpMX(gexp, group, mapping_table)

4.2.2 Test

Once an ExpMX object is prepared, normalize the counts if the data have not already been normalized, then run the test using the homeoroq() function.

x <- norm_counts(x)
x_output <- homeoroq(x)

4.2.3 Outputs

The result (x_output) is a data frame with one row per homeolog, summarizing the statistical test results.

head(x_output)

##            gene   pvalue qvalue sumexp__wet__C_hirsuta sumexp__wet__C_amara sumexp__dry__C_hirsuta sumexp__dry__C_amara ratio__wet__C_hirsuta ratio__dry__C_hirsuta   ratio_sd
## 1 CARHR000190_H 0.241925      1              383.26142            475.04089              419.43533            285.21986             0.4465343             0.4689173 0.11000959
## 2 CARHR000660_H 0.942875      1               14.57256             46.19000               15.95469             40.22082             0.2398279             0.2567345 0.04073867
## 3 CARHR000770_H 0.990150      1              385.98379             44.96460              850.62370             95.35341             0.8956613             0.9497932 0.01737532
## 4 CARHR000890_H 0.520575      1              181.89090            107.32037               12.38467             10.08555             0.6289205             0.1034599 0.12263357
## 5 CARHR001940_H 0.978275      1              110.98129             81.00309              318.05281            285.09804             0.5780746             0.7970132 0.05006071
## 6 CARHR003740_H 0.197475      1             1549.17077           2270.28481             1179.77571           2596.99673             0.4055999             0.3419580 0.05277229

gene: Homeolog ID. The gene names defined in the first column of the mapping table.
pvalue: p-value from the likelihood ratio test.
qvalue: Adjusted p-value using the Benjamini-Hochberg method.
sumexp__*__$: Total read counts for subgenome $ under condition *.
ratio__*__$: Homeolog expression ratios for subgenome $ under condition *.
ratio_sd: Standard deviation of homeolog expression ratios calculated from observed counts.

The total read counts (sumexp__*__$) and homeolog expression ratios (ratio__*__$) are computed directly from the observed read counts. Note that this differs from the hobit() function, which calculates its statistics from MCMC samples.

4.2.4 Options

The number of iterations and number of samples drawn per iteration can be adjusted using the chains and iter_sampling arguments, respectively. For example:

x_output <- homeoroq(x, chains = 8, iter_sampling = 1e5)

Additionally, the n_threads option can be used to run iterations in parallel.

x_output <- homeoroq(x, chains = 8, iter_sampling = 1e5, n_threads = 8)