5.3 Custom Data Simulation

Instead of relying on the default datasets based on C. flexuosa or wheat, users can supply their own expression matrix through the seed_expmx argument; the mean and dispersion parameters used for simulation are then calculated from that matrix. The matrix should have genes as rows and samples (biological replicates) as columns, with each cell containing a normalized read count.
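To make the idea of "mean and dispersion parameters" concrete, the sketch below derives per-gene values from a small matrix with a method-of-moments estimate under a negative binomial model (variance = mean + dispersion * mean^2). This is an illustration only: the helper estimate_nb_params is hypothetical, the toy values are made up for the example, and the estimator used internally by the package may differ.

```r
# Hypothetical helper, not part of the package: per-gene method-of-moments
# estimates of the mean and negative binomial dispersion.
estimate_nb_params <- function(mx) {
  mx <- as.matrix(mx)
  mu <- rowMeans(mx)            # per-gene mean across replicates
  v  <- apply(mx, 1, var)       # per-gene sample variance
  # NB model: v = mu + phi * mu^2, so phi = (v - mu) / mu^2.
  # Negative estimates are truncated at 0; genes with mean 0 get NA.
  phi <- ifelse(mu > 0, pmax((v - mu) / mu^2, 0), NA_real_)
  data.frame(mean = mu, dispersion = phi)
}

# Toy matrix: 3 genes (rows) x 3 replicates (columns)
toy <- matrix(c( 90,  50,   6,
                  8,  13,   6,
                319, 351, 472), nrow = 3, byrow = TRUE)
params <- estimate_nb_params(toy)
params
```

Genes with few replicates give noisy dispersion estimates, which is one reason a seed matrix with more replicates is generally preferable.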

Below is an example showing how to load a matrix from a file and use it as a template for simulation. Here the C. flexuosa seed file stands in for a user-supplied matrix.

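# Read the seed matrix (the file has no header row, so columns are named V1, V2, ...)
expmx <- read.table("../data/seed.c_flexuosa.txt.gz")
# Preview the first six genes
head(expmx)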

##    V1  V2  V3
## 1  90  50   6
## 2   8  13   6
## 3 319 351 472
## 4 683 671 535
## 5 258 162 160
## 6  69  74  81
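Before passing a custom matrix to the simulator, it can help to run a few basic sanity checks. The following is a minimal sketch: check_seed_matrix is a hypothetical helper, and the conditions it tests (numeric, no missing values, non-negative, at least two replicate columns) are assumptions about what a usable seed matrix looks like, not requirements documented by the package.

```r
# Hypothetical pre-flight check, not part of the package.
check_seed_matrix <- function(mx) {
  mx <- as.matrix(mx)
  if (!is.numeric(mx)) stop("all columns must be numeric")
  if (anyNA(mx)) stop("missing values are not allowed")
  if (any(mx < 0)) stop("values must be non-negative")
  # A variance (and hence a dispersion) cannot be estimated from one sample
  if (ncol(mx) < 2) stop("at least two replicate columns are needed")
  invisible(mx)
}

# Toy data frame shaped like the output of read.table() above
toy <- data.frame(V1 = c(90, 8, 319),
                  V2 = c(50, 13, 351),
                  V3 = c(6, 6, 472))
checked <- check_seed_matrix(toy)
dim(checked)
```

With the matrix loaded above, the same check would be called as check_seed_matrix(expmx).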

# Simulate homeolog counts, using the custom matrix as the seed
x <- sim_homeolog_counts(seed_expmx = expmx)
x

## # 2 subgenome sets (A, B)
## # 10000 homeolog tuples
## ---------------------
## Experiment Design:
##     group replicate
## 1 group_1         1
## 2 group_1         2
## 3 group_1         3
## 4 group_2         1
## 5 group_2         2
## 6 group_2         3
## ---------------------
## > subgenome: A 
##        group_1__1 group_1__2 group_1__3 group_2__1 group_2__2 group_2__3
## gene_1         27         22         16         14         14         22
## gene_2        278          4         12         87        113         93
## gene_3        614        316        433        437        228        631
## gene_4        360        341        311        359        237        411
## gene_5        138         42        214        150        189        165
## gene_6       1028       1063       1171        383        385        342
## +++++++++++++++++++++
## > subgenome: B 
##        group_1__1 group_1__2 group_1__3 group_2__1 group_2__2 group_2__3
## gene_1          9         21         11         21         19         21
## gene_2        124        107        123        218        187        200
## gene_3        162        149        206        180        235        259
## gene_4        373        329        384        353        269        374
## gene_5         63         45         69         75         49         43
## gene_6        101         93        102         67         98         83
## ---------------------

This approach enables users to generate simulated datasets that closely resemble their own study system, facilitating realistic evaluation of HOBIT or related methods.