5.3 Custom Data Simulation
Instead of relying on the default datasets based on C. flexuosa or wheat, users can provide their own expression matrix, from which the mean and dispersion parameters used for simulation are estimated. The matrix should have genes as rows and samples (biological replicates) as columns, with each cell containing a normalized read count.
Below is an example showing how to load a custom matrix and use it as a template for simulation.
##    V1  V2  V3
## 1  90  50   6
## 2   8  13   6
## 3 319 351 472
## 4 683 671 535
## 5 258 162 160
## 6  69  74  81
## # 2 subgenome sets (A, B)
## # 10000 homeolog tuples
## ---------------------
## Experiment Design:
##     group replicate
## 1 group_1         1
## 2 group_1         2
## 3 group_1         3
## 4 group_2         1
## 5 group_2         2
## 6 group_2         3
## ---------------------
## > subgenome: A
##        group_1__1 group_1__2 group_1__3 group_2__1 group_2__2 group_2__3
## gene_1         27         22         16         14         14         22
## gene_2        278          4         12         87        113         93
## gene_3        614        316        433        437        228        631
## gene_4        360        341        311        359        237        411
## gene_5        138         42        214        150        189        165
## gene_6       1028       1063       1171        383        385        342
## +++++++++++++++++++++
## > subgenome: B
##        group_1__1 group_1__2 group_1__3 group_2__1 group_2__2 group_2__3
## gene_1          9         21         11         21         19         21
## gene_2        124        107        123        218        187        200
## gene_3        162        149        206        180        235        259
## gene_4        373        329        384        353        269        374
## gene_5         63         45         69         75         49         43
## gene_6        101         93        102         67         98         83
## ---------------------
This approach enables users to generate simulated datasets that closely resemble their own study system, facilitating realistic evaluation of HOBIT or related methods.
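To give a rough idea of what estimating mean and dispersion parameters from a custom matrix involves, the following base-R sketch may help. It is an illustration under stated assumptions, not the package's own procedure: it assumes a negative binomial model with variance mu + phi * mu^2 and a simple method-of-moments estimate of phi, and the names `counts`, `mu`, `phi`, and `sim` are hypothetical. The input values mirror the six rows of the example matrix printed above.

```r
# Hypothetical custom matrix: genes in rows, replicates in columns.
txt <- "90 50 6
8 13 6
319 351 472
683 671 535
258 162 160
69 74 81"
counts <- as.matrix(read.table(text = txt, header = FALSE))

# Per-gene mean and variance across replicates.
mu <- rowMeans(counts)
v  <- apply(counts, 1, var)

# Method-of-moments dispersion (assumed model: var = mu + phi * mu^2).
# Genes whose variance does not exceed the mean get phi = 0.
phi <- pmax((v - mu) / mu^2, 0)

# Draw simulated replicates per gene; fall back to Poisson when phi = 0.
set.seed(1)
n_rep <- 3
sim <- t(sapply(seq_along(mu), function(i) {
  if (phi[i] > 0) {
    rnbinom(n_rep, mu = mu[i], size = 1 / phi[i])
  } else {
    rpois(n_rep, lambda = mu[i])
  }
}))
rownames(sim) <- paste0("gene_", seq_along(mu))
```

With only three replicates per gene, moment estimates of this kind are noisy, so dedicated tools typically share information across genes when estimating dispersion. The sketch is meant only to show how the template matrix determines the simulation parameters.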