API Reference
Expression Data
The xqubit.ExpressionData class stores a gene expression matrix
together with sample annotation, including experimental conditions, time
points, and biological replicates.
It provides a consistent interface for accessing expression data as a gene × sample matrix, a gene × replicate × condition array, or a gene × condition matrix after replicate averaging. This structure allows downstream functions to handle replicate-aware data and condition-level summaries consistently.
- class xqubit.ExpressionData
Bases: object
Container for gene expression data with experimental design.
This class stores a gene expression matrix together with experimental design information, such as conditions, time points, and biological replicates.
The input matrix is expected to have genes in rows and samples in columns. Internally, the data are also represented as a three-dimensional array with shape gene × replicate × condition.
- __init__(x, design, genes=None)
Create an ExpressionData object.
- Parameters:
x (numpy.ndarray or pandas.DataFrame) – Gene expression matrix with shape (n_genes, n_samples). Rows correspond to genes and columns correspond to samples. Values should be normalized expression values, such as variance-stabilized counts, log-transformed counts, or another comparable expression scale.
design (dict or pandas.DataFrame) – Sample annotation table with one row per sample. The table must contain the following columns (timepoint is optional):
- condition: experimental condition, time point, or stage
- replicate: biological replicate identifier
- timepoint: time point of the sample, as an integer (optional)
The row order of design must match the column order of x.
genes (sequence of str, optional) – Gene identifiers corresponding to the rows of x. If None, gene identifiers are generated automatically.
Notes
Input data are standardized by utils.read_data before being stored. The expression matrix is then reshaped into an array with shape (n_genes, n_replicates, n_conditions). Conditions and replicates are ordered by their sorted unique values in the design table.
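The reshaping described in these notes can be sketched with plain NumPy. This is a minimal illustration, not the package's implementation: the toy matrix and labels are made up, and it assumes a balanced design in which every condition has every replicate.

```python
import numpy as np

# Hypothetical toy input: 2 genes, 4 samples in arbitrary column order.
x = np.array([[1.0, 2.0, 4.0, 7.0],
              [10.0, 20.0, 40.0, 70.0]])
condition = np.array(["t2", "t1", "t1", "t2"])  # one label per sample
replicate = np.array([1, 1, 2, 2])              # one label per sample

conds = np.unique(condition)  # sorted unique conditions -> ["t1", "t2"]
reps = np.unique(replicate)   # sorted unique replicates -> [1, 2]

# Fill a gene x replicate x condition array, one sample column at a time.
cube = np.full((x.shape[0], len(reps), len(conds)), np.nan)
for j in range(x.shape[1]):
    r = np.searchsorted(reps, replicate[j])
    c = np.searchsorted(conds, condition[j])
    cube[:, r, c] = x[:, j]

# Averaging over the replicate axis gives a gene x condition matrix.
avg = cube.mean(axis=1)
```

With an unbalanced design, some cells of `cube` would stay NaN and the plain mean would need to become `np.nanmean`.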
- genes(i=None)
Access gene identifiers.
- Parameters:
i (int, str, or None, optional) – Gene selector.
- If None, return all gene identifiers.
- If int, return the gene identifier at that row index.
- If str, return the row index of the specified gene.
- Returns:
Gene identifiers or a row index, depending on i.
- Return type:
list of str, str, or int
- design(att=None, flatten=False)
Access sample or condition annotation.
- Parameters:
att (str or None, optional) – Annotation column to retrieve. If None, return the full annotation table.
flatten (bool, default=False) – Level of the returned annotation.
- If False, return condition-level annotation aligned with the condition axis of the expression array.
- If True, return sample-level annotation aligned with the original sample order.
- Returns:
Annotation table or one annotation column.
- Return type:
pandas.DataFrame or pandas.Series
- exp(avg_replicates=False, zscore=False, flatten=False)
Retrieve expression values.
- Parameters:
avg_replicates (bool, default=False) – If True, average biological replicates for each condition and return a gene × condition matrix.
zscore (bool, default=False) – If True, standardize expression values for each gene across the last axis of the returned array.
flatten (bool, default=False) – If True, return a gene × sample matrix aligned with the original sample order. This option applies when replicates have not been averaged.
- Returns:
Expression values. Depending on the options, the returned array has one of the following shapes:
- gene × replicate × condition (default)
- gene × condition (avg_replicates=True)
- gene × sample (flatten=True, replicates not averaged)
- Return type:
numpy.ndarray
- plot(gene, file_name=None, figsize=(6.0, 4.0), dpi=300)
Plot the expression profile of one gene.
- Parameters:
gene (int or str) – Gene row index or gene identifier.
file_name (str or None, optional) – Output file path. If provided, the figure is saved to this path. If None, the figure is displayed.
figsize (tuple of float, default=(6.0, 4.0)) – Figure size in inches.
dpi (int, default=300) – Figure resolution in dots per inch.
- Return type:
None
Data Utilities
The xqubit.utils module provides helper functions for reading and
standardizing gene expression data.
It accepts expression data from arrays, data frames, or tabular files, and checks that expression values, gene identifiers, and sample annotation are properly aligned before analysis.
- xqubit.utils.read_data(x, design=None, genes=None, **kwargs)
Read gene expression data and sample annotation.
This function converts input expression data into a standard format used by this package: a numeric gene × sample expression matrix, a sample annotation table, and gene identifiers.
- Parameters:
x (str, os.PathLike, numpy.ndarray, or pandas.DataFrame) – Gene expression data.
- If a file path is given, the file is read with pandas.read_csv.
- If a numpy.ndarray is given, it must have shape (n_genes, n_samples), and design must also be provided.
- If a pandas.DataFrame is given, rows are treated as genes and columns are treated as samples.
design (dict or pandas.DataFrame, optional) – Sample annotation table with one row per sample. The table must contain the following columns:
- condition: experimental condition, time point, or stage
- replicate: biological replicate identifier
The row order of design must match the column order of the expression matrix.
If design is None, sample annotation is inferred from the column names of x. In this case, column names must follow the format <condition>_<replicate>, for example 72h_1, control_2, or DAG10_R1.
genes (sequence of str, optional) – Gene identifiers corresponding to the rows of the expression matrix. If genes is None and x is a pandas.DataFrame with a non-integer index, the DataFrame index is used as gene identifiers. Otherwise, gene identifiers are generated automatically.
**kwargs – Additional keyword arguments passed to pandas.read_csv when x is a file path. If the input file contains gene identifiers in the first column, pass index_col=0 so that the first column is used as the row index rather than being treated as expression values.
- Returns:
Dictionary containing standardized expression data.
- "exp": numpy.ndarray with shape (n_genes, n_samples). Expression matrix stored as float64.
- "design": pandas.DataFrame with shape (n_samples, n_fields). Sample annotation table containing at least condition, timepoint, and replicate.
- "genes": list of str. Gene identifiers aligned with the rows of "exp".
- Return type:
dict
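When design is None, sample annotation comes from column names of the form <condition>_<replicate>. The sketch below shows one way such names could be parsed. The helper name `infer_design` is hypothetical, and splitting at the last underscore is an assumption; the package may resolve names with several underscores differently.

```python
def infer_design(columns):
    """Parse '<condition>_<replicate>' sample names into annotation rows.

    Assumption: the name is split at the LAST underscore, so a condition
    may itself contain underscores but a replicate label may not.
    """
    rows = []
    for name in columns:
        condition, sep, replicate = name.rpartition("_")
        if not sep or not condition or not replicate:
            raise ValueError(f"cannot parse sample name: {name!r}")
        rows.append({"sample": name,
                     "condition": condition,
                     "replicate": replicate})
    return rows


# The three example names given in the parameter description.
design = infer_design(["72h_1", "control_2", "DAG10_R1"])
```

The resulting list of dictionaries has one row per sample, in the same order as the input columns.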
Similarity and Statistical Metrics
The xqubit.metrics module provides functions for calculating gene-wise
scores and gene-gene similarity matrices.
It includes basic gene-wise statistics, Pearson correlation, cosine-squared similarity, and fidelity between state vectors. These functions can be used for gene filtering, similarity analysis, and network construction.
- xqubit.metrics.calc_stats(x, y=None, method='variance', normalize=False)
Calculate gene-wise summary or group-difference scores.
This function computes one score for each gene from a gene × sample expression matrix. The score can be based on overall variation, mean expression, or differential expression across sample groups.
- Parameters:
x (numpy.ndarray, shape (n_genes, n_samples)) – Gene expression matrix. Rows correspond to genes and columns correspond to samples.
y (numpy.ndarray or None, optional) – Group labels for samples. The length of y must match the number of columns in x. This argument is required when method is "ttest" or "anova".
method ({"variance", "mean", "ttest", "anova"}, default="variance") – Scoring method.
- "variance": variance of each gene across samples
- "mean": mean expression level of each gene
- "ttest": absolute Welch’s t-statistic between two groups
- "anova": one-way ANOVA F-statistic across two or more groups
normalize (bool, default=False) – If True, rescale scores to the range [0, 1].
- Returns:
Gene-wise scores.
- Return type:
numpy.ndarray, shape (n_genes,)
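To make the "ttest" score concrete, the following sketch computes the absolute Welch's t-statistic per gene with NumPy and rescales it to [0, 1]. The function names are illustrative, and min-max scaling is an assumption about what normalize=True does; the package may rescale differently.

```python
import numpy as np


def welch_t_abs(x, y):
    """Absolute Welch's t-statistic for each row of x between two groups."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    groups = np.unique(y)
    if len(groups) != 2:
        raise ValueError("exactly two groups are required")
    a = x[:, y == groups[0]]
    b = x[:, y == groups[1]]
    # Welch's statistic uses per-group sample variances (ddof=1), not a pooled one.
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return np.abs(a.mean(axis=1) - b.mean(axis=1)) / se


def minmax(scores):
    """Rescale scores to [0, 1]; assumes the scores are not all equal."""
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo)


# Hypothetical data: gene 0 differs between groups, gene 1 does not.
x = np.array([[1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
              [4.0, 5.0, 6.0, 4.0, 5.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])
t = welch_t_abs(x, y)
scaled = minmax(t)
```

Gene 0 receives the larger score because its group means differ while the within-group variances are the same for both genes.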
- xqubit.metrics.calc_corrcoef(x, method='pearson', diff=False)
Calculate pairwise gene-gene correlation.
- Parameters:
x (numpy.ndarray, shape (n_genes, n_conditions)) – Gene expression matrix, typically after averaging biological replicates for each condition or time point.
method ({"pearson"}, default="pearson") – Correlation method. Currently, only Pearson correlation is supported.
diff (bool, default=False) – If True, calculate correlations using changes between adjacent conditions or time points, defined as x[:, 1:] - x[:, :-1].
- Returns:
Symmetric matrix of pairwise gene-gene correlation coefficients.
- Return type:
numpy.ndarray, shape (n_genes, n_genes)
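The effect of diff can be seen with NumPy alone. The sketch below uses made-up profiles and np.corrcoef rather than the package function, and compares correlation of raw values with correlation of adjacent differences.

```python
import numpy as np

# Hypothetical gene x condition matrix (3 genes, 4 ordered time points).
x = np.array([[1.0, 2.0, 4.0, 7.0],
              [2.0, 4.0, 8.0, 14.0],   # gene 1 = 2 * gene 0
              [7.0, 4.0, 2.0, 1.0]])   # gene 2 decreases over time

# Pearson correlation between rows (genes) on the raw values.
r = np.corrcoef(x)

# Same calculation on changes between adjacent time points.
d = x[:, 1:] - x[:, :-1]
r_diff = np.corrcoef(d)
```

Genes 0 and 2 are negatively correlated on raw values, yet their step-to-step changes (1, 2, 3 and -3, -2, -1) rise together, so the difference-based correlation is positive. The two settings answer different questions about the same profiles.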
- xqubit.metrics.calc_cos2_similarity(x, normalize=False)
Calculate pairwise cosine-squared similarity between genes.
This function compares gene expression profiles by the squared cosine of the angle between two expression vectors. The resulting score is high when two genes have similar expression profile directions, regardless of the sign of the cosine value.
- Parameters:
x (numpy.ndarray, shape (n_genes, n_conditions)) – Gene expression matrix. Rows correspond to genes and columns correspond to conditions, time points, or other ordered measurements.
normalize (bool, default=False) – If True, rescale cosine values to the range [0, 1] before squaring.
- Returns:
Symmetric matrix of pairwise cosine-squared similarity scores.
- Return type:
numpy.ndarray, shape (n_genes, n_genes)
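A minimal NumPy sketch of cosine-squared similarity follows. It covers only the normalize=False case, the function name `cos2` is illustrative, and it assumes no row of the input is all zeros.

```python
import numpy as np


def cos2(x):
    """Pairwise squared cosine between the rows of x (no zero rows assumed)."""
    x = np.asarray(x, dtype=float)
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return (unit @ unit.T) ** 2


# Hypothetical profiles: rows 0 and 1 point in opposite directions.
x = np.array([[1.0, 0.0],
              [-2.0, 0.0],
              [0.0, 3.0],
              [1.0, 1.0]])
s = cos2(x)
```

Rows 0 and 1 have cosine -1 and therefore cosine-squared 1, which is the sign-insensitivity described above; orthogonal rows 0 and 2 score 0.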
- xqubit.metrics.calc_fidelity(x, n=100, seed=None, **kwargs)
Calculate pairwise fidelity between gene-level state vectors.
For two normalized state vectors, fidelity is the squared absolute inner product between them. In this package, it is used as a gene-gene similarity measure after expression profiles have been encoded as normalized state vectors.
- Parameters:
x (numpy.ndarray or data.ExpressionData) – Input data for fidelity computation.
- If a 2D ndarray with shape (n_genes, n_components), it is interpreted as normalized state vectors and fidelity is computed directly.
- If ExpressionData, one replicate is randomly sampled per condition from the 3D expression cube, xqubit.qstate.build is applied, and fidelity is computed. This is repeated n times.
3D ndarray input is not supported.
n (int, default=100) – Number of random sampling runs for ExpressionData input.
seed (int or None, default=None) – Random seed for reproducibility when sampling from ExpressionData.
**kwargs – Additional options.
- seed: int or None, random seed.
- timepoints: sequence for sampled temporary design.
- conditions: sequence for sampled temporary design.
- genes: sequence of gene names for sampled temporary data.
- Other keyword arguments are passed to
- Returns:
Symmetric matrix of pairwise fidelity scores. For ExpressionData input, mean and variance are computed across n runs, and the mean matrix is returned.
- Return type:
numpy.ndarray, shape (n_genes, n_genes)
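For the 2D ndarray case, fidelity is the squared absolute inner product between normalized complex vectors. The sketch below computes all pairs at once with NumPy. The function name and the toy states are illustrative and this is not the package's code.

```python
import numpy as np


def fidelity(states):
    """Pairwise |<a|b>|^2 between rows of a normalized complex array."""
    states = np.asarray(states, dtype=complex)
    # Entry (i, j) of conj(S) @ S.T is the inner product <s_i|s_j>.
    return np.abs(states.conj() @ states.T) ** 2


# Hypothetical two-component state vectors, normalized row by row.
psi = np.array([[1, 0],
                [1, 1j],
                [1, 1]], dtype=complex)
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
f = fidelity(psi)
```

The diagonal equals 1 for normalized states, and the matrix is symmetric because the absolute value removes the phase of each inner product.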
State-Vector Encoding
The xqubit.qstate module provides functions for encoding temporal gene
expression profiles as normalized complex-valued state vectors.
Encoding methods define how expression magnitude and temporal changes are mapped to amplitudes and phases. The resulting state vectors can be compared by fidelity and used to construct gene networks.
- xqubit.qstate.build(x, encoding='TDP', alpha=None, alpha_scale=1.0, weights='amp', output='statevector')
Build state-vector representations from expression profiles.
This function converts each gene expression profile into a normalized complex-valued state vector. These state vectors can then be used to calculate gene-gene similarity by fidelity or by SWAP test circuits.
- Parameters:
x (ExpressionData) – Gene expression data. The design table must contain a timepoint column when temporal encodings are used.
encoding (str, default="TDP") – Encoding method. Supported values are:
- "EA": expression amplitude encoding
- "TDP": temporal-difference phase encoding
- "IDP": integrated-difference phase encoding
- "ODTDP": TDP with orthogonalized direction branches
- "ODIDP": IDP with orthogonalized direction branches
alpha (float or None, optional) – Phase scaling parameter. If None, alpha is determined automatically from the distribution of temporal features.
alpha_scale (float, default=1.0) – Multiplicative factor applied to alpha. Larger values increase phase separation between genes.
weights ({"amp", "phase"}, list of {"amp", "phase"}, or None, default="amp") – Optional weighting for unequally spaced time points.
- "amp": weight amplitudes by interval length
- "phase": weight phase differences by inverse interval length
Weighting is ignored for "EA".
output ({"statevector", "full"}, default="statevector") – Output format.
- "statevector": return only the normalized state vectors
- "full": return state vectors and intermediate components
- Returns:
If output="statevector", returns a complex-valued array with shape (n_genes, n_components).
If output="full", returns a dictionary containing:
- "amplitude": amplitude component
- "phase": phase component
- "statevector": normalized state vectors
- "x": expression array used for encoding
- "z": z-scored expression array used for encoding
- "weights": amplitude and phase weights used for construction
- Return type:
numpy.ndarray or dict
- xqubit.qstate.plot(x, i, encoding=None, file_name=None, title=None, figsize=(4.0, 4.0), dpi=300, xlim=None, ylim=None, alpha=1.0)
Plot a state vector in the complex plane.
- Parameters:
x (numpy.ndarray, shape (n_genes, n_components) or (n_genes, n_replicates, n_components)) – State-vector array returned by build. If 3D, replicates are averaged before plotting.
i (int) – Row index of the gene to plot.
encoding (str or None, optional) – Encoding used to build x. Set this to an encoding containing "OD", such as "ODTDP" or "ODIDP", to display the two orthogonalized direction branches separately.
file_name (str or None, optional) – Output file path. If provided, the figure is saved to this path. If None, the figure is displayed.
title (str or None, optional) – Plot title.
figsize (tuple of float, default=(4.0, 4.0)) – Figure size in inches.
dpi (int, default=300) – Figure resolution in dots per inch.
xlim (tuple of float or None, optional) – Limits of the real axis. If None, limits are determined automatically.
ylim (tuple of float or None, optional) – Limits of the imaginary axis. If None, the same limits as xlim are used.
alpha (float, default=1.0) – Transparency of plotted points and arrows.
- Return type:
None
SWAP Test Simulation
The xqubit.qcircuit module provides circuit-based fidelity estimation
using the SWAP test.
This module is mainly intended for checking or demonstrating how fidelity can be estimated from state vectors using simulated quantum measurements. For large-scale gene-gene similarity analysis, direct numerical fidelity calculation is usually more practical.
- xqubit.qcircuit.swaptest(x, n=None, backend=None, shots=8192, execute=True, seed=None)
Estimate pairwise fidelity using SWAP test circuits.
This function estimates gene-gene fidelity values from normalized state vectors by running a SWAP test for each pair of genes.
For two normalized state vectors, the SWAP test estimates the probability of measuring the ancillary qubit as 0. The fidelity is then calculated from that probability. The result can be used as a gene-gene similarity matrix.
- Parameters:
x (numpy.ndarray, shape (n_genes, n_components)) – Normalized state vectors. Each row corresponds to one gene.
n (int or None, optional) – Number of gene pairs to sample. If None, all upper-triangular gene pairs are evaluated. If an integer is given, only that many pairs are randomly selected.
backend (AerSimulator or None, optional) – Qiskit backend used to run the SWAP test circuits. If None, a default AerSimulator backend is created.
shots (int, default=8192) – Number of measurement shots used for each SWAP test circuit.
execute (bool, default=True) – Whether to run the circuits.
- If True, run the SWAP test circuits and return a fidelity matrix.
- If False, return a representative SWAP test circuit constructed from the first two rows of x. This is useful for inspecting or drawing the circuit.
seed (int or None, optional) – Random seed used when n is specified.
- Returns:
If execute=True, returns a symmetric matrix of pairwise fidelity estimates with shape (n_genes, n_genes).
If execute=False, returns a Qiskit QuantumCircuit object for the first two state vectors.
- Return type:
numpy.ndarray or QuantumCircuit
Notes
The returned matrix contains NaN for pairs that are not evaluated when n is specified.
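In a SWAP test, the probability of measuring the ancillary qubit as 0 is P(0) = (1 + F) / 2, so the fidelity is recovered as F = 2 P(0) - 1. The sketch below simulates only the measurement statistics with NumPy. It builds no circuit and does not use Qiskit, and the function name and the clipping of negative estimates to 0 are illustrative choices.

```python
import numpy as np


def swap_test_estimate(a, b, shots=8192, seed=None):
    """Estimate |<a|b>|^2 from simulated ancilla measurement counts."""
    rng = np.random.default_rng(seed)
    f_true = np.abs(np.vdot(a, b)) ** 2      # exact fidelity
    p0 = 0.5 * (1.0 + f_true)                # SWAP test: P(ancilla = 0)
    zeros = rng.binomial(shots, p0)          # number of shots that gave 0
    # Invert P(0) = (1 + F) / 2; sampling noise can push the estimate
    # slightly below 0, so it is clipped here (an illustrative choice).
    return max(0.0, 2.0 * zeros / shots - 1.0)


# Hypothetical normalized states with exact fidelity 0.5.
a = np.array([1, 0], dtype=complex)
b = np.array([1, 1], dtype=complex) / np.sqrt(2)
est = swap_test_estimate(a, b, shots=8192, seed=0)
```

The estimate fluctuates around the exact value with a standard error of roughly 2 * sqrt(P(0) * (1 - P(0)) / shots), which is why direct numerical fidelity is usually preferable for large gene sets.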
Network Construction and Community Analysis
The xqubit.nx module provides tools for constructing fidelity-based
or other similarity-based gene networks and detecting gene communities.
It includes network construction by similarity thresholding and k-nearest neighbor sparsification, Leiden-based community detection, parameter scanning, parameter ranking, and visualization utilities. These functions support the exploration of transcriptomic structure from gene-gene similarity matrices.
- xqubit.nx.build_network(x, s=0.5, k=10, mutual_knn=True, seed=None)
Build a sparse weighted gene network from a similarity matrix.
This function converts a gene-gene similarity matrix into an undirected weighted graph. Edges are selected by applying a similarity threshold, retaining up to k nearest neighbors for each gene, and optionally keeping only mutual nearest-neighbor relationships.
- Parameters:
x (numpy.ndarray, shape (n_genes, n_genes)) – Symmetric gene-gene similarity matrix. Diagonal values are ignored.
s (float, default=0.5) – Similarity cutoff for edge selection.
- If 0 <= s <= 1, s is used as an absolute similarity cutoff.
- If s > 1, s is interpreted as a percentile of the upper-triangular similarity values.
k (int, default=10) – Maximum number of neighbors retained for each gene after thresholding.
mutual_knn (bool, default=True) – If True, keep an edge only when both genes select each other as neighbors. If False, keep an edge when either gene selects the other.
seed (int or None, optional) – Random seed.
- Returns:
Weighted undirected graph. Edge weights are stored in g.es["weight"].
- Return type:
igraph.Graph
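The edge-selection rules can be sketched as a boolean adjacency matrix in NumPy, without igraph. This is an illustration under assumptions: the function name `knn_edges` is hypothetical, the cutoff comparison is taken to be inclusive (>=), and ties among neighbors are broken by argsort order. The package may differ on each point.

```python
import numpy as np


def knn_edges(sim, s=0.5, k=10, mutual_knn=True):
    """Boolean adjacency from thresholding plus (mutual) k-nearest neighbors."""
    sim = np.array(sim, dtype=float)          # copy, so the input is untouched
    n = sim.shape[0]
    upper = sim[np.triu_indices(n, k=1)]
    # s > 1 is read as a percentile of the upper-triangular values.
    cutoff = np.percentile(upper, s) if s > 1 else s
    np.fill_diagonal(sim, -np.inf)            # ignore self-similarity
    selected = np.zeros((n, n), dtype=bool)
    for i in range(n):
        order = np.argsort(sim[i])[::-1][:k]  # k most similar genes
        keep = order[sim[i, order] >= cutoff] # assumed inclusive cutoff
        selected[i, keep] = True
    # Mutual: both genes chose each other. Otherwise: either gene chose the other.
    return selected & selected.T if mutual_knn else selected | selected.T


# Hypothetical 4-gene similarity matrix.
sim = np.array([[1.0, 0.9, 0.8, 0.1],
                [0.9, 1.0, 0.2, 0.7],
                [0.8, 0.2, 1.0, 0.6],
                [0.1, 0.7, 0.6, 1.0]])
adj_mutual = knn_edges(sim, s=0.5, k=1)
adj_any = knn_edges(sim, s=0.5, k=1, mutual_knn=False)
```

With k=1, only genes 0 and 1 choose each other, so the mutual network has a single edge, while the non-mutual network also keeps the one-sided choices 2→0 and 3→1.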
- xqubit.nx.detect_communities(g, min_size=20, resolution=1.0, n_iterations=100, consensus_threshold=0.8, seed=None, format='list')
Detect gene communities using Leiden clustering.
This function detects communities in a weighted gene network. When n_iterations is greater than 1, Leiden clustering is repeated multiple times, a consensus network is built from co-clustering frequencies, and a final Leiden clustering is performed on the consensus network.
- Parameters:
g (igraph.Graph) – Weighted undirected graph. Edge weights must be stored in g.es["weight"].
min_size (int, default=20) – Minimum size of reported communities. Communities smaller than this value are assigned to community 0.
resolution (float, default=1.0) – Resolution parameter for Leiden clustering. Larger values usually produce more and smaller communities.
n_iterations (int, default=100) – Number of Leiden runs used to build the consensus network. If set to 1, consensus clustering is skipped.
consensus_threshold (float, default=0.8) – Threshold for retaining edges in the consensus network.
- If 0 <= consensus_threshold <= 1, it is used as an absolute co-clustering frequency cutoff.
- If consensus_threshold > 1, it is interpreted as a percentile of upper-triangular consensus values.
seed (int or None, optional) – Random seed. When provided, each repeated run uses a different deterministic seed.
format ({"list", "partition"}, default="list") – Output format.
- "list": return one community label per gene
- "partition": return the Leiden partition object
- Returns:
Community labels or a Leiden partition object, depending on format.
- Return type:
list of int or leidenalg.RBConfigurationVertexPartition
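Two steps of this procedure can be illustrated without running Leiden: building co-clustering frequencies from repeated label vectors, and pooling small communities into community 0. The label vectors below are made up, both function names are hypothetical, and renumbering the surviving communities by decreasing size is an assumption.

```python
import numpy as np


def coclustering_frequency(label_runs):
    """Fraction of runs in which each pair of nodes shares a community."""
    runs = np.asarray(label_runs)                    # (n_runs, n_nodes)
    same = runs[:, :, None] == runs[:, None, :]      # (n_runs, n_nodes, n_nodes)
    return same.mean(axis=0)


def pool_small(labels, min_size):
    """Relabel communities from 1 by decreasing size; small ones become 0."""
    labels = np.asarray(labels)
    ids, counts = np.unique(labels, return_counts=True)
    big = counts >= min_size
    order = np.argsort(-counts[big], kind="stable")  # largest first (assumed)
    out = np.zeros(len(labels), dtype=int)
    for new_id, old_id in enumerate(ids[big][order], start=1):
        out[labels == old_id] = new_id
    return out


# Hypothetical labels from three clustering runs over four genes.
runs = [[0, 0, 1, 1],
        [0, 0, 0, 1],
        [1, 1, 0, 0]]
freq = coclustering_frequency(runs)
consensus = freq >= 0.8          # absolute cutoff, as with consensus_threshold=0.8

pooled = pool_small([5, 5, 5, 2, 2, 9], min_size=2)
```

Label identities are not comparable across runs, which is why the consensus is built from pairwise co-membership rather than from the labels themselves.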
- xqubit.nx.save_communities(x, file_name, nodes=None)
Save gene community assignments to a tab-separated file.
- Parameters:
x (list of int) – Community label for each gene or node. Community 0 is used for genes assigned to small communities that were pooled during post-processing.
file_name (str) – Output file path.
nodes (list of str or None, optional) – Gene or node identifiers corresponding to x. If None, integer node indices are written.
- Return type:
None
- xqubit.nx.plot_communities(x, exp, x_labels=None, i=None, file_name=None, figsize=(4, 4), line_width=1.0, alpha=0.1, ylim=None)
Plot expression profiles for genes grouped by community.
- Parameters:
x (sequence of int) – Community label for each gene. The length must match the number of rows in exp.
exp (numpy.ndarray, shape (n_genes, n_conditions)) – Gene expression matrix to plot.
x_labels (numpy.ndarray or None, optional) – Labels or values shown on the x-axis. If None, column indices of exp are used.
i (int or None, optional) – Community to plot. If None, all communities except community 0 are plotted.
file_name (str or None, optional) – Output file path. If provided, the figure is saved to this path. If None, the figure is displayed.
figsize (float or tuple of float, default=(4, 4)) – Size of each subplot in inches. If a single number is given, the same value is used for width and height.
line_width (float, default=1.0) – Line width for individual gene expression profiles.
alpha (float, default=0.1) – Transparency of individual gene expression profiles.
ylim (tuple of float or None, optional) – Limits of the y-axis. If None, limits are determined automatically.
- Return type:
None
- xqubit.nx.scan_network_params(x, s_cutoffs=[50, 60, 70, 80, 85, 90, 95], k_cutoffs=array([5, 10, 15, 20, 25, 30]), resolutions=array([0.79432823, 1., 1.25892541, 1.58489319, 1.99526231]), min_size=20, n_iterations=100, mutual_knn=True, n_threads=1, seed=None)
Scan network construction and community detection parameters.
This function evaluates combinations of similarity cutoffs, k-nearest neighbor settings, and Leiden resolution parameters. For each parameter set, it builds a gene network, repeats community detection, and summarizes network modularity, clustering stability, gene coverage, and community size statistics.
- Parameters:
x (numpy.ndarray, shape (n_genes, n_genes)) – Symmetric gene-gene similarity matrix.
s_cutoffs (list of float or numpy.ndarray, default=[50, 60, 70, 80, 85, 90, 95]) – Similarity cutoffs passed to build_network. Values greater than 1 are interpreted as percentiles of the upper-triangular similarity values.
k_cutoffs (list of int or numpy.ndarray, default=np.arange(5, 31, 5)) – Values of k passed to build_network.
resolutions (float, list of float, or numpy.ndarray, default=np.logspace(-0.1, 0.3, 5)) – Leiden resolution parameters to evaluate.
min_size (int, default=20) – Minimum size of reported communities. Smaller communities are pooled into community 0.
n_iterations (int, default=100) – Number of repeated Leiden runs for each parameter set.
mutual_knn (bool, default=True) – Whether to use mutual k-nearest-neighbor filtering when building the network.
n_threads (int, default=1) – Number of parallel jobs.
seed (int or None, optional) – Random seed.
- Returns:
Parameter scan results. Each row corresponds to one parameter combination and contains summary statistics for modularity, clustering stability, gene coverage, and community sizes.
- Return type:
pandas.DataFrame
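The default grid can be reproduced with NumPy and itertools to see how many parameter sets a default scan covers. This sketch only enumerates combinations. The iteration order is an assumption, and it does not build networks or run clustering.

```python
import itertools

import numpy as np

# Default values taken from the function signature.
s_cutoffs = [50, 60, 70, 80, 85, 90, 95]
k_cutoffs = np.arange(5, 31, 5)            # 5, 10, 15, 20, 25, 30
resolutions = np.logspace(-0.1, 0.3, 5)    # 10**-0.1 ... 10**0.3

# One tuple (s, k, resolution) per parameter set.
grid = list(itertools.product(s_cutoffs, k_cutoffs, resolutions))
n_sets = len(grid)                         # 7 * 6 * 5 combinations
```

Because each parameter set also repeats community detection n_iterations times, narrowing any of the three lists reduces run time proportionally.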
- xqubit.nx.rank_network_params(df, opt_vars=None, balance_weights=None)
Rank network parameter sets from a parameter scan table.
This function filters parameter sets by acceptable value ranges, identifies Pareto-optimal solutions among selected objective variables, and selects one balanced solution from the Pareto front.
- Parameters:
df (pandas.DataFrame) – Output table from scan_network_params.
opt_vars (dict or None, optional) – Filtering and optimization criteria. Each key is a column name in df. Each value is a dictionary that can contain:
- "min": minimum acceptable value
- "max": maximum acceptable value
- "direction": optimization direction, either "max" or "min"
Variables with "min" or "max" are used for filtering. Variables with "direction" are also used for Pareto-front detection and balanced-solution selection.
balance_weights (dict or None, optional) – Weights used to select the balanced solution from the Pareto front. Keys must match variables in opt_vars that have "direction". If None, all objective variables are weighted equally.
- Returns:
Ranked parameter table with additional columns:
- score: ranking score
- within_opt_range: whether the parameter set passed range filters
- is_pareto: whether the parameter set is Pareto-optimal
- is_balanced: whether the parameter set is the selected balanced solution
- balance_distance: distance from the normalized ideal point
- balance_rank: rank by balance_distance among Pareto-optimal solutions
- Return type:
pandas.DataFrame
Notes
The score column is coded as follows:
- 0: outside the acceptable range
- 1: within the acceptable range
- 2: Pareto-optimal
- 3: selected balanced solution
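Pareto-front detection and balanced-solution selection can be sketched in NumPy as follows. The sketch assumes all objectives are maximized, uses equal weights, and normalizes each objective over the Pareto front before measuring distance to the ideal point. The function names are hypothetical, and the package's normalization and weighting may differ.

```python
import numpy as np


def pareto_mask(values):
    """True for rows not dominated by any other row (all objectives maximized)."""
    values = np.asarray(values, dtype=float)
    mask = np.ones(len(values), dtype=bool)
    for i in range(len(values)):
        # Row j dominates row i if it is >= everywhere and > somewhere.
        dominates_i = (np.all(values >= values[i], axis=1)
                       & np.any(values > values[i], axis=1))
        mask[i] = not dominates_i.any()
    return mask


def balanced_index(values, mask):
    """Index of the Pareto-optimal row closest to the normalized ideal point."""
    values = np.asarray(values, dtype=float)
    front = values[mask]
    lo, hi = front.min(axis=0), front.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)       # avoid division by zero
    scaled = (front - lo) / span                 # each objective in [0, 1]
    dist = np.linalg.norm(1.0 - scaled, axis=1)  # ideal point is all ones
    return int(np.flatnonzero(mask)[np.argmin(dist)])


# Hypothetical scan results with two objectives per parameter set.
values = np.array([[0.9, 0.3],
                   [0.7, 0.7],
                   [0.3, 0.9],
                   [0.5, 0.5],
                   [0.2, 0.2]])
mask = pareto_mask(values)
best = balanced_index(values, mask)
```

The first three rows form the Pareto front, and the middle one is selected because it gives up the least on both objectives at once. An objective to be minimized could be handled by negating its column before calling these functions.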
- xqubit.nx.plot_network_params(df, xlabel='ami_mean', ylabel='modularity_mean', file_name=None)
Plot network parameter scan results.
This function creates an interactive scatter plot from a ranked parameter scan table. Points are grouped by score so that acceptable, Pareto-optimal, and selected balanced parameter sets can be inspected.
- Parameters:
df (pandas.DataFrame) – Output table from rank_network_params.
xlabel (str, default="ami_mean") – Column name used for the x-axis.
ylabel (str, default="modularity_mean") – Column name used for the y-axis.
file_name (str or None, optional) – Output HTML file path. If provided, the plot is saved to this path. If None, the plot is displayed.
- Returns:
This function is used for visualization and does not return a value.
- Return type:
None