API Reference

Expression Data

The xqubit.ExpressionData class stores a gene expression matrix together with sample annotation, including experimental conditions, time points, and biological replicates.

It provides a consistent interface for accessing expression data as a gene × sample matrix, a gene × replicate × condition array, or a gene × condition matrix after replicate averaging. This structure allows downstream functions to handle replicate-aware data and condition-level summaries consistently.

class xqubit.ExpressionData

Bases: object

Container for gene expression data with experimental design.

This class stores a gene expression matrix together with experimental design information, such as conditions, time points, and biological replicates.

The input matrix is expected to have genes in rows and samples in columns. Internally, the data are also represented as a three-dimensional array with shape gene × replicate × condition.

__init__(x, design, genes=None)

Create an ExpressionData object.

Parameters:
  • x (numpy.ndarray or pandas.DataFrame) – Gene expression matrix with shape (n_genes, n_samples). Rows correspond to genes and columns correspond to samples. Values should be normalized expression values, such as variance-stabilized counts, log-transformed counts, or another comparable expression scale.

  • design (dict or pandas.DataFrame) –

    Sample annotation table with one row per sample.

    The table must contain the following columns:

    • condition: experimental condition, time point, or stage

    • replicate: biological replicate identifier

    • timepoint: integer time point of the sample (optional)

    The row order of design must match the column order of x.

  • genes (sequence of str, optional) – Gene identifiers corresponding to the rows of x. If None, gene identifiers are generated automatically.

Notes

Input data are standardized by utils.read_data before being stored. The expression matrix is then reshaped into an array with shape (n_genes, n_replicates, n_conditions). Conditions and replicates are ordered by their sorted unique values in the design table.
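The reshaping described in these Notes can be illustrated with a minimal NumPy sketch. This is not the package's implementation: the helper name to_cube and the NaN fill for missing condition/replicate combinations are assumptions made for illustration only.

```python
import numpy as np

def to_cube(x, conditions, replicates):
    """Rearrange a gene x sample matrix into gene x replicate x condition.

    Hypothetical helper: levels are ordered by their sorted unique values,
    as the Notes describe. Missing combinations are left as NaN (assumption).
    """
    x = np.asarray(x, dtype=float)
    cond_levels = sorted(set(conditions))
    rep_levels = sorted(set(replicates))
    cube = np.full((x.shape[0], len(rep_levels), len(cond_levels)), np.nan)
    for col, (c, r) in enumerate(zip(conditions, replicates)):
        cube[:, rep_levels.index(r), cond_levels.index(c)] = x[:, col]
    return cube, cond_levels, rep_levels

# Two genes, four samples given in an arbitrary sample order.
x = np.arange(8, dtype=float).reshape(2, 4)
conditions = ["t1", "t0", "t0", "t1"]
replicates = [1, 1, 2, 2]
cube, cond_levels, rep_levels = to_cube(x, conditions, replicates)
print(cube.shape)
```

Because the levels are sorted, the condition axis of the result does not depend on the column order of the input matrix.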

genes(i=None)

Access gene identifiers.

Parameters:

i (int, str, or None, optional) –

Gene selector.

  • If None, return all gene identifiers.

  • If int, return the gene identifier at that row index.

  • If str, return the row index of the specified gene.

Returns:

Gene identifiers or a row index, depending on i.

Return type:

list of str, str, or int

design(att=None, flatten=False)

Access sample or condition annotation.

Parameters:
  • att (str or None, optional) – Annotation column to retrieve. If None, return the full annotation table.

  • flatten (bool, default=False) –

    If False, return condition-level annotation aligned with the condition axis of the expression array.

    If True, return sample-level annotation aligned with the original sample order.

Returns:

Annotation table or one annotation column.

Return type:

pandas.DataFrame or pandas.Series

exp(avg_replicates=False, zscore=False, flatten=False)

Retrieve expression values.

Parameters:
  • avg_replicates (bool, default=False) – If True, average biological replicates for each condition and return a gene × condition matrix.

  • zscore (bool, default=False) – If True, standardize expression values for each gene across the last axis of the returned array.

  • flatten (bool, default=False) – If True, return a gene × sample matrix aligned with the original sample order. This option applies when replicates have not been averaged.

Returns:

Expression values. Depending on the options, the returned array has one of the following shapes:

  • gene × replicate × condition

  • gene × condition

  • gene × sample

Return type:

numpy.ndarray
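The effect of avg_replicates and zscore on the returned shape can be sketched as follows. The function summarize is a hypothetical stand-in, not the method itself, and the use of the population standard deviation (ddof=0) and the guard for constant profiles are assumptions.

```python
import numpy as np

def summarize(cube, avg_replicates=False, zscore=False):
    """Illustrative stand-in for the options of exp() (hypothetical helper)."""
    out = np.asarray(cube, dtype=float)
    if avg_replicates:
        # gene x replicate x condition -> gene x condition
        out = out.mean(axis=1)
    if zscore:
        # Standardize each profile along the last axis.
        mu = out.mean(axis=-1, keepdims=True)
        sd = out.std(axis=-1, keepdims=True)  # ddof=0 is an assumption
        out = (out - mu) / np.where(sd == 0, 1.0, sd)
    return out

# One gene, two replicates, three conditions.
cube = np.array([[[1.0, 2.0, 3.0],
                  [3.0, 4.0, 5.0]]])
profile = summarize(cube, avg_replicates=True, zscore=True)
print(profile.shape)
```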

plot(gene, file_name=None, figsize=(6.0, 4.0), dpi=300)

Plot the expression profile of one gene.

Parameters:
  • gene (int or str) – Gene row index or gene identifier.

  • file_name (str or None, optional) – Output file path. If provided, the figure is saved to this path. If None, the figure is displayed.

  • figsize (tuple of float, default=(6.0, 4.0)) – Figure size in inches.

  • dpi (int, default=300) – Figure resolution in dots per inch.

Return type:

None

Data Utilities

The xqubit.utils module provides helper functions for reading and standardizing gene expression data.

It accepts expression data from arrays, data frames, or tabular files, and checks that expression values, gene identifiers, and sample annotation are properly aligned before analysis.

xqubit.utils.read_data(x, design=None, genes=None, **kwargs)

Read gene expression data and sample annotation.

This function converts input expression data into a standard format used by this package: a numeric gene × sample expression matrix, a sample annotation table, and gene identifiers.

Parameters:
  • x (str, os.PathLike, numpy.ndarray, or pandas.DataFrame) –

    Gene expression data.

    • If a file path is given, the file is read with pandas.read_csv.

    • If a numpy.ndarray is given, it must have shape (n_genes, n_samples), and design must also be provided.

    • If a pandas.DataFrame is given, rows are treated as genes and columns are treated as samples.

  • design (dict or pandas.DataFrame, optional) –

    Sample annotation table with one row per sample.

    The table must contain the following columns:

    • condition: experimental condition, time point, or stage

    • replicate: biological replicate identifier

    The row order of design must match the column order of the expression matrix.

    If design is None, sample annotation is inferred from the column names of x. In this case, column names must follow the format <condition>_<replicate>, for example 72h_1, control_2, or DAG10_R1.

  • genes (sequence of str, optional) –

    Gene identifiers corresponding to the rows of the expression matrix.

    If genes is None and x is a pandas.DataFrame with a non-integer index, the DataFrame index is used as gene identifiers. Otherwise, gene identifiers are generated automatically.

  • **kwargs

    Additional keyword arguments passed to pandas.read_csv when x is a file path.

    If the input file contains gene identifiers in the first column, pass index_col=0 so that the first column is used as the row index rather than being treated as expression values.

Returns:

Dictionary containing standardized expression data.

  • "exp": numpy.ndarray with shape (n_genes, n_samples) Expression matrix stored as float64.

  • "design": pandas.DataFrame with shape (n_samples, n_fields) Sample annotation table containing at least condition, timepoint, and replicate.

  • "genes": list of str Gene identifiers aligned with the rows of "exp".

Return type:

dict
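The <condition>_<replicate> naming convention used when design is None can be sketched in plain Python. The helper infer_design is hypothetical, and splitting on the last underscore (so that a condition name may itself contain underscores) is an assumption; the actual parsing rule of read_data is not documented here.

```python
def infer_design(columns):
    """Derive condition and replicate from names like '72h_1' (hypothetical helper)."""
    rows = []
    for name in columns:
        # Split on the LAST underscore (assumption).
        condition, sep, replicate = name.rpartition("_")
        if not sep or not condition or not replicate:
            raise ValueError(f"column name {name!r} is not <condition>_<replicate>")
        rows.append({"sample": name, "condition": condition, "replicate": replicate})
    return rows

design = infer_design(["72h_1", "control_2", "DAG10_R1"])
for row in design:
    print(row)
```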

Similarity and Statistical Metrics

The xqubit.metrics module provides functions for calculating gene-wise scores and gene-gene similarity matrices.

It includes basic gene-wise statistics, Pearson correlation, cosine-squared similarity, and fidelity between state vectors. These functions can be used for gene filtering, similarity analysis, and network construction.

xqubit.metrics.calc_stats(x, y=None, method='variance', normalize=False)

Calculate gene-wise summary or group-difference scores.

This function computes one score for each gene from a gene × sample expression matrix. The score can be based on overall variation, mean expression, or differential expression across sample groups.

Parameters:
  • x (numpy.ndarray, shape (n_genes, n_samples)) – Gene expression matrix. Rows correspond to genes and columns correspond to samples.

  • y (numpy.ndarray or None, optional) –

    Group labels for samples. The length of y must match the number of columns in x.

    This argument is required when method is "ttest" or "anova".

  • method ({"variance", "mean", "ttest", "anova"}, default="variance") –

    Scoring method.

    • "variance": variance of each gene across samples

    • "mean": mean expression level of each gene

    • "ttest": absolute Welch’s t-statistic between two groups

    • "anova": one-way ANOVA F-statistic across two or more groups

  • normalize (bool, default=False) – If True, rescale scores to the range [0, 1].

Returns:

Gene-wise scores.

Return type:

numpy.ndarray, shape (n_genes,)
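As an illustration of the "ttest" score, the absolute Welch's t-statistic can be computed per gene with NumPy alone. This is a sketch of the standard formula, not the package's code; the function name welch_abs_t is hypothetical, and genes with zero variance in both groups are not handled specially here.

```python
import numpy as np

def welch_abs_t(x, y):
    """Absolute Welch's t-statistic per gene for two sample groups (sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    groups = np.unique(y)
    if groups.size != 2:
        raise ValueError("exactly two groups are required")
    a = x[:, y == groups[0]]
    b = x[:, y == groups[1]]
    # Unequal-variance standard error with sample variances (ddof=1).
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    return np.abs(a.mean(axis=1) - b.mean(axis=1)) / se

x = np.array([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
              [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])
scores = welch_abs_t(x, y)
print(scores)
```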

xqubit.metrics.calc_corrcoef(x, method='pearson', diff=False)

Calculate pairwise gene-gene correlation.

Parameters:
  • x (numpy.ndarray, shape (n_genes, n_conditions)) – Gene expression matrix, typically after averaging biological replicates for each condition or time point.

  • method ({"pearson"}, default="pearson") – Correlation method. Currently, only Pearson correlation is supported.

  • diff (bool, default=False) – If True, calculate correlations using changes between adjacent conditions or time points, defined as x[:, 1:] - x[:, :-1].

Returns:

Symmetric matrix of pairwise gene-gene correlation coefficients.

Return type:

numpy.ndarray, shape (n_genes, n_genes)
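The diff option can be illustrated with numpy.corrcoef applied to adjacent differences. The function pearson_matrix is a hypothetical stand-in written for this example, not the package's implementation.

```python
import numpy as np

def pearson_matrix(x, diff=False):
    """Pairwise Pearson correlation between rows of x (sketch)."""
    x = np.asarray(x, dtype=float)
    if diff:
        # Changes between adjacent conditions, as in the parameter description.
        x = x[:, 1:] - x[:, :-1]
    return np.corrcoef(x)

x = np.array([[1.0, 2.0, 4.0, 7.0],
              [2.0, 4.0, 8.0, 14.0],
              [7.0, 4.0, 2.0, 1.0]])
r_raw = pearson_matrix(x)
r_diff = pearson_matrix(x, diff=True)
print(r_raw.shape, r_diff.shape)
```

Note that correlating differences can change the sign of a relationship: a gene that rises ever faster and a gene that falls ever more slowly are negatively correlated in level, but their step-to-step changes both increase.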

xqubit.metrics.calc_cos2_similarity(x, normalize=False)

Calculate pairwise cosine-squared similarity between genes.

This function compares gene expression profiles by the squared cosine of the angle between two expression vectors. The resulting score is high when two genes have similar expression profile directions, regardless of the sign of the cosine value.

Parameters:
  • x (numpy.ndarray, shape (n_genes, n_conditions)) – Gene expression matrix. Rows correspond to genes and columns correspond to conditions, time points, or other ordered measurements.

  • normalize (bool, default=False) – If True, rescale cosine values to the range [0, 1] before squaring.

Returns:

Symmetric matrix of pairwise cosine-squared similarity scores.

Return type:

numpy.ndarray, shape (n_genes, n_genes)
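A minimal sketch of cosine-squared similarity for the default normalize=False case is shown below. The function name cos2_similarity is hypothetical; the normalize option is omitted because its exact rescaling is not specified here, and all-zero rows (which would divide by zero) are not handled.

```python
import numpy as np

def cos2_similarity(x):
    """Squared cosine of the angle between every pair of rows (sketch)."""
    x = np.asarray(x, dtype=float)
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    cos = unit @ unit.T
    return cos ** 2

x = np.array([[1.0, 0.0],
              [-2.0, 0.0],
              [0.0, 3.0]])
sim = cos2_similarity(x)
print(sim)
```

Rows 0 and 1 point in opposite directions (cosine of -1) and still receive the maximum score, which is the sign-independence described above.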

xqubit.metrics.calc_fidelity(x, n=100, seed=None, **kwargs)

Calculate pairwise fidelity between gene-level state vectors.

For two normalized state vectors, fidelity is the squared absolute inner product between them. In this package, it is used as a gene-gene similarity measure after expression profiles have been encoded as normalized state vectors.

Parameters:
  • x (numpy.ndarray | data.ExpressionData) –

    Input data for fidelity computation.

    • If 2D ndarray (n_genes, n_components), interpreted as normalized state vectors and fidelity is computed directly.

    • If ExpressionData, one replicate is randomly sampled per condition from the 3D expression cube, xqubit.qstate.build is applied, and fidelity is computed. This is repeated n times.

    3D ndarray input is not supported.

  • n (int, default=100) – Number of random sampling runs for ExpressionData input.

  • seed (int or None, default=None) – Random seed for reproducibility when sampling from ExpressionData.

  • **kwargs

    Additional options.

    • seed: int or None, random seed.

    • timepoints: sequence for sampled temporary design.

    • conditions: sequence for sampled temporary design.

    • genes: sequence of gene names for sampled temporary data.

    • Other keyword arguments are passed to xqubit.qstate.build.

Returns:

Symmetric matrix of pairwise fidelity scores.

For ExpressionData input, mean and variance are computed across n runs, and the mean matrix is returned.

Return type:

numpy.ndarray, shape (n_genes, n_genes)
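For the 2D ndarray case, the definition of fidelity as the squared absolute inner product can be written directly in NumPy. This is a sketch of the definition under the assumption that each row is already normalized; fidelity_matrix is a hypothetical name, not the package function.

```python
import numpy as np

def fidelity_matrix(states):
    """Pairwise |<a|b>|^2 between rows of normalized state vectors (sketch)."""
    states = np.asarray(states, dtype=complex)
    # Entry (i, j) is the inner product <state_i | state_j>.
    overlap = states.conj() @ states.T
    return np.abs(overlap) ** 2

states = np.array([[1.0, 0.0],
                   [1.0 / np.sqrt(2), 1j / np.sqrt(2)],
                   [0.0, 1.0]])
fid = fidelity_matrix(states)
print(fid)
```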

State-Vector Encoding

The xqubit.qstate module provides functions for encoding temporal gene expression profiles as normalized complex-valued state vectors.

Encoding methods define how expression magnitude and temporal changes are mapped to amplitudes and phases. The resulting state vectors can be compared by fidelity and used to construct gene networks.

xqubit.qstate.build(x, encoding='TDP', alpha=None, alpha_scale=1.0, weights='amp', output='statevector')

Build state-vector representations from expression profiles.

This function converts each gene expression profile into a normalized complex-valued state vector. These state vectors can then be used to calculate gene-gene similarity by fidelity or by SWAP test circuits.

Parameters:
  • x (ExpressionData) – Gene expression data. The design table must contain a timepoint column when temporal encodings are used.

  • encoding (str, default="TDP") –

    Encoding method.

    Supported values are:

    • "EA": expression amplitude encoding

    • "TDP": temporal-difference phase encoding

    • "IDP": integrated-difference phase encoding

    • "ODTDP": TDP with orthogonalized direction branches

    • "ODIDP": IDP with orthogonalized direction branches

  • alpha (float or None, optional) – Phase scaling parameter. If None, alpha is determined automatically from the distribution of temporal features.

  • alpha_scale (float, default=1.0) – Multiplicative factor applied to alpha. Larger values increase phase separation between genes.

  • weights ({"amp", "phase"}, list of {"amp", "phase"}, or None, optional) –

    Optional weighting for unequally spaced time points.

    • "amp": weight amplitudes by interval length

    • "phase": weight phase differences by inverse interval length

    Weighting is ignored for "EA".

  • output ({"statevector", "full"}, default="statevector") –

    Output format.

    • "statevector": return only the normalized state vectors

    • "full": return state vectors and intermediate components

Returns:

If output="statevector", returns a complex-valued array with shape (n_genes, n_components).

If output="full", returns a dictionary containing:

  • "amplitude": amplitude component

  • "phase": phase component

  • "statevector": normalized state vectors

  • "x": expression array used for encoding

  • "z": z-scored expression array used for encoding

  • "weights": amplitude and phase weights used for construction

Return type:

numpy.ndarray or dict

xqubit.qstate.plot(x, i, encoding=None, file_name=None, title=None, figsize=(4.0, 4.0), dpi=300, xlim=None, ylim=None, alpha=1.0)

Plot a state vector in the complex plane.

Parameters:
  • x (numpy.ndarray, shape (n_genes, n_components) or (n_genes, n_replicates, n_components)) – State-vector array returned by build. If 3D, replicates are averaged before plotting.

  • i (int) – Row index of the gene to plot.

  • encoding (str or None, optional) – Encoding used to build x. Set this to an encoding containing "OD", such as "ODTDP" or "ODIDP", to display the two orthogonalized direction branches separately.

  • file_name (str or None, optional) – Output file path. If provided, the figure is saved to this path. If None, the figure is displayed.

  • title (str or None, optional) – Plot title.

  • figsize (tuple of float, default=(4.0, 4.0)) – Figure size in inches.

  • dpi (int, default=300) – Figure resolution in dots per inch.

  • xlim (tuple of float or None, optional) – Limits of the real axis. If None, limits are determined automatically.

  • ylim (tuple of float or None, optional) – Limits of the imaginary axis. If None, the same limits as xlim are used.

  • alpha (float, default=1.0) – Transparency of plotted points and arrows.

Return type:

None

SWAP Test Simulation

The xqubit.qcircuit module provides circuit-based fidelity estimation using the SWAP test.

This module is mainly intended for checking or demonstrating how fidelity can be estimated from state vectors using simulated quantum measurements. For large-scale gene-gene similarity analysis, direct numerical fidelity calculation is usually more practical.

xqubit.qcircuit.swaptest(x, n=None, backend=None, shots=8192, execute=True, seed=None)

Estimate pairwise fidelity using SWAP test circuits.

This function estimates gene-gene fidelity values from normalized state vectors by running a SWAP test for each pair of genes.

For two normalized state vectors, the SWAP test estimates the probability of measuring the ancillary qubit as 0. The fidelity is then calculated from that probability. The result can be used as a gene-gene similarity matrix.

Parameters:
  • x (numpy.ndarray, shape (n_genes, n_components)) – Normalized state vectors. Each row corresponds to one gene.

  • n (int or None, optional) – Number of gene pairs to sample. If None, all upper-triangular gene pairs are evaluated. If an integer is given, only that many pairs are randomly selected.

  • backend (AerSimulator or None, optional) – Qiskit backend used to run the SWAP test circuits. If None, a default AerSimulator backend is created.

  • shots (int, default=2**13) – Number of measurement shots used for each SWAP test circuit.

  • execute (bool, default=True) –

    If True, run the SWAP test circuits and return a fidelity matrix.

    If False, return a representative SWAP test circuit constructed from the first two rows of x. This is useful for inspecting or drawing the circuit.

  • seed (int or None, optional) – Random seed used when n is specified.

Returns:

If execute=True, returns a symmetric matrix of pairwise fidelity estimates with shape (n_genes, n_genes).

If execute=False, returns a Qiskit QuantumCircuit object for the first two state vectors.

Return type:

numpy.ndarray or QuantumCircuit

Notes

The returned matrix contains NaN for pairs that are not evaluated when n is specified.
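The relation between the ancilla measurement and fidelity can be sketched without a quantum backend. For the standard SWAP test, the probability of measuring the ancilla as 0 is P(0) = (1 + F) / 2, so F = 2 P(0) - 1. The helper names below are hypothetical, the shot sampling is a plain Bernoulli simulation rather than a Qiskit run, and clipping negative estimates to 0 is an assumption.

```python
import random

def fidelity_from_counts(zeros, shots):
    """Convert the number of '0' ancilla outcomes into a fidelity estimate."""
    p0 = zeros / shots
    # Shot noise can push 2*p0 - 1 slightly below 0; clipping is an assumption.
    return max(0.0, 2.0 * p0 - 1.0)

def simulate_swap_test(true_fidelity, shots=8192, seed=None):
    """Sample ancilla outcomes for a pair with a known fidelity (toy model)."""
    rng = random.Random(seed)
    p0 = 0.5 * (1.0 + true_fidelity)
    zeros = sum(rng.random() < p0 for _ in range(shots))
    return fidelity_from_counts(zeros, shots)

estimate = simulate_swap_test(0.5, shots=8192, seed=0)
print(estimate)
```

The estimate fluctuates with the number of shots, which is one reason direct numerical fidelity calculation is usually preferable for large gene sets.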

Network Construction and Community Analysis

The xqubit.nx module provides tools for constructing fidelity-based or other similarity-based gene networks and detecting gene communities.

It includes network construction by similarity thresholding and k-nearest neighbor sparsification, Leiden-based community detection, parameter scanning, parameter ranking, and visualization utilities. These functions support the exploration of transcriptomic structure from gene-gene similarity matrices.

xqubit.nx.build_network(x, s=0.5, k=10, mutual_knn=True, seed=None)

Build a sparse weighted gene network from a similarity matrix.

This function converts a gene-gene similarity matrix into an undirected weighted graph. Edges are selected by applying a similarity threshold, retaining up to k nearest neighbors for each gene, and optionally keeping only mutual nearest-neighbor relationships.

Parameters:
  • x (numpy.ndarray, shape (n_genes, n_genes)) – Symmetric gene-gene similarity matrix. Diagonal values are ignored.

  • s (float, default=0.5) –

    Similarity cutoff for edge selection.

    • If 0 <= s <= 1, s is used as an absolute similarity cutoff.

    • If s > 1, s is interpreted as a percentile of the upper-triangular similarity values.

  • k (int, default=10) – Maximum number of neighbors retained for each gene after thresholding.

  • mutual_knn (bool, default=True) – If True, keep an edge only when both genes select each other as neighbors. If False, keep an edge when either gene selects the other.

  • seed (int or None, optional) – Random seed.

Returns:

Weighted undirected graph. Edge weights are stored in g.es["weight"].

Return type:

igraph.Graph
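The edge-selection logic (threshold, then up to k neighbors per gene, then the mutual or either-direction rule) can be sketched with NumPy. This sketch returns a plain edge list instead of an igraph.Graph, the name knn_edges is hypothetical, and details such as treating the cutoff as inclusive are assumptions.

```python
import numpy as np

def knn_edges(sim, s=0.5, k=10, mutual_knn=True):
    """Select (i, j, weight) edges from a symmetric similarity matrix (sketch)."""
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    upper = sim[np.triu_indices(n, k=1)]
    # s > 1 is read as a percentile of the upper-triangular values.
    cutoff = np.percentile(upper, s) if s > 1 else s
    masked = sim.copy()
    np.fill_diagonal(masked, -np.inf)      # diagonal values are ignored
    masked[masked < cutoff] = -np.inf      # inclusive cutoff (assumption)
    selected = np.zeros((n, n), dtype=bool)
    for i in range(n):
        order = np.argsort(-masked[i])[:k]
        keep = order[np.isfinite(masked[i, order])]
        selected[i, keep] = True
    adj = (selected & selected.T) if mutual_knn else (selected | selected.T)
    rows, cols = np.nonzero(np.triu(adj, k=1))
    return [(int(i), int(j), float(sim[i, j])) for i, j in zip(rows, cols)]

sim = np.array([[1.0, 0.9, 0.8, 0.2],
                [0.9, 1.0, 0.3, 0.1],
                [0.8, 0.3, 1.0, 0.7],
                [0.2, 0.1, 0.7, 1.0]])
mutual = knn_edges(sim, s=0.5, k=1, mutual_knn=True)
either = knn_edges(sim, s=0.5, k=1, mutual_knn=False)
print(mutual)
print(either)
```

With k=1, only genes 0 and 1 choose each other, so the mutual rule keeps a single edge, while the either-direction rule also keeps the edges chosen by genes 2 and 3.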

xqubit.nx.detect_communities(g, min_size=20, resolution=1.0, n_iterations=100, consensus_threshold=0.8, seed=None, format='list')

Detect gene communities using Leiden clustering.

This function detects communities in a weighted gene network. When n_iterations is greater than 1, Leiden clustering is repeated multiple times, a consensus network is built from co-clustering frequencies, and a final Leiden clustering is performed on the consensus network.

Parameters:
  • g (igraph.Graph) – Weighted undirected graph. Edge weights must be stored in g.es["weight"].

  • min_size (int, default=20) – Minimum size of reported communities. Communities smaller than this value are assigned to community 0.

  • resolution (float, default=1.0) – Resolution parameter for Leiden clustering. Larger values usually produce more and smaller communities.

  • n_iterations (int, default=100) – Number of Leiden runs used to build the consensus network. If set to 1, consensus clustering is skipped.

  • consensus_threshold (float, default=0.8) –

    Threshold for retaining edges in the consensus network.

    • If 0 <= consensus_threshold <= 1, it is used as an absolute co-clustering frequency cutoff.

    • If consensus_threshold > 1, it is interpreted as a percentile of upper-triangular consensus values.

  • seed (int or None, optional) – Random seed. When provided, each repeated run uses a different deterministic seed.

  • format ({"list", "partition"}, default="list") –

    Output format.

    • "list": return one community label per gene

    • "partition": return the Leiden partition object

Returns:

Community labels or a Leiden partition object, depending on format.

Return type:

list of int or leidenalg.RBConfigurationVertexPartition
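The consensus step can be illustrated by computing co-clustering frequencies from repeated label assignments. The sketch below takes the labels as given rather than running Leiden; the function name coclustering_frequency and the inclusive threshold comparison are assumptions.

```python
import numpy as np

def coclustering_frequency(label_runs):
    """Fraction of runs in which each pair of genes shares a community (sketch).

    label_runs has shape (n_runs, n_genes); label values only need to be
    consistent within a run, not across runs.
    """
    runs = np.asarray(label_runs)
    same = runs[:, :, None] == runs[:, None, :]
    return same.mean(axis=0)

# Three hypothetical clustering runs over four genes.
label_runs = [[0, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 1, 0, 0]]
freq = coclustering_frequency(label_runs)
# Keep pairs whose frequency reaches an absolute cutoff of 0.8.
consensus = (freq >= 0.8) & ~np.eye(freq.shape[0], dtype=bool)
print(freq)
```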

xqubit.nx.save_communities(x, file_name, nodes=None)

Save gene community assignments to a tab-separated file.

Parameters:
  • x (list of int) – Community label for each gene or node. Community 0 is used for genes assigned to small communities that were pooled during post-processing.

  • file_name (str) – Output file path.

  • nodes (list of str or None, optional) – Gene or node identifiers corresponding to x. If None, integer node indices are written.

Return type:

None

xqubit.nx.plot_communities(x, exp, x_labels=None, i=None, file_name=None, figsize=(4, 4), line_width=1.0, alpha=0.1, ylim=None)

Plot expression profiles for genes grouped by community.

Parameters:
  • x (sequence of int) – Community label for each gene. The length must match the number of rows in exp.

  • exp (numpy.ndarray, shape (n_genes, n_conditions)) – Gene expression matrix to plot.

  • x_labels (numpy.ndarray or None, optional) – Labels or values shown on the x-axis. If None, column indices of exp are used.

  • i (int or None, optional) – Community to plot. If None, all communities except community 0 are plotted.

  • file_name (str or None, optional) – Output file path. If provided, the figure is saved to this path. If None, the figure is displayed.

  • figsize (float or tuple of float, default=(4, 4)) – Size of each subplot in inches. If a single number is given, the same value is used for width and height.

  • line_width (float, default=1.0) – Line width for individual gene expression profiles.

  • alpha (float, default=0.1) – Transparency of individual gene expression profiles.

  • ylim (tuple of float or None, optional) – Limits of the y-axis. If None, limits are determined automatically.

Return type:

None

xqubit.nx.scan_network_params(x, s_cutoffs=[50, 60, 70, 80, 85, 90, 95], k_cutoffs=array([5, 10, 15, 20, 25, 30]), resolutions=array([0.79432823, 1., 1.25892541, 1.58489319, 1.99526231]), min_size=20, n_iterations=100, mutual_knn=True, n_threads=1, seed=None)

Scan network construction and community detection parameters.

This function evaluates combinations of similarity cutoffs, k-nearest neighbor settings, and Leiden resolution parameters. For each parameter set, it builds a gene network, repeats community detection, and summarizes network modularity, clustering stability, gene coverage, and community size statistics.

Parameters:
  • x (numpy.ndarray, shape (n_genes, n_genes)) – Symmetric gene-gene similarity matrix.

  • s_cutoffs (list of float or numpy.ndarray, default=[50, 60, 70, 80, 85, 90, 95]) – Similarity cutoffs passed to build_network. Values greater than 1 are interpreted as percentiles of the upper-triangular similarity values.

  • k_cutoffs (list of int or numpy.ndarray, default=np.arange(5, 31, 5)) – Values of k passed to build_network.

  • resolutions (float, list of float, or numpy.ndarray, default=np.logspace(-0.1, 0.3, 5)) – Leiden resolution parameters to evaluate.

  • min_size (int, default=20) – Minimum size of reported communities. Smaller communities are pooled into community 0.

  • n_iterations (int, default=100) – Number of repeated Leiden runs for each parameter set.

  • mutual_knn (bool, default=True) – Whether to use mutual k-nearest-neighbor filtering when building the network.

  • n_threads (int, default=1) – Number of parallel jobs.

  • seed (int or None, optional) – Random seed.

Returns:

Parameter scan results. Each row corresponds to one parameter combination and contains summary statistics for modularity, clustering stability, gene coverage, and community sizes.

Return type:

pandas.DataFrame

xqubit.nx.rank_network_params(df, opt_vars=None, balance_weights=None)

Rank network parameter sets from a parameter scan table.

This function filters parameter sets by acceptable value ranges, identifies Pareto-optimal solutions among selected objective variables, and selects one balanced solution from the Pareto front.

Parameters:
  • df (pandas.DataFrame) – Output table from scan_network_params.

  • opt_vars (dict or None, optional) –

    Filtering and optimization criteria.

    Each key is a column name in df. Each value is a dictionary that can contain:

    • "min": minimum acceptable value

    • "max": maximum acceptable value

    • "direction": optimization direction, either "max" or "min"

    Variables with "min" or "max" are used for filtering. Variables with "direction" are also used for Pareto-front detection and balanced-solution selection.

  • balance_weights (dict or None, optional) – Weights used to select the balanced solution from the Pareto front. Keys must match variables in opt_vars that have "direction". If None, all objective variables are weighted equally.

Returns:

Ranked parameter table with additional columns:

  • score: ranking score

  • within_opt_range: whether the parameter set passed range filters

  • is_pareto: whether the parameter set is Pareto-optimal

  • is_balanced: whether the parameter set is the selected balanced solution

  • balance_distance: distance from the normalized ideal point

  • balance_rank: rank by balance_distance among Pareto-optimal solutions

Return type:

pandas.DataFrame

Notes

The score column is coded as follows:

  • 0: outside the acceptable range

  • 1: within the acceptable range

  • 2: Pareto-optimal

  • 3: selected balanced solution
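Pareto-front detection over variables that carry a "direction" can be sketched in plain Python. This is a generic dominance check, not the package's code; the function name pareto_front and the example values are made up for illustration, with column names borrowed from the defaults of plot_network_params.

```python
def pareto_front(rows, directions):
    """Flag rows that no other row dominates (sketch).

    directions maps a column name to "max" or "min".
    """
    def oriented(row):
        # Flip the sign of minimized variables so that larger is always better.
        return [row[key] if d == "max" else -row[key] for key, d in directions.items()]

    points = [oriented(row) for row in rows]
    flags = []
    for i, p in enumerate(points):
        dominated = any(
            all(qv >= pv for qv, pv in zip(q, p)) and any(qv > pv for qv, pv in zip(q, p))
            for j, q in enumerate(points)
            if j != i
        )
        flags.append(not dominated)
    return flags

# Made-up scan results for four parameter sets.
rows = [
    {"ami_mean": 0.9, "modularity_mean": 0.5},
    {"ami_mean": 0.8, "modularity_mean": 0.7},
    {"ami_mean": 0.7, "modularity_mean": 0.6},
    {"ami_mean": 0.6, "modularity_mean": 0.8},
]
directions = {"ami_mean": "max", "modularity_mean": "max"}
is_pareto = pareto_front(rows, directions)
print(is_pareto)
```

The third parameter set is worse than the second on both objectives, so it is the only one excluded from the front.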

xqubit.nx.plot_network_params(df, xlabel='ami_mean', ylabel='modularity_mean', file_name=None)

Plot network parameter scan results.

This function creates an interactive scatter plot from a ranked parameter scan table. Points are grouped by score so that acceptable, Pareto-optimal, and selected balanced parameter sets can be inspected.

Parameters:
  • df (pandas.DataFrame) – Output table from rank_network_params.

  • xlabel (str, default="ami_mean") – Column name used for the x-axis.

  • ylabel (str, default="modularity_mean") – Column name used for the y-axis.

  • file_name (str or None, optional) – Output HTML file path. If provided, the plot is saved to this path. If None, the plot is displayed.

Returns:

This function is used for visualization and does not return a value.

Return type:

None