straindesign.compression

Metabolic network compression using nullspace-based coupling detection.

This module provides network compression for COBRA models. The main function, compress_cobra_model(), compresses a model and returns transformation matrices for converting between the original and compressed flux spaces.

API:

>>> from straindesign.compression import compress_cobra_model
>>> result = compress_cobra_model(model)
>>> compressed_model = result.compressed_model

Note: The model should be preprocessed first (coefficients converted to rationals, conservation relations removed). Use networktools.compress_model() for automatic preprocessing.

Module Contents

class straindesign.compression.CompressionConverter(reaction_map: Dict[str, Dict[str, float | fractions.Fraction]], metabolite_map: Dict[str, Dict[str, float | fractions.Fraction]], flipped_reactions: List[str])

Bidirectional transformer for expressions between original and compressed spaces.

expand_expression(expression: Dict[str, float], remove_missing: bool = False) → Dict[str, float]: Transform an expression from the compressed space back to the original space.

class straindesign.compression.CompressionMethod

Bases: enum.Enum

Compression methods for metabolic network compression.

classmethod standard() → List[CompressionMethod]: Return the standard (recommended) compression methods.

class straindesign.compression.CompressionRecord(pre: RationalMatrix, cmp: RationalMatrix, post: RationalMatrix, meta_names: List[str], stats: CompressionStatistics | None = None)

Compression result with transformation matrices.

Satisfies pre @ stoich @ post == cmp. To expand an EFM of the compressed network into the original flux space, use efm_original = post @ efm_compressed.
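A small hand-built example of the two identities above, on a hypothetical network r1: → A, r2: A → B, r3: B →, r4: B →. The matrices are written out by hand as dense NumPy arrays (not RationalMatrix objects produced by the library): r1 and r2 are coupled and lumped into one reaction, and pre drops metabolite A, which is balanced inside the lumped reaction.

```python
import numpy as np

# Hypothetical toy network (matrices written by hand, not produced by the library):
# r1: -> A, r2: A -> B, r3: B ->, r4: B ->
# rows = metabolites (A, B), columns = reactions (r1, r2, r3, r4)
stoich = np.array([
    [1, -1, 0, 0],   # A
    [0, 1, -1, -1],  # B
])

# r1 and r2 are coupled (A is produced only by r1 and consumed only by r2),
# so they are lumped into one compressed reaction R12.
# post maps compressed reactions (R12, r3, r4) to original reactions.
post = np.array([
    [1, 0, 0],  # r1
    [1, 0, 0],  # r2
    [0, 1, 0],  # r3
    [0, 0, 1],  # r4
])

# Inside R12, metabolite A is balanced, so pre keeps only metabolite B.
pre = np.array([[0, 1]])

cmp = pre @ stoich @ post
print(cmp)  # [[ 1 -1 -1]]

# An EFM of the compressed network (R12 together with r3) expanded to the original space.
efm_compressed = np.array([1, 1, 0])
efm_original = post @ efm_compressed
print(efm_original)  # [1 1 1 0]
assert not np.any(stoich @ efm_original)  # still a steady-state flux vector
```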

class straindesign.compression.CompressionResult(compressed_model, compression_converter, pre_matrix, post_matrix, reaction_map, metabolite_map, statistics, methods_used, original_reaction_names, original_metabolite_names, flipped_reactions): Result of COBRA model compression.

class straindesign.compression.CompressionStatistics: Tracks compression statistics for logging.

class straindesign.compression.RationalMatrix(rows: int, cols: int)

Sparse rational matrix using dual integer sparse storage (one int64 sparse array of numerators and one of denominators).

add_scaled_column(dst_col: int, src_col: int, scalar_num: int, scalar_den: int) → None: Add (scalar_num/scalar_den) * column[src_col] to column[dst_col], i.e. dst[i] += (num/den) * src[i] for every row i.

begin_batch_edit(): Enter batch edit mode, which delays cache invalidation until end_batch_edit() is called.

clone() → RationalMatrix: Create a deep copy.

end_batch_edit(): Exit batch edit mode; converts back to CSR and invalidates the cache.

classmethod from_cobra_model(model, max_precision: int = 6, max_denom: int = 100) → RationalMatrix: Create a RationalMatrix from the stoichiometry of a COBRA model.

classmethod from_numpy(arr: numpy.ndarray, max_precision: int = 6, max_denom: int = 100) → RationalMatrix: Create a RationalMatrix from a NumPy array.

get_signum(row: int, col: int) → int: Return the sign of an element: -1, 0, or 1.

classmethod identity(size: int) → RationalMatrix: Create an identity matrix.

is_bigint() → bool: True if this matrix is stored in big-integer mode (coefficients exceed int64, so it is kept as a dict of exact Fractions rather than as scipy int64 sparse arrays).

iter_column_fractions(col: int) → Iterator[Tuple[int, fractions.Fraction]]: Iterate over the non-zero entries of a column as (row, Fraction) pairs.

remove_columns(keep_indices: List[int]) → None: Keep only the specified columns.

remove_rows(keep_indices: List[int]) → None: Keep only the specified rows.

submatrix(rows: int, cols: int) → RationalMatrix: Extract the top-left submatrix of the given dimensions.

to_coo_exact() → ExactCOO

Big-integer-safe exact sparse export, valid in both storage modes.

Returns ExactCOO(rows, cols, data, shape, denom) where entry (rows[k], cols[k]) equals data[k] / denom exactly. data are arbitrary-precision Python ints (no int64 ceiling).
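As a sketch of how the exact export can be consumed, the snippet below rebuilds dense Fraction values from the five fields. ExactCOOSketch is a hypothetical stand-in with the same field names; the real ExactCOO type may be a different kind of record, so treat the attribute access as an assumption.

```python
from fractions import Fraction
from typing import List, NamedTuple, Tuple


class ExactCOOSketch(NamedTuple):
    """Hypothetical stand-in mirroring the documented ExactCOO fields."""
    rows: List[int]
    cols: List[int]
    data: List[int]         # arbitrary-precision numerators
    shape: Tuple[int, int]
    denom: int              # common denominator shared by every entry


def to_dense_fractions(coo):
    """Rebuild a dense list-of-lists of exact Fractions from the COO fields."""
    dense = [[Fraction(0)] * coo.shape[1] for _ in range(coo.shape[0])]
    for r, c, num in zip(coo.rows, coo.cols, coo.data):
        dense[r][c] = Fraction(num, coo.denom)  # entry == data[k] / denom
    return dense


big = 2 ** 70  # larger than any int64, still exact as a Python int
coo = ExactCOOSketch(rows=[0, 1], cols=[1, 0], data=[big, -3], shape=(2, 2), denom=6)
dense = to_dense_fractions(coo)
print(dense[1][0])  # -1/2
```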

to_numpy() → numpy.ndarray: Convert to a NumPy float array.

to_sparse_csr() → Tuple[scipy.sparse.csr_matrix, int]

Return a sparse representation and the common denominator.

Returns the numerator matrix, scaled by the LCM of all denominators, together with that LCM.
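The LCM scaling can be illustrated with plain Fractions and SciPy. This is a minimal sketch on hypothetical entries, not the method's implementation; it only shows that dividing the returned numerator matrix by the returned LCM recovers the rational values.

```python
from fractions import Fraction
from math import lcm

import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical rational entries {(row, col): value} of a 2 x 3 matrix.
entries = {(0, 0): Fraction(1, 2), (0, 2): Fraction(-1, 3), (1, 1): Fraction(2)}
shape = (2, 3)

# Common denominator = LCM of all entry denominators (here 2, 3 and 1).
common = lcm(*(f.denominator for f in entries.values()))

rows, cols, nums = [], [], []
for (r, c), f in entries.items():
    rows.append(r)
    cols.append(c)
    nums.append(f.numerator * (common // f.denominator))  # exact integer numerator

num_csr = csr_matrix((np.array(nums, dtype=np.int64), (rows, cols)), shape=shape)

print(common)                      # 6
print(num_csr.toarray().tolist())  # [[3, 0, -2], [0, 12, 0]]
# Dividing by the LCM recovers the original values (as floats here).
values = num_csr.toarray() / common
```

With large denominators the scaled numerators can exceed int64; to_coo_exact() is the documented big-integer-safe export for that case.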

to_sparse_pattern() → Tuple[scipy.sparse.csr_matrix, Dict[int, Dict[int, fractions.Fraction]]]

Return sparsity pattern as CSR and row-wise Fraction data.

For pattern analysis (coupled reaction detection) without integer overflow. Works in both the int64 and big-integer (dict-of-Fractions) storage modes.

Returns:

pattern: CSR matrix with 1s at non-zero positions (for indptr/indices)
row_data: {row: {col: Fraction}} holding the actual values

class straindesign.compression.StoichMatrixCompressor(*methods: CompressionMethod)

Nullspace-based metabolic network compression.

compress(stoich: RationalMatrix, meta_names: List[str], reac_names: List[str], suppressed: Set[str] = set(), bounds: List[Tuple[float, float]] | None = None, protected: Set[str] = set()) → CompressionRecord

Compress the network and return the transformation matrices.

Single-pass approach: each iteration computes the nullspace once, then both removes zero-flux reactions and merges coupled groups using that same kernel. It re-iterates only when contradicting groups were removed, because removing them may expose new couplings.

protected is an optional set of reaction names that must not be merged into coupled groups; they are kept intact while the rest of their coupled group still merges. Unlike suppressed reactions, protected reactions are not removed.
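To illustrate how a single kernel yields both zero-flux reactions and coupled groups, here is a minimal pure-Python sketch on a hypothetical toy network. It is not the library's implementation: the kernel K is written out by hand, and detect_coupling is a hypothetical helper that ignores bounds, protected/suppressed reactions, and contradicting groups. A reaction whose kernel row is all zero carries no flux in any steady state; two reactions whose kernel rows are proportional have proportional fluxes.

```python
from fractions import Fraction

# Hypothetical toy network: r1: -> A, r2: A -> B, r3: B ->, r4: B ->, r5: A -> C.
# C has no consumer, so r5 can only carry zero flux at steady state.
reactions = ["r1", "r2", "r3", "r4", "r5"]
stoich = [
    [1, -1, 0, 0, -1],  # A
    [0, 1, -1, -1, 0],  # B
    [0, 0, 0, 0, 1],    # C
]
# A kernel basis (one row per reaction, one column per basis vector),
# worked out by hand for this example.
K = [
    [1, 1],  # r1
    [1, 1],  # r2
    [1, 0],  # r3
    [0, 1],  # r4
    [0, 0],  # r5
]
# Sanity check: every kernel column v satisfies stoich @ v == 0.
assert all(
    sum(s * k for s, k in zip(met_row, col)) == 0
    for met_row in stoich
    for col in zip(*K)
)


def detect_coupling(K, names):
    """Hypothetical helper: zero-flux reactions and proportional-row groups."""
    zero_flux = []
    groups = {}  # normalized kernel row -> [(name, leading entry), ...]
    for name, row in zip(names, K):
        lead = next((Fraction(x) for x in row if x != 0), None)
        if lead is None:
            zero_flux.append(name)  # all-zero row: flux is 0 in every steady state
            continue
        key = tuple(Fraction(x) / lead for x in row)  # leading entry becomes 1
        groups.setdefault(key, []).append((name, lead))
    # factor c means: flux(member) == c * flux(first member of the group)
    coupled = [
        [(n, lead / members[0][1]) for n, lead in members]
        for members in groups.values()
        if len(members) > 1
    ]
    return zero_flux, coupled


zero_flux, coupled = detect_coupling(K, reactions)
print(zero_flux)  # ['r5']
print(coupled)    # [[('r1', Fraction(1, 1)), ('r2', Fraction(1, 1))]]
```

Normalizing each row by its first non-zero entry turns the proportionality test into an exact dictionary lookup, which is only reliable because the arithmetic is rational rather than floating-point.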

straindesign.compression.basic_columns(matrix: RationalMatrix) → List[int]: Find indices of basic (pivot) columns using integer RREF.

straindesign.compression.basic_columns_from_numpy(mx: numpy.ndarray) → List[int]: Find basic columns from a numpy array.
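One common way to do integer RREF is fraction-free elimination: rows are combined by cross-multiplication and then divided by their GCD so entries stay small. The sketch below (basic_columns_int is a hypothetical helper working on lists of ints, not the library's function) returns the pivot columns this way. Applied to the transpose of a stoichiometric matrix, the basic columns pick a set of linearly independent metabolite rows.

```python
from functools import reduce
from math import gcd


def basic_columns_int(rows):
    """Hypothetical helper: pivot column indices of an integer matrix.

    Fraction-free Gauss-Jordan elimination: rows are combined by
    cross-multiplication and divided by their GCD, so every entry stays an
    exact Python int (no overflow, no rounding).
    """
    m = [list(r) for r in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    pivots = []
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break  # every row already has a pivot
        p = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot here: column c depends on earlier pivot columns
        m[r], m[p] = m[p], m[r]
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                a, b = m[r][c], m[i][c]
                # a*row_i - b*row_r has a zero in column c
                m[i] = [a * x - b * y for x, y in zip(m[i], m[r])]
                g = reduce(gcd, m[i], 0)
                if g > 1:
                    m[i] = [x // g for x in m[i]]  # keep entries small
        pivots.append(c)
        r += 1
    return pivots


S = [[1, -1, 0, 0], [0, 1, -1, -1]]
print(basic_columns_int(S))  # [0, 1]

# Rows ATP and ADP of r1: ATP -> ADP, r2: ADP -> ATP are dependent
# (ATP + ADP is conserved); the transpose keeps only one of them.
S_cons = [[-1, 1], [1, -1]]
print(basic_columns_int([list(col) for col in zip(*S_cons)]))  # [0]
```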

straindesign.compression.compress_cobra_model(model, methods: List[str | CompressionMethod] | None = None, in_place: bool = True, suppressed_reactions: Set[str] = set(), protected_reactions: Set[str] = set()) → CompressionResult

Compress a COBRA model using nullspace-based coupling detection.

Note: The model should be preprocessed first (coefficients converted to rationals, conservation relations removed). Use networktools.compress_model() for automatic preprocessing.

Parameters:

model – COBRA model to compress (should be preprocessed)
methods – Compression methods. Default: CompressionMethod.standard()
in_place – Modify the original model (True) or work on a copy (False)
suppressed_reactions – Reaction names removed from the network before compression (destructive; the reactions are deleted from the stoichiometric matrix). Note that this changes the nullspace, so it is not the right tool for merely keeping reactions intact.
protected_reactions – Reaction names kept in the network but exempted from being merged into a coupled (lumped) group. Non-destructive: the reactions stay, only their lumping is prevented.

Returns:

CompressionResult with compressed model and transformation data

straindesign.compression.compress_model(model, no_par_compress_reacs=set(), compression_backend='sparse_rref', propagate_gpr=False, no_coupled_compress_reacs=set())

Compress a metabolic model using multiple techniques.

Performs blocked reaction removal, conservation relation removal, and alternating dependent/parallel reaction lumping until no further compression is possible.

Parameters:

model – COBRA model to compress in-place
no_par_compress_reacs – Reactions exempt from parallel compression
no_coupled_compress_reacs – Reactions exempt from coupled compression. The rest of their coupled group still merges. Used to keep gene-controlled reactions un-merged through COMPRESS#1 so that, once GPR rules are integrated, gene multiplicity is preserved exactly (correct gene-regulatory semantics under compression). To also exempt them from parallel merging, include them in no_par_compress_reacs.
compression_backend – Compression backend to use:
- ‘sparse_rref’ (default): pure-Python sparse integer RREF; no external dependencies beyond NumPy/SciPy.
- ‘efmtool_rref’ (legacy): Java-based EFMTool via JPype; requires a JVM and the jpype1 package.
propagate_gpr – If True, propagate and simplify GPR rules through compression (AND for coupled merges, OR for parallel merges). Empty GPR rules (always active) are skipped in AND and absorb in OR (the merged reaction is then always active). Uses sympy for boolean simplification. Default False.

Returns:

Compression maps for reversing each compression step

Return type:

list of dict
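The empty-rule convention for propagate_gpr can be made concrete with plain string handling. combine_gpr is a hypothetical helper written for illustration; unlike the library, it does not simplify the resulting Boolean expression with sympy.

```python
def _wrap(rule):
    """Parenthesize a rule made of several tokens so operator precedence is kept."""
    rule = rule.strip()
    return f"({rule})" if " " in rule else rule


def combine_gpr(rules, op):
    """Hypothetical helper: combine GPR strings with 'and' or 'or'.

    'and' (coupled merge): empty rules mean "always active" and are skipped.
    'or' (parallel merge): one empty rule makes the merged reaction always
    active, so the result is the empty rule.
    """
    if op not in ("and", "or"):
        raise ValueError("op must be 'and' or 'or'")
    if op == "or" and any(not r.strip() for r in rules):
        return ""
    parts = [r.strip() for r in rules if r.strip()]
    if len(parts) == 1:
        return parts[0]
    return f" {op} ".join(_wrap(r) for r in parts)


print(combine_gpr(["g1", "", "g2 or g3"], "and"))  # g1 and (g2 or g3)
print(repr(combine_gpr(["g1", ""], "or")))         # ''
```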

straindesign.compression.compress_model_coupled(model, compression_backend='sparse_rref', propagate_gpr=False, suppressed_reactions=set(), protected_reactions=set())

Compress by lumping stoichiometrically coupled (dependent) reactions.

Identifies groups of reactions whose fluxes are proportional in every steady state (equivalently, whose rows in the nullspace matrix are proportional) and merges each group into a single lumped reaction. Both the pure-Python and the legacy Java backends perform this operation; compression_backend selects the nullspace algorithm.

Parameters:

model – COBRA model to compress in-place
compression_backend – ‘sparse_rref’ (default, Python) or ‘efmtool_rref’ (Java legacy)
propagate_gpr – If True, AND-combine GPR rules of merged reactions (with sympy simplification). Empty GPRs are skipped. Default False.
suppressed_reactions – Set of reaction IDs to exclude from compression (Java backend only). Used to protect reactions referenced in strain design constraints from being deleted by the Java compressor’s CoupledContradicting logic. Ignored for the Python backend (which handles contradicting groups correctly via bounds intersection).
protected_reactions – Set of reaction IDs to exempt from coupled merging (kept as their own reactions; the rest of their coupled group still merges). Python (sparse_rref) backend only. Used to keep gene-controlled reactions intact through compression before GPR integration so that the gene multiplicity is preserved (correct gene-regulatory semantics).

Returns:

Mapping {compressed_id: {orig_id: factor, …}}

Return type:

dict

straindesign.compression.compress_model_parallel(model, protected_rxns=set(), propagate_gpr=False)

Compress by lumping parallel reactions.

Parameters:

model – COBRA model to compress in-place
protected_rxns – Reactions exempt from parallel compression
propagate_gpr – If True, OR-combine GPR rules of lumped reactions (with sympy simplification). Default False.

Returns:

Mapping {compressed_id: {orig_id: factor, …}}

Return type:

dict
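The docstring does not define "parallel" precisely. The sketch below assumes that two reactions are parallel when their stoichiometric columns are positive multiples of each other (for example, isoenzymes catalyzing the same conversion); parallel_groups is a hypothetical helper, and the library's exact criterion, including how it treats reversibility and bounds, may differ.

```python
from fractions import Fraction


def parallel_groups(columns, names):
    """Hypothetical helper: group reactions whose columns are positive multiples.

    Assumption for illustration only: 'parallel' means the same stoichiometric
    column up to a positive factor, so opposite directions are not grouped.
    """
    groups = {}
    for name, col in zip(names, columns):
        lead = next((Fraction(x) for x in col if x != 0), None)
        if lead is None:
            continue  # empty column: not considered here
        key = tuple(Fraction(x) / abs(lead) for x in col)  # divide by |lead|: sign is kept
        groups.setdefault(key, []).append(name)
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical columns over metabolites (A, B).
names = ["r1", "r2", "r3", "r4", "r7"]
columns = [
    [1, 0],    # r1: -> A
    [-1, 1],   # r2: A -> B
    [0, -1],   # r3: B ->
    [0, -1],   # r4: B -> (e.g. an isoenzyme of r3)
    [1, -1],   # r7: B -> A (opposite direction to r2, not grouped with it)
]
print(parallel_groups(columns, names))  # [['r3', 'r4']]
```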

straindesign.compression.detect_max_precision(model) → int: Detect the maximum decimal precision needed for the model coefficients.

straindesign.compression.float_to_rational(val, max_precision: int = 6, max_denom: int = 100) → fractions.Fraction: Convert a float to a Fraction with bounded denominators.
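The exact rounding rule behind max_precision and max_denom is not documented here. As an assumption for illustration, the sketch below rounds to max_precision decimal places and prefers a fraction with denominator at most max_denom only when it rounds back to the same decimal; float_to_rational_sketch is hypothetical and may behave differently from float_to_rational.

```python
from fractions import Fraction


def float_to_rational_sketch(val, max_precision=6, max_denom=100):
    """Hypothetical float -> Fraction conversion (assumed rule, not the library's)."""
    # Exact decimal value after rounding to max_precision digits.
    decimal = Fraction(round(val, max_precision)).limit_denominator(10 ** max_precision)
    # Best approximation with a denominator of at most max_denom.
    small = decimal.limit_denominator(max_denom)
    # Keep the small-denominator fraction only if it rounds back to the same decimal.
    if round(float(small), max_precision) == float(decimal):
        return small
    return decimal


print(float_to_rational_sketch(1 / 3))   # 1/3
print(float_to_rational_sketch(0.1234))  # 617/5000
```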

straindesign.compression.nullspace(matrix: RationalMatrix) → RationalMatrix

Compute right nullspace (kernel). Returns K where matrix @ K = 0.

Uses integer RREF with column/row pre-sorting and GCD reduction. All arithmetic is Python arbitrary-precision integers — no overflow possible.

Parameters: matrix – Input RationalMatrix
Returns: Kernel matrix K where matrix @ K = 0
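A compact exact nullspace can be sketched with Python's Fraction type. Note that this swaps in Fraction-based Gauss-Jordan elimination without pre-sorting, rather than the integer RREF with GCD reduction described above; nullspace_exact is a hypothetical helper on lists of numbers. Each free (non-pivot) column yields one kernel vector.

```python
from fractions import Fraction


def nullspace_exact(rows):
    """Hypothetical helper: basis of {v : rows @ v = 0} as Fraction vectors."""
    m = [[Fraction(x) for x in r] for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivots = []
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break  # no rows left: the remaining columns are free
        p = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: it is a free variable
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]  # scale the pivot to 1
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    basis = []
    for fc in (c for c in range(n_cols) if c not in pivots):
        v = [Fraction(0)] * n_cols
        v[fc] = Fraction(1)  # one free variable set to 1, the others to 0
        for row_idx, pc in enumerate(pivots):
            v[pc] = -m[row_idx][fc]  # solve for each pivot variable
        basis.append(v)
    return basis


S = [
    [1, -1, 0, 0, -1],
    [0, 1, -1, -1, 0],
    [0, 0, 0, 0, 1],
]
K = nullspace_exact(S)
print([[int(x) for x in v] for v in K])  # [[1, 1, 1, 0, 0], [1, 1, 0, 1, 0]]
assert all(sum(a * b for a, b in zip(row, v)) == 0 for row in S for v in K)
```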

straindesign.compression.remove_blocked_reactions(model) → List: Remove blocked reactions (bounds == (0, 0)) from a network.

straindesign.compression.remove_conservation_relations(model) → None

Remove conservation relations (dependent metabolites) from a model.

This reduces the number of metabolites while maintaining the original flux space. Uses exact rational arithmetic to find linearly independent rows of the stoichiometric matrix.

Parameters: model – COBRA model to modify in-place

straindesign.compression.remove_dummy_bounds(model) → None: Replace COBRA default bounds with +/-inf.

straindesign.compression.remove_ext_mets(model) → None: Remove external metabolites (those in the ‘External_Species’ compartment).

straindesign.compression.stoichmat_coeff2float(model) → None: Convert stoichiometric coefficients to floats.

straindesign.compression.stoichmat_coeff2rational(model) → None: Convert stoichiometric coefficients to rational numbers.