moscot.base.output.MatrixSolverOutput.sparsify

MatrixSolverOutput.sparsify(mode, value=None, batch_size=1024, n_samples=None, seed=None, max_k=None)

Sparsify the transport_matrix.

This function sets entries of the transport matrix to \(0\) according to mode and returns a MatrixSolverOutput with the sparsified transport matrix stored as a csr_matrix. The transport matrix is materialized in row blocks of batch_size rows, so at most batch_size rows are held in dense form at any time.
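
The row-block idea can be illustrated outside of moscot with NumPy and SciPy. The helper name `sparsify_in_row_blocks` and the dense input array are assumptions made for this sketch; it is not moscot's implementation, which materializes each block from the solver output instead of slicing an existing array.

```python
import numpy as np
import scipy.sparse as sp


def sparsify_in_row_blocks(dense, threshold, batch_size=1024):
    """Zero entries below `threshold`, one block of rows at a time (illustrative)."""
    blocks = []
    for start in range(0, dense.shape[0], batch_size):
        # Copy one block of rows; only this block is processed densely.
        block = np.array(dense[start:start + batch_size], dtype=float)
        block[block < threshold] = 0.0
        # csr_matrix stores only the non-zero entries of the block.
        blocks.append(sp.csr_matrix(block))
    # Stack the sparse row blocks back into a single CSR matrix.
    return sp.vstack(blocks, format="csr")


tmat = np.array([[0.40, 0.05, 0.05],
                 [0.02, 0.30, 0.18]])
sparse_tmat = sparsify_in_row_blocks(tmat, threshold=0.1, batch_size=1)
print(sparse_tmat.toarray())
```

A smaller batch_size lowers the size of each dense block at the cost of more iterations.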

Warning

This function exists only for interfacing with software that has to instantiate the transport matrix; moscot itself never uses the sparsified transport matrix.

Parameters:
  • mode (Literal['threshold', 'percentile', 'min_row', 'mass']) –

    How to determine the entries that are set to \(0\). Valid options are:

    • 'threshold' - value is the threshold below which entries are set to \(0\).

    • 'percentile' - value is the percentile in \([0, 100]\) of the transport_matrix below which entries are set to \(0\).

    • 'min_row' - value is not used; the threshold is chosen such that each row has at least 1 non-zero entry.

    • 'mass' - per row, keep the largest entries capturing a fraction value of the row's mass (at most max_k entries per row); value must be in \((0, 1]\).

  • value (Optional[float]) – Value to use for sparsification. Its meaning depends on mode (see above).

  • batch_size (int) – How many rows to materialize at a time when sparsifying the transport_matrix.

  • n_samples (Optional[int]) – If mode = 'percentile', the number of samples from which the percentile is estimated stochastically. Note that this requires instantiating a matrix of shape [n_samples, min(transport_matrix.shape)]. If None, n_samples is set to batch_size.

  • seed (Optional[int]) – Random seed needed for sampling if mode = 'percentile'.

  • max_k (Optional[int]) – Maximum number of entries to keep per row. Only valid when mode = 'mass'.
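
The non-trivial modes described above can be sketched on a small dense array. All three helper names are hypothetical, and the details are assumptions: the row sampling scheme for 'percentile', the use of the smallest row maximum as the 'min_row' threshold, and the tie handling in 'mass' are one plausible reading of the descriptions, not necessarily what moscot does.

```python
import numpy as np


def min_row_threshold(dense):
    """Largest threshold that still leaves every row with a non-zero entry."""
    # Each row keeps at least its maximum if the threshold is the smallest row maximum.
    return float(dense.max(axis=1).min())


def sampled_percentile_threshold(dense, value, n_samples, seed=None):
    """Estimate the `value`-th percentile from a random subset of rows."""
    rng = np.random.default_rng(seed)
    n_rows = min(n_samples, dense.shape[0])
    rows = rng.choice(dense.shape[0], size=n_rows, replace=False)
    return float(np.percentile(dense[rows], value))


def keep_row_mass(row, value, max_k=None):
    """Keep the largest entries of `row` capturing a fraction `value` of its mass."""
    order = np.argsort(row)[::-1]          # indices sorted by entry, largest first
    cumulative = np.cumsum(row[order])
    # Smallest number of entries whose sum reaches value * (row mass).
    k = int(np.searchsorted(cumulative, value * row.sum())) + 1
    k = min(k, row.size)
    if max_k is not None:
        k = min(k, max_k)                  # cap the number of kept entries
    out = np.zeros_like(row)
    out[order[:k]] = row[order[:k]]
    return out


tmat = np.array([[0.40, 0.05, 0.05],
                 [0.02, 0.30, 0.18]])
row = np.array([0.5, 0.3, 0.15, 0.05])

print(min_row_threshold(tmat))
print(sampled_percentile_threshold(tmat, 50, n_samples=10, seed=0))
print(keep_row_mass(row, 0.9))
print(keep_row_mass(row, 0.9, max_k=2))
```

In this sketch max_k takes precedence over value, so a row capped by max_k may retain less than the requested fraction of its mass.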

Return type:

MatrixSolverOutput

Returns:

Output with sparsified transport matrix.
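
A minimal usage sketch follows. The wrapper `export_sparse_transport` and its default threshold are hypothetical; `output` is assumed to be an existing MatrixSolverOutput obtained from an already solved problem, and the sparse matrix is assumed to be accessible through its transport_matrix attribute.

```python
def export_sparse_transport(output, threshold=1e-8):
    """Return a sparsified transport matrix for a downstream tool (illustrative)."""
    # Entries below `threshold` are set to 0; `output` itself is left untouched.
    sparse_output = output.sparsify(mode="threshold", value=threshold)
    return sparse_output.transport_matrix
```

Since moscot does not use the sparsified matrix internally, the original `output` should be kept for any further moscot computations.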