moscot.problems.generic.SinkhornProblem.compute_feature_correlation

SinkhornProblem.compute_feature_correlation(obs_key, corr_method='pearson', significance_method='fisher', annotation=None, layer=None, features=None, confidence_level=0.95, n_perms=1000, seed=None, **kwargs)

Compute correlation of push-forward or pull-back distribution with features.

Correlates a feature, e.g., the counts of a gene, with the probabilities of cells being mapped to or from a set of cells, i.e., with a push-forward or pull-back distribution.
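
As a rough illustration of the correlation step, the sketch below correlates one feature with a per-cell distribution in plain Python. It assumes both are equal-length sequences over the same cells; the helpers `pearson`, `ranks` and `feature_correlation` and the toy numbers are hypothetical and not part of moscot. Spearman correlation is computed as the Pearson correlation of tie-averaged ranks, which is the textbook definition; moscot's internal implementation may differ.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return math.nan
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg_rank
        i = j + 1
    return out


def feature_correlation(
    feature: Sequence[float], distribution: Sequence[float], corr_method: str = "pearson"
) -> float:
    """Correlate one feature (e.g. gene counts per cell) with a per-cell distribution."""
    if corr_method == "pearson":
        return pearson(feature, distribution)
    if corr_method == "spearman":
        return pearson(ranks(feature), ranks(distribution))
    raise ValueError(f"Unknown corr_method: {corr_method!r}")


# Hypothetical toy data: counts of one gene and push-forward mass for five cells.
gene_counts = [0, 1, 3, 2, 5]
push_forward = [0.05, 0.10, 0.30, 0.15, 0.40]
print(feature_correlation(gene_counts, push_forward, "pearson"))
print(feature_correlation(gene_counts, push_forward, "spearman"))
```

Because the toy gene counts and the push-forward mass rank the five cells identically, the Spearman value is 1 up to floating-point rounding, while the Pearson value is slightly lower because the relationship is not exactly linear.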

Parameters:
  • obs_key (str) – Key in obs containing the push-forward or pull-back distribution.

  • corr_method (Literal['pearson', 'spearman']) – Which type of correlation to compute, either 'pearson' or 'spearman'.

  • significance_method (Literal['fisher', 'perm_test']) –

    Mode to use when calculating p-values and confidence intervals. Valid options are:

    • 'fisher' - Fisher transformation.

    • 'perm_test' - permutation test.

  • annotation (Optional[dict[str, Iterable[str]]]) –

    How to subset the data when computing the correlation:

    • None - do not subset the data.

    • str - key in obs where categorical data is stored.

    • dict - a dictionary with one key corresponding to a categorical column in obs and values to a subset of categories.

  • layer (Optional[str]) – Key in layers from which to get the expression. If None, use X.

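  • features (Union[list[str], Literal['human', 'mouse', 'drosophila'], None]) –

    Features in AnnData to correlate with obs[obs_key]:

    • None - use all features in var.

    • list - a subset of var_names.

    • 'human', 'mouse' or 'drosophila' - restrict the features to the transcription factors of that organism.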

  • confidence_level (float) – Confidence level for the confidence interval calculation. Must be in the interval \([0, 1]\).

  • n_perms (int) – Number of permutations to use when significance_method = 'perm_test'.

  • seed (Optional[int]) – Random seed used when significance_method = 'perm_test'.

  • kwargs (Any) – Keyword arguments for parallelization, e.g., n_jobs.
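
The two significance_method options, 'fisher' and 'perm_test', can be sketched as follows, assuming a Pearson correlation r computed from n cells. The Fisher branch uses the textbook result that atanh(r) is approximately normal with standard error 1/sqrt(n - 3); the permutation branch shuffles the distribution n_perms times with a generator seeded by seed. The helper names `pearson`, `fisher_test` and `perm_test` are hypothetical, and this is not moscot's implementation; in particular, the permutation sketch returns only a p-value and does not reproduce how moscot derives confidence intervals in that mode.

```python
import math
import random
from statistics import NormalDist
from typing import Optional, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def fisher_test(r: float, n: int, confidence_level: float = 0.95) -> Tuple[float, float, float]:
    """Two-sided p-value and confidence interval for r via the Fisher transformation."""
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r)  # Fisher z; approximately normal around atanh(true r)
    se = 1.0 / math.sqrt(n - 3)
    std_normal = NormalDist()
    pval = 2.0 * std_normal.cdf(-abs(z) / se)  # H0: true correlation is 0
    z_crit = std_normal.inv_cdf(0.5 + confidence_level / 2.0)
    # Build the interval on the z scale, then map it back with tanh.
    return pval, math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def perm_test(
    x: Sequence[float], y: Sequence[float], n_perms: int = 1000, seed: Optional[int] = None
) -> float:
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = random.Random(seed)  # seeded for reproducibility
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perms):
        rng.shuffle(shuffled)  # break any pairing between x and y
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed data as one of the permutations,
    # which keeps the p-value strictly positive.
    return (hits + 1) / (n_perms + 1)
```

The Fisher route is cheap and relies on approximate bivariate normality, while the permutation route makes fewer distributional assumptions at the cost of n_perms extra correlation computations per feature.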

Return type:

DataFrame

Returns:

DataFrame of shape (n_features, 5), with one row per feature and the following columns:

  • 'corr' - correlation between the feature values and the push-forward or pull-back distribution.

  • 'pval' - p-values of a two-sided test.

  • 'qval' - p-values corrected with the Benjamini-Hochberg method at the \(0.05\) level.

  • 'ci_low' - lower bound of the correlation confidence interval at level confidence_level.

  • 'ci_high' - upper bound of the correlation confidence interval at level confidence_level.
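
The 'qval' column can be illustrated with the standard Benjamini-Hochberg step-up adjustment, sketched below in plain Python. The function name `benjamini_hochberg` and the p-values are hypothetical; the same adjusted values are what statsmodels' `multipletests(..., method='fdr_bh')` returns, but whether moscot calls that function is an assumption here. In this adjustment the 0.05 level does not change the adjusted values themselves; it only decides which features would be flagged as significant.

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that q-values are
    # monotone in the p-values and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


# Hypothetical p-values for four features.
qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [q <= 0.05 for q in qvals]
print(qvals)
print(significant)
```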