Creating marginals from proliferation and apoptosis markers in developmental processes#

This example shows how to use score_genes_for_marginals().

Marginals define the weight of each single cell within a distribution of cells. In many cases, marginals are chosen to be uniform as all cells are equally important. In some cases, we have prior knowledge to adapt the marginals. When we consider developmental processes (e.g. TemporalProblem, e.g. LineageProblem, e.g. SpatioTemporalProblem) some cells proliferate faster than other cells, while some cells might even be likely to die. The expression of marker genes of proliferation and apoptosis provide a quantitive estimate.

See also

Imports and data loading#

from moscot import datasets
from moscot.problems.time import TemporalProblem

import numpy as np

import scanpy as sc

Here, we work with hspc(), containing data on blood development.

adata = datasets.hspc()
AnnData object with n_obs × n_vars = 4000 × 2000
    obs: 'day', 'donor', 'cell_type', 'technology', 'n_genes'
    var: 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
    uns: 'cell_type_colors', 'hvg', 'neighbors', 'neighbors_atac', 'pca', 'umap'
    obsm: 'X_lsi', 'X_pca', 'X_umap_ATAC', 'X_umap_GEX', 'peaks_tfidf'
    varm: 'PCs'
    obsp: 'connectivities', 'distances', 'neighbors_atac_connectivities', 'neighbors_atac_distances'
tp = TemporalProblem(adata)

Scoring genes#

We start with scoring genes to obtain a proliferation and apoptosis score for each cell. If we have a custom list of genes for proliferation markers or apoptosis markers, we can pass these. moscot provides a data base of proliferation and apoptosis genes for humans, mice, and pigs. These are queried automatically when passing the corresponding string. Here, we work with human donors.

tp = tp.score_genes_for_marginals(
    gene_set_proliferation="human", gene_set_apoptosis="human"
WARNING: genes are not in var_names and ignored: ['ANLN', 'ANP32E', 'ATAD2', 'AURKA', 'AURKB', 'BIRC5', 'BLM', 'BRIP1', 'BUB1', 'CASP8AP2', 'CBX5', 'CCNB2', 'CCNE2', 'CDC20', 'CDC25C', 'CDC45', 'CDC6', 'CDCA2', 'CDCA3', 'CDCA7', 'CDCA8', 'CDK1', 'CENPA', 'CENPF', 'CHAF1B', 'CKAP2', 'CKAP2L', 'CKAP5', 'CKS1B', 'CKS2', 'CLSPN', 'CTCF', 'DLGAP5', 'DSCC1', 'DTL', 'E2F8', 'ECT2', 'EXO1', 'FAM64A', 'FEN1', 'G2E3', 'GAS2L3', 'GINS2', 'GMNN', 'GTSE1', 'HELLS', 'HJURP', 'HMGB2', 'HMMR', 'HN1', 'KIF11', 'KIF20B', 'KIF23', 'KIF2C', 'LBR', 'MCM2', 'MCM4', 'MCM5', 'MCM6', 'MLF1IP', 'MSH2', 'NASP', 'NCAPD2', 'NDC80', 'NEK2', 'NUF2', 'NUSAP1', 'PCNA', 'POLA1', 'POLD3', 'PRIM1', 'PSRC1', 'RAD51', 'RAD51AP1', 'RANGAP1', 'RFC2', 'RPA2', 'RRM1', 'RRM2', 'SLBP', 'SMC4', 'TACC3', 'TIPIN', 'TMPO', 'TOP2A', 'TPX2', 'TTK', 'TUBB4B', 'TYMS', 'UBR7', 'UHRF1', 'UNG', 'USP1', 'WDR76']
WARNING: genes are not in var_names and ignored: ['ADD1', 'AIFM3', 'ANKH', 'ANXA1', 'APP', 'ATF3', 'AVPR1A', 'BAX', 'BCAP31', 'BCL10', 'BCL2L1', 'BCL2L10', 'BCL2L2', 'BGN', 'BID', 'BIK', 'BIRC3', 'BMF', 'BMP2', 'BNIP3L', 'BRCA1', 'BTG2', 'BTG3', 'CASP1', 'CASP2', 'CASP3', 'CASP4', 'CASP6', 'CASP7', 'CASP8', 'CASP9', 'CAV1', 'CCNA1', 'CCND1', 'CCND2', 'CD2', 'CDC25B', 'CDK2', 'CDKN1A', 'CDKN1B', 'CFLAR', 'CREBBP', 'CTH', 'CTNNB1', 'CYLD', 'DAP', 'DAP3', 'DCN', 'DDIT3', 'DFFA', 'DIABLO', 'DNAJA1', 'DNAJC3', 'DNM1L', 'DPYD', 'EBP', 'EGR3', 'ENO2', 'ERBB2', 'ERBB3', 'ETF1', 'F2', 'FAS', 'FASLG', 'FDXR', 'FEZ1', 'GADD45A', 'GADD45B', 'GCH1', 'GNA15', 'GPX1', 'GPX3', 'GPX4', 'GSR', 'GSTM1', 'GUCY2D', 'H1-0', 'HMGB2', 'HMOX1', 'HSPB1', 'IFNB1', 'IFNGR1', 'IGFBP6', 'IL18', 'IL1A', 'IL1B', 'IL6', 'IRF1', 'ISG20', 'KRT18', 'MADD', 'MCL1', 'MGMT', 'MMP2', 'NEFH', 'PAK1', 'PDCD4', 'PDGFRB', 'PEA15', 'PLCB2', 'PLPPR4', 'PMAIP1', 'PPP2R5B', 'PPP3R1', 'PPT1', 'PRF1', 'PSEN1', 'PSEN2', 'RARA', 'RELA', 'RETSAT', 'RHOB', 'RHOT2', 'RNASEL', 'ROCK1', 'SATB1', 'SC5D', 'SLC20A1', 'SMAD7', 'SOD1', 'SOD2', 'SPTAN1', 'TAP1', 'TGFB2', 'TIMP2', 'TNF', 'TNFRSF12A', 'TNFSF10', 'TOP2A', 'TSPO', 'TXNIP', 'VDAC2', 'WEE1', 'XIAP']

Now we can investigate the proliferation and apoptosis markers on the UMAP. Proliferation markers are much stronger than apoptosis markers, which can be seen from the range of the scores. This is to be expected, as we are in a developmental setting., basis="umap_GEX", color=["proliferation", "apoptosis"])

When preparing the problem, the above scores are used to compute the left marginals of our OT problems. By default, this is done via a birth death process ([Schiebinger et al., 2019]).

tp = tp.prepare(time_key="day", joint_attr="X_pca")

We can visualize the prior estimates of the growth rates and assess their values. Note that cells which only occur in the target do not require a prior growth rate estimate. Hence, their value is set to NaN, and their color in the plot is gray.

adata.obs["prior_growth_rates"] = tp.prior_growth_rates, basis="umap_GEX", color="prior_growth_rates")

Assuming a cell with value 4.0 corresponds to day 2, this means that this cell is expected to proliferate such that at day 3 the cell gave rise to another 3 cells. This indeed might be realistic.

Nevertheless, if we would like to adapt our prior estimates, we can do this via the marginal_kwargs estimate in ~moscot.problems.time.TemporalProblem.prepare. Either we can use the parameters corresponding to the birth process

    beta_max: float = 1.7,
    beta_min: float = 0.3,
    beta_center: float = 0.25,
    beta_width: float = 0.5,

and the death process

    delta_max: float = 1.7,
    delta_min: float = 0.3,
    delta_center: float = 0.1,
    delta_width: float = 0.2,

A more tunable way of the prior marginal estimates#

In some cases it might be easier to use the more easily tunable alternative prior growth rate estimation defined by

\(s_i = \exp(\frac{(p_i - a_i)*(t_1 - t_0)}{scaling})\)

where \(p_i\) are the proliferation scores saved in obs['proliferation'] and \(a_i\) are the apoptosis scores saved in obs['apoptosis'].

This estimation is done if we pass scaling in marginal_kwargs.

In the following, we try different values of scaling to obtain a different range of prior marginal estimations. Note that some cells might be outliers, hence we truncate the values at the 99th percentile.

tp = tp.prepare(time_key="day", joint_attr="X_pca", marginal_kwargs={"scaling": 2})
adata.obs["prior_growth_rates_2"] = tp.prior_growth_rates, basis="umap_GEX", color="prior_growth_rates_2", vmax="p99")
tp = tp.prepare(time_key="day", joint_attr="X_pca", marginal_kwargs={"scaling": 5})
adata.obs["prior_growth_rates_3"] = tp.prior_growth_rates, basis="umap_GEX", color="prior_growth_rates_3", vmax="p99")
tp = tp.prepare(time_key="day", joint_attr="X_pca", marginal_kwargs={"scaling": 10})
adata.obs["prior_growth_rates_4"] = tp.prior_growth_rates, basis="umap_GEX", color="prior_growth_rates_4", vmax="p99")