Perturbation modeling with the SinkhornProblem#

In this tutorial, we showcase how to use the generic solver class :class:~moscot.problems.generic.SinkhornProblem to model cellular responses to chemical drugs.

Preliminaries#

import warnings

import moscot as mt
import moscot.plotting as mtp
from moscot.problems.generic import SinkhornProblem
from tqdm.std import TqdmWarning

import numpy as np

import matplotlib.pyplot as plt

import scanpy as sc

warnings.simplefilter("ignore", UserWarning)
warnings.simplefilter("ignore", TqdmWarning)
warnings.simplefilter("ignore", FutureWarning)

Dataset description#

The sciplex() dataset is a perturbation dataset published in [Srivatsan et al., 2020]. It contains transcriptomes of A549, K562, and mCF7 cells exposed to 188 compounds. Data obtained from scPerturb.

adata = mt.datasets.sciplex()
adata
AnnData object with n_obs × n_vars = 799317 × 110984
    obs: 'ncounts', 'well', 'plate', 'cell_line', 'replicate', 'time', 'dose_value', 'pathway_level_1', 'pathway_level_2', 'perturbation', 'target', 'pathway', 'dose_unit', 'celltype', 'disease', 'cancer', 'tissue_type', 'organism', 'perturbation_type', 'ngenes', 'percent_mito', 'percent_ribo', 'nperts', 'chembl-ID'
    var: 'ensembl_id', 'ncounts', 'ncells'

Filter the data#

drugs = [
    "Dacinostat (LAQ824)",
    "Flavopiridol HCl",
    "Givinostat (ITF2357)",
    "TAK-901",
]
adata_red = adata[adata.obs["perturbation"].isin(["control"] + drugs)].copy()
adata_red
AnnData object with n_obs × n_vars = 29020 × 110984
    obs: 'ncounts', 'well', 'plate', 'cell_line', 'replicate', 'time', 'dose_value', 'pathway_level_1', 'pathway_level_2', 'perturbation', 'target', 'pathway', 'dose_unit', 'celltype', 'disease', 'cancer', 'tissue_type', 'organism', 'perturbation_type', 'ngenes', 'percent_mito', 'percent_ribo', 'nperts', 'chembl-ID'
    var: 'ensembl_id', 'ncounts', 'ncells'
sc.pp.normalize_total(adata_red, target_sum=1e4)
sc.pp.log1p(adata_red)
sc.pp.pca(adata_red)
sc.pp.neighbors(adata_red)
sc.tl.umap(adata_red)
sc.pl.umap(adata_red, color=["perturbation", "cell_line"], ncols=1)
../../_images/dd7a88fbcb53b367b8bd97c9ed6140aa39145533537740af07df06d42587d3d5.png

In this case, we want to compare the perturbations from the selected drugs with the control, so we use the star policy with the control as reference.

sp = SinkhornProblem(adata_red)
sp = sp.prepare(
    key="perturbation", joint_attr="X_pca", policy="star", reference="control"
)
sp = sp.solve(1e-2, 0.95, 0.95)
sp
INFO     Solving `4` problems                                                                                      
INFO     Solving problem OTProblem[stage='prepared', shape=(3609, 17578)].                                         
INFO     Solving problem OTProblem[stage='prepared', shape=(3545, 17578)].                                         
INFO     Solving problem OTProblem[stage='prepared', shape=(2559, 17578)].                                         
INFO     Solving problem OTProblem[stage='prepared', shape=(1729, 17578)].                                         
SinkhornProblem[('Dacinostat (LAQ824)', 'control'), ('Givinostat (ITF2357)', 'control'), ('TAK-901', 'control'), ('Flavopiridol HCl', 'control')]

We can verify that the transport plan we learn is meaningful, as cell lines are mapped onto themselves.

for drug in drugs:
    transition_matrix = sp.cell_transition(
        drug, "control", "cell_line", "cell_line", key_added=f"cell_transition_{drug}"
    )
    mtp.cell_transition(sp, key=f"cell_transition_{drug}", cmap="Blues")
../../_images/87e7dcb45cd142ee6015a55d804eaba99ffd514cd7834d42cca6a195e5838f4a.png ../../_images/fe2a3c19fc01bfd3d31968f3a15e2c8293582276a1832fa86cc22dd6544dd6f2.png ../../_images/7bc2c8e9d265afbf08f28e68e5f8449f93aabc72a293118433ec7ad59bc9e6ca.png ../../_images/38268231bb0669617012fe4bed9c95274646c225a7f983e9df74b6b9d6bd8fa2.png
sc.tl.leiden(adata_red, key_added="new_clusters", resolution=0.9)
sc.pl.umap(adata_red, color=["new_clusters", "perturbation"], ncols=1)
../../_images/e697935806c66fcb5f57614046e5fce3df5a905f756aa8c4c0a6cf7d8339c18e.png

We now choose cluster 11 and see where it comes from

for drug in drugs:
    sp.pull(
        source=drug,
        target="control",
        data="new_clusters",
        subset="11",
        key_added=f"pull_1_{drug}",
    )
for drug in drugs:
    mtp.pull(sp, key=f"pull_1_{drug}")
../../_images/9f2a525e8af1e359108636c47b7e87fff7ed9b7ba6da0005de6ec74f896a3f20.png ../../_images/1bd8f3fb2368f7aac7390ac9f663d17328d712741c0c8cd0271f9a164d8c6449.png ../../_images/2edeab68710221d2764a849eabb16aff3c088548747389ced8ac48399d5df926.png ../../_images/d6e56efc021ffe3c7913c3f1e3756a9897b45263d752fd7e1ffd9dca53805307.png

Let’s do the same for another cluster which is a bit bigger.

for drug in drugs:
    sp.pull(
        source=drug,
        target="control",
        data="new_clusters",
        subset="8",
        key_added=f"pull_2_{drug}",
    )
for drug in drugs:
    mtp.pull(sp, key=f"pull_2_{drug}")
../../_images/c6f63c0e2dc6494256028fd1d82902ce40961f2dc07294cec451474a63f00758.png ../../_images/8ea0f4915694df820b4fa026c92530c25da263b1f95b8a5c4a9ee17394acbbd4.png ../../_images/629cf394734beab3dcb323db9e73f2774dd8a8ee2504745c99812b4bb5f1aa53.png ../../_images/394332a11d2642eb30dc09ce710b38d3fa4e143ac0b290ab31f0eb497db1feec.png