Disentangling lineage relationships of delta and epsilon cells in pancreatic development with the TemporalProblem

In this tutorial, we showcase using gene expression and chromatin accessability to analyze the temporal trajectory of delta and epsilon cells during mouse pancreatic development.

Mapping cells across time points

Given measurements of any modality across time we can reconstruct the trajectories by mapping cells across time. We assume that we have the same measurement at each time point, resulting in a W-type OT problem.

The reconstruction of these trajectories is crucial in various applications, like understanding the pancretic development in mice. In this specific example researchers have been hypothesizing about the complex lineage specifications of delta and epsilon cells during the developmental phase, which appear to be connected. Here we examine these lineages by modelling the trajectories using the TemporalProblem.

Preliminaries

import os
import warnings

import moscot as mt
import moscot.plotting as mpl
from moscot.problems.time import TemporalProblem

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

import scanpy as sc

plt.rcParams["legend.scatterpoints"] = 1
sc.set_figure_params(scanpy=True, dpi=80, dpi_save=200)

warnings.simplefilter("ignore", UserWarning)
warnings.simplefilter("ignore", FutureWarning)

Dataset description

The pancreas_multiome() dataset (cite) contains and includes, among other:

  • obs['stage'], representing the developmental stage for cells at time points E14.5, E15.5, and E16.5.

  • obs['cell_type'], representing the cell types present in the pancreas during development.

  • obsm['X_MultiVI'], a feature space containing information about both gene expression and chromatin accessibility that was created using MultiVI

adata = mt.datasets.pancreas_multiome(rna_only=True, force_download=False)

As our goal is to study endocrine cells, we filter out ductal and acinar cells. Filtering to the relevant cell types often helps to get a more accurate mapping

endocrine_celltypes = [
    "Ngn3 low",
    "Ngn3 high",
    "Ngn3 high cycling",
    "Fev+",
    "Fev+ Alpha",
    "Fev+ Beta",
    "Fev+ Delta",
    "Eps. progenitors",
    "Alpha",
    "Beta",
    "Delta",
    "Epsilon",
]

adata = adata[adata.obs["cell_type"].isin(endocrine_celltypes)].copy()

As the TemporalProblem requires the temporal information to be numeric, we create a column obs['time'] of numeric type.

def adapt_time(x):
    if x["stage"] == "E14.5":
        return 14.5
    if x["stage"] == "E15.5":
        return 15.5
    if x["stage"] == "E16.5":
        return 16.5
    raise ValueError


adata.obs["time"] = adata.obs.apply(adapt_time, axis=1).astype("category")

We can visualize the data by considering the cell type annotations in a precomputed UMAP.

sc.pl.embedding(adata, basis="X_umap", color="cell_type")

Prepare the TemporalProblem

First, we instantiate the TemporalProblem

tp0 = TemporalProblem(adata)

To prepare the problem, we need to pass some information:

  • time_key: defines the obs column for the temporal information.

  • joint_attr: defines based on which (latent) space we compute the distances for the OT problem. In this case, we use the latent space obsm['X_MultiVI'] described above.

See also

  • See Subset policy on how to choose pairs of time points between which to compute OT maps.

tp0 = tp0.prepare(time_key="time", joint_attr="X_MultiVI")

Compute the cost matrix

The {class}~moscot.problems.time.TemporalProblem requires a cost matrix, which moscots generates by default using the square euclidian distance on the latent space. Here we want to define our cost using geodesic distances, for which we need to create graphs.

Since we have three different time points E14.5, E15.5, and E16.5, we will examine two separate temporal problems. The first problem analyzes the temporal trajectory from cells at E14.5 to cells at E15.5, while the second problem does the same for E15.5 and E16.5. The cost matrices for both problems are calculated based on the euclidian distance between cells in the space defined by obsm['X_MultiVI'].

import networkx as nx

dfs = {}
batch_column = "time"
unique_batches = [14.5, 15.5, 16.5]
for i in range(len(unique_batches) - 1):
    batch1 = unique_batches[i]
    batch2 = unique_batches[i + 1]

    indices = np.where(
        (adata.obs[batch_column] == batch1) | (adata.obs[batch_column] == batch2)
    )[0]
    adata_subset = adata[indices]
    sc.pp.neighbors(adata_subset, use_rep="X_MultiVI", n_neighbors=30)
    G = nx.from_numpy_array(adata_subset.obsp["connectivities"].todense())
    assert nx.is_connected(G)

    dfs[(batch1, batch2)] = pd.DataFrame(
        index=adata_subset.obs_names,
        columns=adata_subset.obs_names,
        data=adata_subset.obsp["connectivities"].todense().astype("float"),
    )

We then pass the computed cost matrices using {func}moscot.base.problems.BirthDeathProblem.set_graph_xy

See also

tp0[14.5, 15.5].set_graph_xy((dfs[14.5, 15.5]).astype("float"), t=100.0)
tp0[15.5, 16.5].set_graph_xy((dfs[15.5, 16.5]).astype("float"), t=100.0)

Solve the TemporalProblem

Now we can solve the problem. We set epsilon to be relatively small to get a sparse mapping between cells. scale_cost is set to mean, which simply means that we normalize the entries of the cost matrix by its mean to stabilize training. We also introduce an additional stopping criterion, which is activated if the number of iterations of the Sinkhorn algorithm exceeds one billion by setting max_iterations to 1e7.

tp0 = tp0.solve(epsilon=1e-3, scale_cost="mean", max_iterations=1e7)
INFO     Solving `2` problems                                                                                      
INFO     Solving problem BirthDeathProblem[stage='prepared', shape=(5185, 1699)].                                  
INFO     Solving problem BirthDeathProblem[stage='prepared', shape=(4761, 5185)].                                  

Identifying ancestors and descendants of cells.

We can now examine the ancestors and descendants of a certain cell type. We do this by aggregating the transport matrix by cell type, using cell_transition(). First we investigate which ancestry population a certain cell type has. We again leave out all cell types which are not relevant for our examination.

order_celltypes = [
    "Ngn3 low",
    "Ngn3 high cycling",
    "Ngn3 high",
    "Eps. progenitors",
    "Fev+",
    "Fev+ Alpha",
    "Fev+ Beta",
    "Fev+ Delta",
    "Alpha",
    "Beta",
    "Delta",
    "Epsilon",
]

ct_desc = tp0.cell_transition(
    14.5,
    15.5,
    {"cell_type": order_celltypes},
    {"cell_type": order_celltypes},
    forward=False,
    key_added="transitions_14.5_15.5_large",
)
ct_desc = tp0.cell_transition(
    15.5,
    16.5,
    {"cell_type": order_celltypes},
    {"cell_type": order_celltypes},
    forward=False,
    key_added="transitions_15.5_16.5_large",
)

# Create a 1x2 grid of subplots
fig, axes = plt.subplots(ncols=2, figsize=(12, 6))

axes[0] = mpl.cell_transition(
    tp0,
    fontsize=7,
    figsize=(5, 5),
    return_fig=True,
    ax=axes[0],
    key="transitions_14.5_15.5_large",
)

axes[1] = mpl.cell_transition(
    tp0,
    fontsize=7,
    figsize=(5, 5),
    return_fig=True,
    ax=axes[1],
    key="transitions_15.5_16.5_large",
)


fig.subplots_adjust(wspace=0.8)

The transition matrices shows the ancestry of each cell type. For example, looking at the columns of the left transition matrix, we see that both epsilon and delta cells at time point 15.5 are derived from a similiar group of ancestors consisting of Ngn3 high, epsilon progenitor and Fev+ delta cells, while also developing from their own cell type.

Next we will examine the descendant of each selected cell type:

ct_desc = tp0.cell_transition(
    14.5,
    15.5,
    {"cell_type": order_celltypes},
    {"cell_type": order_celltypes},
    forward=True,
    key_added="transitions_14.5_15.5",
)
ct_desc = tp0.cell_transition(
    15.5,
    16.5,
    {"cell_type": order_celltypes},
    {"cell_type": order_celltypes},
    forward=True,
    key_added="transitions_15.5_16.5",
)

# Create a 1x2 grid of subplots
fig, axes = plt.subplots(ncols=2, figsize=(12, 6))

axes[0] = mpl.cell_transition(
    tp0,
    fontsize=7,
    figsize=(5, 5),
    return_fig=True,
    ax=axes[0],
    key="transitions_14.5_15.5",
)

axes[1] = mpl.cell_transition(
    tp0,
    fontsize=7,
    figsize=(5, 5),
    return_fig=True,
    ax=axes[1],
    key="transitions_15.5_16.5",
)


fig.subplots_adjust(wspace=0.8)

Looking at the rows of the right transition matrix we see that mature endocrine (alpha, beta, delta and epsilon) cells at time points 15.5 mostly evolve into themselves, with one exception to this being epsilon cells which appear to be developing primarily into alpha cells.

Visualizing ancestors and descendants

We can also visualize ancestors and descendants, e.g. on a UMAP, by using the pull() and push(), respectively. We start with ancestors of delta cells at time point 15.5.

See also

tp0.pull(source=14.5, target=15.5, data="cell_type", subset="Delta")
fig, axes = plt.subplots(ncols=2, figsize=(16, 4))

mpl.pull(
    tp0,
    time_points=[15.5],
    basis="X_umap",
    ax=axes[0],
    title=["Delta cells at time 15.5"],
)

mpl.pull(
    tp0,
    time_points=[14.5],
    basis="X_umap",
    ax=axes[1],
    title=["Delta cells ancestors"],
)

fig.subplots_adjust(wspace=0.3)

The left plot above colorizes all delta cells at time point 15.5 in green. All other cells in time point 15.5 are dark gray, while cells not measured at time point 15.5 are light gray. The right plot shows the ancestry likelihood of each cell at time point 14.5 on a palette from green (very likely ancestor) to dark gray (very unlikely ancestor). All cells not belonging to time point 14.5 are light gray.

We can also visualize the ancestors of epsilon cells at time point 15.5, by applying the same technique.

tp0.pull(source=14.5, target=15.5, data="cell_type", subset="Epsilon")
fig, axes = plt.subplots(ncols=2, figsize=(16, 4))

mpl.pull(
    tp0,
    time_points=[15.5],
    basis="X_umap",
    ax=axes[0],
    title=["Epsilon cells at time 15.5"],
)

mpl.pull(
    tp0,
    time_points=[14.5],
    basis="X_umap",
    ax=axes[1],
    title=["Epsilon cells ancestors"],
)

fig.subplots_adjust(wspace=0.3)

We can visualize the predicted cell evolution using a sankey() diagram for the two problems from time points 14.5 to 15.5 and from 15.5 to 16.5. For simplicity we use a different, smaller subset of cell types including the mature endocrine cells, epsilon progenitor cells and two different types of Fev+ delta cells. Since the cell colors are not directly provided by the dataset we have to define them first. Additionally we choose threshold=0.05, which ignores all transitions with a probability of less than 0.05.

cluster_subset = [
    "Eps. progenitors",
    "Fev+ Delta,0",
    "Fev+ Delta,1",
    "Alpha",
    "Beta",
    "Delta",
    "Epsilon",
]

tp0.sankey(
    source=14.5,
    target=16.5,
    source_groups={"cell_type_refined": cluster_subset},
    target_groups={"cell_type_refined": cluster_subset},
    threshold=0.05,
    order_annotations=cluster_subset[::-1],
    normalize=True,
)
color_dict = {
    "Prlf. Ductal": "#f7b6d2",
    "Ductal": "#d62728",
    "Imm. Acinar": "#ffeb3b",
    "Mat. Acinar": "#98df8a",
    "Alpha": "#1f77b4",
    "Beta": "#ff7f0e",
    "Delta": "#279e68",
    "Eps. progenitors": "#aa40fc",
    "Epsilon": "#8c564b",
    "Fev+": "#e377c2",
    "Fev+ Alpha": "#b5bd61",
    "Fev+ Beta": "#42f5ec",
    "Fev+ Delta": "#aec7e8",
    "Fev+ Delta,0": "#17becf",
    "Fev+ Delta,1": "#98df8a",
    "Ngn3 high cycling": "#adf542",
    "Ngn3 high,0": "#ff9896",
    "Ngn3 high": "#ff9896",
    "Ngn3 high,1": "#f0b98d",
    "Ngn3 low": "#c5b0d5",
}
tp0.adata.uns["cell_type_refined_colors"] = [
    color_dict[ct] for ct in tp0.adata.obs["cell_type_refined"].cat.categories
]
mpl.sankey(tp0, dpi=100, figsize=(12, 4), fontsize=10, interpolate_color=True)

This figure again illustrates some of the previous findings like mature endocrine cells mainly developing into themselves as well as epsilon cells primarily developing into alpha cells. Additionally we also see that the majority of epsilon progenitor cells develops into epsilon and Fev+ delta cells, while there seem to be developmental differences between Fev+ delta 0 and 1 cells.

Identifying driver genes and transcription factors

Next we will demonstrate how to use compute_feature_correlation() to identify the top correlated transcription factors (TFs), by computing the correlation between the pull distributions of epsilon progenitor/ Fev+ delta cells and the gene expression of the TFs.

We start off by examining the transcription factors for epsilon progenitor cells. Since the data for gene expression is given by raw counts, we first have to normalize.

sc.pp.normalize_total(tp0.adata)
sc.pp.log1p(tp0.adata)
WARNING: adata.X seems to be already log-transformed.

Next we calculate the pull distributions at each time point.

tp0.pull(
    14.5,
    15.5,
    data="cell_type",
    subset="Eps. progenitors",
    key_added="Eps_progenitors_pull_early",
    normalize=False,
)
tp0.pull(
    15.5,
    16.5,
    data="cell_type",
    subset="Eps. progenitors",
    key_added="Eps_progenitors_pull_late",
    normalize=False,
)
tp0.adata.obs["Eps_progenitors_pull"] = (
    tp0.adata.obs["Eps_progenitors_pull_early"]
    + tp0.adata.obs["Eps_progenitors_pull_late"]
)

To specify that we only want to compute correlation for the genes which are TFs in mice, we restrict the set of genes considered, by setting features to mouse.

eps_prog_drivers_tf = tp0.compute_feature_correlation(
    obs_key="Eps_progenitors_pull", features="mouse", annotation={"time": [14.5, 15.5]}
)
eps_prog_drivers_tf.head(5)
Eps_progenitors_pull_corr Eps_progenitors_pull_pval Eps_progenitors_pull_qval Eps_progenitors_pull_ci_low Eps_progenitors_pull_ci_high
Gadd45a 0.185421 1.481816e-41 1.118771e-38 0.159004 0.211574
Neurod2 0.184938 2.414550e-41 1.215324e-38 0.158516 0.211096
Rps4x 0.156166 8.852916e-30 2.673581e-27 0.129496 0.182610
Setbp1 0.142848 4.018292e-25 7.438895e-23 0.116079 0.169410
Nkx2-2 0.142699 4.503957e-25 7.438895e-23 0.115929 0.169262

From the table we can infer that epsilon progenitor cells appear to specifically express Neurod2. For further validation we can use the UMAP to plot the expression of Neurod2 per cell (left) and compare it the the labeled set of cells (right). We can verify that the majority of epsilon progenitor cells show a high level of Neurod2 expression.

sc.pl.embedding(tp0.adata, basis="X_umap", color=["Neurod2", "cell_type"])

We can repeat this analysis for Fev+ delta cells. The approach is the same as for the eps. progenitor cells.

tp0.pull(
    14.5,
    15.5,
    data="cell_type",
    subset="Fev+ Delta",
    key_added="Fev_delta_pull_early",
    normalize=False,
)
tp0.pull(
    15.5,
    16.5,
    data="cell_type",
    subset="Fev+ Delta",
    key_added="Fev_delta_pull_late",
    normalize=False,
)
tp0.adata.obs["Fev_delta_pull"] = (
    tp0.adata.obs["Fev_delta_pull_early"] + tp0.adata.obs["Fev_delta_pull_late"]
)
fev_delta_drivers_tf = tp0.compute_feature_correlation(
    obs_key="Fev_delta_pull", features="mouse", annotation={"time": [14.5, 15.5]}
)
fev_delta_drivers_tf.head(5)
Fev_delta_pull_corr Fev_delta_pull_pval Fev_delta_pull_qval Fev_delta_pull_ci_low Fev_delta_pull_ci_high
Isl1 0.174969 4.224819e-37 6.379477e-34 0.148456 0.201231
Neurod2 0.165569 2.488616e-33 1.878905e-30 0.138975 0.191925
Prox1 0.158975 8.115406e-31 3.063566e-28 0.132327 0.185393
Peg3 0.152899 1.347384e-28 4.069099e-26 0.126204 0.179373
Jund 0.129767 5.756594e-21 1.241780e-18 0.102910 0.156434

We find that the predicted progenitor population of Fev+ delta cells also specifically expresses Neurod2, which suggests that we found a potential transcription factor for both delta and epsilon populations. Thus, we set out to experimentally verify the role of Neurod2, and found it to be an activator of epsilon cells [Klein et al., 2025].