Disentangling lineage relationships of delta and epsilon cells in pancreatic development with the TemporalProblem¶
In this tutorial, we showcase using gene expression and chromatin accessability to analyze the temporal trajectory of delta and epsilon cells during mouse pancreatic development.
Mapping cells across time points¶
Given measurements of any modality across time we can reconstruct the trajectories by mapping cells across time. We assume that we have the same measurement at each time point, resulting in a W-type OT problem.
The reconstruction of these trajectories is crucial in various applications, like understanding the pancretic development in mice. In this specific example
researchers have been hypothesizing about the complex lineage specifications of delta and epsilon cells during the developmental phase, which appear to
be connected. Here we examine these lineages by modelling the trajectories using the TemporalProblem.
Preliminaries¶
import os
import warnings
import moscot as mt
import moscot.plotting as mpl
from moscot.problems.time import TemporalProblem
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scanpy as sc
plt.rcParams["legend.scatterpoints"] = 1
sc.set_figure_params(scanpy=True, dpi=80, dpi_save=200)
warnings.simplefilter("ignore", UserWarning)
warnings.simplefilter("ignore", FutureWarning)
Dataset description¶
The pancreas_multiome() dataset (cite) contains and includes, among other:
obs['stage'], representing the developmental stage for cells at time pointsE14.5,E15.5, andE16.5.obs['cell_type'], representing the cell types present in the pancreas during development.obsm['X_MultiVI'], a feature space containing information about both gene expression and chromatin accessibility that was created using MultiVI
adata = mt.datasets.pancreas_multiome(rna_only=True, force_download=False)
As our goal is to study endocrine cells, we filter out ductal and acinar cells. Filtering to the relevant cell types often helps to get a more accurate mapping
endocrine_celltypes = [
"Ngn3 low",
"Ngn3 high",
"Ngn3 high cycling",
"Fev+",
"Fev+ Alpha",
"Fev+ Beta",
"Fev+ Delta",
"Eps. progenitors",
"Alpha",
"Beta",
"Delta",
"Epsilon",
]
adata = adata[adata.obs["cell_type"].isin(endocrine_celltypes)].copy()
As the TemporalProblem requires the temporal information to be numeric, we create a column obs['time'] of numeric type.
def adapt_time(x):
if x["stage"] == "E14.5":
return 14.5
if x["stage"] == "E15.5":
return 15.5
if x["stage"] == "E16.5":
return 16.5
raise ValueError
adata.obs["time"] = adata.obs.apply(adapt_time, axis=1).astype("category")
We can visualize the data by considering the cell type annotations in a precomputed UMAP.
Prepare the TemporalProblem¶
First, we instantiate the TemporalProblem
tp0 = TemporalProblem(adata)
To prepare the problem, we need to pass some information:
time_key: defines theobscolumn for the temporal information.joint_attr: defines based on which (latent) space we compute the distances for the OT problem. In this case, we use the latent spaceobsm['X_MultiVI']described above.
See also
See Subset policy on how to choose pairs of time points between which to compute OT maps.
tp0 = tp0.prepare(time_key="time", joint_attr="X_MultiVI")
Compute the cost matrix¶
The {class}~moscot.problems.time.TemporalProblem requires a cost matrix, which moscots generates by default using the square euclidian distance on the latent space. Here we want to define our cost using geodesic distances, for which we need to create graphs.
Since we have three different time points E14.5, E15.5, and E16.5, we will examine two separate temporal problems. The first problem analyzes the temporal trajectory from cells at E14.5 to cells at E15.5, while the second problem does the same for E15.5 and E16.5. The cost matrices for both problems are calculated based on the euclidian distance between cells in the space defined by obsm['X_MultiVI'].
import networkx as nx
dfs = {}
batch_column = "time"
unique_batches = [14.5, 15.5, 16.5]
for i in range(len(unique_batches) - 1):
batch1 = unique_batches[i]
batch2 = unique_batches[i + 1]
indices = np.where(
(adata.obs[batch_column] == batch1) | (adata.obs[batch_column] == batch2)
)[0]
adata_subset = adata[indices]
sc.pp.neighbors(adata_subset, use_rep="X_MultiVI", n_neighbors=30)
G = nx.from_numpy_array(adata_subset.obsp["connectivities"].todense())
assert nx.is_connected(G)
dfs[(batch1, batch2)] = pd.DataFrame(
index=adata_subset.obs_names,
columns=adata_subset.obs_names,
data=adata_subset.obsp["connectivities"].todense().astype("float"),
)
We then pass the computed cost matrices using {func}moscot.base.problems.BirthDeathProblem.set_graph_xy
See also
See Custom cost matrices for further details on how to pass precomputed cost matrices.
tp0[14.5, 15.5].set_graph_xy((dfs[14.5, 15.5]).astype("float"), t=100.0)
tp0[15.5, 16.5].set_graph_xy((dfs[15.5, 16.5]).astype("float"), t=100.0)
Solve the TemporalProblem¶
Now we can solve the problem. We set epsilon to be relatively small to get a sparse mapping between cells. scale_cost is set to mean, which simply means that we normalize the entries of the cost matrix by its mean to stabilize training. We also introduce an additional stopping criterion, which is
activated if the number of iterations of the Sinkhorn algorithm exceeds one billion by setting max_iterations to 1e7.
tp0 = tp0.solve(epsilon=1e-3, scale_cost="mean", max_iterations=1e7)
INFO Solving `2` problems
INFO Solving problem BirthDeathProblem[stage='prepared', shape=(5185, 1699)].
INFO Solving problem BirthDeathProblem[stage='prepared', shape=(4761, 5185)].
Identifying ancestors and descendants of cells.¶
We can now examine the ancestors and descendants of a certain cell type. We do this by aggregating the transport matrix by cell type, using cell_transition(). First we investigate which ancestry population a certain cell type has. We again leave out
all cell types which are not relevant for our examination.
order_celltypes = [
"Ngn3 low",
"Ngn3 high cycling",
"Ngn3 high",
"Eps. progenitors",
"Fev+",
"Fev+ Alpha",
"Fev+ Beta",
"Fev+ Delta",
"Alpha",
"Beta",
"Delta",
"Epsilon",
]
ct_desc = tp0.cell_transition(
14.5,
15.5,
{"cell_type": order_celltypes},
{"cell_type": order_celltypes},
forward=False,
key_added="transitions_14.5_15.5_large",
)
ct_desc = tp0.cell_transition(
15.5,
16.5,
{"cell_type": order_celltypes},
{"cell_type": order_celltypes},
forward=False,
key_added="transitions_15.5_16.5_large",
)
# Create a 1x2 grid of subplots
fig, axes = plt.subplots(ncols=2, figsize=(12, 6))
axes[0] = mpl.cell_transition(
tp0,
fontsize=7,
figsize=(5, 5),
return_fig=True,
ax=axes[0],
key="transitions_14.5_15.5_large",
)
axes[1] = mpl.cell_transition(
tp0,
fontsize=7,
figsize=(5, 5),
return_fig=True,
ax=axes[1],
key="transitions_15.5_16.5_large",
)
fig.subplots_adjust(wspace=0.8)
The transition matrices shows the ancestry of each cell type. For example, looking at the columns of the left transition matrix, we see that both epsilon
and delta cells at time point 15.5 are derived from a similiar group of ancestors consisting of Ngn3 high, epsilon progenitor and Fev+ delta cells,
while also developing from their own cell type.
Next we will examine the descendant of each selected cell type:
ct_desc = tp0.cell_transition(
14.5,
15.5,
{"cell_type": order_celltypes},
{"cell_type": order_celltypes},
forward=True,
key_added="transitions_14.5_15.5",
)
ct_desc = tp0.cell_transition(
15.5,
16.5,
{"cell_type": order_celltypes},
{"cell_type": order_celltypes},
forward=True,
key_added="transitions_15.5_16.5",
)
# Create a 1x2 grid of subplots
fig, axes = plt.subplots(ncols=2, figsize=(12, 6))
axes[0] = mpl.cell_transition(
tp0,
fontsize=7,
figsize=(5, 5),
return_fig=True,
ax=axes[0],
key="transitions_14.5_15.5",
)
axes[1] = mpl.cell_transition(
tp0,
fontsize=7,
figsize=(5, 5),
return_fig=True,
ax=axes[1],
key="transitions_15.5_16.5",
)
fig.subplots_adjust(wspace=0.8)
Looking at the rows of the right transition matrix we see that mature endocrine (alpha, beta, delta and epsilon) cells at time points 15.5 mostly evolve
into themselves, with one exception to this being epsilon cells which appear to be developing primarily into alpha cells.
Visualizing ancestors and descendants¶
We can also visualize ancestors and descendants, e.g. on a UMAP, by using the pull() and push(), respectively. We start with ancestors of delta cells at time point 15.5.
See also
See Pull/push distributions for further details on how to use
pull()andpush().
tp0.pull(source=14.5, target=15.5, data="cell_type", subset="Delta")
fig, axes = plt.subplots(ncols=2, figsize=(16, 4))
mpl.pull(
tp0,
time_points=[15.5],
basis="X_umap",
ax=axes[0],
title=["Delta cells at time 15.5"],
)
mpl.pull(
tp0,
time_points=[14.5],
basis="X_umap",
ax=axes[1],
title=["Delta cells ancestors"],
)
fig.subplots_adjust(wspace=0.3)
The left plot above colorizes all delta cells at time point 15.5 in green. All other cells in time point 15.5 are dark gray, while cells not measured at time point 15.5 are light gray. The right plot shows the ancestry likelihood of each cell at time point 14.5 on a palette from green (very likely ancestor) to dark gray (very unlikely ancestor). All cells not belonging to time point 14.5 are light gray.
We can also visualize the ancestors of epsilon cells at time point 15.5, by applying the same technique.
tp0.pull(source=14.5, target=15.5, data="cell_type", subset="Epsilon")
fig, axes = plt.subplots(ncols=2, figsize=(16, 4))
mpl.pull(
tp0,
time_points=[15.5],
basis="X_umap",
ax=axes[0],
title=["Epsilon cells at time 15.5"],
)
mpl.pull(
tp0,
time_points=[14.5],
basis="X_umap",
ax=axes[1],
title=["Epsilon cells ancestors"],
)
fig.subplots_adjust(wspace=0.3)
We can visualize the predicted cell evolution using a sankey() diagram for the two problems from time points 14.5 to 15.5 and
from 15.5 to 16.5. For simplicity we use a different, smaller subset of cell types including the mature endocrine cells, epsilon progenitor cells and
two different types of Fev+ delta cells. Since the cell colors are not directly provided by the dataset we have to define them first. Additionally we
choose threshold=0.05, which ignores all transitions with a probability of less than 0.05.
cluster_subset = [
"Eps. progenitors",
"Fev+ Delta,0",
"Fev+ Delta,1",
"Alpha",
"Beta",
"Delta",
"Epsilon",
]
tp0.sankey(
source=14.5,
target=16.5,
source_groups={"cell_type_refined": cluster_subset},
target_groups={"cell_type_refined": cluster_subset},
threshold=0.05,
order_annotations=cluster_subset[::-1],
normalize=True,
)
color_dict = {
"Prlf. Ductal": "#f7b6d2",
"Ductal": "#d62728",
"Imm. Acinar": "#ffeb3b",
"Mat. Acinar": "#98df8a",
"Alpha": "#1f77b4",
"Beta": "#ff7f0e",
"Delta": "#279e68",
"Eps. progenitors": "#aa40fc",
"Epsilon": "#8c564b",
"Fev+": "#e377c2",
"Fev+ Alpha": "#b5bd61",
"Fev+ Beta": "#42f5ec",
"Fev+ Delta": "#aec7e8",
"Fev+ Delta,0": "#17becf",
"Fev+ Delta,1": "#98df8a",
"Ngn3 high cycling": "#adf542",
"Ngn3 high,0": "#ff9896",
"Ngn3 high": "#ff9896",
"Ngn3 high,1": "#f0b98d",
"Ngn3 low": "#c5b0d5",
}
tp0.adata.uns["cell_type_refined_colors"] = [
color_dict[ct] for ct in tp0.adata.obs["cell_type_refined"].cat.categories
]
This figure again illustrates some of the previous findings like mature endocrine cells mainly developing into themselves as well as epsilon cells primarily developing into alpha cells. Additionally we also see that the majority of epsilon progenitor cells develops into epsilon and Fev+ delta cells, while there seem to be developmental differences between Fev+ delta 0 and 1 cells.
Identifying driver genes and transcription factors¶
Next we will demonstrate how to use compute_feature_correlation() to identify the top correlated transcription factors (TFs), by computing the correlation between the pull distributions of epsilon progenitor/ Fev+ delta cells and the gene expression of the TFs.
We start off by examining the transcription factors for epsilon progenitor cells. Since the data for gene expression is given by raw counts, we first have to normalize.
sc.pp.normalize_total(tp0.adata)
sc.pp.log1p(tp0.adata)
WARNING: adata.X seems to be already log-transformed.
Next we calculate the pull distributions at each time point.
tp0.pull(
14.5,
15.5,
data="cell_type",
subset="Eps. progenitors",
key_added="Eps_progenitors_pull_early",
normalize=False,
)
tp0.pull(
15.5,
16.5,
data="cell_type",
subset="Eps. progenitors",
key_added="Eps_progenitors_pull_late",
normalize=False,
)
tp0.adata.obs["Eps_progenitors_pull"] = (
tp0.adata.obs["Eps_progenitors_pull_early"]
+ tp0.adata.obs["Eps_progenitors_pull_late"]
)
To specify that we only want to compute correlation for the genes which are TFs in mice, we restrict the set of genes
considered, by setting features to mouse.
eps_prog_drivers_tf = tp0.compute_feature_correlation(
obs_key="Eps_progenitors_pull", features="mouse", annotation={"time": [14.5, 15.5]}
)
eps_prog_drivers_tf.head(5)
| Eps_progenitors_pull_corr | Eps_progenitors_pull_pval | Eps_progenitors_pull_qval | Eps_progenitors_pull_ci_low | Eps_progenitors_pull_ci_high | |
|---|---|---|---|---|---|
| Gadd45a | 0.185421 | 1.481816e-41 | 1.118771e-38 | 0.159004 | 0.211574 |
| Neurod2 | 0.184938 | 2.414550e-41 | 1.215324e-38 | 0.158516 | 0.211096 |
| Rps4x | 0.156166 | 8.852916e-30 | 2.673581e-27 | 0.129496 | 0.182610 |
| Setbp1 | 0.142848 | 4.018292e-25 | 7.438895e-23 | 0.116079 | 0.169410 |
| Nkx2-2 | 0.142699 | 4.503957e-25 | 7.438895e-23 | 0.115929 | 0.169262 |
From the table we can infer that epsilon progenitor cells appear to specifically express Neurod2. For further validation we can use the UMAP to plot the expression of Neurod2 per cell (left) and compare it the the labeled set of cells (right). We can verify that the majority of epsilon progenitor cells show a high level of Neurod2 expression.
We can repeat this analysis for Fev+ delta cells. The approach is the same as for the eps. progenitor cells.
tp0.pull(
14.5,
15.5,
data="cell_type",
subset="Fev+ Delta",
key_added="Fev_delta_pull_early",
normalize=False,
)
tp0.pull(
15.5,
16.5,
data="cell_type",
subset="Fev+ Delta",
key_added="Fev_delta_pull_late",
normalize=False,
)
tp0.adata.obs["Fev_delta_pull"] = (
tp0.adata.obs["Fev_delta_pull_early"] + tp0.adata.obs["Fev_delta_pull_late"]
)
fev_delta_drivers_tf = tp0.compute_feature_correlation(
obs_key="Fev_delta_pull", features="mouse", annotation={"time": [14.5, 15.5]}
)
fev_delta_drivers_tf.head(5)
| Fev_delta_pull_corr | Fev_delta_pull_pval | Fev_delta_pull_qval | Fev_delta_pull_ci_low | Fev_delta_pull_ci_high | |
|---|---|---|---|---|---|
| Isl1 | 0.174969 | 4.224819e-37 | 6.379477e-34 | 0.148456 | 0.201231 |
| Neurod2 | 0.165569 | 2.488616e-33 | 1.878905e-30 | 0.138975 | 0.191925 |
| Prox1 | 0.158975 | 8.115406e-31 | 3.063566e-28 | 0.132327 | 0.185393 |
| Peg3 | 0.152899 | 1.347384e-28 | 4.069099e-26 | 0.126204 | 0.179373 |
| Jund | 0.129767 | 5.756594e-21 | 1.241780e-18 | 0.102910 | 0.156434 |
We find that the predicted progenitor population of Fev+ delta cells also specifically expresses Neurod2, which suggests that we found a potential transcription factor for both delta and epsilon populations. Thus, we set out to experimentally verify the role of Neurod2, and found it to be an activator of epsilon cells [Klein et al., 2025].