# Lineage tree

This example shows how to pass lineage trees to moscot. This is particularly useful for the `LineageProblem`, which requires lineage information. See moslin for examples on real-world data.

moscot accepts lineage information in one of three forms:

1. precomputed cost matrices,
2. barcode information, or
3. the lineage tree as a `DiGraph`.

In this notebook, we consider the lineage tree case.

• TODO: link to other relevant examples

```python
from moscot import datasets
from moscot.problems.time import LineageProblem
```


We simulate data with `datasets.simulate_data()`. Passing `quad_term="tree"` also simulates one lineage tree per time point.

```python
adata = datasets.simulate_data(n_distributions=3, key="day", quad_term="tree")
adata
```

```
AnnData object with n_obs × n_vars = 60 × 60
    obs: 'day', 'celltype'
    uns: 'trees'
```


We assume the trees are stored in `adata.uns` as a dictionary that maps each time point (a value of `adata.obs["day"]`) to a `DiGraph`.

```python
adata.uns["trees"]
```

```
{0: <networkx.classes.digraph.DiGraph at 0x7fb4786c9ae0>,
 1: <networkx.classes.digraph.DiGraph at 0x7fb4786c9720>,
 2: <networkx.classes.digraph.DiGraph at 0x7fb4786c94b0>}
```
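
The expected layout of `adata.uns["trees"]` can be mimicked in plain Python to make the dictionary structure concrete. This is a hypothetical sketch, not moscot code: each tree is an edge list of `(parent, child)` pairs instead of a `networkx.DiGraph`, the node names and the helpers `leaves` and `check_trees` are made up, and the requirement that a tree's leaves are exactly the cells of its time point is our assumption rather than something this notebook states.

```python
# Hypothetical stand-in for adata.uns["trees"]: time point -> list of (parent, child) edges.
# In moscot itself, each value would be a networkx.DiGraph.
trees = {
    0: [("root", "n1"), ("root", "c0_2"), ("n1", "c0_0"), ("n1", "c0_1")],
    1: [("root", "c1_0"), ("root", "c1_1")],
}

# Hypothetical per-cell time annotation, playing the role of adata.obs["day"].
cell_day = {"c0_0": 0, "c0_1": 0, "c0_2": 0, "c1_0": 1, "c1_1": 1}


def leaves(edges):
    """Return the nodes without children, i.e. the leaves of a directed tree."""
    parents = {p for p, _ in edges}
    nodes = parents | {c for _, c in edges}
    return nodes - parents


def check_trees(trees, cell_day):
    """Check (under our assumption) that each time point has a tree whose leaves are its cells."""
    for day in set(cell_day.values()):
        if day not in trees:
            raise KeyError(f"no tree for time point {day!r}")
        cells = {c for c, d in cell_day.items() if d == day}
        if leaves(trees[day]) != cells:
            raise ValueError(f"leaves of tree {day!r} do not match its cells")


check_trees(trees, cell_day)  # passes silently for the toy data above
```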


## Leaf distance

Now we instantiate the `LineageProblem` and prepare it, specifying `"leaf_distance"` as the cost to compute from the trees.

```python
lp = LineageProblem(adata)
lp = lp.prepare(
    time_key="day",
    lineage_attr={"attr": "uns", "key": "trees", "cost": "leaf_distance"},
)
```

```
INFO     Computing pca with n_comps=30 for xy using adata.X
INFO     Computing pca with n_comps=30 for xy using adata.X
```
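
The two `INFO` lines are consistent with `prepare` creating two subproblems, one per pair of consecutive time points, and computing a PCA for each. Assuming a sequential pairing of time points (our assumption here; check the `policy` argument of `LineageProblem.prepare`), the subproblem keys can be derived as in this sketch, where `sequential_pairs` is a hypothetical helper:

```python
def sequential_pairs(time_points):
    """Pair each time point with the next one, e.g. [0, 1, 2] -> [(0, 1), (1, 2)]."""
    ordered = sorted(set(time_points))
    return list(zip(ordered[:-1], ordered[1:]))


# Hypothetical per-cell time annotation: 20 cells at each of 3 time points.
days = [0] * 20 + [1] * 20 + [2] * 20
print(sequential_pairs(days))  # [(0, 1), (1, 2)]
```

Under this assumption, the first subproblem is indexed as `lp[0, 1]`.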


Internally, `prepare` computed cost matrices from the trees as the shortest-path distance between leaves. Let us inspect the first few entries of the cost matrix computed from the source tree (day 0) of the subproblem `(0, 1)`.

```python
lp[0, 1].x.data_src[:3, :3]
```

```
array([[0., 2., 3.],
       [2., 0., 3.],
       [3., 3., 0.]])
```
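
The leaf distance can be reproduced on a toy tree. The sketch below is not moscot's implementation: it stores the tree as a plain edge list, ignores edge direction, and counts the edges on the shortest path between every pair of leaves using breadth-first search. The tree (a root with children `p` and `c`, where `p` has children `a` and `b`) is made up; it merely yields the same pattern as the block printed above.

```python
from collections import defaultdict, deque


def leaf_distance(edges, leaves):
    """Return the matrix of shortest-path lengths (in edges) between the given leaves."""
    # Build an undirected adjacency list, since the distance ignores edge direction.
    adj = defaultdict(list)
    for parent, child in edges:
        adj[parent].append(child)
        adj[child].append(parent)

    def bfs(source):
        # Breadth-first search yields shortest paths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        return dist

    rows = []
    for leaf in leaves:
        dist = bfs(leaf)
        rows.append([float(dist[other]) for other in leaves])
    return rows


edges = [("root", "p"), ("root", "c"), ("p", "a"), ("p", "b")]
print(leaf_distance(edges, ["a", "b", "c"]))
# [[0.0, 2.0, 3.0], [2.0, 0.0, 3.0], [3.0, 3.0, 0.0]]
```

Sibling leaves `a` and `b` are two edges apart, while `c` is three edges away from both because the path must pass through the root.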


Similarly, we inspect part of the cost matrix computed from the target tree (day 1).

```python
lp[0, 1].y.data_src[:3, :3]
```

```
array([[0., 2., 3.],
       [2., 0., 3.],
       [3., 3., 0.]])
```


Note that the gene expression term is still stored as two point clouds, the 30-dimensional PCA embeddings of the 20 source and 20 target cells. The backend computes the corresponding cost matrix later.

```python
lp[0, 1].xy.data_src.shape, lp[0, 1].xy.data_tgt.shape
```

```
((20, 30), (20, 30))
```
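
To illustrate what the backend does with the two point clouds, here is a hedged sketch assuming a squared Euclidean cost, $c_{ij} = \lVert x_i - y_j \rVert^2$. This is a common default for OT solvers, but the cost actually used depends on how the problem is configured. The helper `sq_euclidean_cost` and the tiny point clouds are made up for illustration.

```python
def sq_euclidean_cost(src, tgt):
    """Return C with C[i][j] = squared Euclidean distance between src[i] and tgt[j]."""
    return [
        [sum((a - b) ** 2 for a, b in zip(x, y)) for y in tgt]
        for x in src
    ]


# Two made-up point clouds with 2 features each (the real ones are 20 x 30).
src = [[0.0, 0.0], [1.0, 0.0]]
tgt = [[0.0, 1.0], [2.0, 2.0], [1.0, 0.0]]
print(sq_euclidean_cost(src, tgt))
# [[1.0, 8.0, 1.0], [2.0, 5.0, 0.0]]
```

For 20 source and 20 target cells, the same computation yields a 20 × 20 cost matrix.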