# Including barcode information via barcode distances

This example shows how to incorporate lineage information obtained from barcodes into a `LineageProblem`. Check out moslin for examples on real-world data.

```python
from moscot import datasets
from moscot.problems.time import LineageProblem
```


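We simulate a small dataset with three time points (stored in `adata.obs["day"]`) using `datasets.simulate_data()`, with barcodes as the quadratic term.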

```python
adata = datasets.simulate_data(n_distributions=3, key="day", quad_term="barcode")
```

```
AnnData object with n_obs × n_vars = 60 × 60
    obs: 'day', 'celltype'
    obsm: 'barcode'
```


We assume the barcodes are stored in `adata.obsm`, with one row per cell and one column per barcode site.

```python
adata.obsm["barcode"][:10, :]
```

```
array([[ 1,  8,  0, 12, 18, 15, 11, 16, 13,  9],
       [ 7, 19,  6,  1, 11,  8,  4, 15, 19,  1],
       [10,  7,  3, 14, 15,  4, 11,  4,  0,  7],
       [10, 19,  1, 18,  0, 14, 13,  5,  2, 12],
       [ 2, 18,  1, 14, 17,  9, 12,  7,  3, 15],
       [ 1,  7, 11,  7, 10,  8, 14, 19,  9, 16],
       [ 9, 13,  5,  5, 13,  9,  2, 15,  0,  4],
       [ 8,  6,  1,  7, 10, 12, 13,  8, 12, 16],
       [ 3,  5, 10,  8,  5,  0,  1,  2,  9, 14],
       [ 7, 11,  5,  2,  2,  4, 14,  0, 10,  5]])
```


## Barcode distance

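Now we can instantiate and prepare the `LineageProblem`, specifying a cost for each term: the barcode distance for the quadratic terms `x` and `y` (cells within the source and within the target time point) and the squared Euclidean distance for the linear term `xy` (gene expression).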

```python
lp = LineageProblem(adata)
lp = lp.prepare(
    time_key="day",
    lineage_attr={"attr": "obsm", "key": "barcode"},
    cost={"x": "barcode_distance", "y": "barcode_distance", "xy": "sq_euclidean"},
)
```

```
INFO     Computing pca with n_comps=30 for xy using adata.X
INFO     Computing pca with n_comps=30 for xy using adata.X
```
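The three entries of `cost` refer to the three terms of a fused Gromov-Wasserstein problem: `x` and `y` are the quadratic terms comparing cells within the source and within the target time point, and `xy` is the linear term comparing cells across time points. As a sketch, assuming the standard fused formulation with the squared loss and leaving out entropic regularization and unbalancedness, the coupling $P$ between source and target cells solves

$$
\min_{P \in U(a, b)} \; (1 - \alpha) \sum_{i,j} C^{xy}_{ij} P_{ij} \; + \; \alpha \sum_{i,j,k,l} \left( C^{x}_{ik} - C^{y}_{jl} \right)^2 P_{ij} P_{kl},
$$

where $U(a, b)$ is the set of couplings with marginals $a$ and $b$, $C^{x}$ and $C^{y}$ are the barcode distance matrices of the source and target cells, $C^{xy}$ is the squared Euclidean cost on the gene expression embeddings, and $\alpha \in [0, 1]$ trades off the two terms. To our knowledge this weight is set via the `alpha` argument of `solve`; check the API reference before relying on it.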


Internally, moscot has computed the cost matrices from the barcodes using a scaled Hamming distance. Let us inspect the first few entries of the cost matrix for the source time point (day 0).

```python
lp[0, 1].x.data_src[:5, :5]
```

```
array([[0. , 1.9, 1.6, 1.8, 1.9],
       [1.9, 0. , 1.9, 1.7, 2. ],
       [1.6, 1.9, 0. , 1.6, 1.7],
       [1.8, 1.7, 1.6, 0. , 1.7],
       [1.9, 2. , 1.7, 1.7, 0. ]])
```
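The sketch below shows one way such a scaled Hamming distance can be computed in plain Python. It assumes an encoding we believe moscot follows, but have not confirmed: `-1` marks a missing site, `0` an unedited site, and any other integer a specific edit. Sites missing in either cell are ignored. Every differing site counts 1, and a site where both cells carry different edits counts 1 extra, so values lie in [0, 2]. `scaled_hamming` and `barcode_cost_matrix` are hypothetical helpers, not moscot's implementation.

```python
from typing import List, Optional, Sequence

# Assumed encoding (hypothetical constants)
MISSING = -1   # site not observed in this cell
UNEDITED = 0   # site carries no edit


def scaled_hamming(b1: Sequence[int], b2: Sequence[int]) -> Optional[float]:
    """Scaled Hamming distance between two barcodes, in [0, 2].

    Sites missing in either barcode are skipped. Every differing site adds 1;
    a site where both cells carry *different* edits adds 1 more. The total is
    divided by the number of shared sites. Returns None if no site is shared.
    """
    shared = [(u, v) for u, v in zip(b1, b2) if u != MISSING and v != MISSING]
    if not shared:
        return None
    differences = sum(u != v for u, v in shared)
    double_edits = sum(u != v and u != UNEDITED and v != UNEDITED for u, v in shared)
    return (differences + double_edits) / len(shared)


def barcode_cost_matrix(barcodes: Sequence[Sequence[int]]) -> List[List[Optional[float]]]:
    """Pairwise distances among the cells of a single time point."""
    return [[scaled_hamming(a, b) for b in barcodes] for a in barcodes]


# First three barcodes from the array printed earlier
rows = [
    [1, 8, 0, 12, 18, 15, 11, 16, 13, 9],
    [7, 19, 6, 1, 11, 8, 4, 15, 19, 1],
    [10, 7, 3, 14, 15, 4, 11, 4, 0, 7],
]
cost = barcode_cost_matrix(rows)
print(cost[0][1], cost[0][2], cost[1][2])  # 1.9 1.6 1.9
```

On these three barcodes the sketch gives 1.9, 1.6 and 1.9, the same values as the corresponding entries of `lp[0, 1].x.data_src` printed above. By construction the matrix is symmetric with a zero diagonal.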


Similarly, we inspect part of the cost matrix for the target time point (day 1), also computed from the barcodes.

```python
lp[0, 1].y.data_src[:5, :5]
```

```
array([[0. , 1.8, 2. , 1.8, 2. ],
       [1.8, 0. , 2. , 1.6, 1.8],
       [2. , 2. , 0. , 1.8, 2. ],
       [1.8, 1.6, 1.8, 0. , 2. ],
       [2. , 1.8, 2. , 2. , 0. ]])
```
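As far as we understand, `prepare` groups the cells by `time_key` and, under its default sequential policy, creates one subproblem per pair of consecutive time points. `lp[0, 1]` is then the day 0 to day 1 subproblem, and its source (`x`) and target (`y`) cost matrices are the two shown above. This would also explain why the PCA message is logged twice, once per subproblem. The plain-Python sketch below mimics that pairing. `group_by_time` and `sequential_pairs` are hypothetical helpers, not moscot functions.

```python
from typing import Dict, List, Sequence, Tuple


def group_by_time(times: Sequence[float]) -> Dict[float, List[int]]:
    """Map each time point to the indices of the cells observed at it."""
    groups: Dict[float, List[int]] = {}
    for idx, t in enumerate(times):
        groups.setdefault(t, []).append(idx)
    return groups


def sequential_pairs(times: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair each time point with the next one in sorted order."""
    points = sorted(set(times))
    return list(zip(points[:-1], points[1:]))


# Toy labels shaped like the simulated data: 3 days x 20 cells
day = [0] * 20 + [1] * 20 + [2] * 20
groups = group_by_time(day)
for src, tgt in sequential_pairs(day):
    print((src, tgt), len(groups[src]), "->", len(groups[tgt]))
# (0, 1) 20 -> 20
# (1, 2) 20 -> 20
```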


Note that the gene expression term `xy` is still stored as two point clouds, the 30-dimensional PCA embeddings of the 20 source and 20 target cells. The corresponding squared Euclidean cost matrix will be computed by the backend.

```python
lp[0, 1].xy.data_src.shape, lp[0, 1].xy.data_tgt.shape
```

```
((20, 30), (20, 30))
```
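Keeping `xy` as point clouds means only the two embeddings are stored. The cost matrix requested via `"xy": "sq_euclidean"`, with entries $C_{ij} = \lVert x_i - y_j \rVert^2$, is built from them later. For large numbers of cells, this avoids holding an n × m matrix up front. The sketch below spells out that cost in plain Python for illustration only. `sq_euclidean_cost` is a hypothetical helper. Actual backends work on arrays and may never materialize the full matrix.

```python
from typing import List, Sequence


def sq_euclidean_cost(
    src: Sequence[Sequence[float]], tgt: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Return C with C[i][j] = ||src[i] - tgt[j]||^2."""
    return [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in tgt] for x in src]


# Two tiny 2D point clouds (stand-ins for the 20 x 30 PCA embeddings)
source = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
target = [[1.0, 0.0], [0.0, 2.0]]
C = sq_euclidean_cost(source, target)
print(len(C), len(C[0]))  # 3 2
print(C)  # [[1.0, 4.0], [1.0, 2.0], [1.0, 8.0]]
```

The cost matrix has one row per source cell and one column per target cell. For the subproblem above, it would therefore be 20 × 20, regardless of the 30 embedding dimensions.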