Including barcode information via barcode distances
This example shows how to incorporate lineage information obtained from barcodes in the LineageProblem.
Check out moslin [Lange et al., 2023] for examples on real-world data.
Imports and data loading
from moscot import datasets
from moscot.problems.time import LineageProblem
Simulate data using simulate_data().
adata = datasets.simulate_data(n_distributions=3, key="day", quad_term="barcode")
adata
AnnData object with n_obs × n_vars = 60 × 60
obs: 'day', 'celltype'
obsm: 'barcode'
We assume barcodes are saved in obsm.
adata.obsm["barcode"][:10, :]
array([[ 1, 8, 0, 12, 18, 15, 11, 16, 13, 9],
[ 7, 19, 6, 1, 11, 8, 4, 15, 19, 1],
[10, 7, 3, 14, 15, 4, 11, 4, 0, 7],
[10, 19, 1, 18, 0, 14, 13, 5, 2, 12],
[ 2, 18, 1, 14, 17, 9, 12, 7, 3, 15],
[ 1, 7, 11, 7, 10, 8, 14, 19, 9, 16],
[ 9, 13, 5, 5, 13, 9, 2, 15, 0, 4],
[ 8, 6, 1, 7, 10, 12, 13, 8, 12, 16],
[ 3, 5, 10, 8, 5, 0, 1, 2, 9, 14],
[ 7, 11, 5, 2, 2, 4, 14, 0, 10, 5]])
Barcode distance
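Now, we can instantiate and prepare the LineageProblem by specifying the cost: the barcode distance for the two lineage terms (x and y) and the squared Euclidean distance for the gene expression term (xy).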
lp = LineageProblem(adata)
lp = lp.prepare(
    time_key="day",
    lineage_attr={"attr": "obsm", "key": "barcode"},
    cost={"x": "barcode_distance", "y": "barcode_distance", "xy": "sq_euclidean"},
)
INFO Computing pca with `n_comps=30` for `xy` using `adata.X`
INFO Computing pca with `n_comps=30` for `xy` using `adata.X`
Internally, cost matrices have been computed directly from the barcodes using a Hamming-type distance between barcodes. Let us investigate the first few entries of the cost matrix computed from the barcodes at time point 0.
lp[0, 1].x.data_src[:5, :5]
array([[0. , 1.9, 1.6, 1.8, 1.9],
[1.9, 0. , 1.9, 1.7, 2. ],
[1.6, 1.9, 0. , 1.6, 1.7],
[1.8, 1.7, 1.6, 0. , 1.7],
[1.9, 2. , 1.7, 1.7, 0. ]])
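To make the distance more concrete, here is a minimal NumPy sketch of one scaled Hamming variant: every differing site counts once, and counts a second time when both entries are nonzero. This is not taken from moscot's source; the function name `scaled_hamming` is hypothetical, and reading 0 as an unedited site is an assumption. Applied to the first five barcodes printed earlier, it gives the same values as the 5 × 5 block shown above.

```python
import numpy as np


def scaled_hamming(barcodes: np.ndarray) -> np.ndarray:
    """Pairwise scaled Hamming distance (hypothetical helper, not moscot API).

    Assumption: an entry of 0 marks an unedited site, so a mismatch between
    two nonzero entries is counted twice.
    """
    b = np.asarray(barcodes)
    # Site-wise comparison of every cell with every other cell: (n, n, n_sites).
    differs = b[:, None, :] != b[None, :, :]
    both_nonzero = (b[:, None, :] != 0) & (b[None, :, :] != 0)
    n_sites = b.shape[1]
    return (differs.sum(axis=-1) + (differs & both_nonzero).sum(axis=-1)) / n_sites


# First five barcodes from adata.obsm["barcode"] as printed earlier.
barcodes = np.array(
    [
        [1, 8, 0, 12, 18, 15, 11, 16, 13, 9],
        [7, 19, 6, 1, 11, 8, 4, 15, 19, 1],
        [10, 7, 3, 14, 15, 4, 11, 4, 0, 7],
        [10, 19, 1, 18, 0, 14, 13, 5, 2, 12],
        [2, 18, 1, 14, 17, 9, 12, 7, 3, 15],
    ]
)

dist = scaled_hamming(barcodes)
print(dist)
```

The sketch ignores missing values; a real implementation would likely need to restrict the comparison to sites observed in both cells.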
Similarly, we investigate parts of the cost matrix computed from the barcodes at time point 1.
lp[0, 1].y.data_src[:5, :5]
array([[0. , 1.8, 2. , 1.8, 2. ],
[1.8, 0. , 2. , 1.6, 1.8],
[2. , 2. , 0. , 1.8, 2. ],
[1.8, 1.6, 1.8, 0. , 2. ],
[2. , 1.8, 2. , 2. , 0. ]])
Note that the gene expression term is still saved as two point clouds. The corresponding cost matrix will be computed by the backend.
lp[0, 1].xy.data_src.shape, lp[0, 1].xy.data_tgt.shape
((20, 30), (20, 30))
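For intuition, the squared Euclidean cost between two such point clouds could be computed as in the sketch below. The random arrays `src` and `tgt` are stand-ins for the two PCA embeddings (they are not the tutorial's data), and `sq_euclidean_cost` is a hypothetical helper; the backend may evaluate this cost lazily or in a different way.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the two point clouds: 20 cells each, 30 PCA components.
src = rng.normal(size=(20, 30))
tgt = rng.normal(size=(20, 30))


def sq_euclidean_cost(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Squared Euclidean distance between every row of x and every row of y."""
    diff = x[:, None, :] - y[None, :, :]  # shape (n_src, n_tgt, n_features)
    return (diff**2).sum(axis=-1)


cost = sq_euclidean_cost(src, tgt)
print(cost.shape)
```

Storing only the point clouds keeps memory proportional to the number of cells, whereas the dense cost matrix grows with the product of the source and target sizes.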