References#

Bibliography#

[AFB+18]

Anna Alemany, Maria Florescu, Chloé S Baron, Josi Peterson-Maduro, and Alexander Van Oudenaarden. Whole-organism clone tracing using single-cell sequencing. Nature, 556(7699):108–112, 2018.

[CLC+22]

Ao Chen, Sha Liao, Mengnan Cheng, Kailong Ma, Liang Wu, Yiwei Lai, Xiaojie Qiu, Jin Yang, Jiangshan Xu, Shijie Hao, Xin Wang, Huifang Lu, Xi Chen, Xing Liu, Xin Huang, Zhao Li, Yan Hong, Yujia Jiang, Jian Peng, Shuai Liu, Mengzhe Shen, Chuanyu Liu, Quanshui Li, Yue Yuan, Xiaoyu Wei, Huiwen Zheng, Weimin Feng, Zhifeng Wang, Yang Liu, Zhaohui Wang, Yunzhi Yang, Haitao Xiang, Lei Han, Baoming Qin, Pengcheng Guo, Guangyao Lai, Pura Muñoz-Cánoves, Patrick H. Maxwell, Jean Paul Thiery, Qing-Feng Wu, Fuxiang Zhao, Bichao Chen, Mei Li, Xi Dai, Shuai Wang, Haoyan Kuang, Junhou Hui, Liqun Wang, Ji-Feng Fei, Ou Wang, Xiaofeng Wei, Haorong Lu, Bo Wang, Shiping Liu, Ying Gu, Ming Ni, Wenwei Zhang, Feng Mu, Ye Yin, Huanming Yang, Michael Lisby, Richard J. Cornall, Jan Mulder, Mathias Uhlén, Miguel A. Esteban, Yuxiang Li, Longqi Liu, Xun Xu, and Jian Wang. Spatiotemporal transcriptomic atlas of mouse organogenesis using dna nanoball-patterned arrays. Cell, 185(10):1777–1792.e21, 2022. URL: https://www.sciencedirect.com/science/article/pii/S0092867422003993, doi:10.1016/j.cell.2022.04.003.

[CPSV18]

Lenaic Chizat, Gabriel Peyré, Bernhard Schmitzer, and François-Xavier Vialard. Scaling algorithms for unbalanced optimal transport problems. Mathematics of Computation, 87(314):2563–2609, 2018. doi:10.48550/arXiv.1508.05216.

[CWW13]

Keenan Crane, Clarisse Weischedel, and Max Wardetzky. Geodesics in heat: a new approach to computing distance based on heat flow. ACM Trans. Graph., October 2013. URL: https://doi.org/10.1145/2516971.2516977, doi:10.1145/2516971.2516977.

[Cut13] (1,2,3)

Marco Cuturi. Sinkhorn distances: lightspeed computation of optimal transport. In C.J. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 26. Curran Associates, Inc., 2013. URL: https://proceedings.neurips.cc/paper/2013/file/af21d0c97db2e27e13572cbf59eb343d-Paper.pdf.

[DSS+22]

Pinar Demetci, Rebecca Santorella, Björn Sandstede, William Stafford Noble, and Ritambhara Singh. Scot: single-cell multi-omics alignment with optimal transport. Journal of Computational Biology, 29(1):3–18, 2022. URL: https://pubmed.ncbi.nlm.nih.gov/35050714/.

[Fis21]

R. A. Fisher. On the “probable error” of a coefficient of correlation deduced from a small sample. 1921.

[FS21]

Aden Forrow and Geoffrey Schiebinger. Lineageot is a unified framework for lineage tracing and trajectory inference. Nature communications, 12(1):4940, 2021.

[HLS+22]

Bo Hu, Sara Lelek, Bastiaan Spanjaard, Hadil El-Sammak, Mariana Guedes Simões, Janita Mintcheva, Hananeh Aliee, Ronny Schäfer, Alexander M. Meyer, Fabian Theis, Didier Y. R. Stainier, Daniela Panáková, and Jan Philipp Junker. Origin and function of activated fibroblast states during zebrafish heart regeneration. Nature Genetics, 54(8):1227–1237, 2022. URL: https://doi.org/10.1038/s41588-022-01129-5, doi:10.1038/s41588-022-01129-5.

[HTZ+23]

Guillaume Huguet, Alexander Tong, María Ramos Zapatero, Christopher J. Tape, Guy Wolf, and Smita Krishnaswamy. Geodesic sinkhorn for fast and accurate optimal transport on manifolds. 2023. arXiv:2211.00805.

[JTLE22]

Andrew Jones, F. William Townes, Didong Li, and Barbara E. Engelhardt. Alignment of spatial genomics and histology data using deep gaussian processes. bioRxiv, 2022. URL: https://www.biorxiv.org/content/early/2022/01/11/2022.01.10.475692, doi:10.1101/2022.01.10.475692.

[JKQ+20]

Matthew G Jones, Alex Khodaverdian, Jeffrey J Quinn, Michelle M Chan, Jeffrey A Hussmann, Robert Wang, Chenling Xu, Jonathan S Weissman, and Nir Yosef. Inference of single-cell phylogenies from lineage tracing data using cassiopeia. Genome biology, 21(1):1–27, 2020.

[LBK+22]

Marius Lange, Volker Bergen, Michal Klein, Manu Setty, Bernhard Reuter, Mostafa Bakhti, Heiko Lickert, Meshal Ansari, Janine Schniering, Herbert B Schiller, and others. Cellrank for directed single-cell fate mapping. Nature methods, 19(2):159–170, 2022.

[LPK+23]

Marius Lange, Zoe Piran, Michal Klein, Bastiaan Spanjaard, Dominik Klein, Jan Philipp Junker, Fabian J. Theis, and Mor Nitzan. Mapping lineage-traced cells across time points with moslin. bioRxiv, 2023. URL: https://www.biorxiv.org/content/10.1101/2023.04.14.536867v1.

[LZG+22]

Bin Li, Wen Zhang, Chuang Guo, Hao Xu, Longfei Li, Minghao Fang, Yinlei Hu, Xinye Zhang, Xinfeng Yao, Meifang Tang, Ke Liu, Xuetong Zhao, Jun Lin, Linzhao Cheng, Falai Chen, Tian Xue, and Kun Qu. Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution. Nature Methods, 19(6):662–670, 2022. URL: https://doi.org/10.1038/s41592-022-01480-9, doi:10.1038/s41592-022-01480-9.

[LRC+18]

Romain Lopez, Jeffrey Regier, Michael B Cole, Michael I Jordan, and Nir Yosef. Deep generative modeling for single-cell transcriptomics. Nature methods, 15(12):1053–1058, 2018.

[LBC+21]

Malte D Luecken, Daniel Bernard Burkhardt, Robrecht Cannoodt, Christopher Lance, Aditi Agrawal, Hananeh Aliee, Ann T Chen, Louise Deconinck, Angela M Detweiler, Alejandro A Granados, and others. A sandbox for prediction and integration of dna, rna, and proteins in single cells. In Thirty-fifth conference on neural information processing systems datasets and benchmarks track (Round 2). NeurIPS Proceedings, 2021. URL: https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/file/158f3069a435b314a80bdcb024f8e422-Paper-round2.pdf.

[Mem11]

Facundo Mémoli. Gromov–wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics, 11(4):417–487, 2011. doi:10.1007/s10208-011-9093-5.

[NKFR19]

Mor Nitzan, Nikos Karaiskos, Nir Friedman, and Nikolaus Rajewsky. Gene expression cartography. Nature, 576(7785):132–137, 2019. doi:10.1038/s41586-019-1773-3.

[PZH+19]

Jonathan S. Packer, Qin Zhu, Chau Huynh, Priya Sivaramakrishnan, Elicia Preston, Hannah Dueck, Derek Stefanik, Kai Tan, Cole Trapnell, Junhyong Kim, Robert H. Waterston, and John I. Murray. A lineage-resolved molecular atlas of c. elegans embryogenesis at single-cell resolution. Science, 365(6459):eaax1971, 2019. URL: https://www.science.org/doi/abs/10.1126/science.aax1971, doi:10.1126/science.aax1971.

[PLZ22]

Xinhai Pan, Hechen Li, and Xiuwei Zhang. TedSim: temporal dynamics simulation of single-cell RNA sequencing data and cell division history. Nucleic Acids Research, 50(8):4272–4288, 04 2022. URL: https://doi.org/10.1093/nar/gkac235, doi:10.1093/nar/gkac235.

[PC+19]

Gabriel Peyré, Marco Cuturi, and others. Computational optimal transport: with applications to data science. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.

[PCS16]

Gabriel Peyré, Marco Cuturi, and Justin Solomon. Gromov-wasserstein averaging of kernel and distance matrices. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, 2664–2672. PMLR, 06 2016. URL: http://proceedings.mlr.press/v48/peyre16.html.

[RWM+18]

Bushra Raj, Daniel E Wagner, Aaron McKenna, Shristi Pandey, Allon M Klein, Jay Shendure, James A Gagnon, and Alexander F Schier. Simultaneous single-cell profiling of lineages and cell types in the vertebrate brain. Nature biotechnology, 36(5):442–450, 2018.

[SC22]

Meyer Scetbon and Marco Cuturi. Low-rank optimal transport: approximation, statistics and debiasing. 2022. URL: https://arxiv.org/abs/2205.12365, doi:10.48550/ARXIV.2205.12365.

[SCP21]

Meyer Scetbon, Marco Cuturi, and Gabriel Peyré. Low-rank sinkhorn factorization. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, 9344–9354. PMLR, 07 2021. URL: https://proceedings.mlr.press/v139/scetbon21a.html.

[SKPC23]

Meyer Scetbon, Michal Klein, Giovanni Palla, and Marco Cuturi. Unbalanced low-rank optimal transport solvers. 2023. URL: https://arxiv.org/abs/2305.19727, doi:10.48550/arXiv.2305.19727.

[SPC21]

Meyer Scetbon, Gabriel Peyré, and Marco Cuturi. Linear-time gromov wasserstein distances using low rank couplings and costs. 2021. URL: https://arxiv.org/abs/2106.01128, doi:10.48550/ARXIV.2106.01128.

[SST+19]

Geoffrey Schiebinger, Jian Shu, Marcin Tabaka, Brian Cleary, Vidya Subramanian, Aryeh Solomon, Joshua Gould, Siyan Liu, Stacie Lin, Peter Berube, Lia Lee, Jenny Chen, Justin Brumbaugh, Philippe Rigollet, Konrad Hochedlinger, Rudolf Jaenisch, Aviv Regev, and Eric S. Lander. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell, 176(4):928–943.e22, 2019. doi:10.1016/j.cell.2019.01.006.

[SVP21]

Thibault Sejourne, Francois-Xavier Vialard, and Gabriel Peyré. The unbalanced gromov wasserstein distance: conic formulation and relaxation. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, 8766–8779. Curran Associates, Inc., 2021. URL: https://proceedings.neurips.cc/paper/2021/file/4990974d150d0de5e6e15a1454fe6b0f-Paper.pdf.

[SHM+18]

Bastiaan Spanjaard, Bo Hu, Nina Mitic, Pedro Olivares-Chauvet, Sharan Janjuha, Nikolay Ninov, and Jan Philipp Junker. Simultaneous lineage tracing and cell-type identification using crispr–cas9-induced genetic scars. Nature biotechnology, 36(5):469–473, 2018. URL: https://doi.org/10.1038/nbt.4124.

[SMFR+20]

Sanjay R Srivatsan, José L McFaline-Figueroa, Vijay Ramani, Lauren Saunders, Junyue Cao, Jonathan Packer, Hannah A Pliner, Dana L Jackson, Riza M Daza, Lena Christiansen, and others. Massively multiplex chemical transcriptomics at single-cell resolution. Science, 367(6473):45–51, 2020.

[TC22]

James Thornton and Marco Cuturi. Rethinking initialization of the sinkhorn algorithm. 2022. arXiv:2206.07630.

[TIP+16]

Itay Tirosh, Benjamin Izar, Sanjay M. Prakadan, Marc H. Wadsworth, Daniel Treacy, John J. Trombetta, Asaf Rotem, Christopher Rodman, Christine Lian, George Murphy, Mohammad Fallahi-Sichani, Ken Dutton-Regester, Jia-Ren Lin, Ofir Cohen, Parin Shah, Diana Lu, Alex S. Genshaft, Travis K. Hughes, Carly G. K. Ziegler, Samuel W. Kazer, Aleth Gaillard, Kellie E. Kolb, Alexandra-Chloé Villani, Cory M. Johannessen, Aleksandr Y. Andreev, Eliezer M. Van Allen, Monica Bertagnolli, Peter K. Sorger, Ryan J. Sullivan, Keith T. Flaherty, Dennie T. Frederick, Judit Jané-Valbuena, Charles H. Yoon, Orit Rozenblatt-Rosen, Alex K. Shalek, Aviv Regev, and Levi A. Garraway. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell rna-seq. Science, 352(6282):189–196, 2016. doi:10.1126/science.aad0501.

[TVH+16]

Itay Tirosh, Andrew S. Venteicher, Christine Hebert, Leah E. Escalante, Anoop P. Patel, Keren Yizhak, Jonathan M. Fisher, Christopher Rodman, Christopher Mount, Mariella G. Filbin, Cyril Neftel, Niyati Desai, Jackson Nyman, Benjamin Izar, Christina C. Luo, Joshua M. Francis, Aanand A. Patel, Maristela L. Onozato, Nicolo Riggi, Kenneth J. Livak, Dave Gennert, Rahul Satija, Brian V. Nahed, William T. Curry, Robert L. Martuza, Ravindra Mylvaganam, A. John Iafrate, Matthew P. Frosch, Todd R. Golub, Miguel N. Rivera, Gad Getz, Orit Rozenblatt-Rosen, Daniel P. Cahill, Michelle Monje, Bradley E. Bernstein, David N. Louis, Aviv Regev, and Mario L. Suvà. Single-cell rna-seq supports a developmental hierarchy in human oligodendroglioma. Nature, 539(7628):309–313, 2016. doi:10.1038/nature20123.

[VCF+18]

Titouan Vayer, Laetita Chapel, Rémi Flamary, Romain Tavenard, and Nicolas Courty. Fused gromov-wasserstein distance for structured objects: theoretical foundations and mathematical properties. 2018. URL: https://arxiv.org/abs/1811.02834, doi:10.48550/ARXIV.1811.02834.

[VCF+20]

Titouan Vayer, Laetitia Chapel, Remi Flamary, Romain Tavenard, and Nicolas Courty. Fused gromov-wasserstein distance for structured objects. Algorithms, 2020. URL: https://www.mdpi.com/1999-4893/13/9/212, doi:10.3390/a13090212.

[WK20]

Daniel E Wagner and Allon M Klein. Lineage tracing meets single-cell omics: opportunities and challenges. Nature Reviews Genetics, 21(7):410–427, 2020.

[ZLSR22]

Ron Zeira, Max Land, Alexander Strzalkowski, and Benjamin J. Raphael. Alignment and integration of spatial transcriptomics data. Nature Methods, 19(5):567–575, 2022. URL: https://doi.org/10.1038/s41592-022-01459-6, doi:10.1038/s41592-022-01459-6.

Glossary#

OT#

An optimal transport problem is defined as a matching task between distributions, e.g. sets of cells.

transport matrix#

The output of a discrete OT problem indicating how much mass from data point \(x_i\) in row \(i\) is transported to data point \(y_j\) in column \(j\).

entropic regularization#

Entropy regularization of OT problems [Cuturi, 2013] reduces the time complexity and allows for more desirable statistical properties. The higher the entropy regularization, the more diffused the OT solution.

marginals#

An OT problem matches distributions, e.g. set of cells. The distribution is defined by the location of a cell, e.g. in gene expression space, and the weight assigned to one cell.

balanced OT problem#

OT problem where the marginals are fixed. Each data point (cell) of the source distribution emits a certain amount of mass given by the source marginals, and each data point (cell) of the target distribution receives a certain amount of mass given by the target marginals.

unbalanced OT problem#

OT problem where the marginals are not fixed. If beneficial, a data point might emit or receive more or less mass than prescribed by the marginals. The larger the unbalancedness parameters tau_a and tau_b, the more the mass emitted, and received, respectively, can deviate from the marginals [Chizat et al., 2018].

linear problem#

OT problem only containing a linear term and no quadratic term.

linear term#

Term of the cost function on the shared space, e.g. gene expression space.

quadratic problem#

OT problem containing a quadratic term and possibly a linear term.

quadratic term#

Term of the cost function comparing two different spaces.

Gromov-Wasserstein#

OT problem between two distributions where a data point, e.g. a cell. in the source distribution does not live in the same space as a data point in the target distribution. Such problem is a quadratic problem.

fused Gromov-Wasserstein#

OT problem between two distributions where a data point, e.g. a cell, of the source distribution has both features in the same space as the target distribution (linear term) and features in a different space than a data point in the target distribution (quadratic term). Such problem is a quadratic problem.

dual potentials#

Potentials obtained by the Sinkhorn algorithm which define the solution of a linear problem [Cuturi, 2013]. These weights are referred to as marginals.

Sinkhorn#

The Sinkhorn algorithm [Cuturi, 2013] is used for solving a linear problem, and is also used in inner iterations for solving a quadratic problem.

low-rank OT#

low-rank OT approximates full-rank OT, which allows for faster computations and lower memory complexity [Scetbon and Cuturi, 2022, Scetbon et al., 2021, Scetbon et al., 2023, Scetbon et al., 2021]. The transport matrix will be low-rank.

low-rank#

If the OT problem is solved with a low-rank solver, the transport matrix is the product of several low-rank matrices (i.e. lower than the number of data points in the source distribution and the target distribution).