A cross entropy test allows quantitative statistical comparison of t-SNE and UMAP representations
Date
2023
Authors
Roca, C.P.
Burton, O.
Neumann, J.
Tareen, S.
Whyte, C.E.
Gergelits, V.
Veiga, R.V.
Humblet Baron, S.
Liston, A.
Type:
Journal article
Citation
Cell Reports: Methods, 2023; 3(1, article no. 100390):1-15
Abstract
The advent of high-dimensional single-cell data has necessitated the development of dimensionality-reduction tools. t-Distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are the two most frequently used approaches, allowing clear visualization of complex single-cell datasets. Despite the need for quantitative comparison, t-SNE and UMAP have largely remained visualization tools due to the lack of robust statistical approaches. Here, we have derived a statistical test for evaluating the difference between dimensionality-reduced datasets using the Kolmogorov-Smirnov test on the distributions of cross entropy of single cells within each dataset. As the approach uses the inter-relationship of single cells for comparison, the resulting statistic is robust and capable of identifying true biological variation. Further, the test provides a valid distance between single-cell datasets, allowing the organization of multiple samples into a dendrogram for quantitative comparison of complex datasets. These results demonstrate the largely untapped potential of dimensionality-reduction tools for biomedical data analysis beyond visualization.
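The comparison step described in the abstract, a Kolmogorov-Smirnov test on two distributions of per-cell cross entropy, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the per-cell cross entropy values have already been computed for each dataset (the abstract does not say how), the function name `ks_two_sample` is hypothetical, and the p-value uses the standard asymptotic approximation, which is only rough for small samples.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic and asymptotic p-value.

    `a` and `b` are assumed to be per-cell cross entropy values from two
    dimensionality-reduced datasets (hypothetical inputs for this sketch).
    """
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")

    # KS statistic: largest gap between the two empirical CDFs,
    # evaluated at every observed value.
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)

    # Asymptotic Kolmogorov distribution with a small-sample correction.
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        # The survival function is indistinguishable from 1 in this range.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


if __name__ == "__main__":
    # Made-up values purely for illustration; not data from the article.
    sample_1 = [0.8, 1.1, 1.3, 0.9, 1.0, 1.2]
    sample_2 = [1.9, 2.2, 2.0, 2.4, 2.1, 2.3]
    statistic, p_value = ks_two_sample(sample_1, sample_2)
    print(f"KS statistic = {statistic:.3f}, p = {p_value:.4g}")
```

Because the statistic is computed between any pair of datasets, a matrix of such pairwise values is the kind of input a hierarchical clustering routine would need to build the dendrogram mentioned in the abstract. Whether the authors use the KS statistic itself as that distance is an assumption here.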
Description
Data source: Supplemental information, https://doi.org/10.1016/j.crmeth.2022.100390
Rights
Copyright 2023 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).