A cross entropy test allows quantitative statistical comparison of t-SNE and UMAP representations

Date

2023

Authors

Roca, C.P.
Burton, O.
Neumann, J.
Tareen, S.
Whyte, C.E.
Gergelits, V.
Veiga, R.V.
Humblet-Baron, S.
Liston, A.

Type

Journal article

Citation

Cell Reports: Methods, 2023; 3(1, article no. 100390):1-15

Abstract

The advent of high-dimensional single-cell data has necessitated the development of dimensionality-reduction tools. t-Distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are the two most frequently used approaches, allowing clear visualization of complex single-cell datasets. Despite the need for quantitative comparison, t-SNE and UMAP have largely remained visualization tools due to the lack of robust statistical approaches. Here, we have derived a statistical test for evaluating the difference between dimensionality-reduced datasets using the Kolmogorov-Smirnov test on the distributions of cross entropy of single cells within each dataset. As the approach uses the inter-relationship of single cells for comparison, the resulting statistic is robust and capable of identifying true biological variation. Further, the test provides a valid distance between single-cell datasets, allowing the organization of multiple samples into a dendrogram for quantitative comparison of complex datasets. These results demonstrate the largely untapped potential of dimensionality-reduction tools for biomedical data analysis beyond visualization.
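The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each cell's cross entropy takes the standard form -Σ_j p_j log q_j between its neighbor probabilities in the original space (p) and in the embedding (q), and that two datasets are compared through the two-sample Kolmogorov-Smirnov statistic of those per-cell values. The function names `pointwise_cross_entropy` and `ks_statistic` are hypothetical, and p-value computation is omitted.

```python
import math
from bisect import bisect_right


def pointwise_cross_entropy(p_row, q_row):
    """Cross entropy -sum_j p_j * log(q_j) for a single cell.

    Assumed form: p_row holds the cell's neighbor probabilities in the
    original space, q_row the matching probabilities in the embedding.
    Terms with p_j == 0 contribute nothing; q_j must be > 0 where p_j > 0.
    """
    return -sum(p * math.log(q) for p, q in zip(p_row, q_row) if p > 0)


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the two empirical CDFs,
    evaluated at every observed value (where the supremum is attained).
    """
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Toy usage with made-up per-cell cross-entropy values (not real data).
dataset_1 = [0.8, 1.1, 1.3, 0.9]
dataset_2 = [2.0, 2.4, 1.9, 2.2]
distance = ks_statistic(dataset_1, dataset_2)
```

Because the statistic lies between 0 and 1 and is symmetric in its two arguments, a matrix of pairwise values across several samples could be passed to a hierarchical clustering routine to draw a dendrogram of the kind the abstract mentions.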

Description

Data source: Supplemental information, https://doi.org/10.1016/j.crmeth.2022.100390

Rights

Copyright 2023 The Author(s). This is an open access article under the CC BY-NC-ND license. (http://creativecommons.org/licenses/by-nc-nd/4.0/)
