Positive-unlabeled learning in bioinformatics and computational biology: A brief review

dc.contributor.author: Li, F.
dc.contributor.author: Dong, S.
dc.contributor.author: Leier, A.
dc.contributor.author: Han, M.
dc.contributor.author: Guo, X.
dc.contributor.author: Xu, J.
dc.contributor.author: Wang, X.
dc.contributor.author: Pan, S.
dc.contributor.author: Jia, C.
dc.contributor.author: Zhang, Y.
dc.contributor.author: Webb, G.I.
dc.contributor.author: Coin, L.J.M.
dc.contributor.author: Li, C.
dc.contributor.author: Song, J.
dc.date.issued: 2022
dc.description.abstract: Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and the negative samples might be potentially mislabeled due to the limited sensitivity of the experimental equipment. The positive unlabeled (PU) learning scheme was therefore proposed to enable the classifier to learn directly from limited positive samples and a large number of unlabeled samples (i.e. a mixture of positive and negative samples). To date, several PU learning algorithms have been developed to address various biological questions, such as sequence identification, functional site characterization and interaction prediction. In this paper, we revisit a collection of 29 state-of-the-art PU learning bioinformatic applications to address various biological questions. Various important aspects are extensively discussed, including PU learning methodology, biological application, classifier design and evaluation strategy. We also comment on the existing issues of PU learning and offer our perspectives for the future development of PU learning applications. We anticipate that our work serves as an instrumental guideline for a better understanding of the PU learning framework in bioinformatics and further developing next-generation PU learning frameworks for critical biological applications.
dc.description.statementofresponsibility: Fuyi Li, Shuangyu Dong, André Leier, Meiya Han, Xudong Guo, Jing Xu, Xiaoyu Wang, Shirui Pan, Cangzhi Jia, Yang Zhang, Geoffrey I. Webb, Lachlan J.M. Coin, Chen Li and Jiangning Song
dc.identifier.citation: Briefings in Bioinformatics, 2022; 23(1):bbab461-1-bbab461-13
dc.identifier.doi: 10.1093/bib/bbab461
dc.identifier.issn: 1467-5463
dc.identifier.issn: 1477-4054
dc.identifier.orcid: Li, F. [0000-0001-5216-3213]
dc.identifier.uri: https://hdl.handle.net/2440/139737
dc.language.iso: en
dc.publisher: Oxford University Press (OUP)
dc.relation.grant: http://purl.org/au-research/grants/nhmrc/1127948
dc.relation.grant: http://purl.org/au-research/grants/nhmrc/1144652
dc.relation.grant: http://purl.org/au-research/grants/arc/LP110200333
dc.relation.grant: http://purl.org/au-research/grants/arc/DP120104460
dc.relation.grant: http://purl.org/au-research/grants/nhmrc/1143366
dc.relation.grant: http://purl.org/au-research/grants/nhmrc/1103384
dc.relation.grant: http://purl.org/au-research/grants/nhmrc/GNT1195743
dc.rights: © The Author(s) 2021. Published by Oxford University Press. All rights reserved.
dc.source.uri: https://doi.org/10.1093/bib/bbab461
dc.subject: positive unlabeled learning; semi-supervised learning; machine learning; bioinformatics; pattern recognition
dc.subject.mesh: Computational Biology
dc.subject.mesh: Algorithms
dc.subject.mesh: Supervised Machine Learning
dc.title: Positive-unlabeled learning in bioinformatics and computational biology: A brief review
dc.type: Journal article
pubs.publication-status: Published
