iLearn: An integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data

Chen, Z.; Zhao, P.; Li, F.; Marquez-Lago, T.T.; Leier, A.; Revote, J.; Zhu, Y.; Powell, D.R.; Akutsu, T.; Webb, G.I.; Chou, K.C.; Smith, A.I.; Daly, R.J.; Li, J.; Song, J.

doi:10.1093/bib/bbz041

dc.contributor.author	Chen, Z.
dc.contributor.author	Zhao, P.
dc.contributor.author	Li, F.
dc.contributor.author	Marquez-Lago, T.T.
dc.contributor.author	Leier, A.
dc.contributor.author	Revote, J.
dc.contributor.author	Zhu, Y.
dc.contributor.author	Powell, D.R.
dc.contributor.author	Akutsu, T.
dc.contributor.author	Webb, G.I.
dc.contributor.author	Chou, K.C.
dc.contributor.author	Smith, A.I.
dc.contributor.author	Daly, R.J.
dc.contributor.author	Li, J.
dc.contributor.author	Song, J.
dc.date.issued	2020
dc.description.abstract	With the explosive growth of biological sequences generated in the post-genomic era, one of the most challenging problems in bioinformatics and computational biology is to computationally characterize sequences, structures and functions in an efficient, accurate and high-throughput manner. To date, a number of online web servers and stand-alone tools have been developed to address this problem; however, all of these tools have limitations and drawbacks in terms of their effectiveness, user-friendliness and capacity. Here, we present iLearn, a comprehensive and versatile Python-based toolkit that integrates feature extraction, clustering, normalization, selection, dimensionality reduction, predictor construction, best descriptor/model selection, ensemble learning and results visualization for DNA, RNA and protein sequences. iLearn was designed for users who only want to upload their data set and select the functions they need, while all necessary procedures and optimal settings are completed automatically by the software. iLearn includes a variety of descriptors for DNA, RNA and proteins, and supports four feature output formats to facilitate direct use of the output or communication with other computational tools. In total, iLearn encompasses 16 different types of feature clustering, selection, normalization and dimensionality reduction algorithms, and five commonly used machine-learning algorithms, thereby greatly facilitating feature analysis and predictor construction. iLearn is freely available via an online web server and a stand-alone toolkit.
dc.description.statementofresponsibility	Zhen Chen, Pei Zhao, Fuyi Li, Tatiana T. Marquez-Lago, André Leier, Jerico Revote, Yan Zhu, David R. Powell, Tatsuya Akutsu, Geoffrey I. Webb, Kuo-Chen Chou, A. Ian Smith, Roger J. Daly, Jian Li and Jiangning Song
dc.identifier.citation	Briefings in Bioinformatics, 2020; 21(3):1047-1057
dc.identifier.doi	10.1093/bib/bbz041
dc.identifier.issn	1467-5463
dc.identifier.issn	1477-4054
dc.identifier.orcid	Li, F. [0000-0001-5216-3213]
dc.identifier.uri	https://hdl.handle.net/2440/139576
dc.language.iso	en
dc.publisher	Oxford University Press (OUP)
dc.relation.grant	http://purl.org/au-research/grants/nhmrc/1127948
dc.relation.grant	http://purl.org/au-research/grants/nhmrc/1144652
dc.relation.grant	http://purl.org/au-research/grants/nhmrc/490989
dc.relation.grant	http://purl.org/au-research/grants/nhmrc/LP110200333
dc.relation.grant	http://purl.org/au-research/grants/nhmrc/DP120104460
dc.rights	© The Author(s) 2019. Published by Oxford University Press. All rights reserved.
dc.source.uri	https://doi.org/10.1093/bib/bbz041
dc.subject	bioinformatics; integrated platform; sequence analysis; machine learning; automated modeling; data clustering; feature selection; biomedical data mining
dc.subject.mesh	Proteins
dc.subject.mesh	DNA
dc.subject.mesh	RNA
dc.subject.mesh	Sequence Analysis
dc.subject.mesh	Algorithms
dc.subject.mesh	Internet
dc.subject.mesh	Machine Learning
dc.title	iLearn: An integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data
dc.type	Journal article
pubs.publication-status	Published
