Prediction of Multiple Types of RNA Modifications via Biological Language Model

dc.contributor.author: Zhang, Y.
dc.contributor.author: Ge, F.
dc.contributor.author: Li, F.
dc.contributor.author: Yang, X.
dc.contributor.author: Song, J.
dc.contributor.author: Yu, D.-J.
dc.date.issued: 2023
dc.description.abstract: It has been demonstrated that RNA modifications play essential roles in multiple biological processes. Accurate identification of RNA modifications in the transcriptome is critical for providing insights into biological functions and mechanisms. Many tools have been developed for predicting RNA modifications at single-base resolution; these rely on conventional feature engineering, in which feature design and feature selection require extensive biological expertise and may introduce redundant information. With the rapid development of artificial intelligence technologies, end-to-end methods are increasingly favoured by researchers. Nevertheless, for nearly all of these approaches, each trained model is suitable for only a single type of RNA methylation modification. In this study, we present MRM-BERT, which feeds task-specific sequences into the powerful BERT (Bidirectional Encoder Representations from Transformers) model and applies fine-tuning, achieving performance competitive with state-of-the-art methods. MRM-BERT avoids repeated de novo training of the model and can predict multiple RNA modifications, such as pseudouridine, m6A, m5C, and m1A, in Mus musculus, Arabidopsis thaliana, and Saccharomyces cerevisiae. In addition, we analyse the attention heads to identify regions of high attention underlying the predictions, and conduct saturated in silico mutagenesis of the input sequences to discover potential changes in RNA modifications, which can better assist researchers in their follow-up research.
dc.description.statementofresponsibility: Ying Zhang, Fang Ge, Fuyi Li, Xibei Yang, Jiangning Song, and Dong-Jun Yu
dc.identifier.citation: IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2023; 20(5):3205-3214
dc.identifier.doi: 10.1109/tcbb.2023.3283985
dc.identifier.issn: 1545-5963
dc.identifier.issn: 1557-9964
dc.identifier.orcid: Li, F. [0000-0001-5216-3213]
dc.identifier.uri: https://hdl.handle.net/2440/139906
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.grant: http://purl.org/au-research/grants/nhmrc/1127948
dc.relation.grant: http://purl.org/au-research/grants/nhmrc/1144652
dc.relation.grant: http://purl.org/au-research/grants/arc/LP110200333
dc.relation.grant: http://purl.org/au-research/grants/arc/DP120104460
dc.rights: © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
dc.source.uri: https://doi.org/10.1109/tcbb.2023.3283985
dc.subject: RNA modification; deep learning; self-attention mechanism; BERT; biological language model
dc.subject.mesh: Animals
dc.subject.mesh: Mice
dc.subject.mesh: Saccharomyces cerevisiae
dc.subject.mesh: Arabidopsis
dc.subject.mesh: RNA
dc.subject.mesh: Pseudouridine
dc.subject.mesh: Artificial Intelligence
dc.subject.mesh: Transcriptome
dc.title: Prediction of Multiple Types of RNA Modifications via Biological Language Model
dc.type: Journal article
pubs.publication-status: Published
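
The abstract above describes a workflow of fine-tuning a pretrained BERT model on task-specific RNA sequences and then running saturated in silico mutagenesis over input sequences to gauge how substitutions change the predicted modification status. Below is a minimal sketch of that workflow using the Hugging Face transformers API; it is not the authors' released code. The checkpoint name, nucleotide tokenisation scheme, window length, and two-class output head are illustrative assumptions, and the fine-tuning step itself (standard cross-entropy training on labelled sites) is omitted.

```python
# Minimal sketch (not the authors' released MRM-BERT code): score candidate
# modification sites with a BERT-style classifier and run saturated in silico
# mutagenesis. Checkpoint, tokenisation, and sequence length are assumptions.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # placeholder for a fine-tuned MRM-BERT-style checkpoint

tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def encode(seq: str):
    # Tokenise an RNA sequence as space-separated nucleotides (assumed scheme).
    return tokenizer(" ".join(seq), return_tensors="pt", truncation=True, max_length=128)

@torch.no_grad()
def site_probability(seq: str) -> float:
    # Probability that the centre position of `seq` carries the modification.
    logits = model(**encode(seq)).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

def saturated_mutagenesis(seq: str) -> dict:
    # Score every single-nucleotide substitution against the wild-type prediction.
    wild_type = site_probability(seq)
    effects = {}
    for i, ref in enumerate(seq):
        for alt in "ACGU":
            if alt == ref:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            effects[(i, ref, alt)] = site_probability(mutant) - wild_type
    return effects

if __name__ == "__main__":
    window = "GGACUAGCAUAGGACUUGCAACUGGACUAGCAUAGGACUUG"  # toy window around a candidate site
    print(site_probability(window))
    # Report the five substitutions that most strongly reduce the predicted probability.
    for (pos, ref, alt), delta in sorted(saturated_mutagenesis(window).items(),
                                         key=lambda kv: kv[1])[:5]:
        print(f"{ref}{pos + 1}{alt}: delta_p = {delta:+.3f}")
```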
