Computational identification of eukaryotic promoters based on cascaded deep capsule neural networks

dc.contributor.author: Zhu, Y.
dc.contributor.author: Li, F.
dc.contributor.author: Xiang, D.
dc.contributor.author: Akutsu, T.
dc.contributor.author: Song, J.
dc.contributor.author: Jia, C.
dc.date.issued: 2021
dc.description.abstract: A promoter is a region of DNA that defines where transcription of a gene by RNA polymerase initiates; it is typically located proximal to the transcription start site (TSS). Correctly identifying the TSS and the core promoter of a gene is essential for understanding how genes are transcriptionally regulated. As a complement to conventional experimental methods, computational techniques delivered through easy-to-use platforms can serve as essential bioinformatics tools for annotating the functions and physiological roles of promoters. In this work, we propose a deep learning-based method termed Depicter (Deep learning for predicting promoter) for identifying three specific types of promoters: promoter sequences with the TATA-box (TATA model), promoter sequences without the TATA-box (non-TATA model), and indistinguishable promoters (TATA and non-TATA model). Depicter is built on an up-to-date, species-specific dataset that includes Homo sapiens, Mus musculus, Drosophila melanogaster and Arabidopsis thaliana promoters. A convolutional neural network coupled with capsule layers is proposed to train and optimize the Depicter prediction model. Extensive benchmarking and independent tests demonstrate that Depicter achieves improved predictive performance compared with several state-of-the-art methods. The Depicter webserver is freely accessible at https://depicter.erc.monash.edu/.
dc.description.statementofresponsibility: Yan Zhu, Fuyi Li, Dongxu Xiang, Tatsuya Akutsu, Jiangning Song and Cangzhi Jia
dc.identifier.citation: Briefings in Bioinformatics, 2021; 22(4):1-11
dc.identifier.doi: 10.1093/bib/bbaa299
dc.identifier.issn: 1467-5463
dc.identifier.issn: 1477-4054
dc.identifier.orcid: Li, F. [0000-0001-5216-3213]
dc.identifier.uri: https://hdl.handle.net/2440/139566
dc.language.iso: en
dc.publisher: Oxford University Press (OUP)
dc.rights: © The Author(s) 2020. Published by Oxford University Press. All rights reserved.
dc.source.uri: https://doi.org/10.1093/bib/bbaa299
dc.subject: eukaryotic promoters; bioinformatics; sequence analysis; machine learning; deep learning
dc.subject.mesh: Animals
dc.subject.mesh: Humans
dc.subject.mesh: Mice
dc.subject.mesh: Drosophila melanogaster
dc.subject.mesh: Arabidopsis
dc.subject.mesh: Sequence Analysis, DNA
dc.subject.mesh: Computational Biology
dc.subject.mesh: Transcription, Genetic
dc.subject.mesh: Software
dc.subject.mesh: Databases, Nucleic Acid
dc.subject.mesh: Promoter Regions, Genetic
dc.subject.mesh: Neural Networks, Computer
dc.title: Computational identification of eukaryotic promoters based on cascaded deep capsule neural networks
dc.type: Journal article
pubs.publication-status: Published
