VPatho: a deep learning-based two-stage approach for accurate prediction of gain-of-function and loss-of-function variants

dc.contributor.author: Ge, F.
dc.contributor.author: Li, C.
dc.contributor.author: Iqbal, S.
dc.contributor.author: Muhammad, A.
dc.contributor.author: Li, F.
dc.contributor.author: Thafar, M.A.
dc.contributor.author: Yan, Z.
dc.contributor.author: Worachartcheewan, A.
dc.contributor.author: Xu, X.
dc.contributor.author: Song, J.
dc.contributor.author: Yu, D.J.
dc.date.issued: 2023
dc.description.abstract: Determining the pathogenicity and functional impact (i.e. gain-of-function, GOF, or loss-of-function, LOF) of a variant is vital for unraveling the genetic-level mechanisms of human diseases. To provide a 'one-stop' framework for the accurate identification of pathogenicity and functional impact of variants, we developed a two-stage deep-learning-based computational solution, termed VPatho, which was trained using a total of 9619 pathogenic GOF/LOF and 138 026 neutral variants curated from various databases. A total of 138 variant-level, 262 protein-level and 103 genome-level features were extracted for constructing the models of VPatho. The development of VPatho consists of two stages: (i) a random under-sampling multi-scale residual neural network (ResNet) with a newly defined weighted-loss function (RUS-Wg-MSResNet) was proposed to predict variants' pathogenicity on the gnomAD_NV + GOF/LOF dataset; and (ii) an XGBOD model was constructed to predict the functional impact of the given variants. Benchmarking experiments demonstrated that RUS-Wg-MSResNet achieved the highest prediction performance with the weights calculated based on the ratios of neutral versus pathogenic variants. Independent tests showed that both RUS-Wg-MSResNet and XGBOD achieved outstanding performance. Moreover, assessed using variants from the CAGI6 competition, RUS-Wg-MSResNet achieved superior performance compared to state-of-the-art predictors. The fine-trained XGBOD models were further used to blind test the whole LOF data downloaded from gnomAD and accordingly, we identified 31 nonLOF variants that were previously labeled as LOF/uncertain variants. As an implementation of the developed approach, a webserver of VPatho is made publicly available at http://csbio.njust.edu.cn/bioinf/vpatho/ to facilitate community-wide efforts for profiling and prioritizing the query variants with respect to their pathogenicity and functional impact.
dc.description.statementofresponsibility: Fang Ge, Chen Li, Shahid Iqbal, Arif Muhammad, Fuyi Li, Maha A. Thafar, Zihao Yan, Apilak Worachartcheewan, Xiaofeng Xu, Jiangning Song and Dong-Jun Yu
dc.identifier.citation: Briefings in Bioinformatics, 2023; 24(1):1-16
dc.identifier.doi: 10.1093/bib/bbac535
dc.identifier.issn: 1467-5463
dc.identifier.issn: 1477-4054
dc.identifier.orcid: Li, F. [0000-0001-5216-3213]
dc.identifier.uri: https://hdl.handle.net/2440/139687
dc.language.iso: en
dc.publisher: Oxford University Press (OUP)
dc.relation.grant: http://purl.org/au-research/grants/arc/DP120104460
dc.relation.grant: http://purl.org/au-research/grants/arc/LP110200333
dc.rights: © The Author(s) 2022. Published by Oxford University Press. All rights reserved.
dc.source.uri: https://doi.org/10.1093/bib/bbac535
dc.subject: weighted-loss function; random under-sampling; 1D-ResNet; 2D-ResNet; gnomAD variants; pathogenic GOF/LOF
dc.subject.mesh: Humans
dc.subject.mesh: Genome
dc.subject.mesh: Gain of Function Mutation
dc.subject.mesh: Deep Learning
dc.title: VPatho: a deep learning-based two-stage approach for accurate prediction of gain-of-function and loss-of-function variants
dc.type: Journal article
pubs.publication-status: Published
