VPatho: a deep learning-based two-stage approach for accurate prediction of gain-of-function and loss-of-function variants

Date

2023

Authors

Ge, F.
Li, C.
Iqbal, S.
Muhammad, A.
Li, F.
Thafar, M.A.
Yan, Z.
Worachartcheewan, A.
Xu, X.
Song, J.

Editors

Advisors

Journal Title

Journal ISSN

Volume Title

Type:

Journal article

Citation

Briefings in Bioinformatics, 2023; 24(1):1-16

Statement of Responsibility

Fang Ge, Chen Li, Shahid Iqbal, Arif Muhammad, Fuyi Li, Maha A. Thafar, Zihao Yan, Apilak Worachartcheewan, Xiaofeng Xu, Jiangning Song and Dong-Jun Yu

Conference Name

Abstract

Determining the pathogenicity and functional impact (i.e. gain-of-function; GOF or loss-of-function; LOF) of a variant is vital for unraveling the genetic level mechanisms of human diseases. To provide a 'one-stop' framework for the accurate identification of pathogenicity and functional impact of variants, we developed a two-stage deep-learning-based computational solution, termed VPatho, which was trained using a total of 9619 pathogenic GOF/LOF and 138 026 neutral variants curated from various databases. A total number of 138 variant-level, 262 protein-level and 103 genome-level features were extracted for constructing the models of VPatho. The development of VPatho consists of two stages: (i) a random under-sampling multi-scale residual neural network (ResNet) with a newly defined weighted-loss function (RUS-Wg-MSResNet) was proposed to predict variants' pathogenicity on the gnomAD_NV + GOF/LOF dataset; and (ii) an XGBOD model was constructed to predict the functional impact of the given variants. Benchmarking experiments demonstrated that RUS-Wg-MSResNet achieved the highest prediction performance with the weights calculated based on the ratios of neutral versus pathogenic variants. Independent tests showed that both RUS-Wg-MSResNet and XGBOD achieved outstanding performance. Moreover, assessed using variants from the CAGI6 competition, RUS-Wg-MSResNet achieved superior performance compared to state-of-the-art predictors. The fine-trained XGBOD models were further used to blind test the whole LOF data downloaded from gnomAD and accordingly, we identified 31 nonLOF variants that were previously labeled as LOF/uncertain variants. As an implementation of the developed approach, a webserver of VPatho is made publicly available at http://csbio.njust.edu.cn/bioinf/vpatho/ to facilitate community-wide efforts for profiling and prioritizing the query variants with respect to their pathogenicity and functional impact.

School/Discipline

Dissertation Note

Provenance

Description

Access Status

Rights

© The Author(s) 2022. Published by Oxford University Press. All rights reserved.

License

Call number

Persistent link to this record