Knowledge discovery and sequence-based prediction of pandemic influenza using an integrated classification and association rule mining (CBA) algorithm

Date

2015

Authors

Kargarfard, F.
Sami, A.
Ebrahimie, E.

Editors

Advisors

Journal Title

Journal ISSN

Volume Title

Type:

Journal article

Citation

Journal of Biomedical Informatics, 2015; 57:181-188

Statement of Responsibility

Conference Name

Abstract

Pandemic influenza is a major concern worldwide. Availability of advanced technologies and the nucleotide sequences of a large number of pandemic and non-pandemic influenza viruses in 2009 provide a great opportunity to investigate the underlying rules of pandemic induction through data mining tools. Here, for the first time, an integrated classification and association rule mining algorithm (CBA) was used to discover the rules underpinning alteration of non-pandemic sequences to pandemic ones. We hypothesized that the extracted rules can lead to the development of an efficient expert system for prediction of influenza pandemics. To this end, we used a large dataset containing 5373 HA (hemagglutinin) segments of the 2009 H1N1 pandemic and non-pandemic influenza sequences. The analysis was carried out for both nucleotide and protein sequences. We found a number of new rules which potentially present the undiscovered antigenic sites at influenza structure. At the nucleotide level, alteration of thymine (T) at position 260 was the key discriminating feature in distinguishing non-pandemic from pandemic sequences. At the protein level, rules including I233K, M334L were the differentiating features. CBA efficiently classifies pandemic and non-pandemic sequences with high accuracy at both the nucleotide and protein level. Finding hotspots in influenza sequences is a significant finding as they represent the regions with low antibody reactivity. We argue that the virus breaks host immunity response by mutation at these spots. Based on the discovered rules, we developed the software, "Prediction of Pandemic Influenza" for discrimination of pandemic from non-pandemic sequences. This study opens a new vista in discovery of association rules between mutation points during evolution of pandemic influenza.

School/Discipline

Dissertation Note

Provenance

Description

Data source: ‘‘Prediction of Pandemic Influenza’’ software, http://www.predictionofpandemicinfluenza.ir

Access Status

Rights

Copyright 2015 Elsevier Inc. Under an Elsevier user license (http://www.elsevier.com/open-access/userlicense/1.0/)

License

Grant ID

Call number

Persistent link to this record