Protein sequence comparison based on K-string dictionary

Yu, C.; He, R.; Yau, S.

doi:10.1016/j.gene.2013.07.092

Date

2013

Authors

Yu, C.

He, R.

Yau, S.

Type

Journal article

Citation

Gene, 2013; 529(2):250-256

DOI

10.1016/j.gene.2013.07.092

Abstract

The current K-string-based protein sequence comparisons require large amounts of computer memory because the dimension of the protein vector representation grows exponentially with K. In this paper, we propose a novel concept, the "K-string dictionary", to solve this high-dimensional problem. It allows us to use a much lower dimensional K-string-based frequency or probability vector to represent a protein, and thus significantly reduce the computer memory requirements for their implementation. Furthermore, based on this new concept, we use Singular Value Decomposition to analyze real protein datasets, and the improved protein vector representation allows us to obtain accurate gene trees.
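The idea in the abstract can be illustrated with a minimal sketch. It assumes that the "K-string dictionary" is simply the set of K-strings that actually occur in the dataset, so each protein vector has one entry per observed K-string instead of one per possible K-string (20^K for the 20 standard amino acids). That reading, the function names, and the toy sequences below are hypothetical and are not taken from the paper; the SVD step uses NumPy's `numpy.linalg.svd`.

```python
from collections import Counter

import numpy as np


def k_strings(seq, k):
    """Return every overlapping substring of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_dictionary(seqs, k):
    """Collect only the K-strings observed in at least one sequence (assumed definition)."""
    observed = set()
    for seq in seqs:
        observed.update(k_strings(seq, k))
    return sorted(observed)


def frequency_matrix(seqs, k):
    """Return (dictionary, matrix); each row holds one sequence's K-string frequencies."""
    dictionary = build_dictionary(seqs, k)
    index = {word: j for j, word in enumerate(dictionary)}
    matrix = np.zeros((len(seqs), len(dictionary)))
    for i, seq in enumerate(seqs):
        counts = Counter(k_strings(seq, k))
        total = sum(counts.values())
        if total == 0:  # sequence shorter than k: leave the row as zeros
            continue
        for word, count in counts.items():
            matrix[i, index[word]] = count / total
    return dictionary, matrix


# Toy sequences, invented for illustration only.
seqs = ["MKVLAAGIV", "MKVLSAGIV", "GGHHWWPPC"]
k = 3

dictionary, matrix = frequency_matrix(seqs, k)

# Thin SVD of the frequency matrix; rows of u * s are the sequences
# expressed in the coordinates given by the right singular vectors.
u, s, vt = np.linalg.svd(matrix, full_matrices=False)
reduced = u * s

# Pairwise Euclidean distances between sequences, usable as input to a tree builder.
dist = np.linalg.norm(reduced[:, None, :] - reduced[None, :, :], axis=2)

print(len(dictionary), "dictionary entries versus", 20 ** k, "possible K-strings")
```

In this toy case the two similar sequences end up closer to each other than either is to the third. Keeping only the leading singular values would reduce the dimension further; whether and how the paper truncates is not stated in the abstract.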

Description

Data source: Supplementary data, https://doi.org/10.1016/j.gene.2013.07.092

Published Version

https://doi.org/10.1016/j.gene.2013.07.092

Persistent link to this record

https://hdl.handle.net/11541.2/131862
