Protein sequence comparison based on K-string dictionary
| dc.contributor.author | Yu, C. | |
| dc.contributor.author | He, R. | |
| dc.contributor.author | Yau, S. | |
| dc.date.issued | 2013 | |
| dc.description | Data source: Supplementary data, https://doi.org/10.1016/j.gene.2013.07.092 | |
| dc.description.abstract | Current K-string-based protein sequence comparisons require large amounts of computer memory because the dimension of the protein vector representation grows exponentially with K (there are 20^K possible K-strings over the 20 amino acids). In this paper, we propose a novel concept, the "K-string dictionary", to solve this high-dimensionality problem. It allows us to represent a protein with a much lower-dimensional K-string-based frequency or probability vector, and thus significantly reduces the memory required to implement such comparisons. Furthermore, based on this new concept, we use Singular Value Decomposition to analyze real protein datasets, and the improved protein vector representation allows us to obtain accurate gene trees. | |
| dc.identifier.citation | Gene, 2013; 529(2):250-256 | |
| dc.identifier.doi | 10.1016/j.gene.2013.07.092 | |
| dc.identifier.issn | 0378-1119 | |
| dc.identifier.issn | 1879-0038 | |
| dc.identifier.orcid | Yu, C. [0000-0002-3248-8421] | |
| dc.identifier.uri | https://hdl.handle.net/11541.2/131862 | |
| dc.language.iso | en | |
| dc.publisher | Elsevier | |
| dc.relation.funding | US NSF DMS-1120824 | |
| dc.relation.funding | China NSF 31271408 | |
| dc.relation.funding | Tsinghua University | |
| dc.rights | Copyright 2013 Elsevier | |
| dc.source.uri | https://doi.org/10.1016/j.gene.2013.07.092 | |
| dc.subject | Proteins | |
| dc.subject | Data Interpretation, Statistical | |
| dc.subject | Sequence Alignment | |
| dc.subject | Sequence Analysis, Protein | |
| dc.subject | Phylogeny | |
| dc.subject | Databases, Protein | |
| dc.title | Protein sequence comparison based on K-string dictionary | |
| dc.type | Journal article | |
| pubs.publication-status | Published | |
| ror.mmsid | 9916188090201831 | |
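The abstract's core idea is that a protein vector need only have entries for K-strings that actually occur, rather than for all 20^K possible K-strings. The Python sketch below illustrates that dimension reduction under a stated assumption: the "dictionary" is taken to be the sorted set of K-strings occurring in at least one sequence of the dataset. The helper names (`kstrings`, `build_dictionary`, `frequency_vector`) and the toy sequences are hypothetical and are not taken from the paper. The SVD step is not shown; the resulting vectors could be stacked into a matrix and passed to a routine such as `numpy.linalg.svd`.

```python
from collections import Counter


def kstrings(seq: str, k: int) -> list[str]:
    """Return all overlapping K-strings (substrings of length k) of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_dictionary(seqs: list[str], k: int) -> list[str]:
    """Hypothetical helper (assumption): the sorted set of K-strings that
    occur in at least one sequence of the dataset."""
    return sorted({w for seq in seqs for w in kstrings(seq, k)})


def frequency_vector(seq: str, k: int, dictionary: list[str]) -> list[float]:
    """Relative frequency of each dictionary K-string in seq.

    Entries sum to 1 whenever seq has at least one K-string.
    """
    counts = Counter(kstrings(seq, k))
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in dictionary]


if __name__ == "__main__":
    # Toy sequences for illustration only; not data from the paper.
    proteins = ["MKVLAAGIVGLL", "MKVLSAGIVALL", "MTTQAPGLLAGV"]
    k = 3
    dictionary = build_dictionary(proteins, k)
    # Compare the full K-string space with the dictionary size.
    print(f"all possible {k}-strings: {20 ** k}, dictionary size: {len(dictionary)}")
    for p in proteins:
        vec = frequency_vector(p, k, dictionary)
        print(p, [round(x, 3) for x in vec])
```

Because every vector is indexed by the same dictionary, vectors for different proteins remain directly comparable, while their length is bounded by the number of distinct K-strings present in the data rather than by 20^K.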