Type: Journal article
Title: A motif and amino acid bias bioinformatics pipeline to identify hydroxyproline-rich glycoproteins
Author: Johnson, K.
Cassin, A.
Lonsdale, A.
Bacic, A.
Doblin, M.
Schultz, C.
Citation: Plant Physiology, 2017; 174(2):886-903
Publisher: American Society of Plant Biologists
Issue Date: 2017
ISSN: 0032-0889
Statement of Responsibility: Kim L. Johnson, Andrew M. Cassin, Andrew Lonsdale, Antony Bacic, Monika S. Doblin, and Carolyn J. Schultz
Abstract: Intrinsically disordered proteins (IDPs) are functional proteins that lack a well-defined three-dimensional structure. The study of IDPs is a rapidly growing area as the crucial biological functions of more of these proteins are uncovered. In plants, IDPs are implicated in plant stress responses, signaling, and regulatory processes. A superfamily of cell wall proteins, the hydroxyproline-rich glycoproteins (HRGPs), have characteristic features of IDPs. Their protein backbones are rich in the disordering amino acid proline, they contain repeated sequence motifs and extensive posttranslational modifications (glycosylation), and they have been implicated in many biological functions. HRGPs are evolutionarily ancient, having been isolated from the protein-rich walls of chlorophyte algae to the cellulose-rich walls of embryophytes. Examination of HRGPs in a range of plant species should provide valuable insights into how they have evolved. Commonly divided into the arabinogalactan proteins, extensins, and proline-rich proteins, in reality, a continuum of structures exists within this diverse and heterogenous superfamily. An inability to accurately classify HRGPs leads to inconsistent gene ontologies limiting the identification of HRGP classes in existing and emerging omics data sets. We present a novel and robust motif and amino acid bias (MAAB) bioinformatics pipeline to classify HRGPs into 23 descriptive subclasses. Validation of MAAB was achieved using available genomic resources and then applied to the 1000 Plants transcriptome project data set. Significant improvement in the detection of HRGPs using multiple-k-mer transcriptome assembly methodology was observed. The MAAB pipeline is readily adaptable and can be modified to optimize the recovery of IDPs from other organisms.
Keywords: Glycoproteins; Hydroxyproline; Plant Proteins; Arabidopsis Proteins; Proteome; Reproducibility of Results; Computational Biology; Amino Acid Motifs; Transcriptome; Intrinsically Disordered Proteins
Description: Published April 26, 2017
Rights: © 2017 American Society of Plant Biologists. All Rights Reserved.
RMID: 0030069176
DOI: 10.1104/pp.17.00294
Appears in Collections: Agriculture, Food and Wine publications
