Type: Journal article
Title: Combining transcriptional datasets using the generalized singular value decomposition
Author: Schreiber, A.
Shirley, N.
Burton, R.
Fincher, G.
Citation: BMC Bioinformatics, 2008; 2008(1):1-15
Publisher: BioMed Central Ltd.
Issue Date: 2008
ISSN: 1471-2105
Statement of Responsibility: Andreas W Schreiber, Neil J Shirley, Rachel A Burton and Geoffrey B Fincher
Abstract:
Background: Both microarrays and quantitative real-time PCR are convenient tools for studying the transcriptional levels of genes. The former is preferable for large-scale studies, while the latter is a more targeted technique. Because of platform-dependent systematic effects, simple comparisons or merging of datasets obtained by these technologies are difficult, even though they may often be desirable. These difficulties are exacerbated if there is only partial overlap between the experimental conditions and genes probed in the two datasets.
Results: We show here that the generalized singular value decomposition provides a practical tool for merging a small, targeted dataset obtained by quantitative real-time PCR of specific genes with a much larger microarray dataset. The technique permits, for the first time, the identification of genes present in only one dataset that are co-expressed with a target gene present exclusively in the other dataset, even when experimental conditions for the two datasets are not identical. With the rapidly increasing number of publicly available large-scale microarray datasets, the latter is frequently the case. The method enables us to discover putative candidate genes involved in the biosynthesis of the (1,3;1,4)-β-D-glucan polysaccharide found in plant cell walls.
Conclusion: We show that the generalized singular value decomposition provides a viable tool for a combined analysis of two gene expression datasets with only partial overlap of both gene sets and experimental conditions. We illustrate how the decomposition can be optimized self-consistently by using a judicious choice of genes to define it. The ability of the technique to seamlessly define a concept of "co-expression" across both datasets provides an avenue for meaningful data integration. We believe that it will prove to be particularly useful for exploiting large, publicly available microarray datasets for species with unsequenced genomes by complementing them with more limited in-house expression measurements.
Keywords: Proteome; Transcription Factors; Oligonucleotide Array Sequence Analysis; Gene Expression Profiling; Reverse Transcriptase Polymerase Chain Reaction; Algorithms; Database Management Systems; Information Storage and Retrieval; Databases, Protein
Rights: © 2008 Schreiber et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
RMID: 0020084741
DOI: 10.1186/1471-2105-9-335
Appears in Collections: Agriculture, Food and Wine publications

Files in This Item:
File: hdl_51654.pdf
Description: Published version
Size: 431.79 kB
Format: Adobe PDF
