Please use this identifier to cite or link to this item: http://hdl.handle.net/2440/60532
Type: Thesis
Title: Bayesian networks for high-dimensional data with complex mean structure.
Author: Kasza, Jessica Eleonore
Issue Date: 2010
School/Discipline: School of Mathematical Sciences : Statistics
Abstract: In a microarray experiment, it is expected that there will be correlations between the expression levels of different genes under study. These correlation structures are of great interest from both biological and statistical points of view. From a biological perspective, the identification of correlation structures can lead to an understanding of genetic pathways involving several genes, while the statistical interest, and the emphasis of this thesis, lies in the development of statistical methods to identify such structures. However, the data arising from microarray studies is typically very high-dimensional, with an order of magnitude more genes being considered than there are samples of each gene. This leads to difficulties in the estimation of the dependence structure of all genes under study. Graphical models and Bayesian networks are often used in these situations, providing flexible frameworks in which dependence structures for high-dimensional data sets can be considered. The current methods for the estimation of dependence structures for high-dimensional data sets typically assume the presence of independent and identically distributed samples of gene expression values. However, often the data available will have a complex mean structure and additional components of variance. Given such data, the application of methods that assume independent and identically distributed samples may result in incorrect biological conclusions being drawn. In this thesis, methods for the estimation of Bayesian networks for gene expression data sets that contain additional complexities are developed and implemented. The focus is on the development of score metrics that take account of these complexities for use in conjunction with score-based methods for the estimation of Bayesian networks, in particular the High-dimensional Bayesian Covariance Selection algorithm. 
The necessary theory relating to Gaussian graphical models and Bayesian networks is reviewed, as are the methods currently available for the estimation of dependence structures for high-dimensional data sets consisting of independent and identically distributed samples. Score metrics for the estimation of Bayesian networks when data sets are not independent and identically distributed are then developed and explored, and the utility and necessity of these metrics are demonstrated. Finally, the developed metrics are applied to a data set consisting of samples of grape genes taken from several different vineyards.
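The abstract warns that treating samples with a complex mean structure as independent and identically distributed can lead to incorrect conclusions. The toy sketch below shows one way this can happen. It is not the thesis's score metric: it assumes two genes whose expression is independent within each group (hypothetically, vineyards) but whose group means shift together. Pooling all samples produces a strong spurious correlation, while subtracting each group's mean (a simple fixed-effects adjustment chosen here for illustration) removes it. The function name `pooled_and_within_corr` and the simulated data are hypothetical.

```python
import numpy as np


def pooled_and_within_corr(x, y, groups):
    """Return (pooled correlation, correlation after removing group means).

    Hypothetical helper for illustration only: the within-group value is the
    Pearson correlation of the group-centred data.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)

    # Naive correlation that treats every sample as i.i.d.
    pooled = np.corrcoef(x, y)[0, 1]

    # Remove each group's mean (a simple fixed-effects adjustment).
    xr, yr = x.copy(), y.copy()
    for g in np.unique(groups):
        mask = groups == g
        xr[mask] -= x[mask].mean()
        yr[mask] -= y[mask].mean()
    within = np.corrcoef(xr, yr)[0, 1]
    return pooled, within


rng = np.random.default_rng(0)
n_groups, n_per = 5, 20
groups = np.repeat(np.arange(n_groups), n_per)

# Hypothetical group-level means shared by both genes (e.g. a vineyard effect).
group_mean = np.repeat(np.linspace(-10.0, 10.0, n_groups), n_per)

# Within each group, the two genes vary independently.
gene_a = group_mean + rng.normal(0.0, 1.0, n_groups * n_per)
gene_b = group_mean + rng.normal(0.0, 1.0, n_groups * n_per)

pooled, within = pooled_and_within_corr(gene_a, gene_b, groups)
print(f"pooled r = {pooled:.2f}, within-group r = {within:.2f}")
```

The pooled correlation comes out close to 1 only because the group means move together. A dependence-structure method that assumes i.i.d. samples would report an edge between the two genes that the within-group data do not support.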
Advisors: Glonek, Garique Francis Vladimir; Solomon, Patricia Joy
Dissertation Note: Thesis (Ph.D.) -- University of Adelaide, School of Mathematical Sciences, 2010
Keywords: Bayesian networks; genetic regulatory networks; complex mean structure; high dimensional data
Appears in Collections: Research Theses

Files in This Item:
File | Size | Format
01front.pdf | 109.8 kB | Adobe PDF
02whole.pdf | 1.42 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.