Type: Journal article
Title: Mergeomics: Multidimensional data integration to identify pathogenic perturbations to biological systems
Author: Shu, L.
Zhao, Y.
Kurt, Z.
Byars, S.
Tukiainen, T.
Kettunen, J.
Orozco, L.
Pellegrini, M.
Lusis, A.
Ripatti, S.
Zhang, B.
Inouye, M.
Mäkinen, V.
Yang, X.
Citation: BMC Genomics, 2016; 17(1):874-1-874-16
Publisher: BioMed Central
Issue Date: 2016
ISSN: 1471-2164
Statement of Responsibility: Le Shu, Yuqi Zhao, Zeyneb Kurt, Sean Geoffrey Byars, Taru Tukiainen, Johannes Kettunen, Luz D. Orozco, Matteo Pellegrini, Aldons J. Lusis, Samuli Ripatti, Bin Zhang, Michael Inouye, Ville-Petteri Mäkinen, and Xia Yang
Abstract: Background: Complex diseases are characterized by multiple subtle perturbations to biological processes. New omics platforms can detect these perturbations, but translating the diverse molecular and statistical information into testable mechanistic hypotheses is challenging. Therefore, we set out to create a public tool that integrates these data across multiple datasets, platforms, study designs and species in order to detect the most promising targets for further mechanistic studies. Results: We developed Mergeomics, a computational pipeline consisting of independent modules that 1) leverage multi-omics association data to identify biological processes that are perturbed in disease, and 2) overlay the disease-associated processes onto molecular interaction networks to pinpoint hubs as potential key regulators. Unlike existing tools that are mostly dedicated to specific data types or settings, the Mergeomics pipeline accepts and integrates datasets across platforms, data types and species. We optimized and evaluated the performance of Mergeomics using simulation and multiple independent datasets, and benchmarked the results against alternative methods. We also demonstrate the versatility of Mergeomics in two case studies that include genome-wide, epigenome-wide and transcriptome-wide datasets from human and mouse studies of total cholesterol and fasting glucose. In both cases, the Mergeomics pipeline provided statistical and contextual evidence to prioritize further investigations in the wet lab. The software implementation of Mergeomics is freely available as a Bioconductor R package. Conclusion: Mergeomics is a flexible and robust computational pipeline for multidimensional data integration. It outperforms existing tools, and is easily applicable to datasets from different studies, species and omics data types for the study of complex traits.
Keywords: Mergeomics; integrative genomics; multidimensional data integration; functional genomics; gene networks; key drivers; cholesterol; blood glucose
Rights: © The Author(s). 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.
RMID: 0030060096
DOI: 10.1186/s12864-016-3198-9
Appears in Collections: Molecular and Biomedical Science publications

Files in This Item:
File: hdl_104452.pdf
Description: Published version
Size: 1.87 MB
Format: Adobe PDF