Please use this identifier to cite or link to this item: http://hdl.handle.net/2440/82686
Type: Journal article
Title: Science in the cloud: Allocation and execution of data-intensive scientific workflows
Author: Szabo, C.
Sheng, Q.
Kroeger, T.
Zhang, Y.
Yu, J.
Citation: Journal of Grid Computing, 2014; 12(2):245-264
Publisher: Springer
Issue Date: 2014
ISSN: 1570-7873
1572-9184
Statement of Responsibility: Claudia Szabo, Quan Z. Sheng, Trent Kroeger, Yihong Zhang, Jian Yu
Abstract: An important challenge for the adoption of cloud computing in the scientific community remains the efficient allocation and execution of data-intensive scientific workflows to reduce execution time and the size of transferred data. The transferred data overhead is becoming significant with emerging scientific workflows that have input/output files and intermediate data products ranging in the hundreds of gigabytes. The allocation of scientific workflows on public clouds can be described through a variety of perspectives and parameters, and has been proved to be NP-complete. This paper proposes an evolutionary approach for task allocation on public clouds considering data transfer and execution time. In our framework, a solution is represented using an allocation chromosome that encodes the allocation of tasks to nodes, and an ordering chromosome that defines the execution order according to the scientific workflow representation. We propose a multi-objective optimization that relies on a cloud cost model and employs tailored evolution operators. Starting from a population of possible solutions, we employ crossover and mutation operators on both chromosomes, aiming to optimize the data transferred between nodes as well as the total workflow runtime. The crossover operators combine parts of solutions to reduce data overhead, whereas the mutation operators swap parts of the same chromosome according to pre-defined rules. Our experimental study compares the proposed approach with current state-of-the-art approaches using synthetic and real-life workflows. Our algorithm performs similarly to existing heuristics for small workflows and shows up to 80 % improvement for larger synthetic workflows. To further validate our approach, we compare the allocation and scheduling obtained by our approach with those obtained by popular scientific workflow managers when real workflows with hundreds of tasks are executed on a public cloud. The results show a 10 % improvement in runtime over existing schedulers, caused by an 80 % reduction in transferred data and optimized allocation and ordering of tasks. This improved data locality has a broader impact, as it can be employed to improve and study data provenance and to facilitate data persistence for scientific workflows.
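The two-chromosome representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy workflow, the function names, the single-point crossover, and the adjacent-swap mutation are all assumptions chosen for illustration, and the runtime estimate ignores transfer time and the cloud cost model entirely.

```python
import random

# Hypothetical toy workflow (not from the paper):
# task -> runtime in seconds, and (producer, consumer) -> data size in GB.
RUNTIME = {"a": 4, "b": 3, "c": 2, "d": 5}
EDGES = {("a", "b"): 10, ("a", "c"): 20, ("b", "d"): 5, ("c", "d"): 15}
TASKS = list(RUNTIME)


def transferred_data(allocation):
    """Objective 1: data moved between nodes (edges whose endpoints differ in node)."""
    return sum(size for (u, v), size in EDGES.items() if allocation[u] != allocation[v])


def is_valid_order(order):
    """An ordering chromosome is valid if every producer precedes its consumers."""
    pos = {t: i for i, t in enumerate(order)}
    return all(pos[u] < pos[v] for u, v in EDGES)


def makespan(allocation, order):
    """Objective 2: finish time when each node runs its tasks in chromosome order.

    Assumes `order` is valid; transfer time is ignored to keep the sketch short.
    """
    node_free, finish = {}, {}
    for t in order:
        ready = max((finish[u] for (u, v) in EDGES if v == t), default=0)
        start = max(ready, node_free.get(allocation[t], 0))
        finish[t] = start + RUNTIME[t]
        node_free[allocation[t]] = finish[t]
    return max(finish.values())


def crossover_allocation(p1, p2, cut):
    """Assumed single-point crossover: tasks before `cut` from p1, the rest from p2."""
    return {t: (p1 if i < cut else p2)[t] for i, t in enumerate(TASKS)}


def mutate_order(order, rng):
    """Assumed mutation: swap two adjacent tasks, kept only if dependencies still hold."""
    i = rng.randrange(len(order) - 1)
    candidate = order[:i] + [order[i + 1], order[i]] + order[i + 2:]
    return candidate if is_valid_order(candidate) else order


# Example: tasks a, b on node 0 and c, d on node 1.
alloc = {"a": 0, "b": 0, "c": 1, "d": 1}
order = ["a", "b", "c", "d"]
print(transferred_data(alloc), makespan(alloc, order))  # prints: 25 12
```

A full evolutionary loop would evaluate both objectives for every individual in a population and select among them; the sketch only shows how the allocation and ordering chromosomes can be encoded, varied, and scored.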
Keywords: Data-intensive workflows; cloud computing; scheduling; allocation; evolutionary computation
Rights: © Springer Science+Business Media Dordrecht 2013
RMID: 0020137027
DOI: 10.1007/s10723-013-9282-3
Appears in Collections: Computer Science publications


