Adelaide Research and Scholarship
Title: Communication performance measurement and analysis on commodity clusters
Author: Abdul Hamid, Nor Asilah Wati
Issue Date: 2008
School/Discipline: School of Computer Science
Abstract: Cluster computers have become the dominant architecture in high-performance computing. Parallel programs on these computers are mostly written using the Message Passing Interface (MPI) standard, so the communication performance of the MPI library for a cluster is very important. This thesis investigates several different aspects of performance analysis for MPI libraries, on both distributed memory clusters and shared memory parallel computers.

The performance evaluation was done using MPIBench, a new MPI benchmark program that provides some useful new functionality compared to existing MPI benchmarks. Since there has been only limited previous use of MPIBench, some initial work was done on comparing MPIBench with other MPI benchmarks, and on improving its functionality, reliability, portability and ease of use. This work included a detailed comparison of results from the Pallas MPI Benchmark (PMB), SKaMPI, Mpptest, MPBench and MPIBench on both distributed memory and shared memory parallel computers, which had not previously been done. This comparison showed that the results for some MPI routines were significantly different between the different benchmarks, particularly for the shared memory machine.

A comparison was done between Myrinet and Ethernet network performance on the same machine, an IBM Linux cluster with 128 dual-processor nodes, using the MPICH MPI library. The analysis focused mainly on the scalability and variability of communication times for the different networks, making use of the capability of MPIBench to generate distributions of MPI communication times. The analysis provided an improved understanding of the effects of TCP retransmission timeouts on Ethernet networks.

This analysis showed anomalous results for some MPI routines. Further investigation showed that this is because MPICH uses different algorithms for small and large message sizes for some collective communication routines, and the message size where this changeover occurs is fixed, based on measurements using a cluster with a single processor per node. Experiments were done to measure the performance of the different algorithms, which demonstrated that for some MPI routines the optimal changeover points were very different between Myrinet and Ethernet networks and for 1 and 2 processors per node. Significant performance improvements can be made by allowing the changeover points to be tuned rather than fixed, particularly for commodity Ethernet networks and for clusters with more than 1 process per node.

MPIBench was also used to analyse the MPI performance and scalability of a large ccNUMA shared memory machine, an SGI Altix 3000 with 160 processors. The results were compared with a high-end cluster, an AlphaServer SC with Quadrics QsNet interconnect. For most MPI routines the Altix showed significantly better performance, particularly when non-buffered copy was used. MPIBench proved to be a very capable tool for analysing MPI performance in a variety of different situations.
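The changeover effect described in the abstract can be illustrated with a simple latency-bandwidth (alpha-beta) cost model of the kind commonly used to analyse MPI collective algorithms. The sketch below is not from the thesis: the two broadcast cost formulas are the standard textbook models (binomial tree for small messages; scatter plus allgather for large ones), and the Myrinet and Ethernet latency/bandwidth figures are illustrative assumptions. It shows why the optimal crossover message size shifts when the network changes.

```python
import math

def bcast_binomial(p, n, alpha, beta):
    # Binomial-tree broadcast: ceil(log2 p) rounds, each sending
    # the full n-byte message (good when n is small).
    return math.ceil(math.log2(p)) * (alpha + n * beta)

def bcast_scatter_allgather(p, n, alpha, beta):
    # Scatter then allgather: many more messages (higher latency term),
    # but each process moves only ~2n(p-1)/p bytes in total
    # (good when n is large).
    return (math.log2(p) + p - 1) * alpha + 2 * (n * (p - 1) / p) * beta

def crossover(p, alpha, beta, max_bytes=1 << 24):
    # Doubling search for the smallest power-of-two message size at
    # which the large-message algorithm beats the binomial tree.
    n = 1
    while n <= max_bytes:
        if bcast_scatter_allgather(p, n, alpha, beta) < bcast_binomial(p, n, alpha, beta):
            return n
        n *= 2
    return None

# Hypothetical network parameters (alpha = latency in seconds,
# beta = seconds per byte), chosen only to contrast the two networks:
myrinet = dict(alpha=7e-6, beta=1 / 250e6)    # ~7 us latency, ~250 MB/s
ethernet = dict(alpha=50e-6, beta=1 / 100e6)  # ~50 us latency, ~100 MB/s

for name, net in [("Myrinet", myrinet), ("Ethernet", ethernet)]:
    print(name, "crossover at", crossover(p=32, **net), "bytes")
```

Under these assumed parameters the higher-latency Ethernet network pushes the crossover to a larger message size than Myrinet, which is why a single hard-coded changeover point, tuned on one network, can be far from optimal on another.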
Advisor: Coddington, Paul
Dissertation Note: Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2008
Subject: High performance computing; Electronic data processing -- Distributed processing
Provenance: Copyright material removed from digital thesis. See print copy in University of Adelaide Library for full text.
Call number: 09PH A166
Description (link): http://proxy.library.adelaide.edu.au/login?url=http://library.adelaide.edu.au/cgi-bin/Pwebrecon.cgi?BBID=1331421
Appears in Collections: Research Theses
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.