Three Papers Accepted - SC11


Three papers by Naoya Maruyama, Leonardo Bautista-Gomez, and Takashi Shimokawabe (Aoki Lab.) have been accepted at SC11 (the International Conference for High Performance Computing, Networking, Storage and Analysis).

AUTHORS:Naoya Maruyama, Tatsuo Nomura, Kento Sato and Satoshi Matsuoka
TITLE:Physis: An Implicitly Parallel Programming Model for Stencil Computations on Large-Scale GPU-Accelerated Supercomputers


This paper proposes a compiler-based programming framework that automatically translates user-written
structured grid code into scalable parallel implementation code for GPU-equipped clusters. To enable such
automatic translations, we design a small set of declarative constructs that allow the user to express stencil
computations in a portable and implicitly parallel manner. Our framework translates the user-written code into
actual implementation code in CUDA for GPU acceleration and MPI for node-level parallelization, with
automatic optimizations such as computation and communication overlapping. We demonstrate the feasibility
of such automatic translations by implementing several structured grid applications in our framework.
Experimental results on the TSUBAME2.0 GPU-based supercomputer show that the performance is comparable
to that of hand-written code, with good strong and weak scalability up to 256 GPUs.
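To give a flavor of the kind of structured-grid stencil kernel such a framework targets, here is a minimal sketch in plain Python. This is an illustrative 5-point Jacobi update, not Physis syntax; the framework described above would generate the equivalent CUDA/MPI implementation from declarative constructs.

```python
# A minimal 5-point Jacobi stencil on a 2D grid -- the kind of
# structured-grid kernel a framework like Physis translates into
# CUDA + MPI code. Plain Python for illustration, not Physis syntax.

def jacobi_step(grid):
    """Return a new grid where each interior cell is the average
    of its four von Neumann neighbors; boundary cells stay fixed."""
    n, m = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy; boundaries unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                grid[i][j - 1] + grid[i][j + 1])
    return new

# Usage: heat diffusion toward steady state on a 5x5 grid with a
# hot left boundary (1.0) and cold cells (0.0) elsewhere.
g = [[1.0 if j == 0 else 0.0 for j in range(5)] for _ in range(5)]
for _ in range(100):
    g = jacobi_step(g)
```

Because each output cell depends only on its immediate neighbors from the previous iteration, the loop nest parallelizes trivially across GPU threads, and a distributed version only needs halo exchanges at subdomain boundaries, which is what enables the automatic computation/communication overlap mentioned above.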


AUTHORS:Leonardo Bautista-Gomez, Dimitri Komatitsch, Naoya Maruyama, Seiji Tsuboi, Franck Cappello and Satoshi Matsuoka
TITLE: FTI: High Performance Fault Tolerance Interface for Hybrid Systems

Large scientific applications deployed on current petascale systems expend a significant amount of their execution time dumping checkpoint files to remote storage. New fault-tolerance techniques will be critical to efficiently exploit post-petascale systems. In this work, we propose a low-overhead, high-frequency multi-level checkpoint technique in which we integrate a highly reliable topology-aware Reed-Solomon encoding in a three-level checkpoint scheme. We efficiently hide the encoding time using one fault-tolerance-dedicated thread per node. We implement our technique in the Fault Tolerance Interface (FTI). We evaluate the correctness of our performance model and conduct a study of the reliability of our library. To demonstrate the performance of FTI, we present a case study of the Mw 9.0 Tohoku, Japan earthquake simulation with SPECFEM3D on TSUBAME2.0. We demonstrate a checkpoint overhead as low as 8% on sustained 0.1-petaflops runs (1152 GPUs) while checkpointing at high frequency.
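The idea behind erasure-coded checkpointing can be illustrated with a simplified sketch. FTI uses topology-aware Reed-Solomon encoding across groups of nodes; the toy below uses plain XOR parity (the single-parity special case of Reed-Solomon), which lets a group survive the loss of any one node's local checkpoint without touching remote storage. The function names are hypothetical, for illustration only.

```python
# Simplified illustration of erasure-coded checkpointing across a
# group of nodes. FTI's actual scheme is topology-aware Reed-Solomon;
# this sketch uses XOR parity (the one-parity-block special case),
# which tolerates the loss of any single checkpoint per group.

def xor_blocks(blocks):
    """XOR equal-length byte strings together."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

def encode_group(checkpoints):
    """Compute the parity block each group stores alongside the
    nodes' local checkpoints."""
    return xor_blocks(checkpoints)

def recover(surviving, parity):
    """Rebuild the one missing checkpoint from the survivors,
    since XOR-ing everything (including parity) cancels them out."""
    return xor_blocks(surviving + [parity])

# Usage: a 4-node group; node 2's local checkpoint is lost.
ckpts = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
parity = encode_group(ckpts)
restored = recover([ckpts[0], ckpts[1], ckpts[3]], parity)
assert restored == b"cccc"
```

Because encoding is a local, CPU-bound pass over checkpoint data, it can be overlapped with the application's continued execution, which is the role of the dedicated fault-tolerance thread per node described in the abstract.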


AUTHORS:Takashi Shimokawabe, Takayuki Aoki, Tomohiro Takaki, Akinori Yamanaka, Akira Nukada, Toshio Endo, Naoya Maruyama and Satoshi Matsuoka
TITLE: Peta-scale Phase-Field Simulation for Dendritic Solidification on the TSUBAME 2.0 Supercomputer

The mechanical properties of metal materials largely depend on the intrinsic microstructures in these materials. To develop engineering materials with the expected properties, prediction of the microstructural patterns in solidified metals is indispensable. The phase-field simulation is one of the most powerful methods for simulating micro-scale dendritic growth during solidification in a binary alloy. To achieve a realistic description of solidification, this simulation demands the computation of a large number of complex nonlinear terms over a fine-grained grid. Due to this heavy computational load, early work on simulating three-dimensional solidification with the phase-field method was limited to simple shapes. Our simulations have reached a scale large enough to obtain the complex dendritic structures required in materials science. Our benchmarks on the TSUBAME 2.0 GPU supercomputer at the Tokyo Institute of Technology have demonstrated good weak scaling and achieved 1.017 PFlops in single precision for our largest simulation using 4000 GPUs.
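As a rough illustration of the kind of nonlinear grid update a phase-field solver performs at every time step, here is a drastically simplified 1D Allen-Cahn-type sketch. It is not the binary-alloy model used in the paper, and the parameters are illustrative; the actual simulation couples additional fields and runs in 3D on thousands of GPUs.

```python
# A drastically simplified 1D phase-field (Allen-Cahn type) update,
# illustrating the nonlinear stencil computation a solidification
# solver evaluates at every grid point and time step. Not the
# paper's binary-alloy model; parameters are illustrative only.

def allen_cahn_step(phi, dt=0.01, dx=1.0, eps=1.0):
    """Explicit Euler step of d(phi)/dt = eps * lap(phi) - W'(phi),
    with double-well potential derivative W'(phi) = phi**3 - phi.
    phi = -1 represents liquid, phi = +1 solid; boundaries fixed."""
    n = len(phi)
    new = phi[:]
    for i in range(1, n - 1):
        lap = (phi[i - 1] - 2.0 * phi[i] + phi[i + 1]) / (dx * dx)
        new[i] = phi[i] + dt * (eps * lap - (phi[i] ** 3 - phi[i]))
    return new

# Usage: a sharp liquid/solid interface relaxes toward a smooth
# diffuse-interface profile over repeated time steps.
phi = [-1.0] * 10 + [1.0] * 10
for _ in range(200):
    phi = allen_cahn_step(phi)
```

Every cell update mixes a Laplacian stencil with pointwise nonlinear terms, so the cost per step grows with grid resolution; this is why a realistic 3D dendrite simulation needs the petascale throughput reported above.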





Copyright (c) 2010 Tokyo Institute of Technology. Matsuoka Labo. All Rights Reserved.