Three papers accepted - SC11

2011.07.07

Papers by Naoya Maruyama and Leonardo Bautista-Gomez of our laboratory, together with Takashi Shimokawabe of the Aoki Laboratory at our university, have been accepted to SC11 (High Performance Computing, Networking, Storage and Analysis).

 

Authors: Naoya Maruyama, Tatsuo Nomura, Kento Sato and Satoshi Matsuoka
Title: Physis: An Implicitly Parallel Programming Model for Stencil Computations on Large-Scale GPU-Accelerated Supercomputers

Abstract:

This paper proposes a compiler-based programming framework that automatically translates user-written structured grid code into scalable parallel implementation code for GPU-equipped clusters. To enable such automatic translations, we design a small set of declarative constructs that allow the user to express stencil computations in a portable and implicitly parallel manner. Our framework translates the user-written code into actual implementation code in CUDA for GPU acceleration and MPI for node-level parallelization, with automatic optimizations such as computation and communication overlapping. We demonstrate the feasibility of such automatic translations by implementing several structured grid applications in our framework. Experimental results on the TSUBAME2.0 GPU-based supercomputer show that the performance is comparable to that of hand-written code, with good strong and weak scalability up to 256 GPUs.
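For readers unfamiliar with this style of framework, the sketch below shows what such implicitly parallel stencil code can look like: the user writes only a point-wise update function, and the translator generates the CUDA kernels, the domain decomposition, and the MPI halo exchanges. This is a minimal illustrative sketch, not the framework's definitive API; the PSGridGet/PSGridEmit/PSStencilMap names and signatures are assumptions based on the constructs as described in the paper, and the released interface may differ.

    /* Illustrative 7-point diffusion stencil in a Physis-like style.
       All names and signatures here are assumptions. */
    #include <physis.h>          /* hypothetical header name */

    #define NX 256
    #define NY 256
    #define NZ 256
    #define NITER 100

    /* Point-wise stencil: reads neighbors of g_in, emits one value into
       g_out. The translator turns this into a CUDA kernel and inserts
       the MPI boundary exchanges. */
    static void diffusion(const int x, const int y, const int z,
                          PSGrid3DFloat g_in, PSGrid3DFloat g_out) {
      float v = PSGridGet(g_in, x, y, z)
              + PSGridGet(g_in, x + 1, y, z) + PSGridGet(g_in, x - 1, y, z)
              + PSGridGet(g_in, x, y + 1, z) + PSGridGet(g_in, x, y - 1, z)
              + PSGridGet(g_in, x, y, z + 1) + PSGridGet(g_in, x, y, z - 1);
      PSGridEmit(g_out, v / 7.0f);
    }

    int main(int argc, char *argv[]) {
      PSInit(&argc, &argv, 3, NX, NY, NZ);
      PSGrid3DFloat g1 = PSGrid3DFloatNew(NX, NY, NZ);
      PSGrid3DFloat g2 = PSGrid3DFloatNew(NX, NY, NZ);
      PSDomain3D dom = PSDomain3DNew(0, NX, 0, NY, 0, NZ);
      /* Apply the stencil NITER times, alternating the two buffers;
         overlapping computation with halo communication is the
         translator's job, not the user's. */
      PSStencilRun(PSStencilMap(diffusion, dom, g1, g2),
                   PSStencilMap(diffusion, dom, g2, g1), NITER);
      PSGridFree(g1);
      PSGridFree(g2);
      PSFinalize();
      return 0;
    }

Note that nothing in this source names threads, blocks, or messages: the parallelism is implicit in the declarative map, which is what allows the translator to apply optimizations such as computation and communication overlapping.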

 

Authors: Leonardo Bautista-Gomez, Dimitri Komatitsch, Naoya Maruyama, Seiji Tsuboi, Franck Cappello and Satoshi Matsuoka
Title: FTI: High Performance Fault Tolerance Interface for Hybrid Systems
Abstract:

Large scientific applications deployed on current petascale systems expend a significant amount of their execution time dumping checkpoint files to remote storage. New fault tolerant techniques will be critical to efficiently exploit post-petascale systems. In this work, we propose a low-overhead, high-frequency, multi-level checkpoint technique in which we integrate a highly reliable topology-aware Reed-Solomon encoding in a three-level checkpoint scheme. We efficiently hide the encoding time using one fault-tolerance-dedicated thread per node. We implement our technique in the Fault Tolerance Interface (FTI). We evaluate the correctness of our performance model and conduct a study of the reliability of our library. To demonstrate the performance of FTI, we present a case study of the Mw 9.0 Tohoku, Japan earthquake simulation with SPECFEM3D on TSUBAME2.0. We demonstrate a checkpoint overhead as low as 8% on sustained 0.1 petaflops runs (1152 GPUs) while checkpointing at high frequency.
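As a rough illustration of how an application drives such a multi-level checkpoint library, here is a minimal sketch using FTI-style calls (FTI_Init, FTI_Protect, FTI_Snapshot). The signatures are assumptions based on FTI's later open-source releases and may not match the version evaluated in the paper.

    /* Minimal FTI-style checkpointing sketch; signatures are assumed
       from later public releases of the library. */
    #include <mpi.h>
    #include <fti.h>

    int main(int argc, char *argv[]) {
      MPI_Init(&argc, &argv);
      /* The config file sets checkpoint levels and intervals; FTI
         reserves dedicated fault-tolerance resources per node for the
         Reed-Solomon encoding. */
      FTI_Init("config.fti", MPI_COMM_WORLD);

      int it = 0;
      static double field[1 << 20];   /* application state to protect */

      /* Register the variables that must survive a failure; after a
         restart, FTI restores them before the loop resumes. */
      FTI_Protect(0, &it, 1, FTI_INTG);
      FTI_Protect(1, field, 1 << 20, FTI_DBLE);

      for (; it < 10000; it++) {
        /* ... one simulation step, communicating over FTI_COMM_WORLD
           instead of MPI_COMM_WORLD ... */

        /* Checkpoints at the configured frequency; the encoding then
           overlaps with the following compute steps. */
        FTI_Snapshot();
      }

      FTI_Finalize();
      MPI_Finalize();
      return 0;
    }

The design point this illustrates is that the application only declares what must survive; when and where checkpoints are written (local, partner/encoded, or remote storage) is decided by the library, which is what keeps the overhead low at high checkpoint frequency.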

 

Authors: Takashi Shimokawabe, Takayuki Aoki, Tomohiro Takaki, Akinori Yamanaka, Akira Nukada, Toshio Endo, Naoya Maruyama and Satoshi Matsuoka
Title: Peta-scale Phase-Field Simulation for Dendritic Solidification on the TSUBAME 2.0 Supercomputer
Abstract:

The mechanical properties of metal materials largely depend on the intrinsic microstructures in these materials. To develop engineering materials with expected properties, the prediction of the microstructural patterns in solidified metals is indispensable. The phase-field simulation is one of the most powerful methods for solving micro-scale dendritic growth during solidification in a binary alloy. To achieve a realistic description of solidification, this simulation demands computation of a large number of complex nonlinear terms over a fine-grained grid. Due to this heavy computational load, early work on simulating three-dimensional solidification with the phase-field method was limited to describing simple shapes. Our simulations have reached a scale large enough to obtain the complex dendritic structures required in materials science. Our benchmarks on the GPU-based TSUBAME 2.0 supercomputer at the Tokyo Institute of Technology have demonstrated good weak scaling and achieved 1.017 PFlops in single precision for our largest simulation, using 4,000 GPUs.
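For context, a phase-field model evolves a smooth order parameter phi (phi = 0 in the liquid, phi = 1 in the solid) coupled to the solute concentration c of the binary alloy. A generic form of the coupled equations, written in LaTeX, is shown below; this is a textbook-style sketch, not necessarily the exact anisotropic model solved in the paper:

    \frac{\partial \phi}{\partial t}
      = M_{\phi}\left( W^{2}\,\nabla^{2}\phi
        - \frac{\partial f(\phi, c, T)}{\partial \phi} \right),
    \qquad
    \frac{\partial c}{\partial t}
      = \nabla \cdot \bigl( D(\phi)\,\nabla c \bigr)

Discretizing both equations with finite differences on a uniform grid yields the many nonlinear terms per grid point mentioned above, and each point depends only on a small neighborhood of values, which is what makes the method map so well onto GPU stencil kernels at this scale.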

