Three Papers Accepted - SC12

2012.07.11

The papers we submitted, by Akira Nukada, Kento Sato, and Katsuki Fujisawa, have been accepted at SC12.


Authors: Akira Nukada, Kento Sato, Satoshi Matsuoka
Title: Scalable Multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer
Abstract:

For scalable 3-D FFT computation using multiple GPUs, efficient all-to-all communication between GPUs is the most important factor for good performance. Implementations using point-to-point MPI library functions and CUDA memory copy APIs typically exhibit very large overheads, especially for the small message sizes that arise in all-to-all communication among many nodes. We propose several schemes to minimize these overheads, including use of the lower-level InfiniBand API to effectively overlap intra- and inter-node communication with computation, as well as auto-tuning strategies to control scheduling and determine rail assignments. As a result, we achieve very good strong scalability as well as good performance, up to 4.8 TFLOPS using 256 nodes (768 GPUs) of the TSUBAME 2.0 supercomputer, several times faster than results reported in comparable work.
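The all-to-all exchange that this abstract singles out can be illustrated with a single-process sketch. It assumes a simple slab decomposition over `p` hypothetical ranks and simulates the exchange with NumPy array slicing; it makes no MPI, CUDA, or InfiniBand calls and is not the paper's implementation. The function name `slab_fft3d` is made up for this example.

```python
import numpy as np


def slab_fft3d(x: np.ndarray, p: int) -> np.ndarray:
    """Simulate a slab-decomposed 3-D FFT over p hypothetical ranks in one process."""
    n0, n1, _ = x.shape
    assert n0 % p == 0 and n1 % p == 0, "p must divide the first two dimensions"

    # Step 1: rank r owns n0/p contiguous planes along axis 0.
    slabs = np.split(x, p, axis=0)

    # Step 2: local 2-D FFT over axes 1 and 2 (no communication needed).
    slabs = [np.fft.fftn(s, axes=(1, 2)) for s in slabs]

    # Step 3: all-to-all exchange. send[r][q] is the block rank r would send
    # to rank q: its planes restricted to q's share of axis 1.
    send = [np.split(s, p, axis=1) for s in slabs]
    # After the exchange, rank q holds all of axis 0 for its n1/p planes of axis 1.
    recv = [np.concatenate([send[r][q] for r in range(p)], axis=0) for q in range(p)]

    # Step 4: local 1-D FFT along axis 0, which is now fully resident on each rank.
    recv = [np.fft.fft(b, axis=0) for b in recv]

    # Reassemble the distributed result (split along axis 1) for checking.
    return np.concatenate(recv, axis=1)
```

With this decomposition each of the p^2 messages in the exchange holds n0*n1*n2/p^2 elements, so for a fixed problem size the messages shrink quadratically as ranks are added. This is one way to see why per-message overhead becomes dominant at scale, as the abstract notes.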

  

Authors: Kento Sato, Adam Moody, Kathryn Mohror, Todd Gamblin, Bronis R. de Supinski, Naoya Maruyama, Satoshi Matsuoka
Title: Design and Modeling of a Non-blocking Checkpointing System
Abstract:

As high performance computing (HPC) systems move towards exascale, the resiliency of these systems is becoming increasingly important. Typically, applications periodically save their state in checkpoint files to mitigate losses due to failures. However, checkpointing on large-scale systems can incur unacceptably high overheads when transferring checkpoints to the parallel file system (PFS). As a result, as HPC systems grow larger, applications cannot make practical progress, and the efficiency of HPC systems decreases. Our approach to this problem is to combine non-blocking checkpointing with multi-level checkpointing, where checkpoints are first cached on compute-node-local storage and then asynchronously drained from the compute nodes to the PFS. In this paper, we present the design of our approach and a model describing its performance. Our experimental results show that combining non-blocking and multi-level checkpointing can achieve up to 5.2 times higher efficiency on future systems.
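The core idea of caching checkpoints on node-local storage and draining them to the PFS asynchronously can be sketched in a few lines of Python. This is a toy under stated assumptions: both storage levels are in-memory dicts standing in for node-local storage and the PFS, draining is done by a single background thread, and the class `NonBlockingCheckpointer` and the function `run_demo` are hypothetical. It shows only the control flow, not the paper's performance model or failure handling.

```python
import queue
import threading


class NonBlockingCheckpointer:
    """Toy two-level checkpointer (hypothetical): write locally, drain to 'PFS' in the background."""

    def __init__(self):
        self.local = {}   # level 1: stand-in for fast node-local storage
        self.pfs = {}     # level 2: stand-in for the slow, shared parallel file system
        self._q = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def checkpoint(self, step, state):
        # Blocking part: only the fast local write. bytes() snapshots the state,
        # so later in-place updates by the application do not alter the checkpoint.
        self.local[step] = bytes(state)
        # Hand the step to the drain thread; the caller resumes immediately.
        self._q.put(step)

    def _drain(self):
        while True:
            step = self._q.get()
            if step is None:  # sentinel: no more checkpoints
                return
            # Asynchronous part: copy the cached checkpoint to the "PFS".
            self.pfs[step] = self.local[step]

    def close(self):
        # The queue is FIFO, so every earlier step is drained before the sentinel.
        self._q.put(None)
        self._worker.join()


def run_demo(steps=5):
    ckpt = NonBlockingCheckpointer()
    state = bytearray(4)                # pretend application state
    for step in range(steps):
        state[:] = bytes([step]) * 4    # "compute" mutates the state in place
        ckpt.checkpoint(step, state)    # returns after the local write only
    ckpt.close()                        # wait for all drains to finish
    return ckpt
```

In a real system the drain would move data between storage tiers over the network while the application keeps computing; the sketch keeps only that separation between the short blocking write and the deferred transfer.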

 

Authors: Katsuki Fujisawa, Toshio Endo, Hitoshi Sato, Makoto Yamashita, Satoshi Matsuoka, Maho Nakata
Title: High-Performance General Solver for Extremely Large-scale Semidefinite Programming Problems
Abstract:

Semidefinite Programming (SDP) is one of the most important problem classes in current optimization research. It covers a wide range of applications such as combinatorial optimization, structural optimization, control theory, economics, quantum chemistry, sensor network localization, and data mining. Solving extremely large-scale SDP problems is of significant importance for current and future applications of SDP. In 1995, Fujisawa et al. started the SDPA Project, aimed at solving large-scale SDP problems with numerical stability and accuracy. SDPA is one of the pioneering codes for solving general SDPs. SDPARA is a parallel version of SDPA for multiple processors with distributed memory; it replaces the two major bottlenecks of SDPA (the generation of the Schur complement matrix and its Cholesky factorization) with parallel implementations. In particular, it has been successfully applied to combinatorial optimization and truss topology optimization. The new version of SDPARA (7.5.0-G), running on TSUBAME 2.0, a large-scale supercomputer at Tokyo Institute of Technology, has set a new world record by solving the largest SDP problem to date, with over 1.48 million constraints. Our implementation has also achieved 533 TFlops in double precision for the large-scale Cholesky factorization using 2,720 CPUs and 4,080 GPUs.
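The two bottlenecks that SDPARA parallelizes, forming the Schur complement matrix and factorizing it, can be shown on a small dense example. This sketch assumes the HKM search direction, whose Schur complement entries are B[i, j] = trace(A_i X A_j Z^{-1}) for symmetric constraint matrices A_i, primal iterate X, and dual slack matrix Z. SDPA's actual formulas, sparsity handling, and MPI/GPU distribution are not reproduced, and `schur_complement` and `solve_schur` are hypothetical helpers.

```python
import numpy as np


def schur_complement(A, X, Z):
    """Dense HKM-type Schur complement: B[i, j] = trace(A_i X A_j Z^{-1})."""
    Zinv = np.linalg.inv(Z)
    m = len(A)
    # Precompute X A_j Z^{-1} once per j. After that, every row i of B can be
    # computed independently, one natural way to split this work across processes.
    XAZ = [X @ Aj @ Zinv for Aj in A]
    B = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            B[i, j] = np.trace(A[i] @ XAZ[j])
    return B


def solve_schur(B, r):
    """Solve B dy = r via the Cholesky factorization B = L L^T."""
    L = np.linalg.cholesky(B)
    # A general solver is used here for brevity; a production code would use
    # dedicated triangular solves for these two steps.
    y = np.linalg.solve(L, r)       # forward step: L y = r
    return np.linalg.solve(L.T, y)  # backward step: L^T dy = y
```

When X and Z are positive definite and the A_i are linearly independent, B is symmetric positive definite, which is why a Cholesky factorization applies. The factorization costs O(m^3) for m constraints, so at the scale of 1.48 million constraints quoted above it is clear why this step was moved onto GPUs.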

Copyright (c) 2010 Tokyo Institute of Technology. Matsuoka Labo. All Rights Reserved.