Publications (in English)

Publications in Japanese

Refereed Conference/Workshop Papers

  • Takashi Shimokawabe, Takayuki Aoki, Tomohiro Takaki, Akinori Yamanaka, Akira Nukada, Toshio Endo, Naoya Maruyama, Satoshi Matsuoka. Peta-scale Phase-Field Simulation for Dendritic Solidification on the TSUBAME 2.0 Supercomputer. In Proceedings of IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC11), Seattle, November 2011.

    ACM Gordon Bell Prize Special Achievements in Scalability and Time-to-Solution

  • Massimo Bernaschi, Mauro Bisson, Toshio Endo, Massimiliano Fatica, Satoshi Matsuoka, Simone Melchionna, Sauro Succi. Petaflop Biofluidics Simulations On A Two Million-Core System. In Proceedings of IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC11), Seattle, November 2011.

  • Shiqiao Du, Takuro Udagawa, Toshio Endo and Masakazu Sekijima. Molecular Dynamics Simulation of a Biomolecule with High Speed, Low Power and Accuracy Using GPU-Accelerated TSUBAME2.0 Supercomputer. In Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011), Xi'an, October 2011.

  • Takashi Shimokawabe, Takayuki Aoki, Chiashi Muroi, Junichi Ishida, Kohei Kawano, Toshio Endo, Akira Nukada, Naoya Maruyama, Satoshi Matsuoka. An 80-Fold Speedup, 15.0 TFlops, Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code. In Proceedings of IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC10), pp.1-11, New Orleans, November 2010.

  • Hitoshi Nagasaka, Naoya Maruyama, Akira Nukada, Toshio Endo and Satoshi Matsuoka. Statistical Power Modeling of GPU Kernels Using Performance Counters. In Proceedings of International Green Computing Conference (IGCC'10), pp. 115--122, Chicago, IL, USA, August 2010.

  • Toshio Endo, Akira Nukada, Satoshi Matsuoka and Naoya Maruyama. Linpack Evaluation on a Supercomputer with Heterogeneous Accelerators. In Proceedings of IEEE International Parallel & Distributed Processing Symposium (IPDPS 2010), Atlanta, pp.1-8, April 2010. [paper] [slides]

  • Tomoaki Hamano, Toshio Endo and Satoshi Matsuoka. Power-Aware Dynamic Task Scheduling for Heterogeneous Accelerated Clusters. In Proceedings of The Fourth Workshop on High-Performance, Power-Aware Computing (HPPAC), in conjunction with IPDPS 2009, pp.1-8, May 2009.

  • Hitoshi Sato, Satoshi Matsuoka and Toshio Endo. File Clustering Based Replication Algorithm in a Grid Environment. In Proceedings of IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2009), pp.204-211, May 2009.

  • Hideyuki Jitsumoto, Toshio Endo and Satoshi Matsuoka. Environmental-Aware Optimization of MPI Checkpointing Intervals. In Proceedings of HPC ASIA 2009, pp. 285--292, March 2009.

  • Akira Nukada, Yasuhiko Ogata, Toshio Endo and Satoshi Matsuoka. Bandwidth Intensive 3-D FFT kernel for GPUs using CUDA. In Proceedings of IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC08), pp.1-11, November 2008.

  • Hitoshi Sato, Satoshi Matsuoka, Toshio Endo and Naoya Maruyama. Access-Pattern and Bandwidth Aware File Replication Algorithm in a Grid Environment. In Proceedings of IEEE/ACM International Conference on Grid Computing (Grid 2008), pp.250-257, October 2008.

  • Toshio Endo and Satoshi Matsuoka. Massive Supercomputing Coping with Heterogeneity of Modern Accelerators. In Proceedings of IEEE International Parallel & Distributed Processing Symposium (IPDPS 2008), pp.1-10, April 2008. [paper] [slides]

  • Shin'ichiro Takizawa, Toshio Endo and Satoshi Matsuoka. Locality Aware MPI Communication on a Commodity Opto-Electronic Hybrid Network. In Proceedings of Workshop on Large-Scale Parallel Processing (LSPP), in conjunction with IEEE IPDPS 2008, pp.1-8, April 2008.

  • Yasuhiko Ogata, Toshio Endo, Naoya Maruyama and Satoshi Matsuoka. An Efficient, Model-Based CPU-GPU Heterogeneous FFT Library. In Proceedings of 17th International Heterogeneity in Computing Workshop (HCW '08), in conjunction with IEEE IPDPS 2008, pp.1-10, April 2008.

  • Yuto Hosogaya, Toshio Endo and Satoshi Matsuoka. Performance Evaluation of Parallel Applications on Next Generation Memory Architecture with Power-Aware Paging Method. In Proceedings of The Fourth Workshop on High-Performance, Power-Aware Computing (HPPAC), in conjunction with IPDPS 2008, pp.1-8, April 2008.

  • Tatsuhiro Chiba, Toshio Endo and Satoshi Matsuoka. High-Performance MPI Broadcast Algorithm for Grid Environments Utilizing Multi-lane NICs. In Proceedings of IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid2007), pp.487--494, May 2007.

  • Hideyuki Jitsumoto, Toshio Endo and Satoshi Matsuoka. ABARIS: An Adaptable Fault Detection/Recovery Component Framework for MPIs. In Proceedings of 12th IEEE Workshop on Dependable Parallel, Distributed and Network-Centric Systems (DPDNS '07), in conjunction with IPDPS 2007, pp.1-8, March 2007.

  • Toshio Endo and Kenjiro Taura. Highly Latency Tolerant Gaussian Elimination. In Proceedings of IEEE/ACM International Workshop on Grid Computing (Grid2005), pp. 91--98, November 2005. [paper] [slides]

  • Toshio Endo, Kenji Kaneda, Kenjiro Taura and Akinori Yonezawa. High Performance LU Factorization for Non-dedicated Clusters. In Proceedings of IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid2004), pp. 678--685, April 2004. [paper] [slides]

  • Kenjiro Taura, Toshio Endo, Kenji Kaneda and Akinori Yonezawa. Phoenix: a Parallel Programming Model for Accommodating Dynamically Joining/Leaving Resources. In Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '03), pp.216-229, June 2003.

  • Toshio Endo and Kenjiro Taura. Reducing Pause Time of Conservative Collectors. In Proceedings of ACM SIGPLAN International Symposium on Memory Management (ISMM2002), Berlin, pp.119-131, June 2002. [paper] [slides]

  • Toshio Endo, Kenjiro Taura and Akinori Yonezawa. Predicting Scalability of Parallel Garbage Collectors on Shared Memory Multiprocessors. In Proceedings of 15th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2001), San Francisco, pp.1-6, April 2001. [paper]

  • Toshio Endo, Kenjiro Taura and Akinori Yonezawa. A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines. In Proceedings of ACM/IEEE Conference on Supercomputing (High Performance Networking and Computing) (SC97), San Jose, 14 pages, November 1997. [paper]

    Theses for Degrees

  • Toshio Endo. Scalable Dynamic Memory Management Module on Shared Memory Multiprocessors. Ph.D. thesis, Department of Information Science, Faculty of Science, University of Tokyo, June 2001. [paper] [slides in Japanese]
    NOTE: The paper is written in English, but the title page and abstract page include Japanese characters.

  • Toshio Endo. A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines. Master's thesis, Department of Information Science, The University of Tokyo, February 1998. [paper]
    NOTE: The paper is written in English, but the title page and abstract page include Japanese characters.

  • Toshio Endo. A Methodology for Constructing a Portable Garbage Collector on Parallel Machines. Senior thesis, Department of Information Science, The University of Tokyo, February 1996.

    Unrefereed Papers

  • Irina Demeshko, Satoshi Matsuoka, Toshio Endo. GPU-based approach for elastic-plastic deformation simulation, Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2011), IPSJ SIG Technical Report, 2011-HPC-130 No.12, 7 pages, Kagoshima, August 2011.

  • Nguyen Toan, Hideyuki Jitsumoto, Naoya Maruyama, Tatsuo Nomura, Toshio Endo, Satoshi Matsuoka. MPI-CUDA Applications Checkpointing, Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP 2010), IPSJ SIG Technical Report, 2010-HPC-126 No.18, 7 pages, Kanazawa, August 2010.

    Invited Talks

  • Toshio Endo. TSUBAME2.0: A Petascale GPU-accelerated Supercomputer, The Second International Conference on Networking and Computing (ICNC'11), Tutorial, Osaka, December 2011.

  • Toshio Endo. Supercomputing on The TSUBAME GPU-Accelerated Cluster, CSIRO GPU Cluster Workshop, Melbourne, June 2009.

    Posters and Other Presentations

  • Hideyuki Jitsumoto, Toshio Endo, Satoshi Matsuoka. Environmental-Aware Optimization of MPI Checkpointing Intervals, IEEE International Conference on Cluster Computing (Cluster 2008), poster session, September 2008.

  • Toshio Endo, Satoshi Matsuoka. A Methodology for Coping with Heterogeneity of Modern Accelerators on a Massive Supercomputing Scale, ACM/IEEE Conference on Supercomputing (High Performance Computing, Networking, Storage and Analysis) (SC07), poster session, November 2007. [poster]

  • Tatsuhiro Chiba, Toshio Endo, Satoshi Matsuoka. High Performance MPI Broadcast Algorithm for Grid Environments with Long-fat Pipes, Korea-Japan Grid Symposium 2007, Sapporo, Japan, July 2007.

  • Kenjiro Taura, Kenji Kaneda, Toshio Endo, Daisaku Yokoyama, Yoshikazu Kamoshida, Takashi Chikayama, Akinori Yonezawa. Developing Grid Applications using Phoenix, IEEE/ACM Symposium on Cluster Computing and the Grid (CCGrid2003), posters & research demos session, May 2003.

  • Toshio Endo, Kenjiro Taura, Akinori Yonezawa. A Garbage Collector with Dynamic Load Balancing on Shared-Memory Machines and Its Evaluation. ACM SIGPLAN Conference on Programming Language Design & Implementation (PLDI'97), student poster session, May 1997.
