MatsuLab. Lecture Note
High Performance Computing
- Date and time
- Mondays, 10:45-12:15 (periods 3 and 4)
- Location
- West Building 8, Room 832
- Contact
Prof. S. Matsuoka | matsu at is. | TA K. Iwabuchi | iwabuchi.k.ab at m.titech.ac.jp |
To be added to the mailing list, please email the TA (Iwabuchi).
Cancelled Lectures
11/17
Guidance and References
Presentation Schedule
Prohibited List
- "McrEngine: a scalable checkpointing system using data-aware aggregation and compression"
- "Reliability-Aware Approach: An Incremental Checkpoint/Restart Model in HPC Environments"
- "FALCON - A System for Reliable Checkpoint Recovery in Shared Grid Environments"
- "Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing"
- "A Proactive Fault Tolerance Approach to High Performance Computing (HPC) in the Cloud"
- "Checkpoint-Restart for a Network of Virtual Machines"
- "Checkpointing Orchestration: Toward a Scalable HPC Fault-Tolerant Environment"
- "UniFI: leveraging non-volatile memories for a unified fault tolerance and idle power management technique"
- "Transparent checkpoint-restart over infiniband"
- "Feliss: Flexible distributed computing framework with light-weight checkpointing"
- "Parallel Reduction to Hessenberg Form with Algorithm-Based Fault Tolerance"
- "Online-ABFT: An Online Algorithm Based Fault Tolerance Scheme for Soft Error Detection in Iterative Methods"
- "Algorithmic Approaches to Low Overhead Fault Detection for Sparse Linear Algebra"
Links