[[MatsuLab. Lecture Note]]

*High Performance Computing [#qd95a3e2]
:Date and time|Mondays 10:45-12:15 (periods 3 and 4)
:Location|West Building 8, Room 832
:Contact|
|Prof. S. Matsuoka | matsu at is.|
|TA K. Iwabuchi |iwabuchi.k.ab at m.titech.ac.jp |

&color(red,white){Please email Iwabuchi (TA) to be added to the mailing list.};

**Contents [#r3e9a72b]
#contents

**Lecture Cancelled [#qd8018ac]
11/17

**Guidance and References [#d0dd7aa4]
-Guidance handout &ref("HPChadout0-1.pdf");
-[[Reference1 >http://cs.nju.edu.cn/distribute-systems/lecture-notes/c9.ppt]]
-[[Reference2 >http://www.christian-engelmann.info/publications/engelmann13resilience.ppt.pdf]]

**Schedule [#t8026a8e]
|CENTER:|CENTER:|CENTER:|CENTER:|LEFT:|c
|No.|Date|Presenter|Slides|Paper|
|2|10/27 (Mon)|佐々木| &ref("hpc141027-Sasaki-ver2.pptx"); (updated after the lecture)| &ref("MCREngine.pdf"); |
|3|11/6 (Thu)|佐々木|Same as above|Same as above|
|4|11/10 (Mon)|社本|&ref("hpc14_shamoto_1110.pdf");|&ref("sc12-redmpi.pdf");|
|5|11/26 (Wed)|Shweta|&ref("HPC2014_11:26_Shweta.pdf");|&ref("Checkpointing Orchestration.pdf");|
|6|12/1 (Mon)|Jian|&ref("HPC2014_Jian_1201.pdf");|&ref("ICS12_UniFI.pdf");|
|7|12/8 (Mon)|Jian|Same as above|Same as above|
|8|12/15 (Mon)|Mateusz||&ref("ICPP2014_rollback-avoidance-modeling.pdf");|
|9|12/22 (Mon)|長坂 侑亮|&ref("hpc14_nagasaka_ver2.pdf");|&ref("dsn12_sparse.pdf");|
|10|1/5 (Mon)|矢野 雅大|&ref("hauberk.pdf");|&ref("yim_ipdps_hauberk.pdf");|
|11|1/15 (Thu)|鈴木 太一郎|&ref("HPC_suzuki.pdf");|&ref("20150115_paper.pdf");|
|12|1/19 (Mon)|大村 裕|&ref("HPC2014_20150119.pdf");|&ref("p707-costa.pdf");|
|13|1/26 (Mon)|太田尚博|&ref("HPC_20150126.pdf");|&ref("core.pdf");|
|14|2/2 (Mon) (usual place and time)|都筑 一希|&ref("HPC14_tsuzuku.pdf");|&ref("Energy Consumption of Resilience Mechanisms in Large Scale Systems.pdf");|

**Inhibited List [#n4428258]
-"McrEngine: a scalable checkpointing system using data-aware aggregation and compression"
-"Reliability-Aware Approach: An Incremental Checkpoint/Restart Model in HPC Environments"
-"FALCON - A System for Reliable Checkpoint Recovery in Shared Grid Environments"
-"Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing"
-"A Proactive Fault Tolerance Approach to High Performance Computing (HPC) in the Cloud"
-"Checkpoint-Restart for a Network of Virtual Machines"
-"Checkpointing Orchestration: Toward a Scalable HPC Fault-Tolerant Environment"
-"UniFI: leveraging non-volatile memories for a unified fault tolerance and idle power management technique"
-"Transparent checkpoint-restart over infiniband"
-"Feliss: Flexible distributed computing framework with light-weight checkpointing"
-"Parallel Reduction to Hessenberg Form with Algorithm-Based Fault Tolerance"
-"Online-ABFT: An Online Algorithm Based Fault Tolerance Scheme for Soft Error Detection in Iterative Methods"
-"Algorithmic Approaches to Low Overhead Fault Detection for Sparse Linear Algebra"

//**Questioner List [#m8a6621e]
//|CENTER:|CENTER:|CENTER:|CENTER:|LEFT:|c
//|No.|Date|Questioner|
//|1|10/20 (Mon)||
//|2|10/27 (Mon)||
//|3|11/6 (Thu)||
//|4|11/10 (Mon)||
//|5|11/26 (Wed)||
//|6|12/1 (Mon)||
//|7|12/8 (Mon)||
//|8|12/15 (Mon)||
//|9|12/22 (Mon)||
//|10|1/5 (Mon)||
//|11|1/15 (Thu)||
//|12|1/19 (Mon)||
//|13|1/26 (Mon)||

**Links [#g5ee5573]
-[[ACM/IEEE Supercomputing>http://www.supercomp.org]]
-[[IEEE IPDPS>http://www.ipdps.org]]
-[[IEEE HPDC>http://www.hpdc.org/]]
-[[ACM International Conference on Supercomputing (ICS)>http://www.ics-conference.org/]]
-[[ISC>http://www.isc-events.com/]]
-[[IEEE Cluster Computing>http://www.clustercomp.org/]]
-[[IEEE/ACM Grid Computing>http://www.gridcomputing.org/]]
-[[IEEE/ACM CCGrid>http://www.buyya.com/ccgrid/]]
-[[IEEE Big Data>http://cci.drexel.edu/bigdata/bigdata2013/]]
-[[CiteSeer.IST>http://citeseer.ist.psu.edu]]
-[[Google Scholar>http://scholar.google.com]]
-[[Windows Live Academic>http://academic.live.com]]
-[[The ACM Digital Library>http://portal.acm.org/portal.cfm?coll=portal&dl=ACM&CFID=38583716&CFTOKEN=29210274]]