[[MatsuLab. Lecture Note]]
 
*大規模計算論 High Performance Computing [#g6337a39]
:Date|
Tuesday 10:45 - 12:15 (Period: 3-4) &br;
Friday 10:45 - 12:15 (Period: 3-4)
:Room|
Main building 119A (H119A)
:Contact|
|松岡教授 (Prof. S.Matsuoka) | matsu [at] is.titech.ac.jp |
|TA 長坂 (Y.Nagasaka)         | nagasaka.y.aa [at] m.titech.ac.jp |
&color(red,white){メーリングリストに追加しますので、至急TAまでメールを送ってください。Please email Nagasaka (TA) as soon as possible so that we can add you to the mailing list.};

**目次 Contents [#j532edb8]
#contents

**休講予定日 Cancelled Lectures [#cb04d351]
11/14 (Tue), 11/17 (Fri)

**授業概要と参考資料 Guidance and References [#ha180204]
-ガイダンス資料/Guidance &ref("hpc2017_guidance.pdf");

**発表スケジュール Schedule [#n9a6672a]
&color(red,white){暫定的な割り当ては以下の通りですが、都合が悪い場合はTAまで希望日をメールしてください。The assignments below are tentative; if your date is inconvenient, please email the TA with your preferred date.};
|CENTER:|CENTER:|CENTER:|CENTER:|LEFT:|c
|No.|Date|Presenter|Slides|Paper|
| 1 | 09/26 | (Guidance) |  |  |
| 2 | 09/29 | Lecture | &ref(GTC2016_RNN_Performance.pptx,,,RNN);, &ref(pascal-DL.pptx,,,DL); |  |
| 3 | 10/03 | Matsumura | &ref("hpc.pdf"); | &ref("BlockMomentumSGD.pdf"); |
| 4 | 10/10 | Tsuchikawa | &ref("HPC_2.pdf"); | &ref("s-caffe.pdf"); |
| 5 | 10/13 | Zixuan | &ref("16M58336_ZhouZixuan.pdf"); | &ref("GeePS.pdf"); |
| 6 | 10/17 | Barton | &ref("C-Brain presentation.pdf"); | &ref("C-Brain paper.pdf"); |
| 7 | 10/20 | Haoyu | &ref("DGX-1_KNL_presentation.pptx"); | &ref("DGX-1_KNL_presentation.pdf"); |
| 8 | 10/24 | Yashima |  |  |
| 9 | 10/27 %%(2)%% | Yi |  |  |
| 10 | 10/31 (2) | Deshmukh |  |  |
| 11 | 11/03 (2) | Duan, Sun |  |  |
| 12 | 11/07 (2) | Erum, Jun |  |  |
| 13 | 11/10 (2) | Chenwu, Maurya |  |  |
| 14 | 11/21 (2) | Hwang & Sakurai |  |  |
| 15 | 11/24 (2) | Ky & Beaudoin |  |  |



** 選択済み論文リスト Selected Papers List [#xf8bcff6]
- Scalable Training of Deep Learning Machines by Incremental Block Training with Intra-block Parallel Optimization and Blockwise Model-Update Filtering (ICASSP-2016)
- Scalable Distributed DNN Training Using Commodity GPU Cloud Computing (INTERSPEECH 2015)
- Scalable and Sustainable Deep Learning via Randomized Hashing (KDD 2017)
- S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters (PPoPP17)
- DLAU: A Scalable Deep Learning Accelerator Unit on FPGA (IEEE TCAD, 2017)
- GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server (EuroSys '16)
- Deep learning with COTS HPC systems (ICML, 2013)
- Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks (FPGA'15)
- C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization (DAC'16)
- DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices (IPSN 2016)
- A Parallel Computing Platform for Training Large Scale Neural Networks (IEEE Big Data, 2013)
- Large Scale Distributed Deep Networks (NIPS'12)
- Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing (IPDPSW2017)
- Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning (ICPP2017)
- FireCaffe: near-linear acceleration of deep neural network training on compute clusters (CVPR, 2016)
- Big Data with Cloud Computing: an insight on the computing environment, MapReduce, and programming frameworks (Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery)
- Project Adam: Building an Efficient and Scalable Deep Learning Training System (OSDI'14)
- Efficient Machine Learning for Big Data: A Review (Big Data Research)
- In-Datacenter Performance Analysis of a Tensor Processing Unit (ISCA 2017)
- Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks (FPGA 2016)
- Snowflake: An efficient hardware accelerator for convolutional neural networks (ISCAS 2017)
- Modeling Scalability of Distributed Machine Learning (ICDE2017)
- Distributed training of deep neural networks: theoretical and practical limits of parallel scalability (MLHPC 2016)
- Evaluation of Deep Learning Frameworks Over Different HPC Architectures (ICDCS, 2017)
- Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning (EuroMPI2016)
- Training large scale deep neural networks on the Intel Xeon Phi many-core coprocessor (IPDPSW 2014)
- A Software Technique to Enhance Register Utilization of Convolutional Neural Networks on GPGPUs (ICASI 2017)

//**期末レポート Report
//- &color(red,white){期限 Due date: 02/17 (Extended)};
//- Write a summary of the general topic, covering and including ALL THREE PAPERS, on the state of the art in HPC and Big Data convergence.
//- It should be 10 pages in [[IEEE conference paper format>http://www.ieee.org/conferences_events/conferences/publishing/templates.html]]
//- Please submit it to TA by email &color(red,white){(NOT mailing list)};

**リンク Links [#tdc564e8]
-[[ACM/IEEE Supercomputing>http://www.supercomp.org]]
-[[IEEE IPDPS>http://www.ipdps.org]]
-[[IEEE HPDC>http://www.hpdc.org/]]
-[[ACM International Conference on Supercomputing (ICS)>http://www.ics-conference.org/]]
-[[ISC>http://www.isc-events.com/]]
-[[IEEE Cluster Computing>http://www.clustercomp.org/]]
-[[IEEE/ACM Grid Computing>http://www.gridcomputing.org/]]
-[[IEEE/ACM CCGrid>http://www.buyya.com/ccgrid/]]
-[[IEEE Big Data>http://cci.drexel.edu/bigdata/bigdata2015/]]
-[[CiteSeer.IST>http://citeseer.ist.psu.edu]]
-[[Google Scholar>http://scholar.google.com]]
-[[Windows Live Academic>http://academic.live.com]]
-[[The ACM Digital Library>http://dl.acm.org/]]
