[[MatsuLab. Lecture Note]]
 
*大規模計算論 High Performance Computing [#g6337a39]
:Date|Tuesday 10:45 - 12:15 (Periods 3-4) &br; Friday 10:45 - 12:15 (Periods 3-4)
:Room|Main building, room 119A (H119A)
:Contact|
|松岡教授 (Prof. S.Matsuoka) | matsu [at] is.titech.ac.jp |
|TA 長坂 (Y.Nagasaka)         | nagasaka.y.aa [at] m.titech.ac.jp |
&color(red,white){メーリングリストに追加しますので、至急TAまでメールを送ってください。Please email Nagasaka (TA) as soon as possible so that we can add you to the mailing list.};

**目次 Contents [#j532edb8]
#contents

**休講予定日 Cancelled Lectures [#cb04d351]
11/14 (Tue), 11/17 (Fri)

**授業概要と参考資料 Guidance and References [#ha180204]
-ガイダンス資料/Guidance &ref("hpc2017_guidance.pdf");

**発表スケジュール Schedule [#n9a6672a]
&color(red,white){暫定的な割り当ては以下の通りですが、都合が悪い場合はTAまで希望日をメールしてください。The assignments below are tentative; if a date does not work for you, please email the TA with your preferred date.};
|CENTER:|CENTER:|CENTER:|CENTER:|LEFT:|c
|No.|Date|Presenter|Slides|Paper|
| 1 | 09/26 | (Guidance) |  |  |
| 2 | 09/29 | Lecture | &ref(GTC2016_RNN_Performance.pptx,,,RNN);, &ref(pascal-DL.pptx,,,DL); |  |
| 3 | 10/03 | Matsumura | &ref("hpc.pdf"); | &ref("BlockMomentumSGD.pdf"); |
| 4 | 10/10 | Tsuchikawa | &ref("HPC_2.pdf"); | &ref("s-caffe.pdf"); |
| 5 | 10/13 | Zixuan | &ref("16M58336_ZhouZixuan.pdf"); | &ref("GeePS.pdf"); |
| 6 | 10/17 | Barton | &ref("C-Brain presentation.pdf"); | &ref("C-Brain paper.pdf"); |
| 7 | 10/20 | Haoyu | &ref("DGX-1_KNL_presentation.pptx"); | &ref("DGX-1_KNL_presentation.pdf"); |
| 8 | 10/24 | Yashima | &ref("HPC_presentation.pdf"); | &ref("FireCaffe.pdf"); |
| 9 | 10/27 %%(2)%% | Yi | &ref("hpc presentation.pdf"); | &ref("osdi14-paper-chilimbi.pdf"); |
| 10 | 10/31 %%(2)%% | Deshmukh | &ref("HPC presentation 30 october 2017.pdf"); | &ref("Training large scale deep neural networks on the intel Xeon Phi many-core coprocessor.pdf"); |
| 11 | 11/03 | Sun | &ref("17M38236.pdf"); | &ref("Distributed_Training_of_Deep_Neural_Networks_Theoretical_and_Practical_Limits_of_Parallel_Scalability.pdf"); |
|  |  | Duan | &ref(HPCPresentation.pdf); | &ref(K-means.pdf); |
| 12 | 11/07 | Erum | &ref("116119.pdf"); | &ref("116124.pdf"); |
|  |  | Jun | &ref("hpc2017.pdf"); | &ref("kim17b.pdf"); |
| 13 | 11/10 | Chenwu | &ref("20171110_hpc17.pdf"); | &ref("p1355-yan.pdf"); |
|  |  | Maurya | &ref("HPC_Presentation_20171110.pdf"); | &ref("nn with few multiplications.pdf"); &ref("BinaryConnect.pdf"); |
| 14 | 11/21 | Hwang | &ref("TPU_slides.pdf"); | &ref("TPU_paper.pdf"); |
|  |  | Sakurai | &ref("hpc_sakurai.pdf"); | &ref("DCatch.pdf"); |
| 15 | 11/24 (2) | Ky | &ref("Hpc171123_presentation.pdf"); | &ref("Evolving_deep_neural_networks_a_new_prospect.pdf"); |
|  |  | Beaudoin | &ref("CD-DNN-HMMs_Beaudoin.pdf"); | &ref("Context-Dependent_Pre-Trained_Deep_Neura.pdf"); |



** 選択済み論文リスト Selected Papers List [#xf8bcff6]
- Scalable Training of Deep Learning Machines by Incremental Block Training with Intra-block Parallel Optimization and Blockwise Model-Update Filtering (ICASSP 2016)
- Scalable Distributed DNN Training Using Commodity GPU Cloud Computing (INTERSPEECH 2015)
- Scalable and Sustainable Deep Learning via Randomized Hashing (KDD 2017)
- S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters (PPoPP 2017)
- DLAU: A Scalable Deep Learning Accelerator Unit on FPGA (IEEE TCAD 2017)
- GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server (EuroSys 2016)
- Deep learning with COTS HPC systems (ICML 2013)
- Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks (FPGA 2015)
- C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization (DAC 2016)
- DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices (IPSN 2016)
- A Parallel Computing Platform for Training Large Scale Neural Networks (IEEE Big Data 2013)
- Large Scale Distributed Deep Networks (NIPS 2012)
- Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing (IPDPSW 2017)
- Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning (ICPP 2017)
- FireCaffe: near-linear acceleration of deep neural network training on compute clusters (CVPR 2016)
- Big Data with Cloud Computing: an insight on the computing environment, MapReduce, and programming frameworks (Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery)
- Project Adam: Building an Efficient and Scalable Deep Learning Training System (OSDI 2014)
- Efficient Machine Learning for Big Data: A Review (Big Data Research)
- In-Datacenter Performance Analysis of a Tensor Processing Unit (ISCA 2017)
- Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks (FPGA 2016)
- Snowflake: An efficient hardware accelerator for convolutional neural networks (ISCAS 2017)
- Modeling Scalability of Distributed Machine Learning (ICDE 2017)
- Distributed training of deep neural networks: theoretical and practical limits of parallel scalability (MLHPC 2016)
- Evaluation of Deep Learning Frameworks Over Different HPC Architectures (ICDCS 2017)
- Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning (EuroMPI 2016)
- Training large scale deep neural networks on the Intel Xeon Phi many-core coprocessor (IPDPSW 2014)
- A Software Technique to Enhance Register Utilization of Convolutional Neural Networks on GPGPUs (ICASI 2017)
- Megalloc: Fast Distributed Memory Allocator for NVM-based Cluster (NAS 2017)
- Mimir: Memory-Efficient and Scalable MapReduce for Large Supercomputing Systems (IPDPS 2017)
- Data-Intensive Supercomputing in the Cloud: Global Analytics for Satellite Imagery (DataCloud 2016)
- A Parallel FastTrack Data Race Detection on Multi-core Systems (IPDPS 2017)
- Efficient Data Race Detection for Distributed Memory Parallel Programs (SC 2011)
- DCatch: Automatically Detecting Distributed Concurrency Bugs in Cloud Systems (ASPLOS 2017)
- Optimized big data K-means clustering using MapReduce (The Journal of Supercomputing 2014)
- An efficient K-means clustering algorithm on MapReduce (DASFAA 2014)
- Accelerating K-Means clustering with parallel implementations and GPU computing (HPEC 2015)
- Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization (NIPS 2017)
- SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization (ICML 2017)
- Traffic Flow Prediction With Big Data: A Deep Learning Approach (IEEE Transactions on Intelligent Transportation Systems 2014)
- Improving the speed of neural networks on CPUs (NIPS 2011)
- Performance Modeling and Scalability Optimization of Distributed Deep Learning Systems (KDD 2015)
- Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism (ISCA 2017)
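
Many of the papers above (S-Caffe, FireCaffe, GeePS, and the MLHPC 2016 scalability study, among others) revolve around synchronous data-parallel SGD: every worker computes a gradient on its own shard of the training data, and a collective allreduce averages the gradients before each update. The sketch below is only an illustration of that pattern, not taken from any of the papers; the toy least-squares problem, the use of mpi4py/NumPy, and the file name sgd_allreduce.py are assumptions made for this example.

 # Minimal sketch of synchronous data-parallel SGD (illustrative only).
 # Each MPI rank holds its own data shard; an allreduce sums the local
 # gradients so every rank applies the same averaged update.
 from mpi4py import MPI
 import numpy as np
 
 comm = MPI.COMM_WORLD
 rank = comm.Get_rank()
 size = comm.Get_size()
 
 # Per-rank synthetic data shard for a least-squares problem y = X w.
 rng = np.random.default_rng(seed=rank)
 X = rng.standard_normal((256, 8))
 w_true = np.arange(8, dtype=np.float64)
 y = X @ w_true + 0.01 * rng.standard_normal(256)
 
 w = np.zeros(8)          # model replica, identical on every rank
 lr = 0.1                 # learning rate
 for step in range(100):
     grad = 2.0 / len(X) * (X.T @ (X @ w - y))   # local gradient
     g_sum = np.empty_like(grad)
     comm.Allreduce(grad, g_sum, op=MPI.SUM)     # sum across ranks
     w -= lr * (g_sum / size)                    # averaged update
 
 if rank == 0:
     print("learned:", np.round(w, 2))

Run it with e.g. mpiexec -n 4 python sgd_allreduce.py. The Allreduce call is exactly the communication step whose cost the broadcast and scaling papers above model and optimize.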

//**期末レポート Report
//- &color(red,white){期限 Due date: 02/17 (Extended)};
//- Summarize the general topic covering and including ALL THREE PAPERS regarding the state of the art in HPC and Big Data convergence.
//- It should be 10 pages in [[IEEE conference paper format>http://www.ieee.org/conferences_events/conferences/publishing/templates.html]]
//- Please submit it to TA by email &color(red,white){(NOT mailing list)};

**リンク Links [#tdc564e8]
-[[ACM/IEEE Supercomputing>http://www.supercomp.org]]
-[[IEEE IPDPS>http://www.ipdps.org]]
-[[IEEE HPDC>http://www.hpdc.org/]]
-[[ACM International Conference on Supercomputing (ICS)>http://www.ics-conference.org/]]
-[[ISC>http://www.isc-events.com/]]
-[[IEEE Cluster Computing>http://www.clustercomp.org/]]
-[[IEEE/ACM Grid Computing>http://www.gridcomputing.org/]]
-[[IEEE/ACM CCGrid>http://www.buyya.com/ccgrid/]]
-[[IEEE Big Data>http://cci.drexel.edu/bigdata/bigdata2015/]]
-[[CiteSeer.IST>http://citeseer.ist.psu.edu]]
-[[Google Scholar>http://scholar.google.com]]
-[[Windows Live Academic>http://academic.live.com]]
-[[The ACM Digital Library>http://dl.acm.org/]]
