[[MatsuLab. Lecture Note]]

*大規模計算論 High Performance Computing [#g6337a39]

:Date| Tuesday 10:45 - 12:15 (Periods 3-4) &br; Friday 10:45 - 12:15 (Periods 3-4)
:Room| Main Building 119A (H119A)
:Contact|
|Prof. S. Matsuoka | matsu [at] is.titech.ac.jp |
|Y. Nagasaka (TA) | nagasaka.y.aa [at] m.titech.ac.jp |

&color(red,white){Please email Nagasaka (TA) as soon as possible so that you can be added to the mailing list.};

**Table of Contents [#j532edb8]
#contents

**Cancelled Lectures [#cb04d351]
11/14 (Tue), 11/17 (Fri)

**Guidance and References [#ha180204]
-Guidance material: &ref("hpc2017_guidance.pdf");

**Presentation Schedule [#n9a6672a]
&color(red,white){The assignments below are tentative; if a date does not suit you, please email the TA with your preferred date.};

|CENTER:|CENTER:|CENTER:|CENTER:|LEFT:|c
| No. | Date | Presenter | Slides | Paper |
| 1 | 09/26 | (Guidance) | | |
| 2 | 09/29 | Lecture | &ref(GTC2016_RNN_Performance.pptx,,,RNN);, &ref(pascal-DL.pptx,,,DL); | |
| 3 | 10/03 | Matsumura | &ref("hpc.pdf"); | &ref("BlockMomentumSGD.pdf"); |
| 4 | 10/10 | Tsuchikawa | &ref("HPC_2.pdf"); | &ref("s-caffe.pdf"); |
| 5 | 10/13 | Zixuan | &ref("16M58336_ZhouZixuan.pdf"); | &ref("GeePS.pdf"); |
| 6 | 10/17 | Barton | &ref("C-Brain presentation.pdf"); | &ref("C-Brain paper.pdf"); |
| 7 | 10/20 | Haoyu | &ref("DGX-1_KNL_presentation.pptx"); | &ref("DGX-1_KNL_presentation.pdf"); |
| 8 | 10/24 | Yashima | &ref("HPC_presentation.pdf"); | &ref("FireCaffe.pdf"); |
| 9 | 10/27 %%(2)%% | Yi | &ref("hpc presentation.pdf"); | &ref("osdi14-paper-chilimbi.pdf"); |
| 10 | 10/31 %%(2)%% | Deshmukh | &ref("HPC presentation 30 october 2017.pdf"); | &ref(Training large scale deep neural networks on the intel Xeon Phi many-core coprocessor.pdf); |
| 11 | 11/03 | Sun | &ref("17M38236.pdf"); | &ref("Distributed_Training_of_Deep_Neural_Networks_Theoretical_and_Practical_Limits_of_Parallel_Scalability.pdf"); |
| | | Duan | &ref(HPCPresentation.pdf); | &ref(K-means.pdf); |
| 12 | 11/07 | Erum | &ref("116119.pdf"); | &ref("116124.pdf"); |
| | | Jun | &ref("hpc2017.pdf"); | &ref("kim17b.pdf"); |
| 13 | 11/10 | Chenwu | &ref("20171110_hpc17.pdf"); | &ref("p1355-yan.pdf"); |
| | | Maurya | &ref("HPC_Presentation_20171110.pdf"); | &ref("nn with few multiplications.pdf"); &ref("BinaryConnect.pdf"); |
| 14 | 11/21 | Hwang | &ref("TPU_slides.pdf"); | &ref("TPU_paper.pdf"); |
| | | Sakurai | &ref("hpc_sakurai.pdf"); | &ref("DCatch.pdf"); |
| 15 | 11/24 | Ky | &ref("Hpc171123_presentation.pdf"); | &ref("Evolving_deep_neural_networks_a_new_prospect.pdf"); |
| | | Beaudoin | &ref("CD-DNN-HMMs_Beaudoin.pdf"); | &ref("Context-Dependent_Pre-Trained_Deep_Neura.pdf"); |

**Selected Papers List [#xf8bcff6]
- Scalable Training of Deep Learning Machines by Incremental Block Training with Intra-block Parallel Optimization and Blockwise Model-Update Filtering (ICASSP 2016)
- Scalable Distributed DNN Training Using Commodity GPU Cloud Computing (INTERSPEECH 2015)
- Scalable and Sustainable Deep Learning via Randomized Hashing (KDD 2017)
- S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters (PPoPP 2017)
- DLAU: A Scalable Deep Learning Accelerator Unit on FPGA (IEEE TCAD, 2017)
- GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server (EuroSys 2016)
- Deep learning with COTS HPC systems (ICML 2013)
- Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks (FPGA 2015)
- C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization (DAC 2016)
- DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices (IPSN 2016)
- A Parallel Computing Platform for Training Large Scale Neural Networks (IEEE Big Data 2013)
- Large Scale Distributed Deep Networks (NIPS 2012)
- Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing (IPDPSW 2017)
- Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning (ICPP 2017)
- FireCaffe: near-linear acceleration of deep neural network training on compute clusters (CVPR 2016)
- Big Data with Cloud Computing: an insight on the computing environment, MapReduce, and programming frameworks (Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery)
- Project Adam: Building an Efficient and Scalable Deep Learning Training System (OSDI 2014)
- Efficient Machine Learning for Big Data: A Review (Big Data Research)
- In-Datacenter Performance Analysis of a Tensor Processing Unit (ISCA 2017)
- Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks (FPGA 2016)
- Snowflake: An efficient hardware accelerator for convolutional neural networks (ISCAS 2017)
- Modeling Scalability of Distributed Machine Learning (ICDE 2017)
- Distributed training of deep neural networks: theoretical and practical limits of parallel scalability (MLHPC 2016)
- Evaluation of Deep Learning Frameworks Over Different HPC Architectures (ICDCS 2017)
- Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning (EuroMPI 2016)
- Training large scale deep neural networks on the Intel Xeon Phi many-core coprocessor (IPDPSW 2014)
- A Software Technique to Enhance Register Utilization of Convolutional Neural Networks on GPGPUs (ICASI 2017)
- Megalloc: Fast Distributed Memory Allocator for NVM-based Cluster (NAS 2017)
- Mimir: Memory-Efficient and Scalable MapReduce for Large Supercomputing Systems (IPDPS 2017)
- Data-Intensive Supercomputing in the Cloud: Global Analytics for Satellite Imagery (DataCloud 2016)
- A Parallel FastTrack Data Race Detection on Multi-core Systems (IPDPS 2017)
- Efficient Data Race Detection for Distributed Memory Parallel Programs (SC 2011)
- DCatch: Automatically Detecting Distributed Concurrency Bugs in Cloud Systems (ASPLOS 2017)
- Optimized big data K-means clustering using MapReduce (The Journal of Supercomputing, 2014)
- An efficient K-means clustering algorithm on MapReduce (DASFAA, Springer, 2014)
- Accelerating K-Means clustering with parallel implementations and GPU computing (HPEC 2015)
- Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization (NIPS 2017)
- SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization (ICML 2017)
- Traffic Flow Prediction With Big Data: A Deep Learning Approach (IEEE Transactions on Intelligent Transportation Systems, 2014)
- Improving the speed of neural networks on CPUs (NIPS 2011)
- Performance Modeling and Scalability Optimization of Distributed Deep Learning Systems (KDD 2015)
- Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism (ISCA 2017)

//**Final Report
//- &color(red,white){Due date: 02/17 (extended)};
//- Summarize the general topic covering and including ALL THREE PAPERS regarding the state of the art in HPC and Big Data convergence.
//- It should be 10 pages in [[IEEE conference paper format>http://www.ieee.org/conferences_events/conferences/publishing/templates.html]].
//- Please submit it to the TA by email &color(red,white){(NOT to the mailing list)};.

**Links [#tdc564e8]
-[[ACM/IEEE Supercomputing>http://www.supercomp.org]]
-[[IEEE IPDPS>http://www.ipdps.org]]
-[[IEEE HPDC>http://www.hpdc.org/]]
-[[ACM International Conference on Supercomputing (ICS)>http://www.ics-conference.org/]]
-[[ISC>http://www.isc-events.com/]]
-[[IEEE Cluster Computing>http://www.clustercomp.org/]]
-[[IEEE/ACM Grid Computing>http://www.gridcomputing.org/]]
-[[IEEE/ACM CCGrid>http://www.buyya.com/ccgrid/]]
-[[IEEE Big Data>http://cci.drexel.edu/bigdata/bigdata2015/]]
-[[CiteSeer.IST>http://citeseer.ist.psu.edu]]
-[[Google Scholar>http://scholar.google.com]]
-[[Windows Live Academic>http://academic.live.com]]
-[[The ACM Digital Library>http://dl.acm.org/]]