MatsuLab. Lecture Note
大規模計算論 High Performance Computing †
- Date
- Tuesday 10:45 - 12:15 (Period: 3-4)
- Friday 10:45 - 12:15 (Period: 3-4)
- Room
- Main building 119A (H119A)
- Contact
| 松岡教授 (Prof. S. Matsuoka) | matsu [at] is.titech.ac.jp |
| TA 長坂 (Y. Nagasaka) | nagasaka.y.aa [at] m.titech.ac.jp |
Please email the TA (Nagasaka) as soon as possible so that you can be added to the mailing list.
目次 Table of Contents †
休講予定日 Cancelled Lectures †
11/14 (Tue), 11/17 (Fri)
授業概要と参考資料 Guidance and References †
発表スケジュール Presentation Schedule †
The assignments below are tentative; if a date does not work for you, please email the TA with your preferred date.
選択済み論文リスト Selected Papers List †
- Scalable Training of Deep Learning Machines by Incremental Block Training with Intra-block Parallel Optimization and Blockwise Model-Update Filtering (ICASSP 2016)
- Scalable Distributed DNN Training Using Commodity GPU Cloud Computing (INTERSPEECH 2015)
- Scalable and Sustainable Deep Learning via Randomized Hashing (KDD 2017)
- S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters (PPoPP 2017)
- DLAU: A Scalable Deep Learning Accelerator Unit on FPGA (IEEE TCAD 2017)
- GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server (EuroSys 2016)
- Deep learning with COTS HPC systems (ICML 2013)
- Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks (FPGA 2015)
- C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization (DAC 2016)
- DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices (IPSN 2016)
- A Parallel Computing Platform for Training Large Scale Neural Networks (IEEE Big Data 2013)
- Large Scale Distributed Deep Networks (NIPS 2012)
- Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing (IPDPSW 2017)
- Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning (ICPP 2017)
- FireCaffe: near-linear acceleration of deep neural network training on compute clusters (CVPR 2016)
- Big Data with Cloud Computing: an insight on the computing environment, MapReduce, and programming frameworks (Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery)
- Project Adam: Building an Efficient and Scalable Deep Learning Training System (OSDI 2014)
- Efficient Machine Learning for Big Data: A Review (Big Data Research)
- In-Datacenter Performance Analysis of a Tensor Processing Unit (ISCA 2017)
- Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks (FPGA 2016)
- Snowflake: An efficient hardware accelerator for convolutional neural networks (ISCAS 2017)
- Modeling Scalability of Distributed Machine Learning (ICDE 2017)
- Distributed training of deep neural networks: theoretical and practical limits of parallel scalability (MLHPC 2016)
- Evaluation of Deep Learning Frameworks Over Different HPC Architectures (ICDCS 2017)
- Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning (EuroMPI 2016)
- Training large scale deep neural networks on the Intel Xeon Phi many-core coprocessor (IPDPSW 2014)
- A Software Technique to Enhance Register Utilization of Convolutional Neural Networks on GPGPUs (ICASI 2017)
- Megalloc: Fast Distributed Memory Allocator for NVM-based Cluster (NAS 2017)
- Mimir: Memory-Efficient and Scalable MapReduce for Large Supercomputing Systems (IPDPS 2017)
- Data-Intensive Supercomputing in the Cloud: Global Analytics for Satellite Imagery (DataCloud 2016)
- A Parallel FastTrack Data Race Detection on Multi-core Systems (IPDPS 2017)
- Efficient Data Race Detection for Distributed Memory Parallel Programs (SC 2011)
- D-Catch: Automatically Detecting Distributed Concurrency Bugs in Cloud Systems (ASPLOS 2017)
リンク Links †