MatsuLab. Lecture Note
大規模計算論 High Performance Computing †
- Date
- Tuesday 10:45 - 12:15 (Period: 3-4)
- Friday 10:45 - 12:15 (Period: 3-4)
- Room
- Main building 119A (H119A)
- Contact
| 松岡教授 (Prof. S. Matsuoka) | matsu [at] is.titech.ac.jp |
| TA 長坂 (Y. Nagasaka) | nagasaka.y.aa [at] m.titech.ac.jp |
Please email the TA (Nagasaka) as soon as possible so that you can be added to the mailing list.
目次 Table of Contents †
休講予定日 Cancelled Lectures †
11/14 (Tue), 11/17 (Fri)
授業概要と参考資料 Guidance and References †
発表スケジュール Presentation Schedule †
The assignments below are tentative; if a date does not work for you, please email the TA with your preferred date.
選択済み論文リスト Selected Papers List †
- Scalable Training of Deep Learning Machines by Incremental Block Training with Intra-block Parallel Optimization and Blockwise Model-Update Filtering (ICASSP 2016)
- Scalable Distributed DNN Training Using Commodity GPU Cloud Computing (INTERSPEECH 2015)
- Scalable and Sustainable Deep Learning via Randomized Hashing (KDD 2017)
- S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters (PPoPP 2017)
- DLAU: A Scalable Deep Learning Accelerator Unit on FPGA (IEEE TCAD 2017)
- GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server (EuroSys 2016)
- Deep learning with COTS HPC systems (ICML 2013)
- Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks (FPGA 2015)
- C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization (DAC 2016)
- DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices (IPSN 2016)
- A Parallel Computing Platform for Training Large Scale Neural Networks (IEEE Big Data 2013)
- Large Scale Distributed Deep Networks (NIPS 2012)
- Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing (IPDPSW 2017)
- Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning (ICPP 2017)
- FireCaffe: near-linear acceleration of deep neural network training on compute clusters (CVPR 2016)
- Big Data with Cloud Computing: an insight on the computing environment, MapReduce, and programming frameworks (Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery)
- Project Adam: Building an Efficient and Scalable Deep Learning Training System (OSDI 2014)
- Efficient Machine Learning for Big Data: A Review (Big Data Research)
- In-Datacenter Performance Analysis of a Tensor Processing Unit (ISCA 2017)
- Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks (FPGA 2016)
- Snowflake: An efficient hardware accelerator for convolutional neural networks (ISCAS 2017)
- Modeling Scalability of Distributed Machine Learning (ICDE 2017)
- Distributed training of deep neural networks: theoretical and practical limits of parallel scalability (MLHPC 2016)
- Evaluation of Deep Learning Frameworks Over Different HPC Architectures (ICDCS 2017)
- Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning (EuroMPI 2016)
- Training large scale deep neural networks on the Intel Xeon Phi many-core coprocessor (IPDPSW 2014)
- A Software Technique to Enhance Register Utilization of Convolutional Neural Networks on GPGPUs (ICASI 2017)
- Megalloc: Fast Distributed Memory Allocator for NVM-based Cluster (NAS 2017)
- Mimir: Memory-Efficient and Scalable MapReduce for Large Supercomputing Systems (IPDPS 2017)
- Data-Intensive Supercomputing in the Cloud: Global Analytics for Satellite Imagery (DataCloud 2016)
- A Parallel FastTrack Data Race Detection on Multi-core Systems (IPDPS 2017)
- Efficient Data Race Detection for Distributed Memory Parallel Programs (SC 2011)
- D-Catch: Automatically Detecting Distributed Concurrency Bugs in Cloud Systems (ASPLOS 2017)
リンク Links †