hpc2017

	Date	Presenter	Slides	Paper
1	09/26	(Guidance)
2	09/29	Lecture	RNN, DL
3	10/03	Matsumura	hpc.pdf	BlockMomentumSGD.pdf
4	10/10	Tsuchikawa	HPC_2.pdf	s-caffe.pdf
5	10/13	Zixuan	16M58336_ZhouZixuan.pdf	GeePS.pdf
6	10/17	Barton	C-Brain presentation.pdf	C-Brain paper.pdf
7	10/20	Haoyu	DGX-1_KNL_presentation.pptx	DGX-1_KNL_presentation.pdf
8	10/24	Yashima	HPC_presentation.pdf	FireCaffe.pdf
9	10/27 ~~(2)~~	Yi	hpc presentation.pdf	osdi14-paper-chilimbi.pdf
10	10/31 ~~(2)~~	Deshmukh	HPC presentation 30 october 2017.pdf	Training large scale deep neural networks on the intel Xeon Phi many-core coprocessor.pdf
11	11/03	Sun	17M38236.pdf	Distributed_Training_of_Deep_Neural_Networks_Theoretical_and_Practical_Limits_of_Parallel_Scalability.pdf
		Duan	HPCPresentation.pdf	K-means.pdf
12	11/07	Erum	116119.pdf	116124.pdf
		Jun	hpc2017.pdf	kim17b.pdf
13	11/10	Chenwu	20171110_hpc17.pdf	p1355-yan.pdf
		Maurya	HPC_Presentation_20171110.pdf	nn with few multiplications.pdf BinaryConnect.pdf
14	11/21	Hwang	TPU_slides.pdf	TPU_paper.pdf
		Sakurai	hpc_sakurai.pdf	DCatch.pdf
15	11/24	Ky	Hpc171123_presentation.pdf	Evolving_deep_neural_networks_a_new_prospect.pdf
		Beaudoin	CD-DNN-HMMs_Beaudoin.pdf	Context-Dependent_Pre-Trained_Deep_Neura.pdf


Selected Papers List

Scalable Training of Deep Learning Machines by Incremental Block Training with Intra-block Parallel Optimization and Blockwise Model-Update Filtering (ICASSP-2016)
Scalable Distributed DNN Training Using Commodity GPU Cloud Computing (INTERSPEECH 2015)
Scalable and Sustainable Deep Learning via Randomized Hashing (KDD 2017)
S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters (PPoPP17)
DLAU: A Scalable Deep Learning Accelerator Unit on FPGA (IEEE TCAD, 2017)
GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server (EuroSys '16)
Deep learning with COTS HPC systems (ICML, 2013)
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks (FPGA'15)
C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization (DAC'16)
DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices (IPSN 2016)
A Parallel Computing Platform for Training Large Scale Neural Networks (IEEE Big Data, 2013)
Large Scale Distributed Deep Networks (NIPS'12)
Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing (IPDPSW2017)
Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning (ICPP2017)
FireCaffe: near-linear acceleration of deep neural network training on compute clusters (CVPR, 2016)
Big Data with Cloud Computing: an insight on the computing environment, MapReduce, and programming frameworks (Journal Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery)
Project Adam: Building an Efficient and Scalable Deep Learning Training System (OSDI'14)
Efficient Machine Learning for Big Data: A Review (Journal Big Data Research)
In-Datacenter Performance Analysis of a Tensor Processing Unit (ISCA2017)
Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks (FPGA 2016)
Snowflake: An efficient hardware accelerator for convolutional neural networks (ISCAS 2017)
Modeling Scalability of Distributed Machine Learning (ICDE2017)
Distributed training of deep neural networks: theoretical and practical limits of parallel scalability (MLHPC 2016)
Evaluation of Deep Learning Frameworks Over Different HPC Architectures (ICDCS, 2017)
Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning (EuroMPI2016)
Training large scale deep neural networks on the intel Xeon Phi many-core coprocessor (IPDPSW 2014)
A Software Technique to Enhance Register Utilization of Convolutional Neural Networks on GPGPUs (ICASI 2017)
Megalloc: Fast Distributed Memory Allocator for NVM-based Cluster (NAS, 2017)
Mimir: Memory-Efficient and Scalable MapReduce for Large Supercomputing Systems (IPDPS, 2017)
Data-Intensive Supercomputing in the Cloud: Global Analytics for Satellite Imagery (DataCloud, 2016)
A Parallel FastTrack Data Race Detection on Multi-core Systems (IPDPS, 2017)
Efficient Data Race Detection for Distributed Memory Parallel Programs (SC'11, 2011)
DCatch: Automatically Detecting Distributed Concurrency Bugs in Cloud Systems (ASPLOS, 2017)
Optimized big data K-means clustering using MapReduce (The Journal of Supercomputing, 2014)
An efficient K-means clustering algorithm on MapReduce (International Conference on Database Systems for Advanced Applications, Springer, Cham, 2014)
Accelerating K-Means clustering with parallel implementations and GPU computing (HPEC, 2015)
Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization (NIPS, 2017)
SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization (ICML, 2017)
Traffic Flow Prediction With Big Data: A Deep Learning Approach (IEEE Transactions on Intelligent Transportation Systems, 2014)
Improving the speed of neural networks on CPUs (NIPS, 2011)
Performance Modeling and Scalability Optimization of Distributed Deep Learning Systems (KDD'15, 2015)
Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism (ISCA'17, 2017)


High Performance Computing
