MatsuLab. Lecture Note

High Performance Computing (大規模計算論)

Tuesday 10:45 - 12:15 (Period: 3-4)
Friday 10:45 - 12:15 (Period: 3-4)
Main building 119A (H119A)
Lecturer: Prof. S. Matsuoka, matsu [at]
TA: Y. Nagasaka, nagasaka.y.aa [at]
Please email the TA (Y. Nagasaka) as soon as possible so that you can be added to the mailing list.


Cancelled Lectures

11/14 (Tue), 11/17 (Fri)

Guidance and References

Presentation Schedule


Selected Papers

  • Scalable Training of Deep Learning Machines by Incremental Block Training with Intra-block Parallel Optimization and Blockwise Model-Update Filtering (ICASSP 2016)
  • Scalable Distributed DNN Training Using Commodity GPU Cloud Computing (INTERSPEECH 2015)
  • Scalable and Sustainable Deep Learning via Randomized Hashing (KDD 2017)
  • S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters (PPoPP 2017)
  • DLAU: A Scalable Deep Learning Accelerator Unit on FPGA (IEEE TCAD 2017)
  • GeePS: Scalable Deep Learning on Distributed GPUs with a GPU-Specialized Parameter Server (EuroSys 2016)
  • Deep Learning with COTS HPC Systems (ICML 2013)
  • Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks (FPGA 2015)
  • C-Brain: A Deep Learning Accelerator that Tames the Diversity of CNNs through Adaptive Data-Level Parallelization (DAC 2016)
  • DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices (IPSN 2016)
  • A Parallel Computing Platform for Training Large Scale Neural Networks (IEEE Big Data 2013)
  • Large Scale Distributed Deep Networks (NIPS 2012)
  • Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing (IPDPSW 2017)
  • Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning (ICPP 2017)
  • FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters (CVPR 2016)
  • Big Data with Cloud Computing: An Insight on the Computing Environment, MapReduce, and Programming Frameworks (Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery)
  • Project Adam: Building an Efficient and Scalable Deep Learning Training System (OSDI 2014)
  • Efficient Machine Learning for Big Data: A Review (Big Data Research)
  • In-Datacenter Performance Analysis of a Tensor Processing Unit (ISCA 2017)
  • Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks (FPGA 2016)
  • Snowflake: An Efficient Hardware Accelerator for Convolutional Neural Networks (ISCAS 2017)
  • Modeling Scalability of Distributed Machine Learning (ICDE 2017)
  • Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability (MLHPC 2016)
  • Evaluation of Deep Learning Frameworks Over Different HPC Architectures (ICDCS 2017)
  • Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning (EuroMPI 2016)
  • Training Large Scale Deep Neural Networks on the Intel Xeon Phi Many-Core Coprocessor (IPDPSW 2014)
  • A Software Technique to Enhance Register Utilization of Convolutional Neural Networks on GPGPUs (ICASI 2017)
  • Megalloc: Fast Distributed Memory Allocator for NVM-based Cluster (NAS 2017)
  • Mimir: Memory-Efficient and Scalable MapReduce for Large Supercomputing Systems (IPDPS 2017)
  • Data-Intensive Supercomputing in the Cloud: Global Analytics for Satellite Imagery (DataCloud 2016)
  • A Parallel FastTrack Data Race Detection on Multi-core Systems (IPDPS 2017)
  • Efficient Data Race Detection for Distributed Memory Parallel Programs (SC 2011)
  • DCatch: Automatically Detecting Distributed Concurrency Bugs in Cloud Systems (ASPLOS 2017)
  • Optimized Big Data K-means Clustering Using MapReduce (The Journal of Supercomputing, 2014)
  • An Efficient K-means Clustering Algorithm on MapReduce (International Conference on Database Systems for Advanced Applications, DASFAA 2014, Springer)
  • Accelerating K-Means Clustering with Parallel Implementations and GPU Computing (HPEC 2015)
  • Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization (NIPS 2017)
  • SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization (ICML 2017)
  • Traffic Flow Prediction With Big Data: A Deep Learning Approach (IEEE Transactions on Intelligent Transportation Systems, 2014)
  • Improving the Speed of Neural Networks on CPUs (NIPS 2011)
  • Performance Modeling and Scalability Optimization of Distributed Deep Learning Systems (KDD 2015)
  • Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism (ISCA 2017)

Links


Last modified: 2018-05-29 (Tue) 19:12:57