MatsuLab. Lecture Note

High Performance Computing

Tuesday 10:45-12:15 (Periods 3-4)
Friday 10:45-12:15 (Periods 3-4)
Main Building, Room 119A (H119A)
Instructor: Prof. S. Matsuoka, matsu [at]
TA: Y. Nagasaka, nagasaka.y.aa [at]
Please email the TA (Nagasaka) as soon as possible so that you can be added to the mailing list.


Cancelled Lectures

11/14 (Tue), 11/17 (Fri)

Guidance and References

Presentation Schedule


Selected Papers

  • Scalable Training of Deep Learning Machines by Incremental Block Training with Intra-block Parallel Optimization and Blockwise Model-Update Filtering (ICASSP 2016)
  • Scalable Distributed DNN Training Using Commodity GPU Cloud Computing (INTERSPEECH 2015)
  • Scalable and Sustainable Deep Learning via Randomized Hashing (KDD 2017)
  • S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters (PPoPP 2017)
  • DLAU: A Scalable Deep Learning Accelerator Unit on FPGA (IEEE TCAD 2017)
  • GeePS: Scalable Deep Learning on Distributed GPUs with a GPU-Specialized Parameter Server (EuroSys 2016)
  • Deep Learning with COTS HPC Systems (ICML 2013)
  • Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks (FPGA 2015)
  • C-Brain: A Deep Learning Accelerator that Tames the Diversity of CNNs through Adaptive Data-Level Parallelization (DAC 2016)
  • DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices (IPSN 2016)
  • A Parallel Computing Platform for Training Large Scale Neural Networks (IEEE Big Data 2013)
  • Large Scale Distributed Deep Networks (NIPS 2012)
  • Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing (IPDPSW 2017)
  • Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning (ICPP 2017)
  • FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters (CVPR 2016)
  • Big Data with Cloud Computing: An Insight on the Computing Environment, MapReduce, and Programming Frameworks (Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery)
  • Project Adam: Building an Efficient and Scalable Deep Learning Training System (OSDI 2014)
  • Efficient Machine Learning for Big Data: A Review (Big Data Research)
  • In-Datacenter Performance Analysis of a Tensor Processing Unit™ (ISCA 2017)
  • Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks (FPGA 2016)
  • Snowflake: An Efficient Hardware Accelerator for Convolutional Neural Networks (ISCAS 2017)
  • Modeling Scalability of Distributed Machine Learning (ICDE 2017)
  • Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability (MLHPC 2016)
  • Evaluation of Deep Learning Frameworks Over Different HPC Architectures (ICDCS 2017)
  • Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning (EuroMPI 2016)
  • Training Large Scale Deep Neural Networks on the Intel Xeon Phi Many-Core Coprocessor (IPDPSW 2014)
  • A Software Technique to Enhance Register Utilization of Convolutional Neural Networks on GPGPUs (ICASI 2017)
  • Megalloc: Fast Distributed Memory Allocator for NVM-based Cluster (NAS 2017)
  • Mimir: Memory-Efficient and Scalable MapReduce for Large Supercomputing Systems (IPDPS 2017)
  • Data-Intensive Supercomputing in the Cloud: Global Analytics for Satellite Imagery (DataCloud 2016)
  • A Parallel FastTrack Data Race Detection on Multi-core Systems (IPDPS 2017)
  • Efficient Data Race Detection for Distributed Memory Parallel Programs (SC 2011)
  • DCatch: Automatically Detecting Distributed Concurrency Bugs in Cloud Systems (ASPLOS 2017)
  • Optimized Big Data K-means Clustering Using MapReduce (The Journal of Supercomputing, 2014)
  • An Efficient K-means Clustering Algorithm on MapReduce (International Conference on Database Systems for Advanced Applications, Springer, 2014)
  • Accelerating K-Means Clustering with Parallel Implementations and GPU Computing (HPEC 2015)
  • Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization (NIPS 2017)
  • SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization (ICML 2017)
  • Traffic Flow Prediction With Big Data: A Deep Learning Approach (IEEE Transactions on Intelligent Transportation Systems, 2014)
  • Improving the Speed of Neural Networks on CPUs (NIPS 2011)
  • Performance Modeling and Scalability Optimization of Distributed Deep Learning Systems (KDD 2015)
  • Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism (ISCA 2017)

Links

Attachments:
  • Hpc171123_presentation.pdf
  • TPU_slides.pdf
  • nn with few multiplications.pdf
  • BinaryConnect.pdf
  • DCatch.pdf
  • TPU_paper.pdf
  • hpc_sakurai.pdf
  • CD-DNN-HMMs_Beaudoin.pdf
  • Context-Dependent_Pre-Trained_Deep_Neura.pdf
  • Evolving_deep_neural_networks_a_new_prospect.pdf
  • HPC_Presentation_20171110.pdf
  • 20171110_hpc17.pdf
  • p1355-yan.pdf
  • HPCPresentation.pdf
  • kim17b.pdf
  • hpc2017.pdf
  • 116124.pdf
  • 116119.pdf
  • Distributed_Training_of_Deep_Neural_Networks_Theoretical_and_Practical_Limits_of_Parallel_Scalability.pdf
  • K-means.pdf
  • 17M38236.pdf
  • Training large scale deep neural networks on the intel Xeon Phi many-core coprocessor.pdf
  • HPC presentation 30 october 2017.pdf
  • osdi14-paper-chilimbi.pdf
  • C-Brain presentation.pdf
  • GeePS.pdf
  • C-Brain paper.pdf
  • DGX-1_KNL_presentation.pdf
  • DGX-1_KNL_presentation.pptx
  • FireCaffe.pdf
  • HPC_presentation.pdf
  • hpc presentation.pdf
  • 16M58336_ZhouZixuan.pdf
  • s-caffe.pdf
  • BlockMomentumSGD.pdf
  • HPC_2.pdf
  • hpc2017_guidance.pdf
  • GTC2016_RNN_Performance.pptx
  • pascal-DL.pptx
  • hpc.pdf

Last modified: 2018-05-29 (Tue) 19:12:57