Xevolver CREST: An evolutionary approach to construction of a software development environment for massively-parallel heterogeneous systems

Publications

FY 2016

Journals

Cong Li, “Communication-Avoiding Conjugate Gradient Method for Next Generation Supercomputing Systems,” ISC High Performance (ISC 2016) PhD Forum, June 20, 2016.

Daisuke Takahashi, “Implementation of Multiple-Precision Floating-Point Arithmetic on Intel Xeon Phi Coprocessors,” Proc. 16th International Conference on Computational Science and Its Applications (ICCSA 2016), Part II, Lecture Notes in Computer Science, Vol. 9787, pp. 60–70, Springer International Publishing (2016).

Hiroshi Maeda and Daisuke Takahashi, “Parallel Sparse Matrix-Vector Multiplication Using Accelerators,” Proc. 16th International Conference on Computational Science and Its Applications (ICCSA 2016), Part II, Lecture Notes in Computer Science, Vol. 9787, pp. 3–18, Springer International Publishing (2016). (NVIDIA Best Paper Award)

Takuya Ikuzawa, Fumihiko Ino, and Kenichi Hagihara, “Reducing Memory Usage by the Lifting-based Discrete Wavelet Transform with a Unified Buffer on a GPU,” Journal of Parallel and Distributed Computing, Vol. 93/94, pp. 44–55, (2016-07).

Kazuhiko Komatsu, Ryusuke Egawa, Shoichi Hirasawa, Hiroyuki Takizawa, Ken’ichi Itakura, Hiroaki Kobayashi, “Translation of Large-Scale Simulation Codes for an OpenACC Platform Using the Xevolver Framework,” International Journal on Networking and Computing (special issue on CANDAR’16), Vol. 6, No. 2, pp. 167-180, Aug. 2016.

Raghunandan Mathur, Hiroshi Matsuoka, Osamu Watanabe, Akihiro Musa, Ryusuke Egawa and Hiroaki Kobayashi, “A Memory-Efficient Implementation of a Plasmonics Simulation Application on SX-ACE,” International Journal on Networking and Computing (special issue on CANDAR’16), Vol. 6, No. 2, pp. 243-262, Aug. 2016.

Reiji Suda, Hiroyuki Takizawa, Shoichi Hirasawa, “Xevtgen: Fortran code transformer generator for high performance scientific codes,” International Journal on Networking and Computing (special issue on CANDAR’16), Vol. 6, No. 2, pp. 263-289, Aug. 2016.

Daisuke Takahashi, “Automatic Tuning of Computation-Communication Overlap for Parallel 1-D FFT (SP),” 19th IEEE International Conference on Computational Science and Engineering (CSE 2016), Paris, France, August 24-26, 2016.

Toshiaki Hishinuma, Takuma Sakakibara, Akihiro Fujii, Teruo Tanaka, Shoichi Hirasawa, “Xev-GMP: Automatic code generation for GMP multiple-precision code from C code,” 19th IEEE International Conference on Computational Science and Engineering (CSE 2016), Paris, France, August 24-26, 2016.

Cui Hang, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi, “A Code Selection Mechanism Using Deep Learning,” IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16), Lyon, France, September 21-23, 2016.

Xiong Xiao, Shoichi Hirasawa, Hiroyuki Takizawa, and Hiroaki Kobayashi, “The Importance of Dynamic Load Balancing among OpenMP Thread Teams for Irregular Workloads,” The Fourth International Symposium on Computing and Networking, Hiroshima, Japan, November 22-25, pp. 529-535, 2016.

Yasuharu Hayashi, Hiroyuki Takizawa and Hiroaki Kobayashi, “A User-Defined Code Transformation Approach to Overlapping MPI Communication with Computation,” The Fourth International Symposium on Computing and Networking, Hiroshima, Japan, November 22-25, pp. 508-514, 2016.

Reiji Suda and Hiroyuki Takizawa, “A software system supporting XML-based source-to-source code transformations on Fortran programs,” The Fourth International Symposium on Computing and Networking, Hiroshima, Japan, November 22-25, pp. 522-528, 2016.

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, and Hiroaki Kobayashi, “A Directive Generation Approach Using User-defined Rules,” The Fourth International Symposium on Computing and Networking, Hiroshima, Japan, November 22-25, pp. 515-521, 2016.

Y. Sakaguchi, K. Kataumi, H. Matsuoka, O. Watanabe, A. Musa, K. Komatsu, R. Egawa, H. Kobayashi, S. Yamamoto, “Performance Optimization of Numerical Turbine for Supercomputer SX-ACE,” The 28th International Conference on Parallel Computational Fluid Dynamics, May 9-12, 2016.

Takuya Tsunogawa, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi, “Automatic Parameter Tuning of Stencil Computation Using Directives,” ACS, Vol. 9, No. 4, pp. 25-37, 2016.

Nobuhiro Miki, Fumihiko Ino, and Kenichi Hagihara, “An Extension of OpenACC Directives for Out-of-Core Stencil Computation with Temporal Blocking,” In Proceedings of the 3rd Workshop on Accelerator Programming Using Directives (WACCPD 2016), pp. 36–45, Salt Lake City, UT, USA, (2016-11).

Ryotaro Sakai, Fumihiko Ino, and Kenichi Hagihara, “Towards Automating Multi-dimensional Data Decomposition for Executing a Single-GPU Code on a Multi-GPU System,” In Proceedings of the 4th International Symposium on Networking and Computing (CANDAR 2016), pp. 408–414, Hiroshima, Japan, (2016-11). Presented at the 4th International Workshop on Computer Systems and Architectures (CSA 2016).

Yuki Takeuchi, Yoshihide Yoshimoto, and Reiji Suda, “Second order accuracy finite difference methods for space-fractional partial differential equations,” Journal of Computational and Applied Mathematics, Vol. 320, pp. 101-119, 2017.

Ryusuke Egawa, Kazuhiko Komatsu, Shintaro Momose, Yoko Isobe, Akihiro Musa, Hiroyuki Takizawa, Hiroaki Kobayashi, “Potential of a Modern Vector Supercomputer for Practical Applications – Performance Evaluation of SX-ACE –,” Journal of Supercomputing, pp. 1-29, 2017, DOI: 10.1007/s11227-017-1993-y.

Yuta Sakaguchi, Kenryo Kataumi, Hiroshi Matsuoka, Osamu Watanabe, Akihiro Musa, Kazuhiko Komatsu, Ryusuke Egawa, Hiroaki Kobayashi, Satoru Yamamoto, “A Case Study of Performance Optimization on Numerical Turbine for Supercomputer SX-ACE,” Computers & Fluids, 2017 (to appear).

Other Publications

Hiroyuki Takizawa, Takeshi Yamada, Shoichi Hirasawa, and Reiji Suda, “A Use Case of a Code Transformation Rule Generator for Data Layout Optimization,” Sustained Simulation Performance 2016, Springer-Verlag, pp. 21-30, 2016.

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, and Hiroaki Kobayashi, “Directive Translation for Various HPC Systems Using the Xevolver Framework,” Sustained Simulation Performance 2016, Springer-Verlag, pp. 109-117, 2016.

Shoichi Hirasawa, Hiroyuki Takizawa, and Hiroaki Kobayashi, “An Automatic Performance Tracking System for Large-scale Numerical Applications,” Sustained Simulation Performance 2016, Springer-Verlag, pp. 119-127, 2016.

Invited Talks

Daisuke Takahashi, “Implementation of Parallel FFTs on Knights Landing Cluster,” SIAM Conference on Computational Science and Engineering (CSE17), February 28, 2017.

Daisuke Takahashi, “Automatic Tuning for Parallel FFTs on Cluster of Intel Xeon Phi Processors,” 2017 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT in HPSC 2017), March 11, 2017.

Kazuhiko Komatsu, “Directive Translation Approach in Keeping a Code Clean,” 2017 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT in HPSC 2017), March 11, 2017.

Ryusuke Egawa, “An HPC Refactoring Catalog – Accumulating Know-Hows of System Specific Optimization and its Practical Usage,” 2017 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT in HPSC 2017), March 12, 2017.

Reiji Suda, “Generation of Math Library for Multi-Parameter Autotuning,” 2017 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT in HPSC 2017), March 12, 2017.

Hiroyuki Takizawa, “Combining Autotuning and Code Transformations,” 2017 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT in HPSC 2017), March 12, 2017.

Presentations

Shohei Kobayashi, “Numerical Unstability and Improvement of a No-Pivoting LU Decomposition Algorithm by a Discrete Fourier Matrix,” Information Processing Society of Japan, SIG Technical Reports, 2016-HPC-154, 8 pages, April 2016.

Reiji Suda, “Diamond Tiling Extended to General Sparse Matrix Powers Kernel,” First International Workshop on Deepening Performance Models for Automatic Tuning (DPMAT), Sep. 7th, 2016, Nagoya University.

Hiroyuki Takizawa, “Autotuning meets Code Transformations – A case study of Xevolver framework –,” The 24th Workshop on Sustained Simulation Performance, Stuttgart, December 6, 2016.

Ryusuke Egawa, Yoko Isobe, and Soya Fujimoto, “Power and Performance Analysis of SX-ACE,” The 24th Workshop on Sustained Simulation Performance, Stuttgart, December 6, 2016.

Kazuhiko Komatsu, “A Directive Generation Using A Code Translation Framework,” The 24th Workshop on Sustained Simulation Performance, Stuttgart, December 6, 2016.

Hirokazu Honda, Yoshinori Tamada, Reiji Suda, “Efficient Parallel Algorithm for Optimal DAG Structure Search on Parallel Computer with Torus Network,” Proc. ICA3PP 2016: Algorithms and Architectures for Parallel Processing, Dec. 14-16, 2016, Granada, Spain, LNCS 10048, pp. 483-502, DOI:10.1007/978-3-319-49583-5_37, Dec. 2016.

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, “User-defined Directive Translation using the Xevolver Framework,” 2017 SIAM Conference on Computational Science and Engineering (CSE17), Hilton Atlanta, Atlanta, USA, February 27 – March 3, 2017.

Hiroyuki Takizawa, “Performance Tuning with Machine Learning,” The 25th Workshop on Sustained Simulation Performance, Sendai, March 13, 2017.

Posters

Hiroyuki Takizawa, Daichi Sato, Shoichi Hirasawa, and Hiroaki Kobayashi, “Making a Legacy Code Auto-tunable without Messing It Up,” ACM/IEEE Supercomputing Conference 2016 (SC16), 2016. (poster)

Keiichiro Fukazawa, Ryusuke Egawa, Yuko Isobe and Ikuo Miyoshi, “Performance Evaluation of MHD Simulation Code on SX-ACE and FX100,” Poster presentation at International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2016), Kyoto, Japan, June 2016. (abstract review)

FY 2015

Journals

Alfian Amrizal, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi, “Automatic Parameter Tuning of Hierarchical Incremental Checkpointing,” High Performance Computing for Computational Science — VECPAR 2014, Lecture Notes in Computer Science Volume 8969, pp. 298-309, 2015.

Hiroyuki Takizawa, Shoichi Hirasawa, Makoto Sugawara, Isaac Gelado, Hiroaki Kobayashi and Wen-mei W. Hwu, “Optimized Data Transfers Based on the OpenCL Event Management Mechanism,” Scientific Programming, vol. 2015, Article ID 576498, 16 pages, 2015. doi:10.1155/2015/576498.

Shoichi Hirasawa, Hiroyuki Takizawa and Hiroaki Kobayashi, “A Light-weight Rollback Mechanism for Testing Kernel Variants in Auto-tuning,” IEICE Transactions on Information and Systems, Vol.E98-D, No.12, pp.2178-2186, Dec. 2015.

Takeshi Yamada, Shoichi Hirasawa, Hiroyuki Takizawa and Hiroaki Kobayashi, “A Case Study of User-Defined Code Transformations for Data Layout Optimizations,” The Third International Symposium on Computing and Networking — Across Practical Development and Theoretical Research —, Sapporo, Hokkaido, Japan, December 8-11, 2015.

Kazuhiko Komatsu, Ryusuke Egawa, Shoichi Hirasawa, Hiroyuki Takizawa, Ken’ichi Itakura and Hiroaki Kobayashi, “Migration of an Atmospheric Simulation Code to an OpenACC Platform Using the Xevolver Framework,” The Third International Symposium on Computing and Networking — Across Practical Development and Theoretical Research —, Sapporo, Hokkaido, Japan, December 8-11, 2015.

Raghunandan Mathur, Hiroshi Matsuoka, Osamu Watanabe, Akihiro Musa, Ryusuke Egawa and Hiroaki Kobayashi, “A Case Study of Memory Optimization for Migration of a Plasmonics Simulation Application to SX-ACE,” The Third International Symposium on Computing and Networking — Across Practical Development and Theoretical Research —, Sapporo, Hokkaido, Japan, December 8-11, 2015.

Reiji Suda, Hiroyuki Takizawa and Shoichi Hirasawa, “Xevtgen: Fortran code transformer generator for high performance scientific codes,” The Third International Symposium on Computing and Networking — Across Practical Development and Theoretical Research —, Sapporo, Hokkaido, Japan, December 8-11, 2015.

Shoichi Hirasawa, Hiroyuki Takizawa and Hiroaki Kobayashi, “A Verification Framework for Streamlining Empirical Auto-tuning,” The Third International Symposium on Computing and Networking — Across Practical Development and Theoretical Research —, Sapporo, Hokkaido, Japan, December 8-11, 2015.

Kei Ikeda, Fumihiko Ino, and Kenichi Hagihara, “An OpenACC Optimizer for Accelerating Histogram Computation on a GPU,” Proceedings of the 24th Euromicro International Conference on Parallel, Distributed and Network-Based Computing (PDP 2016), pp.466–477, Heraklion, Greece, Feb. 17, 2016.

Nobuhiro Miki, Fumihiko Ino, and Kenichi Hagihara, “Applying Temporal Blocking to Out-of-Core Stencil Computation with OpenACC,” Proceedings of the Work in Progress Session held in connection with the 24th Euromicro International Conference on Parallel, Distributed and Network-Based Computing (PDP 2016), Heraklion, Greece, 2 pages, Feb. 19, 2016.

Other Publications

Hiroyuki Takizawa, Daichi Sato, Shoichi Hirasawa, and Hiroaki Kobayashi, “A High-Level Interface of Xevolver for Composing Loop Transformations,” Sustained Simulation Performance 2015, pp. 137-145, 2015.

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, and Hiroaki Kobayashi, “Performance Evaluation of Compiler-Assisted OpenMP Codes on Various HPC Systems,” Sustained Simulation Performance 2015, pp. 147-157, 2015.

Ryusuke Egawa, Kazuhiko Komatsu, and Hiroaki Kobayashi, “Code Optimization Activities Toward a High Sustained Simulation Performance,” Sustained Simulation Performance 2015, pp. 159-168, 2015.

Invited Talks

Kazuhiko Komatsu, “Migration of an HPC Code to an OpenACC Platform Using a Code Translation Framework,” 2016 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT in HPSC 2016), Feb. 2016.

Daisuke Takahashi, “Automatic Tuning for Parallel FFTs on Intel Xeon Phi Clusters,” 2016 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT in HPSC 2016), Feb. 2016.

Reiji Suda, “Semi-Automatic Construction of Performance Modeling Software for Autotuning,” 2016 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT in HPSC 2016), Feb. 2016.

Hiroyuki Takizawa, Takeshi Yamada, Shoichi Hirasawa, and Hiroaki Kobayashi, “Data Layout Optimization Using User-Defined Code Transformations,” 2016 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT in HPSC 2016), Feb. 2016.

Shoichi Hirasawa, Hiroyuki Takizawa and Hiroaki Kobayashi, “Streamlining Empirical Tuning of Large-scale HPC Applications,” 2016 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT in HPSC 2016), Feb. 2016.

Presentations

Reiji Suda, “Saving Collective Communications in Conjugate Gradient Method for Very Large Supercomputers,” 3rd TWSIAM Annual meeting, May 31, 2015.

Hiroyuki Takizawa, Shoichi Hirasawa, Kazuhiko Komatsu, Ryusuke Egawa and Hiroaki Kobayashi, “Expressing system-awareness as code transformations for performance portability across diverse HPC systems,” Workshop on Portability Among HPC Architectures for Scientific Applications, Nov. 2015.

Hiroyuki Takizawa, Takeshi Yamada, Takuya Tsunogawa, Shoichi Hirasawa, and Hiroaki Kobayashi, “Performance Engineering of HPC Applications Based on Pattern Matching,” The 23rd Workshop on Sustained Simulation Performance, Mar. 16-17, 2016.

Shoichi Hirasawa, Hiroyuki Takizawa, and Hiroaki Kobayashi, “A Correctness Verification Framework for Empirically Tuning Large-scale HPC Applications,” The 23rd Workshop on Sustained Simulation Performance, Mar. 16-17, 2016.

Posters

Kazuhiko Komatsu, Ryusuke Egawa, Yoko Isobe, Ryusei Ogata, Hiroyuki Takizawa and Hiroaki Kobayashi, “An Approach to the Highest Efficiency of the HPCG Benchmark on the SX-ACE Supercomputer,” in the International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Nov. 2015. (Poster)

FY 2014

Journals

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, “A Compiler-Assisted OpenMP Migration Method Based on Automatic Parallelizing Information,” ISC’14, Germany, 2014/6/25.

Daichi Mukunoki and Daisuke Takahashi, “Using Quadruple Precision Arithmetic to Accelerate Krylov Subspace Methods on GPUs,” Proc. 10th International Conference on Parallel Processing and Applied Mathematics (PPAM 2013), Part I, Workshop on Numerical Algorithms on Hybrid Architectures, Lecture Notes in Computer Science, Vol. 8384, pp. 632-642, 2014. (DOI: 10.1007/978-3-642-55224-3_59)

Chunyan Wang, Shoichi Hirasawa, Hiroyuki Takizawa and Hiroaki Kobayashi, “Platform-Specific Code Smell Alert System for High Performance Computing Applications,” The 16th Workshop on Advances on Parallel and Distributed Processing Symposium (APDCM 2014), 2014.

Alfian Amrizal, Shoichi Hirasawa, Hiroyuki Takizawa and Hiroaki Kobayashi, “Automatic Parameter Tuning of Hierarchical Incremental Checkpointing,” The 9th International Workshop on Automatic Performance Tuning (iWAPT2014), 2014.

Xiong Xiao, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi, “An Approach to Customization of Compiler Directives for Application-Specific Code Transformations,” IEEE 8th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-14), Sep., 2014.

Akihiro Fujii and Osni Marques, “Axis Communication Method for Algebraic Multigrid Solver,” IEICE Transactions on Information and Systems, Vol.E97-D, No.11, pp. 2955-2958, 2014. (DOI: 10.1587/transinf.2014EDL8052)

Yuki Sumiyoshi, Akihiro Fujii, Akira Nukada and Teruo Tanaka, “Mixed-Precision AMG method for Many Core Accelerators,” Proc. EuroMPI/ASIA ’14, International Workshop on Enhancing Parallel Scientific Applications with Accelerated HPC (ESAA 2014). p. 127, 2014. (DOI:10.1145/2642769.2642794)

Yuki Sugimoto, Fumihiko Ino, Kenichi Hagihara, “Improving Cache Locality for GPU-based Volume Rendering,” Parallel Computing, Vol. 40, No. 5/6, pp.59-69, 2014. (DOI:10.1016/j.parco.2014.03.013)

Kei Ikeda, Fumihiko Ino, Kenichi Hagihara, “Efficient Acceleration of Mutual Information Computation for Nonrigid Registration Using CUDA,” IEEE Journal of Biomedical and Health Informatics, Vol. 18, No. 3, pp.956-968, 2014. (DOI: 10.1109/JBHI.2014.2310745)

Shohei Ando, Fumihiko Ino, Toru Fujiwara, and Kenichi Hagihara, “A Parallel Algorithm for Enumerating Joint Weight of a Binary Linear Code in Network Coding,” Proceedings of the 2nd International Symposium on Networking and Computing, pp.xx–xx, (2014-12).

Hiroyuki Takizawa, Shoichi Hirasawa, Yasuharu Hayashi, Ryusuke Egawa, Hiroaki Kobayashi, “Xevolver: An XML-based Code Translation Framework for Supporting HPC Application Migration,” IEEE International Conference on High Performance Computing (HiPC), pages 1-11, Dec. 2014.

Chunyan Wang, Shoichi Hirasawa, Hiroyuki Takizawa, and Hiroaki Kobayashi, “Identification and Elimination of Platform-Specific Code Smells in High Performance Computing Applications,” International Journal of Networking and Computing, Volume 5, Number 1, pages 180–199, January 2015.

Chunyan Wang, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi, “Combining Code Refactoring and Auto-tuning to Improve Performance Portability of High-Performance Computing Applications,” The Sixth International Conference on Computational Logics, Algebras, Programming, Tools, and Benchmarking (COMPUTATION TOOLS 2015), Mar. 2015.

Other Publications

Ryusuke Egawa, Kazuhiko Komatsu, Hiroaki Kobayashi, “Designing an HPC Refactoring Catalog Toward the Exa-scale Computing Era,” Sustained Simulation Performance 2014, pp. 91-98, 2014.

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, “Performance Evaluation of an OpenMP Parallelization by Using Automatic Parallelization Information,” Sustained Simulation Performance 2014, pp. 119-126, 2014.

Invited Talks

Hiroyuki Takizawa, “Evolutionary Adaptation of HPC Applications to Revolutionary System Changes,” ISC’14, Germany, 2014/6/23.

Ryusuke Egawa, “System Design Strategies for Disaster-prevention Applications,” EUROMPI/ASIA 2014 WORKSHOP: CHALLENGES IN DATA-CENTRIC COMPUTING (BIGDATACOMPUTING’2014), 10 Sep. 2014, Kyoto, Japan.

Hiroyuki Takizawa, “Autotuning with User-defined Code Transformations,” 2015 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing February 27-28, 2015.

Shoichi Hirasawa, “A Correctness Checking Framework for Empirical Auto-tuning,” 2015 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing February 27-28, 2015.

Daisuke Takahashi, “Automatic Tuning for Parallel FFTs on GPU Clusters,” 2015 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing February 27-28, 2015.

Reiji Suda, “Noise-reducing Collective Communication Algorithms,” 2015 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing February 27-28, 2015.

Ryusuke Egawa, “Overcoming Performance Portability Issues on Modern HPC Systems,” 2015 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing February 27-28, 2015.

Presentations

Fumihiko Ino, Akihito Nakano, Kenichi Hagihara, “An Extension of OpenACC for Pipelined Processing of Large Data on a GPU,” Legacy HPC Application Migration 2014, 2014/9/23.

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, “OpenMP Parallelization Method using Compiler Information of Automatic Optimization,” Legacy HPC Application Migration 2014, 2014/9/23.

Reiji Suda, Shoichi Hirasawa, Hiroyuki Takizawa, “User-defined Source-to-source Code Transformation Tools using Xevolver,” Legacy HPC Application Migration 2014, 2014/9/24.

Akihiro Fujii, Takuya Nomura, Teruo Tanaka, “Communication Optimization Technique of Algebraic multi-grid solver to Each Computing System,” Legacy HPC Application Migration 2014, 2014/9/24.

Hiroyuki Takizawa, “An Evolutionary Approach to Construction of a Software Development Environment for Massively-Parallel Heterogeneous Systems,” 2014 ATIP Workshop: Japanese Research Toward Next-Generation Extreme Computing, Nov.17, 2014.

Reiji Suda, “Developments and experiences in Xevolver, an extensible code transformation system for supporting software evolution,” JST CREST International Symposium on Post Petascale System Software, ISP2S2, Dec., 2014.

Fumihiko Ino, “An extension of OpenACC for pipelined execution of large datasets,” JST CREST International Symposium on Post Petascale System Software, ISP2S2, Dec., 2014.

Kazuhiko Komatsu, “High-productive OpenMP migration using compile information,” JST CREST International Symposium on Post Petascale System Software, ISP2S2, Dec., 2014.

Ryusuke Egawa, “Code Optimization Activities toward Sustained Simulation Performance,” 20th Workshop on Sustained Simulation Performance, Dec. 15-16, 2014.

Hiroyuki Takizawa, “Xevolver: an extensible framework for user-defined code transformation,” 20th Workshop on Sustained Simulation Performance, Dec. 15-16, 2014.

Kazuhiko Komatsu, “High-productive OpenMP migration using Automatic Parallelizing Information,” 20th Workshop on Sustained Simulation Performance, Dec. 15-16, 2014.

Akihiro Fujii, Takuya Nomura, Teruo Tanaka, Osni Marques, “AMGS: Algebraic Multigrid Solver with Coarse Grid Aggregation,” Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015.

Hiroyuki Takizawa, “What can we do to fight with system diversity?,” 21st Workshop on Sustained Simulation Performance, Feb. 18-19, 2015.

Ryusuke Egawa, “Green HPC System Design with Innovative Technologies,” 21st Workshop on Sustained Simulation Performance, Feb. 18-19, 2015.

Hiroyuki Takizawa, Shoichi Hirasawa, Hiroaki Kobayashi, “A Framework for Separation of Concerns Between Application Requirements and System Requirements,” 2015 SIAM Conference on Computational Science and Engineering (CSE15), Salt Palace Convention Center, Salt Lake City, Utah, USA, March 18, 2015.

Daisuke Takahashi, “Automatic Tuning for Parallel FFTs on GPU Clusters,” 2015 SIAM Conference on Computational Science and Engineering (CSE15), Salt Palace Convention Center, Salt Lake City, Utah, USA, March 18, 2015.

Hiroshi Maeda and Daisuke Takahashi, “Performance Evaluation of Sparse Matrix-Vector Multiplication Using GPU/MIC Cluster,” 2015 SIAM Conference on Computational Science and Engineering (CSE15), Salt Palace Convention Center, Salt Lake City, Utah, USA, March 14, 2015.

Posters

Tomochika Kato, Fumihiko Ino, and Kenichi Hagihara, “PACC: An Extension of OpenACC for Pipelined Processing of Large Data on a GPU,” Poster in the 27th International Conference for High Performance Computing, Networking, Storage and Analysis, (2014-11).

Ryusuke Egawa, Shintaro Momose, Kazuhiko Komatsu, Yoko Isobe, Hiroyuki Takizawa, Akihiro Musa, Hiroaki Kobayashi, “Early Evaluation of the SX-ACE Processor,” Poster in the 27th International Conference for High Performance Computing, Networking, Storage and Analysis, (2014-11).

Shoichi Hirasawa, “HPC Refactoring and Code Transformation toward Next-generation Extreme Computing,” 2014 ATIP Workshop: Japanese Research Toward Next-Generation Extreme Computing, Nov.17, 2014.

Shoichi Hirasawa, Tohoku Univ./JST CREST, “Enhancing Performance Portability of Real Applications Using Xevolver,” JST CREST International Symposium on Post Petascale System Software, ISP2S2, Dec., 2014.

Daisuke Takahashi, “Parallel Numerical Libraries with Xevolver towards Exa-Scale Systems,” JST CREST International Symposium on Post Petascale System Software, ISP2S2, Dec., 2014.

Ken’ichi Itakura, “Designing an HPC Refactoring Catalog toward Post Peta-scale Computing Era,” JST CREST International Symposium on Post Petascale System Software, ISP2S2, Dec., 2014.

Reiji Suda, “Tools for Exa-Scale Computational Science Codes based on Xevolver,” JST CREST International Symposium on Post Petascale System Software, ISP2S2, Dec., 2014.

FY 2013

Journals

Fumihiko Ino, Kentaro Shigeoka, Tomohiro Okuyama, Masaya Motokubota, and Kenichi Hagihara, “A Parallel Scheme for Accelerating Parameter Sweep Applications on a GPU,” Concurrency and Computation: Practice and Experience, Vol.26, No.2, pp.516-531, 2014. (DOI: 10.1002/cpe.3016)

Hiroyuki Takizawa, Makoto Sugawara, Shoichi Hirasawa, Isaac Gelado, Hiroaki Kobayashi, and Wen-mei W. Hwu, “clMPI: An OpenCL Extension for Interoperation with the Message Passing Interface,” the IEEE 27th International Symposium on Parallel & Distributed Processing Workshops(IPDPSW2013), pp.1138-1148, 2013. (DOI: 10.1109/IPDPSW.2013.183)

Makoto Sugawara, Shoichi Hirasawa, Kazuhiko Komatsu, Hiroyuki Takizawa and Hiroaki Kobayashi, “A Comparison of Performance Tunabilities between OpenCL and OpenACC,” the IEEE 7th International Symposium on Embedded Multicore SoCs (MCSoC-13), pp. 147-152, 2013. (DOI: 10.1109/MCSoC.2013.31)

Fumihiko Ino, Shinta Nakagawa, and Kenichi Hagihara, “GPU-Chariot: A Programming Framework for Stream Applications Running on Multi-GPU Systems,” IEICE Transactions on Information and Systems, Vol.E96-D, No.12, pp.2604-2613, 2013. (DOI: 10.1587/transinf.E96.D.2604)

Daichi Mukunoki and Daisuke Takahashi, “Optimization of Sparse Matrix-vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs,” Proc. 13th International Conference on Computational Science and Its Applications (ICCSA 2013), Part V, Lecture Notes in Computer Science, Vol. 7975, pp.211-223, 2013. (DOI: 10.1007/978-3-642-39640-3_15)

Daisuke Takahashi, “Implementation of Parallel 1-D FFT on GPU Clusters,” Proc. 2013 IEEE 16th International Conference on Computational Science and Engineering (CSE 2013), pp.174-180, 2013. (DOI: 10.1109/CSE.2013.36)

Takaaki Hiragushi and Daisuke Takahashi, “Efficient Hybrid Breadth-First Search on GPUs,” LNCS Algorithms and Architectures for Parallel Processing (ICA3PP 2013), Vol. 8286, pp. 40-50, 2013. (DOI: 10.1007/978-3-319-03889-6_5)

Ayumu Tomiyama, Reiji Suda, “Automatic Parameter Optimization for Edit Distance Algorithm on GPU,” LNCS High Performance Computing for Computational Science – VECPAR 2012, Vol. 7851, pp.420-434, 2013. (DOI: 10.1007/978-3-642-38718-0_38)

Kamil Rocki, Reiji Suda, “High Performance GPU Accelerated Local Optimization in TSP,” Third Workshop on Parallel Computing and Optimization (PCO’13) in conjunction with 27th IEEE International Parallel & Distributed Processing Symposium (IPDPS), pp. 1788-1796, 2013. (DOI: 10.1109/IPDPSW.2013.227)

Cheng Luo and Reiji Suda, “An Efficient Task Partitioning and Scheduling Method for Symmetric Multiple GPU Architecture,” the 11th International Symposium on Parallel and Distributed Processing with Applications (ISPA2013), pp.1133-1142, 2013. (DOI: 10.1109/TrustCom.2013.137)

Tian Xiaochen, Kamil Rocki, Reiji Suda, “Register Level Sort Algorithm on Multi-Core SIMD Processors,” IA^3 Workshop on Irregular Applications: Architectures & Algorithms, The International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC13), pp. 9:1-9:8, 2013. (DOI: 10.1145/2535753.2535762)

Kamil Rocki, Martin Burtscher, and Reiji Suda, “The Future of Accelerator Programming: Abstraction, Performance or Can We Have Both?,” the 29th Symposium on Applied Computing (SAC 2014), Gyeongju, 2014. (to appear)

Other Publications

Kamil Rocki, Reiji Suda, “Large-scale Parallel Iterated Local Search Algorithm for Travelling Salesman Problem”, TSUBAME e-Science Journal, Vol.10, pp.13-17, pp. 30-34, 2013.

Kazuhiko Komatsu, Toshihide Sasaki, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, “Analysing the Performance Improvements of Optimizations on Modern HPC Systems,” Sustained Simulation Performance 2013, Springer Berlin Heidelberg, pp. 13-25, 2013.

Invited Talks

Shoichi Hirasawa, “An Automatic Performance Tracking System for Software Evolution of Large Scale Vector Applications,” Xev CREST Project Open Seminar, Tokyo, May 28, 2013.

Hiroyuki Takizawa, “An extensible programming framework for custom code transformations,” 2014 Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing, Taipei, Mar. 14-15, 2014.

Shoichi Hirasawa, “A Light-weight Rollback Mechanism for Testing Code Variants in Auto-tuning,” 2014 Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing, Taipei, Mar. 14-15, 2014.

Daisuke Takahashi, “Implementation of Parallel FFTs on GPU Clusters,” 2014 Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing, Taipei, Mar. 14-15, 2014.

Reiji Suda, “Autotuning with a Nuisance Parameter: A Case Study for Power Optimization,” 2014 Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing, Taipei, Mar. 14-15, 2014.

Presentations

Hiroyuki Takizawa, “Towards an Extensible Programming Environment for Software Evolution,” Special Session: Legacy HPC Application Migration 2013 (LHAM) (held in conjunction with IEEE MCSoC-13), Tokyo, Sep. 27, 2013.

Daisuke Takahashi, “Experience of Implementing Parallel FFTs on GPU Clusters,” Special Session: Legacy HPC Application Migration 2013 (LHAM) (held in conjunction with IEEE MCSoC-13), Tokyo, Sep. 27, 2013.

Akihiro Fujii, Takuya Nomura, Teruo Tanaka, and Osni Marques, “Dynamic Parallel Algebraic Multigrid Coarsening for Strong Scaling,” MS50 “Auto-tuning Technologies for Extreme-Scale Solvers” – Part III, SIAM Conference on Parallel Processing for Scientific Computing (PP14), Portland (USA), Feb. 20, 2014.

Kamil Rocki, “OpenCL-based Approach to Heterogeneous Parallel TSP Optimization,” IWOCL 2013, International Workshop on OpenCL, the Georgia Institute of Technology, Atlanta (USA), May 13-14, 2013.

Yuki Takeuchi, “Second order accuracy finite difference methods for fractional diffusion equations,” ASME 2013 International Design Engineering Technical Conferences (IDETC) and Computers and Information in Engineering Conference (CIE), Portland (USA), Aug. 4-7, 2013. (abstract review)

Yuki Takeuchi, “Approximate solutions of fractional differential equations with Riesz fractional derivatives in a finite domain,” International Conference on Scientific Computation and Differential Equations (SciCADE 2013), Valladolid (Spain), Sep. 16-20, 2013.

Kamil Rocki, “The Future of Accelerator Programming: Abstraction, Performance or Can We Have Both?,” Special Session: Legacy HPC Application Migration 2013 (LHAM) (held in conjunction with IEEE MCSoC-13), Tokyo, Sep. 27, 2013.

Cong Li, Reiji Suda, Kohei Shimane, and Hongzhi Chen, “BCBCG: Iterative Solver with Less Number of Global Communications,” MS42 Auto-tuning Technologies for Extreme-Scale Solvers – Part II of III (Feb 20), SIAM PP14, Portland (USA), Feb. 18-21, 2014.

Jiahong Chen, Ray-Bing Chen, Akihiro Fujii, Reiji Suda, Weichung Wang, “Timing Performance Surrogates in Auto-Tuning for Qualitative and Quantitative Factors,” CP16 Performance Optimization (Feb 19), SIAM PP14, Portland (USA), Feb. 18-21, 2014.

Ryusuke Egawa, “An HPC Refactoring Catalog; Guidelines to Bridge The Gap between HPC Systems,” Special Session: Legacy HPC Application Migration 2013 (LHAM) (held in conjunction with IEEE MCSoC-13), Tokyo, Sep. 27, 2013.

Ryusuke Egawa, “Designing an HPC Refactoring Catalog toward the Exa-scale Computing Era,” 18th Workshop on Sustained Simulation Performance (WSSP18), Stuttgart (Germany), Oct. 28-29, 2013.

Kazuhiko Komatsu, “Performance evaluation of auto-parallelized codes on various supercomputing systems,” 18th Workshop on Sustained Simulation Performance (WSSP18), Stuttgart (Germany), Oct. 28-29, 2013.

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Takashi Soga, Akihiro Musa, Hiroaki Kobayashi, “Design of the Next-Generation Vector Architecture for Postpeta-Scale CFD,” International Conference on Fluid Dynamics (ICFD2013), Sendai, Nov. 25-27, 2013.

Kazuhiko Komatsu, “Performance Comparison of Auto-parallelized Codes and OpenMP Codes on Various Supercomputing Systems,” 19th Workshop on Sustained Simulation Performance (WSSP19), Sendai, Mar. 27-28, 2014.

Posters

Hiroyuki Takizawa, Xiong Xiao, Shoichi Hirasawa, Hiroaki Kobayashi, “An XML-based Programming Framework for User-defined Code Transformations,” The 4th AICS International Symposium, Kobe, Dec. 2-3, 2013.

Hiroyuki Takizawa, Shoichi Hirasawa, and Hiroaki Kobayashi, “Xevolver: An XML-based Programming Framework for Software Evolution,” poster presentation at Supercomputing Conference 2013 (SC13), Denver (USA), 2013. (abstract review)

Daichi Mukunoki and Daisuke Takahashi, “Linear Algebra Operations using Quadruple-Precision Arithmetic on GPU,” GPU Technology Conference (GTC 2013), San Jose (USA), Mar. 24-27, 2013.

Kamil Rocki, Martin Burtscher, Reiji Suda, “The Future of Accelerator Programming: Abstraction, Performance or Can We Have Both?,” The 19th IEEE International Conference on Parallel and Distributed Systems (ICPADS2013), Seoul (Korea), Dec. 15-18, 2013.

FY 2012

Journals

Fumihiko Ino, Yuma Munekawa, and Kenichi Hagihara, “Sequence Homology Search Using Fine Grained Cycle Sharing of Idle GPUs,” IEEE Transactions on Parallel and Distributed Systems, Vol.23, No.4, pp.751-759, April 2012. (DOI: 10.1109/TPDS.2011.239)

Tomohiro Okuyama, Fumihiko Ino, and Kenichi Hagihara, “A Task Parallel Algorithm for Finding All-Pairs Shortest Paths Using the GPU,” International Journal of High Performance Computing and Networking, Vol.7, No.2, pp.87-98, April 2012. (DOI: 10.1504/IJHPCN.2012.046384)

Alfian Amrizal, Shoichi Hirasawa, Kazuhiko Komatsu, Hiroyuki Takizawa, and Hiroaki Kobayashi, “Improving the Scalability of Transparent Checkpointing for GPU Computing Systems,” IEEE Region 10 Conference (TENCON 2012), 2012.

Kamil Rocki, Reiji Suda, “Accelerating 2-opt and 3-opt local search using GPU in the Travelling Salesman Problem,” The 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID 2012), Ottawa, Canada, 13-16 May 2012.

Yuki Takeuchi and Reiji Suda, “New numerical computation formula and error analysis of some existing formulae in fractional derivatives and integrals,” The 5th IFAC Symposium on Fractional Differentiation and its Applications (FDA’12), Hohai University, Nanjing, China, May 14-17, 2012.

Kei Ikeda, Fumihiko Ino, and Kenichi Hagihara, “Accelerating Joint Histogram Computation for Image Registration on the GPU,” In Proceedings of Computer Assisted Radiology and Surgery: 26th International Congress and Exhibition (CARS 2012), pp.S72-S73, June 2012.

Muhammad Ismail Faruqi, Fumihiko Ino, and Kenichi Hagihara, “Acceleration of Variance of Color Differences-Based Demosaicing Using CUDA,” In Proceedings of the 10th International Conference on High Performance Computing and Simulation (HPCS 2012), pp.503-510, July 2012.

Kamil Rocki, Reiji Suda, “An efficient GPU implementation of the iterative hill climbing based TSP solver for large problem instances,” ACM/SIGEVO GECCO 2012: Genetic and Evolutionary Computation Conference, Philadelphia, USA, July 7-11, 2012.

Ayumu Tomiyama, Reiji Suda, “Automatic Parameter Optimization for Edit Distance Algorithm on GPU,” The Seventh International Workshop on Automatic Performance Tuning (iWAPT 2012) / VECPAR 2012, RIKEN Advanced Institute for Computational Science, Kobe, July 17, 2012.

Hiroki Yoshizawa and Daisuke Takahashi, “Automatic Tuning of Sparse Matrix-Vector Multiplication for CRS format on GPUs,” Proc. 2012 IEEE 15th International Conference on Computational Science and Engineering (CSE 2012), pp. 130–136 (2012). (DOI: 10.1109/ICCSE.2012.28)

Daichi Mukunoki and Daisuke Takahashi, “Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs,” Proc. 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW 2012), The 13th Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-12), pp. 1378–1386 (2012). (DOI: 10.1109/IPDPSW.2012.175)

Kohei Shimane, Reiji Suda, “A Fast Tour Construction Algorithm for ACOTSP,” The 4th International Conference on Metaheuristics and Nature Inspired Computing (META’2012), Port El Kantaoui (Sousse, Tunisia), Oct. 27-31, 2012.

Kamil Rocki, Reiji Suda, “High Performance GPU Accelerated TSP Solver” (Electronic Poster), The International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC12), 10-16 November 2012, Salt Lake City, USA.

Other Publications

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, Shun Takahashi, Daisuke Sasaki, and Kazuhiro Nakahashi, “Performance Evaluation of BCM on Various Supercomputing Systems,” In 24th International Conference on Parallel Computational Fluid Dynamics, 2012.

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, “Performance evaluation of a next-generation CFD on various supercomputing systems,” High Performance Computing on Vector Systems, 2012.

Invited Talks

Reiji Suda, “HPC, PARALLEL, AT,” NII Shonan Meeting on Bridging the theory of staged programming languages and the practice of high-performance computing, May 19-22, 2012.

Kazuhiko Komatsu, Takashi Soga, Ryusuke Egawa, Hiroyuki Takizawa, and Hiroaki Kobayashi, “Performance Evaluation of a CFD using Cartesian Meshes on Various Supercomputing Systems,” In NUG XXIV, June 2012.

Ryusuke Egawa, “Introduction to SIMD, Vector, and Parallel Supercomputing,” SICE2012 Tutorial II, Akita, 2012.

Kazuhiko Komatsu, “Introduction to GPU Computing,” SICE2012 Tutorial II, Akita, 2012.

Kamil Rocki, “Accelerating Parallel Monte Carlo Tree Search using CUDA,” GTC Japan 2012, July 26, 2012.

Hiroyuki Takizawa, “Software Evolution for System Architecture Revolution,” IEEE International Symposium on Embedded Multicore SoCs, September 21, 2012.

Daisuke Takahashi, “Automatic Tuning for Parallel FFTs on Clusters of Multi-Core Processors,” Special Session: Auto-Tuning for Multicore and GPU (ATMG) (held in conjunction with IEEE MCSoC-12), The University of Aizu, Aizu, Japan, September 22, 2012.

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, and Hiroaki Kobayashi, “Performance of Practical Applications on Modern Supercomputing Systems,” SC12 NEC booth presentation, Nov. 2012.

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, and Hiroaki Kobayashi, “Toward High Performance-Portabilities on Modern HPC Systems,” 16th Workshop on Sustained Simulation Performance, Dec. 2012.

Hiroyuki Takizawa, “A new research project for enabling evolution of legacy code into massively-parallel heterogeneous computing applications,” The 14th Teraflop Workshop, Stuttgart, Dec. 5, 2012.

Hiroyuki Takizawa, “Autotuning for Improving the Fault Tolerance of Large-scale Simulations,” Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing, March 27-29, 2013.

Shoichi Hirasawa, “An Automatic Performance Tracking System for Scientific Software Evolution,” Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing, March 27-29, 2013.

Daisuke Takahashi, “Automatic Tuning for Parallel FFTs”, 2013 Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing (2013@^2HPSC), National Taiwan University, March 28, 2013.

Reiji Suda, “Performance Correlations for Autotuning Efficiency,” Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing, March 27-29, 2013.

Akihiro Fujii and Teruo Tanaka, “Online Auto-Tuning Technique for Algebraic Multi-Grid Solver”, 2013 Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing (2013@^2HPSC), National Taiwan University, March 28, 2013.

Presentations

Reiji Suda, “4DAC and One-Step Approximation: Mathematical Formulation and Algorithm for Automatic Tuning”, EASIAM, Jun 27th, 2012.

Toru Motoya and Reiji Suda, “Conjugate Gradient Methods Relieved for Inner Product Communication Latencies”, International workshop on HPC, Krylov Subspace method and its applications, Jan 13-14, 2013, Beppu B-con Plaza.

Yuki Sugimoto, Fumihiko Ino, and Kenichi Hagihara, “An Acceleration Method for GPU-Based Volume Rendering by Localizing Texture Memory Reference,” IPSJ, 2012-HPC-138, (2013-02). 7 pages.

Shoichi Hirasawa, Hiroyuki Takizawa, and Hiroaki Kobayashi, “An IDE Integrated Cross-Platform Build System for Scientific Applications,” SIAM CSE2013 Minisymposium on Auto-tuning Technologies for Tools and Development Environment in Extreme-Scale Scientific Computing, February 2013.

Vivek S Nittoor and Reiji Suda, “Balanced Tanner Units And Their Properties,” To Appear, Indo-Slovenia Conference on Graph Theory and Applications (Indo-Slov-2013), Feb 22-24, 2013, India.

Vivek S Nittoor and Reiji Suda, “Partition Parameters for Girth Maximum BTUs,” To Appear, Indo-Slovenia Conference on Graph Theory and Applications (Indo-Slov-2013), Feb 22-24, 2013, India.

Daichi Mukunoki and Daisuke Takahashi, “Iterative Method for Sparse Linear Systems using Quadruple Precision Operations on GPUs,” 2013 SIAM Conference on Computational Science and Engineering (CSE13), The Westin Boston Waterfront, Boston, Massachusetts, USA, February 28, 2013.

Reiji Suda, “Toward Tunable Multi-Scheme Parallelization”, 2013 SIAM Conference on Computational Science and Engineering (CSE13), The Westin Boston Waterfront, Boston, Massachusetts, USA, February 28, 2013.

Posters

Kei Ikeda, Fumihiko Ino, and Kenichi Hagihara, “Accelerating Mutual Information Computation for Nonrigid Registration on the GPU,” In Poster in the 3rd GPU Technology Conference (GTC 2012), May 2012.

Vivek S Nittoor and Reiji Suda, “Search for Optimal Graphs,” Poster Presentation at Extremal Combinatorics Conference at Illinois, Urbana-Champaign, IL, 14-16 Mar 2013.

Daichi Mukunoki and Daisuke Takahashi, “Linear Algebra Operations using Quadruple-Precision Arithmetic on GPU,” GPU Technology Conference (GTC 2014), San Jose (USA), Mar. 24-27, 2013.

Fumihiko Ino and Kenichi Hagihara, “Fine-Grained Cycle Sharing of Idle GPUs for Homology Search,” In Poster in the 4th GPU Technology Conference (GTC 2013), San Jose, CA, USA, March 2013.

FY 2011

Journals

Kosuke Takahashi, Akihiro Fujii, Teruo Tanaka, “GPGPU-based Algebraic Multigrid Method,” Proc. 23rd IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2011), pp. 93–99, 2011. (DOI: 10.2316/P.2011.757-061)

Daichi Mukunoki and Daisuke Takahashi, “Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs”, Proc. 13th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-12).

Other Publications

Invited Talks

Hiroyuki Takizawa, “How can we help software evolution for post-Peta scale computing and beyond?,” The 2nd AICS symposium, Kobe, Mar. 2, 2012.

Ryusuke Egawa, “Designing a Refactoring Catalog for HPC,” The 15th Workshop on Sustained Simulation Performance, Sendai, Mar. 23, 2012.

Presentations

Ryusuke Egawa, “Evolutionary Creation of Programming Environments for Massively-parallel Heterogeneous Computing Systems,” APES Project Seminar, Aachen, Germany, Oct. 4, 2011.

Yuki Sugimoto, Fumihiko Ino, Kenichi Hagihara, “Improving Cache Locality for Ray Casting with CUDA,” The 3rd Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures, Munich, Feb. 29, 2012.

Muhammad Alfian Amrizal, Katsuto Sato, Kazuhiko Komatsu, Hiroyuki Takizawa, and Hiroaki Kobayashi, “Evaluation of a Scalable Checkpointing Mechanism for Heterogeneous Computing Systems,” IPSJ-Tohoku, Sendai, 2012/3/2.

Cong Li, Reiji Suda, “A Three-Step Performance Automatic Tuning Strategy using Statistical Model for OpenCL Implementation of Krylov Subspace Methods,” IPSJ-HPC, Kobe, 2012/3/26.

Reiji Suda and Vivek S. Nittoor, “Efficient Monte Carlo Optimization with ATMathCoreLib,” IPSJ-HPC, Kobe, 2012/3/27.

Posters

Vivek S Nittoor and Reiji Suda, “A High Performance Computing Approach For Finding and Decoding Optimal Codes on Graphs”, HiPC 2011 at Bangalore, India, Dec. 18, 2011.