Daichi MUKUNOKI

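At the Information Technology Center, Nagoya University, I conduct research and development on high-performance computing (HPC), GPU computing, large-scale parallel computing on supercomputers, the use of the low-precision arithmetic units of AI processors for scientific computing, mixed-precision numerical computation, and techniques for generating high-performance and numerical computing codes with AI. Together with students in my laboratory, I also work on a variety of topics such as automatic tuning and porting applications to GPUs.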

Schedule / 学会発表予定 (updated March 13, 2026)

椋木大地，生成AIによるスーパーコンピュータのプログラム開発 ― HPC-GENIEプロジェクトの紹介，【第97回】大学等におけるオンライン教育とデジタル変革に関するサイバーシンポジウム「教育機関DXシンポ」，2026年3月16日．
樹神宇徳，林俊一郎，椋木大地，横田理央，大島聡史，星野哲也，片桐孝洋，HPC-AI Scientist: LLMを活用したHPC向け研究フレームワーク，第203回ハイパフォーマンスコンピューティング・第17回量子ソフトウェア合同研究発表会，2026年3月．
三笠諒，林俊一郎，椋木大地，星野哲也，片桐孝洋，HPCコード生成能力向上のためのLLMを用いた強化学習手法の評価，第203回ハイパフォーマンスコンピューティング・第17回量子ソフトウェア合同研究発表会，2026年3月．
松崎竜之介，村上魁，椋木大地，宮島敬明，吉井一友，Cerebras CS-3における密行列積アルゴリズムの性能比較，第203回ハイパフォーマンスコンピューティング・第17回量子ソフトウェア合同研究発表会，2026年3月．
Koki Isobe, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri, GPU Acceleration of Medical Image Representation Learning Models with Distributed Data Parallel, I/O Optimization, and AI-Assisted Development, 2026 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing, March 2026.
Takanori Kotama, Shun-ichiro Hayashi, Daichi Mukunoki, Rio Yokota, Satoshi Ohshima, Tetsuya Hoshino, Takahiro Katagiri, HPC-AutoResearch: An HPC-Native Framework for Autonomous LLM-Driven Experimentation, 2026 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing, March 2026.
Ryo Mikasa, Shun-Ichiro Hayashi, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri, Performance-Aware GRPO Training for Large Language Models in High-Performance Computing Code Generation, 2026 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing, March 2026.
Shun-ichiro Hayashi, Koki Morita, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri, VibeCodeHPC: A Multi LLM Agent System for HPC Code Auto-Tuning, The Twenty-first International Workshop on Automatic Performance Tuning (in conjunction with IEEE IPDPS 2026), May 2026 (accepted).
Daichi Mukunoki, Exploring Multi-Agent Systems for HPC Code Development, The 2nd International Workshop on Foundational Large Language Models Advances for HPC (in conjunction with ISC-HPC 2026), Keynote Speech, June 26, 2026.

Profile / プロフィール

Email:
daichi.mukunoki _at_ gmail.com, mukunoki _at_ cc.nagoya-u.ac.jp
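Address:
Room 0504 (5F), Information Technology Center, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi 464-8601, Japan
Laboratory:
Katagiri-Hoshino Laboratory
Research interests:
High-performance computing, parallel numerical computing, GPUs, computer arithmetic, high-precision arithmetic, mixed-precision computing, reproducible computing, performance optimization, automatic tuning, code generation with AI/LLMs, etc.
Memberships:
Association for Computing Machinery, Information Processing Society of Japan, Japanese Society of Medical Imaging Technology, Auto-Tuning Research Group (自動チューニング研究会)
Technical skills: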
C/C++, CUDA, MPI, OpenMP, Python, LaTeX, HTML
Google Scholar:
https://scholar.google.com/citations?user=TnysP90AAAAJ&hl=ja
ORCID:
https://orcid.org/0000-0002-0051-6811
researchmap:
https://researchmap.jp/mukunoki
ResearchGate:
https://www.researchgate.net/profile/Daichi_Mukunoki
Scopus:
https://www.scopus.com/authid/detail.uri?authorId=55027398500
LinkedIn:
https://www.linkedin.com/in/daichi-mukunoki

Biography / 経歴

Work experience / 職歴

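Apr. 1, 2025 - present: Nagoya University, Information Technology Center, Information Infrastructure Design and Development Division, Assistant Professor
Dec. 1, 2024 - Mar. 31, 2025: Nagoya University, Information Technology Center, Designated Assistant Professor
Apr. 1, 2024 - Nov. 30, 2024: Shibaura Institute of Technology, College of Systems Engineering and Science, Department of Mathematical Sciences, Temporary Technical Staff
Nov. 1, 2023 - Feb. 29, 2024: Sony Interactive Entertainment Inc., Platform System & Experience Design Headquarters, G Division, Department 2, Section 7, Sr. Software Engineer
Nov. 1, 2021 - Mar. 31, 2023: Information Technology Center, The University of Tokyo, Visiting Researcher
Apr. 1, 2019 - Oct. 31, 2023: RIKEN Center for Computational Science, Large-scale Parallel Numerical Computing Technology Research Team, Research Scientist
Apr. 1, 2019 - Mar. 31, 2021: RIKEN Center for Computational Science, FLAGSHIP 2020 Project, Architecture Development Team, Research Scientist
Apr. 1, 2018 - Mar. 31, 2019: RIKEN Center for Computational Science, FLAGSHIP 2020 Project, Architecture Development Team, Visiting Researcher
Apr. 1, 2018 - Mar. 31, 2019: RIKEN Center for Computational Science, Large-scale Parallel Numerical Computing Technology Research Team, Visiting Researcher
Oct. 1, 2017 - Mar. 31, 2019: Tokyo Woman's Christian University, Graduate School of Science, Doctoral Program in Mathematics, Project Researcher
Oct. 1, 2017 - Mar. 31, 2018: RIKEN Advanced Institute for Computational Science, FLAGSHIP 2020 Project, Architecture Development Team, Visiting Researcher
Oct. 1, 2017 - Mar. 31, 2018: RIKEN Advanced Institute for Computational Science, Research Division, Large-scale Parallel Numerical Computing Technology Research Team, Visiting Researcher
Apr. 1, 2017 - Sep. 30, 2017: RIKEN Advanced Institute for Computational Science, FLAGSHIP 2020 Project, Architecture Development Team, Postdoctoral Researcher
Apr. 1, 2016 - Mar. 30, 2017: RIKEN Advanced Institute for Computational Science, FLAGSHIP 2020 Project, Co-design Team, Postdoctoral Researcher
May 1, 2015 - Mar. 31, 2016: RIKEN Advanced Institute for Computational Science, Exascale Computing Development Project, Co-design Team, Postdoctoral Researcher
Jun. 1, 2014 - Sep. 30, 2017: RIKEN Advanced Institute for Computational Science, Research Division, Large-scale Parallel Numerical Computing Technology Research Team, Postdoctoral Researcher
Dec. 1, 2013 - May 31, 2014: Japan Society for the Promotion of Science (JSPS), Research Fellow (PD)
Apr. 1, 2013 - Nov. 30, 2013: Japan Society for the Promotion of Science (JSPS), Research Fellow (DC2)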

Education / 学歴

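Apr. 1, 2011 - Nov. 30, 2013: University of Tsukuba, Graduate School of Systems and Information Engineering, Department of Computer Science, Doctoral Program (Doctor of Engineering)
Apr. 1, 2009 - Mar. 31, 2011: University of Tsukuba, Graduate School of Systems and Information Engineering, Department of Computer Science, Master's Program (Master of Engineering)
Apr. 1, 2006 - Mar. 31, 2009: University of Tsukuba, School of Library and Information Science (Bachelor of Library and Information Science)
Apr. 1, 2001 - Mar. 31, 2006: Gifu National College of Technology, Department of Electronic Control Engineering (Associate Degree in Engineering)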

Research Stay Abroad / 海外研究滞在

Sep. 2023: University of Perpignan Via Domitia, France (Prof. Defour)
Sep. 2023 - Oct. 2023: LIP6, Sorbonne University, France (Prof. Jézéquel)
Mar. 2023 - Jun. 2023: LIP6, Sorbonne University, France (Prof. Jézéquel)
May 2022 - Jul. 2022: LIP6, Sorbonne University, France (Prof. Jézéquel)
Sep. 2019: LIP6, Sorbonne University, France (Prof. Jézéquel)
Jan. 2019 - Feb. 2019: LIP6, Sorbonne University, France (Prof. Jézéquel)
Sep. 2018: LIP6, Sorbonne University, France (Prof. Jézéquel)
Aug. 2018 - Sep. 2018: KTH Royal Institute of Technology, Sweden (Dr. Iakymchuk)
Feb. 2018 - Mar. 2018: KTH Royal Institute of Technology, Sweden (Dr. Iakymchuk)
Feb. 2018: LIP6, Sorbonne University, France (Prof. Graillat)

Research / 研究

Automatic Generation of HPC and Numerical Codes Using Large Language Models (LLMs) / 大規模言語モデル（LLM）を用いたHPCコード・数値計算コードの自動生成

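LLM performance has improved dramatically, and beyond conversational agents such as ChatGPT, technologies that assist with or automatically generate program code are becoming practical. Generating highly specialized code, however, still poses many challenges. Numerical codes for HPC are particularly difficult because they demand advanced performance optimization, which today still requires the skills of experts with deep knowledge. We are developing LLM-assisted code generation methods specialized for HPC and numerical codes, using techniques such as coordination among multiple LLMs, iterative prompt refinement, and retrieval-augmented generation (RAG).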
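As a rough illustration of refinement driven by feedback, the sketch below runs a generate, verify, benchmark, and feed-back loop on a toy dot-product kernel. It is a hypothetical, minimal sketch and not the design of VibeCodeHPC or any other system listed on this page: `propose` stands in for an LLM call and simply replays scripted candidates so that the example runs offline, and the names `reference_dot`, `evaluate`, `tuning_loop`, and `scripted_proposer` are illustrative.

```python
import math
import time
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical setup: a candidate "kernel" computes the dot product of two vectors.
Kernel = Callable[[List[float], List[float]], float]
Proposer = Callable[[str], Tuple[str, Kernel]]


def reference_dot(x: List[float], y: List[float]) -> float:
    """Trusted reference: accurate (fsum-based) sum of the products."""
    return math.fsum(a * b for a, b in zip(x, y))


def evaluate(kernel: Kernel, x: List[float], y: List[float],
             rtol: float = 1e-12) -> Tuple[bool, float]:
    """Run a candidate once; return (passes the correctness check, elapsed seconds)."""
    start = time.perf_counter()
    result = kernel(x, y)
    elapsed = time.perf_counter() - start
    ref = reference_dot(x, y)
    return abs(result - ref) <= rtol * max(1.0, abs(ref)), elapsed


def tuning_loop(propose: Proposer, rounds: int, x: List[float], y: List[float]):
    """Request a candidate, verify and time it, and turn the outcome into the next prompt.
    Returns the fastest correct candidate as (name, seconds) and the feedback log."""
    feedback = "Write a fast dot-product kernel."
    best: Optional[Tuple[str, float]] = None
    log: List[str] = []
    for _ in range(rounds):
        name, kernel = propose(feedback)
        ok, elapsed = evaluate(kernel, x, y)
        if ok:
            feedback = f"{name} is correct ({elapsed:.2e} s); try to make it faster."
            if best is None or elapsed < best[1]:
                best = (name, elapsed)
        else:
            feedback = f"{name} returned a wrong result; fix correctness first."
        log.append(feedback)
    return best, log


def scripted_proposer() -> Proposer:
    """Hypothetical stand-in for an LLM: ignores the feedback and replays fixed candidates."""
    candidates: Iterator[Tuple[str, Kernel]] = iter([
        ("loop", lambda x, y: sum(a * b for a, b in zip(x, y))),
        ("buggy", lambda x, y: sum(a * b for a, b in zip(x[1:], y[1:]))),  # drops x[0]*y[0]
        ("fsum", lambda x, y: math.fsum(a * b for a, b in zip(x, y))),
    ])
    return lambda feedback: next(candidates)


x = [float(i) for i in range(1, 101)]
y = [1.0] * 100
best, log = tuning_loop(scripted_proposer(), 3, x, y)
for line in log:
    print(line)
```

In a real system the feedback string would be sent back to the model as part of the next prompt; here it is only logged, and the wrong candidate is rejected before its timing is ever considered.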

Selected Papers / 主要な論文

Shun-ichiro Hayashi, Koki Morita, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri, VibeCodeHPC: An Agent-Based Iterative Prompting Auto-Tuner for HPC Code Generation Using LLMs, arXiv preprint arXiv:2510.00031, September 2025.
Daichi Mukunoki, Shun-ichiro Hayashi, Tetsuya Hoshino, Takahiro Katagiri, Performance Evaluation of General Purpose Large Language Models for Basic Linear Algebra Subprograms Code Generation, arXiv preprint arXiv:2507.04697, 2025.

High-Precision Computation Using Low-Precision Arithmetic / 低精度演算を利用した高精度演算の実現

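Scientific computing usually relies on 64-bit floating-point arithmetic (FP64, i.e., double precision). Some processors, however, execute lower-precision arithmetic much faster, for example 32-bit single precision (FP32) or 16-bit half precision (FP16, BF16); GPUs specialized for graphics and processors specialized for AI workloads are typical examples, and some processors do not support FP64 arithmetic at all. Two known ways to enable FP64 computation on such processors are multi-word arithmetic, which represents a high-precision value as an unevaluated sum of several floating-point numbers, and the Ozaki scheme, an error-free computation method for matrix multiplication. Building on these techniques, we develop methods and implementations that perform high-precision computation quickly using low-precision arithmetic.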
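The error-free transformations behind multi-word arithmetic fit in a few lines. Python floats are FP64, so in this hedged example FP64 plays the role of the "low" precision: Knuth's TwoSum and Dekker's TwoProduct recover the rounding error of each operation exactly, and the compensated dot product Dot2 of Ogita, Rump, and Oishi (the Dot2 that appears in the publication list) uses them to reach roughly twice the working precision. The same building blocks underlie double-float (FP32 pair) arithmetic on hardware without fast FP64. This is a textbook sketch, not the Ozaki scheme and not the author's implementation; the function names are illustrative.

```python
from fractions import Fraction
from typing import List, Tuple


def two_sum(a: float, b: float) -> Tuple[float, float]:
    """Knuth's TwoSum: s = fl(a + b) and a + b == s + err exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err


def split(a: float) -> Tuple[float, float]:
    """Dekker's split: a == hi + lo, with both halves short enough that
    products of halves are exact in FP64."""
    c = 134217729.0 * a  # 2**27 + 1 for the 53-bit FP64 significand
    hi = c - (c - a)
    return hi, a - hi


def two_prod(a: float, b: float) -> Tuple[float, float]:
    """Dekker's TwoProduct: p = fl(a * b) and a * b == p + err exactly (no FMA needed)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    err = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, err


def dot2(x: List[float], y: List[float]) -> float:
    """Compensated dot product (Dot2): as accurate as if computed in twice the
    working precision and then rounded to working precision."""
    p, s = two_prod(x[0], y[0])
    for a, b in zip(x[1:], y[1:]):
        h, r = two_prod(a, b)
        p, q = two_sum(p, h)
        s += q + r  # accumulate the exactly recovered low-order parts
    return p + s


def naive_dot(x: List[float], y: List[float]) -> float:
    """Plain left-to-right FP64 accumulation, for comparison."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total


def exact_dot(x: List[float], y: List[float]) -> float:
    """Exact rational reference, rounded once to FP64."""
    return float(sum(Fraction(a) * Fraction(b) for a, b in zip(x, y)))


x = [1e16, 1.0, -1e16]
y = [1.0, 1.0, 1.0]
print(naive_dot(x, y), dot2(x, y), exact_dot(x, y))  # 0.0 1.0 1.0
```

The naive loop loses the 1.0 because it is absorbed into 1e16, while Dot2 carries it in the low-order word `s`.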

Selected Papers / 主要な論文

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Accurate Matrix Multiplication on Binary128 Format Accelerated by Ozaki Scheme, Proc. The 50th International Conference on Parallel Processing (ICPP-2021), No. 78, pp. 1-11, Aug. 9, 2021.
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, DGEMM using Tensor Cores, and Its Accurate and Reproducible Versions, Proc. ISC High Performance 2020, Lecture Notes in Computer Science, Vol. 12151, pp. 230-248, Jun. 2020.

Reproducible Numerical Computation / 再現可能な数値計算の実現

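Numerical computations carried out in finite-precision arithmetic are affected by rounding errors, so their results can change with the order of operations. Special instructions, such as fused multiply-add (FMA), can also change how rounding errors arise. As a result, even for the same algorithm and the same problem, results may not be reproducible when the execution environment, the implementation, or the compiled executable differs. This is a problem for reproducing results, for quality assurance, and for debugging. We therefore developed computation methods that use error-free, high-accuracy arithmetic so that results agree bit for bit in any computing environment.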
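The order dependence of floating-point summation shows up with just three FP64 numbers. In this sketch, Python's `math.fsum`, which returns the correctly rounded value of the exact sum, stands in for the error-free methods mentioned above: because its result depends only on the exact sum, every summation order yields the same bits. It illustrates the principle only and is not the Ozaki-scheme-based approach of the papers below.

```python
import itertools
import math
from typing import Iterable


def naive_sum(values: Iterable[float]) -> float:
    """Left-to-right FP64 summation; the result depends on the input order."""
    total = 0.0
    for v in values:
        total += v
    return total


data = [1e16, 1.0, -1e16]
orders = list(itertools.permutations(data))

# Every ordering of the same three numbers, summed two ways.
naive_results = {naive_sum(p) for p in orders}
exact_results = {math.fsum(p) for p in orders}

print(sorted(naive_results))  # [0.0, 1.0]
print(sorted(exact_results))  # [1.0]
```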

Selected Papers / 主要な論文

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Roman Iakymchuk, Conjugate Gradient Solvers with High Accuracy and Bit-wise Reproducibility between CPU and GPU using Ozaki scheme, Proc. The International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2021), pp. 100-109, 2021 (preprint is also available: hal-02986873).
Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki, Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme for Many-core Architectures, Proc. 13th International Conference on Parallel Processing and Applied Mathematics (PPAM 2019), Lecture Notes in Computer Science, Vol. 12043, pp. 516-527, Mar. 2020.

High-Performance Matrix Multiplication for Large-Scale Distributed Parallel Computing / 大規模分散並列計算のための高性能行列積の開発

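In large-scale distributed parallel computing with tens of thousands of processes, even compute-intensive operations become communication-bound, and strong-scaling scalability deteriorates. We therefore implemented a matrix multiplication routine that uses a communication-avoiding algorithm (the 2.5D algorithm) while remaining compatible with the conventional matrix multiplication based on a 2D data distribution (i.e., PDGEMM), and analyzed its performance using performance modeling. Evaluations on the K computer (RIKEN), the supercomputer Fugaku (RIKEN), and Oakforest-PACS (The University of Tokyo) showed that it improves strong-scaling performance over the existing PDGEMM.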
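The trade-off behind the 2.5D algorithm can be summarized with the standard asymptotic bandwidth model from Solomonik and Demmel's analysis: keeping c copies of the data cuts the communication volume per process by a factor of sqrt(c), at the price of c times more memory. The sketch below encodes only these leading-order terms (constants and latency omitted, function names illustrative); it is not the performance model used in the papers below.

```python
import math


def comm_words_2d(n: int, p: int) -> float:
    """Leading-order words communicated per process by a 2D algorithm
    (e.g., SUMMA or Cannon) for an n x n product on p processes: n^2 / sqrt(p)."""
    return n * n / math.sqrt(p)


def comm_words_25d(n: int, p: int, c: int) -> float:
    """Leading-order words per process for a 2.5D algorithm with c data copies:
    n^2 / sqrt(c * p), valid for 1 <= c <= p^(1/3)."""
    if c < 1 or c ** 3 > p:  # integer test avoids floating-point cube roots
        raise ValueError("replication factor must satisfy 1 <= c <= p^(1/3)")
    return n * n / math.sqrt(c * p)


def memory_words_25d(n: int, p: int, c: int) -> float:
    """Per-process memory for one matrix: a block of (n / sqrt(p / c))^2 = c * n^2 / p words."""
    return c * n * n / p


n, p, c = 16384, 4096, 4
print(comm_words_2d(n, p) / comm_words_25d(n, p, c))  # 2.0, i.e. sqrt(c)
print(memory_words_25d(n, p, c) / memory_words_25d(n, p, 1))  # 4.0, i.e. c
```

The upper bound c <= p^(1/3) reflects that replicating beyond the 3D limit brings no further reduction in communication.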

Selected Papers / 主要な論文

Daichi Mukunoki, Toshiyuki Imamura, Performance Analysis of 2D-compatible 2.5D-PDGEMM on Knights Landing Cluster, Proc. International Conference on Computational Science (ICCS 2018), Lecture Notes in Computer Science, Vol. 10862, pp. 853-858, Jun. 2018.
Daichi Mukunoki, Toshiyuki Imamura, Implementation and Performance Analysis of 2.5D-PDGEMM on the K Computer, Proc. 12th International Conference on Parallel Processing and Applied Mathematics (PPAM 2017), Lecture Notes in Computer Science, Vol. 10777, pp. 348-358, Mar. 2018.

Fast Matrix Computation on GPUs / GPUにおける高速な行列計算法の開発

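GPU programs compute in a vector-like manner with tens of thousands of threads, so they require implementation and optimization strategies different from those for CPU programs. We developed new high-performance implementations of basic linear algebra operations, such as matrix-vector multiplication for dense and sparse matrices. We also developed a technique that automatically adjusts the number of threads executing a kernel, a parameter that strongly affects performance.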
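A generic empirical auto-tuning loop shows the basic idea of choosing a thread-block size: time each candidate and keep the fastest, once per problem size. This is a hedged sketch of empirical autotuning in general, not the adjustment method of the MCSoC-16 paper below. The launcher `run`, the per-size caching policy, and all function names are hypothetical, and the candidate range assumes NVIDIA's warp size of 32 and the 1024-thread-per-block limit of current GPUs.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List, Optional

WARP_SIZE = 32         # threads per warp on NVIDIA GPUs
MAX_BLOCK_SIZE = 1024  # maximum threads per block on current NVIDIA GPUs


def candidate_block_sizes() -> List[int]:
    """Multiples of the warp size up to the hardware limit."""
    return list(range(WARP_SIZE, MAX_BLOCK_SIZE + 1, WARP_SIZE))


def wall_time(run: Callable[[int], None]) -> Callable[[int], float]:
    """Wrap a hypothetical launcher run(block_size) so that it returns elapsed seconds.
    For a real GPU kernel, run must block until the kernel finishes (e.g., synchronize
    the device); otherwise only the launch overhead is timed."""
    def measure(block_size: int) -> float:
        start = time.perf_counter()
        run(block_size)
        return time.perf_counter() - start
    return measure


def select_block_size(measure: Callable[[int], float],
                      candidates: Iterable[int], repeats: int = 5) -> Optional[int]:
    """Return the candidate with the smallest median time (the median damps outliers)."""
    best, best_time = None, float("inf")
    for bs in candidates:
        t = statistics.median(measure(bs) for _ in range(repeats))
        if t < best_time:
            best, best_time = bs, t
    return best


def block_size_for(n: int, make_measure: Callable[[int], Callable[[int], float]],
                   cache: Dict[int, int]) -> int:
    """Tune once per problem size n and reuse the result (hypothetical policy)."""
    if n not in cache:
        cache[n] = select_block_size(make_measure(n), candidate_block_sizes())
    return cache[n]
```

Injecting `measure` rather than hard-wiring a timer keeps the selection logic testable with a synthetic cost model and lets a real implementation plug in device-side timers instead.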

Selected Papers / 主要な論文

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi, Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels on GPUs, Proc. IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16), pp. 377-384, Sep. 2016.

New Mixed-Precision Computing Methods / 新しい混合精度計算法の開発

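Mixed-precision computing, which speeds up computation by using not only the double-precision (FP64) arithmetic standard in scientific computing but also lower-precision arithmetic such as single precision (FP32) and half precision (FP16), is widely studied. We study new mixed-precision methods, mainly for sparse matrix computations, as well as automatic mixed-precision techniques that do not depend on a specific algorithm. For sparse iterative solvers, we also developed a method that deliberately uses higher-precision arithmetic (such as quadruple precision) in computations normally done in double precision, which improves convergence and can thereby shorten the total computation time.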
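Mixed-precision iterative refinement for a dense system Ax = b is a standard example of the idea: the O(n^3) factorization runs once in low precision and is reused, while O(n^2) residuals computed in high precision recover full accuracy. This sketch is a textbook technique for illustration only, not one of the sparse-matrix methods described above. FP32 is emulated in pure Python by rounding every operation through `struct`, and the 3 x 3 test system and all function names are arbitrary choices.

```python
import struct
from typing import List, Tuple

Matrix = List[List[float]]


def f32(x: float) -> float:
    """Round an FP64 value to the nearest FP32 value (emulated single precision)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_f32(A: Matrix) -> Tuple[Matrix, List[int]]:
    """LU with partial pivoting, every operation rounded to FP32. L (unit lower) and U
    share one matrix; perm[i] is the original row index now stored at row i."""
    n = len(A)
    M = [[f32(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        perm[k], perm[piv] = perm[piv], perm[k]
        for i in range(k + 1, n):
            M[i][k] = f32(M[i][k] / M[k][k])
            for j in range(k + 1, n):
                M[i][j] = f32(M[i][j] - f32(M[i][k] * M[k][j]))
    return M, perm


def solve_f32(LU: Matrix, perm: List[int], b: List[float]) -> List[float]:
    """Solve with the FP32 factors: forward then back substitution, all in FP32."""
    n = len(LU)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
        y[i] = f32(y[i] / LU[i][i])
    return y


def residual(A: Matrix, x: List[float], b: List[float]) -> List[float]:
    """r = b - A x computed in FP64, the higher precision."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]


def mixed_precision_solve(A: Matrix, b: List[float], iters: int = 5) -> List[float]:
    LU, perm = lu_f32(A)             # expensive step, done once in low precision
    x = solve_f32(LU, perm, b)
    for _ in range(iters):
        r = residual(A, x, b)        # cheap step in high precision
        d = solve_f32(LU, perm, r)   # correction reuses the FP32 factors
        x = [xi + di for xi, di in zip(x, d)]
    return x


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x0 = solve_f32(*lu_f32(A), b)
x = mixed_precision_solve(A, b)
print(max(abs(v) for v in residual(A, x0, b)))  # FP32-level residual
print(max(abs(v) for v in residual(A, x, b)))   # FP64-level residual
```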

Selected Papers / 主要な論文

Daichi Mukunoki, Daisuke Takahashi, Using Quadruple Precision Arithmetic to Accelerate Krylov Subspace Methods on GPUs, Proc. 10th International Conference on Parallel Processing and Applied Mathematics (PPAM 2013), Part I, Workshop on Numerical Algorithms on Hybrid Architectures, Lecture Notes in Computer Science, Vol. 8384, pp. 632-642, May 2014.

Publications / 研究業績

Journal Papers / 論文誌

Xuanzhengbo Ren, Yuta Kawai, Tetsuya Hoshino, Hirofumi Tomita, Takahiro Katagiri, Daichi Mukunoki, Seiya Nishizawa, Learning-Augmented Performance Model for Tensor Product Factorization in High-Order FEM, IEEE Access, 2026 (accepted).
Xuanzhengbo Ren, Tetsuya Hoshino, Daichi Mukunoki, Takahiro Katagiri, Porting NICAM Microphysics to Kokkos: A Case Study in Performance Portability for Atmospheric Models, IPSJ Transactions on Advanced Computing System, 2025 (accepted).
Takaaki Miyajima, Ryunosuke Matsuzaki, Daichi Mukunoki, Single-Precision Matrix Multiplication Performance on Cerebras CS-2: Evaluation and Modelling of Performance, Scalability and Energy Efficiency, Journal of Information Processing, Vol. 34, pp. 132-139, 2026.
Katsuhisa Ozaki, Daichi Mukunoki, Takeshi Ogita, Extension of accurate numerical algorithms for matrix multiplication based on error-free transformation, Japan Journal of Industrial and Applied Mathematics, Vol. 42, pp. 1-20, Oct. 29, 2024.
Kensuke Aihara, Katsuhisa Ozaki, Daichi Mukunoki, Mixed-precision conjugate gradient algorithm using the groupwise update strategy, Japan Journal of Industrial and Applied Mathematics, Vol. 41, pp. 837-855, Feb. 6, 2024.
Daichi Mukunoki, Takeshi Ogita, Performance and Energy Consumption of Accurate and Mixed-precision Linear Algebra Kernels on GPUs, Journal of Computational and Applied Mathematics, Vol. 372, p. 112701, Jul. 2020.
椋木大地, 高橋大介, GPUにおける3倍・4倍精度浮動小数点演算の実現と性能評価, 情報処理学会論文誌コンピューティングシステム, Vol. 6, No. 1, pp. 66-77, 2013年1月31日 (in Japanese).

Peer-reviewed Conference Proceedings / 査読付プロシーディングス

Shun-ichiro Hayashi, Koki Morita, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri, VibeCodeHPC: A Multi LLM Agent System for HPC Code Auto-Tuning, The Twenty-first International Workshop on Automatic Performance Tuning, 2026 (accepted).
Daichi Mukunoki, DGEMM using FP64 Arithmetic Emulation and FP8 Tensor Cores with Ozaki Scheme, Proc. the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) - ExHET'26: The Fifth International Workshop on Extreme Heterogeneity Solutions, pp. 303-311, 2026.
Tetsuya Hoshino, Shun-Ichiro Hayashi, Daichi Mukunoki, Takahiro Katagiri and Toshihiro Hanawa, Evaluating Claude Code's Coding and Test Automation for GPU Acceleration of a Legacy Fortran Application: A GeoFEM Case Study, Proc. the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) - The 1st International Workshop on Foundational Large Language Models Advances for HPC in Asia (LLM4HPCAsia 2026), pp. 353–360, 2026 (Best Paper Award).
Daichi Mukunoki, Katsuhisa Ozaki, Sparse Iterative Solvers Using High-Precision Arithmetic with Quasi Multi-Word Algorithms, Proc. 2025 IEEE 18th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC 2025), pp. 33-40, 2025.
Tatsuro Hanyu, Takahiro Katagiri, Daichi Mukunoki, Tetsuya Hoshino, An Algorithm Portfolio Approach for Parameter Tuning in Coherent Ising Machines, Proc. 2025 Thirteenth International Symposium on Computing and Networking Workshops (CANDARW) - 17th International Workshop on Parallel and Distributed Algorithms and Applications (PDAA 2025), pp. 142-148, 2025.
Xuanzhengbo Ren, Yuta Kawai, Hirofumi Tomita, Seiya Nishizawa, Takahiro Katagiri, Tetsuya Hoshino, Masatoshi Kawai, Daichi Mukunoki, Toru Nagai, Performance Evaluation of Loop Body Splitting for Fast Modal Filtering in SCALE-DG on A64FX, HPC Asia '25 Workshops: Proc. 2025 International Conference on High Performance Computing in Asia-Pacific Region Workshops, pp. 36-44, 2025.
Ryunosuke Matsuzaki, Daichi Mukunoki, Takaaki Miyajima, Performance evaluation and modelling of single-precision matrix multiplication on Cerebras CS-2, Proc. SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, 14th Workshop on Irregular Applications: Architectures and Algorithms, pp. 727-731, 2024.
Stef Graillat, Fabienne Jézéquel, Théo Mary, Roméo Molina, Daichi Mukunoki, Reduced-Precision and Reduced-Exponent Formats for Adaptive-Precision Sparse Matrix-Vector Product, Proc. 30th International European Conference on Parallel and Distributed Computing (Euro-Par 2024), Lecture Notes in Computer Science, Vol. 14803, pp. 17-30, Aug. 26, 2024.
Daichi Mukunoki, Masatoshi Kawai, Toshiyuki Imamura, Sparse Matrix-Vector Multiplication with Reduced-Precision Memory Accessor, Proc. 2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC 2023), pp. 608-615, 2023 (Best Paper Award).
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Infinite-precision Inner Product and Sparse Matrix Vector Multiplication using Ozaki Scheme with Dot2 on Many-core Processors, Proc. 14th International Conference on Parallel Processing and Applied Mathematics (PPAM 2022), Lecture Notes in Computer Science, Vol. 13826, pp. 40-54, 2023.
Daichi Mukunoki, Yusuke Hirota, Toshiyuki Imamura, Task Scheduling Strategies for Batched Basic Linear Algebra Subprograms on Many-core CPUs, Proc. 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC 2021), pp. 234-241, 2021.
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Accurate Matrix Multiplication on Binary128 Format Accelerated by Ozaki Scheme, Proc. The 50th International Conference on Parallel Processing (ICPP-2021), No. 78, pp. 1-11, Aug. 9, 2021.
Takeyuki Harayama, Shuhei Kudo, Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi, A rapid Euclidean norm calculation algorithm that reduces overflow and underflow, Proc. The 2021 International Conference on Computational Science and Its Applications (ICCSA 2021), Lecture Notes in Computer Science, Vol. 12949, pp. 95-110, Sep. 9, 2021.
Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka, Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?, Proc. 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021), pp. 1056-1065, Jun. 28, 2021 (preprint: arXiv:2010.14373)
Katsuhisa Ozaki, Takeshi Ogita, Daichi Mukunoki, Interval Matrix Multiplication using Fast Low-Precision Arithmetic on GPU, Proc. 9th International Workshop on Reliable Engineering Computing (REC2021), pp. 419-434, May 2021.
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Roman Iakymchuk, Conjugate Gradient Solvers with High Accuracy and Bit-wise Reproducibility between CPU and GPU using Ozaki scheme, Proc. The International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2021), pp. 100-109, 2021 (preprint is also available: hal-02986873).
Fabienne Jézéquel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, Roman Iakymchuk, Can we avoid rounding-error estimation in HPC codes and still get trustful results?, Proc. 13th International Workshop on Numerical Software Verification 2020 (NSV 20), Lecture Notes in Computer Science, Vol. 12549, pp. 163-177, Dec. 2020 (preprint: hal-02486753).
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, DGEMM using Tensor Cores, and Its Accurate and Reproducible Versions, Proc. ISC High Performance 2020, Lecture Notes in Computer Science, Vol. 12151, pp. 230-248, Jun. 2020.
Yiyu Tan, Toshiyuki Imamura, Daichi Mukunoki, Design of an FPGA-based Matrix Multiplier with Task Parallelism, Proc. International Conference on Parallel Computing (ParCo2019), Parallel Computing: Technology Trends, Vol. 36, pp. 241-250, 2020.
Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki, Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme for Many-core Architectures, Proc. 13th International Conference on Parallel Processing and Applied Mathematics (PPAM 2019), Lecture Notes in Computer Science, Vol. 12043, pp. 516-527, Mar. 2020.
Daichi Mukunoki, Toshiyuki Imamura, Performance Analysis of 2D-compatible 2.5D-PDGEMM on Knights Landing Cluster, Proc. International Conference on Computational Science (ICCS 2018), Lecture Notes in Computer Science, Vol. 10862, pp. 853-858, Jun. 2018.
Daichi Mukunoki, Toshiyuki Imamura, Implementation and Performance Analysis of 2.5D-PDGEMM on the K Computer, Proc. 12th International Conference on Parallel Processing and Applied Mathematics (PPAM 2017), Lecture Notes in Computer Science, Vol. 10777, pp. 348-358, Mar. 2018.
Toshiyuki Imamura, Daichi Mukunoki, Yusuke Hirota, Susumu Yamada, Masahiko Machida, Design Towards Modern High Performance LA Library Enabling Heterogeneity and Flexible Data Formats, Parallel Computing is Everywhere, Proc. International Conference on Parallel Computing (ParCo2017), Advances in Parallel Computing, pp. 97-106, Sep. 2017.
Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi, Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels on GPUs, Proc. IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16), pp. 377-384, Sep. 2016.
Daichi Mukunoki, Toshiyuki Imamura, Reduced-Precision Floating-Point Formats on GPUs for High Performance and Energy Efficient Computation, Proc. IEEE International Conference on Cluster Computing (Cluster 2016), pp. 144-145, Sep. 13, 2016 (extended abstract for poster presentation).
Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi, Fast Implementation of General Matrix-Vector Multiplication (GEMV) on Kepler GPUs, Proc. 23rd Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2015), pp. 642-650, Mar. 2015.
Daichi Mukunoki, Daisuke Takahashi, Using Quadruple Precision Arithmetic to Accelerate Krylov Subspace Methods on GPUs, Proc. 10th International Conference on Parallel Processing and Applied Mathematics (PPAM 2013), Part I, Workshop on Numerical Algorithms on Hybrid Architectures, Lecture Notes in Computer Science, Vol. 8384, pp. 632-642, May 2014.
Daichi Mukunoki, Daisuke Takahashi, Optimization of Sparse Matrix-vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs, Proc. 13th International Conference on Computational Science and Its Applications (ICCSA 2013), Part V, Lecture Notes in Computer Science, Vol. 7975, pp. 211-223, Jun. 2013.
Daichi Mukunoki, Daisuke Takahashi, Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs, Proc. 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW 2012), The 13th Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-12), pp. 1378-1386, May 2012.
Daichi Mukunoki, Daisuke Takahashi, Implementation and Evaluation of Quadruple Precision BLAS Functions on GPUs, Proc. 10th International Conference on Applied Parallel and Scientific Computing (PARA 2010), Part I, Lecture Notes in Computer Science, Vol. 7133, pp. 249-259, 2012.
椋木大地, 高橋大介, GPUによる4倍・8倍精度BLASの実装と評価, 2011年ハイパフォーマンスコンピューティングと計算科学シンポジウムHPCS2011論文集, pp. 148-156, 2011年1月 (in Japanese).

Technical Reports (Non-reviewed) / 研究報告等（査読なし）

磯部晃輝，椋木大地，小田昌宏，星野哲也，森健策，片桐孝洋，医用画像表現学習モデルのGPU 高速化に関する研究，情報処理学会第88回全国大会，2026年3月．
阪口修吾，星野哲也，河合直聡，椋木大地，片桐孝洋，ICTCG法の計算プログラムのGPU移植を対象としたコード生成AIの比較，情報処理学会第88回全国大会，2026年3月．
末永和也，樫村寛大，森崎修司，椋木大地，星野哲也，片桐孝洋，LAPACK固有値ソルバのテストケース最適化のための行列積演算のバグ混入の影響調査，情報処理学会第88回全国大会，2026年3月．
坂倉耕太，中野達也，望月祐志，大島聡史，椋木大地，星野哲也，片桐孝洋，量子化学計算プログラム ABINIT-MPにおけるMP2計算のGPU高速化，情報処理学会第88回全国大会，2026年3月．
Ryo Mikasa, Shun-ichiro Hayashi, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri, Improving HPC Code Generation Capability of LLMs via Online Reinforcement Learning with Real-Machine Benchmark Rewards, arXiv preprint arXiv:2602.12049, 2026.
Xuanzhengbo Ren, Yuta Kawai, Tetsuya Hoshino, Hirofumi Tomita, Takahiro Katagiri, Daichi Mukunoki, Seiya Nishizawa, Learning-Augmented Performance Model for Tensor Product Factorization in High-Order FEM, arXiv preprint arXiv:2601.06886, 2026.
椋木大地，森田光貴，林俊一郎，三笠諒，星野哲也，片桐孝洋，gpt-oss-120bを用いたコード自動最適化マルチエージェントシステムの試作，第255回システム・アーキテクチャ・第202回ハイパフォーマンスコンピューティング合同研究発表会，2025年12月15日．
林俊一郎，森田光貴，椋木大地，星野哲也，片桐孝洋．VibeCodeHPC：自動コード最適化CLI型マルチエージェント，第255回システム・アーキテクチャ・第202回ハイパフォーマンスコンピューティング合同研究発表会，2025年12月15日．
村上魁，長島令旺，中村暁，松崎竜之介，吉井一友，椋木大地，宮島敬明，csDF：Cerebras CS-2向け疑似倍精度浮動小数点演算ライブラリの実装，第255回システム・アーキテクチャ・第202回ハイパフォーマンスコンピューティング合同研究発表会，2025年12月15日．
Daichi Mukunoki, Katsuhisa Ozaki, Sparse Iterative Solvers Using High-Precision Arithmetic with Quasi Multi-Word Algorithms, arXiv preprint arXiv:2510.13536, 2025.
Shun-ichiro Hayashi, Daichi Mukunoki, Tetsuya Hoshino, Satoshi Ohshima, Takahiro Katagiri, 3Dify: a Framework for Procedural 3D-CG Generation Assisted by LLMs Using MCP and RAG, arXiv preprint arXiv:2510.04536, October 2025.
Shun-ichiro Hayashi, Koki Morita, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri, VibeCodeHPC: An Agent-Based Iterative Prompting Auto-Tuner for HPC Code Generation Using LLMs, arXiv preprint arXiv:2510.00031, September 2025.
樫村寛大，片桐孝洋，森崎修司，椋木大地，星野哲也，機械学習によるLAPACK固有値計算ルーチンのテストシーケンス最適化の試行，第201回ハイパフォーマンスコンピューティング研究発表会，2025年9月29日（in Japanese）．
星野哲也，林俊一郎，椋木大地，片桐孝洋，塙敏博，GeoFEMを対象としたClaudeCodeによるGPUコード開発の評価，第201回ハイパフォーマンスコンピューティング研究発表会，2025年9月29日（in Japanese）．
片桐孝洋，林俊一郎，椋木大地，星野哲也，大島聡史，HPC-GENIE: LLMを利用したHPCコード自動生成プロジェクトー概要とケーススタディー，日本応用数理学会 2025年度年会講演予稿集，2025年9月 (in Japanese)．
椋木大地，林俊一郎，AI向け低精度演算の動向と尾崎スキームによるFP64演算への適用可能性の検討，日本応用数理学会 2025年度年会講演予稿集，2025年9月 (in Japanese)．
松崎竜之介，椋木大地，宮島敬明，Cerebras CS-2における疑似倍精度行列積，日本応用数理学会 2025年度年会講演予稿集，2025年9月 (in Japanese)．
椋木大地，林俊一郎，星野哲也，片桐孝洋，BLASコードを題材としたGPTモデルによる数値計算コード実装支援に関する考察，第200回ハイパフォーマンスコンピューティング研究発表会（SWoPP2025），2025年8月4日．
林俊一郎，椋木大地，大島聡史，片桐孝洋，星野哲也，MCP・RAGを用いたプロシージャル3D生成LLMエージェント3Difyの提案とスパコンの利用，第200回ハイパフォーマンスコンピューティング研究発表会（SWoPP2025），2025年8月4日．
Daichi Mukunoki, DGEMM without FP64 Arithmetic - using FP64 Emulation and FP8 Tensor Cores with Ozaki Scheme, arXiv preprint arXiv:2508.00441, 2025.
Tatsuro Hanyu, Takahiro Katagiri, Daichi Mukunoki, Tetsuya Hoshino, Towards Generalized Parameter Tuning in Coherent Ising Machines: A Portfolio-Based Approach, arXiv preprint arXiv:2507.20295, 2025.
Daichi Mukunoki, Shun-ichiro Hayashi, Tetsuya Hoshino, Takahiro Katagiri, Performance Evaluation of General Purpose Large Language Models for Basic Linear Algebra Subprograms Code Generation, arXiv preprint arXiv:2507.04697, 2025.
羽生達郎, 片桐孝洋, 森下誠, 高橋一郎, 河合直聡, 椋木大地, 星野哲也, 永井亨, コヒーレントイジングマシンにおけるパラメタチューニングへのATの適用, 計算工学講演会論文集, Vol. 30, No. E-01-03, 2025年6月 (in Japanese).
中谷崇真, 河合直聡, 片桐孝洋, 星野哲也, 永井亨, 椋木大地, 疎行列反復解法の深層学習を用いた実行時間予測モデル構築と評価, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2025-HPC-199, No. 3, pp. 1-8, 2025年5月 (in Japanese).
羽生達郎, 森下誠, 水木直也, 片桐孝洋, 椋木大地, 河合直聡, 星野哲也, 永井亨, コヒーレントイジングマシンの性能パラメタ最適化のための探索アルゴリズム選択可能な手法の提案, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2025-HPC-198, No. 37, pp. 1-10, 2025年3月 (in Japanese).
水木直也, 森下誠, 河合直聡, 片桐孝洋, 椋木大地, 星野哲也, 永井亨, SVMによる誤差を含むクラス分類における多種疑似量子アニーラの性能評価, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2025-HPC-198, No. 34, pp. 1-8, 2025年3月 (in Japanese).
椋木大地, 尾崎克久, Quasi Triple-Word Arithmeticによる6倍精度演算の疎行列反復解法への応用, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2024-HPC-197, 2024-ARC-259, No. 11, pp. 1-7, 2024年12月 (in Japanese).
Stef Graillat, Fabienne Jézéquel, Théo Mary, Roméo Molina, Daichi Mukunoki, Performance Evaluation of Adaptive-Precision SpMV with Reduced-Precision Formats, HAL, hal-04261073, Oct. 2023.
椋木大地, 尾崎克久, 荻田武史, 今村俊幸, 尾崎スキームによる無限精度内積と再現可能疎行列反復ソルバーへの応用, 日本応用数理学会2022年度年会講演予稿集, Sep. 10, 2022 (in Japanese).
椋木大地, 廣田悠輔, 今村俊幸, CPUにおけるbatched BLASのためのタスクスケジューリング戦略, 日本応用数理学会2021年度年会講演予稿集, Sep. 7, 2021 (in Japanese).
原山赳幸, 工藤周平, 椋木大地, 今村俊幸, 高橋大介, オーバー・アンダーフローを抑えた高精度かつ高速な2ノルム計算手法, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2020-HPC-177, No. 8, pp. 1-9, 2020年12月 (in Japanese).
椋木大地, 尾崎克久, 荻田武史, 尾崎スキームを用いたbinary128による4倍精度行列積, 日本応用数理学会2020年度年会講演予稿集, Sep. 10, 2020 (in Japanese).
Roman Iakymchuk, Daichi Mukunoki, Artur Podobas, Fabienne Jézéquel, Toshiyuki Imamura, Norihisa Fujita, Jens Huthmann, Shuhei Kudo, Yiyu Tan, Jens Domke, Kai Torben Ohlhus, Takeshi Fukaya, Takeo Hoshi, Yuki Murakami, Maho Nakata, Takeshi Ogita, Kentaro Sano, Taisuke Boku, White Paper from Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020): HPC and Computer Arithmetic toward Minimal-Precision Computing, arXiv:2004.04628, hal-02536316, Apr. 2020.
Toshiyuki Imamura, Daichi Mukunoki, Fabienne Jézéquel, Stef Graillat, Roman Iakymchuk, Numerical Reproducibility based on Minimal-Precision Validation, Computational Reproducibility at Exascale Workshop (CRE2019), in cooperation with SC19, Nov. 17, 2019.
椋木大地, 荻田武史, 尾崎克久, 今村俊幸, 尾崎スキームによる高精度かつ再現性のあるBLAS実装, 日本応用数理学会2019年年会講演予稿集, pp. 402-403, 2019年9月 (in Japanese).
椋木大地, 荻田武史, 尾崎克久, Level-3 BLASに基づく高精度行列積計算法による高精度かつ再現性のあるBLASルーチンの実装とその最適化, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2018-HPC-166, No. 9, pp. 1-8, 2018年9月 (in Japanese).
椋木大地, 今村俊幸, 2.5次元アルゴリズムを用いた高性能PDGEMMの開発, 東京大学情報基盤センタースーパーコンピューティングニュース, Vol. 20, No. 4, pp. 31-36, 2018年7月 (in Japanese).
椋木大地, 今村俊幸, 京コンピュータにおける2.5次元アルゴリズムを用いた分散並列行列積の実装と評価, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2017-HPC-159, No. 1, pp. 1-6, 2017年4月 (in Japanese).
森倉悠介, 椋木大地, 深谷猛, 山中脩也, 大石進一, 大規模並列計算機における連立1次方程式の精度保証付き数値計算に対する性能評価, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2016-HPC-157, No. 1, pp. 1-7, 2016年12月 (in Japanese).
今村俊幸, 椋木大地, コンシューマレンジGPUに最適化した固有値ソルバーの実装と評価, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2016-HPC-157, No. 7, pp. 1-9, 2016年12月 (in Japanese).
椋木大地, 今村俊幸, 短尺浮動小数点形式の検討, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2015-HPC-152, No. 4, pp. 1-10, 2015年12月 (in Japanese).
佐々木信一, 菱沼利彰, 藤井昭宏, 田中輝雄, 椋木大地, 今村俊幸, 京・FX10における倍々精度演算の高速化, 情報処理学会研究報告, Vol. 2015-HPC-151, No. 15, pp. 1-7, 2015年9月 (in Japanese).
今村俊幸, 椋木大地, 山田進, 町田昌彦, SYMV・GEMVルーチン群のマルチGPU化とその評価, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2015-HPC-151, No. 13, pp. 1-8, 2015年9月 (in Japanese).
佐々成正, 山田進, 町田昌彦, 椋木大地, 今村俊幸, FFTを使った時間発展問題における累積誤差, 応用数理学会2015年度年会講演論文集, 2015年9月 (in Japanese).
椋木大地, 今村俊幸, 高橋大介, NVIDIA GPUにおけるメモリ律速なBLASカーネルのスレッド数自動選択手法, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2015-HPC-150, No. 13, pp. 1-13, 2015年7月 (in Japanese) (情報処理学会 2016年度山下記念研究賞).
椋木大地, 今村俊幸, 高橋大介, NVIDIA GPUにおけるGEMVカーネルの自動チューニング, 計算工学講演会論文集, Vol. 20, No. E-2-1, 2015年6月 (in Japanese).
今村俊幸, 椋木大地, 山田進, 町田昌彦, CUDA-BLAS等の選択による最速GPU固有値ソルバーの性能評価, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2015-HPC-148, No. 4, pp. 1-9, 2015年2月 (in Japanese).
椋木大地, 今村俊幸, MaxwellアーキテクチャGPUにおける疑似倍精度演算を用いたDGEMMの実装と評価, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2014-HPC-147, No. 26, pp. 1-6, 2014年12月 (in Japanese).
今村俊幸, 椋木大地, 山田進, 町田昌彦, CUDA-xSYMVの実装と評価, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2014-HPC-146, No. 14, pp. 1-12, 2014年10月 (in Japanese).
椋木大地, 高橋大介, GPUにおける4倍精度浮動小数点演算を用いたクリロフ部分空間法の高速化, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2013-HPC-140, No. 35, pp. 1-7, 2013年7月 (in Japanese).
椋木大地, 高橋大介, GPUにおける高速なCRS形式疎行列ベクトル積の実装, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2013-HPC-138, No. 5, pp. 1-7, 2013年2月 (in Japanese) (情報処理学会 2013年度コンピュータサイエンス領域奨励賞).
椋木大地, 高橋大介, GPUにおける4倍精度演算を用いた疎行列反復解法の実装と評価, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2012-HPC-137 (2012-ARC-202), No. 37, pp. 1-8, 2012年12月 (in Japanese) (情報処理学会計算機アーキテクチャ研究会 2012年度若手奨励賞).
椋木大地, 高橋大介, GPUによる3倍精度浮動小数点演算の検討, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2011-HPC-132 (2011-ARC-197), No. 23, pp. 1-9, 2011年11月 (in Japanese).
椋木大地, 高橋大介, GPUによる4倍精度BLASの実装と評価, 計算工学講演会論文集, Vol. 15, No. 2, pp. 891-894, 2010年5月 (in Japanese).
椋木大地, 高橋大介, GPUによる4倍精度BLASの実装と評価, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2009-HPC-123 (2009-ARC-186), No. 13, pp. 1-6, 2009年11月 (in Japanese).

Peer-reviewed Poster Presentations / 査読付ポスター発表

Daichi Mukunoki, DGEMM using FP64 Arithmetic Emulation and FP8 Tensor Cores with Ozaki Scheme, the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), poster session, Jan. 2026.
Shun-Ichiro Hayashi, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri, VibeCodeHPC: A Multi-LLM Agent Auto-Tuner for HPC Codes, the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), poster session, Jan. 2026.
Koki Isobe, Daichi Mukunoki, Masahiro Oda, Tetsuya Hoshino, Kensaku Mori, Takahiro Katagiri, GPU Acceleration of Medical Image Representation Learning Models with Distributed Data Parallel and I/O Optimization, the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), poster session, Jan. 2026.
Hiroto Kashimura, Takahiro Katagiri, Shuji Morisaki, Daichi Mukunoki, Tetsuya Hoshino, A Trial on Optimizing Test Sequences for LAPACK Eigenvalue Computation Routines using Machine Learning, the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), poster session, Jan. 2026.
Takanori Kotama, Rio Yokota, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri, Proposal of The AI Scientist v2 for High Performance Computing with Local Large Language Models, the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), poster session, Jan. 2026.
Ryo Mikasa, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri, A Multi Agent System for Local LLM-Based HPC Code Generation, the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), poster session, Jan. 2026.
Naoya Mizuki, Takahiro Katagiri, Daichi Mukunoki, Tetsuya Hoshino, Performance Evaluation of SVM with Multiple Quantum-inspired Annealers, the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), poster session, Jan. 2026.
Koki Morita, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri, Evaluation of the Capability of Coding AI in Generating SYCL-Based Numerical Computation Codes for Intel GPUs, the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), poster session, Jan. 2026.
Takamasa Nakaya, Takahiro Katagiri, Tetsuya Hoshino, Daichi Mukunoki, Masatoshi Kawai, Verification of the Effectiveness of Deep Learning in Preprocessing Parameter Estimation for the Conjugate Gradient Method, the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), poster session, Jan. 2026.
Reo Nagashima, Akeru Nakamura, Kai Murakami, Ryunosuke Matsuzaki, Daichi Mukunoki, Takaaki Miyajima, csDF: a double-float arithmetic library for the Cerebras CS-2, SC25 research poster session, Nov. 16-21, 2025.
Atsushi Suzuki, Daichi Mukunoki, Toshiyuki Imamura, tmBLAS: a Mixed Precision BLAS by C++ Template, ISC High Performance (ISC 2023), research poster session, May 2023.
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, A Fast Infinite Precision Inner Product using Ozaki Scheme and Dot2, and Its Application to Reproducible Conjugate Gradient Solvers, ISC High Performance (ISC 2022), research poster session, Jun. 1, 2022 (Research Poster Award 2nd Place Winner).
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Accurate Matrix Multiplication on Binary128 using Ozaki Scheme, ISC High Performance (ISC 2021), research poster session, Jun. 29, 2021 (Research Poster Award).
Roman Iakymchuk, Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki, Stef Graillat, Accurate and Reproducible Conjugate Gradient in Hybrid Parallel Environments, ISC High Performance (ISC 2021), Jun. 29, 2021.
Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Fabienne Jézéquel, Stef Graillat, Roman Iakymchuk, Norihisa Fujita, Taisuke Boku, Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations, SC19 research poster session, Nov. 19-21, 2019.
Yusuke Hirota, Daichi Mukunoki, Toshiyuki Imamura, Automatic Generation of Full-Set Batched BLAS, ISC High Performance (ISC 2018), research poster session, Jun. 26, 2018.
Daichi Mukunoki, Toshiyuki Imamura, Implementation and Evaluation of 2.5D Matrix Multiplication on K Computer, ISC High Performance (ISC 2017), research poster session, Jun. 20, 2017 (PRACE-ISC Research Poster Award 2017).

Poster Presentations (Non-reviewed) / ポスター発表（査読なし）

椋木大地，林俊一郎，星野哲也，森田光貴，片桐孝洋，LLMを用いた数値計算コードの自動生成・自動性能最適化への挑戦と展望，物性研究所ソフトウェア開発・高度化プロジェクト研究会〜計算物質科学の発展を支えるオープンソースソフトウェアの開発と普及，ポスター発表，2025年10月20日．
林俊一郎，森田光貴，椋木大地，星野哲也，片桐孝洋，LLMによるコード自動最適化「VibeCodeHPC」の開発状況と実験が示したマルチエージェントの優位性，物性研究所ソフトウェア開発・高度化プロジェクト研究会〜計算物質科学の発展を支えるオープンソースソフトウェアの開発と普及，ポスター発表，2025年10月20日．
湯淺義尚，小田昌宏，椋木大地，片桐孝洋，星野哲也，河合直聡，永井亨，森健策，GPU搭載スーパーコンピュータを用いたCOVID-19診断支援のための肺野セグメンテーションの高速化，第44回日本医用画像工学会大会（JAMIT 2025），2025年8月28日〜30日（in Japanese）．
林俊一郎，椋木大地，星野哲也，片桐孝洋，HPC-GENIE: High-Performance Computing with Generative Neural Intelligence for Execution，xSIG 2025，ポスター発表，2025年8月（発表者の林さんがPoster Award受賞）．
Daichi Mukunoki, Atsushi Suzuki, Toshiyuki Imamura, Multiple and Mixed Precision BLAS with C++ Template, 5th R-CCS International Symposium, Feb. 6, 2023.
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Accurate Matrix Computations using Ozaki Scheme on CPUs and GPUs, The 30th Anniversary Symposium of the Center for Computational Sciences at the University of Tsukuba, Oct. 14, 2022.
Daichi Mukunoki, Roman Iakymchuk, Fabienne Jézéquel, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Remedies for Reproducibility Issue in Conjugate Gradient Solvers, SparseDays2022, poster session, Jun. 20-22, 2022.
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Roman Iakymchuk, High-Precision, Accurate, and Reproducible Linear Algebra Operations using Ozaki Scheme, 3rd R-CCS International Symposium, Feb. 15, 2021.
Toshiyuki Imamura, Daichi Mukunoki, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Fabienne Jézéquel, Stef Graillat, Roman Iakymchuk, Norihisa Fujita, Taisuke Boku, Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations, 2nd R-CCS International Symposium, Feb. 17, 2020.
Yiyu Tan, Toshiyuki Imamura, Daichi Mukunoki, An FPGA-based Matrix Multiplier with Task Parallelism, 2nd R-CCS International Symposium, Feb. 17, 2020.
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Accurate DGEMM using Tensor Cores, The International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2020), Jan. 15-17, 2020.
Roman Iakymchuk, Fabienne Jézéquel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Norihisa Fujita, Taisuke Boku, Optimizing Precision for High-Performance, Robust, and Energy-Efficient Computations, The International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2020), Jan. 15-17, 2020.
Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Fabienne Jézéquel, Stef Graillat, Roman Iakymchuk, Norihisa Fujita, Taisuke Boku, Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations, France-Japan-Germany trilateral workshop: Convergence of HPC and Data Science for Future Extreme Scale Intelligent Applications, Nov. 7, 2019.
Yiyu Tan, Daichi Mukunoki, Toshiyuki Imamura, Norihisa Fujita, Taisuke Boku, Reduced and Extended-Precision Computations on FPGAs and GPUs, The 11th symposium on Discovery, Fusion, Creation of New Knowledge by Multidisciplinary Computational Sciences, University of Tsukuba, Oct. 15, 2019.
Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki, Accurate and Reproducible Linear Algebra Operations for Many-core Architectures, Russian Supercomputing Days 2019 (RuSCDays 2019), Sep. 23-24, 2019 (Best Research Poster Award).
Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki, OzBLAS: Accurate and Reproducible BLAS Based on Ozaki Scheme, GPU Technology Conference (GTC 2019), Mar. 17-21, 2019.
Toshiyuki Imamura, Yusuke Hirota, Daichi Mukunoki, Shuhei Kudo, Akiyoshi Kuroda, Naoki Sueyasu, Development of Scientific Numerical Libraries on post-K computer, 1st R-CCS International Symposium, Feb. 18-19, 2019.
荻田武史, 椋木大地, 尾崎克久, HPC分野における精度保証付き数値計算学の展開, 第3回CDMSI（ポスト「京」重点課題（７））シンポジウム, 2017年12月5日 (in Japanese).
椋木大地, 今村俊幸, 高橋大介, PascalアーキテクチャGPUにおける線形計算カーネルの実装技術の検討, GTC Japan 2016, 2016年10月5日 (in Japanese).
大井祥栄, 廣田悠輔, 椋木大地, 今村俊幸, KMATHLIB -High Performance and Scalable Numerical Library for the K Computer-, 応用数理学会2016年度年会, 2016年9月13日 (in Japanese).
Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi, Introduction of Research Activities for GPU Computing at Large-scale Parallel Numerical Computing Technology Research Team on AICS, The 6th AICS International Symposium, Feb. 22, 2016.
Yusuke Morikura, Daichi Mukunoki, Takeshi Fukaya, Naoya Yamanaka, Shin'ichi Oishi, Performance Evaluation of Verified Computation for Linear Systems on Parallel Computers, 2nd Annual Meeting on Advanced Computing System and Infrastructure (ACSI2016), Jan. 19, 2016.
大井祥栄, 廣田悠輔, 椋木大地, 今村俊幸, 京コンピュータ向け数値計算ライブラリ群KMATHLIBの実装, 応用数理学会2015年度年会, 2015年9月9日 (in Japanese).
椋木大地, 今村俊幸, 高橋大介, GPUにおけるスレッド数自動選択機能を持ったメモリ律速な線形計算カーネル群「MUBLAS」の実装と評価, GTC Japan 2015, 2015年9月18日 (in Japanese).
Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi, High-Performance GEMV and SYMV with Auto-Tuning for Performance Stabilization on Multiple GPU Generations, GPU Technology Conference (GTC 2015), Mar. 17, 2015.
椋木大地, 今村俊幸, 高橋大介, Kepler・MaxwellアーキテクチャGPUにおける性能が行列形状に依存しない高速なGEMVの実装, Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015論文集, 2015年1月26日 (extended abstract in conference proceedings) (in Japanese).
佐々木信一, 藤井昭宏, 田中輝雄, 椋木大地, 今村俊幸, スーパコンピュータ京における倍々精度演算の高速化, Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015論文集, 2015年1月26日 (extended abstract in conference proceedings) (in Japanese).
今村俊幸, 椋木大地, 佐々成正, 山田進, 町田昌彦, 疑似四倍精度拡張数学パッケージQP-Pack, Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015論文集, 2015年1月26日 (extended abstract in conference proceedings) (in Japanese).
椋木大地, 今村俊幸, 高橋大介, KeplerアーキテクチャGPUにおける高速なSGEMVの実装, GTC Japan 2014, 2014年7月16日 (in Japanese).
Daichi Mukunoki, Daisuke Takahashi, Linear Algebra Operations using Quadruple-precision Arithmetic on GPU, GPU Technology Conference (GTC2014), Mar. 24, 2014.
Daichi Mukunoki, Daisuke Takahashi, Performance Comparison of Double, Triple and Quadruple Precision Real and Complex BLAS Subroutines on GPUs, Proc. ATIP/A*CRC Workshop on Accelerator Technologies for High-Performance Computing: Does Asia Lead the Way? (ATIP/A*CRC Workshop '12), pp. 788-790, May 7, 2012 (extended abstract in conference proceedings).

Oral Presentations / 口頭発表

Daichi Mukunoki, Koki Morita, Shun-ichiro Hayashi, Tetsuya Hoshino, Takahiro Katagiri, Toward Automatic Generation of High Performance Numerical Codes by LLMs, SIAM Conference on Parallel Processing for Scientific Computing (PP26), Berlin, Germany, March 2026.
椋木大地，高性能計算のためのコード生成AIエージェント開発，MateriAI 2025 〜計算物質科学分野におけるAI技術の活用，2026年2月2日．
椋木大地，生成AIの活用によるHPCコードGPU化の展望，「次世代計算基盤を見据えたソフトウェア環境整備とそれを担う人材の育成に関する提言」についての意見交換会，2026年1月21日．
椋木大地，生成AIによるHPCコード開発の革新に向けて：HPC-GENIEプロジェクトの取り組みと展望，情報処理学会東海支部主催第6回講演会，2026年1月9日．
椋木大地，AI時代のハードウェアとFP64エミュレーション，第17回自動チューニング技術の現状と応用に関するシンポジウム（ATTA2025），2025年12月23日．
Daichi Mukunoki, Automatic Generation of Numerical Codes for GPUs Using LLMs, JHPCN Field Workshop: State-of-the-Art in Code Generative AI for High-Performance Computing, Dec. 5, 2025.
Daichi Mukunoki, Automatic Generation and GPU Porting of Numerical Computation Codes Using Generative AI, 58th ASE Seminar, Dec. 1, 2025.
林俊一郎，椋木大地，生成AIを活用した数値計算・HPCコード自動生成への挑戦と展望，2025年度第2回物性アプリオープンフォーラム，2025年9月29日（in Japanese）．
Daichi Mukunoki, Challenges and Prospects in Automatic Generation of HPC Codes Using Generative AI, The 6th "FugakuNEXT" Application Seminar, Sep. 25, 2025.
椋木大地，汎用LLMによるBLASコード自動生成能力の考察，第6回スーパーコンピュータ「不老」ユーザ会，2025年9月11日（in Japanese）．
椋木大地，LLMによるBLASコード生成に関する考察，第33回AT研究会オープンアカデミックセッション（ATOS33），2025年7月28日（in Japanese）．
Daichi Mukunoki, Masatoshi Kawai, Toshiyuki Imamura, Reduced-Precision Data Representation on Sparse Matrix-Vector Multiplications, 10th International Congress on Industrial and Applied Mathematics (ICIAM 2023), Aug. 21, 2023.
Toshiyuki Imamura, Daichi Mukunoki, Atsushi Suzuki, Multiple- and Mixed-Precision BLAS with C++ Template, 10th International Congress on Industrial and Applied Mathematics (ICIAM 2023), Aug. 24, 2023.
椋木大地, 河合直聡, 疎行列ベクトル積における低精度データ表現の導入について, 第14回自動チューニング技術の現状と応用に関するシンポジウム（ATTA2022）, Dec. 23, 2022 (in Japanese).
Kensuke Aihara, Katsuhisa Ozaki, Daichi Mukunoki, A mixed-precision algorithm of the CG method using the group-wise update strategy, The 41st JSST Annual International Conference on Simulation Technology (JSST2022), online, Aug. 31-Sep. 2, 2022.
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Roman Iakymchuk, Impact and Contribution of Ozaki scheme in High Performance Computing, International Workshop on Reliable Computing and Computer-Assisted Proofs (ReCAP 2022), online, Mar. 15, 2022.
相原研輔, 尾崎克久, 椋木大地, Flying restart付きCG法に対する混合精度演算による近似解精度の向上, 日本応用数理学会第18回研究部会連合発表会, online, Mar. 9, 2022 (in Japanese).
尾崎克久, 椋木大地, 荻田武史, 行列積に対する試行型エラーフリー変換に対する誤差の対処法とその応用, 日本応用数理学会第18回研究部会連合発表会, online, Mar. 8, 2022 (in Japanese).
Daichi Mukunoki, Yusuke Hirota, Toshiyuki Imamura, Performance Evaluation of Batched BLAS on A64FX, 4th R-CCS International Symposium (lightning talk), online, Feb. 7, 2022.
椋木大地, 精度自動チューニングに向けた基盤技術の検討, 第13回自動チューニング技術の現状と応用に関するシンポジウム (ATTA2021), online, Dec. 13, 2021 (in Japanese).
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, DGEMM using Tensor Cores, SIAM Conference on Computational Science and Engineering (CSE21), online, Mar. 4, 2021.
Fabienne Jézéquel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, Roman Iakymchuk, Fast rounding error estimation for compute-intensive operations using standard floating-point arithmetic, Rencontres Arithmétiques de l'Informatique Mathématique (RAIM), Paris, May 2021.
椋木大地, 尾崎克久, 荻田武史, binary128 に対する尾崎スキーム行列積, 第4回精度保証付き数値計算の実問題への応用研究集会 (NVR 2020), online, Nov. 28, 2020 (in Japanese).
Roman Iakymchuk, Daichi Mukunoki, Conjugate Gradient Solvers with Accuracy and Reproducibility Guarantees in Hybrid Parallel Environments, Sparse Days Cerfacs, online, Nov. 24, 2020.
Daichi Mukunoki, DGEMM using Tensor Cores and OzBLAS, 11th Joint Laboratory for Extreme Scale Computing (JLESC) Workshop, online, Sep. 8, 2020.
Daichi Mukunoki, Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations, SIAM Conference on Parallel Processing for Scientific Computing (PP20), Seattle, Feb. 15, 2020.
Daichi Mukunoki, Accurate BLAS implementations: OzBLAS and BLAS-DOT2, Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020 January), RIKEN R-CCS, Kobe, Jan. 30, 2020.
Daichi Mukunoki, Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations, Sapporo Winter HPC Seminar 2020, Information Initiative Center, Hokkaido University, Jan. 24, 2020.
Daichi Mukunoki, Takeshi Ogita, High-performance Implementations of Accurate Linear Algebra Kernels on GPUs, 3rd International Conference on Modern Mathematical Methods and High Performance Computing in Science & Technology (M3HPCST), Jan. 9-11, 2020.
椋木大地, 荻田武史, 尾崎克久, 尾崎スキームによる高精度BLAS実装「OzBLAS」とその応用, 第3回精度保証付き数値計算の実問題への応用研究集会 (NVR 2019), 高松市, 2019年12月1日 (in Japanese).
Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki, Accurate and Reproducible CG Method on GPUs, European Numerical Mathematics and Advanced Applications Conference 2019 (ENUMATH2019), Egmond aan Zee, Oct. 1, 2019.
Daichi Mukunoki, High-Performance Implementations of Accurate and Reproducible BLAS Routines on GPUs, Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2019 June), RIKEN R-CCS, Kobe, Jun. 7, 2019.
椋木大地, 尾崎スキームに基づく高精度かつ再現性のあるBLASルーチンの実装と自動チューニングの適用, 第22回AT研究会オープンアカデミックセッション（ATOS22）, 東京大学情報基盤センター, 東京都, May 13, 2019 (in Japanese).
椋木大地, 荻田武史, 尾崎克久, 尾崎スキームによる高精度かつ再現性のあるBLASルーチンの実装と評価, 第2回精度保証付き数値計算の実問題への応用研究集会 (NVR 2018), 広島市, Dec. 2, 2018 (in Japanese).
Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki, High Performance Implementation of Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme, Computational Reproducibility at Exascale 2018 (CRE2018), in cooperation with SC18, Dallas, Nov. 11, 2018.
Daichi Mukunoki, Takeshi Ogita, High Performance Implementation of Accurate Matrix Multiplications on GPUs, The 18th International Symposium on Scientific Computing, Computer Arithmetic, and Verified Numerical Computations (SCAN2018), The International Conference Center at Waseda University, Tokyo, Sep. 11, 2018.
Roman Iakymchuk, Pedro Valero-Lara, Daichi Mukunoki, Accurate and cost-efficient triangular solve, The 18th International Symposium on Scientific Computing, Computer Arithmetic, and Verified Numerical Computations (SCAN2018), The International Conference Center at Waseda University, Tokyo, Sep. 11, 2018.
Daichi Mukunoki, Roman Iakymchuk, Stef Graillat, Takeshi Ogita, High-performance implementations of reproducible and accurate matrix-multiplication, 10th International Workshop on Parallel Matrix Algorithms and Applications (PMAA18), ETH Zurich, Zurich, Jun. 27, 2018.
Daichi Mukunoki, Toshiyuki Imamura, Performance Analysis of 2.5D-PDGEMM on the K Computer, SIAM Conference on Parallel Processing for Scientific Computing (PP18), Waseda University, Tokyo, Mar. 8, 2018.
椋木大地, 次世代計算機のための数値計算ライブラリの実装技術, 日本応用数理学会三部会連携「応用数理セミナー」, 早稲田大学西早稲田キャンパス, 東京都, 2017年12月26日 (in Japanese).
椋木大地, 今村俊幸, Reduced-/Extended-precision BLASの実装方法の検討, Fifth Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2017), RIKEN AICS, 神戸市, 2017年3月27日 (in Japanese).
Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi, Implementation Techniques for High Performance BLAS Kernels on Modern GPUs, SIAM Conference on Computational Science and Engineering (CSE17), Hilton Atlanta, Atlanta, Feb. 28, 2017.
Yusuke Morikura, Daichi Mukunoki, Takeshi Fukaya, Naoya Yamanaka, Shin'ichi Oishi, Performance Evaluation of Verified Computation for Linear Systems on Supercomputer, SIAM: East Asian Section Conference (EASIAM 2016), University of Macau, Macau, Jun. 20-22, 2016.
Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi, Automatic Thread-Block Size Adjustment for Dense Matrix-Vector Multiplication on CUDA, 2016 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT2016), Mathematics Research Center, National Taiwan University, Taipei, Feb. 19, 2016 (Invited).
椋木大地, 高橋大介, GPUにおける3倍精度演算と4倍精度疎行列反復解法, 第3回多倍長精度計算フォーラム, 工学院大学, 東京都, 2013年3月8日 (in Japanese).
Daichi Mukunoki, Daisuke Takahashi, Iterative Method for Sparse Linear Systems using Quadruple Precision Operations on GPUs, SIAM Conference on Computational Science and Engineering (CSE13), The Westin Boston Waterfront, Boston, Massachusetts, Feb. 28, 2013.
椋木大地, 高橋大介, GPUによる4倍精度行列計算, 2011年並列／分散／協調処理に関する『鹿児島』サマー・ワークショップ（SWoPP鹿児島2011）, かごしま県民交流センター, 鹿児島市, 2011年7月27日 (in Japanese).

Awards / 受賞

Best Paper Award（Tetsuya Hoshino, Shun-Ichiro Hayashi, Daichi Mukunoki, Takahiro Katagiri, Toshihiro Hanawa, Evaluating Claude Code's Coding and Test Automation for GPU Acceleration of a Legacy Fortran Application: A GeoFEM Case Study, Proc. the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) - The 1st International Workshop on Foundational Large Language Models Advances for HPC in Asia (LLM4HPCAsia 2026), pp. 353–360, 2026において）．
Best Paper Award（Daichi Mukunoki, Masatoshi Kawai, Toshiyuki Imamura, Sparse Matrix-Vector Multiplication with Reduced-Precision Memory Accessor, Proc. 2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2023), pp. 608-615, 2023において）．
Research Poster Award 2nd Place Winner（Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, A Fast Infinite Precision Inner Product using Ozaki Scheme and Dot2, and Its Application to Reproducible Conjugate Gradient Solvers, ISC High Performance (ISC 2022), research poster session, Jun. 1, 2022において）．
2021年度理化学研究所桜舞賞（「Precision-awareな数値演算手法の研究」において）．
Research Poster Award（Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Accurate Matrix Multiplication on Binary128 using Ozaki Scheme, ISC High Performance (ISC 2021), research poster session, Jun. 29, 2021において）．
Best Research Poster Award（Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki, Accurate and Reproducible Linear Algebra Operations for Many-core Architectures, Russian Supercomputing Days 2019 (RuSCDays 2019), Sep. 23-24, 2019において）．
PRACE-ISC Research Poster Award 2017（Daichi Mukunoki, Toshiyuki Imamura, Implementation and Evaluation of 2.5D Matrix Multiplication on K Computer, ISC High Performance (ISC 2017), research poster session, Jun. 20, 2017において）．
情報処理学会 2016年度山下記念研究賞（椋木大地, 今村俊幸, 高橋大介, NVIDIA GPUにおけるメモリ律速なBLASカーネルのスレッド数自動選択手法, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2015-HPC-150, No. 13, pp. 1-13, 2015年7月において）．
情報処理学会 2013年度コンピュータサイエンス領域奨励賞（椋木大地, 高橋大介, GPUにおける高速なCRS形式疎行列ベクトル積の実装, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2013-HPC-138, No. 5, pp. 1-7, 2013年2月において）．
情報処理学会計算機アーキテクチャ研究会 2012年度若手奨励賞（椋木大地, 高橋大介, GPUにおける4倍精度演算を用いた疎行列反復解法の実装と評価, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2012-HPC-137 (2012-ARC-202), No. 37, pp. 1-8, 2012年12月において）．

Funding / 研究資金獲得状況

2026年4月 - 2027年3月：2026年度学際大規模情報基盤共同利用・共同研究拠点（JHPCN）公募型共同研究，jh260065，「AI時代のスーパーコンピュータに向けた混合精度数値計算法の探求」，代表者.
2025年8月 - 2027年7月：日本学術振興会科学研究費助成事業研究活動スタート支援, 25K24387，「AIスーパーコンピュータにおける科学技術計算加速のための高精度演算技術応用」, 2,730,000 円, 代表者.
2022年4月 - 2023年10月：日本学術振興会科学研究費助成事業国際共同研究加速基金（国際共同研究強化(A)）, 20KK0259，「次世代計算機のための高精度かつ精度検証可能な行列計算法の開発」, 9,230,000 円, 代表者.
2020年4月 - 2021年3月：学際大規模情報基盤共同利用・共同研究拠点公募型共同研究 (JHPCN), 公募型共同研究, jh220022, 「高性能かつ高信頼な数値計算手法とその応用」, 共同研究者（代表者：荻田武史）.
2019年4月 - 2022年3月：日本学術振興会科学研究費助成事業若手研究, 19K20286, 「超並列計算環境のための高精度かつ再現性のある行列計算ライブラリの開発」, 2,340,000 円, 代表者.
2019年4月 - 2022年3月：日本学術振興会科学研究費助成事業基盤研究(B), 19H04127, 「エクサ時代の非同期タスクを応用した高性能高次元数値線形代数の研究」, 17,420,000 円, 分担者（代表者：今村俊幸）.
2018年4月 - 2019年3月：学際大規模情報基盤共同利用・共同研究拠点公募型共同研究 (JHPCN), 萌芽型共同研究, EX18104, 「高性能かつ高信頼な数値計算手法とその応用」, 共同研究者（代表者：深谷猛）.
2016年4月 - 2019年3月：日本学術振興会科学研究費助成事業若手研究(B), 16K16062, 「高性能・省電力な計算のための短尺浮動小数点表現の検討」, 2,210,000 円, 代表者.
2015年4月 - 2018年3月：日本学術振興会科学研究費助成事業基盤研究(B), 15H02709, 「O(1億)コア環境におけるスケーラブルな数値計算ソフトウェアの理論と応用」, 18,330,000 円, 分担者（代表者：今村俊幸）.
2013年4月 - 2015年3月：日本学術振興会科学研究費助成事業特別研究員奨励費, 13J01290, 「GPUスパコンのための3倍・4倍精度線形演算ライブラリの開発に関する研究」, 2,070,000 円, 代表者.

Teaching / 教育活動

2025年10月 - 2025年11月：コンピュータ科学実験b, 名古屋大学情報学部コンピュータ科学科.
2025年10月22日：第103回お試しアカウント付きスーパーコンピュータ「不老」利用型講習会分散機械学習講習会，講師.
2025年10月3日：第100回お試しアカウント付きスーパーコンピュータ「不老」利用型講習会超初心者利用講習会，講師.
2025年5月16日：第88回・お試しアカウント付きスーパーコンピュータ「不老」利用型講習会〜超初心者利用，講師.
2025年4月 - 2025年5月：数値解析及び演習, 名古屋大学情報学部コンピュータ科学科.
2025年4月 - 2025年5月：コンピュータ科学実験a, 名古屋大学情報学部コンピュータ科学科.
2023年2月 - 2023年7月：Co-supervisor of Master 2 research internship, University of Perpignan Via Domitia (France).
2018年9月 - 2019年1月：情報処理技法（リテラシ）II, 東京女子大学.
その他：学内・学外共同研究による学生研究指導．インターンシップ研究指導．

Products / 成果物

以下にプロジェクト成果物のソフトウェアを載せる．

OzBLAS: Accurate and Reproducible BLAS based on Ozaki scheme, https://github.com/mukunoki/ozblas
MUBLAS (as a demonstration of automatic thread-block size determination on CUDA kernels), https://www.r-ccs.riken.jp/labs/lpnctrt/en/projects/mublas/
BLAS-DOT2: Higher-precision BLAS based on Dot2, http://www.math.twcu.ac.jp/ogita/post-k/results.html
GEMM-TC: S/DGEMM using Tensor Cores, http://www.math.twcu.ac.jp/ogita/post-k/results.html（merged into OzBLAS (https://github.com/mukunoki/ozblas)）
Semi-ScaLAPACK-Compatible 2.5D-PxGEMM based on SUMMA (SC-SUMMA-25D), https://www.r-ccs.riken.jp/labs/lpnctrt/projects/25dpdgemm/
Batched BLAS Generator, https://www.r-ccs.riken.jp/labs/lpnctrt/projects/batchedblas/
RpFp (reduced precision memory accessor), https://www.r-ccs.riken.jp/labs/lpnctrt/projects/rpfp/
etc.

Professional Activities / 学会活動

Program Committee Member, The 2nd International Workshop on Foundational Large Language Models Advances for HPC (LLM4HPC 2026) to be held in conjunction with ISC-HPC 2026, 2026.
Program Committee Member, The 28th Workshop on Advances in Parallel and Distributed Computational Models (APDCM2026) to be held in conjunction with IPDPS 2026, 2026.
Program Committee Member, The 16th International Conference on Parallel Processing & Applied Mathematics (PPAM 2026), 2026.
Poster Chair, The International Conference on High Performance Computing in Asia-Pacific Region 2026 (HPCAsia 2026), 2026.
研究推進委員, 自動チューニング研究会, 2025-.
副代表者, 2025年度RIMS共同研究 (公開型), 「数値解析が切り開く新たな情報社会〜データ駆動型から「富岳NEXT」〜」, 京都大学数理解析研究所, 2025.
Program Committee Member, The 15th International Conference on Parallel Processing & Applied Mathematics (PPAM 2024), 2024.
Program Chair, Special Session: Performance Optimization and Auto-Tuning of Software on Multicore/Manycore Systems (POAT 2023) (in conjunction with MCSoC-2023), 2023.
Program Committee Member, 2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2023), 2023.
Program Committee Member, The 24th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2023) (in conjunction with IPDPS 2023), 2023.
Mini-Symposium Organizer, Mini Symposium: Exploring Arithmetic and Data Representation Beyond the Standard in HPC (at ICIAM 2023), 2023.
Program Committee Member, The 22nd International Conference on Computational Science (ICCS 2022), 2022.
Program Chair, Special Session: Auto-Tuning for Multicore and GPU (ATMG2022) (in conjunction with MCSoC-2022), 2022.
Program Committee Member, The 14th International Conference on Parallel Processing & Applied Mathematics (PPAM 2022), 2022.
Program Committee Member (Algorithm track), 36th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2022), 2022.
Publicity Chair, The International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2022), 2022.
Program Committee Member, IEEE 22nd International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2021) (in conjunction with IPDPS 2021), 2021.
幹事（交流促進委員会）, 自動チューニング研究会, 2021-2023.
Research Poster Committee Member, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC20), 2020.
Program Committee Member, The 21st IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2020) (in conjunction with IPDPS 2020), 2020.
Program Committee Member, Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020 January), 2020.
編集委員, 情報処理学会論文誌コンピューティングシステム, 2020-2025.
Program Committee Member, 2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2019), 2019.
Program Committee Member, The 20th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2019) (in conjunction with IPDPS 2019), 2019.
Program Committee Member, The 4th International Workshop on GPU Computing and AI (GCA'19) (in conjunction with CANDAR'19), 2019.
Program Committee Member, The Fourteenth International Workshop on Automatic Performance Tuning (iWAPT2019) (in conjunction with IPDPS 2019), 2019.
Program Committee Member, 2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2018), 2018.
Program Committee Member, The Third International Workshop on GPU Computing and AI (GCA'18) (in conjunction with CANDAR'18), 2018.
Program Committee Member, The 19th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2018) (in conjunction with IPDPS 2018), 2018.
Program Committee Member, The Thirteenth International Workshop on Automatic Performance Tuning (iWAPT2018) (in conjunction with IPDPS 2018), 2018.
Program Committee Member, Special Session: Auto-Tuning for Multicore and GPU (ATMG 2018) (in conjunction with MCSoC-2018), 2018.
Mini-Symposium Organizer, Mini Symposium: Development of Numerical Computing Software on Emerging Computing Platforms (at SIAM PP 18), 2018.
Program Committee Member, The 18th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2017) (in conjunction with IPDPS 2017), 2017.
Program Committee Member, The Second International Workshop on GPU Computing and AI (GCA'17) (in conjunction with CANDAR'17), 2017.
Program Committee Member, The Twelfth International Workshop on Automatic Performance Tuning (iWAPT2017) (in conjunction with IPDPS 2017), 2017.
Program Committee Member, Special Session: Auto-Tuning for Multicore and GPU (ATMG 2017) (in conjunction with MCSoC-17), 2017.
Program Committee Member, The 17th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2016) (in conjunction with IPDPS 2016), 2016.
Program Committee Member, The First International Workshop on GPU Computing and Applications (GCA'16) (in conjunction with CANDAR'16), 2016.
Program Committee Member, The 16th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2015) (in conjunction with IPDPS 2015), 2015.
Program Committee Member, The 15th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2014) (in conjunction with IPDPS 2014), 2014.