Yuto Imai

Multimodal AI Researcher & Machine Learning Engineer
📧 ytim8812@keio.jp
🌐 GitHub | Personal Site

Education

Keio University - Graduate School of Science and Technology
Center for Information and Computer Science
Apr 2024 - Present (Master’s Program)
Hiyoshi, Yokohama City, Japan

Keio University - Department of Information and Computer Science
Bachelor’s Degree
Apr 2020 - Mar 2024
Hiyoshi, Yokohama City, Japan

Research Focus

Multimodal AI & Computer Vision

Evaluation of MLLMs
Omnidirectional vision and language benchmark
Cross-modal retrieval and ranking systems
Referring expression comprehension
Vision-language model applications in robotics
Large-scale indoor object search engines

Publications & Research

Journal Articles

R. Korekata, K. Kaneda, S. Nagashima, Y. Imai, and K. Sugiura, “DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions”, Advanced Robotics, Vol. 39, Issue 5, pp. 243-258, 2025.

Conference Presentations

2025

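Yuto Imai, Mariko Isogawa: “OmniOpenVQA: Constructing an Open-Ended VQA Dataset with Omnidirectional Image Input”, Meeting on Image Recognition and Understanding (MIRU) 2025, 2025 (in Japanese).
Kento Tokura, Miyu Goko, Kanon Amemiya, Daichi Yashima, Kei Katsumata, Yuto Imai, Takumi Komatsu, Ryosuke Korekata, Komei Sugiura: “Multimodal Retrieval Based on Crosslingual Visual Prompts Considering Scene Text”, Meeting on Image Recognition and Understanding (MIRU) 2025, 2025 (in Japanese).
Kento Tokura, Ryosuke Korekata, Takumi Komatsu, Yuto Imai, Komei Sugiura: “Everyday Object Retrieval from Images Containing Text Based on Crosslingual Visual Prompts”, 2025 Annual Conference of the Japanese Society for Artificial Intelligence (JSAI), 1Win4-52, 2025 (in Japanese).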

2024

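Yuto Imai, Ryosuke Korekata, Komei Sugiura: “Object Retrieval in Large-Scale Indoor Environments Based on Multimodal LLMs Using Dense Text”, 42nd Annual Conference of the Robotics Society of Japan (RSJ), 2024 (in Japanese).
Yuto Imai, Kanta Kaneda, Ryosuke Korekata, Komei Sugiura: “A Large-Scale Indoor Search Engine Using Multimodal Foundation Models and a Relaxed Contrastive Loss”, 38th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI), May 2024 (in Japanese).
Ryosuke Korekata, Kanta Kaneda, Shunya Nagashima, Yuto Imai, Komei Sugiura: “Object Manipulation by Domestic Service Robots Based on a Multimodal Retrieval Model with a Switching Mechanism Using Large Language Models”, 38th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI), 2024 (in Japanese).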

2023

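Yuto Imai, Yui Iioka, Shumpei Hatanaka, Katsuyuki Kuyo, Komei Sugiura: “Referring Expression Segmentation of Target Objects Based on Multimodal Foundation Models and Diffusion Models”, 41st Annual Conference of the Robotics Society of Japan (RSJ), 2K1-06, 2023 (in Japanese).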

Poster Presentations

Ryosuke Korekata, Kanta Kaneda, Shunya Nagashima, Yuto Imai, Komei Sugiura: “Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions”, 2024 IEEE International Conference on Robotics and Automation (Late Breaking Results Poster), Yokohama, Japan, May 2024

Professional Experience

SB Intuitions Corp.

Research Intern (Aug 2024 - Present)

Development of Japanese-specialized vision-language models
Performance evaluation of large-scale multimodal models
Technical blog contributions on model evaluation and benchmarking

Elith Inc.

Machine Learning Engineer Intern (Dec 2023 - Mar 2024)

Applied machine learning solutions for real-world applications

pluszero Inc.

Machine Learning Engineer Intern (Apr 2023 - Mar 2024)

Developed and deployed machine learning models for production environments

Awards & Recognition

Research Awards

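5th Outstanding Research and Technology Award (優秀研究・技術賞), Robotics Society of Japan, Sep 2024
- Selected from 800+ research presentations for exceptional contribution in usefulness, originality, novelty, and technical excellence
2024 Annual Conference Excellence Award (全国大会優秀賞), Japanese Society for Artificial Intelligence, Nov 2024
- Selected from 900+ research presentations as one of the top 3% for overall excellence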

Competition Achievements

Winner, DialFRED Challenge @ CVPR 2023
- Team project: “DialMAT: Dialogue-Enabled Transformer with Moment-based Adversarial Training”
- Collaboration with Kanta Kaneda, Ryosuke Korekata, and team members

Scholarships

Keio University An Encouragement of Learning Scholarship: ¥300,000 (2021-2024)
Keio University An Encouragement of Learning Scholarship: ¥600,000 (2020)
Keio University Iji-kai Scholarship: ¥800,000 (2020)

Teaching Experience

Keio AI and Advanced Programming Consortium

Advanced Machine Learning Course Instructor (Apr 2023 - Dec 2023)

Taught diffusion model theory and applications to 4th-year undergraduate students
Led reading groups and practical sessions

Yokohama Science Frontier High School

Science Literacy I Guest Lecturer (2023)

Delivered specialized lectures to high school students on AI and machine learning concepts

Technical Skills

Programming & Frameworks

Python - Advanced (PyTorch, TensorFlow, scikit-learn)
Machine Learning: Computer Vision, NLP, Multimodal AI
Research Tools: Jupyter, MLflow, Weights & Biases
Development: Git, Docker, Linux

Specializations

Multimodal AI: Vision-language models, cross-modal retrieval
Computer Vision: Object detection, segmentation, visual reasoning
Natural Language Processing: Text-image understanding, instruction following
Robotics: Indoor navigation, object manipulation planning

Publications & Media

Technical Blogs

Book Reviews

Public review of 『ゼロから作るDeep Learning⑤ -生成モデル編』 (Deep Learning from Scratch 5: Generative Models), Apr 2024

Creative Activities

Theater & Performance

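劇団二進数 (Gekidan Nishinsu), 7th production: 『死して尚、生きてナオ』- Performer (entry in the Sato Sakichi Theater Festival 2024)
劇団二進数 (Gekidan Nishinsu), 6th production: 『有象無象³』- Audio Staff
劇団二進数 (Gekidan Nishinsu): 『脇役人生の転機』- Audio Staff (winner of the Audience Award and the Judges' Encouragement Award at the 8th National Student Theater Festival)
創像工房in front of.: 『大海を知るかもめ』- Chief Audio Engineer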

Languages

Japanese - Native
English - TOEIC 750 (Business level reading/writing, conversational speaking)

Last Updated: June 2025