Resume
Yuto Imai
Multimodal AI Researcher & Machine Learning Engineer
📧 ytim8812@keio.jp
🌐 GitHub | Personal Site
Education
Keio University - Graduate School of Science and Technology
Center for Information and Computer Science
Apr 2024 - Present (Master’s Program)
Hiyoshi, Yokohama City, Japan
Keio University - Department of Information and Computer Science
Bachelor’s Degree
Apr 2020 - Mar 2024
Hiyoshi, Yokohama City, Japan
Research Focus
Multimodal AI & Computer Vision
- Evaluation of MLLMs
- Omnidirectional vision and language benchmark
- Cross-modal retrieval and ranking systems
- Referring expression comprehension
- Vision-language model applications in robotics
- Large-scale indoor object search engines
Publications & Research
Journal Articles
- R. Korekata, K. Kaneda, S. Nagashima, Y. Imai, and K. Sugiura, “DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions”, Advanced Robotics, Vol. 39, Issue 5, pp. 243-258, 2025.
Conference Presentations
2025
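- Yuto Imai, Mariko Isogawa: “OmniOpenVQA: Constructing an Open-Ended VQA Dataset with Omnidirectional Images as Input”, Meeting on Image Recognition and Understanding (MIRU) 2025, 2025.
- Kento Tokura, Miyu Goko, Kanon Amemiya, Daichi Yashima, Kei Katsumata, Yuto Imai, Takumi Komatsu, Ryosuke Korekata, Komei Sugiura: “Multimodal Retrieval Based on Crosslingual Visual Prompts Considering Scene Text”, Meeting on Image Recognition and Understanding (MIRU) 2025, 2025.
- Kento Tokura, Ryosuke Korekata, Takumi Komatsu, Yuto Imai, Komei Sugiura: “Retrieval of Everyday Objects from Images with Text Based on Crosslingual Visual Prompts”, 2025 Annual Conference of the Japanese Society for Artificial Intelligence, 1Win4-52, 2025.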
2024
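- Yuto Imai, Ryosuke Korekata, Komei Sugiura: “Object Retrieval in Large-Scale Indoor Environments Based on Multimodal LLMs Using Dense Text”, 42nd Annual Conference of the Robotics Society of Japan, 2024.
- Yuto Imai, Kanta Kaneda, Ryosuke Korekata, Komei Sugiura: “Large-Scale Indoor Search Engine Using Multimodal Foundation Models and a Relaxed Contrastive Loss”, 38th Annual Conference of the Japanese Society for Artificial Intelligence, May 2024.
- Ryosuke Korekata, Kanta Kaneda, Shunya Nagashima, Yuto Imai, Komei Sugiura: “Object Manipulation by Domestic Service Robots Based on a Multimodal Retrieval Model with a Switching Mechanism Using Large Language Models”, 38th Annual Conference of the Japanese Society for Artificial Intelligence, 2024.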
2023
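- Yuto Imai, Yui Iioka, Shumpei Hatanaka, Katsuyuki Kuyo, Komei Sugiura: “Referring Expression Segmentation of Target Objects Based on Multimodal Foundation Models and Diffusion Models”, 41st Annual Conference of the Robotics Society of Japan, 2K1-06, 2023.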
Poster Presentations
- Ryosuke Korekata, Kanta Kaneda, Shunya Nagashima, Yuto Imai, Komei Sugiura: “Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions”, 2024 IEEE International Conference on Robotics and Automation (Late Breaking Results Poster), Yokohama, Japan, May 2024
Professional Experience
SB Intuitions Corp.
Research Intern (Aug 2024 - Present)
- Development of Japanese-specialized vision-language models
- Performance evaluation of large-scale multimodal models
- Technical blog contributions on model evaluation and benchmarking
Elith Inc.
Machine Learning Engineer Intern (Dec 2023 - Mar 2024)
- Applied machine learning solutions for real-world applications
pluszero Inc.
Machine Learning Engineer Intern (Apr 2023 - Mar 2024)
- Developed and deployed machine learning models for production environments
Awards & Recognition
Research Awards
- 5th Outstanding Research and Technology Award (優秀研究・技術賞), Robotics Society of Japan, Sep 2024
- Selected from 800+ research presentations for exceptional contribution in usefulness, originality, novelty, and technical excellence
- 2024 Annual Conference Award (全国大会優秀賞), Japanese Society for Artificial Intelligence, Nov 2024
- Selected from 900+ research presentations as one of the top 3% for overall excellence
Competition Achievements
- Winner, DialFRED Challenge @ CVPR 2023
- Team project: “DialMAT: Dialogue-Enabled Transformer with Moment-based Adversarial Training”
- Collaboration with Kanta Kaneda, Ryosuke Korekata, and team members
Scholarships
- Keio University An Encouragement of Learning Scholarship: ¥300,000 (2021-2024)
- Keio University An Encouragement of Learning Scholarship: ¥600,000 (2020)
- Keio University Iji-kai Scholarship: ¥800,000 (2020)
Teaching Experience
Keio AI and Advanced Programming Consortium
Advanced Machine Learning Course Instructor (Apr 2023 - Dec 2023)
- Taught diffusion model theory and applications to 4th-year undergraduate students
- Led reading groups and practical sessions
Yokohama Science Frontier High School
Science Literacy I Guest Lecturer (2023)
- Delivered specialized lectures to high school students on AI and machine learning concepts
Technical Skills
Programming & Frameworks
- Python - Advanced (PyTorch, TensorFlow, scikit-learn)
- Machine Learning: Computer Vision, NLP, Multimodal AI
- Research Tools: Jupyter, MLflow, Weights & Biases
- Development: Git, Docker, Linux
Specializations
- Multimodal AI: Vision-language models, cross-modal retrieval
- Computer Vision: Object detection, segmentation, visual reasoning
- Natural Language Processing: Text-image understanding, instruction following
- Robotics: Indoor navigation, object manipulation planning
Publications & Media
Technical Blogs
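- SB Intuitions Tech Blog: Performance evaluation of Sarashina2-Vision-8B and 14B
- SB Intuitions Tech Blog: Release of Sarashina2-Vision, a large vision-language model specialized for Japanese
- SB Intuitions Tech Blog: Multiple-choice benchmarks
- Zenn.dev: Let's make friends with the intermediate layers of models written in PyTorch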
Book Reviews
- Public reviewer for 『ゼロから作るDeep Learning⑤ -生成モデル編』 (Deep Learning from Scratch 5: Generative Models) (Apr 2024)
Creative Activities
Theater & Performance
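- 劇団二進数, 7th production: 『死して尚、生きてナオ』- Performer (entry in the Sato Sakichi Theater Festival 2024)
- 劇団二進数, 6th production: 『有象無象³』- Audio Staff
- 劇団二進数: 『脇役人生の転機』- Audio Staff (Audience Award and Jury's Encouragement Award, 8th National Student Theater Festival)
- 創像工房in front of.: 『大海を知るかもめ』- Chief Audio Engineer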
Languages
Japanese - Native
English - TOEIC 750 (Business level reading/writing, conversational speaking)
Last Updated: June 2025