Research & Technical Projects

Showcasing my work in Vision and Language.


🔬 Research Projects

Ongoing Projects (WIP)

Duration: 2025

Large-Scale Indoor Object Search Engine

Conferences: RSJ 2024, JSAI 2024

  • JSAI 2024 Excellence Award (Top 3% of 900+ presentations)

RSJ24 Presentation:

JSAI24 Presentation:


Referring Expression Segmentation with Diffusion Models

Conference: RSJ 2023

  • RSJ 2023 Excellence Award (Top 1% of 800+ presentations)

RSJ23 Presentation:


🏆 Competition Projects

DialFRED Challenge Winner @ CVPR 2023

  • 1st place in an international competition held at a top-tier venue (CVPR 2023)

💼 Industry Projects

Sarashina2-Vision: Japanese-Specialized Vision-Language Models

Company: SB Intuitions Corp.
Duration: Aug 2024 - Present
Role: Research Intern

Contributing to the development of Japanese-specialized large-scale vision-language models for cultural and linguistic contexts.


Production ML Systems

Companies: pluszero Inc., Elith Inc.
Duration: 2023 - 2024
Role: ML Engineer Intern

Developed and deployed machine learning solutions for real-world applications across multiple startups.


📚 Conference Presentations & Research Resources

Journal Club Presentations

Hyperbolic Image-Text Representations

Improving Cross-Modal Retrieval with Diverse Embeddings

Cost Aggregation with 4D Convolutional Swin Transformer

Technical Writing


🎓 Educational Projects

Advanced Machine Learning Course Instruction

Institution: Keio AI and Advanced Programming Consortium
Duration: Apr 2023 - Dec 2023
Focus: Diffusion Model Theory and Applications

  • Curriculum Development: Created comprehensive course materials on diffusion models
  • Teaching: Led practical sessions and theoretical discussions for 4th-year students

High School AI Education

Institution: Yokohama Science Frontier High School
Duration: 2023
Course: Science Literacy I

  • Outreach: Delivered specialized lectures on AI and machine learning
  • Engagement: Inspired the next generation of researchers through hands-on demonstrations

🎭 Creative & Technical Projects

Theater Production Technology

Organizations: 劇団二進数, 創像工房in front of.
Duration: 2020 - 2024
Roles: Performer, Audio Engineer, Technical Director

Combining technical skills with creative expression in live theater productions.

  • Technical Skills: Sound design, audio engineering, live mixing
  • Creative Skills: Performance, script development, production management
  • Recognition: Multiple awards, including the Audience Award (観客賞) and the Judges' Encouragement Award (審査員奨励賞)

Notable Productions:

  • 『死して尚、生きてナオ』 - Audio Staff (Sato Sakichi Theater Festival 2024, 佐藤佐吉演劇祭2024)
  • 『有象無象³』 - Audio Staff
  • 『脇役人生の転機』 - Audio Staff (competition winner)
  • 『大海を知るかもめ』 - Chief Audio Engineer

🛠️ Technical Expertise by Domain

Multimodal AI & Vision-Language Models

  • Cross-modal retrieval and ranking systems
  • Vision-language model fine-tuning and evaluation
  • Japanese-specialized multimodal models

Computer Vision & Deep Learning

  • Object detection and segmentation
  • Diffusion models for generative tasks
  • Referring expression understanding

Natural Language Processing

  • Instruction following and dialog systems
  • Dense text processing for retrieval

Robotics & Applications

  • Indoor navigation and object search
  • Embodied AI and instruction following

💡 Interested in collaboration? I’m always excited to work on innovative projects at the intersection of AI, robotics, and real-world applications. Feel free to reach out!


This portfolio showcases projects from 2020-2025. For the most recent updates, check my GitHub or research presentations.