Talk on "Sign Language and AI"
We are pleased to announce the upcoming seminar "Sign Language and AI". Everyone interested is cordially invited to attend!
Title:
Sign Language and AI
Invited Speaker:
Takashi Koyano (Executive Producer, NHK Enterprises, Inc., Japan)
Presenters:
Santiago Poveda Gutierrez (Kyoto University)
Junwen Mo (University of Tokyo)
Biplav Sharma Regmi (Asian Institute of Technology)
Zhidong Xiao (Bournemouth University)
Time/Date:
Tuesday, November 4, 2025
13:00 Registration opens
13:15 - 13:30 Opening Remarks & Introduction to the British Council Project by Zhidong Xiao
13:30 - 14:30 Invited Lecture by Takashi Koyano (NHK Enterprises)
14:30 - 15:00 Q&A
15:00 - 15:30 Break
15:30 - 16:00 Presentation by Santiago Poveda Gutierrez
16:00 - 16:30 Presentation by Junwen Mo
16:30 - 17:00 Presentation by Biplav Sharma Regmi / Tim Henrik Sandermann (Tim joining via Zoom)
17:00 - 17:45 Presentation by Zhidong Xiao
18:00 - Informal social gathering
Invited Lecture:
Takashi Koyano (Executive Producer, NHK Enterprises, Inc., Japan)
Title: Advancing inclusion with innovative sign language CG services.
Abstract: KIKI is a digital human designed to closely resemble a real person, enabling rich and highly accurate sign language expression. She connects the worlds of people with hearing impairments and those without, and is now an official ambassador for the Tokyo Deaflympics 2025.
In this session, I will review the development of KIKI, including character design, motion capture, and CG creation. It has not been a smooth ride; we have experienced ups and downs over the years through trial and error. This session will provide a reference for creating avatar-based CG sign language systems in the AI era.
Bio: Takashi Koyano has more than 20 years of experience in the digital field at NHK. He has worked on NHK's digital deployment for the Tokyo Olympics and Paralympics, as well as the PyeongChang Olympics and Paralympics, including accessibility aspects such as automated sign language. Currently, he is working on enhancing the automated sign language CG generation system not only for broadcasting but also for other public information services, such as railways and airlines.
Presentations:
Santiago Poveda Gutierrez (Kyoto University)
Title: Could MM-LLMs Understand Sign Language? A Novel Approach to SLR
Abstract: Recent advances in multimodal large language models (MM-LLMs) have demonstrated remarkable abilities in video understanding and natural language reasoning. However, their potential for processing signed languages remains largely unexplored. This talk introduces a novel approach to Sign Language Recognition (SLR) that uses MM-LLMs for fine-grained action understanding. Instead of relying solely on glosses (non-standardized, lossy representations of signs), our method uses a new semi-phonetic text-based representation, similar to the one found in some sign language dictionaries. By generating detailed action descriptions from video input and matching them with dictionary entries through sentence encoders, we aim to bypass some of the data scarcity and gloss-related limitations that currently constrain the field, as well as to provide the first insights into the capacity (or lack thereof) of MM-LLMs to process signed languages appropriately.
I will present the concept and design of our pipeline and the results of preliminary experiments, which suggest challenges as well as promising directions. I will also discuss ongoing efforts such as dataset expansion via crowdsourcing and synthetic data generation, contrastive learning, and fine-tuning to enhance performance.
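As an illustration of the dictionary-matching step described above (not the presenter's actual pipeline), the following Python sketch assumes an MM-LLM has already produced a free-text action description for a clip and retrieves the closest dictionary entry with an off-the-shelf sentence encoder; the model name and dictionary entries are placeholders.

# Minimal sketch of description-to-dictionary matching (illustrative only).
from sentence_transformers import SentenceTransformer, util

# Hypothetical dictionary entries: gloss -> text-based description of the sign.
dictionary = {
    "HELLO": "dominant flat hand at forehead, moves outward, palm facing out",
    "THANK-YOU": "flat hand at chin, moves forward and down toward addressee",
    "WHERE": "index finger upright, small side-to-side shake in neutral space",
}

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder would do

def match_description(description: str, top_k: int = 2):
    """Rank dictionary entries by cosine similarity to an MM-LLM description."""
    glosses = list(dictionary.keys())
    entry_emb = encoder.encode(list(dictionary.values()), convert_to_tensor=True)
    query_emb = encoder.encode(description, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, entry_emb)[0]
    return sorted(zip(glosses, scores.tolist()), key=lambda x: -x[1])[:top_k]

# Example: a description the MM-LLM might emit for a video clip.
print(match_description("a flat hand starts at the chin and moves forward"))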
Bio: Santiago Poveda Gutiérrez is a Master's student at the Kawahara Lab, Kyoto University. Their research focuses on sign language processing, with particular emphasis on its applications and on low-resource languages.
Junwen Mo (University of Tokyo)
Title: Improving Sign Language Understanding with a Multi-Stream Masked Autoencoder Trained on ASL Videos
Abstract: Artificial intelligence has advanced rapidly in recent years, achieving remarkable success across many domains. However, progress in sign language understanding has been slower, primarily due to the scarcity of available data. Annotated sign language videos are difficult to obtain, and many sign languages are low-resource. In this work, we propose a Multi-Stream Masked Autoencoder (MS-MAE), pretrained on unlabeled ASL videos. Our experiments demonstrate that pretraining solely on ASL videos significantly improves performance on sign language understanding tasks, such as isolated sign recognition and sign language translation, in low-resource settings and across multiple sign languages. Additionally, we visualize the attention distribution of the pretrained model on samples from an unseen sign language, CSL (Chinese Sign Language). The model appears to segment sign language sentences into several units, some of which align with individual signs. This observation not only helps explain the effectiveness of transfer learning but also suggests a promising direction for future research: analyzing model behavior could yield valuable insights for linguistic studies.
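For readers unfamiliar with masked-autoencoder pretraining, the following minimal PyTorch sketch shows the general idea of masking and reconstructing multi-stream pose features from unlabeled video; the stream layout, feature dimensions, and masking ratio are illustrative assumptions, not the presenter's MS-MAE implementation.

# A toy multi-stream masked autoencoder over per-frame pose features.
import torch
import torch.nn as nn

class TinyMSMAE(nn.Module):
    def __init__(self, stream_dims=(42, 60), d_model=64, mask_ratio=0.5):
        super().__init__()
        # One linear tokenizer per stream (e.g., hand keypoints, face keypoints).
        self.proj = nn.ModuleList(nn.Linear(d, d_model) for d in stream_dims)
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # One head per stream to reconstruct the raw features of masked frames.
        self.heads = nn.ModuleList(nn.Linear(d_model, d) for d in stream_dims)
        self.mask_ratio = mask_ratio

    def forward(self, streams):
        # streams: list of tensors, each (batch, frames, feature_dim)
        tokens = [p(s) for p, s in zip(self.proj, streams)]
        x = torch.cat(tokens, dim=1)                      # concat stream tokens
        b, n, _ = x.shape
        mask = torch.rand(b, n, device=x.device) < self.mask_ratio
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        z = self.encoder(x)
        # Reconstruction loss is computed only on masked positions.
        lengths = [s.shape[1] for s in streams]
        outs = torch.split(z, lengths, dim=1)
        loss, offset = 0.0, 0
        for head, out, target, length in zip(self.heads, outs, streams, lengths):
            m = mask[:, offset:offset + length].unsqueeze(-1)
            loss = loss + ((head(out) - target) ** 2 * m).sum() / m.sum().clamp(min=1)
            offset += length
        return loss

# Toy usage: two streams of unlabeled features for 2 clips of 16 frames each.
model = TinyMSMAE()
hands, face = torch.randn(2, 16, 42), torch.randn(2, 16, 60)
print(model([hands, face]).item())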
Bio: Junwen Mo is a PhD candidate at Nakayama Lab, The University of Tokyo. His research focuses on large-scale sign language processing, with a particular emphasis on sign language translation.
Biplav Sharma Regmi (Asian Institute of Technology)
Title: Modeling Mouth Movements for Inclusive Sign Language AI: Toward Deaf-Centered Access and Understanding
Abstract: Sign languages combine complex manual and non-manual articulations, where mouth movements carry essential grammatical and expressive meaning. Yet most Sign Language Processing (SLP) systems remain hand-focused, leading to outputs that Deaf users find limited or unnatural. This research, conducted jointly by the National Institute of Informatics and collaborators from the Asian Institute of Technology, explores the role of mouth actions as linguistic signals for more equitable and accessible sign language technologies.
We present an AI-driven approach that models four types of mouth actions (mouthing, mouth gesture, other mouth movement, and no movement) using a dual-stream neural architecture integrating appearance and geometric features. The model achieves sufficient accuracy on annotated Japanese Sign Language data and serves as a foundation for expanding multimodal datasets and integrating mouth-hand alignment into downstream sign recognition and translation.
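As a rough illustration (not the presented model), the following PyTorch sketch shows one way a dual-stream classifier could fuse an appearance stream (e.g., mouth-crop embeddings) with a geometric stream (e.g., lip-landmark coordinates) for the four mouth-action classes; the feature sizes and the simple late-fusion design are assumptions.

# Toy dual-stream classifier for the four mouth-action classes.
import torch
import torch.nn as nn

MOUTH_CLASSES = ["mouthing", "mouth_gesture", "other_mouth_movement", "no_movement"]

class DualStreamMouthNet(nn.Module):
    def __init__(self, appearance_dim=512, geometry_dim=40, hidden=128):
        super().__init__()
        self.appearance = nn.Sequential(nn.Linear(appearance_dim, hidden), nn.ReLU())
        self.geometry = nn.Sequential(nn.Linear(geometry_dim, hidden), nn.ReLU())
        # Mean-pool over time, then late-fuse the two streams before classifying.
        self.classifier = nn.Linear(2 * hidden, len(MOUTH_CLASSES))

    def forward(self, appearance_feats, geometry_feats):
        # Both inputs: (batch, frames, dim)
        a = self.appearance(appearance_feats).mean(dim=1)
        g = self.geometry(geometry_feats).mean(dim=1)
        return self.classifier(torch.cat([a, g], dim=-1))

# Toy usage on random features for a 2-clip batch of 25 frames each.
net = DualStreamMouthNet()
logits = net(torch.randn(2, 25, 512), torch.randn(2, 25, 40))
print([MOUTH_CLASSES[i] for i in logits.argmax(dim=-1).tolist()])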
Beyond technical accuracy, this work examines the social and ethical implications of automated sign processing, emphasizing Deaf-led data annotation, sustainable research practices, and fairness in AI design. By incorporating non-manual markers, we move toward sign language technologies that reflect the linguistic richness and communicative realities of Deaf communities.
Bio: Biplav Sharma Regmi is a PhD candidate in Data Science and AI at the Asian Institute of Technology, Thailand, specializing in computer vision and multimodal language modeling. His research focuses on ethical and inclusive AI for Sign Language Processing and Traffic Surveillance. He conducted his recent internship at the National Institute of Informatics, Japan, under the supervision of Dr. Mayumi Bono, collaborating with Tim Sandermann and Okada Tomohiro on mouth movement analysis in Japanese Sign Language. Together, they explore Deaf-centered AI approaches that integrate linguistic insight, accessibility, and technological innovation.
Zhidong Xiao (Bournemouth University)
Title: Motion-Temporal Calibration Network for Continuous Sign Language Recognition
Abstract: Continuous Sign Language Recognition (CSLR) is fundamental to bridging the communication gap between deaf and hard-of-hearing individuals and the broader society. The primary challenge lies in effectively modeling the complex spatial-temporal dynamic features in sign language videos. Current approaches typically employ independent processing strategies for motion feature extraction and temporal modeling, which impedes the unified modeling of action continuity and semantic integrity in sign language sequences. To address these limitations, we propose the Motion-Temporal Calibration Network (MTCNet), a novel framework for continuous sign language recognition that integrates dynamic feature enhancement and temporal calibration. The framework consists of two key innovative modules. First, the Cross-Frame Motion Refinement (CFMR) module implements an inter-frame differential attention mechanism combined with residual learning strategies, enabling precise motion feature modeling and effective enhancement of dynamic information between adjacent frames. Second, the Temporal-Channel Adaptive Recalibration (TCAR) module utilizes adaptive convolution kernel design and a dual-branch feature extraction architecture, facilitating joint optimization in both temporal and channel dimensions. In experimental evaluations, our method demonstrates competitive performance on the widely used PHOENIX-2014 and PHOENIX-2014-T datasets, achieving results comparable to leading unimodal approaches. Moreover, it achieves state-of-the-art performance on the Chinese Sign Language (CSL) dataset. Through comprehensive ablation studies and quantitative analysis, we validate the effectiveness of our proposed method in fine-grained dynamic feature modeling and long-term dependency capture while maintaining computational efficiency.
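As background for the cross-frame motion idea (illustrative only, not MTCNet or its CFMR module), the sketch below shows one possible reading of inter-frame differential attention with a residual connection: frame-to-frame feature differences are turned into a gating signal and added back to the original features.

# Toy cross-frame motion block over frame-level features from a visual backbone.
import torch
import torch.nn as nn

class CrossFrameMotionBlock(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.refine = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (batch, frames, dim)
        diff = x[:, 1:] - x[:, :-1]                                    # inter-frame differences
        diff = torch.cat([torch.zeros_like(x[:, :1]), diff], dim=1)   # pad to keep length
        gate = self.gate(diff)                                         # attention-like motion gate
        return x + gate * self.refine(diff)                            # residual motion enhancement

# Toy usage: features for 2 videos of 30 frames each.
block = CrossFrameMotionBlock()
print(block(torch.randn(2, 30, 256)).shape)  # torch.Size([2, 30, 256])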
Bio: Dr. Zhidong Xiao has over ten years of academic leadership experience at the National Centre for Computer Animation, Faculty of Media, Science and Technology, Bournemouth University, United Kingdom. Dr. Xiao's research interests are in the areas of Computer Graphics, Motion Capture, Artificial Intelligence, Machine Learning, Physics-based Simulation, and Robotics. His current research focus is on continuous sign language recognition. Dr. Xiao is a Fellow of the British Computer Society and a member of the EPSRC and ESRC Peer Review Colleges. He has supervised seven PhD students to successful completion. As PI and Co-I, Dr. Xiao has successfully secured and completed several research-council-funded projects and commercial projects with industry partners.
Place:
Room 1208/1210, NII
Other:
Language: English
Accessibility: A speech-to-text transcription app will be provided.
Admission: Free
Registration: Not required
Link:
For more information, please visit:
https://research.nii.ac.jp/~bono/en/event/20251104.html
The first related event will be held on November 3, 2025:
https://research.nii.ac.jp/~bono/ja/event/20251103.html
Contact:
If you would like to join, please contact us by email.
Email: bono[at]nii.ac.jp
