Efficient Self-Attention Model for Speech Recognition-Based Assistive Robots Control

Sensors (Basel). 2023 Jun 30;23(13):6056. doi: 10.3390/s23136056.

Abstract

Assistive robots are tools that people living with upper body disabilities can leverage to autonomously perform Activities of Daily Living (ADL). Unfortunately, conventional control methods still rely on low-dimensional, easy-to-implement interfaces such as joysticks, which tend to be unintuitive and cumbersome to use. In contrast, vocal commands may represent a viable and intuitive alternative. This work takes an important step toward providing such a vocal interface for people living with upper limb disabilities by proposing a novel lightweight vocal command recognition system. The proposed model leverages the MobileNetV2 architecture, augmenting it with a novel approach to the self-attention mechanism, and achieves new state-of-the-art performance for Keyword Spotting (KWS) on the Google Speech Commands Dataset (GSCD). Moreover, this work presents a new dataset, referred to as the French Speech Commands Dataset (FSCD), comprising 4963 vocal command utterances. With the GSCD as the source domain, Transfer Learning (TL) was applied to adapt the model to this cross-language task; TL significantly improved the model's performance on the FSCD. The viability of the proposed approach is further demonstrated through real-life control of a robotic arm by four healthy participants using both the proposed vocal interface and a joystick.
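To make the described architecture concrete, the sketch below shows one way a MobileNetV2-style backbone can be combined with self-attention over time frames for KWS on log-mel spectrograms. It is a minimal, illustrative PyTorch example under stated assumptions, not the authors' exact model: the layer sizes, number of attention heads, the 35-class output (matching the GSCD vocabulary), and the `KWSModel` name are all assumptions.

```python
# Illustrative sketch (not the paper's exact architecture): a small
# MobileNetV2-style stack followed by self-attention over time frames,
# classifying keywords from a log-mel spectrogram.
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2-style inverted residual: expand -> depthwise -> project."""
    def __init__(self, ch, expansion=4):
        super().__init__()
        hidden = ch * expansion
        self.block = nn.Sequential(
            nn.Conv2d(ch, hidden, 1, bias=False), nn.BatchNorm2d(hidden), nn.ReLU6(),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(),
            nn.Conv2d(hidden, ch, 1, bias=False), nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        # Residual connection (stride 1, equal channels in and out).
        return x + self.block(x)

class KWSModel(nn.Module):
    def __init__(self, n_mels=40, n_classes=35, ch=64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(1, ch, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ch), nn.ReLU6(),
        )
        self.backbone = nn.Sequential(InvertedResidual(ch), InvertedResidual(ch))
        # Self-attention over the time axis of the convolutional features.
        self.attn = nn.MultiheadAttention(embed_dim=ch, num_heads=4, batch_first=True)
        self.head = nn.Linear(ch, n_classes)

    def forward(self, x):                     # x: (batch, 1, n_mels, time)
        f = self.backbone(self.stem(x))       # (batch, ch, n_mels/2, time/2)
        f = f.mean(dim=2).transpose(1, 2)     # collapse frequency -> (batch, frames, ch)
        a, _ = self.attn(f, f, f)             # self-attention across frames
        return self.head(a.mean(dim=1))       # average-pool frames, then classify

model = KWSModel()
logits = model(torch.randn(8, 1, 40, 101))   # e.g. 1 s of audio, 40 mel bands
print(logits.shape)                          # torch.Size([8, 35])
```

For the cross-language transfer step, the typical mechanism would be to keep the GSCD-pretrained weights, re-initialize the classification head for the French command vocabulary, and fine-tune on the FSCD; the exact TL recipe used in the paper is not specified in this abstract.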

Keywords: assistive robots; deep learning; human–machine interface; keyword spotting; robotic assistive arm; self-attention; speech command; speech recognition; transfer learning.

MeSH terms

  • Activities of Daily Living
  • Humans
  • Robotics*
  • Self-Help Devices*
  • Speech
  • Speech Perception*