Efficient Self-Attention Model for Speech Recognition-Based Assistive Robots Control

Sensors (Basel). 2023 Jun 30;23(13):6056. doi: 10.3390/s23136056.

Abstract

Assistive robots are tools that people living with upper body disabilities can leverage to autonomously perform Activities of Daily Living (ADL). Unfortunately, conventional control methods still rely on low-dimensional, easy-to-implement interfaces such as joysticks, which tend to be unintuitive and cumbersome to use. In contrast, vocal commands may represent a viable and intuitive alternative. This work takes an important step toward providing such a vocal interface for people living with upper limb disabilities by proposing a novel lightweight vocal command recognition system. The proposed model leverages the MobileNetV2 architecture, augmenting it with a novel approach to the self-attention mechanism, and achieves new state-of-the-art performance for Keyword Spotting (KWS) on the Google Speech Commands Dataset (GSCD). Moreover, this work presents a new dataset, referred to as the French Speech Commands Dataset (FSCD), comprising 4963 vocal command utterances. With the GSCD as the source domain, Transfer Learning (TL) was applied to adapt the model to this cross-language task; TL significantly improved the model's performance on the FSCD. The viability of the proposed approach is further demonstrated through real-life control of a robotic arm by four healthy participants using both the proposed vocal interface and a joystick.
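To make the described architecture concrete, the sketch below shows one way a MobileNetV2-style backbone can be combined with self-attention over time frames for KWS on log-mel spectrograms. It is a minimal, illustrative PyTorch example under stated assumptions, not the authors' exact model: the layer sizes, number of attention heads, the 35-class output (matching the GSCD vocabulary), and the `KWSModel` name are all assumptions.

```python
# Illustrative sketch (not the paper's exact architecture): a small
# MobileNetV2-style stack followed by self-attention over time frames,
# classifying keywords from a log-mel spectrogram.
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2-style inverted residual: expand -> depthwise -> project."""
    def __init__(self, ch, expansion=4):
        super().__init__()
        hidden = ch * expansion
        self.block = nn.Sequential(
            nn.Conv2d(ch, hidden, 1, bias=False), nn.BatchNorm2d(hidden), nn.ReLU6(),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(),
            nn.Conv2d(hidden, ch, 1, bias=False), nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        # Residual connection (stride 1, equal channels in and out).
        return x + self.block(x)

class KWSModel(nn.Module):
    def __init__(self, n_mels=40, n_classes=35, ch=64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(1, ch, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ch), nn.ReLU6(),
        )
        self.backbone = nn.Sequential(InvertedResidual(ch), InvertedResidual(ch))
        # Self-attention over the time axis of the convolutional features.
        self.attn = nn.MultiheadAttention(embed_dim=ch, num_heads=4, batch_first=True)
        self.head = nn.Linear(ch, n_classes)

    def forward(self, x):                     # x: (batch, 1, n_mels, time)
        f = self.backbone(self.stem(x))       # (batch, ch, n_mels/2, time/2)
        f = f.mean(dim=2).transpose(1, 2)     # collapse frequency -> (batch, frames, ch)
        a, _ = self.attn(f, f, f)             # self-attention across frames
        return self.head(a.mean(dim=1))       # average-pool frames, then classify

model = KWSModel()
logits = model(torch.randn(8, 1, 40, 101))   # e.g. 1 s of audio, 40 mel bands
print(logits.shape)                          # torch.Size([8, 35])
```

For the cross-language transfer step, the typical mechanism would be to keep the GSCD-pretrained weights, re-initialize the classification head for the French command vocabulary, and fine-tune on the FSCD; the exact TL recipe used in the paper is not specified in this abstract.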

Keywords: assistive robots; deep learning; human–machine interface; keyword spotting; robotic assistive arm; self-attention; speech command; speech recognition; transfer learning.

MeSH terms

  • Activities of Daily Living
  • Humans
  • Robotics*
  • Self-Help Devices*
  • Speech
  • Speech Perception*