Three-Dimensional Convolutional Vision Transformer for Sign Language Translation 


Vol. 13, No. 3, pp. 140-147, Mar. 2024
https://doi.org/10.3745/TKIPS.2024.13.3.130


  Abstract

In the Republic of Korea, people with hearing impairments form the second-largest group within the registered disability community, after those with physical disabilities. Despite this demographic significance, research on sign language translation technology remains limited, for reasons including the small market size and the scarcity of adequately annotated datasets. Nevertheless, a few researchers continue to improve the performance of sign language translation by employing recent advances in deep learning such as the transformer architecture, as transformer-based models have demonstrated noteworthy performance in tasks such as action recognition and video classification. This study focuses on enhancing the recognition performance of sign language translation by combining transformers with a 3D-CNN. Through experimental evaluations on the PHOENIX-Weather-2014T dataset [1], we show that the proposed model achieves performance comparable to existing models in terms of floating-point operations (FLOPs).
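To illustrate the kind of architecture the abstract describes, the following is a minimal PyTorch sketch in which a 3D-CNN stem extracts spatio-temporal features that are then passed to a transformer encoder. The abstract does not specify the authors' actual design, so the module name (Conv3DTransformerSLT), all layer sizes, kernel shapes, and the vocabulary size are illustrative assumptions, not the paper's model.

# Hypothetical 3D-CNN + transformer pipeline for sign language translation.
# Every dimension and module choice below is an illustrative assumption.
import torch
import torch.nn as nn

class Conv3DTransformerSLT(nn.Module):
    def __init__(self, vocab_size=1024, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        # 3D-CNN stem: extracts spatio-temporal features from video clips.
        self.stem = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=(3, 7, 7), stride=(1, 2, 2), padding=(1, 3, 3)),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.Conv3d(64, d_model, kernel_size=3, stride=2, padding=1),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # pool away space, keep the time axis
        )
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.classifier = nn.Linear(d_model, vocab_size)  # per-step gloss/word logits

    def forward(self, video):
        # video: (batch, channels=3, frames, height, width)
        feats = self.stem(video)                  # (B, d_model, T', 1, 1)
        feats = feats.flatten(2).transpose(1, 2)  # (B, T', d_model) token sequence
        enc = self.encoder(feats)                 # contextualized frame features
        return self.classifier(enc)               # (B, T', vocab_size)

if __name__ == "__main__":
    model = Conv3DTransformerSLT()
    clip = torch.randn(2, 3, 16, 112, 112)  # two 16-frame RGB clips
    print(model(clip).shape)                # torch.Size([2, 8, 1024])

The design point this sketch reflects is the one named in the abstract: the 3D convolutions capture short-range motion cues cheaply, while the transformer models long-range temporal dependencies over the resulting feature sequence.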



  Cite this article

[IEEE Style]

H. Seong and H. Cho, "Three-Dimensional Convolutional Vision Transformer for Sign Language Translation," The Transactions of the Korea Information Processing Society, vol. 13, no. 3, pp. 140-147, 2024. DOI: https://doi.org/10.3745/TKIPS.2024.13.3.130.

[ACM Style]

Horyeor Seong and Hyeonjoong Cho. 2024. Three-Dimensional Convolutional Vision Transformer for Sign Language Translation. The Transactions of the Korea Information Processing Society, 13, 3, (2024), 140-147. DOI: https://doi.org/10.3745/TKIPS.2024.13.3.130.