Region of Interest Extraction and Bilinear Interpolation Application for Preprocessing of Lipreading Systems
Vol. 13, No. 4, pp. 189-198, Apr. 2024
https://doi.org/10.3745/TKIPS.2024.13.4.189
Abstract
Lipreading is one of the important parts of speech recognition, and several studies have been conducted to improve the performance
of lipreading systems for speech recognition. Recent studies have modified the model architecture of the lipreading
system to improve recognition performance. Unlike previous research that improves recognition performance by modifying the model
architecture, we aim to improve recognition performance without any change in model architecture. In order to improve the recognition
performance without modifying the model architecture, we refer to the cues used in human lipreading and set other regions such as
chin and cheeks as regions of interest along with the lip region, which is the existing region of interest of lipreading systems, and compare
the recognition rate of each region of interest to propose the highest performing region of interest. In addition, assuming that the difference
in normalization results caused by the difference in interpolation method during the process of normalizing the size of the region of
interest affects the recognition performance, we interpolate the same region of interest using nearest neighbor interpolation, bilinear
interpolation, and bicubic interpolation, and compare the recognition rate of each interpolation method to propose the best performing
interpolation method. Each region of interest was detected by training an object detection neural network, and dynamic time warping
templates were generated by normalizing each region of interest, extracting and combining features, and mapping the combined features
into a low-dimensional space through dimensionality reduction. The recognition rate was evaluated by comparing the distance between
the generated dynamic time warping templates and the data mapped to the low-dimensional space. In the comparison of regions of
interest, the region of interest containing only the lip region achieved an average recognition rate of 97.36%, which is 3.44 percentage points
higher than the average recognition rate of 93.92% in the previous study. In the comparison of interpolation methods, the bilinear
interpolation method achieved 97.36%, which is 14.65 percentage points higher than the nearest neighbor interpolation method and 5.55 percentage points higher than
the bicubic interpolation method. The code used in this study can be found at https://github.com/haraisi2/Lipreading-Systems.
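
To illustrate the size-normalization step described in the abstract, the following is a minimal sketch of bilinear interpolation applied to a small grayscale region of interest. It is not taken from the authors' repository. The function name `resize_bilinear`, the half-pixel-center sampling convention, and the clamping at the image border are all assumptions made for this example, and the authors' code may use a library routine with a different convention.

```python
from math import floor


def resize_bilinear(image, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) with bilinear interpolation.

    Hypothetical helper for illustration; assumes half-pixel centers and
    clamps sample coordinates to the image border.
    """
    in_h, in_w = len(image), len(image[0])
    out = []
    for i in range(out_h):
        # Map the output row center back into input coordinates, then clamp.
        y = min(max((i + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1)
        y0 = floor(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0  # vertical weight of the lower neighbor row
        row = []
        for j in range(out_w):
            x = min(max((j + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1)
            x0 = floor(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0  # horizontal weight of the right neighbor column
            # Interpolate along x on the two neighboring rows, then along y.
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# Usage: upscale a toy 2x2 region of interest to 4x4.
roi = [[0, 10], [20, 30]]
resized = resize_bilinear(roi, 4, 4)
```

Each output pixel is a weighted average of its four nearest input pixels, whereas nearest neighbor interpolation copies a single input pixel and bicubic interpolation weights a 4x4 neighborhood.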
Cite this article
[IEEE Style]
J. H. Han, Y. K. Kim, M. H. Kim, "Region of Interest Extraction and Bilinear Interpolation Application for Preprocessing of Lipreading Systems," The Transactions of the Korea Information Processing Society, vol. 13, no. 4, pp. 189-198, 2024. DOI: https://doi.org/10.3745/TKIPS.2024.13.4.189.
[ACM Style]
Jae Hyeok Han, Yong Ki Kim, and Mi Hye Kim. 2024. Region of Interest Extraction and Bilinear Interpolation Application for Preprocessing of Lipreading Systems. The Transactions of the Korea Information Processing Society, 13, 4, (2024), 189-198. DOI: https://doi.org/10.3745/TKIPS.2024.13.4.189.