Abstract:
This paper proposes a solution for tongue segmentation in images. The solution relies on a convolutional neural network, specifically a deep U-Net with multiple layers of encoder-decoder modules. The model is trained at a starting resolution of 512 × 512 pixels. To improve the segmentation performance of the trained model across different recording environments, three main types of data augmentation are applied during training: additive Gaussian noise, multiplicative and additive brightness adjustment, and color temperature change. These augmentations also help compensate for the small number of samples in the available datasets. The proposed method is evaluated with four metrics: Dice coefficient, mean IoU, Jaccard distance, and accuracy. The model is successfully trained on publicly available datasets and then transferred to a self-collected dataset recorded in a real-world environment for testing. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
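The four evaluation metrics can be computed from the pixel-wise confusion counts between a predicted binary mask and its ground truth. The sketch below uses the standard definitions (Dice = 2TP / (2TP + FP + FN), IoU = TP / (TP + FP + FN), Jaccard distance = 1 − IoU, accuracy = correct pixels / all pixels). The paper's exact conventions are not stated here, so two choices are assumptions: mean IoU is taken as the average of the tongue and background IoU, and a ratio with a zero denominator (both masks empty for that class) is scored as 1.0. The function names `confusion_counts` and `segmentation_metrics` are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(pred: Sequence[int], target: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count TP, FP, FN, TN for flattened binary masks (1 = tongue, 0 = background)."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    if not pred:
        raise ValueError("masks must not be empty")
    tp = fp = fn = tn = 0
    for p, t in zip(pred, target):
        if p and t:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def _safe_ratio(num: int, den: int) -> float:
    # Assumption: an empty union (den == 0) means perfect agreement.
    return num / den if den else 1.0


def segmentation_metrics(pred: Sequence[int], target: Sequence[int]) -> Dict[str, float]:
    """Dice, mean IoU (tongue + background), Jaccard distance, and pixel accuracy."""
    tp, fp, fn, tn = confusion_counts(pred, target)
    iou_tongue = _safe_ratio(tp, tp + fp + fn)
    iou_background = _safe_ratio(tn, tn + fp + fn)  # background treated as its own class
    return {
        "dice": _safe_ratio(2 * tp, 2 * tp + fp + fn),
        "mean_iou": (iou_tongue + iou_background) / 2,
        "jaccard_distance": 1.0 - iou_tongue,
        "accuracy": (tp + tn) / len(pred),
    }


if __name__ == "__main__":
    # Toy 6-pixel masks: TP = 2, FP = 1, FN = 1, TN = 2.
    metrics = segmentation_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
    print(metrics)  # dice ≈ 0.667, mean_iou = 0.5, jaccard_distance = 0.5, accuracy ≈ 0.667
```

Working from raw counts keeps the metrics consistent with one another; for example, Dice and IoU on the same mask always satisfy Dice = 2·IoU / (1 + IoU). A real pipeline would typically flatten the 512 × 512 prediction after thresholding the network's output.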