Transformer-Based Image Captioning for the Visually Impaired
Keywords:
Automatic image captioning, encoder-decoder model, Swin Transformer, Transformer Decoder, assistive applications for the visually impaired

Abstract
Visual impairment affects millions of people worldwide, posing significant challenges in accessing visual information. With the rapid development of mobile devices, particularly the Android platform, applications that describe images through audio are becoming increasingly popular. However, generating accurate, context-rich descriptions with a model light enough for mobile deployment remains a considerable challenge. This study proposes an image captioning model based on the encoder-decoder architecture, in which a Swin Transformer extracts hierarchical visual features and a Transformer Decoder generates textual descriptions. The model is trained on the standard MS COCO and Flickr30k datasets and fine-tuned on the Vietnamese-language KTVIC dataset to improve its applicability in local contexts. Experimental results show that the model performs well on standard evaluation metrics, including BLEU, METEOR, and CIDEr. In addition to the model, we developed an Android application that integrates image captioning with text-to-speech, producing real-time spoken descriptions that help visually impaired users access image content. The application demonstrates stable performance and responsive inference times under practical conditions. These results highlight the potential of the proposed approach for improving access to visual information for the visually impaired community.
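To make the described architecture concrete, the sketch below shows one plausible PyTorch implementation of such an encoder-decoder captioner: a Swin Transformer backbone supplies a sequence of visual features that a standard Transformer Decoder attends to while predicting caption tokens. This is a minimal illustration under assumptions, not the authors' implementation: the framework (PyTorch with the timm library), the backbone variant (swin_tiny_patch4_window7_224), and all layer sizes and the vocabulary size are hypothetical choices.

import torch
import torch.nn as nn
import timm

class CaptionModel(nn.Module):
    """Illustrative Swin-encoder / Transformer-decoder captioning model."""

    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=4, max_len=64):
        super().__init__()
        # Swin Transformer backbone; num_classes=0 drops the classification head.
        # Model name and sizes are assumptions, not the paper's settings.
        self.encoder = timm.create_model(
            "swin_tiny_patch4_window7_224", pretrained=True, num_classes=0)
        enc_dim = self.encoder.num_features            # 768 for swin_tiny
        self.proj = nn.Linear(enc_dim, d_model)        # match decoder width
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned positions
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, images, tokens):
        # Flatten the final Swin feature map into a sequence the decoder
        # can cross-attend to; timm returns (B, H, W, C) or (B, L, C)
        # depending on version, so handle both.
        feats = self.encoder.forward_features(images)
        if feats.dim() == 4:
            feats = feats.flatten(1, 2)                # -> (B, L, C)
        memory = self.proj(feats)
        # Teacher-forced autoregressive decoding with a causal mask.
        x = self.embed(tokens) + self.pos[:, : tokens.size(1)]
        mask = nn.Transformer.generate_square_subsequent_mask(
            tokens.size(1)).to(tokens.device)
        out = self.decoder(x, memory, tgt_mask=mask)
        return self.head(out)                          # (B, T, vocab_size) logits

In a setup like this, training would minimize cross-entropy between the logits and the caption tokens shifted by one position, while inference would run the decoder autoregressively (greedy or beam search) from a begin-of-sentence token; the finished caption could then be handed to the device's text-to-speech engine, as the application described above does.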
