DIC-Transformer: interpretation of plant disease classification results using image caption generation technology
Disease image classification systems play a crucial role in identifying disease categories in the field of agricultural diseases. However, current plant disease image classification methods can only predict the disease category and do not offer ...
Qingtian Zeng, Jian Sun, Shansong Wang
Enhancing image caption generation through context-aware attention mechanism
Image captioning, the process of generating natural language descriptions based on image content, has garnered attention in AI research for its implications in scene understanding and human-computer interaction.
Ahatesham Bhuiyan et al.
VSAM-Based Visual Keyword Generation for Image Caption
Image captioning aims to understand and describe visual content and is expected to be applied to automatic news reporting in the future. In recent years, there has been increasing interest in the Encoder-Decoder framework for image captioning: the encoder ...
Suya Zhang et al.
PBC-Transformer: Interpreting Poultry Behavior Classification Using Image Caption Generation Techniques
Accurate classification of poultry behavior is critical for assessing welfare and health, yet most existing methods predict behavior categories without providing explanations for the image content. This study introduces the PBC-Transformer model, a novel ...
Jun Li et al.
Fine-Tuning a Small Vision Language Model Using Synthetic Data for Explaining Bacterial Skin Disease Images
Background/Objectives: Vision language models (VLMs) show strong potential for medical image understanding, but their large scale often limits practical deployment. This study investigates whether a compact VLM can be effectively adapted for dermatology, ...
Shiwan Zhang et al.
Review of Image Captioning Methods Based on Encoding-Decoding Technology
In recent years, image caption generation, a multimodal task in artificial intelligence, has combined research in computer vision and natural language processing to convert images into text. It plays ...
Yaogang Geng, Hongyan Mei, Xing Zhang, Xiaohui Li
A Multimodal Framework for Video Caption Generation
Video captioning is a highly challenging computer vision task that automatically describes the video clips using natural language sentences with a clear understanding of the embedded semantics.
Reshmi S. Bhooshan, Suresh K.
Automated Caption Generation for Video Call with Language Translation
In the modern era, virtual communication between individuals is common. Many people’s lives have been made simpler in a number of circumstances by providing subtitles, generating automated captions for social media videos, and language translation from a ...
Polepaka Sanjeeva et al.
Parallel Dense Video Caption Generation with Multi-Modal Features
The task of dense video captioning is to generate detailed natural-language descriptions for an original video, which requires deep analysis and mining of semantic captions to identify events in the video. Existing methods typically follow a localisation- ...
Xuefei Huang et al.
Multifaceted Feature Coding Image Caption Generation Algorithm Based on Transformer
Object features extracted by object detection algorithms play an increasingly critical role in the generation of image captions. However, using only object detection features as the input of an image captioning task can lead to the loss of other ...
Hongjun Heng, Yuchen Fan, Jialiang Wang

