EdgeVidCap: A Channel-Spatial Dual-Branch Lightweight Video Captioning Model for IoT Edge Cameras. [PDF]
Guo L +9 more
europepmc +1 more source
Multimodal generative AI for interpreting 3D medical images and videos. [PDF]
Lee JO +4 more
europepmc +1 more source
Learning Semantic Features for Dense Video Captioning
Sujin Lee, Incheol Kim
openaire +1 more source
Adaptive graph signal processing for robust multimodal fusion with dynamic semantic alignment. [PDF]
Karthikeya KV +4 more
europepmc +1 more source
Training strategies for semi-supervised remote sensing image captioning. [PDF]
Cheng Q +5 more
europepmc +1 more source
Insights into Object Semantics: Leveraging Transformer Networks for Advanced Image Captioning. [PDF]
Abdal Hafeth D, Kollias S.
europepmc +1 more source
Will multimodal large language models ever achieve deep understanding of the world? [PDF]
Farkaš I, Vavrečka M, Wermter S.
europepmc +1 more source
Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding. [PDF]
Tang F +20 more
europepmc +1 more source
EchoNet++: A multilingual soccer match audio commentary dataset. [PDF]
Majeed F, Nazir M, Agus M, Schneider J.
europepmc +1 more source
Enhancing interior design and space planning via human-machine intelligent interaction for artistic cognition. [PDF]
Jiang J.
europepmc +1 more source

