Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy. [PDF]
Mehta V, Sharma C, Thiyagarajan K.
europepmc +1 more source
Action Recognition with 3D Residual Attention and Cross Entropy. [PDF]
Ouyang Y, Li X.
europepmc +1 more source
Object Detection with Transformers: A Review. [PDF]
Shehzadi T +4 more
europepmc +1 more source
Advancing medical imaging with language models: featuring a spotlight on ChatGPT. [PDF]
Hu M +5 more
europepmc +1 more source
Evaluating Features and Variations in Deepfake Videos Using the CoAtNet Model. [PDF]
Alattas E +3 more
europepmc +1 more source
Dense Video Captioning Using Graph-Based Sentence Summarization
Zhiwang Zhang, Dong Xu, Wanli Ouyang
exaly +3 more sources
Some of the following articles may not be open access.
Accelerated masked transformer for dense video captioning
Neurocomputing, 2021
Abstract: Dense video captioning aims to generate dense descriptions for all possible events in an untrimmed video. The task is challenging in that it requires accurately localizing events in the video and simultaneously describing each event with a sentence.
Zhou Yu
exaly +2 more sources
Hierarchical Language Modeling for Dense Video Captioning
Lecture Notes in Networks and Systems, 2022
Jaivik Dave, S. Padmavathi
exaly +2 more sources

