Results 1 to 10 of about 488,216

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection [PDF]

open access: yes · European Conference on Computer Vision, 2023
In this paper, we present an open-set object detector, called Grounding DINO, by marrying Transformer-based detector DINO with grounded pre-training, which can detect arbitrary objects with human inputs such as category names or referring expressions ...
Shilong Liu   +10 more
semanticscholar   +1 more source

Grounding ‘Grounding’ in NLP [PDF]

open access: yes · Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 2021
The NLP community has seen substantial recent interest in grounding to facilitate interaction between language technologies and the world. However, as a community, we use the term broadly to reference any linking of text to data or non-textual modality. In contrast, Cognitive Science more formally defines "grounding" as the process of establishing what ...
Khyathi Raghavi Chandu   +2 more
openaire   +2 more sources

Kosmos-2: Grounding Multimodal Large Language Models to the World [PDF]

open access: yes · International Conference on Learning Representations, 2023
We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the visual world. Specifically, we represent refer expressions as links in Markdown ...
Zhiliang Peng   +6 more
semanticscholar   +1 more source

Coincident Objects and The Grounding Problem [PDF]

open access: yes · Journal of Philosophical Investigations, 2022
Pluralists believe in the occurrence of numerically distinct spatiotemporally coincident objects. They argue that there are coincident objects that share all physical and spatiotemporal properties and relations; nevertheless, they differ in terms of ...
Ataollah Hashemi
doaj   +1 more source

GLaMM: Pixel Grounding Large Multimodal Model [PDF]

open access: yes · Computer Vision and Pattern Recognition, 2023
Large Multimodal Models (LMMs) extend Large Language Models to the vision domain. Initial LMMs used holistic images and text prompts to generate ungrounded textual responses.
H. Rasheed   +9 more
semanticscholar   +1 more source

Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning [PDF]

open access: yes · International Conference on Machine Learning, 2023
Recent works successfully leveraged Large Language Models' (LLM) abilities to capture abstract knowledge about the world's physics to solve decision-making problems.
Thomas Carta   +5 more
semanticscholar   +1 more source

Electromagnetic disturbance characteristic of typical high voltage switchgear interruption process in offshore wind farm based on integrated conduction model

open access: yes · High Voltage, 2023
In an offshore wind farm, a high‐voltage switchgear interruption in an offshore substation creates a high‐frequency, high‐amplitude overvoltage that can cause severe electromagnetic interference problems in the intelligent electronic device.
Huaqing Wang   +4 more
doaj   +1 more source

SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning [PDF]

open access: yes · Conference on Robot Learning, 2023
Large language models (LLMs) have demonstrated impressive results in developing generalist planning agents for diverse tasks. However, grounding these plans in expansive, multi-floor, and multi-room environments presents a significant challenge for ...
Krishan Rana   +5 more
semanticscholar   +1 more source

Three types of bidirectional leader development in triggered lightning flashes

open access: yes · Scientific Reports, 2023
Eight cases of bidirectional leader (BL) development in artificially triggered lightning flashes are reported with synchronous high-speed camera images and electric field signals.
Rui Su   +5 more
doaj   +1 more source

UniVTG: Towards Unified Video-Language Temporal Grounding [PDF]

open access: yes · IEEE International Conference on Computer Vision, 2023
Video Temporal Grounding (VTG), which aims to ground target clips from videos (such as consecutive intervals or disjoint shots) according to custom language queries (e.g., sentences or words), is key for video browsing on social media.
Kevin Lin   +7 more
semanticscholar   +1 more source
