Multimodal text analysis has become a crucial part of research, teaching and practice for a wide range of academic and practical disciplines. A variety of techniques, theoretical frameworks, and methodologies have therefore evolved for such analysis. For
O'Halloran, Kay, Smith, B.
core
Date of Acceptance: 15/12/2014. We study the performance and user experience of two popular mainstream mobile text entry methods: the Smart Touch Keyboard (STK) and the Smart Gesture Keyboard (SGK).
Per Ola Kristensson +5 more
core +1 more source
This work presents a robotic control method for human–robot collaborative assembly based on a biomechanics‐constrained digital human model. Reinforcement learning is used to generate physiologically plausible human motion trajectories, which are integrated into a virtual environment for robot control learning.
Bitao Yao +4 more
wiley +1 more source
A text classification method based on multimodal fusion enhancement
Although multimodal text classification techniques have potential when applied to specific scenarios, there are still some limitations. Existing multimodal fusion models require modal alignment in the input data, resulting in a large amount of incomplete ...
Dezhi LIU +3 more
doaj
Multimodal Event Classification for Social Media Based on Text-Image-Caption Assisted Alignment
The vast amount and diverse forms of information (such as text, images, etc.) provide people with rich data. How to effectively obtain and utilize multimodal data has gradually become a research hotspot in the field of artificial intelligence.
Yuanting Wang
doaj +1 more source
This chapter begins with the issues surrounding large-scale analyses of the modal ensemble through a case study that focuses on one form of contemporary written communication, the Instagram post, and demonstrates an analytical approach that takes into account the whole text, including non-verbal elements. It employs corpus-assisted multimodal discourse
openaire +2 more sources
MULTI: multimodal understanding leaderboard with text and images
The rapid development of multimodal large language models (MLLMs) raises the question of how they compare to human performance. While existing datasets often feature synthetic or overly simplistic tasks, some models have already surpassed human expert baselines.
Zichen Zhu +13 more
openaire +2 more sources
Multimodal Human–Robot Interaction Using Human Pose Estimation and Local Large Language Models
A multimodal human–robot interaction framework integrates human pose estimation (HPE) and a large language model (LLM) for gesture‐ and voice‐based robot control. Speech‐to‐text (STT) enables voice command interpretation, while a safety‐aware arbitration mechanism prioritizes gesture input for rapid intervention.
Nasiru Aboki +2 more
wiley +1 more source
Multimodal Pragmatic Jailbreak on Text-to-image Models
Diffusion models have recently achieved remarkable advancements in terms of image quality and fidelity to textual prompts. Concurrently, the safety of such generative models has become an area of growing concern. This work introduces a novel type of jailbreak, which triggers T2I models to generate the image with visual text, where the image and the ...
Liu, T +8 more
openaire +4 more sources
LLM‐Integrated Human–Robot Interaction System for Microrobots
This paper proposes an LLM‐based control framework for guiding microrobots using human natural language. This framework can convert the natural human speech into safe and executable command sets for reliable navigation in complex environments. The experimental results show high accuracy and robustness in task performance, demonstrating the potential of
Bairong Zhu, Amar Salehi, Tingting Yu
wiley +1 more source

