Some of the following articles may not be open access.

Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control

International Conference on Computer Graphics and Interactive Techniques
Diffusion models have demonstrated impressive performance in generating high-quality videos from text prompts or images. However, precise control over the video generation process—such as camera manipulation or content editing—remains a significant ...
Zekai Gu   +11 more
semanticscholar   +1 more source

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

arXiv.org
Visual instruction tuning has made considerable strides in enhancing the capabilities of Large Multimodal Models (LMMs). However, existing open LMMs largely focus on single-image tasks, and their application to multi-image scenarios remains less explored ...
Feng Li   +7 more
semanticscholar   +1 more source

ViPE: Video Pose Engine for 3D Geometric Perception

arXiv.org
Accurate 3D geometric perception is an important prerequisite for a wide range of spatial AI systems. While state-of-the-art methods depend on large-scale training data, acquiring consistent and precise 3D annotations from in-the-wild videos remains a ...
Jiahui Huang   +14 more
semanticscholar   +1 more source

3D Video Tools

2018
This chapter presents an overview of different tools used in research and engineering of 3D video delivery systems. These include software tools for 3D video compression and streaming, 3D video players, and their interfaces. Other types of tools widely used in research studies and development of new networking solutions, such as network simulators ...
Dumic, Emil   +9 more
openaire   +3 more sources

Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation

ACM Transactions on Graphics
Real-world applications like video gaming and virtual reality often demand the ability to model 3D scenes that users can explore along custom camera trajectories.
Tianyu Huang   +10 more
semanticscholar   +1 more source

Fast Depth Map Intra Coding for 3D Video Compression-Based Tensor Feature Extraction and Data Analysis

IEEE Transactions on Circuits and Systems for Video Technology, 2020
3D high-efficiency video coding (3D-HEVC) is the latest standard for 3D video compression created by the ISO/IEC MPEG and ITU-T Video Coding Experts Group (VCEG) based on a new video format called multiview texture videos plus depth maps (MVDs).
Hamza Hamout, Abderrahmane Elyousfi
semanticscholar   +1 more source

SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

European Conference on Computer Vision
We present Stable Video 3D (SV3D) -- a latent video diffusion model for high-resolution, image-to-multi-view generation of orbital videos around a 3D object.
Vikram S. Voleti   +8 more
semanticscholar   +1 more source

CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation

International Conference on Computer Graphics and Interactive Techniques
In this work, we present CineMaster, a novel framework for 3D-aware and controllable text-to-video generation. Our goal is to give users controllability comparable to that of professional film directors: precise placement of objects within the scene ...
Qinghe Wang   +9 more
semanticscholar   +1 more source

PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation

European Conference on Computer Vision
Realistic object interactions are crucial for creating immersive virtual experiences, yet synthesizing realistic 3D object dynamics in response to novel interactions remains a significant challenge.
Tianyuan Zhang   +7 more
semanticscholar   +1 more source

Spatiotemporal Multimodal Learning With 3D CNNs for Video Action Recognition

IEEE Transactions on Circuits and Systems for Video Technology, 2021
Extracting effective spatial-temporal information is significantly important for video-based action recognition. Recently 3D convolutional neural networks (3D CNNs) that could simultaneously encode spatial and temporal dynamics in videos have made ...
Hanbo Wu, Xin Ma, Yibin Li
semanticscholar   +1 more source
