2412
* *ACCV
* *ECCV
* 2S-ODIS: Two-stage Omni-directional Image Synthesis by Geometric Distortion Correction
* 3Cat-8 Mission: A 6-Unit CubeSat for Ionospheric Multisensing and Technology Demonstration Test-Bed
* 3d Adaptive Structural Convolution Network for Domain-invariant Point Cloud Recognition
* 3d Congealing: 3d-aware Image Alignment in the Wild
* 3D Gaussian Parametric Head Model
* 3d Hand Pose Estimation in Everyday Egocentric Images
* 3d Hand Sequence Recovery from Real Blurry Images and Event Stream
* 3d Human Pose Estimation via Non-causal Retentive Networks
* 3d Open-vocabulary Panoptic Segmentation with 2d-3d Vision-language Distillation
* 3D Point Cloud Fusion Method Based on EMD Auto-Evolution and Local Parametric Network
* 3D point cloud regularization method for uniform mesh generation of mining excavations
* 3d Prompt Learning for RGB-D Tracking
* 3d Reconstruction of Objects in Hands Without Real World 3d Supervision
* 3D scene generation for zero-shot learning using ChatGPT guided language prompts
* 3d Single-object Tracking in Point Clouds with High Temporal Variation
* 3d Weakly Supervised Semantic Segmentation with 2d Vision-language Guidance
* 3d-aware Instance Segmentation and Tracking in Egocentric Videos
* 3D-Aware Text-Driven Talking Avatar Generation
* 3D-GOI: 3D GAN Omni-inversion for Multifaceted and Multi-Object Editing
* 3DEGO: 3d Editing on the Go!
* 3DFG-PIFU: 3d Feature Grids for Human Digitization from Sparse Views
* 3dgazenet: Generalizing 3d Gaze Estimation with Weak-supervision from Synthetic Views
* 3DPSR: An innovative approach for pose and shape refinement in 3D human meshes from a single 2D image
* 3DSA: Multi-view 3d Human Pose Estimation With 3d Space Attention Mechanisms
* 3IGS: Factorised Tensorial Illumination for 3d Gaussian Splatting
* 3R-INN: How to Be Climate Friendly While Consuming/delivering Videos?
* 3X2: 3d Object Part Segmentation by 2d Semantic Correspondences
* 4d Contrastive Superflows are Dense 3d Representation Learners
* 4DIFF: 3D-Aware Diffusion Model for Third-to-first Viewpoint Translation
* 4DPV: 4d PET from Videos by Coarse-to-fine Non-rigid Radiance Fields
* 5G NR Codes and Modulation Deep-RL Optimization for uRLLC in Vehicular OCC
* 6dgs: 6d Pose Estimation from a Single Image and a 3d Gaussian Splatting Model
* 6dof Head Pose Estimation Through Explicit Bidirectional Interaction with Face Geometry
* Abc Easy as 123: A Blind Counter for Exemplar-free Multi-class Class-agnostic Counting
* Accdiffusion: An Accurate Method for Higher-resolution Image Generation
* Accelerated Deep Nonlinear Dictionary Learning
* Accelerating Image Generation with Sub-path Linear Approximation Model
* Accelerating Image Super-resolution Networks with Pixel-level Classification
* Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention
* Accuracy and Precision of Shallow-Water Photogrammetry from the Sea Surface
* Accurate Airway Tree Segmentation in CT Scans via Anatomy-Aware Multi-Class Segmentation and Topology-Guided Iterative Learning
* Accurate Detection Is Not All You Need to Combat Label Noise in Web-noisy Datasets, An
* ACFNet: An adaptive cross-fusion network for infrared and visible image fusion
* ACMatch: Improving context capture for two-view correspondence learning via adaptive convolution
* Acoustic features analysis for explainable machine learning-based audio spoofing detection
* Act Like a Radiologist: Radiology Report Generation Across Anatomical Regions
* Action Robust Reinforcement Learning for Air Mobility Deconfliction Against Conflict Induced Spoofing
* Action-conditioned contrastive learning for 3D human pose and shape estimation in videos
* Action2sound: Ambient-aware Generation of Action Sounds from Egocentric Videos
* Actionswitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos
* Actionvos: Actions as Prompts for Video Object Segmentation
* Active Coarse-to-fine Segmentation of Moveable Parts from Real Images
* Active Generation for Image Classification
* Ad3: Introducing a Score for Anomaly Detection Dataset Difficulty Assessment Using Viaduct Dataset
* AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-shot Anomaly Detection
* Adadiff: Accelerating Diffusion Models Through Step-wise Adaptive Computation
* Adadiffsr: Adaptive Region-aware Dynamic Acceleration Diffusion Model for Real-world Image Super-resolution
* Adadistill: Adaptive Knowledge Distillation for Deep Face Recognition
* Adaglimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale
* Adaifl: Adaptive Image Forgery Localization via a Dynamic and Importance-aware Transformer Network
* Adalog: Post-training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
* Adanat: Exploring Adaptive Policy for Token-based Image Generation
* Adapt Without Forgetting: Distill Proximity from Dual Teachers in Vision-language Models
* Adapt2reward: Adapting Video-language Models to Generalizable Robotic Rewards via Failure Prompts
* Adapting Fine-grained Cross-view Localization to Areas Without Fine Ground Truth
* Adapting Models to Scarce Target Data Without Source Samples
* Adapting to Shifting Correlations with Unlabeled Data Calibration
* Adaptive Annealing for Robust Averaging
* Adaptive Bias Discovery for Learning Debiased Classifier
* Adaptive Bounding Box Uncertainties via Two-step Conformal Prediction
* Adaptive Compressed Sensing with Diffusion-based Posterior Sampling
* Adaptive Correspondence Scoring for Unsupervised Medical Image Registration
* Adaptive Face Recognition for Multi-Type Occlusions
* Adaptive feature alignment for adversarial training
* Adaptive Granularity-Fused Keypoint Detection for 6D Pose Estimation of Space Targets
* Adaptive High-frequency Transformer for Diverse Wildlife Re-Identification
* Adaptive Human Trajectory Prediction via Latent Corridors
* Adaptive Memetic Algorithm for a Cost-Optimal Electric Vehicle-Drone Routing Problem, An
* Adaptive Multi-Function Radar Temporal Behavior Analysis
* Adaptive Multi-head Contrastive Learning
* Adaptive Multi-Modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution
* Adaptive Multi-scale Degradation-Based Attack for Boosting the Adversarial Transferability
* Adaptive Multi-task Learning for Few-shot Object Detection
* Adaptive Neural Message Passing for Inductive Learning on Hypergraphs
* Adaptive Parametric Activation
* Adaptive representation learning and sample weighting for low-quality 3D face recognition
* Adaptive Screen-space Meshing Approach for Normal Integration, An
* Adaptive Selection of Sampling-Reconstruction in Fourier Compressed Sensing
* Adaptive Weighted Coherence Ratio Approach for Industrial Explosion Damage Mapping: Application to the 2015 Tianjin Port Incident
* Adashield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting
* AddBiomechanics Dataset: Capturing the Physics of Human Motion at Scale
* Addme: Zero-Shot Group-Photo Synthesis by Inserting People Into Scenes
* AddressCLIP: Empowering Vision-language Models for City-wide Image Address Localization
* Aden: Adaptive Density Representations for Sparse-view Camera Pose Estimation
* ADMAP: Anti-disturbance Framework for Vectorized HD Map Construction
* ADSP: Advanced Dataset for Shadow Processing, Enabling Visible Occluders via Synthesizing Strategy
* Advancements in Mixed Reality for Autonomous Vehicle Testing and Advanced Driver Assistance Systems: A Survey
* Advances in Optical and Thermal Remote Sensing of Vegetative Drought and Phenology
* Advancing mangrove species mapping: An innovative approach using Google Earth images and a U-shaped network for individual-level Sonneratia apetala detection
* ADVDIFF: Generating Unrestricted Adversarial Examples Using Diffusion Models
* Adversarial Diffusion Distillation
* Adversarial Prompt Tuning for Vision-language Models
* Adversarial Robustification via Text-to-image Diffusion Models
* Adversarial Style-Irrelevant Feature Learning With Refined Soft Pseudo Labels for Domain-Adaptive Vehicle Re-Identification
* Adversarialeak: External Information Leakage Attack Using Adversarial Samples on Face Recognition Systems
* Adversarially Robust Distillation by Reducing the Student-teacher Variance Gap
* Adverse Weather Optical Flow: Cumulative Homogeneous-Heterogeneous Adaptation
* Aednet: Adaptive Embedding and Multiview-aware Disentanglement for Point Cloud Completion
* Aerial Photogrammetry Benchmark Dataset for Point Cloud Segmentation and Style Translation, An
* Aff-ttention! Affordances and Attention Models for Short-term Object Interaction Anticipation
* Affect-Conditioned Image Generation
* Affective Visual Dialog: A Large-scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations
* Affine Steerers for Structured Keypoint Description
* Afreeca: Annotation-free Counting for All
* Agent Attention: On the Integration of Softmax and Linear Attention
* Agent3d-Zero: An Agent for Zero-shot 3d Understanding
* Agglomerative Token Clustering
* Agglomerator++: Interpretable part-whole hierarchies and latent space representations in neural networks
* Aid-Appeal: Automatic Image Dataset and Algorithm for Content Appeal Enhancement and Assessment Labeling
* Airborne Multi-Channel Forward-Looking Radar Super-Resolution Imaging Using Improved Fast Iterative Interpolated Beamforming Algorithm
* AIS Data-Based Hybrid Predictor for Short-Term Ship Trajectory Prediction Considering Uncertainties
* Align Before Collaborate: Mitigating Feature Misalignment for Robust Multi-Agent Perception
* Aligndiff: Aligning Diffusion Models for General Few-shot Segmentation
* Aligning Neuronal Coding of Dynamic Visual Scenes with Foundation Vision Models
* Alignist: CAD-Informed Orientation Distribution Estimation by Fusing Shape and Correspondences
* Alignzeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation
* All You Need Is Your Voice: Emotional Face Representation with Audio Perspective for Emotional Talking Face Generation
* All-seeing Project V2: Towards General Relation Comprehension of the Open World, The
* Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation
* AMD: Automatic Multi-step Distillation of Large-scale Vision Models
* Amego: Active Memory from Long Egocentric Videos
* Ames: Asymmetric and Memory-efficient Similarity Estimation for Instance-level Retrieval
* Amodal Instance Segmentation with Diffusion Shape Prior Estimation
* Analysis of Decoupling Effects and Influence Factors in Transportation: Evidence from Guangdong Province, China
* Analysis of Land Surface Performance Differences and Uncertainty in Multiple Versions of MODIS LST Products
* Analysis of Spatio-Temporal Relationship Between Ecosystem Services and Human Footprints Under Different Human Activity Gradients: A Case Study of Xiangjiang River Basin
* Analysis of systems' performance in natural language processing competitions
* Analysis-by-Synthesis Transformer for Single-View 3d Reconstruction
* Analytic-splatting: Anti-aliased 3d Gaussian Splatting via Analytic Integration
* Analyzing Surgeon-Robot Cooperative Performance in Robot-Assisted Intravascular Catheterization
* Anatomask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking
* Animal Avatars: Reconstructing Animatable 3d Animals from Casual Videos
* Animatabledreamer: Text-guided Non-rigid 3d Model Generation and Reconstruction with Canonical Score Distillation
* Animate Your Motion: Turning Still Images into Dynamic Videos
* Animateme: 4d Facial Expressions via Diffusion Models
* ANNE: Adaptive Nearest Neighbours and Eigenvector-based sample selection for robust learning with noisy labels
* Anti-Collapse Loss for Deep Metric Learning
* Anti-Maneuvering Repeater Jamming Using Up- and Down-Chirp Modulation in Spaceborne Synthetic Aperture Radar
* Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection
* Any2point: Empowering Any-modality Large Models for Efficient 3d Understanding
* Anycontrol: Create Your Artwork with Versatile Control on Text-to-image Generation
* Anyhome: Open-vocabulary Generation of Structured and Textured 3d Homes
* AnySR: Realizing Image Super-Resolution as Any-Scale, Any-Resource
* Anytime Continual Learning for Open Vocabulary Classification
* APL: Anchor-based Prompt Learning for One-stage Weakly Supervised Referring Expression Comprehension
* Appearance-based Refinement for Object-centric Motion Segmentation
* Apply prior feature integration to sparse object detectors
* Approaching Outside: Scaling Unsupervised 3d Object Detection from 2d Scene
* Approximate geometric structure transfer for cross-domain image classification
* Arbitrary-scale Video Super-resolution with Structural and Textural Priors
* Arc2Face: A Foundation Model for ID-consistent Human Faces
* Architecture-Agnostic Untrained Network Priors for Image Reconstruction with Frequency Regularization
* Arctic Weather Satellite Sensitivity to Supercooled Liquid Water in Snowfall Conditions
* Are Synthetic Data Useful for Egocentric Hand-object Interaction Detection?
* Aroface: Alignment Robustness to Improve Low-quality Face Recognition
* Artificial Intelligence-Based Video Saliency Prediction: Challenges and Trends
* ARTVLM: Attribute Recognition Through Vision-based Prefix Language Modeling
* ASANet: Asymmetric Semantic Aligning Network for RGB and SAR image land cover classification
* Assessing Sample Quality via the Latent Space of Generative Models
* Assessment and Application of Multi-Source Precipitation Products in Cold Regions Based on the Improved SWAT Model
* Assessment and Dynamic Prediction of Green Space Ecological Service Value in Guangzhou City, China
* Assessment of CCMP in Capturing High Winds with Respect to Individual Satellite Datasets
* Assessment of the Seasonal Uncertainty of Microwave L-Band Satellite Soil Moisture Products in Jiangsu Province, China, An
* Assessment of Vegetation Drought Loss and Recovery in Central Asia Considering a Comprehensive Vegetation Index
* Associative graph convolution network for point cloud analysis
* Asymmetric Mask Scheme for Self-Supervised Real Image Denoising
* Asymmetric Mutual Learning for Unsupervised Transferable Visible-Infrared Re-Identification
* Asynchronous Bioplausible Neuron for Spiking Neural Networks for Event-based Vision
* Asynchronous Large Language Model Enhanced Planner for Autonomous Driving
* Atmospheric Boundary Layer Stability in Urban Beijing: Insights from Meteorological Tower and Doppler Wind Lidar
* Atmospheric correction of geostationary ocean color imager data over turbid coastal waters under high solar zenith angles
* Attention Beats Linear for Fast Implicit Neural Representation Generation
* Attention Decomposition for Cross-domain Semantic Segmentation
* Attention enhanced machine instinctive vision with human-inspired saliency detection
* Attention Prompting on Image for Large Vision-language Models
* Attention-challenging Multiple Instance Learning for Whole Slide Image Classification
* Attention4align: Align Multi-view Parts Via Part2part Hierarchical Attention Map for Fine-grained 3d Object Classification
* Attentionhand: Text-driven Controllable Hand Image Generation for 3d Hand Reconstruction in the Wild
* ATTIQA: Generalizable Image Quality Feature Extractor Using Attribute-aware Pretraining
* Attnzero: Efficient Attention Discovery for Vision Transformers
* Attribute disentanglement and re-entanglement for generalized zero-shot learning
* Attribute Prototype-Guided Iterative Scene Graph for Explainable Radiology Report Generation
* Audio-driven Talking Face Generation with Stabilized Synchronization Loss
* Audio-Semantic Enhanced Pose-Driven Talking Head Generation
* Audio-synchronized Visual Animation
* Audio-visual Generalized Zero-Shot Learning the Easy Way
* Auformer: Vision Transformers Are Parameter-efficient Facial Action Unit Detectors
* AugDETR: Improving Multi-scale Learning for Detection Transformer
* Augmented Neural Fine-tuning for Efficient Backdoor Purification
* Augundo: Scaling Up Augmentations for Monocular Depth Completion and Estimation
* Auto-das: Automated Proxy Discovery for Training-free Distillation-aware Architecture Search
* Auto-GAS: Automated Proxy Discovery for Training-free Generative Architecture Search
* Autoad-zero: A Training-free Framework for Zero-shot Audio Description
* Autodir: Automatic All-in-one Image Restoration with Latent Diffusion
* Autoeval-video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-ended Video Question Answering
* Automated anthropometric measurements from 3D point clouds of scanned bodies
* Automated localization of dike leakage outlets using UAV-borne thermography and YOLO-based object detectors
* automatic procedure for mapping burned areas globally using Sentinel-2 and VIIRS/MODIS active fires in Google Earth Engine, An
* Automatically Detected CSES Ionospheric Precursors Before Part of the Strong Aftershocks of the 23 January 2024 Wushi MS 7.1 Earthquake in Northwest China
* Automating the Derivation of Sugarcane Growth Stages from Earth Observation Time Series
* Autonomous Extraction Technology for Aquaculture Ponds in Complex Geological Environments Based on Multispectral Feature Fusion of Medium-Resolution Remote Sensing Imagery
* Autonomous Vehicles Lane-Changing Trajectory Planning Based on Hierarchical Decoupling
* Auxiliary Domain-guided Adaptive Object Detection in Adverse Weather Conditions
* Avatar Fingerprinting for Authorized Use of Synthetic Talking-head Videos
* Avatarpose: Avatar-Guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos
* AWADA: Foreground-focused adversarial learning for cross-domain object detection
* AWOL: Analysis Without Synthesis Using Language
* Axis Estimation of Spaceborne Targets via Inverse Synthetic Aperture Radar Image Sequence Based on Regression Network
* A_OPTRAM-ET: An automatic optical trapezoid model for evapotranspiration estimation and its global-scale assessments
* B3-CDG: A pseudo-sample diffusion generator for bi-temporal building binary change detection
* Background Adaptation with Residual Modeling for Exemplar-free Class-incremental Semantic Segmentation
* Bad Students Make Great Teachers: Active Learning Accelerates Large-scale Visual Understanding
* Bad-gaussians: Bundle Adjusted Deblur Gaussian Splatting
* BAFFLE: A Baseline of Backpropagation-free Federated Learning
* Bags: Blur Agnostic Gaussian Splatting Through Multi-scale Kernel Modeling
* Balanced Collision Avoidance Algorithm for USVs in Complex Environment: A Deep Reinforcement Learning Approach, A
* Balancing Electric Scooter Battery Swapping Network by Spatio-Temporal Recommendation
* BAM-DETR: Boundary-aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
* BAMG: Text-based Person Re-identification via Bottlenecks Attention and Masked Graph Modeling
* BAMM: Bidirectional Autoregressive Motion Model
* Based on Spatial and Temporal Implicit Semantic Relational Inference for Cross-Modal Retrieval
* Basic: Bayesnet Structure Learning for Computational Scalable Neural Image Compression
* Bayesian Detector Combination for Object Detection with Crowdsourced Annotations
* Bayesian Evidential Deep Learning for Online Action Detection
* Bayesian Self-training for Semi-supervised 3d Segmentation
* Bayesian Uncertainty Calibration for Federated Time Series Analysis
* Be Yourself: Bounded Attention for Multi-subject Text-to-image Generation
* Be-your-outpainter: Mastering Video Outpainting Through Input-specific Adaptation
* Beaf: Observing Before-after Changes to Evaluate Hallucination in Vision-language Models
* Beat-it: Beat-synchronized Multi-condition 3d Dance Generation
* Behavior and Energy of the M2 Internal Tide in the Madagascar-Mascarene Region
* BenchLMM: Benchmarking Cross-Style Visual Capability of Large Multimodal Models
* Benchmarking Object Detectors with Coco: A New Path Forward
* Benchmarking Spurious Bias in Few-shot Image Classifiers
* Benchmarking the Robustness of Cross-view Geo-localization Models
* Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects
* Bending classification from interference signals of a fiber optic sensor using shallow learning and convolutional neural networks
* BENeRF: Neural Radiance Fields from a Single Blurry Image and Event Stream
* Beta-Tuned Timestep Diffusion Model
* Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation
* Better Call Sal: Towards Learning to Segment Anything in Lidar
* Beyond Coarse-grained Matching in Video-text Retrieval
* Beyond Mot: Semantic Multi-object Tracking
* Beyond Pixels: Semi-supervised Semantic Segmentation with a Multi-scale Patch-based Multi-label Classifier
* Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-free Continual Learning
* Beyond the Contact: Discovering Comprehensive Affordance for 3d Objects from Pre-Trained 2d Diffusion Models
* Beyond the Data Imbalance: Employing the Heterogeneous Datasets for Vehicle Maneuver Prediction
* Beyond Viewpoint: Robust 3d Object Recognition Under Arbitrary Views Through Joint Multi-part Representation
* Beyondscene: Higher-resolution Human-centric Scene Generation with Pretrained Diffusion
* BI-AVAN: A Brain-Inspired Adversarial Visual Attention Network for Characterizing Human Visual Attention From Neural Activity
* Bi-directional Contextual Attention for 3d Dense Captioning
* BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation
* BI-TTA: Bidirectional Test-time Adapter for Remote Physiological Measurement
* Bidirectional Progressive Transformer for Interaction Intention Anticipation
* Bidirectional Stereo Image Compression with Cross-dimensional Entropy Model
* Bidirectional temporal and frame-segment attention for sparse action segmentation of figure skating
* Bidirectional Uncertainty-Based Active Learning for Open-Set Annotation
* Biefficient: Bidirectionally Prompting Vision-language Models for Parameter-efficient Video Recognition
* Binomial Self-compensation for Motion Error in Dynamic 3d Scanning
* Bk-sdm: A Lightweight, Fast, and Cheap Version of Stable Diffusion
* Bkdsnn: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation
* BlazeBVD: Make Scale-time Equalization Great Again for Blind Video Deflickering
* Blenderalchemy: Editing 3d Graphics with Vision-language Models
* Blind Image Deblurring via Minimizing Similarity Between Fuzzy Sets on Image Pixels
* Blind Image Deblurring with Noise-robust Kernel Estimation
* Blind Image Deconvolution by Generative-based Kernel Prior and Initializer via Latent Encoding
* Blind Super Resolution with Reference Images and Implicit Degradation Representation
* Blink: Multimodal Large Language Models Can See but Not Perceive
* Blinkvision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation Using Rgb Frames and Events
* Blockchain-Based EV Constant Function Pricer and Oraclized State of Charge Estimator
* Bones Can't Be Triangles: Accurate and Efficient Vertebrae Keypoint Estimation Through Collaborative Error Revision
* Boost Your NeRF: A Model-agnostic Mixture of Experts Framework for High Quality and Efficient Rendering
* Boosting 3d Single Object Tracking with 2D Matching Distillation and 3D Pre-Training
* Boosting Few-shot Detection with Large Language Models and Layout-to-image Synthesis
* Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model
* Boosting Micro-Expression Recognition via Self-Expression Reconstruction and Memory Contrastive Learning
* Boosting Point Set-Based Network with Optimal Transport Optimization for Oriented Object Detection
* Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-consistency Training
* Boosting Transferability in Vision-language Attacks via Diversification Along the Intersection Region of Adversarial Trajectory
* Boosting Weak Learners With Multi-Agent Reinforcement Learning for Enhanced Stacking Models: An Application on Driver Emotion Classification
* Boosting Weakly-Supervised Image Segmentation via Representation, Transform, and Compensator
* Bootstap: Bootstrapped Training for Tracking-any-point
* Bot-Facesort: Bag-of-tricks for Robust Multi-face Tracking in Unconstrained Videos
* Bottom-up Domain Prompt Tuning for Generalized Face Anti-spoofing
* boundary-aware point clustering approach in Euclidean and embedding spaces for roof plane segmentation, A
* Brain Netflix: Scaling Data to Reconstruct Videos from Brain Signals
* Brain-id: Learning Contrast-agnostic Anatomical Representations for Brain Imaging
* Brave: Broadening the Visual Encoding of Vision-language Models
* Bridge Graph Attention Based Graph Convolution Network With Multi-Scale Transformer for EEG Emotion Recognition
* Bridge Past and Future: Overcoming Information Asymmetry in Incremental Object Detection
* Bridge: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
* Bridging Different Language Models and Generative Vision Models for Text-to-image Generation
* Bridging Optimal Transport and Jacobian Regularization by Optimal Trajectory for Enhanced Adversarial Defense
* Bridging real and simulated data for cross-spatial-resolution vegetation segmentation with application to rice crops
* Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors
* Bridging the Gap Between Explicit and Implicit Representations: Cross-Data Association for VSLAM
* Bridging the Gap Between Haze Scenarios: A Unified Image Dehazing Model
* Bridging the Gap Between Human Motion and Action Semantics via Kinematic Phrases
* Bridging the gap between object detection in close-up and high-resolution wide shots
* Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture
* Bridging the Pathology Domain Gap: Efficiently Adapting CLIP for Pathology Image Analysis with Limited Labeled Data
* Bridging the Projection Gap: Overcoming Projection Bias Through Parameterized Distance Learning
* Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-supervised Learning
* Brushnet: A Plug-and-play Image Inpainting Model with Decomposed Dual-branch Diffusion
* BSTS: A Weakly-Supervised Method for Semantic Learning of 3D Point Clouds
* Bucketed Ranking-based Losses for Efficient Training of Object Detectors
* Bugnist: A Large Volumetric Dataset for Object Detection Under Domain Shift
* Building Change Detection Network Based on Multilevel Geometric Representation Optimization Using Frame Fields
* Building Contextualized Trust Profiles in Conditionally Automated Driving
* BundleMoCap++: Efficient, robust and smooth motion capture from sparse multiview videos
* Burstm: Deep Burst Multi-scale SR Using Fourier Space with Optical Flow
* Bypass network for semantics driven image paragraph captioning
* Byteedit: Boost, Comply and Accelerate Generative Image Editing
* C2C: Component-to-Composition Learning for Zero-shot Compositional Action Recognition
* CADVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches
* CaesarNeRF: Calibrated Semantic Representation for Few-shot Generalizable Neural Rendering
* CAFNet: Context aligned fusion for depth completion
* Calibration Transfer via Knowledge Distillation
* Calibration-Based Multi-Prototype Contrastive Learning for Domain Generalization Semantic Segmentation in Traffic Scenes
* Caltech Aerial RGB-Thermal Dataset in the Wild
* Camera Calibration Using a Collimator System
* Camera Height Doesn't Change: Unsupervised Training for Metric Monocular Road-scene Depth Estimation
* Camera-Incremental Object Re-Identification With Identity Knowledge Evolution
* Camera-lidar Cross-modality Gait Recognition
* Camera-Shooting Resilient Watermarking on Image Instance Level
* Camoteacher: Dual-rotation Consistency Learning for Semi-supervised Camouflaged Object Detection
* Camouflaged Object Detection via Location-Awareness and Feature Fusion
* Can Mangrove Silviculture Be Carbon Neutral?
* Can OOD Object Detectors Learn from Foundation Models?
* Can Textual Semantics Mitigate Sounding Object Segmentation Preference?
* Canonical Shape Projection Is All You Need for 3d Few-shot Class Incremental Learning
* Canonicalfusion: Generating Drivable 3d Human Avatars from Multiple Images
* Capture Concept Through Comparison: Vision-and-language Representation Learning with Intrinsic Information Mining
* CARB-NET: Camera-assisted Radar-based Network for Vulnerable Road User Detection
* Cardiacnet: Learning to Reconstruct Abnormalities for Cardiac Disease Assessment from Echocardiogram Videos
* CARFF: Conditional Auto-encoded Radiance Field for 3d Scene Forecasting
* Carformer: Self-driving with Learned Object-centric Representations
* Carotid Vessel Wall Segmentation Through Domain Aligner, Topological Learning, and Segment Anything Model for Sparse Annotation in MR Images
* Cascade Prompt Learning for Vision-language Model Adaptation
* Cascade-Zero123: One Image to Highly Consistent 3d with Self-prompted Nearby Views
* Cascaded recurrent networks with masked representation learning for stereo matching of high-resolution satellite images
* CAST: An innovative framework for Cross-dimensional Attention Structure in Transformers
* CAST: Clustering self-Attention using Surrogate Tokens for efficient transformers
* CAT-SAM: Conditional Tuning for Few-shot Adaptation of Segment Anything Model
* CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-visual Scenarios
* Catastrophic Overfitting: A Potential Blessing in Disguise
* Catchbackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing
* Category Adaptation Meets Projected Distillation in Generalized Continual Category Discovery
* Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images
* Causal Subgraphs and Information Bottlenecks: Redefining OOD Robustness in Graph Neural Networks
* Causality-Informed Graph Convolutional Network for Video Assessment of Parkinsonian Leg Agility, A
* Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition
* CC-SAM: SAM with Cross-feature Attention and Context for Ultrasound Image Segmentation
* CCNDF: Curvature Constrained Neural Distance Fields from 3d Lidar Sequences
* CCR: A Counterfactual Causal Reasoning-Based Method for Cross-View Geo-Localization
* CDP-MIL: Robust Multiple Instance Learning via Cascaded Dirichlet Process
* CenterFormer: A Novel Cluster Center Enhanced Transformer for Unconstrained Dental Plaque Segmentation
* Centering the Value of Every Modality: Towards Efficient and Resilient Modality-Agnostic Semantic Segmentation
* Cephalometric Landmark Regression Method Based on Dual-encoder for High-resolution X-ray Image, A
* CEPrompt: Cross-Modal Emotion-Aware Prompting for Facial Expression Recognition
* Certifiably Robust Image Watermark
* CF-SOLT: Real-time and accurate traffic accident detection using correlation filter-based tracking
* CFMMC-Align: Coarse-Fine Multi-Modal Contrastive Alignment Network for Traffic Event Video Question Answering
* CFN-ESA: A Cross-Modal Fusion Network With Emotion-Shift Awareness for Dialogue Emotion Recognition
* CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3d Gaussian Field
* CHA: Conditional Hyper-Adapter method for detecting human-object interaction
* Chains of Diffusion Models
* Challenging Forgets: Unveiling the Worst-case Forget Sets in Machine Unlearning
* Chameleon: A Data-efficient Generalist for Dense Visual Prediction in the Wild
* Champ: Controllable and Consistent Human Image Animation with 3d Parametric Guidance
* Channel and Spatial Enhancement Network for human parsing
* Chaos-Based Tunable Selective Encryption Algorithm for H.265/HEVC with Semantic Understanding, A
* Character-Aware Audio-Visual Subtitling in Context
* Characterization of CYGNSS Ocean Surface Wind Speed Products
* Characterization of Polarimetric Properties in Various Brain Tumor Types Using Wide-Field Imaging Mueller Polarimetry
* Characterizing Hierarchical Semantic-Aware Parts With Transformers for Generalized Zero-Shot Learning
* Characterizing Model Robustness via Natural Input Gradients
* Chat-edit-3d: Interactive 3d Scene Editing via Text Prompts
* CHEX: Interactive Localization and Region Description in Chest X-rays
* Chinese Character Component Segmentation Based on Character Structure Masks
* Chronologically Accurate Retrieval for Temporal Grounding of Motion-language Models
* CiABL: Completeness-Induced Adaptative Broad Learning for Cross-Subject Emotion Recognition With EEG and Eye Movement Signals
* CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation
* Cipherdm: Secure Three-party Inference for Diffusion Model Sampling
* City-on-web: Real-time Neural Rendering of Large-scale Scenes on the Web
* Citygaussian: Real-time High-quality Large-scale Scene Rendering with Gaussians
* Cityguessr: City-level Video Geo-localization on a Global Scale
* CLAMP-VIT: Contrastive Data-free Learning for Adaptive Post-training Quantization of VITs
* CLAP: Isolating Content from Style Through Contrastive Learning with Augmented Prompts
* ClarityDiffuseNet: Enhancing fundus image quality under black shadows with diffusion model-based research
* Class Activation Map Calibration for Weakly Supervised Semantic Segmentation
* Class Probability Space Regularization for semi-supervised semantic segmentation
* Class-agnostic Object Counting with Text-to-image Diffusion Model
* Class-aware Contrastive Learning for Fine-grained Skeleton-based Action Recognition
* Class-incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion
* Classification Matters: Improving Video Action Detection with Class-specific Attention
* Classification of Ship Type from Combination of HMM-DNN-CNN Models Based on Ship Trajectory Features
* Classifying the Shapes of Buildings by Combining Distance Field Enhancement and a Convolution Neural Network
* Clean and Compact: Efficient Data-free Backdoor Defense with Model Compactness
* ClearCLIP: Decomposing CLIP Representations for Dense Vision-language Inference
* Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation
* Cleo: Continual Learning of Evolving Ontologies
* Click Prompt Learning with Optimal Transport for Interactive Segmentation
* Click-gaussian: Interactive Segmentation to Any 3d Gaussians
* Cliff: Continual Latent Diffusion for Open-vocabulary Object Detection
* Cliffphys: Camera-based Respiratory Measurement Using Clifford Neural Networks
* Climatological Evaluation of Three Assimilation and Reanalysis Datasets on Soil Moisture over the Tibetan Plateau
* CLIP-dinoiser: Teaching CLIP a Few Dino Tricks for Open-vocabulary Semantic Segmentation
* CLIP-DPO: Vision-language Models as a Source of Preference for Fixing Hallucinations in LVLMS
* CLIP-guided Generative Networks for Transferable Targeted Adversarial Attacks
* Close, But Not There: Boosting Geographic Distance Sensitivity in Visual Place Recognition
* Closed-loop Unsupervised Representation Disentanglement with Beta-VAE Distillation and Diffusion Probabilistic Feedback
* Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks, A
* Closer: Towards Better Representation Learning for Few-shot Class-incremental Learning
* Cloudfixer: Test-time Adaptation for 3d Point Clouds via Diffusion-guided Geometric Transformation
* CLR-GAN: Improving GANs Stability and Quality via Consistent Latent Representation and Reconstruction
* Clustering, triangulation, and evaluation of 3D lines in multiple images
* Clusteringsdf: Self-organized Neural Implicit Surfaces for 3d Decomposition
* Cmd: A Cross Mechanism Domain Adaptation Dataset for 3d Object Detection
* Cmta: Cross-modal Temporal Alignment for Event-guided Video Deblurring
* CNG-SFDA: Clean-and-noisy Region Guided Online-offline Source-free Domain Adaptation
* CNN Mixture-of-depths
* CNN-Based Reversible Data Hiding for JPEG Images
* CNN-O-ELMNet: Optimized Lightweight and Generalized Model for Lung Disease Classification and Severity Assessment
* Co-segmentation Without any Pixel-level Supervision with Application to Large-scale Sketch Classification
* Co-speech Gesture Video Generation with 3d Human Meshes
* Co-Student: Collaborating Strong and Weak Students for Sparsely Annotated Object Detection
* Co-synthesis of Histopathology Nuclei Image-label Pairs Using a Context-conditioned Joint Diffusion Model
* Coarse-to-fine Implicit Representation Learning for 3d Hand-object Reconstruction from a Single Rgb-d Image
* Coarse-to-Fine Target Detection for HFSWR With Spatial-Frequency Analysis and Subnet Structure
* Coastal Reclamation Embankment Deformation: Dynamic Monitoring and Future Trend Prediction Using Multi-Temporal InSAR Technology in Funing Bay, China
* Coca: Classifier-oriented Calibration via Textual Prototype for Source-free Universal Domain Adaptation
* Cocktail Universal Adversarial Attack on Deep Neural Networks
* COD: Learning Conditional Invariant Representation for Domain Adaptation Regression
* CODA: Instructive Chain-of-domain Adaptation with Severity-aware Visual Prompt Tuning
* CodingHomo: Bootstrapping Deep Homography With Video Coding
* Cogview3: Finer and Faster Text-to-image Generation via Relay Diffusion
* Coherentgs: Sparse Novel View Synthesis with Coherent 3d Gaussians
* Coho: Context-sensitive City-scale Hierarchical Urban Layout Generation
* Coin-matting: Confounder Intervention for Image Matting
* Coin: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation
* Cola: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection
* Coleaf: A Contrastive-collaborative Learning Framework for Weakly Supervised Audio-visual Video Parsing
* Collaboration or Competition: An Infomax-Based Period-Aware Transformer for Ticket-Grabbing Prediction
* Collaborative Control for Geometry-conditioned PBR Image Generation
* Collaborative Debias Strategy for Temporal Sentence Grounding in Video
* Collaborative License Plate Recognition via Association Enhancement Network With Auxiliary Learning and a Unified Benchmark
* Collaborative Vehicular Threat Sharing: A Long-Term Contract-Based Incentive Mechanism With Privacy Preservation
* Collaborative Vision-text Representation Optimizing for Open-vocabulary Segmentation
* Colored Point Cloud Quality Assessment Using Complementary Features in 3D and 2D Spaces
* Colormae: Exploring Data-independent Masking Strategies in Masked Autoencoders
* Colormnet: A Memory-based Deep Spatial-temporal Feature Propagation Network for Video Colorization
* ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement
* Com Kitchens: An Unedited Overhead-view Video Dataset as a Vision-language Benchmark
* Combined-Cycle Propulsion-Involved Trajectory Optimization and Performance-Driven Attitude Control for Aerospace Plane During the Ascent Phase
* Combined-Slip Trajectory Tracking and Yaw Stability Control for 4WID Autonomous Vehicles Based on Effective Cornering Stiffness
* Combining Generative and Geometry Priors for Wide-angle Portrait Correction
* Combining Sentinel-2 Data and Risk Maps to Detect Trees Predisposed to and Attacked by European Spruce Bark Beetle
* Comboverse: Compositional 3D Assets Creation Using Spatially-aware Diffusion Guidance
* Comfusion: Enhancing Personalized Generation by Instance-scene Compositing and Fusion
* Comments on A Quantum Cryptographic Protocol for Secure Vehicular Communication
* Common Sense Reasoning for Deepfake Detection
* Common-feature-track-matching approach for multi-epoch UAV photogrammetry co-registration
* Commonly Interesting Images
* Como: Compact Mapping and Odometry
* Como: Controllable Motion Generation Through Language Guided Pose Code Editing
* Compact 3D Scene Representation via Self-Organizing Gaussian Grids
* Compact Dynamic 3d Gaussian Representation for Real-time Dynamic View Synthesis, A
* Comparative Study of Image Restoration Networks for General Backbone Network Design, A
* Comparative Validation and Misclassification Diagnosis of 30-Meter Land Cover Datasets in China
* Comparison of the Distribution of Evapotranspiration on Shady and Sunny Slopes in Southwest China
* Compensation Sampling for Improved Convergence in Diffusion Models
* Compgs: Smaller and Faster Gaussian Splatting with Vector Quantization
* complex neural network model by Hilbert Transform, A
* Compose: Comprehensive Portrait Shadow Editing
* Compositional Substitutivity of Visual Reasoning for Visual Question Answering
* Comprehensive Analysis of BDS/GNSS Differential Code Bias and Compatibility Performance
* Comprehensive Attribution: Inherently Explainable Vision Model with Feature Detector
* Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment, A
* Comprehensive Survey on Traffic Missing Data Imputation, A
* Comprehensive Survey: Quality of Service in Railway Communication Using Information-Centric Networking and Light Fidelity
* Compress3D: A Compressed Latent Space for 3d Generation from a Single Image
* Computational Model for Color Assimilation Illusions and Color Constancy, A
* Computing the Lipschitz Constant Needed for Fast Scene Recovery from Cassi Measurements
* Comusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion
* Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models
* Concept Sliders: Lora Adaptors for Precise Control in Diffusion Models
* ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
* Conceptual Codebook Learning for Vision-language Models
* Concise Plane Arrangements for Low-poly Surface and Volume Modelling
* CONDA: Condensed Deep Association Learning for Co-Salient Object Detection
* Condense: Consistent 2d/3d Pre-training for Dense and Sparse Features from Multi-view Images
* Conditional Distribution Modelling for Few-shot Image Synthesis with Diffusion Models
* Confidence Self-calibration for Multi-label Class-incremental Learning
* Confidence-based Iterative Generation for Real-world Image Super-resolution
* Congeo: Robust Cross-view Geo-localization Across Ground View Variations
* Connecting Consistency Distillation to Score Distillation for Text-to-3d Generation
* Consistency-driven feature scoring and regularization network for visible-infrared person re-identification
* Consistent 3d Line Mapping
* Consistent Representation Mining for Multi-Drone Single Object Tracking
* Constraint-Aware Learning for Fractional Flow Reserve Pullback Curve Estimation From Invasive Coronary Imaging
* Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort
* Constructing Efficient Mesh-Based Global Grid Systems with Reduced Distortions
* Constructing High-Order Functional Connectivity Networks With Temporal Information From fMRI Data
* Construction of Mining Subsidence Basin and Inversion of Predicted Subsidence Parameters Based on UAV Photogrammetry Products Considering Horizontal Displacement
* Content-adaptive Style Transfer: A Training-free Approach with Vq Autoencoders
* Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization
* Context Diffusion: In-context Aware Image Generation
* Context-aware Action Recognition: Introducing a Comprehensive Dataset for Behavior Contrast
* Context-Aware Search for Environmental Data Using Dense Retrieval
* Context-guided Spatial Feature Reconstruction for Efficient Semantic Segmentation
* Contextual Correspondence Matters: Bidirectional Graph Matching for Video Summarization
* Continual Learning and Unknown Object Discovery in 3d Scenes via Self-distillation
* Continual Learning for Remote Physiological Measurement: Minimize Forgetting and Simplify Inference
* Continual Learning Improves Zero-shot Action Recognition
* Continual learning with high-order experience replay for dynamic network embedding
* Continuity Preserving Online Centerline Graph Learning
* Continuous 3D Myocardial Motion Tracking via Echocardiography
* Continuous and Overall Quality of Experience Evaluation for Streaming Video Based on Rich Features Exploration and Dual-Stage Attention
* Continuous fake media detection: Adapting deepfake detectors to new generative techniques
* Continuous Memory Representation for Anomaly Detection
* Continuous So(3) Equivariant Convolution for 3d Point Cloud Analysis
* Contourlet Residual for Prompt Learning Enhanced Infrared Image Super-resolution
* Contrasting Deepfakes Diffusion via Contrastive Learning and Global-local Similarities
* Contrastive Ground-level Image and Remote Sensing Pre-training Improves Representation Learning for Natural World Imagery
* Contrastive Learning Based Modality-Invariant Feature Acquisition for Robust Multimodal Emotion Recognition With Missing Modalities
* Contrastive learning for real SAR image despeckling
* Contrastive Learning Using Synthetic Images Generated from Real Images
* Contrastive Learning with Counterfactual Explanations for Radiology Report Generation
* Contrastive Learning with Synthetic Positives
* Contrastive Max-correlation for Multi-view Clustering
* Contrastive Region Guidance: Improving Grounding in Vision-language Models Without Training
* Contrastive representation enhancement and learning for handwritten mathematical expression recognition
* Contribution-based Low-rank Adaptation with Pre-training Model for Real Image Restoration
* Controlcap: Controllable Region-level Captioning
* Controllable Contextualized Image Captioning: Directing the Visual Narrative Through User-defined Highlights
* Controllable Human-object Interaction Synthesis
* Controllable Navigation Instruction Generation with Chain of Thought Prompting
* Controllable Syllable-Level Lyrics Generation from Melody with Prior Attention
* Controlling the World by Sleight of Hand
* ControlLLM: Augment Language Models with Tools by Searching on Graphs
* Controlnet++: Improving Conditional Controls with Efficient Consistency Feedback
* Controlnet-xs: Rethinking the Control of Text-to-image Diffusion Models as Feedback-control Systems
* convex Kullback-Leibler optimization for semi-supervised few-shot learning, A
* Convex Relaxations for Manifold-valued Markov Random Fields with Approximation Guarantees
* Cooperated Truck-Drone Routing With Drone Energy Consumption and Time Windows
* Coordinated Planning of EV Charging Stations and Mobile Energy Storage Vehicles in Highways With Traffic Flow Modeling
* COPT: Unsupervised Domain Adaptive Segmentation Using Domain-agnostic Text Embeddings
* Cor-gs: Sparse-view 3d Gaussian Splatting via Co-regularization
* Cores: Orchestrating the Dance of Reasoning and Segmentation
* Correlation Analysis of Vertical Ground Movement and Climate Using Sentinel-1 InSAR
* Correspondence-free Se(3) Point Cloud Registration in RKHS via Unsupervised Equivariant Learning
* Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection
* Corruption-based anomaly detection and interpretation in tabular data
* Cosign: Few-step Guidance of Consistency Model to Solve General Inverse Problems
* COSMU: Complete 3d Human Shape from Monocular Unconstrained Images
* cost-effective and robust mapping method for diverse crop types using weakly supervised semantic segmentation with sparse point samples, A
* COSTA: A Multi-Center TOF-MRA Dataset and a Style Self-Consistency Network for Cerebrovascular Segmentation
* Cotracker: It Is Better to Track Together
* Counterfactual Causal-Effect Intervention for Interpretable Medical Visual Question Answering
* Countformer: Multi-view Crowd Counting Transformer
* COVLM: Leveraging Consensus from Vision-language Models for Semi-supervised Multi-modal Fake News Detection
* CPM: Class-conditional Prompting Machine for Audio-Visual Segmentation
* Cpt-vr: Improving Surface Rendering via Closest Point Transform with View-reflection Appearance
* Crisp: Leveraging Tread Depth Maps for Enhanced Crime-scene Shoeprint Matching
* CRM: Single Image to 3d Textured Mesh with Convolutional Reconstruction Model
* Cromo-mixup: Augmenting Cross-model Representations for Continual Self-supervised Learning
* Cross Feature Fusion of Fundus Image and Generated Lesion Map for Referable Diabetic Retinopathy Classification
* Cross-attention based dual-similarity network for few-shot learning
* Cross-domain Few-shot Object Detection via Enhanced Open-set Object Detector
* Cross-domain Learning for Video Anomaly Detection with Limited Supervision
* Cross-domain Semantic Segmentation on Inconsistent Taxonomy Using Vlms
* Cross-input Certified Training for Universal Perturbations
* Cross-modal adapter for vision-language retrieval
* Cross-modal change detection using historical land use maps and current remote sensing images
* Cross-modal independent matching network for image-text retrieval
* Cross-modality Complementary Learning for Video-based Cloth-changing Person Re-identification
* Cross-Model Cross-Stream Learning for Self-Supervised Human Action Recognition
* Cross-platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach
* Cross-view Image Geo-localization with Panorama-BEV Co-retrieval Network
* Cross-View Location Alignment Enhanced Spatial-Topological Aware Dual Transformer for Travel Time Estimation
* CrossGLG: Llm Guides One-shot Skeleton-based 3d Action Recognition in a Cross-level Manner
* Crosspar: Enhancing Pedestrian Attribute Recognition with Vision-language Fusion and Human-centric Pre-training
* Crossscore: Towards Multi-view Image Evaluation and Scoring
* Crossvit-ReID: Cross-attention Vision Transformer for Occluded Cloth-changing Person Re-identification
* Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes
* Csot: Cross-scan Object Transfer for Semi-supervised Lidar Object Detection
* CSPN: A Category-Specific Processing Network for Low-Light Image Enhancement
* CS^2K: Class-Specific and Class-Shared Knowledge Guidance for Incremental Semantic Segmentation
* CTOD: Cross-Attentive Task-Alignment for One-Stage Object Detection
* CTRLorALTer: Conditional LorALTer for Efficient 0-shot Control and Altering of T2I Models
* Curved Diffusion: A Generative Model with Optical Geometry Control
* Customize-a-video: One-shot Motion Customization of Text-to-video Diffusion Models
* Customized Bus Service Design With Holding Control and Heterogeneous Fleet: A Column-Generation-Based Decomposition Algorithm
* Customized Generation Reimagined: Fidelity and Editability Harmonized
* Cut Out the Middleman: Revisiting Pose-based Gait Recognition
* CvFormer: Cross-view transFormers with pre-training for fMRI analysis of human brain
* CVT-OCC: Cost Volume Temporal Fusion for 3d Occupancy Prediction
* CWGA-Net: Center-Weighted Graph Attention Network for 3D object detection from point clouds
* Cyberattack Detection-Isolation Algorithm for CAV Under Changing Driving Environment, A
* D'OH: Decoder-only Random Hypernetworks for Implicit Neural Representations
* D-InSAR-Based Analysis of Slip Distribution and Coulomb Stress Implications from the 2024 Mw 7.01 Wushi Earthquake
* D-sco: Dual-stream Conditional Diffusion for Monocular Hand-held Object Reconstruction
* DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception
* Dailydvs-200: A Comprehensive Benchmark Dataset for Event-based Action Recognition
* Damage Detection and Segmentation in Disaster Environments Using Combined YOLO and Deeplab
* Damsdet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion
* Data augmentation strategies for semi-supervised medical image segmentation
* Data Augmentation via Latent Diffusion for Saliency Prediction
* Data Collection-free Masked Video Modeling
* Data Overfitting for On-device Super-resolution with Dynamic Algorithm and Compiler Co-design
* Data Poisoning Quantization Backdoor Attack
* Data-Learning Game Output Regulation Approach for Human-Machine Cooperative Driving Toward Varied Drivers and Vehicles
* Data-to-model Distillation: Data-efficient Learning Framework
* Datadream: Few-shot Guided Dataset Generation
* Dataset Distillation by Automatic Training Trajectories
* Dataset Enhancement with Instance-level Augmentations
* Dataset Growth
* Dataset Quantization with Active Learning Based Adaptive Sampling
* DatasetNeRF: Efficient 3d-aware Data Factory with Generative Radiance Fields
* Datenerf: Depth-aware Text-based Editing of Nerfs
* DA^2: Degree-Accumulated Data Augmentation on Point Clouds with Curriculum Dynamic Threshold Selection
* DBCvT: Double Branch Convolutional Transformer for Medical Image Classification
* DBMHT: A double-branch multi-hypothesis transformer for 3D human pose estimation in video
* DBSR: Quadratic Conditional Diffusion Model for Blind Cardiac MRI Super-Resolution
* DC-Mamba: A Novel Network for Enhanced Remote Sensing Change Detection in Difficult Cases
* DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation
* DCDiff: Dual-Granularity Cooperative Diffusion Models for Pathology Image Analysis
* DCDM: Diffusion-conditioned-diffusion Model for Scene Text Image Super-resolution
* DCFU-Net: Rethinking an Effective Attention and Convolutional Architecture for Retinal Vessel Segmentation
* DDOWOD: DiffusionDet for open-world object detection
* De-confounded Gaze Estimation
* De-confusing Pseudo-labels in Source-free Domain Adaptation
* DEAL: Disentangle and Localize Concept-level Explanations for VLMs
* Debiasing Surgeon: Fantastic Weights and How to Find Them
* Debiformer: Vision Transformer with Deformable Agent Bi-level Routing Attention
* Deblur e-NeRF: NeRF from Motion-blurred Events under High-speed or Low-light Conditions
* Deblurring 3d Gaussian Splatting
* Decap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism
* DecentNeRFs: Decentralized Neural Radiance Fields from Crowdsourced Images
* Decentralized Multi-Vehicle Motion Planning for Platoon Forming in Mixed Traffic Using Monte Carlo Tree Search
* Deceptive-NeRF/3DGS: Diffusion-generated Pseudo-observations for High-quality Sparse-view Reconstruction
* Decider: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation
* Deciphering the Role of Representation Disentanglement: Investigating Compositional Generalization in CLIP Models
* Deco: Decoupled Human-centered Diffusion Video Editing with Motion Consistency
* Decollage: 3d Detailization by Controllable, Localized, and Learned Geometry Enhancement
* Decomposed Vector-quantized Variational Autoencoder for Human Grasp Generation
* Decomposition Betters Tracking Everything Everywhere
* Decomposition of Neural Discrete Representations for Large-scale 3d Mapping
* Decoupled Detr for Few-shot Object Detection
* Decoupling and Insensitivity of Greenness and Gross Primary Productivity Across Aridity Gradients in China
* Decoupling Common and Unique Representations for Multimodal Self-supervised Learning
* Decoupling-Based Resilient Control of Vehicular Platoons Under Injection of False Wireless Data
* Deep Companion Learning: Enhancing Generalization Through Historical Consistency
* Deep Cost Ray Fusion for Sparse Depth Video Completion
* Deep Diffusion Image Prior for Efficient OOD Adaptation in 3d Inverse Problems
* Deep Feature Surgery: Towards Accurate and Efficient Multi-exit Networks
* Deep Learning Image Segmentation Based on Adaptive Total Variation Preprocessing
* Deep Learning Model Size Performance Evaluation for Lightning Whistler Detection on Arase Satellite Dataset
* Deep Nets with Subsampling Layers Unwittingly Discard Useful Activations at Test-time
* Deep Online Probability Aggregation Clustering
* Deep Patch Visual SLAM
* Deep Polarization Cues for Single-shot Shape and Subsurface Scattering Estimation
* Deep Reward Supervisions for Tuning Text-to-image Diffusion Models
* Deep shared proxy construction hashing for cross-modal remote sensing image fast target retrieval
* Deep Spatial-Spectral Joint-Sparse Prior Encoding Network for Hyperspectral Target Detection
* DeepArUco++: Improved detection of square fiducial markers in challenging lighting conditions
* DeepDuoHDR: A Low Complexity Two Exposure Algorithm for HDR Deghosting on Mobile Devices
* DeepMarkerNet: Leveraging supervision from the Duchenne Marker for spontaneous smile recognition
* DeepSet SimCLR: Self-supervised deep sets for improved pathology representation learning
* Defect Spectrum: A Granular Look of Large-scale Defect Datasets with Rich Semantics
* Deformable Shape-aware Point Generation for 3d Object Detection
* Deformable surface reconstruction via Riemannian metric preservation
* Deformation Monitoring and Analysis of Beichuan National Earthquake Ruins Museum Based on Time Series InSAR Processing
* DeIoU: Toward Distinguishable Box Prediction in Densely Packed Object Detection
* Delineations for Police Patrolling on Street Network Segments with p-Median Location Models
* Delving Deep into Engagement Prediction of Short Videos
* Delving into Adversarial Robustness on Document Tampering Localization
* Delving into CLIP latent space for Video Anomaly Recognition
* DEM Registration Method Without Ground Control Points for Landslide Deformation Monitoring, The
* DENEB: A Hallucination-robust Automatic Evaluation Metric for Image Captioning
* Denoising diffusion model with adversarial learning for unsupervised anomaly detection on brain MRI images
* Denoising Vision Transformers
* denoisplit: A Method for Joint Microscopy Image Splitting and Unsupervised Denoising
* Dense Hand-object (ho) Graspnet with Full Grasping Taxonomy and Dynamics
* Dense Multimodal Alignment for Open-vocabulary 3d Scene Understanding
* Dense Trajectory Fields: Consistent and Efficient Spatio-temporal Pixel Tracking
* Densenets Reloaded: Paradigm Shift Beyond Resnets and VITS
* Dependable and Efficient Decentralized Trust Management System Based on Consortium Blockchain for Intelligent Transportation Systems, A
* Dependency-aware Differentiable Neural Architecture Search
* Depict: Diffusion-enabled Permutation Importance for Image Classification Tasks
* Depicting Beyond Scores: Advancing Image Quality Assessment Through Multi-modal Language Models
* DePS: Delayed-epsilon-shrinking for Faster Once-for-all Training
* Depth Attention for Robust RGB Tracking
* Depth on Demand: Streaming Dense Depth from a Low Frame Rate Active Sensor
* Depth-aware Blind Image Decomposition for Real-world Adverse Weather Recovery
* Depth-guided NeRF Training via Earth Mover's Distance
* DepthBLIP-2: Leveraging Language to Guide BLIP-2 in Understanding Depth Information
* Depthsegnet24: A Label-free Model for Robust Day-night Depth and Semantics
* DESAT: A Distance-Enhanced Strip Attention Transformer for Remote Sensing Image Super-Resolution
* Description and In-Flight Assessment of the POSEIDON-3C Altimeter of the SWOT Mission
* Design of a differentiable L-1 norm for pattern recognition and machine learning
* Design of a Robotic System Featured With High Operation Transparency for Quantifying Arm Impedance During Ultrasound Scanning
* Design of an Adaptive Enhanced AMP-Based Image Block Compressed Sensing and Its Application to Image Encryption, The
* Design of Blockchain-Based Multi-Domain Authentication Protocol for Secure EV Charging Services in V2G Environments
* Designing Extremely Memory-efficient CNNs for On-device Vision Tasks
* Dessie: Disentanglement for Articulated 3d Horse Shape and Pose Estimation from Images
* Detailsemnet: Elevating Signature Verification Through Detail-semantic Integration
* Detecting as Labeling: Rethinking Lidar-camera Fusion in 3d Object Detection
* Detection of buildings with potential damage using differential deformation maps
* Deterministic Sea Wave Reconstruction and Prediction Based on Coherent S-Band Radar Using Condition Number Regularized Least Squares
* DETRA: A Unified Model for Object Detection and Trajectory Forecasting
* Dettoolchain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
* Deturb: Atmospheric Turbulence Mitigation with Deformable 3d Convolutions and 3d Swin Transformers
* Development of a Methodology Based on ALS Data and Diameter Distribution Simulation to Characterize Management Units at Tree Level
* Development of a MR Training System for Living Donor Liver Transplantation Using Simulated Liver Phantom and ICP Tracking Technology
* Development of Collision Avoidance System Integrated With Real-Time Tire-Road Friction Coefficient Estimator
* Developmental Plasticity-Inspired Adaptive Pruning for Deep Spiking and Artificial Neural Networks
* DEVIAS: Learning Disentangled Video Representations of Action and Scene
* Devil Is in the Details: Simple Remedies for Image-to-LiDAR Representation Learning, The
* Devil Is in the Statistics: Mitigating and Exploiting Statistics Difference for Generalizable Semi-supervised Medical Image Segmentation, The
* Dfimat: Decoupled Flexible Interactive Matting in Multi-person Scenarios
* DG-PIC: Domain Generalized Point-in-context Learning for Point Cloud Understanding
* DGD: Dynamic 3d Gaussians Distillation
* DGE: Direct Gaussian 3d Editing by Consistent Multi-view Editing
* Dginstyle: Domain-generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control
* Dgr-mil: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification
* DHR: Dual Features-driven Hierarchical Rebalancing in Inter- and Intra-class Regions for Weakly-supervised Semantic Segmentation
* Diagnosing and Re-learning for Balanced Multimodal Learning
* Dial: Dense Image-text Alignment for Weakly Supervised Semantic Segmentation
* DIE-CDK: A Discriminative Information Enhancement Method With Cross-Modal Domain Knowledge for Fine-Grained Ship Detection
* Diff-reg: Diffusion Model in Doubly Stochastic Matrix Space for Registration Problem
* Diff-tracker: Text-to-image Diffusion Models are Unsupervised Trackers
* Diff3detr: Agent-based Diffusion Model for Semi-supervised 3d Object Detection
* DIFFBIR: Toward Blind Image Restoration with Generative Diffusion Prior
* Diffcd: A Symmetric Differentiable Chamfer Distance for Neural Implicit Surface Fitting
* Diffclass: Diffusion-based Class Incremental Learning
* Diffender: Diffusion-based Adversarial Defense Against Patch Attacks
* Differentiable Convex Polyhedra Optimization from Multi-view Images
* Differentiable Product Quantization for Memory Efficient Camera Relocalization
* Differentiating Cheatgrass and Medusahead Phenological Characteristics in Western United States Rangelands
* Difffas: Face Anti-spoofing via Generative Diffusion Models
* Diffit: Diffusion Vision Transformers for Image Generation
* Diffloss: Unleashing Diffusion Model as Constraint for Training Image Restoration Network
* Diffpmae: Diffusion Masked Autoencoders for Point Cloud Reconstruction
* Diffsurf: A Transformer-based Diffusion Model for Generating and Reconstructing 3d Surfaces in Pose
* Diffumatting: Synthesizing Arbitrary Objects with Matting-level Annotation
* Diffusing Background Dictionary for Hyperspectral Anomaly Detection
* Diffusion for Natural Image Matting
* Diffusion for Out-of-distribution Detection on Road Scenes and Beyond
* Diffusion Model Compression for Image-to-image Translation
* Diffusion Model for Robust Multi-sensor Fusion in 3d Object Detection and BEV Segmentation
* Diffusion Model for Simulation Ready Coronary Anatomy with Morpho-skeletal Control, A
* Diffusion Model is a Good Pose Estimator from 3d RF-Vision
* Diffusion Models are Geometry Critics: Single Image 3d Editing Using Pre-trained Diffusion Priors
* Diffusion Models as Data Mining Tools
* Diffusion Models as Optimizers for Efficient Planning in Offline RL
* Diffusion Models for Counterfactual Explanations
* Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions
* Diffusion Models for Open-vocabulary Segmentation
* Diffusion Prior-based Amortized Variational Inference for Noisy Inverse Problems
* Diffusion Reward: Learning Rewards via Conditional Video Diffusion
* Diffusion Soup: Model Merging for Text-to-image Diffusion Models
* Diffusion-Based Hypotheses Generation and Joint-Level Hypotheses Aggregation for 3D Human Pose Estimation
* Diffusion-based Image-to-image Translation by Noise Correction via Prompt Interpolation
* Diffusion-based Multimodal Video Captioning
* Diffusion-driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning
* Diffusion-guided Weakly Supervised Semantic Segmentation
* Diffusion-refined VQA Annotations for Semi-supervised Gaze Following
* Diffusiondepth: Diffusion Denoising Approach for Monocular Depth Estimation
* Diffusionpen: Towards Controlling the Style of Handwritten Text Generation
* DIFFUX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-rays
* Digital Twins-Assisted Multi-Autonomous Vehicle Distributed Collaborative Path Planning Algorithm with Fidelity Guarantee, A
* DIM: Dyadic Interaction Modeling for Social Behavior Generation
* Dino-tracker: Taming DINO for Self-supervised Point Tracking in a Single Video
* Direct Alignment for Robust NeRF Learning
* Direct Approach to Viewing Graph Solvability, A
* Direct Assimilation of Radar Reflectivity Data with a Two-Moment Microphysics Scheme for a Landfalling Typhoon in an OSSE Framework, The
* Direct Distillation Between Different Domains
* DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control
* Discomatch: Fast Discrete Optimisation for Geometrically Consistent 3d Shape Matching
* Discover-Then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery
* Discovering Novel Actions from Open World Egocentric Videos with Object-grounded Visual Commonsense Reasoning
* Discrete Cosine Transform-Based Joint Spectral-Spatial Information Compression and Band-Correlation Calculation for Hyperspectral Feature Extraction
* Discrete diffusion models with Refined Language-Image Pre-trained representations for remote sensing image captioning
* Discrete-Time Sliding Mode-Based Finite-Time Trajectory Tracking Control of Underactuated Surface Vessels With Large Sampling Periods
* Discussion Points of the Remote Sensing Study and Integrated Analysis of the Archaeological Landscape of Rujm el-Hiri
* Disentangled Clothed Avatar Generation from Text Descriptions
* Disentangled Generation and Aggregation for Robust Radiance Fields
* Disentangling Masked Autoencoders for Unsupervised Domain Generalization
* Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-contradictory Instructions
* Dissolving is Amplifying: Towards Fine-grained Anomaly Detection
* Distance-based loss function for deep feature space learning of convolutional neural networks
* Distill Gold from Massive Ores: Bi-level Data Pruning Towards Efficient Dataset Distillation
* Distilling Diffusion Models Into Conditional GANs
* Distilling Knowledge from Large-scale Image Models for Object Detection
* Distractor-free Novel View Synthesis via Exploiting Memorization Effect in Optimization
* Distractors-immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning
* Distributed Active Client Selection With Noisy Clients Using Model Association Scores
* Distributed Model Predictive Control for Virtually Coupled Heterogeneous Trains: Comparison and Assessment
* Distributed Multi-Agent Reinforcement Learning for Cooperative Low-Carbon Control of Traffic Network Flow Using Cloud-Based Parallel Optimization
* Distributed Semantic Segmentation with Efficient Joint Source and Task Decoding
* Distributed virtual selective-forwarding units and SDN-assisted edge computing for optimization of multi-party WebRTC videoconferencing
* Distribution Alignment for Fully Test-Time Adaptation with Dynamic Online Data Streams
* Distribution-aware Robust Learning from Long-tailed Data with Noisy Labels
* Distributionally Robust Loss for Long-tailed Multi-label Image Classification
* Divergent Drying Mechanisms in Humid and Non-Humid Regions Across China
* Diverse Text-to-3d Synthesis with Augmented Text Embedding
* Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images
* DMiT: Deformable Mipmapped Tri-plane Representation for Dynamic Scenes
* DNI: Dilutional Noise Initialization for Diffusion Video Editing
* Do Generalised Classifiers Really Work on Human Drawn Sketches?
* Do Text-free Diffusion Models Learn Discriminative Visual Representations?
* Do They Share the Same Tail? Learning Individual Compositional Attribute Prototype for Generalized Zero-shot Learning
* DOCCI: Descriptions of Connected and Contrasting Images
* Dolfin: Diffusion Layout Transformers Without Autoencoder
* Dolphins: Multimodal Language Model for Driving
* Domain Adaptation Transformer for Unsupervised Driving-Scene Segmentation in Adverse Conditions
* Domain Aware Multi-task Pretraining of 3d Swin Transformer for T1-Weighted Brain MRI
* Domain Generalization of 3d Object Detection by Density-resampling
* Domain Reduction Strategy for Non-line-of-sight Imaging
* Domain Shifting: A Generalized Solution for Heterogeneous Cross-modality Person Re-identification
* Domain-adaptive 2d Human Pose Estimation via Dual Teachers in Extremely Low-light Conditions
* Domain-adaptive Video Deblurring via Test-time Blurring
* Domainfusion: Generalizing to Unseen Domains with Latent Diffusion Models
* Domesticating SAM for Breast Ultrasound Image Segmentation via Spatial-frequency Fusion and Uncertainty Correction
* Double supervision for scene text detection and recognition based on BMINet
* Doubletake: Geometry Guided Depth Estimation
* Doughnet: A Visual Predictive Model for Topological Manipulation of Deformable Objects
* DPA-Net: Structured 3d Abstraction from Sparse Views via Differentiable Primitive Assembly
* DPL: Cross-quality Deepfake Detection via Dual Progressive Learning
* DQ-DETR: DETR with Dynamic Query for Tiny Object Detection
* DR-Block: Convolutional Dense Reparameterization for CNN Generalization Free Improvement
* DragAnything: Motion Control for Anything Using Entity Representation
* Dragapart: Learning a Part-level Motion Prior for Articulated Objects
* Dragvideo: Interactive Drag-style Video Editing
* Dreamdiffusion: High-quality EEG-to-Image Generation with Temporal Masked Signal Modeling and CLIP Alignment
* Dreamdissector: Learning Disentangled Text-to-3d Generation from 2d Diffusion Priors
* Dreamdrone: Text-to-image Diffusion Models Are Zero-shot Perpetual View Generators
* Dreamlip: Language-image Pre-training with Long Captions
* Dreammesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3d Generation
* Dreammotion: Space-time Self-similar Score Distillation for Zero-shot Video Editing
* Dreammover: Leveraging the Prior of Diffusion Models for Image Interpolation with Large Motion
* Dreamreward: Text-to-3d Generation with Human Preference
* Dreamsampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation
* Dreamscene360: Unconstrained Text-to-3d Scene Generation with Panoramic Gaussian Splatting
* Dreamscene: 3d Gaussian-based Text-to-3d Scene Generation via Formation Pattern Sampling
* Dreamstruct: Understanding Slides and User Interfaces via Synthetic Data Generation
* Dreamview: Injecting View-specific Text Guidance Into Text-to-3d Generation
* DRGNN: Disentangled representation graph neural network for diverse category-level recommendations
* Drivedreamer: Towards Real-world-drive World Models for Autonomous Driving
* Drivelm: Driving with Graph Visual Question Answering
* Driver Drowsiness Detection Based on Joint Human Face and Facial Landmark Localization With Cheap Operations
* Driving Risk Assessment Framework Considering Driver's Fatigue State and Distraction Behavior, A
* Drivingdiffusion: Layout-guided Multi-view Driving Scenarios Video Generation with Latent Diffusion Model
* Dropout Mixture Low-rank Adaptation for Visual Parameters-efficient Fine-tuning
* DS-TFSN-Based Vehicle Travel Time Prediction Method for Digital Twin System of Freeways
* DSA: Discriminative Scatter Analysis for Early Smoke Segmentation
* DSCIC: Deep Screen Content Image Compression
* DSCIMABNet: A novel multi-head attention depthwise separable CNN model for skin cancer detection
* DSMIX: Distortion-induced Sensitivity Map Based Pre-training for No-reference Image Quality Assessment
* Dspdet3d: 3d Small Object Detection with Dynamic Spatial Pruning
* DSU-GAN: A robust frontal face recognition approach based on generative adversarial network
* DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction
* Dual Domain Perception and Progressive Refinement for Mirror Detection
* Dual Memory Networks Guided Reverse Distillation for Unsupervised Anomaly Detection
* Dual Prototype-driven Objectness Decoupling for Cross-domain Object Detection in Urban Scene
* Dual-Camera Smooth Zoom on Mobile Phones
* Dual-decoupling Learning and Metric-adaptive Thresholding for Semi-supervised Multi-label Learning
* Dual-level Adaptive Self-labeling for Novel Class Discovery in Point Cloud Segmentation
* Dual-path Adversarial Lifting for Domain Shift Correction in Online Test-time Adaptation
* Dual-path Multimodal Optimal Transport for Composed Image Retrieval
* Dual-rain: Video Rain Removal Using Assertive and Gentle Teachers
* Dual-Reference Source-Free Active Domain Adaptation for Nasopharyngeal Carcinoma Tumor Segmentation Across Multiple Hospitals
* Dual-stage Hyperspectral Image Classification Model with Spectral Supertoken
* Dual-Stream Anomaly Detection Network for Real-World Traffic Scenarios
* Dual-View Data Hallucination With Semantic Relation Guidance for Few-Shot Image Recognition
* DualBEV: Unifying Dual View Transformation with Probabilistic Correspondences
* Dualdn: Dual-domain Denoising via Differentiable ISP
* DualRisk: A Two-Branch Model for Sparse Traffic Accident Risk Forecasting
* Dvlo: Deep Visual-lidar Odometry with Local-to-global Feature Fusion and Bi-directional Structure Alignment
* DyConfidMatch: Dynamic thresholding and re-sampling for 3D semi-supervised learning
* Dyfadet: Dynamic Feature Aggregation for Temporal Action Detection
* Dyn-adapter: Towards Disentangled Representation for Efficient Visual Recognition
* Dynamic Data Selection for Efficient SSL via Coarse-to-fine Refinement
* Dynamic Expansion and Merging of the Equatorial Ionization Anomaly During the 10-11 May 2024 Super Geomagnetic Storm
* Dynamic Graph Memory Bank for Video Inpainting
* Dynamic Guidance Adversarial Distillation with Enhanced Teacher Knowledge
* Dynamic Neural Field Approach for Intelligent Cockpits: Online Learning and Prediction of Traveling Routines, A
* Dynamic Neural Radiance Field from Defocused Monocular Video
* Dynamic Recursive Logit Model for Vehicle Driving Route Choices and Path Inference with Incomplete Fixed Location Sensor Data
* Dynamic Resource Allocation for Cloud-Edge Collaboration Offloading in VEC Networks With Diverse Tasks
* Dynamic Retraining-Updating Mean Teacher for Source-Free Object Detection
* Dynamic Routing and Knowledge Re-Learning for Data-Free Black-Box Attack
* Dynamic Semantic-Based Spatial-Temporal Graph Convolution Network for Skeleton-Based Human Action Recognition
* Dynamic Weighted Fusion and Progressive Refinement Network for Visible-Depth-Thermal Salient Object Detection
* Dynamic Window Transformer for Image Super-resolution
* Dynamicrafter: Animating Open-domain Images with Video Diffusion Priors
* Dynmf: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3d Gaussian Splatting
* Dynosurf: Neural Deformation-based Temporally Consistent Dynamic Surface Reconstruction
* Dyset: A Dynamic Masked Self-distillation Approach for Robust Trajectory Prediction
* D^4-vton: Dynamic Semantics Disentangling for Differential Diffusion Based Virtual Try-on
* E.T. the Exceptional Trajectories: Text-to-Camera-Trajectory Generation with Character Awareness
* E3m: Zero-shot Spatio-temporal Video Grounding with Expectation-maximization Multimodal Modulation
* E3V-K5: An Authentic Benchmark for Redefining Video-based Energy Expenditure Estimation
* EA-VTR: Event-aware Video-text Retrieval
* Eaformer: Scene Text Segmentation with Edge-aware Transformers
* EAGLES: Efficient Accelerated 3d Gaussians with Lightweight Encodings
* Early Anticipation of Driving Maneuvers
* Early Modeling of the Upcoming Landsat Next Constellation for Soybean Yield Prediction Under Varying Levels of Water Availability
* Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation
* EAS-SNN: End-to-end Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks
* Easing 3d Pattern Reasoning with Side-view Features for Semantic Scene Completion
* EBDM: Exemplar-guided Image Translation with Brownian-bridge Diffusion Models
* Echoscene: Indoor Scene Generation via Information Echo Over Scene Graph Diffusion
* Ecomatcher: Efficient Clustering Oriented Matcher for Detector-free Image Matching
* Economic Framework for 6-dof Grasp Detection, An
* Ecosystem Stability in the Ugan-Kuqa River Basin, Xinjiang, China: Investigation of Spatial and Temporal Dynamics and Driving Forces
* ECTFormer: An efficient Conv-Transformer model design for image recognition
* EDAF: Early Detection of Atrial Fibrillation from Post-stroke Brain MRI
* EDERF: Updating Local Scenes and Editing Across Fields for Real-time Dynamic Reconstruction of Road Scene
* Edformer: Transformer-based Event Denoising Across Varied Noise Levels
* Edge Detection of Source Body from Magnetic Anomaly Based on ResNet
* Edge-Guided Fusion and Motion Augmentation for Event-image Stereo
* EDH-STNet: An Evaporation Duct Height Spatiotemporal Prediction Model Based on Swin-Unet Integrating Multiple Environmental Information Sources
* Editable Image Elements for Controllable Synthesis
* Editorial for pattern recognition letters special issue on Advances in Disinformation Detection and Media Forensics
* Editshield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models
* EDS: Exploring deeper into semantics for video captioning
* Edtalk: Efficient Disentanglement for Emotional Talking Head Synthesis
* EDWNet: A Novel Encoder-Decoder Architecture Network for Water Body Extraction from Optical Images
* EEG Microstates and fNIRS Metrics Reveal the Spatiotemporal Joint Neural Processing Features of Human Emotions
* Effect of DEM Used for Terrain Correction on Forest Windthrow Detection Using COSMO SkyMed Data
* Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer
* Effective multi-scale interactive fusion network with hybrid Transformer and CNN for smoke image segmentation, An
* Effective variance attention-enhanced diffusion model for crop field aerial image super resolution
* Efficient 3d-aware Facial Image Editing via Attribute-specific Prompt Learning
* Efficient Active Domain Adaptation for Semantic Segmentation by Selecting Information-rich Superpixels
* Efficient and Effective Transformer Decoder-based Framework for Multi-task Visual Grounding, An
* Efficient and Practical Conditional Privacy-Preserving Aggregate Authentication for Vehicular Ad-Hoc Networks, An
* Efficient and Versatile Robust Fine-tuning of Zero-shot Models
* Efficient Bias Mitigation Without Privileged Information
* Efficient Cascaded Multiscale Adaptive Network for Image Restoration
* Efficient Cross-Modal Video Retrieval With Meta-Optimized Frames
* Efficient degradation representation learning network for remote sensing image super-resolution
* Efficient Depth-guided Urban View Synthesis
* Efficient Diffusion Model for Image Restoration by Residual Shifting
* Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
* Efficient Diffusion-driven Corruption Editor for Test-time Adaptation
* Efficient Feature Reuse Distillation Network for Lightweight Image Super-Resolution, An
* Efficient Few-shot Action Recognition via Multi-level Post-reasoning
* Efficient Frequency-domain Image Deraining with Contrastive Regularization
* Efficient Image Pre-training with Siamese Cropped Masked Autoencoders
* Efficient Implicit SDF and Color Reconstruction via Shared Feature Field
* Efficient Inference of Vision Instruction-following Models with Elastic Cache
* Efficient Learning of Event-based Dense Representation Using Hierarchical Memories with Adaptive Update
* Efficient NeRF Optimization - Not All Samples Remain Equally Hard
* Efficient Neural Video Representation with Temporally Coherent Modulation
* Efficient Pre-training for Localized Instruction Generation of Procedural Videos
* Efficient Rolling-Horizon Approach for Cooperative Multi-Lane Platoon Formation With Undefined Configurations, An
* Efficient Snapshot Spectral Imaging: Calibration-free Parallel Structure with Aperture Diffraction Fusion
* Efficient Statistical Sampling Adaptation for Exemplar-Free Class Incremental Learning
* Efficient Training of Spiking Neural Networks with Multi-parallel Implicit Stream Architecture
* Efficient Training with Denoised Neural Weights
* Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing
* Efficient Vehicle Detection and Optimization in Multi-Graph Mode Considering Multi-Section Tracking Based on Geographic Similarity
* Efficient Vehicle Trajectory Prediction With Goal Lane Segments and Dual-Stream Cross Attention
* Efficient Vision Transformers with Partial Attention
* Effiseanet: Pioneering Lightweight Network for Underwater Salient Object Detection
* EGIC: Enhanced Low-bit-rate Generative Image Compression Guided by Semantic Segmentation
* EGO-LM: An efficient, generic, and out-of-the-box language model for handwritten text recognition
* Egobody3m: Egocentric Body Tracking on a VR Headset using a Diverse Dataset
* Egocoord: Self-calibrated Egocentric 3d Body Pose Estimation Using Pixel-wise Coordinate Encoding
* Egocvr: An Egocentric Benchmark for Fine-grained Composed Video Retrieval
* Egoexo-fitness: Towards Egocentric and Exocentric Full-body Action Understanding
* Egolifter: Open-world 3d Segmentation for Egocentric Perception
* EGOPET: Egomotion and Interaction Data from an Animal's Perspective
* Egoposeformer: A Simple Baseline for Stereo Egocentric 3d Human Pose Estimation
* Egoposer: Robust Real-time Egocentric Pose Estimation from Sparse and Intermittent Observations Everywhere
* EgoVSR: Toward High-Quality Egocentric Video Super-Resolution
* Einet: Point Cloud Completion via Extrapolation and Interpolation
* Elaborate Teacher: Improved Semi-Supervised Object Detection With Rich Image Exploiting
* Elegantly Written: Disentangling Writer and Character Styles for Enhancing Online Chinese Handwriting
* Elevating All Zero-shot Sketch-based Image Retrieval Through Multimodal Prompt Learning
* Eliminating Feature Ambiguity for Few-shot Segmentation
* Eliminating Warping Shakes for Unsupervised Online Video Stitching
* ELLAR: An Action Recognition Dataset for Extremely Low-light Conditions with Dual Gamma Adaptive Modulation
* ELSE: Efficient Deep Neural Network Inference Through Line-based Sparsity Exploration
* Elucidating the Hierarchical Nature of Behavior with Masked Autoencoders
* Elysium: Exploring Object-level Perception in Videos via MLLM
* eMARLIN+: Addressing Partial Observability to Promote Traffic Signal Coordination by Leveraging Historical Information
* Embedded feature selection for robust probability learning machines
* Embedded Real-Time Vehicle and Pedestrian Detection Using a Compressed Tiny YOLO v3 Architecture
* Embedding-free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation
* Embodied Understanding of Driving Scenarios
* Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection
* EMDM: Efficient Motion Diffusion Model for Fast and High-quality Motion Generation
* Emergent Visual-semantic Hierarchies in Image-text Representations
* Emerging Property of Masked Token for Effective Pre-training
* EMIE-MAP: Large-scale Road Surface Reconstruction Based on Explicit Mesh and Implicit Encoding
* EMO: Emote Portrait Alive Generating Expressive Portrait Videos with Audio2video Diffusion Model Under Weak Conditions
* EmoTake: Exploring Drivers' Emotion for Takeover Behavior Prediction
* Emotalk3d: High-fidelity Free-view Synthesis of Emotional 3d Talking Head
* Emotalker: Audio Driven Emotion Aware Talking Head Generation
* Empirical Evaluation of Machine Learning Models for Fuel Consumption, Driver Identification, and Behavior Prediction
* Empirical Study and Analysis of Text-to-image Generation Using Large Language Model-powered Textual Representation, An
* Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL
* Encapsulating Knowledge in One Prompt
* End-to-End Image Colorization With Multiscale Pyramid Transformer
* End-to-end Rate-distortion Optimized 3d Gaussian Representation
* End-to-end tracking framework via multi-view and temporal feature aggregation, An
* Energy-calibrated VAE with Test Time Free Lunch
* Energy-Efficient Symbiotic UAV-Enabled MEC Networks via RIS: Joint Trajectory and Phase-Shift Control Optimization
* Energy-induced Explicit Quantification for Multi-modality MRI Fusion
* Enhanced Asymmetric Invertible Network for Neural Video Delivery
* Enhanced Controllability of Diffusion Models via Feature Disentanglement and Realism-enhanced Sampling Methods
* Enhanced Fishing Monitoring in the Central-Eastern North Pacific Using Deep Learning with Nightly Remote Sensing
* Enhanced Kalman with Adaptive Appearance Motion Sort for Grounded Generic Multiple Object Tracking
* Enhanced Motion Compensated Temporal Filter for VVenC
* Enhanced Motion Forecasting with Visual Relation Reasoning
* Enhanced Shuffle Attention with Context Decoupling Head with Wise IoU Loss for SAR Ship Detection, An
* Enhanced Sparsification via Stimulative Training
* Enhanced Super-resolution Training via Mimicked Alignment for Real-world Scenes
* Enhanced YOLOv8-Based Model with Context Enrichment Module for Crowd Counting in Complex Drone Imagery
* Enhancing 3d Human Pose Estimation with Bone Length Adjustment
* Enhancing abusive language detection: A domain-adapted approach leveraging BERT pre-training tasks
* Enhancing Anchor-based Weakly Supervised Referring Expression Comprehension with Cross-modality Attention
* Enhancing Cross-subject fMRI-to-video Decoding with Global-local Functional Alignment
* Enhancing Diffusion Models with Text-encoder Reinforcement Learning
* Enhancing Long-Term Robustness of Inter-Space Laser Links in Space Gravitational Wave Detection: An Adaptive Weight Optimization Method for Multi-Attitude Sensors Data Fusion
* Enhancing Multimedia Applications by Removing Dynamic Objects in Neural Radiance Fields
* Enhancing Object Detection in Adverse Weather Conditions Through Entropy and Guided Multimodal Fusion
* Enhancing Optimization Robustness in 1-bit Neural Networks Through Stochastic Sign Descent
* Enhancing Pedestrian Route Choice Models Through Maximum-Entropy Deep Inverse Reinforcement Learning with Individual Covariates (MEDIRL-IC)
* Enhancing Perceptual Quality in Video Super-resolution Through Temporally-consistent Detail Synthesis Using Diffusion Models
* Enhancing Photo Animation: Augmented Stylistic Modules and Prior Knowledge Integration
* Enhancing Plausibility Evaluation for Generated Designs with Denoising Autoencoder
* Enhancing Recipe Retrieval with Foundation Models: A Data Augmentation Perspective
* Enhancing robust VQA via contrastive and self-supervised learning
* Enhancing Robustness to Noise Corruption for Point Cloud Recognition via Spatial Sorting and Set-mixing Aggregation Module
* Enhancing Semantic Fidelity in Text-to-image Synthesis: Attention Regulation in Diffusion Models
* Enhancing Source-Free Domain Adaptive Object Detection with Low-Confidence Pseudo Label Distillation
* Enhancing Tampered Text Detection Through Frequency Feature Fusion and Decomposition
* Enhancing Tracking Robustness with Auxiliary Adversarial Defense Networks
* Enhancing Traffic Object Detection in Variable Illumination with RGB-Event Fusion
* Enhancing Vectorized Map Perception with Historical Rasterized Maps
* Enriching Information and Preserving Semantic Consistency in Expanding Curvilinear Object Segmentation Datasets
* Ensemble Network-Based Distillation for Hyperspectral Image Classification in the Presence of Label Noise
* Ensemble of Deep Clustering Models with Autoencoders to Mine Travel Patterns From Smart Card Data, An
* Ensemble SARSA and LSTM for User-Centric Handover Decisions in 5G Vehicular Networks
* Entaugment: Entropy-driven Adaptive Data Augmentation Framework for Image Classification
* Epipolargan: Omnidirectional Image Synthesis with Explicit Camera Control
* EPP-GAS: An Efficient and Privacy-Preserving Cross Trust-Domain Group Authentication Scheme for Vehicle Platoon Based on Blockchain
* EQ-CBM: A Probabilistic Concept Bottleneck with Energy-based Models and Quantized Vectors
* Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration
* Equilibria for Joint Congestion Game With Destination and Route Choices
* Equivariant Spatio-temporal Self-supervision for Lidar Object Detection
* ErasedRAW: Learning to Insert Objects by Erasing Them from Images
* Esm-yolo: Enhanced Small Target Detection Based on Visible and Infrared Multi-modal Fusion
* Estimating AVHRR snow cover fraction by coupling physical constraints into a deep learning framework
* Estimating optical flow: A comprehensive review of the state of the art
* Estimating Per-Class Statistics for Label Noise Learning
* Estimating Rainfall Anomalies with IMERG Satellite Data: Access via the IPE Web Application
* Estimating Soil Organic Carbon from Multispectral Images Using Physics-informed Neural Networks
* Estimation of Crop Residue Cover Utilizing Multiple Ground Truth Survey Techniques and Multi-Satellite Regression Models
* Estimation of Winter Wheat Stem Biomass by a Novel Two-Component and Two-Parameter Stratified Model Using Proximal Remote Sensing and Phenological Variables
* ETA Inversion: Designing an Optimal ETA Function for Diffusion-Based Real Image Editing
* Evaluating Text-to-visual Generation with Image-to-text Generation
* Evaluating the Adversarial Robustness of Semantic Segmentation: Trying Harder Pays Off
* Evaluation of Fengyun-4B Satellite Temperature Profile Products Using Radiosonde Observations and ERA5 Reanalysis over Eastern Tibetan Plateau
* Evaluation of the Accuracy of Interactive Multisensor Snow and Ice Mapping System (IMS) 1 km Product Using Ground Snow Depth Data Across China
* Evaluation of Virtual Agents' Hostility in Video Games
* Evaluation of visual SLAM algorithms in unstructured planetary-like and agricultural environments
* Event Camera Data Dense Pre-training
* Event Trojan: Asynchronous Event-based Backdoor Attacks
* Event-adapted Video Super-resolution
* Event-aided Time-to-collision Estimation for Autonomous Driving
* Event-based Head Pose Estimation: Benchmark and Method
* Event-based Image Enhancement Under High Dynamic Range Scenarios
* Event-based Mosaicing Bundle Adjustment
* Event-based Motion Magnification
* Event-Enhanced Snapshot Mosaic Hyperspectral Frame Deblurring
* Eventbind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding
* EventHDR: From Event to High-Speed HDR Videos and Beyond
* Every Pixel Has Its Moments: Ultra-high-resolution Unpaired Image-to-image Translation via Dense Normalization
* Every Shot Counts: Using Exemplars for Repetition Counting in Videos
* eViTBins: Edge-Enhanced Vision-Transformer Bins for Monocular Depth Estimation on Edge Devices
* Evolving Interpretable Visual Classifiers with Large Language Models
* EvRepSL: Event-Stream Representation via Self-Supervised Learning for Event-Based Vision
* EVSIGN: Sign Language Recognition and Translation with Streaming Events
* EX2EG-MAE: A Framework for Adaptation of Exocentric Video Masked Autoencoders for Egocentric Social Role Understanding
* Exact Diffusion Inversion via Bidirectional Integration Approximation
* Examining Spatial Accessibility and Equity of Public Hospitals for Older Adults in Songjiang District, Shanghai
* Examining the Causal and Heterogeneous Influence of Three-Dimensional Urban Forms on CO2 Emissions in 285 Chinese Cities
* Exemplar-free Continual Representation Learning via Learnable Drift Compensation
* Exmatch: Self-Guided Exploitation for Semi-Supervised Learning with Scarce Labeled Samples
* Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-concept Alignment and Retention
* Explain via Any Concept: Concept Bottleneck Model with Open Vocabulary Concepts
* Explainability-based knowledge distillation
* Explainable AI for Safe and Trustworthy Autonomous Driving: A Systematic Review
* Explainable hypergraphs for gait based Parkinson classification
* Explainable Vision Question Answer Model via Diffusion Chain-of-thought, An
* Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion
* Exploiting Cross-modal Cost Volume for Multi-sensor Depth Estimation
* Exploiting Dual-correlation for Multi-frame Time-of-flight Denoising
* Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-language Models
* Exploiting Supervised Poison Vulnerability to Strengthen Self-supervised Defense
* Explorative Inbetweening of Time and Space
* Explore the Potential of CLIP for Training-free Open Vocabulary Semantic Segmentation
* Exploring Active Learning in Meta-learning: Enhancing Context Set Labeling
* Exploring Conditional Multi-Modal Prompts for Zero-Shot HOI Detection
* Exploring Event-Based Human Pose Estimation with 3D Event Representations
* Exploring Factors Related to Drivers' Mental Model of and Trust in Advanced Driver Assistance Systems Using an ABN-Based Mixed Approach
* Exploring Georeferenced Augmented Reality for Architectural Visualization with Unmanned Aerial Vehicles
* Exploring Guided Sampling of Conditional GANs
* Exploring Limits of Diffusion-synthetic Training with Weakly Supervised Semantic Segmentation
* Exploring Phrase-level Grounding with Text-to-image Diffusion Model
* Exploring Potential Customized Bus Passengers Across Private Car Trajectory Data
* Exploring Pre-trained Text-to-video Diffusion Models for Referring Video Object Segmentation
* Exploring Reliable Matching with Phase Enhancement for Night-time Semantic Segmentation
* Exploring sample relationship for few-shot classification
* Exploring the Contribution Roles from Municipal Cities in the Rise in Household CO2 Emissions in China: From a Local Scale Analysis in the Global Context
* Exploring the Feature Extraction and Relation Modeling For Light-weight Transformer Tracking
* Exploring the Relationship Between Very-High-Resolution Satellite Imagery Data and Fruit Count for Predicting Mango Yield at Multiple Scales
* Exploring Vulnerabilities in Spiking Neural Networks: Direct Adversarial Attacks on Raw Event Data
* Expressive Whole-body 3d Gaussian Avatar
* External Knowledge Enhanced 3d Scene Generation from Sketch
* Eye-movement-prompted large image captioning model
* Eyes Closed, Safety on: Protecting Multimodal LLMs via Image-to-text Transformation
* F-HOI: Toward Fine-grained Semantic-aligned 3d Human-object Interactions
* FA-HRNet: A New Fusion Attention Approach for Vegetation Semantic Segmentation and Analysis
* Fabrication of Reality and Fantasy: Scene Generation with LLM-assisted Prompt Interpretation, The
* Face Reconstruction Transfer Attack as Out-of-distribution Generalization
* Face-Adapter for Pre-Trained Diffusion Models with Fine-grained ID and Attribute Control
* Faceptor: A Generalist Model for Face Perception
* Facial Affective Behavior Analysis with Instruction Tuning
* Facial and Neck Region Analysis for Deepfake Detection Using Remote Photoplethysmography Signal Similarity
* Facing Asymmetry: Uncovering the Causal Link Between Facial Symmetry and Expression Classifiers Using Synthetic Interventions
* Factorized Diffusion: Perceptual Illusions by Noise Decomposition
* Factorizing Text-to-video Generation by Explicit Image Conditioning
* FAFA: Frequency-aware Flow-aided Self-supervision for Underwater Object Pose Estimation
* Fair Ranking and New Model for Panoptic Scene Graph Generation, A
* Fairdomain: Achieving Fairness in Cross-domain Medical Image Segmentation and Classification
* Fairness-aware Vision Transformer via Debiased Self-attention
* Fairvit: Fair Vision Transformer via Adaptive Masking
* Fake It till You Make It: Curricular Dynamic Forgery Augmentations Towards General Deepfake Detection
* FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-shot Performance
* FAM: Adaptive federated meta-learning for MRI data
* Famous: High-fidelity Monocular 3d Human Digitization Using View Synthesis
* Fare: A Feature-aware Radical Encoding Strategy for Zero-shot Chinese Character Recognition
* FARSE-CNN: Fully Asynchronous, Recurrent and Sparse Event-based Cnn
* Fast adaptively balanced min-cut clustering
* Fast and accurate SAR geocoding with a plane approximation
* Fast Context-based Low-light Image Enhancement via Neural Implicit Representations
* fast differential network with adaptive reference sample for gaze estimation, A
* Fast Diffusion-based Counterfactuals for Shortcut Removal and Generation
* Fast Encoding and Decoding for Implicit Video Representation
* Fast Linear Equation Solving Algorithm and its Pipelined Hardware Architecture Design for VVC Affine Motion Estimation
* Fast Point Cloud Geometry Compression with Context-based Residual Coding and INR-based Refinement
* Fast Registration of Photorealistic Avatars for VR Facial Animation
* Fast Sprite Decomposition from Animated Graphics
* Fast Training of Diffusion Transformer with Extreme Masking for 3d Point Clouds Generation
* Fast Video Deduplication and Localization With Temporal Consistence Re-Ranking
* Fast View Synthesis of Casual Videos with Soup-of-planes
* FAST-LiDAR-SLAM: A Robust and Real-Time Factor Graph for Urban Scenarios With Unstable GPS Signals
* FastCad: Real-Time CAD Retrieval and Alignment from Scans and Videos
* Faster Convergence and Uncorrelated Gradients in Self-supervised Online Continual Learning
* FastPCI: Motion-structure Guided Fast Point Cloud Frame Interpolation
* FastSTI: A Fast Conditional Pseudo Numerical Diffusion Model for Spatio-Temporal Traffic Data Imputation
* Fault Kinematics of the 2022 Delingha Mw 5.6 and Mw 5.7 Earthquakes Revealed by InSAR Observations
* FBSTCNet: A Spatio-Temporal Convolutional Network Integrating Power and Connectivity Features for EEG-Based Emotion Decoding
* FDNet: Frequency Decomposition Network for Learned Image Compression
* FE-SKViT: A Feature-Enhanced ViT Model with Skip Attention for Automatic Modulation Recognition
* Feature differences reduction and specific features preserving network for RGB-T salient object detection
* Feature Diversification and Adaptation for Federated Domain Generalization
* Feature Estimation of Global Language Processing in EEG Using Attention Maps
* Feature Generator for Few-shot Learning, A
* Feature Intensification Using Perception-Guided Regional Classification for Remote Sensing Image Super-Resolution
* Feature-matching method based on keypoint response constraint using binary encoding of phase congruency
* Federated Class Incremental Learning: A Pseudo Feature Based Approach Without Exemplars
* Federated Learning with Local Openset Noisy Labels
* Fedharm: Harmonizing Model Architectural Diversity in Federated Learning
* FedHide: Federated Learning by Hiding in the Neighbors
* FedKT: Federated learning with knowledge transfer for non-IID data
* FEDL: Confidential Deep Learning for Autonomous Driving in VANETs Based on Functional Encryption
* Fedra: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients
* Fedrepopt: Gradient Re-parametrized Optimizers in Federated Learning
* FedShip: Federated Over-the-Air Learning for Communication-Efficient and Privacy-Aware Smart Shipping in 6G Communications
* Fedtsa: A Cluster-based Two-stage Aggregation Method for Model-heterogeneous Federated Learning
* Fedvad: Enhancing Federated Video Anomaly Detection with GPT-driven Semantic Distillation
* FERRET-UI: Grounded Mobile UI Understanding with Multimodal LLMs
* Few Exemplar-based General Medical Image Segmentation via Domain-aware Selective Adaptation
* Few-shot Anomaly-driven Generation for Anomaly Classification and Segmentation
* Few-shot Class Incremental Learning with Attention-aware Self-adaptive Prompt
* Few-shot Defect Image Generation Based on Consistency Modeling
* Few-shot Image Generation by Conditional Relaxing Diffusion Inversion
* Few-shot NeRF by Adaptive Rendering Loss Regularization
* Few-Shot Point Cloud Semantic Segmentation via Support-Query Feature Interaction
* FG-CXR: A Radiologist-aligned Gaze Dataset for Enhancing Interpretability in Chest X-ray Report Generation
* FHLight: A novel method of indoor scene illumination estimation using improved loss function
* Find n' Propagate: Open-vocabulary 3d Object Detection in Urban Environments
* Finding a Taxi With Illegal Driver Substitution Activity via Behavior Modelings
* Finding Meaning in Points: Weakly Supervised Semantic Segmentation for Event Cameras
* Finding Needles in a Haystack: A Black-box Approach to Invisible Watermark Detection
* Finding Nemo: Negative-mined Mosaic Augmentation for Referring Image Segmentation
* Finding Visual Task Vectors
* Fine-grained Automatic Augmentation for handwritten character recognition
* Fine-Grained Background Representation for Weakly Supervised Semantic Segmentation
* Fine-grained Dynamic Network for Generic Event Boundary Detection
* Fine-grained Scene Graph Generation via Sample-level Bias Prediction
* Fine-grained semantic oriented embedding set alignment for text-based person search
* Fine-tuning Large Language Models for Automatic Font Skeleton Generation: Exploration and Analysis
* Finematch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
* Finepseudo: Improving Pseudo-labelling Through Temporal-alignablity for Semi-supervised Fine-grained Action Recognition
* FIOD-VUE: Focusing on Invariant Information in Object Detection of Varying Underwater Environment
* FipTR: A Simple yet Effective Transformer Framework for Future Instance Prediction in Autonomous Driving
* First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-language Models?, The
* Fisher Calibration for Backdoor-robust Heterogeneous Federated Learning
* Fisherrf: Active View Selection and Mapping with Radiance Fields Using Fisher Information
* Flash Cache: Reducing Bias in Radiance Cache Based Inverse Rendering
* Flash-Splat: 3d Reflection Removal with Flash Cues and Gaussian Splats
* Flashsplat: 2d to 3d Gaussian Splatting Segmentation Solved Optimally
* Flashtex: Fast Relightable Mesh Texturing with Lightcontrolnet
* Flat: Flux-aware Imperceptible Adversarial Attacks on 3d Point Clouds
* Flatness-aware Sequential Learning Generates Resilient Backdoors
* Flexattention for Efficient High-resolution Vision-language Models
* Flexible Distribution Alignment: Towards Long-tailed Semi-supervised Learning with Proper Calibration
* Flexiedit: Frequency-aware Latent Refinement for Enhanced Non-rigid Editing
* Flood Risk Analysis of Urban Agglomerations in the Yangtze River Basin Under Extreme Precipitation Based on Remote Sensing Technology
* Flooded Infrastructure Change Detection in Deeply Supervised Networks Based on Multi-Attention-Constrained Multi-Scale Feature Fusion
* Flow-assisted Motion Learning Network for Weakly-supervised Group Activity Recognition
* FLOWCON: Out-of-distribution Detection Using Flow-based Contrastive Learning
* Flowed Time of Flight Radiance Fields
* Flying with Photons: Rendering Novel Views of Propagating Light
* Fmboost: Boosting Latent Diffusion with Flow Matching
* Focusdiffuser: Perceiving Local Disparities for Camouflaged Object Detection
* Focusnet: Cascaded Lightweight Networks and Ascending Feature Enhancement for Efficient Salient Object Detection
* Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models
* Fontstudio: Shape-adaptive Diffusion Model for Coherent and Consistent Font Effect Generation
* For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives
* Forbes: Face Obfuscation Rendering via Backpropagation Refinement Scheme
* Forecasting Blue and Green Water Footprint of Wheat Based on Single, Hybrid, and Stacking Ensemble Machine Learning Algorithms Under Diverse Agro-Climatic Conditions in Nile Delta, Egypt
* Forecasting Future Videos from Novel Views via Disentangled 3d Scene Representation
* Forest2seq: Revitalizing Order Prior for Sequential Indoor Scene Synthesis
* Forget More to Learn More: Domain-specific Feature Unlearning for Semi-supervised and Unsupervised Domain Adaptation
* Forget to Learn (F2L): Circumventing plasticity-stability trade-off in continuous unsupervised domain adaptation
* Formula-Supervised Visual-Geometric Pre-Training
* Foster Adaptivity and Balance in Learning with Noisy Labels
* FOTV-HQS: A Fractional-order Total Variation Model for Lidar Super-resolution with Deep Unfolding Network
* Found missing semantics: Supplemental prototype network for few-shot semantic segmentation
* Foundation Model-powered 3d Few-shot Class Incremental Learning via Training-free Adaptor
* Foundpose: Unseen Object Pose Estimation with Foundation Features
* Four Ways to Improve Verbo-visual Fusion for Dense 3d Visual Grounding
* FouriScale: A Frequency Perspective on Training-free High-resolution Image Synthesis
* Framework for Efficient Model Evaluation Through Stratification, Sampling, and Estimation, A
* Frdiff: Feature Reuse for Universal Training-free Acceleration of Diffusion Models
* Freditor: High-fidelity and Transferable NeRF Editing by Frequency Decomposition
* Free Lunch for Gait Recognition: A Novel Relation Descriptor
* Free-atm: Harnessing Free Attention Masks for Representation Learning on Diffusion-generated Images
* Free-editor: Zero-shot Text-driven 3d Scene Editing
* Free-viewpoint Video of Outdoor Sports Using a Flying Camera
* Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression
* Freeaugment: Data Augmentation Search Across All Degrees of Freedom
* Freecompose: Generic Zero-Shot Image Composition with Diffusion Prior
* Freediff: Progressive Frequency Truncation for Image Editing with Diffusion Models
* Freeinit: Bridging Initialization Gap in Video Diffusion Models
* Freemotion: A Unified Framework for Number-Free Text-to-Motion Synthesis
* Freemotion: Mocap-free Human Motion Synthesis with Multimodal Large Language Models
* FreeStyleRet: Retrieving Images from Style-Diversified Queries
* Freeview Sketching: View-aware Fine-grained Sketch-based Image Retrieval
* Freeze: Training-Free Zero-Shot 6D Pose Estimation with Geometric and Vision Foundation Models
* Frepolad: Frequency-rectified Point Latent Diffusion for Point Cloud Generation
* Frequency Learning Network with Dual-guidance Calibration for Camouflaged Object Detection
* Frequency-Hopping Binary Offset Carrier Modulation with Independent Frequency-Hopping Patterns in Lower and Upper Sidebands
* Frequency-spatial Entanglement Learning for Camouflaged Object Detection
* Frest: Feature Restoration for Semantic Segmentation Under Multiple Adverse Conditions
* FRI-NET: Floorplan Reconstruction via Room-wise Implicit Representation
* FrictionSegNet: Simultaneous Semantic Segmentation and Friction Estimation Using Hierarchical Latent Variable Models
* From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition
* From Methods to Applications: A Review of Deep 3D Human Motion Capture
* From Pixels to Objects: A Hierarchical Approach for Part and Object Segmentation Using Local and Global Aggregation
* Frontier-enhanced Topological Memory with Improved Exploration Awareness for Embodied Visual Navigation
* FROSSL: Frobenius Norm Minimization for Efficient Multiview Self-supervised Learning
* Frugal 3d Point Cloud Model Training via Progressive Near Point Filtering and Fused Aggregation
* FS-Depth: Focal-and-Scale Depth Estimation From a Single Image in Unseen Indoor Scene
* Fsd-BEV: Foreground Self-distillation for Multi-view 3d Object Detection
* Fsgait: Fine-grained Self-supervised Gait Abnormality Detection
* Fsgs: Real-time Few-shot View Synthesis Using Gaussian Splatting
* FTBC: Forward Temporal Bias Correction for Optimizing ANN-SNN Conversion
* FTM: The Face Truth Machine: Hand-crafted features from micro-expressions to support lie detection
* Full-Aperture Reflective Remote Fourier Ptychography with Sample Matching
* Full-body Human De-lighting with Semi-supervised Learning
* Full-reference calibration-free image quality assessment
* Full-Waveform Inversion of Two-Parameter Ground-Penetrating Radar Based on Quadratic Wasserstein Distance
* Fully Authentic Visual Question Answering Dataset from Online Communities
* Fully exploring object relation interaction and hidden state attention for video captioning
* Fully Sparse 3d Occupancy Prediction
* Functional Transform-Based Low-Rank Tensor Factorization for Multi-Dimensional Data Recovery
* Fundamental Matrix Estimation Using Relative Depths
* FUNQA: Towards Surprising Video Comprehension
* Fuseteacher: Modality-fused Encoders are Strong Vision Supervisors
* Fusion and Discrimination: A Multimodal Graph Contrastive Learning Framework for Multimodal Sarcasm Detection
* Fusion of Temporal Transformer and Spatial Graph Convolutional Network for 3-D Skeleton-Parts-Based Human Motion Prediction
* Future of Radar Space Observation in Europe: Major Upgrade of the Tracking and Imaging Radar (TIRA), The
* Future Site Suitability for Urban Waste Management in English Bazar and Old Malda Municipalities, West Bengal: A Geospatial and Machine Learning Approach
* Futuredepth: Learning to Predict the Future Improves Video Depth Estimation
* FYI: Flip Your Images for Dataset Distillation
* G3R: Gradient Guided Generalizable Reconstruction
* Gaitw: Enhancing Gait Recognition in the Wild Using Dynamic Information
* Gallop: Learning Global and Local Prompts for Vision-language Models
* Gamma-face: Gaussian Mixture Models Amend Diffusion Models for Bias Mitigation in Face Images
* GARET: Cross-view Video Geolocalization with Adapters and Auto-regressive Transformers
* Garmentaligner: Text-to-garment Generation via Retrieval-augmented Multi-level Corrections
* Garmentcodedata: A Dataset of 3d Made-to-measure Garments with Sewing Patterns
* Gated Temporal Diffusion for Stochastic Long-term Dense Anticipation
* GAURA: Generalizable Approach for Unified Restoration and Rendering of Arbitrary Views
* Gaussctrl: Multi-view Consistent Text-driven 3d Gaussian Splatting Editing
* Gaussian Discriminant Variational Autoencoder (GDVAE): A Self-explainable Model with Counterfactual Explanations, The
* Gaussian error loss function for image smoothing
* Gaussian Frosting: Editable Complex Radiance Fields with Real-time Rendering
* Gaussian Grouping: Segment and Edit Anything in 3d Scenes
* Gaussian in the Wild: 3d Gaussian Splatting for Unconstrained Image Collections
* Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion
* Gaussianformer: Scene as Gaussians for Vision-based 3d Semantic Occupancy Prediction
* Gaussianimage: 1000 Fps Image Representation and Compression by 2d Gaussian Splatting
* Gaussreg: Fast 3d Registration with Gaussian Splatting
* Gaze Target Detection Based on Head-local-global Coordination
* Gazexplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
* GB-RVFL: Fusion of randomized neural network and granular ball computing
* GBMOD: A granular-ball mean-shift outlier detector
* GCN and Transformer complementary network for skeleton-based action recognition, A
* Genad: Generative End-to-end Autonomous Driving
* general albedo recovery approach for aerial photogrammetric images through inverse rendering, A
* General and Task-oriented Video Segmentation
* General Geometry-aware Weakly Supervised 3d Object Detection
* Generalad: Anomaly Detection Across Domains by Attending to Distorted Features
* Generalizable Facial Expression Recognition
* Generalizable Human Gaussians for Sparse View Synthesis
* Generalizable Structure-aware Inf: Biplanar-view CT Reconstruction via Disentangled Implicit Neural Field
* Generalizable Symbolic Optimizer Learning
* Generalization in deep learning-based aircraft classification for SAR imagery
* Generalized Coverage for More Robust Low-budget Active Learning
* Generalized Relevance Learning Grassmann Quantization
* Generalized spatio-temporal-spectral integrated fusion for soil moisture downscaling
* Generalizing to Unseen Domains via Text-guided Augmentation: A Training-free Approach
* Generatect: Text-conditional Generation of 3d Chest CT Volumes
* Generating 3d House Wireframes with Semantics
* Generating Human Interaction Motions in Scenes with Text Control
* Generating Physically Realistic and Directable Human Motions from Multi-modal Inputs
* Generative adversarial network for semi-supervised image captioning
* Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis
* Generative Self-supervised Learning for Medical Image Classification
* Generative Simplex Mapping: Non-Linear Endmember Extraction and Spectral Unmixing for Hyperspectral Imagery
* Generic Autoregressive Predictive Feedback Framework for Skeleton-based Action Recognition, A
* Genixer: Empowering Multimodal Large Language Model as a Powerful Data Generator
* GENQ: Quantization in Low Data Regimes with Generative Synthetic Data
* Genrc: Generative 3d Room Completion from Sparse Image Collections
* GenSumm: A Joint Framework for Multi-Task Tweet Classification and Summarization Using Sentiment Analysis and Generative Modelling
* Genview: Enhancing View Quality with Pretrained Generative Model for Self-supervised Learning
* Geocalib: Learning Single-image Calibration with Geometric Optimization
* Geogaussian: Geometry-aware Gaussian Splatting for Scene Rendering
* Geographic Information System-Based Model and Analytic Hierarchy Process for Wind Farm Site Selection in the Red Sea, A
* Geographically-Informed Modeling and Analysis of Platform Attitude Jitter in GF-7 Sub-Meter Stereo Mapping Satellite
* Geological Investigation of the Lunar Reiner Gamma Magnetic Anomaly Region, The
* Geometric Distortion Immunized Deep Watermarking Framework with Robustness Generalizability, A
* Geometry Fidelity for Spherical Images
* Geometrysticker: Enabling Ownership Claim of Recolorized Neural Radiance Fields
* Georefinenet: A Multistage Framework for Enhanced Cephalometric Landmark Detection in CBCT Images Using 3d Geometric Information
* Geospecific View Generation Geometry-Context Aware High-resolution Ground View Inference from Satellite Views
* Geowizard: Unleashing the Diffusion Priors for 3d Geometry Estimation from a Single Image
* Get Your Embedding Space in Order: Domain-adaptive Regression for Forest Monitoring
* Getting it Right: Improving Spatial Consistency in Text-to-image Models
* Ggrt: Towards Pose-free Generalizable 3d Gaussian Splatting in Real-time
* GhostingNet: A Novel Approach for Glass Surface Detection with Ghosting Cues
* GIS Plugin for the Assessment of Deformations in Existing Bridge Portfolios via MTInSAR Data, A
* GIS-Based Framework to Analyze the Behavior of Urban Greenery During Heatwaves Using Satellite Data, A
* Gist, Content, Target-Oriented: A 3-Level Human-Like Framework for Video Moment Retrieval
* Git: Towards Generalist Vision Transformer Through Universal Language Interface
* GIVT: Generative Infinite-vocabulary Transformers
* GKGNET: Group K-nearest Neighbor Based Graph Convolutional Network for Multi-label Image Recognition
* Glad: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection
* Glare: Low Light Image Enhancement via Generative Latent Feature Based Codebook Retrieval
* GLGFN: Global-Local Grafting Fusion Network for High-Resolution Image Deraining
* Global and Local Contrastive Learning for Self-Supervised Skeleton-Based Action Recognition
* Global Assessment of Mesoscale Eddies with TOEddies: Comparison Between Multiple Datasets and Colocation with In Situ Measurements
* Global Counterfactual Directions
* Global perspectives on sand dune patterns: Scale-adaptable classification using Landsat imagery and deep learning strategies
* global reweighting approach for cross-domain semantic segmentation, A
* Global Semantic Localization from Abstract Ellipse-Ellipsoid Model and Object-Level Instance Topology
* Global Structure-from-motion Revisited
* Global-aware Fragment Representation Aggregation Network for image-text retrieval
* Global-local Collaborative Inference with Llm for Lidar-based Open-vocabulary Detection
* Global-to-pixel Regression for Human Mesh Recovery
* Globalpointer: Large-scale Plane Adjustment with Bi-convex Relaxation
* Glyph-BYT5: A Customized Text Encoder for Accurate Visual Text Rendering
* GMM-IKRS: Gaussian Mixture Models for Interpretable Keypoint Refinement and Scoring
* GMT: Enhancing Generalizable Neural Rendering via Geometry-driven Multi-reference Texture Transfer
* GOEmbed: Gradient Origin Embeddings for Representation Agnostic 3D Feature Learning
* Goldfish: Vision-language Understanding of Arbitrarily Long Videos
* Good Teachers Explain: Explanation-enhanced Knowledge Distillation
* GPNF: A Point Cloud Registration Framework Using Sharp Global Linear Attention Prior and Neighborhood Filtering Strategy
* Gpsformer: A Global Perception and Local Structure Fitting-based Transformer for Point Cloud Understanding
* GRA: Detecting Oriented Objects Through Group-wise Rotating and Attention
* Grace: Graph-based Contextual Debiasing for Fair Visual Question Answering
* Gradient-aware for Class-imbalanced Semi-supervised Medical Image Segmentation
* Gradient-regularized Out-of-distribution Detection
* Gradual Adversarial Training Method for Semantic Segmentation, A
* GRAPE: Generalizable and Robust Multi-view Facial Capture
* Graph Cut-guided Maximal Coding Rate Reduction for Learning Image Embedding and Clustering
* Graph Neural Network Causal Explanation via Neural Causal Models
* Graph-based Approach for Category-agnostic Pose Estimation, A
* GraphAVO: Self-Supervised Visual Odometry Based on Graph-Assisted Geometric Consistency
* Graphbev: Towards Robust BEV Feature Alignment for Multi-modal 3d Object Detection
* Graspxl: Generating Grasping Motions for Diverse Objects at Scale
* Gravity Predictions in Data-Missing Areas Using Machine Learning Methods
* Gravity-Aligned Rotation Averaging with Circular Regression
* Grefel: Geometry-aware Reliable Facial Expression Learning Under Bias and Imbalanced Data Distribution
* GREIT-HRNET: Grouped Lightweight High-resolution Network for Human Pose Estimation
* Grid-attention: Enhancing Computational Efficiency of Large Vision Models Without Fine-tuning
* Grids: Grouped Multiple-Degradation Restoration with Image Degradation Similarity
* Griffon: Spelling Out All Object Locations at Any Granularity with Large Language Models
* Grit: A Generative Region-to-text Transformer for Object Understanding
* GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
* GROCO: Ground Constraint for Metric Self-supervised Monocular Depth
* GROMA: Localized Visual Tokenization for Grounding Multimodal Large Language Models
* Ground-Penetrating Radar Image Matching Method Based on Central Dense Structure Context Features, The
* Grounding Dino: Marrying Dino with Grounded Pre-training for Open-set Object Detection
* Grounding Image Matching in 3d with Mast3r
* Grounding Language Models for Visual Entity Recognition
* Groundup: Rapid Sketch-based 3d City Massing
* Group Testing for Accurate and Efficient Range-based Near Neighbor Search for Plagiarism Detection
* Groupdiff: Diffusion-based Group Portrait Editing
* Gs-lrm: Large Reconstruction Model for 3d Gaussian Splatting
* GS-Pose: Category-level Object Pose Estimation via Geometric and Semantic Correspondence
* GS-SFS: Joint Gaussian Splatting and Shape-From-Silhouette for Multiple Human Reconstruction in Large-Scale Sports Scenes
* Gs2mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views
* Gsd: View-guided Gaussian Splatting Diffusion for 3d Reconstruction
* GSMNET: Towards Long-term Trajectory Prediction by Integrating Multi-scale Information
* Gtms: A Gradient-driven Tree-guided Mask-free Referring Image Segmentation Method
* GTP-4O: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation
* GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation
* Guide-and-rescale: Self-guidance Mechanism for Effective Tuning-free Real Image Editing
* Guide3d: A Bi-planar X-ray Dataset for 3d Shape Reconstruction
* Gvgen: Text-to-3d Generation with Volumetric Representation
* G^2fR: Frequency Regularization in Grid-Based Feature Encoding Neural Radiance Fields
* H-v2x: A Large Scale Highway Dataset for BEV Perception
* Hac: Hash-grid Assisted Context for 3d Gaussian Splatting Compression
* HAD-Net: An attention U-based network with hyper-scale shifted aggregating and max-diagonal sampling for medical image segmentation
* Haha: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior
* Haloquest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
* Handdagt: A Denoising Adaptive Graph Transformer for 3d Hand Pose Estimation
* Handdgp: Camera-space Hand Mesh Prediction with Differentiable Global Positioning
* Handling the Non-smooth Challenge in Tensor SVD: A Multi-objective Tensor Recovery Framework
* Hard Positive Truth About Vision-language Compositionality, The
* Hard: Hardware-aware Lightweight Real-time Semantic Segmentation Model Deployable from Edge to GPU
* Harivo: Harnessing Text-to-image Models for Video Generation
* Harmonizing Knowledge Transfer in Neural Network with Unified Distillation
* Harmony in diversity: Content cleansing change detection framework for very-high-resolution remote-sensing images
* Harnessing Text-to-image Diffusion Models for Category-agnostic Pose Estimation
* Hat: History-augmented Anchor Transformer for Online Temporal Action Localization
* HBANet: A hybrid boundary-aware attention network for infrared and visible image fusion
* HDNEXT: Hybrid Dynamic Mednext with Level Set Regularization for Medical Image Segmentation
* HDR reconstruction from a single exposure LDR using texture and structure dual-stream generation
* HDRSA-Net: Hybrid dynamic residual self-attention network for SAR-assisted optical image cloud and shadow removal
* Head360: Learning a Parametric 3d Full-Head for Free-view Synthesis in 360
* HeadGas: Real-time Animatable Head Avatars via 3d Gaussian Splatting
* Headstudio: Text to Animatable Head Avatars with 3d Gaussian Splatting
* Height Measurement for Meter-Wave MIMO Radar Based on Sparse Array Under Multipath Interference
* Henet: Hybrid Encoding for End-to-end Multi-task 3d Perception from Multi-view Cameras
* Hergen: Elevating Radiology Report Generation with Longitudinal Data
* Hetecooper: Feature Collaboration Graph for Heterogeneous Collaborative Perception
* Heterogeneous Graph Learning for Scene Graph Prediction in 3d Point Clouds
* HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3d Point Cloud Segmentation
* Hi-NeRF: Hybridizing 2d Inpainting with Neural Radiance Fields for 3d Scene Inpainting
* HiCervix: An Extensive Hierarchical Dataset and Benchmark for Cervical Cytology Classification
* Hidiffusion: Unlocking Higher-resolution Creativity and Efficiency in Pretrained Diffusion Models
* Hiding Imperceptible Noise in Curvature-aware Patches for 3d Point Cloud Attack
* HIEI: A Universal Framework for Generating High-quality Emerging Images from Natural Images
* Hierarchical Aggregated Graph Neural Network for Skeleton-Based Action Recognition
* Hierarchical Conditioning of Diffusion Models Using Tree-of-life for Studying Species Evolution
* Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection
* Hierarchical Glocal Attention Pooling for Graph Classification
* Hierarchical Prompting for Diffusion Classifiers
* Hierarchical Separable Video Transformer for Snapshot Compressive Imaging
* Hierarchical Shared Encoder With Task-Specific Transformer Layer Selection for Emotion-Cause Pair Extraction
* Hierarchical Spectral-Spatial Transformer for Hyperspectral and Multispectral Image Fusion
* Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion
* Hierarchical Unsupervised Relation Distillation for Source Free Domain Adaptation
* Hierarchically Structured Neural Bones for Reconstructing Animatable Objects from Casual Videos
* HIFI-123: Towards High-fidelity One Image to 3d Content Generation
* HIFI-Score: Fine-grained Image Description Evaluation with Hierarchical Parsing Graphs
* High Compression Efficiency Hardware Encoder for Intra and Inter Coding With 4K@30fps Throughput, A
* High efficiency deep image compression via channel-wise scale adaptive latent representation learning
* High-fidelity 3d Textured Shapes Generation by Sparse Encoding and Adversarial Decoding
* High-fidelity Modeling of Generalizable Wrinkle Deformation
* High-precision Self-supervised Monocular Depth Estimation with Rich-resource Prior
* High-Quality Damaged Building Instance Segmentation Based on Improved Mask Transfiner Using Post-Earthquake UAS Imagery: A Case Study of the Luding Ms 6.8 Earthquake in China
* High-quality Mesh Blendshape Generation from Face Videos via Neural Inverse Rendering
* High-quality Robust Diffusion Framework for Corrupted Dataset, A
* High-quality Visually-guided Sound Separation from Diverse Categories
* High-resolution and Few-shot View Synthesis from Asymmetric Dual-lens Inputs
* High-resolution mapping of grassland canopy cover in China through the integration of extensive drone imagery and satellite data
* High-Resolution Remotely Sensed Evidence Shows Solar Thermal Power Plant Increases Grassland Growth on the Tibetan Plateau
* High-Resolution Spaceborne SAR Geolocation Accuracy Analysis and Error Correction
* highly efficient index for robust mapping of tidal flats from sentinel-2 images directly, A
* Highly realistic synthetic dataset for pixel-level DensePose estimation via diffusion model
* HIMO: A New Benchmark for Full-body Human Interacting with Multiple Objects
* histogram-based approach to calculate graph similarity using graph neural networks, A
* HIT-SR: Hierarchical Transformer for Efficient Image Super-resolution
* HMGS: Hybrid Model of Gaussian Splatting for Enhancing 3d Reconstruction with Reflections
* Ho-gaussian: Hybrid Optimization of 3d Gaussian Splatting for Urban Scenes
* HOI-V: One-stage human-object interaction detection based on multi-feature fusion in videos
* Holoadmm: High-quality Holographic Complex Field Recovery
* Holodepth: Programmable Depth-varying Projection via Computer-generated Holography
* Homogeneous tokenizer matters: Homogeneous visual tokenizer for remote sensing image understanding
* How Far Can a 1-pixel Camera Go? Solving Vision Tasks Using Photoreceptors and Computationally Designed Visual Morphology
* How Many Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
* How Severe Was the 2022 Flash Drought in the Yangtze River Basin?
* How to Train the Teacher Model for Effective Knowledge Distillation
* How Video Meetings Change Your Expression
* Howtocaption: Prompting LLMs to Transform Video Annotations at Scale
* HPE-LI: Wifi-enabled Lightweight Dual Selective Kernel Convolution for Human Pose Estimation
* HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion
* HSR: Holistic 3d Human-scene Reconstruction from Monocular Videos
* HSRA-Net: Intelligent Detection Network of Anomaly Monitoring Data in High-Speed Railway
* HSSHG: Heuristic Semantics-Constrained Spatio-Temporal Heterogeneous Graph for VideoQA
* HT-SSPG: Hierarchical Transformers for Semantic Surface Point Generation in 3d Object Detection
* HTCSigNet: A Hybrid Transformer and Convolution Signature Network for offline signature verification
* Human Hair Reconstruction with Strand-Aligned 3D Gaussians
* Human Motion Forecasting in Dynamic Domain Shifts: A Homeostatic Continual Test-time Adaptation Framework
* Human Pose Recognition via Occlusion-preserving Abstract Images
* Human-in-the-loop Visual Re-ID for Population Size Estimation
* Human-object interaction detection algorithm based on graph structure and improved cascade pyramid network
* Humanrefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-reversible Guidance
* HUMOS: Human Motion Model Conditioned on Body Shape
* Hvclip: High-dimensional Vector in Clip for Unsupervised Domain Adaptation
* Hybrid and Non-minimal Planar Motion Estimation from Point Correspondences
* Hybrid GRU-Random Forest Model for Accurate Atmospheric Duct Detection with Incomplete Sounding Data
* Hybrid Video Diffusion Models with 2d Triplane and 3d Wavelet Representation
* Hybridbooth: Hybrid Prompt Inversion for Efficient Subject-driven Generation
* Hydra: A Hyper Agent for Dynamic Compositional Visual Reasoning
* HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
* HyperDehazing: A hyperspectral image dehazing benchmark dataset and a deep learning model for haze removal
* Hyperion: A Fast, Versatile Symbolic Gaussian Belief Propagation Framework for Continuous-time SLAM
* Hypernetworks for Generalizable BRDF Representation
* Hyperspacex: Radial and Angular Exploration of Hyperspherical Dimensions
* Hyperspectral image classification with token fusion on GPU
* Hytas: A Hyperspectral Image Transformer Architecture Search Benchmark and Analysis
* I Can't Believe It's Not Scene Flow!
* I-MEDSAM: Implicit Medical Image Segmentation with Segment Anything
* IAM-VFI: Interpolate Any Motion for Video Frame Interpolation with Motion Complexity Map
* Iddiffuse: Dual-conditional Diffusion Model for Enhanced Facial Image Anonymization
* Idea2img: Iterative Self-refinement with GPT-4V for Automatic Image Design and Generation
* Idempotent Unsupervised Representation Learning for Skeleton-based Action Recognition
* Identification and Causes of Neighborhood Commercial Areas: Focusing on the Development of Daily Life Circles in Urban Built Environments
* Identification of Global Extended Pseudo Invariant Calibration Sites (EPICS) and Their Validation Using Radiometric Calibration Network (RadCalNet)
* Identification of Key Determinants Influencing Spatiotemporal Heterogeneity of Urban Resilience
* Identity-consistent Diffusion Network for Grading Knee Osteoarthritis Progression in Radiographic Imaging
* Idling Neurons, Appropriately Lenient Workload During Fine-tuning Leads to Better Generalization
* Idol: Unified Dual-modal Latent Diffusion for Human-centric Joint Video-depth Generation
* IEIRNet: Inconsistency Exploiting Based Identity Rectification for Face Forgery Detection
* IFKMHC: Implicit Fuzzy K-Means Model for High-Dimensional Data Clustering
* IFTR: An Instance-level Fusion Transformer for Visual Collaborative Perception
* IG Captioner: Information Gain Captioners Are Strong Zero-shot Classifiers
* IGNORE: Information Gap-based False Negative Loss Rejection for Single Positive Multi-label Learning
* iHuman: Instant Animatable Digital Humans From Monocular Videos
* Image Compression for Machine and Human Vision with Spatial-frequency Adaptation
* Image compressive sensing reconstruction via nonlocal low-rank residual-based ADMM framework
* Image Demoiréing in Raw and srgb Domains
* Image Deraining with Frequency-enhanced State Space Model
* Image Fusion Algorithm for Sustainable Development Goals Satellite-1 Night-Time Light Images Based on Optimized Image Stretching and Dual-Domain Fusion, An
* Image is Worth 1/2 Tokens After Layer 2: Plug-and-play Inference Acceleration for Large Vision-language Models, An
* Image Manipulation Detection with Implicit Neural Representation and Limited Supervision
* Image Shadow Removal Via Multi-Scale Deep Retinex Decomposition
* Image-Adaptive 3D Lookup Tables for Real-Time Image Enhancement with Bilateral Grids
* Image-feature Weak-to-strong Consistency: An Enhanced Paradigm for Semi-supervised Learning
* Image-to-lidar Relational Distillation for Autonomous Driving Data
* Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
* Imagination-Augmented Hierarchical Reinforcement Learning for Safe and Interactive Autonomous Driving in Urban Environments
* Imaging Interiors: An Implicit Solution to Electromagnetic Inverse Scattering Problems
* Imaging with Confidence: Uncertainty Quantification for High-dimensional Undersampled MR Images
* iMatching: Imperative Correspondence Learning
* IMMA: Immunizing Text-to-image Models Against Malicious Adaptation
* Impact of Arable Land Abandonment on Crop Production Losses in Ukraine During the Armed Conflict
* Impact of Firework Ban Relaxation on Variations in SO2 Emissions in China During the 2023 Chinese New Year, The
* Impact of Latency and Continuity of GNSS Products on Filter-Based Real-Time LEO Satellite Clock Determination
* Impacts of Spatial and Temporal Resolution on Remotely Sensed Corn and Soybean Emergence Detection
* Impacts of Storm Zyprian on Middle and Upper Atmosphere Observed from Central European Stations
* Impacts of Typhoons on the Evolution of Surface Anticyclonic Eddies into Subsurface Anticyclonic Eddies in the Northwestern Subtropical Pacific Ocean
* Implications of Plantation Forest-Driven Land Use/Land Cover Changes for Ecosystem Service Values in the Northwestern Highlands of Ethiopia, The
* Implicit Concept Removal of Diffusion Models
* Implicit Filtering for Learning Neural Signed Distance Functions from 3d Point Clouds
* Implicit Neural Models to Extract Heart Rate from Video
* Implicit Steganography Beyond the Constraints of Modality
* Implicit Style-Content Separation Using B-lora
* Improve Model Robustness in Less Time Than It Takes to Drink A Cup of Coffee with Plug-and-play Robustness Plugins
* Improved Early-Stage Maize Row Detection Using Unmanned Aerial Vehicle Imagery
* Improved Generative Adversarial Network for Generating Multi-Scale Electronic Map Tiles Considering Cartographic Requirements, An
* Improved multi-focus image fusion using online convolutional sparse coding based on sample-dependent dictionary
* Improved Phase Gradient Autofocus Method for Multi-Baseline Circular Synthetic Aperture Radar Three-Dimensional Imaging
* Improved Polar Current Shell Algorithm for Ocean Current Retrieval from X-Band Radar Data
* Improvement of Coal Mining-Induced Subsidence-Affected (MISA) Zone Irregular Boundary Delineation by MT-InSAR Techniques, UAV Photogrammetry, and Field Investigation
* Improving 2d Feature Representations by 3d-aware Fine-tuning
* Improving 3d Semi-supervised Learning by Effectively Utilizing All Unlabelled Data
* Improving Adversarial Transferability via Model Alignment
* Improving Agent Behaviors with Rl Fine-tuning for Autonomous Driving
* Improving crop type mapping by integrating LSTM with temporal random masking and pixel-set spatial information
* Improving Diffusion Models for Authentic Virtual Try-on in the Wild
* Improving Domain Generalization in Self-supervised Monocular Depth Estimation via Stabilized Adversarial Training
* Improving drone-based uncalibrated estimates of wheat canopy temperature in plot experiments by accounting for confounding factors in a multi-view analysis
* Improving Feature Stability During Upsampling: Spectral Artifacts and the Importance of Spatial Context
* Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
* Improving Hyperbolic Representations via Gromov-Wasserstein Regularization
* Improving Image Clustering with Artifacts Attenuation via Inference-time Attention Engineering
* Improving Image Synthesis with Diffusion-negative Sampling
* Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models
* Improving Knowledge Distillation via Regularizing Feature Direction and Norm
* Improving Medical Multi-modal Contrastive Learning with Expert Annotations
* Improving Misaligned Multi-Modality Image Fusion With One-Stage Progressive Dense Registration
* Improving Network Interpretability via Explanation Consistency Evaluation
* Improving Neural Surface Reconstruction with Feature Priors from Multi-view Images
* Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance
* Improving Representation With Hierarchical Contrastive Learning for Emotion-Cause Pair Extraction
* Improving Robustness to Model Inversion Attacks via Sparse Coding Architectures
* Improving stability and performance of spiking neural networks through enhancing temporal consistency
* Improving Text-guided Object Inpainting with Semantic Pre-inpainting
* Improving the sparse coding model via hybrid Gaussian priors
* Improving Unsupervised Domain Adaptation: A Pseudo-candidate Set Approach
* Improving Video Segmentation via Dynamic Anchor Queries
* Improving Virtual Try-on with Garment-focused Diffusion Models
* Improving Vision and Language Concepts Understanding with Multimodal Counterfactual Samples
* Improving Zero-shot Generalization for CLIP with Variational Adapter
* Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation
* In Defense of Lazy Visual Grounding for Open-vocabulary Semantic Segmentation
* Incorporating Forest Mapping-Related Uncertainty into the Error Propagation of Wall-to-Wall Biomass Maps: A General Approach for Large and Small Areas
* Incremental feature selection: Parallel approach with local neighborhood rough sets and composite entropy
* Incremental Unified Framework for Small Defect Inspection, An
* inemo: Incremental Neural Mesh Models for Robust Class-incremental Learning
* inf-brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions
* INF-DIT: Upsampling Any-resolution Image with Memory-efficient Diffusion Transformer
* Infinite-ID: Identity-preserved Personalization via ID-Semantics Decoupling Paradigm
* Influence of Land Use and Land Cover Changes and Precipitation Patterns on Groundwater Storage in the Mississippi River Watershed: Insights from GRACE Satellite Data
* Infmae: A Foundation Model in the Infrared Modality
* InfoGCN++: Learning Representation by Predicting the Future for Online Skeleton-Based Action Recognition
* Infonorm: Mutual Information Shaping of Normals for Sparse-view Reconstruction
* Information Bottleneck Based Data Correction in Continual Learning
* Information Theoretical View for Out-of-distribution Detection, An
* Infrared and visible image fusion based on hybrid multi-scale decomposition and adaptive contrast enhancement
* Initial Design for Next-Generation BeiDou Integrity Subsystem: Space-Ground Integrated Integrity Monitoring
* Innovative Crack Detection Algorithm Based on Efficient Feature Fusion and Progressive Transfer Learning, An
* Innovative multi-stage matching for counting anything
* Insect Identification in the Wild: The Ami Dataset
* Insmapper: Exploring Inner-instance Information for Vectorized HD Mapping
* Instance-dependent Noise Refinement in Segment Anything Model for Weakly Supervised Object Detection
* Instance-dependent Noisy-label Learning with Graphical Model Based Noise-rate Estimation
* Instant 3d Human Avatar Generation Using Image Diffusion Models
* Instant pose extraction based on mask transformer for occluded person re-identification
* Instant Uncertainty Calibration of NeRFs Using a Meta-calibrator
* Instantgeoavatar: Effective Geometry and Appearance Modeling of Animatable Avatars from Monocular Video
* InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser
* Instructgie: Towards Generalizable Image Editing
* Instruction Tuning-free Visual Token Complement for Multimodal LLMs
* Instructir: High-quality Image Restoration Following Human Instructions
* Integer-valued Training and Spike-driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection
* Integrated Assessment of Security Risk Considering Police Resources
* Integrated Back of Queue Estimation and Vehicle Trajectory Optimization Considering Uncertainty in Traffic Signal Timings
* Integrated Intelligent Control Systems for Eco and Safe Driving in Autonomous Vehicles
* Integrating Markov Blanket Discovery Into Causal Representation Learning for Domain Generalization
* Integrating synthetic datasets with CLIP semantic insights for single image localization advancements
* Integration of Global and Local Representations for Fine-grained Cross-modal Alignment
* Integration of Macroscopic Traffic Optimization Control and Microscopic Traffic Flow Model for Mixed Traffic: A Cyber-Physical System Perspective, The
* Intelligent Caching Based on Popular Content in Vehicular Networks: A Deep Transfer Learning Approach
* Inter-class Topology Alignment for Efficient Black-box Substitute Attacks
* Inter-Intra Cluster Reorganization for Unsupervised Vehicle Re-Identification
* Interaction-centric Spatio-temporal Context Reasoning for Multi-person Video HOI Recognition
* Interaction-guided Two-branch Image Dehazing Network
* Interactive 3d Object Detection with Prompts
* Interfusion: Text-driven Generation of 3d Human-object Interaction
* Interleaving One-class and Weakly-supervised Models with Adaptive Thresholding for Unsupervised Video Anomaly Detection
* Internvideo2: Scaling Foundation Models for Multimodal Video Understanding
* Interpretability-guided Test-time Adversarial Defense
* Intra: Interaction Relationship-aware Weakly Supervised Affordance Grounding
* Intrinsic Single-Image HDR Reconstruction
* Intrinsicanything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination
* Introducing Routing Functions to Vision-language Parameter-efficient Fine-tuning with Low-rank Bottlenecks
* Invertible Neural Warp for NeRF
* Invertible Secret Image Sharing With Authentication for Embedding Color Palette Image Into True Color Image
* Investigating Style Similarity in Diffusion Models
* Investigating Tropical Cyclone Warm Core and Boundary Layer Structures with Constellation Observing System for Meteorology, Ionosphere, and Climate 2 Radio Occultation Data
* Investigation of the Effect of Smart Cockpit Layout on Distracted Driving Behavior Based on Real Road Experiments, An
* Invisible backdoor attack with attention and steganography
* Irgen: Generative Modeling for Image Retrieval
* IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection
* Is Retain Set All You Need in Machine Unlearning? Restoring Performance of Unlearned Models with Out-of-distribution Images
* Is User Feedback Always Informative? Retrieval Latent Defending for Semi-supervised Domain Adaptation Without Source Data
* IS-MAP: Neural Implicit Mapping and Positioning for Structural Environments
* Isomorphic Pruning for Vision Models
* It's Just Another Day: Unique Video Captioning by Discriminative Prompting
* Iterative Ensemble Training with Anti-gradient Control for Mitigating Memorization in Diffusion Models
* Iterative Separation of Blended Seismic Data in Shot Domain Using Deep Learning
* Ittakestwo: Leveraging Peer Representations for Semi-supervised Lidar Semantic Segmentation
* IVTP: Instruction-guided Visual Token Pruning for Large Vision-language Models
* I^2-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM
* JDT3D: Addressing the Gaps in Lidar-based Tracking-by-attention
* Joint block adjustment and variational optimization for global and local radiometric normalization toward multiple remote sensing image mosaicking
* Joint Classification of Hyperspectral and LiDAR Data via Multiprobability Decision Fusion Method
* Joint Image Super-resolution and Low-light Enhancement in the Dark
* Joint Intra-view and Inter-view Enhanced Tensor Low-rank Induced Affinity Graph Learning
* Joint Optimization of Latency and Energy Consumption via Deep Reinforcement Learning for Proximity Detection in Road Networks
* Joint Path and Pick-Up Design for Connectivity-Aware UAV-Enabled Multi-Package Delivery
* Joint Regional Uptake Quantification of Thorium-227 and Radium-223 Using a Multiple-Energy-Window Projection-Domain Quantitative SPECT Method
* Joint RGB-Spectral Decomposition Model Guided Image Enhancement in Mobile Photography
* Joint Sparse Local Linear Discriminant Analysis for Feature Dimensionality Reduction of Hyperspectral Images
* Joint utilization of positive and negative pseudo-labels in semi-supervised facial expression recognition
* Jointdreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3d Generation via Joint Score Distillation
* Jointly stochastic fully symmetric interpolatory rules and local approximation for scalable Gaussian process regression
* Just a Hint: Point-supervised Camouflaged Object Detection
* Kalman-inspired Feature Propagation for Video Face Super-resolution
* KDProR: A Knowledge-decoupling Probabilistic Framework for Video-Text Retrieval
* Kernel Diffusion: An Alternate Approach to Blind Deconvolution
* Keypoint Promptable Re-identification
* Keypointdetr: An End-to-end 3d Keypoint Detector
* Kfd-NeRF: Rethinking Dynamic NeRF with Kalman Filter
* Khmerst: A Low-resource Khmer Scene Text Detection and Recognition Benchmark
* Kinetic Typography Diffusion Model
* Kmtalk: Speech-driven 3d Facial Animation with Key Motion Embedding
* Knowledge Augmented Relation Inference for Group Activity Recognition
* Knowledge Consistency Distillation for Weakly Supervised One Step Person Search
* Knowledge Distillation Dealing with Sample-wise Long-tail Problem
* Knowledge Transfer with Simulated Inter-image Erasing for Weakly Supervised Semantic Segmentation
* Knowledge-enhanced Visual-language Pretraining for Computational Pathology
* KOLOMVERSE: Korea Open Large-Scale Image Dataset for Object Detection in the Maritime Universe
* L-Differ: Single Image Reflection Removal with Language-based Diffusion Model
* L2T-DFM: Learning to Teach with Dynamic Fused Metric
* Label-anticipated Event Disentanglement for Audio-visual Video Parsing
* Label-free Neural Semantic Image Synthesis
* Label-noise learning via uncertainty-aware neighborhood sample selection
* Labeldistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3d Object Detection
* Labeled Data Selection for Category Discovery
* Lacustrine Wetlands Landscape Simulation and Multi-Scenario Prediction Based on the Patch-Generating Land-Use Simulation Model: A Case Study on Shengjin Lake Reserve, China
* Lagrangian Hashing for Compressed Neural Field Representations
* LAMI-DETR: Open-vocabulary Detection with Language Model Instruction
* Landslide Susceptibility Mapping Based on Ensemble Learning in the Jiuzhaigou Region, Sichuan, China
* Lane Detection by Variational Auto-Encoder With Normalizing Flow for Autonomous Driving
* Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction
* Language-assisted Skeleton Action Understanding for Skeleton-based Temporal Action Segmentation
* Language-driven 6-dof Grasp Detection Using Negative Prompt Guidance
* Language-driven Physics-based Scene Synthesis and Editing via Feature Splatting
* Language-guided Joint Audio-visual Editing via One-shot Adaptation
* Lapose: Laplacian Mixture Shape Modeling for RGB-based Category-level Object Pose Estimation
* LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-language Models
* LARA: Efficient Large-baseline Radiance Fields
* large corpus for the recognition of Greek Sign Language gestures, A
* Large Language Model-Driven Structured Output: A Comprehensive Benchmark and Spatial Data Generation Framework
* Large Motion Model for Unified Multi-modal Motion Generation
* Large Scale Pavement Crack Evaluation Through a Novel Spatial Machine Learning Approach Considering Geocomplexity
* Large-Kernel Central Block Masked Convolution and Channel Attention-Based Reconstruction Network for Anomaly Detection of High-Resolution Hyperspectral Imagery
* Large-scale Multi-hypotheses Cell Tracking Using Ultrametric Contours Maps
* Large-scale Reinforcement Learning for Diffusion Models
* Lass3d: Language-assisted Semi-supervised 3d Semantic Segmentation with Progressive Unreliable Data Exploitation
* Last Mile: A Novel, Hotspot-Based Distributed Path-Sharing Network for Food Deliveries
* Latency Attack Resilience in Object Detectors: Insights from Computing Architecture
* Latent Diffusion Enhanced Rectangle Transformer for Hyperspectral Image Restoration
* Latent Diffusion Prior Enhanced Deep Unfolding for Snapshot Spectral Compressive Imaging
* Latent Guard: A Safety Framework for Text-to-image Generation
* Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics
* Latenteditor: Text Driven Local Editing of 3d Scenes
* LatentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3d Reconstruction
* Latte3d: Large-scale Amortized Text-to-enhanced3d Synthesis
* LAWA: Using Latent Space for In-Generation Image Watermarking
* Layer-wise Relevance Propagation with Conservation Property for Resnet
* Layerdiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-collaborative Diffusion Model
* Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis
* Layeredflow: A Real-world Benchmark for Non-lambertian Multi-Layer Optical Flow
* Layout-corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model
* LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
* Layoutflow: Flow Matching for Layout Generation
* Lazy Diffusion Transformer for Interactive Image Editing
* LCGNet: Local Sequential Feature Coupling Global Representation Learning for Functional Connectivity Network Analysis With fMRI
* LCM-Lookahead for Encoder-based Text-to-image Personalization
* LCM: Log Conformal Maps for Robust Representation Learning to Mitigate Perspective Distortion
* LCMA-Net: A light cross-modal attention network for streamer re-identification in live video
* LCSL: Long-Tailed Classification via Self-Labeling
* Learn from the Learnt: Source-free Active Domain Adaptation via Contrastive Sampling and Visual Persistence
* Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic Slam
* Learn to Optimize Denoising Scores: A Unified and Improved Diffusion Prior for 3d Generation
* Learn to Preserve and Diversify: Parameter-efficient Group with Orthogonal Regularization for Domain Generalization
* Learned Good Features to Track
* Learned HDR Image Compression for Perceptually Optimal Storage and Display
* Learned Neural Physics Simulation for Articulated 3d Human Pose Reconstruction
* Learned Rate Control for Frame-level Adaptive Neural Video Compression via Dynamic Neural Network
* Learning 2d Human Poses for Better 3d Lifting via Multi-model 3d-guidance
* Learning 3d Geometry and Feature Consistent Gaussian Splatting for Object Removal
* Learning 3d Point Cloud Registration as a Single Optimization Problem
* Learning 3d-aware GANs from Unposed Images with Template Feature Field
* Learning a Cross-Modality Anomaly Detector for Remote Sensing Imagery
* Learning a Dynamic Privacy-preserving Camera Robust to Inversion Attacks
* Learning accurate and enriched features for stereo image super-resolution
* Learning Anomalies with Normality Prior for Unsupervised Video Anomaly Detection
* Learning by Aligning 2D Skeleton Sequences and Multi-Modality Fusion
* Learning Camouflaged Object Detection from Noisy Pseudo Label
* Learning Chain of Counterfactual Thought for Bias-robust Vision-language Reasoning
* Learning Classwise Untangled Continuums for Conditional Normalizing Flows
* Learning Complementary Maps for Light Field Salient Object Detection
* Learning Cross-hand Policies of High-dof Reaching and Grasping
* Learning Differentially Private Diffusion Models via Stochastic Adversarial Distillation
* Learning Diffusion Models for Multi-view Anomaly Detection
* Learning Discriminative Motion Models for Multiple Object Tracking
* Learning Dual Hierarchical Representation for 3d Surface Reconstruction
* Learning Dual-level Deformable Implicit Representation for Real-world Scale Arbitrary Super-resolution
* Learning Equilibrium Transformation for Gamut Expansion and Color Restoration
* Learning Exhaustive Correlation for Spectral Super-resolution: Where Spatial-spectral Attention Meets Linear Dependence
* Learning from the Web: Language Drives Weakly-supervised Incremental Learning for Semantic Segmentation
* Learning High-resolution Vector Representation from Multi-camera Images for 3d Object Detection
* Learning Interval-aware Embedding for Macro and Micro-expression Spotting
* Learning Local Pattern Modularization for Point Cloud Reconstruction from Unseen Classes
* Learning Local-Global Representation for Scribble-Based RGB-D Salient Object Detection via Transformer
* Learning Low-Rank Representation Approximation for Few-Shot Deep Subspace Clustering
* Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities
* Learning Multimodal Latent Generative Models with Energy-based Prior
* Learning Natural Consistency Representation for Face Forgery Video Detection
* Learning Neural Deformation Representation for 4d Dynamic Shape Generation
* Learning Neural Radiance Field from Quasi-uniformly Sampled Spherical Image for Immersive Virtual Reality
* Learning Neural Volumetric Pose Features for Camera Localization
* Learning Non-Linear Invariants for Unsupervised Out-of-distribution Detection
* Learning Non-uniform Step Sizes for Neural Network Quantization
* Learning Omni-Dimensional Spatio-Temporal Dependencies for Millimeter-Wave Radar Perception
* Learning Pseudo 3d Guidance for View-consistent Texturing with 2d Diffusion
* Learning Quantized Adaptive Conditions for Diffusion Models
* Learning Representation for Multitask Learning Through Self-supervised Auxiliary Learning
* Learning Representations from Foundation Models for Domain Generalized Stereo Matching
* Learning Representations of Satellite Images From Metadata Supervision
* Learning Scalable Model Soup on a Single Gpu: An Efficient Subspace Training Strategy
* Learning Semantic Latent Directions for Accurate and Controllable Human Motion Prediction
* Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning
* Learning to Adapt Sam for Segmenting Cross-domain Point Clouds
* Learning to Build by Building Your Own Instructions
* Learning to Complement and to Defer to Multiple Users
* Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt
* Learning to Distinguish Samples for Generalized Category Discovery
* Learning to Drive via Asymmetric Self-play
* Learning to Enhance Aperture Phasor Field for Non-line-of-sight Imaging
* Learning to Generate Conditional Tri-plane for 3d-aware Expression Controllable Portrait Animation
* Learning to Localize Actions in Instructional Videos with Llm-based Multi-pathway Text-video Alignment
* Learning to Make Keypoints Sub-pixel Accurate
* Learning to Obstruct Few-shot Image Classification over Restricted Classes
* Learning to Robustly Reconstruct Dynamic Scenes from Low-light Spike Streams
* Learning to Unlearn for Robust Machine Unlearning
* Learning Trimodal Relation for Audio-visual Question Answering with Missing Modality
* Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection
* Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors
* Learning Video Context as Interleaved Multimodal Sequences
* Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization Using Geometrical Information
* Learning with Unmasked Tokens Drives Stronger Vision Learners
* Learning-based Axial Video Motion Magnification
* Lego: Learning Egocentric Action Frame Generation via Visual Instruction Tuning
* Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-image Diffusion Models
* LEIA: Latent View-invariant Embeddings for Implicit 3d Articulation
* Length-aware Motion Synthesis via Latent Diffusion
* Lerojd: Lidar Extended Radar-only Object Detection
* Letsmap: Unsupervised Representation Learning for Label-efficient Semantic BEV Mapping
* Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction
* Leveraging Hierarchical Feature Sharing for Efficient Dataset Condensation
* Leveraging Imperfect Restoration for Data Availability Attack
* Leveraging Near-field Lighting for Monocular Depth Estimation from Endoscopy Videos
* Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection
* Leveraging Scale- and Orientation-covariant Features for Planar Motion Estimation
* Leveraging Semantic Cues from Foundation Vision Models for Enhanced Local Feature Correspondence
* Leveraging Single-Bounce Reflections and Onboard Motion Sensors for Enhanced 5G Positioning
* Leveraging Temporal Contextualization for Video Action Recognition
* Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling
* Leveraging Thermal Modality to Enhance Reconstruction in Low-light Conditions
* LFS-Aware Surface Reconstruction From Unoriented 3D Point Clouds
* LG-Gaze: Learning Geometry-aware Continuous Prompts for Language-guided Gaze Estimation
* LGM: Large Multi-view Gaussian Model for High-resolution 3d Content Creation
* LH-YOLO: A Lightweight and High-Precision SAR Ship Detection Model Based on the Improved YOLOv8n
* Lhrs-bot: Empowering Remote Sensing with Vgi-enhanced Large Multimodal Language Model
* LiDAR-Aided Channel Model for Vehicular Intelligent Sensing-Communication Integration, A
* Lidar-based All-weather 3d Object Detection via Prompting and Distilling 4d Radar
* Lidar-Event Stereo Fusion with Hallucinations
* Lift: A Surprisingly Simple Lightweight Feature Transform for Dense Vit Descriptors
* Light-in-Flight for a World-in-Motion
* Lightendiffusion: Unsupervised Low-light Image Enhancement with Latent-retinex Diffusion Models
* Lightning Detection Using GEO-KOMPSAT-2A/Advanced Meteorological Imager and Ground-Based Lightning Observation Sensor LINET Data
* LightSOD: Towards lightweight and efficient network for salient object detection
* Lightweight and Efficient Distracted Driver Detection Model Fusing Convolutional Neural Network and Vision Transformer, A
* lightweight convolutional neural network-based feature extractor for visible images, A
* Lightweight Cross-Modal Transformer for RGB-D Salient Object Detection
* Lightweight Neural Network for Centroid Detection of Weak, Small Infrared Targets via Background Matching in Complex Scenes
* Linearly Controllable GAN: Unsupervised Feature Categorization and Decomposition for Image Generation and Manipulation
* Linefit: A Geometric Approach for Fitting Line Segments in Images
* Lingoqa: Visual Question Answering for Autonomous Driving
* Linking in Style: Understanding Learned Features in Deep Learning Models
* LISO: Lidar-Only Self-supervised 3d Object Detection
* Listen to Look Into the Future: Audio-visual Egocentric Gaze Anticipation
* Lita: Language Instructed Temporal-localization Assistant
* LITE-SAM Is Actually What You Need for Segment Everything
* Livehps++: Robust and Coherent Motion Capture in Dynamic Free Environment
* Livephoto: Real Image Animation with Text-guided Motion Control
* Llama-vid: An Image is Worth 2 Tokens in Large Language Models
* LLAVA-Grounding: Grounded Visual Chat with Large Multimodal Models
* LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
* LLAVA-UHD: An LMM Perceiving Any Aspect Ratio and High-resolution Images
* LLIC: Large Receptive Field Transform Coding With Adaptive Weights for Learned Image Compression
* LLM as Copilot for Coarse-grained Vision-and-language Navigation
* Llm as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model
* LLMCO4MR: LLMs-aided Neural Combinatorial Optimization for Ancient Manuscript Restoration from Fragments with Case Studies on Dunhuang
* Llmga: Multimodal Large Language Model Based Generation Assistant
* LMEye: An Interactive Perception Network for Large Language Models
* Lmt-gp: Combined Latent Mean-teacher and Gaussian Process for Semi-supervised Low-light Image Enhancement
* Ln3diff: Scalable Latent Neural Fields Diffusion for Speedy 3d Generation
* Lnl+k: Enhancing Learning with Noisy Labels Through Noise Source Knowledge Integration
* LOA-TRANS: Enhancing Visual Grounding by Location-aware Transformers
* Loc3diff: Local Diffusion for 3d Human Head Synthesis and Editing
* Local Action-guided Motion Diffusion Model for Text-to-motion Generation
* Local All-pair Correspondence for Point Tracking
* Local and Global Flatness for Federated Domain Generalization
* Local and global self-attention enhanced graph convolutional network for skeleton-based action recognition
* Local Occupancy-enhanced Object Grasping with Multiple Triplanar Projection
* Local Reference Feature Transfer (LRFT): A simple pre-processing step for image enhancement
* Localization and Expansion: A Decoupled Framework for Point Cloud Few-shot Semantic Segmentation
* Localized Evaluation of Surface Water Quality Using GIS-Based Water Quality Index along Satpara Watershed Skardu Baltistan, Pakistan, A
* Locate N' Rotate: Two-stage Openable Part Detection with Foundation Model Priors
* Location-Aware and Privacy-Preserving Data Cleaning for Intelligent Transportation
* LOCO-MAD: Long-range Context-enhanced Model Towards Plot-centric Movie Audio Description
* Locomotion: Learning Motion-focused Video-language Representations
* Log-VMAMBA: Local-global Vision Mamba for Medical Image Segmentation
* LOGDESC: Local Geometric Features Aggregation for Robust Point Cloud Registration
* Logosticker: Inserting Logos Into Diffusion Models for Customized Generation
* Loli-street: Benchmarking Low-light Image Enhancement and Beyond
* Long-CLIP: Unlocking the Long-text Capability of CLIP
* Long-range Turbulence Mitigation: A Large-scale Dataset and A Coarse-to-fine Framework
* Long-tail Temporal Action Segmentation with Group-wise Temporal Logit Adjustment
* Long-Term Demand Prediction for Public Bicycle Sharing System: A Spatio-Temporal Attentional Graph Convolution Networks Approach
* Long-Term Ground Deformation Monitoring and Quantitative Interpretation in Shanghai Using Multi-Platform TS-InSAR, PCA, and K-Means Clustering
* Long-term Temporal Context Gathering for Neural Video Compression
* LongVLM: Efficient Long Video Understanding via Large Language Models
* Look Around and Learn: Self-training Object Detection by Exploration
* Look Hear: Gaze Prediction for Speech-directed Human Attention
* LookupVIT: Compressing Visual Information to a Limited Number of Tokens
* Lossy Image Compression with Foundation Diffusion Models
* Lost and Found: Overcoming Detector Failures in Online Multi-object Tracking
* Lost in Translation: Latent Concept Misalignment in Text-to-image Diffusion Models
* Lost in Translation: Modern Neural Networks Still Struggle with Small Realistic Image Transformations
* Lottery Ticket Hypothesis in Denoising: Towards Semantic-driven Initialization, The
* LPVIT: Low-power Semi-structured Pruning for Vision Transformers
* LRSLAM: Low-rank Representation of Signed Distance Fields in Dense Visual SLAM System
* LTRL: Boosting Long-tail Recognition via Reflective Learning
* LuminanceGAN: Controlling the brightness of generated images for various night conditions
* m&m's: A Benchmark to Evaluate Tool-use for multi-step multi-modal Tasks
* M-adapter: Multi-level image-to-video adaptation for video action recognition
* M-rat: a Multi-grained Retrieval Augmentation Transformer for Image Captioning
* M-to-N Backdoor Paradigm: A Multi-Trigger and Multi-Target Attack to Deep Learning Models
* M2d2m: Multi-Motion Generation from Text with Discrete Diffusion Models
* M3A: A multimodal misinformation dataset for media authenticity analysis
* M3dbench: Towards Omni 3d Assistant with Interleaved Multi-modal Instructions
* MAAN: Memory-Augmented Auto-Regressive Network for Text-Driven 3D Indoor Scene Generation
* Macdiff: Unified Skeleton Modeling with Masked Conditional Diffusion
* Machine learning applications in breast cancer prediction using mammography
* Machine Learning Enhances Soil Aggregate Stability Mapping for Effective Land Management in a Semi-Arid Region
* MAD-DR: Map Compression for Visual Localization with Matchness Aware Descriptor Dimension Reduction
* Made to Order: Discovering Monotonic Temporal Changes via Self-supervised Video Ordering
* Magdiff: Multi-alignment Diffusion for High-fidelity Video Generation and Editing
* Magiceraser: Erasing Any Objects via Semantics-aware Control
* Magicmirror: Fast and High-quality Avatar Generation with a Constrained Search Space
* Magmax: Leveraging Model Merging for Seamless Continual Learning
* MAGR: Manifold-aligned Graph Regularization for Continual Action Quality Assessment
* Mahalanobis Distance-based Multi-view Optimal Transport for Multi-view Crowd Localization
* Main genes in breast cancer primary tumor and first metastasis in lymph nodes revealed by information-theory-based genetic networks pattern analysis
* Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
* Make a Strong Teacher with Label Assistance: A Novel Knowledge Distillation Approach for Semantic Segmentation
* Make Your Vit-based Multi-view 3d Detectors Faster via Token Compression
* Make-your-3d: Fast and Consistent Subject-driven 3d Content Generation
* Making Large Language Models Better Planners with Reasoning-decision Alignment
* Mamba-based Light Field Super-resolution with Efficient Subspace Scanning
* Mamba-ND: Selective State Space Modeling for Multi-dimensional Data
* Mambair: A Simple Baseline for Image Restoration with State-space Model
* Mangrove Mapping in China Using Gaussian Mixture Model with a Novel Mangrove Index (SSMI) Derived from Optical and SAR Imagery
* Manigaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
* Manikin: Biomechanically Accurate Neural Inverse Kinematics for Human Motion Estimation
* Map-adapt: Real-time Quality-adaptive Semantic 3d Maps
* Mapdistill: Boosting Efficient Camera-based Hd Map Construction via Camera-lidar Fusion Model Distillation
* MAPM: PolSAR Image Classification with Masked Autoencoder Based on Position Prediction and Memory Tokens
* Mapping Crop Types for Beekeepers Using Sentinel-2 Satellite Image Time Series: Five Essential Crops in the Pollination Services
* Mapping Polylepis Forest Using Sentinel, PlanetScope Images, and Topographical Features with Machine Learning
* Mapping the Brazilian savanna's natural vegetation: A SAR-optical uncertainty-aware deep learning approach
* Maptracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping
* Marineinst: A Foundation Model for Marine Image Analysis with Instance Visual Description
* Mariner: Enhancing Novel Views by Matching Rendered Images with Nearby References
* Markov Knowledge Distillation: Make Nasty Teachers Trained by Self-undermining Knowledge Distillation Fully Distillable
* Mars: Multi-view Attention Regularizations for Patch-based Feature Recognition of Space Terrain
* Mart: Multiscale Relational Transformer Networks for Multi-agent Trajectory Prediction
* Marvelovd: Marrying Object Recognition and Vision-language Models for Robust Open-vocabulary Object Detection
* MASA: Motion-Aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition
* Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3d Pose Estimation
* Mask2map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks
* MaskCRT: Masked Conditional Residual Transformer for Learned Video Compression
* Masked Angle-aware Autoencoder for Remote Sensing Images
* Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
* Masked Motion Prediction with Semantic Contrast for Point Cloud Sequence Learning
* Masked Video and Body-Worn IMU Autoencoder for Egocentric Action Recognition
* MaskFusionNet: A Dual-Stream Fusion Model With Masked Pre-Training Mechanism for rPPG Measurement
* Masking Cascaded Self-attentions for Few-shot Font-generation Transformer
* MAST-GCN: Multi-Scale Adaptive Spatial-Temporal Graph Convolutional Network for EEG-Based Depression Recognition
* Masterweaver: Taming Editability and Face Identity for Personalized Text-to-image Generation
* Match Me If You Can: Semi-supervised Semantic Correspondence Learning with Unpaired Images
* Match-free Inbetweening Assistant (MIBA): A Practical Animation Tool Without User Stroke Correspondence
* Match-stereo-videos: Bidirectional Alignment for Consistent Dynamic Stereo Matching
* Mathverse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
* Maxfusion: Plug&play Multi-modal Generation in Text-to-Image Diffusion Models
* Maximum Platoon Size for Platoon-Based Cooperative Signal-Free Control at Intersections
* Maxmi: A Maximal Mutual Information Criterion for Manipulation Concept Discovery
* MC-PANDA: Mask Confidence for Panoptic Domain Adaptation
* Mcgrids: Monte Carlo-driven Adaptive Grids for Iso-surface Extraction
* MCPL: Multi-Modal Collaborative Prompt Learning for Medical Vision-Language Model
* ME-FCN: A Multi-Scale Feature-Enhanced Fully Convolutional Network for Building Footprint Extraction
* Measuring student behavioral engagement using histogram of actions
* Mecformer: Multi-task Whole Slide Image Classification with Expert Consultation Network
* Medblip: Bootstrapping Language-image Pretraining from 3d Medical Images and Texts
* Medical Federated Model with Mixture of Personalized and Shared Components
* Medrat: Unpaired Medical Report Generation via Auxiliary Tasks
* Meerkat: Audio-visual Large Language Model for Grounding in Space and Time
* Megascenes: Scene-level View Synthesis at Scale
* Membn: Robust Test-time Adaptation via Batch Norm with Statistics Memory
* Memory positional encoding for image captioning
* Memory-efficient Fine-tuning for Quantized Diffusion Model
* Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas
* Merlin: Empowering Multimodal LLMs with Foresight Minds
* Merlin: Single-shot Material Estimation and Relighting for Photometric Stereo
* Mesh refinement method for multi-view stereo with unary operations
* Mesh2nerf: Direct Mesh Supervision for Neural Radiance Field Representation and Generation
* Meshavatar: Learning High-quality Triangular Human Avatars from Multi-view Videos
* Meshfeat: Multi-resolution Features for Neural Fields on Meshes
* Meshgs: Adaptive Mesh-aligned Gaussian Splatting for High-quality Rendering
* Meshsegmenter: Zero-shot Mesh Semantic Segmentation via Texture Synthesis
* Meshvpr: Citywide Visual Place Recognition Using 3d Meshes
* MEsonGS: Post-training Compression of 3d Gaussians via Efficient Attribute Transformation
* Meta-Learning Without Data via Unconditional Diffusion Models
* Meta-optimized Angular Margin Contrastive Framework for Video-language Representation Learning
* Meta-prompting for Automating Zero-shot Visual Recognition with LLMs
* Metaat: Active Testing for Label-efficient Evaluation of Dense Recognition Tasks
* Metaaug: Meta-Data Augmentation for Post-training Quantization
* Metacap: Meta-learning Priors from Multi-view Imagery for Sparse-view Human Performance Capture and Rendering
* Metaweather: Few-shot Weather-degraded Image Restoration
* method for evaluating deep generative models of images for hallucinations in high-order spatial context, A
* Methodology for Object-Level Change Detection in Post-Earthquake Building Damage Assessment Based on Remote Sensing Images: OCD-BDA
* Methods and Evaluation of AI-Based Meteorological Models for Zenith Tropospheric Delay Prediction
* METNet: A mesh exploring approach for segmenting 3D textured urban scenes
* Mevg: Multi-event Video Generation with Text-to-video Models
* Mew: Multiplexed Immunofluorescence Image Analysis Through an Efficient Multiplex Network
* MGNICENET: Unified Monocular Geometric Scene Understanding
* Micdrop: Masking Image and Depth Features via Complementary Dropout for Domain-adaptive Semantic Segmentation
* Microphysical Characteristics of Precipitation for Four Types of Typical Weather Systems on Hainan Island
* MIGA-Net: Multi-View Image Information Learning Based on Graph Attention Network for SAR Target Recognition
* Migs: Multi-identity Gaussian Splatting via Tensor Decomposition
* milliflow: Scene Flow Estimation on mmwave Radar Point Cloud for Human Motion Sensing
* Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-language Models
* MIND-3D: Reconstruct High-quality 3d Objects in Human Brain
* Mineral detection based on hyperspectral remote sensing imagery on Mars: From detection methods to fine mapping
* Mini-splatting: Representing Scenes with a Constrained Number of Gaussians
* Minimalist Vision with Freeform Pixels
* Minimizing AoI in High-Speed Railway Mobile Networks: DQN-Based Methods
* Mining and Visualization of Tourism Cultural Image Based on the Information Transmission Model of Tourism Cultural Map: Taking Nanjing Xuanwu Lake Tourist Attraction as an Example
* Mining Spatiotemporal Mobility Patterns Using Improved Deep Time Series Clustering
* MinoritySalMix and adaptive semantic weight compensation for long-tailed classification
* Mirrorgaussian: Reflecting 3d Gaussians for Reconstructing Mirror Reflections
* Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
* Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models
* Mitigating Aberration-Induced Noise: A Deep Learning-Based Aberration-to-Aberration Approach
* Mitigating Background Shift in Class-incremental Semantic Segmentation
* Mitigating Perspective Distortion-induced Shape Ambiguity in Image Crops
* Mitigation of Motion Sickness and Optimization of Motion Comfort in Autonomous Vehicles: Systematic Survey
* MIWC: A multi-temporal image weighted composition method for satellite-derived bathymetry in shallow waters
* Mixdq: Memory-efficient Few-step Text-to-image Diffusion Models with Metric-decoupled Mixed Precision Quantization
* Mixed-Precision Transformer Accelerator With Vector Tiling Systolic Array for License Plate Recognition in Unconstrained Scenarios, A
* Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-network Selection
* Ml-semreg: Boosting Point Cloud Registration with Multi-level Semantic Consistency
* MLP architecture fusing RGB and CASSI for computational spectral imaging, A
* MLP: Motion Label Prior for Temporal Sentence Localization in Untrimmed 3D Human Motions
* MLPHand: Real Time Multi-view 3d Hand Reconstruction via Mlp Modeling
* MM-Safetybench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
* MM1: Methods, Analysis and Insights from Multimodal LLM Pre-training
* MMBENCH: Is Your Multi-Modal Model an All-Around Player?
* MMEARTH: Exploring Multi-modal Pretext Tasks for Geospatial Representation Learning
* MMI-Det: Exploring Multi-Modal Integration for Visible and Infrared Object Detection
* MMIFR: Multi-modal industry focused data repository
* MMVR: Millimeter-wave Multi-view Radar Dataset and Benchmark for Indoor Perception
* MMVS: Enabling Robust Adaptive Video Streaming for Wildly Fluctuating and Heterogeneous Networks
* MO-EMT-NAS: Multi-objective Continuous Transfer of Architectural Knowledge Between Tasks from Different Datasets
* Moai: Mixture of All Intelligence for Large Language and Vision Models
* Mobilediffusion: Instant Text-to-image Generation on Mobile Devices
* Mobilenetv4: Universal Models for the Mobile Ecosystem
* MoBoo: Memory-Boosted Vision Transformer for Class-Incremental Learning
* MOD-UV: Learning Mobile Object Detectors from Unlabeled Videos
* Modality Translation for Object Detection Adaptation Without Forgetting Prior Knowledge
* Model Breadcrumbs: Scaling Multi-task Model Merging with Sparse Masks
* Model Stock: All We Need Is Just a Few Fine-tuned Models
* Modeling and Driving Human Body Soundfields Through Acoustic Primitives
* Modeling Category Semantic and Sentiment Knowledge for Aspect-Level Sentiment Analysis
* Modeling Label Correlations with Latent Context for Multi-label Recognition
* Modeling Multi-Timescale Dynamics for Airport Surface Congestion and Recovery
* Modeling Population Mobility Flows: A Hybrid Approach Integrating a Gravity Model and Machine Learning
* Modeling the Land Surface Phenological Responses of Dominant Miombo Tree Species to Climate Variability in Western Tanzania
* Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model
* Modulated deformable convolution based on graph convolution network for rail surface crack detection
* MOE-DIFFIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration
* Moead: A Parameter-efficient Model for Multi-class Anomaly Detection
* MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
* MOMA: Multimodal LLM Adapter for Fast Personalized Image Generation
* MoMa: Skinned motion retargeting using masked pose modeling
* Momentum Auxiliary Network for Supervised Local Learning
* Monitoring and Analysis of Surface Deformation in the Buzhaoba Open-Pit Mine Based on SBAS-InSAR Technology
* Monitoring Dissolved Organic Carbon Concentration and Flux in the Qiantang Riverine System Using Sentinel-2 Satellite Images
* Mono-VIFI: A Unified Learning Framework for Self-supervised Single and Multi-frame Monocular Depth Estimation
* Monocular depth estimation with boundary attention mechanism and Shifted Window Adaptive Bins
* Monocular Occupancy Prediction for Scalable Indoor Scenes
* Monocular Ranging Method for Ship Targets Based on Unmanned Surface Vessels in a Shaking Environment, A
* Monodssms: Efficient Monocular 3d Object Detection with Depth-aware State Space Models
* Monotta: Fully Test-time Adaptation for Monocular 3d Object Detection
* Monowad: Weather-adaptive Diffusion Model for Robust Monocular 3d Object Detection
* Montrage: Monitoring Training for Attribution of Generative Diffusion Models
* More and Larger Auxiliary Feature-guided Spatial-temporal Super-resolution for Rendered Sequences
* More Balanced Loss-Reweighting Method for Long-Tailed Traffic Sign Detection and Recognition, A
* Motion and Structure from Event-based Normal Flow
* Motion Aware Event Representation-driven Image Deblurring
* Motion Keyframe Interpolation for Any Human Skeleton via Temporally Consistent Point Cloud Sampling and Reconstruction
* Motion Mamba: Efficient and Long Sequence Motion Generation
* Motion-Aware Mask Feature Reconstruction for Skeleton-Based Action Recognition
* Motion-guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution
* Motion-guided small MAV detection in complex and non-planar scenes
* Motion-oriented Compositional Neural Radiance Fields for Monocular Dynamic Human Modeling
* Motion-prior Contrast Maximization for Dense Continuous-time Motion Estimation
* Motionchain: Conversational Motion Controllers via Multimodal Prompts
* Motiondirector: Motion Customization of Text-to-video Diffusion Models
* Motionlcm: Real-time Controllable Motion Generation via Latent Consistency Model
* Mountain Landslide Monitoring Using a DS-InSAR Method Incorporating a Spatio-Temporal Atmospheric Phase Screen Correction Model
* Movideo: Motion-aware Video Generation with Diffusion Model
* Moving Horizon Estimation with Variable Structure Interacting Multiple Model for Surrounding Vehicle States in Complex Environments
* Moving Object Segmentation: All You Need is SAM (and Flow)
* MRSP: Learn Multi-Representations of Single Primitive for Compositional Zero-Shot Learning
* MS-DETR: Multispectral Pedestrian Detection Transformer With Loosely Coupled Fusion and Modality-Balanced Optimization
* MS-UMLP: Medical Image Segmentation via Multi-scale U-shape Mlp-mixer
* MSANet: LiDAR-Camera Online Calibration with Multi-Scale Fusion and Attention Mechanisms
* MSCMNet: Multi-scale Semantic Correlation Mining for Visible-Infrared Person Re-Identification
* Msd: A Benchmark Dataset for Floor Plan Generation of Building Complexes
* MSF-SLAM: Multi-Sensor-Fusion-Based Simultaneous Localization and Mapping for Complex Dynamic Environments
* MT-DSNet: Mix-mask teacher-student strategies and dual dynamic selection plug-in module for fine-grained image recognition
* MTA-CLIP: Language-guided Semantic Segmentation with Mask-text Alignment
* Mtadcs: Moving Trace and Feature Density-based Confidence Sample Selection Under Label Noise
* Mtkd: Multi-teacher Knowledge Distillation for Image Super-resolution
* MTMAMBA: Enhancing Multi-Task Dense Scene Understanding by Mamba-based Decoders
* Multi-Active and Multi-Passive Sensor Fusion Algorithm for Multi-Target Tracking in Dense Group Clutter Environments, A
* Multi-branch Collaborative Learning Network for 3d Visual Grounding
* Multi-Class Vehicle Detection Using VDnet in Heterogeneous Traffic
* Multi-Domain Merging Adaptation for Container Rehandling Probability Prediction
* Multi-granularity Sparse Relationship Matrix Prediction Network for End-to-end Scene Graph Generation
* Multi-hmr: Multi-person Whole-body Human Mesh Recovery in a Single Shot
* Multi-Label Chest X-Ray Image Classification With Single Positive Labels
* multi-label classification method based on transformer for deepfake detection, A
* Multi-label Cluster Discrimination for Visual Representation Learning
* Multi-level urban street representation with street-view imagery and hybrid semantic graph
* Multi-memory Matching for Unsupervised Visible-infrared Person Re-identification
* Multi-Modal 3D Object Detection by Box Matching
* Multi-Modal Attribute Prompting for Vision-Language Models
* Multi-modal Crowd Counting via a Broker Modality
* Multi-Modal Large Language Model Enhanced Pseudo 3D Perception Framework for Visual Commonsense Reasoning
* Multi-modal Relation Distillation for Unified 3d Representation Learning
* Multi-modal transformer with language modality distillation for early pedestrian action anticipation
* Multi-modal Video Dialog State Tracking in the Wild
* Multi-Object Tracking Using Score-Driven Hierarchical Association Strategy Between Predicted Tracklets and Objects
* Multi-Objective Optimization of Urban Gas Station Site Selection Under Territorial Spatial Planning Constraints
* Multi-path Segmentation Network Based on CNN and Transformer for Skin Lesion Image
* Multi-person Pose Forecasting with Individual Interaction Perceptron and Prior Learning
* Multi-phase Multi-graph Approach for Focal Liver Lesion Classification on CT Scans, A
* Multi-Prior Driven Resolution Rescaling Blocks for Intra Frame Coding
* Multi-ROI Human Mesh Recovery with Camera Consistency and Contrastive Losses
* Multi-scale Cross Distillation for Object Detection in Aerial Images
* Multi-Scale Effects of Supply-Demand Changes in Water-Related Ecosystem Services Across Different Landscapes in River Basin
* Multi-Scenario Remote Sensing Image Forgery Detection Based on Transformer and Model Fusion
* Multi-Sensor Learning Enables Information Transfer Across Different Sensory Data and Augments Multi-Modality Imaging
* Multi-sentence Grounding for Long-term Instructional Video
* Multi-Stage Auxiliary Learning for Visible-Infrared Person Re-Identification
* Multi-task Domain Adaptation for Language Grounding with 3d Objects
* Multi-task OCTA image segmentation with innovative dimension compression
* multi-view graph neural network for building age prediction, A
* Multi-View Visual Semantic Embedding for Cross-Modal Image-Text Retrieval
* Multidelete for Multimodal Machine Unlearning
* MultiGen: Zero-Shot Image Generation from Multi-Modal Prompts
* Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis, A
* Multimodal Cross-domain Few-shot Learning for Egocentric Action Recognition
* Multimodal Label Relevance Ranking via Reinforcement Learning
* Multimodal Prediction of Obsessive-Compulsive Disorder and Comorbid Depression Severity and Energy Delivered by Deep Brain Electrodes
* Multimodal-XAD: Explainable Autonomous Driving Based on Multimodal Environment Descriptions
* Multimodality Adaptive Transformer and Mutual Learning for Unsupervised Domain Adaptation Vehicle Re-Identification
* Multimodality-guided Visual-Caption Semantic Enhancement
* Multiple Active Stereo Systems Calibration Method Based on Neural SDF Using Dsss for Wide Area 3d Reconstruction
* Multiscale Graph Texture Network
* Multiscale Sliced Wasserstein Distances as Perceptual Color Difference Measures
* Multiscale Spatiotemporal Variation Analysis of Regional Water Use Efficiency Based on Multifractals
* Multistain Pretraining for Slide Representation Learning in Pathology
* MultiSubjects: A multi-subject video dataset for single-person basketball action recognition from basketball gym
* Multivariate prototype representation for domain-generalized incremental learning
* Multiview Detection with Cardboard Human Modeling
* Muses: The Multi-sensor Semantic Perception Dataset for Driving Under Uncertainty
* MuSRFM: Multiple scale resolution fusion based precise and robust satellite derived bathymetry model for island nearshore shallow water regions using sentinel-2 multi-spectral imagery
* Mutdet: Mutually Optimizing Pre-Training for Remote Sensing Object Detection
* Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion
* MV2MP: Segmentation Free Performance Capture of Humans in Direct Physical Contact from Sparse Multi-Cam Setups
* MVDD: Multi-view Depth Diffusion Models
* Mvdiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3d Object Reconstruction
* Mvpgs: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views
* Mvsgaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-view Stereo
* Mvsplat: Efficient 3d Gaussian Splatting from Sparse Multi-view Images
* MvWECM: Multi-view Weighted Evidential C-Means clustering
* MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model
* MYVLM: Personalizing VLMS for User-specific Queries
* M^2 Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation
* Möbius Transform for Mitigating Perspective Distortions in Representation Learning
* N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields
* Namedcurves: Learned Image Enhancement via Color Naming
* Namer: Non-autoregressive Modeling for Handwritten Mathematical Expression Recognition
* NAS-BNN: Neural Architecture Search for Binary Neural Networks
* Nash Meets Wertheimer: Using Good Continuation in Jigsaw Puzzles
* NAVGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-language Models
* Navigating Text-to-image Generative Bias Across Indic Languages
* Navigation Instruction Generation with BEV Perception and Large Language Models
* Navigation of a Team of UAVs for Covert Video Sensing of a Target Moving on an Uneven Terrain
* Near-Inertial Oscillations Induced by Winter Monsoon Onset in the Southwest Taiwan Strait
* NEPHI: Neural Deformation Fields for Approximately Diffeomorphic Medical Image Registration
* NeRF-MAE: Masked Autoencoders for Self-supervised 3d Representation Learning for Neural Radiance Fields
* NeRF-XL: Scaling NeRFs with Multiple GPUs
* NeRFect Match: Exploring NeRF Features for Visual Localization, The
* NeRFtrinsic Four: An end-to-end trainable NeRF jointly optimizing diverse intrinsic and extrinsic camera parameters
* Nermo: Learning Implicit Neural Representations for 3d Human Motion Prediction
* Neural Active Structure-from-motion in Dark and Textureless Environment
* Neural Graphics Texture Compression Supporting Random Access
* Neural Metamorphosis
* Neural Network Based Multi-Level In-Loop Filtering for Versatile Video Coding
* Neural Poisson Solver: A Universal and Continuous Framework for Natural Signal Blending
* Neural Spectral Decomposition for Dataset Distillation
* Neural Substitution for Branch-level Network Re-parameterization
* Neural Surface Detection for Unsigned Distance Fields
* Neural Volumetric World Models for Autonomous Driving
* NeuralTPS: Learning Signed Distance Functions Without Priors from Single Sparse Point Clouds
* Neuroncap: Photorealistic Closed-loop Safety Testing for Autonomous Driving
* Neuropictor: Refining fmri-to-image Reconstruction via Multi-individual Pretraining and Multi-level Modulation
* Neusdfusion: A Spatial-aware Generative Model for 3d Shape Completion, Reconstruction, and Generation
* New Approach to Intelligent Pedestrian Detection and Signaling on Crosswalks
* New Astronomical Observatory Design for the Detection and Tracking of Satellite Objects: The Satellite Robotic Observatory (SRO)
* New Dataset and Framework for Real-world Blurred Images Super-Resolution, A
* New Decision Tree Based on Intuitionistic Fuzzy Twin Support Vector Machines, A
* New Method to Correct Vegetation Bias in a Copernicus Digital Elevation Model to Improve Flow Path Delineation
* new methodology for establishing an SOC content prediction model that is spatiotemporally transferable at multidecadal and intercontinental scales, A
* New Spatial Analysis and Hybrid Heuristics Enhance Truck Freight Tonnage Estimation Based on Weigh-in-Motion Data
* new two-stage low-light enhancement network with progressive attention fusion strategy, A
* Newmove: Customizing Text-to-video Models with Novel Motions
* newton interpolation network for smoke semantic segmentation, A
* NGP-RT: Fusing Multi-level Hash Features with Lightweight Attention for Real-time Novel View Synthesis
* Nickel and Diming Your Gan: A Dual-method Approach to Enhancing Gan Efficiency via Knowledge Distillation
* NICP: Neural ICP for 3d Human Registration at Scale
* Nighttime fog and low stratus detection under multi-scene and all lunar phase conditions using S-NPP/VIIRS visible and infrared channels
* Nl2contact: Natural Language Guided 3d Hand-object Contact Modeling with Diffusion Model
* Noise Calibration: Plug-and-play Content-preserving Video Enhancement Using Pre-trained Video Diffusion Models
* Noise-assisted Prompt Learning for Image Forgery Detection and Localization
* Noise-Robust Vision-Language Pre-Training With Positive-Negative Learning
* NoiseBox: Toward More Efficient and Effective Learning With Noisy Labels
* Non-exemplar Domain Incremental Learning via Cross-domain Concept Integration
* Non-line-of-sight Estimation of Fast Human Motion with Slow Scanning Imagers
* Non-negative subspace feature representation for few-shot learning in medical imaging
* Non-parametric Sensor Noise Modeling and Synthesis
* Non-transferable Pruning
* Nonlinear least-squares solutions to the TLS multi-station registration adjustment problem
* Nonverbal Interaction Detection
* Norface: Improving Facial Expression Analysis by Identity Normalization
* Norma: A Noise Robust Memory-augmented Framework for Whole Slide Image Classification
* Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-view Data
* Novel Approach for Ex Situ Water Quality Monitoring Using the Google Earth Engine and Spectral Indices in Chilika Lake, Odisha, India, A
* Novel Cross-Perturbation for Single Domain Generalization, A
* novel HMM distance measure with state alignment, A
* novel image inpainting method based on a modified Lengyel-Epstein model, A
* Novel Machine Learning-Based Online Optimal Control Strategy for Fuel Cell in Electrified Transportation System, A
* novel theoretical analysis on optimal pipeline of multi-frame image super-resolution using sparse coding, A
* Novel Voronoi-Based Spatio-Temporal Graph Convolutional Network for Traffic Crash Prediction Considering Geographical Spatial Distributions, A
* Novum: Neural Object Volumes for Robust Object Classification
* NT-VOT211: A Large-scale Benchmark for Night-time Visual Object Tracking
* nu-craft: Crafting High Resolution 3d Semantic Occupancy for Unified 3d Scene Understanding
* Numerical Simulation of Convective Systems in Southeast China: A Comparison of Microphysical Schemes and Sensitivity Experiments on Raindrop Break and Evaporation, A
* NUVO: Neural UV Mapping for Unruly 3d Representations
* NVDS^+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation
* NVS-Adapter: Plug-and-play Novel View Synthesis from a Single Image
* Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild
* O1O: Grouping of Known Classes to Identify Unknown Objects as Odd-one-Out
* OAPT: Offset-aware Partition Transformer for Double Jpeg Artifacts Removal
* OAT: Object-level Attention Transformer for Gaze Scanpath Prediction
* Object-aware NIR-to-visible Translation
* Object-aware Query Perturbation for Cross-modal Image-text Retrieval
* Object-centric Diffusion for Efficient Video Editing
* Object-conditioned Energy-based Attention Map Alignment in Text-to-image Diffusion Models
* Object-Goal Navigation of Home Care Robot Based on Human Activity Inference and Cognitive Memory
* Object-Oriented Anchoring and Modal Alignment in Multimodal Learning
* ObjectCompose: Evaluating Resilience of Vision-based Models on Object-to-background Compositional Changes
* Objectdrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
* Observability-Based Gaussian Sum Cubature Kalman Filter for Three-Dimensional Target Tracking Using a Single Two-Dimensional Radar
* Observed Changes and Projected Risks of Hot-Dry/Hot-Wet Compound Events in China
* Obstacle Avoidance for a Large-Scale High-Speed Underactuated AUV in Complex Environments
* Occfusion: Depth Estimation Free Multi-sensor Fusion for 3d Occupancy Prediction
* Occgen: Generative Multi-modal 3d Occupancy Prediction for Autonomous Driving
* Occluded Gait Recognition with Mixture of Experts: An Action Detection Perspective
* Occlusion Handling in 3d Human Pose Estimation with Perturbed Positional Encoding
* Occlusion-Aware Seamless Segmentation
* Occlusion-related graph convolutional neural network for multi-object tracking
* Occupancy as Set of Points
* Occworld: Learning a 3d Occupancy World Model for Autonomous Driving
* Octopus: Embodied Vision-language Programmer from Environmental Feedback
* OGNI-DC: Robust Depth Completion with Optimization-guided Neural Iterations
* OLAF: A Plug-and-play Framework for Enhanced Multi-object Multi-part Scene Parsing
* OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models
* Omni-Recon: Harnessing Image-based Rendering for General-purpose Neural Radiance Fields
* Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation
* Omni6dpose: A Benchmark and Model for Universal 6d Object Pose Estimation and Tracking
* Omniact: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
* Omnifusion: Exemplar-based Video Colorization Using Omnimotion and Diffusion Priors
* OmniNOCS: A Unified NOCS Dataset and Model for 3d Lifting of 2d Objects
* Omnisat: Self-supervised Modality Fusion for Earth Observation
* OmniSSR: Zero-Shot Omnidirectional Image Super-resolution Using Stable Diffusion Model
* Omniview-tuning: Boosting Viewpoint Invariance of Vision-language Pre-training Models
* OMR: Occlusion-aware Memory-based Refinement for Video Lane Detection
* On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines
* On Learning Discriminative Features from Synthesized Data for Self-supervised Fine-grained Visual Recognition
* On Practical Implementations of Connected Vehicles: The Issue of Acceleration Feedback
* On Pretraining Data Diversity for Self-supervised Learning
* On Spectral Properties of Gradient-based Explanation Methods
* On the Approximation Risk of Few-Shot Class-Incremental Learning
* On the effects of obfuscating speaker attributes in privacy-aware depression detection
* On the Error Analysis of 3d Gaussian Splatting and an Optimal Projection Strategy
* On the Evaluation Consistency of Attribution-based Explanations
* On the Topology Awareness and Generalization Performance of Graph Neural Networks
* On the Utility of 3d Hand Poses for Action Recognition
* On the Viability of Monocular Depth Pre-Training for Semantic Segmentation
* On the Vulnerability of Skip Connections to Model Inversion Attacks
* On Unsupervised Partial Shape Correspondence
* On-orbit geometric calibration of MERSI whiskbroom scanner
* On-the-fly Category Discovery for Lidar Semantic Segmentation
* On-the-Fly Modulation for Balanced Multimodal Learning
* One point is all you need for weakly supervised object detection
* ONE-DM: One-shot Diffusion Mimicker for Handwritten Text Generation
* One-index vector quantization based adversarial attack on image classification
* One-Shot Any-Scene Crowd Counting With Local-to-Global Guidance
* One-stage Prompt-based Continual Learning
* Onebev: Using One Panoramic Image for Bird's-eye-view Semantic Mapping
* Onediff: A Generalist Model for Image Difference Captioning
* Onerestore: A Universal Restoration Framework for Composite Degradation
* Onetrack: Demystifying the Conflict Between Detection and Tracking in End-to-end 3d Trackers
* Onevos: Unifying Video Object Segmentation with All-in-one Transformer Framework
* Online Continuous Generalized Category Discovery
* Online indoor visual odometry with semantic assistance under implicit epipolar constraints
* Online probabilistic knowledge distillation on cryptocurrency trading using Deep Reinforcement Learning
* Online Temporal Action Localization with Memory-augmented Transformer
* Online Vectorized HD Map Construction Using Geometry
* Online Video Quality Enhancement with Spatial-temporal Look-up Tables
* Online Zero-shot Classification with CLIP
* Op-align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation
* Open Data for Transparency of Government Tenders: A State Analysis in Croatian Agriculture Land Lease
* Open Panoramic Segmentation
* Open Vocabulary 3d Scene Understanding via Geometry Guided Self-distillation
* Open Vocabulary Multi-label Video Classification
* Open-set Biometrics: Beyond Good Closed-set Models
* Open-set Domain Adaptation via Joint Error Based Multi-class Positive and Unlabeled Learning
* Open-set Recognition in the Age of Vision-language Models
* Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
* Open-Vocabulary RGB-Thermal Semantic Segmentation
* Open-vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
* Open-world Dynamic Prompt and Continual Visual Representation Learning
* Open: Object-Wise Position Embedding for Multi-View 3D Object Detection
* OPENINS3D: Snap and Lookup for 3d Open-vocabulary Instance Segmentation
* Openkd: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection
* Openpsg: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
* Opensight: A Simple Open-Vocabulary Framework for Lidar-Based Object Detection
* Operational Open-set Recognition and Postmax Refinement
* Ophnet: A Large-scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
* Optical See-Through Head-Mounted Display With Mitigated Parallax-Related Registration Errors: A User Study Validation
* Optimal Control View of Lora and Binary Controller Design for Vision Transformers, An
* Optimal Feature-Guided Position-Shape Dual Optimization for Building Point Cloud Facade Detail Enhancement
* Optimal Hyperspectral Characteristic Parameters Construction and Concentration Retrieval for Inland Water Chlorophyll-a Under Different Motion States
* Optimal Transport of Diverse Unsupervised Tasks for Robust Learning from Noisy Few-shot Data
* Optimization Framework to Enforce Multi-view Consistency for Texturing 3d Meshes, An
* Optimization-based Uncertainty Attribution Via Learning Informative Perturbations
* Optimized Breast Lesion Segmentation in Ultrasound Videos Across Varied Resource-scant Environments
* Optimizing Circular MIMO Array Imaging Using Partial Equivalent Method for Sidelobe Suppression
* Optimizing Diffusion Models for Joint Trajectory Prediction and Controllable Generation
* Optimizing Factorized Encoder Models: Time and Memory Reduction for Scalable and Efficient Action Recognition
* Optimizing hybrid models for canopy nitrogen mapping from Sentinel-2 in Google Earth Engine
* Optimizing Illuminant Estimation in Dual-exposure HDR Imaging
* Optimizing Latent Variables in Integrating Transfer and Query Based Attack Framework
* OR-LIM: Observability-aware robust LiDAR-inertial-mapping under high dynamic sensor motion
* Organ-Aware Diagnosis Framework for Radiology Report Generation, An
* Origin-Destination Matrix Prediction in Public Transport Networks: Incorporating Heterogeneous Direct and Transfer Trips
* Osmosis: RGBD Diffusion Prior for Underwater Image Restoration
* Otseg: Multi-prompt Sinkhorn Attention for Zero-shot Semantic Segmentation
* Oulu Remote-photoplethysmography Physical Domain Attacks Database (orpdad)
* Out-of-bounding-box Triggers: A Stealthy Approach to Cheat Object Detectors
* OV-UNI3DETR: Towards Unified Open-vocabulary 3d Object Detection via Cycle-modality Propagation
* Overcome Modal Bias in Multi-modal Federated Learning via Balanced Modality Selection
* Overcoming Distribution Mismatch in Quantizing Image Super-resolution Networks
* Overcoming Modality Bias in Question-Driven Sign Language Video Translation
* OVSW: Overcoming Silent Weights for Accurate Binary Neural Networks
* O_2V-Mapping: Online Open-vocabulary Mapping with Neural Implicit Representation
* P2P-Bridge: Diffusion Bridges for 3d Point Cloud Denoising
* PACE: A Large-scale Dataset with Pose Annotations in Cluttered Environments
* Pairingnet: A Learning-Based Pair-Searching and -Matching Network for Image Fragments
* Pairwise Distance Distillation for Unsupervised Real-world Image Super-resolution
* PALM: Predicting Actions through Language Models
* Panel-specific Degradation Representation for Raw Under-display Camera Image Restoration
* Pangu-draw: Advancing Resource-efficient Text-to-image Synthesis with Time-decoupled Training and Reusable Coop-diffusion
* Panofree: Tuning-free Holistic Multi-view Image Generation with Cross-view Self-guidance
* Panovos: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation
* Papmot: Exploring Adversarial Patch Attack Against Multiple Object Tracking
* PAPR: Training-free One-step Patch Pruning with Lightweight Convnets for Faster Inference
* Parameter-efficient and Memory-efficient Tuning for Vision Transformer: A Disentangled Approach
* Parameter-Efficient Instance-adaptive Neural Video Compression
* Parameter-selective Continual Test-time Adaptation
* Parameterization-driven Neural Surface Reconstruction for Object-oriented Editing in Neural Rendering
* Parameterized Quasi-physical Simulators for Dexterous Manipulations Transfer
* Parco: Part-coordinating Text-to-motion Synthesis
* Pare-net: Position-aware Rotation-equivariant Networks for Robust Point Cloud Registration
* Paris3d: Reasoning-based 3d Part Segmentation Using Large Multimodal Model
* Parnet: Aortic Reconstruction from Orthogonal X-rays Using Pre-trained Generative Adversarial Networks
* Parrot Captions Teach CLIP to Spot Text
* Parrot: Pareto-optimal Multi-reward Reinforcement Learning Framework for Text-to-image Generation
* Part2object: Hierarchical Unsupervised 3d Instance Segmentation
* Partcraft: Crafting Creative Objects by Parts
* Partglee: A Foundation Model for Recognizing and Parsing Any Objects
* Partimagenet++ Dataset: Scaling Up Part-based Models for Robust Recognition
* Partstad: 2d-to-3d Part Segmentation Task Adaptation
* Patchrefiner: Leveraging Synthetic Data for Real-domain High-resolution Monocular Metric Depth Estimation
* Pathformer3d: A 3d Scanpath Transformer for 360° Images
* Pathmmu: A Massive Multimodal Expert-level Benchmark for Understanding and Reasoning in Pathology
* Pathological Asymmetry-Guided Progressive Learning for Acute Ischemic Stroke Infarct Segmentation
* Pathology-knowledge Enhanced Multi-instance Prompt Learning for Few-Shot Whole Slide Image Classification
* Pav: Personalized Head Avatar from Unstructured Video Collection
* Pavement Point Cloud Upsampling Based on Transformer: Toward Enhancing 3D Pavement Data
* Paying More Attention to Image: A Training-free Method for Alleviating Hallucination in LVLMS
* PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion
* Pdiscoformer: Relaxing Part Discovery Constraints with Vision Transformers
* PDT: UAV Target Detection Dataset for Pests and Diseases Tree
* Pea-diffusion: Parameter-efficient Adapter with Knowledge Distillation in Non-english Text-to-image Generation
* Per-gaussian Embedding-based Deformation for Deformable 3d Gaussian Splatting
* Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores
* Performance Evaluation of Satellite Precipitation Products During Extreme Events: The Case of the Medicane Daniel in Thessaly, Greece
* Performance of GPM IMERG Product Validated on Hourly Observations over Land Areas of Northern Hemisphere, The
* Personalized Federated Domain-incremental Learning Based on Adaptive Knowledge Matching
* Personalized Federated Learning on long-tailed data via knowledge distillation and generated features
* Personalized Privacy Protection Mask Against Unauthorized Facial Recognition
* Personalized Video Relighting With an At-Home Light Stage
* Petface: A Large-scale Dataset and Benchmark for Animal Identification
* Pfededit: Personalized Federated Learning via Automated Model Editing
* PFGS: High Fidelity Point Cloud Rendering via Feature Splatting
* Phase Concentration and Shortcut Suppression for Weakly Supervised Semantic Segmentation
* phenological-knowledge-independent method for automatic paddy rice mapping with time series of polarimetric SAR images, A
* Photon Inhibition for Energy-efficient Single-photon Imaging
* Photorealistic Object Insertion with Diffusion-guided Inverse Rendering
* Photorealistic Video Generation with Diffusion Models
* PhraseAug: An Augmented Medical Report Generation Model With Phrasebook
* Physavatar: Learning the Physics of Dressed 3d Avatars from Visual Observations
* Physdreamer: Physics-based Interaction with 3d Objects via Video Generation
* Physgen: Rigid-body Physics-grounded Image-to-Video Generation
* Physical-based Event Camera Simulator
* Physically Plausible Color Correction for Neural Radiance Fields
* Physics-informed Knowledge Transfer for Underwater Monocular Depth Estimation
* Pick-a-back: Selective Device-to-device Knowledge Transfer in Federated Continual Learning
* Piecewise Convolutional Neural Network Relation Extraction with Self-Attention Mechanism
* PILORA: Prototype Guided Incremental LORA for Federated Class-incremental Learning
* PIM-Net: Progressive Inconsistency Mining Network for image manipulation localization
* PISR: Polarimetric Neural Implicit Surface Reconstruction for Textureless and Specular Objects
* PITE: Pixel-Temporal Alignment for Large Video-Language Model
* PIX2GIF: Motion-guided Diffusion for GIF Generation
* Pixart-sigma: Weak-to-strong Training of Diffusion Transformer for 4k Text-to-image Generation
* Pixel-aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization
* Pixel-gs: Density Control with Pixel-aware Gradient for 3d Gaussian Splatting
* Pixel-Learnable 3DLUT With Saturation-Aware Compensation for Image Enhancement
* Pixmamba: Leveraging State Space Models in a Dual-level Architecture for Underwater Image Enhancement
* Pixood: Pixel-level Out-of-distribution Detection
* Placing Objects in Context via Inpainting for Out-of-distribution Segmentation
* Plain-det: A Plain Multi-dataset Object Detector
* Plainusr: Chasing Faster Convnet for Efficient Super-resolution
* Plan, Posture and Go: Towards Open-vocabulary Text-to-motion Generation
* Planar Feature-Preserving Texture Defragmentation Method for 3D Urban Building Models, A
* Platypus: A Generalized Specialist Model for Reading Text in Various Forms
* Plot: Text-based Person Search with Part Slot Attention for Corresponding Part Discovery
* Plug and Play: A Representation Enhanced Domain Adapter for Collaborative Perception
* Plug-and-play Learned Proximal Trajectory for 3d Sparse-view X-ray Computed Tomography
* Pluggable Style Representation Learning for Multi-style Transfer
* PMGNet: Disentanglement and entanglement benefit mutually for compositional zero-shot learning
* PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation
* PMTrack: Multi-Object Tracking with Motion-Aware
* POA: Pre-training Once for Models of All Sizes
* Poca: Post-training Quantization with Temporal Alignment for Codec Avatars
* Poet: Prompt Offset Tuning for Continual Human Action Adaptation
* Point Cloud Completion via Self-Projected View Augmentation and Implicit Field Constraint
* Point Cloud Segmentation Neural Network with Same-Type Point Cloud Assistance
* Point-supervised Panoptic Segmentation via Estimating Pseudo Labels from Learnable Distance
* Pointllm: Empowering Large Language Models to Understand Point Clouds
* PointNeRF++: A Multi-scale, Point-based Neural Radiance Field
* Pointreggpt: Boosting 3d Point Cloud Registration Using Generative Point-cloud Pairs for Training
* PolarFormer: A Transformer-Based Method for Multi-Lesion Segmentation in Intravascular OCT
* Polarisation Synthesis Applied to 3D Polarimetric Imaging for Enhanced Buried Object Detection and Identification
* PolyGNN: Polyhedron-based graph neural network for 3D building reconstruction from point clouds
* Polynomial kernel learning for interpolation kernel machines with application to graph classification
* Polyoculus: Simultaneous Multi-view Image-based Novel View Synthesis
* Polyp-ses: Automatic Polyp Segmentation with Self-enriched Semantic Model
* PolyR-CNN: R-CNN for end-to-end polygonal building outline extraction
* Polyroom: Room-aware Transformer for Floorplan Reconstruction
* Ponymation: Learning Articulated 3d Animal Motions from Unlabeled Online Videos
* Population Distribution Forecasting Based on the Fusion of Spatiotemporal Basic and External Features: A Case Study of Lujiazui Financial District
* Portrait4D-V2: Pseudo Multi-view Data Creates Better 4d Head Synthesizer
* Pose-aware Self-supervised Learning with Viewpoint Trajectory Regularization
* Pose-guided Fine-grained Sign Language Video Generation
* Poseaugment: Generative Human Pose Data Augmentation with Physical Plausibility for IMU-based Motion Capture
* Posecrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control
* Poseembroider: Towards a 3d, Visual, Semantic-aware Human Pose Representation
* Posesor: Human Pose Can Guide Our Attention
* Posformer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer
* Positional diffusion: Graph-based diffusion models for set ordering
* Positive and Negative Set Designs in Contrastive Feature Learning for Temporal Action Segmentation
* Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-image Diffusion Models
* Posterllama: Bridging Design Ability of Language Model to Content-aware Layout Generation
* Potential and Observed Supply-Demand Characteristics of Medical Services: A Case Study of Nighttime Visits in Shenzhen
* Power Corridor Safety Hazard Detection Based on Airborne 3D Laser Scanning Technology
* Power Variable Projection for Initialization-free Large-scale Bundle Adjustment
* Powerful and Flexible: Personalized Text-to-image Generation via Reinforcement Learning
* PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving
* PQ-SAM: Post-training Quantization for Segment Anything Model
* Pre-trained Visual Dynamics Representations for Efficient Policy Learning
* Precisecontrol: Enhancing Text-to-image Diffusion Models with Fine-grained Attribute Control
* Precision Detection of Infrared Small Target in Ground-to-Air Scene
* Predbench: Benchmarking Spatio-temporal Prediction Across Diverse Disciplines
* Predicting gradient is better: Exploring self-supervised learning for SAR ATR with a joint-embedding predictive architecture
* Predicting Human Postures for Manual Material Handling Tasks Using a Conditional Diffusion Model
* Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment
* Predictive Clustering of Vessel Behavior Based on Hierarchical Trajectory Representation
* Predictive Vehicle Stability Assessment Using Lyapunov Exponent Under Extreme Conditions
* Prelar: World Model Pre-Training with Learnable Action Representation
* Preliminary Study of Environmental Variations Around the Du-Ku Highway Since 2000, The
* Prescribed-Time Dynamic Positioning Control for USV with Lumped Disturbances, Thruster Saturation and Prescribed Performance Constraints
* Presight: Enhancing Autonomous Vehicle Perception with City-scale NeRF Priors
* PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation
* Preventing Catastrophic Forgetting Through Memory Networks in Continuous Detection
* Preventing Catastrophic Overfitting in Fast Adversarial Training: A Bi-level Optimization Perspective
* PRFusion: Toward Effective and Robust Multi-Modal Place Recognition With Image and Point Cloud Fusion
* Primedepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage
* Prior Knowledge-Guided Triple-Domain Transformer-GAN for Direct PET Reconstruction From Low-Count Sinograms
* Prioritized Semantic Learning for Zero-Shot Instance Navigation
* PRISMethaNet: A novel deep learning model for landfill methane detection using PRISMA satellite data
* Privacy-Aware Anomaly Detection and Notification Enhancement for VANET Based on Collaborative Intrusion Detection System
* Privacy-Aware Design and Analysis of Drone Remote Identification Systems
* Privacy-preserving Adaptive Re-identification Without Image Transfer
* Privacy-preserving speaker verification system using Ranking-of-Element hashing
* Pro2sam: Mask Prompt to Sam with Grid Points for Weakly Supervised Object Localization
* Probabilistic Image-driven Traffic Modeling via Remote Sensing
* Probabilistic Modeling of Train Operations for Uncertainty Quantification: A Context-Aware Bayesian Network Approach
* Probabilistic Weather Forecasting with Deterministic Guidance-based Diffusion Model
* Probability-guided Sampler for Neural Implicit Surface Rendering, A
* Procreate, Don't Reproduce! Propulsive Energy Diffusion for Creative Generation
* Prodepth: Boosting Self-supervised Multi-frame Monocular Depth with Probabilistic Fusion
* Produce Once, Utilize Twice for Anomaly Detection
* Progressive Classifier and Feature Extractor Adaptation for Unsupervised Domain Adaptation on Point Clouds
* Progressive Content-Aware Coded Hyperspectral Snapshot Compressive Imaging
* Progressive Pretext Task Learning for Human Trajectory Prediction
* Progressive Proxy Anchor Propagation for Unsupervised Semantic Segmentation
* Progressive Target Refinement by Self-distillation for Human Pose Estimation
* Projecting Points to Axes: Oriented Object Detection via Point-axis Representation
* Promerge: Prompt and Merge for Unsupervised Instance Segmentation
* Prompt-and-Transfer: Dynamic Class-Aware Enhancement for Few-Shot Segmentation
* Prompt-based Test-time Real Image Dehazing: A Novel Pipeline
* Prompt-driven Contrastive Learning for Transferable Adversarial Attacks
* Prompt-Guided Semantic-Aware Distillation for Weakly Supervised Incremental Semantic Segmentation
* PromptCCD: Learning Gaussian Mixture Prompt Pool for Continual Category Discovery
* Promptfusion: Decoupling Stability and Plasticity for Continual Learning
* Prompting Future Driven Diffusion Model for Hand Motion Prediction
* Prompting Language-informed Distribution for Compositional Zero-Shot Learning
* Promptiqa: Boosting the Performance and Generalization for No-reference Image Quality Assessment via Prompts
* Proposal With Alignment: A Bi-Directional Transformer for 360° Video Viewport Proposal
* Propose, Assess, Search: Harnessing Llms for Goal-oriented Planning in Instructional Videos
* Prosub: Probabilistic Open-set Semi-supervised Learning with Subspace-based Out-of-distribution Detection
* Protecting NeRFs' Copyright via Plug-and-play Watermarking Base Model
* Protip: Probabilistic Robustness Verification on Text-to-image Diffusion Models Against Stochastic Perturbation
* ProtoComp: Diverse Point Cloud Completion with Controllable Prototype
* Prototype-Decomposed Knowledge Distillation for Learning Generalized Federated Representation
* Prototype-Guided Attention Distillation for Discriminative Person Search
* Provably Secure and Lightweight Authentication and Key Agreement Protocol for Fog-Based Vehicular Ad-Hoc Networks
* Proxyclip: Proxy Attention Improves CLIP for Open-vocabulary Segmentation
* PRSN: Prototype resynthesis network with cross-image semantic alignment for few-shot image classification
* PSALM: Pixelwise Segmentation with Large Multi-modal Model
* Pseudo-embedding for Generalized Few-shot 3d Segmentation
* Pseudo-keypoint RKHS Learning for Self-supervised 6dof Pose Estimation
* Pseudo-label refinement via hierarchical contrastive learning for source-free unsupervised domain adaptation
* Pseudo-labeling with keyword refining for few-supervised video captioning
* Pseudo-labelling Should Be Aware of Disguising Channel Activations
* Pseudo-ris: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
* Psg-adapter: Controllable Planning Scene Graph for Improving Text-to-image Diffusion
* PSVMA+: Exploring Multi-Granularity Semantic-Visual Adaption for Generalized Zero-Shot Learning
* PtbNet: Based on Local Few-Shot Classes and Small Objects to Accurately Detect PTB
* Public Bus-Assisted Task Offloading for UAVs
* Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos
* PYRA: Parallel Yielding Re-activation for Training-inference Efficient Task Adaptation
* Pyramid Diffusion for Fine 3d Large Scene Generation
* Pyramid transformer-based triplet hashing for robust visual place recognition
* Q&A Prompts: Discovering Rich Visual Clues through Mining Question-answer Prompts for VQA requiring Diverse World Knowledge
* QR-DETR: Query Routing for Detection Transformer
* Quality Assured: Rethinking Annotation Strategies in Imaging Ai
* Quanta Video Restoration
* Quantifying and Learning Static vs. Dynamic Information in Deep Spatiotemporal Networks
* Quantifying the Effect of Land Use and Land Cover Changes on Spatial-Temporal Dynamics of Water in Hanjiang River Basin
* Quantifying the Geomorphological Susceptibility of the Piping Erosion in Loess Using LiDAR-Derived DEM and Machine Learning Methods
* Quantitative Characterization of Highway Landscape Space Visual Perception Based on Deep Learning
* Quantitative Estimation and Analysis of Spatiotemporal Delay Effects in Expressway Traffic Accidents
* Quantitative Estimation of Driver Cognitive Workload: A Dual-Stage Learning Approach
* Quantization-friendly Winograd Transformations for Convolutional Neural Networks
* Quantized Prompt for Efficient Generalization of Vision-language Models
* QUAR-VLA: Vision-language-action Model for Quadruped Robots
* Quasi-Linear Convective Systems in Catalonia Detected Through Radar and Lightning Data
* QueryCDR: Query-based Controllable Distortion Rectification Network for Fisheye Images
* Question Type-Aware Debiasing for Test-Time Visual Question Answering Model Adaptation
* R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-image Diffusion Model
* R3D-AD: Reconstruction via Diffusion for 3d Anomaly Detection
* R3DS: Reality-linked 3d Scenes for Panoramic Scene Understanding
* Radar gait recognition using Dual-branch Swin Transformer with Asymmetric Attention Fusion
* Radar Moving Target Detection Based on Small-Sample Transfer Learning and Attention Mechanism
* Radedit: Stress-testing Biomedical Vision Models via Diffusion Image Editing
* Radiance Field Learners As UAV First-person Viewers
* Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis
* Radio Frequency Interference Screening Framework: From Quick-Look Detection Using Statistics-Assisted Network to Raw Echo Tracing, A
* Radiometric Cross-Calibration of HJ-2A/CCD3 Using the Random Forest Algorithm and a Spectral Interpolation Convolution Method with Sentinel-2/MSI
* RAFE: Generative Radiance Fields Restoration
* Raindrop Clarity: A Dual-focused Dataset for Day and Night Raindrop Removal
* Raising the Ceiling: Conflict-free Local Feature Matching with Dynamic View Switching
* Random Walk on Pixel Manifolds for Anomaly Segmentation of Complex Driving Scenes
* Randomized Channel-pass Mask for Channel-wise Explanation of Black-box Models
* Rangeldm: Fast Realistic Lidar Point Cloud Generation
* Ranrac: Robust Neural Scene Representations via Random Ray Consensus
* RAP: Retrieval-augmented Planner for Adaptive Procedure Planning in Instructional Videos
* Rapid-SEG: Range-aware Pointwise Distance Distribution Networks for 3d Lidar Segmentation
* Raster-Based Multi-Objective Spatial Optimization Framework for Offshore Wind Farm Site-Prospecting, A
* Rasterized Edge Gradients: Handling Discontinuities Differentiably
* Rate-distortion-cognition Controllable Versatile Neural Image Compression
* RAVE: Residual Vector Embedding for CLIP-guided Backlit Image Enhancement
* Raw-adapter: Adapting Pre-trained Visual Model to Camera Raw Images
* Rawformer: Unpaired Raw-to-raw Translation for Learnable Camera ISPS
* Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3d Object Detection
* Ray Reordering for Hardware-Accelerated Neural Volume Rendering
* Ray-Distance Volume Rendering for Neural Scene Reconstruction
* Rayemb: Arbitrary Landmark Detection in X-ray Images Using Ray Embedding Subspace
* Rcs-prompt: Learning Prompt to Rearrange Class Space for Prompt-based Continual Learning
* RCVS: A Unified Registration and Fusion Framework for Video Streams
* RD-DIFF: Rltransformer-Based Diffusion Model with Diversity-inducing Modulator for Human Motion Prediction
* Re-evaluating winter carbon sink in Southern Ocean by recovering MODIS-Aqua chlorophyll-a product at high solar zenith angles
* Real Appearance Modeling for More General Deepfake Detection
* Real-Data-Driven 2000 FPS Color Video from Mosaicked Chromatic Spikes
* Real-SRGD: Enhancing Real-world Image Super-resolution with Classifier-free Guided Diffusion
* Real-time 3d-aware Portrait Editing from a Single Image
* Real-Time Global Optimal Energy Management Strategy for Connected PHEVs Based on Traffic Flow Information
* Real-time Holistic Robot Pose Estimation with Unknown States
* Real-Time Multi-Scene Visibility Enhancement for Promoting Navigational Safety of Vessels Under Complex Weather Conditions
* Real-Time Network-Level Traffic Signal Control: An Explicit Multiagent Coordination Method
* Real-Time Pedestrian Crossing Anticipation Based on an Action-Interaction Dual-Branch Network
* Realfred: An Embodied Instruction Following Benchmark in Photo-realistic Environments
* Realgen: Retrieval Augmented Generation for Controllable Traffic Scenarios
* Realistic Human Motion Generation with Cross-diffusion Models
* Realviformer: Investigating Attention for Real-world Video Super-resolution
* Reason2drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
* Reasonable Anomaly Detection Based on Long-Term Sequence Modeling
* Reawakening of Voragine, the Oldest of Etna's Summit Craters: Insights from a Recurrent Episodic Eruptive Behavior
* Rebalancing Using Estimated Class Distribution for Imbalanced Semi-supervised Learning Under Class Distribution Mismatch
* Receler: Reliable Concept Erasing of Text-to-image Diffusion Models via Lightweight Erasers
* Recipe for Cac: Mosaic-based Generalized Loss for Improved Class-agnostic Counting, A
* Recon: Training-free Acceleration for Text-to-image Synthesis with Retrieval of Concept Prompt Trajectories
* Reconstructing high-resolution subsurface temperature of the global ocean using deep forest with combined remote sensing and in situ observations
* Reconstructing NDVI time series in cloud-prone regions: A fusion-and-fit approach with deep learning residual constraint
* Reconstructing Visual Stimulus Representation From EEG Signals Based on Deep Visual Representation Model
* Reconstruction and Simulation of Elastic Objects with Spring-mass 3d Gaussians
* Reconstruction of 30 m Land Cover in the Qilian Mountains from 1980 to 1990 Based on Super-Resolution Generative Adversarial Networks
* Reconstruction of Fine-Spatial-Resolution FY-3D-Based Vegetation Indices to Achieve Farmland-Scale Winter Wheat Yield Estimation via Fusion with Sentinel-2 Data
* Reconstruction of Hourly Gap-Free Sea Surface Skin Temperature from Multi-Sensors
* Recovery-Based Distributed Adaptive ILC With Fading Compensation for MHSTs Under DoS Attacks: A Model-Free Approach
* Rectify the Regression Bias in Long-tailed Object Detection
* Recurrentbev: A Long-term Temporal Fusion Framework for Multi-View 3d Detection
* Recursive classification of satellite imaging time-series: An application to land cover mapping
* Recursive Visual Programming
* Redefining Normal: A Novel Object-level Approach for Multi-object Novelty Detection
* Redir: Refocus-free Event-based De-occlusion Image Reconstruction
* REF-AVS: Refer and Segment Objects in Audio-visual Scenes
* Reference-based Face Super-resolution Using the Spatial Transformer
* Referring Atomic Video Action Recognition
* Refine, Discriminate and Align: Stealing Encoders via Sample-wise Prototypes and Multi-relational Extraction
* Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-language Models
* Refraction-Aware Structure from Motion for Airborne Bathymetry
* Reframe: Reflective Surface Real-time Rendering for Mobile Devices
* Reg-tta3d: Better Regression Makes Better Test-time Adaptive 3d Object Detection
* Region-adaptive Transform with Segmentation Prior for Image Compression
* Region-aware Distribution Contrast: A Novel Approach to Multi-task Partially Supervised Learning
* Region-aware image-based human action retrieval with transformers
* Region-aware Sequence-to-sequence Learning for Hyperspectral Denoising
* Region-centric Image-Language Pretraining for Open-Vocabulary Detection
* Region-native Visual Tokenization
* Regional dynamic point cloud completion network
* Regiondrag: Fast Region-based Image Editing with Diffusion Models
* Reground: Improving Textual and Spatial Grounding at No Cost
* RegSeg: An End-to-End Network for Multimodal RGB-Thermal Registration and Semantic Segmentation
* Regular Constrained Multimodal Fusion for Image Captioning
* Regularizing Dynamic Radiance Fields with Kinematic Fields
* Regulating Model Reliance on Non-robust Features by Smoothing Input Marginal Density
* Reinforcement Learning Friendly Vision-language Model for Minecraft
* Reinforcement Learning Meets Visual Odometry
* Reinforcement Learning via Auxiliary Task Distillation
* Rejection Sampling IMLE: Designing Priors for Better Few-shot Image Synthesis
* Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
* Relative Pose from Cylinder Silhouettes
* Reliability and Models of Subjective Motion Incongruence Ratings in Urban Driving Simulations
* Reliability in Semantic Segmentation: Can We Use Synthetic Data?
* Reliable and Efficient Concept Erasure of Text-to-image Diffusion Models
* Reliable Phrase Feature Mining for Hierarchical Video-Text Retrieval
* Reliable Spatial-temporal Voxels For Multi-modal Test-time Adaptation
* Relightable 3d Gaussians: Realistic Point Cloud Relighting with BRDF Decomposition and Ray Tracing
* Relightable Neural Actor with Intrinsic Decomposition and Pose Control
* Reloo: Reconstructing Humans Dressed in Loose Garments from Monocular Video in the Wild
* Reluifying Smooth Functions: Low-cost Knowledge Distillation to Obtain High-performance Relu Networks
* Remamber: Referring Image Segmentation with Mamba Twister
* Rematching: Low-resolution Representations for Scalable Shape Correspondence
* REMOS: 3d Motion-Conditioned Reaction Synthesis for Two-Person Interactions
* Remote Inspection of Bridges with the Integration of Scanning Total Station and Unmanned Aerial Vehicle Data
* Remote Sensing and GIS in Natural Resource Management: Comparing Tools and Emphasizing the Importance of In-Situ Data
* Remote Sensing Fine Estimation Model of PM2.5 Concentration Based on Improved Long Short-Term Memory Network: A Case Study on Beijing-Tianjin-Hebei Urban Agglomeration in China
* Removing Distributional Discrepancies in Captions Improves Image-Text Alignment
* Removing Hidden Information by Geometrical Perturbation in Frequency Domain
* Removing Instrumental Noise in Distributed Acoustic Sensing Data: A Comparison Between Two Deep Learning Approaches
* Removing Rows and Columns of Tokens in Vision Transformer Enables Faster Dense Prediction Without Retraining
* Renoise: Real Image Inversion Through Iterative Noising
* Repaint123: Fast and High-quality One Image to 3d Generation with Progressive Controllable Repainting
* Reparameterization Feature Redundancy Extract Network for Unmanned Aerial Vehicles Detection, A
* Replay: Remove Projective Lidar Depthmap Artifacts via Exploiting Epipolar Geometry
* Repose: 3d Human Pose Estimation via Spatio-temporal Depth Relational Consistency
* Representation Enhancement-stabilization: Reducing Bias-variance of Domain Generalization
* Representing Topological Self-similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures
* Reprojection Errors as Prompts for Efficient Scene Coordinate Regression
* REPVF: A Unified Vector Fields Representation for Multi-task 3d Perception
* Research for the Positioning Optimization for Portable Field Terrain Mapping Equipment Based on the Adaptive Unscented Kalman Filter Algorithm
* Research on Airborne Ground-Penetrating Radar Imaging Technology in Complex Terrain
* Research on Land Use and Land Cover Information Extraction Methods for Remote Sensing Images Based on Improved Convolutional Neural Networks
* Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation
* Resilience of Entropy Model in Distributed Neural Networks
* Resilient Cruise Control of Heterogeneous Platoons Against Byzantine Attacks: Theory and Experiment
* Resilient GNSS/INS-Based Railway Train Localization Using Odometer/Trackmap-Enabled Jamming Discrimination
* Resolving Scale Ambiguity in Multi-view 3d Reconstruction Using Dual-pixel Sensors
* Resource Block-Based Co-Design of Trajectory and Communication in UAV-Assisted Data Collection Networks
* Resource-Efficient Model-Free Adaptive Platooning Control for Vehicles With Encrypted Information
* Responsible Visual Editing
* Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-one Image Restoration
* Restoring Images in Adverse Weather Conditions via Histogram Transformer
* Resyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
* Retargeting Visual Data with Deformation Fields
* Rethinking and Improving Visual Prompt Selection for In-Context Learning Segmentation
* Rethinking Data Augmentation for Robust Lidar Semantic Segmentation in Adverse Weather
* Rethinking Data Bias: Dataset Copyright Protection via Embedding Class-wise Hidden Bias
* Rethinking Deep Unrolled Model for Accelerated MRI Reconstruction
* Rethinking Directional Parameterization in Neural Implicit Surface Reconstruction
* Rethinking Domain Generalization: Discriminability and Generalizability
* Rethinking Fast Adversarial Training: A Splitting Technique to Overcome Catastrophic Overfitting
* Rethinking Features-Fused-Pyramid-Neck for Object Detection
* Rethinking Few-shot Class-incremental Learning: Learning from Yourself
* Rethinking Image Super-Resolution from Training Data Perspectives
* Rethinking Image-to-video Adaptation: An Object-centric Perspective
* Rethinking Inconsistent Context and Imbalanced Regression in Depression Severity Prediction
* Rethinking LiDAR Domain Generalization: Single Source as Multiple Density Domains
* Rethinking Normalization Layers for Domain Generalizable Person Re-Identification
* Rethinking Sampling for Music-driven Long-term Dance Generation
* Rethinking unsupervised domain adaptation for semantic segmentation
* Rethinking Unsupervised Outlier Detection via Multiple Thresholding
* Rethinking Video Deblurring with Wavelet-aware Dynamic Transformer and Diffusion Model
* Rethinking Video Sentence Grounding from a Tracking Perspective With Memory Network and Masked Attention
* Rethinking Video-text Understanding: Retrieval from Counterfactually Augmented Data
* Rethinking Weakly-supervised Video Temporal Grounding From a Game Perspective
* Retrieval Robust to Object Motion Blur
* Revealing Hidden Context in Camouflage Instance Segmentation
* Reverse Stable Diffusion: What prompt was used to generate this image?
* Review of Recent Advances in Remote Sensing and Machine Learning Methods for Lake Water Quality Management
* Review of synthetic aperture radar with deep learning in agricultural applications
* Revising Densification in Gaussian Splatting
* Revision: Rendering Tools Enable Spatial Fidelity in Vision-language Models
* Revisit Anything: Visual Place Recognition via Image Segment Retrieval
* Revisit Event Generation Model: Self-supervised Learning of Event-to-video Reconstruction with Implicit Neural Representations
* Revisit Human-scene Interaction via Space Occupancy
* Revisit Self-supervised Depth Estimation with Local Structure-from-motion
* Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View
* Revisiting Calibration of Wide-angle Radially Symmetric Cameras
* Revisiting Domain-adaptive Object Detection in Adverse Weather by the Generation and Composition of High-quality Pseudo-labels
* Revisiting Nonlocal Self-Similarity from Continuous Representation
* Revisiting Sample Weights Based Method for Noisy-label Detection and Classification
* Revisiting Supervision for Continual Representation Learning
* RGB-T tracking with frequency hybrid awareness
* RGB-T Tracking With Template-Bridged Search Interaction and Target-Preserved Template Updating
* RGBD GS-ICP SLAM
* RGBE-Gaze: A Large-Scale Event-Based Multimodal Dataset for High Frequency Remote Gaze Tracking
* RGNET: A Unified Clip Retrieval and Grounding Network for Long Videos
* RICA^2: Rubric-Informed, Calibrated Assessment of Actions
* Riding feeling recognition based on multi-head self-attention LSTM for driverless automobile
* Riemannian Approach for Spatiotemporal Analysis and Generation of 4d Tree-shaped Structures, A
* Ring-NeRF: Rethinking Inductive Biases for Versatile and Efficient Neural Fields
* Ringid: Rethinking Tree-ring Watermarking for Enhanced Multi-key Identification
* Risk-Anticipatory Autonomous Driving Strategies Considering Vehicles' Weights Based on Hierarchical Deep Reinforcement Learning
* Risk-aware Self-consistent Imitation Learning for Trajectory Planning in Autonomous Driving
* Risurconv: Rotation Invariant Surface Attention-augmented Convolutions for 3D Point Cloud Classification and Segmentation
* RLBA-UAV: A Robust and Lightweight Blockchain-Based Authentication and Key Agreement Scheme for PUF-Enabled UAVs
* RNA: Video Editing with Roi-based Neural Atlas
* Road Semantic-Enhanced Land Vehicle Integrated Navigation in GNSS Denied Environments
* Roadpainter: Points Are Ideal Navigators for Topology Transformer
* Robo-abc: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation
* RoBoSS: A Robust, Bounded, Sparse, and Smooth Loss Function for Supervised Learning
* Robust 3-D Path Following Control Framework for Magnetic Helical Millirobots Subject to Fluid Flow and Input Saturation
* Robust affine point matching via quadratic assignment on Grassmannians
* Robust Aircraft Detection in Imbalanced and Similar Classes With a Multi-Perspectives Aircraft Dataset
* Robust Calibration of Large Vision-language Adapters
* Robust EV Scheduling in Charging Stations Under Uncertain Demands and Deadlines
* Robust Fault-Tolerant Dynamic Positioning of Marine Surface Vessels With Prescribed Performance
* robust fingerprint identification approach using a fuzzy system and novel rotation method, A
* Robust Fitting on a Gate Quantum Computer
* Robust Incremental Structure-from-motion with Hybrid Features
* robust method for mapping soybean by phenological aligning of Sentinel-2 time series, A
* Robust Multimodal Learning via Representation Decoupling
* Robust Nearest Neighbors for Source-free Domain Adaptation Under Class Distribution Shift
* Robust Resource Allocation for RIS-Aided Multi-User SLAC System
* Robust Tensor Completion via Dictionary Learning and Generalized Nonconvex Regularization for Visual Data Recovery
* Robust Traffic Flow Control Using Connected Vehicle Technology: Signal Spatio-Temporal Logic-Based Approach, A
* Robust Visual Reinforcement Learning by Prompt Tuning
* Robust Zero-shot Crowd Counting and Localization With Adaptive Resolution Sam
* Robust-wide: Robust Watermarking Against Instruction-driven Image Editing
* Robustness Preserving Fine-tuning Using Neuron Importance
* Robustness Tokens: Towards Adversarial Robustness of Transformers
* Rodinhd: High-fidelity 3d Avatar Generation with Diffusion Models
* RoDUS: Robust Decomposition of Static and Dynamic Elements in Urban Scenes
* RogueNeRF: A Robust Geometry-consistent Universal Enhancer for NeRF
* Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers, The
* RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion
* Roomtex: Texturing Compositional Indoor Scenes via Iterative Inpainting
* Root-Zone Salinity in Irrigated Arid Farmland: Revealing Driving Mechanisms of Dynamic Changes in China's Manas River Basin over 20 Years
* Roscenes: A Large-scale Multi-view 3d Dataset for Roadside Perception
* Rotary Position Embedding for Vision Transformer
* Rotated Orthographic Projection for Self-supervised 3d Human Pose Estimation
* Rotation-invariant Texture Vit for Fine-grained Recognition of Esophageal Cancer Endoscopic Ultrasound Images, A
* RPBG: Towards Robust Neural Point-Based Graphics in the Wild
* RS-NeRF: Neural Radiance Fields from Rolling Shutter Images
* RS-SAM: Integrating Multi-scale Information for Enhanced Remote Sensing Image Segmentation
* RSL-BA: Rolling Shutter Line Bundle Adjustment
* RT-POSE: A 4d Radar Tensor-based 3d Human Pose Estimation and Localization Benchmark
* Rule-Driven News Captioning
* R^1-tuning: Efficient Image-to-video Transfer Learning for Video Temporal Grounding
* R^2-Bench: Benchmarking the Robustness of Referring Perception Models Under Perturbations
* S-CVAE: Stacked CVAE for Trajectory Prediction With Incremental Greedy Region
* S-JEPA: A Joint Embedding Predictive Architecture for Skeletal Action Recognition
* S2Match: Self-paced sampling for data-limited semi-supervised learning
* S2net: Skeleton-aware Slowfast Network for Efficient Sign Language Recognition
* SA-DVAE: Improving Zero-shot Skeleton-based Action Recognition by Disentangled Variational Autoencoders
* Safari: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation
* Safe-CLIP: Removing NSFW Concepts from Vision-and-language Models
* Safe-sim: Safety-critical Closed-loop Traffic Simulation with Diffusion-controllable Adversaries
* Safeguard Text-to-image Diffusion Models with Human Feedback Inversion
* SAFENet: Semantic-Aware Feature Enhancement Network for unsupervised cross-domain road scene segmentation
* Safety-Guaranteed Oversized Cargo Cooperative Transportation With Closed-Form Collision-Free Trajectory Generation and Tracking Control
* Safnet: Selective Alignment Fusion Network for Efficient HDR Imaging
* Saft: Towards Out-of-distribution Generalization in Fine-tuning
* SAGS: Structure-aware 3d Gaussian Splatting
* SAH-SCI: Self-supervised Adapter for Efficient Hyperspectral Snapshot Compressive Imaging
* SAIR: Learning Semantic-aware Implicit Representation
* Salience-based Adaptive Masking: Revisiting Token Dynamics for Enhanced Pre-Training
* Salient Object Detection in RGB-D Videos
* SAM-COD: Sam-guided Unified Framework for Weakly-supervised Camouflaged Object Detection
* SAM-Guided Graph Cut for 3d Instance Segmentation
* SAM-Net: Spatio-Temporal Sequence Typhoon Cloud Image Prediction Net with Self-Attention Memory
* SAM-ResNet50: A Deep Learning Model for the Identification and Classification of Drought Stress in the Seedling Stage of Betula luminifera
* Sam4mllm: Enhance Multi-modal Large Language Model for Referring Expression Segmentation
* Samfusion: Sensor-adaptive Multimodal Fusion for 3d Object Detection in Adverse Weather
* SAMIF: Adapting Segment Anything Model for Image Inpainting Forensics
* SAMPolyBuild: Adapting the Segment Anything Model for polygonal building extraction
* Sapiens: Foundation for Human Vision Models
* SAR target augmentation and recognition via cross-domain reconstruction
* Satellite Image Restoration via an Adaptive QWNNM Model
* Satellite-Observed Hydrothermal Conditions Control the Effects of Soil and Atmospheric Drought on Peak Vegetation Growth on the Tibetan Plateau
* SAVE: Encoding spatial interactions for vision transformers
* Save: Protagonist Diversification with Structure Agnostic Video Editing
* SC4D: Sparse-controlled Video-to-4d Generation and Motion Transfer
* Scalable Group Choreography via Variational Phase Manifold Learning
* Scalable video transformer for full-frame video prediction
* Scalar Function Topology Divergence: Comparing Topology of 3d Objects
* Scale-Disentangled and Uncertainty-Guided Alignment for Domain-Adaptive Object Detection
* Scaledreamer: Scalable Text-to-3d Synthesis with Asynchronous Score Distillation
* Scaling Backwards: Minimal Synthetic Pre-Training?
* Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization
* SCANREASON: Empowering 3d Visual Grounding with Reasoning Capabilities
* Scantalk: 3d Talking Heads from Unregistered Scans
* Scape: A Simple and Strong Category-agnostic Pose Estimator
* Scatterformer: Efficient Voxel Transformer with Scattered Linear Attention
* SCCA-Net: A Novel Network for Image Manipulation Localization Using Split-channel Contextual Attention
* Scenario-Guided Transformer-Enabled Multi-Modal Unknown Event Classification for Air Transport
* Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
* Scene-Adaptive SVAD Based On Multi-Modal Action-Based Feature Extraction
* Scene-aware Human Motion Forecasting via Mutual Distance Prediction
* Scene-conditional 3d Object Stylization and Composition
* Scene-Dependent Prediction in Latent Space for Video Anomaly Detection and Anticipation
* Scene-graph VIT: End-to-end Open-vocabulary Visual Relationship Detection
* Scenegraphloc: Cross-modal Coarse Visual Localization on 3d Scene Graphs
* Scenescript: Reconstructing Scenes with an Autoregressive Structured Language Model
* Sceneteller: Language-to-3d Scene Generation
* Sceneverse: Scaling 3d Vision-language Learning for Grounded Scene Understanding
* SCGM: Asymmetric Steganographic Embedding Cost Learning With Adaptive Modulation
* Schedule Disruption Recovery in Liner Shipping Service Based on a Reinforcement Learning-Enabled Adaptive Genetic Algorithm
* Scissorhands: Scrub Data Influence via Connection Sensitivity in Networks
* SCLIP: Rethinking Self-attention for Dense Vision-language Inference
* SCOD: From Heuristics to Theory
* Scomatch: Alleviating Overtrusting in Open-set Semi-supervised Learning
* Scoping the Field: Recent Advances in Optical Remote Sensing for Precision Viticulture
* Score Distillation Sampling with Learned Manifold Corrective
* SCP-Diff: Spatial-categorical Joint Prior for Diffusion Based Semantic Image Synthesis
* Scpnet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning
* Scribbleprompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image
* SDCINet: A novel cross-task integration network for segmentation and detection of damaged/changed building targets with optical remote sensing imagery
* SDPL: Shifting-Dense Partition Learning for UAV-View Geo-Localization
* SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-language Pre-trained Models
* Sea Surface Temperature Prediction Using ConvLSTM-Based Model with Deformable Attention
* SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow
* SEA: Semantic Adversarial Augmentation for Last Layer Features from Unsupervised Representation Learning
* Seamless Optimization of Wavelet Parameters for Denoising LFM Radar Signals: An AI-Based Approach
* Seamless-Through-Breaking: Rethinking Image Stitching for Optimal Alignment
* SeaTrack: Rethinking Observation-Centric SORT for Robust Nearshore Multiple Object Tracking
* Secchi Depth Retrieval in Oligotrophic to Eutrophic Chilean Lakes Using Open Access Satellite-Derived Products
* Second-Order Proximity Guided Sampling Consensus for Robust Model Fitting
* Secure Image Watermarking Framework with Statistical Guarantees via Adversarial Attacks on Secret Key Networks, A
* Secure Personalized Federated Learning Algorithm for Autonomous Driving, A
* SEDIFF: Structure Extraction for Domain Adaptive Depth Estimation via Denoising Diffusion Models
* See and Think: Embodied Agent in Virtual Environment
* Seed: A Simple and Effective 3d DETR in Point Clouds
* Seeing Faces in Things: A Model and Dataset for Pareidolia
* Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration
* Seeing Through Expert's Eyes: Leveraging Radiologist Eye Gaze and Speech Report with Graph Neural Networks for Chest X-ray Image Classification
* Seflow: A Self-supervised Scene Flow Method in Autonomous Driving
* SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis
* SEGIC: Unleashing the Emergent Correspondence for In-context Segmentation
* Segment and Recognize Anything at Any Granularity
* Segment, Lift and Fit: Automatic 3d Shape Labeling from 2d Prompts
* Segment3d: Learning Fine-grained Class-agnostic 3d Segmentation Without Manual Labels
* Segmentation-guided Layer-wise Image Vectorization with Gradient Fills
* Segpoint: Segment Any Point Cloud via Large Language Model
* SEGVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
* Seismic Surface Rupture Zone in the Western Segment of the Northern Margin Fault of the Hami Basin and Its Causal Interpretation, Eastern Tianshan, The
* Seit++: Masked Token Modeling Improves Storage-efficient Training
* Select and Distill: Selective Dual-teacher Knowledge Transfer for Continual Learning on Vision-language Models
* Selective weighted least square and piecewise bilinear transformation for accurate satellite DSM generation
* Selex: Self-expertise in Fine-grained Generalized Category Discovery
* Self Learning-Based Platooning Control Strategy for Connected Autonomous Vehicles With Switching Topologies
* Self-adapting Large Visual-language Models to Edge Devices Across Visual Modalities
* Self-Attention Graph Convolution Imputation Network for Spatio-Temporal Traffic Data
* Self-Constructing Stereo Correspondences for Unsupervised Multi-View Stereo
* Self-cooperation Knowledge Distillation for Novel Class Discovery
* Self-distillation with beta label smoothing-based cross-subject transfer learning for P300 classification
* Self-guided Generation of Minority Samples Using Diffusion Models
* Self-rectifying Diffusion Sampling with Perturbed-attention Guidance
* Self-supervised Any-point Tracking by Contrastive Random Walks
* Self-supervised Audio-visual Soundscape Stylization
* Self-supervised Co-salient Object Detection via Feature Correspondences at Multiple Scales
* Self-Supervised Cyclic Diffeomorphic Mapping for Soft Tissue Deformation Recovery in Robotic Surgery Scenes
* Self-Supervised Feature Adaptation for 3D Industrial Anomaly Detection
* Self-supervised learning from images: No negative pairs, no cluster-balancing
* Self-supervised multimodal change detection based on difference contrast learning for remote sensing imagery
* Self-supervised random mask attention GAN in tackling pose-invariant face recognition
* Self-supervised Representation Learning for Adversarial Attack Detection
* Self-supervised Shape Completion via Involution and Implicit Correspondences
* Self-supervised Underwater Caustics Removal and Descattering via Deep Monocular SLAM
* Self-supervised Video Copy Localization with Regional Token Representation
* Self-supervised Video Desmoking for Laparoscopic Surgery
* Self-supervised Visual Learning from Interactions with Objects
* Self-training Room Layout Estimation via Geometry-aware Ray-casting
* SELFGEO: Self-supervised and Geodesic-consistent Estimation of Keypoints on Deformable Shapes
* Selfswapper: Self-supervised Face Swapping via Shape Agnostic Masked Autoencoder
* Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation
* Semantic Residual Prompts for Continual Learning
* semantic segmentation method integrated convolutional nonlinear spiking neural model with Transformer, A
* Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties, A
* Semantic Visual-Inertial SLAM for Automated Valet Parking
* Semantic-guided Robustness Tuning for Few-shot Transfer Across Extreme Domain Shift
* Semantically Guided Representation Learning For Action Anticipation
* Semantichuman-hd: High-resolution Semantic Disentangled 3d Human Generation
* Semgrasp: Semantic Grasp Generation via Language Aligned Discretization
* Semi-Supervised Multi-View Feature Selection with Adaptive Similarity Fusion and Learning
* Semi-supervised Segmentation of Histopathology Images with Noise-aware Topological Consistency
* Semi-Supervised Teacher-reference-student Architecture for Action Quality Assessment
* Semi-supervised Video Desnowing Network via Temporal Decoupling Experts and Distribution-driven Contrastive Regularization
* Semicalibrated Relative Pose from an Affine Correspondence and Monodepth
* SemiVL: Semi-supervised Semantic Segmentation with Vision-Language Guidance
* Semreg: Semantics Constrained Point Cloud Registration
* Semtrack: A Large-scale Dataset for Semantic Tracking in the Wild
* SENC: Handling Self-collision in Neural Cloth Simulation
* SeqNet: Sequential Networks for One-Shot Traffic Sign Recognition With Transfer Learning
* Sequential polarimetric phase optimization algorithm for dynamic deformation monitoring of landslides
* Sequential Representation Learning via Static-dynamic Conditional Disentanglement
* SES-ReNet: Lightweight deep learning model for human detection in hazy weather conditions
* Sesame: Simple, Easy 3d Object Detection with Point-wise Semantics
* SFFNet: Shallow Feature Fusion Network Based on Detection Framework for Infrared Small Target Detection
* SFPNET: Sparse Focal Point Network for Semantic Segmentation on General Lidar Point Clouds
* SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization
* Sgs-slam: Semantic Gaussian Splatting for Neural Dense Slam
* SGW-Based Multi-task Learning in Vision Tasks
* Shape from Heat Conduction
* Shape-guided Configuration-aware Learning for Endoscopic-image-based Pose Estimation of Flexible Robotic Instruments
* Shape2scene: 3d Scene Representation Learning Through Pre-training on Shape Data
* Shapefusion: A 3d Diffusion Model for Localized Shape Editing
* Shapellm: Universal 3d Object Understanding for Embodied Interaction
* ShareGPT4V: Improving Large Multi-modal Models with Better Captions
* Shedding More Light on Robust Classifiers Under the Lens of Energy-based Models
* Sherl: Synthesizing High Accuracy and Efficient Memory for Resource-limited Transfer Learning
* SHIC: Shape-Image Correspondences with No Keypoint Supervision
* Shifted Autoencoders for Point Annotation Restoration in Object Counting
* Shine: Saliency-aware Hierarchical Negative Ranking for Compositional Temporal Grounding
* Shoemodel: Learning to Wear on the User-specified Shoes via Diffusion Model
* Siamese Vision Transformers are Scalable Audio-Visual Learners
* SiamMAF: A multipath and feature-enhanced thermal infrared tracker
* Side-Scan Sonar Image Generation Under Zero and Few Samples for Underwater Target Detection
* Sigma: Sinkhorn-guided Masked Video Modeling
* Signavatars: A Large-scale 3d Sign Language Holistic Motion Dataset and Benchmark
* Signgen: End-to-end Sign Language Video Generation with Latent Diffusion
* Silc: Improving Vision Language Pretraining with Self-distillation
* SIMBA: Split Inference: Mechanisms, Benchmarks and Attacks
* Similarity of Neural Architectures Using Adversarial Attack Transferability
* SimLOG: Simultaneous Local-Global Feature Learning for 3D Object Detection in Indoor Point Clouds
* Simpb: A Single Model for 2d and 3d Object Detection from Multiple Cameras
* Simple Background Augmentation Method for Object Detection with Diffusion Model, A
* Simple Baseline for Spoken Language to Sign Language Translation with 3d Avatars, A
* simple but effective vision transformer framework for visible-infrared person re-identification, A
* Simple Finetuning Strategy Based on Bias-variance Ratios of Layer-wise Gradients, A
* Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting, A
* Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging, A
* Simple Unsupervised Knowledge Distillation With Space Similarity
* Simplifying Source-free Domain Adaptation for Object Detection: Effective Self-training Strategies and Performance Insights
* Simulation Framework for Urban Electric Mobility Based on Limited Widespread Data and Spatial Information, A
* Simultaneous image denoising and completion through convolutional sparse representation and nonlocal self-similarity
* Sinder: Repairing the Singular Defects of Dinov2
* Single-mask Inpainting for Voxel-based Neural Radiance Fields
* Single-photon 3d Imaging with Equi-depth Photon Histograms
* SIT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
* Situated Instruction Following
* Six-point Method for Multi-camera Systems with Reduced Solution Space
* Six-Year (2014-2020) Statistical Correlation Study of VLF Terminator Time Shift with Earthquakes in Japan, A
* Skateformer: Skeletal-temporal Transformer for Human Action Recognition
* Skeleton Recall Loss for Connectivity Conserving and Resource Efficient Segmentation of Thin Tubular Structures
* Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph
* Sketch2vox: Learning 3D Reconstruction from a Single Monocular Sketch
* Skews in the Phenomenon Space Hinder Generalization in Text-to-image Generation
* Sky's the Limit: Relightable Outdoor Scenes via a Sky-pixel Constrained Illumination Prior and Outside-in Visibility, The
* Skymask: Attack-Agnostic Robust Federated Learning with Fine-grained Learnable Masks
* Skyscenes: A Synthetic Dataset for Aerial Scene Understanding
* Slack: Semantic, Location, and Appearance Aware Open-vocabulary Tracking
* SLEDGE: Synthesizing Driving Environments with Generative Models and Rule-based Traffic
* SLGD-Loop: A Semantic Local and Global Descriptor-Based Loop Closure Detection for Long-Term Autonomy
* SLIM: Spuriousness Mitigation with Minimal Human Annotations
* Slimflow: Training Smaller One-step Diffusion Models with Rectified Flow
* Slotlifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields
* Small object change detection in UAV imagery via a Siamese network enhanced with temporal mutual attention and contextual features: A case study concerning solar water heaters
* Small Object Detection in UAV Remote Sensing Images Based on Intra-Group Multi-Scale Fusion Attention and Adaptive Weighted Feature Fusion Mechanism
* Smart Battery Swapping Control for an Electric Motorcycle Fleet With Peak Time Based on Deep Reinforcement Learning
* Smartcontrol: Enhancing Controlnet for Handling Rough Visual Conditions
* SMC-NCA: Semantic-Guided Multi-Level Contrast for Semi-Supervised Temporal Action Segmentation
* Smfanet: A Lightweight Self-modulation Feature Aggregation Network for Efficient Image Super-resolution
* Smile: Leveraging Submodular Mutual Information For Robust Few-shot Object Detection
* Smoodi: Stylized Motion Diffusion Model
* Smoothness, Synthesis, and Sampling: Re-thinking Unsupervised Multi-view Stereo with DIV Loss
* SMPC-Based Motion Planning of Automated Vehicle When Interacting With Occluded Pedestrians
* SNeRV: Spectra-Preserving Neural Representation for Video
* Snow depth retrieval method for PolSAR data using multi-parameters snow backscattering model
* SNP: Structured Neuron-level Pruning to Preserve Attention Scores
* Snuffy: Efficient Whole Slide Image Classifier
* Sociality Probe: Game-Theoretic Inverse Reinforcement Learning for Modeling and Quantifying Social Patterns in Driving Interaction
* Soft Prompt Generation for Domain Generalization
* Soft Shadow Diffusion (SSD): Physics-inspired Learning for 3d Computational Periscopy
* SoftFormer: SAR-optical fusion transformer for urban land use and land cover classification
* Solving Motion Planning Tasks with a Scalable Generative Model
* Solving the Inverse Problem of Microscopy Deconvolution with a Residual Beylkin-Coifman-Rokhlin Neural Network
* Sos: Segment Object System for Open-world Instance Segmentation with Object Priors
* Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models
* Source-free Domain-invariant Performance Prediction
* Space-Time Analysis of the COVID-19 Pandemic and Its Relationship with Socioeconomic and Demographic Variables in the Metropolitan Region of São Paulo, Brazil
* Spaceborne Passive Localization Algorithm Based on MSD-HOUGH for Multiple Signal Sources, A
* Spacecraft Parabolic Antenna Payload Orientation Estimation Method Based on the Step Effect of Measured Radar Cross Section Sequences, The
* SpaceJam: A Lightweight and Regularization-Free Method for Fast Joint Alignment of Images
* Spamming Labels: Efficient Annotations for the Trackers of Tomorrow
* Sparo: Selective Attention for Robust and Compositional Transformer Encodings for Vision
* SPARP: Fast 3d Object Reconstruction and Pose Estimation from Sparse Views
* Sparse Beats Dense: Rethinking Supervision in Radar-camera Depth Completion
* Sparse Domain Transfer via Elastic Net Regularization
* Sparse Pedestrian Character Learning for Trajectory Prediction
* Sparse Refinement for Efficient High-resolution Semantic Segmentation
* Sparsecraft: Few-shot Neural Reconstruction Through Stereopsis Guided Geometric Linearization
* Sparsectrl: Adding Sparse Controls to Text-to-video Diffusion Models
* Sparselif: High-performance Sparse Lidar-camera Fusion for 3d Object Detection
* Sparseradnet: Sparse Perception Neural Network on Subsampled Radar Data
* Sparsessp: 3d Subcellular Structure Prediction from Sparse-view Transmitted Light Images
* Spatial and Temporal Characterization of Near Space Temperature and Humidity and Their Driving Influences
* Spatial and Temporal Variation of GPP and Its Response to Urban Environmental Changes in Beijing
* Spatial attention for human-centric visual understanding: An Information Bottleneck method
* Spatial Equity Disparities of Work Commuting Based on Job Accessibility in Chengdu, China
* Spatial-Temporal Changes in Ecosystem Service Value and Its Overlap with Coal Mining Intensity in the Yellow River Basin, China, During 2000-2030
* Spatial-temporal Multi-level Association for Video Object Segmentation
* Spatial-Temporal Traffic Prediction With an Interactive Spatial-Enhanced Graph Convolutional Network Model
* Spatialformer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding
* Spatially-variant Degradation Model for Dataset-free Super-resolution
* Spatio-Temporal Approach With Self-Corrective Causal Inference for Flight Delay Prediction, A
* Spatio-Temporal Generalization of VIS-NIR-SWIR Spectral Models for Nitrogen Prediction in Sugarcane Leaves
* Spatio-temporal interactive reasoning model for multi-group activity recognition
* Spatio-temporal Proximity-aware Dual-path Model for Panoramic Activity Recognition
* Spatio-Temporal Variation in Pluvial Flash Flood Risk in the Lhasa River Basin, 1991-2020
* Spatiotemporal Dynamic Analysis of Eutrophication Status Based on Machine Learning-Based Retrieval Algorithm: Case Study in Liangzi Lake, Hubei, China
* Spatiotemporal Dynamics and Prediction of Habitat Quality Based on Land Use and Cover Change in Jiangsu, China
* Spatiotemporal Dynamics of Water Quality: Long-Term Assessment Using Water Quality Indices and GIS
* Spatiotemporal Ego-Graph Domain Adaptation for Traffic Prediction With Data Missing
* Spatiotemporal Evolution of the Mudflat Wetland in the Yellow Sea Using Landsat Time Series, The
* Spatiotemporal Pooling on Appropriate Topological Maps Represented as Two-dimensional Images for EEG Classification
* Spatiotemporal Relationship Between Land Subsidence and Ecological Environmental Quality in Shenfu Mining Area, Loess Plateau, China
* Spatiotemporal Variations and Driving Factors of Water Availability in the Arid and Semiarid Regions of Northern China
* Spcolor: Semantic prior guided exemplar-based image colorization
* Specformer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization
* Special section: Best papers of the 15th Mexican conference on pattern recognition (MCPR) 2023
* Spectral Modality-aware Interactive Fusion Network for HSI Super-resolution
* Spectral Subsurface Scattering for Material Classification
* Spectral-Spatial Attention Alignment for Multi-Source Domain Adaptation in EEG-Based Emotion Recognition
* Spectram-PS: Spectrally Multiplexed Photometric Stereo Under Unknown Spectral Composition
* Specular highlight removal using Quaternion transformer
* Speech-Driven Gesture Generation Using Transformer-Based Denoising Diffusion Probabilistic Models
* Speedupnet: A Plug-and-play Adapter Network for Accelerating Text-to-image Diffusion Models
* Spherehead: Stable 3d Full-head Synthesis with Spherical Tri-plane Representation
* Spherical Linear Interpolation and Text-Anchoring for Zero-Shot Composed Image Retrieval
* Spherical World-locking for Audio-visual Localization in Egocentric Videos
* Sphinx: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models
* Spike-temporal Latent Representation for Energy-efficient Event-to-video Reconstruction
* Spikegs: Learning 3d Gaussian Fields from Continuous Spike Stream
* SpikeODE: Image Reconstruction for Spike Camera With Neural Ordinary Differential Equation
* SpikeTOD: A Biologically Interpretable Spike-Driven Object Detection in Challenging Traffic Scenarios
* Spiking Wavelet Transformer
* Spin: Hierarchical Segmentation with Subpart Granularity in Natural Images
* Spire: Semantic Prompt-driven Image Restoration
* Splatfields: Neural Gaussian Splats for Sparse 3d and 4d Reconstruction
* Spline-based Transformers
* Spotlight on Small-scale Ship Detection: Empowering YOLO with Advanced Techniques and a Novel Dataset
* SPVLOC: Semantic Panoramic Viewport Matching for 6d Camera Localization in Unseen Environments
* SQ-LLAVA: Self-questioning for Large Vision-language Assistant
* SRIL: Selective Regularization for Class-incremental Learning
* Srpose: Two-view Relative Pose Estimation with Sparse Keypoints
* SRRT: Exploring Search Region Regulation for Visual Object Tracking
* SSFAN: A Compact and Efficient Spectral-Spatial Feature Extraction and Attention-Based Neural Network for Hyperspectral Image Classification
* SSL-Cleanse: Trojan Detection and Mitigation in Self-supervised Learning
* SSL-CPCD: Self-Supervised Learning With Composite Pretext-Class Discrimination for Improved Generalisability in Endoscopic Image Analysis
* Ssthyper: Sparse Spectral Transformer for Hyperspectral Image Reconstruction
* ST-LDM: A Universal Framework for Text-grounded Object Generation in Real Images
* ST-LLM: Large Language Models Are Effective Temporal Learners
* Stabilizing and Accelerating Federated Learning on Heterogeneous Data With Partial Client Participation
* Stable Preference: Redefining Training Paradigm of Human Preference Model for Text-to-image Synthesis
* Stable Single-View 3d Human Digitization via Explicit Geometric Field with Semantic Guidance
* Stable Video Portraits
* Stabledrag: Stable Dragging for Point-based Image Editing
* STAF: 3D Human Mesh Recovery From Video With Spatio-Temporal Alignment Fusion
* STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians
* STAMP: Outlier-aware Test-time Adaptation with Stable Memory Replay
* STAR-RL: Spatial-Temporal Hierarchical Reinforcement Learning for Interpretable Pathology Image Super-Resolution
* Statewide Visual Geolocalization in the Wild
* Statistical Analysis of Atmospheric Delay Gradient and Rainfall Prediction in a Tropical Region
* Stepping Stones: A Progressive Training Strategy for Audio-visual Semantic Segmentation
* Stepwise Multi-grained Boundary Detector for Point-supervised Temporal Action Localization
* Stereoglue: Robust Estimation with Single-point Solvers
* Stitched VITS are Flexible Vision Backbones
* STMGF-Net: A Spatiotemporal Multi-Graph Fusion Network for Vessel Trajectory Forecasting in Intelligent Maritime Navigation
* Storyimager: A Unified and Efficient Framework for Coherent Story Visualization and Completion
* Straightforward Layer-wise Pruning for More Efficient Visual Adaptation
* STRANet: Soft-Target and Restriction-Aware Neural Network for Efficient VVC Intra Coding
* Stream Query Denoising for Vectorized HD-map Construction
* Streammotp: Streaming and Unified Framework for Joint 3d Multi-object Tracking and Trajectory Prediction
* Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting
* Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
* Strike a Balance in Continual Panoptic Segmentation
* Strike the Balance: On-the-fly Uncertainty Based User Interactions for Long-term Video Object Segmentation
* Stripe Observation Guided Inference Cost-free Attention Mechanism
* Strong but Simple: A Baseline for Domain Generalized Dense Perception by CLIP-based Transfer Learning
* Struck-out handwritten word detection and restoration for automatic descriptive answer evaluation
* Structldm: Structured Latent Diffusion for 3d Human Generation
* Structural Complexity Significantly Impacts Canopy Reflectance Simulations as Revealed from Reconstructed and Sentinel-2-Monitored Scenes in a Temperate Deciduous Forest
* Structure Synchronized Dynamic Event-Triggered Control for Marine Ranching AMVs via the Multi-Task Switching Guidance
* Structure-centric Robust Monocular Depth Estimation via Knowledge Distillation
* Structured-NeRF: Hierarchical Scene Graph with Neural Representation
* STSP: Spatial-temporal Subspace Projection for Video Class-incremental Learning
* Study on the Feasibility and Performance Evaluation of High-Orbit Spacecraft Orbit Determination Based on GNSS/SLR/VLBI
* Style Optimization Networks for real-time semantic segmentation of rainy and foggy weather
* Style Reconstruction-Driven Networks for Occlusion-Aware License Plate Recognition
* Style-extracting Diffusion Models for Semi-supervised Histopathology Segmentation
* Stylecity: Large-scale 3d Urban Scenes Stylization
* Styleclip-based Facial Emotion Manipulation Method for Discrepant Emotion Transitions, A
* Styletokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models
* Submarine Landslide Identification Based on Improved DeepLabv3 with Spatial and Channel Attention
* Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation
* Subtask Prior-Driven Optimized Mechanism on Joint Video Moment Retrieval and Highlight Detection
* Successful Precipitation Downscaling Through an Innovative Transformer-Based Model
* SUMix: Mixup with Semantic and Uncertain Information
* SUP-NeRF: A Streamlined Unification of Pose Estimation and NeRF for Monocular 3d Object Reconstruction
* Superfednas: Cost-efficient Federated Neural Architecture Search for On-device Inference
* Supergaussian: Repurposing Video Models for 3d Super Resolution
* Superpixel Graph Contrastive Clustering With Semantic-Invariant Augmentations for Hyperspectral Images
* Superpixel-informed Implicit Neural Representation for Multi-dimensional Data
* SURF-D: Generating High-quality Surfaces of Arbitrary Topologies Using Diffusion Models
* Surface Profile Recovery from Electromagnetic Fields with Physics-Informed Neural Networks
* Surface Reconstruction from 3d Gaussian Splatting via Local Structural Hints
* Surface-centric Modeling for High-fidelity Generalizable Neural Surface Reconstruction
* Surfocc: Surface-based Feature Lifting for Vision-centric 3d Occupancy Prediction
* Survey of Multi-Vehicle Consensus in Uncertain Networks for Autonomous Driving, A
* Survey on Recent Advancements in Autonomous Driving Using Deep Reinforcement Learning: Applications, Challenges, and Solutions, A
* SUR^2F: A Hybrid Representation for High-quality and Efficient Surface Reconstruction from Multi-view Images
* SV3D: Novel Multi-view Synthesis and 3d Generation from a Single Image Using Latent Video Diffusion
* SVC: Sight view constraint for robust point cloud registration
* Swag: Splatting in the Wild Images with Appearance-conditioned Gaussians
* Swapanything: Enabling Arbitrary Object Swapping in Personalized Image Editing
* SWC-Net and Multi-Phase Heterogeneous FDTD Model for Void Detection Underneath Airport Pavement Slab
* Sweepnet: Unsupervised Learning Shape Abstraction via Neural Sweepers
* Swiftbrush V2: Make Your One-step Diffusion Model Better Than Its Teacher
* Swings: Sliding Windows for Dynamic 3d Gaussian Splatting
* SwissCheese: Fine-Grained Channel-Spatial Feature Filtering for Communication-Efficient Cooperative Perception
* Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-experts
* Symbolic Visual Reinforcement Learning: A Scalable Framework With Object-Level Abstraction and Differentiable Expression Search
* Syn-to-real Domain Adaptation for Point Cloud Completion via Part-based Approach
* Sync from the Sea: Retrieving Alignable Videos from Large-scale Datasets
* Synchronization Is All You Need: Exocentric-to-egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs
* Synchronization of Projective Transformations
* Synchronous Diffusion for Unsupervised Smooth Non-rigid 3d Shape Matching
* Synergistic Coupling of Multi-Source Remote Sensing Data for Sandy Land Detection and Multi-Indicator Integrated Evaluation
* Synergizing Autonomous and Traditional Vehicles: A Systematic Review of Advances and Challenges in Traffic Flow Management With Signalized Intersections
* Synergy of Sight and Semantics: Visual Intention Understanding with CLIP
* SynFlowMap: A synchronized optical flow remapping for video motion magnification
* SynSem-ASTE: An Enhanced Multi-Encoder Network for Aspect Sentiment Triplet Extraction With Syntax and Semantics
* Synthesis of Safety and Ride Comfort Control for Chassis of Maglev Trains
* Synthesizing Environment-specific People in Photographs
* Synthesizing Time-varying BRDFS via Latent Space
* S^3d-nerf: Single-shot Speech-driven Neural Radiance Field for High Fidelity Talking Head Synthesis
* T-corresnet: Template Guided 3d Point Cloud Completion with Correspondence Pooling Query Generation Strategy
* T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning
* T-REX2: Towards Generic Object Detection via Text-visual Prompt Synergy
* T-Shaped CAN Feature Integration With Lightweight Deep Learning Model for In-Vehicle Network Intrusion Detection
* T2ishield: Defending Against Backdoors on Text-to-image Diffusion Models
* T2TD: Text-3D Generation Model Based on Prior Knowledge Guidance
* Table Transformers for imputing textual attributes
* Tackling Structural Hallucination in Image Translation with Local Diffusion
* TAE: Task-aware Expandable Representation for Long Tail Class Incremental Learning
* Tag: Text Prompt Augmentation for Zero-shot Out-of-distribution Detection
* Tails Tell Tales: Chapter-wide Manga Transcriptions with Character Names
* Take a Step Back: Rethinking the Two Stages in Visual Reasoning
* Talkinggaussian: Structure-persistent 3d Talking Head Synthesis via Gaussian Splatting
* Taming CLIP for Fine-grained and Structured Visual Understanding of Museum Exhibits
* Taming Latent Diffusion Model for Neural Radiance Field Inpainting
* Taming Lookup Tables for Efficient Image Retouching
* Tanet: Triplet Attention Network for All-in-one Adverse Weather Image Restoration
* TAPS: Temporal Attention-based Pruning and Scaling for Efficient Video Action Recognition
* TAPTR: Tracking Any Point with Transformers as Detection
* Targeted adversarial attack on classic vision pipelines
* Task Is Worth One Word: Learning with Task Prompts for High-quality Versatile Image Inpainting, A
* Task-driven Uncertainty Quantification in Inverse Problems via Conformal Prediction
* Task-Specific Importance-Awareness Matters: On Targeted Attacks Against Object Detection
* TC4D: Trajectory-Conditioned Text-to-4D Generation
* Tcan: Animating Human Images with Temporally Consistent Pose Guidance Using Diffusion Models
* TCC-DET: Temporarily Consistent Cues for Weakly-supervised 3d Detection
* TCL-Net: A Lightweight and Efficient Dehazing Network with Frequency-domain Fusion and Multi-angle Attention
* Tclc-gs: Tightly Coupled Lidar-camera Gaussian Splatting for Autonomous Driving: Supplementary Materials
* Teach CLIP to Develop a Number Sense for Ordinal Regression
* Teaching Segment-Anything-Model Domain-Specific Knowledge for Road Crack Segmentation From On-Board Cameras
* Teaching Tailored to Talent: Adverse Weather Restoration via Prompt Pool and Depth-anything Constraint
* Teddy: Efficient Large-scale Dataset Distillation via Taylor-approximated Matching
* Tela: Text to Layer-wise 3d Clothed Human Generation
* Telling Stories for Common Sense Zero-shot Action Recognition
* Temporal As a Plugin: Unsupervised Video Denoising with Pre-trained Image Denoisers
* Temporal Associations Between Polarimetric Updraft Proxies and Signatures of Inflow and Hail in Supercells
* Temporal Dynamic Synchronous Functional Brain Network for Schizophrenia Classification and Lateralization Analysis
* Temporal Event Stereo via Joint Learning with Stereoscopic Flow
* Temporal Residual Guided Diffusion Framework for Event-driven Video Reconstruction
* Temporal Residual Jacobians for Rig-free Motion Transfer
* Temporal-mapping Photography for Event Cameras
* Temporally Consistent Referring Video Object Segmentation With Hybrid Memory
* Temporally Consistent Stereo Matching
* Tendency-driven Mutual Exclusivity for Weakly Supervised Incremental Semantic Segmentation
* Tensor Coupled Learning of Incomplete Longitudinal Features and Labels for Clinical Score Regression
* Tensor-Based Few-Shot Learning for Cross-Domain Hyperspectral Image Classification
* Tensorial Template Matching for Fast Cross-correlation with Rotations and Its Application for Tomography
* Terrestrial Photogrammetry-GIS Methodology for Measuring Rill Erosion at the Sparacia Experimental Area, Sicily
* Test-time Model Adaptation for Image Reconstruction Using Self-supervised Adaptive Layers
* Test-time Stain Adaptation with Diffusion Models for Histopathology Image Classification
* Tetradiffusion: Tetrahedral Diffusion Models for 3d Shape Generation
* TEXDC: Text-driven Disease-aware 4d Cardiac Cine MRI Images Generation
* Texdreamer: Towards Zero-shot High-fidelity 3d Human Texture Generation
* Texgen: Text-guided 3d Texture Generation with Multi-view Sampling and Resampling
* Text in the dark: Extremely low-light text image enhancement
* Text Motion Translator: A Bi-directional Model for Enhanced 3d Human Motion Generation from Open-vocabulary Descriptions
* Text Query to Web Image to Video: A Comprehensive Ad-hoc Video Search
* Text-anchored Score Composition: Tackling Condition Misalignment in Text-to-image Diffusion Models
* Text-augmented Multi-Modality contrastive learning for unsupervised visible-infrared person re-identification
* Text-conditioned Resampler For Long Form Video Understanding
* Text-free diffusion inpainting using reference images for enhanced visual fidelity
* Text-guided Video Masked Autoencoder
* Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression
* Text-video retrieval re-ranking via multi-grained cross attention and frozen image encoders
* Text2LiDAR: Text-guided Lidar Point Cloud Generation via Equirectangular Transformer
* Text2place: Affordance-aware Text Guided Human Placement
* Textdiffuser-2: Unleashing the Power of Language Models for Text Rendering
* Textual Grounding for Open-vocabulary Visual Information Extraction in Layout-Diversified Documents
* Textual Knowledge Matters: Cross-modality Co-teaching for Generalized Visual Class Discovery
* Textual Query-driven Mask Transformer for Domain Generalized Segmentation
* Textual-visual Logic Challenge: Understanding and Reasoning in Text-to-image Generation
* Texture-GS: Disentangling the Geometry and Texture for 3d Gaussian Splatting Editing
* TF-FAS: Twofold-element Fine-grained Semantic Guidance for Generalizable Face Anti-spoofing
* TF²: Few-Shot Text-Free Training-Free Defect Image Generation for Industrial Anomaly Inspection
* Tgcm: Cross-domain Few-shot Semantic Segmentation via One-shot Target Guided Cutmix
* Thermal3D-GS: Physics-induced 3d Gaussians for Thermal Infrared Novel-View Synthesis
* Thin cloud blind correction method coupling a physical model with unsupervised deep learning for remote sensing imagery, A
* Think Before Placement: Common Sense Enhanced Transformer for Object Placement
* Think2drive: Efficient Reinforcement Learning by Thinking with Latent World Model for Autonomous Driving (in Carla-v2)
* Thinking Outside the Bbox: Unconstrained Generative Object Compositing
* This Probably Looks Exactly Like That: An Invertible Prototypical Network
* Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks
* Three Years of Google Earth Engine-Based Archaeological Surveys in Iraqi Kurdistan: Results from the Ground
* Three-Dimensional Surface Motion Displacement Estimation of the Muz Taw Glacier, Sawir Mountains
* Tibet: Identifying and Evaluating Biases in Text-to-image Generative Models
* Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers
* Tightly Coupled LIDAR/IMU/UWB Fusion via Resilient Factor Graph for Quadruped Robot Positioning
* Time series sUAV data reveal moderate accuracy and large uncertainties in spring phenology metric of deciduous broadleaf forest as estimated by vegetation index-based phenological models
* Time-efficient and Identity-consistent Virtual Try-on Using A Variant of Altered Diffusion Models
* Time-Event-Memory Triggered Switching Dynamic Positioning for Unmanned Marine Vehicles with Mass-Switched and Dual-Source Disturbances
* Timecraft: Navigate Weakly-supervised Temporal Grounded Video Question Answering via Bi-directional Reasoning
* Timelens-xl: Real-time Event-based Video Frame Interpolation with Large Motion
* Timestep-aware Correction for Quantized Diffusion Models
* Tiny Models are the Computational Saver for Large Models
* TIP: Tabular-Image Pre-Training for Multimodal Classification with Incomplete Data
* Tlcontrol: Trajectory and Language Control for Human Motion Synthesis
* To Generate or Not? Safety-driven Unlearned Diffusion Models Are Still Easy to Generate Unsafe Images ... For Now
* To Supervise or Not to Supervise: Understanding and Addressing the Key Challenges of Point Cloud Transfer Learning
* Tod3cap: Towards 3d Dense Captioning in Outdoor Scenes
* Token Compensator: Altering Inference Cost of Vision Transformer Without Re-tuning
* Tokenize Anything via Prompting
* Topo4d: Topology-preserving Gaussian Splatting for High-fidelity 4d Head Capture
* Topology reorganized graph contrastive learning with mitigating semantic drift
* Topology-preserving Downsampling of Binary Images
* Toward Human-Scale Magnetic Particle Imaging: Development of the First System With Superconductor-Based Selection Coils
* Toward Int4 Fixed-point Training via Exploring Quantization Error for Gradients
* Toward Open Vocabulary Aerial Object Detection with CLIP-activated Student-teacher Learning
* Toward Quantifiable Face Age Transformation Under Attribute Unbias
* Toward Resilient Electric Vehicle Charging Monitoring Systems: Curriculum Guided Multi-Feature Fusion Transformer
* Toward Robust 3D Perception for Autonomous Vehicles: A Review of Adversarial Attacks and Countermeasures
* Toward Tiny and High-quality Facial Makeup with Data Amplify Learning
* Towards 3D Reconstruction of Multi-Shaped Tunnels Utilizing Mobile Laser Scanning Data
* Towards a Density Preserving Objective Function for Learning on Point Sets
* Towards Adaptive Pseudo-label Learning for Semi-supervised Temporal Action Localization
* Towards Certifiably Robust Face Recognition
* Towards Compact Reversible Image Representations for Neural Style Transfer
* Towards Dual Transparent Liquid Level Estimation in Biomedical Lab: Dataset, Methods and Practices
* Towards Generalised and Incremental Bias Mitigation in Personality Computing
* Towards High-quality 3d Motion Transfer with Realistic Apparel Animation
* Towards Image Ambient Lighting Normalization
* Towards Latent Masked Image Modeling for Self-supervised Visual Representation Learning
* Towards Model-agnostic Dataset Condensation by Heterogeneous Models
* Towards More Practical Group Activity Detection: A New Benchmark and Model
* Towards Multi-modal Transformers in Federated Learning
* Towards Multimodal Open-set Domain Generalization and Adaptation Through Self-supervision
* Towards Multimodal Sentiment Analysis Debiasing via Bias Purification
* Towards Natural Language-guided Drones: Geotext-1652 Benchmark with Spatial Relation Matching
* Towards Neuro-symbolic Video Understanding
* Towards Open Domain Text-driven Synthesis of Multi-person Motions
* Towards Open-ended Visual Quality Comparison
* Towards Open-ended Visual Recognition with Large Language Models
* Towards Open-world Object-based Anomaly Detection via Self-supervised Outlier Synthesis
* Towards Physical World Backdoor Attacks Against Skeleton Action Recognition
* Towards Real-world Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-language Models
* Towards Real-world Event-guided Low-light Video Enhancement and Deblurring
* Towards Reliable Advertising Image Generation Using Human Feedback
* Towards Reliable Evaluation and Fast Training of Robust Semantic Segmentation Models
* Towards Robust Event-based Networks for Nighttime via Unpaired Day-to-night Event Translation
* Towards Robust Full Low-bit Quantization of Super Resolution Networks
* Towards Scene Graph Anticipation
* Towards Stable 3d Object Detection
* Towards Unified Representation of Invariant-specific Features in Missing Modality Face Anti-spoofing
* Towards Weakly Supervised Text-to-Audio Grounding
* TP2O: Creative Text Pair-to-object Generation Using Balance Swap-Sampling
* TPA3D: Triplane Attention for Fast Text-to-3d Generation
* TPTrans: Vessel Trajectory Prediction Model Based on Transformer Using AIS Data
* Track Everything Everywhere Fast and Robustly
* Track2act: Predicting Point Tracks from Internet Videos Enables Generalizable Robot Manipulation
* Trackastra: Transformer-based Cell Tracking for Live-cell Microscopy
* Tracking Correction Method for Rapid and Random Protein Molecules Movement
* Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance
* Tracking Reflected Objects: A Benchmark
* TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks
* Traffic Signal Cycle Control With Centralized Critic and Decentralized Actors Under Varying Intervention Frequencies
* Trafficnight: An Aerial Multimodal Benchmark for Nighttime Vehicle Surveillance
* Train Till You Drop: Towards Stable and Robust Source-free Unsupervised 3d Domain Adaptation
* Trainable Highly-expressive Activation Functions
* Training A Secure Model Against Data-free Model Extraction
* Training A Small Emotional Vision Language Model for Visual Art Comprehension
* Training-free Composite Scene Generation for Layout-to-image Synthesis
* Training-free Model Merging for Multi-target Domain Adaptation
* Training-Free Video Temporal Grounding Using Large-Scale Pre-Trained Models
* Trajectory Prediction and Risk Assessment in Car-Following Scenarios Using a Noise-Enhanced Generative Adversarial Network
* Trajectory-aligned Space-time Tokens for Few-shot Action Recognition
* Trajprompt: Aligning Color Trajectory with Vision-language Representations
* Tram: Global Trajectory and Motion of 3d Humans from in-the-wild Videos
* Transcad: A Hierarchical Transformer for CAD Sequence Inference from Point Clouds
* Transferable 3d Adversarial Shape Completion Using Diffusion Models
* Transformer fusion for indoor RGB-D semantic segmentation
* Transformer RGBT Tracking With Spatio-Temporal Multimodal Tokens
* Transformer-Based Light Field Salient Object Detection and Its Application to Autofocus
* Transformer-Based Stereo-Aware 3D Object Detection From Binocular Images
* Transfusion: A Transparency-based Diffusion Model for Anomaly Detection
* TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation
* TransMatch: Transformer-based correspondence pruning via local and global consensus
* Transportmer: A Holistic Approach to Trajectory Understanding in Multi-Agent Sports
* TransWild: Enhancing 3D interacting hands recovery in the wild with IoU-guided Transformer
* TravelRAG: A Tourist Attraction Retrieval Framework Based on Multi-Layer Knowledge Graph
* Tree-d Fusion: Simulation-ready Tree Dataset from Single Images with Diffusion Priors
* Treesba: Tree-transformer for Self-supervised Sequential Brick Assembly
* Trinerflet: A Wavelet Based Triplane NeRF Representation
* Triple-Stream Commonsense Circulation Transformer Network for Image Captioning
* Tri^2-plane: Thinking Head Avatar via Feature Pyramid
* TROJVLM: Backdoor Attack Against Vision Language Models
* Trust-Based Dynamic Leader Selection Mechanism for Enhanced Performance in Flying Ad-Hoc Networks (FANETs)
* TrVLR: A Transformer-Based Vehicle Light Recognition Method in Vehicle Inspection
* TTAGaze: Self-Supervised Test-Time Adaptation for Personalized Gaze Estimation
* TTD: Text-tag Self-distillation Enhancing Image-text Alignment in CLIP to Alleviate Single Tag Bias
* TTDNet: An End-to-End Traffic Text Detection Framework for Open Driving Environments
* TTT-MIM: Test-time Training with Masked Image Modeling for Denoising Distribution Shifts
* Tunevlseg: Prompt Tuning Benchmark for Vision-language Segmentation Models
* Tuning-free Image Customization with Image and Text Guidance
* Turbo: Informativity-driven Acceleration Plug-in for Vision-language Large Models
* Turboedit: Instant Text-based Image Editing
* Two-Dimensional Lane-Changing Dynamics Model Based on Force, A
* Two-stage Active Learning for Efficient Temporal Action Segmentation
* Two-stage Video Shadow Detection via Temporal-spatial Adaption
* U-cope: Taking a Further Step to Universal 9d Category-level Object Pose Estimation
* U-Shaped Distribution Guided Sign Language Emotion Recognition With Semantic and Movement Features
* Uage: A Supervised Contrastive Method for Unconstrained Adaptive Gaze Estimation
* UAV-based sparse viewpoint planning framework for detailed 3D modelling of cultural heritage monuments, A
* ucap: An Unsupervised Prompting Method for Vision-language Models
* UCIP: A Universal Framework for Compressed Image Super-resolution Using Dynamic Prompt
* Uda-bench: Revisiting Common Assumptions in Unsupervised Domain Adaptation Using a Standardized Framework
* Udifftext: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
* UGG: Unified Generative Grasping
* UL-VIO: Ultra-lightweight Visual-inertial Odometry with Noise Robust Test-time Adaptation
* Ultra-Lightweight Automatic License Plate Recognition System for Microcontrollers: A Cost-Effective and Energy-Efficient Solution
* Ultron: Unifying Local Transformer and Convolution for Large-scale Image Retrieval
* Umbrae: Unified Multimodal Brain Decoding
* Umeregrobust - Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration
* UMG-CLIP: A Unified Multi-granularity Vision Generalist for Open-world Understanding
* Ump: Unified Modality-Aware Prompt Tuning for Text-Video Retrieval
* UN-EVIMO: Unsupervised Event-based Independent Motion Segmentation
* Uncertainty Calibration with Energy Based Instance-wise Scaling in the Wild Dataset
* Uncertainty guided test-time training for face forgery detection
* Uncertainty Modeling for Plane and Line Features to Improve Consistency in RTK/INS/LiDAR Integrated Navigation
* Uncertainty Modeling of the Transmission Map for Single Image Dehazing
* Uncertainty quantification metrics for deep regression
* Uncertainty-aware Sign Language Video Retrieval with Probability Distribution Modeling
* Uncertainty-driven Spectral Compressive Imaging with Spatial-frequency Transformer
* Underground Mapping and Localization Based on Ground-penetrating Radar
* Understanding adversarial robustness against on-manifold adversarial examples
* Understanding and Mitigating Human-labelling Errors in Supervised Contrastive Learning
* Understanding Episode Hardness in Few-Shot Learning
* Understanding Multi-compositional Learning in Vision and Language Models via Category Theory
* Understanding Physical Dynamics with Counterfactual World Modeling
* Understanding the Impact of Negative Prompts: When and How Do They Take Effect?
* Underutilized Feature Extraction Methods for Burn Severity Mapping: A Comprehensive Evaluation
* Underwater image enhancement via brightness mask-guided multi-attention embedding
* Underwater Image Enhancement via Principal Component Fusion of Foreground and Background
* Unet--: Memory-efficient and Feature-enhanced Network Architecture Based on U-net with Reduced Skip-connections
* UNI3DL: A Unified Model for 3d Vision-language Understanding
* Uni4Eye++: A General Masked Image Modeling Multi-Modal Pre-Training Framework for Ophthalmic Image Classification and Segmentation
* Unic: Universal Classification Models via Multi-teacher Distillation
* UniCal: Unified Neural Sensor Calibration
* Unicode: Learning a Unified Codebook for Multimodal Large Language Models
* Unidream: Unifying Diffusion Priors for Relightable Text-to-3d Generation
* Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization, A
* Unified Embedding Alignment for Open-vocabulary Video Instance Segmentation
* unified feature-motion consistency framework for robust image matching, A
* unified framework for unsupervised action learning via global-to-local motion transformer, A
* unified framework to stereotyped behavior detection for screening Autism Spectrum Disorder, A
* Unified Image Compression Method for Human Perception and Multiple Vision Tasks, A
* Unified Local-cloud Decision-making via Reinforcement Learning
* Unified Medical Image Pre-training in Language-guided Common Semantic Space
* Unifs: Universal Few-shot Instance Perception with Point Representations
* Unifying 3d Vision-language Understanding via Promptable Queries
* Uniinr: Event-guided Unified Rolling Shutter Correction, Deblurring, and Interpolation
* UNIIR: Training and Benchmarking Universal Multimodal Information Retrievers
* Unikd: Uncertainty-filtered Incremental Knowledge Distillation for Neural Implicit Representation
* Unimd: Towards Unifying Moment Retrieval and Temporal Action Detection
* UniM^2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving
* Uniprocessor: A Text-induced Unified Low-level Image Processor
* unique dielectric constant estimation for lunar surface through PolSAR model-based decomposition, A
* Unit: Backdoor Mitigation via Automated Neural Distribution Tightening
* Unitalker: Scaling up Audio-driven 3d Facial Animation Through A Unified Model
* Unitraj: A Unified Framework for Scalable Vehicle Trajectory Prediction
* universal adapter in segmentation models for transferable landslide mapping, A
* Universal Multi-View Guided Network for Salient Object and Camouflaged Object Detection, A
* Universal Structure of Yolo Series Small Object Detection Models, A
* UniVoxel: Fast Inverse Rendering by Unified Voxelization of Scene Representation
* Unleashing Text-to-image Diffusion Prior for Zero-shot Image Captioning
* Unleashing the Feature Hierarchy Potential: An Efficient Tri-Hybrid Person Search Model
* Unleashing the Potential of the Semantic Latent Space in Diffusion Models for Image Dehazing
* Unleashing the Power of Prompt-driven Nucleus Instance Segmentation
* Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and Visual Analysis Strategy
* Unlocking Textual and Visual Wisdom: Open-vocabulary 3d Object Detection Enhanced by Comprehensive Guidance from Text and Image
* Unlocking the Potential of Federated Learning: The Symphony of Dataset Distillation via Deep Generative Latents
* Unlocking the Secrets of Corn: Physiological Responses and Rapid Forecasting in Varied Drought Stress Environments
* Unmanned Aerial Vehicle-Neural Radiance Field (UAV-NeRF): Learning Multiview Drone Three-Dimensional Reconstruction with Neural Radiance Field
* Unmasking Bias in Diffusion Model Training
* Unrolled Decomposed Unpaired Learning for Controllable Low-light Video Enhancement
* Unsqueeze [CLS] Bottleneck to Learn Rich Representations
* Unsupervised Degradation Representation Learning for Unpaired Restoration of Images and Point Clouds
* Unsupervised Dense Prediction Using Differentiable Normalized Cuts
* Unsupervised Dual Deep Hashing With Semantic-Index and Content-Code for Cross-Modal Retrieval
* Unsupervised Exposure Correction
* Unsupervised Moving Object Segmentation with Atmospheric Turbulence
* Unsupervised Multi-modal Medical Image Registration via Invertible Translation
* Unsupervised Representation Learning by Balanced Self Attention Matching
* Unsupervised Variational Translator for Bridging Image Restoration and High-level Vision Tasks
* Unsupervised Video Summarization via Iterative Training and Simplified Gan
* Unsupervised, Online and On-the-fly Anomaly Detection for Non-stationary Image Distributions
* Unveiling Advanced Frequency Disentanglement Paradigm for Low-light Image Enhancement
* Unveiling and Mitigating Memorization in Text-to-image Diffusion Models Through Cross Attention
* Unveiling Illumination Variations During a Lunar Eclipse: Multi-Wavelength Spaceborne Observations of the January 21, 2019 Event
* Unveiling Privacy Risks in Stochastic Neural Networks Training: Effective Image Reconstruction from Gradients
* Unveiling the Power of Self-Supervision for Multi-View Multi-Human Association and Tracking
* Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-language Models
* Unveiling Urban River Visual Features Through Immersive Virtual Reality: Analyzing Youth Perceptions with UAV Panoramic Imagery
* Unwrap-Net: A deep neural network-based InSAR phase unwrapping method assisted by airborne LiDAR data
* UpFusion: Novel View Diffusion from Unposed Sparse View Observations
* Upose3d: Uncertainty-aware 3d Human Pose Estimation with Cross-view and Temporal Cues
* Upper-body Hierarchical Graph for Skeleton Based Emotion Recognition in Assistive Driving
* Urban Land Use Classification Model Fusing Multimodal Deep Features
* Urban Multi-Scenario Land Use Optimization Simulation Considering Local Climate Zones
* Urban Waterlogging Detection: A Challenging Benchmark and Large-small Model Co-adapter
* URS-NeRF: Unordered Rolling Shutter Bundle Adjustment for Neural Radiance Fields
* Use estimated signal and noise to adjust step size for image restoration
* Using difference features effectively: A multi-task network for exploring change areas and change moments in time series remote sensing images
* Using My Artistic Style? You Must Obtain My Authorization
* Using Space Syntax and GIS to Determine Future Growth Routes of Cities: The Case of the Kyrenia White Zone
* Utilizing LuTan-1 SAR Images to Monitor the Mining-Induced Subsidence and Comparative Analysis with Sentinel-1
* UUD-Fusion: An unsupervised universal image fusion approach via generative diffusion model
* V-IRL: Grounding Virtual Intelligence in Real Life
* V-trans4style: Visual Transition Recommendation for Video Production Style Adaptation
* V2X-Real: A Large-scale Dataset for Vehicle-to-everything Cooperative Perception
* V2X-ViTv2: Improved Vision Transformers for Vehicle-to-Everything Cooperative Perception
* VAD: A Video Affective Dataset With Danmu
* VADS: Visuo-Adaptive DualStrike attack on visual question answer
* Validating the Precision and Accuracy of Coral Fragment Photogrammetry
* VAMOS: Versatile Action Models for Video Understanding
* Variational Autoencoder with Gaussian Random Field prior: Application to unsupervised animal detection in aerial images
* Vary: Scaling up the Vision Vocabulary for Large Vision-language Model
* VAuth: Robust Lightweight Mutual Authentication Protocol Preserving User's Anonymity for VANET With FPGA Implementation
* VCD-Texture: Variance Alignment Based 3D-2D Co-Denoising for Text-Guided Texturing
* VCP-CLIP: A Visual Context Prompting Model for Zero-shot Anomaly Segmentation
* VDFT: Robust feature matching of aerial and ground images using viewpoint-invariant deformable feature transformation
* VECLIP: Improving CLIP Training via Visual-enriched Captions
* Vegetation Greening Promoted the Precipitation Recycling Process in Xinjiang
* Vegs: View Extrapolation of Urban Scenes in 3d Gaussian Splatting Using Learned Priors
* Vehicle-Road-Cloud Collaborative Perception Framework and Key Technologies: A Review
* Vehicle-to-Vehicle Channel Measurements and Power Domain Modeling in Mountainous Plateau Environments for Emergency Communications
* Vehicular Social Dynamic Anomaly Detection With Recurrent Multi-Mask Aggregator Enabled VAE
* Veil Privacy on Visual Data: Concealing Privacy for Humans, Unveiling for DNNs
* VEON: Vocabulary-enhanced Occupancy Prediction
* Versatile Framework for Unsupervised Domain Adaptation Based on Instance Weighting, A
* Versatile Incremental Learning: Towards Class and Domain-agnostic Incremental Learning
* Versatile Point Cloud Compressor Using Universal Multiscale Conditional Coding: Part I: Geometry, A
* Versatile Point Cloud Compressor Using Universal Multiscale Conditional Coding: Part II: Attribute, A
* Versatilegaussian: Real-time Neural Rendering for Versatile Tasks Using Gaussian Splatting
* Vertical Distribution, Diurnal Evolution, and Source Region of Formaldehyde During the Warm Season Under Ozone-Polluted and Non-Polluted Conditions in Nanjing, China
* Vetra: A Dataset for Vehicle Tracking in Aerial Imagery: New Challenges for Multi-object Tracking
* VF-NeRF: Viewshed Fields for Rigid NeRF Registration
* Vfusion3d: Learning Scalable 3d Generative Models from Video Diffusion Models
* Vic-mae: Self-supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
* Video Editing via Factorized Diffusion Distillation
* Video Question Answering with Procedural Programs
* Video Visualization and Visual Analytics: A Task-Based and Application-Driven Investigation
* Video-Based Soft Tissue Deformation Tracking for Laparoscopic Augmented Reality-Based Navigation in Kidney Surgery
* Video-Instrument Synergistic Network for Referring Video Instrument Segmentation in Robotic Surgery
* Videoagent: A Memory-augmented Multimodal Agent for Video Understanding
* Videoagent: Long-form Video Understanding with Large Language Model as Agent
* VideoClusterNet: Self-supervised and Adaptive Face Clustering for Videos
* Videomamba: Spatio-temporal Selective State Space Model
* VideoMamba: State Space Model for Efficient Video Understanding
* Videopatchcore: An Effective Method to Memorize Normality for Video Anomaly Detection
* Videoshop: Localized Semantic Video Editing with Noise-extrapolated Diffusion Inversion
* VideoStudio: Generating Consistent-content and Multi-scene Videos
* VIDF-Net: A Voxel-Image Dynamic Fusion method for 3D object detection
* View Selection for 3d Captioning via Diffusion Ranking
* View-consistent 3d Editing with Gaussian Splatting
* View-consistent Hierarchical 3d Segmentation Using Ultrametric Feature Fields
* Viewformer: Exploring Spatiotemporal Modeling for Multi-view 3D Occupancy Perception via View-guided Transformers
* Viewpoint Textual Inversion: Discovering Scene Representations and 3d View Control in 2d Diffusion Models
* Vifa: An Efficient Visible and Infrared Image Fusion Architecture for Multi-task Applications via Continual Learning
* VIG-Bias: Visually Grounded Bias Discovery and Mitigation
* Vigor: Improving Visual Grounding of Large Vision Language Models with Fine-grained Reward Modeling
* Vila: Efficient Video-language Alignment for Video Question Answering
* VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model
* Viper: Visual Personalization of Generative Models via Individual Preference Learning
* VIPNET: Combining Viewpoint Information and Shape Priors for Instant Multi-view 3d Reconstruction
* VISA: Reasoning Video Object Segmentation via Large Language Models
* Visage: Video Instance Segmentation with Appearance-guided Enhancement
* Visfocus: Prompt-guided Vision Encoders for Ocr-free Dense Document Understanding
* Visible and Clear: Finding Tiny Objects in Difference Map
* Vision Language Models are blind
* Vision-guided robot calibration using photogrammetric methods
* Vision-language Action Knowledge Learning for Semantic-aware Action Quality Assessment
* Vision-language Dual-pattern Matching for Out-of-distribution Detection
* VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
* Visiontrap: Vision-augmented Trajectory Prediction Guided by Textual Descriptions
* Vista3d: Unravel the 3d Darkside of a Single Image
* Visual Alignment Pre-training for Sign Language Translation
* Visual Grounding for Object-level Generalization in Reinforcement Learning
* Visual Prompting via Partial Optimal Transport
* Visual Relationship Transformation
* Visual speech recognition using compact hypercomplex neural networks
* Visual Text Generation in the Wild
* Visual-guided hierarchical iterative fusion for multi-modal video action recognition
* Vitatecs: A Diagnostic Dataset for Temporal Concept Understanding of Video-language Models
* Vividdreamer: Invariant Score Distillation for Hyper-realistic Text-to-3d Generation
* VLAD-BUFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition
* VLDadaptor: Domain Adaptive Object Detection With Vision-Language Model Distillation
* VNI-Net: Vector neurons-based rotation-invariant descriptor for LiDAR place recognition
* Volumetric Rendering with Baked Quadrature Fields
* Volumetric Saliency Guided Image Summarization for RGB-D Indoor Scene Classification, A
* VP-SAM: Taming Segment Anything Model for Video Polyp Segmentation via Disentanglement and Spatio-temporal Side Network
* VQ-HPS: Human Pose and Shape Estimation in a Vector-quantized Latent Space
* VQA-DIFF: Exploiting VQA and Diffusion for Zero-shot Image-to-3d Vehicle Asset Generation in Autonomous Driving
* VS-TransGRU: A Novel Transformer-GRU-Based Framework Enhanced by Visual-Semantic Fusion for Egocentric Action Anticipation
* VSVIG: Real-time Video-based Seizure Detection via Skeleton-based Spatiotemporal VIG
* VyaktitvaNirdharan: Multimodal Assessment of Personality and Trait Emotional Intelligence
* Walker: Self-supervised Multiple Object Tracking by Walking on Temporal Appearance Graphs
* Was: Dataset and Methods for Artistic Text Segmentation
* WAST-3D: Wasserstein-2 Distance for Scene-to-scene Stylization on 3d Gaussians
* Watch Your Steps: Local Image and Scene Editing by Text Instructions
* Watching it in Dark: A Target-aware Representation Learning Framework for High-level Vision Tasks in Low Illumination
* Watermark-conditioned Diffusion Model for IP Protection, A
* Wave: Warping Ddim Inversion Features for Zero-shot Text-to-video Editing
* Wavelength-embedding-guided Filter-array Transformer for Spectral Demosaicing
* Wavelet Convolutions for Large Receptive Fields
* Wavelet-based Mamba with Fourier Adjustment for Low-light Image Enhancement
* WBP: Training-time Backdoor Attacks Through Hardware-based Weight Bit Poisoning
* Weak-to-strong Compositional Learning from Generative Models for Language-based Object Detection
* Weakly Correlated Multimodal Sentiment Analysis: New Dataset and Topic-Oriented Model
* Weakly Supervised 3d Object Detection via Multi-level Visual Guidance
* Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation
* Weakly Supervised Fixated Object Detection in Traffic Videos Based on Driver's Selective Attention Mechanism
* Weakly Supervised Monocular 3D Object Detection by Spatial-Temporal View Consistency
* Weakly Supervised Underwater Object Real-time Detection Based on High-resolution Attention Class Activation Mapping and Category Hierarchy
* Weakly-supervised 3d Hand Reconstruction with Knowledge Prior and Uncertainty Guidance
* Weakly-Supervised 3D Scene Graph Generation via Visual-Linguistic Assisted Pseudo-Labeling
* Weakly-Supervised Camera Localization by Ground-to-Satellite Image Registration
* Weakly-Supervised Pavement Surface Crack Segmentation Based on Dual Separation and Domain Generalization
* Weakly-supervised Spatio-temporal Video Grounding with Variational Cross-modal Alignment
* Wear-any-way: Manipulable Virtual Try-on via Sparse Correspondence Alignment
* Weather-aware autopilot: Domain generalization for point cloud semantic segmentation in diverse weather scenarios
* Weather-Aware Collaborative Perception With Uncertainty Reduction
* WEBRPG: Automatic Web Rendering Parameters Generation for Visual Presentation
* Weconvene: Learned Image Compression with Wavelet-domain Convolution and Entropy Model
* WECROMCL: Weakly Supervised Cross-modality Contrastive Learning for Transcription-only Supervised Text Spotting
* Weight Conditioning for Smooth Optimization of Neural Networks
* Weighted Co-Training Framework for Emotion Recognition Based on EEG Data Generation Using Frequency-Spatial Diffusion Transformer, A
* Weighted Ensemble Models Are Strong Continual Learners
* Weighting Pseudo-labels via High-activation Feature Index Similarity and Object Detection for Semi-supervised Segmentation
* WGS-YOLO: A real-time object detector based on YOLO framework for autonomous driving
* WHAC: World-Grounded Humans and Cameras
* When Do We Not Need Larger Vision Models?
* When Fast Fourier Transform Meets Transformer for Image Restoration
* When Meta-Learning Meets Online and Continual Learning: A Survey
* When Pedestrian Detection Meets Multi-modal Learning: Generalist Model and Benchmark Dataset
* Where am I? Scene Retrieval with Language
* Which Model Generated This Image? A Model-agnostic Approach for Origin Attribution
* WHU-Railway3D: A Diverse Dataset and Benchmark for Railway Point Cloud Semantic Segmentation
* Wide Evaluation of ChatGPT on Affective Computing Tasks, A
* Wildrefer: 3d Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language
* Wildvidfit: Video Virtual Try-on in the Wild via Image-based Controlled Diffusion Models
* WIMANS: A Benchmark Dataset for WIFI-based Multi-User Activity Sensing
* Window-based Channel Attention for Wavelet-enhanced Learned Image Compression
* Windpoly: Polygonal Mesh Reconstruction via Winding Numbers
* Within the Dynamic Context: Inertia-aware 3d Human Modeling with Pose Sequence
* Word2Scene: Efficient remote sensing image scene generation with only one word via hybrid intelligence and low-rank representation
* Wordrobe: Text-guided Generation of Textured 3d Garments
* Worldpose: A World Cup Dataset for Global 3d Human Pose Estimation
* Wovogen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation
* WPS-SAM: Towards Weakly-supervised Part Segmentation with Foundation Models
* Wrim-net: Wide-ranging Information Mining Network for Visible-infrared Person Re-identification
* WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering
* Wts: A Pedestrian-centric Traffic Video Dataset for Fine-grained Spatial-temporal Understanding
* X-former: Unifying Contrastive and Reconstruction Learning for MLLMs
* X-instructblip: A Framework for Aligning Image, 3d, Audio, Video to LLMs and its Emergent Cross-modal Reasoning
* X-pose: Detecting Any Keypoints
* XPSR: Cross-modal Priors for Diffusion-based Image Super-resolution
* Yolov9: Learning What You Want to Learn Using Programmable Gradient Information
* You Only Learn One Query: Learning Unified Human Query for Single-stage Multi-person Multi-task Human-centric Perception
* You Only Need One Step: Fast Super-resolution with Stable Diffusion via Scale Distillation
* You Will Never Walk Alone: One-Shot 3D Action Recognition with Point Cloud Sequence
* Zero-shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems
* Zero-shot Detection of AI-generated Images
* Zero-Shot Image Feature Consensus with Deep Functional Maps
* Zero-shot Multi-object Scene Completion
* Zero-shot Object Counting with Good Exemplars
* Zero-Shot Temporal Action Detection by Learning Multimodal Prompts and Text-Enhanced Actionness
* ZEROI2V: Zero-cost Adaptation of Pre-trained Transformers from Image to Video
* Zest: Zero-shot Material Transfer from a Single Image
* ZIGMA: A DIT-Style Zigzag Mamba Diffusion Model
* Ziplora: Any Subject in Any Style by Effectively Merging Loras
* Zola: Zero-Shot Creative Long Animation Generation with Short Video Model
3727 for 2412