Update Dates 2309

2309
* $R^2$Former: Unified Retrieval and Reranking Transformer for Place Recognition
* *Affective Behavior Analysis In-the-Wild
* *AgriVision: Agriculture-Vision: Challenges and Opportunities for Computer Vision in Agriculture
* *AI City Challenge
* *Autonomous Driving
* *Biometrics
* *Bridging the Gap Between Computational Photography and Visual Recognition
* *Catch UAVs That Want to Watch You: Detection and Tracking of Unmanned Aerial Vehicle in the Wild
* *ChaLearn Face Anti-Spoofing
* *Challenge on Mobile Intelligent Photography and Imaging
* *Computer Vision for Fashion, Art and Design
* *Computer Vision for Microscopy Image Analysis
* *Computer Vision for Mixed Reality
* *Computer Vision for Physiological Measurement
* *Computer Vision in Sports
* *Continual Learning in Computer Vision
* *Deep Learning for Geometric Computing
* *Deep Learning in Ultrasound Image Analysis
* *Dynamic Scene Reconstruction
* *EarthVision: Large Scale Computer Vision for Remote Sensing Imagery
* *Efficient Deep Learning for Computer Vision
* *Embedded Vision
* *End-to-End Autonomous Driving: Perception, Prediction, Planning and Simulation
* *Event-Based Vision
* *Explainable AI for Computer Vision Workshop
* *Fair, Data-Efficient and Trusted Computer Vision
* *Federated Learning for Computer Vision
* *Gaze Estimation and Prediction in the Wild
* *Generative Models for Computer Vision
* *Image Matching: Local Features and Beyond
* *Large Scale Holistic Video Understanding
* *LatinX in CV Research
* *Learning With Limited Labelled Data for Image and Video Understanding
* *Light Fields for Computer Vision LFNAT: New Applications and Trends in Light Fields
* *Media Forensics
* *Mobile AI
* *Monocular Depth Estimation Challenge
* *Multimodal Content Moderation
* *Multimodal Learning and Applications
* *Neural Architecture Search: Lightweight NAS Challenge (NAS)
* *New Trends in Image Restoration and Enhancement
* *Omnidirectional Computer Vision in Research and Industry
* *Open-Domain Retrieval Under Multi-Modal Settings
* *Perception Beyond the Visible Spectrum
* *Photogrammetric Computer Vision and Image Analysis
* *Pixel-Level Video Understanding in the Wild Challenge
* *Precognition: Seeing Through the Future
* *Rhobin Challenge: Reconstruction of Human-Object Interaction
* *Safe Artificial Intelligence for Automated Driving
* *Structural and Compositional Learning on 3D Data
* *Topology, Algebra, and Geometry in Computer Vision
* *Vision Datasets Understanding
* *Visual Anomaly and Novelty Detection
* *VOCVALC: Visual Odometry and Computer Vision Applications Based on Location Clues - With a Focus on Mobile Platform Applications
* *Women in Computer Vision
* *Workshop and Challenges for New Frontiers in Visual Language Reasoning: Compositionality, Prompts and Causality
* *Workshop of Adversarial Machine Learning on Computer Vision: Art of Robustness
* *Workshop on Capturing, Interpreting & Visualizing Indoor Living Spaces
* *Workshop on Face and Gesture Analysis for Health Informatics
* *Workshop on Foundation Models: Foundation Model Challenge
* *Workshop on Vision-Based Industrial Inspection
* 1% VS 100%: Parameter-Efficient Low Rank Adapter for Dense Predictions
* 1000 FPS HDR Video with a Spike-RGB Hybrid Camera
* 2PCNet: Two-Phase Consistency Training for Day-to-Night Unsupervised Domain Adaptive Object Detection
* 3D Cinemagraphy from a Single Image
* 3D Concept Learning and Reasoning from Multi-View Images
* 3D Face Reconstruction and Gaze Tracking in the HMD for Virtual Interaction
* 3D GAN Inversion with Facial Symmetry Prior
* 3D Highlighter: Localizing Regions on 3D Shapes via Text Descriptions
* 3D human body modeling with orthogonal human mask image based on multi-channel Swin transformer architecture
* 3D Human Keypoints Estimation from Point Clouds in the Wild without Human Labels
* 3D Human Mesh Estimation from Virtual Markers
* 3D Human Pose Estimation via Intuitive Physics
* 3D Human Pose Estimation with Spatio-Temporal Criss-Cross Attention
* 3D Line Mapping Revisited
* 3D Neural Field Generation Using Triplane Diffusion
* 3D pedestrian localization fusing via monocular camera
* 3D Registration with Maximal Cliques
* 3D Semantic Segmentation in the Wild: Learning Generalized Models for Adverse-Condition Point Clouds
* 3D Shape Reconstruction of Semi-Transparent Worms
* 3D Spatial Multimodal Knowledge Accumulation for Scene Graph Prediction in Point Cloud
* 3D Video Loops from Asynchronous Input
* 3D Video Object Detection with Learnable Object-Centric Global Optimization
* 3D-aware Conditional Image Synthesis
* 3D-Aware Face Swapping
* 3D-aware Facial Landmark Detection via Multi-view Consistent Training on Synthetic Data
* 3D-Aware Multi-Class Image-to-Image Translation with NeRFs
* 3D-Aware Object Goal Navigation via Simultaneous Exploration and Identification
* 3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes
* 3D-POP: An Automated Annotation Approach to Facilitate Markerless 2D-3D Tracking of Freely Moving Birds with Marker-Based Motion Capture
* 3DAvatarGAN: Bridging Domains for Personalized Editable Avatars
* 3DSAINT Representation for 3D Point Clouds
* 3DSSR: 3D Subscene Retrieval
* 3Mformer: Multi-order Multi-mode Transformer for Skeletal Action Recognition
* 4D LUT: Learnable Context-Aware 4D Lookup Table for Image Enhancement
* 7th AI City Challenge, The
* @ CREPE: Can Vision-Language Foundation Models Reason Compositionally?
* A-CAP: Anticipation Captioning with Commonsense Knowledge
* A2-Aug: Adaptive Automated Data Augmentation
* A2B: Anchor to Barycentric Coordinate for Robust Correspondence
* A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation from a Single RGB Image
* Abandoned Land Mapping Based on Spatiotemporal Features from PolSAR Data via Deep Learning Methods
* ABAW5 Challenge: A Facial Affect Recognition Approach Utilizing Transformer Encoder and Audiovisual Fusion
* ABAW: Valence-Arousal Estimation, Expression Recognition, Action Unit Detection and Emotional Reaction Intensity Estimation Challenges
* ABCD: Arbitrary Bitwise Coefficient for De-Quantization
* ABLE-NeRF: Attention-Based Rendering with Learnable Embeddings for Neural Radiance Field
* Aboveground Biomass Dynamics of a Coastal Wetland Ecosystem Driven by Land Use/Land Cover Transformation
* Abstract Visual Reasoning Enabled by Language
* Abstract Visual Reasoning: An Algebraic Approach for Solving Raven's Progressive Matrices
* Abstractive Summarization for Video: A Revisit in Multistage Fusion Network With Forget Gate
* Accelerable Lottery Tickets with the Mixed-Precision Quantization
* Accelerated Coordinate Encoding: Learning to Relocalize in Minutes Using RGB and Poses
* Accelerating Dataset Distillation via Model Augmentation
* Accelerating Vision-Language Pretraining with Free Language Modeling
* AccelIR: Task-aware Image Compression for Accelerating Neural Restoration
* Accidental Light Probes
* Accumulated micro-motion representations for lightweight online action detection in real-time
* Accurate and Efficient Supervoxel Re-Segmentation Approach for Large-Scale Point Clouds Using Plane Constraints, An
* ACGAN: Age-compensated makeup transfer based on homologous continuity generative adversarial network model
* Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning
* ACL-SPC: Adaptive Closed-Loop System for Self-Supervised Point Cloud Completion
* Acquiring 360° Light Field by a Moving Dual-Fisheye Camera
* ACR: Attention Collaboration-based Regressor for Arbitrary Two-Hand Reconstruction
* ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation
* Action Probability Calibration for Efficient Naturalistic Driving Action Localization
* Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-Based Action Recognition
* Activating More Pixels in Image Super-Resolution Transformer
* Active Exploration of Multimodal Complementarity for Few-Shot Action Recognition
* Active Finetuning: Exploiting Annotation Budget in the Pretraining-Finetuning Paradigm
* Active-Passive Beamforming With Imperfect CSI for IRS-Assisted Sensing System
* ActMAD: Activation Matching to Align Distributions for Test-Time-Training
* Actor-centric Causality Graph for Asynchronous Temporal Inference in Group Activity, An
* AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders
* AdamsFormer for Spatial Action Localization in the Future
* AdaMTL: Adaptive Input-dependent Inference for Efficient Multi-Task Learning
* AdaptCD: An Adaptive Target Region-based Commodity Detection System
* Adapting Grounded Visual Question Answering Models to Low Resource Languages
* Adapting Shortcut with Normalizing Flow: An Efficient Tuning Framework for Visual Recognition
* Adaptive Annealing for Robust Geometric Estimation
* Adaptive Assignment for Geometry Aware Local Feature Matching
* Adaptive Channel Sparsity for Federated Learning under System Heterogeneity
* Adaptive Data-Free Quantization
* Adaptive Feature Attention Module for Robust Visual-LiDAR Fusion-Based Object Detection in Adverse Weather Conditions
* Adaptive Global Decay Process for Event Cameras
* Adaptive Graph Convolutional Subspace Clustering
* Adaptive Human Matting for Dynamic Videos
* Adaptive Human-Centric Video Compression for Humans and Machines
* Adaptive multi-teacher softened relational knowledge distillation framework for payload mismatch in image steganalysis
* Adaptive Patch Deformation for Textureless-Resilient Multi-View Stereo
* Adaptive Plasticity Improvement for Continual Learning
* Adaptive RoI with pretrained models for Automated Retail Checkout
* Adaptive Sparse Convolutional Networks with Global Context Enhancement for Faster Object Detection on Drone Images
* Adaptive Sparse Pairwise Loss for Object Re-Identification
* Adaptive Spot-Guided Transformer for Consistent Local Feature Matching
* Adaptive Zone-aware Hierarchical Planner for Vision-Language Navigation
* AdaptiveMix: Improving GAN Training via Feature Space Shrinkage
* Addressing the Occlusion Problem in Multi-Camera People Tracking with Human Pose Estimation
* Adjustment and Alignment for Unbiased Open Set Domain Adaptation
* Advancing Visual Grounding with Scene Knowledge: Benchmark and Method
* Adversarial Counterfactual Visual Explanations
* Adversarial Defense in Aerial Detection
* Adversarial Dense Contrastive Learning for Semi-Supervised Semantic Segmentation
* Adversarial Domain Generalization for Surveillance Face Anti-Spoofing
* Adversarial Normalization: I Can visualize Everything (ICE)
* Adversarial Robustness via Random Projection Filters
* Adversarially Masking Synthetic to Mimic Real: Adaptive Noise Injection for Point Cloud Segmentation Adaptation
* Adversarially Robust Neural Architecture Search for Graph Neural Networks
* AEA-Net: Affinity-supervised entanglement attentive network for person re-identification
* AeDet: Azimuth-Invariant Multi-View 3D Object Detection
* Affection: Learning Affective Explanations for Real-World Visual Data
* Affine Equivariant Tyler's M-Estimator Applied to Tail Parameter Learning of Elliptical Distributions
* Affordance Diffusion: Synthesizing Hand-Object Interactions
* Affordance Grounding from Demonstration Video to Target Image
* Affordances from Human Videos as a Versatile Representation for Robotics
* AGAIN: Adversarial Training with Attribution Span Enlargement and Hybrid Feature Fusion
* Age estimation by extracting hierarchical age-related features
* Agronav: Autonomous Navigation Framework for Agricultural Robots and Vehicles using Semantic Segmentation and Semantic Line Detection
* AI-Empowered Persuasive Video Generation: A Survey
* AI-Synthesized Voice Detection Using Neural Vocoder Artifacts
* Alias-Free Convnets: Fractional Shift Invariance via Polynomial Activations
* Align and Attend: Multimodal Summarization with Dual Contrastive Losses
* Align Your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
* AligNeRF: High-Fidelity Neural Radiance Fields via Alignment-Aware Training
* Aligning Bag of Regions for Open-Vocabulary Object Detection
* Aligning Step-by-Step Instructional Diagrams to Video Demonstrations
* ALIKE: Accurate and Lightweight Keypoint Detection and Descriptor Extraction
* All are Worth Words: A ViT Backbone for Diffusion Models
* All in One: Exploring Unified Video-Language Pre-Training
* All Keypoints You Need: Detecting Arbitrary Keypoints on the Body of Triple, High, and Long Jump Athletes
* All-in-Focus Imaging from Event Focal Stack
* All-in-One Image Restoration for Unknown Degradations Using Adaptive Discriminative Filters for Specific Degradations
* ALOFT: A Lightweight MLP-Like Architecture with Dynamic Low-Frequency Transform for Domain Generalization
* ALSO: Automotive Lidar Self-Supervision by Occupancy Estimation
* AltFreezing for More General Video Face Forgery Detection
* ALTO: Alternating Latent Topologies for Implicit 3D Reconstruction
* Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection
* Ambiguous Medical Image Segmentation Using Diffusion Models
* AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation
* Analysing Pine Disease Spread Using Random Point Process by Remote Sensing of a Forest Stand
* Analysis of Correlation between Anthropization Phenomena and Landscape Values of the Territory: A GIS Framework Based on Spatial Statistics
* Analysis of Emotion Annotation Strength Improves Generalization in Speech Emotion Recognition Models
* Analysis of Lithosphere-Atmosphere-Ionosphere Coupling Associated with the 2022 Luding Ms6.8 Earthquake, The
* Analysis of PM2.5 Synergistic Governance Path from a Socio-Economic Perspective: A Case Study of Guangdong Province
* Analyzing and Diagnosing Pose Estimation with Attributions
* Analyzing Physical Impacts Using Transient Surface Wave Imaging
* Analyzing Results of Depth Estimation Models with Monocular Criteria
* Anchor-based discriminative dual distribution calibration for transductive zero-shot learning
* Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection
* AnchorFormer: Point Cloud Completion from Discriminative Nodes
* ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos
* Angelic Patches for Improving Third-Party Object Detector Performance
* Angular Patterns of Nonlinear Emission in Dye Water Droplets Stimulated by a Femtosecond Laser Pulse for LiDAR Applications
* Annealing-based Label-Transfer Learning for Open World Object Detection
* Anomaly Detection with Domain Adaptation
* Ante-Hoc Generation of Task-Agnostic Interpretation Maps
* Anti-Bandit for Neural Architecture Search
* AnyFlow: Arbitrary Scale Optical Flow with Implicit Neural Representation
* Appearance Label Balanced Triplet Loss for Multi-modal Aerial View Object Classification
* APPLeNet: Visual Attention Parameterized Prompt Learning for Few-Shot Remote Sensing Image Generalization using CLIP
* Application of Hydro-Based Morphological Models for Environmental Assessment of Watersheds
* Application of Machine-Learning Model for Analyzing the Impact of Land-Use Change on Surface Water Resources in Gauteng Province, South Africa, An
* Applications of Deep Learning for Top-View Omnidirectional Imaging: A Survey
* Applying Remote Sensing Methods to Estimate Alterations in Land Cover Change and Degradation in the Desert Regions of the Southeast Iberian Peninsula
* Approach for Monitoring Shallow Surface Outcrop Mining Activities Based on Multisource Satellite Remote Sensing Data, An
* Architectural Backdoors in Neural Networks
* Architecture, Dataset and Model-Scale Agnostic Data-free Meta-Learning
* ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation
* Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning
* Are Data-Driven Explanations Robust Against Out-of-Distribution Data?
* Are Deep Neural Networks SMARTer Than Second Graders?
* Are Labels Needed for Incremental Instance Learning?
* Are Local Features All You Need for Cross-Domain Visual Place Recognition?
* Are we certain it's anomalous?
* Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark
* ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data
* ARO-Net: Learning Implicit Fields from Anchored Radial Observations
* Artificial Intelligence Forecasting of Marine Heatwaves in the South China Sea Using a Combined U-Net and ConvLSTM System
* AsConvSR: Fast and Lightweight Super-Resolution Network with Assembled Convolutions
* AShapeFormer: Semantics-Guided Object-Level Active Shape Encoding for 3D Object Detection via Transformers
* ASPnet: Action Segmentation with Shared-Private Representation of Multiple Data Sources
* AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation
* Assessing the Nonlinear Changes in Global Navigation Satellite System Vertical Time Series with Environmental Loading in Mainland China
* Assessment of the Bike-Sharing Socioeconomic Equity in the Use of Routes
* Assigned MURA Defect Generation Based on Diffusion Model
* AstroNet: When Astrocyte Meets Artificial Neural Network
* AsyFOD: An Asymmetric Adaptation Paradigm for Few-Shot Domain Adaptive Object Detection
* Asymmetric Color Transfer with Consistent Modality Learning
* Asymmetric Feature Fusion for Image Retrieval
* Asymptotically Optimal Estimator for Source Location and Propagation Speed by TDOA, An
* Asynchronous Events-based Panoptic Segmentation using Graph Mixer Neural Network
* Asynchronous Federated Continual Learning
* ATOM: Self-supervised human action recognition using atomic motion representation learning
* Attack-Agnostic Deep Face Anti-Spoofing
* Attention Retractable Frequency Fusion Transformer for Image Super Resolution
* Attention Weighted Local Descriptors
* Attention-based Part Assembly for 3D Volumetric Shape Modeling
* Attention-Based Point Cloud Edge Sampling
* AttentionShift: Iteratively Estimated Part-Based Attention Map for Pointly Supervised Instance Segmentation
* Attentive spatial-temporal contrastive learning for self-supervised video representation
* Attribute disentanglement with gradient reversal for interactive fashion retrieval
* Attribute-Preserving Face Dataset Anonymization via Latent Code Optimization
* AttriCLIP: A Non-Incremental Learner for Incremental Knowledge Learning
* Audio-Visual Grouping Network for Sound Localization from Mixtures
* Audio-Visual Person-of-Interest DeepFake Detection
* Augmentation Matters: A Simple-Yet-Effective Approach to Semi-Supervised Semantic Segmentation
* AUNet: Learning Relations Between Action Units for Face Forgery Detection
* Auto-CARD: Efficient and Robust Codec Avatar Driving for Real-time Mobile Telepresence
* AutoAD: Movie Description in Context
* AutoFocusFormer: Image Segmentation off the Grid
* AutoLabel: CLIP-based framework for Open-Set Video Domain Adaptation
* Automatic classification of company's document stream: Comparison of two solutions
* Automatic Detection and Dynamic Analysis of Urban Heat Islands Based on Landsat Images
* Automatic High Resolution Wire Segmentation and Removal
* Automatic Transformation Search Against Deep Leakage From Gradients
* Autonomous Manipulation Learning for Similar Deformable Objects via Only One Demonstration
* AutoRecon: Automated 3D Object Discovery and Reconstruction
* Autoregressive Visual Tracking
* AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection
* AutoSplice: A Text-prompt Manipulated Image Dataset for Media Forensics
* Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model
* AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction
* AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
* Azimuth Super-Resolution for FMCW Radar in Autonomous Driving
* B-Spline Texture Coefficients Estimator for Screen Content Image Super-Resolution
* BAAM: Monocular 3D pose and shape reconstruction with bi-contextual attention module and attention-guided modeling
* Back to the Feature: Classical 3D Features are (Almost) All You Need for 3D Anomaly Detection
* Back to the future: a night photography rendering ISP without deep learning
* Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption
* Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger
* Backdoor Cleansing with Unlabeled Data
* Backdoor Defense via Adaptively Splitting Poisoned Dataset
* Backdoor Defense via Deconfounded Representation Learning
* BAD-NeRF: Bundle Adjusted Deblur Neural Radiance Fields
* BAEFormer: Bi-Directional and Early Interaction Transformers for Bird's Eye View Semantic Segmentation
* Bag-of-Prototypes Representation for Dataset-Level Applications, A
* Balanced Energy Regularization Loss for Out-of-distribution Detection
* Balanced Product of Calibrated Experts for Long-Tailed Recognition
* Balanced Spherical Grid for Egocentric View Synthesis
* Balancing Logit Variation for Long-Tailed Semantic Segmentation
* Bandpass Filter Based Dual-stream Network for Face Anti-spoofing
* Base and Meta: A New Perspective on Few-Shot Segmentation
* BASiS: Batch Aligned Spectral Embedding Space
* Batch Model Consolidation: A Multi-Task Model Consolidation Framework
* Bayesian Posterior Approximation With Stochastic Ensembles
* BBDM: Image-to-Image Translation with Brownian Bridge Diffusion Models
* BeautyREC: Robust, Efficient, and Component-Specific Makeup Transfer
* BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot Detection
* BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion
* Behavioral Analysis of Vision-and-Language Navigation Agents
* Behind the Scenes: Density Fields for Single View Reconstruction
* Being Comes from Not-Being: Open-Vocabulary Text-to-Motion Generation with Wordless Training
* Benchmark Dataset and Effective Inter-Frame Alignment for Real-World Video Super-Resolution
* Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving
* Benchmarking Robustness to Text-Guided Corruptions
* Benchmarking Self-Supervised Learning on Diverse Pathology Datasets
* Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection
* Best Defense is a Good Offense: Adversarial Augmentation Against Adversarial Attacks, The
* Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data
* Best Practices for 2-Body Pose Forecasting
* Better CMOS Produces Clearer Images: Learning Space-Variant Blur Estimation for Blind Image Super-Resolution
* BEV-Guided Multi-Modality Fusion for Driving Perception
* BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera via Key-Points
* BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks
* BEV@DC: Bird's-Eye View Assisted Training for Depth Completion
* BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
* BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection
* Beyond Appearance: A Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks
* Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers
* Beyond AUROC and co. for evaluating out-of-distribution detection performance
* Beyond mAP: Towards Better Evaluation of Instance Segmentation
* Bi-Directional Distribution Alignment for Transductive Zero-Shot Learning
* Bi-directional Feature Fusion Generative Adversarial Network for Ultra-high Resolution Pathological Image Virtual Re-Staining
* Bi-Level Meta-Learning for Few-Shot Domain Generalization
* Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection
* Bi-READ: Bi-Residual AutoEncoder based feature enhancement for video anomaly detection
* Bi3D: Bi-Domain Active Learning for Cross-Domain 3D Object Detection
* Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures
* Bias Mimicking: A Simple Sampling Approach for Bias Mitigation
* Bias-Compensated Integral Regression for Human Pose Estimation
* Bias-Eliminating Augmentation Learning for Debiased Federated Learning
* BiasAdv: Bias-Adversarial Augmentation for Model Debiasing
* BiasBed: Rigorous Texture Bias Evaluation
* BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency
* Bicubic++: Slim, Slimmer, Slimmest - Designing an Industry-Grade Super-Resolution Network
* Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation
* Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
* BiFormer: Learning Bilateral Motion Estimation via Bilateral Transformer for 4K Video Frame Interpolation
* BiFormer: Vision Transformer with Bi-Level Routing Attention
* Bilateral Memory Consolidation for Continual Learning
* Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis
* Binary Latent Diffusion
* BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models
* Biomechanics-Guided Facial Action Unit Detection Through Force Modeling
* BioNet: A Biologically-Inspired Network for Face Recognition
* Bit-shrinking: Limiting Instantaneous Sharpness for Improving Post-training Quantization
* BITE: Beyond Priors for Improved Three-D Dog Pose Estimation
* Bitstream-Corrupted JPEG Images are Restorable: Two-stage Compensation and Alignment Framework for Image Restoration
* BKinD-3D: Self-Supervised 3D Keypoint Discovery from Multi-View Videos
* Black-Box Sparse Adversarial Attack via Multi-Objective Optimisation CVPR Proceedings
* BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning
* BlazeStyleGAN: A Real-Time On-Device StyleGAN
* Blemish-aware and Progressive Face Retouching with Limited Paired Data
* BlendFields: Few-Shot Example-Driven Facial Modeling
* Blind Image Inpainting via Omni-dimensional Gated Attention and Wavelet Queries
* Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
* Blind Restoration of a Single Real Turbulence-Degraded Image Based on Self-Supervised Learning
* Blind Video Deflickering by Neural Filtering with a Flawed Atlas
* Block Selection Method for Using Feature Norm in Out-of-Distribution Detection
* Blowing in the Wind: CycleNet for Human Cinemagraphs from Still Images
* Blur Interpolation Transformer for Real-World Motion from Blur
* BMRN: Boundary Matching and Refinement Network for Temporal Moment Localization with Natural Language
* BokehOrNot: Transforming Bokeh Effect with Image Transformer and Lens Metadata Embedding
* Boost Vision Transformer with GPU-Friendly Sparsity and Quantization
* Boosting Accuracy and Robustness of Student Models via Adaptive Adversarial Distillation
* Boosting Detection in Crowd Analysis via Underutilized Output Features
* Boosting Low-Data Instance Segmentation by Unsupervised Pre-training with Saliency Prompt
* Boosting Robust Learning Via Leveraging Reusable Samples in Noisy Web Data
* Boosting Semi-Supervised Learning by Exploiting All Unlabeled Data
* Boosting separated softmax with discrimination for class incremental learning
* Boosting Transductive Few-Shot Fine-tuning with Margin-based Uncertainty Weighting and Probability Regularization
* Boosting Verified Training for Robust Image Classifications via Abstraction
* Boosting Video Object Segmentation via Space-Time Correspondence Learning
* Boosting Weakly-Supervised Temporal Action Localization with Text Information
* Bootstrap Your Own Prior: Towards Distribution-Agnostic Novel Class Discovery
* Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual Grouping
* BOP Challenge 2022 on Detection, Segmentation and Pose Estimation of Specific Rigid Objects
* Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation
* Boundary Delineator for Martian Crater Instances with Geographic Information and Deep Learning
* Boundary Unlearning: Rapid Forgetting of Deep Networks via Shifting the Decision Boundary
* Boundary-aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval
* Boundary-enhanced Co-training for Weakly Supervised Semantic Segmentation
* Box-Level Active Detection
* BoxTeacher: Exploring High-Quality Pseudo Labels for Weakly Supervised Instance Segmentation
* Brain-Machine Coupled Learning Method for Facial Emotion Recognition
* Breaching FedMD: Image Recovery via Paired-Logits Inversion Attack
* Breaking the Object in Video Object Segmentation
* Breaking Through the Haze: An Advanced Non-Homogeneous Dehazing Method based on Fast Fourier Convolution and ConvNeXt
* Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection
* Bridging Search Region Interaction with Template for RGB-T Tracking
* Bridging the Gap Between Model Explanations in Partially Annotated Multi-Label Classification
* Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery in the Wild
* BUFFER: Balancing Accuracy, Efficiency, and Generalizability in Point Cloud Registration
* Building Rearticulable Models for Arbitrary 3D Objects from 4D Point Clouds
* BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects
* BUOL: A Bottom-Up Framework with Occupancy-Aware Lifting for Panoptic 3D Scene Reconstruction From a Single Image
* Burstormer: Burst Image Restoration and Enhancement Transformer
* Bush Detection for Vision-based UGV Guidance in Blueberry Orchards: Data Set and Methods
* C-PLES: Contextual Progressive Layer Expansion with Self-attention for Multi-class Landslide Segmentation on Mars using Multimodal Satellite Imagery
* C-SFDA: A Curriculum Learning Aided Self-Training Framework for Efficient Source Free Domain Adaptation
* CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network with Large Input
* CaCo: Both Positive and Negative Samples are Directly Learnable via Cooperative-Adversarial Contrastive Learning
* CafeBoost: Causal Feature Boost to Eliminate Task-Induced Bias for Class Incremental Learning
* Cali-NCE: Boosting Cross-modal Video Representation Learning with Calibrated Alignment
* Camera based Eye State Estimation for ICU Patients: A Pilot Clinical Study
* Camera-based Recovery of Cardiovascular Signals from Unconstrained Face Videos using an Attention Network
* CAMM: Building Category-Agnostic and Animatable 3D Models from Monocular Videos
* Camouflaged Instance Segmentation via Explicit De-Camouflaging
* Camouflaged Object Detection with Feature Decomposition and Edge Reconstruction
* CAMS: CAnonicalized Manipulation Spaces for Category-Level Functional Hand-Object Manipulation Synthesis
* Can't Steal? Cont-Steal! Contrastive Stealing Attacks Against Image Encoders
* CanBiPT: Cancelable biometrics with physical template
* Canonical Fields: Self-Supervised Learning of Pose-Canonicalized Neural Fields
* CAP-VSTNet: Content Affinity Preserved Versatile Style Transfer
* Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
* CAP: Robust Point Cloud Classification via Semantic and Structural Modeling
* CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
* CAPE: Camera View Position Embedding for Multi-View 3D Object Detection
* CaPriDe Learning: Confidential and Private Decentralized Learning Based on Encryption-Friendly Distillation Loss
* Carotid Lumen Diameter and Intima-Media Thickness Measurement via Boundary-Guided Pseudo-Labeling
* Cartesian Genetic Programming Parameterization in the Context of Audio Synthesis
* CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects
* Cartography and Geomedia in Pragmatic Dimensions
* Cascade Evidential Learning for Open-world Weakly-supervised Temporal Action Localization
* Cascade Network for Pattern Recognition Based on Radar Signal Characteristics in Noisy Environments, A
* Cascaded Local Implicit Transformer for Arbitrary-Scale Super-Resolution
* Cascaded Zoom-in Detector for High Resolution Aerial Images
* Case Study on the Evolution and Precipitation Characteristics of Southwest Vortex in China: Insights from FY-4A and GPM Observations
* CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective
* Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference
* Casual Conversations v2 Dataset: A diverse, large benchmark for measuring fairness and robustness in audio/vision/speech models, The
* CAT-NeRF: Constancy-Aware Tx$^2$Former for Dynamic Body Modeling
* CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection
* Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder
* Category Differences Matter: A Broad Analysis of Inter-Category Error in Semantic Segmentation
* Category Query Learning for Human-Object Interaction Classification
* Causalainer: Causal Explainer for Automatic Video Summarization
* Causally-Aware Intraoperative Imputation for Overall Survival Time Prediction
* CAVLI - Using image associations to produce local concept-based explanations
* CCA-FPN: Channel and content adaptive object detection
* CCuantuMM: Cycle-Consistent Quantum-Hybrid Matching of Multiple Shapes
* CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion
* CDText: Scene text detector based on context-aware deformable transformer
* CelebV-Text: A Large-Scale Facial Text-Video Dataset
* Center Focusing Network for Real-Time LiDAR Panoptic Segmentation
* Certified Adversarial Robustness Within Multiple Perturbation Bounds
* CF-Font: Content Fusion for Few-Shot Font Generation
* CFA: Class-Wise Calibrated Fair Adversarial Training
* CFDP: Common Frequency Domain Pruning
* Change-Aware Sampling and Contrastive Learning for Satellite Images
* Characterisation and Dynamics of an Emerging Seagrass Meadow
* Characteristic Function-Based Method for Bottom-Up Human Pose Estimation, A
* Characterization of the Vertical Distribution of Surface Soil Moisture Using ISMN Multilayer In Situ Data and Their Comparison with SMOS and SMAP Soil Moisture Products, The
* Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations
* CheckSORT: Refined Synthetic Data Combination and Optimized SORT for Automatic Retail Checkout
* CherryPicker: Semantic Skeletonization and Topological Reconstruction of Cherry Trees
* CHMATCH: Contrastive Hierarchical Matching and Robust Adaptive Threshold Boosted Semi-Supervised Learning
* CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution
* CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning
* CIGAR: Cross-Modality Graph Reasoning for Domain Adaptive Object Detection
* CIMI4D: A Large Multimodal Climbing Motion Dataset under Human-scene Interactions
* CIPF: Crossing Intention Prediction Network based on Feature Fusion Modules for Improving Pedestrian Safety
* CIRCLE: Capture In Rich Contextual Environments
* CLAMP: Prompt-based Contrastive Learning for Connecting Language and Animal Pose
* Class Adaptive Network Calibration
* Class Attention Transfer Based Knowledge Distillation
* Class Balanced Adaptive Pseudo Labeling for Federated Semi-Supervised Learning
* Class of Priors for Color Image Restoration Parameterized by Lie Groups Acting on Pixel Values, A
* Class Prototypes based Contrastive Learning for Classifying Multi-Label and Fine-Grained Educational Videos
* Class Relationship Embedded Learning for Source-Free Unsupervised Domain Adaptation
* Class-Balancing Diffusion Models
* Class-Conditional Sharpness-Aware Minimization for Deep Long-Tailed Recognition
* Class-Incremental Exemplar Compression for Class-Incremental Learning
* Classification of Fish Species Using Multispectral Data from a Low-Cost Camera and Machine Learning
* Clicking Matters: Towards Interactive Human Parsing
* CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not
* CLIP is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation
* CLIP the Gap: A Single Domain Generalization Approach for Object Detection
* CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes
* CLIP-S4: Language-Guided Self-Supervised Semantic Segmentation
* CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Natural Language
* CLIP2: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data
* CLIP2Protect: Protecting Facial Privacy Using Text-Guided Makeup via Adversarial Latent Search
* CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP
* CLIPPING: Distilling CLIP-Based Models with a Student Base for Video-Language Retrieval
* CLIPPO: Image-and-Language Understanding from Pixels Only
* Closer Look at Geometric Temporal Dynamics for Face Anti-Spoofing, A
* Closer Look at Rehearsal-Free Continual Learning, A
* CloSET: Modeling Clothed Humans on Continuous Surface with Explicit Template Decomposition
* CLOTH4D: A Dataset for Clothed Human Reconstruction
* Clothed Human Performance Capture with a Double-layer Neural Radiance Fields
* Cloud Shadow Detection via Ray Casting with Probability Analysis Refinement Using Sentinel-2 Satellite Data
* Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-World
* Clover: Towards A Unified Video-Language Alignment and Fusion Model
* Cluster Structure Function, The
* Clutter and Interference Cancellation in River Surface Velocity Measurement with a Coherent S-Band Radar
* CLVOS23: A Long Video Object Segmentation Dataset for Continual Learning
* CNN-Based Framework for Enhancing 360° VR Experiences With Multisensorial Effects, A
* CNT-NeRF: Carbon Nanotube Forest Depth Layer Decomposition in SEM Imagery using Generative Adversarial Networks
* CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video-Text Dataset
* Co-Salient Object Detection with Uncertainty-Aware Group Exchange-Masking
* Co-SLAM: Joint Coordinate and Sparse Parametric Encodings for Neural Real-Time SLAM
* Co-speech Gesture Synthesis by Reinforcement Learning with Contrastive Pretrained Rewards
* Co-training $2^L$ Submodels for Visual Recognition
* Coaching a Teachable Student
* CODA-Prompt: COntinual Decomposed Attention-Based Prompting for Rehearsal-Free Continual Learning
* CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
* cognition-inspired trajectory prediction method for vehicles in interactive scenarios, A
* Coherent Concept-based Explanations in Medical Image and Its Application to Skin Lesion Diagnosis
* Coherent Image Animation Using Spatial-Temporal Correspondence
* Collaboration Helps Camera Overtake LiDAR in 3D Detection
* Collaborative Diffusion for Multi-Modal Face Generation and Editing
* Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies
* Collaborative Static and Dynamic Vision-Language Streams for Spatio-Temporal Video Grounding
* Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
* Collision-Free Adaptive Fuzzy Formation Control for Stochastic Nonlinear Multiagent Systems
* Color Backdoor: A Robust Poisoning Attack in Color Space
* Color-Coated Steel Sheet Roof Building Extraction from External Environment of High-Speed Rail Based on High-Resolution Remote Sensing Images
* Combining Implicit-Explicit View Correlation for Light Field Semantic Segmentation
* Combining Physics and Deep Learning Models to Simulate the Flight of a Golf Ball
* CoMFormer: Continual Learning in Semantic and Panoptic Segmentation
* Command-driven Articulated Object Understanding and Manipulation
* Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories
* Compacting Binary Neural Networks by Sparse Kernel Selection
* Comparative Study of Ground-Gridded and Satellite-Derived Formaldehyde during Ozone Episodes in the Chinese Greater Bay Area, A
* Comparative Study of the Atmospheric Gas Composition Detection Capabilities of FY-3D/HIRAS-I and FY-3E/HIRAS-II Based on Information Capacity
* Comparison of Soft Indicator and Poisson Kriging for the Noise-Filtering and Downscaling of Areal Data: Application to Daily COVID-19 Incidence Rates
* Compensation Learning in Semantic Segmentation
* Complementary Intrinsics from Neural Radiance Fields and CNNs for Outdoor Scene Relighting
* Complete 3D Human Reconstruction from a Single Incomplete Image
* Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning
* CompletionFormer: Depth Completion with Convolutions and Vision Transformers
* Complexity-guided Slimmable Decoder for Efficient Deep Video Compression
* Composed Image Retrieval via Cross Relation Network With Hierarchical Aggregation Transformer
* Compositor: Bottom-Up Clustering and Compositing for Robust Part and Object Segmentation
* Compound Expression Recognition In-the-wild with AU-assisted Meta Multi-task Learning
* Comprehensive and Delicate: An Efficient Transformer for Image Restoration
* Comprehensive Multi-Metric Index for Health Assessment of the Poyang Lake Wetland, A
* Comprehensive quality assessment of optical satellite imagery using weakly supervised video learning
* Comprehensive Survey and Taxonomy on Single Image Dehazing Based on Deep Learning, A
* Comprehensive Survey of Few-Shot Learning: Evolution, Applications, Challenges, and Opportunities, A
* Comprehensive Visual Features and Pseudo Labeling for Robust Natural Language-based Vehicle Retrieval
* Compressing Volumetric Radiance Fields to 1 MB
* Compression-Aware Video Super-Resolution
* Computational Challenges and Approaches for Electric Vehicles
* Computational Flash Photography through Intrinsics
* Computationally Budgeted Continual Learning: What Does Matter?
* Computer Vision-Based Analysis of Buildings and Built Environments: A Systematic Review of Current Approaches
* Condition-Adaptive Graph Convolution Learning for Skeleton-Based Gait Recognition
* Conditional Generation of Audio from Video via Foley Analogies
* Conditional Image-to-Video Generation with Latent Flow Diffusion Models
* Conditional Temporal Variational AutoEncoder for Action Video Prediction
* Conditional Text Image Generation with Diffusion Models
* Confidence-Aware Fusion Using Dempster-Shafer Theory for Multispectral Pedestrian Detection
* Confidence-Aware Personalized Federated Learning via Variational Expectation Maximization
* Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation
* Confusion Matrix for Evaluating Feature Attribution Methods, A
* Conjugate Product Graphs for Globally Optimal 2D-3D Shape Matching
* Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries
* Connecting Vision and Language with Video Localized Narratives
* ConQueR: Query Contrast Voxel-DETR for 3D Object Detection
* Consistency and Accuracy of CelebA Attribute Values
* Consistent Direct Time-of-Flight Video Depth Super-Resolution
* Consistent View Synthesis with Pose-Guided Diffusion Models
* Consistent-Teacher: Towards Reducing Inconsistent Pseudo-Targets in Semi-Supervised Object Detection
* Constrained Evolutionary Diffusion Filter for Monocular Endoscope Tracking
* ConStruct-VL: Data-Free Continual Structured VL Concepts Learning
* Constructing Deep Spiking Neural Networks from Artificial Neural Networks with Knowledge Distillation
* Contactless Respiratory Rate Monitoring For ICU Patients Based On Unsupervised Learning
* Containment Control of Autonomous Underwater Vehicles With Stochastic Environment Disturbances
* Content-Adaptive Downsampling in Convolutional Neural Networks
* Content-aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers
* Context De-Confounded Emotion Recognition
* Context-aware Alignment and Mutual Masking for 3D-Language Pre-training
* Context-Aware Pretraining for Efficient Blind Image Decomposition
* Context-Aware Relative Object Queries to Unify Video Instance and Panoptic Segmentation
* Context-Based Trit-Plane Coding for Progressive Image Compression
* Continual Detection Transformer for Incremental Object Detection
* Continual Domain Adaptation through Pruning-aided Domain-specific Weight Modulation
* Continual Learning for LiDAR Semantic Segmentation: Class-Incremental and Coarse-to-Fine strategies on Sparse Data
* Continual Semantic Segmentation with Automatic Memory Sample Selection
* Continuous Human Action Recognition for Human-Machine Interaction: A Review
* Continuous Intermediate Token Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation
* Continuous Landmark Detection with 3D Queries
* Continuous Pseudo-Label Rectified Domain Adaptive Semantic Segmentation with Implicit Neural Representations
* Continuous Sign Language Recognition with Correlation Network
* ContraNeRF: Generalizable Neural Radiance Fields for Synthetic-to-real Novel View Synthesis via Contrastive Learning
* Contrast, Stylize and Adapt: Unsupervised Contrastive Learning Framework for Domain Adaptive Semantic Segmentation
* Contrastive Grouping with Transformer for Referring Image Segmentation
* Contrastive Learning for Depth Prediction
* Contrastive Mean Teacher for Domain Adaptive Object Detectors
* Contrastive Semi-Supervised Learning for Underwater Image Restoration via Reliable Bank
* Controllable GAN Synthesis Using Non-Rigid Structure-from-Motion
* Controllable Light Diffusion for Portraits
* Controllable Mesh Generation Through Sparse Latent Point Diffusion Models
* Converging Channel Attention Mechanisms with Multilayer Perceptron Parallel Networks for Land Cover Classification
* ConvMLP: Hierarchical Convolutional MLPs for Vision
* ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
* ConVol-E: Continuous Volumetric Embeddings for Human-Centric Dense Correspondence Estimation
* Convolutional Neural Network-Based Approximation of Coverage Path Planning Results for Parking Lots
* ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing
* Cooperation or Competition: Avoiding Player Domination for Multi-Target Robustness via Adaptive Budgets
* CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching
* CoralStyleCLIP: Co-optimized Region and Layer Selection for Image Editing
* CoReFusion: Contrastive Regularized Fusion for Guided Thermal Super-Resolution
* Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning
* Correlation Pyramid Network for 3D Single Object Tracking
* Correlational Image Modeling for Self-Supervised Visual Pre-Training
* Correspondence Transformers with Asymmetric Feature Learning and Matching Flow Super-Resolution
* COT: Unsupervised Domain Adaptation with Clustering and Optimal Transport
* CoVIO: Online Continual Learning for Visual-Inertial Odometry
* CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation
* CP3: Channel Pruning Plug-in for Point-Based Networks
* CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability
* CRAFT: Concept Recursive Activation FacTorization for Explainability
* CrisisHateMM: Multimodal Analysis of Directed and Undirected Hate Speech in Text-Embedded Images from Russia-Ukraine Conflict
* Critical Learning Periods for Multisensory Integration in Deep Networks
* CRNet: A Fast Continual Learning Framework With Random Theory
* CrOC: Cross-View Online Clustering for Dense Visual Representation Learning
* Cross-Domain 3D Hand Pose Estimation with Dual Modalities
* Cross-Domain Image Captioning with Discriminative Finetuning
* Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences Between Pretrained Generative Models
* Cross-Guided Optimization of Radiance Fields with Multi-View Image Super-Resolution for High-Resolution Novel View Synthesis
* Cross-Image-Attention for Conditional Embeddings in Deep Metric Learning
* Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval
* Cross-View Hierarchy Network for Stereo Image Super-Resolution
* Crossing the Gap: Domain Generalization for Image Captioning
* Crowd3D: Towards Hundreds of People Reconstruction from a Single Image
* CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model
* CUDA: Convolution-Based Unlearnable Datasets
* CUF: Continuous Upsampling Filters
* Curricular Contrastive Regularization for Physics-Aware Single Image Dehazing
* Curricular Object Manipulation in LiDAR-Based Object Detection
* Curriculum Learning for Data-Efficient Vision-Language Alignment
* Curvature-Balanced Feature Manifold Learning for Long-Tailed Classification
* Cut and Learn for Unsupervised Object Detection and Instance Segmentation
* CutMIB: Boosting Light Field Super-Resolution via Multi-View Image Blending
* CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment
* CXTrack: Improving 3D Point Cloud Tracking with Contextual Information
* D2Former: Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-Based Transformers
* D3Former: Debiased Dual Distilled Transformer for Incremental Learning
* DA Wand: Distortion-Aware Selection Using Neural Mesh Parameterization
* DA-DETR: Domain Adaptive Detection Transformer with Information Fusion
* DAA: A Delta Age AdaIN operation for age estimation via binary code transformer
* DACNet: A Deep Automated Checkout Network with Selective Deblurring
* DaFKD: Domain-aware Federated Knowledge Distillation
* DANI-Net: Uncalibrated Photometric Stereo by Differentiable Shadow Handling, Anisotropic Reflectance Modeling, and Neural Inverse Rendering
* DARE-GRAM: Unsupervised Domain Adaptation Regression by Aligning Inverse Gram Matrices
* Dark Side of Dynamic Routing Neural Networks: Towards Efficiency Backdoor Injection, The
* DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks
* DartBlur: Privacy Preservation with Detection Artifact Suppression
* Data Fusion for Estimating High-Resolution Urban Heatwave Air Temperature
* Data preprocessing and feature selection techniques in gait recognition: A comparative study of machine learning and deep learning approaches
* Data-Based Perspective on Transfer Learning, A
* Data-Centric Solution to NonHomogeneous Dehazing via Vision Transformer, A
* Data-Driven Approach based on Dynamic Mode Decomposition for Efficient Encoding of Dynamic Light Fields, A
* Data-Driven Feature Tracking for Event Cameras
* Data-Efficient Large Scale Place Recognition with Graded Similarity Supervision
* Data-Free Knowledge Distillation via Feature Exchange and Activation Region Constraint
* Data-Free Model Pruning at Initialization via Expanders
* Data-Free Sketch-Based Image Retrieval
* Dataset Efficient Training with Model Ensembling
* DATE: Domain Adaptive Product Seeker for E-Commerce
* DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model
* Dawn of the Transformer Era in Speech Emotion Recognition: Closing the Valence Gap
* Daytime Sea Fog Identification Based on Multi-Satellite Information and the ECA-TransUnet Model
* DB-TASNet for disease diagnosis and lesion segmentation in medical images
* DBARF: Deep Bundle-Adjusting Generalizable Neural Radiance Fields
* DBH Estimation for Individual Tree: Two-Dimensional Images or Three-Dimensional Point Clouds?
* DC2: Dual-Camera Defocus Control by Learning to Refocus
* DCFace: Synthetic Face Generation with Dual Condition Diffusion Model
* Deadline-Aware Coded Computation Across Homogeneous Workers
* Dealing with Cross-Task Class Discrimination in Online Continual Learning
* DeAR: Debiasing Vision-Language Models with Additive Residuals
* DeCAtt: Efficient Vision Transformers with Decorrelated Attention Heads
* Decentralized Learning with Multi-Headed Distillation
* DeCo: Decomposition and Reconstruction for Compositional Temporal Grounding via Coarse-to-Fine Contrastive Ranking
* Decoding Visual Neural Representations by Multimodal Learning of Brain-Visual-Linguistic Features
* Decompose More and Aggregate Better: Two Closer Looks at Frequency Representation Learning for Human Motion Prediction
* Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization
* Decomposed Cross-Modal Distillation for RGB-based Temporal Action Detection
* Decomposed Soft Prompt Guided Fusion Enhancing for Compositional Zero-Shot Learning
* Decoupled Multimodal Distilling for Emotion Recognition
* Decoupled Semantic Prototypes enable learning from diverse annotation types for semi-weakly segmentation in expert-driven domains
* Decoupling Human and Camera Motion from Videos in the Wild
* Decoupling Learning and Remembering: a Bilevel Memory Framework with Knowledge Projection for Task-Incremental Learning
* Decoupling MaxLogit for Out-of-Distribution Detection
* Decoupling-and-Aggregating for Image Exposure Correction
* Deep Arbitrary-Scale Image Super-Resolution via Scale-Equivariance Pursuit
* Deep Convolutional Sparse Coding Networks for Interpretable Image Fusion
* Deep Curvilinear Editing: Commutative and Nonlinear Image Manipulation for Pretrained Deep Generative Model
* Deep Dehazing Powered by Image Processing Network
* Deep Depth Estimation from Thermal Image
* Deep Deterministic Uncertainty: A New Simple Baseline
* Deep Discriminative Spatial and Temporal Network for Efficient Video Deblurring
* Deep Dive into Gradients: Better Optimization for 3D Object Detection with Gradient-Corrected IoU Supervision
* Deep ensemble-based hard sample mining for food recognition
* Deep Factorized Metric Learning
* Deep Fair Clustering via Maximizing and Minimizing Mutual Information: Theory, Algorithm and Metric
* Deep Frequency Filtering for Domain Generalization
* Deep Gaussian Scale Mixture Prior for Image Reconstruction
* Deep Graph Reprogramming
* Deep Graph-based Spatial Consistency for Robust Non-rigid Point Cloud Registration
* Deep Hashing with Minimal-Distance-Separated Hash Centers
* Deep Hybrid Compression Network for Lidar Point Cloud Classification and Segmentation
* Deep Incomplete Multi-View Clustering with Cross-View Partial Sample and Prototype Alignment
* Deep Learning for Earthquake Disaster Assessment: Objects, Data, Models, Stages, Challenges, and Opportunities
* Deep Learning of Partial Graph Matching via Differentiable Top-K
* Deep Learning Video Classification of Lung Ultrasound Features Associated with Pneumonia
* deep learning-based approach to increase efficiency in the acquisition of ultrasonic non-destructive testing datasets, A
* Deep learning-based image enhancement for robust remote photoplethysmography in various illumination scenarios
* Deep Learning-Enabled Sleep Staging From Vital Signs and Activity Measured Using a Near-Infrared Video Camera
* Deep Long-Tailed Learning: A Survey
* deep neural network model with GCN and 3D convolutional network for short-term metro passenger flow forecasting, A
* Deep Polarization Reconstruction with PDAVIS Events
* Deep Prototypical-Parts Ease Morphological Kidney Stone Identification and are Competitively Robust to Photometric Perturbations
* Deep Random Projector: Accelerated Deep Image Prior
* Deep robust multi-channel learning subspace clustering networks
* Deep semantic image compression via cooperative network pruning
* Deep Semi-Supervised Metric Learning with Mixed Label Propagation
* Deep Stereo Video Inpainting
* Deep unfolding for hyper sharpening using a high-frequency injection module
* DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables
* DeepLSD: Line Segment Detection and Refinement with Deep Image Gradients
* DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
* DeepMAO: Deep Multi-scale Aware Overcomplete Network for Building Segmentation in Satellite Imagery
* DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization
* DeepRM: Deep Recurrent Matching for 6D Pose Refinement
* DeepSegmenter: Temporal Action Localization for Detecting Anomalies in Untrimmed Naturalistic Driving Videos
* DeepSim-Nets: Deep Similarity Networks for Stereo Image Matching
* DeepSmooth: Efficient and Smooth Depth Completion
* DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
* DeepVecFont-v2: Exploiting Transformers to Synthesize Vector Fonts with Higher Quality
* DeFeeNet: Consecutive 3D Human Motion Prediction with Deviation Feedback
* Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning
* Defending Fake via Warning: Universal Proactive Defense Against Face Manipulation
* Defending Low-Bandwidth Talking Head Videoconferencing Systems From Real-Time Puppeteering Attacks
* Defining and Quantifying the Emergence of Sparse Concepts in DNNs
* DeFlow: Self-supervised 3D Motion Estimation of Debris Flow
* Deformable Mesh Transformer for 3D Human Mesh Recovery
* Deformable Part Region Learning and Feature Aggregation Tree Representation for Object Detection
* DegAE: A New Pretraining Paradigm for Low-Level Vision
* DeGPR: Deep Guided Posterior Regularization for Multi-Class Cell Detection and Counting
* DejaVu: Conditional Regenerative Learning to Enhance Dense Prediction
* Delineating Peri-Urban Areas Using Multi-Source Geo-Data: A Neural Network Approach and SHAP Explanation
* Delivering Arbitrary-Modal Semantic Segmentation
* DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation
* Delving into Discrete Normalizing Flows on SO(3) Manifold for Probabilistic Rotation Modeling
* Delving into Shape-aware Zero-shot Semantic Segmentation
* Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint
* Demodulation Based Transformer for rPPG Generation and Heart Rate Estimation
* Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression
* Denoising diffusion models for out-of-distribution detection
* Denoising Diffusion Models for Plug-and-Play Image Restoration
* Dense Distinct Query for End-to-End Object Detection
* Dense Multitask Learning to Reconfigure Comics
* Dense Network Expansion for Class Incremental Learning
* Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
* Denseformer: A dense transformer framework for person re-identification
* Density Invariant Contrast Maximization for Neuromorphic Earth Observations
* Density Map Distillation for Incremental Object Counting
* Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection
* DepGraph: Towards Any Structural Pruning
* Depth Cue Enhancement and Guidance Network for RGB-D Salient Object Detection
* Depth Estimation from Camera Image and mmWave Radar Point Cloud
* Depth Estimation from Indoor Panoramas with Neural Scene Representation
* Depth-Guided Optimization of Neural Radiance Fields for Indoor Multi-View Stereo
* Design of a Digital Array Signal Processing System with Full Array Element
* DeSRF: Deformable Stylized Radiance Field
* DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection
* Detail-Preserving Self-Supervised Monocular Depth with Self-Supervised Structural Sharpening
* DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
* Detecting and Grounding Multi-Modal Media Manipulation
* Detecting Backdoors During the Inference Stage Based on Corruption Robustness Consistency
* Detecting Backdoors in Pre-trained Encoders
* Detecting Everything in the Open World: Towards Universal Object Detection
* Detecting Human-Object Contact in Images
* Detecting Mental Distresses Using Social Behavior Analysis in the Context of COVID-19: A Survey
* Detecting Underwater Discrete Scatterers in Echograms with Deep Learning-Based Semantic Segmentation
* Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding
* Detection of Forest Fires through Deep Unsupervised Learning Modeling of Sentinel-1 Time Series
* Detection of GAN Generated Image Using Color Gradient Representation
* Detection of Out-of-Distribution Samples Using Binary Neuron Activation Patterns
* Deterministic sampling in heterogeneous graph neural networks
* DETR with Additional Global Aggregation for Cross-domain Weakly Supervised Object Detection
* DETR-based Layered Clothing Segmentation and Fine-Grained Attribute Recognition
* DETRs with Hybrid Matching
* Developing a Pixel-Scale Corrected Nighttime Light Dataset (PCNL, 1992-2021) Combining DMSP-OLS and NPP-VIIRS
* Devil is in the Points: Weakly Semi-Supervised Instance Segmentation via Point-Guided Mask Representation, The
* Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image Segmentation and Out-of-Distribution Localization
* Devil's on the Edges: Selective Quad Attention for Scene Graph Generation
* DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects
* DF-Platter: Multi-Face Heterogeneous Deepfake Dataset
* Dialog Must Go On: Improving Visual Dialog via Generative Self-Training, The
* Dicer: Dialogue-Centric Representation for Knowledge-Grounded Dialogue through Contrastive Learning
* Dictionary-based histogram packing technique for lossless image compression
* DiffCollage: Parallel Generation of Large Content with Diffusion Models
* Differentiable Architecture Search with Random Features
* Differentiable Lens: Compound Lens Search over Glass Surfaces and Materials for Object Detection, The
* Differentiable Shadow Mapping for Efficient Inverse Graphics
* Difficulty Estimation with Action Scores for Computer Vision Tasks
* Difficulty-Based Sampling for Debiased Contrastive Representation Learning
* DiffPose: Toward More Reliable 3D Pose Estimation
* DiffRF: Rendering-Guided 3D Radiance Field Diffusion
* DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion
* DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation
* Diffusart: Enhancing Line Art Colorization with Conditional Diffusion Models
* Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
* Diffusion Models in Vision: A Survey
* Diffusion Probabilistic Model Made Slim
* Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding
* Diffusion-based Generation, Optimization, and Planning in 3D Scenes
* Diffusion-Based Signed Distance Fields for 3D Shape Generation
* Diffusion-Enhanced PatchMatch: A Framework for Arbitrary Style Transfer with Diffusion Models
* Diffusion-SDF: Text-to-Shape via Voxelized Diffusion
* DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffusion Models
* DiffusionRig: Learning Personalized Priors for Facial Appearance Editing
* DIFT: Dynamic Iterative Field Transforms for Memory Efficient Optical Flow
* DIFu: Depth-Guided Implicit Function for Clothed Human Reconstruction
* DiGA: Distil to Generalize and then Adapt for Domain Adaptive Semantic Segmentation
* DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection
* Digital Grid Model for Complex Time-Varying Environments in Civil Engineering Buildings, A
* Digital Twin Tracking Dataset (DTTD): A New RGB+Depth 3D Dataset for Longer-Range Object Tracking Applications
* Digital Twins for Protecting Cultural Heritage Against Climate Change
* Dilated Convolutional Transformer for High-Quality Image Deraining
* Dimensionality Reduction and Anomaly Detection Based on Kittler's Taxonomy: Analyzing Water Bodies in Two Dimensional Spaces
* Dimensionality-Varying Diffusion Process
* DINER: Depth-aware Image-based NEural Radiance fields
* DINER: Disorder-Invariant Implicit Neural Representation
* DINN360: Deformable Invertible Neural Network for Latitude-aware 360° Image Rescaling
* Dionysus: Recovering Scene Structures by Dividing into Semantic Pieces
* DIP: Dual Incongruity Perceiving Network for Sarcasm Detection
* DIPNet: Efficiency Distillation and Iterative Pruning for Image Super-Resolution
* Directional Connectivity-based Segmentation of Medical Images
* DISC: Learning from Noisy Labels via Dynamic Instance-Specific Selection and Correction
* DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training
* DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis
* Discovering Class-Specific GAN Controls for Semantic Image Synthesis
* Discovering the Real Association: Multimodal Causal Reasoning in Video Question Answering
* Discrete Point-Wise Attack is Not Enough: Generalized Manifold Adversarial Attack for Face Recognition
* Discriminable feature enhancement for unsupervised domain adaptation
* Discriminating Known from Unknown Objects via Structure-Enhanced Recurrent Variational AutoEncoder
* Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection
* Discriminator-Cooperated Feature Map Distillation for GAN Compression
* Disentangled Representation Learning for Unsupervised Neural Quantization
* Disentangling high-level factors and their features with conditional vector quantized VAEs
* Disentangling Local and Global Information for Light Field Depth Estimation
* Disentangling Neuron Representations with Concept Vectors
* Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness
* Disentangling Writer and Character Styles for Handwriting Generation
* DistgEPIT: Enhanced Disparity Learning for Light Field Image Super-Resolution
* Distilling Cross-Temporal Contexts for Continuous Sign Language Recognition
* Distilling Focal Knowledge from Imperfect Expert for 3D Object Detection
* Distilling knowledge for occlusion robust monocular 3D face reconstruction
* Distilling Neural Fields for Real-Time Articulated Shape Reconstruction
* Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification and Segmentation
* Distilling Vision-Language Pre-Training to Collaborate with Weakly-Supervised Temporal Action Localization
* DistilPose: Tokenized Pose Regression with Heatmap Distillation
* Distinct Impacts of Two Types of Developing El Nino-Southern Oscillations on Tibetan Plateau Summer Precipitation
* DistractFlow: Improving Optical Flow Estimation via Realistic Distractions and Pseudo-Labeling
* Distributed Fault-Tolerant Control for High-Speed Trains Based on Adaptive Terminal Sliding Mode Control
* Distribution Shift Inversion for Out-of-Distribution Prediction
* DisWOT: Student Architecture Search for Distillation WithOut Training
* Diurnal Variation Characteristics of Summer Precipitation and Related Statistical Analysis in the Ili Region, Xinjiang, Northwest China
* DivClust: Controlling Diversity in Deep Clustering
* Diverse 3D Hand Gesture Prediction from Body Dynamics by Bilateral Hand Disentanglement
* Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification
* Diversified and Multi-Class Controllable Industrial Defect Synthesis for Data Augmentation and Transfer
* Diversity is Definitely Needed: Improving Model-Agnostic Zero-shot Classification via Stable Diffusion
* Diversity-Aware Meta Visual Prompting
* Diversity-Measurable Anomaly Detection
* Divide and Adapt: Active Domain Adaptation via Customized Learning
* Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning
* DKM: Dense Kernelized Feature Matching for Geometry Estimation
* DKT: Diverse Knowledge Transfer Transformer for Class Incremental Learning
* DLBD: A Self-Supervised Direct-Learned Binary Descriptor
* DNA: Deformable Neural Articulations Network for Template-free Dynamic 3D Human Reconstruction from Monocular RGB-D Video
* DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos
* DNF: Decouple and Feedback Network for Seeing in the Dark
* DOA Estimation of Multiple Coherent Targets Using Weight Vector Orthogonal Decomposition in TDM-MIMO HF-Radar
* DOAD: Decoupled One Stage Action Detection Network
* Document Image Shadow Removal Guided by Color-Aware Background
* Does Image Anonymization Impact Computer Vision Training?
* Domain Expansion of Image Generators
* Domain Generalized Stereo Matching via Hierarchical Visual Transformation
* Domain-Class Correlation Decomposition for Generalizable Person Re-Identification
* Don't Be So Dense: Sparse-to-Sparse GAN Training Without Sacrificing Performance
* Don't FREAK Out: A Frequency-Inspired Approach to Detecting Backdoor Poisoned Samples in DNNs
* Don't Lie to Me! Robust and Efficient Explainability with Verified Perturbation Analysis
* DoNet: Deep De-Overlapping Network for Cytology Instance Segmentation
* Doppler and Pair-Wise Optical Flow Constrained 3D Motion Compensation for 3D Ultrasound Imaging
* Doubly Right Object Recognition: A Why Prompt for Visual Rationales
* DOVE: Learning Deformable 3D Objects by Watching Videos
* Downscaled Satellite Solar-Induced Chlorophyll Fluorescence Detects the Early Response of Sugarcane to Drought Stress in a Major Sugarcane-Planting Region of China
* DP-NeRF: Deblurred Neural Radiance Field with Physical Scene Priors
* DPE: Disentanglement of Pose and Expression for General Video Portrait Editing
* DPF: Learning Dense Prediction Fields with Weak Supervision
* DPOSE: Online Keypoint-CAM Guided Inference for Driver Pose Estimation with GMM-based Balanced Sampling
* DPPD: Deformable Polar Polygon Object Detection
* DR2: Diffusion-Based Robust Degradation Remover for Blind Face Restoration
* DrapeNet: Garment Generation and Self-Supervised Draping
* Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models
* DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
* Drone-Based Assessment of Marine Megafauna off Wave-Exposed Sandy Beaches
* DropKey for Vision Transformer
* DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks
* DSFNet: Dual Space Fusion Network for Occlusion-Robust 3D Dense Face Alignment
* DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets
* Dual Alignment Unsupervised Domain Adaptation for Video-Text Retrieval
* Dual Attention Poser: Dual Path Body Tracking Based on Attention
* Dual Branch Network for Emotional Reaction Intensity Estimation, A
* Dual cross knowledge distillation for image super-resolution
* Dual Vision Transformer
* dual-balanced network for long-tail distribution object detection, A
* Dual-bridging with Adversarial Noise Generation for Domain Adaptive rPPG Estimation
* Dual-Decoding branch U-shaped semantic segmentation network combining Transformer attention with Decoder: DBUNet, A
* Dual-Path Adaptation from Image to Video Transformers
* DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium
* DualRel: Semi-Supervised Mitochondria Segmentation from A Prototype Perspective
* DualVector: Unsupervised Vector Font Synthesis with Dual-Part Representation
* DyLiN: Making Light Field Networks Dynamic
* DYNAFED: Tackling Client Data Heterogeneity with Global Dynamics
* DynaMask: Dynamic Mask Selection for Instance Segmentation
* Dynamic Aggregated Network for Gait Recognition
* Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection
* Dynamic Conceptional Contrastive Learning for Generalized Category Discovery
* Dynamic Feature Queue for Surveillance Face Anti-spoofing via Progressive Training
* Dynamic Focus-aware Positional Queries for Semantic Segmentation
* Dynamic Generative Targeted Attacks with Pattern Injection
* Dynamic Graph Enhanced Contrastive Learning for Chest X-Ray Report Generation
* Dynamic Graph Learning with Content-guided Spatial-Frequency Relation Reasoning for Deepfake Detection
* Dynamic Inference Acceleration of 3D Point Cloud Deep Neural Networks Using Point Density and Entropy
* Dynamic Inference with Grounding Based Vision and Language Models
* Dynamic Multi-Scale Voxel Flow Network for Video Prediction, A
* Dynamic Multimodal Fusion
* Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies
* Dynamic Noise Injection for Facial Expression Recognition In-the-Wild
* Dynamic Response Measurement and Cable Tension Estimation Using an Unmanned Aerial Vehicle
* Dynamic Rigid Bodies Mining and Motion Estimation Based on Monocular Camera
* Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks
* Dynamic texture analysis using Temporal Gray scale Pattern Image for water surface velocity measurement
* Dynamically Instance-Guided Adaptation: A Backward-free Approach for Test-Time Domain Adaptive Semantic Segmentation
* DynamicDet: A Unified Dynamic Architecture for Object Detection
* DynamicStereo: Consistent Dynamic Depth from Stereo Videos
* DynaShare: Task and Instance Conditioned Parameter Sharing for Multi-Task Learning
* DyNCA: Real-Time Dynamic Texture Synthesis Using Neural Cellular Automata
* DynIBaR: Neural Dynamic Image-Based Rendering
* DynStatF: An Efficient Feature Fusion Strategy for LiDAR 3D Object Detection
* E2PN: Efficient SE(3)-Equivariant Point Network
* EatSense: Human centric, action recognition and localization dataset for understanding eating behaviors and quality of motion assessment
* EC2: Emergent Communication for Embodied Control
* ECA-ConvNeXt: A Rice Leaf Disease Identification Model Based on ConvNeXt
* ECON: Explicit Clothed humans Optimized via Normal integration
* EcoTTA: Memory-Efficient Continual Test-Time Adaptation via Self-Distilled Regularization
* EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
* Edge Computing and Sensor-Cloud: Overview, Solutions, and Directions
* Edge Real-Time Object Detection and DPU-Based Hardware Implementation for Optical Remote Sensing Images
* Edge-aware Regional Message Passing Controller for Image Forgery Localization
* EDGE: Editable Dance Generation From Music
* Edges to Shapes to Concepts: Adversarial Augmentation for Robust Vision
* EDICT: Exact Diffusion Inversion via Coupled Transformations
* EditableNeRF: Editing Topologically Varying Neural Radiance Fields by Key Points
* EFE: End-to-end Frame-to-Gaze Estimation
* EFEM: Equivariant Neural Field Expectation Maximization for 3D Object Segmentation Without Scene Supervision
* Effect of Water Vapor Transport on a Typical Rainstorm Process in the Arid Region of Southern Xinjiang: Observations and Numerical Simulations
* Effective Ambiguity Attack Against Passport-based DNN Intellectual Property Protection Schemes through Fully Connected Layer Substitution
* Effective Crop-Paste Pipeline for Few-shot Object Detection, An
* Effective hybrid attention network based on pseudo-color enhancement in ultrasound image segmentation
* Effective Motorcycle Helmet Object Detection Framework for Intelligent Traffic Safety, An
* Effects of Multi-Growth Periods UAV Images on Classifying Karst Wetland Vegetation Communities Using Object-Based Optimization Stacking Algorithm
* Effects of Topography on Vegetation Recovery after Shallow Landslides in the Obara and Shobara Districts, Japan
* Efficient and Explicit Modelling of Image Hierarchies for Image Restoration
* Efficient Bayesian Computation for Low-Photon Imaging Problems
* Efficient BDS-3 Long-Range Undifferenced Network RTK Positioning Algorithm, An
* Efficient Deep Models for Real-Time 4K Image Super-Resolution. NTIRE 2023 Benchmark and Report
* Efficient Frequency Domain-based Transformers for High-Quality Image Deblurring
* Efficient Geometry Surface Coding in V-PCC
* Efficient Hierarchical Entropy Model for Learned Point Cloud Compression
* Efficient Layer Compression Without Pruning
* Efficient Loss Function by Minimizing the Detrimental Effect of Floating-Point Errors on Gradient-Based Attacks
* Efficient Map Sparsification Based on 2D and 3D Discretized Grids
* Efficient Mask Correction for Click-Based Interactive Image Segmentation
* Efficient Movie Scene Detection using State-Space Transformers
* Efficient Multi-exposure Image Fusion via Filter-dominated Fusion and Gradient-driven Unsupervised Learning
* Efficient Multi-Lens Bokeh Effect Rendering and Transformation
* Efficient Multimodal Fusion via Interactive Prompting
* Efficient On-Device Training via Gradient Filtering
* efficient optimization of measurement matrix for compressive sensing, An
* Efficient RGB-T Tracking via Cross-Modality Distillation
* Efficient Robust Principal Component Analysis via Block Krylov Iteration and CUR Decomposition
* Efficient Robustness Assessment via Adversarial Spatial-Temporal Focus on Videos
* Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis
* Efficient Second-Order Plane Adjustment
* Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos
* Efficient unsupervised learning of biological images with compressed deep features
* Efficient Verification of Neural Networks Against LVM-Based Specifications
* Efficient View Synthesis and 3D-based Multi-Frame Denoising with Multiplane Feature Representations
* EfficientSCI: Densely Connected Network with Space-time Factorization for Large-scale Video Snapshot Compressive Imaging
* EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
* EGA-Depth: Efficient Guided Attention for Self-Supervised Multi-Camera Depth Estimation
* Ego-Body Pose Estimation via Ego-Head Pose Estimation
* Egocentric Audio-Visual Object Localization
* Egocentric Auditory Attention Localization in Conversations
* Egocentric Video Task Translation
* EKILA: Synthetic Media Provenance and Attribution for Generative Art
* Elastic Aggregation for Federated Optimization
* EMHIFormer: An Enhanced Multi-Hypothesis Interaction Transformer for 3D human pose estimation in video
* EmotiEffNets for Facial Processing in Video-based Valence-Arousal Prediction, Expression Classification and Action Unit Detection
* Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling, An
* EMT-NAS: Transferring architectural knowledge between tasks from different datasets
* Enabling High-Resolution Micro-Vibration Detection Using Ground-Based Synthetic Aperture Radar: A Case Study for Pipeline Monitoring
* End-to-End 3D Dense Captioning with Vote2Cap-DETR
* end-to-end anti-shaking multi-focus image fusion approach, An
* End-to-end Neuromorphic Lip Reading
* End-to-End Vectorized HD-map Construction with Piecewise Bézier Curve
* End-to-end Video Matting with Trimap Propagation
* Endpoints Weight Fusion for Class Incremental Semantic Segmentation
* Enemy of My Enemy is My Friend: Exploring Inverse Adversaries for Improving Adversarial Training, The
* Energy-Efficient Adaptive 3D Sensing
* Enhanced Multimodal Representation Learning with Cross-Modal KD
* Enhanced Stable View Synthesis
* Enhanced Thermal-RGB Fusion for Robust Object Detection
* Enhanced Training of Query-Based Object Detection via Selective Query Recollection
* Enhancing Deformable Local Features by Jointly Learning to Detect and Describe Keypoints
* Enhancing Indoor Air Quality Estimation: A Spatially Aware Interpolation Scheme
* Enhancing Multi-Camera People Tracking with Anchor-Guided Clustering and Spatio-Temporal Consistency ID Re-Assignment
* Enhancing Multiple Reliability Measures via Nuisance-Extended Information Bottleneck
* Enhancing Retail Checkout through Video Inpainting, YOLOv8 Detection, and DeepSort Tracking
* Enhancing Sea Surface Height Retrieval with Triple Features Using Support Vector Regression
* Enhancing the Self-Universality for Transferable Targeted Attacks
* Enlarging Instance-specific and Class-specific Information for Open-set Action Recognition
* Ensemble Method with Edge Awareness for Abnormally Shaped Nuclei Segmentation, An
* Ensemble Spatial and Temporal Vision Transformer for Action Units Detection
* Ensemble-based Blackbox Attacks on Dense Prediction
* Entropic Descent Archetypal Analysis for Blind Hyperspectral Unmixing
* Entropy Coding-based Lossless Compression of Asynchronous Event Sequences
* Envisioning a Next Generation Extended Reality Conferencing System with Efficient Photorealistic Human Rendering
* EPI-Guided Cost Construction Network for Light Field Disparity Estimation
* EqMotion: Equivariant Multi-Agent Motion Prediction with Invariant Interaction Reasoning
* Equiangular Basis Vectors
* Equivalent Transformation and Dual Stream Network Construction for Mobile Image Super-Resolution
* ERM-KTP: Knowledge-Level Machine Unlearning via Knowledge Transfer
* ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts
* Erudite Fine-Grained Visual Classification Model, An
* ES3Net: Accurate and Efficient Edge-based Self-Supervised Stereo Matching Network
* ESLAM: Efficient Dense SLAM System Based on Hybrid Representation of Signed Distance Fields
* Estimating and Maximizing Mutual Information for Knowledge Distillation
* Estimation and Development-Potential Analysis of Regional Housing in Ningbo City Based on High-Resolution Stereo Remote Sensing
* Estimation of Aerosol Layer Height from OLCI Measurements in the O2A-Absorption Band over Oceans
* Estimation of High-Resolution Soil Moisture in Canadian Croplands Using Deep Neural Network with Sentinel-1 and Sentinel-2 Images
* ETAD: Training Action Detection End to End on a Laptop
* Euclidean Direction Search Algorithm Based on Maximum Correntropy Criterion
* Euler Characteristic Transform Based Topological Loss for Reconstructing 3D Images from Single 2D Slices
* EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
* Evading DeepFake Detectors via Adversarial Statistical Consistency
* Evading Forensic Classifiers with Attribute-Conditioned Adversarial Faces
* EVAEF: Ensemble Valence-Arousal Estimation Framework in the Wild
* EVAL: Explainable Video Anomaly Localization
* Evaluating synthetic pre-Training for handwriting processing tasks
* Evaluation of Atmospheric Phase Correction Performance in 79 GHz Ground-Based Radar Interferometry: A Comparison with 17 GHz Ground-Based SAR Data
* Evaluation of Smartphone Tracking for Travel Behavior Studies, An
* Event-based Blur Kernel Estimation For Blind Motion Deblurring
* Event-based Blurry Frame Interpolation under Blind Exposure
* Event-Based Frame Interpolation with Ad-hoc Deblurring
* Event-Based Shape from Polarization
* Event-based Video Frame Interpolation with Cross-Modal Asymmetric Bidirectional Motion Fields
* Event-guided low light image enhancement via a dual branch GAN
* Event-Guided Person Re-Identification via Sparse-Dense Complementary Learning
* Event-IMU fusion strategies for faster-than-IMU estimation throughput
* EventNeRF: Neural Radiance Fields from a Single Colour Event Camera
* Evolution Characteristics and Causes: An Analysis of Urban Catering Cluster Spatial Structure
* Evolved Part Masking for Self-Supervised Learning
* EVREAL: Towards a Comprehensive Benchmark and Analysis Suite for Event-based Video Reconstruction
* EvShutter: Transforming Events for Unconstrained Rolling Shutter Correction
* Exact-NeRF: An Exploration of a Precise Volumetric Parameterization for Neural Radiance Fields
* EXCALIBUR: Encouraging and Evaluating Embodied Exploration
* Executing your Commands via Motion Diffusion in Latent Space
* Exemplar-FreeSOLO: Enhancing Unsupervised Instance Segmentation with Exemplars
* EXIF as Language: Learning Cross-Modal Associations between Images and Camera Metadata
* Expanding Synthetic Real-World Degradations for Blind Video Super Resolution
* Explaining Image Classifiers with Multiscale Directional Image Representation
* Explanation and Analysis of Spatio-Temporal Correlations: Towards a Conceptual Approach of a Semantic Comparison Visualization in a Use Case of Carparks in Mainz, Germany
* Explicit Boundary Guided Semi-Push-Pull Contrastive Learning for Supervised Anomaly Detection
* Explicit Visual Prompting for Low-Level Structure Segmentations
* Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection
* Exploiting the Complementarity of 2D and 3D Networks to Address Domain-Shift in 3D Semantic Segmentation
* Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR
* Explore the Power of Synthetic Data on Few-shot Object Detection
* Exploring and Exploiting Uncertainty for Incomplete Multi-View Classification
* Exploring and Utilizing Pattern Imbalance
* Exploring Compositional Visual Generation with Latent Classifier Guidance
* Exploring Data Geometry for Continual Learning
* Exploring Discontinuity for Video Frame Interpolation
* Exploring Diversified Adversarial Robustness in Neural Networks via Robust Mode Connectivity
* Exploring Effective Detection and Spatial Pattern of Prickly Pear Cactus (Opuntia Genus) from Airborne Imagery before and after Prescribed Fires in the Edwards Plateau
* Exploring Equity in a Hierarchical Medical Treatment System: A Focus on Determinants of Spatial Accessibility
* Exploring Expression-related Self-supervised Learning and Spatial Reserve Pooling for Affective Behaviour Analysis
* Exploring Incompatible Knowledge Transfer in Few-shot Image Generation
* Exploring Intra-class Variation Factors with Learnable Cluster Prompts for Semi-supervised Image Synthesis
* Exploring Joint Embedding Architectures and Data Augmentations for Self-Supervised Representation Learning in Event-Based Vision
* Exploring Large-scale Unlabeled Faces to Enhance Facial Expression Recognition
* Exploring Motion Ambiguity and Alignment for High-Quality Video Frame Interpolation
* Exploring Random Forest Machine Learning and Remote Sensing Data for Streamflow Prediction: An Alternative Approach to a Process-Based Hydrologic Modeling in a Snowmelt-Driven Watershed
* Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels
* Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language
* Exploring the Effectiveness of Lightweight Architectures for Face Anti-Spoofing
* Exploring the Importance of Pretrained Feature Extractors for Unsupervised Anomaly Detection and Localization
* Exploring the Potential of Multi-Temporal Crop Canopy Models and Vegetation Indices from Pleiades Imagery for Yield Estimation
* Exploring the Potential of Neural Dataset Search
* Exploring the potential of Siamese network for RGBT object tracking
* Exploring the Relationship Between Architectural Design and Adversarially Robust Generalization
* Exploring the Spatial Relationship between the Ecological Topological Network and Carbon Sequestration Capacity of Coastal Urban Ecosystems: A Case Study of Yancheng City, China
* Exploring the Utility of Self-Supervised Pretraining Strategies for the Detection of Absent Lung Sliding in M-Mode Lung Ultrasound
* Exploring Video Frame Redundancies for Efficient Data Sampling and Annotation in Instance Segmentation
* expOSE: Accurate Initialization-Free Projective Factorization using Exponential Regularization
* Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval
* Exposing Fine-Grained Adversarial Vulnerability of Face Anti-Spoofing Models
* Exposing GAN-Generated Profile Photos from Compact Embeddings
* Extended H-inf Particle Filter for Attitude Estimation Applied to Remote Sensing Satellite CBERS-4, The
* Extended Study of Human-like Behavior under Adversarial Training, An
* Extracting Class Activation Maps from Non-Discriminative Features as well
* Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
* F2-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories
* FAC: 3D Representation Learning via Foreground Aware Feature Contrast
* Face Animation with an Attribute-Guided Diffusion Model
* Face De-Occlusion With Deep Cascade Guidance Learning
* Face Image Lighting Enhancement Using a 3D Model
* Face Recognition Accuracy Across Demographics: Shining a Light Into the Problem
* Face Transformer: Towards High Fidelity and Accurate Face Swapping
* FaceLit: Neural 3D Relightable Faces
* Facial Expression Recognition Based on Multi-modal Features for Videos in the Wild
* Fair Federated Medical Image Segmentation via Client Contribution Estimation
* Fair Scratch Tickets: Finding Fair Sparse Networks without Weight Training
* Fake it Till You Make it: Learning Transferable Representations from Synthetic ImageNet Clones
* FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
* Fantastic Breaks: A Dataset of Paired 3D Scans of Real-World Broken Objects and Their Complete Counterparts
* Fashion-Specific Ambiguous Expression Interpretation with Partial Visual-Semantic Embedding
* FashionSAP: Symbols and Attributes Prompt for Fine-Grained Fashion Vision-Language Pre-Training
* FashionVQA: A Domain-Specific Visual Question Answering System
* Fast Contextual Scene Graph Generation with Unbiased Context Augmentation
* Fast GraspNeXt: A Fast Self-Attention Neural Network Architecture for Multi-task Learning in Computer Vision Tasks for Robotic Grasping on the Edge
* Fast Local Thickness
* Fast Matching Method for the SAR Images with Large Viewing Angles Based on Inertial Navigation Information and Neighborhood Structure Consensus, A
* Fast Monocular Scene Reconstruction with Global-Sparse Local-Dense Grids
* Fast Non-Local Attention network for light super-resolution
* Fast Point Cloud Generation with Straight Flows
* Fast Trajectory End-Point Prediction with Event Cameras for Reactive Robot Control
* Fast Vehicle Routing via Knowledge Transfer in a Reproducing Kernel Hilbert Space
* FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation
* Fault Diagnosis of Rotating Machinery Based on a Mutual Dimensionless Index and a Convolution Neural Network, A
* Fault diagnosis of ZDJ7 railway point machine based on improved DCNN and SVDD classification
* FBSNet: A Fast Bilateral Symmetrical Network for Real-Time Semantic Segmentation
* FCC: Feature Clusters Compression for Long-Tailed Visual Recognition
* FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER
* Feature Aggregated Queries for Transformer-Based Video Object Detectors
* Feature Alignment and Uniformity for Test Time Adaptation
* Feature Representation Learning with Adaptive Displacement Generation and Transformer Fusion for Micro-Expression Recognition
* Feature Selection as a Hedonic Coalition Formation Game for Arabic Topic Detection
* Feature Separation and Recalibration for Adversarial Robustness
* Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers
* FeatureBooster: Boosting Feature Descriptors with a Lightweight Neural Network
* FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning
* Federated Domain Generalization with Generalization Adjustment
* Federated Incremental Semantic Segmentation
* Federated Learning in Non-IID Settings Aided by Differentially Private Synthetic Data
* Federated Learning with Data-Agnostic Distribution Fusion
* Federated learning with l1 regularization
* FedSeg: Class-Heterogeneous Federated Learning for Semantic Segmentation
* FEND: A Future Enhanced Distribution-Aware Contrastive Learning Framework for Long-Tail Trajectory Prediction
* Few-Shot Class-Incremental Learning via Class-Aware Bilateral Distillation
* Few-Shot Depth Completion Using Denoising Diffusion Probabilistic Model
* Few-Shot Geometry-Aware Keypoint Localization
* Few-Shot Learning with Visual Distribution Calibration and Cross-Modal Distribution Alignment
* Few-shot logo detection
* Few-Shot Non-Line-of-Sight Imaging with Signal-Surface Collaborative Regularization
* Few-Shot Referring Relationships in Videos
* Few-shot Semantic Image Synthesis with Class Affinity Transfer
* FewSOME: One-Class Few Shot Anomaly Detection with Siamese Networks
* FF-Former: Swin Fourier Transformer for Nighttime Flare Removal
* FFCV: Accelerating Training by Removing Data Bottlenecks
* FFF: Fragment-Guided Flexible Fitting for Building Complete Protein Structures
* FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction
* FgbCNN: A unified bilinear architecture for learning a fine-grained feature representation in facial expression recognition
* FIANCEE: Faster Inference of Adversarial Networks via Conditional Early Exits
* Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
* Find My Astronaut Photo: Automated Localization and Georectification of Astronaut Photography
* Finding Geometric Models by Clustering in the Consensus Space
* Fine-grained Audible Video Description
* Fine-Grained Classification with Noisy Labels
* Fine-Grained Face Swapping Via Regional GAN Inversion
* Fine-grained Image-Text Matching by Cross-modal Hard Aligning Network
* Fine-tuned CLIP Models are Efficient Video Learners
* Finetune like you pretrain: Improved finetuning of zero-shot vision models
* First Nighttime Light Spectra by Satellite: By EnMAP
* FishDreamer: Towards Fisheye Semantic Completion via Unified Image Outpainting and Segmentation
* FishEye8K: A Benchmark and Dataset for Fisheye Camera Object Detection
* FitMe: Deep Photorealistic 3D Morphable Model Avatars
* Fix the Noise: Disentangling Source Feature for Controllable Domain Translation
* FJMP: Factorized Joint Multi-Agent Motion Prediction over Learned Directed Acyclic Interaction Graphs
* FLAG3D: A 3D Fitness Activity Dataset with Language Instruction
* FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer
* FLEX: Full-Body Grasping Without Full-Body Grasps
* Flexible-Cm GAN: Towards Precise 3D Dose Prediction in Radiotherapy
* Flexible-Modal Face Anti-Spoofing: A Benchmark
* FlexiCurve: Flexible Piecewise Curves Estimation for Photo Retouching
* FlexiViT: One Model for All Patch Sizes
* FlexNeRF: Photorealistic Free-viewpoint Rendering of Moving Humans from Sparse Views
* FLIGHT Mode On: A Feather-Light Network for Low-Light Image Enhancement
* Flow cytometry with event-based vision and spiking neuromorphic hardware
* Flow Supervision for Deformable NeRF
* FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation
* FlowGrad: Controlling the Output of Generative ODEs with Gradients
* Focus On Details: Online Multi-Object Tracking with Diverse Fine-Grained Representation
* Focused and Collaborative Feedback Integration for Interactive Image Segmentation
* Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation
* Four-view Geometry with Unknown Radial Distortion
* Frame Flexible Network
* Frame Interpolation Transformer and Uncertainty Guidance
* Frame Level Emotion Guided Dynamic Facial Expression Recognition with Emotion Grouping
* Frame-Event Alignment and Fusion Network for High Frame Rate Tracking
* FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding
* FreeNeRF: Improving Few-Shot Neural Rendering with Free Frequency Regularization
* FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation
* Freestyle Layout-to-Image Synthesis
* FreqHPT: Frequency-aware attention and flow fusion for Human Pose Transfer
* Frequency Increment Design Method of MR-FDA-MIMO Radar for Interference Suppression
* Frequency Tracker for Unsupervised Heart Rate Estimation
* Frequency-Modulated Point Cloud Rendering with Easy Editing
* Fresnel Microfacet BRDF: Unification of Polari-Radiometric Surface-Body Reflection
* From Images to Textual Prompts: Zero-shot Visual Question Answering with Frozen Large Language Models
* From Node Interaction to Hop Interaction: New Effective and Scalable Graph Learning Paradigm
* From Reactive to Active Sensing: A Survey on Information Gathering in Decision-Theoretic Planning
* FRR-Net: A Real-Time Blind Face Restoration and Relighting Network
* Frugal event data: how small is too small? A human performance assessment with shrinking data
* Frustratingly Easy Regularization on Representation Can Boost Deep Reinforcement Learning
* FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection
* FsaNet: Frequency Self-Attention for Semantic Segmentation
* FSformer: Fast-Slow Transformer for video action recognition
* Full or Weak Annotations? An Adaptive Strategy for Budget-Constrained Annotation Campaigns
* Full-Body Cardiovascular Sensing with Remote Photoplethysmography
* Full-Spectrum Out-of-Distribution Detection
* Fully Self-Supervised Depth Estimation from Defocus Clue
* Fully-Binarized Distance Computation based On-device Few-Shot Learning for XR applications
* Fusing Pre-Trained Language Models with Multimodal Prompts through Reinforcement Learning
* Fusion of Identification Information from ESM Sensors and Radars Using Dezert-Smarandache Theory Rules
* Fusion-SUNet: Spatial Layout Consistency for 3D Semantic Segmentation
* FUTR3D: A Unified Sensor Fusion Framework for 3D Detection
* Fuzzy Positive Learning for Semi-Supervised Semantic Segmentation
* G-MSM: Unsupervised Multi-Shape Matching with Graph-Based Affinity Priors
* Gait Recognition from Fisheye Images
* GaitGCI: Generative Counterfactual Intervention for Gait Recognition
* Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second
* GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
* GamutMLP: A Lightweight MLP for Color Loss Recovery
* GAN-based Vision Transformer for High-Quality Thermal Image Enhancement
* GANHead: Towards Generative Animatable Neural Head Avatars
* GANmouflage: 3D Object Nondetection with Texture Fields
* GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts
* GarmentTracking: Category-Level Garment Pose Tracking
* Gate-Shift-Fuse for Video Action Recognition
* Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement
* Gated Stereo: Joint Depth Estimation from Gated and Wide-Baseline Active Stereo Cues
* Gatha: Relational Loss for enhancing text-based style transfer
* Gaussian Label Distribution Learning for Spherical Image Object Detection
* GazeCaps: Gaze Estimation with Self-Attention-Routed Capsules
* Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
* GazeNeRF: 3D-Aware Gaze Redirection with Neural Radiance Fields
* GCFAgg: Global and Cross-View Feature Aggregation for Multi-View Clustering
* GCoNet+: A Stronger Group Collaborative Co-Salient Object Detector
* GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds
* GEN: Pushing the Limits of Softmax-Based Out-of-Distribution Detection
* GeneCIS: A Benchmark for General Conditional Image Similarity
* General Regret Bound of Preconditioned Gradient Method for DNN Training, A
* Generalist: Decoupling Natural and Robust Generalization
* Generalizable Implicit Neural Representations via Instance Pattern Composers
* Generalizable Local Feature Pre-training for Deformable Shape Analysis
* Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation
* Generalized Decoding for Pixel, Image, and Language
* Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process
* Generalized Framework for Video Instance Segmentation, A
* Generalized Relation Modeling for Transformer Tracking
* Generalized UAV Object Detection via Frequency Domain Disentanglement
* Generalizing Dataset Distillation via Deep Generative Prior
* Generating Adversarial Attacks in the Latent Space
* Generating Adversarial Samples in Mini-Batches May Be Detrimental To Adversarial Robustness
* Generating Aligned Pseudo-Supervision from Non-Aligned Data for Image Restoration in Under-Display Camera
* Generating Anomalies for Video Anomaly Detection with Prompt-based Feature Mapping
* Generating Features with Increased Crop-Related Diversity for Few-Shot Object Detection
* Generating Holistic 3D Human Motion from Speech
* Generating Human Motion from Textual Descriptions with Discrete Representations
* Generating Part-Aware Editable 3D Shapes without 3D Supervision
* Generation-based contrastive model with semantic alignment for generalized zero-shot learning
* Generative Bias for Robust Visual Question Answering
* Generative Diffusion Prior for Unified Image Restoration and Enhancement
* Generative Reasoning Integrated Label Noise Robust Deep Image Representation Learning
* Generative Semantic Segmentation
* Generic-to-Specific Distillation of Masked Autoencoders
* Genie: Show Me the Data for Quantization
* GenSim: Unsupervised Generic Garment Simulator
* Geographical Information System Based Spatial and Statistical Analysis of the Green Areas in the Cities of Abha and Bisha for Environmental Sustainability
* GeoLayoutLM: Geometric Pre-training for Visual Information Extraction
* Geological Hazard Assessment of Secondary Collapses Due to Volcanic Earthquakes on Changbai Mountain in China
* GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training
* Geometric and Photometric Exploration of GAN and Diffusion Synthesized Faces, A
* Geometric Visual Similarity Learning in 3D Medical Image Self-Supervised Pre-training
* Geometry and Uncertainty-Aware 3D Point Cloud Class-Incremental Semantic Segmentation
* Geometry Enhanced Reference-based Image Super-resolution
* Geometry- and Accuracy-Preserving Random Forest Proximities
* GeoMultiTaskNet: remote sensing unsupervised domain adaptation using geographical coordinates
* GeoMVSNet: Learning Multi-View Stereo with Geometry Perception
* GeoNet: Benchmarking Unsupervised Adaptation across Geographies
* Geophysics in Antarctic Research: A Bibliometric Analysis
* Geospatial Assessment of Managed Aquifer Recharge Potential Sites in Punjab, Pakistan
* GeoVLN: Learning Geometry-Enhanced Visual Representation with Slot Attention for Vision-and-Language Navigation
* Gesture image recognition method based on DC-Res2Net and a feature fusion attention module
* GFIE: A Dataset and Baseline for Gaze-Following from 2D to 3D in Indoor Environments
* GFMRC: A machine reading comprehension model for named entity recognition
* GFNet: Global Filter Networks for Visual Recognition
* GFPose: Learning 3D Human Pose Prior with Gradient Fields
* Giga-SSL: Self-Supervised Learning for Gigapixel Images
* GINA-3D: Learning to Generate Implicit Neural Assets in the Wild
* GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods
* GKEAL: Gaussian Kernel Embedded Analytic Learning for Few-Shot Class Incremental Task
* Glacier Change and Its Influencing Factors in the Northern Part of the Kunlun Mountains
* Glass Wool Defect Detection Using an Improved YOLOv5
* GlassesGAN: Eyewear Personalization Using Synthetic Appearance Discovery and Targeted Subspace Modeling
* GLeaD: Improving GANs with A Generator-Leading Task
* GLIGEN: Open-Set Grounded Text-to-Image Generation
* Global Aligned Structured Sparsity Learning for Efficient Image Super-Resolution
* Global and Local Mixture Consistency Cumulative Learning for Long-tailed Visual Recognitions
* Global centralised and structured discriminative non-negative matrix factorisation for hyperspectral unmixing
* Global Digital Elevation Model Comparison Criteria: An Evident Need to Consider Their Application
* Global guidance-based integration network for salient object detection in low-light images
* Global Motion Understanding in Large-Scale Video Object Segmentation
* Global Systematic Review of Improving Crop Model Estimations by Assimilating Remote Sensing Data: Implications for Small-Scale Agricultural Systems, A
* Global Transformer and Dual Local Attention Network via Deep-Shallow Hierarchical Feature Fusion for Retinal Vessel Segmentation
* Global Vision Transformer Pruning with Hessian-Aware Saliency
* Global-Local Tracking Framework Driven by Both Motion and Appearance for Infrared Anti-UAV, A
* Global-to-Local Modeling for Video-Based 3D Human Pose and Shape Estimation
* Glocal Energy-based Learning for Few-Shot Open-Set Recognition
* Gloss Attention for Gloss-free Sign Language Translation
* GM-NeRF: Learning Generalizable Model-Based Neural Radiance Fields from Multi-View Images
* Good is Bad: Causality Inspired Cloth-debiasing for Cloth-changing Person Re-identification
* GP-VTON: Towards General Purpose Virtual Try-On via Collaborative Local-Flow Global-Parsing Learning
* GPr-Net: Geometric Prototypical Network for Point Cloud Few-Shot Learning
* GPR-Net: Multi-view Layout Estimation via a Geometry-aware Panorama Registration Network
* Grad-PU: Arbitrary-Scale Point Cloud Upsampling via Gradient Descent with Learned Distance Functions
* GradICON: Approximate Diffeomorphisms via Gradient Inverse Consistency
* Gradient Attention Balance Network: Mitigating Face Recognition Racial Bias via Gradient Attention
* Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization
* Gradient-based Uncertainty Attribution for Explainable Bayesian Deep Learning
* GradMA: A Gradient-Memory-based Accelerated Federated Learning with Alleviated Catastrophic Forgetting
* GradMDM: Adversarial Attack on Dynamic Networks
* Graph Representation for Order-aware Visual Transformation
* Graph Transformer GANs for Graph-Constrained House Generation
* Graph-CoVis: GNN-Based Multi-View Panorama Global Pose Estimation
* Graphics Capsule: Learning Hierarchical 3D Face Representations from 2D Images
* GraVoS: Voxel Selection for 3D Point-Cloud Detection
* GRES: Generalized Referring Expression Segmentation
* Grid-guided Neural Radiance Fields for Large Urban Scenes
* Ground-Truth Free Meta-Learning for Deep Compressive Sampling
* Grounding Counterfactual Explanation of Image Classifiers to Textual Concept Space
* Grouping by Center: Predicting Centripetal Offsets for the Bottom-up Human Pose Estimation
* GrowSP: Unsupervised Semantic Segmentation of 3D Point Clouds
* gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction
* Guaranteed Tensor Recovery Fused Low-rankness and Smoothness
* Guest Editorial: Learning from limited annotations for computer vision tasks
* Guided Depth Map Super-Resolution: A Survey
* Guided Depth Super-Resolution by Deep Anisotropic Diffusion
* Guided Local Feature Matching with Transformer
* Guided Recommendation for Model Fine-Tuning
* Guiding Pseudo-labels with Uncertainty Estimation for Source-free Unsupervised Domain Adaptation
* GW-net: An efficient grad-CAM consistency neural network with weakening of random erasing features for semi-supervised person re-identification
* H2ONet: Hand-Occlusion-and-Orientation-Aware Network for Real-Time 3D Hand Mesh Reconstruction
* HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
* Habitat-Matterport 3D Semantics Dataset
* HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling
* Half a Century of Oceans from Space: Features and Futures
* HaLP: Hallucinating Latent Positives for Skeleton-based Self-Supervised Learning of Actions
* Ham2Pose: Animating Sign Language Notation into Pose Sequences
* Hamming Similarity and Graph Laplacians for Class Partitioning and Adversarial Image Detection
* Hand Avatar: Free-Pose Hand Animation and Rendering from Monocular Video
* Handheld Burst Super-Resolution Meets Multi-Exposure Satellite Imagery
* HandNeRF: Neural Radiance Fields for Animatable Interacting Hands
* HandsOff: Labeled Dataset Generation With No Additional Human Annotations
* handwritten ancient text detector based on improved feature pyramid network, A
* Handwritten Text Generation from Visual Archetypes
* Handy: Towards a High Fidelity 3D Hand Shape and Appearance Model
* Hard Patches Mining for Masked Image Modeling
* Hard Sample Matters a Lot in Zero-Shot Quantization
* Hard-negative Sampling with Cascaded Fine-Tuning Network to Boost Flare Removal Performance in the Nighttime Images
* Hardware-aware NAS by Genetic Optimisation with a Design Space Exploration Simulator
* Hardware-Aware Pruning for FPGA Deep Learning Accelerators
* Harmonious Feature Learning for Interactive Hand-Object Pose Estimation
* Harmonious Teacher for Cross-Domain Object Detection
* Harnessing the Power of Text-image Contrastive Models for Automatic Detection of Online Misinformation
* HARP: Personalized Hand Reconstruction from a Monocular RGB Video
* HazardNet: Road Debris Detection by Augmentation of Synthetic Models
* HDR Imaging with Spatially Varying Signal-to-Noise Ratios
* HDR video synthesis by a nonlocal regularization variational model
* Heat Diffusion Based Multi-Scale and Geometric Structure-Aware Transformer for Mesh Segmentation
* HelixSurf: A Robust and Efficient Neural Implicit Surface Learning of Indoor Scenes with Iterative Intertwined Regularization
* Helmet Rule Violation Detection for Motorcyclists using a Custom Tracking Framework and Advanced Object Detection Techniques
* Hessian Distributed Ant Optimized Perron-Frobenius Eigen Centrality for Social Networks
* Heterogeneous Continual Learning
* HexPlane: A Fast Representation for Dynamic Scenes
* HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation
* HGNet: Learning Hierarchical Geometry from Points, Edges, and Surfaces
* Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery from Sparse Image Ensemble
* Hi4D: 4D Instance Segmentation of Close Human Interaction
* Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision
* HIER: Metric Learning Beyond Class Labels via Hierarchical Regularization
* Hierarchical B-Frame Video Coding Using Two-Layer CANF Without Motion Coding
* Hierarchical Clustering and Refinement for Generalized Multi-Camera Person Tracking
* Hierarchical cooperative eco-driving control for connected autonomous vehicle platoon at signalized intersections
* Hierarchical Deep Reinforcement Learning Framework for 6-DOF UCAV Air-to-Air Combat, A
* Hierarchical deep semantic alignment for cross-domain 3D model retrieval
* Hierarchical Dense Correlation Distillation for Few-Shot Segmentation
* Hierarchical Discriminative Learning Improves Visual Representations of Biomedical Microscopy
* Hierarchical Explanations for Video Action Recognition
* Hierarchical Fine-Grained Image Forgery Detection and Localization
* Hierarchical Neural Memory Network for Low Latency Event Processing
* Hierarchical Prompt Learning for Multi-Task Learning
* Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images, A
* Hierarchical Semantic Contrast for Scene-aware Video Anomaly Detection
* Hierarchical Semantic Correspondence Networks for Video Paragraph Grounding
* Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection
* Hierarchical supervisions with two-stream network for Deepfake detection
* Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos
* Hierarchical Video-Moment Retrieval and Step-Captioning
* HierVL: Learning Hierarchical Video-Language Embeddings
* High Fidelity 3D Hand Shape Reconstruction via Scalable Graph Frequency Decomposition
* High-efficiency Device-Cloud Collaborative Transformer Model
* High-fidelity 3D Face Generation from Natural Language Descriptions
* High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization
* High-fidelity 3D Human Digitization from Single 2K Resolution Images
* High-Fidelity and Freely Controllable Talking Head Video Generation
* High-Fidelity Clothed Avatar Reconstruction from a Single Image
* High-fidelity Event-Radiance Recovery via Transient Event Frequency
* High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors
* High-Fidelity Generalized Emotional Talking Face Generation with Multi-Modal Emotion Space Learning
* High-Fidelity Guided Image Synthesis with Latent Diffusion Models
* High-Frequency Stereo Matching Network
* High-level context representation for emotion recognition in images
* High-Order Correlation-Guided Slide-Level Histology Retrieval With Self-Supervised Hashing
* High-Perceptual Quality JPEG Decoding via Posterior Sampling
* High-Res Facial Appearance Capture from Polarized Smartphone Images
* High-Resolution and Wide-Swath 3D Imaging for Urban Areas Based on Distributed Spaceborne SAR
* High-Resolution Image Products Acquired from Mid-Sized Uncrewed Aerial Systems for Land-Atmosphere Studies
* High-resolution image reconstruction with latent diffusion models from human brain activity
* High-Resolution National-Scale Mapping of Paddy Rice Based on Sentinel-1/2 Data
* High-Resolution Synthetic RGB-D Datasets for Monocular Depth Estimation
* Highly Confident Local Structure Based Consensus Graph Learning for Incomplete Multi-view Clustering
* Hint-Aug: Drawing Hints from Foundation Vision Transformers towards Boosted Few-shot Parameter-Efficient Tuning
* Histopathology Whole Slide Image Analysis with Heterogeneous Graph Representation Learning
* HMM-Based Map Matching and Spatiotemporal Analysis for Matching Errors with Taxi Trajectories
* HNeRV: A Hybrid Neural Representation for Videos
* HNSSL: Hard Negative-Based Self-Supervised Learning
* HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models
* HOLODIFFUSION: Training a 3D Diffusion Model Using 2D Images
* Homography based Player Identification in Live Sports
* HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics
* HOTNAS: Hierarchical Optimal Transport for Neural Architecture Search
* HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising
* How can objects help action recognition?
* How Do Label Errors Affect Thin Crack Detection by DNNs
* How Do the Start Date, End Date, and Frequency of Precipitation Change across China under Warming?
* How Efficient Are Today's Continual Learning Algorithms?
* How many dimensions are required to find an adversarial example?
* How Many Events Make an Object? Improving Single-frame Object Detection on the 1 Mpx Dataset
* How to Backdoor Diffusion Models?
* How to Prevent the Continuous Damage of Noises to Model Training?
* How to Prevent the Poor Performance Clients for Personalized Federated Learning?
* How You Feelin'? Learning Emotions and Mental States in Movie Scenes
* HRDFuse: Monocular 360° Depth Estimation by Collaboratively Learning Holistic-with-Regional Depth Distributions
* HS-Pose: Hybrid Scope Feature Extraction for Category-level Object Pose Estimation
* Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-Shot Learning with Hyperspherical Embeddings
* HUGNet: Hemi-Spherical Update Graph Neural Network applied to low-latency event-based optical flow
* Human Body Shape Completion with Implicit Shape and Flow Learning
* Human Gesture and Gait Analysis for Autism Detection
* Human Guided Ground-Truth Generation for Realistic Image Super-Resolution
* Human Pose as Compositional Tokens
* Human Pose Estimation in Extremely Low-Light Conditions
* Human Pose Estimation in Monocular Omnidirectional Top-View Images
* Human Spine Motion Capture using Perforated Kinesiology Tape
* Human Vision Based 3D Point Cloud Semantic Segmentation of Large-Scale Outdoor Scenes
* Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes
* HumanBench: Towards General Human-Centric Perception with Projector Assisted Pretraining
* HumanGen: Generating Human Radiance Fields with Explicit Priors
* HuManiFlow: Ancestor-Conditioned Normalising Flows on SO(3) Manifolds for Human Pose and Shape Distribution Estimation
* Humans as Light Bulbs: 3D Human Reconstruction from Thermal Reflection
* Hunting Sparsity: Density-Guided Contrastive Learning for Semi-Supervised Semantic Segmentation
* Hybrid Active Learning via Deep Clustering for Video Action Detection
* Hybrid Binary Dragonfly Algorithm with an Adaptive Directed Differential Operator for Feature Selection, A
* Hybrid Machine Learning Approach for Evapotranspiration Estimation of Fruit Tree in Agricultural Cyber-Physical Systems
* Hybrid Neural Rendering for Large-Scale Scenes with Motion Blur
* Hybrid Transformer and CNN Attention Network for Stereo Image Super-resolution
* Hyperbolic Contrastive Learning for Visual Representations beyond Objects
* HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering
* HyperMatch: Noise-Tolerant Semi-Supervised Learning via Relaxed Contrastive Constraint
* HyperReel: High-Fidelity 6-DoF Video with Ray-Conditioned Sampling
* Hyperspectral Image Classification via Spatial Shuffle-Based Convolutional Neural Network
* Hyperspectral Image Super-Resolution via Knowledge-Driven Deep Unrolling and Transformer Embedded Convolutional Recurrent Neural Network
* Hyperspherical Embedding for Point Cloud Completion
* HypLiLoc: Towards Effective LiDAR Pose Regression with Hyperbolic Fusion
* I2-SDF: Intrinsic Indoor Scene Reconstruction and Editing via Raytracing in Neural SDFs
* I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification
* IBEM dataset: A large printed scientific image dataset for indexing and searching mathematical expressions, The
* ICESat-2 for Coastal MSS Determination: Evaluation in the Norwegian Coastal Zone
* iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-training for Visual Recognition
* Identification and Optimization of County-Level Ecological Spaces under the Dual-Carbon Target: A Case Study of Shaanxi Province, China
* Identification of Ecological Restoration Approaches and Effects Based on the OO-CCDC Algorithm in an Ecologically Fragile Region
* Identification of Urban Functional Zones Based on POI Density and Marginalized Graph Autoencoder
* Identifying and Monitoring Gardens in Urban Areas Using Aerial and Satellite Imagery
* Identifying Conditioning Factors and Predictors of Conflict Likelihood for Machine Learning Models: A Literature Review
* Identity-driven Three-Player Generative Adversarial Network for Synthetic-based Face Recognition
* Identity-Preserving Talking Face Generation with Landmark and Appearance Priors
* IDGI: A Framework to Eliminate Explanation Noise from Integrated Gradients
* iDisc: Internal Discretization for Monocular Depth Estimation
* IFSeg: Image-free Semantic Segmentation via Vision-Language Model
* Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes
* Image as a Foreign Language: BEIT Pretraining for Vision and Vision-Language Tasks
* Image Cropping with Spatial-aware Feature and Rank Consistency
* Image Denoising: The Deep Learning Revolution and Beyond: A Survey Paper
* Image Inpainting with Hypergraphs for Resolution Improvement in Scanning Acoustic Microscopy
* Image Quality Assessment Dataset for Portraits, An
* Image Quality-aware Diagnosis via Meta-knowledge Co-embedding
* Image Recovery for Blind Polychromatic Ptychography
* Image Reference-guided Fashion Design with Structure-aware Transfer by Diffusion Models
* Image Steganalysis Against Adversarial Steganography by Combining Confidence and Pixel Artifacts
* Image Stitching With Manifold Optimization
* Image Super-Resolution Using T-Tetromino Pixels
* ImageBind: One Embedding Space to Bind Them All
* Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
* ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing
* Images Speak in Images: A Generalist Painter for In-Context Visual Learning
* Imagic: Text-Based Real Image Editing with Diffusion Models
* Imaging a Moving Point Source from Multifrequency Data Measured at One and Sparse Observation Directions (Part I): Far-Field Case
* Imitation Learning as State Matching via Differentiable Physics
* IMP: Iterative Matching and Pose Estimation with Adaptive Pooling
* Impact of Pseudo Depth on Open World Object Segmentation with Minimal User Guidance
* Impact of the tyre dynamics on autonomous vehicle path following control with front wheel steering and differential motor torque
* Impact of Uncertainty Estimation of Hydrological Models on Spectral Downscaling of GRACE-Based Terrestrial and Groundwater Storage Variation Estimations
* Implications of Solution Patterns on Adversarial Robustness
* Implicit 3D Human Mesh Recovery using Consistency with Pose and Shape from Unseen-view
* Implicit Diffusion Models for Continuous Super-Resolution
* Implicit Epipolar Geometric Function based Light Field Continuous Angular Representation
* Implicit Identity Driven Deepfake Face Swapping Detection
* Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization
* Implicit Neural Head Synthesis via Controllable Local Deformation Fields
* Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving
* Implicit Surface Contrastive Clustering for LiDAR Point Clouds
* Implicit View-Time Interpolation of Stereo Videos Using Multi-Plane Disparities and Non-Uniform Coordinates
* Importance First: Generating Scene Graph of Human Interest
* Improved Adaptive Sparrow Search Algorithm for TDOA-Based Localization, An
* Improved Association Pipeline for Multi-Person Tracking, An
* improved checkerboard detection algorithm based on adaptive filters, An
* Improved Distribution Matching for Dataset Condensation
* Improved Multi-Frame Coherent Integration Algorithm for Heterogeneous Radar, An
* Improved On-Orbit MTF Measurement Method Based on Point Source Arrays
* Improved prototypical network for active few-shot learning
* Improved Test-Time Adaptation for Domain Generalization
* Improvements to Image Reconstruction-Based Performance Prediction for Semantic Segmentation in Highly Automated Driving
* Improving Automatic Target Recognition in Low Data Regime using Semi-Supervised Learning and Generative Data Augmentation
* Improving Beam Alignment Accuracy in mmWave Communication Systems With Auxiliary Tasks
* Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
* Improving Cross-Domain Detection with Self-Supervised Learning
* Improving Cross-Modal Retrieval with Set of Diverse Embeddings
* Improving Data-Efficient Fossil Segmentation via Model Editing
* Improving Deep Learning-based Automatic Checkout System Using Image Enhancement Techniques
* Improving Fairness in Facial Albedo Estimation via Visual-Textual Cues
* Improving Generalization of Meta-Learning with Inverted Regularization at Inner-Level
* Improving Generalization with Domain Convex Game
* Improving Graph Representation for Point Cloud Segmentation via Attentive Filtering
* Improving Image Recognition by Retrieving from Web-Scale Image-Text Data
* Improving language-supervised object detection with linguistic structure analysis
* Improving Multi-Agent Motion Prediction with Heuristic Goals and Motion Refinement
* Improving Normalizing Flows with the Approximate Mass for Out-of-Distribution Detection
* Improving Rare Classes on nuScenes LiDAR segmentation Through Targeted Domain Adaptation
* Improving Robust Generalization by Direct PAC-Bayesian Bound Minimization
* Improving Robustness of Semantic Segmentation to Motion-Blur Using Class-Centric Augmentation
* Improving Robustness of Vision Transformers by Reducing Sensitivity to Patch Corruptions
* Improving Selective Visual Question Answering by Learning from Your Peers
* Improving Shape Awareness and Interpretability in Deep Networks Using Geometric Moments
* Improving Systolic Blood Pressure Prediction from Remote Photoplethysmography Using a Stacked Ensemble Regressor
* Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling
* Improving the Accuracy of Land Use and Land Cover Classification of Landsat Data in an Agricultural Watershed
* Improving the Transferability of Adversarial Samples by Path-Augmented Method
* Improving Vision-and-Language Navigation by Generating Future-View Image Semantics
* Improving Visual Grounding by Encouraging Consistent Gradient-Based Explanations
* Improving Visual Representation Learning Through Perceptual Understanding
* Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels
* Improving Zero-shot Generalization and Robustness of Multi-Modal Models
* In Defense of Structural Symbolic Representation for Video Event-Relation Prediction
* In-Depth Exploration of Person Re-Identification and Gait Recognition in Cloth-Changing Conditions, An
* In-Hand 3D Object Scanning from an RGB Sequence
* Incorporating Visual Grounding In GCN For Zero-shot Learning Of Human Object Interaction Actions
* Incremental 3D Semantic Scene Graph Prediction from RGB Sequences
* Incrementer: Transformer for Class-Incremental Semantic Segmentation with Knowledge Distillation Focusing on Old Class
* Independent Component Alignment for Multi-Task Learning
* Indescribable Multi-Modal Spatial Evaluator
* Indiscernible Object Counting in Underwater Scenes
* Inferring Affective Experience from the Big Picture Metaphor: A Two-dimensional Visual Breadth Model
* Inferring and Leveraging Parts from Object Shape for Improving Semantic Image Synthesis
* Inferring the past: a combined CNN-LSTM deep learning framework to fuse satellites for historical inundation mapping
* Infinite Photorealistic Worlds Using Procedural Generation
* Influence of Street Morphology on Thermal Environment Based on ENVI-met Simulation: A Case Study of Hangzhou Core Area, China, The
* Influential Topographic Factor Identification of Soil Heavy Metals Using GeoDetector: The Effects of DEM Resolution and Pollution Sources
* Information extraction in handwritten historical logbooks
* Information-Theoretic Method to Automatic Shortcut Avoidance and Domain Generalization for Dense Prediction Tasks, An
* Ingredient-oriented Multi-Degradation Learning for Image Restoration
* Initialization Noise in Image Gradients and Saliency Maps
* InstaBoost++: Visual Coherence Principles for Unified 2D/3D Instance Level Data Augmentation
* Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection
* Instance-Aware Domain Generalization for Face Anti-Spoofing
* Instance-Specific and Model-Adaptive Supervision for Semi-Supervised Semantic Segmentation
* Instant Domain Augmentation for LiDAR Semantic Segmentation
* Instant Multi-View Head Capture through Learnable Registration
* Instant Volumetric Head Avatars
* Instant-NVR: Instant Neural Volumetric Rendering for Human-object Interactions from Monocular RGBD Stream
* InstantAvatar: Learning Avatars from Monocular Video in 60 Seconds
* InstMove: Instance Motion for Object-centric Video Segmentation
* InstructPix2Pix: Learning to Follow Image Editing Instructions
* Integral Neural Networks
* Integrally Pre-Trained Transformer Pyramid Networks
* Integrated Fast Hough Transform for Multidimensional Data, An
* Integrated Perception and Planning for Autonomous Vehicle Navigation: An Optimization-based Approach
* Integrating Appearance and Spatial-Temporal Information for Multi-Camera People Tracking
* Integrating Holistic and Local Information to Estimate Emotional Reaction Intensity
* Interactive and Explainable Region-guided Radiology Report Generation
* Interactive Cartoonization with Controllable Perceptual Factors
* Interactive Segmentation as Gaussian Process Classification
* Interactive Segmentation of Radiance Fields
* Intermediate deep feature coding for human-machine vision collaboration
* Internal Diverse Image Completion
* InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
* Interpolation-Based Event Visual Data Filtering Algorithms
* Interpretable Model-Agnostic Plausibility Verification for 2D Object Detectors Using Domain-Invariant Concept Bottleneck Models
* Interventional Bag Multi-Instance Learning On Whole-Slide Pathological Images
* Intra-Class Ranking Metric for Remote Sensing Image Retrieval, An
* Intriguing properties of synthetic images: from generative adversarial networks to diffusion models
* Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models
* Introducing Competition to Boost the Transferability of Targeted Adversarial Examples Through Clean Feature Mixup
* Introducing ICEDAP: An Iterative Coastal Embayment Delineation and Analysis Process with Applications for the Management of Coastal Change
* Introduction of emerging mobility services in rural areas through the use of mobile network data combined with activity-based travel demand modelling
* Invariance Approach to Integrity Monitoring Fault Detectors
* Inverse Rendering of Translucent Objects using Physical and Neural Renderers
* Inversion of Band-Limited Discrete Fourier Transforms of Binary Images: Uniqueness and Algorithms
* Inversion-based Style Transfer with Diffusion Models
* Invertible Neural Skinning
* Inverting the Imaging Process by Learning an Implicit Camera Model
* Investigating Catastrophic Overfitting in Fast Adversarial Training: A Self-fitting Perspective
* Investigating CLIP Performance for Meta-data Generation in AD Datasets
* Investigating Deformation Mechanism of Earth-Rock Dams with InSaR and Numerical Simulation: Application to Liuduzhai Reservoir Dam, China
* Investigating Metropolitan Hierarchies through a Spatially Explicit (Local) Approach
* Investigating the Direct and Spillover Effects of Urbanization on Energy-Related Carbon Dioxide Emissions in China Using Nighttime Light Data
* Investigation of Turbulent Dissipation Rate Profiles from Two Radar Wind Profilers at Plateau and Plain Stations in the North China Plain
* IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction
* IPD-Net: SO(3) Invariant Primitive Decompositional Network for 3D Point Clouds
* iQuery: Instruments as Queries for Audio-Visual Sound Separation
* IR Reasoner: Real-time Infrared Object Detection by Visual Reasoning
* Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding
* Is Multimodal Vision Supervision Beneficial to Language?
* Is Ride-Hailing an Effective Tool for Improving Transportation Services in Suburban New Towns in China? Evidence from Wuhan Unicom Users' Mobile Phone Usage Big Data
* IS-GGT: Iterative Scene Graph Generation with Generative Transformers
* ISBNet: a 3D Point Cloud Instance Segmentation Network with Instance-aware Sampling and Box-aware Dynamic Convolution
* ISF-GAN: An Implicit Style Function for High-Resolution Image-to-Image Translation
* Isolated Sign Language Recognition based on Tree Structure Skeleton Images
* Iterative Geometry Encoding Volume for Stereo Matching
* Iterative graph filtering network for 3D human pose estimation
* Iterative Next Boundary Detection for Instance Segmentation of Tree Rings in Microscopy Images of Shrub Cross Sections
* Iterative Proposal Refinement for Weakly-Supervised Video Grounding
* Iterative Vision-and-Language Navigation
* IterativePFN: True Iterative Point Cloud Filtering
* itKD: Interchange Transfer-based Knowledge Distillation for 3D Object Detection
* JacobiNeRF: NeRF Shaping with Mutual Information Gradients
* JAWS: Just A Wild Shot for Cinematic Transfer in Neural Radiance Fields
* Jedi: Entropy-Based Localization and Removal of Adversarial Patches
* Joint Appearance and Motion Learning for Efficient Rolling Shutter Correction
* Joint Camera and LiDAR Risk Analysis
* Joint Direction of Arrival-Polarization Parameter Tracking Algorithm Based on Multi-Target Multi-Bernoulli Filter
* Joint HDR Denoising and Fusion: A Real-World Mobile HDR Image Dataset
* Joint Posterior Probability Active Learning for Hyperspectral Image Classification
* Joint Power and Bandwidth Allocation with RCS Fluctuation Characteristic for Space Target Tracking
* Joint representation and classifier learning for long-tailed image classification
* Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers
* Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time
* Joint Visual Grounding and Tracking with Natural Language Specification
* JRDB-Pose: A Large-Scale Dataset for Multi-Person Pose Estimation and Tracking
* Just a Glimpse: Rethinking Temporal Information for Video Continual Learning
* K-means Clustering Based Feature Consistency Alignment for Label-free Model Evaluation
* K-Planes: Explicit Radiance Fields in Space, Time, and Appearance
* K3DN: Disparity-Aware Kernel Estimation for Dual-Pixel Defocus Deblurring
* Kappa Angle Regression with Ocular Counter-Rolling Awareness for Gaze Estimation
* KBody: Balanced monocular whole-body estimation
* KBody: Towards general, robust, and aligned monocular whole-body estimation
* KD-DLGAN: Data Limited Image Generation via Knowledge Distillation
* KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation
* Kernel Aware Resampler
* KiUT: Knowledge-injected U-Transformer for Radiology Report Generation
* Knowledge Combination to Learn Rotated Detection without Rotated Annotation
* Knowledge Distillation for 6D Pose Estimation by Aligning Distributions of Local Predictions
* L-CoIns: Language-based Colorization With Instance Awareness
* L1BSR: Exploiting Detector Overlap for Self-Supervised Single-Image Super-Resolution of Sentinel-2 L1B Imagery
* Label Information Bottleneck for Label Enhancement
* Label Smoothing Auxiliary Classifier Generative Adversarial Network with Triplet Loss for SAR Ship Classification
* Label-Free Liver Tumor Segmentation
* Lagrangian Relaxation Method for an Online Decentralized Assignment of Electric Vehicles to Charging Stations, A
* LANA: A Language-Capable Navigator for Instruction Following and Generation
* Land Use and Land Cover Classification in the Northern Region of Mozambique Based on Landsat Time Series and Machine Learning
* Land-Use Mapping with Multi-Temporal Sentinel Images Based on Google Earth Engine in Southern Xinjiang Uygur Autonomous Region, China
* Landsat-7 ETM+, Landsat-8 OLI, and Sentinel-2 MSI Surface Reflectance Cross-Comparison and Harmonization over the Mediterranean Basin Area
* Lanelet2 for nuScenes: Enabling Spatial Semantic Relationships and Diverse Map-based Anchor Paths
* Language Adaptive Weight Generation for Multi-Task Visual Grounding
* Language Guided Local Infiltration for Interactive Image Retrieval
* Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification
* Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering
* Language-Guided Audio-Visual Source Separation via Trimodal Consistency
* Language-Guided Music Recommendation for Video via Prompt Analogies
* LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data
* Large Kernel Distillation Network for Efficient Single Image Super-Resolution
* Large-Capacity and Flexible Video Steganography via Invertible Neural Network
* Large-Scale Facial Expression Recognition Using Dual-Domain Affect Fusion for Noisy Labels
* Large-Scale Homography Benchmark, A
* Large-Scale Robustness Analysis of Video Action Recognition Models, A
* Large-scale Training Data Search for Object Re-identification
* LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs
* LaserMix for Semi-Supervised LiDAR Semantic Segmentation
* LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision and Language Models
* Latency Matters: Real-Time Action Forecasting Transformer
* Latent Heterogeneous Graph Network for Incomplete Multi-View Learning
* Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures
* LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling
* Layout-based Causal Inference for Object Navigation
* LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation
* LayoutDM: Discrete Diffusion Model for Controllable Layout Generation
* LayoutDM: Transformer-based Diffusion Model for Layout Generation
* LayoutFormer++: Conditional Graphic Layout Generation via Constraint Serialization and Decoding Space Restriction
* LD-GAN: Low-Dimensional Generative Adversarial Network for Spectral Image Generation with Variance Regularization
* LDFA: Latent Diffusion Face Anonymization for Self-driving Applications
* LDWS-net: A learnable deep wavelet scattering network for RGB salient object detection
* Leapfrog Diffusion Model for Stochastic Trajectory Prediction
* Learnable Group-Tube Transform Induced Tensor Nuclear Norm and Its Application for Tensor Completion, A
* Learnable Skeleton-Aware 3D Point Cloud Sampling
* Learned Image Compression with Mixed Transformer-CNN Architectures
* Learned Two-Plane Perspective Prior based Image Resampling for Efficient Object Detection
* Learning 3D Representations from 2D Pre-Trained Models via Image-to-Point Masked Autoencoders
* Learning 3D Scene Priors with 2D Supervision
* Learning 3D-Aware Image Synthesis with Unknown Pose Distribution
* Learning a 3D Morphable Face Reflectance Model from Low-Cost Data
* Learning a Deep Color Difference Metric for Photographic Images
* Learning a Depth Covariance Function
* Learning a Practical SDR-to-HDRTV Up-conversion using New Dataset and Degradation Models
* Learning a Simple Low-Light Image Enhancer from Paired Low-Light Instances
* Learning A Sparse Transformer Network for Effective Image Deraining
* Learning Accurate 3D Shape Based on Stereo Polarimetric Imaging
* Learning Action Changes by Measuring Verb-Adverb Textual Relationships
* Learning Adaptive Dense Event Stereo from the Image Domain
* Learning Adversarially Robust Object Detector with Consistency Regularization in Remote Sensing Images
* Learning Analytical Posterior Probability for Human Mesh Recovery
* Learning Anchor Transformations for 3D Garment Animation
* Learning and Aggregating Lane Graphs for Urban Automated Driving
* Learning Articulated Shape with Keypoint Pseudo-Labels from Web Images
* Learning Attention as Disentangler for Compositional Zero-Shot Learning
* Learning Attribute and Class-Specific Representation Duet for Fine-Grained Fashion Analysis
* Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning
* Learning Bottleneck Concepts in Image Classification
* Learning by Imagination: A Joint Framework for Text-Based Image Manipulation and Change Captioning
* Learning by Restoring Broken 3D Geometry
* Learning CLIP Guided Visual-Text Fusion Transformer for Video-based Pedestrian Attribute Recognition
* Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems
* Learning Compact Representations for LiDAR Completion and Generation
* Learning Conditional Attributes for Compositional Zero-Shot Learning
* Learning Correspondence Uncertainty via Differentiable Nonlinear Least Squares
* Learning Customized Visual Models with Retrieval-Augmented Knowledge
* Learning Debiased Representations via Conditional Attribute Interpolation
* Learning Decorrelated Representations Efficiently Using Fast Fourier Transform
* Learning Detailed Radiance Manifolds for High-Fidelity and 3D-Consistent Portrait Synthesis from Monocular Image
* Learning Discriminative Representations for Skeleton Based Action Recognition
* Learning Distortion Invariant Representation for Image Restoration from a Causality Perspective
* Learning Dual-Routing Capsule Graph Neural Network for Few-Shot Video Classification
* Learning Dynamic Style Kernels for Artistic Style Transfer
* Learning Efficient GANs for Image Translation via Differentiable Masks and Co-Attention Distillation
* Learning Emotion Representations from Verbal and Nonverbal Communication
* Learning Epipolar-Spatial Relationship for Light Field Image Super-Resolution
* Learning Event Guided High Dynamic Range Video Reconstruction
* Learning Expressive Prompting With Residuals for Vision Transformers
* Learning Federated Visual Prompt in Null Space for MRI Reconstruction
* Learning from Noisy Labels with Decoupled Meta Label Purifier
* Learning from Unique Perspectives: User-aware Saliency Modeling
* Learning Generative Structure Prior for Blind Text Image Super-resolution
* Learning Geometric-Aware Properties in 2D Representation Using Lightweight CAD Models, or Zero Real 3D Pairs
* Learning Geometry-aware Representations by Sketching
* Learning Human Mesh Recovery in 3D Scenes
* Learning Human-to-Robot Handovers from Point Clouds
* Learning Imbalanced Data with Vision Transformers
* Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-Commerce
* Learning Joint Latent Space EBM Prior Model for Multi-layer Generator
* Learning knowledge representation with meta knowledge distillation for single image super-resolution
* Learning Locally Editable Virtual Humans
* Learning modality-invariant binary descriptor for crossing palmprint to palm-vein recognition
* Learning Multi-Modal Class-Specific Tokens for Weakly Supervised Dense Object Localization
* Learning Multi-scale Representations with Single-stream Network for Video Retrieval
* Learning Neural Duplex Radiance Fields for Real-Time View Synthesis
* Learning Neural Parametric Head Models
* Learning Neural Proto-Face Field for Disentangled 3D Face Modeling in the Wild
* Learning Neural Volumetric Representations of Dynamic Humans in Minutes
* Learning on Gradients: Generalized Artifacts Representation for GAN-Generated Images Detection
* Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision
* Learning Optical Expansion from Scale Matching
* Learning Orthogonal Prototypes for Generalized Few-Shot Semantic Segmentation
* Learning Partial Correlation based Deep Visual Representation for Image Classification
* Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos
* Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations
* Learning Rotation-Equivariant Features for Visual Correspondence
* Learning Sample Relationship for Exposure Correction
* Learning Semantic Relationship among Instances for Image-Text Matching
* Learning Semantic-Aware Disentangled Representation for Flexible 3D Human Body Editing
* Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement
* Learning Situation Hyper-Graphs for Video Question Answering
* Learning smooth dendrite morphological neurons for pattern classification using linkage trees and evolutionary-based hyperparameter tuning
* Learning Spatial-Temporal Implicit Neural Representations for Event-Guided Video Super-Resolution
* Learning Steerable Function for Efficient Image Resampling
* Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation
* Learning to Correct Sloppy Annotations in Electron Microscopy Volumes
* Learning to Detect and Segment for Open Vocabulary Object Detection
* Learning to Detect Mirrors from Videos via Dual Correspondences
* Learning to Dub Movies via Hierarchical Prosody Models
* Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing
* Learning to Exploit the Sequence-Specific Prior Knowledge for Image Processing Pipelines Optimization
* Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes
* Learning to Generate Image Embeddings with User-Level Differential Privacy
* Learning to Generate Language-Supervised and Open-Vocabulary Scene Graph Using Pre-Trained Visual-Semantic Space
* Learning to Generate Text-Grounded Mask for Open-World Semantic Segmentation from Only Image-Text Pairs
* Learning to Learn With Variational Inference for Cross-Domain Image Classification
* Learning to Measure the Point Cloud Reconstruction Loss in a Representation Space
* Learning to Name Classes for Vision and Language Models
* Learning to Predict Scene-Level Implicit 3D from Posed RGBD Data
* Learning to Render Novel Views from Wide-Baseline Stereo Pairs
* Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation
* Learning to See in Nighttime Driving Scenes with Inter-frequency Priors
* Learning to Segment Every Referring Object Point by Point
* Learning to Zoom and Unzoom
* Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
* Learning Transformation-Predictive Representations for Detection and Description of Local Features
* Learning Transformations to Reduce the Geometric Shift in Object Detection
* Learning unbiased classifiers from biased data with meta-learning
* Learning Video Representations from Large Language Models
* Learning Visibility Field for Detailed 3D Human Reconstruction and Relighting
* Learning Visual Representations via Language-Guided Sampling
* Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions
* Learning with Fantasy: Semantic-Aware Virtual Contrastive Constraint for Few-Shot Class-Incremental Learning
* Learning with Noisy labels via Self-supervised Adversarial Noisy Masking
* Learning-based JNCD prediction for quality-wise perceptual quantization in HEVC
* LEGO-Net: Learning Regular Rearrangements of Objects in Rooms
* LEMaRT: Label-Efficient Masked Region Transform for Image Harmonization
* Lens-to-Lens Bokeh Effect Transformation. NTIRE 2023 Challenge Report
* Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation
* Level-S2fM: Structure from Motion on Neural Level Set of Implicit Surfaces
* Leverage Interactive Affinity for Affordance Learning
* Leveraging Future Trajectory Prediction for Multi-Camera People Tracking
* Leveraging GANs for data scarcity of COVID-19: Beyond the hype
* Leveraging Hidden Positives for Unsupervised Semantic Segmentation
* Leveraging Inter-Rater Agreement for Classification in the Presence of Noisy Labels
* Leveraging Multi-view Data for Improved Detection Performance: An Industrial Use Case
* Leveraging per Image-Token Consistency for Vision-Language Pre-Training
* Leveraging TCN and Transformer for effective visual-audio fusion in continuous emotion recognition
* Leveraging Temporal Context in Low Representational Power Regimes
* Leveraging triplet loss for unsupervised action segmentation
* LFNAT 2023 Challenge on Light Field Depth Estimation: Methods and Results
* LG-BPN: Local and Global Blind-Patch Network for Self-Supervised Real-World Denoising
* LiDAR-Based Localization on Highways Using Raw Data and Pole-Like Object Features
* LiDAR-in-the-Loop Hyperparameter Optimization
* LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation
* LidarGait: Benchmarking 3D Gait Recognition with Point Clouds
* Lifelong Age Transformation With a Deep Generative Prior
* Lifelong Learning of Task-Parameter Relationships for Knowledge Transfer
* Lift3D: Synthesize 3D Training Data by Lifting 2D GAN to 3D Generative Radiance Field
* Light Field Compression With Graph Learning and Dictionary-Guided Sparse Coding
* Light Field Synthesis from a Monocular Image using Variable LDI
* Light Source Separation and Intrinsic Image Decomposition under AC Illumination
* Light Touch Approach to Teaching Transformers Multi-view Geometry, A
* Light Weight Model for Active Speaker Detection, A
* Light-Weight Human Eye Fixation Solution for Smartphone Applications, A
* LightedDepth: Video Depth Estimation in Light of Limited Inference View Angles
* Lighting Consistency Technique for Outdoor Augmented Reality Systems Based on Multi-Source Geo-Information, A
* LightPainter: Interactive Portrait Relighting with Freehand Scribble
* Lightweight image denoising network with four-channel interaction transform
* Lightweight Real-Time Image Super-Resolution Network for 4K Images
* LINe: Out-of-Distribution Detection by Leveraging Important Neurons
* Linear Sampling Method for Random Sources, The
* LinK: Linear Kernel for LiDAR-based 3D Perception
* Linking Garment with Person via Semantically Associated Landmarks for Virtual Try-On
* LipFormer: High-fidelity and Generalizable Talking Face Generation with A Pre-learned Facial Codebook
* Listening Human Behavior: 3D Human Pose Estimation with Acoustic Signals
* Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR
* Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
* Lithium-Rich Pegmatite Detection Integrating High-Resolution and Hyperspectral Satellite Data in Zhawulong Area, Western Sichuan, China
* Live Demo: E2P-Events to Polarization Reconstruction from PDAVIS Events
* Live Demonstration: ANN vs SNN vs Hybrid Architectures for Event-based Real-time Gesture Recognition and Optical Flow Estimation
* Live Demonstration: Event-based Visual Microphone
* Live Demonstration: Integrating Event Based Hand Tracking Into TouchFree Interactions
* Live Demonstration: PINK: Polarity-based Anti-flicker for Event Cameras
* Live Demonstration: Real-time Event-based Speed Detection using Spiking Neural Networks
* Live Demonstration: SCAMP-7
* Live Demonstration: Tangentially Elongated Gaussian Belief Propagation for Event-based Incremental Optical Flow Estimation
* Local 3D Editing via 3D Distillation of CLIP Knowledge
* Local Connectivity-Based Density Estimation for Face Clustering
* Local Implicit Normalizing Flow for Arbitrary-Scale Image Super-Resolution
* Local Implicit Ray Function for Generalizable Radiance Field Representation
* Local pseudo-attributes for long-tailed recognition
* Local Region Perception and Relationship Learning Combined with Feature Fusion for Facial Action Unit Detection
* Local-Guided Global: Paired Similarity Representation for Visual Reinforcement Learning
* Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields
* Localization of Mobile Robots Based on Depth Camera
* Localized Latent Updates for Fine-Tuning Vision-Language Models
* Localized Semantic Feature Mixers for Efficient Pedestrian Detection in Autonomous Driving
* Localized Shortcut Removal
* LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding
* Logical Consistency and Greater Descriptive Power for Facial Hair Attribute Learning
* Logical Implications for Visual Question Answering Consistency
* LOGO: A Long-Form Video Dataset for Group Action Quality Assessment
* LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion
* Long Range Pooling for 3D Large-Scale Scene Understanding
* Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation
* Long-Term Variability in Sea Surface Temperature and Chlorophyll a Concentration in the Gulf of California
* Long-Term Visual Localization with Mobile Sensors
* Look Around for Anomalies: Weakly-Supervised Anomaly Detection via Context-Motion Relational Learning
* Look ATME: The Discriminator Mean Entropy Needs Attention
* Look Before You Match: Instance Understanding Matters in Video Object Segmentation
* Look, Radiate, and Learn: Self-Supervised Localisation via Radio-Visual Correspondence
* Lookahead Diffusion Probabilistic Models for Refining Mean Estimation
* Looking Through the Glass: Neural Surface Reconstruction Against High Specular Reflections
* Loopback Network for Explainable Microvascular Invasion Classification, A
* Low-latency monocular depth estimation using event timing on neuromorphic hardware
* Low-Light Image Enhancement via Structure Modeling and Guidance
* Low-Light Stereo Image Enhancement
* Low-rank learning for feature selection in multi-label classification
* LP-DIF: Learning Local Pattern-Specific Deep Implicit Function for 3D Objects and Scenes
* LPMSNet: Location Pooling Multi-Scale Network for Cloud and Cloud Shadow Segmentation
* LRRNet: A Novel Representation Learning Guided Fusion Network for Infrared and Visible Images
* LSDIR: A Large Scale Dataset for Image Restoration
* LSFSL: Leveraging Shape Information in Few-shot Learning
* LSTC-rPPG: Long Short-Term Convolutional Network for Remote Photoplethysmography
* LSTFE-Net: Long Short-Term Feature Enhancement Network for Video Small Object Detection
* LVQAC: Lattice Vector Quantization Coupled with Spatially Adaptive Companding for Efficient Learned Image Compression
* M2DAR: Multi-View Multi-Scale Driver Action Recognition with Vision Transformer
* M3ED: Multi-Robot, Multi-Sensor, Multi-Environment Event Dataset
* M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis
* MACARONS: Mapping and Coverage Anticipation with RGB Online Self-Supervision
* Machine Learning Algorithm to Detect and Analyze Meteor Echoes Observed by the Jicamarca Radar, A
* Machine Learning Applied to a Dual-Polarized Sentinel-1 Image for Wind Retrieval of Tropical Cyclones
* MAESTER: Masked Autoencoder Guided Segmentation at Pixel Resolution for Accurate, Self-Supervised Subcellular Structure Recognition
* MagConv: Mask-Guided Convolution for Image Inpainting
* MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
* Magic3D: High-Resolution Text-to-3D Content Creation
* MagicNet: Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery
* MagicPony: Learning Articulated 3D Animals in the Wild
* MAGVIT: Masked Generative Video Transformer
* MAGVLT: Masked Generative Vision-and-Language Transformer
* MAIR: Multi-View Attention Inverse Rendering with 3D Spatially-Varying Lighting Estimation
* Make Landscape Flatter in Differentially Private Federated Learning
* Make-A-Story: Visual Memory Conditioned Consistent Story Generation
* Makeup transfer: A review
* Making Corgis Important for Honeycomb Classification: Adversarial Attacks on Concept-based Explainability Tools
* Making Models Shallow Again: Jointly Learning to Reduce Non-Linearity and Depth for Latency-Efficient Private Inference
* Making the V in Text-VQA Matter
* Making Vision Transformers Efficient from A Token Sparsification View
* MaLP: Manipulation Localization Using a Proactive Scheme
* MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding
* Manifold Hypothesis for Gradient-Based Explanations, The
* Manipulating Transfer Learning for Property Inference
* Many-Task Federated Learning: A New Problem Setting and A Simple Baseline
* MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model
* MaPLe: Multi-modal Prompt Learning
* Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target Detection with Single Point Supervision
* Mapping Irrigated Croplands from Sentinel-2 Images Using Deep Convolutional Neural Networks
* Mapping of Ecological Environment Based on Google Earth Engine Cloud Computing Platform and Landsat Long-Term Data: A Case Study of the Zhoushan Archipelago
* Mapping Underwater Aquatic Vegetation Using Foundation Models With Air- and Space-Borne Images: The Case of Polyphytos Lake
* Marching-Primitives: Shape Abstraction from Signed Distance Function
* MarginMatch: Improving Semi-Supervised Learning with Pseudo-Margins
* Markerless Camera-to-Robot Pose Estimation via Self-Supervised Sim-to-Real Transfer
* MARLIN: Masked Autoencoder for facial video Representation LearnINg
* MARRS: Modern Backbones Assisted Co-training for Rapid and Robust Semi-Supervised Domain Adaptation
* MarS3D: A Plug-and-Play Motion-Aware Model for Semantic Segmentation on Multi-Scan 3D Point Clouds
* Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation
* Mask-Free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations
* Mask-Free Video Instance Segmentation
* Mask-Guided Matting in the Wild
* Mask3D: Pretraining 2D Vision Transformers by Learning Masked 3D Priors
* MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
* MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset
* Masked and Adaptive Transformer for Exemplar Based Image Translation
* Masked Auto-Encoders Meet Generative Adversarial Networks and Beyond
* Masked Autoencoders Enable Efficient Knowledge Distillers
* Masked Autoencoding Does Not Help Natural Language Supervision at Scale
* Masked Image Modeling with Local Multi-Scale Reconstruction
* Masked Image Training for Generalizable Deep Image Denoising
* Masked Images Are Counterfactual Samples for Robust Fine-Tuning
* Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers
* Masked Motion Encoding for Self-Supervised Video Representation Learning
* Masked Representation Learning for Domain Generalized Stereo Matching
* Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning
* Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
* Masked Vision Transformers for Hyperspectral Image Classification
* Masked Wavelet Representation for Compact Neural Radiance Fields
* MaskSketch: Unpaired Structure-guided Masked Image Generation
* Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer
* Matching Is Not Enough: A Two-Stage Framework for Category-Agnostic Pose Estimation
* Matrix Balancing Based Interior Point Methods for Point Set Matching Problems
* Maximum Entropy Information Bottleneck for Uncertainty-aware Stochastic Embedding
* MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation
* MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos
* MDL-NAS: A Joint Multi-domain Learning Framework for Vision Transformer
* MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos
* Measuring Human Perception to Improve Open Set Recognition
* MED-VT: Multiscale Encoder-Decoder Video Transformer with Application to Object Segmentation
* MEDIC: Remove Model Backdoors via Importance Driven Cloning
* Megahertz Light Steering Without Moving Parts
* MEGANE: Morphable Eyeglass and Avatar Network
* MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models
* MeMaHand: Exploiting Mesh-Mano Interaction for Single Image Two-Hand Reconstruction
* Memory-efficient and GPU-oriented visual anomaly detection with incremental dimension reduction
* Memory-Friendly Scalable Super-Resolution via Rewinding Lottery Ticket Hypothesis
* MEnsA: Mix-up Ensemble Average for Unsupervised Multi Target Domain Adaptation on 3D Point Clouds
* MES-Loss: Mutually equidistant separation metric learning loss function
* Meta Architecture for Point Cloud Analysis
* Meta Compositional Referring Expression Segmentation
* Meta Omnium: A Benchmark for General-Purpose Learning-to-Learn
* Meta-Causal Learning for Single Domain Generalization
* Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding
* Meta-learning Approach for Domain Generalisation across Visual Modalities in Vehicle Re-identification, A
* Meta-Learning Approach to Predicting Performance and Data Requirements, A
* Meta-Learning with a Geometry-Adaptive Preconditioner
* Meta-Personalizing Vision-Language Models to Find Named Instances in Video
* Meta-Tuning Loss Functions and Data Augmentation for Few-Shot Object Detection
* MetaCLUE: Towards Comprehensive Visual Metaphors Research
* Metadata-Based RAW Reconstruction via Implicit Neural Functions
* MetaFusion: Infrared and Visible Image Fusion via Meta-Feature Embedding from Object Detection
* MetaMix: Towards Corruption-Robust Continual Learning with Temporally Self-Adaptive Data Transformation
* MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation
* MetaViewer: Towards A Unified Multi-View Representation
* MethaneMapper: Spectral Absorption Aware Hyperspectral Transformer for Methane Detection
* Method for Extracting Lake Water Using ViTenc-UNet: Taking Typical Lakes on the Qinghai-Tibet Plateau as Examples, A
* Method for Intelligent Road Network Selection Based on Graph Neural Network, A
* METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens
* Metric and Color Modifications for the Automated Construction of Map Symbols
* MHPL: Minimum Happy Points Learning for Active Source Free Domain Adaptation
* MIANet: Aggregating Unbiased Instance and General Information for Few-Shot Semantic Segmentation
* MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation
* Micron-BERT: BERT-Based Facial Micro-Expression Recognition
* Milestones in Autonomous Driving and Intelligent Vehicles: Part I: Control, Computing System Design, Communication, HD Map, Testing, and Human Behaviors
* MIME: Human-Aware 3D Scene Generation
* MIMMO: Multi-Input Massive Multi-Output Neural Network
* Mind the Label Shift of Augmentation-based Graph OOD Generalization
* Minimizing Maximum Model Discrepancy for Transferable Black-box Targeted Attacks
* Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation
* Minimum Temperature Outweighed the Maximum Temperature in Determining Plant Growth over the Tibetan Plateau from 1982 to 2017, The
* MIPI 2023 Challenge on Nighttime Flare Removal: Methods and Results
* MIPI 2023 Challenge on RGB+ToF Depth Completion: Methods and Results
* MIPI 2023 Challenge on RGBW Fusion: Methods and Results
* MIPI 2023 Challenge on RGBW Remosaic: Methods and Results
* MISC210K: A Large-Scale Dataset for Multi-Instance Semantic Correspondence
* Missingness-Pattern-Adaptive Learning With Incomplete Data
* MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering
* Mitigating Catastrophic Interference using Unsupervised Multi-Part Attention for RGB-IR Face Recognition
* Mitigating Task Interference in Multi-Task Learning via Explicit Task Routing with Non-Learnable Primitives
* Mixed Autoencoder for Self-Supervised Visual Representation Learning
* Mixed Quantization Enabled Federated Learning to Tackle Gradient Inversion Attacks
* Mixer-based Local Residual Network for Lightweight Image Super-resolution
* MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers
* MixNeRF: Modeling a Ray with Mixture Density for Novel View Synthesis from Sparse Inputs
* MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
* MIXSIM: A Hierarchical Framework for Mixed Reality Traffic Simulation
* MixTeacher: Mining Promising Labels with Mixed Scale Teacher for Semi-Supervised Object Detection
* (ML)2P-Encoder: On Exploration of Channel-Class Correlation for Multi-Label Zero-Shot Learning
* MLGNet: Multi-Task Learning Network with Attention-Guided Mechanism for Segmenting Agricultural Fields
* MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency
* MM-BSN: Self-Supervised Image Denoising for Real-World with Multi-Mask based on Blind-Spot Network
* MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
* MMA-Net: Multi-view mixed attention mechanism for facial action unit detection
* MMANet: Margin-Aware Distillation and Modality-Aware Regularization for Incomplete Multimodal Learning
* MMG-Ego4D: Multi-Modal Generalization in Egocentric Action Recognition
* MMRNet: Improving Reliability for Multimodal Object Detection and Segmentation for Bin Picking via Multimodal Redundancy
* MMVC: Learned Multi-Mode Video Compression with Block-based Prediction Mode Selection and Density-Adaptive Entropy Coding
* Mobile User Interface Element Detection Via Adaptively Prompt Tuning
* MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices
* MobileDeRainGAN: An Efficient Semi-Supervised Approach to Single Image Rain Removal for Task-Driven Applications
* MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures
* MobileOne: An Improved One millisecond Mobile Backbone
* MobileViG: Graph-Based Sparse Attention for Mobile Vision Applications
* MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation
* Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners
* Modality-Agnostic Debiasing for Single Domain Generalization
* Modality-invariant Visual Odometry for Embodied Vision
* MoDAR: Using Motion Forecasting for 3D Object Detection in Point Cloud Sequences
* Model Barrier: A Compact Un-Transferable Isolation Domain for Model Intellectual Property Protection
* Model-Agnostic Gender Debiased Image Captioning
* Modeling Entities as Semantic Points for Visual Information Extraction in the Wild
* Modeling Inter-Class and Intra-Class Constraints in Novel Class Discovery
* Modeling the Distributional Uncertainty for Salient Object Detection Models
* Modeling Video as Stochastic Processes for Fine-Grained Video Representation Learning
* Modelling Global Deforestation Using Spherical Geographic Automata Approach
* Modelling Heat Balance of a Large Lake in Central Tibetan Plateau Incorporating Satellite Observations
* Modernizing Old Photos Using Multiple References via Photorealistic Style Transfer
* MoDi: Unconditional Motion Synthesis from Diverse Data
* Modular Memorability: Tiered Representations for Video Memorability Prediction
* MoFusion: A Framework for Denoising-Diffusion-Based Motion Synthesis
* MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition
* MONET dataset: Multimodal drone thermal dataset recorded in rural scenarios, The
* Monitoring and Comparative Analysis of Hohhot Subway Subsidence Using StaMPS-PS Based on Two DEMS
* Monitoring Inland Water Quantity Variations: A Comprehensive Analysis of Multi-Source Satellite Observation Technology Applications
* MonoATT: Online Monocular 3D Object Detection with Adaptive Token Transformer
* Monocular 3D Human Pose Estimation for Sports Broadcasts using Partial Sports Field Registration
* MonoHuman: Animatable Human Neural Field from Monocular Video
* Monopulse Parameter Estimation for FDA-MIMO Radar under Mainlobe Deception Jamming
* Moon Imaging Performance of FAST Radio Telescope in Bistatic Configuration with Other Radars
* MOSO: Decomposing MOtion, Scene and Object for Video Prediction
* MoStGAN-V: Video Generation with Temporal Motion Styles
* MOT: Masked Optimal Transport for Partial Domain Adaptation
* Motion Information Propagation for Neural Video Compression
* Motion Matters: Difference-based Multi-scale Learning for Infrared UAV Detection
* Motion-state Alignment for Video Semantic Segmentation
* MotionDiffuser: Controllable Multi-Agent Motion Prediction Using Diffusion
* MotionTrack: End-to-End Transformer-based Multi-Object Tracking with LiDAR-Camera Fusion
* MotionTrack: Learning Robust Short-Term and Long-Term Motions for Multi-Object Tracking
* MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors
* MoundCount: A detection-based approach for automatic counting of planting microsites on UAV images
* MoveEnet: Online High-Frequency Human Pose Estimation with an Event Camera
* MOVES: Manipulated Objects in Video Enable Segmentation
* Movies2Scenes: Using Movie Metadata to Learn Scene Representation
* Moving Towards Centers: Re-Ranking With Attention and Memory for Re-Identification
* MP-Former: Mask-Piloted Transformer for Image Segmentation
* MSAFF-Net: Multiscale Attention Feature Fusion Networks for Single Image Dehazing and Beyond
* MSeg3D: Multi-Modal 3D Semantic Segmentation for Autonomous Driving
* MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection from Point Cloud Sequences
* MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID
* MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection
* MTLSegFormer: Multi-task Learning with Transformers for Semantic Segmentation in Precision Agriculture
* MTMVC: Semi-supervised 3D hand pose estimation using multi-task and multi-view consistency
* MTN: Forensic Analysis of MP4 Video Files Using Graph Neural Networks
* Multi Domain Learning for Motion Magnification
* Multi Event Localization by Audio-Visual Fusion with Omnidirectional Camera and Microphone Array
* Multi exposure image fusion based on exposure correction and input refinement using limited low dynamic range images
* Multi parallel U-net encoder network for effective polyp image segmentation
* Multi View Action Recognition for Distracted Driver Behavior Localization
* Multi-Agent Automated Machine Learning
* Multi-Agent Deep Reinforcement Learning Framework Strategized by Unmanned Aerial Vehicles for Multi-Vessel Full Communication Connection
* Multi-Annotation Attention Model for Video Summarization
* Multi-Attention Transformer for Naturalistic Driving Action Recognition
* Multi-Biometric Unified Network for Cloth-Changing Person Re-Identification
* Multi-camera People Tracking With Mixture of Realistic and Synthetic Knowledge
* Multi-Centroid Task Descriptor for Dynamic Class Incremental Inference
* Multi-Concept Customization of Text-to-Image Diffusion
* Multi-Criterion Analysis of Cyclone Risk along the Coast of Tamil Nadu, India: A Geospatial Approach
* Multi-Date Earth Observation NeRF: The Detail Is in the Shadows
* Multi-Featured Sea Ice Classification with SAR Image Based on Convolutional Neural Network
* Multi-Granularity Archaeological Dating of Chinese Bronze Dings Based on a Knowledge-Guided Relation Graph
* Multi-Label Compound Expression Recognition: C-EXPR Database and Network
* Multi-Label Speech Emotion Recognition via Inter-Class Difference Loss Under Response Residual Network
* Multi-level attention for referring expression comprehension
* Multi-level Dispersion Residual Network for Efficient Image Super-Resolution
* Multi-Level Grid Database for Protecting and Sharing Historical Geographic Urban Data: A Case Study of Shanghai, A
* Multi-Level Logit Distillation
* Multi-modal Aerial View Image Challenge: Translation from Synthetic Aperture Radar to Electro-Optical Domain Results - PBVS 2023
* Multi-modal Aerial View Object Classification Challenge Results - PBVS 2023
* Multi-modal Emotion Reaction Intensity Estimation with Temporal Augmentation
* Multi-modal Facial Affective Analysis based on Masked Autoencoder
* Multi-modal Gait Recognition via Effective Spatial-Temporal Feature Fusion
* Multi-modal Information Fusion for Action Unit Detection in the Wild
* Multi-Modal Learning with Missing Modality via Shared-Specific Feature Modelling
* Multi-Modal Multi-Objective Contrastive Learning for Sentinel-1/2 Imagery
* Multi-Modal Representation Learning with Text-Driven Soft Masks
* Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning
* Multi-Object Manipulation via Object-Centric Neural Scattering Functions
* Multi-Object Tracking by Self-supervised Learning Appearance Model
* Multi-Objective Multi-Satellite Imaging Mission Planning Algorithm for Regional Mapping Based on Deep Reinforcement Learning
* Multi-Realism Image Compression with a Conditional Generator
* Multi-scale attention and dilation network for small defect detection
* Multi-scale convolutional attention network for lightweight image super-resolution
* Multi-scale Local Implicit Keypoint Descriptor for Keypoint Matching
* Multi-Scale Response Analysis and Displacement Prediction of Landslides Using Deep Learning with JTFA: A Case Study in the Three Gorges Reservoir, China
* Multi-sensor Ensemble-guided Attention Network for Aerial Vehicle Perception Beyond Visible Spectrum
* Multi-Sensor Large-Scale Dataset for Multi-View 3D Reconstruction
* Multi-Space Neural Radiance Fields
* Multi-Task Learning based Video Anomaly Detection with Attention
* Multi-view Adversarial Discriminator: Mine the Non-causal Factors for Object Detection in Unseen Domains
* Multi-View Azimuth Stereo via Tangent Space Consistency
* Multi-View Body Image-Based Prediction of Body Mass Index and Various Body Part Sizes
* Multi-view convolutional vision transformer for 3D object recognition
* Multi-View Diffusion Process for Spectral Clustering and Image Retrieval
* Multi-view Inverse Rendering for Large-scale Real-world Indoor Scenes
* Multi-View Reconstruction Using Signed Ray Distance Functions (SRDF)
* Multi-view Semantic Information Guidance for Light Field Image Segmentation
* Multi-View Stereo Representation Revisit: Region-Aware MVSNet
* Multiclass Confidence and Localization Calibration for Object Detection
* Multilateral Semantic Relations Modeling for Image Text Retrieval
* Multimodal Continuous Emotion Recognition: A Technical Report for ABAW5
* Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers
* Multimodal Fusion-Based Image Hiding Algorithm for Secure Healthcare System
* Multimodal Industrial Anomaly Detection via Hybrid Fusion
* Multimodal Integration of Human-Like Attention in Visual Question Answering
* Multimodal Object Detection by Channel Switching and Spatial Attention
* Multimodal Prompting with Missing Modalities for Visual Recognition
* Multimodal Sentiment Analysis With Image-Text Interaction Network
* Multimodal Sentiment Analysis: A Survey of Methods, Trends, and Challenges
* Multimodaltrace: Deepfake Detection using Audiovisual Representation Learning
* Multiple Instance Learning via Iterative Self-Paced Supervised Contrastive Learning
* Multiple transformation function estimation for image enhancement
* Multiplicative Fourier Level of Detail
* Multiscale residual gradient attention for face anti-spoofing
* Multiscale Tensor Decomposition and Rendering Equation Encoding for View Synthesis
* Multispectral and Thermal Sensors Onboard UAVs for Heterogeneity in Merlot Vineyard Detection: Contribution to Zoning Maps
* Multispectral Contrastive Learning with Viewmaker Networks
* Multispectral Video Semantic Segmentation: A Benchmark Dataset and Baseline
* Multivariate, Multi-Frequency and Multimodal: Rethinking Graph Neural Networks for Emotion Recognition in Conversation
* Multiview Compressive Coding for 3D Reconstruction
* Mushroom Segmentation and 3D Pose Estimation from Point Clouds using Fully Convolutional Geometric Features and Implicit Pose Encoding
* Music-Driven Group Choreography
* Mutual Exclusive Modulator for Long-Tailed Recognition
* muxGNN: Multiplex Graph Neural Network for Heterogeneous Graphs
* MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training
* MVImgNet: A Large-scale Dataset of Multi-view Images
* N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution
* N-pad: Neighboring Pixel-based Industrial Anomaly Detection
* NAFBET: Bokeh Effect Transformation with Parameter Analysis Block based on NAFNet
* Name your style: text-guided artistic style transfer
* NamedMask: Distilling Segmenters from Complementary Foundation Models
* NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory
* NAR-Former: Neural Architecture Representation Learning Towards Holistic Attributes Prediction
* Natural Language-Assisted Sign Language Recognition
* NeAT: Learning Neural Implicit Surfaces with Arbitrary Topologies from Multi-View Images
* NEF: Neural Edge Fields for 3D Parametric Curve Reconstruction from Multi-View Images
* NeFII: Inverse Rendering for Reflectance Decomposition with Near-Field Indirect Illumination
* Neighborhood Attention Transformer
* NeighborTrack: Single Object Tracking by Bipartite Matching with Neighbor Tracklets and Its Applications to Sports
* NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same Action
* NeRDi: Single-View NeRF Synthesis with Language-Guided Diffusion as General Image Priors
* NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis
* NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects
* NeRF-RPN: A general framework for object detection in NeRFs
* NeRF-Supervised Deep Stereo
* NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-Shot Real Image Animation
* Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation from 2D Supervision
* NeRFLight: Fast and Light Neural Radiance Fields using a Shared Feature Grid
* NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer
* NeRFVS: Neural Radiance Fields for Free View Synthesis via Geometry Scaffolds
* NeRT: Implicit Neural Representations for Unsupervised Atmospheric Turbulence Mitigation
* NerVE: Neural Volumetric Edges for Parametric Curve Extraction from Point Cloud
* Network Expansion For Practical Training Acceleration
* Network Specialization via Feature-level Knowledge Distillation
* Network-Free, Unsupervised Semantic Segmentation with Synthetic Images
* NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction
* NeUDF: Learning Neural Unsigned Distance Fields with Volume Rendering
* NeuFace: Realistic 3D Neural Face Rendering from Multi-View Images
* Neumann Network with Recursive Kernels for Single Image Defocus Deblurring
* NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization
* Neural Congealing: Aligning Images to a Joint Semantic Atlas
* Neural Dependencies Emerging from Learning Massive Categories
* Neural Fields Meet Explicit Geometric Representations for Inverse Rendering of Urban Scenes
* Neural Fourier Filter Bank
* Neural Intrinsic Embedding for Non-Rigid Point Cloud Matching
* Neural Kaleidoscopic Space Sculpting
* Neural Kernel Surface Reconstruction
* Neural Koopman Pooling: Control-Inspired Temporal Dynamics Encoding for Skeleton-Based Action Recognition
* Neural Lens Modeling
* Neural Map Prior for Autonomous Driving
* Neural Network-Based Moving Window Iterative Nonlinear System Identification
* Neural Part Priors: Learning to Optimize Part-Based Object Completion in RGB-D Scans
* Neural Pixel Composition for 3D-4D View Synthesis from Multi-Views
* Neural Preset for Color Style Transfer
* Neural Radiance Fields for High-Resolution Remote Sensing Novel View Synthesis
* Neural Rate Estimator and Unsupervised Learning for Efficient Distributed Image Analytics in Split-DNN models
* Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos
* Neural Scene Chronology
* Neural Texture Synthesis with Guided Correspondence
* Neural Transformation Fields for Arbitrary-Styled Font Generation
* Neural Transformation Network to Generate Diverse Views for Contrastive Learning
* Neural Vector Fields: Implicit Representation by Explicit Learning
* Neural Video Compression with Diverse Contexts
* Neural Volumetric Memory for Visual Locomotion Control
* Neural Voting Field for Camera-Space 3D Hand Pose Estimation
* Neuralangelo: High-Fidelity Neural Surface Reconstruction
* NeuralDome: A Neural Modeling Pipeline on Multi-View Human-Object Interactions
* NeuralEditor: Editing Neural Radiance Fields via Manipulating Point Clouds
* NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models
* Neuralizer: General Neuroimage Analysis without Re-Training
* NeuralLift-360: Lifting an in-the-Wild 2D Photo to A 3D Object with 360° Views
* NeuralPCI: Spatio-Temporal Neural Field for 3D Point Cloud Multi-Frame Non-Linear Interpolation
* NeuralUDF: Learning Unsigned Distance Fields for Multi-View Reconstruction of Surfaces with Arbitrary Topologies
* Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation
* NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization
* Neuromorphic Event-based Facial Expression Recognition
* Neuromorphic Optical Flow and Real-time Implementation with Event Cameras
* Neuron Structure Modeling for Generalizable Remote Physiological Measurement
* NeuWigs: A Neural Dynamic Model for Volumetric Hair Capture and Animation
* New Bayesian Focal Loss Targeting Aleatoric Uncertainty Estimate: Pollen Image Recognition
* New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation, A
* New Comprehensive Benchmark for Semi-supervised Video Anomaly Detection and Anticipation, A
* New Dataset and Approach for Timestamp Supervised Action Segmentation Using Human Object Interaction, A
* New Dataset Based on Images Taken by Blind People for Testing the Robustness of Image Classification Models Trained for ImageNet Categories, A
* New methods and functionalities for railway maintenance through a draisine prototype based on RADAR sensors
* New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning, A
* NewsNet: A Novel Dataset for Hierarchical Temporal Segmentation
* Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars
* NICO++: Towards Better Benchmarking for Domain Generalization
* NIFF: Alleviating Forgetting in Generalized Few-Shot Object Detection via Neural Instance Feature Forging
* Nighttime Smartphone Reflective Flare Removal Using Optical Center Symmetry Prior
* NIKI: Neural Inverse Kinematics with Invertible Neural Networks for 3D Human Pose and Shape Estimation
* NIPQ: Noise proxy-based Integrated Pseudo-Quantization
* NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-Wise Modeling
* NLOST: Non-Line-of-Sight Imaging with Transformer
* No One Left Behind: Improving the Worst Categories in Long-Tailed Learning
* no-reference panoramic image quality assessment with hierarchical perception and color features, A
* Noisy Correspondence Learning with Meta Similarity Correction
* NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers
* NoisyTwins: Class-Consistent and Diverse Image Generation Through StyleGANs
* Non-Contrastive Learning Meets Language-Image Pre-Training
* Non-Contrastive Unsupervised Learning of Physiological Signals from Video
* Non-Line-of-Sight Imaging with Signal Superresolution Network
* Nonverbal Communication Cue Recognition: A Pathway to More Accessible Communication
* NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior
* Nordic Vehicle Dataset (NVD): Performance of vehicle detectors using newly captured NVD from UAV in different snowy weather conditions
* Normal-guided Garment UV Prediction for Human Re-texturing
* Normalizing Flow based Feature Synthesis for Outlier-Aware Object Detection
* Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation
* Novel 3D ArcSAR Sensing System Applied to Unmanned Ground Vehicles, A
* novel approach for bias mitigation of gender classification algorithms using consistency regularization, A
* Novel Assessment of the Surface Heat Flux Role in Radon (Rn-222) Gas Flow within Subsurface Geological Porous Media, A
* Novel Benchmark for Refinement of Noisy Localization Labels in Autolabeled Datasets for Object Detection, A
* Novel Bipartite Consensus Tracking Control for Multiagent Systems Under Sensor Deception Attacks, A
* Novel Class Discovery for 3D Point Cloud Semantic Segmentation
* novel fast intra algorithm for VVC based on histogram of oriented gradient, A
* Novel-View Acoustic Synthesis
* NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations
* NTIRE 2023 Challenge on 360° Omnidirectional Image and Video Super-Resolution: Datasets, Methods and Results
* NTIRE 2023 Challenge on Efficient Super-Resolution: Methods and Results
* NTIRE 2023 Challenge on HR Depth from Images of Specular and Transparent Surfaces
* NTIRE 2023 Challenge on Image Denoising: Methods and Results
* NTIRE 2023 Challenge on Image Super-Resolution (×4): Methods and Results
* NTIRE 2023 Challenge on Light Field Image Super-Resolution: Dataset, Methods and Results
* NTIRE 2023 Challenge on Night Photography Rendering
* NTIRE 2023 Challenge on Stereo Image Super-Resolution: Methods and Results
* NTIRE 2023 HR NonHomogeneous Dehazing Challenge Report
* NTIRE 2023 Image Shadow Removal Challenge Report
* NTIRE 2023 Quality Assessment of Video Enhancement Challenge
* NTIRE 2023 Video Colorization Challenge
* Null-text Inversion for Editing Real Images using Guided Diffusion Models
* NVTC: Nonlinear Vector Transform Coding
* NÜWA-LIP: Language-guided Image Inpainting with Defect-free VQGAN
* Objaverse: A Universe of Annotated 3D Objects
* Object Detection with Self-Supervised Scene Adaptation
* Object Discovery from Motion-Guided Tokens
* Object Folder Benchmark: Multisensory Learning with Neural and Real Objects, The
* Object pop-up: Can we infer 3D objects and their poses from human interactions alone?
* Object Pose Estimation with Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation
* Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
* Object-Based Ground Filtering of Airborne LiDAR Data for Large-Area DTM Generation, An
* Object-Goal Visual Navigation via Effective Exploration of Relations Among Historical Navigation States
* ObjectMatch: Robust Registration using Canonical Object Correspondences
* ObjectStitch: Object Compositing with Diffusion Model
* Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking
* Occlusion-Free Scene Recovery via Neural Radiance Fields
* OCELOT: Overlapped Cell on Tissue Dataset for Histopathology
* OCTET: Object-aware Counterfactual Explanations
* OcTr: Octree-Based Transformer for 3D Object Detection
* Octree Guided Unoriented Surface Reconstruction
* Octree Transformer: Autoregressive 3D Shape Generation on Hierarchically Structured Sequences
* ODIN: An OmniDirectional INdoor dataset capturing Activities of Daily Living from multiple synchronized modalities
* ODSmoothGrad: Generating Saliency Maps for Object Detectors
* Old Mine Map Georeferencing: Case of Marsigli's 1696 Map of the Smolnik Mines
* Omni Aggregation Networks for Lightweight Image Super-Resolution
* Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild
* OmniAL: A Unified CNN Framework for Unsupervised Anomaly Localization
* OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis
* OmniCity: Omnipotent City Understanding with Multi-Level and Multi-View Images
* OmniMAE: Single Model Masked Pretraining on Images and Videos
* Omnimatte3D: Associating Objects and Their Effects in Unconstrained Monocular Video
* OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation
* OmniVidar: Omnidirectional Depth Estimation from Multi-Fisheye Images
* On Advantages of Mask-level Recognition for Outlier-aware Segmentation
* On Calibrating Semantic Segmentation Models: Analyses and An Algorithm
* On Data Scaling in Masked Image Modeling
* On Distillation of Guided Diffusion Models
* On the Benefits of 3D Pose and Tracking for Human Action Recognition
* On the Convergence of IRLS and Its Variants in Outlier-Robust Estimation
* On the Difficulty of Unpaired Infrared-to-Visible Video Translation: Fine-Grained Content-Rich Patches Transfer
* On the Effectiveness of Partial Variance Reduction in Federated Learning with Heterogeneous Data
* On the Effects of Self-supervision and Contrastive Alignment in Deep Multi-view Clustering
* On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks
* On the Pitfall of Mixup for Uncertainty Calibration
* On the Real-Time Semantic Segmentation of Aphid Clusters in the Wild
* On the Stability-Plasticity Dilemma of Class-Incremental Learning
* On-Board Multi-Class Geospatial Object Detection Based on Convolutional Neural Network for High Resolution Remote Sensing Images
* On-the-Fly Category Discovery
* One-shot and Partially-Supervised Cell Image Segmentation Using Small Visual Prompt
* One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field
* One-Shot Model for Mixed-Precision Quantization
* One-shot skeleton-based action recognition on strength and conditioning exercises
* One-shot Unsupervised Domain Adaptation with Personalized Diffusion Models
* One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer
* One-to-Few Label Assignment for End-to-End Dense Detection
* OneFormer: One Transformer to Rule Universal Image Segmentation
* Online and real-time mask-guided multi-person tracking and segmentation
* Online Distillation with Continual Learning for Cyclic Domain Shifts
* Online LiDAR-to-Vehicle Alignment Using Lane Markings and Traffic Signs
* OO-dMVMT: A Deep Multi-view Multi-task Classification Framework for Real-time 3D Hand Gesture Classification and Segmentation
* OPDN: Omnidirectional Position-aware Deformable Network for Omnidirectional Image Super-Resolution
* OPE-SR: Orthogonal Position Encoding for Designing a Parameter-free Upsampling Module in Arbitrary-scale Image Super-Resolution
* Open Geospatial Data Integration in Game Engine for Urban Digital Twin Applications
* Open Image Resizing Framework for Remote Sensing Applications and Beyond, An
* Open Set Action Recognition via Multi-Label Evidential Learning
* Open Set Classification of GAN-based Image Manipulations via a ViT-based Hybrid Architecture
* Open set classification of untranscribed handwritten text image documents
* Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning
* Open-Category Human-Object Interaction Pre-training via Language Modeling Framework
* Open-Set Fine-Grained Retrieval via Prompting Vision-Language Evaluator
* Open-Set Likelihood Maximization for Few-Shot Learning
* Open-Set Representation Learning through Combinatorial Embedding
* Open-set Semantic Segmentation for Point Clouds via Adversarial Prototype Framework
* Open-TransMind: A New Baseline and Benchmark for 1st Foundation Model Challenge of Intelligent Transportation
* Open-vocabulary Attribute Detection
* Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
* Open-Vocabulary Point-Cloud Object Detection without 3D Annotation
* Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
* Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction
* OpenFed: A Comprehensive and Versatile Open-Source Federated Learning Framework
* OpenGait: Revisiting Gait Recognition Toward Better Practicality
* OpenMix: Exploring Outlier Samples for Misclassification Detection
* OpenScene: 3D Scene Understanding with Open Vocabularies
* Optimal Estimation Inversion of Ionospheric Electron Density from GNSS-POD Limb Measurements: Part II-Validation and Comparison Using NmF2 and hmF2
* Optimal Proposal Learning for Deployable End-to-End Pedestrian Detection
* Optimal Transport Minimization: Crowd Localization on Density Maps for Semi-Supervised Counting
* Optimization of grounding resistance in multitrain DC subway system based on MOEA/D-DE
* Optimization-Inspired Cross-Attention Transformer for Compressive Sensing
* Optimized Dictionary-Based Model Identification Method in the Scope of Brain Effective Connectivity, An
* Optimizing Camera Exposure Control Settings for Remote Vital Sign Measurements in Low-Light Environments
* Optimizing Explanations by Network Canonization and Hyperparameter Search
* Optimizing the Spatial Structure of Metasequoia Plantation Forest Based on UAV-LiDAR and Backpack-LiDAR
* ORCa: Glossy Objects as Radiance-Field Cameras
* OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields
* OrienterNet: Visual Localization in 2D Public Maps with Neural Matching
* Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation
* Orthogonal Matrix Retrieval with Spatial Consensus for 3D Unknown View Tomography
* OSAN: A One-Stage Alignment Network to Unify Multimodal Alignment and Unsupervised Domain Adaptation
* OSRT: Omnidirectional Image Super-Resolution with Distortion-aware Transformer
* OT-Filter: An Optimal Transport Filter for Learning with Noisy Labels
* OTAvatar: One-Shot Talking Face Avatar with Controllable Tri-Plane Rendering
* OTST: A Two-Phase Framework for Joint Denoising and Remosaicing in RGBW CFA
* Out of Distribution Generalization via Interventional Style Transfer in Single-Cell Microscopy
* Out-of-Candidate Rectification for Weakly Supervised Semantic Segmentation
* Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning
* OvarNet: Towards Open-Vocabulary Object Attribute Recognition
* Overcoming the Trade-Off between Accuracy and Plausibility in 3D Hand Shape Reconstruction
* Overlooked Factors in Concept-Based Explanations: Dataset Choice, Concept Learnability, and Human Capability
* OVTrack: Open-Vocabulary Multiple Object Tracking
* OWL (Observe, Watch, Listen): Audiovisual Temporal Context for Localizing Actions in Egocentric Videos
* PA&DA: Jointly Sampling PAth and DAta for Consistent NAS
* PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers
* PACO: Parts and Attributes of Common Objects
* Paint by Example: Exemplar-based Image Editing with Diffusion Models
* Painting 3D Nature in 2D: View Synthesis of Natural Scenes from a Single Semantic Mask
* Paired-Point Lifting for Enhanced Privacy-Preserving Visual Localization
* PaletteNeRF: Palette-based Appearance Editing of Neural Radiance Fields
* PanelNet: Understanding 360 Indoor Environment via Panel Representation
* PAniC-3D: Stylized Single-view 3D Reconstruction from Portraits of Anime Characters
* PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°
* PanoPoint: Self-Supervised Feature Points Detection and Description for 360° Panorama
* Panoptic Compositional Feature Field for Editable Scene Rendering with Network-Inferred Labels via Metric Learning
* Panoptic Lifting for 3D Scene Understanding with Neural Fields
* Panoptic Video Scene Graph Generation
* PanopticRoad: Integrated Panoptic Road Segmentation Under Adversarial Conditions
* PanopticVis: Integrated Panoptic Segmentation for Visibility Estimation at Twilight and Night
* PanoSwin: a Pano-style Swin Transformer for Panorama Understanding
* Parallel Diffusion Models of Operator and Image for Blind Inverse Problems
* ParallelEye Pipeline: An Effective Method to Synthesize Images for Improving the Visual Intelligence of Intelligent Vehicles
* Parameter Efficient Local Implicit Image Function Network for Face Segmentation
* Parametric Implicit Face Representation for Audio-Driven Facial Reenactment
* Parametric Study of MPSO-ANN Techniques in Gas-Bearing Distribution Prediction Using Multicomponent Seismic Data, A
* Parcel3D: Shape Reconstruction from Single RGB Images for Applications in Transportation Logistics
* Pareto-aware Neural Architecture Generation for Diverse Computational Budgets
* Part-attentive kinematic chain-based regressor for 3D human modeling
* PartDistillation: Learning Parts from Instance Segmentation
* Partial Network Cloning
* PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations
* PartMix: Regularization Strategy to Learn Part Discovery for Visible-Infrared Person Re-Identification
* Parts2Words: Learning Joint Embedding of Point Clouds and Texts by Bidirectional Matching Between Parts and Words
* PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models
* Passive Micron-Scale Time-of-Flight with Sunlight Interferometry
* Patch-Based 3D Natural Scene Generation from a Single Example
* Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective
* PatchCraft Self-Supervised Training for Correlated Image Denoising
* PatchMixer: Rethinking network design to boost generalization for 3D point cloud understanding
* PATS: Patch Area Transportation with Subdivision for Local Feature Matching
* Pavement Temperature Forecasts Based on Model Output Statistics: Experiments for Highways in Jiangsu, China
* PC2: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction
* pCON: Polarimetric Coordinate Networks for Neural Scene Representations
* PCR: Proxy-Based Contrastive Replay for Online Class-Incremental Continual Learning
* PCT-Net: Full Resolution Image Harmonization Using Pixel-Wise Color Transformations
* PD-Quant: Post-Training Quantization Based on Prediction Difference Metric
* PDAVIS: Bio-inspired Polarization Event Camera
* PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
* PeakConv: Learning Peak Receptive Field for Radar Semantic Segmentation
* PEAL: Prior-embedded Explicit Attention Learning for Low-overlap Point Cloud Registration
* PeanutNeRF: 3D Radiance Field for Peanuts
* Pedestrian reported activity and information preference while waiting at a red light
* PEDRo: an Event-based Dataset for Person Detection in Robotics
* Peer-to-Peer Federated Continual Learning for Naturalistic Driving Action Recognition
* PEFAT: Boosting Semi-Supervised Medical Image Classification via Pseudo-Loss Estimation and Feature Adversarial Training
* Perceive, Excavate and Purify: A Novel Object Mining Framework for Instance Segmentation
* Perceiving local relative motion and global correlations for weakly supervised group activity recognition
* Perception and Semantic Aware Regularization for Sequential Confidence Calibration
* Perception Over Time: Temporal Dynamics for Robust Image Understanding
* Perception-Oriented Single Image Super-Resolution using Optimal Objective Estimation
* Perceptual Measure for Deep Single Image Camera and Lens Calibration, A
* PerfHD: Efficient ViT Architecture Performance Ranking using Hyperdimensional Computing
* PermutoSDF: Fast Multi-View Reconstruction with Implicit Surfaces Using Permutohedral Lattices
* Persistent Nature: A Generative Model of Unbounded 3D Worlds
* Person Image Synthesis via Denoising Diffusion Model
* PersonNeRF: Personalized Reconstruction from Photo Collections
* Perspective Fields for Single Image Camera Calibration
* PET-NeuS: Positional Encoding Tri-Planes for Neural Surfaces
* PGDENet: Progressive Guided Fusion and Depth Enhancement Network for RGB-D Indoor Scene Parsing
* PHA: Patch-Wise High-Frequency Augmentation for Transformer-Based Person Re-Identification
* Phase-field Models for Lightweight Graph Convolutional Networks
* Phase-Shifting Coder: Predicting Accurate Orientation in Oriented Object Detection
* Phenology-Based Maximum Light Use Efficiency for Modeling Gross Primary Production across Typical Terrestrial Ecosystems
* Phone2Proc: Bringing Robust Robots into Our Chaotic World
* Photo Pre-Training, But for Sketch
* Photometric Correction for Infrared Sensors
* Photoplethysmography imaging algorithm for real-time monitoring of skin perfusion maps
* Physical-World Optical Adversarial Attacks on 3D Face Recognition
* Physically Adversarial Infrared Patches with Learnable Shapes and Locations
* Physically Realizable Natural-Looking Clothing Textures Evade Person Detectors via 3D Modeling
* Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
* Physics-Guided ISO-Dependent Sensor Noise Modeling for Extreme Low-Light Photography
* PIC-Score: Probabilistic Interpretable Comparison Score for Optimal Matching Confidence in Single- and Multi-Biometric Face Recognition
* Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval
* Picture that Sketch: Photorealistic Image Generation from Abstract Sketches
* PIDNet: A Real-time Semantic Segmentation Network Inspired by PID Controllers
* PillarNeXt: Rethinking Network Designs for 3D Object Detection in LiDAR Point Clouds
* Pilot Study of Query-Free Adversarial Attack against Stable Diffusion, A
* PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection
* PIP-Net: Patch-Based Intuitive Prototypes for Interpretable Image Classification
* PIRLNav: Pretraining with Imitation and RL Finetuning for OBJECTNAV
* PIVOT: Prompting for Video Continual Learning
* PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization
* Pix2Map: Cross-Modal Retrieval for Inferring Street Maps from Images
* Pixel-level Contrastive Learning of Driving Videos with Optical Flow
* Pixels, Regions, and Objects: Multiple Enhancement for Salient Object Detection
* PixHt-Lab: Pixel Height Based Light Effect Generation for Image Compositing
* PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
* PlaneDepth: Self-Supervised Depth Estimation via Orthogonal Planes
* Planning-oriented Autonomous Driving
* Plateau-Reduced Differentiable Path Tracing
* Plen-VDB: Memory Efficient VDB-Based Radiance Fields for Fast Training and Rendering
* PLIKS: A Pseudo-Linear Inverse Kinematic Solver for 3D Human Body Estimation
* Plot-Scale Irrigation Dates and Amount Detection Using Surface Soil Moisture Derived from Sentinel-1 SAR Data in the Optirrig Crop Model
* Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
* PMatch: Paired Masked Image Modeling for Dense Geometric Matching
* PMR: Prototypical Modal Rebalance for Multimodal Learning
* POEM: Reconstructing Hand in a Point Embedded Multi-view Stereo
* Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting
* Point Cloud Scene Completion With Joint Color and Semantic Estimation From Single RGB-D Image
* Point cloud-based scene flow estimation on realistically deformable objects: A benchmark of deep learning-based methods
* Point completion by a Stack-Style Folding Network with multi-scaled graphical features
* Point2Pix: Photo-Realistic Point Cloud Rendering via Neural Radiance Fields
* PointAvatar: Deformable Point-Based Head Avatars from Videos
* PointCert: Point Cloud Classification with Deterministic Certified Robustness Guarantees
* PointClustering: Unsupervised Point Cloud Pre-training using Transformation Invariance in Clustering
* PointCMP: Contrastive Mask Prediction for Self-supervised Learning on Point Cloud Videos
* PointConvFormer: Revenge of the Point-based Convolution
* PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection
* Pointersect: Neural Rendering with Cloud-Ray Intersection
* Pointless Global Bundle Adjustment With Relative Motions Hessians
* PointListNet: Deep Learning on 3D Point Lists
* PointVector: A Vector Representation In Point Cloud Analysis
* Polarimetric iToF: Measuring High-Fidelity Depth Through Scattering Media
* Polarized Color Image Denoising
* Policy Adaptation from Foundation Model Feedback
* Poly-PC: A Polyhedral Network for Multiple Point Cloud Tasks at Once
* PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
* Polynomial Implicit Neural Representations for Large Diverse Datasets
* Pose Synchronization under Multiple Pair-wise Relative Poses
* Pose-disentangled Contrastive Learning for Self-supervised Facial Representation
* PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation
* PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation
* Position-Guided Text Prompt for Vision-Language Pre-Training
* Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
* Positive-Unlabeled Radar False Target Recognition Method Based on Frequency Response Features, A
* Post-Event Surface Deformation of the 2018 Baige Landslide Revealed by Ground-Based and Spaceborne Radar Observations
* Post-Processing Temporal Action Detection
* Post-Training Quantization on Diffusion Models
* PosterLayout: A New Benchmark and Approach for Content-Aware Visual-Textual Presentation Layout
* Posture-based Infant Action Recognition in the Wild with Very Limited Data
* Potato Leaf Area Index Estimation Using Multi-Sensor Unmanned Aerial Vehicle (UAV) Imagery and Machine Learning
* POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery
* Power Bundle Adjustment for Large-Scale 3D Reconstruction
* Practical Network Acceleration with Tiny Sets
* Practical Stereo Depth System for Smart Glasses, A
* Practical Upper Bound for the Worst-Case Attribution Deviations, A
* PRB-FPN+: Video Analytics for Enforcing Motorcycle Helmet Laws
* Pre-training Auto-generated Volumetric Shapes for 3D Medical Image Segmentation
* Precision Detection of Dense Litchi Fruit in UAV Images Based on Improved YOLOv5 Model
* Prediction of evoked expression from videos with temporal position fusion
* Prediction of Seedling Oilseed Rape Crop Phenotype by Drone-Derived Multimodal Data
* Prediction of Unpaved Road Conditions Using High-Resolution Optical Satellite Imagery and Machine Learning
* Predictive Coding Light: learning compact visual codes by combining excitatory and inhibitory spike timing-dependent plasticity
* Prefix Conditioning Unifies Language and Label Supervision
* PREIM3D: 3D Consistent Precise Image Attribute Editing from a Single Image
* Presentation-Level Privacy Protection Techniques for Automated Face Recognition: A Survey
* Preserving Linear Separability in Continual Learning by Backward Feature Projection
* Pretrained Pixel-Aligned Reference Network for 3D Human Reconstruction
* Pricing iterative optimization for multi-agent simulation of setting electric vehicle charging model in public parking lots
* Primitive Generation and Semantic-Related Alignment for Universal Zero-Shot Segmentation
* Principles of Forgetting in Domain-Incremental Semantic Segmentation in Adverse Weather Conditions
* Prior Image Guided Snapshot Compressive Spectral Imaging
* Prioritised Moderation for Online Advertising
* Prioritized Subnet Sampling for Resource-Adaptive Supernet Training
* PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment
* Privacy-preserving Adversarial Facial Features
* Privacy-Preserving Representations are not Enough: Recovering Scene Content from Camera Poses
* Private Image Generation with Dual-Purpose Auxiliary Classifier
* Privileged Knowledge Distillation for Dimensional Emotion Recognition in the Wild
* Proactive integrated traffic control to mitigate congestion at toll plazas
* PROB: Probabilistic Objectness for Open World Object Detection
* Probabilistic Attention Model with Occlusion-aware Texture Regression for 3D Hand Reconstruction from a Single RGB Image, A
* Probabilistic Debiasing of Scene Graphs
* Probabilistic Framework for Lifelong Test-Time Adaptation, A
* Probabilistic Knowledge Distillation of Face Ensembles
* Probabilistic Prompt Learning for Dense Prediction
* Probability-based Global Cross-modal Upsampling for Pansharpening
* Probing Neural Representations of Scene Perception in a Hippocampally Dependent Task Using Artificial Neural Networks
* Probing Sentiment-Oriented PreTraining Inspired by Human Sentiment Perception Mechanism
* Procedure-Aware Pretraining for Instructional Video Understanding
* Processing GPR Surveys in Civil Engineering to Locate Buried Structures in Highly Conductive Subsoils
* ProD: Prompting-to-disentangle Domain Knowledge for Cross-domain Few-shot Image Classification
* Profiling Public Transit Passenger Mobility Using Adversarial Learning
* ProgDTD: Progressive Learned Image Compression with Double-Tail-Drop Training
* Progressive Backdoor Erasing via connecting Backdoor and Adversarial Attacks
* Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis
* Progressive Neighbor Consistency Mining for Correspondence Pruning
* Progressive Open Space Expansion for Open-Set Model Attribution
* Progressive Random Convolutions for Single Domain Generalization
* Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning
* Progressive Spatio-temporal Alignment for Efficient Event-based Motion Estimation
* Progressive Transformation Learning for Leveraging Virtual Images in Training
* Progressively Optimized Local Radiance Fields for Robust View Synthesis
* Projecting the Impact of Climate Change on Runoff in the Tarim River Simulated by the Soil and Water Assessment Tool Glacier Model
* Promoting Generalization in Cross-Dataset Remote Photoplethysmography
* Promoting Semantic Connectivity: Dual Nearest Neighbors Contrastive Learning for Unsupervised Domain Generalization
* Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners
* Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep Skeleton Features
* PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery
* Prompting Large Language Models with Answer Heuristics for Knowledge-Based Visual Question Answering
* Propagate and Calibrate: Real-Time Passive Non-Line-of-Sight Tracking
* ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals
* Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization
* Proposing Optimal Locations for Runoff Harvesting and Water Management Structures in the Hami Qeshan Watershed, Iraq
* Protocon: Pseudo-Label Refinement via Online Clustering and Prototypical Consistency for Efficient Semi-Supervised Learning
* Prototype-Based Embedding Network for Scene Graph Generation
* Prototypical Residual Networks for Anomaly Detection and Localization
* ProTéGé: Untrimmed Pretraining for Video Temporal Grounding by Video Temporal Grounding
* Provable Phase Retrieval with Mirror Descent
* Proximal Splitting Adversarial Attack for Semantic Segmentation
* ProxyFormer: Proxy Alignment Assisted Point Cloud Completion with Missing Part Sensitive Transformer
* Pruning Parameterization with Bi-level Optimization for Efficient Semantic Segmentation on the Edge
* Pseudo-Label Guided Contrastive Learning for Semi-Supervised Medical Image Segmentation
* Pseudolite Multipath Estimation Adaptive Mitigation of Vector Tracking Based on Ref-MEDLL
* PSLT: A Light-Weight Vision Transformer With Ladder Self-Attention and Progressive Shift
* PSMNet-FusionX3: LiDAR-Guided Deep Learning Stereo Dense Matching On Aerial Images
* PSVT: End-to-End Multi-Person 3D Pose and Shape Estimation with Progressive Video Transformers
* PUGAN: Physical Model-Guided Underwater Image Enhancement Using GAN With Dual-Discriminators
* Putting People in Their Place: Affordance-Aware Human Insertion into Scenes
* PVO: Panoptic Visual Odometry
* PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer
* PyPose: A Library for Robot Learning with Physics-based Optimization
* Pyramid Ensemble Structure for High Resolution Image Shadow Removal
* Pyramid NeRF: Frequency Guided Fast Radiance Field Optimization
* PyramidFlow: High-Resolution Defect Contrastive Localization Using Pyramid Normalizing Flow
* Q-DETR: An Efficient Low-Bit Quantized Detection Transformer
* Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!
* QCNN-H: Single-Image Dehazing Using Quaternion Neural Networks
* QGORE: Quadratic-Time Guaranteed Outlier Removal for Point Cloud Registration
* QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation
* Qualitative failures of image generation models and their application in detecting deepfakes
* Quality assessment of enhanced videos guided by aesthetics and technical quality attributes
* Quality-Aware Part Models for Occluded Person Re-Identification
* Quality-aware Pretrained Models for Blind Image Quality Assessment
* QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity
* Quantification of Occlusion Handling Capability of a 3D Human Pose Estimation Framework
* Quantifying Extrinsic Curvature in Neural Manifolds
* Quantifying the Loss of Coral from a Bleaching Event Using Underwater Photogrammetry and AI-Assisted Image Segmentation
* Quantitative Comparison of Point Cloud Compression Algorithms With PCC Arena
* Quantitative Manipulation of Custom Attributes on 3D-Aware Image Synthesis
* Quantitative Precipitation Estimation in the Tianshan Mountains Based on Machine Learning
* Quantized Proximal Averaging Networks for Compressed Image Recovery
* Quantum Annealing for Single Image Super-Resolution
* Quantum Multi-Model Fitting
* Quantum-Inspired Spectral-Spatial Pyramid Network for Hyperspectral Image Classification
* Quasi-Biweekly Oscillation of PM2.5 in Winter over North China and Its Leading Circulation Patterns
* Query-Centric Trajectory Prediction
* Query-Dependent Video Representation for Moment Retrieval and Highlight Detection
* QuickSRNet: Plain Single-Image Super-Resolution Architecture for Faster Inference on Mobile Platforms
* R-Unet: A Deep Learning Model for Rice Extraction in Rio Grande do Sul, Brazil
* RA-CLIP: Retrieval Augmented Contrastive Language-Image Pre-Training
* RaBit: Parametric Modeling of 3D Biped Cartoon Characters with a Topological-Consistent Dataset
* Radar Active Jamming Recognition under Open World Setting
* Radar Anti-Jamming Decision-Making Method Based on DDPG-MADDPG Algorithm
* Radar Pulse Stream Clustering Based on MaskRCNN Instance Segmentation Network
* RadarGNN: Transformation Invariant Graph Neural Network for Radar-based Perception
* Random Shuffling Data for Hyperspectral Image Classification with Siamese and Knowledge Distillation Network
* Randomized Adversarial Training via Taylor Expansion
* Range-nullspace Video Frame Interpolation with Focalized Motion Estimation
* RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving
* Ranking Regularization for Critical Rare Classes: Minimizing False Positives at a High True Positive Rate
* RankMix: Data Augmentation for Weakly Supervised Learning of Classifying Whole Slide Images with Diverse Sizes and Imbalanced Categories
* Rapid Detection of Iron Ore and Mining Areas Based on MSSA-BNVTELM, Visible-Infrared Spectroscopy, and Remote Sensing
* Rate Gradient Approximation Attack Threats Deep Spiking Neural Networks
* Rate-Distortion Optimized Geometry Compression for Spinning LiDAR Point Cloud
* Raw Image Reconstruction with Learned Compact Metadata
* Rawgment: Noise-Accounted RAW Augmentation Enables Recognition in a Wide Variety of Environments
* RB-Dust - A Reference-based Dataset for Vision-based Dust Removal
* RBDL: Robust block-Structured dictionary learning for block sparse representation
* Re-basin via implicit Sinkhorn differentiation
* Re-GAN: Data-Efficient GANs Training via Architectural Reconfiguration
* Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild
* Re-Thinking Federated Active Learning Based on Inter-Class Diversity
* Re-Thinking Model Inversion Attacks Against Deep Neural Networks
* Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization
* Real-time 6K Image Rescaling with Rate-distortion Optimization
* Real-time and Lightweight Method for Tiny Airborne Object Detection, A
* Real-Time Controllable Denoising for Image and Video
* Real-Time Estimation of Heart Rate in Situations Characterized by Dynamic Illumination using Remote Photoplethysmography
* Real-Time Evaluation in Online Continual Learning: A New Hope
* Real-time Multi-Class Helmet Violation Detection Using Few-Shot Data Sampling Technique and YOLOv8
* Real-time Multi-person Eyeblink Detection in the Wild for Untrimmed Video
* Real-Time Neural Light Field on Mobile Devices
* Real-time Segmenting Human Portrait at Anywhere
* Real-Time Terrain Correction of Satellite Imagery-Based Solar Irradiance Maps Using Precomputed Data and Memory Optimization
* RealFusion: 360° Reconstruction of Any Object from a Single Image
* REALIMPACT: A Dataset of Impact Sound Fields for Real Objects
* Realistic Saliency Guided Image Enhancement
* Realistic Synthetic Mushroom Scenes Dataset, A
* ReasonNet: End-to-End Driving with Temporal and Global Reasoning
* Rebalancing Batch Normalization for Exemplar-Based Class-Incremental Learning
* REC-MV: REconstructing 3D Dynamic Cloth from Monocular Videos
* Recent Trends in Task and Motion Planning for Robotics: A Survey
* ReCo: Region-Controlled Text-to-Image Generation
* Recognizability Embedding Enhancement for Very Low-Resolution Face Recognition and Quality Estimation
* Recognizing Rigid Patterns of Unlabeled Point Clouds by Complete and Continuous Isometry Invariants with no False Negatives and no False Positives
* Reconstructing Animatable Categories from Videos
* Reconstructing Signing Avatars from Video Using Linguistic Priors
* Recovering 3D Hand Mesh Sequence from a Single Blurry Image: A New Dataset and Temporal Unfolding
* Recurrence without Recurrence: Stable Video Landmark Detection with Deep Equilibrium Models
* Recurrent Homography Estimation Using Homography-Guided Image Warping and Focus Transformer
* Recurrent Vision Transformers for Object Detection with Event Cameras
* Recursions Are All You Need: Towards Efficient Deep Unfolding Networks
* ReDirTrans: Latent-to-Latent Translation for Gaze and Head Redirection
* REDRESS: Generating Compressed Models for Edge Inference Using Tsetlin Machines
* Reducing the Label Bias for Timestamp Supervised Temporal Action Segmentation
* Reducing Vision-Answer Biases for Multiple-Choice VQA
* Ref-NPR: Reference-Based Non-Photorealistic Radiance Fields for Controllable Scene Stylization
* RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension
* Referring Image Matting
* Referring Multi-Object Tracking
* RefSR-NeRF: Towards High Fidelity and Super Resolution View Synthesis
* RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension
* Refusion: Enabling Large-Size Realistic Image Restoration with Latent-Space Diffusion Models
* Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers
* Region-based Appearance and Flow Characteristics for Anomaly Detection in Infrared Surveillance Imagery
* Regularization of polynomial networks for image recognition
* Regularize implicit neural representation by itself
* Regularized Vector Quantization for Tokenized Image Synthesis
* Regularizing Orientation Estimation in Cryogenic Electron Microscopy Three-Dimensional Map Refinement through Measure-Based Lifting over Riemannian Manifolds
* Regularizing Second-Order Influences for Continual Learning
* ReidTrack: Reid-only Multi-target Multi-camera Tracking
* Reinforcement Learning-Based Black-Box Model Inversion Attacks
* Relational Context Learning for Human-Object Interaction Detection
* Relational Edge-Node Graph Attention Network for Classification of Micro-Expressions
* Relational Space-Time Query in Long-Form Videos
* Reliability in Semantic Segmentation: Are we on the Right Track?
* Reliability of GPM IMERG Satellite Precipitation Data for Modelling Flash Flood Events in Selected Watersheds in the UAE
* Reliable and Interpretable Personalized Federated Learning
* Reliable Student: Addressing Noise in Semi-Supervised 3D Object Detection
* ReLight My NeRF: A Dataset for Novel View Synthesis and Relighting of Real World Objects
* Relightable Neural Human Assets from Multi-view Gradient Illuminations
* RelightableHands: Efficient Neural Relighting of Articulated Hand Models
* RelTR: Relation Transformer for Scene Graph Generation
* Remote mass facial temperature screening in varying ambient temperatures and distances
* Removing Objects From Neural Radiance Fields
* Renderable Neural Radiance Map for Visual Navigation
* RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation
* Reparameterized Residual Feature Network For Lightweight Image Super-Resolution
* RepMode: Learning to Re-Parameterize Diverse Experts for Subcellular Structure Prediction
* Representation Learning for Visual Object Tracking by Masked Appearance Transfer
* Representing Multimodal Behaviors With Mean Location for Pedestrian Trajectory Prediction
* Representing Volumetric Videos as Dynamic MLP Maps
* Reproducible Scaling Laws for Contrastive Language-Image Learning
* Research on Road Network Partitioning Considering the Coupling of Network Connectivity and Traffic Attributes
* ResFormer: Scaling ViTs with Multi-Resolution Training
* Residual 3D convolutional neural network to enhance sinograms from small-animal positron emission tomography images
* Residual Degradation Learning Unfolding Framework with Mixing Priors Across Spectral and Spatial for Compressive Spectral Imaging
* Resource Problem of Using Linear Layer Leakage Attack in Federated Learning, The
* Resource-Efficient RGBD Aerial Tracking
* Respiratory Rate Estimation Based on Detected Mask Area in Thermal Images
* Response of Evapotranspiration (ET) to Climate Factors and Crop Planting Structures in the Shiyang River Basin, Northwestern China
* Restoration of Hand-Drawn Architectural Drawings using Latent Space Mapping with Degradation Generator
* Rethinking Dilated Convolution for Real-time Semantic Segmentation
* Rethinking Domain Generalization for Face Anti-spoofing: Separability and Alignment
* Rethinking Feature-based Knowledge Distillation for Face Recognition
* Rethinking Federated Learning with Domain Shift: A Prototype View
* Rethinking Few-Shot Medical Segmentation: A Vector Quantization View
* Rethinking Gradient Projection Continual Learning: Stability/Plasticity Feature Space Decoupling
* Rethinking Image Super Resolution from Long-Tailed Distribution Learning Perspective
* Rethinking Optical Flow from Geometric Matching Consistent Perspective
* Rethinking Out-of-distribution (OOD) Detection: Masked Image Modeling is All You Need
* Rethinking the Approximation Error in 3D Surface Fitting for Point Cloud Normal Estimation
* Rethinking the Correlation in Few-Shot Segmentation: A Buoys View
* Rethinking the Learning Paradigm for Dynamic Facial Expression Recognition
* Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
* Reveal: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
* Revealing Hidden Context Bias in Segmentation and Object Detection through Concept-specific Explanations
* Revealing Schematic Map Designs with Preservation of Relativity in Node Position and Segment Length in Existing Official Maps
* Revealing the Dark Secrets of Masked Image Modeling
* reversibility of cancelable biometric templates based on iterative perturbation stochastic approximation strategy, The
* Review of Practical AI for Remote Sensing in Earth Sciences, A
* ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Regeneration
* Revisiting Class Imbalance for End-to-end Semi-Supervised Object Detection
* Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens
* Revisiting Prototypical Network for Cross Domain Few-Shot Learning
* Revisiting Residual Networks for Adversarial Robustness
* Revisiting Reverse Distillation for Anomaly Detection
* Revisiting Rolling Shutter Bundle Adjustment: Toward Accurate and Fast Solution
* Revisiting Rotation Averaging: Uncertainties and Robust Losses
* Revisiting Self-Similarity: Structural Embedding for Image Retrieval
* Revisiting Temporal Modeling for CLIP-Based Image-to-Video Knowledge Transferring
* Revisiting the P3P Problem
* Revisiting the Stack-Based Inverse Tone Mapping
* Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation
* RGB No More: Minimally-Decoded JPEG Vision Transformers
* RGBD2: Generative Scene Synthesis via Incremental View Inpainting Using RGBD Diffusion Models
* RIATIG: Reliable and Imperceptible Adversarial Text-to-Image Generation with Natural Prompts
* RIAV-MVS: Recurrent-Indexing an Asymmetric Volume for Multi-View Stereo
* RIDCP: Revitalizing Real Image Dehazing via High-Quality Codebook Priors
* RIFormer: Keep Your Vision Backbone Effective But Removing Token Mixer
* Rigidity-Aware Detection for 6D Object Pose Estimation
* RILS: Masked Visual Reconstruction in Language Semantic Space
* RIP Analysis for L1/Lp (p>1) Minimization Method
* Rip Current Segmentation: A Novel Benchmark and YOLOv8 Baseline Results
* RL-CAM: Visual Explanations for Convolutional Networks using Reinforcement Learning
* RMLVQA: A Margin Loss Approach For Visual Question Answering with Language Biases
* Robot Motion Learning Method Using Broad Learning System Verified by Small-Scale Fish-Like Robot, A
* Robot Structure Prior Guided Temporal Attention for Camera-to-Robot Pose Estimation from Image Sequence
* Robust 3D Shape Classification via Non-local Graph Attention Network
* Robust and Scalable Gaussian Process Regression and Its Applications
* Robust and Scalable Vehicle Re-Identification via Self-Supervision
* Robust Automatic Motorcycle Helmet Violation Detection for an Intelligent Transportation System
* Robust Dual Spatial Weighted Sparse Unmixing for Remotely Sensed Hyperspectral Imagery
* Robust Dynamic Radiance Fields
* Robust Generalization Against Photon-Limited Corruptions via Worst-Case Sharpness Minimization
* Robust Hierarchical Symbolic Explanations in Hyperbolic Space for Image Classification
* Robust Leaderless Time-Varying Formation Control for Nonlinear Unmanned Aerial Vehicle Swarm System With Communication Delays
* Robust Low-Rank Matrix Recovery as Mixed Integer Programming via L_0-Norm Optimization
* Robust Mean Teacher for Continual and Gradual Test-Time Adaptation
* Robust Model-based Face Reconstruction through Weakly-Supervised Outlier Segmentation
* Robust Monocular 3D Human Motion with Lasso-Based Differential Kinematics
* Robust Multiview Multimodal Driver Monitoring System Using Masked Multi-Head Self-Attention
* Robust Multiview Point Cloud Registration with Reliable Pose Graph Initialization and History Reweighting
* Robust Outlier Rejection for 3D Registration with Variational Bayes
* Robust Partial Fingerprint Recognition
* Robust Single Image Reflection Removal Against Adversarial Attacks
* Robust Test-Time Adaptation in Dynamic Scenarios
* Robust Unsupervised StyleGAN Image Restoration
* RobustNeRF: Ignoring Distractors with Robust Losses
* Robustness Against Gradient based Attacks through Cost Effective Network Fine-Tuning
* Robustness of Visual Explanations to Common Data Augmentation Methods
* Robustness with Query-efficient Adversarial Attack using Reinforcement Learning
* RockSeg: A Novel Semantic Segmentation Network Based on a Hybrid Framework Combining a Convolutional Neural Network and Transformer for Deep Space Rock Images
* RODIN: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion
* Role of Transients in Two-Bounce Non-Line-of-Sight Imaging
* RONO: Robust Discriminative Learning with Noisy Labels for 2D-3D Cross-Modal Retrieval
* RoSteALS: Robust Steganography using Autoencoder Latent Space
* Rotation-Invariant Transformer for Point Cloud Matching
* Rotation-Translation-Decoupled Solution for Robust and Efficient Visual-Inertial Initialization, A
* RSS-LIWOM: Rotating Solid-State LiDAR for Robust LiDAR-Inertial-Wheel Odometry and Mapping
* RTTLC: Video Colorization with Restored Transformer and Test-time Local Converter
* Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks
* RUST: Latent Neural Scene Representations from Unposed Imagery
* RWSC-Fusion: Region-Wise Style-Controlled Fusion Network for the Prohibited X-ray Security Image Synthesis
* RxRx1: A Dataset for Evaluating Experimental Batch Correction Methods
* S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning
* SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
* Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models
* Saliency-aware Stereoscopic Video Retargeting
* Sample-level Multi-view Graph Clustering
* Samples with Low Loss Curvature Improve Data Efficiency
* Sampling is Matter: Point-Guided 3D Human Mesh Reconstruction
* Sanity checks for patch visualisation in prototype-based image classification
* SANO: Score-based Diffusion Model for Anomaly Localization in Dermatology
* SAP-DETR: Bridging the Gap Between Salient Points and Queries-Based Transformer Detector for Fast Model Convergency
* SASTGCN: A Self-Adaptive Spatio-Temporal Graph Convolutional Network for Traffic Prediction
* Satellite Altimetry for Ocean and Coastal Applications: A Review
* Satellite-Based Identification and Characterization of Extreme Ice Features: Hummocks and Ice Islands
* SaTSeaD: Satellite Triangulated Sea Depth Open-Source Bathymetry Module for NASA Ames Stereo Pipeline
* SB-VQA: A Stack-Based Video Quality Assessment Framework for Video Enhancement
* SC-NAFSSR: Perceptual-Oriented Stereo Image Super-Resolution Using Stereo Consistency Guided NAFSSR
* SCADE: NeRFs from Space Carving with Ambiguity-Aware Depth Estimates
* Scalable, Detailed and Mask-Free Universal Photometric Stereo
* Scale Analysis of Typhoon In-Fa (2021) Based on FY-4A Geostationary Interferometric Infrared Sounder (GIIRS) Observed and All-Sky-Simulated Brightness Temperature
* Scale-Invariant Trajectory Simplification Method for Efficient Data Collection in Videos, A
* SCALE: Online Self-Supervised Lifelong Learning without Prior Knowledge
* ScaleDet: A Scalable Multi-Dataset Object Detector
* ScaleFL: Resource-Adaptive Federated Learning with Heterogeneous Clients
* ScaleKD: Distilling Scale-Aware Knowledge in Small Object Detector
* Scaling Language-Image Pre-Training via Masking
* Scaling up GANs for Text-to-Image Synthesis
* Scan2LoD3: Reconstructing semantic 3D building models at LoD3 using ray casting and Bayesian networks
* ScanDMM: A Deep Markov Model of Scanpath Prediction for 360° Images
* SCANet: Self-Paced Semi-Curricular Attention Network for Non-Homogeneous Image Dehazing
* ScarceNet: Animal Pose Estimation with Scarce Annotations
* SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy
* Scene Graph Driven Text-Prompt Generation for Image Inpainting
* Scene-Aware Egocentric 3D Human Pose Estimation
* SceneComposer: Any-Level Semantic Image Synthesis
* SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text
* Schrödinger's Camera: First Steps Towards a Quantum-Based Privacy Preserving Camera
* SCoDA: Domain Adaptive Shape Completion for Real Scans
* SCONE-GAN: Semantic Contrastive learning-based Generative Adversarial Network for an end-to-end image translation
* SCOOP: Self-Supervised Correspondence and Optimization-Based Scene Flow
* Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation
* Scoring Your Prediction on Unseen Data
* SCOTCH and SODA: A Transformer Video Shadow Detection Framework
* SCPNet: Semantic Scene Completion on Point Cloud
* SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation
* SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation
* SDV-LOAM: Semi-Direct Visual-LiDAR Odometry and Mapping
* SE-ORNet: Self-Ensembling Orientation-Aware Network for Unsupervised Point Cloud Shape Correspondence
* SeaMAE: Masked Pre-Training with Meteorological Satellite Imagery for Sea Fog Detection
* Search-Map-Search: A Frame Selection Paradigm for Action Recognition
* Seasonal Domain Shift in the Global South: Dataset and Deep Features Analysis
* Seasoning Model Soups for Robustness to Adversarial and Natural Distribution Shifts
* SeaThru-NeRF: Neural Radiance Fields in Scattering Media
* SECAD-Net: Self-Supervised CAD Reconstruction by Learning Sketch-Extrude Operations
* Second Monocular Depth Estimation Challenge, The
* Secure and Disambiguating Approach for Generative Linguistic Steganography, A
* Secure Outsourced SIFT: Accurate and Efficient Privacy-Preserving Image SIFT Feature Extraction
* Seeing a Rose in Five Thousand Ways
* Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding
* Seeing Electric Network Frequency from Events
* Seeing Through the Data: A Statistical Evaluation of Prohibited Item Detection Benchmark Datasets for X-ray Security Screening
* Seeing Through the Glass: Neural 3D Reconstruction of Object Inside a Transparent Container
* Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning
* Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert
* Seeing With Sound: Long-Range Acoustic Beamforming for Multimodal Scene Understanding
* Seg-XRes-CAM: Explaining Spatially Local Regions in Image Segmentation
* SegLoc: Learning Segmentation-Based Representations for Privacy-Preserving Visual Localization
* Selective Bokeh Effect Transformation
* Selective Structured State-Spaces for Long-Form Video Understanding
* Self-Adaptive Filtering for Ultra-Large-Scale Airborne LiDAR Data in Urban Environments Based on Object Primitive Global Energy Minimization
* Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation
* Self-Guided Diffusion Models
* Self-Positioning Point-Based Transformer for Point Cloud Understanding
* Self-supervised 3D Human Pose Estimation from a Single Image
* Self-Supervised 3D Scene Flow Estimation Guided by Superpoints
* Self-supervised AutoFlow
* Self-Supervised Blind Motion Deblurring with Deep Expectation Maximization
* Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion
* Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss
* Self-Supervised Implicit Glyph Attention for Text Recognition
* Self-supervised Interest Point Detection and Description for Fisheye and Perspective Images
* Self-Supervised Learning for Accurate Liver View Classification in Ultrasound Images with Minimal Labeled Data
* Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching
* Self-Supervised Learning for Videos: A Survey
* Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
* Self-supervised Non-uniform Kernel Estimation with Flow-based Motion Prior for Blind Image Deblurring
* Self-Supervised Normalizing Flows for Image Anomaly Detection and Localization
* Self-Supervised Pre-Training with Masked Shape Prediction for 3D Scene Understanding
* Self-Supervised Representation Learning for CAD
* Self-Supervised Super-Plane for Neural 3D Reconstruction
* Self-Supervised Video Forensics by Audio-Visual Anomaly Detection
* Self-Supervised Video Interaction Classification using Image Representation of Skeleton Data
* Self-Supervised Video Similarity Learning
* SelfME: Self-Supervised Motion Learning for Micro-Expression Recognition
* SEM-POS: Grammatically and Semantically Correct Video Captioning
* Semantic Guidance Learning for High-Resolution Non-homogeneous Dehazing
* Semantic Human Parsing via Scalable Semantic Transfer Over Multiple Label Domains
* Semantic Point Cloud Upsampling
* Semantic Prompt for Few-Shot Image Recognition
* Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention
* Semantic Scene Completion with Cleaner Self
* Semantic-Conditional Diffusion Networks for Image Captioning
* Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation
* Semi-DETR: Semi-Supervised Object Detection with Detection Transformers
* Semi-FCMNet: Semi-Supervised Learning for Forest Cover Mapping from Satellite Imagery via Ensemble Self-Training and Perturbation
* Semi-Supervised 2D Human Pose Estimation Driven by Position Inconsistency Pseudo Label Correction Module
* Semi-Supervised Domain Adaptation with Source Label Adaptation
* Semi-Supervised Hand Appearance Recovery via Structure Disentanglement and Dual Adversarial Discrimination
* Semi-supervised learning made simple with self-supervised clustering
* Semi-Supervised Parametric Real-World Image Harmonization
* Semi-Supervised Stereo-Based 3D Object Detection via Cross-View Consensus
* Semi-Supervised Video Inpainting with Cycle Consistency Constraints
* Semi-Weakly Supervised Object Kinematic Motion Prediction
* SemiCVT: Semi-Supervised Convolutional Vision Transformer for Semantic Segmentation
* Semidefinite Relaxations for Robust Multiview Triangulation
* Sensing Mobility and Routine Locations through Mobile Phone and Crowdsourced Data: Analyzing Travel and Behavior during COVID-19
* Sensitivity of Optical Satellites to Estimate Windthrow Tree-Mortality in a Central Amazon Forest
* Separable Quaternion Matrix Factorization for Polarization Images
* SepicNet: Sharp Edges Recovery by Parametric Inference of Curves in 3D Shapes
* SeqTrack: Sequence to Sequence Learning for Visual Object Tracking
* Sequential Training of GANs Against GAN-Classifiers Reveals Correlated Knowledge Gaps Present Among Independently Trained GAN Instances
* Serverless Electrocardiogram Stream Processing in Federated Clouds With Lambda Architecture
* SeSDF: Self-Evolved Signed Distance Field for Implicit 3D Clothed Human Reconstruction
* SFD2: Semantic-Guided Feature Detection and Description
* SFGDO: Smart flower gradient descent optimization enabled generative adversarial network for recognition of Tamil handwritten character
* SfM-TTR: Using Structure from Motion for Test-Time Refinement of Single-View Depth Networks
* SGLoc: Scene Geometry Encoding for Outdoor LiDAR Localization
* ShadowDiffusion: When Degradation Prior Meets Diffusion Model for Shadow Removal
* ShadowNeuS: Neural SDF Reconstruction by Shadow Ray Supervision
* Shakes on a Plane: Unsupervised Depth Estimation from Unstabilized Photography
* Shape and Intensity Analysis of Glioblastoma Multiforme Tumors
* Shape of You: Precise 3D shape estimations for diverse body types
* Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion
* Shape-Aware Text-Driven Layered Video Editing
* Shape-Constraint Recurrent Flow for 6D Object Pose Estimation
* Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification
* Shape-Net: Room Layout Estimation from Panoramic Images Robust to Occlusion using Knowledge Distillation with 3D Shapes as Additional Inputs
* ShapeClipper: Scalable 3D Shape Learning from Single-View Images via Geometric and CLIP-Based Consistency
* ShapeTalk: A Language Dataset and Framework for 3D Shape Edits and Deformations
* Shared Interest ... Sometimes: Understanding the Alignment between Human Perception, Vision Architectures, and Saliency Map Techniques
* Sharpness-Aware Gradient Matching for Domain Generalization
* Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning
* SHIFT15M: Fashion-specific dataset for set-to-set matching with several distribution shifts
* Shifted Diffusion for Text-to-image Generation
* Shining light on the DVS pixel: A tutorial and discussion about biasing and optimization
* Shortcomings of Top-Down Randomization-Based Sanity Checks for Evaluations of Deep Neural Network Explanations
* SHS-Net: Learning Signed Hyper Surfaces for Oriented Normal Estimation of Point Clouds
* Siam-EMNet: A Siamese EfficientNet-MANet Network for Building Change Detection in Very High Resolution Images
* Siamese DETR
* Siamese Image Modeling for Self-Supervised Vision Representation Learning
* Sibling-Attack: Rethinking Transferable Adversarial Attacks against Face Recognition
* Side Adapter Network for Open-Vocabulary Semantic Segmentation
* SIEDOB: Semantic Image Editing by Disentangling Object and Background
* Sign Language Translation from Instructional Videos
* SignBERT+: Hand-Model-Aware Self-Supervised Pre-Training for Sign Language Understanding
* SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation
* SimDE: A Simple Domain Expansion Approach for Single-source Domain Generalization
* Similar Class Style Augmentation for Efficient Cross-Domain Few-Shot Learning
* Similarity Maps for Self-Training Weakly-Supervised Phrase Grounding
* Similarity Metric Learning For RGB-Infrared Group Re-Identification
* Simple Baseline for Video Restoration with Grouped Spatial-Temporal Shift, A
* Simple Cues Lead to a Strong Multi-Object Tracker
* Simple Framework for Text-Supervised Semantic Segmentation, A
* Simple Transformer-style Network for Lightweight Image Super-resolution, A
* SimpleNet: A Simple Network for Image Anomaly Detection and Localization
* SimpSON: Simplifying Photo Cleanup with Single-Click Distracting Object Segmentation Network
* Simulated Annealing in Early Layers Leads to Better Generalization
* Simulating Task-Free Continual Learning Streams From Existing Datasets
* Simulation and Driving Factor Analysis of Satellite-Observed Terrestrial Water Storage Anomaly in the Pearl River Basin Using Deep Learning
* Simulation of the Ecological Service Value and Ecological Compensation in Arid Area: A Case Study of Ecologically Vulnerable Oasis
* Simultaneously Short- and Long-Term Temporal Modeling for Semi-Supervised Video Semantic Segmentation
* SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field
* SINE: SINgle Image Editing with Text-to-Image Diffusion Models
* Single Domain Generalization for LiDAR Semantic Segmentation
* Single Image Backdoor Inversion via Robust Smoothed Classifiers
* Single Image based Infant Body Height and Weight Estimation
* Single Image Depth Prediction Made Better: A Multivariate Gaussian Take
* Single Residual Network with ESA Modules and Distillation, A
* Single View Scene Scale Estimation using Scale Field
* SinGRAF: Learning a 3D Generative Radiance Field for a Single Scene
* Singular Value Decomposition of the Wave Forward Operator with Radial Variable Coefficients
* Site Selection Prediction for Coffee Shops Based on Multi-Source Space Data Using Machine Learning Techniques
* Sketch-Segformer: Transformer-Based Segmentation for Figurative and Creative Sketches
* Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings
* SketchXAI: A First Look at Explainability for Human Sketches
* SkiLL: Skipping Color and Label Landscape: Self Supervised Design Representations for Products in E-commerce
* Skinned Motion Retargeting with Residual Perception of Motion Semantics and Geometry
* SkyEye: Self-Supervised Bird's-Eye-View Semantic Mapping Using Monocular Frontal View Images
* SLACK: Stable Learning of Augmentations with Cold-Start and KL Regularization
* Sliced Optimal Partial Transport
* SliceMatch: Geometry-Guided Aggregation for Cross-View Pose Estimation
* Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention
* Slimmable Dataset Condensation
* SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments
* SlowLiDAR: Increasing the Latency of LiDAR-Based Detection Using Adversarial Examples
* SMAE: Few-shot Learning for HDR Deghosting with Saturation-Aware Masked Autoencoders
* Smallcap: Lightweight Image Captioning Prompted with Retrieval Augmentation
* SmartAssign: Learning A Smart Knowledge Assignment Strategy for Deraining and Desnowing
* SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model
* SMOC-Net: Leveraging Camera Pose for Self-Supervised Monocular Object Pose Estimation
* SMPConv: Self-Moving Point Representations for Continuous Convolution
* Snow Cover on the Tibetan Plateau and Topographic Controls
* SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries
* Soft Augmentation for Image Classification
* Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in Temporal Action Localization Tasks
* Solar Irradiance Anticipative Transformer
* SOLARNet: A single stage regression based framework for efficient and robust object recognition in aerial images
* Solid Precipitation and Visibility Measurements at the Centre for Atmospheric Research Experiments in Southern Ontario and Bratt's Lake in Southern Saskatchewan
* Solving 3D Inverse Problems Using Pre-Trained 2D Diffusion Models
* Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective
* Solving Relaxations of MAP-MRF Problems: Combinatorial in-Face Frank-Wolfe Directions
* Soma Segmentation Benchmark in Full Adult Fly Brain, A
* SOOD: Towards Semi-Supervised Oriented Object Detection
* Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment
* Source-Free Adaptive Gaze Estimation by Uncertainty Reduction
* Source-Free Progressive Graph Learning for Open-Set Domain Adaptation
* Source-Free Video Domain Adaptation with Spatial-Temporal-Historical Consistency Learning
* Space-Based THz Radar Fly-Around Imaging Simulation for Space Targets Based on Improved Path Tracing
* Space-Time Super-Resolution for Light Field Videos
* SPARF: Neural Radiance Fields from Sparse and Noisy Poses
* Sparse fooling images: Fooling machine perception through unrecognizable images
* Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images
* Sparse Multimodal Vision Transformer for Weakly Supervised Semantic Segmentation
* Sparse Quadratic Approximation for Graph Learning
* Sparse Signal Models for Data Augmentation in Deep Learning ATR
* Sparse-E2VID: A Sparse Convolutional Model for Event-Based Video Reconstruction Trained with Real Event Noise
* Sparse-to-Dense Matching Network for Large-Scale LiDAR Point Cloud Registration
* SparseFusion: Distilling View-Conditioned Diffusion for 3D Reconstruction
* Sparsely Annotated Semantic Segmentation with Adaptive Gaussian Mixtures
* SparsePose: Sparse-View Camera Pose Regression and Refinement
* SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
* Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers
* SPARTAN: Self-supervised Spatiotemporal Transformers Approach to Group Activity Recognition
* SpaText: Spatio-Textual Representation for Controllable Image Generation
* Spatial Analysis of Intra-Annual Reed Ecosystem Dynamics at Lake Neusiedl Using RGB Drone Imagery and Deep Learning
* Spatial and Temporal Patterns of Ecosystem Services and Trade-Offs/Synergies in Wujiang River Basin, China
* Spatial and Temporal Variation in Vegetation Cover and Its Response to Topography in the Selinco Region of the Qinghai-Tibet Plateau
* Spatial Data-Driven Approach for Mineral Prospectivity Mapping, A
* Spatial Distribution Characteristics and Influencing Factors on the Retail Industry in the Central Urban Area of Lanzhou City at the Scale of Daily Living Circles
* Spatial Distributions of Cloud Occurrences in Terms of Volume Fraction as Inferred from CloudSat and CALIPSO
* Spatial Mechanism and Predication of Rural Tourism Development in China: A Random Forest Regression Analysis, The
* Spatial Morphological Characteristics and Evolution of Traditional Villages in the Mountainous Area of Southwest Zhejiang
* Spatial Statistical Prediction of Solar-Induced Chlorophyll Fluorescence (SIF) from Multivariate OCO-2 Data
* Spatial Variability of Raindrop Size Distribution at Beijing City Scale and Its Implications for Polarimetric Radar QPE
* Spatial-Angular Multi-Scale Mechanism for Light Field Spatial Super-Resolution
* Spatial-Frequency Mutual Learning for Face Super-Resolution
* Spatial-Temporal Analysis of Vehicle Routing Problem from Online Car-Hailing Trajectories
* Spatial-temporal Concept based Explanation of 3D ConvNets
* Spatial-Temporal Graph-Based AU Relationship Learning for Facial Action Unit Detection
* Spatial-Temporal Pattern Analysis of Grassland Yield in Mongolian Plateau Based on Artificial Neural Network
* Spatial-Temporal Semantic Perception Network for Remote Sensing Image Semantic Change Detection
* Spatial-then-Temporal Self-Supervised Learning for Video Correspondence
* Spatially Adaptive Self-Supervised Learning for Real-World Image Denoising
* Spatio-Focal Bidirectional Disparity Estimation from a Dual-Pixel Image
* Spatio-Temporal Characteristics and Differences in Snow Density between the Tibet Plateau and the Arctic
* Spatio-Temporal Pixel-Level Contrastive Learning-based Source-Free Domain Adaptation for Video Semantic Segmentation
* Spatiotemporal Self-Supervised Learning for Point Clouds in the Wild
* Spatiotemporal Variation and Factors Influencing Water Yield Services in the Hengduan Mountains, China
* Spatiotemporal Weighted for Improving the Satellite-Based High-Resolution Ground PM2.5 Estimation Using the Light Gradient Boosting Machine
* Special section: Best papers of the 14th Mexican conference on pattern recognition (MCPR) 2022
* Specialist Diffusion: Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models to Learn Any Unseen Style
* Spectral Bayesian Uncertainty for Image Super-Resolution
* Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising
* Spectral Transfer Guided Active Domain Adaptation For Thermal Imagery
* SPECTRE: Visual Speech-Informed Perceptual 3D Facial Expression Reconstruction from Videos
* Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations
* Sphere-Guided Training of Neural Implicit Surfaces
* SphereGlue: Learning Keypoint Matching on High Resolution Spherical Images
* Spherical Image Inpainting with Frame Transformation and Data-Driven Prior Deep Networks
* Spherical Transformer for LiDAR-Based 3D Recognition
* Spider GAN: Leveraging Friendly Neighbors to Accelerate GAN Training
* SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields
* SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries
* SportsPose - A Dynamic 3D sports pose dataset
* Spreading of Localized Information across an Entire 3D Electrical Resistivity Volume via Constrained EMI Inversion Based on a Realistic Prior Distribution
* Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo
* SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection
* sRGB Real Noise Synthesizing with Neighboring Correlation-Aware Noise Model
* SRTM DEM Correction Using Ensemble Machine Learning Algorithm
* SS-TTA: Test-Time Adaption for Self-Supervised Denoising Methods
* SSGVS: Semantic Scene Graph-to-Video Synthesis
* ST-RoomNet: Learning Room Layout Estimation From Single Image Through Unsupervised Spatial Transformations
* Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking
* STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection
* STAR: Sparse Thresholded Activation under partial-Regularization for Activation Sparsity Exploration
* StarCraftImage: A Dataset For Prototyping Spatial Reasoning Methods For Multi-Agent Environments
* Stare at What You See: Masked Image Modeling without Reconstruction
* Starting from Non-Parametric Networks for 3D Point Cloud Analysis
* Starting Point Selection and Multiple-Standard Matching for Video Object Segmentation With Language Annotation
* State Graph Reasoning for Multimodal Conversational Recommendation
* STDLens: Model Hijacking-Resilient Federated Learning for Object Detection
* Stealthy Backdoor Attack Against Speaker Recognition Using Phase-Injection Hidden Trigger
* SteerNeRF: Accelerating NeRF Rendering via Smooth Viewpoint Trajectory
* StepFormer: Self-Supervised Step Discovery and Localization in Instructional Videos
* Stereo Cross Global Learnable Attention Module for Stereo Image Super-Resolution
* StillFast: An End-to-End Approach for Short-Term Object Interaction Anticipation
* Stimulus Verification is a Universal and Effective Sampler in Multi-modal Human Trajectory Prediction
* Stitchable Neural Networks
* STMixer: A One-Stage Sparse Action Detector
* STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition
* Stream-Based Active Distillation for Scalable Model Deployment
* Streaming Video Model
* Streamlined Global and Local Features Combinator (SGLC) for High Resolution Image Dehazing
* Strong and Robust Skeleton-Based Gait Recognition Method with Gait Periodicity Priors, A
* Strong Baseline for Generalized Few-Shot Semantic Segmentation, A
* Strong Detector with Simple Tracker
* Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
* Structure Aggregation for Cross-Spectral Stereo Image Guided Denoising
* Structured 3D Features for Reconstructing Controllable Avatars
* Structured Epipolar Matcher for Local Feature Matching
* Structured Kernel Estimation for Photon-Limited Deconvolution
* Structured Sparsity Learning for Efficient Video Super-Resolution
* StructVPR: Distill Structural Knowledge with Weighting Samples for Visual Place Recognition
* Study of the Bistatic Cross-Correlation Function of Two Signals Separated in Frequency Reflected by the Water Surface, The
* Study on the Spatial and Temporal Distribution of Urban Vegetation Phenology by Local Climate Zone and Urban-Rural Gradient Approach
* Style Projected Clustering for Domain Generalized Semantic Segmentation
* StyleAdv: Meta Style Adversarial Training for Cross-Domain Few-Shot Learning
* StyleGAN Salon: Multi-View Latent Optimization for Pose-Invariant Hairstyle Transfer
* StyleGene: Crossover and Mutation of Region-level Facial Genes for Kinship Face Synthesis
* StyleIPSB: Identity-Preserving Semantic Basis of StyleGAN for High Fidelity Face Swapping
* StyleRes: Transforming the Residuals for Real Image Editing with StyleGAN
* StyleRF: Zero-Shot 3D Style Transfer of Neural Radiance Fields
* StyLess: Boosting the Transferability of Adversarial Examples
* StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-Based Generator
* SUDS: Scalable Urban Dynamic Scenes
* Suitability of Machine-Learning Algorithms for the Automatic Acoustic Seafloor Classification of Hard Substrate Habitats in the German Bight, The
* SunStage: Portrait Reconstruction and Relighting Using the Sun as a Light Stage
* Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning
* Super-Resolution Neural Operator
* Super-Resolution Training Paradigm Based on Low-Resolution Data Only to Surpass the Technical Limits of STEM and STM Microscopy, A
* Superclass Learning with Representation Enhancement
* SuperDisco: Super-Class Discovery Improves Visual Recognition for the Long-Tail
* Supervised Masked Knowledge Distillation for Few-Shot Transformers
* SUPRA: Superpixel Guided Loss for Improved Multi-modal Segmentation in Endoscopy
* SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes
* Surrogate Modeling for Bayesian Optimization Beyond a Single Gaussian Process
* Surveillance Face Presentation Attack Detection Challenge
* Survey of Methods for Automated Quality Control Based on Images, A
* Survey on Underwater Computer Vision, A
* SVFormer: Semi-supervised Video Transformer for Action Recognition
* SVGformer: Representation Learning for Continuous Vector Graphics using Transformers
* SViTT: Temporal Learning of Sparse Video-Text Transformers
* Swept-Angle Synthetic Wavelength Interferometry
* SwinFSR: Stereo Image Super-Resolution using SwinIR and Frequency Domain Knowledge
* Switchable Representation Learning Framework with Self-Compatibility
* Symmetric Shape-Preserving Autoencoder for Unsupervised Real Scene Point Cloud Completion
* Synchronization of Switched Neural Networks via Attacked Mode-Dependent Event-Triggered Control and Its Application in Image Encryption
* SynthASpoof: Developing Face Presentation Attack Detection Based on Privacy-friendly Synthetic Data
* Synthesis of Synthetic Hyperspectral Images with Controllable Spectral Variability Using a Generative Adversarial Network
* Synthesizing Photorealistic Virtual Humans Through Cross-Modal Disentanglement
* Synthetic Data for Defect Segmentation on Complex Metal Surfaces
* Synthetic Sample Selection for Generalized Zero-Shot Learning
* SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision
* System for Dense Monocular Mapping with a Fisheye Camera, A
* System-Status-Aware Adaptive Network for Online Streaming Video Understanding
* Systematic Architectural Design of Scale Transformed Attention Condenser DNNs via Multi-Scale Class Representational Response Similarity Analysis
* Systematic Review on Advancements in Remote Sensing for Assessing and Monitoring Land Use and Land Cover Changes Impacts on Surface Water Resources in Semi-Arid Tropical Environments, A
* t-RAIN: Robust generalization under weather-aliasing label shift attacks
* T-SEA: Transfer-Based Self-Ensemble Attack on Object Detection
* T2V2T: Text-to-Video-to-Text Fusion for Text-to-Video Retrieval
* Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
* Tangentially Elongated Gaussian Belief Propagation for Event-Based Incremental Optical Flow Estimation
* TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision
* Target Localization Method Based on Image Degradation Suppression and Multi-Similarity Fusion in Low-Illumination Environments
* Target Recognition in SAR Images Using Complex-Valued Network Guided with Sub-Aperture Decomposition
* Target-referenced Reactive Grasping for Dynamic Objects
* TarViS: A Unified Approach for Target-Based Video Segmentation
* Task Difficulty Aware Parameter Allocation and Regularization for Lifelong Learning
* Task Residual for Tuning Vision-Language Models
* Task-Specific Fine-Tuning via Variational Information Bottleneck for Weakly-Supervised Pathology Whole Slide Image Classification
* TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving
* Teacher-generated spatial-attention labels boost robustness and accuracy of contrastive models
* Teaching Matters: Investigating the Role of Supervision in Vision Transformers
* Teaching Structured Vision and Language Concepts to Vision and Language Models
* Teleidoscopic Imaging System for Microscale 3D Shape Reconstruction
* Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation
* Tempo-Spatial Landslide Susceptibility Assessment from the Perspective of Human Engineering Activity
* Temporal and Spatial Variations in Carbon Flux and Their Influencing Mechanisms on the Middle Tien Shan Region Grassland Ecosystem, China
* Temporal Attention Unit: Towards Efficient Spatiotemporal Predictive Learning
* Temporal Consistent 3D LiDAR Representation Learning for Semantic Perception in Autonomous Driving
* Temporal Consistent Automatic Video Colorization via Semantic Correspondence
* Temporal Encoder-Decoder Approach to Extracting Blood Volume Pulse Signal Morphology from Face Videos, A
* Temporal information oriented motion accumulation and selection network for RGB-based action recognition
* Temporal Interpolation is all You Need for Dynamic Neural Radiance Fields
* Temporal Pixel-Level Semantic Understanding Through the VSPW Dataset
* Temporally Averaged Regression for Semi-Supervised Low-Light Image Enhancement
* Temporally Consistent Online Depth Estimation Using Point-Based Fusion
* Temporally consistent reconstruction of 3D clothed human surface with warp field
* TemPose: a new skeleton-based transformer model designed for fine-grained motion recognition in badminton
* TempSAL: Uncovering Temporal Information for Deep Saliency Prediction
* TempT: Temporal consistency for Test-time adaptation
* TensoIR: Tensorial Inverse Rendering
* Tensor4D: Efficient Neural 4D Decomposition for High-Fidelity Dynamic Reconstruction and Rendering
* TeSLA: Test-Time Self-Learning With Automatic Adversarial Augmentation
* Test of Time: Instilling Video-Language Models with a Sense of Time
* Test Time Adaptation with Regularized Loss for Weakly Supervised Salient Object Detection
* TEVAD: Improved video anomaly detection with captions
* TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose Estimation
* Text with Knowledge Graph Augmented Transformer for Video Captioning
* Text-Guided Generation and Refinement Model for Image Captioning, A
* Text-Guided Unsupervised Latent Transformation for Multi-Attribute Image Manipulation
* Text-Visual Prompting for Efficient 2D Temporal Video Grounding
* Text2Concept: Concept Activation Vectors Directly from Text
* Text2Scene: Text-driven Indoor Scene Stylization with Part-Aware Details
* TextFace: Text-to-Style Mapping Based Face Generation and Manipulation
* Texts as Images in Prompt Tuning for Multi-Label Image Recognition
* Texture-Guided Saliency Distilling for Unsupervised Salient Object Detection
* TFRGAN: Leveraging Text Information for Blind Face Restoration with Extreme Degradation
* Theia: Bleed-Through Estimation with Convolutional Neural Networks
* Theoretical Foundation of the Stretch Energy Minimization for Area-Preserving Simplicial Mappings
* Therbligs in Action: Video Understanding through Motion Primitives
* Thermal Image Super-Resolution Challenge Results - PBVS 2023
* Thermal Infrared Single Image Dehazing and Blind Image Quality Assessment
* Thermal Spread Functions (TSF): Physics-Guided Material Classification
* Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving
* Three Guidelines You Should Know for Universally Slimmable Self-Supervised Learning
* Three Recipes for Better 3D Pseudo-GTs of 3D Human Mesh Estimation in the Wild
* Three-Stage Framework with Reliable Sample Pool for Long-Tailed Classification, A
* TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
* TimelyFL: Heterogeneity-aware Asynchronous Federated Learning with Adaptive Partial Training
* TINC: Tree-Structured Implicit Neural Compression
* TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models
* TIPI: Test Time Adaptation with Transformation Invariance
* TMO-Det: Deep tone-mapping optimized with and for object detection
* TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering
* Token Boosting for Robust Self-Supervised Visual Transformer Pre-training
* Token Contrast for Weakly-Supervised Semantic Segmentation
* Token Merging for Fast Stable Diffusion
* Token Turing Machines
* TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers
* Top-Down Visual Attention from Analysis by Synthesis
* TopDiG: Class-Agnostic Topological Directional Graph Extraction from Remote Sensing Images
* TopFusion: Using Topological Feature Space for Fusion and Imputation in Multi-Modal Data
* TOPLight: Lightweight Neural Networks with Task-Oriented Pretraining for Visible-Infrared Recognition
* TopNet: Transformer-Based Object Placement Network for Image Compositing
* Topological Identification of Vortical Flow Structures in the Left Ventricle of the Heart
* Topology Preserving Compositionality for Robust Medical Image Segmentation
* Topology-Aware Focal Loss for 3D Image Segmentation
* Topology-Guided Multi-Class Cell Context Generation for Digital Pathology
* TorchSparse++: Efficient Point Cloud Engine
* ToThePoint: Efficient Contrastive Learning of 3D Point Clouds via Recycling
* Tourism Support System to Utilize Virtual Reality Space Reflecting Dynamic Information in Real Time
* Toward Accurate Post-Training Quantization for Image Super Resolution
* Toward RAW Object Detection: A New Benchmark and A New Model
* Toward Real-World Light Field Super-Resolution
* Toward Stable, Interpretable, and Lightweight Hyperspectral Super-Resolution
* Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation
* Towards a Smaller Student: Capacity Dynamic Distillation for Efficient Image Retrieval
* Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization
* Towards Active Learning for Action Spotting in Association Football Videos
* Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information
* Towards Artistic Image Aesthetics Assessment: A Large-scale Dataset and a New Method
* Towards Automated Polyp Segmentation Using Weakly- and Semi-Supervised Learning and Deformable Transformers
* Towards Benchmarking and Assessing Visual Naturalness of Physical World Adversarial Attacks
* Towards Better Decision Forests: Forest Alternating Optimization
* Towards Better Gradient Consistency for Neural Signed Distance Functions via Level Set Alignment
* Towards Better Stability and Adaptability: Improve Online Self-Training for Model Adaptation in Semantic Segmentation
* Towards Bridging the Performance Gaps of Joint Energy-Based Models
* Towards Building Self-Aware Object Detectors via Reliable Uncertainty Quantification and Calibration
* Towards Characterizing the Semantic Robustness of Face Recognition
* Towards Compositional Adversarial Robustness: Generalizing Adversarial Training to Composite Semantic Perturbations
* Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View
* Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition
* Towards Effective Visual Representations for Partial-Label Learning
* Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors
* Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers
* Towards Evaluating Explanations of Vision Transformers for Medical Imaging
* Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval
* Towards Flexible Multi-modal Document Models
* Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training
* Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting
* Towards Modality-Agnostic Person Re-identification with Descriptive Query
* Towards Open-World Segmentation of Parts
* Towards Practical Plug-and-Play Diffusion Models
* Towards Professional Level Crowd Annotation of Expert Domain Data
* Towards Real-Time 4K Image Super-Resolution
* Towards Realistic Long-Tailed Semi-Supervised Learning: Consistency is All You Need
* Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution
* Towards Scalable Neural Representation for Diverse Videos
* Towards Sim-to-Real Industrial Parts Classification with Synthetic Dataset
* Towards Stable Human Pose Estimation via Cross-View Fusion and Foot Stabilization
* Towards Transferable Targeted Adversarial Examples
* Towards Trustable Skin Cancer Diagnosis via Rewriting Model's Decision
* Towards Unbiased Volume Rendering of Neural Implicit Surfaces with Geometry Priors
* Towards Unified Scene Text Spotting Based on Sequence Generation
* Towards Universal Fake Image Detectors that Generalize Across Generative Models
* Towards Unsupervised Object Detection from LiDAR Point Clouds
* Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion
* TRACE: 5D Temporal Regression of Avatars with Dynamic Cameras in 3D Environments
* Tracked-Vehicle Retrieval by Natural Language Descriptions with Multi-Contextual Adaptive Knowledge
* Tracking Multiple Deformable Objects in Egocentric Videos
* Tracking Through Containers and Occluders in the Wild
* Trade-off between Robustness and Accuracy of Vision Transformers
* Train-Once-for-All Personalization
* Train/Test-Time Adaptation with Retrieval
* Trainable Projected Gradient Method for Robust Fine-Tuning
* Training Debiased Subnetworks with Contrastive Weight Pruning
* Training Strategies for Vision Transformers for Object Detection
* Trajectory-Aware Body Interaction Transformer for Multi-Person Pose Forecasting
* Transductive Few-Shot Learning with Prototype-Based Label Propagation by Iterative Graph Refinement
* TransER: Hybrid Model and Ensemble-based Sequential Learning for Non-homogenous Dehazing
* Transfer Knowledge from Head to Tail: Uncertainty Calibration under Long-tailed Distribution
* Transfer4D: A Framework for Frugal Motion Capture and Deformation Transfer
* Transferable Adversarial Attacks on Vision Transformers with Token Gradient Regularization
* TransFlow: Transformer as Flow Learner
* Transformer Scale Gate for Semantic Segmentation
* Transformer-Based Context Condensation for Boosting Feature Pyramids in Object Detection
* Transformer-based feature interactor for person re-identification with margin self-punishment loss
* Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition
* Transformer-based global-local feature learning model for occluded person re-identification
* Transformer-Based Learned Optimization
* Transformer-based Unified Recognition of Two Hands Manipulating Objects
* Transforming Radiance Field with Lipschitz Network for Photorealistic 3D Scene Stylization
* Transforming spatio-temporal self-attention using action embedding for skeleton-based action recognition
* TransFusion: Multi-modal Fusion Network for Semantic Segmentation
* TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning with Structure-Trajectory Prompted Reconstruction for Person Re-Identification
* Trap Attention: Monocular Depth Estimation with Manual Traps
* Treasure Beneath Multiple Annotations: An Uncertainty-Aware Edge Detector, The
* Tree Instance Segmentation with Temporal Contour Graph
* Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction
* TriDet: Temporal Action Detection with Relative Boundary Modeling
* Triplet Temporal-based Video Recognition with Multiview for Temporal Action Localization
* TriVol: Point Cloud Rendering via Triple Volumes
* TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
* TrojViT: Trojan Insertion in Vision Transformers
* TruFor: Leveraging All-Round Clues for Trustworthy Image Forgery Detection and Localization
* TryOnDiffusion: A Tale of Two UNets
* TSRFormer: Transformer Based Two-stage Refinement for Single Image Shadow Removal
* TTA-COPE: Test-Time Adaptation for Category-Level Object Pose Estimation
* TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation
* Tunable Convolutions with Parametric Multi-Loss Optimization
* Turning a CLIP Model into a Scene Text Detector
* Turning Strengths into Weaknesses: A Certified Robustness Inspired Attack Framework against Graph Neural Networks
* Twin Contrastive Learning with Noisy Labels
* TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization
* Two-shot Video Object Segmentation
* two-stage algorithm for vehicle routing problem with charging relief in post-disaster, A
* Two-stage Co-segmentation Network Based on Discriminative Representation for Recovering Human Mesh from Videos
* Two-Stream Networks for Weakly-Supervised Temporal Action Localization with Semantic-Aware Mechanisms
* Two-View Geometry Scoring Without Correspondences
* Two-Way Multi-Label Loss
* U2RLE: Uncertainty-Guided 2-Stage Room Layout Estimation
* UDE: A Unified Driving Engine for Human Motion Generation
* ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding
* Ultra-High Resolution Segmentation with Ultra-Rich Context: A Novel Benchmark
* Ultra-Sonic Sensor based Object Detection for Autonomous Vehicles
* Ultrahigh Resolution Image/Video Matting with Spatio-Temporal Sparsity
* UMat: Uncertainty-Aware Single Image High Resolution Material Capture
* Unbalanced Optimal Transport: A Unified Framework for Object Detection
* Unbiased 4D: Monocular 4D Reconstruction with a Neural Deformation Model
* Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
* Unbiased Scene Graph Generation in Videos
* Uncertainty in Real-Time Semantic Segmentation on Embedded Systems
* Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection
* Uncertainty-Aware Source-Free Domain Adaptive Semantic Segmentation
* Uncertainty-Aware Unsupervised Image Deblurring with Deep Residual Prior
* Uncertainty-Aware Vision-Based Metric Cross-View Geolocalization
* Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models
* Uncovering the Inner Workings of STEGO for Safe Unsupervised Semantic Segmentation
* Uncovering the Missing Pattern: Unified Framework Towards Trajectory Imputation and Prediction
* UnCRtainTS: Uncertainty Quantification for Cloud Removal in Optical Satellite Time Series
* Uncurated Image-Text Datasets: Shedding Light on Demographic Bias
* Understanding and Constructing Latent Modality Structures in Multi-Modal Representation Learning
* Understanding and Improving Features Learned in Deep Functional Maps
* Understanding and Improving Visual Prompting: A Label-Mapping Perspective
* Understanding Deep Generative Models with Generalized Empirical Likelihoods
* Understanding Imbalanced Semantic Segmentation Through Neural Collapse
* Understanding Masked Autoencoders via Hierarchical Latent Variable Models
* Understanding Masked Image Modeling via Learning Occlusion Invariant Feature
* Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving
* Underwater Moving Object Detection using an End-to-End Encoder-Decoder Architecture and GraphSage with Aggregator and Refactoring
* Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
* Uni3D: A Unified Baseline for Multi-Dataset 3D Object Detection
* Unicode Analogies: An Anti-Objectivist Visual Reasoning Challenge
* UniDAformer: Unified Domain Adaptive Panoptic Segmentation Transformer via Hierarchical Mask Calibration
* UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy
* UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View
* Unified Approach to Facial Affect Analysis: the MAE-Face Visual Representation, A
* Unified HDR Imaging Method with Pixel and Patch Level, A
* Unified Keypoint-Based Action Recognition Framework via Structured Keypoint Pooling
* Unified Knowledge Distillation Framework for Deep Directed Graphical Models, A
* Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation
* unified model for continuous conditional video prediction, A
* Unified Multi-modal Structure for Retrieving Tracked Vehicles through Natural Language Descriptions, A
* Unified Pose Sequence Modeling
* Unified Pyramid Recurrent Network for Video Frame Interpolation, A
* Unified Spatial-Angular Structured Light for Single-View Acquisition of Shape and Reflectance, A
* Unified Transformer-based Tracker for Anti-UAV Tracking, A
* Unifying Layout Generation with a Decoupled Diffusion Model
* Unifying Short and Long-Term Tracking with Graph Hierarchies
* Unifying Vision, Text, and Layout for Universal Document Processing
* UniHCP: A Unified Model for Human-Centric Perceptions
* UniSim: A Neural Closed-Loop Sensor Simulator
* Unite and Conquer: Plug and Play Multi-Modal Synthesis Using Diffusion Models
* Universal Face Encoder: Learning Disentangled Representations Across Different Attributes, The
* Universal Guidance for Diffusion Models
* Universal Instance Perception as Object Discovery and Retrieval
* Universal Watermark Vaccine: Universal Adversarial Perturbations for Watermark Protection
* Unknown Sniffer for Object Detection: Don't Turn a Blind Eye to Unknown Objects
* Unlearnable Clusters: Towards Label-Agnostic Unlearnable Examples
* Unlimited-Size Diffusion Restoration
* Unmasking Your Expression: Expression-Conditioned GAN for Masked Face Inpainting
* Unpaired Image-to-Image Translation with Shortest Path Regularization
* Unsupervised 3D Point Cloud Representation Learning by Triangle Constrained Contrast for Autonomous Driving
* Unsupervised 3D Shape Reconstruction by Part Retrieval and Assembly
* Unsupervised Automatic Defect Inspection based on Image Matching and Local One-class Classification
* Unsupervised Bidirectional Style Transfer Network using Local Feature Transform Module
* Unsupervised Continual Semantic Adaptation Through Neural Rendering
* Unsupervised Contour Tracking of Live Cells by Mechanical and Cycle Consistency Losses
* Unsupervised Cross-Media Graph Convolutional Network for 2D Image-Based 3D Model Retrieval
* Unsupervised Cumulative Domain Adaptation for Foggy Scene Optical Flow
* Unsupervised Deep Asymmetric Stereo Matching with Spatially-Adaptive Self-Similarity
* Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration
* Unsupervised Domain Adaption with Pixel-Level Discriminator for Image-Aware Layout Generation
* Unsupervised Inference of Signed Distance Functions from Single Sparse Point Clouds without Learning Priors
* Unsupervised Intrinsic Image Decomposition with LiDAR Intensity
* Unsupervised Low-Light Video Enhancement With Spatial-Temporal Co-Attention Transformer
* Unsupervised Object Localization: Observing the Background to Discover Objects
* Unsupervised person re-identification by dynamic hybrid contrastive learning
* Unsupervised Pixel-Level Detection of Rail Surface Defects Using Multistep Domain Adaptation
* Unsupervised Point Cloud Representation Learning With Deep Neural Networks: A Survey
* Unsupervised Sampling Promoting for Stochastic Human Trajectory Prediction
* Unsupervised Space-Time Network for Temporally-Consistent Segmentation of Multiple Motions
* Unsupervised Style-based Explicit 3D Face Reconstruction from Single Image
* Unsupervised Visible-Infrared Person Re-Identification via Progressive Graph Matching and Alternate Learning
* Unsupervised Volumetric Animation
* Unusual Enhancement of the Optical Depth on the Continental Shelf Depth Latitudinal Variation in the Stratospheric Polar Vortex
* Upcycling Models Under Domain and Category Shift
* Urban Resident Travel Survey Method Based on Cellular Signaling Data
* Urban Vulnerability Analysis Based on Micro-Geographic Unit with Multi-Source Data: Case Study in Urumqi, Xinjiang, China
* Use of High-Resolution Land Cover Maps to Support the Maintenance of the NWI Geospatial Dataset: A Case Study in a Coastal New Orleans Region
* Use Your Head: Improving Long-Tail Video Recognition
* Usefulness of an Urban Growth Model in Creating Scenarios for City Resilience Planning: An End-User Perspective
* Using a Cost-Distance Time-Geographic Approach to Identify Red Deer Habitat Use in Banff National Park, Alberta, Canada
* Using the InVEST-PLUS Model to Predict and Analyze the Pattern of Ecosystem Carbon storage in Liaoning Province, China
* USMLP: U-shaped Sparse-MLP network for mass segmentation in mammograms
* UTM: A Unified Multiple Object Tracking Model with Identity-Aware Feature Enhancement
* UV Volumes for Real-time Rendering of Editable Free-view Human Performance
* V2V4Real: A Real-World Large-Scale Dataset for Vehicle-to-Vehicle Cooperative Perception
* V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting
* VAESim: A probabilistic approach for self-supervised prototype discovery
* Validation of the Ocean Wave Spectrum from the Remote Sensing Data of the Chinese-French Oceanography Satellite
* Variance Stabilizing Transformations for Intensity Estimators of Shot Noise
* Variational Distribution Learning for Unsupervised Text-to-Image Generation
* Variational Relational Point Completion Network for Robust 3D Classification
* VARS: Video Assistant Referee System for Automated Soccer Decision Making from Multiple Views
* VCGAN: Video Colorization With Hybrid Generative Adversarial Network
* VDN-NeRF: Resolving Shape-Radiance Ambiguity via View-Dependence Normalization
* VDPVE: VQA Dataset for Perceptual Video Enhancement
* VecFontSDF: Learning to Reconstruct and Synthesize High-Quality Vector Fonts via Signed Distance Functions
* Vector Quantization with Self-Attention for Quality-Independent Representation Learning
* VectorFloorSeg: Two-Stream Graph Attention Network for Vectorized Roughcast Floorplan Segmentation
* VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models
* VGFlow: Visibility guided Flow Network for Human Reposing
* Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition
* Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
* Video Analytics for Detecting Motorcyclist Helmet Rule Violations
* Video Compression with Entropy-Constrained Neural Representations
* Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior
* Video Event Restoration Based on Keyframes for Video Anomaly Detection
* Video Probabilistic Diffusion Models in Projected Latent Space
* Video Quality Assessment Based on Swin Transformer with Spatio-Temporal Feature Fusion and Data Augmentation
* Video Test-Time Adaptation for Action Recognition
* Video Tiny-Object Detection Guided by the Spatial-Temporal Motion Information
* Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
* VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
* VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
* VideoMatt: A Simple Baseline for Accessible Real-Time Video Matting
* VideoTrack: Learning to Track Objects via Video Transformer
* ViewNet: A Novel Projection-Based Backbone with View Pooling for Few-shot Point Cloud Classification
* Viewpoint Alignment and Discriminative Parts Enhancement in 3D Space for Vehicle ReID
* Viewpoint Equivariance for Multi-View 3D Object Detection
* VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining
* ViLEM: Visual-Language Error Modeling for Image-Text Retrieval
* VindLU: A Recipe for Effective Video-and-Language Pretraining
* ViP3D: End-to-End Visual Trajectory Prediction via 3D Agent Queries
* ViPLO: Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection
* Virtual Metrology Filter-Based Algorithms for Estimating Constant Ocean Current Velocity
* Virtual Occlusions Through Implicit Depth
* Virtual Sparse Convolution for Multimodal 3D Object Detection
* VisFusion: Visibility-Aware Online 3D Scene Reconstruction from Videos
* Visibility Aware Human-Object Interaction Tracking from Single RGB Camera
* Visibility Constrained Wide-Band Illumination Spectrum Design for Seeing-in-the-Dark
* Vision + Language Applications: A Survey
* Vision DiffMask: Faithful Interpretation of Vision Transformers with Differentiable Patch Masking
* Vision Transformers are Good Mask Auto-Labelers
* Vision Transformers are Parameter-Efficient Audio-Visual Learners
* Vision Transformers with Mixed-Resolution Tokenization
* VisiTherS: Visible-thermal infrared stereo disparity estimation of human silhouette
* Visual Atoms: Pre-Training Vision Transformers with Sinusoidal Waves
* Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention
* Visual DNA: Representing and Comparing Images Using Distributions of Neuron Activations
* Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving
* Visual Gyroscope: Combination of Deep Learning Features and Direct Alignment for Panoramic Stabilization
* Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images
* Visual Localization using Imperfect 3D Models from the Internet
* Visual Programming: Compositional visual reasoning without training
* Visual Prompt Multi-Modal Tracking
* Visual Prompt Tuning for Generative Transfer Learning
* Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning
* Visual Reasoning: From State to Transformation
* Visual Recognition by Request
* Visual Recognition-Driven Image Restoration for Multiple Degradation with Intrinsic Semantics Recovery
* Visual Semantic Relatedness Dataset for Image Captioning
* Visual tracking using transformer with a combination of convolution and attention
* Visual-Language Prompt Tuning with Knowledge-Guided Context Optimization
* Visual-Tactile Sensing for In-Hand Object Reconstruction
* Visualizing Skiers' Trajectories in Monocular Videos
* Visually-Guided Audio Spatialization in Video with Geometry-Aware Multi-task Learning
* Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
* ViTs for SITS: Vision Transformers for Satellite Image Time Series
* VIVE3D: Viewpoint-Independent Video Editing using 3D-Aware GANs
* VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud
* VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision
* vMAP: Vectorised Object Mapping for Neural Field SLAM
* VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution
* VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction
* VoP: Text-Video Co-Operative Prompt Tuning for Cross-Modal Retrieval
* VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking
* VoxFormer: Sparse Voxel Transformer for Camera-Based 3D Semantic Scene Completion
* VQACL: A Novel Visual Question Answering Continual Learning Setting
* VTAE: Variational Transformer Autoencoder With Manifolds Learning
* Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring
* Water Surface Acoustic Wave Detection by a Millimeter Wave Radar
* Wavelet Diffusion Models are fast and scalable Image Generators
* Wavelet-FCWAN: Fast and Covert Watermarking Attack Network in Wavelet Domain
* Wavelet-Guided Promotion-Suppression Transformer for Surface-Defect Detection
* Weak-shot Object Detection through Mutual Knowledge Transfer
* Weakly Supervised Class-agnostic Motion Prediction for Autonomous Driving
* Weakly Supervised Monocular 3D Object Detection Using Multi-View Projection and Direction Consistency
* Weakly Supervised Posture Mining for Fine-Grained Classification
* Weakly Supervised Segmentation with Point Annotations for Histopathology Images via Contrast-Based Variational Model
* Weakly Supervised Semantic Segmentation via Adversarial Learning of Classifier and Reconstructor
* Weakly Supervised Temporal Sentence Grounding with Uncertainty-Guided Self-training
* Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network
* Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos
* Weakly Supervised Visual Question Answer Generation
* Weakly-Supervised Domain Adaptive Semantic Segmentation with Prototypical Contrastive Learning
* Weakly-supervised Single-view Image Relighting
* WeatherStream: Light Transport Automation of Single Image Deweathering
* WEDGE: A multi-weather autonomous driving dataset built from generative vision-language models
* WETM: A word embedding-based topic model with modified collapsed Gibbs sampling for short text
* Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others, A
* What Affects Learned Equivariance in Deep Image Recognition Models?
* What Can Human Sketches Do for Object Detection?
* What Happened 3 Seconds Ago? Inferring the Past with Thermal Imaging
* What is the Real Need for Scene Text Removal? Exploring the Background Integrity and Erasure Exhaustivity Properties
* What Limits the Performance of Local Self-attention?
* What Makes a Good Data Augmentation for Few-Shot Unsupervised Image Anomaly Detection?
* What You Can Reconstruct from a Shadow
* When Multi-Focus Image Fusion Networks Meet Traditional Edge-Preservation Technology
* Where are they looking in the 3D space?
* Where is My Spot? Few-shot Image Generation via Latent Subspace Optimization
* Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization
* Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes
* Why is the Winner the Best?
* Wide-Angle Rectification via Content-Aware Conformal Mapping
* Wild Face Anti-Spoofing Challenge 2023: Benchmark and Results
* Wildlife Image Generation from Scene Graphs
* WildLight: In-the-wild Inverse Rendering with a Flashlight
* WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
* Wind Direction Extraction from X-Band Marine Radar Images Based on the Attenuation Horizontal Component
* WINNER: Weakly-supervised hIerarchical decompositioN and aligNment for spatio-tEmporal video gRounding
* WIRE: Wavelet Implicit Neural Representations
* Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction, The
* Within-Camera Multilayer Perceptron DVS Denoising
* WPE: Weighted prototype estimation for few-shot learning
* WPPNets and WPPFlows: The Power of Wasserstein Patch Priors for Superresolution
* WSRD: A Novel Benchmark for High Resolution Image Shadow Removal
* X-Avatar: Expressive Human Avatars
* X-maps: Direct Depth Lookup for Event-based Structured Light Systems
* X-Pruner: eXplainable Pruning for Vision Transformers
* X3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection
* XDNet: A Few-Shot Meta-Learning Approach for Cross-Domain Visual Inspection
* YOLO-DCTI: Small Object Detection in Remote Sensing Base on Contextual Transformer Enhancement
* YOLOv3-based human detection and heuristically modified-LSTM for abnormal human activities detection in ATM machine
* YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors
* You Are Catching My Attention: Are Vision Transformers Bad Learners under Backdoor Attacks?
* You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos
* You Do Not Need Additional Priors or Regularizers in Retinex-Based Low-Light Image Enhancement
* You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model
* You Only Segment Once: Towards Real-Time Panoptic Segmentation
* ZBS: Zero-Shot Background Subtraction via Instance-Level Background Modeling and Foreground Selection
* ZEBRA: Explaining rare cases through outlying interpretable concepts
* ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation
* Zero-Shot Action Recognition with Transformer-based Video Semantic Embedding
* Zero-shot Classification at Different Levels of Granularity
* Zero-Shot Dual-Lens Super-Resolution
* Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style
* Zero-Shot Generative Model Adaptation via Image-Specific Prompt Learning
* Zero-Shot Model Diagnosis
* Zero-Shot Noise2Noise: Efficient Image Denoising without any Data
* Zero-shot Object Classification with Large-scale Knowledge Graph
* Zero-Shot Object Counting
* Zero-shot Pose Transfer for Unrigged Stylized 3D Characters
* Zero-Shot Predicate Prediction for Scene Graph Parsing
* Zero-shot Referring Image Segmentation with Global-Local Context Features
* Zero-shot temporal event localisation: Label-free, training-free, domain-free
* Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation
* Zero-shot Unsupervised Transfer Instance Segmentation
* ZippyPoint: Fast Interest Point Detection, Description, and Matching through Mixed Precision Discretization
* Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment
* À-la-carte Prompt Tuning (APT): Combining Distinct Data Via Composable Prompting
3705 for 2309

Last update: 15-Jan-25 15:03:25
Use price@usc.edu for comments.