_ | audio | _ |
3-D | audio | -Visual Corpus of Affective Communication, A |
3D | audio | in human-computer interfaces |
3D | audio | -Visual Speaker Tracking with A Novel Particle Filter |
3D | audio | -Visual Speaker Tracking with A Two-Layer Particle Filter |
Abnormal acoustic event localization based on selective frequency bin in high noise environment for | audio | surveillance |
Abnormal events detection using unsupervised One-Class SVM: Application to | audio | surveillance and evaluation |
ACAV100M: Automatic Curation of Large-Scale Datasets for | audio | -Visual Video Representation Learning |
Accelerating Index-Based | audio | Identification |
Accurate glottal model parametrization by integrating | audio | and high-speed endoscopic video data |
Acoustic Signals, Sounds, | audio | |
Acoustoseismic Method for Buried-Object Detection by Means of Surface-Acceleration Measurements and | audio | Facilities |
Active | audio | -Visual Separation of Dynamic Sound Sources |
Active learning of custom sound taxonomies in unstructured | audio | data |
Active Learning Paradigm for Online | audio | -Visual Emotion Recognition, An |
AD-NeRF: | audio | Driven Neural Radiance Fields for Talking Head Synthesis |
Adaptive | audio | Steganography Based on Advanced Audio Coding and Syndrome-Trellis Coding |
Adaptive context recognition based on | audio | signal |
Adaptive Selection of Embedding Locations for Spread Spectrum Watermarking of Compressed | audio | |
Adaptive Signal Models for Wide-Band Speech and | audio | Compression |
Adaptive Speaker Identification with | audio | -Visual Cues for Movie Content Analysis |
Adaptive Synthesis in Progressive Retrieval of | audio | -Visual Data |
AdVerb: Visually Guided | audio | Dereverberation |
Adversarial-Metric Learning for | audio | -Visual Cross-Modal Matching |
Adversarially Training for | audio | Classifiers |
AENet: Learning Deep | audio | Features for Video Analysis |
Affective | audio | Annotation of Public Speeches with Convolutional Clustering Neural Network |
Affective | audio | -Visual Words and Latent Topic Driving Model for Realizing Movie Affective Scene Classification |
AI-Based human | audio | processing for COVID-19: A comprehensive overview |
AIT 3D | audio | / Visual Person Tracker for CLEAR 2007, The |
AKVSR: | audio | Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model |
Algorithm of Resist Cropping Robust | audio | Watermark Based On Wavelet Transformation, An |
Algorithms for multiplex scheduling of object-based | audio | -visual presentations |
AlignNet: A Unifying Approach to | audio | -Visual Alignment |
Analysis of Lip Geometric Features for | audio | -Visual Speech Recognition |
Analysis of Optimal Search Interval for Estimation of Modified Quantization Step Size in Quantization-Based | audio | Watermark Detection |
Animating Face using Disentangled | audio | Representations |
Annotation-free | audio | -Visual Segmentation |
Appearance Matters, So Does | audio | : Revealing the Hidden Face via Cross-Modality Transfer |
Applying | audio | description for context understanding of surveillance videos by people with visual impairments |
Applying Segment-Level Attention on Bi-Modal Transformer Encoder for | audio | -Visual Emotion Recognition |
approach to immersive | audio | rendering with wave field synthesis for 3D multimedia content, An |
Are Multiple Cross-Correlation Identities better than just Two? Improving the Estimate of Time Differences-of-Arrivals from Blind | audio | Signals |
ARMA digital filter design method for | audio | and musical purposes |
Arousal Recognition Using | audio | -Visual Features and FMRI-Based Brain Response |
Assessment and classification of singing quality based on | audio | -visual features |
Assisted Listening Using a Headset: Enhancing | audio | perception in real, augmented, and virtual environments |
Associating | audio | -visual activity cues in a dominance estimation framework |
ASVFI: | audio | -Driven Speaker Video Frame Interpolation |
Asymmetric Contrastive Learning for | audio | Fingerprinting |
Asymmetric Matching Method for a Robust Binary | audio | Fingerprinting, An |
ATGNN: | audio | Tagging Graph Neural Network |
Attention Based Speaker-independent | audio | -visual Deep Learning Model for Speech Enhancement, An |
Attention Fusion for | audio | -Visual Person Verification Using Multi-Scale Features |
Attention guided deep | audio | -face fusion for efficient speaker naming |
Attention-Guided Neural Networks for Full-Reference and No-Reference | audio | -Visual Quality Assessment |
Atypical Lyrics Completion Considering Musical | audio | Signals |
| audio | and Video Coding Standard Workgroup of China |
| audio | and Video Coding Standard, AVS Coding Issues, Standards |
| audio | and Video-based Emotion Recognition using Multimodal Transformers |
| audio | Assisted Robust Visual Tracking With Adaptive Particle Filtering |
| audio | Based Real-Time Speech Animation of Embodied Conversational Agents |
| audio | classification based on MPEG-7 spectral basis representations |
| audio | Classification in Speech and Music: A Comparison Between a Statistical and a Neural Approach |
| audio | Coding Using Overlap and Kernel Adaptation |
| audio | Copyright Protection Schemes Based on SMM in Cepstrum Domain, An |
| audio | decoding with frequency and complexity scalability |
| audio | effects to enhance spatial information displays |
| audio | Event-Relational Graph Representation Learning for Acoustic Scene Classification |
| audio | Features for Music Emotion Recognition: A Survey |
| audio | Identification by Sampling Sub-fingerprints and Counting Matches |
| audio | Matters in Video Super-Resolution by Implicit Semantic Guidance |
| audio | Matters in Visual Attention |
| audio | Music Genre Classification Using Different Classifiers and Feature Selection Methods |
| audio | Partitioning and Transcription for Broadcast Data Indexation |
| audio | personalization using head related transfer function in 3DTV |
| audio | Postprocessing Detection Based on Amplitude Cooccurrence Vector Feature |
| audio | Properties of Perceived Boundaries in Music |
| audio | Recapture Detection With Convolutional Neural Networks |
| audio | Related Quality of Experience Evaluation in Urban Transportation Environments With Brain Inspired Graph Learning |
| audio | Retrieval With Natural Language Queries: A Benchmark Study |
| audio | Secret Management Scheme Using Shamir's Secret Sharing |
| audio | Segmentation and Speaker Localization in Meeting Videos |
| audio | signal identification via pattern capture and template matching |
| audio | Signal-based Depression Level Prediction Combining Temporal and Spectral Features |
| audio | Soft Declipping Based on Constrained Weighted Least Squares |
| audio | Source Separation Using Variational Autoencoders and Weak Class Supervision |
| audio | Source Separation, Source Localization, Direction of Arrival, DoA, Analysis |
| audio | Surveillance Eye, The |
| audio | Surveillance of Roads: A System for Detecting Anomalous Sounds |
| audio | surveillance using a bag of aural words classifier |
| audio | to Body Dynamics |
| audio | visual isolated Hindi digits recognition using HMM |
| audio | Visual Person Authentication by Multiple Nearest Neighbor Classifiers |
| audio | Visual Scene-Aware Dialog |
| audio | Visual Speaker Verification Based on Hybrid Fusion of Cross Modal Features |
| audio | Watermarking Algorithm Robust to TSM Based on Counter Propagation Neural Network |
| audio | Watermarking Based on Music Content Analysis: Robust against Time Scale Modification |
| audio | Watermarking Scheme Based on Singular-Spectrum Analysis, An |
| audio | watermarking techniques using sinusoidal patterns based on pseudorandom sequences |
| audio | Watermarks, Speech Watermarks |
| audio | - and Gaze-driven Facial Animation of Codec Avatars |
| audio | - and Video-Based Biometric Person Authentication |
| audio | -Adaptive Activity Recognition Across Video Domains |
| audio | -Assisted Movie Dialogue Detection |
| audio | -Based Automatic Generation of a Piano Reduction Score by Considering the Musical Structure |
| audio | -Based Emotion Recognition Enhancement Through Progressive Gans |
| audio | -Based Granularity-Adapted Emotion Classification |
| audio | -Based Machine Learning Model for Traffic Congestion Detection |
| audio | -Based Musical Version Identification: Elements and challenges |
| audio | -based Near-Duplicate Video Retrieval with Audio Similarity Learning |
| audio | -Based Sports Video Segmentation and Event Detection Algorithm, An |
| audio | -Driven Deformation Flow for Effective Lip Reading |
| audio | -Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis |
| audio | -Driven Emotional Video Portraits |
| audio | -Driven Laughter Behavior Controller |
| audio | -driven Neural Gesture Reenactment with Video Motion Graphs |
| audio | -Driven Robot Upper-Body Motion Synthesis |
| audio | -Driven Stylized Gesture Generation with Flow-Based Model |
| audio | -driven talking face generation with diverse yet realistic facial animations |
| audio | -Driven Talking Face Video Generation With Dynamic Convolution Kernels |
| audio | -Driven Talking Video Frame Restoration |
| audio | -Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment |
| audio | -Facial Laughter Detection in Naturalistic Dyadic Conversations |
| audio | -Guided Video-Based Face Recognition |
| audio | -Noise Power Spectral Density Estimation Using Long Short-Term Memory |
| audio | -Oculomotor Transformation |
| audio | -Video Analysis for Indexing and Classification |
| audio | -Video Analysis of Musical Expressive Intentions |
| audio | -Video Based Emotion Recognition Using Minimum Cost Flow Algorithm |
| audio | -Video detection of the active speaker in meetings |
| audio | -video front-end for multimedia applications, An |
| audio | -Video Integration for Background Modelling |
| audio | -Video Sensor Fusion with Probabilistic Graphical Models |
| audio | -video surveillance system for public transportation |
| audio | -Visual Active Speaker Tracking in Cluttered Indoors Environments |
| audio | -Visual Affect Recognition |
| audio | -Visual Affect Recognition through Multi-Stream Fused HMM for HCI |
| audio | -Visual Affective Expression Recognition Through Multistream Fused HMM |
| audio | -visual attention: Eye-tracking dataset and analysis toolbox |
| audio | -Visual Automatic Group Affect Analysis |
| audio | -visual based emotion recognition-a new approach |
| audio | -visual biometric recognition via joint sparse representations |
| audio | -Visual Biometrics |
| audio | -Visual Class-Incremental Learning |
| audio | -Visual Classification and Fusion of Spontaneous Affective Data in Likelihood Space |
| audio | -Visual Classification of Sports Types |
| audio | -Visual Classification Video Browser |
| audio | -Visual Co-Training for Vehicle Classification |
| audio | -visual content-based violent scene characterization |
| audio | -visual continuous speech recognition using MPEG-4 compliant visual features |
| audio | -Visual Contrastive and Consistency Learning for Semi-Supervised Action Recognition |
| audio | -visual data association for face expression analysis |
| audio | -Visual Data Fusion Using a Particle Filter in the Application of Face Recognition |
| audio | -Visual Deception Detection: DOLOS Dataset and Parameter-Efficient Crossmodal Learning |
| audio | -Visual Efficient Conformer for Robust Speech Recognition |
| audio | -Visual Emotion Analysis Using Semi-Supervised Temporal Clustering with Constraint Propagation |
| audio | -Visual Emotion Recognition in Video Clips |
| audio | -visual emotion recognition using Boltzmann Zippers |
| audio | -visual emotion recognition with boosted coupled HMM |
| audio | -Visual Emotion Recognition With Preference Learning Based on Intended and Multi-Modal Perceived Labels |
| audio | -Visual Emotion, Audiovisual Emotion Recognition |
| audio | -Visual Emotion-Aware Cloud Gaming Framework |
| audio | -visual event classification via spatial-temporal-audio words |
| audio | -Visual Event Detection using Duration Dependent Input Output Markov Models |
| audio | -Visual Event Localization by Learning Spatial and Semantic Co-Attention |
| audio | -Visual Event Localization in Unconstrained Videos |
| audio | -Visual Event Localization via Recursive Fusion by Joint Co-Attention |
| audio | -Visual Event Recognition in Surveillance Video Sequences |
| audio | -Visual Face Reenactment |
| audio | -Visual Feature Fusion for Vehicles Classification in a Surveillance System |
| audio | -Visual Floorplan Reconstruction |
| audio | -visual flow: A variational approach to multi-modal flow estimation |
| audio | -Visual Foreground Extraction for Event Characterization |
| audio | -Visual Gated-Sequenced Neural Networks for Affect Recognition |
| audio | -Visual Glance Network for Efficient Video Recognition |
| audio | -Visual Grouping Network for Sound Localization from Mixtures |
| audio | -visual Hybrid Approach for Filling Mass Estimation |
| audio | -Visual Identity Verification and Robustness to Imposture |
| audio | -Visual Instance Discrimination with Cross-Modal Agreement |
| audio | -Visual Keyword Spotting Based on Multidimensional Convolutional Neural Network |
| audio | -visual Keyword Spotting for Mandarin Based on Discriminative Local Spatial-Temporal Descriptors |
| audio | -Visual Kinship Verification: A New Dataset and a Unified Adaptive Adversarial Multimodal Learning Approach |
| audio | -Visual Mismatch-Aware Video Retrieval via Association and Adjustment |
| audio | -Visual Model Distillation Using Acoustic Images |
| audio | -Visual Particle Flow SMC-PHD Filtering for Multi-Speaker Tracking |
| audio | -Visual Person Authentication with Multiple Visualized-Speech Features and Multiple Face Profiles |
| audio | -Visual Person Verification |
| audio | -Visual Person-of-Interest DeepFake Detection |
| audio | -Visual Predictive Coding for Self-Supervised Visual Representation Learning |
| audio | -visual processing for scene change detection |
| audio | -Visual Quality Assessment for User Generated Content: Database and Method |
| audio | -Visual Recognition System in Compression Domain |
| audio | -visual saliency prediction with multisensory perception and integration |
| audio | -Visual Scene Analysis with Self-Supervised Multisensory Features |
| audio | -Visual Segmentation |
| audio | -visual selection process for the synthesis of photo-realistic talking-head animations |
| audio | -visual sensor fusion approach for feature based vehicle identification, An |
| audio | -visual Sensor Fusion Framework Using Person Attributes Robust to Missing Visual Modality for Person Recognition |
| audio | -visual speaker detection using dynamic Bayesian networks |
| audio | -Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion |
| audio | -Visual Speaker Identification Based on the Use of Dynamic Audio and Visual Features |
| audio | -visual speaker identification using coupled hidden Markov models, A |
| audio | -Visual Speaker Identification via Adaptive Fusion Using Reliability Estimates of Both Modalities |
| audio | -visual speaker identification with multi-view distance metric learning |
| audio | -Visual Speaker Localization Using Graphical Models |
| audio | -visual speaker tracking with importance particle filters |
| audio | -Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis |
| audio | -Visual Speech Fusion Using Coupled Hidden Markov Models |
| audio | -Visual Speech Recognition Based on AAM Parameter and Phoneme Analysis of Visual Feature |
| audio | -Visual Speech Recognition Scheme Based on Wavelets and Random Forests Classification |
| audio | -visual speech recognition techniques in augmented reality environments |
| audio | -Visual Speech Recognition Using A Two-Step Feature Fusion Strategy |
| audio | -Visual Speech Recognition Using MPEG-4 Compliant Visual Features |
| audio | -visual speech synchronization detection using a bimodal linear prediction model |
| audio | -Visual Speech Synthesis Based on Chinese Visual Triphone |
| audio | -Visual System for Object-Based Audio: From Recording to Listening, An |
| audio | -Visual Temporal Saliency Modeling Validated by fMRI Data |
| audio | -Visual Tracking of Concurrent Speakers |
| audio | -Visual Transformer Based Crowd Counting |
| audio | -Visual Unit Selection for the Synthesis of Photo-Realistic Talking-Heads |
| audio | 2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders |
| audio | ScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation |
| audio | visual Spatial-Audio Analysis by Means of Sound Localization and Imaging: A Multimedia Healthcare Framework in Abdominal Sound Mapping |
| audio | visual Transformer with Instance Attention for Audio-visual Event Localization |
Auditory Distance Rendering Based on ICPD Control for Stereophonic 3D | audio | System |
AutoAD II: The Sequel - Who, When, and What in Movie | audio | Description |
Automated | audio | -visual Activity Analysis |
Automated detection of errors and quality issues in | audio | -visual content |
Automated MPEG | audio | -video summarization and description |
Automatic annotation of tennis games: An integration of | audio | , vision, and learning |
Automatic | audio | Feature Extraction for Keyword Spotting |
Automatic | audio | -Visual Fusion for Aggression Detection Using Meta-information |
Automatic Detection and Removal of Impulsive Noise in | audio | Signals |
Automatic Music Stretching Resistance Classification Using | audio | Features and Genres |
Autonomous | audio | -Supported Learning of Visual Classifiers for Traffic Monitoring |
AV-GAZE: A Study on the Effectiveness of | audio | Guided Visual Attention Estimation for Non-profilic Faces |
AVE-CLIP: | audio | CLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization |
AVFace: Towards Detailed | audio | -Visual 4D Face Reconstruction |
AVGZSLNet: | audio | -Visual Generalized Zero-Shot Learning by Reconstructing Label Features from Multi-Modal Embeddings |
AVS: Scientific Research Community | audio | -Visual Systems |
AWEAR 2.0 system: Omni-directional | audio | -visual data acquisition and processing |
Backdoor Attacks against Deep Neural Networks by Personalized | audio | Steganography |
Ball Hit Detection in Table Tennis Games Based on | audio | Analysis |
Ballroom Dance Recognition from | audio | Recordings |
BAUM-1: A Spontaneous | audio | -Visual Face Database of Affective and Mental States |
Bayesian Approach to | audio | -Visual Speaker Identification, A |
Bayesian Blind Identification of Nonlinear Distortion with Memory for | audio | Applications |
Be Everywhere - Hear Everything (BEE): | audio | Scene Reconstruction by Sparse Audio-Visual Samples |
Beat Synchronous Dance Animation Based on Visual Analysis of Human Motion and | audio | Analysis of Music Tempo |
Beyond | audio | and video retrieval: Topic-oriented multimedia summarization |
Beyond | audio | and video retrieval: Towards multimedia summarization |
Beyond Mono to Binaural: Generating Binaural | audio | from Mono Audio with Depth and Cross Modal Attention |
Bi-modal First Impressions Recognition Using Temporally Ordered Deep | audio | and Stochastic Visual Features |
Bimodal fusion in | audio | -visual speech recognition |
Biometric Based Unique Key Generation for Authentic | audio | Watermarking |
biometric-based verification system for handwritten image-based signatures using | audio | to image matching, A |
Bird Sounds, Bird Song, Birds | audio | , Identification |
BirdSoundsDenoising: Deep Visual | audio | Denoising for Bird Sounds |
Blind | audio | watermark decoding using independent component analysis |
Blind | audio | -Visual Localization and Separation via Low-Rank and Sparsity |
BLTRCNN-Based 3-D Articulatory Movement Prediction: Learning Articulatory Synchronicity From Both Text and | audio | Inputs |
Boosted | audio | -visual HMM for speech reading |
Boosting and structure learning in dynamic Bayesian networks for | audio | -visual speaker detection |
Boosting | audio | chord estimation using multiple classifiers |
Boosting Positive Segments for Weakly-Supervised | audio | -Visual Video Parsing |
brand new application of visual- | audio | fingerprints: Estimating the position of the pirate in a theater-A case study, A |
C-GCN: Correlation Based Graph Convolutional Network for | audio | -Video Emotion Recognition |
Camera Pose Estimation and Localization with Active | audio | Sensing |
Can | audio | -visual integration strengthen robustness under multimodal attacks? |
Car crashes detection by | audio | analysis in crowded roads |
Cartesian Genetic Programming Parameterization in the Context of | audio | Synthesis |
Cascade classifiers trained on gammatonegrams for reliably detecting | audio | events |
Cascaded Siamese Self-supervised | audio | to Video GAN |
CASP-Net: Rethinking Video Saliency Prediction from an | audio | -Visual Consistency Perceptual Perspective |
CASSANDRA: | audio | -video sensor fusion for aggression detection |
Casual Conversations v2 Dataset: A diverse, large benchmark for measuring fairness and robustness in | audio | /vision/speech models, The |
CATNet: Cross-modal fusion for | audio | -visual speech recognition |
Challenges in | audio | Processing of Terrorist-Related Data |
Channel Capacity Analysis of the Generalized Spread Spectrum Watermarking in | audio | Signals |
Channel Capacity Analysis of the Multiple Orthogonal Sequence Spread Spectrum Watermarking in | audio | Signals |
Class Confusability Reduction in | audio | -Visual Speech Recognition Using Random Forests |
Class-Incremental Grouping Network for Continual | audio | -Visual Learning |
Classification of | audio | Signals in All-Night Sleep Studies |
Classification of | audio | Signals Using Fuzzy C-Means with Divergence-Based Kernel |
Classification of general | audio | data for content-based retrieval |
Classifying | audio | of movies by a multi-expert system |
cloud infrastructure for target detection and tracking using | audio | and video fusion, A |
Clustering and Visualizing | audio | -Visual Dataset on Mobile Devices in a Topic-Oriented Manner |
Coding, Compression, Acoustic Signals, Sounds, | audio | |
Coherent bag-of | audio | words model for efficient large-scale video copy detection |
Collaborative Interface for Multimodal Ink and | audio | Documents, A |
Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised | audio | -Visual Event Perception |
Combined | audio | Visual Recognition and Analysis |
Combined | audio | Visual Speaker Tracking |
Combined Rule-Based Machine Learning | audio | -Visual Emotion Recognition Approach, A |
Combining visual and acoustic features for | audio | classification tasks |
Commentary Paper on Person Tracking With | audio | -Visual Cues Using the Iterative Decoding Framework |
Comparative Error Analysis of | audio | -Visual Source Localization, A |
Comparative Study of Different Segmentation Approaches for | audio | Track Indexing, A |
comparative study on automatic | audio | -visual fusion for aggression detection using meta-information, A |
comparison of extended fingerprint hashing and locality sensitive hashing for binary | audio | fingerprints, A |
Comparison of MPEG-4 Facial Animation Parameter Groups with Respect to | audio | -Visual Speech Recognition Performance |
Complementary Cues from | audio | Help Combat Noise in Weakly-Supervised Object Detection |
Complementary video and | audio | analysis for broadcast news archives |
Complex and Quaternionic Principal Component Pursuit and Its Application to | audio | Separation |
Compositional Models for | audio | Processing: Uncovering the structure of sound mixtures |
Comprehensive Survey on Video Saliency Detection With Auditory Information: The | audio | -Visual Consistency Perceptual is the Key!, A |
Compressing | audio | Signals with Inpainting-Based Sparsification |
Compression enhancement of video motion of mouth region using joint | audio | and video coding |
Computer Vision for | audio | -Visual Media |
Concealing Fingerprint-Biometric Data into | audio | Signals for Identify Authentication |
Conditional Generation of | audio | from Video via Foley Analogies |
Conditional Random Fields in Speech, | audio | , and Language Processing |
Conducting | audio | Files via Computer Vision |
confidence-based late fusion framework for | audio | -visual biometric identification, A |
Consistent Wiener Filtering for | audio | Source Separation |
consumer video search system by | audio | -visual concept classification, A |
Content-Adaptive Analysis and Representation Framework for | audio | Event Discovery from Unscripted Multimedia, A |
Content-Based | audio | Classification Using Support Vector Machines and Independent Component Analysis |
Content-based | audio | retrieval with relevance feedback |
Content-Based Movie Analysis and Indexing Based on | audio | -Visual Cues |
Content-based video parsing and indexing based on | audio | -visual interaction |
Continuous | audio | -Visual Speech Recognition |
Continuous Emotion Recognition using Visual- | audio | -linguistic Information: A Technical Report for ABAW3 |
Continuous Emotion Recognition with | audio | -visual Leader-follower Attentive Fusion |
Contrastive Positive Sample Propagation Along the | audio | -Visual Event Line |
Cooperative Game Modeling With Weighted Token-Level Alignment for | audio | -Text Retrieval |
Coordinated Joint Multimodal Embeddings for Generalized | audio | -Visual Zero-shot Classification and Retrieval of Videos |
Correlation of Gestural Musical | audio | Cues and Perceived Expressive Qualities |
Cost-effective solution to synchronised | audio | -visual data capture using multiple sensors |
Cost-Effective Solution to Synchronized | audio | -Visual Capture Using Multiple Sensors |
Cost-Sensitive Multi-Label Learning for | audio | Tag Annotation and Retrieval |
Creating | audio | -centric, image-centric, and integrated audio-visual summaries |
Cross Attentional | audio | -Visual Fusion for Dimensional Emotion Recognition |
Cross-Domain Deep Feature Combination for Bird Species Classification with | audio | -Visual Data |
Cross-modal Background Suppression for | audio | -Visual Event Localization |
Cross-modal Deep Learning Applications: | audio | -visual Retrieval |
Cross-modal Embeddings for Video and | audio | Retrieval |
Cross-Referencing Self-Training Network for Sound Event Detection in | audio | Mixtures |
Crossmodal Matching of Speakers Using Lip and Voice Features in Temporally Non-overlapping | audio | and Video Streams |
Current Developments and Future Trends in | audio | Authentication |
Data Hiding in MPEG Compressed | audio | Using Wet Paper Codes |
Data-Driven Approach to | audio | Decorrelation, A |
DAVD-Net: Deep | audio | -Aided Video Decompression of Talking Heads |
DBATES: Dataset for Discerning Benefits of | audio | , Textual, and Facial Expression Features in Competitive Debate Speeches |
DCAR: A Discriminative and Compact | audio | Representation for Audio Processing |
Decision Level Fusion for | audio | -Visual Speech Recognition in Noisy Conditions |
Deep | audio | -Visual Beamforming for Speaker Localization |
Deep | audio | -Visual Fusion Neural Network for Saliency Estimation |
Deep | audio | -Visual Speech Recognition |
Deep Boltzmann Machines for i-Vector Based | audio | -Visual Person Identification |
Deep emotion recognition based on | audio | -visual correlation |
Deep Learning for | audio | -Based Music Classification and Tagging: Teaching Computers to Distinguish Rock from Bach |
Deep Learning-based Video Retrieval Using Object Relationships and Associated | audio | Classes |
Deep Neural Network Based 3D Articulatory Movement Prediction Using Both Text and | audio | Inputs |
Deep Neural Networks for Full-Reference and No-Reference | audio | -Visual Quality Assessment |
DeepComboSAD: Spectro-Temporal Correlation Based Speech Activity Detection for Naturalistic | audio | Streams |
Deepfake Video Detection Using | audio | -visual Consistency |
Deepfakes | audio | Detection Leveraging Audio Spectrogram and Convolutional Neural Networks |
Demonstration of an HMM-based photorealistic expressive | audio | -visual speech synthesis system |
Denoising of | audio | Data by Nonlinear Diffusion |
Dense Modality Interaction Network for | audio | -Visual Event Localization |
Dense-Localizing | audio | -Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline |
Derivative-Based Steganographic Distortion and its Non-additive Extensions for | audio | |
Design and implementation of MPEG | audio | layer III decoder using graphics processing units |
Design of MP3 | audio | Signal System Based on DSP |
Designing an Interactive | audio | Interface for Climate Science |
Detect | audio | -Video Temporal Synchronization Errors in Advertisements (Ads) |
Detecting and indexing moving objects for Behavior Analysis by Video and | audio | Interpretation |
Detecting Fake-Quality WAV | audio | Based on Phase Differences |
Detecting Group Turn Patterns in Conversations Using | audio | -Video Change Scale-Space |
Detecting Hubs in Music | audio | Based on Network Analysis |
Detecting information-hiding in WAV | audio | s |
Detecting local | audio | -visual synchrony in monologues utilizing vocal pitch and facial landmark trajectories |
Detecting News Reporting Using | audio | /Visual Information |
Detecting Replay Attacks Using Multi-Channel | audio | : A Neural Network-Based Method |
Detecting road surface wetness from | audio | : A deep learning approach |
Detecting Semantic Concepts Using Context and | audio | /Visual Features |
Development of an Estimation Model for Instantaneous Presence in | audio | -Visual Content |
Device and method for dubbing an | audio | -visual presentation which generates synthesized speech and corresponding facial movements |
Diff2Lip: | audio | Conditioned Diffusion Models for Lip-Synchronization |
DiffTalk: Crafting Diffusion Models for Generalized | audio | -Driven Portraits Animation |
Digit Recognition Applied to Reconstructed | audio | Signals Using Deep Learning |
Digital | audio | watermarking method based on wavelet transform |
Discovering joint | audio | -visual codewords for video event detection |
Discovering meaningful multimedia patterns with | audio | -visual concepts and associated text |
Discriminative Collaborative Representation and Its Application to | audio | Signal Classification |
Discriminative Cross-Modality Attention Network for Temporal Inconsistent | audio | -Visual Event Localization |
Distilling | audio | -Visual Knowledge by Compositional Contrastive Learning |
Distributed | audio | Network for Speech Enhancement in Challenging Noise Backgrounds |
Domain Generalization through | audio | -Visual Relative Norm Alignment in First Person Action Recognition |
DSP Restoration Techniques for | audio | |
Dual Attention Matching for | audio | -Visual Event Localization |
Dual Perspective Network for | audio | -Visual Event Localization |
Dual-modality Talking-metrics: 3D Visual- | audio | Integrated Behaviometric Cues from Speakers |
Dynamic 2D/3D Speaking Face Dataset with Synchronized | audio | |
Dynamic | audio | -Visual Mapping using Fused Hidden Markov Model Inversion Method |
Dynamic Bayesian Networks for | audio | -Visual Speaker Recognition |
Dynamic Bayesian Networks for | audio | -Visual Speech Recognition |
EAVA: A 3D Emotive | audio | -Visual Avatar |
Effective News Anchorperson Shot Detection Method Based on Adaptive | audio | /Visual Model Generation, An |
Effective Pseudonoise Sequence and Decoding Function for Imperceptibility and Robustness Enhancement in Time-Spread Echo-Based | audio | Watermarking |
Effective Watermarking of Digital | audio | and Image Using Matlab Technique |
Effects of ATM network impairments on | audio | -visual broadcast applications |
Effects of | audio | Compression on Chord Recognition |
Efficient | audio | Rendering Using Angular Region-Wise Source Enhancement for 360° Video |
Efficient Cascaded Filtering Retrieval Method for Big | audio | Data, An |
Efficient Emotional Adaptation for | audio | -Driven Talking-Head Generation |
Efficient Implementation of the Forward and Inverse MDCT in MPEG | audio | Coding, An |
Efficient Parallel | audio | Generation Using Group Masked Language Modeling |
Efficient video coding based on | audio | -visual focus of attention |
Egocentric | audio | -Visual Object Localization |
Egocentric Deep Multi-Channel | audio | -Visual Active Speaker Localization |
EM Algorithms for Weighted-Data Clustering with Application to | audio | -Visual Scene Analysis |
EM Estimation of Scale Factor for Quantization-Based | audio | Watermarking |
EMMN: Emotional Motion Memory Network for | audio | -driven Emotional Talking Face Generation |
Emotion Analysis Using | audio | /Video, EMG and EEG: A Dataset and Comparison Study |
Emotion Recognition Based on Joint Visual and | audio | Cues |
Emotional Tone-Based | audio | Continuous Emotion Recognition |
Empirical Study of | audio | -Visual Features Fusion for Gait Recognition |
Empirical Study of Feature Extraction Methods for | audio | Classification, An |
Energy and Computation Efficient | audio | -Visual Voice Activity Detection Driven by Event-Cameras |
ENF Detection in | audio | Recordings via Multi-Harmonic Combining |
Enhancing | audio | surveillance with hierarchical recurrent neural networks |
Enhancing Transferability of Adversarial | audio | in Speaker Recognition Systems |
Ensemble of Rejecting Classifiers for Anomaly Detection of | audio | Events, An |
eNTERFACE-05 | audio | -Visual Emotion Database, The |
Environmental Sound Classification Using Local Binary Pattern and | audio | Features Collaboration |
EPG2S: Speech Generation and Speech Enhancement Based on Electropalatography and | audio | Signals Using Multimodal Learning |
EPIC-Fusion: | audio | -Visual Temporal Binding for Egocentric Action Recognition |
ERANNs: Efficient residual | audio | neural networks for audio pattern recognition |
Error Weighted Semi-Coupled Hidden Markov Model for | audio | -Visual Emotion Recognition |
Escape from the Dark Jungle: A 3D | audio | Game for Emotion Regulation |
Estimating Cohesion in Small Groups Using | audio | -Visual Nonverbal Behavior |
Estimating Rainfall from Surveillance | audio | Based on Parallel Network with Multi-Scale Fusion and Attention Mechanism |
Evaluating | audio | skimming and frame rate acceleration for summarizing BBC rushes |
Event Detection in Field Sports Video Using | audio | -Visual Features and a Support Vector Machine |
Event-Specific | audio | -Visual Fusion Layers: A Simple and New Perspective on Video Understanding |
Exploiting evidential theory in the fusion of textual, | audio | , and visual modalities for affective music video retrieval |
Exploiting the Complementarity of | audio | and Visual Data in Multi-speaker Tracking |
Exploiting Visual- | audio | -Textual Characteristics for Automatic TV Commercial Block Detection and Segmentation |
Exploring | audio | Compression as Image Completion in Time-frequency Domain |
Exploring Co-Occurence Between Speech and Body Movement for | audio | -Guided Video Localization |
Exploring Heterogeneous Clues for Weakly-Supervised | audio | -Visual Video Parsing |
Exploring the Resolution Limit for In-Air Synthetic-Aperture | audio | Imaging |
Exploring the Topics of | audio | Words for Detecting Alzheimer's Disease From Spontaneous Speech |
Exponential Hyperbolic Cosine Robust Adaptive Filters for | audio | Signal Processing |
Expressive Talking Head Generation with Granular | audio | -Visual Control |
Extracting High Level Semantics by Means of Speech, | audio | , and Image Primitives in Surveillance Applications |
Extracting Semantic Information from Basketball Video Based on | audio | -Visual Features |
Fast Conversion Algorithm for the Dolby Digital (Plus) AC-3 | audio | Coding Standards |
Fast Mode Decision Algorithm for Intra Encoding of the 3rd Generation | audio | Video Coding Standard |
Feature Analysis for | audio | Classification |
Feature contours fusion for determining segment boundaries in | audio | data |
Feature fluctuation absorption for a quick | audio | retrieval from long recordings |
Feature space video stream consistency estimation for dynamic stream weighting in | audio | -visual speech recognition |
Few-Shot Class-Incremental | audio | Classification Using Dynamically Expanded Classifier With Self-Attention Modified Prototypes |
Finding Fallen Objects Via Asynchronous | audio | -Visual Integration |
Fingerprint extraction of | audio | signal using wavelet transform |
Flow-guided One-shot Talking Face Generation with a High-resolution | audio | -visual Dataset |
Formant-based acoustic features for cow's estrus detection in | audio | surveillance system |
FPGA-based real-time MFCC extraction for automatic | audio | indexing on FM broadcast data |
Frame-Independent and Parallel Method for 3D | audio | Real-Time Rendering on Mobile Devices |
framework for estimating geometric distortions in video copies based on visual- | audio | fingerprints, A |
Free Viewpoint Image Generation Synchronized with Free Listening-Point | audio | for 3-D Real Space Navigation |
Frequency Domain Long-Term Prediction for Low Delay General | audio | Coding |
From Blind to Guided | audio | Source Separation: How models and side information can improve the separation of sound |
From Horspiel to | audio | Fiction: Sound Design Perspectives for Blind and Visually Impaired People |
Fully automatic face recognition system using a combined | audio | -visual approach |
Fusing | audio | and Visual Features of Speech |
Fusing | audio | -Visual Nonverbal Cues to Detect Dominant People in Group Conversations |
Fusion of | audio | and Video Information for Multi Modal Person Authentication |
Fusion of | audio | and visual cues for laughter detection |
Fusion of | audio | - and Visual Cues for Real-Life Emotional Human Robot Interaction |
Fusion of | audio | -Visual Information for Integrated Speech Processing |
Fusion of classifier predictions for | audio | -visual emotion recognition |
Fwobble: Continuous | audio | -haptic feedback for balance control, The |
Gammatone Cepstral Coefficients: Biologically Inspired Features for Non-Speech | audio | Classification |
Generalizing AUC Optimization to Multiclass Classification for | audio | Segmentation With Limited Training Data |
Generating Adversarial Examples in | audio | Classification with Generative Adversarial Network |
Generation of sports highlights using motion activity in combination with a common | audio | feature extraction framework |
Generative Model Driven Representation Learning in a Hybrid Framework for Environmental | audio | Scene and Sound Event Recognition |
Generic content-based | audio | indexing and retrieval framework |
Genre-Adaptive Semantic Computing and | audio | -Based Modelling for Music Mood Annotation |
Geometric Invariant | audio | Watermarking Based on an LCM Feature |
Gestural Interactions for Multi-parameter | audio | Control and Audification |
GLAVNet: Global-Local | audio | -Visual Cues for Fine-Grained Material Recognition |
Glitch in the matrix: A large scale benchmark for content driven | audio | -visual forgery detection and localization |
Global Affective Video Content Regression Based on Complementary | audio | -visual Features |
Goal detection in soccer video using | audio | /visual keywords |
Graph Attention for Automated | audio | Captioning |
Graph Fourier Transform Based | audio | Zero-Watermarking |
Group Feature Selection for | audio | -Based Video Genre Classification |
Group Masked Model Learning for General | audio | Representation |
HAMEX: A Handwritten and | audio | Dataset of Mathematical Expressions |
Handwritten and | audio | Information Fusion for Mathematical Symbol Recognition |
Heterogeneous Networks for | audio | and Video: Using IEEE 802.1 Audio Video Bridging |
Heuristic Attack Method to PRH-Based | audio | Copy Detectors, A |
HHT-based | audio | coding |
Hiding Video in | audio | via Reversible Generative Models |
Hierarchical | audio | -visual cue integration framework for activity analysis in intelligent meeting rooms |
Hierarchical | audio | -Visual Surveillance for Passenger Elevators |
Hierarchical Model For Long-Length Video Summarization With Adversarially Enhanced | audio | /Visual Features |
High Scrambling Degree in | audio | Through Imitation of an Unintelligible Signal |
High-Level Feature Extraction Using SIFT GMMs and | audio | Models |
Highly Efficient | audio | Coding With Blind Spectral Recovery Based on Machine Learning |
Highly Transparent and Secure Scheme for Concealing Text Within | audio | |
HMM Based Falling Person Detection Using Both | audio | and Video |
Horror film genre typing and scene labeling via | audio | analysis |
Hough transform-based mouth localization for | audio | -visual speech recognition |
How to Design a Three-Stage Architecture for | audio | -Visual Active Speaker Detection in the Wild |
Htad: A Home-tasks Activities Dataset with Wrist-accelerometer and | audio | Features |
Human emotion recognition from videos using spatio-temporal and | audio | features |
Human interaction categorization by using | audio | -visual cues |
Human Perception of | audio | -Visual Synthetic Character Emotion Expression in the Presence of Ambiguous and Conflicting Information |
hybrid visual feature extraction method for | audio | -visual speech recognition, A |
Hyperbolic | audio | -visual Zero-shot Learning |
Identification of Sparse | audio | Tampering Using Distributed Source Coding and Compressive Sensing Techniques |
Identification of story units in | audio | -visual sequences by joint audio and video processing |
Identification of Successive Correlated Camera Shots Using | audio | and Video Information |
Identifying Colombian Bird Species from | audio | Recordings |
Identifying dominant people in meetings from | audio | -visual sensors |
Identifying Human Behaviors Using Synchronized | audio | -Visual Cues |
IEEE Standards for Advanced | audio | and Video Coding in Emerging Applications |
Image and | audio | Sequence Visualization and Interaction Mechanisms for Structured Video Browsing and Editing |
Image/Video/ | audio | Quality in Computer Vision and Generative AI |
Image2 | audio | : Facilitating Semi-supervised Audio Emotion Recognition with Facial Expression Image |
Impact of | audio | on Subjective Assessment of Video Quality |
Impact of | audio | on Subjective Assessment of Video Quality in Videoconferencing Applications |
Improved | audio | -Visual Speaker Recognition via the Use of a Hybrid Combination Strategy |
Improved High Capacity Spread Spectrum-Based | audio | Watermarking by Hadamard Matrices |
Improved Soccer Action Spotting using both | audio | and Video Streams |
Improving accuracy in behaviour identification for content-based retrieval by using | audio | and video information |
Improving | audio | Steganalysis Using Deep Residual Networks |
Improving mix-and-separate training in | audio | -visual sound source separation with an object prior |
Improving user verification in human-robot interaction from | audio | or image inputs through sample quality assessment |
Improving videophone subjective quality using | audio | information |
Incorporating | audio | Signals into Constructing a Visual Saliency Map |
Increasing Robustness of an Improved Spread Spectrum | audio | Watermarking Method Using Attack Characterization |
Indexing | audio | visual databases through joint audio and video processing |
Indexing of multilingual news telecast using | audio | -visual keywords |
Information-Geometric Approach to Real-Time | audio | Segmentation, An |
Instant Mobile Video Search With Layered | audio | -Video Indexing and Progressive Transmission |
Instantaneous Evaluation of the Sense of Presence in | audio | -Visual Content |
integrated decoding framework for | audio | watermark extraction, An |
Integrating LDV | audio | and IR Video for Remote Multimodal Surveillance |
Integrating Visual, | audio | and Text Analysis for News Video |
Integration of 3D | audio | and 3D video for FTV |
Integration of | audio | and visual information for content-based video segmentation |
Integration of | audio | /Visual Information for Use in Human-Computer Intelligent Interaction |
Interactive 3-D | audio | System With Loudspeakers, An |
Interactive Multi-View Video and View-Dependent | audio | Under MPEG-21 DIA (Digital Item Adaptation) |
Introduction to the Special Issue on | audio | and Video Analysis for Multimedia Interactive Services |
Introduction to the Special Issue: Advances on pattern recognition for speech and | audio | processing |
Investigating Blind User Preference on Tactile Symbols for Landmarks on | audio | -Tactile Map |
investigation on MPEG | audio | segmentation by evolutionary algorithms, An |
Investigations into the Robustness of | audio | -Visual Gender Classification to Background Noise and Illumination Effects |
iQuery: Instruments as Queries for | audio | -Visual Sound Separation |
ISLA: Temporal Segmentation and Labeling for | audio | -Visual Emotion Recognition |
ISNN: Impact Sound Neural Network for | audio | -Visual Object Classification |
Joint | audio | -video Object Tracking |
Joint | audio | -video people tracking using belief theory |
Joint | audio | -visual bi-modal codewords for video event detection |
Joint | audio | -Visual Deepfake Detection |
Joint | audio | -Visual Tracking Using Particle Filters |
Joint Cross-Attention Model for | audio | -Visual Fusion in Dimensional Emotion Recognition, A |
Joint Inversion of | audio | -Magnetotelluric and Seismic Travel Time Data With Deep Learning Constraint |
Joint Object-Material Category Segmentation from | audio | -Visual Cues |
Joint Visual and | audio | Learning for Video Highlight Detection |
Joint-Modal Label Denoising for Weakly-Supervised | audio | -Visual Video Parsing |
KAN-AV dataset for | audio | -visual face and speech analysis in the wild |
Kernel Fusion of | audio | and Visual Information for Emotion Recognition |
Known-Artist Live Song Identification Using | audio | Hashprints |
Laboratory and Crowdsourcing Studies of Lip Sync Effect on the | audio | -Video Quality Assessment for Videoconferencing Application |
Language-Guided | audio | -Visual Source Separation via Trimodal Consistency |
Large Scale | audio | -Visual Video Analytics Platform for Forensic Investigations of Terroristic Attacks |
Large Vocabulary | audio | -visual Speech Recognition Using Active Shape Models |
Large Vocabulary | audio | -Visual Speech Recognition Using the Janus Speech Recognition Toolkit |
Latent topic model for | audio | retrieval |
LAVSS: Location-Guided | audio | -Visual Spatial Audio Separation |
Learning Affective Features With a Hybrid Deep Model for | audio | -Visual Emotion Recognition |
Learning Algorithms for | audio | and Video Processing: Independent Component Analysis and Support Vector Machine Based Approaches |
Learning | audio | and image representations with bio-inspired trainable feature extractors |
Learning | audio | -Video Modalities from Image Captions |
Learning | audio | -Visual Source Localization via False Negative Aware Contrastive Learning |
Learning Contextually Fused | audio | -Visual Representations for Audio-Visual Speech Recognition |
Learning Self-supervised | audio | -Visual Representations for Sound Recommendations |
Learning to Answer Questions in Dynamic | audio | -Visual Scenarios |
Learning to Predict Salient Faces: A Novel Visual- | audio | Saliency Model |
Learning Visual Styles from | audio | -Visual Associations |
Let's Play Music: | audio | -Driven Performance Video Generation |
Level Ratio Based Inter and Intra Channel Prediction with Application to Stereo | audio | Frame Loss Concealment |
Leveraging Acoustic Images for Effective Self-supervised | audio | Representation Learning |
Leveraging recent advances in deep learning for | audio | -Visual emotion recognition |
Leveraging TCN and Transformer for effective visual- | audio | fusion in continuous emotion recognition |
Leveraging the Video-Level Semantic Consistency of Event for | audio | -Visual Event Localization |
Lifelog Scene Change Detection Using Cascades of | audio | and Video Detectors |
Linear Dynamic Range Reduction of Musical | audio | Using an Allpass Filter Chain |
Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to | audio | Representation Mapping |
Listen and Look: | audio | -Visual Matching Assisted Speech Source Separation |
Listen to Look: Action Recognition by Previewing | audio | |
Listen to Your Face: Inferring Facial Action Units from | audio | Channel |
Live Version Identification with | audio | Scene Detection |
Local AM/FM Parameters Estimation: Application to Sinusoidal Modeling and Blind | audio | Source Separation |
Local Information Assisted Attention-Free Decoder for | audio | Captioning |
Localize to Binauralize: | audio | Spatialization from Visual Sound Source Localization |
Look who's talking: Speaker detection using video and | audio | correlation |
Looking and Hearing Into Details: Dual-Enhanced Siamese Adversarial Network for | audio | -Visual Matching |
Looking into Your Speech: Learning Cross-modal Affinity for | audio | -visual Speech Separation |
Low bit rate | audio | -visual communication having improved face and lip region detection |
Low Cost Force-Feedback Interaction with Haptic Digital | audio | Effects |
Low-Complexity Linear-Phase Graphic | audio | Equalizer Based on IFIR Filters, A |
Low-Spec Extendable GPU-Based | audio | Library, A |
MAD: A Scalable Dataset for Language Grounding in Videos from Movie | audio | Descriptions |
Making a scene: alignment of complete sets of clips based on pairwise | audio | match |
MALip: Modal Amplification Lipreading based on reconstructed | audio | features |
Mead: A Large-scale | audio | -visual Dataset for Emotional Talking-face Generation |
Mean-Shift and Sparse Sampling-Based SMC-PHD Filtering for | audio | Informed Visual Speaker Tracking |
Method and apparatus for enhancing and indexing video and | audio | signals |
Method and apparatus for producing | audio | -visual synthetic speech |
Method and apparatus for summarizing and indexing the contents of an | audio | -visual presentation |
Method and apparatus for tracking moving objects using combined video and | audio | information in video conferencing and other applications |
Method and system for generating facial animation values based on a combination of visual and | audio | information |
Method of and apparatus for animation, driven by an | audio | signal, of a synthesized model of a human face |
Method of | audio | Watermarking Based on Adaptive Phase Modulation |
Methods and apparatuses for segmenting an | audio | -visual recording using image similarity searching and audio speaker recognition |
Metric Learning-Based Multimodal | audio | -Visual Emotion Recognition |
Microphone Arrays as Generalized Cameras for Integrated | audio | Visual Processing |
Minimal test collections for low-cost evaluation of | audio | Music Similarity and Retrieval systems |
MixSpeech: Cross-Modality Self-Learning with | audio | -Visual Stream Mixup for Visual Speech Translation and Recognition |
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint | audio | and Video Generation |
MODA: Mapping-Once | audio | -driven Portrait Animation with Dual Attentions |
Modeling Label Dependencies for | audio | Tagging With Graph Convolutional Network |
Modelling Stochastic Context of | audio | -Visual Expressive Behaviour With Affective Processes |
Modification of Polar Echo Kernel for Performance Improvement of | audio | Watermarking |
Mood detection analyzing lyrics and | audio | signal based on deep learning architectures |
Motion estimation using | audio | and video fusion |
Move2Hear: Active | audio | -Visual Source Separation |
Movie genre classification by exploiting | audio | -visual features of previews |
MPEG Digital | audio | -Coding and Video-Coding Standards |
MPEG Standards for Compressed Representation of Immersive | audio | |
MPEG-1 | audio | Real-Time Encoding System |
MPEG-1, Coding of Moving Pictures and Associated | audio | for Digital Storage Media at Up to About 1.5 mbit/s |
MPEG-4 natural | audio | coding |
MPEG-4 Systems and Description Languages: A Way Ahead in | audio | Visual Information Representation, The |
MPEG-4: | audio | /video and synthetic graphics/audio for mixed media |
MUGEN: A Playground for Video- | audio | -Text Multimodal Understanding and GENeration |
Multi Event Localization by | audio | -Visual Fusion with Omnidirectional Camera and Microphone Array |
Multi-beam steering for 3D | audio | rendering in linear phased loudspeaker arrays |
Multi-Granularity Aggregation Transformer for Joint Video- | audio | -Text Representation Learning |
Multi-level Fusion of | audio | and Visual Features for Speaker Identification |
Multi-level Particle Filter Fusion of Features and Cues for | audio | -Visual Person Tracking |
Multi-Modal Particle Filtering Tracking using Appearance, Motion and | audio | Likelihoods |
Multi-Speaker Tracking From an | audio | -Visual Sensing Device |
Multi-step Coding Structure of Spatial | audio | Object Coding |
Multi-Task Adapters for On-Device | audio | Inference |
Multi-View Video and Multi-Channel | audio | Broadcasting System |
Multichannel | audio | Coding Based on Analysis by Synthesis |
Multimedia, | audio | -Visual Communications, Survey |
Multimodal and Multi-task | audio | -Visual Vehicle Detection and Classification |
Multimodal Approach for Percussion Music Transcription from | audio | and Video, A |
Multimodal emotion recognition using cross modal | audio | -video fusion with attention and deep metric learning |
Multimodal framework based on | audio | -visual features for summarisation of cricket videos |
Multimodal fusion of | audio | , scene, and face features for first impression estimation |
Multimodal Music Mood Classification by Fusion of | audio | and Lyrics |
Multimodal Person Recognition Using Unconstrained | audio | and Video |
Multimodal Processing and Interaction: | audio | , Video, Text |
Multimodal Saliency Model for Videos With High | audio | -Visual Correspondence, A |
Multimodal speaker identification with | audio | -video processing |
Multimodal tracking and classification of | audio | -visual features |
Multimodal Variational Auto-encoder based | audio | -Visual Segmentation |
Multimodal ( | audio | , Facial and Gesture) based Emotion Recognition challenge |
Multiple Scrambling and Adaptive Synchronization for | audio | Watermarking |
Multiple Speaker Tracking in Spatial | audio | via PHD Filtering and Depth-Audio Fusion |
Multipurpose | audio | watermarking |
Multistandard Digital HD/SD | audio | Multiplexer With Modular Ancillary Packet Substitution, A |
Multivariate mutual information for | audio | video fusion |
Music Popularity: Metrics, Characteristics, and | audio | -Based Prediction |
NDFT-based | audio | Watermarking Scheme with High Security |
Neural network based reinforcement learning for | audio | -visual gaze control in human-robot interaction |
Neural Voice Puppetry: | audio | -driven Facial Reenactment |
New algorithm for searching minimum bit rate wavelet representations with application to multiresolution-based perceptual | audio | coding |
New Approach to Integrate | audio | and Visual Features of Speech, A |
New | audio | Watermarking for Copyright Protection and Content Authentication, A |
New matching pursuit based sinusoidal modelling method for | audio | coding |
new multi-purpose | audio | -visual UNMC-VIER database with multiple variabilities, A |
No-Reference model for Detecting | audio | Artifacts using Pretrained Audio Neural Networks, A |
Noise Adaptive Stream Weighting in | audio | -Visual Speech Recognition |
Noise-Free | audio | Signal Processing in Noisy Environment: A Hardware and Algorithm Solution |
Nonnegative OPLS for Supervised Design of Filter Banks: Application to Image and | audio | Feature Extraction |
novel 3D | audio | display system using radiated loudspeaker for future 3D multimodal communications, A |
Novel Anti-Collusion | audio | Fingerprinting Scheme Based on Fourier Coefficients Reversing, A |
Novel | audio | Feature Projection Using KDLPCCA-Based Correlation with EEG Features for Favorite Music Classification |
Novel | audio | Features for Music Emotion Recognition |
novel efficient approach for | audio | segmentation, A |
Novel Lip Descriptor for | audio | -Visual Keyword Spotting Based on Adaptive Decision Fusion, A |
novel perceptual feature set for | audio | emotion recognition, A |
Novel Representation of Bioacoustic Events for Content-Based Search in Field | audio | Data, A |
Novel Steganalysis of Steghide Focused on High-Frequency Region of | audio | Waveform, A |
Object Category Detection Using | audio | -Visual Cues |
Omnidirectional Information Gathering for Knowledge Transfer-based | audio | -Visual Navigation |
On the | audio | -visual Synchronization for Lip-to-Speech Synthesis |
On the Correlation of Automatic | audio | and Visual Segmentations of Music Videos |
On the Effect of Observed Subject Biases in Apparent Personality Analysis From | audio | -Visual Signals |
On the Use of Locality Sensitive Hashing for | audio | Following |
On-line adaptive background modelling for | audio | surveillance |
Online Cross-Modal Adaptation for | audio | -Visual Person Identification With Wearable Cameras |
Online Spectrogram Inversion for Low-Latency | audio | Source Separation |
Open-Set Recognition and Few-Shot Learning Dataset for | audio | Event Classification in Domestic Environments, An |
Open-Source Practices for Music Signal Processing Research: Recommendations for Transparent, Sustainable, and Reproducible | audio | Research |
Optimized recursive subband synthesis windowing for implementing efficient MPEG | audio | decoders |
Optimizing a High-Order Graphic Equalizer for | audio | Processing |
Optimum Design of Multistage Multirate FIR Filter for | audio | Signal Sampling Rate Conversion via a Genetic Algorithm Approach, The |
Ornithologist's Guide for Including Machine Learning in a Workflow to Identify a Secretive Focal Species from Recorded | audio | , An |
Other Related Papers, | audio | , Speech, Signal Processing, Pattern Recognition |
Overview of MPEG-7 | audio | |
Overview on Perceptually Motivated | audio | Indexing and Classification, An |
Pano-AVQA: Grounded | audio | -Visual Question Answering on 360° Videos |
Parametric Implicit Face Representation for | audio | -Driven Facial Reenactment |
Patra: A Novel Document Architecture for Integrating Handwriting with | audio | -Visual Information |
Perception-Aware Cross-Modal Signal Reconstruction: From | audio | -Haptic to Visual |
Perceptual | audio | data concealment and watermarking scheme using direct frequency domain substitution |
Perceptual | audio | Watermarking by Learning in Wavelet Domain |
Perceptual Coding of High-Quality Digital | audio | |
Perceptual criterion based fragile | audio | watermarking using adaptive wavelet packets |
Perceptual Lossless Quantization of Spatial Parameter for 3D | audio | Signals, The |
Perceptual-based quality assessment for | audio | -visual services: A survey |
Performance-Based Interpreter Identification in Saxophone | audio | Recordings |
Performances of low-level | audio | classifiers for large-scale music similarity |
Person Tracking Using | audio | and Depth Cues |
Person Tracking with | audio | -Visual Cues Using the Iterative Decoding Framework |
Personal Sound Zones: Delivering interface-free | audio | to multiple listeners |
Phase-Entrained Particle Filter for | audio | -Locomotion Synchronization, A |
phone-viseme dynamic Bayesian network for | audio | -visual automatic speech recognition, A |
Photorealistic adaptation and interpolation of facial expressions using HMMs and AAMs for | audio | -visual speech synthesis |
Pose-Controllable Talking Face Generation by Implicitly Modularized | audio | -Visual Representation |
Positive Sample Propagation along the | audio | -Visual Event Line |
Power of Sound (TPoS): | audio | Reactive Video Generation with Stable Diffusion, The |
Pre-Training | audio | Representations With Self-Supervision |
Predicting | audio | -visual salient events based on visual, audio and text modalities for movie summarization |
Prediction of the Leadership Style of an Emergent Leader Using | audio | and Visual Nonverbal Features |
Primal-dual algorithms for | audio | decomposition using mixed norms |
Probabilistic Kernels for Improved Text-to-Speech Alignment in Long | audio | Tracks |
Proposed Integration Algorithm to Optimize the Separation of | audio | Signals Using the ICA and Wavelet Transform |
Prosodic, Spectral and Voice Quality Feature Selection Using a Long-Term Stopping Criterion for | audio | -Based Emotion Recognition |
Psychoacoustic Calibration of Loss Functions for Efficient End-to-End Neural | audio | Coding |
Pyramid Based Interpolation for Face-Video Playback in | audio | Visual Recognition |
QUALIFIER: Question-Guided Self-Attentive Multimodal Fusion Network for | audio | Visual Scene-Aware Dialog |
RAV: Learning-Based Adaptive Streaming to Coordinate the | audio | and Video Bitrate Selections |
Ray-Space-Based Multichannel Nonnegative Matrix Factorization for | audio | Source Separation |
Real-Time | audio | -Guided Multi-Face Reenactment |
Real-time Demonstration of Personal | audio | and 3d Audio Rendering Using Line Array Systems |
Real-time Demonstration of Personal | audio | and 3d Audio Rendering Using Line Array Systems |
Real-Time Lip Tracking for | audio | -Visual Speech Recognition Applications |
Real-Time monophonic and polyphonic | audio | classification from power spectra |
Real-Time MPEG-1 | audio | Coding and Decoding on a DSP Chip |
Real-Time Perceptual Model for Distraction in Interfering | audio | -on-Audio Scenarios |
Real-Time Perceptual Model for Distraction in Interfering | audio | -on-Audio Scenarios |
real-time system for | audio | source localization with cheap sensor device, A |
Real-Time User Position Estimation in Indoor Environments Using Digital Watermarking for | audio | Signals |
Realistic Human Action Recognition with | audio | Context |
Recalibrated Bandpass Filtering on Temporal Waveform for | audio | Spoof Detection |
Recognizing High-level | audio | -visual Concepts Using Context |
Recovering | audio | -to-video synchronization by audiovisual correlation analysis |
Recovery of | audio | -to-video synchronization through analysis of cross-modality correlation |
Reliable detection of | audio | events in highly noisy environments |
Remote | audio | /video acquisition for human signature detection |
Representation and linking mechanisms for | audio | in MPEG-7 |
Representations of the Complex-Valued Frequency-Domain LPC for | audio | Coding |
Resilience Mask for Robust | audio | Hashing, A |
Rethink Cross-Modal Fusion in Weakly-Supervised | audio | -Visual Video Parsing |
Reverberant | audio | Source Separation via Sparse and Low-Rank Modeling |
Reversible and Robust | audio | Watermarking Based on Quantization Index Modulation and Amplitude Expansion |
Reversible and Robust | audio | Watermarking Based on Spread Spectrum and Amplitude Expansion |
Reversible | audio | Data Hiding Based on Variable Error-Expansion of Linear Prediction for Segmental Audio and G.711 Speech |
Reversible | audio | Data Hiding Based on Variable Error-Expansion of Linear Prediction for Segmental Audio and G.711 Speech |
Reversible | audio | Information Hiding Based on Integer DCT Coefficients with Adaptive Hiding Locations |
Review of Automatic Fault Diagnosis Systems Using | audio | and Vibration Signals |
Right to Talk: An | audio | -Visual Transformer Approach, The |
RNN-Based Speech-Music Discrimination Used for Hybrid | audio | Coder, An |
Robot Command Interface Using an | audio | -Visual Speech Recognition System |
Robust | audio | Fingerprint's Based Identification Method, A |
Robust | audio | Patch Attacks Using Physical Sample Simulation and Adversarial Patch Noise Generation |
robust | audio | searching method for cellular-phone-based music information retrieval, A |
Robust | audio | Watermarking Based on Log-Polar Frequency Index |
Robust | audio | Watermarking Based on Low-Order Zernike Moments |
Robust | audio | watermarking based on multi-carrier modulation |
Robust | audio | Watermarking by Using Low-Frequency Histogram |
Robust | audio | Watermarking Scheme Based on Lifting Wavelet Transform and Singular Value Decomposition, A |
Robust | audio | Watermarking Using Both DWT and Masking Effect |
Robust | audio | Watermarking Using Perceptual Masking |
Robust | audio | Zero-Watermark Based on LWT and Chaotic Modulation |
Robust | audio | -Visual Instance Discrimination |
Robust | audio | -Visual Mandarin Speech Recognition Based On Adaptive Decision Fusion And Tone Features |
Robust | audio | -Visual Speech Recognition Based on Hybrid Fusion |
Robust | audio | -Visual Speech Recognition Based on Late Integration |
Robust | audio | -Visual Speech Recognition Under Noisy Audio-Video Conditions |
Robust | audio | -Visual Speech Recognition Under Noisy Audio-Video Conditions |
Robust AVS | audio | Watermarking |
robust digital | audio | watermarking based on statistics characteristics, A |
Robust Estimation of Amplitude Modification for Scalar Costa Scheme Based | audio | Watermark Detection |
Robust Frequency Domain | audio | Watermarking: A Tuning Analysis |
Robust Hiding of Fingerprint-Biometric Data into | audio | Signals |
Robust One Shot | audio | to Video Generation |
Robust Sensor Fusion: Analysis and Application to | audio | -Visual Speech Recognition |
Robust, Blindly-Detectable, and Semi-Reversible Technique of | audio | Watermarking Based on Cochlear Delay Characteristics |
Robustness of Multiplexing Protocols for | audio | -Visual Services Over Wireless Networks |
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized | audio | -Driven Single Image Talking Face Animation |
Scalability Analysis of | audio | -Visual Person Identity Verification |
Scalable | audio | coding for compression and loss resilient streaming |
scale-free distribution of false positives for a large class of | audio | similarity measures, A |
Scene Change Detection Based on | audio | -Visual Analysis and Interaction |
Score-Informed Source Separation for Musical | audio | Recordings: An overview |
Scream and gunshot detection and localization for | audio | -surveillance systems |
Search the | audio | , Browse the Video: A Generic Paradigm for Video Collections |
Secure spread spectrum watermarking for images, | audio | and video |
Segmental DCT Coefficient Reversal Based Anti-Collusion | audio | Fingerprinting Mechanism |
Selective Background Adaptation Based Abnormal Acoustic Event Recognition for | audio | Surveillance |
Self-Supervised | audio | Spatialization with Correspondence Classifier |
Self-Supervised Contrastive Learning for | audio | -Visual Action Recognition |
Self-Supervised Fine-Grained Cycle-Separation Network (FSCN) for Visual- | audio | Separation |
Self-Supervised Learning of | audio | Representations From Permutations With Differentiable Ranking |
Self-supervised Learning of | audio | -visual Objects from Video |
Self-supervised object detection from | audio | -visual correspondence |
Self-Supervised Video Forensics by | audio | -Visual Anomaly Detection |
Semantic and Relation Modulation for | audio | -Visual Event Localization |
Semantic | audio | -Visual Navigation |
Semantic Context Detection Using | audio | Event Fusion |
Semantic Indexing of Multimedia Content Using Visual, | audio | , and Text Cues |
Semantic indexing of soccer | audio | -visual sequences: A multimodal approach based on controlled Markov chains |
Semantic indexing of sports program sequences by | audio | -visual analysis |
Semantic Learning for | audio | Applications: A Computer Vision Approach |
Semantic Video Retrieval Using | audio | Analysis |
Semantic-Aware Implicit Neural | audio | -Driven Video Portrait Generation |
Sensor and Data Systems, | audio | -Assisted Cameras and Acoustic Doppler Sensors |
Sep-stereo: Visually Guided Stereophonic | audio | Generation by Associating Source Separation |
Separation of | audio | -Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli |
Separation of | audio | -Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli |
SEWA DB: A Rich Database for | audio | -Visual Emotion and Sentiment Research in the Wild |
Shot genre classification using compressed | audio | -visual features |
Signal-Aware Parametric Quality Model for | audio | and Speech over IP Networks |
Simple and Efficient method for Dubbed | audio | Sync Detection using Compressive Sensing, A |
Simple Baseline for | audio | -Visual Scene-Aware Dialog, A |
Single-modal Incremental Terrain Clustering from Self-Supervised | audio | -Visual Feature Learning |
Sinusoidal modelling using perceptual matching pursuits in the bark scale for parametric | audio | coding |
Sleep Apnea Detection via Depth Video and | audio | Feature Learning |
SMART-I2: Spatial Multi-user | audio | -visual Real-time interactive interface, A broadcast application context |
SNR-Constrained Heuristics for Optimizing the Scaling Parameter of Robust | audio | Watermarking |
Sociometry Based Multiparty | audio | Recordings Summarization |
Sonic Trampoline: How | audio | Feedback Impacts the User's Experience of Jumping |
Sound event detection in real-life | audio | using joint spectral and temporal features |
Sound Quality Evaluation for | audio | Watermarking Based on Phase Shift Keying Using BCH Code |
Sound to Visual Scene Generation by | audio | -to-Visual Latent Alignment |
Sound Transformation: Applying Image Neural Style Transfer Networks to | audio | Spectograms |
Soundspaces: | audio | -visual Navigation in 3d Environments |
Sparse representation of | audio | features for sputum detection from lung sounds |
Spatial | audio | Object Coding With Two-Step Coding Structure for Interactive Audio Service |
Spatial | audio | Object Coding With Two-Step Coding Structure for Interactive Audio Service |
Spatial misregistration of virtual human | audio | : Implications of the precedence effect |
Speaker and Digit Recognition by | audio | -Visual Lip Biometrics |
Speaker dependent video indexing based on | audio | -visual interaction |
Speaker Independent | audio | -Visual Speech Recognition |
Speaker Tracking Algorithm Based on | audio | and Visual Information Fusion Using Particle Filter, A |
Special Issue on | audio | -Based and Video-Based Person Authentication |
Speech Activity Detection in Naturalistic | audio | Environments: Fearless Steps Apollo Corpus |
Speech driven video editing via an | audio | -conditioned diffusion model |
Speech Personality Recognition Based on Annotation Classification Using Log-Likelihood Distance and Extraction of Essential | audio | Features |
Speech-assisted lip synchronization in | audio | -visual communications |
Speech/Music Classification Based on Distributed Evolutionary Fuzzy Logic for Intelligent | audio | Coding |
Spoken Moments: Learning Joint | audio | -Visual Representations from Video Descriptions |
Spontaneous Driver Emotion Facial Expression (DEFE) Dataset for Intelligent Vehicles: Emotions Triggered by Video- | audio | Clips in Driving Scenarios, A |
Spotting | audio | -Visual Inconsistencies (SAVI) in Manipulated Video |
Stacked Sparse Autoencoder for | audio | Object Coding |
Statistical Lip-Appearance Models Trained Automatically Using | audio | Information |
Steganalysis Scheme for AAC | audio | Based on MDCT Difference Between Intra and Inter Frame, A |
Structuring Soccer Video Based on | audio | Classification and Segmentation Using Hidden Markov Model |
Strumming to the Beat: | audio | -Conditioned Contrastive Video Textures |
Study of Subjective and Objective Quality Assessment of | audio | -Visual Signals |
Subjective and Objective | audio | -Visual Quality Assessment for User Generated Content |
Summarizing Long-Length Videos with GAN-Enhanced | audio | /Visual Features |
Supplementary Material: AVA-ActiveSpeaker: An | audio | -Visual Dataset for Active Speaker Detection |
Survey of Affect Recognition Methods: | audio | , Visual, and Spontaneous Expressions, A |
Survey of | audio | -Based Music Classification and Annotation, A |
Survey of compressed-domain features used in | audio | -visual indexing and analysis |
survey of MPEG-1 | audio | , video and semantic analysis techniques, A |
SVD-Based Adaptive QIM Watermarking on Stereo | audio | Signals |
SVGC-AVA: 360-Degree Video Saliency Prediction With Spherical Vector-Based Graph Convolution and | audio | -Visual Attention |
SVM-Based | audio | Classification for Content-Based Multimedia Retrieval |
Synchronization of Multiple Camera Videos Using | audio | -Visual Features |
Synchronization of Processed | audio | -Video Signals using Time-Stamps |
Synchronized | audio | -Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation |
syntactic framework for bitstream-level representation of | audio | -visual objects, A |
Synthesizing Photo-Realistic 3D Talking Head: Learning Lip Synchronicity and Emotion from | audio | and Video |
Synthesizing Talking Faces from Text and | audio | : An Autoencoder and Sequence-to-Sequence Convolutional Neural Network |
Synthetic and SNHC | audio | in MPEG-4 |
System and method for | audio | /video speaker detection |
System and method for skimming digital | audio | /video data |
System and process for adding high frame-rate current speaker data to a low frame-rate video using | audio | watermarking techniques |
Tackling the Cover Source Mismatch Problem in | audio | Steganalysis With Unsupervised Domain Adaptation |
Talking Head Generation with Probabilistic | audio | -to-Visual Diffusion Priors |
Talking With Hands 16.2M: A Large-Scale Dataset of Synchronized Body-Finger Motion and | audio | for Conversational Motion Analysis and Synthesis |
Taming Diffusion Models for | audio | -Driven Co-Speech Gesture Generation |
TCD-TIMIT: An | audio | -Visual Corpus of Continuous Speech |
Teaching Practices Analysis Through | audio | Signal Processing |
Teleimmersive | audio | -Visual Communication Using Commodity Hardware |
Temporal and Cross-modal Attention for | audio | -Visual Zero-Shot Learning |
Temporal Bayesian Fusion for Affect Sensing: Combining Video, | audio | , and Lexical Modalities |
Temporal Cue Guided Video Highlight Detection with Low-Rank | audio | -Visual Fusion |
Temporal Envelope Fit of Transient | audio | Signals |
Tests on MPEG-4 | audio | codec proposals |
Texturedness decision time for | audio | texturedness indicator |
theoretical analysis of a buffer frame size conversion algorithm for | audio | applications ensuring minimum latency, A |
Three-Dimensional Speaker Localization: | audio | -Refined Visual Scaling Factor Estimation |
Time-Delay Neural Networks for Estimating Lip Movements from Speech Analysis: A Useful Tool in | audio | Video Synchronization |
Time-frequency analysis for | audio | event detection in real scenarios |
Tolerance Evaluation of | audio | Watermarking Method Based on Modification of Sound Pressure Level between Channels |
TOOTEKO: A Case Study of Augmented Reality for an Accessible Cultural Heritage. Digitization, 3D Printing and Sensors for an | audio | -Tactile Experience |
Toward Automating Oral Presentation Scoring During Principal Certification Program Using | audio | -Video Low-Level Behavior Profiles |
Towards affect-aware vehicles for increasing safety and comfort: recognising driver emotions from | audio | recordings in a realistic driving study |
Towards an End-to-End Visual-to-Raw- | audio | Generation With GAN |
Towards | audio | -Visual On-line Diarization Of Participants In Group Meetings |
Towards | audio | -Visual Saliency Prediction for Omnidirectional Video with Spatial Audio |
Towards | audio | -Visual Saliency Prediction for Omnidirectional Video with Spatial Audio |
Towards event detection in an | audio | -based sensor network |
Towards Intercultural Affect Recognition: | audio | -Visual Affect Recognition in the Wild Across Six Cultures |
Tracking Multiple | audio | Sources With the von Mises Distribution and Variational EM |
Tracking the Active Speaker Based on a Joint | audio | -Visual Observation Model |
Transcribing broadcast news for | audio | and video indexing |
TUM Gait from | audio | , Image and Depth (GAID) database: Multimodal recognition of subjects and traits, The |
two level classifier process for | audio | segmentation, A |
Two-Level Bimodal Association for | audio | -Visual Speech Recognition |
Two-level Method for Unsupervised Speaker-based | audio | Segmentation, A |
UAVM: Towards Unifying | audio | and Visual Models |
Ultra wide band | audio | visual PHY IEEE 802.15.3c for SPIHT-compressed image transmission |
Unified | audio | -Visual Saliency Model for Omnidirectional Videos With Spatial Audio |
Unified | audio | -Visual Saliency Model for Omnidirectional Videos With Spatial Audio |
Unified Multisensory Perception: Weakly-supervised | audio | -visual Video Parsing |
Unifying Background Models over Complex | audio | using Entropy |
Unsupervised | audio | -Visual Lecture Segmentation |
Unsupervised Cross-Modal Deep-Model Adaptation for | audio | -Visual Re-identification with Wearable Cameras |
Unsupervised Sound Source Localization From | audio | -Image Pairs Using Input Gradient Map |
Unsupervised Synthetic Acoustic Image Generation for | audio | -Visual Scene Understanding |
use of | audio | -Visual Description Profile in 3D video content description, The |
Using | audio | -Derived Affective Offset to Enhance TV Recommendation |
Using background | audio | change detection for segmenting video |
Using mel-frequency | audio | features from footstep sound and spatial segmentation techniques to improve frame-based moving object detection |
Using the | audio | Respiration Signal for Multimodal Discrimination of Expressive Movement Qualities |
Using Three Reassigned Spectrogram Patches and Log-Gabor Filter for | audio | Surveillance Application |
utility of MPEG-7 systems in | audio | -visual applications with multiple streams, The |
Va2mass: Towards the Fluid Filling Mass Estimation via Integration of Vision and | audio | Learning |
VALID: A New Practical | audio | -Visual Database, and Comparative Results |
Variational Bayes Adapted GMM Based Models for | audio | Clip Classification |
Variational Bayesian Inference for | audio | -Visual Tracking of Multiple Speakers |
Very low bit-rate | audio | -visual applications |
Video Augmentation for Improving | audio | Speech Recognition under Noise |
Video clip recognition using joint | audio | -visual processing model |
Video concept detection by | audio | -visual grouplets |
Video Rewrite: Driving Visual Speech with | audio | |
Video Scene Segmentation Using Video and | audio | Features |
Video Segmentation with the Assistance of | audio | Content Analysis |
Video Skimming for Quick Browsing based on | audio | and Image Characterization |
Video Summarization using MPEG-7 Motion Activity and | audio | Descriptors |
Video tracking through occlusions by fast | audio | source localisation |
Video/ | audio | Quality in Computer Vision |
Violent Video Recognition Based on Global-Local Visual and | audio | Contrastive Learning |
Violin Timbre Navigator: Real-Time Visual Feedback of Violin Bowing Based on | audio | Analysis and Machine Learning |
Virtual | audio | system customization using visual matching of ear parameters |
Virtual Talk: A Model-Based Virtual Phone Using a Layered | audio | -Visual Integration |
Vision Transformers are Parameter-Efficient | audio | -Visual Learners |
Vision-Infused Deep | audio | Inpainting |
Visual Music Transcription of Clarinet Video Recordings Trained with | audio | -Based Labelled Data |
Visual Scene Graphs for | audio | Source Separation |
Visual Signal Reliability for Robust | audio | -Visual Speaker Identification, A |
Visually Guided | audio | Source Separation with Meta Consistency Learning |
Visually Informed Binaural | audio | Generation without Binaural Audios |
Visually Informed Binaural | audio | Generation without Binaural Audios |
Visually-Guided | audio | Spatialization in Video with Geometry-Aware Multi-task Learning |
VisualVoice: | audio | -Visual Speech Separation with Cross-Modal Consistency |
Voice Activity Detection Using Wavelet-Based Multiresolution Spectrum and Support Vector Machines and | audio | Mixing Algorithm |
VoViT: Low Latency Graph-Based | audio | -Visual Voice Separation Transformer |
Watch or Listen: Robust | audio | -Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring |
Watermarking for Digital | audio | Based on Adaptive Phase Modulation |
Waveprint: Efficient wavelet-based | audio | fingerprinting |
Weakly Supervised | audio | -Visual Violence Detection |
Weakly-Supervised Action Detection Guided by | audio | Narration |
Which are the factors affecting the performance of | audio | surveillance systems? |
Wnet: | audio | -Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks |
X2Face: A Network for Controlling Face Generation Using Images, | audio | , and Pose Codes |
You Said That?: Synthesising Talking Faces from | audio | |
YouTube Movie Reviews: Sentiment Analysis in an | audio | -Visual Context |
1012 for audio