| _ | listen | _ |
| Can Language Models Learn to | listen | ? |
| Don't just | listen | , use your imagination: Leveraging visual common sense for non-visual tasks |
| Knowing who to | listen | to: Prioritizing experts from a diverse ensemble for attribute personalization |
| Learning to | listen | : Modeling Non-Deterministic Dyadic Facial Motion |
| listen | and Look: Audio-Visual Matching Assisted Speech Source Separation |
| listen | Then See: Video Alignment with Speaker Attention |
| listen | to Look Into the Future: Audio-visual Egocentric Gaze Anticipation |
| listen | to Look: Action Recognition by Previewing Audio |
| listen | to the Image |
| listen | To the Pixels |
| listen | to Your Face: Inferring Facial Action Units from Audio Channel |
| listen | to your gradients: Integrating gradients into deep unfolding networks |
| listen | With Seeing: Cross-Modal Contrastive Learning for Audio-Visual Event Localization |
| listen | : a system for locating and tracking individual speakers |
| Look& | listen | : Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement |
| Look, | listen | and Learn |
| Look, | listen | , and Attack: Backdoor Attacks Against Video Action Recognition |
| Music Interfaces Based on Automatic Music Signal Analysis: New Ways to Create and | listen | to Music |
| Not only Look, But Also | listen | : Learning Multimodal Violence Detection Under Weak Supervision |
| Online Automatic Speech Recognition With | listen | , Attend and Spell Model |
| OWL (Observe, Watch, | listen | ): Audiovisual Temporal Context for Localizing Actions in Egocentric Videos |
| Reading to | listen | at the Cocktail Party: Multi-Modal Speech Separation |
| To | listen | or Not: Distributed Detection with Asynchronous Transmissions |
| Watch or | listen | : Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring |
| Watch to | listen | Clearly: Visual Speech Enhancement Driven Multi-modality Speech Recognition |
| Watch, | listen | and Tell: Multi-Modal Weakly Supervised Dense Event Captioning |
26 for listen
| _ | listener | _ |
| Building Autonomous Sensitive Artificial | listener | s |
| Come and have an emotional workout with sensitive artificial | listener | s! |
| D3Net: A Unified Speaker- | listener | Architecture for 3D Dense Captioning and Visual Grounding |
| Emotional | listener | Portrait: Realistic Listener Motion Simulation in Conversation |
| Emotional | listener | Portrait: Realistic Listener Motion Simulation in Conversation |
| High Signal-to-Noise Ratio MEMS Noise | listener | for Ship Noise Detection |
| Investigating the Impact of Sound Angular Position on the | listener | Affective State |
| Joint Speaker- | listener | -Reinforcer Model for Referring Expressions, A |
| Making Music More Accessible for Cochlear Implant | listener | s: Recent Developments |
| On the Interrelation Between | listener | Characteristics and the Perception of Emotions in Classical Orchestra Music |
| Personal Sound Zones: Delivering interface-free audio to multiple | listener | s |
| Referit3d: Neural | listener | s for Fine-grained 3d Object Identification in Real-world Scenes |
12 for listener
| _ | listening | _ |
| Assisted | listening | Using a Headset: Enhancing audio perception in real, augmented, and virtual environments |
| Audio-Visual System for Object-Based Audio: From Recording to | listening | , An |
| Automatic ECG-Based Emotion Recognition in Music | listening | |
| Clock Skew Estimation of | listening | Nodes with Clock Correction upon Every Synchronization in Wireless Sensor Networks |
| CustomListener: Text-Guided Responsive Interaction for User-Friendly | listening | Head Generation |
| Diffusion-based Realistic | listening | Head Generation via Hybrid Motion Modeling |
| Emotion classification during music | listening | from forehead biosignals |
| Emotion Recognition Based on Physiological Changes in Music | listening | |
| Free Viewpoint Image Generation Synchronized with Free | listening | -Point Audio for 3-D Real Space Navigation |
| Inner Voices: Reflexive Augmented | listening | |
| Inter-Trial Coherence Reveals Enhanced Synchrony During Mantra | listening | |
| Joint Clock Synchronization and Ranging: Asymmetrical Time-Stamping and Passive | listening | |
| listening | for Sirens: Locating and Classifying Acoustic Alarms in City Scenes |
| listening | for You: Enhancing Speech Image Retrieval via Target Speaker Extraction |
| listening | Human Behavior: 3D Human Pose Estimation with Acoustic Signals |
| listening | with Your Eyes: Towards a Practical Visual Speech Recognition System Using Deep Boltzmann Machines |
| listening | -oriented response generation by exploiting user responses |
| LLM-driven Multimodal and Multi-Identity | listening | Head Generation |
| Machine | listening | techniques as a complement to video image analysis in forensics |
| Modeling Sequential | listening | Behaviors With Attentive Temporal Point Process for Next and Next New Music Recommendation |
| Multichannel Signal Enhancement Algorithms for Assisted | listening | Devices: Exploiting spatial diversity using multiple microphones |
| Multisensory Music | listening | in Affective Virtual Environments |
| Objective Quality and Intelligibility Prediction for Users of Assistive | listening | Devices: Advantages and limitations of existing tools |
| On Optimal Linear Filtering of Speech for Near-End | listening | Enhancement |
| Quantitative Study of Music | listening | Behavior in a Social and Affective Context |
| Reproducing the Acoustic Velocity Vectors in a Spherical | listening | Region |
| Responsive | listening | Head Generation: A Benchmark Dataset and Baseline |
| Signal Processing Techniques for Assisted | listening | |
28 for listening