_ | speech | _ |
2.4kbps Multiband Characteristic Waveform Interpolation | speech | Coding Algorithm, A |
2.5D Visual | speech | Synthesis Using Appearance Models |
3-D Convolutional Recurrent Neural Networks With Attention Model for | speech | Emotion Recognition |
3D Visual passcode: | speech | -driven 3D facial dynamics for behaviometrics |
450bps | speech | Coding Algorithm Based on Multi-Mode Matrix Quantization, A |
Accuracy, Apps Advance | speech | Recognition |
Acoustic Analysis for Automatic | speech | Recognition |
Acoustic echo cancellation for stereophonic systems derived from pairwise panning of monophonic | speech | |
Acoustic Event Detection in | speech | Overlapping Scenarios Based on High-Resolution Spectral Input and Deep Learning |
Acoustically Emotion-Aware Conversational Agent With | speech | Emotion Recognition and Empathetic Responses, The |
Active Contour Model for | speech | Balloon Detection in Comics, An |
Adaptation of Hidden Markov Models for Recognizing | speech | of Reduced Frame Rate |
Adaptive Gain Control for Enhanced | speech | Intelligibility Under Reverberation |
adaptive model of person identification combining | speech | and image information, An |
Adaptive Signal Models for Wide-Band | speech | and Audio Compression |
Adaptive | speech | Dereverberation Using Constrained Sparse Multichannel Linear Prediction |
Adaptive | speech | enhancement with varying noise backgrounds |
Adaptive | speech | Intelligibility Enhancement for Far-and-Near-end Noise Environments Based on Self-attention StarGAN |
Adding Voicing Features into | speech | Recognition Based on HMM in Slovak |
Advanced tools for | speech | synchronized animation |
Adversarial Continual Learning to Transfer Self-Supervised | speech | Representations for Voice Pathology Detection |
Adversarial Feature Learning and Unsupervised Clustering Based | speech | Synthesis for Found Data With Acoustic and Textual Noise |
Adversarial Training Based | speech | Emotion Classifier With Isolated Gaussian Regularization, An |
Affective Audio Annotation of Public | speech | es with Convolutional Clustering Neural Network |
Affine-Invariant Visual Features Contain Supplementary Information to Enhance | speech | Recognition |
Aging | speech | recognition with speaker adaptation techniques: Study on medium vocabulary continuous Bengali speech |
AKVSR: Audio Knowledge Empowered Visual | speech | Recognition by Compressing Audio Knowledge of a Pretrained Model |
Algorithms for syllabic hypothesization in continuous | speech | |
Alias-and-Separate: Wideband | speech | Coding Using Sub-Nyquist Sampling and Speech Separation |
Amazigh audiovisual | speech | recognition system design |
Amazigh isolated word | speech | recognition system using the Adaptive Orthogonal Transform Method |
Analysing acoustic model changes for active learning in automatic | speech | recognition |
Analysis and Classification of Cold | speech | Using Variational Mode Decomposition |
Analysis of Emotion Annotation Strength Improves Generalization in | speech | Emotion Recognition Models |
Analysis of Lip Geometric Features for Audio-Visual | speech | Recognition |
Analysis of stressed human | speech | |
analysis of the effect of combining standard and alternate sensor signals on recognition of syllabic units for multimodal | speech | recognition, An |
Analysis of the Multifractal Nature of | speech | Signals |
Analysis of the Possibilities to Adapt the Foreign Language | speech | Recognition Engines for the Lithuanian Spoken Commands Recognition |
Analysis of the Utility of Classical and Novel | speech | Quality Measures for Speaker Verification |
Anchor Models for Emotion Recognition from | speech | |
Animating visible | speech | and facial expressions |
AnyoneNet: Synchronized | speech | and Talking Head Generation for Arbitrary Persons |
Application of Capsule Neural Network Based CNN for | speech | Emotion Recognition, The |
Application of digit and | speech | recognition in food delivery robot |
Application of support vector machines classifiers to visual | speech | recognition |
Application of triphone clustering in acoustic modeling for continuous | speech | recognition in Bengali |
Application of wavelet transforms for C/V segmentation on Mandarin | speech | signals |
ARawNet: A Lightweight Solution for Leveraging Raw Waveforms in Spoof | speech | Detection |
Architecture for Automatic Lipreading to Enhance | speech | Recognition, An |
Art Critic: Multisignal Vision and | speech | Interaction System in a Gaming Context |
Articulatory | speech | Re-synthesis: Profiting from Natural Acoustic Speech Data |
ASQ: An Ultra-Low Bit Rate ASR-Oriented | speech | Quantization Method |
Assessing speaker independence on a | speech | -based depression level estimation system |
Asymmetric 3D face model for | speech | Language Pathologist applications |
Asymmetrically boosted HMM for | speech | reading |
Attention Based Speaker-independent Audio-visual Deep Learning Model for | speech | Enhancement, An |
Attention-based convolutional neural network and long short-term memory for short-term detection of mood disorders based on elicited | speech | responses |
Attention-Based Dense LSTM for | speech | Emotion Recognition |
Audio Based Real-Time | speech | Animation of Embodied Conversational Agents |
Audio Classification in | speech | and Music: A Comparison Between a Statistical and a Neural Approach |
Audio Watermarks, | speech | Watermarks |
Audio-visual continuous | speech | recognition using MPEG-4 compliant visual features |
Audio-Visual Efficient Conformer for Robust | speech | Recognition |
Audio-Visual Person Authentication with Multiple Visualized- | speech | Features and Multiple Face Profiles |
Audio-Visual | speech | Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis |
Audio-Visual | speech | Fusion Using Coupled Hidden Markov Models |
Audio-Visual | speech | Recognition Based on AAM Parameter and Phoneme Analysis of Visual Feature |
Audio-Visual | speech | Recognition Scheme Based on Wavelets and Random Forests Classification |
Audio-visual | speech | recognition techniques in augmented reality environments |
Audio-Visual | speech | Recognition Using A Two-Step Feature Fusion Strategy |
Audio-Visual | speech | Recognition Using MPEG-4 Compliant Visual Features |
Audio-visual | speech | synchronization detection using a bimodal linear prediction model |
Audio-Visual | speech | Synthesis Based on Chinese Visual Triphone |
Audio2Gestures: Generating Diverse Gestures from | speech | Audio with Conditional Variational Autoencoders |
Audiovisual Discrimination Between | speech | and Laughter: Why and When Visual Information Might Help |
Audiovisual | speech | Source Separation: An overview of key methodologies |
Audiovisual Talking Head for Augmented | speech | Generation: Models and Animations Based on a Real Speaker's Articulatory Data, An |
Auditory Features Revisited for Robust | speech | Recognition |
Autoencoder-based Unsupervised Domain Adaptation for | speech | Emotion Recognition |
Automated Lip Synchronized | speech | Driven Facial Animation |
Automated | speech | alignment for image synthesis |
Automatic bi-modal emotion recognition system based on fusion of facial expressions and emotion extraction from | speech | |
Automatic continuous | speech | recogniser for Dravidian languages using the auto associative neural network |
Automatic Detection of Amyotrophic Lateral Sclerosis (ALS) from Video-Based Analysis of Facial Movements: | speech | and Non-Speech Tasks |
Automatic Evaluation of Hypernasality and Consonant Misarticulation in Cleft Palate | speech | |
Automatic Evaluation of | speech | Therapy Exercises Based on Image Data |
Automatic Person Verification Using | speech | and Face Information |
Automatic Selection of Visemes for Image-based Visual | speech | Synthesis |
Automatic Sentence Modality Recognition in Children's | speech | , and Its Usage Potential in the Speech Therapy |
Automatic speaker verification on narrowband and wideband lossy coded clean | speech | |
Automatic | speech | discrete labels to dimensional emotional values conversion method |
Automatic | speech | Emotion Recognition Using Auditory Models with Binary Decision Tree and SVM |
Automatic Urdu | speech | Recognition using Hidden Markov Model |
Automatic Video Annotation by Mining | speech | Transcripts |
Automatic visual | speech | segmentation and recognition using directional motion history images and Zernike moments |
AVFormer: Injecting Vision into Frozen | speech | Models for Zero-Shot AV-ASR |
Avoiding dominance of speaker features in | speech | -based depression detection |
AWLloss: Speaker Verification Based on the Quality and Difficulty of | speech | |
Bandwidth-adjusted LPC analysis for robust | speech | recognition |
Bayesian Predictive Method for Automatic | speech | Segmentation, A |
Bayesian reasoning on qualitative descriptions from images and | speech | |
Beam-search Formant Tracking Algorithm Based on Trajectory Functions for Continuous | speech | |
Beamforming Algorithm Based on Maximum Likelihood of a Complex Gaussian Distribution With Time-Varying Variances for Robust | speech | Recognition, A |
Behavioral Signal Processing: Deriving Human Behavioral Informatics From | speech | and Language |
Benchmarking classification models for emotion recognition in natural | speech | : A multi-corporal study |
Bilingual | speech | Recognition by Estimating Speaker Geometry from Video Data |
Bimodal fusion in audio-visual | speech | recognition |
Biological Motion of | speech | |
Blind Adaptive Mask to Improve Intelligibility of Non-Stationary Noisy | speech | |
Blind Source Separation Based Approach for | speech | Enhancement in Noisy and Reverberant Environment, A |
Boosted audio-visual HMM for | speech | reading |
Building Naturalistic Emotionally Balanced | speech | Corpus by Retrieving Emotional Speech from Existing Podcast Recordings |
cache-based natural language model for | speech | recognition, A |
Can we Automatically Transform | speech | Recorded on Common Consumer Devices in Real-World Environments into Professional Production Quality Speech?: A Dataset, Insights, and Challenges |
Can We Read | speech | Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition |
Cancellable | speech | template via random binary orthogonal matrices projection hashing |
Cascade Image Transform for Speaker Independent Automatic | speech | Reading, A |
Casual chatter or speaking up? Adjusting articulatory effort in generation of | speech | and animation for conversational characters |
Casual Conversations v2 Dataset: A diverse, large benchmark for measuring fairness and robustness in audio/vision/ | speech | models, The |
CAT-DUnet: Enhancing | speech | Dereverberation via Feature Fusion and Structural Similarity Loss |
CATNet: Cross-modal fusion for audio-visual | speech | recognition |
Chunk-Level | speech | Emotion Recognition: A General Framework of Sequence-to-One Dynamic Temporal Modeling |
CIF-Based | speech | Segmentation Method for Streaming E2E ASR, A |
Class Confusability Reduction in Audio-Visual | speech | Recognition Using Random Forests |
Classification of Complex Information: Inference of Co-Occurring Affective States from Their Expressions in | speech | |
Classifier-Based Learning of Nonlinear Feature Manifold for Visualization of Emotional | speech | Prosody |
clump splitting based method to localize | speech | balloons in comics, A |
Clustering Algorithm for the Fast Match of Acoustic Conditions in Continuous | speech | Recognition, A |
Co- | speech | Gesture Detection through Multi-Phase Sequence Labeling |
Co- | speech | Gesture Synthesis by Reinforcement Learning with Contrastive Pretrained Rewards |
CodeTalker: | speech | -Driven 3D Facial Animation with Discrete Motion Prior |
Combined Handwriting and | speech | Modalities for User Authentication |
Combining Deep and Unsupervised Features for Multilingual | speech | Emotion Recognition |
Combining handwriting and | speech | recognition for transcribing historical handwritten documents |
Combining | speech | and Handwriting Modalities for Mathematical Expression Recognition |
Combining | speech | energy and edge information for fast and efficient voice activity detection in noisy environments |
Communicative Rhythm in Gesture and | speech | |
Compact and Efficient Multitask Learning in Vision, Language and | speech | |
Compact Representation of Visual | speech | Data Using Latent Variables, A |
Comparative Experiments to Evaluate the Use of Syllables for the Improvement of Automatic Recognition of Dysarthric | speech | |
Comparing Multiple Classifiers for | speech | -Based Detection of Self-Confidence: A Pilot Study |
Comparison of Active Shape Model and Scale Decomposition Based Features for Visual | speech | Recognition, A |
Comparison of Image Transform-Based Features for Visual | speech | Recognition in Clean and Corrupted Videos |
Comparison of MPEG-4 Facial Animation Parameter Groups with Respect to Audio-Visual | speech | Recognition Performance |
Comparison of Phoneme and Viseme Based Acoustic Units for | speech | Driven Realistic lip Animation |
Complex Neural Spatial Filter: Enhancing Multi-Channel Target | speech | Separation in Complex Domain |
computationally compact divergence measure for | speech | processing, A |
Computer Assisted Transcription of | speech | |
Concatenated Frame Image Based CNN for Visual | speech | Recognition |
Conceptual and Lexical Factors in the Production of | speech | and Conversational Gestures: Neuropsychological Evidence |
Conditional Random Fields in | speech | , Audio, and Language Processing |
ConflictNET: End-to-End Learning for | speech | -Based Conflict Intensity Estimation |
Connecting Subspace Learning and Extreme Learning Machine in | speech | Emotion Recognition |
Constant-Q magnitude-phase coefficients extraction for synthetic | speech | detection |
Constrained MMSE LP Residual Estimator for | speech | Dereverberation in Noisy Environments, A |
Constructing | speech | processing systems on universal phonetic codes accompanied with reference acoustic models |
Contextual and Cross-Modal Interaction for Multi-Modal | speech | Emotion Recognition |
Contextual vector quantization for | speech | recognition with discrete hidden Markov model |
Continual Learning for Personalized Co- | speech | Gesture Generation |
Continuous Audio-Visual | speech | Recognition |
Continuous Automatic | speech | Recognition by Lipreading |
Continuous Estimation of Emotions in | speech | by Dynamic Cooperative Speaker Models |
Continuous | speech | coding using coiflets wavelet |
Controllable Multi-Lingual Multi-Speaker Multi-Style Text-to- | speech | Synthesis With Multivariate Information Minimization, A |
Conversational Evaluation of | speech | Bandwidth Extension Using a Mobile Handset |
Conversion of neutral | speech | to storytelling style speech |
Convolutional Network With Multi-Scale and Attention Mechanisms for End-to-End Single-Channel | speech | Enhancement, A |
Convolutional Neural Networks for Distant | speech | Recognition |
Correlation based | speech | -video synchronization |
coupled HMM approach to video-realistic | speech | animation, A |
Creating 3D | speech | -driven talking heads: a probabilistic network approach |
CrisisHateMM: Multimodal Analysis of Directed and Undirected Hate | speech | in Text-Embedded Images from Russia-Ukraine Conflict |
CroMM-VSR: Cross-Modal Memory Augmented Visual | speech | Recognition |
Cross-Corpus | speech | Emotion Recognition Based on Domain-Adaptive Least-Squares Regression |
Cross-Corpus | speech | Emotion Recognition Based on Few-Shot Learning and Domain Adaptation |
Cross-Modal Analysis of | speech | , Gestures, Gaze and Facial Expressions |
Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional | speech | Synthesis |
Cryptographic- | speech | -Key Generation Architecture Improvements |
Cued | speech | Gesture Recognition: A First Prototype Based on Early Reduction |
CWT-Based Approach for Epoch Extraction From Telephone Quality | speech | |
Cyclic Defense GAN Against | speech | Adversarial Attacks |
Cyclic Transfer Learning for Mandarin-English Code-Switching | speech | Recognition |
Czech Spontaneous | speech | Collection and Annotation: The Database of Technical Lectures |
Dar | speech | : An Automatic Speech Recognition System for the Moroccan Dialect |
Data-Driven Jacobian Adaptation in a Multi-model Structure for Noisy | speech | Recognition |
Dawn of the Transformer Era in | speech | Emotion Recognition: Closing the Valence Gap |
DBATES: Dataset for Discerning Benefits of Audio, Textual, and Facial Expression Features in Competitive Debate | speech | es |
DBN-based Spectral Feature Representation for Statistical Parametric | speech | Synthesis |
Decision Level Fusion for Audio-Visual | speech | Recognition in Noisy Conditions |
Deep Audio-Visual | speech | Recognition |
Deep Belief Networks for Real-Time Extraction of Tongue Contours from Ultrasound During | speech | |
Deep Cross-Modal Retrieval Between Spatial Image and Acoustic | speech | |
Deep Hybrid Approach for Hate | speech | Analysis, A |
Deep Learning for Acoustic Modeling in Parametric | speech | Generation: A systematic review of existing techniques and future trends |
Deep Learning for Emotional | speech | Recognition |
Deep Learning Loss Function Based on the Perceptual Evaluation of the | speech | Quality, A |
DeepComboSAD: Spectro-Temporal Correlation Based | speech | Activity Detection for Naturalistic Audio Streams |
Defining Laughter Context for Laughter Synthesis with Spontaneous | speech | Corpus |
DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel | speech | Enhancement |
Demonstration of an HMM-based photorealistic expressive audio-visual | speech | synthesis system |
Dense Convolutional Recurrent Neural Network for Generalized | speech | Animation |
Detecting Aggression in Voice Using Inverse Filtered | speech | Features |
Detecting Multiple Steganography Methods in | speech | Streams Using Multi-Encoder Network |
Detecting Parkinson's disease with sustained phonation and | speech | signals using machine learning techniques |
Detecting Unipolar and Bipolar Depressive Disorders from Elicited | speech | Responses Using Latent Affective Structure Model |
Detection of a Speaker in Video by Combined Analysis of | speech | Sound and Mouth Movement |
Detection of COVID-19 from | speech | signal using bio-inspired based cepstral features |
Detection of Dynamic Structures of | speech | Fundamental Frequency in Tonal Languages |
Detection of Vowel Offset Point From | speech | Signal |
Device and method for dubbing an audio-visual presentation which generates synthesized | speech | and corresponding facial movements |
Differentiable Mean Opinion Score Regularization for Perceptual | speech | Enhancement |
DiffMotion: | speech | -Driven Gesture Synthesis Using Denoising Diffusion Model |
DiffV2S: Diffusion-based Video-to- | speech | Synthesis with Vision-guided Speaker Embedding |
Diphone spanish text-to- | speech | synthesizer |
Direct Text to | speech | Translation System Using Acoustic Units |
Disambiguation in Unknown Object Detection by Integrating Image and | speech | Recognition Confidences |
Discriminating Unknown Objects from Known Objects Using Image and | speech | Information |
Discrimination Between Native and Non-Native | speech | Using Visual Features Only |
Discriminative Analysis of Lip Motion Features for Speaker Identification and | speech | -Reading |
Discriminative Capacity and Phonetic Information of Bottleneck Features in | speech | |
Discriminative feature extraction for | speech | recognition using continuous output codes |
Discriminative Frequency Information Learning for End-to-End | speech | Anti-Spoofing |
Discriminative Multi-Modality | speech | Recognition |
Discriminative Training of NMF Model Based on Class Probabilities for | speech | Enhancement |
Distilled non-semantic | speech | embeddings with binary neural networks for low-resource devices |
Distributed Audio Network for | speech | Enhancement in Challenging Noise Backgrounds |
Distributed Microphones | speech | Separation by Learning Spatial Information With Recurrent Neural Network |
Djinn: Interaction Framework for Home Environment Using | speech | and Vision |
DNN-Based Feature Enhancement Using DOA-Constrained ICA for Robust | speech | Recognition |
DNN-Based Feature Extraction for Conflict Intensity Estimation From | speech | |
Does Visual Self-Supervision Improve Learning of | speech | Representations for Emotion Recognition? |
DR2: Disentangled Recurrent Representation Learning for Data-efficient | speech | Video Synthesis |
Dynamic 3-D Visualization of Vocal Tract Shaping During | speech | |
Dynamic Bayesian Networks for Audio-Visual | speech | Recognition |
Dynamic versus Static Facial Expressions in the Presence of | speech | |
Dynamic-static Cross Attentional Feature Fusion Method for | speech | Emotion Recognition |
E2E-V2SResNet: Deep residual convolutional neural networks for end-to-end video driven | speech | synthesis |
Effect of Various Visual | speech | Units on Language Identification Using Visual Speech Recognition |
Effective online unsupervised adaptation of Gaussian mixture models and its application to | speech | classification |
Effective Style Token Weight Control Technique for End-to-End Emotional | speech | Synthesis, An |
Effectiveness of Mel Scale-Based ESA-IFCC Features for Classification of Natural vs. Spoofed | speech | |
Efficient Framework for Constructing | speech | Emotion Corpus Based on Integrated Active Learning Strategies, An |
Efficient Gaussian Mixture for | speech | Recognition |
Efficient Generation of | speech | Adversarial Examples with Generative Model |
Efficient HMM-Based Feature Enhancement Method With Filter Estimation for Reverberant | speech | Recognition, An |
Efficient One-Pass Decoding with NNLM for | speech | Recognition |
Efficient Representation Learning for Inner | speech | Domain Generalization |
Efficient Sparse Banded Acoustic Models for | speech | Recognition |
Efficient text analyser with prosody generator-driven approach for Mandarin text-to- | speech | |
Efficient use of the grammar scale factor to classify incorrect words in | speech | recognition verification |
Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-Resource | speech | Recognition |
EmoNet: A Transfer Learning Framework for Multi-Corpus | speech | Emotion Recognition |
EmoTalk: | speech | -Driven Emotional Disentanglement for 3D Face Animation |
Emotion Dependent Domain Adaptation for | speech | Driven Affective Facial Feature Synthesis |
Emotion recognition from | speech | signals via a probabilistic echo-state network |
Emotion Recognition of Affective | speech | Based on Multiple Classifiers Using Acoustic-Prosodic Information and Semantic Labels |
Emotional | speech | Analysis on Nonlinear Manifold |
Emotional | speech | Classification Based on Multi View Characterization |
Emotional | speech | Clustering Based Robust Speaker Recognition System |
Emotional | speech | Recognition Using Acoustic Models of Decomposed Component Words |
End-to-End Audiovisual | speech | Recognition System With Multitask Learning |
End-to-End Dual-Branch Network Towards Synthetic | speech | Detection |
End-to-End Pathological | speech | Detection Using Wavelet Scattering Network |
End-to-end Triplet Loss based Emotion Embedding System for | speech | Emotion Recognition |
End-to-End Video-to- | speech | Synthesis Using Generative Adversarial Networks |
End-to-end visual | speech | recognition for small-scale datasets |
Enhanced VQ-Based Algorithms for | speech | Independent Speaker Identification |
Enhancement of Spectral Tilt in Synthesized | speech | |
Enhancing Emotion Classification Through | speech | and Correlated Emotional Sounds via a Variational Auto-Encoder Model with Prosodic Regularization |
Enhancing Frequency Shifted | speech | Signals in Single Side-Band Communication |
EPG2S: | speech | Generation and Speech Enhancement Based on Electropalatography and Audio Signals Using Multimodal Learning |
Error Mitigation Technique for Erasure Channels Based on a Wavelet Representation of the | speech | Excitation Signal, An |
Error-Diffusion Based | speech | Feature Quantization for Small-Footprint Keyword Spotting |
ESAformer: Enhanced Self-Attention for Automatic | speech | Recognition |
Estimating | speech | Spectral Amplitude Based on the Nakagami Approximation |
Estimation of Rapidly Time-Varying Harmonic Noise for | speech | Enhancement |
Evaluation of Head Gaze Loosely Synchronized With Real-Time Synthetic | speech | for Social Robots |
Evaluation of | speech | Emotion Classification Based on GMM and Data Fusion |
Evaluation of the Concatenative Turkish Text-to- | speech | System |
Evaluation of Visual | speech | Features for the Tasks of Speech and Speaker Recognition, An |
experimental study of energy dips for | speech | and music, An |
Experimental Study on | speech | Enhancement Based on Deep Neural Networks, An |
Experimental Study on Transfer Learning in Denoising Autoencoders for | speech | Enhancement |
Experiments in dynamic programming inference of Markov networks with strings representing | speech | data |
Explainability of | speech | Recognition Transformers via Gradient-Based Attention Visualization |
Exploiting alternative acoustic sensors for improved noise robustness in | speech | communication |
Exploiting | speech | for Automatic TV Delinearization: From Streams to Cross-Media Semantic Navigation |
Exploiting | speech | /Gesture Co-occurrence for Improving Continuous Gesture Recognition in Weather Narration |
Exploring Co-Occurence Between | speech | and Body Movement for Audio-Guided Video Localization |
Exploring Hate | speech | Detection in Multimodal Publications |
Exploring | speech | Features for Classifying Emotions along Valence Dimension |
Exploring the Topics of Audio Words for Detecting Alzheimer's Disease From Spontaneous | speech | |
Exploring Zero-Shot Emotion Recognition in | speech | Using Semantic-Embedding Prototypes |
Expression-Preserving Face Frontalization Improves Visually Assisted | speech | Processing |
Expressive Facial Animation Synthesis by Learning | speech | Coarticulation and Expression Spaces |
Expressive Modulation of Neutral Visual | speech | |
Expressive | speech | -Driven Lip Movements with Multitask Learning |
Expressive visual text-to- | speech | as an assistive technology for individuals with autism spectrum conditions |
Expressive Visual Text-to- | speech | Using Active Appearance Models |
Extended Decision Tree with or Relationship for HMM-Based | speech | Synthesis |
Extension of proposal of standards for intelligibility tests of Chinese | speech | : CDRT-tone |
Extracting High Level Semantics by Means of | speech | , Audio, and Image Primitives in Surveillance Applications |
F0 Parameterization of Glottalized Tones in HMM-Based | speech | Synthesis for Hanoi Vietnamese |
FaceFormer: | speech | -Driven 3D Facial Animation with Transformers |
Facial 3D Shape Estimation from Images for Visual | speech | Animation |
Facial Expression Recognition in the Presence of | speech | Using Blind Lexical Compensation |
Factorized MVDR Deep Beamforming for Multi-Channel | speech | Enhancement |
Factors in Emotion Recognition With Deep Learning Models Using | speech | and Text on Multiple Corpora |
Far-Field Automatic | speech | Recognition |
Fast Object Class Labelling via | speech | |
Fast, Diverse and Accurate Image Captioning Guided by Part-Of- | speech | |
Feature Denoising Using Joint Sparse Representation for In-Car | speech | Recognition |
Feature optimisation for stress recognition in | speech | |
Feature Pooling of Modulation Spectrum Features for Improved | speech | Emotion Recognition in the Wild |
Feature Selection Based Transfer Subspace Learning for | speech | Emotion Recognition |
Feature selection methods for hidden Markov model-based | speech | recognition |
Feature space video stream consistency estimation for dynamic stream weighting in audio-visual | speech | recognition |
Features extraction and selection for emotional | speech | classification |
Few-Shot Learning in Emotion Recognition of Spontaneous | speech | Using a Siamese Neural Network With Adaptive Sample Pair Formation |
Finding Lips in Unconstrained Imagery for Improved Automatic | speech | Recognition |
Fine-Grained Action Retrieval Through Multiple Parts-of- | speech | Embeddings |
First degree heart block determination from | speech | analysis |
Frame-synchronous noise compensation for hands-free | speech | recognition in car environments |
From Bottom to Top: A Coordinated Feature Representation Method for | speech | Recognition |
From | speech | Quality Measures to Speaker Recognition Performance |
From Text to | speech | : A Multimodal Cross-Domain Approach for Deception Detection |
FSCNet: Feature-Specific Convolution Neural Network for Real-Time | speech | Enhancement |
FSER: Deep Convolutional Neural Networks for | speech | Emotion Recognition |
Fundamental Technologies in Modern | speech | Recognition |
Furcanext: End-to-end Monaural | speech | Separation with Dynamic Gated Dilated Temporal Convolutional Networks |
Fused | speech | Enhancement Framework for Robust Speaker Verification, A |
Fusing Audio and Visual Features of | speech | |
Fusion of Audio-Visual Information for Integrated | speech | Processing |
Fusion of Face and | speech | Data for Person Identity Verification |
Fusion of | speech | , Faces and Text for Person Identification in TV Broadcast |
Fuzzy integral based information fusion for classification of highly confusable non- | speech | sounds |
Fuzzy rule selection using Iterative Rule Learning for | speech | data classification |
GA Approaches to HMM Optimization for Automatic | speech | Recognition |
Gabor Filterbank Features for Robust | speech | Recognition |
Gammatone Cepstral Coefficients: Biologically Inspired Features for Non- | speech | Audio Classification |
GAN-in-GAN for Monaural | speech | Enhancement |
Gaussian Specific Compensation for Channel Distortion in | speech | Recognition |
Gender classification in two Emotional | speech | databases |
Generalized Two-Stage Rank Regression Framework for Depression Score Prediction from | speech | |
Generating Co- | speech | Gestures for the Humanoid Robot NAO through BML |
Generating Holistic 3D Human Motion from | speech | |
Generating Personalized Virtual Agent in | speech | Dialogue System for People with Dementia |
Generating realistic facial animation from | speech | |
Generating Transferable Adversarial Examples for | speech | Classification |
Genetic Algorithm-Based Adaptive Wiener Gain for | speech | Enhancement Using an Iterative Posterior NMF |
geostatistical model for linear prediction analysis of | speech | , A |
GesRec3D: A Real-Time Coded Gesture-to- | speech | System with Automatic Segmentation and Recognition Thresholding Using Dissimilarity Measures |
Gesture, | speech | , and Gaze Cues for Discourse Segmentation |
Gestures and Lip Shape Integration for Cued | speech | Recognition |
Global Variance in | speech | Synthesis With Linear Dynamical Models |
Graphical | speech | Training system for hearing impaired |
Group Delay based Methods for Detection and Recognition of Whispered | speech | |
GRU-SVM Model for Synthetic | speech | Detection |
Guest Editorial: Special Issue on Affective | speech | and Language Synthesis, Generation, and Conversion |
GUI for interactive | speech | synthesis |
Harmonic Enhancement with Noise Reduction of | speech | Signal by Comb Filtering |
Head Movements in Context of | speech | during Stress Induction |
Hidden Bawls, Whispers, and Yelps: Can Text Convey the Sound of | speech | , Beyond Words? |
Hidden Conditional Random Fields for Visual | speech | Recognition |
Hierarchical Bayesian combination of plug-in maximum a posteriori decoders in deep neural networks-based | speech | recognition and speaker adaptation |
hierarchical Bayesian model for continuous | speech | recognition, A |
Hierarchical | speech | -act classification for discourse analysis |
hierarchical tag-graph search scheme with layered grammar rules for spontaneous | speech | understanding, A |
High-frame-rate real-time imaging of | speech | production |
Higher Order Subspace Algorithm for Multichannel | speech | Enhancement, A |
Highly Transparent Steganography Scheme of | speech | Signals into Color Images Using Quantization Index Modulation |
Historical Perspective of | speech | Recognition, A |
HMM based | speech | -driven 3D tongue animation |
HNM-Based Speaker-Nonspecific Timbre Transformation Scheme for | speech | Synthesis, An |
Hough transform-based mouth localization for audio-visual | speech | recognition |
Human emotion recognition by optimally fusing facial expression and | speech | feature |
hybrid approach to improve part of | speech | tagging system, An |
Hybrid Autoregressive and Non-Autoregressive Transformer Models for | speech | Recognition |
Hybrid HMM-Based | speech | Recognizer Using Kernel-Based Discriminants as Acoustic Models, A |
Hybrid PNN-GMM classification scheme for | speech | emotion recognition, A |
hybrid SVM/DDBHMM decision fusion modeling for robust continuous digital | speech | recognition, A |
hybrid visual feature extraction method for audio-visual | speech | recognition, A |
IBM Rich Transcription 2007 | speech | -to-Text Systems for Lecture Meetings, The |
IDANet: An Information Distillation and Aggregation Network for | speech | Enhancement |
IEEE Acoustics, | speech | , and Signal Processing Magazine |
IEEE Trans. Acoustics, | speech | , and Signal Processing |
Image Caption Generation with Part of | speech | Guidance |
Image-Based Visual | speech | Animation System, An |
Image-Sensitive Language Modeling for Automatic | speech | Recognition |
Image- | speech | combination for interactive computer assisted transcription of handwritten documents |
Imitator: Personalized | speech | -driven 3D Facial Animation |
Impact of imperfect OCR on part-of- | speech | tagging |
Impact of OCR Errors on Automated Classification of OCR Japanese Texts with Parts-of- | speech | Analysis, An |
Impact of Reduced Video Quality on Visual | speech | Recognition, The |
Implantation of voicing on whispered | speech | using frequency-domain parametric modelling of source and filter information |
Implementation of Three Text to | speech | Systems for Kurdish Language |
Implicit Compositional Generative Network for Length-Variable Co- | speech | Gesture Synthesis |
Improve Word Mover's Distance with Part-of- | speech | Tagging |
improved maximum model distance approach for HMM-based | speech | recognition systems, An |
Improved | speech | Reconstruction from Silent Video |
Improvement of | speech | emotion recognition with neural network classifier by using speech spectrogram |
Improvements on Automatic | speech | Segmentation at the Phonetic Level |
Improving and Aligning | speech | with Presentation Slides |
Improving Children's | speech | Recognition by HMM Interpolation with an Adults' Speech Recognizer |
Improving Cross-Corpus | speech | Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG) |
Improving End-to-End Contextual | speech | Recognition via a Word-Matching Algorithm With Backward Search |
Improving Frame-Online Neural | speech | Enhancement With Overlapped-Frame Prediction |
Improving GANs for | speech | Enhancement |
Improving Mandarin End-to-End | speech | Recognition With Word N-Gram Language Model |
Improving Monaural | speech | Enhancement by Mapping to Fixed Simulation Space With Knowledge Distillation |
Improving Multimodal | speech | Recognition by Data Augmentation and Speech Representations |
Improving | speech | Related Facial Action Unit Recognition by Audiovisual Information Fusion |
Improving the Classification of Volcanic Seismic Events Extracting New Seismic and | speech | Features |
Improving the Performance of Deep Learning Based | speech | Enhancement System Using Fuzzy Restricted Boltzmann Machine |
Improving the | speech | Quality of VoIP by Packet Prioritization |
Increasing Compactness of Deep Learning Based | speech | Enhancement Models With Parameter Pruning and Quantization Techniques |
Incremental Text-to- | speech | Synthesis Using Pseudo Lookahead With Large Pretrained Language Model |
Individual 3d Face Synthesis Based on Orthogonal Photos and | speech | -driven Facial Animation |
Individualized Super-Gaussian Single Microphone | speech | Enhancement for Hearing Aid Users With Smartphone as an Assistive Device, An |
Inducing Genuine Emotions in Simulated | speech | -Based Human-Machine Interaction: The NIMITEK Corpus |
Influence of Hangover and Hangbefore Criteria on Automatic | speech | Recognition |
Influence of | speech | /Non-Speech Segmentation on On-Line and Off-Line Speaker Segmentation Accuracy, The |
Information Fusion and Person Verification Using | speech | and Face Information |
Information-Extraction Approach to | speech | Processing: Analysis, Detection, Verification, and Recognition, An |
Instrumental Assessment of Prosodic Quality for Text-to- | speech | Signals |
Integrated analysis of | speech | and images as a probabilistic decoding process |
Integrated Mining of Visual Features, | speech | Features, and Frequent Patterns for Semantic Video Annotation |
Integrated neural network model for identifying | speech | acts, predicators, and sentiments of dialogue utterances |
Integrating Binary Mask Estimation With MRF Priors of Cochleagram for | speech | Separation |
Integrating Part of | speech | Guidance for Image Captioning |
Integration of Vision and | speech | Understanding Using Bayesian Networks |
Intelligibility Enhancement Via Normal-to-Lombard | speech | Conversion With Long Short-Term Memory Network and Bayesian Gaussian Mixture Model |
Intelligibility improvements using binaural diverse sub-band processing applied to | speech | corrupted with automobile noise |
Intelligibility of Children with Cleft Lip and Palate: Evaluation by | speech | Recognition Techniques |
Inter-frame contextual modelling for visual | speech | recognition |
Interaction between | speech | and Gesture: Strategies for Pointing to Distant Objects |
Interaction framework for home environment using | speech | and vision |
Interaction of Iconic Gesture and | speech | in Talk, The |
Interaction With Gaze, Gesture, and | speech | in a Flexibly Configurable Augmented Reality System |
Interdependencies among Voice Source Parameters in Emotional | speech | |
Interference Reduction in Reverberant | speech | Separation With Visual Voice Activity Detection |
Intra-Predictive Switched Split Vector Quantization of | speech | Spectra |
Introduction to the Special Issue: Advances on pattern recognition for | speech | and audio processing |
Investigation into Audiovisual | speech | Correlation in Reverberant Noisy Environments, An |
Investigation of Partition-Based and Phonetically-Aware Acoustic Features for Continuous Emotion Prediction from | speech | , An |
Investigation of | speech | Landmark Patterns for Depression Detection |
Invited paper: Automatic | speech | recognition: History, methods and challenges |
ISL RT-07 | speech | -to-Text System, The |
Isolate | speech | Recognition Based on Time-Frequency Analysis Methods |
Isolated word recognition by neural network models with cross-correlation coefficients for | speech | dynamics |
Iterative Closed-Loop Phase-Aware Single-Channel | speech | Enhancement |
Iterative Feature Normalization Scheme for Automatic Emotion Detection from | speech | |
Joint Bayesian Estimation of Time-Varying LP Parameters and Excitation for | speech | |
KAN-AV dataset for audio-visual face and | speech | analysis in the wild |
Kernel Eigenvoices (Revisited) for Large-Vocabulary | speech | Recognition |
Key Frame Mechanism for Efficient Conformer Based End-to-End | speech | Recognition |
Keyword Detection for Spontaneous | speech | |
Kinect Development Kit: A Toolkit for Gesture- and | speech | -Based Human-Machine Interaction |
Language-Independent OCR Using a Continuous | speech | Recognition System |
Large Vocabulary Audio-visual | speech | Recognition Using Active Shape Models |
Large Vocabulary Audio-Visual | speech | Recognition Using the Janus Speech Recognition Toolkit |
Large Vocabulary Continuous | speech | Recognition With Reservoir-Based Acoustic Models |
Large-Vocabulary Continuous | speech | Recognition Systems: A Look at Some Recent Advances |
Late pre-dereverberation for | speech | intelligibility enhancement in public address systems |
Latency in | speech | Feature Analysis for Telepresence Event Coding |
Learning Contextually Fused Audio-Visual Representations for Audio-Visual | speech | Recognition |
Learning Continuous Facial Actions From | speech | for Real-Time Animation |
Learning Hierarchical Cross-Modal Association for Co- | speech | Gesture Generation |
Learning Individual Speaking Styles for Accurate Lip to | speech | Synthesis |
Learning Landmarks Motion from | speech | for Speaker-agnostic 3d Talking Heads Generation |
Learning Salient Features for | speech | Emotion Recognition Using Convolutional Neural Networks |
Learning Speaker-specific Lip-to- | speech | Generation |
Learning Torso Prior for Co- | speech | Gesture Generation with Better Hand Shape |
Learning Visual | speech | |
Learning With Learned Loss Function: | speech | Enhancement With Quality-Net to Improve Perceptual Evaluation of Speech Quality |
Letter-To-Sound conversion for | speech | synthesizer |
Leveraging Non-Causal Knowledge via Cross-Network Knowledge Distillation for Real-Time | speech | Enhancement |
LFEformer: Local Feature Enhancement Using Sliding Window With Deformability for Automatic | speech | Recognition |
Linked Source and Target Domain Subspace Feature Transfer Learning -- Exemplified by | speech | Emotion Recognition |
Lip Movement Synthesis from | speech | Based on Hidden Markov Models |
Lip Reading for Low-resource Languages by Learning and Combining General | speech | Knowledge and Language-specific Knowledge |
Lip Shape and Hand Position Fusion for Automatic Vowel Recognition in Cued | speech | for French |
Lip2Vec: Efficient and Robust Visual | speech | Recognition via Latent-to-Latent Visual to Audio Representation Mapping |
Listen and Look: Audio-Visual Matching Assisted | speech | Source Separation |
Listening with Your Eyes: Towards a Practical Visual | speech | Recognition System Using Deep Boltzmann Machines |
Lite-RTSE: Exploring a Cost-Effective Lite DNN Model for Real-Time | speech | Enhancement in RTC Scenarios |
LivelySpeaker: Towards Semantic-Aware Co- | speech | Gesture Generation |
LM-VC: Zero-Shot Voice Conversion via | speech | Generation Based on Language Models |
Localizing Fake Segments in | speech | |
Locally Normalized Filter Banks Applied to Deep Neural-Network-Based Robust | speech | Recognition |
Locating and Tracking Facial | speech | Features |
Long-Frame-Shift Neural | speech | Phase Prediction With Spectral Continuity Enhancement and Interpolation Error Compensation |
Look&listen: Multi-Modal Correlation Learning for Active Speaker Detection and | speech | Enhancement |
Looking into Your | speech | : Learning Cross-modal Affinity for Audio-visual Speech Separation |
Low-Complexity Parabolic Lip Contour Model With Speaker Normalization for High-Level Feature Extraction in Noise-Robust Audiovisual | speech | Recognition, A |
Low-Rank and Sparsity Analysis Applied to | speech | Enhancement Via Online Estimated Dictionary |
Low-Resource Adaptation for Personalized Co- | speech | Gesture Generation |
M3TTS: Multi-modal text-to- | speech | of multi-scale style control for dubbing |
Mandarin Emotional | speech | Recognition Based on SVM and NN |
Mandarin Text-to- | speech | Front-End With Lightweight Distilled Convolution Network |
Marathi Language | speech | Synthesizer Using Concatenative Synthesis Strategy (Spoken in Maharashtra, India) |
Markov random field model for automatic | speech | recognition, A |
Mathematical Modeling of the Effects of | speech | Warning Characteristics on Human Performance and Its Application in Transportation Cyberphysical Systems |
maximum model distance approach for HMM-based | speech | recognition, A |
Maximum Phase Modeling for Sparse Linear Prediction of | speech | |
Memory Attention: Robust Alignment Using Gating Mechanism for End-to-End | speech | Synthesis |
MES-P: An Emotional Tonal | speech | Dataset in Mandarin with Distal and Proximal Labels |
MeshTalk: 3D Face Animation from | speech | using Cross-Modality Disentanglement |
Method and apparatus for producing audio-visual synthetic | speech | |
Method and apparatus for synthetic | speech | in facial animation |
Methodology for Acoustic Characterization of a Labial Constraint in | speech | Production |
Methods and devices for producing and using synthetic visual | speech | based on natural coarticulation |
Micro-Doppler Classification for Ground Surveillance Radar Using | speech | Recognition Tools |
Microphone Array Processing Strategies for Distant-Based Automatic | speech | Recognition |
Minimized Database of Unit Selection in Visual | speech | Synthesis without Loss of Naturalness |
MixCycle: Unsupervised | speech | Separation via Cyclic Mixture Permutation Invariant Training |
Mixed bayesian networks with auxiliary variables for automatic | speech | recognition |
Mix | speech | : Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition |
Mixture of Factor Analyzers Using Priors From Non-Parallel | speech | for Voice Conversion |
Mixture of Support Vector Machines for HMM based | speech | Recognition |
Mixtures of Local Dictionaries for Unsupervised | speech | Enhancement |
Model-Based Localization Method by Non- | speech | Sound Via Wavelet Transform and Dynamic Neural Network |
Modeling and Synthesis of Facial Motion Driven by | speech | |
Modeling Feature Representations for Affective | speech | Using Generative Adversarial Networks |
Modeling human activities as | speech | |
Modeling of Physical Characteristics of | speech | under Stress |
Modeling Syllable-Based Pronunciation Variation for Accented Mandarin | speech | Recognition |
Modeling the Temporal Evolution of Acoustic Parameters for | speech | Emotion Recognition |
Modeling Vocal Entrainment in Conversational | speech | Using Deep Unsupervised Learning |
Modelling and combining emotions, visual | speech | and gestures in virtual head models |
Modelling Combined Handwriting and | speech | Modalities |
Models for the Perception of | speech | and Visual Form |
Mono-font Cursive Arabic Text Recognition Using | speech | Recognition System |
More than Words: In-the-Wild Visually-Driven Prosody for Text-to- | speech | |
Moroccan Dialect | speech | Recognition System Based on CMU SphinxTools |
Morpheme-Based Automatic | speech | Recognition of Basque |
Morphological normalization of vowel images for articulatory | speech | recognition |
Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal | speech | Corpus |
Multi-environment model adaptation based on vector Taylor series for robust | speech | recognition |
Multi-Font Off-Line Arabic Character Recognition Using the BBN Byblos | speech | Recognition System |
Multi-Label | speech | Emotion Recognition via Inter-Class Difference Loss Under Response Residual Network |
Multi-layer encoder-decoder time-domain single channel | speech | separation |
Multi-lingual and Multi-modal | speech | Processing and Applications |
Multi-Modal Human Verification Using Face and | speech | |
Multi-modal information retrieval from broadcast video using OCR and | speech | recognition |
Multi-modality Associative Bridging through Memory: | speech | Sound Recollected from Face Video |
Multi-task multimodal feature refinement for emotional | speech | animation |
Multi-Task Semi-Supervised Adversarial Autoencoding for | speech | Emotion Recognition |
Multi-view visual | speech | recognition based on multi task learning |
Multichannel filters for | speech | recognition using a particle swarm optimization |
Multilevel Integration of Vision and | speech | Understanding Using Bayesian Networks |
Multimedia Document Retrieval Using | speech | and Speaker Recognition |
Multimodal biometric authentication using | speech | and hand geometry fusion |
Multimodal Biometric System Using Fingerprint, Face and | speech | , A |
Multimodal Database of Emotional | speech | , Video and Gestures |
Multimodal Emotion Recognition Based on | speech | and Physiological Signals Using Deep Neural Networks |
Multimodal Interface Framework for Using Hand Gestures and | speech | in Virtual Environment Applications, A |
Multimodal person authentication using | speech | , face and visual speech |
Multiple classifier applied on predicting microsleep from | speech | |
Multiple statistical models for soft decision in noisy | speech | enhancement |
Multistream Articulatory Feature-Based Models for Visual | speech | Recognition |
Multistream Recognition of | speech | : Dealing With Unknown Unknowns |
Multitapering and a wavelet variant of MFCC in | speech | recognition |
Multitask Learning From Augmented Auxiliary Data for Improving | speech | Emotion Recognition |
Multivariate Autoregressive Spectrogram Modeling for Noisy | speech | Recognition |
Mutual Alignment between Audiovisual Features for End-to-End Audiovisual | speech | Recognition |
Mutual-optimization Towards Generative Adversarial Networks For Robust | speech | Recognition |
Nested U-Net With Self-Attention and Dense Connectivity for Monaural | speech | Enhancement, A |
Neural Emotion Director: | speech | -preserving semantic control of facial expressions in in-the-wild videos |
Neural network-based adaptive noise cancellation for enhancement of | speech | auditory brainstem responses |
Neurally Optimized Decoder for Low Bitrate | speech | Codec |
New Approach to Fourier Synthesis With Application to Neural Encoding and | speech | Classification, A |
New Approach to Integrate Audio and Visual Features of | speech | , A |
new approach to | speech | -input statistical translation, A |
New Encoding Algorithm for Distributed | speech | Recognition Based on DTFS Transform |
New feature weighting approaches for | speech | -act classification |
New Insights into the Kalman Filter Beamformer: Applications to | speech | and Robustness |
New Manifold Representation for Visual | speech | Recognition, A |
New Parameter of | speech | Character Based on the Bloomfield's Model, A |
New single-ended objective measure for non-intrusive | speech | quality evaluation |
New Visual | speech | Recognition Approach for RGB-D Cameras, A |
NMF-Based | speech | Enhancement Using Bases Update |
Noise Adaptive Stream Weighting in Audio-Visual | speech | Recognition |
Noise compensation in a person verification system using face and multiple | speech | features |
Noise Robust Front-end for | speech | Recognition Using Hough Transform and Cumulative Distribution Mapping, A |
Noise-Adaptive LDA: A New Approach for | speech | Recognition Under Observation Uncertainty |
Noise-Separated Adaptive Feature Distillation for Robust | speech | Recognition |
Non-Autoregressive Transformer for | speech | Recognition |
Non-Contact | speech | Recovery Technology Using a 24 GHz Portable Auditory Radar and Webcam |
Non-Intrusive Binaural | speech | Intelligibility Prediction From Discrete Latent Representations |
Non-intrusive | speech | -quality assessment using vocal-tract models |
Nonlinear Manifold Learning for Visual | speech | Recognition |
Normalized Training for HMM-based Visual | speech | Recognition |
Novel Approach to Very Fast and Noise Robust, Isolated Word | speech | Recognition, A |
Novel Data Independent Approach for Conversion of Hand Punched Kannada Braille Script to Text and | speech | , A |
Novel | speech | Emotion Recognition Method via Incomplete Sparse Least Square Regression, A |
Novel Statistical Model for | speech | Recognition and POS Tagging, A |
Novel Visual | speech | Representation and HMM Classification for Visual Speech Recognition, A |
Objective Estimation of | speech | Quality for Communication Systems |
Obtaining | speech | assets for judgement analysis on low-pass filtered emotional speech |
On Emotions as Features for | speech | Overlaps Classification |
On Factoring Out a Gesture Typology from the Bielefeld | speech | -and-Gesture-Alignment Corpus (SAGA) |
On Homotopy Continuation for | speech | Restoration |
On Optimal Linear Filtering of | speech | for Near-End Listening Enhancement |
On the Audio-visual Synchronization for Lip-to- | speech | Synthesis |
On the Compensation Between Magnitude and Phase in | speech | Separation |
On the Estimation of Fundamental Frequency From Nonstationary Noisy | speech | Signals Based on the Hilbert-Huang Transform |
On the Processing of Fuzzy Patterns for Text Independent Phonetic | speech | Segmentation |
On the Relationship between Face Movements, Tongue Movements, and | speech | Acoustics |
On the Robustness of Parametric Watermarking of | speech | |
On the Use of Computer Vision Techniques for Automatic | speech | Recognition |
On the use of different | speech | representations for speaker modeling |
On the Use of Time-Domain Widely Linear Filtering for Binaural | speech | Enhancement |
On Training | speech | Separation Models With Various Numbers of Speakers |
On-Line | speech | /Music Segmentation for Broadcast News Domain |
One-Pulse FEC Coding for Robust CELP-Coded | speech | Transmission Over Erasure Channels |
Online Animation System For Practicing Cued | speech | |
Online Automatic | speech | Recognition With Listen, Attend and Spell Model |
Online | speech | Dereverberation Using Mixture of Multichannel Linear Prediction Models |
Optimal residual frame based source modeling for HMM-based | speech | synthesis |
Optimized discriminative transformations for | speech | features based on minimum classification error |
Optimizing | speech | Intelligibility in a Noisy Environment: A unified view |
Other Related Papers, Audio, | speech | , Signal Processing, Pattern Recognition |
Over-Sampling Emotional | speech | Data Based on Subjective Evaluations Provided by Multiple Individuals |
Overview of compression and packet loss effects in | speech | biometrics |
Panel Tracking for the Extraction and the Classification of | speech | Balloons |
Parallel implementation of Artificial Neural Network training for | speech | recognition |
Parametric Representation of the Speaker's Lips for Multimodal Sign Language And | speech | Recognition |
Part-of- | speech | Tagging Based on Machine Translation Techniques |
Part-of- | speech | Tagging for Table of Contents Recognition |
Partial linear regression for | speech | -driven talking head application |
Particle filtering based pitch sequence correction for monaural | speech | segregation |
Patient-Provider Communication Training Models for Interactive | speech | Devices |
Perceptual Evaluation of Video-Realistic | speech | |
Perceptual Properties of Current | speech | Recognition Technology |
PFRNet: Dual-Branch Progressive Fusion Rectification Network for Monaural | speech | Enhancement |
Phase Estimation in Single Channel | speech | Enhancement Using Phase Decomposition |
Phase Processing for Single-Channel | speech | Enhancement: History and recent advances |
Phase-Sensitive Joint Learning Algorithms for Deep Learning-Based | speech | Enhancement |
phone-viseme dynamic Bayesian network for audio-visual automatic | speech | recognition, A |
Phoneme segmentation of | speech | |
Photorealistic adaptation and interpolation of facial expressions using HMMS and AAMS for audio-visual | speech | synthesis |
pilot study on augmented | speech | communication based on Electro-Magnetic Articulography, A |
Pipelined Recurrent Fuzzy Neural Networks for Nonlinear Adaptive | speech | Prediction |
Pitch Delay Based Adaptive Steganography for AMR | speech | Stream |
Pitch Detection Algorithms and Voiced/Unvoiced Classification for Noisy | speech | |
Pitch-Normalized Acoustic Features for Robust Children's | speech | Recognition |
Place Theory as an Alternative Solution in Automatic | speech | Recognition Tasks, The |
Polish Emotional | speech | Database: Recording and Preliminary Validation |
Power Exponent Based Weighting Criterion for DNN-Based Mask Approximation in | speech | Enhancement |
Practical Considerations for Real-Time Implementation of | speech | -Based Gender Detection |
Prediction-based classification for audiovisual discrimination between laughter and | speech | |
Principal Component Analysis of | speech | Spectrogram Images |
Probabilistic Class Histogram Equalization Based on Posterior Mean Estimation for Robust | speech | Recognition |
Probabilistic Kernels for Improved Text-to- | speech | Alignment in Long Audio Tracks |
QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural | speech | -Driven Gesture Generation |
Quality-Aware Bag of Modulation Spectrum Features for Robust | speech | Emotion Recognition |
Quantifying Emotional Similarity in | speech | |
Quantitative Analysis of the Relative Local | speech | Rate |
Query expansion for imperfect | speech | : applications in distributed learning |
R-CNN Based Method to Localize | speech | Balloons in Comics, An |
R-Letter disorder diagnosis (R-LDD): Arabic | speech | database development for automatic diagnosis of childhood speech disorders (Case study) |
Rate-Invariant Analysis of Trajectories on Riemannian Manifolds with Application in Visual | speech | Recognition |
Rate-invariant comparisons of covariance paths for visual | speech | recognition |
Re-Synchronization Using the Hand Preceding Model for Multi-Modal Fusion in Automatic Continuous Cued | speech | Recognition |
Reading to Listen at the Cocktail Party: Multi-Modal | speech | Separation |
Real-Time Lip Tracking for Audio-Visual | speech | Recognition Applications |
Real-Time Recognition of Affective States from Nonverbal Features of | speech | and Its Application for Public Speaking Skill Analysis |
Real-Time Scene Text to | speech | System, A |
Real-time sign language recognition and | speech | conversion using VGG16 |
Real-time | speech | -driven 3D face animation |
Real-Time Vision and | speech | Driven Avatars for Multimedia Applications |
Realistic Face Animation for Audiovisual | speech | Applications: A Densification Approach Driven by Sparse Stereo Meshes |
Realistic | speech | animation based on observed 3D face dynamics |
Realistic | speech | -Driven Facial Animation with GANs |
Recent advances in the automatic recognition of audiovisual | speech | |
Recognition of gestures in the context of | speech | |
Recognition of phonetic labels of the TIMIT | speech | corpus by means of an artificial neural network |
Recognition of visual | speech | elements using adaptively boosted hidden Markov models |
Recognizing Stress Using Semantics and Modulation of | speech | and Gestures |
Reconstructing | speech | From CNN Embeddings |
Reconstruction of Dysphonic | speech | by MELP |
Reconstruction-Based Visual-Acoustic-Semantic Embedding Method for | speech | -Image Retrieval, A |
Recurrent Neural Network Based Small-footprint Wake-up-word | speech | Recognition System with a Score Calibration Method |
Recurrent neural network | speech | predictor based on dynamical systems approach |
Reduced Universal Background Model for | speech | Recognition and Identification System |
Reduction of musical residual noise using perceptual tools with classic | speech | denoising techniques |
Regression based landmark estimation and multi-feature fusion for visual | speech | recognition |
Regularized Subspace Gaussian Mixture Models for | speech | Recognition |
reliable multidomain model for | speech | act classification, A |
Representation of | speech | in Deep Neural Networks, The |
Rescoring of N-Best Hypotheses Using Top-Down Selective Attention for Automatic | speech | Recognition |
Research and Design of Smart Home | speech | Recognition System Based on Deep Learning |
Research of Chain Model Based on CNN-TDNNF in Yulin Dialect | speech | Recognition, The |
Research of STRAIGHT Spectrogram and Difference Subspace Algorithm for | speech | Recognition |
Research on HMM_based | speech | synthesis for Lhasa dialect |
Research Progress in | speech | Enhancement Technology |
Researchers Push | speech | Recognition Toward the Mainstream |
Residual Excitation Skewness for Automatic | speech | Polarity Detection |
Resolution limits on visual | speech | recognition |
Restoration of Bone-Conducted | speech | With U-Net-Like Model and Energy Distance Loss |
Rethinking Algorithm Design and Development in | speech | Processing |
Reversible Audio Data Hiding Based on Variable Error-Expansion of Linear Prediction for Segmental Audio and G.711 | speech | |
review of recent advances in visual | speech | decoding, A |
ReVISE: Self-Supervised | speech | Resynthesis with Visual Input for Universal and Generalized Speech Regeneration |
RNN-Based | speech | -Music Discrimination Used for Hybrid Audio Coder, An |
Robot Command Interface Using an Audio-Visual | speech | Recognition System |
Robust and Fast Localization of Single | speech | Source Using a Planar Array |
Robust Arabic Multi-stream | speech | Recognition System in Noisy Environment |
Robust Audio-Visual Mandarin | speech | Recognition Based On Adaptive Decision Fusion And Tone Features |
Robust Audio-Visual | speech | Recognition Based on Hybrid Fusion |
Robust Audio-Visual | speech | Recognition Based on Late Integration |
Robust Audio-Visual | speech | Recognition Under Noisy Audio-Video Conditions |
Robust Automatic | speech | Recognition Using PD-MEEMLIN |
Robust Biometric Person Identification Using Automatic Classifier Fusion of | speech | , Mouth, and Face Experts |
Robust Face Frontalization For Visual | speech | Recognition* |
robust method for the Vietnamese handwritten and | speech | recognition, A |
Robust Parallel | speech | Recognition in Multiple Energy Bands |
Robust Pitch Extraction Method for the HMM-Based | speech | Synthesis System |
Robust Sensor Fusion: Analysis and Application to Audio-Visual | speech | Recognition |
Robust Speaker Verification via Asynchronous Fusion of | speech | and Lip Information |
Robust | speech | recognition using spatial-temporal feature distribution characteristics |
Robust telephone | speech | recognition based on channel compensation |
robust unsupervised pattern discovery and clustering of | speech | signals, A |
Robustness of linear discriminant analysis in automatic | speech | recognition |
Role of Long-Term Dependency in Synthetic | speech | Detection, The |
Role of Synthetically Generated Samples on | speech | Recognition in a Resource-Scarce Language |
Role of Vocal Persona in Natural and Synthesized | speech | , The |
RSD-GAN: Regularized Sobolev Defense GAN Against | speech | -to-Text Adversarial Attacks |
Salient Feature Extraction Algorithm for | speech | Emotion Recognition, A |
Say it to see it: A | speech | based immersive model retrieval system |
SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker Text-to- | speech | Systems |
Searching through a | speech | Memory for Text-Independent Speaker Verification |
Secure | speech | biometric templates for user authentication |
SEEG: Semantic Energized Co- | speech | Gesture Generation |
Selection of Unknown Objects Specified by | speech | Using Models Constructed from Web Images |
Selective HuBERT: Self-Supervised Pre-Training for Target Speaker in Clean and Mixture | speech | |
Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language | speech | Emotion Recognition |
Semi-blind | speech | -Music Separation Using Sparsity and Continuity Priors |
Semi-supervised | speech | -driven 3D Facial Animation via Cross-modal Encoding |
Sentence boundary detection in conversational | speech | transcripts using noisily labeled examples |
Separation of Audio-Visual | speech | Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli |
Session compensation using binary | speech | representation for speaker recognition |
SFNet: A Computationally Efficient Source Filter Model Based Neural | speech | Synthesis |
Signal subspace approach for narrowband noise reduction in | speech | |
Signal-Aware Parametric Quality Model for Audio and | speech | over IP Networks |
Signal-to-Signal Ratio Independent Speaker Identification for Co-channel | speech | Signals |
Significance of Empty | speech | Pauses: Cognitive and Algorithmic Issues, The |
Significance of Pitch-Based Spectral Normalization for Children's | speech | Recognition |
Simple Model of | speech | Communication and its Application to Intelligibility Enhancement, A |
Single Channel | speech | Separation Using Source-Filter Representation |
Single-Channel | speech | Separation Focusing on Attention DE |
Single-Input/Binaural-Output Antiphasic | speech | Enhancement Method for Speech Intelligibility Improvement, A |
SNAC: Speaker-Normalized Affine Coupling Layer in Flow-Based Architecture for Zero-Shot Multi-Speaker Text-to- | speech | |
So-DAS: A Two-Step Soft-Direction-Aware | speech | Separation Framework |
Some recent advances in | speech | recognition with potential applications in other statistical pattern recognition areas |
Some relations among stochastic finite state networks used in automatic | speech | recognition |
Something to Talk About: Signal Processing in | speech | and Audiology Research: Promising Investigations Explore New Opportunities in Human Communication |
source and channel coding approach to data hiding with application to hiding | speech | in video, A |
SPACE: | speech | -driven Portrait Animation with Controllable Expression |
Sparse Kernel Reduced-Rank Regression for Bimodal Emotion Recognition From Facial Expression and | speech | |
Speaker Attractor Network: Generalizing | speech | Separation to Unseen Numbers of Sources |
Speaker Extraction With Co- | speech | Gestures Cue |
Speaker identification security improvement by means of | speech | watermarking |
Speaker Independent Audio-Visual | speech | Recognition |
Speaker Modeling with Various | speech | Representations |
Speaker-aware Multi-Task Learning for automatic | speech | recognition |
Speaker-aware | speech | Emotion Recognition by Fusing Amplitude and Phase Information |
Speaker-Independent | speech | Animation Using Perceptual Loss Functions and Synthetic Data |
Speaker-independent | speech | Recognition by Means of Functional-link Neural Networks |
Spectral Domain | speech | Enhancement Using HMM State-Dependent Super-Gaussian Priors |
Spectral domain texture analysis for | speech | enhancement |
Spectral Features Based on Local Hu Moments of Gabor Spectrograms for | speech | Emotion Recognition |
Spectral Flatness Analysis for Emotional | speech | Synthesis and Transformation |
Spectral Tilt Estimation for | speech | Intelligibility Enhancement Using RNN Based on All-Pole Model |
SPECTRE: Visual | speech | -Informed Perceptual 3D Facial Expression Reconstruction from Videos |
Spectro-Temporal Filtering for Multichannel | speech | Enhancement in Short-Time Fourier Transform Domain |
| speech | Activity Detection in Naturalistic Audio Environments: Fearless Steps Apollo Corpus |
| speech | Analysis, other than Recognition |
| speech | Animation Using Coupled Hidden Markov Models |
| speech | Authentication and Recovery Scheme in Encrypted Domain |
| speech | authentication system using digital watermarking and pattern recovery |
| speech | Balloons in Comics, Comic Analysis, Panel Detection |
| speech | balloon and speaker association for comics and manga understanding |
| speech | Bandwidth Extension Using Recurrent Temporal Restricted Boltzmann Machines |
| speech | Based Approach to Surveillance Video Retrieval, A |
| speech | Based Shopping Assistance for the Blind |
| speech | Content Retrieval Model Based on Integrated Neural Network for Natural Language Description, A |
| speech | Denoising and Compensation for Hearing Aids Using an FTCRN-Based Metric GAN |
| speech | driven facial animation using a hidden markov coarticulation model |
| speech | driven lip synthesis using viseme based hidden markov models |
| speech | Driven Talking Face Generation From a Single Image and an Emotion Condition |
| speech | Driven Tongue Animation |
| speech | driven video editing via an audio-conditioned diffusion model |
| speech | Drives Templates: Co-Speech Gesture Synthesis with Learned Templates |
| speech | Emotion Analysis in Noisy Real-World Environment |
| speech | emotion recognition based on kernel reduced-rank regression |
| speech | Emotion Recognition Enhanced Traffic Efficiency Solution for Autonomous Vehicles in a 5G-Enabled Space-Air-Ground Integrated Intelligent Transportation System |
| speech | emotion recognition model based on Bi-GRU and Focal Loss |
| speech | emotion recognition system based on genetic algorithm and neural network |
| speech | Emotion Recognition using a backward context |
| speech | Emotion Recognition Using Deep Convolutional Neural Network and Discriminant Temporal Pyramid Matching |
| speech | Emotion Recognition Using Fourier Parameters |
| speech | emotion recognition via learning analogies |
| speech | Emotion Recognition via Multi-Level Attention Network |
| speech | Enhancement Based on Deep Autoencoder for Remote Arabic Speech Recognition |
| speech | enhancement for in-vehicle voice control systems using wavelet analysis and blind source separation |
| speech | Enhancement Using a Two-Stage Network for an Efficient Boosting Strategy |
| speech | Enhancement with Nonstationary Acoustic Noise Detection in Time Domain |
| speech | Enhancement: A Review of Modern Methods |
| speech | frame recognition based on less shift sensitive wavelet filter banks |
| speech | Information Processing: Theory and Applications |
| speech | Intelligibility Enhancement By Non-Parallel Speech Style Conversion Using CWT and iMetricGAN Based CycleGAN |
| speech | Intelligibility Estimation Method Using a Non-reference Feature Set, A |
| speech | Magnitude-Spectrum Information-Entropy (MSIE) for Automatic Speech Recognition in Noisy Environments |
| speech | music discrimination using class-specific features |
| speech | Personality Recognition Based on Annotation Classification Using Log-Likelihood Distance and Extraction of Essential Audio Features |
| speech | Privacy for Sound Surveillance Using Super-Resolution Based on Maximum Likelihood and Bayesian Linear Regression |
| speech | Quality Assessment Over Lossy Transmission Channels Using Deep Belief Networks |
| speech | recognition method based on feature distributions, A |
| speech | Recognition Moves from Software to Hardware |
| speech | Recognition of English by Japanese Using Lexicon Represented by Multiple Reduced Phoneme Sets |
| speech | Recognition of Mandarin Monosyllables |
| speech | Recognition Supported by Lip Analysis |
| speech | Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus |
| speech | recognition using fractals |
| speech | Recognition Using Long-Span Temporal Patterns in a Deep Network Model |
| speech | recognition with hierarchical recurrent neural networks |
| speech | Recognition, Neural Networks, CNN |
| speech | Recognition, Speech Analysis, Signal Processing |
| speech | Separation from Background of Music Based on Single-channel Recording |
| speech | Signal Processing Based on Wavelets and SVM for Vocal Tract Pathology Detection |
| speech | Spectral Envelope Enhancement by HMM-Based Analysis/Resynthesis |
| speech | Synchronized Tongue Animation by Combining Physiology Modeling and X-ray Image Fitting |
| speech | Synthesis Approach for High Quality Speech Separation and Generation, A |
| speech | Synthesis Based on Hidden Markov Models |
| speech | Synthesis for the Generation of Artificial Personality |
| speech | Synthesis With Mixed Emotions |
| speech | Synthesis, Synthetic Speech |
| speech | Time-Scale Modification With GANs |
| speech | understanding and dialog system with a homogeneous linguistic knowledge base, A |
| speech | Watermarking Method Based on Formant Tuning |
| speech | -assisted lip synchronization in audio-visual communications |
| speech | -Centric Information Processing: An Optimization-Oriented Approach |
| speech | -controlled animation system |
| speech | -Driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach |
| speech | -Driven Automatic Facial Expression Synthesis |
| speech | -Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks |
| speech | -driven face synthesis from 3D video |
| speech | -driven facial animation using a hierarchical model |
| speech | -Driven Facial Animation Using a Shared Gaussian Process Latent Variable Model |
| speech | -driven Facial Animation Using Cascaded Gans for Learning of Motion and Texture |
| speech | -Driven Facial Animation Using Manifold Relevance Determination |
| speech | -gesture driven multimodal interfaces for crisis management |
| speech | -To-Face Movement Synthesis Based on HMMs |
| speech | -to-Singing Voice Conversion: The Challenges and Strategies for Improving Vocal Conversion Processes |
| speech | -to-video synthesis using facial animation parameters |
| speech | -to-video synthesis using MPEG-4 compliant visual features |
| speech | -Video Synchronization Using Lips Movements and Speech Envelope Correlation |
| speech | -Visual Emotion Recognition by Fusing Shared and Specific Features |
| speech | -Visual Emotion Recognition via Modal Decomposition Learning |
| speech | /Gesture Interface to a Visual Computing Environment for Molecular Biologists |
| speech | /Music Classification Based on Distributed Evolutionary Fuzzy Logic for Intelligent Audio Coding |
| speech | /music discrimination for analysis of radio stations |
| speech | 2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video |
| speech | 4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation |
Split Bregman Approach to Linear Prediction Based Dereverberation With Enforced | speech | Sparsity |
Spontaneous | speech | Emotion Recognition Using Multiscale Deep Convolutional LSTM |
Spontaneous | speech | emotion recognition using prior knowledge |
Spotting words in silent | speech | videos: a retrieval-based approach |
SSSD: | speech | Scene database by Smart Device for Visual Speech Recognition |
Stable Implementation of Zero Frequency Filtering of | speech | Signals for Efficient Epoch Extraction |
Standardization-refinement domain adaptation method for cross-subject EEG-based classification in imagined | speech | recognition |
Statistical estimation of emotions in | speech | notes by featured term analogy |
Statistical Machine Translation for | speech | : A Perspective on Structures, Learning, and Decoding |
Statistical Parametric | speech | Synthesis Using Generalized Distillation Framework |
Steganalysis of Compressed | speech | Based on Markov and Entropy |
Stochastic Modelling: From Pattern Classification to | speech | Recognition and Translation |
Strategies to improve the performance of very low bit rate | speech | coders and application to a variable rate 1.2 kb/s codec |
Streaming End-to-End Multi-Talker | speech | Recognition |
Structural representation of | speech | for phonetic classification |
study of artificial | speech | quality assessors of VoIP calls subject to limited bursty packet losses, A |
Style Extractor For Facial Expression Recognition in the Presence of | speech | |
Style Transfer for Co- | speech | Gesture Animation: A Multi-speaker Conditional-mixture Approach |
Subband-Based Stationary-Component Suppression Method Using Harmonics and Power Ratio for Reverberant | speech | Recognition, A |
Subspace-Based Learning for Automatic Dysarthric | speech | Detection |
Supervised Learning Approach for Explicit Spatial Filtering of | speech | |
Supervised Monaural | speech | Enhancement Using Complementary Joint Sparse Representations |
Supervised single-channel | speech | dereverberation and denoising using a two-stage processing |
Support Vector Machine-Based Dynamic Network for Visual | speech | Recognition Applications, A |
Survey of Deep Representation Learning for | speech | Emotion Recognition |
Survey on | speech | emotion recognition: Features, classification schemes, and databases |
Switching Auxiliary Chains for | speech | Recognition based on Dynamic Bayesian Networks |
Switching Linear Dynamic Models for Noise Robust In-Car | speech | Recognition |
SylNet: An Adaptable End-to-End Syllable Count Estimator for | speech | |
Synchrony-Based Feature Extraction for Robust Automatic | speech | Recognition |
syntactic procedure for the recognition of glottal pulses in continuous | speech | , A |
Synthesising 3D Facial Motion from In-the-Wild | speech | |
Synthetic | speech | Detection Based on Local Autoregression and Variance Statistics |
Synthetic | speech | Detection Based on the Temporal Consistency of Speaker Features |
SynthVSR: Scaling Up Visual | speech | Recognition With Synthetic Supervision |
System and Analysis Used for a Dynamic Facial | speech | Deformation Model |
System and method for triphone-based unit selection for visual | speech | synthesis |
Talking About 3D Scenes: Integration of Image and | speech | Understanding in a Hybrid Distributed System |
Talking Face: Using Facial Feature Detection and Image Transformations for Visual | speech | |
Talking Heads, | speech | Driven Face Animation |
Taming Diffusion Models for Audio-Driven Co- | speech | Gesture Generation |
TCD-TIMIT: An Audio-Visual Corpus of Continuous | speech | |
Technical and Phonetic Aspects of | speech | Quality Assessment: The Case of Prosody Synthesis |
Telephone-Based | speech | Dialog Systems |
Temporal Envelope and Fine Structure Cues for Dysarthric | speech | Detection Using CNNs |
Temporal Measures of Hand and | speech | Coordination During French Cued Speech Production |
Temporal Modulation Normalization for Robust | speech | Feature Extraction and Recognition |
Temporal Multimodal Learning in Audiovisual | speech | Recognition |
Temporal Relation Inference Network for Multimodal | speech | Emotion Recognition |
Temporal Symbolic Integration Applied to a Multimodal System Using Gestures and | speech | |
Text Block Segmentation in Comic | speech | Bubbles |
Text- and | speech | -based phonotactic models for spoken language identification of Basque and Spanish |
Text-independent speaker identification using Radon and discrete cosine transforms based features from | speech | spectrogram |
Time Distributed Multiview Representation for | speech | Emotion Recognition |
Time-Delay Neural Networks for Estimating Lip Movements from | speech | Analysis: A Useful Tool in Audio Video Synchronization |
Time-Domain Multi-Modal Bone/Air Conducted | speech | Enhancement |
Time-Domain | speech | Separation Networks With Graph Encoding Auxiliary |
Time-Frequency Attention for | speech | Emotion Recognition with Squeeze-and-Excitation Blocks |
Towards a high quality Arabic | speech | synthesis system based on neural networks and residual excited vocal tract model |
Towards End-to-End Synthetic | speech | Detection |
Towards Estimating the Upper Bound of Visual- | speech | Recognition: The Visual Lip-Reading Feasibility Database |
Towards multilingual end-to-end | speech | recognition for air traffic control |
Towards query-by- | speech | handwritten keyword spotting |
Towards Robust Deep Neural Networks for Affect and Depression Recognition from | speech | |
Towards Zero-Shot Multi-Speaker Multi-Accent Text-to- | speech | Synthesis |
Tracking continuous emotional trends of participants during affective dyadic interactions using body language and | speech | information |
Tracking Discourse Topics in Co- | speech | Gesture |
Trainable videorealistic | speech | animation |
Transfer learning helps to improve the accuracy to classify patients with different | speech | disorders in different languages |
Transfer Linear Subspace Learning for Cross-Corpus | speech | Emotion Recognition |
Transformer-Based End-to-End Automatic | speech | Recognition Algorithm, A |
Transformer-Based End-to-End | speech | Translation With Rotary Position Embedding |
Translingual visual | speech | synthesis |
tutorial on Hidden Markov Models and selected applications in | speech | recognition, A |
Two features combination with gated recurrent unit for visual | speech | recognition |
Two technologies vie for recognition in | speech | market |
Two-Band Radial Postfiltering in Cepstral Domain with Application to | speech | Synthesis |
Two-Level Bimodal Association for Audio-Visual | speech | Recognition |
Two-Stage Learning and Fusion Network With Noise Aware for Time-Domain Monaural | speech | Enhancement |
Two-Stage Refinement of Magnitude and Complex Spectra for Real-Time | speech | Enhancement |
two-stage | speech | activity detection system considering fractal aspects of prosody, A |
U-Former: Improving Monaural | speech | Enhancement with Multi-head Self and Cross Attention |
UniEnc-CASSNAT: An Encoder-Only Non-Autoregressive ASR for | speech | SSL Models |
Unified Training of Feature Extractor and HMM Classifier for | speech | Recognition |
Unit Selection Using Linguistic, Prosodic and Spectral Distance for Developing Text-to- | speech | System in Hindi |
Universum Autoencoder-Based Domain Adaptation for | speech | Emotion Recognition |
Unpaired Image-to- | speech | Synthesis With Multimodal Information Bottleneck |
Unpaired | speech | Enhancement by Acoustic and Adversarial Supervision for Speech Recognition |
Unsupervised Cross-Corpus | speech | Emotion Recognition Using a Multi-Source Cycle-GAN |
Unsupervised Feature Learning for | speech | Using Correspondence and Siamese Networks |
Unsupervised Personalization of an Emotion Recognition System: The Unique Properties of the Externalization of Valence in | speech | |
Unsupervised | speech | Activity Detection Using Voicing Measures and Perceptual Spectral Flux |
Unsupervised | speech | Text Localization in Comic Images |
Unsupervised Tibetan | speech | features Learning based on Dynamic Bayesian Networks |
Use of Line Spectral Frequencies for Emotion Recognition from | speech | |
Use of radial basis function network with discrete wavelet transform for | speech | enhancement |
User Authentication System Based on | speech | and Cascade Hybrid Facial Feature |
User Verification by Combining | speech | and Face Biometrics in Video |
Using Adaptive Filter to Increase Automatic | speech | Recognition Rate in a Digit Corpus |
Using Hand Gesture and | speech | in a Multimodal Augmented Reality Environment |
Using Semantics to Automatically Generate | speech | Interfaces for Wearable Virtual and Augmented Reality Applications |
Using | speech | for Handwritten Mathematical Expression Recognition Disambiguation |
Using | speech | Input for Image Interpretation and Annotation |
Utterance Verification-Based Dysarthric | speech | Intelligibility Assessment Using Phonetic Posterior Features |
value of stories for | speech | -based video search, The |
Variable-Length Speaker Conditioning in Flow-Based Text-to- | speech | |
Vector quantization with memory and multi-labeling for isolated video-only automatic | speech | recognition |
Vector Taylor series based model adaptation using noisy | speech | trained hidden Markov models |
Vector-Based Feature Representations for | speech | Signals: From Supervector to Latent Vector |
Vector-to-Vector Regression via Distributional Loss for | speech | Enhancement |
Ventriloquist-Net: Leveraging | speech | Cues for Emotive Talking Head Generation |
very low bit rate codec for wide band | speech | based on a long-term perceptual harmonic plus noise model, A |
Video Augmentation for Improving Audio | speech | Recognition under Noise |
Video Rewrite: Driving Visual | speech | with Audio |
Video, Text, and | speech | -Driven Realistic 3-D Virtual Head for Human-Machine Interface, A |
VisageSynTalk: Unseen Speaker Video-to- | speech | Synthesis via Speech-Visage Feature Selection |
Vision Based | speech | Animation Transferring with Underlying Anatomical Structure |
Visual display methods for in computer-animated | speech | production models |
Visual prosody: facial movements accompanying | speech | |
Visual Recognition of Activities, Gestures, Facial Expressions and | speech | : An Introduction and a Perspective |
Visual Skeleton and Reparative Attention for Part-of- | speech | image captioning system |
Visual | speech | Enhancement Without A Real Visual Stream |
visual | speech | model based on fuzzy-neuro methods, A |
Visual | speech | Recognition by Recurrent Neural Networks |
Visual | speech | Recognition Method Using Translation, Scale and Rotation Invariant Features |
Visual | speech | Recognition Using Dynamic Features And Support Vector Machines |
Visual | speech | Recognition Using Motion Features and Hidden Markov Models |
Visual | speech | Recognition Using PCA Networks and LSTMs in a Tandem GMM-HMM System |
Visual | speech | Recognition Using Weighted Dynamic Time Warping |
Visual | speech | Recognition with Loosely Synchronized Feature Streams |
Visual | speech | Synthesis by Morphing Visemes |
Visual | speech | Synthesis Using a Variable-Order Switching Shared Gaussian Process Dynamical Model |
Visual | speech | , a trajectory in viseme space |
Visual | speech | : A Physiological or Behavioural Biometric? |
Visual-to- | speech | conversion based on maximum likelihood estimation |
Visually Recognizing | speech | Using Eigen Sequences |
VisualVoice: Audio-Visual | speech | Separation with Cross-Modal Consistency |
Voice Conversion for Whispered | speech | Synthesis |
Voice of Leadership: Models and Performances of Automatic Analysis in Online | speech | es, The |
Voicing Detection in Noisy | speech | Signal |
Watch or Listen: Robust Audio-Visual | speech | Recognition with Visual Corruption Modeling and Reliability Scoring |
Watch to Listen Clearly: Visual | speech | Enhancement Driven Multi-modality Speech Recognition |
Watermarking-Based Perceptual Hashing Search Over Encrypted | speech | |
WavDepressionNet: Automatic Depression Level Prediction via Raw | speech | Signals |
WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-End | speech | Enhancement |
Waveform Interpolation-Based | speech | Analysis/Synthesis for HMM-Based TTS Systems |
Wavelet | speech | Enhancement Based on Nonnegative Matrix Factorization |
Wavelet-FILVQ classifier for | speech | analysis |
WebVoice: A Toolkit for Perceptual Insights into | speech | Processing |
Weight-Space Viterbi Decoding Based Spectral Subtraction for Reverberant | speech | Recognition |
Whispered | speech | Detection in Noise Using Auditory-Inspired Modulation Spectrum Features |
Whispered | speech | Detection Using Fusion of Group-Delay-Based Subband Modulation Spectrum and Correntropy Features |
Wideband | speech | Encryption Based Arnold Cat Map for AMR-WB G.722.2 Codec |
Word Segments in Category-Based Language Models for Automatic | speech | Recognition |
Zero-Shot Keyword Spotting for Visual | speech | Recognition In-the-wild |
1063 for speech