Index for speec

_speech_
2.4kbps Multiband Characteristic Waveform Interpolation speech Coding Algorithm, A
2.5D Visual speech Synthesis Using Appearance Models
3-D Convolutional Recurrent Neural Networks With Attention Model for speech Emotion Recognition
3D Visual passcode: speech-driven 3D facial dynamics for behaviometrics
450bps speech Coding Algorithm Based on Multi-Mode Matrix Quantization, A
Accuracy, Apps Advance speech Recognition
Acoustic Analysis for Automatic speech Recognition
Acoustic echo cancellation for stereophonic systems derived from pairwise panning of monophonic speech
Acoustic Event Detection in speech Overlapping Scenarios Based on High-Resolution Spectral Input and Deep Learning
Acoustically Emotion-Aware Conversational Agent With speech Emotion Recognition and Empathetic Responses, The
Active Contour Model for speech Balloon Detection in Comics, An
Adaptation of Hidden Markov Models for Recognizing speech of Reduced Frame Rate
Adaptive Gain Control for Enhanced speech Intelligibility Under Reverberation
adaptive model of person identification combining speech and image information, An
Adaptive Signal Models for Wide-Band speech and Audio Compression
Adaptive speech Dereverberation Using Constrained Sparse Multichannel Linear Prediction
Adaptive speech enhancement with varying noise backgrounds
Adaptive speech Intelligibility Enhancement for Far-and-Near-end Noise Environments Based on Self-attention StarGAN
Adding Voicing Features into speech Recognition Based on HMM in Slovak
Advanced tools for speech synchronized animation
Adversarial Continual Learning to Transfer Self-Supervised speech Representations for Voice Pathology Detection
Adversarial Feature Learning and Unsupervised Clustering Based speech Synthesis for Found Data With Acoustic and Textual Noise
Adversarial Training Based speech Emotion Classifier With Isolated Gaussian Regularization, An
Affective Audio Annotation of Public speeches with Convolutional Clustering Neural Network
Affine-Invariant Visual Features Contain Supplementary Information to Enhance speech Recognition
Aging speech recognition with speaker adaptation techniques: Study on medium vocabulary continuous Bengali speech
AKVSR: Audio Knowledge Empowered Visual speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Algorithms for syllabic hypothesization in continuous speech
Alias-and-Separate: Wideband speech Coding Using Sub-Nyquist Sampling and Speech Separation
Amazigh audiovisual speech recognition system design
Amazigh isolated word speech recognition system using the Adaptive Orthogonal Transform Method.
Analysing acoustic model changes for active learning in automatic speech recognition
Analysis and Classification of Cold speech Using Variational Mode Decomposition
Analysis of Emotion Annotation Strength Improves Generalization in speech Emotion Recognition Models
Analysis of Lip Geometric Features for Audio-Visual speech Recognition
Analysis of stressed human speech
analysis of the effect of combining standard and alternate sensor signals on recognition of syllabic units for multimodal speech recognition, An
Analysis of the Multifractal Nature of speech Signals
Analysis of the Possibilities to Adapt the Foreign Language speech Recognition Engines for the Lithuanian Spoken Commands Recognition
Analysis of the Utility of Classical and Novel speech Quality Measures for Speaker Verification
Anchor Models for Emotion Recognition from speech
Animating visible speech and facial expressions
AnyoneNet: Synchronized speech and Talking Head Generation for Arbitrary Persons
Application of Capsule Neural Network Based CNN for speech Emotion Recognition, The
Application of digit and speech recognition in food delivery robot
Application of support vector machines classifiers to visual speech recognition
Application of triphone clustering in acoustic modeling for continuous speech recognition in Bengali
Application of wavelet transforms for C/V segmentation on Mandarin speech signals
ARawNet: A Lightweight Solution for Leveraging Raw Waveforms in Spoof speech Detection
Architecture for Automatic Lipreading to Enhance speech Recognition, An
Art Critic: Multisignal Vision and speech Interaction System in a Gaming Context
Articulatory speech Re-synthesis: Profiting from Natural Acoustic Speech Data
ASQ: An Ultra-Low Bit Rate ASR-Oriented speech Quantization Method
Assessing speaker independence on a speech-based depression level estimation system
Asymmetric 3D face model for speech Language Pathologist applications
Asymmetrically boosted HMM for speech reading
Attention Based Speaker-independent Audio-visual Deep Learning Model for speech Enhancement, An
Attention-based convolutional neural network and long short-term memory for short-term detection of mood disorders based on elicited speech responses
Attention-Based Dense LSTM for speech Emotion Recognition
Audio Based Real-Time speech Animation of Embodied Conversational Agents
Audio Classification in speech and Music: A Comparison Between a Statistical and a Neural Approach
Audio Watermarks, speech Watermarks
Audio-visual continuous speech recognition using MPEG-4 compliant visual features
Audio-Visual Efficient Conformer for Robust speech Recognition
Audio-Visual Person Authentication with Multiple Visualized-speech Features and Multiple Face Profiles
Audio-Visual speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Audio-Visual speech Fusion Using Coupled Hidden Markov Models
Audio-Visual speech Recognition Based on AAM Parameter and Phoneme Analysis of Visual Feature
Audio-Visual speech Recognition Scheme Based on Wavelets and Random Forests Classification
Audio-visual speech recognition techniques in augmented reality environments
Audio-Visual speech Recognition Using A Two-Step Feature Fusion Strategy
Audio-Visual speech Recognition Using MPEG-4 Compliant Visual Features
Audio-visual speech synchronization detection using a bimodal linear prediction model
Audio-Visual speech Synthesis Based on Chinese Visual Triphone
Audio2Gestures: Generating Diverse Gestures from speech Audio with Conditional Variational Autoencoders
Audiovisual Discrimination Between speech and Laughter: Why and When Visual Information Might Help
Audiovisual speech Source Separation: An overview of key methodologies
Audiovisual Talking Head for Augmented speech Generation: Models and Animations Based on a Real Speaker's Articulatory Data, An
Auditory Features Revisited for Robust speech Recognition
Autoencoder-based Unsupervised Domain Adaptation for speech Emotion Recognition
Automated Lip Synchronized speech Driven Facial Animation
Automated speech alignment for image synthesis
Automatic bi-modal emotion recognition system based on fusion of facial expressions and emotion extraction from speech
Automatic continuous speech recogniser for Dravidian languages using the auto associative neural network
Automatic Detection of Amyotrophic Lateral Sclerosis (ALS) from Video-Based Analysis of Facial Movements: speech and Non-Speech Tasks
Automatic Evaluation of Hypernasality and Consonant Misarticulation in Cleft Palate speech
Automatic Evaluation of speech Therapy Exercises Based on Image Data
Automatic Person Verification Using speech and Face Information
Automatic Selection of Visemes for Image-based Visual speech Synthesis
Automatic Sentence Modality Recognition in Children's speech, and Its Usage Potential in the Speech Therapy
Automatic speaker verification on narrowband and wideband lossy coded clean speech
Automatic speech discrete labels to dimensional emotional values conversion method
Automatic speech Emotion Recognition Using Auditory Models with Binary Decision Tree and SVM
Automatic Urdu speech Recognition using Hidden Markov Model
Automatic Video Annotation by Mining speech Transcripts
Automatic visual speech segmentation and recognition using directional motion history images and Zernike moments
AVFormer: Injecting Vision into Frozen speech Models for Zero-Shot AV-ASR
Avoiding dominance of speaker features in speech-based depression detection
AWLloss: Speaker Verification Based on the Quality and Difficulty of speech
Bandwidth-adjusted LPC analysis for robust speech recognition
Bayesian Predictive Method for Automatic speech Segmentation, A
Bayesian reasoning on qualitative descriptions from images and speech
Beam-search Formant Tracking Algorithm Based on Trajectory Functions for Continuous speech
Beamforming Algorithm Based on Maximum Likelihood of a Complex Gaussian Distribution With Time-Varying Variances for Robust speech Recognition, A
Behavioral Signal Processing: Deriving Human Behavioral Informatics From speech and Language
Benchmarking classification models for emotion recognition in natural speech: A multi-corporal study
Bilingual speech Recognition by Estimating Speaker Geometry from Video Data
Bimodal fusion in audio-visual speech recognition
Biological Motion of speech
Blind Adaptive Mask to Improve Intelligibility of Non-Stationary Noisy speech
Blind Source Separation Based Approach for speech Enhancement in Noisy and Reverberant Environment, A
Boosted audio-visual HMM for speech reading
Building Naturalistic Emotionally Balanced speech Corpus by Retrieving Emotional Speech from Existing Podcast Recordings
cache-based natural language model for speech recognition, A
Can we Automatically Transform speech Recorded on Common Consumer Devices in Real-World Environments into Professional Production Quality Speech?: A Dataset, Insights, and Challenges
Can We Read speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition
Cancellable speech template via random binary orthogonal matrices projection hashing
Cascade Image Transform for Speaker Independent Automatic speech Reading, A
Casual chatter or speaking up? Adjusting articulatory effort in generation of speech and animation for conversational characters
Casual Conversations v2 Dataset: A diverse, large benchmark for measuring fairness and robustness in audio/vision/speech models, The
CAT-DUnet: Enhancing speech Dereverberation via Feature Fusion and Structural Similarity Loss
CATNet: Cross-modal fusion for audio-visual speech recognition
Chunk-Level speech Emotion Recognition: A General Framework of Sequence-to-One Dynamic Temporal Modeling
CIF-Based speech Segmentation Method for Streaming E2E ASR, A
Class Confusability Reduction in Audio-Visual speech Recognition Using Random Forests
Classification of Complex Information: Inference of Co-Occurring Affective States from Their Expressions in speech
Classifier-Based Learning of Nonlinear Feature Manifold for Visualization of Emotional speech Prosody
clump splitting based method to localize speech balloons in comics, A
Clustering Algorithm for the Fast Match of Acoustic Conditions in Continuous speech Recognition, A
Co-speech Gesture Detection through Multi-Phase Sequence Labeling
Co-speech Gesture Synthesis by Reinforcement Learning with Contrastive Pretrained Rewards
CodeTalker: speech-Driven 3D Facial Animation with Discrete Motion Prior
Combined Handwriting and speech Modalities for User Authentication
Combining Deep and Unsupervised Features for Multilingual speech Emotion Recognition
Combining handwriting and speech recognition for transcribing historical handwritten documents
Combining speech and Handwriting Modalities for Mathematical Expression Recognition
Combining speech energy and edge information for fast and efficient voice activity detection in noisy environments
Communicative Rhythm in Gesture and speech
Compact and Efficient Multitask Learning in Vision, Language and speech
Compact Representation of Visual speech Data Using Latent Variables, A
Comparative Experiments to Evaluate the Use of Syllables for the Improvement of Automatic Recognition of Dysarthric speech
Comparing Multiple Classifiers for speech-Based Detection of Self-Confidence: A Pilot Study
Comparison of Active Shape Model and Scale Decomposition Based Features for Visual speech Recognition, A
Comparison of Image Transform-Based Features for Visual speech Recognition in Clean and Corrupted Videos
Comparison of MPEG-4 Facial Animation Parameter Groups with Respect to Audio-Visual speech Recognition Performance
Comparison of Phoneme and Viseme Based Acoustic Units for speech Driven Realistic lip Animation
Complex Neural Spatial Filter: Enhancing Multi-Channel Target speech Separation in Complex Domain
computationally compact divergence measure for speech processing, A
Computer Assisted Transcription of speech
Concatenated Frame Image Based CNN for Visual speech Recognition
Conceptual and Lexical Factors in the Production of speech and Conversational Gestures: Neuropsychological Evidence
Conditional Random Fields in speech, Audio, and Language Processing
ConflictNET: End-to-End Learning for speech-Based Conflict Intensity Estimation
Connecting Subspace Learning and Extreme Learning Machine in speech Emotion Recognition
Constant-Q magnitude-phase coefficients extraction for synthetic speech detection
Constrained MMSE LP Residual Estimator for speech Dereverberation in Noisy Environments, A
Constructing speech processing systems on universal phonetic codes accompanied with reference acoustic models
Contextual and Cross-Modal Interaction for Multi-Modal speech Emotion Recognition
Contextual vector quantization for speech recognition with discrete hidden Markov model
Continual Learning for Personalized Co-speech Gesture Generation
Continuous Audio-Visual speech Recognition
Continuous Automatic speech Recognition by Lipreading
Continuous Estimation of Emotions in speech by Dynamic Cooperative Speaker Models
Continuous speech coding using coiflets wavelet
Controllable Multi-Lingual Multi-Speaker Multi-Style Text-to-speech Synthesis With Multivariate Information Minimization, A
Conversational Evaluation of speech Bandwidth Extension Using a Mobile Handset
Conversion of neutral speech to storytelling style speech
Convolutional Network With Multi-Scale and Attention Mechanisms for End-to-End Single-Channel speech Enhancement, A
Convolutional Neural Networks for Distant speech Recognition
Correlation based speech-video synchronization
coupled HMM approach to video-realistic speech animation, A
Creating 3D speech-driven talking heads: a probabilistic network approach
CrisisHateMM: Multimodal Analysis of Directed and Undirected Hate speech in Text-Embedded Images from Russia-Ukraine Conflict
CroMM-VSR: Cross-Modal Memory Augmented Visual speech Recognition
Cross-Corpus speech Emotion Recognition Based on Domain-Adaptive Least-Squares Regression
Cross-Corpus speech Emotion Recognition Based on Few-Shot Learning and Domain Adaptation
Cross-Modal Analysis of speech, Gestures, Gaze and Facial Expressions
Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional speech Synthesis
Cryptographic-speech-Key Generation Architecture Improvements
Cued speech Gesture Recognition: A First Prototype Based on Early Reduction
CWT-Based Approach for Epoch Extraction From Telephone Quality speech
Cyclic Defense GAN Against speech Adversarial Attacks
Cyclic Transfer Learning for Mandarin-English Code-Switching speech Recognition
Czech Spontaneous speech Collection and Annotation: The Database of Technical Lectures
Darspeech: An Automatic Speech Recognition System for the Moroccan Dialect
Data-Driven Jacobian Adaptation in a Multi-model Structure for Noisy speech Recognition
Dawn of the Transformer Era in speech Emotion Recognition: Closing the Valence Gap
DBATES: Dataset for Discerning Benefits of Audio, Textual, and Facial Expression Features in Competitive Debate speeches
DBN-based Spectral Feature Representation for Statistical Parametric speech Synthesis
Decision Level Fusion for Audio-Visual speech Recognition in Noisy Conditions
Deep Audio-Visual speech Recognition
Deep Belief Networks for Real-Time Extraction of Tongue Contours from Ultrasound During speech
Deep Cross-Modal Retrieval Between Spatial Image and Acoustic speech
Deep Hybrid Approach for Hate speech Analysis, A
Deep Learning for Acoustic Modeling in Parametric speech Generation: A systematic review of existing techniques and future trends
Deep Learning for Emotional speech Recognition
Deep Learning Loss Function Based on the Perceptual Evaluation of the speech Quality, A
DeepComboSAD: Spectro-Temporal Correlation Based speech Activity Detection for Naturalistic Audio Streams
Defining Laughter Context for Laughter Synthesis with Spontaneous speech Corpus
DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel speech Enhancement
Demonstration of an HMM-based photorealistic expressive audio-visual speech synthesis system
Dense Convolutional Recurrent Neural Network for Generalized speech Animation
Detecting Aggression in Voice Using Inverse Filtered speech Features
Detecting Multiple Steganography Methods in speech Streams Using Multi-Encoder Network
Detecting Parkinson's disease with sustained phonation and speech signals using machine learning techniques
Detecting Unipolar and Bipolar Depressive Disorders from Elicited speech Responses Using Latent Affective Structure Model
Detection of a Speaker in Video by Combined Analysis of speech Sound and Mouth Movement
Detection of COVID-19 from speech signal using bio-inspired based cepstral features
Detection of Dynamic Structures of speech Fundamental Frequency in Tonal Languages
Detection of Vowel Offset Point From speech Signal
Device and method for dubbing an audio-visual presentation which generates synthesized speech and corresponding facial movements
Differentiable Mean Opinion Score Regularization for Perceptual speech Enhancement
DiffMotion: speech-Driven Gesture Synthesis Using Denoising Diffusion Model
DiffV2S: Diffusion-based Video-to-speech Synthesis with Vision-guided Speaker Embedding
Diphone spanish text-to-speech synthesizer
Direct Text to speech Translation System Using Acoustic Units
Disambiguation in Unknown Object Detection by Integrating Image and speech Recognition Confidences
Discriminating Unknown Objects from Known Objects Using Image and speech Information
Discrimination Between Native and Non-Native speech Using Visual Features Only
Discriminative Analysis of Lip Motion Features for Speaker Identification and speech-Reading
Discriminative Capacity and Phonetic Information of Bottleneck Features in speech
Discriminative feature extraction for speech recognition using continuous output codes
Discriminative Frequency Information Learning for End-to-End speech Anti-Spoofing
Discriminative Multi-Modality speech Recognition
Discriminative Training of NMF Model Based on Class Probabilities for speech Enhancement
Distilled non-semantic speech embeddings with binary neural networks for low-resource devices
Distributed Audio Network for speech Enhancement in Challenging Noise Backgrounds
Distributed Microphones speech Separation by Learning Spatial Information With Recurrent Neural Network
Djinn: Interaction Framework for Home Environment Using speech and Vision
DNN-Based Feature Enhancement Using DOA-Constrained ICA for Robust speech Recognition
DNN-Based Feature Extraction for Conflict Intensity Estimation From speech
Does Visual Self-Supervision Improve Learning of speech Representations for Emotion Recognition?
DR2: Disentangled Recurrent Representation Learning for Data-efficient speech Video Synthesis
Dynamic 3-D Visualization of Vocal Tract Shaping During speech
Dynamic Bayesian Networks for Audio-Visual speech Recognition
Dynamic versus Static Facial Expressions in the Presence of speech
Dynamic-static Cross Attentional Feature Fusion Method for speech Emotion Recognition
E2E-V2SResNet: Deep residual convolutional neural networks for end-to-end video driven speech synthesis
Effect of Various Visual speech Units on Language Identification Using Visual Speech Recognition
Effective online unsupervised adaptation of Gaussian mixture models and its application to speech classification
Effective Style Token Weight Control Technique for End-to-End Emotional speech Synthesis, An
Effectiveness of Mel Scale-Based ESA-IFCC Features for Classification of Natural vs. Spoofed speech
Efficient Framework for Constructing speech Emotion Corpus Based on Integrated Active Learning Strategies, An
Efficient Gaussian Mixture for speech Recognition
Efficient Generation of speech Adversarial Examples with Generative Model
Efficient HMM-Based Feature Enhancement Method With Filter Estimation for Reverberant speech Recognition, An
Efficient One-Pass Decoding with NNLM for speech Recognition
Efficient Representation Learning for Inner speech Domain Generalization
Efficient Sparse Banded Acoustic Models for speech Recognition
Efficient text analyser with prosody generator-driven approach for Mandarin text-to-speech
Efficient use of the grammar scale factor to classify incorrect words in speech recognition verification
Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-Resource speech Recognition
EmoNet: A Transfer Learning Framework for Multi-Corpus speech Emotion Recognition
EmoTalk: speech-Driven Emotional Disentanglement for 3D Face Animation
Emotion Dependent Domain Adaptation for speech Driven Affective Facial Feature Synthesis
Emotion recognition from speech signals via a probabilistic echo-state network
Emotion Recognition of Affective speech Based on Multiple Classifiers Using Acoustic-Prosodic Information and Semantic Labels
Emotional speech Analysis on Nonlinear Manifold
Emotional speech Classification Based on Multi View Characterization
Emotional speech Clustering Based Robust Speaker Recognition System
Emotional speech Recognition Using Acoustic Models of Decomposed Component Words
End-to-End Audiovisual speech Recognition System With Multitask Learning
End-to-End Dual-Branch Network Towards Synthetic speech Detection
End-to-End Pathological speech Detection Using Wavelet Scattering Network
End-to-end Triplet Loss based Emotion Embedding System for speech Emotion Recognition
End-to-End Video-to-speech Synthesis Using Generative Adversarial Networks
End-to-end visual speech recognition for small-scale datasets
Enhanced VQ-Based Algorithms for speech Independent Speaker Identification
Enhancement of Spectral Tilt in Synthesized speech
Enhancing Emotion Classification Through speech and Correlated Emotional Sounds via a Variational Auto-Encoder Model with Prosodic Regularization
Enhancing Frequency Shifted speech Signals in Single Side-Band Communication
EPG2S: speech Generation and Speech Enhancement Based on Electropalatography and Audio Signals Using Multimodal Learning
Error Mitigation Technique for Erasure Channels Based on a Wavelet Representation of the speech Excitation Signal, An
Error-Diffusion Based speech Feature Quantization for Small-Footprint Keyword Spotting
ESAformer: Enhanced Self-Attention for Automatic speech Recognition
Estimating speech Spectral Amplitude Based on the Nakagami Approximation
Estimation of Rapidly Time-Varying Harmonic Noise for speech Enhancement
Evaluation of Head Gaze Loosely Synchronized With Real-Time Synthetic speech for Social Robots
Evaluation of speech Emotion Classification Based on GMM and Data Fusion
Evaluation of the Concatenative Turkish Text-to-speech System
Evaluation of Visual speech Features for the Tasks of Speech and Speaker Recognition, An
experimental study of energy dips for speech and music, An
Experimental Study on speech Enhancement Based on Deep Neural Networks, An
Experimental Study on Transfer Learning in Denoising Autoencoders for speech Enhancement
Experiments in dynamic programming inference of Markov networks with strings representing speech data
Explainability of speech Recognition Transformers via Gradient-Based Attention Visualization
Exploiting alternative acoustic sensors for improved noise robustness in speech communication
Exploiting speech for Automatic TV Delinearization: From Streams to Cross-Media Semantic Navigation
Exploiting speech/Gesture Co-occurrence for Improving Continuous Gesture Recognition in Weather Narration
Exploring Co-Occurence Between speech and Body Movement for Audio-Guided Video Localization
Exploring Hate speech Detection in Multimodal Publications
Exploring speech Features for Classifying Emotions along Valence Dimension
Exploring the Topics of Audio Words for Detecting Alzheimer's Disease From Spontaneous speech
Exploring Zero-Shot Emotion Recognition in speech Using Semantic-Embedding Prototypes
Expression-Preserving Face Frontalization Improves Visually Assisted speech Processing
Expressive Facial Animation Synthesis by Learning speech Coarticulation and Expression Spaces
Expressive Modulation of Neutral Visual speech
Expressive speech-Driven Lip Movements with Multitask Learning
Expressive visual text-to-speech as an assistive technology for individuals with autism spectrum conditions
Expressive Visual Text-to-speech Using Active Appearance Models
Extended Decision Tree with or Relationship for HMM-Based speech Synthesis
Extension of proposal of standards for intelligibility tests of Chinese speech: CDRT-tone
Extracting High Level Semantics by Means of speech, Audio, and Image Primitives in Surveillance Applications
F0 Parameterization of Glottalized Tones in HMM-Based speech Synthesis for Hanoi Vietnamese
FaceFormer: speech-Driven 3D Facial Animation with Transformers
Facial 3D Shape Estimation from Images for Visual speech Animation
Facial Expression Recognition in the Presence of speech Using Blind Lexical Compensation
Factorized MVDR Deep Beamforming for Multi-Channel speech Enhancement
Factors in Emotion Recognition With Deep Learning Models Using speech and Text on Multiple Corpora
Far-Field Automatic speech Recognition
Fast Object Class Labelling via speech
Fast, Diverse and Accurate Image Captioning Guided by Part-Of-speech
Feature Denoising Using Joint Sparse Representation for In-Car speech Recognition
Feature optimisation for stress recognition in speech
Feature Pooling of Modulation Spectrum Features for Improved speech Emotion Recognition in the Wild
Feature Selection Based Transfer Subspace Learning for speech Emotion Recognition
Feature selection methods for hidden Markov model-based speech recognition
Feature space video stream consistency estimation for dynamic stream weighting in audio-visual speech recognition
Features extraction and selection for emotional speech classification
Few-Shot Learning in Emotion Recognition of Spontaneous speech Using a Siamese Neural Network With Adaptive Sample Pair Formation
Finding Lips in Unconstrained Imagery for Improved Automatic speech Recognition
Fine-Grained Action Retrieval Through Multiple Parts-of-speech Embeddings
First degree heart block determination from speech analysis
Frame-synchronous noise compensation for hands-free speech recognition in car environments
From Bottom to Top: A Coordinated Feature Representation Method for speech Recognition
From speech Quality Measures to Speaker Recognition Performance
From Text to speech: A Multimodal Cross-Domain Approach for Deception Detection
FSCNet: Feature-Specific Convolution Neural Network for Real-Time speech Enhancement
FSER: Deep Convolutional Neural Networks for speech Emotion Recognition
Fundamental Technologies in Modern speech Recognition
Furcanext: End-to-end Monaural speech Separation with Dynamic Gated Dilated Temporal Convolutional Networks
Fused speech Enhancement Framework for Robust Speaker Verification, A
Fusing Audio and Visual Features of speech
Fusion of Audio-Visual Information for Integrated speech Processing
Fusion of Face and speech Data for Person Identity Verification
Fusion of speech, Faces and Text for Person Identification in TV Broadcast
Fuzzy integral based information fusion for classification of highly confusable non-speech sounds
Fuzzy rule selection using Iterative Rule Learning for speech data classification
GA Approaches to HMM Optimization for Automatic speech Recognition
Gabor Filterbank Features for Robust speech Recognition
Gammatone Cepstral Coefficients: Biologically Inspired Features for Non-speech Audio Classification
GAN-in-GAN for Monaural speech Enhancement
Gaussian Specific Compensation for Channel Distortion in speech Recognition
Gender classification in two Emotional speech databases
Generalized Two-Stage Rank Regression Framework for Depression Score Prediction from speech
Generating Co-speech Gestures for the Humanoid Robot NAO through BML
Generating Holistic 3D Human Motion from speech
Generating Personalized Virtual Agent in speech Dialogue System for People with Dementia
Generating realistic facial animation from speech
Generating Transferable Adversarial Examples for speech Classification
Genetic Algorithm-Based Adaptive Wiener Gain for speech Enhancement Using an Iterative Posterior NMF
geostatistical model for linear prediction analysis of speech, A
GesRec3D: A Real-Time Coded Gesture-to-speech System with Automatic Segmentation and Recognition Thresholding Using Dissimilarity Measures
Gesture, speech, and Gaze Cues for Discourse Segmentation
Gestures and Lip Shape Integration for Cued speech Recognition
Global Variance in speech Synthesis With Linear Dynamical Models
Graphical speech Training system for hearing impaired
Group Delay based Methods for Detection and Recognition of Whispered speech
GRU-SVM Model for Synthetic speech Detection
Guest Editorial: Special Issue on Affective speech and Language Synthesis, Generation, and Conversion
GUI for interactive speech synthesis
Harmonic Enhancement with Noise Reduction of speech Signal by Comb Filtering
Head Movements in Context of speech during Stress Induction
Hidden Bawls, Whispers, and Yelps: Can Text Convey the Sound of speech, Beyond Words?
Hidden Conditional Random Fields for Visual speech Recognition
Hierarchical Bayesian combination of plug-in maximum a posteriori decoders in deep neural networks-based speech recognition and speaker adaptation
hierarchical Bayesian model for continuous speech recognition, A
Hierarchical speech-act classification for discourse analysis
hierarchical tag-graph search scheme with layered grammar rules for spontaneous speech understanding, A
High-frame-rate real-time imaging of speech production
Higher Order Subspace Algorithm for Multichannel speech Enhancement, A
Highly Transparent Steganography Scheme of speech Signals into Color Images Using Quantization Index Modulation
Historical Perspective of speech Recognition, A
HMM based speech-driven 3D tongue animation
HNM-Based Speaker-Nonspecific Timbre Transformation Scheme for speech Synthesis, An
Hough transform-based mouth localization for audio-visual speech recognition
Human emotion recognition by optimally fusing facial expression and speech feature
hybrid approach to improve part of speech tagging system, An
Hybrid Autoregressive and Non-Autoregressive Transformer Models for speech Recognition
Hybrid HMM-Based speech Recognizer Using Kernel-Based Discriminants as Acoustic Models, A
Hybrid PNN-GMM classification scheme for speech emotion recognition, A
hybrid SVM/DDBHMM decision fusion modeling for robust continuous digital speech recognition, A
hybrid visual feature extraction method for audio-visual speech recognition, A
IBM Rich Transcription 2007 speech-to-Text Systems for Lecture Meetings, The
IDANet: An Information Distillation and Aggregation Network for speech Enhancement
IEEE Acoustics, speech, and Signal Processing Magazine
IEEE Trans. Acoustics, speech, and Signal Processing
Image Caption Generation with Part of speech Guidance
Image-Based Visual speech Animation System, An
Image-Sensitive Language Modeling for Automatic speech Recognition
Image-speech combination for interactive computer assisted transcription of handwritten documents
Imitator: Personalized speech-driven 3D Facial Animation
Impact of imperfect OCR on part-of-speech tagging
Impact of OCR Errors on Automated Classification of OCR Japanese Texts with Parts-of-speech Analysis, An
Impact of Reduced Video Quality on Visual speech Recognition, The
Implantation of voicing on whispered speech using frequency-domain parametric modelling of source and filter information
Implementation of Three Text to speech Systems for Kurdish Language
Implicit Compositional Generative Network for Length-Variable Co-speech Gesture Synthesis
Improve Word Mover's Distance with Part-of-speech Tagging
improved maximum model distance approach for HMM-based speech recognition systems, An
Improved speech Reconstruction from Silent Video
Improvement of speech emotion recognition with neural network classifier by using speech spectrogram
Improvements on Automatic speech Segmentation at the Phonetic Level
Improving and Aligning speech with Presentation Slides
Improving Children's speech Recognition by HMM Interpolation with an Adults' Speech Recognizer
Improving Cross-Corpus speech Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG)
Improving End-to-End Contextual speech Recognition via a Word-Matching Algorithm With Backward Search
Improving Frame-Online Neural speech Enhancement With Overlapped-Frame Prediction
Improving GANs for speech Enhancement
Improving Mandarin End-to-End speech Recognition With Word N-Gram Language Model
Improving Monaural speech Enhancement by Mapping to Fixed Simulation Space With Knowledge Distillation
Improving Multimodal speech Recognition by Data Augmentation and Speech Representations
Improving speech Related Facial Action Unit Recognition by Audiovisual Information Fusion
Improving the Classification of Volcanic Seismic Events Extracting New Seismic and speech Features
Improving the Performance of Deep Learning Based speech Enhancement System Using Fuzzy Restricted Boltzmann Machine
Improving the speech Quality of VoIP by Packet Prioritization
Increasing Compactness of Deep Learning Based speech Enhancement Models With Parameter Pruning and Quantization Techniques
Incremental Text-to-speech Synthesis Using Pseudo Lookahead With Large Pretrained Language Model
Individual 3d Face Synthesis Based on Orthogonal Photos and speech-driven Facial Animation
Individualized Super-Gaussian Single Microphone speech Enhancement for Hearing Aid Users With Smartphone as an Assistive Device, An
Inducing Genuine Emotions in Simulated speech-Based Human-Machine Interaction: The NIMITEK Corpus
Influence of Hangover and Hangbefore Criteria on Automatic speech Recognition
Influence of speech/Non-Speech Segmentation on On-Line and Off-Line Speaker Segmentation Accuracy, The
Information Fusion and Person Verification Using speech and Face Information
Information-Extraction Approach to speech Processing: Analysis, Detection, Verification, and Recognition, An
Instrumental Assessment of Prosodic Quality for Text-to-speech Signals
Integrated analysis of speech and images as a probabilistic decoding process
Integrated Mining of Visual Features, speech Features, and Frequent Patterns for Semantic Video Annotation
Integrated neural network model for identifying speech acts, predicators, and sentiments of dialogue utterances
Integrating Binary Mask Estimation With MRF Priors of Cochleagram for speech Separation
Integrating Part of speech Guidance for Image Captioning
Integration of Vision and speech Understanding Using Bayesian Networks
Intelligibility Enhancement Via Normal-to-Lombard speech Conversion With Long Short-Term Memory Network and Bayesian Gaussian Mixture Model
Intelligibility improvements using binaural diverse sub-band processing applied to speech corrupted with automobile noise
Intelligibility of Children with Cleft Lip and Palate: Evaluation by speech Recognition Techniques
Inter-frame contextual modelling for visual speech recognition
Interaction between speech and Gesture: Strategies for Pointing to Distant Objects
Interaction framework for home environment using speech and vision
Interaction of Iconic Gesture and speech in Talk, The
Interaction With Gaze, Gesture, and speech in a Flexibly Configurable Augmented Reality System
Interdependencies among Voice Source Parameters in Emotional speech
Interference Reduction in Reverberant speech Separation With Visual Voice Activity Detection
Intra-Predictive Switched Split Vector Quantization of speech Spectra
Introduction to the Special Issue: Advances on pattern recognition for speech and audio processing
Investigation into Audiovisual speech Correlation in Reverberant Noisy Environments, An
Investigation of Partition-Based and Phonetically-Aware Acoustic Features for Continuous Emotion Prediction from speech, An
Investigation of speech Landmark Patterns for Depression Detection
Invited paper: Automatic speech recognition: History, methods and challenges
ISL RT-07 speech-to-Text System, The
Isolate speech Recognition Based on Time-Frequency Analysis Methods
Isolated word recognition by neural network models with cross-correlation coefficients for speech dynamics
Iterative Closed-Loop Phase-Aware Single-Channel speech Enhancement
Iterative Feature Normalization Scheme for Automatic Emotion Detection from speech
Joint Bayesian Estimation of Time-Varying LP Parameters and Excitation for speech
KAN-AV dataset for audio-visual face and speech analysis in the wild
Kernel Eigenvoices (Revisited) for Large-Vocabulary speech Recognition
Key Frame Mechanism for Efficient Conformer Based End-to-End speech Recognition
Keyword Detection for Spontaneous speech
Kinect Development Kit: A Toolkit for Gesture- and speech-Based Human-Machine Interaction
Language-Independent OCR Using a Continuous speech Recognition System
Large Vocabulary Audio-visual speech Recognition Using Active Shape Models
Large Vocabulary Audio-Visual speech Recognition Using the Janus Speech Recognition Toolkit
Large Vocabulary Continuous speech Recognition With Reservoir-Based Acoustic Models
Large-Vocabulary Continuous speech Recognition Systems: A Look at Some Recent Advances
Late pre-dereverberation for speech intelligibility enhancement in public address systems
Latency in speech Feature Analysis for Telepresence Event Coding
Learning Contextually Fused Audio-Visual Representations for Audio-Visual speech Recognition
Learning Continuous Facial Actions From speech for Real-Time Animation
Learning Hierarchical Cross-Modal Association for Co-speech Gesture Generation
Learning Individual Speaking Styles for Accurate Lip to speech Synthesis
Learning Landmarks Motion from speech for Speaker-agnostic 3d Talking Heads Generation
Learning Salient Features for speech Emotion Recognition Using Convolutional Neural Networks
Learning Speaker-specific Lip-to-speech Generation
Learning Torso Prior for Co-speech Gesture Generation with Better Hand Shape
Learning Visual speech
Learning With Learned Loss Function: speech Enhancement With Quality-Net to Improve Perceptual Evaluation of Speech Quality
Letter-To-Sound conversion for speech synthesizer
Leveraging Non-Causal Knowledge via Cross-Network Knowledge Distillation for Real-Time speech Enhancement
LFEformer: Local Feature Enhancement Using Sliding Window With Deformability for Automatic speech Recognition
Linked Source and Target Domain Subspace Feature Transfer Learning -- Exemplified by speech Emotion Recognition
Lip Movement Synthesis from speech Based on Hidden Markov Models
Lip Reading for Low-resource Languages by Learning and Combining General speech Knowledge and Language-specific Knowledge
Lip Shape and Hand Position Fusion for Automatic Vowel Recognition in Cued speech for French
Lip2Vec: Efficient and Robust Visual speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping
Listen and Look: Audio-Visual Matching Assisted speech Source Separation
Listening with Your Eyes: Towards a Practical Visual speech Recognition System Using Deep Boltzmann Machines
Lite-RTSE: Exploring a Cost-Effective Lite DNN Model for Real-Time speech Enhancement in RTC Scenarios
LivelySpeaker: Towards Semantic-Aware Co-speech Gesture Generation
LM-VC: Zero-Shot Voice Conversion via speech Generation Based on Language Models
Localizing Fake Segments in speech
Locally Normalized Filter Banks Applied to Deep Neural-Network-Based Robust speech Recognition
Locating and Tracking Facial speech Features
Long-Frame-Shift Neural speech Phase Prediction With Spectral Continuity Enhancement and Interpolation Error Compensation
Look&listen: Multi-Modal Correlation Learning for Active Speaker Detection and speech Enhancement
Looking into Your speech: Learning Cross-modal Affinity for Audio-visual Speech Separation
Low-Complexity Parabolic Lip Contour Model With Speaker Normalization for High-Level Feature Extraction in Noise-Robust Audiovisual speech Recognition, A
Low-Rank and Sparsity Analysis Applied to speech Enhancement Via Online Estimated Dictionary
Low-Resource Adaptation for Personalized Co-speech Gesture Generation
M3TTS: Multi-modal text-to-speech of multi-scale style control for dubbing
Mandarin Emotional speech Recognition Based on SVM and NN
Mandarin Text-to-speech Front-End With Lightweight Distilled Convolution Network
Marathi Language speech Synthesizer Using Concatenative Synthesis Strategy (Spoken in Maharashtra, India)
Markov random field model for automatic speech recognition, A
Mathematical Modeling of the Effects of speech Warning Characteristics on Human Performance and Its Application in Transportation Cyberphysical Systems
maximum model distance approach for HMM-based speech recognition, A
Maximum Phase Modeling for Sparse Linear Prediction of speech
Memory Attention: Robust Alignment Using Gating Mechanism for End-to-End speech Synthesis
MES-P: An Emotional Tonal speech Dataset in Mandarin with Distal and Proximal Labels
MeshTalk: 3D Face Animation from speech using Cross-Modality Disentanglement
Method and apparatus for producing audio-visual synthetic speech
Method and apparatus for synthetic speech in facial animation
Methodology for Acoustic Characterization of a Labial Constraint in speech Production
Methods and devices for producing and using synthetic visual speech based on natural coarticulation
Micro-Doppler Classification for Ground Surveillance Radar Using speech Recognition Tools
Microphone Array Processing Strategies for Distant-Based Automatic speech Recognition
Minimized Database of Unit Selection in Visual speech Synthesis without Loss of Naturalness
MixCycle: Unsupervised speech Separation via Cyclic Mixture Permutation Invariant Training
Mixed bayesian networks with auxiliary variables for automatic speech recognition
Mixspeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
Mixture of Factor Analyzers Using Priors From Non-Parallel speech for Voice Conversion
Mixture of Support Vector Machines for HMM based speech Recognition
Mixtures of Local Dictionaries for Unsupervised speech Enhancement
Model-Based Localization Method by Non-speech Sound Via Wavelet Transform and Dynamic Neural Network
Modeling and Synthesis of Facial Motion Driven by speech
Modeling Feature Representations for Affective speech Using Generative Adversarial Networks
Modeling human activities as speech
Modeling of Physical Characteristics of speech under Stress
Modeling Syllable-Based Pronunciation Variation for Accented Mandarin speech Recognition
Modeling the Temporal Evolution of Acoustic Parameters for speech Emotion Recognition
Modeling Vocal Entrainment in Conversational speech Using Deep Unsupervised Learning
Modelling and combining emotions, visual speech and gestures in virtual head models
Modelling Combined Handwriting and speech Modalities
Models for the Perception of speech and Visual Form
Mono-font Cursive Arabic Text Recognition Using speech Recognition System
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-speech
Moroccan Dialect speech Recognition System Based on CMU SphinxTools
Morpheme-Based Automatic speech Recognition of Basque
Morphological normalization of vowel images for articulatory speech recognition
Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal speech Corpus
Multi-environment model adaptation based on vector Taylor series for robust speech recognition
Multi-Font Off-Line Arabic Character Recognition Using the BBN Byblos speech Recognition System
Multi-Label speech Emotion Recognition via Inter-Class Difference Loss Under Response Residual Network
Multi-layer encoder-decoder time-domain single channel speech separation
Multi-lingual and Multi-modal speech Processing and Applications
Multi-Modal Human Verification Using Face and speech
Multi-modal information retrieval from broadcast video using OCR and speech recognition
Multi-modality Associative Bridging through Memory: speech Sound Recollected from Face Video
Multi-task multimodal feature refinement for emotional speech animation
Multi-Task Semi-Supervised Adversarial Autoencoding for speech Emotion Recognition
Multi-view visual speech recognition based on multi task learning
Multichannel filters for speech recognition using a particle swarm optimization
Multilevel Integration of Vision and speech Understanding Using Bayesian Networks
Multimedia Document Retrieval Using speech and Speaker Recognition
Multimodal biometric authentication using speech and hand geometry fusion
Multimodal Biometric System Using Fingerprint, Face and speech, A
Multimodal Database of Emotional speech, Video and Gestures
Multimodal Emotion Recognition Based on speech and Physiological Signals Using Deep Neural Networks
Multimodal Interface Framework for Using Hand Gestures and speech in Virtual Environment Applications, A
Multimodal person authentication using speech, face and visual speech
Multiple classifier applied on predicting microsleep from speech
Multiple statistical models for soft decision in noisy speech enhancement
Multistream Articulatory Feature-Based Models for Visual speech Recognition
Multistream Recognition of speech: Dealing With Unknown Unknowns
Multitapering and a wavelet variant of MFCC in speech recognition
Multitask Learning From Augmented Auxiliary Data for Improving speech Emotion Recognition
Multivariate Autoregressive Spectrogram Modeling for Noisy speech Recognition
Mutual Alignment between Audiovisual Features for End-to-End Audiovisual speech Recognition
Mutual-optimization Towards Generative Adversarial Networks For Robust speech Recognition
Nested U-Net With Self-Attention and Dense Connectivity for Monaural speech Enhancement, A
Neural Emotion Director: speech-preserving semantic control of facial expressions in in-the-wild videos
Neural network-based adaptive noise cancellation for enhancement of speech auditory brainstem responses
Neurally Optimized Decoder for Low Bitrate speech Codec
New Approach to Fourier Synthesis With Application to Neural Encoding and speech Classification, A
New Approach to Integrate Audio and Visual Features of speech, A
new approach to speech-input statistical translation, A
New Encoding Algorithm for Distributed speech Recognition Based on DTFS Transform
New feature weighting approaches for speech-act classification
New Insights into the Kalman Filter Beamformer: Applications to speech and Robustness
New Manifold Representation for Visual speech Recognition, A
New Parameter of speech Character Based on the Bloomfield's Model, A
New single-ended objective measure for non-intrusive speech quality evaluation
New Visual speech Recognition Approach for RGB-D Cameras, A
NMF-Based speech Enhancement Using Bases Update
Noise Adaptive Stream Weighting in Audio-Visual speech Recognition
Noise compensation in a person verification system using face and multiple speech features
Noise Robust Front-end for speech Recognition Using Hough Transform and Cumulative Distribution Mapping, A
Noise-Adaptive LDA: A New Approach for speech Recognition Under Observation Uncertainty
Noise-Separated Adaptive Feature Distillation for Robust speech Recognition
Non-Autoregressive Transformer for speech Recognition
Non-Contact speech Recovery Technology Using a 24 GHz Portable Auditory Radar and Webcam
Non-Intrusive Binaural speech Intelligibility Prediction From Discrete Latent Representations
Non-intrusive speech-quality assessment using vocal-tract models
Nonlinear Manifold Learning for Visual speech Recognition
Normalized Training for HMM-based Visual speech Recognition
Novel Approach to Very Fast and Noise Robust, Isolated Word speech Recognition, A
Novel Data Independent Approach for Conversion of Hand Punched Kannada Braille Script to Text and speech, A
Novel speech Emotion Recognition Method via Incomplete Sparse Least Square Regression, A
Novel Statistical Model for speech Recognition and POS Tagging, A
Novel Visual speech Representation and HMM Classification for Visual Speech Recognition, A
Objective Estimation of speech Quality for Communication Systems
Obtaining speech assets for judgement analysis on low-pass filtered emotional speech
On Emotions as Features for speech Overlaps Classification
On Factoring Out a Gesture Typology from the Bielefeld speech-and-Gesture-Alignment Corpus (SAGA)
On Homotopy Continuation for speech Restoration
On Optimal Linear Filtering of speech for Near-End Listening Enhancement
On the Audio-visual Synchronization for Lip-to-speech Synthesis
On the Compensation Between Magnitude and Phase in speech Separation
On the Estimation of Fundamental Frequency From Nonstationary Noisy speech Signals Based on the Hilbert-Huang Transform
On the Processing of Fuzzy Patterns for Text Independent Phonetic speech Segmentation
On the Relationship between Face Movements, Tongue Movements, and speech Acoustics
On the Robustness of Parametric Watermarking of speech
On the Use of Computer Vision Techniques for Automatic speech Recognition
On the use of different speech representations for speaker modeling
On the Use of Time-Domain Widely Linear Filtering for Binaural speech Enhancement
On Training speech Separation Models With Various Numbers of Speakers
On-Line speech/Music Segmentation for Broadcast News Domain
One-Pulse FEC Coding for Robust CELP-Coded speech Transmission Over Erasure Channels
Online Animation System For Practicing Cued speech
Online Automatic speech Recognition With Listen, Attend and Spell Model
Online speech Dereverberation Using Mixture of Multichannel Linear Prediction Models
Optimal residual frame based source modeling for HMM-based speech synthesis
Optimized discriminative transformations for speech features based on minimum classification error
Optimizing speech Intelligibility in a Noisy Environment: A unified view
Other Related Papers, Audio, speech, Signal Processing, Pattern Recognition
Over-Sampling Emotional speech Data Based on Subjective Evaluations Provided by Multiple Individuals
Overview of compression and packet loss effects in speech biometrics
Panel Tracking for the Extraction and the Classification of speech Balloons
Parallel implementation of Artificial Neural Network training for speech recognition
Parametric Representation of the Speaker's Lips for Multimodal Sign Language And speech Recognition
Part-of-speech Tagging Based on Machine Translation Techniques
Part-of-speech Tagging for Table of Contents Recognition
Partial linear regression for speech-driven talking head application
Particle filtering based pitch sequence correction for monaural speech segregation
Patient-Provider Communication Training Models for Interactive speech Devices
Perceptual Evaluation of Video-Realistic speech
Perceptual Properties of Current speech Recognition Technology
PFRNet: Dual-Branch Progressive Fusion Rectification Network for Monaural speech Enhancement
Phase Estimation in Single Channel speech Enhancement Using Phase Decomposition
Phase Processing for Single-Channel speech Enhancement: History and recent advances
Phase-Sensitive Joint Learning Algorithms for Deep Learning-Based speech Enhancement
phone-viseme dynamic Bayesian network for audio-visual automatic speech recognition, A
Phoneme segmentation of speech
Photorealistic adaptation and interpolation of facial expressions using HMMS and AAMS for audio-visual speech synthesis
pilot study on augmented speech communication based on Electro-Magnetic Articulography, A
Pipelined Recurrent Fuzzy Neural Networks for Nonlinear Adaptive speech Prediction
Pitch Delay Based Adaptive Steganography for AMR speech Stream
Pitch Detection Algorithms and Voiced/Unvoiced Classification for Noisy speech
Pitch-Normalized Acoustic Features for Robust Children's speech Recognition
Place Theory as an Alternative Solution in Automatic speech Recognition Tasks, The
Polish Emotional speech Database: Recording and Preliminary Validation
Power Exponent Based Weighting Criterion for DNN-Based Mask Approximation in speech Enhancement
Practical Considerations for Real-Time Implementation of speech-Based Gender Detection
Prediction-based classification for audiovisual discrimination between laughter and speech
Principal Component Analysis of speech Spectrogram Images
Probabilistic Class Histogram Equalization Based on Posterior Mean Estimation for Robust speech Recognition
Probabilistic Kernels for Improved Text-to-speech Alignment in Long Audio Tracks
QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural speech-Driven Gesture Generation
Quality-Aware Bag of Modulation Spectrum Features for Robust speech Emotion Recognition
Quantifying Emotional Similarity in speech
Quantitative Analysis of the Relative Local speech Rate
Query expansion for imperfect speech: applications in distributed learning
R-CNN Based Method to Localize speech Balloons in Comics, An
R-Letter disorder diagnosis (R-LDD): Arabic speech database development for automatic diagnosis of childhood speech disorders (Case study)
Rate-Invariant Analysis of Trajectories on Riemannian Manifolds with Application in Visual speech Recognition
Rate-invariant comparisons of covariance paths for visual speech recognition
Re-Synchronization Using the Hand Preceding Model for Multi-Modal Fusion in Automatic Continuous Cued speech Recognition
Reading to Listen at the Cocktail Party: Multi-Modal speech Separation
Real-Time Lip Tracking for Audio-Visual speech Recognition Applications
Real-Time Recognition of Affective States from Nonverbal Features of speech and Its Application for Public Speaking Skill Analysis
Real-Time Scene Text to speech System, A
Real-time sign language recognition and speech conversion using VGG16
Real-time speech-driven 3D face animation
Real-Time Vision and speech Driven Avatars for Multimedia Applications
Realistic Face Animation for Audiovisual speech Applications: A Densification Approach Driven by Sparse Stereo Meshes
Realistic speech animation based on observed 3D face dynamics
Realistic speech-Driven Facial Animation with GANs
Recent advances in the automatic recognition of audiovisual speech
Recognition of gestures in the context of speech
Recognition of phonetic labels of the TIMIT speech corpus by means of an artificial neural network
Recognition of visual speech elements using adaptively boosted hidden Markov models
Recognizing Stress Using Semantics and Modulation of speech and Gestures
Reconstructing speech From CNN Embeddings
Reconstruction of Dysphonic speech by MELP
Reconstruction-Based Visual-Acoustic-Semantic Embedding Method for speech-Image Retrieval, A
Recurrent Neural Network Based Small-footprint Wake-up-word speech Recognition System with a Score Calibration Method
Recurrent neural network speech predictor based on dynamical systems approach
Reduced Universal Background Model for speech Recognition and Identification System
Reduction of musical residual noise using perceptual tools with classic speech denoising techniques
Regression based landmark estimation and multi-feature fusion for visual speech recognition
Regularized Subspace Gaussian Mixture Models for speech Recognition
reliable multidomain model for speech act classification, A
Representation of speech in Deep Neural Networks, The
Rescoring of N-Best Hypotheses Using Top-Down Selective Attention for Automatic speech Recognition
Research and Design of Smart Home speech Recognition System Based on Deep Learning
Research of Chain Model Based on CNN-TDNNF in Yulin Dialect speech Recognition, The
Research of STRAIGHT Spectrogram and Difference Subspace Algorithm for speech Recognition
Research on HMM_based speech synthesis for Lhasa dialect
Research Progress in speech Enhancement Technology
Researchers Push speech Recognition Toward the Mainstream
Residual Excitation Skewness for Automatic speech Polarity Detection
Resolution limits on visual speech recognition
Restoration of Bone-Conducted speech With U-Net-Like Model and Energy Distance Loss
Rethinking Algorithm Design and Development in speech Processing
Reversible Audio Data Hiding Based on Variable Error-Expansion of Linear Prediction for Segmental Audio and G.711 speech
review of recent advances in visual speech decoding, A
ReVISE: Self-Supervised speech Resynthesis with Visual Input for Universal and Generalized Speech Regeneration
RNN-Based speech-Music Discrimination Used for Hybrid Audio Coder, An
Robot Command Interface Using an Audio-Visual speech Recognition System
Robust and Fast Localization of Single speech Source Using a Planar Array
Robust Arabic Multi-stream speech Recognition System in Noisy Environment
Robust Audio-Visual Mandarin speech Recognition Based On Adaptive Decision Fusion And Tone Features
Robust Audio-Visual speech Recognition Based on Hybrid Fusion
Robust Audio-Visual speech Recognition Based on Late Integration
Robust Audio-Visual speech Recognition Under Noisy Audio-Video Conditions
Robust Automatic speech Recognition Using PD-MEEMLIN
Robust Biometric Person Identification Using Automatic Classifier Fusion of speech, Mouth, and Face Experts
Robust Face Frontalization For Visual speech Recognition
robust method for the Vietnamese handwritten and speech recognition, A
Robust Parallel speech Recognition in Multiple Energy Bands
Robust Pitch Extraction Method for the HMM-Based speech Synthesis System
Robust Sensor Fusion: Analysis and Application to Audio-Visual speech Recognition
Robust Speaker Verification via Asynchronous Fusion of speech and Lip Information
Robust speech recognition using spatial-temporal feature distribution characteristics
Robust telephone speech recognition based on channel compensation
robust unsupervised pattern discovery and clustering of speech signals, A
Robustness of linear discriminant analysis in automatic speech recognition
Role of Long-Term Dependency in Synthetic speech Detection, The
Role of Synthetically Generated Samples on speech Recognition in a Resource-Scarce Language
Role of Vocal Persona in Natural and Synthesized speech, The
RSD-GAN: Regularized Sobolev Defense GAN Against speech-to-Text Adversarial Attacks
Salient Feature Extraction Algorithm for speech Emotion Recognition, A
Say it to see it: A speech based immersive model retrieval system
SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker Text-to-speech Systems
Searching through a speech Memory for Text-Independent Speaker Verification
Secure speech biometric templates for user authentication
SEEG: Semantic Energized Co-speech Gesture Generation
Selection of Unknown Objects Specified by speech Using Models Constructed from Web Images
Selective HuBERT: Self-Supervised Pre-Training for Target Speaker in Clean and Mixture speech
Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language speech Emotion Recognition
Semi-blind speech-Music Separation Using Sparsity and Continuity Priors
Semi-supervised speech-driven 3D Facial Animation via Cross-modal Encoding
Sentence boundary detection in conversational speech transcripts using noisily labeled examples
Separation of Audio-Visual speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli
Session compensation using binary speech representation for speaker recognition
SFNet: A Computationally Efficient Source Filter Model Based Neural speech Synthesis
Signal subspace approach for narrowband noise reduction in speech
Signal-Aware Parametric Quality Model for Audio and speech over IP Networks
Signal-to-Signal Ratio Independent Speaker Identification for Co-channel speech Signals
Significance of Empty speech Pauses: Cognitive and Algorithmic Issues, The
Significance of Pitch-Based Spectral Normalization for Children's speech Recognition
Simple Model of speech Communication and its Application to Intelligibility Enhancement, A
Single Channel speech Separation Using Source-Filter Representation
Single-Channel speech Separation Focusing on Attention DE
Single-Input/Binaural-Output Antiphasic speech Enhancement Method for Speech Intelligibility Improvement, A
SNAC: Speaker-Normalized Affine Coupling Layer in Flow-Based Architecture for Zero-Shot Multi-Speaker Text-to-speech
So-DAS: A Two-Step Soft-Direction-Aware speech Separation Framework
Some recent advances in speech recognition with potential applications in other statistical pattern recognition areas
Some relations among stochastic finite state networks used in automatic speech recognition
Something to Talk About: Signal Processing in speech and Audiology Research: Promising Investigations Explore New Opportunities in Human Communication
source and channel coding approach to data hiding with application to hiding speech in video, A
SPACE: speech-driven Portrait Animation with Controllable Expression
Sparse Kernel Reduced-Rank Regression for Bimodal Emotion Recognition From Facial Expression and speech
Speaker Attractor Network: Generalizing speech Separation to Unseen Numbers of Sources
Speaker Extraction With Co-speech Gestures Cue
Speaker identification security improvement by means of speech watermarking
Speaker Independent Audio-Visual speech Recognition
Speaker Modeling with Various speech Representations
Speaker-aware Multi-Task Learning for automatic speech recognition
Speaker-aware speech Emotion Recognition by Fusing Amplitude and Phase Information
Speaker-Independent speech Animation Using Perceptual Loss Functions and Synthetic Data
Speaker-independent speech Recognition by Means of Functional-link Neural Networks
Spectral Domain speech Enhancement Using HMM State-Dependent Super-Gaussian Priors
Spectral domain texture analysis for speech enhancement
Spectral Features Based on Local Hu Moments of Gabor Spectrograms for speech Emotion Recognition
Spectral Flatness Analysis for Emotional speech Synthesis and Transformation
Spectral Tilt Estimation for speech Intelligibility Enhancement Using RNN Based on All-Pole Model
SPECTRE: Visual speech-Informed Perceptual 3D Facial Expression Reconstruction from Videos
Spectro-Temporal Filtering for Multichannel speech Enhancement in Short-Time Fourier Transform Domain
speech Activity Detection in Naturalistic Audio Environments: Fearless Steps Apollo Corpus
speech Analysis, other than Recognition
speech Animation Using Coupled Hidden Markov Models
speech Authentication and Recovery Scheme in Encrypted Domain
speech authentication system using digital watermarking and pattern recovery
speech Balloons in Comics, Comic Analysis, Panel Detection
speech balloon and speaker association for comics and manga understanding
speech Bandwidth Extension Using Recurrent Temporal Restricted Boltzmann Machines
speech Based Approach to Surveillance Video Retrieval, A
speech Based Shopping Assistance for the Blind
speech Content Retrieval Model Based on Integrated Neural Network for Natural Language Description, A
speech Denoising and Compensation for Hearing Aids Using an FTCRN-Based Metric GAN
speech driven facial animation using a hidden markov coarticulation model
speech driven lip synthesis using viseme based hidden markov models
speech Driven Talking Face Generation From a Single Image and an Emotion Condition
speech Driven Tongue Animation
speech driven video editing via an audio-conditioned diffusion model
speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates
speech Emotion Analysis in Noisy Real-World Environment
speech emotion recognition based on kernel reduced-rank regression
speech Emotion Recognition Enhanced Traffic Efficiency Solution for Autonomous Vehicles in a 5G-Enabled Space-Air-Ground Integrated Intelligent Transportation System
speech emotion recognition model based on Bi-GRU and Focal Loss
speech emotion recognition system based on genetic algorithm and neural network
speech Emotion Recognition using a backward context
speech Emotion Recognition Using Deep Convolutional Neural Network and Discriminant Temporal Pyramid Matching
speech Emotion Recognition Using Fourier Parameters
speech emotion recognition via learning analogies
speech Emotion Recognition via Multi-Level Attention Network
speech Enhancement Based on Deep Autoencoder for Remote Arabic Speech Recognition
speech enhancement for in-vehicle voice control systems using wavelet analysis and blind source separation
speech Enhancement Using a Two-Stage Network for an Efficient Boosting Strategy
speech Enhancement with Nonstationary Acoustic Noise Detection in Time Domain
speech Enhancement: A Review of Modern Methods
speech frame recognition based on less shift sensitive wavelet filter banks
speech Information Processing: Theory and Applications
speech Intelligibility Enhancement By Non-Parallel Speech Style Conversion Using CWT and iMetricGAN Based CycleGAN
speech Intelligibility Estimation Method Using a Non-reference Feature Set, A
speech Magnitude-Spectrum Information-Entropy (MSIE) for Automatic Speech Recognition in Noisy Environments
speech music discrimination using class-specific features
speech Personality Recognition Based on Annotation Classification Using Log-Likelihood Distance and Extraction of Essential Audio Features
speech Privacy for Sound Surveillance Using Super-Resolution Based on Maximum Likelihood and Bayesian Linear Regression
speech Quality Assessment Over Lossy Transmission Channels Using Deep Belief Networks
speech recognition method based on feature distributions, A
speech Recognition Moves from Software to Hardware
speech Recognition of English by Japanese Using Lexicon Represented by Multiple Reduced Phoneme Sets
speech Recognition of Mandarin Monosyllables
speech Recognition Supported by Lip Analysis
speech Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus
speech recognition using fractals
speech Recognition Using Long-Span Temporal Patterns in a Deep Network Model
speech recognition with hierarchical recurrent neural networks
speech Recognition, Neural Networks, CNN
speech Recognition, Speech Analysis, Signal Processing
speech Separation from Background of Music Based on Single-channel Recording
speech Signal Processing Based on Wavelets and SVM for Vocal Tract Pathology Detection
speech Spectral Envelope Enhancement by HMM-Based Analysis/Resynthesis
speech Synchronized Tongue Animation by Combining Physiology Modeling and X-ray Image Fitting
speech Synthesis Approach for High Quality Speech Separation and Generation, A
speech Synthesis Based on Hidden Markov Models
speech Synthesis for the Generation of Artificial Personality
speech Synthesis With Mixed Emotions
speech Synthesis, Synthetic Speech
speech Time-Scale Modification With GANs
speech understanding and dialog system with a homogeneous linguistic knowledge base, A
speech Watermarking Method Based on Formant Tuning
speech-assisted lip synchronization in audio-visual communications
speech-Centric Information Processing: An Optimization-Oriented Approach
speech-controlled animation system
speech-Driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach
speech-Driven Automatic Facial Expression Synthesis
speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks
speech-driven face synthesis from 3D video
speech-driven facial animation using a hierarchical model
speech-Driven Facial Animation Using a Shared Gaussian Process Latent Variable Model
speech-driven Facial Animation Using Cascaded Gans for Learning of Motion and Texture
speech-Driven Facial Animation Using Manifold Relevance Determination
speech-gesture driven multimodal interfaces for crisis management
speech-To-Face Movement Synthesis Based on HMMS
speech-to-Singing Voice Conversion: The Challenges and Strategies for Improving Vocal Conversion Processes
speech-to-video synthesis using facial animation parameters
speech-to-video synthesis using MPEG-4 compliant visual features
speech-Video Synchronization Using Lips Movements and Speech Envelope Correlation
speech-Visual Emotion Recognition by Fusing Shared and Specific Features
speech-Visual Emotion Recognition via Modal Decomposition Learning
speech/Gesture Interface to a Visual Computing Environment for Molecular Biologists
speech/Music Classification Based on Distributed Evolutionary Fuzzy Logic for Intelligent Audio Coding
speech/music discrimination for analysis of radio stations
speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video
speech4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation
Split Bregman Approach to Linear Prediction Based Dereverberation With Enforced speech Sparsity
Spontaneous speech Emotion Recognition Using Multiscale Deep Convolutional LSTM
Spontaneous speech emotion recognition using prior knowledge
Spotting words in silent speech videos: a retrieval-based approach
SSSD: speech Scene database by Smart Device for Visual Speech Recognition
Stable Implementation of Zero Frequency Filtering of speech Signals for Efficient Epoch Extraction
Standardization-refinement domain adaptation method for cross-subject EEG-based classification in imagined speech recognition
Statistical estimation of emotions in speech notes by featured term analogy
Statistical Machine Translation for speech: A Perspective on Structures, Learning, and Decoding
Statistical Parametric speech Synthesis Using Generalized Distillation Framework
Steganalysis of Compressed speech Based on Markov and Entropy
Stochastic Modelling: From Pattern Classification to speech Recognition and Translation
Strategies to improve the performance of very low bit rate speech coders and application to a variable rate 1.2 kb/s codec
Streaming End-to-End Multi-Talker speech Recognition
Structural representation of speech for phonetic classification
study of artificial speech quality assessors of VoIP calls subject to limited bursty packet losses, A
Style Extractor For Facial Expression Recognition in the Presence of speech
Style Transfer for Co-speech Gesture Animation: A Multi-speaker Conditional-mixture Approach
Subband-Based Stationary-Component Suppression Method Using Harmonics and Power Ratio for Reverberant speech Recognition, A
Subspace-Based Learning for Automatic Dysarthric speech Detection
Supervised Learning Approach for Explicit Spatial Filtering of speech
Supervised Monaural speech Enhancement Using Complementary Joint Sparse Representations
Supervised single-channel speech dereverberation and denoising using a two-stage processing
Support Vector Machine-Based Dynamic Network for Visual speech Recognition Applications, A
Survey of Deep Representation Learning for speech Emotion Recognition
Survey on speech emotion recognition: Features, classification schemes, and databases
Switching Auxiliary Chains for speech Recognition based on Dynamic Bayesian Networks
Switching Linear Dynamic Models for Noise Robust In-Car speech Recognition
SylNet: An Adaptable End-to-End Syllable Count Estimator for speech
Synchrony-Based Feature Extraction for Robust Automatic speech Recognition
syntactic procedure for the recognition of glottal pulses in continuous speech, A
Synthesising 3D Facial Motion from In-the-Wild speech
Synthetic speech Detection Based on Local Autoregression and Variance Statistics
Synthetic speech Detection Based on the Temporal Consistency of Speaker Features
SynthVSR: Scaling Up Visual speech Recognition With Synthetic Supervision
System and Analysis Used for a Dynamic Facial speech Deformation Model
System and method for triphone-based unit selection for visual speech synthesis
Talking About 3D Scenes: Integration of Image and speech Understanding in a Hybrid Distributed System
Talking Face: Using Facial Feature Detection and Image Transformations for Visual speech
Talking Heads, speech Driven Face Animation
Taming Diffusion Models for Audio-Driven Co-speech Gesture Generation
TCD-TIMIT: An Audio-Visual Corpus of Continuous speech
Technical and Phonetic Aspects of speech Quality Assessment: The Case of Prosody Synthesis
Telephone-Based speech Dialog Systems
Temporal Envelope and Fine Structure Cues for Dysarthric speech Detection Using CNNs
Temporal Measures of Hand and speech Coordination During French Cued Speech Production
Temporal Modulation Normalization for Robust speech Feature Extraction and Recognition
Temporal Multimodal Learning in Audiovisual speech Recognition
Temporal Relation Inference Network for Multimodal speech Emotion Recognition
Temporal Symbolic Integration Applied to a Multimodal System Using Gestures and speech
Text Block Segmentation in Comic speech Bubbles
Text- and speech-based phonotactic models for spoken language identification of Basque and Spanish
Text-independent speaker identification using Radon and discrete cosine transforms based features from speech spectrogram
Time Distributed Multiview Representation for speech Emotion Recognition
Time-Delay Neural Networks for Estimating Lip Movements from speech Analysis: A Useful Tool in Audio Video Synchronization
Time-Domain Multi-Modal Bone/Air Conducted speech Enhancement
Time-Domain speech Separation Networks With Graph Encoding Auxiliary
Time-Frequency Attention for speech Emotion Recognition with Squeeze-and-Excitation Blocks
Towards a high quality Arabic speech synthesis system based on neural networks and residual excited vocal tract model
Towards End-to-End Synthetic speech Detection
Towards Estimating the Upper Bound of Visual-speech Recognition: The Visual Lip-Reading Feasibility Database
Towards multilingual end-to-end speech recognition for air traffic control
Towards query-by-speech handwritten keyword spotting
Towards Robust Deep Neural Networks for Affect and Depression Recognition from speech
Towards Zero-Shot Multi-Speaker Multi-Accent Text-to-speech Synthesis
Tracking continuous emotional trends of participants during affective dyadic interactions using body language and speech information
Tracking Discourse Topics in Co-speech Gesture
Trainable videorealistic speech animation
Transfer learning helps to improve the accuracy to classify patients with different speech disorders in different languages
Transfer Linear Subspace Learning for Cross-Corpus speech Emotion Recognition
Transformer-Based End-to-End Automatic speech Recognition Algorithm, A
Transformer-Based End-to-End speech Translation With Rotary Position Embedding
Translingual visual speech synthesis
tutorial on Hidden Markov Models and selected applications in speech recognition, A
Two features combination with gated recurrent unit for visual speech recognition
Two technologies vie for recognition in speech market
Two-Band Radial Postfiltering in Cepstral Domain with Application to speech Synthesis
Two-Level Bimodal Association for Audio-Visual speech Recognition
Two-Stage Learning and Fusion Network With Noise Aware for Time-Domain Monaural speech Enhancement
Two-Stage Refinement of Magnitude and Complex Spectra for Real-Time speech Enhancement
two-stage speech activity detection system considering fractal aspects of prosody, A
U-Former: Improving Monaural speech Enhancement with Multi-head Self and Cross Attention
UniEnc-CASSNAT: An Encoder-Only Non-Autoregressive ASR for speech SSL Models
Unified Training of Feature Extractor and HMM Classifier for speech Recognition
Unit Selection Using Linguistic, Prosodic and Spectral Distance for Developing Text-to-speech System in Hindi
Universum Autoencoder-Based Domain Adaptation for speech Emotion Recognition
Unpaired Image-to-speech Synthesis With Multimodal Information Bottleneck
Unpaired speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition
Unsupervised Cross-Corpus speech Emotion Recognition Using a Multi-Source Cycle-GAN
Unsupervised Feature Learning for speech Using Correspondence and Siamese Networks
Unsupervised Personalization of an Emotion Recognition System: The Unique Properties of the Externalization of Valence in speech
Unsupervised speech Activity Detection Using Voicing Measures and Perceptual Spectral Flux
Unsupervised speech Text Localization in Comic Images
Unsupervised Tibetan speech features Learning based on Dynamic Bayesian Networks
Use of Line Spectral Frequencies for Emotion Recognition from speech
Use of radial basis function network with discrete wavelet transform for speech enhancement
User Authentication System Based on speech and Cascade Hybrid Facial Feature
User Verification by Combining speech and Face Biometrics in Video
Using Adaptive Filter to Increase Automatic speech Recognition Rate in a Digit Corpus
Using Hand Gesture and speech in a Multimodal Augmented Reality Environment
Using Semantics to Automatically Generate speech Interfaces for Wearable Virtual and Augmented Reality Applications
Using speech for Handwritten Mathematical Expression Recognition Disambiguation
Using speech Input for Image Interpretation and Annotation
Utterance Verification-Based Dysarthric speech Intelligibility Assessment Using Phonetic Posterior Features
value of stories for speech-based video search, The
Variable-Length Speaker Conditioning in Flow-Based Text-to-speech
Vector quantization with memory and multi-labeling for isolated video-only automatic speech recognition
Vector Taylor series based model adaptation using noisy speech trained hidden Markov models
Vector-Based Feature Representations for speech Signals: From Supervector to Latent Vector
Vector-to-Vector Regression via Distributional Loss for speech Enhancement
Ventriloquist-Net: Leveraging speech Cues for Emotive Talking Head Generation
very low bit rate codec for wide band speech based on a long-term perceptual harmonic plus noise model, A
Video Augmentation for Improving Audio speech Recognition under Noise
Video Rewrite: Driving Visual speech with Audio
Video, Text, and speech-Driven Realistic 3-D Virtual Head for Human-Machine Interface, A
VisageSynTalk: Unseen Speaker Video-to-speech Synthesis via Speech-Visage Feature Selection
Vision Based speech Animation Transferring with Underlying Anatomical Structure
Visual display methods for in computer-animated speech production models
Visual prosody: facial movements accompanying speech
Visual Recognition of Activities, Gestures, Facial Expressions and speech: An Introduction and a Perspective
Visual Skeleton and Reparative Attention for Part-of-speech image captioning system
Visual speech Enhancement Without A Real Visual Stream
visual speech model based on fuzzy-neuro methods, A
Visual speech Recognition by Recurrent Neural Networks
Visual speech Recognition Method Using Translation, Scale and Rotation Invariant Features
Visual speech Recognition Using Dynamic Features And Support Vector Machines
Visual speech Recognition Using Motion Features and Hidden Markov Models
Visual speech Recognition Using PCA Networks and LSTMs in a Tandem GMM-HMM System
Visual speech Recognition Using Weighted Dynamic Time Warping
Visual speech Recognition with Loosely Synchronized Feature Streams
Visual speech Synthesis by Morphing Visemes
Visual speech Synthesis Using a Variable-Order Switching Shared Gaussian Process Dynamical Model
Visual speech, a trajectory in viseme space
Visual speech: A Physiological or Behavioural Biometric?
Visual-to-speech conversion based on maximum likelihood estimation
Visually Recognizing speech Using Eigen Sequences
VisualVoice: Audio-Visual speech Separation with Cross-Modal Consistency
Voice Conversion for Whispered speech Synthesis
Voice of Leadership: Models and Performances of Automatic Analysis in Online speeches, The
Voicing Detection in Noisy speech Signal
Watch or Listen: Robust Audio-Visual speech Recognition with Visual Corruption Modeling and Reliability Scoring
Watch to Listen Clearly: Visual speech Enhancement Driven Multi-modality Speech Recognition
Watermarking-Based Perceptual Hashing Search Over Encrypted speech
WavDepressionNet: Automatic Depression Level Prediction via Raw speech Signals
WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-End speech Enhancement
Waveform Interpolation-Based speech Analysis/Synthesis for HMM-Based TTS Systems
Wavelet speech Enhancement Based on Nonnegative Matrix Factorization
Wavelet-FILVQ classifier for speech analysis
WebVoice: A Toolkit for Perceptual Insights into speech Processing
Weight-Space Viterbi Decoding Based Spectral Subtraction for Reverberant speech Recognition
Whispered speech Detection in Noise Using Auditory-Inspired Modulation Spectrum Features
Whispered speech Detection Using Fusion of Group-Delay-Based Subband Modulation Spectrum and Correntropy Features
Wideband speech Encryption Based Arnold Cat Map for AMR-WB G.722.2 Codec
Word Segments in Category-Based Language Models for Automatic speech Recognition
Zero-Shot Keyword Spotting for Visual speech Recognition In-the-wild
1063 for speech

_speech2action_
speech2action: Cross-Modal Supervision for Action Recognition

_speech2face_
speech2face: Learning the Face Behind a Voice

_speech2lip_
speech2lip: High-fidelity Speech to Lip Generation by Learning from a Short Video

_speech2video_
speech2video Synthesis with 3d Skeleton Regularization and Expressive Body Poses

_speech4mesh_
speech4mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation

_speechreading_
Automatic speechreading with Applications to Human-Computer Interfaces
Lip Feature Extraction Towards an Automatic speechreading System
Robust face feature analysis for automatic speechreading and character animation
Selecting relevant visual features for speechreading
speechreading Using Probabilistic Models
speechreading: an overview of image processing, feature extraction, sensory integration and pattern recognition techniques



Last update: 2-May-24 21:06:23
Use price@usc.edu for comments.