Keith Price Bibliography kwic Details for speec

Index for speec

_ speech _

2.4kbps Multiband Characteristic Waveform Interpolation speech Coding Algorithm, A

2.5D Visual speech Synthesis Using Appearance Models

3-D Convolutional Recurrent Neural Networks With Attention Model for speech Emotion Recognition

3D Visual passcode: speech-driven 3D facial dynamics for behaviometrics

450bps speech Coding Algorithm Based on Multi-Mode Matrix Quantization, A

Accuracy, Apps Advance speech Recognition

Acoustic Analysis for Automatic speech Recognition

Acoustic echo cancellation for stereophonic systems derived from pairwise panning of monophonic speech

Acoustic Event Detection in speech Overlapping Scenarios Based on High-Resolution Spectral Input and Deep Learning

Acoustically Emotion-Aware Conversational Agent With speech Emotion Recognition and Empathetic Responses, The

Active Contour Model for speech Balloon Detection in Comics, An

Adaptation of Hidden Markov Models for Recognizing speech of Reduced Frame Rate

Adaptive Gain Control for Enhanced speech Intelligibility Under Reverberation

adaptive model of person identification combining speech and image information, An

Adaptive Signal Models for Wide-Band speech and Audio Compression

Adaptive speech Dereverberation Using Constrained Sparse Multichannel Linear Prediction

Adaptive speech enhancement with varying noise backgrounds

Adaptive speech Intelligibility Enhancement for Far-and-Near-end Noise Environments Based on Self-attention StarGAN

Adding Voicing Features into speech Recognition Based on HMM in Slovak

Advanced tools for speech synchronized animation

Adversarial Continual Learning to Transfer Self-Supervised speech Representations for Voice Pathology Detection

Adversarial Feature Learning and Unsupervised Clustering Based speech Synthesis for Found Data With Acoustic and Textual Noise

Adversarial Training Based speech Emotion Classifier With Isolated Gaussian Regularization, An

Affective Audio Annotation of Public speeches with Convolutional Clustering Neural Network

Affine-Invariant Visual Features Contain Supplementary Information to Enhance speech Recognition

Aging speech recognition with speaker adaptation techniques: Study on medium vocabulary continuous Bengali speech

AKVSR: Audio Knowledge Empowered Visual speech Recognition by Compressing Audio Knowledge of a Pretrained Model

Algorithms for syllabic hypothesization in continuous speech

Alias-and-Separate: Wideband speech Coding Using Sub-Nyquist Sampling and Speech Separation

Amazigh audiovisual speech recognition system design

Amazigh isolated word speech recognition system using the Adaptive Orthogonal Transform Method

Analysing acoustic model changes for active learning in automatic speech recognition

Analysis and Classification of Cold speech Using Variational Mode Decomposition

Analysis of Emotion Annotation Strength Improves Generalization in speech Emotion Recognition Models

Analysis of Lip Geometric Features for Audio-Visual speech Recognition

Analysis of stressed human speech

analysis of the effect of combining standard and alternate sensor signals on recognition of syllabic units for multimodal speech recognition, An

Analysis of the Multifractal Nature of speech Signals

Analysis of the Possibilities to Adapt the Foreign Language speech Recognition Engines for the Lithuanian Spoken Commands Recognition

Analysis of the Utility of Classical and Novel speech Quality Measures for Speaker Verification

Anchor Models for Emotion Recognition from speech

Animating visible speech and facial expressions

AnyoneNet: Synchronized speech and Talking Head Generation for Arbitrary Persons

Application of Capsule Neural Network Based CNN for speech Emotion Recognition, The

Application of digit and speech recognition in food delivery robot

Application of support vector machines classifiers to visual speech recognition

Application of triphone clustering in acoustic modeling for continuous speech recognition in Bengali

Application of wavelet transforms for C/V segmentation on Mandarin speech signals

ARawNet: A Lightweight Solution for Leveraging Raw Waveforms in Spoof speech Detection

Architecture for Automatic Lipreading to Enhance speech Recognition, An

Art Critic: Multisignal Vision and speech Interaction System in a Gaming Context

Articulatory speech Re-synthesis: Profiting from Natural Acoustic Speech Data

ASQ: An Ultra-Low Bit Rate ASR-Oriented speech Quantization Method

Assessing speaker independence on a speech-based depression level estimation system

Asymmetric 3D face model for speech Language Pathologist applications

Asymmetrically boosted HMM for speech reading

Attention Based Speaker-independent Audio-visual Deep Learning Model for speech Enhancement, An

Attention-based convolutional neural network and long short-term memory for short-term detection of mood disorders based on elicited speech responses

Attention-Based Dense LSTM for speech Emotion Recognition

Audio Based Real-Time speech Animation of Embodied Conversational Agents

Audio Classification in speech and Music: A Comparison Between a Statistical and a Neural Approach

Audio Watermarks, speech Watermarks

Audio-visual continuous speech recognition using MPEG-4 compliant visual features

Audio-Visual Efficient Conformer for Robust speech Recognition

Audio-Visual Person Authentication with Multiple Visualized-speech Features and Multiple Face Profiles

Audio-Visual speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis

Audio-Visual speech Fusion Using Coupled Hidden Markov Models

Audio-Visual speech Recognition Based on AAM Parameter and Phoneme Analysis of Visual Feature

Audio-Visual speech Recognition Scheme Based on Wavelets and Random Forests Classification

Audio-visual speech recognition techniques in augmented reality environments

Audio-Visual speech Recognition Using A Two-Step Feature Fusion Strategy

Audio-Visual speech Recognition Using MPEG-4 Compliant Visual Features

Audio-visual speech synchronization detection using a bimodal linear prediction model

Audio-Visual speech Synthesis Based on Chinese Visual Triphone

Audio2Gestures: Generating Diverse Gestures from speech Audio with Conditional Variational Autoencoders

Audiovisual Discrimination Between speech and Laughter: Why and When Visual Information Might Help

Audiovisual speech Source Separation: An overview of key methodologies

Audiovisual Talking Head for Augmented speech Generation: Models and Animations Based on a Real Speaker's Articulatory Data, An

Auditory Features Revisited for Robust speech Recognition

Autoencoder-based Unsupervised Domain Adaptation for speech Emotion Recognition

Automated Lip Synchronized speech Driven Facial Animation

Automated speech alignment for image synthesis

Automatic bi-modal emotion recognition system based on fusion of facial expressions and emotion extraction from speech

Automatic continuous speech recogniser for Dravidian languages using the auto associative neural network

Automatic Detection of Amyotrophic Lateral Sclerosis (ALS) from Video-Based Analysis of Facial Movements: speech and Non-Speech Tasks

Automatic Evaluation of Hypernasality and Consonant Misarticulation in Cleft Palate speech

Automatic Evaluation of speech Therapy Exercises Based on Image Data

Automatic Person Verification Using speech and Face Information

Automatic Selection of Visemes for Image-based Visual speech Synthesis

Automatic Sentence Modality Recognition in Children's speech, and Its Usage Potential in the Speech Therapy

Automatic speaker verification on narrowband and wideband lossy coded clean speech

Automatic speech discrete labels to dimensional emotional values conversion method

Automatic speech Emotion Recognition Using Auditory Models with Binary Decision Tree and SVM

Automatic Urdu speech Recognition using Hidden Markov Model

Automatic Video Annotation by Mining speech Transcripts

Automatic visual speech segmentation and recognition using directional motion history images and Zernike moments

AVFormer: Injecting Vision into Frozen speech Models for Zero-Shot AV-ASR

Avoiding dominance of speaker features in speech-based depression detection

AWLloss: Speaker Verification Based on the Quality and Difficulty of speech

Bandwidth-adjusted LPC analysis for robust speech recognition

Bayesian Predictive Method for Automatic speech Segmentation, A

Bayesian reasoning on qualitative descriptions from images and speech

Beam-search Formant Tracking Algorithm Based on Trajectory Functions for Continuous speech

Beamforming Algorithm Based on Maximum Likelihood of a Complex Gaussian Distribution With Time-Varying Variances for Robust speech Recognition, A

Behavioral Signal Processing: Deriving Human Behavioral Informatics From speech and Language

Benchmarking classification models for emotion recognition in natural speech: A multi-corporal study

Bilingual speech Recognition by Estimating Speaker Geometry from Video Data

Bimodal fusion in audio-visual speech recognition

Biological Motion of speech

Blind Adaptive Mask to Improve Intelligibility of Non-Stationary Noisy speech

Blind Source Separation Based Approach for speech Enhancement in Noisy and Reverberant Environment, A

Boosted audio-visual HMM for speech reading

Building Naturalistic Emotionally Balanced speech Corpus by Retrieving Emotional Speech from Existing Podcast Recordings

cache-based natural language model for speech recognition, A

Can we Automatically Transform speech Recorded on Common Consumer Devices in Real-World Environments into Professional Production Quality Speech?: A Dataset, Insights, and Challenges

Can We Read speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition

Cancellable speech template via random binary orthogonal matrices projection hashing

Cascade Image Transform for Speaker Independent Automatic speech Reading, A

Casual chatter or speaking up? Adjusting articulatory effort in generation of speech and animation for conversational characters

Casual Conversations v2 Dataset: A diverse, large benchmark for measuring fairness and robustness in audio/vision/speech models, The

CAT-DUnet: Enhancing speech Dereverberation via Feature Fusion and Structural Similarity Loss

CATNet: Cross-modal fusion for audio-visual speech recognition

Chunk-Level speech Emotion Recognition: A General Framework of Sequence-to-One Dynamic Temporal Modeling

CIF-Based speech Segmentation Method for Streaming E2E ASR, A

Class Confusability Reduction in Audio-Visual speech Recognition Using Random Forests

Classification of Complex Information: Inference of Co-Occurring Affective States from Their Expressions in speech

Classifier-Based Learning of Nonlinear Feature Manifold for Visualization of Emotional speech Prosody

clump splitting based method to localize speech balloons in comics, A

Clustering Algorithm for the Fast Match of Acoustic Conditions in Continuous speech Recognition, A

Co-speech Gesture Detection through Multi-Phase Sequence Labeling

Co-speech Gesture Synthesis by Reinforcement Learning with Contrastive Pretrained Rewards

CodeTalker: speech-Driven 3D Facial Animation with Discrete Motion Prior

Combined Handwriting and speech Modalities for User Authentication

Combining Deep and Unsupervised Features for Multilingual speech Emotion Recognition

Combining handwriting and speech recognition for transcribing historical handwritten documents

Combining speech and Handwriting Modalities for Mathematical Expression Recognition

Combining speech energy and edge information for fast and efficient voice activity detection in noisy environments

Communicative Rhythm in Gesture and speech

Compact and Efficient Multitask Learning in Vision, Language and speech

Compact Representation of Visual speech Data Using Latent Variables, A

Comparative Experiments to Evaluate the Use of Syllables for the Improvement of Automatic Recognition of Dysarthric speech

Comparing Multiple Classifiers for speech-Based Detection of Self-Confidence: A Pilot Study

Comparison of Active Shape Model and Scale Decomposition Based Features for Visual speech Recognition, A

Comparison of Image Transform-Based Features for Visual speech Recognition in Clean and Corrupted Videos

Comparison of MPEG-4 Facial Animation Parameter Groups with Respect to Audio-Visual speech Recognition Performance

Comparison of Phoneme and Viseme Based Acoustic Units for speech Driven Realistic lip Animation

Complex Neural Spatial Filter: Enhancing Multi-Channel Target speech Separation in Complex Domain

computationally compact divergence measure for speech processing, A

Computer Assisted Transcription of speech

Concatenated Frame Image Based CNN for Visual speech Recognition

Conceptual and Lexical Factors in the Production of speech and Conversational Gestures: Neuropsychological Evidence

Conditional Random Fields in speech, Audio, and Language Processing

ConflictNET: End-to-End Learning for speech-Based Conflict Intensity Estimation

Connecting Subspace Learning and Extreme Learning Machine in speech Emotion Recognition

Constant-Q magnitude-phase coefficients extraction for synthetic speech detection

Constrained MMSE LP Residual Estimator for speech Dereverberation in Noisy Environments, A

Constructing speech processing systems on universal phonetic codes accompanied with reference acoustic models

Contextual and Cross-Modal Interaction for Multi-Modal speech Emotion Recognition

Contextual vector quantization for speech recognition with discrete hidden Markov model

Continual Learning for Personalized Co-speech Gesture Generation

Continuous Audio-Visual speech Recognition

Continuous Automatic speech Recognition by Lipreading

Continuous Estimation of Emotions in speech by Dynamic Cooperative Speaker Models

Continuous speech coding using coiflets wavelet

Controllable Multi-Lingual Multi-Speaker Multi-Style Text-to-speech Synthesis With Multivariate Information Minimization, A

Conversational Evaluation of speech Bandwidth Extension Using a Mobile Handset

Conversion of neutral speech to storytelling style speech

Convolutional Network With Multi-Scale and Attention Mechanisms for End-to-End Single-Channel speech Enhancement, A

Convolutional Neural Networks for Distant speech Recognition

Correlation based speech-video synchronization

coupled HMM approach to video-realistic speech animation, A

Creating 3D speech-driven talking heads: a probabilistic network approach

CrisisHateMM: Multimodal Analysis of Directed and Undirected Hate speech in Text-Embedded Images from Russia-Ukraine Conflict

CroMM-VSR: Cross-Modal Memory Augmented Visual speech Recognition

Cross-Corpus speech Emotion Recognition Based on Domain-Adaptive Least-Squares Regression

Cross-Corpus speech Emotion Recognition Based on Few-Shot Learning and Domain Adaptation

Cross-Modal Analysis of speech, Gestures, Gaze and Facial Expressions

Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional speech Synthesis

Cryptographic-speech-Key Generation Architecture Improvements

Cued speech Gesture Recognition: A First Prototype Based on Early Reduction

CWT-Based Approach for Epoch Extraction From Telephone Quality speech

Cyclic Defense GAN Against speech Adversarial Attacks

Cyclic Transfer Learning for Mandarin-English Code-Switching speech Recognition

Czech Spontaneous speech Collection and Annotation: The Database of Technical Lectures

Dar speech: An Automatic Speech Recognition System for the Moroccan Dialect

Data-Driven Jacobian Adaptation in a Multi-model Structure for Noisy speech Recognition

Dawn of the Transformer Era in speech Emotion Recognition: Closing the Valence Gap

DBATES: Dataset for Discerning Benefits of Audio, Textual, and Facial Expression Features in Competitive Debate speeches

DBN-based Spectral Feature Representation for Statistical Parametric speech Synthesis

Decision Level Fusion for Audio-Visual speech Recognition in Noisy Conditions

Deep Audio-Visual speech Recognition

Deep Belief Networks for Real-Time Extraction of Tongue Contours from Ultrasound During speech

Deep Cross-Modal Retrieval Between Spatial Image and Acoustic speech

Deep Hybrid Approach for Hate speech Analysis, A

Deep Learning for Acoustic Modeling in Parametric speech Generation: A systematic review of existing techniques and future trends

Deep Learning for Emotional speech Recognition

Deep Learning Loss Function Based on the Perceptual Evaluation of the speech Quality, A

DeepComboSAD: Spectro-Temporal Correlation Based speech Activity Detection for Naturalistic Audio Streams

Defining Laughter Context for Laughter Synthesis with Spontaneous speech Corpus

DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel speech Enhancement

Demonstration of an HMM-based photorealistic expressive audio-visual speech synthesis system

Dense Convolutional Recurrent Neural Network for Generalized speech Animation

Detecting Aggression in Voice Using Inverse Filtered speech Features

Detecting Multiple Steganography Methods in speech Streams Using Multi-Encoder Network

Detecting Parkinson's disease with sustained phonation and speech signals using machine learning techniques

Detecting Unipolar and Bipolar Depressive Disorders from Elicited speech Responses Using Latent Affective Structure Model

Detection of a Speaker in Video by Combined Analysis of speech Sound and Mouth Movement

Detection of COVID-19 from speech signal using bio-inspired based cepstral features

Detection of Dynamic Structures of speech Fundamental Frequency in Tonal Languages

Detection of Vowel Offset Point From speech Signal

Device and method for dubbing an audio-visual presentation which generates synthesized speech and corresponding facial movements

Differentiable Mean Opinion Score Regularization for Perceptual speech Enhancement

DiffMotion: speech-Driven Gesture Synthesis Using Denoising Diffusion Model

DiffV2S: Diffusion-based Video-to-speech Synthesis with Vision-guided Speaker Embedding

Diphone spanish text-to-speech synthesizer

Direct Text to speech Translation System Using Acoustic Units

Disambiguation in Unknown Object Detection by Integrating Image and speech Recognition Confidences

Discriminating Unknown Objects from Known Objects Using Image and speech Information

Discrimination Between Native and Non-Native speech Using Visual Features Only

Discriminative Analysis of Lip Motion Features for Speaker Identification and speech-Reading

Discriminative Capacity and Phonetic Information of Bottleneck Features in speech

Discriminative feature extraction for speech recognition using continuous output codes

Discriminative Frequency Information Learning for End-to-End speech Anti-Spoofing

Discriminative Multi-Modality speech Recognition

Discriminative Training of NMF Model Based on Class Probabilities for speech Enhancement

Distilled non-semantic speech embeddings with binary neural networks for low-resource devices

Distributed Audio Network for speech Enhancement in Challenging Noise Backgrounds

Distributed Microphones speech Separation by Learning Spatial Information With Recurrent Neural Network

Djinn: Interaction Framework for Home Environment Using speech and Vision

DNN-Based Feature Enhancement Using DOA-Constrained ICA for Robust speech Recognition

DNN-Based Feature Extraction for Conflict Intensity Estimation From speech

Does Visual Self-Supervision Improve Learning of speech Representations for Emotion Recognition?

DR2: Disentangled Recurrent Representation Learning for Data-efficient speech Video Synthesis

Dynamic 3-D Visualization of Vocal Tract Shaping During speech

Dynamic Bayesian Networks for Audio-Visual speech Recognition

Dynamic versus Static Facial Expressions in the Presence of speech

Dynamic-static Cross Attentional Feature Fusion Method for speech Emotion Recognition

E2E-V2SResNet: Deep residual convolutional neural networks for end-to-end video driven speech synthesis

Effect of Various Visual speech Units on Language Identification Using Visual Speech Recognition

Effective online unsupervised adaptation of Gaussian mixture models and its application to speech classification

Effective Style Token Weight Control Technique for End-to-End Emotional speech Synthesis, An

Effectiveness of Mel Scale-Based ESA-IFCC Features for Classification of Natural vs. Spoofed speech

Efficient Framework for Constructing speech Emotion Corpus Based on Integrated Active Learning Strategies, An

Efficient Gaussian Mixture for speech Recognition

Efficient Generation of speech Adversarial Examples with Generative Model

Efficient HMM-Based Feature Enhancement Method With Filter Estimation for Reverberant speech Recognition, An

Efficient One-Pass Decoding with NNLM for speech Recognition

Efficient Representation Learning for Inner speech Domain Generalization

Efficient Sparse Banded Acoustic Models for speech Recognition

Efficient text analyser with prosody generator-driven approach for Mandarin text-to-speech

Efficient use of the grammar scale factor to classify incorrect words in speech recognition verification

Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-Resource speech Recognition

EmoNet: A Transfer Learning Framework for Multi-Corpus speech Emotion Recognition

EmoTalk: speech-Driven Emotional Disentanglement for 3D Face Animation

Emotion Dependent Domain Adaptation for speech Driven Affective Facial Feature Synthesis

Emotion recognition from speech signals via a probabilistic echo-state network

Emotion Recognition of Affective speech Based on Multiple Classifiers Using Acoustic-Prosodic Information and Semantic Labels

Emotional speech Analysis on Nonlinear Manifold

Emotional speech Classification Based on Multi View Characterization

Emotional speech Clustering Based Robust Speaker Recognition System

Emotional speech Recognition Using Acoustic Models of Decomposed Component Words

End-to-End Audiovisual speech Recognition System With Multitask Learning

End-to-End Dual-Branch Network Towards Synthetic speech Detection

End-to-End Pathological speech Detection Using Wavelet Scattering Network

End-to-end Triplet Loss based Emotion Embedding System for speech Emotion Recognition

End-to-End Video-to-speech Synthesis Using Generative Adversarial Networks

End-to-end visual speech recognition for small-scale datasets

Enhanced VQ-Based Algorithms for speech Independent Speaker Identification

Enhancement of Spectral Tilt in Synthesized speech

Enhancing Emotion Classification Through speech and Correlated Emotional Sounds via a Variational Auto-Encoder Model with Prosodic Regularization

Enhancing Frequency Shifted speech Signals in Single Side-Band Communication

EPG2S: speech Generation and Speech Enhancement Based on Electropalatography and Audio Signals Using Multimodal Learning

Error Mitigation Technique for Erasure Channels Based on a Wavelet Representation of the speech Excitation Signal, An

Error-Diffusion Based speech Feature Quantization for Small-Footprint Keyword Spotting

ESAformer: Enhanced Self-Attention for Automatic speech Recognition

Estimating speech Spectral Amplitude Based on the Nakagami Approximation

Estimation of Rapidly Time-Varying Harmonic Noise for speech Enhancement

Evaluation of Head Gaze Loosely Synchronized With Real-Time Synthetic speech for Social Robots

Evaluation of speech Emotion Classification Based on GMM and Data Fusion

Evaluation of the Concatenative Turkish Text-to-speech System

Evaluation of Visual speech Features for the Tasks of Speech and Speaker Recognition, An

experimental study of energy dips for speech and music, An

Experimental Study on speech Enhancement Based on Deep Neural Networks, An

Experimental Study on Transfer Learning in Denoising Autoencoders for speech Enhancement

Experiments in dynamic programming inference of Markov networks with strings representing speech data

Explainability of speech Recognition Transformers via Gradient-Based Attention Visualization

Exploiting alternative acoustic sensors for improved noise robustness in speech communication

Exploiting speech for Automatic TV Delinearization: From Streams to Cross-Media Semantic Navigation

Exploiting speech/Gesture Co-occurrence for Improving Continuous Gesture Recognition in Weather Narration

Exploring Co-Occurence Between speech and Body Movement for Audio-Guided Video Localization

Exploring Hate speech Detection in Multimodal Publications

Exploring speech Features for Classifying Emotions along Valence Dimension

Exploring the Topics of Audio Words for Detecting Alzheimer's Disease From Spontaneous speech

Exploring Zero-Shot Emotion Recognition in speech Using Semantic-Embedding Prototypes

Expression-Preserving Face Frontalization Improves Visually Assisted speech Processing

Expressive Facial Animation Synthesis by Learning speech Coarticulation and Expression Spaces

Expressive Modulation of Neutral Visual speech

Expressive speech-Driven Lip Movements with Multitask Learning

Expressive visual text-to-speech as an assistive technology for individuals with autism spectrum conditions

Expressive Visual Text-to-speech Using Active Appearance Models

Extended Decision Tree with or Relationship for HMM-Based speech Synthesis

Extension of proposal of standards for intelligibility tests of Chinese speech: CDRT-tone

Extracting High Level Semantics by Means of speech, Audio, and Image Primitives in Surveillance Applications

F0 Parameterization of Glottalized Tones in HMM-Based speech Synthesis for Hanoi Vietnamese

FaceFormer: speech-Driven 3D Facial Animation with Transformers

Facial 3D Shape Estimation from Images for Visual speech Animation

Facial Expression Recognition in the Presence of speech Using Blind Lexical Compensation

Factorized MVDR Deep Beamforming for Multi-Channel speech Enhancement

Factors in Emotion Recognition With Deep Learning Models Using speech and Text on Multiple Corpora

Far-Field Automatic speech Recognition

Fast Object Class Labelling via speech

Fast, Diverse and Accurate Image Captioning Guided by Part-Of-speech

Feature Denoising Using Joint Sparse Representation for In-Car speech Recognition

Feature optimisation for stress recognition in speech

Feature Pooling of Modulation Spectrum Features for Improved speech Emotion Recognition in the Wild

Feature Selection Based Transfer Subspace Learning for speech Emotion Recognition

Feature selection methods for hidden Markov model-based speech recognition

Feature space video stream consistency estimation for dynamic stream weighting in audio-visual speech recognition

Features extraction and selection for emotional speech classification

Few-Shot Learning in Emotion Recognition of Spontaneous speech Using a Siamese Neural Network With Adaptive Sample Pair Formation

Finding Lips in Unconstrained Imagery for Improved Automatic speech Recognition

Fine-Grained Action Retrieval Through Multiple Parts-of-speech Embeddings

First degree heart block determination from speech analysis

Frame-synchronous noise compensation for hands-free speech recognition in car environments

From Bottom to Top: A Coordinated Feature Representation Method for speech Recognition

From speech Quality Measures to Speaker Recognition Performance

From Text to speech: A Multimodal Cross-Domain Approach for Deception Detection

FSCNet: Feature-Specific Convolution Neural Network for Real-Time speech Enhancement

FSER: Deep Convolutional Neural Networks for speech Emotion Recognition

Fundamental Technologies in Modern speech Recognition

Furcanext: End-to-end Monaural speech Separation with Dynamic Gated Dilated Temporal Convolutional Networks

Fused speech Enhancement Framework for Robust Speaker Verification, A

Fusing Audio and Visual Features of speech

Fusion of Audio-Visual Information for Integrated speech Processing

Fusion of Face and speech Data for Person Identity Verification

Fusion of speech, Faces and Text for Person Identification in TV Broadcast

Fuzzy integral based information fusion for classification of highly confusable non-speech sounds

Fuzzy rule selection using Iterative Rule Learning for speech data classification

GA Approaches to HMM Optimization for Automatic speech Recognition

Gabor Filterbank Features for Robust speech Recognition

Gammatone Cepstral Coefficients: Biologically Inspired Features for Non-speech Audio Classification

GAN-in-GAN for Monaural speech Enhancement

Gaussian Specific Compensation for Channel Distortion in speech Recognition

Gender classification in two Emotional speech databases

Generalized Two-Stage Rank Regression Framework for Depression Score Prediction from speech

Generating Co-speech Gestures for the Humanoid Robot NAO through BML

Generating Holistic 3D Human Motion from speech

Generating Personalized Virtual Agent in speech Dialogue System for People with Dementia

Generating realistic facial animation from speech

Generating Transferable Adversarial Examples for speech Classification

Genetic Algorithm-Based Adaptive Wiener Gain for speech Enhancement Using an Iterative Posterior NMF

geostatistical model for linear prediction analysis of speech, A

GesRec3D: A Real-Time Coded Gesture-to-speech System with Automatic Segmentation and Recognition Thresholding Using Dissimilarity Measures

Gesture, speech, and Gaze Cues for Discourse Segmentation

Gestures and Lip Shape Integration for Cued speech Recognition

Global Variance in speech Synthesis With Linear Dynamical Models

Graphical speech Training system for hearing impaired

Group Delay based Methods for Detection and Recognition of Whispered speech

GRU-SVM Model for Synthetic speech Detection

Guest Editorial: Special Issue on Affective speech and Language Synthesis, Generation, and Conversion

GUI for interactive speech synthesis

Harmonic Enhancement with Noise Reduction of speech Signal by Comb Filtering

Head Movements in Context of speech during Stress Induction

Hidden Bawls, Whispers, and Yelps: Can Text Convey the Sound of speech, Beyond Words?

Hidden Conditional Random Fields for Visual speech Recognition

Hierarchical Bayesian combination of plug-in maximum a posteriori decoders in deep neural networks-based speech recognition and speaker adaptation

hierarchical Bayesian model for continuous speech recognition, A

Hierarchical speech-act classification for discourse analysis

hierarchical tag-graph search scheme with layered grammar rules for spontaneous speech understanding, A

High-frame-rate real-time imaging of speech production

Higher Order Subspace Algorithm for Multichannel speech Enhancement, A

Highly Transparent Steganography Scheme of speech Signals into Color Images Using Quantization Index Modulation

Historical Perspective of speech Recognition, A

HMM based speech -driven 3D tongue animation

HNM-Based Speaker-Nonspecific Timbre Transformation Scheme for speech Synthesis, An

Hough transform-based mouth localization for audio-visual speech recognition

Human emotion recognition by optimally fusing facial expression and speech feature

hybrid approach to improve part of speech tagging system, An

Hybrid Autoregressive and Non-Autoregressive Transformer Models for speech Recognition

Hybrid HMM-Based speech Recognizer Using Kernel-Based Discriminants as Acoustic Models, A

Hybrid PNN-GMM classification scheme for speech emotion recognition, A

hybrid SVM/DDBHMM decision fusion modeling for robust continuous digital speech recognition, A

hybrid visual feature extraction method for audio-visual speech recognition, A

IBM Rich Transcription 2007 speech -to-Text Systems for Lecture Meetings, The

IDANet: An Information Distillation and Aggregation Network for speech Enhancement

IEEE Acoustics, speech , and Signal Processing Magazine

IEEE Trans. Acoustics, speech , and Signal Processing

Image Caption Generation with Part of speech Guidance

Image-Based Visual speech Animation System, An

Image-Sensitive Language Modeling for Automatic speech Recognition

Image- speech combination for interactive computer assisted transcription of handwritten documents

Imitator: Personalized speech -driven 3D Facial Animation

Impact of imperfect OCR on part-of- speech tagging

Impact of OCR Errors on Automated Classification of OCR Japanese Texts with Parts-of- speech Analysis, An

Impact of Reduced Video Quality on Visual speech Recognition, The

Implantation of voicing on whispered speech using frequency-domain parametric modelling of source and filter information

Implementation of Three Text to speech Systems for Kurdish Language

Implicit Compositional Generative Network for Length-Variable Co- speech Gesture Synthesis

Improve Word Mover's Distance with Part-of- speech Tagging

improved maximum model distance approach for HMM-based speech recognition systems, An

Improved speech Reconstruction from Silent Video

Improvement of speech emotion recognition with neural network classifier by using speech spectrogram

Improvements on Automatic speech Segmentation at the Phonetic Level

Improving and Aligning speech with Presentation Slides

Improving Children's speech Recognition by HMM Interpolation with an Adults' Speech Recognizer

Improving Cross-Corpus speech Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG)

Improving End-to-End Contextual speech Recognition via a Word-Matching Algorithm With Backward Search

Improving Frame-Online Neural speech Enhancement With Overlapped-Frame Prediction

Improving GANs for speech Enhancement

Improving Mandarin End-to-End speech Recognition With Word N-Gram Language Model

Improving Monaural speech Enhancement by Mapping to Fixed Simulation Space With Knowledge Distillation

Improving Multimodal speech Recognition by Data Augmentation and Speech Representations

Improving speech Related Facial Action Unit Recognition by Audiovisual Information Fusion

Improving the Classification of Volcanic Seismic Events Extracting New Seismic and speech Features

Improving the Performance of Deep Learning Based speech Enhancement System Using Fuzzy Restricted Boltzmann Machine

Improving the speech Quality of VoIP by Packet Prioritization

Increasing Compactness of Deep Learning Based speech Enhancement Models With Parameter Pruning and Quantization Techniques

Incremental Text-to- speech Synthesis Using Pseudo Lookahead With Large Pretrained Language Model

Individual 3d Face Synthesis Based on Orthogonal Photos and speech -driven Facial Animation

Individualized Super-Gaussian Single Microphone speech Enhancement for Hearing Aid Users With Smartphone as an Assistive Device, An

Inducing Genuine Emotions in Simulated speech -Based Human-Machine Interaction: The NIMITEK Corpus

Influence of Hangover and Hangbefore Criteria on Automatic speech Recognition

Influence of speech /Non-Speech Segmentation on On-Line and Off-Line Speaker Segmentation Accuracy, The

Information Fusion and Person Verification Using speech and Face Information

Information-Extraction Approach to speech Processing: Analysis, Detection, Verification, and Recognition, An

Instrumental Assessment of Prosodic Quality for Text-to- speech Signals

Integrated analysis of speech and images as a probabilistic decoding process

Integrated Mining of Visual Features, speech Features, and Frequent Patterns for Semantic Video Annotation

Integrated neural network model for identifying speech acts, predicators, and sentiments of dialogue utterances

Integrating Binary Mask Estimation With MRF Priors of Cochleagram for speech Separation

Integrating Part of speech Guidance for Image Captioning

Integration of Vision and speech Understanding Using Bayesian Networks

Intelligibility Enhancement Via Normal-to-Lombard speech Conversion With Long Short-Term Memory Network and Bayesian Gaussian Mixture Model

Intelligibility improvements using binaural diverse sub-band processing applied to speech corrupted with automobile noise

Intelligibility of Children with Cleft Lip and Palate: Evaluation by speech Recognition Techniques

Inter-frame contextual modelling for visual speech recognition

Interaction between speech and Gesture: Strategies for Pointing to Distant Objects

Interaction framework for home environment using speech and vision

Interaction of Iconic Gesture and speech in Talk, The

Interaction With Gaze, Gesture, and speech in a Flexibly Configurable Augmented Reality System

Interdependencies among Voice Source Parameters in Emotional speech

Interference Reduction in Reverberant speech Separation With Visual Voice Activity Detection

Intra-Predictive Switched Split Vector Quantization of speech Spectra

Introduction to the Special Issue: Advances on pattern recognition for speech and audio processing

Investigation into Audiovisual speech Correlation in Reverberant Noisy Environments, An

Investigation of Partition-Based and Phonetically-Aware Acoustic Features for Continuous Emotion Prediction from speech , An

Investigation of speech Landmark Patterns for Depression Detection

Invited paper: Automatic speech recognition: History, methods and challenges

ISL RT-07 speech -to-Text System, The

Isolate speech Recognition Based on Time-Frequency Analysis Methods

Isolated word recognition by neural network models with cross-correlation coefficients for speech dynamics

Iterative Closed-Loop Phase-Aware Single-Channel speech Enhancement

Iterative Feature Normalization Scheme for Automatic Emotion Detection from speech

Joint Bayesian Estimation of Time-Varying LP Parameters and Excitation for speech

KAN-AV dataset for audio-visual face and speech analysis in the wild

Kernel Eigenvoices (Revisited) for Large-Vocabulary speech Recognition

Key Frame Mechanism for Efficient Conformer Based End-to-End speech Recognition

Keyword Detection for Spontaneous speech

Kinect Development Kit: A Toolkit for Gesture- and speech -Based Human-Machine Interaction

Language-Independent OCR Using a Continuous speech Recognition System

Large Vocabulary Audio-visual speech Recognition Using Active Shape Models

Large Vocabulary Audio-Visual speech Recognition Using the Janus Speech Recognition Toolkit

Large Vocabulary Continuous speech Recognition With Reservoir-Based Acoustic Models

Large-Vocabulary Continuous speech Recognition Systems: A Look at Some Recent Advances

Late pre-dereverberation for speech intelligibility enhancement in public address systems

Latency in speech Feature Analysis for Telepresence Event Coding

Learning Contextually Fused Audio-Visual Representations for Audio-Visual speech Recognition

Learning Continuous Facial Actions From speech for Real-Time Animation

Learning Hierarchical Cross-Modal Association for Co- speech Gesture Generation

Learning Individual Speaking Styles for Accurate Lip to speech Synthesis

Learning Landmarks Motion from speech for Speaker-agnostic 3d Talking Heads Generation

Learning Salient Features for speech Emotion Recognition Using Convolutional Neural Networks

Learning Speaker-specific Lip-to- speech Generation

Learning Torso Prior for Co- speech Gesture Generation with Better Hand Shape

Learning Visual speech

Learning With Learned Loss Function: speech Enhancement With Quality-Net to Improve Perceptual Evaluation of Speech Quality

Letter-To-Sound conversion for speech synthesizer

Leveraging Non-Causal Knowledge via Cross-Network Knowledge Distillation for Real-Time speech Enhancement

LFEformer: Local Feature Enhancement Using Sliding Window With Deformability for Automatic speech Recognition

Linked Source and Target Domain Subspace Feature Transfer Learning -- Exemplified by speech Emotion Recognition

Lip Movement Synthesis from speech Based on Hidden Markov Models

Lip Reading for Low-resource Languages by Learning and Combining General speech Knowledge and Language-specific Knowledge

Lip Shape and Hand Position Fusion for Automatic Vowel Recognition in Cued speech for French

Lip2Vec: Efficient and Robust Visual speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping

Listen and Look: Audio-Visual Matching Assisted speech Source Separation

Listening with Your Eyes: Towards a Practical Visual speech Recognition System Using Deep Boltzmann Machines

Lite-RTSE: Exploring a Cost-Effective Lite DNN Model for Real-Time speech Enhancement in RTC Scenarios

LivelySpeaker: Towards Semantic-Aware Co- speech Gesture Generation

LM-VC: Zero-Shot Voice Conversion via speech Generation Based on Language Models

Localizing Fake Segments in speech

Locally Normalized Filter Banks Applied to Deep Neural-Network-Based Robust speech Recognition

Locating and Tracking Facial speech Features

Long-Frame-Shift Neural speech Phase Prediction With Spectral Continuity Enhancement and Interpolation Error Compensation

Look&listen: Multi-Modal Correlation Learning for Active Speaker Detection and speech Enhancement

Looking into Your speech : Learning Cross-modal Affinity for Audio-visual Speech Separation

Low-Complexity Parabolic Lip Contour Model With Speaker Normalization for High-Level Feature Extraction in Noise-Robust Audiovisual speech Recognition, A

Low-Rank and Sparsity Analysis Applied to speech Enhancement Via Online Estimated Dictionary

Low-Resource Adaptation for Personalized Co- speech Gesture Generation

M3TTS: Multi-modal text-to- speech of multi-scale style control for dubbing

Mandarin Emotional speech Recognition Based on SVM and NN

Mandarin Text-to- speech Front-End With Lightweight Distilled Convolution Network

Marathi Language speech Synthesizer Using Concatenative Synthesis Strategy (Spoken in Maharashtra, India)

Markov random field model for automatic speech recognition, A

Mathematical Modeling of the Effects of speech Warning Characteristics on Human Performance and Its Application in Transportation Cyberphysical Systems

maximum model distance approach for HMM-based speech recognition, A

Maximum Phase Modeling for Sparse Linear Prediction of speech

Memory Attention: Robust Alignment Using Gating Mechanism for End-to-End speech Synthesis

MES-P: An Emotional Tonal speech Dataset in Mandarin with Distal and Proximal Labels

MeshTalk: 3D Face Animation from speech using Cross-Modality Disentanglement

Method and apparatus for producing audio-visual synthetic speech

Method and apparatus for synthetic speech in facial animation

Methodology for Acoustic Characterization of a Labial Constraint in speech Production

Methods and devices for producing and using synthetic visual speech based on natural coarticulation

Micro-Doppler Classification for Ground Surveillance Radar Using speech Recognition Tools

Microphone Array Processing Strategies for Distant-Based Automatic speech Recognition

Minimized Database of Unit Selection in Visual speech Synthesis without Loss of Naturalness

MixCycle: Unsupervised speech Separation via Cyclic Mixture Permutation Invariant Training

Mixed bayesian networks with auxiliary variables for automatic speech recognition

Mix speech : Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition

Mixture of Factor Analyzers Using Priors From Non-Parallel speech for Voice Conversion

Mixture of Support Vector Machines for HMM based speech Recognition

Mixtures of Local Dictionaries for Unsupervised speech Enhancement

Model-Based Localization Method by Non- speech Sound Via Wavelet Transform and Dynamic Neural Network

Modeling and Synthesis of Facial Motion Driven by speech

Modeling Feature Representations for Affective speech Using Generative Adversarial Networks

Modeling human activities as speech

Modeling of Physical Characteristics of speech under Stress

Modeling Syllable-Based Pronunciation Variation for Accented Mandarin speech Recognition

Modeling the Temporal Evolution of Acoustic Parameters for speech Emotion Recognition

Modeling Vocal Entrainment in Conversational speech Using Deep Unsupervised Learning

Modelling and combining emotions, visual speech and gestures in virtual head models

Modelling Combined Handwriting and speech Modalities

Models for the Perception of speech and Visual Form

Mono-font Cursive Arabic Text Recognition Using speech Recognition System

More than Words: In-the-Wild Visually-Driven Prosody for Text-to- speech

Moroccan Dialect speech Recognition System Based on CMU SphinxTools

Morpheme-Based Automatic speech Recognition of Basque

Morphological normalization of vowel images for articulatory speech recognition

Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal speech Corpus

Multi-environment model adaptation based on vector Taylor series for robust speech recognition

Multi-Font Off-Line Arabic Character Recognition Using the BBN Byblos speech Recognition System

Multi-Label speech Emotion Recognition via Inter-Class Difference Loss Under Response Residual Network

Multi-layer encoder-decoder time-domain single channel speech separation

Multi-lingual and Multi-modal speech Processing and Applications

Multi-Modal Human Verification Using Face and speech

Multi-modal information retrieval from broadcast video using OCR and speech recognition

Multi-modality Associative Bridging through Memory: speech Sound Recollected from Face Video

Multi-task multimodal feature refinement for emotional speech animation

Multi-Task Semi-Supervised Adversarial Autoencoding for speech Emotion Recognition

Multi-view visual speech recognition based on multi task learning

Multichannel filters for speech recognition using a particle swarm optimization

Multilevel Integration of Vision and speech Understanding Using Bayesian Networks

Multimedia Document Retrieval Using speech and Speaker Recognition

Multimodal biometric authentication using speech and hand geometry fusion

Multimodal Biometric System Using Fingerprint, Face and speech , A

Multimodal Database of Emotional speech , Video and Gestures

Multimodal Emotion Recognition Based on speech and Physiological Signals Using Deep Neural Networks

Multimodal Interface Framework for Using Hand Gestures and speech in Virtual Environment Applications, A

Multimodal person authentication using speech , face and visual speech

Multiple classifier applied on predicting microsleep from speech

Multiple statistical models for soft decision in noisy speech enhancement

Multistream Articulatory Feature-Based Models for Visual speech Recognition

Multistream Recognition of speech : Dealing With Unknown Unknowns

Multitapering and a wavelet variant of MFCC in speech recognition

Multitask Learning From Augmented Auxiliary Data for Improving speech Emotion Recognition

Multivariate Autoregressive Spectrogram Modeling for Noisy speech Recognition

Mutual Alignment between Audiovisual Features for End-to-End Audiovisual speech Recognition

Mutual-optimization Towards Generative Adversarial Networks For Robust speech Recognition

Nested U-Net With Self-Attention and Dense Connectivity for Monaural speech Enhancement, A

Neural Emotion Director: speech -preserving semantic control of facial expressions in in-the-wild videos

Neural network-based adaptive noise cancellation for enhancement of speech auditory brainstem responses

Neurally Optimized Decoder for Low Bitrate speech Codec

New Approach to Fourier Synthesis With Application to Neural Encoding and speech Classification, A

New Approach to Integrate Audio and Visual Features of speech , A

new approach to speech -input statistical translation, A

New Encoding Algorithm for Distributed speech Recognition Based on DTFS Transform

New feature weighting approaches for speech -act classification

New Insights into the Kalman Filter Beamformer: Applications to speech and Robustness

New Manifold Representation for Visual speech Recognition, A

New Parameter of speech Character Based on the Bloomfield's Model, A

New single-ended objective measure for non-intrusive speech quality evaluation

New Visual speech Recognition Approach for RGB-D Cameras, A

NMF-Based speech Enhancement Using Bases Update

Noise Adaptive Stream Weighting in Audio-Visual speech Recognition

Noise compensation in a person verification system using face and multiple speech features

Noise Robust Front-end for speech Recognition Using Hough Transform and Cumulative Distribution Mapping, A

Noise-Adaptive LDA: A New Approach for speech Recognition Under Observation Uncertainty

Noise-Separated Adaptive Feature Distillation for Robust speech Recognition

Non-Autoregressive Transformer for speech Recognition

Non-Contact speech Recovery Technology Using a 24 GHz Portable Auditory Radar and Webcam

Non-Intrusive Binaural speech Intelligibility Prediction From Discrete Latent Representations

Non-intrusive speech -quality assessment using vocal-tract models

Nonlinear Manifold Learning for Visual speech Recognition

Normalized Training for HMM-based Visual speech Recognition

Novel Approach to Very Fast and Noise Robust, Isolated Word speech Recognition, A

Novel Data Independent Approach for Conversion of Hand Punched Kannada Braille Script to Text and speech , A

Novel speech Emotion Recognition Method via Incomplete Sparse Least Square Regression, A

Novel Statistical Model for speech Recognition and POS Tagging, A

Novel Visual speech Representation and HMM Classification for Visual Speech Recognition, A

Objective Estimation of speech Quality for Communication Systems

Obtaining speech assets for judgement analysis on low-pass filtered emotional speech

On Emotions as Features for speech Overlaps Classification

On Factoring Out a Gesture Typology from the Bielefeld speech -and-Gesture-Alignment Corpus (SAGA)

On Homotopy Continuation for speech Restoration

On Optimal Linear Filtering of speech for Near-End Listening Enhancement

On the Audio-visual Synchronization for Lip-to- speech Synthesis

On the Compensation Between Magnitude and Phase in speech Separation

On the Estimation of Fundamental Frequency From Nonstationary Noisy speech Signals Based on the Hilbert-Huang Transform

On the Processing of Fuzzy Patterns for Text Independent Phonetic speech Segmentation

On the Relationship between Face Movements, Tongue Movements, and speech Acoustics

On the Robustness of Parametric Watermarking of speech

On the Use of Computer Vision Techniques for Automatic speech Recognition

On the use of different speech representations for speaker modeling

On the Use of Time-Domain Widely Linear Filtering for Binaural speech Enhancement

On Training speech Separation Models With Various Numbers of Speakers

On-Line speech /Music Segmentation for Broadcast News Domain

One-Pulse FEC Coding for Robust CELP-Coded speech Transmission Over Erasure Channels

Online Animation System For Practicing Cued speech

Online Automatic speech Recognition With Listen, Attend and Spell Model

Online speech Dereverberation Using Mixture of Multichannel Linear Prediction Models

Optimal residual frame based source modeling for HMM-based speech synthesis

Optimized discriminative transformations for speech features based on minimum classification error

Optimizing speech Intelligibility in a Noisy Environment: A unified view

Other Related Papers, Audio, speech , Signal Processing, Pattern Recognition

Over-Sampling Emotional speech Data Based on Subjective Evaluations Provided by Multiple Individuals

Overview of compression and packet loss effects in speech biometrics

Panel Tracking for the Extraction and the Classification of speech Balloons

Parallel implementation of Artificial Neural Network training for speech recognition

Parametric Representation of the Speaker's Lips for Multimodal Sign Language And speech Recognition

Part-of- speech Tagging Based on Machine Translation Techniques

Part-of- speech Tagging for Table of Contents Recognition

Partial linear regression for speech -driven talking head application

Particle filtering based pitch sequence correction for monaural speech segregation

Patient-Provider Communication Training Models for Interactive speech Devices

Perceptual Evaluation of Video-Realistic speech

Perceptual Properties of Current speech Recognition Technology

PFRNet: Dual-Branch Progressive Fusion Rectification Network for Monaural speech Enhancement

Phase Estimation in Single Channel speech Enhancement Using Phase Decomposition

Phase Processing for Single-Channel speech Enhancement: History and recent advances

Phase-Sensitive Joint Learning Algorithms for Deep Learning-Based speech Enhancement

phone-viseme dynamic Bayesian network for audio-visual automatic speech recognition, A

Phoneme segmentation of speech

Photorealistic adaptation and interpolation of facial expressions using HMMS and AAMS for audio-visual speech synthesis

pilot study on augmented speech communication based on Electro-Magnetic Articulography, A

Pipelined Recurrent Fuzzy Neural Networks for Nonlinear Adaptive speech Prediction

Pitch Delay Based Adaptive Steganography for AMR speech Stream

Pitch Detection Algorithms and Voiced/Unvoiced Classification for Noisy speech

Pitch-Normalized Acoustic Features for Robust Children's speech Recognition

Place Theory as an Alternative Solution in Automatic speech Recognition Tasks, The

Polish Emotional speech Database: Recording and Preliminary Validation

Power Exponent Based Weighting Criterion for DNN-Based Mask Approximation in speech Enhancement

Practical Considerations for Real-Time Implementation of speech -Based Gender Detection

Prediction-based classification for audiovisual discrimination between laughter and speech

Principal Component Analysis of speech Spectrogram Images

Probabilistic Class Histogram Equalization Based on Posterior Mean Estimation for Robust speech Recognition

Probabilistic Kernels for Improved Text-to- speech Alignment in Long Audio Tracks

QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural speech -Driven Gesture Generation

Quality-Aware Bag of Modulation Spectrum Features for Robust speech Emotion Recognition

Quantifying Emotional Similarity in speech

Quantitative Analysis of the Relative Local speech Rate

Query expansion for imperfect speech : applications in distributed learning

R-CNN Based Method to Localize speech Balloons in Comics, An

R-Letter disorder diagnosis (R-LDD): Arabic speech database development for automatic diagnosis of childhood speech disorders (Case study)

Rate-Invariant Analysis of Trajectories on Riemannian Manifolds with Application in Visual speech Recognition

Rate-invariant comparisons of covariance paths for visual speech recognition

Re-Synchronization Using the Hand Preceding Model for Multi-Modal Fusion in Automatic Continuous Cued speech Recognition

Reading to Listen at the Cocktail Party: Multi-Modal speech Separation

Real-Time Lip Tracking for Audio-Visual speech Recognition Applications

Real-Time Recognition of Affective States from Nonverbal Features of speech and Its Application for Public Speaking Skill Analysis

Real-Time Scene Text to speech System, A

Real-time sign language recognition and speech conversion using VGG16

Real-time speech -driven 3D face animation

Real-Time Vision and speech Driven Avatars for Multimedia Applications

Realistic Face Animation for Audiovisual speech Applications: A Densification Approach Driven by Sparse Stereo Meshes

Realistic speech animation based on observed 3D face dynamics

Realistic speech -Driven Facial Animation with GANs

Recent advances in the automatic recognition of audiovisual speech

Recognition of gestures in the context of speech

Recognition of phonetic labels of the TIMIT speech corpus by means of an artificial neural network

Recognition of visual speech elements using adaptively boosted hidden Markov models

Recognizing Stress Using Semantics and Modulation of speech and Gestures

Reconstructing speech From CNN Embeddings

Reconstruction of Dysphonic speech by MELP

Reconstruction-Based Visual-Acoustic-Semantic Embedding Method for speech -Image Retrieval, A

Recurrent Neural Network Based Small-footprint Wake-up-word speech Recognition System with a Score Calibration Method

Recurrent neural network speech predictor based on dynamical systems approach

Reduced Universal Background Model for speech Recognition and Identification System

Reduction of musical residual noise using perceptual tools with classic speech denoising techniques

Regression based landmark estimation and multi-feature fusion for visual speech recognition

Regularized Subspace Gaussian Mixture Models for speech Recognition

reliable multidomain model for speech act classification, A

Representation of speech in Deep Neural Networks, The

Rescoring of N-Best Hypotheses Using Top-Down Selective Attention for Automatic speech Recognition

Research and Design of Smart Home speech Recognition System Based on Deep Learning

Research of Chain Model Based on CNN-TDNNF in Yulin Dialect speech Recognition, The

Research of STRAIGHT Spectrogram and Difference Subspace Algorithm for speech Recognition

Research on HMM_based speech synthesis for Lhasa dialect

Research Progress in speech Enhancement Technology

Researchers Push speech Recognition Toward the Mainstream

Residual Excitation Skewness for Automatic speech Polarity Detection

Resolution limits on visual speech recognition

Restoration of Bone-Conducted speech With U-Net-Like Model and Energy Distance Loss

Rethinking Algorithm Design and Development in speech Processing

Reversible Audio Data Hiding Based on Variable Error-Expansion of Linear Prediction for Segmental Audio and G.711 speech

review of recent advances in visual speech decoding, A

ReVISE: Self-Supervised speech Resynthesis with Visual Input for Universal and Generalized Speech Regeneration

RNN-Based speech -Music Discrimination Used for Hybrid Audio Coder, An

Robot Command Interface Using an Audio-Visual speech Recognition System

Robust and Fast Localization of Single speech Source Using a Planar Array

Robust Arabic Multi-stream speech Recognition System in Noisy Environment

Robust Audio-Visual Mandarin speech Recognition Based On Adaptive Decision Fusion And Tone Features

Robust Audio-Visual speech Recognition Based on Hybrid Fusion

Robust Audio-Visual speech Recognition Based on Late Integration

Robust Audio-Visual speech Recognition Under Noisy Audio-Video Conditions

Robust Automatic speech Recognition Using PD-MEEMLIN

Robust Biometric Person Identification Using Automatic Classifier Fusion of speech , Mouth, and Face Experts

Robust Face Frontalization For Visual speech Recognition

robust method for the Vietnamese handwritten and speech recognition, A

Robust Parallel speech Recognition in Multiple Energy Bands

Robust Pitch Extraction Method for the HMM-Based speech Synthesis System

Robust Sensor Fusion: Analysis and Application to Audio-Visual speech Recognition

Robust Speaker Verification via Asynchronous Fusion of speech and Lip Information

Robust speech recognition using spatial-temporal feature distribution characteristics

Robust telephone speech recognition based on channel compensation

robust unsupervised pattern discovery and clustering of speech signals, A

Robustness of linear discriminant analysis in automatic speech recognition

Role of Long-Term Dependency in Synthetic speech Detection, The

Role of Synthetically Generated Samples on speech Recognition in a Resource-Scarce Language

Role of Vocal Persona in Natural and Synthesized speech , The

RSD-GAN: Regularized Sobolev Defense GAN Against speech -to-Text Adversarial Attacks

Salient Feature Extraction Algorithm for speech Emotion Recognition, A

Say it to see it: A speech based immersive model retrieval system

SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker Text-to- speech Systems

Searching through a speech Memory for Text-Independent Speaker Verification

Secure speech biometric templates for user authentication

SEEG: Semantic Energized Co- speech Gesture Generation

Selection of Unknown Objects Specified by speech Using Models Constructed from Web Images

Selective HuBERT: Self-Supervised Pre-Training for Target Speaker in Clean and Mixture speech

Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language speech Emotion Recognition

Semi-blind speech -Music Separation Using Sparsity and Continuity Priors

Semi-supervised speech -driven 3D Facial Animation via Cross-modal Encoding

Sentence boundary detection in conversational speech transcripts using noisily labeled examples

Separation of Audio-Visual speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli

Session compensation using binary speech representation for speaker recognition

SFNet: A Computationally Efficient Source Filter Model Based Neural speech Synthesis

Signal subspace approach for narrowband noise reduction in speech

Signal-Aware Parametric Quality Model for Audio and speech over IP Networks

Signal-to-Signal Ratio Independent Speaker Identification for Co-channel speech Signals

Significance of Empty speech Pauses: Cognitive and Algorithmic Issues, The

Significance of Pitch-Based Spectral Normalization for Children's speech Recognition

Simple Model of speech Communication and its Application to Intelligibility Enhancement, A

Single Channel speech Separation Using Source-Filter Representation

Single-Channel speech Separation Focusing on Attention DE

Single-Input/Binaural-Output Antiphasic speech Enhancement Method for Speech Intelligibility Improvement, A

SNAC: Speaker-Normalized Affine Coupling Layer in Flow-Based Architecture for Zero-Shot Multi-Speaker Text-to- speech

So-DAS: A Two-Step Soft-Direction-Aware speech Separation Framework

Some recent advances in speech recognition with potential applications in other statistical pattern recognition areas

Some relations among stochastic finite state networks used in automatic speech recognition

Something to Talk About: Signal Processing in speech and Audiology Research: Promising Investigations Explore New Opportunities in Human Communication

source and channel coding approach to data hiding with application to hiding speech in video, A

SPACE: speech -driven Portrait Animation with Controllable Expression

Sparse Kernel Reduced-Rank Regression for Bimodal Emotion Recognition From Facial Expression and speech

Speaker Attractor Network: Generalizing speech Separation to Unseen Numbers of Sources

Speaker Extraction With Co- speech Gestures Cue

Speaker identification security improvement by means of speech watermarking

Speaker Independent Audio-Visual speech Recognition

Speaker Modeling with Various speech Representations

Speaker-aware Multi-Task Learning for automatic speech recognition

Speaker-aware speech Emotion Recognition by Fusing Amplitude and Phase Information

Speaker-Independent speech Animation Using Perceptual Loss Functions and Synthetic Data

Speaker-independent speech Recognition by Means of Functional-link Neural Networks

Spectral Domain speech Enhancement Using HMM State-Dependent Super-Gaussian Priors

Spectral domain texture analysis for speech enhancement

Spectral Features Based on Local Hu Moments of Gabor Spectrograms for speech Emotion Recognition

Spectral Flatness Analysis for Emotional speech Synthesis and Transformation

Spectral Tilt Estimation for speech Intelligibility Enhancement Using RNN Based on All-Pole Model

SPECTRE: Visual speech -Informed Perceptual 3D Facial Expression Reconstruction from Videos

Spectro-Temporal Filtering for Multichannel speech Enhancement in Short-Time Fourier Transform Domain

speech Activity Detection in Naturalistic Audio Environments: Fearless Steps Apollo Corpus

speech Analysis, other than Recognition

speech Animation Using Coupled Hidden Markov Models

speech Authentication and Recovery Scheme in Encrypted Domain

speech authentication system using digital watermarking and pattern recovery

speech Balloons in Comics, Comic Analysis, Panel Detection

speech balloon and speaker association for comics and manga understanding

speech Bandwidth Extension Using Recurrent Temporal Restricted Boltzmann Machines

speech Based Approach to Surveillance Video Retrieval, A

speech Based Shopping Assistance for the Blind

speech Content Retrieval Model Based on Integrated Neural Network for Natural Language Description, A

speech Denoising and Compensation for Hearing Aids Using an FTCRN-Based Metric GAN

speech driven facial animation using a hidden markov coarticulation model

speech driven lip synthesis using viseme based hidden markov models

speech Driven Talking Face Generation From a Single Image and an Emotion Condition

speech Driven Tongue Animation

speech driven video editing via an audio-conditioned diffusion model

speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates

speech Emotion Analysis in Noisy Real-World Environment

speech emotion recognition based on kernel reduced-rank regression

speech Emotion Recognition Enhanced Traffic Efficiency Solution for Autonomous Vehicles in a 5G-Enabled Space-Air-Ground Integrated Intelligent Transportation System

speech emotion recognition model based on Bi-GRU and Focal Loss

speech emotion recognition system based on genetic algorithm and neural network

speech Emotion Recognition using a backward context

speech Emotion Recognition Using Deep Convolutional Neural Network and Discriminant Temporal Pyramid Matching

speech Emotion Recognition Using Fourier Parameters

speech emotion recognition via learning analogies

speech Emotion Recognition via Multi-Level Attention Network

speech Enhancement Based on Deep Autoencoder for Remote Arabic Speech Recognition

speech enhancement for in-vehicle voice control systems using wavelet analysis and blind source separation

speech Enhancement Using a Two-Stage Network for an Efficient Boosting Strategy

speech Enhancement with Nonstationary Acoustic Noise Detection in Time Domain

speech Enhancement: A Review of Modern Methods

speech frame recognition based on less shift sensitive wavelet filter banks

speech Information Processing: Theory and Applications

speech Intelligibility Enhancement By Non-Parallel Speech Style Conversion Using CWT and iMetricGAN Based CycleGAN

speech Intelligibility Estimation Method Using a Non-reference Feature Set, A

speech Magnitude-Spectrum Information-Entropy (MSIE) for Automatic Speech Recognition in Noisy Environments

speech music discrimination using class-specific features

speech Personality Recognition Based on Annotation Classification Using Log-Likelihood Distance and Extraction of Essential Audio Features

speech Privacy for Sound Surveillance Using Super-Resolution Based on Maximum Likelihood and Bayesian Linear Regression

speech Quality Assessment Over Lossy Transmission Channels Using Deep Belief Networks

speech recognition method based on feature distributions, A

speech Recognition Moves from Software to Hardware

speech Recognition of English by Japanese Using Lexicon Represented by Multiple Reduced Phoneme Sets

speech Recognition of Mandarin Monosyllables

speech Recognition Supported by Lip Analysis

speech Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus

speech recognition using fractals

speech Recognition Using Long-Span Temporal Patterns in a Deep Network Model

speech recognition with hierarchical recurrent neural networks

speech Recognition, Neural Networks, CNN

speech Recognition, Speech Analysis, Signal Processing

speech Separation from Background of Music Based on Single-channel Recording

speech Signal Processing Based on Wavelets and SVM for Vocal Tract Pathology Detection

speech Spectral Envelope Enhancement by HMM-Based Analysis/Resynthesis

speech Synchronized Tongue Animation by Combining Physiology Modeling and X-ray Image Fitting

speech Synthesis Approach for High Quality Speech Separation and Generation, A

speech Synthesis Based on Hidden Markov Models

speech Synthesis for the Generation of Artificial Personality

speech Synthesis With Mixed Emotions

speech Synthesis, Synthetic Speech

speech Time-Scale Modification With GANs

speech understanding and dialog system with a homogeneous linguistic knowledge base, A

speech Watermarking Method Based on Formant Tuning

speech -assisted lip synchronization in audio-visual communications

speech -Centric Information Processing: An Optimization-Oriented Approach

speech -controlled animation system

speech -Driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach

speech -Driven Automatic Facial Expression Synthesis

speech -Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks

speech -driven face synthesis from 3D video

speech -driven facial animation using a hierarchical model

speech -Driven Facial Animation Using a Shared Gaussian Process Latent Variable Model

speech -driven Facial Animation Using Cascaded Gans for Learning of Motion and Texture

speech -Driven Facial Animation Using Manifold Relevance Determination

speech -gesture driven multimodal interfaces for crisis management

speech -To-Face Movement Synthesis Based on HMMS

speech -to-Singing Voice Conversion: The Challenges and Strategies for Improving Vocal Conversion Processes

speech -to-video synthesis using facial animation parameters

speech -to-video synthesis using MPEG-4 compliant visual features

speech -Video Synchronization Using Lips Movements and Speech Envelope Correlation

speech -Visual Emotion Recognition by Fusing Shared and Specific Features

speech -Visual Emotion Recognition via Modal Decomposition Learning

speech /Gesture Interface to a Visual Computing Environment for Molecular Biologists

speech /Music Classification Based on Distributed Evolutionary Fuzzy Logic for Intelligent Audio Coding

speech /music discrimination for analysis of radio stations

speech 2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video

speech 4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation

Split Bregman Approach to Linear Prediction Based Dereverberation With Enforced speech Sparsity

Spontaneous speech Emotion Recognition Using Multiscale Deep Convolutional LSTM

Spontaneous speech emotion recognition using prior knowledge

Spotting words in silent speech videos: a retrieval-based approach

SSSD: speech Scene database by Smart Device for Visual Speech Recognition

Stable Implementation of Zero Frequency Filtering of speech Signals for Efficient Epoch Extraction

Standardization-refinement domain adaptation method for cross-subject EEG-based classification in imagined speech recognition

Statistical estimation of emotions in speech notes by featured term analogy

Statistical Machine Translation for speech : A Perspective on Structures, Learning, and Decoding

Statistical Parametric speech Synthesis Using Generalized Distillation Framework

Steganalysis of Compressed speech Based on Markov and Entropy

Stochastic Modelling: From Pattern Classification to speech Recognition and Translation

Strategies to improve the performance of very low bit rate speech coders and application to a variable rate 1.2 kb/s codec

Streaming End-to-End Multi-Talker speech Recognition

Structural representation of speech for phonetic classification

study of artificial speech quality assessors of VoIP calls subject to limited bursty packet losses, A

Style Extractor For Facial Expression Recognition in the Presence of speech

Style Transfer for Co- speech Gesture Animation: A Multi-speaker Conditional-mixture Approach

Subband-Based Stationary-Component Suppression Method Using Harmonics and Power Ratio for Reverberant speech Recognition, A

Subspace-Based Learning for Automatic Dysarthric speech Detection

Supervised Learning Approach for Explicit Spatial Filtering of speech

Supervised Monaural speech Enhancement Using Complementary Joint Sparse Representations

Supervised single-channel speech dereverberation and denoising using a two-stage processing

Support Vector Machine-Based Dynamic Network for Visual speech Recognition Applications, A

Survey of Deep Representation Learning for speech Emotion Recognition

Survey on speech emotion recognition: Features, classification schemes, and databases

Switching Auxiliary Chains for speech Recognition based on Dynamic Bayesian Networks

Switching Linear Dynamic Models for Noise Robust In-Car speech Recognition

SylNet: An Adaptable End-to-End Syllable Count Estimator for speech

Synchrony-Based Feature Extraction for Robust Automatic speech Recognition

syntactic procedure for the recognition of glottal pulses in continuous speech , A

Synthesising 3D Facial Motion from In-the-Wild speech

Synthetic speech Detection Based on Local Autoregression and Variance Statistics

Synthetic speech Detection Based on the Temporal Consistency of Speaker Features

SynthVSR: Scaling Up Visual speech Recognition With Synthetic Supervision

System and Analysis Used for a Dynamic Facial speech Deformation Model

System and method for triphone-based unit selection for visual speech synthesis

Talking About 3D Scenes: Integration of Image and speech Understanding in a Hybrid Distributed System

Talking Face: Using Facial Feature Detection and Image Transformations for Visual speech

Talking Heads, speech Driven Face Animation

Taming Diffusion Models for Audio-Driven Co- speech Gesture Generation

TCD-TIMIT: An Audio-Visual Corpus of Continuous speech

Technical and Phonetic Aspects of speech Quality Assessment: The Case of Prosody Synthesis

Telephone-Based speech Dialog Systems

Temporal Envelope and Fine Structure Cues for Dysarthric speech Detection Using CNNs

Temporal Measures of Hand and speech Coordination During French Cued Speech Production

Temporal Modulation Normalization for Robust speech Feature Extraction and Recognition

Temporal Multimodal Learning in Audiovisual speech Recognition

Temporal Relation Inference Network for Multimodal speech Emotion Recognition

Temporal Symbolic Integration Applied to a Multimodal System Using Gestures and speech

Text Block Segmentation in Comic speech Bubbles

Text- and speech -based phonotactic models for spoken language identification of Basque and Spanish

Text-independent speaker identification using Radon and discrete cosine transforms based features from speech spectrogram

Time Distributed Multiview Representation for speech Emotion Recognition

Time-Delay Neural Networks for Estimating Lip Movements from speech Analysis: A Useful Tool in Audio Video Synchronization

Time-Domain Multi-Modal Bone/Air Conducted speech Enhancement

Time-Domain speech Separation Networks With Graph Encoding Auxiliary

Time-Frequency Attention for speech Emotion Recognition with Squeeze-and-Excitation Blocks

Towards a high quality Arabic speech synthesis system based on neural networks and residual excited vocal tract model

Towards End-to-End Synthetic speech Detection

Towards Estimating the Upper Bound of Visual- speech Recognition: The Visual Lip-Reading Feasibility Database

Towards multilingual end-to-end speech recognition for air traffic control

Towards query-by- speech handwritten keyword spotting

Towards Robust Deep Neural Networks for Affect and Depression Recognition from speech

Towards Zero-Shot Multi-Speaker Multi-Accent Text-to- speech Synthesis

Tracking continuous emotional trends of participants during affective dyadic interactions using body language and speech information

Tracking Discourse Topics in Co- speech Gesture

Trainable videorealistic speech animation

Transfer learning helps to improve the accuracy to classify patients with different speech disorders in different languages

Transfer Linear Subspace Learning for Cross-Corpus speech Emotion Recognition

Transformer-Based End-to-End Automatic speech Recognition Algorithm, A

Transformer-Based End-to-End speech Translation With Rotary Position Embedding

Translingual visual speech synthesis

tutorial on Hidden Markov Models and selected applications in speech recognition, A

Two features combination with gated recurrent unit for visual speech recognition

Two technologies vie for recognition in speech market

Two-Band Radial Postfiltering in Cepstral Domain with Application to speech Synthesis

Two-Level Bimodal Association for Audio-Visual speech Recognition

Two-Stage Learning and Fusion Network With Noise Aware for Time-Domain Monaural speech Enhancement

Two-Stage Refinement of Magnitude and Complex Spectra for Real-Time speech Enhancement

two-stage speech activity detection system considering fractal aspects of prosody, A

U-Former: Improving Monaural speech Enhancement with Multi-head Self and Cross Attention

UniEnc-CASSNAT: An Encoder-Only Non-Autoregressive ASR for speech SSL Models

Unified Training of Feature Extractor and HMM Classifier for speech Recognition

Unit Selection Using Linguistic, Prosodic and Spectral Distance for Developing Text-to- speech System in Hindi

Universum Autoencoder-Based Domain Adaptation for speech Emotion Recognition

Unpaired Image-to- speech Synthesis With Multimodal Information Bottleneck

Unpaired speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition

Unsupervised Cross-Corpus speech Emotion Recognition Using a Multi-Source Cycle-GAN

Unsupervised Feature Learning for speech Using Correspondence and Siamese Networks

Unsupervised Personalization of an Emotion Recognition System: The Unique Properties of the Externalization of Valence in speech

Unsupervised speech Activity Detection Using Voicing Measures and Perceptual Spectral Flux

Unsupervised speech Text Localization in Comic Images

Unsupervised Tibetan speech features Learning based on Dynamic Bayesian Networks

Use of Line Spectral Frequencies for Emotion Recognition from speech

Use of radial basis function network with discrete wavelet transform for speech enhancement

User Authentication System Based on speech and Cascade Hybrid Facial Feature

User Verification by Combining speech and Face Biometrics in Video

Using Adaptive Filter to Increase Automatic speech Recognition Rate in a Digit Corpus

Using Hand Gesture and speech in a Multimodal Augmented Reality Environment

Using Semantics to Automatically Generate speech Interfaces for Wearable Virtual and Augmented Reality Applications

Using speech for Handwritten Mathematical Expression Recognition Disambiguation

Using speech Input for Image Interpretation and Annotation

Utterance Verification-Based Dysarthric speech Intelligibility Assessment Using Phonetic Posterior Features

value of stories for speech -based video search, The

Variable-Length Speaker Conditioning in Flow-Based Text-to- speech

Vector quantization with memory and multi-labeling for isolated video-only automatic speech recognition

Vector Taylor series based model adaptation using noisy speech trained hidden Markov models

Vector-Based Feature Representations for speech Signals: From Supervector to Latent Vector

Vector-to-Vector Regression via Distributional Loss for speech Enhancement

Ventriloquist-Net: Leveraging speech Cues for Emotive Talking Head Generation

very low bit rate codec for wide band speech based on a long-term perceptual harmonic plus noise model, A

Video Augmentation for Improving Audio speech Recognition under Noise

Video Rewrite: Driving Visual speech with Audio

Video, Text, and speech -Driven Realistic 3-D Virtual Head for Human-Machine Interface, A

VisageSynTalk: Unseen Speaker Video-to- speech Synthesis via Speech-Visage Feature Selection

Vision Based speech Animation Transferring with Underlying Anatomical Structure

Visual display methods for in computer-animated speech production models

Visual prosody: facial movements accompanying speech

Visual Recognition of Activities, Gestures, Facial Expressions and speech : An Introduction and a Perspective

Visual Skeleton and Reparative Attention for Part-of- speech image captioning system

Visual speech Enhancement Without A Real Visual Stream

visual speech model based on fuzzy-neuro methods, A

Visual speech Recognition by Recurrent Neural Networks

Visual speech Recognition Method Using Translation, Scale and Rotation Invariant Features

Visual speech Recognition Using Dynamic Features And Support Vector Machines

Visual speech Recognition Using Motion Features and Hidden Markov Models

Visual speech Recognition Using PCA Networks and LSTMs in a Tandem GMM-HMM System

Visual speech Recognition Using Weighted Dynamic Time Warping

Visual speech Recognition with Loosely Synchronized Feature Streams

Visual speech Synthesis by Morphing Visemes

Visual speech Synthesis Using a Variable-Order Switching Shared Gaussian Process Dynamical Model

Visual speech , a trajectory in viseme space

Visual speech : A Physiological or Behavioural Biometric?

Visual-to- speech conversion based on maximum likelihood estimation

Visually Recognizing speech Using Eigen Sequences

VisualVoice: Audio-Visual speech Separation with Cross-Modal Consistency

Voice Conversion for Whispered speech Synthesis

Voice of Leadership: Models and Performances of Automatic Analysis in Online speech es, The

Voicing Detection in Noisy speech Signal

Watch or Listen: Robust Audio-Visual speech Recognition with Visual Corruption Modeling and Reliability Scoring

Watch to Listen Clearly: Visual speech Enhancement Driven Multi-modality Speech Recognition

Watermarking-Based Perceptual Hashing Search Over Encrypted speech

WavDepressionNet: Automatic Depression Level Prediction via Raw speech Signals

WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-End speech Enhancement

Waveform Interpolation-Based speech Analysis/Synthesis for HMM-Based TTS Systems

Wavelet speech Enhancement Based on Nonnegative Matrix Factorization

Wavelet-FILVQ classifier for speech analysis

WebVoice: A Toolkit for Perceptual Insights into speech Processing

Weight-Space Viterbi Decoding Based Spectral Subtraction for Reverberant speech Recognition

Whispered speech Detection in Noise Using Auditory-Inspired Modulation Spectrum Features

Whispered speech Detection Using Fusion of Group-Delay-Based Subband Modulation Spectrum and Correntropy Features

Wideband speech Encryption Based Arnold Cat Map for AMR-WB G.722.2 Codec

Word Segments in Category-Based Language Models for Automatic speech Recognition

Zero-Shot Keyword Spotting for Visual speech Recognition In-the-wild

1063 for speech

_	speech	_
2.4kbps Multiband Characteristic Waveform Interpolation	speech	Coding Algorithm, A
2.5D Visual	speech	Synthesis Using Appearance Models
3-D Convolutional Recurrent Neural Networks With Attention Model for	speech	Emotion Recognition
3D Visual passcode:	speech	-driven 3D facial dynamics for behaviometrics
450bps	speech	Coding Algorithm Based on Multi-Mode Matrix Quantization, A
Accuracy, Apps Advance	speech	Recognition
Acoustic Analysis for Automatic	speech	Recognition
Acoustic echo cancellation for stereophonic systems derived from pairwise panning of monophonic	speech
Acoustic Event Detection in	speech	Overlapping Scenarios Based on High-Resolution Spectral Input and Deep Learning
Acoustically Emotion-Aware Conversational Agent With	speech	Emotion Recognition and Empathetic Responses, The
Active Contour Model for	speech	Balloon Detection in Comics, An
Adaptation of Hidden Markov Models for Recognizing	speech	of Reduced Frame Rate
Adaptive Gain Control for Enhanced	speech	Intelligibility Under Reverberation
adaptive model of person identification combining	speech	and image information, An
Adaptive Signal Models for Wide-Band	speech	and Audio Compression
Adaptive	speech	Dereverberation Using Constrained Sparse Multichannel Linear Prediction
Adaptive	speech	enhancement with varying noise backgrounds
Adaptive	speech	Intelligibility Enhancement for Far-and-Near-end Noise Environments Based on Self-attention StarGAN
Adding Voicing Features into	speech	Recognition Based on HMM in Slovak
Advanced tools for	speech	synchronized animation
Adversarial Continual Learning to Transfer Self-Supervised	speech	Representations for Voice Pathology Detection
Adversarial Feature Learning and Unsupervised Clustering Based	speech	Synthesis for Found Data With Acoustic and Textual Noise
Adversarial Training Based	speech	Emotion Classifier With Isolated Gaussian Regularization, An
Affective Audio Annotation of Public	speech	es with Convolutional Clustering Neural Network
Affine-Invariant Visual Features Contain Supplementary Information to Enhance	speech	Recognition
Aging	speech	recognition with speaker adaptation techniques: Study on medium vocabulary continuous Bengali speech
Aging	speech	recognition with speaker adaptation techniques: Study on medium vocabulary continuous Bengali speech
AKVSR: Audio Knowledge Empowered Visual	speech	Recognition by Compressing Audio Knowledge of a Pretrained Model
Algorithms for syllabic hypothesization in continuous	speech
Alias-and-Separate: Wideband	speech	Coding Using Sub-Nyquist Sampling and Speech Separation
Alias-and-Separate: Wideband	speech	Coding Using Sub-Nyquist Sampling and Speech Separation
Amazigh audiovisual	speech	recognition system design
Amazigh isolated word	speech	recognition system using the Adaptive Orthogonal Transform Method.
Analysing acoustic model changes for active learning in automatic	speech	recognition
Analysis and Classification of Cold	speech	Using Variational Mode Decomposition
Analysis of Emotion Annotation Strength Improves Generalization in	speech	Emotion Recognition Models
Analysis of Lip Geometric Features for Audio-Visual	speech	Recognition
Analysis of stressed human	speech
analysis of the effect of combining standard and alternate sensor signals on recognition of syllabic units for multimodal	speech	recognition, An
Analysis of the Multifractal Nature of	speech	Signals
Analysis of the Possibilities to Adapt the Foreign Language	speech	Recognition Engines for the Lithuanian Spoken Commands Recognition
Analysis of the Utility of Classical and Novel	speech	Quality Measures for Speaker Verification
Anchor Models for Emotion Recognition from	speech
Animating visible	speech	and facial expressions
AnyoneNet: Synchronized	speech	and Talking Head Generation for Arbitrary Persons
Application of Capsule Neural Network Based CNN for	speech	Emotion Recognition, The
Application of digit and	speech	recognition in food delivery robot
Application of support vector machines classifiers to visual	speech	recognition
Application of triphone clustering in acoustic modeling for continuous	speech	recognition in Bengali
Application of wavelet transforms for C/V segmentation on Mandarin	speech	signals
ARawNet: A Lightweight Solution for Leveraging Raw Waveforms in Spoof	speech	Detection
Architecture for Automatic Lipreading to Enhance	speech	Recognition, An
Art Critic: Multisignal Vision and	speech	Interaction System in a Gaming Context
Articulatory	speech	Re-synthesis: Profiting from Natural Acoustic Speech Data
Articulatory	speech	Re-synthesis: Profiting from Natural Acoustic Speech Data
ASQ: An Ultra-Low Bit Rate ASR-Oriented	speech	Quantization Method
Assessing speaker independence on a	speech	-based depression level estimation system
Asymmetric 3D face model for	speech	Language Pathologist applications
Asymmetrically boosted HMM for	speech	reading
Attention Based Speaker-independent Audio-visual Deep Learning Model for	speech	Enhancement, An
Attention-based convolutional neural network and long short-term memory for short-term detection of mood disorders based on elicited	speech	responses
Attention-Based Dense LSTM for	speech	Emotion Recognition
Audio Based Real-Time	speech	Animation of Embodied Conversational Agents
Audio Classification in	speech	and Music: A Comparison Between a Statistical and a Neural Approach
Audio Watermarks,	speech	Watermarks
Audio-visual continuous	speech	recognition using MPEG-4 compliant visual features
Audio-Visual Efficient Conformer for Robust	speech	Recognition
Audio-Visual Person Authentication with Multiple Visualized-	speech	Features and Multiple Face Profiles
Audio-Visual	speech	Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Audio-Visual	speech	Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Audio-Visual	speech	Fusion Using Coupled Hidden Markov Models
Audio-Visual	speech	Recognition Based on AAM Parameter and Phoneme Analysis of Visual Feature
Audio-Visual	speech	Recognition Scheme Based on Wavelets and Random Forests Classification
Audio-visual	speech	recognition techniques in augmented reality environments
Audio-Visual	speech	Recognition Using A Two-Step Feature Fusion Strategy
Audio-Visual	speech	Recognition Using MPEG-4 Compliant Visual Features
Audio-visual	speech	synchronization detection using a bimodal linear prediction model
Audio-Visual	speech	Synthesis Based on Chinese Visual Triphone
Audio2Gestures: Generating Diverse Gestures from	speech	Audio with Conditional Variational Autoencoders
Audiovisual Discrimination Between	speech	and Laughter: Why and When Visual Information Might Help
Audiovisual	speech	Source Separation: An overview of key methodologies
Audiovisual Talking Head for Augmented	speech	Generation: Models and Animations Based on a Real Speaker's Articulatory Data, An
Auditory Features Revisited for Robust	speech	Recognition
Autoencoder-based Unsupervised Domain Adaptation for	speech	Emotion Recognition
Automated Lip Synchronized	speech	Driven Facial Animation
Automated	speech	alignment for image synthesis
Automatic bi-modal emotion recognition system based on fusion of facial expressions and emotion extraction from	speech
Automatic continuous	speech	recogniser for Dravidian languages using the auto associative neural network
Automatic Detection of Amyotrophic Lateral Sclerosis (ALS) from Video-Based Analysis of Facial Movements:	speech	and Non-Speech Tasks
Automatic Detection of Amyotrophic Lateral Sclerosis (ALS) from Video-Based Analysis of Facial Movements:	speech	and Non-Speech Tasks
Automatic Evaluation of Hypernasality and Consonant Misarticulation in Cleft Palate	speech
Automatic Evaluation of	speech	Therapy Exercises Based on Image Data
Automatic Person Verification Using	speech	and Face Information
Automatic Selection of Visemes for Image-based Visual	speech	Synthesis
Automatic Sentence Modality Recognition in Children's	speech	, and Its Usage Potential in the Speech Therapy
Automatic Sentence Modality Recognition in Children's	speech	, and Its Usage Potential in the Speech Therapy
Automatic speaker verification on narrowband and wideband lossy coded clean	speech
Automatic	speech	discrete labels to dimensional emotional values conversion method
Automatic	speech	Emotion Recognition Using Auditory Models with Binary Decision Tree and SVM
Automatic Urdu	speech	Recognition using Hidden Markov Model
Automatic Video Annotation by Mining	speech	Transcripts
Automatic visual	speech	segmentation and recognition using directional motion history images and Zernike moments
AVFormer: Injecting Vision into Frozen	speech	Models for Zero-Shot AV-ASR
Avoiding dominance of speaker features in	speech	-based depression detection
AWLloss: Speaker Verification Based on the Quality and Difficulty of	speech
Bandwidth-adjusted LPC analysis for robust	speech	recognition
Bayesian Predictive Method for Automatic	speech	Segmentation, A
Bayesian reasoning on qualitative descriptions from images and	speech
Beam-search Formant Tracking Algorithm Based on Trajectory Functions for Continuous	speech
Beamforming Algorithm Based on Maximum Likelihood of a Complex Gaussian Distribution With Time-Varying Variances for Robust	speech	Recognition, A
Behavioral Signal Processing: Deriving Human Behavioral Informatics From	speech	and Language
Benchmarking classification models for emotion recognition in natural	speech	: A multi-corporal study
Bilingual	speech	Recognition by Estimating Speaker Geometry from Video Data
Bimodal fusion in audio-visual	speech	recognition
Biological Motion of	speech
Blind Adaptive Mask to Improve Intelligibility of Non-Stationary Noisy	speech
Blind Source Separation Based Approach for	speech	Enhancement in Noisy and Reverberant Environment, A
Boosted audio-visual HMM for	speech	reading
Building Naturalistic Emotionally Balanced	speech	Corpus by Retrieving Emotional Speech from Existing Podcast Recordings
cache-based natural language model for	speech	recognition, A
Can we Automatically Transform	speech	Recorded on Common Consumer Devices in Real-World Environments into Professional Production Quality Speech?: A Dataset, Insights, and Challenges
Can We Read	speech	Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition
Cancellable	speech	template via random binary orthogonal matrices projection hashing
Cascade Image Transform for Speaker Independent Automatic	speech	Reading, A
Casual chatter or speaking up? Adjusting articulatory effort in generation of	speech	and animation for conversational characters
Casual Conversations v2 Dataset: A diverse, large benchmark for measuring fairness and robustness in audio/vision/	speech	models, The
CAT-DUnet: Enhancing	speech	Dereverberation via Feature Fusion and Structural Similarity Loss
CATNet: Cross-modal fusion for audio-visual	speech	recognition
Chunk-Level	speech	Emotion Recognition: A General Framework of Sequence-to-One Dynamic Temporal Modeling
CIF-Based	speech	Segmentation Method for Streaming E2E ASR, A
Class Confusability Reduction in Audio-Visual	speech	Recognition Using Random Forests
Classification of Complex Information: Inference of Co-Occurring Affective States from Their Expressions in	speech
Classifier-Based Learning of Nonlinear Feature Manifold for Visualization of Emotional	speech	Prosody
clump splitting based method to localize	speech	balloons in comics, A
Clustering Algorithm for the Fast Match of Acoustic Conditions in Continuous	speech	Recognition, A
Co-	speech	Gesture Detection through Multi-Phase Sequence Labeling
Co-	speech	Gesture Synthesis by Reinforcement Learning with Contrastive Pretrained Rewards
CodeTalker:	speech	-Driven 3D Facial Animation with Discrete Motion Prior
Combined Handwriting and	speech	Modalities for User Authentication
Combining Deep and Unsupervised Features for Multilingual	speech	Emotion Recognition
Combining handwriting and	speech	recognition for transcribing historical handwritten documents
Combining	speech	and Handwriting Modalities for Mathematical Expression Recognition
Combining	speech	energy and edge information for fast and efficient voice activity detection in noisy environments
Communicative Rhythm in Gesture and	speech
Compact and Efficient Multitask Learning in Vision, Language and	speech
Compact Representation of Visual	speech	Data Using Latent Variables, A
Comparative Experiments to Evaluate the Use of Syllables for the Improvement of Automatic Recognition of Dysarthric	speech
Comparing Multiple Classifiers for	speech	-Based Detection of Self-Confidence: A Pilot Study
Comparison of Active Shape Model and Scale Decomposition Based Features for Visual	speech	Recognition, A
Comparison of Image Transform-Based Features for Visual	speech	Recognition in Clean and Corrupted Videos
Comparison of MPEG-4 Facial Animation Parameter Groups with Respect to Audio-Visual	speech	Recognition Performance
Comparison of Phoneme and Viseme Based Acoustic Units for	speech	Driven Realistic lip Animation
Complex Neural Spatial Filter: Enhancing Multi-Channel Target	speech	Separation in Complex Domain
computationally compact divergence measure for	speech	processing, A
Computer Assisted Transcription of	speech
Concatenated Frame Image Based CNN for Visual	speech	Recognition
Conceptual and Lexical Factors in the Production of	speech	and Conversational Gestures: Neuropsychological Evidence
Conditional Random Fields in	speech	, Audio, and Language Processing
ConflictNET: End-to-End Learning for	speech	-Based Conflict Intensity Estimation
Connecting Subspace Learning and Extreme Learning Machine in	speech	Emotion Recognition
Constant-Q magnitude-phase coefficients extraction for synthetic	speech	detection
Constrained MMSE LP Residual Estimator for	speech	Dereverberation in Noisy Environments, A
Constructing	speech	processing systems on universal phonetic codes accompanied with reference acoustic models
Contextual and Cross-Modal Interaction for Multi-Modal	speech	Emotion Recognition
Contextual vector quantization for	speech	recognition with discrete hidden Markov model
Continual Learning for Personalized Co-	speech	Gesture Generation
Continuous Audio-Visual	speech	Recognition
Continuous Automatic	speech	Recognition by Lipreading
Continuous Estimation of Emotions in	speech	by Dynamic Cooperative Speaker Models
Continuous	speech	coding using coiflets wavelet
Controllable Multi-Lingual Multi-Speaker Multi-Style Text-to-	speech	Synthesis With Multivariate Information Minimization, A
Conversational Evaluation of	speech	Bandwidth Extension Using a Mobile Handset
Conversion of neutral	speech	to storytelling style speech
Convolutional Network With Multi-Scale and Attention Mechanisms for End-to-End Single-Channel	speech	Enhancement, A
Convolutional Neural Networks for Distant	speech	Recognition
Correlation based	speech	-video synchronization
coupled HMM approach to video-realistic	speech	animation, A
Creating 3D	speech	-driven talking heads: a probabilistic network approach
CrisisHateMM: Multimodal Analysis of Directed and Undirected Hate	speech	in Text-Embedded Images from Russia-Ukraine Conflict
CroMM-VSR: Cross-Modal Memory Augmented Visual	speech	Recognition
Cross-Corpus	speech	Emotion Recognition Based on Domain-Adaptive Least-Squares Regression
Cross-Corpus	speech	Emotion Recognition Based on Few-Shot Learning and Domain Adaptation
Cross-Modal Analysis of	speech	, Gestures, Gaze and Facial Expressions
Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional	speech	Synthesis
Cryptographic-	speech	-Key Generation Architecture Improvements
Cued	speech	Gesture Recognition: A First Prototype Based on Early Reduction
CWT-Based Approach for Epoch Extraction From Telephone Quality	speech
Cyclic Defense GAN Against	speech	Adversarial Attacks
Cyclic Transfer Learning for Mandarin-English Code-Switching	speech	Recognition
Czech Spontaneous	speech	Collection and Annotation: The Database of Technical Lectures
Dar	speech	: An Automatic Speech Recognition System for the Moroccan Dialect
Data-Driven Jacobian Adaptation in a Multi-model Structure for Noisy	speech	Recognition
Dawn of the Transformer Era in	speech	Emotion Recognition: Closing the Valence Gap
DBATES: Dataset for Discerning Benefits of Audio, Textual, and Facial Expression Features in Competitive Debate	speech	es
DBN-based Spectral Feature Representation for Statistical Parametric	speech	Synthesis
Decision Level Fusion for Audio-Visual	speech	Recognition in Noisy Conditions
Deep Audio-Visual	speech	Recognition
Deep Belief Networks for Real-Time Extraction of Tongue Contours from Ultrasound During	speech
Deep Cross-Modal Retrieval Between Spatial Image and Acoustic	speech
Deep Hybrid Approach for Hate	speech	Analysis, A
Deep Learning for Acoustic Modeling in Parametric	speech	Generation: A systematic review of existing techniques and future trends
Deep Learning for Emotional	speech	Recognition
Deep Learning Loss Function Based on the Perceptual Evaluation of the	speech	Quality, A
DeepComboSAD: Spectro-Temporal Correlation Based	speech	Activity Detection for Naturalistic Audio Streams
Defining Laughter Context for Laughter Synthesis with Spontaneous	speech	Corpus
DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel	speech	Enhancement
Demonstration of an HMM-based photorealistic expressive audio-visual	speech	synthesis system
Dense Convolutional Recurrent Neural Network for Generalized	speech	Animation
Detecting Aggression in Voice Using Inverse Filtered	speech	Features
Detecting Multiple Steganography Methods in	speech	Streams Using Multi-Encoder Network
Detecting Parkinson's disease with sustained phonation and	speech	signals using machine learning techniques
Detecting Unipolar and Bipolar Depressive Disorders from Elicited	speech	Responses Using Latent Affective Structure Model
Detection of a Speaker in Video by Combined Analysis of	speech	Sound and Mouth Movement
Detection of COVID-19 from	speech	signal using bio-inspired based cepstral features
Detection of Dynamic Structures of	speech	Fundamental Frequency in Tonal Languages
Detection of Vowel Offset Point From	speech	Signal
Device and method for dubbing an audio-visual presentation which generates synthesized	speech	and corresponding facial movements
Differentiable Mean Opinion Score Regularization for Perceptual	speech	Enhancement
DiffMotion:	speech	-Driven Gesture Synthesis Using Denoising Diffusion Model
DiffV2S: Diffusion-based Video-to-	speech	Synthesis with Vision-guided Speaker Embedding
Diphone spanish text-to-	speech	synthesizer
Direct Text to	speech	Translation System Using Acoustic Units
Disambiguation in Unknown Object Detection by Integrating Image and	speech	Recognition Confidences
Discriminating Unknown Objects from Known Objects Using Image and	speech	Information
Discrimination Between Native and Non-Native	speech	Using Visual Features Only
Discriminative Analysis of Lip Motion Features for Speaker Identification and	speech	-Reading
Discriminative Capacity and Phonetic Information of Bottleneck Features in	speech
Discriminative feature extraction for	speech	recognition using continuous output codes
Discriminative Frequency Information Learning for End-to-End	speech	Anti-Spoofing
Discriminative Multi-Modality	speech	Recognition
Discriminative Training of NMF Model Based on Class Probabilities for	speech	Enhancement
Distilled non-semantic	speech	embeddings with binary neural networks for low-resource devices
Distributed Audio Network for	speech	Enhancement in Challenging Noise Backgrounds
Distributed Microphones	speech	Separation by Learning Spatial Information With Recurrent Neural Network
Djinn: Interaction Framework for Home Environment Using	speech	and Vision
DNN-Based Feature Enhancement Using DOA-Constrained ICA for Robust	speech	Recognition
DNN-Based Feature Extraction for Conflict Intensity Estimation From	speech
Does Visual Self-Supervision Improve Learning of	speech	Representations for Emotion Recognition?
DR2: Disentangled Recurrent Representation Learning for Data-efficient	speech	Video Synthesis
Dynamic 3-D Visualization of Vocal Tract Shaping During	speech
Dynamic Bayesian Networks for Audio-Visual	speech	Recognition
Dynamic versus Static Facial Expressions in the Presence of	speech
Dynamic-static Cross Attentional Feature Fusion Method for	speech	Emotion Recognition
E2E-V2SResNet: Deep residual convolutional neural networks for end-to-end video driven	speech	synthesis
Effect of Various Visual	speech	Units on Language Identification Using Visual Speech Recognition
Effective online unsupervised adaptation of Gaussian mixture models and its application to	speech	classification
Effective Style Token Weight Control Technique for End-to-End Emotional	speech	Synthesis, An
Effectiveness of Mel Scale-Based ESA-IFCC Features for Classification of Natural vs. Spoofed	speech
Efficient Framework for Constructing	speech	Emotion Corpus Based on Integrated Active Learning Strategies, An
Efficient Gaussian Mixture for	speech	Recognition
Efficient Generation of	speech	Adversarial Examples with Generative Model
Efficient HMM-Based Feature Enhancement Method With Filter Estimation for Reverberant	speech	Recognition, An
Efficient One-Pass Decoding with NNLM for	speech	Recognition
Efficient Representation Learning for Inner	speech	Domain Generalization
Efficient Sparse Banded Acoustic Models for	speech	Recognition
Efficient text analyser with prosody generator-driven approach for Mandarin text-to-	speech
Efficient use of the grammar scale factor to classify incorrect words in	speech	recognition verification
Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-Resource	speech	Recognition
EmoNet: A Transfer Learning Framework for Multi-Corpus	speech	Emotion Recognition
EmoTalk:	speech	-Driven Emotional Disentanglement for 3D Face Animation
Emotion Dependent Domain Adaptation for	speech	Driven Affective Facial Feature Synthesis
Emotion recognition from	speech	signals via a probabilistic echo-state network
Emotion Recognition of Affective	speech	Based on Multiple Classifiers Using Acoustic-Prosodic Information and Semantic Labels
Emotional	speech	Analysis on Nonlinear Manifold
Emotional	speech	Classification Based on Multi View Characterization
Emotional	speech	Clustering Based Robust Speaker Recognition System
Emotional	speech	Recognition Using Acoustic Models of Decomposed Component Words
End-to-End Audiovisual	speech	Recognition System With Multitask Learning
End-to-End Dual-Branch Network Towards Synthetic	speech	Detection
End-to-End Pathological	speech	Detection Using Wavelet Scattering Network
End-to-end Triplet Loss based Emotion Embedding System for	speech	Emotion Recognition
End-to-End Video-to-	speech	Synthesis Using Generative Adversarial Networks
End-to-end visual	speech	recognition for small-scale datasets
Enhanced VQ-Based Algorithms for	speech	Independent Speaker Identification
Enhancement of Spectral Tilt in Synthesized	speech
Enhancing Emotion Classification Through	speech	and Correlated Emotional Sounds via a Variational Auto-Encoder Model with Prosodic Regularization
Enhancing Frequency Shifted	speech	Signals in Single Side-Band Communication
EPG2S:	speech	Generation and Speech Enhancement Based on Electropalatography and Audio Signals Using Multimodal Learning
Error Mitigation Technique for Erasure Channels Based on a Wavelet Representation of the	speech	Excitation Signal, An
Error-Diffusion Based	speech	Feature Quantization for Small-Footprint Keyword Spotting
ESAformer: Enhanced Self-Attention for Automatic	speech	Recognition
Estimating	speech	Spectral Amplitude Based on the Nakagami Approximation
Estimation of Rapidly Time-Varying Harmonic Noise for	speech	Enhancement
Evaluation of Head Gaze Loosely Synchronized With Real-Time Synthetic	speech	for Social Robots
Evaluation of	speech	Emotion Classification Based on GMM and Data Fusion
Evaluation of the Concatenative Turkish Text-to-	speech	System
Evaluation of Visual	speech	Features for the Tasks of Speech and Speaker Recognition, An
experimental study of energy dips for	speech	and music, An
Experimental Study on	speech	Enhancement Based on Deep Neural Networks, An
Experimental Study on Transfer Learning in Denoising Autoencoders for	speech	Enhancement
Experiments in dynamic programming inference of Markov networks with strings representing	speech	data
Explainability of	speech	Recognition Transformers via Gradient-Based Attention Visualization
Exploiting alternative acoustic sensors for improved noise robustness in	speech	communication
Exploiting	speech	for Automatic TV Delinearization: From Streams to Cross-Media Semantic Navigation
Exploiting	speech	/Gesture Co-occurrence for Improving Continuous Gesture Recognition in Weather Narration
Exploring Co-Occurence Between	speech	and Body Movement for Audio-Guided Video Localization
Exploring Hate	speech	Detection in Multimodal Publications
Exploring	speech	Features for Classifying Emotions along Valence Dimension
Exploring the Topics of Audio Words for Detecting Alzheimer's Disease From Spontaneous	speech
Exploring Zero-Shot Emotion Recognition in	speech	Using Semantic-Embedding Prototypes
Expression-Preserving Face Frontalization Improves Visually Assisted	speech	Processing
Expressive Facial Animation Synthesis by Learning	speech	Coarticulation and Expression Spaces
Expressive Modulation of Neutral Visual	speech
Expressive	speech	-Driven Lip Movements with Multitask Learning
Expressive visual text-to-	speech	as an assistive technology for individuals with autism spectrum conditions
Expressive Visual Text-to-	speech	Using Active Appearance Models
Extended Decision Tree with or Relationship for HMM-Based	speech	Synthesis
Extension of proposal of standards for intelligibility tests of Chinese	speech	: CDRT-tone
Extracting High Level Semantics by Means of	speech	, Audio, and Image Primitives in Surveillance Applications
F0 Parameterization of Glottalized Tones in HMM-Based	speech	Synthesis for Hanoi Vietnamese
FaceFormer:	speech	-Driven 3D Facial Animation with Transformers
Facial 3D Shape Estimation from Images for Visual	speech	Animation
Facial Expression Recognition in the Presence of	speech	Using Blind Lexical Compensation
Factorized MVDR Deep Beamforming for Multi-Channel	speech	Enhancement
Factors in Emotion Recognition With Deep Learning Models Using	speech	and Text on Multiple Corpora
Far-Field Automatic	speech	Recognition
Fast Object Class Labelling via	speech
Fast, Diverse and Accurate Image Captioning Guided by Part-Of-	speech
Feature Denoising Using Joint Sparse Representation for In-Car	speech	Recognition
Feature optimisation for stress recognition in	speech
Feature Pooling of Modulation Spectrum Features for Improved	speech	Emotion Recognition in the Wild
Feature Selection Based Transfer Subspace Learning for	speech	Emotion Recognition
Feature selection methods for hidden Markov model-based	speech	recognition
Feature space video stream consistency estimation for dynamic stream weighting in audio-visual	speech	recognition
Features extraction and selection for emotional	speech	classification
Few-Shot Learning in Emotion Recognition of Spontaneous	speech	Using a Siamese Neural Network With Adaptive Sample Pair Formation
Finding Lips in Unconstrained Imagery for Improved Automatic	speech	Recognition
Fine-Grained Action Retrieval Through Multiple Parts-of-	speech	Embeddings
First degree heart block determination from	speech	analysis
Frame-synchronous noise compensation for hands-free	speech	recognition in car environments
From Bottom to Top: A Coordinated Feature Representation Method for	speech	Recognition
From	speech	Quality Measures to Speaker Recognition Performance
From Text to	speech	: A Multimodal Cross-Domain Approach for Deception Detection
FSCNet: Feature-Specific Convolution Neural Network for Real-Time	speech	Enhancement
FSER: Deep Convolutional Neural Networks for	speech	Emotion Recognition
Fundamental Technologies in Modern	speech	Recognition
Furcanext: End-to-end Monaural	speech	Separation with Dynamic Gated Dilated Temporal Convolutional Networks
Fused	speech	Enhancement Framework for Robust Speaker Verification, A
Fusing Audio and Visual Features of	speech
Fusion of Audio-Visual Information for Integrated	speech	Processing
Fusion of Face and	speech	Data for Person Identity Verification
Fusion of	speech	, Faces and Text for Person Identification in TV Broadcast
Fuzzy integral based information fusion for classification of highly confusable non-	speech	sounds
Fuzzy rule selection using Iterative Rule Learning for	speech	data classification
GA Approaches to HMM Optimization for Automatic	speech	Recognition
Gabor Filterbank Features for Robust	speech	Recognition
Gammatone Cepstral Coefficients: Biologically Inspired Features for Non-	speech	Audio Classification
GAN-in-GAN for Monaural	speech	Enhancement
Gaussian Specific Compensation for Channel Distortion in	speech	Recognition
Gender classification in two Emotional	speech	databases
Generalized Two-Stage Rank Regression Framework for Depression Score Prediction from	speech
Generating Co-	speech	Gestures for the Humanoid Robot NAO through BML
Generating Holistic 3D Human Motion from	speech
Generating Personalized Virtual Agent in	speech	Dialogue System for People with Dementia
Generating realistic facial animation from	speech
Generating Transferable Adversarial Examples for	speech	Classification
Genetic Algorithm-Based Adaptive Wiener Gain for	speech	Enhancement Using an Iterative Posterior NMF
geostatistical model for linear prediction analysis of	speech	, A
GesRec3D: A Real-Time Coded Gesture-to-	speech	System with Automatic Segmentation and Recognition Thresholding Using Dissimilarity Measures
Gesture,	speech	, and Gaze Cues for Discourse Segmentation
Gestures and Lip Shape Integration for Cued	speech	Recognition
Global Variance in	speech	Synthesis With Linear Dynamical Models
Graphical	speech	Training system for hearing impaired
Group Delay based Methods for Detection and Recognition of Whispered	speech
GRU-SVM Model for Synthetic	speech	Detection
Guest Editorial: Special Issue on Affective	speech	and Language Synthesis, Generation, and Conversion
GUI for interactive	speech	synthesis
Harmonic Enhancement with Noise Reduction of	speech	Signal by Comb Filtering
Head Movements in Context of	speech	during Stress Induction
Hidden Bawls, Whispers, and Yelps: Can Text Convey the Sound of	speech	, Beyond Words?
Hidden Conditional Random Fields for Visual	speech	Recognition
Hierarchical Bayesian combination of plug-in maximum a posteriori decoders in deep neural networks-based	speech	recognition and speaker adaptation
hierarchical Bayesian model for continuous	speech	recognition, A
Hierarchical	speech	-act classification for discourse analysis
hierarchical tag-graph search scheme with layered grammar rules for spontaneous	speech	understanding, A
High-frame-rate real-time imaging of	speech	production
Higher Order Subspace Algorithm for Multichannel	speech	Enhancement, A
Highly Transparent Steganography Scheme of	speech	Signals into Color Images Using Quantization Index Modulation
Historical Perspective of	speech	Recognition, A
HMM based	speech	-driven 3D tongue animation
HNM-Based Speaker-Nonspecific Timbre Transformation Scheme for	speech	Synthesis, An
Hough transform-based mouth localization for audio-visual	speech	recognition
Human emotion recognition by optimally fusing facial expression and	speech	feature
hybrid approach to improve part of	speech	tagging system, An
Hybrid Autoregressive and Non-Autoregressive Transformer Models for	speech	Recognition
Hybrid HMM-Based	speech	Recognizer Using Kernel-Based Discriminants as Acoustic Models, A
Hybrid PNN-GMM classification scheme for	speech	emotion recognition, A
hybrid SVM/DDBHMM decision fusion modeling for robust continuous digital	speech	recognition, A
hybrid visual feature extraction method for audio-visual	speech	recognition, A
IBM Rich Transcription 2007	speech	-to-Text Systems for Lecture Meetings, The
IDANet: An Information Distillation and Aggregation Network for	speech	Enhancement
IEEE Acoustics,	speech	, and Signal Processing Magazine
IEEE Trans. Acoustics,	speech	, and Signal Processing
Image Caption Generation with Part of	speech	Guidance
Image-Based Visual	speech	Animation System, An
Image-Sensitive Language Modeling for Automatic	speech	Recognition
Image-	speech	combination for interactive computer assisted transcription of handwritten documents
Imitator: Personalized	speech	-driven 3D Facial Animation
Impact of imperfect OCR on part-of-	speech	tagging
Impact of OCR Errors on Automated Classification of OCR Japanese Texts with Parts-of-	speech	Analysis, An
Impact of Reduced Video Quality on Visual	speech	Recognition, The
Implantation of voicing on whispered	speech	using frequency-domain parametric modelling of source and filter information
Implementation of Three Text to	speech	Systems for Kurdish Language
Implicit Compositional Generative Network for Length-Variable Co-	speech	Gesture Synthesis
Improve Word Mover's Distance with Part-of-	speech	Tagging
improved maximum model distance approach for HMM-based	speech	recognition systems, An
Improved	speech	Reconstruction from Silent Video
Improvement of	speech	emotion recognition with neural network classifier by using speech spectrogram
Improvements on Automatic	speech	Segmentation at the Phonetic Level
Improving and Aligning	speech	with Presentation Slides
Improving Children's	speech	Recognition by HMM Interpolation with an Adults' Speech Recognizer
Improving Cross-Corpus	speech	Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG)
Improving End-to-End Contextual	speech	Recognition via a Word-Matching Algorithm With Backward Search
Improving Frame-Online Neural	speech	Enhancement With Overlapped-Frame Prediction
Improving GANs for	speech	Enhancement
Improving Mandarin End-to-End	speech	Recognition With Word N-Gram Language Model
Improving Monaural	speech	Enhancement by Mapping to Fixed Simulation Space With Knowledge Distillation
Improving Multimodal	speech	Recognition by Data Augmentation and Speech Representations
Improving	speech	Related Facial Action Unit Recognition by Audiovisual Information Fusion
Improving the Classification of Volcanic Seismic Events Extracting New Seismic and	speech	Features
Improving the Performance of Deep Learning Based	speech	Enhancement System Using Fuzzy Restricted Boltzmann Machine
Improving the	speech	Quality of VoIP by Packet Prioritization
Increasing Compactness of Deep Learning Based	speech	Enhancement Models With Parameter Pruning and Quantization Techniques
Incremental Text-to-	speech	Synthesis Using Pseudo Lookahead With Large Pretrained Language Model
Individual 3d Face Synthesis Based on Orthogonal Photos and	speech	-driven Facial Animation
Individualized Super-Gaussian Single Microphone	speech	Enhancement for Hearing Aid Users With Smartphone as an Assistive Device, An
Inducing Genuine Emotions in Simulated	speech	-Based Human-Machine Interaction: The NIMITEK Corpus
Influence of Hangover and Hangbefore Criteria on Automatic	speech	Recognition
Influence of	speech	/Non-Speech Segmentation on On-Line and Off-Line Speaker Segmentation Accuracy, The
Information Fusion and Person Verification Using	speech	and Face Information
Information-Extraction Approach to	speech	Processing: Analysis, Detection, Verification, and Recognition, An
Instrumental Assessment of Prosodic Quality for Text-to-	speech	Signals
Integrated analysis of	speech	and images as a probabilistic decoding process
Integrated Mining of Visual Features,	speech	Features, and Frequent Patterns for Semantic Video Annotation
Integrated neural network model for identifying	speech	acts, predicators, and sentiments of dialogue utterances
Integrating Binary Mask Estimation With MRF Priors of Cochleagram for	speech	Separation
Integrating Part of	speech	Guidance for Image Captioning
Integration of Vision and	speech	Understanding Using Bayesian Networks
Intelligibility Enhancement Via Normal-to-Lombard	speech	Conversion With Long Short-Term Memory Network and Bayesian Gaussian Mixture Model
Intelligibility improvements using binaural diverse sub-band processing applied to	speech	corrupted with automobile noise
Intelligibility of Children with Cleft Lip and Palate: Evaluation by	speech	Recognition Techniques
Inter-frame contextual modelling for visual	speech	recognition
Interaction between	speech	and Gesture: Strategies for Pointing to Distant Objects
Interaction framework for home environment using	speech	and vision
Interaction of Iconic Gesture and	speech	in Talk, The
Interaction With Gaze, Gesture, and	speech	in a Flexibly Configurable Augmented Reality System
Interdependencies among Voice Source Parameters in Emotional	speech
Interference Reduction in Reverberant	speech	Separation With Visual Voice Activity Detection
Intra-Predictive Switched Split Vector Quantization of	speech	Spectra
Introduction to the Special Issue: Advances on pattern recognition for	speech	and audio processing
Investigation into Audiovisual	speech	Correlation in Reverberant Noisy Environments, An
Investigation of Partition-Based and Phonetically-Aware Acoustic Features for Continuous Emotion Prediction from	speech	, An
Investigation of	speech	Landmark Patterns for Depression Detection
Invited paper: Automatic	speech	recognition: History, methods and challenges
ISL RT-07	speech	-to-Text System, The
Isolate	speech	Recognition Based on Time-Frequency Analysis Methods
Isolated word recognition by neural network models with cross-correlation coefficients for	speech	dynamics
Iterative Closed-Loop Phase-Aware Single-Channel	speech	Enhancement
Iterative Feature Normalization Scheme for Automatic Emotion Detection from	speech
Joint Bayesian Estimation of Time-Varying LP Parameters and Excitation for	speech
KAN-AV dataset for audio-visual face and	speech	analysis in the wild
Kernel Eigenvoices (Revisited) for Large-Vocabulary	speech	Recognition
Key Frame Mechanism for Efficient Conformer Based End-to-End	speech	Recognition
Keyword Detection for Spontaneous	speech
Kinect Development Kit: A Toolkit for Gesture- and	speech	-Based Human-Machine Interaction
Language-Independent OCR Using a Continuous	speech	Recognition System
Large Vocabulary Audio-visual	speech	Recognition Using Active Shape Models
Large Vocabulary Audio-Visual	speech	Recognition Using the Janus Speech Recognition Toolkit
Large Vocabulary Continuous	speech	Recognition With Reservoir-Based Acoustic Models
Large-Vocabulary Continuous	speech	Recognition Systems: A Look at Some Recent Advances
Late pre-dereverberation for	speech	intelligibility enhancement in public address systems
Latency in	speech	Feature Analysis for Telepresence Event Coding
Learning Contextually Fused Audio-Visual Representations for Audio-Visual	speech	Recognition
Learning Continuous Facial Actions From	speech	for Real-Time Animation
Learning Hierarchical Cross-Modal Association for Co-	speech	Gesture Generation
Learning Individual Speaking Styles for Accurate Lip to	speech	Synthesis
Learning Landmarks Motion from	speech	for Speaker-agnostic 3d Talking Heads Generation
Learning Salient Features for	speech	Emotion Recognition Using Convolutional Neural Networks
Learning Speaker-specific Lip-to-	speech	Generation
Learning Torso Prior for Co-	speech	Gesture Generation with Better Hand Shape
Learning Visual	speech
Learning With Learned Loss Function:	speech	Enhancement With Quality-Net to Improve Perceptual Evaluation of Speech Quality
Letter-To-Sound conversion for	speech	synthesizer
Leveraging Non-Causal Knowledge via Cross-Network Knowledge Distillation for Real-Time	speech	Enhancement
LFEformer: Local Feature Enhancement Using Sliding Window With Deformability for Automatic	speech	Recognition
Linked Source and Target Domain Subspace Feature Transfer Learning -- Exemplified by	speech	Emotion Recognition
Lip Movement Synthesis from	speech	Based on Hidden Markov Models

_ speech2action _

speech2action : Cross-Modal Supervision for Action Recognition

_ speech2face _

speech2face : Learning the Face Behind a Voice

_ speech2lip _

speech2lip : High-fidelity Speech to Lip Generation by Learning from a Short Video

_ speech2video _

speech2video Synthesis with 3d Skeleton Regularization and Expressive Body Poses

_ speech4mesh _

speech4mesh : Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation

_ speechreading _

Automatic speechreading with Applications to Human-Computer Interfaces

Lip Feature Extraction Towards an Automatic speechreading System

Robust face feature analysis for automatic speechreading and character animation

Selecting relevant visual features for speechreading

speechreading Using Probabilistic Models

speechreading : an overview of image processing, feature extraction, sensory integration and pattern recognition techniques

Index for "s"

Last update: 2-May-24 21:06:23
Use price@usc.edu for comments.