_ | speaker | _ |
3D Audio-Visual | speaker | Tracking with A Novel Particle Filter |
3D Audio-Visual | speaker | Tracking with A Two-Layer Particle Filter |
3D Convolutional Neural Networks Based | speaker | Identification and Authentication |
40 Years of Progress in Automatic | speaker | Recognition |
Acoustic and Facial Features for | speaker | Recognition |
Acoustic | speaker | Identification: The LIMSI CLEAR'07 System |
Active | speaker | Detection and Localization in Videos Using Low-Rank and Kernelized Sparsity |
Active | speaker | s in Context |
Adaptive fuzzy wavelet algorithm for text-independent | speaker | recognition |
Adaptive i-Vector Extraction for | speaker | Verification with Short Utterance, An |
Adaptive | speaker | identification using sequential probability ratio test |
Adaptive | speaker | Identification with Audio-Visual Cues for Movie Content Analysis |
Aging speech recognition with | speaker | adaptation techniques: Study on medium vocabulary continuous Bengali speech |
Ambiguity reduction in | speaker | identification by the relaxation labeling process |
Analysis of cosine distance features for | speaker | verification |
Analysis of Expressiveness of Portuguese Sign Language | speaker | s |
Analysis of the Utility of Classical and Novel Speech Quality Measures for | speaker | Verification |
analytic study on clustering driven self-supervised | speaker | verification, An |
Application of fusion techniques to | speaker | authentication over ip networks |
Application of New Qualitative Voicing Time-Frequency Features for | speaker | Recognition |
Approach to Statistical Lip Modelling for | speaker | Identification via Chromatic Feature Extraction, An |
ART2-based multiple MLPs neural network for | speaker | -independent recognition of isolated words |
Assessing | speaker | independence on a speech-based depression level estimation system |
ASVFI: Audio-Driven | speaker | Video Frame Interpolation |
Attention Based | speaker | -independent Audio-visual Deep Learning Model for Speech Enhancement, An |
Attention guided deep audio-face fusion for efficient | speaker | naming |
Audio Segmentation and | speaker | Localization in Meeting Videos |
Audio Visual | speaker | Verification Based on Hybrid Fusion of Cross Modal Features |
Audio-Video detection of the active | speaker | in meetings |
Audio-Visual Active | speaker | Tracking in Cluttered Indoors Environments |
Audio-Visual Particle Flow SMC-PHD Filtering for Multi- | speaker | Tracking |
Audio-visual | speaker | detection using dynamic Bayesian networks |
Audio-Visual | speaker | Diarization Based on Spatiotemporal Bayesian Fusion |
Audio-Visual | speaker | Identification Based on the Use of Dynamic Audio and Visual Features |
audio-visual | speaker | identification using coupled hidden Markov models, A |
Audio-Visual | speaker | Identification via Adaptive Fusion Using Reliability Estimates of Both Modalities |
Audio-visual | speaker | identification with multi-view distance metric learning |
Audio-Visual | speaker | Localization Using Graphical Models |
Audio-visual | speaker | tracking with importance particle filters |
Audio-Visual Tracking of Concurrent | speaker | s |
Audiovisual localization of multiple | speaker | s in a video teleconferencing setting |
Audiovisual Talking Head for Augmented Speech Generation: Models and Animations Based on a Real | speaker | 's Articulatory Data, An |
Auto-Tuning Spectral Clustering for | speaker | Diarization Using Normalized Maximum Eigengap |
Automatic Cross-Biometric Footstep Database Labelling Using | speaker | Recognition |
Automatic Estimation of a Priori | speaker | Dependent Thresholds in Speaker Verification |
Automatic | speaker | verification on narrowband and wideband lossy coded clean speech |
Avoiding dominance of | speaker | features in speech-based depression detection |
AWLloss: | speaker | Verification Based on the Quality and Difficulty of Speech |
Bag-of-phonemes Model for Homeplace Classification of Mandarin | speaker | s, A |
Bayesian Approach to Audio-Visual | speaker | Identification, A |
Bayesian Network Approach for Combining Pitch and Reliable Spectral Envelope Features for Robust | speaker | Verification, A |
Bilingual Speech Recognition by Estimating | speaker | Geometry from Video Data |
Biometric Identification Using Motion History Images of a | speaker | 's Lip Movements |
Biometric template protection for | speaker | recognition based on universal background models |
Blue-green color categorization in Mandarin-English | speaker | s |
Boosted learning in dynamic Bayesian networks for multimodal | speaker | detection |
Boosting and structure learning in dynamic Bayesian networks for audio-visual | speaker | detection |
Boosting-Based Multimodal | speaker | Detection for Distributed Meeting Videos |
Cascade Image Transform for | speaker | Independent Automatic Speech Reading, A |
CentriForce: Multiple-Domain Adaptation for Domain-Invariant | speaker | Representation Learning |
Centroid-aware local discriminative metric learning in | speaker | verification |
Channel/Handset Mismatch Evaluation in a Biometric | speaker | Verification Using Shifted Delta Cepstral Features |
Cluster-Dependent Feature Transformation for Telephone-Based | speaker | Verification |
Cluster-Guided Unsupervised Domain Adaptation for Deep | speaker | Embedding |
Cochannel | speaker | count labelling based on the use of cepstral and pitch prediction derived features |
Colored Noise Based Multicondition Training Technique for Robust | speaker | Identification |
Combination of Cepstral and Phonetically Discriminative Features for | speaker | Verification |
Combined Audio Visual | speaker | Tracking |
Combining classifier decisions for robust | speaker | identification |
Combining the Likelihood and the Kullback-Leibler Distance in Estimating the Universal Background Model for | speaker | Verification Using SVM |
Comparative evaluation of feature normalization techniques for voice password based | speaker | verification |
Comparative evaluation of maximum a Posteriori vector quantization and Gaussian mixture models in | speaker | verification |
Comparative Study of Feature and Score Normalization for | speaker | Verification, A |
Comparative Study to Evaluate a Text-Independent | speaker | Identification Engine for Arabic Speakers Using a CHMM-Based Approach, A |
Comparing Data-driven and Phonetic N-gram Systems for Text-Independent | speaker | Verification |
Comparison between supervised and unsupervised learning of probabilistic linear discriminant analysis mixture models for | speaker | verification |
Comparison of Clustering Methods for MLP-based | speaker | Verification |
Comparison of clustering methods: A case study of text-independent | speaker | modeling |
Compensation Techniques for | speaker | Variability in Continuous Emotion Prediction |
Compositional clustering: Applications to multi-label object recognition and | speaker | identification |
conditional mixture of neural networks for face detection, applied to locating and tracking an individual | speaker | , A |
Constructing the Discriminative Kernels Using GMM for Text-Independent | speaker | Identification |
Continuous Estimation of Emotions in Speech by Dynamic Cooperative | speaker | Models |
Contrasting the Effects of Different Frequency Bands on | speaker | and Accent Identification |
Controllable Multi-Lingual Multi- | speaker | Multi-Style Text-to-Speech Synthesis With Multivariate Information Minimization, A |
Cosine Scoring With Uncertainty for Neural | speaker | Embedding |
Creating | speaker | independent ASR system through prosody modification based data augmentation |
Cross Modal Video Representations for Weakly Supervised Active | speaker | Localization |
Cross-modal | speaker | Verification and Recognition: A Multilingual Perspective |
Cross-Modal Supervision for Learning Active | speaker | Detection in Video |
Cross- | speaker | Emotion Transfer Through Information Perturbation in Emotional Speech Synthesis |
Crossed-time Delay Neural Network for | speaker | Recognition |
Crossmodal Matching of | speaker | s Using Lip and Voice Features in Temporally Non-overlapping Audio and Video Streams |
D3Net: A Unified | speaker | -Listener Architecture for 3D Dense Captioning and Visual Grounding |
DANTE | speaker | Recognition Module. An Efficient and Robust Automatic Speaker Searching Solution for Terrorism-Related Scenarios |
Data-Driven Impostor Selection for T-Norm Score Normalisation and the Background Dataset in SVM-Based | speaker | Verification |
Decision-Based Attack to | speaker | Recognition System via Local Low-Frequency Perturbation |
Decision-Tree-Based Online | speaker | Clustering, A |
Deep Audio-Visual Beamforming for | speaker | Localization |
Deep Neural Network Approaches to | speaker | and Language Recognition |
Deep neural networks for automatic | speaker | recognition do not learn supra-segmental temporal features |
Deep | speaker | Embedding Using Hybrid Network of Multi-Feature Aggregation and Multi-Loss Fusion for TI-SV |
Detection of a | speaker | in Video by Combined Analysis of Speech Sound and Mouth Movement |
Detection of Calls From Smart | speaker | Devices |
Development of a 3D Haptic Rendering System with the String-Based Haptic Interface Device and Vibration | speaker | s, A |
DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided | speaker | Embedding |
Dimension-Decoupled Gaussian Mixture Model for Short Utterance | speaker | Recognition |
Directly Modeling of Correlation Matrices for GMM in | speaker | Identification |
Discriminative Analysis of Lip Motion Features for | speaker | Identification and Speech-Reading |
Discriminative lip-motion features for biometric | speaker | identification |
Distributed genetic algorithm for Gaussian mixture model based | speaker | identification |
Domain Mismatch Compensation for | speaker | Recognition Using a Library of Whiteners |
DTW-based probability model for | speaker | feature analysis and data mining, A |
Dual-Factor Authentication System Featuring | speaker | Verification and Token Technology, A |
Dual-modality Talking-metrics: 3D Visual-Audio Integrated Behaviometric Cues from | speaker | s |
Dynamic Bayesian Networks for Audio-Visual | speaker | Recognition |
Dynamic Convolution With Global-Local Information for Session-Invariant | speaker | Representation Learning |
Effect of Window Size and Shift Period in Mel-Warped Cepstral Feature Extraction on GMM-Based | speaker | Verification |
Effective | speaker | spotting for watch-list detection of fraudsters in telephone banking |
Effective | speaker | verification via dynamic mismatch compensation |
Efficient identification of | speaker | s in news video based on shot segmentation |
Efficient text independent | speaker | recognition with wavelet feature selection based multilayered neural network using supervised learning algorithm |
Egocentric Deep Multi-Channel Audio-Visual Active | speaker | Localization |
Emotional Speech Clustering Based Robust | speaker | Recognition System |
End-to-End Active | speaker | Detection |
Enhanced Deep Neural Network-Based Approach for | speaker | Recognition Using Triumvirate Euphemism Strategy, An |
Enhanced | speaker | recognition based on intra-modal fusion and accent modeling |
Enhanced VQ-Based Algorithms for Speech Independent | speaker | Identification |
Enhancing Transferability of Adversarial Audio in | speaker | Recognition Systems |
Enrollee-constrained sparse coding of test data for | speaker | verification |
Estimating | speaker | Height and Subglottal Resonances Using MFCCs and GMMs |
European Portuguese Accent in Acoustic Models for Non-native English | speaker | s |
Evaluation of Lineal Relation between Shifted Delta Cepstral Features and Prosodic Features in | speaker | Verification |
Evaluation of Visual Speech Features for the Tasks of Speech and | speaker | Recognition, An |
Evaluation of wolf attack for classified target on | speaker | verification systems |
Exploiting Glottal Information in | speaker | Recognition Using Parallel GMMs |
Exploiting temporal information to detect conversational groups in videos and predict the next | speaker | |
Exploiting the Complementarity of Audio and Visual Data in Multi- | speaker | Tracking |
Exploring kernel discriminant analysis for | speaker | verification with limited test data |
Factorizing | speaker | , lexical and emotional variabilities observed in facial expressions |
Feature classification criterion for missing features mask estimation in robust | speaker | recognition |
Feature extracted from wavelet decomposition using biorthogonal Riesz basis for text-independent | speaker | recognition |
Feature extracted from wavelet eigenfunction estimation for text-independent | speaker | recognition |
Feature Extraction For Visual | speaker | Authentication Against Computer-Generated Video Attacks |
Feature extraction using HHT-based locally optimized short-time fractional Fourier transform for | speaker | recognition |
Feature Selection Based on Information Theory for | speaker | Verification |
Few-Shot Lip-Password Based | speaker | Verification |
Few-Shot | speaker | Identification Using Lightweight Prototypical Network With Feature Grouping and Interaction |
Finding | speaker | Face Region by Audiovisual Correlation |
Fingerprint and | speaker | verification decisions fusion |
Fingerprint and | speaker | verification decisions fusion using a functional link network |
From Speech Quality Measures to | speaker | Recognition Performance |
Further reduced form of wavelet feature for text independent | speaker | recognition |
Fused Speech Enhancement Framework for Robust | speaker | Verification, A |
Fusing wavelet and short-term features for | speaker | identification in noisy environment |
Gaussian Selection for | speaker | Recognition Using Cumulative Vectors |
Generalized Variability Model for | speaker | Verification |
GFM-Based Methods for | speaker | Identification |
Group Delay Based Methods for | speaker | Segregation and its Application in Multimedia Information Retrieval |
Hierarchical Bayesian combination of plug-in maximum a posteriori decoders in deep neural networks-based speech recognition and | speaker | adaptation |
High Performance | speaker | Verification System Based on Multilayer Perceptrons and Real-Time Enrollment |
HMM-Based Subband Processing Approach to | speaker | Identification, An |
HNM-Based | speaker | -Nonspecific Timbre Transformation Scheme for Speech Synthesis, An |
How to Design a Three-Stage Architecture for Audio-Visual Active | speaker | Detection in the Wild |
Hybrid Network For End-To-End Text-Independent | speaker | Identification |
IBM RT07 Evaluation Systems for | speaker | Diarization on Lecture Meetings, The |
ICSI RT07s | speaker | Diarization System, The |
Identify the Benefits of the Different Steps in an i-Vector Based | speaker | Verification System |
Identifying optimised | speaker | identification model using hybrid GRU-CNN feature extraction technique |
Impact of Prior Channel Information for | speaker | Identification |
Improved Active | speaker | Detection based on Optical Flow |
Improved Audio-Visual | speaker | Recognition via the Use of a Hybrid Combination Strategy |
Improved | speaker | and Navigator for Vision-and-Language Navigation |
Improved Two-stage Wiener Filter for Robust | speaker | Identification |
Improvement of Lip Reading Performance in Real Environments Using | speaker | and Environmental Adaptation |
Improving | speaker | identification in noise by subband processing and decision fusion |
Improving | speaker | Turn Embedding by Crossmodal Transfer Learning from Face Embedding |
Improving | speaker | Verification Using ALISP-Based Specific GMMs |
Improving the characterization of the alternative hypothesis via minimum verification error training with applications to | speaker | verification |
Incremental MLLR | speaker | adaptation by fuzzy logic control |
Individual Dimension Gaussian Mixture Model for | speaker | Identification |
Influence of Speech/Non-Speech Segmentation on On-Line and Off-Line | speaker | Segmentation Accuracy, The |
Information based | speaker | verification |
Information Theoretic Expectation Maximization Based Gaussian Mixture Modeling for | speaker | Verification |
Integration of Face and | speaker | Recognition by Subspace Method |
Integration of MKL-Based and I-Vector-Based | speaker | Verification by Short Utterances |
Investigation on LP-residual representations for | speaker | identification |
Is masking a relevant aspect lacking in MFCC? A | speaker | verification perspective |
Joint | speaker | -Listener-Reinforcer Model for Referring Expressions, A |
Kernel oriented discriminant analysis for | speaker | -independent phoneme spaces |
kernel trick for sequences applied to text-independent | speaker | verification systems, A |
Kernel-based Discrimination Framework for Solving Hypothesis Testing Problems with Application to | speaker | Verification, A |
Keynote | speaker | : An engineered microenvironment for manipulating cells in tissue regeneration: A way towards the development of bioartificial organs using mesenchymal stem cell |
Keynote | speaker | : An experimental and computational framework to build a dynamic protein atlas for human cell division |
Keynote | speaker | : Computer aided diagnostics in medicine: Discrimination for some lung diseases |
Keynote | speaker | : Electromagnetic performances analysis of an ultra-wideband antenna in microwave breast imaging |
Keynote | speaker | : Interacting in spatial augmented reality |
Keynote | speaker | : Nano-cancer technology: New diagnostic and therapeutic devices |
Keynote | speaker | : Sensing motion on wild data and virtual reality application |
Keynote | speaker | : State of the art brain PET inserts for the existing MRI system |
Labelled Non-Zero Diffusion Particle Flow SMC-PHD Filtering for Multi- | speaker | Tracking |
Large-Scale Open-Source Acoustic Simulator for | speaker | Recognition, A |
Learning Landmarks Motion from Speech for | speaker | -agnostic 3d Talking Heads Generation |
Learning Long-Term Spatial-Temporal Graphs for Active | speaker | Detection |
Learning polynomial function based neutral-emotion GMM transformation for emotional | speaker | recognition |
Learning | speaker | -specific Lip-to-Speech Generation |
Learning Virtual HD Model for Bi-model Emotional | speaker | Recognition |
LIA RT'07 | speaker | Diarization System, The |
Light Weight Model for Active | speaker | Detection, A |
Lightweight CNN-Conformer Model for Automatic | speaker | Verification, A |
Lightweight | speaker | Recognition in Poincare Spaces |
LipFormer: Learning to Lipread Unseen | speaker | s Based on Visual-Landmark Transformers |
LISTEN: a system for locating and tracking individual | speaker | s |
Local fuzzy PCA based GMM with dimension reduction on | speaker | identification |
Local Pairwise Linear Discriminant Analysis for | speaker | Verification |
Look who's talking: | speaker | detection using video and audio correlation |
Look&listen: Multi-Modal Correlation Learning for Active | speaker | Detection and Speech Enhancement |
Low-Complexity Parabolic Lip Contour Model With | speaker | Normalization for High-Level Feature Extraction in Noise-Robust Audiovisual Speech Recognition, A |
MAAS: Multi-modal Assignation for Active | speaker | Detection |
Major Cast Detection in Video Using Both | speaker | and Face Information |
Malay lexical simplification model for non-native | speaker | |
Maximum Gaussianality training for deep | speaker | vector normalization |
Maximum Likelihood and Maximum a Posteriori Adaptation for Distributed | speaker | Recognition Systems |
Maximum Likelihood Discriminant Feature for Text-Independent | speaker | Verification |
Maximum likelihood Linear Dimension Reduction of heteroscedastic feature for robust | speaker | Recognition |
Mean-Shift and Sparse Sampling-Based SMC-PHD Filtering for Audio Informed Visual | speaker | Tracking |
Method and apparatus for animation of a human | speaker | |
Methods and apparatuses for segmenting an audio-visual recording using image similarity searching and audio | speaker | recognition |
Minimising | speaker | Verification Utterance Length through Confidence Based Early Verification Decisions |
Mixture Linear Prediction in | speaker | Verification Under Vocal Effort Mismatch |
MKPLS: Manifold Kernel Partial Least Squares for Lipreading and | speaker | Identification |
MLLR Transforms Based | speaker | Recognition in Broadcast Streams |
Modified Segmental Histogram Equalization for robust | speaker | verification |
Moving-Talker, | speaker | -Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus |
Multi-Feature Fusion Using Multi-GMM Supervector for SVM | speaker | Verification |
Multi-level Fusion of Audio and Visual Features for | speaker | Identification |
Multi-resolution form of SVD for text-independent | speaker | recognition |
Multi-SNR GMMs-Based Noise-Robust | speaker | Verification Using 1/fa Noises |
Multi- | speaker | Tracking From an Audio-Visual Sensing Device |
Multi- | speaker | voice activity detection using a camera-assisted microphone array |
Multi-stage | speaker | Diarization for Conference and Lecture Meetings |
Multi-Target Extractor and Detector for Unknown-Number | speaker | Diarization |
multi-task network for | speaker | and command recognition in industrial environments, A |
Multimedia Document Retrieval Using Speech and | speaker | Recognition |
Multimodal Approach to | speaker | Diarization on TV Talk-Shows, A |
Multimodal Multi-Channel On-Line | speaker | Diarization Using Sensor Fusion Through SVM |
Multimodal | speaker | Detection using Error Feedback Dynamic Bayesian Networks |
Multimodal | speaker | Diarization |
Multimodal | speaker | identification with audio-video processing |
Multimodal | speaker | Recognition in a Conversation Scenario |
Multimodal | speaker | Verification Using Ear Image Features Extracted by PCA and ICA |
Multimodality Framework for Creating | speaker | /Non-Speaker Profile Databases for Real-World Video, A |
Multiple | speaker | Tracking in Spatial Audio via PHD Filtering and Depth-Audio Fusion |
Neural Acoustic-Phonetic Approach for | speaker | Verification With Phonetic Attention Mask |
new approach for text-independent | speaker | recognition, A |
New Hybrid GMM/SVM for | speaker | Verification, A |
New On-Line Model Quality Evaluation Method for | speaker | Verification, A |
Noise robust voice detector for | speaker | recognition |
Noise-Robust Self-Adaptive Multitarget | speaker | Detection System, A |
novel method for estimating the number of | speaker | s based on generalized eigenvalue-vector decomposition and adaptive wavelet transform by using K-means clustering, A |
Novel Text-Independent | speaker | Verification System Using Ant Colony Optimization Algorithm, A |
Novel Windowing Technique for Efficient Computation of MFCC for | speaker | Recognition, A |
On combining classifiers for | speaker | authentication |
On the Complementarity of Phone Posterior Probabilities for Improved | speaker | Recognition |
On the Performance and Use of | speaker | Recognition Systems for Surveillance |
On the Results of the First Mobile Biometry (MOBIO) Face and | speaker | Verification Evaluation |
On the Robustness of Cross-lingual | speaker | Recognition using Transformer-based Approaches |
On the use of different speech representations for | speaker | modeling |
On the Use of Dot Scoring for | speaker | Diarization |
On the use of nearest feature line for | speaker | identification |
On Training Speech Separation Models With Various Numbers of | speaker | s |
Online | speaker | emotion tracking with a dynamic state transition model |
Optimizing Multi-Taper Features for Deep | speaker | Verification |
Parametric Representation of the | speaker | 's Lips for Multimodal Sign Language And Speech Recognition |
Partially Supervised | speaker | Clustering |
Person instance graphs for mono-, cross- and multi-modal person recognition in multimedia data: application to | speaker | identification in TV broadcast |
Phoneme analysis for multiple languages with fuzzy-based | speaker | identification |
Plda-based system for text-prompted password | speaker | verification |
Plenary | speaker | : Development of the next generation positron emission tomography |
Plenary | speaker | : Granular video tracking: Role of r-granules |
Plenary | speaker | : Smart material interfaces: Playful and artistic applications |
Predicting | speaker | Head Nods and the Effects of Affective Information |
Privacy-preserving | speaker | verification system based on binary I-vectors |
Probabilistic Random Projections and | speaker | Verification |
Progress in the AMIDA | speaker | Diarization System for Meeting Data |
Prototype Division for Self-Supervised | speaker | Verification |
Pruning Approach for GMM-Based | speaker | Verification in Mobile Embedded Systems, A |
Rapid feature space MLLR | speaker | adaptation for deep neural network acoustic modeling |
real-time text-independent | speaker | identification system, A |
Real-time unsupervised | speaker | change detection |
Regularized All-Pole Models for | speaker | Verification Under Noisy Environments |
Regularized Auto-Associative Neural Networks for | speaker | Verification |
RES-StS: Referring Expression | speaker | via Self-Training With Scorer for Goal-Oriented Vision-Language Navigation |
Revisiting Carl Bildt's Impostor: Would a | speaker | Verification System Foil Him? |
Robust Bootstrapping of | speaker | Models for Unsupervised Speaker Indexing |
Robust face-voice based | speaker | identity verification using multilevel fusion |
Robust Local Scoring Function for Text-Independent | speaker | Verification |
Robust Multi- | speaker | Tracking via Dictionary Learning and Identity Modeling |
Robust | speaker | Identification for Meetings: UPC CLEAR'07 Meeting Room Evaluation System |
Robust | speaker | recognition based on filtering in autocorrelation domain and sub-band feature recombination |
Robust | speaker | Recognition in Cross-Channel Condition |
Robust | speaker | Verification via Asynchronous Fusion of Speech and Lip Information |
Robust | speaker | 's Location Detection in a Vehicle Environment Using GMM Models |
Robust visual features for the multimodal identification of unregistered | speaker | s in TV talk-shows |
SAGRNN: Self-Attentive Gated RNN For Binaural | speaker | Separation With Interaural Cue Preservation |
SC-CNN: Effective | speaker | Conditioning Method for Zero-Shot Multi-Speaker Text-to-Speech Systems |
Scatter Difference NAP for SVM | speaker | Recognition |
Scores Selection for Emotional | speaker | Recognition |
Searching through a Speech Memory for Text-Independent | speaker | Verification |
Selection of | speaker | Independent Feature for a Speaker Verification System |
Selection of the Best Wavelet Packet Nodes Based on Mutual Information for | speaker | Identification |
Selective HuBERT: Self-Supervised Pre-Training for Target | speaker | in Clean and Mixture Speech |
Self-supervised contrastive | speaker | verification with nearest neighbor positive instances |
Semi-automated | speaker | Adaptation: How to Control the Quality of Adaptation? |
Sequence-Level | speaker | Change Detection With Difference-Based Continuous Integrate-and-Fire |
Sequential Monte Carlo Fusion of Sound and Vision for | speaker | Tracking |
Session compensation using binary speech representation for | speaker | recognition |
Sheep, Goats, Lambs and Wolves: A Statistical Analysis of | speaker | Performance in the NIST 1998 Speaker Recognition Evaluation |
Signal-to-Signal Ratio Independent | speaker | Identification for Co-channel Speech Signals |
SilentTrig: An imperceptible backdoor attack against | speaker | identification with hidden triggers |
Similarity normalization for | speaker | verification by fuzzy fusion |
Simple But Effective Approach to | speaker | Tracking in Broadcast News, A |
Simple Noise Robust Feature Vector Selection Method for | speaker | Recognition |
Simultaneous- | speaker | Voice Activity Detection and Localization Using Mid-Fusion of SVM and HMMs |
Skeleton-based Methods for | speaker | Action Classification on Lecture Videos |
SNAC: | speaker | -Normalized Affine Coupling Layer in Flow-Based Architecture for Zero-Shot Multi-Speaker Text-to-Speech |
SNR-Invariant Multitask Deep Neural Networks for Robust | speaker | Verification |
Sparse Representation for | speaker | Identification |
| speaker | adaptation based on MAP estimation using fuzzy controller |
| speaker | adaptation in a large-vocabulary Gaussian HMM recognizer |
| speaker | adaptation using semi-continuous hidden Markov models |
| speaker | and Digit Recognition by Audio-Visual Lip Biometrics |
| speaker | Attractor Network: Generalizing Speech Separation to Unseen Numbers of Sources |
| speaker | Change Detection Based on the Pairwise Distance Matrix |
| speaker | Classification via Supervised Hierarchical Clustering Using ICA Mixture Model |
| speaker | Clustering by Co-Optimizing Deep Representation Learning and Cluster Estimation |
| speaker | Clustering Using Dominant Sets |
| speaker | Dependent ASRs for Huastec and Western-Huastec Nahuatl Languages |
| speaker | dependent video indexing based on audio-visual interaction |
| speaker | detection using the timing structure of lip motion and sound |
| speaker | Diarization for Conference Room: The UPC RT07s Evaluation System |
| speaker | Diarization Using Direction of Arrival Estimate and Acoustic Feature Information: The I2R-NTU Submission for the NIST RT 2007 Evaluation |
| speaker | Discrimination Based on a Fusion Between Neural and Statistical Classifiers |
| speaker | Discrimination Using Several Classifiers and a Relativistic Speaker Characterization |
| speaker | Discriminative Weighting Method for VQ-Based Speaker Identification |
| speaker | Extraction With Co-Speech Gestures Cue |
| speaker | Identification and Verification Using Support Vector Machines and Sparse Kernel Logistic Regression |
| speaker | Identification and Video Analysis for Hierarchical Video Shot Classification |
| speaker | identification based on adaptive discriminative vector quantisation |
| speaker | identification based on Classification Sub-space Gaussian Mixture Model |
| speaker | Identification Based on Integrated Face Direction in a Group Conversation |
| speaker | identification security improvement by means of speech watermarking |
| speaker | Identification System Based on Lip-Motion Feature |
| speaker | Identification Using Higher Order Spectral Phase Features and their Effectiveness vis-a-vis Mel-Cepstral Features |
| speaker | identification using hybrid Karhunen-Loeve transform and Gaussian mixture model approach |
| speaker | identification using multimodal neural networks and wavelet analysis |
| speaker | Identification Using the VQ-Based Discriminative Kernels |
| speaker | Independent Audio-Visual Speech Recognition |
| speaker | independent VSR: A systematic review and futuristic applications |
| speaker | Modeling with Various Speech Representations |
| speaker | Pruning Algorithm for Real-Time Speaker Identification, A |
| speaker | Pruning Algorithm for Real-Time Speaker Identification, A |
| speaker | recognition based on dynamic MFCC parameters |
| speaker | Recognition by Machines and Humans: A tutorial review |
| speaker | Recognition Using a Binary Representation and Specificities Models |
| speaker | recognition: general classifier approaches and data fusion methods |
| speaker | Tracking Algorithm Based on Audio and Visual Information Fusion Using Particle Filter, A |
| speaker | Tracking Based on Distributed Particle Filter in Distributed Microphone Networks |
| speaker | Tracking Using Multi-modal Fusion Framework |
| speaker | Verification in Noisy Environment Using Missing Feature Approach |
| speaker | Verification Using A Novel Set of Dynamic Features |
| speaker | Verification Using Accumulative Vectors with Support Vector Machines |
| speaker | Verification with Adaptive Spectral Subband Centroids |
| speaker | verification with short utterances: a review of challenges, trends and opportunities |
| speaker | Verification, Speaker Identification |
| speaker | Verification, Speaker Identification |
| speaker | -Adaptive Lip Reading with User-Dependent Padding |
| speaker | -aware Multi-Task Learning for automatic speech recognition |
| speaker | -aware Speech Emotion Recognition by Fusing Amplitude and Phase Information |
| speaker | -Independent Lipreading By Disentangled Representation Learning |
| speaker | -Independent Lipreading With Limited Data |
| speaker | -Independent Speech Animation Using Perceptual Loss Functions and Synthetic Data |
| speaker | -independent Speech Recognition by Means of Functional-link Neural Networks |
| speaker | s' Use of Interactive Gestures as Markers of Common Ground |
Spectral Subband Centroids as Complementary Features for | speaker | Authentication |
Speech balloon and | speaker | association for comics and manga understanding |
statistical prediction model of | speaker | s' intentions using multi-level features in a goal-oriented dialog system, A |
Stealthy Backdoor Attack Against | speaker | Recognition Using Phase-Injection Hidden Trigger |
Style Transfer for Co-speech Gesture Animation: A Multi- | speaker | Conditional-mixture Approach |
Supplementary Material: AVA-Active | speaker | : An Audio-Visual Dataset for Active Speaker Detection |
Support Vector Machine Regression for Robust | speaker | Verification in Mismatching and Forensic Conditions |
SVM Based GMM Supervector | speaker | Recognition Using LP Residual Signal |
SVM | speaker | Verification Using Session Variability Modelling and GMM Supervectors |
SVSNet: An End-to-End | speaker | Voice Similarity Assessment Model |
Symmetric Saliency-Based Adversarial Attack to | speaker | Identification |
Synthetic Speech Detection Based on the Temporal Consistency of | speaker | Features |
System and method for audio/video | speaker | detection |
System and process for adding high frame-rate current | speaker | data to a low frame-rate video using audio watermarking techniques |
System and process for adding high frame-rate current | speaker | data to a low frame-rate video using delta frames |
TechWare: | speaker | and Spoken Language Recognition Resources |
Temporal Information in a Binary Framework for | speaker | Recognition |
Text independent | speaker | gender recognition using lip movement |
Text-independent | speaker | identification using Radon and discrete cosine transforms based features from speech spectrogram |
Text-independent | speaker | recognition using graph matching |
Text-independent | speaker | verification with ant colony optimization feature selection and support vector machine |
Three-Dimensional Lip Motion Network for Text-Independent | speaker | Recognition |
Three-Dimensional | speaker | Localization: Audio-Refined Visual Scaling Factor Estimation |
Time and frequency pruning for | speaker | identification |
Time-normalization techniques for | speaker | -independent isolated word recognition |
Total Variability Layer in Deep Neural Network Embeddings for | speaker | Verification |
Toward A | speaker | -Independent Real-Time Affect Detection System |
Toward Text-independent Cross-lingual | speaker | Recognition Using English-Mandarin-Taiwanese Dataset |
Towards better making a decision in | speaker | verification |
Towards Structured Approaches to Arbitrary Data Selection and Performance Prediction for | speaker | Recognition |
Towards Zero-Shot Multi- | speaker | Multi-Accent Text-to-Speech Synthesis |
Tracking the Active | speaker | Based on a Joint Audio-Visual Observation Model |
Two-level Method for Unsupervised | speaker | -based Audio Segmentation, A |
Type-2 Fuzzy GMMs for Robust Text-Independent | speaker | Verification in Noisy Environments |
U-NORM Likelihood Normalization in PIN-Based | speaker | Verification Systems |
UBM-Based Reference Space for | speaker | Recognition, An |
Understanding Public | speaker | s' Performance: First Contributions to Support a Computational Approach |
unique approach in text independent | speaker | recognition using MFCC feature sets and probabilistic neural network, A |
Unsupervised | speaker | Identification for TV News |
Use of Neumann series decomposition to fit the Weighted Euclidean distance and Inner product scoring models in automatic | speaker | recognition |
User-centric | speaker | report: Ranking-based effectiveness evaluation and feedback |
Using Polynomial Kernel Support Vector Machines for | speaker | Verification |
Variable-Length | speaker | Conditioning in Flow-Based Text-to-Speech |
Variational Bayesian Inference for Audio-Visual Tracking of Multiple | speaker | s |
Variational DNN embeddings for text-independent | speaker | verification |
Vector quantization based Gaussian modeling for | speaker | verification |
Vector Quantization Mappings for | speaker | Verification |
Verification effectiveness in open-set | speaker | identification |
VisageSynTalk: Unseen | speaker | Video-to-Speech Synthesis via Speech-Visage Feature Selection |
Vision-Based | speaker | Detection Using Bayesian Networks |
Visual Lip Activity Detection and | speaker | Detection Using Mouth Region Intensities |
Visual Signal Reliability for Robust Audio-Visual | speaker | Identification, A |
Visual | speaker | authentication by ensemble learning over static and dynamic lip details |
Visual | speaker | authentication with random prompt texts by a dual-task CNN framework |
Visual | speaker | Identification with Spatiotemporal Directional Features |
Visually steerable sound beam forming system based on face tracking and | speaker | array |
Voice activity detection and | speaker | localization using audiovisual cues |
Wavelet feature domain adaptive noise reduction using learning algorithm for text-independent | speaker | recognition |
Wavelet feature selection based neural networks with application to the text independent | speaker | identification |
Wrapped Kalman Filter for Azimuthal | speaker | Tracking, A |
Xi-Vector Embedding for | speaker | Recognition |
You're Not You When You're Angry: Robust Emotion Features Emerge by Recognizing | speaker | s |
457 for speaker