| _ | talking | _ |
| 3D-Aware | talking | -Head Video Motion Transfer |
| 3D-Aware Text-Driven | talking | Avatar Generation |
| AD-NeRF: Audio Driven Neural Radiance Fields for | talking | Head Synthesis |
| All You Need Is Your Voice: Emotional Face Representation with Audio Perspective for Emotional | talking | Face Generation |
| Alleviating One-to-Many Mapping in | talking | Head Synthesis With Dynamic Adaptation Context and Style Adapter |
| AnyoneNet: Synchronized Speech and | talking | Head Generation for Arbitrary Persons |
| Are You | talking | to Me? Reasoned Visual Dialog Generation Through Adversarial Learning |
| Audio-driven | talking | face generation with diverse yet realistic facial animations |
| Audio-driven | talking | Face Generation with Stabilized Synchronization Loss |
| Audio-Driven | talking | Face Video Generation With Dynamic Convolution Kernels |
| Audio-Driven | talking | Video Frame Restoration |
| Audio-Semantic Enhanced Pose-Driven | talking | Head Generation |
| Audio-visual selection process for the synthesis of photo-realistic | talking | -head animations |
| Audio-Visual Speech Representation Expert for Enhanced | talking | Face Video Generation and Evaluation |
| Audio-Visual Unit Selection for the Synthesis of Photo-Realistic | talking | -Heads |
| Audiovisual | talking | Head for Augmented Speech Generation: Models and Animations Based on a Real Speaker's Articulatory Data, An |
| Avatar Fingerprinting for Authorized Use of Synthetic | talking | -head Videos |
| Bangla | talking | Calculator for Visually Impaired Students in Bangladesh |
| Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D | talking | Heads |
| Beyond | talking | : Generating Holistic 3D Human Dyadic Motion for Communication |
| CoFaCo: Controllable Generative | talking | Face Video Coding |
| Combining online and offline learning for tracking a | talking | face in video |
| Compact Temporal Trajectory Representation for | talking | Face Video Compression |
| Comparative Study of Perceptual Quality Metrics For Audio-Driven | talking | Head Videos, A |
| Compression of MPEG-4 Facial Animation Parameters for Transmission of | talking | Heads |
| Continuously Controllable Facial Expression Editing in | talking | Face Videos |
| Cptnet: Cascade Pose Transform Network for Single Image | talking | Head Animation |
| Creating 3D speech-driven | talking | heads: a probabilistic network approach |
| Czech Artificial Computerized | talking | Head George |
| DaGAN++: Depth-Aware Generative Adversarial Network for | talking | Head Video Generation |
| DAVD-Net: Deep Audio-Aided Video Decompression of | talking | Heads |
| Defending Low-Bandwidth | talking | Head Videoconferencing Systems From Real-Time Puppeteering Attacks |
| Depth-Aware Generative Adversarial Network for | talking | Head Video Generation |
| Diffused Heads: Diffusion Models Beat GANs on | talking | -Face Generation |
| DisenEmo: Learning disentangled emotional representation from facial motion for 3D | talking | head generation |
| DisFlowEm: One-Shot Emotional | talking | Head Generation Using Disentangled Pose and Expression Flow-Guidance |
| Do-it-yourself photo realistic | talking | head creation system and method |
| Dual-modality | talking | -metrics: 3D Visual-Audio Integrated Behaviometric Cues from Speakers |
| DualTalk: Dual-Speaker Interaction for 3D | talking | Head Conversations |
| EAT-Face: Emotion-Controllable Audio-Driven | talking | Face Generation via Diffusion Model |
| Edtalk: Efficient Disentanglement for Emotional | talking | Head Synthesis |
| Efficient Emotional Adaptation for Audio-Driven | talking | -Head Generation |
| Efficient Region-Aware Neural Radiance Fields for High-Fidelity | talking | Portrait Synthesis |
| EMMN: Emotional Motion Memory Network for Audio-driven Emotional | talking | Face Generation |
| EmoGene: Audio-Driven Emotional 3D | talking | -Head Generation |
| Emotalk3d: High-fidelity Free-view Synthesis of Emotional 3d | talking | Head |
| Emotalker: Audio Driven Emotion Aware | talking | Head Generation |
| EmotiveTalk: Expressive | talking | Head Generation through Audio Information Decoupling and Emotional Video Diffusion |
| EmoVOCA: Speech-Driven Emotional 3D | talking | Heads |
| Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot | talking | Head Animation |
| Expression-aware neural radiance fields for high-fidelity | talking | portrait synthesis |
| Expressive | talking | Head Generation with Granular Audio-Visual Control |
| Expressive | talking | Head Video Encoding in StyleGAN2 Latent Space |
| Face Analysis for the Synthesis of Photo-Realistic | talking | Heads |
| FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse | talking | Faces from Disentangled Audio |
| Faces that Speak: Jointly Synthesising | talking | Face and Speech from Text |
| FACIAL: Synthesizing Dynamic | talking | Face with Implicit Attribute Learning |
| FakeTalkerDetect: Effective and Practical Realistic Neural | talking | Head Detection with a Highly Unbalanced Dataset |
| Fast Viseme Recognition for | talking | Head Application |
| FastTalker: Real-time audio-driven | talking | face generation with 3D Gaussian |
| Few-Shot Adversarial Learning of Realistic Neural | talking | Head Models |
| Flow-guided One-shot | talking | Face Generation with a High-resolution Audio-visual Dataset |
| FlowVQTalker: High-Quality Emotional | talking | Face Generation through Normalizing Flow and Quantization |
| Free-HeadGAN: Neural | talking | Head Synthesis With Explicit Gaze Control |
| FT2TF: First-Person Statement Text-to- | talking | Face Generation |
| Hierarchical Cross-Modal | talking | Face Generation With Dynamic Pixel-Wise Loss |
| Hierarchical Feature Warping and Blending for | talking | Head Animation |
| High-Fidelity and Freely Controllable | talking | Head Video Generation |
| High-Fidelity Generalized Emotional | talking | Face Generation with Multi-Modal Emotion Space Learning |
| Hypermask: | talking | Head Projected Onto Moving Surface |
| Identity-Preserving | talking | Face Generation with Landmark and Appearance Priors |
| iface: A 3d Synthetic | talking | Face |
| IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic | talking | Heads from Monocular Videos |
| Implicit Identity Representation Conditioned Memory Compensation Network for | talking | Head Video Generation |
| Implicit Memory-Based Variational Motion | talking | Face Generation |
| Impostures of | talking | Face Systems Using Automatic Face Animation |
| InsTaG: Learning Personalized 3D | talking | Head from Few-Second Video |
| IV2 Multimodal Biometric Database (Including Iris, 2D, 3D, Stereoscopic, and | talking | Face Data), and the IV2-2007 Evaluation Campaign, The |
| Joint Co-Speech Gesture and Expressive | talking | Face Generation Using Diffusion with Adapters |
| Keypoints and Action Units Jointly Drive | talking | Head Generation for Video Conferencing |
| LaughTalk: Expressive 3D | talking | Head Generation with Laughter |
| Learned Spatial Representations for Few-shot | talking | -Head Synthesis |
| Learning Dynamic Facial Radiance Fields for Few-Shot | talking | Head Synthesis |
| Learning Dynamic Tetrahedra for High-Quality | talking | Head Synthesis |
| Learning Landmarks Motion from Speech for Speaker-agnostic 3d | talking | Heads Generation |
| Learning to Recognise | talking | Faces |
| Leveraging Real | talking | Faces via Self-Supervision for Robust Forgery Detection |
| Lifelike | talking | faces for interactive services |
| LipFormer: High-fidelity and Generalizable | talking | Face Generation with A Pre-learned Facial Codebook |
| LipSync3D: Data-Efficient Learning of Personalized 3D | talking | Faces from Video using Pose and Lighting Normalization |
| Look who is not | talking | : Assessing engagement levels in panel conversations |
| Look who's | talking | : Speaker detection using video and audio correlation |
| Low-Latency Real-Time Audio-Driven | talking | Head Generation Based on Future Speech Feature Prediction |
| Mead: A Large-scale Audio-visual Dataset for Emotional | talking | -face Generation |
| Memories are One-to-Many Mapping Alleviators in | talking | Face Generation |
| MergeTalk: Audio-Driven | talking | Head Generation From Single Image With Feature Merge |
| MetaPortrait: Identity-Preserving | talking | Head Generation with Fast Personalized Adaptation |
| Modelling | talking | Head Behaviour |
| Monocular and Generalizable Gaussian | talking | Head Animation |
| Multimodal Emotional | talking | Face Generation Based on Action Units |
| Multimodal Inputs Driven | talking | Face Generation With Spatial-Temporal Dependency |
| Multimodal Learning for Temporally Coherent | talking | Face Generation With Articulator Synergy |
| new frame interpolation scheme for | talking | head sequences, A |
| Non-Invasive Approach for Driving Virtual | talking | Heads from Real Facial Movements, A |
| One-Shot Free-View Neural | talking | -Head Synthesis for Video Conferencing |
| One-Shot High-Fidelity | talking | -Head Synthesis with Deformable Neural Radiance Field |
| OSM-Net: One-to-Many One-Shot | talking | Head Generation With Spontaneous Head Motions |
| OTAvatar: One-Shot | talking | Face Avatar with Controllable Tri-Plane Rendering |
| Partial linear regression for speech-driven | talking | head application |
| Perceptually Accurate 3D | talking | Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics |
| Personatalk: Preserving Personalized Dynamic Speech Style In | talking | Face Generation |
| Photo realistic | talking | head creation system and method |
| Pose-Controllable | talking | Face Generation by Implicitly Modularized Audio-Visual Representation |
| Progressive Disentangled Representation Learning for Fine-Grained Controllable | talking | Head Synthesis |
| Quality Assessment for | talking | Head Videos via Multi-Modal Feature Representation |
| Reactive Memories: An Interactive | talking | -Head |
| Real-Time Neural Radiance | talking | Portrait Synthesis via Audio-Spatial Decomposition |
| Realistic head motion synthesis for an image-based | talking | head |
| Realistic | talking | Face Synthesis With Geometry-Aware Feature Transformation |
| Recognizing | talking | faces from acoustic Doppler reflections |
| ReliTalk: Relightable | talking | Portrait Generation from a Single Video |
| SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image | talking | Face Animation |
| Sample-based Synthesis of | talking | Heads |
| Scantalk: 3d | talking | Heads from Unregistered Scans |
| SD-NeRF: Towards Lifelike | talking | Head Animation via Spatially-Adaptive Dual-Driven NeRFs |
| Seeing What You Said: | talking | Face Generation Guided by a Lip Reading Expert |
| Shape and appearance models of | talking | faces for model-based tracking |
| Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based | talking | -Head Generation |
| Simulated natural movement of a computer-generated synthesized | talking | head |
| Speech Driven | talking | Face Generation From a Single Image and an Emotion Condition |
| Speech-Driven Expressive | talking | Lips with Conditional Sequential Generative Adversarial Networks |
| Stochastic Latent | talking | Face Generation Toward Emotional Expressions and Head Poses |
| Style Transfer for 2D | talking | Head Generation |
| StyleHEAT: One-Shot High-Resolution Editable | talking | Face Generation via Pre-trained StyleGAN |
| StyleTalk++: A Unified Framework for Controlling the Speaking Styles of | talking | Heads |
| Survey of | talking | Head Synthesis Techniques: Portrait Generation, Driving Mechanisms, and Editing, A |
| SyncDiff: Diffusion-Based | talking | Head Synthesis with Bottlenecked Temporal Visual Prior for Improved Synchronization |
| SyncTalk: The Devil is in the Synchronization for | talking | Head Synthesis |
| Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for | talking | Head Video Generation |
| Synthesizing a | talking | mouth |
| Synthesizing Photo-Realistic 3D | talking | Head: Learning Lip Synchronicity and Emotion from Audio and Video |
| Synthesizing | talking | Faces from Text and Audio: An Autoencoder and Sequence-to-Sequence Convolutional Neural Network |
| S^3d-nerf: Single-shot Speech-driven Neural Radiance Field for High Fidelity | talking | Head Synthesis |
| TalkCLIP: | talking | Head Generation with Text-Guided Expressive Speaking Styles |
| | talking | About 3D Scenes: Integration of Image and Speech Understanding in a Hybrid Distributed System |
| | talking | Cars, Doubtful Users: A Population Study in Virtual Reality |
| | talking | Detection in Collaborative Learning Environments |
| | talking | Face |
| | talking | Face Generation via Learning Semantic and Temporal Synchronous Landmarks |
| | talking | Face Generation with Multilingual TTS |
| | talking | Face: Using Facial Feature Detection and Image Transformations for Visual Speech |
| | talking | Faces: Technologies and Applications |
| | talking | Head Anime 4: Distillation for Real-Time Performance |
| | talking | Head Generation with Probabilistic Audio-to-Visual Diffusion Priors |
| | talking | Heads, Speech Driven Face Animation |
| | talking | Heads: Detecting Humans and Recognizing Their Interactions |
| | talking | Heads: Introducing the tool of 3D motion fields in the study of action |
| | talking | pictures: Temporal grouping and dialog-supervised person recognition |
| | talking | profile to distinguish identical twins, A |
| | talking | to Machines |
| | talking | With Hands 16.2M: A Large-Scale Dataset of Synchronized Body-Finger Motion and Audio for Conversational Motion Analysis and Synthesis |
| | talking | with signs A simple method to detect nouns and numbers in a non-annotated signs language corpus |
| | talking | With Your Hands: Scaling Hand Gestures and Recognition With CNNs |
| | talking | -head Generation with Rhythmic Head Motion |
| | talking | -Head Video Compression With Motion Semantic Enhancement Model |
| | talking | gaussian: Structure-persistent 3d Talking Head Synthesis via Gaussian Splatting |
| TaoAvatar: Real-Time Lifelike Full-Body | talking | Avatars for Augmented Reality via 3D Gaussian Splatting |
| THInImg: Cross-modal Steganography for Presenting | talking | Heads in Images |
| Thqa: A Perceptual Quality Assessment Database for | talking | Heads |
| Three-Dimensional Facial Adaptation for MPEG-4 | talking | Heads |
| Toward Fine-Grained | talking | Face Generation |
| Towards a low bandwidth | talking | face using appearance models |
| Towards Generating Ultra-High Resolution | talking | -Face Videos with Lip synchronization |
| Towards High-fidelity 3D | talking | Avatar with Personalized Dynamic Texture |
| Towards MOOCs for Lipreading: Using Synthetic | talking | Heads to Train Humans in Lipreading at Scale |
| Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven | talking | Portrait |
| VAST: Vivify Your | talking | Avatar via Zero-Shot Expressive Facial Style Transfer |
| Ventriloquist-Net: Leveraging Speech Cues for Emotive | talking | Head Generation |
| Viseme Classification for | talking | Head Application |
| VividTalk: One-Shot Audio-Driven | talking | Head Generation Based on 3D Hybrid Prior |
| Walking and | talking | : A bilinear approach to multi-label action recognition |
| Watching and | talking | : media content as social nexus |
| Wav2NeRF: Audio-driven realistic | talking | head generation via wavelet-based NeRF |
| What Are You | talking | About? Text-to-Image Coreference |
| You Said That?: Synthesising | talking | Faces from Audio |
185 for talking