Yeh, C.Y.,
Hwang, S.H.,
Efficient text analyser with prosody generator-driven approach for
Mandarin text-to-speech,
VISP(152), No. 6, December 2005, pp. 793-799.
DOI Link
0512
BibRef
Chouireb, F.[Fatima],
Guerti, M.[Mhania],
Towards a high quality Arabic speech synthesis system based on neural
networks and residual excited vocal tract model,
SIViP(2), No. 1, January 2008, pp. 73-87.
Springer DOI
0712
BibRef
Elfitri, I.,
Gunel, B.,
Kondoz, A.M.,
Multichannel Audio Coding Based on Analysis by Synthesis,
PIEEE(99), No. 4, April 2011, pp. 657-670.
IEEE DOI
1103
Part of 3-D display series.
BibRef
Jung, C.S.[Chi-Sang],
Joo, Y.S.[Young-Sun],
Kang, H.G.[Hong-Goo],
Waveform Interpolation-Based Speech Analysis/Synthesis for HMM-Based
TTS Systems,
SPLetters(19), No. 12, December 2012, pp. 809-812.
IEEE DOI
1212
BibRef
Carmona, J.L.,
Barker, J.,
Gomez, A.M.,
Ma, N.[Ning],
Speech Spectral Envelope Enhancement by HMM-Based Analysis/Resynthesis,
SPLetters(20), No. 6, 2013, pp. 563-566.
IEEE DOI speech enhancement
1307
BibRef
Tokuda, K.,
Nankaku, Y.,
Toda, T.,
Zen, H.,
Yamagishi, J.,
Oura, K.,
Speech Synthesis Based on Hidden Markov Models,
PIEEE(101), No. 5, May 2013, pp. 1234-1252.
IEEE DOI
1305
BibRef
Ling, Z.,
Kang, S.,
Zen, H.,
Senior, A.,
Schuster, M.,
Qian, X.,
Meng, H.,
Deng, L.,
Deep Learning for Acoustic Modeling in Parametric Speech Generation:
A systematic review of existing techniques and future trends,
SPMag(32), No. 3, May 2015, pp. 35-52.
IEEE DOI
1504
Acoustic signal detection
BibRef
Bordel, G.,
Penagarikano, M.,
Rodriguez-Fuentes, L.J.,
Alvarez, A.,
Varona, A.,
Probabilistic Kernels for Improved Text-to-Speech Alignment in Long
Audio Tracks,
SPLetters(23), No. 1, January 2016, pp. 126-129.
IEEE DOI
1601
Acoustics
BibRef
Ninh, D.K.[Duy Khanh],
Yamashita, Y.[Yoichi],
F0 Parameterization of Glottalized Tones in HMM-Based Speech Synthesis
for Hanoi Vietnamese,
IEICE(E98-D), No. 12, December 2015, pp. 2280-2289.
WWW Link.
1601
BibRef
Erro, D.,
Two-Band Radial Postfiltering in Cepstral Domain with Application to
Speech Synthesis,
SPLetters(23), No. 2, February 2016, pp. 202-206.
IEEE DOI
1602
filtering theory
BibRef
Hu, Y.J.,
Ling, Z.H.,
DBN-based Spectral Feature Representation for Statistical Parametric
Speech Synthesis,
SPLetters(23), No. 3, March 2016, pp. 321-325.
IEEE DOI
1603
belief networks
BibRef
Tsiaras, V.,
Maia, R.,
Diakoloukas, V.,
Stylianou, Y.,
Digalakis, V.,
Global Variance in Speech Synthesis With Linear Dynamical Models,
SPLetters(23), No. 8, August 2016, pp. 1057-1061.
IEEE DOI
1608
speech synthesis
BibRef
Wang, F.Z.[Fang-Zhou],
Nagano, H.[Hidehisa],
Kashino, K.[Kunio],
Igarashi, T.[Takeo],
Visualizing Video Sounds With Sound Word Animation to Enrich User
Experience,
MultMed(19), No. 2, February 2017, pp. 418-429.
IEEE DOI
1702
BibRef
Sharma, B.,
Prasanna, S.R.M.,
Enhancement of Spectral Tilt in Synthesized Speech,
SPLetters(24), No. 4, April 2017, pp. 382-386.
IEEE DOI
1704
speech enhancement
BibRef
Singh, R.[Rita],
Jiménez, A.[Abelino],
Øland, A.[Anders],
Voice disguise by mimicry: deriving statistical articulometric evidence
to evaluate claimed impersonation,
IET-Bio(6), No. 4, July 2017, pp. 282-289.
DOI Link
1707
BibRef
Lee, K.S.,
Restricted Boltzmann Machine-Based Voice Conversion for Nonparallel
Corpus,
SPLetters(24), No. 8, August 2017, pp. 1103-1107.
IEEE DOI
1708
Boltzmann machines, probability, speaker recognition,
OGI VOICES corpus,
conversion function, linear transformation,
parallel training corpus.
BibRef
Reddy, M.K.,
Rao, K.S.,
Robust Pitch Extraction Method for the HMM-Based Speech Synthesis
System,
SPLetters(24), No. 8, August 2017, pp. 1133-1137.
IEEE DOI
1708
feature extraction, hidden Markov models, speech synthesis,
wavelet transforms, CMU Arctic and Keele databases,
HMM-based speech synthesis system,
continuous wavelet transform coefficients,
hidden Markov model-based HTS, pitch estimation, pitch tracking,
robust pitch extraction method, speech representation
BibRef
Liu, Z.C.,
Ling, Z.H.,
Dai, L.R.,
Statistical Parametric Speech Synthesis Using Generalized
Distillation Framework,
SPLetters(25), No. 5, May 2018, pp. 695-699.
IEEE DOI
1805
Fourier transforms, acoustic signal processing,
learning (artificial intelligence), recurrent neural nets,
speech synthesis
BibRef
Drugman, T.,
Huybrechts, G.,
Klimkov, V.,
Moinet, A.,
Traditional Machine Learning for Pitch Detection,
SPLetters(25), No. 11, November 2018, pp. 1745-1749.
IEEE DOI
1811
acoustic signal processing, estimation theory,
feature extraction, learning (artificial intelligence),
speech synthesis
BibRef
Arik, S.Ö.,
Jun, H.,
Diamos, G.,
Fast Spectrogram Inversion Using Multi-Head Convolutional Neural
Networks,
SPLetters(26), No. 1, January 2019, pp. 94-98.
IEEE DOI
1901
audio signal processing, feedforward neural nets, interpolation,
iterative methods, learning (artificial intelligence),
speech synthesis
BibRef
Masuyama, Y.,
Yatabe, K.,
Oikawa, Y.,
Griffin-Lim Like Phase Recovery via Alternating Direction Method of
Multipliers,
SPLetters(26), No. 1, January 2019, pp. 184-188.
IEEE DOI
1901
acoustic signal processing, iterative methods, optimisation,
subjective test, objective measure, ADMM, signal recovery,
STFT-based speech synthesis
BibRef
Kwon, O.,
Jang, I.,
Ahn, C.,
Kang, H.,
An Effective Style Token Weight Control Technique for End-to-End
Emotional Speech Synthesis,
SPLetters(26), No. 9, September 2019, pp. 1383-1387.
IEEE DOI
1909
Speech synthesis, Spectrogram, Training, Decoding,
Acoustics, Vocoders, emotion weight values
BibRef
Liu, Q.,
Jackson, P.J.B.,
Wang, W.,
A Speech Synthesis Approach for High Quality Speech Separation and
Generation,
SPLetters(26), No. 12, December 2019, pp. 1872-1876.
IEEE DOI
2001
decoding, source separation, speech coding, speech synthesis,
time-domain analysis, time-domain samples,
high quality
BibRef
Cotescu, M.,
Drugman, T.,
Huybrechts, G.,
Lorenzo-Trueba, J.,
Moinet, A.,
Voice Conversion for Whispered Speech Synthesis,
SPLetters(27), 2020, pp. 186-190.
IEEE DOI
2002
Whispered speech conversion, voice conversion (VC),
whispered text to speech (TTS)
BibRef
Aylett, M.P.,
Vinciarelli, A.,
Wester, M.,
Speech Synthesis for the Generation of Artificial Personality,
AffCom(11), No. 2, April 2020, pp. 361-372.
IEEE DOI
2006
Speech, Speech synthesis, Robots, Psychology, Hidden Markov models,
Digital signal processing, Computational modeling, Personality,
automatic personality synthesis
BibRef
Rao, M.V.A.[M.V. Achuth],
Ghosh, P.K.[Prasanta Kumar],
SFNet: A Computationally Efficient Source Filter Model Based Neural
Speech Synthesis,
SPLetters(27), 2020, pp. 1170-1174.
IEEE DOI
2007
Neural vocoder, source-filter model, computational complexity, Mel-spectrum
BibRef
Zhou, Y.,
Tian, X.,
Li, H.,
Multi-Task WaveRNN With an Integrated Architecture for Cross-Lingual
Voice Conversion,
SPLetters(27), 2020, pp. 1310-1314.
IEEE DOI
2008
Vocoders, Pipelines, Generators, Linguistics, Acoustics, Training,
Task analysis, Cross-lingual voice conversion (xVC),
integrated architecture
BibRef
Yang, J.C.[Ji-Chen],
Lin, P.[Pei],
He, Q.H.[Qian-Hua],
Constant-Q magnitude-phase coefficients extraction for synthetic speech
detection,
IET-Bio(9), No. 5, September 2020, pp. 216-221.
DOI Link
2008
BibRef
Liu, R.,
Sisman, B.,
Bao, F.,
Gao, G.,
Li, H.,
Modeling Prosodic Phrasing With Multi-Task Learning in Tacotron-Based
TTS,
SPLetters(27), 2020, pp. 1470-1474.
IEEE DOI
2009
Task analysis, Generators, Training, Speech synthesis, Decoding,
Linguistics, Data models, Tacotron, multi-task learning, prosody
BibRef
Qi, J.,
Du, J.,
Siniscalchi, S.M.,
Ma, X.,
Lee, C.,
On Mean Absolute Error for Deep Neural Network Based Vector-to-Vector
Regression,
SPLetters(27), 2020, pp. 1485-1489.
IEEE DOI
2009
Upper bound, Speech enhancement, Additive noise, Complexity theory,
Laplace equations, Neural networks, Loss measurement,
vector-to-vector regression
BibRef
Yang, S.,
Wang, Y.,
Xie, L.,
Adversarial Feature Learning and Unsupervised Clustering Based Speech
Synthesis for Found Data With Acoustic and Textual Noise,
SPLetters(27), 2020, pp. 1730-1734.
IEEE DOI
1806
Noise measurement, Decoding, Speech synthesis, Speech recognition,
Training, Speech enhancement, Acoustics, Adversarial training,
speech synthesis
BibRef
Lee, J.Y.,
Cheon, S.J.,
Choi, B.J.,
Kim, N.S.,
Memory Attention: Robust Alignment Using Gating Mechanism for
End-to-End Speech Synthesis,
SPLetters(27), 2020, pp. 2004-2008.
IEEE DOI
2012
Logic gates, Speech synthesis, Decoding, Speech recognition,
Training, Computational modeling, Memory management,
memory attention
BibRef
Zhang, Y.[You],
Jiang, F.[Fei],
Duan, Z.Y.[Zhi-Yao],
One-Class Learning Towards Synthetic Voice Spoofing Detection,
SPLetters(28), 2021, pp. 937-941.
IEEE DOI
2106
Training, Signal processing algorithms, Speech synthesis,
Feature extraction, Cepstral analysis, Support vector machines,
speaker verification
BibRef
Saeki, T.[Takaaki],
Takamichi, S.[Shinnosuke],
Saruwatari, H.[Hiroshi],
Incremental Text-to-Speech Synthesis Using Pseudo Lookahead With
Large Pretrained Language Model,
SPLetters(28), 2021, pp. 857-861.
IEEE DOI
2106
Context modeling, Training, Tuning, Speech synthesis,
Predictive models, Linguistics, Decoding,
contextual embedding
BibRef
Comanducci, L.[Luca],
Bestagini, P.[Paolo],
Tagliasacchi, M.[Marco],
Sarti, A.[Augusto],
Tubaro, S.[Stefano],
Reconstructing Speech From CNN Embeddings,
SPLetters(28), 2021, pp. 952-956.
IEEE DOI
2106
Decoding, Task analysis, Spectrogram, Feature extraction,
Image reconstruction, Training, speech recognition
BibRef
Hua, G.[Guang],
Teoh, A.B.J.[Andrew Beng Jin],
Zhang, H.J.[Hai-Jian],
Towards End-to-End Synthetic Speech Detection,
SPLetters(28), 2021, pp. 1265-1269.
IEEE DOI
2107
Feature extraction, Speech synthesis, Training,
Mel frequency cepstral coefficient, Task analysis, Standards,
end-to-end
BibRef
Cheon, S.J.[Sung Jun],
Choi, B.J.[Byoung Jin],
Kim, M.[Minchan],
Lee, H.[Hyeonseung],
Kim, N.S.[Nam Soo],
A Controllable Multi-Lingual Multi-Speaker Multi-Style Text-to-Speech
Synthesis With Multivariate Information Minimization,
SPLetters(29), 2022, pp. 55-59.
IEEE DOI
2202
Training, Upper bound, Speech synthesis, Correlation,
Mutual information, Synthesizers, Estimation, Disentanglement,
total correlation
BibRef
Bilbao, S.[Stefan],
3D Interpolation in Wave-Based Acoustic Simulation,
SPLetters(29), 2022, pp. 384-388.
IEEE DOI
2202
Interpolation, Solid modeling,
Numerical models, Mathematical models, Time-domain analysis,
wave-based simulation
BibRef
Saleem, N.[Nasir],
Gao, J.[Jiechao],
Irfan, M.[Muhammad],
Verdu, E.[Elena],
Fuente, J.P.[Javier Parra],
E2E-V2SResNet: Deep residual convolutional neural networks for
end-to-end video driven speech synthesis,
IVC(119), 2022, pp. 104389.
Elsevier DOI
2202
Video processing, E2E speech synthesis, ResNet-18, Residual CNN, Waveform CRITIC
BibRef
Sun, X.[Xiao],
Li, J.Y.[Jing-Yuan],
Tao, J.H.[Jian-Hua],
Emotional Conversation Generation Orientated Syntactically
Constrained Bidirectional-Asynchronous Framework,
AffCom(13), No. 1, January 2022, pp. 187-198.
IEEE DOI
2203
Decoding, Dictionaries, Syntactics, Sun, Computational modeling,
Detectors, Indexes, Emotional conversation generation,
affective computing
BibRef
Liu, S.G.[Shi-Guang],
Li, S.[Sijia],
Cheng, H.[Haonan],
Towards an End-to-End Visual-to-Raw-Audio Generation With GAN,
CirSysVideo(32), No. 3, March 2022, pp. 1299-1312.
IEEE DOI
2203
Videos, Visualization, Task analysis, Computational modeling,
Animation, Acoustics, Synchronization, Visual to audio, cross media,
audio-visual synchronization
BibRef
Li, C.T.[Chang-Tao],
Yang, F.[Feiran],
Yang, J.[Jun],
The Role of Long-Term Dependency in Synthetic Speech Detection,
SPLetters(29), 2022, pp. 1142-1146.
IEEE DOI
2205
Transformers, Convolution, Feature extraction, Training,
Speech synthesis, Voice activity detection,
voice anti-spoofing
BibRef
Cui, S.S.[San-Shuai],
Huang, B.Y.[Bing-Yuan],
Huang, J.W.[Ji-Wu],
Kang, X.G.[Xian-Gui],
Synthetic Speech Detection Based on Local Autoregression and Variance
Statistics,
SPLetters(29), 2022, pp. 1462-1466.
IEEE DOI
2207
Feature extraction, Speech synthesis, Standards, Filtering,
Forensics, Mel frequency cepstral coefficient, Windows,
variance statistics
BibRef
Lei, Y.[Yi],
Yang, S.[Shan],
Zhu, X.F.[Xin-Fa],
Xie, L.[Lei],
Su, D.[Dan],
Cross-Speaker Emotion Transfer Through Information Perturbation in
Emotional Speech Synthesis,
SPLetters(29), 2022, pp. 1948-1952.
IEEE DOI
2209
Timbre, Spectrogram, Perturbation methods, Generators,
Speech synthesis, Adaptation models, Acoustics, speech synthesis
BibRef
Choi, B.J.[Byoung Jin],
Jeong, M.[Myeonghun],
Lee, J.Y.[Joun Yeop],
Kim, N.S.[Nam Soo],
SNAC: Speaker-Normalized Affine Coupling Layer in Flow-Based
Architecture for Zero-Shot Multi-Speaker Text-to-Speech,
SPLetters(29), 2022, pp. 2502-2506.
IEEE DOI
2212
Hidden Markov models, Couplings, Training, Adaptation models,
Jacobian matrices, Standards, Predictive models,
zero-shot multi-speaker text-to-speech
BibRef
Choi, B.J.[Byoung Jin],
Jeong, M.[Myeonghun],
Kim, M.[Minchan],
Kim, N.S.[Nam Soo],
Variable-Length Speaker Conditioning in Flow-Based Text-to-Speech,
SPLetters(31), 2024, pp. 899-903.
IEEE DOI
2404
Training, Vectors, Couplings, Adaptation models,
Computer architecture, Speech enhancement, Codes, Speech synthesis,
zero-shot multi-speaker text-to-speech
BibRef
Chen, L.C.[Li-Chin],
Chen, P.H.[Po-Hsun],
Tsai, R.T.H.[Richard Tzong-Han],
Tsao, Y.[Yu],
EPG2S: Speech Generation and Speech Enhancement Based on
Electropalatography and Audio Signals Using Multimodal Learning,
SPLetters(29), 2022, pp. 2582-2586.
IEEE DOI
2301
Speech enhancement, Feature extraction, Noise measurement,
Spectrogram, Tongue, Decoding, Loss measurement, Speech synthesis,
speech generation
BibRef
Zhou, K.[Kun],
Sisman, B.[Berrak],
Rana, R.[Rajib],
Schuller, B.W.[Björn W.],
Li, H.Z.[Hai-Zhou],
Emotion Intensity and its Control for Emotional Voice Conversion,
AffCom(14), No. 1, January 2023, pp. 31-48.
IEEE DOI
2303
Speech recognition, Emotion recognition, Training,
Speech synthesis, Hidden Markov models, Computational modeling,
relative attribute
BibRef
Huang, B.[Bingyuan],
Cui, S.[Sanshuai],
Huang, J.W.[Ji-Wu],
Kang, X.[Xiangui],
Discriminative Frequency Information Learning for End-to-End Speech
Anti-Spoofing,
SPLetters(30), 2023, pp. 185-189.
IEEE DOI
2303
Band-pass filters, Convolution, Task analysis, Cutoff frequency,
Speech processing, Computational modeling, Robustness, ASVspoof,
speech anti-spoofing
BibRef
Zhao, W.[Wei],
Wang, Z.[Zuyi],
Xu, L.[Li],
Mandarin Text-to-Speech Front-End With Lightweight Distilled
Convolution Network,
SPLetters(30), 2023, pp. 249-253.
IEEE DOI
2303
Convolution, Bit error rate, Task analysis, Kernel,
Knowledge engineering, Training, Electrical engineering, convolution network
BibRef
Ma, K.[Kaijie],
Feng, Y.[Yifan],
Chen, B.[Beijing],
Zhao, G.Y.[Guo-Ying],
End-to-End Dual-Branch Network Towards Synthetic Speech Detection,
SPLetters(30), 2023, pp. 359-363.
IEEE DOI
2305
Forgery, Feature extraction, Finite element analysis, Training,
Speech synthesis, Task analysis, Multitasking, ASVspoof 2019 LA,
synthetic speech detection
BibRef
Mira, R.[Rodrigo],
Vougioukas, K.[Konstantinos],
Ma, P.C.[Ping-Chuan],
Petridis, S.[Stavros],
Schuller, B.W.[Björn W.],
Pantic, M.[Maja],
End-to-End Video-to-Speech Synthesis Using Generative Adversarial
Networks,
Cyber(53), No. 6, June 2023, pp. 3454-3466.
IEEE DOI
2305
Hidden Markov models, Task analysis, Spectrogram, Visualization,
Speech recognition, Predictive models, Feature extraction,
video-to-speech
BibRef
Yoon, H.C.[Hyung-Chan],
Kim, C.[Changhwan],
Um, S.[Seyun],
Yoon, H.W.[Hyun-Wook],
Kang, H.G.[Hong-Goo],
SC-CNN: Effective Speaker Conditioning Method for Zero-Shot
Multi-Speaker Text-to-Speech Systems,
SPLetters(30), 2023, pp. 593-597.
IEEE DOI
2306
Kernel, Convolution, Convolutional neural networks, Training,
Task analysis, Predictive models, Phonetics, Generalization, style transfer
BibRef
Gu, Y.W.[Ye-Wei],
Zhao, X.F.[Xian-Feng],
Yi, X.W.[Xiao-Wei],
Xiao, J.C.[Jun-Chao],
Voice Conversion Using Learnable Similarity-guided Masked Autoencoder,
IWDW22(53-67).
Springer DOI
2307
BibRef
Zhang, M.Y.[Ming-Yang],
Zhou, X.[Xuehao],
Wu, Z.Z.[Zhi-Zheng],
Li, H.Z.[Hai-Zhou],
Towards Zero-Shot Multi-Speaker Multi-Accent Text-to-Speech Synthesis,
SPLetters(30), 2023, pp. 947-951.
IEEE DOI
2308
Adaptation models, Decoding, Training, Data models, Speech synthesis,
Predictive models, Standards, Accent speech synthesis, text-to-speech
BibRef
Ly, E.[Edward],
Villegas, J.[Julián],
Cartesian Genetic Programming Parameterization in the Context of
Audio Synthesis,
SPLetters(30), 2023, pp. 1077-1081.
IEEE DOI
2309
BibRef
Mingote, V.[Victoria],
Gimeno, P.[Pablo],
Vicente, L.[Luis],
Khurana, S.[Sameer],
Laurent, A.[Antoine],
Duret, J.[Jarod],
Direct Text to Speech Translation System Using Acoustic Units,
SPLetters(30), 2023, pp. 1262-1266.
IEEE DOI
2310
BibRef
Wang, Z.C.[Zhi-Chao],
Chen, Y.Z.[Yuan-Zhe],
Xie, L.[Lei],
Tian, Q.[Qiao],
Wang, Y.P.[Yu-Ping],
LM-VC: Zero-Shot Voice Conversion via Speech Generation Based on
Language Models,
SPLetters(30), 2023, pp. 1157-1161.
IEEE DOI
2310
BibRef
van Niekerk, B.[Benjamin],
Carbonneau, M.A.[Marc-André],
Kamper, H.[Herman],
Rhythm Modeling for Voice Conversion,
SPLetters(30), 2023, pp. 1297-1301.
IEEE DOI
2310
BibRef
Zhou, K.[Kun],
Sisman, B.[Berrak],
Rana, R.[Rajib],
Schuller, B.W.[Björn W.],
Li, H.Z.[Hai-Zhou],
Speech Synthesis With Mixed Emotions,
AffCom(14), No. 4, October 2023, pp. 3120-3134.
IEEE DOI
2312
BibRef
Liu, Y.[Yan],
Wei, L.F.[Li-Fang],
Qian, X.Y.[Xin-Yuan],
Zhang, T.H.[Tian-Hao],
Chen, S.L.[Song-Lu],
Yin, X.C.[Xu-Cheng],
M3TTS: Multi-modal text-to-speech of multi-scale style control for
dubbing,
PRL(179), 2024, pp. 158-164.
Elsevier DOI
2403
Multi-modal text-to-speech, Memory network,
Expressive speech synthesis, Multi-scale style transfer
BibRef
Jeong, M.[Myeonghun],
Kim, M.[Minchan],
Lee, J.Y.[Joun Yeop],
Kim, N.S.[Nam Soo],
Efficient Parallel Audio Generation Using Group Masked Language
Modeling,
SPLetters(31), 2024, pp. 979-983.
IEEE DOI
2404
Acoustics, Iterative decoding, Computational modeling, Semantics,
Tokenization, Training, Iterative methods,
neural audio codec
BibRef
Yi, J.Y.[Jiang-Yan],
Wang, C.L.[Cheng-Long],
Tao, J.H.[Jian-Hua],
Zhang, C.Y.[Chu Yuan],
Fan, C.H.[Cun-Hang],
Tian, Z.K.[Zheng-Kun],
Ma, H.X.[Hao-Xin],
Fu, R.[Ruibo],
SceneFake:
An initial dataset and benchmarks for scene fake audio detection,
PR(152), 2024, pp. 110468.
Elsevier DOI Code:
WWW Link.
2405
Scene manipulation, Fake audio detection, Speech enhancement, SceneFake dataset
BibRef
Tan, X.[Xu],
Chen, J.W.[Jia-Wei],
Liu, H.[Haohe],
Cong, J.[Jian],
Zhang, C.[Chen],
Liu, Y.Q.[Yan-Qing],
Wang, X.[Xi],
Leng, Y.[Yichong],
Yi, Y.H.[Yuan-Hao],
He, L.[Lei],
Zhao, S.[Sheng],
Qin, T.[Tao],
Soong, F.[Frank],
Liu, T.Y.[Tie-Yan],
NaturalSpeech:
End-to-End Text-to-Speech Synthesis With Human-Level Quality,
PAMI(46), No. 6, June 2024, pp. 4234-4245.
IEEE DOI
2405
Recording, Vocoders, Decoding, Semiconductor device modeling, Guidelines,
Upper bound, Training, Text-to-speech, speech synthesis, end-to-end
BibRef
Zhou, J.[Jian],
Li, Y.[Yong],
Fan, C.H.[Cun-Hang],
Tao, L.[Liang],
Kwan, H.K.[Hon Keung],
Multi-Level Information Aggregation Based Graph Attention Networks
Towards Fake Speech Detection,
SPLetters(31), 2024, pp. 1580-1584.
IEEE DOI
2406
Voice activity detection, Representation learning, Databases,
Benchmark testing, Feature extraction, Boosting,
multi-level information aggregation
BibRef
Cao, D.Y.[Dan-Yang],
Zhang, Z.Y.[Ze-Yi],
Zhang, J.[Jinyuan],
NeuralVC: Any-to-Any Voice Conversion Using Neural Networks Decoder
for Real-Time Voice Conversion,
SPLetters(31), 2024, pp. 2070-2074.
IEEE DOI
2408
Decoding, Feature extraction, Training, Real-time systems,
Speech synthesis, Speech recognition, Speech enhancement, voice conversion
BibRef
Valin, J.M.[Jean-Marc],
Mustafa, A.[Ahmed],
Büthe, J.[Jan],
Very Low Complexity Speech Synthesis Using Framewise Autoregressive
GAN (FARGAN) With Pitch Prediction,
SPLetters(31), 2024, pp. 2115-2119.
IEEE DOI
2409
Vocoders, Training, Complexity theory, Predictive models,
Computational modeling, Logic gates, DDSP
BibRef
Xue, J.[Jun],
Fan, C.[Cunhang],
Yi, J.[Jiangyan],
Zhou, J.[Jian],
Lv, Z.[Zhao],
Dynamic Ensemble Teacher-Student Distillation Framework for
Light-Weight Fake Audio Detection,
SPLetters(31), 2024, pp. 2305-2309.
IEEE DOI
2410
Training, Predictive models, Feature extraction,
Computational modeling, Performance evaluation, Neural networks,
fake audio detection
BibRef
Cheng, X.Y.[Xiang-Yu],
Wang, Y.F.[Yao-Fei],
Liu, C.[Chang],
Hu, D.H.[Dong-Hui],
Su, Z.[Zhaopin],
HiFi-GANw: Watermarked Speech Synthesis via Fine-Tuning of HiFi-GAN,
SPLetters(31), 2024, pp. 2440-2444.
IEEE DOI
2410
Watermarking, Speech synthesis, Vocoders, Spectrogram, Generators,
Acoustics, Generative adversarial networks, Audio watermarking,
speech synthesis
BibRef
Zhang, Y.M.[Yi-Ming],
Du, R.[Ruoyi],
Tan, Z.H.[Zheng-Hua],
Wang, W.W.[Wen-Wu],
Ma, Z.Y.[Zhan-Yu],
Generating Accurate and Diverse Audio Captions Through Variational
Autoencoder Framework,
SPLetters(31), 2024, pp. 2520-2524.
IEEE DOI
2410
Measurement, Decoding, Maximum likelihood estimation, Semantics,
Mathematical models, Training, Diversity reception,
variational autoencoder
BibRef
Huang, W.C.[Wen-Chin],
Wu, Y.C.[Yi-Chiao],
Toda, T.[Tomoki],
Multi-Speaker Text-to-Speech Training With Speaker Anonymized Data,
SPLetters(31), 2024, pp. 2995-2999.
IEEE DOI
2411
Data privacy, Information integrity, Information filtering,
Data models, Training, Training data, Measurement, Acoustics, text-to-speech
BibRef
Lee, J.[Jaeuk],
Shin, Y.[Yoonsoo],
Chang, J.H.[Joon-Hyuk],
Differentiable Duration Refinement Using Internal Division for
Non-Autoregressive Text-to-Speech,
SPLetters(31), 2024, pp. 3154-3158.
IEEE DOI
2411
Hidden Markov models, Spectrogram, Training, Accuracy, Transformers,
Predictive models, Decoding, Text to speech, Reactive power, text-to-speech
BibRef
Cuccovillo, L.[Luca],
Gerhardt, M.[Milica],
Aichroth, P.[Patrick],
Audio Transformer for Synthetic Speech Detection via Multi-Formant
Analysis,
WMF24(4409-4417)
IEEE DOI
2410
Phonetics, Transformers, Multitasking, Trajectory,
Synthetic Speech Detection, Audio Transformer
BibRef
Cong, G.X.[Gao-Xiang],
Li, L.[Liang],
Qi, Y.[Yuankai],
Zha, Z.J.[Zheng-Jun],
Wu, Q.[Qi],
Wang, W.Y.[Wen-Yu],
Jiang, B.[Bin],
Yang, M.H.[Ming-Hsuan],
Huang, Q.M.[Qing-Ming],
Learning to Dub Movies via Hierarchical Prosody Models,
CVPR23(14687-14697)
IEEE DOI
2309
BibRef
Hsu, W.N.[Wei-Ning],
Remez, T.[Tal],
Shi, B.[Bowen],
Donley, J.[Jacob],
Adi, Y.[Yossi],
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for
Universal and Generalized Speech Regeneration,
CVPR23(18796-18806)
IEEE DOI
2309
BibRef
Sun, C.Z.[Cheng-Zhe],
Jia, S.[Shan],
Hou, S.W.[Shu-Wei],
Lyu, S.W.[Si-Wei],
AI-Synthesized Voice Detection Using Neural Vocoder Artifacts,
WMF23(904-912)
IEEE DOI
2309
BibRef
Noufi, C.[Camille],
May, L.[Lloyd],
Berger, J.[Jonathan],
The Role of Vocal Persona in Natural and Synthesized Speech,
FG23(1-4)
IEEE DOI
2303
Human computer interaction, Face recognition, Speech recognition,
Gesture recognition, Ecology, Interviews
BibRef
Hwang, I.S.[In-Sun],
Lee, S.H.[Sang-Hoon],
Lee, S.W.[Seong-Whan],
StyleVC: Non-Parallel Voice Conversion with Adversarial Style
Generalization,
ICPR22(23-30)
IEEE DOI
2212
Training, Feature extraction, Decoding
BibRef
Wang, W.B.[Wen-Bin],
Song, Y.[Yang],
Jha, S.[Sanjay],
Autolv: Automatic Lecture Video Generator,
ICIP22(1086-1090)
IEEE DOI
2211
Measurement, Adaptation models, Synthesizers, Generators,
Speech synthesis, speech synthesis, talking-head generation, e-learning
BibRef
Borzì, S.[Stefano],
Giudice, O.[Oliver],
Stanco, F.[Filippo],
Allegra, D.[Dario],
Is synthetic voice detection research going into the right direction?,
WMF22(71-80)
IEEE DOI
2210
Deep learning, Training, Image color analysis, Forensics,
Conferences, Bit rate
BibRef
Hassid, M.[Michael],
Ramanovich, M.T.[Michelle Tadmor],
Shillingford, B.[Brendan],
Wang, M.[Miaosen],
Jia, Y.[Ye],
Remez, T.[Tal],
More than Words: In-the-Wild Visually-Driven Prosody for
Text-to-Speech,
CVPR22(10577-10587)
IEEE DOI
2210
Machine learning, Benchmark testing, Robustness,
Synchronization, Vision + X,
Vision + language
BibRef
Kwak, I.Y.[Il-Youp],
Kwag, S.[Sungsu],
Lee, J.[Junhee],
Huh, J.H.[Jun Ho],
Lee, C.H.[Choong-Hoon],
Jeon, Y.B.[Young-Bae],
Hwang, J.H.[Jeong-Hwan],
Yoon, J.W.[Ji Won],
ResMax: Detecting Voice Spoofing Attacks with Residual Network and
Max Feature Map,
ICPR21(4837-4844)
IEEE DOI
2105
Deep learning, Error analysis, Transforms, Feature extraction,
Complexity theory, Residual neural networks,
voice presentation attack detection
BibRef
Wang, D.H.[Dong-Hua],
Wang, R.[Rangding],
Dong, L.[Li],
Yan, D.[Diqun],
Ren, Y.M.[Yi-Ming],
Efficient Generation of Speech Adversarial Examples with Generative
Model,
IWDW20(251-264).
Springer DOI
2103
BibRef
Zhou, H.,
Liu, Z.,
Xu, X.,
Luo, P.,
Wang, X.,
Vision-Infused Deep Audio Inpainting,
ICCV19(283-292)
IEEE DOI
2004
Code, Inpainting.
WWW Link. audio signal processing, audio-visual systems, image restoration,
image segmentation, multimodality perception,
BibRef
Bailer, W.[Werner],
Wijnants, M.[Maarten],
Lievens, H.[Hendrik],
Claes, S.[Sandy],
Multimedia Analytics Challenges and Opportunities for Creating
Interactive Radio Content,
MMMod20(II:375-387).
Springer DOI
2003
BibRef
Huang, T.[Ting],
Wang, H.X.[Hong-Xia],
Chen, Y.[Yi],
He, P.S.[Pei-Song],
GRU-SVM Model for Synthetic Speech Detection,
IWDW19(115-125).
Springer DOI
2003
BibRef
Wong, A.,
Xu, A.,
Dudek, G.,
Investigating Trust Factors in Human-Robot Shared Control:
Implicit Gender Bias Around Robot Voice,
CRV19(195-200)
IEEE DOI
1908
Robots, Measurement, Drones,
Graphical user interfaces, Uncertainty, Psychology, trust,
gender bias
BibRef
Xiao, L.,
Wang, Z.,
Dense Convolutional Recurrent Neural Network for Generalized Speech
Animation,
ICPR18(633-638)
IEEE DOI
1812
Feature extraction, Animation, Acoustics, Decoding, Visualization,
Logic gates, Hidden Markov models
BibRef
Shah, N.J.[Nirmesh J.],
Patil, H.A.[Hemant A.],
Analysis of Features and Metrics for Alignment in Text-Dependent Voice
Conversion,
PReMI17(299-307).
Springer DOI
1711
BibRef
Rybárová, R.,
Drozd, I.,
Rozinaj, G.,
GUI for interactive speech synthesis,
WSSIP16(1-4)
IEEE DOI
1608
XML
BibRef
Coto-Jiménez, M.[Marvin],
Goddard-Close, J.[John],
LSTM Deep Neural Networks Postfiltering for Improving the Quality of
Synthetic Voices,
MCPR16(280-289).
Springer DOI
1608
BibRef
Vasek, M.,
Rozinaj, G.,
Rybárová, R.,
Letter-To-Sound conversion for speech synthesizer,
WSSIP16(1-4)
IEEE DOI
1608
speech processing
BibRef
Rybárová, R.,
del Corral, G.,
Rozinaj, G.,
Diphone spanish text-to-speech synthesizer,
WSSIP15(121-124)
IEEE DOI
1603
natural language processing
BibRef
Verma, R.,
Sarkar, P.,
Rao, K.S.,
Conversion of neutral speech to storytelling style speech,
ICAPR15(1-6)
IEEE DOI
1511
natural language processing
BibRef
Narendra, N.P.,
Rao, K.S.[K. Sreenivasa],
Optimal residual frame based source modeling for HMM-based speech
synthesis,
ICAPR15(1-5)
IEEE DOI
1511
decision trees
BibRef
Wang, Y.[Yang],
Tao, J.H.[Jian-Hua],
Yang, M.H.[Ming-Hao],
Li, Y.[Ya],
Extended Decision Tree with or Relationship for HMM-Based Speech
Synthesis,
ACPR13(225-229)
IEEE DOI
1408
decision trees
BibRef
Gao, L.[Lu],
Yu, H.Z.[Hong-Zhi],
Zhang, J.H.[Jins-Huang],
Fang, H.P.[Hua-Ping],
Research on HMM-based speech synthesis for Lhasa dialect,
IASP11(429-433).
IEEE DOI
1112
BibRef
Chakraborty, R.[Rupayan],
Garain, U.[Utpal],
Role of Synthetically Generated Samples on Speech Recognition in a
Resource-Scarce Language,
ICPR10(1618-1621).
IEEE DOI
1008
BibRef
Rao, K.S.[K. Sreenivasa],
Maity, S.[Sudhamay],
Taru, A.[Amol],
Koolagudi, S.G.[Shashidhar G.],
Unit Selection Using Linguistic, Prosodic and Spectral Distance for
Developing Text-to-Speech System in Hindi,
PReMI09(531-536).
Springer DOI
0912
BibRef
Bahrampour, A.[Anvar],
Barkhoda, W.[Wafa],
Azami, B.Z.[Bahram Zahir],
Implementation of Three Text to Speech Systems for Kurdish Language,
CIARP09(321-328).
Springer DOI
0911
BibRef
Shirbahadurkar, S.D.,
Bormane, D.S.,
Marathi Language Speech Synthesizer Using Concatenative Synthesis
Strategy (Spoken in Maharashtra, India),
ICMV09(181-185).
IEEE DOI
0912
BibRef
Tučková, J.[Jana],
Holub, J.[Jan],
Duběda, T.[Tomáš],
Technical and Phonetic Aspects of Speech Quality Assessment:
The Case of Prosody Synthesis,
COST08(126-132).
Springer DOI
0810
BibRef
Bauer, D.[Dominik],
Kannampuzha, J.[Jim],
Kröger, B.J.[Bernd J.],
Articulatory Speech Re-synthesis:
Profiting from Natural Acoustic Speech Data,
COST08(344-355).
Springer DOI
0810
BibRef
Gu, H.Y.[Hung-Yan],
Cai, C.L.[Chen-Lin],
Cai, S.F.[Song-Fong],
An HNM-Based Speaker-Nonspecific Timbre Transformation Scheme for
Speech Synthesis,
CISP09(1-5).
IEEE DOI
0910
BibRef
Chapter on New Unsorted Entries, and Other Miscellaneous Papers continues in
Speaker Verification, Speaker Identification.