26.1.13.2 Speech Recognition, Neural Networks, CNN

Speech. Neural Networks.

Wu, J.X.[Jian-Xiong], Chan, C.[Chorkin],
Isolated word recognition by neural network models with cross-correlation coefficients for speech dynamics,
PAMI(15), No. 11, November 1993, pp. 1174-1185.
IEEE DOI 0401
BibRef

Chen, W.Y.[Wen-Yuan], Liao, Y.F.[Yuan-Fu], Chen, S.H.[Sin-Horng],
Speech recognition with hierarchical recurrent neural networks,
PR(28), No. 6, June 1995, pp. 795-805.
Elsevier DOI 0401
BibRef

Lee, T.[Tan], Ching, P.C., Chan, L.W.[Lai-Wan],
Isolated word recognition using modular recurrent neural networks,
PR(31), No. 6, June 1998, pp. 751-760.
Elsevier DOI 0401
BibRef

Stavrakoudis, D.G., Theocharis, J.B.,
Pipelined Recurrent Fuzzy Neural Networks for Nonlinear Adaptive Speech Prediction,
SMC-B(37), No. 5, October 2007, pp. 1305-1320.
IEEE DOI 0711
BibRef

Kay, S.,
A New Approach to Fourier Synthesis With Application to Neural Encoding and Speech Classification,
SPLetters(17), No. 10, October 2010, pp. 855-858.
IEEE DOI 1008
BibRef

Kay, S.,
A New Proof of the Neyman-Pearson Theorem Using the EEF and the Vindication of Sir R. Fisher,
SPLetters(19), No. 8, August 2012, pp. 451-454.
IEEE DOI 1208
BibRef

Scanzio, S.[Stefano], Cumani, S.[Sandro], Gemello, R.[Roberto], Mana, F.[Franco], Laface, P.,
Parallel implementation of Artificial Neural Network training for speech recognition,
PRL(31), No. 11, 1 August 2010, pp. 1302-1309.
Elsevier DOI 1008
Artificial Neural Network; Block Back-propagation; Focused Attention Back-Propagation; GPU; CUDA; Fast Training BibRef

Siniscalchi, S.M., Yu, D.[Dong], Deng, L.[Li], Lee, C.H.[Chin-Hui],
Speech Recognition Using Long-Span Temporal Patterns in a Deep Network Model,
SPLetters(20), No. 3, March 2013, pp. 201-204.
IEEE DOI 1303
BibRef

Hutchinson, B.[Brian], Deng, L.[Li], Yu, D.[Dong],
Tensor Deep Stacking Networks,
PAMI(35), No. 8, 2013, pp. 1944-1957.
IEEE DOI 1307
Closed-form solutions; Deep learning; handwriting image classification; BibRef

Bengio, Y.[Yoshua], Courville, A.[Aaron], Vincent, P.[Pascal],
Representation Learning: A Review and New Perspectives,
PAMI(35), No. 8, 2013, pp. 1798-1828.
IEEE DOI Survey, Learning. 1307
Neural networks; Speech recognition; Boltzmann machine; Deep learning; representation learning; unsupervised learning BibRef

Swietojanski, P., Ghoshal, A., Renals, S.,
Convolutional Neural Networks for Distant Speech Recognition,
SPLetters(21), No. 9, September 2014, pp. 1120-1124.
IEEE DOI 1406
Acoustics BibRef

Espi, M.[Miquel], Fujimoto, M.[Masakiyo], Nakatani, T.[Tomohiro],
Acoustic Event Detection in Speech Overlapping Scenarios Based on High-Resolution Spectral Input and Deep Learning,
IEICE(E98-D), No. 10, October 2015, pp. 1799-1807.
WWW Link. 1511
BibRef

Richardson, F., Reynolds, D., Dehak, N.,
Deep Neural Network Approaches to Speaker and Language Recognition,
SPLetters(22), No. 10, October 2015, pp. 1671-1675.
IEEE DOI 1506
feature extraction BibRef

Trentin, E.[Edmondo],
Maximum-likelihood normalization of features increases the robustness of neural-based spoken human-computer interaction,
PRL(66), No. 1, 2015, pp. 71-80.
Elsevier DOI 1511
Feature normalization BibRef

Lee, H.Y., Cho, J.W., Kim, M., Park, H.M.,
DNN-Based Feature Enhancement Using DOA-Constrained ICA for Robust Speech Recognition,
SPLetters(23), No. 8, August 2016, pp. 1091-1095.
IEEE DOI 1608
direction-of-arrival estimation BibRef

Sangeetha, J., Jothilakshmi, S.,
Automatic continuous speech recogniser for Dravidian languages using the auto associative neural network,
IJCVR(6), No. 1-2, 2016, pp. 113-126.
DOI Link 1601
BibRef

Fredes, J., Novoa, J., King, S., Stern, R.M., Yoma, N.B.,
Locally Normalized Filter Banks Applied to Deep Neural-Network-Based Robust Speech Recognition,
SPLetters(24), No. 4, April 2017, pp. 377-381.
IEEE DOI 1704
cepstral analysis BibRef

Shahnawazuddin, S., Sinha, R., Pradhan, G.,
Pitch-Normalized Acoustic Features for Robust Children's Speech Recognition,
SPLetters(24), No. 8, August 2017, pp. 1128-1132.
IEEE DOI 1708
feature extraction, spectral analysis, speech recognition, time-frequency analysis, SMAC features, adaptive-cepstral truncation, additive noise, spectral smoothening approach, Additive noise, Hidden Markov models, Mel frequency cepstral coefficient, Robustness, Speech, Automatic speech recognition (ASR) BibRef

Gosztolya, G.[Gábor], Tóth, L.[László],
DNN-Based Feature Extraction for Conflict Intensity Estimation From Speech,
SPLetters(24), No. 12, December 2017, pp. 1837-1841.
IEEE DOI 1712
estimation theory, feature extraction, greedy algorithms, neural nets, speech processing BibRef

Gosztolya, G.[Gábor], Bánhalmi, A.[András], Tóth, L.[László],
Using One-Class Classification Techniques in the Anti-phoneme Problem,
IbPRIA09(433-440).
Springer DOI 0906
BibRef

Kim, M.[Minkyoung], Kim, H.[Harksoo],
Integrated neural network model for identifying speech acts, predicators, and sentiments of dialogue utterances,
PRL(101), No. 1, 2018, pp. 1-5.
Elsevier DOI 1801
Integrated intention identification model BibRef

Affonso, E.T., Rosa, R.L., Rodríguez, D.Z.,
Speech Quality Assessment Over Lossy Transmission Channels Using Deep Belief Networks,
SPLetters(25), No. 1, January 2018, pp. 70-74.
IEEE DOI 1801
IP networks, belief networks, feature extraction, radial basis function networks, speech coding, speech processing, speech quality assessment BibRef

Kim, H.G., Lee, H., Kim, G., Oh, S.H., Lee, S.Y.,
Rescoring of N-Best Hypotheses Using Top-Down Selective Attention for Automatic Speech Recognition,
SPLetters(25), No. 2, February 2018, pp. 199-203.
IEEE DOI 1802
neural nets, speech recognition, Aurora4 speech recognition tasks, top-down selective attention BibRef

Kaushik, L., Sangwan, A., Hansen, J.H.L.,
Speech Activity Detection in Naturalistic Audio Environments: Fearless Steps Apollo Corpus,
SPLetters(25), No. 9, September 2018, pp. 1290-1294.
IEEE DOI 1809
acoustic noise, acoustic signal detection, audio recording, feedforward neural nets, learning (artificial intelligence), speech activity detection (SAD) BibRef

Heracleous, P.[Panikos], Even, J.[Jani], Sugaya, F.[Fumiaki], Hashimoto, M.[Masayuki], Yoneyama, A.[Akio],
Exploiting alternative acoustic sensors for improved noise robustness in speech communication,
PRL(112), 2018, pp. 191-197.
Elsevier DOI 1809
Body-conducted sensors, Hidden Markov models (HMMs), Automatic speech recognition, Speech intelligibility, Fusion, Noise robustness BibRef

Takahashi, N.[Naoya], Gygli, M.[Michael], Van Gool, L.J.[Luc J.],
AENet: Learning Deep Audio Features for Video Analysis,
MultMed(20), No. 3, March 2018, pp. 513-524.
IEEE DOI 1802
Feature extraction, Hidden Markov models, Mel frequency cepstral coefficient, Network architecture, Speech, large input field BibRef

Cho, B.J., Lee, J., Park, H.,
A Beamforming Algorithm Based on Maximum Likelihood of a Complex Gaussian Distribution With Time-Varying Variances for Robust Speech Recognition,
SPLetters(26), No. 9, September 2019, pp. 1398-1402.
IEEE DOI 1909
Covariance matrices, Array signal processing, Speech recognition, Maximum likelihood estimation, Artificial neural networks, robust speech recognition BibRef

Gundogdu, B., Yusuf, B., Saraclar, M.,
Generative RNNs for OOV Keyword Search,
SPLetters(26), No. 1, January 2019, pp. 124-128.
IEEE DOI 1901
learning (artificial intelligence), query processing, recurrent neural nets, search problems, speech recognition, recurrent neural networks BibRef

Seshadri, S., Räsänen, O.,
SylNet: An Adaptable End-to-End Syllable Count Estimator for Speech,
SPLetters(26), No. 9, September 2019, pp. 1359-1363.
IEEE DOI 1909
estimation theory, learning (artificial intelligence), natural language processing, neural net architecture, speech processing BibRef

Last, P., Engelbrecht, H.A., Kamper, H.,
Unsupervised Feature Learning for Speech Using Correspondence and Siamese Networks,
SPLetters(27), 2020, pp. 421-425.
IEEE DOI 2004
Acoustics, Training, Feature extraction, Speech processing, Standards, Data models, Unsupervised learning, zero-resource speech processing BibRef

John Wesley, R., Nayeemulla Khan, A., Shahina, A.,
Phoneme classification in reconstructed phase space with convolutional neural networks,
PRL(135), 2020, pp. 299-306.
Elsevier DOI 2006
Reconstructed phase space, Time-delay embedding, TIMIT phoneme classification, Convolutional neural network, Phase space reconstruction BibRef

Phan, H., McLoughlin, I.V., Pham, L., Chén, O.Y., Koch, P., de Vos, M., Mertins, A.,
Improving GANs for Speech Enhancement,
SPLetters(27), 2020, pp. 1700-1704.
IEEE DOI 1806
Generators, Noise measurement, Speech enhancement, Convolution, Decoding, Task analysis, DSEGAN BibRef

Wei, W.[Wei], Wang, Z.[Zanbo], Mao, X.L.[Xian-Ling], Zhou, G.Y.[Guang-You], Zhou, P.[Pan], Jiang, S.[Sheng],
Position-aware self-attention based neural sequence labeling,
PR(110), 2021, pp. 107636.
Elsevier DOI 2011
Sequence labeling, Self-attention, Discrete context dependency BibRef

Gu, R.Z.[Rong-Zhi], Zhang, S.X.[Shi-Xiong], Zou, Y.X.[Yue-Xian], Yu, D.[Dong],
Complex Neural Spatial Filter: Enhancing Multi-Channel Target Speech Separation in Complex Domain,
SPLetters(28), 2021, pp. 1370-1374.
IEEE DOI 2107
Customer relationship management, Spectrogram, Training, Task analysis, Supervised learning, Speech enhancement, MVDR BibRef

Li, Y.X.[Yan-Xiong], Wang, W.[Wucheng], Liu, M.[Mingle], Jiang, Z.J.[Zhong-Jie], He, Q.H.[Qian-Hua],
Speaker Clustering by Co-Optimizing Deep Representation Learning and Cluster Estimation,
MultMed(23), 2021, pp. 3377-3387.
IEEE DOI 2109
Estimation, Feature extraction, Clustering methods, Clustering algorithms, Decoding, Neural networks, audio document analysis BibRef

Esmaeilpour, M.[Mohammad], Chaalia, N.[Nourhene], Cardinal, P.[Patrick],
RSD-GAN: Regularized Sobolev Defense GAN Against Speech-to-Text Adversarial Attacks,
SPLetters(29), 2022, pp. 1998-2002.
IEEE DOI 2210
Generative adversarial networks, Training, Perturbation methods, Signal processing algorithms, Generators, Optimization, speech adversarial attack BibRef

Tian, J.C.[Jin-Chuan], Yu, J.W.[Jian-Wei], Weng, C.[Chao], Zou, Y.X.[Yue-Xian], Yu, D.[Dong],
Improving Mandarin End-to-End Speech Recognition With Word N-Gram Language Model,
SPLetters(29), 2022, pp. 812-816.
IEEE DOI 2204
Decoding, Lattices, Chaos, Artificial neural networks, Vocabulary, Transducers, Training, Speech recognition, language model BibRef

Mai, S.J.[Si-Jie], Hu, H.F.[Hai-Feng], Xing, S.L.[Song-Long],
A Unimodal Representation Learning and Recurrent Decomposition Fusion Structure for Utterance-Level Multimodal Embedding Learning,
MultMed(24), 2022, pp. 2488-2501.
IEEE DOI 2205
Feature extraction, Logic gates, Acoustics, Uniform resource locators, Data mining, Tensors, recurrent decomposition fusion network BibRef

Yang, R.[Runyan], Cheng, G.F.[Gao-Feng], Zhang, P.Y.[Peng-Yuan], Yan, Y.H.[Yong-Hong],
An E2E-ASR-Based Iteratively-Trained Timestamp Estimator,
SPLetters(29), 2022, pp. 1654-1658.
IEEE DOI 2208
Training, Hidden Markov models, Acoustics, Task analysis, Decoding, Neural networks, Automatic speech recognition, end-to-end, text-to-speech alignment BibRef

Muralikrishna, H., Aroor Dinesh, D.[Dileep],
Spoken language identification in unseen channel conditions using modified within-sample similarity loss,
PRL(158), 2022, pp. 16-23.
Elsevier DOI 2205
Spoken language identification, Unseen channel condition, Channel-mismatch, Domain-mismatch, Deep learning, Within-sample similarity loss BibRef

Nasir, M.[Md], Baucom, B.[Brian], Bryan, C.[Craig], Narayanan, S.[Shrikanth], Georgiou, P.[Panayiotis],
Modeling Vocal Entrainment in Conversational Speech Using Deep Unsupervised Learning,
AffCom(13), No. 3, July 2022, pp. 1651-1663.
IEEE DOI 2209
Medical treatment, Encoding, Feature extraction, Training, Signal processing, Neural networks, Computational modeling, interaction BibRef

Lian, Z.[Zheng], Chen, L.[Lan], Sun, L.[Licai], Liu, B.[Bin], Tao, J.H.[Jian-Hua],
GCNet: Graph Completion Network for Incomplete Multimodal Learning in Conversation,
PAMI(45), No. 7, July 2023, pp. 8419-8432.
IEEE DOI 2306
Oral communication, Correlation, Data models, Task analysis, Feature extraction, Tensors, Benchmark testing, temporal-sensitive modeling BibRef

Sun, H.R.[Hao-Ran], Wang, D.[Dong], Li, L.[Lantian], Chen, C.[Chen], Zheng, T.F.[Thomas F.],
Random Cycle Loss and Its Application to Voice Conversion,
PAMI(45), No. 8, August 2023, pp. 10331-10345.
IEEE DOI 2307
Codes, Speech coding, Speech recognition, Task analysis, Probabilistic logic, Mathematical models, Analytical models, voice conversion BibRef

Li, L.[Linhao], Wang, A.[Ao], Xu, M.[Ming], Dong, Y.F.[Yong-Feng], Li, X.[Xin],
Abductive natural language inference by interactive model with structural loss,
PRL(177), 2024, pp. 82-88.
Elsevier DOI 2401
Natural language inference, Abductive inference, Deep neural network, BiLSTM, Pretrained model(RoBERTa) BibRef

Wang, Q.Q.[Qiong-Qiong], Lee, K.A.[Kong Aik],
Cosine Scoring With Uncertainty for Neural Speaker Embedding,
SPLetters(31), 2024, pp. 845-849.
IEEE DOI 2404
Uncertainty, Vectors, Speech recognition, Neural networks, Measurement uncertainty, Training, Speaker recognition, uncertainty propagation BibRef

Singh, S.[Shubhr], Steinmetz, C.J.[Christian J.], Benetos, E.[Emmanouil], Phan, H.[Huy], Stowell, D.[Dan],
ATGNN: Audio Tagging Graph Neural Network,
SPLetters(31), 2024, pp. 825-829.
IEEE DOI 2404
Spectrogram, Tagging, Correlation, Convolution, Transformers, Training, Feature extraction, Audio tagging, graph neural networks, computational sound scene analysis BibRef

Wang, S.[Sijie], Ni, L.[Lin], Zhang, Z.[Zeyu], Li, X.X.[Xiao-Xuan], Zheng, X.[Xianda], Liu, J.[Jiamou],
Multimodal prediction of student performance: A fusion of signed graph neural networks and large language models,
PRL(181), 2024, pp. 1-8.
Elsevier DOI 2405
Signed network, Graph representations learning, Natural language processing, Multimodal BibRef

Song, Y.H.[Yi-Hua], Guo, L.[Lei], Man, M.[Menghua], Wu, Y.X.[You-Xi],
The spiking neural network based on fMRI for speech recognition,
PR(155), 2024, pp. 110672.
Elsevier DOI 2408
functional Magnetic Resonance Imaging, Functional brain network, Spiking neural network, Neural information transmission BibRef

Ma, D.[Duo], Yue, X.H.[Xiang-Hu], Ao, J.[Junyi], Gao, X.X.[Xiao-Xue], Li, H.Z.[Hai-Zhou],
Text-Guided HuBERT: Self-Supervised Speech Pre-Training via Generative Adversarial Networks,
SPLetters(31), 2024, pp. 2055-2059.
IEEE DOI 2408
Speech enhancement, Data models, Generative adversarial networks, Transformers, Training, Task analysis, Phonetics, speech representation BibRef

Kim, S.S.[Sung-Soo], Lee, D.[Dongjune], Kang, J.Y.[Ju Yeon], Jeong, M.[Myeonghun], Kim, N.S.[Nam Soo],
Sampling-Based Pruned Knowledge Distillation for Training Lightweight RNN-T,
SPLetters(32), 2025, pp. 631-635.
IEEE DOI 2502
Lattices, Computational modeling, Training, Speech recognition, Memory management, Complexity theory, Vectors, speech recognition BibRef

Lee, E.[Eunkyun], Chae, J.[Jongwook], Park, S.[Sooyoung], Shin, J.W.[Jong Won],
R3VQ: Redundancy-Reduced Residual Vector Quantization for Low-Bitrate Neural Speech Coding,
SPLetters(33), 2026, pp. 693-697.
IEEE DOI 2602
Training, Decoding, Speech coding, Vectors, Vector quantization, Speech codecs, Receivers, Codes, Bit rate, Neural networks, soundstream BibRef

Burchi, M.[Maxime], Timofte, R.[Radu],
Audio-Visual Efficient Conformer for Robust Speech Recognition,
WACV23(2257-2266)
IEEE DOI 2302
Training, Visualization, Error analysis, Lips, Working environment noise, Neural networks, Vision + language and/or other modalities BibRef

Aitoulghazi, O.[Omar], Jaafari, A.[Ahmed], Mourhir, A.[Asmaa],
DarSpeech: An Automatic Speech Recognition System for the Moroccan Dialect,
ISCV22(1-6)
IEEE DOI 2208
Error analysis, Web and internet services, Customer relationship management, Organizations, Deep Speech 2 BibRef

Zhai, M.E.[Meng-En], Dong, L.H.[Li-Hong], Qin, Y.[Yi], Yu, F.F.[Fei-Fan],
The Research of Chain Model Based on CNN-TDNNF in Yulin Dialect Speech Recognition,
ICIVC22(883-888)
IEEE DOI 2301
Training, Adaptation models, Perturbation methods, Computational modeling, Neural networks, Time series analysis, dialect speech BibRef

Vedvyasan, K.[Kishore], Nathwani, K.[Karan], Hegde, R.M.[Rajesh M.],
Group Delay based Methods for Detection and Recognition of Whispered Speech,
ICPR22(499-505)
IEEE DOI 2212
Voice activity detection, Smoothing methods, Art, Databases, Cepstral analysis, Surveillance, Neural networks, LP and Whisper Detection BibRef

Toufa, A.S.[Anastasia-Sotiria], Kotropoulos, C.[Constantine],
Digit Recognition Applied to Reconstructed Audio Signals Using Deep Learning,
ICPR21(3050-3057)
IEEE DOI 2105
Support vector machines, Pipelines, Neural networks, Generative adversarial networks, Audio recording, Generators, Signal reconstruction BibRef

Chakraborty, J.[Jaybrata], Chakraborty, B.[Bappaditya], Bhattacharya, U.[Ujjwal],
Dense Recognition of Spoken Languages,
ICPR21(9674-9681)
IEEE DOI 2105
Training, TV, Data preprocessing, Deep architecture, Speech recognition, Network architecture, Linguistics BibRef

Ghezaiel, W.[Wajdi], Brun, L.[Luc], Lézoray, O.[Olivier],
Hybrid Network For End-To-End Text-Independent Speaker Identification,
ICPR21(2352-2359)
IEEE DOI 2105
Wavelet transforms, Training, Scattering, Training data, Speech recognition, Speaker recognition BibRef

Zhou, P.L.[Pei-Lin], Huang, Z.Q.[Zhi-Qi], Liu, F.L.[Feng-Lin], Zou, Y.X.[Yue-Xian],
PIN: A Novel Parallel Interactive Network for Spoken Language Understanding,
ICPR21(2950-2957)
IEEE DOI 2105
Recurrent neural networks, Correlation, Fuses, Bit error rate, Filling, Pins BibRef

Zhu, B.L.[Bao-Luo], Chen, X.B.[Xiao-Bing], Chen, T.Y.[Tai-Yue], Zhu, J.R.[Jun-Rui],
Experiment Research on Mobile Terminal Image Scene Recognition Based on optimization,
CVIDL20(70-75)
IEEE DOI 2102
convolutional neural nets, entropy, feature extraction, image recognition, learning (artificial intelligence), Tensor Flow Life BibRef

Wang, P.,
Research and Design of Smart Home Speech Recognition System Based on Deep Learning,
CVIDL20(218-221)
IEEE DOI 2102
belief networks, feature extraction, hidden Markov models, home automation, Internet, learning (artificial intelligence), acoustic feature extraction BibRef

Wang, L.,
A Speech Content Retrieval Model Based on Integrated Neural Network for Natural Language Description,
CVIDL20(532-535)
IEEE DOI 2102
audio signal processing, content-based retrieval, convolutional neural nets, feature extraction, keyword embedding BibRef

Scharenborg, O.[Odette], van der Gouw, N.[Nikki], Larson, M.[Martha], Marchiori, E.[Elena],
The Representation of Speech in Deep Neural Networks,
MMMod19(II:194-205).
Springer DOI 1901
BibRef

Roth, J., Chaudhuri, S., Klejch, O., Marvin, R., Gallagher, A., Kaver, L., Ramaswamy, S., Stopczynski, A., Schmid, C., Xi, Z., Pantofaru, C.,
Supplementary Material: AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection,
MMVAMTC19(3718-3722)
IEEE DOI 2004
audio signal processing, audio-visual systems, face recognition, object detection, speaker recognition, video signal processing, neural networks BibRef

Wang, F., Chen, W., Yang, Z., Xu, B.,
Self-Attention Based Network for Punctuation Restoration,
ICPR18(2803-2808)
IEEE DOI 1812
Decoding, Neural networks, Training, Encoding, Feature extraction, Adaptation models BibRef

Tokozume, Y., Ushiku, Y., Harada, T.,
Between-Class Learning for Image Classification,
CVPR18(5486-5494)
IEEE DOI 1812
Training, Image recognition, Standards, Speech recognition, Data models, Neural networks, Learning systems BibRef

Smirnov, E., Ivanova, E., Melnikov, A., Kalinovskiy, I., Oleinik, A., Luckyanets, E.,
Hard Example Mining with Auxiliary Embeddings,
DFW18(37-3709)
IEEE DOI 1812
Training, Prototypes, Measurement, Speech recognition, Face recognition, Neural networks, Image recognition BibRef

Ding, K., Luo, N., Xu, Y., Ke, D., Su, K.,
Mutual-optimization Towards Generative Adversarial Networks For Robust Speech Recognition,
ICPR18(2699-2704)
IEEE DOI 1812
Generative adversarial networks, Generators, Speech enhancement, Noise measurement, Speech recognition, generative adversarial networks BibRef

Li, C., Zhu, L., Xu, S., Gao, P., Xu, B.,
Recurrent Neural Network Based Small-footprint Wake-up-word Speech Recognition System with a Score Calibration Method,
ICPR18(3222-3227)
IEEE DOI 1812
Dynamic programming, Feature extraction, Speech recognition, Hidden Markov models, Real-time systems, dynamic programming search BibRef

Li, C., Zhu, L., Xu, S., Gao, P., Xu, B.,
Compression of Acoustic Model via Knowledge Distillation and Pruning,
ICPR18(2785-2790)
IEEE DOI 1812
Computational modeling, Training, Speech recognition, Acoustics, Neurons, Brain modeling, Recurrent neural networks, model compression BibRef

Zhang, S., Liu, W.[Wen], Qin, Y.,
Wake-up-word spotting using end-to-end deep neural network system,
ICPR16(2878-2883)
IEEE DOI 1705
Computational modeling, Hidden Markov models, Logic gates, Neural networks, Speech recognition, Training, CTC, LSTM, RNN, Wake-up-Word system BibRef

Zhang, S.L.[Shi-Lei], Qin, Y.,
Rapid feature space MLLR speaker adaptation for deep neural network acoustic modeling,
ICPR16(2889-2894)
IEEE DOI 1705
Acoustics, Adaptation models, Data models, Hidden Markov models, Standards, Training, Transforms, Deep Neural Networks, FMLLR, bilinear models, rapid speaker adaptation BibRef

Zheng, H.[Huadi], Cai, W., Zhou, T.Y.[Tian-Yan], Zhang, S.L.[Shi-Lei], Li, M.,
Text-independent voice conversion using deep neural network based phonetic level features,
ICPR16(2872-2877)
IEEE DOI 1705
Covariance matrices, Data mining, Data models, Feature extraction, Speech, Training, Training data, Gaussian mixture model, deep neural network, phoneme posterior probability, voice conversion BibRef

Zhang, B.[Bo], Gan, Y.Q.[Yu-Qin], Song, Y.[Yan], Tang, B.L.[Ben-Lai],
Application of pronunciation knowledge on phoneme recognition by LSTM neural network,
ICPR16(2906-2911)
IEEE DOI 1705
Automata, Dictionaries, Hidden Markov models, Linear programming, Neural networks, Speech, Training, connectionist temporal classification, phoneme recognition, pronunciation knowledge BibRef

García, F.[Fernando], Sanchis, E.[Emilio], Hurtado, L.F.[Lluís F.], Segarra, E.[Encarna],
Adaptive Training for Robust Spoken Language Understanding,
CIARP15(519-526).
Springer DOI 1511
BibRef

Pastor, J.[Joan], Hurtado, L.F.[Lluís F.], Segarra, E.[Encarna], Sanchis, E.[Emilio],
Language Modelization and Categorization for Voice-Activated QA,
CIARP11(475-482).
Springer DOI 1111
BibRef

García, F.[Fernando], Hurtado, L.F.[Lluís F.], Sanchis, E.[Emilio], Segarra, E.[Encarna],
An Active Learning Approach for Statistical Spoken Language Understanding,
CIARP11(565-572).
Springer DOI 1111
BibRef

Hurtado, L.F.[Lluís F.], Griol, D.[David], Sanchis, E.[Emilio], Segarra, E.[Encarna],
A Statistical User Simulation Technique for the Improvement of a Spoken Dialog System,
CIARP07(743-752).
Springer DOI 0711
BibRef
Earlier: A2, A1, A4, A3:
A Dialog Management Methodology Based on Neural Networks and Its Application to Different Domains,
CIARP08(643-650).
Springer DOI 0809
BibRef

He, H.Y.[Hai-Yan], Wen, C.Y.[Cheng-Yi],
ART2-based multiple MLPs neural network for speaker-independent recognition of isolated words,
ICPR92(II:590-593).
IEEE DOI 9208
BibRef

Chapter on New Unsorted Entries, and Other Miscellaneous Papers continues in
Speech Analysis, other than Recognition.

Last update: Jul 11, 2026 at 11:55:55