21.3.4.1 Combined Audio Visual Recognition

Real Time Vision. Application, Lipreading. Speech.

Wu, J.X.[Jian-Xiong], Chan, C.[Chorkin],
Recognition of phonetic labels of the TIMIT speech corpus by means of an artificial neural network,
PR(24), No. 11, 1991, pp. 1085-1091.
WWW Version. 0401 BibRef

Wu, J.T.[Jian-Tong], Tamura, S.[Shinichi], Mitsumoto, H.[Hiroshi], Kawai, H.[Hideo], Kurosu, K.[Kenji], Okazaki, K.[Kozo],
Neural network vowel-recognition jointly using voice features and mouth shape image,
PR(24), No. 10, 1991, pp. 921-927.
WWW Version. 0401 BibRef

Lavagetto, F.,
Time-Delay Neural Networks for Estimating Lip Movements from Speech Analysis: A Useful Tool in Audio Video Synchronization,
CirSysVideo(7), No. 5, October 1997, pp. 786-800.
IEEE Top Reference. 9710 BibRef

Movellan, J.R., Mineiro, P.,
Robust Sensor Fusion: Analysis and Application to Audio-Visual Speech Recognition,
MachLearn(32), No. 2, August 1998, pp. 85-100. 9810 BibRef

Wachsmuth, S.[Sven], Socher, G.[Gudrun], Brandt-Pook, H.[Hans], Kummert, F.[Franz], Sagerer, G.[Gerhard],
Integration of Vision and Speech Understanding Using Bayesian Networks,
Videre(1), No. 4, Winter 2000, pp. xx-yy. 0005 BibRef
Earlier: A1, A3, A2, A4, A5:
Multilevel Integration of Vision and Speech Understanding Using Bayesian Networks,
CVS99(231 ff.).
HTML Version. 0209 BibRef

Chien, J.T., Lin, M.S.,
Frame-synchronous noise compensation for hands-free speech recognition in car environments,
VISP(147), No. 6, December 2000, pp. 508-515. 0101 BibRef

Patel, D., Turner, L.F.,
Effects of ATM network impairments on audio-visual broadcast applications,
VISP(147), No. 5, October 2000, pp. 436-444. 0101 BibRef

Aleksic, P.S.[Petar S.], Williams, J.J.[Jay J.], Wu, Z.L.[Zhi-Lin], Katsaggelos, A.K.[Aggelos K.],
Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features,
JASP(2002), No. 11, November 2002, pp. 1213.
HTML Version. 0304 BibRef
Earlier:
Audio-visual continuous speech recognition using MPEG-4 compliant visual features,
ICIP02(I: 960-963).
IEEE Abstract. IEEE Top Reference. 0210 BibRef

Aleksic, P.S.[Petar S.], Katsaggelos, A.K.[Aggelos K.],
Audio-Visual Biometrics,
PIEEE(94), No. 11, November 2006, pp. 2025-2044.
IEEE DOI may work or IEEE-CS DOI may work. 0611 BibRef

Aleksic, P.S.[Petar S.], Katsaggelos, A.K.[Aggelos K.],
Speech-to-video synthesis using MPEG-4 compliant visual features,
CirSysVideo(14), No. 5, May 2004, pp. 682-692.
IEEE Abstract. IEEE Top Reference. 0407 BibRef
Earlier:
Comparison of MPEG-4 Facial Animation Parameter Groups with Respect to Audio-Visual Speech Recognition Performance,
ICIP05(III: 501-504).
IEEE DOI may work or IEEE-CS DOI may work. 0512 BibRef

Jiang, J.T.[Jin-Tao], Alwan, A.[Abeer], Keating, P.A.[Patricia A.], Auer Jr., E.T.[Edward T.], Bernstein, L.E.[Lynne E.],
On the Relationship between Face Movements, Tongue Movements, and Speech Acoustics,
JASP(2002), No. 11, November 2002, pp. 1174.
HTML Version. 0304 BibRef

Sodoyer, D.[David], Schwartz, J.L.[Jean-Luc], Girin, L.[Laurent], Klinkisch, J.[Jacob], Jutten, C.[Christian],
Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli,
JASP(2002), No. 11, November 2002, pp. 1165.
HTML Version. 0304 BibRef

Zotkin, D.N.[Dmitry N.], Duraiswami, R.[Ramani], Davis, L.S.[Larry S.],
Joint Audio-Visual Tracking Using Particle Filters,
JASP(2002), No. 11, November 2002, pp. 1154.
HTML Version. 0304 BibRef

Heckmann, M.[Martin], Berthommier, F.[Frédéric], Kroschel, K.[Kristian],
Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition,
JASP(2002), No. 11, November 2002, pp. 1260.
HTML Version. 0304 BibRef

Nefian, A.V.[Ara V.], Liang, L.H.[Lu-Hong], Pi, X.B.[Xiao-Bo], Liu, X.X.[Xiao-Xing], Murphy, K.[Kevin],
Dynamic Bayesian Networks for Audio-Visual Speech Recognition,
JASP(2002), No. 11, November 2002, pp. 1274.
HTML Version. 0304 BibRef

Nefian, A.V.[Ara V.], Liang, L.H.[Lu Hong], Fu, T.Y.[Tie-Yan], Liu, X.X.[Xiao Xing],
A Bayesian Approach to Audio-Visual Speaker Identification,
AVBPA03(761-769).
HTML Version. 0310 BibRef

Patterson, E.K.[Eric K.], Gurbuz, S.[Sabri], Tufekci, Z.[Zekeriya], Gowdy, J.N.[John N.],
Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus,
JASP(2002), No. 11, November 2002, pp. 1189.
HTML Version. 0304 BibRef

Gurbuz, S.[Sabri], Patterson, E.K.[Eric K.], Tufekci, Z.[Zekeriya], Gowdy, J.N.[John N.],
Affine-Invariant Visual Features Contain Supplementary Information to Enhance Speech Recognition,
AVBPA01(175).
HTML Version. 0310 BibRef

Garg, A.[Ashutosh], Pavlovic, V.[Vladimir], Rehg, J.M.[James M.],
Boosted learning in dynamic Bayesian networks for multimodal speaker detection,
PIEEE(91), No. 9, September 2003, pp. 1355-1369.
IEEE DOI may work or IEEE-CS DOI may work. 0309 BibRef
Earlier:
Audio-visual speaker detection using dynamic Bayesian networks,
AFGR00(384-390).
IEEE DOI may work or IEEE-CS DOI may work. 0003 BibRef

Pavlovic, V.[Vladimir], Garg, A.[Ashutosh], Rehg, J.M.[James M.], Huang, T.S.[Thomas S.],
Multimodal Speaker Detection using Error Feedback Dynamic Bayesian Networks,
CVPR00(II: 34-41).
IEEE Abstract. IEEE Top Reference.
WWW Version. 0005 BibRef

Pavlovic, V., Berry, G., and Huang, T.S.,
Integration of Audio/Visual Information for Use in Human-Computer Intelligent Interaction,
ICIP97(I: 121-124).
IEEE DOI may work or IEEE-CS DOI may work. BibRef 9700

Choudhury, T.[Tanzeem], Rehg, J.M., Pavlovic, V., Pentland, A.P.,
Boosting and structure learning in dynamic Bayesian networks for audio-visual speaker detection,
ICPR02(III: 789-794).
IEEE DOI may work or IEEE-CS DOI may work. 0211 BibRef

Pavlovic, V.[Vladimir],
Multimodal tracking and classification of audio-visual features,
ICIP98(I: 343-347).
IEEE DOI may work or IEEE-CS DOI may work. 9810 BibRef

Rehg, J.M.[James M.], Murphy, K.P.[Kevin P.], Fieguth, P.W.[Paul W.],
Vision-Based Speaker Detection Using Bayesian Networks,
CVPR99(II: 110-116).
IEEE Abstract. IEEE Top Reference.
WWW Version. More particularly, the one talking. BibRef 9900

Kalberer, G.A.[Gregor A.], Müller, P.[Pascal], Van Gool, L.J.[Luc J.],
Visual speech, a trajectory in viseme space,
IJIST(13), No. 1, 2003, pp. 74-84.
WWW Version. 0308 BibRef

Sharma, R., Yeasin, M., Krahnstoever, N., Rauschert, I., Cai, G., Brewer, I., MacEachren, A.M., Sengupta, K.,
Speech-gesture driven multimodal interfaces for crisis management,
PIEEE(91), No. 9, September 2003, pp. 1327-1354.
IEEE DOI may work or IEEE-CS DOI may work. 0309 BibRef

Potamianos, G., Neti, C., Gravier, G., Garg, A., Senior, A.W.,
Recent advances in the automatic recognition of audiovisual speech,
PIEEE(91), No. 9, September 2003, pp. 1306-1326.
IEEE DOI may work or IEEE-CS DOI may work. 0309 BibRef

Kaynak, M.N., Zhi, Q., Cheok, A.D., Sengupta, K., Jian, Z., Chung, K.C.,
Analysis of Lip Geometric Features for Audio-Visual Speech Recognition,
SMC-A(34), No. 4, July 2004, pp. 564-570.
IEEE Abstract. IEEE Top Reference. 0407 BibRef

Foo, S.W.[Say Wei], Lian, Y.[Yong], Dong, L.[Liang],
Recognition of visual speech elements using adaptively boosted hidden Markov models,
CirSysVideo(14), No. 5, May 2004, pp. 693-705.
IEEE Abstract. IEEE Top Reference. 0407 BibRef

Albiol, A.[Alberto], Torres, L.[Luis], Delp, E.J.[Edward J.],
Fully automatic face recognition system using a combined audio-visual approach,
VISP(152), No. 3, June 2005, pp. 318-326.
WWW Version. 0510 BibRef
Earlier:
A Fast Anchor Person Searching Scheme in News Sequences,
AVBPA01(366).
HTML Version. 0310 BibRef
And:
An Unsupervised Color Image Segmentation Algorithm for Face Detection Applications,
ICIP01(II: 681-684).
IEEE Abstract. IEEE Top Reference. 0108 BibRef
Earlier:
Optimum Color Spaces for Skin Detection,
ICIP01(I: 122-124).
IEEE Abstract. IEEE Top Reference. 0108 BibRef

Kleindienst, J.[Jan], Macek, T.[Tomáš], Serédi, L.[Ladislav], Šedivý, J.[Jan],
Interaction framework for home environment using speech and vision,
IVC(25), No. 12, 3 December 2007, pp. 1836-1847.
WWW Version. 0710 BibRef
Earlier:
Djinn: Interaction Framework for Home Environment Using Speech and Vision,
CVHCI04(153-164).
WWW Version. 0505 Multi-modal; Computer-vision; Context-aware; Speech recognition BibRef

Palanivel, S., Yegnanarayana, B.,
Multimodal person authentication using speech, face and visual speech,
CVIU(109), No. 1, January 2008, pp. 44-55.
WWW Version. 0801 Multimodal person authentication; Face tracking; Eye location; Visual speech; Multiscale morphological dilation and erosion; Autoassociative neural network BibRef

Talantzis, F., Pnevmatikakis, A., Constantinides, A.G.,
Audio-Visual Active Speaker Tracking in Cluttered Indoors Environments,
SMC-B(37), No. 3, June 2007, pp. 799-807.
IEEE DOI may work or IEEE-CS DOI may work. 0711 BibRef

Chetty, G.[Girija], Wagner, M.[Michael],
Robust face-voice based speaker identity verification using multilevel fusion,
IVC(26), No. 9, 1 September 2008, pp. 1249-1260.
WWW Version. 0806 BibRef
Earlier:
Audio Visual Speaker Verification Based on Hybrid Fusion of Cross Modal Features,
PReMI07(469-478).
WWW Version. 0712 Lip; 3D Face; Voice; Biometric; Identity verification; Robust; Multilevel fusion BibRef

Delakis, M.[Manolis], Gravier, G.[Guillaume], Gros, P.[Patrick],
Audiovisual integration with Segment Models for tennis video parsing,
CVIU(111), No. 2, August 2008, pp. 142-154.
WWW Version. 0808 Hidden Markov Models; Segment Models; Multimodal fusion; Video indexing; Video summarization BibRef


Pachoud, S., Gong, S., Cavallaro, A.,
Video Augmentation for Improving Audio Speech Recognition under Noise,
BMVC08(xx-yy).
PDF Version. 0809 BibRef

Horii, Y.[Yu], Kawashima, H.[Hiroaki], Matsuyama, T.[Takashi],
Speaker detection using the timing structure of lip motion and sound,
CVPR4HB08(1-8).
IEEE DOI may work or IEEE-CS DOI may work. 0806 BibRef

Rúa, E.A.[Enrique Argones], Castro, J.L.A.[José Luis Alba], Mateo, C.G.[Carmen García],
Quality-Based Score Normalization for Audiovisual Person Authentication,
ICIAR08(xx-yy).
WWW Version. 0806 BibRef

Wang, L.[Lei], Tjondronegoro, D.[Dian], Liu, Y.[Yuee],
Clustering and Visualizing Audio-Visual Dataset on Mobile Devices in a Topic-Oriented Manner,
Visual07(310-321).
WWW Version. 0706 BibRef

Zajdel, W., Krijnders, J.D., Andringa, T., Gavrila, D.M.,
CASSANDRA: audio-video sensor fusion for aggression detection,
AVSBS07(200-205).
IEEE DOI may work or IEEE-CS DOI may work. 0709 BibRef

Stødle, D.[Daniel], Bjørndalen, J.M.[John Markus], Anshus, O.J.[Otto J.],
A System for Hybrid Vision- and Sound-Based Interaction with Distal and Proximal Targets on Wall-Sized, High-Resolution Tiled Displays,
CVHCI07(59-68).
WWW Version. 0710 BibRef

van Hengel, P.W.J., Andringa, T.C.,
Verbal aggression detection in complex social environments,
AVSBS07(15-20).
IEEE DOI may work or IEEE-CS DOI may work. 0709 BibRef

Ikeda, O.[Osamu],
Detection of a Speaker in Video by Combined Analysis of Speech Sound and Mouth Movement,
ISVC07(II: 602-610).
WWW Version. 0711 BibRef

Das, A.[Amitava],
Audio Visual Person Authentication by Multiple Nearest Neighbor Classifiers,
ICB07(1114-1123).
WWW Version. 0708 BibRef

Xin, L.[Le], Tao, J.H.[Jian-Hua], Tan, T.N.[Tie-Niu],
Dynamic Audio-Visual Mapping using Fused Hidden Markov Model Inversion Method,
ICIP07(III: 293-296).
IEEE DOI may work or IEEE-CS DOI may work. 0709 BibRef

Casanovas, A.L.[Anna Llagostera], Monaci, G.[Gianluca], Vandergheynst, P.[Pierre],
Blind Audiovisual Source Separation using Sparse Representations,
ICIP07(III: 301-304).
IEEE DOI may work or IEEE-CS DOI may work. 0709 BibRef

Barzelay, Z.[Zohar], Schechner, Y.Y.[Yoav Y.],
Harmony in Motion,
CVPR07(1-8).
IEEE DOI may work or IEEE-CS DOI may work. 0706 Audio-visual analysis. BibRef

O'Donovan, A.[Adam], Duraiswami, R.[Ramani], Neumann, J.[Jan],
Microphone Arrays as Generalized Cameras for Integrated Audio Visual Processing,
CVPR07(1-8).
IEEE DOI may work or IEEE-CS DOI may work. 0706 BibRef

Abbas, J.[Jehanzeb], Dagli, C.K.[Charlie K.], Huang, T.S.[Thomas S.],
A Multimodality Framework for Creating Speaker/Non-Speaker Profile Databases for Real-World Video,
SLAM07(1-8).
IEEE DOI may work or IEEE-CS DOI may work. 0706 BibRef

Kushal, A.[Akash], Rahurkar, M.[Mandar], Fei-Fei, L.[Li], Ponce, J.[Jean], Huang, T.[Thomas],
Audio-Visual Speaker Localization Using Graphical Models,
ICPR06(I: 291-294).
WWW Version. 0609 BibRef

Tsuji, T.[Tokuo], Yamamoto, K.[Kenkichi], Ishii, I.[Idaku],
Real-time Sound Source Localization Based on Audiovisual Frequency Integration,
ICPR06(IV: 322-325).
WWW Version. 0609 BibRef

Monaci, G.[Gianluca], Vandergheynst, P.[Pierre],
Audiovisual Gestalts,
PercOrg06(200).
IEEE DOI may work or IEEE-CS DOI may work. 0609 BibRef

Zhu, Z.G.[Zhi-Gang], Li, W.H.[Wei-Hong], Molina, E.[Edgardo], Wolberg, G.[George],
LDV Sensing and Processing for Remote Hearing in a Multimodal Surveillance System,
MSCSAS07(1-2).
IEEE DOI may work or IEEE-CS DOI may work. 0706 BibRef

Zhu, Z.G.[Zhi-Gang], Li, W.H.[Wei-Hong], Wolberg, G.,
Integrating LDV Audio and IR Video for Remote Multimodal Surveillance,
OTCBVS05(III: 10-10).
IEEE DOI may work or IEEE-CS DOI may work. 0507 BibRef

Chetty, G.[Girija], Wagner, M.[Michael],
Face-Voice Authentication Based on 3D Face Models,
ACCV06(I:559-568).
WWW Version. 0601 BibRef

Wu, Z.Y.[Zhi-Yong], Cai, L.H.[Lian-Hong], Meng, H.[Helen],
Multi-level Fusion of Audio and Visual Features for Speaker Identification,
ICB06(493-499).
WWW Version. 0601 BibRef

Yang, P.[Pu], Yang, Y.C.[Ying-Chun], Wu, Z.H.[Zhao-Hui],
Exploiting Glottal Information in Speaker Recognition Using Parallel GMMs,
AVBPA05(804).
WWW Version. 0509 BibRef

Lei, Z.[Zhenchun], Yang, Y.C.[Ying-Chun], Wu, Z.H.[Zhao-Hui],
An UBM-Based Reference Space for Speaker Recognition,
ICPR06(IV: 318-321).
WWW Version. 0609 BibRef

Li, D.D.[Dong-Dong], Yang, Y.C.[Ying-Chun], Wu, Z.H.[Zhao-Hui],
Dynamic Bayesian Networks for Audio-Visual Speaker Recognition,
ICB06(539-545).
WWW Version. 0601 BibRef

Megherbi, N., Ambellouis, S., Colot, O., Cabestaing, F.,
Data Association in Multi-Target Tracking Using Belief Theory: Handling Target Emergence and Disappearance Issue,
AVSBS05(517-521).
IEEE DOI may work or IEEE-CS DOI may work. 0602 BibRef

Megherbi, N., Ambellouis, S., Colot, O., Cabestaing, F.,
Joint audio-video people tracking using belief theory,
AVSBS05(135-140).
IEEE DOI may work or IEEE-CS DOI may work. 0602 BibRef

Saenko, K.[Kate], Livescu, K.[Karen], Siracusa, M.[Michael], Wilson, K.[Kevin], Glass, J.[James], Darrell, T.J.[Trevor J.],
Visual Speech Recognition with Loosely Synchronized Feature Streams,
ICCV05(II: 1424-1431).
IEEE DOI may work or IEEE-CS DOI may work. 0510 BibRef

Lei, Z.[Zhenchun], Yang, Y.C.[Ying-Chun], Wu, Z.H.[Zhao-Hui],
Constructing the Discriminative Kernels Using GMM for Text-Independent Speaker Identification,
IWBRS05(165).
WWW Version. 0601 BibRef
And:
Speaker Identification Using the VQ-Based Discriminative Kernels,
AVBPA05(797).
WWW Version. 0509 BibRef

Fox, N.A.[Niall A.], O'Mullane, B.A.[Brian A.], Reilly, R.B.[Richard B.],
VALID: A New Practical Audio-Visual Database, and Comparative Results,
AVBPA05(777).
WWW Version. 0509
WWW Version. Dataset, Faces. BibRef

Sharma, P.[Prag], Reilly, R.B.[Richard B.],
The UCD Colour Face Image Database for Face Detection,
Online1998.
WWW Version. Dataset, Faces. BibRef 9800

Fox, N.A.[Niall A.], O'Mullane, B.A.[Brian A.], Reilly, R.B.[Richard B.],
Audio-Visual Speaker Identification via Adaptive Fusion Using Reliability Estimates of Both Modalities,
AVBPA05(787).
WWW Version. 0509 BibRef

Li, X.[Xin], Sun, L.[Luo], Tao, L.M.[Lin-Mi], Xu, G.Y.[Guang-You], Jia, Y.[Ying],
A Speaker Tracking Algorithm Based on Audio and Visual Information Fusion Using Particle Filter,
ICIAR04(II: 572-580).
WWW Version. 0409 BibRef

Zhang, D., Ghobakhlou, A., Kasabov, N.,
An adaptive model of person identification combining speech and image information,
ICARCV04(I: 413-418).
IEEE DOI may work or IEEE-CS DOI may work. 0412 BibRef

Kratt, J.[Jan], Metze, F.[Florian], Stiefelhagen, R.[Rainer], Waibel, A.[Alex],
Large Vocabulary Audio-Visual Speech Recognition Using the Janus Speech Recognition Toolkit,
DAGM04(488-495).
WWW Version. 0505 BibRef

Hanafiah, Z.M., Yamazaki, C., Nakamura, A., Kuno, Y.,
Understanding inexplicit utterances using vision for helper robots,
ICPR04(IV: 925-928).
IEEE DOI may work or IEEE-CS DOI may work. 0409 BibRef

Hermann, T.[Thomas], Henning, T.[Thomas], Ritter, H.[Helge],
Gesture Desk: An Integrated Multi-modal Gestural Workplace for Sonification,
GW03(369-379).
WWW Version. 0405 BibRef

Merola, G.[Giorgio], Poggi, I.[Isabella],
Multimodality and Gestures in the Teacher's Communication,
GW03(101-111).
WWW Version. 0405 BibRef

Althoff, F.[Frank], McGlaun, G.[Gregor], Lang, M.[Manfred], Rigoll, G.[Gerhard],
Evaluating Multimodal Interaction Patterns in Various Application Scenarios,
GW03(421-435).
WWW Version. 0405 BibRef

Kranstedt, A.[Alfred], Kühnlein, P.[Peter], Wachsmuth, I.[Ipke],
Deixis in Multimodal Human Computer Interaction: An Interdisciplinary Approach,
GW03(112-123).
WWW Version. 0405 BibRef

Saeed, K.[Khalid], Kozlowski, M.[Marcin],
An Image-Based System for Spoken-Letter Recognition,
CAIP03(494-502).
WWW Version. 0311 BibRef

Ho, P.[Purdy], Armington, J.[John],
A Dual-Factor Authentication System Featuring Speaker Verification and Token Technology,
AVBPA03(128-136).
HTML Version. 0310 BibRef

Fox, N.A.[Niall A.], Reilly, R.B.[Richard B.],
Audio-Visual Speaker Identification Based on the Use of Dynamic Audio and Visual Features,
AVBPA03(743-751).
HTML Version. 0310 BibRef

Czyz, J.[Jacek], Bengio, S.[Samy], Marcel, C.[Christine], Vandendorpe, L.[Luc],
Scalability Analysis of Audio-Visual Person Identity Verification,
AVBPA03(752-760).
HTML Version. 0310 BibRef

Bengio, S.[Samy],
Multimodal Authentication Using Asynchronous HMMs,
AVBPA03(770-777).
HTML Version. 0310 BibRef

Lucey, S.[Simon], Chen, T.H.[Tsu-Han],
Improved Audio-Visual Speaker Recognition via the Use of a Hybrid Combination Strategy,
AVBPA03(929-936).
HTML Version. 0310 BibRef

Krahnstoever, N., Schapira, E., Kettebekov, S., Sharma, R.,
Multimodal human-computer interaction for crisis management systems,
WACV02(203-207).
IEEE Abstract. IEEE Top Reference. 0303 BibRef

Kettebekov, S., Yeasin, M., Sharma, R.,
Improving continuous gesture recognition with spoken prosody,
CVPR03(I: 565-570).
IEEE Abstract. IEEE Top Reference. 0307 BibRef

Higgins, J.E., Damper, R.I.,
An HMM-Based Subband Processing Approach to Speaker Identification,
AVBPA01(169).
HTML Version. 0310 BibRef

Poh, N.[Norman], Korczak, J.[Jerzy],
Hybrid Biometric Person Authentication Using Face and Voice Features,
AVBPA01(348).
HTML Version. 0310 BibRef

Nakamura, S.[Satoshi],
Fusion of Audio-Visual Information for Integrated Speech Processing,
AVBPA01(127).
HTML Version. 0310 BibRef

Sullivan, K.P.H.[Kirk P.H.], Pelecanos, J.[Jason],
Revisiting Carl Bildt's Impostor: Would a Speaker Verification System Foil Him?,
AVBPA01(144).
HTML Version. 0310 BibRef

Geiger, G.[Gadi], Ezzat, T.[Tony], Poggio, T.[Tomaso],
Perceptual Evaluation of Video-Realistic Speech,
MIT AIM-2003-003, February 28, 2003.
WWW Version. We describe here our scheme and its application to a new video-realistic (potentially indistinguishable from real recorded video) visual-speech animation system, called Mary 101. 0306 BibRef

Gordan, M., Kotropoulos, C., Pitas, I.,
Application of support vector machines classifiers to visual speech recognition,
ICIP02(III: 129-132).
IEEE Abstract. IEEE Top Reference. 0210 BibRef

Blake, A., Gangnet, M., Perez, P., Vermaak, J.,
Integrated tracking with vision and sound,
CIAP01(354-357).
IEEE Top Reference. 0210 BibRef

Zhang, X.Z.[Xiao-Zheng], Mersereau, R.M., Clements, M.,
Bimodal fusion in audio-visual speech recognition,
ICIP02(I: 964-967).
IEEE Abstract. IEEE Top Reference. 0210 BibRef

Graf, H.P., Cosatto, E., Strom, V., Huang, F.J.[Fu Jie],
Visual prosody: facial movements accompanying speech,
AFGR02(381-386).
IEEE DOI may work or IEEE-CS DOI may work. 0206 BibRef

Qi, Y.[Yuan],
Learning Algorithms for Audio and Video Processing: Independent Component Analysis and Support Vector Machine Based Approaches,
UMD--TR4174, August 2000.
WWW Version.
WWW Version. BibRef 0008

Nankaku, Y., Tokuda, K., Kitamura, T.,
Normalized Training for HMM-based Visual Speech Recognition,
ICIP00(Vol III: 234-237).
IEEE Abstract. IEEE Top Reference. 0008 BibRef

Zhang, Y.[You], Levinson, S.[Stephen], Huang, T.S.[Thomas S.],
Speaker Independent Audio-Visual Speech Recognition,
ICME00(TP8). 0007 BibRef

Pan, H.[Hao], Huang, T.S.[Thomas S.],
A New Approach to Integrate Audio and Visual Features of Speech,
ICME00(TP8). 0007 BibRef

Potamianos, G.[Gerasimos], Verma, A.[Ashish], Neti, C.[Chalapathy], Iyengar, G.[Giri], Basu, S.[Sankar],
A Cascade Image Transform for Speaker Independent Automatic Speech Reading,
ICME00(TP8). 0007 BibRef

Pan, H., Liang, Z.P., Huang, T.S.,
Fusing Audio and Visual Features of Speech,
ICIP00(Vol III: 214-217).
IEEE Abstract. IEEE Top Reference. 0008 BibRef

Faruquie, T.A., Majumdar, A., Rajput, N., Subramaniam, L.V.,
Large Vocabulary Audio-visual Speech Recognition Using Active Shape Models,
ICPR00(Vol III: 106-109).
IEEE DOI may work or IEEE-CS DOI may work.
HTML Version. 0009 BibRef

Yu, K., Jiang, X., Bunke, H.,
Combining Acoustic and Visual Classifiers for the Recognition of Spoken Sentences,
ICPR00(Vol II: 491-494).
IEEE DOI may work or IEEE-CS DOI may work.
HTML Version. 0009 BibRef

Nam, J., Alghoniemy, M., Tewfik, A.H.[Ahmed H.],
Audio-visual content-based violent scene characterization,
ICIP98(I: 353-357).
IEEE DOI may work or IEEE-CS DOI may work. 9810 BibRef

Luettin, J.[Juergen], Dupont, S.[Stéphane],
Continuous Audio-Visual Speech Recognition,
ECCV98(II: 657).
WWW Version. BibRef 9800

Kaucic, R., Dalton, B., Blake, A.,
Real-Time Lip Tracking for Audio-Visual Speech Recognition Applications,
ECCV96(II:376-387).
WWW Version. Snakes. BibRef 9600

Yang, J.[Jie], Xiao, J.[Jing], Ritter, M.[Max],
Automatic Selection of Visemes for Image-based Visual Speech Synthesis,
ICME00(TP8). 0007 BibRef

Faruquie, T.A.[Tanveer A.], Neti, C.[Chalapathy], Rajput, N.[Nitendra], Subramaniam, L.V.[L. Venkata], Verma, A.[Ashish],
Translingual Visual Speech Synthesis,
ICME00(TP8). 0007 BibRef

Sharma, R.[Rajeev], Cai, J.[Jiongyu], Chakravarthy, S.[Srivatsan], Poddar, I.[Indrajit], Sethi, Y.[Yogesh],
Exploiting Speech/Gesture Co-occurrence for Improving Continuous Gesture Recognition in Weather Narration,
AFGR00(422-427).
IEEE DOI may work or IEEE-CS DOI may work. 0003 BibRef

Yamamoto, E., Nakamura, S., Shikano, K.,
Lip Movement Synthesis from Speech Based on Hidden Markov Models,
AFGR98(154-159).
IEEE DOI may work or IEEE-CS DOI may work. BibRef 9800

Roy, D., Pentland, A.P.,
Automatic spoken affect classification and analysis,
AFGR96(363-367).
IEEE DOI may work or IEEE-CS DOI may work. 9610 BibRef

Petajan, E.D.[Eric D.],
An Architecture for Automatic Lipreading to Enhance Speech Recognition,
CVPR85(40-47). (AT&T Bell Labs) Application, Lipreading. A real hardware implementation of a system that tracks the nostrils and mouth. Improvement over use of acoustic data alone. BibRef 8500

Chapter on Face Recognition, Detection, Tracking, Gesture Recognition, Fingerprints, Biometrics continues in
Mouth Location, Lip Location, Detection.


Last update: Oct 1, 2008 at 09:28:47