21.3.4.1 Combined Audio Visual Recognition and Analysis

Real Time Vision. Application, Lipreading. Speech. Audiovisual Speech. Lip Reading.

Wu, J.X.[Jian-Xiong], Chan, C.[Chorkin],
Recognition of phonetic labels of the TIMIT speech corpus by means of an artificial neural network,
PR(24), No. 11, 1991, pp. 1085-1091.
WWW Link. 0401
BibRef

Wu, J.T.[Jian-Tong], Tamura, S.[Shinichi], Mitsumoto, H.[Hiroshi], Kawai, H.[Hideo], Kurosu, K.[Kenji], Okazaki, K.[Kozo],
Neural network vowel-recognition jointly using voice features and mouth shape image,
PR(24), No. 10, 1991, pp. 921-927.
WWW Link. 0401
BibRef

Lavagetto, F.,
Time-Delay Neural Networks for Estimating Lip Movements from Speech Analysis: A Useful Tool in Audio Video Synchronization,
CirSysVideo(7), No. 5, October 1997, pp. 786-800.
IEEE Top Reference. 9710
BibRef

Movellan, J.R., Mineiro, P.,
Robust Sensor Fusion: Analysis and Application to Audio-Visual Speech Recognition,
MachLearn(32), No. 2, August 1998, pp. 85-100. 9810
BibRef

Wachsmuth, S.[Sven], Socher, G.[Gudrun], Brandt-Pook, H.[Hans], Kummert, F.[Franz], Sagerer, G.F.[Gerhard F.],
Integration of Vision and Speech Understanding Using Bayesian Networks,
Videre(1), No. 4, Winter 2000, pp. xx-yy. 0005
BibRef
Earlier: A1, A3, A2, A4, A5:
Multilevel Integration of Vision and Speech Understanding Using Bayesian Networks,
CVS99(231 ff.).
Springer DOI 0209
BibRef

Chien, J.T., Lin, M.S.,
Frame-synchronous noise compensation for hands-free speech recognition in car environments,
VISP(147), No. 6, December 2000, pp. 508-515. 0101
BibRef

Patel, D., Turner, L.F.,
Effects of ATM network impairments on audio-visual broadcast applications,
VISP(147), No. 5, October 2000, pp. 436-444. 0101
BibRef

Aleksic, P.S.[Petar S.], Williams, J.J.[Jay J.], Wu, Z.L.[Zhi-Lin], Katsaggelos, A.K.[Aggelos K.],
Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features,
JASP(2002), No. 11, November 2002, pp. 1213.
WWW Link. 0304
BibRef
Earlier:
Audio-visual continuous speech recognition using MPEG-4 compliant visual features,
ICIP02(I: 960-963).
IEEE DOI 0210
BibRef

Aleksic, P.S.[Petar S.], Katsaggelos, A.K.[Aggelos K.],
Audio-Visual Biometrics,
PIEEE(94), No. 11, November 2006, pp. 2025-2044.
IEEE DOI 0611
BibRef

Aleksic, P.S.[Petar S.], Katsaggelos, A.K.[Aggelos K.],
Speech-to-video synthesis using MPEG-4 compliant visual features,
CirSysVideo(14), No. 5, May 2004, pp. 682-692.
IEEE Abstract. 0407
BibRef
Earlier:
Comparison of MPEG-4 Facial Animation Parameter Groups with Respect to Audio-Visual Speech Recognition Performance,
ICIP05(III: 501-504).
IEEE DOI 0512
BibRef

Jiang, J.T.[Jin-Tao], Alwan, A.[Abeer], Keating, P.A.[Patricia A.], Auer Jr., E.T.[Edward T.], Bernstein, L.E.[Lynne E.],
On the Relationship between Face Movements, Tongue Movements, and Speech Acoustics,
JASP(2002), No. 11, November 2002, pp. 1174.
WWW Link. 0304
BibRef

Sodoyer, D.[David], Schwartz, J.L.[Jean-Luc], Girin, L.[Laurent], Klinkisch, J.[Jacob], Jutten, C.[Christian],
Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli,
JASP(2002), No. 11, November 2002, pp. 1165.
WWW Link. 0304
BibRef

Zotkin, D.N.[Dmitry N.], Duraiswami, R.[Ramani], Davis, L.S.[Larry S.],
Joint Audio-Visual Tracking Using Particle Filters,
JASP(2002), No. 11, November 2002, pp. 1154.
WWW Link. 0304
BibRef

Heckmann, M.[Martin], Berthommier, F.[Frédéric], Kroschel, K.[Kristian],
Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition,
JASP(2002), No. 11, November 2002, pp. 1260.
WWW Link. 0304
BibRef

Nefian, A.V.[Ara V.], Liang, L.H.[Lu-Hong], Pi, X.B.[Xiao-Bo], Liu, X.X.[Xiao-Xing], Murphy, K.P.[Kevin P.],
Dynamic Bayesian Networks for Audio-Visual Speech Recognition,
JASP(2002), No. 11, November 2002, pp. 1274.
WWW Link. 0304
BibRef

Nefian, A.V.[Ara V.], Liang, L.H.[Lu Hong], Fu, T.Y.[Tie-Yan], Liu, X.X.[Xiao Xing],
A Bayesian Approach to Audio-Visual Speaker Identification,
AVBPA03(761-769).
Springer DOI 0310
BibRef

Patterson, E.K.[Eric K.], Gurbuz, S.[Sabri], Tufekci, Z.[Zekeriya], Gowdy, J.N.[John N.],
Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus,
JASP(2002), No. 11, November 2002, pp. 1189.
WWW Link. 0304
BibRef

Gurbuz, S.[Sabri], Patterson, E.K.[Eric K.], Tufekci, Z.[Zekeriya], Gowdy, J.N.[John N.],
Affine-Invariant Visual Features Contain Supplementary Information to Enhance Speech Recognition,
AVBPA01(175).
Springer DOI 0310
BibRef

Garg, A.[Ashutosh], Pavlovic, V.[Vladimir], Rehg, J.M.[James M.],
Boosted learning in dynamic Bayesian networks for multimodal speaker detection,
PIEEE(91), No. 9, September 2003, pp. 1355-1369.
IEEE DOI 0309
BibRef
Earlier:
Audio-visual speaker detection using dynamic Bayesian networks,
AFGR00(384-390).
IEEE DOI 0003
BibRef

Pavlovic, V.[Vladimir], Garg, A.[Ashutosh], Rehg, J.M.[James M.], Huang, T.S.[Thomas S.],
Multimodal Speaker Detection using Error Feedback Dynamic Bayesian Networks,
CVPR00(II: 34-41).
IEEE DOI 0005
BibRef

Pavlovic, V., Berry, G., Huang, T.S.,
Integration of Audio/Visual Information for Use in Human-Computer Intelligent Interaction,
ICIP97(I: 121-124).
IEEE DOI BibRef 9700

Choudhury, T.[Tanzeem], Rehg, J.M., Pavlovic, V., Pentland, A.P.,
Boosting and structure learning in dynamic Bayesian networks for audio-visual speaker detection,
ICPR02(III: 789-794).
IEEE DOI 0211
BibRef

Pavlovic, V.[Vladimir],
Multimodal tracking and classification of audio-visual features,
ICIP98(I: 343-347).
IEEE DOI 9810
BibRef

Rehg, J.M.[James M.], Murphy, K.P.[Kevin P.], Fieguth, P.W.[Paul W.],
Vision-Based Speaker Detection Using Bayesian Networks,
CVPR99(II: 110-116).
IEEE DOI More particularly, the one talking. BibRef 9900

Kalberer, G.A.[Gregor A.], Müller, P.[Pascal], Van Gool, L.J.[Luc J.],
Visual speech, a trajectory in viseme space,
IJIST(13), No. 1, 2003, pp. 74-84.
DOI Link 0308
BibRef

Sharma, R., Yeasin, M., Krahnstoever, N., Rauschert, I., Cai, G., Brewer, I., MacEachren, A.M., Sengupta, K.,
Speech-gesture driven multimodal interfaces for crisis management,
PIEEE(91), No. 9, September 2003, pp. 1327-1354.
IEEE DOI 0309
BibRef

Potamianos, G., Neti, C., Gravier, G., Garg, A., Senior, A.W.,
Recent advances in the automatic recognition of audiovisual speech,
PIEEE(91), No. 9, September 2003, pp. 1306-1326.
IEEE DOI 0309
BibRef

Kaynak, M.N., Zhi, Q., Cheok, A.D., Sengupta, K., Jian, Z., Chung, K.C.,
Analysis of Lip Geometric Features for Audio-Visual Speech Recognition,
SMC-A(34), No. 4, July 2004, pp. 564-570.
IEEE Abstract. 0407
BibRef

Foo, S.W.[Say Wei], Lian, Y.[Yong], Dong, L.[Liang],
Recognition of visual speech elements using adaptively boosted hidden Markov models,
CirSysVideo(14), No. 5, May 2004, pp. 693-705.
IEEE Abstract. 0407
BibRef

Albiol, A.[Alberto], Torres, L.[Luis], Delp, E.J.[Edward J.],
Fully automatic face recognition system using a combined audio-visual approach,
VISP(152), No. 3, June 2005, pp. 318-326.
DOI Link 0510
BibRef
Earlier:
A Fast Anchor Person Searching Scheme in News Sequences,
AVBPA01(366).
Springer DOI 0310
BibRef
And:
An Unsupervised Color Image Segmentation Algorithm for Face Detection Applications,
ICIP01(II: 681-684).
IEEE DOI 0108
BibRef
Earlier:
Optimum Color Spaces for Skin Detection,
ICIP01(I: 122-124).
IEEE DOI 0108
BibRef

Kleindienst, J.[Jan], Macek, T.[Tomáš], Serédi, L.[Ladislav], Šedivý, J.[Jan],
Interaction framework for home environment using speech and vision,
IVC(25), No. 12, 3 December 2007, pp. 1836-1847.
WWW Link. 0710
BibRef
Earlier:
Djinn: Interaction Framework for Home Environment Using Speech and Vision,
CVHCI04(153-164).
Springer DOI 0505
Multi-modal; Computer-vision; Context-aware; Speech recognition BibRef

Palanivel, S., Yegnanarayana, B.,
Multimodal person authentication using speech, face and visual speech,
CVIU(109), No. 1, January 2008, pp. 44-55.
WWW Link. 0801
Multimodal person authentication; Face tracking; Eye location; Visual speech; Multiscale morphological dilation and erosion; Autoassociative neural network BibRef

Talantzis, F., Pnevmatikakis, A., Constantinides, A.G.,
Audio-Visual Active Speaker Tracking in Cluttered Indoors Environments,
SMC-B(39), No. 1, February 2009, pp. 7-15.
IEEE DOI 0902
BibRef
Earlier: SMC-B(38), No. 3, June 2008, pp. 799-807.
IEEE DOI 0711
The first listing is the special issue; the paper was published early in the other issue. BibRef

Chetty, G.[Girija], Wagner, M.[Michael],
Robust face-voice based speaker identity verification using multilevel fusion,
IVC(26), No. 9, 1 September 2008, pp. 1249-1260.
WWW Link. 0806
BibRef
Earlier:
Audio Visual Speaker Verification Based on Hybrid Fusion of Cross Modal Features,
PReMI07(469-478).
Springer DOI 0712
BibRef
Earlier:
Face-Voice Authentication Based on 3D Face Models,
ACCV06(I:559-568).
Springer DOI 0601
Lip; 3D Face; Voice; Biometric; Identity verification; Robust; Multilevel fusion BibRef

Delakis, M.[Manolis], Gravier, G.[Guillaume], Gros, P.[Patrick],
Audiovisual integration with Segment Models for tennis video parsing,
CVIU(111), No. 2, August 2008, pp. 142-154.
WWW Link. 0808
Hidden Markov Models; Segment Models; Multimodal fusion; Video indexing; Video summarization BibRef

Gravier, G.[Guillaume], Guinaudeau, C.[Camille], Lecorvé, G.[Gwénolé], Sébillot, P.[Pascale],
Exploiting Speech for Automatic TV Delinearization: From Streams to Cross-Media Semantic Navigation,
JIVP(2011), No. 2011, pp. xx-yy.
DOI Link 1104
BibRef

Vajaria, H.[Himanshu], Sankar, R.[Ravi], Kasturi, R.[Ranga],
Exploring Co-Occurence Between Speech and Body Movement for Audio-Guided Video Localization,
CirSysVideo(18), No. 11, November 2008, pp. 1608-1617.
IEEE DOI 0811
BibRef

Vajaria, H.[Himanshu], Islam, T.[Tanmoy], Sarkar, S.[Sudeep], Sankar, R.[Ravi], Kasturi, R.[Ranga],
Audio Segmentation and Speaker Localization in Meeting Videos,
ICPR06(II: 1150-1153).
IEEE DOI 0609
BibRef

Hospedales, T.M.[Timothy M.], Vijayakumar, S.[Sethu],
Structure Inference for Bayesian Multisensory Scene Understanding,
PAMI(30), No. 12, December 2008, pp. 2140-2157.
IEEE DOI 0811
Audio-visual inputs. Speakers in meetings. BibRef

Liu, Z.C.[Zi-Cheng], Cohen, M., Bhatnagar, D., Cutler, R., Zhang, Z.Y.[Zheng-You],
Head-Size Equalization for Improved Visual Perception in Video Conferencing,
MultMed(9), No. 7, November 2007, pp. 1520-1527.
IEEE DOI 0905
BibRef

Liu, Z.C.[Zi-Cheng], Cutler, R.[Ross], Cohen, M.[Michael], Zhang, Z.Y.[Zheng-You],
System and method for head size equalization in 360 degree panoramic images,
US_Patent7,184,609, Feb 27, 2007
WWW Link. BibRef 0702

Cutler, R.[Ross],
User interface for a system and method for head size equalization in 360 degree panoramic images,
US_Patent7,149,367, Dec 12, 2006
WWW Link. BibRef 0612

Cutler, R.[Ross], Kapoor, A.[Ashish],
System and method for audio/video speaker detection,
US_Patent7,343,289, Mar 11, 2008
WWW Link. BibRef 0803

Heracleous, P., Aboutabit, N., Beautemps, D.,
Lip Shape and Hand Position Fusion for Automatic Vowel Recognition in Cued Speech for French,
SPLetters(16), No. 5, May 2009, pp. 339-342.
IEEE DOI 0903
BibRef

Zhang, C.[Cha], Yin, P.[Pei], Rui, Y.[Yong], Cutler, R., Viola, P., Sun, X.D.[Xin-Ding], Pinto, N., Zhang, Z.Y.[Zheng-You],
Boosting-Based Multimodal Speaker Detection for Distributed Meeting Videos,
MultMed(10), No. 8, December 2008, pp. 1541-1552.
IEEE DOI 0905
BibRef

Lee, J.S.[Jong-Seok], Park, C.H.[Cheol Hoon],
Robust Audio-Visual Speech Recognition Based on Late Integration,
MultMed(10), No. 5, August 2008, pp. 767-779.
IEEE DOI 0905
BibRef

Saenko, K.[Kate], Livescu, K.[Karen], Glass, J.[James], Darrell, T.J.[Trevor J.],
Multistream Articulatory Feature-Based Models for Visual Speech Recognition,
PAMI(31), No. 9, September 2009, pp. 1700-1707.
IEEE DOI 0907
Lip opening, lip rounding features. BibRef

Saenko, K.[Kate], Livescu, K.[Karen], Siracusa, M.[Michael], Wilson, K.[Kevin], Glass, J.[James], Darrell, T.J.[Trevor J.],
Visual Speech Recognition with Loosely Synchronized Feature Streams,
ICCV05(II: 1424-1431).
IEEE DOI 0510
BibRef

Schuller, B.[Bjorn], Muller, R.[Ronald], Eyben, F.[Florian], Gast, J.[Jurgen], Hornler, B.[Benedikt], Wollmer, M.[Martin], Rigoll, G.[Gerhard], Hothker, A.[Anja], Konosu, H.[Hitoshi],
Being bored? Recognising natural interest by extensive audiovisual integration for real-life application,
IVC(27), No. 12, November 2009, pp. 1760-1774.
Elsevier DOI 0910
Interest recognition; Affective computing; Audiovisual processing BibRef

Eyben, F.[Florian], Wollmer, M.[Martin], Valstar, M.F.[Michel F.], Gunes, H.[Hatice], Schuller, B.[Bjorn], Pantic, M.[Maja],
String-based audiovisual fusion of behavioural events for the assessment of dimensional affect,
FG11(322-329).
IEEE DOI 1103
BibRef

Althoff, F.[Frank], McGlaun, G.[Gregor], Lang, M.K.[Manfred K.], Rigoll, G.[Gerhard],
Evaluating Multimodal Interaction Patterns in Various Application Scenarios,
GW03(421-435).
Springer DOI 0405
BibRef

Casanovas, A.L.[Anna Llagostera], Monaci, G.[Gianluca], Vandergheynst, P.[Pierre], Gribonval, R.,
Blind Audiovisual Source Separation Based on Sparse Redundant Representations,
MultMed(12), No. 5, 2010, pp. 358-371.
IEEE DOI 1008
BibRef
Earlier: A1, A2, A3, Only:
Blind Audiovisual Source Separation using Sparse Representations,
ICIP07(III: 301-304).
IEEE DOI 0709
BibRef

Esch, J.,
Audiovisual Information Fusion in Human-Computer Interfaces and Intelligent Environments: A Survey,
PIEEE(98), No. 10, October 2010, pp. 1690-1691.
IEEE DOI 1003
Article intro. BibRef

Shivappa, S.T., Trivedi, M.M., Rao, B.D.,
Audiovisual Information Fusion in Human-Computer Interfaces and Intelligent Environments: A Survey,
PIEEE(98), No. 10, October 2010, pp. 1692-1715.
IEEE DOI 1003
Survey, Audio-Visual Fusion. BibRef

Claussen, H.[Heiko], Rosca, J.[Justinian], Damper, R.I.[Robert I.],
Signature extraction using mutual interdependencies,
PR(44), No. 3, March 2011, pp. 650-661.
Elsevier DOI 1011
Algorithms; Signal processing; Pattern classification; Signal analysis; Speaker recognition; Face recognition. Mutual interdependence analysis for extracting face signatures or speech signatures. BibRef

Higgins, J.E., Damper, R.I.,
An HMM-Based Subband Processing Approach to Speaker Identification,
AVBPA01(169).
Springer DOI 0310
BibRef

El-Sallam, A.A.[Amar A.], Mian, A.S.[Ajmal S.],
Correlation based speech-video synchronization,
PRL(32), No. 6, 15 April 2011, pp. 780-786.
Elsevier DOI 1103
BibRef
Earlier:
Speech-Video Synchronization Using Lips Movements and Speech Envelope Correlation,
ICIAR09(397-407).
Springer DOI 0907
Correlation; Lip sync; Formants; Estimation; AM,FM BibRef

Petridis, S.[Stavros], Pantic, M.[Maja],
Audiovisual Discrimination Between Speech and Laughter: Why and When Visual Information Might Help,
MultMed(13), No. 2, 2011, pp. 216-234.
IEEE DOI 1103
BibRef

Petridis, S.[Stavros], Pantic, M.[Maja],
Prediction-Based Audiovisual Fusion for Classification of Non-Linguistic Vocalisations,
AffCom(7), No. 1, January 2016, pp. 45-58.
IEEE DOI 1603
BibRef
Earlier:
Fusion of audio and visual cues for laughter detection,
CIVR08(329-338). 0807
Brain models BibRef

Petridis, S.[Stavros], Pantic, M.[Maja], Cohn, J.F.[Jeffrey F.],
Prediction-based classification for audiovisual discrimination between laughter and speech,
FG11(619-626).
IEEE DOI 1103
BibRef

Moustakas, K.[Konstantinos], Tzovaras, D.[Dimitrios], Dybkjaer, L.[Laila], Bernsen, N.[Niels], Aran, O.[Oya],
Using Modality Replacement to Facilitate Communication between Visually and Hearing-Impaired People,
MultMedMag(18), No. 2, April-June 2011, pp. 26-37.
IEEE DOI 1105
BibRef

Tariquzzaman, M., Kim, J.Y.[Jin Young], Na, S.Y.[Seung You], Kim, H.G.[Hyoung-Gook], Har, D.S.[Dong-Soo],
A Visual Signal Reliability for Robust Audio-Visual Speaker Identification,
IEICE(E94-D), No. 10, October 2011, pp. 2052-2055.
WWW Link. 1110
BibRef

Lee, J.S.[Jong-Seok], De Simone, F.[Francesca], Ebrahimi, T.[Touradj],
Efficient video coding based on audio-visual focus of attention,
JVCIR(22), No. 8, November 2011, pp. 704-711.
Elsevier DOI 1110
Video coding; Audio-visual focus of attention; Quality of experience; Audio-visual source localization; H.264/AVC; Flexible macroblock ordering (FMO); Canonical correlation analysis; Subjective quality assessment BibRef

Tiawongsombat, P., Jeong, M.H.[Mun-Ho], Yun, J.S.[Joo-Seop], You, B.J.[Bum-Jae], Oh, S.R.[Sang-Rok],
Robust visual speakingness detection using bi-level HMM,
PR(45), No. 2, February 2012, pp. 783-793.
Elsevier DOI 1110
Visual voice activity detection; Mouth image energy; Speakingness detection; Bi-level HMM BibRef

Noulas, A.[Athanasios], Englebienne, G.[Gwenn], Krose, B.J.A.[Ben J.A.],
Multimodal Speaker Diarization,
PAMI(34), No. 1, January 2012, pp. 79-93.
IEEE DOI 1112
Fuse audio and video. Meetings, news video. BibRef

Blauth, D.A.[Dante A.], Minotto, V.P.[Vicente P.], Jung, C.R.[Claudio R.], Lee, B.[Bowon], Kalker, T.[Ton],
Voice activity detection and speaker localization using audiovisual cues,
PRL(33), No. 4, March 2012, pp. 373-380.
Elsevier DOI 1201
User interfaces; Voice activity detection; Speaker localization; Multimodal analysis; Hidden Markov Models BibRef

Montazzolli, S., Jung, C.R., Gelb, D.[Dan],
Audiovisual voice activity detection using off-the-shelf cameras,
ICIP15(3886-3890)
IEEE DOI 1512
Lip Movement BibRef

Minotto, V.P.[Vicente P.], Jung, C.R.[Claudio R.], Lee, B.[Bowon],
Simultaneous-Speaker Voice Activity Detection and Localization Using Mid-Fusion of SVM and HMMs,
MultMed(16), No. 4, June 2014, pp. 1032-1044.
IEEE DOI 1407
Accuracy BibRef

Minotto, V.P.[Vicente P.], Jung, C.R.[Claudio R.], Lee, B.[Bowon],
Multimodal Multi-Channel On-Line Speaker Diarization Using Sensor Fusion Through SVM,
MultMed(17), No. 10, October 2015, pp. 1694-1705.
IEEE DOI 1511
audio streaming BibRef

Nicolaou, M.A.[Mihalis A.], Gunes, H.[Hatice], Pantic, M.[Maja],
Output-associative RVM regression for dimensional and continuous emotion prediction,
IVC(30), No. 3, March 2012, pp. 186-196.
Elsevier DOI 1204
BibRef
And: FG11(16-23).
IEEE DOI 1103
BibRef
And:
Designing frameworks for automatic affect prediction and classification in dimensional space,
Gesture11(20-26).
IEEE DOI 1106
Dimensional and continuous emotion prediction; Facial expressions; Shoulder movements; Audio cues; Output-associative RVM regression BibRef

Nicolaou, M.A.[Mihalis A.], Gunes, H.[Hatice], Pantic, M.[Maja],
Continuous Prediction of Spontaneous Affect from Multiple Cues and Modalities in Valence-Arousal Space,
AffCom(2), No. 2, 2011, pp. 92-105.
IEEE DOI 1202
BibRef
Earlier:
Audio-Visual Classification and Fusion of Spontaneous Affective Data in Likelihood Space,
ICPR10(3695-3699).
IEEE DOI 1008
BibRef

Nicolaou, M.A.[Mihalis A.], Pavlovic, V.[Vladimir], Pantic, M.[Maja],
Dynamic Probabilistic CCA for Analysis of Affective Behavior and Fusion of Continuous Annotations,
PAMI(36), No. 7, July 2014, pp. 1299-1311.
IEEE DOI 1407
BibRef
Earlier:
Dynamic Probabilistic CCA for Analysis of Affective Behaviour,
ECCV12(VII: 98-111).
Springer DOI 1210
BibRef

Wang, L.J.[Li-Juan], Qian, Y.[Yao], Scott, M.R., Chen, G.[Gang], Soong, F.K.,
Computer-Assisted Audiovisual Language Learning,
Computer(45), No. 6, June 2012, pp. 38-47.
IEEE DOI 1208
BibRef

Wu, Q.X.[Qiu-Xia], Wang, Z.Y.[Zhi-Yong], Deng, F.Q.[Fei-Qi], Chi, Z., Feng, D.D.[David Dagan],
Realistic Human Action Recognition with Multimodal Feature Selection and Fusion,
SMCS(43), No. 4, 2013, pp. 875-885.
IEEE DOI 1307
multimodal fusion; realistic human action recognition BibRef

Wu, Q.X.[Qiu-Xia], Wang, Z.Y.[Zhi-Yong], Deng, F.Q.[Fei-Qi], Xia, Y.[Yong], Kang, W.X.[Wen-Xiong], Feng, D.D.[David Dagan],
Discriminative two-level feature selection for realistic human action recognition,
JVCIR(24), No. 7, 2013, pp. 1064-1074.
Elsevier DOI 1309
Realistic human action recognition BibRef

Wu, Q.X.[Qiu-Xia], Wang, Z.Y.[Zhi-Yong], Deng, F.Q.[Fei-Qi], Feng, D.D.[David Dagan],
Realistic Human Action Recognition with Audio Context,
DICTA10(288-293).
IEEE DOI 1012
BibRef

Wu, Q.X.[Qiu-Xia], Lu, S.Y.[Shi-Yang], Wang, Z.Y.[Zhi-Yong], Deng, F.Q.[Fei-Qi], Kang, W.X.[Wen-Xiong], Feng, D.D.[David Dagan],
Structure Context of Local Features in Realistic Human Action Recognition,
VECTaR11(1496-1501).
IEEE DOI 1201
BibRef

Mirzaei, M.R.[Mohammad Reza], Ghorshi, S.[Seyed], Mortazavi, M.[Mohammad],
Audio-visual speech recognition techniques in augmented reality environments,
VC(30), No. 3, March 2014, pp. 245-257.
WWW Link. 1403
BibRef

Bredin, H.[Hervé], Roy, A.[Anindya], Le, V.B.[Viet-Bac], Barras, C.[Claude],
Person instance graphs for mono-, cross- and multi-modal person recognition in multimedia data: application to speaker identification in TV broadcast,
MultInfoRetr(3), No. 3, September 2014, pp. 161-175.
Springer DOI 1408
BibRef

Ozasa, Y.[Yuko], Nakano, M.[Mikio], Ariki, Y.[Yasuo], Iwahashi, N.[Naoto],
Discriminating Unknown Objects from Known Objects Using Image and Speech Information,
IEICE(E98-D), No. 3, March 2015, pp. 704-711.
WWW Link. 1504
BibRef
Earlier: A1, A3, A2, A4:
Disambiguation in Unknown Object Detection by Integrating Image and Speech Recognition Confidences,
ACCV12(I:85-96).
Springer DOI 1304
BibRef

Nishimura, H.[Hitoshi], Ozasa, Y.[Yuko], Ariki, Y.[Yasuo], Nakano, M.[Mikio],
Selection of Unknown Objects Specified by Speech Using Models Constructed from Web Images,
ICPR14(477-482)
IEEE DOI 1412
BibRef
Earlier:
Object Recognition by Integrated Information Using Web Images,
ACPR13(657-661)
IEEE DOI 1408
Accuracy. acoustic signal processing BibRef

Ozasa, Y.[Yuko], Enami, N., Ariki, Y.[Yasuo],
Color saliency for object identification,
FCV15(1-5)
IEEE DOI 1506
image colour analysis BibRef

Harte, N., Gillen, E.,
TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech,
MultMed(17), No. 5, May 2015, pp. 603-615.
IEEE DOI 1505
Cameras BibRef

Katsaggelos, A.K., Bahaadini, S., Molina, R.,
Audiovisual Fusion: Challenges and New Approaches,
PIEEE(103), No. 9, September 2015, pp. 1635-1653.
IEEE DOI 1509
Data integration BibRef

Mezai, L., Hachouf, F.,
Score-Level Fusion of Face and Voice Using Particle Swarm Optimization and Belief Functions,
HMS(45), No. 6, December 2015, pp. 761-772.
IEEE DOI 1512
Bayes methods BibRef

Wu, P., Liu, H., Li, X., Fan, T., Zhang, X.,
A Novel Lip Descriptor for Audio-Visual Keyword Spotting Based on Adaptive Decision Fusion,
MultMed(18), No. 3, March 2016, pp. 326-338.
IEEE DOI 1603
Acoustics BibRef

Dilpazir, H.[Hammad], Muhammad, Z.[Zia], Minhas, Q.[Qurratulain], Ahmed, F.[Faheem], Malik, H.[Hafiz], Mahmood, H.[Hasan],
Multivariate mutual information for audio video fusion,
SIViP(10), No. 7, October 2016, pp. 1265-1272.
Springer DOI 1609
BibRef


Le, N.[Nam], Heili, A.[Alexandre], Wu, D.[Di], Odobez, J.M.[Jean-Marc],
Temporally subsampled detection for accurate and efficient face tracking and diarization,
ICPR16(1792-1797)
IEEE DOI 1705
Detectors, Face, Face detection, Image color analysis, Motion pictures, TV, Tracking BibRef

Ahn, J.[Juhyun], Kim, Y.J.[Yong-Joong], Kim, D.J.[Dai-Jin],
Patch-based visual microphone for improving quality of sound,
ICPR16(3927-3932)
IEEE DOI 1705
Cameras, Microphones, Noise level, Signal to noise ratio, Speech, Vibrations, Visualization BibRef

Chung, J.S.[Joon Son], Zisserman, A.[Andrew],
Out of Time: Automated Lip Sync in the Wild,
LipRead16(II: 251-263).
Springer DOI 1704
BibRef

Miao, C.L.[Chang-Long], Feng, J.W.[Jian-Wei], Ding, Y.[Yu], Yang, Y.[Yu], Chen, X.G.[Xiao-Gang], Ji, X.Y.[Xiang-Yang],
Unsupervised person clustering in videos with cross-modal communication,
VCIP16(1-4)
IEEE DOI 1701
Feature extraction. Audio-visual. BibRef

Hu, D.[Di], Li, X.L.[Xue-Long], Lu, X.Q.[Xiao-Qiang],
Temporal Multimodal Learning in Audiovisual Speech Recognition,
CVPR16(3574-3582)
IEEE DOI 1612
BibRef

Liu, H.[Hong], Fan, T.[Ting], Wu, P.P.[Ping-Ping],
Audio-visual Keyword Spotting for Mandarin Based on Discriminative Local Spatial-Temporal Descriptors,
ICPR14(785-790)
IEEE DOI 1412
Acoustics BibRef

Ringeval, F., Sonderegger, A., Sauer, J., Lalanne, D.,
Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions,
FG13(1-8)
IEEE DOI 1309
natural languages. Collaborative and affective interactions in French. BibRef

Aubrey, A.J.[Andrew J.], Cunningham, D.W.[Douglas W.], Marshall, D.[David], Rosin, P.L.[Paul L.], Shin, A.[Ah Young],
The Face Speaks: Contextual and Temporal Sensitivity to Backchannel Responses,
FaceCVHum12(II:248-259).
Springer DOI 1304
BibRef

Tawari, A.[Ashish], Trivedi, M.[Mohan],
Audio-visual data association for face expression analysis,
ICPR12(1120-1123).
WWW Link. 1302
BibRef

Taj, M.[Murtaza], Cavallaro, A.[Andrea],
Interaction recognition in wide areas using audiovisual sensors,
ICIP12(1113-1116).
IEEE DOI 1302
BibRef

Giorgolo, G.[Gianluca],
Integration of Gesture and Verbal Language: A Formal Semantics Approach,
GW11(216-227).
Springer DOI 1211
BibRef

Le, Q.A.[Quoc Anh], Pelachaud, C.[Catherine],
Generating Co-speech Gestures for the Humanoid Robot NAO through BML,
GW11(228-237).
Springer DOI 1211
BibRef

Saeed, A.[Anwar], Al-Hamadi, A.[Ayoub], Heuer, M.[Michael],
Speaker Tracking Using Multi-modal Fusion Framework,
ICISP12(539-546).
Springer DOI 1208
BibRef

Navarathna, R., Dean, D., Sridharan, S.[Sridha], Fookes, C.[Clinton], Lucey, P.,
Visual Voice Activity Detection Using Frontal versus Profile Views,
DICTA11(134-139).
IEEE DOI 1205
BibRef

Komai, Y.[Yuto], Ariki, Y.[Yasuo], Takiguchi, T.[Tetsuya],
Audio-Visual Speech Recognition Based on AAM Parameter and Phoneme Analysis of Visual Feature,
PSIVT11(I: 97-108).
Springer DOI 1111
BibRef

Zheng, H.M.[Hao-Main], Wang, M.[Meng], Li, Z.[Zhu],
Audio-visual speaker identification with multi-view distance metric learning,
ICIP10(4561-4564).
IEEE DOI 1009
BibRef

Krishnan, R.K.[Ravi-Kiran], Sarkar, S.[Sudeep],
Similarity Measure between Two Gestures Using Triplets,
HAU3D13(506-513)
IEEE DOI 1309
BibRef

Krishnan, R.K.[Ravi-Kiran], Sarkar, S.[Sudeep],
Detecting Group Turn Patterns in Conversations Using Audio-Video Change Scale-Space,
ICPR10(137-140).
IEEE DOI 1008
BibRef

Aran, O.[Oya], Gatica-Perez, D.[Daniel],
Fusing Audio-Visual Nonverbal Cues to Detect Dominant People in Group Conversations,
ICPR10(3687-3690).
IEEE DOI 1008
BibRef

Niese, R.[Robert], Al-Hamadi, A.[Ayoub], Michaelis, B.[Bernd],
A New Multi-camera Based Facial Expression Analysis Concept,
ICIAR12(II: 64-71).
Springer DOI 1206
BibRef

Steer, M.A.[Michael Alan], Al-Hamadi, A.[Ayoub], Michaelis, B.[Bernd],
Audio-Visual Data Fusion Using a Particle Filter in the Application of Face Recognition,
ICPR10(4392-4395).
IEEE DOI 1008
BibRef

Roy, A.[Anindya], Marcel, S.[Sebastien],
Crossmodal Matching of Speakers Using Lip and Voice Features in Temporally Non-overlapping Audio and Video Streams,
ICPR10(4504-4507).
IEEE DOI 1008
BibRef

Cour, T.[Timothee], Sapp, B.[Benjamin], Nagle, A.[Akash], Taskar, B.[Ben],
Talking pictures: Temporal grouping and dialog-supervised person recognition,
CVPR10(1014-1021).
IEEE DOI 1006
BibRef

Wu, G.Y.[Guan-Yong], Zhu, J.[Jie], Xu, H.H.[Hai-Hua],
A hybrid visual feature extraction method for audio-visual speech recognition,
ICIP09(1829-1832).
IEEE DOI 0911
BibRef

Ceballos, A.[Alexánder], Gómez, J.[Juan], Prieto, F.[Flavio], Redarce, T.[Tanneguy],
Robot Command Interface Using an Audio-Visual Speech Recognition System,
CIARP09(869-876).
Springer DOI 0911
BibRef

Cifani, S.[Simone], Abel, A.[Andrew], Hussain, A.[Amir], Squartini, S.[Stefano], Piazza, F.[Francesco],
An Investigation into Audiovisual Speech Correlation in Reverberant Noisy Environments,
COST08(331-343).
Springer DOI 0810
BibRef

Fanelli, G.[Gabriele], Gall, J.[Jürgen], Van Gool, L.J.[Luc J.],
Hough transform-based mouth localization for audio-visual speech recognition,
BMVC09(xx-yy).
PDF File. 0909
BibRef

Cadavid, S.[Steven], Abdel-Mottaleb, M.[Mohamed], Messinger, D.S.[Daniel S.], Mahoor, M.H.[Mohammad H.], Bahrick, L.E.[Lorraine E.],
Detecting local audio-visual synchrony in monologues utilizing vocal pitch and facial landmark trajectories,
BMVC09(xx-yy).
PDF File. 0909
BibRef

Lee, J.S.[Jong-Seok], Ebrahimi, T.[Touradj],
Two-Level Bimodal Association for Audio-Visual Speech Recognition,
ACIVS09(133-144).
Springer DOI 0909
BibRef

Marchegiani, M.L.[Maria Letizia], Pirri, F.[Fiora], Pizzoli, M.[Matia],
Multimodal Speaker Recognition in a Conversation Scenario,
CVS09(11-20).
Springer DOI 0910
BibRef

Kumar, K.[Kshitiz], Navratil, J.[Jiri], Marcheret, E.[Etienne], Libal, V.[Vit], Ramaswamy, G.[Ganesh], Potamianos, G.[Gerasimos],
Audio-visual speech synchronization detection using a bimodal linear prediction model,
Biometrics09(53-59).
IEEE DOI 0906
BibRef

Karam, W.[Walid], Mokbel, C.[Chafic], Greige, H.[Hanna], Chollet, G.[Gérard],
Audio-Visual Identity Verification and Robustness to Imposture,
ICB09(796-805).
Springer DOI 0906
BibRef

Rebillat, M.[Marc], Katz, B.F.G.[Brian F.G.], Corteel, E.[Etienne],
SMART-I2: Spatial Multi-user Audio-visual Real-time interactive interface, A broadcast application context,
3DTV09(1-4).
IEEE DOI 0905
BibRef

Eisenstein, J.[Jacob],
Gesture in Automatic Discourse Processing,
CSAIL-2008-027, May 2008. BibRef 0805 Ph.D.Thesis, MIT, May 2008.
WWW Link. BibRef

Das, A.[Amitava], Manyam, O.K.[Ohil K.], Tapaswi, M.[Makarand],
Audio-Visual Person Authentication with Multiple Visualized-Speech Features and Multiple Face Profiles,
ICCVGIP08(39-46).
IEEE DOI 0812
BibRef

Cao, Y.[Yu], Baang, S.[Sung], Liu, S.H.[Shih-Hsi], Li, M.[Ming], Hu, S.Q.[San-Qing],
Audio-visual event classification via spatial-temporal-audio words,
ICPR08(1-5).
IEEE DOI 0812
BibRef

Terry, L.H.[Louis H.], Shiell, D.J.[Derek J.], Katsaggelos, A.K.[Aggelos K.],
Feature space video stream consistency estimation for dynamic stream weighting in audio-visual speech recognition,
ICIP08(1316-1319).
IEEE DOI 0810
BibRef

Naseem, I.[Imran], Mian, A.S.[Ajmal S.],
User Verification by Combining Speech and Face Biometrics in Video,
ISVC08(II: 482-492).
Springer DOI 0812
BibRef

Ettinger, E.[Evan], Freund, Y.[Yoav],
Coordinate-free calibration of an acoustically driven camera pointing system,
ICDSC08(1-9).
IEEE DOI 0809
BibRef

Hung, H.[Hayley], Friedland, G.[Gerald],
Towards Audio-Visual On-line Diarization Of Participants In Group Meetings,
M2SFA208(xx-yy). 0810
BibRef

Liu, Y.[Yuyu], Sato, Y.[Yoichi],
Finding Speaker Face Region by Audiovisual Correlation,
M2SFA208(xx-yy). 0810
BibRef

Kelly, D.[Damien], Pitie, F.[Francois], Kokaram, A.[Anil], Boland, F.[Frank],
A Comparative Error Analysis of Audio-Visual Source Localization,
M2SFA208(xx-yy). 0810
BibRef

Katsarakis, N.[Nikos], Talantzis, F.[Fotios], Pnevmatikakis, A.[Aristodemos], Polymenakos, L.[Lazaros],
The AIT 3D Audio / Visual Person Tracker for CLEAR 2007,
MTPH07(xx-yy).
Springer DOI 0705
See also The AIT 2D Face Detection and Tracking System for CLEAR 2007. See also The AIT Multimodal Person Identification System for CLEAR 2007. BibRef

Pachoud, S., Gong, S., Cavallaro, A.,
Video Augmentation for Improving Audio Speech Recognition under Noise,
BMVC08(xx-yy).
PDF File. 0809
BibRef

Horii, Y.[Yu], Kawashima, H.[Hiroaki], Matsuyama, T.[Takashi],
Speaker detection using the timing structure of lip motion and sound,
CVPR4HB08(1-8).
IEEE DOI 0806
BibRef

Rúa, E.A.[Enrique Argones], Castro, J.L.A.[José Luis Alba], Mateo, C.G.[Carmen García],
Quality-Based Score Normalization for Audiovisual Person Authentication,
ICIAR08(xx-yy).
Springer DOI 0806
BibRef

Wang, L.[Lei], Tjondronegoro, D.[Dian], Liu, Y.[Yuee],
Clustering and Visualizing Audio-Visual Dataset on Mobile Devices in a Topic-Oriented Manner,
Visual07(310-321).
Springer DOI 0706
BibRef

Zajdel, W., Krijnders, J.D., Andringa, T., Gavrila, D.M.,
CASSANDRA: audio-video sensor fusion for aggression detection,
AVSBS07(200-205).
IEEE DOI 0709
BibRef

Stødle, D.[Daniel], Bjørndalen, J.M.[John Markus], Anshus, O.J.[Otto J.],
A System for Hybrid Vision- and Sound-Based Interaction with Distal and Proximal Targets on Wall-Sized, High-Resolution Tiled Displays,
CVHCI07(59-68).
Springer DOI 0710
BibRef

van Hengel, P.W.J., Andringa, T.C.,
Verbal aggression detection in complex social environments,
AVSBS07(15-20).
IEEE DOI 0709
BibRef

Ikeda, O.[Osamu],
Detection of a Speaker in Video by Combined Analysis of Speech Sound and Mouth Movement,
ISVC07(II: 602-610).
Springer DOI 0711
BibRef

Das, A.[Amitava],
Audio Visual Person Authentication by Multiple Nearest Neighbor Classifiers,
ICB07(1114-1123).
Springer DOI 0708
BibRef

Xin, L.[Le], Tao, J.H.[Jian-Hua], Tan, T.N.[Tie-Niu],
Dynamic Audio-Visual Mapping using Fused Hidden Markov Model Inversion Method,
ICIP07(III: 293-296).
IEEE DOI 0709
BibRef

Barzelay, Z.[Zohar], Schechner, Y.Y.[Yoav Y.],
Harmony in Motion,
CVPR07(1-8).
IEEE DOI 0706
Audio-visual analysis. BibRef

O'Donovan, A.[Adam], Duraiswami, R.[Ramani], Neumann, J.[Jan],
Microphone Arrays as Generalized Cameras for Integrated Audio Visual Processing,
CVPR07(1-8).
IEEE DOI 0706
BibRef

Abbas, J.[Jehanzeb], Dagli, C.K.[Charlie K.], Huang, T.S.[Thomas S.],
A Multimodality Framework for Creating Speaker/Non-Speaker Profile Databases for Real-World Video,
SLAM07(1-8).
IEEE DOI 0706
BibRef

Kushal, A.[Akash], Rahurkar, M.[Mandar], Fei-Fei, L.[Li], Ponce, J.[Jean], Huang, T.[Thomas],
Audio-Visual Speaker Localization Using Graphical Models,
ICPR06(I: 291-294).
IEEE DOI 0609
BibRef

Tsuji, T.[Tokuo], Yamamoto, K.[Kenkichi], Ishii, I.[Idaku],
Real-time Sound Source Localization Based on Audiovisual Frequency Integration,
ICPR06(IV: 322-325).
IEEE DOI 0609
BibRef

Monaci, G.[Gianluca], Vandergheynst, P.[Pierre],
Audiovisual Gestalts,
PercOrg06(200).
IEEE DOI 0609
BibRef

Zhu, Z.G.[Zhi-Gang], Li, W.H.[Wei-Hong], Molina, E.[Edgardo], Wolberg, G.[George],
LDV Sensing and Processing for Remote Hearing in a Multimodal Surveillance System,
MSCSAS07(1-2).
IEEE DOI 0706
BibRef

Zhu, Z.G.[Zhi-Gang], Li, W.H.[Wei-Hong], Wolberg, G.,
Integrating LDV Audio and IR Video for Remote Multimodal Surveillance,
OTCBVS05(III: 10-10).
IEEE DOI 0507
BibRef

Wu, Z.Y.[Zhi-Yong], Cai, L.H.[Lian-Hong], Meng, H.[Helen],
Multi-level Fusion of Audio and Visual Features for Speaker Identification,
ICB06(493-499).
Springer DOI 0601
BibRef

Yang, P.[Pu], Yang, Y.C.[Ying-Chun], Wu, Z.H.[Zhao-Hui],
Exploiting Glottal Information in Speaker Recognition Using Parallel GMMs,
AVBPA05(804).
Springer DOI 0509
BibRef

Lei, Z.C.[Zhen-Chun],
Combining the Likelihood and the Kullback-Leibler Distance in Estimating the Universal Background Model for Speaker Verification Using SVM,
ICPR10(4553-4556).
IEEE DOI 1008
BibRef

Lei, Z.C.[Zhen-Chun], Yang, Y.C.[Ying-Chun], Wu, Z.H.[Zhao-Hui],
An UBM-Based Reference Space for Speaker Recognition,
ICPR06(IV: 318-321).
IEEE DOI 0609
BibRef
Earlier:
Constructing the Discriminative Kernels Using GMM for Text-Independent Speaker Identification,
IWBRS05(165).
Springer DOI 0601
BibRef
And:
Speaker Identification Using the VQ-Based Discriminative Kernels,
AVBPA05(797).
Springer DOI 0509
BibRef

Li, D.D.[Dong-Dong], Yang, Y.C.[Ying-Chun], Wu, Z.H.[Zhao-Hui],
Dynamic Bayesian Networks for Audio-Visual Speaker Recognition,
ICB06(539-545).
Springer DOI 0601
BibRef

Megherbi, N., Ambellouis, S., Colot, O., Cabestaing, F.,
Data Association in Multi-Target Tracking Using Belief Theory: Handling Target Emergence and Disappearance Issue,
AVSBS05(517-521).
IEEE DOI 0602
BibRef

Megherbi, N., Ambellouis, S., Colot, O., Cabestaing, F.,
Joint audio-video people tracking using belief theory,
AVSBS05(135-140).
IEEE DOI 0602
BibRef

Fox, N.A.[Niall A.], O'Mullane, B.A.[Brian A.], Reilly, R.B.[Richard B.],
VALID: A New Practical Audio-Visual Database, and Comparative Results,
AVBPA05(777).
Springer DOI
WWW Link. 0509
Dataset, Faces. BibRef

Sharma, P.[Prag], Reilly, R.B.[Richard B.],
The UCD Colour Face Image Database for Face Detection,
Online, 1998.
WWW Link. Dataset, Faces. BibRef 9800

Fox, N.A.[Niall A.], O'Mullane, B.A.[Brian A.], Reilly, R.B.[Richard B.],
Audio-Visual Speaker Identification via Adaptive Fusion Using Reliability Estimates of Both Modalities,
AVBPA05(787).
Springer DOI 0509
BibRef

Li, X.[Xin], Sun, L.[Luo], Tao, L.M.[Lin-Mi], Xu, G.Y.[Guang-You], Jia, Y.[Ying],
A Speaker Tracking Algorithm Based on Audio and Visual Information Fusion Using Particle Filter,
ICIAR04(II: 572-580).
Springer DOI 0409
BibRef

Zhang, D., Ghobakhlou, A., Kasabov, N.,
An adaptive model of person identification combining speech and image information,
ICARCV04(I: 413-418).
IEEE DOI 0412
BibRef

Kratt, J.[Jan], Metze, F.[Florian], Stiefelhagen, R.[Rainer], Waibel, A.[Alex],
Large Vocabulary Audio-Visual Speech Recognition Using the Janus Speech Recognition Toolkit,
DAGM04(488-495).
Springer DOI 0505
BibRef

Hanafiah, Z.M., Yamazaki, C., Nakamura, A., Kuno, Y.,
Understanding inexplicit utterances using vision for helper robots,
ICPR04(IV: 925-928).
IEEE DOI 0409
BibRef

Lange, C.[Christian], Hermann, T.[Thomas], Ritter, H.[Helge],
Holistic Body Tracking for Gestural Interfaces,
GW03(132-139).
Springer DOI 0405
BibRef

Hermann, T.[Thomas], Henning, T.[Thomas], Ritter, H.[Helge],
Gesture Desk: An Integrated Multi-modal Gestural Workplace for Sonification,
GW03(369-379).
Springer DOI 0405
BibRef

Merola, G.[Giorgio],
The Effects of the Gesture Viewpoint on the Students' Memory of Words and Stories,
GW07(272-281).
Springer DOI 0705
BibRef

Merola, G.[Giorgio], Poggi, I.[Isabella],
Multimodality and Gestures in the Teacher's Communication,
GW03(101-111).
Springer DOI 0405
BibRef

Kranstedt, A.[Alfred], Kühnlein, P.[Peter], Wachsmuth, I.[Ipke],
Deixis in Multimodal Human Computer Interaction: An Interdisciplinary Approach,
GW03(112-123).
Springer DOI 0405
BibRef

Saeed, K.[Khalid], Kozlowski, M.[Marcin],
An Image-Based System for Spoken-Letter Recognition,
CAIP03(494-502).
Springer DOI 0311
BibRef

Ho, P.[Purdy], Armington, J.[John],
A Dual-Factor Authentication System Featuring Speaker Verification and Token Technology,
AVBPA03(128-136).
Springer DOI 0310
BibRef

Fox, N.A.[Niall A.], Reilly, R.B.[Richard B.],
Audio-Visual Speaker Identification Based on the Use of Dynamic Audio and Visual Features,
AVBPA03(743-751).
Springer DOI 0310
BibRef

Czyz, J.[Jacek], Bengio, S.[Samy], Marcel, C.[Christine], Vandendorpe, L.[Luc],
Scalability Analysis of Audio-Visual Person Identity Verification,
AVBPA03(752-760).
Springer DOI 0310
BibRef

Bengio, S.[Samy],
Multimodal Authentication Using Asynchronous HMMs,
AVBPA03(770-777).
Springer DOI 0310
BibRef

Lucey, S.[Simon], Chen, T.H.[Tsu-Han],
Improved Audio-Visual Speaker Recognition via the Use of a Hybrid Combination Strategy,
AVBPA03(929-936).
Springer DOI 0310
BibRef

Krahnstoever, N., Schapira, E., Kettebekov, S., Sharma, R.,
Multimodal human-computer interaction for crisis management systems,
WACV02(203-207).
WWW Link. 0303
BibRef

Kettebekov, S., Yeasin, M., Sharma, R.,
Improving continuous gesture recognition with spoken prosody,
CVPR03(I: 565-570).
IEEE DOI 0307
BibRef

Poh, N.[Norman], Korczak, J.[Jerzy],
Hybrid Biometric Person Authentication Using Face and Voice Features,
AVBPA01(348).
Springer DOI 0310
BibRef

Nakamura, S.[Satoshi],
Fusion of Audio-Visual Information for Integrated Speech Processing,
AVBPA01(127).
Springer DOI 0310
BibRef

Sullivan, K.P.H.[Kirk P.H.], Pelecanos, J.[Jason],
Revisiting Carl Bildt's Impostor: Would a Speaker Verification System Foil Him?,
AVBPA01(144).
Springer DOI 0310
BibRef

Geiger, G.[Gadi], Ezzat, T.[Tony], Poggio, T.[Tomaso],
Perceptual Evaluation of Video-Realistic Speech,
MIT AIM-2003-003, February 28, 2003.
WWW Link. We describe here our scheme and its application to a new video-realistic (potentially indistinguishable from real recorded video) visual-speech animation system, called Mary 101. 0306
BibRef

Blake, A., Gangnet, M., Perez, P., Vermaak, J.,
Integrated tracking with vision and sound,
CIAP01(354-357).
WWW Link. 0210
BibRef

Zhang, X.Z.[Xiao-Zheng], Mersereau, R.M., Clements, M.,
Bimodal fusion in audio-visual speech recognition,
ICIP02(I: 964-967).
IEEE DOI 0210
BibRef

Graf, H.P., Cosatto, E., Strom, V., Huang, F.J.[Fu Jie],
Visual prosody: facial movements accompanying speech,
AFGR02(381-386).
IEEE DOI 0206
BibRef

Qi, Y.[Yuan],
Learning Algorithms for Audio and Video Processing: Independent Component Analysis and Support Vector Machine Based Approaches,
UMD--TR4174, August 2000.
WWW Link. BibRef 0008

Nankaku, Y., Tokuda, K.[Keiichi], Kitamura, T.[Tadashi],
Normalized Training for HMM-based Visual Speech Recognition,
ICIP00(Vol III: 234-237).
IEEE DOI 0008
BibRef

Zhang, Y.[You], Levinson, S.[Stephen], Huang, T.S.[Thomas S.],
Speaker Independent Audio-Visual Speech Recognition,
ICME00(TP8). 0007
BibRef

Pan, H.[Hao], Huang, T.S.[Thomas S.],
A New Approach to Integrate Audio and Visual Features of Speech,
ICME00(TP8). 0007
BibRef

Potamianos, G.[Gerasimos], Verma, A.[Ashish], Neti, C.[Chalapathy], Iyengar, G.[Giri], Basu, S.[Sankar],
A Cascade Image Transform for Speaker Independent Automatic Speech Reading,
ICME00(TP8). 0007
BibRef

Pan, H., Liang, Z.P., Huang, T.S.,
Fusing Audio and Visual Features of Speech,
ICIP00(Vol III: 214-217).
IEEE DOI 0008
BibRef

Faruquie, T.A., Majumdar, A., Rajput, N., Subramaniam, L.V.,
Large Vocabulary Audio-visual Speech Recognition Using Active Shape Models,
ICPR00(Vol III: 106-109).
IEEE DOI 0009
BibRef

Yu, K., Jiang, X., Bunke, H.,
Combining Acoustic and Visual Classifiers for the Recognition of Spoken Sentences,
ICPR00(Vol II: 491-494).
IEEE DOI 0009
BibRef

Nam, J., Alghoniemy, M., Tewfik, A.H.[Ahmed H.],
Audio-visual content-based violent scene characterization,
ICIP98(I: 353-357).
IEEE DOI 9810
BibRef

Luettin, J.[Juergen], Dupont, S.[Stéphane],
Continuous Audio-Visual Speech Recognition,
ECCV98(II: 657).
Springer DOI BibRef 9800

Yang, J.[Jie], Xiao, J.[Jing], Ritter, M.[Max],
Automatic Selection of Visemes for Image-based Visual Speech Synthesis,
ICME00(TP8). 0007
BibRef

Sharma, R.[Rajeev], Cai, J.Y.[Jiong-Yu], Chakravarthy, S.[Srivatsan], Poddar, I.[Indrajit], Sethi, Y.[Yogesh],
Exploiting Speech/Gesture Co-occurrence for Improving Continuous Gesture Recognition in Weather Narration,
AFGR00(422-427).
IEEE DOI 0003
BibRef

Yamamoto, E., Nakamura, S., Shikano, K.,
Lip Movement Synthesis from Speech Based on Hidden Markov Models,
AFGR98(154-159).
IEEE DOI BibRef 9800

Roy, D., Pentland, A.P.,
Automatic spoken affect classification and analysis,
AFGR96(363-367).
IEEE DOI 9610
BibRef

Petajan, E.D.[Eric D.],
An Architecture for Automatic Lipreading to Enhance Speech Recognition,
CVPR85(40-47). (AT&T Bell Labs) Application, Lipreading. A real hardware implementation of a system that tracks the nostrils and mouth. Improvement over use of acoustic data alone. BibRef 8500

Chapter on Face Recognition, Detection, Tracking, Gesture Recognition, Fingerprints, Biometrics continues in
Mouth Location, Lip Location, Detection.


Last update: Nov 11, 2017 at 13:31:57