22.3.4.1.1 Combined Audio Visual Speaker Tracking

Real Time Vision. Audio-Visual Speech. Audio-Visual Tracking. Speaker Tracking.

Zotkin, D.N.[Dmitry N.], Duraiswami, R.[Ramani], Davis, L.S.[Larry S.],
Joint Audio-Visual Tracking Using Particle Filters,
JASP(2002), No. 11, November 2002, pp. 1154.
WWW Link. 0304
BibRef

Garg, A.[Ashutosh], Pavlovic, V.[Vladimir], Rehg, J.M.[James M.],
Boosted learning in dynamic Bayesian networks for multimodal speaker detection,
PIEEE(91), No. 9, September 2003, pp. 1355-1369.
IEEE DOI 0309
BibRef
Earlier:
Audio-visual speaker detection using dynamic Bayesian networks,
AFGR00(384-390).
IEEE DOI 0003
BibRef

Pavlovic, V.[Vladimir], Garg, A.[Ashutosh], Rehg, J.M.[James M.], Huang, T.S.[Thomas S.],
Multimodal Speaker Detection using Error Feedback Dynamic Bayesian Networks,
CVPR00(II: 34-41).
IEEE DOI 0005
BibRef

Pavlovic, V., Berry, G., Huang, T.S.,
Integration of Audio/Visual Information for Use in Human-Computer Intelligent Interaction,
ICIP97(I: 121-124).
IEEE DOI 9700
BibRef

Choudhury, T.[Tanzeem], Rehg, J.M., Pavlovic, V., Pentland, A.P.,
Boosting and structure learning in dynamic Bayesian networks for audio-visual speaker detection,
ICPR02(III: 789-794).
IEEE DOI 0211
BibRef

Pavlovic, V.[Vladimir],
Multimodal tracking and classification of audio-visual features,
ICIP98(I: 343-347).
IEEE DOI 9810
BibRef

Rehg, J.M.[James M.], Murphy, K.P.[Kevin P.], Fieguth, P.W.[Paul W.],
Vision-Based Speaker Detection Using Bayesian Networks,
CVPR99(II: 110-116).
IEEE DOI 9900
More particularly, the person who is talking. BibRef

Talantzis, F., Pnevmatikakis, A., Constantinides, A.G.,
Audio-Visual Active Speaker Tracking in Cluttered Indoors Environments,
SMC-B(39), No. 1, February 2009, pp. 7-15.
IEEE DOI 0902
BibRef
Earlier: SMC-B(38), No. 3, June 2008, pp. 799-807.
IEEE DOI 0711
The top entry is the special-issue version; the paper appeared earlier in the other issue. BibRef

Qian, X., Brutti, A., Lanz, O., Omologo, M., Cavallaro, A.,
Multi-Speaker Tracking From an Audio-Visual Sensing Device,
MultMed(21), No. 10, October 2019, pp. 2576-2588.
IEEE DOI 1910
image colour analysis, object detection, object tracking, particle filtering (numerical methods), sensor fusion, particle filter BibRef

Ban, Y.T.[Yu-Tong], Alameda-Pineda, X.[Xavier], Girin, L.[Laurent], Horaud, R.[Radu],
Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers,
PAMI(43), No. 5, May 2021, pp. 1761-1776.
IEEE DOI 2104
BibRef
Earlier: A1, A3, A2, A4:
Exploiting the Complementarity of Audio and Visual Data in Multi-speaker Tracking,
CVAVM17(446-454).
IEEE DOI 1802
Visualization, Target tracking, Acoustics, Bayes methods, Cameras, Object tracking, Direction-of-arrival estimation, speaker diarization. Cameras, Detectors, Kalman filters, Microphones, Robots, Tracking, Visualization BibRef

Qian, X.Y.[Xin-Yuan], Brutti, A.[Alessio], Lanz, O.[Oswald], Omologo, M.[Maurizio], Cavallaro, A.[Andrea],
Audio-Visual Tracking of Concurrent Speakers,
MultMed(24), 2022, pp. 942-954.
IEEE DOI 2202
Target tracking, Acoustics, Faces, Cameras, Visualization, Image color analysis, 3D multiple target tracking, particle filter BibRef

Hu, D.[Di], Wei, Y.[Yake], Qian, R.[Rui], Lin, W.Y.[Wei-Yao], Song, R.H.[Rui-Hua], Wen, J.R.[Ji-Rong],
Class-Aware Sounding Objects Localization via Audiovisual Correspondence,
PAMI(44), No. 12, December 2022, pp. 9844-9859.
IEEE DOI 2212
Where did the sound come from? Location awareness, Visualization, Task analysis, Annotations, Semantics, Dictionaries, Videos, distribution alignment BibRef

Wang, H.[Hao], Zha, Z.J.[Zheng-Jun], Li, L.[Liang], Chen, X.J.[Xue-Jin], Luo, J.B.[Jie-Bo],
Semantic and Relation Modulation for Audio-Visual Event Localization,
PAMI(45), No. 6, June 2023, pp. 7711-7725.
IEEE DOI 2305
Visualization, Location awareness, Correlation, Proposals, Semantics, Task analysis, Modulation, Audio-visual learning, normalization BibRef

Garg, R.[Rishabh], Gao, R.H.[Ruo-Han], Grauman, K.[Kristen],
Visually-Guided Audio Spatialization in Video with Geometry-Aware Multi-task Learning,
IJCV(131), No. 10, October 2023, pp. 2723-2737.
Springer DOI 2309
BibRef

Wang, J.X.[Jia-Xiang], Li, C.L.[Cheng-Long], Zheng, A.[Aihua], Tang, J.[Jin], Luo, B.[Bin],
Looking and Hearing Into Details: Dual-Enhanced Siamese Adversarial Network for Audio-Visual Matching,
MultMed(25), 2023, pp. 7505-7516.
IEEE DOI 2311
BibRef

Liu, C.[Chen], Li, P.[Peike], Zhang, H.[Hu], Li, L.C.[Lin-Cheng], Huang, Z.[Zi], Wang, D.D.[Da-Dong], Yu, X.[Xin],
BAVS: Bootstrapping Audio-Visual Segmentation by Integrating Foundation Knowledge,
MultMed(26), 2024, pp. 10015-10028.
IEEE DOI 2410
Visualization, Semantics, Location awareness, Background noise, Task analysis, White noise, Transformers, and audio-visual hierarchical trees BibRef

Traa, J., Smaragdis, P.,
A Wrapped Kalman Filter for Azimuthal Speaker Tracking,
SPLetters(20), No. 12, 2013, pp. 1257-1260.
IEEE DOI 1311
Approximation methods BibRef

Wang, X.[Xizi], Cheng, F.[Feng], Bertasius, G.[Gedas],
LoCoNet: Long-Short Context Network for Active Speaker Detection,
CVPR24(18462-18472).
IEEE DOI
Code: WWW Link. 2410
Convolutional codes, Visualization, Benchmark testing, Robustness, Convolutional neural networks BibRef

Nugroho, M.A.[Muhammad Adi], Woo, S.[Sangmin], Lee, S.[Sumin], Kim, C.[Changick],
Audio-Visual Glance Network for Efficient Video Recognition,
ICCV23(10116-10125).
IEEE DOI 2401
BibRef

Liu, Y.[Yang], Tan, Y.[Ying], Lan, H.Y.[Hao-Yuan],
Self-Supervised Contrastive Learning for Audio-Visual Action Recognition,
ICIP23(1000-1004).
IEEE DOI 2312
BibRef

Min, K.[Kyle], Roy, S.[Sourya], Tripathi, S.[Subarna], Guha, T.[Tanaya], Majumdar, S.[Somdeb],
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection,
ECCV22(XXXV:371-387).
Springer DOI 2211
BibRef

Majumder, S.[Sagnik], Al-Halah, Z.[Ziad], Grauman, K.[Kristen],
Move2Hear: Active Audio-Visual Source Separation,
ICCV21(275-285).
IEEE DOI 2203
Solid modeling, Source separation, Robot vision systems, Reinforcement learning, Ear, Vision + other modalities, Vision for robotics and autonomous vehicles BibRef

Majumder, S.[Sagnik], Grauman, K.[Kristen],
Active Audio-Visual Separation of Dynamic Sound Sources,
ECCV22(XXIX:551-569).
Springer DOI 2211
BibRef

Alcázar, J.L.[Juan León], Heilbron, F.C.[Fabian Caba], Thabet, A.K.[Ali K.], Ghanem, B.[Bernard],
MAAS: Multi-modal Assignation for Active Speaker Detection,
ICCV21(265-274).
IEEE DOI 2203
Visualization, Benchmark testing, Feature extraction, Data structures, Task analysis, Vision + other modalities, Video analysis and understanding BibRef

Köpüklü, O.[Okan], Taseska, M.[Maja], Rigoll, G.[Gerhard],
How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild,
ICCV21(1173-1183).
IEEE DOI 2203
Codes, Computational modeling, Pipelines, Computer architecture, Encoding, Task analysis, Vision + other modalities, Vision applications and systems BibRef

Wu, Y.[Yu], Yang, Y.[Yi],
Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing,
CVPR21(1326-1335).
IEEE DOI 2111
Training, Visualization, Target tracking, Annotations, Predictive models BibRef

Liu, H.[Hong], Sun, Y.H.[Yong-Heng], Li, Y.D.[Yi-Di], Yang, B.[Bing],
3D Audio-Visual Speaker Tracking with A Novel Particle Filter,
ICPR21(7343-7348).
IEEE DOI 2105
BibRef
Earlier: A1, A3, A4, Only:
3D Audio-Visual Speaker Tracking with A Two-Layer Particle Filter,
ICIP19(1955-1959).
IEEE DOI 1910
Visualization, Histograms, Head, Image color analysis, Sensor phenomena and characterization, compact platform. 3D speaker tracking, audio-visual fusion, particle filter, adaptive likelihood BibRef

He, G., Liu, X., Fan, F., You, J.,
Image2Audio: Facilitating Semi-supervised Audio Emotion Recognition with Facial Expression Image,
VL3W20(3978-3983).
IEEE DOI 2008
Spectrogram, Training, Emotion recognition, Reliability, Visualization, Face recognition BibRef

Le, N.[Nam], Heili, A.[Alexandre], Wu, D.[Di], Odobez, J.M.[Jean-Marc],
Temporally subsampled detection for accurate and efficient face tracking and diarization,
ICPR16(1792-1797).
IEEE DOI 1705
Detectors, Face, Face detection, Image color analysis, Motion pictures, TV, Tracking BibRef

Saeed, A.[Anwar], Al-Hamadi, A.[Ayoub], Heuer, M.[Michael],
Speaker Tracking Using Multi-modal Fusion Framework,
ICISP12(539-546).
Springer DOI 1208
BibRef

Katsarakis, N.[Nikos], Talantzis, F.[Fotios], Pnevmatikakis, A.[Aristodemos], Polymenakos, L.[Lazaros],
The AIT 3D Audio / Visual Person Tracker for CLEAR 2007,
MTPH07(xx-yy).
Springer DOI 0705
See also AIT 2D Face Detection and Tracking System for CLEAR 2007, The.
See also AIT Multimodal Person Identification System for CLEAR 2007, The.
BibRef

Megherbi, N., Ambellouis, S., Colot, O., Cabestaing, F.,
Data Association in Multi-Target Tracking Using Belief Theory: Handling Target Emergence and Disappearance Issue,
AVSBS05(517-521).
IEEE DOI 0602
BibRef

Megherbi, N., Ambellouis, S., Colot, O., Cabestaing, F.,
Joint audio-video people tracking using belief theory,
AVSBS05(135-140).
IEEE DOI 0602
BibRef

Li, X.[Xin], Sun, L.[Luo], Tao, L.M.[Lin-Mi], Xu, G.Y.[Guang-You], Jia, Y.[Ying],
A Speaker Tracking Algorithm Based on Audio and Visual Information Fusion Using Particle Filter,
ICIAR04(II: 572-580).
Springer DOI 0409
BibRef

Lange, C.[Christian], Hermann, T.[Thomas], Ritter, H.[Helge],
Holistic Body Tracking for Gestural Interfaces,
GW03(132-139).
Springer DOI 0405
BibRef

Blake, A., Gangnet, M., Perez, P., Vermaak, J.,
Integrated tracking with vision and sound,
CIAP01(354-357).
IEEE DOI 0210
BibRef

Chapter on Face Recognition, Detection, Tracking, Gesture Recognition, Fingerprints, Biometrics continues in
Mouth Location, Lip Location, Detection.


Last update: Nov 26, 2024 at 16:40:19