Zotkin, D.N.[Dmitry N.],
Duraiswami, R.[Ramani],
Davis, L.S.[Larry S.],
Joint Audio-Visual Tracking Using Particle Filters,
JASP(2002), No. 11, November 2002, pp. 1154.
WWW Link.
0304
BibRef
Garg, A.[Ashutosh],
Pavlovic, V.[Vladimir],
Rehg, J.M.[James M.],
Boosted learning in dynamic Bayesian networks for multimodal speaker
detection,
PIEEE(91), No. 9, September 2003, pp. 1355-1369.
IEEE DOI
0309
BibRef
Earlier:
Audio-visual speaker detection using dynamic Bayesian networks,
AFGR00(384-390).
IEEE DOI
0003
BibRef
Pavlovic, V.[Vladimir],
Garg, A.[Ashutosh],
Rehg, J.M.[James M.],
Huang, T.S.[Thomas S.],
Multimodal Speaker Detection using Error Feedback Dynamic Bayesian
Networks,
CVPR00(II: 34-41).
IEEE DOI
0005
BibRef
Pavlovic, V.,
Berry, G., and
Huang, T.S.,
Integration of Audio/Visual Information for Use in
Human-Computer Intelligent Interaction,
ICIP97(I: 121-124).
IEEE DOI
BibRef
9700
Choudhury, T.[Tanzeem],
Rehg, J.M.,
Pavlovic, V.,
Pentland, A.P.,
Boosting and structure learning in dynamic Bayesian networks for
audio-visual speaker detection,
ICPR02(III: 789-794).
IEEE DOI
0211
BibRef
Pavlovic, V.[Vladimir],
Multimodal tracking and classification of audio-visual features,
ICIP98(I: 343-347).
IEEE DOI
9810
BibRef
Rehg, J.M.[James M.],
Murphy, K.P.[Kevin P.],
Fieguth, P.W.[Paul W.],
Vision-Based Speaker Detection Using Bayesian Networks,
CVPR99(II: 110-116).
IEEE DOI More particularly the one talking.
BibRef
9900
Talantzis, F.,
Pnevmatikakis, A.,
Constantinides, A.G.,
Audio-Visual Active Speaker Tracking in Cluttered Indoors Environments,
SMC-B(39), No. 1, February 2009, pp. 7-15.
IEEE DOI
0902
BibRef
Earlier:
SMC-B(38), No. 3, June 2008, pp. 799-807.
IEEE DOI
0711
The top one is the special issue; it was published early in the other issue.
BibRef
Qian, X.,
Brutti, A.,
Lanz, O.,
Omologo, M.,
Cavallaro, A.,
Multi-Speaker Tracking From an Audio-Visual Sensing Device,
MultMed(21), No. 10, October 2019, pp. 2576-2588.
IEEE DOI
1910
image colour analysis, object detection, object tracking,
particle filtering (numerical methods), sensor fusion,
particle filter
BibRef
Ban, Y.T.[Yu-Tong],
Alameda-Pineda, X.[Xavier],
Girin, L.[Laurent],
Horaud, R.[Radu],
Variational Bayesian Inference for Audio-Visual Tracking of Multiple
Speakers,
PAMI(43), No. 5, May 2021, pp. 1761-1776.
IEEE DOI
2104
BibRef
Earlier: A1, A3, A2, A4:
Exploiting the Complementarity of Audio and Visual Data in
Multi-speaker Tracking,
CVAVM17(446-454)
IEEE DOI
1802
Visualization, Target tracking, Acoustics, Bayes methods, Cameras,
Object tracking, Direction-of-arrival estimation,
speaker diarization.
Cameras, Detectors, Kalman filters, Microphones, Robots, Tracking,
Visualization
BibRef
Qian, X.Y.[Xin-Yuan],
Brutti, A.[Alessio],
Lanz, O.[Oswald],
Omologo, M.[Maurizio],
Cavallaro, A.[Andrea],
Audio-Visual Tracking of Concurrent Speakers,
MultMed(24), 2022, pp. 942-954.
IEEE DOI
2202
Target tracking, Acoustics, Faces, Cameras, Visualization,
Image color analysis, 3D multiple target tracking,
particle filter
BibRef
Hu, D.[Di],
Wei, Y.[Yake],
Qian, R.[Rui],
Lin, W.Y.[Wei-Yao],
Song, R.H.[Rui-Hua],
Wen, J.R.[Ji-Rong],
Class-Aware Sounding Objects Localization via Audiovisual
Correspondence,
PAMI(44), No. 12, December 2022, pp. 9844-9859.
IEEE DOI
2212
Where did the sound come from?
Location awareness, Visualization, Task analysis, Annotations,
Semantics, Dictionaries, Videos, distribution alignment
BibRef
Wang, H.[Hao],
Zha, Z.J.[Zheng-Jun],
Li, L.[Liang],
Chen, X.J.[Xue-Jin],
Luo, J.B.[Jie-Bo],
Semantic and Relation Modulation for Audio-Visual Event Localization,
PAMI(45), No. 6, June 2023, pp. 7711-7725.
IEEE DOI
2305
Visualization, Location awareness, Correlation, Proposals, Semantics,
Task analysis, Modulation, Audio-visual learning, normalization
BibRef
Garg, R.[Rishabh],
Gao, R.H.[Ruo-Han],
Grauman, K.[Kristen],
Visually-Guided Audio Spatialization in Video with Geometry-Aware
Multi-task Learning,
IJCV(131), No. 10, October 2023, pp. 2723-2737.
Springer DOI
2309
BibRef
Wang, J.X.[Jia-Xiang],
Li, C.L.[Cheng-Long],
Zheng, A.[Aihua],
Tang, J.[Jin],
Luo, B.[Bin],
Looking and Hearing Into Details:
Dual-Enhanced Siamese Adversarial Network for Audio-Visual Matching,
MultMed(25), 2023, pp. 7505-7516.
IEEE DOI
2311
BibRef
Liu, C.[Chen],
Li, P.[Peike],
Zhang, H.[Hu],
Li, L.C.[Lin-Cheng],
Huang, Z.[Zi],
Wang, D.D.[Da-Dong],
Yu, X.[Xin],
BAVS: Bootstrapping Audio-Visual Segmentation by Integrating
Foundation Knowledge,
MultMed(26), 2024, pp. 10015-10028.
IEEE DOI
2410
Visualization, Semantics, Location awareness, Background noise,
Task analysis, White noise, Transformers,
and audio-visual hierarchical trees
BibRef
Traa, J.,
Smaragdis, P.,
A Wrapped Kalman Filter for Azimuthal Speaker Tracking,
SPLetters(20), No. 12, 2013, pp. 1257-1260.
IEEE DOI
1311
Approximation methods
BibRef
Nugroho, M.A.[Muhammad Adi],
Woo, S.[Sangmin],
Lee, S.[Sumin],
Kim, C.[Changick],
Audio-Visual Glance Network for Efficient Video Recognition,
ICCV23(10116-10125)
IEEE DOI
2401
BibRef
Liu, Y.[Yang],
Tan, Y.[Ying],
Lan, H.Y.[Hao-Yuan],
Self-Supervised Contrastive Learning for Audio-Visual Action
Recognition,
ICIP23(1000-1004)
IEEE DOI
2312
BibRef
Min, K.[Kyle],
Roy, S.[Sourya],
Tripathi, S.[Subarna],
Guha, T.[Tanaya],
Majumdar, S.[Somdeb],
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection,
ECCV22(XXXV:371-387).
Springer DOI
2211
BibRef
Majumder, S.[Sagnik],
Al-Halah, Z.[Ziad],
Grauman, K.[Kristen],
Move2Hear: Active Audio-Visual Source Separation,
ICCV21(275-285)
IEEE DOI
2203
Solid modeling, Source separation, Robot vision systems,
Reinforcement learning, Ear, Vision + other modalities,
Vision for robotics and autonomous vehicles
BibRef
Majumder, S.[Sagnik],
Grauman, K.[Kristen],
Active Audio-Visual Separation of Dynamic Sound Sources,
ECCV22(XXIX:551-569).
Springer DOI
2211
BibRef
Alcázar, J.L.[Juan León],
Heilbron, F.C.[Fabian Caba],
Thabet, A.K.[Ali K.],
Ghanem, B.[Bernard],
MAAS: Multi-modal Assignation for Active Speaker Detection,
ICCV21(265-274)
IEEE DOI
2203
Visualization, Benchmark testing, Feature extraction,
Data structures, Task analysis, Vision + other modalities,
Video analysis and understanding
BibRef
Köpüklü, O.[Okan],
Taseska, M.[Maja],
Rigoll, G.[Gerhard],
How to Design a Three-Stage Architecture for Audio-Visual Active
Speaker Detection in the Wild,
ICCV21(1173-1183)
IEEE DOI
2203
Codes, Computational modeling, Pipelines, Computer architecture,
Encoding, Task analysis, Vision + other modalities,
Vision applications and systems
BibRef
Wu, Y.[Yu],
Yang, Y.[Yi],
Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual
Video Parsing,
CVPR21(1326-1335)
IEEE DOI
2111
Training, Visualization, Target tracking,
Annotations, Predictive models
BibRef
Liu, H.[Hong],
Sun, Y.H.[Yong-Heng],
Li, Y.D.[Yi-Di],
Yang, B.[Bing],
3D Audio-Visual Speaker Tracking with A Novel Particle Filter,
ICPR21(7343-7348)
IEEE DOI
2105
BibRef
Earlier: A1, A3, A4, Only:
3D Audio-Visual Speaker Tracking with A Two-Layer Particle Filter,
ICIP19(1955-1959)
IEEE DOI
1910
Visualization, Histograms, Head,
Image color analysis, Sensor phenomena and characterization, compact platform.
3D speaker tracking, audio-visual fusion, particle filter, adaptive likelihood
BibRef
He, G.,
Liu, X.,
Fan, F.,
You, J.,
Image2Audio: Facilitating Semi-supervised Audio Emotion Recognition
with Facial Expression Image,
VL3W20(3978-3983)
IEEE DOI
2008
Spectrogram, Training, Emotion recognition,
Reliability, Visualization, Face recognition
BibRef
Le, N.[Nam],
Heili, A.[Alexandre],
Wu, D.[Di],
Odobez, J.M.[Jean-Marc],
Temporally subsampled detection for accurate and efficient face
tracking and diarization,
ICPR16(1792-1797)
IEEE DOI
1705
Detectors, Face, Face detection, Image color analysis,
Motion pictures, TV, Tracking
BibRef
Saeed, A.[Anwar],
Al-Hamadi, A.[Ayoub],
Heuer, M.[Michael],
Speaker Tracking Using Multi-modal Fusion Framework,
ICISP12(539-546).
Springer DOI
1208
BibRef
Katsarakis, N.[Nikos],
Talantzis, F.[Fotios],
Pnevmatikakis, A.[Aristodemos],
Polymenakos, L.[Lazaros],
The AIT 3D Audio / Visual Person Tracker for CLEAR 2007,
MTPH07(xx-yy).
Springer DOI
0705
See also AIT 2D Face Detection and Tracking System for CLEAR 2007, The.
See also AIT Multimodal Person Identification System for CLEAR 2007, The.
BibRef
Megherbi, N.,
Ambellouis, S.,
Colot, O.,
Cabestaing, F.,
Data Association in Multi-Target Tracking Using Belief Theory:
Handling Target Emergence and Disappearance Issue,
AVSBS05(517-521).
IEEE DOI
0602
BibRef
Megherbi, N.,
Ambellouis, S.,
Colot, O.,
Cabestaing, F.,
Joint audio-video people tracking using belief theory,
AVSBS05(135-140).
IEEE DOI
0602
BibRef
Li, X.[Xin],
Sun, L.[Luo],
Tao, L.M.[Lin-Mi],
Xu, G.Y.[Guang-You],
Jia, Y.[Ying],
A Speaker Tracking Algorithm Based on Audio and Visual Information
Fusion Using Particle Filter,
ICIAR04(II: 572-580).
Springer DOI
0409
BibRef
Lange, C.[Christian],
Hermann, T.[Thomas],
Ritter, H.[Helge],
Holistic Body Tracking for Gestural Interfaces,
GW03(132-139).
Springer DOI
0405
BibRef
Blake, A.,
Gangnet, M.,
Perez, P.,
Vermaak, J.,
Integrated tracking with vision and sound,
CIAP01(354-357).
IEEE DOI
0210
BibRef