Liu, C.[Chen],
Li, P.[Peike],
Zhang, H.[Hu],
Li, L.C.[Lin-Cheng],
Huang, Z.[Zi],
Wang, D.D.[Da-Dong],
Yu, X.[Xin],
BAVS: Bootstrapping Audio-Visual Segmentation by Integrating
Foundation Knowledge,
MultMed(26), 2024, pp. 10015-10028.
IEEE DOI
2410
Visualization, Semantics, Location awareness, Background noise,
Task analysis, White noise, Transformers,
audio-visual hierarchical trees
BibRef
Liu, C.[Chen],
Yang, L.Y.[Li-Ying],
Li, P.[Peike],
Wang, D.D.[Da-Dong],
Li, L.C.[Lin-Cheng],
Yu, X.[Xin],
Dynamic Derivation and Elimination: Audio Visual Segmentation with
Enhanced Audio Semantics,
CVPR25(3131-3141)
IEEE DOI
2508
Representation learning, Visualization, Matched filters, Codes,
Semantics, Object segmentation, Benchmark testing,
audio visual localization
BibRef
Liu, C.[Chen],
Li, P.[Peike],
Yang, L.Y.[Li-Ying],
Wang, D.D.[Da-Dong],
Li, L.C.[Lin-Cheng],
Yu, X.[Xin],
Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent
Alignment,
CVPR25(28922-28931)
IEEE DOI
2508
Visualization, Uncertainty, Attention mechanisms, Accuracy, Merging,
Estimation, Contrastive learning, Reliability,
audio visual segmentation
BibRef
Liu, C.[Chen],
Li, P.P.[Peike Patrick],
Yu, Q.[Qingtao],
Sheng, H.W.[Hong-Wei],
Wang, D.D.[Da-Dong],
Li, L.C.[Lin-Cheng],
Yu, X.[Xin],
Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos,
CVPR24(22712-22722)
IEEE DOI Code:
WWW Link.
2410
Location awareness, Visualization, Adaptation models, Annotations,
Grounding, Deformation, Multimodal Processing,
sounding source localization
BibRef
Shi, Z.F.[Zhao-Feng],
Wu, Q.B.[Qing-Bo],
Meng, F.M.[Fan-Man],
Xu, L.F.[Lin-Feng],
Li, H.L.[Hong-Liang],
Cross-Modal Cognitive Consensus Guided Audio-Visual Segmentation,
MultMed(27), 2025, pp. 209-223.
IEEE DOI
2501
Visualization, Semantics, Feature extraction, Object segmentation,
Location awareness, Data mining, Attention mechanisms,
semantic-level consistency
BibRef
Zhou, J.X.[Jin-Xing],
Shen, X.Y.[Xu-Yang],
Wang, J.Y.[Jian-Yuan],
Zhang, J.Y.[Jia-Yi],
Sun, W.X.[Wei-Xuan],
Zhang, J.[Jing],
Birchfield, S.[Stan],
Guo, D.[Dan],
Kong, L.P.[Ling-Peng],
Wang, M.[Meng],
Zhong, Y.R.[Yi-Ran],
Audio-Visual Segmentation with Semantics,
IJCV(133), No. 4, April 2025, pp. 1644-1664.
Springer DOI
2504
BibRef
Lv, Y.[Ying],
Liu, Z.[Zhi],
Chang, X.J.[Xiao-Jun],
Consistency-Queried Transformer for Audio-Visual Segmentation,
IP(34), 2025, pp. 2616-2627.
IEEE DOI
2505
Visualization, Transformers, Decoding, Semantics,
Semantic segmentation, Quadrature amplitude modulation, aligned matching
BibRef
Zhu, Y.[Yue],
Li, K.[Kun],
Yang, Z.X.[Zong-Xin],
Exploiting EfficientSAM and Temporal Coherence for Audio-Visual
Segmentation,
MultMed(27), 2025, pp. 2999-3008.
IEEE DOI
2506
Visualization, Image segmentation, Computational modeling,
Location awareness, Computational efficiency, Adaptation models,
segment-anything model
BibRef
Mao, Y.X.[Yu-Xin],
Zhang, J.[Jing],
Xiang, M.[Mochu],
Lv, Y.Q.[Yun-Qiu],
Li, D.[Dong],
Zhong, Y.R.[Yi-Ran],
Dai, Y.C.[Yu-Chao],
Contrastive Conditional Latent Diffusion for Audio-Visual
Segmentation,
IP(34), 2025, pp. 4108-4119.
IEEE DOI Code:
WWW Link.
2507
Diffusion models, Contrastive learning, Image segmentation,
Visualization, Semantics, Training, Representation learning,
contrastive learning
BibRef
Xuan, H.Y.[Han-Yu],
Liu, T.X.[Tong-Xing],
Dong, W.X.[Wen-Xiang],
Li, Z.H.[Zhong-Heng],
Chen, S.[Shuo],
X-STA: Cross-Modal Spatial-Temporal Alignment Network for Unified
Audio-Visual Segmentation,
SPLetters(32), 2025, pp. 2883-2887.
IEEE DOI
2509
Visualization, Training, Location awareness, Synchronization, Shape,
Image segmentation, Decoding, Data mining, Video sequences,
alignment
BibRef
Gong, S.[Sitong],
Zhuge, Y.Z.[Yun-Zhi],
Zhang, L.[Lu],
Wang, Y.F.[Yi-Fan],
Zhang, P.P.[Ping-Ping],
Wang, L.J.[Li-Jun],
Lu, H.C.[Hu-Chuan],
AVS-Mamba: Exploring Temporal and Multi-Modal Mamba for Audio-Visual
Segmentation,
MultMed(27), 2025, pp. 5413-5425.
IEEE DOI
2509
Visualization, Transformers, Decoding, Feature extraction,
Computational modeling, Artificial intelligence, Training,
temporal modeling
BibRef
Gong, S.[Sitong],
Zhuge, Y.Z.[Yun-Zhi],
Zhang, L.[Lu],
Zhang, P.P.[Ping-Ping],
Lu, H.C.[Hu-Chuan],
Complementary and Contrastive Learning for Audio-Visual Segmentation,
MultMed(27), 2025, pp. 7407-7418.
IEEE DOI
2510
Transformers, Visualization, Videos, Contrastive learning, Location awareness,
Semantics, Object recognition, Generators, multimodal transformer
BibRef
Nguyen, K.B.[Khanh-Binh],
Park, C.J.[Chae Jung],
SAVE: Segment Audio-Visual Easy way using the Segment Anything Model,
CVIU(260), 2025, pp. 104460.
Elsevier DOI
2510
Multimodal, Audio-visual segmentation, Segment anything model
BibRef
Wang, Y.[Yusen],
Qian, X.H.[Xiao-Hong],
Zhou, W.[Wujie],
Transformer-Prompted Network: Efficient Audio-Visual Segmentation via
Transformer and Prompt Learning,
SPLetters(32), 2025, pp. 516-520.
IEEE DOI
2501
Transformers, Feature extraction, Frequency-domain analysis,
Europe, Visualization, Location awareness, Convolution,
self-knowledge distillation
BibRef
Zhou, J.X.[Jin-Xing],
Li, Z.H.[Zhi-Hui],
Yu, Y.Q.[Yong-Qiang],
Zhou, Y.H.[Yang-Hao],
Guo, R.[Ruohao],
Li, G.Y.[Guang-Yao],
Mao, Y.X.[Yu-Xin],
Han, M.F.[Ming-Fei],
Chang, X.J.[Xiao-Jun],
Wang, M.[Meng],
Mettle: Meta-Token Learning for Memory-Efficient Audio-Visual Adaptation,
PAMI(48), No. 4, April 2026, pp. 4222-4238.
IEEE DOI
2603
Transformers, Visualization, Training, Memory management,
Adaptation models, Runtime, Liquid crystal displays, Accuracy,
audio-visual segmentation
BibRef
Shen, L.[Li],
Wang, Y.Z.[Yang-Zhu],
Fan, X.[Xuyi],
Wei, Y.N.[Yu-Ning],
Qiu, H.X.[Hua-Xin],
AV2TS: A Multivariate Time Series Modeling Framework for Audio-Visual
Segmentation,
MultMed(28), 2026, pp. 4056-4068.
IEEE DOI
2606
Feature extraction, Time series analysis, Visualization, Acoustics,
Video sequences, Videos, Forecasting, Attention mechanisms, Fuses,
multivariate time series modeling
BibRef
Radman, A.[Abduljalil],
Laaksonen, J.[Jorma],
TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring
Audio-Visual Segmentation,
CVPR25(23947-23956)
IEEE DOI Code:
WWW Link.
2508
Image segmentation, Visualization, Memory modules, Robustness,
Videos, referring audio-visual segmentation,
multimodal learning
BibRef
Seon, J.[Juhyeong],
Im, W.B.[Woo-Bin],
Lee, S.[Sebin],
Lee, J.[Jumin],
Yoon, S.E.[Sung-Eui],
Extending Segment Anything Model into Auditory and Temporal
Dimensions for Audio-Visual Segmentation,
ICIP24(2480-2486)
IEEE DOI
2411
Image segmentation, Adaptation models, Video sequences,
Benchmark testing, Streaming media, Predictive models, Decoding,
audio-visual segmentation
BibRef
Yang, Q.[Qi],
Nie, X.[Xing],
Li, T.[Tong],
Gao, P.F.[Peng-Fei],
Guo, Y.[Ying],
Zhen, C.[Cheng],
Yan, P.F.[Peng-Fei],
Xiang, S.M.[Shi-Ming],
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations
for Audio-Visual Segmentation,
CVPR24(27124-27133)
IEEE DOI Code:
WWW Link.
2410
Visualization, Adaptation models, Terminology,
Bidirectional control, Transformers, multi-modal
BibRef
Liu, J.X.[Jin-Xiang],
Liu, Y.K.[Yi-Kun],
Zhang, F.[Fei],
Ju, C.[Chen],
Zhang, Y.[Ya],
Wang, Y.F.[Yan-Feng],
Audio-Visual Segmentation via Unlabeled Frame Exploitation,
CVPR24(26318-26329)
IEEE DOI
2410
Location awareness, Training, Accuracy, Semantics, Dynamics,
Performance gain, audio-visual segmentation, video understanding,
audio-visual understanding
BibRef
Liu, J.X.[Jin-Xiang],
Wang, Y.[Yu],
Ju, C.[Chen],
Ma, C.F.[Chao-Fan],
Zhang, Y.[Ya],
Xie, W.[Weidi],
Annotation-free Audio-Visual Segmentation,
WACV24(5592-5602)
IEEE DOI Code:
WWW Link.
2404
Training, Adaptation models, Image segmentation, Visualization,
Computational modeling, Pipelines, Data models, Algorithms,
Image recognition and understanding
BibRef
Zhou, J.X.[Jin-Xing],
Wang, J.Y.[Jian-Yuan],
Zhang, J.Y.[Jia-Yi],
Sun, W.X.[Wei-Xuan],
Zhang, J.[Jing],
Birchfield, S.[Stan],
Guo, D.[Dan],
Kong, L.P.[Ling-Peng],
Wang, M.[Meng],
Zhong, Y.R.[Yi-Ran],
Audio-Visual Segmentation,
ECCV22(XXXVII:386-403).
Springer DOI
2211
BibRef