Yang, J.X.[Jia-Xin],
Yu, M.M.[Miao-Miao],
Li, S.[Shuohao],
Zhang, J.[Jun],
Hu, S.Z.[Sheng-Ze],
Long-Tailed Object Detection for Multimodal Remote Sensing Images,
RS(15), No. 18, 2023, pp. 4539.
DOI Link
2310
BibRef
He, H.[Haolan],
Dong, X.G.[Xian-Guo],
Zhou, X.F.[Xiao-Fei],
Wang, B.[Bo],
Zhang, J.Y.[Ji-Yong],
Interactive Fusion and Correlation Network for Three-Modal Images
Few-Shot Semantic Segmentation,
SPLetters(31), 2024, pp. 2430-2434.
IEEE DOI
2410
Correlation, Fuses, Decoding, Feature extraction, Convolution,
Visualization, Water resources, Few-shot learning, semantic segmentation
BibRef
Liu, Y.X.[Yan-Xing],
Pan, Z.X.[Zong-Xu],
Yang, J.W.[Jian-Wei],
Zhou, P.[Peiling],
Zhang, B.C.[Bing-Chen],
Multi-Modal Prototypes for Few-Shot Object Detection in Remote
Sensing Images,
RS(16), No. 24, 2024, pp. 4693.
DOI Link
2501
BibRef
Guo, H.[Hao],
Liu, Y.X.[Yan-Xing],
Pan, Z.X.[Zong-Xu],
Hu, Y.X.[Yu-Xin],
Advancing Fine-Grained Few-Shot Object Detection on Remote Sensing
Images with Decoupled Self-Distillation and Progressive Prototype
Calibration,
RS(17), No. 3, 2025, pp. 495.
DOI Link
2502
BibRef
Xu, Y.F.[Yi-Fan],
Zhang, M.[Mengdan],
Yang, X.S.[Xiao-Shan],
Xu, C.S.[Chang-Sheng],
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object
Detection,
IP(33), 2024, pp. 6253-6267.
IEEE DOI
2411
Transformers, Visualization, Detectors, Object detection, Context modeling,
Proposals, Location awareness, Annotations, contextual knowledge
BibRef
Fei, X.[Xuan],
Guo, M.Y.[Meng-Yao],
Li, Y.[Yan],
Yu, R.P.[Ren-Ping],
Sun, L.[Le],
ACDF-YOLO: Attentive and Cross-Differential Fusion Network for
Multimodal Remote Sensing Object Detection,
RS(16), No. 18, 2024, pp. 3532.
DOI Link
2410
BibRef
Zang, Y.H.[Yu-Hang],
Li, W.[Wei],
Han, J.[Jun],
Zhou, K.Y.[Kai-Yang],
Loy, C.C.[Chen Change],
Contextual Object Detection with Multimodal Large Language Models,
IJCV(133), No. 2, February 2025, pp. 825-843.
Springer DOI
2502
See also Towards Language-Driven Video Inpainting via Multimodal Large Language Models.
BibRef
Du, Y.Y.[Yao-Yang],
Liu, F.[Fang],
Jiao, L.C.[Li-Cheng],
Li, S.[Shuo],
Hao, Z.[Zehua],
Li, P.F.[Peng-Fang],
Wang, J.H.[Jia-Hao],
Wang, H.[Hao],
Liu, X.[Xu],
Text Generation and Multi-Modal Knowledge Transfer for Few-Shot
Object Detection,
PR(161), 2025, pp. 111283.
Elsevier DOI
2502
Few-shot learning, Few-shot object detection, Multi-modal
BibRef
Cheng, D.Q.[De-Qiang],
Xu, X.C.[Xing-Chen],
Zhang, H.X.[Hao-Xiang],
Song, T.S.[Tian-Shu],
Jiang, H.[He],
Kou, Q.Q.[Qi-Qi],
Zero-Shot Object Detection Based on Cross-Modal Guided Clustering,
IVC(162), 2025, pp. 105664.
Elsevier DOI
2510
Zero-shot, Object detection, Clustering methods, Contrastive learning
BibRef
Wang, Y.[Yu],
Wei, S.K.[Shi-Kui],
Xu, S.[Sen],
Qin, Y.[Ying],
Zhao, Y.[Yao],
Confidence-Driven Unimodal Interference Removal for Enhanced
Multimodal Object Detection,
CirSysVideo(35), No. 11, November 2025, pp. 11041-11053.
IEEE DOI
2511
Feature extraction, Interference, Object detection, Visualization,
Transformers, Noise, Streaming media, Detection algorithms, Training,
disentangled representation
BibRef
Sun, S.J.[Shi-Jun],
Ma, S.[Shuai],
Feng, X.Y.[Xu-Yang],
Sun, C.[Chen],
Ding, B.[Baolong],
Ran, Y.Y.[Yao-Yao],
Zhang, Y.H.[Yi-Hong],
LDSDet: Long-Range Context and Dynamic Cross-Modal Alignment for
Multimodal Object Detection Under Challenging Illumination,
RS(18), No. 11, 2026, pp. 1827.
DOI Link
2606
BibRef
Xu, W.H.[Wen-Hao],
Yang, Y.[You],
JFDet: Joint Fusion and Detection for Multimodal Remote Sensing
Imagery,
RS(18), No. 1, 2026, pp. 176.
DOI Link
2601
BibRef
Shangguan, Z.[Zeyu],
Seita, D.[Daniel],
Rostami, M.[Mohammad],
Cross-domain Few-shot Object Detection with Multi-modal Textual
Enrichment,
IJCV(134), No. 6, June 2026, pp. 261.
Springer DOI Code:
WWW Link.
2605
BibRef
Earlier:
Cross-Domain Multi-Modal Few-Shot Object Detection via Rich Text,
WACV25(6570-6580)
IEEE DOI Code:
WWW Link.
2505
Metalearning, Adaptation models, Semantics, Neural networks, Object detection,
Feature extraction, Few shot learning, Standards, vision-language
BibRef
Fu, Y.Q.[Yu-Qian],
Wang, Y.[Yu],
Pan, Y.X.[Yi-Xuan],
Huai, L.[Lian],
Qiu, X.Y.[Xing-Yu],
Shangguan, Z.[Zeyu],
Liu, T.[Tong],
Fu, Y.W.[Yan-Wei],
Van Gool, L.J.[Luc J.],
Jiang, X.Q.[Xing-Qun],
Cross-domain Few-shot Object Detection via Enhanced Open-set Object
Detector,
ECCV24(LVIII: 247-264).
Springer DOI
2412
BibRef
Su, Y.[Yudi],
Ni, J.[Jialei],
Wen, T.[Tiansheng],
Liu, H.W.[Hong-Wei],
Su, H.T.[Hong-Tao],
Chen, B.[Bo],
CAIR-Net: Reliability-Aware Information Routing for Robust Multimodal
Object Detection Under Modality Degradation,
CirSysVideo(36), No. 6, June 2026, pp. 8303-8315.
IEEE DOI
2606
Reliability, Degradation, Optical imaging, Optical sensors,
Cloud computing, Synthetic aperture radar, Routing, expert routing
BibRef
Xia, Q.M.[Qi-Ming],
Zheng, L.H.[Long-Hui],
Zhao, S.[Shijia],
Huang, X.[Xun],
Wu, H.[Hai],
Wen, C.[Chenglu],
Wang, C.[Cheng],
DOtA++: Unsupervisely and Collaboratively Detect Objects from
Multi-Agent Observations With Multi-Modal Prior Constraints,
PAMI(48), No. 7, July 2026, pp. 7467-7484.
IEEE DOI
2606
Collaboration, Annotations, Object detection, Manuals, Detectors,
Training, Labeling, Costs, Point cloud compression,
composite prior constraints
BibRef
Campos, F.[Filipe],
Cerqueira, F.G.[Francisco Gonçalves],
Cruz, R.P.M.[Ricardo P. M.],
Cardoso, J.S.[Jaime S.],
YOLOMM: You Only Look Once for Multi-modal Multi-tasking,
CIARP23(I:564-574).
Springer DOI
2312
BibRef
Gungor, C.[Cagri],
Kovashka, A.[Adriana],
Complementary Cues from Audio Help Combat Noise in Weakly-Supervised
Object Detection,
WACV23(2184-2193)
IEEE DOI
2302
Training, Location awareness, Visualization, Music, Object detection,
Detectors, Vision + language and/or other modalities
BibRef
Cao, Y.[Yue],
Bin, J.C.[Jun-Chi],
Hamari, J.[Jozsef],
Blasch, E.[Erik],
Liu, Z.[Zheng],
Multimodal Object Detection by Channel Switching and Spatial
Attention,
PBVS23(403-411)
IEEE DOI
2309
BibRef
Maaz, M.[Muhammad],
Rasheed, H.[Hanoona],
Khan, S.[Salman],
Khan, F.S.[Fahad Shahbaz],
Anwer, R.M.[Rao Muhammad],
Yang, M.H.[Ming-Hsuan],
Class-Agnostic Object Detection with Multi-modal Transformer,
ECCV22(X:512-531).
Springer DOI
2211
BibRef