Li, L.H.[Ling-Hui],
Tang, S.[Sheng],
Zhang, Y.D.[Yong-Dong],
Deng, L.X.[Li-Xi],
Tian, Q.[Qi],
GLA: Global-Local Attention for Image Description,
MultMed(20), No. 3, March 2018, pp. 726-737.
IEEE DOI
1802
Computational modeling, Decoding, Feature extraction,
Image recognition, Natural language processing,
recurrent neural network
BibRef
Wu, C.L.[Chun-Lei],
Wei, Y.W.[Yi-Wei],
Chu, X.L.[Xiao-Liang],
Su, F.[Fei],
Wang, L.Q.[Lei-Quan],
Modeling visual and word-conditional semantic attention for image
captioning,
SP:IC(67), 2018, pp. 100-107.
Elsevier DOI
1808
Image captioning, Word-conditional semantic attention,
Visual attention, Attention variation
BibRef
Xu, N.[Ning],
Liu, A.A.[An-An],
Liu, J.[Jing],
Nie, W.Z.[Wei-Zhi],
Su, Y.T.[Yu-Ting],
Scene graph captioner:
Image captioning based on structural visual representation,
JVCIR(58), 2019, pp. 477-485.
Elsevier DOI
1901
Image captioning, Scene graph, Structural representation, Attention
BibRef
Ding, S.T.[Song-Tao],
Qu, S.[Shiru],
Xi, Y.L.[Yu-Ling],
Sangaiah, A.K.[Arun Kumar],
Wan, S.H.[Shao-Hua],
Image caption generation with high-level image features,
PRL(123), 2019, pp. 89-95.
Elsevier DOI
1906
Image captioning, Language model,
Bottom-up attention mechanism, Faster R-CNN
BibRef
Zhang, Z.J.[Zong-Jian],
Wu, Q.[Qiang],
Wang, Y.[Yang],
Chen, F.[Fang],
High-Quality Image Captioning With Fine-Grained and Semantic-Guided
Visual Attention,
MultMed(21), No. 7, July 2019, pp. 1681-1693.
IEEE DOI
1906
BibRef
Earlier:
Fine-Grained and Semantic-Guided Visual Attention for Image
Captioning,
WACV18(1709-1717)
IEEE DOI
1806
Visualization, Semantics, Feature extraction, Decoding,
Task analysis, Object oriented modeling, Image resolution,
fully convolutional network-long short term memory framework,
feedforward neural nets, image representation,
image segmentation, convolutional neural network,
Visualization
BibRef
Tan, J.H.,
Chan, C.S.,
Chuah, J.H.,
COMIC: Toward A Compact Image Captioning Model With Attention,
MultMed(21), No. 10, October 2019, pp. 2686-2696.
IEEE DOI
1910
embedded systems, feature extraction, image retrieval, matrix algebra
BibRef
Yang, L.[Liang],
Hu, H.F.[Hai-Feng],
Visual Skeleton and Reparative Attention for Part-of-Speech image
captioning system,
CVIU(189), 2019, pp. 102819.
Elsevier DOI
1911
Neural network, Visual attention, Image captioning
BibRef
Wang, J.B.[Jun-Bo],
Wang, W.[Wei],
Wang, L.[Liang],
Wang, Z.Y.[Zhi-Yong],
Feng, D.D.[David Dagan],
Tan, T.N.[Tie-Niu],
Learning Visual Relationship and Context-Aware Attention for Image
Captioning,
PR(98), 2020, pp. 107075.
Elsevier DOI
1911
Image captioning, Relational reasoning, Context-aware attention
BibRef
Wei, H.Y.[Hai-Yang],
Li, Z.X.[Zhi-Xin],
Zhang, C.L.[Can-Long],
Ma, H.F.[Hui-Fang],
The synergy of double attention: Combine sentence-level and
word-level attention for image captioning,
CVIU(201), 2020, pp. 103068.
Elsevier DOI
2011
Image captioning, Sentence-level attention,
Word-level attention, Reinforcement learning
BibRef
Ji, J.Z.[Jun-Zhong],
Du, Z.R.[Zhuo-Ran],
Zhang, X.D.[Xiao-Dan],
Divergent-convergent attention for image captioning,
PR(115), 2021, pp. 107928.
Elsevier DOI
2104
Image Captioning, Divergent Observation, Convergent Attention
BibRef
Wei, Y.W.[Yi-Wei],
Wu, C.L.[Chun-Lei],
Jia, Z.Y.[Zhi-Yang],
Hu, X.F.[Xu-Fei],
Guo, S.[Shuang],
Shi, H.T.[Hai-Tao],
Past is important: Improved image captioning by looking back in time,
SP:IC(94), 2021, pp. 116183.
Elsevier DOI
2104
Image captioning, Reinforcement learning, Visual attention
BibRef
Zhang, Z.J.[Zong-Jian],
Wu, Q.[Qiang],
Wang, Y.[Yang],
Chen, F.[Fang],
Exploring region relationships implicitly:
Image captioning with visual relationship attention,
IVC(109), 2021, pp. 104146.
Elsevier DOI
2105
Image captioning, Visual relationship attention,
Relationship-level attention parallel attention mechanism,
Learned spatial constraint
BibRef
Zhang, Z.J.[Zong-Jian],
Wu, Q.[Qiang],
Wang, Y.[Yang],
Chen, F.[Fang],
Exploring Pairwise Relationships Adaptively From Linguistic Context
in Image Captioning,
MultMed(24), 2022, pp. 3101-3113.
IEEE DOI
2206
Visualization, Linguistics, Decoding, Modulation, Context modeling,
Adaptation models, Semantics, Bilinear attention,
visual relationship attention
BibRef
Zhong, X.[Xian],
Nie, G.Z.[Guo-Zhang],
Huang, W.X.[Wen-Xin],
Liu, W.X.[Wen-Xuan],
Ma, B.[Bo],
Lin, C.W.[Chia-Wen],
Attention-guided image captioning with adaptive global and local
feature fusion,
JVCIR(78), 2021, pp. 103138.
Elsevier DOI
2107
Image captioning, Encoder-decoder, Spatial information, Adaptive attention
BibRef
Wan, B.Y.[Bo-Yang],
Jiang, W.H.[Wen-Hui],
Fang, Y.M.[Yu-Ming],
Zhu, M.W.[Min-Wei],
Li, Q.[Qin],
Liu, Y.[Yang],
Revisiting image captioning via maximum discrepancy competition,
PR(122), 2022, pp. 108358.
Elsevier DOI
2112
Image captioning, Model comparison, Attention mechanism
BibRef
Chen, T.Y.[Tian-Yu],
Li, Z.X.[Zhi-Xin],
Wu, J.L.[Jing-Li],
Ma, H.F.[Hui-Fang],
Su, B.P.[Bian-Ping],
Improving image captioning with Pyramid Attention and SC-GAN,
IVC(117), 2022, pp. 104340.
Elsevier DOI
2112
Image captioning, Pyramid Attention network,
Self-critical training, Reinforcement learning, Sequence-level learning
BibRef
Zhou, Y.J.[Yu-Jie],
Long, J.F.[Jie-Feng],
Xu, S.P.[Su-Ping],
Shang, L.[Lin],
Attribute-driven image captioning via soft-switch pointer,
PRL(152), 2021, pp. 34-41.
Elsevier DOI
2112
Image captioning, Visual attributes detection, Attention, Pointing mechanism
BibRef
Wang, Q.Z.[Qing-Zhong],
Wan, J.[Jia],
Chan, A.B.[Antoni B.],
On Diversity in Image Captioning: Metrics and Methods,
PAMI(44), No. 2, February 2022, pp. 1035-1049.
IEEE DOI
2201
Measurement, Semantics, Learning (artificial intelligence),
Vegetation, Legged locomotion, Training, Computational modeling,
diversity metric
BibRef
Wang, J.N.[Jiu-Niu],
Xu, W.J.[Wen-Jia],
Wang, Q.Z.[Qing-Zhong],
Chan, A.B.[Antoni B.],
On Distinctive Image Captioning via Comparing and Reweighting,
PAMI(45), No. 2, February 2023, pp. 2088-2103.
IEEE DOI
2301
Training, Measurement, Annotations, Semantics,
Maximum likelihood estimation, Xenon, Web and internet services, metric
BibRef
Wang, J.N.[Jiu-Niu],
Xu, W.J.[Wen-Jia],
Wang, Q.Z.[Qing-Zhong],
Chan, A.B.[Antoni B.],
Group-Based Distinctive Image Captioning with Memory Difference
Encoding and Attention,
IJCV(133), No. 4, April 2025, pp. 1435-1455.
Springer DOI
2504
BibRef
Earlier:
Compare and Reweight:
Distinctive Image Captioning Using Similar Images Sets,
ECCV20(I:370-386).
Springer DOI
2011
BibRef
Liu, M.F.[Mao-Fu],
Hu, H.J.[Hui-Jun],
Li, L.J.[Ling-Jun],
Yu, Y.[Yan],
Guan, W.L.[Wei-Li],
Chinese Image Caption Generation via Visual Attention and Topic
Modeling,
Cyber(52), No. 2, February 2022, pp. 1247-1257.
IEEE DOI
2202
Visualization, Decoding, Semantics, Predictive models,
Feature extraction, Natural language processing,
visual attention
BibRef
Li, X.[Xuan],
Zhang, W.K.[Wen-Kai],
Sun, X.[Xian],
Gao, X.[Xin],
Without detection: Two-step clustering features with local-global
attention for image captioning,
IET-CV(16), No. 3, 2022, pp. 280-294.
DOI Link
2204
BibRef
Yu, L.T.[Li-Tao],
Zhang, J.[Jian],
Wu, Q.[Qiang],
Dual Attention on Pyramid Feature Maps for Image Captioning,
MultMed(24), 2022, pp. 1775-1786.
IEEE DOI
2204
Visualization, Decoding, Task analysis, Semantics,
Feature extraction, Context modeling, Image captioning, pyramid attention
BibRef
Shao, X.J.[Xiang-Jun],
Xiang, Z.L.[Zheng-Long],
Li, Y.X.[Yuan-Xiang],
Zhang, M.J.[Ming-Jie],
Variational joint self-attention for image captioning,
IET-IPR(16), No. 8, 2022, pp. 2075-2086.
DOI Link
2205
BibRef
Ma, Y.W.[Yi-Wei],
Ji, J.Y.[Jia-Yi],
Sun, X.S.[Xiao-Shuai],
Zhou, Y.[Yiyi],
Ji, R.R.[Rong-Rong],
Towards local visual modeling for image captioning,
PR(138), 2023, pp. 109420.
Elsevier DOI
2303
Image captioning, Attention mechanism, Local visual modeling
BibRef
Barati, A.[Alireza],
Farsi, H.[Hassan],
Mohamadzadeh, S.[Sajad],
Integration of the latent variable knowledge into deep image
captioning with Bayesian modeling,
IET-IPR(17), No. 7, 2023, pp. 2256-2271.
DOI Link
2305
attention mechanism, automatic image captioning,
deep neural networks, high-level semantic concepts, latent variable
BibRef
Ji, J.Y.[Jia-Yi],
Huang, X.Y.[Xiao-Yang],
Sun, X.S.[Xiao-Shuai],
Zhou, Y.[Yiyi],
Luo, G.[Gen],
Cao, L.J.[Liu-Juan],
Liu, J.Z.[Jian-Zhuang],
Shao, L.[Ling],
Ji, R.R.[Rong-Rong],
Multi-Branch Distance-Sensitive Self-Attention Network for Image
Captioning,
MultMed(25), 2023, pp. 3962-3974.
IEEE DOI
2310
BibRef
Cornia, M.[Marcella],
Baraldi, L.[Lorenzo],
Tal, A.[Ayellet],
Cucchiara, R.[Rita],
Fully-attentive iterative networks for region-based controllable
image and video captioning,
CVIU(237), 2023, pp. 103857.
Elsevier DOI
2311
Controllable captioning, Image captioning, Video captioning, Vision-and-language
BibRef
Song, L.F.[Li-Fei],
Li, F.[Fei],
Wang, Y.[Ying],
Liu, Y.[Yu],
Wang, Y.H.[Yuan-Hua],
Xiang, S.M.[Shi-Ming],
Image captioning: Semantic selection unit with stacked residual
attention,
IVC(144), 2024, pp. 104965.
Elsevier DOI
2404
Image captioning, Semantic attributes, Semantic selection unit,
Transformer, Stacked residual attention
BibRef
Du, R.[Runyan],
Zhang, W.K.[Wen-Kai],
Li, S.[Shuoke],
Chen, J.L.[Jia-Liang],
Guo, Z.[Zhi],
Spatial guided image captioning: Guiding attention with object's
spatial interaction,
IET-IPR(18), No. 12, 2024, pp. 3368-3380.
DOI Link
2411
image representation, image texture
BibRef
Zhang, X.D.[Xiao-Dan],
Jia, A.[Aozhe],
Ji, J.Z.[Jun-Zhong],
Qu, L.Q.[Liang-Qiong],
Ye, Q.X.[Qi-Xiang],
Intra- and Inter-Head Orthogonal Attention for Image Captioning,
IP(34), 2025, pp. 594-607.
IEEE DOI
2502
Head, Redundancy, Visualization, Decoding, Transformers,
Feature extraction, Correlation, Accuracy, Optimization, Dogs,
orthogonal constraint
BibRef
Song, L.F.[Li-Fei],
Wang, Y.[Ying],
Shi, L.[Linsu],
Yu, J.Z.[Jia-Zhong],
Li, F.[Fei],
Xiang, S.M.[Shi-Ming],
Transformer with token attention and attribute prediction for image
captioning,
PRL(188), 2025, pp. 74-80.
Elsevier DOI
2502
Image Captioning, Token Attention, Vision Transformers
BibRef
Parseh, M.J.[Mohammad Javad],
Ghadiri, S.[Saeed],
Graph-based image captioning with semantic and spatial features,
SP:IC(133), 2025, pp. 117273.
Elsevier DOI
2502
Image captioning, Semantic Graph, Spatial graph, Attention mechanism
BibRef
Popattia, M.[Murad],
Rafi, M.[Muhammad],
Qureshi, R.[Rizwan],
Nawaz, S.[Shah],
Guiding Attention using Partial-Order Relationships for Image
Captioning,
MULA22(4670-4679)
IEEE DOI
2210
Training, Measurement, Visualization, Semantics, Computer architecture
BibRef
Deb, T.[Tonmoay],
Sadmanee, A.[Akib],
Bhaumik, K.K.[Kishor Kumar],
Ali, A.A.[Amin Ahsan],
Amin, M.A.[M Ashraful],
Rahman, A.K.M.M.[A.K.M. Mahbubur],
Variational Stacked Local Attention Networks for Diverse Video
Captioning,
WACV22(2493-2502)
IEEE DOI
2202
Measurement, Visualization, Stacking, Redundancy, Natural languages,
Streaming media, Syntactics, Vision and Languages Datasets,
Analysis and Understanding
BibRef
Li, Z.,
Tran, Q.,
Mai, L.,
Lin, Z.,
Yuille, A.L.,
Context-Aware Group Captioning via Self-Attention and Contrastive
Features,
CVPR20(3437-3447)
IEEE DOI
2008
Task analysis, Visualization, Context modeling,
Training, Natural languages, Computational modeling
BibRef
Guo, L.,
Liu, J.,
Zhu, X.,
Yao, P.,
Lu, S.,
Lu, H.,
Normalized and Geometry-Aware Self-Attention Network for Image
Captioning,
CVPR20(10324-10333)
IEEE DOI
2008
Geometry, Task analysis, Visualization, Decoding, Training,
Feature extraction, Computer architecture
BibRef
Pan, Y.,
Yao, T.,
Li, Y.,
Mei, T.,
X-Linear Attention Networks for Image Captioning,
CVPR20(10968-10977)
IEEE DOI
2008
Visualization, Decoding, Cognition, Knowledge discovery,
Task analysis, Aggregates, Weight measurement
BibRef
Park, G.[Geondo],
Han, C.[Chihye],
Kim, D.[Daeshik],
Yoon, W.J.[Won-Jun],
MHSAN: Multi-Head Self-Attention Network for Visual Semantic
Embedding,
WACV20(1507-1515)
IEEE DOI
2006
Feature extraction, Visualization, Semantics, Task analysis,
Recurrent neural networks, Image representation, Image coding
BibRef
He, S.,
Tavakoli, H.R.,
Borji, A.,
Pugeault, N.,
Human Attention in Image Captioning: Dataset and Analysis,
ICCV19(8528-8537)
IEEE DOI
2004
Code, Captioning.
WWW Link.
convolutional neural nets, image segmentation, natural language processing,
object detection, visual perception, Adaptation models
BibRef
Huang, L.,
Wang, W.,
Chen, J.,
Wei, X.,
Attention on Attention for Image Captioning,
ICCV19(4633-4642)
IEEE DOI
2004
Code, Captioning.
WWW Link.
decoding, encoding, image processing, natural language processing,
element-wise multiplication, image captioning, weighted average, Testing
BibRef
Wei, H.Y.[Hai-Yang],
Li, Z.X.[Zhi-Xin],
Zhang, C.L.[Can-Long],
Image Captioning Based on Visual and Semantic Attention,
MMMod20(I:151-162).
Springer DOI
2003
BibRef
Fukui, H.[Hiroshi],
Hirakawa, T.[Tsubasa],
Yamashita, T.[Takayoshi],
Fujiyoshi, H.[Hironobu],
Attention Branch Network: Learning of Attention Mechanism for Visual
Explanation,
CVPR19(10697-10706).
IEEE DOI
2002
BibRef
Huang, Y.,
Li, C.,
Li, T.,
Wan, W.,
Chen, J.,
Image Captioning with Attribute Refinement,
ICIP19(1820-1824)
IEEE DOI
1910
Image captioning, attribute recognition, Semantic attention,
Deep Neural Network, Conditional Random Field
BibRef
Shi, J.,
Li, Y.,
Wang, S.,
Cascade Attention: Multiple Feature Based Learning for Image
Captioning,
ICIP19(1970-1974)
IEEE DOI
1910
Image Captioning, Attention Mechanism, Cascade Attention
BibRef
Xiao, H.,
Shi, J.,
A Novel Attribute Selection Mechanism for Video Captioning,
ICIP19(619-623)
IEEE DOI
1910
Attributes, Video captioning, Attention, Reinforcement learning
BibRef
Wang, Q.Z.[Qing-Zhong],
Chan, A.B.[Antoni B.],
Gated Hierarchical Attention for Image Captioning,
ACCV18(IV:21-37).
Springer DOI
1906
BibRef
Wang, W.X.[Wei-Xuan],
Chen, Z.H.[Zhi-Hong],
Hu, H.F.[Hai-Feng],
Multivariate Attention Network for Image Captioning,
ACCV18(VI:587-602).
Springer DOI
1906
BibRef
Ghanimifard, M.[Mehdi],
Dobnik, S.[Simon],
Knowing When to Look for What and Where: Evaluating Generation of
Spatial Descriptions with Adaptive Attention,
VL18(IV:153-161).
Springer DOI
1905
See also Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning.
BibRef
Khademi, M.,
Schulte, O.,
Image Caption Generation with Hierarchical Contextual Visual Spatial
Attention,
Cognitive18(2024-20248)
IEEE DOI
1812
Feature extraction, Visualization, Logic gates,
Task analysis, Context modeling, Computational modeling
BibRef
Wang, F.,
Gong, X.,
Huang, L.,
Time-Dependent Pre-attention Model for Image Captioning,
ICPR18(3297-3302)
IEEE DOI
1812
Decoding, Task analysis, Semantics, Visualization,
Feature extraction, Computational modeling, Computer science
BibRef
Chen, S.[Shi],
Zhao, Q.[Qi],
Boosted Attention: Leveraging Human Attention for Image Captioning,
ECCV18(XI: 72-88).
Springer DOI
1810
BibRef
Fang, F.,
Wang, H.,
Tang, P.,
Image Captioning with Word Level Attention,
ICIP18(1278-1282)
IEEE DOI
1809
Visualization, Feature extraction, Task analysis, Training,
Recurrent neural networks, Semantics, Computational modeling,
bidirectional spatial embedding
BibRef
Zhu, Z.,
Xue, Z.,
Yuan, Z.,
Topic-Guided Attention for Image Captioning,
ICIP18(2615-2619)
IEEE DOI
1809
Visualization, Semantics, Feature extraction, Training, Decoding,
Generators, Measurement, Image captioning, Attention, Topic, Attribute,
Deep Neural Network
BibRef
Pedersoli, M.,
Lucas, T.,
Schmid, C.,
Verbeek, J.,
Areas of Attention for Image Captioning,
ICCV17(1251-1259)
IEEE DOI
1802
image segmentation, inference mechanisms,
natural language processing, object detection,
Visualization
BibRef
Tavakoli, H.R.,
Shetty, R.,
Borji, A.,
Laaksonen, J.,
Paying Attention to Descriptions Generated by Image Captioning Models,
ICCV17(2506-2515)
IEEE DOI
1802
feature extraction, image processing, human descriptions,
human-written descriptions, image captioning model,
Visualization
BibRef
Lu, J.,
Xiong, C.,
Parikh, D.,
Socher, R.,
Knowing When to Look: Adaptive Attention via a Visual Sentinel for
Image Captioning,
CVPR17(3242-3250)
IEEE DOI
1711
Adaptation models, Computational modeling, Context modeling,
Decoding, Logic gates, Mathematical model, Visualization
BibRef
Chen, L.,
Zhang, H.,
Xiao, J.,
Nie, L.,
Shao, J.,
Liu, W.,
Chua, T.S.,
SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks
for Image Captioning,
CVPR17(6298-6306)
IEEE DOI
1711
Detectors, Feature extraction, Image coding, Neural networks,
Semantics, Visualization
BibRef
Zanfir, M.[Mihai],
Marinoiu, E.[Elisabeta],
Sminchisescu, C.[Cristian],
Spatio-Temporal Attention Models for Grounded Video Captioning,
ACCV16(IV: 104-119).
Springer DOI
1704
BibRef
Chen, T.H.[Tseng-Hung],
Zeng, K.H.[Kuo-Hao],
Hsu, W.T.[Wan-Ting],
Sun, M.[Min],
Video Captioning via Sentence Augmentation and Spatio-Temporal
Attention,
Assist16(I: 269-286).
Springer DOI
1704
BibRef
Chen, T.L.[Tian-Lang],
Zhang, Z.P.[Zhong-Ping],
You, Q.Z.[Quan-Zeng],
Fang, C.[Chen],
Wang, Z.W.[Zhao-Wen],
Jin, H.L.[Hai-Lin],
Luo, J.B.[Jie-Bo],
'Factual' or 'Emotional':
Stylized Image Captioning with Adaptive Learning and Attention,
ECCV18(X: 527-543).
Springer DOI
1810
BibRef
You, Q.Z.[Quan-Zeng],
Jin, H.L.[Hai-Lin],
Wang, Z.W.[Zhao-Wen],
Fang, C.[Chen],
Luo, J.B.[Jie-Bo],
Image Captioning with Semantic Attention,
CVPR16(4651-4659)
IEEE DOI
1612
BibRef