Verma, Y.[Yashaswi],
Jawahar, C.V.,
A support vector approach for cross-modal search of images and texts,
CVIU(154), No. 1, 2017, pp. 48-63.
Elsevier DOI
1612
Image search
BibRef
Dutta, A.[Ayushi],
Verma, Y.[Yashaswi],
Jawahar, C.V.,
Recurrent Image Annotation with Explicit Inter-Label Dependencies,
ECCV20(XXIX: 191-207).
Springer DOI
2010
BibRef
Xue, J.F.[Jian-Fei],
Eguchi, K.[Koji],
Video Data Modeling Using Sequential Correspondence Hierarchical
Dirichlet Processes,
IEICE(E100-D), No. 1, January 2017, pp. 33-41.
WWW Link.
1701
multimodal data such as the mixture of visual words and speech words
extracted from video files
BibRef
Liu, A.A.[An-An],
Xu, N.[Ning],
Wong, Y.K.[Yong-Kang],
Li, J.[Junnan],
Su, Y.T.[Yu-Ting],
Kankanhalli, M.[Mohan],
Hierarchical & multimodal video captioning: Discovering and
transferring multimodal knowledge for vision to language,
CVIU(163), No. 1, 2017, pp. 113-125.
Elsevier DOI
1712
Video to text
BibRef
Guan, J.N.[Jin-Ning],
Wang, E.[Eric],
Repeated review based image captioning for image evidence review,
SP:IC(63), 2018, pp. 141-148.
Elsevier DOI
1804
Repeated review, Image captioning, Encoder-decoder, Multimodal layer
BibRef
Park, C.C.,
Kim, B.,
Kim, G.,
Towards Personalized Image Captioning via Multimodal Memory Networks,
PAMI(41), No. 4, April 2019, pp. 999-1012.
IEEE DOI
1903
BibRef
Earlier:
Attend to You: Personalized Image Captioning with Context Sequence
Memory Networks,
CVPR17(6432-6440)
IEEE DOI
1711
Tagging, Twitter, Task analysis, Computational modeling, Writing,
Vocabulary, Context modeling, Image captioning, personalization,
convolutional neural networks.
Pattern recognition
BibRef
Xian, Y.,
Tian, Y.,
Self-Guiding Multimodal LSTM: When We Do Not Have a Perfect Training
Dataset for Image Captioning,
IP(28), No. 11, November 2019, pp. 5241-5252.
IEEE DOI
1909
Task analysis, Visualization, Training, Semantics, Flickr, Urban areas,
Training data, Image captioning, self-guiding, real-world dataset,
recurrent neural network
BibRef
Yang, M.,
Zhao, W.,
Xu, W.,
Feng, Y.,
Zhao, Z.,
Chen, X.,
Lei, K.,
Multitask Learning for Cross-Domain Image Captioning,
MultMed(21), No. 4, April 2019, pp. 1047-1061.
IEEE DOI
1903
Task analysis, Image generation, Data models, Training data,
Neural networks, Training, Maximum likelihood estimation,
reinforcement learning
BibRef
Yu, N.,
Hu, X.,
Song, B.,
Yang, J.,
Zhang, J.,
Topic-Oriented Image Captioning Based on Order-Embedding,
IP(28), No. 6, June 2019, pp. 2743-2754.
IEEE DOI
1905
image classification, image matching, image retrieval,
learning (artificial intelligence),
cross-modal retrieval
BibRef
Li, X.,
Xu, C.,
Wang, X.,
Lan, W.,
Jia, Z.,
Yang, G.,
Xu, J.,
COCO-CN for Cross-Lingual Image Tagging, Captioning, and Retrieval,
MultMed(21), No. 9, September 2019, pp. 2347-2360.
IEEE DOI
1909
Image annotation, Task analysis, Training, Image retrieval, Internet,
Streaming media, Visualization, COCO-CN, Chinese language,
image retrieval
BibRef
Tian, C.[Chunna],
Tian, M.[Ming],
Jiang, M.M.[Meng-Meng],
Liu, H.[Heng],
Deng, D.H.[Dong-Hu],
How much do cross-modal related semantics benefit image captioning by
weighting attributes and re-ranking sentences?,
PRL(125), 2019, pp. 639-645.
Elsevier DOI
1909
Semantic attributes, Attribute reweighting,
Cross-modal related semantics, Sentence re-ranking
BibRef
Niu, Y.,
Lu, Z.,
Wen, J.,
Xiang, T.,
Chang, S.,
Multi-Modal Multi-Scale Deep Learning for Large-Scale Image
Annotation,
IP(28), No. 4, April 2019, pp. 1720-1731.
IEEE DOI
1901
feature extraction, image classification, image fusion,
image representation, learning (artificial intelligence),
label quantity prediction
BibRef
Huang, Y.,
Chen, J.,
Ouyang, W.,
Wan, W.,
Xue, Y.,
Image Captioning With End-to-End Attribute Detection and Subsequent
Attributes Prediction,
IP(29), 2020, pp. 4013-4026.
IEEE DOI
2002
Image captioning, semantic attention, end-to-end training,
multimodal attribute detector, subsequent attribute predictor
BibRef
Zhao, W.,
Wu, X.,
Luo, J.,
Cross-Domain Image Captioning via Cross-Modal Retrieval and Model
Adaptation,
IP(30), 2021, pp. 1180-1192.
IEEE DOI
2012
Adaptation models, Task analysis, Visualization,
Computational modeling, Linguistics, Semantics, Image segmentation,
model adaptation
BibRef
Wang, H.[Hang],
Du, Y.T.[You-Tian],
Zhang, G.X.[Guang-Xun],
Cai, Z.M.[Zhong-Min],
Su, C.[Chang],
Learning Fundamental Visual Concepts Based on Evolved Multi-Edge
Concept Graph,
MultMed(23), 2021, pp. 4400-4413.
IEEE DOI
2112
Visualization, Semantics, Image annotation, Image edge detection,
Data models, Adaptation models, Task analysis,
cross media
BibRef
Zhou, M.Y.[Ming-Yang],
Zhou, L.W.[Luo-Wei],
Wang, S.H.[Shuo-Hang],
Cheng, Y.[Yu],
Li, L.J.[Lin-Jie],
Yu, Z.[Zhou],
Liu, J.J.[Jing-Jing],
UC2: Universal Cross-lingual Cross-modal Vision-and-Language
Pre-training,
CVPR21(4153-4163)
IEEE DOI
2111
Training, Visualization, Benchmark testing, Knowledge discovery,
Data models, Pattern recognition, Machine translation
BibRef
Laina, I.,
Rupprecht, C.,
Navab, N.,
Towards Unsupervised Image Captioning With Shared Multimodal
Embeddings,
ICCV19(7413-7423)
IEEE DOI
2004
natural language processing, text analysis,
multimodal embeddings, explicit supervision,
Semantics
BibRef
Akbari, H.[Hassan],
Karaman, S.[Svebor],
Bhargava, S.[Surabhi],
Chen, B.[Brian],
Vondrick, C.[Carl],
Chang, S.F.[Shih-Fu],
Multi-Level Multimodal Common Semantic Space for Image-Phrase Grounding,
CVPR19(12468-12478).
IEEE DOI
2002
BibRef
Chen, T.H.,
Liao, Y.H.,
Chuang, C.Y.,
Hsu, W.T.,
Fu, J.,
Sun, M.,
Show, Adapt and Tell:
Adversarial Training of Cross-Domain Image Captioner,
ICCV17(521-530)
IEEE DOI
1802
image processing, inference mechanisms, text analysis, MSCOCO,
adversarial training procedure, captioner act, critic networks,
Training data
BibRef
Niu, Z.X.[Zhen-Xing],
Zhou, M.[Mo],
Wang, L.[Le],
Gao, X.B.[Xin-Bo],
Hua, G.[Gang],
Hierarchical Multimodal LSTM for Dense Visual-Semantic Embedding,
ICCV17(1899-1907)
IEEE DOI
1802
map sentences and images.
document image processing, image representation,
recurrent neural nets, HM-LSTM, Hierarchical Multimodal LSTM,
Recurrent neural networks
BibRef
Pini, S.[Stefano],
Cornia, M.[Marcella],
Baraldi, L.[Lorenzo],
Cucchiara, R.[Rita],
Towards Video Captioning with Naming:
A Novel Dataset and a Multi-modal Approach,
CIAP17(II:384-395).
Springer DOI
1711
BibRef
Pan, J.Y.[Jia-Yu],
Yang, H.J.[Hyung-Jeong],
Faloutsos, C.[Christos],
MMSS: Graph-based Multi-modal Story-oriented Video Summarization and
Retrieval,
CMU-CS-TR-04-114. 2004.
HTML Version.
0501
BibRef
Pan, J.Y.[Jia-Yu],
Yang, H.J.[Hyung-Jeong],
Faloutsos, C.[Christos],
Duygulu, P.[Pinar],
GCap: Graph-based Automatic Image Captioning,
MMDE04(146).
IEEE DOI
0406
BibRef
Pan, J.Y.[Jia-Yu],
Advanced Tools for Video and Multimedia Mining,
CMU-CS-06-126, May 2006.
Ph.D. Thesis,
HTML Version.
0605
BibRef