13.6.10 Multi-Modal, Cross-Modal Captioning, Image Captioning

Image Captioning. Captioning. Multimodal. Cross-Modal.

Verma, Y.[Yashaswi], Jawahar, C.V.,
A support vector approach for cross-modal search of images and texts,
CVIU(154), No. 1, 2017, pp. 48-63.
Elsevier DOI 1612
Image search BibRef

Dutta, A.[Ayushi], Verma, Y.[Yashaswi], Jawahar, C.V.,
Recurrent Image Annotation with Explicit Inter-Label Dependencies,
ECCV20(XXIX: 191-207).
Springer DOI 2010
BibRef

Xue, J.F.[Jian-Fei], Eguchi, K.[Koji],
Video Data Modeling Using Sequential Correspondence Hierarchical Dirichlet Processes,
IEICE(E100-D), No. 1, January 2017, pp. 33-41.
WWW Link. 1701
Multimodal data such as the mixture of visual words and speech words extracted from video files BibRef

Liu, A.A.[An-An], Xu, N.[Ning], Wong, Y.K.[Yong-Kang], Li, J.[Junnan], Su, Y.T.[Yu-Ting], Kankanhalli, M.[Mohan],
Hierarchical & multimodal video captioning: Discovering and transferring multimodal knowledge for vision to language,
CVIU(163), No. 1, 2017, pp. 113-125.
Elsevier DOI 1712
Video to text BibRef

Guan, J.N.[Jin-Ning], Wang, E.[Eric],
Repeated review based image captioning for image evidence review,
SP:IC(63), 2018, pp. 141-148.
Elsevier DOI 1804
Repeated review, Image captioning, Encoder-decoder, Multimodal layer BibRef

Park, C.C., Kim, B., Kim, G.,
Towards Personalized Image Captioning via Multimodal Memory Networks,
PAMI(41), No. 4, April 2019, pp. 999-1012.
IEEE DOI 1903
BibRef
Earlier:
Attend to You: Personalized Image Captioning with Context Sequence Memory Networks,
CVPR17(6432-6440).
IEEE DOI 1711
Tagging, Twitter, Task analysis, Computational modeling, Writing, Vocabulary, Context modeling, Image captioning, personalization, convolutional neural networks, pattern recognition BibRef

Xian, Y., Tian, Y.,
Self-Guiding Multimodal LSTM: When We Do Not Have a Perfect Training Dataset for Image Captioning,
IP(28), No. 11, November 2019, pp. 5241-5252.
IEEE DOI 1909
Task analysis, Visualization, Training, Semantics, Flickr, Urban areas, Training data, Image captioning, self-guiding, real-world dataset, recurrent neural network BibRef

Yang, M., Zhao, W., Xu, W., Feng, Y., Zhao, Z., Chen, X., Lei, K.,
Multitask Learning for Cross-Domain Image Captioning,
MultMed(21), No. 4, April 2019, pp. 1047-1061.
IEEE DOI 1903
Task analysis, Image generation, Data models, Training data, Neural networks, Training, Maximum likelihood estimation, reinforcement learning BibRef

Yu, N., Hu, X., Song, B., Yang, J., Zhang, J.,
Topic-Oriented Image Captioning Based on Order-Embedding,
IP(28), No. 6, June 2019, pp. 2743-2754.
IEEE DOI 1905
image classification, image matching, image retrieval, learning (artificial intelligence), cross-modal retrieval BibRef

Li, X., Xu, C., Wang, X., Lan, W., Jia, Z., Yang, G., Xu, J.,
COCO-CN for Cross-Lingual Image Tagging, Captioning, and Retrieval,
MultMed(21), No. 9, September 2019, pp. 2347-2360.
IEEE DOI 1909
Image annotation, Task analysis, Training, Image retrieval, Internet, Streaming media, Visualization, COCO-CN, Chinese language, image retrieval BibRef

Tian, C.[Chunna], Tian, M.[Ming], Jiang, M.M.[Meng-Meng], Liu, H.[Heng], Deng, D.H.[Dong-Hu],
How much do cross-modal related semantics benefit image captioning by weighting attributes and re-ranking sentences?,
PRL(125), 2019, pp. 639-645.
Elsevier DOI 1909
Semantic attributes, Attribute reweighting, Cross-modal related semantics, Sentence re-ranking BibRef

Niu, Y., Lu, Z., Wen, J., Xiang, T., Chang, S.,
Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation,
IP(28), No. 4, April 2019, pp. 1720-1731.
IEEE DOI 1901
feature extraction, image classification, image fusion, image representation, learning (artificial intelligence), label quantity prediction BibRef

Huang, Y., Chen, J., Ouyang, W., Wan, W., Xue, Y.,
Image Captioning With End-to-End Attribute Detection and Subsequent Attributes Prediction,
IP(29), 2020, pp. 4013-4026.
IEEE DOI 2002
Image captioning, semantic attention, end-to-end training, multimodal attribute detector, subsequent attribute predictor BibRef

Zhao, W., Wu, X., Luo, J.,
Cross-Domain Image Captioning via Cross-Modal Retrieval and Model Adaptation,
IP(30), 2021, pp. 1180-1192.
IEEE DOI 2012
Adaptation models, Task analysis, Visualization, Computational modeling, Linguistics, Semantics, Image segmentation, model adaptation BibRef

Wang, H.[Hang], Du, Y.T.[You-Tian], Zhang, G.X.[Guang-Xun], Cai, Z.M.[Zhong-Min], Su, C.[Chang],
Learning Fundamental Visual Concepts Based on Evolved Multi-Edge Concept Graph,
MultMed(23), 2021, pp. 4400-4413.
IEEE DOI 2112
Visualization, Semantics, Image annotation, Image edge detection, Data models, Adaptation models, Task analysis, cross media BibRef


Kuo, C.W.[Chia-Wen], Kira, Z.[Zsolt],
Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning,
CVPR22(17948-17958).
IEEE DOI 2210
Measurement, Visualization, Analytical models, Graphical models, Grounding, Computational modeling, Genomics, Vision + language BibRef

Zhou, M.Y.[Ming-Yang], Zhou, L.W.[Luo-Wei], Wang, S.H.[Shuo-Hang], Cheng, Y.[Yu], Li, L.J.[Lin-Jie], Yu, Z.[Zhou], Liu, J.J.[Jing-Jing],
UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training,
CVPR21(4153-4163).
IEEE DOI 2111
Training, Visualization, Benchmark testing, Knowledge discovery, Data models, Pattern recognition, Machine translation BibRef

Laina, I., Rupprecht, C., Navab, N.,
Towards Unsupervised Image Captioning With Shared Multimodal Embeddings,
ICCV19(7413-7423).
IEEE DOI 2004
natural language processing, text analysis, multimodal embeddings, explicit supervision, Semantics BibRef

Akbari, H.[Hassan], Karaman, S.[Svebor], Bhargava, S.[Surabhi], Chen, B.[Brian], Vondrick, C.[Carl], Chang, S.F.[Shih-Fu],
Multi-Level Multimodal Common Semantic Space for Image-Phrase Grounding,
CVPR19(12468-12478).
IEEE DOI 2002
BibRef

Chen, T.H., Liao, Y.H., Chuang, C.Y., Hsu, W.T., Fu, J., Sun, M.,
Show, Adapt and Tell: Adversarial Training of Cross-Domain Image Captioner,
ICCV17(521-530).
IEEE DOI 1802
image processing, inference mechanisms, text analysis, MSCOCO, adversarial training procedure, captioner act, critic networks, Training data BibRef

Niu, Z.X.[Zhen-Xing], Zhou, M.[Mo], Wang, L.[Le], Gao, X.B.[Xin-Bo], Hua, G.[Gang],
Hierarchical Multimodal LSTM for Dense Visual-Semantic Embedding,
ICCV17(1899-1907).
IEEE DOI 1802
Maps sentences and images. Document image processing, image representation, recurrent neural nets, HM-LSTM, Hierarchical Multimodal LSTM, recurrent neural networks BibRef

Pini, S.[Stefano], Cornia, M.[Marcella], Baraldi, L.[Lorenzo], Cucchiara, R.[Rita],
Towards Video Captioning with Naming: A Novel Dataset and a Multi-modal Approach,
CIAP17(II:384-395).
Springer DOI 1711
BibRef

Pan, J.Y.[Jia-Yu], Yang, H.J.[Hyung-Jeong], Faloutsos, C.[Christos],
MMSS: Graph-based Multi-modal Story-oriented Video Summarization and Retrieval,
CMU-CS-TR-04-114. 2004.
HTML Version. 0501
BibRef

Pan, J.Y.[Jia-Yu], Yang, H.J.[Hyung-Jeong], Faloutsos, C.[Christos], Duygulu, P.[Pinar],
GCap: Graph-based Automatic Image Captioning,
MMDE04(146).
IEEE DOI 0406
BibRef

Pan, J.Y.[Jia-Yu],
Advanced Tools for Video and Multimedia Mining,
Ph.D. Thesis, CMU-CS-06-126, May 2006.
HTML Version. 0605 BibRef

Chapter on Matching and Recognition Using Volumes, High Level Vision Techniques, Invariants continues in
Transformer for Captioning, Image Captioning.


Last update: May 22, 2023 at 22:32:27