15.3.1.11 Vision-Language Navigation

Navigation. Vision-Language.

Bajcsy, R., and Nagel, H.H.,
Descriptive and Prescriptive Languages for Mobility Tasks: Are They Different?,
AIU96(280-300). BibRef 9600

Zhu, M., Chen, W., Xia, J., Ma, Y., Zhang, Y., Luo, Y., Huang, Z., Liu, L.,
Location2Vec: A Situation-Aware Representation for Visual Exploration of Urban Locations,
ITS(20), No. 10, October 2019, pp. 3981-3990.
IEEE DOI 1910
Trajectory, Visualization, Sociology, Statistics, Vehicle dynamics, Mobile handsets, Natural language processing, Human mobility, visual exploration BibRef

Li, P.[Pei], Li, X.[Xinde], Li, X.H.[Xiang-Hui], Pan, H.[Hong], Khyam, M.O., Noor-A-Rahim, M., Ge, S.S.[Shuzhi Sam],
Place perception from the fusion of different image representation,
PR(110), 2021, pp. 107680.
Elsevier DOI 2011
Indoor place perception, CNN, LSTM, Convolutional auto-encoder, Natural language BibRef

Wu, Z.K.[Zong-Kai], Liu, Z.[Zihan], Wang, T.[Ting], Wang, D.L.[Dong-Lin],
Improved Speaker and Navigator for Vision-and-Language Navigation,
MultMedMag(28), No. 4, October 2021, pp. 55-63.
IEEE DOI 2112
Navigation, Visualization, Decoding, Trajectory, Task analysis, Feature extraction, Head BibRef

Wang, X.[Xin], Huang, Q.Y.[Qiu-Yuan], Celikyilmaz, A.[Asli], Gao, J.F.[Jian-Feng], Shen, D.[Dinghan], Wang, Y.F.[Yuan-Fang], Wang, W.Y.[William Yang], Zhang, L.[Lei],
Vision-Language Navigation Policy Learning and Adaptation,
PAMI(43), No. 12, December 2021, pp. 4205-4216.
IEEE DOI 2112
BibRef
Earlier:
Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation,
CVPR19(6622-6631).
IEEE DOI Award, CVPR, Student. 2002
Navigation, Visualization, Cognition, Reinforcement learning, Natural languages, Benchmark testing, multimodal machine learning BibRef


Irshad, M.Z.[Muhammad Zubair], Mithun, N.C.[Niluthpol Chowdhury], Seymour, Z.[Zachary], Chiu, H.P.[Han-Pang], Samarasekera, S.[Supun], Kumar, R.[Rakesh],
Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments,
ICPR22(4065-4071)
IEEE DOI 2212
Visualization, Navigation, Semantics, Natural languages, Transformers, Feature extraction BibRef

Ossandón, J.[Joaquín], Earle, B.[Benjamín], Soto, Á.[Álvaro],
Bridging the Visual Semantic Gap in VLN via Semantically Richer Instructions,
ECCV22(XXXVII:54-69).
Springer DOI 2211
Visual-and-Language Navigation BibRef

Burns, A.[Andrea], Arsan, D.[Deniz], Agrawal, S.[Sanjna], Kumar, R.[Ranjitha], Saenko, K.[Kate], Plummer, B.A.[Bryan A.],
A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility,
ECCV22(VIII:312-328).
Springer DOI 2211
BibRef

Huang, Z.M.[Zan-Ming], Shangguan, Z.[Zhongkai], Zhang, J.Y.[Jimu-Yang], Bar, G.[Gilad], Boyd, M.[Matthew], Ohn-Bar, E.[Eshed],
ASSISTER: Assistive Navigation via Conditional Instruction Generation,
ECCV22(XXXVI:271-289).
Springer DOI 2211
BibRef

Zhou, K.[Kaiwen], Wang, X.E.[Xin Eric],
FedVLN: Privacy-Preserving Federated Vision-and-Language Navigation,
ECCV22(XXXVI:682-699).
Springer DOI 2211
BibRef

Chen, S.[Shizhe], Guhur, P.L.[Pierre-Louis], Tapaswi, M.[Makarand], Schmid, C.[Cordelia], Laptev, I.[Ivan],
Learning from Unlabeled 3D Environments for Vision-and-Language Navigation,
ECCV22(XXIX:638-655).
Springer DOI 2211
BibRef

Krantz, J.[Jacob], Lee, S.[Stefan],
Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments,
ECCV22(XXIX:588-603).
Springer DOI 2211
BibRef

Lin, C.[Chuang], Jiang, Y.[Yi], Cai, J.F.[Jian-Fei], Qu, L.Z.[Li-Zhen], Haffari, G.[Gholamreza], Yuan, Z.H.[Ze-Huan],
Multimodal Transformer with Variable-Length Memory for Vision-and-Language Navigation,
ECCV22(XXXVI:380-397).
Springer DOI 2211
BibRef

Cheng, W.H.[Wen-Hao], Dong, X.P.[Xing-Ping], Khan, S.[Salman], Shen, J.B.[Jian-Bing],
Learning Disentanglement with Decoupled Labels for Vision-Language Navigation,
ECCV22(XXXVI:309-329).
Springer DOI 2211
BibRef

Kolmet, M.[Manuel], Zhou, Q.[Qunjie], Ošep, A.[Aljoša], Leal-Taixé, L.[Laura],
Text2Pos: Text-to-Point-Cloud Cross-Modal Localization,
CVPR22(6677-6686)
IEEE DOI 2210
To specify a location. Location awareness, Point cloud compression, Visualization, Navigation, Mobile handsets, Pattern recognition, Navigation and autonomous driving BibRef

Partsey, R.[Ruslan], Wijmans, E.[Erik], Yokoyama, N.[Naoki], Dobosevych, O.[Oles], Batra, D.[Dhruv], Maksymets, O.[Oleksandr],
Is Mapping Necessary for Realistic PointGoal Navigation?,
CVPR22(17211-17220)
IEEE DOI 2210
Recurrent neural networks, Navigation, Robot vision systems, Reinforcement learning, Benchmark testing, Sensors, Robot vision BibRef

Ramakrishnan, S.K.[Santhosh Kumar], Chaplot, D.S.[Devendra Singh], Al-Halah, Z.[Ziad], Malik, J.[Jitendra], Grauman, K.[Kristen],
PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning,
CVPR22(18868-18878)
IEEE DOI 2210
Training, Solid modeling, Navigation, Design methodology, Semantics, Supervised learning, Scene analysis and understanding, Robot vision BibRef

Chen, S.Z.[Shi-Zhe], Guhur, P.L.[Pierre-Louis], Tapaswi, M.[Makarand], Schmid, C.[Cordelia], Laptev, I.[Ivan],
Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation,
CVPR22(16516-16526)
IEEE DOI 2210
Visualization, Navigation, Grounding, Benchmark testing, Transformers, Encoding, Vision+language, Navigation and autonomous driving BibRef

Zhou, M.Y.[Ming-Yang], Yu, L.C.[Li-Cheng], Singh, A.[Amanpreet], Wang, M.J.[Meng-Jiao], Yu, Z.[Zhou], Zhang, N.[Ning],
Unsupervised Vision-and-Language Pretraining via Retrieval-based Multi-Granular Alignment,
CVPR22(16464-16473)
IEEE DOI 2210
Adaptation models, Visualization, Computational modeling, Benchmark testing, Data models, Pattern recognition, Self- semi- meta- unsupervised learning BibRef

Qiao, Y.[Yanyuan], Qi, Y.[Yuankai], Hong, Y.C.[Yi-Cong], Yu, Z.[Zheng], Wang, P.[Peng], Wu, Q.[Qi],
HOP: History-and-Order Aware Pretraining for Vision-and-Language Navigation,
CVPR22(15397-15406)
IEEE DOI 2210
Visualization, Navigation, Computational modeling, Decision making, Trajectory, Pattern recognition, Vision+language, Navigation and autonomous driving BibRef

Wang, S.[Su], Montgomery, C.[Ceslee], Orbay, J.[Jordi], Birodkar, V.[Vighnesh], Faust, A.[Aleksandra], Gur, I.[Izzeddin], Jaques, N.[Natasha], Waters, A.[Austin], Baldridge, J.[Jason], Anderson, P.[Peter],
Less is More: Generating Grounded Navigation Instructions from Landmarks,
CVPR22(15407-15417)
IEEE DOI 2210
Training, Visualization, Navigation, Grounding, Focusing, Detectors, Multitasking, Vision+language BibRef

Hong, Y.C.[Yi-Cong], Wang, Z.[Zun], Wu, Q.[Qi], Gould, S.[Stephen],
Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation,
CVPR22(15418-15428)
IEEE DOI 2210
Training, Bridges, Navigation, Grounding, Pattern recognition, Task analysis, Vision+language BibRef

Chen, J.[Jinyu], Gao, C.[Chen], Meng, E.[Erli], Zhang, Q.[Qiong], Liu, S.[Si],
Reinforced Structured State-Evolution for Vision-Language Navigation,
CVPR22(15429-15438)
IEEE DOI 2210
Navigation, Computational modeling, Layout, Natural languages, Reinforcement learning, Predictive models, Vision+language BibRef

Georgakis, G.[Georgios], Schmeckpeper, K.[Karl], Wanchoo, K.[Karan], Dan, S.[Soham], Miltsakaki, E.[Eleni], Roth, D.[Dan], Daniilidis, K.[Kostas],
Cross-modal Map Learning for Vision and Language Navigation,
CVPR22(15439-15449)
IEEE DOI 2210
Navigation, Grounding, Semantics, Natural languages, Predictive models, Benchmark testing, Vision+language, Navigation and autonomous driving BibRef

Wang, H.Q.[Han-Qing], Liang, W.[Wei], Shen, J.B.[Jian-Bing], Van Gool, L.J.[Luc J.], Wang, W.[Wenguan],
Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation,
CVPR22(15450-15460)
IEEE DOI 2210
Training, Learning systems, Correlation, Navigation, Computational modeling, Buildings, Vision+language BibRef

Song, C.H.[Chan Hee], Kil, J.[Jihyung], Pan, T.Y.[Tai-Yu], Sadler, B.M.[Brian M.], Chao, W.L.[Wei-Lun], Su, Y.[Yu],
One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones,
CVPR22(15461-15470)
IEEE DOI 2210
Navigation, Computational modeling, Robot vision systems, Machine learning, Autonomous agents, Pattern recognition, Robot vision BibRef

Guhur, P.L.[Pierre-Louis], Tapaswi, M.[Makarand], Chen, S.Z.[Shi-Zhe], Laptev, I.[Ivan], Schmid, C.[Cordelia],
Airbert: In-Domain Pretraining for Vision-and-Language Navigation,
ICCV21(1614-1623)
IEEE DOI 2203
Adaptation models, Navigation, Atmospheric modeling, Computational modeling, Natural languages, Training data, Vision for robotics and autonomous vehicles BibRef

Liu, C.[Chong], Zhu, F.[Fengda], Chang, X.J.[Xiao-Jun], Liang, X.D.[Xiao-Dan], Ge, Z.[Zongyuan], Shen, Y.D.[Yi-Dong],
Vision-Language Navigation with Random Environmental Mixup,
ICCV21(1624-1634)
IEEE DOI 2203
Visualization, Navigation, Natural languages, Benchmark testing, Data models, Task analysis, Vision+language BibRef

Qi, Y.[Yuankai], Pan, Z.Z.[Zi-Zheng], Hong, Y.C.[Yi-Cong], Yang, M.H.[Ming-Hsuan], van den Hengel, A.J.[Anton J.], Wu, Q.[Qi],
The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation,
ICCV21(1635-1644)
IEEE DOI 2203
Visualization, TV, Navigation, Roads, Bit error rate, Predictive models, Linguistics, Vision+language BibRef

Liu, Z.Y.[Zhe-Yuan], Rodriguez-Opazo, C.[Cristian], Teney, D.[Damien], Gould, S.[Stephen],
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models,
ICCV21(2105-2114)
IEEE DOI 2203
Visualization, Limiting, Codes, Image retrieval, Natural languages, Computer architecture, Vision+language, Representation learning BibRef

Pashevich, A.[Alexander], Schmid, C.[Cordelia], Sun, C.[Chen],
Episodic Transformer for Vision-and-Language Navigation,
ICCV21(15922-15932)
IEEE DOI 2203
Training, Visualization, Navigation, Natural languages, Detectors, Benchmark testing, Transformers, Vision+language BibRef

Ding, H.H.[Heng-Hui], Liu, C.[Chang], Wang, S.[Suchen], Jiang, X.D.[Xu-Dong],
Vision-Language Transformer and Query Generation for Referring Segmentation,
ICCV21(16301-16310)
IEEE DOI 2203
Convolutional codes, Image segmentation, Visualization, Computational modeling, Computer architecture, Transformers, Vision+language BibRef

Chen, K.[Kevin], Chen, J.K.[Junshen K.], Chuang, J.[Jo], Vázquez, M.[Marynel], Savarese, S.[Silvio],
Topological Planning with Transformers for Vision-and-Language Navigation,
CVPR21(11271-11281)
IEEE DOI 2111
Backtracking, Navigation, Natural languages, Buildings, Transformers, Planning BibRef

Badki, A.[Abhishek], Gallo, O.[Orazio], Kautz, J.[Jan], Sen, P.[Pradeep],
Binary TTC: A Temporal Geofence for Autonomous Navigation,
CVPR21(12941-12950)
IEEE DOI 2111
Quantization (signal), Estimation, Tools, Observers, Cameras, Real-time systems BibRef

Wang, H.Q.[Han-Qing], Wang, W.G.[Wen-Guan], Liang, W.[Wei], Xiong, C.M.[Cai-Ming], Shen, J.B.[Jian-Bing],
Structured Scene Memory for Vision-Language Navigation,
CVPR21(8451-8460)
IEEE DOI 2111
Visualization, Recurrent neural networks, Navigation, Decision making, Layout, Memory architecture BibRef

Wang, H.Q.[Han-Qing], Wang, W.[Wenguan], Shu, T.[Tianmin], Liang, W.[Wei], Shen, J.B.[Jian-Bing],
Active Visual Information Gathering for Vision-language Navigation,
ECCV20(XXII:307-322).
Springer DOI 2011
BibRef

Cao, J.[Jize], Gan, Z.[Zhe], Cheng, Y.[Yu], Yu, L.C.[Li-Cheng], Chen, Y.C.[Yen-Chun], Liu, J.J.[Jing-Jing],
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-language Models,
ECCV20(VI:565-580).
Springer DOI 2011
BibRef

Moghaddam, M.K.[Mahdi Kazemi], Abbasnejad, E.[Ehsan], Wu, Q.[Qi], Shi, J.Q.F.[Javen Qin-Feng], van den Hengel, A.J.[Anton J.],
ForeSI: Success-Aware Visual Navigation Agent,
WACV22(3401-3410)
IEEE DOI 2202
Training, Visualization, Navigation, Detectors, Reinforcement learning, Predictive models, Analysis and Understanding BibRef

Qi, Y., Wu, Q., Anderson, P., Wang, X., Wang, W.Y., Shen, C., van den Hengel, A.J.,
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments,
CVPR20(9979-9988)
IEEE DOI 2008
Task analysis, Navigation, Robots, Natural languages, Visualization, Object recognition, Indoor environments BibRef

Qi, Y.K.[Yuan-Kai], Pan, Z.Z.[Zi-Zheng], Zhang, S.P.[Sheng-Ping], van den Hengel, A.J.[Anton J.], Wu, Q.[Qi],
Object-and-action Aware Model for Visual Language Navigation,
ECCV20(X:303-317).
Springer DOI 2011
BibRef

Krantz, J.[Jacob], Wijmans, E.[Erik], Majumdar, A.[Arjun], Batra, D.[Dhruv], Lee, S.[Stefan],
Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments,
ECCV20(XXVIII:104-120).
Springer DOI 2011
Agents must execute low-level actions to follow natural language navigation directions. BibRef

Wang, H.[Hu], Wu, Q.[Qi], Shen, C.H.[Chun-Hua],
Soft Expert Reward Learning for Vision-and-Language Navigation,
ECCV20(IX:126-141).
Springer DOI 2011
BibRef

Kim, J., Moon, S., Rohrbach, A., Darrell, T.J., Canny, J.,
Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules,
CVPR20(9658-9667)
IEEE DOI 2008
Visualization, Semantics, Natural languages, Image segmentation, Generators, Training, Roads BibRef

Fu, T.J.[Tsu-Jui], Wang, X.E.[Xin Eric], Peterson, M.F.[Matthew F.], Grafton, S.T.[Scott T.], Eckstein, M.P.[Miguel P.], Wang, W.Y.[William Yang],
Counterfactual Vision-and-Language Navigation via Adversarial Path Sampler,
ECCV20(VI:71-86).
Springer DOI 2011
Based on language descriptions, relate them to the environment. BibRef

Majumdar, A.[Arjun], Shrivastava, A.[Ayush], Lee, S.[Stefan], Anderson, P.[Peter], Parikh, D.[Devi], Batra, D.[Dhruv],
Improving Vision-and-language Navigation with Image-text Pairs from the Web,
ECCV20(VI:259-274).
Springer DOI 2011
BibRef

Zhu, F.D.[Feng-Da], Zhu, Y.[Yi], Chang, X.J.[Xiao-Jun], Liang, X.D.[Xiao-Dan],
Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks,
CVPR20(10009-10019)
IEEE DOI 2008
Task analysis, Navigation, Cognition, Trajectory, Semantics, Training, Natural languages BibRef

Hao, W., Li, C., Li, X., Carin, L., Gao, J.,
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-Training,
CVPR20(13134-13143)
IEEE DOI 2008
Task analysis, Navigation, Visualization, Trajectory, Presses, Head, Predictive models BibRef

Yu, F., Deng, Z., Narasimhan, K., Russakovsky, O.,
Take the Scenic Route: Improving Generalization in Vision-and-Language Navigation,
VL3W20(4000-4004)
IEEE DOI 2008
Navigation, Benchmark testing, Task analysis, Natural languages, Visualization, Training data, Markov processes BibRef

Ma, C.Y.[Chih-Yao], Wu, Z.X.[Zu-Xuan], Al Regib, G.[Ghassan], Xiong, C.M.[Cai-Ming], Kira, Z.[Zsolt],
The Regretful Agent: Heuristic-Aided Navigation Through Progress Estimation,
CVPR19(6725-6733).
IEEE DOI 2002
Navigating to a goal purely from language instructions and visual information. BibRef

Ke, L.Y.M.[Li-Yi-Ming], Li, X.J.[Xiu-Jun], Bisk, Y.[Yonatan], Holtzman, A.[Ari], Gan, Z.[Zhe], Liu, J.J.[Jing-Jing], Gao, J.F.[Jian-Feng], Choi, Y.J.[Ye-Jin], Srinivasa, S.[Siddhartha],
Tactical Rewind: Self-Correction via Backtracking in Vision-And-Language Navigation,
CVPR19(6734-6742).
IEEE DOI 2002
BibRef

Wang, X.[Xin], Xiong, W.H.[Wen-Han], Wang, H.M.[Hong-Min], Wang, W.Y.[William Yang],
Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation,
ECCV18(XVI:38-55).
Springer DOI 1810
BibRef

Anderson, P.[Peter], Wu, Q.[Qi], Teney, D.[Damien], Bruce, J.[Jake], Johnson, M.[Mark], Sünderhauf, N.[Niko], Reid, I.D.[Ian D.], Gould, S.[Stephen], van den Hengel, A.J.[Anton J.],
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments,
CVPR18(3674-3683)
IEEE DOI 1812
Navigation, Task analysis, Robots, Visualization, Cameras, Natural languages BibRef

Chen, H.[Howard], Suhr, A.[Alane], Misra, D.[Dipendra], Snavely, N.[Noah], Artzi, Y.[Yoav],
TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments,
CVPR19(12530-12539).
IEEE DOI 2002
BibRef

Nguyen, K.[Khanh], Dey, D.[Debadeepta], Brockett, C.[Chris], Dolan, B.[Bill],
Vision-Based Navigation With Language-Based Assistance via Imitation Learning With Indirect Intervention,
CVPR19(12519-12529).
IEEE DOI 2002
BibRef

Khoshelham, K., Díaz-Vilariño, L.,
3D Modelling of Interior Spaces: Learning the Language of Indoor Architecture,
CloseRange14(321-326).
DOI Link 1411
BibRef

van Laere, O.[Olivier], Schockaert, S.[Steven], Dhoedt, B.[Bart],
Finding locations of Flickr resources using language models and similarity search,
ICMR11(48).
DOI Link 1301
Estimate where a given photo or video was taken, using only the tags that a user has assigned. BibRef

Chapter on Active Vision, Camera Calibration, Mobile Robots, Navigation, Road Following continues in
Visual SLAM: Simultaneous Location and Mapping or Matching.


Last update: Jan 23, 2023 at 16:42:47