14.5.9.5.1 Vision Transformers, ViT

Vision Transformers. Transformers. Shift, Scale, and Distortion Invariance.

Bazi, Y.[Yakoub], Bashmal, L.[Laila], Al Rahhal, M.M.[Mohamad M.], Al Dayil, R.[Reham], Al Ajlan, N.[Naif],
Vision Transformers for Remote Sensing Image Classification,
RS(13), No. 3, 2021, pp. xx-yy.
DOI Link 2102
BibRef

Hu, H.Q.[Hao-Qi], Lu, X.F.[Xiao-Feng], Zhang, X.P.[Xin-Peng], Zhang, T.X.[Tian-Xing], Sun, G.L.[Guang-Ling],
Inheritance Attention Matrix-Based Universal Adversarial Perturbations on Vision Transformers,
SPLetters(28), 2021, pp. 1923-1927.
IEEE DOI 2110
Perturbation methods, Robustness, Visualization, Transformers, Optimization, Task analysis, Head, Vision Transformers, self-attention BibRef

Li, T.[Tao], Zhang, Z.[Zheng], Pei, L.[Lishen], Gan, Y.[Yan],
HashFormer: Vision Transformer Based Deep Hashing for Image Retrieval,
SPLetters(29), 2022, pp. 827-831.
IEEE DOI 2204
Transformers, Binary codes, Task analysis, Training, Image retrieval, Feature extraction, Databases, Binary embedding, image retrieval BibRef

Jiang, B.[Bo], Zhao, K.K.[Kang-Kang], Tang, J.[Jin],
RGTransformer: Region-Graph Transformer for Image Representation and Few-Shot Classification,
SPLetters(29), 2022, pp. 792-796.
IEEE DOI 2204
Measurement, Transformers, Image representation, Feature extraction, Visualization, transformer BibRef

Chen, Z.M.[Zhao-Min], Cui, Q.[Quan], Zhao, B.[Borui], Song, R.J.[Ren-Jie], Zhang, X.Q.[Xiao-Qin], Yoshie, O.[Osamu],
SST: Spatial and Semantic Transformers for Multi-Label Image Recognition,
IP(31), 2022, pp. 2570-2583.
IEEE DOI 2204
Correlation, Semantics, Transformers, Image recognition, Task analysis, Training, Feature extraction, label correlation BibRef

Xue, Z.X.[Zhi-Xiang], Tan, X.[Xiong], Yu, X.[Xuchu], Liu, B.[Bing], Yu, A.[Anzhu], Zhang, P.Q.[Peng-Qiang],
Deep Hierarchical Vision Transformer for Hyperspectral and LiDAR Data Classification,
IP(31), 2022, pp. 3095-3110.
IEEE DOI 2205
Feature extraction, Transformers, Hyperspectral imaging, Laser radar, Data mining, Collaboration, Data models, cross attention fusion BibRef

Wang, G.H.[Guang-Hui], Li, B.[Bin], Zhang, T.[Tao], Zhang, S.[Shubi],
A Network Combining a Transformer and a Convolutional Neural Network for Remote Sensing Image Change Detection,
RS(14), No. 9, 2022, pp. xx-yy.
DOI Link 2205
BibRef

Luo, G.[Gen], Zhou, Y.[Yiyi], Sun, X.S.[Xiao-Shuai], Wang, Y.[Yan], Cao, L.J.[Liu-Juan], Wu, Y.J.[Yong-Jian], Huang, F.Y.[Fei-Yue], Ji, R.R.[Rong-Rong],
Towards Lightweight Transformer Via Group-Wise Transformation for Vision-and-Language Tasks,
IP(31), 2022, pp. 3386-3398.
IEEE DOI 2205
Transformers, Task analysis, Computational modeling, Benchmark testing, Visualization, Convolution, Head, reference expression comprehension BibRef

Tu, Y.B.[Yun-Bin], Li, L.[Liang], Su, L.[Li], Gao, S.X.[Sheng-Xiang], Yan, C.G.[Cheng-Gang], Zha, Z.J.[Zheng-Jun], Yu, Z.T.[Zheng-Tao], Huang, Q.M.[Qing-Ming],
I2-Transformer: Intra- and Inter-Relation Embedding Transformer for TV Show Captioning,
IP(31), 2022, pp. 3565-3577.
IEEE DOI 2206
Transformers, Semantics, Task analysis, Visualization, TV, Graph neural networks, TV Show captioning, transformer BibRef

Heo, J.[Jiseong], Wang, Y.[Yooseung], Park, J.[Jihun],
Occlusion-aware spatial attention transformer for occluded object recognition,
PRL(159), 2022, pp. 70-76.
Elsevier DOI 2206
Occluded object recognition, Visual transformer, Spatial attention BibRef

Wang, J.Y.[Jia-Yun], Chakraborty, R.[Rudrasis], Yu, S.X.[Stella X.],
Transformer for 3D Point Clouds,
PAMI(44), No. 8, August 2022, pp. 4419-4431.
IEEE DOI 2207
Convolution, Feature extraction, Shape, Semantics, Task analysis, Measurement, point cloud, transformation, deformable, segmentation, 3D detection BibRef

Wang, L.[Libo], Li, R.[Rui], Zhang, C.[Ce], Fang, S.H.[Sheng-Hui], Duan, C.X.[Chen-Xi], Meng, X.L.[Xiao-Liang], Atkinson, P.M.[Peter M.],
UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery,
PandRS(190), 2022, pp. 196-214.
Elsevier DOI 2208
Semantic Segmentation, Remote Sensing, Vision Transformer, Fully Transformer Network, Global-local Context, Urban Scene BibRef

Kheldouni, A.[Amine], Boumhidi, J.[Jaouad],
A Study of Bidirectional Encoder Representations from Transformers for Sequential Recommendations,
ISCV22(1-5)
IEEE DOI 2208
Knowledge engineering, Recurrent neural networks, Computer architecture, Predictive models, Markov processes BibRef

Zhao, H.S.[Heng-Shuang], Jiang, L.[Li], Jia, J.Y.[Jia-Ya], Torr, P.H.S.[Philip H.S.], Koltun, V.[Vladlen],
Point Transformer,
ICCV21(16239-16248)
IEEE DOI 2203
Point cloud compression, Measurement, Image segmentation, Semantics, Object detection, Transformer cores, Recognition and classification BibRef

Shao, R.Z.[Rui-Zhi], Wu, G.[Gaochang], Zhou, Y.M.[Yue-Mei], Fu, Y.[Ying], Fang, L.[Lu], Liu, Y.B.[Ye-Bin],
LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution Homography Estimation,
ICCV21(14870-14879)
IEEE DOI 2203
Photography, Superresolution, Estimation, Transformers, Light fields, Kernel, Image and video synthesis, Computational photography BibRef

Guo, Z.H.[Zong-Hui], Guo, D.S.[Dong-Sheng], Zheng, H.Y.[Hai-Yong], Gu, Z.R.[Zhao-Rui], Zheng, B.[Bing], Dong, J.Y.[Jun-Yu],
Image Harmonization with Transformer,
ICCV21(14850-14859)
IEEE DOI 2203
Codes, Semantics, Imaging, Computer architecture, Transformers, Convolutional neural networks, Image and video synthesis BibRef

Rombach, R.[Robin], Esser, P.[Patrick], Ommer, B.[Björn],
Geometry-Free View Synthesis: Transformers and no 3D Priors,
ICCV21(14336-14346)
IEEE DOI 2203
Solid modeling, Visualization, Geometric modeling, Computer architecture, Transformers, Image and video synthesis, Neural generative models BibRef

Tan, J.[Jing], Tang, J.Q.[Jia-Qi], Wang, L.M.[Li-Min], Wu, G.S.[Gang-Shan],
Relaxed Transformer Decoders for Direct Action Proposal Generation,
ICCV21(13506-13515)
IEEE DOI 2203
Visualization, Head, Pipelines, Estimation, Transformers, Decoding, Action and behavior recognition, Video analysis and understanding BibRef

Lin, K.[Kevin], Wang, L.[Lijuan], Liu, Z.C.[Zi-Cheng],
Mesh Graphormer,
ICCV21(12919-12928)
IEEE DOI 2203
Convolutional codes, Solid modeling, Network topology, Computer architecture, Transformers, Gestures and body pose BibRef

Liu, S.[Song], Fan, H.Q.[Hao-Qi], Qian, S.S.[Sheng-Sheng], Chen, Y.[Yiru], Ding, W.[Wenkui], Wang, Z.Y.[Zhong-Yuan],
HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval,
ICCV21(11895-11905)
IEEE DOI 2203
Training, Memory management, Streaming media, Performance gain, Benchmark testing, Transformers, Image and video retrieval, Vision + other modalities BibRef

Casey, E.[Evan], Pérez, V.[Víctor], Li, Z.[Zhuoru],
The Animation Transformer: Visual Correspondence via Segment Matching,
ICCV21(11303-11312)
IEEE DOI 2203
Visualization, Image segmentation, Image color analysis, Production, Animation, Transformers, grouping and shape BibRef

Reizenstein, J.[Jeremy], Shapovalov, R.[Roman], Henzler, P.[Philipp], Sbordone, L.[Luca], Labatut, P.[Patrick], Novotny, D.[David],
Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction,
ICCV21(10881-10891)
IEEE DOI 2203
Award, Marr Prize, HM. Point cloud compression, Transformers, Rendering (computer graphics), Cameras, Image reconstruction, 3D from multiview and other sensors BibRef

Mariotti, O.[Octave], Aodha, O.M.[Oisin Mac], Bilen, H.[Hakan],
ViewNet: Unsupervised Viewpoint Estimation from Conditional Generation,
ICCV21(10398-10408)
IEEE DOI 2203
Training, Annotations, Estimation, Benchmark testing, Transformers, Representation learning, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Feng, W.X.[Wei-Xin], Wang, Y.J.[Yuan-Jiang], Ma, L.H.[Li-Hua], Yuan, Y.[Ye], Zhang, C.[Chi],
Temporal Knowledge Consistency for Unsupervised Visual Representation Learning,
ICCV21(10150-10160)
IEEE DOI 2203
Training, Representation learning, Visualization, Protocols, Object detection, Semisupervised learning, Transformers, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Wu, H.P.[Hai-Ping], Xiao, B.[Bin], Codella, N.[Noel], Liu, M.C.[Meng-Chen], Dai, X.Y.[Xi-Yang], Yuan, L.[Lu], Zhang, L.[Lei],
CvT: Introducing Convolutions to Vision Transformers,
ICCV21(22-31)
IEEE DOI 2203
Code, Vision Transformer.
WWW Link. Convolutional codes, Image resolution, Image recognition, Computer architecture, Performance gain, Transformers, Distortion BibRef

Touvron, H.[Hugo], Cord, M.[Matthieu], Sablayrolles, A.[Alexandre], Synnaeve, G.[Gabriel], Jégou, H.[Hervé],
Going deeper with Image Transformers,
ICCV21(32-42)
IEEE DOI 2203
Training, Neural networks, Training data, Computer architecture, Data models, Circuit faults, Recognition and classification, Optimization and learning methods BibRef

Zhao, J.W.[Jia-Wei], Yan, K.[Ke], Zhao, Y.[Yifan], Guo, X.W.[Xiao-Wei], Huang, F.Y.[Fei-Yue], Li, J.[Jia],
Transformer-based Dual Relation Graph for Multi-label Image Recognition,
ICCV21(163-172)
IEEE DOI 2203
Image recognition, Correlation, Computational modeling, Semantics, Computer architecture, Benchmark testing, Representation learning BibRef

Chen, C.F.R.[Chun-Fu Richard], Fan, Q.F.[Quan-Fu], Panda, R.[Rameswar],
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification,
ICCV21(347-356)
IEEE DOI 2203
Image segmentation, Image recognition, Computational modeling, Semantics, Memory management, Object detection, Representation learning BibRef

Peng, Z.L.[Zhi-Liang], Huang, W.[Wei], Gu, S.Z.[Shan-Zhi], Xie, L.X.[Ling-Xi], Wang, Y.[Yaowei], Jiao, J.B.[Jian-Bin], Ye, Q.X.[Qi-Xiang],
Conformer: Local Features Coupling Global Representations for Visual Recognition,
ICCV21(357-366)
IEEE DOI 2203
Couplings, Representation learning, Visualization, Fuses, Convolution, Object detection, Transformers, Representation learning BibRef

Pan, Z.Z.[Zi-Zheng], Zhuang, B.[Bohan], Liu, J.[Jing], He, H.Y.[Hao-Yu], Cai, J.F.[Jian-Fei],
Scalable Vision Transformers with Hierarchical Pooling,
ICCV21(367-376)
IEEE DOI 2203
Visualization, Image recognition, Computational modeling, Scalability, Transformers, Computational efficiency, Efficient training and inference methods BibRef

Yue, X.Y.[Xiao-Yu], Sun, S.Y.[Shu-Yang], Kuang, Z.H.[Zhang-Hui], Wei, M.[Meng], Torr, P.[Philip], Zhang, W.[Wayne], Lin, D.[Dahua],
Vision Transformer with Progressive Sampling,
ICCV21(377-386)
IEEE DOI 2203
Codes, Computational modeling, Interference, Computer architecture, Transformers, Feature extraction, Recognition and classification, Representation learning BibRef

Chefer, H.[Hila], Gur, S.[Shir], Wolf, L.[Lior],
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers,
ICCV21(387-396)
IEEE DOI 2203
Measurement, Visualization, Image segmentation, Computational modeling, Computer architecture, Object detection, Vision + language BibRef

Yuan, L.[Li], Chen, Y.P.[Yun-Peng], Wang, T.[Tao], Yu, W.[Weihao], Shi, Y.J.[Yu-Jun], Jiang, Z.[Zihang], Tay, F.E.H.[Francis E. H.], Feng, J.S.[Jia-Shi], Yan, S.C.[Shui-Cheng],
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet,
ICCV21(538-547)
IEEE DOI 2203
Training, Image resolution, Computational modeling, Image edge detection, Computer architecture, Transformers BibRef

Wu, B.[Bichen], Xu, C.F.[Chen-Feng], Dai, X.L.[Xiao-Liang], Wan, A.[Alvin], Zhang, P.Z.[Pei-Zhao], Yan, Z.C.[Zhi-Cheng], Tomizuka, M.[Masayoshi], Gonzalez, J.[Joseph], Keutzer, K.[Kurt], Vajda, P.[Peter],
Visual Transformers: Where Do Transformers Really Belong in Vision Models?,
ICCV21(579-589)
IEEE DOI 2203
Training, Visualization, Image segmentation, Lips, Computational modeling, Semantics, Vision applications and systems BibRef

Truong, T.D.[Thanh-Dat], Duong, C.N.[Chi Nhan], Vu, T.D.[The De], Pham, H.A.[Hoang Anh], Raj, B.[Bhiksha], Le, N.[Ngan], Luu, K.[Khoa],
The Right to Talk: An Audio-Visual Transformer Approach,
ICCV21(1085-1094)
IEEE DOI 2203
Location awareness, Visualization, Correlation, Interrupters, Transformers, Feature extraction, Regulation, Video analysis and understanding BibRef

Hu, R.[Ronghang], Singh, A.[Amanpreet],
UniT: Multimodal Multitask Learning with a Unified Transformer,
ICCV21(1419-1429)
IEEE DOI 2203
Training, Natural languages, Computer architecture, Object detection, Predictive models, Transformers, Multitasking, Representation learning BibRef

Qiu, Y.[Yue], Yamamoto, S.[Shintaro], Nakashima, K.[Kodai], Suzuki, R.[Ryota], Iwata, K.[Kenji], Kataoka, H.[Hirokatsu], Satoh, Y.[Yutaka],
Describing and Localizing Multiple Changes with Transformers,
ICCV21(1951-1960)
IEEE DOI 2203
Measurement, Location awareness, Codes, Natural languages, Benchmark testing, Transformers, Vision + language, Vision applications and systems BibRef

Song, M.[Myungseo], Choi, J.[Jinyoung], Han, B.H.[Bo-Hyung],
Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform,
ICCV21(2360-2369)
IEEE DOI 2203
Training, Image coding, Neural networks, Rate-distortion, Transforms, Network architecture, Computational photography, Low-level and physics-based vision BibRef

Weng, W.M.[Wen-Ming], Zhang, Y.[Yueyi], Xiong, Z.W.[Zhi-Wei],
Event-based Video Reconstruction Using Transformer,
ICCV21(2543-2552)
IEEE DOI 2203
Visualization, Computational modeling, Semantics, Memory management, Transformers, Feature extraction, Image and video synthesis BibRef

Sheng, H.[Hualian], Cai, S.[Sijia], Liu, Y.[Yuan], Deng, B.[Bing], Huang, J.Q.[Jian-Qiang], Hua, X.S.[Xian-Sheng], Zhao, M.J.[Min-Jian],
Improving 3D Object Detection with Channel-wise Transformer,
ICCV21(2723-2732)
IEEE DOI 2203
Point cloud compression, Object detection, Detectors, Transforms, Transformers, Encoding, Detection and localization in 2D and 3D BibRef

Zhang, P.[Pengchuan], Dai, X.[Xiyang], Yang, J.W.[Jian-Wei], Xiao, B.[Bin], Yuan, L.[Lu], Zhang, L.[Lei], Gao, J.F.[Jian-Feng],
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding,
ICCV21(2978-2988)
IEEE DOI 2203
Image segmentation, Image coding, Computational modeling, Memory management, Object detection, Transformers, Representation learning BibRef

Dong, Q.[Qi], Tu, Z.W.[Zhuo-Wen], Liao, H.[Haofu], Zhang, Y.T.[Yu-Ting], Mahadevan, V.[Vijay], Soatto, S.[Stefano],
Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries,
ICCV21(3530-3539)
IEEE DOI 2203
Visualization, Detectors, Transformers, Task analysis, Standards, Detection and localization in 2D and 3D, Representation learning BibRef

Wang, T.[Tao], Yuan, L.[Li], Chen, Y.P.[Yun-Peng], Feng, J.S.[Jia-Shi], Yan, S.C.[Shui-Cheng],
PnP-DETR: Towards Efficient Visual Analysis with Transformers,
ICCV21(4641-4650)
IEEE DOI 2203
Adaptation models, Visualization, Image segmentation, Image recognition, Computational modeling, Redundancy, Detection and localization in 2D and 3D BibRef

Fan, H.Q.[Hao-Qi], Xiong, B.[Bo], Mangalam, K.[Karttikeya], Li, Y.[Yanghao], Yan, Z.C.[Zhi-Cheng], Malik, J.[Jitendra], Feichtenhofer, C.[Christoph],
Multiscale Vision Transformers,
ICCV21(6804-6815)
IEEE DOI 2203
Visualization, Image recognition, Codes, Computational modeling, Transformers, Complexity theory, Recognition and classification BibRef

Arnab, A.[Anurag], Dehghani, M.[Mostafa], Heigold, G.[Georg], Sun, C.[Chen], Lucic, M.[Mario], Schmid, C.[Cordelia],
ViViT: A Video Vision Transformer,
ICCV21(6816-6826)
IEEE DOI 2203
Training, Benchmark testing, Transformers, Spatiotemporal phenomena, Kinetic theory, Action and behavior recognition BibRef

Mahmood, K.[Kaleel], Mahmood, R.[Rigel], van Dijk, M.[Marten],
On the Robustness of Vision Transformers to Adversarial Examples,
ICCV21(7818-7827)
IEEE DOI 2203
Computer architecture, Transformers, Robustness, Adversarial machine learning, Security, Machine learning architectures and formulations BibRef

Chen, X.L.[Xin-Lei], Xie, S.[Saining], He, K.[Kaiming],
An Empirical Study of Training Self-Supervised Vision Transformers,
ICCV21(9620-9629)
IEEE DOI 2203
Training, Benchmark testing, Transformers, Standards, Representation learning, Recognition and classification, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Caron, M.[Mathilde], Touvron, H.[Hugo], Misra, I.[Ishan], Jégou, H.[Hervé], Mairal, J.[Julien], Bojanowski, P.[Piotr], Joulin, A.[Armand],
Emerging Properties in Self-Supervised Vision Transformers,
ICCV21(9630-9640)
IEEE DOI 2203
Training, Image segmentation, Semantics, Layout, Image retrieval, Computer architecture, Representation learning, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Yuan, Y.[Ye], Weng, X.[Xinshuo], Ou, Y.[Yanglan], Kitani, K.[Kris],
AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting,
ICCV21(9793-9803)
IEEE DOI 2203
Uncertainty, Stochastic processes, Predictive models, Transformers, Encoding, Trajectory, Motion and tracking, Vision for robotics and autonomous vehicles BibRef

Xu, W.J.[Wei-Jian], Xu, Y.[Yifan], Chang, T.[Tyler], Tu, Z.W.[Zhuo-Wen],
Co-Scale Conv-Attentional Image Transformers,
ICCV21(9961-9970)
IEEE DOI 2203
Image segmentation, Computational modeling, Object detection, Transformers, Convolutional neural networks, Task analysis, Recognition and classification BibRef

Wu, K.[Kan], Peng, H.[Houwen], Chen, M.[Minghao], Fu, J.L.[Jian-Long], Chao, H.Y.[Hong-Yang],
Rethinking and Improving Relative Position Encoding for Vision Transformer,
ICCV21(10013-10021)
IEEE DOI 2203
Image coding, Codes, Computational modeling, Transformers, Encoding, Natural language processing, Datasets and evaluation, Recognition and classification BibRef

Bhojanapalli, S.[Srinadh], Chakrabarti, A.[Ayan], Glasner, D.[Daniel], Li, D.[Daliang], Unterthiner, T.[Thomas], Veit, A.[Andreas],
Understanding Robustness of Transformers for Image Classification,
ICCV21(10211-10221)
IEEE DOI 2203
Perturbation methods, Computer architecture, Transformers, Robustness, Data models, Convolutional neural networks, Recognition and classification BibRef

Yan, B.[Bin], Peng, H.[Houwen], Fu, J.L.[Jian-Long], Wang, D.[Dong], Lu, H.C.[Hu-Chuan],
Learning Spatio-Temporal Transformer for Visual Tracking,
ICCV21(10428-10437)
IEEE DOI 2203
Visualization, Target tracking, Smoothing methods, Pipelines, Computer architecture, Benchmark testing, Transformers BibRef

Heo, B.[Byeongho], Yun, S.[Sangdoo], Han, D.Y.[Dong-Yoon], Chun, S.[Sanghyuk], Choe, J.[Junsuk], Oh, S.J.[Seong Joon],
Rethinking Spatial Dimensions of Vision Transformers,
ICCV21(11916-11925)
IEEE DOI 2203
Dimensionality reduction, Computational modeling, Computer architecture, Object detection, Transformers, Robustness, Recognition and classification BibRef

Voskou, A.[Andreas], Panousis, K.P.[Konstantinos P.], Kosmopoulos, D.[Dimitrios], Metaxas, D.N.[Dimitris N.], Chatzis, S.[Sotirios],
Stochastic Transformer Networks with Linear Competing Units: Application to end-to-end SL Translation,
ICCV21(11926-11935)
IEEE DOI 2203
Training, Memory management, Stochastic processes, Gesture recognition, Benchmark testing, Assistive technologies, Vision + language BibRef

Ranftl, R.[René], Bochkovskiy, A.[Alexey], Koltun, V.[Vladlen],
Vision Transformers for Dense Prediction,
ICCV21(12159-12168)
IEEE DOI 2203
Image resolution, Semantics, Neural networks, Estimation, Training data, Computer architecture, grouping and shape BibRef

Chen, M.H.[Ming-Hao], Peng, H.W.[Hou-Wen], Fu, J.L.[Jian-Long], Ling, H.B.[Hai-Bin],
AutoFormer: Searching Transformers for Visual Recognition,
ICCV21(12250-12260)
IEEE DOI 2203
Training, Convolutional codes, Visualization, Head, Search methods, Computer architecture, Manuals, Recognition and classification BibRef

Girdhar, R.[Rohit], Grauman, K.[Kristen],
Anticipative Video Transformer,
ICCV21(13485-13495)
IEEE DOI 2203
Video sequences, Computer architecture, Predictive models, Benchmark testing, Transformers, Task analysis, Video analysis and understanding BibRef

Zhang, Y.[Yanyi], Li, X.Y.[Xin-Yu], Liu, C.H.[Chun-Hui], Shuai, B.[Bing], Zhu, Y.[Yi], Brattoli, B.[Biagio], Chen, H.[Hao], Marsic, I.[Ivan], Tighe, J.[Joseph],
VidTr: Video Transformer Without Convolutions,
ICCV21(13557-13567)
IEEE DOI 2203
Training, Costs, Error analysis, Computational modeling, Transformers, Cognition, Action and behavior recognition, Video analysis and understanding BibRef

Yang, G.L.[Guang-Lei], Tang, H.[Hao], Ding, M.L.[Ming-Li], Sebe, N.[Nicu], Ricci, E.[Elisa],
Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction,
ICCV21(16249-16259)
IEEE DOI 2203
Correlation, Estimation, Computer architecture, Logic gates, Transformers, Natural language processing, Vision applications and systems BibRef

Yuan, K.[Kun], Guo, S.P.[Shao-Peng], Liu, Z.[Ziwei], Zhou, A.[Aojun], Yu, F.W.[Feng-Wei], Wu, W.[Wei],
Incorporating Convolution Designs into Visual Transformers,
ICCV21(559-568)
IEEE DOI 2203
Training, Visualization, Costs, Convolution, Training data, Transformers, Feature extraction, Recognition and classification, Efficient training and inference methods BibRef

Chen, Z.[Zhengsu], Xie, L.X.[Ling-Xi], Niu, J.W.[Jian-Wei], Liu, X.F.[Xue-Feng], Wei, L.[Longhui], Tian, Q.[Qi],
Visformer: The Vision-friendly Transformer,
ICCV21(569-578)
IEEE DOI 2203
Convolutional codes, Training, Visualization, Protocols, Computational modeling, Fitting, Recognition and classification, Representation learning BibRef

Wang, W.[Wenhai], Xie, E.[Enze], Li, X.[Xiang], Fan, D.P.[Deng-Ping], Song, K.[Kaitao], Liang, D.[Ding], Lu, T.[Tong], Luo, P.[Ping], Shao, L.[Ling],
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions,
ICCV21(548-558)
IEEE DOI 2203
Image resolution, Costs, Semantics, Object detection, Transformers, Feature extraction, Recognition and classification, grouping and shape BibRef

Yao, Z.L.[Zhu-Liang], Cao, Y.[Yue], Lin, Y.[Yutong], Liu, Z.[Ze], Zhang, Z.[Zheng], Hu, H.[Han],
Leveraging Batch Normalization for Vision Transformers,
NeurArch21(413-422)
IEEE DOI 2112
Training, Computer architecture, Transformers, Feeds BibRef

Chen, J.W.[Jia-Wei], Ho, C.M.[Chiu Man],
MM-ViT: Multi-Modal Video Transformer for Compressed Video Action Recognition,
WACV22(786-797)
IEEE DOI 2202
Computational modeling, Benchmark testing, Transformers, Computational efficiency, Spatiotemporal phenomena, Analysis and Understanding BibRef

Kim, K.[Kyungmin], Wu, B.C.[Bi-Chen], Dai, X.L.[Xiao-Liang], Zhang, P.Z.[Pei-Zhao], Yan, Z.C.[Zhi-Cheng], Vajda, P.[Peter], Kim, S.[Seon],
Rethinking the Self-Attention in Vision Transformers,
ECV21(3065-3069)
IEEE DOI 2109
Computational modeling, Pattern recognition BibRef

Zhang, Z.X.[Zi-Xiao], Lu, X.Q.[Xiao-Qiang], Cao, G.J.[Guo-Jin], Yang, Y.T.[Yu-Ting], Jiao, L.C.[Li-Cheng], Liu, F.[Fang],
ViT-YOLO: Transformer-Based YOLO for Object Detection,
VisDrone21(2799-2808)
IEEE DOI 2112
Semantics, Detectors, Object detection, Feature extraction, Robustness BibRef

Kong, D.[Daehyeon], Kong, K.[Kyeongbo], Kim, K.[Kyunghun], Min, S.J.[Sung-Jun], Kang, S.J.[Suk-Ju],
Image-Adaptive Hint Generation via Vision Transformer for Outpainting,
WACV22(4029-4038)
IEEE DOI 2202
Image synthesis, Neural networks, Complex networks, Benchmark testing, Transformers, Vision Systems and Applications BibRef

Graham, B.[Ben], El-Nouby, A.[Alaaeldin], Touvron, H.[Hugo], Stock, P.[Pierre], Joulin, A.[Armand], Jégou, H.[Hervé], Douze, M.[Matthijs],
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference,
ICCV21(12239-12249)
IEEE DOI 2203
Training, Image resolution, Neural networks, Computer architecture, Parallel processing, Transformers, Feature extraction, Representation learning BibRef

Horváth, J.[János], Baireddy, S.[Sriram], Hao, H.X.[Han-Xiang], Montserrat, D.M.[Daniel Mas], Delp, E.J.[Edward J.],
Manipulation Detection in Satellite Images Using Vision Transformer,
WMF21(1032-1041)
IEEE DOI 2109
BibRef
Earlier: A1, A4, A3, A5, Only:
Manipulation Detection in Satellite Images Using Deep Belief Networks,
WMF20(2832-2840)
IEEE DOI 2008
Image sensors, Satellites, Splicing, Forestry, Tools. Satellites, Image reconstruction, Training, Forgery, Heating systems, Feature extraction BibRef

Beal, J.[Josh], Wu, H.Y.[Hao-Yu], Park, D.H.[Dong Huk], Zhai, A.[Andrew], Kislyuk, D.[Dmitry],
Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations,
WACV22(1431-1440)
IEEE DOI 2202
Visualization, Solid modeling, Systematics, Computational modeling, Computer architecture, Transformers, Semi- and Un- supervised Learning BibRef

Li, S.Y.[Shu-Yan], Li, X.[Xiu], Lu, J.W.[Ji-Wen], Zhou, J.[Jie],
Self-supervised Video Hashing via Bidirectional Transformers,
CVPR21(13544-13553)
IEEE DOI 2111
Training, Hash functions, Visualization, Correlation, Benchmark testing, Transformers BibRef

Hsu, T.C.[Tzu-Chun], Liao, Y.S.[Yi-Sheng], Huang, C.R.[Chun-Rong],
Video Summarization With Frame Index Vision Transformer,
MVA21(1-5)
DOI Link 2109
Training, Deep learning, Recurrent neural networks, Streaming media, Real-time systems, Computational efficiency BibRef

Chapter on Pattern Recognition, Clustering, Statistics, Grammars, Learning, Neural Nets, Genetic Algorithms continues in
Spiking Neural Networks.


Last update: Aug 14, 2022 at 21:20:19