14.5.10.5.1 Vision Transformers, ViT

Vision Transformers. Transformers. Shift, Scale, and Distortion Invariance. Video specific:
See also Video Transformers.

Bazi, Y.[Yakoub], Bashmal, L.[Laila], Al Rahhal, M.M.[Mohamad M.], Al Dayil, R.[Reham], Al Ajlan, N.[Naif],
Vision Transformers for Remote Sensing Image Classification,
RS(13), No. 3, 2021, pp. xx-yy.
DOI Link 2102
BibRef

Hu, H.Q.[Hao-Qi], Lu, X.F.[Xiao-Feng], Zhang, X.P.[Xin-Peng], Zhang, T.X.[Tian-Xing], Sun, G.L.[Guang-Ling],
Inheritance Attention Matrix-Based Universal Adversarial Perturbations on Vision Transformers,
SPLetters(28), 2021, pp. 1923-1927.
IEEE DOI 2110
Perturbation methods, Robustness, Visualization, Transformers, Optimization, Task analysis, Head, Vision Transformers, self-attention BibRef

Li, T.[Tao], Zhang, Z.[Zheng], Pei, L.[Lishen], Gan, Y.[Yan],
HashFormer: Vision Transformer Based Deep Hashing for Image Retrieval,
SPLetters(29), 2022, pp. 827-831.
IEEE DOI 2204
Transformers, Binary codes, Task analysis, Training, Image retrieval, Feature extraction, Databases, Binary embedding, image retrieval BibRef

Jiang, B.[Bo], Zhao, K.K.[Kang-Kang], Tang, J.[Jin],
RGTransformer: Region-Graph Transformer for Image Representation and Few-Shot Classification,
SPLetters(29), 2022, pp. 792-796.
IEEE DOI 2204
Measurement, Transformers, Image representation, Feature extraction, Visualization, transformer BibRef

Chen, Z.M.[Zhao-Min], Cui, Q.[Quan], Zhao, B.[Borui], Song, R.J.[Ren-Jie], Zhang, X.Q.[Xiao-Qin], Yoshie, O.[Osamu],
SST: Spatial and Semantic Transformers for Multi-Label Image Recognition,
IP(31), 2022, pp. 2570-2583.
IEEE DOI 2204
Correlation, Semantics, Transformers, Image recognition, Task analysis, Training, Feature extraction, label correlation BibRef

Xue, Z.X.[Zhi-Xiang], Tan, X.[Xiong], Yu, X.[Xuchu], Liu, B.[Bing], Yu, A.[Anzhu], Zhang, P.Q.[Peng-Qiang],
Deep Hierarchical Vision Transformer for Hyperspectral and LiDAR Data Classification,
IP(31), 2022, pp. 3095-3110.
IEEE DOI 2205
Feature extraction, Transformers, Hyperspectral imaging, Laser radar, Data mining, Collaboration, Data models, cross attention fusion BibRef

Wang, G.H.[Guang-Hui], Li, B.[Bin], Zhang, T.[Tao], Zhang, S.[Shubi],
A Network Combining a Transformer and a Convolutional Neural Network for Remote Sensing Image Change Detection,
RS(14), No. 9, 2022, pp. xx-yy.
DOI Link 2205
BibRef

Luo, G.[Gen], Zhou, Y.[Yiyi], Sun, X.S.[Xiao-Shuai], Wang, Y.[Yan], Cao, L.J.[Liu-Juan], Wu, Y.J.[Yong-Jian], Huang, F.Y.[Fei-Yue], Ji, R.R.[Rong-Rong],
Towards Lightweight Transformer Via Group-Wise Transformation for Vision-and-Language Tasks,
IP(31), 2022, pp. 3386-3398.
IEEE DOI 2205
Transformers, Task analysis, Computational modeling, Benchmark testing, Visualization, Convolution, Head, reference expression comprehension BibRef

Tu, Y.B.[Yun-Bin], Li, L.[Liang], Su, L.[Li], Gao, S.X.[Sheng-Xiang], Yan, C.G.[Cheng-Gang], Zha, Z.J.[Zheng-Jun], Yu, Z.T.[Zheng-Tao], Huang, Q.M.[Qing-Ming],
I2-Transformer: Intra- and Inter-Relation Embedding Transformer for TV Show Captioning,
IP(31), 2022, pp. 3565-3577.
IEEE DOI 2206
Transformers, Semantics, Task analysis, Visualization, TV, Graph neural networks, TV Show captioning, transformer BibRef

Heo, J.[Jiseong], Wang, Y.[Yooseung], Park, J.[Jihun],
Occlusion-aware spatial attention transformer for occluded object recognition,
PRL(159), 2022, pp. 70-76.
Elsevier DOI 2206
Occluded object recognition, Visual transformer, Spatial attention BibRef

Wang, J.Y.[Jia-Yun], Chakraborty, R.[Rudrasis], Yu, S.X.[Stella X.],
Transformer for 3D Point Clouds,
PAMI(44), No. 8, August 2022, pp. 4419-4431.
IEEE DOI 2207
Convolution, Feature extraction, Shape, Semantics, Task analysis, Measurement, point cloud, transformation, deformable, segmentation, 3D detection BibRef

Wang, L.[Libo], Li, R.[Rui], Zhang, C.[Ce], Fang, S.H.[Sheng-Hui], Duan, C.X.[Chen-Xi], Meng, X.L.[Xiao-Liang], Atkinson, P.M.[Peter M.],
UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery,
PandRS(190), 2022, pp. 196-214.
Elsevier DOI 2208
Semantic Segmentation, Remote Sensing, Vision Transformer, Fully Transformer Network, Global-local Context, Urban Scene BibRef

Kheldouni, A.[Amine], Boumhidi, J.[Jaouad],
A Study of Bidirectional Encoder Representations from Transformers for Sequential Recommendations,
ISCV22(1-5)
IEEE DOI 2208
Knowledge engineering, Recurrent neural networks, Predictive models, Markov processes BibRef

Li, Z.[Zekun], Liu, Y.F.[Yu-Fan], Li, B.[Bing], Feng, B.L.[Bai-Lan], Wu, K.[Kebin], Peng, C.W.[Cheng-Wei], Hu, W.M.[Wei-Ming],
SDTP: Semantic-Aware Decoupled Transformer Pyramid for Dense Image Prediction,
CirSysVideo(32), No. 9, September 2022, pp. 6160-6173.
IEEE DOI 2209
Transformers, Semantics, Task analysis, Detectors, Image segmentation, Head, Convolution, Transformer, dense prediction, multi-level interaction BibRef

Wu, J.J.[Jia-Jing], Wei, Z.Q.[Zhi-Qiang], Zhang, J.P.[Jin-Peng], Zhang, Y.[Yushi], Jia, D.N.[Dong-Ning], Yin, B.[Bo], Yu, Y.C.[Yun-Chao],
Full-Coupled Convolutional Transformer for Surface-Based Duct Refractivity Inversion,
RS(14), No. 17, 2022, pp. xx-yy.
DOI Link 2209
BibRef

Dalmaz, O.[Onat], Yurt, M.[Mahmut], Çukur, T.[Tolga],
ResViT: Residual Vision Transformers for Multimodal Medical Image Synthesis,
MedImg(41), No. 10, October 2022, pp. 2598-2614.
IEEE DOI 2210
Transformers, Biomedical imaging, Subspace constraints, Task analysis, Image synthesis, Magnetic resonance imaging, unified BibRef

Jiang, K.[Kai], Peng, P.[Peng], Lian, Y.[Youzao], Xu, W.S.[Wei-Sheng],
The encoding method of position embeddings in vision transformer,
JVCIR(89), 2022, pp. 103664.
Elsevier DOI 2212
Vision transformer, Position embeddings, Gabor filters BibRef

Han, K.[Kai], Wang, Y.H.[Yun-He], Chen, H.[Hanting], Chen, X.[Xinghao], Guo, J.[Jianyuan], Liu, Z.H.[Zhen-Hua], Tang, Y.[Yehui], Xiao, A.[An], Xu, C.J.[Chun-Jing], Xu, Y.X.[Yi-Xing], Yang, Z.H.[Zhao-Hui], Zhang, Y.[Yiman], Tao, D.C.[Da-Cheng],
A Survey on Vision Transformer,
PAMI(45), No. 1, January 2023, pp. 87-110.
IEEE DOI 2212
Survey, Vision Transformer. Transformers, Task analysis, Encoding, Computational modeling, Visualization, Object detection, high-level vision, video BibRef

Hou, Q.[Qibin], Jiang, Z.[Zihang], Yuan, L.[Li], Cheng, M.M.[Ming-Ming], Yan, S.C.[Shui-Cheng], Feng, J.S.[Jia-Shi],
Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition,
PAMI(45), No. 1, January 2023, pp. 1328-1334.
IEEE DOI 2212
Transformers, Encoding, Visualization, Convolutional codes, Mixers, Computer architecture, Training data, Vision permutator, deep neural network BibRef

Yu, X.H.[Xiao-Han], Wang, J.[Jun], Zhao, Y.[Yang], Gao, Y.S.[Yong-Sheng],
Mix-ViT: Mixing attentive vision transformer for ultra-fine-grained visual categorization,
PR(135), 2023, pp. 109131.
Elsevier DOI 2212
Ultra-fine-grained visual categorization, Vision transformer, Self-supervised learning, Attentive mixing BibRef

Li, Y.[Yehao], Yao, T.[Ting], Pan, Y.[Yingwei], Mei, T.[Tao],
Contextual Transformer Networks for Visual Recognition,
PAMI(45), No. 2, February 2023, pp. 1489-1500.
IEEE DOI 2301
Transformers, Convolution, Visualization, Task analysis, Image recognition, Object detection, Transformer, image recognition BibRef

Wang, H.[Hang], Du, Y.[Youtian], Zhang, Y.[Yabin], Li, S.[Shuai], Zhang, L.[Lei],
One-Stage Visual Relationship Referring With Transformers and Adaptive Message Passing,
IP(32), 2023, pp. 190-202.
IEEE DOI 2301
Visualization, Proposals, Transformers, Task analysis, Detectors, Message passing, Predictive models, gated message passing BibRef

Kim, B.[Boah], Kim, J.[Jeongsol], Ye, J.C.[Jong Chul],
Task-Agnostic Vision Transformer for Distributed Learning of Image Processing,
IP(32), 2023, pp. 203-218.
IEEE DOI 2301
Task analysis, Transformers, Servers, Distance learning, Computer aided instruction, Tail, Head, Distributed learning, task-agnostic learning BibRef

Kiya, H.[Hitoshi], Iijima, R.[Ryota], Maungmaung, A.[Aprilpyone], Kinoshita, Y.[Yuma],
Image and Model Transformation with Secret Key for Vision Transformer,
IEICE(E106-D), No. 1, January 2023, pp. 2-11.
WWW Link. 2301
BibRef

Lin, X.[Xiao], Sun, S.Z.[Shu-Zhou], Huang, W.[Wei], Sheng, B.[Bin], Li, P.[Ping], Feng, D.D.[David Dagan],
EAPT: Efficient Attention Pyramid Transformer for Image Processing,
MultMed(25), 2023, pp. 50-61.
IEEE DOI 2301
Transformers, Encoding, Task analysis, Semantics, Feature extraction, Costs, Convolutional neural networks, Transformer, semantic segmentation BibRef

Mou, C.[Chong], Zhang, J.[Jian],
TransCL: Transformer Makes Strong and Flexible Compressive Learning,
PAMI(45), No. 4, April 2023, pp. 5236-5251.
IEEE DOI 2303
Task analysis, Transformers, Image reconstruction, Image coding, Compressed sensing, Sensors, Cameras, Compressed sensing, semantic segmentation BibRef


Rangrej, S.B.[Samrudhdhi B], Liang, K.J.[Kevin J], Hassner, T.[Tal], Clark, J.J.[James J],
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction,
WACV23(3402-3412)
IEEE DOI 2302
Predictive models, Transformers, Cameras, Spatiotemporal phenomena, Sensors, Observability BibRef

Mo, S.T.[Shen-Tong], Sun, Z.[Zhun], Li, C.[Chao],
Multi-level Contrastive Learning for Self-Supervised Vision Transformers,
WACV23(2777-2786)
IEEE DOI 2302
Training, Representation learning, Head, Semantic segmentation, Self-supervised learning, visual reasoning BibRef

Yun, J.[Jooyeol], Lee, S.[Sanghyeon], Park, M.H.[Min-Ho], Choo, J.[Jaegul],
iColoriT: Towards Propagating Local Hints to the Right Region in Interactive Colorization by Leveraging Vision Transformer,
WACV23(1787-1796)
IEEE DOI 2302
Convolutional codes, Image color analysis, Stacking, Gray-scale, Transformers, Algorithms: Computational photography, image and video synthesis BibRef

Liu, Y.[Yue], Matsoukas, C.[Christos], Strand, F.[Fredrik], Azizpour, H.[Hossein], Smith, K.[Kevin],
PatchDropout: Economizing Vision Transformers Using Patch Dropout,
WACV23(3942-3951)
IEEE DOI 2302
Training, Image resolution, Computational modeling, Biological system modeling, Memory management, Transformers, Biomedical/healthcare/medicine BibRef

Chen, X.Y.[Xiang-Yu], Hu, Q.[Qinghao], Li, K.[Kaidong], Zhong, C.[Cuncong], Wang, G.H.[Guang-Hui],
Accumulated Trivial Attention Matters in Vision Transformers on Small Datasets,
WACV23(3973-3981)
IEEE DOI 2302
Codes, Focusing, Transformers, Convolutional neural networks, Task analysis, Algorithms: Machine learning architectures, and algorithms (including transfer) BibRef

Chen, C.[Chang], Zhang, J.[JiaMing], Yang, K.[Kailun], Peng, K.[Kunyu], Stiefelhagen, R.[Rainer],
Trans4Map: Revisiting Holistic Bird's-Eye-View Mapping from Egocentric Images to Allocentric Semantics with Vision Transformers,
WACV23(4002-4011)
IEEE DOI 2302
Computational modeling, Semantic segmentation, Semantics, Memory management, Pipelines, Transformers, Feature extraction, segmentation BibRef

Lan, H.[Hai], Wang, X.[Xihao], Shen, H.[Hao], Liang, P.[Peidong], Wei, X.[Xian],
Couplformer: Rethinking Vision Transformer with Coupling Attention,
WACV23(6464-6473)
IEEE DOI 2302
Couplings, Visualization, Image segmentation, Computational modeling, Memory management, Object detection BibRef

Marin, D.[Dmitrii], Chang, J.H.R.[Jen-Hao Rick], Ranjan, A.[Anurag], Prabhu, A.[Anish], Rastegari, M.[Mohammad], Tuzel, O.[Oncel],
Token Pooling in Vision Transformers for Image Classification,
WACV23(12-21)
IEEE DOI 2302
Filtering, Semantic segmentation, Pose estimation, Transformers, Encoding, Convolutional neural networks, and algorithms (including transfer) BibRef

Song, C.H.[Chull Hwan], Yoon, J.Y.[Joo-Young], Choi, S.[Shunghyun], Avrithis, Y.[Yannis],
Boosting vision transformers for image retrieval,
WACV23(107-117)
IEEE DOI 2302
Training, Location awareness, Image retrieval, Self-supervised learning, Image representation, Transformers BibRef

Yang, J.[Jinyu], Liu, J.J.[Jing-Jing], Xu, N.[Ning], Huang, J.Z.[Jun-Zhou],
TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation,
WACV23(520-530)
IEEE DOI 2302
Benchmark testing, Image representation, Transformers, Convolutional neural networks, Task analysis, and algorithms (including transfer) BibRef

Lin, K.E.[Kai-En], Yen-Chen, L.[Lin], Lai, W.S.[Wei-Sheng], Lin, T.Y.[Tsung-Yi], Shih, Y.C.[Yi-Chang], Ramamoorthi, R.[Ravi],
Vision Transformer for NeRF-Based View Synthesis from a Single Input Image,
WACV23(806-815)
IEEE DOI 2302
Shape, Pose estimation, Feature extraction, Transformers, Cameras, Algorithms: Computational photography, 3D computer vision BibRef

Saavedra-Ruiz, M.[Miguel], Morin, S.[Sacha], Paull, L.[Liam],
Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers,
CRV22(197-204)
IEEE DOI 2301
Adaptation models, Image segmentation, Image resolution, Navigation, Transformers, Robot sensing systems, Visual Servoing BibRef

Debnath, B.[Biplob], Po, O.[Oliver], Chowdhury, F.A.[Farhan Asif], Chakradhar, S.[Srimat],
Cosine Similarity based Few-Shot Video Classifier with Attention-based Aggregation,
ICPR22(1273-1279)
IEEE DOI 2212
Training, Head, Pipelines, Benchmark testing, Feature extraction, Transformers BibRef

Patel, K.[Krushi], Bur, A.M.[Andrés M.], Li, F.J.[Feng-Jun], Wang, G.H.[Guang-Hui],
Aggregating Global Features into Local Vision Transformer,
ICPR22(1141-1147)
IEEE DOI 2212
Source coding, Computational modeling, Information processing, Performance gain, Transformers BibRef

Shen, Z.Q.[Zhi-Qiang], Liu, Z.[Zechun], Xing, E.[Eric],
Sliced Recursive Transformer,
ECCV22(XXIV:727-744).
Springer DOI 2211
BibRef

Shao, Y.[Yidi], Loy, C.C.[Chen Change], Dai, B.[Bo],
Transformer with Implicit Edges for Particle-Based Physics Simulation,
ECCV22(XIX:549-564).
Springer DOI 2211
BibRef

Wang, W.[Wen], Zhang, J.[Jing], Cao, Y.[Yang], Shen, Y.L.[Yong-Liang], Tao, D.C.[Da-Cheng],
Towards Data-Efficient Detection Transformers,
ECCV22(IX:88-105).
Springer DOI 2211
BibRef

Mari, C.R.[Carlos Roig], Gonzalez, D.V.[David Varas], Bou-Balust, E.[Elisenda],
Multi-Scale Transformer-Based Feature Combination for Image Retrieval,
ICIP22(3166-3170)
IEEE DOI 2211
Visualization, Semantics, Image retrieval, Feature extraction, Transformers, Internet, Image retrieval, Attention, Multi-scale, Feature combination BibRef

Lorenzana, M.B.[Marlon Bran], Engstrom, C.[Craig], Chandra, S.S.[Shekhar S.],
Transformer Compressed Sensing Via Global Image Tokens,
ICIP22(3011-3015)
IEEE DOI 2211
Training, Limiting, Image resolution, Neural networks, Image representation, Transformers, MRI BibRef

Furukawa, R.[Ryouichi], Hotta, K.[Kazuhiro],
Local Embedding for Axial Attention,
ICIP22(2586-2590)
IEEE DOI 2211
Deep learning, Image segmentation, Visualization, Computational modeling, Neural networks, Transformers BibRef

Lu, X.Y.[Xiao-Yong], Du, S.[Songlin],
NCTR: Neighborhood Consensus Transformer for Feature Matching,
ICIP22(2726-2730)
IEEE DOI 2211
Learning systems, Impedance matching, Aggregates, Pose estimation, Neural networks, Transformers, Local feature matching, graph neural network BibRef

Jeny, A.A.[Afsana Ahsan], Junayed, M.S.[Masum Shah], Islam, M.B.[Md Baharul],
An Efficient End-To-End Image Compression Transformer,
ICIP22(1786-1790)
IEEE DOI 2211
Image coding, Correlation, Limiting, Computational modeling, Rate-distortion, Video compression, Transformers, entropy model BibRef

Shang, J.H.[Jing-Huan], Kahatapitiya, K.[Kumara], Li, X.[Xiang], Ryoo, M.S.[Michael S.],
StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning,
ECCV22(XXIX:462-479).
Springer DOI 2211
BibRef

Kakogeorgiou, I.[Ioannis], Gidaris, S.[Spyros], Psomas, B.[Bill], Avrithis, Y.[Yannis], Bursuc, A.[Andrei], Karantzalos, K.[Konstantinos], Komodakis, N.[Nikos],
What to Hide from Your Students: Attention-Guided Masked Image Modeling,
ECCV22(XXX:300-318).
Springer DOI 2211
WWW Link. BibRef

Bai, J.W.[Jia-Wang], Yuan, L.[Li], Xia, S.T.[Shu-Tao], Yan, S.C.[Shui-Cheng], Li, Z.F.[Zhi-Feng], Liu, W.[Wei],
Improving Vision Transformers by Revisiting High-Frequency Components,
ECCV22(XXIV:1-18).
Springer DOI 2211
BibRef

Ding, M.Y.[Ming-Yu], Xiao, B.[Bin], Codella, N.[Noel], Luo, P.[Ping], Wang, J.D.[Jing-Dong], Yuan, L.[Lu],
DaViT: Dual Attention Vision Transformers,
ECCV22(XXIV:74-92).
Springer DOI 2211
BibRef

Li, K.[Kehan], Yu, R.[Runyi], Wang, Z.[Zhennan], Yuan, L.[Li], Song, G.[Guoli], Chen, J.[Jie],
Locality Guidance for Improving Vision Transformers on Tiny Datasets,
ECCV22(XXIV:110-127).
Springer DOI 2211
BibRef

Wang, P.[Pichao], Wang, X.[Xue], Wang, F.[Fan], Lin, M.[Ming], Chang, S.[Shuning], Li, H.[Hao], Jin, R.[Rong],
KVT: k-NN Attention for Boosting Vision Transformers,
ECCV22(XXIV:285-302).
Springer DOI 2211
BibRef

Tu, Z.Z.[Zheng-Zhong], Talebi, H.[Hossein], Zhang, H.[Han], Yang, F.[Feng], Milanfar, P.[Peyman], Bovik, A.C.[Alan C.], Li, Y.[Yinxiao],
MaxViT: Multi-axis Vision Transformer,
ECCV22(XXIV:459-479).
Springer DOI 2211
BibRef

Yang, R.[Rui], Ma, H.L.[Hai-Long], Wu, J.[Jie], Tang, Y.[Yansong], Xiao, X.F.[Xue-Feng], Zheng, M.[Min], Li, X.[Xiu],
ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer,
ECCV22(XXIV:480-496).
Springer DOI 2211
BibRef

Touvron, H.[Hugo], Cord, M.[Matthieu], El-Nouby, A.[Alaaeldin], Verbeek, J.[Jakob], Jégou, H.[Hervé],
Three Things Everyone Should Know About Vision Transformers,
ECCV22(XXIV:497-515).
Springer DOI 2211
BibRef

Touvron, H.[Hugo], Cord, M.[Matthieu], Jégou, H.[Hervé],
DeiT III: Revenge of the ViT,
ECCV22(XXIV:516-533).
Springer DOI 2211
BibRef

Li, Y.H.[Yang-Hao], Mao, H.Z.[Han-Zi], Girshick, R.[Ross], He, K.M.[Kai-Ming],
Exploring Plain Vision Transformer Backbones for Object Detection,
ECCV22(IX:280-296).
Springer DOI 2211
BibRef

Yu, Q.H.[Qi-Hang], Wang, H.Y.[Hui-Yu], Qiao, S.Y.[Si-Yuan], Collins, M.[Maxwell], Zhu, Y.K.[Yu-Kun], Adam, H.[Hartwig], Yuille, A.L.[Alan L.], Chen, L.C.[Liang-Chieh],
k-means Mask Transformer,
ECCV22(XXIX:288-307).
Springer DOI 2211
BibRef

Lezama, J.[José], Chang, H.[Huiwen], Jiang, L.[Lu], Essa, I.[Irfan],
Improved Masked Image Generation with Token-Critic,
ECCV22(XXIII:70-86).
Springer DOI 2211
Generative transformer. BibRef

Rao, Y.M.[Yong-Ming], Zhao, W.[Wenliang], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
AMixer: Adaptive Weight Mixing for Self-Attention Free Vision Transformers,
ECCV22(XXI:50-67).
Springer DOI 2211
BibRef

Pham, K.[Khoi], Kafle, K.[Kushal], Lin, Z.[Zhe], Ding, Z.H.[Zhi-Hong], Cohen, S.[Scott], Tran, Q.[Quan], Shrivastava, A.[Abhinav],
Improving Closed and Open-Vocabulary Attribute Prediction Using Transformers,
ECCV22(XXV:201-219).
Springer DOI 2211
BibRef

Yu, W.X.[Wen-Xin], Zhang, H.[Hongru], Lan, T.X.[Tian-Xiang], Hu, Y.C.[Yu-Cheng], Yin, D.[Dong],
CBPT: A New Backbone for Enhancing Information Transmission of Vision Transformers,
ICIP22(156-160)
IEEE DOI 2211
Merging, Information processing, Object detection, Transformers, Computational complexity, Vision Transformer, Backbone BibRef

Takeda, M.[Mana], Yanai, K.[Keiji],
Continual Learning in Vision Transformer,
ICIP22(616-620)
IEEE DOI 2211
Learning systems, Image recognition, Transformers, Natural language processing, Convolutional neural networks, Vision Transformer BibRef

Zhou, W.L.[Wei-Lian], Kamata, S.I.[Sei-Ichiro], Luo, Z.[Zhengbo], Xue, X.[Xi],
Rethinking Unified Spectral-Spatial-Based Hyperspectral Image Classification Under 3D Configuration of Vision Transformer,
ICIP22(711-715)
IEEE DOI 2211
Flowcharts, Correlation, Convolution, Transformers, Hyperspectral image classification, 3D coordinate positional embedding BibRef

Li, A.[Ang], Jiao, J.[Jichao], Li, N.[Ning], Qi, W.[Wangjing], Xu, W.[Wei], Pang, M.[Min],
Conmw Transformer: A General Vision Transformer Backbone With Merged-Window Attention,
ICIP22(1551-1555)
IEEE DOI 2211
Image resolution, Convolution, Transformers, Feature extraction, Tokenization, Computational efficiency, Vision Transformer, hybrid architecture BibRef

Li, J.[Junbo], Zhang, H.[Huan], Xie, C.[Cihang],
ViP: Unified Certified Detection and Recovery for Patch Attack with Vision Transformers,
ECCV22(XXV:573-587).
Springer DOI 2211
BibRef

Zhang, Q.M.[Qi-Ming], Xu, Y.[Yufei], Zhang, J.[Jing], Tao, D.C.[Da-Cheng],
VSA: Learning Varied-Size Window Attention in Vision Transformers,
ECCV22(XXV:466-483).
Springer DOI 2211
BibRef

Cao, Y.H.[Yun-Hao], Yu, H.[Hao], Wu, J.X.[Jian-Xin],
Training Vision Transformers with only 2040 Images,
ECCV22(XXV:220-237).
Springer DOI 2211
BibRef

Wang, C.[Cong], Xu, H.M.[Hong-Min], Zhang, X.[Xiong], Wang, L.[Li], Zheng, Z.[Zhitong], Liu, H.F.[Hai-Feng],
Convolutional Embedding Makes Hierarchical Vision Transformer Stronger,
ECCV22(XX:739-756).
Springer DOI 2211
BibRef

Wu, B.[Boxi], Gu, J.D.[Jin-Dong], Li, Z.F.[Zhi-Feng], Cai, D.[Deng], He, X.F.[Xiao-Fei], Liu, W.[Wei],
Towards Efficient Adversarial Training on Vision Transformers,
ECCV22(XIII:307-325).
Springer DOI 2211
BibRef

Gu, J.D.[Jin-Dong], Tresp, V.[Volker], Qin, Y.[Yao],
Are Vision Transformers Robust to Patch Perturbations?,
ECCV22(XII:404-421).
Springer DOI 2211
BibRef

Zong, Z.[Zhuofan], Li, K.[Kunchang], Song, G.[Guanglu], Wang, Y.[Yali], Qiao, Y.[Yu], Leng, B.[Biao], Liu, Y.[Yu],
Self-slimmed Vision Transformer,
ECCV22(XI:432-448).
Springer DOI 2211
BibRef

Fayyaz, M.[Mohsen], Koohpayegani, S.A.[Soroush Abbasi], Jafari, F.R.[Farnoush Rezaei], Sengupta, S.[Sunando], Joze, H.R.V.[Hamid Reza Vaezi], Sommerlade, E.[Eric], Pirsiavash, H.[Hamed], Gall, J.[Jürgen],
Adaptive Token Sampling for Efficient Vision Transformers,
ECCV22(XI:396-414).
Springer DOI 2211
BibRef

Li, Z.K.[Zhi-Kai], Ma, L.P.[Li-Ping], Chen, M.J.[Meng-Juan], Xiao, J.R.[Jun-Rui], Gu, Q.Y.[Qing-Yi],
Patch Similarity Aware Data-Free Quantization for Vision Transformers,
ECCV22(XI:154-170).
Springer DOI 2211
BibRef

Weng, Z.J.[Ze-Jia], Yang, X.T.[Xi-Tong], Li, A.[Ang], Wu, Z.X.[Zu-Xuan], Jiang, Y.G.[Yu-Gang],
Semi-supervised Vision Transformers,
ECCV22(XXX:605-620).
Springer DOI 2211
BibRef

Mallick, R.[Rupayan], Benois-Pineau, J.[Jenny], Zemmari, A.[Akka],
I Saw: A Self-Attention Weighted Method for Explanation of Visual Transformers,
ICIP22(3271-3275)
IEEE DOI 2211
Measurement, Correlation coefficient, Visualization, Image segmentation, Databases, Object detection, Transformers, Gaze Fixation Density Maps BibRef

Su, T.[Tong], Ye, S.[Shuo], Song, C.Q.[Cheng-Qun], Cheng, J.[Jun],
Mask-Vit: an Object Mask Embedding in Vision Transformer for Fine-Grained Visual Classification,
ICIP22(1626-1630)
IEEE DOI 2211
Knowledge engineering, Visualization, Focusing, Interference, Benchmark testing, Transformers, Feature extraction, Knowledge Embedding BibRef

Gai, L.[Lulu], Chen, W.[Wei], Gao, R.[Rui], Chen, Y.W.[Yan-Wei], Qiao, X.[Xu],
Using Vision Transformers in 3-D Medical Image Classifications,
ICIP22(696-700)
IEEE DOI 2211
Deep learning, Training, Visualization, Transfer learning, Optimization methods, Self-supervised learning, Transformers, 3-D medical image classifications BibRef

Wu, K.[Kan], Zhang, J.[Jinnian], Peng, H.[Houwen], Liu, M.[Mengchen], Xiao, B.[Bin], Fu, J.L.[Jian-Long], Yuan, L.[Lu],
TinyViT: Fast Pretraining Distillation for Small Vision Transformers,
ECCV22(XXI:68-85).
Springer DOI 2211
BibRef

Gao, L.[Li], Nie, D.[Dong], Li, B.[Bo], Ren, X.F.[Xiao-Feng],
Doubly-Fused ViT: Fuse Information from Vision Transformer Doubly with Local Representation,
ECCV22(XXIII:744-761).
Springer DOI 2211
BibRef

Yao, T.[Ting], Pan, Y.[Yingwei], Li, Y.[Yehao], Ngo, C.W.[Chong-Wah], Mei, T.[Tao],
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning,
ECCV22(XXV:328-345).
Springer DOI 2211
BibRef

Yuan, Z.H.[Zhi-Hang], Xue, C.H.[Chen-Hao], Chen, Y.Q.[Yi-Qi], Wu, Q.[Qiang], Sun, G.[Guangyu],
PTQ4ViT: Post-training Quantization for Vision Transformers with Twin Uniform Quantization,
ECCV22(XII:191-207).
Springer DOI 2211
BibRef

Kong, Z.L.[Zheng-Lun], Dong, P.Y.[Pei-Yan], Ma, X.L.[Xiao-Long], Meng, X.[Xin], Niu, W.[Wei], Sun, M.S.[Meng-Shu], Shen, X.[Xuan], Yuan, G.[Geng], Ren, B.[Bin], Tang, H.[Hao], Qin, M.[Minghai], Wang, Y.Z.[Yan-Zhi],
SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning,
ECCV22(XI:620-640).
Springer DOI 2211
BibRef

Pan, J.[Junting], Bulat, A.[Adrian], Tan, F.[Fuwen], Zhu, X.T.[Xia-Tian], Dudziak, L.[Lukasz], Li, H.S.[Hong-Sheng], Tzimiropoulos, G.[Georgios], Martinez, B.[Brais],
EdgeViTs: Competing Light-Weight CNNs on Mobile Devices with Vision Transformers,
ECCV22(XI:294-311).
Springer DOI 2211
BibRef

Xu, R.S.[Run-Sheng], Xiang, H.[Hao], Tu, Z.Z.[Zheng-Zhong], Xia, X.[Xin], Yang, M.H.[Ming-Hsuan], Ma, J.Q.[Jia-Qi],
V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer,
ECCV22(XXIX:107-124).
Springer DOI 2211
BibRef

Liu, Y.[Yong], Mai, S.Q.[Si-Qi], Chen, X.N.[Xiang-Ning], Hsieh, C.J.[Cho-Jui], You, Y.[Yang],
Towards Efficient and Scalable Sharpness-Aware Minimization,
CVPR22(12350-12360)
IEEE DOI 2210
WWW Link. Training, Schedules, Scalability, Perturbation methods, Stochastic processes, Transformers, Minimization, Vision applications and systems BibRef

Ren, P.Z.[Peng-Zhen], Li, C.[Changlin], Wang, G.[Guangrun], Xiao, Y.[Yun], Du, Q.[Qing], Liang, X.D.[Xiao-Dan], Chang, X.J.[Xiao-Jun],
Beyond Fixation: Dynamic Window Visual Transformer,
CVPR22(11977-11987)
IEEE DOI 2210
Performance evaluation, Visualization, Systematics, Computational modeling, Scalability, Transformers, Deep learning architectures and techniques BibRef

Liu, Z.[Ze], Hu, H.[Han], Lin, Y.T.[Yu-Tong], Yao, Z.L.[Zhu-Liang], Xie, Z.D.[Zhen-Da], Wei, Y.X.[Yi-Xuan], Ning, J.[Jia], Cao, Y.[Yue], Zhang, Z.[Zheng], Dong, L.[Li], Wei, F.[Furu], Guo, B.[Baining],
Swin Transformer V2: Scaling Up Capacity and Resolution,
CVPR22(11999-12009)
IEEE DOI 2210
Training, Representation learning, Adaptation models, Image resolution, Computational modeling, Semantics, Representation learning BibRef

Bhattacharjee, D.[Deblina], Zhang, T.[Tong], Süsstrunk, S.[Sabine], Salzmann, M.[Mathieu],
MulT: An End-to-End Multitask Learning Transformer,
CVPR22(12021-12031)
IEEE DOI 2210
Heart, Image segmentation, Computational modeling, Image edge detection, Semantics, Estimation, Predictive models, Scene analysis and understanding BibRef

Fang, J.[Jiemin], Xie, L.X.[Ling-Xi], Wang, X.G.[Xing-Gang], Zhang, X.P.[Xiao-Peng], Liu, W.Y.[Wen-Yu], Tian, Q.[Qi],
MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens,
CVPR22(12053-12062)
IEEE DOI 2210
Deep learning, Visualization, Neural networks, Graphics processing units, retrieval BibRef

Sandler, M.[Mark], Zhmoginov, A.[Andrey], Vladymyrov, M.[Max], Jackson, A.[Andrew],
Fine-tuning Image Transformers using Learnable Memory,
CVPR22(12145-12154)
IEEE DOI 2210
Deep learning, Adaptation models, Costs, Computational modeling, Memory management, Transformers, Transfer/low-shot/long-tail learning BibRef

Yu, X.[Xumin], Tang, L.[Lulu], Rao, Y.M.[Yong-Ming], Huang, T.J.[Tie-Jun], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling,
CVPR22(19291-19300)
IEEE DOI 2210
Point cloud compression, Solid modeling, Computational modeling, Bit error rate, Transformers, Pattern recognition, Deep learning architectures and techniques BibRef

Park, C.[Chunghyun], Jeong, Y.[Yoonwoo], Cho, M.[Minsu], Park, J.[Jaesik],
Fast Point Transformer,
CVPR22(16928-16937)
IEEE DOI 2210
Point cloud compression, Shape, Semantics, Neural networks, Transformers, grouping and shape analysis BibRef

Ren, S.[Sucheng], Zhou, D.[Daquan], He, S.F.[Sheng-Feng], Feng, J.S.[Jia-Shi], Wang, X.C.[Xin-Chao],
Shunted Self-Attention via Multi-Scale Token Aggregation,
CVPR22(10843-10852)
IEEE DOI 2210
Degradation, Deep learning, Costs, Computational modeling, Merging, Efficient learning and inferences BibRef

Zeng, W.[Wang], Jin, S.[Sheng], Liu, W.T.[Wen-Tao], Qian, C.[Chen], Luo, P.[Ping], Ouyang, W.L.[Wan-Li], Wang, X.G.[Xiao-Gang],
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer,
CVPR22(11091-11101)
IEEE DOI 2210
Visualization, Shape, Pose estimation, Semantics, Pose estimation and tracking, Deep learning architectures and techniques BibRef

Yu, W.H.[Wei-Hao], Luo, M.[Mi], Zhou, P.[Pan], Si, C.Y.[Chen-Yang], Zhou, Y.C.[Yi-Chen], Wang, X.C.[Xin-Chao], Feng, J.S.[Jia-Shi], Yan, S.C.[Shui-Cheng],
MetaFormer is Actually What You Need for Vision,
CVPR22(10809-10819)
IEEE DOI 2210
Computational modeling, Focusing, Transformers, Pattern recognition, Task analysis, retrieval BibRef

Xie, Z.D.[Zhen-Da], Zhang, Z.[Zheng], Cao, Y.[Yue], Lin, Y.T.[Yu-Tong], Bao, J.M.[Jian-Min], Yao, Z.L.[Zhu-Liang], Dai, Q.[Qi], Hu, H.[Han],
SimMIM: a Simple Framework for Masked Image Modeling,
CVPR22(9643-9653)
IEEE DOI 2210
WWW Link. Representation learning, Training, Head, Self-supervised learning, Predictive models, Data models, Self- semi- meta- Representation learning BibRef

Song, Z.[Zikai], Yu, J.Q.[Jun-Qing], Chen, Y.P.P.[Yi-Ping Phoebe], Yang, W.[Wei],
Transformer Tracking with Cyclic Shifting Window Attention,
CVPR22(8781-8790)
IEEE DOI 2210
WWW Link. Visualization, Target tracking, Image recognition, Optimization methods, Benchmark testing BibRef

Tu, Z.Z.[Zheng-Zhong], Talebi, H.[Hossein], Zhang, H.[Han], Yang, F.[Feng], Milanfar, P.[Peyman], Bovik, A.[Alan], Li, Y.X.[Yin-Xiao],
MAXIM: Multi-Axis MLP for Image Processing,
CVPR22(5759-5770)
IEEE DOI 2210
WWW Link. Training, Photography, Adaptation models, Visualization, Computational modeling, Transformers, Low-level vision, Computational photography BibRef

Chen, Z.[Zhe], Zhang, J.[Jing], Tao, D.C.[Da-Cheng],
Recurrent Glimpse-based Decoder for Detection with Transformer,
CVPR22(5250-5259)
IEEE DOI 2210
WWW Link. Training, Visualization, Pipelines, Detectors, Feature extraction, Transformers, Recognition: detection, categorization, retrieval BibRef

Yun, S.[Sukmin], Lee, H.[Hankook], Kim, J.[Jaehyung], Shin, J.[Jinwoo],
Patch-level Representation Learning for Self-supervised Vision Transformers,
CVPR22(8344-8353)
IEEE DOI 2210
Training, Representation learning, Visualization, Neural networks, Object detection, Self-supervised learning, Transformers, Self- semi- meta- unsupervised learning BibRef

Hou, Z.J.[Ze-Jiang], Kung, S.Y.[Sun-Yuan],
Multi-Dimensional Vision Transformer Compression via Dependency Guided Gaussian Process Search,
EVW22(3668-3677)
IEEE DOI 2210
Adaptation models, Image coding, Head, Computational modeling, Neurons, Gaussian processes, Transformers BibRef

Zhang, G.J.[Gong-Jie], Luo, Z.P.[Zhi-Peng], Yu, Y.C.[Ying-Chen], Cui, K.[Kaiwen], Lu, S.J.[Shi-Jian],
Accelerating DETR Convergence via Semantic-Aligned Matching,
CVPR22(939-948)
IEEE DOI 2210
Code, Detection Transformer.
WWW Link. Training, Costs, Semantics, Object detection, Transformers, Feature extraction, Recognition: detection, categorization, Motion and tracking BibRef

Gupta, A.[Akshita], Narayan, S.[Sanath], Joseph, K.J.[K J], Khan, S.[Salman], Khan, F.S.[Fahad Shahbaz], Shah, M.[Mubarak],
OW-DETR: Open-world Detection Transformer,
CVPR22(9225-9234)
IEEE DOI 2210
Training, Object detection, Transformers, Pattern recognition, Proposals, Object recognition, retrieval, categorization, Recognition: detection BibRef

Lou, Q.[Qian], Hsu, Y.C.[Yen-Chang], Uzkent, B.[Burak], Hua, T.[Ting], Shen, Y.[Yilin], Jin, H.X.[Hong-Xia],
Lite-MDETR: A Lightweight Multi-Modal Detector,
CVPR22(12196-12205)
IEEE DOI 2210
Training, Performance evaluation, Visualization, Dictionaries, Grounding, Detectors, Transformers BibRef

Li, F.[Feng], Zhang, H.[Hao], Liu, S.[Shilong], Guo, J.[Jian], Ni, L.M.[Lionel M.], Zhang, L.[Lei],
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising,
CVPR22(13609-13617)
IEEE DOI 2210
Training, Codes, Machine vision, Noise reduction, Transformers, Pattern recognition, Decoding, Recognition: detection, Vision applications and systems BibRef

La Bonte, T.[Tyler], Song, Y.[Yale], Wang, X.[Xin], Vineet, V.[Vibhav], Joshi, N.[Neel],
Scaling Novel Object Detection with Weakly Supervised Detection Transformers,
WACV23(85-96)
IEEE DOI 2302
Training, Costs, Surveillance, Object detection, Detectors, Transformers, Data models BibRef

Bar, A.[Amir], Wang, X.[Xin], Kantorov, V.[Vadim], Reed, C.J.[Colorado J.], Herzig, R.[Roei], Chechik, G.[Gal], Rohrbach, A.[Anna], Darrell, T.J.[Trevor J.], Globerson, A.[Amir],
DETReg: Unsupervised Pretraining with Region Priors for Object Detection,
CVPR22(14585-14595)
IEEE DOI 2210
Location awareness, Training, Object detection, Detectors, Transformers, Generators, Representation learning BibRef

Salman, H.[Hadi], Jain, S.[Saachi], Wong, E.[Eric], Madry, A.[Aleksander],
Certified Patch Robustness via Smoothed Vision Transformers,
CVPR22(15116-15126)
IEEE DOI 2210
Visualization, Smoothing methods, Costs, Computational modeling, Transformers, Adversarial attack and defense BibRef

Wang, Y.K.[Yi-Kai], Chen, X.H.[Xing-Hao], Cao, L.[Lele], Huang, W.B.[Wen-Bing], Sun, F.C.[Fu-Chun], Wang, Y.H.[Yun-He],
Multimodal Token Fusion for Vision Transformers,
CVPR22(12176-12185)
IEEE DOI 2210
Point cloud compression, Image segmentation, Shape, Semantics, Object detection, Vision+X BibRef

Tang, Y.[Yehui], Han, K.[Kai], Wang, Y.H.[Yun-He], Xu, C.[Chang], Guo, J.Y.[Jian-Yuan], Xu, C.[Chao], Tao, D.C.[Da-Cheng],
Patch Slimming for Efficient Vision Transformers,
CVPR22(12155-12164)
IEEE DOI 2210
Visualization, Quantization (signal), Computational modeling, Aggregates, Benchmark testing, Representation learning BibRef

Zhang, J.[Jinnian], Peng, H.[Houwen], Wu, K.[Kan], Liu, M.[Mengchen], Xiao, B.[Bin], Fu, J.L.[Jian-Long], Yuan, L.[Lu],
MiniViT: Compressing Vision Transformers with Weight Multiplexing,
CVPR22(12135-12144)
IEEE DOI 2210
Multiplexing, Performance evaluation, Image coding, Codes, Computational modeling, Benchmark testing, Vision applications and systems BibRef

Chen, J.N.[Jie-Neng], Sun, S.[Shuyang], He, J.[Ju], Torr, P.H.S.[Philip H.S.], Yuille, A.L.[Alan L.], Bai, S.[Song],
TransMix: Attend to Mix for Vision Transformers,
CVPR22(12125-12134)
IEEE DOI 2210
Training, Image segmentation, Codes, Semantics, Object detection, Benchmark testing, Transformers, Representation learning BibRef

Dong, X.Y.[Xiao-Yi], Bao, J.[Jianmin], Chen, D.D.[Dong-Dong], Zhang, W.M.[Wei-Ming], Yu, N.H.[Neng-Hai], Yuan, L.[Lu], Chen, D.[Dong], Guo, B.[Baining],
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows,
CVPR22(12114-12124)
IEEE DOI 2210
Image segmentation, Costs, Mathematical analysis, Training data, Transformer cores, Transformers, Segmentation, grouping and shape analysis BibRef

Liu, H.[Hao], Jiang, X.H.[Xing-Hua], Li, X.[Xin], Bao, Z.M.[Zhi-Min], Jiang, D.Q.[De-Qiang], Ren, B.[Bo],
NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition,
CVPR22(12063-12072)
IEEE DOI 2210
Visualization, Image segmentation, Semantics, Redundancy, Object detection, Deep learning architectures and techniques BibRef

Chen, T.L.[Tian-Long], Zhang, Z.Y.[Zhen-Yu], Cheng, Y.[Yu], Awadallah, A.[Ahmed], Wang, Z.Y.[Zhang-Yang],
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy,
CVPR22(12010-12020)
IEEE DOI 2210
Training, Convolutional codes, Deep learning, Computational modeling, Redundancy, Deep learning architectures and techniques BibRef

Yang, C.[Chenglin], Wang, Y.[Yilin], Zhang, J.M.[Jian-Ming], Zhang, H.[He], Wei, Z.J.[Zi-Jun], Lin, Z.[Zhe], Yuille, A.L.[Alan L.],
Lite Vision Transformer with Enhanced Self-Attention,
CVPR22(11988-11998)
IEEE DOI 2210
Convolutional codes, Image segmentation, Visualization, Convolution, Semantics, Merging, Predictive models, Deep learning architectures and techniques BibRef

Yin, H.X.[Hong-Xu], Vahdat, A.[Arash], Alvarez, J.M.[Jose M.], Mallya, A.[Arun], Kautz, J.[Jan], Molchanov, P.[Pavlo],
A-ViT: Adaptive Tokens for Efficient Vision Transformer,
CVPR22(10799-10808)
IEEE DOI 2210
Training, Adaptive systems, Network architecture, Transformers, Throughput, Hardware, Complexity theory, Efficient learning and inferences BibRef

Lu, J.H.[Jia-Hao], Zhang, X.S.[Xi Sheryl], Zhao, T.L.[Tian-Li], He, X.Y.[Xiang-Yu], Cheng, J.[Jian],
APRIL: Finding the Achilles' Heel on Privacy for Vision Transformers,
CVPR22(10041-10050)
IEEE DOI 2210
Privacy, Data privacy, Federated learning, Computational modeling, Training data, Transformers, Market research, Privacy and federated learning BibRef

Hatamizadeh, A.[Ali], Yin, H.X.[Hong-Xu], Roth, H.[Holger], Li, W.Q.[Wen-Qi], Kautz, J.[Jan], Xu, D.[Daguang], Molchanov, P.[Pavlo],
GradViT: Gradient Inversion of Vision Transformers,
CVPR22(10011-10020)
IEEE DOI 2210
Measurement, Differential privacy, Neural networks, Transformers, Pattern recognition, Security, Iterative methods, Privacy and federated learning BibRef

Zhang, H.[Haofei], Duan, J.R.[Jia-Rui], Xue, M.Q.[Meng-Qi], Song, J.[Jie], Sun, L.[Li], Song, M.L.[Ming-Li],
Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training,
CVPR22(8934-8943)
IEEE DOI 2210
Training, Upper bound, Neural networks, Training data, Network architecture, Transformers, Computer vision theory, Efficient learning and inferences BibRef

Chavan, A.[Arnav], Shen, Z.Q.[Zhi-Qiang], Liu, Z.[Zhuang], Liu, Z.[Zechun], Cheng, K.T.[Kwang-Ting], Xing, E.[Eric],
Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space,
CVPR22(4921-4931)
IEEE DOI 2210
Training, Performance evaluation, Image coding, Force, Graphics processing units, Vision applications and systems BibRef

Xia, Z.F.[Zhuo-Fan], Pan, X.[Xuran], Song, S.[Shiji], Li, L.E.[Li Erran], Huang, G.[Gao],
Vision Transformer with Deformable Attention,
CVPR22(4784-4793)
IEEE DOI 2210
Deformable models, Adaptation models, Computational modeling, Predictive models, Transformers, Data models, Segmentation, grouping and shape analysis BibRef

Hong, W.X.[Wei-Xiang], Lao, J.W.[Jiang-Wei], Ren, W.[Wang], Wang, J.[Jian], Chen, J.D.[Jing-Dong], Chu, W.[Wei],
Training Object Detectors from Scratch: An Empirical Study in the Era of Vision Transformer,
CVPR22(4652-4661)
IEEE DOI 2210
Training, Visualization, Semantics, Detectors, Object detection, Transformers, Recognition: detection, categorization, retrieval, Deep learning architectures and techniques BibRef

Chen, Z.Y.[Zhao-Yu], Li, B.[Bo], Wu, S.[Shuang], Xu, J.H.[Jiang-He], Ding, S.H.[Shou-Hong], Zhang, W.Q.[Wen-Qiang],
Shape Matters: Deformable Patch Attack,
ECCV22(IV:529-548).
Springer DOI 2211
BibRef

Chen, Z.Y.[Zhao-Yu], Li, B.[Bo], Xu, J.H.[Jiang-He], Wu, S.[Shuang], Ding, S.H.[Shou-Hong], Zhang, W.Q.[Wen-Qiang],
Towards Practical Certifiable Patch Defense with Vision Transformer,
CVPR22(15127-15137)
IEEE DOI 2210
Smoothing methods, Toy manufacturing industry, Semantics, Network architecture, Transformers, Robustness, Adversarial attack and defense BibRef

Chen, R.J.[Richard J.], Chen, C.[Chengkuan], Li, Y.C.[Yi-Cong], Chen, T.Y.[Tiffany Y.], Trister, A.D.[Andrew D.], Krishnan, R.G.[Rahul G.], Mahmood, F.[Faisal],
Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning,
CVPR22(16123-16134)
IEEE DOI 2210
Training, Visualization, Self-supervised learning, Image representation, Transformers, Self- semi- meta- unsupervised learning BibRef

Yang, Z.[Zhao], Wang, J.Q.[Jia-Qi], Tang, Y.S.[Yan-Song], Chen, K.[Kai], Zhao, H.S.[Heng-Shuang], Torr, P.H.S.[Philip H.S.],
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation,
CVPR22(18134-18144)
IEEE DOI 2210
Image segmentation, Visualization, Image coding, Shape, Linguistics, Transformers, Feature extraction, Segmentation, grouping and shape analysis BibRef

Scheibenreif, L.[Linus], Hanna, J.[Joëlle], Mommert, M.[Michael], Borth, D.[Damian],
Self-supervised Vision Transformers for Land-cover Segmentation and Classification,
EarthVision22(1421-1430)
IEEE DOI 2210
Training, Earth, Image segmentation, Computational modeling, Conferences, Transformers BibRef

Zhai, X.H.[Xiao-Hua], Kolesnikov, A.[Alexander], Houlsby, N.[Neil], Beyer, L.[Lucas],
Scaling Vision Transformers,
CVPR22(1204-1213)
IEEE DOI 2210
Training, Error analysis, Computational modeling, Neural networks, Memory management, Training data, Transfer/low-shot/long-tail learning BibRef

Guo, J.Y.[Jian-Yuan], Han, K.[Kai], Wu, H.[Han], Tang, Y.[Yehui], Chen, X.H.[Xing-Hao], Wang, Y.H.[Yun-He], Xu, C.[Chang],
CMT: Convolutional Neural Networks Meet Vision Transformers,
CVPR22(12165-12175)
IEEE DOI 2210
Visualization, Image recognition, Force, Object detection, Transformers, Representation learning BibRef

Meng, L.C.[Ling-Chen], Li, H.D.[Heng-Duo], Chen, B.C.[Bor-Chun], Lan, S.Y.[Shi-Yi], Wu, Z.X.[Zu-Xuan], Jiang, Y.G.[Yu-Gang], Lim, S.N.[Ser-Nam],
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition,
CVPR22(12299-12308)
IEEE DOI 2210
Image recognition, Head, Law enforcement, Computational modeling, Redundancy, Transformers, Efficient learning and inferences, retrieval BibRef

Herrmann, C.[Charles], Sargent, K.[Kyle], Jiang, L.[Lu], Zabih, R.[Ramin], Chang, H.[Huiwen], Liu, C.[Ce], Krishnan, D.[Dilip], Sun, D.Q.[De-Qing],
Pyramid Adversarial Training Improves ViT Performance,
CVPR22(13409-13419)
IEEE DOI 2210
Training, Image recognition, Stochastic processes, Transformers, Robustness, retrieval, Recognition: detection BibRef

Li, C.L.[Chang-Lin], Zhuang, B.[Bohan], Wang, G.R.[Guang-Run], Liang, X.D.[Xiao-Dan], Chang, X.J.[Xiao-Jun], Yang, Y.[Yi],
Automated Progressive Learning for Efficient Training of Vision Transformers,
CVPR22(12476-12486)
IEEE DOI 2210
Training, Adaptation models, Schedules, Computational modeling, Estimation, Manuals, Transformers, Representation learning BibRef

Yu, T.[Tong], Khalitov, R.[Ruslan], Cheng, L.[Lei], Yang, Z.R.[Zhi-Rong],
Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention,
CVPR22(681-690)
IEEE DOI 2210
Protocols, Costs, Scalability, Neural networks, Stacking, Genomics, Transformers, Deep learning architectures and techniques, Representation learning BibRef

Guo, J.Y.[Jian-Yuan], Tang, Y.H.[Ye-Hui], Han, K.[Kai], Chen, X.H.[Xing-Hao], Wu, H.[Han], Xu, C.[Chao], Xu, C.[Chang], Wang, Y.H.[Yun-He],
Hire-MLP: Vision MLP via Hierarchical Rearrangement,
CVPR22(816-826)
IEEE DOI 2210
Representation learning, Image segmentation, Semantics, Object detection, Transformers BibRef

Cheng, B.[Bowen], Misra, I.[Ishan], Schwing, A.G.[Alexander G.], Kirillov, A.[Alexander], Girdhar, R.[Rohit],
Masked-attention Mask Transformer for Universal Image Segmentation,
CVPR22(1280-1289)
IEEE DOI 2210
Image segmentation, Shape, Computational modeling, Semantics, Transformers, Feature extraction, retrieval BibRef

Pu, M.Y.[Meng-Yang], Huang, Y.P.[Ya-Ping], Liu, Y.M.[Yu-Ming], Guan, Q.J.[Qing-Ji], Ling, H.B.[Hai-Bin],
EDTER: Edge Detection with Transformer,
CVPR22(1392-1402)
IEEE DOI 2210
Head, Image edge detection, Semantics, Detectors, Transformers, Feature extraction, Segmentation, grouping and shape analysis, Scene analysis and understanding BibRef

Rangrej, S.B.[Samrudhdhi B.], Srinidhi, C.L.[Chetan L.], Clark, J.J.[James J.],
Consistency driven Sequential Transformers Attention Model for Partially Observable Scenes,
CVPR22(2508-2517)
IEEE DOI 2210
Training, Computational modeling, Imaging, Predictive models, Transformers, Prediction algorithms, Visual reasoning BibRef

Zhu, R.[Rui], Li, Z.Q.[Zheng-Qin], Matai, J.[Janarbek], Porikli, F.M.[Fatih M.], Chandraker, M.[Manmohan],
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes,
CVPR22(2812-2821)
IEEE DOI 2210
Photorealism, Shape, Computational modeling, Lighting, Transformers, Physics-based vision and shape-from-X BibRef

Ermolov, A.[Aleksandr], Mirvakhabova, L.[Leyla], Khrulkov, V.[Valentin], Sebe, N.[Nicu], Oseledets, I.[Ivan],
Hyperbolic Vision Transformers: Combining Improvements in Metric Learning,
CVPR22(7399-7409)
IEEE DOI 2210
Measurement, Geometry, Visualization, Semantics, Self-supervised learning, Transformer cores, Transformers, Representation learning BibRef

Lee, Y.[Youngwan], Kim, J.[Jonghee], Willette, J.[Jeffrey], Hwang, S.J.[Sung Ju],
MPViT: Multi-Path Vision Transformer for Dense Prediction,
CVPR22(7277-7286)
IEEE DOI 2210
Image segmentation, Semantics, Object detection, Transformers, Feature extraction, Pattern recognition, Recognition: detection, Representation learning BibRef

Zhang, C.Z.[Chong-Zhi], Zhang, M.Y.[Ming-Yuan], Zhang, S.H.[Shang-Hang], Jin, D.S.[Dai-Sheng], Zhou, Q.[Qiang], Cai, Z.A.[Zhong-Ang], Zhao, H.[Haiyu], Liu, X.L.[Xiang-Long], Liu, Z.[Ziwei],
Delving Deep into the Generalization of Vision Transformers under Distribution Shifts,
CVPR22(7267-7276)
IEEE DOI 2210
Training, Representation learning, Systematics, Shape, Taxonomy, Self-supervised learning, Transformers, Recognition: detection BibRef

Hou, Z.[Zhi], Yu, B.[Baosheng], Tao, D.C.[Da-Cheng],
BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning,
CVPR22(7246-7256)
IEEE DOI 2210
Training, Deep learning, Representation learning, Neural networks, Tail, Transformers, Transfer/low-shot/long-tail learning, Self- semi- meta- unsupervised learning BibRef

Zamir, S.W.[Syed Waqas], Arora, A.[Aditya], Khan, S.[Salman], Hayat, M.[Munawar], Khan, F.S.[Fahad Shahbaz], Yang, M.H.[Ming-Hsuan],
Restormer: Efficient Transformer for High-Resolution Image Restoration,
CVPR22(5718-5729)
IEEE DOI 2210
Computational modeling, Transformer cores, Transformers, Data models, Image restoration, Task analysis, Deep learning architectures and techniques BibRef

Zhao, H.S.[Heng-Shuang], Jiang, L.[Li], Jia, J.Y.[Jia-Ya], Torr, P.H.S.[Philip H.S.], Koltun, V.[Vladlen],
Point Transformer,
ICCV21(16239-16248)
IEEE DOI 2203
Point cloud compression, Measurement, Image segmentation, Semantics, Object detection, Transformer cores, Recognition and classification BibRef

Lin, K.[Kevin], Wang, L.J.[Li-Juan], Liu, Z.C.[Zi-Cheng],
Mesh Graphormer,
ICCV21(12919-12928)
IEEE DOI 2203
Convolutional codes, Solid modeling, Network topology, Transformers, Gestures and body pose BibRef

Casey, E.[Evan], Pérez, V.[Víctor], Li, Z.[Zhuoru],
The Animation Transformer: Visual Correspondence via Segment Matching,
ICCV21(11303-11312)
IEEE DOI 2203
Visualization, Image segmentation, Image color analysis, Production, Animation, Transformers, grouping and shape BibRef

Reizenstein, J.[Jeremy], Shapovalov, R.[Roman], Henzler, P.[Philipp], Sbordone, L.[Luca], Labatut, P.[Patrick], Novotny, D.[David],
Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction,
ICCV21(10881-10891)
IEEE DOI 2203
Award, Marr Prize, HM. Point cloud compression, Transformers, Rendering (computer graphics), Cameras, Image reconstruction, 3D from multiview and other sensors BibRef

Mariotti, O.[Octave], Aodha, O.M.[Oisin Mac], Bilen, H.[Hakan],
ViewNet: Unsupervised Viewpoint Estimation from Conditional Generation,
ICCV21(10398-10408)
IEEE DOI 2203
Training, Annotations, Estimation, Benchmark testing, Transformers, Representation learning, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Feng, W.X.[Wei-Xin], Wang, Y.J.[Yuan-Jiang], Ma, L.H.[Li-Hua], Yuan, Y.[Ye], Zhang, C.[Chi],
Temporal Knowledge Consistency for Unsupervised Visual Representation Learning,
ICCV21(10150-10160)
IEEE DOI 2203
Training, Representation learning, Visualization, Protocols, Object detection, Semisupervised learning, Transformers, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Wu, H.P.[Hai-Ping], Xiao, B.[Bin], Codella, N.[Noel], Liu, M.C.[Meng-Chen], Dai, X.Y.[Xi-Yang], Yuan, L.[Lu], Zhang, L.[Lei],
CvT: Introducing Convolutions to Vision Transformers,
ICCV21(22-31)
IEEE DOI 2203
Code, Vision Transformer.
WWW Link. Convolutional codes, Image resolution, Image recognition, Performance gain, Transformers, Distortion BibRef

Touvron, H.[Hugo], Cord, M.[Matthieu], Sablayrolles, A.[Alexandre], Synnaeve, G.[Gabriel], Jégou, H.[Hervé],
Going deeper with Image Transformers,
ICCV21(32-42)
IEEE DOI 2203
Training, Neural networks, Training data, Data models, Circuit faults, Recognition and classification, Optimization and learning methods BibRef

Zhao, J.W.[Jia-Wei], Yan, K.[Ke], Zhao, Y.F.[Yi-Fan], Guo, X.W.[Xiao-Wei], Huang, F.Y.[Fei-Yue], Li, J.[Jia],
Transformer-based Dual Relation Graph for Multi-label Image Recognition,
ICCV21(163-172)
IEEE DOI 2203
Image recognition, Correlation, Computational modeling, Semantics, Benchmark testing, Representation learning BibRef

Chen, C.F.R.[Chun-Fu Richard], Fan, Q.F.[Quan-Fu], Panda, R.[Rameswar],
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification,
ICCV21(347-356)
IEEE DOI 2203
Image segmentation, Image recognition, Computational modeling, Semantics, Memory management, Object detection, Representation learning BibRef

Peng, Z.L.[Zhi-Liang], Huang, W.[Wei], Gu, S.Z.[Shan-Zhi], Xie, L.X.[Ling-Xi], Wang, Y.[Yaowei], Jiao, J.B.[Jian-Bin], Ye, Q.X.[Qi-Xiang],
Conformer: Local Features Coupling Global Representations for Visual Recognition,
ICCV21(357-366)
IEEE DOI 2203
Couplings, Representation learning, Visualization, Fuses, Convolution, Object detection, Transformers BibRef

Pan, Z.Z.[Zi-Zheng], Zhuang, B.[Bohan], Liu, J.[Jing], He, H.Y.[Hao-Yu], Cai, J.F.[Jian-Fei],
Scalable Vision Transformers with Hierarchical Pooling,
ICCV21(367-376)
IEEE DOI 2203
Visualization, Image recognition, Computational modeling, Scalability, Transformers, Computational efficiency, Efficient training and inference methods BibRef

Yue, X.Y.[Xiao-Yu], Sun, S.Y.[Shu-Yang], Kuang, Z.H.[Zhang-Hui], Wei, M.[Meng], Torr, P.H.S.[Philip H.S.], Zhang, W.[Wayne], Lin, D.[Dahua],
Vision Transformer with Progressive Sampling,
ICCV21(377-386)
IEEE DOI 2203
Codes, Computational modeling, Interference, Transformers, Feature extraction, Recognition and classification, Representation learning BibRef

Chefer, H.[Hila], Gur, S.[Shir], Wolf, L.B.[Lior B.],
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers,
ICCV21(387-396)
IEEE DOI 2203
Measurement, Visualization, Image segmentation, Computational modeling, Object detection BibRef

Yuan, L.[Li], Chen, Y.P.[Yun-Peng], Wang, T.[Tao], Yu, W.H.[Wei-Hao], Shi, Y.J.[Yu-Jun], Jiang, Z.H.[Zi-Hang], Tay, F.E.H.[Francis E. H.], Feng, J.S.[Jia-Shi], Yan, S.C.[Shui-Cheng],
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet,
ICCV21(538-547)
IEEE DOI 2203
Training, Image resolution, Computational modeling, Image edge detection, Transformers BibRef

Wu, B.[Bichen], Xu, C.F.[Chen-Feng], Dai, X.L.[Xiao-Liang], Wan, A.[Alvin], Zhang, P.Z.[Pei-Zhao], Yan, Z.C.[Zhi-Cheng], Tomizuka, M.[Masayoshi], Gonzalez, J.[Joseph], Keutzer, K.[Kurt], Vajda, P.[Peter],
Visual Transformers: Where Do Transformers Really Belong in Vision Models?,
ICCV21(579-589)
IEEE DOI 2203
Training, Visualization, Image segmentation, Lips, Computational modeling, Semantics, Vision applications and systems BibRef

Hu, R.H.[Rong-Hang], Singh, A.[Amanpreet],
UniT: Multimodal Multitask Learning with a Unified Transformer,
ICCV21(1419-1429)
IEEE DOI 2203
Training, Natural languages, Object detection, Predictive models, Transformers, Multitasking, Representation learning BibRef

Qiu, Y.[Yue], Yamamoto, S.[Shintaro], Nakashima, K.[Kodai], Suzuki, R.[Ryota], Iwata, K.[Kenji], Kataoka, H.[Hirokatsu], Satoh, Y.[Yutaka],
Describing and Localizing Multiple Changes with Transformers,
ICCV21(1951-1960)
IEEE DOI 2203
Measurement, Location awareness, Codes, Natural languages, Benchmark testing, Transformers, Vision applications and systems BibRef

Song, M.[Myungseo], Choi, J.[Jinyoung], Han, B.H.[Bo-Hyung],
Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform,
ICCV21(2360-2369)
IEEE DOI 2203
Training, Image coding, Neural networks, Rate-distortion, Transforms, Network architecture, Computational photography, Low-level and physics-based vision BibRef

Sheng, H.[Hualian], Cai, S.[Sijia], Liu, Y.[Yuan], Deng, B.[Bing], Huang, J.Q.[Jian-Qiang], Hua, X.S.[Xian-Sheng], Zhao, M.J.[Min-Jian],
Improving 3D Object Detection with Channel-wise Transformer,
ICCV21(2723-2732)
IEEE DOI 2203
Point cloud compression, Object detection, Detectors, Transforms, Transformers, Encoding, Detection and localization in 2D and 3D BibRef

Zhang, P.[Pengchuan], Dai, X.[Xiyang], Yang, J.W.[Jian-Wei], Xiao, B.[Bin], Yuan, L.[Lu], Zhang, L.[Lei], Gao, J.F.[Jian-Feng],
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding,
ICCV21(2978-2988)
IEEE DOI 2203
Image segmentation, Image coding, Computational modeling, Memory management, Object detection, Transformers, Representation learning BibRef

Dong, Q.[Qi], Tu, Z.W.[Zhuo-Wen], Liao, H.[Haofu], Zhang, Y.T.[Yu-Ting], Mahadevan, V.[Vijay], Soatto, S.[Stefano],
Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries,
ICCV21(3530-3539)
IEEE DOI 2203
Visualization, Detectors, Transformers, Task analysis, Standards, Detection and localization in 2D and 3D, Representation learning BibRef

Wang, T.[Tao], Yuan, L.[Li], Chen, Y.P.[Yun-Peng], Feng, J.S.[Jia-Shi], Yan, S.C.[Shui-Cheng],
PnP-DETR: Towards Efficient Visual Analysis with Transformers,
ICCV21(4641-4650)
IEEE DOI 2203
Adaptation models, Visualization, Image segmentation, Image recognition, Computational modeling, Redundancy, Detection and localization in 2D and 3D BibRef

Fan, H.Q.[Hao-Qi], Xiong, B.[Bo], Mangalam, K.[Karttikeya], Li, Y.[Yanghao], Yan, Z.C.[Zhi-Cheng], Malik, J.[Jitendra], Feichtenhofer, C.[Christoph],
Multiscale Vision Transformers,
ICCV21(6804-6815)
IEEE DOI 2203
Visualization, Image recognition, Codes, Computational modeling, Transformers, Complexity theory, Recognition and classification BibRef

Mahmood, K.[Kaleel], Mahmood, R.[Rigel], van Dijk, M.[Marten],
On the Robustness of Vision Transformers to Adversarial Examples,
ICCV21(7818-7827)
IEEE DOI 2203
Transformers, Robustness, Adversarial machine learning, Security, Machine learning architectures and formulations BibRef

Chen, X.L.[Xin-Lei], Xie, S.[Saining], He, K.[Kaiming],
An Empirical Study of Training Self-Supervised Vision Transformers,
ICCV21(9620-9629)
IEEE DOI 2203
Training, Benchmark testing, Transformers, Standards, Representation learning, Recognition and classification, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Caron, M.[Mathilde], Touvron, H.[Hugo], Misra, I.[Ishan], Jégou, H.[Hervé], Mairal, J.[Julien], Bojanowski, P.[Piotr], Joulin, A.[Armand],
Emerging Properties in Self-Supervised Vision Transformers,
ICCV21(9630-9640)
IEEE DOI 2203
Training, Image segmentation, Semantics, Layout, Image retrieval, Representation learning, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Yuan, Y.[Ye], Weng, X.[Xinshuo], Ou, Y.[Yanglan], Kitani, K.[Kris],
AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting,
ICCV21(9793-9803)
IEEE DOI 2203
Uncertainty, Stochastic processes, Predictive models, Transformers, Encoding, Trajectory, Motion and tracking, Vision for robotics and autonomous vehicles BibRef

Xu, W.J.[Wei-Jian], Xu, Y.F.[Yi-Fan], Chang, T.[Tyler], Tu, Z.W.[Zhuo-Wen],
Co-Scale Conv-Attentional Image Transformers,
ICCV21(9961-9970)
IEEE DOI 2203
Image segmentation, Computational modeling, Object detection, Transformers, Convolutional neural networks, Task analysis, Recognition and classification BibRef

Wu, K.[Kan], Peng, H.W.[Hou-Wen], Chen, M.H.[Ming-Hao], Fu, J.L.[Jian-Long], Chao, H.Y.[Hong-Yang],
Rethinking and Improving Relative Position Encoding for Vision Transformer,
ICCV21(10013-10021)
IEEE DOI 2203
Image coding, Codes, Computational modeling, Transformers, Encoding, Natural language processing, Datasets and evaluation, Recognition and classification BibRef

Bhojanapalli, S.[Srinadh], Chakrabarti, A.[Ayan], Glasner, D.[Daniel], Li, D.[Daliang], Unterthiner, T.[Thomas], Veit, A.[Andreas],
Understanding Robustness of Transformers for Image Classification,
ICCV21(10211-10221)
IEEE DOI 2203
Perturbation methods, Transformers, Robustness, Data models, Convolutional neural networks, Recognition and classification BibRef

Yan, B.[Bin], Peng, H.[Houwen], Fu, J.L.[Jian-Long], Wang, D.[Dong], Lu, H.C.[Hu-Chuan],
Learning Spatio-Temporal Transformer for Visual Tracking,
ICCV21(10428-10437)
IEEE DOI 2203
Visualization, Target tracking, Smoothing methods, Pipelines, Benchmark testing, Transformers BibRef

Heo, B.[Byeongho], Yun, S.[Sangdoo], Han, D.Y.[Dong-Yoon], Chun, S.[Sanghyuk], Choe, J.[Junsuk], Oh, S.J.[Seong Joon],
Rethinking Spatial Dimensions of Vision Transformers,
ICCV21(11916-11925)
IEEE DOI 2203
Dimensionality reduction, Computational modeling, Object detection, Transformers, Robustness, Recognition and classification BibRef

Voskou, A.[Andreas], Panousis, K.P.[Konstantinos P.], Kosmopoulos, D.[Dimitrios], Metaxas, D.N.[Dimitris N.], Chatzis, S.[Sotirios],
Stochastic Transformer Networks with Linear Competing Units: Application to end-to-end SL Translation,
ICCV21(11926-11935)
IEEE DOI 2203
Training, Memory management, Stochastic processes, Gesture recognition, Benchmark testing, Assistive technologies BibRef

Ranftl, R.[René], Bochkovskiy, A.[Alexey], Koltun, V.[Vladlen],
Vision Transformers for Dense Prediction,
ICCV21(12159-12168)
IEEE DOI 2203
Image resolution, Semantics, Neural networks, Estimation, Training data, grouping and shape BibRef

Chen, M.H.[Ming-Hao], Peng, H.W.[Hou-Wen], Fu, J.L.[Jian-Long], Ling, H.B.[Hai-Bin],
AutoFormer: Searching Transformers for Visual Recognition,
ICCV21(12250-12260)
IEEE DOI 2203
Training, Convolutional codes, Visualization, Head, Search methods, Manuals, Recognition and classification BibRef

Yang, G.L.[Guang-Lei], Tang, H.[Hao], Ding, M.L.[Ming-Li], Sebe, N.[Nicu], Ricci, E.[Elisa],
Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction,
ICCV21(16249-16259)
IEEE DOI 2203
Correlation, Estimation, Logic gates, Transformers, Natural language processing, Vision applications and systems BibRef

Yuan, K.[Kun], Guo, S.P.[Shao-Peng], Liu, Z.[Ziwei], Zhou, A.[Aojun], Yu, F.W.[Feng-Wei], Wu, W.[Wei],
Incorporating Convolution Designs into Visual Transformers,
ICCV21(559-568)
IEEE DOI 2203
Training, Visualization, Costs, Convolution, Training data, Transformers, Feature extraction, Recognition and classification, Efficient training and inference methods BibRef

Chen, Z.[Zhengsu], Xie, L.X.[Ling-Xi], Niu, J.W.[Jian-Wei], Liu, X.F.[Xue-Feng], Wei, L.[Longhui], Tian, Q.[Qi],
Visformer: The Vision-friendly Transformer,
ICCV21(569-578)
IEEE DOI 2203
Convolutional codes, Training, Visualization, Protocols, Computational modeling, Fitting, Recognition and classification, Representation learning BibRef

Wang, W.[Wenhai], Xie, E.[Enze], Li, X.[Xiang], Fan, D.P.[Deng-Ping], Song, K.[Kaitao], Liang, D.[Ding], Lu, T.[Tong], Luo, P.[Ping], Shao, L.[Ling],
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions,
ICCV21(548-558)
IEEE DOI 2203
Image resolution, Costs, Semantics, Object detection, Transformers, Feature extraction, Recognition and classification, grouping and shape BibRef

Yao, Z.L.[Zhu-Liang], Cao, Y.[Yue], Lin, Y.T.[Yu-Tong], Liu, Z.[Ze], Zhang, Z.[Zheng], Hu, H.[Han],
Leveraging Batch Normalization for Vision Transformers,
NeurArch21(413-422)
IEEE DOI 2112
Training, Transformers, Feeds BibRef

Kim, K.[Kyungmin], Wu, B.C.[Bi-Chen], Dai, X.L.[Xiao-Liang], Zhang, P.Z.[Pei-Zhao], Yan, Z.C.[Zhi-Cheng], Vajda, P.[Peter], Kim, S.[Seon],
Rethinking the Self-Attention in Vision Transformers,
ECV21(3065-3069)
IEEE DOI 2109
Computational modeling, Pattern recognition BibRef

Zhang, Z.X.[Zi-Xiao], Lu, X.Q.[Xiao-Qiang], Cao, G.J.[Guo-Jin], Yang, Y.T.[Yu-Ting], Jiao, L.C.[Li-Cheng], Liu, F.[Fang],
ViT-YOLO: Transformer-Based YOLO for Object Detection,
VisDrone21(2799-2808)
IEEE DOI 2112
Semantics, Detectors, Object detection, Feature extraction, Robustness BibRef

Kong, D.[Daehyeon], Kong, K.[Kyeongbo], Kim, K.[Kyunghun], Min, S.J.[Sung-Jun], Kang, S.J.[Suk-Ju],
Image-Adaptive Hint Generation via Vision Transformer for Outpainting,
WACV22(4029-4038)
IEEE DOI 2202
Image synthesis, Neural networks, Complex networks, Benchmark testing, Transformers, Vision Systems and Applications BibRef

Graham, B.[Ben], El-Nouby, A.[Alaaeldin], Touvron, H.[Hugo], Stock, P.[Pierre], Joulin, A.[Armand], Jégou, H.[Hervé], Douze, M.[Matthijs],
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference,
ICCV21(12239-12249)
IEEE DOI 2203
Training, Image resolution, Neural networks, Parallel processing, Transformers, Feature extraction, Representation learning BibRef

Horváth, J.[János], Baireddy, S.[Sriram], Hao, H.X.[Han-Xiang], Montserrat, D.M.[Daniel Mas], Delp, E.J.[Edward J.],
Manipulation Detection in Satellite Images Using Vision Transformer,
WMF21(1032-1041)
IEEE DOI 2109
BibRef
Earlier: A1, A4, A3, A5, Only:
Manipulation Detection in Satellite Images Using Deep Belief Networks,
WMF20(2832-2840)
IEEE DOI 2008
Image sensors, Satellites, Splicing, Forestry, Tools. Satellites, Image reconstruction, Training, Forgery, Heating systems, Feature extraction BibRef

Beal, J.[Josh], Wu, H.Y.[Hao-Yu], Park, D.H.[Dong Huk], Zhai, A.[Andrew], Kislyuk, D.[Dmitry],
Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations,
WACV22(1431-1440)
IEEE DOI 2202
Visualization, Solid modeling, Systematics, Computational modeling, Transformers, Semi- and Un- supervised Learning BibRef

Chapter on Pattern Recognition, Clustering, Statistics, Grammars, Learning, Neural Nets, Genetic Algorithms continues in
Video Transformers.


Last update: Mar 21, 2023 at 18:34:39