14.5.10.6 Vision Transformers, ViT

Chapter Contents (Back)
Vision Transformers. Transformers. ViT. A subset
See also Attention in Vision Transformers.
See also Patch Based Vision Transformers. Shift, Scale, and Distortion Invariance. Shifted Window:
See also SWIN Transformer. Video specific:
See also Video Transformers. Semantic Segmentation:
See also Vision Transformers for Semantic Segmentation.
See also Zero-Shot Learning.
See also Detection Transformer, DETR Applications.
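The entries below all build on the same basic pipeline: an image is split into patches, the patches are linearly embedded and given position embeddings, and a Transformer encoder applies self-attention over the resulting token sequence. A minimal NumPy sketch of that pipeline (toy sizes, random weights, single head; purely illustrative, not any specific paper's model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy settings (illustrative only).
img = rng.normal(size=(32, 32, 3))   # one 32x32 RGB image
patch, dim = 8, 16                   # 8x8 patches, 16-dim embeddings

# 1) Split the image into non-overlapping patches and flatten each one.
n = 32 // patch                      # 4 patches per side -> 16 tokens
tokens = img.reshape(n, patch, n, patch, 3).transpose(0, 2, 1, 3, 4)
tokens = tokens.reshape(n * n, patch * patch * 3)          # (16, 192)

# 2) Linearly project patches and add (here random) position embeddings.
W_embed = rng.normal(size=(tokens.shape[1], dim)) * 0.02
pos = rng.normal(size=(n * n, dim)) * 0.02                 # one per token
x = tokens @ W_embed + pos                                 # (16, 16)

# 3) Single-head self-attention over the token sequence.
Wq, Wk, Wv = (rng.normal(size=(dim, dim)) * 0.02 for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = q @ k.T / np.sqrt(dim)
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)                   # row-wise softmax
out = attn @ v                                             # (16, 16)

print(out.shape)                                  # (16, 16)
print(np.allclose(attn.sum(axis=-1), 1.0))        # True
```

A full ViT stacks this attention step with MLP blocks, residual connections, and layer normalization; the sketch only shows the tokenization and one attention pass.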

Bazi, Y.[Yakoub], Bashmal, L.[Laila], Al Rahhal, M.M.[Mohamad M.], Al Dayil, R.[Reham], Al Ajlan, N.[Naif],
Vision Transformers for Remote Sensing Image Classification,
RS(13), No. 3, 2021, pp. xx-yy.
DOI Link 2102
BibRef

Li, T.[Tao], Zhang, Z.[Zheng], Pei, L.[Lishen], Gan, Y.[Yan],
HashFormer: Vision Transformer Based Deep Hashing for Image Retrieval,
SPLetters(29), 2022, pp. 827-831.
IEEE DOI 2204
Transformers, Binary codes, Task analysis, Training, Image retrieval, Feature extraction, Databases, Binary embedding, image retrieval BibRef

Jiang, B.[Bo], Zhao, K.K.[Kang-Kang], Tang, J.[Jin],
RGTransformer: Region-Graph Transformer for Image Representation and Few-Shot Classification,
SPLetters(29), 2022, pp. 792-796.
IEEE DOI 2204
Measurement, Transformers, Image representation, Feature extraction, Visualization, transformer BibRef

Chen, Z.M.[Zhao-Min], Cui, Q.[Quan], Zhao, B.[Borui], Song, R.J.[Ren-Jie], Zhang, X.Q.[Xiao-Qin], Yoshie, O.[Osamu],
SST: Spatial and Semantic Transformers for Multi-Label Image Recognition,
IP(31), 2022, pp. 2570-2583.
IEEE DOI 2204
Correlation, Semantics, Transformers, Image recognition, Task analysis, Training, Feature extraction, label correlation BibRef

Wang, G.H.[Guang-Hui], Li, B.[Bin], Zhang, T.[Tao], Zhang, S.[Shubi],
A Network Combining a Transformer and a Convolutional Neural Network for Remote Sensing Image Change Detection,
RS(14), No. 9, 2022, pp. xx-yy.
DOI Link 2205
BibRef

Luo, G.[Gen], Zhou, Y.[Yiyi], Sun, X.S.[Xiao-Shuai], Wang, Y.[Yan], Cao, L.J.[Liu-Juan], Wu, Y.J.[Yong-Jian], Huang, F.Y.[Fei-Yue], Ji, R.R.[Rong-Rong],
Towards Lightweight Transformer Via Group-Wise Transformation for Vision-and-Language Tasks,
IP(31), 2022, pp. 3386-3398.
IEEE DOI 2205
Transformers, Task analysis, Computational modeling, Benchmark testing, Visualization, Convolution, Head, reference expression comprehension BibRef

Wang, J.Y.[Jia-Yun], Chakraborty, R.[Rudrasis], Yu, S.X.[Stella X.],
Transformer for 3D Point Clouds,
PAMI(44), No. 8, August 2022, pp. 4419-4431.
IEEE DOI 2207
Convolution, Feature extraction, Shape, Semantics, Task analysis, Measurement, point cloud, transformation, deformable, segmentation, 3D detection BibRef

Li, Z.K.[Ze-Kun], Liu, Y.F.[Yu-Fan], Li, B.[Bing], Feng, B.L.[Bai-Lan], Wu, K.[Kebin], Peng, C.W.[Cheng-Wei], Hu, W.M.[Wei-Ming],
SDTP: Semantic-Aware Decoupled Transformer Pyramid for Dense Image Prediction,
CirSysVideo(32), No. 9, September 2022, pp. 6160-6173.
IEEE DOI 2209
Transformers, Semantics, Task analysis, Detectors, Image segmentation, Head, Convolution, Transformer, dense prediction, multi-level interaction BibRef

Wu, J.J.[Jia-Jing], Wei, Z.Q.[Zhi-Qiang], Zhang, J.P.[Jin-Peng], Zhang, Y.S.[Yu-Shi], Jia, D.N.[Dong-Ning], Yin, B.[Bo], Yu, Y.C.[Yun-Chao],
Full-Coupled Convolutional Transformer for Surface-Based Duct Refractivity Inversion,
RS(14), No. 17, 2022, pp. xx-yy.
DOI Link 2209
BibRef

Jiang, K.[Kai], Peng, P.[Peng], Lian, Y.[Youzao], Xu, W.S.[Wei-Sheng],
The encoding method of position embeddings in vision transformer,
JVCIR(89), 2022, pp. 103664.
Elsevier DOI 2212
Vision transformer, Position embeddings, Gabor filters BibRef
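The Jiang et al. entry above studies how position information is encoded for the token sequence. The standard fixed (sinusoidal) alternative to learned embeddings can be sketched as follows (function name and sizes are illustrative):

```python
import numpy as np

def sinusoidal_pos(n_tokens: int, dim: int) -> np.ndarray:
    """Fixed sinusoidal position embeddings, interleaving sin/cos channels."""
    pos = np.arange(n_tokens)[:, None]            # (n, 1) token positions
    i = np.arange(dim // 2)[None, :]              # (1, dim/2) channel pairs
    freq = 1.0 / (10000 ** (2 * i / dim))         # geometric frequency ladder
    pe = np.empty((n_tokens, dim))
    pe[:, 0::2] = np.sin(pos * freq)              # even channels
    pe[:, 1::2] = np.cos(pos * freq)              # odd channels
    return pe

pe = sinusoidal_pos(196, 64)   # e.g. a 14x14 patch grid, 64-dim tokens
print(pe.shape)                # (196, 64)
```

Learned embeddings simply replace this fixed table with a trainable parameter of the same shape, added to the patch embeddings before the first encoder block.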

Han, K.[Kai], Wang, Y.H.[Yun-He], Chen, H.T.[Han-Ting], Chen, X.H.[Xing-Hao], Guo, J.Y.[Jian-Yuan], Liu, Z.H.[Zhen-Hua], Tang, Y.[Yehui], Xiao, A.[An], Xu, C.J.[Chun-Jing], Xu, Y.X.[Yi-Xing], Yang, Z.H.[Zhao-Hui], Zhang, Y.[Yiman], Tao, D.C.[Da-Cheng],
A Survey on Vision Transformer,
PAMI(45), No. 1, January 2023, pp. 87-110.
IEEE DOI 2212
Survey, Vision Transformer. Transformers, Task analysis, Encoding, Computational modeling, Visualization, Object detection, high-level vision, video BibRef

Hou, Q.[Qibin], Jiang, Z.H.[Zi-Hang], Yuan, L.[Li], Cheng, M.M.[Ming-Ming], Yan, S.C.[Shui-Cheng], Feng, J.S.[Jia-Shi],
Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition,
PAMI(45), No. 1, January 2023, pp. 1328-1334.
IEEE DOI 2212
Transformers, Encoding, Visualization, Convolutional codes, Mixers, Computer architecture, Training data, Vision permutator, deep neural network BibRef

Yu, W.H.[Wei-Hao], Si, C.Y.[Chen-Yang], Zhou, P.[Pan], Luo, M.[Mi], Zhou, Y.C.[Yi-Chen], Feng, J.S.[Jia-Shi], Yan, S.C.[Shui-Cheng], Wang, X.C.[Xin-Chao],
MetaFormer Baselines for Vision,
PAMI(46), No. 2, February 2024, pp. 896-912.
IEEE DOI 2401
BibRef
And: A1, A4, A3, A2, A5, A8, A6, A7:
MetaFormer is Actually What You Need for Vision,
CVPR22(10809-10819)
IEEE DOI 2210
The abstracted architecture of Transformer. Computational modeling, Focusing, Transformers, Task analysis, retrieval BibRef

Zhou, D.[Daquan], Hou, Q.[Qibin], Yang, L.J.[Lin-Jie], Jin, X.J.[Xiao-Jie], Feng, J.S.[Jia-Shi],
Token Selection is a Simple Booster for Vision Transformers,
PAMI(45), No. 11, November 2023, pp. 12738-12746.
IEEE DOI 2310
BibRef

Yuan, L.[Li], Hou, Q.[Qibin], Jiang, Z.H.[Zi-Hang], Feng, J.S.[Jia-Shi], Yan, S.C.[Shui-Cheng],
VOLO: Vision Outlooker for Visual Recognition,
PAMI(45), No. 5, May 2023, pp. 6575-6586.
IEEE DOI 2304
Transformers, Computer architecture, Computational modeling, Training, Data models, Task analysis, Visualization, image classification BibRef

Ren, S.[Sucheng], Zhou, D.[Daquan], He, S.F.[Sheng-Feng], Feng, J.S.[Jia-Shi], Wang, X.C.[Xin-Chao],
Shunted Self-Attention via Multi-Scale Token Aggregation,
CVPR22(10843-10852)
IEEE DOI 2210
Degradation, Deep learning, Costs, Computational modeling, Merging, Efficient learning and inferences BibRef

Wu, Y.H.[Yu-Huan], Liu, Y.[Yun], Zhan, X.[Xin], Cheng, M.M.[Ming-Ming],
P2T: Pyramid Pooling Transformer for Scene Understanding,
PAMI(45), No. 11, November 2023, pp. 12760-12771.
IEEE DOI 2310
BibRef

Li, Y.[Yehao], Yao, T.[Ting], Pan, Y.W.[Ying-Wei], Mei, T.[Tao],
Contextual Transformer Networks for Visual Recognition,
PAMI(45), No. 2, February 2023, pp. 1489-1500.
IEEE DOI 2301
Transformers, Convolution, Visualization, Task analysis, Image recognition, Object detection, Transformer, image recognition BibRef

Wang, H.[Hang], Du, Y.[Youtian], Zhang, Y.[Yabin], Li, S.[Shuai], Zhang, L.[Lei],
One-Stage Visual Relationship Referring With Transformers and Adaptive Message Passing,
IP(32), 2023, pp. 190-202.
IEEE DOI 2301
Visualization, Proposals, Transformers, Task analysis, Detectors, Message passing, Predictive models, gated message passing BibRef

Kiya, H.[Hitoshi], Iijima, R.[Ryota], Maungmaung, A.[Aprilpyone], Kinoshita, Y.[Yuma],

Image and Model Transformation with Secret Key for Vision Transformer,
IEICE(E106-D), No. 1, January 2023, pp. 2-11.
WWW Link. 2301
BibRef

Zhang, H.F.[Hao-Fei], Mao, F.[Feng], Xue, M.Q.[Meng-Qi], Fang, G.F.[Gong-Fan], Feng, Z.L.[Zun-Lei], Song, J.[Jie], Song, M.L.[Ming-Li],
Knowledge Amalgamation for Object Detection With Transformers,
IP(32), 2023, pp. 2093-2106.
IEEE DOI 2304
Transformers, Task analysis, Object detection, Detectors, Training, Feature extraction, Model reusing, vision transformers BibRef

Li, Y.[Ying], Chen, K.[Kehan], Sun, S.L.[Shi-Lei], He, C.[Chu],
Multi-scale homography estimation based on dual feature aggregation transformer,
IET-IPR(17), No. 5, 2023, pp. 1403-1416.
DOI Link 2304
image matching, image registration BibRef

Wang, G.Q.[Guan-Qun], Chen, H.[He], Chen, L.[Liang], Zhuang, Y.[Yin], Zhang, S.H.[Shang-Hang], Zhang, T.[Tong], Dong, H.[Hao], Gao, P.[Peng],
P2FEViT: Plug-and-Play CNN Feature Embedded Hybrid Vision Transformer for Remote Sensing Image Classification,
RS(15), No. 7, 2023, pp. 1773.
DOI Link 2304
BibRef

Zhang, Q.M.[Qi-Ming], Xu, Y.F.[Yu-Fei], Zhang, J.[Jing], Tao, D.C.[Da-Cheng],
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond,
IJCV(131), No. 5, May 2023, pp. 1141-1162.
Springer DOI 2305
BibRef

Zhang, J.N.[Jiang-Ning], Li, X.T.[Xiang-Tai], Wang, Y.B.[Ya-Biao], Wang, C.J.[Cheng-Jie], Yang, Y.B.[Yi-Bo], Liu, Y.[Yong], Tao, D.C.[Da-Cheng],
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm,
IJCV(132), No. 1, January 2024, pp. 3509-3536.
Springer DOI 2409
BibRef

Fan, X.Y.[Xin-Yi], Liu, H.J.[Hua-Jun],
FlexFormer: Flexible Transformer for efficient visual recognition,
PRL(169), 2023, pp. 95-101.
Elsevier DOI 2305
Vision transformer, Frequency analysis, Image classification BibRef

Cho, S.[Seokju], Hong, S.[Sunghwan], Kim, S.[Seungryong],
CATs++: Boosting Cost Aggregation With Convolutions and Transformers,
PAMI(45), No. 6, June 2023, pp. 7174-7194.
IEEE DOI
WWW Link. 2305
Costs, Transformers, Correlation, Semantics, Feature extraction, Task analysis, Cost aggregation, efficient transformer, semantic visual correspondence BibRef

Wang, Z.W.[Zi-Wei], Wang, C.Y.[Chang-Yuan], Xu, X.W.[Xiu-Wei], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
Quantformer: Learning Extremely Low-Precision Vision Transformers,
PAMI(45), No. 7, July 2023, pp. 8813-8826.
IEEE DOI 2306
Quantization (signal), Transformers, Computational modeling, Search problems, Object detection, Image color analysis, vision transformers BibRef
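As a rough illustration of the low-precision representations that quantization work like the entry above targets, a symmetric uniform quantizer can be written in a few lines (bit-width and helper name are illustrative; real post-training quantization of ViTs is considerably more involved):

```python
import numpy as np

def quantize(w: np.ndarray, bits: int = 4):
    """Symmetric uniform quantization: w is approximated by scale * q,
    with q a signed integer of the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

w = np.random.default_rng(1).normal(size=(64, 64)).astype(np.float32)
q, s = quantize(w, bits=4)
err = np.abs(w - q * s).max()        # worst-case reconstruction error
print(q.dtype, float(err) < float(s))   # int8 True
```

The rounding error stays below one quantization step; the research challenge is keeping task accuracy when attention and MLP weights are all stored this way.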

Yue, X.Y.[Xiao-Yu], Sun, S.Y.[Shu-Yang], Kuang, Z.H.[Zhang-Hui], Wei, M.[Meng], Torr, P.H.S.[Philip H.S.], Zhang, W.[Wayne], Lin, D.[Dahua],
Vision Transformer with Progressive Sampling,
ICCV21(377-386)
IEEE DOI 2203
Codes, Computational modeling, Interference, Transformers, Feature extraction, Recognition and classification, Representation learning BibRef

Peng, Z.L.[Zhi-Liang], Guo, Z.H.[Zong-Hao], Huang, W.[Wei], Wang, Y.W.[Yao-Wei], Xie, L.X.[Ling-Xi], Jiao, J.B.[Jian-Bin], Tian, Q.[Qi], Ye, Q.X.[Qi-Xiang],
Conformer: Local Features Coupling Global Representations for Recognition and Detection,
PAMI(45), No. 8, August 2023, pp. 9454-9468.
IEEE DOI 2307
Transformers, Feature extraction, Couplings, Visualization, Detectors, Convolution, Object detection, Feature fusion, vision transformer BibRef

Peng, Z.L.[Zhi-Liang], Huang, W.[Wei], Gu, S.Z.[Shan-Zhi], Xie, L.X.[Ling-Xi], Wang, Y.[Yaowei], Jiao, J.B.[Jian-Bin], Ye, Q.X.[Qi-Xiang],
Conformer: Local Features Coupling Global Representations for Visual Recognition,
ICCV21(357-366)
IEEE DOI 2203
Couplings, Representation learning, Visualization, Fuses, Convolution, Object detection, Transformers, Representation learning BibRef

Feng, Z.Z.[Zhan-Zhou], Zhang, S.L.[Shi-Liang],
Efficient Vision Transformer via Token Merger,
IP(32), 2023, pp. 4156-4169.
IEEE DOI 2307
Corporate acquisitions, Transformers, Semantics, Task analysis, Visualization, Merging, Computational efficiency, sparse representation BibRef
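Token-merging approaches such as the entry above shrink the token sequence as it flows through the network, trading a little representational detail for compute. A naive greedy variant (repeatedly averaging the most cosine-similar token pair; illustrative only, not the paper's algorithm):

```python
import numpy as np

def merge_most_similar(tokens: np.ndarray, n_merge: int) -> np.ndarray:
    """Greedily average the most cosine-similar token pair, n_merge times."""
    t = tokens.copy()
    for _ in range(n_merge):
        x = t / np.linalg.norm(t, axis=1, keepdims=True)  # unit-normalize
        sim = x @ x.T                                     # cosine similarity
        np.fill_diagonal(sim, -np.inf)                    # ignore self-pairs
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        merged = (t[i] + t[j]) / 2                        # average the pair
        t = np.delete(t, [i, j], axis=0)
        t = np.vstack([t, merged])
    return t

tokens = np.random.default_rng(0).normal(size=(16, 8))
print(merge_most_similar(tokens, 4).shape)   # (12, 8)
```

Each merge removes one token, so attention cost in later blocks drops quadratically with the number of merges.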

Huang, X.Y.[Xin-Yan], Liu, F.[Fang], Cui, Y.H.[Yuan-Hao], Chen, P.[Puhua], Li, L.L.[Ling-Ling], Li, P.F.[Peng-Fang],
Faster and Better: A Lightweight Transformer Network for Remote Sensing Scene Classification,
RS(15), No. 14, 2023, pp. 3645.
DOI Link 2307
BibRef

Zhao, J.X.[Jia-Xuan], Jiao, L.C.[Li-Cheng], Wang, C.[Chao], Liu, X.[Xu], Liu, F.[Fang], Li, L.L.[Ling-Ling], Ma, M.[Mengru], Yang, S.Y.[Shu-Yuan],
Knowledge Guided Evolutionary Transformer for Remote Sensing Scene Classification,
CirSysVideo(34), No. 10, October 2024, pp. 10368-10384.
IEEE DOI 2411
Transformers, Convolutional neural networks, Computer architecture, Scene classification, Feature extraction, graph neural networks BibRef

Zhang, D.[Dan], Ma, W.P.[Wen-Ping], Jiao, L.C.[Li-Cheng], Liu, X.[Xu], Yang, Y.T.[Yu-Ting], Liu, F.[Fang],
Multiple Hierarchical Cross-Scale Transformer for Remote Sensing Scene Classification,
RS(17), No. 1, 2025, pp. 42.
DOI Link 2501
BibRef

Yao, T.[Ting], Li, Y.[Yehao], Pan, Y.W.[Ying-Wei], Wang, Y.[Yu], Zhang, X.P.[Xiao-Ping], Mei, T.[Tao],
Dual Vision Transformer,
PAMI(45), No. 9, September 2023, pp. 10870-10882.
IEEE DOI 2309
Survey, Vision Transformer. BibRef

Rao, Y.M.[Yong-Ming], Liu, Z.[Zuyan], Zhao, W.L.[Wen-Liang], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks,
PAMI(45), No. 9, September 2023, pp. 10883-10897.
IEEE DOI 2309
BibRef

Li, J.[Jie], Liu, Z.[Zhao], Li, L.[Li], Lin, J.Q.[Jun-Qin], Yao, J.[Jian], Tu, J.[Jingmin],
Multi-view convolutional vision transformer for 3D object recognition,
JVCIR(95), 2023, pp. 103906.
Elsevier DOI 2309
Multi-view, 3D object recognition, Feature fusion, Convolutional neural networks BibRef

Shang, J.H.[Jing-Huan], Li, X.[Xiang], Kahatapitiya, K.[Kumara], Lee, Y.C.[Yu-Cheol], Ryoo, M.S.[Michael S.],
StARformer: Transformer With State-Action-Reward Representations for Robot Learning,
PAMI(45), No. 11, November 2023, pp. 12862-12877.
IEEE DOI 2310
BibRef
Earlier: A1, A3, A2, A5, Only:
StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning,
ECCV22(XXIX:462-479).
Springer DOI 2211
BibRef

Duan, H.R.[Hao-Ran], Long, Y.[Yang], Wang, S.D.[Shi-Dong], Zhang, H.F.[Hao-Feng], Willcocks, C.G.[Chris G.], Shao, L.[Ling],
Dynamic Unary Convolution in Transformers,
PAMI(45), No. 11, November 2023, pp. 12747-12759.
IEEE DOI 2310
BibRef

Qian, S.J.[Sheng-Ju], Zhu, Y.[Yi], Li, W.B.[Wen-Bo], Li, M.[Mu], Jia, J.Y.[Jia-Ya],
What Makes for Good Tokenizers in Vision Transformer?,
PAMI(45), No. 11, November 2023, pp. 13011-13023.
IEEE DOI 2310
BibRef

Sun, W.X.[Wei-Xuan], Qin, Z.[Zhen], Deng, H.[Hui], Wang, J.[Jianyuan], Zhang, Y.[Yi], Zhang, K.[Kaihao], Barnes, N.[Nick], Birchfield, S.[Stan], Kong, L.P.[Ling-Peng], Zhong, Y.[Yiran],
Vicinity Vision Transformer,
PAMI(45), No. 10, October 2023, pp. 12635-12649.
IEEE DOI 2310
BibRef

Cao, C.J.[Chen-Jie], Dong, Q.L.[Qiao-Le], Fu, Y.W.[Yan-Wei],
ZITS++: Image Inpainting by Improving the Incremental Transformer on Structural Priors,
PAMI(45), No. 10, October 2023, pp. 12667-12684.
IEEE DOI 2310
BibRef

Fang, Y.X.[Yu-Xin], Wang, X.G.[Xing-Gang], Wu, R.[Rui], Liu, W.Y.[Wen-Yu],
What Makes for Hierarchical Vision Transformer?,
PAMI(45), No. 10, October 2023, pp. 12714-12720.
IEEE DOI 2310
BibRef

Liu, J.[Jun], Guo, H.R.[Hao-Ran], He, Y.[Yile], Li, H.L.[Hua-Li],
Vision Transformer-Based Ensemble Learning for Hyperspectral Image Classification,
RS(15), No. 21, 2023, pp. 5208.
DOI Link 2311
BibRef

Lin, M.B.[Ming-Bao], Chen, M.Z.[Meng-Zhao], Zhang, Y.X.[Yu-Xin], Shen, C.H.[Chun-Hua], Ji, R.R.[Rong-Rong], Cao, L.J.[Liu-Juan],
Super Vision Transformer,
IJCV(131), No. 12, December 2023, pp. 3136-3151.
Springer DOI 2311
BibRef

Li, Z.Y.[Zhong-Yu], Gao, S.H.[Shang-Hua], Cheng, M.M.[Ming-Ming],
SERE: Exploring Feature Self-Relation for Self-Supervised Transformer,
PAMI(45), No. 12, December 2023, pp. 15619-15631.
IEEE DOI 2311
BibRef

Yuan, Y.H.[Yu-Hui], Liang, W.C.[Wei-Cong], Ding, H.H.[Heng-Hui], Liang, Z.H.[Zhan-Hao], Zhang, C.[Chao], Hu, H.[Han],
Expediting Large-Scale Vision Transformer for Dense Prediction Without Fine-Tuning,
PAMI(46), No. 1, January 2024, pp. 250-266.
IEEE DOI 2312
BibRef

Jiao, J.[Jiayu], Tang, Y.M.[Yu-Ming], Lin, K.Y.[Kun-Yu], Gao, Y.P.[Yi-Peng], Ma, A.J.[Andy J.], Wang, Y.W.[Yao-Wei], Zheng, W.S.[Wei-Shi],
DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition,
MultMed(25), 2023, pp. 8906-8919.
IEEE DOI Code:
HTML Version. 2312
BibRef

Fu, K.[Kexue], Yuan, M.Z.[Ming-Zhi], Liu, S.L.[Shao-Lei], Wang, M.[Manning],
Boosting Point-BERT by Multi-Choice Tokens,
CirSysVideo(34), No. 1, January 2024, pp. 438-447.
IEEE DOI 2401
self-supervised pre-training task.
See also Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling. BibRef

Ghosal, S.S.[Soumya Suvra], Li, Y.X.[Yi-Xuan],
Are Vision Transformers Robust to Spurious Correlations?,
IJCV(132), No. 3, March 2024, pp. 689-709.
Springer DOI 2402
BibRef

Yan, F.Y.[Fang-Yuan], Yan, B.[Bin], Liang, W.[Wei], Pei, M.T.[Ming-Tao],
Token labeling-guided multi-scale medical image classification,
PRL(178), 2024, pp. 28-34.
Elsevier DOI 2402
Medical image classification, Vision transformer, Token labeling BibRef

Li, Y.X.[Yue-Xiang], Huang, Y.W.[Ya-Wen], He, N.[Nanjun], Ma, K.[Kai], Zheng, Y.F.[Ye-Feng],
Improving vision transformer for medical image classification via token-wise perturbation,
JVCIR(98), 2024, pp. 104022.
Elsevier DOI 2402
Self-supervised learning, Vision transformer, Image classification BibRef

Nguyen, H.[Hung], Kim, C.[Chanho], Li, F.[Fuxin],
Space-time recurrent memory network,
CVIU(241), 2024, pp. 103943.
Elsevier DOI 2403
Deep learning architectures and techniques, Segmentation, Memory network, Transformer BibRef

Kheldouni, A.[Amine], Boumhidi, J.[Jaouad],
A Study of Bidirectional Encoder Representations from Transformers for Sequential Recommendations,
ISCV22(1-5)
IEEE DOI 2208
Knowledge engineering, Recurrent neural networks, Predictive models, Markov processes BibRef

Xiao, Q.[Qiao], Zhang, Y.[Yu], Yang, Q.[Qiang],
Selective Random Walk for Transfer Learning in Heterogeneous Label Spaces,
PAMI(46), No. 6, June 2024, pp. 4476-4488.
IEEE DOI 2405
Transfer learning, Bridges, Metalearning, Adaptation models, Training, Task analysis, Transfer learning, selective random walk BibRef

Akkaya, I.B.[Ibrahim Batuhan], Kathiresan, S.S.[Senthilkumar S.], Arani, E.[Elahe], Zonooz, B.[Bahram],
Enhancing performance of vision transformers on small datasets through local inductive bias incorporation,
PR(153), 2024, pp. 110510.
Elsevier DOI Code:
WWW Link. 2405
Vision transformer, Inductive bias, Locality, Small dataset BibRef

Yao, T.[Ting], Li, Y.[Yehao], Pan, Y.W.[Ying-Wei], Mei, T.[Tao],
HIRI-ViT: Scaling Vision Transformer With High Resolution Inputs,
PAMI(46), No. 9, September 2024, pp. 6431-6442.
IEEE DOI 2408
Transformers, Convolution, Convolutional neural networks, Computational efficiency, Spatial resolution, Visualization, vision transformer BibRef

Xu, G.Y.[Guang-Yi], Ye, J.Y.[Jun-Yong], Liu, X.Y.[Xin-Yuan], Wen, X.B.[Xu-Bin], Li, Y.[Youwei], Wang, J.J.[Jing-Jing],
LV-Adapter: Adapting Vision Transformers for Visual Classification with Linear-layers and Vectors,
CVIU(246), 2024, pp. 104049.
Elsevier DOI 2408
Deep learning, Vision Transformers, Fine-tuning, Plug and play, Transfer learning BibRef

Yan, L.Q.[Long-Quan], Yan, R.X.[Rui-Xiang], Chai, B.[Bosong], Geng, G.H.[Guo-Hua], Zhou, P.[Pengbo], Gao, J.[Jian],
DM-GAN: CNN hybrid vits for training GANs under limited data,
PR(156), 2024, pp. 110810.
Elsevier DOI 2408
GAN, Few-shot, Vision transformer, Proprietary artifact image BibRef

Feng, Q.H.[Qi-Hua], Li, P.Y.[Pei-Ya], Lu, Z.X.[Zhi-Xun], Li, C.Z.[Chao-Zhuo], Wang, Z.[Zefan], Liu, Z.Q.[Zhi-Quan], Duan, C.H.[Chun-Hui], Huang, F.[Feiran], Weng, J.[Jian], Yu, P.S.[Philip S.],
EViT: Privacy-Preserving Image Retrieval via Encrypted Vision Transformer in Cloud Computing,
CirSysVideo(34), No. 8, August 2024, pp. 7467-7483.
IEEE DOI Code:
WWW Link. 2408
Feature extraction, Encryption, Codes, Cloud computing, Transform coding, Streaming media, Ciphers, Image retrieval, self-supervised learning BibRef

Wang, H.Y.[Hong-Yu], Ma, S.M.[Shu-Ming], Dong, L.[Li], Huang, S.[Shaohan], Zhang, D.D.[Dong-Dong], Wei, F.[Furu],
DeepNet: Scaling Transformers to 1,000 Layers,
PAMI(46), No. 10, October 2024, pp. 6761-6774.
IEEE DOI 2409
Transformers, Training, Optimization, Stability analysis, Machine translation, Decoding, Computational modeling, Big models, transformers BibRef

Papa, L.[Lorenzo], Russo, P.[Paolo], Amerini, I.[Irene], Zhou, L.P.[Lu-Ping],
A Survey on Efficient Vision Transformers: Algorithms, Techniques, and Performance Benchmarking,
PAMI(46), No. 12, December 2024, pp. 7682-7700.
IEEE DOI 2411
Survey, Vision Transformers. Transformers, Task analysis, Computational modeling, Surveys, Feature extraction, Costs, vision transformer BibRef

Hu, S.C.[Sheng-Chao], Shen, L.[Li], Zhang, Y.[Ya], Chen, Y.X.[Yi-Xin], Tao, D.C.[Da-Cheng],
On Transforming Reinforcement Learning With Transformers: The Development Trajectory,
PAMI(46), No. 12, December 2024, pp. 8580-8599.
IEEE DOI 2411
Transformers, Analytical models, Task analysis, Surveys, Trajectory optimization, Literature survey BibRef

Xu, R.S.[Run-Sheng], Chen, C.J.[Chia-Ju], Tu, Z.Z.[Zheng-Zhong], Yang, M.H.[Ming-Hsuan],
V2X-ViTv2: Improved Vision Transformers for Vehicle-to-Everything Cooperative Perception,
PAMI(47), No. 1, January 2025, pp. 650-662.
IEEE DOI 2412
Vehicle-to-everything, Feature extraction, Transformers, Visualization, Metadata, Location awareness, Laser radar, Robustness, vehicle-to-everything (V2X) BibRef

Xu, R.S.[Run-Sheng], Xiang, H.[Hao], Tu, Z.Z.[Zheng-Zhong], Xia, X.[Xin], Yang, M.H.[Ming-Hsuan], Ma, J.Q.[Jia-Qi],
V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer,
ECCV22(XXIX:107-124).
Springer DOI 2211
BibRef

Xiang, H.[Hao], Zheng, Z.L.[Zhao-Liang], Xia, X.[Xin], Xu, R.S.[Run-Sheng], Gao, L.[Letian], Zhou, Z.W.[Ze-Wei], Han, X.[Xu], Ji, X.[Xinkai], Li, M.X.[Ming-Xi], Meng, Z.L.[Zong-Lin], Jin, L.[Li], Lei, M.Y.[Ming-Yue], Ma, Z.Y.[Zhao-Yang], He, Z.H.[Zi-Hang], Ma, H.X.[Hao-Xuan], Yuan, Y.S.[Yun-Shuang], Zhao, Y.Q.[Ying-Qian], Ma, J.Q.[Jia-Qi],
V2X-Real: A Large-scale Dataset for Vehicle-to-everything Cooperative Perception,
ECCV24(LII: 455-470).
Springer DOI 2412
BibRef

Xiang, H.[Hao], Xu, R.S.[Run-Sheng], Ma, J.Q.[Jia-Qi],
HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative Perception with Vision Transformer,
ICCV23(284-295)
IEEE DOI Code:
WWW Link. 2401
BibRef

Ma, X.[Xiao], Zhang, Z.[Zetian], Yu, R.[Rong], Ji, Z.[Zexuan], Li, M.C.[Ming-Chao], Zhang, Y.H.[Yu-Han], Chen, Q.[Qiang],
SAVE: Encoding spatial interactions for vision transformers,
IVC(152), 2024, pp. 105312.
Elsevier DOI Code:
WWW Link. 2412
Vision transformers, Position encoding, Spatial interactions BibRef

Chen, P.Q.[Pei-Qi], Yu, L.[Lei], Wan, Y.[Yi], Zhang, Y.J.[Yong-Jun], Wang, J.[Jian], Zhong, L.[Liheng], Chen, J.D.[Jing-Dong], Yang, M.[Ming],
Ecomatcher: Efficient Clustering Oriented Matcher for Detector-free Image Matching,
ECCV24(LXVIII: 344-360).
Springer DOI 2412
BibRef

Wang, H.Q.[Hao-Qi], Zhang, T.[Tong], Salzmann, M.[Mathieu],
SINDER: Repairing the Singular Defects of DINOv2,
ECCV24(VII: 20-35).
Springer DOI 2412
Code:
WWW Link. BibRef

Suri, S.[Saksham], Walmer, M.[Matthew], Gupta, K.[Kamal], Shrivastava, A.[Abhinav],
LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors,
ECCV24(VII: 110-128).
Springer DOI 2412
BibRef

Pan, Z.Z.[Zi-Zheng], Liu, J.[Jing], He, H.Y.[Hao-Yu], Cai, J.F.[Jian-Fei], Zhuang, B.[Bohan],
Stitched ViTs are Flexible Vision Backbones,
ECCV24(XLI: 258-274).
Springer DOI 2412
BibRef

Kim, D.H.[Dong-Hyun], Heo, B.[Byeongho], Han, D.Y.[Dong-Yoon],
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs,
ECCV24(III: 395-415).
Springer DOI 2412
BibRef

Zhang, C.[Chi], Cheng, J.[Jingpu], Li, Q.X.[Qian-Xiao],
An Optimal Control View of LoRA and Binary Controller Design for Vision Transformers,
ECCV24(LIII: 144-160).
Springer DOI 2412
BibRef

Koner, R.[Rajat], Jain, G.[Gagan], Jain, P.[Prateek], Tresp, V.[Volker], Paul, S.[Sujoy],
LookupViT: Compressing Visual Information to a Limited Number of Tokens,
ECCV24(LXXXVI: 322-337).
Springer DOI 2412
BibRef

Zhang, T.[Taolin], Bai, J.[Jiawang], Lu, Z.[Zhihe], Lian, D.Z.[Dong-Ze], Wang, G.[Genping], Wang, X.C.[Xin-Chao], Xia, S.T.[Shu-Tao],
Parameter-efficient and Memory-efficient Tuning for Vision Transformer: A Disentangled Approach,
ECCV24(XLV: 346-363).
Springer DOI 2412
BibRef

Wang, H.Y.[Hai-Yang], Tang, H.[Hao], Jiang, L.[Li], Shi, S.S.[Shao-Shuai], Naeem, M.F.[Muhammad Ferjad], Li, H.S.[Hong-Sheng], Schiele, B.[Bernt], Wang, L.W.[Li-Wei],
GiT: Towards Generalist Vision Transformer Through Universal Language Interface,
ECCV24(XXIX: 55-73).
Springer DOI 2412
BibRef

Wu, Z.G.Y.[Zhu-Guan-Yu], Chen, J.X.[Jia-Xin], Zhong, H.[Hanwen], Huang, D.[Di], Wang, Y.H.[Yun-Hong],
AdaLog: Post-training Quantization for Vision Transformers with Adaptive Logarithm Quantizer,
ECCV24(XXVII: 411-427).
Springer DOI 2412
BibRef

Jie, S.[Shibo], Tang, Y.[Yehui], Guo, J.[Jianyuan], Deng, Z.H.[Zhi-Hong], Han, K.[Kai], Wang, Y.H.[Yun-He],
Token Compensator: Altering Inference Cost of Vision Transformer Without Re-tuning,
ECCV24(XVI: 76-94).
Springer DOI 2412
BibRef

Xiao, H.[Han], Zheng, W.Z.[Wen-Zhao], Zuo, S.C.[Si-Cheng], Gao, P.[Peng], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
Spatialformer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding,
ECCV24(XIII: 37-54).
Springer DOI 2412
BibRef

Heo, B.[Byeongho], Park, S.[Song], Han, D.Y.[Dong-Yoon], Yun, S.[Sangdoo],
Rotary Position Embedding for Vision Transformer,
ECCV24(X: 289-305).
Springer DOI 2412
BibRef

Kondo, R.[Ryota], Minoura, H.[Hiroaki], Hirakawa, T.[Tsubasa], Yamashita, T.[Takayoshi], Fujiyoshi, H.[Hironobu],
Binary-Decomposed Vision Transformer: Compressing and Accelerating Vision Transformer by Binary Decomposition,
ICIP24(3600-3605)
IEEE DOI 2411
Visualization, Image coding, Quantization (signal), Accuracy, Computational modeling, Object detection, Binary Decomposition, Vision Transformer BibRef

Bellitto, G.[Giovanni], Sortino, R.[Renato], Spadaro, P.[Paolo], Palazzo, S.[Simone], Salanitri, F.P.[Federica Proietto], Fiameni, G.[Giuseppe], Gavves, E.[Efstratios], Spampinato, C.[Concetto],
Vito: Vision Transformer Optimization Via Knowledge Distillation On Decoders,
ICIP24(493-499)
IEEE DOI 2411
Visualization, Correlation, Predictive models, Benchmark testing, Transformers, Robustness, Inductive bias, Autoregression, Sequence models BibRef

Gani, H.[Hanan], Saadi, N.[Nada], Hussein, N.[Noor], Nandakumar, K.[Karthik],
Multi-Attribute Vision Transformers are Efficient and Robust Learners,
ICIP24(766-772)
IEEE DOI 2411
Training, Transformers, Robustness, Convolutional neural networks, Task analysis, Vision Transformers, Multi-attribute learning, adversarial attacks BibRef

Huang, W.X.[Wen-Xuan], Shen, Y.[Yunhang], Xie, J.[Jiao], Zhang, B.C.[Bao-Chang], He, G.[Gaoqi], Li, K.[Ke], Sun, X.[Xing], Lin, S.H.[Shao-Hui],
A General and Efficient Training for Transformer via Token Expansion,
CVPR24(15783-15792)
IEEE DOI Code:
WWW Link. 2410
Training, Accuracy, Costs, Codes, Pipelines, Computer architecture BibRef

Cho, J.H.[Jang Hyun], Krähenbühl, P.[Philipp],
Language-Conditioned Detection Transformer,
CVPR24(16593-16603)
IEEE DOI Code:
WWW Link. 2410
Training, Codes, Computational modeling, Detectors, Computer architecture, Benchmark testing, Self-training BibRef

Lin, S.[Sihao], Lyu, P.[Pumeng], Liu, D.[Dongrui], Tang, T.[Tao], Liang, X.D.[Xiao-Dan], Song, A.[Andy], Chang, X.J.[Xiao-Jun],
MLP Can Be a Good Transformer Learner,
CVPR24(19489-19498)
IEEE DOI 2410
Computational modeling, Memory management, Redundancy, Transformers, Throughput, Particle measurements, Efficient Inference BibRef

Wang, A.[Ao], Chen, H.[Hui], Lin, Z.J.[Zi-Jia], Han, J.G.[Jun-Gong], Ding, G.G.[Gui-Guang],
RepViT: Revisiting Mobile CNN From ViT Perspective,
CVPR24(15909-15920)
IEEE DOI Code:
WWW Link. 2410
Performance evaluation, Codes, Accuracy, Computational modeling, Transformers, Mobile handsets, CNN, ViT BibRef

Weng, H.H.[Hao-Han], Huang, D.[Danqing], Qiao, Y.[Yu], Hu, Z.[Zheng], Lin, C.Y.[Chin-Yew], Zhang, T.[Tong], Chen, C.L.P.[C. L. Philip],
Desigen: A Pipeline for Controllable Design Template Generation,
CVPR24(12721-12732)
IEEE DOI Code:
WWW Link. 2410
Visualization, Pipelines, Layout, Process control, Transformers, design generation, layout generation BibRef

Park, S.[Sungho], Byun, H.R.[Hye-Ran],
Fair-VPT: Fair Visual Prompt Tuning for Image Classification,
CVPR24(12268-12278)
IEEE DOI 2410
Visualization, Contrastive learning, Benchmark testing, Transformers, Linear programming, Decorrelation, FAI, Fairness, Large Vision Model BibRef

Xu, H.Y.[Heng-Yuan], Xiang, L.[Liyao], Ye, H.Y.[Hang-Yu], Yao, D.[Dixi], Chu, P.Z.[Peng-Zhi], Li, B.C.[Bao-Chun],
Permutation Equivariance of Transformers and its Applications,
CVPR24(5987-5996)
IEEE DOI Code:
WWW Link. 2410
Backpropagation, Authorization, Deep learning, Adaptation models, Codes, Computational modeling, Permutation equivariance, Privacy-preserving BibRef

Zhang, Y.Y.[Yi-Yuan], Ding, X.H.[Xiao-Han], Gong, K.X.[Kai-Xiong], Ge, Y.X.[Yi-Xiao], Shan, Y.[Ying], Yue, X.Y.[Xiang-Yu],
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities,
CVPR24(6108-6117)
IEEE DOI Code:
WWW Link. 2410
Point cloud compression, Image recognition, Head, Costs, Codes, Computational modeling, Multimodal Pathway, Network Achitecture BibRef

Kobayashi, T.[Takumi],
Mean-Shift Feature Transformer,
CVPR24(6047-6056)
IEEE DOI Code:
WWW Link. 2410
Analytical models, Costs, Codes, Computational modeling, Transformers, Mean shift, Grouped projection BibRef

Wu, J.[Junyi], Duan, B.[Bin], Kang, W.T.[Wei-Tai], Tang, H.[Hao], Yan, Y.[Yan],
Token Transformation Matters: Towards Faithful Post-Hoc Explanation for Vision Transformer,
CVPR24(10926-10935)
IEEE DOI 2410
Visualization, Correlation, Computational modeling, Perturbation methods, Predictive models, Length measurement, Explainability BibRef

Yun, S.[Seokju], Ro, Y.[Youngmin],
SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design,
CVPR24(5756-5767)
IEEE DOI 2410
Performance evaluation, Head, Accuracy, Redundancy, Graphics processing units, Object detection, CNNs BibRef

Shi, X.Y.[Xin-Yu], Hao, Z.C.[Ze-Cheng], Yu, Z.F.[Zhao-Fei],
SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks,
CVPR24(5610-5619)
IEEE DOI Code:
WWW Link. 2410
Energy consumption, Accuracy, Codes, Computer architecture, Spiking neural networks, Transformers, Spiking Neural Networks, Vision Transformer BibRef

Ye, H.C.[Han-Cheng], Yu, C.[Chong], Ye, P.[Peng], Xia, R.[Renqiu], Tang, Y.S.[Yan-Song], Lu, J.W.[Ji-Wen], Chen, T.[Tao], Zhang, B.[Bo],
Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression,
CVPR24(5578-5588)
IEEE DOI 2410
Dimensionality reduction, Image coding, Costs, Costing, Graphics processing units, Computer architecture BibRef

Zhang, J.[Junyi], Herrmann, C.[Charles], Hur, J.[Junhwa], Chen, E.[Eric], Jampani, V.[Varun], Sun, D.Q.[De-Qing], Yang, M.H.[Ming-Hsuan],
Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence,
CVPR24(3076-3085)
IEEE DOI Code:
WWW Link. 2410
Geometry, Codes, Animals, Semantics, Pose estimation, Benchmark testing, semantic correspondence, diffusion models, vision transformer BibRef

Huang, N.C.[Ning-Chi], Chang, C.C.[Chi-Chih], Lin, W.C.[Wei-Cheng], Taka, E.[Endri], Marculescu, D.[Diana], Wu, K.C.A.[Kai-Chi-Ang],
ELSA: Exploiting Layer-wise N:M Sparsity for Vision Transformer Acceleration,
ECV24(8006-8015)
IEEE DOI Code:
WWW Link. 2410
Training, Degradation, Accuracy, Transformers, Throughput, Software BibRef

Devulapally, A.[Anusha], Khan, M.F.F.[Md Fahim Faysal], Advani, S.[Siddharth], Narayanan, V.[Vijaykrishnan],
Multi-Modal Fusion of Event and RGB for Monocular Depth Estimation Using a Unified Transformer-based Architecture,
MULA24(2081-2089)
IEEE DOI Code:
WWW Link. 2410
Measurement, Accuracy, Recurrent neural networks, Robot vision systems, Estimation, Computer architecture, Vision Transformer BibRef

Yang, Z.D.[Zhen-Dong], Li, Z.[Zhe], Zeng, A.[Ailing], Li, Z.X.[Ze-Xian], Yuan, C.[Chun], Li, Y.[Yu],
ViTKD: Feature-based Knowledge Distillation for Vision Transformers,
PBDL24(1379-1388)
IEEE DOI Code:
WWW Link. 2410
Knowledge engineering, Computational modeling, MIMICs, Transformers BibRef

Mehri, F.[Faridoun], Fayyaz, M.[Mohsen], Baghshah, M.S.[Mahdieh Soleymani], Pilehvar, M.T.[Mohammad Taher],
SkipPLUS: Skip the First Few Layers to Better Explain Vision Transformers,
FaDE-TCV24(204-215)
IEEE DOI Code:
WWW Link. 2410
Training, Animals, Aggregates, Transformers, xAI, Interpretability, Vision Transformers, White-Box Input Attribution Methods BibRef

Jain, S.[Samyak], Dutta, T.[Tanima],
Towards Understanding and Improving Adversarial Robustness of Vision Transformers,
CVPR24(24736-24745)
IEEE DOI 2410
Training, Measurement, Perturbation methods, Design methodology, Transformers, Robustness, adversarial robustness, Vision Transformers BibRef

Yang, S.[Sheng], Bai, J.[Jiawang], Gao, K.[Kuofeng], Yang, Y.[Yong], Li, Y.M.[Yi-Ming], Xia, S.T.[Shu-Tao],
Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained Vision Transformers,
CVPR24(24431-24441)
IEEE DOI Code:
WWW Link. 2410
Visualization, Codes, Computational modeling, Force, Switches, Predictive models, Vision Transformers, Visual Prompting, Backdoor, Parameter-Efficient Fine Tuning BibRef

Steitz, J.M.O.[Jan-Martin O.], Roth, S.[Stefan],
Adapters Strike Back,
CVPR24(23449-23459)
IEEE DOI Code:
WWW Link. 2410
Adaptation models, Accuracy, Systematics, Computer architecture, Benchmark testing, Transformers, vision transformer, image classification BibRef

Rangwani, H.[Harsh], Mondal, P.[Pradipto], Mishra, M.[Mayank], Asokan, A.R.[Ashish Ramayee], Babu, R.V.[R. Venkatesh],
DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets,
CVPR24(23396-23406)
IEEE DOI Code:
WWW Link. 2410
Training, Head, Tail, Computer architecture, Transformers, Distance measurement, long-tail-learning, vision transformers, vit, distillation BibRef

Liu, J.Y.[Jin-Yang], Teshome, W.[Wondmgezahu], Ghimire, S.[Sandesh], Sznaier, M.[Mario], Camps, O.[Octavia],
Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers,
CVPR24(23009-23018)
IEEE DOI 2410
Visualization, Face recognition, Noise reduction, Video sequences, Predictive models, Fasteners, Solving puzzles, diffusion models, data imputation BibRef

Kim, M.[Manjin], Seo, P.H.[Paul Hongsuck], Schmid, C.[Cordelia], Cho, M.[Minsu],
Learning Correlation Structures for Vision Transformers,
CVPR24(18941-18951)
IEEE DOI 2410
Representation learning, Visualization, Correlation, Aggregates, Layout, Transformers, Vision Transformers, correlation modeling, video classification BibRef

Yang, M.[Min], Gao, H.[Huan], Guo, P.[Ping], Wang, L.M.[Li-Min],
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos,
CVPR24(18570-18579)
IEEE DOI 2410
Adaptation models, Computational modeling, Memory management, Detectors, Transformers, Feature extraction, Vision Transformer BibRef

Shi, D.[Dai],
TransNeXt: Robust Foveal Visual Perception for Vision Transformers,
CVPR24(17773-17783)
IEEE DOI 2410
Degradation, Visualization, Accuracy, Image resolution, Stacking, Transformers, Vision Transformer, Visual Backbone, Perceptual Artifacts BibRef

Agiza, A.[Ahmed], Neseem, M.[Marina], Reda, S.[Sherief],
MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning,
CVPR24(16196-16205)
IEEE DOI Code:
WWW Link. 2410
Training, Deep learning, Adaptation models, Accuracy, Instruments, Computer architecture, multi-task learning, vision transformers, hierarchical transformers BibRef

Dong, W.[Wei], Zhang, X.[Xing], Chen, B.[Bihui], Yan, D.W.[Da-Wei], Lin, Z.J.[Zhi-Jun], Yan, Q.[Qingsen], Wang, P.[Peng], Yang, Y.[Yang],
Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach,
CVPR24(16101-16110)
IEEE DOI Code:
WWW Link. 2410
Adaptation models, Codes, Buildings, Transformers, Matrix decomposition, Low-Rank Adaptation BibRef

Wu, J.[Junyi], Kang, W.T.[Wei-Tai], Tang, H.[Hao], Hong, Y.[Yuan], Yan, Y.[Yan],
On the Faithfulness of Vision Transformer Explanations,
CVPR24(10936-10945)
IEEE DOI 2410
Measurement, Heating systems, Correlation, Aggregates, Predictive models, Benchmark testing, Transformer, Explainability BibRef

Navaneet, K.L., Koohpayegani, S.A.[Soroush Abbasi], Sleiman, E.[Essam], Pirsiavash, H.[Hamed],
SlowFormer: Adversarial Attack on Compute and Energy Consumption of Efficient Vision Transformers,
CVPR24(24786-24797)
IEEE DOI Code:
WWW Link. 2410
Training, Adaptation models, Power demand, Computational modeling, Training data, Transformers, Adversarial attack, efficient vision transformers BibRef

Koyun, O.C.[Onur Can], Töreyin, B.U.[Behçet Ugur],
HaLViT: Half of the Weights are Enough,
LargeVM24(3669-3678)
IEEE DOI 2410
Computational modeling, Deep architecture, Transformers, Convolutional neural networks, Efficient, deep learning BibRef

Bafghi, R.A.[Reza Akbarian], Harilal, N.[Nidhin], Monteleoni, C.[Claire], Raissi, M.[Maziar],
Parameter Efficient Fine-tuning of Self-supervised ViTs without Catastrophic Forgetting,
LargeVM24(3679-3684)
IEEE DOI 2410
BibRef
And: LargeVM24(7864-7869)
IEEE DOI 2410
Knowledge engineering, Adaptation models, Learning (artificial intelligence), Artificial neural networks, Catastrophic Forgetting BibRef

Yuan, X.[Xin], Fei, H.L.[Hong-Liang], Baek, J.[Jinoo],
Efficient Transformer Adaptation with Soft Token Merging,
LargeVM24(3658-3668)
IEEE DOI 2410
Training, Accuracy, Costs, Merging, Video sequences, Optimization methods, Transformers BibRef

Edalati, A.[Ali], Hameed, M.G.A.[Marawan Gamal Abdel], Mosleh, A.[Ali],
Generalized Kronecker-based Adapters for Parameter-efficient Fine-tuning of Vision Transformers,
CRV23(97-104)
IEEE DOI 2406
Adaptation models, Tensors, Limiting, Computational modeling, Transformers, Convolutional neural networks BibRef

Marouf, I.E.[Imad Eddine], Tartaglione, E.[Enzo], Lathuilière, S.[Stéphane],
Mini but Mighty: Finetuning ViTs with Mini Adapters,
WACV24(1721-1730)
IEEE DOI 2404
Training, Costs, Neurons, Transfer learning, Estimation, Computer architecture, Algorithms BibRef

Kim, G.[Gihyun], Kim, J.[Juyeop], Lee, J.S.[Jong-Seok],
Exploring Adversarial Robustness of Vision Transformers in the Spectral Perspective,
WACV24(3964-3973)
IEEE DOI 2404
Deep learning, Perturbation methods, Frequency-domain analysis, Linearity, Transformers, Robustness, High frequency, Algorithms, adversarial attack and defense methods BibRef

Xu, X.[Xuwei], Wang, S.[Sen], Chen, Y.D.[Yu-Dong], Zheng, Y.P.[Yan-Ping], Wei, Z.W.[Zhe-Wei], Liu, J.J.[Jia-Jun],
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation,
WACV24(86-95)
IEEE DOI Code:
WWW Link. 2404
Source coding, Computational modeling, Merging, Broadcasting, Transformers, Computational complexity, Algorithms BibRef

Han, Q.[Qiu], Zhang, G.J.[Gong-Jie], Huang, J.X.[Jia-Xing], Gao, P.[Peng], Wei, Z.[Zhang], Lu, S.J.[Shi-Jian],
Efficient MAE towards Large-Scale Vision Transformers,
WACV24(595-604)
IEEE DOI 2404
Measurement, Degradation, Visualization, Runtime, Computational modeling, Transformers, Algorithms BibRef

Park, J.W.[Jong-Woo], Kahatapitiya, K.[Kumara], Kim, D.H.[Dong-Hyun], Sudalairaj, S.[Shivchander], Fan, Q.F.[Quan-Fu], Ryoo, M.S.[Michael S.],
Grafting Vision Transformers,
WACV24(1134-1143)
IEEE DOI Code:
WWW Link. 2404
Codes, Computational modeling, Semantics, Information sharing, Computer architecture, Transformers, Algorithms, Image recognition and understanding BibRef

Shimizu, S.[Shuki], Tamaki, T.[Toru],
Joint learning of images and videos with a single Vision Transformer,
MVA23(1-6)
DOI Link 2403
Training, Image recognition, Machine vision, Transformers, Tuning, Videos BibRef

Ding, S.R.[Shuang-Rui], Zhao, P.S.[Pei-Sen], Zhang, X.P.[Xiao-Peng], Qian, R.[Rui], Xiong, H.K.[Hong-Kai], Tian, Q.[Qi],
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation,
ICCV23(16899-16910)
IEEE DOI Code:
WWW Link. 2401
BibRef

Chen, M.Z.[Meng-Zhao], Lin, M.[Mingbao], Lin, Z.H.[Zhi-Hang], Zhang, Y.X.[Yu-Xin], Chao, F.[Fei], Ji, R.R.[Rong-Rong],
SMMix: Self-Motivated Image Mixing for Vision Transformers,
ICCV23(17214-17224)
IEEE DOI Code:
WWW Link. 2401
BibRef

Kim, D.[Dahun], Angelova, A.[Anelia], Kuo, W.C.[Wei-Cheng],
Contrastive Feature Masking Open-Vocabulary Vision Transformer,
ICCV23(15556-15566)
IEEE DOI 2401
BibRef

Li, Z.K.[Zhi-Kai], Gu, Q.Y.[Qing-Yi],
I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference,
ICCV23(17019-17029)
IEEE DOI Code:
WWW Link. 2401
BibRef

Frumkin, N.[Natalia], Gope, D.[Dibakar], Marculescu, D.[Diana],
Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers,
ICCV23(16932-16942)
IEEE DOI Code:
WWW Link. 2401
BibRef

Li, Z.K.[Zhi-Kai], Xiao, J.R.[Jun-Rui], Yang, L.W.[Lian-Wei], Gu, Q.Y.[Qing-Yi],
RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers,
ICCV23(17181-17190)
IEEE DOI Code:
WWW Link. 2401
BibRef

Havtorn, J.D.[Jakob Drachmann], Royer, A.[Amélie], Blankevoort, T.[Tijmen], Bejnordi, B.E.[Babak Ehteshami],
MSViT: Dynamic Mixed-scale Tokenization for Vision Transformers,
NIVT23(838-848)
IEEE DOI 2401
BibRef

Haurum, J.B.[Joakim Bruslund], Escalera, S.[Sergio], Taylor, G.W.[Graham W.], Moeslund, T.B.[Thomas B.],
Which Tokens to Use? Investigating Token Reduction in Vision Transformers,
NIVT23(773-783)
IEEE DOI Code:
WWW Link. 2401
BibRef

Wang, X.[Xijun], Chu, X.J.[Xiao-Jie], Han, C.[Chunrui], Zhang, X.Y.[Xiang-Yu],
SCSC: Spatial Cross-scale Convolution Module to Strengthen both CNNs and Transformers,
NIVT23(731-741)
IEEE DOI 2401
BibRef

Chen, Y.H.[Yi-Hsin], Weng, Y.C.[Ying-Chieh], Kao, C.H.[Chia-Hao], Chien, C.[Cheng], Chiu, W.C.[Wei-Chen], Peng, W.H.[Wen-Hsiao],
TransTIC: Transferring Transformer-based Image Compression from Human Perception to Machine Perception,
ICCV23(23240-23250)
IEEE DOI 2401
BibRef

Li, Y.[Yanyu], Hu, J.[Ju], Wen, Y.[Yang], Evangelidis, G.[Georgios], Salahi, K.[Kamyar], Wang, Y.Z.[Yan-Zhi], Tulyakov, S.[Sergey], Ren, J.[Jian],
Rethinking Vision Transformers for MobileNet Size and Speed,
ICCV23(16843-16854)
IEEE DOI 2401
BibRef

Nurgazin, M.[Maxat], Tu, N.A.[Nguyen Anh],
A Comparative Study of Vision Transformer Encoders and Few-shot Learning for Medical Image Classification,
CVAMD23(2505-2513)
IEEE DOI 2401
BibRef

Xie, W.[Wei], Zhao, Z.[Zimeng], Li, S.Y.[Shi-Ying], Zuo, B.H.[Bing-Hui], Wang, Y.G.[Yan-Gang],
Nonrigid Object Contact Estimation With Regional Unwrapping Transformer,
ICCV23(9308-9317)
IEEE DOI 2401
BibRef

Vasu, P.K.A.[Pavan Kumar Anasosalu], Gabriel, J.[James], Zhu, J.[Jeff], Tuzel, O.[Oncel], Ranjan, A.[Anurag],
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization,
ICCV23(5762-5772)
IEEE DOI Code:
WWW Link. 2401
BibRef

Tang, C.[Chen], Zhang, L.L.[Li Lyna], Jiang, H.Q.[Hui-Qiang], Xu, J.H.[Jia-Hang], Cao, T.[Ting], Zhang, Q.[Quanlu], Yang, Y.Q.[Yu-Qing], Wang, Z.[Zhi], Yang, M.[Mao],
ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices,
ICCV23(5806-5817)
IEEE DOI 2401
BibRef

Ren, S.[Sucheng], Yang, X.Y.[Xing-Yi], Liu, S.[Songhua], Wang, X.C.[Xin-Chao],
SG-Former: Self-guided Transformer with Evolving Token Reallocation,
ICCV23(5980-5991)
IEEE DOI Code:
WWW Link. 2401
BibRef

Lin, W.F.[Wei-Feng], Wu, Z.H.[Zi-Heng], Chen, J.[Jiayu], Huang, J.[Jun], Jin, L.W.[Lian-Wen],
Scale-Aware Modulation Meet Transformer,
ICCV23(5992-6003)
IEEE DOI Code:
WWW Link. 2401
BibRef

He, Y.F.[Ye-Fei], Lou, Z.Y.[Zhen-Yu], Zhang, L.[Luoming], Liu, J.[Jing], Wu, W.J.[Wei-Jia], Zhou, H.[Hong], Zhuang, B.[Bohan],
BiViT: Extremely Compressed Binary Vision Transformers,
ICCV23(5628-5640)
IEEE DOI 2401
BibRef

Dutson, M.[Matthew], Li, Y.[Yin], Gupta, M.[Mohit],
Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers,
ICCV23(16865-16877)
IEEE DOI 2401
BibRef

Wang, Z.Q.[Zi-Qing], Fang, Y.T.[Yue-Tong], Cao, J.H.[Jia-Hang], Zhang, Q.[Qiang], Wang, Z.[Zhongrui], Xu, R.[Renjing],
Masked Spiking Transformer,
ICCV23(1761-1771)
IEEE DOI Code:
WWW Link. 2401
BibRef

Peebles, W.[William], Xie, S.[Saining],
Scalable Diffusion Models with Transformers,
ICCV23(4172-4182)
IEEE DOI 2401
BibRef

Mentzer, F.[Fabian], Agustsson, E.[Eirikur], Tschannen, M.[Michael],
M2T: Masking Transformers Twice for Faster Decoding,
ICCV23(5317-5326)
IEEE DOI 2401
BibRef

Xiao, H.[Han], Zheng, W.Z.[Wen-Zhao], Zhu, Z.[Zheng], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
Token-Label Alignment for Vision Transformers,
ICCV23(5472-5481)
IEEE DOI Code:
WWW Link. 2401
BibRef

Yu, R.Y.[Run-Yi], Wang, Z.N.[Zhen-Nan], Wang, Y.H.[Yin-Huai], Li, K.[Kehan], Liu, C.[Chang], Duan, H.[Haoyi], Ji, X.Y.[Xiang-Yang], Chen, J.[Jie],
LaPE: Layer-adaptive Position Embedding for Vision Transformers with Independent Layer Normalization,
ICCV23(5863-5873)
IEEE DOI 2401
BibRef

Roy, A.[Anurag], Verma, V.K.[Vinay K.], Voonna, S.[Sravan], Ghosh, K.[Kripabandhu], Ghosh, S.[Saptarshi], Das, A.[Abir],
Exemplar-Free Continual Transformer with Convolutions,
ICCV23(5874-5884)
IEEE DOI 2401
BibRef

Xu, Y.X.[Yi-Xing], Li, C.[Chao], Li, D.[Dong], Sheng, X.[Xiao], Jiang, F.[Fan], Tian, L.[Lu], Sirasao, A.[Ashish],
FDViT: Improve the Hierarchical Architecture of Vision Transformer,
ICCV23(5927-5937)
IEEE DOI 2401
BibRef

Chen, Y.J.[Yong-Jie], Liu, H.M.[Hong-Min], Yin, H.R.[Hao-Ran], Fan, B.[Bin],
Building Vision Transformers with Hierarchy Aware Feature Aggregation,
ICCV23(5885-5895)
IEEE DOI 2401
BibRef

Quétu, V.[Victor], Milovanovic, M.[Marta], Tartaglione, E.[Enzo],
Sparse Double Descent in Vision Transformers: Real or Phantom Threat?,
CIAP23(II:490-502).
Springer DOI 2312
BibRef

Ak, K.E.[Kenan Emir], Lee, G.G.[Gwang-Gook], Xu, Y.[Yan], Shen, M.W.[Ming-Wei],
Leveraging Efficient Training and Feature Fusion in Transformers for Multimodal Classification,
ICIP23(1420-1424)
IEEE DOI 2312
BibRef

Popovic, N.[Nikola], Paudel, D.P.[Danda Pani], Probst, T.[Thomas], Van Gool, L.J.[Luc J.],
Token-Consistent Dropout For Calibrated Vision Transformers,
ICIP23(1030-1034)
IEEE DOI 2312
BibRef

Sajjadi, M.S.M.[Mehdi S. M.], Mahendran, A.[Aravindh], Kipf, T.[Thomas], Pot, E.[Etienne], Duckworth, D.[Daniel], Lucic, M.[Mario], Greff, K.[Klaus],
RUST: Latent Neural Scene Representations from Unposed Imagery,
CVPR23(17297-17306)
IEEE DOI 2309
BibRef

Bowman, B.[Benjamin], Achille, A.[Alessandro], Zancato, L.[Luca], Trager, M.[Matthew], Perera, P.[Pramuditha], Paolini, G.[Giovanni], Soatto, S.[Stefano],
À-la-carte Prompt Tuning (APT): Combining Distinct Data Via Composable Prompting,
CVPR23(14984-14993)
IEEE DOI 2309
BibRef

Nakhli, R.[Ramin], Moghadam, P.A.[Puria Azadi], Mi, H.Y.[Hao-Yang], Farahani, H.[Hossein], Baras, A.[Alexander], Gilks, B.[Blake], Bashashati, A.[Ali],
Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images,
CVPR23(11547-11557)
IEEE DOI 2309
BibRef

Gärtner, E.[Erik], Metz, L.[Luke], Andriluka, M.[Mykhaylo], Freeman, C.D.[C. Daniel], Sminchisescu, C.[Cristian],
Transformer-Based Learned Optimization,
CVPR23(11970-11979)
IEEE DOI 2309
BibRef

Li, J.C.[Jia-Chen], Hassani, A.[Ali], Walton, S.[Steven], Shi, H.[Humphrey],
ConvMLP: Hierarchical Convolutional MLPs for Vision,
WFM23(6307-6316)
IEEE DOI 2309
multi-layer perceptron BibRef

Walmer, M.[Matthew], Suri, S.[Saksham], Gupta, K.[Kamal], Shrivastava, A.[Abhinav],
Teaching Matters: Investigating the Role of Supervision in Vision Transformers,
CVPR23(7486-7496)
IEEE DOI 2309
BibRef

Wang, S.G.[Shi-Guang], Xie, T.[Tao], Cheng, J.[Jian], Zhang, X.C.[Xing-Cheng], Liu, H.J.[Hai-Jun],
MDL-NAS: A Joint Multi-domain Learning Framework for Vision Transformer,
CVPR23(20094-20104)
IEEE DOI 2309
BibRef

Ren, S.[Sucheng], Wei, F.Y.[Fang-Yun], Zhang, Z.[Zheng], Hu, H.[Han],
TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models,
CVPR23(3687-3697)
IEEE DOI 2309
BibRef

He, J.F.[Jian-Feng], Gao, Y.[Yuan], Zhang, T.Z.[Tian-Zhu], Zhang, Z.[Zhe], Wu, F.[Feng],
D2Former: Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-Based Transformers,
CVPR23(2904-2914)
IEEE DOI 2309
BibRef

Chen, X.Y.[Xuan-Yao], Liu, Z.J.[Zhi-Jian], Tang, H.T.[Hao-Tian], Yi, L.[Li], Zhao, H.[Hang], Han, S.[Song],
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer,
CVPR23(2061-2070)
IEEE DOI 2309
BibRef

Wei, S.Y.[Si-Yuan], Ye, T.Z.[Tian-Zhu], Zhang, S.[Shen], Tang, Y.[Yao], Liang, J.J.[Jia-Jun],
Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers,
CVPR23(2092-2101)
IEEE DOI 2309
BibRef

Lin, Y.B.[Yan-Bo], Bertasius, G.[Gedas],
Siamese Vision Transformers are Scalable Audio-Visual Learners,
ECCV24(XIV: 303-321).
Springer DOI 2412
BibRef

Lin, Y.B.[Yan-Bo], Sung, Y.L.[Yi-Lin], Lei, J.[Jie], Bansal, M.[Mohit], Bertasius, G.[Gedas],
Vision Transformers are Parameter-Efficient Audio-Visual Learners,
CVPR23(2299-2309)
IEEE DOI 2309
BibRef

Das, R.[Rajshekhar], Dukler, Y.[Yonatan], Ravichandran, A.[Avinash], Swaminathan, A.[Ashwin],
Learning Expressive Prompting With Residuals for Vision Transformers,
CVPR23(3366-3377)
IEEE DOI 2309
BibRef

Zheng, M.X.[Meng-Xin], Lou, Q.[Qian], Jiang, L.[Lei],
TrojViT: Trojan Insertion in Vision Transformers,
CVPR23(4025-4034)
IEEE DOI 2309
BibRef

Li, Y.X.[Yan-Xi], Xu, C.[Chang],
Trade-off between Robustness and Accuracy of Vision Transformers,
CVPR23(7558-7568)
IEEE DOI 2309
BibRef

Tarasiou, M.[Michail], Chavez, E.[Erik], Zafeiriou, S.[Stefanos],
ViTs for SITS: Vision Transformers for Satellite Image Time Series,
CVPR23(10418-10428)
IEEE DOI 2309
BibRef

Yu, Z.Z.[Zhong-Zhi], Wu, S.[Shang], Fu, Y.G.[Yong-Gan], Zhang, S.[Shunyao], Lin, Y.Y.C.[Ying-Yan Celine],
Hint-Aug: Drawing Hints from Foundation Vision Transformers towards Boosted Few-shot Parameter-Efficient Tuning,
CVPR23(11102-11112)
IEEE DOI 2309
BibRef

Kim, D.[Dahun], Angelova, A.[Anelia], Kuo, W.C.[Wei-Cheng],
Region-centric Image-Language Pretraining for Open-Vocabulary Detection,
ECCV24(LXIII: 162-179).
Springer DOI 2412
BibRef
Earlier:
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers,
CVPR23(11144-11154)
IEEE DOI 2309
BibRef

Hou, J.[Ji], Dai, X.L.[Xiao-Liang], He, Z.J.[Zi-Jian], Dai, A.[Angela], Nießner, M.[Matthias],
Mask3D: Pretraining 2D Vision Transformers by Learning Masked 3D Priors,
CVPR23(13510-13519)
IEEE DOI 2309
BibRef

Xu, Z.Z.[Zheng-Zhuo], Liu, R.K.[Rui-Kang], Yang, S.[Shuo], Chai, Z.H.[Zeng-Hao], Yuan, C.[Chun],
Learning Imbalanced Data with Vision Transformers,
CVPR23(15793-15803)
IEEE DOI 2309
BibRef

Zhang, J.P.[Jian-Ping], Huang, Y.Z.[Yi-Zhan], Wu, W.B.[Wei-Bin], Lyu, M.R.[Michael R.],
Transferable Adversarial Attacks on Vision Transformers with Token Gradient Regularization,
CVPR23(16415-16424)
IEEE DOI 2309
BibRef

Yang, H.[Huanrui], Yin, H.X.[Hong-Xu], Shen, M.[Maying], Molchanov, P.[Pavlo], Li, H.[Hai], Kautz, J.[Jan],
Global Vision Transformer Pruning with Hessian-Aware Saliency,
CVPR23(18547-18557)
IEEE DOI 2309
BibRef

Nakamura, R.[Ryo], Kataoka, H.[Hirokatsu], Takashima, S.[Sora], Noriega, E.J.M.[Edgar Josafat Martinez], Yokota, R.[Rio], Inoue, N.[Nakamasa],
Pre-training Vision Transformers with Very Limited Synthesized Images,
ICCV23(20303-20312)
IEEE DOI 2401
BibRef

Takashima, S.[Sora], Hayamizu, R.[Ryo], Inoue, N.[Nakamasa], Kataoka, H.[Hirokatsu], Yokota, R.[Rio],
Visual Atoms: Pre-Training Vision Transformers with Sinusoidal Waves,
CVPR23(18579-18588)
IEEE DOI 2309
BibRef

Liu, Y.J.[Yi-Jiang], Yang, H.R.[Huan-Rui], Dong, Z.[Zhen], Keutzer, K.[Kurt], Du, L.[Li], Zhang, S.H.[Shang-Hang],
NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers,
CVPR23(20321-20330)
IEEE DOI 2309
BibRef

Park, J.[Jeongsoo], Johnson, J.[Justin],
RGB No More: Minimally-Decoded JPEG Vision Transformers,
CVPR23(22334-22346)
IEEE DOI 2309
BibRef

Yu, C.[Chong], Chen, T.[Tao], Gan, Z.X.[Zhong-Xue], Fan, J.Y.[Jia-Yuan],
Boost Vision Transformer with GPU-Friendly Sparsity and Quantization,
CVPR23(22658-22668)
IEEE DOI 2309
BibRef

Bao, F.[Fan], Nie, S.[Shen], Xue, K.W.[Kai-Wen], Cao, Y.[Yue], Li, C.X.[Chong-Xuan], Su, H.[Hang], Zhu, J.[Jun],
All are Worth Words: A ViT Backbone for Diffusion Models,
CVPR23(22669-22679)
IEEE DOI 2309
BibRef

Li, B.[Bonan], Hu, Y.[Yinhan], Nie, X.C.[Xue-Cheng], Han, C.Y.[Cong-Ying], Jiang, X.J.[Xiang-Jian], Guo, T.D.[Tian-De], Liu, L.Q.[Luo-Qi],
DropKey for Vision Transformer,
CVPR23(22700-22709)
IEEE DOI 2309
BibRef

Lan, S.Y.[Shi-Yi], Yang, X.T.[Xi-Tong], Yu, Z.D.[Zhi-Ding], Wu, Z.[Zuxuan], Alvarez, J.M.[Jose M.], Anandkumar, A.[Anima],
Vision Transformers are Good Mask Auto-Labelers,
CVPR23(23745-23755)
IEEE DOI 2309
BibRef

Yu, L.[Lu], Xiang, W.[Wei],
X-Pruner: eXplainable Pruning for Vision Transformers,
CVPR23(24355-24363)
IEEE DOI 2309
BibRef

Singh, A.[Apoorv],
Training Strategies for Vision Transformers for Object Detection,
WAD23(110-118)
IEEE DOI 2309
BibRef

Hukkelås, H.[Håkon], Lindseth, F.[Frank],
Does Image Anonymization Impact Computer Vision Training?,
WAD23(140-150)
IEEE DOI 2309
BibRef

Marnissi, M.A.[Mohamed Amine],
Revolutionizing Thermal Imaging: GAN-Based Vision Transformers for Image Enhancement,
ICIP23(2735-2739)
IEEE DOI 2312
BibRef

Marnissi, M.A.[Mohamed Amine], Fathallah, A.[Abir],
GAN-based Vision Transformer for High-Quality Thermal Image Enhancement,
GCV23(817-825)
IEEE DOI 2309
BibRef

Scheibenreif, L.[Linus], Mommert, M.[Michael], Borth, D.[Damian],
Masked Vision Transformers for Hyperspectral Image Classification,
EarthVision23(2166-2176)
IEEE DOI 2309
BibRef

Komorowski, P.[Piotr], Baniecki, H.[Hubert], Biecek, P.[Przemyslaw],
Towards Evaluating Explanations of Vision Transformers for Medical Imaging,
XAI4CV23(3726-3732)
IEEE DOI 2309
BibRef

Ronen, T.[Tomer], Levy, O.[Omer], Golbert, A.[Avram],
Vision Transformers with Mixed-Resolution Tokenization,
ECV23(4613-4622)
IEEE DOI 2309
BibRef

Le, P.H.C.[Phuoc-Hoan Charles], Li, X.[Xinlin],
BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models,
ECV23(4665-4674)
IEEE DOI 2309
BibRef

Ma, D.[Dongning], Zhao, P.F.[Peng-Fei], Jiao, X.[Xun],
PerfHD: Efficient ViT Architecture Performance Ranking using Hyperdimensional Computing,
NAS23(2230-2237)
IEEE DOI 2309
BibRef

Wang, J.[Jun], Alamayreh, O.[Omran], Tondi, B.[Benedetta], Barni, M.[Mauro],
Open Set Classification of GAN-based Image Manipulations via a ViT-based Hybrid Architecture,
WMF23(953-962)
IEEE DOI 2309
BibRef

Tian, R.[Rui], Wu, Z.[Zuxuan], Dai, Q.[Qi], Hu, H.[Han], Qiao, Y.[Yu], Jiang, Y.G.[Yu-Gang],
ResFormer: Scaling ViTs with Multi-Resolution Training,
CVPR23(22721-22731)
IEEE DOI 2309
BibRef

Li, Y.[Yi], Min, K.[Kyle], Tripathi, S.[Subarna], Vasconcelos, N.M.[Nuno M.],
SViTT: Temporal Learning of Sparse Video-Text Transformers,
CVPR23(18919-18929)
IEEE DOI 2309
BibRef

Guo, X.D.[Xin-Dong], Sun, Y.[Yu], Zhao, R.[Rong], Kuang, L.Q.[Li-Qun], Han, X.[Xie],
SWPT: Spherical Window-based Point Cloud Transformer,
ACCV22(I:396-412).
Springer DOI 2307
BibRef

Wang, W.J.[Wen-Ju], Chen, G.[Gang], Zhou, H.R.[Hao-Ran], Wang, X.L.[Xiao-Lin],
OVPT: Optimal Viewset Pooling Transformer for 3d Object Recognition,
ACCV22(I:486-503).
Springer DOI 2307
BibRef

Kim, D.[Daeho], Kim, J.[Jaeil],
Vision Transformer Compression and Architecture Exploration with Efficient Embedding Space Search,
ACCV22(III:524-540).
Springer DOI 2307
BibRef

Lee, Y.S.[Yun-Sung], Lee, G.[Gyuseong], Ryoo, K.[Kwangrok], Go, H.[Hyojun], Park, J.[Jihye], Kim, S.[Seungryong],
Towards Flexible Inductive Bias via Progressive Reparameterization Scheduling,
VIPriors22(706-720).
Springer DOI 2304
Transformers vs. CNN different benefits. Best of both. BibRef

Amir, S.[Shir], Gandelsman, Y.[Yossi], Bagon, S.[Shai], Dekel, T.[Tali],
On the Effectiveness of VIT Features as Local Semantic Descriptors,
SelfLearn22(39-55).
Springer DOI 2304
BibRef

Deng, X.[Xuran], Liu, C.B.[Chuan-Bin], Lu, Z.Y.[Zhi-Ying],
Recombining Vision Transformer Architecture for Fine-grained Visual Categorization,
MMMod23(II: 127-138).
Springer DOI 2304
BibRef

Tonkes, V.[Vincent], Sabatelli, M.[Matthia],
How Well Do Vision Transformers (VTs) Transfer to the Non-natural Image Domain? An Empirical Study Involving Art Classification,
VisArt22(234-250).
Springer DOI 2304
BibRef

Rangrej, S.B.[Samrudhdhi B.], Liang, K.J.[Kevin J.], Hassner, T.[Tal], Clark, J.J.[James J.],
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction,
WACV23(3402-3412)
IEEE DOI 2302
Predictive models, Transformers, Cameras, Spatiotemporal phenomena, Sensors, Observability BibRef

Song, C.H.[Chull Hwan], Yoon, J.Y.[Joo-Young], Choi, S.[Shunghyun], Avrithis, Y.[Yannis],
Boosting vision transformers for image retrieval,
WACV23(107-117)
IEEE DOI 2302
Training, Location awareness, Image retrieval, Self-supervised learning, Image representation, Transformers BibRef

Yang, J.[Jinyu], Liu, J.J.[Jing-Jing], Xu, N.[Ning], Huang, J.Z.[Jun-Zhou],
TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation,
WACV23(520-530)
IEEE DOI 2302
Benchmark testing, Image representation, Transformers, Convolutional neural networks, Task analysis, and algorithms (including transfer) BibRef

Saavedra-Ruiz, M.[Miguel], Morin, S.[Sacha], Paull, L.[Liam],
Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers,
CRV22(197-204)
IEEE DOI 2301
Adaptation models, Image segmentation, Image resolution, Navigation, Transformers, Robot sensing systems, Visual Servoing BibRef

Patel, K.[Krushi], Bur, A.M.[Andrés M.], Li, F.J.[Feng-Jun], Wang, G.H.[Guang-Hui],
Aggregating Global Features into Local Vision Transformer,
ICPR22(1141-1147)
IEEE DOI 2212
Source coding, Computational modeling, Information processing, Performance gain, Transformers BibRef

Shen, Z.Q.[Zhi-Qiang], Liu, Z.[Zechun], Xing, E.[Eric],
Sliced Recursive Transformer,
ECCV22(XXIV:727-744).
Springer DOI 2211
BibRef

Shao, Y.[Yidi], Loy, C.C.[Chen Change], Dai, B.[Bo],
Transformer with Implicit Edges for Particle-Based Physics Simulation,
ECCV22(XIX:549-564).
Springer DOI 2211
BibRef

Wang, W.[Wen], Zhang, J.[Jing], Cao, Y.[Yang], Shen, Y.L.[Yong-Liang], Tao, D.C.[Da-Cheng],
Towards Data-Efficient Detection Transformers,
ECCV22(IX:88-105).
Springer DOI 2211
BibRef

Lorenzana, M.B.[Marlon Bran], Engstrom, C.[Craig], Chandra, S.S.[Shekhar S.],
Transformer Compressed Sensing Via Global Image Tokens,
ICIP22(3011-3015)
IEEE DOI 2211
Training, Limiting, Image resolution, Neural networks, Image representation, Transformers, MRI BibRef

Lu, X.Y.[Xiao-Yong], Du, S.[Songlin],
NCTR: Neighborhood Consensus Transformer for Feature Matching,
ICIP22(2726-2730)
IEEE DOI 2211
Learning systems, Impedance matching, Aggregates, Pose estimation, Neural networks, Transformers, Local feature matching, graph neural network BibRef

Jeny, A.A.[Afsana Ahsan], Junayed, M.S.[Masum Shah], Islam, M.B.[Md Baharul],
An Efficient End-To-End Image Compression Transformer,
ICIP22(1786-1790)
IEEE DOI 2211
Image coding, Correlation, Limiting, Computational modeling, Rate-distortion, Video compression, Transformers, entropy model BibRef

Bai, J.W.[Jia-Wang], Yuan, L.[Li], Xia, S.T.[Shu-Tao], Yan, S.C.[Shui-Cheng], Li, Z.F.[Zhi-Feng], Liu, W.[Wei],
Improving Vision Transformers by Revisiting High-Frequency Components,
ECCV22(XXIV:1-18).
Springer DOI 2211
BibRef

Li, K.[Kehan], Yu, R.[Runyi], Wang, Z.[Zhennan], Yuan, L.[Li], Song, G.[Guoli], Chen, J.[Jie],
Locality Guidance for Improving Vision Transformers on Tiny Datasets,
ECCV22(XXIV:110-127).
Springer DOI 2211
BibRef

Tu, Z.Z.[Zheng-Zhong], Talebi, H.[Hossein], Zhang, H.[Han], Yang, F.[Feng], Milanfar, P.[Peyman], Bovik, A.C.[Alan C.], Li, Y.[Yinxiao],
MaxViT: Multi-axis Vision Transformer,
ECCV22(XXIV:459-479).
Springer DOI 2211
BibRef

Yang, R.[Rui], Ma, H.L.[Hai-Long], Wu, J.[Jie], Tang, Y.S.[Yan-Song], Xiao, X.F.[Xue-Feng], Zheng, M.[Min], Li, X.[Xiu],
ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer,
ECCV22(XXIV:480-496).
Springer DOI 2211
BibRef

Touvron, H.[Hugo], Cord, M.[Matthieu], El-Nouby, A.[Alaaeldin], Verbeek, J.[Jakob], Jégou, H.[Hervé],
Three Things Everyone Should Know About Vision Transformers,
ECCV22(XXIV:497-515).
Springer DOI 2211
BibRef

Touvron, H.[Hugo], Cord, M.[Matthieu], Jégou, H.[Hervé],
DeiT III: Revenge of the ViT,
ECCV22(XXIV:516-533).
Springer DOI 2211
BibRef

Li, Y.H.[Yang-Hao], Mao, H.Z.[Han-Zi], Girshick, R.[Ross], He, K.M.[Kai-Ming],
Exploring Plain Vision Transformer Backbones for Object Detection,
ECCV22(IX:280-296).
Springer DOI 2211
BibRef

Yu, Q.H.[Qi-Hang], Wang, H.Y.[Hui-Yu], Qiao, S.Y.[Si-Yuan], Collins, M.[Maxwell], Zhu, Y.K.[Yu-Kun], Adam, H.[Hartwig], Yuille, A.L.[Alan L.], Chen, L.C.[Liang-Chieh],
k-means Mask Transformer,
ECCV22(XXIX:288-307).
Springer DOI 2211
BibRef

Pham, K.[Khoi], Kafle, K.[Kushal], Lin, Z.[Zhe], Ding, Z.H.[Zhi-Hong], Cohen, S.[Scott], Tran, Q.[Quan], Shrivastava, A.[Abhinav],
Improving Closed and Open-Vocabulary Attribute Prediction Using Transformers,
ECCV22(XXV:201-219).
Springer DOI 2211
BibRef

Yu, W.X.[Wen-Xin], Zhang, H.[Hongru], Lan, T.X.[Tian-Xiang], Hu, Y.C.[Yu-Cheng], Yin, D.[Dong],
CBPT: A New Backbone for Enhancing Information Transmission of Vision Transformers,
ICIP22(156-160)
IEEE DOI 2211
Merging, Information processing, Object detection, Transformers, Computational complexity, Vision Transformer, Backbone BibRef

Takeda, M.[Mana], Yanai, K.[Keiji],
Continual Learning in Vision Transformer,
ICIP22(616-620)
IEEE DOI 2211
Learning systems, Image recognition, Transformers, Natural language processing, Convolutional neural networks, Vision Transformer BibRef

Zhou, W.L.[Wei-Lian], Kamata, S.I.[Sei-Ichiro], Luo, Z.[Zhengbo], Xue, X.[Xi],
Rethinking Unified Spectral-Spatial-Based Hyperspectral Image Classification Under 3D Configuration of Vision Transformer,
ICIP22(711-715)
IEEE DOI 2211
Flowcharts, Correlation, Convolution, Transformers, Hyperspectral image classification, 3D coordinate positional embedding BibRef

Cao, Y.H.[Yun-Hao], Yu, H.[Hao], Wu, J.X.[Jian-Xin],
Training Vision Transformers with only 2040 Images,
ECCV22(XXV:220-237).
Springer DOI 2211
BibRef

Wang, C.[Cong], Xu, H.M.[Hong-Min], Zhang, X.[Xiong], Wang, L.[Li], Zheng, Z.T.[Zhi-Tong], Liu, H.F.[Hai-Feng],
Convolutional Embedding Makes Hierarchical Vision Transformer Stronger,
ECCV22(XX:739-756).
Springer DOI 2211
BibRef

Wu, B.X.[Bo-Xi], Gu, J.D.[Jin-Dong], Li, Z.F.[Zhi-Feng], Cai, D.[Deng], He, X.F.[Xiao-Fei], Liu, W.[Wei],
Towards Efficient Adversarial Training on Vision Transformers,
ECCV22(XIII:307-325).
Springer DOI 2211
BibRef

Zong, Z.F.[Zhuo-Fan], Li, K.C.[Kun-Chang], Song, G.L.[Guang-Lu], Wang, Y.[Yali], Qiao, Y.[Yu], Leng, B.[Biao], Liu, Y.[Yu],
Self-slimmed Vision Transformer,
ECCV22(XI:432-448).
Springer DOI 2211
BibRef

Fayyaz, M.[Mohsen], Koohpayegani, S.A.[Soroush Abbasi], Jafari, F.R.[Farnoush Rezaei], Sengupta, S.[Sunando], Joze, H.R.V.[Hamid Reza Vaezi], Sommerlade, E.[Eric], Pirsiavash, H.[Hamed], Gall, J.[Jürgen],
Adaptive Token Sampling for Efficient Vision Transformers,
ECCV22(XI:396-414).
Springer DOI 2211
BibRef

Weng, Z.J.[Ze-Jia], Yang, X.T.[Xi-Tong], Li, A.[Ang], Wu, Z.X.[Zu-Xuan], Jiang, Y.G.[Yu-Gang],
Semi-supervised Vision Transformers,
ECCV22(XXX:605-620).
Springer DOI 2211
BibRef

Su, T.[Tong], Ye, S.[Shuo], Song, C.Q.[Cheng-Qun], Cheng, J.[Jun],
Mask-Vit: An Object Mask Embedding in Vision Transformer for Fine-Grained Visual Classification,
ICIP22(1626-1630)
IEEE DOI 2211
Knowledge engineering, Visualization, Focusing, Interference, Benchmark testing, Transformers, Feature extraction, Knowledge Embedding BibRef

Gai, L.[Lulu], Chen, W.[Wei], Gao, R.[Rui], Chen, Y.W.[Yan-Wei], Qiao, X.[Xu],
Using Vision Transformers in 3-D Medical Image Classifications,
ICIP22(696-700)
IEEE DOI 2211
Deep learning, Training, Visualization, Transfer learning, Optimization methods, Self-supervised learning, Transformers, 3-D medical image classifications BibRef

Wu, K.[Kan], Zhang, J.[Jinnian], Peng, H.[Houwen], Liu, M.C.[Meng-Chen], Xiao, B.[Bin], Fu, J.L.[Jian-Long], Yuan, L.[Lu],
TinyViT: Fast Pretraining Distillation for Small Vision Transformers,
ECCV22(XXI:68-85).
Springer DOI 2211
BibRef

Gao, L.[Li], Nie, D.[Dong], Li, B.[Bo], Ren, X.F.[Xiao-Feng],
Doubly-Fused ViT: Fuse Information from Vision Transformer Doubly with Local Representation,
ECCV22(XXIII:744-761).
Springer DOI 2211
BibRef

Yao, T.[Ting], Pan, Y.W.[Ying-Wei], Li, Y.[Yehao], Ngo, C.W.[Chong-Wah], Mei, T.[Tao],
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning,
ECCV22(XXV:328-345).
Springer DOI 2211
BibRef

Yuan, Z.H.[Zhi-Hang], Xue, C.H.[Chen-Hao], Chen, Y.Q.[Yi-Qi], Wu, Q.[Qiang], Sun, G.Y.[Guang-Yu],
PTQ4ViT: Post-training Quantization for Vision Transformers with Twin Uniform Quantization,
ECCV22(XII:191-207).
Springer DOI 2211
BibRef

Kong, Z.L.[Zheng-Lun], Dong, P.Y.[Pei-Yan], Ma, X.L.[Xiao-Long], Meng, X.[Xin], Niu, W.[Wei], Sun, M.S.[Meng-Shu], Shen, X.[Xuan], Yuan, G.[Geng], Ren, B.[Bin], Tang, H.[Hao], Qin, M.H.[Ming-Hai], Wang, Y.Z.[Yan-Zhi],
SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning,
ECCV22(XI:620-640).
Springer DOI 2211
BibRef

Pan, J.T.[Jun-Ting], Bulat, A.[Adrian], Tan, F.[Fuwen], Zhu, X.T.[Xia-Tian], Dudziak, L.[Lukasz], Li, H.S.[Hong-Sheng], Tzimiropoulos, G.[Georgios], Martinez, B.[Brais],
EdgeViTs: Competing Light-Weight CNNs on Mobile Devices with Vision Transformers,
ECCV22(XI:294-311).
Springer DOI 2211
BibRef

Liu, Y.[Yong], Mai, S.Q.[Si-Qi], Chen, X.N.[Xiang-Ning], Hsieh, C.J.[Cho-Jui], You, Y.[Yang],
Towards Efficient and Scalable Sharpness-Aware Minimization,
CVPR22(12350-12360)
IEEE DOI 2210

WWW Link. Training, Schedules, Scalability, Perturbation methods, Stochastic processes, Transformers, Minimization, Vision applications and systems BibRef

Ren, P.Z.[Peng-Zhen], Li, C.L.[Chang-Lin], Wang, G.R.[Guang-Run], Xiao, Y.[Yun], Du, Q.[Qing], Liang, X.D.[Xiao-Dan], Chang, X.J.[Xiao-Jun],
Beyond Fixation: Dynamic Window Visual Transformer,
CVPR22(11977-11987)
IEEE DOI 2210
Performance evaluation, Visualization, Systematics, Computational modeling, Scalability, Transformers, Deep learning architectures and techniques BibRef

Fang, J.[Jiemin], Xie, L.X.[Ling-Xi], Wang, X.G.[Xing-Gang], Zhang, X.P.[Xiao-Peng], Liu, W.Y.[Wen-Yu], Tian, Q.[Qi],
MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens,
CVPR22(12053-12062)
IEEE DOI 2210
Deep learning, Visualization, Neural networks, Graphics processing units, retrieval BibRef

Sandler, M.[Mark], Zhmoginov, A.[Andrey], Vladymyrov, M.[Max], Jackson, A.[Andrew],
Fine-tuning Image Transformers using Learnable Memory,
CVPR22(12145-12154)
IEEE DOI 2210
Deep learning, Adaptation models, Costs, Computational modeling, Memory management, Transformers, Transfer/low-shot/long-tail learning BibRef

Yu, X.[Xumin], Tang, L.[Lulu], Rao, Y.M.[Yong-Ming], Huang, T.J.[Tie-Jun], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling,
CVPR22(19291-19300)
IEEE DOI 2210
Point cloud compression, Solid modeling, Computational modeling, Bit error rate, Transformers, Deep learning architectures and techniques BibRef

Park, C.[Chunghyun], Jeong, Y.[Yoonwoo], Cho, M.[Minsu], Park, J.[Jaesik],
Fast Point Transformer,
CVPR22(16928-16937)
IEEE DOI 2210
Point cloud compression, Shape, Semantics, Neural networks, Transformers, grouping and shape analysis BibRef

Tu, Z.Z.[Zheng-Zhong], Talebi, H.[Hossein], Zhang, H.[Han], Yang, F.[Feng], Milanfar, P.[Peyman], Bovik, A.[Alan], Li, Y.X.[Yin-Xiao],
MAXIM: Multi-Axis MLP for Image Processing,
CVPR22(5759-5770)
IEEE DOI 2210

WWW Link. Training, Photography, Adaptation models, Visualization, Computational modeling, Transformers, Low-level vision, Computational photography BibRef

Hou, Z.J.[Ze-Jiang], Kung, S.Y.[Sun-Yuan],
Multi-Dimensional Vision Transformer Compression via Dependency Guided Gaussian Process Search,
EVW22(3668-3677)
IEEE DOI 2210
Adaptation models, Image coding, Head, Computational modeling, Neurons, Gaussian processes, Transformers BibRef

Wang, Y.K.[Yi-Kai], Chen, X.H.[Xing-Hao], Cao, L.[Lele], Huang, W.B.[Wen-Bing], Sun, F.C.[Fu-Chun], Wang, Y.H.[Yun-He],
Multimodal Token Fusion for Vision Transformers,
CVPR22(12176-12185)
IEEE DOI 2210
Point cloud compression, Image segmentation, Shape, Semantics, Object detection, Vision+X BibRef

Zhang, J.N.[Jin-Nian], Peng, H.W.[Hou-Wen], Wu, K.[Kan], Liu, M.C.[Meng-Chen], Xiao, B.[Bin], Fu, J.L.[Jian-Long], Yuan, L.[Lu],
MiniViT: Compressing Vision Transformers with Weight Multiplexing,
CVPR22(12135-12144)
IEEE DOI 2210
Multiplexing, Performance evaluation, Image coding, Codes, Computational modeling, Benchmark testing, Vision applications and systems BibRef

Chen, T.L.[Tian-Long], Zhang, Z.Y.[Zhen-Yu], Cheng, Y.[Yu], Awadallah, A.[Ahmed], Wang, Z.Y.[Zhang-Yang],
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy,
CVPR22(12010-12020)
IEEE DOI 2210
Training, Convolutional codes, Deep learning, Computational modeling, Redundancy, Deep learning architectures and techniques BibRef

Yin, H.X.[Hong-Xu], Vahdat, A.[Arash], Alvarez, J.M.[Jose M.], Mallya, A.[Arun], Kautz, J.[Jan], Molchanov, P.[Pavlo],
A-ViT: Adaptive Tokens for Efficient Vision Transformer,
CVPR22(10799-10808)
IEEE DOI 2210
Training, Adaptive systems, Network architecture, Transformers, Throughput, Hardware, Complexity theory, Efficient learning and inferences BibRef

Lu, J.H.[Jia-Hao], Zhang, X.S.[Xi Sheryl], Zhao, T.L.[Tian-Li], He, X.Y.[Xiang-Yu], Cheng, J.[Jian],
APRIL: Finding the Achilles' Heel on Privacy for Vision Transformers,
CVPR22(10041-10050)
IEEE DOI 2210
Privacy, Data privacy, Federated learning, Computational modeling, Training data, Transformers, Privacy and federated learning BibRef

Hatamizadeh, A.[Ali], Yin, H.X.[Hong-Xu], Roth, H.[Holger], Li, W.Q.[Wen-Qi], Kautz, J.[Jan], Xu, D.[Daguang], Molchanov, P.[Pavlo],
GradViT: Gradient Inversion of Vision Transformers,
CVPR22(10011-10020)
IEEE DOI 2210
Measurement, Differential privacy, Neural networks, Transformers, Security, Iterative methods, Privacy and federated learning BibRef

Zhang, H.F.[Hao-Fei], Duan, J.R.[Jia-Rui], Xue, M.Q.[Meng-Qi], Song, J.[Jie], Sun, L.[Li], Song, M.L.[Ming-Li],
Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training,
CVPR22(8934-8943)
IEEE DOI 2210
Training, Upper bound, Neural networks, Training data, Network architecture, Transformers, Computer vision theory, Efficient learning and inferences BibRef

Chavan, A.[Arnav], Shen, Z.Q.[Zhi-Qiang], Liu, Z.[Zhuang], Liu, Z.[Zechun], Cheng, K.T.[Kwang-Ting], Xing, E.[Eric],
Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space,
CVPR22(4921-4931)
IEEE DOI 2210
Training, Performance evaluation, Image coding, Force, Graphics processing units, Vision applications and systems BibRef

Chen, R.J.[Richard J.], Chen, C.[Chengkuan], Li, Y.C.[Yi-Cong], Chen, T.Y.[Tiffany Y.], Trister, A.D.[Andrew D.], Krishnan, R.G.[Rahul G.], Mahmood, F.[Faisal],
Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning,
CVPR22(16123-16134)
IEEE DOI 2210
Training, Visualization, Self-supervised learning, Image representation, Transformers, Self- semi- meta- unsupervised learning BibRef

Zhai, X.H.[Xiao-Hua], Kolesnikov, A.[Alexander], Houlsby, N.[Neil], Beyer, L.[Lucas],
Scaling Vision Transformers,
CVPR22(1204-1213)
IEEE DOI 2210
Training, Error analysis, Computational modeling, Neural networks, Memory management, Training data, Transfer/low-shot/long-tail learning BibRef

Guo, J.Y.[Jian-Yuan], Han, K.[Kai], Wu, H.[Han], Tang, Y.[Yehui], Chen, X.H.[Xing-Hao], Wang, Y.H.[Yun-He], Xu, C.[Chang],
CMT: Convolutional Neural Networks Meet Vision Transformers,
CVPR22(12165-12175)
IEEE DOI 2210
Visualization, Image recognition, Force, Object detection, Transformers, Representation learning BibRef

Meng, L.C.[Ling-Chen], Li, H.D.[Heng-Duo], Chen, B.C.[Bor-Chun], Lan, S.Y.[Shi-Yi], Wu, Z.X.[Zu-Xuan], Jiang, Y.G.[Yu-Gang], Lim, S.N.[Ser-Nam],
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition,
CVPR22(12299-12308)
IEEE DOI 2210
Image recognition, Head, Law enforcement, Computational modeling, Redundancy, Transformers, Efficient learning and inferences, retrieval BibRef

Herrmann, C.[Charles], Sargent, K.[Kyle], Jiang, L.[Lu], Zabih, R.[Ramin], Chang, H.[Huiwen], Liu, C.[Ce], Krishnan, D.[Dilip], Sun, D.Q.[De-Qing],
Pyramid Adversarial Training Improves ViT Performance,
CVPR22(13409-13419)
IEEE DOI 2210
Training, Image recognition, Stochastic processes, Transformers, Robustness, retrieval, Recognition: detection BibRef

Li, C.L.[Chang-Lin], Zhuang, B.[Bohan], Wang, G.R.[Guang-Run], Liang, X.D.[Xiao-Dan], Chang, X.J.[Xiao-Jun], Yang, Y.[Yi],
Automated Progressive Learning for Efficient Training of Vision Transformers,
CVPR22(12476-12486)
IEEE DOI 2210
Training, Adaptation models, Schedules, Computational modeling, Estimation, Manuals, Transformers, Representation learning BibRef

Pu, M.Y.[Meng-Yang], Huang, Y.P.[Ya-Ping], Liu, Y.M.[Yu-Ming], Guan, Q.J.[Qing-Ji], Ling, H.B.[Hai-Bin],
EDTER: Edge Detection with Transformer,
CVPR22(1392-1402)
IEEE DOI 2210
Head, Image edge detection, Semantics, Detectors, Transformers, Feature extraction, Segmentation, grouping and shape analysis, Scene analysis and understanding BibRef

Zhu, R.[Rui], Li, Z.Q.[Zheng-Qin], Matai, J.[Janarbek], Porikli, F.M.[Fatih M.], Chandraker, M.[Manmohan],
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes,
CVPR22(2812-2821)
IEEE DOI 2210
Photorealism, Shape, Computational modeling, Lighting, Transformers, Physics-based vision and shape-from-X BibRef

Ermolov, A.[Aleksandr], Mirvakhabova, L.[Leyla], Khrulkov, V.[Valentin], Sebe, N.[Nicu], Oseledets, I.[Ivan],
Hyperbolic Vision Transformers: Combining Improvements in Metric Learning,
CVPR22(7399-7409)
IEEE DOI 2210
Measurement, Geometry, Visualization, Semantics, Self-supervised learning, Transformer cores, Transformers, Representation learning BibRef

Zhang, C.Z.[Chong-Zhi], Zhang, M.Y.[Ming-Yuan], Zhang, S.H.[Shang-Hang], Jin, D.S.[Dai-Sheng], Zhou, Q.[Qiang], Cai, Z.A.[Zhong-Ang], Zhao, H.[Haiyu], Liu, X.L.[Xiang-Long], Liu, Z.W.[Zi-Wei],
Delving Deep into the Generalization of Vision Transformers under Distribution Shifts,
CVPR22(7267-7276)
IEEE DOI 2210
Training, Representation learning, Systematics, Shape, Taxonomy, Self-supervised learning, Transformers, Recognition: detection, Representation learning BibRef

Hou, Z.[Zhi], Yu, B.[Baosheng], Tao, D.C.[Da-Cheng],
BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning,
CVPR22(7246-7256)
IEEE DOI 2210
Training, Deep learning, Representation learning, Neural networks, Tail, Transformers, Transfer/low-shot/long-tail learning, Self- semi- meta- unsupervised learning BibRef

Zamir, S.W.[Syed Waqas], Arora, A.[Aditya], Khan, S.[Salman], Hayat, M.[Munawar], Khan, F.S.[Fahad Shahbaz], Yang, M.H.[Ming-Hsuan],
Restormer: Efficient Transformer for High-Resolution Image Restoration,
CVPR22(5718-5729)
IEEE DOI 2210
Computational modeling, Transformer cores, Transformers, Data models, Image restoration, Task analysis, Deep learning architectures and techniques BibRef

Lin, K.[Kevin], Wang, L.J.[Li-Juan], Liu, Z.C.[Zi-Cheng],
Mesh Graphormer,
ICCV21(12919-12928)
IEEE DOI 2203
Convolutional codes, Solid modeling, Network topology, Transformers, Gestures and body pose BibRef

Casey, E.[Evan], Pérez, V.[Víctor], Li, Z.R.[Zhuo-Ru],
The Animation Transformer: Visual Correspondence via Segment Matching,
ICCV21(11303-11312)
IEEE DOI 2203
Visualization, Image segmentation, Image color analysis, Production, Animation, Transformers, grouping and shape BibRef

Reizenstein, J.[Jeremy], Shapovalov, R.[Roman], Henzler, P.[Philipp], Sbordone, L.[Luca], Labatut, P.[Patrick], Novotny, D.[David],
Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction,
ICCV21(10881-10891)
IEEE DOI 2203
Award, Marr Prize, HM. Point cloud compression, Transformers, Rendering (computer graphics), Cameras, Image reconstruction, 3D from multiview and other sensors BibRef

Feng, W.X.[Wei-Xin], Wang, Y.J.[Yuan-Jiang], Ma, L.H.[Li-Hua], Yuan, Y.[Ye], Zhang, C.[Chi],
Temporal Knowledge Consistency for Unsupervised Visual Representation Learning,
ICCV21(10150-10160)
IEEE DOI 2203
Training, Representation learning, Visualization, Protocols, Object detection, Semisupervised learning, Transformers, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Wu, H.P.[Hai-Ping], Xiao, B.[Bin], Codella, N.[Noel], Liu, M.C.[Meng-Chen], Dai, X.Y.[Xi-Yang], Yuan, L.[Lu], Zhang, L.[Lei],
CvT: Introducing Convolutions to Vision Transformers,
ICCV21(22-31)
IEEE DOI 2203
Code, Vision Transformer.
WWW Link. Convolutional codes, Image resolution, Image recognition, Performance gain, Transformers, Distortion, BibRef

Touvron, H.[Hugo], Cord, M.[Matthieu], Sablayrolles, A.[Alexandre], Synnaeve, G.[Gabriel], Jégou, H.[Hervé],
Going deeper with Image Transformers,
ICCV21(32-42)
IEEE DOI 2203
Training, Neural networks, Training data, Data models, Circuit faults, Recognition and classification, Optimization and learning methods BibRef

Zhao, J.W.[Jia-Wei], Yan, K.[Ke], Zhao, Y.F.[Yi-Fan], Guo, X.W.[Xiao-Wei], Huang, F.Y.[Fei-Yue], Li, J.[Jia],
Transformer-based Dual Relation Graph for Multi-label Image Recognition,
ICCV21(163-172)
IEEE DOI 2203
Image recognition, Correlation, Computational modeling, Semantics, Benchmark testing, Representation learning BibRef

Pan, Z.Z.[Zi-Zheng], Zhuang, B.[Bohan], Liu, J.[Jing], He, H.Y.[Hao-Yu], Cai, J.F.[Jian-Fei],
Scalable Vision Transformers with Hierarchical Pooling,
ICCV21(367-376)
IEEE DOI 2203
Visualization, Image recognition, Computational modeling, Scalability, Transformers, Computational efficiency, Efficient training and inference methods BibRef

Yuan, L.[Li], Chen, Y.P.[Yun-Peng], Wang, T.[Tao], Yu, W.H.[Wei-Hao], Shi, Y.J.[Yu-Jun], Jiang, Z.H.[Zi-Hang], Tay, F.E.H.[Francis E. H.], Feng, J.S.[Jia-Shi], Yan, S.C.[Shui-Cheng],
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet,
ICCV21(538-547)
IEEE DOI 2203
Training, Image resolution, Computational modeling, Image edge detection, Transformers, BibRef

Wu, B.[Bichen], Xu, C.F.[Chen-Feng], Dai, X.L.[Xiao-Liang], Wan, A.[Alvin], Zhang, P.Z.[Pei-Zhao], Yan, Z.C.[Zhi-Cheng], Tomizuka, M.[Masayoshi], Gonzalez, J.[Joseph], Keutzer, K.[Kurt], Vajda, P.[Peter],
Visual Transformers: Where Do Transformers Really Belong in Vision Models?,
ICCV21(579-589)
IEEE DOI 2203
Training, Visualization, Image segmentation, Lips, Computational modeling, Semantics, Vision applications and systems BibRef

Hu, R.H.[Rong-Hang], Singh, A.[Amanpreet],
UniT: Multimodal Multitask Learning with a Unified Transformer,
ICCV21(1419-1429)
IEEE DOI 2203
Training, Natural languages, Object detection, Predictive models, Transformers, Multitasking, Representation learning BibRef

Qiu, Y.[Yue], Yamamoto, S.[Shintaro], Nakashima, K.[Kodai], Suzuki, R.[Ryota], Iwata, K.[Kenji], Kataoka, H.[Hirokatsu], Satoh, Y.[Yutaka],
Describing and Localizing Multiple Changes with Transformers,
ICCV21(1951-1960)
IEEE DOI 2203
Measurement, Location awareness, Codes, Natural languages, Benchmark testing, Transformers, Vision applications and systems BibRef

Song, M.[Myungseo], Choi, J.[Jinyoung], Han, B.H.[Bo-Hyung],
Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform,
ICCV21(2360-2369)
IEEE DOI 2203
Training, Image coding, Neural networks, Rate-distortion, Transforms, Network architecture, Computational photography, Low-level and physics-based vision BibRef

Sheng, H.[Hualian], Cai, S.[Sijia], Liu, Y.[Yuan], Deng, B.[Bing], Huang, J.Q.[Jian-Qiang], Hua, X.S.[Xian-Sheng], Zhao, M.J.[Min-Jian],
Improving 3D Object Detection with Channel-wise Transformer,
ICCV21(2723-2732)
IEEE DOI 2203
Point cloud compression, Object detection, Detectors, Transforms, Transformers, Encoding, Detection and localization in 2D and 3D, BibRef

Zhang, P.C.[Peng-Chuan], Dai, X.[Xiyang], Yang, J.W.[Jian-Wei], Xiao, B.[Bin], Yuan, L.[Lu], Zhang, L.[Lei], Gao, J.F.[Jian-Feng],
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding,
ICCV21(2978-2988)
IEEE DOI 2203
Image segmentation, Image coding, Computational modeling, Memory management, Object detection, Transformers, Representation learning BibRef

Dong, Q.[Qi], Tu, Z.W.[Zhuo-Wen], Liao, H.F.[Hao-Fu], Zhang, Y.T.[Yu-Ting], Mahadevan, V.[Vijay], Soatto, S.[Stefano],
Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries,
ICCV21(3530-3539)
IEEE DOI 2203
Visualization, Detectors, Transformers, Task analysis, Standards, Detection and localization in 2D and 3D, Representation learning BibRef

Fan, H.Q.[Hao-Qi], Xiong, B.[Bo], Mangalam, K.[Karttikeya], Li, Y.[Yanghao], Yan, Z.C.[Zhi-Cheng], Malik, J.[Jitendra], Feichtenhofer, C.[Christoph],
Multiscale Vision Transformers,
ICCV21(6804-6815)
IEEE DOI 2203
Visualization, Image recognition, Codes, Computational modeling, Transformers, Complexity theory, Recognition and classification BibRef

Mahmood, K.[Kaleel], Mahmood, R.[Rigel], van Dijk, M.[Marten],
On the Robustness of Vision Transformers to Adversarial Examples,
ICCV21(7818-7827)
IEEE DOI 2203
Transformers, Robustness, Adversarial machine learning, Security, Machine learning architectures and formulations BibRef

Chen, X.L.[Xin-Lei], Xie, S.[Saining], He, K.[Kaiming],
An Empirical Study of Training Self-Supervised Vision Transformers,
ICCV21(9620-9629)
IEEE DOI 2203
Training, Benchmark testing, Transformers, Standards, Representation learning, Recognition and classification, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Yuan, Y.[Ye], Weng, X.[Xinshuo], Ou, Y.[Yanglan], Kitani, K.[Kris],
AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting,
ICCV21(9793-9803)
IEEE DOI 2203
Uncertainty, Stochastic processes, Predictive models, Transformers, Encoding, Trajectory, Motion and tracking, Vision for robotics and autonomous vehicles BibRef

Wu, K.[Kan], Peng, H.W.[Hou-Wen], Chen, M.H.[Ming-Hao], Fu, J.L.[Jian-Long], Chao, H.Y.[Hong-Yang],
Rethinking and Improving Relative Position Encoding for Vision Transformer,
ICCV21(10013-10021)
IEEE DOI 2203
Image coding, Codes, Computational modeling, Transformers, Encoding, Natural language processing, Datasets and evaluation, Recognition and classification BibRef

Bhojanapalli, S.[Srinadh], Chakrabarti, A.[Ayan], Glasner, D.[Daniel], Li, D.[Daliang], Unterthiner, T.[Thomas], Veit, A.[Andreas],
Understanding Robustness of Transformers for Image Classification,
ICCV21(10211-10221)
IEEE DOI 2203
Perturbation methods, Transformers, Robustness, Data models, Convolutional neural networks, Recognition and classification BibRef

Yan, B.[Bin], Peng, H.[Houwen], Fu, J.L.[Jian-Long], Wang, D.[Dong], Lu, H.C.[Hu-Chuan],
Learning Spatio-Temporal Transformer for Visual Tracking,
ICCV21(10428-10437)
IEEE DOI 2203
Visualization, Target tracking, Smoothing methods, Pipelines, Benchmark testing, Transformers, BibRef

Heo, B.[Byeongho], Yun, S.[Sangdoo], Han, D.Y.[Dong-Yoon], Chun, S.[Sanghyuk], Choe, J.[Junsuk], Oh, S.J.[Seong Joon],
Rethinking Spatial Dimensions of Vision Transformers,
ICCV21(11916-11925)
IEEE DOI 2203
Dimensionality reduction, Computational modeling, Object detection, Transformers, Robustness, Recognition and classification BibRef

Voskou, A.[Andreas], Panousis, K.P.[Konstantinos P.], Kosmopoulos, D.[Dimitrios], Metaxas, D.N.[Dimitris N.], Chatzis, S.[Sotirios],
Stochastic Transformer Networks with Linear Competing Units: Application to end-to-end SL Translation,
ICCV21(11926-11935)
IEEE DOI 2203
Training, Memory management, Stochastic processes, Gesture recognition, Benchmark testing, Assistive technologies, BibRef

Ranftl, R.[René], Bochkovskiy, A.[Alexey], Koltun, V.[Vladlen],
Vision Transformers for Dense Prediction,
ICCV21(12159-12168)
IEEE DOI 2203
Image resolution, Semantics, Neural networks, Estimation, Training data, grouping and shape BibRef

Chen, M.H.[Ming-Hao], Peng, H.W.[Hou-Wen], Fu, J.L.[Jian-Long], Ling, H.B.[Hai-Bin],
AutoFormer: Searching Transformers for Visual Recognition,
ICCV21(12250-12260)
IEEE DOI 2203
Training, Convolutional codes, Visualization, Head, Search methods, Manuals, Recognition and classification BibRef

Yuan, K.[Kun], Guo, S.P.[Shao-Peng], Liu, Z.W.[Zi-Wei], Zhou, A.[Aojun], Yu, F.W.[Feng-Wei], Wu, W.[Wei],
Incorporating Convolution Designs into Visual Transformers,
ICCV21(559-568)
IEEE DOI 2203
Training, Visualization, Costs, Convolution, Training data, Transformers, Feature extraction, Recognition and classification, Efficient training and inference methods BibRef

Chen, Z.[Zhengsu], Xie, L.X.[Ling-Xi], Niu, J.W.[Jian-Wei], Liu, X.F.[Xue-Feng], Wei, L.[Longhui], Tian, Q.[Qi],
Visformer: The Vision-friendly Transformer,
ICCV21(569-578)
IEEE DOI 2203
Convolutional codes, Training, Visualization, Protocols, Computational modeling, Fitting, Recognition and classification, Representation learning BibRef

Yao, Z.L.[Zhu-Liang], Cao, Y.[Yue], Lin, Y.T.[Yu-Tong], Liu, Z.[Ze], Zhang, Z.[Zheng], Hu, H.[Han],
Leveraging Batch Normalization for Vision Transformers,
NeurArch21(413-422)
IEEE DOI 2112
Training, Transformers, Feeds BibRef

Graham, B.[Ben], El-Nouby, A.[Alaaeldin], Touvron, H.[Hugo], Stock, P.[Pierre], Joulin, A.[Armand], Jégou, H.[Hervé], Douze, M.[Matthijs],
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference,
ICCV21(12239-12249)
IEEE DOI 2203
Training, Image resolution, Neural networks, Parallel processing, Transformers, Feature extraction, Representation learning BibRef

Horváth, J.[János], Baireddy, S.[Sriram], Hao, H.X.[Han-Xiang], Montserrat, D.M.[Daniel Mas], Delp, E.J.[Edward J.],
Manipulation Detection in Satellite Images Using Vision Transformer,
WMF21(1032-1041)
IEEE DOI 2109
BibRef
Earlier: A1, A4, A3, A5, Only:
Manipulation Detection in Satellite Images Using Deep Belief Networks,
WMF20(2832-2840)
IEEE DOI 2008
Image sensors, Satellites, Splicing, Forestry, Tools. Satellites, Image reconstruction, Training, Forgery, Heating systems, Feature extraction BibRef

Beal, J.[Josh], Wu, H.Y.[Hao-Yu], Park, D.H.[Dong Huk], Zhai, A.[Andrew], Kislyuk, D.[Dmitry],
Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations,
WACV22(1431-1440)
IEEE DOI 2202
Visualization, Solid modeling, Systematics, Computational modeling, Transformers, Semi- and Un- supervised Learning BibRef

Chapter on Pattern Recognition, Clustering, Statistics, Grammars, Learning, Neural Nets, Genetic Algorithms continues in
Patch Based Vision Transformers.


Last update: Jan 15, 2025 at 14:36:47