Jiang, B.[Bo],
Zhao, K.K.[Kang-Kang],
Tang, J.[Jin],
RGTransformer: Region-Graph Transformer for Image Representation and
Few-Shot Classification,
SPLetters(29), 2022, pp. 792-796.
IEEE DOI
2204
Measurement, Transformers, Image representation,
Feature extraction, Visualization, transformer
BibRef
Kim, B.[Boah],
Kim, J.[Jeongsol],
Ye, J.C.[Jong Chul],
Task-Agnostic Vision Transformer for Distributed Learning of Image
Processing,
IP(32), 2023, pp. 203-218.
IEEE DOI
2301
Task analysis, Transformers, Servers, Distance learning,
Computer aided instruction, Tail, Head, Distributed learning,
task-agnostic learning
BibRef
Park, S.[Sangjoon],
Ye, J.C.[Jong Chul],
Multi-Task Distributed Learning Using Vision Transformer With Random
Patch Permutation,
MedImg(42), No. 7, July 2023, pp. 2091-2105.
IEEE DOI
2307
Task analysis, Transformers, Head, Tail, Servers, Multitasking,
Distance learning, Federated learning, split learning,
privacy preservation
BibRef
Kim, B.J.[Bum Jun],
Choi, H.[Hyeyeon],
Jang, H.[Hyeonah],
Lee, D.G.[Dong Gu],
Jeong, W.[Wonseok],
Kim, S.W.[Sang Woo],
Improved robustness of vision transformers via prelayernorm in patch
embedding,
PR(141), 2023, pp. 109659.
Elsevier DOI
2306
Vision transformer, Patch embedding, Contrast enhancement,
Robustness, Layer normalization, Convolutional neural network, Deep learning
BibRef
Zhou, D.[Daquan],
Hou, Q.[Qibin],
Yang, L.J.[Lin-Jie],
Jin, X.J.[Xiao-Jie],
Feng, J.S.[Jia-Shi],
Token Selection is a Simple Booster for Vision Transformers,
PAMI(45), No. 11, November 2023, pp. 12738-12746.
IEEE DOI
2310
BibRef
Feng, Z.Z.[Zhan-Zhou],
Zhang, S.L.[Shi-Liang],
Efficient Vision Transformer via Token Merger,
IP(32), 2023, pp. 4156-4169.
IEEE DOI
2307
Corporate acquisitions, Transformers, Semantics, Task analysis,
Visualization, Merging, Computational efficiency, sparse representation
BibRef
Qian, S.J.[Sheng-Ju],
Zhu, Y.[Yi],
Li, W.B.[Wen-Bo],
Li, M.[Mu],
Jia, J.Y.[Jia-Ya],
What Makes for Good Tokenizers in Vision Transformer?,
PAMI(45), No. 11, November 2023, pp. 13011-13023.
IEEE DOI
2310
BibRef
Fu, K.[Kexue],
Yuan, M.Z.[Ming-Zhi],
Liu, S.L.[Shao-Lei],
Wang, M.[Manning],
Boosting Point-BERT by Multi-Choice Tokens,
CirSysVideo(34), No. 1, January 2024, pp. 438-447.
IEEE DOI
2401
self-supervised pre-training task.
See also Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling.
BibRef
Yan, F.Y.[Fang-Yuan],
Yan, B.[Bin],
Liang, W.[Wei],
Pei, M.T.[Ming-Tao],
Token labeling-guided multi-scale medical image classification,
PRL(178), 2024, pp. 28-34.
Elsevier DOI
2402
Medical image classification, Vision transformer, Token labeling
BibRef
Li, Y.X.[Yue-Xiang],
Huang, Y.W.[Ya-Wen],
He, N.[Nanjun],
Ma, K.[Kai],
Zheng, Y.F.[Ye-Feng],
Improving vision transformer for medical image classification via
token-wise perturbation,
JVCIR(98), 2024, pp. 104022.
Elsevier DOI
2402
Self-supervised learning, Vision transformer, Image classification
BibRef
Kang, J.Y.[Jun-Yong],
Heo, B.[Byeongho],
Choe, J.[Junsuk],
Improving ViT interpretability with patch-level mask prediction,
PRL(187), 2025, pp. 73-79.
Elsevier DOI
2501
Vision Transformer, Interpretability, Weak supervision, Object localization
BibRef
Arya, R.K.[Rajat Kumar],
Peddi, R.[Rohith],
Srivastava, R.[Rajeev],
Hyperspectral image classification using hybrid convolutional-based
cross-patch retentive network,
CVIU(257), 2025, pp. 104382.
Elsevier DOI
2505
Hyperspectral images, Classification, Feature extraction,
Convolutional neural networks, Retention mechanism
BibRef
Niu, Y.[Yi],
Song, Z.C.[Zhuo-Chen],
Luo, Q.Y.[Qing-Yu],
Chen, G.C.[Guo-Chao],
Ma, M.M.[Ming-Ming],
Li, F.[Fu],
ATMformer: An Adaptive Token Merging Vision Transformer for Remote
Sensing Image Scene Classification,
RS(17), No. 4, 2025, pp. 660.
DOI Link
2502
downsample to improve computation.
BibRef
Wang, Y.C.[Yan-Cheng],
Yang, Y.Z.[Ying-Zhen],
Efficient Visual Transformer by Learnable Token Merging,
PAMI(47), No. 11, November 2025, pp. 9597-9608.
IEEE DOI
2510
Transformers, Merging, Visualization, Upper bound,
Accuracy, Training, Mutual information, Deep learning,
compact transformer networks
BibRef
Dang, C.X.[Chen-Xu],
Duan, Z.[Zaipeng],
An, P.[Pei],
Zhang, X.M.[Xin-Min],
Hu, X.[Xuzhong],
Ma, J.[Jie],
FASTer: Focal Token Acquiring-and-Scaling Transformer for Long-term
3D Object Detection,
CVPR25(17029-17038)
IEEE DOI Code:
WWW Link.
2508
Laser radar, Detectors, Object detection, Transformers, Robustness,
Complexity theory, Spatiotemporal phenomena, Proposals,
adaptive scaling
BibRef
Olszewski, J.[Jan],
Rymarczyk, D.[Dawid],
Wójcik, P.[Piotr],
Pach, M.[Mateusz],
Zielinski, B.[Bartosz],
TORE: Token Recycling in Vision Transformers for Efficient Active
Visual Exploration,
WACV25(8606-8616)
IEEE DOI
2505
Visualization, Autoencoders, Transformers, Recycling, Decoding
BibRef
Eliopoulos, N.J.[Nicholas John],
Jajal, P.[Purvish],
Davis, J.C.[James C.],
Liu, G.[Gaowen],
Thiravathukal, G.K.[George K.],
Lu, Y.H.[Yung-Hsiang],
Pruning One More Token is Enough: Leveraging Latency-Workload
Non-Linearities for Vision Transformers on the Edge,
WACV25(7153-7162)
IEEE DOI
2505
Degradation, Schedules, Accuracy, Image edge detection, Merging,
Neural networks, Transformers, Market research, vision transformer,
token sparsification
BibRef
Koner, R.[Rajat],
Jain, G.[Gagan],
Jain, P.[Prateek],
Tresp, V.[Volker],
Paul, S.[Sujoy],
LookupVIT: Compressing Visual Information to a Limited Number of Tokens,
ECCV24(LXXXVI: 322-337).
Springer DOI
2412
BibRef
Jie, S.[Shibo],
Tang, Y.H.[Ye-Hui],
Guo, J.Y.[Jian-Yuan],
Deng, Z.H.[Zhi-Hong],
Han, K.[Kai],
Wang, Y.H.[Yun-He],
Token Compensator: Altering Inference Cost of Vision Transformer
Without Re-tuning,
ECCV24(XVI: 76-94).
Springer DOI
2412
BibRef
Huang, W.X.[Wen-Xuan],
Shen, Y.H.[Yun-Hang],
Xie, J.[Jiao],
Zhang, B.C.[Bao-Chang],
He, G.Q.[Gao-Qi],
Li, K.[Ke],
Sun, X.[Xing],
Lin, S.H.[Shao-Hui],
A General and Efficient Training for Transformer via Token Expansion,
CVPR24(15783-15792)
IEEE DOI Code:
WWW Link.
2410
Training, Accuracy, Costs, Codes, Pipelines, Computer architecture
BibRef
Wu, J.[Junyi],
Duan, B.[Bin],
Kang, W.T.[Wei-Tai],
Tang, H.[Hao],
Yan, Y.[Yan],
Token Transformation Matters: Towards Faithful Post-Hoc Explanation
for Vision Transformer,
CVPR24(10926-10935)
IEEE DOI
2410
Visualization, Correlation, Computational modeling, Perturbation methods,
Predictive models, Length measurement, Explainability
BibRef
Yu, Q.[Qing],
Tanaka, M.[Mikihiro],
Fujiwara, K.[Kent],
Exploring Vision Transformers for 3D Human Motion-Language Models
with Motion Patches,
CVPR24(937-946)
IEEE DOI
2410
Training, Solid modeling, Computational modeling,
Transfer learning, Transformers, Motion-Language Models, Text-Motion Retrieval
BibRef
Yuan, X.[Xin],
Fei, H.L.[Hong-Liang],
Baek, J.[Jinoo],
Efficient Transformer Adaptation with Soft Token Merging,
LargeVM24(3658-3668)
IEEE DOI
2410
Training, Accuracy, Costs, Merging, Video sequences,
Optimization methods, Transformers
BibRef
Xu, X.[Xuwei],
Wang, S.[Sen],
Chen, Y.D.[Yu-Dong],
Zheng, Y.P.[Yan-Ping],
Wei, Z.W.[Zhe-Wei],
Liu, J.J.[Jia-Jun],
GTP-ViT: Efficient Vision Transformers via Graph-based Token
Propagation,
WACV24(86-95)
IEEE DOI Code:
WWW Link.
2404
Source coding, Computational modeling, Merging, Broadcasting,
Transformers, Computational complexity, Algorithms
BibRef
Ding, S.R.[Shuang-Rui],
Zhao, P.S.[Pei-Sen],
Zhang, X.P.[Xiao-Peng],
Qian, R.[Rui],
Xiong, H.K.[Hong-Kai],
Tian, Q.[Qi],
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation,
ICCV23(16899-16910)
IEEE DOI Code:
WWW Link.
2401
BibRef
Guo, Y.[Yong],
Stutz, D.[David],
Schiele, B.[Bernt],
Improving Robustness of Vision Transformers by Reducing Sensitivity
to Patch Corruptions,
CVPR23(4108-4118)
IEEE DOI
2309
BibRef
Xie, W.[Wei],
Zhao, Z.[Zimeng],
Li, S.Y.[Shi-Ying],
Zuo, B.H.[Bing-Hui],
Wang, Y.G.[Yan-Gang],
Nonrigid Object Contact Estimation With Regional Unwrapping
Transformer,
ICCV23(9308-9317)
IEEE DOI
2401
BibRef
Nalmpantis, A.[Angelos],
Panagiotopoulos, A.[Apostolos],
Gkountouras, J.[John],
Papakostas, K.[Konstantinos],
Aziz, W.[Wilker],
Vision DiffMask: Faithful Interpretation of Vision Transformers with
Differentiable Patch Masking,
XAI4CV23(3756-3763)
IEEE DOI
2309
BibRef
Beyer, L.[Lucas],
Izmailov, P.[Pavel],
Kolesnikov, A.[Alexander],
Caron, M.[Mathilde],
Kornblith, S.[Simon],
Zhai, X.H.[Xiao-Hua],
Minderer, M.[Matthias],
Tschannen, M.[Michael],
Alabdulmohsin, I.[Ibrahim],
Pavetic, F.[Filip],
FlexiViT: One Model for All Patch Sizes,
CVPR23(14496-14506)
IEEE DOI
2309
BibRef
Chang, S.N.[Shu-Ning],
Wang, P.[Pichao],
Lin, M.[Ming],
Wang, F.[Fan],
Zhang, D.J.H.[David Jun-Hao],
Jin, R.[Rong],
Shou, M.Z.[Mike Zheng],
Making Vision Transformers Efficient from A Token Sparsification View,
CVPR23(6195-6205)
IEEE DOI
2309
BibRef
Phan, L.[Lam],
Nguyen, H.T.H.[Hiep Thi Hong],
Warrier, H.[Harikrishna],
Gupta, Y.[Yogesh],
Patch Embedding as Local Features: Unifying Deep Local and Global
Features via Vision Transformer for Image Retrieval,
ACCV22(II:204-221).
Springer DOI
2307
BibRef
Liu, Y.[Yue],
Matsoukas, C.[Christos],
Strand, F.[Fredrik],
Azizpour, H.[Hossein],
Smith, K.[Kevin],
PatchDropout: Economizing Vision Transformers Using Patch Dropout,
WACV23(3942-3951)
IEEE DOI
2302
Training, Image resolution, Computational modeling,
Biological system modeling, Memory management, Transformers,
Biomedical/healthcare/medicine
BibRef
Havtorn, J.D.[Jakob Drachmann],
Royer, A.[Amélie],
Blankevoort, T.[Tijmen],
Bejnordi, B.E.[Babak Ehteshami],
MSViT: Dynamic Mixed-scale Tokenization for Vision Transformers,
NIVT23(838-848)
IEEE DOI
2401
BibRef
Haurum, J.B.[Joakim Bruslund],
Escalera, S.[Sergio],
Taylor, G.W.[Graham W.],
Moeslund, T.B.[Thomas B.],
Which Tokens to Use? Investigating Token Reduction in Vision
Transformers,
NIVT23(773-783)
IEEE DOI Code:
WWW Link.
2401
BibRef
Ren, S.[Sucheng],
Yang, X.Y.[Xing-Yi],
Liu, S.[Songhua],
Wang, X.C.[Xin-Chao],
SG-Former: Self-guided Transformer with Evolving Token Reallocation,
ICCV23(5980-5991)
IEEE DOI Code:
WWW Link.
2401
BibRef
Xiao, H.[Han],
Zheng, W.Z.[Wen-Zhao],
Zhu, Z.[Zheng],
Zhou, J.[Jie],
Lu, J.W.[Ji-Wen],
Token-Label Alignment for Vision Transformers,
ICCV23(5472-5481)
IEEE DOI Code:
WWW Link.
2401
BibRef
Popovic, N.[Nikola],
Paudel, D.P.[Danda Pani],
Probst, T.[Thomas],
Van Gool, L.J.[Luc J.],
Token-Consistent Dropout For Calibrated Vision Transformers,
ICIP23(1030-1034)
IEEE DOI
2312
BibRef
Wei, S.Y.[Si-Yuan],
Ye, T.Z.[Tian-Zhu],
Zhang, S.[Shen],
Tang, Y.[Yao],
Liang, J.J.[Jia-Jun],
Joint Token Pruning and Squeezing Towards More Aggressive Compression
of Vision Transformers,
CVPR23(2092-2101)
IEEE DOI
2309
BibRef
Zhang, J.P.[Jian-Ping],
Huang, Y.Z.[Yi-Zhan],
Wu, W.B.[Wei-Bin],
Lyu, M.R.[Michael R.],
Transferable Adversarial Attacks on Vision Transformers with Token
Gradient Regularization,
CVPR23(16415-16424)
IEEE DOI
2309
BibRef
Ronen, T.[Tomer],
Levy, O.[Omer],
Golbert, A.[Avram],
Vision Transformers with Mixed-Resolution Tokenization,
ECV23(4613-4622)
IEEE DOI
2309
BibRef
Lorenzana, M.B.[Marlon Bran],
Engstrom, C.[Craig],
Chandra, S.S.[Shekhar S.],
Transformer Compressed Sensing Via Global Image Tokens,
ICIP22(3011-3015)
IEEE DOI
2211
Training, Limiting, Image resolution, Neural networks,
Image representation, Transformers, MRI
BibRef
Fayyaz, M.[Mohsen],
Koohpayegani, S.A.[Soroush Abbasi],
Jafari, F.R.[Farnoush Rezaei],
Sengupta, S.[Sunando],
Joze, H.R.V.[Hamid Reza Vaezi],
Sommerlade, E.[Eric],
Pirsiavash, H.[Hamed],
Gall, J.[Jürgen],
Adaptive Token Sampling for Efficient Vision Transformers,
ECCV22(XI:396-414).
Springer DOI
2211
BibRef
Kong, Z.L.[Zheng-Lun],
Dong, P.Y.[Pei-Yan],
Ma, X.L.[Xiao-Long],
Meng, X.[Xin],
Niu, W.[Wei],
Sun, M.S.[Meng-Shu],
Shen, X.[Xuan],
Yuan, G.[Geng],
Ren, B.[Bin],
Tang, H.[Hao],
Qin, M.H.[Ming-Hai],
Wang, Y.Z.[Yan-Zhi],
SPViT:
Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning,
ECCV22(XI:620-640).
Springer DOI
2211
BibRef
Fang, J.[Jiemin],
Xie, L.X.[Ling-Xi],
Wang, X.G.[Xing-Gang],
Zhang, X.P.[Xiao-Peng],
Liu, W.Y.[Wen-Yu],
Tian, Q.[Qi],
MSG-Transformer:
Exchanging Local Spatial Information by Manipulating Messenger Tokens,
CVPR22(12053-12062)
IEEE DOI
2210
Deep learning, Visualization, Neural networks,
Graphics processing units, retrieval
BibRef
Yin, H.X.[Hong-Xu],
Vahdat, A.[Arash],
Alvarez, J.M.[Jose M.],
Mallya, A.[Arun],
Kautz, J.[Jan],
Molchanov, P.[Pavlo],
A-ViT: Adaptive Tokens for Efficient Vision Transformer,
CVPR22(10799-10808)
IEEE DOI
2210
Training, Adaptive systems, Network architecture, Transformers,
Throughput, Hardware, Complexity theory,
Efficient learning and inferences
BibRef
Gu, J.D.[Jin-Dong],
Tresp, V.[Volker],
Qin, Y.[Yao],
Are Vision Transformers Robust to Patch Perturbations?,
ECCV22(XII:404-421).
Springer DOI
2211
BibRef
Li, Z.K.[Zhi-Kai],
Ma, L.P.[Li-Ping],
Chen, M.J.[Meng-Juan],
Xiao, J.R.[Jun-Rui],
Gu, Q.Y.[Qing-Yi],
Patch Similarity Aware Data-Free Quantization for Vision Transformers,
ECCV22(XI:154-170).
Springer DOI
2211
BibRef
Yun, S.[Sukmin],
Lee, H.[Hankook],
Kim, J.[Jaehyung],
Shin, J.[Jinwoo],
Patch-level Representation Learning for Self-supervised Vision
Transformers,
CVPR22(8344-8353)
IEEE DOI
2210
Training, Representation learning, Visualization, Neural networks,
Object detection, Self-supervised learning, Transformers,
Self- semi- meta- unsupervised learning
BibRef
Salman, H.[Hadi],
Jain, S.[Saachi],
Wong, E.[Eric],
Madry, A.[Aleksander],
Certified Patch Robustness via Smoothed Vision Transformers,
CVPR22(15116-15126)
IEEE DOI
2210
Visualization, Smoothing methods, Costs, Computational modeling,
Transformers, Adversarial attack and defense
BibRef
Tang, Y.H.[Ye-Hui],
Han, K.[Kai],
Wang, Y.H.[Yun-He],
Xu, C.[Chang],
Guo, J.Y.[Jian-Yuan],
Xu, C.[Chao],
Tao, D.C.[Da-Cheng],
Patch Slimming for Efficient Vision Transformers,
CVPR22(12155-12164)
IEEE DOI
2210
Visualization, Quantization (signal), Computational modeling,
Aggregates, Benchmark testing,
Representation learning
BibRef
Chen, Z.Y.[Zhao-Yu],
Li, B.[Bo],
Wu, S.[Shuang],
Xu, J.H.[Jiang-He],
Ding, S.H.[Shou-Hong],
Zhang, W.Q.[Wen-Qiang],
Shape Matters: Deformable Patch Attack,
ECCV22(IV:529-548).
Springer DOI
2211
BibRef
Chen, Z.Y.[Zhao-Yu],
Li, B.[Bo],
Xu, J.H.[Jiang-He],
Wu, S.[Shuang],
Ding, S.H.[Shou-Hong],
Zhang, W.Q.[Wen-Qiang],
Towards Practical Certifiable Patch Defense with Vision Transformer,
CVPR22(15127-15137)
IEEE DOI
2210
Smoothing methods, Toy manufacturing industry, Semantics,
Network architecture, Transformers, Robustness,
Adversarial attack and defense
BibRef
Yuan, L.[Li],
Chen, Y.P.[Yun-Peng],
Wang, T.[Tao],
Yu, W.H.[Wei-Hao],
Shi, Y.J.[Yu-Jun],
Jiang, Z.H.[Zi-Hang],
Tay, F.E.H.[Francis E. H.],
Feng, J.S.[Jia-Shi],
Yan, S.C.[Shui-Cheng],
Tokens-to-Token ViT:
Training Vision Transformers from Scratch on ImageNet,
ICCV21(538-547)
IEEE DOI
2203
Training, Image resolution, Computational modeling,
Image edge detection, Transformers
BibRef
Chapter on Pattern Recognition, Clustering, Statistics, Grammars, Learning, Neural Nets, Genetic Algorithms continues in
Attention in Vision Transformers.