Bazi, Y.[Yakoub],
Bashmal, L.[Laila],
Al Rahhal, M.M.[Mohamad M.],
Al Dayil, R.[Reham],
Al Ajlan, N.[Naif],
Vision Transformers for Remote Sensing Image Classification,
RS(13), No. 3, 2021, pp. xx-yy.
DOI Link
2102
BibRef
Li, T.[Tao],
Zhang, Z.[Zheng],
Pei, L.[Lishen],
Gan, Y.[Yan],
HashFormer: Vision Transformer Based Deep Hashing for Image Retrieval,
SPLetters(29), 2022, pp. 827-831.
IEEE DOI
2204
Transformers, Binary codes, Task analysis, Training, Image retrieval,
Feature extraction, Databases, Binary embedding, image retrieval
BibRef
Jiang, B.[Bo],
Zhao, K.K.[Kang-Kang],
Tang, J.[Jin],
RGTransformer: Region-Graph Transformer for Image Representation and
Few-Shot Classification,
SPLetters(29), 2022, pp. 792-796.
IEEE DOI
2204
Measurement, Transformers, Image representation,
Feature extraction, Visualization, transformer
BibRef
Chen, Z.M.[Zhao-Min],
Cui, Q.[Quan],
Zhao, B.[Borui],
Song, R.J.[Ren-Jie],
Zhang, X.Q.[Xiao-Qin],
Yoshie, O.[Osamu],
SST: Spatial and Semantic Transformers for Multi-Label Image
Recognition,
IP(31), 2022, pp. 2570-2583.
IEEE DOI
2204
Correlation, Semantics, Transformers, Image recognition,
Task analysis, Training, Feature extraction, label correlation
BibRef
Wang, G.H.[Guang-Hui],
Li, B.[Bin],
Zhang, T.[Tao],
Zhang, S.[Shubi],
A Network Combining a Transformer and a Convolutional Neural Network
for Remote Sensing Image Change Detection,
RS(14), No. 9, 2022, pp. xx-yy.
DOI Link
2205
BibRef
Luo, G.[Gen],
Zhou, Y.[Yiyi],
Sun, X.S.[Xiao-Shuai],
Wang, Y.[Yan],
Cao, L.J.[Liu-Juan],
Wu, Y.J.[Yong-Jian],
Huang, F.Y.[Fei-Yue],
Ji, R.R.[Rong-Rong],
Towards Lightweight Transformer Via Group-Wise Transformation for
Vision-and-Language Tasks,
IP(31), 2022, pp. 3386-3398.
IEEE DOI
2205
Transformers, Task analysis, Computational modeling,
Benchmark testing, Visualization, Convolution, Head,
reference expression comprehension
BibRef
Wang, J.Y.[Jia-Yun],
Chakraborty, R.[Rudrasis],
Yu, S.X.[Stella X.],
Transformer for 3D Point Clouds,
PAMI(44), No. 8, August 2022, pp. 4419-4431.
IEEE DOI
2207
Convolution, Feature extraction, Shape, Semantics, Task analysis,
Measurement, point cloud, transformation, deformable, segmentation, 3D detection
BibRef
Li, Z.K.[Ze-Kun],
Liu, Y.F.[Yu-Fan],
Li, B.[Bing],
Feng, B.L.[Bai-Lan],
Wu, K.[Kebin],
Peng, C.W.[Cheng-Wei],
Hu, W.M.[Wei-Ming],
SDTP: Semantic-Aware Decoupled Transformer Pyramid for Dense Image
Prediction,
CirSysVideo(32), No. 9, September 2022, pp. 6160-6173.
IEEE DOI
2209
Transformers, Semantics, Task analysis, Detectors,
Image segmentation, Head, Convolution, Transformer, dense prediction,
multi-level interaction
BibRef
Wu, J.J.[Jia-Jing],
Wei, Z.Q.[Zhi-Qiang],
Zhang, J.P.[Jin-Peng],
Zhang, Y.S.[Yu-Shi],
Jia, D.N.[Dong-Ning],
Yin, B.[Bo],
Yu, Y.C.[Yun-Chao],
Full-Coupled Convolutional Transformer for Surface-Based Duct
Refractivity Inversion,
RS(14), No. 17, 2022, pp. xx-yy.
DOI Link
2209
BibRef
Jiang, K.[Kai],
Peng, P.[Peng],
Lian, Y.[Youzao],
Xu, W.S.[Wei-Sheng],
The encoding method of position embeddings in vision transformer,
JVCIR(89), 2022, pp. 103664.
Elsevier DOI
2212
Vision transformer, Position embeddings, Gabor filters
BibRef
Han, K.[Kai],
Wang, Y.H.[Yun-He],
Chen, H.T.[Han-Ting],
Chen, X.H.[Xing-Hao],
Guo, J.Y.[Jian-Yuan],
Liu, Z.H.[Zhen-Hua],
Tang, Y.[Yehui],
Xiao, A.[An],
Xu, C.J.[Chun-Jing],
Xu, Y.X.[Yi-Xing],
Yang, Z.H.[Zhao-Hui],
Zhang, Y.[Yiman],
Tao, D.C.[Da-Cheng],
A Survey on Vision Transformer,
PAMI(45), No. 1, January 2023, pp. 87-110.
IEEE DOI
2212
Survey, Vision Transformer. Transformers, Task analysis, Encoding, Computational modeling,
Visualization, Object detection, high-level vision,
video
BibRef
Hou, Q.[Qibin],
Jiang, Z.H.[Zi-Hang],
Yuan, L.[Li],
Cheng, M.M.[Ming-Ming],
Yan, S.C.[Shui-Cheng],
Feng, J.S.[Jia-Shi],
Vision Permutator:
A Permutable MLP-Like Architecture for Visual Recognition,
PAMI(45), No. 1, January 2023, pp. 1328-1334.
IEEE DOI
2212
Transformers, Encoding, Visualization, Convolutional codes, Mixers,
Computer architecture, Training data, Vision permutator, deep neural network
BibRef
Yu, W.H.[Wei-Hao],
Si, C.Y.[Chen-Yang],
Zhou, P.[Pan],
Luo, M.[Mi],
Zhou, Y.C.[Yi-Chen],
Feng, J.S.[Jia-Shi],
Yan, S.C.[Shui-Cheng],
Wang, X.C.[Xin-Chao],
MetaFormer Baselines for Vision,
PAMI(46), No. 2, February 2024, pp. 896-912.
IEEE DOI
2401
BibRef
And: A1, A4, A3, A2, A5, A8, A6, A7:
MetaFormer is Actually What You Need for Vision,
CVPR22(10809-10819)
IEEE DOI
2210
The abstracted architecture of Transformer.
Computational modeling, Focusing,
Transformers, Task analysis, retrieval
BibRef
Zhou, D.[Daquan],
Hou, Q.[Qibin],
Yang, L.J.[Lin-Jie],
Jin, X.J.[Xiao-Jie],
Feng, J.S.[Jia-Shi],
Token Selection is a Simple Booster for Vision Transformers,
PAMI(45), No. 11, November 2023, pp. 12738-12746.
IEEE DOI
2310
BibRef
Yuan, L.[Li],
Hou, Q.[Qibin],
Jiang, Z.H.[Zi-Hang],
Feng, J.S.[Jia-Shi],
Yan, S.C.[Shui-Cheng],
VOLO: Vision Outlooker for Visual Recognition,
PAMI(45), No. 5, May 2023, pp. 6575-6586.
IEEE DOI
2304
Transformers, Computer architecture, Computational modeling,
Training, Data models, Task analysis, Visualization,
image classification
BibRef
Ren, S.[Sucheng],
Zhou, D.[Daquan],
He, S.F.[Sheng-Feng],
Feng, J.S.[Jia-Shi],
Wang, X.C.[Xin-Chao],
Shunted Self-Attention via Multi-Scale Token Aggregation,
CVPR22(10843-10852)
IEEE DOI
2210
Degradation, Deep learning, Costs, Computational modeling, Merging,
Efficient learning and inferences
BibRef
Wu, Y.H.[Yu-Huan],
Liu, Y.[Yun],
Zhan, X.[Xin],
Cheng, M.M.[Ming-Ming],
P2T: Pyramid Pooling Transformer for Scene Understanding,
PAMI(45), No. 11, November 2023, pp. 12760-12771.
IEEE DOI
2310
BibRef
Li, Y.[Yehao],
Yao, T.[Ting],
Pan, Y.W.[Ying-Wei],
Mei, T.[Tao],
Contextual Transformer Networks for Visual Recognition,
PAMI(45), No. 2, February 2023, pp. 1489-1500.
IEEE DOI
2301
Transformers, Convolution, Visualization, Task analysis,
Image recognition, Object detection, Transformer, image recognition
BibRef
Wang, H.[Hang],
Du, Y.[Youtian],
Zhang, Y.[Yabin],
Li, S.[Shuai],
Zhang, L.[Lei],
One-Stage Visual Relationship Referring With Transformers and
Adaptive Message Passing,
IP(32), 2023, pp. 190-202.
IEEE DOI
2301
Visualization, Proposals, Transformers, Task analysis, Detectors,
Message passing, Predictive models, gated message passing
BibRef
Kiya, H.[Hitoshi],
Iijima, R.[Ryota],
Maungmaung, A.[Aprilpyone],
Kinoshita, Y.[Yuma],
Image and Model Transformation with Secret Key for Vision Transformer,
IEICE(E106-D), No. 1, January 2023, pp. 2-11.
WWW Link.
2301
BibRef
Zhang, H.F.[Hao-Fei],
Mao, F.[Feng],
Xue, M.Q.[Meng-Qi],
Fang, G.F.[Gong-Fan],
Feng, Z.L.[Zun-Lei],
Song, J.[Jie],
Song, M.L.[Ming-Li],
Knowledge Amalgamation for Object Detection With Transformers,
IP(32), 2023, pp. 2093-2106.
IEEE DOI
2304
Transformers, Task analysis, Object detection, Detectors, Training,
Feature extraction, Model reusing, vision transformers
BibRef
Li, Y.[Ying],
Chen, K.[Kehan],
Sun, S.L.[Shi-Lei],
He, C.[Chu],
Multi-scale homography estimation based on dual feature aggregation
transformer,
IET-IPR(17), No. 5, 2023, pp. 1403-1416.
DOI Link
2304
image matching, image registration
BibRef
Wang, G.Q.[Guan-Qun],
Chen, H.[He],
Chen, L.[Liang],
Zhuang, Y.[Yin],
Zhang, S.H.[Shang-Hang],
Zhang, T.[Tong],
Dong, H.[Hao],
Gao, P.[Peng],
P2FEViT: Plug-and-Play CNN Feature Embedded Hybrid Vision Transformer
for Remote Sensing Image Classification,
RS(15), No. 7, 2023, pp. 1773.
DOI Link
2304
BibRef
Zhang, Q.M.[Qi-Ming],
Xu, Y.F.[Yu-Fei],
Zhang, J.[Jing],
Tao, D.C.[Da-Cheng],
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for
Image Recognition and Beyond,
IJCV(131), No. 5, May 2023, pp. 1141-1162.
Springer DOI
2305
BibRef
Zhang, J.N.[Jiang-Ning],
Li, X.T.[Xiang-Tai],
Wang, Y.B.[Ya-Biao],
Wang, C.J.[Cheng-Jie],
Yang, Y.B.[Yi-Bo],
Liu, Y.[Yong],
Tao, D.C.[Da-Cheng],
EATFormer: Improving Vision Transformer Inspired by Evolutionary
Algorithm,
IJCV(132), No. 9, September 2024, pp. 3509-3536.
Springer DOI
2409
BibRef
Fan, X.Y.[Xin-Yi],
Liu, H.J.[Hua-Jun],
FlexFormer: Flexible Transformer for efficient visual recognition,
PRL(169), 2023, pp. 95-101.
Elsevier DOI
2305
Vision transformer, Frequency analysis, Image classification
BibRef
Cho, S.[Seokju],
Hong, S.[Sunghwan],
Kim, S.[Seungryong],
CATs++: Boosting Cost Aggregation With Convolutions and Transformers,
PAMI(45), No. 6, June 2023, pp. 7174-7194.
IEEE DOI
WWW Link.
2305
Costs, Transformers, Correlation, Semantics, Feature extraction,
Task analysis, Cost aggregation, efficient transformer,
semantic visual correspondence
BibRef
Wang, Z.W.[Zi-Wei],
Wang, C.Y.[Chang-Yuan],
Xu, X.W.[Xiu-Wei],
Zhou, J.[Jie],
Lu, J.W.[Ji-Wen],
Quantformer: Learning Extremely Low-Precision Vision Transformers,
PAMI(45), No. 7, July 2023, pp. 8813-8826.
IEEE DOI
2306
Quantization (signal), Transformers, Computational modeling, Search problems,
Object detection, Image color analysis, vision transformers
BibRef
Yue, X.Y.[Xiao-Yu],
Sun, S.Y.[Shu-Yang],
Kuang, Z.H.[Zhang-Hui],
Wei, M.[Meng],
Torr, P.H.S.[Philip H.S.],
Zhang, W.[Wayne],
Lin, D.[Dahua],
Vision Transformer with Progressive Sampling,
ICCV21(377-386)
IEEE DOI
2203
Codes, Computational modeling, Interference,
Transformers, Feature extraction, Recognition and classification,
Representation learning
BibRef
Peng, Z.L.[Zhi-Liang],
Guo, Z.H.[Zong-Hao],
Huang, W.[Wei],
Wang, Y.W.[Yao-Wei],
Xie, L.X.[Ling-Xi],
Jiao, J.B.[Jian-Bin],
Tian, Q.[Qi],
Ye, Q.X.[Qi-Xiang],
Conformer: Local Features Coupling Global Representations for
Recognition and Detection,
PAMI(45), No. 8, August 2023, pp. 9454-9468.
IEEE DOI
2307
Transformers, Feature extraction, Couplings, Visualization,
Detectors, Convolution, Object detection, Feature fusion,
vision transformer
BibRef
Peng, Z.L.[Zhi-Liang],
Huang, W.[Wei],
Gu, S.Z.[Shan-Zhi],
Xie, L.X.[Ling-Xi],
Wang, Y.[Yaowei],
Jiao, J.B.[Jian-Bin],
Ye, Q.X.[Qi-Xiang],
Conformer: Local Features Coupling Global Representations for Visual
Recognition,
ICCV21(357-366)
IEEE DOI
2203
Couplings, Representation learning, Visualization, Fuses,
Convolution, Object detection, Transformers,
Representation learning
BibRef
Feng, Z.Z.[Zhan-Zhou],
Zhang, S.L.[Shi-Liang],
Efficient Vision Transformer via Token Merger,
IP(32), 2023, pp. 4156-4169.
IEEE DOI
2307
Corporate acquisitions, Transformers, Semantics, Task analysis,
Visualization, Merging, Computational efficiency, sparse representation
BibRef
Huang, X.Y.[Xin-Yan],
Liu, F.[Fang],
Cui, Y.H.[Yuan-Hao],
Chen, P.[Puhua],
Li, L.L.[Ling-Ling],
Li, P.F.[Peng-Fang],
Faster and Better: A Lightweight Transformer Network for Remote
Sensing Scene Classification,
RS(15), No. 14, 2023, pp. 3645.
DOI Link
2307
BibRef
Zhao, J.X.[Jia-Xuan],
Jiao, L.C.[Li-Cheng],
Wang, C.[Chao],
Liu, X.[Xu],
Liu, F.[Fang],
Li, L.L.[Ling-Ling],
Ma, M.[Mengru],
Yang, S.Y.[Shu-Yuan],
Knowledge Guided Evolutionary Transformer for Remote Sensing Scene
Classification,
CirSysVideo(34), No. 10, October 2024, pp. 10368-10384.
IEEE DOI
2411
Transformers, Convolutional neural networks,
Computer architecture, Scene classification, Feature extraction,
graph neural networks
BibRef
Zhang, D.[Dan],
Ma, W.P.[Wen-Ping],
Jiao, L.C.[Li-Cheng],
Liu, X.[Xu],
Yang, Y.T.[Yu-Ting],
Liu, F.[Fang],
Multiple Hierarchical Cross-Scale Transformer for Remote Sensing
Scene Classification,
RS(17), No. 1, 2025, pp. 42.
DOI Link
2501
BibRef
Yao, T.[Ting],
Li, Y.[Yehao],
Pan, Y.W.[Ying-Wei],
Wang, Y.[Yu],
Zhang, X.P.[Xiao-Ping],
Mei, T.[Tao],
Dual Vision Transformer,
PAMI(45), No. 9, September 2023, pp. 10870-10882.
IEEE DOI
2309
Survey, Vision Transformer.
BibRef
Rao, Y.M.[Yong-Ming],
Liu, Z.[Zuyan],
Zhao, W.L.[Wen-Liang],
Zhou, J.[Jie],
Lu, J.W.[Ji-Wen],
Dynamic Spatial Sparsification for Efficient Vision Transformers and
Convolutional Neural Networks,
PAMI(45), No. 9, September 2023, pp. 10883-10897.
IEEE DOI
2309
BibRef
Li, J.[Jie],
Liu, Z.[Zhao],
Li, L.[Li],
Lin, J.Q.[Jun-Qin],
Yao, J.[Jian],
Tu, J.[Jingmin],
Multi-view convolutional vision transformer for 3D object recognition,
JVCIR(95), 2023, pp. 103906.
Elsevier DOI
2309
Multi-view, 3D object recognition, Feature fusion, Convolutional neural networks
BibRef
Shang, J.H.[Jing-Huan],
Li, X.[Xiang],
Kahatapitiya, K.[Kumara],
Lee, Y.C.[Yu-Cheol],
Ryoo, M.S.[Michael S.],
StARformer: Transformer With State-Action-Reward Representations for
Robot Learning,
PAMI(45), No. 11, November 2023, pp. 12862-12877.
IEEE DOI
2310
BibRef
Earlier: A1, A3, A2, A5, Only:
StARformer: Transformer with State-Action-Reward Representations for
Visual Reinforcement Learning,
ECCV22(XXIX:462-479).
Springer DOI
2211
BibRef
Duan, H.R.[Hao-Ran],
Long, Y.[Yang],
Wang, S.D.[Shi-Dong],
Zhang, H.F.[Hao-Feng],
Willcocks, C.G.[Chris G.],
Shao, L.[Ling],
Dynamic Unary Convolution in Transformers,
PAMI(45), No. 11, November 2023, pp. 12747-12759.
IEEE DOI
2310
BibRef
Qian, S.J.[Sheng-Ju],
Zhu, Y.[Yi],
Li, W.B.[Wen-Bo],
Li, M.[Mu],
Jia, J.Y.[Jia-Ya],
What Makes for Good Tokenizers in Vision Transformer?,
PAMI(45), No. 11, November 2023, pp. 13011-13023.
IEEE DOI
2310
BibRef
Sun, W.X.[Wei-Xuan],
Qin, Z.[Zhen],
Deng, H.[Hui],
Wang, J.[Jianyuan],
Zhang, Y.[Yi],
Zhang, K.[Kaihao],
Barnes, N.[Nick],
Birchfield, S.[Stan],
Kong, L.P.[Ling-Peng],
Zhong, Y.[Yiran],
Vicinity Vision Transformer,
PAMI(45), No. 10, October 2023, pp. 12635-12649.
IEEE DOI
2310
BibRef
Cao, C.J.[Chen-Jie],
Dong, Q.L.[Qiao-Le],
Fu, Y.W.[Yan-Wei],
ZITS++: Image Inpainting by Improving the Incremental Transformer on
Structural Priors,
PAMI(45), No. 10, October 2023, pp. 12667-12684.
IEEE DOI
2310
BibRef
Fang, Y.X.[Yu-Xin],
Wang, X.G.[Xing-Gang],
Wu, R.[Rui],
Liu, W.Y.[Wen-Yu],
What Makes for Hierarchical Vision Transformer?,
PAMI(45), No. 10, October 2023, pp. 12714-12720.
IEEE DOI
2310
BibRef
Liu, J.[Jun],
Guo, H.R.[Hao-Ran],
He, Y.[Yile],
Li, H.L.[Hua-Li],
Vision Transformer-Based Ensemble Learning for Hyperspectral Image
Classification,
RS(15), No. 21, 2023, pp. 5208.
DOI Link
2311
BibRef
Lin, M.B.[Ming-Bao],
Chen, M.Z.[Meng-Zhao],
Zhang, Y.X.[Yu-Xin],
Shen, C.H.[Chun-Hua],
Ji, R.R.[Rong-Rong],
Cao, L.J.[Liu-Juan],
Super Vision Transformer,
IJCV(131), No. 12, December 2023, pp. 3136-3151.
Springer DOI
2311
BibRef
Li, Z.Y.[Zhong-Yu],
Gao, S.H.[Shang-Hua],
Cheng, M.M.[Ming-Ming],
SERE: Exploring Feature Self-Relation for Self-Supervised Transformer,
PAMI(45), No. 12, December 2023, pp. 15619-15631.
IEEE DOI
2311
BibRef
Yuan, Y.H.[Yu-Hui],
Liang, W.C.[Wei-Cong],
Ding, H.H.[Heng-Hui],
Liang, Z.H.[Zhan-Hao],
Zhang, C.[Chao],
Hu, H.[Han],
Expediting Large-Scale Vision Transformer for Dense Prediction
Without Fine-Tuning,
PAMI(46), No. 1, January 2024, pp. 250-266.
IEEE DOI
2312
BibRef
Jiao, J.[Jiayu],
Tang, Y.M.[Yu-Ming],
Lin, K.Y.[Kun-Yu],
Gao, Y.P.[Yi-Peng],
Ma, A.J.[Andy J.],
Wang, Y.W.[Yao-Wei],
Zheng, W.S.[Wei-Shi],
DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition,
MultMed(25), 2023, pp. 8906-8919.
IEEE DOI Code:
HTML Version.
2312
BibRef
Fu, K.[Kexue],
Yuan, M.Z.[Ming-Zhi],
Liu, S.L.[Shao-Lei],
Wang, M.[Manning],
Boosting Point-BERT by Multi-Choice Tokens,
CirSysVideo(34), No. 1, January 2024, pp. 438-447.
IEEE DOI
2401
self-supervised pre-training task.
See also Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling.
BibRef
Ghosal, S.S.[Soumya Suvra],
Li, Y.X.[Yi-Xuan],
Are Vision Transformers Robust to Spurious Correlations?,
IJCV(132), No. 3, March 2024, pp. 689-709.
Springer DOI
2402
BibRef
Yan, F.Y.[Fang-Yuan],
Yan, B.[Bin],
Liang, W.[Wei],
Pei, M.T.[Ming-Tao],
Token labeling-guided multi-scale medical image classification,
PRL(178), 2024, pp. 28-34.
Elsevier DOI
2402
Medical image classification, Vision transformer, Token labeling
BibRef
Li, Y.X.[Yue-Xiang],
Huang, Y.W.[Ya-Wen],
He, N.[Nanjun],
Ma, K.[Kai],
Zheng, Y.F.[Ye-Feng],
Improving vision transformer for medical image classification via
token-wise perturbation,
JVCIR(98), 2024, pp. 104022.
Elsevier DOI
2402
Self-supervised learning, Vision transformer, Image classification
BibRef
Nguyen, H.[Hung],
Kim, C.[Chanho],
Li, F.[Fuxin],
Space-time recurrent memory network,
CVIU(241), 2024, pp. 103943.
Elsevier DOI
2403
Deep learning architectures and techniques, Segmentation,
Memory network, Transformer
BibRef
Kheldouni, A.[Amine],
Boumhidi, J.[Jaouad],
A Study of Bidirectional Encoder Representations from Transformers
for Sequential Recommendations,
ISCV22(1-5)
IEEE DOI
2208
Knowledge engineering, Recurrent neural networks,
Predictive models, Markov processes
BibRef
Xiao, Q.[Qiao],
Zhang, Y.[Yu],
Yang, Q.[Qiang],
Selective Random Walk for Transfer Learning in Heterogeneous Label
Spaces,
PAMI(46), No. 6, June 2024, pp. 4476-4488.
IEEE DOI
2405
Transfer learning, Bridges, Metalearning, Adaptation models,
Training, Task analysis, Transfer learning, selective random walk
BibRef
Akkaya, I.B.[Ibrahim Batuhan],
Kathiresan, S.S.[Senthilkumar S.],
Arani, E.[Elahe],
Zonooz, B.[Bahram],
Enhancing performance of vision transformers on small datasets
through local inductive bias incorporation,
PR(153), 2024, pp. 110510.
Elsevier DOI Code:
WWW Link.
2405
Vision transformer, Inductive bias, Locality, Small dataset
BibRef
Yao, T.[Ting],
Li, Y.[Yehao],
Pan, Y.W.[Ying-Wei],
Mei, T.[Tao],
HIRI-ViT: Scaling Vision Transformer With High Resolution Inputs,
PAMI(46), No. 9, September 2024, pp. 6431-6442.
IEEE DOI
2408
Transformers, Convolution, Convolutional neural networks,
Computational efficiency, Spatial resolution, Visualization, vision transformer
BibRef
Xu, G.Y.[Guang-Yi],
Ye, J.Y.[Jun-Yong],
Liu, X.Y.[Xin-Yuan],
Wen, X.B.[Xu-Bin],
Li, Y.[Youwei],
Wang, J.J.[Jing-Jing],
LV-Adapter: Adapting Vision Transformers for Visual Classification
with Linear-layers and Vectors,
CVIU(246), 2024, pp. 104049.
Elsevier DOI
2408
Deep learning, Vision Transformers, Fine-tuning, Plug and play,
Transfer learning
BibRef
Yan, L.Q.[Long-Quan],
Yan, R.X.[Rui-Xiang],
Chai, B.[Bosong],
Geng, G.H.[Guo-Hua],
Zhou, P.[Pengbo],
Gao, J.[Jian],
DM-GAN: CNN hybrid ViTs for training GANs under limited data,
PR(156), 2024, pp. 110810.
Elsevier DOI
2408
GAN, Few-shot, Vision transformer, Proprietary artifact image
BibRef
Feng, Q.H.[Qi-Hua],
Li, P.Y.[Pei-Ya],
Lu, Z.X.[Zhi-Xun],
Li, C.Z.[Chao-Zhuo],
Wang, Z.[Zefan],
Liu, Z.Q.[Zhi-Quan],
Duan, C.H.[Chun-Hui],
Huang, F.[Feiran],
Weng, J.[Jian],
Yu, P.S.[Philip S.],
EViT: Privacy-Preserving Image Retrieval via Encrypted Vision
Transformer in Cloud Computing,
CirSysVideo(34), No. 8, August 2024, pp. 7467-7483.
IEEE DOI Code:
WWW Link.
2408
Feature extraction, Encryption, Codes, Cloud computing, Transform coding,
Streaming media, Ciphers, Image retrieval, self-supervised learning
BibRef
Wang, H.Y.[Hong-Yu],
Ma, S.M.[Shu-Ming],
Dong, L.[Li],
Huang, S.[Shaohan],
Zhang, D.D.[Dong-Dong],
Wei, F.[Furu],
DeepNet: Scaling Transformers to 1,000 Layers,
PAMI(46), No. 10, October 2024, pp. 6761-6774.
IEEE DOI
2409
Transformers, Training, Optimization, Stability analysis,
Machine translation, Decoding, Computational modeling, Big models,
transformers
BibRef
Papa, L.[Lorenzo],
Russo, P.[Paolo],
Amerini, I.[Irene],
Zhou, L.P.[Lu-Ping],
A Survey on Efficient Vision Transformers: Algorithms, Techniques,
and Performance Benchmarking,
PAMI(46), No. 12, December 2024, pp. 7682-7700.
IEEE DOI
2411
Survey, Vision Transformers. Transformers, Task analysis, Computational modeling, Surveys,
Feature extraction, Costs, vision transformer
BibRef
Hu, S.C.[Sheng-Chao],
Shen, L.[Li],
Zhang, Y.[Ya],
Chen, Y.X.[Yi-Xin],
Tao, D.C.[Da-Cheng],
On Transforming Reinforcement Learning With Transformers:
The Development Trajectory,
PAMI(46), No. 12, December 2024, pp. 8580-8599.
IEEE DOI
2411
Transformers, Analytical models,
Task analysis, Surveys, Trajectory optimization, Literature survey
BibRef
Xu, R.S.[Run-Sheng],
Chen, C.J.[Chia-Ju],
Tu, Z.Z.[Zheng-Zhong],
Yang, M.H.[Ming-Hsuan],
V2X-ViTv2: Improved Vision Transformers for Vehicle-to-Everything
Cooperative Perception,
PAMI(47), No. 1, January 2025, pp. 650-662.
IEEE DOI
2412
Vehicle-to-everything, Feature extraction, Transformers,
Visualization, Metadata, Location awareness, Laser radar, Robustness,
vehicle-to-everything (V2X)
BibRef
Xu, R.S.[Run-Sheng],
Xiang, H.[Hao],
Tu, Z.Z.[Zheng-Zhong],
Xia, X.[Xin],
Yang, M.H.[Ming-Hsuan],
Ma, J.Q.[Jia-Qi],
V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision
Transformer,
ECCV22(XXIX:107-124).
Springer DOI
2211
BibRef
Xiang, H.[Hao],
Zheng, Z.L.[Zhao-Liang],
Xia, X.[Xin],
Xu, R.S.[Run-Sheng],
Gao, L.[Letian],
Zhou, Z.W.[Ze-Wei],
Han, X.[Xu],
Ji, X.[Xinkai],
Li, M.X.[Ming-Xi],
Meng, Z.L.[Zong-Lin],
Jin, L.[Li],
Lei, M.Y.[Ming-Yue],
Ma, Z.Y.[Zhao-Yang],
He, Z.H.[Zi-Hang],
Ma, H.X.[Hao-Xuan],
Yuan, Y.S.[Yun-Shuang],
Zhao, Y.Q.[Ying-Qian],
Ma, J.Q.[Jia-Qi],
V2X-Real: A Large-scale Dataset for Vehicle-to-everything Cooperative
Perception,
ECCV24(LII: 455-470).
Springer DOI
2412
BibRef
Xiang, H.[Hao],
Xu, R.S.[Run-Sheng],
Ma, J.Q.[Jia-Qi],
HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative Perception with
Vision Transformer,
ICCV23(284-295)
IEEE DOI Code:
WWW Link.
2401
BibRef
Ma, X.[Xiao],
Zhang, Z.[Zetian],
Yu, R.[Rong],
Ji, Z.[Zexuan],
Li, M.C.[Ming-Chao],
Zhang, Y.H.[Yu-Han],
Chen, Q.[Qiang],
SAVE: Encoding spatial interactions for vision transformers,
IVC(152), 2024, pp. 105312.
Elsevier DOI Code:
WWW Link.
2412
Vision transformers, Position encoding, Spatial interactions
BibRef
Wang, H.Q.[Hao-Qi],
Zhang, T.[Tong],
Salzmann, M.[Mathieu],
SINDER: Repairing the Singular Defects of DINOv2,
ECCV24(VII: 20-35).
Springer DOI
2412
Code:
WWW Link.
BibRef
Suri, S.[Saksham],
Walmer, M.[Matthew],
Gupta, K.[Kamal],
Shrivastava, A.[Abhinav],
LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT
Descriptors,
ECCV24(VII: 110-128).
Springer DOI
2412
BibRef
Pan, Z.Z.[Zi-Zheng],
Liu, J.[Jing],
He, H.Y.[Hao-Yu],
Cai, J.F.[Jian-Fei],
Zhuang, B.[Bohan],
Stitched ViTs are Flexible Vision Backbones,
ECCV24(XLI: 258-274).
Springer DOI
2412
BibRef
Kim, D.H.[Dong-Hyun],
Heo, B.[Byeongho],
Han, D.Y.[Dong-Yoon],
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs,
ECCV24(III: 395-415).
Springer DOI
2412
BibRef
Zhang, C.[Chi],
Cheng, J.[Jingpu],
Li, Q.X.[Qian-Xiao],
An Optimal Control View of LoRA and Binary Controller Design for Vision
Transformers,
ECCV24(LIII: 144-160).
Springer DOI
2412
BibRef
Koner, R.[Rajat],
Jain, G.[Gagan],
Jain, P.[Prateek],
Tresp, V.[Volker],
Paul, S.[Sujoy],
LookupViT: Compressing Visual Information to a Limited Number of Tokens,
ECCV24(LXXXVI: 322-337).
Springer DOI
2412
BibRef
Zhang, T.[Taolin],
Bai, J.[Jiawang],
Lu, Z.[Zhihe],
Lian, D.Z.[Dong-Ze],
Wang, G.[Genping],
Wang, X.C.[Xin-Chao],
Xia, S.T.[Shu-Tao],
Parameter-efficient and Memory-efficient Tuning for Vision Transformer:
A Disentangled Approach,
ECCV24(XLV: 346-363).
Springer DOI
2412
BibRef
Wang, H.Y.[Hai-Yang],
Tang, H.[Hao],
Jiang, L.[Li],
Shi, S.S.[Shao-Shuai],
Naeem, M.F.[Muhammad Ferjad],
Li, H.S.[Hong-Sheng],
Schiele, B.[Bernt],
Wang, L.W.[Li-Wei],
GiT: Towards Generalist Vision Transformer Through Universal Language
Interface,
ECCV24(XXIX: 55-73).
Springer DOI
2412
BibRef
Wu, Z.G.Y.[Zhu-Guan-Yu],
Chen, J.X.[Jia-Xin],
Zhong, H.[Hanwen],
Huang, D.[Di],
Wang, Y.H.[Yun-Hong],
AdaLog: Post-training Quantization for Vision Transformers with
Adaptive Logarithm Quantizer,
ECCV24(XXVII: 411-427).
Springer DOI
2412
BibRef
Jie, S.[Shibo],
Tang, Y.[Yehui],
Guo, J.[Jianyuan],
Deng, Z.H.[Zhi-Hong],
Han, K.[Kai],
Wang, Y.H.[Yun-He],
Token Compensator: Altering Inference Cost of Vision Transformer
Without Re-tuning,
ECCV24(XVI: 76-94).
Springer DOI
2412
BibRef
Xiao, H.[Han],
Zheng, W.Z.[Wen-Zhao],
Zuo, S.C.[Si-Cheng],
Gao, P.[Peng],
Zhou, J.[Jie],
Lu, J.W.[Ji-Wen],
SpatialFormer: Towards Generalizable Vision Transformers with Explicit
Spatial Understanding,
ECCV24(XIII: 37-54).
Springer DOI
2412
BibRef
Heo, B.[Byeongho],
Park, S.[Song],
Han, D.Y.[Dong-Yoon],
Yun, S.[Sangdoo],
Rotary Position Embedding for Vision Transformer,
ECCV24(X: 289-305).
Springer DOI
2412
BibRef
Kondo, R.[Ryota],
Minoura, H.[Hiroaki],
Hirakawa, T.[Tsubasa],
Yamashita, T.[Takayoshi],
Fujiyoshi, H.[Hironobu],
Binary-Decomposed Vision Transformer: Compressing and Accelerating
Vision Transformer by Binary Decomposition,
ICIP24(3600-3605)
IEEE DOI
2411
Visualization, Image coding, Quantization (signal), Accuracy,
Computational modeling, Object detection, Binary Decomposition,
Vision Transformer
BibRef
Bellitto, G.[Giovanni],
Sortino, R.[Renato],
Spadaro, P.[Paolo],
Palazzo, S.[Simone],
Salanitri, F.P.[Federica Proietto],
Fiameni, G.[Giuseppe],
Gavves, E.[Efstratios],
Spampinato, C.[Concetto],
Vito: Vision Transformer Optimization Via Knowledge Distillation On
Decoders,
ICIP24(493-499)
IEEE DOI
2411
Visualization, Correlation, Predictive models, Benchmark testing,
Transformers, Robustness, Inductive bias, Autoregression, Sequence models
BibRef
Gani, H.[Hanan],
Saadi, N.[Nada],
Hussein, N.[Noor],
Nandakumar, K.[Karthik],
Multi-Attribute Vision Transformers are Efficient and Robust Learners,
ICIP24(766-772)
IEEE DOI
2411
Training, Transformers, Robustness, Convolutional neural networks,
Task analysis, Vision Transformers, Multi-attribute learning,
adversarial attacks
BibRef
Huang, W.X.[Wen-Xuan],
Shen, Y.[Yunhang],
Xie, J.[Jiao],
Zhang, B.C.[Bao-Chang],
He, G.[Gaoqi],
Li, K.[Ke],
Sun, X.[Xing],
Lin, S.H.[Shao-Hui],
A General and Efficient Training for Transformer via Token Expansion,
CVPR24(15783-15792)
IEEE DOI Code:
WWW Link.
2410
Training, Accuracy, Costs, Codes, Pipelines, Computer architecture
BibRef
Cho, J.H.[Jang Hyun],
Krähenbühl, P.[Philipp],
Language-Conditioned Detection Transformer,
CVPR24(16593-16603)
IEEE DOI Code:
WWW Link.
2410
Training, Codes, Computational modeling, Detectors,
Computer architecture, Benchmark testing, Self-training
BibRef
Lin, S.[Sihao],
Lyu, P.[Pumeng],
Liu, D.[Dongrui],
Tang, T.[Tao],
Liang, X.D.[Xiao-Dan],
Song, A.[Andy],
Chang, X.J.[Xiao-Jun],
MLP Can Be a Good Transformer Learner,
CVPR24(19489-19498)
IEEE DOI
2410
Computational modeling, Memory management, Redundancy,
Transformers, Throughput, Particle measurements,
Efficient Inference
BibRef
Wang, A.[Ao],
Chen, H.[Hui],
Lin, Z.J.[Zi-Jia],
Han, J.G.[Jun-Gong],
Ding, G.G.[Gui-Guang],
RepViT: Revisiting Mobile CNN From ViT Perspective,
CVPR24(15909-15920)
IEEE DOI Code:
WWW Link.
2410
Performance evaluation, Codes, Accuracy, Computational modeling,
Transformers, Mobile handsets, CNN, ViT
BibRef
Weng, H.H.[Hao-Han],
Huang, D.[Danqing],
Qiao, Y.[Yu],
Hu, Z.[Zheng],
Lin, C.Y.[Chin-Yew],
Zhang, T.[Tong],
Chen, C.L.P.[C. L. Philip],
Desigen: A Pipeline for Controllable Design Template Generation,
CVPR24(12721-12732)
IEEE DOI Code:
WWW Link.
2410
Visualization, Pipelines, Layout, Process control,
Transformers, design generation, layout generation
BibRef
Park, S.[Sungho],
Byun, H.R.[Hye-Ran],
Fair-VPT: Fair Visual Prompt Tuning for Image Classification,
CVPR24(12268-12278)
IEEE DOI
2410
Visualization, Contrastive learning, Benchmark testing, Transformers,
Linear programming, Decorrelation, FAI, Fairness, Large Vision Model
BibRef
Xu, H.Y.[Heng-Yuan],
Xiang, L.[Liyao],
Ye, H.Y.[Hang-Yu],
Yao, D.[Dixi],
Chu, P.Z.[Peng-Zhi],
Li, B.C.[Bao-Chun],
Permutation Equivariance of Transformers and its Applications,
CVPR24(5987-5996)
IEEE DOI Code:
WWW Link.
2410
Backpropagation, Authorization, Deep learning, Adaptation models,
Codes, Computational modeling, Permutation equivariance,
Privacy-preserving
BibRef
Zhang, Y.Y.[Yi-Yuan],
Ding, X.H.[Xiao-Han],
Gong, K.X.[Kai-Xiong],
Ge, Y.X.[Yi-Xiao],
Shan, Y.[Ying],
Yue, X.Y.[Xiang-Yu],
Multimodal Pathway: Improve Transformers with Irrelevant Data from
Other Modalities,
CVPR24(6108-6117)
IEEE DOI Code:
WWW Link.
2410
Point cloud compression, Image recognition, Head, Costs, Codes,
Computational modeling, Multimodal Pathway, Network Architecture
BibRef
Kobayashi, T.[Takumi],
Mean-Shift Feature Transformer,
CVPR24(6047-6056)
IEEE DOI Code:
WWW Link.
2410
Analytical models, Costs, Codes, Computational modeling,
Transformers, Mean shift, Grouped projection
BibRef
Wu, J.[Junyi],
Duan, B.[Bin],
Kang, W.T.[Wei-Tai],
Tang, H.[Hao],
Yan, Y.[Yan],
Token Transformation Matters: Towards Faithful Post-Hoc Explanation
for Vision Transformer,
CVPR24(10926-10935)
IEEE DOI
2410
Visualization, Correlation, Computational modeling,
Perturbation methods, Predictive models, Length measurement,
Explainability
BibRef
Yun, S.[Seokju],
Ro, Y.[Youngmin],
SHViT: Single-Head Vision Transformer with Memory Efficient Macro
Design,
CVPR24(5756-5767)
IEEE DOI
2410
Performance evaluation, Head, Accuracy, Redundancy,
Graphics processing units, Object detection, CNNs
BibRef
Shi, X.Y.[Xin-Yu],
Hao, Z.C.[Ze-Cheng],
Yu, Z.F.[Zhao-Fei],
SpikingResformer: Bridging ResNet and Vision Transformer in Spiking
Neural Networks,
CVPR24(5610-5619)
IEEE DOI Code:
WWW Link.
2410
Energy consumption, Accuracy, Codes, Computer architecture,
Spiking neural networks, Transformers, Spiking Neural Networks,
Vision Transformer
BibRef
Ye, H.C.[Han-Cheng],
Yu, C.[Chong],
Ye, P.[Peng],
Xia, R.[Renqiu],
Tang, Y.S.[Yan-Song],
Lu, J.W.[Ji-Wen],
Chen, T.[Tao],
Zhang, B.[Bo],
Once for Both: Single Stage of Importance and Sparsity Search for
Vision Transformer Compression,
CVPR24(5578-5588)
IEEE DOI
2410
Dimensionality reduction, Image coding, Costs, Costing,
Graphics processing units, Computer architecture
BibRef
Zhang, J.[Junyi],
Herrmann, C.[Charles],
Hur, J.[Junhwa],
Chen, E.[Eric],
Jampani, V.[Varun],
Sun, D.Q.[De-Qing],
Yang, M.H.[Ming-Hsuan],
Telling Left from Right: Identifying Geometry-Aware Semantic
Correspondence,
CVPR24(3076-3085)
IEEE DOI Code:
WWW Link.
2410
Geometry, Codes, Animals, Semantics, Pose estimation,
Benchmark testing, semantic correspondence, diffusion models, vision transformer
BibRef
Huang, N.C.[Ning-Chi],
Chang, C.C.[Chi-Chih],
Lin, W.C.[Wei-Cheng],
Taka, E.[Endri],
Marculescu, D.[Diana],
Wu, K.C.[Kai-Chiang],
ELSA: Exploiting Layer-wise N:M Sparsity for Vision Transformer
Acceleration,
ECV24(8006-8015)
IEEE DOI Code:
WWW Link.
2410
Training, Degradation, Accuracy, Transformers, Throughput, Software
BibRef
Devulapally, A.[Anusha],
Khan, M.F.F.[Md Fahim Faysal],
Advani, S.[Siddharth],
Narayanan, V.[Vijaykrishnan],
Multi-Modal Fusion of Event and RGB for Monocular Depth Estimation
Using a Unified Transformer-based Architecture,
MULA24(2081-2089)
IEEE DOI Code:
WWW Link.
2410
Measurement, Accuracy, Recurrent neural networks,
Robot vision systems, Estimation, Computer architecture,
Vision Transformer
BibRef
Yang, Z.D.[Zhen-Dong],
Li, Z.[Zhe],
Zeng, A.[Ailing],
Li, Z.X.[Ze-Xian],
Yuan, C.[Chun],
Li, Y.[Yu],
ViTKD: Feature-based Knowledge Distillation for Vision Transformers,
PBDL24(1379-1388)
IEEE DOI Code:
WWW Link.
2410
Knowledge engineering, Computational modeling, MIMICs,
Transformers
BibRef
Mehri, F.[Faridoun],
Fayyaz, M.[Mohsen],
Baghshah, M.S.[Mahdieh Soleymani],
Pilehvar, M.T.[Mohammad Taher],
SkipPLUS: Skip the First Few Layers to Better Explain Vision
Transformers,
FaDE-TCV24(204-215)
IEEE DOI Code:
WWW Link.
2410
Training, Animals, Aggregates, Transformers, xAI,
Interpretability, Vision Transformers,
White-Box Input Attribution Methods
BibRef
Jain, S.[Samyak],
Dutta, T.[Tanima],
Towards Understanding and Improving Adversarial Robustness of Vision
Transformers,
CVPR24(24736-24745)
IEEE DOI
2410
Training, Measurement, Perturbation methods, Design methodology,
Transformers, Robustness, adversarial robustness, Vision Transformers
BibRef
Yang, S.[Sheng],
Bai, J.[Jiawang],
Gao, K.[Kuofeng],
Yang, Y.[Yong],
Li, Y.M.[Yi-Ming],
Xia, S.T.[Shu-Tao],
Not All Prompts Are Secure: A Switchable Backdoor Attack Against
Pre-trained Vision Transformers,
CVPR24(24431-24441)
IEEE DOI Code:
WWW Link.
2410
Visualization, Codes, Computational modeling, Force, Switches,
Predictive models, Vision Transformers, Visual Prompting, Backdoor,
Parameter-Efficient Fine Tuning
BibRef
Steitz, J.M.O.[Jan-Martin O.],
Roth, S.[Stefan],
Adapters Strike Back,
CVPR24(23449-23459)
IEEE DOI Code:
WWW Link.
2410
Adaptation models, Accuracy, Systematics, Computer architecture,
Benchmark testing, Transformers, vision transformer,
image classification
BibRef
Rangwani, H.[Harsh],
Mondal, P.[Pradipto],
Mishra, M.[Mayank],
Asokan, A.R.[Ashish Ramayee],
Babu, R.V.[R. Venkatesh],
DeiT-LT: Distillation Strikes Back for Vision Transformer Training on
Long-Tailed Datasets,
CVPR24(23396-23406)
IEEE DOI Code:
WWW Link.
2410
Training, Head, Tail, Computer architecture, Transformers,
Distance measurement, long-tail-learning, vision transformers, vit, distillation
BibRef
Liu, J.Y.[Jin-Yang],
Teshome, W.[Wondmgezahu],
Ghimire, S.[Sandesh],
Sznaier, M.[Mario],
Camps, O.[Octavia],
Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers,
CVPR24(23009-23018)
IEEE DOI
2410
Visualization, Face recognition, Noise reduction, Video sequences,
Predictive models, Fasteners, Solving puzzles, diffusion models, data imputation
BibRef
Kim, M.[Manjin],
Seo, P.H.[Paul Hongsuck],
Schmid, C.[Cordelia],
Cho, M.[Minsu],
Learning Correlation Structures for Vision Transformers,
CVPR24(18941-18951)
IEEE DOI
2410
Representation learning, Visualization, Correlation, Aggregates,
Layout, Transformers, Vision Transformers, correlation modeling,
video classification
BibRef
Yang, M.[Min],
Gao, H.[Huan],
Guo, P.[Ping],
Wang, L.M.[Li-Min],
Adapting Short-Term Transformers for Action Detection in Untrimmed
Videos,
CVPR24(18570-18579)
IEEE DOI
2410
Adaptation models, Computational modeling, Memory management,
Detectors, Transformers, Feature extraction,
Vision Transformer
BibRef
Shi, D.[Dai],
TransNeXt: Robust Foveal Visual Perception for Vision Transformers,
CVPR24(17773-17783)
IEEE DOI
2410
Degradation, Visualization, Accuracy, Image resolution, Stacking,
Transformers, Vision Transformer, Visual Backbone, Perceptual Artifacts
BibRef
Agiza, A.[Ahmed],
Neseem, M.[Marina],
Reda, S.[Sherief],
MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task
Learning,
CVPR24(16196-16205)
IEEE DOI Code:
WWW Link.
2410
Training, Deep learning, Adaptation models, Accuracy, Instruments,
Computer architecture, multi-task learning, vision transformers,
hierarchical transformers
BibRef
Dong, W.[Wei],
Zhang, X.[Xing],
Chen, B.[Bihui],
Yan, D.W.[Da-Wei],
Lin, Z.J.[Zhi-Jun],
Yan, Q.[Qingsen],
Wang, P.[Peng],
Yang, Y.[Yang],
Low-Rank Rescaled Vision Transformer Fine-Tuning:
A Residual Design Approach,
CVPR24(16101-16110)
IEEE DOI Code:
WWW Link.
2410
Adaptation models, Codes, Buildings, Transformers,
Matrix decomposition, Low-Rank Adaptation
BibRef
Wu, J.[Junyi],
Kang, W.T.[Wei-Tai],
Tang, H.[Hao],
Hong, Y.[Yuan],
Yan, Y.[Yan],
On the Faithfulness of Vision Transformer Explanations,
CVPR24(10936-10945)
IEEE DOI
2410
Measurement, Heating systems, Correlation, Aggregates,
Predictive models, Benchmark testing, Transformer, Explainability
BibRef
Navaneet, K.L.,
Koohpayegani, S.A.[Soroush Abbasi],
Sleiman, E.[Essam],
Pirsiavash, H.[Hamed],
SlowFormer: Adversarial Attack on Compute and Energy Consumption of
Efficient Vision Transformers,
CVPR24(24786-24797)
IEEE DOI Code:
WWW Link.
2410
Training, Adaptation models, Power demand, Computational modeling,
Training data, Transformers, Adversarial attack,
efficient vision transformers
BibRef
Koyun, O.C.[Onur Can],
Töreyin, B.U.[Behçet Ugur],
HaLViT: Half of the Weights are Enough,
LargeVM24(3669-3678)
IEEE DOI
2410
Computational modeling, Deep architecture, Transformers,
Convolutional neural networks, Efficient,
deep learning
BibRef
Bafghi, R.A.[Reza Akbarian],
Harilal, N.[Nidhin],
Monteleoni, C.[Claire],
Raissi, M.[Maziar],
Parameter Efficient Fine-tuning of Self-supervised ViTs without
Catastrophic Forgetting,
LargeVM24(3679-3684)
IEEE DOI
2410
BibRef
And:
LargeVM24(7864-7869)
IEEE DOI
2410
Knowledge engineering, Adaptation models,
Learning (artificial intelligence), Artificial neural networks,
Catastrophic Forgetting
BibRef
Yuan, X.[Xin],
Fei, H.L.[Hong-Liang],
Baek, J.[Jinoo],
Efficient Transformer Adaptation with Soft Token Merging,
LargeVM24(3658-3668)
IEEE DOI
2410
Training, Accuracy, Costs, Merging, Video sequences,
Optimization methods, Transformers
BibRef
Edalati, A.[Ali],
Hameed, M.G.A.[Marawan Gamal Abdel],
Mosleh, A.[Ali],
Generalized Kronecker-based Adapters for Parameter-efficient
Fine-tuning of Vision Transformers,
CRV23(97-104)
IEEE DOI
2406
Adaptation models, Tensors, Limiting, Computational modeling,
Transformers, Convolutional neural networks
BibRef
Marouf, I.E.[Imad Eddine],
Tartaglione, E.[Enzo],
Lathuilière, S.[Stéphane],
Mini but Mighty: Finetuning ViTs with Mini Adapters,
WACV24(1721-1730)
IEEE DOI
2404
Training, Costs, Neurons, Transfer learning, Estimation,
Computer architecture, Algorithms
BibRef
Kim, G.[Gihyun],
Kim, J.[Juyeop],
Lee, J.S.[Jong-Seok],
Exploring Adversarial Robustness of Vision Transformers in the
Spectral Perspective,
WACV24(3964-3973)
IEEE DOI
2404
Deep learning, Perturbation methods, Frequency-domain analysis,
Linearity, Transformers, Robustness, High frequency, Algorithms,
adversarial attack and defense methods
BibRef
Xu, X.[Xuwei],
Wang, S.[Sen],
Chen, Y.D.[Yu-Dong],
Zheng, Y.P.[Yan-Ping],
Wei, Z.W.[Zhe-Wei],
Liu, J.J.[Jia-Jun],
GTP-ViT: Efficient Vision Transformers via Graph-based Token
Propagation,
WACV24(86-95)
IEEE DOI Code:
WWW Link.
2404
Source coding, Computational modeling, Merging, Broadcasting,
Transformers, Computational complexity, Algorithms
BibRef
Han, Q.[Qiu],
Zhang, G.J.[Gong-Jie],
Huang, J.X.[Jia-Xing],
Gao, P.[Peng],
Wei, Z.[Zhang],
Lu, S.J.[Shi-Jian],
Efficient MAE towards Large-Scale Vision Transformers,
WACV24(595-604)
IEEE DOI
2404
Measurement, Degradation, Visualization, Runtime,
Computational modeling, Transformers, Algorithms
BibRef
Park, J.W.[Jong-Woo],
Kahatapitiya, K.[Kumara],
Kim, D.H.[Dong-Hyun],
Sudalairaj, S.[Shivchander],
Fan, Q.F.[Quan-Fu],
Ryoo, M.S.[Michael S.],
Grafting Vision Transformers,
WACV24(1134-1143)
IEEE DOI Code:
WWW Link.
2404
Codes, Computational modeling, Semantics, Information sharing,
Computer architecture, Transformers, Algorithms,
Image recognition and understanding
BibRef
Shimizu, S.[Shuki],
Tamaki, T.[Toru],
Joint learning of images and videos with a single Vision Transformer,
MVA23(1-6)
DOI Link
2403
Training, Image recognition, Machine vision, Transformers, Tuning, Videos
BibRef
Ding, S.R.[Shuang-Rui],
Zhao, P.S.[Pei-Sen],
Zhang, X.P.[Xiao-Peng],
Qian, R.[Rui],
Xiong, H.K.[Hong-Kai],
Tian, Q.[Qi],
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation,
ICCV23(16899-16910)
IEEE DOI Code:
WWW Link.
2401
BibRef
Chen, M.Z.[Meng-Zhao],
Lin, M.[Mingbao],
Lin, Z.H.[Zhi-Hang],
Zhang, Y.X.[Yu-Xin],
Chao, F.[Fei],
Ji, R.R.[Rong-Rong],
SMMix: Self-Motivated Image Mixing for Vision Transformers,
ICCV23(17214-17224)
IEEE DOI Code:
WWW Link.
2401
BibRef
Kim, D.[Dahun],
Angelova, A.[Anelia],
Kuo, W.C.[Wei-Cheng],
Contrastive Feature Masking Open-Vocabulary Vision Transformer,
ICCV23(15556-15566)
IEEE DOI
2401
BibRef
Li, Z.K.[Zhi-Kai],
Gu, Q.Y.[Qing-Yi],
I-ViT: Integer-only Quantization for Efficient Vision Transformer
Inference,
ICCV23(17019-17029)
IEEE DOI Code:
WWW Link.
2401
BibRef
Frumkin, N.[Natalia],
Gope, D.[Dibakar],
Marculescu, D.[Diana],
Jumping through Local Minima: Quantization in the Loss Landscape of
Vision Transformers,
ICCV23(16932-16942)
IEEE DOI Code:
WWW Link.
2401
BibRef
Li, Z.K.[Zhi-Kai],
Xiao, J.R.[Jun-Rui],
Yang, L.W.[Lian-Wei],
Gu, Q.Y.[Qing-Yi],
RepQ-ViT: Scale Reparameterization for Post-Training Quantization of
Vision Transformers,
ICCV23(17181-17190)
IEEE DOI Code:
WWW Link.
2401
BibRef
Havtorn, J.D.[Jakob Drachmann],
Royer, A.[Amélie],
Blankevoort, T.[Tijmen],
Bejnordi, B.E.[Babak Ehteshami],
MSViT: Dynamic Mixed-scale Tokenization for Vision Transformers,
NIVT23(838-848)
IEEE DOI
2401
BibRef
Haurum, J.B.[Joakim Bruslund],
Escalera, S.[Sergio],
Taylor, G.W.[Graham W.],
Moeslund, T.B.[Thomas B.],
Which Tokens to Use? Investigating Token Reduction in Vision
Transformers,
NIVT23(773-783)
IEEE DOI Code:
WWW Link.
2401
BibRef
Wang, X.[Xijun],
Chu, X.J.[Xiao-Jie],
Han, C.[Chunrui],
Zhang, X.Y.[Xiang-Yu],
SCSC: Spatial Cross-scale Convolution Module to Strengthen both CNNs
and Transformers,
NIVT23(731-741)
IEEE DOI
2401
BibRef
Chen, Y.H.[Yi-Hsin],
Weng, Y.C.[Ying-Chieh],
Kao, C.H.[Chia-Hao],
Chien, C.[Cheng],
Chiu, W.C.[Wei-Chen],
Peng, W.H.[Wen-Hsiao],
TransTIC: Transferring Transformer-based Image Compression from Human
Perception to Machine Perception,
ICCV23(23240-23250)
IEEE DOI
2401
BibRef
Li, Y.[Yanyu],
Hu, J.[Ju],
Wen, Y.[Yang],
Evangelidis, G.[Georgios],
Salahi, K.[Kamyar],
Wang, Y.Z.[Yan-Zhi],
Tulyakov, S.[Sergey],
Ren, J.[Jian],
Rethinking Vision Transformers for MobileNet Size and Speed,
ICCV23(16843-16854)
IEEE DOI
2401
BibRef
Nurgazin, M.[Maxat],
Tu, N.A.[Nguyen Anh],
A Comparative Study of Vision Transformer Encoders and Few-shot
Learning for Medical Image Classification,
CVAMD23(2505-2513)
IEEE DOI
2401
BibRef
Xie, W.[Wei],
Zhao, Z.[Zimeng],
Li, S.Y.[Shi-Ying],
Zuo, B.H.[Bing-Hui],
Wang, Y.G.[Yan-Gang],
Nonrigid Object Contact Estimation With Regional Unwrapping
Transformer,
ICCV23(9308-9317)
IEEE DOI
2401
BibRef
Vasu, P.K.A.[Pavan Kumar Anasosalu],
Gabriel, J.[James],
Zhu, J.[Jeff],
Tuzel, O.[Oncel],
Ranjan, A.[Anurag],
FastViT: A Fast Hybrid Vision Transformer using Structural
Reparameterization,
ICCV23(5762-5772)
IEEE DOI Code:
WWW Link.
2401
BibRef
Tang, C.[Chen],
Zhang, L.L.[Li Lyna],
Jiang, H.Q.[Hui-Qiang],
Xu, J.H.[Jia-Hang],
Cao, T.[Ting],
Zhang, Q.[Quanlu],
Yang, Y.Q.[Yu-Qing],
Wang, Z.[Zhi],
Yang, M.[Mao],
ElasticViT: Conflict-aware Supernet Training for Deploying Fast
Vision Transformer on Diverse Mobile Devices,
ICCV23(5806-5817)
IEEE DOI
2401
BibRef
Ren, S.[Sucheng],
Yang, X.Y.[Xing-Yi],
Liu, S.[Songhua],
Wang, X.C.[Xin-Chao],
SG-Former: Self-guided Transformer with Evolving Token Reallocation,
ICCV23(5980-5991)
IEEE DOI Code:
WWW Link.
2401
BibRef
Lin, W.F.[Wei-Feng],
Wu, Z.H.[Zi-Heng],
Chen, J.[Jiayu],
Huang, J.[Jun],
Jin, L.W.[Lian-Wen],
Scale-Aware Modulation Meet Transformer,
ICCV23(5992-6003)
IEEE DOI Code:
WWW Link.
2401
BibRef
He, Y.F.[Ye-Fei],
Lou, Z.Y.[Zhen-Yu],
Zhang, L.[Luoming],
Liu, J.[Jing],
Wu, W.J.[Wei-Jia],
Zhou, H.[Hong],
Zhuang, B.[Bohan],
BiViT: Extremely Compressed Binary Vision Transformers,
ICCV23(5628-5640)
IEEE DOI
2401
BibRef
Dutson, M.[Matthew],
Li, Y.[Yin],
Gupta, M.[Mohit],
Eventful Transformers:
Leveraging Temporal Redundancy in Vision Transformers,
ICCV23(16865-16877)
IEEE DOI
2401
BibRef
Wang, Z.Q.[Zi-Qing],
Fang, Y.T.[Yue-Tong],
Cao, J.H.[Jia-Hang],
Zhang, Q.[Qiang],
Wang, Z.[Zhongrui],
Xu, R.[Renjing],
Masked Spiking Transformer,
ICCV23(1761-1771)
IEEE DOI Code:
WWW Link.
2401
BibRef
Peebles, W.[William],
Xie, S.[Saining],
Scalable Diffusion Models with Transformers,
ICCV23(4172-4182)
IEEE DOI
2401
BibRef
Mentzer, F.[Fabian],
Agustsson, E.[Eirikur],
Tschannen, M.[Michael],
M2T: Masking Transformers Twice for Faster Decoding,
ICCV23(5317-5326)
IEEE DOI
2401
BibRef
Xiao, H.[Han],
Zheng, W.Z.[Wen-Zhao],
Zhu, Z.[Zheng],
Zhou, J.[Jie],
Lu, J.W.[Ji-Wen],
Token-Label Alignment for Vision Transformers,
ICCV23(5472-5481)
IEEE DOI Code:
WWW Link.
2401
BibRef
Yu, R.Y.[Run-Yi],
Wang, Z.N.[Zhen-Nan],
Wang, Y.H.[Yin-Huai],
Li, K.[Kehan],
Liu, C.[Chang],
Duan, H.[Haoyi],
Ji, X.Y.[Xiang-Yang],
Chen, J.[Jie],
LaPE: Layer-adaptive Position Embedding for Vision Transformers with
Independent Layer Normalization,
ICCV23(5863-5873)
IEEE DOI
2401
BibRef
Roy, A.[Anurag],
Verma, V.K.[Vinay K.],
Voonna, S.[Sravan],
Ghosh, K.[Kripabandhu],
Ghosh, S.[Saptarshi],
Das, A.[Abir],
Exemplar-Free Continual Transformer with Convolutions,
ICCV23(5874-5884)
IEEE DOI
2401
BibRef
Xu, Y.X.[Yi-Xing],
Li, C.[Chao],
Li, D.[Dong],
Sheng, X.[Xiao],
Jiang, F.[Fan],
Tian, L.[Lu],
Sirasao, A.[Ashish],
FDViT: Improve the Hierarchical Architecture of Vision Transformer,
ICCV23(5927-5937)
IEEE DOI
2401
BibRef
Chen, Y.J.[Yong-Jie],
Liu, H.M.[Hong-Min],
Yin, H.R.[Hao-Ran],
Fan, B.[Bin],
Building Vision Transformers with Hierarchy Aware Feature Aggregation,
ICCV23(5885-5895)
IEEE DOI
2401
BibRef
Quétu, V.[Victor],
Milovanovic, M.[Marta],
Tartaglione, E.[Enzo],
Sparse Double Descent in Vision Transformers: Real or Phantom Threat?,
CIAP23(II:490-502).
Springer DOI
2312
BibRef
Ak, K.E.[Kenan Emir],
Lee, G.G.[Gwang-Gook],
Xu, Y.[Yan],
Shen, M.W.[Ming-Wei],
Leveraging Efficient Training and Feature Fusion in Transformers for
Multimodal Classification,
ICIP23(1420-1424)
IEEE DOI
2312
BibRef
Popovic, N.[Nikola],
Paudel, D.P.[Danda Pani],
Probst, T.[Thomas],
Van Gool, L.J.[Luc J.],
Token-Consistent Dropout For Calibrated Vision Transformers,
ICIP23(1030-1034)
IEEE DOI
2312
BibRef
Sajjadi, M.S.M.[Mehdi S. M.],
Mahendran, A.[Aravindh],
Kipf, T.[Thomas],
Pot, E.[Etienne],
Duckworth, D.[Daniel],
Lucic, M.[Mario],
Greff, K.[Klaus],
RUST: Latent Neural Scene Representations from Unposed Imagery,
CVPR23(17297-17306)
IEEE DOI
2309
BibRef
Bowman, B.[Benjamin],
Achille, A.[Alessandro],
Zancato, L.[Luca],
Trager, M.[Matthew],
Perera, P.[Pramuditha],
Paolini, G.[Giovanni],
Soatto, S.[Stefano],
À-la-carte Prompt Tuning (APT):
Combining Distinct Data Via Composable Prompting,
CVPR23(14984-14993)
IEEE DOI
2309
BibRef
Nakhli, R.[Ramin],
Moghadam, P.A.[Puria Azadi],
Mi, H.Y.[Hao-Yang],
Farahani, H.[Hossein],
Baras, A.[Alexander],
Gilks, B.[Blake],
Bashashati, A.[Ali],
Sparse Multi-Modal Graph Transformer with Shared-Context Processing
for Representation Learning of Giga-pixel Images,
CVPR23(11547-11557)
IEEE DOI
2309
BibRef
Gärtner, E.[Erik],
Metz, L.[Luke],
Andriluka, M.[Mykhaylo],
Freeman, C.D.[C. Daniel],
Sminchisescu, C.[Cristian],
Transformer-Based Learned Optimization,
CVPR23(11970-11979)
IEEE DOI
2309
BibRef
Li, J.C.[Jia-Chen],
Hassani, A.[Ali],
Walton, S.[Steven],
Shi, H.[Humphrey],
ConvMLP: Hierarchical Convolutional MLPs for Vision,
WFM23(6307-6316)
IEEE DOI
2309
multi-layer perceptron
BibRef
Walmer, M.[Matthew],
Suri, S.[Saksham],
Gupta, K.[Kamal],
Shrivastava, A.[Abhinav],
Teaching Matters:
Investigating the Role of Supervision in Vision Transformers,
CVPR23(7486-7496)
IEEE DOI
2309
BibRef
Wang, S.G.[Shi-Guang],
Xie, T.[Tao],
Cheng, J.[Jian],
Zhang, X.C.[Xing-Cheng],
Liu, H.J.[Hai-Jun],
MDL-NAS: A Joint Multi-domain Learning Framework for Vision
Transformer,
CVPR23(20094-20104)
IEEE DOI
2309
BibRef
Ren, S.[Sucheng],
Wei, F.Y.[Fang-Yun],
Zhang, Z.[Zheng],
Hu, H.[Han],
TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models,
CVPR23(3687-3697)
IEEE DOI
2309
BibRef
He, J.F.[Jian-Feng],
Gao, Y.[Yuan],
Zhang, T.Z.[Tian-Zhu],
Zhang, Z.[Zhe],
Wu, F.[Feng],
D2Former: Jointly Learning Hierarchical Detectors and Contextual
Descriptors via Agent-Based Transformers,
CVPR23(2904-2914)
IEEE DOI
2309
BibRef
Chen, X.Y.[Xuan-Yao],
Liu, Z.J.[Zhi-Jian],
Tang, H.T.[Hao-Tian],
Yi, L.[Li],
Zhao, H.[Hang],
Han, S.[Song],
SparseViT: Revisiting Activation Sparsity for Efficient
High-Resolution Vision Transformer,
CVPR23(2061-2070)
IEEE DOI
2309
BibRef
Wei, S.Y.[Si-Yuan],
Ye, T.Z.[Tian-Zhu],
Zhang, S.[Shen],
Tang, Y.[Yao],
Liang, J.J.[Jia-Jun],
Joint Token Pruning and Squeezing Towards More Aggressive Compression
of Vision Transformers,
CVPR23(2092-2101)
IEEE DOI
2309
BibRef
Lin, Y.B.[Yan-Bo],
Bertasius, G.[Gedas],
Siamese Vision Transformers are Scalable Audio-Visual Learners,
ECCV24(XIV: 303-321).
Springer DOI
2412
BibRef
Lin, Y.B.[Yan-Bo],
Sung, Y.L.[Yi-Lin],
Lei, J.[Jie],
Bansal, M.[Mohit],
Bertasius, G.[Gedas],
Vision Transformers are Parameter-Efficient Audio-Visual Learners,
CVPR23(2299-2309)
IEEE DOI
2309
BibRef
Das, R.[Rajshekhar],
Dukler, Y.[Yonatan],
Ravichandran, A.[Avinash],
Swaminathan, A.[Ashwin],
Learning Expressive Prompting With Residuals for Vision Transformers,
CVPR23(3366-3377)
IEEE DOI
2309
BibRef
Zheng, M.X.[Meng-Xin],
Lou, Q.[Qian],
Jiang, L.[Lei],
TrojViT: Trojan Insertion in Vision Transformers,
CVPR23(4025-4034)
IEEE DOI
2309
BibRef
Li, Y.X.[Yan-Xi],
Xu, C.[Chang],
Trade-off between Robustness and Accuracy of Vision Transformers,
CVPR23(7558-7568)
IEEE DOI
2309
BibRef
Tarasiou, M.[Michail],
Chavez, E.[Erik],
Zafeiriou, S.[Stefanos],
ViTs for SITS: Vision Transformers for Satellite Image Time Series,
CVPR23(10418-10428)
IEEE DOI
2309
BibRef
Yu, Z.Z.[Zhong-Zhi],
Wu, S.[Shang],
Fu, Y.G.[Yong-Gan],
Zhang, S.[Shunyao],
Lin, Y.Y.C.[Ying-Yan Celine],
Hint-Aug: Drawing Hints from Foundation Vision Transformers towards
Boosted Few-shot Parameter-Efficient Tuning,
CVPR23(11102-11112)
IEEE DOI
2309
BibRef
Kim, D.[Dahun],
Angelova, A.[Anelia],
Kuo, W.C.[Wei-Cheng],
Region-centric Image-Language Pretraining for Open-Vocabulary Detection,
ECCV24(LXIII: 162-179).
Springer DOI
2412
BibRef
Earlier:
Region-Aware Pretraining for Open-Vocabulary Object Detection with
Vision Transformers,
CVPR23(11144-11154)
IEEE DOI
2309
BibRef
Hou, J.[Ji],
Dai, X.L.[Xiao-Liang],
He, Z.J.[Zi-Jian],
Dai, A.[Angela],
Nießner, M.[Matthias],
Mask3D: Pretraining 2D Vision Transformers by Learning Masked 3D
Priors,
CVPR23(13510-13519)
IEEE DOI
2309
BibRef
Xu, Z.Z.[Zheng-Zhuo],
Liu, R.K.[Rui-Kang],
Yang, S.[Shuo],
Chai, Z.H.[Zeng-Hao],
Yuan, C.[Chun],
Learning Imbalanced Data with Vision Transformers,
CVPR23(15793-15803)
IEEE DOI
2309
BibRef
Zhang, J.P.[Jian-Ping],
Huang, Y.Z.[Yi-Zhan],
Wu, W.B.[Wei-Bin],
Lyu, M.R.[Michael R.],
Transferable Adversarial Attacks on Vision Transformers with Token
Gradient Regularization,
CVPR23(16415-16424)
IEEE DOI
2309
BibRef
Yang, H.[Huanrui],
Yin, H.X.[Hong-Xu],
Shen, M.[Maying],
Molchanov, P.[Pavlo],
Li, H.[Hai],
Kautz, J.[Jan],
Global Vision Transformer Pruning with Hessian-Aware Saliency,
CVPR23(18547-18557)
IEEE DOI
2309
BibRef
Nakamura, R.[Ryo],
Kataoka, H.[Hirokatsu],
Takashima, S.[Sora],
Noriega, E.J.M.[Edgar Josafat Martinez],
Yokota, R.[Rio],
Inoue, N.[Nakamasa],
Pre-training Vision Transformers with Very Limited Synthesized Images,
ICCV23(20303-20312)
IEEE DOI
2401
BibRef
Takashima, S.[Sora],
Hayamizu, R.[Ryo],
Inoue, N.[Nakamasa],
Kataoka, H.[Hirokatsu],
Yokota, R.[Rio],
Visual Atoms: Pre-Training Vision Transformers with Sinusoidal Waves,
CVPR23(18579-18588)
IEEE DOI
2309
BibRef
Liu, Y.J.[Yi-Jiang],
Yang, H.R.[Huan-Rui],
Dong, Z.[Zhen],
Keutzer, K.[Kurt],
Du, L.[Li],
Zhang, S.H.[Shang-Hang],
NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization
for Vision Transformers,
CVPR23(20321-20330)
IEEE DOI
2309
BibRef
Park, J.[Jeongsoo],
Johnson, J.[Justin],
RGB No More: Minimally-Decoded JPEG Vision Transformers,
CVPR23(22334-22346)
IEEE DOI
2309
BibRef
Yu, C.[Chong],
Chen, T.[Tao],
Gan, Z.X.[Zhong-Xue],
Fan, J.Y.[Jia-Yuan],
Boost Vision Transformer with GPU-Friendly Sparsity and Quantization,
CVPR23(22658-22668)
IEEE DOI
2309
BibRef
Bao, F.[Fan],
Nie, S.[Shen],
Xue, K.W.[Kai-Wen],
Cao, Y.[Yue],
Li, C.X.[Chong-Xuan],
Su, H.[Hang],
Zhu, J.[Jun],
All are Worth Words: A ViT Backbone for Diffusion Models,
CVPR23(22669-22679)
IEEE DOI
2309
BibRef
Li, B.[Bonan],
Hu, Y.[Yinhan],
Nie, X.C.[Xue-Cheng],
Han, C.Y.[Cong-Ying],
Jiang, X.J.[Xiang-Jian],
Guo, T.D.[Tian-De],
Liu, L.Q.[Luo-Qi],
DropKey for Vision Transformer,
CVPR23(22700-22709)
IEEE DOI
2309
BibRef
Lan, S.Y.[Shi-Yi],
Yang, X.T.[Xi-Tong],
Yu, Z.D.[Zhi-Ding],
Wu, Z.[Zuxuan],
Alvarez, J.M.[Jose M.],
Anandkumar, A.[Anima],
Vision Transformers are Good Mask Auto-Labelers,
CVPR23(23745-23755)
IEEE DOI
2309
BibRef
Yu, L.[Lu],
Xiang, W.[Wei],
X-Pruner: eXplainable Pruning for Vision Transformers,
CVPR23(24355-24363)
IEEE DOI
2309
BibRef
Singh, A.[Apoorv],
Training Strategies for Vision Transformers for Object Detection,
WAD23(110-118)
IEEE DOI
2309
BibRef
Hukkelås, H.[Håkon],
Lindseth, F.[Frank],
Does Image Anonymization Impact Computer Vision Training?,
WAD23(140-150)
IEEE DOI
2309
BibRef
Marnissi, M.A.[Mohamed Amine],
Revolutionizing Thermal Imaging: GAN-Based Vision Transformers for
Image Enhancement,
ICIP23(2735-2739)
IEEE DOI
2312
BibRef
Marnissi, M.A.[Mohamed Amine],
Fathallah, A.[Abir],
GAN-based Vision Transformer for High-Quality Thermal Image
Enhancement,
GCV23(817-825)
IEEE DOI
2309
BibRef
Scheibenreif, L.[Linus],
Mommert, M.[Michael],
Borth, D.[Damian],
Masked Vision Transformers for Hyperspectral Image Classification,
EarthVision23(2166-2176)
IEEE DOI
2309
BibRef
Komorowski, P.[Piotr],
Baniecki, H.[Hubert],
Biecek, P.[Przemyslaw],
Towards Evaluating Explanations of Vision Transformers for Medical
Imaging,
XAI4CV23(3726-3732)
IEEE DOI
2309
BibRef
Ronen, T.[Tomer],
Levy, O.[Omer],
Golbert, A.[Avram],
Vision Transformers with Mixed-Resolution Tokenization,
ECV23(4613-4622)
IEEE DOI
2309
BibRef
Le, P.H.C.[Phuoc-Hoan Charles],
Li, X.[Xinlin],
BinaryViT: Pushing Binary Vision Transformers Towards Convolutional
Models,
ECV23(4665-4674)
IEEE DOI
2309
BibRef
Ma, D.[Dongning],
Zhao, P.F.[Peng-Fei],
Jiao, X.[Xun],
PerfHD: Efficient ViT Architecture Performance Ranking using
Hyperdimensional Computing,
NAS23(2230-2237)
IEEE DOI
2309
BibRef
Wang, J.[Jun],
Alamayreh, O.[Omran],
Tondi, B.[Benedetta],
Barni, M.[Mauro],
Open Set Classification of GAN-based Image Manipulations via a
ViT-based Hybrid Architecture,
WMF23(953-962)
IEEE DOI
2309
BibRef
Tian, R.[Rui],
Wu, Z.[Zuxuan],
Dai, Q.[Qi],
Hu, H.[Han],
Qiao, Y.[Yu],
Jiang, Y.G.[Yu-Gang],
ResFormer: Scaling ViTs with Multi-Resolution Training,
CVPR23(22721-22731)
IEEE DOI
2309
BibRef
Li, Y.[Yi],
Min, K.[Kyle],
Tripathi, S.[Subarna],
Vasconcelos, N.M.[Nuno M.],
SViTT: Temporal Learning of Sparse Video-Text Transformers,
CVPR23(18919-18929)
IEEE DOI
2309
BibRef
Guo, X.D.[Xin-Dong],
Sun, Y.[Yu],
Zhao, R.[Rong],
Kuang, L.Q.[Li-Qun],
Han, X.[Xie],
SWPT: Spherical Window-based Point Cloud Transformer,
ACCV22(I:396-412).
Springer DOI
2307
BibRef
Wang, W.J.[Wen-Ju],
Chen, G.[Gang],
Zhou, H.R.[Hao-Ran],
Wang, X.L.[Xiao-Lin],
OVPT: Optimal Viewset Pooling Transformer for 3d Object Recognition,
ACCV22(I:486-503).
Springer DOI
2307
BibRef
Kim, D.[Daeho],
Kim, J.[Jaeil],
Vision Transformer Compression and Architecture Exploration with
Efficient Embedding Space Search,
ACCV22(III:524-540).
Springer DOI
2307
BibRef
Lee, Y.S.[Yun-Sung],
Lee, G.[Gyuseong],
Ryoo, K.[Kwangrok],
Go, H.[Hyojun],
Park, J.[Jihye],
Kim, S.[Seungryong],
Towards Flexible Inductive Bias via Progressive Reparameterization
Scheduling,
VIPriors22(706-720).
Springer DOI
2304
Transformers vs. CNN different benefits. Best of both.
BibRef
Amir, S.[Shir],
Gandelsman, Y.[Yossi],
Bagon, S.[Shai],
Dekel, T.[Tali],
On the Effectiveness of VIT Features as Local Semantic Descriptors,
SelfLearn22(39-55).
Springer DOI
2304
BibRef
Deng, X.[Xuran],
Liu, C.B.[Chuan-Bin],
Lu, Z.Y.[Zhi-Ying],
Recombining Vision Transformer Architecture for Fine-grained Visual
Categorization,
MMMod23(II: 127-138).
Springer DOI
2304
BibRef
Tonkes, V.[Vincent],
Sabatelli, M.[Matthia],
How Well Do Vision Transformers (VTs) Transfer to the Non-natural Image
Domain? An Empirical Study Involving Art Classification,
VisArt22(234-250).
Springer DOI
2304
BibRef
Rangrej, S.B.[Samrudhdhi B],
Liang, K.J.[Kevin J],
Hassner, T.[Tal],
Clark, J.J.[James J],
GliTr: Glimpse Transformers with Spatiotemporal Consistency for
Online Action Prediction,
WACV23(3402-3412)
IEEE DOI
2302
Predictive models, Transformers, Cameras, Spatiotemporal phenomena,
Sensors, Observability
BibRef
Song, C.H.[Chull Hwan],
Yoon, J.Y.[Joo-Young],
Choi, S.[Shunghyun],
Avrithis, Y.[Yannis],
Boosting vision transformers for image retrieval,
WACV23(107-117)
IEEE DOI
2302
Training, Location awareness, Image retrieval,
Self-supervised learning, Image representation, Transformers
BibRef
Yang, J.[Jinyu],
Liu, J.J.[Jing-Jing],
Xu, N.[Ning],
Huang, J.Z.[Jun-Zhou],
TVT: Transferable Vision Transformer for Unsupervised Domain
Adaptation,
WACV23(520-530)
IEEE DOI
2302
Benchmark testing, Image representation, Transformers,
Convolutional neural networks, Task analysis,
and algorithms (including transfer)
BibRef
Saavedra-Ruiz, M.[Miguel],
Morin, S.[Sacha],
Paull, L.[Liam],
Monocular Robot Navigation with Self-Supervised Pretrained Vision
Transformers,
CRV22(197-204)
IEEE DOI
2301
Adaptation models, Image segmentation, Image resolution,
Navigation, Transformers, Robot sensing systems, Visual Servoing
BibRef
Patel, K.[Krushi],
Bur, A.M.[Andrés M.],
Li, F.J.[Feng-Jun],
Wang, G.H.[Guang-Hui],
Aggregating Global Features into Local Vision Transformer,
ICPR22(1141-1147)
IEEE DOI
2212
Source coding, Computational modeling,
Information processing, Performance gain, Transformers
BibRef
Shen, Z.Q.[Zhi-Qiang],
Liu, Z.[Zechun],
Xing, E.[Eric],
Sliced Recursive Transformer,
ECCV22(XXIV:727-744).
Springer DOI
2211
BibRef
Shao, Y.[Yidi],
Loy, C.C.[Chen Change],
Dai, B.[Bo],
Transformer with Implicit Edges for Particle-Based Physics Simulation,
ECCV22(XIX:549-564).
Springer DOI
2211
BibRef
Wang, W.[Wen],
Zhang, J.[Jing],
Cao, Y.[Yang],
Shen, Y.L.[Yong-Liang],
Tao, D.C.[Da-Cheng],
Towards Data-Efficient Detection Transformers,
ECCV22(IX:88-105).
Springer DOI
2211
BibRef
Lorenzana, M.B.[Marlon Bran],
Engstrom, C.[Craig],
Chandra, S.S.[Shekhar S.],
Transformer Compressed Sensing Via Global Image Tokens,
ICIP22(3011-3015)
IEEE DOI
2211
Training, Limiting, Image resolution, Neural networks,
Image representation, Transformers, MRI
BibRef
Lu, X.Y.[Xiao-Yong],
Du, S.[Songlin],
NCTR: Neighborhood Consensus Transformer for Feature Matching,
ICIP22(2726-2730)
IEEE DOI
2211
Learning systems, Impedance matching, Aggregates, Pose estimation,
Neural networks, Transformers, Local feature matching,
graph neural network
BibRef
Jeny, A.A.[Afsana Ahsan],
Junayed, M.S.[Masum Shah],
Islam, M.B.[Md Baharul],
An Efficient End-To-End Image Compression Transformer,
ICIP22(1786-1790)
IEEE DOI
2211
Image coding, Correlation, Limiting, Computational modeling,
Rate-distortion, Video compression, Transformers, entropy model
BibRef
Bai, J.W.[Jia-Wang],
Yuan, L.[Li],
Xia, S.T.[Shu-Tao],
Yan, S.C.[Shui-Cheng],
Li, Z.F.[Zhi-Feng],
Liu, W.[Wei],
Improving Vision Transformers by Revisiting High-Frequency Components,
ECCV22(XXIV:1-18).
Springer DOI
2211
BibRef
Li, K.[Kehan],
Yu, R.[Runyi],
Wang, Z.[Zhennan],
Yuan, L.[Li],
Song, G.[Guoli],
Chen, J.[Jie],
Locality Guidance for Improving Vision Transformers on Tiny Datasets,
ECCV22(XXIV:110-127).
Springer DOI
2211
BibRef
Tu, Z.Z.[Zheng-Zhong],
Talebi, H.[Hossein],
Zhang, H.[Han],
Yang, F.[Feng],
Milanfar, P.[Peyman],
Bovik, A.C.[Alan C.],
Li, Y.[Yinxiao],
MaxViT: Multi-axis Vision Transformer,
ECCV22(XXIV:459-479).
Springer DOI
2211
BibRef
Yang, R.[Rui],
Ma, H.L.[Hai-Long],
Wu, J.[Jie],
Tang, Y.S.[Yan-Song],
Xiao, X.F.[Xue-Feng],
Zheng, M.[Min],
Li, X.[Xiu],
ScalableViT: Rethinking the Context-Oriented Generalization of Vision
Transformer,
ECCV22(XXIV:480-496).
Springer DOI
2211
BibRef
Touvron, H.[Hugo],
Cord, M.[Matthieu],
El-Nouby, A.[Alaaeldin],
Verbeek, J.[Jakob],
Jégou, H.[Hervé],
Three Things Everyone Should Know About Vision Transformers,
ECCV22(XXIV:497-515).
Springer DOI
2211
BibRef
Touvron, H.[Hugo],
Cord, M.[Matthieu],
Jégou, H.[Hervé],
DeiT III: Revenge of the ViT,
ECCV22(XXIV:516-533).
Springer DOI
2211
BibRef
Li, Y.H.[Yang-Hao],
Mao, H.Z.[Han-Zi],
Girshick, R.[Ross],
He, K.M.[Kai-Ming],
Exploring Plain Vision Transformer Backbones for Object Detection,
ECCV22(IX:280-296).
Springer DOI
2211
BibRef
Yu, Q.H.[Qi-Hang],
Wang, H.Y.[Hui-Yu],
Qiao, S.Y.[Si-Yuan],
Collins, M.[Maxwell],
Zhu, Y.K.[Yu-Kun],
Adam, H.[Hartwig],
Yuille, A.L.[Alan L.],
Chen, L.C.[Liang-Chieh],
k-means Mask Transformer,
ECCV22(XXIX:288-307).
Springer DOI
2211
BibRef
Pham, K.[Khoi],
Kafle, K.[Kushal],
Lin, Z.[Zhe],
Ding, Z.H.[Zhi-Hong],
Cohen, S.[Scott],
Tran, Q.[Quan],
Shrivastava, A.[Abhinav],
Improving Closed and Open-Vocabulary Attribute Prediction Using
Transformers,
ECCV22(XXV:201-219).
Springer DOI
2211
BibRef
Yu, W.X.[Wen-Xin],
Zhang, H.[Hongru],
Lan, T.X.[Tian-Xiang],
Hu, Y.C.[Yu-Cheng],
Yin, D.[Dong],
CBPT: A New Backbone for Enhancing Information Transmission of Vision
Transformers,
ICIP22(156-160)
IEEE DOI
2211
Merging, Information processing, Object detection, Transformers,
Computational complexity, Vision Transformer, Backbone
BibRef
Takeda, M.[Mana],
Yanai, K.[Keiji],
Continual Learning in Vision Transformer,
ICIP22(616-620)
IEEE DOI
2211
Learning systems, Image recognition, Transformers,
Natural language processing, Convolutional neural networks, Vision Transformer
BibRef
Zhou, W.L.[Wei-Lian],
Kamata, S.I.[Sei-Ichiro],
Luo, Z.[Zhengbo],
Xue, X.[Xi],
Rethinking Unified Spectral-Spatial-Based Hyperspectral Image
Classification Under 3D Configuration of Vision Transformer,
ICIP22(711-715)
IEEE DOI
2211
Flowcharts, Correlation, Convolution, Transformers,
Hyperspectral image classification, 3D coordinate positional embedding
BibRef
Cao, Y.H.[Yun-Hao],
Yu, H.[Hao],
Wu, J.X.[Jian-Xin],
Training Vision Transformers with only 2040 Images,
ECCV22(XXV:220-237).
Springer DOI
2211
BibRef
Wang, C.[Cong],
Xu, H.M.[Hong-Min],
Zhang, X.[Xiong],
Wang, L.[Li],
Zheng, Z.T.[Zhi-Tong],
Liu, H.F.[Hai-Feng],
Convolutional Embedding Makes Hierarchical Vision Transformer Stronger,
ECCV22(XX:739-756).
Springer DOI
2211
BibRef
Wu, B.X.[Bo-Xi],
Gu, J.D.[Jin-Dong],
Li, Z.F.[Zhi-Feng],
Cai, D.[Deng],
He, X.F.[Xiao-Fei],
Liu, W.[Wei],
Towards Efficient Adversarial Training on Vision Transformers,
ECCV22(XIII:307-325).
Springer DOI
2211
BibRef
Zong, Z.F.[Zhuo-Fan],
Li, K.C.[Kun-Chang],
Song, G.L.[Guang-Lu],
Wang, Y.[Yali],
Qiao, Y.[Yu],
Leng, B.[Biao],
Liu, Y.[Yu],
Self-slimmed Vision Transformer,
ECCV22(XI:432-448).
Springer DOI
2211
BibRef
Fayyaz, M.[Mohsen],
Koohpayegani, S.A.[Soroush Abbasi],
Jafari, F.R.[Farnoush Rezaei],
Sengupta, S.[Sunando],
Joze, H.R.V.[Hamid Reza Vaezi],
Sommerlade, E.[Eric],
Pirsiavash, H.[Hamed],
Gall, J.[Jürgen],
Adaptive Token Sampling for Efficient Vision Transformers,
ECCV22(XI:396-414).
Springer DOI
2211
BibRef
Weng, Z.J.[Ze-Jia],
Yang, X.T.[Xi-Tong],
Li, A.[Ang],
Wu, Z.X.[Zu-Xuan],
Jiang, Y.G.[Yu-Gang],
Semi-supervised Vision Transformers,
ECCV22(XXX:605-620).
Springer DOI
2211
BibRef
Su, T.[Tong],
Ye, S.[Shuo],
Song, C.Q.[Cheng-Qun],
Cheng, J.[Jun],
Mask-Vit: an Object Mask Embedding in Vision Transformer for
Fine-Grained Visual Classification,
ICIP22(1626-1630)
IEEE DOI
2211
Knowledge engineering, Visualization, Focusing, Interference,
Benchmark testing, Transformers, Feature extraction, Knowledge Embedding
BibRef
Gai, L.[Lulu],
Chen, W.[Wei],
Gao, R.[Rui],
Chen, Y.W.[Yan-Wei],
Qiao, X.[Xu],
Using Vision Transformers in 3-D Medical Image Classifications,
ICIP22(696-700)
IEEE DOI
2211
Deep learning, Training, Visualization, Transfer learning,
Optimization methods, Self-supervised learning, Transformers,
3-D medical image classifications
BibRef
Wu, K.[Kan],
Zhang, J.[Jinnian],
Peng, H.[Houwen],
Liu, M.C.[Meng-Chen],
Xiao, B.[Bin],
Fu, J.L.[Jian-Long],
Yuan, L.[Lu],
TinyViT: Fast Pretraining Distillation for Small Vision Transformers,
ECCV22(XXI:68-85).
Springer DOI
2211
BibRef
Gao, L.[Li],
Nie, D.[Dong],
Li, B.[Bo],
Ren, X.F.[Xiao-Feng],
Doubly-Fused ViT: Fuse Information from Vision Transformer Doubly with
Local Representation,
ECCV22(XXIII:744-761).
Springer DOI
2211
BibRef
Yao, T.[Ting],
Pan, Y.W.[Ying-Wei],
Li, Y.[Yehao],
Ngo, C.W.[Chong-Wah],
Mei, T.[Tao],
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation
Learning,
ECCV22(XXV:328-345).
Springer DOI
2211
BibRef
Yuan, Z.H.[Zhi-Hang],
Xue, C.H.[Chen-Hao],
Chen, Y.Q.[Yi-Qi],
Wu, Q.[Qiang],
Sun, G.Y.[Guang-Yu],
PTQ4ViT: Post-training Quantization for Vision Transformers with Twin
Uniform Quantization,
ECCV22(XII:191-207).
Springer DOI
2211
BibRef
Kong, Z.L.[Zheng-Lun],
Dong, P.Y.[Pei-Yan],
Ma, X.L.[Xiao-Long],
Meng, X.[Xin],
Niu, W.[Wei],
Sun, M.S.[Meng-Shu],
Shen, X.[Xuan],
Yuan, G.[Geng],
Ren, B.[Bin],
Tang, H.[Hao],
Qin, M.H.[Ming-Hai],
Wang, Y.Z.[Yan-Zhi],
SPViT:
Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning,
ECCV22(XI:620-640).
Springer DOI
2211
BibRef
Pan, J.T.[Jun-Ting],
Bulat, A.[Adrian],
Tan, F.[Fuwen],
Zhu, X.T.[Xia-Tian],
Dudziak, L.[Lukasz],
Li, H.S.[Hong-Sheng],
Tzimiropoulos, G.[Georgios],
Martinez, B.[Brais],
EdgeViTs: Competing Light-Weight CNNs on Mobile Devices with Vision
Transformers,
ECCV22(XI:294-311).
Springer DOI
2211
BibRef
Liu, Y.[Yong],
Mai, S.Q.[Si-Qi],
Chen, X.N.[Xiang-Ning],
Hsieh, C.J.[Cho-Jui],
You, Y.[Yang],
Towards Efficient and Scalable Sharpness-Aware Minimization,
CVPR22(12350-12360)
IEEE DOI
2210
WWW Link. Training, Schedules, Scalability, Perturbation methods,
Stochastic processes, Transformers, Minimization,
Vision applications and systems
BibRef
Ren, P.Z.[Peng-Zhen],
Li, C.[Changlin],
Wang, G.[Guangrun],
Xiao, Y.[Yun],
Du, Q.[Qing],
Liang, X.D.[Xiao-Dan],
Chang, X.J.[Xiao-Jun],
Beyond Fixation: Dynamic Window Visual Transformer,
CVPR22(11977-11987)
IEEE DOI
2210
Performance evaluation, Visualization, Systematics,
Computational modeling, Scalability, Transformers,
Deep learning architectures and techniques
BibRef
Fang, J.[Jiemin],
Xie, L.X.[Ling-Xi],
Wang, X.G.[Xing-Gang],
Zhang, X.P.[Xiao-Peng],
Liu, W.Y.[Wen-Yu],
Tian, Q.[Qi],
MSG-Transformer:
Exchanging Local Spatial Information by Manipulating Messenger Tokens,
CVPR22(12053-12062)
IEEE DOI
2210
Deep learning, Visualization, Neural networks,
Graphics processing units, retrieval
BibRef
Sandler, M.[Mark],
Zhmoginov, A.[Andrey],
Vladymyrov, M.[Max],
Jackson, A.[Andrew],
Fine-tuning Image Transformers using Learnable Memory,
CVPR22(12145-12154)
IEEE DOI
2210
Deep learning, Adaptation models, Costs, Computational modeling,
Memory management, Transformers, Transfer/low-shot/long-tail learning
BibRef
Yu, X.[Xumin],
Tang, L.[Lulu],
Rao, Y.M.[Yong-Ming],
Huang, T.J.[Tie-Jun],
Zhou, J.[Jie],
Lu, J.W.[Ji-Wen],
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked
Point Modeling,
CVPR22(19291-19300)
IEEE DOI
2210
Point cloud compression, Solid modeling, Computational modeling,
Bit error rate, Transformers,
Deep learning architectures and techniques
BibRef
Park, C.[Chunghyun],
Jeong, Y.[Yoonwoo],
Cho, M.[Minsu],
Park, J.[Jaesik],
Fast Point Transformer,
CVPR22(16928-16937)
IEEE DOI
2210
Point cloud compression, Shape, Semantics, Neural networks,
Transformers, grouping and shape analysis
BibRef
Tu, Z.Z.[Zheng-Zhong],
Talebi, H.[Hossein],
Zhang, H.[Han],
Yang, F.[Feng],
Milanfar, P.[Peyman],
Bovik, A.[Alan],
Li, Y.X.[Yin-Xiao],
MAXIM: Multi-Axis MLP for Image Processing,
CVPR22(5759-5770)
IEEE DOI
2210
WWW Link. Training, Photography, Adaptation models, Visualization,
Computational modeling, Transformers, Low-level vision,
Computational photography
BibRef
Hou, Z.J.[Ze-Jiang],
Kung, S.Y.[Sun-Yuan],
Multi-Dimensional Vision Transformer Compression via Dependency
Guided Gaussian Process Search,
EVW22(3668-3677)
IEEE DOI
2210
Adaptation models, Image coding, Head, Computational modeling,
Neurons, Gaussian processes, Transformers
BibRef
Wang, Y.K.[Yi-Kai],
Chen, X.H.[Xing-Hao],
Cao, L.[Lele],
Huang, W.B.[Wen-Bing],
Sun, F.C.[Fu-Chun],
Wang, Y.H.[Yun-He],
Multimodal Token Fusion for Vision Transformers,
CVPR22(12176-12185)
IEEE DOI
2210
Point cloud compression, Image segmentation, Shape, Semantics,
Object detection, Vision+X
BibRef
Zhang, J.N.[Jin-Nian],
Peng, H.W.[Hou-Wen],
Wu, K.[Kan],
Liu, M.C.[Meng-Chen],
Xiao, B.[Bin],
Fu, J.L.[Jian-Long],
Yuan, L.[Lu],
MiniViT: Compressing Vision Transformers with Weight Multiplexing,
CVPR22(12135-12144)
IEEE DOI
2210
Multiplexing, Performance evaluation, Image coding, Codes,
Computational modeling, Benchmark testing,
Vision applications and systems
BibRef
Chen, T.L.[Tian-Long],
Zhang, Z.Y.[Zhen-Yu],
Cheng, Y.[Yu],
Awadallah, A.[Ahmed],
Wang, Z.Y.[Zhang-Yang],
The Principle of Diversity: Training Stronger Vision Transformers
Calls for Reducing All Levels of Redundancy,
CVPR22(12010-12020)
IEEE DOI
2210
Training, Convolutional codes, Deep learning,
Computational modeling, Redundancy, Deep learning architectures and techniques
BibRef
Yin, H.X.[Hong-Xu],
Vahdat, A.[Arash],
Alvarez, J.M.[Jose M.],
Mallya, A.[Arun],
Kautz, J.[Jan],
Molchanov, P.[Pavlo],
A-ViT: Adaptive Tokens for Efficient Vision Transformer,
CVPR22(10799-10808)
IEEE DOI
2210
Training, Adaptive systems, Network architecture, Transformers,
Throughput, Hardware, Complexity theory,
Efficient learning and inferences
BibRef
Lu, J.H.[Jia-Hao],
Zhang, X.S.[Xi Sheryl],
Zhao, T.L.[Tian-Li],
He, X.Y.[Xiang-Yu],
Cheng, J.[Jian],
APRIL: Finding the Achilles' Heel on Privacy for Vision Transformers,
CVPR22(10041-10050)
IEEE DOI
2210
Privacy, Data privacy, Federated learning, Computational modeling,
Training data, Transformers, Privacy and federated learning
BibRef
Hatamizadeh, A.[Ali],
Yin, H.X.[Hong-Xu],
Roth, H.[Holger],
Li, W.Q.[Wen-Qi],
Kautz, J.[Jan],
Xu, D.[Daguang],
Molchanov, P.[Pavlo],
GradViT: Gradient Inversion of Vision Transformers,
CVPR22(10011-10020)
IEEE DOI
2210
Measurement, Differential privacy, Neural networks, Transformers,
Security, Iterative methods, Privacy and federated learning
BibRef
Zhang, H.F.[Hao-Fei],
Duan, J.R.[Jia-Rui],
Xue, M.Q.[Meng-Qi],
Song, J.[Jie],
Sun, L.[Li],
Song, M.L.[Ming-Li],
Bootstrapping ViTs: Towards Liberating Vision Transformers from
Pre-training,
CVPR22(8934-8943)
IEEE DOI
2210
Training, Upper bound, Neural networks, Training data,
Network architecture, Transformers, Computer vision theory,
Efficient learning and inferences
BibRef
Chavan, A.[Arnav],
Shen, Z.Q.[Zhi-Qiang],
Liu, Z.[Zhuang],
Liu, Z.[Zechun],
Cheng, K.T.[Kwang-Ting],
Xing, E.[Eric],
Vision Transformer Slimming:
Multi-Dimension Searching in Continuous Optimization Space,
CVPR22(4921-4931)
IEEE DOI
2210
Training, Performance evaluation, Image coding, Force,
Graphics processing units,
Vision applications and systems
BibRef
Chen, R.J.[Richard J.],
Chen, C.[Chengkuan],
Li, Y.C.[Yi-Cong],
Chen, T.Y.[Tiffany Y.],
Trister, A.D.[Andrew D.],
Krishnan, R.G.[Rahul G.],
Mahmood, F.[Faisal],
Scaling Vision Transformers to Gigapixel Images via Hierarchical
Self-Supervised Learning,
CVPR22(16123-16134)
IEEE DOI
2210
Training, Visualization, Self-supervised learning,
Image representation, Transformers,
Self- semi- meta- unsupervised learning
BibRef
Zhai, X.H.[Xiao-Hua],
Kolesnikov, A.[Alexander],
Houlsby, N.[Neil],
Beyer, L.[Lucas],
Scaling Vision Transformers,
CVPR22(1204-1213)
IEEE DOI
2210
Training, Error analysis, Computational modeling, Neural networks,
Memory management, Training data,
Transfer/low-shot/long-tail learning
BibRef
Guo, J.Y.[Jian-Yuan],
Han, K.[Kai],
Wu, H.[Han],
Tang, Y.[Yehui],
Chen, X.H.[Xing-Hao],
Wang, Y.H.[Yun-He],
Xu, C.[Chang],
CMT: Convolutional Neural Networks Meet Vision Transformers,
CVPR22(12165-12175)
IEEE DOI
2210
Visualization, Image recognition, Force,
Object detection, Transformers,
Representation learning
BibRef
Meng, L.C.[Ling-Chen],
Li, H.D.[Heng-Duo],
Chen, B.C.[Bor-Chun],
Lan, S.Y.[Shi-Yi],
Wu, Z.X.[Zu-Xuan],
Jiang, Y.G.[Yu-Gang],
Lim, S.N.[Ser-Nam],
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition,
CVPR22(12299-12308)
IEEE DOI
2210
Image recognition, Head, Law enforcement, Computational modeling,
Redundancy, Transformers, Efficient learning and inferences,
retrieval
BibRef
Herrmann, C.[Charles],
Sargent, K.[Kyle],
Jiang, L.[Lu],
Zabih, R.[Ramin],
Chang, H.[Huiwen],
Liu, C.[Ce],
Krishnan, D.[Dilip],
Sun, D.Q.[De-Qing],
Pyramid Adversarial Training Improves ViT Performance,
CVPR22(13409-13419)
IEEE DOI
2210
Training, Image recognition, Stochastic processes,
Transformers, Robustness, retrieval,
Recognition: detection
BibRef
Li, C.L.[Chang-Lin],
Zhuang, B.[Bohan],
Wang, G.R.[Guang-Run],
Liang, X.D.[Xiao-Dan],
Chang, X.J.[Xiao-Jun],
Yang, Y.[Yi],
Automated Progressive Learning for Efficient Training of Vision
Transformers,
CVPR22(12476-12486)
IEEE DOI
2210
Training, Adaptation models, Schedules, Computational modeling,
Estimation, Manuals, Transformers, Representation learning
BibRef
Pu, M.Y.[Meng-Yang],
Huang, Y.P.[Ya-Ping],
Liu, Y.M.[Yu-Ming],
Guan, Q.J.[Qing-Ji],
Ling, H.B.[Hai-Bin],
EDTER: Edge Detection with Transformer,
CVPR22(1392-1402)
IEEE DOI
2210
Head, Image edge detection, Semantics, Detectors, Transformers,
Feature extraction, Segmentation, grouping and shape analysis,
Scene analysis and understanding
BibRef
Zhu, R.[Rui],
Li, Z.Q.[Zheng-Qin],
Matai, J.[Janarbek],
Porikli, F.M.[Fatih M.],
Chandraker, M.[Manmohan],
IRISformer: Dense Vision Transformers for Single-Image Inverse
Rendering in Indoor Scenes,
CVPR22(2812-2821)
IEEE DOI
2210
Photorealism, Shape, Computational modeling, Lighting,
Transformers,
Physics-based vision and shape-from-X
BibRef
Ermolov, A.[Aleksandr],
Mirvakhabova, L.[Leyla],
Khrulkov, V.[Valentin],
Sebe, N.[Nicu],
Oseledets, I.[Ivan],
Hyperbolic Vision Transformers: Combining Improvements in Metric
Learning,
CVPR22(7399-7409)
IEEE DOI
2210
Measurement, Geometry, Visualization, Semantics, Self-supervised learning,
Transformer cores, Transformers, Representation learning
BibRef
Zhang, C.Z.[Chong-Zhi],
Zhang, M.Y.[Ming-Yuan],
Zhang, S.H.[Shang-Hang],
Jin, D.S.[Dai-Sheng],
Zhou, Q.[Qiang],
Cai, Z.A.[Zhong-Ang],
Zhao, H.[Haiyu],
Liu, X.L.[Xiang-Long],
Liu, Z.W.[Zi-Wei],
Delving Deep into the Generalization of Vision Transformers under
Distribution Shifts,
CVPR22(7267-7276)
IEEE DOI
2210
Training, Representation learning, Systematics, Shape, Taxonomy,
Self-supervised learning, Transformers, Recognition: detection,
Representation learning
BibRef
Hou, Z.[Zhi],
Yu, B.[Baosheng],
Tao, D.C.[Da-Cheng],
BatchFormer: Learning to Explore Sample Relationships for Robust
Representation Learning,
CVPR22(7246-7256)
IEEE DOI
2210
Training, Deep learning, Representation learning, Neural networks,
Tail, Transformers, Transfer/low-shot/long-tail learning,
Self- semi- meta- unsupervised learning
BibRef
Zamir, S.W.[Syed Waqas],
Arora, A.[Aditya],
Khan, S.[Salman],
Hayat, M.[Munawar],
Khan, F.S.[Fahad Shahbaz],
Yang, M.H.[Ming-Hsuan],
Restormer: Efficient Transformer for High-Resolution Image
Restoration,
CVPR22(5718-5729)
IEEE DOI
2210
Computational modeling, Transformer cores,
Transformers, Data models, Image restoration, Task analysis,
Deep learning architectures and techniques
BibRef
Lin, K.[Kevin],
Wang, L.J.[Li-Juan],
Liu, Z.C.[Zi-Cheng],
Mesh Graphormer,
ICCV21(12919-12928)
IEEE DOI
2203
Convolutional codes, Solid modeling, Network topology,
Transformers, Gestures and body pose
BibRef
Casey, E.[Evan],
Pérez, V.[Víctor],
Li, Z.R.[Zhuo-Ru],
The Animation Transformer: Visual Correspondence via Segment Matching,
ICCV21(11303-11312)
IEEE DOI
2203
Visualization, Image segmentation, Image color analysis,
Production, Animation, Transformers,
grouping and shape
BibRef
Reizenstein, J.[Jeremy],
Shapovalov, R.[Roman],
Henzler, P.[Philipp],
Sbordone, L.[Luca],
Labatut, P.[Patrick],
Novotny, D.[David],
Common Objects in 3D: Large-Scale Learning and Evaluation of
Real-life 3D Category Reconstruction,
ICCV21(10881-10891)
IEEE DOI
2203
Award, Marr Prize, HM. Point cloud compression, Transformers,
Rendering (computer graphics), Cameras, Image reconstruction,
3D from multiview and other sensors
BibRef
Feng, W.X.[Wei-Xin],
Wang, Y.J.[Yuan-Jiang],
Ma, L.H.[Li-Hua],
Yuan, Y.[Ye],
Zhang, C.[Chi],
Temporal Knowledge Consistency for Unsupervised Visual Representation
Learning,
ICCV21(10150-10160)
IEEE DOI
2203
Training, Representation learning, Visualization, Protocols,
Object detection, Semisupervised learning, Transformers,
Transfer/Low-shot/Semi/Unsupervised Learning
BibRef
Wu, H.P.[Hai-Ping],
Xiao, B.[Bin],
Codella, N.[Noel],
Liu, M.C.[Meng-Chen],
Dai, X.Y.[Xi-Yang],
Yuan, L.[Lu],
Zhang, L.[Lei],
CvT: Introducing Convolutions to Vision Transformers,
ICCV21(22-31)
IEEE DOI
2203
Code, Vision Transformer.
WWW Link. Convolutional codes, Image resolution, Image recognition,
Performance gain, Transformers, Distortion,
BibRef
Touvron, H.[Hugo],
Cord, M.[Matthieu],
Sablayrolles, A.[Alexandre],
Synnaeve, G.[Gabriel],
Jégou, H.[Hervé],
Going deeper with Image Transformers,
ICCV21(32-42)
IEEE DOI
2203
Training, Neural networks, Training data,
Data models, Circuit faults, Recognition and classification,
Optimization and learning methods
BibRef
Zhao, J.W.[Jia-Wei],
Yan, K.[Ke],
Zhao, Y.F.[Yi-Fan],
Guo, X.W.[Xiao-Wei],
Huang, F.Y.[Fei-Yue],
Li, J.[Jia],
Transformer-based Dual Relation Graph for Multi-label Image
Recognition,
ICCV21(163-172)
IEEE DOI
2203
Image recognition, Correlation, Computational modeling, Semantics,
Benchmark testing, Representation learning
BibRef
Pan, Z.Z.[Zi-Zheng],
Zhuang, B.[Bohan],
Liu, J.[Jing],
He, H.Y.[Hao-Yu],
Cai, J.F.[Jian-Fei],
Scalable Vision Transformers with Hierarchical Pooling,
ICCV21(367-376)
IEEE DOI
2203
Visualization, Image recognition, Computational modeling,
Scalability, Transformers, Computational efficiency,
Efficient training and inference methods
BibRef
Yuan, L.[Li],
Chen, Y.P.[Yun-Peng],
Wang, T.[Tao],
Yu, W.H.[Wei-Hao],
Shi, Y.J.[Yu-Jun],
Jiang, Z.H.[Zi-Hang],
Tay, F.E.H.[Francis E. H.],
Feng, J.S.[Jia-Shi],
Yan, S.C.[Shui-Cheng],
Tokens-to-Token ViT:
Training Vision Transformers from Scratch on ImageNet,
ICCV21(538-547)
IEEE DOI
2203
Training, Image resolution, Computational modeling,
Image edge detection, Transformers,
BibRef
Wu, B.[Bichen],
Xu, C.F.[Chen-Feng],
Dai, X.L.[Xiao-Liang],
Wan, A.[Alvin],
Zhang, P.Z.[Pei-Zhao],
Yan, Z.C.[Zhi-Cheng],
Tomizuka, M.[Masayoshi],
Gonzalez, J.[Joseph],
Keutzer, K.[Kurt],
Vajda, P.[Peter],
Visual Transformers: Where Do Transformers Really Belong in Vision
Models?,
ICCV21(579-589)
IEEE DOI
2203
Training, Visualization, Image segmentation, Lips,
Computational modeling, Semantics, Vision applications and systems
BibRef
Hu, R.H.[Rong-Hang],
Singh, A.[Amanpreet],
UniT: Multimodal Multitask Learning with a Unified Transformer,
ICCV21(1419-1429)
IEEE DOI
2203
Training, Natural languages,
Object detection, Predictive models, Transformers, Multitasking,
Representation learning
BibRef
Qiu, Y.[Yue],
Yamamoto, S.[Shintaro],
Nakashima, K.[Kodai],
Suzuki, R.[Ryota],
Iwata, K.[Kenji],
Kataoka, H.[Hirokatsu],
Satoh, Y.[Yutaka],
Describing and Localizing Multiple Changes with Transformers,
ICCV21(1951-1960)
IEEE DOI
2203
Measurement, Location awareness, Codes, Natural languages,
Benchmark testing, Transformers,
Vision applications and systems
BibRef
Song, M.[Myungseo],
Choi, J.[Jinyoung],
Han, B.H.[Bo-Hyung],
Variable-Rate Deep Image Compression through Spatially-Adaptive
Feature Transform,
ICCV21(2360-2369)
IEEE DOI
2203
Training, Image coding, Neural networks, Rate-distortion, Transforms,
Network architecture, Computational photography,
Low-level and physics-based vision
BibRef
Sheng, H.[Hualian],
Cai, S.[Sijia],
Liu, Y.[Yuan],
Deng, B.[Bing],
Huang, J.Q.[Jian-Qiang],
Hua, X.S.[Xian-Sheng],
Zhao, M.J.[Min-Jian],
Improving 3D Object Detection with Channel-wise Transformer,
ICCV21(2723-2732)
IEEE DOI
2203
Point cloud compression, Object detection, Detectors, Transforms,
Transformers, Encoding, Detection and localization in 2D and 3D,
BibRef
Zhang, P.C.[Peng-Chuan],
Dai, X.[Xiyang],
Yang, J.W.[Jian-Wei],
Xiao, B.[Bin],
Yuan, L.[Lu],
Zhang, L.[Lei],
Gao, J.F.[Jian-Feng],
Multi-Scale Vision Longformer: A New Vision Transformer for
High-Resolution Image Encoding,
ICCV21(2978-2988)
IEEE DOI
2203
Image segmentation, Image coding, Computational modeling,
Memory management, Object detection, Transformers,
Representation learning
BibRef
Dong, Q.[Qi],
Tu, Z.W.[Zhuo-Wen],
Liao, H.F.[Hao-Fu],
Zhang, Y.T.[Yu-Ting],
Mahadevan, V.[Vijay],
Soatto, S.[Stefano],
Visual Relationship Detection Using Part-and-Sum Transformers with
Composite Queries,
ICCV21(3530-3539)
IEEE DOI
2203
Visualization, Detectors, Transformers, Task analysis, Standards,
Detection and localization in 2D and 3D,
Representation learning
BibRef
Fan, H.Q.[Hao-Qi],
Xiong, B.[Bo],
Mangalam, K.[Karttikeya],
Li, Y.[Yanghao],
Yan, Z.C.[Zhi-Cheng],
Malik, J.[Jitendra],
Feichtenhofer, C.[Christoph],
Multiscale Vision Transformers,
ICCV21(6804-6815)
IEEE DOI
2203
Visualization, Image recognition, Codes, Computational modeling,
Transformers, Complexity theory,
Recognition and classification
BibRef
Mahmood, K.[Kaleel],
Mahmood, R.[Rigel],
van Dijk, M.[Marten],
On the Robustness of Vision Transformers to Adversarial Examples,
ICCV21(7818-7827)
IEEE DOI
2203
Transformers, Robustness,
Adversarial machine learning, Security,
Machine learning architectures and formulations
BibRef
Chen, X.L.[Xin-Lei],
Xie, S.[Saining],
He, K.[Kaiming],
An Empirical Study of Training Self-Supervised Vision Transformers,
ICCV21(9620-9629)
IEEE DOI
2203
Training, Benchmark testing, Transformers, Standards,
Representation learning, Recognition and classification, Transfer/Low-shot/Semi/Unsupervised Learning
BibRef
Yuan, Y.[Ye],
Weng, X.[Xinshuo],
Ou, Y.[Yanglan],
Kitani, K.[Kris],
AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent
Forecasting,
ICCV21(9793-9803)
IEEE DOI
2203
Uncertainty, Stochastic processes, Predictive models, Transformers,
Encoding, Trajectory, Motion and tracking,
Vision for robotics and autonomous vehicles
BibRef
Wu, K.[Kan],
Peng, H.W.[Hou-Wen],
Chen, M.H.[Ming-Hao],
Fu, J.L.[Jian-Long],
Chao, H.Y.[Hong-Yang],
Rethinking and Improving Relative Position Encoding for Vision
Transformer,
ICCV21(10013-10021)
IEEE DOI
2203
Image coding, Codes, Computational modeling, Transformers, Encoding,
Natural language processing, Datasets and evaluation,
Recognition and classification
BibRef
Bhojanapalli, S.[Srinadh],
Chakrabarti, A.[Ayan],
Glasner, D.[Daniel],
Li, D.[Daliang],
Unterthiner, T.[Thomas],
Veit, A.[Andreas],
Understanding Robustness of Transformers for Image Classification,
ICCV21(10211-10221)
IEEE DOI
2203
Perturbation methods, Transformers,
Robustness, Data models, Convolutional neural networks,
Recognition and classification
BibRef
Yan, B.[Bin],
Peng, H.[Houwen],
Fu, J.L.[Jian-Long],
Wang, D.[Dong],
Lu, H.C.[Hu-Chuan],
Learning Spatio-Temporal Transformer for Visual Tracking,
ICCV21(10428-10437)
IEEE DOI
2203
Visualization, Target tracking, Smoothing methods, Pipelines,
Benchmark testing, Transformers,
BibRef
Heo, B.[Byeongho],
Yun, S.[Sangdoo],
Han, D.Y.[Dong-Yoon],
Chun, S.[Sanghyuk],
Choe, J.[Junsuk],
Oh, S.J.[Seong Joon],
Rethinking Spatial Dimensions of Vision Transformers,
ICCV21(11916-11925)
IEEE DOI
2203
Dimensionality reduction, Computational modeling,
Object detection, Transformers, Robustness,
Recognition and classification
BibRef
Voskou, A.[Andreas],
Panousis, K.P.[Konstantinos P.],
Kosmopoulos, D.[Dimitrios],
Metaxas, D.N.[Dimitris N.],
Chatzis, S.[Sotirios],
Stochastic Transformer Networks with Linear Competing Units:
Application to end-to-end SL Translation,
ICCV21(11926-11935)
IEEE DOI
2203
Training, Memory management, Stochastic processes,
Gesture recognition, Benchmark testing, Assistive technologies,
BibRef
Ranftl, R.[René],
Bochkovskiy, A.[Alexey],
Koltun, V.[Vladlen],
Vision Transformers for Dense Prediction,
ICCV21(12159-12168)
IEEE DOI
2203
Image resolution, Semantics, Neural networks, Estimation,
Training data, grouping and shape
BibRef
Chen, M.H.[Ming-Hao],
Peng, H.W.[Hou-Wen],
Fu, J.L.[Jian-Long],
Ling, H.B.[Hai-Bin],
AutoFormer: Searching Transformers for Visual Recognition,
ICCV21(12250-12260)
IEEE DOI
2203
Training, Convolutional codes, Visualization, Head, Search methods,
Manuals, Recognition and classification
BibRef
Yuan, K.[Kun],
Guo, S.P.[Shao-Peng],
Liu, Z.W.[Zi-Wei],
Zhou, A.[Aojun],
Yu, F.W.[Feng-Wei],
Wu, W.[Wei],
Incorporating Convolution Designs into Visual Transformers,
ICCV21(559-568)
IEEE DOI
2203
Training, Visualization, Costs, Convolution, Training data,
Transformers, Feature extraction, Recognition and classification,
Efficient training and inference methods
BibRef
Chen, Z.[Zhengsu],
Xie, L.X.[Ling-Xi],
Niu, J.W.[Jian-Wei],
Liu, X.F.[Xue-Feng],
Wei, L.[Longhui],
Tian, Q.[Qi],
Visformer: The Vision-friendly Transformer,
ICCV21(569-578)
IEEE DOI
2203
Convolutional codes, Training, Visualization, Protocols,
Computational modeling, Fitting, Recognition and classification,
Representation learning
BibRef
Yao, Z.L.[Zhu-Liang],
Cao, Y.[Yue],
Lin, Y.T.[Yu-Tong],
Liu, Z.[Ze],
Zhang, Z.[Zheng],
Hu, H.[Han],
Leveraging Batch Normalization for Vision Transformers,
NeurArch21(413-422)
IEEE DOI
2112
Training, Transformers, Feeds
BibRef
Graham, B.[Ben],
El-Nouby, A.[Alaaeldin],
Touvron, H.[Hugo],
Stock, P.[Pierre],
Joulin, A.[Armand],
Jégou, H.[Hervé],
Douze, M.[Matthijs],
LeViT: a Vision Transformer in ConvNet's Clothing for Faster
Inference,
ICCV21(12239-12249)
IEEE DOI
2203
Training, Image resolution, Neural networks,
Parallel processing, Transformers, Feature extraction,
Representation learning
BibRef
Horváth, J.[János],
Baireddy, S.[Sriram],
Hao, H.X.[Han-Xiang],
Montserrat, D.M.[Daniel Mas],
Delp, E.J.[Edward J.],
Manipulation Detection in Satellite Images Using Vision Transformer,
WMF21(1032-1041)
IEEE DOI
2109
BibRef
Earlier: A1, A4, A3, A5, Only:
Manipulation Detection in Satellite Images Using Deep Belief Networks,
WMF20(2832-2840)
IEEE DOI
2008
Image sensors, Satellites, Splicing, Forestry, Tools.
Satellites, Image reconstruction, Training, Forgery,
Heating systems, Feature extraction
BibRef
Beal, J.[Josh],
Wu, H.Y.[Hao-Yu],
Park, D.H.[Dong Huk],
Zhai, A.[Andrew],
Kislyuk, D.[Dmitry],
Billion-Scale Pretraining with Vision Transformers for Multi-Task
Visual Representations,
WACV22(1431-1440)
IEEE DOI
2202
Visualization, Solid modeling, Systematics,
Computational modeling, Transformers,
Semi- and Un- supervised Learning
BibRef
Chapter on Pattern Recognition, Clustering, Statistics, Grammars, Learning, Neural Nets, Genetic Algorithms continues in
Patch Based Vision Transformers .