_ | token | _ |
3d-array | token | Petri Nets Generating Tetrahedral Picture Languages |
A-ViT: Adaptive | token | s for Efficient Vision Transformer |
Accurate 3D Face Reconstruction with Facial Component | token | s |
Adaptive Frequency Filters As Efficient Global | token | Mixers |
Adaptive | token | Sampling for Efficient Vision Transformers |
Adjunct Partial Array | token | Petri Net Structure |
All in | token | s: Unifying Output Space of Visual Tasks via Soft Token |
All in | token | s: Unifying Output Space of Visual Tasks via Soft Token |
Animal Pose Tracking: 3D Multimodal Dataset and | token | -based Pose Optimization |
Architectures for Biometric Match-on- | token | Solutions |
Attribute Surrogates Learning and Spectral | token | s Pooling in Transformers for Few-shot Learning |
Beyond Attentive | token | s: Incorporating Token Importance and Diversity for Efficient Vision Transformers |
Beyond Attentive | token | s: Incorporating Token Importance and Diversity for Efficient Vision Transformers |
Biophasor: | token | Supplemented Cancellable Biometrics |
Blind Image Quality Assessment via Transformer Predicted Error Map and Perceptual Quality | token | |
Boosting Point-BERT by Multi-Choice | token | s |
Building Extraction from Remote Sensing Images with Sparse | token | Transformers |
Computing Curvilinear Structure by | token | -Based Grouping |
Content-aware | token | Sharing for Efficient Semantic Segmentation with Vision Transformers |
Continuous Intermediate | token | Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation |
Cooperative Game Modeling With Weighted | token | -Level Alignment for Audio-Text Retrieval |
CT2: Colorization Transformer via Color | token | s |
Detecting Structure by Symbolic Constructions on | token | s |
Discriminative Class | token | s for Text-to-Image Diffusion Models |
Dual Class | token | Vision Transformer for Direction of Arrival Estimation in Low SNR |
Dual-Factor Authentication System Featuring Speaker Verification and | token | Technology, A |
Dynamic | token | Pruning in Plain Vision Transformers for Semantic Segmentation |
DyTox: Transformers for Continual Learning with DYnamic | token | eXpansion |
ECT: Fine-grained edge detection with learned cause | token | s |
Effective Style | token | Weight Control Technique for End-to-End Emotional Speech Synthesis, An |
Efficient | token | -Guided Image-Text Retrieval With Consistent Multimodal Contrastive Training |
Efficient Transformer-Based 3D Object Detection with Dynamic | token | Halting |
Efficient Video Action Detection with | token | Dropout and Context Refinement |
Efficient Video Transformers with Spatial-Temporal | token | Selection |
Efficient Vision Transformer via | token | Merger |
Entity Extraction and Correction Based on | token | Structure Model Generation |
Focus and Align: Learning Tube | token | s for Video-Language Pre-Training |
Fully Attentional Networks with Self-emerging | token | Labeling |
General Approach for | token | Correspondence, A |
general vision problem solving architecture: Hierarchical | token | grouping, A |
Human Pose as Compositional | token | s |
Hybrid | token | transformer for deep face recognition |
Hybrid-TransCD: A Hybrid Transformer Remote Sensing Image Change Detection Network via | token | Aggregation |
Immunized | token | -Based Approach for Autonomous Deployment of Multiple Mobile Robots in Burnt Area |
Implementation of the USB | token | System for Fingerprint Verification |
Improved Masked Image Generation with | token | -Critic |
Improving defocus blur detection via adaptive supervision prior- | token | s |
Improving vision transformer for medical image classification via | token | -wise perturbation |
ISR3: A | token | Database for Integration of Visual Modules |
Joint | token | and Feature Alignment Framework for Text-Based Person Search |
Joint | token | Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers |
Labeling of Curvilinear Structure Across Scales by | token | Grouping |
language model using variable length | token | s for open-vocabulary Hangul text recognition, A |
Learning Multi-Modal Class-Specific | token | s for Weakly Supervised Dense Object Localization |
Leveraging per Image- | token | Consistency for Vision-Language Pre-Training |
Lightweight Image Super-Resolution with Superpixel | token | Interaction |
Making Vision Transformers Efficient from A | token | Sparsification View |
ManiTrans: Entity-Level Text-Guided Image Manipulation via | token | -wise Semantic Alignment and Generation |
Mask-Guided Transformer Network with Topic | token | for Remote Sensing Image Captioning, A |
MatteFormer: Transformer-Based Image Matting via Prior- | token | s |
MedoidsFormer: A Strong 3D Object Detection Backbone by Exploiting Interaction With Adjacent Medoid | token | s |
Memory- | token | Transformer for Unsupervised Video Anomaly Detection |
Method for identification of | token | s in video sequences |
METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert | token | s |
MonoATT: Online Monocular 3D Object Detection with Adaptive | token | Transformer |
Morphological image processing on a | token | passing pyramid computer |
MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger | token | s |
Multi-class | token | Transformer for Weakly Supervised Semantic Segmentation |
Multi-Scale | token | s-Aware Transformer Network for Multi-Region and Multi-Sequence MR-to-CT Synthesis in a Single Model |
Multi-user VR Experience for Creating and Trading Non-fungible | token | s |
Multimodal | token | Fusion for Vision Transformers |
New Coeff- | token | Decoding Method With Efficient Memory Access in H.264/AVC Video Coding Standard, A |
No | token | Left Behind: Explainability-Aided Image Classification and Generation |
Not All | token | s Are Equal: Human-centric Visual Analysis via Token Clustering Transformer |
Not All | token | s Are Equal: Human-centric Visual Analysis via Token Clustering Transformer |
Object Discovery from Motion-Guided | token | s |
On Correspondence, Line | token | s And Missing Tokens |
On Correspondence, Line | token | s And Missing Tokens |
Optimisation of biometric ID | token | s by using hardware/software co-design |
PPT: | token | -Pruned Pose Transformer for Monocular and Multi-view Human Pose Estimation |
Prune Spatio-temporal | token | s by Semantic-aware Temporal Accumulation |
Pyramid | token | s-to-Token Vision Transformer for Thyroid Pathology Image Classification |
Pyramid | token | s-to-Token Vision Transformer for Thyroid Pathology Image Classification |
Quantification and Abstraction: Low Level | token | s for Object Extraction |
Recovering 3D Motion and Structure from Stereo and 2D | token | Tracking Cooperation |
Request for Clarity over the End of Sequence | token | in the Self-critical Sequence Training, A |
Revisiting Multimodal Representation in Contrastive Learning: From Patch and | token | Embeddings to Finite Discrete Tokens |
Revisiting Multimodal Representation in Contrastive Learning: From Patch and | token | Embeddings to Finite Discrete Tokens |
RIFormer: Keep Your Vision Backbone Effective But Removing | token | Mixer |
Robust Distance Measures for Face-Recognition Supporting Revocable Biometric | token | s |
Robustifying | token | Attention for Vision Transformers |
SeiT: Storage-Efficient Vision Training with | token | s Using 1% of Pixel Storage |
Self-Supervised Anomaly Detection from Anomalous Training Data via Iterative Latent | token | Masking |
SG-Former: Self-guided Transformer with Evolving | token | Reallocation |
Shunted Self-Attention via Multi-Scale | token | Aggregation |
Sketch | token | s: A Learned Mid-level Representation for Contour and Object Detection |
Smoothest Velocity Field and | token | Matching Schemes, The |
Soft Measure of Visual | token | Occurrences for Object Categorization |
Spatial Positioning | token | (SPToken) for Smart Mobility |
Spatial-Aware | token | for Weakly Supervised Object Localization |
SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft | token | Pruning |
Strategies for Tracking | token | s in a Cluttered Scene |
Strip-MLP: Efficient | token | Interaction for Vision MLP |
TACo: | token | -aware Cascade Contrastive Learning for Video-Text Alignment |
Taming the curse of dimensionality for perturbed | token | identification |
TFS-ViT: | token | -level feature stylization for domain generalization |
| token | Boosting for Robust Self-Supervised Visual Transformer Pre-training |
| token | Contrast for Weakly-Supervised Semantic Segmentation |
| token | Grouping Based on 3d Motion and Feature Selection in Object Tracking |
| token | labeling-guided multi-scale medical image classification |
| token | Merging for Fast Stable Diffusion |
| token | Pooling in Vision Transformers for Image Classification |
| token | Selection is a Simple Booster for Vision Transformers |
| token | Tracking in a Cluttered Scene |
| token | Turing Machines |
| token | -Based Extraction of Straight Lines |
| token | -Based Fingerprint Authentication |
| token | -Consistent Dropout For Calibrated Vision Transformers |
| token | -Label Alignment for Vision Transformers |
| token | -Textured Object Detection by Pyramids |
| token | HPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers |
| token | Pose: Learning Keypoint Tokens for Human Pose Estimation |
| token | s-to-Token ViT: Training Vision Transformers from Scratch on ImageNet |
| token | s-to-Token ViT: Training Vision Transformers from Scratch on ImageNet |
TopFormer: | token | Pyramid Transformer for Mobile Semantic Segmentation |
TORE: | token | Reduction for Efficient Human Mesh Recovery with Transformer |
Transferable Adversarial Attacks on Vision Transformers with | token | Gradient Regularization |
Transformer Compressed Sensing Via Global Image | token | s |
Transformer vision-language tracking via proxy | token | guided cross-modal fusion |
Translating Optical Flow into | token | Matches |
Translating Optical Flow into | token | Matches and Depth from Looming |
TS-CAM: | token | Semantic Coupled Attention Map for Weakly Supervised Object Localization |
TS2-Net: | token | Shift and Selection Transformer for Text-Video Retrieval |
TSVT: | token | Sparsification Vision Transformer for robust RGB-D salient object detection |
TTST: A Top-k | token | Selective Transformer for Remote Sensing Image Super-Resolution |
UMIFormer: Mining the Correlations between Similar | token | s for Multi-View 3D Reconstruction |
Unleashing Transformers: Parallel | token | Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes |
Using orientation | token | s for object recognition |
VL-Match: Enhancing Vision-Language Pretraining with | token | -Level and Instance-Level Matching |
Which | token | s to Use? Investigating Token Reduction in Vision Transformers |
Which | token | s to Use? Investigating Token Reduction in Vision Transformers |
141 for token