Index for token

_token_
3d-array token Petri Nets Generating Tetrahedral Picture Languages
A-ViT: Adaptive tokens for Efficient Vision Transformer
Accelerating Multimodal Large Language Models by Searching Optimal Vision token Reduction
Accurate 3D Face Reconstruction with Facial Component tokens
Activating Associative Disease-Aware Vision token Memory for LLM-Based X-Ray Report Generation
Adanat: Exploring Adaptive Policy for token-based Image Generation
Adaptive Frequency Filters As Efficient Global token Mixers
Adaptive token Sampling for Efficient Vision Transformers
Adaptor: Adaptive token Reduction for Video Diffusion Transformers
Adjunct Partial Array token Petri Net Structure
Agglomerative token Clustering
AITTI: Learning Adaptive Inclusive token for Text-to-Image Generation
ALGM: Adaptive Local-then-Global token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
All in tokens: Unifying Output Space of Visual Tasks via Soft Token
All in tokens: Unifying Output Space of Visual Tasks via Soft Token
Animal Pose Tracking: 3D Multimodal Dataset and token-based Pose Optimization
Architectures for Biometric Match-on-token Solutions
ATMformer: An Adaptive token Merging Vision Transformer for Remote Sensing Image Scene Classification
ATP-LLaVA: Adaptive token Pruning for Large Vision Language Models
Attend to Not Attended: Structure-then-Detail token Merging for Post-training DiT Acceleration
Attention-Based Layer Fusion and token Masking for Weakly Supervised Semantic Segmentation
Attribute Surrogates Learning and Spectral tokens Pooling in Transformers for Few-shot Learning
Augmenting Multimodal LLMs with Self-Reflective tokens for Knowledge-based Visual Question Answering
Badtoken: Token-level Backdoor Attacks to Multi-modal Large Language Models
Beyond Attentive tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers
Beyond Attentive tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers
Beyond masking: Demystifying token-based pre-training for vision transformers
Biophasor: token Supplemented Cancellable Biometrics
Blind Image Quality Assessment via Transformer Predicted Error Map and Perceptual Quality token
Boosting Point-BERT by Multi-Choice tokens
Building Extraction from Remote Sensing Images with Sparse token Transformers
CAgMLP: An MLP-like architecture with a Cross-Axis gated token mixer for image classification
CAST: Clustering self-Attention using Surrogate tokens for efficient transformers
CATANet: Efficient Content-Aware token Aggregation for Lightweight Image Super-Resolution
Class tokens Infusion for Weakly Supervised Semantic Segmentation
CMI-Net: Cross-View Message token Interaction Network for 3D Shape Recognition
CMTM: Cross-Modal token Modulation for Unsupervised Video Object Segmentation
Collaborative Intelligence for Vision Transformers: A token Sparsity-Driven Edge-Cloud Framework
Computing Curvilinear Structure by token-Based Grouping
Confidence-Based Sampling Strategy for Dense Temporal token Learning in Thermal Infrared Object Tracking, A
Content-aware token Sharing for Efficient Semantic Segmentation with Vision Transformers
Continuous Intermediate token Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation
Cooperative Game Modeling With Weighted token-Level Alignment for Audio-Text Retrieval
Cross-Block Sparse Class token Contrast for Weakly Supervised Semantic Segmentation
Cross-Domain Detection Transformer Based on Spatial-Aware and Semantic-Aware token Alignment
CT2: Colorization Transformer via Color tokens
CVT-Track: Concentrating on Valid tokens for One-Stream Tracking
Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline token Optimization
Detecting Structure by Symbolic Constructions on tokens
Devil is in Temporal token: High Quality Video Reasoning Segmentation, The
Difference Inversion: Interpolate and Isolate the Difference with token Consistency for Image Analogy Generation
Discriminative Class tokens for Text-to-Image Diffusion Models
Discriminatively Matched Part tokens for Pointly Supervised Instance Segmentation
DiViCo: Disentangled Visual token Compression for Efficient Large Vision-Language Model
DivPrune: Diversity-based Visual token Pruning for Large Multimodal Models
Dtpose: Learning Disentangled token Representation for Effective Human Pose Estimation
Dual Class token Vision Transformer for Direction of Arrival Estimation in Low SNR
Dual-Factor Authentication System Featuring Speaker Verification and token Technology, A
DyCoke: Dynamic Compression of tokens for Fast Video Large Language Models
Dynamic token Pruning in Plain Vision Transformers for Semantic Segmentation
Dynamic token-Pass Transformers for Semantic Segmentation
DyTox: Transformers for Continual Learning with DYnamic token eXpansion
ECT: Fine-grained edge detection with learned cause tokens
EDTST: Efficient Dynamic token Selection Transformer for Hyperspectral Image Classification
Effective Style token Weight Control Technique for End-to-End Emotional Speech Synthesis, An
Efficient token-Guided Image-Text Retrieval With Consistent Multimodal Contrastive Training
Efficient Transformer Adaptation with Soft token Merging
Efficient Transformer-Based 3D Object Detection with Dynamic token Halting
Efficient Video Action Detection with token Dropout and Context Refinement
Efficient Video Transformers with Spatial-Temporal token Selection
Efficient Vision Transformer via token Merger
Efficient Vision Transformer with token Sparsification for Event-Based Object Tracking
Efficient Visual Transformer by Learnable token Merging
Emerging Property of Masked token for Effective Pre-training
Enriching Local Patterns with Multi-token Attention for Broad-Sight Neural Networks
Entity Extraction and Correction Based on token Structure Model Generation
Exploring token-Level Augmentation in Vision Transformer for Semi-Supervised Semantic Segmentation
Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network With token Migration
Faster Parameter-Efficient Tuning with token Redundancy Reduction
FASTer: Focal token Acquiring-and-Scaling Transformer for Long-term 3D Object Detection
First to Know: How token Distributions Reveal Hidden Knowledge in Large Vision-language Models?, The
Focus and Align: Learning Tube tokens for Video-Language Pre-Training
FourierSR: A Fourier token-Based Plugin for Efficient Image Super-Resolution
Fully Attentional Networks with Self-emerging token Labeling
General and Efficient Training for Transformer via token Expansion, A
General Approach for token Correspondence, A
general vision problem solving architecture: Hierarchical token grouping, A
Generalized Concordant Vision Transformer With Masked Image tokens for Object Detection
Generative Multimodal Pretraining with Discrete Diffusion Timestep tokens
GroupedMixer: An Entropy Model With Group-Wise token-Mixers for Learned Image Compression
GroupRF: Panoptic Scene Graph Generation with group relation tokens
GTP-ViT: Efficient Vision Transformers via Graph-based token Propagation
GTPT: Group-based token Pruning Transformer for Efficient Human Pose Estimation
HalLoc: token-level Localization of Hallucinations for Vision Language Models
Heterogeneous Generative tokens and Distance-Aware Recovery Network for Occluded Person Re-Identification
Hierarchical Graph Interaction Transformer With Dynamic token Clustering for Camouflaged Object Detection
Hierarchical token-Aware Cross-Modality Reconstruction for Visible-Infrared Person Re-Identification
Human Pose as Compositional tokens
Hybrid Multi-Class token Vision Transformer Convolutional Network for DOA Estimation
Hybrid token transformer for deep face recognition
Hybrid-Level Instruction Injection for Video token Compression in Multi-modal Large Language Models
Hybrid-TransCD: A Hybrid Transformer Remote Sensing Image Change Detection Network via token Aggregation
Hyperspectral image classification with token fusion on GPU
HyperTransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic token Mixer for Hyperspectral Image Classification
Image is Worth 1/2 tokens After Layer 2: Plug-and-play Inference Acceleration for Large Vision-language Models, An
Immunized token-Based Approach for Autonomous Deployment of Multiple Mobile Robots in Burnt Area
Implementation of the USB token System for Fingerprint Verification
Improved Masked Image Generation with token-Critic
Improving Autoregressive Visual Generation with Cluster-Oriented token Prediction
Improving defocus blur detection via adaptive supervision prior-tokens
Improving vision transformer for medical image classification via token-wise perturbation
Instruction Tuning-free Visual token Complement for Multimodal LLMs
Inter-image token Relation Learning for weakly supervised semantic segmentation
ISR3: A token Database for Integration of Visual Modules
IVTP: Instruction-guided Visual token Pruning for Large Vision-language Models
Joint token and Feature Alignment Framework for Text-Based Person Search
Joint token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers
Labeling of Curvilinear Structure Across Scales by token Grouping
language model using variable length tokens for open-vocabulary Hangul text recognition, A
Large Model Empowered Multi-Modal Semantic Communication With Selective tokens for Training
Layer-and Timestep-Adaptive Differentiable token Compression Ratios for Efficient Diffusion Transformers
Learnable token for visual tracking
Learning Multi-Modal Class-Specific tokens for Weakly Supervised Dense Object Localization
Learning to mask and permute visual tokens for Vision Transformer pre-training
Learning with Unmasked tokens Drives Stronger Vision Learners
Leveraging multi-class background description and token dictionary representation for hyperspectral anomaly detection
Leveraging per Image-token Consistency for Vision-Language Pre-Training
Lightweight Image Super-Resolution with Superpixel token Interaction
Llama-vid: An Image is Worth 2 tokens in Large Language Models
LookupVIT: Compressing Visual Information to a Limited Number of tokens
MADTP: Multimodal Alignment-Guided Dynamic token Pruning for Accelerating Vision-Language Transformer
Magic tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Magic tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Make Your Vit-based Multi-view 3d Detectors Faster via token Compression
Making Vision Transformers Efficient from A token Sparsification View
ManiTrans: Entity-Level Text-Guided Image Manipulation via token-wise Semantic Alignment and Generation
MAPM: PolSAR Image Classification with Masked Autoencoder Based on Position Prediction and Memory tokens
Mask-Guided Transformer Network with Topic token for Remote Sensing Image Captioning, A
Masked Reference token Supervision-Based Iterative Visual-Language Framework for Robust Visual Grounding, A
MatteFormer: Transformer-Based Image Matting via Prior-tokens
MCTformer+: Multi-Class token Transformer for Weakly Supervised Semantic Segmentation
MedoidsFormer: A Strong 3D Object Detection Backbone by Exploiting Interaction With Adjacent Medoid tokens
Memory-token Transformer for Unsupervised Video Anomaly Detection
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled token Merging and Quantization
Method for identification of tokens in video sequences
METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert tokens
Mining representative tokens via transformer-based multi-modal interaction for RGB-T tracking
MMoT: Mixture-of-Modality-tokens Transformer for Composed Multimodal Conditional Image Synthesis
MonoATT: Online Monocular 3D Object Detection with Adaptive token Transformer
Morphological image processing on a token passing pyramid computer
MovieChat: From Dense token to Sparse Memory for Long Video Understanding
MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger tokens
MST: Adaptive Multi-Scale tokens Guided Interactive Segmentation
Multi-class token Transformer for Weakly Supervised Semantic Segmentation
Multi-Criteria token Fusion with One-Step-Ahead Attention for Efficient Vision Transformers
Multi-Faceted Adaptive token Pruning for Efficient Remote Sensing Image Segmentation
Multi-modal interaction with token division strategy for RGB-T tracking
Multi-Scale tokens-Aware Transformer Network for Multi-Region and Multi-Sequence MR-to-CT Synthesis in a Single Model
Multi-schema prompting powered token-feature woven attention network for short text classification
Multi-user VR Experience for Creating and Trading Non-fungible tokens
Multimodal token Fusion for Vision Transformers
MVFormer: Diversifying feature normalization and token mixing for efficient vision transformers
New Coeff-token Decoding Method With Efficient Memory Access in H.264/AVC Video Coding Standard, A
No token Left Behind: Explainability-Aided Image Classification and Generation
Not All tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
Not All tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
Object Discovery from Motion-Guided tokens
Object Recognition as Next token Prediction
Omni-RGPT: Unifying Image and Video Region-level Understanding via token Marks
On Correspondence, Line tokens And Missing Tokens
On Correspondence, Line tokens And Missing Tokens
Open-Vocabulary Attention Maps with token Optimization for Semantic Segmentation in Diffusion Models
Optimisation of biometric ID tokens by using hardware/software co-design
OTE: Exploring Accurate Scene Text Recognition Using One token
Other tokens matter: Exploring global and local features of Vision Transformers for Object Re-Identification
PACT: Pruning and Clustering-Based token Reduction for Faster Visual Language Models
Partitioned token fusion and pruning strategy for transformer tracking
Patch Ranking: token Pruning as Ranking Prediction for Efficient CLIP
Pedestrian Crossing Intention Prediction via Progressive Multimodal token Fusion for Autonomous Driving
Perception tokens Enhance Visual Reasoning in Multimodal Language Models
Picture is Worth More Than 77 Text tokens: Evaluating CLIP-Style Models on Dense Captions, A
PointLoRA: Low-Rank Adaptation with token Selection for Point Cloud Learning
Pose-guided token selection for the recognition of activities of daily living
PostoMETRO: Pose token Enhanced Mesh Transformer for Robust 3D Human Mesh Recovery
PPT: token-Pruned Pose Transformer for Monocular and Multi-view Human Pose Estimation
PRANCE: Joint token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference
Predtoken: Predicting Unknown Tokens and Beyond with Coarse-to-Fine Iterative Decoding
Prune and Merge: Efficient token Compression for Vision Transformer With Spatial Information Preserved
Prune Spatio-temporal tokens by Semantic-aware Temporal Accumulation
Pruning One More token is Enough: Leveraging Latency-Workload Non-Linearities for Vision Transformers on the Edge
PTET: A progressive token exchanging transformer for infrared and visible image fusion
PVC: Progressive Visual token Compression for Unified Image and Video Processing in Large Vision-Language Models
Pyramid tokens-to-Token Vision Transformer for Thyroid Pathology Image Classification
Pyramid tokens-to-Token Vision Transformer for Thyroid Pathology Image Classification
Quantification and Abstraction: Low Level tokens for Object Extraction
Random Entangled tokens for Adversarially Robust Vision Transformer
Reasoning to Attend: Try to Understand How token Works
Recovering 3D Motion and Structure from Stereo and 2D token Tracking Cooperation
Removing Rows and Columns of tokens in Vision Transformer Enables Faster Dense Prediction Without Retraining
Representation Selective Coupling via token Sparsification for Multi-Spectral Object Re-Identification
Request for Clarity over the End of Sequence token in the Self-critical Sequence Training, A
ResiComp: Loss-Resilient Image Compression via Dual-Functional Masked Visual token Modeling
Rethinking token Reduction with Parameter-Efficient Fine-Tuning in ViT for Pixel-Level Tasks
Rethinking visual prompt learning as masked visual token modeling
Revisiting Multimodal Representation in Contrastive Learning: From Patch and token Embeddings to Finite Discrete Tokens
Revisiting Multimodal Representation in Contrastive Learning: From Patch and token Embeddings to Finite Discrete Tokens
Revisiting token Pruning for Object Detection and Instance Segmentation
RIFormer: Keep Your Vision Backbone Effective But Removing token Mixer
Robust Distance Measures for Face-Recognition Supporting Revocable Biometric tokens.
Robust scene text understanding with OCR token and word alignment for Text-VQA and text-caption
Robustifying token Attention for Vision Transformers
Robustness tokens: Towards Adversarial Robustness of Transformers
Rollout-Guided token Pruning for Efficient Video Understanding
Salience-based Adaptive Masking: Revisiting token Dynamics for Enhanced Pre-Training
SarAdapter: Prioritizing Attention on Semantic-Aware Representative tokens for Enhanced Medical Image Segmentation
SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive tokens
SATA: Spatial Autocorrelation token Analysis for Enhancing the Robustness of Vision Transformers
Scale-aware token-matching for transformer-based object detector
Segment Any Event Streams via Weighted Adaptation of Pivotal tokens
Seit++: Masked token Modeling Improves Storage-efficient Training
SeiT: Storage-Efficient Vision Training with tokens Using 1% of Pixel Storage
Self-Supervised Anomaly Detection from Anomalous Training Data via Iterative Latent token Masking
Self-supervised Video Copy Localization with Regional token Representation
Semantic Prompting with Image token for Continual Learning
SETA: Semantic-Aware Edge-Guided token Augmentation for Domain Generalization
SG-Former: Self-guided Transformer with Evolving token Reallocation
Shunted Self-Attention via Multi-Scale token Aggregation
Simple token-Level Confidence Improves Caption Correctness
Simple yet Effective Layout token in Large Language Models for Document Understanding, A
Sketch tokens: A Learned Mid-level Representation for Contour and Object Detection
Smoothest Velocity Field and token Matching Schemes, The
Soft Measure of Visual token Occurrences for Object Categorization
Spatial Positioning token (SPToken) for Smart Mobility
Spatial-Aware token for Weakly Supervised Object Localization
SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft token Pruning
STFormer: An efficient visual Transformer model with sparse attention and adaptive token aggregation
STPM: Spatial-Temporal token Pruning and Merging for Complex Activity Recognition
Strategies for Tracking tokens in a Cluttered Scene
Strip-MLP: Efficient token Interaction for Vision MLP
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and token Folding
T4P: Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-Specific token Memory
TACo: token-aware Cascade Contrastive Learning for Video-Text Alignment
Taming the curse of dimensionality for perturbed token identification
TCFormer: Visual Recognition via token Clustering Transformer
TCSAFormer: Efficient Vision Transformer With token Compression and Sparse Attention for Medical Image Segmentation
TFRNet: Semantic Segmentation Network with token Filtration and Refinement Method
TFS-ViT: token-level feature stylization for domain generalization
token Aggregation and Selection Hashing for Efficient Underwater Image Retrieval
token Boosting for Robust Self-Supervised Visual Transformer Pre-training
token Calibration for Transformer-Based Domain Adaptation
token Compensator: Altering Inference Cost of Vision Transformer Without Re-tuning
token Contrast for Weakly-Supervised Semantic Segmentation
token Cropr: Faster ViTs for Quite a Few Tasks
token Fusion: Bridging the Gap between Token Pruning and Token Merging
token Fusion: Bridging the Gap between Token Pruning and Token Merging
token Fusion: Bridging the Gap between Token Pruning and Token Merging
token Grouping Based on 3d Motion and Feature Selection in Object Tracking
token labeling-guided multi-scale medical image classification
token Masking Transformer for Weakly Supervised Object Localization
token Merging for Fast Stable Diffusion
token Pooling in Vision Transformers for Image Classification
token pyramid pooling-driven style adapter learning with dual-view balanced loss for imbalanced diabetic retinopathy grading
token Selection is a Simple Booster for Vision Transformers
token Tracking in a Cluttered Scene
token Transformation Matters: Towards Faithful Post-Hoc Explanation for Vision Transformer
token Turing Machines
token Turing Machines are Efficient Vision Models
token-aware and step-aware acceleration for Stable Diffusion
token-based dynamic bit-width assignment for ViT quantization
token-Based Extraction of Straight Lines
token-Based Fingerprint Authentication
token-Based, Patch Based Vision Transformers
token-Consistent Dropout For Calibrated Vision Transformers
token-Label Alignment for Vision Transformers
token-Level Prompt Mixture With Parameter-Free Routing for Federated Domain Generalization
token-Mixer: Bind Image and Text in One Embedding Space for Medical Image Reporting
token-Prediction-Based Post-Processing for Low-Bitrate Speech Coding
token-Textured Object Detection by Pyramids
token-word mixer meets object-aware transformer for referring image segmentation
tokenCompose: Text-to-Image Diffusion with Token-Level Supervision
tokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers
tokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation
tokenPose: Learning Keypoint Tokens for Human Pose Estimation
tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
TopFormer: token Pyramid Transformer for Mobile Semantic Segmentation
TopV: Compatible token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
TORE: token Recycling in Vision Transformers for Efficient Active Visual Exploration
TORE: token Reduction for Efficient Human Mesh Recovery with Transformer
Toward Unified token Learning for Vision-Language Tracking
Towards Universal Modal Tracking With Online Dense Temporal token Learning
Trajectory-aligned Space-time tokens for Few-shot Action Recognition
Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive token Dictionary
Transferable Adversarial Attacks on Vision Transformers with token Gradient Regularization
Transformer Compressed Sensing Via Global Image tokens
Transformer RGBT Tracking With Spatio-Temporal Multimodal tokens
Transformer vision-language tracking via proxy token guided cross-modal fusion
Transformer with token attention and attribute prediction for image captioning
Translating Optical Flow into token Matches
Translating Optical Flow into token Matches and Depth from Looming
TS-CAM: token Semantic Coupled Attention Map for Weakly Supervised Object Localization
TS2-Net: token Shift and Selection Transformer for Text-Video Retrieval
TSVT: token Sparsification Vision Transformer for robust RGB-D salient object detection
TTST: A Top-k token Selective Transformer for Remote Sensing Image Super-Resolution
UMIFormer: Mining the Correlations between Similar tokens for Multi-View 3D Reconstruction
Understanding the Effect of using Semantically Meaningful tokens for Visual Representation Learning
Unleashing Transformers: Parallel token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes
Using orientation tokens for object recognition
VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware token Sparsification
vid-TLDR: Training Free token merging for Light-Weight Video Transformer
Video, How do Your tokens Merge?
VidToMe: Video token Merging for Zero-Shot Video Editing
Vista-llama: Reducing Hallucination in Video Language Models via Equal Distance to Visual tokens
VL-Match: Enhancing Vision-Language Pretraining with token-Level and Instance-Level Matching
VLTP: Vision-Language Guided token Pruning for Task-Oriented Segmentation
Which tokens to Use? Investigating Token Reduction in Vision Transformers
Which tokens to Use? Investigating Token Reduction in Vision Transformers
Window token Concatenation for Efficient Visual Large Language Models
Zero-shot 3D Question Answering via Voxel-based Dynamic token Compression
Zero-TPrune: Zero-Shot token Pruning Through Leveraging of the Attention Graph in Pre-Trained Transformers
320 for token

_tokenbinder_
tokenbinder: Text-Video Retrieval with One-to-Many Alignment Paradigm

_tokencompose_
tokencompose: Text-to-Image Diffusion with Token-Level Supervision

_tokencut_
tokencut: Segmenting Objects in Images and Videos With Self-Supervised Transformer and Normalized Cut

_tokenflow_
tokenflow: Unified Image Tokenizer for Multimodal Understanding and Generation

_tokenfocus_
tokenfocus-VQA: Enhancing Text-to-Image Alignment with Position-Aware Focus and Multi-Perspective Aggregations on LVLMs

_tokenhmr_
tokenhmr: Advancing Human Mesh Recovery with a Tokenized Pose Representation

_tokenhpe_
tokenhpe: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers

_tokenhsi_
tokenhsi: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization

_tokenised_
Biohashing: Two factor authentication featuring fingerprint data and tokenised random number
Cancellable biometrics featuring with tokenised random number
Integrated Dual Factor Authenticator Based on the Face Data and tokenised Random Number, An
multi-modal method based on the competitors of FVC2004 and on palm data combined with tokenised random numbers, A

_tokenization_
Beyond local patches: Preserving global-local interactions by enhancing self-attention via 3D point cloud tokenization
Channel-Reduced Transformer With Cross-Region tokenization for Hyperspectral Image Classification
Efficient Long Video tokenization via Coordinate-based Patch Reconstruction
Frame-level Feature tokenization Learning for Human Body Pose and Shape Estimation
Gaussian Segmentation and tokenization for Low Cost Language Identification
GROMA: Localized Visual tokenization for Grounding Multimodal Large Language Models
Language-Guided Image tokenization for Generation
MoST: Multi-modality Scene tokenization for Motion Prediction
MSViT: Dynamic Mixed-scale tokenization for Vision Transformers
Neural Sign Language Translation by Learning tokenization
Prior tokenization-based interactive segmentation with Vision Transformers
Region-native Visual tokenization
Scale-free and unbiased transformer with tokenization for cell type annotation from single-cell RNA-seq data
Scaling Mesh Generation via Compressive tokenization
Segment This Thing: Foveated tokenization for Efficient Point-Prompted Segmentation
TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task tokenization
Tuning Large Language Model for Speech Recognition With Mixed-Scale Re-tokenization
Video Question Answering with Iterative Video-Text Co-tokenization
Video Segmentation and tokenization for Model-Based Video Scene Classification
Vision Transformers with Mixed-Resolution tokenization
Voxel-MPI: Scene-adaptive multiplane images based local voxel tokenization with attention coordination for 3D scene representation
Zero-Shot Sketch-Based Remote-Sensing Image Retrieval Based on Multi-Level and Attention-Guided tokenization
22 for tokenization

_tokenize_
tokenize Anything via Prompting
tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images

_tokenized_
Closed-Loop Supervised Fine-Tuning of tokenized Traffic Models
DistilPose: tokenized Pose Regression with Heatmap Distillation
Regularized Vector Quantization for tokenized Image Synthesis
SDPose: tokenized Pose Estimation via Circulation-Guide Self-Distillation
SSTNet: Saliency sparse transformers network with tokenized dilation for salient object detection
TM2T: Stochastic and tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts
TokenHMR: Advancing Human Mesh Recovery with a tokenized Pose Representation
tokenized Generative Speech Enhancement With Language Model and Flow Matching
8 for tokenized

_tokenizer_
CSLT-AK: Convolutional-embedded transformer with an action tokenizer and keypoint emphasizer for sign language translation
Divot: Diffusion Powers Video tokenizer for Comprehension and Generation
epsilon-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized tokenizer
Finite Scalar Quantization as Facial tokenizer for Dyadic Reaction Generation
H2OT: Hierarchical Hourglass tokenizer for Efficient Video Pose Transformers
Homogeneous tokenizer matters: Homogeneous visual tokenizer for remote sensing image understanding
Homogeneous tokenizer matters: Homogeneous visual tokenizer for remote sensing image understanding
Hourglass tokenizer for Efficient Transformer-Based 3D Human Pose Estimation
Rethinking the Objectives of Vector-Quantized tokenizers for Image Synthesis
Revisiting Kernel Temporal Segmentation as an Adaptive tokenizer for Long-form Video Understanding
SoftVQ-VAE: Efficient 1-Dimensional Continuous tokenizer
StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized tokenizer of a Large-Scale Generative Model
TokenFlow: Unified Image tokenizer for Multimodal Understanding and Generation
What Makes for Good tokenizers in Vision Transformer?
14 for tokenizer

_tokenless_
tokenless Cancelable Biometrics Scheme for Protecting Iris Codes

_tokenmix_
tokenmix: Rethinking Image Mixing for Data Augmentation in Vision Transformers

_tokenmotion_
tokenmotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation

_tokenpacker_
tokenpacker: Efficient Visual Projector for Multimodal LLM

_tokenpose_
tokenpose: Learning Keypoint Tokens for Human Pose Estimation

Index for "t"


Last update: 26-Feb-26 11:52:11
Use price@usc.edu for comments.