_ | masked | _ |
Abnormal Walking Gait Analysis Using Silhouette- | masked | Flow Histograms |
AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with | masked | Autoencoders |
Adaptive spread-transform dither modulation using an improved luminance- | masked | threshold |
ADMM Approach to | masked | Signal Decomposition Using Subspace Representation, An |
AMP: Adaptive | masked | Proxies for Few-Shot Segmentation |
Attention-Guided Contrastive | masked | Image Modeling for Transformer-Based Self-Supervised Learning |
Audiovisual | masked | Autoencoders |
autoSMIM: Automatic Superpixel-Based | masked | Image Modeling for Skin Lesion Segmentation |
Background | masked | Guided Network for Skin Lesion Segmentation in Dermoscopy Image |
Balanced | masked | and Standard Face Recognition |
BirdSAT: Cross-View Contrastive | masked | Autoencoders for Bird Species Classification and Mapping |
Boosting Fairness for | masked | Face Recognition |
Boosting | masked | dominant orientation templates for efficient object detection |
Bootstrapped | masked | Autoencoders for Vision BERT Pretraining |
Can you read lips with a | masked | face? |
CL-MAE: Curriculum-Learned | masked | Autoencoders |
CM-MaskSD: Cross-Modality | masked | Self-Distillation for Referring Image Segmentation |
Comment-Context Dual Collaborative | masked | Transformer Network for Fake News Detection |
Consistent Sub-Decision Network for Low-Quality | masked | Face Recognition |
Continuously | masked | Transformer for Image Inpainting |
Contraction of Dynamically | masked | Deep Neural Networks for Efficient Video Processing |
Contrastive | masked | Autoencoders are Stronger Vision Learners |
ConvNeXt V2: Co-designing and Scaling ConvNets with | masked | Autoencoders |
Convolutional | masked | Image Modeling for Dense Prediction Tasks on Pathology Images |
CXRMIM: | masked | Image Modeling Pre-Training Paradigm for Chest X-Ray Images Analysis |
Deep Covariance Feature and CNN-based End-to-End | masked | Face Recognition |
Defining a Threshold Value for Maximum Spatial Information Loss of | masked | Geo-Data |
Delving into | masked | Autoencoders for Multi-Label Thorax Disease Classification |
Depth estimation of light field data from pinhole- | masked | DSLR cameras |
Depth | masked | Discriminative Correlation Filter |
Detecting | masked | Faces in the Wild with LLE-CNNs |
Detection of Windows in IR Building Textures Using | masked | Correlation |
DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware | masked | Diffusion |
Diffusion Models as | masked | Autoencoders |
Disjoint Masking With Joint Distillation for Efficient | masked | Image Modeling |
Domain Invariant | masked | Autoencoders for Self-supervised Learning from Multi-domains |
DPPMask: | masked | Image Modeling with Determinantal Point Processes |
DropMAE: | masked | Autoencoders with Spatial-Attention Dropout for Tracking Tasks |
Dual | masked | Modeling for Weakly-Supervised Temporal Boundary Discovery |
Dual Sensor Indian | masked | Face Dataset |
Efficient | masked | face identification biometric systems based on ResNet and DarkNet convolutional neural networks |
Efficient Parallel Audio Generation Using Group | masked | Language Modeling |
Empirical Study of End-to-End Video-Language Transformers with | masked | Visual Modeling, An |
End-to-End Dense Video Captioning with | masked | Transformer |
Enhancing HEVC Compressed Videos with a Partition- | masked | Convolutional Neural Network |
epsilon-ViLM: Efficient Video-Language Model via | masked | Video Modeling with Semantic Vector-Quantized Tokenizer |
EVA: Exploring the Limits of | masked | Visual Representation Learning at Scale |
Evaluation of Video | masked | Autoencoders' Performance and Uncertainty Estimations for Driver Action and Intention Recognition |
Example-guided Image Synthesis Using | masked | Spatial-channel Attention and Self-supervision |
Face Bio-Metrics Under COVID: | masked | Face Recognition |
Fast generative adversarial networks model for | masked | image restoration |
Few-Shot Contrastive Transfer Learning With Pretrained Model for | masked | Face Verification |
FlowFormer++: | masked | Cost Volume Autoencoding for Pretraining Optical Flow Estimation |
FocusFace: Multi-task Contrastive Learning for | masked | Face Recognition |
Forecast-MAE: Self-supervised Pre-training for Motion Forecasting with | masked | Autoencoders |
FreMIM: Fourier Transform Meets | masked | Image Modeling for Medical Image Segmentation |
Generating Attribution Maps with Disentangled | masked | Backpropagation |
Generic-to-Specific Distillation of | masked | Autoencoders |
GeoMAE: | masked | Geometric Target Prediction for Self-supervised Point Cloud Pre-Training |
GeoMIM: Towards Better 3D Knowledge Transfer via | masked | Image Modeling for Multi-view 3D Understanding |
Group | masked | Model Learning for General Audio Representation |
Hard Patches Mining for | masked | Image Modeling |
Hierarchical- | masked | Image Filtering for Privacy-Protection |
High-Quality and Diverse Few-Shot Image Generation via | masked | Discrimination |
HQRetouch: Learning Professional Face Retouching Via | masked | Feature Fusion and Semantic-Aware Modulation |
HumanMAC: | masked | Motion Completion for Human Motion Prediction |
Hybrid Graph Convolutional Network With Online | masked | Autoencoder for Robust Multimodal Cancer Survival Prediction |
Hypersphere guided embedding for | masked | face recognition |
Image fusion method based on spatially | masked | convolutional sparse representation |
Improve Unsupervised Deep Hashing Via | masked | Contrastive Learning |
Improved | masked | Image Generation with Token-Critic |
Improving Adversarial Robustness of | masked | Autoencoders via Test-time Frequency-domain Prompting |
Improving Representation Consistency with Pairwise Loss for | masked | Face Recognition |
Indian | masked | Faces in the Wild Dataset |
Inter-Modal | masked | Autoencoder for Self-Supervised Learning on Point Clouds |
Iterative Robust Visual Grounding with | masked | Reference based Centerpoint Supervision |
JDSR-GAN: Constructing an Efficient Joint Learning Network for | masked | Face Super-Resolution |
Large-Scale Isolated Gesture Recognition Using a Refined Fused Model Based on | masked | Res-C3D Network and Skeleton LSTM |
LAVENDER: Unifying Video-Language Understanding as | masked | Language Modeling |
LC-MSM: Language-Conditioned | masked | Segmentation Model for unsupervised domain adaptation |
Learnable EVC Intra Predictor Using | masked | Convolutions, A |
Learning 3D Representations from 2D Pre-Trained Models via Image-to-Point | masked | Autoencoders |
Learning from the Web: Webly Supervised Meta-Learning for | masked | Face Recognition |
Learning upper patch attention using dual-branch training strategy for | masked | face recognition |
LEMaRT: Label-Efficient | masked | Region Transform for Image Harmonization |
Limited Data, Unlimited Potential: A Study on ViTs Augmented by | masked | Autoencoders |
Localization using Multi-Focal Spatial Attention for | masked | Face Recognition |
M33D: Learning 3D priors using Multi-Modal | masked | Autoencoders for 2D image and video understanding |
MAE, | masked | Autoencoder |
MAELi: | masked | Autoencoder for Large-Scale LiDAR Point Clouds |
MAESTER: | masked | Autoencoder Guided Segmentation at Pixel Resolution for Accurate, Self-Supervised Subcellular Structure Recognition |
MAGAN: A | masked | autoencoder generative adversarial network for processing missing IoT sequence data |
MAGE: | masked | Generative Encoder to Unify Representation Learning and Image Synthesis |
MAGVIT: | masked | Generative Video Transformer |
MAGVLT: | masked | Generative Vision-and-Language Transformer |
MAR: | masked | Autoencoders for Efficient Action Recognition |
MARLIN: | masked | Autoencoder for facial video Representation LearnINg |
Mask Aware Network for | masked | Face Recognition in the Wild |
Mask-ShadowNet: Toward Shadow Removal via | masked | Adaptive Instance Normalization |
Mask3D: Pretraining 2D Vision Transformers by Learning | masked | 3D Priors |
MaskCLIP: | masked | Self-Distillation Advances Contrastive Language-Image Pretraining |
MaskCon: | masked | Contrastive Learning for Coarse-Labelled Dataset |
| masked | and Adaptive Transformer for Exemplar Based Image Translation |
| masked | and Permuted Implicit Context Learning for Scene Text Recognition |
| masked | Auto-Encoders Meet Generative Adversarial Networks and Beyond |
| masked | Autoencoder for Self-Supervised Pre-training on Lidar Point Clouds |
| masked | Autoencoders are Efficient Class Incremental Learners |
| masked | Autoencoders Are Scalable Vision Learners |
| masked | Autoencoders Are Stronger Knowledge Distillers |
| masked | Autoencoders Enable Efficient Knowledge Distillers |
| masked | Autoencoders for Point Cloud Self-Supervised Learning |
| masked | Autoencoding Does Not Help Natural Language Supervision at Scale |
| masked | Batch Normalization to Improve Tracking-Based Sign Language Recognition Using Graph Convolutional Networks |
| masked | Collaborative Contrast for Weakly Supervised Semantic Segmentation |
| masked | Conditional Variational Autoencoders for Chromosome Straightening |
| masked | Contrastive Representation Learning for Reinforcement Learning |
| masked | Diffusion Transformer is a Strong Image Synthesizer |
| masked | Discrimination for Self-supervised Learning on Point Clouds |
| masked | Embedding Modeling With Rapid Domain Adjustment for Few-Shot Image Classification |
| masked | Event Modeling: Self-Supervised Pretraining for Event Cameras |
| masked | Face Recognition Challenge: The InsightFace Track Report |
| masked | Face Recognition Datasets and Validation |
| masked | Face Recognition via Self-Attention Based Local Consistency Regularization |
| masked | face recognition: Human versus machine |
| masked | Faces with Faced Masks |
| masked | fake face detection using radiance measurements |
| masked | Feature Prediction for Self-Supervised Visual Pre-Training |
| masked | FFT registration |
| masked | Generative Distillation |
| masked | Graph Convolutional Network for Small Sample Classification of Hyperspectral Images |
| masked | Image Modeling |
| masked | Image Modeling Advances 3D Medical Image Analysis |
| masked | Image Modeling with Local Multi-Scale Reconstruction |
| masked | Image Training for Generalizable Deep Image Denoising |
| masked | Images Are Counterfactual Samples for Robust Fine-Tuning |
| masked | Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers |
| masked | Label Learning for Optical Flow Regression |
| masked | Linear Regression for Learning Local Receptive Fields for Facial Expression Synthesis |
| masked | Motion Encoding for Self-Supervised Video Representation Learning |
| masked | Motion Predictors are Strong 3D Action Representation Learners |
| masked | Object Registration in the Fourier Domain |
| masked | Representation Learning for Domain Generalized Stereo Matching |
| masked | Retraining Teacher-Student Framework for Domain Adaptive Object Detection |
| masked | Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning |
| masked | Siamese Networks for Label-Efficient Learning |
| masked | SIFT with align-based refinement for contactless palmprint recognition |
| masked | Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos |
| masked | Spectral Bands Modeling with Shifted Windows: An Excellent Self-Supervised Learner for Classification of Medical Hyperspectral Images |
| masked | Spiking Transformer |
| masked | Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| masked | Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| masked | Vision Transformers for Hyperspectral Image Classification |
| masked | Wavelet Representation for Compact Neural Radiance Fields |
| masked | -attention Mask Transformer for Universal Image Segmentation |
| masked | FaceNet: A Progressive Semi-Supervised Masked Face Detector |
MaskFaceGAN: High-Resolution Face Editing With | masked | GAN Latent Code Optimization |
MaskGIT: | masked | Generative Image Transformer |
MaskOut: A Data Augmentation Method for | masked | Face Recognition |
MaskSketch: Unpaired Structure-guided | masked | Image Generation |
MATE: | masked | Autoencoders are Online 3D Test-Time Learners |
MeshMAE: | masked | Autoencoders for 3D Mesh Data Analysis |
MGM-AE: Self-Supervised Learning on 3D Shape Using Mesh Graph | masked | Autoencoders |
MGMAE: Motion Guided Masking for Video | masked | Autoencoding |
MIC: | masked | Image Consistency for Context-Enhanced Domain Adaptation |
MixMAE: Mixed and | masked | Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers |
MM-3DScene: 3D Scene Understanding by Customizing | masked | Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency |
Mobile based Human Identification using Forehead Creases: Application and Assessment under COVID-19 | masked | Face Scenarios |
MOT: | masked | Optimal Transport for Partial Domain Adaptation |
MRM: | masked | Relation Modeling for Medical Image Pre-Training with Genetics |
Multi-Dataset Benchmarks for | masked | Identification using Contrastive Representation Learning |
Multi-modal Facial Affective Analysis based on | masked | Autoencoder |
Multi-modal | masked | Pre-training for Monocular Panoramic Depth Completion |
MultiMAE: Multi-modal Multi-task | masked | Autoencoders |
Multimodal Channel-Mixing: Channel and Spatial | masked | AutoEncoder on Facial Action Unit Detection |
Multiple Instance Learning Framework with | masked | Hard Instance Mining for Whole Slide Image Classification |
MV-JAR: | masked | Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training |
Neural Image Compression Using | masked | Sparse Visual Representation |
Not All Image Regions Matter: | masked | Vector Quantization for Autoregressive Image Generation |
OmniMAE: Single Model | masked | Pretraining on Images and Videos |
On Data Scaling in | masked | Image Modeling |
Open Set | masked | Face Identification system |
Pain Detection in | masked | Faces during Procedural Sedation |
Paper Fingerprinting Using alpha- | masked | Image Matching |
Partial Attack Supervision and Regional Weighted Inference for | masked | Face Presentation Attack Detection |
Periocular biometrics and its relevance to partially | masked | faces: A survey |
Personalized Image Enhancement Featuring | masked | Style Modeling |
PiMAE: Point Cloud and Image Interactive | masked | Autoencoders for 3D Object Detection |
PMatch: Paired | masked | Image Modeling for Dense Geometric Matching |
Point Cloud Domain Adaptation via | masked | Local 3D Structure Prediction |
Point-BERT: Pre-training 3D Point Cloud Transformers with | masked | Point Modeling |
Positive Unlabeled Fake News Detection via Multi-Modal | masked | Transformer Network |
Predicting Spatiotemporal Demand of Dockless E-Scooter Sharing Services with a | masked | Fully Convolutional Network |
Pyramid | masked | Image Modeling for Transformer-Based Aerial Object Detection |
Real masks and spoof faces: On the | masked | face presentation attack detection |
Reconstructing Randomly | masked | Spectra Helps DNNs Identify Discriminant Wavenumbers |
Representation Learning for Visual Object Tracking by | masked | Appearance Transfer |
ResSaNet: A Hybrid Backbone of Residual Block and Self-Attention Module for | masked | Face Recognition |
Rethinking Out-of-distribution (OOD) Detection: | masked | Image Modeling is All You Need |
Revealing the Dark Secrets of | masked | Image Modeling |
RILS: | masked | Visual Reconstruction in Language Semantic Space |
Ring- | masked | Attention Network for Rotation-Invariant Template-Matching |
Robust Lane Detection Through Self Pre-Training With | masked | Sequential Autoencoders and Fine-Tuning With Customized PolyLoss |
Robust Multiview Multimodal Driver Monitoring System Using | masked | Multi-Head Self-Attention |
Robust Star Identification Algorithm Based on a | masked | Distance Map, A |
Rotated and | masked | Image Modeling: A Superior Self-Supervised Method for Classification |
rPPG-MAE: Self-Supervised Pretraining With | masked | Autoencoders for Remote Physiological Measurements |
Scale-MAE: A Scale-Aware | masked | Autoencoder for Multiscale Geospatial Representation Learning |
SdAE: Self-distillated | masked | Autoencoder |
SeaMAE: | masked | Pre-Training with Meteorological Satellite Imagery for Sea Fog Detection |
Seeing Beyond the Brain: Conditional Diffusion Model with Sparse | masked | Modeling for Vision Decoding |
Self-restrained triplet loss for accurate | masked | face recognition |
Self-Supervised Learning for Visual Relationship Detection through | masked | Bounding Box Reconstruction |
Self-Supervised Learning with | masked | Autoencoders for Teeth Segmentation from Intra-oral 3D Scans |
Self-Supervised Learning with | masked | Image Modeling for Teeth Numbering, Detection of Dental Restorations, and Instance Segmentation in Dental Panoramic Radiographs |
Self-Supervised | masked | Convolutional Transformer Block for Anomaly Detection |
Self-Supervised Multi-Scale Cropping and Simple | masked | Attentive Predicting for Lung CT-Scan Anomaly Detection |
Self-Supervised Pre-Training with | masked | Shape Prediction for 3D Scene Understanding |
SeMask: Semantically | masked | Transformers for Semantic Segmentation |
SimMIM: a Simple Framework for | masked | Image Modeling |
SkeletonMAE: Graph-based | masked | Autoencoder for Skeleton Sequence Pre-training |
SMAE: Few-shot Learning for HDR Deghosting with Saturation-Aware | masked | Autoencoders |
SMART: Semantic-Aware | masked | Attention Relational Transformer for Multi-label Image Recognition |
SMAUG: Sparse | masked | Autoencoder for Efficient Video-Language Pre-training |
SparseMAE: Sparse Training Meets | masked | Autoencoders |
Spectral Analysis of | masked | Signals in the Context of Image Inpainting |
Stare at What You See: | masked | Image Modeling without Reconstruction |
Stealthy Physical | masked | Face Recognition Attack via Adversarial Style Optimization |
Subgraph and object context- | masked | network for scene graph generation |
Supervised | masked | Knowledge Distillation for Few-Shot Transformers |
Swin-CasUNet: Cascaded U-Net with Swin Transformer for | masked | Face Restoration |
Target tracker with | masked | discriminative correlation filter |
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal | masked | Video Generation |
Text-Conditioned Sampling Framework for Text-to-Image Generation with | masked | Generative Models |
Towards NIR-VIS | masked | Face Recognition |
Traj-MAE: | masked | Autoencoders for Trajectory Prediction |
TranPhys: Spatiotemporal | masked | Transformer Steered Remote Photoplethysmography Estimation |
Understanding | masked | Autoencoders via Hierarchical Latent Variable Models |
Understanding | masked | Image Modeling via Learning Occlusion Invariant Feature |
Unified Framework for | masked | and Mask-Free Face Recognition Via Feature Rectification, A |
Unleashing Vanilla Vision Transformer with | masked | Image Modeling for Object Detection |
Unmasking Your Expression: Expression-Conditioned GAN for | masked | Face Inpainting |
VideoMAE V2: Scaling Video | masked | Autoencoders with Dual Masking |
Voice Conversion Using Learnable Similarity-guided | masked | Autoencoder |
Weakly-Supervised Multiple Object Tracking Via A | masked | Center Point Warping Loss |
What to Hide from Your Students: Attention-Guided | masked | Image Modeling |
245 for masked