Paper ID | Paper Title | Category |
267 | Quaternion Equivariant Capsule Networks for 3D Point Clouds | Oral |
283 | DeepFit: 3D Surface Fitting by Neural Network Weighted Least Squares | Oral |
343 | MoSaNAS: Multi-Objective Surrogate-Assisted Neural Architecture Search | Oral |
384 | Describing Textures using Natural Language | Oral |
410 | Empowering Relational Network by Self-Attention Augmented Conditional Random Fields for Group Activity Recognition | Oral |
445 | AiR: Attention with Reasoning Capability | Oral |
500 | Self6D: Self-Supervised Monocular 6D Object Pose Estimation | Oral |
529 | Invertible Image Rescaling | Oral |
612 | Synthesize then Compare: Detecting Failures and Anomalies for Semantic Segmentation | Oral |
677 | House-GAN: Relational Generative Adversarial Networks for Graph-constrained House Layout Generation | Oral |
736 | Crowdsampling the Plenoptic Function | Oral |
738 | End-to-End Estimation of Multi-Person 3D Poses from Multiple Cameras | Oral |
832 | End-to-End Object Detection with Transformers | Oral |
840 | DeepSFM: Structure From Motion Via Deep Bundle Adjustment | Oral |
1044 | Ladybird: Deep Implicit Field Based 3D Reconstruction with Sampling and Symmetry | Oral |
1059 | Segment as Points for Efficient Online Multi-Object Tracking and Segmentation | Oral |
1105 | Conditional Convolutions for Instance Segmentation | Oral |
1196 | MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution | Oral |
1203 | Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset | Oral |
1273 | Privacy Preserving Structure-from-Motion | Oral |
1326 | Rewriting a Deep Generative Model | Oral |
1417 | Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets | Oral |
1448 | Long-term Human Motion Prediction with Scene Context | Oral |
1473 | NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis | Oral |
1501 | ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes | Oral |
1737 | MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images | Oral |
1793 | Learning and aggregating deep local descriptors for instance-level recognition | Oral |
1969 | A Consistently Fast and Globally Optimal Solution to the Perspective-n-Point Problem | Oral |
2096 | Learn to Recover Visible Color for Video Surveillance in a Day | Oral |
2149 | Deep Fashion3D: A Dataset and Benchmark for 3D Garment Reconstruction from Single-view Images | Oral |
2193 | Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation | Oral |
2211 | BorderDet: Border Feature for Dense Object Detection | Oral |
2258 | Regularization with Latent Space Virtual Adversarial Training | Oral |
2263 | Du$^2$Net: Learning Depth Estimation from Dual-Cameras and Dual-Pixels | Oral |
2307 | Model-Agnostic Boundary-Adversarial Sampling for Test-Time Generalization in Few-Shot learning | Oral |
2463 | Targeted Attack for Deep Hashing based Retrieval | Oral |
2471 | Gradient Centralization: A New Optimization Technique for Deep Neural Networks | Oral |
2503 | Content-Aware Unsupervised Deep Homography Estimation | Oral |
2556 | Multi-View Optimization of Local Feature Geometry | Oral |
2597 | Efficient Model Fitting by Combining Lifted Optimization with Phong Surface Models | Oral |
2641 | Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video | Oral |
2683 | Learning Stereo from Single Images | Oral |
2748 | Prototype Rectification for Few-Shot Learning | Oral |
2784 | Learning Feature Descriptors using Camera Pose Supervision | Oral |
2785 | Semantic Flow for Fast and Accurate Scene Parsing | Oral |
2788 | Appearance Consensus Driven Self-Supervised Human Mesh Recovery | Oral |
2825 | Diffraction Line Imaging | Oral |
2834 | Aligning and Projecting Images to Class-conditional Generative Networks | Oral |
2852 | Suppress and Balance: A Simple Gated Network for Salient Object Detection | Oral |
2904 | Visual Memorability for Robotic Interestingness Prediction via Unsupervised Online Learning | Oral |
2949 | Post-Training Piecewise Linear Quantization for Deep Neural Networks | Oral |
2974 | Joint Disentangling and Adaptation for Cross-Domain Person Re-Identification | Oral |
2978 | In-Home Daily-Life Captioning Using Radio Signals | Oral |
3018 | Self-Challenging Improves Cross-Domain Generalization | Oral |
3029 | A Competence-aware Curriculum for Visual Concepts Learning via Question Answering | Oral |
3047 | Multi-task Learning Increases Adversarial Robustness | Oral |
3054 | S2DNAS: Transforming Static CNN Model for Dynamic Inference via Neural Architecture Search | Oral |
3112 | Improving Deep Video Compression by Resolution-adaptive Flow Coding | Oral |
3158 | Motion Capture from Internet Videos | Oral |
3183 | Appearance-Preserving 3D Convolution for Video-based Person Re-identification | Oral |
3241 | Solving the Blind Perspective-n-Point Problem End-To-End With Robust Differentiable Geometric Optimization | Oral |
3265 | Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation | Oral |
3312 | Deep Spatial-angular Regularization for Compressive Light Field Reconstruction over Coded Apertures | Oral |
3331 | Video-based Remote Physiological Measurement via Cross-verified Feature Disentangling | Oral |
3356 | Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction | Oral |
3376 | Orientation-aware Vehicle Re-identification with Semantics-guided Part Attention Network | Oral |
3387 | Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation | Oral |
3439 | Coherent full scene 3D reconstruction from a single RGB image | Oral |
3482 | Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of DNNs | Oral |
3526 | RAFT: Recurrent All-Pairs Field Transforms for Optical Flow | Oral |
3528 | Domain-invariant Stereo Matching Networks | Oral |
3538 | DeepHandMesh: Weakly-supervised Deep Encoder-Decoder Framework for High-fidelity Hand Mesh Modeling from a Single RGB Image | Oral |
3544 | Content Adaptive and Error Propagation Aware Deep Video Compression | Oral |
3553 | Towards Streaming Image Understanding | Oral |
3570 | Towards Automated Testing and Robustification by Semantic Adversarial Data Generation | Oral |
3582 | Adversarial Generative Grammars for Human Activity Prediction | Oral |
3587 | Greedy Sampler and Dumb Learner: A Surprisingly Effective Approach for Continual Learning | Oral |
3622 | Learning Lane Graph Representations for Motion Forecasting | Oral |
3651 | What Matters in Unsupervised Optical Flow | Oral |
3678 | Synthesis and Completion of Facades from Satellite Imagery | Oral |
3772 | Mapillary Planet-Scale Depth Dataset | Oral |
3838 | V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction | Oral |
3891 | Training Interpretable Convolutional Neural Networks by Differentiating Class-specific Filters | Oral |
3948 | EagleEye: Fast Sub-net Evaluation for Efficient Neural Network Pruning | Oral |
3975 | Intrinsic Point Cloud Interpolation via Dual Latent Space Navigation | Oral |
3976 | Cross-Domain Cascaded Deep Translation | Oral |
4043 | "Look Ma, no landmarks!" - Unsupervised, model-based dense face alignment | Oral |
4158 | Online Invariance Selection for Local Feature Descriptors | Oral |
4179 | Rethinking image inpainting via a mutual encoder-decoder with feature equalization | Oral |
4358 | TextCaps: a Dataset for Image Captioning with Reading Comprehension | Oral |
4423 | It is not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction | Oral |
4440 | Learning What to Learn for Video Object Segmentation | Oral |
4732 | SIZER: A Dataset and Model for Parsing 3D Clothing and Learning Size Sensitive 3D Clothing | Oral |
4866 | LIMP: Learning Latent Shape Representations with Metric Preservation Priors | Oral |
5277 | Unsupervised Sketch-to-Photo Synthesis | Oral |
5360 | A simple way to make neural networks robust against diverse image corruptions | Oral |
5457 | SoftpoolNet: Shape Descriptor for Point Cloud Completion and Classification | Oral |
5800 | Hierarchical Face Aging through Disentangled Latent Characteristics | Oral |
5859 | Hybrid Models for Open Set Recognition | Oral |
5932 | TopoGAN: A Topology-Aware Generative Adversarial Network | Oral |
6101 | Learning to Localize Actions from Moments | Oral |
6147 | ForkGAN: Seeing into the Rainy Night | Oral |
6209 | TCGM: An Information-Theoretic Framework for Semi-Supervised Multi-Modality Learning | Oral |
6502 | ExchNet: A Unified Hashing Network for Large-Scale Fine-Grained Image Retrieval | Oral |
22 | A Simple and Versatile Framework for Image-to-Image Translation | Spotlight |
43 | ProxyBNN: Learning Binarized Neural Networks via Proxy Matrices | Spotlight |
87 | Fair Attribute Classification through Latent Space De-biasing | Spotlight |
148 | HMOR: Hierarchical Multi-person Ordinal Relations for Monocular Multi-Person 3D Pose Estimation | Spotlight |
193 | Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve | Spotlight |
223 | A Unified Framework of Surrogate Loss by Refactorization and Interpolation | Spotlight |
362 | Deep Reflectance Volumes: Relightable Reconstructions from Multi-View Photometric Images | Spotlight |
366 | Memory-augmented Dense Predictive Coding for Video Representation Learning | Spotlight |
378 | PointMixup: Augmentation for Point Clouds | Spotlight |
415 | Identity-Guided Human Semantic Parsing Learning for Person Re-Identification | Spotlight |
462 | Learning Gradient Fields for Shape Generation | Spotlight |
467 | Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder | Spotlight |
492 | Corner Proposal Network for Anchor-free, Two-stage Object Detection | Spotlight |
495 | PhraseClick: Toward Achieving Flexible Interactive Segmentation by Phrase and Click | Spotlight |
513 | Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing | Spotlight |
526 | Learning Delicate Local Representations for Multi-Person Pose Estimation | Spotlight |
544 | Learning to plan with uncertain topological maps | Spotlight |
574 | Neural Design Network: Graphic Layout Generation with Constraints | Spotlight |
591 | Learning Open Set Network with Discriminative Reciprocal Points | Spotlight |
597 | Convolutional Occupancy Networks | Spotlight |
672 | Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry | Spotlight |
849 | A General Toolbox for Understanding Errors in Object Detection | Spotlight |
893 | PointContrast: Unsupervised Pretraining for 3D Point Cloud Understanding | Spotlight |
922 | DSA: More Efficient Budgeted Pruning via Differentiable Sparsity Allocation | Spotlight |
990 | Circumventing Outliers of AutoAugment with Knowledge Distillation | Spotlight |
997 | S2DNet: Learning accurate correspondences for sparse-to-dense feature matching | Spotlight |
1054 | RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving | Spotlight |
1062 | Video Object Segmentation with Graph Memory Network | Spotlight |
1101 | Rethinking Bottleneck Structure for Efficient Mobile Network Design | Spotlight |
1104 | Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks | Spotlight |
1121 | Towards Part-aware Monocular 3D Human Pose Estimation: An Architecture Search Approach | Spotlight |
1207 | A Tool for Measuring and Mitigating Bias in Visual Datasets | Spotlight |
1327 | Contrastive Learning for Weakly Supervised Phrase Grounding | Spotlight |
1362 | Collaborative Learning of Gesture Recognition and 3D Hand Pose Estimation with Multi-Order Feature Analysis | Spotlight |
1425 | Studying the Transferability of Adv |