This column collects computer vision papers. Date: June 22, 2021. Source: Paper Digest.
1, TITLE: One Million Scenes for Autonomous Driving: ONCE Dataset
AUTHORS: Jiageng Mao et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we introduce the ONCE (One millioN sCenEs) dataset for 3D object detection in the autonomous driving scenario. To facilitate future research on exploiting unlabeled data for 3D detection, we additionally provide a benchmark in which we reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
2, TITLE: Practical Transferability Estimation for Image Classification Tasks
AUTHORS: Yang Tan ; Yang Li ; Shao-Lun Huang
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: Consequently, we propose a practical transferability metric called JC-NCE score to further improve the cross-domain cross-task transferability estimation performance, which is more efficient than the OTCE score and more accurate than the OT-based NCE score.
3, TITLE: Informative Class Activation Maps
AUTHORS: Zhenyue Qin ; Dongwoo Kim ; Tom Gedeon
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: We study how to evaluate the quantitative information content of a region within an image for a particular label.
4, TITLE: Unbalanced Feature Transport for Exemplar-based Image Translation
AUTHORS: Fangneng Zhan et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper presents a general image translation framework that incorporates optimal transport for feature alignment between conditional inputs and style exemplars in image translation.
5, TITLE: CompConv: A Compact Convolution Module for Efficient Feature Learning
AUTHORS: Chen Zhang ; Yinghao Xu ; Yujun Shen
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we make a close study of the convolution operator, which is the basic unit used in CNNs, to reduce its computing load.
6, TITLE: CenterAtt: Fast 2-stage Center Attention Network
AUTHORS: Jianyun Xu ; Xin Tang ; Jian Dou ; Xu Shu ; Yushi Zhu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this technical report, we introduce the methods of HIKVISION_LiDAR_Det in the Waymo Open Dataset real-time 3D detection challenge.
7, TITLE: CUDA-GR: Controllable Unsupervised Domain Adaptation for Gaze Redirection
AUTHORS: Swati Jindal ; Xin Eric Wang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose an unsupervised domain adaptation framework, called CUDA-GR, that learns to disentangle gaze representations from the labeled source domain and transfers them to an unlabeled target domain.
8, TITLE: Robust Pooling Through The Data Mode
AUTHORS: Ayman Mukhaimar ; Ruwan Tennakoon ; Chow Yin Lai ; Reza Hoseinnezhad ; Alireza Bab-Hadiashar
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper proposes a deep learning solution that includes a novel robust pooling layer which greatly enhances network robustness and performs significantly faster than state-of-the-art approaches.
9, TITLE: Moving in A 360 World: Synthesizing Panoramic Parallaxes from A Single Panorama
AUTHORS: Ching-Yu Hsu ; Cheng Sun ; Hwann-Tzong Chen
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present Omnidirectional Neural Radiance Fields (OmniNeRF), the first method applied to parallax-enabled novel panoramic view synthesis.
10, TITLE: Multi-VAE: Learning Disentangled View-common and View-peculiar Visual Representations for Multi-view Clustering
AUTHORS: Jie Xu et al.
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: To address this issue, we present a novel VAE-based multi-view clustering framework (Multi-VAE) by learning disentangled visual representations.
11, TITLE: Dynamical Deep Generative Latent Modeling of 3D Skeletal Motion
AUTHORS: Amirreza Farnoosh ; Sarah Ostadabbas
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a Bayesian switching dynamical model for segmentation of 3D pose data over time that uncovers interpretable patterns in the data and is generative.
12, TITLE: Interpretable Face Manipulation Detection Via Feature Whitening
AUTHORS: Yingying Hua ; Daichi Zhang ; Pengju Wang ; Shiming Ge
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose an interpretable face manipulation detection approach to achieve trustworthy and accurate inference.
13, TITLE: Trainable Class Prototypes for Few-Shot Learning
AUTHORS: Jianyi Li ; Guizhong Liu
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: In this paper, we propose trainable prototypes for the distance measure, instead of artificial ones, within the meta-training and task-training framework.
14, TITLE: Temporal Early Exits for Efficient Video Object Detection
AUTHORS: Amin Sabet ; Jonathon Hare ; Bashir Al-Hashimi ; Geoff V. Merrett
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose temporal early exits to reduce the computational complexity of per-frame video object detection.
15, TITLE: PIANO: A Parametric Hand Bone Model from Magnetic Resonance Imaging
AUTHORS: Yuwei Li ; Minye Wu ; Yuyao Zhang ; Lan Xu ; Jingyi Yu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we present PIANO, the first parametric bone model of human hands from MRI data.
16, TITLE: Remote Sensing Images Semantic Segmentation with General Remote Sensing Vision Model Via A Self-Supervised Contrastive Learning Method
AUTHORS: Haifeng Li et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Therefore, we propose Global style and Local matching Contrastive Learning Network (GLCNet) for remote sensing semantic segmentation.
17, TITLE: Attack to Fool and Explain Deep Networks
AUTHORS: Naveed Akhtar ; Muhammad A. A. K. Jalwana ; Mohammed Bennamoun ; Ajmal Mian
CATEGORY: cs.CV [cs.CV, cs.AI, cs.CR, cs.LG]
HIGHLIGHT: In all, our major contribution is a novel pragmatic adversarial attack that is subsequently transformed into a tool to interpret the visual models.
18, TITLE: ReGO: Reference-Guided Outpainting for Scenery Image
AUTHORS: Yaxiong Wang ; Yunchao Wei ; Xueming Qian ; Li Zhu ; Yi Yang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We aim to tackle the challenging yet practical scenery image outpainting task in this work.
19, TITLE: Neighborhood Contrastive Learning for Novel Class Discovery
AUTHORS: Zhun Zhong et al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: In this paper, we address Novel Class Discovery (NCD), the task of unveiling new classes in a set of unlabeled samples given a labeled dataset with known classes.
20, TITLE: Exploring Semantic Relationships for Unpaired Image Captioning
AUTHORS: Fenglin Liu ; Meng Gao ; Tianhao Zhang ; Yuexian Zou
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we achieve unpaired image captioning by bridging the vision and the language domains with high-level semantic information.
21, TITLE: Mobile Sensing for Multipurpose Applications in Transportation
AUTHORS: Armstrong Aboah ; Michael Boeding ; Yaw Adu-Gyamfi
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Recent advancements in the sensors integrated into smartphones have resulted in a more affordable method of data collection. The primary objective of this study is to develop and implement a smartphone application for data collection. The currently designed app consists of three major modules: a frontend graphical user interface (GUI), a sensor module, and a backend module.
22, TITLE: More Than Encoder: Introducing Transformer Decoder to Upsample
AUTHORS: Yijiang Li ; Wentian Cai ; Ying Gao ; Xiping Hu
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: In this paper, we present a new upsampling approach, Attention Upsample (AU), that can serve as a general upsampling method and be incorporated into any segmentation model that has lateral connections.
23, TITLE: Augmented 2D-TAN: A Two-stage Approach for Human-centric Spatio-Temporal Video Grounding
AUTHORS: Chaolei Tan ; Zihang Lin ; Jian-Fang Hu ; Xiang Li ; Wei-Shi Zheng
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose an effective two-stage approach to tackle the problem of language-based Human-centric Spatio-Temporal Video Grounding (HC-STVG) task.
24, TITLE: FloorPP-Net: Reconstructing Floor Plans Using Point Pillars for Scan-to-BIM
AUTHORS: Yijie Wu ; Fan Xue
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper presents a deep learning-based point cloud processing method named FloorPP-Net for the task of Scan-to-BIM (building information model).
25, TITLE: CAMERAS: Enhanced Resolution And Sanity Preserving Class Activation Mapping for Image Saliency
AUTHORS: Mohammad A. A. K. Jalwana ; Naveed Akhtar ; Mohammed Bennamoun ; Ajmal Mian
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: We propose CAMERAS, a technique to compute high-fidelity backpropagation saliency maps without requiring any external priors and preserving the map sanity.
26, TITLE: TCIC: Theme Concepts Learning Cross Language and Vision for Image Captioning
AUTHORS: Zhihao Fan et al.
CATEGORY: cs.CV [cs.CV, cs.CL]
HIGHLIGHT: In this paper, we propose a Theme Concepts extended Image Captioning (TCIC) framework that incorporates theme concepts to represent high-level cross-modality semantics.
27, TITLE: CLIP2Video: Mastering Video-Text Retrieval Via Image CLIP
AUTHORS: Han Fang ; Pengfei Xiong ; Luhui Xu ; Yu Chen
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present CLIP2Video network to transfer the image-language pre-training model to video-text retrieval in an end-to-end manner.
28, TITLE: Obstacle Detection for BVLOS Drones
AUTHORS: Jan Moros Esteban
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: A deep-learning-powered object detection method is the subject of our research, and various experiments are conducted to maximize its performance, such as comparing different data augmentation techniques and comparing YOLOv3 with YOLOv5.
29, TITLE: Hard Hat Wearing Detection Based on Head Keypoint Localization
AUTHORS: Bartosz Wójcik ; Mateusz Żarski ; Kamil Książek ; Jarosław Adam Miszczak ; Mirosław Jan Skibniewski
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG, cs.NE]
HIGHLIGHT: To address this problem, a combination of deep learning, object detection, and head keypoint localization with simple rule-based reasoning is proposed in this article.
30, TITLE: Distilling Effective Supervision for Robust Medical Image Segmentation with Noisy Labels
AUTHORS: Jialin Shi ; Ji Wu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose a novel framework to address segmenting with noisy labels by distilling effective supervision information from both pixel and image levels.
31, TITLE: Surgical Data Science for Safe Cholecystectomy: A Protocol for Segmentation of Hepatocystic Anatomy and Assessment of The Critical View of Safety
AUTHORS: Pietro Mascagni et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Here, we present a protocol, checklists, and visual examples to promote consistent annotation of hepatocystic anatomy and CVS criteria.
32, TITLE: Neural Marching Cubes
AUTHORS: Zhiqin Chen ; Hao Zhang
CATEGORY: cs.CV [cs.CV, cs.GR, cs.LG]
HIGHLIGHT: We introduce Neural Marching Cubes (NMC), a data-driven approach for extracting a triangle mesh from a discretized implicit field.
33, TITLE: Unsupervised Deep Learning By Injecting Low-Rank and Sparse Priors
AUTHORS: Tomoya Sakai
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: We focus on employing sparsity-inducing priors in deep learning to encourage the network to concisely capture the nature of high-dimensional data in an unsupervised way.
34, TITLE: Cross-layer Navigation Convolutional Neural Network for Fine-grained Visual Classification
AUTHORS: Chenyu Guo ; Jiyang Xie ; Kongming Liang ; Xian Sun ; Zhanyu Ma
CATEGORY: cs.CV [cs.CV, 14J60 (Primary) 14F05, 14J26 (Secondary), F.2.2; I.2.7]
HIGHLIGHT: Therefore, in this paper, we propose a cross-layer navigation convolutional neural network for feature fusion.
35, TITLE: OadTR: Online Action Detection with Transformers
AUTHORS: Xiang Wang et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a new encoder-decoder framework based on Transformers, named OadTR, to tackle these problems.
36, TITLE: The Arm-Swing Is Discriminative in Video Gait Recognition for Athlete Re-Identification
AUTHORS: Yapkan Choi ; Yeshwanth Napolean ; Jan C. van Gemert
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper we evaluate running gait as an attribute for video person re-identification in a long-distance running event.
37, TITLE: FP-Age: Leveraging Face Parsing Attention for Facial Age Estimation in The Wild
AUTHORS: Yiming Lin ; Jie Shen ; Yujiang Wang ; Maja Pantic
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To address this issue, we propose a simple yet effective method to explicitly incorporate facial semantics into age estimation, so that the model would learn to correctly focus on the most informative facial components from unaligned facial images regardless of head pose and non-rigid deformation. To evaluate our method on in-the-wild data, we also introduce a new challenging large-scale benchmark called IMDB-Clean.
38, TITLE: VIMPAC: Video Pre-Training Via Masked Token Prediction and Contrastive Learning
AUTHORS: Hao Tan ; Jie Lei ; Thomas Wolf ; Mohit Bansal
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: To deal with this issue, we propose a block-wise masking strategy where we mask neighboring video tokens in both spatial and temporal domains.
39, TITLE: Automatic Plant Cover Estimation with Convolutional Neural Networks
AUTHORS: Matthias Körschens et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To remedy these caveats, we investigate approaches using convolutional neural networks (CNNs) to automatically extract the relevant data from images, focusing on plant community composition and species coverages of 9 herbaceous plant species.
40, TITLE: An End-to-End Khmer Optical Character Recognition Using Sequence-to-Sequence with Attention
AUTHORS: Rina Buoy ; Sokchea Kor ; Nguonly Taing
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper presents an end-to-end deep convolutional recurrent neural network solution for the Khmer optical character recognition (OCR) task.
41, TITLE: Total Generate: Cycle in Cycle Generative Adversarial Networks for Generating Human Faces, Hands, Bodies, and Natural Scenes
AUTHORS: Hao Tang ; Nicu Sebe
CATEGORY: cs.CV [cs.CV, cs.AI, cs.MM]
HIGHLIGHT: We propose a novel and unified Cycle in Cycle Generative Adversarial Network (C2GAN) for generating human faces, hands, bodies, and natural scenes.
42, TITLE: Knowledge Distillation Via Instance-level Sequence Learning
AUTHORS: Haoran Zhao ; Xin Sun ; Junyu Dong ; Zihe Dong ; Qiong Li
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we provide a curriculum learning knowledge distillation framework via instance-level sequence learning.
43, TITLE: Affect-driven Engagement Measurement from Videos
AUTHORS: Ali Abedi ; Shehroz Khan
CATEGORY: cs.CV [cs.CV, cs.HC]
HIGHLIGHT: In this paper, we present a novel approach for video-based engagement measurement in virtual learning programs.
44, TITLE: AdaZoom: Adaptive Zoom Network for Multi-Scale Object Detection in Large Scenes
AUTHORS: Jingtao Xu ; Yali Li ; Shengjin Wang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a novel Adaptive Zoom (AdaZoom) network as a selective magnifier with flexible shape and focal length to adaptively zoom the focus regions for object detection.
45, TITLE: Confidence-Guided Radiology Report Generation
AUTHORS: Yixin Wang et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a novel method to explicitly quantify both the visual uncertainty and the textual uncertainty for the task of radiology report generation.
46, TITLE: Structured Sparse R-CNN for Direct Scene Graph Generation
AUTHORS: Yao Teng ; Limin Wang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Instead, treating SGG as a direct set prediction problem, this paper presents a simple, sparse, and unified framework for relation detection, termed Structured Sparse R-CNN.
47, TITLE: ToAlign: Task-oriented Alignment for Unsupervised Domain Adaptation
AUTHORS: Guoqiang Wei ; Cuiling Lan ; Wenjun Zeng ; Zhibo Chen
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose an effective Task-oriented Alignment (ToAlign) for unsupervised domain adaptation (UDA).
48, TITLE: DiGS : Divergence Guided Shape Implicit Neural Representation for Unoriented Point Clouds
AUTHORS: Yizhak Ben-Shabat ; Chamin Hewa Koneputugodage ; Stephen Gould
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: In this paper, we propose a divergence guided shape representation learning approach that does not require normal vectors as input.
49, TITLE: Adversarial Manifold Matching Via Deep Metric Learning for Generative Modeling
AUTHORS: Mengyu Dai ; Haibin Hang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose a manifold matching approach to generative models which includes a distribution generator (or data generator) and a metric generator.
50, TITLE: 3D Object Detection for Autonomous Driving: A Survey
AUTHORS: Rui Qian ; Xin Lai ; Xirong Li
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: As such, we present a comprehensive review of the latest progress in this field covering all the main topics including sensors, fundamentals, and the recent state-of-the-art detection methods with their pros and cons.
51, TITLE: Two-Stream Consensus Network: Submission to HACS Challenge 2021 Weakly-Supervised Learning Track
AUTHORS: Yuanhao Zhai ; Le Wang ; David Doermann ; Junsong Yuan
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This technical report presents our solution to the HACS Temporal Action Localization Challenge 2021, Weakly-Supervised Learning Track.
52, TITLE: Towards Single Stage Weakly Supervised Semantic Segmentation
AUTHORS: Peri Akiva ; Kristin Dana
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: The costly process of obtaining semantic segmentation labels has driven research towards weakly supervised semantic segmentation (WSSS) methods, using only image-level, point, or box labels.
53, TITLE: Learning to Track Object Position Through Occlusion
AUTHORS: Satyaki Chakraborty ; Martial Hebert
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: We propose to address this with a "tracking-by-detection" approach that builds upon the success of region-based video object detectors.
54, TITLE: Applying VertexShuffle Toward 360-Degree Video Super-Resolution on Focused-Icosahedral-Mesh
AUTHORS: Na Li ; Yao Liu
CATEGORY: cs.CV [cs.CV, cs.MM]
HIGHLIGHT: To address the bandwidth waste problem associated with 360-degree video streaming and save computation, we exploit Focused Icosahedral Mesh to represent a small area and construct matrices to rotate spherical content to the focused mesh area. To evaluate our model, we also collect a set of high-resolution 360-degree videos to generate a spherical image dataset.
55, TITLE: Automated Deepfake Detection
AUTHORS: Ping Liu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose to use Automated Machine Learning to automatically search for an architecture for deepfake detection.
56, TITLE: Understanding Object Dynamics for Interactive Image-to-Video Synthesis
AUTHORS: Andreas Blattmann ; Timo Milbich ; Michael Dorkenwald ; Björn Ommer
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present an approach that learns natural-looking global articulations caused by a local manipulation at the pixel level.
57, TITLE: Humble Teachers Teach Better Students for Semi-Supervised Object Detection
AUTHORS: Yihe Tang ; Weifeng Chen ; Yijun Luo ; Yuting Zhang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose a semi-supervised approach for contemporary object detectors following the teacher-student dual model framework.
58, TITLE: Simple Distillation Baselines for Improving Small Self-supervised Models
AUTHORS: Jindong Gu ; Wei Liu ; Yonglong Tian
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this report, we explore simple baselines for improving small self-supervised models via distillation, called SimDis.
59, TITLE: TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
AUTHORS: Michael S. Ryoo ; AJ Piergiovanni ; Anurag Arnab ; Mostafa Dehghani ; Anelia Angelova
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this paper, we introduce a novel visual representation learning approach that relies on a handful of adaptively learned tokens and is applicable to both image and video understanding tasks.
60, TITLE: MSN: Efficient Online Mask Selection Network for Video Instance Segmentation
AUTHORS: Vidit Goel ; Jiachen Li ; Shubhika Garg ; Harsh Maheshwari ; Humphrey Shi
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this work, we present a novel solution for Video Instance Segmentation (VIS), i.e., automatically generating instance-level segmentation masks along with object classes and tracking them in a video.
61, TITLE: Single View Physical Distance Estimation Using Human Pose
AUTHORS: Xiaohan Fei et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose a fully automated system that simultaneously estimates the camera intrinsics, the ground plane, and physical distances between people from a single RGB image or video captured by a camera viewing a 3-D scene from a fixed vantage point.
62, TITLE: Fast Simultaneous Gravitational Alignment of Multiple Point Sets
AUTHORS: Vladislav Golyanik ; Soshi Shimada ; Christian Theobalt
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper proposes a new resilient technique for simultaneous registration of multiple point sets by interpreting the latter as particle swarms rigidly moving in the mutually induced force fields.
63, TITLE: Place Recognition Survey: An Update on Deep Learning Approaches
AUTHORS: Tiago Barros ; Ricardo Pereira ; Luís Garrote ; Cristiano Premebida ; Urbano J. Nunes
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: The contributions of this work are twofold: surveying recent sensors such as 3D LiDARs and RADARs, applied in place recognition; and categorizing the various DL-based place recognition works into supervised, unsupervised, semi-supervised, parallel, and hierarchical categories.
64, TITLE: Interactive Object Segmentation with Dynamic Click Transform
AUTHORS: Chun-Tse Lin ; Wei-Chih Tu ; Chih-Ting Liu ; Shao-Yi Chien
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose a Dynamic Click Transform Network (DCT-Net), consisting of Spatial-DCT and Feature-DCT, to better represent user interactions.
65, TITLE: Towards Long-Form Video Understanding
AUTHORS: Chao-Yuan Wu ; Philipp Krähenbühl
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we study long-form video understanding.
66, TITLE: Large-scale Image Segmentation Based on Distributed Clustering Algorithms
AUTHORS: Ran Lu ; Aleksandar Zlateski ; H. Sebastian Seung
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Here we describe a distributed algorithm capable of handling a tremendous number of supervoxels.
67, TITLE: Neural Network Facial Authentication for Public Electric Vehicle Charging Station
AUTHORS: Muhamad Amin Husni Abdul Haris ; Sin Liang Lim
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: The comparisons are both implemented on the facial vectors extracted using the Histogram of Oriented Gradients (HOG) method and use the same dataset for a fair comparison.
68, TITLE: A System of Vision Sensor Based Deep Neural Networks for Complex Driving Scene Analysis in Support of Crash Risk Assessment and Prevention
AUTHORS: Muhammad Monjurul Karim ; Yu Li ; Ruwen Qin ; Zhaozheng Yin
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: To fill the gap, this paper develops a scene analysis system.
69, TITLE: Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
AUTHORS: Ahjeong Seo ; Gi-Cheon Kang ; Joonhan Park ; Byoung-Tak Zhang
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: We propose Motion-Appearance Synergistic Networks (MASN), which embed two cross-modal features grounded on motion and appearance information and selectively utilize them depending on the question's intentions.
70, TITLE: 3D Shape Registration Using Spectral Graph Embedding and Probabilistic Matching
AUTHORS: Avinash Sharma ; Radu Horaud ; Diana Mateus
CATEGORY: cs.CV [cs.CV, stat.ML]
HIGHLIGHT: We address the problem of 3D shape registration and we propose a novel technique based on spectral graph theory and probabilistic matching.
71, TITLE: TNT: Text-Conditioned Network with Transductive Inference for Few-Shot Video Classification
AUTHORS: Andrés Villa et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, for the first time, we propose to leverage these human-provided textual descriptions as privileged information when training a few-shot video classification model.
72, TITLE: Plant Disease Detection Using Image Processing and Machine Learning
AUTHORS: Pranesh Kulkarni et al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: This paper proposes a smart and efficient technique for detection of crop disease which uses computer vision and machine learning techniques.
73, TITLE: The Animal ID Problem: Continual Curation
AUTHORS: Charles V. Stewart ; Jason R. Parham ; Jason Holmberg ; Tanya Y. Berger-Wolf
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: Hoping to stimulate new research in individual animal identification from images, we propose to formulate the problem as the human-machine Continual Curation of images and animal identities.
74, TITLE: SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in The Wild
AUTHORS: Ariel Caputo et al.
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: This paper presents the results of the contest, showing the performance of the techniques proposed by four research groups on this challenging task compared with a simple baseline method. For this contest, we created a novel dataset of heterogeneous gestures of different types and durations.
75, TITLE: Quality-Aware Memory Network for Interactive Volumetric Image Segmentation
AUTHORS: Tianfei Zhou ; Liulei Li ; Gustav Bredell ; Jianwu Li ; Ender Konukoglu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose a quality-aware memory network for interactive segmentation of 3D medical images.
76, TITLE: Solution for Large-scale Long-tailed Recognition with Noisy Labels
AUTHORS: Yuqiao Xian ; Jia-Xin Zhuang ; Fufu Yu
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In our solution, we adopt state-of-the-art model architectures of both CNNs and Transformers, including ResNeSt, EfficientNetV2, and DeiT.
77, TITLE: Tag, Copy or Predict: A Unified Weakly-Supervised Learning Framework for Visual Information Extraction Using Sequences
AUTHORS: Jiapeng Wang et al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.IR, cs.LG, cs.MM]
HIGHLIGHT: In this paper, we propose a unified weakly-supervised learning framework called TCPN (Tag, Copy or Predict Network), which introduces 1) an efficient encoder to simultaneously model the semantic and layout information in 2D OCR results; 2) a weakly-supervised training strategy that utilizes only key information sequences as supervision; and 3) a flexible and switchable decoder which contains two inference modes: one (Copy or Predict Mode) is to output key information sequences of different categories by copying a token from the input or predicting one in each time step, and the other (Tag Mode) is to directly tag the input sequence in a single forward pass.
78, TITLE: Exploring Visual Context for Weakly Supervised Person Search
AUTHORS: Yichao Yan et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We proposed the first framework to address this novel task, namely Context-Guided Person Search (CGPS), by investigating three levels of context clues (i.e., detection, memory and scene) in unconstrained natural images.
79, TITLE: Can Poachers Find Animals from Public Camera Trap Images?
AUTHORS: Sara Beery ; Elizabeth Bondi
CATEGORY: cs.CV [cs.CV, cs.AI, q-bio.PE]
HIGHLIGHT: In this paper, we investigate the robustness of geo-obfuscation for maintaining camera trap location privacy, and show via a case study that a few simple, intuitive heuristics and publicly available satellite rasters can be used to reduce the area likely to contain the camera by 87% (assuming random obfuscation within 1km), demonstrating that geo-obfuscation may be less effective than previously believed.
80, TITLE: NeuS: Learning Neural Implicit Surfaces By Volume Rendering for Multi-view Reconstruction
AUTHORS: Peng Wang et al.
CATEGORY: cs.CV [cs.CV, cs.GR]
HIGHLIGHT: We present a novel neural surface reconstruction method, called NeuS, for reconstructing objects and scenes with high fidelity from 2D image inputs.
81, TITLE: Interventional Video Grounding with Dual Contrastive Learning
AUTHORS: Guoshun Nan et al.
CATEGORY: cs.CV [cs.CV, cs.CL]
HIGHLIGHT: Existing approaches focus more on the alignment of visual and language stimuli with various likelihood-based matching or regression strategies, i.e., P(Y|X).
82, TITLE: Pre-training Also Transfers Non-Robustness
AUTHORS: Jiaming Zhang ; Jitao Sang ; Qi Yi ; Huiwen Dong ; Jian Yu
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In spite of its recognized contribution to generalization, we observed in this study that pre-training also transfers the non-robustness from pre-trained model into the fine-tuned model.
83, TITLE: Multiple Object Tracking with Mixture Density Networks for Trajectory Estimation
AUTHORS: Andreu Girbau ; Xavier Giró-i-Nieto ; Ignasi Rius ; Ferran Marqués
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we show that trajectory estimation can become a key factor for tracking, and present TrajE, a trajectory estimator based on recurrent mixture density networks, as a generic module that can be added to existing object trackers.
84, TITLE: Segmentation of Cell-level Anomalies in Electroluminescence Images of Photovoltaic Modules
AUTHORS: Urtzi Otamendi et al.
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this article, we propose an end-to-end deep learning pipeline that detects, locates and segments cell-level anomalies from entire photovoltaic modules via EL images.
85, TITLE: SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous Driving
AUTHORS: Jianhua Han et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Aiming at facilitating a real-world, ever-evolving and scalable autonomous driving system, we present a large-scale benchmark for standardizing the evaluation of different self-supervised and semi-supervised approaches by learning from raw data, which is the first and largest benchmark to date. Here, we release a Large-Scale Object Detection benchmark for Autonomous driving, named as SODA10M, containing 10 million unlabeled images and 20K images labeled with 6 representative object categories.
86, TITLE: CataNet: Predicting Remaining Cataract Surgery Duration
AUTHORS: Andrés Marafioti et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To this end, we propose CataNet, a method for cataract surgeries that predicts in real time the RSD jointly with two influential elements: the surgeon's experience, and the current phase of the surgery.
87, TITLE: Visual Probing: Cognitive Framework for Explaining Self-Supervised Image Representations
AUTHORS: Witold Oleszkiewicz et al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: Hence, we propose a systematic approach to obtain analogs of natural language in vision, such as visual words, context, and taxonomy.
88, TITLE: VQA-Aid: Visual Question Answering for Post-Disaster Damage Assessment and Analysis
AUTHORS: Argho Sarkar ; Maryam Rahnemoonfar
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we address the importance of the visual question answering (VQA) task for post-disaster damage assessment by presenting our recently developed VQA dataset, HurMic-VQA, collected during Hurricane Michael, and comparing the performance of baseline VQA models.
89, TITLE: Supervised Learning for Crop/weed Classification Based on Color and Texture Features
AUTHORS: Faiza Mekhalfa ; Fouad Yacef
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper investigates the use of color and texture features for discrimination of Soybean crops and weeds.
90, TITLE: Low-Power Multi-Camera Object Re-Identification Using Hierarchical Neural Networks
AUTHORS: Abhinav Goel et al.
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: This paper describes a low-power technique for the object re-identification (reID) problem: matching a query image against a gallery of previously seen images.
91, TITLE: Exploring Vision Transformers for Fine-grained Classification
AUTHORS: Marcos V. Conde ; Kerem Turgutlu
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this work, we propose a multi-stage ViT framework for fine-grained image classification tasks, which localizes the informative image regions without requiring architectural changes using the inherent multi-head self-attention mechanism.
92, TITLE: Delving Into The Pixels of Adversarial Samples
AUTHORS: Blerta Lindqvist
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: Based on the insights of pixel-level examination, we find new ways to detect some of the strongest current attacks.
93, TITLE: Video Summarization Through Reinforcement Learning with A 3D Spatio-Temporal U-Net
AUTHORS: TIANRUI LIU et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we introduce the 3DST-UNet-RL framework for video summarization.
94, TITLE: Crop-Transform-Paste: Self-Supervised Learning for Visual Tracking
AUTHORS: XIN LI et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we develop the Crop-Transform-Paste operation, which is able to synthesize sufficient training data by simulating various kinds of scene variations during tracking, including appearance variations of objects and background changes.
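The Crop-Transform-Paste idea can be illustrated on a toy grayscale image stored as nested lists. This is a minimal sketch under assumptions: a single horizontal flip stands in for the "various kinds of scene variations" the paper simulates, the paste location is chosen by the caller, and all function names are hypothetical. What it shows is that the pasted box yields a bounding-box label for free, without manual annotation.

```python
def crop(img, box):
    """Cut the region box = (x, y, w, h) out of a 2-D image (list of rows)."""
    x, y, w, h = box
    return [row[x:x + w] for row in img[y:y + h]]

def hflip(patch):
    """One example transform: mirror the patch left-to-right."""
    return [row[::-1] for row in patch]

def paste(background, patch, x, y):
    """Copy the patch onto a copy of the background, top-left corner at (x, y)."""
    out = [row[:] for row in background]  # leave the original background untouched
    for dy, row in enumerate(patch):
        out[y + dy][x:x + len(row)] = row
    return out

def crop_transform_paste(frame, box, background, x, y, transform=hflip):
    """Synthesize a new training frame; the pasted box is its free label."""
    patch = transform(crop(frame, box))
    h, w = len(patch), len(patch[0])
    return paste(background, patch, x, y), (x, y, w, h)

frame = [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
background = [[9] * 5 for _ in range(5)]
new_frame, new_box = crop_transform_paste(frame, (1, 1, 2, 2), background, 3, 2)
print(new_box)  # (3, 2, 2, 2)
```

The sketch assumes the transformed patch fits inside the background at the chosen location; a real pipeline would also randomize the location and the transform.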
95, TITLE: TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition
AUTHORS: Wenyuan Xue ; Baosheng Yu ; Wen Wang ; Dacheng Tao ; Qingyong Li
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we reformulate the problem of table structure recognition as table graph reconstruction, and propose an end-to-end trainable table graph reconstruction network (TGRNet) for table structure recognition.
96, TITLE: Active Learning for Deep Neural Networks on Edge Devices
AUTHORS: Yuya Senzaki ; Christian Hamelain
CATEGORY: cs.LG [cs.LG, cs.CV, cs.NI]
HIGHLIGHT: In this paper, we formalize a practical active learning problem for DNNs on edge devices and propose a general task-agnostic framework to tackle this problem, which reduces it to a stream submodular maximization.
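To make "stream submodular maximization" concrete, here is a single-pass threshold rule of the kind used in streaming submodular algorithms such as Sieve-Streaming (which runs many such thresholds in parallel). It is a sketch under assumptions: a set-coverage function stands in for whatever objective the authors' framework maximizes, `k` plays the role of the labeling budget, `tau` is a fixed gain threshold, and all names are hypothetical.

```python
def threshold_stream(stream, covers, k, tau):
    """One pass over the stream; keep an element if its marginal gain >= tau.

    The objective is set coverage, f(S) = |union of covers[e] for e in S|,
    which is monotone submodular.
    """
    selected, covered = [], set()
    for e in stream:
        if len(selected) == k:
            break  # budget exhausted, stop reading the stream
        gain = len(covers[e] - covered)  # f(S + e) - f(S) for coverage
        if gain >= tau:
            selected.append(e)
            covered |= covers[e]
    return selected

# Hypothetical samples and the "concepts" each one covers.
covers = {"a": {1, 2}, "b": {2}, "c": {3, 4, 5}, "d": {1, 5}}
print(threshold_stream(["a", "b", "c", "d"], covers, k=2, tau=2))  # ['a', 'c']
```

Because coverage is submodular, an element's marginal gain can only shrink as the selection grows, which is what lets one-pass rules like this come with approximation guarantees when the threshold is chosen well.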
97, TITLE: Deep Generative Learning Via Schrödinger Bridge
AUTHORS: Gefei Wang ; Yuling Jiao ; Qian Xu ; Yang Wang ; Can Yang
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: We propose to learn a generative model via entropy interpolation with a Schrödinger Bridge.
98, TITLE: Practical Assessment of Generalization Performance Robustness for Deep Networks Via Contrastive Examples
AUTHORS: XUANYU WU et. al.
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In this work, we propose a practical framework ContRE (The word "contre" means "against" or "versus" in French.)
99, TITLE: Task Attended Meta-Learning for Few-Shot Learning
AUTHORS: Aroof Aimen ; Sahil Sidheekh ; Narayanan C. Krishnan
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In this work, we study the importance of a batch for meta-learning (ML).
100, TITLE: A Game-Theoretic Taxonomy of Visual Concepts in DNNs
AUTHORS: Xu Cheng ; Chuntung Chu ; Yi Zheng ; Jie Ren ; Quanshi Zhang
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In this paper, we rethink how a DNN encodes visual concepts of different complexities from a new perspective, i.e. the game-theoretic multi-order interactions between pixels in an image.
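The highlight does not spell out the interaction measure. Assuming the m-order interaction definition used in this line of game-theoretic work, the interaction between pixels i and j is the average of Δv(i, j, S) = v(S ∪ {i, j}) - v(S ∪ {i}) - v(S ∪ {j}) + v(S) over all contexts S of exactly m other pixels, where v is the network output when only the pixels in the given set are kept. The sketch below computes this exactly by enumeration on a toy four-pixel "network" `v` (hypothetical), which is only feasible for very small inputs.

```python
from itertools import combinations

def interaction(v, players, i, j, m):
    """Exact m-order interaction between players i and j (brute force)."""
    others = [p for p in players if p not in (i, j)]
    deltas = []
    for context in combinations(others, m):
        S = frozenset(context)
        deltas.append(v(S | {i, j}) - v(S | {i}) - v(S | {j}) + v(S))
    return sum(deltas) / len(deltas)

def v(S):
    """Toy 'network': outputs 1 only when pixels 0, 1 and 2 are all present."""
    return 1.0 if {0, 1, 2} <= S else 0.0

players = [0, 1, 2, 3]
for m in range(3):
    print(m, interaction(v, players, 0, 1, m))
```

In this toy `v`, pixels 0 and 1 only matter together with pixel 2, so their interaction is zero at order 0 and grows with the context size: the loop prints 0.0, 0.5 and 1.0 for m = 0, 1, 2.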
101, TITLE: Leveraging Conditional Generative Models in A General Explanation Framework of Classifier Decisions
AUTHORS: Martin Charachon ; Paul-Henry Cournède ; Céline Hudelot ; Roberto Ardon
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In this paper, we propose a new general perspective of the visual explanation problem overcoming these limitations.
102, TITLE: Attention-based Neural Network for Driving Environment Complexity Perception
AUTHORS: Ce Zhang ; Azim Eskandarian ; Xuelai Du
CATEGORY: cs.LG [cs.LG, cs.CV, cs.RO, eess.IV]
HIGHLIGHT: This paper proposes a novel attention-based neural network model to predict the complexity level of the surrounding driving environment.
103, TITLE: Sparse Training Via Boosting Pruning Plasticity with Neuroregeneration
AUTHORS: SHIWEI LIU et. al.
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: Based on the insights from pruning plasticity, we design a novel gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost neuroregeneration (GraNet), and its dynamic sparse training (DST) variant (GraNet-ST).
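GraNet is described as a gradual magnitude pruning method. The sketch below shows only the plain GMP backbone, assuming the widely used cubic sparsity schedule of Zhu and Gupta (2017) and magnitude ranking over a flat weight list; GraNet's zero-cost neuroregeneration step and the GraNet-ST variant are not shown, and the function names are hypothetical.

```python
def gmp_sparsity(step, s_init, s_final, t_start, t_end):
    """Cubic gradual-pruning schedule: target sparsity at a training step."""
    if step <= t_start:
        return s_init
    if step >= t_end:
        return s_final
    frac = (step - t_start) / (t_end - t_start)
    return s_final + (s_init - s_final) * (1.0 - frac) ** 3

def magnitude_mask(weights, sparsity):
    """Binary mask that zeroes the smallest-|w| fraction of the weights."""
    n_prune = int(round(sparsity * len(weights)))
    order = sorted(range(len(weights)), key=lambda idx: abs(weights[idx]))
    pruned = set(order[:n_prune])
    return [0 if idx in pruned else 1 for idx in range(len(weights))]

weights = [0.5, -0.1, 0.05, -2.0]
print(magnitude_mask(weights, gmp_sparsity(50, 0.0, 0.5, 0, 100)))  # [1, 0, 0, 1]
```

Recomputing the mask every few steps with the current scheduled sparsity prunes fastest early in training and slows down near the target, because the cubic term decays quickly.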
104, TITLE: How Do Adam and Training Strategies Help BNNs Optimization?
AUTHORS: ZECHUN LIU et. al.
CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV]
HIGHLIGHT: Through extensive experiments and analysis, we derive a simple training scheme, building on existing Adam-based optimization, which achieves 70.5% top-1 accuracy on the ImageNet dataset using the same architecture as the state-of-the-art ReActNet while achieving 1.1% higher accuracy.
105, TITLE: Prediction of The Facial Growth Direction with Machine Learning Methods
AUTHORS: Stanisław Kaźmierczak ; Zofia Juszka ; Piotr Fudalej ; Jacek Mańdziuk
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: The conducted data analysis reveals the inherent complexity of the problem and explains the reasons for the difficulty of FG direction prediction based on 2D X-ray images.
106, TITLE: Contrastive Multi-Modal Clustering
AUTHORS: Jie Xu ; Huayi Tang ; Yazhou Ren ; Xiaofeng Zhu ; Lifang He
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In this paper, we propose Contrastive Multi-Modal Clustering (CMMC) which can mine high-level semantic information via contrastive learning.
107, TITLE: Does Optimal Source Task Performance Imply Optimal Pre-training for A Target Task?
AUTHORS: Steven Gutstein ; Brent Lance ; Sanjay Shakkottai
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: We performed several experiments demonstrating this effect, as well as the influence of the amount of training and of the learning rate.
108, TITLE: Graceful Degradation and Related Fields
AUTHORS: Jack Dymond
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: This work presents a definition and discussion of graceful degradation and where it can be applied in deployed visual systems.
109, TITLE: Multi-Contextual Design of Convolutional Neural Network for Steganalysis
AUTHORS: Brijesh Singh ; Arijit Sur ; Pinaki Mitra
CATEGORY: cs.MM [cs.MM, cs.CV, eess.IV]
HIGHLIGHT: In this work, unlike the conventional approaches, the proposed model first extracts the noise residual using learned denoising kernels to boost the signal-to-noise ratio.
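The noise-residual idea can be illustrated with a fixed filter: subtract a denoised estimate from the image so that smooth image content cancels and the high-frequency component, where the weak stego signal is expected to be, remains. This is a minimal sketch under assumptions: a fixed 3x3 box blur stands in for the paper's learned denoising kernels, the image is a nested list of grayscale values, and the function names are hypothetical.

```python
def mean_filter(img):
    """3x3 box blur with edge clamping; stands in for a learned denoiser."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp at the borders
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9
    return out

def noise_residual(img):
    """Residual = image - denoised image; keeps the high-frequency part."""
    denoised = mean_filter(img)
    return [[p - d for p, d in zip(row, drow)] for row, drow in zip(img, denoised)]

flat = [[5.0] * 4 for _ in range(4)]
spike = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
print(noise_residual(flat)[1][1], noise_residual(spike)[1][1])  # 0.0 8.0
```

A constant image yields an all-zero residual, while an isolated bright pixel largely survives, which is the kind of signal-to-noise boost the highlight refers to.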
110, TITLE: Domain and Modality Gaps for LiDAR-based Person Detection on Mobile Robots
AUTHORS: Dan Jia ; Alexander Hermans ; Bastian Leibe
CATEGORY: cs.RO [cs.RO, cs.CV]
HIGHLIGHT: For the domain gap, we aim to understand if detectors pretrained on driving datasets can achieve good performance on the mobile robot scenarios, for which there are currently no trained models readily available.
111, TITLE: GLIB: Towards Automated Test Oracle for Graphically-Rich Applications
AUTHORS: KE CHEN et. al.
CATEGORY: cs.SE [cs.SE, cs.CV, cs.LG, 68N01, 68T45, D.2.5; I.2.10]
HIGHLIGHT: In this paper, we present the first step in automating the test oracle for detecting non-crashing bugs in graphically-rich applications.
112, TITLE: Direct Reconstruction of Linear Parametric Images from Dynamic PET Using Nonlocal Deep Image Prior
AUTHORS: Kuang Gong ; Ciprian Catana ; Jinyi Qi ; Quanzheng Li
CATEGORY: eess.IV [eess.IV, cs.CV, physics.med-ph]
HIGHLIGHT: In this work, we proposed an unsupervised deep learning framework for direct parametric reconstruction from dynamic PET, which was tested on the Patlak model and the relative equilibrium Logan model.
113, TITLE: Implementing A Detection System for COVID-19 Based on Lung Ultrasound Imaging and Deep Learning
AUTHORS: Carlos Rojas-Azabache ; Karen Vilca-Janampa ; Renzo Guerrero-Huayta ; Dennis Núñez-Fernández
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this paper we present the ongoing work on a system for COVID-19 detection using ultrasound imaging and using Deep Learning techniques.
114, TITLE: Underwater Image Restoration Via Contrastive Learning and A Real-world Dataset
AUTHORS: JUNLIN HAN et. al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: To address this gap, we have constructed a large-scale real underwater image dataset, dubbed 'HICRD' (Heron Island Coral Reef Dataset), for the purpose of benchmarking existing methods and supporting the development of new deep-learning based methods.
115, TITLE: Nuclei Grading of Clear Cell Renal Cell Carcinoma in Histopathological Image By Composite High-Resolution Network
AUTHORS: ZEYU GAO et. al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this paper, we propose a Composite High-Resolution Network for ccRCC nuclei grading. Furthermore, we introduce a dataset for ccRCC nuclei grading, containing 1000 image patches with 70945 annotated nuclei.
116, TITLE: Brain Tumor Grade Classification Using LSTM Neural Networks with Domain Pre-Transforms
AUTHORS: Maedeh Sadat Fasihi ; Wasfy B. Mikhael
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: To alleviate this limitation, in this study, we propose a weakly supervised image classification method based on a combination of hand-crafted features.
117, TITLE: One-to-many Approach for Improving Super-Resolution
AUTHORS: Sieun Park ; Eunho Lee
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: To achieve this, we propose adding weighted pixel-wise noise after every Residual-in-Residual Dense Block (RRDB) to enable the generator to generate various images.
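The noise-injection step can be sketched as below: Gaussian noise is added to every element of a feature map and scaled by a per-channel weight (learnable in a real generator, fixed here). This is an illustration under assumptions, not the paper's code: feature maps are nested lists indexed [channel][row][column], the blocks are identity stand-ins for RRDBs, and whether the authors share one noise map across channels is not stated in the highlight. All names are hypothetical.

```python
import random

def add_weighted_noise(feature_map, channel_weights, rng=random):
    """Add per-pixel N(0, 1) noise, scaled by one weight per channel."""
    return [
        [[value + weight * rng.gauss(0.0, 1.0) for value in row] for row in channel]
        for channel, weight in zip(feature_map, channel_weights)
    ]

def generator_trunk(x, blocks, noise_weights, rng=random):
    """Apply each block, then inject weighted noise right after it."""
    for block, weights in zip(blocks, noise_weights):
        x = add_weighted_noise(block(x), weights, rng)
    return x

def identity(fm):
    """Hypothetical stand-in for a Residual-in-Residual Dense Block."""
    return fm

fm = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
out = generator_trunk(fm, [identity, identity], [[0.0, 0.1], [0.0, 0.1]], random.Random(0))
```

With a channel weight of zero that channel passes through unchanged; non-zero weights let the same low-resolution input produce different outputs, which is the one-to-many behavior the title refers to.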
118, TITLE: Reversible Colour Density Compression of Images Using CGANs
AUTHORS: Arun Jose ; Abraham Francis
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: We examine the use of conditional generative adversarial networks in making this transformation more feasible, through learning a mapping between the images and a loss function to train on.
119, TITLE: Estimating MRI Image Quality Via Image Reconstruction Uncertainty
AUTHORS: Richard Shaw ; Carole H. Sudre ; Sebastien Ourselin ; M. Jorge Cardoso
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this work, we pose MR image quality assessment from an image reconstruction perspective.