[Paper Notes] 2021 - Video Swin Transformer (work in progress)


Paper Overview

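Original paper: Video Swin Transformer [1]

Paper URL: https://arxiv.org/abs/2106.13230

What follows is only my own record from reading the paper. My knowledge is limited, so corrections are welcome if anything is wrong.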


Paper Content

Abstract

  • The vision community is witnessing a modeling shift from CNNs to Transformers, where pure Transformer architectures have attained top accuracy on the major video recognition benchmarks.
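    Note: computer vision is seeing its models shift from CNNs to Transformers, and pure Transformer architectures have already reached the highest accuracy on the main video recognition benchmarks.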

  • These video models are all built on Transformer layers that globally connect patches across the spatial and temporal dimensions.
