
Setting up an environment to reproduce the Eye Tracking for Everyone code

Preparation

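  1. Download the paper's code from GitHub: https://github.com/CSAILVision/GazeCapture. The focus here is the PyTorch re-implementation.
  2. Download the dataset as described in README.md, then extract it into the following directory layout: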
GazeCapture
\--00002
    \--frames
    \--appleFace.json
\--00003
  3. Run prepareDataset.py to preprocess the data, using the following command:
python prepareDataset.py --dataset_path [A = where extracted] --output_path [B = where to save new data]

The output is as follows:

======================
        Summary
======================
Total added 1490959 frames from 1471 recordings.
There are no missing files.
There are no extra files that were not in the reference dataset.
The new metadata.mat is an exact match to the reference from GitHub (including ordering)

Directory B will then have the following structure:

\---00002
    \---appleFace
        \---00000.jpg
    \---appleLeftEye
    \---appleRightEye   
\---00003
...
\---metadata.mat
  4. Run the training command:
python main.py --data_path [path B] --reset
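
Before starting a long training run, it can help to confirm that directory B really has the layout shown above. The following is a minimal sketch, not part of the GazeCapture repository: the function name check_processed_dataset is made up for illustration, and it only checks for metadata.mat and the three per-recording subdirectories listed above.

```python
from pathlib import Path

# Subdirectories that each recording folder in directory B is expected to
# contain, taken from the directory listing shown above.
EXPECTED_SUBDIRS = ("appleFace", "appleLeftEye", "appleRightEye")


def check_processed_dataset(root):
    """Return a list of human-readable problems; an empty list means the layout looks fine."""
    root = Path(root)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = []
    if not (root / "metadata.mat").is_file():
        problems.append("metadata.mat is missing")

    # Every subdirectory of the root is treated as one recording (00002, 00003, ...).
    recordings = [p for p in sorted(root.iterdir()) if p.is_dir()]
    if not recordings:
        problems.append("no recording directories found")

    for rec in recordings:
        for name in EXPECTED_SUBDIRS:
            if not (rec / name).is_dir():
                problems.append(f"{rec.name}/{name} is missing")
    return problems
```

Calling check_processed_dataset with the path of directory B and printing the result should give an empty list when preprocessing finished cleanly.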

Problem description

  1. While reproducing the Eye Tracking for Everyone code, moving the model to CUDA was extremely slow: the model.cuda() statement in main.py took far too long to execute.
  2. Running main.py fails with: RuntimeError: cublas runtime error : the GPU program failed to execute at C:/w/1/s/windows/pytorch/aten/src/THC/THCBlas.cu:259
  3. Installing the packages with conda fails with: CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/win-64/pytorch-1.12.0-py3.10_cuda11.3_cudnn8_0.tar.bz2> Elapsed: -
  4. Installing the packages with conda fails with: An HTTP error occurred when trying to retrieve this URL. HTTP errors are often intermittent, and a simple retry will get you on your way.

If you hit any of the problems above, the solution described below applies.
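
To see which PyTorch build is actually installed and how long the first transfer to the GPU takes (problem 1), a short diagnostic can be run inside the environment. This is a sketch of my own, not code from the repository: check_gpu_setup is a hypothetical helper, and it times a tiny Linear layer rather than the real iTracker model.

```python
import importlib.util
import time


def check_gpu_setup():
    """Report the installed torch build and time a first, tiny transfer to the GPU."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch  # imported lazily so the script still runs without torch

    report["torch_version"] = torch.__version__
    # CUDA version the torch binary was built against (None for CPU-only builds).
    report["built_for_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()

    if report["cuda_available"]:
        start = time.perf_counter()
        torch.nn.Linear(8, 8).cuda()   # stand-in for model.cuda() in main.py
        torch.cuda.synchronize()       # wait until the GPU work has finished
        report["first_cuda_transfer_s"] = time.perf_counter() - start
    return report


if __name__ == "__main__":
    for key, value in check_gpu_setup().items():
        print(f"{key}: {value}")
```

If first_cuda_transfer_s is very large even for this tiny layer, the installed build is probably a poor match for the GPU and driver, which is the situation the rest of this post works around.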

Analysis

The requirements.txt file in the code:

numpy==1.16.4
Pillow==8.2.0
pkg-resources==0.0.0
scipy==1.3.0
six==1.12.0
torch==1.1.0
torchfile==0.1.0
torchvision==0.3.0a0

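After searching online, I found that the cause is reportedly torchvision==0.3.0, and that upgrading torchvision to 0.3.1 is said to fix it. However, the torch, torchvision, and CUDA versions have to match one another, so torchvision cannot be upgraded on its own. I therefore decided to start from scratch and create a new environment.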
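
The version coupling described above can be made explicit with a few lines of Python. This is only a sketch: the KNOWN_PAIRS table holds just the two torch/torchvision pairings that appear in this post (1.1.0 with 0.3.0, and 1.12.0 with 0.13.0), and the helper names are made up for illustration.

```python
import re

# torch -> torchvision pairings mentioned in this post. This is an assumption
# for illustration; extend it from the PyTorch previous-versions page.
KNOWN_PAIRS = {"1.1.0": "0.3.0", "1.12.0": "0.13.0"}


def parse_pins(text):
    """Parse 'name==version' lines from a requirements.txt into a dict."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def torchvision_matches(pins):
    """True/False if the pinned torchvision fits the pinned torch, None if unknown."""
    expected = KNOWN_PAIRS.get(pins.get("torch"))
    if expected is None:
        return None
    # Strip pre-release suffixes such as the 'a0' in '0.3.0a0'.
    release = re.match(r"\d+(?:\.\d+)*", pins.get("torchvision", ""))
    return release is not None and release.group(0) == expected


if __name__ == "__main__":
    pins = parse_pins("torch==1.1.0\ntorchvision==0.3.0a0\n")
    print(torchvision_matches(pins))
```

Changing only the torchvision pin, for example to 0.3.1 while torch stays at 1.12.0, makes the check fail, which is the reason for rebuilding the whole environment instead of upgrading a single package.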

Solution

Run the following commands in order:

> conda create -n pytorch_gpu python=3.7    # create a new conda environment
> conda activate pytorch_gpu   # activate the conda environment

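Open the PyTorch website and pick the PyTorch build that matches the CUDA version on your machine.
How do you check the CUDA version? On Windows, open a cmd prompt and run nvidia-smi. Note that the CUDA version reported on my machine is 11.4. This is the highest CUDA version the installed driver supports, so a build with a cudatoolkit version no higher than that, such as 11.3, is a safe choice.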

(pytorch_gpu) c:\gaze\code\GazeCapture-master\pytorch>nvidia-smi
Tue Oct  4 16:34:39 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 471.86       Driver Version: 471.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0  On |                  N/A |
| N/A   48C    P8    16W /  N/A |    308MiB /  6144MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
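
The header line of this output can also be read programmatically. The sketch below is my own illustration (the function names are hypothetical): it extracts the driver and CUDA versions from the nvidia-smi header and applies a conservative rule of thumb, namely that the cudatoolkit version should not exceed the CUDA version the driver reports.

```python
import re


def parse_nvidia_smi_header(text):
    """Extract driver and CUDA versions from nvidia-smi output, or None if absent."""
    m = re.search(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text)
    if m is None:
        return None
    return {"driver": m.group(1), "cuda": m.group(2)}


def toolkit_is_supported(toolkit, driver_cuda):
    """Conservative check: toolkit version <= CUDA version reported by the driver."""
    def as_tuple(version):
        return tuple(int(part) for part in version.split("."))
    return as_tuple(toolkit) <= as_tuple(driver_cuda)


if __name__ == "__main__":
    header = "| NVIDIA-SMI 471.86       Driver Version: 471.86       CUDA Version: 11.4     |"
    info = parse_nvidia_smi_header(header)
    print(info, toolkit_is_supported("11.3", info["cuda"]))
```

To use it on a live system, the text could come from subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout instead of the hard-coded header line.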
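> # Install PyTorch using the command from the PyTorch website (https://pytorch.org/get-started/previous-versions/)
> # Note: before running the next command I emptied C:\Users\ASUS\.condarc. I had previously added the Tsinghua mirror, and some of these packages cannot be found there, so I cleared .condarc and downloaded from the official pytorch channel instead.
> conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
> conda install scipy # scipy is an extra dependency that the project code needs
> # Then simply run main.py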
> python main.py --data_path [path B] --reset

Output:

(pytorch_gpu) c:\gaze\code\GazeCapture-master\pytorch>python main.py --data_path c:\gaze\datasets\GazeCapture_final --reset
Loading iTracker dataset...
        Reading metadata from c:\gaze\datasets\GazeCapture_final\metadata.mat...
        Reading metadata from ./mean_face_224.mat...
        Reading metadata from ./mean_left_224.mat...
        Reading metadata from ./mean_right_224.mat...
Loaded iTracker dataset split "train" with 1140395 records...
Loading iTracker dataset...
        Reading metadata from c:\gaze\datasets\GazeCapture_final\metadata.mat...
        Reading metadata from ./mean_face_224.mat...
        Reading metadata from ./mean_left_224.mat...
        Reading metadata from ./mean_right_224.mat...
Loaded iTracker dataset split "test" with 156720 records...
Epoch (train): [0][0/11404]     Time 60.626 (60.626)    Data 48.368 (48.368)    Loss 29.4269 (29.4269)
Epoch (train): [0][1/11404]     Time 2.956 (31.791)     Data 0.000 (24.184)     Loss 32.6070 (31.0169)
Epoch (train): [0][2/11404]     Time 2.953 (22.178)     Data 0.001 (16.123)     Loss 40.2652 (34.0997)
Epoch (train): [0][3/11404]     Time 2.782 (17.329)     Data 0.000 (12.092)     Loss 30.2922 (33.1478)
Epoch (train): [0][4/11404]     Time 2.859 (14.435)     Data 0.000 (9.674)      Loss 32.7997 (33.0782)
Epoch (train): [0][5/11404]     Time 2.787 (12.494)     Data 0.006 (8.062)      Loss 29.5378 (32.4881)
Epoch (train): [0][6/11404]     Time 2.868 (11.119)     Data 0.000 (6.911)      Loss 26.5112 (31.6343)
Epoch (train): [0][7/11404]     Time 2.831 (10.083)     Data 0.000 (6.047)      Loss 29.9735 (31.4267)
Epoch (train): [0][8/11404]     Time 2.880 (9.282)      Data 0.000 (5.375)      Loss 31.4549 (31.4298)
Epoch (train): [0][9/11404]     Time 2.900 (8.644)      Data 0.003 (4.838)      Loss 33.0178 (31.5886)
Epoch (train): [0][10/11404]    Time 2.819 (8.114)      Data 0.000 (4.398)      Loss 31.9660 (31.6229)
Epoch (train): [0][11/11404]    Time 2.857 (7.676)      Data 0.000 (4.031)      Loss 32.4135 (31.6888)
Epoch (train): [0][12/11404]    Time 2.799 (7.301)      Data 0.000 (3.721)      Loss 30.6588 (31.6096)
Epoch (train): [0][13/11404]    Time 2.787 (6.979)      Data 0.000 (3.456)      Loss 27.6615 (31.3276)
Epoch (train): [0][14/11404]    Time 2.798 (6.700)      Data 0.000 (3.225)      Loss 32.5645 (31.4100)
Epoch (train): [0][15/11404]    Time 2.810 (6.457)      Data 0.001 (3.024)      Loss 38.2859 (31.8398)
Epoch (train): [0][16/11404]    Time 2.799 (6.242)      Data 0.001 (2.846)      Loss 34.6483 (32.0050)
Epoch (train): [0][17/11404]    Time 2.883 (6.055)      Data 0.000 (2.688)      Loss 27.8001 (31.7714)
Epoch (train): [0][18/11404]    Time 2.974 (5.893)      Data 0.000 (2.546)      Loss 29.3663 (31.6448)
Epoch (train): [0][19/11404]    Time 2.876 (5.742)      Data 0.001 (2.419)      Loss 36.6065 (31.8929)
Epoch (train): [0][20/11404]    Time 2.901 (5.607)      Data 0.001 (2.304)      Loss 28.8153 (31.7463)
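
The log lines above carry enough information for a rough estimate of how long one epoch takes. The following sketch parses lines in the format shown and extrapolates from the per-step time, skipping the first step because it includes the one-off data-loading and warm-up cost (about 60 s here). The names parse_line and estimate_epoch_hours are my own, and the regular expression assumes exactly the log format printed above.

```python
import re

# Matches lines such as:
# Epoch (train): [0][1/11404]  Time 2.956 (31.791)  Data 0.000 (24.184)  Loss 32.6070 (31.0169)
LOG_PATTERN = re.compile(
    r"Epoch \(train\): \[(\d+)\]\[(\d+)/(\d+)\]\s+"
    r"Time\s+([\d.]+)\s+\(([\d.]+)\)\s+"
    r"Data\s+([\d.]+)\s+\(([\d.]+)\)\s+"
    r"Loss\s+([\d.]+)\s+\(([\d.]+)\)"
)


def parse_line(line):
    """Return the fields of one training log line, or None if it does not match."""
    m = LOG_PATTERN.search(line)
    if m is None:
        return None
    return {
        "epoch": int(m.group(1)),
        "step": int(m.group(2)),
        "total": int(m.group(3)),
        "time": float(m.group(4)),   # seconds for this step
        "data": float(m.group(6)),   # seconds spent waiting for data
        "loss": float(m.group(8)),   # loss for this batch
    }


def estimate_epoch_hours(lines, skip=1):
    """Extrapolate epoch duration from the mean step time, ignoring the first `skip` steps."""
    steps = [p for p in map(parse_line, lines) if p is not None]
    steady = steps[skip:]
    if not steady:
        return None
    mean_step = sum(s["time"] for s in steady) / len(steady)
    return mean_step * steps[0]["total"] / 3600.0
```

With step times of roughly 2.8 to 3.0 s and 11404 steps, this works out to about nine hours per epoch on this machine. The step count is also consistent with the dataset size: 1140395 training records in batches of 100 give 11404 steps.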