【Error】Notes on the pitfalls I hit setting up an AutoDL environment for fine-tuning ChatGLM

1. Learning-resource acceleration channel (network turbo)

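Enable: source /etc/network_turbo
Disable: unset http_proxy && unset https_proxy
Turn it off once cloning has finished; otherwise conda will hit network errors when downloading packages for the environment.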
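Judging by the disable command, the turbo works by exporting http_proxy/https_proxy in the shell. If conda is driven from a Python script, a minimal sketch like the one below can report which proxy variables are set and run a single command with them stripped, leaving the shell's turbo untouched. The helper names are hypothetical, and including the upper-case variants is an assumption, since many tools read both spellings.

```python
import os
import subprocess
from typing import Mapping

# Names cleared by `unset http_proxy && unset https_proxy`.
# Upper-case variants are an assumption: many tools read both spellings.
PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


def active_proxies(env: Mapping[str, str]) -> dict[str, str]:
    """Return the proxy variables that are set to a non-empty value in `env`."""
    return {k: env[k] for k in PROXY_VARS if env.get(k)}


def env_without_proxy(env: Mapping[str, str]) -> dict[str, str]:
    """Copy of `env` with every proxy variable removed, like running `unset`."""
    return {k: v for k, v in env.items() if k not in PROXY_VARS}


def run_without_proxy(cmd: list[str]) -> int:
    """Run `cmd` (e.g. a conda command) with the proxy disabled for that process only."""
    return subprocess.run(cmd, env=env_without_proxy(os.environ)).returncode


if __name__ == "__main__":
    print("proxy variables currently set:", active_proxies(os.environ) or "none")
```

For example, run_without_proxy(["conda", "env", "list"]) would query conda without going through the turbo proxy.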

2. Errors during conda installs

2.1 PackagesNotFoundError: The following packages are not available from current channels

Solution: conda config --append channels conda-forge

2.2 CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://mirrors.tuna.tsinghua.edu.cn/anaco

The mirror source is the problem; switch to a working mirror configuration.

Open the config file with vim ~/.condarc and replace its contents with the following:

show_channel_urls: true
default_channels:
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
custom_channels:
  conda-forge: http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  msys2: http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  bioconda: http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  menpo: http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  simpleitk: http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
ssl_verify: false
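HTTP 000 means conda received no HTTP response at all, so it can help to test the mirror directly. The sketch below, using only the standard library, builds URLs of the form <channel>/<subdir>/repodata.json, which is where conda looks for a channel's package index, and sends a HEAD request to each. The channel list is copied from the .condarc above and is an assumption to edit; the function names are hypothetical.

```python
import urllib.error
import urllib.request

# Channel URLs copied from the .condarc above (assumption: edit to match yours).
# A custom channel's URL is its base URL plus the channel name.
CHANNELS = (
    "http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main",
    "http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge",
)


def repodata_url(channel_url: str, subdir: str = "linux-64") -> str:
    """Index URL for one channel and platform subdir: <channel>/<subdir>/repodata.json."""
    return f"{channel_url.rstrip('/')}/{subdir}/repodata.json"


def probe(url: str, timeout: float = 10.0) -> str:
    """Return the HTTP status, or 'no response' (the case conda reports as HTTP 000)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as e:  # the server answered with an error status
        return f"HTTP {e.code}"
    except (urllib.error.URLError, OSError) as e:  # DNS, TCP, proxy or SSL failure
        return f"no response ({e})"


def check_all(channels: tuple[str, ...] = CHANNELS) -> None:
    """Probe linux-64 and noarch for every channel and print one line per URL."""
    for channel in channels:
        for subdir in ("linux-64", "noarch"):
            url = repodata_url(channel, subdir)
            print(probe(url), url)
```

Calling check_all() on the instance, once with the network turbo on and once with it off, shows whether the failure follows the proxy or the mirror. Any HTTP status, even an error code, means the host is reachable; a "no response" line is the situation conda reports as HTTP 000.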

3. ImportError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/wuye/anaconda3/envs/tf2/lib/python3.8/site-packages/google/protobuf/pyext/_message.cpython-38-x86_64-linux-gnu.so)

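1. List the GLIBCXX versions the current library provides: strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX
2. Search the system for other copies of the same library: sudo find / -name "libstdc++.so.6*"
3. Pick one with a higher version and, using the strings command from step 1, confirm it contains the version named in the error.
4. Copy the newer library into the system library directory:

sudo cp /home/wuye/anaconda3/envs/tf2/lib/libstdc++.so.6.0.29 /usr/lib/x86_64-linux-gnu/

Note that the first path is the source (where the library currently lives) and the second is the destination to copy it to.
If you leave out the destination, cp fails with: cp: missing destination file operand
5. Remove the old link: sudo rm /usr/lib/x86_64-linux-gnu/libstdc++.so.6
6. Create the new link: sudo ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29 /usr/lib/x86_64-linux-gnu/libstdc++.so.6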
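Steps 1 to 3 can also be scripted. The sketch below scans a library file for GLIBCXX_x.y.z tags, roughly what strings | grep GLIBCXX prints, and checks whether the version from the ImportError is among them. The search root /root/miniconda3 is an assumption about where conda lives on the instance, and all function names are hypothetical.

```python
import glob
import re
from pathlib import Path

# Matches version tags such as GLIBCXX_3.4.29 embedded in the binary.
_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(lib_path: str) -> set[tuple[int, ...]]:
    """Roughly what `strings lib | grep GLIBCXX` shows, parsed into version tuples."""
    data = Path(lib_path).read_bytes()
    return {tuple(int(p) for p in m.group(1).split(b".")) for m in _TAG.finditer(data)}


def parse_version(tag: str) -> tuple[int, ...]:
    """'GLIBCXX_3.4.29' or '3.4.29' -> (3, 4, 29)."""
    return tuple(int(p) for p in tag.removeprefix("GLIBCXX_").split("."))


def provides(lib_path: str, required: str) -> bool:
    """True if the library contains the GLIBCXX version named in the ImportError."""
    return parse_version(required) in glibcxx_versions(lib_path)


def candidates(pattern: str = "/root/miniconda3/**/libstdc++.so.6*") -> list[str]:
    """Like `find ... -name "libstdc++.so.6*"`, limited to one tree (assumed path)."""
    return sorted(p for p in glob.glob(pattern, recursive=True) if Path(p).is_file())


if __name__ == "__main__":
    need = "GLIBCXX_3.4.29"
    for lib in candidates():
        print(lib, "OK" if provides(lib, need) else "missing " + need)
```

Only a candidate printed as OK is worth copying in step 4.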

4. Cannot use the GPU: RuntimeError: Unable to proceed, no GPU resources available

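Accompanying error: RuntimeError: PytorchStreamReader failed reading zip archive: failed finding
After renting the instance, nvidia-smi showed 0 GPU memory in use, meaning the GPU was not being used.
If the GPU is present, the installed torch build is most likely the wrong one: uninstall torch and reinstall it.
Install it with pip, not conda: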

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
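After reinstalling, a quick check from Python shows whether the new build can see the GPU. torch.version.cuda is None for a CPU-only build, which would match the symptom above, and otherwise reports the CUDA version the wheel was built against (11.7 for the cu117 index). torch is imported inside the function so the script still runs when it is missing; the function name is only illustrative.

```python
def gpu_report() -> dict:
    """Summarise what PyTorch sees; torch is imported lazily so this runs without it."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None for a CPU-only build, otherwise the CUDA version of the wheel
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    r = gpu_report()
    print(r)
    if r["torch_installed"] and r["built_with_cuda"] is None:
        print("CPU-only torch build: reinstall with pip from the cu117 index")
```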

5. Missing pytorch_model-00001-of-00008.bin

OSError: Unable to load weights from pytorch checkpoint file for '/root/autodl-tmp/chatglm-6b/pytorch_model-00001-of-00008.bin' at '/root/autodl-tmp/chatglm-6b/pytorch_model-00001-of-00008.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

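Download the model files from the link and put them at the path given in the error, uploading shards 1 through 8 one by one into the model directory.
The default model path used by the ChatGLM fine-tuning code is /root/autodl-tmp/chatglm-6b.
Download link: Tsinghua Cloud (清华云盘)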
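Before relaunching, it can help to confirm that all eight shards actually arrived. This sketch reads the shard list from pytorch_model.bin.index.json, the index file Hugging Face sharded checkpoints ship with, when it is present, and otherwise falls back to the pytorch_model-0000k-of-00008.bin naming seen in the error. It only catches missing or empty files; a truncated upload would still pass. The function names are hypothetical.

```python
import json
from pathlib import Path

MODEL_DIR = "/root/autodl-tmp/chatglm-6b"  # default path used by the fine-tuning code


def expected_shards(model_dir: str, total: int = 8) -> list[str]:
    """Shard file names, from the index file if present, else from the naming pattern."""
    index = Path(model_dir) / "pytorch_model.bin.index.json"
    if index.is_file():
        weight_map = json.loads(index.read_text())["weight_map"]
        return sorted(set(weight_map.values()))
    return [f"pytorch_model-{i:05d}-of-{total:05d}.bin" for i in range(1, total + 1)]


def missing_or_empty(model_dir: str, total: int = 8) -> list[str]:
    """Shards that are absent or zero bytes long, i.e. still need to be uploaded."""
    bad = []
    for name in expected_shards(model_dir, total):
        path = Path(model_dir) / name
        if not path.is_file() or path.stat().st_size == 0:
            bad.append(name)
    return bad


if __name__ == "__main__":
    problems = missing_or_empty(MODEL_DIR)
    print("all shards present" if not problems else f"re-upload: {problems}")
```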

6. huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '<your path>'. Use repo_type argument if needed.

Accompanying error:

OSError: Can't load the configuration of '../root/autodl-tmp/chatglm-6b/'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '../root/autodl-tmp/chatglm-6b/' is the correct path to a directory containing a config.json file

The path in this message is the wrong model path, which is why the model cannot be found; it is the value of args["model_dir"]:

../root/autodl-tmp/chatglm-6b/

In the config section of every script that loads the model, replace args["model_dir"] with your model's actual path:

from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("/root/autodl-tmp/chatglm-6b/", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("/root/autodl-tmp/chatglm-6b/", trust_remote_code=True)
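The '../root/autodl-tmp/chatglm-6b/' in the error is a relative path, so it is resolved against the current working directory rather than against /. As far as the two errors suggest, when the string is not an existing local directory, from_pretrained falls back to treating it as a Hugging Face Hub repo id, which is where HFValidationError comes from. A small pre-check like the one below (a hypothetical helper, not part of transformers) reports the real problem instead.

```python
from pathlib import Path


def check_model_dir(model_dir: str) -> Path:
    """Return the absolute model directory, or fail with a readable message.

    Checks that the path exists as a directory and contains config.json,
    the file the OSError above says it could not find.
    """
    path = Path(model_dir).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(
            f"{model_dir!r} resolves to {path}, which is not a directory; "
            "use an absolute path such as /root/autodl-tmp/chatglm-6b"
        )
    if not (path / "config.json").is_file():
        raise FileNotFoundError(f"{path} has no config.json; is this the model folder?")
    return path
```

In a training script, calling check_model_dir(args["model_dir"]) first and passing str() of the result to from_pretrained turns the confusing repo-id error into a plain path message.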
