[Error] Notes on the environment pitfalls I hit running ChatGLM fine-tuning on AutoDL
- 1. Resource acceleration channel
- 2. Errors when downloading with conda
- 3. ImportError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/wuye/anaconda3/envs/tf2/lib/python3.8/site-packages/google/protobuf/pyext/_message.cpython-38-x86_64-linux-gnu.so)
- 4. GPU not usable: RuntimeError: Unable to proceed, no GPU resources available
- 5. Missing pytorch_model-00001-of-00008.bin
- 6. huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '<your path>'. Use repo_type argument if needed.
- References
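1. Resource acceleration channel
Enable: source /etc/network_turbo
Disable: unset http_proxy && unset https_proxy
Turn it off once cloning has finished. Otherwise conda will hit network errors when it downloads the environment.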
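The disable step works by unsetting `http_proxy` and `https_proxy`. That means the acceleration channel is still on whenever either variable is set, so it is worth checking before running conda. The sketch below is a minimal version that assumes those two variables are what matter. It also checks the upper-case forms as a precaution, since many tools honor them, but that part is an assumption and not something documented for `/etc/network_turbo`. `active_proxies` and `unset_command` are hypothetical helper names.

```python
import os
from typing import Mapping

# Variables cleared by the "disable" step; the upper-case forms are an
# assumption (many tools read them), not documented AutoDL behaviour.
PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


def active_proxies(env: Mapping[str, str] = os.environ) -> dict:
    """Return the proxy variables that are currently set to a non-empty value."""
    return {name: env[name] for name in PROXY_VARS if env.get(name)}


def unset_command(env: Mapping[str, str] = os.environ) -> str:
    """Build the shell command that turns the acceleration proxy off again."""
    return " && ".join(f"unset {name}" for name in active_proxies(env))


if __name__ == "__main__":
    if active_proxies():
        print("Proxy still active, run this before conda:", unset_command())
    else:
        print("No proxy variables set; conda will connect directly.")
```

Note that a Python process cannot unset variables in its parent shell. That is why the sketch prints the command instead of running it.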
2. Errors when downloading with conda
2.1 PackagesNotFoundError: The following packages are not available from current channels
Fix: conda config --append channels conda-forge
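`--append` adds conda-forge to the end of the `channels` list. Packages are therefore preferably taken from the channels already listed, and conda-forge mainly supplies the ones they lack. `--prepend` (alias `--add`) would put it first instead. The sketch below models that ordering on a plain Python list. It illustrates the priority rule and is not conda's own code; `apply_channel_op` is a hypothetical name.

```python
from typing import List


def apply_channel_op(channels: List[str], name: str, op: str) -> List[str]:
    """Model how `conda config --append/--prepend channels NAME` reorders the list.

    Index 0 is the highest-priority channel. A channel that is already present
    is moved rather than duplicated (modeled behaviour, not conda's source).
    """
    result = [c for c in channels if c != name]
    if op == "append":
        result.append(name)      # lowest priority: searched last
    elif op in ("prepend", "add"):
        result.insert(0, name)   # highest priority: searched first
    else:
        raise ValueError(f"unknown op: {op!r}")
    return result


print(apply_channel_op(["defaults"], "conda-forge", "append"))
# ['defaults', 'conda-forge']
```

To see the real order on the instance, run `conda config --show channels`.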
2.2 CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://mirrors.tuna.tsinghua.edu.cn/anaco
The mirror source is the problem, so change the mirror configuration:
vim ~/.condarc
- Replace its contents with the following:
show_channel_urls: true
default_channels:
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/linux-64
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
custom_channels:
  conda-forge: http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  msys2: http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  bioconda: http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  menpo: http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  simpleitk: http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
ssl_verify: false
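Editing `~/.condarc` by hand in vim is easy to get wrong because YAML indentation matters. The entries under `custom_channels` must be indented, or they become unrelated top-level keys. The sketch below backs up the existing file and then writes the configuration above. `write_condarc` and the `.bak` suffix are illustrative choices, not a conda convention.

```python
import shutil
from pathlib import Path

TUNA = "http://mirrors.tuna.tsinghua.edu.cn/anaconda"

# Same content as the configuration above; indentation is significant in YAML.
CONDARC = f"""show_channel_urls: true
default_channels:
  - {TUNA}/pkgs/main/linux-64
  - {TUNA}/pkgs/r
  - {TUNA}/pkgs/msys2
custom_channels:
  conda-forge: {TUNA}/cloud
  msys2: {TUNA}/cloud
  bioconda: {TUNA}/cloud
  menpo: {TUNA}/cloud
  pytorch: {TUNA}/cloud
  simpleitk: {TUNA}/cloud
ssl_verify: false
"""


def write_condarc(path: Path = Path.home() / ".condarc") -> Path:
    """Back up an existing .condarc (as .condarc.bak), then write the mirror config."""
    path = Path(path)
    if path.exists():
        shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(CONDARC, encoding="utf-8")
    return path
```

After writing the file, `conda clean -i` clears the cached package index. `conda config --show-sources` then shows which settings conda actually picked up.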
3. ImportError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/wuye/anaconda3/envs/tf2/lib/python3.8/site-packages/google/protobuf/pyext/_message.cpython-38-x86_64-linux-gnu.so)
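- Check which GLIBCXX versions the system libstdc++ provides:
strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX
- Search the system for other copies of the library and look for a newer one:
sudo find / -name "libstdc++.so.6*"
- Pick a newer copy and check, with the strings command above, that it contains the version named in the error.
- Copy the newer library into the system library directory:
sudo cp /home/wuye/anaconda3/envs/tf2/lib/libstdc++.so.6.0.29 /usr/lib/x86_64-linux-gnu/
Note that the first path is the source file and the second is the destination directory.
If you leave out the destination, cp fails with: cp: missing destination file operand
- Delete the old link: sudo rm /usr/lib/x86_64-linux-gnu/libstdc++.so.6
- Create the new link: sudo ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29 /usr/lib/x86_64-linux-gnu/libstdc++.so.6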
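Two checks in the steps above can be scripted: whether a library exports the GLIBCXX version named in the error, and which copy found by `find` is newest. The sketch below works on the text output of `strings ... | grep GLIBCXX` and on the path list printed by `find`. It assumes the usual `libstdc++.so.6.0.N` file naming, where a larger N is normally a newer library. The function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple

VERSION_RE = re.compile(r"GLIBCXX_(\d+(?:\.\d+)*)")       # skips e.g. GLIBCXX_DEBUG_...
LIB_RE = re.compile(r"libstdc\+\+\.so\.6\.0\.(\d+)$")     # real files, not *-gdb.py helpers


def _as_tuple(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def glibcxx_versions(strings_output: str) -> List[str]:
    """Parse `strings libstdc++.so.6 | grep GLIBCXX` output into sorted versions."""
    found = {m.group(1) for m in VERSION_RE.finditer(strings_output)}
    return sorted(found, key=_as_tuple)


def provides(strings_output: str, required: str) -> bool:
    """True if the library exports the version the ImportError asked for, e.g. "3.4.29"."""
    return required in glibcxx_versions(strings_output)


def newest_libstdcxx(paths: Iterable[str]) -> Optional[str]:
    """Pick the highest libstdc++.so.6.0.N from `find` output; symlink names are ignored."""
    candidates = []
    for raw in paths:
        path = raw.strip()
        m = LIB_RE.search(path)
        if m:
            candidates.append((int(m.group(1)), path))
    return max(candidates)[1] if candidates else None
```

The input for `glibcxx_versions` can be the stdout of `subprocess.run(["strings", lib], capture_output=True, text=True)`. Check a candidate with `provides(..., "3.4.29")` before copying it over the system library.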
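4. GPU not usable: RuntimeError: Unable to proceed, no GPU resources available
Accompanying error: RuntimeError: PytorchStreamReader failed reading zip archive: failed finding
After renting the instance, running nvidia-smi showed 0 GPU memory in use, which meant the GPU was not being used.
Since the GPU itself is present, the likely cause is a wrong torch build. Uninstall torch and reinstall it.
Use pip for this, not conda: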
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
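After reinstalling, confirm from Python that the installed wheel was built with CUDA and can see the GPU, rather than relying on nvidia-smi alone. The sketch below uses only `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.device_count()`. `cuda_diagnostics` is a hypothetical helper name.

```python
import importlib.util


def cuda_diagnostics() -> dict:
    """Report whether torch is installed, which CUDA it was built for, and what it sees."""
    info = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not info["torch_installed"]:
        return info
    import torch  # imported lazily so the check also runs where torch is absent

    info["torch_version"] = torch.__version__       # CUDA wheels usually carry a suffix such as +cu117
    info["built_with_cuda"] = torch.version.cuda     # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    info["device_count"] = torch.cuda.device_count()
    return info


if __name__ == "__main__":
    for key, value in cuda_diagnostics().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is None, a CPU-only build is installed. In that case `cuda_available` stays False even on a GPU instance.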
5. Missing pytorch_model-00001-of-00008.bin
OSError: Unable to load weights from pytorch checkpoint file for '/root/autodl-tmp/chatglm-6b/pytorch_model-00001-of-00008.bin' at '/root/autodl-tmp/chatglm-6b/pytorch_model-00001-of-00008.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
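Download the model files from the link below and upload all eight shards, 00001 through 00008, to the model directory named in the error.
The default model path for ChatGLM fine-tuning is /root/autodl-tmp/chatglm-6b
Tsinghua Cloud (清华云盘)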
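With eight files to upload by hand, it is easy to miss one or to leave one half-uploaded. The sketch below lists missing or incomplete shards using the file naming from the error message. It assumes the shards were written in `torch.save`'s zip-based format, the default since PyTorch 1.6. Under that assumption a truncated upload fails `zipfile.is_zipfile`, because the archive's central directory at the end of the file is missing. `check_shards` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path
from typing import List


def check_shards(model_dir: str, total: int = 8) -> List[str]:
    """Return a list of problems with the sharded checkpoint in model_dir.

    Expects files named pytorch_model-0000i-of-0000N.bin. Each shard is also
    checked as a zip archive, assuming torch.save's zip-based format.
    """
    problems = []
    for i in range(1, total + 1):
        name = f"pytorch_model-{i:05d}-of-{total:05d}.bin"
        path = Path(model_dir) / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif not zipfile.is_zipfile(path):
            problems.append(f"not a complete zip archive: {name}")
    return problems


if __name__ == "__main__":
    issues = check_shards("/root/autodl-tmp/chatglm-6b")
    print("\n".join(issues) or "all 8 shards present")
```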
6. huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '<your path>'. Use repo_type argument if needed.
Accompanying error:
OSError: Can't load the configuration of '../root/autodl-tmp/chatglm-6b/'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '../root/autodl-tmp/chatglm-6b/' is the correct path to a directory containing a config.json file
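The path in this message is the wrong model path, so the model cannot be found. It is the value of args["model_dir"]:
../root/autodl-tmp/chatglm-6b/
Because it starts with ../, it is resolved relative to the current working directory and does not point to /root/autodl-tmp/chatglm-6b.
In the config section of every script, shown below, replace args["model_dir"] with the absolute path to your model: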
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("/root/autodl-tmp/chatglm-6b/", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("/root/autodl-tmp/chatglm-6b/", trust_remote_code=True)
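The error appears because `from_pretrained` only treats its argument as a local folder when that directory exists. Otherwise the string is taken as a Hugging Face Hub repo id, and `../root/...` then fails the repo-id validation. The sketch below is a pre-flight check that fails early with the resolved path instead. `resolve_model_dir` is a hypothetical helper, and the only file it requires is the `config.json` named in the error.

```python
import os
from pathlib import Path


def resolve_model_dir(model_dir: str) -> str:
    """Turn model_dir into an absolute path and make sure it is a real model folder.

    Raises FileNotFoundError with the resolved path (and the cwd) in the message,
    instead of letting from_pretrained fall back to a Hub lookup.
    """
    path = Path(os.path.expanduser(model_dir)).resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"model directory not found: {path} (cwd: {Path.cwd()})")
    if not (path / "config.json").is_file():
        raise FileNotFoundError(f"no config.json in {path}")
    return str(path)


# Usage (requires transformers and the downloaded weights):
# from transformers import AutoModel, AutoTokenizer
# model_dir = resolve_model_dir(args["model_dir"])
# tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
# model = AutoModel.from_pretrained(model_dir, trust_remote_code=True)
```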
References
1. How to fix PackagesNotFoundError: The following packages are not available from current channels
2. Collected fixes for CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://mirrors.tuna.tsinghua.edu.cn/anaco
3. HTTP 000 CONNECTION FAILED for url <https://mirrors.ustc.edu.cn/anaconda/pkgs/free/noarch/repodata.
4. cp: missing destination file operand
5. ImportError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found
6. GPU not usable
7. Installing the GPU build of PyTorch
8. huggingface_hub.utils._validators.HFValidationError