Installing and configuring conda, JupyterLab, and related plugins
1. Installing conda
Reference: https://blog.csdn.net/LJX_ahut/article/details/114282900
Anaconda download mirror (China): https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/
bash Anaconda3-2021.05-Linux-x86_64.sh -p /opt/module/anaconda_202105_install_env
source ~/.bashrc
Add a domestic (China) mirror channel:
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --set show_channel_urls yes
2. Creating a virtual environment
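A minimal sketch of this step, assuming an environment named jupyter-env with Python 3.8. Both values are illustrative assumptions; adjust them as needed.

```shell
# Illustrative values (assumptions): change the name and Python version as needed
ENV_NAME="jupyter-env"
PYTHON_VERSION="3.8"

if command -v conda >/dev/null 2>&1; then
  # -y skips the confirmation prompt, -n sets the environment name
  conda create -y -n "$ENV_NAME" "python=$PYTHON_VERSION"
  # List environments to confirm the new one exists
  conda env list
else
  echo "conda not found on PATH; run 'source ~/.bashrc' first" >&2
fi
```

Afterwards, switch to the environment in your interactive shell with: conda activate jupyter-env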
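3. Configuring JupyterLab
Reference: https://www.cnblogs.com/liuxiaomo/p/13164530.html
The Anaconda distribution ships with JupyterLab.
Generate the config file: jupyter lab --generate-config
Generate the password hash (run in a Python shell; passwd() prompts for a password and returns its hash):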
from jupyter_server.auth import passwd
passwd()
Edit the config file: vim /root/.jupyter/jupyter_lab_config.py
c.ServerApp.ip = '*'
c.ServerApp.password = '<password hash generated above>'
c.ExtensionApp.open_browser = False
c.ServerApp.port = 8889
c.ServerApp.allow_remote_access = True
c.ServerApp.root_dir = '/home/jupyter-work-dir'
4. Configuring JupyterLab plugins
Reference: https://blog.csdn.net/moledyzhang/article/details/78850820
1. Scala kernel
Download the jupyter-scala-cli 2.11 kernel package: https://oss.sonatype.org/content/repositories/snapshots/com/github/alexarchambault/jupyter/jupyter-scala-cli_2.11.6/0.2.0-SNAPSHOT/
Install:
tar xvf jupyter-scala_2.11.6-0.2.0-SNAPSHOT.tar.xz -C /opt/module/
bash /opt/module/jupyter-scala_2.11.6-0.2.0-SNAPSHOT/bin/jupyter-scala
2. Spark kernel
sbt and Docker must be installed beforehand (there are quite a few prerequisites).
git clone https://github.com/apache/incubator-toree.git
cd incubator-toree/
Edit the Makefile and set:
APACHE_SPARK_VERSION?=2.4.5
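make build # point sbt at a domestic mirror first, otherwise this is very slow
make dist # point Docker at a domestic mirror first, otherwise this is very slow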
cd dist/toree/bin/
ls
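pwd # note this path
Under /root/.ipython/kernels, create a directory named spark and add a file kernel.json with the following content (remove the comments when pasting, since JSON does not allow them):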
{
  "display_name": "Spark 2.4.5 (Scala 2.12.12)",
  "language_info": {"name": "scala"},
  "argv": [
    "/opt/module/incubator-toree-master/dist/toree/bin/run.sh", # change to the path noted above
    "--profile",
    "{connection_file}"
  ],
  "codemirror_mode": "scala",
  "env": {
    "SPARK_OPTS": "--master=local[2] --conf spark.sql.catalogImplementation=hive --driver-java-options=-Xms1024M --driver-java-options=-Xmx4096M --driver-java-options=-Dlog4j.logLevel=info", # these options can be tuned later
    "MAX_INTERPRETER_THREADS": "16",
    "CAPTURE_STANDARD_OUT": "true",
    "CAPTURE_STANDARD_ERR": "true",
    "SEND_EMPTY_OUTPUT": "false",
    "SPARK_HOME": "/opt/module/spark-2.4.5",
    "PYTHONPATH": "/opt/module/spark-2.4.5/python:/opt/module/spark-2.4.5/python/lib/py4j-0.10.7-src.zip" # change to your own paths
  }
}
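Leftover comments or curly quotes picked up while pasting make the file invalid JSON, and the kernel may then fail to load. A small sketch for checking the file before going further, assuming python3 is on the PATH; the function name validate_kernel_spec is a hypothetical helper, not part of Jupyter or Toree:

```shell
# Hypothetical helper: prints "valid" if the file parses as JSON, "invalid" otherwise
validate_kernel_spec() {
  if python3 -m json.tool "$1" > /dev/null 2>&1; then
    echo "valid"
  else
    echo "invalid"
  fi
}

# Path of the kernel spec file created in the step above
validate_kernel_spec /root/.ipython/kernels/spark/kernel.json
```

If this prints invalid, run python3 -m json.tool on the file directly to see the position of the parse error.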
List the installed kernels:
jupyter kernelspec list
Job progress can be monitored through the Spark web UI: http://server:4040/jobs/
5. Starting JupyterLab
jupyter lab --allow-root