
Training a model with sklearn, saving the model to a file (text, pkl), converting the model file (pkl to ONNX), and visualizing the model

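1. Environment

IDE: Jupyter Lab, using a Python 2 kernel

Model visualization: GraphViz, which can be used directly in Jupyter; Netron (Windows version)

Model conversion: performed inside the onnx/onnx-ecosystem container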

2. Code

Create and train the model

import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import pandas as pd


from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()

# Train the model
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)
# Export the fitted tree in DOT format to iris.dot
with open("iris.dot", 'w') as f:
    tree.export_graphviz(clf, out_file=f)


from IPython.display import Image  
import pydotplus

dot_data = tree.export_graphviz(clf, out_file=None, 
                         feature_names=iris.feature_names,  
                         class_names=iris.target_names,  
                         filled=True, rounded=True,  
                         special_characters=True)


graph = pydotplus.graph_from_dot_data(dot_data)

# Visualize the model
Image(graph.create_png())


 

Save the figure as a PDF

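# Add GraphViz to PATH to fix the "InvocationException: GraphViz's executables not found" error raised when calling graph.
# The directory below is specific to this machine's Anaconda2 install; adjust it to your own GraphViz location.

import os
os.environ["PATH"] += os.pathsep + 'D:/Anaconda2/Library/bin/graphviz/'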

dot_data = tree.export_graphviz(clf, out_file=None)
graph = pydotplus.graph_from_dot_data(dot_data) 
graph.write_pdf("iris.pdf")
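Before editing PATH by hand, it can help to check whether the GraphViz `dot` executable is already visible. The following is a minimal sketch, assuming Python 3 (for `shutil.which`); the helper name `ensure_on_path` is hypothetical and not part of pydotplus or GraphViz, and the directory is the one used in this post.

```python
import os
import shutil


def ensure_on_path(executable, extra_dir):
    """Return the resolved path of `executable`, or None if it cannot be found.

    If the executable is not found and `extra_dir` exists, `extra_dir` is
    appended to PATH for the current process and the lookup is retried.
    """
    found = shutil.which(executable)
    if found is None and os.path.isdir(extra_dir):
        os.environ["PATH"] += os.pathsep + extra_dir
        found = shutil.which(executable)
    return found


# The directory is the Anaconda2 location from this post; adjust it for your machine.
dot_path = ensure_on_path("dot", "D:/Anaconda2/Library/bin/graphviz/")
print("GraphViz dot executable:", dot_path)
```

If the printed value is `None`, GraphViz is probably not installed or lives in a different directory, and pydotplus would be expected to raise the same InvocationException.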

Save the model in pkl format with joblib, then load the pkl model file and use it for prediction

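# On scikit-learn 0.23 and later, sklearn.externals.joblib no longer exists; use `import joblib` instead
from sklearn.externals import joblib
joblib.dump(clf, "DecisionTreeClassifier.pkl")

# Load the saved model back from disk
f1 = joblib.load('DecisionTreeClassifier.pkl')

# Mean accuracy of the reloaded model on the iris data
f1.score(iris.data, iris.target)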
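A quick way to confirm that the saved file really holds the same model is to compare predictions before and after the round trip. This is a sketch under a few assumptions: Python 3, the standalone `joblib` package, and a temporary directory instead of the working directory; `random_state=0` is added only to make the run repeatable.

```python
import os
import tempfile

import joblib  # standalone package; older scikit-learn exposed it as sklearn.externals.joblib
import numpy as np
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Dump the model to a temporary file and load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "DecisionTreeClassifier.pkl")
    joblib.dump(clf, model_path)
    restored = joblib.load(model_path)

# The restored model should predict exactly what the original predicts
same_predictions = np.array_equal(clf.predict(iris.data), restored.predict(iris.data))
print(same_predictions)
```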


Save the model to a text file with pickle, then load the pickled model file and use it for prediction

import pickle
s = pickle.dumps(clf)
# Pickle data is binary, so open the file in binary mode ('wb'/'rb')
with open('DecisionTreeClassifier.txt', 'wb') as f:
    f.write(s)

with open('DecisionTreeClassifier.txt', 'rb') as f2:
    s2 = f2.read()
clf2 = pickle.loads(s2)
clf2.score(iris.data, iris.target)
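The file mode matters once this code leaves Python 2. The sketch below, assuming Python 3 and a temporary directory, shows that `pickle.dumps` returns `bytes`, that writing them to a file opened in text mode raises `TypeError`, and that binary mode round-trips the model. The variable `text_mode_ok` is illustrative only.

```python
import os
import pickle
import tempfile

from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

payload = pickle.dumps(clf)
print(type(payload).__name__)  # the serialized model is a bytes object on Python 3

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "DecisionTreeClassifier.txt")

    # Text mode: Python 3 refuses to write bytes to a text file
    try:
        with open(path, "w") as f:
            f.write(payload)
        text_mode_ok = True
    except TypeError:
        text_mode_ok = False

    # Binary mode: write and read the pickle unchanged
    with open(path, "wb") as f:
        f.write(payload)
    with open(path, "rb") as f:
        clf2 = pickle.load(f)

print(text_mode_ok)
print(clf2.score(iris.data, iris.target) == clf.score(iris.data, iris.target))
```

Binary mode also works on Python 2, so it is the safer choice for either kernel.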

Model format conversion

Run the following code inside the onnx/onnx-ecosystem container:

Convert the pkl model file to ONNX: DecisionTreeClassifier.pkl ----> model.onnx

# On scikit-learn 0.23 and later, replace the next line with `import joblib`
from sklearn.externals import joblib
from skl2onnx.common.data_types import FloatTensorType
import onnxmltools

# Update the input name and path for your sklearn model
input_skl_model = 'DecisionTreeClassifier.pkl'

# Input type for your sklearn model: a float tensor of shape [1, 4] (one sample, four iris features)
input_data_type = [('float_input', FloatTensorType([1, 4]))]

# Change this path to the output name and path for the ONNX model
output_onnx_model = 'model.onnx'

# Load your sklearn model
skl_model = joblib.load(input_skl_model)

# Convert the sklearn model into ONNX
onnx_model = onnxmltools.convert_sklearn(skl_model, initial_types=input_data_type)

# Save as protobuf
onnxmltools.utils.save_model(onnx_model, output_onnx_model)

3. Viewing the pkl and ONNX models with Netron

Viewing the pkl model

 

Viewing the ONNX model

 
