Reading HTML Pages with Python

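BeautifulSoup is a library for parsing HTML. With it you can look up tags and their contents and pull out specific data, such as the page title or the list of headings on a page.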
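As a minimal sketch of that kind of lookup, the snippet below parses a small inline page and extracts its title and headings. The HTML string and the variable names `page_title` and `headings` are made up for illustration and are not from the original article; it assumes the `bs4` package is already installed.

```python
from bs4 import BeautifulSoup

# Hypothetical inline page, used so the example runs without network access
html_doc = """
<html>
  <head><title>Python Features</title></head>
  <body>
    <h1>Overview</h1>
    <h2>Easy to learn</h2>
    <h2>Portable</h2>
  </body>
</html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Text of the <title> tag
page_title = soup.title.string

# Text of every <h1> and <h2> tag, in document order
headings = [tag.get_text(strip=True) for tag in soup.find_all(["h1", "h2"])]

print(page_title)
print(headings)
```

Passing a list of tag names to `find_all` matches any of them, which keeps the heading lookup to a single call.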

Installing BeautifulSoup

Use the Anaconda package manager to install the package together with its dependencies.

conda install beautifulsoup4

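Reading an HTML File

In the example below, we request a URL and load the response into the Python environment. We then pass 'html.parser' as the parser argument so that BeautifulSoup reads the whole HTML document. Finally, we print the first few lines of the formatted page.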

import urllib3
from bs4 import BeautifulSoup

# Fetch the html file
http = urllib3.PoolManager()
response = http.request('GET', 'http://www.zyiz.net/python/features.html')
html_doc = response.data  # raw bytes; BeautifulSoup detects the encoding

# Parse the html file
soup = BeautifulSoup(html_doc, 'html.parser')

# Format the parsed html file
strhtm = soup.prettify()

# Print the first 225 characters
print(strhtm[:225])

Running the example code above produces the following output: