Installing the BeautifulSoup Library

Installation

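First, press Win+R, type cmd, and press Enter to open a terminal. If needed, switch to the drive where Python is installed by typing the drive letter followed by a colon (for example, f:).
Then run pip install beautifulsoup4 and wait for the installation to finish.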
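
One quick way to confirm that the installation worked is to ask Python whether the bs4 package can be found. The sketch below uses only the standard library; the helper name is_installed is made up for illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "bs4" is the import name of the beautifulsoup4 package.
print("bs4 installed:", is_installed("bs4"))
```

If this prints False, pip probably installed the package into a different Python interpreter than the one being run.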

Importing the library

Use import to bring in bs4:

from bs4 import BeautifulSoup
import bs4

Parsing a web page

Use the requests library to fetch the page source, then parse it with BeautifulSoup.

>>> import requests
>>> r=requests.get("http://www.baidu.com")
>>> r.status_code
200
>>> r.encoding='utf-8'
>>> rt=r.text
>>> # At this point the page source is hard to read, so use BeautifulSoup to tidy it up
>>> from bs4 import BeautifulSoup  # import the library
>>> soup=BeautifulSoup(rt,"html.parser")  # parse rt (r.text) with the html.parser parser
>>> print(soup.prettify())
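
To try the parsing step without a network connection, the same calls can be run on a small inline string. This is only a sketch: the html string below is a made-up stand-in for r.text, not the real Baidu page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for r.text, so no network access is needed.
html = "<html><head><title>Demo page</title></head><body><p>Hello</p></body></html>"

# Parse the string with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# prettify() returns the document re-indented, one tag per line.
print(soup.prettify())
```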

Parsers: the additional parsers are installed with pip, the same way as BeautifulSoup itself. See the table below.

Parser	Usage	Requirement
Python's built-in HTML parser	BeautifulSoup(r.text, "html.parser")	Install bs4 (`pip install beautifulsoup4`)
lxml HTML parser	BeautifulSoup(r.text, "lxml")	`pip install lxml`
lxml XML parser	BeautifulSoup(r.text, "xml")	`pip install lxml`
html5lib parser	BeautifulSoup(r.text, "html5lib")	`pip install html5lib`
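
Because lxml and html5lib are optional, a script may want to fall back to the built-in parser when they are missing. The following is a minimal sketch of that idea; the helper make_soup and its preferred parameter are illustrative names, not part of bs4. It relies on bs4 raising FeatureNotFound when the requested parser is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html: str, preferred: str = "lxml") -> BeautifulSoup:
    """Parse html with the preferred parser, falling back to html.parser."""
    try:
        return BeautifulSoup(html, preferred)
    except FeatureNotFound:
        # The preferred parser is not installed; use the standard-library one.
        return BeautifulSoup(html, "html.parser")


soup = make_soup("<p>hi</p>")
print(soup.p.string)
```

Different parsers can build slightly different trees from the same broken markup, so it is worth naming the parser explicitly rather than relying on whichever one happens to be installed.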

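In the code above, rt (that is, r.text) holds the page's HTML5 source.

HTML itself is not covered in detail here; see a W3C tutorial for reference.
A Tag in BeautifulSoup corresponds to a tag in the HTML5 source.
Continuing from the code above, use soup.<tag name> to access a tag. The result is the whole tag, from its opening to its closing, including everything inside.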

>>> soup.title
<title>百度一下，你就知道</title>
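
Attribute-style access returns only the first tag with that name. The sketch below, which uses a made-up HTML snippet, contrasts it with find_all, which returns every match.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with two links, used only for illustration.
html = '<p><a href="/first">First</a> <a href="/second">Second</a></p>'
soup = BeautifulSoup(html, "html.parser")

first_link = soup.a              # only the first <a> tag
all_links = soup.find_all("a")   # every <a> tag, in document order

print(first_link)
print(len(all_links))
```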

Use <tag>.name to get the tag's name; it returns a string.

>>> soup.title.name
'title'

Use <tag>.attrs to get the tag's attributes; it returns a dict.

>>> soup.title.attrs
{}  # this tag has no attributes, so an empty dict is returned

Use <tag>.string to get the tag's text content; it returns a NavigableString, which behaves like an ordinary string.

>>> soup.title.string
'百度一下，你就知道'
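
The title tag above has no attributes and a single text child, so it does not show everything these three properties can return. Here is a small sketch on a made-up snippet with attributes and nested content:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet, used only for illustration.
html = '<p class="intro lead" id="top">Hello <b>world</b></p>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.p

print(tag.name)        # p
print(tag.attrs)       # class is multi-valued, so it comes back as a list
print(tag.b.string)    # world
print(tag.string)      # None, because <p> has more than one child
print(tag.get_text())  # Hello world
```

When a tag contains nested tags, .string is None, and get_text() is the usual way to collect all of the text inside it.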
