Python Crawler: Scraping Admission Scores by Major for Universities Nationwide

This article is only an exercise in writing a crawler. No data is saved, and the API URLs have been masked.

By analyzing the network requests, we can see that the page loads these two JSON files:

https://xxx.cn/www/2.0/schoolprovinceindex/2018/318/12/1/1.json

https://xxx..cn/www/2.0/schoolspecialindex/2018/31/11/1/1.json

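In the first URL, 318 is the school id and 12 is the province id, which here stands for Tianjin.

The two files correspond to the school's admission score lines by province and its score lines by major, respectively.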
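The URL layout described above can be sketched as a small helper. This is only an illustration: `build_index_url` and `BASE_URL` are hypothetical names, reading the `2018` segment as the year is an assumption, and so is treating the second URL as following the same school id / province id layout as the first.

```python
# Masked host, exactly as it appears in the article.
BASE_URL = "https://xxx.cn/www/2.0"


def build_index_url(kind, year, school_id, province_id):
    """Build the JSON URL for one school and one province.

    kind is "schoolprovinceindex" (score lines by province) or
    "schoolspecialindex" (score lines by major).
    """
    # The trailing "/1/1" is copied unchanged from the observed URLs;
    # the article does not say what these two segments mean.
    return f"{BASE_URL}/{kind}/{year}/{school_id}/{province_id}/1/1.json"


province_url = build_index_url("schoolprovinceindex", 2018, 318, 12)
special_url = build_index_url("schoolspecialindex", 2018, 318, 12)
print(province_url)
print(special_url)
```

Keeping the school id and province id as parameters makes it straightforward to loop over many schools once their ids are known.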

So the code for the current page is:

import requests

# Browser-like headers sent with the request
HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;",
    "Accept-Language": "zh-CN,zh;q=0.8",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0",
    "Referer": "https://xxx.cn/school/search",
}

# Score lines by province for school id 1217 in province 12 (Tianjin)
url = 'https://xxx.cn/www/2.0/schoolprovinceindex/2018/1217/12/1/1.json'
response = requests.get(url, headers=HEADERS)
print(response.json())
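The request above assumes the server always answers with valid JSON. A slightly more defensive variant might look like the sketch below. `fetch_json` is a hypothetical helper, not part of the original code. It expects an object with a `get` method such as a `requests.Session`, and the 10-second timeout is an arbitrary choice.

```python
def fetch_json(session, url, headers=None, timeout=10):
    """GET url and return the decoded JSON body, or None if the body is not JSON.

    session is expected to behave like requests.Session.
    """
    response = session.get(url, headers=headers, timeout=timeout)
    # Raise an exception on 4xx/5xx answers instead of trying to parse them.
    response.raise_for_status()
    try:
        return response.json()
    except ValueError:
        # The body could not be decoded as JSON (for example an HTML error page).
        return None
```

With `session = requests.Session()`, a call such as `fetch_json(session, url, headers=HEADERS)` would replace the `requests.get` and `response.json()` pair.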

Next we need a way to obtain the school ids. Analyzing the requests in the same way, we find this endpoint:

https://xxxl.cn/gkcx/api/?uri=apigkcx/api/school/hotlists

It is called by POSTing the following data:

data = {"access_token":"","admissions":"","central":"","department":"","dual_class":"","f211":"","f985":"","is_dual_class":"","keyword":"","page":2,"province_id":"","request_type":1,"school_type":"","size":20,"sort":"view_total","type":"","uri":"apigkcx/api/school/hotlists"}

One of the parameters is page, which corresponds to the page number.

So the code for this part is:

import requests

HEADERS = ...
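A minimal sketch of how the paged request described above might be written follows. `build_payload` and `fetch_page` are hypothetical helpers. Sending the body as JSON rather than form data is an assumption, and so is passing in a `requests.Session`-like object. The structure of the response is not shown in the article, so the decoded JSON is returned unchanged.

```python
# Masked endpoint, exactly as it appears in the article.
API_URL = "https://xxxl.cn/gkcx/api/?uri=apigkcx/api/school/hotlists"

# Every field from the observed POST body except "page".
BASE_PAYLOAD = {
    "access_token": "", "admissions": "", "central": "", "department": "",
    "dual_class": "", "f211": "", "f985": "", "is_dual_class": "",
    "keyword": "", "province_id": "", "request_type": 1, "school_type": "",
    "size": 20, "sort": "view_total", "type": "",
    "uri": "apigkcx/api/school/hotlists",
}


def build_payload(page):
    """Return a copy of the observed POST body with the given page number."""
    payload = dict(BASE_PAYLOAD)
    payload["page"] = page
    return payload


def fetch_page(session, page, timeout=10):
    """POST one page of the school list and return the decoded JSON.

    Assumption: the endpoint accepts a JSON body; if it expects form
    data instead, data=... would be needed in place of json=...
    """
    response = session.post(API_URL, json=build_payload(page), timeout=timeout)
    response.raise_for_status()
    return response.json()
```

With `session = requests.Session()`, `fetch_page(session, 2)` would request the same page as the payload shown above, and increasing `page` steps through the list 20 schools at a time.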