
Python Web Scraping: Beautiful Soup Case Studies (Day 21)

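🎈🎈Author's homepage: 喔的嘛呀🎈🎈
🎈🎈Column: Learning Python web scraping🎈🎈
✨✨Thanks for stopping by. May good luck follow everyone reading this every day, and be sure to stay happy! ✨✨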

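Contents

I. Extracting stock information (http://quote.stockstar.com/)

1. Open the web page

2. Pick the information technology sector and click through, then copy the page URL: http://quote.stockstar.com/stock/industry_I.shtml

3. Press F12 to open the developer tools, inspect the page structure, and locate the elements that hold the data to scrape

4. Extract the located markup and analyze it

5. With the analysis done, write the code

(1) Parse the HTML with Beautiful Soup:

(2) Find the table that contains the stock data:

(3) Extract the table rows:

(4) Loop over the rows and extract each stock's fields:

(5) Full code

6. Results

II. Extracting the Sina News hot list

III. Conclusion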


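I. Extracting stock information (http://quote.stockstar.com/)

1. Open the web page

2. Pick the information technology sector and click through, then copy the page URL: http://quote.stockstar.com/stock/industry_I.shtml

3. Press F12 to open the developer tools, inspect the page structure, and locate the elements that hold the data to scrape

The developer tools show that all the data to scrape sits inside the div with class "box box02".

4. Extract the located markup and analyze it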

<div class="box box02">
    <div class="bg_box" id="dataTable">
        <div class="con">
        <!-- This is the table that holds the stock data; it is what we need to extract -->
            <table width="100%" border="0" cellpadding="0" cellspacing="0" class="trHover" id="table1">
                <thead class="tbody_right">
                <tr>
                    <td width="6%" class="align_center">
                        <a href="javascript:void(0)" sort="0" target="_self" class="newup">代码</a>
                    </td>
                    <td width="24%" class="align_center">简称
                    </td>
                    <td width="17.5%" class="align_right">
                        <a href="javascript:void(0)" sort="1" target="_self">流通市值(万元)</a>
                    </td>
                    <td width="17.5%" class="align_right">
                        <a href="javascript:void(0)" sort="2" target="_self">总市值(万元)</a>
                    </td>
                    <td width="17.5%" class="align_right">
                        <a href="javascript:void(0)" sort="3" target="_self">流通股本(万元)</a>
                    </td>
                    <td width="17.5%" class="align_right">
                        <a href="javascript:void(0)" sort="4" target="_self">总股本(万元)</a>
                    </td>

                </tr>
                </thead>
                <tbody class="tbody_right" id="datalist">
               
                <!-- start: from here to "end", each row holds one stock's data; we loop over the rows and print them -->
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/000004.shtml">000004</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/000004.shtml">国华网安</a></td>
                    <td class="align_right ">190063.58</td>
                    <td class="align_right ">199232.32</td>
                    <td class="align_right ">12628.81</td>
                    <td class="align_right ">13238.03</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/000032.shtml">000032</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/000032.shtml">深桑达A</a></td>
                    <td class="align_right ">1166377.73</td>
                    <td class="align_right ">2058568.25</td>
                    <td class="align_right ">64476.38</td>
                    <td class="align_right ">113795.92</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/000158.shtml">000158</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/000158.shtml">常山北明</a></td>
                    <td class="align_right ">1224653.26</td>
                    <td class="align_right ">1235730.73</td>
                    <td class="align_right ">158428.62</td>
                    <td class="align_right ">159861.67</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/000409.shtml">000409</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/000409.shtml">云鼎科技</a></td>
                    <td class="align_right ">364104.22</td>
                    <td class="align_right ">581661.43</td>
                    <td class="align_right ">42337.70</td>
                    <td class="align_right ">67635.05</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/000503.shtml">000503</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/000503.shtml">国新健康</a></td>
                    <td class="align_right ">905071.39</td>
                    <td class="align_right ">991065.37</td>
                    <td class="align_right ">89877.99</td>
                    <td class="align_right ">98417.61</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/000555.shtml">000555</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/000555.shtml">神州信息</a></td>
                    <td class="align_right ">1166774.74</td>
                    <td class="align_right ">1170929.32</td>
                    <td class="align_right ">97231.23</td>
                    <td class="align_right ">97577.44</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/000676.shtml">000676</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/000676.shtml">智度股份</a></td>
                    <td class="align_right ">914176.97</td>
                    <td class="align_right ">915255.50</td>
                    <td class="align_right ">127500.28</td>
                    <td class="align_right ">127650.70</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/000682.shtml">000682</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/000682.shtml">东方电子</a></td>
                    <td class="align_right ">1222620.92</td>
                    <td class="align_right ">1222743.03</td>
                    <td class="align_right ">134059.31</td>
                    <td class="align_right ">134072.70</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/000839.shtml">000839</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/000839.shtml">ST国安</a></td>
                    <td class="align_right ">764366.14</td>
                    <td class="align_right ">764366.14</td>
                    <td class="align_right ">391982.64</td>
                    <td class="align_right ">391982.64</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/000889.shtml">000889</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/000889.shtml">ST中嘉</a></td>
                    <td class="align_right ">148744.29</td>
                    <td class="align_right ">160105.78</td>
                    <td class="align_right ">86984.97</td>
                    <td class="align_right ">93629.11</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/000948.shtml">000948</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/000948.shtml">南天信息</a></td>
                    <td class="align_right ">528632.72</td>
                    <td class="align_right ">539879.79</td>
                    <td class="align_right ">38614.52</td>
                    <td class="align_right ">39436.07</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/000971.shtml">000971</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/000971.shtml">ST高升</a></td>
                    <td class="align_right ">134618.70</td>
                    <td class="align_right ">166725.83</td>
                    <td class="align_right ">84665.85</td>
                    <td class="align_right ">104859.01</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/000997.shtml">000997</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/000997.shtml">新 大 陆</a></td>
                    <td class="align_right ">1788947.49</td>
                    <td class="align_right ">1798885.70</td>
                    <td class="align_right ">102636.12</td>
                    <td class="align_right ">103206.29</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/002063.shtml">002063</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/002063.shtml">远光软件</a></td>
                    <td class="align_right ">943962.50</td>
                    <td class="align_right ">1024941.65</td>
                    <td class="align_right ">175457.71</td>
                    <td class="align_right ">190509.60</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/002065.shtml">002065</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/002065.shtml">东华软件</a></td>
                    <td class="align_right ">1625588.07</td>
                    <td class="align_right ">1795070.13</td>
                    <td class="align_right ">290283.58</td>
                    <td class="align_right ">320548.24</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/002093.shtml">002093</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/002093.shtml">国脉科技</a></td>
                    <td class="align_right ">713868.17</td>
                    <td class="align_right ">714317.50</td>
                    <td class="align_right ">100686.62</td>
                    <td class="align_right ">100750.00</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/002095.shtml">002095</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/002095.shtml">生 意 宝</a></td>
                    <td class="align_right ">392665.13</td>
                    <td class="align_right ">394243.20</td>
                    <td class="align_right ">25170.84</td>
                    <td class="align_right ">25272.00</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/002123.shtml">002123</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/002123.shtml">梦网科技</a></td>
                    <td class="align_right ">650474.06</td>
                    <td class="align_right ">757978.52</td>
                    <td class="align_right ">68687.86</td>
                    <td class="align_right ">80039.97</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/002131.shtml">002131</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/002131.shtml">利欧股份</a></td>
                    <td class="align_right ">1309578.84</td>
                    <td class="align_right ">1515714.58</td>
                    <td class="align_right ">584633.41</td>
                    <td class="align_right ">676658.29</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/002148.shtml">002148</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/002148.shtml">北纬科技</a></td>
                    <td class="align_right ">249222.00</td>
                    <td class="align_right ">308537.10</td>
                    <td class="align_right ">45148.91</td>
                    <td class="align_right ">55894.40</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/002153.shtml">002153</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/002153.shtml">石基信息</a></td>
                    <td class="align_right ">1121434.43</td>
                    <td class="align_right ">1913164.88</td>
                    <td class="align_right ">159976.38</td>
                    <td class="align_right ">272919.38</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/002174.shtml">002174</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/002174.shtml">游族网络</a></td>
                    <td class="align_right ">916019.16</td>
                    <td class="align_right ">917717.77</td>
                    <td class="align_right ">91419.08</td>
                    <td class="align_right ">91588.60</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/002195.shtml">002195</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/002195.shtml">岩山科技</a></td>
                    <td class="align_right ">1674280.22</td>
                    <td class="align_right ">1694554.91</td>
                    <td class="align_right ">565635.21</td>
                    <td class="align_right ">572484.77</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/002197.shtml">002197</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/002197.shtml">证通电子</a></td>
                    <td class="align_right ">491566.64</td>
                    <td class="align_right ">565213.89</td>
                    <td class="align_right ">53431.16</td>
                    <td class="align_right ">61436.29</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/002212.shtml">002212</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/002212.shtml">天融信</a></td>
                    <td class="align_right ">811854.37</td>
                    <td class="align_right ">823376.63</td>
                    <td class="align_right ">116813.58</td>
                    <td class="align_right ">118471.46</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/002230.shtml">002230</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/002230.shtml">科大讯飞</a></td>
                    <td class="align_right ">10642815.24</td>
                    <td class="align_right ">11280510.86</td>
                    <td class="align_right ">218448.59</td>
                    <td class="align_right ">231537.58</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/002232.shtml">002232</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/002232.shtml">启明信息</a></td>
                    <td class="align_right ">637335.59</td>
                    <td class="align_right ">637335.59</td>
                    <td class="align_right ">40854.85</td>
                    <td class="align_right ">40854.85</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/002235.shtml">002235</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/002235.shtml">安妮股份</a></td>
                    <td class="align_right ">319414.68</td>
                    <td class="align_right ">334413.21</td>
                    <td class="align_right ">55357.83</td>
                    <td class="align_right ">57957.23</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/002238.shtml">002238</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/002238.shtml">天威视讯</a></td>
                    <td class="align_right ">898866.26</td>
                    <td class="align_right ">898866.26</td>
                    <td class="align_right ">80255.92</td>
                    <td class="align_right ">80255.92</td>
                </tr>
                <tr>
                    <td class="align_center select"><a href="//stock.quote.stockstar.com/002247.shtml">002247</a></td>
                    <td class="align_center"><a href="//stock.quote.stockstar.com/002247.shtml">聚力文化</a></td>
                    <td class="align_right ">108114.83</td>
                    <td class="align_right ">142946.17</td>
                    <td class="align_right ">64354.07</td>
                    <td class="align_right ">85087.00</td>
                </tr>
                <!-- end -->
                </tbody>
                <tbody>
                <tr id="has_fyStock_data" class="noSelect no_trHover">
                    <td colspan="12" class="time notSelect">
                        <span class="fl" id="latesttime_span">数据时间:2024-03-29</span>
                        <div class="fenye fr" id="divPageControl1">共<strong>422</strong>条记录<span><em>1</em></span><a
                                href="/stock/industry_I_0_0_2.html" target="_self"><em>2</em></a><a
                                href="/stock/industry_I_0_0_3.html" target="_self"><em>3</em></a><a
                                href="/stock/industry_I_0_0_4.html" target="_self"><em>4</em></a><a
                                href="/stock/industry_I_0_0_5.html" target="_self"><em>5</em></a><em>...</em><a
                                href="/stock/industry_I_0_0_15.html" target="_self"><em>15</em></a><a
                                href="/stock/industry_I_0_0_2.html" target="_self"
                                class="n"><em>下一页</em></a>到第<input type="text" class="page_input"
                                                                        id="txtPageNumber"
                                                                        onkeydown="if (event.keyCode == 13){PagedControl.GoToThePage('/stock/industry_I_0_0_{0}.html');return false;}">页<a
                                href="javascript:void(0);"
                                onclick="PagedControl.GoToThePage('/stock/industry_I_0_0_{0}.html');return false;"><em>确定</em></a>
                        </div>
                    </td>
                </tr>

                </tbody>
            </table>
        </div>
    </div>
</div>
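
The pagination block at the bottom of the table hints at how to reach the remaining rows: the links follow the pattern /stock/industry_I_0_0_N.html, and 422 records at 30 rows per page works out to the 15 pages the last link points to. The sketch below only builds the page URLs, using the standard library. It assumes that pages 2 to 15 all follow the pattern seen in the markup and that page 1 keeps its original .shtml address; neither has been checked against the live site, and the function name is made up for this example.

```python
from urllib.parse import urljoin

BASE_URL = "http://quote.stockstar.com/stock/industry_I.shtml"  # page 1, from the article
PAGE_TEMPLATE = "/stock/industry_I_0_0_{page}.html"             # pattern seen in the pagination links


def page_urls(last_page):
    """Return the URL of every result page from 1 to last_page.

    Assumption: pages 2..last_page follow PAGE_TEMPLATE exactly as the
    links in the markup do; page 1 keeps the original .shtml address.
    """
    urls = [BASE_URL]
    for page in range(2, last_page + 1):
        # urljoin resolves the root-relative path against the site's host
        urls.append(urljoin(BASE_URL, PAGE_TEMPLATE.format(page=page)))
    return urls


urls = page_urls(15)  # 15 is the last page number shown in the markup
print(len(urls))
print(urls[1])
```

Each URL in the list could then be fetched and parsed with the same table logic used for the first page.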

5. With the analysis done, write the code

(1) Parse the HTML with Beautiful Soup:


import requests
from bs4 import BeautifulSoup

url = "http://quote.stockstar.com/stock/industry_I.shtml"
response = requests.get(url)
response.encoding = 'gbk'  # the page is GBK-encoded, so set the encoding explicitly
soup = BeautifulSoup(response.text, 'html.parser')
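
Why the explicit gbk setting matters: when a text response carries no charset in its Content-Type header, requests falls back to ISO-8859-1, and GBK bytes decoded that way come out as mojibake. The sketch below reproduces the effect offline. The sample string is made up and merely stands in for the page body.

```python
# Offline illustration: the same bytes decoded with two different codecs.
# The sample string is made up; it stands in for the page body.
raw = "股票代码".encode("gbk")      # bytes as a GBK-encoded server would send them

wrong = raw.decode("iso-8859-1")    # what a Latin-1 fallback would produce
right = raw.decode("gbk")           # what response.encoding = 'gbk' produces

print(repr(wrong))  # unreadable Latin-1 characters
print(right)        # the original text
```

If the encoding of a page is not known in advance, response.apparent_encoding offers a guess based on the body, though the guess is not guaranteed to be right.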
                                                         

(2) Find the table that contains the stock data:


table = soup.find('table', class_='trHover')

(3) Extract the table rows:

rows = table.find_all('tr')

(4) Loop over the rows and extract each stock's fields:

for row in rows[1:]:  # Skip the header row
    cells = row.find_all('td')
    if len(cells) >= 6:  # Ensure there are enough cells (this also skips the pagination row)
        stock_code = cells[0].text.strip()
        stock_name = cells[1].text.strip()
        circulation_market_value = cells[2].text.strip()
        total_market_value = cells[3].text.strip()
        circulation_stock = cells[4].text.strip()
        total_stock = cells[5].text.strip()

        print(f"Code: {stock_code}, Name: {stock_name}, "
              f"Float market cap: {circulation_market_value}, Total market cap: {total_market_value}, "
              f"Float shares: {circulation_stock}, Total shares: {total_stock}")

(5) Full code

import requests
from bs4 import BeautifulSoup

url = "http://quote.stockstar.com/stock/industry_I.shtml"
response = requests.get(url)
response.encoding = 'gbk'  # set the encoding to gbk; without it the text comes out garbled
soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find('table', class_='trHover')
rows = table.find_all('tr')

for row in rows[1:]:  # Skip the header row
    cells = row.find_all('td')
    if len(cells) >= 6:  # Ensure there are enough cells (this also skips the pagination row)
        stock_code = cells[0].text.strip()
        stock_name = cells[1].text.strip()
        circulation_market_value = cells[2].text.strip()
        total_market_value = cells[3].text.strip()
        circulation_stock = cells[4].text.strip()
        total_stock = cells[5].text.strip()

        print(f"Code: {stock_code}, Name: {stock_name}, "
              f"Float market cap: {circulation_market_value}, Total market cap: {total_market_value}, "
              f"Float shares: {circulation_stock}, Total shares: {total_stock}")

This extracts the stock information from the table.
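
The script above prints each row as text. If the values are needed for further processing, one option is to collect them into dictionaries and convert the numeric columns to floats. The following is a minimal sketch of that idea, run against a trimmed copy of the markup from step 4 instead of the live page. The function name and the field names are hypothetical labels chosen for this example, and the sketch assumes every numeric cell holds a plain decimal number; a placeholder such as "--" would raise a ValueError.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table markup from step 4 (two data rows only).
SAMPLE_HTML = """
<table class="trHover" id="table1">
  <thead><tr><td>代码</td><td>简称</td><td>流通市值(万元)</td>
  <td>总市值(万元)</td><td>流通股本(万元)</td><td>总股本(万元)</td></tr></thead>
  <tbody id="datalist">
    <tr><td><a href="#">000004</a></td><td><a href="#">国华网安</a></td>
        <td>190063.58</td><td>199232.32</td><td>12628.81</td><td>13238.03</td></tr>
    <tr><td><a href="#">000032</a></td><td><a href="#">深桑达A</a></td>
        <td>1166377.73</td><td>2058568.25</td><td>64476.38</td><td>113795.92</td></tr>
  </tbody>
  <tbody><tr><td colspan="12">数据时间:2024-03-29</td></tr></tbody>
</table>
"""

# Field names are hypothetical labels chosen for this sketch.
FIELDS = ("code", "name", "float_market_cap", "total_market_cap",
          "float_shares", "total_shares")


def parse_stock_rows(html):
    """Return one dict per data row; the four numeric columns become floats."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    # Select the data body by id so the header and pagination rows are never visited.
    for row in soup.select("tbody#datalist tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) < len(FIELDS):
            continue  # skip malformed rows
        record = dict(zip(FIELDS, cells))
        for key in FIELDS[2:]:
            record[key] = float(record[key])
        records.append(record)
    return records


records = parse_stock_rows(SAMPLE_HTML)
print(records[0])
```

The stock code stays a string on purpose, since converting it to a number would drop the leading zeros.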

6. Results

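II. Extracting the Sina News hot list

The steps are the same as above.

Open the page, press F12, pull out the markup for the data to scrape, analyze it, and write the code.

That is, extract the markup of the part highlighted in blue.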

<div class="blk_main_card">
			<!-- 热榜 -->
			<!-- blk_main_li is the parent element -->
			<div class="blk_main_li" tab-type="tab-cont">
				<ul class="uni-blk-list02 list-a list-0427" style="padding-top: 7px;">
				<li><a href="https://sinanews.sina.cn/native_zt/yingyanlandingpage1711786917" data="0" target="_blank">小米汽车遭遇上百余名消费者投诉</a></li>
					<li><a href="https://sinanews.sina.cn/native_page/quanzi_914931027323416577.html" data="1" target="_blank">偷点外卖就不要写真实姓名了</a></li>
					<li id="hot_list_ad">
				<a id="hotlist_index_3" href="https://s.weibo.com/weibo?q=%E5%93%AA%E4%BA%9B%E4%BA%BA%E5%AE%B9%E6%98%93%E5%BE%97%E7%99%BE%E6%97%A5%E5%92%B3" data="2" target="_blank">哪些人容易得百日咳</a>
		<ins class="sinaads sinaads-fail" id="sinaads-right-hotlist" data-ad-pdps="PDPS000000067800" data-ad-width="360" data-ad-height="26" data-ad-type="embed" style="display:none" data-ad-status="done"></ins>
		   <script>(sinaads = window.sinaads || []).push({
        params: {
            element: document.getElementById("PDPS000000067800"),
            sinaads_success_handler:function () {
                  var ads = document.getElementById("sinaads-right-hotlist");
				  var _news= document.getElementById("hotlist_index_3");
				  var hot_list_ad= document.getElementById("hot_list_ad")
				  _news.style.display="none";
			      ads.style.display= "block";
				  hot_list_ad.classList.add("hotlist_have_ad")
            },
            sinaads_fail_handler: function () {
                console.log('sinaads_fail_handler')
            }
        }
    })</script>
    
		</li>
		
		<!-- Every hot-list item is wrapped in an li tag -->
					<li><a href="https://sinanews.sina.cn/native_zt/yingyanlandingpage1711790585" data="3" target="_blank">杭州东站</a></li>
					<li><a href="https://sinanews.sina.cn/native_page/quanzi_914336910352965633.html" data="4" target="_blank">2024中国网络媒体论坛</a></li>
					<li><a href="https://sinanews.sina.cn/native_page/quanzi_914966334487650305.html" data="5" target="_blank">雷军能不能生产一下相机</a></li>
					<li><a href="https://sinanews.sina.cn/native_zt/yingyanlandingpage1711790450" data="6" target="_blank">医院取精室里都有些什么</a></li>
					<li><a href="https://k.sina.com.cn/article_5756451891_m1571c7c3303301b0u4.html?from=news&amp;subch=onews" data="7" target="_blank">警方辟谣面具男用病毒针扎人</a></li>
					<li><a href="https://finance.sina.cn/2024-03-30/detail-inaqawts0171984.d.html" data="8" target="_blank">殡葬用品店否认南通烧纸普遍2层楼高</a></li>
					<li><a href="https://sinanews.sina.cn/native_zt/yingyanlandingpage1711790306" data="9" target="_blank">花间令女性群像没有郑合惠子</a></li>
					<li><a class="fe661" href="https://sinanews.sina.cn/h5/top_news_list.d.html" data="10" target="_blank">点击查看更多实时热点</a></li>
			
</ul>
			</div>
					</div>

After the analysis, write the code:

import requests
from bs4 import BeautifulSoup

# Page URL
url = 'https://news.sina.com.cn/'

# Send a GET request and get the response
response = requests.get(url)

# Parse the HTML content with BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')

# Find the parent element of the hot-list news
hot_news_parent = soup.find('div', class_='blk_main_li')

# Find all hot-list entries
hot_news_list = hot_news_parent.find_all('li')

# Loop over the hot-list entries and extract the information
for news_item in hot_news_list:
    # Extract the headline and the link
    news_title = news_item.a.text.strip()  # headline text with surrounding whitespace removed
    news_link = news_item.a['href']  # link URL

    # Print the headline and the link
    print(f"Title: {news_title}\nLink: {news_link}\n")

Result:
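
A note on robustness: the loop above assumes that the blk_main_li container exists and that every li contains a link. If the page layout changes, or if the list is filled in by JavaScript after loading, find returns None and the script stops with an AttributeError. The sketch below shows one way to guard against both cases. It runs offline on a trimmed copy of the markup shown earlier, with one link-free li added on purpose to exercise the guard; the function name is made up for this example.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the hot-list markup shown earlier, plus one <li> without a
# link (added here on purpose to exercise the guard).
SAMPLE_HTML = """
<div class="blk_main_li" tab-type="tab-cont">
  <ul class="uni-blk-list02 list-a list-0427">
    <li><a href="https://sinanews.sina.cn/native_zt/yingyanlandingpage1711790585" data="3" target="_blank">杭州东站</a></li>
    <li>no link here</li>
    <li><a class="fe661" href="https://sinanews.sina.cn/h5/top_news_list.d.html" data="10" target="_blank">点击查看更多实时热点</a></li>
  </ul>
</div>
"""


def parse_hot_list(html):
    """Return (title, link) pairs, or an empty list if the container is missing."""
    soup = BeautifulSoup(html, "html.parser")
    parent = soup.find("div", class_="blk_main_li")
    if parent is None:      # layout changed, or the list is injected by JavaScript
        return []
    items = []
    for li in parent.find_all("li"):
        link = li.find("a", href=True)
        if link is None:    # <li> without a usable link
            continue
        items.append((link.get_text(strip=True), link["href"]))
    return items


for title, href in parse_hot_list(SAMPLE_HTML):
    print(f"Title: {title}\nLink: {href}\n")
```

Returning an empty list keeps the caller simple: it can loop over the result without checking for None first.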

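III. Conclusion

Today's case studies should deepen our understanding of Beautiful Soup and how to apply it. When scraping, follow each site's crawler rules, and avoid requesting too frequently or scraping excessively so that you do not burden the site. Keep a learning mindset, keep exploring new techniques and methods, and keep improving your scraping skills and efficiency. Whatever you do, I wish you all ever greater success on your own path!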
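
The advice about crawler rules and request frequency can be put into code with the standard library alone. The sketch below checks each URL against robots.txt rules and pauses between requests. To stay runnable offline it parses a hypothetical robots.txt string and uses example.com URLs; the rules, the URLs, the one-second default delay, and the function name polite_fetch are all assumptions made for illustration, not values taken from either site in this article.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the sketch runs offline.
# For a real site, call parser.set_url(".../robots.txt") and parser.read() instead.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS.splitlines())


def polite_fetch(urls, fetch, delay_seconds=1.0, user_agent="*"):
    """Call fetch(url) for each allowed URL, pausing between calls.

    `fetch` is any callable that takes a URL, for example requests.get.
    """
    results = []
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            continue                    # skip paths the site asks crawlers to avoid
        if results:
            time.sleep(delay_seconds)   # wait between consecutive requests
        results.append(fetch(url))
    return results


demo = polite_fetch(
    ["http://example.com/stock/list.html", "http://example.com/private/data.html"],
    fetch=lambda url: url,  # stand-in for requests.get so nothing is downloaded
    delay_seconds=0.0,
)
print(demo)
```

Passing the fetch function in as a parameter keeps the sketch testable without network access; in a real script it would be requests.get or a small wrapper around it.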
