Java crawler: downloading images from a website

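I happened to need some image assets, so I went to a website to download them, but saving them one at a time was far too slow. Then it hit me: what did I study? Programming! So I thought of writing a crawler. I had long heard of Python crawlers but had never used one, so I read a few tech blogs. Hey, it turns out some of them were written in Java, and they looked pretty simple, so I gave it a try.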

☝️ Just some rambling

👇 Using the crawler

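The overall flow: fetch the page source from the site's URL, find the img tags to get each image's path, and finally copy each image into a local folder using stream operations.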
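The article does the "find the img tags" step with Jsoup. As a dependency-free illustration of the same idea (not the author's code), here is a minimal sketch that collects every img src from an HTML string using the JDK's built-in, fairly old HTML 3.2 parser (javax.swing.text.html.parser.ParserDelegator). The class name ImgSrcExtractor and the sample HTML are made up for illustration, and unlike the Jsoup version below it does not filter by CSS class.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: a stdlib-only stand-in for Jsoup's getElementsByTag("img")
public class ImgSrcExtractor {

    // Returns the src attribute of every <img> tag in the HTML, in document order.
    public static List<String> extractImgSrcs(String html) throws IOException {
        List<String> srcs = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <img> has no closing tag, so the parser reports it as a "simple" tag
                if (tag == HTML.Tag.IMG) {
                    Object src = attrs.getAttribute(HTML.Attribute.SRC);
                    if (src != null) {
                        srcs.add(src.toString());
                    }
                }
            }
        };
        // true = ignore any charset declared inside the HTML; we already have a String
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return srcs;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body>"
                + "<div class=\"new-search-works-item\"><img src=\"//img.example.com/a.jpg\"></div>"
                + "<div class=\"new-search-works-item\"><img src=\"/static/b.png\"></div>"
                + "</body></html>";
        System.out.println(extractImgSrcs(html));
    }
}
```

Running main prints the list of src values found in the sample. For messy real-world pages Jsoup is the more forgiving parser, which is a good reason to stick with it as the article does.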

Create a Maven project and add the required dependencies:

<dependencies>
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.15.4</version>
    </dependency>
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.11.0</version>
    </dependency>
</dependencies>

Now write the code that crawls the images. I only crawled 3 pages, 39 images in total, and it ran very fast.

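import java.io.ByteArrayInputStream;
import java.io.File;
import java.net.URL;

import org.apache.commons.io.FileUtils;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ImageCrawl {

    public static void main(String[] args) throws Exception {
        // Fetch the images on the first 3 pages of the site
        for (int i = 1; i <= 3; i++) {
            // Page URL: on the site I crawled, paging only changes one number, so a plain for loop is enough
            String url = "<URL of the site to crawl>" + i + ".html";
            getPic(url, i);
        }

    }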
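    public static void getPic(String url, int i) throws Exception {
        // Fetch and parse the page source (10-second timeout)
        Document document = Jsoup.parse(new URL(url), 10000);
        // On this site, each image result sits in an element with class "new-search-works-item"
        Elements picClass = document.getElementsByClass("new-search-works-item");
        int id = 0;
        for (Element e : picClass) {
            id++;
            // Get the img tags inside this item
            Elements imgs = e.getElementsByTag("img");
            // Read the src attribute (empty string if there is no img with a src)
            String src = imgs.attr("src");
            if (src.isEmpty()) {
                continue;
            }
            // The src values here are protocol-relative ("//..."), so prepend "https:"
            String imgUrl = "https:" + src;
            // Request the image and get the response
            Connection.Response response = Jsoup.connect(imgUrl)
                    .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0") // user agent
                    .ignoreContentType(true) // the response is an image, not HTML
                    .execute();
            // Wrap the response body bytes in an input stream
            ByteArrayInputStream stream = new ByteArrayInputStream(response.bodyAsBytes());
            String fileName = "D:\\pic\\" + i + "_" + id + ".jpg";
            // Copy the input stream to a file (missing parent folders are created)
            FileUtils.copyInputStreamToFile(stream, new File(fileName));
            System.out.println("Page " + i + ", image " + id + " downloaded!");
        }
    }
}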
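The crawler above assumes every src is protocol-relative (hence the `"https:" + ...`) and relies on commons-io plus a hard-coded Windows folder. A minimal JDK-only sketch of those two steps could look like this, assuming you already have the page URL and the downloaded image bytes. resolveImageUrl and saveImage are hypothetical helper names, not part of Jsoup or commons-io; the `<page>_<index>.jpg` naming simply mirrors the original code.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helpers: JDK-only versions of the URL-building and file-saving steps
public class ImageSaveHelpers {

    // Turns an <img src> value into an absolute URL, using the page URL as the base.
    // URI.resolve handles protocol-relative ("//host/x.jpg"), root-relative ("/x.jpg"),
    // and already-absolute ("https://host/x.jpg") values.
    public static String resolveImageUrl(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src.trim()).toString();
    }

    // Writes the bytes to dir/<page>_<index>.jpg, creating dir first if it does not exist.
    public static Path saveImage(Path dir, int page, int index, byte[] bytes) throws IOException {
        Files.createDirectories(dir);
        Path target = dir.resolve(page + "_" + index + ".jpg");
        Files.write(target, bytes); // creates the file, or overwrites it if present
        return target;
    }

    public static void main(String[] args) throws IOException {
        String page = "https://www.example.com/search/2.html";
        System.out.println(resolveImageUrl(page, "//img.example.com/a.jpg"));
        System.out.println(resolveImageUrl(page, "/static/b.png"));
        System.out.println(resolveImageUrl(page, "https://cdn.example.com/c.jpg"));

        Path dir = Files.createTempDirectory("pic");
        Path saved = saveImage(dir, 2, 1, new byte[] {1, 2, 3});
        System.out.println(saved.getFileName() + " (" + Files.size(saved) + " bytes)");
    }
}
```

Resolving against the page URL keeps working when a site mixes relative and absolute src values. URI.create throws IllegalArgumentException for values with unescaped characters such as inner spaces, so a real crawler would want to catch that. If you stay with Jsoup, `absUrl("src")` on the img element performs the same resolution against the document's base URI.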
