
Elasticsearch Basics (Part 8): Common Queries and Searching with the Java API Client

1. Search requirements

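The requirements are modeled on the list page of Douban Read.

Requirements:

  • The search term is matched against the title, author, and abstract fields, and matches are highlighted in red
  • Results can be sorted by overall relevance, most popular, most recently updated, best-selling, or best-rated
  • Each page holds 10 results, and the total number of hits is returned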

image-20231220150501858

2. Setting up the test environment

2.1 Defining the ES fields from the requirements

mapping.json

 {
  "mappings": {
    "properties": {
      "title": {
        "analyzer": "standard",
        "type": "text"
      },
      "author": {
        "analyzer": "standard",
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "contentDesc": {
        "analyzer": "standard",
        "type": "text"
      },
      "wordCount": {
        "type": "double"
      },
      "price": {
        "type": "double"
      },
      "cover": {
        "type": "keyword"
      },
      "heatCount": {
        "type": "integer"
      },
      "updateTime": {
        "type": "date"
      }
    }
  }
}

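Field descriptions:

  1. id (long): the unique identifier. The mapping.json above does not declare this field; in the test data below the identifier is supplied as the document _id.
  2. title (text): stores the book title. The analyzer is explicitly set to standard.
  3. author (text): stores the author. Besides the standard analyzer, it defines an additional keyword sub-field (author.keyword), which is used for exact matching and aggregations.
  4. contentDesc (text): stores the content description (abstract), also analyzed with the standard analyzer.
  5. wordCount (double): stores the word count. The mapping declares it as double, although the values are whole numbers.
  6. price (double): stores the price, for example 6.99.
  7. cover (keyword): stores the cover image URL. Keyword fields are not analyzed and are used for exact matching.
  8. heatCount (integer): stores the popularity count, used for sorting by popularity.
  9. updateTime (date): stores the last update time, used for sorting by most recent update.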
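The sort modes from the requirements can be tied to the fields above with a small lookup. The following is a minimal sketch, not part of the original project: the enum name, the parameter names, and the fallback rule are all assumptions. Only three modes are covered, because the mapping defines no sales or rating field for the other two.

```java
import java.util.Locale;

/** Maps a sort mode of the list page to the index field it sorts on (hypothetical helper). */
enum SortOption {
    COMPREHENSIVE("_score"),   // relevance score computed by Elasticsearch
    HOTTEST("heatCount"),      // integer field from mapping.json
    LATEST("updateTime");      // date field from mapping.json

    private final String field;

    SortOption(String field) {
        this.field = field;
    }

    /** Name of the field to put into the "sort" clause. */
    String field() {
        return field;
    }

    /** Resolves a request parameter (assumed names), falling back to relevance. */
    static SortOption fromParam(String param) {
        if (param == null) {
            return COMPREHENSIVE;
        }
        try {
            return valueOf(param.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return COMPREHENSIVE; // unknown sort modes degrade to relevance
        }
    }
}
```

All three modes would be sorted in descending order, so the field name is the only thing that varies between them.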

2.2 Creating the index and mapping

image-20231220154508492

2.3 Adding test data

 POST /douban/_doc/1001
 {
    "title":"诗云",
    "author":"刘慈欣",
    "contentDesc":"伊依一行三人乘坐一艘游艇在南太平洋上做吟诗航行,平时难得一见的美洲大陆清晰地显示在天空中,在东半球构成的覆盖世界的巨大穹顶上,大陆好像是墙皮脱落的区域…",
    "wordCount":18707,
    "price":6.99,
    "cover":"https://pic.arkread.com/cover/ebook/f/19534800.1653698501.jpg!cover_default.jpg",
    "heatCount":201,
    "updateTime":"2023-12-20"
 }
 
  POST /douban/_doc/1002
 {
    "title":"三体2·黑暗森林",
    "author":"刘慈欣",
    "contentDesc":"征服世界的中国科幻神作!包揽九项世界顶级科幻大奖!《三体》获得第73届“雨果奖”最佳长篇奖!",
    "wordCount":318901,
    "price":32.00,
    "cover":"https://pic.arkread.com/cover/ebook/f/110344476.1653700299.jpg!cover_default.jpg",
    "heatCount":545,
    "updateTime":"2023-12-25"
 }
 
  POST /douban/_doc/1003
 {
    "title":"三体前传:球状闪电",
    "author":"刘慈欣",
    "contentDesc":"征服世界的中国科幻神作!包揽九项世界顶级科幻大奖!《三体》获得第73届“雨果奖”最佳长篇奖!",
    "wordCount":181119,
    "price":35.00,
    "cover":"https://pic.arkread.com/cover/ebook/f/116984494.1653699856.jpg!cover_default.jpg",
    "heatCount":765,
    "updateTime":"2022-11-12"
 }
 
  POST /douban/_doc/1004
 {
    "title":"全频带阻塞干扰",
    "author":"刘慈欣",
    "contentDesc":"这是一个场面浩大而惨烈的故事。21世纪的某年,以美国为首的北约发起了对俄罗斯的全面攻击。在残酷的保卫战中,俄国的电子战设备无力抵挡美国的进攻",
    "wordCount":28382,
    "price":6.99,
    "cover":"https://pic.arkread.com/cover/ebook/f/19532617.1653698557.jpg!cover_default.jpg",
    "heatCount":153,
    "updateTime":"2021-03-23"
 }

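3. Running queries

3.1 Query by primary key

# Get API: retrieves a single document directly by its ID
GET /douban/_doc/1001

# The same lookup expressed through the search API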
POST /douban/_search
{
    "query": {
        "match": {
            "_id": 1001
        }
    }
}

3.2 Match-all query

POST /douban/_search
{
    "query": {
        "match_all": { 
        }
    }
}

3.3 Paginated query

POST /douban/_search
{
    "query": {
        "match_all": { 
        }
    },
    "from":1,
    "size":2
}
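Note that from is a zero-based offset (the number of hits to skip), not a page number, so the request above skips the first hit and returns the second and third. A minimal sketch of converting a 1-based page number into a from/size pair follows; the class and method names are illustrative and do not come from the original project.

```java
/** Converts a 1-based page number into the from/size pair used by the search API. */
final class PageWindow {
    final int from; // number of hits to skip
    final int size; // number of hits to return

    private PageWindow(int from, int size) {
        this.from = from;
        this.size = size;
    }

    static PageWindow of(int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        // multiplyExact fails loudly instead of silently overflowing on absurd page numbers
        return new PageWindow(Math.multiplyExact(pageNumber - 1, pageSize), pageSize);
    }
}
```

With the page size of 10 from the requirements, page 1 maps to from 0 and page 3 maps to from 20.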

3.4 Sorted query

POST /douban/_search
{
  "query": {
    "match_all": {
    }
  },
  "sort": [
    {
      "price": { 
        "order": "desc" 
      }
    }
  ]
}

3.5 Full-text search

POST /douban/_search
{
  "query": {
    "match": {
      "title": "三体球闪"
    }
  }
}

Search results:

image-20231220170628892
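The query matches even though no title contains the exact string 三体球闪. The standard analyzer emits each Chinese character as its own token, and match combines the tokens with OR by default, so any title sharing at least one character with the query is a hit. The sketch below imitates that behavior in plain Java for Chinese text only. It is a simplification with made-up names: the real analyzer keeps Latin words and numbers as whole tokens, and Elasticsearch also scores the hits.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Rough imitation of per-character tokenization with OR matching (illustration only). */
final class SingleCharMatch {

    /** One token per letter or digit code point; punctuation and spaces are dropped. */
    static Set<String> tokens(String text) {
        Set<String> result = new LinkedHashSet<>();
        text.codePoints()
                .filter(Character::isLetterOrDigit)
                .forEach(cp -> result.add(new String(Character.toChars(cp))));
        return result;
    }

    /** True if the field value shares at least one token with the query (OR semantics). */
    static boolean matchesAny(String query, String fieldValue) {
        Set<String> fieldTokens = tokens(fieldValue);
        for (String token : tokens(query)) {
            if (fieldTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

Applied to the four test titles, only the two 三体 books share a character with the query, which is consistent with the two hits returned.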

3.6 Highlighted search

POST /douban/_search
{
    "query": {
        "match": {
            "title": "三体球闪"
        }
    },
    "highlight": {
        "fields": {
            "title": {
                "pre_tags": [
                    "<font style='color:red'>"
                ],
                "post_tags": [
                    "</font>"
                ]
            }
        }
    }
}

image-20231220172424036

3.7 Bool query

Run a full-text search on the title for '三体球闪' and require the price to be exactly 35.

POST /douban/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "title": "三体球闪"
                    }
                },
                {
                    "term": {
                        "price": 35
                    }
                }
            ]
        }
    }
}

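3.8 Multi-field full-text search

Run a full-text match against the title, author, and abstract, and highlight matches in all three fields.

POST /douban/_search
{
    "query": {
        "multi_match": {
            "query": "三体球闪",
            "fields": [
                "title",
                "author",
                "contentDesc"
            ]
        }
    },
    "highlight": {
        "fields": {
            "title": {},
            "author": {},
            "contentDesc": {}
        }
    }
}

image-20231220173414933

3.9 Combined search

Run a full-text match against the title, author, and abstract, and highlight matches in all three fields.

In addition, paginate the results, sort them by update date in descending order, and return only the required fields.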

POST /douban/_search
{
    "query": {
        "multi_match": {
            "query": "三体球闪",
            "fields": [
                "title",
                "author",
                "contentDesc"
            ]
        }
    },
    "from": 0,
    "size": 2,
    "_source": [
        "title",
        "author",
        "price",
        "wordCount"
    ],
    "sort": [
        {
            "updateTime": {
                "order": "desc"
            }
        }
    ],
    "highlight": {
        "fields": {
            "title": {
            },
            "author": {
            },
            "contentDesc": {
            }
        }
    }
}
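In the response, each hit carries a highlight object that maps a field name to a list of fragments, and it contains only the fields that actually matched. Display code therefore needs a fallback to the plain _source value. The helper below is a minimal sketch of that fallback; the class name, method name, and fragment separator are assumptions rather than part of the original project.

```java
import java.util.List;
import java.util.Map;

/** Chooses between highlighted fragments and the plain source value (hypothetical helper). */
final class HighlightText {

    /**
     * @param field       field name, e.g. "title"
     * @param sourceValue the field's value from _source, used when nothing was highlighted
     * @param highlight   the hit's highlight map; may be null or lack the field
     */
    static String displayValue(String field, String sourceValue, Map<String, List<String>> highlight) {
        List<String> fragments = highlight == null ? null : highlight.get(field);
        if (fragments == null || fragments.isEmpty()) {
            return sourceValue;
        }
        // Long fields such as contentDesc can yield several fragments; join them for display
        return String.join(" ... ", fragments);
    }
}
```

Because the request above sets no pre_tags or post_tags, the fragments are wrapped in the default <em> tags.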

image-20231221092538555

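4. Integrating Elasticsearch into a Spring project

Reference: Installation | Elasticsearch Java API Client 7.17 | Elastic

4.1 Creating the Spring project and adding the ES dependencies

image-20231222145917463 image-20231222150055190

To use Java 8, open pom.xml, change the parent version and the java.version value, then reload the Maven project.

image-20231222152132304

As of Elasticsearch 7.15, Elastic marked its High Level REST Client (RestHighLevelClient) as deprecated and introduced a new client, the Elasticsearch Java API Client, which is the officially recommended client for Elasticsearch 8.0 and later.

Overview of the available clients:

  • TransportClient (deprecated, removed in 8.x): accesses the cluster over TCP and supports only Java. Deprecated since 7.x and removed in 8.x.
  • Low Level REST Client: the low-level REST API with minimal dependencies.
  • High Level REST Client (deprecated, no removal date announced): the high-level REST API, built on the low-level client. Deprecated since 7.15 with no removal date announced; its replacement is also built on the low-level client.
  • Java API Client: talks to the language-neutral HTTP API, is the recommended client, is built on the low-level client, and has been available since 7.15.

The dependencies to add:
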
<dependency>
	<groupId>co.elastic.clients</groupId>
	<artifactId>elasticsearch-java</artifactId>
	<version>7.17.11</version>
</dependency>

<dependency>
	<groupId>com.fasterxml.jackson.core</groupId>
	<artifactId>jackson-databind</artifactId>
	<version>2.12.3</version>
</dependency>

<!-- This dependency resolves: ClassNotFoundException: jakarta.json.spi.JsonProvider
     See: https://github.com/elastic/elasticsearch-java/issues/311 -->
<dependency>
    <groupId>jakarta.json</groupId>
    <artifactId>jakarta.json-api</artifactId>
    <version>2.0.1</version>
</dependency>

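The complete pom.xml follows. Note that <elasticsearch.version>7.17.11</elasticsearch.version> must be added to properties; otherwise the Elasticsearch dependency versions managed by the Spring Boot parent are not overridden.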

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.5.15</version>
        <relativePath/>
    </parent>
    <groupId>com.zhouquan</groupId>
    <artifactId>client</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>client</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>8</java.version>
        <lombok.version>1.18.22</lombok.version>
        <elasticsearch.version>7.17.11</elasticsearch.version>
        <jakarta.version>2.0.1</jakarta.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>co.elastic.clients</groupId>
            <artifactId>elasticsearch-java</artifactId>
            <version>7.17.11</version>
        </dependency>

        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.12.3</version>
        </dependency>

        <!-- This dependency resolves: ClassNotFoundException: jakarta.json.spi.JsonProvider
         See: https://github.com/elastic/elasticsearch-java/issues/311 -->
        <dependency>
            <groupId>org.glassfish</groupId>
            <artifactId>jakarta.json</artifactId>
            <version>${jakarta.version}</version>
        </dependency>

        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>${lombok.version}</version>
        </dependency>

        <!-- Apache Commons IO -->
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.11.0</version>
        </dependency>

    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>

</project>

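4.2 Adding the ES client configuration class

The client is registered as a Spring-managed bean; inject it wherever it is needed with @Resource private ElasticsearchClient client;

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import lombok.extern.slf4j.Slf4j;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import javax.annotation.Resource;
import java.util.Arrays;
import java.util.List;

@Configuration
@Slf4j
public class EsClient {

    @Resource
    private EsConfig esConfig;

    /**
     * Bean definition that creates the ElasticsearchClient instance.
     *
     * @return an ElasticsearchClient configured with a RestClient and a transport
     */
    @Bean
    public ElasticsearchClient elasticsearchClient() {
        // Turn the configured cluster node addresses into HttpHost instances
        List<String> clusterNodes = esConfig.getClusterNodes();
        HttpHost[] httpHosts = clusterNodes.stream().map(HttpHost::create).toArray(HttpHost[]::new);

        // Create the low-level client
        RestClient restClient = RestClient.builder(httpHosts).build();

        // The transport handles JSON serialization through Jackson
        ElasticsearchTransport transport = new RestClientTransport(restClient, new JacksonJsonpMapper());

        ElasticsearchClient client = new ElasticsearchClient(transport);

        // Log the nodes the client connects to
        log.info("Elasticsearch client node addresses: {}", Arrays.toString(httpHosts));

        return client;
    }

}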
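The EsConfig class referenced above is not shown in this article. A minimal sketch of what it might look like follows, assuming it only holds the list of node addresses read by getClusterNodes(). In a Spring Boot project it would typically also carry @Component and @ConfigurationProperties with a prefix of your choosing; those annotations are left out here so that the sketch compiles with the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;

/** Holds the cluster node addresses consumed by EsClient (assumed shape, not the original class). */
class EsConfig {

    // Each entry is an address that HttpHost.create can parse, e.g. "http://localhost:9200"
    private List<String> clusterNodes = new ArrayList<>();

    public List<String> getClusterNodes() {
        return clusterNodes;
    }

    public void setClusterNodes(List<String> clusterNodes) {
        this.clusterNodes = clusterNodes;
    }
}
```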

4.3 Creating an index with the Java API Client

Reference: Using the Java API Client

image-20240109152629604

/**
 * Creates the index from a mapping file on the classpath.
 */
@Test
void createIndex() throws IOException {
    // try-with-resources closes the mapping stream after the request has been sent
    try (InputStream input = getClass().getClassLoader().getResourceAsStream("mapping/douban.json")) {
        CreateIndexRequest req = CreateIndexRequest.of(b -> b
                .index("douban_v1")
                .withJson(input)
        );

        boolean created = client.indices().create(req).acknowledged();
        log.info("Index created: {}", created);
    }
}

4.4 Saving documents

Entity class DouBan.java

package com.zhouquan.client.entity;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.util.Date;

/**
 * Entity for one book record stored in the douban index.
 *
 * @author ZhouQuan
 * @date 2024-01-09 15:54
 **/
@Data
@AllArgsConstructor
@NoArgsConstructor
public class DouBan {

    private String id;

    private String title;

    private String author;

    private String contentDesc;

    private Integer wordCount;

    private Double price;

    private String cover;

    private Integer heatCount;

    private Date updateTime;
}

4.4.1 Indexing a single document

public String indexSingleDoc() {
    DouBan douBan = new DouBan("1211", "河边的错误", "余华", "内容简介", 50000, 52.5, "封面1", 74, new Date());
    try {
        // Variant 1: fluent DSL with a lambda builder
        IndexResponse indexResponse = client.index(i -> i
                .index(indexName)
                .id(douBan.getId())
                .document(douBan));

        // Variant 2: the static of() factory method of the Java API Client
        IndexRequest<DouBan> objectIndexRequest = IndexRequest.of(i -> i
                .index(indexName)
                .id(douBan.getId())
                .document(douBan));
        IndexResponse ofIndexResponse = client.index(objectIndexRequest);

        // Variant 3: classic builder
        IndexRequest.Builder<DouBan> objectBuilder = new IndexRequest.Builder<>();
        objectBuilder.index(indexName);
        objectBuilder.id(douBan.getId());
        objectBuilder.document(douBan);
        IndexResponse classicIndexResponse = client.index(objectBuilder.build());

        // Variant 4: asynchronous indexing
        asyncClient.index(i -> i
                .index("douban")
                .id(douBan.getId())
                .document(douBan)
        ).whenComplete((response, exception) -> {
            if (exception != null) {
                log.error("Failed to index", exception);
            } else {
                log.info("Indexed with version " + response.version());
            }
        });

        // Variant 5: indexing raw JSON data
        String jsonData = " {\"id\":\"1741\",\"title\":\"三体\",\"author\":\"刘慈欣\",\"contentDesc\":\"内容简介\",\"wordCount\":50000,\"price\":52.5}";
        Reader input = new StringReader(jsonData);
        IndexRequest<JsonData> request = IndexRequest.of(i -> i
                .index("douban_v1")
                .withJson(input)
        );
        IndexResponse response = client.index(request);
        log.info("Indexed with version " + response.version());

        // Reports the outcome of variant 1 only: "true" if that call created a new document
        return String.valueOf(Result.Created.equals(indexResponse.result()));
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

4.4.2 Bulk indexing documents

/**
 * Bulk-saves a list of documents.
 *
 * @throws IOException if the request fails
 */
@Test
void bulkSave() throws IOException {

    DouBan douBan1 = new DouBan("1002", "题名1", "余华", "内容简介", 50000, 52.5, "封面1", 74, new Date());
    DouBan douBan2 = new DouBan("1003", "题名2", "余华", "内容简介", 50000, 52.5, "封面1", 74, new Date());
    DouBan douBan3 = new DouBan("1004", "题名3", "余华", "内容简介", 50000, 52.5, "封面1", 74, new Date());
    List<DouBan> douBanList = new ArrayList<>();
    douBanList.add(douBan1);
    douBanList.add(douBan2);
    douBanList.add(douBan3);

    BulkRequest.Builder br = new BulkRequest.Builder();
    for (DouBan douBan : douBanList) {
        br.operations(op -> op
                .index(idx -> idx
                        .index("douban_v1")
                        .id(douBan.getId())
                        .document(douBan)
                )
        );
    }

    BulkResponse result = client.bulk(br.build());
    if (result.errors()) {
        log.error("Bulk had errors");
        for (BulkResponseItem item : result.items()) {
            if (item.error() != null) {
                log.error(item.error().reason());
            }
        }
    }
}

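4.4.3 Bulk indexing raw JSON documents

/**
 * Bulk-saves raw JSON documents read from files.
 *
 * @throws IOException if a file cannot be read or the request fails
 */
@Test
void rawDataBulkSave() throws IOException {

    File logDir = new File("D:\\IdeaProjects\\client\\src\\main\\resources\\data");

    // Select files named bulk<anything>.json; String.matches must match the whole name
    File[] logFiles = logDir.listFiles(
            file -> file.getName().matches("bulk.*\\.json")
    );
    // listFiles returns null when the directory does not exist or cannot be read
    if (logFiles == null || logFiles.length == 0) {
        log.warn("No bulk files found in {}", logDir);
        return;
    }

    BulkRequest.Builder br = new BulkRequest.Builder();

    for (File file : logFiles) {
        // Each file holds one JSON document, passed through without deserialization
        try (FileInputStream input = new FileInputStream(file)) {
            BinaryData data = BinaryData.of(IOUtils.toByteArray(input), ContentType.APPLICATION_JSON);

            br.operations(op -> op
                    .index(idx -> idx
                            .index("douban_v1")
                            .document(data)
                    )
            );
        }
    }

    BulkResponse result = client.bulk(br.build());
    if (result.errors()) {
        for (BulkResponseItem item : result.items()) {
            if (item.error() != null) {
                log.error(item.error().reason());
            }
        }
    }
    log.info("Bulk save succeeded: {}", !result.errors());
}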
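The file filter relies on String.matches, which must match the entire file name, and in a regular expression k* means "zero or more k" rather than "bulk followed by anything". The short sketch below contrasts the two readings; the class name and the sample file names are made up for illustration.

```java
/** Contrasts two regex patterns for selecting bulk files (illustration only). */
final class BulkFileFilter {

    // "bul", then any number of "k", then anything, then ".json"
    static final String LOOSE = "bulk*.*\\.json";

    // "bulk", then anything, then ".json"
    static final String STRICT = "bulk.*\\.json";

    /** True if the whole file name matches the pattern. */
    static boolean accepts(String pattern, String fileName) {
        return fileName.matches(pattern);
    }
}
```

A name such as bulletin.json passes the loose pattern but not the strict one, while bulk-001.json passes both.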

4.5 Getting a single document

// Fetch by ID and deserialize the source into a Java object
GetRequest getRequest = GetRequest.of(x -> x.index("douban_v1").id("1002"));
GetResponse<DouBan> douBanGetResponse = client.get(getRequest, DouBan.class);
DouBan source = douBanGetResponse.source();

// The same request written with the lambda builder
GetResponse<DouBan> response = client.get(g -> g
                .index(indexName)
                .id(id),
        DouBan.class
);

if (!response.found()) {
    throw new BusinessException("No document found for the given id");
}

DouBan douBan = response.source();
log.info("Document title: " + douBan.getTitle());
return douBan;

// Fetch by ID as raw JSON (a separate snippet from the one above)
GetResponse<ObjectNode> response1 = client.get(g -> g
                .index(indexName)
                .id(id),
        ObjectNode.class
);

if (response1.found()) {
    ObjectNode json = response1.source();
    String name = json.get("title").asText();
    log.info(" title " + name);
} else {
    log.info("data not found");
}
return null;

4.6 Searching documents

4.6.1 Simple search query

public List<DouBan> search(String searchText) {

    SearchResponse<DouBan> response = null;
    try {
        response = client.search(s -> s
                        .index(indexName)
                        .query(q -> q
                                .match(t -> t
                                        .field("title")
                                        .query(searchText)
                                )
                        ),
                DouBan.class
        );
    } catch (IOException e) {
        throw new RuntimeException(e);
    }

    TotalHits total = response.hits().total();
    boolean isExactResult = total.relation() == TotalHitsRelation.Eq;

    if (isExactResult) {
        log.info("There are " + total.value() + " results");
    } else {
        log.info("There are more than " + total.value() + " results");
    }

    List<Hit<DouBan>> hits = response.hits().hits();
    List<DouBan> list = new ArrayList<>();
    for (Hit<DouBan> hit : hits) {
        DouBan douBan = hit.source();
        list.add(douBan);
        log.info("Found DouBan " + douBan.getTitle() + ", score " + hit.score());
    }
    return list;
}

4.6.2 Nested search queries

public List<DouBan> search2(String searchText, Double price) {

    Query titleQuery = MatchQuery.of(m -> m
            .field("title")
            .query(searchText)
    )._toQuery();

    Query rangeQuery = RangeQuery.of(r -> r
            .field("price")
            .gte(JsonData.of(price))
    )._toQuery();

    try {
        SearchResponse<DouBan> search = client.search(s -> s
                        .index(indexName)
                        .query(q -> q
                                .bool(b -> b
                                        .must(titleQuery)
                                        .must(rangeQuery)
                                )
                        )
                ,
                DouBan.class
        );

        // Parse the search results
        List<DouBan> douBanList = new ArrayList<>();
        List<Hit<DouBan>> hits = search.hits().hits();
        for (Hit<DouBan> hit : hits) {
            DouBan douBan = hit.source();
            douBanList.add(douBan);
        }
        return douBanList;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

4.6.3 Template search

// Create the template: a stored script that renders the search request body
client.putScript(r -> r
        .id("query-script")
        .script(s -> s
                .lang("mustache")
                .source("{\"query\":{\"match\":{\"{{field}}\":\"{{value}}\"}}}")
        ));

// Execute the request
SearchTemplateResponse<DouBan> response = client.searchTemplate(r -> r
                .index("douban_v1")
                .id("query-script")
                .params("field", JsonData.of("title"))
                .params("value", JsonData.of("题名")),
        DouBan.class
);

// Parse the results
List<Hit<DouBan>> hits = response.hits().hits();
for (Hit<DouBan> hit : hits) {
    DouBan douBan = hit.source();
    log.info("Found DouBan " + douBan.getTitle() + ", score " + hit.score());
}

4.7 Aggregations

Query query = MatchQuery.of(t -> t
        .field("title")
        .query(searchText))._toQuery();

// A terms aggregation needs a keyword field; the analyzed text field author cannot be aggregated directly
Aggregation authorAgg = AggregationBuilders.terms().field("author.keyword").build()._toAggregation();

SearchResponse<DouBan> response = client.search(s -> s
                .index(indexName)
                .query(query)
                .aggregations("author", authorAgg),
        DouBan.class
);
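Conceptually, a terms aggregation groups the matching documents by the exact value of a field and counts the documents in each group. The plain-Java sketch below mimics that on an in-memory list, purely to show the shape of the result. It does not call Elasticsearch, and the class and method names are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** In-memory imitation of a terms aggregation on the author field (illustration only). */
final class TermsAggSketch {

    /** Returns author -> number of documents, comparable to bucket key -> doc_count. */
    static Map<String, Long> countByAuthor(List<String> authors) {
        return authors.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}
```

Grouping requires the untouched author name, which is why an aggregation of this kind is run on a keyword field rather than on analyzed text.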