
[C++][Third-Party Libraries][Elasticsearch] A Detailed Walkthrough


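1. Introduction

  • Elasticsearch (ES for short) is an open-source distributed search engine
    • Features: distributed, zero configuration, automatic discovery, automatic index sharding, index replication, RESTful interface, multiple data sources, automatic search load balancing, and more
    • It stores and retrieves data in near real time and scales well: it can grow to hundreds of servers and handle petabytes of data
    • ES is written in Java and uses Lucene at its core for all indexing and search, but its goal is to hide Lucene's complexity behind a simple RESTful API so that full-text search becomes easy
  • Elasticsearch is **document oriented**
    • This means it can store entire objects or documents
    • It does more than store them, though: it also indexes the content of every document so that it can be searched
      • Documents (rather than rows and columns of data) can be indexed, searched, sorted, and filtered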
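
To make "indexing the content of a document" concrete, here is a toy inverted index in C++. It is only a sketch of the general idea: the class name `TinyInvertedIndex` is made up for illustration, tokenization is plain whitespace splitting plus lowercasing, and the real structures inside Lucene are far more elaborate.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Toy inverted index: maps each lowercase term to the ids of the documents containing it.
class TinyInvertedIndex {
public:
    // Split the text on whitespace, lowercase each token, and record the document id.
    void add(int docId, const std::string &text) {
        std::istringstream in(text);
        std::string token;
        while (in >> token) {
            postings_[normalize(token)].insert(docId);
        }
    }

    // Return the ids of all documents containing the term (empty set if none).
    std::set<int> search(const std::string &term) const {
        auto it = postings_.find(normalize(term));
        if (it == postings_.end()) {
            return {};
        }
        return it->second;
    }

private:
    // Lowercase ASCII letters so that lookups are case-insensitive.
    static std::string normalize(std::string s) {
        for (char &c : s) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        return s;
    }

    std::map<std::string, std::set<int>> postings_;
};
```

After adding document 1 with the text "Quick brown fox" and document 2 with "quick search engine", a lookup of "QUICK" returns both ids, because the search term goes through the same normalization as the indexed text.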

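2. Installation

1. ES

  • Add the repository key: the first method below triggers an apt-key deprecation warning; use the second one to avoid it
    # Option 1: apt-key (prints a warning)
    wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

    # Option 2: import into a dedicated keyring (no warning)
    curl -s https://artifacts.elastic.co/GPG-KEY-elasticsearch | \
    sudo gpg --no-default-keyring \
    --keyring gnupg-ring:/etc/apt/trusted.gpg.d/icsearch.gpg --import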
    
  • Add the package repository
    echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" \
    | sudo tee /etc/apt/sources.list.d/elasticsearch.list

  • Update the package list
    sudo apt update

  • Install ES
    sudo apt-get install elasticsearch=7.17.21

  • Start ES
    sudo systemctl start elasticsearch

  • Install the IK analyzer plugin
    sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install \
    https://get.infini.cloud/elasticsearch/analysis-ik/7.17.21

  • Check the status of the ES service
    sudo systemctl status elasticsearch.service

  • Verify that ES was installed successfully
    curl -X GET "http://localhost:9200/"

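  • Enable remote access: by default ES can only be reached from the local machine; after this change it can be opened in a browser at IP:PORT
    vim /etc/elasticsearch/elasticsearch.yml

    # Add the following settings
    network.host: 0.0.0.0
    http.port: 9200
    # The names listed here must match this node's node.name
    cluster.initial_master_nodes: ["node-1"]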
    
  • If ES fails to start with an error such as
      Job for elasticsearch.service failed because the control process exited with error code.
      See "systemctl status elasticsearch.service" and "journalctl -xeu elasticsearch.service" for details.
    • Fix
      # Raise the mmap count limit: the default maximum of 65530 is too low for ES and must be at least 262144
      sudo sysctl -w vm.max_map_count=262144

      # Adjust the JVM heap size
      sudo vim /etc/elasticsearch/jvm.options

      # Add the following lines
      -Xms512m
      -Xmx512m


2. Kibana

  • Install Kibana
    sudo apt install kibana

  • Configure Kibana (optional): the configuration file is usually /etc/kibana/kibana.yml; you may need to set the server address, the port, and the Elasticsearch URL
  • Start Kibana
    sudo systemctl start kibana

  • Enable start on boot (optional)
    sudo systemctl enable kibana

  • Access Kibana: http://<ip>:5601

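3. ES Core Concepts

1. Index

  • An index is a collection of documents that share broadly similar characteristics
    • For example
      • An index for customer data, another for a product catalog, and another for order data
      • An index is identified by a name (which must be all lowercase), and that name is used whenever the documents in the index are indexed, searched, updated, or deleted
  • Any number of indices can be defined in a cluster
  • An index is comparable to a database
    • A database represents a collection of data
    • An index in ES is a collection of data with similar characteristics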
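
The lowercase rule can be checked before a request is ever sent. The helper below is a simplified sketch: the function name is hypothetical, and apart from the lowercase requirement stated above, the other checks (forbidden characters, forbidden leading characters, the 255-byte limit) follow the naming restrictions in the Elasticsearch documentation as best I recall them, so treat them as assumptions and defer to the server's own validation.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns true if `name` passes a simplified index-name check.
bool looksLikeValidIndexName(const std::string &name) {
    // Reject empty names, overly long names, and the special names "." and "..".
    if (name.empty() || name.size() > 255 || name == "." || name == "..") {
        return false;
    }
    // Assumed rule: a name must not start with '-', '_' or '+'.
    if (name[0] == '-' || name[0] == '_' || name[0] == '+') {
        return false;
    }
    // Assumed set of characters that may not appear anywhere in the name.
    const std::string forbidden = "\\/*?\"<>| ,#:";
    for (unsigned char c : name) {
        // Uppercase ASCII letters violate the lowercase rule.
        if (std::isupper(c) || forbidden.find(static_cast<char>(c)) != std::string::npos) {
            return false;
        }
    }
    return true;
}
```

With this check, "user" is accepted, while "User", "_user", and "user data" are rejected.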

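2. Type

  • One or more types can be defined within an index
  • A type is a logical category/partition of an index whose semantics are entirely up to the user
  • Typically, a type is defined for documents that share a common set of fields
    • For example
      • Suppose you run a blogging platform and store all of its data in a single index
      • Within that index you could define one type for user data, another for blog posts, and another for comments
  • A [type] is comparable to a table in a database: it subdivides the data collection one level below the index
  • [Types] are now essentially obsolete: they are deprecated in ES 7.x, where _doc is the only type in use, and removed in 8.0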

3. Field

  • A field corresponds to a column in a database table: it labels one attribute of the document data, and each field has a data type (for example text or keyword)

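4. Mapping

  • A mapping defines constraints on how data is processed and stored
    • A field's data type, default value, analyzer, whether it is indexed, and so on are all configured in the mapping
      • In effect, the mapping tells ES which fields need to be analyzed (tokenized) and indexed so that they can be searched
    • Other rules that govern how ES handles the data are also part of the mapping
  • Processing data under well-chosen rules improves performance considerably, which is why mappings are defined explicitly and why their design deserves some thought
  • Specific parameters
    • enabled: whether the field is parsed and indexed; when false it is only kept in _source and cannot be searched or analyzed (in ES 7.x this applies only to the top-level mapping and to object fields)
      • Values: true (default) / false
    • index: whether to build an inverted index for the field (this determines whether it can be searched)
      • Values: true (default) / false
    • index_options
    • dynamic: controls whether the mapping is updated automatically when new fields appear
      • Values: true (default) / false / strict
    • doc_values: whether to enable doc_values, which are used for aggregation and sorting; not available on analyzed (text) fields
      • Values: true (default) / false
    • fielddata: whether to enable fielddata on a text field so that it can be sorted and aggregated
      • Applies to analyzed fields that take part in sorting or aggregation; it is held in heap memory, so enable it sparingly
      • For non-analyzed fields, doc_values is the recommended choice
      • Values: true / false (default)
        "fielddata": false

    • store: whether the field value is stored separately from the _source field
      • Values: true / false (default)
    • coerce: whether automatic type coercion is enabled, e.g. string to integer or floating point to integer
      • Values: true (default) / false
    • analyzer: the analyzer to use; the default is the standard analyzer
      • Example: "analyzer": "ik"
    • boost: field-level score weighting; the default is 1.0
      • Example: "boost": 1.25
    • fields: index the same field in more than one way, e.g. one analyzed version and one non-analyzed version of the same value
      "fields": {
          "raw": {
              "type": "keyword"
          }
      }

    • date_detection: whether date values are detected automatically
      • Values: true (default) / false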

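5. Document

  • A document is the basic unit of information that can be indexed

  • For example: a document for a single customer, a document for a single product, or a document for a single order

    • Documents are expressed in JSON, a ubiquitous data-interchange format on the internet
    • Any number of documents can be stored in an index/type
    • A document must be indexed into, that is, assigned to, a type within an index
  • Elasticsearch compared with a traditional relational database

    DB    Database    Table    Row         Column
    ES    Index       Type     Document    Field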

4. Testing ES from Kibana

  • Create the index: send the settings and mappings with PUT /user
    PUT /user
    {
        "settings" : {
            "analysis" : {
                "analyzer" : {
                    "ik" : {
                        "tokenizer" : "ik_max_word"
                    }
                }
            }
        },
        "mappings" : {
            "dynamic" : true,
            "properties" : {
                "nickname" : {
                    "type" : "text",
                    "analyzer" : "ik_max_word"
                },
                "user_id" : {
                    "type" : "keyword"
                },
                "phone" : {
                    "type" : "keyword"
                },
                "description" : {
                    "type" : "text",
                    "index" : false
                },
                "avatar_id" : {
                    "type" : "keyword",
                    "index" : false
                }
            }
        }
    }
    
  • Insert data
    • Request form
      POST /user/_doc/_bulk 
      {"index":{"_id":"1"}} 
      {"user_id" : "USER4b862aaa-2df8654a-7eb4bb65e3507f66","nickname" : "昵称1","phone" : "手机号1","description" : "签名1","avatar_id" : "头像1"} 
      {"index":{"_id":"2"}} 
      {"user_id" : "USER14eeeaa5-442771b9-0262e455e4663d1d","nickname" : "昵称2","phone" : "手机号2","description" : "签名2","avatar_id" : "头像2"} 
      {"index":{"_id":"3"}} 
      {"user_id" : "USER484a6734-03a124f0-996c169dd05c1869","nickname" : "昵称3","phone" : "手机号3","description" : "签名3","avatar_id" : "头像3"} 
      {"index":{"_id":"4"}} 
      {"user_id" : "USER186ade83-4460d4a6-8c08068f83127b5d","nickname" : "昵称4","phone" : "手机号4","description" : "签名4","avatar_id" : "头像4"} 
      {"index":{"_id":"5"}} 
      {"user_id" : "USER6f19d074-c33891cf-23bf5a8357189a19","nickname" : "昵称5","phone" : "手机号5","description" : "签名5","avatar_id" : "头像5"} 
      {"index":{"_id":"6"}} 
      {"user_id" : "USER97605c64-9833ebb7-d045535335a59195","nickname" : "昵称6","phone" : "手机号6","description" : "签名6","avatar_id" : "头像6"}
      
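    • The same data laid out for readability (illustrative only; the bulk API expects the newline-delimited form above)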
      [
          {
              "index": {
                  "_id": "1"
              },
              "user": {
                  "user_id": "USER4b862aaa-2df8654a-7eb4bb65e3507f66",
                  "nickname": "昵称1",
                  "phone": "手机号1",
                  "description": "签名1",
                  "avatar_id": "头像1"
              }
          },
          {
              "index": {
                  "_id": "2"
              },
              "user": {
                  "user_id": "USER14eeeaa5-442771b9-0262e455e4663d1d",
                  "nickname": "昵称2",
                  "phone": "手机号2",
                  "description": "签名2",
                  "avatar_id": "头像2"
              }
          },
          {
              "index": {
                  "_id": "3"
              },
              "user": {
                  "user_id": "USER484a6734-03a124f0-996c169dd05c1869",
                  "nickname": "昵称3",
                  "phone": "手机号3",
                  "description": "签名3",
                  "avatar_id": "头像3"
              }
          },
          {
              "index": {
                  "_id": "4"
              },
              "user": {
                  "user_id": "USER186ade83-4460d4a6-8c08068f83127b5d",
                  "nickname": "昵称4",
                  "phone": "手机号4",
                  "description": "签名4",
                  "avatar_id": "头像4"
              }
          },
          {
              "index": {
                  "_id": "5"
              },
              "user": {
                  "user_id": "USER6f19d074-c33891cf-23bf5a8357189a19",
                  "nickname": "昵称5",
                  "phone": "手机号5",
                  "description": "签名5",
                  "avatar_id": "头像5"
              }
          },
          {
              "index": {
                  "_id": "6"
              },
              "user": {
                  "user_id": "USER97605c64-9833ebb7-d045535335a59195",
                  "nickname": "昵称6",
                  "phone": "手机号6",
                  "description": "签名6",
                  "avatar_id": "头像6"
              }
          }
      ]
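
When the bulk request is sent from C++ rather than from Kibana, the newline-delimited body has to be assembled by hand. The following is a minimal sketch under a few assumptions: the `User` struct and both function names are hypothetical, and the escaping covers only what JSON strictly requires inside a string. Each document contributes one action line and one source line, and every line, including the last, ends with a newline as the bulk API expects.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical record type mirroring the fields of the `user` index.
struct User {
    std::string id;  // document _id
    std::string user_id;
    std::string nickname;
    std::string phone;
    std::string description;
    std::string avatar_id;
};

// Escape the characters that JSON requires to be escaped inside a string.
// Bytes >= 0x80 (UTF-8 sequences) are passed through unchanged.
std::string escapeJson(const std::string &in) {
    std::string out;
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            default:
                if (c < 0x20) {
                    // Remaining control characters become \u00XX.
                    char buf[7];
                    std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Build the NDJSON bulk body: an action line followed by a source line per user.
std::string buildBulkBody(const std::vector<User> &users) {
    std::string body;
    for (const User &u : users) {
        body += "{\"index\":{\"_id\":\"" + escapeJson(u.id) + "\"}}\n";
        body += "{\"user_id\":\"" + escapeJson(u.user_id) +
                "\",\"nickname\":\"" + escapeJson(u.nickname) +
                "\",\"phone\":\"" + escapeJson(u.phone) +
                "\",\"description\":\"" + escapeJson(u.description) +
                "\",\"avatar_id\":\"" + escapeJson(u.avatar_id) + "\"}\n";
    }
    return body;
}
```

In a real project a JSON library would normally take over the serialization; the manual version is shown only to make the line structure of the bulk format visible.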
      
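  • View and search the data: exclude three users by user_id, then match the keyword against several fields (user_id is mapped as keyword, so the terms clause targets user_id directly; if the field was created by dynamic mapping instead, use user_id.keyword)
    GET /user/_doc/_search?pretty
    {
        "query" : {
            "bool" : {
                "must_not" : [
                    {
                        "terms" : {
                            "user_id" : [
                                "USER4b862aaa-2df8654a-7eb4bb65e3507f66",
                                "USER14eeeaa5-442771b9-0262e455e4663d1d",
                                "USER484a6734-03a124f0-996c169dd05c1869"
                            ]
                        }
                    }
                ],
                "should" : [
                    {
                        "match" : {
                            "user_id" : "昵称"
                        }
                    },
                    {
                        "match" : {
                            "nickname" : "昵称"
                        }
                    },
                    {
                        "match" : {
                            "phone" : "昵称"
                        }
                    }
                ]
            }
        }
    }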
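
A C++ program has to produce the same query as a string. The sketch below builds a bool query of that shape from its inputs; the function names are hypothetical, the name of the id field is passed in as a parameter so it can be either user_id or user_id.keyword depending on the mapping, and the quoting helper is deliberately simplified (it escapes only double quotes and backslashes, which is an assumption that holds for the ids and keywords used in this article).

```cpp
#include <string>
#include <vector>

// Quote a value as a JSON string. Simplification: only '"' and '\' are escaped.
std::string quoteJson(const std::string &value) {
    std::string out = "\"";
    for (char c : value) {
        if (c == '"' || c == '\\') {
            out += '\\';
        }
        out += c;
    }
    out += '"';
    return out;
}

// Build a bool query that excludes the given ids (terms clause on idField)
// and matches the keyword against each of the listed fields (should clauses).
std::string buildSearchBody(const std::string &keyword,
                            const std::vector<std::string> &fields,
                            const std::string &idField,
                            const std::vector<std::string> &excludedIds) {
    std::string ids;
    for (const std::string &id : excludedIds) {
        if (!ids.empty()) {
            ids += ",";
        }
        ids += quoteJson(id);
    }

    std::string should;
    for (const std::string &field : fields) {
        if (!should.empty()) {
            should += ",";
        }
        should += "{\"match\":{" + quoteJson(field) + ":" + quoteJson(keyword) + "}}";
    }

    return "{\"query\":{\"bool\":{"
           "\"must_not\":[{\"terms\":{" + quoteJson(idField) + ":[" + ids + "]}}],"
           "\"should\":[" + should + "]}}}";
}
```

Calling it with the keyword, the fields {"user_id", "nickname", "phone"}, the id field name, and the three excluded ids yields a compact version of the request body shown above.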
    
  • Delete the index
    DELETE /user
    
  • Query all documents
    POST /user/_doc/_search
    {
        "query": 
        { 
            "match_all":{} 
        }
    }
    

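5. Installing the ES Client

  • There are not many C++ clients for ES to choose from; this article uses elasticlient
  • Source code: https://github.com/seznam/elasticlient
  • Prerequisite: elasticlient depends on MicroHTTPD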
    sudo apt-get install libmicrohttpd-dev
    
  • Install
    # Clone the repository
    git clone https://github.com/seznam/elasticlient

    # Enter the directory
    cd elasticlient

    # Fetch the submodules
    git submodule update --init --recursive

    # Build
    mkdir build && cd build
    cmake ..
    make

    # Install
    sudo make install
    

6. ES Client API Overview

/**
 * Perform search on nodes until it is successful. Throws exception if all nodes
 * has failed to respond.
 * \param indexName specification of an Elasticsearch index.
 * \param docType specification of an Elasticsearch document type.
 * \param body Elasticsearch request body.
 * \param routing Elasticsearch routing. If empty, no routing has been used.
 *
 * \return cpr::Response if any of node responds to request.
 * \throws ConnectionException if all hosts in cluster failed to respond.
 */
cpr::Response search(const std::string &indexName,
                     const std::string &docType,
                     const std::string &body,
                     const std::string &routing = std::string());
 
/**
 * Get document with specified id from cluster. Throws exception if all nodes
 * has failed to respond.
 * \param indexName specification of an Elasticsearch index.
 * \param docType specification of an Elasticsearch document type.
 * \param id Id of document which should be retrieved.
 * \param routing Elasticsearch routing. If empty, no routing has been used.
 *
 * \return cpr::Response if any of node responds to request.
 * \throws ConnectionException if all hosts in cluster failed to respond.
 */
cpr::Response get(const std::string &indexName,
                  const std::string &docType,
                  const std::string &id = std::string(),
                  const std::string &routing = std::string());
 
/**
 * Index new document to cluster. Throws exception if all nodes has failed to respond.
 * \param indexName specification of an Elasticsearch index.
 * \param docType specification of an Elasticsearch document type.
 * \param id Id of document which should be indexed. If empty, id will be generated
 *           automatically by Elasticsearch cluster.
 * \param body Elasticsearch request body.
 * \param routing Elasticsearch routing. If empty, no routing has been used.
 *
 * \return cpr::Response if any of node responds to request.
 * \throws ConnectionException if all hosts in cluster failed to respond.
 */
cpr::Response index(const std::string &indexName,
                    const std::string &docType,
                    const std::string &id,
                    const std::string &body,
                    const std::string &routing = std::string());
 
/**
 * Delete document with specified id from cluster. Throws exception if all nodes
 * has failed to respond.
 * \param indexName specification of an Elasticsearch index.
 * \param docType specification of an Elasticsearch document type.
 * \param id Id of document which should be deleted.
 * \param routing Elasticsearch routing. If empty, no routing has been used.
 *
 * \return cpr::Response if any of node responds to request.
 * \throws ConnectionException if all hosts in cluster failed to respond.
 */
cpr::Response remove(const std::string &indexName,
                     const std::string &docType,
                     const std::string &id,
                     const std::string &routing = std::string());
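
The comments above only promise an exception when no node responds at all, so an HTTP-level failure (a missing document, a malformed query) presumably comes back as an ordinary cpr::Response and has to be detected through its status code. The sketch below classifies a status code with the standard library alone; the enum and function names are hypothetical, and the grouping is simply the usual HTTP convention, not something elasticlient defines.

```cpp
// Hypothetical classification of an HTTP status code returned by ES.
enum class EsOutcome { Ok, NotFound, ClientError, ServerError, Unexpected };

// Map a status code (e.g. the status_code member of a response) to an outcome.
EsOutcome classifyStatus(long statusCode) {
    if (statusCode >= 200 && statusCode < 300) {
        return EsOutcome::Ok;           // 200 OK, 201 Created, ...
    }
    if (statusCode == 404) {
        return EsOutcome::NotFound;     // e.g. document or index does not exist
    }
    if (statusCode >= 400 && statusCode < 500) {
        return EsOutcome::ClientError;  // e.g. malformed request body
    }
    if (statusCode >= 500 && statusCode < 600) {
        return EsOutcome::ServerError;
    }
    return EsOutcome::Unexpected;       // 0, 1xx, 3xx, or anything else
}
```

A caller could then branch on `classifyStatus(resp.status_code)` after each `search`, `get`, `index`, or `remove` call instead of assuming that a returned response means success.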

7. Usage

  • Notes on using the ES client
    • Do not forget the trailing slash (the root path) at the end of the address: http://127.0.0.1:9200/
    • Wrap ES client API calls in exception handling; otherwise a failed operation terminates the program abnormally
    #include <iostream>
    #include <elasticlient/client.h>
    #include <cpr/cpr.h>

    int main()
    {
        // 1. Construct the ES client
        elasticlient::Client client({"http://127.0.0.1:9200/"});

        // 2. Send a search request
        try
        {
            auto resp = client.search("user", "_doc",
                                      "{\"query\": { \"match_all\":{} }}");
            std::cout << resp.status_code << std::endl;
            std::cout << resp.text << std::endl;
        }
        catch (const std::exception &e)
        {
            // Thrown when no node in the cluster responds
            std::cout << e.what() << std::endl;
            return -1;
        }

        return 0;
    }
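
Since the trailing slash is easy to forget, host URLs can be normalized before they are handed to the client. This is a small sketch with hypothetical helper names that uses only the standard library; the resulting vector would then be passed to the `elasticlient::Client` constructor as in the example above.

```cpp
#include <string>
#include <vector>

// Append '/' to a non-empty URL that does not already end with one.
std::string withTrailingSlash(std::string url) {
    if (!url.empty() && url.back() != '/') {
        url.push_back('/');
    }
    return url;
}

// Normalize every host URL in the list.
std::vector<std::string> normalizeHosts(std::vector<std::string> hosts) {
    for (std::string &host : hosts) {
        host = withTrailingSlash(host);
    }
    return hosts;
}
```

For instance, `normalizeHosts({"http://127.0.0.1:9200"})` returns a list whose single entry is "http://127.0.0.1:9200/", while a URL that already ends in a slash is left untouched.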
    
