How to use Elasticsearch with popular Ruby tools

By Fernando Briano, Elastic

Learn how to use Elasticsearch with some popular Ruby libraries.

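In this blog post, we will see how to use Elasticsearch with some popular Ruby tools. We will implement the common APIs covered in the Ruby client's "Getting started" guide. That guide shows how to run these same operations with the official Elasticsearch client, elasticsearch-ruby.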

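We test the client extensively to make sure every API in Elasticsearch is supported in each release, including the version currently under development. That covers almost 500 APIs.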

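However, there are cases where you may not want to use the client and would rather implement some of the functionality yourself in your Ruby code. Your code may depend heavily on one particular library that you would like to reuse for Elasticsearch. You may be working in a setup where you only need a couple of APIs and don't want to bring in a new dependency. Or you may have limited resources and not want a full-fledged client that can do everything in Elasticsearch.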

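Whatever the reason, Elasticsearch makes this easy by exposing REST APIs that can be called directly, so you can access its functionality by making HTTP requests, no client required. When working with the APIs, it is a good idea to review the API conventions and common options.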
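As a small illustration of the common options, the sketch below builds a search URL that carries two of them, pretty and filter_path, as query parameters. It uses only Ruby's standard library and sends nothing over the network. The base URL and the index name my-index are assumptions for illustration.

```ruby
require 'uri'

# Assumed base URL (the address start-local uses by default).
base = URI('http://localhost:9200')

# `pretty` and `filter_path` are common options accepted by the REST APIs.
common_options = { pretty: true, filter_path: 'hits.hits._source' }

# 'my-index' is a hypothetical index name.
search_url = URI.join(base, '/my-index/_search')
search_url.query = URI.encode_www_form(common_options)

puts search_url
```

Any of the libraries covered below can take a URL built this way, or accept the same options through their own query parameter arguments.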

Introduction

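The libraries used in these examples are Net::HTTP, HTTParty, excon, HTTP (a.k.a. http.rb), Faraday and elastic-transport. Besides showing how to interact with Elasticsearch from Ruby, this post briefly introduces each library so we can get a feel for how it is used. It does not go in depth on any of them.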

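The code was written and tested in Ruby 3.3.5. The version of each tool is mentioned in its own section. The examples use require 'bundler/inline' for the convenience of installing the necessary gems in the same file where the code is written, but you can use a Gemfile instead.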

Setup

While working on these examples, I used start-local, a simple shell script that sets up Elasticsearch and Kibana in seconds for local development. In the directory where I wrote this code, I ran:

curl -fsSL https://elastic.co/start-local | sh

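This creates a subdirectory called elastic-start-local, which contains a .env file with the information we need to connect and authenticate to Elasticsearch. We can either run source elastic-start-local/.env before running our Ruby code, or use the dotenv gem: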

require 'dotenv'
Dotenv.load('./elastic-start-local/.env')

The following code examples assume the ENV variables in this file have been loaded.
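Since every example depends on these variables, a quick guard at the top of a script can save some debugging. This is a minimal sketch, not part of the original examples; missing_vars is a hypothetical helper name, and the variable names are the ones start-local writes to the .env file.

```ruby
# Variables the examples read; both come from the .env file created by start-local.
REQUIRED_VARS = %w[ES_LOCAL_URL ES_LOCAL_API_KEY].freeze

# Returns the names of required variables that are missing or empty in `env`.
# `env` defaults to ENV but accepts any Hash-like object, which makes it easy to test.
def missing_vars(env = ENV)
  REQUIRED_VARS.select { |name| env[name].to_s.empty? }
end

missing = missing_vars
warn "Missing environment variables: #{missing.join(', ')}" unless missing.empty?
```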

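We can authenticate to Elasticsearch with either Basic authentication or API key authentication. For Basic authentication, we use the username "elastic" and the value stored in ES_LOCAL_PASSWORD as the password. For API key authentication, we need the value stored in ES_LOCAL_API_KEY in the same .env file. Elasticsearch can be managed with Kibana, which start-local runs at http://localhost:5601, and you can also create an API key manually in Kibana.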
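The examples in this post use the API key header, but the two schemes differ only in the Authorization value. The sketch below builds both with the standard library. The helper names are hypothetical, and the fallback credentials are placeholders so the snippet runs without a .env file.

```ruby
# Placeholder fallbacks for illustration only; real values come from the .env file.
password = ENV.fetch('ES_LOCAL_PASSWORD', 'changeme')
api_key  = ENV.fetch('ES_LOCAL_API_KEY', 'example-api-key')

# HTTP Basic authentication: "Basic " followed by base64("username:password").
# pack('m0') produces base64 without line breaks.
def basic_auth_header(username, password)
  "Basic #{["#{username}:#{password}"].pack('m0')}"
end

# API key authentication: "ApiKey " followed by the encoded key.
def api_key_header(api_key)
  "ApiKey #{api_key}"
end

basic_headers   = { 'Authorization' => basic_auth_header('elastic', password), 'Content-Type' => 'application/json' }
api_key_headers = { 'Authorization' => api_key_header(api_key), 'Content-Type' => 'application/json' }
```

Either hash can be passed wherever the examples below pass headers.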

Elasticsearch runs at http://localhost:9200 by default, but the examples load the host from the ES_LOCAL_URL environment variable.

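You can also run these examples against any other Elasticsearch cluster, adjusting the host and credentials accordingly. If you are using start-local, you can stop the running Elasticsearch instance with docker compose stop and restart it with docker compose up from the elastic-start-local directory.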

Net::HTTP

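Net::HTTP provides a rich library that implements the client side of the HTTP request-response protocol in a client-server model. We can require it in our code with require 'net/http' and start using it without installing any extra dependencies. It is not the most user-friendly option, but it ships with Ruby. The version used in these examples is 0.4.1.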

require 'json'
require 'net/http'

host = URI(ENV['ES_LOCAL_URL'])
headers = {
  'Authorization' => "ApiKey #{ENV['ES_LOCAL_API_KEY']}",
  'Content-Type' => 'application/json'
}

This gives us the setup for performing requests to Elasticsearch. We can test it with an initial request to the root path of the server:

response = JSON.parse(Net::HTTP.get(host, headers))
puts response
# {"name"=>"b3349dfab89f", "cluster_name"=>"docker-cluster", ..., "tagline"=>"You Know, for Search"}

And we can inspect the response for more information:

response = Net::HTTP.get_response(host, headers)
puts "Content-Type: #{response['Content-type']}"
puts "Response status: #{response.code}"
puts "Body: #{JSON.parse(response.body)}"
# Content-Type: application/json
# Response status: 200
# Body: {"name"=>"b3349dfab89f", ...

Now we can try creating an index:

index = 'nethttp_docs'
http = Net::HTTP.new(host.hostname, host.port)

# Create an index
response = http.put("/#{index}", '', headers)
puts response.body
# {"acknowledged":true,"shards_acknowledged":true,"index":"nethttp_docs"}

With an index in place, we can now start working with documents.

# Index a document
document = { name: 'elasticsearch-ruby', description: 'Official Elasticsearch Ruby client' }.to_json
response = http.post("/#{index}/_doc", document, headers)
puts response.body
# {"_index":"nethttp_docs","_id":"...

# Save the id for following requests:
id = JSON.parse(response.body)["_id"]
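The to_json call in the snippet above is what turns the Ruby hash into the string sent as the request body, and JSON.parse does the reverse for the response. A minimal round trip, using only the standard library and the same sample document, looks like this:

```ruby
require 'json'

document = { name: 'elasticsearch-ruby', description: 'Official Elasticsearch Ruby client' }

# Serialize the hash into the JSON string used as the request body.
payload = document.to_json

# Parsing returns String keys by default...
parsed = JSON.parse(payload)

# ...or Symbol keys when symbolize_names is set.
symbolized = JSON.parse(payload, symbolize_names: true)

puts parsed['name']
puts symbolized[:name]
```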

Notice that we need to convert the document to JSON to use it in the request. With a document indexed, we can test a very simple search request:

# Search
search_body = { query: { match_all: {} } }.to_json
response = http.post("/#{index}/_search", search_body, headers)
JSON.parse(response.body)['hits']['hits']
# => [{"_index"=>"nethttp_docs", "_id"=>...

And do some more work with the indexed data:

# Get a document
response = http.get("/#{index}/_doc/#{id}", headers)
JSON.parse(response.body)
# => {"_index"=>"nethttp_docs", ..., "_source"=>{"name"=>"elasticsearch-ruby", "description"=>"Official Elasticsearch Ruby client"}}

# Update a document
document = { doc: { name: 'net-http-ruby', description: 'NetHTTP Ruby client' } }.to_json
response = http.post("/#{index}/_update/#{id}", document, headers)
# => <Net::HTTPOK 200 OK readbody=true>

# Deleting documents
response = http.delete("/#{index}/_doc/#{id}", headers)
# => <Net::HTTPOK 200 OK readbody=true>

Finally, we delete the index to clean up our cluster:

# Deleting an index
response = http.delete("/#{index}", headers)
# => <Net::HTTPOK 200 OK readbody=true>
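Net::HTTP expects request paths that start with a slash, which is easy to forget when interpolating the index name. One way to keep that in a single place is a small helper that builds request objects. This is a sketch under stated assumptions: build_request and send_request are hypothetical helper names, and the fallback API key is a placeholder. Building the request does not touch the network; send_request is defined but not called here.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Maps HTTP verbs to the Net::HTTP request classes.
REQUEST_CLASSES = {
  get: Net::HTTP::Get,
  post: Net::HTTP::Post,
  put: Net::HTTP::Put,
  delete: Net::HTTP::Delete
}.freeze

# Builds (but does not send) a request, normalizing the path and encoding the body.
def build_request(verb, path, headers, body = nil)
  path = "/#{path}" unless path.start_with?('/')
  request = REQUEST_CLASSES.fetch(verb).new(path, headers)
  request.body = body.to_json if body
  request
end

# Opens a connection to `host` (a URI), sends the request and returns the response.
def send_request(host, request)
  Net::HTTP.start(host.hostname, host.port, use_ssl: host.scheme == 'https') do |http|
    http.request(request)
  end
end

headers = {
  'Authorization' => "ApiKey #{ENV.fetch('ES_LOCAL_API_KEY', 'example-api-key')}",
  'Content-Type' => 'application/json'
}
request = build_request(:post, 'nethttp_docs/_search', headers, { query: { match_all: {} } })
puts "#{request.method} #{request.path}"
```

With a running cluster, send_request(URI(ENV['ES_LOCAL_URL']), request) would perform the search.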

HTTParty

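HTTParty is a gem "with the goal of making HTTP fun". It provides some helpful abstractions for making requests and working with the response. These examples use version 0.22.0 of the library.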

require 'bundler/inline'
require 'json'

gemfile do
  source 'https://rubygems.org'
  gem 'httparty'
end

host = URI(ENV['ES_LOCAL_URL'])
headers = {
  'Authorization' => "ApiKey #{ENV['ES_LOCAL_API_KEY']}",
  'Content-Type' => 'application/json'
}

The initial request to the server:

response = HTTParty.get(host, headers: headers)
# => {"name"=>"b3349dfab89f",
# ...

# And we can see more info:
response.headers['content-type']
# => "application/json"
response.code
# => 200
JSON.parse(response.body)
# => {"name"=>"b3349dfab89f", ...

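If the response Content-Type is application/json, HTTParty parses the response and returns Ruby objects such as a hash or array. By default, JSON parsing returns the keys as strings. We can use the response as follows: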

response
# =>
# {"name"=>"b3349dfab89f",
#  "cluster_name"=>"docker-cluster",
#  ...
#  "tagline"=>"You Know, for Search"}

JSON.parse(response.body, symbolize_names: true)
#  =>
# {:name=>"b3349dfab89f",
#  :cluster_name=>"docker-cluster",
#  ...
#  :tagline=>"You Know, for Search"}

# We can also access the response keys directly, like the Elasticsearch::API::Response object returned by the Elasticsearch Ruby client:
response['name']
# => "b3349dfab89f"

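The README shows how to use the class methods to make requests quickly, as well as the option to create a custom class. It would be more convenient to implement an Elasticsearch client class and add the different API methods we want to use. For example: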

class ESClient
  include HTTParty
  base_uri ENV['ES_LOCAL_URL']

  def initialize
    @headers = {
      'Authorization' => "ApiKey #{ENV['ES_LOCAL_API_KEY']}",
      'Content-Type' => 'application/json'
    }
  end

  def info
    self.class.get('/', headers: @headers)
  end
end

client = ESClient.new
puts client.info

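We don't want to reimplement Elasticsearch Ruby with HTTParty in this blog post, but this could be an alternative when only a few APIs are needed. Let's see how to build the rest of the requests: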

index = 'httparty-test'
# Create an index
response = HTTParty.put("#{host}/#{index}", headers: headers)
puts response
# {"acknowledged"=>true, "shards_acknowledged"=>true, "index"=>"httparty-test"}

# Index a Document
document = { name: 'elasticsearch-ruby', description: 'Official Elasticsearch Ruby client' }.to_json
response = HTTParty.post("#{host}/#{index}/_doc", body: document, headers: headers)
# => {"_index"=>"httparty-test", "_id": ... }

# Save the id for following requests:
id = response["_id"]

# Get a document:
response = HTTParty.get("#{host}/#{index}/_doc/#{id}", headers: headers)
# => {"_index"=>"httparty-test", ..., "_source"=>{"name"=>"elasticsearch-ruby", "description"=>"Official Elasticsearch Ruby client"}}

# Search
search_body = { query: { match_all: {} } }.to_json
response = HTTParty.post("#{host}/#{index}/_search", body: search_body, headers: headers)
response['hits']['hits']
# => [{"_index"=>"httparty-test", ... ,"_source"=>{"name"=>"elasticsearch-ruby", "description"=>"Official Elasticsearch Ruby client"}}]

# Update a document
document = { doc: { name: 'httparty-ruby', description: 'HTTParty Elasticsearch client' } }.to_json
response = HTTParty.post("#{host}/#{index}/_update/#{id}", body: document, headers: headers)
# => {"_index"=>"httparty-test", "_id" ... }
response.code
# => 200

# Deleting documents
response = HTTParty.delete("#{host}/#{index}/_doc/#{id}", headers: headers)
# => {"_index"=>"httparty-test", "_id" ... }
response.code
# => 200

# Deleting an index
response = HTTParty.delete("#{host}/#{index}", headers: headers)
# => {"acknowledged"=>true}

excon

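Excon was designed to be simple, fast and performant. It is particularly aimed at usage in API clients, which makes it a good fit for interacting with Elasticsearch. This code uses Excon version 0.111.0.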

require 'bundler/inline'
require 'json'
gemfile do
  source 'https://rubygems.org'
  gem 'excon'
end

host = URI(ENV['ES_LOCAL_URL'])
headers = {
  'Authorization' => "ApiKey #{ENV['ES_LOCAL_API_KEY']}",
  'Content-Type' => 'application/json'
}

response = Excon.get(host, headers: headers)
puts "Content-Type: #{response.headers['content-type']}"
puts "Response status: #{response.status}"
puts "Body: #{JSON.parse(response.body)}"
# Content-Type: application/json
# Response status: 200
# Body: {"name"=>"b3349dfab89f", "cluster_name"=>"docker-cluster", ..., "tagline"=>"You Know, for Search"}

Excon requests return an Excon::Response object, which has body, headers, remote_ip and status attributes. We can also access the data directly with the keys as symbols, similar to how Elasticsearch::API::Response works:

response[:headers]
# => {"X-elastic-product"=>"Elasticsearch", "content-type"=>"application/json", "content-length"=>"541"}

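We can reuse a connection across multiple requests to share options and improve performance. We can also use a persistent connection to establish the socket connection with the initial request and leave the socket open while we run the following examples: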

connection = Excon.new(host, persistent: true, headers: headers)
index = 'excon-test'

# Create an index
response = connection.put(path: index)
puts response.body
# {"acknowledged":true,"shards_acknowledged":true,"index":"excon-test"}

# Index a Document
document = { name: 'elasticsearch-ruby', description: 'Official Elasticsearch Ruby client' }.to_json
response = connection.post(path: "#{index}/_doc", body: document)
puts response.body
# {"_index":"excon-test","_id" ... }

# Save the id for following requests:
id = JSON.parse(response.body)["_id"]

# Get a document:
response = connection.get(path: "#{index}/_doc/#{id}")
JSON.parse(response.body)
# => {"_index"=>"excon-test", ...,"_source"=>{"name"=>"elasticsearch-ruby", "description"=>"Official Elasticsearch Ruby client"}}

# Search
search_body = { query: { match_all: {} } }.to_json
response = connection.post(path: "#{index}/_search", body: search_body)
JSON.parse(response.body)['hits']['hits']
# => [{"_index"=>"excon-test",..., "_source"=>{"name"=>"elasticsearch-ruby", "description"=>"Official Elasticsearch Ruby client"}}]

# Update a document
document = { doc: { name: 'excon-ruby', description: 'Excon Elasticsearch client' } }.to_json
response = connection.post(path: "#{index}/_update/#{id}", body: document)
# => <Excon::Response:0x0000...
response.status
# => 200

# Deleting documents
response = connection.delete(path: "#{index}/_doc/#{id}")
# => <Excon::Response:0x0000...
response.status
# => 200

# Deleting an index
response = connection.delete(path: index)
puts response.body
# {"acknowledged":true}

# Close connection
connection.reset

HTTP (http.rb)

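HTTP is an HTTP client that uses a chainable API similar to Python's Requests. It implements the HTTP protocol in Ruby and outsources the parsing to native extensions. The version used in this code is 5.2.0.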

require 'bundler/inline'
require 'json'

gemfile do
  source 'https://rubygems.org'
  gem 'http'
end

host = URI(ENV['ES_LOCAL_URL'])

headers = {
  'Authorization' => "ApiKey #{ENV['ES_LOCAL_API_KEY']}",
  'Content-Type' => 'application/json'
}

response = HTTP.get(host, headers: headers)
# => <HTTP::Response/1.1 200 OK {"X-elastic-product"=>"Elasticsearch", "content-type"=>"application/json", "content-length"=>"541"}>
puts "Content-Type: #{response.headers['content-type']}"
puts "Response status: #{response.code}"
puts "Body: #{JSON.parse(response.body)}"
# Content-Type: application/json
# Response status: 200
# Body: {"name"=>"b3349dfab89f", ..., "tagline"=>"You Know, for Search"}

We can also use the auth method to take advantage of the chainable API:

HTTP.auth(headers["Authorization"]).get(host)

Or, since we also care about the content type header, chain headers:

HTTP.headers(headers).get(host)

So once we create a persistent client, building the requests gets shorter:

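# Persistent client that sends our headers with every request
# (assumed setup: `http` is used below but not defined in the snippet)
http = HTTP.persistent(host.to_s).headers(headers)

# Create an index
index = 'http-test'
response = http.put("#{host}/#{index}")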
response.parse
# => {"acknowledged"=>true, "shards_acknowledged"=>true, "index"=>"http-test"}

# Index a Document
document = { name: 'elasticsearch-ruby', description: 'Official Elasticsearch Ruby client' }.to_json
response = http.post("#{host}/#{index}/_doc", body: document)
# => <HTTP::Response/1.1 201 Created {"Location"=>"/http-test/_doc/GCA1KZIBr7n-DzjRAVLZ", "X-elastic-product"=>"Elasticsearch", "content-type"=>"application/json", "content-length"=>"161"}>
response.parse
# => {"_index"=>"http-test", "_id" ...}

# Save the id for following requests:
id = response.parse['_id']

# Get a document:
response = http.get("#{host}/#{index}/_doc/#{id}")
# => <HTTP::Response/1.1 200 OK {"X-elastic-product"=>"Elasticsearch", "content-type"=>"application/json", "content-length"=>"198"}>
response.parse
# => {"_index"=>"http-test", "_id", ..., "_source"=>{"name"=>"elasticsearch-ruby", "description"=>"Official Elasticsearch Ruby client"}}

# Search
search_body = { query: { match_all: {} } }.to_json
response = http.post("#{host}/#{index}/_search", body: search_body)
# => <HTTP::Response/1.1 200 OK ...
response.parse['hits']['hits']
# => [{"_index"=>"http-test", "_id",..., "_source"=>{"name"=>"elasticsearch-ruby", "description"=>"Official Elasticsearch Ruby client"}}]

# Update a document
document = { doc: { name: 'http-ruby', description: 'HTTP Elasticsearch client' } }.to_json
response = http.post("#{host}/#{index}/_update/#{id}", body: document)
# => <HTTP::Response/1.1 200 OK ...
response.code
# => 200
response.flush

# Deleting documents
response = http.delete("#{host}/#{index}/_doc/#{id}")
# => <HTTP::Response/1.1 200 OK ...
response.code
# => 200
response.flush

# Deleting an index
response = http.delete("#{host}/#{index}")
# => <HTTP::Response/1.1 200 OK ...
response.parse
# => {"acknowledged"=>true}

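The documentation warns us that the response must be consumed before sending the next request over a persistent connection. That means calling to_s, parse or flush on the response object.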

Faraday

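Faraday is the HTTP client library used by default by the Elasticsearch client. It provides a common interface over several adapters (Net::HTTP, Typhoeus, Patron, Excon and more), which you can select when instantiating a client. The Faraday version used in this code is 2.12.0.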

The signature for get is (url, params = nil, headers = nil), so we pass nil for the params in this initial test request:

require 'bundler/inline'
require 'json'
gemfile do
  source 'https://rubygems.org'
  gem 'faraday'
end

response = Faraday.get(host, nil, headers)
# => <Faraday::Response:0x0000...

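The response is a Faraday::Response object with the response status, headers and body, and we can also access many properties through the Faraday Env object. As we've seen with other libraries, the recommended way to use Faraday for our use case is to create a Faraday::Connection object: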

conn = Faraday.new(
  url: host,
  headers: headers
)

response = conn.get('/')
puts "Content-Type: #{response.headers['content-type']}"
puts "Response status: #{response.status}"
puts "Body: #{JSON.parse(response.body)}"
# Content-Type: application/json
# Response status: 200
# Body: {"name"=>"b3349dfab89f", ..., "tagline"=>"You Know, for Search"}

Now reusing that connection, we can see what the rest of the requests look like with Faraday:

index = 'faraday-test'
# Create an index
response = conn.put(index)
puts response.body
# {"acknowledged":true,"shards_acknowledged":true,"index":"faraday-test"}

# Index a Document
document = { name: 'elasticsearch-ruby', description: 'Official Elasticsearch Ruby client' }.to_json
# The signature for post is (url, body = nil, headers = nil), unlike the get signature:
response = conn.post("#{index}/_doc", document)
puts response.body
# {"_index":"faraday-test","_id" ... }

# Save the id for following requests:
id = JSON.parse(response.body)["_id"]

# Get a document:
response = conn.get("#{index}/_doc/#{id}")
JSON.parse(response.body)
# => {"_index"=>"faraday-test", ...,"_source"=>{"name"=>"elasticsearch-ruby", "description"=>"Official Elasticsearch Ruby client"}}

# Search
search_body = { query: { match_all: {} } }.to_json
response = conn.post("#{index}/_search", search_body)
JSON.parse(response.body)['hits']['hits']
# => [{"_index"=>"faraday-test", ..., "_source"=>{"name"=>"elasticsearch-ruby", "description"=>"Official Elasticsearch Ruby client"}}]

# Update a document
document = { doc: { name: 'faraday-ruby', description: 'Faraday client' } }.to_json
response = conn.post("#{index}/_update/#{id}", document)
# => <Faraday::Response:0x0000...
response.status
# => 200

# Deleting documents
response = conn.delete("#{index}/_doc/#{id}")
# => <Faraday::Response:0x0000...
response.status
# => 200

# Deleting an index
response = conn.delete(index)
puts response.body
# {"acknowledged":true}
 

Elastic Transport

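The elastic-transport library is the Ruby gem that handles performing HTTP requests, encoding, compression and so on in the official Elastic Ruby clients. It has been battle tested for years against every official version of Elasticsearch. It used to be known as elasticsearch-transport, since it was the base of the official Elasticsearch client. However, with version 8.0.0 of the client, we migrated the transport library to elastic-transport, because it also supported the official Enterprise Search client and, more recently, the Elasticsearch Serverless client.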

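It uses a Faraday implementation by default, which supports several different adapters as we saw earlier. You can also use the Manticore and Curb (the Ruby binding for libcurl) implementations included with the library. You can even write your own implementation, or write one with some of the libraries we've covered here. But that would be the subject of a different blog post!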

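Elastic Transport can also be used as an HTTP library to interact with Elasticsearch. It handles everything you need and has many settings and configurations related to use with Elastic. The version used here is the latest, 8.3.5. A simple example: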

require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'elastic-transport'
end

host = URI(ENV['ES_LOCAL_URL'])
headers = { 'Authorization' => "ApiKey #{ENV['ES_LOCAL_API_KEY']}" }

# Instantiate a new Transport client
transport = Elastic::Transport::Client.new(hosts: host)

# Make a request to the root path ('info') to make sure we can connect:
response = transport.perform_request('GET', '/', {}, nil, headers)

# Create an index
index = 'elastic-transport_docs'
response = transport.perform_request('PUT', "/#{index}", {}, nil, headers)
response.body
# => {"acknowledged"=>true, "shards_acknowledged"=>true, "index"=>"elastic-transport_docs"}

# Index a document
document = { name: 'elastic-transport', description: 'Official Elastic Ruby HTTP transport layer' }.to_json
response = transport.perform_request('POST', "/#{index}/_doc", {}, document, headers)
response.body
# => {"_index"=>"elastic-transport_docs", "_id"=> ... }

# Get the document we just indexed
id = response.body['_id']
response = transport.perform_request('GET', "/#{index}/_doc/#{id}", {}, nil, headers)
response.body['_source']
# => {"name"=>"elastic-transport", "description"=>"Official Elastic Ruby HTTP transport layer"}

# Search for a document
search_body = { query: { match_all: {} } }
response = transport.perform_request('POST', "/#{index}/_search", {}, search_body, headers)
response.body.dig('hits', 'hits').first
# => {"_index"=>"elastic-transport_docs", ..., "_source"=>{"name"=>"elastic-transport", "description"=>"Official Elastic Ruby HTTP transport layer"}}

# Update the document
body = { doc: { name: 'elastic-transport', description: 'Official Elastic Ruby HTTP transport layer.' } }.to_json
response = transport.perform_request('POST', "/#{index}/_update/#{id}", {}, body, headers)
response.body
# => {"_index"=>"elastic-transport_docs", ... }

# Delete a document
response = transport.perform_request('DELETE', "/#{index}/_doc/#{id}", {}, nil, headers)
response.body
# => {"_index"=>"elastic-transport_docs", ... }

Conclusion

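As you can see, the Elasticsearch Ruby client does a lot of work to make it easy to interact with Elasticsearch from your Ruby code. In this blog post we didn't even get into handling more complex requests or handling errors. But Elasticsearch's REST API makes it possible to use it with any library that supports HTTP requests, in Ruby or any other language. The Elasticsearch REST APIs guide is a great reference for learning about the available APIs and how to use them.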
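To give a rough idea of what handling errors yourself might involve, here is a minimal, library-agnostic sketch. It takes the status code and raw body that any of the libraries above expose and raises on non-2xx responses. ESRequestError and parse_or_raise are hypothetical names, and the sample error body is illustrative, written in the shape Elasticsearch typically uses for error responses.

```ruby
require 'json'

# Hypothetical error class for illustration; not part of any library covered here.
class ESRequestError < StandardError
  attr_reader :status

  def initialize(status, message)
    @status = status
    super("[#{status}] #{message}")
  end
end

# Returns the parsed body for 2xx responses and raises ESRequestError otherwise.
# `status` may be an Integer or a String (Net::HTTP returns the code as a String).
def parse_or_raise(status, body)
  parsed = body.to_s.empty? ? {} : JSON.parse(body)
  return parsed if (200..299).cover?(status.to_i)

  error = parsed['error']
  reason = error.is_a?(Hash) ? error['reason'] : error
  raise ESRequestError.new(status.to_i, reason || 'request failed')
end

# Illustrative error body; the values are made up for this example.
sample = '{"error":{"type":"index_not_found_exception","reason":"no such index [missing]"},"status":404}'

begin
  parse_or_raise(404, sample)
rescue ESRequestError => e
  puts e.message
end
```

A real implementation would also need to deal with non-JSON bodies, timeouts and connection failures, which the official client already handles.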

Original: How to use Elasticsearch with popular Ruby tools - Search Labs