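To illustrate the different query types in Elasticsearch, we will search a collection of book documents with the following fields: title, authors, summary, publish date, and number of reviews. First, though, let's create a new index and index some documents using the bulk API: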
PUT /bookdb_index
{ "settings": { "number_of_shards": 1 }}
POST /bookdb_index/book/_bulk
{ "index": { "_id": 1 }}
{ "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"], "summary" : "A distibuted real-time search and analytics engine", "publish_date" : "2015-02-07", "num_reviews": 20, "publisher": "oreilly" }
{ "index": { "_id": 2 }}
{ "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": ["grant ingersoll", "thomas morton", "drew farris"], "summary" : "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "publish_date" : "2013-01-24", "num_reviews": 12, "publisher": "manning" }
{ "index": { "_id": 3 }}
{ "title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"], "summary" : "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date" : "2015-12-03", "num_reviews": 18, "publisher": "manning" }
{ "index": { "_id": 4 }}
{ "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"], "summary" : "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date" : "2014-04-05", "num_reviews": 23, "publisher": "manning" }
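The bulk body above is newline-delimited JSON: each document is preceded by an action line. As a minimal sketch of how such a body could be assembled from code, the helper below (the name build_bulk_body is hypothetical, and the documents are trimmed for brevity) only builds the string and does not talk to a cluster:

```python
import json

def build_bulk_body(docs):
    """Return an NDJSON bulk body: one action line, then one source line, per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

# Trimmed versions of two of the documents indexed above.
books = [
    (1, {"title": "Elasticsearch: The Definitive Guide", "num_reviews": 20}),
    (4, {"title": "Solr in Action", "num_reviews": 23}),
]
body = build_bulk_body(books)
print(body)
```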
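Examples
Basic Match Query
There are two ways of executing a basic full-text (match) query: using the Search Lite API, which expects all search parameters to be passed in as part of the URL, or using the full JSON request body, which lets you use the full Elasticsearch DSL.
Here is a basic match query that searches for the string "guide" in all the fields: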
GET /bookdb_index/book/_search?q=guide
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 1.3278645,
"_source": {
"title": "Solr in Action",
"authors": [
"trey grainger",
"timothy potter"
],
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"publish_date": "2014-04-05",
"num_reviews": 23,
"publisher": "manning"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 1.2871116,
"_source": {
"title": "Elasticsearch: The Definitive Guide",
"authors": [
"clinton gormley",
"zachary tong"
],
"summary": "A distibuted real-time search and analytics engine",
"publish_date": "2015-02-07",
"num_reviews": 20,
"publisher": "oreilly"
}
}
]
The full-body version of this query is shown below and produces the same results as the Search Lite version above:
{
"query": {
"multi_match" : {
"query" : "guide",
"fields" : ["title", "authors", "summary", "publish_date", "num_reviews", "publisher"]
}
}
}
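The multi_match keyword is used in place of the match keyword as a convenient shorthand for running the same query against multiple fields. The fields property specifies which fields to query; in this case, we want to query all the fields in the document.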
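Note: Prior to Elasticsearch 6, you could use the "_all" field to find a match across all fields instead of having to specify each field. The "_all" field works by concatenating all the fields into one big field, using a space as a delimiter, and then analyzing and indexing that field. In ES6, this functionality has been deprecated and is disabled by default. ES6 provides the "copy_to" parameter if you are interested in creating a custom "_all" field. See the Elasticsearch guide for details.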
The Search Lite API also lets you specify which fields to search. For example, to search for books with the words "in Action" in the title field:
GET /bookdb_index/book/_search?q=title:in action
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 1.6323128,
"_source": {
"title": "Elasticsearch in Action",
"authors": [
"radu gheorge",
"matthew lee hinman",
"roy russo"
],
"summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"publish_date": "2015-12-03",
"num_reviews": 18,
"publisher": "manning"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 1.6323128,
"_source": {
"title": "Solr in Action",
"authors": [
"trey grainger",
"timothy potter"
],
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"publish_date": "2014-04-05",
"num_reviews": 23,
"publisher": "manning"
}
}
]
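When a Search Lite request is issued from code rather than from a console, the query string usually has to be percent-encoded, since it may contain spaces and colons. A minimal sketch using only the Python standard library (the helper name search_lite_url is hypothetical):

```python
from urllib.parse import urlencode

def search_lite_url(base, query):
    """Build a Search Lite URL with the q parameter percent-encoded."""
    return f"{base}/_search?{urlencode({'q': query})}"

url = search_lite_url("/bookdb_index/book", "title:in action")
print(url)  # /bookdb_index/book/_search?q=title%3Ain+action
```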
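However, the full-body DSL gives you more flexibility in creating more complicated queries (as we will see later) and in specifying how you want the results returned. In the example below, we specify the number of results to return, the offset to start from (useful for pagination), the document fields to return, and term highlighting. Note that we use a "match" query instead of a "multi_match" query because we only care about searching in the title field.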
POST /bookdb_index/book/_search
{
"query": {
"match" : {
"title" : "in action"
}
},
"size": 2,
"from": 0,
"_source": [ "title", "summary", "publish_date" ],
"highlight": {
"fields" : {
"title" : {}
}
}
}
[Results]
"hits": {
"total": 2,
"max_score": 1.6323128,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 1.6323128,
"_source": {
"summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
},
"highlight": {
"title": [
"Elasticsearch <em>in</em> <em>Action</em>"
]
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 1.6323128,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"title": "Solr in Action",
"publish_date": "2014-04-05"
},
"highlight": {
"title": [
"Solr <em>in</em> <em>Action</em>"
]
}
}
]
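Note: For multi-word queries, the match query lets you specify whether to use the and operator instead of the default or operator. You can also specify the minimum_should_match option to tweak the relevance of the returned results. Details can be found in the Elasticsearch guide.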
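To make the note concrete, here is a small sketch of a request body that uses the expanded form of the match query, in which the field maps to an object carrying the query text and its options. The helper name match_query is hypothetical; operator and minimum_should_match are the option names mentioned in the note:

```python
import json

def match_query(field, text, operator="or", minimum_should_match=None):
    """Build a match query body using the expanded (object) form of the field."""
    options = {"query": text, "operator": operator}
    if minimum_should_match is not None:
        options["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {field: options}}}

# Require both "in" and "action" to appear in the title.
body = match_query("title", "in action", operator="and")
print(json.dumps(body, indent=2))
```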
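Boosting
Since we are searching across multiple fields, we may want to boost the scores of a certain field. In the contrived example below, we boost scores from the summary field by a factor of 3 in order to increase the importance of the summary field, which in turn increases the relevance of the document with _id 4.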
POST /bookdb_index/book/_search
{
"query": {
"multi_match" : {
"query" : "elasticsearch guide",
"fields": ["title", "summary^3"]
}
},
"_source": ["title", "summary", "publish_date"]
}
[Results]
"hits": {
"total": 3,
"max_score": 3.9835935,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 3.9835935,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 3.1001682,
"_source": {
"summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 2.0281231,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
}
]
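Note: Boosting does not merely mean that the calculated score gets multiplied by the boost factor. The actual boost value that is applied goes through normalization and some internal optimization. More information on how boosting works can be found in the Elasticsearch guide.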
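Bool Query
The AND/OR/NOT operators can be used to fine-tune our search queries in order to provide more relevant or specific results. This is implemented in the search API as a bool query. The bool query accepts a must parameter (equivalent to AND), a must_not parameter (equivalent to NOT), and a should parameter (equivalent to OR). For example, to search for a book with the word "Elasticsearch" or "Solr" in the title, authored by "clinton gormley" but not authored by "radu gheorge":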
POST /bookdb_index/book/_search
{
"query": {
"bool": {
"must": {
"bool" : {
"should": [
{ "match": { "title": "Elasticsearch" }},
{ "match": { "title": "Solr" }}
],
"must": { "match": { "authors": "clinton gormely" }}
}
},
"must_not": { "match": {"authors": "radu gheorge" }}
}
}
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 2.0749094,
"_source": {
"title": "Elasticsearch: The Definitive Guide",
"authors": [
"clinton gormley",
"zachary tong"
],
"summary": "A distibuted real-time search and analytics engine",
"publish_date": "2015-02-07",
"num_reviews": 20,
"publisher": "oreilly"
}
}
]
Note: As you can see, a bool query can wrap any other query type, including other bool queries, to create arbitrarily complex or deeply nested queries.
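The way must, should, and must_not combine can be sketched with a small in-memory evaluator. This is only an illustration of the boolean logic, not of scoring: the functions bool_matches and has_word are hypothetical, filter clauses are left out, and the default for minimum_should_match is an assumption based on the documented behavior that should clauses become optional once a must clause is present.

```python
import re

def has_word(field, word):
    """Clause: true if `word` is one of the lowercased words in the field."""
    def clause(doc):
        value = doc[field]
        values = value if isinstance(value, list) else [value]
        return any(word in re.findall(r"\w+", v.lower()) for v in values)
    return clause

def bool_matches(doc, must=(), should=(), must_not=(), minimum_should_match=None):
    if minimum_should_match is None:
        # Assumed default: should clauses are optional when must is present.
        minimum_should_match = 0 if must else (1 if should else 0)
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match

books = [
    {"_id": 1, "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"]},
    {"_id": 3, "title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"]},
    {"_id": 4, "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"]},
]

# (title has "elasticsearch" OR "solr") AND author "clinton" AND NOT author "radu"
title_clause = lambda d: bool_matches(d, should=[has_word("title", "elasticsearch"), has_word("title", "solr")])
hits = [b["_id"] for b in books
        if bool_matches(b, must=[title_clause, has_word("authors", "clinton")],
                        must_not=[has_word("authors", "radu")])]
print(hits)  # [1]
```

Wrapping the two title clauses in their own bool (with no must clause) is what forces at least one of them to match.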
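Fuzzy Queries
Fuzzy matching can be enabled on match and multi_match queries to catch spelling errors. The degree of fuzziness is specified as the Levenshtein distance from the original word, i.e. the number of single-character changes needed to make one string the same as another.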
POST /bookdb_index/book/_search
{
"query": {
"multi_match" : {
"query" : "comprihensiv guide",
"fields": ["title", "summary"],
"fuzziness": "AUTO"
}
},
"_source": ["title", "summary", "publish_date"],
"size": 1
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 2.4344182,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
}
]
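Note: Instead of specifying "AUTO", you can specify the number 0, 1, or 2 to indicate the maximum number of edits that may be made to the string to find a match. The benefit of "AUTO" is that it takes the length of the string into account. For strings that are only 3 characters long, allowing a fuzziness of 2 will result in poor search performance. It is therefore recommended to stick to "AUTO" in most cases.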
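The edit distance described above can be computed with the classic dynamic-programming algorithm. The sketch below uses plain Levenshtein distance (Elasticsearch by default also counts a transposition of two adjacent characters as a single edit, which this sketch does not model), and auto_fuzziness reflects the default "AUTO" thresholds as I understand them from the documentation: no edits for terms shorter than 3 characters, one edit for 3 to 5 characters, two edits beyond that.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def auto_fuzziness(term):
    """Edits allowed by "AUTO", assuming its default length thresholds of 3 and 6."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2

# The misspelled query term is two edits away from the indexed term,
# which is within what "AUTO" allows for a 12-character term.
print(levenshtein("comprihensiv", "comprehensive"))  # 2
print(auto_fuzziness("comprihensiv"))                # 2
```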
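Wildcard Query
Wildcard queries let you specify a pattern to match instead of the entire term. ? matches any single character and * matches zero or more characters. For example, to find all records that have an author whose name begins with the letter "t":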
POST /bookdb_index/book/_search
{
"query": {
"wildcard" : {
"authors" : "t*"
}
},
"_source": ["title", "authors"],
"highlight": {
"fields" : {
"authors" : {}
}
}
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 1,
"_source": {
"title": "Elasticsearch: The Definitive Guide",
"authors": [
"clinton gormley",
"zachary tong"
]
},
"highlight": {
"authors": [
"zachary <em>tong</em>"
]
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": 1,
"_source": {
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"authors": [
"grant ingersoll",
"thomas morton",
"drew farris"
]
},
"highlight": {
"authors": [
"<em>thomas</em> morton"
]
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 1,
"_source": {
"title": "Solr in Action",
"authors": [
"trey grainger",
"timothy potter"
]
},
"highlight": {
"authors": [
"<em>trey</em> grainger",
"<em>timothy</em> potter"
]
}
}
]
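The highlights above show that the pattern is applied to individual indexed terms (for example "tong" in "zachary tong"), not to the full author name. A rough sketch of that behavior, assuming the field is analyzed into lowercase words and borrowing Python's fnmatch for the ? and * semantics (fnmatch also gives [ ] a special meaning, which is not part of the wildcard query syntax described here):

```python
import re
from fnmatch import fnmatchcase

def tokens(values):
    """Split field values into lowercase words, roughly as a standard analyzer would."""
    return [t for v in values for t in re.findall(r"\w+", v.lower())]

authors = ["trey grainger", "timothy potter"]
matching = [t for t in tokens(authors) if fnmatchcase(t, "t*")]
print(matching)  # ['trey', 'timothy']
```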
Regexp Query
Regexp queries let you specify more complex patterns than wildcard queries.
POST /bookdb_index/book/_search
{
"query": {
"regexp" : {
"authors" : "t[a-z]*y"
}
},
"_source": ["title", "authors"],
"highlight": {
"fields" : {
"authors" : {}
}
}
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 1,
"_source": {
"title": "Solr in Action",
"authors": [
"trey grainger",
"timothy potter"
]
},
"highlight": {
"authors": [
"<em>trey</em> grainger",
"<em>timothy</em> potter"
]
}
}
]
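As with wildcards, the pattern has to match an entire indexed term. The sketch below imitates that with Python's re.fullmatch over a hand-written list of author terms; Python's regular expression syntax is not identical to the one Elasticsearch uses, but a simple pattern like the one above behaves the same way in both.

```python
import re

def regexp_terms(terms, pattern):
    """Return the terms that the pattern matches in full (anchored at both ends)."""
    compiled = re.compile(pattern)
    return [t for t in terms if compiled.fullmatch(t)]

author_terms = ["trey", "grainger", "timothy", "potter", "tong", "thomas"]
print(regexp_terms(author_terms, "t[a-z]*y"))  # ['trey', 'timothy']
```

This is why "tong" and "thomas" match the earlier wildcard t* but not this pattern: neither ends in "y".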
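Match Phrase Query
The match phrase query requires that all the terms in the query string be present in the document, in the order specified in the query string, and close to each other. By default the terms are required to be exactly next to each other, but you can specify a slop value, which indicates how far apart the terms are allowed to be while still considering the document a match.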
POST /bookdb_index/book/_search
{
"query": {
"multi_match" : {
"query": "search engine",
"fields": ["title", "summary"],
"type": "phrase",
"slop": 3
}
},
"_source": [ "title", "summary", "publish_date" ]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.22327082,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.16113183,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
}
]
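Note: In the example above, for a non-phrase type query, document _id 1 would normally have a higher score and appear ahead of document _id 4 because its field length is shorter. As a phrase query, however, the proximity of the terms is factored in, so document _id 4 scores better.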
Note: Also note that if the slop parameter were reduced to 1, document _id 1 would no longer appear in the result set.
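To see why a slop of 1 would exclude document _id 1, it helps to count the words that sit between the two query terms. The sketch below is a simplification of real sloppy phrase matching: it handles only two terms that appear in query order, and the function name required_slop is hypothetical.

```python
import re

def required_slop(text, first, second):
    """Smallest slop at which `first second` matches in order, or None if it never does."""
    words = re.findall(r"\w+", text.lower())
    gaps = [j - i - 1
            for i, w in enumerate(words) if w == first
            for j, v in enumerate(words) if v == second and j > i]
    return min(gaps) if gaps else None

doc1 = "A distibuted real-time search and analytics engine"
doc4 = "Comprehensive guide to implementing a scalable search engine using Apache Solr"
print(required_slop(doc1, "search", "engine"))  # 2 ("and analytics" sits in between)
print(required_slop(doc4, "search", "engine"))  # 0 (the terms are adjacent)
```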
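Match Phrase Prefix
Match phrase prefix queries provide search-as-you-type, or a poor man's version of autocomplete, at query time without needing to prepare your data in any way. Like the match_phrase query, it accepts a slop parameter to make the word order and relative positions somewhat less rigid. It also accepts the max_expansions parameter to limit the number of terms matched in order to reduce resource intensity.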
POST /bookdb_index/book/_search
{
"query": {
"match_phrase_prefix" : {
"summary": {
"query": "search en",
"slop": 3,
"max_expansions": 10
}
}
},
"_source": [ "title", "summary", "publish_date" ]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.5161346,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.37248808,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
}
]
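Note: Using the match phrase prefix query for search-as-you-type at query time comes with a performance cost. A better solution is index-time search-as-you-type. Check out the Completion Suggester API or the use of Edge-Ngram filters for more information.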
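The effect of max_expansions can be pictured as follows: the last word of the query is treated as a prefix and expanded into at most max_expansions indexed terms. The sketch below works over a small sorted term list; the function expand_prefix is hypothetical, and "engineering" and "english" are invented terms added only to show the cut-off.

```python
from bisect import bisect_left

def expand_prefix(sorted_terms, prefix, max_expansions):
    """Return up to max_expansions terms that start with prefix, in sorted order."""
    start = bisect_left(sorted_terms, prefix)
    expansions = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(expansions) == max_expansions:
            break
        expansions.append(term)
    return expansions

# "engineering" and "english" are hypothetical; the rest appear in the sample summaries.
terms = sorted(["elasticsearch", "engine", "engineering", "english",
                "extraction", "scalable", "search"])
print(expand_prefix(terms, "en", 2))   # ['engine', 'engineering']
print(expand_prefix(terms, "en", 10))  # ['engine', 'engineering', 'english']
```

A low limit keeps the query cheap but can leave out terms the user intended, which is part of the trade-off mentioned in the note.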
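Query String
The query_string query provides a means of executing multi_match queries, bool queries, boosting, fuzzy matching, wildcards, regexp, and range queries in a concise shorthand syntax. In the following example, we execute a fuzzy search for the terms "search algorithm" in which one of the book authors is "grant ingersoll" or "tom morton". We search all fields but apply a boost of 2 to the summary field.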
POST /bookdb_index/book/_search
{
"query": {
"query_string" : {
"query": "(saerch~1 algorithm~1) AND (grant ingersoll) OR (tom morton)",
"fields": ["title", "authors" , "summary^2"]
}
},
"_source": [ "title", "summary", "authors" ],
"highlight": {
"fields" : {
"summary" : {}
}
}
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": 3.571021,
"_source": {
"summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"authors": [
"grant ingersoll",
"thomas morton",
"drew farris"
]
},
"highlight": {
"summary": [
"organize text using approaches such as full-text <em>search</em>, proper name recognition, clustering, tagging"
]
}
}
]
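Simple Query String
The simple_query_string query is a version of the query_string query that is more suitable for use in a single search box exposed to users, because it replaces the use of AND/OR/NOT with +/|/- respectively, and it discards invalid parts of a query instead of throwing an exception if the user makes a mistake.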
POST /bookdb_index/book/_search
{
"query": {
"simple_query_string" : {
"query": "(saerch~1 algorithm~1) + (grant ingersoll) | (tom morton)",
"fields": ["title", "authors" , "summary^2"]
}
},
"_source": [ "title", "summary", "authors" ],
"highlight": {
"fields" : {
"summary" : {}
}
}
}
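Term/Terms Query
The above examples have all been examples of full-text search. Sometimes we are more interested in a structured search in which we want to find an exact match and return the results. The term and terms queries help us here. In the example below, we search for all books in the index published by Manning Publications.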
POST /bookdb_index/book/_search
{
"query": {
"term" : {
"publisher": "manning"
}
},
"_source" : ["title","publish_date","publisher"]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": 1.2231436,
"_source": {
"publisher": "manning",
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"publish_date": "2013-01-24"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 1.2231436,
"_source": {
"publisher": "manning",
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 1.2231436,
"_source": {
"publisher": "manning",
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
}
]
Multiple terms can be specified by using the terms keyword instead and passing in an array of search terms.
{
"query": {
"terms" : {
"publisher": ["oreilly", "packt"]
}
}
}
Term Query - Sorted
Term query results (like any other query results) can easily be sorted. Multi-level sorting is also allowed.
POST /bookdb_index/book/_search
{
"query": {
"term" : {
"publisher": "manning"
}
},
"_source" : ["title","publish_date","publisher"],
"sort": [
{ "publish_date": {"order":"desc"}}
]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": null,
"_source": {
"publisher": "manning",
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
},
"sort": [
1449100800000
]
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": null,
"_source": {
"publisher": "manning",
"title": "Solr in Action",
"publish_date": "2014-04-05"
},
"sort": [
1396656000000
]
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": null,
"_source": {
"publisher": "manning",
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"publish_date": "2013-01-24"
},
"sort": [
1358985600000
]
}
]
Note: In ES6, to sort or aggregate by a text field such as title, you need to enable fielddata on that field. More details on this can be found in the Elasticsearch guide.
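The numbers in the "sort" arrays of the results above are the publish dates expressed as milliseconds since the Unix epoch (UTC). A short sketch that reproduces them with the Python standard library (the helper name to_epoch_millis is hypothetical):

```python
from datetime import datetime, timezone

def to_epoch_millis(date_string):
    """Convert a yyyy-MM-dd date (midnight UTC) to milliseconds since the epoch."""
    parsed = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)

for published in ("2015-12-03", "2014-04-05", "2013-01-24"):
    print(published, to_epoch_millis(published))
```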
Range Query
Another structured query example is the range query. In this example, we search for books published in 2015.
POST /bookdb_index/book/_search
{
"query": {
"range" : {
"publish_date": {
"gte": "2015-01-01",
"lte": "2015-12-31"
}
}
},
"_source" : ["title","publish_date","publisher"]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 1,
"_source": {
"publisher": "oreilly",
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 1,
"_source": {
"publisher": "manning",
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
}
}
]
Note: Range queries work on date, number, and string type fields.
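Filtered Bool Query
When using a bool query, you can use a filter clause to filter down the results of a query. In our example, we query for books with the term "Elasticsearch" in the title or summary, but we want to filter our results to only those with 20 or more reviews.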
POST /bookdb_index/book/_search
{
"query": {
"bool": {
"must" : {
"multi_match": {
"query": "elasticsearch",
"fields": ["title","summary"]
}
},
"filter": {
"range" : {
"num_reviews": {
"gte": 20
}
}
}
}
},
"_source" : ["title","summary","publisher", "num_reviews"]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.5955761,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"publisher": "oreilly",
"num_reviews": 20,
"title": "Elasticsearch: The Definitive Guide"
}
}
]
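Multiple filters can be combined through the use of the bool filter. In the next example, the filter determines that the returned results must have at least 20 reviews, must not be published before 2015, and should be published by O'Reilly.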
POST /bookdb_index/book/_search
{
"query": {
"bool": {
"must" : {
"multi_match": {
"query": "elasticsearch",
"fields": ["title","summary"]
}
},
"filter": {
"bool": {
"must": {
"range" : { "num_reviews": { "gte": 20 } }
},
"must_not": {
"range" : { "publish_date": { "lte": "2014-12-31" } }
},
"should": {
"term": { "publisher": "oreilly" }
}
}
}
}
},
"_source" : ["title","summary","publisher", "num_reviews", "publish_date"]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.5955761,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"publisher": "oreilly",
"num_reviews": 20,
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
}
]
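Function Score: Field Value Factor
There may be cases where you want to factor the value of a specific field in a document into the calculation of the relevance score. This is typical in scenarios where you want to boost the relevance of a document based on its popularity. In our example, we would like the more popular books (as judged by the number of reviews) to be boosted. This is possible using the field_value_factor function score.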
POST /bookdb_index/book/_search
{
"query": {
"function_score": {
"query": {
"multi_match" : {
"query" : "search engine",
"fields": ["title", "summary"]
}
},
"field_value_factor": {
"field" : "num_reviews",
"modifier": "log1p",
"factor" : 2
}
}
},
"_source": ["title", "summary", "publish_date", "num_reviews"]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.44831306,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"num_reviews": 20,
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.3718407,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"num_reviews": 23,
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 0.046479136,
"_source": {
"summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"num_reviews": 18,
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": 0.041432835,
"_source": {
"summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
"num_reviews": 12,
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"publish_date": "2013-01-24"
}
}
]
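Note 1: We could have just run a regular multi_match query and sorted by the num_reviews field, but then we would lose the benefits of relevance scoring.
Note 2: There are a number of additional parameters that tweak the extent of the boosting effect on the original relevance score, such as "modifier", "factor", "boost_mode", etc. These are explored in detail in the Elasticsearch guide.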
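As a rough sketch of what the function contributes, the code below applies the factor first and then the modifier, which is how I understand the documented behavior (log1p being the base-10 logarithm of the value plus one). Only three modifiers are modeled, and the result is the function value alone: Elasticsearch then combines it with the query score, so these numbers are not expected to equal the scores listed above.

```python
import math

def field_value_factor(value, factor=1.0, modifier="none"):
    """Function value for a field: modifier(factor * value). Sketch, not the ES implementation."""
    modifiers = {
        "none": lambda x: x,
        "log1p": lambda x: math.log10(x + 1),
        "sqrt": math.sqrt,
    }
    return modifiers[modifier](factor * value)

# Same settings as the request above: factor 2 with the log1p modifier.
for num_reviews in (12, 18, 20, 23):
    print(num_reviews, round(field_value_factor(num_reviews, factor=2, modifier="log1p"), 4))
```

The logarithm flattens the effect of the review count, so a book with 23 reviews gets only a slightly larger multiplier than one with 12.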
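Function Score: Decay Functions
Suppose that instead of wanting to boost incrementally by the value of a field, you have an ideal value you want to target and you want the boost factor to decay the further away you move from that value. This is typically useful in boosts based on lat/long, numeric fields like price, or dates. In our contrived example, we are searching for books on "search engines" ideally published around June 2014.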
POST /bookdb_index/book/_search
{
"query": {
"function_score": {
"query": {
"multi_match" : {
"query" : "search engine",
"fields": ["title", "summary"]
}
},
"functions": [
{
"exp": {
"publish_date" : {
"origin": "2014-06-15",
"offset": "7d",
"scale" : "30d"
}
}
}
],
"boost_mode" : "replace"
}
},
"_source": ["title", "summary", "publish_date", "num_reviews"]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.27420625,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"num_reviews": 23,
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.005920768,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"num_reviews": 20,
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": 0.000011564,
"_source": {
"summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
"num_reviews": 12,
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"publish_date": "2013-01-24"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 0.0000059171475,
"_source": {
"summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"num_reviews": 18,
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
}
}
]
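The shape of the exponential decay can be sketched directly from its parameters. The formula below is the one I understand the documentation to describe: values within offset of the origin score 1.0, and the score falls to the decay value (assumed default 0.5) at a distance of offset plus scale. It is a sketch of the curve, and it is not guaranteed to reproduce the exact scores listed above.

```python
import math
from datetime import date

def exp_decay(value, origin, offset_days, scale_days, decay=0.5):
    """Exponential decay score for a date, given origin, offset and scale in days."""
    distance = max(0, abs((value - origin).days) - offset_days)
    lam = math.log(decay) / scale_days
    return math.exp(lam * distance)

origin = date(2014, 6, 15)
print(exp_decay(date(2014, 6, 20), origin, 7, 30))  # 1.0, still inside the offset
print(exp_decay(date(2014, 7, 22), origin, 7, 30))  # about 0.5, at offset + scale days
print(exp_decay(date(2015, 12, 3), origin, 7, 30))  # very close to 0
```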
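Function Score: Script Scoring
In the case where the built-in scoring functions do not meet your needs, there is the option to specify a Groovy script to use for scoring. In our example, we want to specify a script that takes the publish_date into consideration before deciding how much to factor in the number of reviews. Newer books may not have as many reviews yet, so they should not be penalized for that.
The scoring script looks like this: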
publish_date = doc['publish_date'].value
num_reviews = doc['num_reviews'].value
if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) {
my_score = Math.log(2.5 + num_reviews)
} else {
my_score = Math.log(1 + num_reviews)
}
return my_score
To use a scoring script dynamically, we use the script_score parameter:
POST /bookdb_index/book/_search
{
"query": {
"function_score": {
"query": {
"multi_match" : {
"query" : "search engine",
"fields": ["title", "summary"]
}
},
"functions": [
{
"script_score": {
"params" : {
"threshold": "2015-07-30"
},
"script": "publish_date = doc['publish_date'].value; num_reviews = doc['num_reviews'].value; if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) { return log(2.5 + num_reviews) }; return log(1 + num_reviews);"
}
}
]
}
},
"_source": ["title", "summary", "publish_date", "num_reviews"]
}
[Results]
"hits": {
"total": 4,
"max_score": 0.8463001,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.8463001,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"num_reviews": 20,
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.7067348,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"num_reviews": 23,
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 0.08952084,
"_source": {
"summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"num_reviews": 18,
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": 0.07602123,
"_source": {
"summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
"num_reviews": 12,
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"publish_date": "2013-01-24"
}
}
]
}
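Note 1: To use dynamic scripting, it must be enabled for your Elasticsearch instance in the config/elasticsearch.yml file. It is also possible to use scripts that have been stored on the Elasticsearch server. Check out the Elasticsearch reference docs for more information.
Note 2: JSON cannot include embedded newline characters, so the semicolon is used to separate statements.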
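To check the logic of the scoring script outside Elasticsearch, it can be ported to a few lines of Python. This is only a sketch of the script's arithmetic (natural logarithm, as Math.log in the script above); the function name review_score is hypothetical, and the value it returns is the function score before Elasticsearch combines it with the query score.

```python
import math
from datetime import date

def review_score(publish_date, num_reviews, threshold):
    """Port of the scoring script: newer books get a small head start on reviews."""
    if publish_date > threshold:
        return math.log(2.5 + num_reviews)
    return math.log(1 + num_reviews)

threshold = date(2015, 7, 30)
print(review_score(date(2015, 12, 3), 18, threshold))  # published after the threshold
print(review_score(date(2015, 2, 7), 20, threshold))   # published before the threshold
```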