
Elasticsearch Queries

RendaZhang

Basic queries

Basic syntax:

```
GET /<index_name>/_search
{
  "query": {
    "<query_type>": {
      "<condition>": "<condition_value>"
    }
  }
}
```

Here, query is a query object, and it can contain different query types.

Query type: match_all, match, term, range, and so on.

Query condition: how the condition is written depends on the query type.
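The following minimal Python sketch shows how the pieces of the basic syntax fit together. It only builds the request body as a dict and serializes it. It does not talk to a cluster. The helper names build_query and match_all_query are hypothetical and exist only for illustration.

```python
import json


def build_query(query_type: str, field: str, value) -> dict:
    """Build {"query": {query_type: {field: value}}}, the basic shape above.

    Hypothetical helper: it only covers the simple one-field form.
    """
    return {"query": {query_type: {field: value}}}


def match_all_query() -> dict:
    """match_all has no field/value pair, just an empty object."""
    return {"query": {"match_all": {}}}


if __name__ == "__main__":
    body = build_query("match", "title", "小米电视")
    # ensure_ascii=False keeps the Chinese text readable in the output
    print(json.dumps(body, ensure_ascii=False))
    print(json.dumps(match_all_query()))
```

The printed JSON is the body that follows the GET /renda/_search line in the console examples below.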

Match all (match_all)

Example:

```
GET /renda/_search
{
  "query": {
    "match_all": {}
  }
}
```

query: the query object.

match_all: matches all documents.

Result:

```json
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "renda",
        "_type": "goods",
        "_id": "2",
        "_score": 1,
        "_source": { "title": "白米手机", "images": "…", "price": 2699 }
      },
      {
        "_index": "renda",
        "_type": "goods",
        "_id": "gPeQqHUB-UTJAEEuqOm9",
        "_score": 1,
        "_source": { "title": "小米手机", "images": "…", "price": 2699 }
      },
      {
        "_index": "renda",
        "_type": "goods",
        "_id": "3",
        "_score": 1,
        "_source": {
          "title": "超大米手机",
          "images": "…",
          "price": 3299,
          "stock": 200,
          "saleable": true,
          "subTitle": "大米"
        }
      }
    ]
  }
}
```

Explanation of the result:

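took - time the search took, in milliseconds.
timed_out - whether the search timed out.
_shards - information about the shards that were searched.
hits - the search hits:
  total - total number of matching documents.
  max_score - the highest document score among the results.
  hits - array of matching documents, one element per document:
    _index - the index the document belongs to.
    _type - the document type.
    _id - the document id.
    _score - the relevance score of the document.
    _source - the original document data.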

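Document score: the documents a query returns differ in how relevant they are to it. Ideally, the more relevant a document is to the query conditions, the higher it ranks, and _score is what that ranking is based on.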
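The response fields listed above can also be read from code. This Python sketch assumes the response has already been parsed into a dict, for example with json.loads. summarize_hits is a hypothetical helper. The sample is a trimmed copy of the match_all response above, with images left out.

```python
def summarize_hits(response: dict):
    """Return (took, total, max_score, [(id, score, source), ...])."""
    hits = response["hits"]
    total = hits["total"]
    # ES 6.x, as in this article, returns a number here; ES 7+ returns an
    # object such as {"value": 3, "relation": "eq"}. Accept both shapes.
    if isinstance(total, dict):
        total = total["value"]
    docs = [(h["_id"], h["_score"], h["_source"]) for h in hits["hits"]]
    return response["took"], total, hits["max_score"], docs


# Trimmed copy of the match_all response above (images left out).
SAMPLE = {
    "took": 1,
    "timed_out": False,
    "hits": {
        "total": 3,
        "max_score": 1,
        "hits": [
            {"_id": "2", "_score": 1,
             "_source": {"title": "白米手机", "price": 2699}},
            {"_id": "gPeQqHUB-UTJAEEuqOm9", "_score": 1,
             "_source": {"title": "小米手机", "price": 2699}},
            {"_id": "3", "_score": 1,
             "_source": {"title": "超大米手机", "price": 3299}},
        ],
    },
}

if __name__ == "__main__":
    took, total, max_score, docs = summarize_hits(SAMPLE)
    print(f"took={took}ms total={total} max_score={max_score}")
    for doc_id, score, source in docs:
        print(doc_id, score, source["title"])
```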

Match query (match)

Add a document for testing:

```
PUT /renda/goods/3
{
  "title": "小米电视4A",
  "images": "…",
  "price": 3899.00
}
```

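This request reuses id 3, so it overwrites the 超大米手机 document. The index now holds two phones (白米手机 and 小米手机) and one TV (小米电视4A).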

A match query analyzes (tokenizes) the query text and then searches for the resulting terms. By default the terms are combined with OR:

```
GET /renda/_search
{
  "query": {
    "match": {
      "title": "小米电视"
    }
  }
}
```

Response:

```json
{
  "took": 15,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "renda",
        "_type": "goods",
        "_id": "3",
        "_score": 0.5753642,
        "_source": { "title": "小米电视4A", "images": "…", "price": 3899 }
      },
      {
        "_index": "renda",
        "_type": "goods",
        "_id": "gPeQqHUB-UTJAEEuqOm9",
        "_score": 0.2876821,
        "_source": { "title": "小米手机", "images": "…", "price": 2699 }
      }
    ]
  }
}
```

The query above finds the TV, and it also finds everything else related to 小米 (Xiaomi), because the terms are combined with OR.

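Some searches need to be more precise and require an AND relationship. For example, when searching for a specific product on an e-commerce site, you may want the terms produced by analyzing the query to be combined with AND. You can do it like this: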

```
GET /renda/_search
{
  "query": {
    "match": {
      "title": {
        "query": "小米电视",
        "operator": "and"
      }
    }
  }
}
```

Response:

```json
{
  "took": 8,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "renda",
        "_type": "goods",
        "_id": "3",
        "_score": 0.5753642,
        "_source": { "title": "小米电视4A", "images": "…", "price": 3899 }
      }
    ]
  }
}
```

Now only documents that contain both terms, 小米 and 电视 (TV), are returned.
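A few lines of Python can model the difference between the two operators. This sketch is a simplification and does not reflect how Lucene actually evaluates queries. It assumes each title has already been split into the terms shown in INDEX. Those terms are hypothetical analyzer output, since the real terms depend on the analyzer mapped on title. The sketch also ignores scoring.

```python
# Hypothetical analyzer output for each title; the real terms depend on the
# analyzer mapped on the "title" field.
INDEX = {
    "2": ["白米", "手机"],
    "gPeQqHUB-UTJAEEuqOm9": ["小米", "手机"],
    "3": ["小米", "电视", "4a"],
}


def match(query_terms, operator="or"):
    """Return ids of documents matching already-analyzed query terms."""
    wanted = set(query_terms)
    result = []
    for doc_id, terms in INDEX.items():
        present = wanted & set(terms)
        if operator == "and":
            ok = present == wanted  # every query term must be present
        else:
            ok = bool(present)      # any one query term is enough
        if ok:
            result.append(doc_id)
    return result


if __name__ == "__main__":
    query = ["小米", "电视"]  # assumed analysis of "小米电视"
    print(match(query))         # or: the Xiaomi phone and the TV
    print(match(query, "and"))  # and: only the TV
```

The sketch only decides whether a document matches. In Elasticsearch the TV also scores higher under or, because it contains both terms.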

Term query (term)

A term query is used to match exact values. These values can be numbers, dates, booleans, or strings that are not analyzed, such as keyword fields.

The effect is similar to the SQL statement: select * from tableName where colName='value';

```
GET /renda/_search
{
  "query": {
    "term": {
      "price": 2699.00
    }
  }
}
```

Response:

```json
{
  "took": 6,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "renda",
        "_type": "goods",
        "_id": "2",
        "_score": 1,
        "_source": { "title": "白米手机", "images": "…", "price": 2699 }
      },
      {
        "_index": "renda",
        "_type": "goods",
        "_id": "gPeQqHUB-UTJAEEuqOm9",
        "_score": 1,
        "_source": { "title": "小米手机", "images": "…", "price": 2699 }
      }
    ]
  }
}
```
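The key property of term is that the query value is not analyzed: it has to equal one of the indexed terms exactly. The Python model below is rough. It assumes title is an analyzed text field whose terms are the hypothetical ones in DOCS, and that price is numeric.

```python
# Assumed indexed terms per field: "title" as an analyzed text field
# (hypothetical terms), "price" as a numeric field.
DOCS = {
    "2": {"title": ["白米", "手机"], "price": [2699.0]},
    "gPeQqHUB-UTJAEEuqOm9": {"title": ["小米", "手机"], "price": [2699.0]},
    "3": {"title": ["小米", "电视", "4a"], "price": [3899.0]},
}


def term(field, value):
    """Exact match: the value is compared as-is and is never analyzed."""
    return [doc_id for doc_id, fields in DOCS.items() if value in fields[field]]


if __name__ == "__main__":
    print(term("price", 2699.00))    # both 2699 phones
    print(term("title", "小米手机"))  # nothing: not a single indexed term
    print(term("title", "小米"))      # an indexed term, so it matches
```

Under these assumptions, term("title", "小米手机") finds nothing, because the whole phrase was never stored as a single term. That case usually calls for a match query or a keyword field.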

Boolean combination (bool)

A bool query combines other queries using must (AND), must_not (NOT), and should (OR).

```
GET /renda/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "title": "小米" }
      },
      "must_not": {
        "match": { "title": "电视" }
      },
      "should": {
        "match": { "title": "手机" }
      }
    }
  }
}
```

Response:

```json
{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "renda",
        "_type": "goods",
        "_id": "gPeQqHUB-UTJAEEuqOm9",
        "_score": 0.5753642,
        "_source": { "title": "小米手机", "images": "…", "price": 2699 }
      }
    ]
  }
}
```
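Here is a Python sketch of the bool logic, again over hypothetical analyzed titles. One detail is worth knowing. When a bool query has a must (or filter) clause, its should clauses are optional and only raise the score. When it has neither, at least one should clause has to match. The sketch models this behavior and uses the number of matched should clauses as a crude stand-in for the score.

```python
# Hypothetical analyzer output for each title (not from a real cluster).
INDEX = {
    "2": ["白米", "手机"],
    "gPeQqHUB-UTJAEEuqOm9": ["小米", "手机"],
    "3": ["小米", "电视", "4a"],
}


def clause_matches(doc_terms, clause_terms):
    """A match clause with the default OR operator: any shared term hits."""
    return bool(set(doc_terms) & set(clause_terms))


def bool_query(must=(), must_not=(), should=()):
    """Each argument is a list of clauses; a clause is a list of terms.

    Returns (doc_id, should_matches) pairs, best first.
    """
    results = []
    for doc_id, terms in INDEX.items():
        if not all(clause_matches(terms, c) for c in must):
            continue  # must: every clause has to match (AND)
        if any(clause_matches(terms, c) for c in must_not):
            continue  # must_not: no clause may match (NOT)
        should_matches = sum(clause_matches(terms, c) for c in should)
        if not must and should and should_matches == 0:
            continue  # no must clause: at least one should clause must match
        results.append((doc_id, should_matches))
    # more should matches rank higher (stand-in for a higher _score)
    results.sort(key=lambda r: r[1], reverse=True)
    return results


if __name__ == "__main__":
    print(bool_query(must=[["小米"]], must_not=[["电视"]], should=[["手机"]]))
```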

Range query (range)

A range query finds documents whose numbers or dates fall within a given interval.

```
GET /renda/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 3000.0,
        "lt": 4000.00
      }
    }
  }
}
```

Response:

```json
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "renda",
        "_type": "goods",
        "_id": "3",
        "_score": 1,
        "_source": { "title": "小米电视4A", "images": "…", "price": 3899 }
      }
    ]
  }
}
```
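The gte and lt bounds in the request above, along with the gt and lte operators, map directly to comparisons. Below is a small Python sketch over the three prices in the index at this point. in_range and range_query are hypothetical names.

```python
# Prices of the three documents in the index at this point.
PRICES = {"2": 2699.0, "gPeQqHUB-UTJAEEuqOm9": 2699.0, "3": 3899.0}


def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """True if value satisfies every bound that was supplied."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


def range_query(**bounds):
    """Ids of documents whose price satisfies all given bounds."""
    return [doc_id for doc_id, price in PRICES.items() if in_range(price, **bounds)]


if __name__ == "__main__":
    print(range_query(gte=3000.0, lt=4000.0))  # ['3']
```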

A range query accepts the following operators:

gt - greater than
gte - greater than or equal to
lt - less than
lte - less than or equal to

Fuzzy query (fuzzy)

A fuzzy query is the fuzzy counterpart of the term query. It is rarely used directly.

Add a new product:

```
POST /renda/goods/5
{
  "title": "Apple手机",
  "images": "…",
  "price": 6899.00
}
```

Response:

```json
{
  "_index": "renda",
  "_type": "goods",
  "_id": "5",
  "_version": 1,
  "result": "created",
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "_seq_no": 0,
  "_primary_term": 2
}
```

A fuzzy query lets the search term differ in spelling from the indexed term, provided the edit distance between them is at most 2:

```
GET /renda/_search
{
  "query": {
    "fuzzy": {
      "title": "applas"
    }
  }
}
```

The query above also finds the Apple phone:

```json
{
  "took": 4,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.17260925,
    "hits": [
      {
        "_index": "renda",
        "_type": "goods",
        "_id": "5",
        "_score": 0.17260925,
        "_source": { "title": "Apple手机", "images": "…", "price": 6899 }
      }
    ]
  }
}
```
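Edit distance is the quantity that fuzzy tolerates. The Python sketch below computes the classic Levenshtein distance (insertions, deletions, substitutions). It then applies the thresholds of fuzziness AUTO, the fuzzy query's default. Terms of 0 to 2 characters must match exactly, terms of 3 to 5 characters allow one edit, and longer terms allow two. This is a simplification, because Elasticsearch also counts a transposition of two adjacent characters as a single edit by default. The sketch also assumes the title analyzer lowercases, so the indexed term is apple.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def auto_fuzziness(term: str) -> int:
    """Edits allowed by fuzziness AUTO with its default 3/6 thresholds."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2


def fuzzy_match(query: str, indexed_term: str) -> bool:
    return levenshtein(query, indexed_term) <= auto_fuzziness(query)


if __name__ == "__main__":
    print(levenshtein("applas", "apple"))  # 2: a->e, then drop s
    print(fuzzy_match("applas", "apple"))  # True: 6 letters allow 2 edits
```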

Filtering the returned fields

By default, Elasticsearch includes every field stored in a document's _source in the search results.

To get only some of the fields, add a _source filter.

Listing the fields directly

Example:

```
GET /renda/_search
{
  "_source": ["title", "price"],
  "query": {
    "term": {
      "price": 2699
    }
  }
}
```

Result:

```json
{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "renda",
        "_type": "goods",
        "_id": "2",
        "_score": 1,
        "_source": { "price": 2699, "title": "白米手机" }
      },
      {
        "_index": "renda",
        "_type": "goods",
        "_id": "gPeQqHUB-UTJAEEuqOm9",
        "_score": 1,
        "_source": { "price": 2699, "title": "小米手机" }
      }
    ]
  }
}
```

Specifying includes and excludes

includes: the fields you want returned.

excludes: the fields you want left out.

Both are optional.

Example:

```
GET /renda/_search
{
  "_source": {
    "includes": ["title", "price"]
  },
  "query": {
    "term": {
      "price": 2699
    }
  }
}
```

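This returns the same result as the following request, because the only field in these documents besides title and price is images: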

```
GET /renda/_search
{
  "_source": {
    "excludes": ["images"]
  },
  "query": {
    "term": {
      "price": 2699
    }
  }
}
```

Response:

```json
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "renda",
        "_type": "goods",
        "_id": "2",
        "_score": 1,
        "_source": { "price": 2699, "title": "白米手机" }
      },
      {
        "_index": "renda",
        "_type": "goods",
        "_id": "gPeQqHUB-UTJAEEuqOm9",
        "_score": 1,
        "_source": { "price": 2699, "title": "小米手机" }
      }
    ]
  }
}
```
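The sketch below shows the effect of includes and excludes on a single document. The model is simplified: it only looks at top-level fields and uses shell-style wildcards via fnmatch. filter_source is a hypothetical name.

```python
from fnmatch import fnmatchcase


def filter_source(source, includes=None, excludes=None):
    """Keep top-level fields matching includes, then drop those matching excludes.

    An empty or missing includes list keeps every field.
    """
    result = {}
    for key, value in source.items():
        if includes and not any(fnmatchcase(key, p) for p in includes):
            continue  # not asked for
        if excludes and any(fnmatchcase(key, p) for p in excludes):
            continue  # explicitly left out
        result[key] = value
    return result


if __name__ == "__main__":
    doc = {"title": "小米手机", "images": "…", "price": 2699}
    print(filter_source(doc, includes=["title", "price"]))
    print(filter_source(doc, excludes=["images"]))
```

Both calls leave title and price, which mirrors the two equivalent requests above.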

Filtering (filter)

The query language used by Elasticsearch (the Query DSL) provides a set of query components that can be combined in any number of ways.

These components can be used in two contexts: the filter context and the query context.

Choosing between a query and a filter:

As a general rule, use query clauses for full-text search and for any other search that should affect the relevance score. Use filters for everything else.

Filtering within a query:

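Every query clause affects document scores and ranking. Sometimes you need to narrow down the results of a query without letting that condition affect the score. In that case, don't write the condition as a query clause. Put it in a filter instead: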

```
GET /renda/_search
{
  "query": {
    "bool": {
      "must": { "match": { "title": "小米手机" } },
      "filter": {
        "range": { "price": { "gt": 2000.00, "lt": 3800.00 } }
      }
    }
  }
}
```

Filtering without a query:

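A search may consist only of filters, with no query condition and no need for scoring. In that case you can use constant_score instead of a bool query that contains nothing but a filter clause. Performance is the same, but the query is simpler and clearer.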

```
GET /renda/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": { "price": { "gt": 2000.00, "lt": 3000.00 } }
      }
    }
  }
}
```
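A toy Python model makes the query/filter distinction concrete. The score here is made up: it counts how many query terms a document contains, over hypothetical analyzed titles. The point is that the price filter only makes a yes-or-no decision and never changes the score. constant_score gives every matching document the same score, equal to its boost, which defaults to 1.0.

```python
# Hypothetical analyzed titles plus the prices of the sample documents.
DOCS = {
    "2": {"terms": ["白米", "手机"], "price": 2699.0},
    "gPeQqHUB-UTJAEEuqOm9": {"terms": ["小米", "手机"], "price": 2699.0},
    "3": {"terms": ["小米", "电视", "4a"], "price": 3899.0},
}


def toy_score(terms, query_terms):
    """Made-up relevance: how many query terms the document contains."""
    return len(set(terms) & set(query_terms))


def bool_must_filter(query_terms, price_gt, price_lt):
    """bool with a must match clause and a range filter on price."""
    hits = []
    for doc_id, doc in DOCS.items():
        if not price_gt < doc["price"] < price_lt:
            continue  # filter: a yes/no decision that never touches the score
        score = toy_score(doc["terms"], query_terms)
        if score > 0:  # must: the match clause has to hit
            hits.append((doc_id, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)


def constant_score_filter(price_gt, price_lt, boost=1.0):
    """constant_score: every document that passes the filter gets `boost`."""
    return [(doc_id, boost) for doc_id, doc in DOCS.items()
            if price_gt < doc["price"] < price_lt]


if __name__ == "__main__":
    print(bool_must_filter(["小米", "手机"], 2000.0, 3800.0))
    print(constant_score_filter(2000.0, 3000.0))
```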

Sorting

Single-field sorting

sort orders the results by the fields you specify, and order sets the direction (asc or desc).

```
GET /renda/_search
{
  "query": {
    "match": {
      "title": "小米手机"
    }
  },
  "sort": [
    { "price": { "order": "desc" } }
  ]
}
```

Multi-field sorting

Suppose you want to sort on both price and _score. Matching results are sorted by price first and then by relevance score:

```
GET /renda/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "title": "小米手机" }
      },
      "filter": {
        "range": { "price": { "gt": 2000, "lt": 4000 } }
      }
    }
  },
  "sort": [
    { "price": { "order": "desc" } },
    { "_score": { "order": "desc" } }
  ]
}
```
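Multi-field sorting is ordinary lexicographic ordering: compare on the first key and fall back to the second only when the first ties. Here is a Python sketch over hypothetical hits. The prices come from the sample documents, but the _score values are made up for illustration.

```python
# Hypothetical hits: prices come from the sample documents, the _score
# values are made up for illustration.
HITS = [
    {"_id": "2", "price": 2699, "_score": 0.3},
    {"_id": "gPeQqHUB-UTJAEEuqOm9", "price": 2699, "_score": 0.9},
    {"_id": "3", "price": 3899, "_score": 0.2},
]


def sort_hits(hits):
    """price descending first; _score descending breaks ties."""
    # Negating numeric keys turns Python's ascending sort into descending.
    return sorted(hits, key=lambda h: (-h["price"], -h["_score"]))


if __name__ == "__main__":
    print([h["_id"] for h in sort_hits(HITS)])
```

The two 2699 phones tie on price, so their order is decided by _score.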

Pagination

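Elasticsearch stores data in shards. At search time each shard is searched independently, and the results are then merged before being returned. So how is pagination done?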

Pagination in Elasticsearch works much like it does in MySQL (LIMIT offset, count): you specify two values:

from - the offset (start position) of the first result to return; defaults to 0.
size - the page size; defaults to 10.

```
GET /renda/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "price": { "order": "asc" } }
  ],
  "from": 3,
  "size": 3
}
```

Result:

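```json
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 4,
    "max_score": null,
    "hits": [
      {
        "_index": "renda",
        "_type": "goods",
        "_id": "5",
        "_score": null,
        "_source": { "title": "Apple手机", "images": "…", "price": 6899 },
        "sort": [6899]
      }
    ]
  }
}
```

Sorted by price in ascending order, the four documents are 2699, 2699, 3899, and 6899. from: 3 skips the first three, so only the Apple phone is returned, even though size is 3.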
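from and size behave like a slice over the sorted hits. The Python sketch below reproduces the request above. It also shows the usual conversion from a page number: from = (page - 1) * size. paginate and page_to_from are hypothetical names.

```python
def paginate(sorted_hits, from_=0, size=10):
    """Skip `from_` hits, then return at most `size` of them.

    `from` is a Python keyword, hence the trailing underscore.
    """
    return sorted_hits[from_:from_ + size]


def page_to_from(page, size):
    """Convert a 1-based page number into the `from` offset."""
    return (page - 1) * size


if __name__ == "__main__":
    prices = sorted([2699, 2699, 3899, 6899])  # "sort": price asc
    print(paginate(prices, from_=3, size=3))   # [6899]
    print(page_to_from(2, 3))                  # 3
```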

Highlighting

How highlighting works:

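1. The server runs the search and gets the results.
2. Every occurrence of the search keywords in the results is wrapped in agreed-upon tags.
3. The front-end page defines CSS styles for those tags in advance, so the keywords appear highlighted.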

The syntax for highlighting in Elasticsearch is fairly simple:

```
GET /renda/_search
{
  "query": {
    "match": {
      "title": "手机"
    }
  },
  "highlight": {
    "pre_tags": "<em>",
    "post_tags": "</em>",
    "fields": {
      "title": {}
    }
  }
}
```

Add a highlight property alongside the match query:

pre_tags: the opening tag
post_tags: the closing tag
fields: the fields to highlight
  title: declares that the title field should be highlighted

Result:

```json
{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "renda",
        "_type": "goods",
        "_id": "5",
        "_score": 0.2876821,
        "_source": { "title": "Apple手机", "images": "…", "price": 6899 },
        "highlight": {
          "title": ["Apple<em>手机</em>"]
        }
      },
      {
        "_index": "renda",
        "_type": "goods",
        "_id": "2",
        "_score": 0.2876821,
        "_source": { "title": "白米手机", "images": "…", "price": 2699 },
        "highlight": {
          "title": ["白米<em>手机</em>"]
        }
      },
      {
        "_index": "renda",
        "_type": "goods",
        "_id": "gPeQqHUB-UTJAEEuqOm9",
        "_score": 0.2876821,
        "_source": { "title": "小米手机", "images": "…", "price": 2699 },
        "highlight": {
          "title": ["小米<em>手机</em>"]
        }
      }
    ]
  }
}
```
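The tag-wrapping step can be sketched in Python as follows. This is only a rough model. The real highlighter works on the analyzed tokens of the field, so it follows the analyzer's rules, such as lowercasing. This sketch does plain substring replacement instead. highlight is a hypothetical name, and its default tags simply mirror the request above.

```python
import re


def highlight(text, terms, pre_tag="<em>", post_tag="</em>"):
    """Wrap each occurrence of any term in pre_tag/post_tag."""
    if not terms:
        return text
    # One alternation, longest terms first, so overlapping terms are wrapped
    # once in a single left-to-right pass instead of being nested.
    pattern = "|".join(
        re.escape(t) for t in sorted(set(terms), key=len, reverse=True)
    )
    return re.sub(pattern, lambda m: pre_tag + m.group(0) + post_tag, text)


if __name__ == "__main__":
    for title in ["Apple手机", "白米手机", "小米手机"]:
        print(highlight(title, ["手机"]))
```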
