
A Hard-Won, 10,000-Word Introduction to Elasticsearch (Part 1)

By 深漂小小熊


Getting to know elasticsearch

ES is an open-source, distributed search engine built on top of Apache Lucene with a RESTful web interface. It is also a distributed document database: every field can be indexed and every field's data can be searched, and it scales horizontally to hundreds of servers, storing and processing petabytes of data.

It can store, search, and analyze huge volumes of data in very little time, and it is typically used as the core engine for applications with complex search requirements.

ES was built for high availability and scalability. One way to scale a system is to upgrade its hardware, known as vertical scaling or scaling up (Vertical Scale/Scaling Up).

The other is to add more servers, known as horizontal scaling or scaling out (Horizontal Scale/Scaling Out). Although ES can take advantage of more powerful hardware, vertical scaling has its limits. Real scalability comes from horizontal scaling: adding more nodes to the cluster to share the load and increase reliability. ES is distributed by nature; it knows how to manage multiple nodes to scale and stay highly available, which means the application itself needs no changes.

Why is it so fast? The answer lies in the inverted index.

If you want to dig deeper, look up material on inverted indexes or read the official ES documentation.
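To build intuition for why an inverted index makes lookups fast, here is a toy sketch (illustration only; Lucene's real data structures are far more sophisticated). The index maps each term to the set of document ids containing it, so a term lookup is a single map access instead of a scan over every document:

```java
import java.util.*;

// Toy inverted index: term -> set of document ids (illustration only).
class InvertedIndexDemo {
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // "Analyze" the text (lowercase + whitespace split) and record term -> docId.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Term lookup: one map access, independent of how many documents exist.
    public Set<Integer> search(String term) {
        return index.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexDemo idx = new InvertedIndexDemo();
        idx.add(1, "The quick brown fox");
        idx.add(2, "Brown fox brown dog");
        idx.add(3, "The lazy dog");
        System.out.println(idx.search("brown")); // [1, 2]
        System.out.println(idx.search("dog"));   // [2, 3]
    }
}
```

Lucene layers many refinements on this idea (sorted term dictionaries, postings with positions, compression), but the term-to-documents mapping is the core reason lookups stay fast as the corpus grows.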

Several important ES concepts

Index (index)

An index is a container for mapping types. An ES index is much like a database in the relational world: an independent collection of a large number of documents.
Type (type)
A type is a logical container for documents, much as a table is a container for rows. Documents with different structures are best kept in different types. Types have been de-emphasized in recent ES versions: queries generally take the form get /index/_doc/id, and new documents no longer need a type.
Document (documents)
The data stored in an index under a type.
Field (fields)
In ES, every document is stored as JSON, and a document can be viewed as a collection of fields, comparable to the fields of a Java class.
Field type overview:

Core types
  String:         text, keyword
  Integer:        integer, long, short, byte
  Floating point: double, float, half_float, scaled_float
  Boolean:        boolean
  Date:           date
  Range:          range
  Binary:         binary
Composite types
  Array:          array
  Object:         object
  Nested:         nested
Geo types
  Geo point:      geo_point
  Geo shape:      geo_shape
Special types
  IP:             ip
  Completion:     completion
  Token count:    token_count
  Attachment:     attachment
  Percolator:     percolator

For details, see the Elasticsearch field type reference.

The ES storage model compared with a database

Single-node deployment of ES and Kibana

I will only briefly cover deployment on Windows; it is extremely simple, and Mac and Linux are much the same. The hands-on sections below are the real focus.

1. Configure the Java environment variables. Elasticsearch is written in Java, so download a JDK and set up the Java environment.
2. Download and unzip Elasticsearch, go into the bin directory, and double-click elasticsearch.bat. Open the ES address in a browser (by default http://localhost:9200); if a response page appears, the installation succeeded.
3. Go into Kibana's bin directory and double-click kibana.bat (or start it from a terminal). When the Kibana page appears, Kibana has started successfully, and you can open its console to run ES statements.

Operating ES with the Java API

Preparation: integrate elasticsearch-rest-high-level-client

Maven dependencies

```xml
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.4.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>7.4.1</version>
</dependency>
```
Integrating restHighLevelClient with Spring Boot
```java
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * @author xiong
 * @Date 20/05/07 4:11 PM
 */
@Configuration
public class EsConfiguration {

    @Value("${elastic.host}")
    private String host;

    @Value("${elastic.port}")
    private int port;

    @Value("${elastic.username}")
    private String userName;

    @Value("${elastic.password}")
    private String password;

    @Bean(destroyMethod = "close")
    public RestHighLevelClient restHighLevelClient() {
        final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(userName, password));
        HttpHost httpHost = new HttpHost(host, port, "http");
        RestClientBuilder builder = RestClient.builder(httpHost)
                .setRequestConfigCallback(requestConfigBuilder -> {
                    requestConfigBuilder.setConnectTimeout(2000);
                    requestConfigBuilder.setSocketTimeout(5000);
                    requestConfigBuilder.setConnectionRequestTimeout(5000);
                    return requestConfigBuilder;
                })
                .setHttpClientConfigCallback(httpClientBuilder -> {
                    httpClientBuilder.disableAuthCaching();
                    return httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
                });
        return new RestHighLevelClient(builder);
    }
}
```
Configure the host and port in application.yml
```yaml
elastic:
  host: 127.0.0.1
  port: 9200
```
Operating ES with restHighLevelClient
```java
import com.alibaba.fastjson.JSON;
import com.google.common.collect.Maps;
import com.miya.item.center.dao.EsClientDAO;
import lombok.extern.slf4j.Slf4j;
import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.action.admin.indices.create.CreateIndexResponse;
import org.elasticsearch.action.admin.indices.get.GetIndexRequest;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.ParsedTerms;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.stereotype.Repository;

import javax.annotation.Resource;
import java.util.Map;

/**
 * @author xiong
 * @Date 20/05/07 4:20 PM
 */
@Slf4j
@Repository
public class RestHighLevelClientDAOImpl implements EsClientDAO {

    @Resource
    private RestHighLevelClient restHighLevelClient;

    @Override
    public IndexResponse insertDocument(String index, String type, Object doc) {
        IndexRequest indexRequest = new IndexRequest(index);
        //indexRequest.type(type);
        indexRequest.source(JSON.toJSONString(doc), XContentType.JSON);
        IndexResponse indexResponse = null;
        try {
            indexResponse = restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
        } catch (Exception e) {
            log.warn("createDocument failure! index: {}, type: {}, doc: {}", index, type, JSON.toJSONString(doc), e);
        }
        return indexResponse;
    }

    @Override
    public CreateIndexResponse createIndex(String index, String type, Map<String, Object> mapping) {
        CreateIndexRequest createIndexRequest = new CreateIndexRequest(index);
        createIndexRequest.mapping(type, mapping);
        CreateIndexResponse createIndexResponse = null;
        try {
            createIndexResponse = restHighLevelClient.indices().create(createIndexRequest, RequestOptions.DEFAULT);
        } catch (Exception e) {
            log.warn("createIndex failure! index: {}, type: {}, mapping: {}", index, type, JSON.toJSONString(mapping), e);
        }
        return createIndexResponse;
    }

    /**
     * Note: DeleteRequest deletes a single document by id. To delete the index
     * itself, use DeleteIndexRequest via restHighLevelClient.indices().delete(...).
     */
    @Override
    public DeleteResponse deleteIndex(String index) {
        DeleteRequest deleteRequest = new DeleteRequest(index);
        DeleteResponse deleteResponse = null;
        try {
            deleteResponse = restHighLevelClient.delete(deleteRequest, RequestOptions.DEFAULT);
        } catch (Exception e) {
            log.warn("deleteIndex failure! index: {}", index, e);
        }
        return deleteResponse;
    }

    @Override
    public Boolean existsIndex(String index) {
        GetIndexRequest request = new GetIndexRequest();
        request.indices(index);
        boolean exists = false;
        try {
            exists = restHighLevelClient.indices().exists(request, RequestOptions.DEFAULT);
        } catch (Exception e) {
            log.warn("existsIndex failure! index: {}", index, e);
        }
        return exists;
    }

    @Override
    public SearchResponse searchDoc(String index, String type, QueryBuilder queryBuilder, AggregationBuilder aggregationBuilder) {
        SearchRequest searchRequest = new SearchRequest(index);
        searchRequest.types(type);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchRequest.source(searchSourceBuilder);
        searchSourceBuilder.query(queryBuilder);
        if (aggregationBuilder != null) {
            searchSourceBuilder.aggregation(aggregationBuilder);
        }
        SearchResponse searchResponse = null;
        try {
            searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        } catch (Exception e) {
            log.warn("searchDoc failure! index: {}, type: {}, queryBuilder: {}, aggBuilder: {}", index, type, queryBuilder, aggregationBuilder, e);
        }
        return searchResponse;
    }

    /**
     * <=> GROUP BY key1, key2
     */
    public Map<String, Long> groupByTwice(String index, String type, String key1, String key2) {
        SearchRequest searchRequest = new SearchRequest(index);
        searchRequest.types(type);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchRequest.source(searchSourceBuilder);
        TermsAggregationBuilder aggregation = AggregationBuilders.terms("by_" + key1)
                .field(key1 + ".keyword");
        aggregation.subAggregation(AggregationBuilders.terms("by_" + key2)
                .field(key2 + ".keyword"));
        searchSourceBuilder.aggregation(aggregation);
        SearchResponse searchResponse = null;
        try {
            searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        } catch (Exception e) {
            log.warn("search failure! key1: {}, key2: {}", key1, key2, e);
        }
        Map<String, Long> map = Maps.newHashMap();
        if (searchResponse != null) {
            ParsedTerms key1Agg = searchResponse.getAggregations().get("by_" + key1);
            for (Terms.Bucket bucket : key1Agg.getBuckets()) {
                ParsedTerms key2Agg = bucket.getAggregations().get("by_" + key2);
                key2Agg.getBuckets().forEach(bucket1 ->
                        map.put(bucket.getKeyAsString() + "-" + bucket1.getKeyAsString(), bucket1.getDocCount()));
            }
        }
        return map;
    }

    /**
     * <=> WHERE field1 = field1Value GROUP BY field2
     */
    public Map<String, Long> selectByField1AndGroupByField2(String index, String type, String field1, String field1Value, String field2) {
        SearchRequest searchRequest = new SearchRequest(index);
        searchRequest.types(type);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchRequest.source(searchSourceBuilder);
        searchSourceBuilder.query(QueryBuilders.matchQuery(field1, field1Value));
        searchSourceBuilder.aggregation(AggregationBuilders.terms("by_" + field2).field(field2));
        SearchResponse searchResponse = null;
        try {
            searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        } catch (Exception e) {
            log.warn("search failure! field1: {}, field1Value: {}, field2: {}", field1, field1Value, field2, e);
        }
        Map<String, Long> map = Maps.newHashMap();
        if (searchResponse != null) {
            ParsedTerms field2Agg = searchResponse.getAggregations().get("by_" + field2);
            field2Agg.getBuckets().forEach(bucket -> map.put(bucket.getKeyAsString(), bucket.getDocCount()));
        }
        return map;
    }

    /**
     * Whether the index exists.
     */
    public boolean existIndex(String index) {
        GetIndexRequest request = new GetIndexRequest();
        request.indices(index);
        boolean exists = false;
        try {
            exists = restHighLevelClient.indices().exists(request, RequestOptions.DEFAULT);
        } catch (Exception e) {
            log.warn("existIndex failure! index: {}", index, e);
        }
        return exists;
    }
}
```
Elasticsearch official documentation center

restHighLevelClient official documentation

Native ES operations

Viewing the field mappings of an index

```
GET /item/_mapping
```

Response:

```json
{
  "item" : {
    "mappings" : {
      "properties" : {
        "barCode" : {
          "type" : "text",
          "fields" : {
            "keyword" : { "type" : "keyword", "ignore_above" : 256 }
          }
        },
        "categoryCode" : {
          "type" : "text",
          "fields" : {
            "keyword" : { "type" : "keyword", "ignore_above" : 256 }
          }
        },
        "createTime" : { "type" : "long" },
        "itemName" : {
          "type" : "text",
          "fields" : {
            "keyword" : { "type" : "keyword", "ignore_above" : 256 }
          }
        },
        "mchId" : { "type" : "long" },
        "skuId" : {
          "type" : "text",
          "fields" : {
            "keyword" : { "type" : "keyword", "ignore_above" : 256 }
          }
        }
      }
    }
  }
}
```
ignore_above: values over the length limit are not indexed

For example:

```
PUT my_index
{
  "mappings": {
    "properties": {
      "message": {
        "type": "keyword",
        "ignore_above": 20   // strings longer than 20 characters are ignored for this field
      }
    }
  }
}

PUT my_index/_doc/1   // this document is indexed successfully
{
  "message": "Syntax error"
}

PUT my_index/_doc/2
{
  "message": "Syntax error with some long stacktrace"   // the document is indexed, but the message field is not
}

GET my_index/_search   // the search returns both documents, but only the first appears in the terms aggregation
{
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}
```

When a value exceeds the limit, the document is still stored, but no index entry is created for that field.
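The behavior can be sketched with a toy model (hypothetical code, not how ES stores data internally): every value lands in the stored document, but only values within the limit get entries in the keyword index:

```java
import java.util.*;

// Sketch of ignore_above semantics: long values are stored with the
// document but skipped when building the keyword index.
class IgnoreAboveDemo {
    static final int IGNORE_ABOVE = 20;

    static Map<String, String> store = new HashMap<>();      // docId -> message (_source)
    static Map<String, Set<String>> index = new HashMap<>(); // message -> docIds (term index)

    static void put(String docId, String message) {
        store.put(docId, message);                  // the document is always stored
        if (message.length() <= IGNORE_ABOVE) {     // only short values are indexed
            index.computeIfAbsent(message, k -> new TreeSet<>()).add(docId);
        }
    }

    public static void main(String[] args) {
        put("1", "Syntax error");                             // 12 chars: stored and indexed
        put("2", "Syntax error with some long stacktrace");   // 38 chars: stored, not indexed
        System.out.println(store.size());                     // both documents exist
        System.out.println(index.containsKey("Syntax error")); // doc 1 is findable by term
        System.out.println(index.size());                      // only one indexed term
    }
}
```

This mirrors the example above: both documents come back from a match-all search, but only "Syntax error" shows up in the terms aggregation.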

Index CRUD operations

Create

Native ES syntax:

Format:

```
PUT /{index}/{type}/{id}
{
  "field": "value",
  ...
}
```

Example:

```
PUT /website/blog/123
{
  "title": "My first blog entry",
  "text":  "Just trying this out...",
  "date":  "2014/01/01"
}
```

Elasticsearch can generate the ID for us. The request changes shape: instead of the PUT verb ("store this document at this URL"), we use POST ("store this document under this URL namespace"). The URL now only needs the _index and _type:

```
POST /website/blog/
{
  "title": "My second blog entry",
  "text":  "Still trying this out...",
  "date":  "2014/01/01"
}
```

Response:

```
{
   "_index":   "website",
   "_type":    "blog",
   "_id":      "AVFgSgVHUP18jI2wRx0w",  // the id is auto-generated by ES
   "_version": 1,
   "created":  true
}
```
Creating documents with the Java API

Synchronous indexing

Option 1:

```java
IndexRequest request = new IndexRequest("posts");
request.id("1");
String jsonString = "{" +
        "\"user\":\"kimchy\"," +
        "\"postDate\":\"2013-01-30\"," +
        "\"message\":\"trying out Elasticsearch\"" +
        "}";
request.source(jsonString, XContentType.JSON);
restHighLevelClient.index(request, RequestOptions.DEFAULT);
```

Option 2:

```java
XContentBuilder builder = XContentFactory.jsonBuilder();
builder.startObject();
{
    builder.field("user", "kimchy");
    builder.timeField("postDate", new Date());
    builder.field("message", "trying out Elasticsearch");
}
builder.endObject();
IndexRequest indexRequest = new IndexRequest("posts")
        .id("1")
        .source(builder);
restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
```

Option 3:

```java
IndexRequest indexRequest = new IndexRequest("posts")
        .id("1")
        .source("user", "kimchy",
                "postDate", new Date(),
                "message", "trying out Elasticsearch");
restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
```

Asynchronous indexing

```java
client.indexAsync(request, RequestOptions.DEFAULT, new ActionListener<IndexResponse>() {
    @Override
    public void onResponse(IndexResponse indexResponse) {
        // success callback
    }

    @Override
    public void onFailure(Exception e) {
        // failure callback
    }
});
```

Reading the response

```java
IndexResponse indexResponse = client.index(request, RequestOptions.DEFAULT);
// index name
String index = indexResponse.getIndex();
// document id
String id = indexResponse.getId();
// whether the operation was a create or an update
if (indexResponse.getResult() == DocWriteResponse.Result.CREATED) {
} else if (indexResponse.getResult() == DocWriteResponse.Result.UPDATED) {
}
// shard replication status
ReplicationResponse.ShardInfo shardInfo = indexResponse.getShardInfo();
if (shardInfo.getTotal() != shardInfo.getSuccessful()) {
}
if (shardInfo.getFailed() > 0) {
    for (ReplicationResponse.ShardInfo.Failure failure : shardInfo.getFailures()) {
        String reason = failure.reason();
    }
}
```

Write conflicts

```java
IndexRequest request = new IndexRequest("posts")
        .id("1")
        .source("field", "value")
        .setIfSeqNo(10L)
        .setIfPrimaryTerm(20);
try {
    IndexResponse response = client.index(request, RequestOptions.DEFAULT);
} catch (ElasticsearchException e) {
    if (e.status() == RestStatus.CONFLICT) {
    }
}
```
Retrieving documents

Native ES syntax
```
GET /website/blog/123?pretty
```

If you only want part of the fields back, as with the SELECT list in SQL:

```
GET /website/blog/123?_source=title,text
```

Response:

```
{
  "_index" :   "website",
  "_type" :    "blog",
  "_id" :      "123",
  "_version" : 1,
  "found" :    true,
  "_source" :  {
      "title": "My first blog entry",
      "text":  "Just trying this out...",
      "date":  "2014/01/01"
  }
}
```

To get only the _source data, without any metadata:

```
GET /website/blog/123/_source
```

Response:

```
{
   "title": "My first blog entry",
   "text":  "Just trying this out...",
   "date":  "2014/01/01"
}
```

Checking whether a document exists

curl -i -XHEAD 
Java API
```java
GetRequest getRequest = new GetRequest(
        "website",
        "123");
// customize the returned fields
String[] includes = new String[]{"title", "text"};  // fields to include
String[] excludes = Strings.EMPTY_ARRAY;            // fields to exclude
FetchSourceContext fetchSourceContext =
        new FetchSourceContext(true, includes, excludes);
getRequest.fetchSourceContext(fetchSourceContext);
GetResponse getResponse = client.get(getRequest, RequestOptions.DEFAULT);
// read the response
String index = getResponse.getIndex();
String id = getResponse.getId();
if (getResponse.isExists()) {
    long version = getResponse.getVersion();
    String sourceAsString = getResponse.getSourceAsString();
    Map<String, Object> sourceAsMap = getResponse.getSourceAsMap();
    byte[] sourceAsBytes = getResponse.getSourceAsBytes();
} else {
}
```

Fetching only the _source:

```java
GetSourceRequest getSourceRequest = new GetSourceRequest(
        "posts",
        "1");
GetSourceResponse response =
        client.getSource(getSourceRequest, RequestOptions.DEFAULT);
```

Asynchronous get

```java
client.getSourceAsync(request, RequestOptions.DEFAULT, new ActionListener<GetSourceResponse>() {
    @Override
    public void onResponse(GetSourceResponse getResponse) {
    }

    @Override
    public void onFailure(Exception e) {
    }
});
```

When the document does not exist

```java
GetRequest request = new GetRequest("does_not_exist", "1");
try {
    GetResponse getResponse = client.get(request, RequestOptions.DEFAULT);
} catch (ElasticsearchException e) {
    if (e.status() == RestStatus.NOT_FOUND) {
        // TODO: handle the not-found case
    }
}
```

Handling conflicts

```java
try {
    // pass a version; if the document has since been updated, its version
    // has changed and the get raises a conflict exception
    GetRequest request = new GetRequest("website", "123").version(2);
    GetResponse getResponse = client.get(request, RequestOptions.DEFAULT);
} catch (ElasticsearchException exception) {
    if (exception.status() == RestStatus.CONFLICT) {
        // TODO: conflict-handling logic
    }
}
```

Checking existence without fetching the source

```java
GetRequest getRequest = new GetRequest(
        "website",
        "1");
// disable fetching _source and stored fields: we only need existence
getRequest.fetchSourceContext(new FetchSourceContext(false));
getRequest.storedFields("_none_");
boolean exists = client.exists(getRequest, RequestOptions.DEFAULT);
```
Delete

Native ES syntax
```
DELETE /website/blog/123
```

Response:

```
{
  "found" :    true,
  "_index" :   "website",
  "_type" :    "blog",
  "_id" :      "123",
  "_version" : 3
}
```
Deleting with the Java API

Every operation also has an asynchronous variant; see the official documentation for details, they are not all listed here.

```java
DeleteRequest request = new DeleteRequest(
        "posts",
        "1");
DeleteResponse deleteResponse = client.delete(
        request, RequestOptions.DEFAULT);
// read the response
String index = deleteResponse.getIndex();
String id = deleteResponse.getId();
long version = deleteResponse.getVersion();
ReplicationResponse.ShardInfo shardInfo = deleteResponse.getShardInfo();
if (shardInfo.getTotal() != shardInfo.getSuccessful()) {
}
if (shardInfo.getFailed() > 0) {
    for (ReplicationResponse.ShardInfo.Failure failure : shardInfo.getFailures()) {
        String reason = failure.reason();
    }
}
```

When the document does not exist

```java
DeleteRequest request = new DeleteRequest("posts", "does_not_exist");
DeleteResponse deleteResponse = client.delete(
        request, RequestOptions.DEFAULT);
if (deleteResponse.getResult() == DocWriteResponse.Result.NOT_FOUND) {
}
```

Conflicts

```java
try {
    DeleteResponse deleteResponse = client.delete(
            new DeleteRequest("posts", "1").setIfSeqNo(100).setIfPrimaryTerm(2),
            RequestOptions.DEFAULT);
} catch (ElasticsearchException exception) {
    if (exception.status() == RestStatus.CONFLICT) {
    }
}
```
Update

Native ES syntax
```
PUT /website/blog/123
{
  "title": "My first blog entry",
  "text":  "I am starting to get the hang of this...",
  "date":  "2014/01/02"
}
```

Response:

```
{
  "_index" :   "website",
  "_type" :    "blog",
  "_id" :      "123",
  "_version" : 2,
  "created":   false   // false, because a document with the same index, type and ID already existed
}
```
Updating with the Java API
```java
// script update
UpdateRequest request = new UpdateRequest(
        "website",
        "123");
Map<String, Object> parameters = singletonMap("count", 4);
Script inline = new Script(ScriptType.INLINE, "painless",
        "ctx._source.field += params.count", parameters);
request.script(inline);

// or: partial document as a JSON string
UpdateRequest request2 = new UpdateRequest("posts", "1");
String jsonString = "{" +
        "\"updated\":\"2017-01-01\"," +
        "\"reason\":\"daily update\"" +
        "}";
request2.doc(jsonString, XContentType.JSON);

// or: partial document as a Map
Map<String, Object> jsonMap = new HashMap<>();
jsonMap.put("updated", new Date());
jsonMap.put("reason", "daily update");
UpdateRequest request3 = new UpdateRequest("posts", "1")
        .doc(jsonMap);

UpdateResponse updateResponse = client.update(
        request, RequestOptions.DEFAULT);
```

Including and excluding source fields in the response

```java
UpdateRequest request = new UpdateRequest(
        "website",
        "123");
String[] includes = new String[]{"updated", "r*"};
String[] excludes = Strings.EMPTY_ARRAY;
request.fetchSource(
        new FetchSourceContext(true, includes, excludes));
UpdateResponse updateResponse = client.update(
        request, RequestOptions.DEFAULT);
```

The cheapest way to batch: the bulk API

Native ES syntax

Format:

```
{ action: { metadata }}\n
{ request body        }\n
{ action: { metadata }}\n
{ request body        }\n
...
```

Example:

```
POST /_bulk
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title":    "My first blog post" }
{ "index":  { "_index": "website", "_type": "blog" }}
{ "title":    "My second blog post" }
{ "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} }
{ "doc" : {"title" : "My updated blog post"} }
```

Java API

```java
BulkRequest request = new BulkRequest("posts");
request.add(new DeleteRequest("posts", "3"));
request.add(new UpdateRequest("posts", "2")
        .doc(XContentType.JSON, "other", "test"));
request.add(new IndexRequest("posts").id("4")
        .source(XContentType.JSON, "field", "baz"));
BulkResponse bulkResponse = client.bulk(request, RequestOptions.DEFAULT);
// inspect each item of the response
for (BulkItemResponse bulkItemResponse : bulkResponse) {
    DocWriteResponse itemResponse = bulkItemResponse.getResponse();
    switch (bulkItemResponse.getOpType()) {
        case INDEX:
        case CREATE:
            IndexResponse indexResponse = (IndexResponse) itemResponse;
            break;
        case UPDATE:
            UpdateResponse updateResponse = (UpdateResponse) itemResponse;
            break;
        case DELETE:
            DeleteResponse deleteResponse = (DeleteResponse) itemResponse;
    }
    // individual items can also fail
    if (bulkItemResponse.isFailed()) {
        BulkItemResponse.Failure failure = bulkItemResponse.getFailure();
    }
}
```

BulkProcessor

```java
BulkProcessor.Listener listener = new BulkProcessor.Listener() {
    @Override
    public void beforeBulk(long executionId, BulkRequest request) {
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request,
            BulkResponse response) {
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request,
            Throwable failure) {
    }
};
BulkProcessor bulkProcessor = BulkProcessor.builder(
        (request, bulkListener) ->
                client.bulkAsync(request, RequestOptions.DEFAULT, bulkListener),
        listener).build();
```
Batch reads: the multi get API

Java API

```java
MultiGetRequest request = new MultiGetRequest();
request.add(new MultiGetRequest.Item(
        "index",
        "example_id"));
request.add(new MultiGetRequest.Item("index", "another_id"));
String[] includes = new String[]{"foo", "*r"};
String[] excludes = Strings.EMPTY_ARRAY;
FetchSourceContext fetchSourceContext =
        new FetchSourceContext(true, includes, excludes);
request.add(new MultiGetRequest.Item("index", "example_id")
        .fetchSourceContext(fetchSourceContext));
MultiGetResponse response = client.mget(request, RequestOptions.DEFAULT);
// read the response
MultiGetItemResponse firstItem = response.getResponses()[0];
assertNull(firstItem.getFailure());
GetResponse firstGet = firstItem.getResponse();
String index = firstItem.getIndex();
String id = firstItem.getId();
if (firstGet.isExists()) {
    long version = firstGet.getVersion();
    String sourceAsString = firstGet.getSourceAsString();
    Map<String, Object> sourceAsMap = firstGet.getSourceAsMap();
    byte[] sourceAsBytes = firstGet.getSourceAsBytes();
} else {
}
```
Update by query API

Java API

```java
UpdateByQueryRequest request =
        new UpdateByQueryRequest("source1", "source2");
request.setConflicts("proceed");
request.setQuery(new TermQueryBuilder("user", "kimchy"));
request.setMaxDocs(10);     // only process 10 documents
request.setBatchSize(100);  // scroll batch size (the default is 1000)
BulkByScrollResponse bulkResponse =
        client.updateByQuery(request, RequestOptions.DEFAULT);
TimeValue timeTaken = bulkResponse.getTook();
boolean timedOut = bulkResponse.isTimedOut();
long totalDocs = bulkResponse.getTotal();
long updatedDocs = bulkResponse.getUpdated();
long deletedDocs = bulkResponse.getDeleted();
long batches = bulkResponse.getBatches();
long noops = bulkResponse.getNoops();
long versionConflicts = bulkResponse.getVersionConflicts();
long bulkRetries = bulkResponse.getBulkRetries();
long searchRetries = bulkResponse.getSearchRetries();
TimeValue throttledMillis = bulkResponse.getStatus().getThrottled();
TimeValue throttledUntilMillis =
        bulkResponse.getStatus().getThrottledUntil();
List<ScrollableHitSource.SearchFailure> searchFailures =
        bulkResponse.getSearchFailures();
List<BulkItemResponse.Failure> bulkFailures =
        bulkResponse.getBulkFailures();
```
Delete By Query API

Java API

It follows the same pattern as update by query.

```java
DeleteByQueryRequest request =
        new DeleteByQueryRequest("source1", "source2");
request.setQuery(new TermQueryBuilder("user", "kimchy"));
BulkByScrollResponse bulkResponse =
        client.deleteByQuery(request, RequestOptions.DEFAULT);
TimeValue timeTaken = bulkResponse.getTook();
boolean timedOut = bulkResponse.isTimedOut();
long totalDocs = bulkResponse.getTotal();
long deletedDocs = bulkResponse.getDeleted();
long batches = bulkResponse.getBatches();
long noops = bulkResponse.getNoops();
long versionConflicts = bulkResponse.getVersionConflicts();
long bulkRetries = bulkResponse.getBulkRetries();
long searchRetries = bulkResponse.getSearchRetries();
TimeValue throttledMillis = bulkResponse.getStatus().getThrottled();
TimeValue throttledUntilMillis =
        bulkResponse.getStatus().getThrottledUntil();
List<ScrollableHitSource.SearchFailure> searchFailures =
        bulkResponse.getSearchFailures();
List<BulkItemResponse.Failure> bulkFailures =
        bulkResponse.getBulkFailures();
```
ES search queries (the key part ★★★★★)

ES syntax

term query: exact matching, only for fields of type keyword

Compare with SQL: SELECT document FROM products WHERE price = 20

The equivalent ES query:

```
GET /my_store/products/_search
{
    "query" : {
        "constant_score" : {   // constant_score turns the term query into a filter
            "filter" : {
                "term" : {
                    "price" : 20
                }
            }
        }
    }
}
```

Response:

```
"hits" : [
    {
        "_index" : "my_store",
        "_type" :  "products",
        "_id" :    "2",
        "_score" : 1.0,
        "_source" : {
          "price" :     20,
          "productID" : "KDKE-B-9947-#kL5"
        }
    }
]
```

This does not work when the field type is text: when a text field is stored, its data is analyzed into terms (there are Chinese analyzers, English analyzers, and so on). For example:

```
GET /my_store/products/_search
{
    "query" : {
        "constant_score" : {   // constant_score turns the term query into a filter
            "filter" : {
                "term" : {
                    "title" : "无法得到"
                }
            }
        }
    }
}
```

With a Chinese analyzer, the title is stored in ES as the terms '无法' and '得到', so a term lookup for the whole phrase '无法得到' cannot find the document.
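The mismatch is easy to reproduce with a toy tokenizer standing in for a real analyzer (a hypothetical sketch; real analyzers do far more). The index holds the individual analyzed terms, never the original phrase, while a term query looks up its input verbatim:

```java
import java.util.*;

// Toy illustration of why a term query fails against an analyzed text field:
// the index contains lowercase tokens, never the original phrase.
class AnalyzedTermDemo {
    // Stand-in for an analyzer: lowercase + whitespace split.
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase().split("\\s+"));
    }

    public static void main(String[] args) {
        Set<String> indexedTerms = new HashSet<>(analyze("Quick Brown Fox"));
        // a term query is an exact lookup against indexed terms; the input is NOT analyzed
        System.out.println(indexedTerms.contains("Quick Brown Fox")); // false: the phrase was never indexed
        System.out.println(indexedTerms.contains("quick"));           // true: a single analyzed term
    }
}
```

This is why term queries belong on keyword fields, while match queries, which analyze their input first, belong on text fields.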

Querying multiple exact values:

```
GET /my_store/products/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "terms" : {
                    "price" : [20, 30]
                }
            }
        }
    }
}
```

Response:

```
"hits" : [
    {
        "_id" :    "2",
        "_score" : 1.0,
        "_source" : {
          "price" :     20,
          "productID" : "KDKE-B-9947-#kL5"
        }
    },
    {
        "_id" :    "3",
        "_score" : 1.0,
        "_source" : {
          "price" :     30,
          "productID" : "JODL-X-1937-#pV7"
        }
    },
    {
        "_id":     "4",
        "_score":  1.0,
        "_source": {
           "price":     30,
           "productID": "QQPX-R-3956-#aD8"
        }
    }
]
```

To analyze a value and see which terms will actually be matched:

```
GET /my_store/_analyze
{
  "field": "productID",
  "text": "XHDK-A-1293-#fJ3"
}
```

Response:

```
{
  "tokens" : [ {
    "token" :        "xhdk",
    "start_offset" : 0,
    "end_offset" :   4,
    "type" :         "<ALPHANUM>",
    "position" :     1
  }, {
    "token" :        "a",
    "start_offset" : 5,
    "end_offset" :   6,
    "type" :         "<ALPHANUM>",
    "position" :     2
  }, {
    "token" :        "1293",
    "start_offset" : 7,
    "end_offset" :   11,
    "type" :         "<NUM>",
    "position" :     3
  }, {
    "token" :        "fj3",
    "start_offset" : 13,
    "end_offset" :   16,
    "type" :         "<ALPHANUM>",
    "position" :     4
  } ]
}
```
Combining filters with bool

A bool filter consists of three sections:
```
{
   "bool" : {
      "must" :     [],  // all of these clauses must match, equivalent to AND
      "should" :   [],  // at least one clause must match, equivalent to OR
      "must_not" : []   // none of these clauses may match, equivalent to NOT
   }
}
```

Hands-on: compare against this SQL statement

```sql
SELECT product
FROM   products
WHERE  (price = 20 OR productID = "XHDK-A-1293-#fJ3")
  AND  (price != 30)
```

The ES query achieving the same effect (note: this uses the old `filtered` query from ES 2.x; in ES 5.0 and later, use a `bool` query with a `filter` clause instead):

```
GET /my_store/products/_search
{
   "query" : {
      "filtered" : {
         "filter" : {
            "bool" : {
              "should" : [
                 { "term" : {"price" : 20}},
                 { "term" : {"productID" : "XHDK-A-1293-#fJ3"}}
              ],
              "must_not" : {
                 "term" : {"price" : 30}
              }
           }
         }
      }
   }
}
```

Response:

```
"hits" : [
    {
        "_id" :     "1",
        "_score" :  1.0,
        "_source" : {
          "price" :     10,
          "productID" : "XHDK-A-1293-#fJ3"
        }
    },
    {
        "_id" :     "2",
        "_score" :  1.0,
        "_source" : {
          "price" :     20,
          "productID" : "KDKE-B-9947-#kL5"
        }
    }
]
```
Nested bool filters

For this SQL statement:

```sql
SELECT document
FROM   products
WHERE  productID      = "KDKE-B-9947-#kL5"
  OR (     productID = "JODL-X-1937-#pV7"
       AND price     = 30 )
```

In ES this becomes a set of nested bool filters:

```
GET /my_store/products/_search
{
   "query" : {
      "filtered" : {
         "filter" : {
            "bool" : {
              "should" : [
                { "term" : {"productID" : "KDKE-B-9947-#kL5"}},
                { "bool" : {
                  "must" : [
                    { "term" : {"productID" : "JODL-X-1937-#pV7"}},
                    { "term" : {"price" : 30}}
                  ]
                }}
              ]
           }
         }
      }
   }
}
```
Range queries

SQL:

```sql
SELECT document
FROM   products
WHERE  price BETWEEN 20 AND 40
```

The ES query:

```
GET /my_store/products/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "range" : {
                    "price" : {
                        "gte" : 20,
                        "lt"  : 40
                    }
                }
            }
        }
    }
}
```
gt:  >  greater than
lt:  <  less than
gte: >= greater than or equal to
lte: <= less than or equal to

Handling null values

Preparing the data

```
POST /my_index/posts/_bulk
{ "index": { "_id": "1"              }}
{ "tags" : ["search"]                }
{ "index": { "_id": "2"              }}
{ "tags" : ["search", "open_source"] }
{ "index": { "_id": "3"              }}
{ "other_field" : "some data"        }
{ "index": { "_id": "4"              }}
{ "tags" : null                      }
{ "index": { "_id": "5"              }}
{ "tags" : ["search", null]          }
```

SQL statement:

SELECT tags
FROM   posts
WHERE  tags IS NOT NULL

The ES equivalent:

GET /my_index/posts/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "exists" : { "field" : "tags" }
            }
        }
    }
}
Full-text search

Preparing the data:

DELETE /my_index

PUT /my_index
{ "settings": { "number_of_shards": 1 }}

POST /my_index/my_type/_bulk
{ "index": { "_id": 1 }}
{ "title": "The quick brown fox" }
{ "index": { "_id": 2 }}
{ "title": "The quick brown fox jumps over the lazy dog" }
{ "index": { "_id": 3 }}
{ "title": "The quick brown fox jumps over the quick dog" }
{ "index": { "_id": 4 }}
{ "title": "Brown fox brown dog" }
Single-term match queries
GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "title": "QUICK!"
        }
    }
}

Elasticsearch executes the match query above in these steps:

1. Check the field type.
   The title field is an analyzed full-text string field, which means the query string itself should be analyzed too.

2. Analyze the query string.
   The query string QUICK! is passed through the standard analyzer, which outputs the single term quick. Because there is only one term, the match query executes a single low-level term query.

3. Find matching documents.
   The term query looks up quick in the inverted index and retrieves the set of documents containing that term — in this example, documents 1, 2, and 3.

4. Score each document.
   The term query computes a relevance score _score for each matching document, combining term frequency (how often quick appears in that document's title field), inverse document frequency (how often quick appears across the title field of all documents), and field length (shorter fields score higher).

Multi-word queries

GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "title": {
                "query":    "BROWN DOG!",
                "operator": "and" // with "and", both BROWN and DOG must match; without it, matching any one term is enough — "and" improves precision
            }
        }
    }
}

Another example:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": {
            "title": {
              "query": "War and Peace",
              "boost": 2
        }}},
        { "match": {
            "author": {
              "query": "Leo Tolstoy",
              "boost": 2
        }}},
        { "bool": {
            "should": [
              { "match": { "translator": "Constance Garnett" }},
              { "match": { "translator": "Louise Maude" }}
            ]
        }}
      ]
    }
  }
}
Combining queries
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must":     { "match": { "title": "quick" }},
      "must_not": { "match": { "title": "lazy"  }},
      "should": [
        { "match": { "title": "brown" }},
        { "match": { "title": "dog"   }}
      ]
    }
  }
}

Going further:

A dis_max ("disjunction max") query scores a document using only the single best-matching clause and ignores the others. The tie_breaker parameter folds the scores of the other matching clauses back into the result.

tie_breaker therefore provides a middle ground between dis_max and bool; its scoring works as follows:

1. Take the _score of the best-matching clause.
2. Multiply the scores of the other matching clauses by tie_breaker.
3. Sum the results and normalize.

The multi_match query

{  "dis_max": {    "queries":  [      {        "match": {          "title": {            "query": "Quick brown fox",            "minimum_should_match": "30%"          }        }      },      {        "match": {          "body": {            "query": "Quick brown fox",            "minimum_should_match": "30%"          }        }      },    ],    "tie_breaker": 0.3  }}

The query above can be rewritten in a more concise form with multi_match:

{    "multi_match": {        "query":                "Quick brown fox",        "type":                 "best_fields", //best_fields 类型是默认值,可以不指定。        "fields":               [ "title", "body" ],        "tie_breaker":          0.3,        "minimum_should_match": "30%" // minimum_should_match 或 operator 这样的参数会被传递到生成的 match 查询中。    }}

Querying multiple fields at once

{  "query": {    "bool": {      "should": [        { "match": { "street":    "Poland Street W1V" }},        { "match": { "city":      "Poland Street W1V" }},        { "match": { "country":   "Poland Street W1V" }},        { "match": { "postcode":  "Poland Street W1V" }}      ]    }  }}

This can be rewritten as:

{  "query": {    "multi_match": {    //可以采用 multi_match 查询,将 type 设置成 most_fields 然后告诉 Elasticsearch 合并所有匹配字段的评分      "query":       "Poland Street W1V",      "type":        "most_fields",      "fields":      [ "street", "city", "country", "postcode" ]    }  }}
Proximity matching

Phrase matching
GET /my_index/my_type/_search
{
    "query": {
        "match_phrase": {
            "title": "quick brown fox"
        }
    }
}

A match_phrase query first parses the query string into a list of terms and then searches for those terms, but it keeps only documents that contain all of the search terms, in the same relative positions.

The slop parameter adds flexibility to phrase matching:

GET /my_index/my_type/_search
{
    "query": {
        "match_phrase": {
            "title": {
                "query": "quick fox",
                "slop":  1
            }
        }
    }
}

The slop parameter tells match_phrase how far apart the terms may be while the document is still treated as a match. "How far apart" means: how many times do you need to move a term to make the query line up with the document?

A document scores higher when quick and fox are close together, and lower when they are farther apart.
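The "how many moves" idea can be sketched for a two-term phrase in plain Java (a deliberate simplification of real phrase matching, which works on full position lists; all names here are made up):

```java
import java.util.List;

public class SlopSketch {
    // For a two-term phrase, the slop needed is how much the second term's
    // actual offset from the first differs from the expected offset of 1.
    static int slopNeeded(List<String> docTerms, String first, String second) {
        int p1 = docTerms.indexOf(first);
        int p2 = docTerms.indexOf(second);
        return Math.abs((p2 - p1) - 1);
    }

    public static void main(String[] args) {
        List<String> doc = List.of("the", "quick", "brown", "fox");
        // "quick brown": adjacent, so slop 0 suffices
        System.out.println(slopNeeded(doc, "quick", "brown"));
        // "quick fox": one term in between, so slop 1 is required
        System.out.println(slopNeeded(doc, "quick", "fox"));
    }
}
```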

Partial matching

At some point everyone implements an inefficient full-text search with a SQL statement like:

WHERE text LIKE "%quick%"
  AND text LIKE "%brown%"
  AND text LIKE "%fox%"
prefix queries

To find all postcodes beginning with W1, use a simple prefix query:

GET /my_index/address/_search
{
    "query": {
        "prefix": {
            "postcode": "W1"
        }
    }
}
Wildcard and regexp queries

This query matches documents containing W1F 7HW and W2F 8HW:

GET /my_index/address/_search
{
    "query": {
        "wildcard": {
            "postcode": "W?F*HW" // ? matches the 1 and the 2; * matches the space plus the 7 and 8
        }
    }
}

Regular expressions work similarly:

GET /my_index/address/_search
{
    "query": {
        "regexp": {
            "postcode": "W[0-9].+" // the term must start with W, followed by one digit 0-9, then one or more further characters
        }
    }
}

Running these queries against fields with many unique terms can consume a lot of resources, so avoid patterns with a leading wildcard (such as *foo, or .*foo as a regex) — the same rule of thumb as for SQL LIKE.

A title field containing "Quick brown fox" produces the terms quick, brown, and fox.

It matches this query:

{ "regexp": { "title": "br.*" }}

but not these two:

{ "regexp": { "title": "Qu.*" }}  //在索引里的词是 quick 而不是 Quick 。{ "regexp": { "title": "quick br*" }}  //quick 和 brown 在词表中是分开的。分词相关
Searching with the Java API

All of these also support asynchronous execution, just like asynchronous indexing; see the official documentation for details.

Search API

SearchRequest searchRequest = new SearchRequest("item_14714");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.termQuery("itemName", "清")); // exact match, like a term query in the DSL
searchSourceBuilder.from(0);
searchSourceBuilder.size(5);
searchSourceBuilder.sort(new ScoreSortBuilder().order(SortOrder.DESC));
searchSourceBuilder.sort(new FieldSortBuilder("id").order(SortOrder.ASC));
searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

// Processing the response
RestStatus status = searchResponse.status();
TimeValue took = searchResponse.getTook();
Boolean terminatedEarly = searchResponse.isTerminatedEarly();
boolean timedOut = searchResponse.isTimedOut();
int totalShards = searchResponse.getTotalShards();
int successfulShards = searchResponse.getSuccessfulShards();
int failedShards = searchResponse.getFailedShards();
for (ShardSearchFailure failure : searchResponse.getShardFailures()) {
    // failures should be handled here
}
SearchHits hits = searchResponse.getHits();
TotalHits totalHits = hits.getTotalHits(); // hit count
// the total number of hits, must be interpreted in the context of totalHits.relation
long numHits = totalHits.value;
// whether the number of hits is accurate (EQUAL_TO) or a lower bound of the total (GREATER_THAN_OR_EQUAL_TO)
TotalHits.Relation relation = totalHits.relation;
float maxScore = hits.getMaxScore();
SearchHit[] searchHits = hits.getHits();
for (SearchHit hit : searchHits) {
    // do something with the SearchHit
    String index = hit.getIndex();
    String id = hit.getId();
    float score = hit.getScore();
    String sourceAsString = hit.getSourceAsString();
    Map<String, Object> sourceAsMap = hit.getSourceAsMap();
    String documentTitle = (String) sourceAsMap.get("title");
    List<Object> users = (List<Object>) sourceAsMap.get("user");
    Map<String, Object> innerObject = (Map<String, Object>) sourceAsMap.get("innerObject");
}

The query condition above can be swapped for other query types:

searchSourceBuilder.query(QueryBuilders.termQuery("itemName", "清"));

For example, replace it with a fuzzy match query:

QueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("user", "kimchy")
        .fuzziness(Fuzziness.AUTO)
        .prefixLength(3)
        .maxExpansions(10);
searchSourceBuilder.query(matchQueryBuilder);

// You can also control which fields the response includes or excludes
String[] includeFields = new String[] {"title", "innerObject.*"};
String[] excludeFields = new String[] {"user"};
searchSourceBuilder.fetchSource(includeFields, excludeFields);

Requesting highlighting

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
HighlightBuilder highlightBuilder = new HighlightBuilder();   // create a highlight builder
HighlightBuilder.Field highlightTitle =
        new HighlightBuilder.Field("itemName");               // the field to highlight
highlightTitle.highlighterType("unified");
highlightBuilder.field(highlightTitle);                       // add the field to the builder
HighlightBuilder.Field highlightUser = new HighlightBuilder.Field("user");
highlightBuilder.field(highlightUser);
searchSourceBuilder.highlighter(highlightBuilder);

Requesting aggregations (native ES aggregations are covered below): a terms aggregation on company name, with a sub-aggregation on the average age of each company's employees:

SearchRequest searchRequest = new SearchRequest("item_14714");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
TermsAggregationBuilder aggregation = AggregationBuilders.terms("by_company")
        .field("company.keyword");   // group by company name; the alias by_company is used to read the result back
aggregation.subAggregation(AggregationBuilders.avg("average_age")
        .field("age"));              // compute the average age
searchSourceBuilder.aggregation(aggregation);
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

// Reading the response
Aggregations aggregations = searchResponse.getAggregations();
Terms byCompanyAggregation = aggregations.get("by_company");          // the by_company aggregation
Bucket elasticBucket = byCompanyAggregation.getBucketByKey("Elastic");
Avg averageAge = elasticBucket.getAggregations().get("average_age");  // the average-age sub-aggregation
double avg = averageAge.getValue();

// With several aggregations, they can also be handled as a list
List<Aggregation> aggregationList = aggregations.asList();
for (Aggregation agg : aggregations) {
    String type = agg.getType();
    if (type.equals(TermsAggregationBuilder.NAME)) {
        Bucket bucket = ((Terms) agg).getBucketByKey("Elastic");
        long numberOfDocs = bucket.getDocCount();
    }
}

Requesting suggestions

SearchRequest searchRequest = new SearchRequest("item_14714");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
SuggestionBuilder termSuggestionBuilder =
        SuggestBuilders.termSuggestion("user").text("kmichy");
SuggestBuilder suggestBuilder = new SuggestBuilder();
suggestBuilder.addSuggestion("suggest_user", termSuggestionBuilder);
searchSourceBuilder.suggest(suggestBuilder);
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

// Reading the suggestions
Suggest suggest = searchResponse.getSuggest();
TermSuggestion termSuggestion = suggest.getSuggestion("suggest_user");
for (TermSuggestion.Entry entry : termSuggestion.getEntries()) {
    for (TermSuggestion.Entry.Option option : entry) {
        String suggestText = option.getText().string();
    }
}

// Retrieving profiling results
Map<String, ProfileShardResult> profilingResults =
        searchResponse.getProfileResults();
for (Map.Entry<String, ProfileShardResult> profilingResult : profilingResults.entrySet()) {
    String key = profilingResult.getKey();   // the key identifies the shard
    ProfileShardResult profileShardResult = profilingResult.getValue();
}
Search Scroll API: scrolling through large result sets

Paging past 10,000 results hurts ES performance badly, so use a scroll cursor to page through large result sets instead:

final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
SearchRequest searchRequest = new SearchRequest("posts");
searchRequest.scroll(scroll);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(matchQuery("title", "Elasticsearch"));
searchRequest.source(searchSourceBuilder);
// initialize the search context by sending the initial SearchRequest
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
String scrollId = searchResponse.getScrollId();
SearchHit[] searchHits = searchResponse.getHits().getHits();
// keep calling the Search Scroll API in a loop until no more documents are returned
while (searchHits != null && searchHits.length > 0) {
    // process the returned search results here
    // create a new SearchScrollRequest holding the last scroll id and the scroll interval
    SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
    scrollRequest.scroll(scroll);
    searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
    scrollId = searchResponse.getScrollId();
    searchHits = searchResponse.getHits().getHits();
}
// once scrolling is done, clear the scroll context
ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
clearScrollRequest.addScrollId(scrollId);
ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
boolean succeeded = clearScrollResponse.isSucceeded();
Multi-Search API: batched searches
MultiSearchRequest request = new MultiSearchRequest();
// first query
SearchRequest firstSearchRequest = new SearchRequest();
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchQuery("user", "kimchy"));
firstSearchRequest.source(searchSourceBuilder);
request.add(firstSearchRequest);
// second query
SearchRequest secondSearchRequest = new SearchRequest();
searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchQuery("user", "luca"));
secondSearchRequest.source(searchSourceBuilder);
request.add(secondSearchRequest);

MultiSearchResponse response = client.msearch(request, RequestOptions.DEFAULT);

// Reading the responses
MultiSearchResponse.Item firstResponse = response.getResponses()[0];   // first search
assertNull(firstResponse.getFailure());                                // getFailure should return null
SearchResponse searchResponse = firstResponse.getResponse();
assertEquals(4, searchResponse.getHits().getTotalHits().value);
MultiSearchResponse.Item secondResponse = response.getResponses()[1];  // second search
assertNull(secondResponse.getFailure());
searchResponse = secondResponse.getResponse();
assertEquals(1, searchResponse.getHits().getTotalHits().value);
Search Template API: templated queries

Showing just the one practical variant:

SearchTemplateRequest request = new SearchTemplateRequest();
request.setRequest(new SearchRequest("posts"));
request.setScriptType(ScriptType.STORED);
request.setScript("title_search");
Map<String, Object> params = new HashMap<>();
params.put("field", "title");
params.put("value", "elasticsearch");
params.put("size", 5);
request.setScriptParams(params);
// enable explain and profile output
request.setExplain(true);
request.setProfile(true);
SearchTemplateResponse response = client.searchTemplate(request, RequestOptions.DEFAULT);

// Reading the response
SearchResponse searchResponse = response.getResponse();
SearchTemplateResponse renderResponse = client.searchTemplate(request, RequestOptions.DEFAULT);
BytesReference source = renderResponse.getSource();
Field Capabilities API: field information across indices
FieldCapabilitiesRequest request = new FieldCapabilitiesRequest()
        .fields("user")
        .indices("posts", "authors", "contributors");
FieldCapabilitiesResponse response = client.fieldCaps(request, RequestOptions.DEFAULT);

// Reading the response
Map<String, FieldCapabilities> userResponse = response.getField("user");
FieldCapabilities textCapabilities = userResponse.get("keyword");
boolean isSearchable = textCapabilities.isSearchable();
boolean isAggregatable = textCapabilities.isAggregatable();
String[] indices = textCapabilities.indices();
String[] nonSearchableIndices = textCapabilities.nonSearchableIndices();
String[] nonAggregatableIndices = textCapabilities.nonAggregatableIndices();
Count API: counting matching documents
CountRequest countRequest = new CountRequest(); // index names go here
// multiple indices are also possible:
// countRequest.indices("blog", "author");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchAllQuery()); // count everything
/*
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.termQuery("user", "kimchy"));
*/
countRequest.source(searchSourceBuilder);
CountResponse countResponse = client.count(countRequest, RequestOptions.DEFAULT);

// Reading the response
long count = countResponse.getCount();
RestStatus status = countResponse.status();
Boolean terminatedEarly = countResponse.isTerminatedEarly();
int totalShards = countResponse.getTotalShards();
int skippedShards = countResponse.getSkippedShards();
int successfulShards = countResponse.getSuccessfulShards();
int failedShards = countResponse.getFailedShards();
for (ShardSearchFailure failure : countResponse.getShardFailures()) {
    // failures should be handled here
}
ES aggregations (key topic ★★★★★)

Concepts

You only need to understand two main concepts:

Buckets

Collections of documents that meet certain criteria.

Metrics

Statistics computed over the documents in a bucket.

Roughly translated into SQL terms:

SELECT COUNT(color)   -- COUNT(color) corresponds to a metric
FROM table
GROUP BY color        -- GROUP BY color corresponds to a bucket

Buckets are conceptually similar to SQL grouping (GROUP BY), while metrics are similar to aggregate functions such as COUNT(), SUM(), and MAX().
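That bucket-plus-metric pairing can be sketched in plain Java with streams (illustrative only, not the ES client; the Car record and its data are made up to mirror the car-sales example below):

```java
import java.util.*;
import java.util.stream.Collectors;

public class BucketMetricSketch {
    record Car(String color, int price) {}

    // terms bucket on color (GROUP BY color) + avg metric on price
    static Map<String, Double> avgPriceByColor(List<Car> cars) {
        return cars.stream()
                .collect(Collectors.groupingBy(Car::color,          // the bucket
                         Collectors.averagingInt(Car::price)));     // the metric
    }

    public static void main(String[] args) {
        List<Car> cars = List.of(
                new Car("red", 10000), new Car("red", 20000),
                new Car("green", 30000), new Car("blue", 15000));
        // red averages (10000 + 20000) / 2 = 15000.0
        System.out.println(avgPriceByColor(cars));
    }
}
```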

Aggregation queries in ES

Preparing the test data:

POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }

Let's build our first aggregation. A car dealer may want to know which color sells best; a terms bucket answers this easily:

# If the search below errors, first enable fielddata on the text field:
PUT cars/_mapping
{
  "properties": {
    "color": {
      "type":      "text",
      "fielddata": true
    }
  }
}

GET /cars/transactions/_search
{
    "size" : 0,                 # we don't care about the hits themselves, so set size to 0 to speed up the query
    "aggs" : {                  # the long form "aggregations" also works
        "popular_colors" : {    # give the aggregation any name we like, here: popular_colors
            "terms" : {         # define a single bucket of type terms
               "field" : "color"
            }
        }
    }
}

# Response:
{
   "hits": {
      "hits": []                # no search hits are returned because size is 0
   },
   "aggregations": {
      "popular_colors": {       # the popular_colors aggregation is returned as part of the aggregations field
         "buckets": [
            {
               "key": "red",    # each bucket's key is one of the unique terms in the color field; doc_count is the number of documents containing that term
               "doc_count": 4   # the number of documents of that color
            },
            {
               "key": "blue",
               "doc_count": 2
            },
            {
               "key": "green",
               "doc_count": 2
            }
         ]
      }
   }
}
Adding a metric

Continuing the car example, let's add an avg metric:

GET /cars/transactions/_search
{
   "size" : 0,
   "aggs": {
      "colors": {
         "terms": {
            "field": "color"
         },
         "aggs": {                // add a new aggs level for the metric
            "avg_price": {
               "avg": {           // define an avg metric on the price field
                  "field": "price"
               }
            }
         }
      }
   }
}

Response:

{
...
   "aggregations": {
      "colors": {
         "buckets": [
            {
               "key": "red",
               "doc_count": 4,
               "avg_price": {
                  "value": 32500
               }
            },
            {
               "key": "blue",
               "doc_count": 2,
               "avg_price": {     // the new avg_price field in the response
                  "value": 20000
               }
            },
            {
               "key": "green",
               "doc_count": 2,
               "avg_price": {
                  "value": 21000
               }
            }
         ]
      }
   }
...
}
Nested buckets

Suppose we want the distribution of car makes for each color:

GET /cars/transactions/_search
{
   "size" : 0,
   "aggs": {
      "colors": {
         "terms": {
            "field": "color"
         },
         "aggs": {
            "avg_price": {
               "avg": {
                  "field": "price"
               }
            },
            "make": {             // a second aggregation named make is added inside the color bucket
               "terms": {
                  "field": "make" // a terms bucket that creates one bucket per car make
               }
            }
         }
      }
   }
}

Response:

{
...
   "aggregations": {
      "colors": {
         "buckets": [
            {
               "key": "red",
               "doc_count": 4,
               "make": {
                  "buckets": [
                     {
                        "key": "honda",
                        "doc_count": 3
                     },
                     {
                        "key": "bmw",
                        "doc_count": 1
                     }
                  ]
               },
               "avg_price": {
                  "value": 32500
               }
            },
...
}

Computing the minimum and maximum price for each car make:

GET /cars/transactions/_search
{
   "size" : 0,
   "aggs": {
      "colors": {
         "terms": {
            "field": "color"
         },
         "aggs": {
            "avg_price": { "avg": { "field": "price" } },
            "make": {
               "terms": {
                  "field": "make"
               },
               "aggs": {
                  "min_price": { "min": { "field": "price" } }, // include a min metric
                  "max_price": { "max": { "field": "price" } }  // and a max metric
               }
            }
         }
      }
   }
}

Response:

{
...
   "aggregations": {
      "colors": {
         "buckets": [
            {
               "key": "red",
               "doc_count": 4,
               "make": {
                  "buckets": [
                     {
                        "key": "honda",
                        "doc_count": 3,
                        "min_price": { "value": 10000 }, // min and max now appear under each make
                        "max_price": { "value": 20000 }
                     },
                     {
                        "key": "bmw",
                        "doc_count": 1,
                        "min_price": { "value": 80000 },
                        "max_price": { "value": 80000 }
                     }
                  ]
               },
               "avg_price": {
                  "value": 32500
               }
            },
...
Bar charts

The histogram aggregation is especially useful. It is essentially a bar chart, and anyone who has built a report or analytics dashboard has seen plenty of those. A histogram needs an interval: to build one over sale price with an interval of 20,000, a new bucket is created for every $20,000 band and each document is sorted into the matching bucket.

GET /cars/transactions/_search
{
   "size" : 0,
   "aggs": {
      "price": {
         "histogram": {        // a histogram bucket needs two parameters: a numeric field and an interval defining the bucket size
            "field": "price",
            "interval": 20000
         },
         "aggs": {
            "revenue": {
               "sum": {        // a sum metric nested in each price band, showing the total revenue per band
                  "field": "price"
               }
            }
         }
      }
   }
}

Response:

{
...
   "aggregations": {
      "price": {
         "buckets": [
            {
               "key": 0,
               "doc_count": 3,
               "revenue": {
                  "value": 37000
               }
            },
            {
               "key": 20000,
               "doc_count": 4,
               "revenue": {
                  "value": 95000
               }
            },
            {
               "key": 80000,
               "doc_count": 1,
               "revenue": {
                  "value": 80000
               }
            }
         ]
      }
   }
}

Plotted, these buckets form the bars of a revenue-per-price-band chart.

Let's chart the 10 most popular makes along with their average price and standard deviation, using a terms bucket and an extended_stats metric:

GET /cars/transactions/_search
{
  "size" : 0,
  "aggs": {
    "makes": {
      "terms": {
        "field": "make",
        "size": 10
      },
      "aggs": {
        "stats": {
          "extended_stats": {
            "field": "price"
          }
        }
      }
    }
  }
}

How many cars were sold each month?

GET /cars/transactions/_search
{
   "size" : 0,
   "aggs": {
      "sales": {
         "date_histogram": {
            "field": "sold",
            "interval": "month",
            "format": "yyyy-MM-dd"
         }
      }
   }
}

Response:

{
   ...
   "aggregations": {
      "sales": {
         "buckets": [
            { "key_as_string": "2014-01-01", "key": 1388534400000, "doc_count": 1 },
            { "key_as_string": "2014-02-01", "key": 1391212800000, "doc_count": 1 },
            { "key_as_string": "2014-05-01", "key": 1398902400000, "doc_count": 1 },
            { "key_as_string": "2014-07-01", "key": 1404172800000, "doc_count": 1 },
            { "key_as_string": "2014-08-01", "key": 1406851200000, "doc_count": 1 },
            { "key_as_string": "2014-10-01", "key": 1412121600000, "doc_count": 1 },
            { "key_as_string": "2014-11-01", "key": 1414800000000, "doc_count": 2 }
         ]
...
}
Filtering and aggregation scope

Filtering the query

To find all cars priced over $10,000 and also compute their average price, simply use a constant_score query with a filter clause:

GET /cars/transactions/_search
{
    "size" : 0,
    "query" : {
        "constant_score": {
            "filter": {
                "range": {
                    "price": {
                        "gte": 10000
                    }
                }
            }
        }
    },
    "aggs" : {
        "single_avg_price": {
            "avg" : { "field" : "price" }
        }
    }
}
Filter buckets

Suppose we're building a search page for the car dealer. We want to show the user's search results, but also enrich the page with extra information such as the average sale price of matching cars sold in the last month.

GET /cars/transactions/_search
{
   "size" : 0,
   "query": {
      "match": {
         "make": "ford"
      }
   },
   "aggs": {
      "recent_sales": {
         "filter": {          // a filter bucket applies a filter on top of the query scope
            "range": {
               "sold": {
                  "from": "now-1M"
               }
            }
         },
         "aggs": {
            "average_price": {
               "avg": {
                  "field": "price"  // the avg metric only averages documents that match ford AND were sold in the last month
               }
            }
         }
      }
   }
}
Post filter

For another search page, we want users to search for cars and then narrow the results by color:

GET /cars/transactions/_search
{
    "size" : 0,
    "query": {
        "match": {
            "make": "ford"
        }
    },
    "post_filter": {    // post_filter is a top-level element and filters only the hits, not the aggregations
        "term" : {
            "color" : "green"
        }
    },
    "aggs" : {
        "all_colors": {
            "terms" : { "field" : "color" }
        }
    }
}
Sorting aggregations

Intrinsic sorts

These sort modes are intrinsic to the bucket: they operate on data the bucket generates, such as doc_count. They share the same syntax but differ slightly depending on the bucket being used.

A terms aggregation sorted by ascending doc_count:

GET /cars/transactions/_search
{
    "size" : 0,
    "aggs" : {
        "colors" : {
            "terms" : {
              "field" : "color",
              "order": {
                "_count" : "asc"  // the _count keyword sorts by doc_count, ascending
              }
            }
        }
    }
}
_count

Sort by document count. Works on terms, histogram, and date_histogram.

_term

Sort alphabetically by a term's string value. Only usable within terms.

_key

Sort by each bucket's numeric key (conceptually similar to _term). Only usable within histogram and date_histogram.

Sorting by a metric

We may want a bar chart of sales per car color, sorted by ascending average price.

Add a metric, then reference it from the order parameter:

GET /cars/transactions/_search
{
    "size" : 0,
    "aggs" : {
        "colors" : {
            "terms" : {
              "field" : "color",
              "order": {
                "avg_price" : "asc"  // buckets are sorted by ascending average price
              }
            },
            "aggs": {
                "avg_price": {
                    "avg": {"field": "price"}  // compute each bucket's average sale price
                }
            }
        }
    }
}

We can sort this way on any metric simply by referencing its name. Some metrics, however, emit multiple values; extended_stats is a good example, as it outputs several metric values at once.

To sort on a multi-value metric, use a dotted path to the value we care about:

GET /cars/transactions/_search
{
    "size" : 0,
    "aggs" : {
        "colors" : {
            "terms" : {
              "field" : "color",
              "order": {
                "stats.variance" : "asc"  // use dot notation to pick the metric value to sort by
              }
            },
            "aggs": {
                "stats": {
                    "extended_stats": {"field": "price"}
                }
            }
        }
    }
}
Counting distinct values

The SQL form is familiar:

SELECT COUNT(DISTINCT color)
FROM cars

In ES, the cardinality metric determines how many distinct colors the dealer has sold:

GET /cars/transactions/_search
{
    "size" : 0,
    "aggs" : {
        "distinct_colors" : {
            "cardinality" : {  // cardinality counts distinct values
              "field" : "color"
            }
        }
    }
}
