
Note: All posts take a practical approach and avoid lengthy theory. Everything has been tested on development servers. Please don’t try any post on production servers until you are sure.

Thursday, April 19, 2018

Working with Elasticsearch


Introduction


Elasticsearch is a distributed, scalable, real-time search and analytics engine built on top of Apache Lucene™. Lucene (a library) is arguably the most advanced, high-performance, and fully featured search engine library in existence today, open source or proprietary. Elasticsearch enables you to search, analyze, and explore your data, whether you need full-text search, real-time analytics of structured data, or a combination of the two.



Elasticsearch is much more than just Lucene and much more than “just” full-text search. It can also be described as follows:

  • A distributed real-time document store where every field is indexed and searchable
  • A distributed search engine with real-time analytics
  • Capable of scaling to hundreds of servers and petabytes of structured and unstructured data

It packages up all this functionality into a standalone server that your application can talk to via a simple RESTful API, using a web client from your favorite programming language, or even from the command line.


Installing and Running


1- The only requirement for installing Elasticsearch is a recent version of Java. You can get the latest Elasticsearch archive from elastic.co/downloads/elasticsearch. Once you’ve extracted the archive file, Elasticsearch is ready to run.

[hdpsysuser@hdpmaster elk]$ java -version
[hdpsysuser@hdpmaster elk]$ tar -xvf elasticsearch-6.2.3.tar.gz
[hdpsysuser@hdpmaster elk]$ tar -xvf kibana-6.2.3-linux-x86_64.tar.gz
[hdpsysuser@hdpmaster elk]$ tar -xvf logstash-6.2.3.tar.gz

export ES_HOME=/usr/hadoopsw/elk/elasticsearch-6.2.3
export PATH=$PATH:$ES_HOME/bin
export KIBANA_HOME=/usr/hadoopsw/elk/kibana-6.2.3-linux-x86_64
export PATH=$PATH:$KIBANA_HOME/bin


2- Go to the Elasticsearch home directory and into the bin folder. The default port for the Elasticsearch HTTP interface is 9200. You can change it by setting http.port in the elasticsearch.yml file, which lives in the config directory.


[hdpsysuser@hdpmaster elasticsearch-6.2.3]$ cd /usr/hadoopsw/elk/elasticsearch-6.2.3/bin


Config location: /usr/hadoopsw/elk/elasticsearch-6.2.3/config/elasticsearch.yml
network.host: 0.0.0.0
http.port: 9200

To listen on all interfaces, put the following in config/elasticsearch.yml; do the same for Kibana if you have it.
network.host: 0.0.0.0

Elasticsearch loads its configuration from the $ES_HOME/config/elasticsearch.yml file by default. Any settings that can be specified in the config file can also be specified on the command line, using the -E syntax as follows:

./bin/elasticsearch -d -Ecluster.name=my_cluster -Enode.name=node_1


3- Start it up in the foreground; add -d if you want to run it in the background as a daemon. When Elasticsearch is running in the foreground, you can stop it by pressing Ctrl-C.
./bin/elasticsearch

-- To run Elasticsearch as a daemon

./bin/elasticsearch -d -p pid
Log messages can be found in the $ES_HOME/logs/ directory.

To shut down Elasticsearch, kill the process ID recorded in the pid file:

kill `cat pid`


On Linux, Elasticsearch uses a lot of file descriptors (file handles). Running out of file descriptors can be disastrous and will most probably lead to data loss. Make sure to increase the limit on the number of open file descriptors for the user running Elasticsearch to 65,536 or higher, by setting nofile to 65536 in /etc/security/limits.conf.




[hdpsysuser@hdpmaster ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 15673
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


vi /etc/security/limits.conf
*       hard nofile 65536
*       soft nofile 65536


You may also get the following error while starting Elasticsearch:

max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

[hdpsysuser@hdpmaster ~]$ sudo sysctl -w vm.max_map_count=262144

vm.max_map_count = 262144

The vm.max_map_count setting should be set permanently in /etc/sysctl.conf:

vi /etc/sysctl.conf

vm.max_map_count=262144

4- You can check whether the server is up and running by browsing to http://localhost:9200. It returns a JSON object containing information about the installed Elasticsearch:

[hdpsysuser@hdpmaster ~]$ curl http://localhost:9200

{
  "name" : "w_ykJGL",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "bie9YKCvRlWqetugFWPawg",
  "version" : {
    "number" : "6.2.3",
    "build_hash" : "c59ff00",
    "build_date" : "2018-03-13T10:06:29.741383Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}


Talking to Elasticsearch

You can talk to Elasticsearch using the Java API or the RESTful API. For Java, Elasticsearch comes with two built-in clients (the Node and Transport clients) that you can use in your code over port 9300. All other languages can communicate with Elasticsearch over port 9200 using the RESTful API, accessible with your favorite web client. Elasticsearch provides official clients for several languages: Groovy, JavaScript, .NET, PHP, Perl, Python, and Ruby.


A request to Elasticsearch consists of the same parts as any HTTP request:
 curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'


The parts marked with < > above are:

VERB
The appropriate HTTP method or verb: GET, POST, PUT, HEAD, or DELETE.
PROTOCOL
Either http or https (if you have an https proxy in front of Elasticsearch.)
HOST
The hostname of any node in your Elasticsearch cluster, or localhost for a node on your local machine.
PORT
The port running the Elasticsearch HTTP service, which defaults to 9200.
PATH
The API endpoint (for example, _count returns the number of documents in the cluster). The path may contain multiple components, such as _cluster/stats or _nodes/stats/jvm.
QUERY_STRING
Any optional query-string parameters (for example, ?pretty will pretty-print the JSON response to make it easier to read).
BODY
A JSON-encoded request body (if the request needs one.)
For instance, to count the number of documents in the cluster, we could use this:

curl -XGET 'http://localhost:9200/_count?pretty' -H 'Content-Type: application/json' -d '
{
    "query": {
        "match_all": {}
    }
}
'
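Those request parts can be assembled programmatically from any language; a minimal Python sketch (the helper name is made up for illustration):

```python
# Assemble an Elasticsearch request URL from its parts.
# es_url is a hypothetical helper, not part of any Elasticsearch client.
def es_url(host, port, path, query_string=None, protocol="http"):
    url = f"{protocol}://{host}:{port}/{path}"
    if query_string:
        url += f"?{query_string}"
    return url

# The _count request from the example above:
print(es_url("localhost", 9200, "_count", "pretty"))
# http://localhost:9200/_count?pretty
```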

Elasticsearch returns an HTTP status code like 200 OK and (except for HEAD requests) a JSON-encoded response body. The preceding curl request would respond with a JSON body like the following:

{
    "count" : 0,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    }
}


We don’t see the HTTP headers in the response because we didn’t ask curl to display them. To see the headers, use the curl command with the -i switch:


[hdpsysuser@hdpmaster bin]$ curl -i -XGET 'localhost:9200/'

Nature of Elasticsearch


Elasticsearch is document oriented, meaning that it stores entire objects or documents. It not only stores them, but also indexes the contents of each document in order to make them searchable. In Elasticsearch, you index, search, sort, and filter documents—not rows of columnar data. This is a fundamentally different way of thinking about data and is one of the reasons Elasticsearch can perform complex full-text search.


Elasticsearch uses JavaScript Object Notation, or JSON, as the serialization format for documents. JSON serialization is supported by most programming languages, and has become the standard format used by the NoSQL movement. It is simple, concise, and easy to read.
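For example, a document is just a JSON object, and most languages map onto it directly; in Python, with the stdlib json module:

```python
import json

# A document is a JSON object; a Python dict serializes to it directly.
customer = {"name": "Inam Bukhari", "age": 30}
body = json.dumps(customer)

print(body)                      # {"name": "Inam Bukhari", "age": 30}
print(json.loads(body)["name"])  # Inam Bukhari
```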


Walk Through Elasticsearch


To have a feel for what is possible in Elasticsearch and how easy it is to use, let’s start by walking through simple statements that cover basic concepts such as indexing, search, and aggregations.


Indexing

The act of storing data in Elasticsearch is called indexing, but before we can index a document, we need to decide where to store it. An Elasticsearch cluster can contain multiple indices, which in turn contain multiple types. These types hold multiple documents, and each document has multiple fields.

An index is like a database in a traditional relational database system. Indexing is much like the INSERT statement in SQL, except that, if the document already exists, the new document replaces the old one.

Relational databases add an index, such as a B-tree index, to specific columns in order to improve the speed of data retrieval. Elasticsearch and Lucene use a structure called an inverted index for exactly the same purpose.

By default, every field in a document is indexed (has an inverted index) and thus is searchable. A field without an inverted index is not searchable.
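The inverted-index idea can be sketched in a few lines of Python: map each term to the IDs of the documents containing it (a toy illustration only, not how Lucene actually stores its index):

```python
from collections import defaultdict

# Toy document store: id -> text of one field.
docs = {
    1: "mill lane",
    2: "mill street",
    3: "main street",
}

# Inverted index: term -> set of document IDs containing that term.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

print(sorted(index["mill"]))    # [1, 2]
print(sorted(index["street"]))  # [2, 3]
```

A term lookup is then a dictionary access, which is why searching an indexed field is fast regardless of how many documents exist.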

Create Index

[hdpsysuser@hdpmaster bin]$ curl -X PUT http://localhost:9200/elktest
{"acknowledged":true,"shards_acknowledged":true,"index":"elktest"}
Using the index

Insert a “Hello, world” test document to verify that your new index is available:

curl --header "Content-Type: application/json" -XPOST http://localhost:9200/elktest/test/hello -d '{"title":"Hello world"}' 

[hdpsysuser@hdpmaster bin]$ curl --header "Content-Type: application/json" -XPOST http://localhost:9200/elktest/test/hello -d '{"title":"Hello world"}'

{"_index":"elktest","_type":"test","_id":"hello","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}


Retrieve document

You can view this document by issuing a GET request to the document’s endpoint:

curl -XGET 'http://localhost:9200/elktest/test/hello'

[hdpsysuser@hdpmaster bin]$ curl -XGET 'http://localhost:9200/elktest/test/hello'

{"_index":"elktest","_type":"test","_id":"hello","_version":1,"found":true,"_source":{"title":"Hello world"}}
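The response body is plain JSON, so any client can parse it; for example, in Python, using the response shown above:

```python
import json

# The GET response body, copied verbatim from the transcript above.
response = ('{"_index":"elktest","_type":"test","_id":"hello","_version":1,'
            '"found":true,"_source":{"title":"Hello world"}}')

doc = json.loads(response)
if doc["found"]:
    # The original document is stored under _source.
    print(doc["_source"]["title"])  # Hello world
```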


Basic health check

[hdpsysuser@hdpmaster bin]$ curl -XGET 'localhost:9200/_cat/health?v&pretty'

epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1523186823 11:27:03  elasticsearch yellow          1         1      5   5    0    0        5             0                  -                 50.0%

List of nodes in cluster 

[hdpsysuser@hdpmaster bin]$ curl -XGET 'localhost:9200/_cat/nodes?v&pretty'

ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
127.0.0.1           15          67   5    0.00    0.01     0.05 mdi       *      w_ykJGL


List of indices

[hdpsysuser@hdpmaster bin]$ curl -XGET 'localhost:9200/_cat/indices?v&pretty'

health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   elktest PAqzGiWRTeeSea8qAvAN3A   5   1          1            0      4.4kb          4.4kb


Create an index named "customer" and then list all the indices again:

[hdpsysuser@hdpmaster bin]$ curl -XPUT 'localhost:9200/customer?pretty&pretty'
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "customer"
}

[hdpsysuser@hdpmaster bin]$ curl -XGET 'localhost:9200/_cat/indices?v&pretty'

health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   elktest  PAqzGiWRTeeSea8qAvAN3A   5   1          1            0      4.4kb          4.4kb
yellow open   customer f-9cBDSlQ-aNmAWhHhj_rA   5   1          0            0       230b           230b


Now put something into our customer index by indexing a simple customer document:



curl -XPUT 'localhost:9200/customer/_doc/1?pretty&pretty' -H 'Content-Type: application/json' -d'

{
  "name": "Inam Bukhari"
}
'

[hdpsysuser@hdpmaster bin]$ curl -XPUT 'localhost:9200/customer/_doc/1?pretty&pretty' -H 'Content-Type: application/json' -d'
> {
>   "name": "Inam Bukhari"
> }
> '
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

Retrieve document

Retrieve the document that we just indexed:

curl -XGET 'localhost:9200/customer/_doc/1?pretty&pretty'
[hdpsysuser@hdpmaster bin]$ curl -XGET 'localhost:9200/customer/_doc/1?pretty&pretty'
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "Inam Bukhari"
  }
}

Delete the index

curl -XDELETE 'localhost:9200/elktest?pretty&pretty'

[hdpsysuser@hdpmaster bin]$ curl -XDELETE 'localhost:9200/elktest?pretty&pretty'

{
  "acknowledged" : true
}

[hdpsysuser@hdpmaster bin]$ curl -XGET 'localhost:9200/_cat/indices?v&pretty'
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer f-9cBDSlQ-aNmAWhHhj_rA   5   1          1            0      4.3kb          4.3kb

Replace the existing doc

Replace the existing doc with a new one using the same ID (e.g. 1), and note the version info:

curl -XPUT 'localhost:9200/customer/_doc/1?pretty&pretty' -H 'Content-Type: application/json' -d'
{
  "name": "Inam Bukhari"
}
'
[hdpsysuser@hdpmaster bin]$ curl -XPUT 'localhost:9200/customer/_doc/1?pretty&pretty' -H 'Content-Type: application/json' -d'
> {
>   "name": "Inam Bukhari"
> }
> '
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}


Update documents

curl -XPOST 'localhost:9200/customer/_doc/1/_update?pretty&pretty' -H 'Content-Type: application/json' -d'
{
  "doc": { "name": "Inaam Bukhary" }
}
'

[hdpsysuser@hdpmaster bin]$ curl -XPOST 'localhost:9200/customer/_doc/1/_update?pretty&pretty' -H 'Content-Type: application/json' -d'
> {
>   "doc": { "name": "Inaam Bukhary" }
> }
> '
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}

curl -XPOST 'localhost:9200/customer/_doc/1/_update?pretty&pretty' -H 'Content-Type: application/json' -d'
{
  "doc": { "name": "Inaam Bukhary", "age": 30 }
}
'
[hdpsysuser@hdpmaster bin]$ curl -XPOST 'localhost:9200/customer/_doc/1/_update?pretty&pretty' -H 'Content-Type: application/json' -d'
> {
>   "doc": { "name": "Inaam Bukhary", "age": 30 }
> }
> '
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 4,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 3,
  "_primary_term" : 1
}

Update by scripts

Updates can also be performed by using simple scripts. 
curl -XPOST 'localhost:9200/customer/_doc/1/_update?pretty&pretty' -H 'Content-Type: application/json' -d'
{
  "script" : "ctx._source.age += 5"
}
'
[hdpsysuser@hdpmaster bin]$ curl -XPOST 'localhost:9200/customer/_doc/1/_update?pretty&pretty' -H 'Content-Type: application/json' -d'
> {
>   "script" : "ctx._source.age += 5"
> }
> '
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 5,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 4,
  "_primary_term" : 1
}
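Conceptually, the script receives the document’s current source as ctx._source and mutates it in place; a Python sketch of what this particular update does (illustration only — the real script is Painless and runs server-side):

```python
# Current source of customer/_doc/1, as updated in the steps above.
source = {"name": "Inaam Bukhary", "age": 30}

def apply_script(src):
    # Equivalent of the Painless script: ctx._source.age += 5
    src["age"] += 5
    return src

print(apply_script(source))  # {'name': 'Inaam Bukhary', 'age': 35}
```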

Deleting a document 

[hdpsysuser@hdpmaster bin]$ curl -XDELETE 'localhost:9200/customer/_doc/1?pretty&pretty'
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 6,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 5,
  "_primary_term" : 1
}

Delete by Query

POST logstash-2018.04.23/_delete_by_query
{
  "query": { 
    "match": {
      "message": "HDFS"
    }
  }

}


POST logstash-2018.04.23/_delete_by_query
{
  "query": { 
    "match_all": {}
  }

}



POST logstash-2018.04.23/_delete_by_query
{
  "query": { 
    "match": {"@timestamp":"2018-04-23"}
  }

}

Batch Operations

Elasticsearch also provides the ability to perform any of the above operations in batches using the _bulk API. 
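The _bulk body is newline-delimited JSON (NDJSON): each action line ({"index": ...}, {"update": ...}, {"delete": ...}) is followed by a source line where the action needs one, and the whole body must end with a newline. A minimal Python sketch of building such a body (the helper is hypothetical, for illustration):

```python
import json

# Build an NDJSON _bulk body from (action, source) pairs.
# Pass source=None for actions (like delete) that take no source line.
def bulk_body(actions):
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        if source is not None:
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline

body = bulk_body([
    ({"index": {"_id": "1"}}, {"name": "Inam Bukhari"}),
    ({"index": {"_id": "2"}}, {"name": "Abuzar Bukhari"}),
])
print(body)
```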

curl -XPOST 'localhost:9200/customer/_doc/_bulk?pretty&pretty' -H 'Content-Type: application/json' -d'
{"index":{"_id":"1"}}
{"name": "Inam Bukhari" }
{"index":{"_id":"2"}}
{"name": "Abuzar Bukhari" }
'
[hdpsysuser@hdpmaster bin]$ curl -XPOST 'localhost:9200/customer/_doc/_bulk?pretty&pretty' -H 'Content-Type: application/json' -d'
> {"index":{"_id":"1"}}
> {"name": "Inam Bukhari" }
> {"index":{"_id":"2"}}
> {"name": "Abuzar Bukhari" }
> '
{
  "took" : 41,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "customer",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 6,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "customer",
        "_type" : "_doc",
        "_id" : "2",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}

curl -XPOST 'localhost:9200/customer/_doc/_bulk?pretty&pretty' -H 'Content-Type: application/json' -d'
{"update":{"_id":"1"}}
{"doc": { "name": "Inaam Bukhary" } }
{"delete":{"_id":"2"}}
'

[hdpsysuser@hdpmaster bin]$ curl -XPOST 'localhost:9200/customer/_doc/_bulk?pretty&pretty' -H 'Content-Type: application/json' -d'
> {"update":{"_id":"1"}}
> {"doc": { "name": "Inaam Bukhary" } }
> {"delete":{"_id":"2"}}
> '
{
  "took" : 45,
  "errors" : false,
  "items" : [
    {
      "update" : {
        "_index" : "customer",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 2,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 7,
        "_primary_term" : 1,
        "status" : 200
      }
    },
    {
      "delete" : {
        "_index" : "customer",
        "_type" : "_doc",
        "_id" : "2",
        "_version" : 2,
        "result" : "deleted",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 2,
        "_primary_term" : 1,
        "status" : 200
      }
    }
  ]
}

Exploring Data

Loading Dataset

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@/data/accounts.json"


[hdpsysuser@hdpmaster bin]$ curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@/data/accounts.json"

[hdpsysuser@hdpmaster bin]$ curl "localhost:9200/_cat/indices?v"
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer f-9cBDSlQ-aNmAWhHhj_rA   5   1          1            0      4.4kb          4.4kb
yellow open   bank     ymIPSIiDS9i_iiQeM6mD1w   5   1       1000            0    498.3kb        498.3kb



Simple searches



[hdpsysuser@hdpmaster bin]$ curl -XGET 'localhost:9200/bank/_search?q=*&sort=account_number:asc&pretty&pretty'


[hdpsysuser@hdpmaster bin]$ curl -XGET 'localhost:9200/bank/_search?q=_id:9&sort=account_number:asc&pretty&pretty'

-- same exact search above using the alternative request body method

curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}
'


Query Language



We can also pass other parameters to influence the search results. In the example in the section above we passed in sort; here we pass in size (like LIMIT in SQL):


curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "size": 1
}
'
--  documents 10 through 19:
curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 10
}
'
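Together, from and size implement offset pagination; a tiny Python sketch of the page-to-offset mapping (hypothetical helper, for illustration):

```python
# Map a 1-based page number onto Elasticsearch from/size parameters.
def page_params(page, size=10):
    return {"from": (page - 1) * size, "size": size}

print(page_params(2))  # {'from': 10, 'size': 10} -> documents 10 through 19
```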


-- match_all, sorting the results by account balance in descending order; returns the top 10 (default size) documents.

curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'

{

  "query": { "match_all": {} },
  "sort": { "balance": { "order": "desc" } }
}
'


Executing Searches



-- return two fields, account_number and balance (inside of _source), from the search:


curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"]
}
'

-- basic fielded search query 

curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'

{

  "query": { "match": { "account_number": 20 } }

}
'


-- returns all accounts containing the term "mill" in the address



curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'

{
  "query": { "match": { "address": "mill" } }
}
'


-- returns all accounts containing the term "mill" or "lane" in the address:

curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'

{

  "query": { "match": { "address": "mill lane" } }
}
'


-- a variant of match (match_phrase) that returns all accounts containing the phrase "mill lane" in the address:



curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'

{
  "query": { "match_phrase": { "address": "mill lane" } }
}
'


-- The bool query allows us to compose smaller queries into bigger queries using boolean logic.

--  returns all accounts containing "mill" and "lane" in the address:

-- bool must clause specifies all the queries that must be true for a document to be considered a match.


curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
'

-- the bool should clause specifies a list of queries either of which must be true for a document to be considered a match.



curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'

{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
'

-- the bool must_not clause specifies a list of queries none of which must be true for a document to be considered a match.



curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'

{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
'


-- We can combine must, should, and must_not clauses simultaneously inside a bool query. 

-- below returns all accounts of anybody who is 40 years old but doesn’t live in ID:

curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'

{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
'
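Since query bodies are plain JSON, bool queries can also be built programmatically; a small Python sketch producing the same structure as the curl examples above (the helper name is made up):

```python
# Build a bool query body from optional must/should/must_not clause lists.
def bool_query(must=None, should=None, must_not=None):
    clauses = {}
    if must:
        clauses["must"] = must
    if should:
        clauses["should"] = should
    if must_not:
        clauses["must_not"] = must_not
    return {"query": {"bool": clauses}}

# Accounts of anybody who is 40 years old but doesn't live in ID:
q = bool_query(
    must=[{"match": {"age": "40"}}],
    must_not=[{"match": {"state": "ID"}}],
)
print(q["query"]["bool"]["must_not"])  # [{'match': {'state': 'ID'}}]
```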


Executing Filters



--return all accounts with balances between 20000 and 30000, inclusive.


curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
'


Executing Aggregations

-- groups all the accounts by state, and then returns the top 10 (default) states sorted by count descending (also the default):


curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}
'


-- calculates the average account balance by state (again only for the top 10 states sorted by count in descending order):



curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'

{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
'
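To see what these two aggregations compute, here is an in-memory Python sketch over a few made-up accounts (the real aggregations run server-side across the whole index):

```python
from collections import Counter, defaultdict

# Made-up sample accounts, for illustration only.
accounts = [
    {"state": "TX", "balance": 10000},
    {"state": "TX", "balance": 30000},
    {"state": "MD", "balance": 20000},
]

# group_by_state (terms aggregation): bucket counts per state.
counts = Counter(a["state"] for a in accounts)

# average_balance (avg sub-aggregation): mean balance per bucket.
balances = defaultdict(list)
for a in accounts:
    balances[a["state"]].append(a["balance"])
averages = {s: sum(v) / len(v) for s, v in balances.items()}

print(counts.most_common())  # [('TX', 2), ('MD', 1)]
print(averages["TX"])        # 20000.0
```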








