Sunday, February 28, 2016

Elasticsearch with Python: document APIs

What is a document in ES: "In Elasticsearch, the term document has a specific meaning. It refers to the top-level, or root object that is serialized into JSON and stored in Elasticsearch under a unique ID."

Compared with a traditional SQL database, a document in ES is roughly what a row is in SQL: it usually represents one object stored in an ES index.
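
To make the row analogy concrete, here is a minimal sketch in plain Python (no ES involved). The column names and values are made up for illustration; the point is only that a row becomes one JSON object plus a unique ID.

```python
import json

# Hypothetical SQL-style row and its column names (illustration only).
columns = ("id", "name", "age")
row = ("42", "Alice", 31)

# The primary key plays the role of the document ID ...
doc_id = row[0]
# ... and the remaining columns become the fields of the JSON document.
document = dict(zip(columns[1:], row[1:]))

print(doc_id, json.dumps(document, sort_keys=True))
```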

There are quite a few handy commands in elasticsearch-py for the following operations:

Insert/update single document into an index
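
A minimal sketch of this operation, assuming an elasticsearch-py release that matches ES 1.x (where "doc_type" is still accepted) and an "es" client created as in the introduction post. The function name and the index, type, and field names in the usage line are my own placeholders.

```python
def index_document(es, index, doc_type, doc_id, body):
    """Insert the document, or overwrite it if the id already exists."""
    # es.index uses op_type "index" by default: create-or-replace semantics.
    return es.index(index=index, doc_type=doc_type, id=doc_id, body=body)
```

With a live client this would be called as index_document(es, "my-index", "post", 1, {"title": "hello"}); calling it again with the same id and a different body replaces the stored document.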


When indexing, one can also specify op_type="create". The only difference is that if the "id" already exists in the index, the operation with "create" returns an error, while the default ("index") simply overwrites the existing document.
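
A sketch of the "create" variant, under the same assumptions (ES 1.x era elasticsearch-py, an existing "es" client, placeholder names). As far as I know, the client surfaces the conflict as an exception (ConflictError in the releases I have used), but treat the exact exception class as an assumption.

```python
def create_document(es, index, doc_type, doc_id, body):
    """Insert the document only if the id is not taken yet."""
    # op_type="create" asks ES to reject the request when the id already
    # exists, instead of silently overwriting the stored document.
    return es.index(index=index, doc_type=doc_type, id=doc_id,
                    body=body, op_type="create")
```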


Delete single document from an index
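
A sketch of a single-document delete, again assuming the ES 1.x style signature with "doc_type" and an existing "es" client. The "ignore_missing" flag is my own addition; it relies on the per-request "ignore" option of elasticsearch-py, which suppresses the exception for the listed HTTP status codes.

```python
def delete_document(es, index, doc_type, doc_id, ignore_missing=False):
    """Delete one document identified by index, type and id."""
    kwargs = {"index": index, "doc_type": doc_type, "id": doc_id}
    if ignore_missing:
        # Do not raise when the document is already gone (HTTP 404).
        kwargs["ignore"] = 404
    return es.delete(**kwargs)
```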



Insert/update/delete multiple document(s) into an index

This is more involved than a single-document operation. It requires importing the "helpers" module and passing the "es" client and a list of "actions" to one of its bulk functions, such as helpers.bulk.
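
The list of "actions" is just a list of dicts, so it can be sketched without a running server. The builder below is a hypothetical helper of mine; the keys ("_op_type", "_index", "_type", "_id", "_source", "doc") follow the action format that helpers.bulk accepts in the ES 1.x era client, to the best of my knowledge.

```python
def build_bulk_actions(index, doc_type, to_index, to_update, to_delete):
    """Build a list of actions mixing index, update and delete operations.

    to_index:  {doc_id: full document}
    to_update: {doc_id: partial document with the fields to change}
    to_delete: iterable of doc ids
    """
    actions = []
    for doc_id, source in to_index.items():
        actions.append({"_op_type": "index", "_index": index,
                        "_type": doc_type, "_id": doc_id, "_source": source})
    for doc_id, partial in to_update.items():
        # Partial updates carry the changed fields under "doc".
        actions.append({"_op_type": "update", "_index": index,
                        "_type": doc_type, "_id": doc_id, "doc": partial})
    for doc_id in to_delete:
        actions.append({"_op_type": "delete", "_index": index,
                        "_type": doc_type, "_id": doc_id})
    return actions

# With a live client the list would then be sent in one request:
#   from elasticsearch import helpers
#   helpers.bulk(es, actions)
```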


Next, we may want to access individual documents. Accessing a document is not quite the same as running a search, because one needs to know its index, type, and id. So it makes sense to cover it here before going deep into search.

Access documents from an index
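
A sketch of fetching one document and several documents by id, assuming the ES 1.x style API (with "doc_type") and an existing "es" client. The function names are mine; es.get and es.mget are the client calls, and the response fields used ("_source", "docs", "found", "_id") are the ones ES returns for these requests.

```python
def get_document(es, index, doc_type, doc_id):
    """Return the stored JSON body of a single document."""
    hit = es.get(index=index, doc_type=doc_type, id=doc_id)
    return hit["_source"]


def get_documents(es, index, doc_type, doc_ids):
    """Return {id: body} for every requested id that exists."""
    resp = es.mget(index=index, doc_type=doc_type,
                   body={"ids": list(doc_ids)})
    # Missing ids come back with found == False and no "_source".
    return {d["_id"]: d["_source"] for d in resp["docs"] if d.get("found")}
```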







Elasticsearch with Python: introduction

Elasticsearch (ES) is a search server based on Lucene. It is widely used in different products to provide near real-time search capability. For a Python user focusing on analytics, integrating ES with an existing Python workflow is both helpful and convenient. Recently, I found the following two official ES clients quite useful:
1. Elasticsearch-py: a low-level client for ES with the broadest API coverage.
2. Elasticsearch-dsl: a high-level client to write and manipulate ES queries.

Preparation

Step 1: Install ES on your local machine (version used here: 1.7.5). After downloading the package, unzip it and run "./bin/elasticsearch" to start a new ES server. By default, the server is reachable at "localhost:9200".
Step 2: Install the Python analytics packages: IPython Notebook, plus the two clients via pip (on PyPI the packages are named "elasticsearch" and "elasticsearch-dsl"). Then open the IPython Notebook (now called Jupyter Notebook).
Step 3: Set up the connection between the notebook and your local server.
from elasticsearch import Elasticsearch
es = Elasticsearch('localhost:9200')
That's it! Now the "es" variable refers to a client connected to the Elasticsearch instance you just started.

Explore ES Client

The most used attribute of the client (besides es itself) is es.indices, since it groups all index-level operations.

Index Operation

Index operations include create, display, and delete, all available through es.indices.
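
A sketch of these three operations, assuming an existing "es" client. The wrapper names are mine, and I am reading "display" as "check that the index exists and show its settings", which is only one possible interpretation; es.indices.create, es.indices.exists, es.indices.get_settings and es.indices.delete are the underlying client calls.

```python
def create_index(es, name):
    """Create an empty index with default settings."""
    return es.indices.create(index=name)


def describe_index(es, name):
    """Return the index settings, or None if the index does not exist."""
    if not es.indices.exists(index=name):
        return None
    return es.indices.get_settings(index=name)


def delete_index(es, name):
    """Delete the index together with all documents in it."""
    return es.indices.delete(index=name)
```

With a live client, create_index(es, "my-index") followed by describe_index(es, "my-index") should return a settings dict, and after delete_index(es, "my-index") the same call should return None.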