Introduction

ELK stands for Elasticsearch, Logstash and Kibana. It is a complete suite of tools for reading, storing, indexing and analysing data in real time. People have long used HDFS, MapReduce, Pig, etc. for data analytics in offline or batch mode. In those setups, data is sent to HDFS as soon as it arrives, but the analysis with MapReduce or Pig runs only in offline or batch mode. With social media like Twitter and Facebook becoming more and more popular, both the volume of data and the speed at which it arrives are increasing day by day. This is where real-time analytics comes into the picture. Several other tools are available for real-time data analytics, but they either solve only part of the problem or are commercial. The advantage of ELK is that it is a complete suite that is free and customizable.

Components

  1. Logstash: Logstash reads unstructured data from a variety of sources, applies filters or transformations to it, and sends it to Elasticsearch.
  2. Elasticsearch: Elasticsearch stores and indexes the data sent from Logstash in near real time, as schema-free JSON documents (a NoSQL approach). By default, newly indexed data becomes searchable within about 1 second.
  3. Kibana: Kibana queries the indexes in Elasticsearch to present the data as graphs and dashboards that you can interact with.
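To make the hand-off from Logstash to Elasticsearch concrete, here is a minimal Python sketch of the newline-delimited JSON body accepted by Elasticsearch's `_bulk` endpoint, the kind of batched request Logstash's Elasticsearch output sends. The index name `weblogs` and the event fields are hypothetical, and the sketch assumes a recent Elasticsearch version where the action line only needs `_index` (1.x clusters from the Kibana 3 era also expected a `_type`).

```python
import json


def build_bulk_body(events, index="weblogs"):
    """Return an NDJSON string for Elasticsearch's _bulk API.

    Each event becomes two lines: an action line saying which index to
    write to, followed by the event itself as the document source.
    'weblogs' is a hypothetical index name.
    """
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(event))
    # The bulk API expects every line, including the last, to end with "\n".
    return "\n".join(lines) + "\n" if lines else ""


events = [
    {"host": "10.0.0.1", "status": 200, "request": "GET /products/42"},
    {"host": "10.0.0.2", "status": 404, "request": "GET /missing"},
]
body = build_bulk_body(events)
print(body)
```

In recent Elasticsearch versions, such a body would be POSTed to `/_bulk` with the header `Content-Type: application/x-ndjson`. Batching many events into one request is what lets the pipeline keep up with high-volume logs.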

Features of Elasticsearch

  • Flexible, powerful, distributed, real-time search and analytics engine.
  • Near real time: by default, data becomes searchable about 1 second after it is inserted.
  • Horizontal scaling, full-text search, RESTful APIs and a schema-free document model, all built on top of Apache Lucene.
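Because Elasticsearch exposes its features over RESTful HTTP APIs, a full-text search is just a JSON body posted to an index's `_search` endpoint. The Python sketch below only builds such a request with the standard library and does not send it; the host `localhost:9200` (Elasticsearch's default HTTP port), the index `weblogs` and the field `request` are hypothetical.

```python
import json
import urllib.request


def build_search_request(base_url, index, field, text):
    """Build (but do not send) a full-text search request.

    Targets the RESTful search endpoint POST /<index>/_search with a
    `match` query, which analyzes `text` into terms and looks them up
    in the index built for `field`.
    """
    query = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local cluster, index and field names.
req = build_search_request("http://localhost:9200", "weblogs", "request", "products")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To run the search against a live cluster, one could do:
# with urllib.request.urlopen(req) as resp:
#     hits = json.load(resp)["hits"]["hits"]
```

Any HTTP client (curl, a browser plugin, or Kibana itself) can issue the same request, which is what "RESTful APIs" buys in practice.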

Features of Logstash

  • Collects unstructured data from a variety of sources (such as Apache logs, syslog, IMAP, JMX, log4j, Amazon S3, TCP/UDP sockets, even Twitter!) into a centralized place.
  • Applies filters to parse or transform the data in a customizable way (such as JSON/XML parsing, CSV parsing, browser user-agent parsing and IP-based geolocation lookup).
  • Sends the data to a variety of outputs (such as files, email, Graphite, HTTP, MongoDB, Amazon S3, Solr, syslog, TCP/UDP sockets).
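Logstash configurations are organized into input, filter and output stages, and each event flows through them in order. As a rough Python analogy (not Logstash's own configuration language), the stages can be chained as generators. The CSV column names and the tiny IP-to-country table are hypothetical stand-ins for what Logstash's csv and geoip filters would do; a real geoip lookup uses a GeoIP database.

```python
import csv
import io

# Hypothetical column names and IP-to-country table, standing in for
# Logstash's csv and geoip filters respectively.
COLUMNS = ["client_ip", "status", "bytes"]
IP_TO_COUNTRY = {"10.0.0.1": "IN", "10.0.0.2": "US"}


def line_input(text):
    """Input stage: emit one raw event per non-empty line."""
    for line in io.StringIO(text):
        line = line.strip()
        if line:
            yield {"message": line}


def csv_filter(events):
    """Filter stage: parse the raw CSV message into named, typed fields."""
    for event in events:
        values = next(csv.reader([event["message"]]))
        event.update(zip(COLUMNS, values))
        event["status"] = int(event["status"])
        event["bytes"] = int(event["bytes"])
        yield event


def geo_filter(events):
    """Filter stage: enrich each event with a country code from the lookup table."""
    for event in events:
        event["country"] = IP_TO_COUNTRY.get(event["client_ip"], "unknown")
        yield event


def list_output(events):
    """Output stage: collect events; a real pipeline would ship them to Elasticsearch."""
    return list(events)


raw = "10.0.0.1,200,512\n10.0.0.2,404,0\n192.168.1.9,500,128\n"
results = list_output(geo_filter(csv_filter(line_input(raw))))
for event in results:
    print(event)
```

Because each stage consumes and yields one event at a time, events stream through the pipeline as they arrive instead of waiting for a batch, which mirrors how Logstash processes live logs.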

Features of Kibana

  • No code to write.
  • Creates visualizations in a variety of formats (such as pie charts, bar charts, line charts and geographic maps).
  • Time-based analysis and comparison of data.
  • Flexible and powerful search syntax.
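Kibana itself needs no code, but it helps to know what a time-based chart is computing. Events-over-time charts are typically backed by Elasticsearch date histograms, which count events per fixed time interval. The pure-Python sketch below reproduces that bucketing locally for a few hypothetical timestamps; it illustrates the idea and is not Kibana's or Elasticsearch's implementation.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def date_histogram(timestamps, interval):
    """Count events per fixed-size time bucket, in the spirit of a date histogram.

    Timestamps are naive datetimes assumed to be in UTC. Buckets are aligned
    to the Unix epoch, so a 1-hour interval gives buckets starting on the hour.
    Empty buckets between the first and last event are kept as zeros so that
    gaps in traffic stay visible on a chart.
    """
    size = interval.total_seconds()
    counts = Counter()
    for ts in timestamps:
        offset = (ts - EPOCH).total_seconds()
        bucket_start = EPOCH + timedelta(seconds=(offset // size) * size)
        counts[bucket_start] += 1
    if not counts:
        return []
    buckets = []
    current, last = min(counts), max(counts)
    while current <= last:
        buckets.append((current, counts[current]))  # Counter returns 0 if missing
        current += interval
    return buckets


events = [
    datetime(2015, 3, 1, 10, 5),
    datetime(2015, 3, 1, 10, 40),
    datetime(2015, 3, 1, 11, 2),
    datetime(2015, 3, 1, 13, 59),
]
hist = date_histogram(events, timedelta(hours=1))
for start, count in hist:
    print(start.strftime("%Y-%m-%d %H:%M"), count)
```

Changing `interval` (say, to 5 minutes or 1 day) is the equivalent of changing a chart's time granularity.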

Example use case

We all have access logs on our web servers, e.g. Tomcat. These logs record information about each incoming HTTP request, such as the remote host, the time it was received, the request line, the HTTP status code, the response size in bytes and the response time. Using Logstash, we can read these logs in real time, as soon as they are written on each of our servers, parse and transform them, and send them to a central Elasticsearch cluster. We can then build visualizations in Kibana that present the data over time in graphical form, such as:

  • Top requests or products.
  • Top HTTP status codes and error statistics.
  • Peak traffic hours within a day.
  • Traffic history.
  • Traffic by location (country/continent).
  • Location-based (country/continent) aggregations, such as average and maximum response time.
  • Slowest pages.
  • Analytics by browser, e.g. Chrome, Firefox, IE.
  • Analytics by operating system, e.g. Windows 7, Linux.
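The parsing step and a few of the aggregations above can be sketched in plain Python to show what the pipeline computes. This sketch assumes the Tomcat AccessLogValve is configured with the pattern `%h %l %u %t "%r" %s %b %D` (common log format plus a trailing response time, treated here as milliseconds; check the unit of `%D` for your Tomcat version). In the real setup, Logstash (typically with its grok filter) does the parsing, and Elasticsearch and Kibana do the aggregation; the sample lines are invented.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed AccessLogValve pattern: %h %l %u %t "%r" %s %b %D
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) (?P<millis>\d+)'
)


def parse(line):
    """Turn one access-log line into a dict, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "host": m["host"],
        "time": datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": m["method"],
        "path": m["path"],
        "status": int(m["status"]),
        "bytes": 0 if m["bytes"] == "-" else int(m["bytes"]),
        "millis": int(m["millis"]),
    }


def summarize(lines):
    """Compute a few of the statistics listed above from raw log lines."""
    events = [e for e in map(parse, lines) if e is not None]
    by_hour = Counter(e["time"].hour for e in events)  # hour in the log's own time zone
    totals, counts = Counter(), Counter()
    for e in events:
        totals[e["path"]] += e["millis"]
        counts[e["path"]] += 1
    avg_millis = {path: totals[path] / counts[path] for path in counts}
    return {
        "top_status": Counter(e["status"] for e in events).most_common(),
        "top_paths": Counter(e["path"] for e in events).most_common(3),
        "peak_hour": by_hour.most_common(1)[0][0] if by_hour else None,
        "slowest": sorted(avg_millis, key=avg_millis.get, reverse=True)[:3],
    }


LOG = [
    '10.0.0.1 - - [01/Mar/2015:10:05:12 +0000] "GET /products/42 HTTP/1.1" 200 5120 35',
    '10.0.0.2 - - [01/Mar/2015:10:40:03 +0000] "GET /products/42 HTTP/1.1" 200 5120 41',
    '10.0.0.3 - - [01/Mar/2015:11:02:47 +0000] "GET /search?q=shoes HTTP/1.1" 200 2048 420',
    '10.0.0.1 - - [01/Mar/2015:10:59:59 +0000] "GET /missing HTTP/1.1" 404 - 3',
    'not an access log line',
]
summary = summarize(LOG)
for key, value in summary.items():
    print(key, value)
```

Unparseable lines are skipped rather than raising, much as a failed grok match would tag an event instead of stopping the pipeline. Location and browser statistics are left out here because they need extra data (a GeoIP database and a user-agent field).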

Kibana versions

If you are going to use Kibana in production, I would advise using Kibana 3, as it has the following benefits:

  • Auto refresh, i.e. real-time graphs of traffic.
  • More graph and panel types.
  • Better UI and more control over configuration.
  • Dashboards that can be saved and shared as JSON.
  • Stability.

[Screenshot: Kibana 3 – Events over time]

[Screenshot: Kibana 3 – Real-time statistics, e.g. min, max, avg]

[Screenshot: Kibana 3 – Real-time charts and geographic distribution]

[Screenshot: Kibana 3 – Real-time bar and pie charts]

[Screenshot: Kibana 4 – Geographic distribution]

[Screenshot: Traffic history]

[Screenshot: Kibana 4 – Pie chart]