e-Newsletter Issue 57
Building a Petabyte-scale Log Analysis Platform Using Elasticsearch

Elasticsearch, developed by Israeli engineer Shay Banon, is an easy-to-use, distributed full-text search system that offers distributed search, horizontal scaling, and real-time analysis. It is well suited to fast searches across billions of records of petabyte-scale log data. Under the hood, Elasticsearch builds on several prominent open source projects, such as Netty, a high-performance Java networking library designed to handle large numbers of concurrent connections, and Lucene, a full-text search library that provides the indexing and search core with which Elasticsearch can process billions of log records every day.

On the search side, with Lucene as its foundation, Elasticsearch splits very large indexes into several smaller pieces (shards) and stores those pieces across server nodes. When a user submits a search request, the request is distributed to the nodes, each node searches the pieces it holds, and the partial results are merged into the final result that is returned to the user.
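The split-and-merge flow described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Elasticsearch or Lucene is implemented: the shard count, the function names, the CRC32-based routing, and the naive term-count scoring are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import heapq
import zlib

NUM_SHARDS = 3  # assumption: an arbitrary shard count for the sketch


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Pick a shard from a stable hash of the document ID (CRC32 as a stand-in)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def index_docs(docs, num_shards=NUM_SHARDS):
    """Split one large collection into several smaller per-shard collections."""
    shards = [dict() for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id, num_shards)][doc_id] = text
    return shards


def search_shard(shard, term, limit):
    """Search one shard locally; score = number of times the term occurs."""
    term = term.lower()
    hits = []
    for doc_id, text in shard.items():
        score = text.lower().split().count(term)
        if score > 0:
            hits.append((score, doc_id))
    return heapq.nlargest(limit, hits)


def search(shards, term, limit=10):
    """Scatter the request to every shard, then merge the partial results."""
    partial = []
    for shard in shards:
        partial.extend(search_shard(shard, term, limit))
    return [doc_id for _score, doc_id in heapq.nlargest(limit, partial)]
```

For example, indexing three short log lines with `index_docs` and calling `search(shards, "error")` returns the IDs of the matching lines, best match first, regardless of which shard each line landed on. In a real cluster the per-shard searches run in parallel on different nodes; the loop here is sequential only for simplicity.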

Similar to Hadoop, Elasticsearch has its own ecosystem, in which each tool solves a specific problem and offers a degree of flexibility. For example, Kibana, the data visualization tool in the Elasticsearch ecosystem, lets users who do not know how to program generate various types of charts from Elasticsearch data, such as bar charts, pie charts, line graphs, scatter charts, and histograms. Geographic information can also be displayed based on the IP addresses in the log data. These graphical representations allow users to analyze data from different angles and help them discover hidden knowledge and clues within it.
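Charts such as bar and pie charts are, in essence, bucketed counts, which Elasticsearch exposes through aggregation requests. As a rough sketch, the Python below builds the JSON body of a terms aggregation, which counts documents per distinct value of a field. The helper name, the aggregation name `by_value`, and the field name `status_code` are hypothetical, and the code only constructs the request body; it does not contact a cluster.

```python
import json


def build_terms_aggregation(field, top_n=10):
    """Return a search request body that counts documents per value of `field`."""
    return {
        "size": 0,  # ask for the aggregated buckets only, not individual hits
        "aggs": {
            "by_value": {  # hypothetical aggregation name
                "terms": {"field": field, "size": top_n}
            }
        },
    }


# Assumption: the log documents carry a field named "status_code".
body = build_terms_aggregation("status_code", top_n=5)
print(json.dumps(body, indent=2))
```

Each bucket in the response to such a request pairs a field value with a document count, which is the shape of data a bar or pie chart needs.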

Elasticsearch’s outstanding search capability and ease of use have encouraged companies and organizations such as GitHub, Foursquare, SoundCloud, Mozilla, eBay, LinkedIn, Sony, and Wikipedia to replace their previous search engines with Elasticsearch. As a result, Elasticsearch has attracted the attention of several venture capital firms and has, since its initial release in 2010, received more than $100 million in financing. This further illustrates the market’s strong demand for free and easy-to-use big data analysis tools.

Current cloud environments consist of very large system architectures. With its ability to quickly and easily analyze massive amounts of log data in real time, Elasticsearch helps system administrators find and solve system problems. In terms of information security, Elasticsearch can rapidly analyze collected log data to reveal where security vulnerabilities exist, which in turn helps to prevent attacks.


