- Eclipse is a popular IDE donated by IBM to the open source community.
- Lucene is a text search engine library written in Java.
- HBase is the Hadoop database.
- Hive provides data warehousing tools to extract, transform and load data, and then query that data where it is stored in Hadoop files.
- Pig is a high level language that generates MapReduce code to analyze large data sets.
- Jaql is a query language for JavaScript Object Notation (JSON).
- ZooKeeper is a centralized configuration service and naming registry for large distributed systems.
- Avro is a data serialization system.
- UIMA is an architecture for the development, discovery, composition and deployment of analytics for unstructured data.
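
To make the MapReduce idea behind the Pig entry a bit more concrete, here is a minimal word-count sketch in plain Python. It does not use Hadoop or Pig at all; it only imitates the map, shuffle and reduce steps in memory, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for this illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key, roughly what a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, only for illustration.
    sample = ["Pig generates MapReduce code", "Pig analyzes large data sets"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps would run in parallel on many machines, and a high level language such as Pig would generate the equivalent jobs for you, so treat this only as a picture of the pattern.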
This site contains code snippets that I develop while learning and experimenting with SAS, R and Linux.
Sunday, December 7, 2014
Terminologies related to Hadoop