- Hadoop is an open source project of the Apache Software Foundation.
- It is a framework written in Java, originally developed by Doug Cutting, who named it after his son's toy elephant.
- Hadoop's design is based on Google's MapReduce and Google File System (GFS) papers. It is optimized to handle massive quantities of structured, unstructured, or semi-structured data on commodity hardware, that is, relatively inexpensive computers.
- Hadoop replicates its data across multiple computers, so if one machine goes down, the data can still be processed on another machine that holds a replica. Because Hadoop processes massive quantities of data as batch operations, the response time is not immediate.
- Hadoop is not well suited to processing transactions because it lacks random access.
- Hadoop is not suitable for Online Transaction Processing (OLTP) workloads, where structured data is accessed randomly, as in a relational database.
- Hadoop is not suitable for Online Analytical Processing (OLAP) or Decision Support System (DSS) workloads, where structured data in a relational database is accessed sequentially to generate business intelligence reports.
- It is NOT a replacement for a relational database system.
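To make the MapReduce model mentioned above a little more concrete, here is a minimal pure-Python simulation of the classic word-count job. It is only a sketch of the programming model under simplifying assumptions: it runs in a single process, it does not use Hadoop's Java API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. On a real cluster, the framework runs many map tasks in parallel across nodes, groups the intermediate pairs by key (the shuffle), and then runs the reduce tasks, which is part of why jobs are batch-oriented rather than immediate.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Hypothetical mapper: emit (word, 1) for every whitespace-separated word
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, as the framework does between map and reduce
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Hypothetical reducer: sum all counts collected for one word
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Map every input line, shuffle the pairs, then reduce each key (sorted for stable output)
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["Hadoop stores data", "Hadoop processes data in batches"]
    print(word_count(docs))
```

Running it prints `{'batches': 1, 'data': 2, 'hadoop': 2, 'in': 1, 'processes': 1, 'stores': 1}`. The mapper and reducer never share state, which is what lets the real framework distribute them across commodity machines.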
This site contains code snippets that I develop while learning and experimenting with SAS, R and Linux.
Sunday, December 7, 2014
Initials of HADOOP