
Sunday, December 7, 2014

Basics of Hadoop

  1. Hadoop is an open source project of the Apache Software Foundation.
  2. It is a framework written in Java, originally developed by Doug Cutting, who named it after his son's toy elephant.
  3. Hadoop is based on Google's MapReduce and Google File System (GFS) designs. It is optimized to handle massive quantities of structured, semi-structured or unstructured data using commodity hardware, that is, relatively inexpensive computers.
  4. Hadoop replicates its data across multiple computers, so if one machine goes down, the data can still be processed from a replica on another machine. Hadoop runs batch jobs over massive quantities of data, so response times are not immediate.
  5. Hadoop is not well suited to transaction processing because it lacks efficient random access to data.
  6. Hadoop is not suitable for Online Transaction Processing (OLTP) workloads, where structured data is accessed randomly, as in a relational database.
  7. Hadoop is not suitable for Online Analytical Processing (OLAP) or Decision Support System (DSS) workloads, where structured data, as in a relational database, is accessed sequentially to generate reports that provide business intelligence.
  8. It is NOT a replacement for a relational database system.
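To make the MapReduce model from point 3 more concrete, here is a minimal single-process sketch in Python of the classic word-count job. It only imitates the three phases (map, shuffle, reduce) in memory on one machine; it is not Hadoop's Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper (hypothetical name): emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle/sort: group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line stands in for an input split that Hadoop would give to a separate map task.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data", "Hadoop processes data in batches"]
    print(word_count(docs))
```

In a real cluster the map tasks run in parallel on the machines that hold the data, and the framework performs the shuffle over the network; the sketch keeps only the data flow.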
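Point 4 says Hadoop keeps copies of its data on several machines. The following Python sketch, under simplifying assumptions, places each data block on three nodes (3 is HDFS's default replication factor) and reads a block from the first replica whose node is still alive. The round-robin placement and the helper names are hypothetical; real HDFS uses a rack-aware placement policy.

```python
def place_blocks(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (hypothetical round-robin policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for i, block in enumerate(block_ids):
        # Consecutive nodes, wrapping around, so every replica of a block lands on a different node.
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def read_block(block, placement, live_nodes):
    """Return the first live node holding a replica, mimicking failover to another copy."""
    for node in placement[block]:
        if node in live_nodes:
            return node
    raise RuntimeError(f"all replicas of {block} are unavailable")


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    placement = place_blocks(["block-0", "block-1"], nodes)
    print(placement)
    live = {"node2", "node3", "node4"}  # node1 has gone down
    print(read_block("block-0", placement, live))
```

Losing one node is harmless here because two other copies of each of its blocks remain; the read only fails if every node holding a replica is down.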

    Terminology related to Hadoop

    1. Eclipse is a popular IDE donated by IBM to the open source community.
    2. Lucene is a text search engine library written in Java.
    3. HBase is the Hadoop database, a distributed, column-oriented database that runs on top of the Hadoop Distributed File System (HDFS).
    4. Hive provides data warehousing tools to extract, transform and load data, and then query the data stored in Hadoop files.
    5. Pig is a high-level language that generates MapReduce code to analyze large data sets.
    6. Jaql is a query language for JavaScript Object Notation (JSON).
    7. ZooKeeper is a centralized configuration service and naming registry for large distributed systems.
    8. Avro is a data serialization system.
    9. UIMA (Unstructured Information Management Architecture) is an architecture for the development, discovery, composition and deployment of analytics for unstructured data.
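The extract, transform and load flow that Hive supports, followed by a query, can be sketched in plain Python. As an assumption, Python's built-in sqlite3 module stands in for Hive and its SQL-like query language (HiveQL); the sample rows and function names are hypothetical, and nothing here runs on Hadoop.

```python
import sqlite3

# Hypothetical raw input lines in the form "day,region,amount".
RAW_LINES = [
    "2014-01-05,north, 120",
    "2014-01-05,south,80",
    "bad line",
    "2014-01-06,North,40",
]


def extract(lines):
    """Extract: split raw text lines into fields, skipping malformed rows."""
    for line in lines:
        parts = line.split(",")
        if len(parts) == 3:
            yield parts


def transform(rows):
    """Transform: trim whitespace, normalize region names to lower case, parse amounts."""
    for day, region, amount in rows:
        yield day.strip(), region.strip().lower(), int(amount.strip())


def load(conn, rows):
    """Load: store the cleaned rows in a table."""
    conn.execute("CREATE TABLE sales (day TEXT, region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


def total_by_region(conn):
    """Query: a SQL aggregation of the kind a HiveQL query might express."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(RAW_LINES)))
    print(total_by_region(conn))
```

The transform step is where inconsistent input, such as "north" versus "North", is cleaned up before loading, so the query can group the rows correctly.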

                        Monday, March 3, 2014

                        Business Intelligence

                        Business intelligence (BI) helps organizations manage data through a combination of skills and technologies while addressing security and data-quality risks. It also helps in achieving a better understanding of data. Business intelligence can be considered collective information: it uses data gathered in a data warehouse to make predictions about business operations. BI applications also help analyze sales, financial, production and other business data. Because it supports better decision making, business intelligence can also be considered a decision support system.

                        SAS Business Intelligence has analytical capabilities such as statistics, reporting, data mining, predictive forecasting and optimization. These capabilities help present data in the desired format and help improve data quality.
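As a rough illustration of the statistics and forecasting capabilities mentioned above, this sketch computes summary statistics and a simple moving-average forecast over hypothetical monthly sales figures. It is plain Python using the standard statistics module, not SAS, and the moving average is just one simple forecasting technique chosen for illustration.

```python
import statistics


def summarize(sales):
    """Basic descriptive statistics for a report."""
    return {
        "mean": statistics.mean(sales),
        "stdev": statistics.stdev(sales),  # sample standard deviation
        "min": min(sales),
        "max": max(sales),
    }


def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(history):
        raise ValueError("window must be between 1 and len(history)")
    recent = history[-window:]
    return sum(recent) / window


if __name__ == "__main__":
    monthly_sales = [100, 110, 120, 130, 140, 150]  # hypothetical data
    print(summarize(monthly_sales))
    print(moving_average_forecast(monthly_sales))
```

A moving average smooths out short-term noise but lags behind a steady trend: for the rising series above it forecasts 140, below the last observed value of 150.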