An Introduction to Big Data


Credibll

Uploaded on Oct 11, 2018

Category Technology

Big Data is a collection of a wide range of data analytics and data gathering strategies. It is the capability to capture, store, and analyze data at massive scale to support business decisions.

Big Data: An Introduction
By Credibll (www.credibll.com)

What is Big Data?

Big Data is a collection of a wide range of data analytics and data gathering strategies. It is the capability to capture, store, and analyze data at massive scale to support business decisions. Data is itself a resource: it gives companies vital information from which they can draw deep insights into human behaviour. Big Data also provides a new view of traditional metrics such as sales and marketing information.

Characteristics of Big Data

Volume: The quantity of generated and stored data.
Variety: The type and nature of the data.
Velocity: The speed at which data is generated and processed.
Variability: The consistency of the data sets.
Veracity: The quality of the captured data.

Some Key Facts about Big Data

Data is growing at lightning speed. Studies show that by the year 2020 around 2 MB of data will be created for every user every second.
Google receives around 40,000 queries per second, which is around 1.2 trillion searches a year.
Facebook receives around 35,000 likes per minute.
YouTube receives around 300 hours of video content every minute.
Google processes 20,000 TB of data every day.

Reasons Organizations Need to Move to Big Data

As technologies shift from analogue to digital, the need for data storage has multiplied manifold.
With Big Data, data is stored in a single warehouse at a single location, which minimizes risk and promotes calculated decisions at the right time.
Big Data technologies like NoSQL and MapReduce provide the ability to retrieve information without changing the structure of the database.

Frameworks Supported by Big Data

APACHE MAHOUT
A library that uses the MapReduce paradigm on top of Hadoop.
It provides Java libraries for statistical and algebraic operations.
It helps in creating scalable, performance-oriented machine learning applications on Hadoop.
It provides better user targeting based on predictions of audience interests.

APACHE PIG
It provides a high-level language named Pig Latin, which resembles SQL but has some differences.
It processes large data sets by executing MapReduce jobs.
It helps prevent fraud through detailed transaction analysis.
It can be used to analyze user engagement on the web.

APACHE SPARK
A general-purpose engine for large-scale data processing.
It is quite fast, easy to use, and has advanced options for development.
Applications built on it can run faster than usual.
It is the processing engine best suited to performing advanced analytics in large-scale data processing.

APACHE HIVE
Hive provides a mechanism to structure organizational data through HiveQL.
It provides better management and querying for large data sets.
It reduces the time needed for semantic checks.
It supports ad-hoc querying.

APACHE SOLR
Solr provides text search, real-time indexing, and faceted search.
It offers dynamic clustering and rich document handling.
It is designed for scalability and fault tolerance.
It supports indexing and searching across multiple sites.

Thank You. Visit: www.credibll.com
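Several of the frameworks above (Apache Mahout, Apache Pig) are described as running on the MapReduce paradigm. As a rough illustration of that idea, here is a minimal single-machine sketch in plain Python, assuming a simple word-count task. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and are not part of Hadoop or any other library; a real cluster would distribute these steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data is a resource"]
    print(word_count(docs))
    # {'big': 2, 'data': 2, 'is': 2, 'a': 1, 'resource': 1}
```

Because each document is mapped independently and each key is reduced independently, both steps can in principle run in parallel, which is what lets the approach scale to large data sets.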