An Overview of Apache’s Hadoop


By Chandra Heitzman

Developed under the Apache Software Foundation, Hadoop is an open-source, Java-based platform for processing massive amounts of data in a distributed computing environment. Hadoop’s key innovation lies in its ability to store and access huge volumes of data across thousands of computers and to present that data coherently.

Though data warehouses can store data on a similar scale, they are costly and do not lend themselves to exploring huge amounts of discordant data. Hadoop addresses this limitation by splitting a data query and distributing the work across a cluster of machines. By spreading the workload over thousands of loosely networked computers (nodes), Hadoop can examine and present petabytes of heterogeneous data in a meaningful format. At the same time, the software scales down as well as up: it can run on a single server or a small network.

Hadoop’s distributed computing abilities come from two core components: the Hadoop Distributed File System (HDFS) and MapReduce. HDFS handles storage and rapid data transfer between computer nodes and keeps the cluster operating even when a node fails. MapReduce distributes data processing across those nodes, reducing the workload on each individual computer and allowing computations and analysis beyond the capabilities of a single machine. Facebook, for example, uses MapReduce to analyze user behavior and advertisement tracking across roughly 21 petabytes of data. Other prominent users include IBM and Yahoo!, typically for search and advertising workloads; Hadoop’s MapReduce implementation was itself inspired by Google’s published MapReduce design.
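
To make the map/reduce split concrete, here is a minimal word-count job written against Hadoop’s Java MapReduce API. It is a sketch only: the class name and the input/output paths are illustrative, not taken from the article. The map function runs in parallel on each node’s slice of the data, and the reduce function merges the partial counts into a single result set.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // The map step runs on each node against its local block of input data.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);   // emit (word, 1) pairs
      }
    }
  }

  // The reduce step combines the partial counts from every node
  // into one final answer per word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```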

Using Hadoop effectively starts with understanding that it is designed to run on a large number of machines that share no hardware or memory. When a financial institution wants to analyze data spread across dozens of servers, Hadoop breaks the data apart and distributes it throughout those servers, and it replicates each piece so that most failures cause no data loss. MapReduce then divides a large analysis across all of the servers or computers in the cluster, yet returns the answer as a single result set.
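
The short sketch below shows how that distribution and replication look from a client’s point of view, using Hadoop’s FileSystem API. The namenode address, file paths, and replication factor are assumptions for illustration only.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode.example.com:9000"); // assumed cluster address

    try (FileSystem fs = FileSystem.get(conf)) {
      Path local = new Path("/tmp/transactions.csv");   // hypothetical local file
      Path remote = new Path("/data/transactions.csv"); // destination in HDFS

      // HDFS splits the file into blocks and spreads them across the cluster's nodes.
      fs.copyFromLocalFile(local, remote);

      // Each block is replicated (three copies here) so the loss of a node
      // does not lose data; the factor can be set per file.
      fs.setReplication(remote, (short) 3);

      System.out.println("Replication factor: "
          + fs.getFileStatus(remote).getReplication());
    }
  }
}
```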

Though Hadoop offers a scalable approach to data storage and analysis, it is not meant as a substitute for a standard database (e.g. a SQL Server 2012 database). Hadoop stores data in files but does not index them for quick lookups. Finding a particular piece of data requires a MapReduce scan, which takes far longer than a simple indexed database query. Hadoop works best when the dataset is too large for conventional storage and too diverse for easy analysis.
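
As a rough illustration of that difference, the sketch below expresses a simple “find every record containing a term” query as a map-only Hadoop job. The class name, configuration key, and paths are hypothetical; the point is that the job must scan every input block rather than consult an index, which is why MapReduce is a poor fit for small, selective queries but a good fit for datasets too large for one machine.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class GrepLines {

  public static class FilterMapper
      extends Mapper<Object, Text, Text, NullWritable> {
    private String term;

    @Override
    protected void setup(Context context) {
      // The search term is passed in through the job configuration.
      term = context.getConfiguration().get("grep.term", "");
    }

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      // Every line of every block is examined: a full scan, not an index lookup.
      if (value.toString().contains(term)) {
        context.write(value, NullWritable.get());
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("grep.term", args[2]);               // e.g. an account number
    Job job = Job.getInstance(conf, "grep lines");
    job.setJarByClass(GrepLines.class);
    job.setMapperClass(FilterMapper.class);
    job.setNumReduceTasks(0);                     // map-only: no aggregation needed
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```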

The volume of digitized information has grown roughly ninefold in the last five years, with companies spending an estimated four trillion dollars worldwide on data management in 2011. Doug Cutting, the creator of Hadoop (now at Cloudera), estimates that 1.8 zettabytes (1.8 trillion gigabytes) of data were created and replicated in that year alone. Roughly ninety percent of this information is unstructured, and Hadoop and platforms like it are among the few current ways of keeping such data comprehensible.


For more information about SQL 2012 development, visit Magenic, one of the leading software development companies, which provides innovative custom software development to meet unique business challenges for some of the most recognized companies and organizations in the nation.

Article Source: http://EzineArticles.com/?expert=Chandra_Heitzman

Article Source: http://EzineArticles.com/6928548

