
Publication Date: May 26, 2012 | ISBN-10: 1449311520 | ISBN-13: 978-1449311520 | Edition: Third
Ready to unlock the power of your data? With this comprehensive guide, you will learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers who want to analyze data sets of any size, and for administrators who want to set up and run Hadoop clusters.
You will find instructive case studies that demonstrate how Hadoop is used to solve specific problems. This third edition covers recent changes to Hadoop, including material on the new MapReduce API and on a more flexible MapReduce execution model (YARN).
Store large data sets with the Hadoop Distributed File System (HDFS)
Run distributed computations with MapReduce
Use Hadoop's data and I/O building blocks for compression, data integrity, serialization (including Avro), and persistence
Discover common pitfalls and advanced features for writing real-world MapReduce programs
Design, build, and manage a dedicated Hadoop cluster, or run Hadoop in the cloud
Load data from relational databases into HDFS using Sqoop
Perform large-scale data processing with the Pig query language
Analyze data sets with Hive, Hadoop's data warehousing system
Use HBase for structured and semi-structured data, and ZooKeeper for building distributed systems
