Big Data Solution – Hadoop Development


Introduction to Big Data

All about Data!
Data Storage and Analysis
Comparison with Other Systems
Relational Database Management System
Grid Computing
Volunteer Computing
A Brief History of Hadoop

Installing Single-Node Hadoop

Prerequisites
Installation
Configuration
Standalone Mode
Pseudo-distributed Mode
Configuring SSH
Formatting the HDFS Filesystem
Starting and stopping MapReduce
Fully Distributed Mode

Creating an Eclipse Plugin for Hadoop-2.x.0

Download and install Eclipse
Install git
Download source code for Hadoop Plugin for Eclipse from git
Compile and create jar
Install the plugin in Eclipse

Developing a MapReduce Application

The Configuration
Combining Resources
Variable Expansion
Setting Up the Development Environment
Managing Configuration
GenericOptionsParser, Tool, and ToolRunner
Writing a Unit Test with MRUnit
Running Locally on Test Data
Running a Job in a Local Job Runner
Testing the Driver
Running on a Cluster
Packaging a Job
Launching a Job
The MapReduce Web UI
Retrieving the Results
Debugging a Job
Hadoop Logs
Remote Debugging
Tuning a Job
Profiling Tasks

MapReduce Workflows

Decomposing a Problem into MapReduce Jobs
Apache Oozie

MapReduce Features

Built-in Counters
User-Defined Java Counters
User-Defined Streaming Counters
Sorting
Preparation
Partial Sort
Total Sort
Secondary Sort
Joins
Map-Side Joins
Reduce-Side Joins
Side Data Distribution
Using the Job Configuration
Distributed Cache
MapReduce Library Classes

Setting Up a Hadoop Cluster

Cluster Specification
Network Topology
Cluster Setup and Installation
Installing Java
Creating a Hadoop User
Installing Hadoop
Testing the Installation
SSH Configuration
Hadoop Configuration
Configuration Management
Environment Settings
Important Hadoop Daemon Properties
Hadoop Daemon Addresses and Ports
Other Hadoop Properties
User Account Creation
YARN Configuration
Important YARN Daemon Properties
YARN Daemon Addresses and Ports
Security
Kerberos and Hadoop
Delegation Tokens
Other Security Enhancements
Benchmarking a Hadoop Cluster
Hadoop Benchmarks
User Jobs
Hadoop in the Cloud
Apache Whirr

Administering Hadoop

Persistent Data Structures
Safe Mode
Audit Logging
Tools
Monitoring
Logging
Metrics
Java Management Extensions
Routine Administration Procedures
Commissioning and Decommissioning Nodes
Upgrades


Pig

Installing and Running Pig
Execution Types
Running Pig Programs
Grunt
Pig Latin Editors
An Example
Generating Examples
Comparison with Databases
Pig Latin
Structure
Statements
Expressions
Types
Schemas
Functions
Macros
User-Defined Functions
A Filter UDF
An Eval UDF
A Load UDF
Data Processing Operators
Loading and Storing Data
Filtering Data
Grouping and Joining Data
Sorting Data
Combining and Splitting Data
Pig in Practice
Parameter Substitution


Hive

Installing Hive
The Hive Shell
An Example
Running Hive
Configuring Hive
Hive Services
The Metastore
Comparison with Traditional Databases
Schema on Read Versus Schema on Write
Updates, Transactions, and Indexes
HiveQL
Data Types
Operators and Functions
Managed Tables and External Tables
Partitions and Buckets
Storage Formats
Importing Data
Altering Tables
Dropping Tables
Querying Data
Sorting and Aggregating
MapReduce Scripts
Joins
Subqueries
Views
User-Defined Functions
Writing a UDF
Writing a UDAF


HBase

HBasics
Backdrop
Concepts
Whirlwind Tour of the Data Model
Installation
Test Drive
Clients
Avro, REST, and Thrift
Example
Schemas
Loading Data
Web Queries
HBase Versus RDBMS
Successful Service
Use Case: HBase at
Praxis
Versions
HDFS
UI
Metrics
Schema Design
Bulk Load

R and Hadoop

Introduction to the R Language
Introduction to the RHadoop Big Data Solution
RHadoop data analysis
RHadoop machine learning

Python and Hadoop

Python Programming
Python and Hadoop
Hadoop - mrjob development


Spark

Introduction to Spark
Machine Learning

Advanced Administration and Monitoring

Multiple nodes

Add nodes
Decommission nodes
Recovering from NameNode failure
Monitoring cluster health using Ganglia - Pure monitoring
Install Ambari - Management and monitoring
Install Hue - Emphasis on use of the Hadoop environment and management

Cloudera Hadoop Certification

CCAH - Hadoop Administrator
CCDH - Hadoop Developer

Case Studies

Hadoop Usage at The Social Music Revolution
Hadoop at
Generating Charts with Hadoop
The Track Statistics Program
Summary