Optimizing Apache Hadoop* Deployments



Executive Summary

This paper provides guidance to organizations making key choices in the planning stages of Hadoop* deployments, based on extensive lab testing conducted with Hadoop at Intel. It begins with best practices for establishing server hardware specifications, helping architects choose optimal combinations of components. It then discusses the server software environment, including the choice of operating system and version of Hadoop. Finally, it introduces configuration and tuning advice that can help improve results in Hadoop environments.

Overview

Having moved beyond its origins in search and Web indexing, Hadoop is becoming increasingly attractive as a framework for large-scale, data-intensive applications. Because Hadoop deployments can have very large infrastructure requirements, hardware and software choices made at design time can have a significant impact on performance and TCO.

Intel is a major contributor to open-source initiatives such as Linux*, Apache*, and Xen*, and has also devoted resources to Hadoop analysis, testing, and performance characterization, both internally and with industry collaborators such as HP and Cloudera. Through these technical efforts, Intel has observed many practical trade-offs in hardware, software, and system settings that have real-world impacts on performance and cost.

This paper discusses some of those optimizations, which fall into three general categories:
• Server hardware. These recommendations focus on choosing the appropriate hardware components for an optimal balance between performance and both initial and recurring costs.

• System software. These recommendations address the server software environment, including the choice of operating system and version of Hadoop.

• Configuration and tuning. These recommendations cover Hadoop and system settings that can help improve results, as illustrated in the sketch following this list.
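
The specific settings and recommended values are detailed in the full white paper and are not reproduced here. Purely as an illustration of the kind of configuration tuning involved, the following minimal Java sketch sets a few commonly tuned Hadoop 1.x-era job and client properties through the standard Configuration API. The property names (io.sort.mb, mapred.compress.map.output, mapred.child.java.opts, dfs.block.size) are real properties of that era, but the values and the class name TunedJobSketch are assumptions for illustration only; appropriate values should come from testing on the target cluster.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    // Illustrative sketch only: the values below are placeholders,
    // not recommendations from the white paper.
    public class TunedJobSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Map-side sort buffer (MB): a larger buffer means fewer
            // spills to disk, at the cost of task heap.
            conf.setInt("io.sort.mb", 256);

            // Compress intermediate map output to trade CPU for less
            // disk and network I/O during the shuffle.
            conf.setBoolean("mapred.compress.map.output", true);

            // Per-task JVM heap; keep consistent with io.sort.mb and
            // the number of task slots configured per node.
            conf.set("mapred.child.java.opts", "-Xmx1024m");

            // HDFS block size for files this client writes; larger
            // blocks favor large sequential scans (128 MB shown).
            conf.setLong("dfs.block.size", 134217728L);

            Job job = new Job(conf, "tuned-job-sketch");
            // ... set mapper/reducer classes and input/output paths,
            // then call job.waitForCompletion(true) as usual.
        }
    }

Cluster-wide settings, such as the number of task slots per node or HDFS daemon parameters, live in the cluster's configuration files (mapred-site.xml, hdfs-site.xml) rather than in job code and should be sized against the chosen hardware.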

Read the full Optimizing Hadoop* Deployments White Paper.