Apache™ Hadoop® notes
Russell Bateman
Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware.
Hadoop consists of (a) common libraries and utilities used by the other Hadoop modules, (b) a distributed filesystem (HDFS) that stores data across the (commodity) machines, (c) a resource-management platform (YARN) for managing cluster resources and scheduling user applications, and (d) a programming model (MapReduce) for large-scale data processing.
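To make (b) concrete, here is a minimal sketch of writing and reading a file through Hadoop's Java FileSystem API. The NameNode address (hdfs://localhost:9000) and the path /tmp/hello.txt are assumptions for illustration only; adjust them to your cluster.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsSketch
    {
        public static void main( String[] args ) throws Exception
        {
            Configuration conf = new Configuration();

            // assumption: a NameNode is reachable at this address
            conf.set( "fs.defaultFS", "hdfs://localhost:9000" );

            FileSystem fs   = FileSystem.get( conf );
            Path       path = new Path( "/tmp/hello.txt" );   // hypothetical path

            // write a small file into the distributed filesystem...
            try( FSDataOutputStream out = fs.create( path, true ) )
            {
                out.write( "Hello, HDFS\n".getBytes( StandardCharsets.UTF_8 ) );
            }

            // ...then read it back
            try( BufferedReader in = new BufferedReader(
                    new InputStreamReader( fs.open( path ), StandardCharsets.UTF_8 ) ) )
            {
                System.out.println( in.readLine() );
            }
        }
    }

The same API works against the local filesystem if fs.defaultFS is left at its default, which is handy for experimenting before a cluster is available.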
MapReduce is a programming model for processing large data sets with parallel, distributed algorithms on a cluster: a job is split into map tasks, which transform input records into key-value pairs, and reduce tasks, which aggregate the values emitted for each key.
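A minimal sketch of the model is the canonical word count: the mapper emits ( word, 1 ) for every word it sees, and the reducer sums the counts per word. Class names and the input/output paths taken from the command line are illustrative, not part of Hadoop itself.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount
    {
        // map: emit ( word, 1 ) for every word in the input split
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>
        {
            private static final IntWritable ONE  = new IntWritable( 1 );
            private final        Text        word = new Text();

            @Override
            public void map( Object key, Text value, Context context ) throws IOException, InterruptedException
            {
                StringTokenizer tokenizer = new StringTokenizer( value.toString() );
                while( tokenizer.hasMoreTokens() )
                {
                    word.set( tokenizer.nextToken() );
                    context.write( word, ONE );
                }
            }
        }

        // reduce: sum the counts emitted for each word
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable>
        {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce( Text key, Iterable<IntWritable> values, Context context ) throws IOException, InterruptedException
            {
                int sum = 0;
                for( IntWritable value : values )
                    sum += value.get();
                result.set( sum );
                context.write( key, result );
            }
        }

        public static void main( String[] args ) throws Exception
        {
            Job job = Job.getInstance( new Configuration(), "word count" );
            job.setJarByClass( WordCount.class );
            job.setMapperClass( TokenizerMapper.class );
            job.setCombinerClass( IntSumReducer.class );
            job.setReducerClass( IntSumReducer.class );
            job.setOutputKeyClass( Text.class );
            job.setOutputValueClass( IntWritable.class );
            FileInputFormat.addInputPath( job, new Path( args[ 0 ] ) );     // input directory
            FileOutputFormat.setOutputPath( job, new Path( args[ 1 ] ) );   // output directory (must not exist)
            System.exit( job.waitForCompletion( true ) ? 0 : 1 );
        }
    }

Using the reducer as a combiner, as above, is a common optimization: partial sums are computed on the map side before data crosses the network.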
Hadoop is distributed as a tarball containing dæmons, executables and other binaries. It is downloaded from the project site, Welcome to Apache™ Hadoop®, which is also the best place to read about what Hadoop is.