Apache Hadoop

Apache Hadoop is an open-source software framework for distributed storage and processing of big data using the MapReduce programming model. It splits files into large blocks, distributes them across nodes in a cluster, and ships computation to the nodes that hold the data; this data locality lets datasets be processed faster and more efficiently than in a conventional supercomputer architecture that relies on a parallel file system. The base framework consists of HDFS (storage), MapReduce (processing), and YARN (resource management), with ecosystem projects such as Apache Pig, Apache Hive, and Apache Spark able to run on top of it.
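The MapReduce model mentioned above can be illustrated with the classic word-count example. The sketch below (in Go, one of the languages used in the course listed here) simulates the two phases in a single process: a map phase emits (word, 1) pairs per input chunk, and a reduce phase sums counts per key. The `KeyValue` type and function names are illustrative assumptions, not Hadoop's actual Java API; in a real cluster the chunks would correspond to HDFS blocks spread over many nodes.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// KeyValue is an intermediate pair emitted by the map phase.
// (Illustrative type; Hadoop's real API is Java's Mapper/Reducer.)
type KeyValue struct {
	Key   string
	Value int
}

// mapPhase emits (word, 1) for each word in one input chunk.
func mapPhase(chunk string) []KeyValue {
	var out []KeyValue
	for _, w := range strings.Fields(chunk) {
		out = append(out, KeyValue{Key: w, Value: 1})
	}
	return out
}

// reducePhase groups pairs by key and sums their values.
func reducePhase(pairs []KeyValue) map[string]int {
	counts := make(map[string]int)
	for _, kv := range pairs {
		counts[kv.Key] += kv.Value
	}
	return counts
}

func main() {
	// Each string stands in for one HDFS block processed by a map task.
	chunks := []string{"the quick brown fox", "the lazy dog", "the fox"}
	var intermediate []KeyValue
	for _, c := range chunks {
		intermediate = append(intermediate, mapPhase(c)...)
	}
	counts := reducePhase(intermediate)

	// Sort keys for deterministic output.
	words := make([]string, 0, len(counts))
	for w := range counts {
		words = append(words, w)
	}
	sort.Strings(words)
	for _, w := range words {
		fmt.Printf("%s %d\n", w, counts[w])
	}
}
```

Because each map task reads only its own chunk and each key is reduced independently, both phases parallelize across machines; the framework's job is the part elided here, including shuffling intermediate pairs to reducers and handling node failures.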

1 course covers this concept

15-440 Distributed Systems

Carnegie Mellon University

Fall 2020

A course offering both theoretical understanding and practical experience in distributed systems. Key themes include concurrency, scheduling, network communication, and security. Real-world protocols and paradigms such as distributed filesystems, RPC, and MapReduce are studied. The course uses the C and Go programming languages.
