Apache Hadoop is an open-source software framework for distributed storage and distributed processing of big data using the MapReduce programming model. It splits files into large blocks, distributes them across the nodes of a cluster, and ships computation to the nodes that hold the data; this data locality lets the cluster process data faster and more efficiently than a conventional supercomputer architecture that moves data over the network. The core framework consists of HDFS for storage and MapReduce for processing, and related Apache projects such as Pig, Hive, and Spark can be installed on top of or alongside it.
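The MapReduce model is easiest to see in the canonical word-count example. Below is a minimal single-process sketch in Go (Hadoop's own API is Java); the function names and the in-memory "shuffle" are illustrative simplifications of the distributed map, shuffle, and reduce phases.

```go
package main

import (
	"fmt"
	"strings"
)

// KV is an intermediate key/value pair emitted by the map phase.
type KV struct {
	Key   string
	Value int
}

// mapFn emits (word, 1) for every word in one input split.
func mapFn(split string) []KV {
	var out []KV
	for _, w := range strings.Fields(split) {
		out = append(out, KV{Key: strings.ToLower(w), Value: 1})
	}
	return out
}

// reduceFn sums the counts grouped under a single key.
func reduceFn(key string, values []int) int {
	total := 0
	for _, v := range values {
		total += v
	}
	return total
}

func main() {
	// In a real cluster each split lives on a different node and the
	// framework moves code to the data; here they are in-memory strings.
	splits := []string{"the quick brown fox", "the lazy dog", "the fox"}

	// Map phase over every split, then shuffle: group values by key.
	grouped := make(map[string][]int)
	for _, s := range splits {
		for _, kv := range mapFn(s) {
			grouped[kv.Key] = append(grouped[kv.Key], kv.Value)
		}
	}

	// Reduce phase: one call per distinct key.
	for k, vs := range grouped {
		fmt.Printf("%s: %d\n", k, reduceFn(k, vs))
	}
}
```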
Carnegie Mellon University
Fall 2020
A course offering both theoretical understanding and practical experience in distributed systems. Key themes include concurrency, scheduling, network communication, and security. Real-world protocols and paradigms such as distributed filesystems, RPC, and MapReduce are studied. The course uses the C and Go programming languages.
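Of those paradigms, RPC is the most compact to demonstrate. The sketch below uses Go's standard net/rpc package to run a server and client in one process; the `Counter` service and its `Add` method are hypothetical examples, not taken from the course's assignments.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
)

// Counter is an illustrative RPC service (a hypothetical name,
// not from any course handout).
type Counter struct{}

// Add uses the signature net/rpc requires: an args value, a pointer
// for the reply, and an error return.
func (c *Counter) Add(args [2]int, reply *int) error {
	*reply = args[0] + args[1]
	return nil
}

func main() {
	// Server side: register the service and accept connections.
	srv := rpc.NewServer()
	if err := srv.Register(new(Counter)); err != nil {
		log.Fatal(err)
	}
	ln, err := net.Listen("tcp", "127.0.0.1:0") // ephemeral port
	if err != nil {
		log.Fatal(err)
	}
	go srv.Accept(ln)

	// Client side: a synchronous call that reads like a local
	// function call but crosses the network.
	client, err := rpc.Dial("tcp", ln.Addr().String())
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	var sum int
	if err := client.Call("Counter.Add", [2]int{2, 3}, &sum); err != nil {
		log.Fatal(err)
	}
	fmt.Println("2 + 3 =", sum) // prints 5
}
```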