Apache Hadoop Distributed File System (HDFS)

Apache Hadoop

Apache Hadoop is an open-source software framework for the distributed storage and processing of big data using the MapReduce programming model. It consists of a storage layer, the Hadoop Distributed File System (HDFS), and a processing layer that exploits data locality, scheduling computation on the nodes that already hold the data to reduce network transfer. A wide range of additional software packages can be installed on top of Hadoop, such as Apache Pig, Apache Hive, and Apache Spark.
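The MapReduce model mentioned above can be illustrated without Hadoop itself. Below is a minimal, single-process word-count sketch that mimics the three phases Hadoop runs across a cluster (map, shuffle, reduce); the function names are illustrative, not Hadoop APIs.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the per-word counts.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["hadoop stores data in hdfs", "mapreduce processes data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["data"])  # "data" appears once in each input line
```

In real Hadoop, the map and reduce functions run in parallel on the nodes storing each HDFS block, and the shuffle moves intermediate pairs between them over the network.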

1 course covers this concept

15-440 Distributed Systems

Carnegie Mellon University

Fall 2020

A course offering both theoretical understanding and practical experience in distributed systems. Key themes include concurrency, scheduling, network communication, and security. Real-world protocols and paradigms such as distributed filesystems, RPC, and MapReduce are studied. The course uses the C and Go programming languages.

