Apache Spark is an open-source analytics engine for large-scale data processing. It provides an interface for programming clusters with implicit data parallelism and fault tolerance. It was originally developed at the University of California, Berkeley's AMPLab and donated to the Apache Software Foundation.
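As a rough illustration of what "implicit data parallelism" means in the description above, the following is a minimal pure-Python sketch, not Spark's API. It uses only the standard library, and the names `parallel_map_reduce`, `count_words`, and the sample `partitions` are hypothetical. The caller writes only the per-partition logic and a combining function, and the helper decides how the work is spread across workers. Fault tolerance, cluster distribution, and true CPU parallelism are outside the scope of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def parallel_map_reduce(partitions, map_fn, reduce_fn, max_workers=4):
    """Apply map_fn to each partition concurrently, then fold the results.

    The caller supplies only the per-partition logic; scheduling across
    workers is handled here, which is the "implicit" part of the idea.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_results = list(pool.map(map_fn, partitions))
    return reduce(reduce_fn, partial_results)


def count_words(lines):
    """Count whitespace-separated words in one partition of lines."""
    return sum(len(line.split()) for line in lines)


# Hypothetical input, already split into partitions.
partitions = [
    ["spark is an analytics engine", "for large-scale data processing"],
    ["it runs on clusters"],
    ["with implicit data parallelism", "and fault tolerance"],
]

total_words = parallel_map_reduce(partitions, count_words, lambda a, b: a + b)
print(total_words)
```

The design point is that `count_words` contains no threading or scheduling code. An engine such as Spark applies the same separation at cluster scale, with its own API and execution model.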
UC Berkeley
Spring 2023
This project-heavy course covers access methods, data models, query languages, database services, and interfaces. It introduces transaction processing and requires CS 61A, CS 61B, and CS 61C as prerequisites or corequisites. Proficiency in Java is recommended for project work.
Stanford University
Fall 2022
Focused on principles and trade-offs in designing modern parallel computing systems, this course also teaches parallel programming techniques. It is intended for students looking to understand both parallel hardware and software design. Prerequisite knowledge in computer systems is required.
UC Berkeley
Fall 2021
A graduate survey of systems managing computation and information. Topics include volatile and persistent memory management, system support for networking, security infrastructure, extensible systems, APIs, and large software system performance analysis. Students are expected to engage in quality systems research, culminating in a publishable group project.
Stanford University
Spring 2023
This course focuses on data mining and machine learning algorithms for large-scale data analysis. The emphasis is on parallel algorithms with tools like MapReduce and Spark. Topics include frequent itemsets, locality-sensitive hashing, clustering, link analysis, and large-scale supervised machine learning. Familiarity with Java, Python, basic probability theory, linear algebra, and algorithmic analysis is required.
Carnegie Mellon University
Fall 2020
A course offering both theoretical understanding and practical experience in distributed systems. Key themes include concurrency, scheduling, network communication, and security. Real-world protocols and paradigms such as distributed filesystems, RPC, and MapReduce are studied. The course uses the C and Go programming languages.
UC Berkeley
Fall 2022
This course deepens students' understanding of computer architecture and the translation of high-level programs into machine language. Emphasis is on C and assembly language programming, computer organization, parallelism, CPU design, and warehouse-scale computing. Prerequisites include CS 61A and CS 61B or equivalent C-based programming experience.