You can take XG-ADM 419 Fundamentals of Big Data Solutions, which covers three popular technologies (Hadoop, Cassandra, and Spark), to learn the concepts and foundations of Big Data solutions while saving $100, or take only the classes that cover the technologies or areas that interest you.
Each course provides extensive hands-on practice. Participants are expected to bring a laptop with at least 8 GB of memory to class. SJSU will award a Certificate of Completion for each course to those who finish it with satisfactory results.
XG-ADM 419 Fundamentals of Big Data Solutions
This is a 3-Saturday class that teaches the fundamentals of Big Data solutions. Three popular technologies (Cassandra, Hadoop, and Spark) are used to teach the relevant concepts and guide the practical exercises.
Instructor: Chris Tseng
XG-ADM 420 Fundamentals of Cassandra
Each Big Data solution requires a robust way to store and retrieve data. Cassandra is a distributed database that manages large amounts of data across many commodity servers while providing high availability with no single point of failure. Below are the class learning objectives:
- Understand the basic theories of ACID, CAP, and BASE for distributed databases
- Know the internal architecture of a distributed database like Cassandra
- Learn the concepts of the read path, write path, and data compaction in a distributed database
- Understand the relationship between replication and eventual consistency when querying data
- Understand anti-entropy operations for read repair
- Design a query-based data model for querying and analyzing big data
Instructor: Chris Tseng
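The relationship between replication and consistency named in the objectives above is often summarized by a simple rule: with N replicas, a read of R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The sketch below is plain Python arithmetic, not Cassandra code, and the function names are hypothetical ones chosen for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Size of a majority quorum for the given replication factor."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n: int, w: int, r: int) -> bool:
    """True when any r replicas read must overlap any w replicas written."""
    return r + w > n


if __name__ == "__main__":
    n = 3  # assumed replication factor for this example
    q = quorum(n)
    print("quorum size:", q)
    print("quorum writes + quorum reads overlap:", reads_see_latest_write(n, q, q))
    print("single-replica writes + reads overlap:", reads_see_latest_write(n, 1, 1))
```

When the overlap does not hold, replicas may briefly disagree, which is the situation that eventual consistency and repair mechanisms are meant to resolve.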
XG-ADM 421 Fundamentals of Hadoop
Programming is key to exploring and extracting intelligence from massive amounts of data. Hadoop is a programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. Below are the class learning objectives:
- Discuss a brief history of Hadoop and big data
- Define the components of the HDFS (Hadoop Distributed File System) architecture and how data is read and written
- Write a simple MapReduce program in Java or Python
- Describe the commonly-used components of the Hadoop ecosystem
- Delineate an end-to-end architecture and data flow for batch and streaming solutions
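As a rough illustration of the MapReduce objective above, the following sketch simulates the map, shuffle, and reduce phases of a word count on a single machine. It uses only the Python standard library and does not call any Hadoop API; the function names are hypothetical, and a real Hadoop job would distribute these phases across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for a single word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in order over an in-memory collection of lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data solutions"]))
```

Keeping the mapper and reducer as separate pure functions mirrors how the two roles are written independently in a MapReduce program.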
XG-ADM 422 Fundamentals of Spark
In addition to a programming framework and a robust database, we need an efficient processing engine. Spark is a fast and general engine for large-scale data processing. Below are the class learning objectives:
- Discuss the motivation for Apache Spark
- Describe the components of the Spark architecture and deployment modes
- Define the RDD (Resilient Distributed Dataset), DStream, and Spark SQL data types
- Identify functionality available in the Spark core, streaming, Spark SQL, and MLlib packages
- Use the Spark Web UI to monitor jobs, gather metrics, and examine job execution DAGs (Directed Acyclic Graphs)
Prerequisites:
- A desire to learn how to store, access, and analyze massive amounts of data via the foundation technologies taught in the course
- Basic understanding of computer operations and programming concepts
Instructors: James Casaletto
COURSE REFUND POLICY: Full refund (minus a $20 processing fee) if the class is dropped before the first day of instruction. No refund if the class is dropped on or after the first day of instruction.