Course Description

Big Data

You can take XG-ADM 419 Fundamentals of Big Data Solutions, which covers three popular technologies (Hadoop, Cassandra, and Spark), to learn the concepts and foundations of Big Data solutions while saving $100, or take only the classes that cover the technologies or areas that interest you.

Each course provides extensive hands-on practice. Participants are expected to bring laptops with at least 8 GB of memory to class. SJSU will award a Certificate of Completion for each course to those who finish it with a satisfactory result.

 

XG-ADM 419 Fundamentals of Big Data Solutions

This is a three-Saturday class that teaches the fundamentals of Big Data solutions. Three popular technologies (Cassandra, Hadoop, and Spark) are used to teach the relevant concepts and guide the practical exercises.

Instructor: Chris Tseng



XG-ADM 420 Fundamentals of Cassandra

Each Big Data solution requires a robust way to store and retrieve data. Cassandra is a distributed database for managing large amounts of data across many commodity servers, providing high availability with no single point of failure. Below are the class learning objectives:

  • Understand the basic theories of ACID, CAP, and BASE for distributed databases
  • Know the internal architecture of a distributed database like Cassandra
  • Learn the concepts of read path, write path, and data compaction in a distributed database
  • Understand the relationship between replication and eventual consistency in data queries
  • Understand anti-entropy operations such as read repair
  • Design a query-based data model for querying and analyzing big data
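
As a rough illustration of the replication and consistency objective above: with a replication factor N, a read that contacts R replicas is guaranteed to overlap the most recent acknowledged write to W replicas whenever R + W > N. The sketch below only checks that arithmetic; the function names are hypothetical and are not part of Cassandra or any driver.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def read_overlaps_write(n: int, w: int, r: int) -> bool:
    """True when every read set of size r must share at least one
    replica with every write set of size w, out of n replicas."""
    return r + w > n


if __name__ == "__main__":
    n = 3
    q = quorum(n)
    # Quorum writes plus quorum reads always overlap.
    print(q, read_overlaps_write(n, w=q, r=q))
    # Writing and reading a single replica may miss the latest value,
    # so the data is only eventually consistent.
    print(read_overlaps_write(n, w=1, r=1))
```

With N = 3, quorum reads and writes each touch 2 replicas, so they always share at least one; reading and writing a single replica gives no such guarantee.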

Instructor: Chris Tseng



XG-ADM 421 Fundamentals of Hadoop

Programming is key to exploring and extracting intelligence from massive amounts of data. Hadoop is a programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. Below are the class learning objectives:

  • Discuss a brief history of Hadoop and big data
  • Define the components of the HDFS (Hadoop Distributed File System) architecture and describe how data is read and written
  • Write a simple MapReduce program in Java or Python
  • Describe the commonly-used components of the Hadoop ecosystem
  • Delineate an end-to-end architecture and data flow for batch and streaming solutions
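
To give a sense of the MapReduce objective above, here is a minimal word-count sketch in plain Python that imitates the map, shuffle, and reduce steps in a single process. It does not use Hadoop or any Hadoop API, and the function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would
    do between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big ideas"]))
```

In a real cluster the map and reduce functions run in parallel on different machines and the framework performs the shuffle; here everything runs locally to show the data flow.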

Instructors: Chris Tseng and James Casaletto



XG-ADM 422 Fundamentals of Spark

In addition to a programming framework and a robust database, we need an efficient processing engine. Spark is a fast and general engine for large-scale data processing. Below are the class learning objectives:

  • Discuss the motivation for Apache Spark
  • Describe the components of the Spark architecture and deployment modes
  • Define the RDD (Resilient Distributed Dataset), DStream, and SparkSQL data types
  • Identify functionality available in the Spark core, streaming, SparkSQL, and MLlib packages
  • Use the Spark Web UI to monitor jobs, gather metrics, and examine job execution DAGs (Directed Acyclic Graphs)

Prerequisites

  • A desire to learn how to store, access, and analyze massive amounts of data using the foundational technologies taught in the course.
  • Basic understanding of computer operations and concepts of programming.

Instructor: James Casaletto

 

COURSE REFUND POLICY: Full refund (minus a $20 processing fee) if the class is dropped before the first day of instruction. No refund if the class is dropped on or after the first day of instruction.
