Spring 2018

BUS4 118s, Special Topics in MIS - Big Data

This course counts as an MIS elective for purposes of the MIS concentration and can be repeated if a different seminar topic is offered.  As discussed in the course catalog, special topics courses augment the regularly scheduled electives, and this course covers the topic of Big Data.  For Spring 2018 there are two sections offered:

Section 1:

Course number:  21148
Class Time:  Tuesday & Thursday 12:00pm - 1:15pm
Location:  BBC 103

Section 2:

Course number:  21251
Class Time:  Tuesday & Thursday 1:30pm - 2:45pm
Location:  BBC 103

For the detailed syllabus and weekly schedule, please see Canvas.

Course Format: The course is very hands-on: you will use Big Data tools to wrangle, analyze, and visualize a social media dataset.  There will be some lecture/discussion towards the end of the semester on the use of Big Data in different industries, issues of ethics and privacy, additional Big Data strategy issues, and data lakes.

Most classes will be hands-on, so if you plan to use your own laptop, please bring it to class.  If you don't have a laptop, you can check one out either from the Jack Holland Success Center here in the BBC or from the MLK Library.  Library staff report that their laptops are in high demand, so it may be easier to check one out from the Jack Holland Center.  You will run Tableau locally on your laptop, but you will use Apache Spark through a web-based interface that requires only a browser.

During class we will be doing hands-on exercises, each of which is designed to be completed in class.  There will also be four team-based take-home lab assignments in which you apply these same tools to answer a question about the dataset we are working with.  The exercises and labs are designed to get everyone up to speed and comfortable with the tools, since you will use them on a team project.

We will form teams early in the semester.  Your team will pose a potential business question about a real-world social media dataset and then apply the framework we learn in class, along with the Big Data tools, to answer that question.  Since you do not know the answer to your question at the start, you are graded on how you apply the process, how you document your work, how you identify issues in the data, and whether you show curiosity about your data, not on getting a specific result.  Each team will prepare a progress report during the semester and present its results at the end of the semester.

Course Goals and Description:  Data Science is currently a hot topic in industry, and Big Data is the fuel for data science.  In the early years, data scientists were often Ph.D.s from the hard sciences (such as astrophysics), but increasingly data science is a team effort.  The aim of this course is to prepare you for the aspects of data science that consume most of the team's effort and give you skills that can help you enter this exciting field.

Across many industries, 80% of a data scientist's day is spent wrangling data.  This includes getting data formatted, transforming it, and profiling it (asking questions of the data to learn about it).  The "sexy" work of developing complex models is a small part of the job, and businesses get value out of the analysis only when the results can be visualized and communicated to upper management.  In this course we will focus on data wrangling using a dataset provided by Yelp: you will ask questions of the data and then create a visualization to present at the end of the semester.

The importance of data wrangling was summed up by DJ Patil, the first Chief Data Scientist for the U.S. Government (in the Obama administration), who stated that: "Good data scientists understand, in a deep way, that the heavy lifting of cleanup and preparation isn't something that gets in the way of solving the problem: it is the problem."

The Yelp data is available as part of their Dataset Challenge, a data competition for students.  A new dataset is available at the start of each semester and contains reviews, business data, and user data.  Each semester the dataset grows.  Last semester it contained over 4.7 million reviews for businesses in 11 cities.
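
To give a rough sense of what "profiling" a review dataset can look like, here is a minimal sketch in plain Python.  It is not one of the course exercises, and it uses only the standard library rather than Spark or Tableau.  The sample records and the field names ("stars", "city", "text") are invented for illustration and may not match the actual Yelp files.

```python
import json
from collections import Counter

# Hypothetical sample records, one JSON object per line. The field names
# ("stars", "city", "text") are assumptions made for this illustration.
SAMPLE_LINES = [
    '{"review_id": "r1", "city": "Las Vegas", "stars": 5, "text": "Great tacos"}',
    '{"review_id": "r2", "city": "Phoenix", "stars": 2, "text": ""}',
    '{"review_id": "r3", "city": "Las Vegas", "stars": 4}',
    'not valid json',
]


def profile_reviews(lines):
    """Count readable rows, star ratings, cities, and reviews with no text."""
    summary = {
        "rows": 0,          # lines that parsed as JSON
        "bad_rows": 0,      # lines that could not be parsed
        "missing_text": 0,  # reviews whose text is absent or empty
        "stars": Counter(),
        "cities": Counter(),
    }
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            summary["bad_rows"] += 1
            continue
        summary["rows"] += 1
        summary["stars"][record.get("stars")] += 1
        summary["cities"][record.get("city")] += 1
        if not record.get("text"):
            summary["missing_text"] += 1
    return summary


if __name__ == "__main__":
    result = profile_reviews(SAMPLE_LINES)
    print(result["rows"], "rows parsed;", result["bad_rows"], "unreadable")
    print("star ratings:", dict(result["stars"]))
    print("reviews with no text:", result["missing_text"])
```

Simple counts like these are often where issues in the data first show up (unreadable rows, empty reviews, unexpected values), which is the kind of question-asking the team project rewards.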

The tools in this field continue to evolve, but we will be using the following:
Apache Spark: Currently one of the fastest-growing Big Data tools, hosted on IBM's cloud using Jupyter notebooks, which are currently popular with data scientists.
Tableau: One of the most popular visualization tools and a skill that prior students have found to be in demand by recruiters.

Textbooks:  We will be using chapters from a number of books that are either available online for free from the MLK Library to you as a student or available for free from some of the tool vendors.  For thinking about how to frame the questions you ask of the data, we will be using Thinking with Data: How to Turn Information into Insights, by Max Shron.  You can read this online for free from the MLK Library (as long as you are an SJSU student or live in San Jose).

Prerequisites: Currently none.  BUS4 92, Introduction to Business Programming, and BUS4 112, Database Management Systems, both provide helpful background for this course, but neither is required.  The exercises will walk you through the tools step by step.  We will also have a couple of sessions where we do a hands-on review of topics you may have covered in more detail in those courses.  Curiosity is a greater asset than specific technical skills.


BUS4 188, Business Systems and Policy

Section 6:

Course number: 21347
Class Time: Tuesday 6:00pm - 8:45pm
Location: BBC 103

Course Description: In today’s business environment technology plays a significant role, so an understanding of information systems is needed for businesses to be able to compete effectively. This course provides an introduction to the information systems used in business, including key terms, concepts, and capabilities, as well as how technology impacts business organizations.  See the course catalog for the catalog description and prerequisites.

Course Format: We meet once per week, and most weeks the course will consist of a combination of lecture, exercises, and in-class projects where you will work with your team.  There are approximately 9 in-class team projects, and you will rotate through three teams.  The teams will be diverse in both major and gender, since industry has found that diverse teams generate better results.  There will be a midterm and a final exam, with two online quizzes before the midterm and two before the final; the quizzes are designed as a study aid for the exams.  We will also have a series of six lab projects in which you will use a Salesforce developer account to do lab exercises.  These labs use an engaging format, and since they are based on a currently in-demand skill, they have helped prior students in landing internships and jobs.