Data Science for All Seminar Series

 

Learning Objectives

Seminar Description

Demystifying Artificial Intelligence (AI) (Harlan Findley)

After completing the seminar you should be able to:

  1. Describe the basics of how machine learning works
  2. Explain how machine learning is being used to improve business processes and which parts of the economy will be most affected by ML-driven automation

Machine Learning is dramatically changing the way many companies do business. But we’re only at the beginning of the revolution: as companies, governments, and individuals learn how to leverage Machine Learning and enable widespread automation, we will see an immensely disruptive impact on the economy, and on the future of work itself.

Machine Learning (ML) describes systems that learn from historical data and use it to make useful predictions. Please join us for a seminar on the basic concepts of ML and automation, as well as a discussion of the potential impacts we’ll see from automation over the next ten years. The seminar is designed for a non-technical audience (we won’t code, and there will be only a bit of light math), but all are welcome!

Python Foundations for Data Science (Dr. Esperanza Huerta)

After completing the seminar you should be able to:

  1. Describe the data science system
  2. Explain how computers work
  3. Explain programming in data science
  4. Interpret, modify and create basic programs in Python

 

This seminar gives you the programming foundations to perform some of the steps in the data science system. Data scientists analyze a great deal of data, always assisted by computers. They tell computers what to do by writing programs made up of very detailed instructions.

For this and other seminars, the programming language used is Python. You should take this seminar if you have no programming experience. If you already know how to program in another language, you could learn Python on your own.
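As a rough illustration of what "very detailed instructions" can look like, here is a minimal Python sketch. It is not taken from the seminar materials; the function name `average` and the sample numbers are made up for this example.

```python
def average(values):
    """Return the arithmetic mean of a list of numbers."""
    total = 0
    # Spell out every step: add each number to a running total.
    for value in values:
        total = total + value
    # Divide the total by how many numbers there were.
    return total / len(values)


# Hypothetical sample data, for illustration only.
daily_sales = [120, 95, 130, 110]
print(average(daily_sales))
```

Even a task as small as averaging four numbers has to be broken into explicit steps (start at zero, add each value, divide by the count), which is the kind of thinking the seminar description refers to.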

Neo4j

(Dr. Scott Jensen)

After completing the pre-seminar you should understand how to:

●     Install Neo4j on your computer

●     Configure the basic computer settings for Neo4j

●     Load an existing Neo4j database

 

After participating in the seminar and completing the post-seminar assessment to obtain your digital badge, you should be able to:

●     Describe why graph databases are used to explore social networks

●     Describe relationships in graphs

●     Write basic Cypher queries

●     Load data into a graph database

●     Generate visualizations of network relationships

NoSQL databases are common in Big Data, and graph databases are one of the hottest areas in NoSQL!

Data science is about exploring the patterns and relationships in data, and graph databases are the key to exploring relationships in networks - such as the tsunami of data from social networks.  In this seminar we will be using the Neo4j graph database to explore relationships in a social network.

In business, this means discovering relationships between customers, their purchases, and their behaviors.  But it’s not only for businesses.  The International Consortium of Investigative Journalists also used Neo4j to enable investigative journalists across the globe to discover previously hidden relationships between politicians and offshore tax havens.  So whether your interest is in tracking the relationships between customers, between politicians and tax havens, detecting financial fraud, or tracking the spread of infectious diseases, this seminar will give you a new tool to discover the relationships of interest to you.

Statistical Foundations of Data Science

(Dr. Subhankar Dhar)

After completing the seminar, you should be able to:

  1. Understand basic statistical principles often used by data scientists
  2. Apply common statistical tools and techniques used in Data Science
  3. Use Python and Jupyter Notebook to analyze large datasets
  4. Visualize and interpret results for decision making

The seminar has three aspects: analysis of data for inferential thinking, computational thinking, and real-world applications for decision making. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The seminar delves into important statistical concepts along with relevant computer programming fundamentals, in conjunction with hands-on analysis of real-world datasets.
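To give a flavor of the kind of analysis described above, here is a small sketch using only Python's standard `statistics` module. The data and the `summarize` helper are hypothetical and are not part of the seminar materials; the seminar itself may use different tools and datasets.

```python
import statistics

# Hypothetical sample: commute times in minutes for seven people.
commute_minutes = [22, 35, 28, 41, 30, 26, 33]


def summarize(sample):
    """Return a few basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(sample),      # typical value
        "median": statistics.median(sample),  # middle value when sorted
        "stdev": statistics.stdev(sample),    # sample standard deviation
    }


summary = summarize(commute_minutes)
print(summary)
```

Summaries like these are usually a first step: they describe the sample at hand, and inferential methods then ask what the sample suggests about the wider phenomenon.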

Spark and Jupyter (Dr. Scott Jensen)

After participating in the seminar and completing the post-seminar assessment to obtain your digital badge, you should be able to:

  1. Load data into Spark DataFrames and ask basic questions of your data using PySpark
  2. Understand the importance of documenting your work and using markdown in Jupyter notebooks
  3. Create basic visualizations in Jupyter
  4. Share and publish your results

Apache Spark and Jupyter notebooks are currently two of the hottest tools in data science. Through a web-based interface you can explore and experiment with large datasets without extensive coding. In this seminar you will use these tools to explore a large dataset and visualize your results.

Jupyter and Spark are used by data scientists at some of the largest web-based companies in Silicon Valley. Apache Spark allows them to explore large datasets in varied formats and quickly identify patterns in the data. Jupyter notebooks allow them not only to visualize and document their results, but also to share them easily with colleagues.

Brief summary for poster:

You will use some of the hottest web-based tools in data science to explore a large dataset, visualize your analysis, and publish your results.