Exploring Relationships in Graphs Seminar
Instructor: Dr. Scott Jensen
This seminar was first presented at San Jose State University during the Spring 2019 semester.
Why we hope to see you at the seminar!
No prior knowledge of graphs or programming is required; just curiosity! If you have ever wondered how companies explore your data to suggest new social media connections or products you may be interested in, this seminar is for you! If you have ever been fascinated by journalists using data to connect people and events, this seminar is for you!
Data science is about exploring the patterns and relationships in data, and graph databases are a key tool for exploring relationships in networks – such as the tsunami of data from social networks. In this seminar we will use the Neo4j graph database to explore relationships in a social network. Graphs are composed of “nodes” and the relationships (edges) between those nodes. For example, in a social network, you and each of your friends would be nodes, and the relationship between you would be labeled “FRIEND”. Other people in your social network would also be nodes, but connected through other types of relationships, such as “PARENT” or “SIGNIFICANT OTHER”. Relationships are directional (you would have a PARENT relationship to each of your parents, but they would not have a PARENT relationship to you).
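To make the idea of directional, typed relationships concrete, here is a minimal plain-Python sketch. It is not Neo4j and not how Neo4j stores data; the `Graph` class, its methods, and the node names are hypothetical, chosen only to show that following a PARENT relationship from you reaches your parents, while following PARENT from a parent reaches no one.

```python
from collections import defaultdict


class Graph:
    """Hypothetical toy graph: nodes plus directed, typed relationships."""

    def __init__(self):
        self.nodes = set()
        # start node -> list of (relationship type, end node)
        self.out_edges = defaultdict(list)

    def add_relationship(self, start, rel_type, end):
        # Adding a relationship also adds both of its nodes.
        self.nodes.update([start, end])
        self.out_edges[start].append((rel_type, end))

    def neighbors(self, node, rel_type):
        # Follows relationships only in the direction they were stored.
        return [end for t, end in self.out_edges[node] if t == rel_type]


g = Graph()
g.add_relationship("You", "PARENT", "Mom")
g.add_relationship("You", "PARENT", "Dad")
g.add_relationship("You", "FRIEND", "Alex")

print(g.neighbors("You", "PARENT"))  # ['Mom', 'Dad']
print(g.neighbors("Mom", "PARENT"))  # [] because the relationship points from You to Mom
```

In the sketch, direction is simply which node a relationship starts from. Whether a query respects that direction or ignores it is a choice made when asking the question.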
In business, this means discovering relationships between customers, their purchases, and their behaviors. Graphs enable features such as “people you may know” or recommending other products to purchase, songs to listen to, or people to date. But graphs aren't only for businesses. The International Consortium of Investigative Journalists used Neo4j to enable investigative journalists across the globe to discover previously hidden relationships between politicians and offshore tax havens. So whether your interest is in tracking the relationships between customers, between politicians and tax havens, detecting financial fraud, or tracking the spread of infectious diseases, this seminar will enable you to discover the relationships of interest to you!
After participating in the seminar and completing the post-seminar assessment, you will be able to:
- Describe why graph databases are used to explore social networks
- Describe relationships in graphs
- Write basic Cypher queries
- Load data into a graph database
- Generate visualizations of network relationships
You will be working with a dataset made available by Yelp, and we will be looking at restaurant and bar reviews, who is reviewing which businesses, the cuisines restaurants serve, the entertainment bars provide, where businesses are located, and the friend relationships between users. Although this is only a small segment of Yelp’s data, the graph you will be working with contains approximately 20 million relationships! We will explore patterns in users reviewing restaurants and also explore using the database for restaurant recommendations. For example, starting with a user who has a lot of fans, can we use their network to make pizza recommendations based on reviews by friends of their friends (people who are not direct friends of the user) who have been similarly critical of a restaurant the user has also reviewed?
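The pizza-recommendation question above involves several steps: find friends of friends, exclude direct friends, keep those who were similarly critical of a restaurant the user also reviewed, and then collect the pizza places they liked. The sketch below walks through that logic in plain Python on a tiny hypothetical dataset. It is not the Yelp data and not a Neo4j query. The user and business names, the 2-star "critical" threshold, and the 4-star "liked" threshold are all assumptions made for illustration.

```python
# Hypothetical toy data (NOT from the Yelp dataset).
friends = {
    "ana": {"ben", "cal"},
    "ben": {"ana", "dee"},
    "cal": {"ana", "eve"},
    "dee": {"ben"},
    "eve": {"cal"},
}
# user -> {business: stars (1-5)}
reviews = {
    "ana": {"Joe's Diner": 1},
    "dee": {"Joe's Diner": 2, "Slice House": 5},
    "eve": {"Joe's Diner": 5, "Pie Palace": 5},
}
pizza_places = {"Slice House", "Pie Palace"}


def friends_of_friends(user):
    """Friends of the user's friends, excluding the user and direct friends."""
    direct = friends.get(user, set())
    fof = set()
    for f in direct:
        fof |= friends.get(f, set())
    return fof - direct - {user}


def similarly_critical(user, other, max_stars=2):
    """True if both gave a low rating to some business they both reviewed."""
    mine = reviews.get(user, {})
    theirs = reviews.get(other, {})
    shared = mine.keys() & theirs.keys()
    return any(mine[b] <= max_stars and theirs[b] <= max_stars for b in shared)


def recommend_pizza(user, min_stars=4):
    """Pizza places liked by similarly critical friends of friends."""
    already_reviewed = reviews.get(user, {})
    recs = set()
    for other in friends_of_friends(user):
        if not similarly_critical(user, other):
            continue
        for biz, stars in reviews.get(other, {}).items():
            if biz in pizza_places and stars >= min_stars and biz not in already_reviewed:
                recs.add(biz)
    return sorted(recs)


print(recommend_pizza("ana"))  # ['Slice House']
```

Here "eve" is also a friend of a friend, but she rated Joe's Diner highly while "ana" rated it poorly, so her favorite pizza place is not recommended.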
The database we will be using will be installed on the lab computers, but instructions for creating and installing the database on your own computer are included below if you wish to experiment with it before or after the seminar. No prior experience is needed, but to get the most out of the seminar, please do the following:
How to get started
- Register for the seminar – it's 100% free, and registering will give you access to a Canvas course with all of the seminar materials, optional pre-seminar exercises, and additional materials (some of these are included below, but they are more convenient to access in Canvas).
- Try out the pre-seminar exercise in Canvas. This includes using web-based, pre-populated databases that Neo4j makes available for you to play with – all you need is a browser!
Note: additional materials are available in Canvas after you register
- At the start of the seminar, you will be required to click on the Yelp dataset download link and accept the license agreement for the Yelp data [pdf]. Since we have already created the graph database, you will not need to download the dataset.
- Pre-seminar exercise. In addition to optional videos and websites with examples using graphs, there is an exercise in the Canvas module based on the Web-based Neo4j Recommendation sandbox. This is a database Neo4j has created for movie recommendations and you only need a web browser to access it. The exercise provides a brief discussion of graph databases and walks you through signing up for the sandbox and doing some initial queries to generate visualizations.
- Seminar slides. This is a PowerPoint file of the slides from the seminar. Feel free to look at them beforehand, but if you don’t understand them before the seminar, that’s fine! We will walk through the topics covered in the slides together.
Faculty Materials & Community Colleges
- If you are a faculty member at SJSU or any university or community college, and you would like to host a seminar at your school or use the materials in your course, see the Teaching Materials page to request the additional materials available.
- If you are a Dean or faculty member at a Bay Area community college, we would like to hear from you! We are working with community college faculty in the Bay Area and provide stipends for attending the seminar and helping present it at your school.
- Are you a Bay Area Community College student? Ask your professors if they could incorporate the seminar into your current class or host a student event.