Big Data Events

Seminars, Webinars and Brown Bag Lunches!

Each semester the Initiative sponsors or co-sponsors a variety of events that focus exclusively on big data issues. Here are the latest events — come and enjoy:

April 7 @10:30 | Big Data Architectures & Projects

When Nathan Marz coined the term Lambda Architecture back in 2012, he may only have been in search of a catchy title for his upcoming book. No doubt, the Lambda Architecture has since gained traction, serving as a blueprint for building large-scale, distributed data processing systems in a flexible and extensible manner. But there is also a sometimes overlooked aspect of the Lambda Architecture: human fault tolerance. Humans make mistakes. Machines don't. Machines scale. Humans don't. By reviewing a number of real-world architectures of distributed applications from our customer and partner base, I will try to answer the following questions:

    • Which Apache Hadoop ecosystem components are useful for which layer of the Lambda Architecture?
    • What is the impact on human fault tolerance of choosing certain components?
    • Are there good practices for using certain Apache Hadoop ecosystem components in the three-layered Lambda Architecture?
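To make the three layers concrete, here is a minimal, purely illustrative sketch (not from the talk) of a toy page-view counter: the batch layer recomputes an authoritative view from an immutable master dataset, the speed layer absorbs recent events incrementally, and the serving layer merges the two at query time. All names here are made up for illustration.

```python
from collections import Counter

# Batch layer: recompute the view from the immutable master dataset.
master_dataset = ["home", "about", "home", "pricing", "home"]

def batch_view(events):
    """Recomputed from scratch on every batch run -- slow but authoritative."""
    return Counter(events)

# Speed layer: incrementally absorb events that arrived after the last batch run.
realtime_view = Counter()

def on_new_event(page):
    realtime_view[page] += 1

# Serving layer: answer queries by merging the batch and real-time views.
def query(page):
    return batch_view(master_dataset)[page] + realtime_view[page]

on_new_event("home")
print(query("home"))  # 3 from the batch view + 1 from the speed layer = 4
```

The human-fault-tolerance angle shows up here: because the master dataset is append-only and the batch view is always recomputed from it, a bug in the speed layer's logic can be fixed and its damage erased by the next batch run.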

Speaker: Michael Hausenblas, Chief Data Engineer EMEA, MapR Technologies. The event will be held in DH 450.

April 9 @1:30 | Big Data Analytics: Introduction to H2O

Data modeling has been constrained by scale; sampling still rules the day for ad hoc analytics. Scale brings much-needed change to the modeling world. In this talk we present the predictive power of running sophisticated algorithms on big datasets. With large data sizes comes the particularly hard problem of unbalanced data with multiple, asymmetrically rare classes. Missing features pose unique problems for most classification and regression algorithms, and proper handling can lead to greater predictive power. In the race for better predictions, H2O makes practical techniques accessible to anyone through an easy-to-use software product.
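The two problems named above — rare classes and missing features — can be sketched in a few lines. This is not H2O's internal handling (H2O deals with both automatically); it is a toy scikit-learn analogue showing the same two ideas: impute the gaps, then reweight the rare class. The data and pipeline are invented for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data: two features with missing values, and an asymmetrically rare class.
X = np.array([[1.0, 2.0], [np.nan, 3.0], [2.0, np.nan],
              [8.0, 9.0], [1.5, 2.5], [2.5, 3.5]])
y = np.array([0, 0, 0, 1, 0, 0])  # class 1 appears only once

model = make_pipeline(
    SimpleImputer(strategy="mean"),               # fill in missing features
    LogisticRegression(class_weight="balanced"),  # upweight the rare class
)
model.fit(X, y)
print(model.predict([[8.0, np.nan]]))  # the imputer also handles gaps at scoring time
```

Without `class_weight="balanced"`, a classifier on data this skewed can score well by always predicting the majority class; reweighting forces it to pay attention to the rare one.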

H2O is an open-source math and machine learning engine for big data that brings distribution and parallelism to powerful algorithms while keeping the widely used languages of R and JSON as its API, and it integrates neatly into the popular data ecosystems of Hadoop, Amazon S3, NoSQL, and SQL. We briefly discuss design choices in the implementation of Distributed Random Forest and Generalized Linear Modeling, and how we bring speed and scale to the vox populi of data science, R. We take a peek at the elegant, Lego-like infrastructure that brings fine-grained parallelism to math over simple distributed arrays. A short live data-hacking demo presents the life cycle of data science: powerful data manipulation via R at scale, interactive summarization over large datasets, modeling using Elastic Net (GLM), grid search for the best parameters, and low-latency scoring. The event will be held in MH 225.

April 14 @1:30 | Big Data Analytics: Hands-on with H2O

This session is a hands-on follow-up to the April 9 introduction to H2O, working through the same themes in practice: modeling sophisticated algorithms on big datasets, handling unbalanced data with asymmetrically rare classes, and dealing with missing features.