Topic guide to Data Day Seattle 2016

Here is a brief guide to some of the topics covered at Data Day Seattle 2016.

Keynote

9:00 am Maslow's Hierarchy of Needs for Databases - Charity Majors (Honeycomb)

Big Picture

10:00 am Elevating Your Data Platform - Kurt Brown (Netflix)
11:00 am: Paths of Learning: The most effective way to learn about learning is to play among lovely graphs - Taylor Martin (O'Reilly)

Data Pipelines

10:00 am: Elevating Your Data Platform - Kurt Brown (Netflix)
12:50 pm Data Pipelines with Kafka and Spark (2 hour workshop) - John Akred / Mark Mims / Stephen O'Sullivan (Silicon Valley Data Science)
2:40 pm Introducing Apache Airflow (Incubating) - A Better Way to Build Data Pipelines - Siddharth Anand (Agari)
4:15 pm Open Source Lambda Architecture with Kafka, Samza, Hadoop, and Druid - Fangjin Yang (Imply)
Building Recommendations at Scale: Lessons Learned - Preetha Appan (Indeed)

Data Science

10:00 am: How Machine Learning is like Cycling - Michelle Casbon (Qordoba)
10:00 am: Data Infrastructure at a Small Company - Melissa Santos (Big Cartel)
10:00 am: Visualizing the Model Selection Process - Benjamin Bengfort (District Data Labs)
10:00 am Lessons learned from deploying the top deep learning frameworks in production - Kenny Daniel (Algorithmia)
11:00 am Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS and to the 2016 Primary Elections - Steve Kramer (Paragon Science)
11:00 am What's Your Data Worth? - John Akred (Silicon Valley Data Science)
2:40 pm: How to Observe: Lessons from Epidemiologists, Actuaries and Charlatans - Juliet Hougland (Cloudera)
2:40 pm Catching trains: Iterative model development with Jupyter Notebook - Chloe Mawer (Silicon Valley Data Science)
Data Science for the Masses: Can KNIME make the impossible possible? - Michael Berthold (KNIME)
Transforming Data to Unlock Its Latent Value - Tony Ojeda (District Data Labs)
Modernizing the Fashion Industry with Data - Andy Terrel (Fashion Metric)
4:15 pm Web Scraping in a JavaScript World - Ryan Mitchell (HedgeServ)

Natural Language Processing / Text

10:00 am: How Machine Learning is like Cycling - Michelle Casbon (Qordoba)
11:00 am: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec - Christopher Moody (Stitch Fix)
12:50 pm: Deep Learning for Natural Language Processing - Jonathan Mugan (Deep Grammar)
1:45 pm: Generating personalized travel recommendations from natural language queries - Melanie Tosik (WayBlazer)
2:40 pm: NLP @HomeAway: how to mine reviews and track competition - Brent Schneeman (HomeAway)
4:15 pm: NLP for the web: augmenting traditional systems with web specific features - Matthew Peters (Moz)
5:10 pm Turning Unstructured Data into Kernels of Ideas - Jason Kessler (CDK Digital Marketing)

Machine Learning

10:00 am: How Machine Learning is like Cycling - Michelle Casbon (Qordoba)
10:00 am: Visualizing the Model Selection Process - Benjamin Bengfort (District Data Labs)
12:50 pm: Building better models faster using active learning - Nicholas Gaylord (CrowdFlower)
Transforming Data to Unlock Its Latent Value - Tony Ojeda (District Data Labs)
Building Recommendations at Scale: Lessons Learned - Preetha Appan (Indeed)

Algorithms Everywhere

10:00 am Lessons learned from deploying the top deep learning frameworks in production - Kenny Daniel (Algorithmia)
4:15 pm: Thinking like Spark: How trying to optimize one algorithm helped me re-think distributed data processing - Rachel Warren (Alpine Data Labs)

Data Analysis / Analytics

10:00 am How to Visualize Graph Data: A Developer’s Guide - Corey Lanum (Cambridge Intelligence)
Data Science for the Masses: Can KNIME make the impossible possible? - Michael Berthold (KNIME)
Transforming Data to Unlock Its Latent Value - Tony Ojeda (District Data Labs)
10:00 am Data Infrastructure at a Small Company - Melissa Santos (Big Cartel)

Graph Databases and Processing

10:00 am How to Visualize Graph Data: A Developer’s Guide - Corey Lanum (Cambridge Intelligence)
11:00 am: Paths of Learning: The most effective way to learn about learning is to play among lovely graphs - Taylor Martin (O'Reilly)
12:50 pm: Graphs vs Tables: Ready? Fight. (2 hour workshop) pt 1 of 2 - Denise Gosnell (PokitDok)
1:45 pm: Graphs vs Tables: Ready? Fight. (2 hour workshop) pt 2 of 2 - Denise Gosnell (PokitDok)
1:45 pm: Generating personalized travel recommendations from natural language queries - Melanie Tosik (WayBlazer)
2:40 pm: Graph Database Engine Shootout: Part 1 - Josh Perryman (Expero)
4:15 pm: Graph Database Engine Shootout: Part 2 - Josh Perryman (Expero)
5:10 pm: Virtualizing Relational Databases as Graphs: a multi-model approach - Juan Sequeda (Capsenta)

Cassandra

12:50 pm: A Little Cassandra for the Relational Brain - Patrick McFadin (DataStax)
12:50 pm: Graphs vs Tables: Ready? Fight. (2 hours) pt 1 of 2 - Denise Gosnell (PokitDok)
1:45 pm: Graphs vs Tables: Ready? Fight. (2 hours) pt 2 of 2 - Denise Gosnell (PokitDok)
1:45 pm: Stuff you should know as an Advanced Cassandra user - Patrick McFadin (DataStax)
4:15 pm: [Eric Lubow's famous and scary talk about Cassandra counters - with updates] - Eric Lubow (SimpleReach)

Hadoop

10:00 am: Elevating Your Data Platform - Kurt Brown (Netflix)
2:40 pm Real-time Search on Terabytes of Data Per Day: Lessons Learned - Joey Echeverria (Rocana)
2:40 pm Introducing Apache Airflow (Incubating) - A Better Way to Build Data Pipelines - Siddharth Anand (Agari)
4:15 pm Open Source Lambda Architecture with Kafka, Samza, Hadoop, and Druid - Fangjin Yang (Imply)

Kafka

10:00 am STREAMING KEYNOTE: Large-scale stream processing using Apache Kafka - Jay Kreps (Confluent)
12:50 pm Data Pipelines with Kafka and Spark (2 hour workshop) - John Akred / Mark Mims / Stephen O'Sullivan (Silicon Valley Data Science)
2:40 pm Real-time Search on Terabytes of Data Per Day: Lessons Learned - Joey Echeverria (Rocana)
4:15 pm Open Source Lambda Architecture with Kafka, Samza, Hadoop, and Druid - Fangjin Yang (Imply)
5:10 pm Extreme Streaming Processing at Uber - Hien Luu (Uber)

Python

10:00 am Visualizing the Model Selection Process - Benjamin Bengfort (District Data Labs)
2:40 pm Catching trains: Iterative model development with Jupyter Notebook - Chloe Mawer (Silicon Valley Data Science)
4:15 pm Web Scraping in a JavaScript World - Ryan Mitchell (HedgeServ)
Transforming Data to Unlock Its Latent Value - Tony Ojeda (District Data Labs)

Spark

10:00 am: Elevating Your Data Platform - Kurt Brown (Netflix)
11:00 am: Beyond Shuffling - Tips & Tricks for Scaling Apache Spark Programs - Holden Karau (IBM)
12:50 pm Data Pipelines with Kafka and Spark (2 hour workshop) - John Akred / Mark Mims / Stephen O'Sullivan (Silicon Valley Data Science)
4:15 pm: Thinking like Spark: How trying to optimize one algorithm helped me re-think distributed data processing - Rachel Warren (Alpine Data Labs)
5:10 pm Extreme Streaming Processing at Uber - Hien Luu (Uber)

Databases

9:00 am: Maslow's Hierarchy of Needs for Databases - Charity Majors (Honeycomb)
10:00 am: In Search of Database Nirvana – The Challenges of Delivering Hybrid Transaction/Analytical Processing - Rohit Jain (Esgyn)
2:40 pm Graph Database Engine Shootout: Part 1 - Josh Perryman (Expero)
4:15 pm Graph Database Engine Shootout: Part 2 - Josh Perryman (Expero)

Deep Learning / Neural Networks

10:00 am Lessons learned from deploying the top deep learning frameworks in production - Kenny Daniel (Algorithmia)
12:50 pm: Deep Learning for Natural Language Processing - Jonathan Mugan (Deep Grammar)
1:45 pm: Distilling dark knowledge from neural networks - Alex Korbonits (Remitly)

Data Valuation

11:00 am What's Your Data Worth? - John Akred (Silicon Valley Data Science)
Transforming Data to Unlock Its Latent Value - Tony Ojeda (District Data Labs)

Data Visualization

10:00 am How to Visualize Graph Data: A Developer’s Guide - Corey Lanum (Cambridge Intelligence)
10:00 am Visualizing the Model Selection Process - Benjamin Bengfort (District Data Labs)

Streaming Data

5:10 pm Extreme Streaming Processing at Uber - Hien Luu (Uber)