Confirmed Sessions for Data Day Seattle

We have just begun announcing the confirmed sessions. Check this page regularly for updates.

Foundations of Streaming SQL or: How I Learned to Love Stream & Table Theory

Tyler Akidau - Google

What does it mean to execute robust streaming queries in SQL? What is the relationship of streaming queries to classic relational queries? Are streams and tables the same thing conceptually, or different? And how does all of this relate to the programmatic frameworks we’re all familiar with? This talk will address all of those questions in two parts.
First, we’ll explore the relationship between the Apache Beam Model and stream & table theory (as popularized by Martin Kleppmann and Jay Kreps, amongst others, but essentially originating out of the database world). It turns out that stream & table theory does an illuminating job of describing the low-level concepts that underlie the Beam Model.
Second, we’ll apply our clear understanding of that relationship towards explaining what is required to provide robust stream processing support in SQL. We’ll discuss concrete efforts that have been made in this area by the Apache Beam, Calcite, and Flink communities, and talk about new ideas yet to come.
In the end, you can expect to have a much better understanding of the key concepts underpinning data processing, regardless of whether that data processing is batch or streaming, SQL or programmatic, as well as a concrete notion of what robust stream processing in SQL looks like.

Graph Sessions

Securing Federated Data with TinkerPop and How to Handle the “Search Engine” Problem

Josh Perryman - Expero

“Hey, I have a great idea,” the CIO said, “let’s use a graph database to integrate all of our enterprise data sources!” “But what about security?” you replied. “Oh, it can’t be that hard,” he said dismissively. “Graph is easy. I’m sure that security in graph will be a cakewalk for you.”
Little did he know the mighty project of woe he had just bequeathed to you. Security was easy to model when you talked to the owner of the first data source, but when you had the requirements conversation with the business owner of the second data source, and then the third, and each one thereafter, you started to fret. Security in a graph database is neither easy nor obvious. And when full-text search capabilities made it into the MVP requirements, you started to despair.
With his unique style of wisdom and wit, Josh will look at the options for doing “cell-level” security in a TinkerPop3-based graph data engine, and at how to solve “the search engine problem” as well.

Graph Data Obfuscation

Mike Downie - Expero

It only takes 5 pieces of identifying data to uniquely identify someone within a zip code with fairly high confidence. If you have HIPAA or PCI (or both!) compliance requirements and you are using graph, that problem just got hard. The graph ecosystem doesn’t have the same obfuscation support as other data engines. Since graph is an enabling and strategic technology for many projects, and regulatory requirements aren’t going away, you have to solve the data obfuscation problem yourself. Come join Mike as he looks at techniques and strategies for obfuscating, de-identifying, or encrypting data in your graph implementation.

Graph Representations in Machine Learning

Trey Wilson / Steve Purves - Expero

Success in machine learning is all about the data we have available. Using graph representations and graph analytics enables organisations to understand their data in new and powerful ways. Many datasets are naturally graphs, and others benefit from being treated as graphs. There is, then, significant potential for graph analytics and machine learning to be used together. This session will cover the intersection of these two fields at a high level. We will introduce basic concepts and explore three areas in which graph-based machine learning is gaining traction: measures on graphs for features and constraints, semi-supervised learning with graphs, and graph-based deep learning.

Interactive Prototyping of Graph Applications with JanusGraph

Alan Pita - Expero

Graph databases are being used to create applications that can expose new insights into very large, complex datasets. The technical skills needed to write programs that take advantage of graph query languages such as Gremlin and Cypher are in short supply, and even with a team of top-notch experts, the first working prototype consumes weeks of coding effort in several programming languages.
This talk explores the essential steps, components, and technologies necessary to build a scalable cloud-based graph database application. We will also examine the concept of model-driven development and how it might be applied to accelerate the development of such applications. Finally, we will show examples of how model-driven development can create rapid prototypes for graph applications that solve problems for real clients.