We are continuing to announce the speakers for Data Day Seattle 2016. We have 20+ more speakers yet to announce. Check back regularly for updates. If you wish to speak at Data Day Seattle, there is still time to submit a proposal. Details on our proposals page.
John Akred is the Founder and CTO of Silicon Valley Data Science. In the business world, John Akred likes to help organizations become more data driven. He has over 15 years of experience in machine learning, predictive modeling, and analytical system architecture. His focus is on the intersection of data science tools and techniques; data transport, processing and storage technologies; and the data management strategy and practices that can unlock data driven capabilities for an organization. A frequent speaker at the O'Reilly Strata Conferences, John is host of the perennially popular workshop: Building A Data Platform.
John will also be hosting office hours at Data Day Seattle.
Siddarth Anand (SF Bay) @r39132
Siddarth Anand is a hands-on software architect with deep experience building and scaling data infrastructure at high-traffic web sites. Sid currently serves as the Data Architect for Agari, a rising email security company. Prior to joining Agari, Sid held several technical and leadership positions including LinkedIn’s Search Architect, Netflix’s Cloud Data Architect, Etsy’s VP of Engineering, and several technical roles at eBay. Sid earned his BS and MS degrees in CS from Cornell University, where he focused on Distributed Systems. Sid is also PPMC & committer on Apache Airflow.
Preetha Appan (Austin)
Preetha Appan is the technical lead of the recommendations team at Indeed.com. Her past contributions to Indeed's job and resume search engines include keyword tokenization improvements, query expansion features, and major infrastructure and performance improvements. She enjoys working on challenging problems in machine learning and information retrieval.
Preetha Appan's session: Building Recommendations at Scale: Lessons Learned.
Benjamin Bengfort (Washington D.C.) @bbengfort
Benjamin Bengfort, Partner and Head Faculty of District Data Labs, is a Data Scientist who lives inside the beltway but ignores politics (the normal business of DC), favoring technology instead. He is currently working to finish his PhD at the University of Maryland where he studies machine learning and artificial intelligence. His lab does have robots (though this field of study is not one he favors) and, much to his chagrin, they seem to constantly arm said robots with knives and tools; presumably to pursue culinary accolades. Having seen a robot attempt to slice a tomato, Benjamin prefers his own adventures in the kitchen where he specializes in fusion French and Guyanese cuisine as well as BBQ of all types. A professional programmer by trade, a Data Scientist by avocation, Benjamin's writing and instruction pursues a diverse range of subjects from Natural Language Processing, to Data Science with Python to analytics with Hadoop and Spark.
While at Data Day, Benjamin will be holding office hours and signing copies of his upcoming O'Reilly book: Data Analytics with Hadoop.
Michael Berthold (Konstanz) @MRBerthold
Michael Berthold is co-founder of KNIME, the open analytics platform used by thousands of data experts around the world. He is currently president of KNIME.com AG and a professor at Konstanz University, where his research interests include bisociative data analysis and widening of mining algorithms. Previously he held positions in both academia (Carnegie Mellon, UC Berkeley) and industry (Intel, Tripos). Michael has co-authored two successful data analysis textbooks and is a frequent speaker at both academic and industrial conferences. If time permits, he still writes code.
Check out our interview with Michael Berthold.
Kurt Brown (SFBay)
Kurt Brown leads the Data Platform team at Netflix. His group architects and manages the technical infrastructure underpinning the company’s analytics. The Netflix data infrastructure includes various big data technologies (Hadoop, Hive, and Pig), Netflix open sourced applications and services (Lipstick and Genie), and traditional BI tools (Teradata and MicroStrategy).
Kurt will be speaking on How to get the most out of your data platform.
Following his presentation at Data Day, Kurt will be holding office hours and discussing careers at Netflix.
Michelle Casbon (San Antonio) @texasmichelle
Michelle Casbon is a Senior Data Science Engineer at Idibon, where she is contributing to the goal of bringing language technologies to all the world’s languages. Her development experience spans a decade across various industries, including media, investment banking, healthcare, retail, and geospatial services. Michelle completed a Masters at the University of Cambridge, focusing on NLP, speech recognition, speech synthesis, and machine translation. She loves working with open source technologies and has had a blast contributing to the Apache Spark project. Holding technical conversations and learning from the people she meets is her favorite part of Data Day Seattle.
Michelle Casbon will be appearing as part of NLP Day Seattle.
Check out our interview with Michelle Casbon.
(NEW) Nicholas Gaylord (SF Bay)
Nicholas Gaylord is a data scientist at Crowdflower. Formerly, he was at the NLP startup, Idibon, where he worked on designing and improving machine learning models to meet clients' diverse text analytics needs. Nicholas' background includes work in marketing and university education, and he holds a PhD in Linguistics from the University of Texas at Austin, where he specialized in experimental studies of human language comprehension with a secondary emphasis on corpus design and annotation.
Nicholas Gaylord will be appearing as part of NLP Day Seattle.
Dr. Denise Koessler Gosnell (Charleston) @DeniseKGosnell
Dr. Denise Gosnell, a driving member of the PokitDok Data Science team since 2014, has brought her research in applied graph theory to help architect the graph database while also serving as an analytics thought leader. Her work with the Data Science team aims to extract insight from the trenches of hidden data in healthcare and build products to bring the industry into the 21st century. She has represented PokitDok's Data Science Team at numerous conferences including PyData, KDD (Knowledge Discovery & Data Mining), and the inaugural GraphDay.
Prior to PokitDok, Dr. Gosnell earned her Ph.D. in Computer Science from the University of Tennessee. Her research on how our online interactions leave behind unique identifiers that form a “social fingerprint” led to presentations at major conferences from San Diego to London and drew the interest of such tech industry giants as Microsoft Research and Apple. Additionally, she was a leader in addressing the underrepresentation of women in her field and founded a branch of Sheryl Sandberg's Lean In Circles.
Check out our interview with Denise Gosnell.
Jon Haddad has 15 years of experience in both development and operations. For the last 10, he’s worked at various startups in southern California. For the last two years, he’s been the maintainer of cqlengine, the Python object mapper for Cassandra, now integrated into the native Cassandra driver. Jon is currently a technical evangelist at DataStax, where he continues to focus on advancing Cassandra in the Python, operations, and data science communities. Jon holds a degree in computer science from the University of Vermont.
Juliet Hougland is a data scientist at Cloudera, and contributor/committer/maintainer for the Sparkling Pandas project. Her commercial applications of data science include developing predictive maintenance models for oil and gas pipelines at Deep Signal, and designing/building a platform for real-time model application, data storage, and model building at WibiData. Juliet was the technical editor for Learning Spark by Karau et al. and Advanced Analytics with Spark by Ryza et al. She holds an M.S. in applied mathematics from the University of Colorado, Boulder and graduated Phi Beta Kappa from Reed College with a BA in math-physics.
Holden Karau is a software development engineer and is active in open source. She is a co-author of Learning Spark & Fast Data Processing with Spark and has taught intro Spark workshops. Prior to IBM she worked on a variety of big data, search, and classification problems at Alpine, DataBricks, Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics in Computer Science. Outside of computers she enjoys dancing & playing with fire.
While at Data Day, Holden will be holding office hours and signing copies of her O'Reilly book: Learning Spark.
Check out our interview with Holden Karau.
Natalia King is a professional services consultant at Cloudera. Bio forthcoming.
Alex Korbonits (Seattle) @korbonits
Alex Korbonits is a Data Scientist at Remitly, Inc., where he works extensively on feature extraction and putting machine learning models into production. Outside of work, he loves Kaggle competitions, is diving deep into topological data analysis, and is exploring machine learning on GPUs. Alex is a graduate of the University of Chicago with degrees in Mathematics and Economics.
Alex Korbonits's Session: Distilling dark knowledge from neural networks.
Dr. Steve Kramer is a computational physicist and data science entrepreneur who founded Paragon Science in 2006 after earlier research work for the Air Force and years of managing teams in the e-commerce software industry. Steve applied techniques from chaos theory and non-linear dynamics to create a patented dynamic anomaly technology to find the "unknown unknowns" in multiple types of time-dependent data sets with no machine learning required. Since 2011, he has served as a reviewer and program committee member for the ACM KDD and IEEE Security and Intelligence Informatics conferences. Steve gave a joint presentation, "Got Chaos? Extracting Critical Business Intelligence from Email Using Advanced NLP and Dynamic Graph Analysis," (slideshare) with Matthew Russell, CTO of Digital Reasoning, at Data Day Texas 2014.
Steve Kramer's Session: Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS and to the 2016 Primary Elections.
Check out our interview with Steve Kramer.
Dr. Kramer will be holding office hours in Seattle on Friday, July 22. If your company is interested in meeting, please contact him directly at firstname.lastname@example.org.
Corey Lanum has a distinguished background in graph visualization. Over the last 15 years he has managed technical and business relationships with dozens of the largest defense and intelligence agencies in North America, in addition to working with many security and anti-fraud organizations in private industry. Prior to joining Cambridge Intelligence as their US Manager, Corey was helping the customers of i2 (now IBM) and SS8 to solve their most complex graph data challenges.
Corey is the author of the forthcoming Learning Graph Visualization from Manning Publications.
Twitter: @corey_lanum
Eric Lubow (NYC) @elubow
Eric Lubow, CTO of SimpleReach, builds highly scalable distributed systems for processing social data. He began his career building secure Linux systems. Since then he has worked on building and administering various types of ad systems, maintaining and deploying large scale web applications, and building email delivery and analytics systems. Eric is also a DataStax MVP for Apache Cassandra, and co-author of Practical Cassandra, a Developer's Approach. In his spare time, Eric is a skydiver, BASE jumper, motorcycle rider, Team Tiger Schulmann mixed martial artist, snowboarder, New York Giants & 30 Rock fan, and dog dad.
(NEW) Hien Luu (San Francisco)
Hien Luu manages the Marketplace Data & Processing team at Uber. Previously Hien managed a team in the Data Analytics & Infrastructure organization at LinkedIn. He is a big data enthusiast and has been focusing on building big data infrastructure and applications. Hien is also a big proponent of open source and has contributed to Apache Pig and Azkaban. He enjoys teaching and is currently teaching an Apache Spark course at UCSC Silicon Valley Extension school. He has given presentations at various conferences such as QCon SF, QCon London, Hadoop Summit, JavaOne, ArchSummit, and Lucene/Solr Revolution.
Charity Majors (San Francisco) @mipsytipsy
Charity Majors is Cofounder and CTO of Hound, a new startup focused on mining machine data. Previously running infrastructure at Parse, engineering manager at Facebook. Worked with the RocksDB team to develop and roll out the world's first Mongo+Rocks in production. Has run way too much Mongo, Cassandra, Mysql, Redis, and probably more but those brain cells are gone. Likes single malt scotch.
Charity Majors will be giving the keynote for Data Day Seattle.
Dr. Taylor Martin's mission in life is understanding how people learn. She's particularly interested in how adaptive and personalized learning can best be used to help people reach their learning goals faster.
As an established academic and thought-leader in the Learning Sciences, Dr. Martin has spearheaded data-centric approaches to developing learning environments and then measuring how people learn Science, Math, Engineering and Computer Science in these environments. This includes environments such as online games, online programming environments (e.g., scratch.mit.edu), internship programs, Maker spaces, and engineering design labs.
In her current role as Principal Learning Scientist at O’Reilly Media, she's focused on implementation. She's helping a team of data scientists and engineers mix in just the right amount of data-driven "learning engineering" to personalize the learning experience across various forms of published media.
Check out our interview with Taylor Martin.
(NEW) Chloe Mawer (San Francisco) @chloemawer
Chloe Mawer is a data scientist at Silicon Valley Data Science. She has experience working on a wide variety of problems ranging from developing a data strategy for a pharmaceutical company, to devising a methodology for performing longitudinal consumer impact studies at a large retail company. Additionally, she has researched, written, and spoken on the subject of valuing data for both monetization and for making internal decisions within an organization. Chloe obtained her doctorate in Environmental Engineering from Stanford University. Her research there focused on developing methods for obtaining hydrologic information from electrical data taken from the subsurface to better inform groundwater management decisions.
Patrick McFadin (Linkedin) is regarded as one of the foremost experts of Apache Cassandra and data modeling techniques. As the Chief Evangelist for Apache Cassandra and consultant for DataStax, he has helped build some of the largest deployments in the world. Previous to DataStax, he was Chief Architect at Hobsons, an education services company. There, he spoke often on web application design and performance.
Check out our interview with Patrick McFadin.
Mark Mims (SF Bay)
Mark Mims is a Principal Engineer at Silicon Valley Data Science and his passion is Data Plumbing, where Data Science meets the real world of DevOps and Infrastructure Engineering. Mark has extensive experience architecting and implementing data science solutions across a variety of industries including Entertainment, Insurance, Finance, Energy, Education, Manufacturing, and Commercial Modeling and Simulation. Before joining SVDS, Mark was the Principal Data Architect for Infochimps/CSC, building managed "Big Data" pipelines for CSC's Enterprise customer base. There, he used his deep full-stack data science infrastructure expertise to adapt the cloud-based Infochimps product line to OpenStack-based dedicated rack customer deployments. Previously, he worked for Canonical building DevOps tools for Ubuntu Server to make sure Ubuntu Server meets the needs of Data Plumbers everywhere. Mark has a doctorate in Mathematical Physics from UT Austin for research simulating quantum algorithms and is very interested in what it takes to train data scientists.
While at Data Day Seattle, Mark will be holding office hours with Silicon Valley Data Science.
Ryan Mitchell (Somerville, MA) @Kludgist
Ryan Mitchell (Linkedin) is a Software Engineer at LinkeDrive in Boston, where she develops their API and data analysis tools. She is a graduate of Olin College of Engineering, and is a master’s degree student at Harvard University School of Extension Studies. Prior to joining LinkeDrive, she was a Software Engineer building web scrapers and bots at Abine Inc, and she regularly does freelance work building web scrapers for clients, primarily in the financial and retail industries. Ryan is the author of two books about web scraping: Web Scraping with Python (O’Reilly, 2015), and Instant Web Scraping with Java (Packt, 2013), as well as an upcoming O’Reilly video series: Web Crawling with Python.
Check out our interview with Ryan Mitchell.
Christopher Moody (SF Bay) @chrisemoody
Chris Moody loves high-performance computing, high dimensions & high fashion. He loves learning the beautiful symmetries between physics, data, and analytics. Went to Caltech, did astrostats & supercomputing and now Data Labs at Stitch Fix. Currently enjoying coding up word2vec, Gaussian Processes, Deep RNNs and t-SNE.
Christopher Moody will be appearing as part of NLP Day Seattle.
Jonathan Mugan (Austin) @jmugan
Jonathan Mugan (Linkedin) is Co-Founder and CEO at Deep Grammar. Dr. Mugan specializes in artificial intelligence and machine learning. His current research focuses in the area of deep learning, where he seeks to allow computers to acquire abstract representations that enable them to capture subtleties of meaning. Dr. Mugan received his Ph.D. in Computer Science from the University of Texas at Austin. His thesis was centered in developmental robotics, which is an area of research that seeks to understand how robots can learn about the world in the same way that human children do. Dr. Mugan also held a post-doctoral position at Carnegie Mellon University, where he worked at the intersection of machine learning and human-computer interaction. He is also the author of The Curiosity Cycle: Preparing Your Child for the Ongoing Technological Explosion.
Jonathan Mugan will be appearing as part of NLP Day Seattle.
Neha Narkhede (San Francisco) @nehanarkhede
Neha Narkhede is co-founder and head of engineering at Confluent. Previously, she was responsible for LinkedIn’s petabyte scale streaming infrastructure supporting hundreds of billions of events per day. She is also one of the initial authors of Apache Kafka and serves as a PMC member and committer for the project. In the past she has worked on search within the database at Oracle and holds a Masters in Computer Science from Georgia Tech.
(NEW) Stephen O'Sullivan (SF Bay)
Stephen O'Sullivan is the VP of Engineering at Silicon Valley Data Science, where he leads data architecture and infrastructure. A veteran of WalmartLabs, Sun and Yahoo! with over 20 years of experience creating scalable, high-availability data and application solutions, Stephen is a leading expert on big data architecture and Hadoop.
Stephen will also be hosting office hours at Data Day Seattle.
Tony Ojeda (Washington D.C.) @tonyojeda3
Tony Ojeda, Founder and CEO of District Data Labs, is an accomplished data scientist, author, and entrepreneur with expertise in streamlining business processes and over a decade of experience creating and implementing innovative data products. He believes that technological solutions should amplify or extend human abilities, and he is deeply passionate about advancing the field of data science and the skills of those who practice it. Tony has a Masters in Finance from Florida International University and an MBA with concentrations in Strategy and Entrepreneurship from DePaul University. He is also a Co-Founder and the current President of Data Community DC, a non-profit organization that promotes the work of data scientists through community-driven events.
Josh Perryman (Bryan / College Station) @joshperryman
Josh Perryman likes to play with data. Oftentimes this is implementing proprietary algorithms closer to the data for performance or scale. Sometimes it is ad-hoc investigation and analysis, a sort of exploratory querying. A few times he’s been able to leverage his experience with data engines for dramatic performance improvements. But the real joy is designing a schema for both functionality and performance, one which increases the productivity of other developers and enables a technology to solve new problems or deliver new value to the business.
But technology isn't just data, and he does more than just play with data. He’s worked with high performance computing (HPC) environments, taking computations from hours to minutes or seconds. He has built visualizations which deliver new insights into complex data domains. He’s managed technology personnel, both directly and indirectly, to deliver technology solutions. He’s put together more types of technology components, software and hardware, than can be counted, because one of his fortes is solving problems by building sustainable systems.
Matt Peters is the Manager and Tech Lead for Moz's Data Science team. Matt's team works primarily as an incubator inside Moz to take ideas all the way from prototype to production. They build and ship large scale machine learning systems that are integrated into other back end services and become significant customer facing features. Prior to joining Moz, Matt worked in the mortgage finance industry (prepayment and default modeling) and as a post-doc/grad student in climate modeling and data analysis. He has a PhD in Applied Math from the University of Washington.
Matt Peters will be appearing as part of NLP Day Seattle.
Eric Sammer (San Francisco) @esammer
Eric Sammer, Co-Founder and CTO at Rocana, is deeply entrenched in the open source community with a passion for solving difficult scaling and processing problems. Prior to Rocana, Eric most recently served as an Engineering Manager at Cloudera, responsible for developer tools and partner integrations. Eric’s team worked with hundreds of partners to develop robust solutions and integrate them tightly with Cloudera’s Enterprise Data Hub. He was previously a Principal Solutions Architect, working with customers and strategic partners to support and integrate Hadoop clusters and related infrastructure. While working with some of Cloudera’s largest customers, Eric developed many of the best practices for developing large, distributed, data processing infrastructure.
Eric is a committer on the Apache Flume and Apache MRUnit projects, and the creator of the Kite open source project. Prior to Cloudera, Eric served as a Senior Engineer and Architect at several large scale data driven organizations including Experian and Conductor. Eric is author of Hadoop Operations, published by O'Reilly. He speaks frequently on technology and techniques for large scale data processing, integration, and system management.
Check out our interview with Eric Sammer.
(NEW) Melissa Santos (Portland)
Melissa Santos has over a decade of experience working with data, from ETLs and reporting to Hadoop clusters and marketing analytics. In her previous role as Engineering Manager of Etsy, she led her team from being a Hadoop Infrastructure team that was constantly fixing problems and cleaning up messes, to declaring themselves to be a Data Platform team, expanding into investigating new tools, teaching coworkers about big data, and consulting with other teams about how to meet their data needs. Favorite past projects include implementing a beta-binomial model in SAS, creating neighborhood boundaries from Flickr and OpenStreetMap data, using principal components analysis to detect spam emails, and teaching coworkers to write Scalding jobs. Melissa's professional goal is to make data more accessible to all parts of the business, and to businesses of every size. She has a PhD in Applied Math and is currently the (sole) Data Scientist for Big Cartel.
Brent Schneeman (Austin)
Brent Schneeman joined HomeAway in 2010 and focuses on strengthening the data science muscle in the Technology Office. As Director of Data Science, he serves as an internal consultant on a diverse set of analytic projects such as multi-variate testing, customer website behavior, and applying natural language processing techniques to unstructured data. A storyteller, Brent has presented at South By Southwest and has given many technological talks. Prior to joining HomeAway, Brent worked at PayPal and Visa. He has one degree in Mathematics and another in Electrical Engineering. He lives in Austin, Texas with his wife and three kids and spends most of his free time mowing the lawn.
Dr. Juan Sequeda is the co-founder of Capsenta and the developer of Ultrawrap, a system that virtualizes relational databases as graph data sources. His research interests are on the intersection of Logic and Data and in particular between the Semantic Web and Relational Databases for data integration. Juan holds a Ph.D. in Computer Science from the University of Texas at Austin. Capsenta is a spin-off from his PhD research. Juan is the recipient of the NSF Graduate Research Fellowship, Best Student Paper at the 2014 International Semantic Web Conference, and 2nd Place in the 2013 Semantic Web Challenge for his work on ConstituteProject.org. Juan is on the editorial board of the Journal of Web Semantics and has been an invited expert member and standards editor for the World Wide Web Consortium (W3C) Relational Database to RDF Graph working group.
Check out our recent interview with Juan Sequeda.
Joe Stein is an Apache Kafka committer and PMC member. A frequent speaker on both Hadoop and Cassandra, Joe is the Co-Founder and CEO of Elodina Inc. Joe has been a distributed systems developer and architect for over 12 years now having built backend systems that supported over one hundred million unique devices a day processing trillions of events. He blogs and hosts a podcast about Hadoop and related systems at All Things Hadoop.
Andy Terrel (Austin) @aterrel
Andy Terrel is the CTO of Fashion Metric, where he is bringing his experience building smart scalable data systems to the fashion industry. You will also find him leading the board of the NumFOCUS foundation. As a passionate advocate for open source scientific codes, Andy has been involved in the wider scientific Python community since 2006, contributing to numerous projects in the scientific stack.
(NEW) Melanie Tosik (Austin)
Melanie Tosik (linkedin) is an NLP research engineer at WayBlazer. Her team at WayBlazer focuses on developing complex, real-world applications of natural language processing and machine learning. Melanie spends most of her time designing and implementing semantic microservices which accurately extract context, intent and relevant concepts from natural language queries. In addition, Melanie employs a variety of machine learning algorithms to enrich user queries with vast amounts of previously unstructured data. Before she moved to Austin, Melanie studied computational linguistics at the University of Potsdam. Rowing and gardening are her favorite free time pursuits.
Melanie Tosik will be appearing as part of NLP Day Seattle.
Rachel Warren (San Francisco)
Rachel Warren is a Software Engineer in Data Science at Alpine Data Labs in San Francisco. She is a Spark enthusiast, functional programmer, and data scientist. She is currently co-authoring a book for O’Reilly titled High Performance Spark. Bio forthcoming.
Nicole White grew up in Kansas City, Missouri. She spent four years at LSU in Baton Rouge, Louisiana where she earned a degree in economics with a minor in mathematics. She then went to the University of Texas at Austin, where she received her master’s degree in analytics. It was during this time that she found Neo4j and began exploring its capabilities. When she’s not graphing all the things, she spends her time playing card games and board games.
Check out our interview with Nicole White.
Fangjin Yang (SF Bay)
Fangjin Yang is one of the main committers to the open source Druid project and one of the first developers at Metamarkets, a San Francisco-based data startup. Fangjin previously worked on diagnostic optimization algorithms at Cisco Systems. He holds a BASc in Electrical Engineering and a MASc in Computer Engineering from the University of Waterloo, Canada.