Who spoke at Data Day Seattle 2016?

Check out the list of topics to be covered at this year's Data Day Seattle.
We will be announcing a few more speakers. Check this page for updates.

John Akred (SF Bay) @BigDataAnalysis

John Akred is the Founder and CTO of Silicon Valley Data Science. In the business world, John Akred likes to help organizations become more data driven. He has over 15 years of experience in machine learning, predictive modeling, and analytical system architecture. His focus is on the intersection of data science tools and techniques; data transport, processing and storage technologies; and the data management strategy and practices that can unlock data driven capabilities for an organization. A frequent speaker at the O'Reilly Strata Conferences, John is host of the perennially popular workshop: Building A Data Platform.
Linkedin
John will also be hosting office hours at Data Day Seattle.

Siddarth Anand (SF Bay) @r39132

Siddarth Anand is a hands-on software architect with deep experience building and scaling data infrastructure at high-traffic web sites. Sid currently serves as the Data Architect for Agari, a rising email security company. Prior to joining Agari, Sid held several technical and leadership positions including LinkedIn’s Search Architect, Netflix’s Cloud Data Architect, Etsy’s VP of Engineering, and several technical roles at eBay. Sid earned his BS and MS degrees in CS from Cornell University, where he focused on Distributed Systems. Sid is also a PPMC member and committer on Apache Airflow.

 

Preetha Appan (Austin)

Preetha Appan is the technical lead of the recommendations team at Indeed.com. Her past contributions to Indeed's job and resume search engines include keyword tokenization improvements, query expansion features, and major infrastructure and performance improvements. She enjoys working on challenging problems in machine learning and information retrieval.
Preetha Appan's session: Building Recommendations at Scale: Lessons Learned.

 

Matthew Baird (SF Bay)

Matthew Baird has a Statistics and Computer Science double major from Queen’s University. He has built software and managed teams at companies like PeopleSoft, Siebel Systems and Oracle. He loves the open source movement, and building scalable, innovative enterprise software. Matt also has the amazing ability to multi-task. In the office, you’ll often catch him coding across three screens while listening to the latest E3 sessions only and engaging in epic “Dota 2” battles. A musician, Matthew has played in one of the biggest disco tribute bands in the world, or at least in Waterloo, Ontario.

Benjamin Bengfort (Washington D.C.) @bbengfort

Benjamin Bengfort, Partner and Head Faculty of District Data Labs, is a Data Scientist who lives inside the beltway but ignores politics (the normal business of DC), favoring technology instead. He is currently working to finish his PhD at the University of Maryland where he studies machine learning and artificial intelligence. His lab does have robots (though this field of study is not one he favors) and, much to his chagrin, they seem to constantly arm said robots with knives and tools, presumably to pursue culinary accolades. Having seen a robot attempt to slice a tomato, Benjamin prefers his own adventures in the kitchen where he specializes in fusion French and Guyanese cuisine as well as BBQ of all types. A professional programmer by trade, a Data Scientist by avocation, Benjamin's writing and instruction pursue a diverse range of subjects, from Natural Language Processing to Data Science with Python to analytics with Hadoop and Spark.
While at Data Day, Benjamin will be holding office hours and signing copies of his upcoming O'Reilly book: Data Analytics with Hadoop.

Michael Berthold (Konstanz) @MRBerthold

Michael Berthold is co-founder of KNIME, the open analytics platform used by thousands of data experts around the world. He is currently president of KNIME.com AG and a professor at Konstanz University, where his research interests include bisociative data analysis and widening of mining algorithms. Previously he held positions in both academia (Carnegie Mellon, UC Berkeley) and industry (Intel, Tripos). Michael has co-authored two successful data analysis textbooks and is a frequent speaker at both academic and industrial conferences. If time permits, he still writes code.
Check out our interview with Michael Berthold.

 

Kurt Brown (SF Bay)

Kurt Brown leads the Data Platform team at Netflix. His group architects and manages the technical infrastructure underpinning the company’s analytics. The Netflix data infrastructure includes various big data technologies (Hadoop, Hive, and Pig), Netflix open sourced applications and services (Lipstick and Genie), and traditional BI tools (Teradata and MicroStrategy).
Linkedin
Kurt will be speaking on How to get the most out of your data platform.
Following his presentation at Data Day, Kurt will be holding office hours and discussing careers at Netflix.

Michelle Casbon (San Antonio) @texasmichelle

Michelle Casbon is Director of Data Science at Qordoba. Previously, she was a Senior Data Science Engineer at Idibon, where she contributed to the goal of bringing language technologies to all the world’s languages. Michelle's development experience spans a decade across various industries, including media, investment banking, healthcare, retail, and geospatial services. Michelle completed a Masters at the University of Cambridge, focusing on NLP, speech recognition, speech synthesis, and machine translation. She loves working with open source technologies and has had a blast contributing to the Apache Spark project. Holding technical conversations and learning from the people she meets is her favorite part of Data Day Seattle.
Michelle Casbon will be appearing as part of NLP Day Seattle.
Check out our interview with Michelle Casbon.

Kenny Daniel (Seattle) @platypii

Kenny Daniel is Founder and CTO of Algorithmia. He came up with the idea for Algorithmia while working on his PhD and seeing the plethora of algorithms that never saw the light of day. Kenny holds degrees from Carnegie Mellon University and the University of Southern California, where he studied artificial intelligence and mechanism design. Kenny has also worked with companies like wine enthusiast app Delectable to build out their deep learning-based image recognition systems. It was during this time that Kenny saw the possibilities of what can be achieved when companies have access to state-of-the-art AI tools. Kenny’s goal with Algorithmia is to accelerate AI development by creating a marketplace where algorithm developers can share their creations and application developers can make their applications smarter by incorporating the latest machine-learning algorithms.

Joey Echeverria (SF Bay) @fwiffo

Joey Echeverria is the platform technical lead at Rocana, where he builds applications for scaling IT operations built on the Apache Hadoop platform. Joey is a committer on the Kite SDK, an Apache-licensed data API for the Hadoop ecosystem. Joey was previously a software engineer at Cloudera, where he contributed to several ASF projects including Apache Flume, Apache Sqoop, Apache Hadoop, and Apache HBase. Joey is also a coauthor of Hadoop Security, published by O'Reilly Media.

 

 

Nicholas Gaylord (SF Bay) @texastacos

Nicholas Gaylord is Senior Data Scientist at CrowdFlower, where he helps build out their new machine learning offering, CrowdFlower AI. CrowdFlower AI allows data scientists to construct, monitor, and improve machine learning models using data collected at scale from human contributors via the CrowdFlower platform, in a tightly integrated human-in-the-loop active learning environment. Prior to CrowdFlower, Nick was a data scientist at SF text analytics startup Idibon. He has a PhD from the University of Texas at Austin, where his research focused on human language comprehension and the construction of datasets for NLP applications. In his spare time he fixes bikes and collaborates on work applying cognitive science principles to the public health domain.
Nicholas Gaylord will be appearing as part of NLP Day Seattle.

Dr. Denise Koessler Gosnell (Charleston) @DeniseKGosnell

Dr. Denise Gosnell, a driving member of the PokitDok Data Science team since 2014, has brought her research in applied graph theory to help architect the graph database while also serving as an analytics thought leader. Her work with the Data Science team aims to extract insight from the trenches of hidden data in healthcare and build products to bring the industry into the 21st century. She has represented PokitDok's Data Science Team at numerous conferences including PyData, KDD (Knowledge Discovery & Data Mining), and the inaugural GraphDay.
Prior to PokitDok, Dr. Gosnell earned her Ph.D. in Computer Science from the University of Tennessee. Her research on how our online interactions leave behind unique identifiers that form a “social fingerprint” led to presentations at major conferences from San Diego to London and drew the interest of such tech industry giants as Microsoft Research and Apple. Additionally, she was a leader in addressing the underrepresentation of women in her field and founded a branch of Sheryl Sandberg's Lean In Circles.
Denise Gosnell will be appearing as part of Graph Day Seattle.
Check out our interview with Denise Gosnell.

Juliet Hougland (SF Bay) @JulietHougland

Juliet Hougland is a data scientist at Cloudera, and contributor/committer/maintainer for the Sparkling Pandas project. Her commercial applications of data science include developing predictive maintenance models for oil and gas pipelines at Deep Signal, and designing/building a platform for real-time model application, data storage, and model building at WibiData. Juliet was the technical editor for Learning Spark by Karau et al. and Advanced Analytics with Spark by Ryza et al. She holds an M.S. in applied mathematics from the University of Colorado, Boulder and graduated Phi Beta Kappa from Reed College with a BA in math-physics.

Rohit Jain (Austin)

Rohit Jain is the CTO at Esgyn working on Apache Trafodion™, currently in incubation. Trafodion is a transactional to analytics SQL-on-Hadoop RDBMS. Rohit worked for Tandem, Compaq, and Hewlett-Packard for the last 28 of his 40 years in application and database development. He has worked as an application developer, solutions architect, consultant, software engineer, database architect, development and QA manager, Product Manager, and CTO. His experience spans Online Transaction Processing, Operational Data Stores, Data Marts, Enterprise Data Warehouses, Business Intelligence, and Advanced Analytics, on distributed massively parallel systems.
O'Reilly Media just published the following report from Rohit: In Search of Database Nirvana – The Challenges of Delivering HTAP
Linkedin

Sanjay Joshi (Seattle)

Sanjay Joshi is the CTO of Healthcare and Life Sciences at the EMC Emerging Technologies Division. Based in Seattle, Sanjay's 28+ year career has spanned the entire gamut of life-sciences and healthcare from clinical and biotechnology research to healthcare informatics to medical devices; he defines himself as a “non-reductionist systems guy.”
His current focus is a systems view of Genomics, Proteomics and Healthcare for infrastructures and informatics. Recent experience has included data management and instruments for Electronic Medical Records; Proteomics and Flow Cytometry; FDA and HIPAA validations; Lab Information Management Systems (LIMS); Translational Genomics research and Imaging. Sanjay holds a patent in multi-dimensional analytics. He began his career developing and building X-Ray machines.
Sanjay was the recipient of a National Institutes of Health (NIH) Small Business Innovation Research (SBIR) grant and has been a consultant or co-Principal-Investigator on several NIH grants. He is actively involved in non-profit biotech networking and educational organizations in the Seattle area and beyond.
Sanjay holds a Master of Biomedical Engineering from the University of New South Wales, Sydney and a Bachelor of Instrumentation Technology from Bangalore University. He has completed several medical school and PhD level courses.
Linkedin

Holden Karau (SF Bay) @holdenkarau

Holden Karau is a software development engineer and is active in open source. She is a co-author of Learning Spark & Fast Data Processing with Spark and has taught intro Spark workshops. Prior to IBM she worked on a variety of big data, search, and classification problems at Alpine, DataBricks, Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics in Computer Science. Outside of computers she enjoys dancing & playing with fire.
While at Data Day, Holden will be holding office hours and signing copies of her O'Reilly book: Learning Spark.
Check out our interview with Holden Karau.

Jason Kessler (Seattle) @jasonkessler

Jason Kessler is a data scientist at CDK Global, where he analyses language use and consumer behavior in the online auto-shopping ecosystem. Prior to joining CDK, Jason was the founding data scientist at PlaceIQ and worked as a research scientist for JD Power and Associates. He has published peer-reviewed papers on algorithms and corpora for sentiment and belief analysis, and has sat on program committees and reviewed for several AI and NLP conferences. Most recently, he has delivered talks on the identification of persuasive and influential language to the 2015 Sentiment Symposium and Texas Data Day 2016.
Linkedin

 

 

Alex Korbonits (Seattle) @korbonits

Alex Korbonits is a Data Scientist at Remitly, Inc., where he works extensively on feature extraction and putting machine learning models into production. Outside of work, he loves Kaggle competitions, is diving deep into topological data analysis, and is exploring machine learning on GPUs. Alex is a graduate of the University of Chicago with degrees in Mathematics and Economics.
Alex Korbonits' Session: Distilling dark knowledge from neural networks.

 

Dr. Steve Kramer (Austin) @ParagonSci_Inc

Dr. Steve Kramer is a computational physicist and data science entrepreneur who founded Paragon Science in 2006 after earlier research work for the Air Force and years of managing teams in the e-commerce software industry. Steve applied techniques from chaos theory and non-linear dynamics to create a patented dynamic anomaly technology to find the "unknown unknowns" in multiple types of time-dependent data sets with no machine learning required. Since 2011, he has served as a reviewer and program committee member for the ACM KDD and IEEE Security and Intelligence Informatics conferences. Steve gave a joint presentation, "Got Chaos? Extracting Critical Business Intelligence from Email Using Advanced NLP and Dynamic Graph Analysis," (slideshare) with Matthew Russell, CTO of Digital Reasoning, at Data Day Texas 2014.
Steve Kramer's Session: Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS and to the 2016 Primary Elections.
Check out our interview with Steve Kramer.
Dr. Kramer will be holding office hours in Seattle on Friday, July 22. If your company is interested in meeting, please contact him directly at steve.kramer@paragonscience.com.

Jay Kreps (SF Bay) @jaykreps

Jay Kreps will be giving the Streaming Data Keynote.

Jay Kreps is the CEO of Confluent, Inc., a company backing the popular Apache Kafka messaging system. Prior to founding Confluent, he was the lead architect for data infrastructure at LinkedIn. He is among the original authors of several open source projects including Project Voldemort (a key-value store), Apache Kafka (a distributed messaging system) and Apache Samza (a stream processing system).

 

 

 

Alex Liang (Seattle) @alexlianghu

Alex Liang is Director of Data Programs, Products, Architect and Strategy at eBay. Alex helps drive the evolution of eBay's analytical capabilities for its rapidly growing Marketplaces business, where over 100 million active worldwide users transact at a rate of more than $3,500 of goods every second.
For the past 11 years at eBay, Alex has helped the company leverage its unique end-to-end data set and drive continuous innovation to accelerate top-line growth. He directs the utilization of a significant investment in large-scale data management and processing resources and technologies while helping to evolve eBay's world-class analytics capabilities. He supports the business goals by partnering with internal customers and stakeholders to remove barriers to data-driven insights.

Corey Lanum (Boston) @corey_lanum

Corey Lanum has a distinguished background in graph visualization. Over the last 15 years he has managed technical and business relationships with dozens of the largest defense and intelligence agencies in North America, in addition to working with many security and anti-fraud organizations in private industry. Prior to joining Cambridge Intelligence as their US Manager, Corey was helping the customers of i2 (now IBM) and SS8 to solve their most complex graph data challenges.
Corey is the author of the forthcoming Learning Graph Visualization from Manning Publications.
Corey Lanum will be appearing as part of Graph Day Seattle.

Eric Lubow (NYC) @elubow

Eric Lubow, CTO of SimpleReach, builds highly scalable distributed systems for processing social data. He began his career building secure Linux systems. Since then he has worked on building and administering various types of ad systems, maintaining and deploying large scale web applications, and building email delivery and analytics systems. Eric is also a DataStax MVP for Apache Cassandra, and co-author of Practical Cassandra, a Developer's Approach. In his spare time, Eric is a skydiver, BASE jumper, motorcycle rider, Team Tiger Schulmann mixed martial artist, snowboarder, New York Giants & 30 Rock fan, and dog dad.
Linkedin
eric.lubow.org

Hien Luu (San Francisco) @hluu

Hien Luu manages the Marketplace Data & Processing team at Uber. Previously Hien managed a team in the Data Analytics & Infrastructure organization at LinkedIn. He is a big data enthusiast and has been focusing on building big data infrastructure and applications. Hien is also a big proponent of open source and has contributed to Apache Pig and Azkaban. He enjoys teaching and is currently teaching an Apache Spark course at the UCSC Silicon Valley Extension school. He has given presentations at various conferences like QCon SF, QCon London, Hadoop Summit, JavaOne, ArchSummit and Lucene/Solr Revolution.

Charity Majors (San Francisco) @mipsytipsy

Charity Majors is an engineer and cofounder at Honeycomb, a new startup focused on making impossible data observability problems not only possible, but exploratory and inviting for teams of all types. Prior to Honeycomb, Charity led the Parse infrastructure team as they grew Parse from a handful of mobile apps to over a million, then worked as an engineering manager at Facebook, while pairing closely with the RocksDB team to develop and roll out the world's first Mongo+Rocks in production. She is a reluctant DBA who has spent far too much time running Mongo, Cassandra, MySQL, Redis, and probably more but those brain cells are gone. Charity is co-author (with Laine Campbell) of the upcoming O'Reilly book Database Reliability Engineering and loves single malt scotch.
Charity's GitHub
Charity Majors will be giving the keynote for Data Day Seattle.

 

Taylor Martin (Santa Rosa) @taylormartin42

Dr. Taylor Martin's mission in life is understanding how people learn. She's particularly interested in how adaptive and personalized learning can best be used to help people reach their learning goals faster.
As an established academic and thought-leader in the Learning Sciences, Dr. Martin has spearheaded data-centric approaches to developing learning environments and then measuring how people learn Science, Math, Engineering and Computer Science in these environments. This includes environments such as online games, online programming environments (e.g., scratch.mit.edu), internship programs, Maker spaces, and engineering design labs.
In her current role as Principal Learning Scientist at O’Reilly Media, she's focused on implementation. She's helping a team of data scientists and engineers mix in just the right amount of data-driven "learning engineering" to personalize the learning experience across various forms of published media.
Check out our interview with Taylor Martin.

Chloe Mawer (San Francisco) @chloemawer

Chloe Mawer is a data scientist at Silicon Valley Data Science. She has experience working on a wide variety of problems ranging from developing a data strategy for a pharmaceutical company, to devising a methodology for performing longitudinal consumer impact studies at a large retail company. Additionally, she has researched, written, and spoken on the subject of valuing data for both monetization and for making internal decisions within an organization. Chloe obtained her doctorate in Environmental Engineering from Stanford University. Her research there focused on developing methods for obtaining hydrologic information from electrical data taken from the subsurface to better inform groundwater management decisions.

Patrick McFadin (SF Bay) @patrickmcfadin

Patrick McFadin (Linkedin) is regarded as one of the foremost experts on Apache Cassandra and data modeling techniques. As the Chief Evangelist for Apache Cassandra and consultant for DataStax, he has helped build some of the largest deployments in the world. Prior to DataStax, he was Chief Architect at Hobsons, an education services company. There, he spoke often on web application design and performance.
Check out our interview with Patrick McFadin.

 

Mark Mims (SF Bay) @m_3

Mark Mims is a Principal Engineer at Silicon Valley Data Science and his passion is Data Plumbing, where Data Science meets the real world of DevOps and Infrastructure Engineering. Mark has extensive experience architecting and implementing data science solutions across a variety of industries including Entertainment, Insurance, Finance, Energy, Education, Manufacturing, and Commercial Modeling and Simulation. Before joining SVDS, Mark was the Principal Data Architect for Infochimps/CSC, building managed "Big Data" pipelines for CSC's Enterprise customer base. There, he used his deep full-stack data science infrastructure expertise to adapt the cloud-based Infochimps product line to Openstack-based dedicated rack customer deployments. Previously, he worked for Canonical building DevOps tools for Ubuntu Server to make sure it meets the needs of Data Plumbers everywhere. Mark has a doctorate in Mathematical Physics from UT Austin for research simulating quantum algorithms and is very interested in what it takes to train data scientists.
Linkedin
While at Data Day Seattle, Mark will be holding office hours with Silicon Valley Data Science.

Ryan Mitchell (Somerville, MA) @Kludgist

Ryan Mitchell (Linkedin) is a senior software engineer at HedgeServ. She received her master's in software engineering from Harvard University Extension School and a bachelor's in engineering from Olin College of Engineering. Prior to joining HedgeServ, Ryan was a Software Engineer building web scrapers and bots at Abine Inc. Ryan is the author of two books about web scraping: Web Scraping with Python (O’Reilly, 2015) and Instant Web Scraping with Java (Packt, 2013), as well as an upcoming O’Reilly video series: Web Crawling with Python.
In addition to speaking at past Data Day events in both Seattle and Austin, she gives talks and runs workshops around the country, including an upcoming 8-week web development course through the Boston Public Library this fall.
Ryan will be holding office hours following her talk at Data Day Seattle.
Check out our interview with Ryan Mitchell.

Christopher Moody (SF Bay) @chrisemoody

Chris Moody loves high-performance computing, high dimensions & high fashion. He loves learning the beautiful symmetries between physics, data, and analytics. Went to Caltech, did astrostats & supercomputing and now Data Labs at Stitch Fix. Currently enjoying coding up word2vec, Gaussian Processes, Deep RNNs and t-SNE.

Christopher Moody will be appearing as part of NLP Day Seattle.

 

 

Jonathan Mugan (Austin) @jmugan

Jonathan Mugan (Linkedin) is Co-Founder and CEO at Deep Grammar. Dr. Mugan specializes in artificial intelligence and machine learning. His current research focuses on deep learning, where he seeks to allow computers to acquire abstract representations that enable them to capture subtleties of meaning. Dr. Mugan received his Ph.D. in Computer Science from the University of Texas at Austin. His thesis centered on developmental robotics, an area of research that seeks to understand how robots can learn about the world in the same way that human children do. Dr. Mugan also held a post-doctoral position at Carnegie Mellon University, where he worked at the intersection of machine learning and human-computer interaction. He is also the author of The Curiosity Cycle: Preparing Your Child for the Ongoing Technological Explosion.
Jonathan Mugan will be appearing as part of NLP Day Seattle.

Stephen O'Sullivan (SF Bay) @steveos

Stephen O'Sullivan is the VP of Engineering at Silicon Valley Data Science, where he leads data architecture and infrastructure. A veteran of WalmartLabs, Sun and Yahoo! with over 20 years of experience creating scalable, high-availability data and application solutions, Stephen is a leading expert on big data architecture and Hadoop.
Linkedin
Stephen will also be hosting office hours at Data Day Seattle.

 

 

Tony Ojeda (Washington D.C.) @tonyojeda3

Tony Ojeda, Founder and CEO of District Data Labs, is an accomplished data scientist, author, and entrepreneur with expertise in streamlining business processes and over a decade of experience creating and implementing innovative data products. He believes that technological solutions should amplify or extend human abilities, and he is deeply passionate about advancing the field of data science and the skills of those who practice it. Tony has a Masters in Finance from Florida International University and an MBA with concentrations in Strategy and Entrepreneurship from DePaul University. He is also a Co-Founder and the current President of Data Community DC, a non-profit organization that promotes the work of data scientists through community-driven events.

Josh Perryman (Bryan / College Station) @joshperryman

Josh Perryman likes to play with data. Oftentimes this is implementing proprietary algorithms closer to the data for performance or scale. Sometimes it is ad-hoc investigation and analysis, a sort of exploratory querying. A few times he’s been able to leverage his experience with data engines for dramatic performance improvements. But the real joy is designing a schema for both functionality and performance, one which increases the productivity of other developers and enables a technology to solve new problems or deliver new value to the business.
But technology isn't just data, and he does more than just play with data. He’s worked with high performance computing (HPC) environments, taking computations from hours to minutes or seconds. He has built visualizations which deliver new insights into complex data domains. He’s managed technology personnel, both directly and indirectly, to deliver technology solutions. He’s put together more types of technology components, software and hardware, than can be counted, because one of his fortes is solving problems by building sustainable systems.
Josh Perryman will be appearing as part of Graph Day Seattle.

Matt Peters (Seattle) @mattthemathman

Matt Peters is the Manager and Tech Lead for Moz's Data Science team. Matt's team works primarily as an incubator inside Moz to take ideas all the way from prototype to production. They build and ship large scale machine learning systems that are integrated into other back end services and become significant customer facing features. Prior to joining Moz, Matt worked in the mortgage finance industry (prepayment and default modeling) and as a post-doc/grad student in climate modeling and data analysis. He has a PhD in Applied Math from the University of Washington.

Matt Peters will be appearing as part of NLP Day Seattle.

Eric Sammer (San Francisco) @esammer

Eric Sammer, Co-Founder and CTO at Rocana, is deeply entrenched in the open source community with a passion for solving difficult scaling and processing problems. Prior to Rocana, Eric most recently served as an Engineering Manager at Cloudera, responsible for developer tools and partner integrations. Eric’s team worked with hundreds of partners to develop robust solutions and integrate them tightly with Cloudera’s Enterprise Data Hub. He was previously a Principal Solutions Architect, working with customers and strategic partners to support and integrate Hadoop clusters and related infrastructure. While working with some of Cloudera’s largest customers, Eric developed many of the best practices for developing large, distributed, data processing infrastructure.
Eric is a committer on the Apache Flume and Apache MRUnit projects, and the creator of the Kite open source project. Prior to Cloudera, Eric served as a Senior Engineer and Architect at several large scale data driven organizations including Experian and Conductor. Eric is author of Hadoop Operations, published by O'Reilly. He speaks frequently on technology and techniques for large scale data processing, integration, and system management.
Linkedin
Blog
Check out our interview with Eric Sammer.

Melissa Santos (Portland) @ansate

Melissa Santos has over a decade of experience working with data, from ETLs and reporting to Hadoop clusters and marketing analytics. In her previous role as Engineering Manager at Etsy, she led her team from being a Hadoop Infrastructure team that was constantly fixing problems and cleaning up messes, to declaring themselves to be a Data Platform team, expanding into investigating new tools, teaching coworkers about big data, and consulting with other teams about how to meet their data needs. Favorite past projects include implementing a beta-binomial model in SAS, creating neighborhood boundaries from Flickr and OpenStreetMap data, using principal components analysis to detect spam emails, and teaching coworkers to write Scalding jobs. Melissa's professional goal is to make data more accessible to all parts of the business, and to businesses of every size. She has a PhD in Applied Math and is currently the (sole) Data Scientist for Big Cartel.

 

Brent Schneeman (Austin) @schnee

Brent Schneeman joined HomeAway in 2010 and focuses on strengthening the data science muscle in the Technology Office. As Director of Data Science, he serves as an internal consultant on a diverse set of analytic projects such as multi-variate testing, customer website behavior, and applying natural language processing techniques to unstructured data. A storyteller, Brent has presented at South By Southwest and has given many technological talks. Prior to joining HomeAway, Brent worked at PayPal and Visa. He has one degree in Mathematics and another in Electrical Engineering. He lives in Austin, Texas with his wife and three kids and spends most of his free time mowing the lawn.

Juan Sequeda (Austin) @juansequeda

Dr. Juan Sequeda is the co-founder of Capsenta and the developer of Ultrawrap, a system that virtualizes relational databases as graph data sources. His research interests lie at the intersection of Logic and Data, in particular between the Semantic Web and Relational Databases for data integration. Juan holds a PhD in Computer Science from the University of Texas at Austin. Capsenta is a spin-off from his PhD research. Juan is the recipient of the NSF Graduate Research Fellowship, Best Student Paper at the 2014 International Semantic Web Conference, and 2nd Place in the 2013 Semantic Web Challenge for his work on ConstituteProject.org. Juan is on the editorial board of the Journal of Web Semantics and has been an invited expert member and standards editor for the World Wide Web Consortium (W3C) Relational Database to RDF Graph working group.
Juan Sequeda will be appearing as part of Graph Day Seattle.
Check out our recent interview with Juan Sequeda.

Joe Stein (NYC) @allthingshadoop

Joe Stein is an Apache Kafka committer and PMC member. A frequent speaker on both Hadoop and Cassandra, Joe is the Co-Founder and CEO of Elodina Inc. Joe has been a distributed systems developer and architect for over 12 years, having built backend systems that supported over one hundred million unique devices a day and processed trillions of events. He blogs and hosts a podcast about Hadoop and related systems at All Things Hadoop.
Linkedin

 

Andy Terrel (Austin) @aterrel

Andy Terrel is the CTO of Fashion Metric, where he is bringing his experience building smart, scalable data systems to the fashion industry. You will also find him leading the board of the NumFOCUS foundation. As a passionate advocate for open source scientific codes, Andy has been involved in the wider scientific Python community since 2006, contributing to numerous projects in the scientific stack.
Linkedin

 

 

Melanie Tosik (Austin) @meltomene

Melanie Tosik (linkedin) is an NLP research engineer at WayBlazer. Her team at WayBlazer focuses on developing complex, real-world applications of natural language processing and machine learning. Melanie spends most of her time designing and implementing semantic microservices that accurately extract context, intent, and relevant concepts from natural language queries. In addition, Melanie employs a variety of machine learning algorithms to enrich user queries with vast amounts of previously unstructured data. Before she moved to Austin, Melanie studied computational linguistics at the University of Potsdam. Rowing and gardening are her favorite free-time pursuits.
Melanie Tosik will be appearing as part of NLP Day Seattle.

Thomas Varghese (Seattle)

Thomas Varghese is Senior Manager for Product Management, Data Services and Solutions, at eBay. Thomas has 19 years of experience in building software capabilities to address complex business and technical challenges, and has been with eBay for the past 4 years, focusing on real-time big data capabilities, cloud services, and distributed computing technologies. He has led several key initiatives in the data infrastructure and analytics space from concept through launch. Prior to eBay, Thomas worked at Microsoft and Hewlett-Packard in various engineering and product management roles, where his focus and expertise were building and running high-performance, highly available distributed systems at massive scale.

Rachel Warren (San Francisco) @warre_n_peace

Rachel Warren is a Software Engineer in Data Science at Alpine Data Labs in San Francisco. She is a Spark engineer, functional programmer, and data scientist. She has worked on financial, political, and natural language problems. In addition to coding, she is passionate about teaching and working with people. She has taught computer science and math in Ghana, and is now helping educate her peers on Spark in San Francisco. She is currently co-authoring a book with Holden Karau for O’Reilly titled High Performance Spark, which is in early release.

 

 

Fangjin Yang (SF Bay) @fangjin

Fangjin Yang is one of the main committers to the open source Druid project and one of the first developers at Metamarkets, a San Francisco-based data startup. Fangjin previously worked on diagnostic optimization algorithms at Cisco Systems. He holds a BASc in Electrical Engineering and a MASc in Computer Engineering from the University of Waterloo, Canada.
Linkedin

 

 

Peter Zaitsev (Raleigh-Durham) @PeterZaitsev

Peter Zaitsev co-founded Percona in 2006, assuming the role of CEO. Percona helps companies of all sizes maximize their success with MySQL. Percona was named to the Inc. 5000 in 2013. Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University, where he majored in Computer Science. As CEO of Percona, Peter enjoys mixing business leadership with hands-on technical expertise. Peter is co-author of High Performance MySQL, published by O’Reilly, one of the most popular books on MySQL performance. Peter blogs regularly on MySQLPerformanceBlog.com and speaks frequently at conferences. Peter lives in North Carolina with his wife and two children. In his spare time, Peter enjoys travel and spending time outdoors.
Linkedin