Books and Authors at Data Day Seattle

Thinking about speaking at Data Day Seattle? Check out the testimonials from past speakers.

The following Data Day Seattle speakers will be signing copies of their books during office hours.

All author book signings will be held at the O’Reilly exhibit. This is a great opportunity to meet O’Reilly authors and get free copies of their books. Complimentary copies of new titles will be given to the first 25 attendees. Limit one free book per attendee.

High Performance Spark (Holden Karau / Rachel Warren)

If you’ve successfully used Apache Spark to solve medium-sized problems, but still struggle to realize the "Spark promise" of unparalleled performance on big data, this book is for you. High Performance Spark shows you how to take advantage of Spark at scale, so you can grow beyond the novice level. It’s ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications.

  • Learn how to make Spark jobs run faster
  • Productionize exploratory data science with Spark
  • Handle even larger data sets with Spark
  • Reduce pipeline running times for faster insights

Data Analytics with Hadoop (Benjamin Bengfort / Jenny Kim)

If you’re a data scientist ready to tackle statistical and machine learning techniques across large data sets, this practical guide provides a solid introduction to the world of clustered computing and analytics with Hadoop. Instead of deployment, operations, or software development, this book focuses on particular analyses you can build, the data warehousing techniques that Hadoop provides, and the higher-order data workflows it can produce.
You’ll learn a wide range of topics, from the basics of using MapReduce and Spark with Python to advanced modeling and data management using Spark MLlib, Hive, and HBase. You'll gain an understanding of the analytical processes and data systems available to build and empower data products that require huge amounts of data.



Kafka: The Definitive Guide (Neha Narkhede / Gwen Shapira)

Learn how to take full advantage of Apache Kafka, the distributed, publish-subscribe queue for handling real-time data feeds. With this comprehensive book, you’ll understand how Kafka works and how it’s designed. Authors Neha Narkhede, Gwen Shapira, and Todd Palino show you how to deploy production Kafka clusters; secure, tune, and monitor them; write rock-solid applications that use Kafka; and build scalable stream-processing applications.

  • Learn how Kafka compares to other queues, and where it fits in the big data ecosystem
  • Dive into Kafka’s internal design
  • Pick up best practices for developing applications that use Kafka
  • Understand the best way to deploy Kafka in production and to handle monitoring, tuning, and maintenance tasks
  • Learn how to secure a Kafka cluster
  • Get detailed use cases

Educating Data (Taylor Martin)

While big data has already made significant advances in business and government, data analytics is also beginning to transform education. This O’Reilly report explores how the use of analytics has already helped several educational programs, such as personalized learning and massive open online courses (MOOCs), for students of all ages.
Of course, that’s only part of the story. As author Taylor Martin explains, researchers, educators, and private practitioners in the field have also run into several challenges in bringing the education field up to speed. Issues such as building data infrastructures, integrating data sources, and assuring student privacy still need to be resolved—as does the problem of teaching a new generation of data scientists about the challenges and opportunities unique to education.



The following Data Day Seattle speakers will have copies of their books on hand for review during their office hours.

Web Scraping with Python (Ryan Mitchell)

Want to freely access unlimited data from any web source, in any format? Automated gathering and manipulation of data from across the web helped launch Facebook in its early days, and is the foundation of Google's search engine today. With this book, you’ll learn how to gather unlimited data from any web source and use it for your own studies or web applications.
Web scraping is a technology nearly as old as the web itself, but the techniques used must keep pace with web technologies in order to remain viable. Web Scraping with Python not only teaches you the basics of web scraping, but also gets you up to speed on cutting-edge security and technology considerations in one comprehensive guide.



Hadoop Operations (Eric Sammer)

If you’ve been asked to maintain large and complex Hadoop clusters, this book is a must. Demand for operations-specific material has skyrocketed now that Hadoop is becoming the de facto standard for truly large-scale data processing in the data center. Eric Sammer, Principal Solution Architect at Cloudera, shows you the particulars of running Hadoop in production, from planning, installing, and configuring the system to providing ongoing maintenance. Rather than run through all possible scenarios, this pragmatic operations guide calls out what works, as demonstrated in critical deployments.





Authors' gallery from past Data Day events.

Above: You can always locate the O'Reilly exhibit at Data Day - just follow the line!


Above: Annalis Clint of O'Reilly Media chatting before Paco Nathan's book signing at Data Day 2014.


Above: Wes McKinney signing his book, Python for Data Analysis, at Data Day 2013.