PyTables with Francesc Alted – Episode 97

Summary

HDF5 is a file format that supports fast and space efficient analysis of large datasets. PyTables is a project that wraps and expands on the capabilities of HDF5 to make it easy to integrate with the larger Python data ecosystem. Francesc Alted explains how the project got started, how it works, and how it can be used for creating sharable and archivable data sets.

linode-banner-sponsor-largeDo you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? Check out Linode at linode.com/podcastinit or use the code podcastinit2017 and get a $20 credit to try out their fast and reliable Linux virtual servers. They’ve got lightning fast networking and SSD servers with plenty of power and storage to run whatever you want to experiment on.


GoCD is the on-premise open source continuous delivery server created by ThoughtWorks and modeled after the ideas in the Continuous Delivery book by Jez Humble and David Farley.

With GoCD’s comprehensive pipeline modeling, you can model complex workflows for multiple teams with ease. And GoCD’s Value Stream Map lets you track a change from commit to deploy at a glance.

GoCD’s real power is in the visibility it provides over your end-to-end workflow. So you get complete control of and visibility into your deployments, across multiple teams.

Say goodbye to deployment panic and hello to consistent, predictable deliveries.

To learn more about GoCD, visit gocd.org for a free download. Professional Support and enterprise add-ons, including disaster recovery, are available.


Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. Linode will has announced new plans, including 1GB for $5 plan, high memory plans starting at 16GB for $60/mo and an upgrade in storage from 24GB to 30GB on our 2GB for $10 plan.
  • Visit our site to subscribe to our show, sign up for our newsletter, read the show notes, and get in touch.
  • To help other people find the show you can leave a review on iTunes, or Google Play Music, and tell your friends and co-workers
  • Your host as usual is Tobias Macey and today I’m interviewing Francesc Alted about PyTables

Interview

  • Introductions
  • How did you get introduced to Python?
  • To start with, what is HDF5 and what was the problem that motivated you to wrap Python around it to create PyTables?
  • Which are the most relevant contributors for PyTables? How you interacted?
  • How is the project architected and what are some of the design decisions that you are most proud of?
  • What are some of the typical use cases for PyTables and how does it tie into the broader Python data ecosystem?
  • How common is it to use an HDF5 file as a data interchange format to be shared between researchers or between languages?
  • Given the ability to create custom node types, does that inhibit the ability to interact with the stored data using other libraries?
  • What are some of the capabilities of HDF5 and PyTables that can’t be reasonably replicated in other data storage systems?
  • One of the more intriguing capabilities that I noticed while reading the documentation is the ability to perform undo and redo operations on the data. How might that be leveraged in a real-world use case?
  • What are some of the most interesting or unexpected uses of PyTables that you are aware of?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA