Data Retriever with Henry Senyondo – Episode 122

Summary

Analyzing and interpreting data is a large portion of the work involved in scientific research. Getting to that point can be a lot of work on its own because of all of the steps required to download, clean, and organize the data prior to analysis. This week Henry Senyondo talks about the work he is doing with Data Retriever to make data preparation as easy as retriever install.

linode-banner-sponsor-largeDo you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? Check out Linode at linode.com/podcastinit or use the code podcastinit2017 and get a $20 credit to try out their fast and reliable Linux virtual servers. They’ve got lightning fast networking and SSD servers with plenty of power and storage to run whatever you want to experiment on.


Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at www.podastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app.
  • Visit the site to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.
  • To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
  • Your host as usual is Tobias Macey and today I’m interviewing Henry Senyondo about Data Retriever, the package manager for public data sets.

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you explain what data retriever is and the problem that it was built to solve?
  • Are there limitations as to the types of data that can be managed by data retriever?
  • What kinds of data sets are currently available and who are the target users?
  • What is involved in preparing a new dataset to be available for installation?
  • How much of the logic for installing the data is shared between the R and Python implementations of Data Retriever and how do you ensure that the two packages evolve in parallel?
  • How is the project designed and what are some of the most difficult technical aspects of building it?
  • What is in store for the future of data retriever?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA