Data Science

Luminoth: AI Powered Computer Vision for Python with Joaquin Alori - Episode 154

Summary

Making computers identify and understand what they are looking at in digital images is an ongoing challenge. Recent years have seen notable increases in the accuracy and speed of object detection due to deep learning and new applications of neural networks. In order to make it easier for developers to take advantage of these techniques Tryo Labs built Luminoth. In this interview Joaquín Alori explains how how Luminoth works, how it can be used in your projects, and how it compares to API oriented services for computer vision.

Introduction

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API you’ve got everything you need to scale up. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute.
  • For complete visibility into your application stack, deployment tracking, and powerful alerting, DataDog has got you covered. With their monitoring, metrics, and log collection agent, including extensive integrations and distributed tracing, you’ll have everything you need to find and fix bugs in no time. Go to podcastinit.com/datadog today to start your free 14 day trial and get a sweet new T-Shirt.
  • To get worry-free releases download GoCD, the open source continous delivery server built by Thoughworks. You can use their pipeline modeling and value stream map to build, control and monitor every step from commit to deployment in one place. Go to podcastinit.com/gocd to learn more about their professional support services and enterprise add-ons.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected])
  • Your host as usual is Tobias Macey and today I’m interviewing Joaquín Alori about Luminoth, a deep learning toolkit for computer vision in Python

Interview

  • Introductions
  • How did you get introduced to Python?
  • What is Luminoth and what was your motivation for creating it?
  • Computer vision has been a focus of AI research for decades. How do current approaches with deep learning compare to previous generations of tooling?
  • What are some of the most difficult problems in visual processing that still need to be solved?
  • What are the limitations of Luminoth for building a computer vision application and how do they differ from the capabilities of something built with a prior generation of tooling such as OpenCV?
  • For someone who is interested in using Luminoth in their project what is the current workflow?
  • How do the capabilities of Luminoth compare with some of the various service based options such as Rekognition for Amazon or the Cloud Vision API from Google?
    • What are some of the motivations for using Luminoth in place of these services?
  • What are some of the highest priority features that you are focusing on implementing in Luminoth?
  • When is Luminoth the wrong choice for a computer vision application and what are some of the strongest alternatives at the moment?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

PyRay: Pure Python 3D Rendering with Rohit Pandey - Episode 147

Summary

Using a rendering library can be a difficult task due to dependency issues and complicated APIs. Rohit Pandey wrote PyRay to address these issues in a pure Python library. In this episode he explains how he uses it to gain a more thorough understanding of mathematical models, how it compares to other options, and how you can use it for creating your own videos and GIFs.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at podastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. And now you can deliver your work to your users even faster with the newly upgraded 200 GBit network in all of their datacenters.
  • If you’re tired of cobbling together your deployment pipeline then it’s time to try out GoCD, the open source continuous delivery platform built by the people at ThoughtWorks who wrote the book about it. With GoCD you get complete visibility into the life-cycle of your software from one location. To download it now go to podcatinit.com/gocd. Professional support and enterprise plugins are available for added piece of mind.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected])
  • To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
  • A few announcements before we start the show:
    • There’s still time to get your tickets for PyCon Colombia, happening February 9th and 10th. Go to pycon.co to learn more and register.
    • There is also still time to register for the O’Reilly Software Architecture Conference in New York. Use the link podcastinit.com/sacon-new-york to register and save 20%
    • If you work with data or want to learn more about how the projects you have heard about on the show get used in the real world then join me at the Open Data Science Conference in Boston from May 1st through the 4th. It has become one of the largest events for data scientists, data engineers, and data driven businesses to get together and learn how to be more effective. To save 60% off your tickets go to podcastinit.com/odsc-east-2018 and register.
  • Your host as usual is Tobias Macey and today I’m interviewing Rohit Pandey about PyRay, a 3d rendering library written completely in python

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by explaining what PyRay is and what motivated you to create it?
    [rohit] PyRay is an open source library written completely in Python that let’s you render three and higher dimensional objects and scenes. Development on it has been ongoing and new features have so far come about from videos for my Youtube channel.
  • What does the internal architecture of PyRay look like and how has that design evolved since you first started working on it?
  • What capabilities are unlocked by having a pure Python rendering library which would otherwise be impractical or impossible for Python developers to do with existing options?
    [rohit] Having a pure Python library makes it accessible with minimal fixed cost to Python users. The tradeoff is you lose on speed, but for many applications that isn’t an issue. I haven’t seen a library coded completely in Python that let’s you manipulate 3d and higher dimensional objects. The core usecase right now is Mathematical artwork. Google geometric gifs and you’ll see some fascinating, mesmerizing results. But those are created for the most part using tools that are not Python. Which is a pity since Python has a very extensive library of Mathematical functions.
  • What have been some of the most challenging aspects of building and maintaining PyRay?
    [rohit] 3d objects – getting mesh plots. I have to develop routines from scratch for almost everything – shading objects, etc. Animated routines for characters.

  • What are some of the most interesting or unexpected uses of PyRay that you are aware of?
    [rohit] Physical simulations. Ex: Testing if a solid is a fair die, getting lower bounds for space packing efficiencies of solids. Creating interactive demos where a user can draw to provide input.

  • For someone who wanted to contribute to PyRay are there any particular skills or experience that would be most helpful?
    Basic linear algebra and python
  • What are some of the features or improvements that you have planned for the future of PyRay?

Keep In Touch

pyray repo – https://github.com/ryu577/pyray
Email
GitHub
LinkedIn

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Learn Leap Fly: Using Python To Promote Global Literacy with Kjell Wooding - Episode 145

Summary

Learning how to read is one of the most important steps in empowering someone to build a successful future. In developing nations, access to teachers and classrooms is not universally available so the Global Learning XPRIZE serves to incentivize the creation of technology that provides children with the tools necessary to teach themselves literacy. Kjell Wooding helped create Learn Leap Fly in order to participate in the competition and used Python and Kivy to build a platform for children to develop their reading skills in a fun and engaging environment. In this episode he discusses his experience participating in the XPRIZE competition, how he and his team built what is now Kasuku Stories, and how Python and its ecosystem helped make it possible.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at podastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. And now you can deliver your work to your users even faster with the newly upgraded 200 GBit network in all of their datacenters.
  • If you’re tired of cobbling together your deployment pipeline then it’s time to try out GoCD, the open source continuous delivery platform built by the people at ThoughtWorks who wrote the book about it. With GoCD you get complete visibility into the life-cycle of your software from one location. To download it now go to podcatinit.com/gocd. Professional support and enterprise plugins are available for added piece of mind.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected])
  • To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
  • Your host as usual is Tobias Macey and today I’m interviewing Kjell Wooding about Learn Leap Fly, a startup using Python on mobile devices to facilitate global learning

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what Learn Leap Fly does and how the company got started?
  • What was your motivation for using Kivy as the primary technology for your mobile applications as opposed to the platform native toolkits or other multi-platform frameworks?
  • What are some of the pedagogical techniques that you have incorporated into the technological aspects of your mobile application and are there any that you were unable to translate to a purely technical implementation.
  • How do you measure the effectiveness of the work that you are doing?
  • How has the framework of the XPRIZE influenced the way in which you have approached the design and development of your work?
  • What have been some of the biggest challenges that you faced in the process of developing and deploying your submission for the XPRIZE?
  • What are some of the features that you have planned for future releases of your platform?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Orange: Visual Data Mining Toolkit with Janez Demšar and Blaž Zupan - Episode 142

Summary

Data mining and visualization are important skills to have in the modern era, regardless of your job responsibilities. In order to make it easier to learn and use these techniques and technologies Blaž Zupan and Janez Demšar, along with many others, have created Orange. In this episode they explain how they built a visual programming interface for creating data analysis and machine learning workflows to simplify the work of gaining insights from the myriad data sources that are available. They discuss the history of the project, how it is built, the challenges that they have faced, and how they plan on growing and improving it in the future.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at podastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. And now you can deliver your work to your users even faster with the newly upgraded 200 GBit network in all of their datacenters.
  • If you’re tired of cobbling together your deployment pipeline then it’s time to try out GoCD, the open source continuous delivery platform built by the people at ThoughtWorks who wrote the book about it. With GoCD you get complete visibility into the life-cycle of your software from one location. To download it now go to podcatinit.com/gocd. Professional support and enterprise plugins are available for added piece of mind.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected])
  • To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
  • Your host as usual is Tobias Macey and today I’m interviewing Blaž Zupan and Janez Demsar about Orange, a toolbox for interactive machine learning and data visualization in Python

Interview

  • Introductions
  • How did you get introduced to Python?
  • What is Orange and what was your motivation for building it?
  • Who is the target audience for this project?
  • How is the graphical interface implemented and what kinds of workflows can be implemented with the visual components?
  • What are some of the most notable or interesting widgets that are available in the catalog?
  • What are the limitations of the graphical interface and what options do user have when they reach those limits?
  • What have been some of the most challenging aspects of building and maintaining Orange?
  • What are some of the most common difficulties that you have seen when users are just getting started with data analysis and machine learning, and how does Orange help overcome those gaps in understanding?
  • What are some of the most interesting or innovative uses of Orange that you are aware of?
  • What are some of the projects or technologies that you consider to be your competition?
  • Under what circumstances would you advise against using Orange?
  • What are some widgets that you would like to see in future versions?
  • What do you have planned for future releases of Orange?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Jake Vanderplas: Data Science For Academic Research - Episode 140

Summary

Jake Vanderplas is an astronomer by training and a prolific contributor to the Python data science ecosystem. His current role is using Python to teach principles of data analysis and data visualization to students and researchers at the University of Washington. In this episode he discusses how he got started with Python, the challenges of teaching best practices for software engineering and reproducible analysis, and how easy to use tools for data visualization can help democratize access to, and understanding of, data.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at podastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. And now you can deliver your work to your users even faster with the newly upgraded 200 GBit network in all of their datacenters.
  • If you’re tired of cobbling together your deployment pipeline then it’s time to try out GoCD, the open source continuous delivery platform built by the people at ThoughtWorks who wrote the book about it. With GoCD you get complete visibility into the life-cycle of your software from one location. To download it now go to podcatinit.com/gocd. Professional support and enterprise plugins are available for added piece of mind.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected])
  • To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
  • Your host as usual is Tobias Macey and today I’m interviewing Jake Vanderplas about data science best practices, and applying them to academic sciences

Interview

  • Introductions
  • How did you get introduced to Python?
  • How has your astronomy background informed and influenced your current work?
  • In your work at the University of Washington, what are some of the most common difficulties that students face when learning data science?
    • How does that list differ for professional scientists who are learning how to apply data science to their work?
  • Where is the tooling still lacking in terms of enabling consistent and repeatable workflows?
  • One of the projects that you are spending time on now is Altair, which is a library for generating visualizations from Pandas dataframes. How does that work factor into your teaching?
  • What are some of the most novel applications of data science that you have been involved with?
  • What are some of the trends in data analysis that you are most excited for?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Surprise! Recommendation Algorithms with Nicolas Hug - Episode 135

Summary

A relevant and timely recommendation can be a pleasant surprise that will delight your users. Unfortunately it can be difficult to build a system that will produce useful suggestions, which is why this week’s guest, Nicolas Hug, built a library to help with developing and testing collaborative recommendation algorithms. He explains how he took the code he wrote for his PhD thesis and cleaned it up to release as an open source library and his plans for future development on it.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at podastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. And now you can deliver your work to your users even faster with the newly upgraded 200 GBit network in all of their datacenters.
  • If you’re tired of cobbling together your deployment pipeline then it’s time to try out GoCD, the open source continuous delivery platform built by the people at ThoughtWorks who wrote the book about it. With GoCD you get complete visibility into the life-cycle of your software from one location. To download it now go to podcatinit.com/gocd. Professional support and enterprise plugins are available for added piece of mind.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected])
  • To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
  • Your host as usual is Tobias Macey and today I’m interviewing Nicolas Hug about Surprise, a scikit library for building recommender systems

Interview

  • Introductions
  • How did you get introduced to Python?
  • What is Surprise and what was your motivation for creating it?
  • What are the most challenging aspects of building a recommender system and how does Surprise help simplify that process?
  • What are some of the ways that a user or company can bootstrap a recommender system while they accrue data to use a collaborative algorithm?
  • What are some of the ways that a recommender system can be used, outside of the typical ecommerce example?
  • Once an algorithm has been deployed how can a user test the accuracy of the suggestions?
  • How is Surprise implemented and how has it evolved since you first started working on it?
  • What have been the most difficult aspects of building and maintaining Surprise?
  • competitors?
  • What are the attributes of the system that can be modified to improve the relevance of the recommendations that are provided?
  • For someone who wants to use Surprise in their application, what are the steps involved?
  • What are some of the new features or improvements that you have planned for the future of Surprise?

Keep In Touch

Picks

  • Tobias
    • Silk profiler for Django

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Cauldron with Scott Ernst - Episode 111

Summary

The notebook format that has been exemplified by the IPython/Jupyter project has gained in popularity among data scientists. While the existing formats have proven their value, they are still susceptible with difficulties in collaboration and maintainability. Scott Ernst created the Cauldron notebook to be testable, production ready, and friendly to version control. This week we explore the capabilities, use cases, and architecture of Cauldron and how you can start using it today!

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at www.podastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app.
  • Visit the site to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.
  • To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
  • Your host as usual is Tobias Macey and today I’m interviewing Scott Ernst about Cauldron, a new notebook format built with software engineering best practices in mind.

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by explaining what Cauldron is and what problem you were trying to solve when you created it?
  • In the documentation it mentions that you can use any editor for creating the content of the notebook. Can you describe a typical workflow of authoring the various files and cells and viewing the output?
  • How does Cauldron compare to the Jupyter notebook format and what factors would lead someone to choose one over the other?
  • Does Cauldron support running languages other than Python? If not then what would be involved in adding that capability?
  • Cauldron notebooks support unit tests of individual cells. How does that process work and what are the limitations?
  • The option for running the notebook in the context of a task workflow tool appears to be a powerful capability. What are some of the considerations that are necessary when writing a notebook to be run in that manner?
  • What are some of the most interesting or unexpected projects that you have seen people using Cauldron for?
  • What do you have planned for the future of Cauldron?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

MetPy: Taming The Weather With Python - Episode 100

Summary

What’s the weather tomorrow? That’s the question that meteorologists are always trying to get better at answering. This week the developers of MetPy discuss how their project is used in that quest and the challenges that are inherent in atmospheric and weather research. It is a fascinating look at dealing with uncertainty and using messy, multidimensional data to model a massively complex system.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app.
  • Visit our site to subscribe to our show, sign up for our newsletter, read the show notes, and get in touch.
  • To help other people find the show you can leave a review on iTunes, or Google Play Music, and tell your friends and co-workers
  • Your host as usual is Tobias Macey and today I’m interviewing Ryan May, Sean Arms, and John Leeman about MetPy, a collection of tools and notebooks for analyzing meteorological data in Python.

Interview

  • Introductions
  • How did you get introduced to Python?
  • What is MetPy and what is the problem that prompted you to create it?
  • Can you explain the problem domain for Meteorology and how it compares to other domains such as the physical sciences?
  • How do you deal with the inherent uncertainty of atmospheric and weather data?
  • What are some of the data sources and data formats that a meteorologist works with?
  • To what degree is machine learning or artificial intelligence employed when modelling climate and local weather patterns?
  • The MetPy documentation has a number of examples of how to use the library and a number of them produce some fairly complex plots and graphs. How prevalent is the need to interact with meteorological data visually to properly understand what it is trying to tell you?
  • I read through your developer guide and watched your SciPy talk about development automation in MetPy. My understanding is that individuals with a pure science background tend to eschew formal code styles and software engineering practices so I’m curious what your experience has been when interacting with your user community.
  • What are some of the interesting innovations in weather science that you are looking forward to?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Pandas with Jeff Reback - Episode 98

Summary

Pandas is one of the most versatile and widely used tools for data manipulation and analysis in the Python ecosystem. This week Jeff Reback explains why that is, how you can use it to make your life easier, and what you can look forward to in the months to come.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app.
  • When you’re writing Python you need a powerful editor to automate routine tasks, maintain effective development practices, and simplify challenging things like refactoring. Our sponsor JetBrains delivers the perfect solution for you in the form of PyCharm, providing a complete set of tools for productive Python, Web, Data Analysis and Scientific development, available in 2 editions. The free and open-source PyCharm Community Edition is perfect for pure Python coding. PyCharm Professional Edition is a full-fledged tool, designed for professional Python, Web and Data Analysis developers. Today JetBrains is offering a 3-month free PyCharm Professional Edition individual subscription. Don’t miss this chance to use the best-in-class tool with intelligent code completion, automated testing, and integration with modern tools like Docker – go to <www.podcastinit.com/pycharm> and use the promo code podcastinit during checkout.
  • Visit the site to subscribe to our show, sign up for our newsletter, read the show notes, and get in touch.
  • To help other people find the show you can leave a review on iTunes, or Google Play Music, and tell your friends and co-workers
  • Your host as usual is Tobias Macey and today I’m interviewing Jeff Reback about Pandas, the swiss army knife of data analysis in Python.

Interview

  • Introductions
  • How did you get introduced to Python?
  • To start off, what is Pandas and what is its origin story?
    • How did you get involved in the project’s development?
  • For someone who is just getting started with Pandas what are the fundamental ideas and abstractions in the library that are necessary to understand how to use it for working with data?
  • Pandas has quite an extensive API and I noticed that the most recent release includes a nice cheat sheet. How do you balance the power and flexibility of such an expressive API with the usability issues that can be introduced by having so many options of how to manipulate the data?
  • There is a strong focus for use in science and data analytics, but there are a number of other areas where Pandas is useful as well. What are some of the most interesting or unexpected uses that you have seen or heard of?
  • What are some of the biggest challenges that you have encountered while working on Pandas?
  • Do you find the constraint of only supporting two dimensional arrays to be limiting, or has it proven to be beneficial for the success of pandas?
  • What’s coming for pandas? Pandas 2.0!

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

SpaCy with Matthew Honnibal - Episode 87

Summary

As the amount of text available on the internet and in businesses continues to increase, the need for fast and accurate language analysis becomes more prominent. This week Matthew Honnibal, the creator of SpaCy, talks about his experiences researching natural language processing and creating a library to make his findings accessible to industry.

Brief Introduction

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app.
  • You’ll want to make sure that your users don’t have to put up with bugs, so you should use Rollbar for tracking and aggregating your application errors to find and fix the bugs in your application before your users notice they exist. Use the link rollbar.com/podcastinit to get 90 days and 300,000 errors for free on their bootstrap plan.
  • Visit our site to subscribe to our show, sign up for our newsletter, read the show notes, and get in touch.
  • To help other people find the show you can leave a review on iTunes, or Google Play Music, and tell your friends and co-workers
  • Join our community! Visit discourse.pythonpodcast.com for your opportunity to find out about upcoming guests, suggest questions, and propose show ideas.
  • Your host as usual is Tobias Macey and today I’m interviewing Matthew Honnibal about SpaCy and Explosion.AI

Interview with Matthew Honnibal

  • Introductions
  • How did you get introduced to Python?
  • Can you start by sharing what SpaCy is and what problem you were trying to solve when you created it?
  • Another project for natural language processing that has been part of the Python ecosystem for a number of years is the Natural Language Tool Kit (NLTK). How does SpaCy differ from the NLTK and are there any cases where that would be the better choice?
  • How much knowledge of NLP and computational linguistics is necessary to be able to use SpaCy?
  • What does the internal design and architecture of SpaCy look like and what are the biggest challenges associated with its development to date and into the future?
  • One of the projects that you have built around SpaCy which I think is really cool and caught my attention when I first found your project is the displaCy visualization tool. Can you explain what that is and why you think it is important?
  • What are some kinds of applications where SpaCy would be useful which might not be obvious candidates for it?
  • Why is speed such an important focus for an NLP library?
  • One of the ways that you have been able to gain a speed boost is through releasing the GIL and allowing for true parallelism via Cython. How have you managed to ensure that this doesn’t lead to data races and program failures?
  • Building on the success of SpaCy you founded a company called Explosion AI. Can you explain what your goals are for this endeavor and the kinds of services that you are offering?
  • What are some of the most interesting uses of SpaCy that you have seen?
  • What do you have planned for the future of SpaCy?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA