One of the challenges of machine learning is obtaining large enough volumes of well labelled data. An approach to mitigate the effort required for labelling data sets is active learning, in which outliers are identified and labelled by domain experts. In this episode Tivadar Danka describes how he built modAL to bring active learning to bioinformatics. He is using it for doing human in the loop training of models to detect cell phenotypes with massive unlabelled datasets. He explains how the library works, how he designed it to be modular for a broad set of use cases, and how you can use it for training models of your own.
Most applications require data to operate on in order to function, but sometimes that data is hard to come by, so why not just make it up? Mimesis is a library for randomly generating data of different types, such as names, addresses, and credit card numbers, so that you can use it for testing, anonymizing real data, or for placeholders. This week Nikita Sobolev discusses how the project got started, the challenges that it has posed, and how you can use it in your applications.
Determining the best way to manage the capacity and flow of goods through a system is a complicated issue and can be exceedingly expensive to get wrong. Rather than experimenting with the physical objects to determine the optimal algorithm for managing the logistics of everything from global shipping lanes to your local bank, it is better to do that analysis in a simulation. Ruud van der Ham has been working in this area for the majority of his professional life at the Dutch port of Rotterdam. Using his acquired domain knowledge he wrote Salabim as a library to assist others in writing detailed simulations of their own and make logistical analysis of real world systems accessible to anyone with a Python interpreter.
Every piece of software that has been around long enough ends up with some piece of it that needs to be redesigned and refactored. Often the code that needs to be updated is part of the critical path through the system, increasing the risks associated with any change. One way around this problem is to compare the results of the new code against the existing logic to ensure that you aren’t introducing regressions. This week Joe Alcorn shares his work on Laboratory, how the engineers at GitHub inspired him to create it as an analog to the Scientist gem, and how he is using it for his day job.
Using a rendering library can be a difficult task due to dependency issues and complicated APIs. Rohit Pandey wrote PyRay to address these issues in a pure Python library. In this episode he explains how he uses it to gain a more thorough understanding of mathematical models, how it compares to other options, and how you can use it for creating your own videos and GIFs.
One of the draws of Python is how dynamic and flexible the language can be. Sometimes, that flexibility can be problematic if the format of variables at various parts of your program is unclear or the descriptions are inaccurate. The growing middle ground is to use type annotations as a way of providing some verification of the format of data as it flows through your application and enforcing gradual typing. To make it simpler to get started with type hinting, Carl Meyer and Matt Page, along with other engineers at Instagram, created MonkeyType to analyze your code as it runs and generate the type annotations. In this episode they explain how that process works, how it has helped them reduce bugs in their code, and how you can start using it today.
A relevant and timely recommendation can be a pleasant surprise that will delight your users. Unfortunately it can be difficult to build a system that will produce useful suggestions, which is why this week’s guest, Nicolas Hug, built a library to help with developing and testing collaborative recommendation algorithms. He explains how he took the code he wrote for his PhD thesis and cleaned it up to release as an open source library and his plans for future development on it.
For any program that is used by more than one person you need a way to control identity and permissions. There are myriad solutions to that problem, but most of them are tied to a specific framework. Yosai is a flexible, general purpose framework for managing role-based access to your applications that has been decoupled from the underlying platform. This week the author of Yosai, Darin Gordon, joins us to talk about why he started it, his experience porting it from Java, and where he hopes to take it in the future.
As the amount of text available on the internet and in businesses continues to increase, the need for fast and accurate language analysis becomes more prominent. This week Matthew Honnibal, the creator of SpaCy, talks about his experiences researching natural language processing and creating a library to make his findings accessible to industry.