Mike Driscoll has been writing blogs and books for the Python community for years, including his popular series on the Python Module Of The Week. In his daily work he uses Python to test graphical interfaces written in C++ and QT for embedded platforms. In this episode he explains his work, how he got involved in writing as a regular exercise, and an overview of his recent books.
Hosting your own artifact repositories can have a huge impact on the reliability of your production systems. It reduces your reliance on the availability of external services during deployments and ensures that you have access to a consistent set of dependencies with known versions. Many repositories only support one type of package, thereby requiring multiple systems to be maintained, but Pulp is a platform that handles multiple content types and is easily extendable to manage everything you need for running your applications. In this episode maintainers Bihan Zhang and Austin Macdonald explain how the Pulp project works, the exciting new changes coming in version 3, and how you can get it set up to use for your deployments today.
The future is here, it’s just not evenly distributed. One of the places where this is especially true is in sub-Saharan Africa which is a vast region with little to no reliable internet connectivity. To help communities in this region leapfrog infrastructure challenges and gain access to opportunities for education and market information the Ascoderu non-profit has built Lokole. In this episode one of the lead engineers on the project, Clemens Wolff, explains what it is, how it is built, and how the venerable e-mail protocols can continue to provide access cheaply and reliably.
Machine learning models are often inscrutable and it can be difficult to know whether you are making progress. To improve feedback and speed up iteration cycles Benjamin Bengfort and Rebecca Bilbro built Yellowbrick to easily generate visualizations of model performance. In this episode they explain how to use Yellowbrick in the process of building a machine learning project, how it aids in understanding how different parameters impact the outcome, and the improved understanding among teammates that it creates. They also explain how it integrates with the scikit-learn API, the difficulty of producing effective visualizations, and future plans for improvement and new features.
The command line is a powerful and resilient interface for getting work done, but the user experience is often lacking. This can be especially pronounced in database clients because of the amount of information being transferred and examined. To help improve the utility of these interfaces Amjith Ramanujam built PGCLI, quickly followed by MyCLI with the Prompt Toolkit library. In this episode he describes his motivation for building these projects, how their popularity led him to create even more clients, and how these tools can help you in your command line adventures.
Pandas is a swiss army knife for data processing in Python but it has long been difficult to customize. In the latest release there is now an extension interface for adding custom data types with namespaced APIs. This allows for building and combining domain specific use cases and alternative storage mechanisms. In this episode Tom Augspurger describes how the new ExtensionArray works, how it came to be, and how you can start building your own extensions today.
Software development is a skill that can create value and reduce drudgery in a wide variety of contexts. Sometimes the causes that are most in need of software expertise are also the least able to pay for it. By volunteering our time and abilities to causes that we believe in, we can help make a tangible difference in the world. In this episode Eric Schles describes his experiences working on social justice initiatives and the types of work that proved to be the most helpful to the groups that he was working with.
One of the challenges of machine learning is obtaining large enough volumes of well labelled data. An approach to mitigate the effort required for labelling data sets is active learning, in which outliers are identified and labelled by domain experts. In this episode Tivadar Danka describes how he built modAL to bring active learning to bioinformatics. He is using it for doing human in the loop training of models to detect cell phenotypes with massive unlabelled datasets. He explains how the library works, how he designed it to be modular for a broad set of use cases, and how you can use it for training models of your own.
Testing is a critical activity in all software projects, but one that is often neglected in data pipelines. The complexities introduced by the inherent statefulness of the problem domain and the interdependencies between systems contribute to make pipeline testing difficult to manage. To make this endeavor more manageable Abe Gong and James Campbell have created Great Expectations. In this episode they discuss how you can use the project to create tests in the exploratory phase of building a pipeline and leverage those to monitor your systems in production. They also discussed how Great Expectations works, the difficulties associated with pipeline testing and managing associated technical debt, and their future plans for the project.
We take it for granted every day, but creating and displaying vivid colors in our digital media is a complicated and often difficult process. There are different ways to represent color, the ways in which they are displayed can cause them to look different, and translating between systems can cause losses of information. To simplify the process of working with color information in code Thomas Mansencal wrote the Colour project. In this episode we discuss his motiviation for creating and sharing his library, how it works to translate and manage color representations, and how it can be used in your projects.