As data science becomes more widespread and has a bigger impact on people's lives, it is important that data projects and products are built with a conscious consideration of ethics. Keeping ethical principles in mind throughout the lifecycle of a data project helps to reduce the overall effort of preventing negative outcomes from the use of the final product. Emily Miller and Peter Bull of DrivenData have created Deon to improve the communication and conversation around ethics among and between data teams. It is a Python project that generates a checklist of common concerns for data-oriented projects at the various stages of the lifecycle where they should be considered. In this episode they discuss their motivation for creating the project, the challenges and benefits of maintaining such a checklist, and how you can start using it today.
Maintaining the health and well-being of your software is a never-ending responsibility. Automating away as much of it as possible makes that challenge more achievable. In this episode Anthony Sottile describes his work on the pre-commit framework to simplify the process of writing and distributing functions to make sure that you only commit code that meets your definition of clean. He explains how it supports tools and repositories written in multiple languages, how it enforces team standards, and how you can start using it today to ship better software.
How secure are your servers? The best way to be sure that your systems can’t be compromised is to try to compromise them yourself. In this episode Daniel Goldberg explains how you can use his project Infection Monkey to run a scan of your infrastructure to find and fix the vulnerabilities that an attacker could exploit. He also discusses his reasons for building it in Python, how it compares to other security scanners, and how you can get involved to keep making it better.
The need to process unbounded and continually streaming sources of data has become increasingly common. One of the popular platforms for implementing this is Kafka along with its streams API. Unfortunately, this requires all of your processing or microservice logic to be implemented in Java, so what’s a poor Python developer to do? If that developer is Ask Solem of Celery fame then the answer is, help to re-implement the streams API in Python. In this episode Ask describes how Faust got started, how it works under the covers, and how you can start using it today to process your fast moving data in easy to understand Python code. He also discusses ways in which Faust might be able to replace your Celery workers, and all of the pieces that you can replace with your own plugins.
Continuous integration systems are important for ensuring that you don’t release broken software. Some projects can benefit from simple, standardized platforms, but as you grow or factor in additional projects the complexity of checking your deployments grows. Zuul is a deployment automation and gating system that was built to power the complexities of OpenStack so it will grow and scale with you. In this episode Monty Taylor explains how he helped start Zuul, how it is designed for scale, and how you can start using it for your continuous delivery systems. He also discusses how Zuul has evolved and the directions it will take in the future.
Twisted is one of the earliest frameworks for developing asynchronous applications in Python and it has yet to fulfill its original purpose. It can be used to build network servers that integrate a multitude of protocols, increase the performance of your I/O bound applications, serve as the full web stack for your WSGI projects, and anything else that needs a battle tested and performant foundation. In this episode long time maintainer Moshe Zadka discusses the history of Twisted, how it has evolved over the years, the transition to Python 3, some of its myriad use cases, and where it is headed in the future. Try it out today and then send some thanks to all of the people who have dedicated their time to building it.
The future is here, it’s just not evenly distributed. One of the places where this is especially true is in sub-Saharan Africa which is a vast region with little to no reliable internet connectivity. To help communities in this region leapfrog infrastructure challenges and gain access to opportunities for education and market information the Ascoderu non-profit has built Lokole. In this episode one of the lead engineers on the project, Clemens Wolff, explains what it is, how it is built, and how the venerable e-mail protocols can continue to provide access cheaply and reliably.
The command line is a powerful and resilient interface for getting work done, but the user experience is often lacking. This can be especially pronounced in database clients because of the amount of information being transferred and examined. To help improve the utility of these interfaces Amjith Ramanujam built PGCLI, quickly followed by MyCLI with the Prompt Toolkit library. In this episode he describes his motivation for building these projects, how their popularity led him to create even more clients, and how these tools can help you in your command line adventures.
One of the challenges of machine learning is obtaining large enough volumes of well-labelled data. An approach to mitigate the effort required for labelling data sets is active learning, in which the samples that would be most informative to the model are identified and labelled by domain experts. In this episode Tivadar Danka describes how he built modAL to bring active learning to bioinformatics. He is using it for human-in-the-loop training of models to detect cell phenotypes with massive unlabelled datasets. He explains how the library works, how he designed it to be modular for a broad set of use cases, and how you can use it for training models of your own.
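To make the idea of active learning a little more concrete, here is a minimal sketch of one common query strategy, uncertainty sampling, in which the model asks an expert to label the samples it is least sure about. This uses only the Python standard library and is not modAL's API. The toy one-feature logistic model, the function names `predict_proba` and `most_uncertain`, and the sample pool are all hypothetical and chosen purely for illustration.

```python
import math


def predict_proba(weight, bias, x):
    """Toy one-feature logistic model: probability that x belongs to class 1."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))


def most_uncertain(pool, weight, bias, n_queries=1):
    """Return indices of the pool samples whose predicted probability
    is closest to 0.5, i.e. where the model is least certain."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: abs(predict_proba(weight, bias, pool[i]) - 0.5),
    )
    return ranked[:n_queries]


# Hypothetical pool of unlabelled one-dimensional samples.
pool = [-3.0, -0.2, 0.1, 2.5, 4.0]

# Ask for the two samples a domain expert should label next.
query_indices = most_uncertain(pool, weight=1.0, bias=0.0, n_queries=2)
print([pool[i] for i in query_indices])
```

In a full loop the expert's labels for the queried samples would be added to the training set, the model would be refit, and the query step would repeat until the labelling budget runs out.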
With libraries such as TensorFlow, PyTorch, scikit-learn, and MXNet available it is easier than ever to start a deep learning project. Unfortunately, it is still difficult to manage scaling and reproduction of training for these projects. Mourad Mourafiq built Polyaxon on top of Kubernetes to address this shortcoming. In this episode he shares his reasons for starting the project, how it works, and how you can start using it today.