The need to process unbounded and continually streaming sources of data has become increasingly common. One of the popular platforms for implementing this is Kafka along with its streams API. Unfortunately, this requires all of your processing or microservice logic to be implemented in Java, so what’s a poor Python developer to do? If that developer is Ask Solem of Celery fame then the answer is, help to re-implement the streams API in Python. In this episode Ask describes how Faust got started, how it works under the covers, and how you can start using it today to process your fast moving data in easy to understand Python code. He also discusses ways in which Faust might be able to replace your Celery workers, and all of the pieces that you can replace with your own plugins.
Michael Foord has been working on building and testing software in Python for over a decade. One of his most notable and widely used contributions to the community is the Mock library, which has been incorporated into the standard library. In this episode he explains how he got involved in the community, why testing has been such a strong focus throughout his career, the uses and hazards of mocked objects, and how he is transitioning to freelancing full time.
Twisted is one of the earliest frameworks for developing asynchronous applications in Python and it has yet to fulfill its original purpose. It can be used to build network servers that integrate a multitude of protocols, increase the performance of your I/O bound applications, serve as the full web stack for your WSGI projects, and anything else that needs a battle tested and performant foundation. In this episode long time maintainer Moshe Zadka discusses the history of Twisted, how it has evolved over the years, the transition to Python 3, some of its myriad use cases, and where it is headed in the future. Try it out today and then send some thanks to all of the people who have dedicated their time to building it.
Pandas is a swiss army knife for data processing in Python but it has long been difficult to customize. In the latest release there is now an extension interface for adding custom data types with namespaced APIs. This allows for building and combining domain specific use cases and alternative storage mechanisms. In this episode Tom Augspurger describes how the new ExtensionArray works, how it came to be, and how you can start building your own extensions today.
One of the challenges of machine learning is obtaining large enough volumes of well labelled data. An approach to mitigate the effort required for labelling data sets is active learning, in which outliers are identified and labelled by domain experts. In this episode Tivadar Danka describes how he built modAL to bring active learning to bioinformatics. He is using it for doing human in the loop training of models to detect cell phenotypes with massive unlabelled datasets. He explains how the library works, how he designed it to be modular for a broad set of use cases, and how you can use it for training models of your own.
Most applications require data to operate on in order to function, but sometimes that data is hard to come by, so why not just make it up? Mimesis is a library for randomly generating data of different types, such as names, addresses, and credit card numbers, so that you can use it for testing, anonymizing real data, or for placeholders. This week Nikita Sobolev discusses how the project got started, the challenges that it has posed, and how you can use it in your applications.
Determining the best way to manage the capacity and flow of goods through a system is a complicated issue and can be exceedingly expensive to get wrong. Rather than experimenting with the physical objects to determine the optimal algorithm for managing the logistics of everything from global shipping lanes to your local bank, it is better to do that analysis in a simulation. Ruud van der Ham has been working in this area for the majority of his professional life at the Dutch port of Rotterdam. Using his acquired domain knowledge he wrote Salabim as a library to assist others in writing detailed simulations of their own and make logistical analysis of real world systems accessible to anyone with a Python interpreter.
Every piece of software that has been around long enough ends up with some piece of it that needs to be redesigned and refactored. Often the code that needs to be updated is part of the critical path through the system, increasing the risks associated with any change. One way around this problem is to compare the results of the new code against the existing logic to ensure that you aren’t introducing regressions. This week Joe Alcorn shares his work on Laboratory, how the engineers at GitHub inspired him to create it as an analog to the Scientist gem, and how he is using it for his day job.
Using a rendering library can be a difficult task due to dependency issues and complicated APIs. Rohit Pandey wrote PyRay to address these issues in a pure Python library. In this episode he explains how he uses it to gain a more thorough understanding of mathematical models, how it compares to other options, and how you can use it for creating your own videos and GIFs.