One of the challenges of machine learning is obtaining large enough volumes of well-labelled data. An approach to mitigate the effort required for labelling data sets is active learning, in which outliers are identified and labelled by domain experts. In this episode Tivadar Danka describes how he built modAL to bring active learning to bioinformatics. He is using it for human-in-the-loop training of models to detect cell phenotypes with massive unlabelled datasets. He explains how the library works, how he designed it to be modular for a broad set of use cases, and how you can use it for training models of your own.
With libraries such as TensorFlow, PyTorch, scikit-learn, and MXNet being released, it is easier than ever to start a deep learning project. Unfortunately, it is still difficult to manage scaling and reproduction of training for these projects. Mourad Mourafiq built Polyaxon on top of Kubernetes to address this shortcoming. In this episode he shares his reasons for starting the project, how it works, and how you can start using it today.
Making computers identify and understand what they are looking at in digital images is an ongoing challenge. Recent years have seen notable increases in the accuracy and speed of object detection due to deep learning and new applications of neural networks. In order to make it easier for developers to take advantage of these techniques, Tryo Labs built Luminoth. In this interview Joaquin Alori explains how Luminoth works, how it can be used in your projects, and how it compares to API-oriented services for computer vision.
Using a rendering library can be a difficult task due to dependency issues and complicated APIs. Rohit Pandey wrote PyRay to address these issues in a pure Python library. In this episode he explains how he uses it to gain a more thorough understanding of mathematical models, how it compares to other options, and how you can use it for creating your own videos and GIFs.
Learning how to read is one of the most important steps in empowering someone to build a successful future. In developing nations, access to teachers and classrooms is not universally available so the Global Learning XPRIZE serves to incentivize the creation of technology that provides children with the tools necessary to teach themselves literacy. Kjell Wooding helped create Learn Leap Fly in order to participate in the competition and used Python and Kivy to build a platform for children to develop their reading skills in a fun and engaging environment. In this episode he discusses his experience participating in the XPRIZE competition, how he and his team built what is now Kasuku Stories, and how Python and its ecosystem helped make it possible.
Data mining and visualization are important skills to have in the modern era, regardless of your job responsibilities. In order to make it easier to learn and use these techniques and technologies, Blaž Zupan and Janez Demšar, along with many others, have created Orange. In this episode they explain how they built a visual programming interface for creating data analysis and machine learning workflows to simplify the work of gaining insights from the myriad data sources that are available. They discuss the history of the project, how it is built, the challenges that they have faced, and how they plan on growing and improving it in the future.
Jake Vanderplas is an astronomer by training and a prolific contributor to the Python data science ecosystem. His current role is using Python to teach principles of data analysis and data visualization to students and researchers at the University of Washington. In this episode he discusses how he got started with Python, the challenges of teaching best practices for software engineering and reproducible analysis, and how easy-to-use tools for data visualization can help democratize access to, and understanding of, data.
A relevant and timely recommendation can be a pleasant surprise that will delight your users. Unfortunately, it can be difficult to build a system that will produce useful suggestions, which is why this week’s guest, Nicolas Hug, built a library to help with developing and testing collaborative recommendation algorithms. He explains how he took the code he wrote for his PhD thesis and cleaned it up to release as an open source library, and shares his plans for future development on it.
The notebook format that has been exemplified by the IPython/Jupyter project has gained in popularity among data scientists. While the existing formats have proven their value, they are still susceptible to difficulties in collaboration and maintainability. Scott Ernst created the Cauldron notebook to be testable, production ready, and friendly to version control. This week we explore the capabilities, use cases, and architecture of Cauldron and how you can start using it today!
What’s the weather tomorrow? That’s the question that meteorologists are always trying to get better at answering. This week the developers of MetPy discuss how their project is used in that quest and the challenges that are inherent in atmospheric and weather research. It is a fascinating look at dealing with uncertainty and using messy, multidimensional data to model a massively complex system.