Do you know what your servers are doing? If you have a metrics system in place then the answer should be “yes”. One critical aspect of that platform is the timeseries database that allows you to store, aggregate, analyze, and query the various signals generated by your software and hardware. As the size and complexity of your systems scale, so does the volume of data that you need to manage which can put a strain on your metrics stack. Julien Danjou built Gnocchi during his time on the OpenStack project to provide a time oriented data store that would scale horizontally and still provide fast queries. In this episode he explains how the project got started, how it works, how it compares to the other options on the market, and how you can start using it today to get better visibility into your operations.
How secure are your servers? The best way to be sure that your systems aren’t being compromised is to do it yourself. In this episode Daniel Goldberg explains how you can use his project Infection Monkey to run a scan of your infrastructure to find and fix the vulnerabilities that can be taken advantage of. He also discusses his reasons for building it in Python, how it compares to other security scanners, and how you can get involved to keep making it better.
Continuous integration systems are important for ensuring that you don’t release broken software. Some projects can benefit from simple, standardized platforms, but as you grow or factor in additional projects the complexity of checking your deployments grows. Zuul is a deployment automation and gating system that was built to power the complexities of OpenStack so it will grow and scale with you. In this episode Monty Taylor explains how he helped start Zuul, how it is designed for scale, and how you can start using it for your continuous delivery systems. He also discusses how Zuul has evolved and the directions it will take in the future.
Hosting your own artifact repositories can have a huge impact on the reliability of your production systems. It reduces your reliance on the availability of external services during deployments and ensures that you have access to a consistent set of dependencies with known versions. Many repositories only support one type of package, thereby requiring multiple systems to be maintained, but Pulp is a platform that handles multiple content types and is easily extendable to manage everything you need for running your applications. In this episode maintainers Bihan Zhang and Austin Macdonald explain how the Pulp project works, the exciting new changes coming in version 3, and how you can get it set up to use for your deployments today.
With libraries such as Tensorflow, PyTorch, scikit-learn, and MXNet being released it is easier than ever to start a deep learning project. Unfortunately, it is still difficult to manage scaling and reproduction of training for these projects. Mourad Mourafiq built Polyaxon on top of Kubernetes to address this shortcoming. In this episode he shares his reasons for starting the project, how it works, and how you can start using it today.
Your backups are running every day, right? Are you sure? What about that daily report job? We all have scripts that need to be run on a periodic basis and it is easy to forget about them, assuming that they are working properly. Sometimes they fail and in order to know when that happens you need a tool that will let you know so that you can find and fix the problem. Pēteris Caune wrote Healthchecks to be that tool and made it available both as an open source project and a hosted version. In this episode he discusses his motivation for starting the project, the lessons he has learned while managing the hosting for it, and how you can start using it today.
Do you know what is happening in your production systems right now? If you have a comprehensive metrics platform then the answer is yes. If your answer is no, then this episode is for you. Jason Dixon and Dan Cech, core maintainers of the Graphite project, talk about how graphite is architected to capture your time series data and give you the ability to use it for answering questions. They cover the challenges that have been faced in evolving the project, the strengths that have let it stand the tests of time, and the features that will be coming in future releases.
Server administration is a complex endeavor, but there are some tools that can make life easier. If you are running your workload in a cloud environment then cloud-init is here to help. This week Scott Moser explains what cloud-init is, how it works, and how it became the de-facto tool for configuring your Linux servers at boot.
Everyone who uses a computer on a regular basis knows the importance of backups. Duplicity is one of the most widely used backup technologies, and it’s written in Python! This week Kenneth Loafman shares how Duplicity got started, how it works, and why you should be using it every day.
As our system architectures and the Internet of Things continue to push us towards distributed logic we need a way to route the traffic between those various components. Crossbar.io is the original implementation of the Web Application Messaging Protocol (WAMP) which combines Remote Procedure Calls (RPC) with Publish/Subscribe (PubSub) communication patterns into a single communication layer. In this episode Tobias Oberstein describes the use cases and design patterns that become possible when you have event-based RPC in a high-throughput and low-latency system.