NAPALM with David Barroso and Mircea Ulinic – Episode 117

Summary

Routers and switches are the stitches in the invisible fabric of the internet which we all rely on. Managing that hardware has traditionally been a very manual process, but the NAPALM (Network Automation and Programmability Abstraction Layer with Multivendor support) is helping to change that. This week David Barroso and Mircea Ulinic explain how Python is being used to make sure that you can watch those cat videos.

linode-banner-sponsor-largeDo you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? Check out Linode at linode.com/podcastinit or use the code podcastinit2017 and get a $20 credit to try out their fast and reliable Linux virtual servers. They’ve got lightning fast networking and SSD servers with plenty of power and storage to run whatever you want to experiment on.


Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at www.podastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app.
  • Visit the site to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.
  • To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
  • Your host as usual is Tobias Macey and today I’m interviewing David Barroso and Mircea Ulinic about NAPALM (Network Automation and Programmability Abstraction Layer with Multivendor support), the library for managing programmable network devices

Interview

  • Introductions
  • How did you get introduced to Python?
    • [david] 2012 trying to use django 1.4 to store data I had on confluence.
    • [mircea] August 2008, when I bought the Learning Python, Mark Lutz, 2nd edition
  • Can you start by explaining what NAPALM is and the problem that you were solving when you started working on it?
    • [david] trying to remove all the if vendor_a do this, elif vendor_b do this other thing instead
    • [mircea] only if I will feel there’s anything to add
  • What led you to choose Python as the language for implementing it?
    • [david] it’s what I knew best and vendors were starting to provide libraries to interact with their platforms so python seemed like a natural evolution as we could just provide an abstraction on top of those libraries that already existed.
    • [mircea] I didn’t implement NAPALM, I was fistly a user then contributor, now I’m one of the maintainers.
  • When working with network equipment it is easy to apply the wrong settings and bring down a large number of systems or lock yourself out entirely. Are there any tools in NAPALM to help prevent this from happening?
    • [david] We provide mechanisms to ensure proper peer reviewing; we let operators propose a configuration and get a diff. We have a rollback mechanism so if you detect an issue you can immediately rollback and we also added support to the autorollback feature some vendors have.
  • How have you architected the library to allow for easy integration of new classes of network devices?
    • [david] very simple architecture. Trying to avoid complex features like abstract classes, metaprogramming or decorators. Main reason is that I figured my main user base wasn’t going to be very python savvy so I wanted something simple. What I ended doing was simulating interfaces with with a base class that described the supported methods and how they were supposed to behave and an extensive testing framework that ensure the method signatures and the behaviors matched the expectations.
  • Designing and building a consistent API for such a wide variety of hardware and software platforms is a daunting task. How do you determine the lowest common set of functionality that you are going to expose as part of the core library vs delegating to the underlying dependencies?
    • [david] We don’t necessarily go with the lowest common denominator. Sometimes we try to emulate features. For example, if a platform doesn’t support atomic changes we might simulate it by trying to send the configuration as a block and rollback immediately. Obviously a feature likes this is clearly documented so people is aware that this might happen. What we try to avoid though is implementing things that are very specific to a single vendor. In any case the way it has worked so far falls into two categories:
    1. configuration management. These are primitives like loading a candidate configuration for merging or replacing into the device, getting a diff back, commiting, discarding or rolling back configuration. These primitives were designed at the very begining of the project based on the netconf protocol and they have changed very little since then. When a primitive is not natively supported by a device we try to emulate it as with the atomicity example I gave before or we don’t implement it at all if it’s not possible.
    2. The second category is what we call getters which are methods that retrieve information from the devices. Things like interface counters, bgp neighbors, etc. These are basically community driven. Someone opens an issue on github explaining the data that he or she needs, we discuss it, we define a model and then we work on it. Not all getters are supported on all platforms. People mostly implements them as they need.
      Now there is a third category though. It is actually funny but I presented napalm for the first time a couple of years ago at NANOG64. It turns out the day after, at the same venue, Google was presenting Openconfig. Openconfig is an effort to design a common set of models to operate the network. So, for example, they have models for BGP neighbors, for interfaces, vlans, etc… Those models try to be vendor agnostic and you should, in theory, be able to use them to configure or to retrieve consistently information from any device. Problem is that, of course, vendors are slow implementing them, they don’t even have plans for all of them or for all the platforms, etc… So the sad truth is that two years later support for Openconfig is extremely limited. However, in the last few months I have been working on integrating napalm with opencofig so now we have a beta version of napalm where you can use python bindings that can translate native data from a device into an Openconfig object and viceversa. That has two direct implications:
      1. Now we are not only operating all vendors with the same tool but we are also operating them with the same data structures. This means that I can get the configuration of a cisco device and translate it directly to junos configuration.
      2. It also means that because now we are dealing with objects, I can do smart things like having an object that represents the candidate configuration, anotther object that represents a certain running state and simulate merges myself without having to rely on the device itself. I can even generate the exact commands to do the merge without having to rely on them doing the actual merge. I can also simulate the changes offline, I don’t even need access to the device anymore, I could be builting the objects from a backup or from the resulting configuration after merging different branches on github.
  • I have seen a few posts recently discussing the use of NAPALM in conjunction with configuration management platforms such as SaltStack and Ansible. What are the tradeoffs of using the library directly vs integrated with these other tools?
    • [david] napalm is a library in the strict sense. There is no business logic, no workflows, very little tooling embedded. Instead we try to implement as many primitives and be as flexible as possible so other tools can leverage on napalm to implement their workflows. What this means is that using napalm directly is great if you are writing a script to do backups or to solve a specific issue but if you want to build a whole framework for event driven automation or a configuration management system you are probably better off leveraging on napalm integration with salt/ansible/st2.
  • I noticed in the documentation that merging configuration is supported. How do you manage conflicts and priority of nested data structures?
    • [david] we try to make changes atomic. So if you make a change and trigger a conflict or you are missing some datastructure or some configuration is invalid configuration won’t be applied and the user will get an error. For platforms where changes can’t be atomic we try to apply the configuration changes in bulk and revert immediately if there is an error.
  • How does declarative modeling of network devices differ from general purpose operating systems and what unique challenges do they pose?
    • [david] lack of tooling like sed/awk/etc. Lots of state. Configuration is state itself and in most cases you can’t even reload it. Which means you have to type the exact commands to go from state a to state b. Like trying to configure the network stack of linux with only the iproute2 tooling available.
  • What are the most technically challenging aspects of managing different network hardware programmatically?
    • [david] Inconsistencies and buggy code. Not even inconsistencies across different platforms but across minor revisions of the same platforms. Small API changes that are not backwards compatible, small differences on output commands that break regular expressions and APIs that break every second call.
  • What are some of the most interesting or unusual uses of NAPALM that you have seen?
    • [david]
    1. I have seen people replacing their SNMP based monitoring system with napalm.
    2. I have built myself what you could call “immutable infrastructure for the network”. So for example, when you have to do a configuration change you don’t apply that configuration change. What you do instead is compile a full configuration for the device and fully reload the state of the device. That ensures you are always into a known state. So if a user would connect to the device and do a change outside the change control system because you are fully deploying state you can be certain that the manual change will be wipeout. So there is no way out of the automation.
    3. We also have this validate functionality integrated into napalm. With this functionality you can define a desired state, for example certain BGP neighbors have to be up and I must be receiving N prefixes from them. Napalm can then read those rules, figure out which data to retrieve and validate the data retrieved complies so I know some people using this state validation instead of using the traditional times series type of monitoring where you keep retrieving data constantly and alerting when you reach certain thresholds. I guess you could call this test driven monitoring?
    • [mircea]
    1. SNMP thing
  • For someone who is interested in learning more about network management, what resources do you recommend?
    • [david]
    1. networktocode.com has some resources, labs, the slack community behing the organization is very active as well.
    2. ipspace.com has some good resources as well.
    3. pynet.twb-tech.com is also another great place to check for courses
    4. o’reilly has a book on Network Programmability and Automation which I haven’t read but I know the authors are very good so I am confident the content will be of high quality.
    • [mircea]
    1. I blog about NAPALM & generally networking and network automation on my personal space: mirceaulinic.net
    2. packetpushers.net

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA