Synthetic Data Generation Using Mimesis with Nikita Sobolev - Episode 155

Summary

Most applications require data to operate on in order to function, but sometimes that data is hard to come by, so why not just make it up? Mimesis is a library for randomly generating data of different types, such as names, addresses, and credit card numbers, so that you can use it for testing, anonymizing real data, or for placeholders. This week Nikita Sobolev discusses how the project got started, the challenges that it has posed, and how you can use it in your applications.

Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? Check out Linode at linode.com/podcastinit or use the code podcastinit2018 and get a $20 credit to try out their fast and reliable Linux virtual servers. They’ve got lightning fast networking and SSD servers with plenty of power and storage to run whatever you want to experiment on.


GoCD is the on-premise open source continuous delivery server created by ThoughtWorks and modeled after the ideas in the Continuous Delivery book by Jez Humble and David Farley.

With GoCD’s comprehensive pipeline modeling, you can model complex workflows for multiple teams with ease. And GoCD’s Value Stream Map lets you track a change from commit to deploy at a glance.

GoCD’s real power is in the visibility it provides over your end-to-end workflow. So you get complete control of and visibility into your deployments, across multiple teams.

Say goodbye to deployment panic and hello to consistent, predictable deliveries.

To learn more about GoCD, visit gocd.org for a free download. Professional Support and enterprise add-ons, including disaster recovery, are available.


Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to scale up. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute.
  • To get worry-free releases, download GoCD, the open source continuous delivery server built by ThoughtWorks. You can use their pipeline modeling and value stream map to build, control, and monitor every step from commit to deployment in one place. Go to podcastinit.com/gocd to learn more about their professional support services and enterprise add-ons.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected]
  • Your host as usual is Tobias Macey, and today I’m interviewing Nikita Sobolev about Mimesis, a library for quickly generating synthetic data.

Interview

  • Introductions
  • How did you get introduced to Python?
  • What is mimesis and how does it compare to other projects such as faker and factory_boy?
    • What was the motivation for creating it?
  • One of the features that is advertised is the speed of Mimesis. What techniques are used to ensure that the data is generated quickly?
  • What are the built in mechanisms for generating data?
    • What options do users have for customizing the types of data that can get generated?
  • What are some of the most complicated providers to write and maintain?
  • What are some of the use cases outside of unit or integration tests where Mimesis could be beneficial?
    • How would you use Mimesis to anonymize data from a production environment to be used for testing?
  • What are the most challenging aspects of maintaining the Mimesis project?
  • What are some of the plans that you have for the future of Mimesis?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA