Synthetic Data Generation Using Mimesis with Nikita Sobolev

April 1st, 2018

32 mins 37 secs

Your Hosts

About this Episode

Summary

Most applications require data to operate on in order to function, but sometimes that data is hard to come by, so why not just make it up? Mimesis is a library for randomly generating data of different types, such as names, addresses, and credit card numbers, so that you can use it for testing, anonymizing real data, or for placeholders. This week Nikita Sobolev discusses how the project got started, the challenges that it has posed, and how you can use it in your applications.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app, you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to scale up. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute.
  • To get worry-free releases, download GoCD, the open source continuous delivery server built by ThoughtWorks. You can use their pipeline modeling and value stream map to build, control, and monitor every step from commit to deployment in one place. Go to podcastinit.com/gocd to learn more about their professional support services and enterprise add-ons.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions, I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com.
  • Your host as usual is Tobias Macey, and today I’m interviewing Nikita Sobolev about Mimesis, a library for quickly generating synthetic data.

Interview

  • Introductions
  • How did you get introduced to Python?
  • What is mimesis and how does it compare to other projects such as faker and factory_boy?
    • What was the motivation for creating it?
  • One of the features that is advertised is the speed of Mimesis. What techniques are used to ensure that the data is generated quickly?
  • What are the built-in mechanisms for generating data?
    • What options do users have for customizing the types of data that can get generated?
  • What are some of the most complicated providers to write and maintain?
  • What are some of the use cases outside of unit or integration tests where Mimesis could be beneficial?
    • How would you use Mimesis to anonymize data from a production environment to be used for testing?
  • What are the most challenging aspects of maintaining the Mimesis project?
  • What are some of the plans that you have for the future of Mimesis?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA