Migrate MySQL table data to MongoDB collections using Python

Shameel Ahmed · MongoDB, MySQL, Python, Tools
June 18, 2021 (updated June 25, 2021)

Table of Contents

  • Introduction
  • Step 1: Install the required modules
  • Step 2: Read data from MySQL table
  • Step 3: Write to MongoDB collections
  • Step 4: Putting things together
  • Step 5: Enhance the script to load all tables in a MySQL Schema
  • Step 6: Output script progress and make it readable
  • The Final Outcome
    • The Source MySQL Database
    • The Target MongoDB Database after the migration
    • The Python Script and output in VSCode
  • The Complete Script
  • Caveats
    • Database size and performance
    • MongoDB Collection Naming restrictions
  • Download
  • Share

Introduction

MySQL is an RDBMS platform that stores data in a tabular format in a normalized manner, whereas MongoDB is a NoSQL database that stores information in a schema-less manner as documents grouped under collections. The two represent data completely differently, so migrating MySQL table data to MongoDB collections might sound like a mammoth task. Fortunately, Python makes it a breeze with its strong connectivity and data handling capabilities.

In this article, I will walk you through the steps required to migrate your MySQL table data to MongoDB collections using a simple Python script. The script was developed using Python 3.9.5 on Windows, but it should work on any platform with any recent Python 3 release (3.6 or later, since it uses f-strings).

Step 1: Install the required modules

The first step is to install the modules required to connect to your MySQL and MongoDB database instances. We will use mysql.connector, the official module for connecting to MySQL databases from Python. For MongoDB, we will use pymongo, the recommended Python driver for MongoDB.

Run the following PIP commands to install the required modules, if they’re not already installed.

pip install mysql-connector
pip install pymongo

PIP is a package manager for Python packages, or modules.
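
To confirm the installs, you can import both modules and print their versions (a quick sanity check, not part of the original script; the exact version attributes may vary slightly between driver releases):

# Quick sanity check: import both drivers and print their versions
import mysql.connector
import pymongo

print("mysql-connector version:", mysql.connector.__version__)
print("pymongo version:", pymongo.version)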

Step 2: Read data from MySQL table

The next step is to read data from the source MySQL table and prepare it in a format that can be loaded into the target MongoDB database. MongoDB is a NoSQL database that stores data as JSON-like documents, so it is a good idea to shape the source data accordingly. Fortunately, Python's strong data handling capabilities make this conversion easy.

import mysql.connector

mysqldb = mysql.connector.connect(
    host="localhost",
    database="employees",
    user="root",
    password=""
)

mycursor = mysqldb.cursor(dictionary=True)
mycursor.execute("SELECT * from categories;")
myresult = mycursor.fetchall()

print(myresult)

When the script completes without any errors, you should see an output like:

[
   {
      "id":4,
      "name":"Medicine",
      "description":"<p>Medicine<br></p>",
      "created_at":"",
      "updated_at":""
   },
   {
      "id":6,
      "name":"Food",
      "description":"<p>Food</p>",
      "created_at":"",
      "updated_at":""
   },
   {
      "id":8,
      "name":"Groceries",
      "description":"<p>Groceries<br></p>",
      "created_at":"",
      "updated_at":""
   },
   {
      "id":9,
      "name":"Cakes & Bakes",
      "description":"<p>Cakes &amp; Bakes<br></p>",
      "created_at":d"",
      "updated_at":""
   }
]

Note that each row is returned as a dictionary of column names and values because we passed the dictionary=True argument to the cursor; otherwise, the rows would come back as plain tuples. We now have the source data in a JSON-like format, ready to be migrated to a MongoDB collection.
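
One thing to watch for: pymongo can store most of the Python types the MySQL driver returns (strings, integers, datetime values), but it cannot encode decimal.Decimal, which is what MySQL DECIMAL columns come back as. If your tables have such columns, convert those values before inserting. A minimal sketch, assuming you want to store them as BSON Decimal128 (the helper name is mine, not from the original script):

from decimal import Decimal
from bson.decimal128 import Decimal128

def convert_row(row):
    # Replace decimal.Decimal values with BSON Decimal128 so pymongo can encode them
    return {key: Decimal128(value) if isinstance(value, Decimal) else value
            for key, value in row.items()}

myresult = [convert_row(row) for row in myresult]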

Step 3: Write to MongoDB collections

Once you have the source data in JSON format, the next step is to insert it into a MongoDB collection. A collection is a set of documents and is the NoSQL equivalent of a table (or relation) in an RDBMS. We do that by calling the insert_many() method of the collection, which returns a result object containing the ObjectIds of the inserted documents. Note that this method throws an exception when an empty list is passed as its argument, hence the length check before the call.

import pymongo

mongodb_host = "mongodb://localhost:27017/"
mongodb_dbname = "mymongodb"

myclient = pymongo.MongoClient(mongodb_host)
mydb = myclient[mongodb_dbname]
mycol = mydb["categories"]

if len(myresult) > 0:
        x = mycol.insert_many(myresult) #myresult comes from mysql cursor
        print(len(x.inserted_ids))

After this step, you can check your MongoDB instance to verify that the database and collection have been created and the documents inserted. Note that MongoDB is schema-less, which means you don't have to define a schema before inserting documents; the schema is inferred on the fly and created automatically. MongoDB also creates the database and collection referenced in the code if they do not already exist.
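
You can also verify the result from the same script by counting the documents and fetching one back (a quick check, not part of the original script):

# Verify the migration by counting the inserted documents and peeking at one
print(mycol.count_documents({}))
print(mycol.find_one())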

Step 4: Putting things together

Here is the complete script to read a table from MySQL and insert it into a collection in MongoDB.

import mysql.connector
import pymongo

delete_existing_documents = True
mysql_host="localhost"
mysql_database="mydatabase"
mysql_schema = "myschema"
mysql_user="myuser"
mysql_password="********"

mongodb_host = "mongodb://localhost:27017/"
mongodb_dbname = "mymongodb"

mysqldb = mysql.connector.connect(
    host=mysql_host,
    database=mysql_database,
    user=mysql_user,
    password=mysql_password
)

mycursor = mysqldb.cursor(dictionary=True)
mycursor.execute("SELECT * from categories;")
myresult = mycursor.fetchall()

myclient = pymongo.MongoClient(mongodb_host)
mydb = myclient[mongodb_dbname]
mycol = mydb["categories"]

if len(myresult) > 0:
        x = mycol.insert_many(myresult) #myresult comes from mysql cursor
        print(len(x.inserted_ids))

Step 5: Enhance the script to load all tables in a MySQL Schema

The script above reads a single table from MySQL and loads the results into a MongoDB collection. The next step is to iterate through the list of all tables in the source database and load each one into its own MongoDB collection. We can do this by querying the information_schema.tables metadata table, which gives us the list of tables in a given schema. We can then iterate through the result and run the migration for each table.

#Iterate through the list of tables in the schema
table_list_cursor = mysqldb.cursor()
table_list_cursor.execute("SELECT table_name FROM information_schema.tables WHERE table_schema = %s ORDER BY table_name;", (mysql_schema,))
tables = table_list_cursor.fetchall()

for table in tables:
    #Execute the migration script for 'table' (filled in after the function below)
    pass

You can do this by abstracting the migration logic into a function.

#Function migrate_table 
def migrate_table(db, col_name):
    mycursor = db.cursor(dictionary=True)
    mycursor.execute("SELECT * FROM " + col_name + ";")
    myresult = mycursor.fetchall()

    mycol = mydb[col_name]
    
    if delete_existing_documents:
        #delete all documents in the collection
        mycol.delete_many({})

    #insert the documents
    if len(myresult) > 0:
        x = mycol.insert_many(myresult)
        return len(x.inserted_ids)
    else:
        return 0
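
With the function in place, the loop body from the previous snippet simply calls it for each table name returned by information_schema; the complete script below wraps this call in a try/except block:

for table in tables:
    # table is a one-element tuple such as ('categories',)
    inserted_count = migrate_table(mysqldb, table[0])
    print(f"{table[0]}: {inserted_count} documents inserted")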

Step 6: Output script progress and make it readable

The script reports its progress through print statements. Use color coding to make the output easy to read; for example, print success messages in green and failure messages in red.

class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKCYAN = '\033[96m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'

print(f"{bcolors.HEADER}This is a header{bcolors.ENDC}")
print(f"{bcolors.OKBLUE}This prints in blue{bcolors.ENDC}")
print(f"{bcolors.OKGREEN}This message is green{bcolors.ENDC}")

The Final Outcome

The Source MySQL Database

The Target MongoDB Database after the migration

The Python Script and output in VSCode

The Complete Script

# Author: Shameel Ahmed (https://github.com/zshameel)
# Date Created: 17-Jun-2021
# Date Last Modified: 25-Jun-2021
# Python version: 3.6+ (uses f-strings)

import mysql.connector
import pymongo
import datetime
import enum

class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKCYAN = '\033[96m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'

class MsgType(enum.Enum):
    HEADER = 1
    OKBLUE = 2
    OKCYAN = 3
    OKGREEN = 4
    WARNING = 5
    FAIL = 6
    ENDC = 7
    BOLD = 8
    UNDERLINE = 9

#Pretty Print Function
def prettyprint(msg_text, msg_type):
    if msg_type == MsgType.HEADER:
        print(f"{bcolors.HEADER}{msg_text}{bcolors.ENDC}")
    elif msg_type == MsgType.OKBLUE:
        print(f"{bcolors.OKBLUE}{msg_text}{bcolors.ENDC}")
    elif msg_type == MsgType.OKCYAN:
        print(f"{bcolors.OKCYAN}{msg_text}{bcolors.ENDC}")
    elif msg_type == MsgType.OKGREEN:
        print(f"{bcolors.OKGREEN}{msg_text}{bcolors.ENDC}")
    elif msg_type == MsgType.WARNING:
        print(f"{bcolors.WARNING}{msg_text}{bcolors.ENDC}")
    elif msg_type == MsgType.FAIL:
        print(f"{bcolors.FAIL}{msg_text}{bcolors.ENDC}")
    elif msg_type == MsgType.BOLD:
        print(f"{bcolors.BOLD}{msg_text}{bcolors.ENDC}")
    elif msg_type == MsgType.UNDERLINE:
        print(f"{bcolors.UNDERLINE}{msg_text}{bcolors.ENDC}")

#Function migrate_table 
def migrate_table(db, table_name):
    #TODO: Sanitize table name to conform to MongoDB Collection naming restrictions
    #For example, the $ sign is allowed in MySQL table names but not in MongoDB Collection names
    mycursor = db.cursor(dictionary=True)
    mycursor.execute("SELECT * FROM " + table_name + ";")
    myresult = mycursor.fetchall()

    mycol = mydb[table_name]
    
    if delete_existing_documents:
        #delete all documents in the collection
        mycol.delete_many({})

    #insert the documents
    if len(myresult) > 0:
        x = mycol.insert_many(myresult)
        return len(x.inserted_ids)
    else:
        return 0

begin_time = datetime.datetime.now()
abort = False
prettyprint(f"Script started at: {begin_time}", MsgType.HEADER)

delete_existing_documents = False
mysql_host="localhost"
mysql_database="mydatabase"
mysql_schema = "myschema"
mysql_user="root"
mysql_password=""

mongodb_host = "mongodb://localhost:27017/"
mongodb_dbname = "mymongodb"

if (delete_existing_documents):
    confirm_delete = input("Delete existing documents from collections (y)es/(n)o/(a)bort?")
    if confirm_delete.lower() == "a":
        abort = True
    elif confirm_delete.lower() == "n":
        delete_existing_documents = False
    else:
        #Confirm again
        confirm_delete = input("Are you sure (y)es/(n)?")
        if confirm_delete.lower() == "y":
            delete_existing_documents = True
        else:
            abort = True

if abort:
    prettyprint("Script aborted by user", MsgType.FAIL)
else:
    if (delete_existing_documents):
        prettyprint("Existing documents will be deleted from collections", MsgType.FAIL)
    else:
        prettyprint("Existing documents will not be deleted from collections", MsgType.OKGREEN)
        
    #MySQL connection
    prettyprint("Connecting to MySQL server...", MsgType.HEADER)
    mysqldb = mysql.connector.connect(
        host=mysql_host,
        database=mysql_database,
        user=mysql_user,
        password=mysql_password
    )
    prettyprint("Connection to MySQL Server succeeded.", MsgType.OKGREEN)

    #MongoDB connection
    prettyprint("Connecting to MongoDB server...", MsgType.HEADER)
    myclient = pymongo.MongoClient(mongodb_host)
    mydb = myclient[mongodb_dbname]
    prettyprint("Connection to MongoDB Server succeeded.", MsgType.OKGREEN)

    #Start migration
    prettyprint("Migration started...", MsgType.HEADER)

    dblist = myclient.list_database_names()
    if mongodb_dbname in dblist:
        prettyprint("The database exists.", MsgType.OKBLUE)
    else:
        prettyprint("The database does not exist, it is being created.", MsgType.WARNING)

    #Iterate through the list of tables in the schema
    #Note: LIMIT 15 below restricts a trial run to 15 tables; remove it to migrate the entire schema
    table_list_cursor = mysqldb.cursor()
    table_list_cursor.execute("SELECT table_name FROM information_schema.tables WHERE table_schema = %s ORDER BY table_name LIMIT 15;", (mysql_schema,))
    tables = table_list_cursor.fetchall()

    total_count = len(tables)
    success_count = 0
    fail_count = 0

    for table in tables:
        try:
            prettyprint(f"Processing table: {table[0]}...", MsgType.OKCYAN)
            inserted_count = migrate_table(mysqldb, table[0])
            success_count += 1
            prettyprint(f"Processing table: {table[0]} completed. {inserted_count} documents inserted.", MsgType.OKGREEN)
        except Exception as e:
            fail_count += 1
            prettyprint(f"{e}", MsgType.FAIL)

    prettyprint("Migration completed.", MsgType.HEADER)
    prettyprint(f"{success_count} of {total_count} tables migrated successfully.", MsgType.OKGREEN)
    if fail_count > 0:
        prettyprint(f"Migration of {fail_count} tables failed. See errors above.", MsgType.FAIL)
    
end_time = datetime.datetime.now()
prettyprint(f"Script completed at: {end_time}", MsgType.HEADER)
prettyprint(f"Total execution time: {end_time-begin_time}", MsgType.HEADER)

Caveats

Database size and performance

This script works well for small to medium sized MySQL databases with a few hundred tables, each having a few thousand rows. Performance may suffer for large databases with millions of rows. Test with a limited number of rows first, using the LIMIT keyword on both the table list query and the per-table SELECT query, before starting the actual migration.
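
For large tables, you can also avoid loading the entire result set into memory with fetchall() and instead read and insert in batches. A rough sketch of a batched variant of migrate_table, assuming a batch size of 1,000 rows (the batching logic is a suggestion, not part of the original script):

def migrate_table_batched(db, table_name, batch_size=1000):
    mycursor = db.cursor(dictionary=True)
    mycursor.execute("SELECT * FROM " + table_name + ";")
    mycol = mydb[table_name]

    inserted = 0
    while True:
        batch = mycursor.fetchmany(batch_size)  # read the next chunk of rows
        if not batch:
            break
        mycol.insert_many(batch)
        inserted += len(batch)
    return inserted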

MongoDB Collection Naming restrictions

MongoDB has some restrictions for naming collections. For example, it does not allow the $ sign in collection names. The script does not sanitize the table names in MySQL to conform to these restrictions.
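
If your table names violate these restrictions (for example a $ sign, a null character, or a name starting with "system."), you can rename the target collection before creating it. A minimal sanitizer sketch; the replacement rules here are only illustrative:

def sanitize_collection_name(name):
    # Replace characters MongoDB does not allow in collection names
    name = name.replace("$", "_").replace("\0", "_")
    # Collections may not start with the reserved "system." prefix
    if name.startswith("system."):
        name = "sys_" + name[len("system."):]
    return name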

Download

Download the entire script from GitHub here:

https://github.com/zshameel/MySQL2MongoDB

You can also read this article on DZone.


