Contents
Introduction
MySQL is an RDBMS platform that stores data in a tabular format in a normalized manner, whereas, MongoDB is a NoSQL database that stores information in a schema-less manner as documents that are grouped under collections. The representation of data is completely different and hence migration of MySQL table data to MongoDB collections might sound like a mammoth task. Fortunately, Python makes it a breeze with its strong connectivity and data handling capabilities.
In this article, I will walk you through the steps required to migrate your MySQL table data to MongoDB collections using a simple Python script. The scripts were developed using Python 3.9.5 on Windows. However, it should work with any Python 3+ version on any platform.
Step 1: Install the required modules
The first step is to install the modules required to connect to your MySQL and MongoDB database instances. We will use mysql.connector, the official module to connect to MySQL database. For MongoDB, we will use pymongo, which is the recommended module to connect to MongoDB from Python.
Run the following PIP commands to install the required modules, if they’re not already installed.
pip install mysql-connector pip install pymongo
PIP is a package manager for Python packages, or modules.
Step 2: Read data from MySQL table
The first step is to read data from the source MySQL table and prepare it in a format that can be used to load the data into the target MongoDB database. MongoDB is a NoSQL database that stores data as JSON documents, It is therefore a good idea to generate the source data in JSON format. Fortunately, Python has strong data handling capabilities which makes it easy to convert the data to JSON format.
import mysql.connector mysqldb = mysql.connector.connect( host="localhost", database="employees", user="root", password="" ) mycursor = mysqldb.cursor(dictionary=True) mycursor.execute("SELECT * from categories;") myresult = mycursor.fetchall() print(myresult)
When the script completes without any errors, you should see an output like:
[ { "id":4, "name":"Medicine", "description":"<p>Medicine<br></p>", "created_at":"", "updated_at":"" }, { "id":6, "name":"Food", "description":"<p>Food</p>", "created_at":"", "updated_at":"" }, { "id":8, "name":"Groceries", "description":"<p>Groceries<br></p>", "created_at":"", "updated_at":"" }, { "id":9, "name":"Cakes & Bakes", "description":"<p>Cakes & Bakes<br></p>", "created_at":d"", "updated_at":"" } ]
Note that the output is a JSON array, since we passed in the dictionary=True
argument to the cursor. Otherwise, the results would be in a list format. We now have the source data in JSON format, ready to be migrated to a MongoDB collection.
Step 3: Write to MongoDB collections
Once you have the source data in JSON format, the next step is to insert the data into a MongoDB collection. A collection is a set of documents and is the NoSQL equivalent of a table (or relation) in an RDBMS. We do that by calling the insert_many()
method of the collection class, which returns the list of object ids of the inserted documents. Note that this method will throw an exception when an empty list is passed on as the argument, and hence the length check before the method call.
import pymongo mongodb_host = "mongodb://localhost:27017/" mongodb_dbname = "mymongodb" myclient = pymongo.MongoClient(mongodb_host) mydb = myclient[mongodb_dbname] mycol = mydb["categories"] if len(myresult) > 0: x = mycol.insert_many(myresult) #myresult comes from mysql cursor print(len(x.inserted_ids))
After this step, you can check your MongoDB instance to verify that the database and collection have been created and the documents inserted. Note that MongoDB is schema-less, which means that you don’t have to define the schema to insert documents, the schema is inferred on the fly and created automatically. MongoDB also creates the database and collection referenced in the code, if they do not already exist.
Step 4: Putting things together
Here is the complete script to read a table from MySQL and insert it into a collection in MongoDB.
import mysql.connector import pymongo delete_existing_documents = True mysql_host="localhost" mysql_database="mydatabase" mysql_schema = "myschema" mysql_user="myuser" mysql_password="********" mongodb_host = "mongodb://localhost:27017/" mongodb_dbname = "mymongodb" mysqldb = mysql.connector.connect( host=mysql_host, database=mysql_database, user=mysql_user, password=mysql_password ) mycursor = mysqldb.cursor(dictionary=True) mycursor.execute("SELECT * from categories;") myresult = mycursor.fetchall() myclient = pymongo.MongoClient(mongodb_host) mydb = myclient[mongodb_dbname] mycol = mydb["categories"] if len(myresult) > 0: x = mycol.insert_many(myresult) #myresult comes from mysql cursor print(len(x.inserted_ids))
Step 5: Enhance the script to load all tables in a MySQL Schema
The script reads a table from MySQL and load the results in a MongoDB collection. Now, the next step is to iterate through the list of all tables in the source database and load the results in a new MySQL collection. We can do this by querying the information_schema.tables
metadata table which gives us the list of tables in a given schema. We can then iterate through the result and call the above script to migrate the data of each table.
#Iterate through the list of tables in the schema table_list_cursor = mysqldb.cursor() table_list_cursor.execute("SELECT table_name FROM information_schema.tables WHERE table_schema = %s ORDER BY table_name;", (mysql_schema,)) tables = table_list_cursor.fetchall() for table in tables: #Execute the migration script for 'table'
You can do this by abstracting the migration logic into a function.
#Function migrate_table def migrate_table(db, col_name): mycursor = db.cursor(dictionary=True) mycursor.execute("SELECT * FROM " + col_name + ";") myresult = mycursor.fetchall() mycol = mydb[col_name] if delete_existing_documents: #delete all documents in the collection mycol.delete_many({}) #insert the documents if len(myresult) > 0: x = mycol.insert_many(myresult) return len(x.inserted_ids) else: return 0
Step 6: Output script progress and make it readable
The progress of the script is communicated through the use of print
statements. Use color coding to make the output easily readable. For example, you can print success statements in green and failed statements in red.
class bcolors: HEADER = '\033[95m' OKBLUE = '\033[94m' OKCYAN = '\033[96m' OKGREEN = '\033[92m' WARNING = '\033[93m' FAIL = '\033[91m' ENDC = '\033[0m' BOLD = '\033[1m' UNDERLINE = '\033[4m' print(f"{bcolors.HEADER}This is a header{bcolors.ENDC}") print(f"{bcolors.OKBLUE}This prints in blue{bcolors.ENDC}") print(f"{bcolors.OKGREEN}This message is green{bcolors.ENDC}")
The Final Outcome
The Source MySQL Database
The Target MongoDB Database after the migration
The Python Script and output in VSCode
The Complete Script
# Author: Shameel Ahmed (https://github.com/zshameel) # Date Created: 17-Jun-2021 # Date Last Modified: 25-Jun-20201 # Python version: 3.4+ import mysql.connector import pymongo import datetime import enum class bcolors: HEADER = '\033[95m' OKBLUE = '\033[94m' OKCYAN = '\033[96m' OKGREEN = '\033[92m' WARNING = '\033[93m' FAIL = '\033[91m' ENDC = '\033[0m' BOLD = '\033[1m' UNDERLINE = '\033[4m' class MsgType(enum.Enum): HEADER = 1 OKBLUE = 2 OKCYAN = 3 OKGREEN = 4 WARNING = 5 FAIL = 6 ENDC = 7 BOLD = 8 UNDERLINE = 9 #Pretty Print Function def prettyprint(msg_text, msg_type): if msg_type == MsgType.HEADER: print(f"{bcolors.HEADER}{msg_text}{bcolors.ENDC}") elif msg_type == MsgType.OKBLUE: print(f"{bcolors.OKBLUE}{msg_text}{bcolors.ENDC}") elif msg_type == MsgType.OKCYAN: print(f"{bcolors.OKCYAN}{msg_text}{bcolors.ENDC}") elif msg_type == MsgType.OKGREEN: print(f"{bcolors.OKGREEN}{msg_text}{bcolors.ENDC}") elif msg_type == MsgType.WARNING: print(f"{bcolors.WARNING}{msg_text}{bcolors.ENDC}") elif msg_type == MsgType.FAIL: print(f"{bcolors.FAIL}{msg_text}{bcolors.ENDC}") elif msg_type == MsgType.BOLD: print(f"{bcolors.BOLD}{msg_text}{bcolors.ENDC}") elif msg_type == MsgType.UNDERLINE: print(f"{bcolors.UNDERLINE}{msg_text}{bcolors.ENDC}") #Function migrate_table def migrate_table(db, table_name): #TODO: Sanitize table name to conform to MongoDB Collection naming restrictions #For example, the $ sign is allowed in MySQL table names but not in MongoDB Collection names mycursor = db.cursor(dictionary=True) mycursor.execute("SELECT * FROM " + table_name + ";") myresult = mycursor.fetchall() mycol = mydb[table_name] if delete_existing_documents: #delete all documents in the collection mycol.delete_many({}) #insert the documents if len(myresult) > 0: x = mycol.insert_many(myresult) return len(x.inserted_ids) else: return 0 begin_time = datetime.datetime.now() abort = False prettyprint(f"Script started at: {begin_time}", MsgType.HEADER) delete_existing_documents = False; mysql_host="localhost" mysql_database="mydatabase" mysql_schema = "myschhema" mysql_user="root" mysql_password="" mongodb_host = "mongodb://localhost:27017/" mongodb_dbname = "mymongodb" if (delete_existing_documents): confirm_delete = input("Delete existing documents from collections (y)es/(n)o/(a)bort?") if confirm_delete.lower() == "a": abort = True elif confirm_delete.lower() == "n": delete_existing_documents = False else: #Confirm again confirm_delete = input("Are you sure (y)es/(n)?") if confirm_delete.lower() == "y": delete_existing_documents = True else: abort = True if abort: prettyprint("Script aborted by user", MsgType.FAIL) else: if (delete_existing_documents): prettyprint("Existing documents will be deleted from collections", MsgType.FAIL) else: prettyprint("Existing documents will not be deleted from collections", MsgType.OKGREEN) #MySQL connection prettyprint("Connecting to MySQL server...", MsgType.HEADER) mysqldb = mysql.connector.connect( host=mysql_host, database=mysql_database, user=mysql_user, password=mysql_password ) prettyprint("Connection to MySQL Server succeeded.", MsgType.OKGREEN) #MongoDB connection prettyprint("Connecting to MongoDB server...", MsgType.HEADER) myclient = pymongo.MongoClient(mongodb_host) mydb = myclient[mongodb_dbname] prettyprint("Connection to MongoDB Server succeeded.", MsgType.OKGREEN) #Start migration prettyprint("Migration started...", MsgType.HEADER) dblist = myclient.list_database_names() if mongodb_dbname in dblist: prettyprint("The database exists.", MsgType.OKBLUE) else: prettyprint("The database does not exist, it is being created.", MsgType.WARNING) #Iterate through the list of tables in the schema table_list_cursor = mysqldb.cursor() table_list_cursor.execute("SELECT table_name FROM information_schema.tables WHERE table_schema = %s ORDER BY table_name LIMIT 15;", (mysql_schema,)) tables = table_list_cursor.fetchall() total_count = len(tables) success_count = 0 fail_count = 0 for table in tables: try: prettyprint(f"Processing table: {table[0]}...", MsgType.OKCYAN) inserted_count = migrate_table(mysqldb, table[0]) success_count += 1 prettyprint(f"Processing table: {table[0]} completed. {inserted_count} documents inserted.", MsgType.OKGREEN) except Exception as e: fail_count += 1 prettyprint(f"{e}", MsgType.FAIL) prettyprint("Migration completed.", MsgType.HEADER) prettyprint(f"{success_count} of {total_count} tables migrated successfully.", MsgType.OKGREEN) if fail_count > 0: prettyprint(f"Migration of {fail_count} tables failed. See errors above.", MsgType.FAIL) end_time = datetime.datetime.now() prettyprint(f"Script completed at: {end_time}", MsgType.HEADER) prettyprint(f"Total execution time: {end_time-begin_time}", MsgType.HEADER)
Caveats
Database size and performance
This script serves well for small to medium size MySQL databases with a few hundred tables, each having a few thousand rows. Performance may suffer for large databases with millions of rows. Please try out with limited rows using the LIMIT keyword on the table list query and on the actual table select query before starting the actual migration.
MongoDB Collection Naming restrictions
MongoDB has some restrictions for naming collections. For example, it does not allow the $ sign in collection names. The script does not sanitize the table names in MySQL to conform to these restrictions.
Download
Download the entire script from GitHub here:
https://github.com/zshameel/MySQL2MongoDB
You can also read this article in DZone
Good work Shameel ! Keep going ! This might really helpful for the people who are moving out of RDBMS.
[…] Migrate MySQL table data to MongoDB collections using Python Learn Python with ChatGPT […]