Python, MySQL and UTF8… A slightly unholy trinity

I spent a good two days trying to resolve a problem with a python based messaging application I’ve been working on.
The app was running as a daemon process, monitoring a database queue and sending any items on it.
The database, table and data are all UTF8 encoded. All was fine until an email that actually utilized the benefits of UTF and contained other characters than the usual English ones – the email that was sent came up as either garbage or a series of question marks. Same queue processed by ColdFusion worked fine. From python – no luck. WTF I said (well, I didn’t actually speak out the acronym, but I did utter the expanded version a few times, making sure I repeat the F word for good measure).

My first route was to tackle the email encoding issues but after a day of sweating it and getting no results, It dawned on me that perhaps I’m bashing my head on the wrong wall. A quick interactive shell session confirmed that – the data coming out from the queries was returning as garbage whenever an extended character was supposed to show. Absolutely nothing would make it display properly. The only half-assed thing that worked is encoding the utf8 data as latin1 (and later in the email mime part to set the encoding to utf8. That just didn’t feel right. Any character that could not be represented in latin-1 would have been either ignore, replaced or thrown a Unicode error.

Many Googles later, a long session at #python on undernet (thanks tommi) and a pinch of perseverance, I had a working solution. The secret ingredient is to add a charset=”utf8″ to your connection parameters, and use_unicode=True. The former will tell mySql that it should work in the utf8 character set. The latter is to tell python its on utf8.

I do recommend using DBUtils for connection pooling. It works a treat. My example below uses that to access the MySQLdb connection but I added an example for a more direct approach as well.

The conclusion – use Postgres... It is really really so much better, that once you give it a go, there’s no turning back.


import MySQLdb
from DBUtils.PooledDB import PooledDB
pool = PooledDB( creator = MySQLdb, mincached = 5, db = DB_NAME, host = DB_HOST, user = DB_USER, passwd= DB_PASS,
charset = "utf8", use_unicode = True)
__db = pool.connection(0)
__cur = __db.cursor()
__cur.execute("SELECT utf8_field FROM utf8_table WHERE this=that")
row = __cur.fetchone()
print row[0]

Or if you don’t really fancy a connection pool:


import MySQLdb
__db = MySQLdb.connect(host=DB_HOST ,user=DB_USER ,passwd=DB_PASS,db=DB_NAME, charset = "utf8", use_unicode = True)
__cur = __db.cursor()
__cur.execute("SELECT utf8_field FROM utf8_table WHERE this=that")
row = __cur.fetchone()
print row[0]


33 Responses to “Python, MySQL and UTF8… A slightly unholy trinity”

Leave a Reply