question about convert

Discussion:

question about convert_unicode

Vasily Sulatskov

2006-04-17 09:47:30 UTC

Hello

I have a question about the convert_unicode engine option.

The documentation says:

convert_unicode=False : if set to True, all String/character based types will
convert Unicode values to raw byte values going into the database, and all
raw byte values to Python Unicode coming out in result sets. This is an
engine-wide method to provide unicode across the board. For unicode
conversion on a column-by-column level, use the Unicode column type
instead.

But when convert_unicode is set to True, it converts unicode objects to byte
strings and leaves plain string objects unchanged, and that can lead to problems.

Here is a simple example:
# -*- coding: cp1251 -*-

import sqlalchemy

db = sqlalchemy.create_engine('sqlite://', echo=True, echo_uow=False,
convert_unicode=True)

# a table to store companies
companies = sqlalchemy.Table('companies', db,
sqlalchemy.Column('company_id', sqlalchemy.Integer, primary_key=True),
sqlalchemy.Column('name', sqlalchemy.String(50)))

class Company(object):
    pass

sqlalchemy.assign_mapper(Company, companies)

companies.create()

# Company(name=u'Some text in cp1251 encoding')
# This line works perfectly: the unicode object is automatically encoded to
# utf8 before going to the database
Company(name=u'Какой-то текст в кодировке cp1251')

# This line still works fine:
# It goes to database as is, i.e. as a string and when decoded
# it is a valid utf8 that can be converted to unicode without
# problems
Company(name='Some text in ascii')

# And this line causes problems:
# It goes to database as is, i.e. as a string, and when read back
# it is not valid utf8 (it is still in cp1251)
Company(name='Какой-то текст в кодировке cp1251')

sqlalchemy.objectstore.commit()

sqlalchemy.objectstore.clear()

c = Company.get(1)
print type(c.name)

c = Company.get(2)
# Now we get something funny. We specified name as a string during
# object creation and get it out of database as Unicode.
print type(c.name)

# And this line will crash the interpreter because sqlalchemy tries to convert
# the name to Unicode as if it were utf8, and it is not. It is still in cp1251
# encoding
c2 = Company.get(3)
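
The failure on the last line can be reproduced without a database. The following is only a minimal sketch, written for Python 3, where `bytes` stands in for the plain string of the example above and `str` for the unicode object. It assumes, as described above, that stored values are decoded as utf8 on the way out; the variable names are illustrative and not part of sqlalchemy.

```python
# Python 3 sketch: bytes plays the role of the Python 2 byte string.
text = 'Какой-то текст в кодировке cp1251'

cp1251_bytes = text.encode('cp1251')  # what a cp1251 source file holds in a plain string
utf8_bytes = text.encode('utf-8')     # what is stored when a unicode object is passed in

# Bytes that really are utf8 decode cleanly when read back.
assert utf8_bytes.decode('utf-8') == text

# Pure ASCII bytes are also valid utf8, which is why the second insert works.
assert b'Some text in ascii'.decode('utf-8') == 'Some text in ascii'

# cp1251 bytes stored as-is are not valid utf8, so decoding them fails.
try:
    cp1251_bytes.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

Here `decode_failed` ends up True, which corresponds to the crash on `Company.get(3)`.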

So is this intended behaviour for sqlalchemy, or is it a bug?

In my opinion that's a bug, and the behaviour should be changed to something
like this:
1. If the object is unicode, convert it to the engine-specified encoding (like
utf8), as happens now.
2. If it's a string, convert it to unicode using another specified
encoding (it should be added to the engine parameters). This encoding specifies
the client-side encoding. It's often handy to have different encodings in the
database and on client machines (at least for people with "alternate languages" :-)
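
The two rules above could be sketched roughly as follows. This is a hypothetical illustration in Python 3 (`str` for unicode, `bytes` for plain strings), not sqlalchemy code: the function names `to_db_bytes` and `from_db_text` and the parameters `db_encoding` and `client_encoding` are made up for the example, and it assumes that a decoded string is then encoded to the database encoding in the same way as a unicode object.

```python
def to_db_bytes(value, db_encoding='utf-8', client_encoding='cp1251'):
    """Hypothetical bind-side conversion following the two proposed rules."""
    if isinstance(value, str):
        # Rule 1: unicode goes to the engine-specified encoding.
        return value.encode(db_encoding)
    if isinstance(value, bytes):
        # Rule 2: a byte string is first decoded with the client-side
        # encoding, then stored in the database encoding (assumption).
        return value.decode(client_encoding).encode(db_encoding)
    # Non-string values are passed through untouched.
    return value


def from_db_text(raw, db_encoding='utf-8'):
    """Hypothetical result-side conversion: stored bytes back to unicode."""
    return raw.decode(db_encoding)


text = 'Какой-то текст в кодировке cp1251'

# Both the unicode object and the cp1251 byte string are stored identically.
stored_from_unicode = to_db_bytes(text)
stored_from_bytes = to_db_bytes(text.encode('cp1251'))
```

With this rule both inputs produce the same stored bytes, so reading either row back gives the same unicode value instead of a decoding error.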

If this is indeed a problem with sqlalchemy and not with my expectations of what
sqlalchemy should be, then I can perhaps make those changes to sqlalchemy.


Michael Bayer

2006-04-17 13:24:42 UTC