Discussion:
How ACID are SQLAlchemy transactions?
Wolfgang Keller
2006-09-18 18:09:48 UTC
Permalink
Hello,

as a lazy Python scripting dilettante who's looking for a persistence
framework for database applications, I was wondering whether SQLAlchemy
transparently ensures that all transactions will always be as ACID as the
underlying database (PostgreSQL in my case) allows. Among other things, whether
it reliably ensures that e.g. the objects in memory and the records in the
database are always consistent in themselves and in sync with each other,
especially with multiple concurrent clients in a client-server application.

Just because I don't want to have to take care of all this "transaction
management" myself, as it is complete rocket science to me...

TIA,

Sincerely,

Wolfgang Keller
--
My email-address is correct.
Do NOT remove ".nospam" to reply.


-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
Michael Bayer
2006-09-19 01:00:18 UTC
Permalink
as far as basic transactional sanity, it's reasonable; we imitate the
constructs used by Hibernate. in a concurrent environment, it's not
possible to ensure that objects in memory are always exactly in sync
with the database without going back to the database to refetch the
data, or without pessimistically locking everything (which is not
recommended). however, the Session/Query objects now support all
the same methods that Hibernate uses to ensure consistency, which
include optimistic concurrency control, via checking the affected
rowcount of operations against an optional versioning column, and
pessimistic concurrency control, via the "lockmode" keyword parameter
recently added to the Query object (which corresponds to the usage of
SELECT..FOR UPDATE and related constructs; this newer feature still
needs documentation, but for most needs optimistic concurrency
control is effective).

If you always use your Session objects within a SessionTransaction,
then you'll have ACID behavior as good as what you get from
working with the database directly.
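In the 0.2-era API this meant `session.create_transaction()` plus explicit `commit()`/`rollback()`; a sketch in today's spelling (class and table names hypothetical), where everything inside the `session.begin()` block is one atomic unit of work:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Item(Base):
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    name = Column(String(40))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# commit on success, rollback on any exception: plain ACID semantics,
# as if issuing BEGIN/COMMIT/ROLLBACK against the database directly
try:
    with Session(engine) as session:
        with session.begin():
            session.add(Item(id=1, name="one"))
            session.add(Item(id=2, name="two"))
            raise RuntimeError("boom")   # forces a rollback of both inserts
except RuntimeError:
    pass

with Session(engine) as session:
    count = session.query(Item).count()
assert count == 0   # neither row survived; the unit of work was atomic
```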

note that SQLAlchemy is not a "framework" and does not "ensure" any
particular programming pattern, and is also a bit more open-ended
than Hibernate. so while it's easier than not to write sane code
using SQLAlchemy, it's not a substitute for knowing the basics of how
transactions work.
Wolfgang Keller
2006-09-19 12:19:44 UTC
Permalink
Hello,
Post by Michael Bayer
as far as basic transactional sanity, its reasonable; we imitate the
constructs used by Hibernate. in a concurrent environment,
I forgot to mention that what I have in mind is not several instances of
SQLAlchemy, but a single instance which would be part of something like an
"application server". Then all client requests would be processed by one and
the same instance of SQLAlchemy, running on the server side.
Post by Michael Bayer
its not possible to insure that objects in memory are always exactly in
sync with the database,
In such an environment, where SQLAlchemy should "know" at runtime which
objects are modified by other clients, it should be possible to automatically
keep everything in sync, no?
Post by Michael Bayer
without going back to the database to refetch the data, or by
pessimistically locking everything (which is not recommended).
Why not recommended? In some cases, consistency is more important than
performance...

My uneducated guess would be that without pessimistic locking, you would get
lots of transaction failures in a concurrent environment. And that there
would be a lot of handwork to clean up afterwards. :-(
Post by Michael Bayer
If you always use your Session objects within a SessionTransaction,
then youll have ACID behavior as good as that which you get from
working with the database directly.
That's fine, thanks. :-)
Post by Michael Bayer
note that SQLAlchemy is not a "framework" and does not "insure" any
particular programming pattern, and is also a bit more open ended
than Hibernate. so while its easier than not to write sane code
using SQLAlchemy, its not a substitute for knowing the basics of how
transactions work.
No problem for me, I love learning how things work in principle.

But, between "knowing something about the theoretical basis" (what's a lock,
what does rollback mean), and being able to actually implement an entire
transaction manager "by hand" "from scratch", there's a H-U-G-E difference
imho. Something like five years of CS studies and at least as much of actual
practical experience I guess.

Sincerely,

Wolfgang Keller
--
My email-address is correct.
Do NOT remove ".nospam" to reply.


Michael Bayer
2006-09-19 14:03:14 UTC
Permalink
Post by Wolfgang Keller
I forgot to mention that what I have in mind are not several
instances of
SQLAlchemy, but a single instance which would be part of something like an
"application server". Then all client requests would be processed by one and
the same instance of SQLAlchemy, running on the server side.
In such an environment, where SQLAlchemy should "know" at runtime which
objects are modified by other clients, it should be possible to automatically
keep everything in sync, no?
right, but your server probably uses multiple threads and/or child
processes, and the SA session is only intended to be used in one
thread at a time, so concurrency issues can still come up, and one
thread/process can change the database while another session is not
exactly in sync with that new data.
Post by Wolfgang Keller
Post by Michael Bayer
without going back to the database to refetch the data, or by
pessimistically locking everything (which is not recommended).
Why not recommended? In some cases, consistency is more important than
performance...
it depends on the needs of the specific application, but pessimistic
locking is usually overkill, complicated, and can lead to deadlocking
conditions. it should be used sparingly and only where it's
demonstrable that it's needed.
Post by Wolfgang Keller
My uneducated guess would be that without pessimistic locking, you would get
lots of transaction failures in a concurrent environment. And that there
would be a lot of handwork to cleanup behind. :-(
not really! although again it depends on the needs of the
application. going with two assumptions that are the overwhelmingly
common case: a. the session is short-lived, meaning it only lasts
within the context of a request, as opposed to staying open with the
same objects for several seconds/minutes; and b. the application
involves users logging in and working on areas of data that are
specific to their login (meaning, user A typically works with rows in
set X, user B typically works with rows in set Y, and the
intersection of X and Y is very small or empty). under both
assumptions, optimistic collisions will be extremely rare. if only
a or b is the case, collisions are still pretty rare as well.

long-running sessions (with corresponding long-running transactions)
are not recommended anyway. Hibernate has this to say on the subject
(http://www.hibernate.org/hib_docs/v3/reference/en/html/transactions.html#transactions-basics-apptx):

"The session-per-request pattern is not the only useful concept you
can use to design units of work. Many business processes require a
whole series of interactions with the user interleaved with database
accesses... <snip> ...A first naive implementation might keep the
Session and database transaction open during user think time, with
locks held in the database to prevent concurrent modification, and to
guarantee isolation and atomicity. This is of course an anti-pattern,
since lock contention would not allow the application to scale with
the number of concurrent users."

check out the discussion on that page for further detail on how SA
seeks to handle it.
Post by Wolfgang Keller
But, between "knowing something about the theoretical
basis" (what's a lock,
what does rollback mean), and being able to actually implement an entire
transaction manager "by hand" "from scratch", there's a H-U-G-E difference
imho. Something like five years of CS studies and at least as much of actual
practical experience I guess.
well, there's a third option besides those, which is to use known
design patterns. While I came up with a lot of ideas for SA on my
own, I didn't have any great new ideas for how to handle sessions, so
SA tries to rely on what is already known. the original unit of work
model I got from a design patterns book (Fowler's); and in version
0.2 I totally changed the API around with regards to Sessions to
mimic the behavior of Hibernate much more closely, since it was
already a widely used pattern that has withstood a lot of scrutiny.





Wolfgang Keller
2006-09-20 13:33:44 UTC
Permalink
Post by Michael Bayer
I forgot to mention that what I have in mind are not several instances of
SQLAlchemy, but a single instance which would be part of something like
an "application server". Then all client requests would be processed by
one and the same instance of SQLAlchemy, running on the server side.
In such an environment, where SQLAlchemy should "know" at runtime which
objects are modified by other clients, it should be possible to
automatically keep everything in sync, no?
right, but your server probably uses multiple threads and/or child
processes, and the SA session is only intended to be used in one
thread at a time, so concurrency issues can still come up, and one
thread/process can change the database while another session is not
exactly in sync with that new data.
For such cases, <duck> Modeling </duck> provides a mechanism that
"broadcasts" changes made by one "editing context" to all others upon
committing. Maybe they could share code with SQLAlchemy?

Personally, I agree with SGI that:

"GUI tools should not mislead the user; they should display the current state
of the system, even when changes to the system originate from outside of the
tools themselves."

(Citation from http://oss.sgi.com/projects/fam/).

So imho the GUI of a database application should always reflect the current
state of the data in the database. Without polling, of course. Not like e.g.
this §$%&@! Windows "Explorer"...

Unfortunately there seems to be no "standard" mechanism to register a client
with the database itself for notification of any updates to records. Even the
PostgreSQL-specific "listen" and "notify" commands require polling to
retrieve the notification events as far as I understand from the
documentation.

I wonder whether PL/Python support in PostgreSQL would allow implementing
something like such an asynchronous notification system...
Post by Michael Bayer
Post by Michael Bayer
without going back to the database to refetch the data, or by
pessimistically locking everything (which is not recommended).
Why not recommended? In some cases, consistency is more important than
performance...
it depends on the needs of the specific application, but pessimistic
locking is usually overkill, complicated, and can lead to deadlocking
conditions.
In case of the "naive" implementation mentioned, e.g. if someone opens a
record for modification and then leaves the dialog box open for the rest of
the day, because (s)he gets distracted and forgets about it...
Post by Michael Bayer
long running sessions (with corresponding long-running transactions)
are not recommended anyway. Hibernate has this to say on the subject
(http://www.hibernate.org/hib_docs/v3/reference/en/html/transactions.html#transactions-basics-apptx):
Thanks for the link.
Post by Michael Bayer
But, between "knowing something about the theoretical basis" (what's a
lock, what does rollback mean), and being able to actually implement an
entire transaction manager "by hand" "from scratch", there's a H-U-G-E
difference imho. Something like five years of CS studies and at least as
much of actual practical experience I guess.
well theres a third option to those methods which is to use known
design patterns.
And a fourth (imho the most "Pythonic") option: don't re-invent your own nuts
and bolts; use well-implemented, tested, and documented modules provided by
third parties. :-)

Sincerely,

Wolfgang Keller
--
My email-address is correct.
Do NOT remove ".nospam" to reply.


Michael Bayer
2006-09-20 15:44:37 UTC
Permalink
Post by Wolfgang Keller
For such cases, <duck> Modeling </duck> provides a mechanism that
"broadcasts" changes made by one "editing context" to all others upon
committing. Maybe they could share code with SQLAlchemy?
if someone wants to write an extension, most easily to
sessioncontext, that just synchronizes data between open sessions,
that's not a big deal. But it really is somewhat of a reinvention of
the database itself. the Session only represents the set of objects
you're working on *right now*. if I wrote a function that loaded a
row from a database, and then I was going to operate on some of those
values, would I want those values to magically change while I was in
the middle of my operation?
Post by Wolfgang Keller
"GUI tools should not mislead the user; they should display the current state
of the system, even when changes to the system originate from
outside of the
tools themselves."
(Citation from http://oss.sgi.com/projects/fam/).
So imho the GUI of a database application should always reflect the current
state of the data in the database. Without polling, of course. Not like e.g.
I would not agree that SA is a GUI, and I wouldn't agree that a GUI
should always reflect the current state of the database. if I open
up a document and I'm editing the document, and I haven't yet pressed
"save", the GUI does not represent the current state of the database.
Post by Wolfgang Keller
Unfortunately there seems to be no "standard" mechanism to register a client
with the database itself for notification of any updates to
records. Even the
PostgreSQL-specific "listen" and "notify" commands require polling to
retrieve the notification events as far as I understand from the
documentation.
because nobody uses stuff like that. really, it's not the end of the
world to have to deal with normal data synchronization issues.
Post by Wolfgang Keller
I wonder whether PL/Python support in PostgreSQL would allow to implement
something like such an asynchronous notification system...
it's a solution in search of a problem. ;) come up with the
application first and show that this is something really
necessary... the reason there aren't too many tools for this sort of
thing is because they are usually not needed!





Wolfgang Keller
2006-09-21 14:51:18 UTC
Permalink
Post by Michael Bayer
Post by Wolfgang Keller
For such cases, <duck> Modeling </duck> provides a mechanism that
"broadcasts" changes made by one "editing context" to all others upon
committing. Maybe they could share code with SQLAlchemy?
if someone wants to write an extension, most easily to
sessioncontext, that just synchronizes data between open sessions,
thats not a big deal. But it really is somewhat of a reinvention of
the database itself. the Session only represents the set of objects
youre working on *right now*. if i wrote a function that loaded a
row from a database, and then i was going to operate on some of those
values, would i want those values to magically change while i was in
the middle of my operation ?
Err, no. However I do think that you should be notified that they have been
changed by some third party and then have the possibility to update your
values if you want to.
Post by Michael Bayer
Post by Wolfgang Keller
"GUI tools should not mislead the user; they should display the current
state of the system, even when changes to the system originate from
outside of the tools themselves."
(Citation from http://oss.sgi.com/projects/fam/).
So imho the GUI of a database application should always reflect the
current state of the data in the database. Without polling, of course.
I would not agree that SA is a GUI,
But it's being used for GUI applications (among others) in my case.
Post by Michael Bayer
and i wouldnt agree that a GUI should always reflect the current state of
the database.
My potential end-users won't agree with you on that. >:->
Post by Michael Bayer
if i open up a document and im editing the document, and i havent yet
pressed "save", the GUI does not represent the current state of the
database.
Yes, and that's bad imho. >:-> The application should notify you as soon as
possible that someone has modified the data "behind your back". Everything
else will lead to unwanted results.
Post by Michael Bayer
Post by Wolfgang Keller
Unfortunately there seems to be no "standard" mechanism to register a
client with the database itself for notification of any updates to
records. Even the PostgreSQL-specific "listen" and "notify" commands
require polling to retrieve the notification events as far as I
understand from the documentation.
I was wrong here. PostgreSQL notifications don't require polling the database
server. It's just the libpq client library itself that requires polling a
function to retrieve the notifications.

But an example for Psycopg2 shows how to solve this: "tap" the socket where
the notifications arrive and poll the function only when something has
arrived.
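For the record, the pattern described here is how psycopg2's documentation handles asynchronous notifications: `select.select()` blocks on the connection's socket, and `conn.poll()` is only called once something has actually arrived. A sketch assuming a modern psycopg2 and a reachable server; the DSN and channel name are hypothetical:

```python
import select

import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=test")  # hypothetical DSN
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

cur = conn.cursor()
cur.execute("LISTEN table_changed;")    # hypothetical channel name

while True:
    # block on the socket; no database polling while nothing happens
    if select.select([conn], [], [], 60) == ([], [], []):
        continue                        # timeout, loop again
    conn.poll()                         # read whatever arrived on the socket
    while conn.notifies:
        notify = conn.notifies.pop(0)
        print("change notification:", notify.pid, notify.channel)
```

The wait happens entirely in the OS; the database is only touched when a NOTIFY has actually been delivered.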
Post by Michael Bayer
because nobody uses stuff like that. really, its not the end of the
world to have to deal with normal data synchronization issues.
No it isn't. But I have to deal with them; I can't ignore them. And I have to
deal with them in the way that my potential end-users need, so polling every
minute or hour is out of the question. And I have to deal with them in an
efficient way, so polling the database every second is out of the question.
Post by Michael Bayer
Post by Wolfgang Keller
I wonder whether PL/Python support in PostgreSQL would allow to implement
something like such an asynchronous notification system...
its a solution in search of a problem. ;)
Well, err, no. >:-> Not in my case. I am precisely searching for a solution
for this kind of problem. Because I know from the end users (I'm closer to
being an end-user myself than to being a developer) that they wouldn't accept
seeing something on screen that may already be wrong at the moment they
see it. And they wouldn't accept either having to press a "refresh" button
every few seconds to see what is going on.

As a sidenote: we _did_ encounter precisely this issue in a past project (I
was on the end-user side in this one), and because the developers were
"stuck" on using "web services" just because it was considered "modern",
this issue (among lots of others) couldn't be solved in a way that would not
have been ridiculous. In the end the project didn't produce anything that was
actually useful for the end-users.

And it was precisely the experience with this project (wasting quite a few
years and millions of EUR) that made me look for possibilities to implement
such applications myself. And with Python and a solid framework it could be
possible even for a non-computer scientist to implement typical database
applications himself.
Post by Michael Bayer
come up with the application first
In my case, just as _one_ example among many others: see a sort of
"dashboard" on screen about the state of the ongoing work in a workshop. Work
orders, resource booking, asset tracking etc. This will basically be the
"desktop background" on the screen of the workshop manager.

Another example would be the use of a shared database for data exchange
between applications. In the environment where I work, most applications are
in fact database applications, so exchanging data between them via a shared
database is natural. But you wouldn't want to poll such a database
permanently for nothing, as data for exchange will often arrive
intermittently, in large batches, with nothing happening in between, sometimes
for hours.
Post by Michael Bayer
and show that this is something really necessary...the reason there arent
too many tools for this sort of thing is because they are usually not
needed !
My primary job is (technical) consulting for end-users, i.e. mainly asking
them questions about their problems and searching for solutions because they
don't have the time to do it themselves. So I think I somewhat do know what
they need. >:->

Sincerely,

Wolfgang Keller
--
My email-address is correct.
Do NOT remove ".nospam" to reply.


Michael Bayer
2006-09-21 16:31:18 UTC
Permalink
Post by Wolfgang Keller
Post by Michael Bayer
if someone wants to write an extension, most easily to
sessioncontext, that just synchronizes data between open sessions,
thats not a big deal. But it really is somewhat of a reinvention of
the database itself. the Session only represents the set of objects
youre working on *right now*. if i wrote a function that loaded a
row from a database, and then i was going to operate on some of those
values, would i want those values to magically change while i was in
the middle of my operation ?
Err, no. However I do think that you should be notified that they have been
changed by some third-party and then have the possibility to update your
values if you want to.
ok, and what sort of mechanism would be used for this notification?
if I'm inside of a python function, would there be some kind of
interrupt mechanism being used, that yanks me out of my current code
into some subroutine to deal with a change? or would it be the job
of the code itself to finish what it's doing, and then before doing
the next thing simply check for changes? all this requires in the
session-based model is to expire() all the objects contained within
(or just clear it out, or get your objects using load(), etc.; the
session has many ways of going back to the database to get the latest
data). the database then acts as the commonality between all open
sessions. this is why building another layer of commonality between
sessions in python is somewhat of a reinvention of the wheel; you can
argue that it's faster or more efficient than going back to the
database, in which case it's an optimization; but I wouldn't optimize
and complicate before determining that the database itself is not
fast enough.
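The "many ways of going back to the database" listed here (`expire()`, clearing the session, refetching) still exist under the same names; a runnable sketch in today's API (the class and column names are hypothetical):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Order(Base):
    __tablename__ = "orders"
    id = Column(Integer, primary_key=True)
    status = Column(String(20))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)
order = Order(id=1, status="open")
session.add(order)
session.commit()

# another thread/process changes the row behind this session's back
with engine.begin() as conn:
    conn.execute(Order.__table__.update().values(status="closed"))

# the database is the commonality between sessions: expire the stale
# object and the next attribute access re-SELECTs the current row
session.expire(order)
assert order.status == "closed"
```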

at this point, I'd like to see what you want your code to look like
for your application to do what it needs.


Hogarty, David A.
2006-09-21 17:38:28 UTC
Permalink
I'm having a little difficulty figuring out how to work with compound primary keys in the following context: I have Projects, Versions, and, for example's sake, 'Things' inside of both of these, as follows:

CREATE TABLE projects (
    id INTEGER NOT NULL,
    name VARCHAR(40),
    PRIMARY KEY (id)
)

CREATE TABLE versions (
    project_id INTEGER NOT NULL,
    id INTEGER NOT NULL,
    date TIMESTAMP,
    comment VARCHAR(300),
    PRIMARY KEY (project_id, id),
    FOREIGN KEY(project_id) REFERENCES projects (id)
)

CREATE TABLE things (
    project_id INTEGER NOT NULL,
    thing_id INTEGER NOT NULL,
    start_version INTEGER NOT NULL,
    end_version INTEGER NOT NULL,
    PRIMARY KEY(thing_id),
    FOREIGN KEY(project_id) REFERENCES projects (id),
    FOREIGN KEY(project_id, start_version) REFERENCES versions (project_id, id),
    FOREIGN KEY(project_id, end_version) REFERENCES versions (project_id, id)
)

For now, I'll start with my first problem: properly sequencing the version ids. I need the version ids to start at 1 and increase by 1 for each project, so for example project 1 might have versions 1-52, project 2 might have 1-25, etc. Versions will not be deleted. My current approach to this is the following MapperExtension:

versions_table = Table('versions', metadata,
    Column('project_id', Integer, ForeignKey('projects.id'), primary_key=True),
    Column('id', Integer, primary_key=True),
    Column('date', DateTime),
    Column('comment', String(300)))

class VersionKeyMapper(MapperExtension):
    def before_insert(self, mapper, connection, instance):
        versions_t = mapper.select_table
        s = select(
            [func.max(versions_t.c.id)+1],
            versions_t.c.project_id==instance.project_id
        )
        r = s.execute()
        new_id = r.fetchone()[0]
        instance.id = new_id or 1

mapper(Version, versions_table, extension=VersionKeyMapper())

However, using this mapping, if I insert multiple versions before flushing the session, as in:

p = Project('test')
p.versions.append(Version('first version'))
session.save(p)
p.versions.append(Version('second version'))
session.flush()

I get an error because the selects in the MapperExtension happen before the inserts, so multiple versions are all assigned the same version number:

[2006-09-21 13:21:24,030] [engine]: BEGIN
[2006-09-21 13:21:24,032] [engine]: INSERT INTO projects (name) VALUES (?)
[2006-09-21 13:21:24,032] [engine]: ['test']
[2006-09-21 13:21:24,036] [engine]: SELECT max(versions.id) + ?
FROM versions
WHERE versions.project_id = ?
[2006-09-21 13:21:24,038] [engine]: [1, 1]
[2006-09-21 13:21:24,046] [engine]: SELECT max(versions.id) + ?
FROM versions
WHERE versions.project_id = ?
[2006-09-21 13:21:24,048] [engine]: [1, 1]
[2006-09-21 13:21:24,051] [engine]: INSERT INTO versions (project_id, id, date, comment) VALUES (?, ?, ?, ?)
[2006-09-21 13:21:24,051] [engine]: [1, 1, '2006-09-21 13:21:24', 'first version']
[2006-09-21 13:21:24,054] [engine]: INSERT INTO versions (project_id, id, date, comment) VALUES (?, ?, ?, ?)
[2006-09-21 13:21:24,055] [engine]: [1, 1, '2006-09-21 13:21:24', 'second version']

Any suggestions?

-Dave Hogarty
Hogarty, David A.
2006-09-21 19:45:06 UTC
Permalink
Looking at the code, the problem seems to arise in sqlalchemy/orm/mapper.py, in the save_obj function.

The basic structure is:

- run before_insert/before_update for all objects
- save all the mapper attributes, one table at a time, for all objects
- run after_insert/after_update for all objects

The problem here is that if there are sequential dependencies between object saves, as there are in the case of versioning, the above model doesn't work. The goal of the above structuring was to "bundle inserts/updates on the same table together...". However, this obviously doesn't work in all cases. There are two options I see:

- Always save complete objects before moving to the next object, reordering the central loop of save_obj to loop first on objects, then on tables
- Have some sort of 'sequential dependency' flag that determines whether to loop on tables for efficiency if false, or on objects for correctness if true

-Dave H

________________________________

From: sqlalchemy-users-***@lists.sourceforge.net on behalf of Hogarty, David A.
Sent: Thu 9/21/2006 1:38 PM
To: Sqlalchemy-***@lists.sourceforge.net
Subject: [Sqlalchemy-users] Compound Keys: determining for create,using compound foreign keys


I'm having a little difficulty figuring out how to work with compound primary keys within the following context: I have Projects, Versions, and for examples sake, 'Things' inside of both of these, as follows:

CREATE TABLE projects (
id INTEGER NOT NULL,
name VARCHAR(40),
PRIMARY KEY (id)
)

CREATE TABLE versions (
project_id INTEGER NOT NULL,
id INTEGER NOT NULL,
date TIMESTAMP,
comment VARCHAR(300),
PRIMARY KEY (project_id, id),
FOREIGN KEY(project_id) REFERENCES projects (id)
)

CREATE TABLE things (
project_id INTEGER NOT NULL,
thing_id INTEGER NOT NULL,
start_version INTEGER NOT NULL,
end_version INTEGER NOT NULL,
PRIMARY KEY(thing_id),
FOREIGN KEY(project_id) REFERENCES projects (id),
FOREIGN KEY(project_id, start_version) REFERENCES versions (project_id, id),
FOREIGN KEY(project_id, end_version) REFERENCES versions (project_id, id)
)

For now, I'll start with my first problem: properly sequencing the version ids. I need the version ids to start at 1 and increase by 1 for each project, so for example project 1 might have versions 1-52, project 2 might have 1-25, etc. Versions will not be deleted. My current approach to this is the following MapperExtension:

versions_table = Table('versions', metadata,
    Column('project_id', Integer, ForeignKey('projects.id'), primary_key=True),
    Column('id', Integer, primary_key=True),
    Column('date', DateTime),
    Column('comment', String(300)))

class VersionKeyMapper(MapperExtension):
    def before_insert(self, mapper, connection, instance):
        versions_t = mapper.select_table
        s = select(
            [func.max(versions_t.c.id) + 1],
            versions_t.c.project_id == instance.project_id
        )
        r = s.execute()
        new_id = r.fetchone()[0]
        instance.id = new_id or 1

mapper(Version, versions_table, extension=VersionKeyMapper())

However, using this mapping, if I insert multiple versions before flushing the session, as in:

p = Project('test')
p.versions.append(Version('first version'))
session.save(p)
p.versions.append(Version('second version'))
session.flush()

I get an error because the selects in the MapperExtension happen before the inserts, so multiple versions are all assigned the same version number:

[2006-09-21 13:21:24,030] [engine]: BEGIN
[2006-09-21 13:21:24,032] [engine]: INSERT INTO projects (name) VALUES (?)
[2006-09-21 13:21:24,032] [engine]: ['test']
[2006-09-21 13:21:24,036] [engine]: SELECT max(versions.id) + ?
FROM versions
WHERE versions.project_id = ?
[2006-09-21 13:21:24,038] [engine]: [1, 1]
[2006-09-21 13:21:24,046] [engine]: SELECT max(versions.id) + ?
FROM versions
WHERE versions.project_id = ?
[2006-09-21 13:21:24,048] [engine]: [1, 1]
[2006-09-21 13:21:24,051] [engine]: INSERT INTO versions (project_id, id, date, comment) VALUES (?, ?, ?, ?)
[2006-09-21 13:21:24,051] [engine]: [1, 1, '2006-09-21 13:21:24', 'first version']
[2006-09-21 13:21:24,054] [engine]: INSERT INTO versions (project_id, id, date, comment) VALUES (?, ?, ?, ?)
[2006-09-21 13:21:24,055] [engine]: [1, 1, '2006-09-21 13:21:24', 'second version']

Any suggestions?

-Dave Hogarty
Michael Bayer
2006-09-21 21:22:43 UTC
Permalink
if the versions are truly just 1-N based, why even call the
database to get the number? if you set up a backreference on
Version to get its Project, you can just do:

class VersionKeyMapper(MapperExtension):
    def before_insert(self, mapper, connection, instance):
        instance.id = instance.project.versions.index(instance) + 1

or you can just modify Project to assign numbers as versions are
appended, that way you have the numbers before you even have to flush():

class Project(object):
    def add_version(self, version):
        version.id = len(self.versions) + 1
        self.versions.append(version)

or if you want the totally clean API, do the "custom list class" thing:

class VersionList(list):
    def append(self, item):
        if getattr(item, 'id', None) is None:
            item.id = len(self) + 1
        list.append(self, item)

class Project(object):
    def __init__(self):
        self.versions = VersionList()
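
[Editor's note: the index-based numbering above can be sketched in plain Python. Project, Version, append_version and assign_id here are bare stand-ins invented for illustration, not the mapped classes or any SQLAlchemy API from the thread.]

```python
class Project(object):
    def __init__(self):
        self.versions = []

    def append_version(self, version):
        # stand-in for the backreference the mapper would maintain
        version.project = self
        self.versions.append(version)

class Version(object):
    def __init__(self, comment):
        self.comment = comment
        self.id = None

def assign_id(instance):
    # the heart of the before_insert idea: 1-based position in the list
    instance.id = instance.project.versions.index(instance) + 1
```

This assigns version numbers from in-memory state alone, so no SELECT max(id) round-trip is needed during the flush.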

i was thinking you could also do it via a column default combined
with a thread local, but i'd have to modify the API a little bit to
get the current Connection in there.

the mapper's save_obj is written the way it is to support batching,
which it doesn't actually do at the moment because batching makes the
"concurrency" check impossible with DBAPI's current behavior for
executemany(), and there's also other stuff regarding defaults that
wouldn't work right now (but could work with a little tweaking). but
if we flip it inside out then there's no chance of ever supporting
that, or we'd have to have two totally different versions of
save_obj...although if you have some insight on how to do it in a
reasonably readable fashion i'm open to that (since i suppose this
sort of thing is going to be needed by others as well).
Michael Bayer
2006-09-22 23:09:30 UTC
Permalink
Post by Hogarty, David A.
I think the best way to handle this would be to have some sort of
option for the mapper or mapper extension that defaults to False
for efficiency reasons, something like sequentiallydependent=True
to indicate that a full commit/update must happen before another
commit/update of the same type of object (no batch processing).
I found a somewhat obvious way to do this...latest SVN up, specify
"batch=False" to your mapper (default is True), and it will
iteratively call save_obj with a single-element list to fully save
one object at a time.
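
[Editor's note: the flush ordering this fix changes can be sketched schematically in plain Python. This is an illustration of the batched vs. unbatched phases, not SQLAlchemy's actual save_obj code.]

```python
def save_batched(objects, log):
    # batch=True (the default): each phase runs for ALL objects before
    # the next phase starts, so every before_insert fires before any
    # INSERT -- which is why the max(id) selects all saw an empty table.
    for obj in objects:
        log.append(('before_insert', obj))
    for obj in objects:
        log.append(('insert', obj))
    for obj in objects:
        log.append(('after_insert', obj))

def save_unbatched(objects, log):
    # batch=False: save_obj is called iteratively with a single-element
    # list, so each object is fully saved before the next one begins.
    for obj in objects:
        save_batched([obj], log)
```

With save_unbatched, the first version's INSERT completes before the second version's before_insert runs, so a max(id)-style extension sees the row it needs.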

Michael Bayer
2006-09-21 16:45:23 UTC
Permalink
Post by Wolfgang Keller
I was wrong here. PostgreSQL notifications don't require polling the database
server. It's just the libpq client library itself that requires polling a
function to retrieve the notifications.
But an example for Psycopg2 shows how to solve this: "tap" the
socket where
the notifications arrive and poll the function only when something has
arrived.
if there is some psycopg2 method of receiving change notifications
from the database, then it would not be too hard to write an
extension to SessionContext that simply tracks all open sessions,
receives the notifications, and sends appropriate expire() calls to
the affected objects contained within those sessions. however this
will mean that if you are in the process of doing some setting/
getting on one of those objects, your operation will need to be
interrupted in some way, or some flag need be set so that the Session
raises a concurrency exception when the flush() method is called.

*or* - if you really really want all concurrent threads to totally
share the same state at all times, you can have them all use the same
Session, if you synchronize operations to that session since its not
a threadsafe object by default.
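
[Editor's note: a rough sketch of the session-tracking idea described above. SessionRegistry, its identity-map lookup, and the (class, primary key) notification payload are all invented for this illustration; none of this is existing SQLAlchemy API.]

```python
class SessionRegistry(object):
    """Hypothetical helper: track open sessions and, when a database
    change notification names a changed row, expire the matching
    in-memory object in every tracked session."""

    def __init__(self):
        self.sessions = []

    def track(self, session):
        # register an open session for notification handling
        self.sessions.append(session)

    def on_notification(self, cls, ident):
        # assumed payload: the class and primary key of the changed row;
        # expire the stale object wherever a session holds it
        for session in self.sessions:
            obj = session.identity_map.get((cls, ident))
            if obj is not None:
                session.expire(obj)
```

A psycopg2 listener thread would drain NOTIFY events and call on_notification; the concurrency caveats above (interrupted get/set operations, flush() raising) still apply.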

Mongoose
2006-09-19 17:05:32 UTC
Permalink
On Mon, 18 Sep 2006 20:09:48 +0200
Post by Wolfgang Keller
Just because I don't want to have to take care of all this "transaction
management" myself, as this is complete rocket-science to me...
I know not of ACID, but... rocket science this is not.

trans = session.create_transaction()
DoStuff(session)
session.flush() # Oh noes we've written to the database!
DoMoreStuff(session)
if AllIsWell(): trans.commit()
else: trans.rollback()

If you've ever used save points in a video game, then you've used transaction management. It's like a little save point you can carry around, which disappears harmlessly when you restore your old save, or when you save your game. The save point lives in the database on the other end of the wire, so sqlalchemy doesn't have to keep all the pending stuff in memory, which is wasteful, plus you can't query it. Instead you write everything to the database, and if something goes wrong you tell the database to roll back to the beginning of your transaction and it'll delete everything you've done since create_transaction().

There's two levels of safety: first sqlalchemy keeps the stuff in memory until you flush(), then the database keeps tabs on the stuff you write to it, until you commit(). Normally sqlalchemy does both levels with every flush(), but if you create a transaction now you have control of when the database rolls back or commits.
