[sqlalchemy] Guidance regarding nested session usage

Discussion:

a***@withplum.com

2017-08-09 16:14:09 UTC

Hey,

I'd like some help regarding nested session usage please.

I'm working on an application that has an API layer but also has a lot of
cron jobs (via Celery) and scripts. I'm trying to design the app in a way
that my "business" logic is contained and re-usable by any of these
interfaces.

The SQLAlchemy session scope is request/task-wide (i.e. requests and tasks
remove the scoped session at the end) but I am doing explicit commits
instead of committing on request end because I sometimes have to deal with
complicated logic like creating/submitting transactions to payment
processors etc.

To start off, I use a context manager, much like the one in the docs, which commits or
rolls back as necessary. I then have a layer of actions, which are
considered "top-level" functions that can do a simple operation (e.g. update
something) or a collection of operations (e.g. create and submit a
transaction). These actions use the context manager above to persist stuff,
and I've opted to keep all session "usage" in these actions alone and
nowhere else in the code. Pretty soon, the need arose to use some of the simpler
actions inside other, bigger actions, which, after reading the docs,
led me to set autocommit=True and use session.begin(subtransactions=True).
Note that I don't want to use savepoints; I just want to be able to use my
actions inside other actions. The docs recommend setting expire_on_commit to
False with autocommit, which I've done, but that led to a couple of
situations where I was operating on out-of-date data, hence I want to set
expire_on_commit back to True.
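For reference, a docs-style "commit or roll back" context manager like the one described above might look roughly like the sketch below. It is duck-typed: `session_factory` is assumed to behave like a SQLAlchemy `sessionmaker()`, and `RecordingSession` is a hypothetical stand-in used only to show which calls happen.

```python
from contextlib import contextmanager

calls = []  # records what the scope did, for demonstration only


@contextmanager
def session_scope(session_factory):
    """Commit if the block succeeds, roll back if it raises, always close.

    `session_factory` is assumed to behave like a SQLAlchemy sessionmaker():
    calling it returns an object with commit(), rollback() and close().
    """
    session = session_factory()
    try:
        yield session
        session.commit()
    except BaseException:
        session.rollback()
        raise
    finally:
        session.close()


class RecordingSession:
    """Hypothetical stand-in for a real Session."""

    def commit(self):
        calls.append("commit")

    def rollback(self):
        calls.append("rollback")

    def close(self):
        calls.append("close")


# A successful "action" commits...
with session_scope(RecordingSession):
    pass

# ...and a failing one rolls back and re-raises.
try:
    with session_scope(RecordingSession):
        raise RuntimeError("payment processor unavailable")
except RuntimeError:
    pass
```

Nesting two such blocks as written would commit twice, once in the inner block and once in the outer, which is the situation this thread is about.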

My questions:

(1) Does my application layout make sense from a SQLAlchemy perspective?
(2) What is the problem with expire_on_commit=True and autocommit=True?
(3) I feel that, even with the context manager, the transaction boundaries
are still blurry because the developer does not know what will actually get
committed in the database. For example, if a previous part of the code
changed something, then called an action that commits the session, the
previous change will get committed as well. I've searched around and found
this: https://github.com/mitsuhiko/flask-sqlalchemy/pull/447 which
basically issues a rollback on entering the context manager to ensure that
only what is within the context manager will get committed. What do you
think of it? I can immediately see a problem where if I query for an object
before passing it to an action, then use the context manager, all the work
done on querying is lost since the object state is expired on rollback.

I'd appreciate any advice/input.

Best,
Alex
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+***@googlegroups.com.
To post to this group, send email to ***@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Jonathan Vanasco

2017-08-09 19:38:34 UTC

A similar question about another anti-pattern was asked and answered recently.
Most of what Mike says there applies to this use case, especially
the locking and integrity issues with long-running transactions:
https://groups.google.com/forum/#!topic/sqlalchemy/W_Rn-EwKvZo
He's written about it elsewhere as well.

Personally, I prefer to use the following approach when dealing with
repeated actions and long-running processes:

1. Long-running, complicated processes have their own table, which
includes at least:
job_id, current_state, timestamp_start, timestamp_last, timestamp_finish
As the job progresses, this table is updated. Having it in the database
allows us to find stuck jobs, etc.
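A minimal sketch of such a job table, using the standard-library `sqlite3` module instead of SQLAlchemy so it stays self-contained. The column names come from the list above; the helpers (`start_job`, `advance_job`, `stuck_jobs`) and the state names are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# isolation_level=None: every statement commits on its own, so in a real
# (file or server) database each progress update is visible to a monitoring
# query right away, even if a later step of the job fails.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    """
    CREATE TABLE job (
        job_id INTEGER PRIMARY KEY,
        current_state TEXT NOT NULL,
        timestamp_start TEXT NOT NULL,
        timestamp_last TEXT NOT NULL,
        timestamp_finish TEXT
    )
    """
)


def _now():
    return datetime.now(timezone.utc).isoformat()


def start_job():
    ts = _now()
    cur = conn.execute(
        "INSERT INTO job (current_state, timestamp_start, timestamp_last)"
        " VALUES ('started', ?, ?)",
        (ts, ts),
    )
    return cur.lastrowid


def advance_job(job_id, state, finished=False):
    ts = _now()
    conn.execute(
        "UPDATE job SET current_state = ?, timestamp_last = ?, timestamp_finish = ?"
        " WHERE job_id = ?",
        (state, ts, ts if finished else None, job_id),
    )


def stuck_jobs(cutoff_iso):
    """Unfinished jobs whose last progress is older than `cutoff_iso`."""
    rows = conn.execute(
        "SELECT job_id FROM job WHERE timestamp_finish IS NULL"
        " AND timestamp_last < ? ORDER BY job_id",
        (cutoff_iso,),
    )
    return [job_id for (job_id,) in rows]


done = start_job()
advance_job(done, "charged")
advance_job(done, "finished", finished=True)
stalled = start_job()  # never progresses
```

A periodic report would call `stuck_jobs()` with a cutoff such as "one hour ago" to find jobs that started but stopped making progress.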

2. Long processes span multiple transactions. Some transactions are nested
with savepoints.
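A rough illustration of a savepoint nested inside a longer transaction, again with `sqlite3` standing in for SQLAlchemy (where the corresponding call is `Session.begin_nested()`). The table and step names are made up, and the failure is simulated.

```python
import sqlite3

# isolation_level=None turns off sqlite3's implicit BEGINs so the
# transaction and savepoint statements below run exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE step (name TEXT)")

conn.execute("BEGIN")  # outer transaction for this part of the job
conn.execute("INSERT INTO step VALUES ('reserve stock')")

conn.execute("SAVEPOINT apply_coupon")  # nested scope inside the outer transaction
try:
    conn.execute("INSERT INTO step VALUES ('apply coupon')")
    raise ValueError("coupon expired")  # simulated failure in the nested part
except ValueError:
    conn.execute("ROLLBACK TO SAVEPOINT apply_coupon")  # undo only the nested work
finally:
    conn.execute("RELEASE SAVEPOINT apply_coupon")  # pop the savepoint either way

conn.execute("INSERT INTO step VALUES ('charge card')")
conn.execute("COMMIT")

steps = [name for (name,) in conn.execute("SELECT name FROM step ORDER BY rowid")]
```

Only the coupon step is undone; the work before and after the savepoint is committed together.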

3. Re-usable functions accept the session as an argument and, as a rule,
never commit. When they must commit (it happens), they require a kwarg to be set
and raise an error if it's missing. This way the logic is absolutely clear
in the calling function (otherwise, maintenance and code reviews are a
headache).
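A minimal sketch of rule 3, assuming nothing beyond a session-like object with `add()` and `commit()`. The functions `add_note` and `submit_payment`, the `allow_commit` flag, and the `ListSession` stand-in are all hypothetical names chosen for illustration.

```python
class CommitNotAllowed(Exception):
    """Raised when a committing function is called without explicit opt-in."""


def add_note(session, order_id, text):
    """Reusable helper: works inside the caller's transaction, never commits."""
    session.add(("note", order_id, text))


def submit_payment(session, order_id, *, allow_commit=False):
    """A function that must commit, so the caller has to say so explicitly."""
    if not allow_commit:
        raise CommitNotAllowed("submit_payment() commits; pass allow_commit=True")
    session.add(("payment", order_id))
    session.commit()


class ListSession:
    """Hypothetical stand-in for a Session: tracks pending vs committed items."""

    def __init__(self):
        self.pending, self.committed = [], []

    def add(self, item):
        self.pending.append(item)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending.clear()


session = ListSession()
add_note(session, 1, "customer called")
try:
    submit_payment(session, 1)  # flag missing: refused, nothing committed
except CommitNotAllowed:
    pass
submit_payment(session, 1, allow_commit=True)
```

The earlier `add_note()` work is still committed along with the payment; the flag does not change that, it only makes the commit obvious at the call site and in code review.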

We often use secondary sessions with autocommit to track 3rd-party API
logging, etc., too.

For example, a payment processing task on an ecommerce project I worked on
once did the following:

1. session autocommit: log that we're about to charge $x, returning an id
2. API integration: charge $x
3. session autocommit: log that we successfully charged $x to that id (or failed)
4. session transaction: note the charge, continue with the task
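A sketch of that flow using two `sqlite3` connections to the same database file: `log_conn` in autocommit mode plays the "secondary session", and `main_conn` is the ordinary transaction. The tables, `gateway_charge` and `pay` are hypothetical. SQLite allows only one writer at a time, so the sketch keeps the log writes outside the window in which the main transaction holds its write lock.

```python
import os
import sqlite3
import tempfile

SCHEMA = """
CREATE TABLE charge_log (
    id INTEGER PRIMARY KEY,
    amount_cents INTEGER NOT NULL,
    status TEXT NOT NULL          -- 'attempting', 'succeeded' or 'failed'
);
CREATE TABLE order_charge (order_id INTEGER, charge_log_id INTEGER);
"""


def gateway_charge(amount_cents):
    """Hypothetical stand-in for the payment processor's API."""
    return True


def pay(log_conn, main_conn, order_id, amount_cents, crash=False):
    # 1. Separate autocommit connection: log the attempt and get its id.
    log_id = log_conn.execute(
        "INSERT INTO charge_log (amount_cents, status) VALUES (?, 'attempting')",
        (amount_cents,),
    ).lastrowid
    # 2. External API call.
    ok = gateway_charge(amount_cents)
    # 3. Autocommit again: log the outcome.
    log_conn.execute(
        "UPDATE charge_log SET status = ? WHERE id = ?",
        ("succeeded" if ok else "failed", log_id),
    )
    # 4. Ordinary transaction for the rest of the task; the sqlite3
    #    connection context manager commits on success, rolls back on error.
    with main_conn:
        main_conn.execute("INSERT INTO order_charge VALUES (?, ?)", (order_id, log_id))
        if crash:
            raise RuntimeError("task died after charging")
    return log_id


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shop.db")
        log_conn = sqlite3.connect(path, isolation_level=None)  # autocommit
        main_conn = sqlite3.connect(path)  # implicit transactions
        try:
            log_conn.executescript(SCHEMA)
            try:
                pay(log_conn, main_conn, order_id=1, amount_cents=4200, crash=True)
            except RuntimeError:
                pass
            log_rows = log_conn.execute("SELECT status FROM charge_log").fetchall()
            noted = main_conn.execute("SELECT COUNT(*) FROM order_charge").fetchall()[0][0]
        finally:
            log_conn.close()
            main_conn.close()
    return log_rows, noted


print(run_demo())
```

When the task dies in step 4, the main transaction is rolled back, but the separately committed log still records a successful charge with no matching order row.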

Automated reports then check for charges that were not completed and not
marked as an acceptable fail. Those items are errors that need to be
reconciled with the payment processor's logs.
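A sketch of such a report query, assuming a hypothetical `charge_log` table with a `status` column and an `acceptable_fail` flag, and interpreting "not completed" as "status never reached succeeded". The sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# status is 'attempting', 'succeeded' or 'failed';
# acceptable_fail is set to 1 once someone has reviewed a failure.
conn.execute(
    """
    CREATE TABLE charge_log (
        id INTEGER PRIMARY KEY,
        amount_cents INTEGER NOT NULL,
        status TEXT NOT NULL,
        acceptable_fail INTEGER NOT NULL DEFAULT 0
    )
    """
)
conn.executemany(
    "INSERT INTO charge_log (amount_cents, status, acceptable_fail) VALUES (?, ?, ?)",
    [
        (1000, "succeeded", 0),   # normal charge
        (2500, "failed", 1),      # declined card, already marked acceptable
        (4200, "attempting", 0),  # died between "about to charge" and the result
        (900, "failed", 0),       # failure nobody has reviewed yet
    ],
)
conn.commit()


def charges_to_reconcile(conn):
    """Charges that never completed and are not marked as an acceptable failure."""
    rows = conn.execute(
        "SELECT id FROM charge_log"
        " WHERE status != 'succeeded' AND acceptable_fail = 0"
        " ORDER BY id"
    )
    return [charge_id for (charge_id,) in rows]


print(charges_to_reconcile(conn))
```

Rows 3 and 4 are the ones that would be checked against the processor's own records.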

Other people here enforce much better standards and practices than I
do. We have some Celery jobs that use 4-5 transactions when dealing with
external APIs.

Mike Bayer

2017-08-09 20:26:24 UTC

it's getting close to the point where I will have to make some major additions
to the docs, as well as probably put some new patterns up. The story
of "autocommit" mode, as well as "subtransactions", is based on a
SQLAlchemy that was designed both when "always-transactions" was not
an assumption (it is now) and during Python 2.3, when not only
had context managers not been invented yet, we didn't even have
decorators. The problem that "subtransactions" were meant to solve
is much better solved using normal Python today, that is, context
managers. If you use the context manager pattern, use it with a
scoped_session and then have your context manager check whether there's
already a session present (i.e. that your context managers are
nested), and then simply do nothing in the block if this is the case.

Post by a***@withplum.com
(2) What is the problem with expire_on_commit=True and autocommit=True?

so here I need to go look at the docs and try to understand why they say this:

Post by a***@withplum.com
If used, it should always be combined with the usage of Session.begin() and Session.commit(), to **ensure a transaction demarcation.**
Executing queries **outside of a demarcated transaction is a legacy mode of usage**, and can in some cases lead to concurrent connection checkouts.
"***In the absence of a demarcated transaction***, the Session cannot make appropriate decisions as to when autoflush should occur nor when auto-expiration should occur, so these features should be disabled with autoflush=False, expire_on_commit=False."

that is referring to the original, original, super old, never-use-it
version of autocommit, where you aren't calling begin(), which looked
like this (we are talking SQLAlchemy 0.2):

sess = create_session()

sess.add(object)
sess.flush() # <--- commits a transaction

result = sess.query(SomeClass).all()

sess.add(some_other_object)
sess.flush() # <--- same

above, it would be very inconvenient if all the objects in memory were
cleared out every time you called flush.

Here's the problem with documentation. If you go through the effort
to make it very specific and very accurate, you end up with too many
words, and the reader will often not understand the basic point being
made. Which is above, **if you are not using begin() and commit(),
the way we just said you should, then you should turn off
autocommit=True**. That section needs another rewrite but then again
the entire concept of "subtransactions" also needs to be discouraged
as these are all obsolete patterns.

Post by a***@withplum.com
(3) I feel that, even with the context manager, the transaction boundaries
are still blurry because the developer does not know what will actually get
committed in the database. For example, if a previous part of the code
changed something, then called an action that commits the session, the
previous change will get committed as well.

So this whole part sounds wrong. If you want your database function
to occur in the context of a larger transaction, then by definition,
there may be other pending data present. Whether that data is pending
in the session of your Python application, or pending in the MVCC
buffer of your database, doesn't matter from a transaction-level point
of view. It might matter for performance or debugging reasons, but in
that case, you'd want to just emit flush() at the top of the block, so
that those pending changes are on the server side of the transaction
rather than the client side, but all of it is still pending as far as
being permanent to disk and visible to the rest of the world.
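To make the "still pending" point concrete, a small `sqlite3` sketch: two connections to one database file stand in for "this transaction" and "the rest of the world". An INSERT inside an open transaction (roughly what `flush()` does) is visible to its own connection but not to the other one until COMMIT. The table and function names are made up.

```python
import os
import sqlite3
import tempfile


def visibility_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: no implicit BEGINs, we issue BEGIN/COMMIT ourselves.
        ours = sqlite3.connect(path, isolation_level=None)
        world = sqlite3.connect(path, isolation_level=None)
        try:
            ours.execute("CREATE TABLE item (name TEXT)")
            ours.execute("BEGIN")
            # Roughly what session.flush() does: the INSERT reaches the
            # database but stays inside the still-open transaction.
            ours.execute("INSERT INTO item VALUES ('pending')")
            seen_by_us = ours.execute("SELECT COUNT(*) FROM item").fetchall()[0][0]
            seen_by_world = world.execute("SELECT COUNT(*) FROM item").fetchall()[0][0]
            ours.execute("COMMIT")
            seen_after_commit = world.execute("SELECT COUNT(*) FROM item").fetchall()[0][0]
        finally:
            ours.close()
            world.close()
    return seen_by_us, seen_by_world, seen_after_commit


counts = visibility_demo()
```

Whether the pending row sits in the Python session or has already been flushed, nobody else sees it until the transaction commits.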

if you have a function that wants to persist data out to the database
and it does not want to persist data that is already pending in the
ongoing transaction, it should use a separate transaction. This is
a common use case and it is what you do if you are for example putting
rows into a job queue type of table, or sending out messages that are
going to show up in some log or console output somewhere.

Post by a***@withplum.com
I've searched around and found
this: https://github.com/mitsuhiko/flask-sqlalchemy/pull/447 which basically
issues a rollback on entering the context manager to ensure that only what
is within the context manager will get committed. What do you think of it?

I'm much more a proponent of writing one's own patterns that suit
their application rather than making the prepackaged ones in something
like Flask fit. I think if it implicitly rolls back, that's a
terrible idea because if you truly expect that nothing important is
present in the session, it should be asserting that and raising if
something is found (look in session.new, session.dirty,
session.deleted).
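A sketch of that "assert and raise" alternative. `assert_clean` and `UnexpectedPendingState` are hypothetical names; the function only relies on the `new`, `dirty` and `deleted` collections that a SQLAlchemy `Session` exposes, so simple stand-ins are used here.

```python
from types import SimpleNamespace


class UnexpectedPendingState(RuntimeError):
    """Raised when a block that expects a clean session finds leftover changes."""


def assert_clean(session):
    """Raise rather than silently roll back if the session has unflushed changes.

    A SQLAlchemy Session exposes .new, .dirty and .deleted collections;
    any object with those three iterables works here.
    """
    leftovers = {
        "new": list(session.new),
        "dirty": list(session.dirty),
        "deleted": list(session.deleted),
    }
    if any(leftovers.values()):
        raise UnexpectedPendingState(f"session has pending changes: {leftovers}")


# Hypothetical stand-ins for a Session in two states.
clean = SimpleNamespace(new=[], dirty=[], deleted=[])
messy = SimpleNamespace(new=["<Order 1>"], dirty=[], deleted=[])

assert_clean(clean)  # passes silently
try:
    assert_clean(messy)
    raised = False
except UnexpectedPendingState:
    raised = True
```

These collections only cover changes not yet flushed; anything already flushed in the current transaction won't show up. Also, `dirty` is documented as an optimistic check, so an attribute set to its existing value still counts.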


Mike Bayer

2017-08-09 20:28:44 UTC

Post by Mike Bayer
made. Which is above, **if you are not using begin() and commit(),
the way we just said you should, then you should turn off
autocommit=True**.

arg arg ARG ARG

"if you are **not** using begin() and commit(), and are instead just
calling flush() which fully commits, then you should **disable**
expire_on_commit=True, to avoid excessive re-loads of data."


a***@withplum.com

2017-08-12 13:55:04 UTC

Thank you very much for the guidance, Jonathan and Mike. I've implemented
nesting counting in my context manager and turned off autocommit and
subtransactions. It looks like it's working well!

Alex
