Discussion:
[sqlalchemy] Patterns for multitenancy
Eric Smith
2018-11-16 00:51:10 UTC
Permalink
Are there any good examples/patterns for implementing multitenancy with
SQLAlchemy?

I'm in the process of converting a single-tenant web application to support
multiple tenants. Using a schema-per-tenant strategy seems appealing -- if
you can "activate" the schema for a particular tenant on a request, the
existing database access code can remain unchanged, and not leaking data
between tenants seems pretty manageable. I've read about schema translation
<https://docs.sqlalchemy.org/en/latest/core/connections.html?highlight=schema_translate_map#schema-translating> as
a potential approach (though I think I would prefer that to happen at a
session scope rather than connection), as well as using the PostgreSQL
search_path.

While I'm academically still interested in schema-per-tenant, my expected
scale (tens of thousands of tenants) is pushing me toward a row-per-tenant
strategy instead. Adding a tenant_id to the appropriate models is
straightforward, but the need to manually add
.filter_by(tenant_id=tenant_id) to every query feels cumbersome and
error-prone.

Maybe I should be looking at the PreFilteredQuery or GlobalFilter recipes?
I haven't quite gotten my head around those to know if they apply.

Maybe there's a way to use a query_property on the model Base?

I also ran across this experimental project that seems
intriguing: https://github.com/mwhite/MultiAlchemy But I'm not sure if
that is a solid strategy or if there is something egregiously wrong with it.

Any advice, references or direction would be appreciated!

Thanks,
Eric
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+***@googlegroups.com.
To post to this group, send email to ***@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
Mike Bayer
2018-11-16 01:20:25 UTC
Permalink
Are there any good examples/patterns for implementing multitenancy with SQLAlchemy?
I'm in the process of converting a single-tenant web application to support multiple tenants. Using a schema-per-tenant strategy seems appealing -- if you can "activate" the schema for a particular tenant on a request, the existing database access code can remain unchanged, and not leaking data between tenants seems pretty manageable. I've read about schema translation as a potential approach (though I think I would prefer that to happen at a session scope rather than connection), as well as using the PostgreSQL search_path.
While I'm academically still interested in schema-per-tenant, my expected scale (tens of thousands of tenants) is pushing me toward a row-per-tenant strategy instead. Adding a tenant_id to the appropriate models is straightforward, but the need to manually add .filter_by(tenant_id=tenant_id) to every query feels cumbersome and error-prone.
Maybe I should be looking at the PreFilteredQuery or GlobalFilter recipes? I haven't quite gotten my head around those to know if they apply.
Maybe there's a way to use a query_property on the model Base?
I also ran across this experimental project that seems intriguing: https://github.com/mwhite/MultiAlchemy But I'm not sure if that is a solid strategy or if there is something egregiously wrong with it.
Any advice, references or direction would be appreciated!
It sounds like you saw the schema-based multitenancy pattern which you
can see at https://docs.sqlalchemy.org/en/latest/core/connections.html#translation-of-schema-names
, which is made possible by the schema_translate_map feature. That
can absolutely be per-Session like this:

sess = Session()
sess.connection(execution_options={"schema_translate_map": {...}})

or like this

conn = engine.connect()
conn = conn.execution_options(....)
sess = Session(conn)
Maybe I should be looking at the PreFilteredQuery or GlobalFilter recipes? I haven't quite gotten my head around those to know if they apply.
they do but I think a more modern way to do this is to use the
before_compile event:

https://docs.sqlalchemy.org/en/latest/orm/events.html?highlight=before_compile#sqlalchemy.orm.events.QueryEvents.before_compile

the example there shows sort of the equivlent of PreFilteredQuery.
The wiki should be updated to refer to this event hook.
Thanks,
Eric
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
http://www.sqlalchemy.org/
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+***@googlegroups.com.
To post to this group, send email to ***@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
Eric Smith
2018-11-19 22:27:33 UTC
Permalink
Thanks -- the "before_compile" event seems interesting -- that could solve
automatically adding a tenant on the SELECT side of things. What I wish for
is a usage pattern like:

1. When a web request comes in, determine the appropriate tenant
2. Set the tenant somewhere associated with the context of the request
3. Use the database as if it were single-tenant, relying on event
handlers/subclassed ORM objects/something to fill in the tenant for INSERT
and SELECT statements.

Does that seem realistic?

For schema-per-tenant, I can image how that would be done with
schema_translate_map or search_path, since that covers both INSERT and
SELECT. For row-per-tenant, is there something roughly equivalent to
before_compile to be able to modify INSERT statements to include a tenant?
There's the engine before_execute event -- maybe that would work -- but
someone in this group has said those are evil
<https://groups.google.com/forum/#!searchin/sqlalchemy/evil%7Csort:date/sqlalchemy/rtgKPoplH74/rELcSA1LBAAJ>
.

The "MultiAlchemy
<https://www.google.com/url?q=https%3A%2F%2Fgithub.com%2Fmwhite%2FMultiAlchemy&sa=D&sntz=1&usg=AFQjCNHl7sPWv5DqsD1eX9cyjzWinhg04g>"
example I linked to earlier creates a Session subclass overriding the "add"
method to automatically include the tenant, but it seems like that would
miss objects that get added implicitly through relationships.

Finally, a specific question about before_compile. If the query I want to
modify is for first(), changing the query fails with:

sqlalchemy.exc.InvalidRequestError: Query.filter() being called on a Query
which already has LIMIT or OFFSET applied. To modify the row-limited
results of a Query, call from_self() first. Otherwise, call filter()
before limit() or offset() are applied.

If I use from_self(), I get:

RecursionError: maximum recursion depth exceeded while calling a Python
object


I guess because from_self causes the event handler to be fired again. I
could use some kind of flag to avoid that, but needing to do so makes me
wonder if I'm doing something wrong? Example:


from sqlalchemy import create_engine, Column, Integer, String, event,
Boolean
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, Query

engine = create_engine('sqlite:///:memory:', echo=True)
Base = declarative_base()
Session = sessionmaker(bind=engine)


class User(Base):
__tablename__ = 'users'

id = Column(Integer, primary_key=True)
name = Column(String)
fullname = Column(String)
password = Column(String)
deleted = Column(Boolean, default=False)


@event.listens_for(Query, "before_compile", retval=True)
def no_deleted(query):
for desc in query.column_descriptions:
if desc['type'] is User:
entity = desc['entity']
query = query.from_self().filter(entity.deleted == False)
return query


Base.metadata.create_all(engine)
session = Session()
ed_user = User(name='ed', fullname='Ed Jones', password='edspassword')
session.add(ed_user)
session.commit()

any_not_deleted_user = session.query(User).first()


Thanks again for the guidance,

Eric
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+***@googlegroups.com.
To post to this group, send email to ***@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
Mike Bayer
2018-11-20 15:44:27 UTC
Permalink
Thanks -- the "before_compile" event seems interesting -- that could solve automatically adding a tenant on the SELECT side of things.
note that I've rewritten the PreFilteredQuery and GloblalFilter
recipes entirely to use before_compile:

https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/FilteredQuery

However, FilteredQuery doesn't handle all SQL, just ORM queries, and
even then it has some caveats. There's ways to ensure a row-based
tenant is present in all kinds of queries but they're complicated and
there will always be more edge cases where something needs special
attention. In SQLAlchemy, we have a feature called "single-table
inheritance" which has many similarities, in that a certain column is
a discriminator for rows, and this discriminator has to be present at
all times. It's complicated and there have been many issues
(particularly joins) where it needed additional adjustment.
When a web request comes in, determine the appropriate tenant
Set the tenant somewhere associated with the context of the request
Use the database as if it were single-tenant, relying on event handlers/subclassed ORM objects/something to fill in the tenant for INSERT and SELECT statements.
That's the pattern all of these techniques are trying to go for.
Does that seem realistic?
For row-based tenancy you can do it but you would need to test every
new feature very carefully to make sure you aren't losing the tenant
for a certain row nor are you creating queries that are wrong. Below
is a LEFT OUTER JOIN that is used to try to find rows in "a" that
don't have a row in "b". This query should be rewritten to use a NOT
EXISTS subquery. However, this kind of query is common:

SELECT a.id LEFT OUTER JOIN b ON a.id=b.a_id WHERE b.id IS NULL

if we add logic that naively adds tenant IDs to everything:

SELECT a.id LEFT OUTER JOIN b ON a.id=b.a_id WHERE b.id IS NULL WHERE
a.tenant_id=5 AND b.tenant_id=5

now the query won't work. You'll get no rows for "a". the better query is:

SELECT a.id FROM a WHERE NOT EXISTS(SELECT 1 FROM b WHERE a.id=b.a_id
WHERE b.tenant_id=5) AND a.tenant_id=5

Overall, the scheme that's presented in the schema_translate_map
feature is the one that definitely works in all cases. But if you
have millions of "tenants", that's not really multitenancy IMHO,
that's user accounts.
For schema-per-tenant, I can image how that would be done with schema_translate_map or search_path, since that covers both INSERT and SELECT. For row-per-tenant, is there something roughly equivalent to before_compile to be able to modify INSERT statements to include a tenant? There's the engine before_execute event -- maybe that would work -- but someone in this group has said those are evil.
before_execute() and before_cursor_execute are your two events, and
since you'd want to intercept insert() structures before they are
compiled, you'd be using before_execute(). It's doable but you have
to test a lot. But it's also not necessary if you're fine sticking
to ORM only.
The "MultiAlchemy" example I linked to earlier creates a Session subclass overriding the "add" method to automatically include the tenant, but it seems like that would miss objects that get added implicitly through relationships.
If you want to do it at the ORM level only, for persistence there are
plenty of events you can use:

before_insert: https://docs.sqlalchemy.org/en/latest/orm/events.html?highlight=before_insert#sqlalchemy.orm.events.MapperEvents.before_insert
after_attach (or before_attach):
https://docs.sqlalchemy.org/en/latest/orm/events.html?highlight=after_attach#sqlalchemy.orm.events.SessionEvents.after_attach
init: https://docs.sqlalchemy.org/en/latest/orm/events.html?highlight=init#sqlalchemy.orm.events.InstanceEvents.init

Above, the "init" method is the earliest, and it is really just
another way to do the __init__() constructor on your object. that's
the event that polymorphic persistence uses and is likely the simplest
to use, because you have your tenant id set up front. But if the
tenant id is only associated with the Session then use
before_attach/after_attach.
sqlalchemy.exc.InvalidRequestError: Query.filter() being called on a Query which already has LIMIT or OFFSET applied. To modify the row-limited results of a Query, call from_self() first. Otherwise, call filter() before limit() or offset() are applied.
right you'll see in the wiki examples you have to call a special
method query.enable_assertions(False) which is explicitly for writing
these recipes. don't use from_self(), that gives you a much more
complicated query and is not appropriate here.
RecursionError: maximum recursion depth exceeded while calling a Python object
from sqlalchemy import create_engine, Column, Integer, String, event, Boolean
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, Query
engine = create_engine('sqlite:///:memory:', echo=True)
Base = declarative_base()
Session = sessionmaker(bind=engine)
__tablename__ = 'users'
id = Column(Integer, primary_key=True)
name = Column(String)
fullname = Column(String)
password = Column(String)
deleted = Column(Boolean, default=False)
@event.listens_for(Query, "before_compile", retval=True)
entity = desc['entity']
query = query.from_self().filter(entity.deleted == False)
return query
Base.metadata.create_all(engine)
session = Session()
ed_user = User(name='ed', fullname='Ed Jones', password='edspassword')
session.add(ed_user)
session.commit()
any_not_deleted_user = session.query(User).first()
Thanks again for the guidance,
Eric
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
http://www.sqlalchemy.org/
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+***@googlegroups.com.
To post to this group, send email to ***@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
Loading...