Discussion:
[sqlalchemy] Deleting rows from a table where row elements are found in a pandas.DataFrame
Jevgenij Kusakovskij
2018-01-03 16:21:47 UTC
Permalink
I feel bad bugging you with questions everyday, but it seems that I get
them answered here... Hope I am not abusing your hospitality.

I would like to delete multiple rows from a table on a MS SQL Server. The
rows to be deleted are given in a `pandas.DataFrame` object and there can
be thousands or even more of them.

The way I tried to tackle the problem is by using the query construction
methods and then passing it all to connection.execute(). My code looks
something like this:

def delete_rows(self, tablename, data):
"""
Args:
tablename - name of db table, e.g. '[schema].[name]'
data - pandas.DataFrame
"""
# Get column names
colnames = data.columns.values
# Create sa.column() objects for producing bindparams
cols = [sa.column(name) for name in data.columns.values]

# Build a DELETE query
query = sa.table(tablename).delete().\
where(sa.and_(*[c == sa.bindparam(cn) for c, cn in
zip(cols,colnames)]))
# Execute query
self.connection.execute(query, data.to_dict(orient='records'))

When running with inputs, that are seen in the echos of the engine, I get
an error:

....
INFO sqlalchemy.engine.base.Engine DELETE FROM some.table WHERE col1 =
? AND col2 = ? AND col3 = ?
INFO sqlalchemy.engine.base.Engine ((123, Timestamp('2012-05-22
00:00:00'), 123, 123), (456, Timestamp('2012-01-27 00:00:00'), 456, 456))
INFO sqlalchemy.engine.base.Engine ROLLBACK
....
pyodbc.ProgrammingError: ('Expected 0 parameters, supplied 4', 'HY000')

Could this be related to the way the dates are stored in the `DataFrame`?

I am using pandas-0.20.3, pyODBC-4.0.21 and SQLAlchemy-1.1.13.
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+***@googlegroups.com.
To post to this group, send email to ***@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
Mike Bayer
2018-01-03 16:55:41 UTC
Permalink
I feel bad bugging you with questions everyday, but it seems that I get them
answered here... Hope I am not abusing your hospitality.
I would like to delete multiple rows from a table on a MS SQL Server. The
rows to be deleted are given in a `pandas.DataFrame` object and there can be
thousands or even more of them.
The way I tried to tackle the problem is by using the query construction
methods and then passing it all to connection.execute(). My code looks
"""
tablename - name of db table, e.g. '[schema].[name]'
data - pandas.DataFrame
"""
# Get column names
colnames = data.columns.values
# Create sa.column() objects for producing bindparams
cols = [sa.column(name) for name in data.columns.values]
# Build a DELETE query
query = sa.table(tablename).delete().\
where(sa.and_(*[c == sa.bindparam(cn) for c, cn in
zip(cols,colnames)]))
# Execute query
self.connection.execute(query, data.to_dict(orient='records'))
When running with inputs, that are seen in the echos of the engine, I get an
I think the first problem is you can't create a column() on the fly
like that and use it in the delete() statement, these Column objects
must be in terms of the Table you are operating upon, so

t = table(tablename, *[column(name) for name in names])

# pseudocode, not quite right, the idea is the column comes from
table.c.<colname>
query = t.delete().where(and_(table.c[name] = bindparam(name) for name
in names))


the next problem is that this is an odd way to DELETE rows, by naming
every single column in the row in the WHERE clause. The data you
have here usually should have some kind of natural key so you can
identify the rows more accurately and efficiently on just those
columns that identify the row.

that should get you a bit further at least.
....
INFO sqlalchemy.engine.base.Engine DELETE FROM some.table WHERE col1 = ?
AND col2 = ? AND col3 = ?
INFO sqlalchemy.engine.base.Engine ((123, Timestamp('2012-05-22
00:00:00'), 123, 123), (456, Timestamp('2012-01-27 00:00:00'), 456, 456))
INFO sqlalchemy.engine.base.Engine ROLLBACK
....
pyodbc.ProgrammingError: ('Expected 0 parameters, supplied 4', 'HY000')
Could this be related to the way the dates are stored in the `DataFrame`?
I am using pandas-0.20.3, pyODBC-4.0.21 and SQLAlchemy-1.1.13.
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
http://www.sqlalchemy.org/
To post example code, please provide an MCVE: Minimal, Complete, and
Verifiable Example. See http://stackoverflow.com/help/mcve for a full
description.
---
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+***@googlegroups.com.
To post to this group, send email to ***@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
Jevgenij Kusakovskij
2018-01-03 17:30:08 UTC
Permalink
Thank you for such a quick response!

Tried your suggestions, but still getting the same error.

And could it be caused by some data in the DataFrame being 'odd'? E.g.
Timestamp('2012-05-22 00:00:00') assigned to col2.
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+***@googlegroups.com.
To post to this group, send email to ***@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
Mike Bayer
2018-01-03 20:39:26 UTC
Permalink
Post by Jevgenij Kusakovskij
Thank you for such a quick response!
Tried your suggestions, but still getting the same error.
And could it be caused by some data in the DataFrame being 'odd'? E.g.
Timestamp('2012-05-22 00:00:00') assigned to col2.
maybe but that's not what the error would necessarily look like.

what does data.to_dict() return, is that a list of dictionaries?
Usually I step into these things with pdb and try smaller forms of
what's going on to isolate how it winds up in an error situation. If
you are new to Python, there'd be a learning curve here using pdb and
whatnot (worth it though IMO).
Post by Jevgenij Kusakovskij
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
http://www.sqlalchemy.org/
To post example code, please provide an MCVE: Minimal, Complete, and
Verifiable Example. See http://stackoverflow.com/help/mcve for a full
description.
---
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+***@googlegroups.com.
To post to this group, send email to ***@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
Jevgenij Kusakovskij
2018-01-03 21:04:58 UTC
Permalink
It is a list of dictionaries indeed.

I will investigate this issue tomorrow. If something useful comes out or if
I have any more questions, I will write back here.
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+***@googlegroups.com.
To post to this group, send email to ***@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
Jevgenij Kusakovskij
2018-01-04 10:05:03 UTC
Permalink
I sorted this problem out and, as you predicted, it turned out to be layers
upon layers of problems. If
you do not mind having a quick look at how I resolved this, I would really
appreciate some feedback
and a second opinion...

I actually missed the main problem initially. It had to do with the name of
the table being quoted
incorrectly, which in turn was due to incorrect inputs on my part. In my
actual code the variable of the
table name was *tablename = '[some].[long].[name]'*, which was translated
to a DELETE query with
additional quotes: *'[[some].[long].[name]]'*. I saw then that I had to
provide the schema and the table
names separately - and ASAP read much more about databases and MS SQL
syntax.

I then very lazily tried to add these two to the initiation call of the
table object. Using your pseudocode
and the names from the previous paragraph:


* t = table('name', *[column(name) for name in names],
schema='some.long') query = t.delete().where(and_(table.c[n] =
bindparam(n) for n in names)) *

This resulted in a *TypeError: table() got an unexpected keyword argument
'schema'. *I then made my
peace with the fact that at the moment the depth of my database knowledge
is only sufficient for
building queries with strings and for loops, but here it will not be good
enough. :)

I finally made a call to let the module take care of as much as possible
and everything worked. In the
end I reflected the table from the server and then constructed the query
using that object:

* meta = sa.MetaData(bind=self.engine, schema=schema_name)*
* reflected_table = sa.Table(table_name, meta, autoload=True)*

* # Get column names from the input DataFrame - data*
* colnames = data.columns.values*

* # Build a DELETE query with the SQLAlchemy builder*
* query = reflected_table.delete().\*
* where(sa.and_(reflected_table.c[cname] == sa.bindparam(cname) for
cname in colnames))*

* self.connection.execute(query, data.to_dict(orient='records'))*

With my limited knowledge of everything, it seems that there are several
benefits here. First of
all, I can further optimize my query by looking for the natural key in the
reflected table and then
simply use it in the WHERE conditions. Also, am I right to think that this
way of working provides
a consistency check of the column names in the table and the DataFrame?
This might be a good
indicator for people using this method when they are doing something wrong.

This was a long one, so I will not hold it against you if you did not make
it this far. :) If you are
reading this and still open to give me some feedback, thanks a lot for your
help and for a cool
module that you have built here. I came to Python and database tasks a few
weeks ago and
am still having fun (in some peculiar way) with getting the hang of it.
Keep up the good work!
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+***@googlegroups.com.
To post to this group, send email to ***@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
Loading...