Programming
python oracle dask
Updated Fri, 09 Sep 2022 09:53:25 GMT

Dask read_sql_table throws error AttributeError: 'Connection' object has no attribute '_instantiate_plugins'


from sqlalchemy import create_engine
sql_engine = create_engine('oracle+cx_oracle://user:password@environment')
connection = sql_engine.connect()
# works with PANDAS?
import pandas as pd
pd.read_sql_table('table', connection)

The above gives expected output. Now for Dask...

import dask.dataframe as dd
dd.read_sql_table('table', connection)

...which gives...

--------------------------------------------------------------------------- TypeError Traceback (most recent call last) in ----> 1 dd.read_sql_table('qsi_run_info_int', connection)#, 'hedis_version_key')

TypeError: read_sql_table() missing 1 required positional argument: 'index_col'

...and adding column to index on via...

dd.read_sql_table('table', connection, 'column_to_index_on')

...gives...

--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) in ----> 1 dd.read_sql_table('qsi_run_info_int', connection, 'hedis_version_key')

~.conda\envs\test_dask\lib\site-packages\dask\dataframe\io\sql.py in read_sql_table(table, uri, index_col, divisions, npartitions, limits, columns, bytes_per_chunk, head_rows, schema, meta, engine_kwargs, **kwargs) 98 raise ValueError("Must specify index column to partition on") 99 engine_kwargs = {} if engine_kwargs is None else engine_kwargs --> 100 engine = sa.create_engine(uri, **engine_kwargs) 101 m = sa.MetaData() 102 if isinstance(table, string_types):

~.conda\envs\test_dask\lib\site-packages\sqlalchemy\engine__init__.py in create_engine(*args, **kwargs) 433 strategy = kwargs.pop("strategy", default_strategy) 434 strategy = strategies.strategies[strategy] --> 435 return strategy.create(*args, **kwargs) 436 437

~.conda\envs\test_dask\lib\site-packages\sqlalchemy\engine\strategies.py in create(self, name_or_url, **kwargs) 54 u = url.make_url(name_or_url) 55 ---> 56 plugins = u._instantiate_plugins(kwargs) 57 58 u.query.pop("plugin", None)

AttributeError: 'Connection' object has no attribute '_instantiate_plugins'

What to try next?




Solution

The second argument to read_sql_table, as the docstring specifies, should not be a connection object, but the URI to create that object (in your case the "oracle..." string). This is because the sqlalchemy engines are not serialisable nor thread-safe, so the tasks running the queries need to create their own local versions of the engine.