bug(Impala): con.create_table fails when database name is not supplied to connection object #10263

Open
1 task done
contang0 opened this issue Oct 2, 2024 · 0 comments
Labels
bug Incorrect behavior inside of ibis

contang0 commented Oct 2, 2024

What happened?

I noticed that con.create_table fails when con is created without specifying a database (catalogue) name. Previously I used the first method below; however, since I need to do a lot of cross-catalogue joins, I had to switch to the second one.

Method 1 works:

con = ibis.impala.connect(host=host, database="db_name", **kwargs)
con.create_table(
    name="tbl_name",
    obj=df,
    overwrite=True,
)

Method 2 doesn't:

con = ibis.impala.connect(host=host, **kwargs)
con.create_table(
    name="tbl_name",
    obj=df,
    database="db_name",
    overwrite=True,
)

The issue seems to be in the function _register_in_memory_table:

def _register_in_memory_table(self, op: ops.InMemoryTable) -> None:

Even though the database name is passed to con.create_table, it is not forwarded to _register_in_memory_table. As a result, instead of creating the table as db_name.tbl_name, it attempts to create a table in the default database, which fails because my user has no CREATE privileges there (see the log output below). I believe the call to sg.to_identifier should include the database name.
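
To illustrate the difference between an unqualified and a database-qualified table name, here is a minimal standalone sketch. It is not the ibis implementation: `quote`, `qualified_name`, and `build_create_stmt` are hypothetical helpers written only for this example, and the backtick quoting is an assumption about how the identifiers would be rendered.

```python
from typing import Mapping, Optional


def quote(name: str) -> str:
    """Wrap an identifier in backticks (hypothetical helper, not ibis API)."""
    if "`" in name:
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def qualified_name(table: str, database: Optional[str] = None) -> str:
    """Return the database-qualified name when a database is given."""
    parts = [database, table] if database else [table]
    return ".".join(quote(part) for part in parts)


def build_create_stmt(
    table: str, columns: Mapping[str, str], database: Optional[str] = None
) -> str:
    """Build a CREATE TABLE statement for the (optionally qualified) table."""
    cols = ", ".join(f"{quote(col)} {typ}" for col, typ in columns.items())
    return f"CREATE TABLE {qualified_name(table, database)} ({cols})"


# Unqualified: the server resolves the name against the session's current database.
print(build_create_stmt("tbl_name", {"id": "INT"}))
# Qualified: the target database is explicit in the statement.
print(build_create_stmt("tbl_name", {"id": "INT"}, database="db_name"))
```

With the unqualified form, a connection opened without a database ends up targeting whatever the session default is, which matches the 'CREATE' on: default error in the log. A fix along these lines would presumably build the qualified form whenever a database is supplied.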

I also noticed that the temporary memtable tables created by _register_in_memory_table are never dropped after use.
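
For context, the traceback below shows the base backend registering a weakref.finalize callback right after _register_in_memory_table, which is presumably meant to handle this cleanup. The following standalone sketch shows how that pattern behaves in isolation. `FakeMemtable` and `FakeBackend` are hypothetical stand-ins, not ibis classes, and the sketch says nothing about what the Impala backend's finalizer actually does.

```python
import gc
import weakref


class FakeMemtable:
    """Hypothetical stand-in for an in-memory table operation."""

    def __init__(self, name: str) -> None:
        self.name = name


class FakeBackend:
    """Hypothetical backend that tracks temporary tables it has created."""

    def __init__(self) -> None:
        self.temp_tables = set()
        self.dropped = []

    def register(self, memtable: FakeMemtable) -> weakref.finalize:
        # "Create" the temporary table backing the memtable.
        self.temp_tables.add(memtable.name)
        # Pass only the name to the callback so the finalizer does not
        # keep the memtable object itself alive.
        return weakref.finalize(memtable, self._drop, memtable.name)

    def _drop(self, name: str) -> None:
        # A real backend would issue a DROP TABLE statement here.
        self.temp_tables.discard(name)
        self.dropped.append(name)


backend = FakeBackend()
table = FakeMemtable("memtable_1")
backend.register(table)

del table     # drop the last strong reference to the memtable
gc.collect()  # make collection explicit on non-refcounting interpreters
print(backend.dropped)
```

The cleanup only runs once the memtable object is garbage-collected (or at interpreter exit), so a table can look like it is never dropped while something still holds a reference to it.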

What version of ibis are you using?

9.5.0

What backend(s) are you using, if any?

Impala

Relevant log output

{
	"name": "HiveServer2Error",
	"message": "AuthorizationException: User 'user' does not have privileges to execute 'CREATE' on: default
",
	"stack": "---------------------------------------------------------------------------
HiveServer2Error                          Traceback (most recent call last)
File proj\\main.py:1
----> 1 con.create_table(
      2         name=\"tbl_names\",
      3         obj=data,
      4         database=\"db_name\",
      5         overwrite=True,
      6     )

File ~\\Desktop\\Projects\\ibis\\ibis\\backends\\impala\\__init__.py:529, in Backend.create_table(self, name, obj, schema, database, temp, overwrite, external, format, location, partition, tbl_properties, like_parquet)
    526 if not isinstance(obj, ir.Table):
    527     obj = ibis.memtable(obj)
--> 529 self._run_pre_execute_hooks(obj)
    531 select = self.compile(obj)
    533 if overwrite:

File ~\\Desktop\\Projects\\ibis\\ibis\\backends\\__init__.py:1106, in BaseBackend._run_pre_execute_hooks(self, expr)
   1104 \"\"\"Backend-specific hooks to run before an expression is executed.\"\"\"
   1105 self._register_udfs(expr)
-> 1106 self._register_in_memory_tables(expr)

File ~\\Desktop\\Projects\\ibis\\ibis\\backends\\__init__.py:1081, in BaseBackend._register_in_memory_tables(self, expr)
   1079 for memtable in expr.op().find(ops.InMemoryTable):
   1080     if not self._in_memory_table_exists(memtable.name):
-> 1081         self._register_in_memory_table(memtable)
   1082         weakref.finalize(
   1083             memtable, self._finalize_in_memory_table, memtable.name
   1084         )

File ~\\Desktop\\Projects\\ibis\\ibis\\backends\\impala\\__init__.py:1255, in Backend._register_in_memory_table(self, op)
   1253 data = op.data.to_frame().itertuples(index=False)
   1254 insert_stmt = self._build_insert_template(name, schema=schema)
-> 1255 with self._safe_raw_sql(create_stmt) as cur:
   1256     for row in data:
   1257         cur.execute(insert_stmt, row)

File \\Lib\\contextlib.py:137, in _GeneratorContextManager.__enter__(self)
    135 del self.args, self.kwds, self.func
    136 try:
--> 137     return next(self.gen)
    138 except StopIteration:
    139     raise RuntimeError(\"generator didn't yield\") from None

File ~\\Desktop\\Projects\\ibis\\ibis\\backends\\impala\\__init__.py:279, in Backend._safe_raw_sql(self, query)
    276         query = query.compile()
    278 assert isinstance(query, str), type(query)
--> 279 with contextlib.closing(self.raw_sql(query)) as cur:
    280     yield cur

File ~\\Desktop\\Projects\\ibis\\ibis\\backends\\impala\\__init__.py:254, in Backend.raw_sql(self, query)
    251     cursor._wait_to_finish()
    253     util.log(query)
--> 254     cursor.execute_async(query)
    256     cursor._wait_to_finish()
    257 except (Exception, KeyboardInterrupt):

File \\Lib\\site-packages\\impala\\hiveserver2.py:388, in HiveServer2Cursor.execute_async(self, operation, parameters, configuration)
    383     op = self.session.execute(self._last_operation_string,
    384                               configuration,
    385                               run_async=True)
    386     self._last_operation = op
--> 388 self._execute_async(op)

File \\Lib\\site-packages\\impala\\hiveserver2.py:407, in HiveServer2Cursor._execute_async(self, operation_fn)
    405 self._reset_state()
    406 self._debug_log_state()
--> 407 operation_fn()
    408 self._last_operation_active = True
    409 self._debug_log_state()

File \\Lib\\site-packages\\impala\\hiveserver2.py:383, in HiveServer2Cursor.execute_async.<locals>.op()
    380 else:
    381     self._last_operation_string = operation
--> 383 op = self.session.execute(self._last_operation_string,
    384                           configuration,
    385                           run_async=True)
    386 self._last_operation = op

File \\Lib\\site-packages\\impala\\hiveserver2.py:1227, in HS2Session.execute(self, statement, configuration, run_async)
   1219 req = TExecuteStatementReq(sessionHandle=self.handle,
   1220                            statement=statement,
   1221                            confOverlay=configuration,
   1222                            runAsync=run_async)
   1223 # Do not try to retry http requests.
   1224 # Read queries should be idempotent but most dml queries are not. Also retrying
   1225 # query execution from client could be expensive and so likely makes sense to do
   1226 # it if server is also aware of the retries.
-> 1227 return self._operation('ExecuteStatement', req, False)

File \\Lib\\site-packages\\impala\\hiveserver2.py:1148, in ThriftRPC._operation(self, kind, request, retry_on_http_error)
   1147 def _operation(self, kind, request, retry_on_http_error=False):
-> 1148     resp = self._rpc(kind, request, retry_on_http_error)
   1149     return self._get_operation(resp.operationHandle)

File \\Lib\\site-packages\\impala\\hiveserver2.py:1085, in ThriftRPC._rpc(self, func_name, request, retry_on_http_error)
   1083 response = self._execute(func_name, request, retry_on_http_error)
   1084 self._log_response(func_name, response)
-> 1085 err_if_rpc_not_ok(response)
   1086 return response

File \\Lib\\site-packages\\impala\\hiveserver2.py:781, in err_if_rpc_not_ok(resp)
    777 def err_if_rpc_not_ok(resp):
    778     if (resp.status.statusCode != TStatusCode.SUCCESS_STATUS and
    779             resp.status.statusCode != TStatusCode.SUCCESS_WITH_INFO_STATUS and
    780             resp.status.statusCode != TStatusCode.STILL_EXECUTING_STATUS):
--> 781         raise HiveServer2Error(resp.status.errorMessage)

HiveServer2Error: AuthorizationException: User 'user' does not have privileges to execute 'CREATE' on: default
"
}

Code of Conduct

  • I agree to follow this project's Code of Conduct
contang0 added the bug (Incorrect behavior inside of ibis) label Oct 2, 2024
Projects
Status: backlog