Quickly move data from postgres to numpy or pandas.
Run the query, returning the results as np.ndarrays.

Parameters
----------
query : sa.sql.Selectable
    The query to run. This can be a select or a table.
bind : sa.Engine, optional
    The engine used to create the connection. If not provided,
    ``query.bind`` will be used.

Returns
-------
arrays : dict[str, (np.ndarray, np.ndarray)]
    A map from column name to the result arrays. The first array holds the
    values and the second array is a boolean mask for NULLs. Values where
    the mask is False correspond to NULLs and hold a 0 as interpreted by
    the column's type.
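For illustration, a minimal sketch of calling this from Python, assuming the function is exposed as ``warp_prism.to_arrays`` and reusing the example table from the benchmark below (the connection URI is illustrative)::

    from odo import resource
    import warp_prism

    # Reflect a postgres table as a sqlalchemy selectable (same URI and
    # schema as the benchmark below).
    table = resource('postgresql://localhost/bz::yahoo_index_vix', schema='quandl')

    # Presumed entry point matching the docstring above. ``table.bind`` is
    # used for the connection because no ``bind`` is passed explicitly.
    arrays = warp_prism.to_arrays(table)

    # Each column maps to a (values, mask) pair; the mask is False where the
    # database value was NULL.
    closes, close_mask = arrays['close']
    print(closes[close_mask][:5])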
Run the query, returning the results as a pd.DataFrame.

Parameters
----------
query : sa.sql.Selectable
    The query to run. This can be a select or a table.
bind : sa.Engine, optional
    The engine used to create the connection. If not provided,
    ``query.bind`` will be used.
null_values : dict[str, any]
    The null values to use for each column. This falls back to
    ``warp_prism.null_values`` for columns that are not specified.

Returns
-------
df : pd.DataFrame
    A pandas DataFrame holding the results of the query. The columns of the
    DataFrame will be named the same and be in the same order as the query.
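A short usage sketch of the DataFrame path; the ``null_values`` override shown is hypothetical and only demonstrates the documented parameter::

    from odo import resource
    import warp_prism

    table = resource('postgresql://localhost/bz::yahoo_index_vix', schema='quandl')

    # Default NULL handling (falls back to warp_prism.null_values per column).
    df = warp_prism.to_dataframe(table)

    # Explicitly fill NULLs in the 'volume' column with -1 instead of the
    # default for its type.
    df = warp_prism.to_dataframe(table, null_values={'volume': -1})
    print(df.head())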
Register an odo edge for sqlalchemy selectable objects to dataframe. This edge will have a lower cost than the default edge, so it will be selected as the fastest path. If the selectable is not in a postgres database, it will fall back to the default odo edge.
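A sketch of how the odo integration might be enabled, assuming the registration function is exposed as ``warp_prism.register_odo_dataframe_edge`` (the name is assumed from the description above)::

    import pandas as pd
    from odo import odo, resource
    import warp_prism

    # One-time registration. After this, odo(selectable, pd.DataFrame) routes
    # through warp_prism whenever the selectable lives in a postgres database,
    # and falls back to the default odo edge otherwise.
    warp_prism.register_odo_dataframe_edge()

    table = resource('postgresql://localhost/bz::yahoo_index_vix', schema='quandl')
    df = odo(table, pd.DataFrame)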
A quick comparison between warp_prism, odo, and pd.read_sql_table.
In this example we will read real data for VIX from Quandl stored in a local postgres database using warp_prism, odo, and pd.read_sql_table. After that, we will use odo to create a table with two float columns and 1000000 rows and query it with the three tools again.
In [1]: import warp_prism
In [2]: from odo import odo, resource
In [3]: import pandas as pd
In [4]: table = resource(
   ...:     'postgresql://localhost/bz::yahoo_index_vix',
   ...:     schema='quandl',
   ...: )
In [5]: warp_prism.to_dataframe(table).head()
Out[5]:
    asof_date      open_       high        low      close  volume  \
0  2016-01-08  22.959999  27.080000  22.480000  27.010000     0.0
1  2015-12-04  17.430000  17.650000  14.690000  14.810000     0.0
2  2015-10-29  14.800000  15.460000  14.330000  14.610000     0.0
3  2015-12-21  19.639999  20.209999  18.700001  18.700001     0.0
4  2015-10-26  14.760000  15.430000  14.680000  15.290000     0.0

   adjusted_close                  timestamp
0       27.010000 2016-01-11 23:14:54.682220
1       14.810000 2016-01-11 23:14:54.682220
2       14.610000 2016-01-11 23:14:54.682220
3       18.700001 2016-01-11 23:14:54.682220
4       15.290000 2016-01-11 23:14:54.682220
In [6]: odo(table, pd.DataFrame).head()
Out[6]:
    asof_date      open_       high        low      close  volume  \
0  2016-01-08  22.959999  27.080000  22.480000  27.010000     0.0
1  2015-12-04  17.430000  17.650000  14.690000  14.810000     0.0
2  2015-10-29  14.800000  15.460000  14.330000  14.610000     0.0
3  2015-12-21  19.639999  20.209999  18.700001  18.700001     0.0
4  2015-10-26  14.760000  15.430000  14.680000  15.290000     0.0

   adjusted_close                  timestamp
0       27.010000 2016-01-11 23:14:54.682220
1       14.810000 2016-01-11 23:14:54.682220
2       14.610000 2016-01-11 23:14:54.682220
3       18.700001 2016-01-11 23:14:54.682220
4       15.290000 2016-01-11 23:14:54.682220
In [7]: pd.read_sql_table(table.name, table.bind, table.schema).head()
Out[7]:
    asof_date      open_       high        low      close  volume  \
0  2016-01-08  22.959999  27.080000  22.480000  27.010000     0.0
1  2015-12-04  17.430000  17.650000  14.690000  14.810000     0.0
2  2015-10-29  14.800000  15.460000  14.330000  14.610000     0.0
3  2015-12-21  19.639999  20.209999  18.700001  18.700001     0.0
4  2015-10-26  14.760000  15.430000  14.680000  15.290000     0.0

   adjusted_close                  timestamp
0       27.010000 2016-01-11 23:14:54.682220
1       14.810000 2016-01-11 23:14:54.682220
2       14.610000 2016-01-11 23:14:54.682220
3       18.700001 2016-01-11 23:14:54.682220
4       15.290000 2016-01-11 23:14:54.682220
In [8]: len(warp_prism.to_dataframe(table))
Out[8]: 6565
In [9]: %timeit warp_prism.to_dataframe(table)
100 loops, best of 3: 7.55 ms per loop
In [10]: %timeit odo(table, pd.DataFrame)
10 loops, best of 3: 49.9 ms per loop
In [11]: %timeit pd.read_sql_table(table.name, table.bind, table.schema)
10 loops, best of 3: 61.8 ms per loop
In [12]: import numpy as np
    ...: big_table = odo(
    ...:     pd.DataFrame({
    ...:         'a': np.random.rand(1000000),
    ...:         'b': np.random.rand(1000000)},
    ...:     ),
    ...:     'postgresql://localhost/test::largefloattest',
    ...: )
In [13]: %timeit warp_prism.to_dataframe(big_table)
1 loop, best of 3: 248 ms per loop
In [14]: %timeit odo(big_table, pd.DataFrame)
1 loop, best of 3: 1.51 s per loop
In [15]: %timeit pd.read_sql_table(big_table.name, big_table.bind)
1 loop, best of 3: 1.9 s per loop
Warp Prism can be pip installed but requires numpy to build its C extensions:
$ pip install numpy
$ pip install warp_prism
Warp Prism is licensed under the Apache 2.0 license.
Warp Prism is sponsored by Quantopian, where it is used to fetch data for use in Zipline through the Pipeline API or interactively with Blaze.