Commit
1 parent
9c4ff7d
commit 45eb2ed
Showing
1,285 changed files
with
1,781,465 additions
and
6 deletions.
Binary file added
BIN
+54.7 KB
website/static/docs/0.288.1/_images/materialized_shuffle_execution.png
Binary file added
BIN
+75.9 KB
website/static/docs/0.288.1/_images/serialized-page-string-column.png
Binary file added
BIN
+59.5 KB
website/static/docs/0.288.1/_images/worker-protocol-output-buffers.png
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
************** | ||
Administration | ||
************** | ||
|
||
.. toctree:: | ||
    :maxdepth: 1 | ||
|
||
    admin/web-interface | ||
    admin/tuning | ||
    admin/properties | ||
    admin/spill | ||
    admin/exchange-materialization | ||
    admin/cte-materialization | ||
    admin/resource-groups | ||
    admin/session-property-managers | ||
    admin/function-namespace-managers | ||
    admin/dist-sort | ||
    admin/verifier |
126 changes: 126 additions & 0 deletions
126
website/static/docs/0.288.1/_sources/admin/cte-materialization.rst.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,126 @@ | ||
=================== | ||
CTE Materialization | ||
=================== | ||
|
||
Common Table Expressions (CTEs) are subqueries that appear in a WITH clause provided by the user. | ||
Their repeated usage in a query can lead to redundant computations, excessive data retrieval, and high resource consumption. | ||
|
||
To address this, Presto supports CTE materialization, which allows intermediate CTE results to be reused within the scope of the same query. | |
Materializing a CTE can improve performance when the CTE is referenced multiple times in a query, because the CTE is computed once instead of once per reference. However, writing to and reading from disk also has a cost, so the optimization may not be beneficial for very simple CTEs | |
or for CTEs that are referenced only a few times in a query. | |
|
||
Materialized CTEs are stored in temporary tables that are bucketed based on random hashing. | ||
To use this feature, the connector used by the query must support the creation of temporary tables. Currently, only the :doc:`/connector/hive` offers this capability. | ||
The ``writtenIntermediateBytes`` member of ``com.facebook.presto.spi.eventlistener.QueryStatistics`` exposes a metric to the event listener for monitoring the bytes written to intermediate storage by temporary tables. | |
|
||
How to use CTE Materialization | ||
------------------------------ | ||
|
||
The following configuration properties and session properties enable CTE materialization and modify its settings. | |
|
||
``cte-materialization-strategy`` | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
* **Type:** ``string`` | ||
* **Allowed values:** ``ALL``, ``NONE``, ``HEURISTIC``, ``HEURISTIC_COMPLEX_QUERIES_ONLY`` | ||
* **Default value:** ``NONE`` | ||
|
||
Specifies the strategy for materializing Common Table Expressions (CTEs) in queries. | ||
|
||
``NONE`` - no CTEs will be materialized. | ||
|
||
``ALL`` - all CTEs in the query will be materialized. | ||
|
||
``HEURISTIC`` - greedily materializes the earliest parent CTE that is referenced at least ``cte_heuristic_replication_threshold`` times. | |
|
||
``HEURISTIC_COMPLEX_QUERIES_ONLY`` - greedily materializes the earliest parent CTE that meets the ``HEURISTIC`` criteria and contains a join or an aggregation. | |
|
||
Use the ``cte_materialization_strategy`` session property to set on a per-query basis. | ||
|
||
``cte-heuristic-replication-threshold`` | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
* **Type:** ``integer`` | ||
* **Minimum value:** ``0`` | ||
* **Default value:** ``4`` | ||
|
||
When ``cte-materialization-strategy`` is set to ``HEURISTIC`` or ``HEURISTIC_COMPLEX_QUERIES_ONLY``, a CTE is materialized only if it is referenced in the query at least ``cte-heuristic-replication-threshold`` times. | |
|
||
Use the ``cte_heuristic_replication_threshold`` session property to set on a per-query basis. | ||
|
||
``query.cte-partitioning-provider-catalog`` | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
* **Type:** ``string`` | ||
* **Default value:** ``system`` | ||
|
||
The name of the catalog used for CTE materialization. | |
This catalog provides the custom partitioning for the materialized CTEs. | |
|
||
Use the ``cte_partitioning_provider_catalog`` session property to set on a per-query basis. | ||
|
||
``cte-filter-and-projection-pushdown-enabled`` | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
* **Type:** ``boolean`` | ||
* **Default value:** ``true`` | ||
|
||
Flag to enable or disable the pushdown of common filters and projects into the materialized CTE. | ||
|
||
Use the ``cte_filter_and_projection_pushdown_enabled`` session property to set on a per-query basis. | ||
|
||
``hive.cte-virtual-bucket-count`` | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
* **Type:** ``integer`` | ||
* **Default value:** ``128`` | ||
|
||
The number of buckets used when materializing CTEs. | |
The bucket count can affect the performance of queries that involve CTE materialization: | |
a higher bucket count might improve parallelism, but it also increases memory and network communication overhead. | |
|
||
Recommended value: 4x to 10x the size of the cluster. | |
|
||
Use the ``hive.cte_virtual_bucket_count`` session property to set on a per-query basis. | ||
|
||
``hive.temporary-table-storage-format`` | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
* **Type:** ``string`` | ||
* **Allowed values:** ``PAGEFILE``, ``ORC``, ``DWRF``, ``ALPHA``, ``PARQUET``, ``AVRO``, ``RCBINARY``, ``RCTEXT``, ``SEQUENCEFILE``, ``JSON``, ``TEXTFILE``, ``CSV`` | ||
* **Default value:** ``ORC`` | ||
|
||
This setting determines the data format for temporary tables generated by CTE materialization. The recommended value is ``PAGEFILE`` (see :doc:`/develop/serialized-page`). It is the most performant format | |
because it stores Presto pages directly, which avoids serialization and deserialization during reads and writes. | |
|
||
Use the ``hive.temporary_table_storage_format`` session property to set on a per-query basis. | ||
|
||
``hive.bucket-function-type-for-cte-materialization`` | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
* **Type:** ``string`` | ||
* **Allowed values:** ``HIVE_COMPATIBLE``, ``PRESTO_NATIVE`` | ||
* **Default value:** ``PRESTO_NATIVE`` | ||
|
||
This setting specifies the hash function type used for bucketing in CTE materialization. | |
|
||
Use the ``hive.bucket_function_type_for_cte_materialization`` session property to set on a per-query basis. | ||
|
||
|
||
``query.max-written-intermediate-bytes`` | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
* **Type:** ``DataSize`` | ||
* **Default value:** ``2TB`` | ||
|
||
This setting defines a cap on the amount of data that can be written during CTE Materialization. If a query exceeds this limit, it will fail. | ||
|
||
Use the ``query_max_written_intermediate_bytes`` session property to set on a per-query basis. | ||
|
||
|
||
How to Participate in Development | ||
--------------------------------- | ||
|
||
List of issues: https://github.com/prestodb/presto/labels/cte_materialization | |
|
||
|
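The ``HEURISTIC`` and ``HEURISTIC_COMPLEX_QUERIES_ONLY`` strategies described above can be modelled with a short sketch. This is a simplified illustration, not Presto's planner code: the function name ``ctes_to_materialize`` and its parameters are hypothetical, the query is reduced to a flat list of CTE references, and the "earliest parent" greedy selection is deliberately ignored.

```python
from collections import Counter


def ctes_to_materialize(cte_references, strategy="HEURISTIC", threshold=4,
                        complex_ctes=frozenset()):
    """Return the CTE names a simplified model of each strategy would materialize.

    cte_references: one entry per reference to a CTE in the query (hypothetical input).
    threshold: stands in for cte-heuristic-replication-threshold (default 4).
    complex_ctes: names of CTEs assumed to contain a join or an aggregation.
    """
    counts = Counter(cte_references)
    if strategy == "NONE":
        return set()
    if strategy == "ALL":
        # Every CTE that is referenced at least once.
        return set(counts)
    if strategy == "HEURISTIC":
        return {name for name, n in counts.items() if n >= threshold}
    if strategy == "HEURISTIC_COMPLEX_QUERIES_ONLY":
        return {name for name, n in counts.items()
                if n >= threshold and name in complex_ctes}
    raise ValueError(f"unknown strategy: {strategy}")
```

With the default threshold of 4, a CTE referenced four times qualifies under ``HEURISTIC``, while one referenced twice does not.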
17 changes: 17 additions & 0 deletions
17
website/static/docs/0.288.1/_sources/admin/dist-sort.rst.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
================ | ||
Distributed sort | ||
================ | ||
|
||
Distributed sort allows sorting of data that exceeds ``query.max-memory-per-node``. | |
It is enabled with the ``distributed_sort`` session property or the | |
``distributed-sort`` configuration property set in | |
``etc/config.properties`` of the coordinator. Distributed sort is enabled by | |
default. | |
|
||
When distributed sort is enabled, the sort operator executes in parallel on multiple | |
nodes in the cluster. Partially sorted data from each Presto worker node is then streamed | |
to a single worker node for a final merge. This technique makes it possible to use the memory of multiple | |
Presto worker nodes for sorting. The primary purpose of distributed sort is to allow sorting | |
of data sets that do not fit into the memory of a single node. A performance improvement | |
can be expected, but it does not scale linearly with the number of nodes because the | |
data needs to be merged by a single node. |
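The two phases described above, a parallel partial sort followed by a single-node merge, can be sketched as follows. This is a minimal model rather than Presto's implementation: the function ``distributed_sort`` and the round-robin split across workers are assumptions made for illustration.

```python
import heapq


def distributed_sort(rows, worker_count):
    """Model of distributed sort: sort per worker, then merge on one node."""
    # Spread the rows across workers round-robin (an assumption; the real
    # distribution of rows to workers is decided by the engine).
    partitions = [rows[i::worker_count] for i in range(worker_count)]
    # Each worker sorts only its own partition, using only its own memory.
    sorted_runs = [sorted(partition) for partition in partitions]
    # A single node streams the sorted runs and merges them lazily.
    return list(heapq.merge(*sorted_runs))
```

The final ``heapq.merge`` step runs on one node, which is why the speedup does not grow linearly with the number of workers.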
59 changes: 59 additions & 0 deletions
59
website/static/docs/0.288.1/_sources/admin/exchange-materialization.rst.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
======================== | ||
Exchange Materialization | ||
======================== | ||
|
||
Presto allows exchange materialization to support memory-intensive queries. | |
This mechanism brings MapReduce-style execution to Presto's MPP architecture runtime, | ||
and can be applied together with :doc:`/admin/spill`. | ||
|
||
Introduction | ||
------------ | ||
|
||
As with other MPP databases, Presto leverages RPC shuffle to achieve efficient and | ||
low-latency query execution for join and aggregation. However, RPC shuffle | ||
also requires all the producers and consumers to be executed concurrently until the | ||
query is finished. | ||
|
||
To illustrate this, consider the following aggregation query: | |
|
||
.. code-block:: sql | |
|
||
    SELECT custkey, SUM(totalprice) | |
    FROM orders | |
    GROUP BY custkey | |
|
||
The following figure demonstrates how this query executes in Presto classic mode: | |
|
||
.. figure:: ../images/rpc_shuffle_execution.png | ||
    :align: center | |
|
||
With exchange materialization, the intermediate shuffle data is written to disk (currently, | ||
it is always a temporary Hive bucketed table). This opens the opportunity for flexible scheduling policies | ||
on the aggregation side, as only a subset of aggregation data needs to be held in memory at the | ||
same time -- this execution strategy is called "grouped execution" in Presto. | ||
|
||
.. figure:: ../images/materialized_shuffle_execution.png | ||
    :align: center | |
|
||
Using Exchange Materialization | ||
------------------------------ | ||
|
||
Exchange materialization can be enabled on a per-query basis by setting three session properties, | |
``exchange_materialization_strategy``, ``partitioning_provider_catalog``, and ``hash_partition_count``: | |
|
||
.. code-block:: sql | |
|
||
    SET SESSION exchange_materialization_strategy='ALL'; | |
    -- Set partitioning_provider_catalog to the Hive connector catalog | |
    SET SESSION partitioning_provider_catalog='hive'; | |
    -- We recommend setting hash_partition_count to 5x-10x the cluster size | |
    -- when exchange materialization is enabled. | |
    SET SESSION hash_partition_count = 4096; | |
|
||
To make exchange materialization easy for users to adopt, the admin can use :doc:`/admin/session-property-managers` | |
to set the session properties automatically based on client tags. The example in :doc:`/admin/session-property-managers` | |
demonstrates how to automatically enable exchange materialization for queries with the ``high_mem_etl`` tag. | |
|
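The idea behind grouped execution, as described in the introduction above, can be illustrated with the ``custkey`` aggregation query. The sketch below is a conceptual model only: ``materialize_exchange`` and ``grouped_aggregation`` are hypothetical names, and an in-memory dictionary stands in for the temporary bucketed Hive table.

```python
from collections import defaultdict


def materialize_exchange(rows, bucket_count):
    """Write (custkey, totalprice) rows into buckets chosen by hashing custkey."""
    buckets = defaultdict(list)
    for custkey, totalprice in rows:
        buckets[hash(custkey) % bucket_count].append((custkey, totalprice))
    return buckets


def grouped_aggregation(buckets):
    """Aggregate one bucket at a time; only that bucket's groups are held in memory."""
    for bucket_id in sorted(buckets):
        sums = defaultdict(float)
        for custkey, totalprice in buckets[bucket_id]:
            sums[custkey] += totalprice
        # All rows for a given custkey live in one bucket, so each result is final.
        yield from sums.items()
```

Because every row for a given ``custkey`` lands in the same bucket, each bucket can be aggregated independently and its memory released before the next bucket starts.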
97 changes: 97 additions & 0 deletions
97
website/static/docs/0.288.1/_sources/admin/function-namespace-managers.rst.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,97 @@ | ||
=========================== | ||
Function Namespace Managers | ||
=========================== | ||
|
||
.. warning:: | ||
|
||
    This is an experimental feature being actively developed. The way | |
    Function Namespace Managers are configured may change. | |
|
||
Function namespace managers support storing and retrieving SQL | ||
functions, allowing the Presto engine to perform actions such as | ||
creating, altering, and deleting functions. | |
|
||
A function namespace is in the format of ``catalog.schema`` (e.g. | ||
``example.test``). It can be thought of as a schema for storing | ||
functions. However, it is not a full-fledged schema as it does not | |
support storing tables and views, but only functions. | ||
|
||
Each Presto function, whether built-in or user-defined, resides in | ||
a function namespace. All built-in functions reside in the | ||
``presto.default`` function namespace. The qualified function name of | ||
a function is the function namespace in which it resides followed by | |
its function name (e.g. ``example.test.func``). Built-in functions can | |
be referenced in queries with their function namespaces omitted, while | |
user-defined functions need to be referenced by their qualified function | |
names. A function is uniquely identified by its qualified function name | |
and parameter type list. | ||
|
||
Each function namespace manager binds to a catalog name and manages all | ||
functions within that catalog. Using the catalog name of an existing | ||
connector is discouraged, as the behavior is neither defined nor tested, | |
and will be disallowed in the future. | ||
|
||
Currently, those catalog names do not correspond to real catalogs. | ||
They cannot be specified as the catalog in a session, nor do they | ||
support :doc:`/sql/create-schema`, :doc:`/sql/alter-schema`, | ||
:doc:`/sql/drop-schema`, or :doc:`/sql/show-schemas`. Instead, | ||
namespaces can be added using the methods described below. | ||
|
||
|
||
Configuration | ||
------------- | ||
|
||
Presto currently stores all function namespace manager related | ||
information in MySQL. | ||
|
||
To instantiate a MySQL-based function namespace manager that manages | |
catalog ``example``, an administrator first needs a running MySQL | |
server. Assuming the MySQL server can be reached at ``example.net:3306``, | |
add a file ``etc/function-namespace/example.properties`` with the | |
following contents:: | |
|
||
    function-namespace-manager.name=mysql | |
    database-url=jdbc:mysql://example.net:3306/database?user=root&password=password | |
    function-namespaces-table-name=example_function_namespaces | |
    functions-table-name=example_sql_functions | |
|
||
When Presto first starts with the above MySQL function namespace | ||
manager configuration, two MySQL tables will be created if they do | ||
not exist. | ||
|
||
- ``example_function_namespaces`` stores function namespaces of | ||
the catalog ``example``. | ||
- ``example_sql_functions`` stores SQL-invoked functions of the | ||
catalog ``example``. | ||
|
||
Multiple function namespace managers can be instantiated by placing | ||
multiple properties files under ``etc/function-namespace``. They | ||
may be configured to use the same tables. If so, each manager will | ||
only create and interact with entries of the catalog to which it binds. | ||
|
||
To create a new function namespace, insert into the | ||
``example_function_namespaces`` table:: | ||
|
||
    INSERT INTO example_function_namespaces (catalog_name, schema_name) | |
    VALUES('example', 'test'); | |
|
||
|
||
Configuration Reference | ||
----------------------- | ||
|
||
``function-namespace-manager.name`` is the type of the function namespace manager to instantiate. Currently, only ``mysql`` is supported. | ||
|
||
The following table lists all configuration properties supported by the MySQL function namespace manager. | ||
|
||
=========================================== ================================================================================================== | |
Name                                        Description | |
=========================================== ================================================================================================== | |
``database-url``                            The URL of the MySQL database used by the MySQL function namespace manager. | |
``function-namespaces-table-name``          The name of the table that stores all the function namespaces managed by this manager. | |
``functions-table-name``                    The name of the table that stores all the functions managed by this manager. | |
=========================================== ================================================================================================== | |
|
||
See Also | ||
-------- | ||
|
||
:doc:`../sql/create-function`, :doc:`../sql/alter-function`, :doc:`../sql/drop-function`, :doc:`../sql/show-functions` |
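The naming rules described earlier (a ``catalog.schema`` namespace followed by the function name, with built-in functions living in ``presto.default``) can be illustrated with a small sketch. It is not how Presto resolves functions internally: ``QualifiedFunctionName`` and ``parse_function_name`` are hypothetical names, and treating every unqualified name as a built-in is a simplifying assumption.

```python
from typing import NamedTuple


class QualifiedFunctionName(NamedTuple):
    catalog: str
    schema: str
    function: str

    @property
    def namespace(self):
        # A function namespace has the form catalog.schema.
        return f"{self.catalog}.{self.schema}"


def parse_function_name(name):
    """Split a function reference into catalog, schema, and function name."""
    parts = name.split(".")
    if len(parts) == 1:
        # Assumption: an unqualified name refers to a built-in function,
        # which resides in the presto.default namespace.
        return QualifiedFunctionName("presto", "default", parts[0])
    if len(parts) == 3:
        return QualifiedFunctionName(*parts)
    raise ValueError(f"expected 'func' or 'catalog.schema.func', got {name!r}")
```

For example, ``parse_function_name("example.test.func").namespace`` yields ``example.test``, the namespace that must exist in the ``example_function_namespaces`` table before the function can be created.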