forked from elastic/rally-tracks
Added readme for so_semantic_text track
3 changed files with 109 additions and 0 deletions.
@@ -0,0 +1,60 @@
## StackOverflow semantic_text track

This track benchmarks the performance of the `semantic_text` field type and the `semantic` query.
It also enables comparison with existing approaches to inference generation and querying,
in particular ML inference pipelines and the `text_expansion` query.

The corpus is the same as that used by the `so` track; see `so/README.md` for more info about it.

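To make the two ingest approaches being compared more concrete, here is a minimal sketch of the Elasticsearch request bodies each one involves, written as Python dicts. It assumes an Elasticsearch version that supports the `semantic_text` field type and an existing inference endpoint; the field names `body_tokens` and `body_semantic` and the endpoint ID `elser-endpoint` are hypothetical placeholders, not the track's actual mappings.

```python
from typing import Any, Dict, Optional, Tuple


def index_setup(
    use_pipelines: bool,
    model_id: str = ".elser_model_2",
    inference_id: str = "elser-endpoint",  # hypothetical inference endpoint ID
) -> Tuple[Dict[str, Any], Optional[Dict[str, Any]]]:
    """Return (mappings, ingest_pipeline) for either ingest approach.

    All field names are hypothetical placeholders.
    """
    if use_pipelines:
        # ML inference pipeline: an ingest processor runs ELSER on `body`
        # and writes the sparse tokens into a separate field.
        mappings = {"properties": {
            "body": {"type": "text"},
            "body_tokens": {"type": "sparse_vector"},
        }}
        pipeline = {"processors": [{"inference": {
            "model_id": model_id,
            "input_output": [{"input_field": "body", "output_field": "body_tokens"}],
        }}]}
        return mappings, pipeline
    # semantic_text: inference is driven by the field mapping itself,
    # so no ingest pipeline is required.
    mappings = {"properties": {
        "body": {"type": "text"},
        "body_semantic": {"type": "semantic_text", "inference_id": inference_id},
    }}
    return mappings, None
```

The design point the track measures is visible here: with `semantic_text`, the mapping alone triggers inference, whereas the pipeline approach needs both a processor and a dedicated output field.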
### Generating the query set

Because the `so` track does not include queries, the queries for this track were generated synthetically
from the titles of questions in the corpus.
Only the first 1,000,000 questions were used, to keep the query file size manageable.

To regenerate the query set from scratch, download the `so` corpus using
[this link](https://rally-tracks.elastic.co/so/posts.json.bz2) and run the query generation script:

```shell
./_tools/generate_queries.py -c 1000000 <path_to_posts_file> queries.txt.bz2
```

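The core of the title-based query generation described above can be sketched as follows. This is not the actual `generate_queries.py`; it is a hypothetical reimplementation that assumes each line of `posts.json.bz2` is one JSON document, that questions carry `"type": "question"`, and that the question text lives in a `title` field. Check the corpus schema before relying on these field names.

```python
import bz2
import json


def extract_titles(posts_path: str, queries_path: str, count: int) -> int:
    """Write the titles of the first `count` questions to a bz2 file, one per line.

    Assumes hypothetical `type` and `title` fields on each JSON document.
    Returns the number of queries written.
    """
    written = 0
    with bz2.open(posts_path, "rt", encoding="utf-8") as posts, \
            bz2.open(queries_path, "wt", encoding="utf-8") as queries:
        for line in posts:
            if written >= count:
                break
            if not line.strip():
                continue  # skip blank lines
            doc = json.loads(line)
            # Assumption: only questions have titles worth using as queries.
            if doc.get("type") != "question" or not doc.get("title"):
                continue
            # Collapse internal whitespace so each query stays on a single line.
            queries.write(" ".join(doc["title"].split()) + "\n")
            written += 1
    return written
```

For example, `extract_titles("posts.json.bz2", "queries.txt.bz2", 1_000_000)` would mirror the `-c 1000000` invocation, under the schema assumptions above.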
### Parameters

This track allows overriding the following parameters with Rally 0.8.0+ using `--track-params`:

* `bulk_size` (default: 10)
* `bulk_indexing_clients` (default: 8): Number of clients that issue bulk indexing requests.
* `ingest_percentage` (default: 100): A number between 0 and 100 that defines how much of the document corpus should be ingested.
* `number_of_replicas` (default: 0)
* `number_of_shards` (default: 5)
* `source_enabled` (default: true): A boolean defining whether the `_source` field is stored in the index.
* `index_settings`: A list of index settings. Index settings defined elsewhere (e.g. `number_of_replicas`) need to be overridden explicitly.
* `cluster_health` (default: "green"): The minimum required cluster health.
* `error_level` (default: "non-fatal"): Available for bulk operations only; specifies the `ignore-response-error-level`.
* `post_ingest_sleep` (default: false): Whether to pause after ingest and before subsequent operations.
* `post_ingest_sleep_duration` (default: 30): Sleep duration in seconds.
* `use_pipelines` (default: false): A boolean defining whether inference is performed by an ML inference pipeline (`true`) or by a `semantic_text` field (`false`).
  This value also controls the query type used in the `semantic-search` operation.
  See the flow chart below for more details.
* `elser_model_id` (default: ".elser_model_2"): The name of the ELSER model to use.
* `num_allocations` (default: 1): The number of ELSER allocations to deploy.
* `num_threads` (default: 1): The number of threads to use per ELSER allocation.
* `calculate_body_vector` (default: false): A boolean defining whether inference should be performed on the `body` field during ingest.
* `enable_search` (default: false): A boolean defining whether the `semantic-search` operation is enabled.
* `semantic_search_clients` (default: 1): The number of clients that issue queries in the `semantic-search` operation.
* `semantic_search_time_period` (default: 300): The duration, in seconds, for which the `semantic-search` operation runs.
* `semantic_search_warmup_time_period` (default: 10): The warmup period, in seconds, for the `semantic-search` operation.
* `semantic_search_page_size` (default: 20): The number of results to fetch for each query.
* `use_nested_text_expansion` (default: false): A boolean defining whether a nested `text_expansion` query is used instead of a `semantic` query.
  It only takes effect when `use_pipelines` is `false`. See the flow chart below for more details.

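Besides comma-separated `key:value` pairs, Rally's `--track-params` also accepts the path to a JSON file, which is easier to manage when overriding several of the parameters above. The sketch below writes such a file and assembles the matching command line; the track path is a hypothetical local checkout, and depending on your setup you may also need Rally flags such as `--target-hosts` and `--pipeline=benchmark-only`.

```python
import json
import shlex
from typing import Any, Dict


def write_race_command(track_path: str, params: Dict[str, Any], params_file: str) -> str:
    """Write `params` to a JSON file and return an esrally command line that uses it."""
    with open(params_file, "w", encoding="utf-8") as f:
        json.dump(params, f, indent=2)
    # Quote paths so the command is safe to paste into a POSIX shell.
    return " ".join([
        "esrally",
        "race",
        "--track-path=" + shlex.quote(track_path),
        "--track-params=" + shlex.quote(params_file),
    ])


# Example: ingest with semantic_text, then run the semantic-search operation
# with four query clients and a larger ELSER deployment.
example_params = {
    "use_pipelines": False,
    "enable_search": True,
    "semantic_search_clients": 4,
    "num_allocations": 2,
    "num_threads": 2,
}
```

Keeping the parameters in a file also makes it straightforward to rerun the same configuration with only `use_pipelines` flipped when comparing the two approaches.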
When the `semantic-search` operation is enabled, the type of query executed is determined by the `use_pipelines` and `use_nested_text_expansion` parameters:

![Query type flow chart](query_flow_chart.png)

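The decision logic in the flow chart can be expressed as a small function that builds a search request body. This is a sketch rather than the track's actual operation: the field names (`body_tokens`, `body_semantic`) and the nested path for `semantic_text` chunks are hypothetical placeholders, while the `semantic`, `text_expansion`, and `nested` query shapes follow the standard Elasticsearch query DSL.

```python
from typing import Any, Dict


def semantic_search_body(text: str, use_pipelines: bool, use_nested_text_expansion: bool,
                         model_id: str = ".elser_model_2", size: int = 20) -> Dict[str, Any]:
    """Build a search request body following the query-selection flow chart.

    All field and path names are hypothetical placeholders.
    """
    if use_pipelines:
        # Pipeline ingest: query the pipeline's output field with text_expansion.
        # use_nested_text_expansion is not consulted on this branch.
        query = {"text_expansion": {"body_tokens": {"model_id": model_id, "model_text": text}}}
    elif use_nested_text_expansion:
        # semantic_text ingest, queried with text_expansion wrapped in a nested
        # query over the stored chunks (the path is an assumption).
        path = "body_semantic.inference.chunks"
        query = {"nested": {"path": path, "query": {
            "text_expansion": {path + ".embeddings": {"model_id": model_id, "model_text": text}}
        }}}
    else:
        # semantic_text ingest, queried with the semantic query.
        query = {"semantic": {"field": "body_semantic", "query": text}}
    # `size` plays the role of the semantic_search_page_size parameter.
    return {"size": size, "query": query}
```

Note that `use_nested_text_expansion` only matters on the `semantic_text` branch, matching the chart.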
### License

We use the same license for the data as the original data: [CC BY-SA 3.0](http://creativecommons.org/licenses/by-sa/3.0/)

@@ -0,0 +1,49 @@
<mxfile host="app.diagrams.net" modified="2024-05-08T18:07:43.482Z" agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36" etag="MGauHMRWr4ZgbiEgyRJX" version="24.3.1" type="device">
  <diagram name="Page-1" id="V41hylW0FixsCSBRaKah">
    <mxGraphModel dx="745" dy="1060" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="850" pageHeight="1100" background="#ffffff" math="0" shadow="0">
      <root>
        <mxCell id="0" />
        <mxCell id="1" parent="0" />
        <mxCell id="lYxadVdlI73aAsubxVjZ-5" value="true" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;strokeColor=#000000;strokeWidth=2;fontColor=#000000;labelBackgroundColor=none;fontSize=12;" parent="1" source="lYxadVdlI73aAsubxVjZ-2" target="lYxadVdlI73aAsubxVjZ-4" edge="1">
          <mxGeometry x="-0.0108" y="10" relative="1" as="geometry">
            <mxPoint as="offset" />
          </mxGeometry>
        </mxCell>
        <mxCell id="lYxadVdlI73aAsubxVjZ-7" value="false" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.5;entryY=0;entryDx=0;entryDy=0;strokeColor=#000000;strokeWidth=2;fontColor=#000000;labelBackgroundColor=none;fontSize=12;" parent="1" source="lYxadVdlI73aAsubxVjZ-2" target="lYxadVdlI73aAsubxVjZ-6" edge="1">
          <mxGeometry x="-0.0092" y="-10" relative="1" as="geometry">
            <Array as="points">
              <mxPoint x="355" y="140" />
              <mxPoint x="198" y="140" />
            </Array>
            <mxPoint as="offset" />
          </mxGeometry>
        </mxCell>
        <mxCell id="lYxadVdlI73aAsubxVjZ-2" value="use_pipelines" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#dae8fc;strokeColor=#6c8ebf;" parent="1" vertex="1">
          <mxGeometry x="295" y="30" width="120" height="60" as="geometry" />
        </mxCell>
        <mxCell id="lYxadVdlI73aAsubxVjZ-4" value="text_expansion&lt;div&gt;query&lt;/div&gt;" style="rhombus;whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;" parent="1" vertex="1">
          <mxGeometry x="415" y="190" width="145" height="90" as="geometry" />
        </mxCell>
        <mxCell id="2SGeM3FuVzMsr3sRwzxZ-1" value="true" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;strokeColor=#000000;strokeWidth=2;fontColor=#000000;labelBackgroundColor=none;fontSize=12;" edge="1" parent="1" source="lYxadVdlI73aAsubxVjZ-6" target="lYxadVdlI73aAsubxVjZ-8">
          <mxGeometry x="-0.0108" y="18" relative="1" as="geometry">
            <mxPoint y="1" as="offset" />
          </mxGeometry>
        </mxCell>
        <mxCell id="2SGeM3FuVzMsr3sRwzxZ-3" value="false" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.5;entryY=0;entryDx=0;entryDy=0;strokeColor=#000000;strokeWidth=2;fontColor=#000000;labelBackgroundColor=none;fontSize=12;" edge="1" parent="1" source="lYxadVdlI73aAsubxVjZ-6" target="2SGeM3FuVzMsr3sRwzxZ-2">
          <mxGeometry x="0.0602" y="-17" relative="1" as="geometry">
            <mxPoint as="offset" />
          </mxGeometry>
        </mxCell>
        <mxCell id="lYxadVdlI73aAsubxVjZ-6" value="use_nested_text_expansion" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#dae8fc;strokeColor=#6c8ebf;" parent="1" vertex="1">
          <mxGeometry x="100" y="205" width="195" height="60" as="geometry" />
        </mxCell>
        <mxCell id="lYxadVdlI73aAsubxVjZ-8" value="nested&lt;div&gt;text_expansion&lt;/div&gt;&lt;div&gt;query&lt;/div&gt;" style="rhombus;whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;" parent="1" vertex="1">
          <mxGeometry x="230" y="370" width="150" height="80" as="geometry" />
        </mxCell>
        <mxCell id="2SGeM3FuVzMsr3sRwzxZ-2" value="semantic&lt;div&gt;query&lt;/div&gt;" style="rhombus;whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="1">
          <mxGeometry x="20" y="370" width="150" height="80" as="geometry" />
        </mxCell>
      </root>
    </mxGraphModel>
  </diagram>
</mxfile>