From 6dab31ffedcddeb417f123dbe69b0a9598ca6884 Mon Sep 17 00:00:00 2001
From: OxalisCu <2127298698@qq.com>
Date: Tue, 10 Sep 2024 15:39:35 +0800
Subject: [PATCH] feat: importv2 support csv

Signed-off-by: OxalisCu <2127298698@qq.com>
---
 .../v2.4.x/v2/Import (v2)/Create.md      |  2 +
 .../v2.4.x/DataImport/BulkFileType.md    |  3 ++
 .../pymilvus/v2.4.x/DataImport/Config.md | 37 +++++++++++++++++++
 .../LocalBulkWriter/LocalBulkWriter.md   |  8 +++-
 .../RemoteBulkWriter/RemoteBulkWriter.md |  6 ++-
 5 files changed, 53 insertions(+), 3 deletions(-)
 create mode 100644 API_Reference/pymilvus/v2.4.x/DataImport/Config.md

diff --git a/API_Reference/milvus-restful/v2.4.x/v2/Import (v2)/Create.md b/API_Reference/milvus-restful/v2.4.x/v2/Import (v2)/Create.md
index 3e7b272f4..931b4c364 100644
--- a/API_Reference/milvus-restful/v2.4.x/v2/Import (v2)/Create.md
+++ b/API_Reference/milvus-restful/v2.4.x/v2/Import (v2)/Create.md
@@ -82,6 +82,8 @@ Possible response is similar to the following
 | `files[][]` | __string__<br/>A list of file paths, relative to the root of your Milvus bucket on the MinIO instance shipped along with the Milvus instance. |
 | `options` | __object__<br/>Bulk-import options. |
 | `options.timeout` | __string__<br/>The timeout duration of the created import jobs. The value should be a positive number suffixed by __s__ (seconds), __m__ (minutes), and __h__(hours). For example, _300s_, _1.5h_, and _1h45_ are all valid values. |
+| `options.sep` | __string__<br/>The delimiter of the CSV file. The value must be a single-character string and defaults to ```","```. The following strings are not allowed: ```"\0"```, ```"\n"```, ```"\r"```, ```"""```. |
+| `options.nullkey` | __string__<br/>A special string that represents a null value. The value defaults to an empty string: ```""```. |
 
 ## Response
 
diff --git a/API_Reference/pymilvus/v2.4.x/DataImport/BulkFileType.md b/API_Reference/pymilvus/v2.4.x/DataImport/BulkFileType.md
index 6e6610521..63cdb6377 100644
--- a/API_Reference/pymilvus/v2.4.x/DataImport/BulkFileType.md
+++ b/API_Reference/pymilvus/v2.4.x/DataImport/BulkFileType.md
@@ -13,6 +13,9 @@ Sets the file type to **JSON** (*.json*).
 - **PARQUET** = 3
 Sets the file type to [Parquet](https://parquet.apache.org/) (*.parquet*).
 
+- **CSV** = 4
+Sets the file type to **CSV** (*.csv*).
+
 ## Examples
 
 ```python
diff --git a/API_Reference/pymilvus/v2.4.x/DataImport/Config.md b/API_Reference/pymilvus/v2.4.x/DataImport/Config.md
new file mode 100644
index 000000000..2bed40227
--- /dev/null
+++ b/API_Reference/pymilvus/v2.4.x/DataImport/Config.md
@@ -0,0 +1,37 @@
+# Config
+
+The configuration for the **CSV** format is a dict with two fields: **sep** and **nullkey**.
+
+## Fields
+
+- **sep** (*string*)
+
+The delimiter of the CSV file.
+
+The value must be a single-character string and defaults to ```","```.
+
+The following strings are not allowed: ```"\0"```, ```"\n"```, ```"\r"```, ```"""```.
+
+- **nullkey** (*string*)
+
+A special string that represents a null value.
+
+The value defaults to an empty string: ```""```.
+
+## Examples
+
+```python
+from pymilvus import LocalBulkWriter, BulkFileType
+
+local_writer = LocalBulkWriter(
+    schema=schema,
+    local_path=Path(OUTPUT_PATH).joinpath('csv'),
+    segment_size=4*1024*1024,
+    file_type=BulkFileType.CSV,
+    # highlight-next
+    config={
+        "sep": "\t",
+        "nullkey": "NULL"
+    }
+)
+```
\ No newline at end of file
diff --git a/API_Reference/pymilvus/v2.4.x/DataImport/LocalBulkWriter/LocalBulkWriter.md b/API_Reference/pymilvus/v2.4.x/DataImport/LocalBulkWriter/LocalBulkWriter.md
index 96f3971bb..145f1e27a 100644
--- a/API_Reference/pymilvus/v2.4.x/DataImport/LocalBulkWriter/LocalBulkWriter.md
+++ b/API_Reference/pymilvus/v2.4.x/DataImport/LocalBulkWriter/LocalBulkWriter.md
@@ -57,7 +57,7 @@ writer = LocalBulkWriter(

The way BulkWriter segments your data varies with the target file type.

If the generated file exceeds the specified segment size, BulkWriter creates multiple files and names them in sequence numbers, each no larger than the segment size.