Merge pull request #936 from OxalisCu/bulkinsert-support-csv

feat: importv2 support csv
milvus-io · Sep 12, 2024 · be23fb6
2 parents 089edf4 + 6dab31f

Showing 5 changed files with 53 additions and 3 deletions.
diff --git a/API_Reference/milvus-restful/v2.4.x/v2/Import (v2)/Create.md b/API_Reference/milvus-restful/v2.4.x/v2/Import (v2)/Create.md
@@ -82,6 +82,8 @@ Possible response is similar to the following
 | `files[][]`  | __string__<br/>A list of file paths, relative to the root of your Milvus bucket on the MinIO instance shipped along with the Milvus instance.  |
 | `options` | __object__<br/>Bulk-import options. |
 | `options.timeout`  | __string__<br/>The timeout duration of the created import jobs. The value should be a positive number suffixed by __s__ (seconds), __m__ (minutes), and __h__(hours). For example, _300s_, _1.5h_, and _1h45_ are all valid values.  |
+| `options.sep` | __string__<br/>The delimiter used in the CSV file. The value must be a single-character string and defaults to ```","```. The following values are not allowed: ```"\0"```, ```"\n"```, ```"\r"```, and ```"""```. |
+| `options.nullkey` | __string__<br/>The string that represents a null value in the CSV file. The value defaults to an empty string (```""```). |
 
 ## Response
 

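The two new `options` fields can be illustrated with a small request-body builder. This is only a sketch: `build_import_options` is a hypothetical helper, the `collectionName` field, collection name, and file path are placeholder assumptions not shown in this diff, and the check simply mirrors the constraints stated in the table row for `options.sep`.

```python
import json

# Values the parameter table lists as not allowed for `options.sep`.
FORBIDDEN_SEPARATORS = {"\0", "\n", "\r", '"'}


def build_import_options(sep=",", nullkey="", timeout=None):
    """Build an `options` object for a CSV import request (hypothetical helper)."""
    if len(sep) != 1:
        raise ValueError("sep must be a string of length 1")
    if sep in FORBIDDEN_SEPARATORS:
        raise ValueError(f"sep {sep!r} is not allowed")
    options = {"sep": sep, "nullkey": nullkey}
    if timeout is not None:
        options["timeout"] = timeout
    return options


# Placeholder request body: the collection name and file path are assumptions.
payload = {
    "collectionName": "quick_setup",
    "files": [["data/1.csv"]],
    "options": build_import_options(sep="\t", nullkey="NULL", timeout="300s"),
}
print(json.dumps(payload, indent=2))
```

Validating the delimiter on the client side is a design choice for the sketch only; the server remains the authority on which values it accepts.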
diff --git a/API_Reference/pymilvus/v2.4.x/DataImport/BulkFileType.md b/API_Reference/pymilvus/v2.4.x/DataImport/BulkFileType.md
@@ -13,6 +13,9 @@ Sets the file type to **JSON** (*.json*).
 - **PARQUET** = 3
 Sets the file type to [Parquet](https://parquet.apache.org/) (*.parquet*).
 
+- **CSV** = 4
+Sets the file type to **CSV** (*.csv*).
+
 ## Examples
 
 ```python

diff --git a/API_Reference/pymilvus/v2.4.x/DataImport/Config.md b/API_Reference/pymilvus/v2.4.x/DataImport/Config.md
@@ -0,0 +1,37 @@
+# Config
+
+The configuration for the **CSV** format is a dict with two fields: **sep** and **nullkey**.
+
+## Fields
+
+- **sep** (*string*)
+
+The delimiter used in the CSV file.
+
+The value must be a single-character string and defaults to ```","```.
+
+The following values are not allowed: ```"\0"```, ```"\n"```, ```"\r"```, and ```"""```.
+
+- **nullkey** (*string*)
+
+The string that represents a null value.
+
+The value defaults to an empty string: ```""```.
+
+## Examples
+
+```python
+from pathlib import Path
+
+from pymilvus import LocalBulkWriter, BulkFileType
+
+# `schema` (a CollectionSchema) and OUTPUT_PATH are assumed to be defined earlier.
+local_writer = LocalBulkWriter(
+    schema=schema,
+    local_path=Path(OUTPUT_PATH).joinpath('csv'),
+    segment_size=4*1024*1024,
+    file_type=BulkFileType.CSV,
+    # highlight-next
+    config={"sep": "\t", "nullkey": "NULL"},
+)
+```
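To make the meaning of **sep** and **nullkey** concrete, the following sketch writes and reads a small CSV with Python's standard `csv` module, using a tab delimiter and `NULL` as the null marker. It is an illustration of the two settings only and is not how pymilvus or Milvus implements them; the helper names and the sample rows are hypothetical.

```python
import csv
import io

SEP = "\t"        # same value as "sep" in the example config
NULLKEY = "NULL"  # same value as "nullkey" in the example config

# Hypothetical sample rows; None stands for a null field value.
rows = [
    {"id": 1, "title": "first", "note": None},
    {"id": 2, "title": "second", "note": "has a note"},
]


def write_csv(rows, sep, nullkey):
    """Serialize rows to CSV text, writing nullkey in place of None."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=sep, lineterminator="\n")
    header = list(rows[0])
    writer.writerow(header)
    for row in rows:
        writer.writerow([nullkey if row[name] is None else row[name] for name in header])
    return buffer.getvalue()


def read_csv(text, sep, nullkey):
    """Parse CSV text, mapping fields equal to nullkey back to None."""
    reader = csv.DictReader(io.StringIO(text), delimiter=sep)
    return [
        {name: (None if value == nullkey else value) for name, value in record.items()}
        for record in reader
    ]


text = write_csv(rows, SEP, NULLKEY)
parsed = read_csv(text, SEP, NULLKEY)
print(text)
print(parsed)
```

Note that the parsed values come back as strings, because CSV carries no type information; a non-empty **nullkey** such as `NULL` keeps genuinely empty strings distinguishable from null values.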
diff --git a/API_Reference/pymilvus/v2.4.x/DataImport/LocalBulkWriter/LocalBulkWriter.md b/API_Reference/pymilvus/v2.4.x/DataImport/LocalBulkWriter/LocalBulkWriter.md
@@ -57,7 +57,7 @@ writer = LocalBulkWriter(
 
     <p>The way <strong>BulkWriter</strong> segments your data varies with the target file type.</p>
     <ul>
-    <li><strong>JSON_RB</strong> or <strong>Parquet</strong></li>
+    <li><strong>JSON_RB</strong>, <strong>Parquet</strong> or <strong>CSV</strong></li>
     </ul>
     <p>If the generated file exceeds the specified segment size, <strong>BulkWriter</strong> creates multiple files and names them in sequence numbers, each no larger than the segment size.</p>
     <ul>
@@ -73,7 +73,11 @@ writer = LocalBulkWriter(
 
     The value defaults to **BulkFileType.NPY**. 
 
-    Possible options are **BulkFileType.NPY**, **BulkFileType.JSON_RB** and **BulkFileType.PARQUET**.
+    Possible options are **BulkFileType.NPY**, **BulkFileType.JSON_RB**, **BulkFileType.PARQUET** and **BulkFileType.CSV**.
+
+- **config** (*[Config](../Config.md)*) -
+
+    The format-specific configuration. Currently, only the **CSV** format is configurable.
 
 **RETURN TYPE:**
 

diff --git a/API_Reference/pymilvus/v2.4.x/DataImport/RemoteBulkWriter/RemoteBulkWriter.md b/API_Reference/pymilvus/v2.4.x/DataImport/RemoteBulkWriter/RemoteBulkWriter.md
@@ -62,7 +62,7 @@ writer = RemoteBulkWriter(
 
     <p>The way <strong>BulkWriter</strong> segments your data varies with the target file type.</p>
     <ul>
-    <li><strong>JSON_RB</strong> or <strong>Parquet</strong></li>
+    <li><strong>JSON_RB</strong>, <strong>Parquet</strong> or <strong>CSV</strong></li>
     </ul>
     <p>If the generated file exceeds the specified segment size, <strong>BulkWriter</strong> creates multiple files and names them in sequence numbers, each no larger than the segment size.</p>
     <ul>
@@ -78,6 +78,10 @@ writer = RemoteBulkWriter(
 
     The value defaults to **BulkFileType.NPY**. 
 
+- **config** (*[Config](../Config.md)*) -
+
+    The format-specific configuration. Currently, only the **CSV** format is configurable.
+
 **RETURN TYPE:**
 
 *RemoteBulkWriter*