Merge pull request #936 from OxalisCu/bulkinsert-support-csv
feat: importv2 support csv
shanghaikid authored Sep 12, 2024
2 parents 089edf4 + 6dab31f commit be23fb6
Showing 5 changed files with 53 additions and 3 deletions.
2 changes: 2 additions & 0 deletions API_Reference/milvus-restful/v2.4.x/v2/Import (v2)/Create.md
@@ -82,6 +82,8 @@ Possible response is similar to the following
| `files[][]` | __string__<br/>A list of file paths, relative to the root of your Milvus bucket on the MinIO instance shipped along with the Milvus instance. |
| `options` | __object__<br/>Bulk-import options. |
| `options.timeout` | __string__<br/>The timeout duration of the created import jobs. The value should be a positive number suffixed by __s__ (seconds), __m__ (minutes), or __h__ (hours). For example, _300s_, _1.5h_, and _1h45m_ are all valid values. |
| `options.sep` | __string__<br/>The delimiter used in the CSV file. The value must be a string of length 1 and defaults to ```","```. The following strings are not allowed: ```"\0"```, ```"\n"```, ```"\r"```, ```"""```. |
| `options.nullkey` | __string__<br/>The string that represents a null value. The value defaults to an empty string: ```""```. |
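
The constraints on `options.sep` listed above can be checked on the client side before a request is sent. The following is a minimal sketch under stated assumptions: `build_import_options` is a hypothetical helper, not part of the Milvus RESTful API or pymilvus, and it only mirrors the rules in the table.

```python
# Hypothetical client-side helper; not part of the Milvus RESTful API or pymilvus.
FORBIDDEN_SEPARATORS = {"\0", "\n", "\r", '"'}


def build_import_options(sep=",", nullkey="", timeout=None):
    """Return an `options` object for the body of a CSV import request."""
    if len(sep) != 1:
        raise ValueError("sep must be a string of length 1")
    if sep in FORBIDDEN_SEPARATORS:
        raise ValueError(f"sep {sep!r} is not allowed")
    options = {"sep": sep, "nullkey": nullkey}
    if timeout is not None:
        # The timeout string is passed through unchanged, for example "300s".
        options["timeout"] = timeout
    return options


# Tab-separated files in which the literal string NULL marks a null value.
options = build_import_options(sep="\t", nullkey="NULL", timeout="300s")
```

The resulting dict would be placed under the `options` key of the request body, next to `files`.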

## Response

3 changes: 3 additions & 0 deletions API_Reference/pymilvus/v2.4.x/DataImport/BulkFileType.md
@@ -13,6 +13,9 @@ Sets the file type to **JSON** (*.json*).
- **PARQUET** = 3
Sets the file type to [Parquet](https://parquet.apache.org/) (*.parquet*).

- **CSV** = 4
Sets the file type to **CSV** (*.csv*).

## Examples

```python
37 changes: 37 additions & 0 deletions API_Reference/pymilvus/v2.4.x/DataImport/Config.md
@@ -0,0 +1,37 @@
# Config

The configuration for the **CSV** format is a dict with two fields: **sep** and **nullkey**.

## Fields

- **sep** (*string*)

The delimiter used in the CSV file.

The value must be a string of length 1 and defaults to ```","```.

The following strings are not allowed: ```"\0"```, ```"\n"```, ```"\r"```, ```"""```.

- **nullkey** (*string*)

The string that represents a null value.

The value defaults to an empty string: ```""```.

## Examples

```python
from pathlib import Path

from pymilvus import LocalBulkWriter, BulkFileType

# `schema` and OUTPUT_PATH are assumed to be defined earlier.
local_writer = LocalBulkWriter(
    schema=schema,
    local_path=Path(OUTPUT_PATH).joinpath('csv'),
    segment_size=4*1024*1024,
    file_type=BulkFileType.CSV,
    # highlight-next
    config={
        "sep": "\t",
        "nullkey": "NULL"
    }
)
```
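
To illustrate what the two fields mean for the file content, the sketch below writes a few made-up rows with Python's standard `csv` module. It is a stand-in, not the BulkWriter implementation: the sample rows, the header row, and the use of `None` for missing values are assumptions made for this illustration.

```python
import csv
import io

# Stand-in for the dict passed as `config`; the values match the example above.
config = {"sep": "\t", "nullkey": "NULL"}

# Sample rows made up for illustration; None marks a missing value.
rows = [
    {"id": 1, "title": "first"},
    {"id": 2, "title": None},
]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=config["sep"], lineterminator="\n")
writer.writerow(["id", "title"])
for row in rows:
    # Replace missing values with the configured null marker.
    writer.writerow(
        [config["nullkey"] if value is None else value for value in row.values()]
    )

csv_text = buffer.getvalue()
```

Here `csv_text` holds three tab-separated lines, and the missing title in the second row appears as the literal string `NULL`.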
@@ -57,7 +57,7 @@ writer = LocalBulkWriter(

<p>The way <strong>BulkWriter</strong> segments your data varies with the target file type.</p>
<ul>
<li><strong>JSON_RB</strong> or <strong>Parquet</strong></li>
<li><strong>JSON_RB</strong>, <strong>Parquet</strong> or <strong>CSV</strong></li>
</ul>
<p>If the generated file exceeds the specified segment size, <strong>BulkWriter</strong> creates multiple files, each no larger than the segment size, and names them with sequence numbers.</p>
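
The size-based splitting described here can be sketched as follows. This is an illustration of the idea only, not the BulkWriter implementation: `split_into_segments` and the `1.csv`, `2.csv` naming scheme are hypothetical.

```python
def split_into_segments(lines, segment_size):
    """Group lines into chunks whose total UTF-8 size stays within segment_size bytes."""
    segments, current, current_size = [], [], 0
    for line in lines:
        size = len(line.encode("utf-8"))
        # Start a new chunk when the next line would push this one over the limit.
        if current and current_size + size > segment_size:
            segments.append(current)
            current, current_size = [], 0
        # A single line larger than segment_size still gets a chunk of its own.
        current.append(line)
        current_size += size
    if current:
        segments.append(current)
    return segments


# Ten 9-byte lines with a 35-byte limit yield chunks of 3, 3, 3, and 1 lines.
lines = [f"row-{i:04d}\n" for i in range(10)]
segments = split_into_segments(lines, segment_size=35)
# Hypothetical naming scheme: 1.csv, 2.csv, ...
file_names = [f"{index}.csv" for index, _ in enumerate(segments, start=1)]
```
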
<ul>
@@ -73,7 +73,11 @@ writer = LocalBulkWriter(

The value defaults to **BulkFileType.NPY**.

Possible options are **BulkFileType.NPY**, **BulkFileType.JSON_RB** and **BulkFileType.PARQUET**.
Possible options are **BulkFileType.NPY**, **BulkFileType.JSON_RB**, **BulkFileType.PARQUET** and **BulkFileType.CSV**.

- **config** (*[Config](../Config.md)*) -

The format-specific configuration. Currently, only the **CSV** format is configurable.

**RETURN TYPE:**

@@ -62,7 +62,7 @@ writer = RemoteBulkWriter(

<p>The way <strong>BulkWriter</strong> segments your data varies with the target file type.</p>
<ul>
<li><strong>JSON_RB</strong> or <strong>Parquet</strong></li>
<li><strong>JSON_RB</strong>, <strong>Parquet</strong> or <strong>CSV</strong></li>
</ul>
<p>If the generated file exceeds the specified segment size, <strong>BulkWriter</strong> creates multiple files, each no larger than the segment size, and names them with sequence numbers.</p>
<ul>
@@ -78,6 +78,10 @@ writer = RemoteBulkWriter(

The value defaults to **BulkFileType.NPY**.

- **config** (*[Config](../Config.md)*) -

The format-specific configuration. Currently, only the **CSV** format is configurable.

**RETURN TYPE:**

*RemoteBulkWriter*