feat: removed legacy sql query
rafaelkallis committed Sep 1, 2020
1 parent 99469c3 commit bdc12ce
Showing 1 changed file with 2 additions and 26 deletions.
28 changes: 2 additions & 26 deletions README.md
@@ -111,10 +111,10 @@ npm run benchmark
Datasets can be downloaded either using `npm run dataset:balanced` or `npm run dataset:unbalanced`.
The datasets were generated using github archive's which can be accessed through google [BigQuery](https://bigquery.cloud.google.com).

-Add the query below to your BigQuery console and adjust if needed (e.g., add `__label__` prefix to labels, etc.).
+Add the query below to your BigQuery console and adjust if needed (e.g., add `__label__` prefix to labels, truncate issues to create a balanced dataset, etc.).

```sql
-- unbalanced dataset

SELECT
label,
@@ -135,30 +135,6 @@ WHERE
AND body != 'null';
```

-```sql
--- balanced dataset
-
-SELECT
-label, CONCAT(title, ' ', REGEXP_REPLACE(body, '(\r|\n|\r\n)',' '))
-FROM (
-SELECT
-LOWER(JSON_EXTRACT_SCALAR(payload, '$.issue.labels[0].name')) AS label,
-JSON_EXTRACT_SCALAR(payload, '$.issue.title') AS title,
-JSON_EXTRACT_SCALAR(payload, '$.issue.body') AS body
-FROM
-[githubarchive:day.20180201],
-[githubarchive:day.20180202],
-[githubarchive:day.20180203],
-[githubarchive:day.20180204],
-[githubarchive:day.20180205]
-WHERE
-type = 'IssuesEvent'
-AND JSON_EXTRACT_SCALAR(payload, '$.action') = 'closed' )
-WHERE
-(label = 'bug' OR label = 'enhancement' OR label = 'question')
-AND body != 'null';
-```
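
With the dedicated balanced query gone, the README asks readers to adjust the output of the remaining query themselves (add the `__label__` prefix, truncate issues to balance the classes). A minimal Python sketch of that post-processing step could look like the following. It is not part of the repository: the function name `balance_and_prefix` and the assumption that the query result has already been parsed into `(label, text)` pairs are hypothetical, and truncating every class to the size of the smallest one is only one possible way to balance.

```python
from collections import defaultdict


def balance_and_prefix(rows, prefix="__label__"):
    """Balance classes by truncation and prepend a label prefix.

    rows: iterable of (label, text) pairs, assumed to come from the
    exported query result (hypothetical input format).
    Returns a list of "<prefix><label> <text>" lines.
    """
    # Group the issue texts by label, preserving input order.
    by_label = defaultdict(list)
    for label, text in rows:
        by_label[label].append(text)

    if not by_label:
        return []

    # Truncate every class to the size of the smallest class.
    limit = min(len(texts) for texts in by_label.values())

    lines = []
    for label in sorted(by_label):
        for text in by_label[label][:limit]:
            lines.append(f"{prefix}{label} {text}")
    return lines


if __name__ == "__main__":
    sample = [
        ("bug", "app crashes on start"),
        ("bug", "button does nothing"),
        ("question", "how do I configure this"),
    ]
    for line in balance_and_prefix(sample):
        print(line)
```

Truncation keeps the first rows of each class in input order; shuffle the rows beforehand if a random sample per class is preferred.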

#### run serverless app:

You need a `.env` file in order to run the github app.