docs(blog): classification metrics on the backend #10501

Open · wants to merge 13 commits into base: main
27 changes: 27 additions & 0 deletions docs/posts/classification-metrics-on-the-backend/index.qmd
@@ -123,6 +123,7 @@

```{python}
cm = (
    t.group_by("outcome")
    .agg(counted=_.count())
    .pivot_wider(names_from="outcome", values_from="counted")
    .select("TP", "FP", "FN", "TN")
)

cm
```
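For readers who want to see the bookkeeping outside of Ibis, here is a minimal plain-Python sketch of the same cell labeling. The `actual` and `prediction` lists are made-up stand-ins for the table's columns, and `outcome` mirrors the case expression the post uses to tag each row before pivoting:

```python
from collections import Counter

# Hypothetical 0/1 labels standing in for t.actual and t.prediction.
actual = [1, 1, 0, 0, 1, 0]
prediction = [1, 0, 1, 0, 1, 0]


def outcome(a, p):
    """Label one row with its confusion-matrix cell."""
    if a == 1 and p == 1:
        return "TP"
    if a == 0 and p == 1:
        return "FP"
    if a == 1 and p == 0:
        return "FN"
    return "TN"


# Group-by-and-count, the plain-Python analogue of the pivoted table.
cm = Counter(outcome(a, p) for a, p in zip(actual, prediction))
print(cm)  # → Counter({'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1})
```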
@@ -172,6 +173,32 @@ metrics = cm.select(
metrics
```

## A more efficient approach

In the illustrative example above, we used a case expression and pivoted the data to
demonstrate where each value falls in the confusion matrix, then performed our metric
calculations on the pivoted data. We can skip the pivot entirely by using column
aggregation.

```{python}
tp = (t.actual * t.prediction).sum()
fp = t.prediction.sum() - tp
fn = t.actual.sum() - tp
tn = t.actual.count() - tp - fp - fn

accuracy_expr = (tp + tn) / (tp + tn + fp + fn)
precision_expr = tp / (tp + fp)
recall_expr = tp / (tp + fn)
f1_score_expr = 2 * (precision_expr * recall_expr) / (precision_expr + recall_expr)

t.select(
    accuracy=accuracy_expr,
    precision=precision_expr,
    recall=recall_expr,
    f1_score=f1_score_expr,
).limit(1)
```

> **@IndexSeek** (Member, Author · Nov 16, 2024): Is there a better way we could render these results? I was fiddling around with `print(f"{accuracy_expr=}, {precision_expr=}, {recall_expr=}, {f1_score_expr=}")`, but it wasn't rendering nicely.
>
> **Contributor:** `.execute()` should work (or `.to_pyarrow().as_py()` or some of the other `.to_*` export methods).
>
> **@IndexSeek:** I ended up using `to_pyarrow().as_py()`. I suspect some readers may like to see that we can bring this to a Python object.
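The arithmetic behind those aggregations is easy to verify by hand: for 0/1 labels, `(actual * prediction).sum()` counts rows where both columns are 1, so subtracting it from each column's sum isolates the off-diagonal cells. A minimal sketch with hypothetical label lists (the same made-up data as above, not from the post):

```python
# Hypothetical 0/1 labels standing in for t.actual and t.prediction.
actual = [1, 1, 0, 0, 1, 0]
prediction = [1, 0, 1, 0, 1, 0]

tp = sum(a * p for a, p in zip(actual, prediction))  # both 1
fp = sum(prediction) - tp                            # predicted 1, actually 0
fn = sum(actual) - tp                                # actually 1, predicted 0
tn = len(actual) - tp - fp - fn                      # everything else

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_score = 2 * (precision * recall) / (precision + recall)

print(tp, fp, fn, tn)                 # → 2 1 1 2
print(round(accuracy, 3), round(f1_score, 3))  # → 0.667 0.667
```

The four counts match the pivoted confusion matrix exactly, which is why the column-aggregation form can replace the pivot without changing any downstream metric.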

## Conclusion

By pushing the computation down to the backend, the performance is as powerful as the