Text in matplotlib svg #1097

Merged on Oct 7, 2024 (10 commits)

10 changes: 10 additions & 0 deletions CHANGES.rst
@@ -18,6 +18,16 @@ Major changes
Minor changes
-------------

* Display of labels in the plots of the TableReport has improved, especially
  for scripts other than the Latin alphabet:

  - before, some characters could be missing and replaced by empty boxes;
  - before, when the text was truncated, the ellipsis "…" could appear on the
    wrong side for right-to-left scripts.

  Moreover, when the text contains line breaks, it now appears all on one line.
  Note that this only affects the labels in the plots; the rest of the report
  did not have these problems.
  :pr:`1097` by :user:`Jérôme Dockès <jeromedockes>`.

* In the TableReport it is now possible, before clicking any of the cells, to
  reach the dataframe sample table and activate a cell with tab key navigation.
  :pr:`1101` by :user:`Jérôme Dockès <jeromedockes>`.
4 changes: 4 additions & 0 deletions skrub/_reporting/_data/templates/base.css
@@ -180,6 +180,10 @@ dd {
margin-top: var(--space-s);
}

.margin-t-m {
margin-top: var(--space-m);
}

.horizontal-scroll {
overflow-x: auto;
}
1 change: 1 addition & 0 deletions skrub/_reporting/_data/templates/column-summaries.css
@@ -77,6 +77,7 @@
flex-direction: column;
}


/* Grid of boxes with copybuttons for a column's most frequent values */

.copybutton-grid {
4 changes: 3 additions & 1 deletion skrub/_reporting/_data/templates/column-summary.html
@@ -79,7 +79,9 @@ <h3 class="margin-r-m">
 
 {% for plot_name in column.plot_names %}
 <div>
-{{ column[plot_name] | safe }}
+<div class="margin-t-m" data-manager="SvgAdjustedViewBox">
+{{ column[plot_name] | safe }}
+</div>
 {% if plot_name == "value_counts_plot" %}
 <details data-test="frequent-values-details">
 <summary>Most frequent values</summary>
137 changes: 137 additions & 0 deletions skrub/_reporting/_data/templates/report.js
@@ -759,6 +759,143 @@ if (customElements.get('skrub-table-report') === undefined) {
}
SkrubTableReport.register(Toggletip);

/*
In the matplotlib svg plots, the labels are stored as text (we want the
browser, rather than matplotlib, to choose the font & render the text, and
this also makes the plots smaller than letting matplotlib draw the
glyphs). As matplotlib may use a different font than the one eventually
chosen by the browser, it cannot compute the correct viewbox for the svg.

When the page loads, we render the svg plot and iterate over all its
children to compute the correct viewbox. We then adjust the svg element's
width and height (otherwise, if we set a wider viewbox without adjusting the
size, we effectively zoom out and details appear smaller).

In the default report view, all plots are hidden (they only show up if we
select a column or change the displayed tab panel). Thus when the page
loads they are not rendered. To force rendering the svg so that we get
correct bounding boxes for all the child elements, we clone it and insert
the clone in the DOM (but with absolute positioning and a big offset so it
is outside of the viewport and the user does not see it). We insert the
clone as a child of the #report element so that we know it is displayed
and uses the same font family and size as the actual figure we want to
resize. Once we have the viewbox we remove the clone from the DOM.
*/
class SvgAdjustedViewBox extends Manager {
constructor(elem, exchange) {
super(elem, exchange);
this.adjustViewBox();
}

computeViewBox(svg) {
try {
const {
width
} = svg.getBBox();
if (width === 0) {
return null;
}
} catch (e) {
return null;
}
let [xMin, yMin, xMax, yMax] = [null, null, null, null];
for (const child of svg.children) {
if (typeof child.getBBox !== 'function') {
continue;
}
const {
x,
y,
width,
height
} = child.getBBox();
if (width === 0 || height === 0){
continue;
}
if (xMin === null) {
xMin = x;
yMin = y;
xMax = x + width;
yMax = y + height;
continue;
}
xMin = Math.min(x, xMin);
yMin = Math.min(y, yMin);
xMax = Math.max(x + width, xMax);
yMax = Math.max(y + height, yMax);
}
if (xMin === null) {
return null;
}
return {
x: xMin,
y: yMin,
width: xMax - xMin,
height: yMax - yMin
};
}

/*
Adjust the svg element's width and height so that if we need to set a
wider viewbox, we get a bigger figure rather than zooming out while
keeping the figure size constant.
*/
adjustSize(svg, newViewBox, attribute) {
const match = svg.getAttribute(attribute).match(/^([0-9.]+)(.+)$/);
if (!match) {
return;
}
const size = Number(match[1]);
if (isNaN(size)) {
return;
}
const unit = match[2];
const scale = newViewBox[attribute] / svg.viewBox.baseVal[attribute];
const newSize = size * scale;
if (isNaN(newSize)) {
return;
}
svg.setAttribute(attribute, `${newSize}${unit}`);
}

adjustViewBox() {
const svg = this.elem.querySelector('svg');

// The svg is inside a div with {display: none} in its style. So it
// is not rendered and all bounding boxes will have 0 width and
// height. We insert a clone higher up the DOM below #report, which
// we know is displayed. To avoid the user seeing it flash we position
// it outside of the viewport. The column summary cards use the same
// font family & size as #report so the computed sizes will be the
// same as those of the actual svg when it is rendered.

const report = this.elem.getRootNode().getElementById('report');
const clone = svg.cloneNode(true);
clone.style.position = 'absolute';
clone.style.left = '-9999px';
clone.style.top = '-9999px';
// (visibility = 'hidden' still requires the size to be computed and
// thus the svg to be rendered.)
clone.style.visibility = 'hidden';
report.appendChild(clone);

try {
const viewBox = this.computeViewBox(clone);
if (viewBox !== null) {
this.adjustSize(svg, viewBox, 'width');
this.adjustSize(svg, viewBox, 'height');
svg.setAttribute('viewBox',
`${viewBox.x} ${viewBox.y} ${viewBox.width} ${viewBox.height}`
);
}
} finally {
report.removeChild(clone);
}
}

}
SkrubTableReport.register(SvgAdjustedViewBox);

function initReport(reportId) {
const report = document.getElementById(reportId);
report.init();
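The geometry in `SvgAdjustedViewBox` reduces to two small computations: `computeViewBox` takes the union of the children's bounding boxes (skipping empty ones), and `adjustSize` multiplies the `width`/`height` attributes by the ratio of the new to the old viewbox extent while keeping the unit. The following is a minimal Python port for illustration only, assuming boxes are given as plain tuples rather than DOM `SVGRect` objects; `BBox`, `union_bbox` and `rescale_size` are hypothetical names. One deliberate deviation: the unit is restricted to letters or `%`, because the JS pattern `/^([0-9.]+)(.+)$/` would split a unitless value such as `"100"` into `"10"` and `"0"`.

```python
import re
from typing import Iterable, NamedTuple, Optional


class BBox(NamedTuple):
    """Axis-aligned box in svg user units (stand-in for a DOM SVGRect)."""

    x: float
    y: float
    width: float
    height: float


def union_bbox(boxes: Iterable[BBox]) -> Optional[BBox]:
    """Smallest box containing all non-empty boxes, or None if there are none."""
    x_min = y_min = x_max = y_max = None
    for box in boxes:
        # Like computeViewBox, ignore degenerate boxes (elements drawing nothing).
        if box.width == 0 or box.height == 0:
            continue
        if x_min is None:
            x_min, y_min = box.x, box.y
            x_max, y_max = box.x + box.width, box.y + box.height
            continue
        x_min = min(x_min, box.x)
        y_min = min(y_min, box.y)
        x_max = max(x_max, box.x + box.width)
        y_max = max(y_max, box.y + box.height)
    if x_min is None:
        return None
    return BBox(x_min, y_min, x_max - x_min, y_max - y_min)


# A number followed by a unit made of letters or "%" (stricter than the JS regex).
_SIZE_RE = re.compile(r"^([0-9.]+)([A-Za-z%]+)$")


def rescale_size(size_attr: str, old_extent: float, new_extent: float) -> Optional[str]:
    """Scale a size such as "12.5em" by new_extent / old_extent, keeping the unit.

    Returns None (meaning: leave the attribute unchanged) when the value cannot
    be parsed, mirroring the early returns in adjustSize.
    """
    match = _SIZE_RE.match(size_attr)
    if match is None or old_extent == 0:
        return None
    try:
        size = float(match.group(1))
    except ValueError:  # e.g. "1.2.3em"
        return None
    new_size = size * new_extent / old_extent
    # :g drops trailing zeros (12.0 -> "12"); the JS code writes the raw number.
    return f"{new_size:g}{match.group(2)}"
```

For example, `union_bbox([BBox(0, 0, 100, 50), BBox(-20, 10, 30, 60), BBox(5, 5, 0, 0)])` gives `BBox(x=-20, y=0, width=120, height=70)`; if the old viewbox was 100 units wide, `rescale_size("10em", 100, 120)` returns `"12em"`, so the figure grows instead of the content shrinking.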
37 changes: 35 additions & 2 deletions skrub/_reporting/_plotting.py
@@ -3,15 +3,20 @@
The figures are returned in the form of svg strings.
"""

import functools
import io
import re
import warnings

import matplotlib
from matplotlib import pyplot as plt

from skrub import _dataframe as sbd

from . import _utils

__all__ = ["COLORS", "COLOR_0", "histogram", "line", "value_counts"]

# from matplotlib import colormaps, colors
# _TAB10 = list(map(colors.rgb2hex, colormaps.get_cmap("tab10").colors))

@@ -34,6 +39,30 @@
 COLOR_0 = COLORS[0]
 
 
+def _plot(plotting_fun):
+    """Set the matplotlib config & silence some warnings for all report plots.
+
+    All the plotting functions exposed by this module should be decorated with
+    `_plot`.
+    """
+
+    @functools.wraps(plotting_fun)
+    def plot_with_config(*args, **kwargs):
+        # This causes matplotlib to insert labels etc. as text in the svg
+        # rather than drawing the glyphs.
+        with matplotlib.rc_context({"svg.fonttype": "none"}):
+            with warnings.catch_warnings():
+                # We do not care about missing glyphs because the text is
+                # rendered & the viewbox is recomputed in the browser.
+                warnings.filterwarnings("ignore", "Glyph.*missing from font")
+                warnings.filterwarnings(
+                    "ignore", "Matplotlib currently does not support Arabic natively"
+                )
+                return plotting_fun(*args, **kwargs)
+
+    return plot_with_config
+
+
 def _despine(ax):
     ax.spines["top"].set_visible(False)
     ax.spines["right"].set_visible(False)
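The warning-filtering half of the `_plot` decorator can be illustrated without matplotlib. The sketch below uses only the standard library; `silence` and `draw` are hypothetical names, and the warning text merely imitates matplotlib's missing-glyph message. It relies on the fact that `warnings.filterwarnings` treats its `message` argument as a regular expression matched case-insensitively against the start of the warning text, and that `warnings.catch_warnings()` restores the previous filters on exit, so the filters do not leak outside the decorated call.

```python
import functools
import warnings


def silence(*patterns):
    """Decorator: ignore warnings whose message matches any of `patterns`
    (regexes matched at the start of the message) during the wrapped call."""

    def decorator(fun):
        @functools.wraps(fun)
        def wrapper(*args, **kwargs):
            # catch_warnings() saves the filter list and restores it on exit.
            with warnings.catch_warnings():
                for pattern in patterns:
                    warnings.filterwarnings("ignore", pattern)
                return fun(*args, **kwargs)

        return wrapper

    return decorator


@silence("Glyph.*missing from font")
def draw(label):
    """Hypothetical stand-in for a plotting function."""
    warnings.warn(f"Glyph 1488 ({label!r}) missing from font(s) DejaVu Sans.")
    warnings.warn("some other warning")
    return label
```

Calling `draw("א")` emits only `"some other warning"`; the glyph warning is dropped because it matches the pattern, exactly as the report plots drop the glyph and Arabic-support warnings.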
@@ -49,6 +78,7 @@ def _to_em(pt_match):
 
 def _serialize(fig):
     buffer = io.BytesIO()
+    fig.patch.set_visible(False)
     fig.savefig(buffer, format="svg", bbox_inches="tight")
     out = buffer.getvalue().decode("UTF-8")
     out = re.sub(r'(width|height)="([0-9.]+)pt"', _to_em, out)
@@ -84,6 +114,7 @@ def _adjust_fig_size(fig, ax, target_w, target_h):
     fig.set_size_inches((w, h))
 
 
+@_plot
 def histogram(col, color=COLOR_0):
     """Histogram for a numeric column."""
     values = sbd.to_numpy(col)
@@ -96,6 +127,7 @@ def histogram(col, color=COLOR_0):
     return _serialize(fig)
 
 
+@_plot
 def line(x_col, y_col):
     """Line plot for a numeric column.
 
@@ -108,13 +140,14 @@ def line(x_col, y_col):
     fig, ax = plt.subplots()
     _despine(ax)
     ax.plot(x, y)
-    ax.set_xlabel(_utils.ellide_string_short(x_col.name))
+    ax.set_xlabel(_utils.ellide_string(x_col.name))
     if sbd.is_any_date(x_col):
         _rotate_ticklabels(ax)
     _adjust_fig_size(fig, ax, 2.0, 1.0)
     return _serialize(fig)
 
 
+@_plot
 def value_counts(value_counts, n_unique, n_rows, color=COLOR_0):
     """Bar plot of the frequencies of the most frequent values in a column.
 
@@ -139,7 +172,7 @@ def value_counts(value_counts, n_unique, n_rows, color=COLOR_0):
     str
         The plot as an XML string.
     """
-    values = [_utils.ellide_string_short(v) for v, _ in value_counts][::-1]
+    values = [_utils.ellide_string(v) for v, _ in value_counts][::-1]
     counts = [c for _, c in value_counts][::-1]
     if n_unique > len(value_counts):
         title = f"{len(value_counts)} most frequent"
48 changes: 38 additions & 10 deletions skrub/_reporting/_utils.py
@@ -1,6 +1,8 @@
import base64
import json
import numbers
import re
import unicodedata

import numpy as np

@@ -44,20 +46,46 @@ def quantiles(column):
     return {q: sbd.quantile(column, q) for q in [0.0, 0.25, 0.5, 0.75, 1.0]}
 
 
-def ellide_string(s, max_len=100):
+def ellide_string(s, max_len=30):
     """Shorten a string so it can be used as a plot axis title or label."""
     if not isinstance(s, str):
         return s
+    # normalize whitespace
+    s = re.sub(r"\s+", " ", s)
     if len(s) <= max_len:
         return s
-    if max_len < 30:
-        return s[:max_len] + "…"
-    shown_len = max_len - 30
-    truncated = len(s) - shown_len
-    return s[:shown_len] + f"[…{truncated} more chars]"
-
-
-def ellide_string_short(s):
-    return ellide_string(s, 29)
+    shown_text = s[:max_len].strip()
+    ellipsis = "…"
+    end = ""
+
+    # The ellipsis, like most punctuation, is a neutral character (it has no
+    # writing direction). Because here it is the last character in the
+    # sentence, its direction will be that of the paragraph, and it might be
+    # displayed on the wrong side of the text (e.g. on the right, at the
+    # beginning of the text rather than the end, if the text is written in a
+    # right-to-left script). As a simple heuristic to correct this, we force
+    # the ellipsis to have the same direction as the last character before
+    # the truncation. This is done by appending a mark (a zero-width
+    # character with the writing direction we want), so that the ellipsis is
+    # enclosed between 2 strong characters with the same direction and thus
+    # inherits that direction.
+
+    if shown_text:
+        direction = unicodedata.bidirectional(shown_text[-1])
+        if direction in [
+            "R",
+            "RLE",
+            "RLO",
+            "RLI",
+        ]:
+            # RIGHT-TO-LEFT MARK
+            end = "\u200f"
+        elif direction in ["AL"]:
+            # ARABIC LETTER MARK
+            end = "\u061c"
+        elif direction in ["L", "LRE", "LRO", "LRI"]:
+            # LEFT-TO-RIGHT MARK
+            end = "\u200e"
+    return shown_text + ellipsis + end
 
 
 def format_number(number):
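The heuristic in `ellide_string` is easy to check with `unicodedata.bidirectional`: the ellipsis reports the neutral class `ON`, Hebrew letters report `R`, Arabic letters `AL` and Latin letters `L`. The sketch below restates only the string branch of the new function so it can be experimented with in isolation; `truncate`, `direction_mark`, `_MARKS` and `ELLIPSIS` are hypothetical names, not part of skrub (the real function is the private `ellide_string` in `skrub/_reporting/_utils.py`, which additionally returns non-string inputs unchanged).

```python
import re
import unicodedata

ELLIPSIS = "\u2026"  # HORIZONTAL ELLIPSIS, bidi class "ON" (neutral)

# Mark appended after the ellipsis, keyed by the bidi class of the last kept
# character (same mapping as in ellide_string).
_MARKS = {
    **dict.fromkeys(["R", "RLE", "RLO", "RLI"], "\u200f"),  # RIGHT-TO-LEFT MARK
    "AL": "\u061c",  # ARABIC LETTER MARK
    **dict.fromkeys(["L", "LRE", "LRO", "LRI"], "\u200e"),  # LEFT-TO-RIGHT MARK
}


def direction_mark(text):
    """Strong mark matching the direction of text[-1], or "" if it is weak/neutral."""
    if not text:
        return ""
    return _MARKS.get(unicodedata.bidirectional(text[-1]), "")


def truncate(s, max_len=30):
    """Collapse whitespace, cut to max_len and append a direction-aware ellipsis."""
    s = re.sub(r"\s+", " ", s)
    if len(s) <= max_len:
        return s
    shown = s[:max_len].strip()
    return shown + ELLIPSIS + direction_mark(shown)
```

With `max_len=5`, `"hello world"` becomes `"hello…"` followed by U+200E, while a Hebrew label ends with U+200F and an Arabic one with U+061C, so in each case the ellipsis sits between two strong characters of the same direction. Digits (bidi class `EN`) get no mark, as in the original code.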