Support character based line wrapping (#649) #657

Merged
merged 7 commits into from
Feb 28, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
Expand Up @@ -18,6 +18,7 @@ This can also be enabled programmatically with `warnings.simplefilter('default',

## [2.6.2] - Not released yet
### Added
* [`FPDF.multi_cell()`](https://pyfpdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.multi_cell) and [`FPDF.write()`](https://pyfpdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.write) now accept a `wrapmode` argument for word or character based line wrapping ("WORD"/"CHAR").
gmischler marked this conversation as resolved.
- [`FPDF.image()`](https://pyfpdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.image) has a new `keep_aspect_ratio` optional boolean parameter, to fit it inside a given rectangle: [documentation](https://pyfpdf.github.io/fpdf2/Images.html#fitting-an-image-inside-a-rectangle)
- new method `FPDF.preload_image()`: [documentation](https://pyfpdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.preload_image)
- new translation of the tutorial in [简体中文](https://pyfpdf.github.io/fpdf2/Tutorial-zh.html) - thanks to @Bubbu0129
27 changes: 19 additions & 8 deletions docs/Text.md
Expand Up @@ -8,10 +8,11 @@ There are several ways in fpdf to add text to a PDF document, each of which come
| [`.cell()`](#cell) | one | yes | no | yes | Inserts a single-line text string within the boundaries of a given box, optionally with background and border. |
| [`.multi_cell()`](#multi_cell) | several | yes | no | yes | Inserts a multi-line text string within the boundaries of a given box, optionally with background and border. |
| [`.write()`](#write) | several | no | no | auto | Inserts a multi-line text string within the boundaries of the page margins, starting at the current x/y location (typically the end of the last inserted text). |
| [`.write_html()`](#write_html) | several | no | yes | auto | From [html.py](HTML.html). An extension to `.write()`, with additional parsing of basic HTML tags.
| [`.write_html()`](#write_html) | several | no | yes | auto | An extension to `.write()`, with additional parsing of basic HTML tags.

## Typographical Limitations
## Typography and Language Specific Concepts

### Limitations
There are a few advanced typesetting features that fpdf doesn't currently support.

* Automatic ligatures - Some writing systems (eg. most Indic scripts such as Devanagari, Tamil, Kannada) frequently combine a number of written characters into a single glyph. This would require advanced font analysis capabilities, which aren't currently implemented.
Expand All @@ -30,6 +31,13 @@ some_text = 'اَلْعَرَبِيَّةُכַּף סוֹפִית'
fixed_text = get_display(reshape(some_text))
```

### Character or Word Based Line Wrapping
By default, `multi_cell()` and `write()` will wrap lines based on words, using space characters and soft hyphens as separators.
For languages like Chinese and Japanese, which don't usually separate their words, character based wrapping is more appropriate.
In such a case, the argument `wrapmode="CHAR"` can be used (the default is `"WORD"`), and each line will get broken right before the
character that doesn't fit anymore.
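
To illustrate the difference between the two modes without depending on fpdf2 itself, here is a minimal sketch that measures width in characters rather than in font units. The helper names `wrap_by_word` and `wrap_by_char` are hypothetical and not part of the library. The word based variant uses `textwrap.wrap` from the Python standard library as a stand-in, and the character based variant is simplified: it does not trim spaces at line ends.

```python
import textwrap


def wrap_by_word(text: str, max_chars: int) -> list[str]:
    """Break at whitespace, roughly like the default "WORD" mode."""
    # break_long_words=False keeps an over-long word intact instead of splitting it.
    return textwrap.wrap(text, width=max_chars, break_long_words=False)


def wrap_by_char(text: str, max_chars: int) -> list[str]:
    """Break right before the first character that no longer fits."""
    return [text[i : i + max_chars] for i in range(0, len(text), max_chars)]


latin = "character based wrapping"
cjk = "这是一段没有空格的中文文本"

print(wrap_by_word(latin, 10))  # words stay whole
print(wrap_by_char(latin, 10))  # lines are filled, words may be cut
print(wrap_by_word(cjk, 5))     # no spaces, so nothing to break at
print(wrap_by_char(cjk, 5))     # fixed-size chunks of characters
```

With text that contains no spaces, the word based variant has no break opportunity and returns a single over-long line, which is the situation `wrapmode="CHAR"` is meant to address.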


## Text Formatting
For all text insertion methods, the relevant font related properties (eg. font/style and foreground/background color) must be set before invoking them. This includes using:

Expand Down Expand Up @@ -72,9 +80,9 @@ page break is performed before outputting.
[Signature and parameters for .cell()](https://pyfpdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.cell)

## .multi_cell()
Allows printing text with line breaks. Those can be automatic (breaking at the
most recent space or soft-hyphen character) as soon as the text reaches the
right border of the cell, or explicit (via the `\\n` character).
Allows printing text with word or character based line breaks. Those can be automatic
(breaking at the most recent space or soft-hyphen character) as soon as the text
reaches the right border of the cell, or explicit (via the `\\n` character).
As many cells as necessary are stacked, one below the other.
Text can be aligned, centered or justified. The cell block can be framed and
the background painted.
Expand All @@ -90,9 +98,12 @@ When `split_only == True`, returns `txt` split into lines in an array (with any
## .write()
Prints multi-line text between the page margins, starting from the current position.
When the right margin is reached, a line break occurs at the most recent
space or soft-hyphen character, and text continues from the left margin.
A manual break happens any time the \\n character is met,
Upon method exit, the current position is left near the end of the text, ready for the next call to continue without a gap, potentially with a different font or size set. Returns a boolean indicating if page break was triggered.
space or soft-hyphen character (in word wrap mode) or at the current position (in
character break mode), and text continues from the left margin.
Member

Similarly to the "Right-to-Left & Arabic Script workaround" section above,
maybe an explicit suggestion could be made of using wrapmode='CHAR' for languages like Chinese or Japanese that do not separate words with spaces?

Collaborator Author

The whole section is about limitations (and their workarounds) right now, which this really isn't.
But we can turn it into a general typography and language specifics section.

Member

> But we can turn it into a general typography and language specifics section.

Yes, this would be great!

A manual break happens any time the \\n character is met.
Upon method exit, the current position is left near the end of the text, ready for
the next call to continue without a gap, potentially with a different font or size set.
Returns a boolean indicating if page break was triggered.

The primary purpose of this method is to print continuously wrapping text, where different parts may be rendered in different fonts or font sizes. This contrasts eg. with `.multi_cell()`, where a change in font family or size can only become effective on a new line.

9 changes: 9 additions & 0 deletions fpdf/enums.py
Expand Up @@ -99,6 +99,15 @@ def coerce(cls, value):
raise TypeError(f"{value} cannot convert to a {cls.__name__}")


class WrapMode(CoerciveEnum):
    "Defines how to break and wrap lines in multi-line text."
    WORD = intern("WORD")
    "Wrap by word"

    CHAR = intern("CHAR")
    "Wrap by character"
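
Because `multi_cell()` and `write()` pass their `wrapmode` argument through `WrapMode.coerce()`, callers can supply either the enum member or a plain string. The following is a minimal sketch of that idea using only the standard `enum` module. `WrapModeSketch` is a hypothetical stand-in, not the library's `CoerciveEnum`; in particular, the case-insensitive name lookup and the `ValueError` for unknown strings are assumptions.

```python
from enum import Enum


class WrapModeSketch(Enum):
    """Hypothetical stand-in for fpdf.enums.WrapMode, for illustration only."""

    WORD = "WORD"
    CHAR = "CHAR"

    @classmethod
    def coerce(cls, value):
        # Enum members pass through unchanged.
        if isinstance(value, cls):
            return value
        # Strings are matched against member names, ignoring case (an assumption).
        if isinstance(value, str):
            try:
                return cls[value.upper()]
            except KeyError:
                raise ValueError(f"{value} is not a valid {cls.__name__}") from None
        # Anything else is rejected, mirroring the TypeError shown in the diff.
        raise TypeError(f"{value} cannot convert to a {cls.__name__}")


print(WrapModeSketch.coerce("CHAR"))
print(WrapModeSketch.coerce(WrapModeSketch.WORD))
```

Coercing once at the top of each public method means the rest of the code can compare against enum members only.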


class CharVPos(CoerciveEnum):
    "Defines the vertical position of text relative to the line."
    SUP = intern("SUP")
17 changes: 16 additions & 1 deletion fpdf/fpdf.py
Expand Up @@ -69,6 +69,7 @@ class Image:
FontDescriptorFlags,
AccessPermission,
CharVPos,
WrapMode,
)
from .errors import FPDFException, FPDFPageFormatException, FPDFUnicodeEncodingException
from .fonts import fpdf_charwidths
Expand Down Expand Up @@ -3251,6 +3252,7 @@ def multi_cell(
print_sh=False,
new_x=XPos.RIGHT,
new_y=YPos.NEXT,
wrapmode: WrapMode = WrapMode.WORD,
):
"""
This method allows printing text with line breaks. They can be automatic
Expand Down Expand Up @@ -3287,13 +3289,16 @@ def multi_cell(
of text as bold / italics / underlined. Default to False.
print_sh (bool): Treat a soft-hyphen (\\u00ad) as a normal printable
character, instead of a line breaking opportunity. Default value: False
wrapmode (fpdf.enums.WrapMode): "WORD" for word based line wrapping (default),
"CHAR" for character based line wrapping.

Using `new_x=XPos.RIGHT, new_y=YPos.TOP, maximum height=pdf.font_size` is
useful to build tables with multiline text in cells.

Returns: a boolean indicating if page break was triggered,
or if `split_only == True`: `txt` split into lines in an array
"""
wrapmode = WrapMode.coerce(wrapmode)
if isinstance(w, str) or isinstance(h, str):
raise ValueError(
"Parameter 'w' and 'h' must be numbers, not strings."
Expand Down Expand Up @@ -3361,6 +3366,7 @@ def multi_cell(
styled_text_fragments,
justify=(align == Align.J),
print_sh=print_sh,
wrapmode=wrapmode,
)
text_line = multi_line_break.get_line_of_given_width(maximum_allowed_width)
while (text_line) is not None:
Expand Down Expand Up @@ -3476,7 +3482,12 @@ def multi_cell(

@check_page
def write(
self, h: float = None, txt: str = "", link: str = "", print_sh: bool = False
self,
h: float = None,
txt: str = "",
link: str = "",
print_sh: bool = False,
wrapmode: WrapMode = WrapMode.WORD,
):
"""
Prints text from the current position.
Expand All @@ -3492,7 +3503,10 @@ def write(
(identifier returned by `FPDF.add_link`) or external URL.
print_sh (bool): Treat a soft-hyphen (\\u00ad) as a normal printable
character, instead of a line breaking opportunity. Default value: False
wrapmode (fpdf.enums.WrapMode): "WORD" for word based line wrapping (default),
"CHAR" for character based line wrapping.
"""
wrapmode = WrapMode.coerce(wrapmode)
if not self.font_family:
raise FPDFException("No font set, you need to call set_font() beforehand")
if isinstance(h, str):
Expand All @@ -3511,6 +3525,7 @@ def write(
multi_line_break = MultiLineBreak(
styled_text_fragments,
print_sh=print_sh,
wrapmode=wrapmode,
)
# first line from current x position to right margin
first_width = self.w - self.x - self.r_margin
27 changes: 25 additions & 2 deletions fpdf/line_break.py
Expand Up @@ -8,7 +8,7 @@

from typing import NamedTuple, Any, Union, Sequence

from .enums import CharVPos
from .enums import CharVPos, WrapMode
from .errors import FPDFException

SOFT_HYPHEN = "\u00ad"
Expand Down Expand Up @@ -306,6 +306,22 @@ def add_character(
self.width += character_width
active_fragment.characters.append(character)

    def trim_trailing_spaces(self):
        if not self.fragments:
            return
        last_frag = self.fragments[-1]
        last_char = last_frag.characters[-1]
        while last_char == " ":
            char_width = last_frag.get_character_width(" ")
            self.width -= char_width
            last_frag.trim(-1)
            if not last_frag.characters:
                del self.fragments[-1]
            if not self.fragments:
                return
            last_frag = self.fragments[-1]
            last_char = last_frag.characters[-1]
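
The trimming step has to walk backwards across fragment boundaries, since the trailing spaces of a line may be spread over several fragments. As a rough sketch of that idea, the standalone function below works on plain lists of characters instead of `Fragment` objects and assumes one fixed width for a space. The function name, the parameters, and the way empty fragments are skipped are illustrative assumptions, not the library's code.

```python
def trim_trailing_spaces(
    fragments: list[list[str]], space_width: float, width: float
) -> float:
    """Remove trailing spaces from the end of the line and return the reduced width.

    `fragments` is a list of character lists and is modified in place.
    """
    while fragments:
        last = fragments[-1]
        if not last:
            # Drop a fragment that is (or has become) empty, then check the previous one.
            fragments.pop()
            continue
        if last[-1] != " ":
            break
        last.pop()
        width -= space_width
    return width


line = [list("ab "), list("  ")]
new_width = trim_trailing_spaces(line, space_width=1.0, width=5.0)
print(line, new_width)
```

In this usage example the second fragment consists only of spaces, so it is removed entirely, and the width shrinks by one unit per removed space.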

def _apply_automatic_hint(self, break_hint: Union[SpaceHint, HyphenHint]):
"""
This function mutates the current_line, applying one of the states
Expand Down Expand Up @@ -364,10 +380,12 @@ def __init__(
styled_text_fragments: Sequence,
justify: bool = False,
print_sh: bool = False,
wrapmode: WrapMode = WrapMode.WORD,
):
self.styled_text_fragments = styled_text_fragments
self.justify = justify
self.print_sh = print_sh
self.wrapmode = wrapmode
self.fragment_index = 0
self.character_index = 0
self.idx_last_forced_break = None
Expand Down Expand Up @@ -405,9 +423,14 @@ def get_line_of_given_width(self, maximum_width: float, wordsplit: bool = True):
return current_line.manual_break(trailing_nl=True)

if current_line.width + character_width > maximum_width:
if character == SPACE:
if character == SPACE: # must come first, always drop a current space.
self.character_index += 1
return current_line.manual_break(self.justify)
if self.wrapmode == WrapMode.CHAR:
# If the line ends with one or more spaces, then we want to get rid of them
# so it can be justified correctly.
current_line.trim_trailing_spaces()
return current_line.manual_break(self.justify)
if current_line.automatic_break_possible():
(
self.fragment_index,
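
The order of the checks in the overflow branch of `get_line_of_given_width()` matters: a space that no longer fits is dropped first, then character mode breaks immediately, and only otherwise is a word based break attempted. The sketch below mirrors that order in a simplified form. It is a hypothetical illustration, assuming every character is one unit wide and ignoring soft hyphens, justification and fragments.

```python
def break_lines(text: str, max_width: int, wrapmode: str = "WORD") -> list[str]:
    """Greedy line breaking where every character is one unit wide (an assumption)."""
    lines, line = [], ""
    for char in text:
        if len(line) + 1 > max_width:
            if char == " ":
                # Checked first: a space that does not fit is dropped in either mode.
                lines.append(line)
                line = ""
                continue
            if wrapmode == "CHAR":
                # Break right before the character; trailing spaces are trimmed.
                lines.append(line.rstrip(" "))
                line = char
                continue
            # WORD mode: fall back to the most recent space, if there is one.
            head, sep, tail = line.rpartition(" ")
            if sep:
                lines.append(head)
                line = tail + char
                continue
            # No space on the line at all, so break before the character.
            lines.append(line)
            line = char
            continue
        line += char
    if line:
        lines.append(line)
    return lines


print(break_lines("hello wide world", 8, "WORD"))
print(break_lines("hello wide world", 8, "CHAR"))
```

In word mode the lines end at word boundaries and may stay short, while in character mode each line is filled up to the available width.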
2 changes: 1 addition & 1 deletion test/image/test_load_image.py
Expand Up @@ -42,7 +42,7 @@ def test_load_invalid_base64_data():


# ensure memory usage does not get too high - this value depends on Python version:
@memunit.assert_lt_mb(147)
@memunit.assert_lt_mb(200)
Member

Please do not bump this above 150MB, it's not necessary

Collaborator Author

What's the point?
The demand will rise with every new test added, and that seems unlikely to get fixed in pytest any time soon.
Is there any real disadvantage of adding some extra headroom here?

Member

I'm still not sure this is due to pytest... Investigation is ongoing in #641

The point is that keeping those thresholds allows me to keep track of how fast / what causes memory usage increases.
So far they helped me to link the growing number of tests with this memory usage, but this could be spurious-correlation...
As long as the increasing memory usage problem is not solved, I'd prefer to keep those checks as low as possible, in the hope that this will help pinpoint the origin of the issue.
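
For readers unfamiliar with this kind of check, a threshold decorator can be pictured roughly as follows. This is a hypothetical sketch built on the standard `tracemalloc` module, not the `memunit` implementation. It only counts Python allocations made during the decorated call, which is probably a different quantity from what `memunit.assert_lt_mb` measures, and the name `assert_peak_lt_mb` is invented for illustration.

```python
import functools
import tracemalloc


def assert_peak_lt_mb(limit_mb: float):
    """Hypothetical decorator: fail if the call's peak traced allocation reaches the limit."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            tracemalloc.start()
            try:
                result = func(*args, **kwargs)
                # get_traced_memory() returns (current, peak) in bytes.
                _, peak = tracemalloc.get_traced_memory()
            finally:
                tracemalloc.stop()
            peak_mb = peak / (1024 * 1024)
            assert peak_mb < limit_mb, f"peak {peak_mb:.1f} MB >= {limit_mb} MB"
            return result

        return wrapper

    return decorator


@assert_peak_lt_mb(50)
def build_small_list():
    return list(range(1000))


build_small_list()
```

A tight limit makes such a check fail soon after memory usage grows, which is the tracking effect described in the comment above; a generous limit would hide gradual increases.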

def test_share_images_cache(tmp_path):
images_cache = {}
icc_profiles_cache = {}
Binary file added test/text/multi_cell_char_wrap.pdf
Binary file not shown.
18 changes: 17 additions & 1 deletion test/text/test_line_break.py
@@ -1,5 +1,5 @@
from fpdf import FPDF, FPDFException, TextMode
from fpdf.line_break import Fragment, MultiLineBreak, TextLine
from fpdf.line_break import Fragment, MultiLineBreak, TextLine, CurrentLine

import pytest

Expand Down Expand Up @@ -1113,3 +1113,19 @@ def test_last_line_no_justify():
res = multi_line_break.get_line_of_given_width(char_width)
exp = None
assert res == exp


def test_trim_trailing_spaces():
    "Check special cases in CurrentLine method."
    # pylint: disable=protected-access,assignment-from-none
    pdf = FPDF()
    pdf.set_font("helvetica")
    cl = CurrentLine()
    # Result: None - if cl.fragments is empty to begin with.
    res = cl.trim_trailing_spaces()
    assert res is None
    # Result: None - if cl.fragments is empty after trimming trailing spaces.
    frag = Fragment(" ", pdf._get_current_graphics_state(), pdf.k)
    cl.fragments = [frag]
    res = cl.trim_trailing_spaces()
    assert res is None
19 changes: 19 additions & 0 deletions test/text/test_multi_cell.py
Expand Up @@ -399,3 +399,22 @@ def test_multi_cell_char_spacing(tmp_path): # issue #489
pdf.set_char_spacing(10)
pdf.multi_cell(w=150, txt=LOREM_IPSUM[:200], new_x="LEFT", fill=True)
assert_pdf_equal(pdf, HERE / "multi_cell_char_spacing.pdf", tmp_path)


def test_multi_cell_char_wrap(tmp_path):  # issue #649
    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Helvetica", "", 10)
    pdf.set_fill_color(255, 255, 0)
    pdf.multi_cell(w=50, txt=LOREM_IPSUM[:200], new_x="LEFT", fill=True)
    pdf.ln()
    pdf.multi_cell(
        w=50, txt=LOREM_IPSUM[:200], new_x="LEFT", fill=True, wrapmode="CHAR"
    )
    pdf.ln()
    pdf.set_font("Courier", "", 10)
    txt = " " + "abcdefghijklmnopqrstuvwxyz" * 3
    pdf.multi_cell(w=50, txt=txt, new_x="LEFT", fill=True, align="L")
    pdf.ln()
    pdf.multi_cell(w=50, txt=txt, new_x="LEFT", fill=True, align="L", wrapmode="CHAR")
    assert_pdf_equal(pdf, HERE / "multi_cell_char_wrap.pdf", tmp_path)
25 changes: 23 additions & 2 deletions test/text/test_write.py
Expand Up @@ -57,7 +57,6 @@ def test_write_font_stretching(tmp_path): # issue #478
pdf.add_page()
# built-in font
pdf.set_font("Helvetica", "", 8)
pdf.set_fill_color(255, 255, 0)
pdf.set_right_margin(pdf.w - right_boundary)
pdf.write(txt=LOREM_IPSUM[:100])
pdf.ln()
Expand All @@ -70,7 +69,6 @@ def test_write_font_stretching(tmp_path): # issue #478
pdf.set_stretching(100)
pdf.add_font(fname=FONTS_DIR / "DroidSansFallback.ttf")
pdf.set_font("DroidSansFallback", "", 8)
pdf.set_fill_color(255, 255, 0)
pdf.write(txt=LOREM_IPSUM[:100])
pdf.ln()
pdf.ln()
Expand Down Expand Up @@ -129,3 +127,26 @@ def write_this():
pdf.denom_lift = 1.0
write_this()
assert_pdf_equal(pdf, HERE / "write_superscript.pdf", tmp_path)


def test_write_char_wrap(tmp_path):  # issue #649
    right_boundary = 50
    pdf = fpdf.FPDF()
    pdf.add_page()
    pdf.set_right_margin(pdf.w - right_boundary)
    pdf.set_font("Helvetica", "", 10)
    pdf.write(txt=LOREM_IPSUM[:200])
    pdf.ln()
    pdf.ln()
    pdf.write(txt=LOREM_IPSUM[:200], wrapmode="CHAR")
    pdf.ln()
    pdf.ln()
    pdf.set_font("Courier", "", 10)
    txt = " " + "abcdefghijklmnopqrstuvwxyz" * 3
    pdf.write(txt=txt)
    pdf.ln()
    pdf.ln()
    pdf.write(txt=txt, wrapmode="CHAR")
    pdf.line(pdf.l_margin, 10, pdf.l_margin, 130)
    pdf.line(right_boundary, 10, right_boundary, 130)
    assert_pdf_equal(pdf, HERE / "write_char_wrap.pdf", tmp_path)
Binary file added test/text/write_char_wrap.pdf
Binary file not shown.
Binary file modified test/text/write_font_stretching.pdf
Binary file not shown.