Added support for escape character in markdown text #1224

david-fed · 2024-07-13T17:26:32Z

Implements #1215

Checklist:

The GitHub pipeline is OK (green),
meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.
A unit test is covering the code added / modified by this PR
This PR is ready to be merged
In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder
A mention of the change is present in CHANGELOG.md

This PR will implement support for \ as escape character in Markdown text.
Sample code:

from fpdf import FPDF

pdf = FPDF(format=(101.6, 152.4))
pdf.set_margin(2)
pdf.add_page()
pdf.set_font("Times", size=16)

txt = "**This** is a markdown test. Theres\\__double underscores"
pdf.multi_cell(w=50, text=txt, align='L', new_x='RIGHT',
                new_y='TOP', border=0, markdown=True,)

pdf.output("md.pdf")

Old output:

New output:

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

gmischler

The results of your test don't look quite right to me, or maybe I just don't fully understand what you're trying to do.
Can you please give an exact and complete specification of what escaping rules you're applying? This will also be needed for the documentation.

Specific question:

Why does "\\**Lorem\\\\** Ipsum" render as all regular text? Since the second set of backslashes are neutered (I think), shouldn't the second set of "**" start a fragment of bold text?
What happens if a set of "\\" (or r"\") appears within normal text, not immediately followed by a formart marker? (other than the normal Python handling of backslashes).
Are we certain that "\" is a good character sequence for this purpose? The trouble being in the difference between a normal Python "\\" (which is a single backslash) and r"\\", which gives a genuine double backslash in the text.
Since all of our other formatting markers are genuine double characters, do we really want the escape character to be just a single one? Either way, we'll need to make this very explicit in the documentation and the examples.

david-fed · 2024-07-18T22:57:32Z

Thanks for your feedback @gmischler!

I want to implement escaping rules the same way they work generally in Python. Specifically:

Escape character is only interpreted as an escape character if it's immediately followed by a valid marker.
Odd number of escape characters escape the marker. Even number of escape characters escape the previous escape character (neutered).
Escape character(s) which is not immediately followed by a valid marker will be treated as a normal string.

Please refer to my tests that showcase different combinations of how the escape character can be used and how it will be interpreted.

The results of your test don't look quite right to me, or maybe I just don't fully understand what you're trying to do. Can you please give an exact and complete specification of what escaping rules you're applying? This will also be needed for the documentation.

Specific question:

Why does "\\**Lorem\\\\** Ipsum" render as all regular text? Since the second set of backslashes are neutered (I think), shouldn't the second set of "**" start a fragment of bold text?

Yes, you're right. That was a bug in my previous commit. I fixed that now so it behaves the same way like Python interprets neutered escape characters. Refer to test_markdown_parse_multiple_escape().

What happens if a set of "\\" (or r"\") appears within normal text, not immediately followed by a formart marker? (other than the normal Python handling of backslashes).

It will be treated as a normal string. Refer to test_markdown_unrelated_escape().

Are we certain that "\" is a good character sequence for this purpose? The trouble being in the difference between a normal Python "\\" (which is a single backslash) and r"\\", which gives a genuine double backslash in the text.

Since all of our other formatting markers are genuine double characters, do we really want the escape character to be just a single one? Either way, we'll need to make this very explicit in the documentation and the examples.

I think a single \ is a good candidate, as it's known as an escape character both in Python and Markdown.

It requires that users are familiar with the way Python interprets \ as an escape character (\\ != r"\\"). Which I think we can expect from them.

I added a code snippet and a screenshot to the documentation to demonstrate the escaping functionality. I don't want to make it too verbose but let me know if more explanation is needed.

gmischler

Looking good now, except for some minor nitpicks.

Oh, and don't worry about the one failing test, it has nothing to do with your changes.

gmischler · 2024-07-19T06:37:33Z

docs/TextStyling.md

@@ -181,20 +181,25 @@ An optional `markdown=True` parameter can be passed to the [`cell()`](fpdf/fpdf.
 & [`multi_cell()`](fpdf/fpdf.html#fpdf.fpdf.FPDF.multi_cell) methods
 in order to enable basic Markdown-like styling: `**bold**, __italics__, --underlined--`.

+If your text contains Markdown-like styling that you don't want to apply, you can escape it using `\`. The escape character works the same way it generally does in Python (see the example below).


This phrasing seems a bit confusing to me.
Maybe something like this: "If the printable text contains a character sequence that would be incorrectly interpreted as a formatting marker" ?
It's not really easy to describe, but we should be as precise as possible.

Rephrased it accordingly:

fpdf2/docs/TextStyling.md

Lines 183 to 185 in 56d0f2c

If the printable text contains a character sequence that would be incorrectly interpreted as a formatting marker, it can be escaped using `\`. The escape character works the same way it generally does in Python (see the example below).

gmischler · 2024-07-19T06:45:07Z

docs/TextStyling.md

 pdf.add_page()
-pdf.set_font("Times", size=60)
+pdf.set_font("Times", size=30)
 pdf.cell(text="**Lorem** __Ipsum__ --dolor--", markdown=True)


Maybe add a next_y="NEXT" here to show the different effects on two lines?

Changed it accordingly:

fpdf2/docs/TextStyling.md

Lines 190 to 199 in 56d0f2c

```python

from fpdf import FPDF

pdf = FPDF()

pdf.add_page()

pdf.set_font("Times", size=50)

pdf.cell(text="**Lorem** __Ipsum__ --dolor--", markdown=True, new_x='LEFT', new_y='NEXT')

pdf.cell(text="\\**Lorem\\** \\\\__Ipsum\\\\__ --dolor--", markdown=True)

pdf.output("markdown-styled.pdf")

```

gmischler · 2024-07-19T08:41:24Z

@allcontributors please add @david-fed for code

allcontributors · 2024-07-19T08:41:34Z

@gmischler

I've put up a pull request to add @david-fed! 🎉

gmischler · 2024-07-20T05:12:47Z

Thanks for a helpful feature addition, @david-fed !
Merging now.

Added support for escape character in markdown text

b74f2e8

david-fed requested a review from gmischler as a code owner July 13, 2024 17:26

gmischler reviewed Jul 15, 2024

View reviewed changes

david-fed added 3 commits July 18, 2024 20:14

updated documentation

7b46a18

Fixed escape character handling & added more tests

f0e59c5

rephrased docu

e386a16

gmischler approved these changes Jul 19, 2024

View reviewed changes

allcontributors bot mentioned this pull request Jul 19, 2024

add david-fed as a contributor for code #1229

Merged

updated documentation

56d0f2c

gmischler merged commit 5ffab70 into py-pdf:master Jul 20, 2024
10 of 11 checks passed

gmischler mentioned this pull request Jul 20, 2024

Markdown not escaping MD special characters #1215

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added support for escape character in markdown text #1224

Added support for escape character in markdown text #1224

david-fed commented Jul 13, 2024 •

edited

Loading

gmischler left a comment

david-fed commented Jul 18, 2024

gmischler left a comment •

edited

Loading

gmischler Jul 19, 2024

david-fed Jul 19, 2024

gmischler Jul 19, 2024

david-fed Jul 19, 2024

gmischler commented Jul 19, 2024

allcontributors bot commented Jul 19, 2024

gmischler commented Jul 20, 2024


	If the printable text contains a character sequence that would be incorrectly interpreted as a formatting marker, it can be escaped using `\`. The escape character works the same way it generally does in Python (see the example below).

	```python
	from fpdf import FPDF

	pdf = FPDF()
	pdf.add_page()
	pdf.set_font("Times", size=50)
	pdf.cell(text="Lorem __Ipsum__ --dolor--", markdown=True, new_x='LEFT', new_y='NEXT')
	pdf.cell(text="\\Lorem\\ \\\\__Ipsum\\\\__ --dolor--", markdown=True)
	pdf.output("markdown-styled.pdf")
	```

Added support for escape character in markdown text #1224

Added support for escape character in markdown text #1224

Conversation

david-fed commented Jul 13, 2024 • edited Loading

gmischler left a comment

Choose a reason for hiding this comment

david-fed commented Jul 18, 2024

gmischler left a comment • edited Loading

Choose a reason for hiding this comment

gmischler Jul 19, 2024

Choose a reason for hiding this comment

david-fed Jul 19, 2024

Choose a reason for hiding this comment

gmischler Jul 19, 2024

Choose a reason for hiding this comment

david-fed Jul 19, 2024

Choose a reason for hiding this comment

gmischler commented Jul 19, 2024

allcontributors bot commented Jul 19, 2024

gmischler commented Jul 20, 2024

david-fed commented Jul 13, 2024 •

edited

Loading

gmischler left a comment •

edited

Loading