Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for escape character in markdown text #1224

Merged
merged 5 commits into from
Jul 20, 2024

Conversation

david-fed
Copy link

@david-fed david-fed commented Jul 13, 2024

Implements #1215

Checklist:

  • The GitHub pipeline is OK (green),
    meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.

  • A unit test is covering the code added / modified by this PR

  • This PR is ready to be merged

  • In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder

  • A mention of the change is present in CHANGELOG.md

This PR will implement support for \ as escape character in Markdown text.
Sample code:

from fpdf import FPDF

pdf = FPDF(format=(101.6, 152.4))
pdf.set_margin(2)
pdf.add_page()
pdf.set_font("Times", size=16)

txt = "**This** is a markdown test. Theres\\__double underscores"
pdf.multi_cell(w=50, text=txt, align='L', new_x='RIGHT',
                new_y='TOP', border=0, markdown=True,)

pdf.output("md.pdf")

Old output:
old

New output:
new

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

Copy link
Collaborator

@gmischler gmischler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The results of your test don't look quite right to me, or maybe I just don't fully understand what you're trying to do.
Can you please give an exact and complete specification of what escaping rules you're applying? This will also be needed for the documentation.

Specific question:

  • Why does "\\**Lorem\\\\** Ipsum" render as all regular text? Since the second set of backslashes are neutered (I think), shouldn't the second set of "**" start a fragment of bold text?

  • What happens if a set of "\\" (or r"\") appears within normal text, not immediately followed by a formart marker? (other than the normal Python handling of backslashes).

  • Are we certain that "\" is a good character sequence for this purpose? The trouble being in the difference between a normal Python "\\" (which is a single backslash) and r"\\", which gives a genuine double backslash in the text.

  • Since all of our other formatting markers are genuine double characters, do we really want the escape character to be just a single one? Either way, we'll need to make this very explicit in the documentation and the examples.

@david-fed
Copy link
Author

Thanks for your feedback @gmischler!

I want to implement escaping rules the same way they work generally in Python. Specifically:

  • Escape character is only interpreted as an escape character if it's immediately followed by a valid marker.
  • Odd number of escape characters escape the marker. Even number of escape characters escape the previous escape character (neutered).
  • Escape character(s) which is not immediately followed by a valid marker will be treated as a normal string.

Please refer to my tests that showcase different combinations of how the escape character can be used and how it will be interpreted.

The results of your test don't look quite right to me, or maybe I just don't fully understand what you're trying to do. Can you please give an exact and complete specification of what escaping rules you're applying? This will also be needed for the documentation.

Specific question:

  • Why does "\\**Lorem\\\\** Ipsum" render as all regular text? Since the second set of backslashes are neutered (I think), shouldn't the second set of "**" start a fragment of bold text?

Yes, you're right. That was a bug in my previous commit. I fixed that now so it behaves the same way like Python interprets neutered escape characters. Refer to test_markdown_parse_multiple_escape().

  • What happens if a set of "\\" (or r"\") appears within normal text, not immediately followed by a formart marker? (other than the normal Python handling of backslashes).

It will be treated as a normal string. Refer to test_markdown_unrelated_escape().

  • Are we certain that "\" is a good character sequence for this purpose? The trouble being in the difference between a normal Python "\\" (which is a single backslash) and r"\\", which gives a genuine double backslash in the text.
  • Since all of our other formatting markers are genuine double characters, do we really want the escape character to be just a single one? Either way, we'll need to make this very explicit in the documentation and the examples.

I think a single \ is a good candidate, as it's known as an escape character both in Python and Markdown.

It requires that users are familiar with the way Python interprets \ as an escape character (\\ != r"\\"). Which I think we can expect from them.

I added a code snippet and a screenshot to the documentation to demonstrate the escaping functionality. I don't want to make it too verbose but let me know if more explanation is needed.

Copy link
Collaborator

@gmischler gmischler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good now, except for some minor nitpicks.

Oh, and don't worry about the one failing test, it has nothing to do with your changes.

@@ -181,20 +181,25 @@ An optional `markdown=True` parameter can be passed to the [`cell()`](fpdf/fpdf.
& [`multi_cell()`](fpdf/fpdf.html#fpdf.fpdf.FPDF.multi_cell) methods
in order to enable basic Markdown-like styling: `**bold**, __italics__, --underlined--`.

If your text contains Markdown-like styling that you don't want to apply, you can escape it using `\`. The escape character works the same way it generally does in Python (see the example below).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This phrasing seems a bit confusing to me.
Maybe something like this: "If the printable text contains a character sequence that would be incorrectly interpreted as a formatting marker" ?
It's not really easy to describe, but we should be as precise as possible.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrased it accordingly:

fpdf2/docs/TextStyling.md

Lines 183 to 185 in 56d0f2c

If the printable text contains a character sequence that would be incorrectly interpreted as a formatting marker, it can be escaped using `\`. The escape character works the same way it generally does in Python (see the example below).

pdf.add_page()
pdf.set_font("Times", size=60)
pdf.set_font("Times", size=30)
pdf.cell(text="**Lorem** __Ipsum__ --dolor--", markdown=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a next_y="NEXT" here to show the different effects on two lines?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it accordingly:

fpdf2/docs/TextStyling.md

Lines 190 to 199 in 56d0f2c

```python
from fpdf import FPDF
pdf = FPDF()
pdf.add_page()
pdf.set_font("Times", size=50)
pdf.cell(text="**Lorem** __Ipsum__ --dolor--", markdown=True, new_x='LEFT', new_y='NEXT')
pdf.cell(text="\\**Lorem\\** \\\\__Ipsum\\\\__ --dolor--", markdown=True)
pdf.output("markdown-styled.pdf")
```

@gmischler
Copy link
Collaborator

@allcontributors please add @david-fed for code

Copy link

@gmischler

I've put up a pull request to add @david-fed! 🎉

@gmischler
Copy link
Collaborator

Thanks for a helpful feature addition, @david-fed !
Merging now.

@gmischler gmischler merged commit 5ffab70 into py-pdf:master Jul 20, 2024
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants