Markdown not escaping MD special characters #1215

exiva · 2024-06-25T14:03:27Z

Error details
When using markdown in fpdf2 it should respect escaped characters like underscores and asterisks. It does not currently.

Minimal code
Please include some minimal Python code reproducing your issue:

from fpdf import FPDF

pdf = FPDF(format=(101.6, 152.4))
pdf.set_margin(2)
pdf.add_page()
pdf.set_font("Times", size=16)

txt = "**This** is a markdown test. Theres\__double underscores"
pdf.multi_cell(w=50, text=txt, align='L', new_x='RIGHT',
                new_y='TOP', border=0, markdown=True,)

pdf.output("md.pdf")

Should output "This is a markdown test. Theres__double underscores" but outputs

Environment
Please provide the following information:

Operating System: macOS 14.5 Apple Silicon M2 Mac Mini
Python version: 3.11.9
fpdf2 version used: 2.7.9

The text was updated successfully, but these errors were encountered:

gmischler · 2024-06-25T14:52:05Z

Thanks for reporting this, and welcome to fpdf2, @exiva !

Well, the documentation doesn't promise that our markup parser would honor backslash escapes, so technically this isn't a bug... 😉

it is a perfectly valid feature request, though!
Any chance you might want to submit a PR?

andersonhc · 2024-06-25T14:57:34Z

If you want a head start, markdown markers are handled on fpdf _parse_chars() function.

fpdf2/fpdf/fpdf.py

Lines 3500 to 3515 in d574b07

    
           if markdown: 
        
               if ( 
        
                   is_marker 
        
                   and (not txt_frag or txt_frag[-1] != half_marker) 
        
                   and (len(text) < 3 or text[2] != half_marker) 
        
               ): 
        
                   if txt_frag: 
        
                       yield frag() 
        
                   if text[:2] == self.MARKDOWN_BOLD_MARKER: 
        
                       in_bold = not in_bold 
        
                   if text[:2] == self.MARKDOWN_ITALICS_MARKER: 
        
                       in_italics = not in_italics 
        
                   if text[:2] == self.MARKDOWN_UNDERLINE_MARKER: 
        
                       in_underline = not in_underline 
        
                   text = text[2:] 
        
                   continue

gmischler · 2024-07-20T05:13:39Z

Implemented in #1224

Lucas-C · 2024-08-19T11:35:59Z

Hi.

I'd like to reopen this issue to further discuss this.

I may be mistaken, but the escaping implemented in PR #1224 does not seem to conform to the CommonMark standard.

You can test the rendering of \\**Lorem\\** \\\\__Ipsum\\\\__ --dolor-- in https://spec.commonmark.org/dingus/

The result is this:

Escaping double underscores in Markdown should be done this way:
\_\_Ipsum\_\_

What do you think @gmischler, @exiva & @david-fed?

gmischler · 2024-08-19T13:38:43Z

Escaping double underscores in Markdown should be done this way:
\_\_Ipsum\_\_

In "standard" markdown (including commonmark), escaping individual marker characters is necessary because both a single character as well as a double character sequence can act as a marker.

In our own decidedly non-standard markdown implementation, all markers consist of double character sequences. This has the advantage of reducing conflicts with actual text, which is why I hope we'll stick to the concept for any future feature additions. It also means that it is fundamentally incompatible with every other markdown implementation out there (which could be seen both as an advantage or disadvantage).
But the key consequence here is that there is no need - and no real reason - to adhere to any standard markdown escape specification. So as far as I am concerned, the current implementation is perfectly fine.

Added: The bug reported in #1236 obviously still needs to get fixed.

Lucas-C · 2024-08-19T16:13:38Z

he key consequence here is that there is no need - and no real reason - to adhere to any standard markdown escape specification.

I do not quite agree with that 🙂

Yes, you are right, currently our Markdown implementation is relatively limited and far from following any standard.

However, I see several paths forward:

we aim to progressively get closer to the standard CommonMark spec, which is why I find it problematic that fpdf2 currently renders \\\\__Ipsum\\\\__ the way it does, as it is not compatible with the CommonMark spec on backslash escapes: https://spec.commonmark.org/0.31.2/#backslash-escapes
we drop the Markdown rendering feature in fpdf2 (in v2.8.0 ?), and simply recommend that users use mistletoe or any other Markdown-rendering library instead, because that's not really the goal of fpdf2 (which is what I do most of the time)
we continue to have a limited / "broken" implementation of Markdown in fpdf2

IMHO we should aim for option 1. or 2., but not for option 3., which is the least interesting for fpdf2 in the long term.

gmischler · 2024-08-20T05:09:15Z

we aim to progressively get closer to the standard CommonMark spec, which is why I find it problematic that fpdf2 currently renders \\\\__Ipsum\\\\__ the way it does, as it is not compatible with the CommonMark spec on backslash escapes: https://spec.commonmark.org/0.31.2/#backslash-escapes

Given the fundamental conceptual incompabilities and the way we currently handle deprecated features (without a clear deprecation strategy/policy), this would take decades and they would never actually be the same. It is better to keep it clearly distinct from commonmark, to avoid the two getting confused.

we drop the Markdown rendering feature in fpdf2 (in v2.8.0 ?), and simply recommend that users use mistletoe or any other Markdown-rendering library instead, because that's not really the goal of fpdf2 (which is what I do most of the time)

we continue to have a limited / "broken" implementation of Markdown in fpdf2

Accept that our own markdown implementation is "special" (as mentioned, this actually has certain practical advantages vis-a-vis commonmark).
If someone really absolutely wants a standards compliant markdown feature, add a "commonmark" parameter in parallel, which uses one of the several open source Python commonmark parsers, offering full compability. We have discussed this option before and you seemed to agree that it is a good idea.

Lucas-C · 2024-08-20T07:30:02Z

5. If someone really absolutely wants a standards compliant markdown feature, add a "commonmark" parameter in parallel, which uses one of the several open source Python commonmark parsers, offering full compability. We have discussed this option before and you seemed to agree that it is a good idea.

In that comment I wrote:

So I'm not really sure of the path forward regarding Markdown support...

😅

We already have some documentation on how to use a Markdown library combined with fpdf2:
https://py-pdf.github.io/fpdf2/CombineWithMistletoeoToUseMarkdown.html

I'm really not sure that fpdf2 should include code specific to any Markdown renderer / CommonMark parser...
And even less sure that we should support "our own Markdown flavor" AND also be compatible with CommonMark, with an optional dependency...
This sounds a bit like feature creep to me.

But I'm honestly curious to know what end-users think about it:

should we keep our "very basic Markdown flavor"
should we get rid of it and better document how to use other Markdown libraries, that do a far better job than us, in conjunction with fpdf2?

Maybe we could start a GitHub discussion poll about this.

gmischler · 2024-08-20T16:23:27Z

So I'm not really sure of the path forward regarding Markdown support...

Ah, so I was overinterpreting your statements back then. Sorry for that.

All the same, making our markdown implementation "more like commonmark" in any meaningful way would require to change the fundamental concept of its syntax: Namely to use double characters for all formatting markers. Given our general promise to stay backwards compatible, often at great cost, I don't see this as a viable option.

So if we do want a more standard implementation (which essentially equals some flavour of commonmark), then there is no real alternative to adding it as an alternative in parallel.
If at the same time we don't want to bind ourselfes to any particular flavour/implementation of commonmark, maybe that would be another case for the data adapter concept? This one would need to be quite tightly integrated with our text preprocessing machinery, but I think it could reasonably be done.

exiva added the bug label Jun 25, 2024

gmischler added enhancement good first issue up-for-grabs markdown hacktoberfest and removed bug labels Jun 25, 2024

david-fed mentioned this issue Jul 13, 2024

Added support for escape character in markdown text #1224

Merged

5 tasks

gmischler closed this as completed Jul 20, 2024

Lucas-C reopened this Aug 19, 2024

Lucas-C mentioned this issue Aug 19, 2024

Escape character escaping multiple consecutive markdown characters #1236

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Markdown not escaping MD special characters #1215

Markdown not escaping MD special characters #1215

exiva commented Jun 25, 2024 •

edited

Loading

gmischler commented Jun 25, 2024

andersonhc commented Jun 25, 2024 •

edited

Loading

gmischler commented Jul 20, 2024

Lucas-C commented Aug 19, 2024

gmischler commented Aug 19, 2024 •

edited

Loading

Lucas-C commented Aug 19, 2024

gmischler commented Aug 20, 2024

Lucas-C commented Aug 20, 2024

gmischler commented Aug 20, 2024

Markdown not escaping MD special characters #1215

Markdown not escaping MD special characters #1215

Comments

exiva commented Jun 25, 2024 • edited Loading

gmischler commented Jun 25, 2024

andersonhc commented Jun 25, 2024 • edited Loading

gmischler commented Jul 20, 2024

Lucas-C commented Aug 19, 2024

gmischler commented Aug 19, 2024 • edited Loading

Lucas-C commented Aug 19, 2024

gmischler commented Aug 20, 2024

Lucas-C commented Aug 20, 2024

gmischler commented Aug 20, 2024

exiva commented Jun 25, 2024 •

edited

Loading

andersonhc commented Jun 25, 2024 •

edited

Loading

gmischler commented Aug 19, 2024 •

edited

Loading