
PECL Extension #315

Closed
markhalliwell opened this issue Mar 19, 2018 · 18 comments
Assignees
Labels
performance Something could be made faster or more efficient question General questions about the project or usage

Comments

@markhalliwell
Contributor

So this is more of an open-ended question of sorts.

Are there any plans for providing a PECL compatible extension of this project?

It'd be nice to have the power/speed of https://github.com/commonmark/cmark in something that mimics this project's APIs.

I've found https://github.com/krakjoe/cmark, but it seems to be a completely custom and standalone implementation.

@colinodell
Member

Probably not, unfortunately.

I can barely program an Arduino in C, so programming a safe, fully-functional extension is not something I'd trust myself to do 🙃 I'm much more proficient in languages that manage memory for me. Additionally, there are some aspects of the current design that I dislike and hope to refactor sometime. I'd hate to invest time into writing a PECL extension only to need to refactor that too.

For now I'd personally like to focus on the current implementation and see whether we can optimize anything further - I'm sure there are plenty of areas for improvement.

(I had briefly considered trying to write an extension for just some core aspects of the engine, similar to what Twig 1 did before they changed their minds, but I'm still not convinced we've reached maximum performance.)

@colinodell colinodell self-assigned this Mar 26, 2018
@colinodell colinodell added question General questions about the project or usage performance Something could be made faster or more efficient labels Mar 26, 2018
@markhalliwell
Contributor Author

I was thinking the same actually (in regards to PHP7/Twig).

I'm now thinking that, if the extension is more performant and can ultimately be extended, then this library could simply proxy to the extension when it's installed. Idk.

I'll keep following krakjoe/cmark#1 and ultimately add support for it in the module to get some base-line benchmarks to see how much it'd be worth it.
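The proxying idea mentioned above could be sketched in a few lines. Everything here is illustrative (the function names `convertMarkdown` and `fallbackConvert` are hypothetical), but `\CommonMark\Parse()` and `\CommonMark\Render\HTML()` are the API exposed by the krakjoe/cmark PECL extension:

```php
<?php
// Hypothetical sketch: prefer the PECL cmark extension when it's loaded,
// otherwise fall back to a userland converter. Function names are illustrative.
function convertMarkdown(string $markdown): string
{
    if (extension_loaded('cmark')) {
        // API provided by the krakjoe/cmark PECL extension.
        return \CommonMark\Render\HTML(\CommonMark\Parse($markdown));
    }

    return fallbackConvert($markdown);
}

function fallbackConvert(string $markdown): string
{
    // Placeholder standing in for a pure-PHP converter such as
    // League\CommonMark\CommonMarkConverter::convertToHtml().
    return '<p>' . htmlspecialchars(trim($markdown)) . '</p>';
}

echo convertMarkdown('Hello world'), "\n";
```

The appeal of this shape is that callers keep a single stable API while the fastest available backend is chosen at runtime.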

@krakjoe
Contributor

krakjoe commented Mar 28, 2018

Some words ...

The reason there is no remarkable difference between the performance of Twig with and without that extension is clear: there is no remarkable difference to be had :)

Having the VM call the code and calling it directly in C are, near as makes no difference, the same thing; there's no improvement because there's no room for improvement. If you could measure anything from such a change it would be remarkable; it's bound to fall within the margin of error of whatever benchmark you use.

Don't write an extension that does that, it would be a waste of time ...

Profile guided optimisation of code is the only thing to do here ... and that may have limited success ...

@markhalliwell
Contributor Author

Just an FYI, https://markdown.unicorn.fail now has the PECL cmark extension as part of its demo/benchmarks 😁

@krakjoe
Contributor

krakjoe commented Mar 30, 2018

When you built libcmark, did you use CMAKE_BUILD_TYPE=Release ?

@markhalliwell
Contributor Author

I didn't manually set that, no... I just followed the instructions at https://github.com/commonmark/cmark#installing

@markhalliwell
Contributor Author

Ok, I rebuilt the extension using cmake -DCMAKE_BUILD_TYPE=Release .. and it's now deployed and cached results on https://markdown.unicorn.fail/benchmarks have been reset.

@markhalliwell
Contributor Author

markhalliwell commented Mar 30, 2018

Ah, I see where your confusion is coming from 😄

Yes, the benchmarks are a little higher than what you're probably used to seeing because everything has to run through the Drupal system: https://github.com/unicorn-fail/markdown.unicorn.fail/blob/48c12049aeecdb70eafcb897889a205acfe3815a/web/modules/custom/markdown_demo/src/FormattedMarkdown.php#L48-L50

I can add a "raw" benchmark in addition to the existing one (e.g. Drupal/raw XXms/XXms).

@markhalliwell
Contributor Author

Ok, https://markdown.unicorn.fail/benchmarks now shows Parsed / Rendered / Total times

@krakjoe
Contributor

krakjoe commented Mar 31, 2018

Still, some of the times look suspect; the league package is rendering something in 0ms ...

Maybe increase the resolution of the results (.2f), and break them down into separate tables for parse/render/total ...

@krakjoe
Contributor

krakjoe commented Mar 31, 2018

$this->stop = \DateTime::createFromFormat('U.u', sprintf('%.6F', microtime(TRUE)));

Included in the benchmark time is a benchmark of DateTime::createFromFormat and everything it calls, including object creation, initialization and storage ... that's why we just use microtime in benchmarks ...

Set a temp start and stop with microtime, create objects after you have result ...
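A minimal sketch of that suggestion (the workload and variable names are illustrative): capture raw microtime() floats around the measured work, and only construct DateTime/result objects after the measurement, so their construction cost isn't included in the interval:

```php
<?php
// Capture raw floats first; object construction happens outside the
// measured interval.
$start = microtime(true);

// ... the work being benchmarked would go here, e.g.
// $html = $converter->convertToHtml($markdown);
usleep(1000); // placeholder workload (~1 ms)

$stop = microtime(true);

// Presentation objects are derived only after the measurement is complete.
$elapsedMs = ($stop - $start) * 1000;
$stopAsDateTime = \DateTime::createFromFormat('U.u', sprintf('%.6F', $stop));

printf("~%.2f ms\n", $elapsedMs);
```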

@markhalliwell
Contributor Author

Ok, I've completely redone https://markdown.unicorn.fail/benchmarks

I've actually pushed the "benchmarking" into the Markdown module itself because, given the nature of running inside Drupal, there's no way to get more accurate times externally without jumping through a bunch of hoops: https://cgit.drupalcode.org/markdown/commit/?id=617a40f

I also took your advice: I set temporary variables for the microtimes, increased the millisecond resolution (.2f), and split the results into separate tables (for averages).

I also prepended all results with ~ to indicate that they are approximations, and added a note at the top explaining that these benchmarks run inside a Drupal site.

Thanks for the feedback!

@krakjoe
Contributor

krakjoe commented Apr 1, 2018

This is nice ... can I make one last suggestion? About the tables of "x slower" ... at first I thought they needed to be percentages, since some of the multiples look unreasonable, but even then you'd have a table of percentages and you'd have to keep scanning back and forth to work out their actual meaning ...

I think the best representation is probably bar graphs ... possibly swap those tables for bar graphs?

@markhalliwell
Contributor Author

markhalliwell commented Apr 1, 2018

some of them are unreasonable

I'm not entirely sure what you mean here. If you mean thephpleague/commonmark being somewhere around 50-70x slower than the PECL cmark extension, that's fairly accurate from my local testing where I was bypassing the benchmark caching on each page reload.

That said, I'm realizing now that the "averages" are really aggregated averages of the first table, where the samples are all different file sizes, complexities, etc. That isn't really a true "average": one is supposed to average repeated runs of the same thing, not aggregate averages of different things.

Each of the benchmark lines in the first table should actually be an "average" of maybe somewhere around 10 tests per example, per parser. Sure, it's a lot of overhead for the server initially, but once run, it's cached.
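That per-example averaging could look something like this (a pure illustration; `benchmarkAverage` and the stand-in converter are hypothetical):

```php
<?php
// Sketch: time N runs of the same input with the same converter and report
// the mean, instead of aggregating averages across different documents.
function benchmarkAverage(callable $convert, string $markdown, int $runs = 10): float
{
    $total = 0.0;
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $convert($markdown);
        $total += microtime(true) - $start;
    }

    return ($total / $runs) * 1000; // mean milliseconds per run
}

// Usage with a trivial stand-in converter:
$avgMs = benchmarkAverage(fn (string $md): string => strtoupper($md), '# Title');
printf("~%.2f ms average\n", $avgMs);
```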

but even then you're going to have a table of percentages and you have to keep scanning back and forth to determine their actual meaning

True, I implemented this as just the first "step" to see how feasible and informative this actually is. After sleeping on this, I'm beginning to see that the "slower" numbers are really just the flip of "faster" and ultimately redundant and confusing.

Perhaps, with the changes to the first table, each row could be clicked on (first one active by default) to bring up specific details of the benchmarks, e.g. charts/graphs and a more simplified list of "faster".


That all being said, the whole point of this specific issue is to see how this project could benefit from some performance tuning.

It's pretty clear that thephpleague/commonmark is decisively "slower" than any of its other counterparts and completely smoked when the PECL cmark extension enters the ring.

That isn't to say it doesn't currently have its strengths, like extensibility, which no other project seems able to match.

Perhaps shipping its own PECL extension or, more likely, waiting for commonmark/cmark#123 so that the current PECL cmark extension can be extended to mimic this project's extensibility could help change that in the future.

For now, I'd say that if you're processing thousands of markdown files per second (where performance may start to be a real concern), the PECL cmark extension is probably the way to go, and you just have to sacrifice valuable things like attributes, tables, and strikethrough.

If it's just for normal website operations based on user actions (e.g. submitting content/comments, etc.), then a more feature rich/extensible implementation like thephpleague/commonmark is probably the way to go.

As much as I'd love to continue nitpicking/improving benchmarking, I have to get back to finishing the actual Markdown module for Drupal: https://www.drupal.org/project/markdown/issues/2952435 😁

@colinodell
Member

I agree with your assessment :) And regardless of which parser is used, the website should ideally be caching the rendered HTML.
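That caching point is worth underlining with a minimal sketch (the `renderCached` helper and in-memory array cache are hypothetical; a real site would use a persistent backend such as APCu or a PSR-16 store):

```php
<?php
// Sketch: key the rendered HTML on a hash of the source markdown so the
// (comparatively expensive) conversion only runs on cache misses.
function renderCached(string $markdown, callable $convert, array &$cache): string
{
    $key = sha1($markdown);

    if (!isset($cache[$key])) {
        $cache[$key] = $convert($markdown); // only converted on a miss
    }

    return $cache[$key];
}

$cache = [];
$convert = fn (string $md): string => '<p><strong>bold</strong></p>'; // stand-in converter
echo renderCached('**bold**', $convert, $cache), "\n";
// A second call with the same source returns the cached HTML without re-converting.
```

With caching in place, the parser's raw throughput only matters on the first render of each piece of content, which narrows the practical gap between the implementations discussed above.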

If you ever do have the time and desire to help work on some performance tuning I'd absolutely appreciate it! It's much faster than it used to be but there's still plenty of room for improvement - perhaps a fresh set of eyes will uncover some areas I've been overlooking.

Thank you both so much for everything you're doing to push CommonMark adoption forward!

@markhalliwell
Contributor Author

I definitely have the desire.

As far as time, it probably won’t be until later this month as I have DrupalCon next week and still need to finish the Drupal Bootstrap 4 port.

@colinodell
Member

I'm going to close this issue for now. While I do think an extension could provide a major performance boost, I don't have the bandwidth or knowledge to create one, and I don't think now is the right time - perhaps when we have a solid roadmap for v2 we can circle back to this.
