would like po4a to support *roff \c escape sequence #527
Comments
Hello Branden, no need to petition us, we're already convinced :) So far, all my attempts to implement sufficient support for \c have failed, and Helge keeps reporting the issues remaining in each of them. See e.g. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1036826#85 which is my current TODO list on that topic (you already contributed to that bug report on Debian, by the way, thanks for that). Your new insight is very welcome here. I will try to look again at our code with your text in mind.
The thing is that we have no notion of filling that could be enabled or disabled. Most of the time we don't need such a thing, as we simply try to extract the content strings to a PO file, not render the whole file. Of course there is a catch: we need to understand some of the markup, as we try to ease the life of translators by replacing inline formatting (bold, italic) in the *roff syntax with an arguably easier syntax inspired by the POD format (e.g., B and I).
If you feel like it, you could grep for \c in our implementation: https://github.com/mquinson/po4a/blob/master/lib/Locale/Po4a/Man.pm Please be patient with us, we never claimed to implement a full groff parser, only to extract/inject some sentences from/into an otherwise unmodified source file...
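As a rough illustration of the inline-formatting replacement mentioned above, here is a minimal Python sketch. It is not po4a's code (po4a is written in Perl and handles many more cases); the function name `roff_fonts_to_pod` is hypothetical, and the sketch assumes that only the `\fB` and `\fI` font escapes occur, each closed by `\fR` or `\fP`, with no nesting.

```python
import re

# Matches \fB...\fR, \fB...\fP, \fI...\fR, \fI...\fP (assumed: no nesting).
_FONT_ESCAPE = re.compile(r"\\f([BI])(.*?)\\f[RP]")


def roff_fonts_to_pod(text: str) -> str:
    """Rewrite simple *roff font escapes as POD-style B<...> / I<...> spans.

    Hypothetical helper for illustration only; real man pages also use
    font macros (.B, .I, .BR, ...) and other escapes not handled here.
    """
    return _FONT_ESCAPE.sub(lambda m: f"{m.group(1)}<{m.group(2)}>", text)


if __name__ == "__main__":
    print(roff_fonts_to_pod(r"Use \fB\-v\fR for \fIverbose\fP output."))
```

A translator would then see `B<\-v>` and `I<verbose>` in the PO file instead of the raw font escapes.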
Hi Martin, Good to hear from you, and excellent to hear back so quickly!
I see! I had forgotten that exchange, except for being rather firm with Bjarni.
Well, you might not need such a notion. I'd like to see how far you can get without it.
Right. One thing I'm curious about is how you decide where the boundaries of a translatable string are. When extracting string literals from a programming language, this might not be too hard--look for double quotes, and understand some things like C- and Perl-style backslash-escape conventions, and also those languages' rules for string literal catenation. In a man(7) document, things may be a bit more interesting.
Acknowledged.
No worries. I would not ask or expect you to. While in theory, man pages can leverage the full power of the troff typesetting system (and groff extensions thereto), in practice they limit their composition to a small subset of that language a few nines of the time. For groff 1.22.4, mandoc(1) maintainer Ingo Schwarze and I collaborated on making the "Portability" subsection of the groff_man(7) page a more useful guide for man(7) authors and a sort of mutually agreed minimal set of groff + man features that groff would recommend for composition, so that mandoc, a non-roff formatter, would, like po4a, not have to take on the gigantic task of interpreting the full language. Ingo is more of a purist than I am; I feel that man pages should exercise formatter features if necessary to achieve satisfactory typesetting, but when we keep in mind that most man page perusal is on a terminal (or in HTML scraped and converted from terminal output!), the exercise of such features should, most of the time, be safely ignorable by non-typesetters. For example, in groff's man pages I make frequent recourse to
Right. I've never in my life played with po4a before, but it sounds like I should give it a whirl on a man page or two to see what it makes of them. I do have one crazy idea that might be good for a Google Summer of Code or similar project: It's not a very well known fact that groff supports output in more than one format. And I don't mean PostScript, PDF, or HTML--groff handles all of these the same, writing out a document in a page description language that doesn't have a well accepted name but which I call "grout". It's a descendant of the troff output format described by Kernighan in the Bell Labs CSTR documents # 97 and # 76 (1992 revision), which I similarly call "trout". Programs called output drivers, like DWB troff's dpost, or groff's grodvi, grolbp, grops, gropdf, and grotty, translate that page description language into another file format or byte stream that a (possibly emulated) hardware device is prepared to consume. But that's not what I'm talking about. As a language compiler, groff builds lists of "nodes", very much like the abstract syntax tree that is taught in computer science classes. Since day one it has supported not one but two output formats: grout, and "approximate output", which is what you see when you run Here's a description of
And here's an example of what that looks like:
You may anticipate where I'm going with this. One could write a "pod emitter" output class. Like "approximate" (or "ascii" [sic]) output, its It also knows where the sentence boundaries are (assuming the input was not written to conceal this information), so it could start a new output line when encountering one. So why parse man or try to guess where the font face changes are when you could have groff tell you, with perfect knowledge? Just wanted to put that idea out there. And it would be another motivator for superseding
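To make the "pod emitter" idea a little more concrete, here is a toy Python sketch. It does not reflect groff's internal node classes or any real groff interface (groff is written in C++); the `(font, text)` node stream and the name `emit_pod` are assumptions made purely for illustration of what such an emitter would produce once the formatter already knows each run's font.

```python
from typing import Iterable, Tuple

# Hypothetical mapping from a font name to a POD-style tag.
# Fonts not listed here (e.g. "R" for roman) are emitted as plain text.
POD_TAGS = {"B": "B", "I": "I"}


def emit_pod(nodes: Iterable[Tuple[str, str]]) -> str:
    """Render a stream of (font, text) runs as POD-like inline markup.

    The node stream stands in for what a formatter could hand to an
    output class; it is an assumed shape, not groff's actual data model.
    """
    pieces = []
    for font, text in nodes:
        tag = POD_TAGS.get(font)
        pieces.append(f"{tag}<{text}>" if tag else text)
    return "".join(pieces)


if __name__ == "__main__":
    runs = [("R", "Run "), ("B", "groff"), ("R", " with "), ("I", "file"), ("R", ".")]
    print(emit_pod(runs))
```

The point of the design is that the emitter never has to guess where a font change begins or ends; that knowledge arrives with each node.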
Actually, that'd be more than perfect. Some of the formats handled by po4a go this way: we don't parse the input ourselves, but interact with the relevant tool. We certainly prefer when it goes that way.
I'm wondering: how would the string reinjection work in your idea? We'd provide the translated string back to groff, and it'd write the source file back with the translated content? That would be great, for sure. Way more robust than our current attempt (which, I must say, works surprisingly well for most existing man pages. It started as a joke and turned out to be actually quite usable...)
Thanks
I was reading Locale::Po4a::Man(3pm) today and happened across the following language.
Excellent advice!
Ouch!
As groff maintainer, I would like to petition the po4a project to support \c.
I am aware that historically, approximately no one has been able to clearly explain what \c does (how does it both "interrupt" and "continue", depending on which manual you read?), which may explain why po4a's parser has proven reluctant to apply any interpretation to it.
Here is the full explanation, from groff's Texinfo manual:
In man page applications, its interpretation should be simple:
There are two cases: filling enabled and filling disabled.
If filling is enabled, \c means "don't put a space on the output when you encounter the next newline".
If filling is disabled, \c means "don't put a line break on the output when you encounter the next newline".
Since groff 1.22.4 (December 2018), the groff_man(7) page has explicitly advised the use of \c in certain circumstances. (In groff 1.23.0, much of this guidance migrated to the new groff_man_style(7) page.) We can't escape \c; we've tried. The only alternative is introducing a bunch of new macros that mostly do the same things as existing ones, bloating the man macro language, making it harder to learn, and beginning a transition that we can be sure will never actually end due to 45 years of inertia possessed by the existing macros.
I'd like to know how I can help make this happen.
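The two-case reading of \c given above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not po4a's or groff's implementation: the function name `join_text_lines` is hypothetical, the input is assumed to contain only text lines (no macro calls or requests), and a line is treated as interrupted whenever it ends in the two characters `\c`, without checking whether that backslash is itself escaped.

```python
from typing import Iterable


def join_text_lines(lines: Iterable[str], fill: bool = True) -> str:
    """Join roff text lines, honoring a trailing \\c on each line.

    With filling enabled, a newline normally becomes a space; with
    filling disabled, it becomes a line break. A trailing \\c
    suppresses whichever of the two would otherwise be emitted.
    """
    separator = " " if fill else "\n"
    pieces = []
    suppress = True  # nothing is emitted before the first line
    for line in lines:
        if not suppress:
            pieces.append(separator)
        # Simplification: does not detect an escaped backslash before "c".
        suppress = line.endswith("\\c")
        pieces.append(line[:-2] if suppress else line)
    return "".join(pieces)


if __name__ == "__main__":
    sample = ["foo\\c", "bar", "baz"]
    print(repr(join_text_lines(sample, fill=True)))
    print(repr(join_text_lines(sample, fill=False)))
```

For a string extractor, the practical consequence is that a line ending in \c and the line after it belong to the same word or phrase and should not be separated when forming a translatable unit.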
(If you're curious what brought me here, well, (1) Helge Kreutzmann told me I could find a comprehensive list of tags used by pod2man in po4a's documentation, and (2) I stumbled across this procps-ng commit.)