-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set forgiving encoding fallback when parsing Po file #515
Conversation
But i found it while building debian's pppconfig instead, which has an |
Printing warnings on the first pass, in case the encoding was not UTF-8, is a bit useless. However on the second pass, with the encoding parsed from the file itself it may be useful to print out some warnings (or even error-out). Thoughts? |
Hello, thanks for your contribution. I would indeed like to display the errors on problem. Also, I think we should add a test enforcing that we do the right thing in case of problems. I will try to do the test, but I would appreciate if you could make sure that the encoding errors are still displayed on need, please. Thanks, |
b00fa0b
to
6caa868
Compare
This revision will print warnings on the second pass only, when we request a specific charset. It also means that we will run a second pass with the same UTF-8 encoding in case the file did not specify |
Hello @aleasto, thanks for your contribution. Could you please add a little test example to your PR? You don't need to integrate it to the testing infrastructure (even if it'd be great if you could), but a source document and the corresponding config file would be necessary to ensure that this feature keeps working as expected in the future. Thanks in advance, |
Before reading the expected encoding from the file itself, we first try to read it as UTF-8 so we can at least get to read the line that specifies the encoding. For some reason, $PerlIO::encoding::fallback is set to FB_CROAK when we reach this part of the code so the program would instead exit at the first unknown character. Unless we were requested a specific encoding, do not even print any warning about it.
I couldn't quite figure out how to integrate the test, but you can verify that
fails without the first commit. |
The test seems to fail at the moment. |
Yea I'm aware.
|
I'll integrate the test on it at some point in the future, but right now I do not have the time to work on it. Any help would be welcome, of course. Sorry for the delay |
If I'm seeing correctly, the test is failing because the produced file is still in UTF-8 and not in Latin-1 as advised by the PO file. @aleasto, I'm a bit lost. What's the purpose of that test, actually? I don't see any Latin-1 char in the t/charset/implicit-iso8859/iso8859.pod input file. I would have thought that the implemented feature would be better tested with such a char? |
implicit-iso8859/iso8859.nb.po uses |
That's a weird local name, in this case, because it already exists: https://www.localeplanet.com/icu/nb/index.html I'd prefer a visible char, such as é or à, please. Then, I had the feeling that the only broken part of your test is that the produced translation file is in UTF8 despite the fact that the input was in another encoding. I'm not sure whether that's the wanted behavior, but if so, the test should be further fixed. Thanks a lot for your help, and sorry for being distant these days, real life is really packed. |
I'm not sure either. I'm replicating the setup of pppconfig assuming it is correct as it worked without complains before po4a 0.70. |
Sorry if I'm interjecting, but why not to fix the underlying issue (that po4a tries to read the whole file in a wrong encoding) in the first place? i.e. read only up to the first msgid/msgstr and only then imbue the file with a correct encoding. Besides that it feels like a setting to use relaxed attitude towards encoding should be global rather than local. And if it's local it should be reset after the function (i.e. set |
I think that this PR is superseeded by !538, which fixes the underlying problem. @aleasto could you please confirm that the current git version fixes the problem you had in the first place? Thanks, Mt |
Seems to work for the (manual) test case in this PR. |
Before reading the expected encoding from the file itself, we first try to read it as UTF-8 so we can at least get to read the line that specifies the encoding.
For some reason, $PerlIO::encoding::fallback is set to FB_CROAK when we reach this part of the code so the program would instead exit at the first unknown character.