Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug(doc): Case-sensitivity of ignored words #3210

Closed
buhtz opened this issue Nov 11, 2023 · 9 comments
Closed

Bug(doc): Case-sensitivity of ignored words #3210

buhtz opened this issue Nov 11, 2023 · 9 comments

Comments

@buhtz
Copy link

buhtz commented Nov 11, 2023

It might be because of my non-native English but in my understanding codespell behave different then described as in
https://github.com/codespell-project/codespell#ignoring-words .

There is written that ignore-patters are case-sensitive.
As an example Manuel (upper case first letter) (a usual first name in Europe) is recognized as an error and Manual is recommended.
That means when I want to allow/ignore it I should put Manuel (upper case fist letter) in the config files ignore-word-list. But this does not work. It work only if the pattern is lower case: Manuel is ignored when ignore-word-list=manuel.

That confuses me.

If this is the expected behavior that the linked README section might should be rewritten?

@DimitriPapadopoulos
Copy link
Collaborator

DimitriPapadopoulos commented Nov 11, 2023

When ignoring false positives, note that spelling errors are case-insensitive but words to ignore are case-sensitive. For example, the dictionary entry wrod will also match the typo Wrod, but to ignore it you must pass wrod.

It means typos you want to ignore must be passed to openfortivpn codespell using the same case as in the typo dictionary:

If you think you can rephrase in a better way, pull requests would be welcome.

@buhtz
Copy link
Author

buhtz commented Nov 11, 2023

What is openfortivpn?
As you describe the situation the ignore pattern is not case senstiv.

When the pattern is "manuel" and "Manuel", "manuel" and "MANUEL" are catched than this is not case sensitive.

I can not rephrase something that I do not understand. sorry.

@DimitriPapadopoulos
Copy link
Collaborator

DimitriPapadopoulos commented Nov 11, 2023

Let me rephrase. The contents of the typo dictionary are:


If you want codespell to ignore words Manuel, manuel, MANUEL (all of them will be ignored because this part of the code is case-insensitive), you need to write --ignore-word-list=manuel, with the exact case used in the typo dictionary above, --ignore-word-list=Manuel or --ignore-word-list=MANUEL won't work.

This is because internally codespell works like this:

  1. option --ignore-word-list filters out typos from the typo dictionaries in a case-sensitive way,
  2. then codespell uses the filtered typo dictionaries to check for typos in a case-insensitive way.

That's confusing, isn't it? There are two ways out of this confusion, better documenting the confusing behaviour or modifying the confusing behaviour. It might be too late to modify the behaviour. Better documentation would probably involve an explanation of the above steps, but then I feel end-users wouldn't be interested or shouldn't need to know the internals of codespell.

@buhtz
Copy link
Author

buhtz commented Nov 12, 2023

Do I got that right

manuel->manual 
AAAAAA  BBBBBB

The A part is the identifier in the dictionary and this is case sensitive.
But its value, the B part, is case insensitive?

Let me rephrase.

I do understand your explanation. But it might not be enough for a "good" docu.
Users don't care about content of the dict. The error is "Manuel->Manual". So a user will add "Manuel" in the ignore word list and wonder why it does not work.
Understanding that case-in-sensitive-dict-problem is to technical. Don't bother users with that.

Are all entries (A part) in the dict are lower case? If it is the case then just tell the user to all ways add a lower case into the ignore word or even better convert each entry to lower case yourself.

@DimitriPapadopoulos
Copy link
Collaborator

Absolutely, you got that right.

No, not all AAAAAA entries in the typo dictionary are currently lower-case, but I guess they could be made lower-case. But then, why do we insist on matching AAAAAA in a case-sensitive way? @peternewman?

@buhtz
Copy link
Author

buhtz commented Nov 15, 2023

I still don't get the behavior. The word "Manuel" is triggered by the dictionary entry "manuel->manual"? But codespell recommends "Manual" (the upper case version)?

Maybe you should refuse to use the term "case-sensitive" and "case-insensitive". From a users perspective the word I have to add to the ignore list ist not "case sensitive" in the usual way. Because to ignore "Manuel" I have to add "manuel" which are different lower/uper cases.
That "Manuel" wouldn't work in that list doesn't matter at this point from the users perspective. I do understand that technically it is case-sensitive but not from a users perspective.

Asked from a users perspective: How do I decide if I have to add the lower- or upper-case version of a term (e.g. "Manuel") I would like to ignore?

I am member of a maintenance team using codespell. And I have to explain my other members how to deal with situations like this in an easy and not time consuming way.

@DimitriPapadopoulos
Copy link
Collaborator

DimitriPapadopoulos commented Nov 15, 2023

It's actually a 3-step process:

  1. option --ignore-word-list filters out typos from the typo dictionaries in a case-sensitive way,
  2. Then codespell uses the filtered typo dictionaries to match typos in a case-insensitive way.
  3. codespell suggests how to fix typos using the case in the checked text, not the case in the typo dictionary.

@DimitriPapadopoulos
Copy link
Collaborator

DimitriPapadopoulos commented Nov 27, 2023

Perhaps all words in the dictionaries of typos should be in lower case. Or alternatively, matching in step 1 should be made case-insensitive. Not sure why it is case-sensitive.

@DimitriPapadopoulos
Copy link
Collaborator

See #3270 for making typos in dictionaries lowercase.

See #3272 for fixing the behaviour of --ignore-words-list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants