-
Notifications
You must be signed in to change notification settings - Fork 467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Can't ignore an unknown acronym only when it's written in upper case #1578
Comments
Hi @vadz , Currently codespell is case-insensitive, so as per the readme:
So you need to use Our dictionary checking needs to be case-insensitive because for example a C define is normally written in upper case so we'd want to catch |
I have the same problem. I want to exclude |
How about filtering matches after looking them up in the dictionary? |
So you want to do it based on the case of the typo? Currently
Do you mean based on the correction? I don't really follow. |
That would ignore the whole line and I'd also have to specify every line that this word occurs in, right?
Using Right now I worked around it by writing a small wrapper that reads a |
Yes, not ideal but quite easy with grep. I do it quite a lot: It's not perfect, but for a lot of edge cases, like uint/unit, even case sensitivity wouldn't help. Actually for this specific use case, the ignore regex is probably perfect: Doh, just realised you're doing that already! You can vanish pInOut before you even start matching things. Nice trick with stripping the p's off the front by the way! Care to share your wrapper @Holzhaus as it might help others. @vadz I think the above would work for you too?
Understood. So you're talking about filtering based on the case of the MATCH rather than the dictionary entry. You probably either want two whitelists, or some way to flag whether each entry should be case sensitive or not. It's certainly possible! |
The wrapper also parses the git diff and filters out all results that are not coming from added lines (because our legacy codebase is quite large, and fixing other peoples typos in merge commits is annoying). |
@peternewman It looks like it ought to work, but I couldn't test it because of this:
I didn't look into it in details, so it's perfectly possible that I'm doing something wrong, but I couldn't actually test your suggestion... |
This is a builtin module going back to 2.7: https://docs.python.org/2.7/library/configparser.html Are you on 2.6? It's unsupported |
You need at least Python 3 I think. In python 2.7 the module has another name ( |
Oops, sorry, pip indeed uses 2.7.16 by default on my Debian 10 (Buster) system, I keep forgetting that I need to use pip3 explicitly. Its output was rather weird anyhow:
with the "Failed" line in ominous red, but it seems to have been installed successfully and I can confirm that adding |
You need to install wheel before installation:
|
I think this should do the trick for you: |
This works OK to ignore three letter upper case words (i.e. many acronyms):
Or being more clever, 2 or 3 letter acronyms:
However, this fails with consecutive words to be ignored via the regex, see #2049 for a potential solution. |
I think this is a pretty reasonable workflow:
I'd like steps 5 and 6 to be different. In order of preference:
|
Pull requests are welcome, but make sure you address components such as #1578 (comment). |
Just to be clear, would you consider a pull request where:
I believe this is also the suggestion in #1578 (comment) |
I am not the maintainer, but I think that would be a good, long-awaited change. I am only worried about compatibility. I don't have a clear idea (yet?) of what could possibly go wrong in existing codebases when we release such a change. Comments on that? Also, what about words in the dictionary themselves that are not all lowercase? Such as grep '^[A-Z]'$ grep '^[A-Z]' codespell_lib/data/dictionary*.txt
codespell_lib/data/dictionary_code.txt:MSDOS->MS-DOS
codespell_lib/data/dictionary.txt:ACI->ACPI
codespell_lib/data/dictionary.txt:ACII->ASCII
codespell_lib/data/dictionary.txt:Amercia->America
codespell_lib/data/dictionary.txt:Ardiuno->Arduino
codespell_lib/data/dictionary.txt:Australien->Australian
codespell_lib/data/dictionary.txt:Austrlaian->Australian
codespell_lib/data/dictionary.txt:Berkley->Berkeley
codespell_lib/data/dictionary.txt:Bernouilli->Bernoulli
codespell_lib/data/dictionary.txt:Blitzkreig->Blitzkrieg
codespell_lib/data/dictionary.txt:Bonnano->Bonanno
codespell_lib/data/dictionary.txt:Brasillian->Brazilian
codespell_lib/data/dictionary.txt:Britian->Britain
codespell_lib/data/dictionary.txt:Brittish->British
codespell_lib/data/dictionary.txt:Buddah->Buddha
codespell_lib/data/dictionary.txt:Buddist->Buddhist
codespell_lib/data/dictionary.txt:Caesarian->Caesarean
codespell_lib/data/dictionary.txt:Cambrige->Cambridge
codespell_lib/data/dictionary.txt:Capetown->Cape Town
codespell_lib/data/dictionary.txt:Carmalite->Carmelite
codespell_lib/data/dictionary.txt:Carnagie->Carnegie
codespell_lib/data/dictionary.txt:Carnagie-Mellon->Carnegie-Mellon
codespell_lib/data/dictionary.txt:Carnigie->Carnegie
codespell_lib/data/dictionary.txt:Carnigie-Mellon->Carnegie-Mellon
codespell_lib/data/dictionary.txt:Carribbean->Caribbean
codespell_lib/data/dictionary.txt:Carribean->Caribbean
codespell_lib/data/dictionary.txt:Carthagian->Carthaginian
codespell_lib/data/dictionary.txt:Cataline->Catiline, Catalina,
codespell_lib/data/dictionary.txt:Ceasar->Caesar
codespell_lib/data/dictionary.txt:Celcius->Celsius
codespell_lib/data/dictionary.txt:Cgywin->Cygwin
codespell_lib/data/dictionary.txt:Champange->Champagne
codespell_lib/data/dictionary.txt:Cincinatti->Cincinnati
codespell_lib/data/dictionary.txt:Cincinnatti->Cincinnati
codespell_lib/data/dictionary.txt:Conneticut->Connecticut
codespell_lib/data/dictionary.txt:Cyrllic->Cyrillic
codespell_lib/data/dictionary.txt:Dardenelles->Dardanelles
codespell_lib/data/dictionary.txt:DCHP->DHCP
codespell_lib/data/dictionary.txt:Decemer->December
codespell_lib/data/dictionary.txt:Dravadian->Dravidian
codespell_lib/data/dictionary.txt:EBCIDC->EBCDIC
codespell_lib/data/dictionary.txt:EDCDIC->EBCDIC
codespell_lib/data/dictionary.txt:Enlish->English, enlist,
codespell_lib/data/dictionary.txt:Europian->European
codespell_lib/data/dictionary.txt:Europians->Europeans
codespell_lib/data/dictionary.txt:Eurpean->European
codespell_lib/data/dictionary.txt:Eurpoean->European
codespell_lib/data/dictionary.txt:Fahrenheight->Fahrenheit
codespell_lib/data/dictionary.txt:Farenheight->Fahrenheit
codespell_lib/data/dictionary.txt:Farenheit->Fahrenheit
codespell_lib/data/dictionary.txt:Febuary->February
codespell_lib/data/dictionary.txt:Feburary->February
codespell_lib/data/dictionary.txt:Flemmish->Flemish
codespell_lib/data/dictionary.txt:Formalhaut->Fomalhaut
codespell_lib/data/dictionary.txt:Foundland->Newfoundland
codespell_lib/data/dictionary.txt:Fransiscan->Franciscan
codespell_lib/data/dictionary.txt:Fransiscans->Franciscans
codespell_lib/data/dictionary.txt:FTBS->FTBFS
codespell_lib/data/dictionary.txt:Galations->Galatians
codespell_lib/data/dictionary.txt:Gameboy->Game Boy
codespell_lib/data/dictionary.txt:Ghandi->Gandhi
codespell_lib/data/dictionary.txt:Godounov->Godunov
codespell_lib/data/dictionary.txt:Gothenberg->Gothenburg
codespell_lib/data/dictionary.txt:Gottleib->Gottlieb
codespell_lib/data/dictionary.txt:Guaduloupe->Guadalupe, Guadeloupe,
codespell_lib/data/dictionary.txt:Guadulupe->Guadalupe, Guadeloupe,
codespell_lib/data/dictionary.txt:Guatamala->Guatemala
codespell_lib/data/dictionary.txt:Guatamalan->Guatemalan
codespell_lib/data/dictionary.txt:Guilia->Giulia
codespell_lib/data/dictionary.txt:Guilio->Giulio
codespell_lib/data/dictionary.txt:Guiness->Guinness
codespell_lib/data/dictionary.txt:Guiseppe->Giuseppe
codespell_lib/data/dictionary.txt:Habsbourg->Habsburg
codespell_lib/data/dictionary.txt:Hallowean->Hallowe'en, Halloween,
codespell_lib/data/dictionary.txt:Hatian->Haitian
codespell_lib/data/dictionary.txt:Heidelburg->Heidelberg
codespell_lib/data/dictionary.txt:Hyperldger->Hyperledger
codespell_lib/data/dictionary.txt:I'sd->I'd
codespell_lib/data/dictionary.txt:Ihaca->Ithaca
codespell_lib/data/dictionary.txt:ISNB->ISBN
codespell_lib/data/dictionary.txt:Israelies->Israelis
codespell_lib/data/dictionary.txt:Istambul->Istanbul
codespell_lib/data/dictionary.txt:Japanes->Japanese
codespell_lib/data/dictionary.txt:Johanine->Johannine
codespell_lib/data/dictionary.txt:Jospeh->Joseph
codespell_lib/data/dictionary.txt:Laotion->Laotian, lotion,
codespell_lib/data/dictionary.txt:Laotions->Laotians, lotions,
codespell_lib/data/dictionary.txt:Linix->Linux
codespell_lib/data/dictionary.txt:Lybia->Libya
codespell_lib/data/dictionary.txt:Malcom->Malcolm
codespell_lib/data/dictionary.txt:Massachusettes->Massachusetts
codespell_lib/data/dictionary.txt:Massachussets->Massachusetts
codespell_lib/data/dictionary.txt:Massachussetts->Massachusetts
codespell_lib/data/dictionary.txt:Mediteranean->Mediterranean
codespell_lib/data/dictionary.txt:Michagan->Michigan
codespell_lib/data/dictionary.txt:Micorsoft->Microsoft
codespell_lib/data/dictionary.txt:Micosoft->Microsoft
codespell_lib/data/dictionary.txt:Microfost->Microsoft
codespell_lib/data/dictionary.txt:Microsfoft->Microsoft
codespell_lib/data/dictionary.txt:Microsft->Microsoft
codespell_lib/data/dictionary.txt:Microsof->Microsoft
codespell_lib/data/dictionary.txt:Microsofot->Microsoft
codespell_lib/data/dictionary.txt:Micrsft->Microsoft
codespell_lib/data/dictionary.txt:Micrsoft->Microsoft
codespell_lib/data/dictionary.txt:MingGW->MinGW
codespell_lib/data/dictionary.txt:Miscrosoft->Microsoft
codespell_lib/data/dictionary.txt:Misouri->Missouri
codespell_lib/data/dictionary.txt:Missisipi->Mississippi
codespell_lib/data/dictionary.txt:Missisippi->Mississippi
codespell_lib/data/dictionary.txt:Mississipi->Mississippi
codespell_lib/data/dictionary.txt:Mocrosoft->Microsoft
codespell_lib/data/dictionary.txt:Monserrat->Montserrat
codespell_lib/data/dictionary.txt:Montnana->Montana
codespell_lib/data/dictionary.txt:Mythraic->Mithraic
codespell_lib/data/dictionary.txt:Naploeon->Napoleon
codespell_lib/data/dictionary.txt:Napolean->Napoleon
codespell_lib/data/dictionary.txt:Napoleonian->Napoleonic
codespell_lib/data/dictionary.txt:Nazereth->Nazareth
codespell_lib/data/dictionary.txt:Newyorker->New Yorker
codespell_lib/data/dictionary.txt:Novemer->November
codespell_lib/data/dictionary.txt:Novermber->November
codespell_lib/data/dictionary.txt:Nullabour->Nullarbor
codespell_lib/data/dictionary.txt:Nuremburg->Nuremberg
codespell_lib/data/dictionary.txt:Octobor->October
codespell_lib/data/dictionary.txt:Palistian->Palestinian
codespell_lib/data/dictionary.txt:Palistinian->Palestinian
codespell_lib/data/dictionary.txt:Palistinians->Palestinians
codespell_lib/data/dictionary.txt:Papanicalou->Papanicolaou
codespell_lib/data/dictionary.txt:Peloponnes->Peloponnese, Peloponnesus,
codespell_lib/data/dictionary.txt:Pennyslvania->Pennsylvania
codespell_lib/data/dictionary.txt:Philipines->Philippines
codespell_lib/data/dictionary.txt:Phillipine->Philippine
codespell_lib/data/dictionary.txt:Phillippines->Philippines
codespell_lib/data/dictionary.txt:Phonecian->Phoenecian
codespell_lib/data/dictionary.txt:POSIX-complient->POSIX-compliant
codespell_lib/data/dictionary.txt:Postdam->Potsdam
codespell_lib/data/dictionary.txt:Premonasterians->Premonstratensians
codespell_lib/data/dictionary.txt:Pucini->Puccini
codespell_lib/data/dictionary.txt:Puertorrican->Puerto Rican
codespell_lib/data/dictionary.txt:Puertorricans->Puerto Ricans
codespell_lib/data/dictionary.txt:Queenland->Queensland
codespell_lib/data/dictionary.txt:Rockerfeller->Rockefeller
codespell_lib/data/dictionary.txt:Russion->Russian
codespell_lib/data/dictionary.txt:Sacremento->Sacramento
codespell_lib/data/dictionary.txt:Sagitarius->Sagittarius
codespell_lib/data/dictionary.txt:Sanhedrim->Sanhedrin
codespell_lib/data/dictionary.txt:Septemer->September
codespell_lib/data/dictionary.txt:Sionist->Zionist
codespell_lib/data/dictionary.txt:Sionists->Zionists
codespell_lib/data/dictionary.txt:Sixtin->Sistine, Sixteen,
codespell_lib/data/dictionary.txt:Skagerak->Skagerrak
codespell_lib/data/dictionary.txt:Soalris->Solaris
codespell_lib/data/dictionary.txt:Thse->these, this,
codespell_lib/data/dictionary.txt:Tolkein->Tolkien
codespell_lib/data/dictionary.txt:Toosday->Tuesday
codespell_lib/data/dictionary.txt:Tosday->Tuesday, today,
codespell_lib/data/dictionary.txt:Tuscon->Tucson
codespell_lib/data/dictionary.txt:Twosday->Tuesday
codespell_lib/data/dictionary.txt:Ukranian->Ukrainian
codespell_lib/data/dictionary.txt:Unicde->Unicode
codespell_lib/data/dictionary.txt:UnitesStates->UnitedStates
codespell_lib/data/dictionary.txt:UTF8ness->UTF-8-ness
codespell_lib/data/dictionary.txt:Voight->Voigt
codespell_lib/data/dictionary.txt:Vulacn->Vulcan
codespell_lib/data/dictionary.txt:Vulakn->Vulkan
codespell_lib/data/dictionary.txt:Yementite->Yemenite, Yemeni, @larsoner Would you agree to something like #1578 (comment)? |
Sure, and if we're really worried about backward compat we can make the behavior opt in with |
Interestingly, it seems like
Meanwhile, the documentation under
I'm going to focus on the "normal" ignore list for this issue/PR and leave the behavior for URI unchanged I guess, but maybe the help should be reworded too? |
For now, let's focus on improving the behaviour of codespell. Then we can improve the documentation. It seems impossible to provide decent, comprehensible documentation with the current behaviour. See discussion in #3210 and specifically #3210 (comment). |
I have a lot of occurrences of the word SEH in my code and I'd like codespell to leave it alone, but this doesn't seem to work, using neither
-I
nor-L
option, e.g.:The problem seems to be the case, i.e. if I use
-L seh
, it does recognize "SEH", but I'd prefer not to have to do it, as "seh" is much more likely to be a misspelling of "she" than anything else -- "SEH" itself is always written in upper case.I'm not sure if it's the same as #1128, as it happens for camel case identifier there, while my problem doesn't seem to be related to camel case (or an identifier, for that matter, as the problematic acronym only occurs inside comments), sorry in advance if it's the same bug and so an already known problem.
P.S. This is using 1.17.1 from Debian unstable.
The text was updated successfully, but these errors were encountered: