improve section detection #327

keewis · 2021-05-24T17:38:02Z

Potentially fixes #316.

This tries to allow parsing sections which are not separated by blank lines (there should probably be a warning in that case; I'll add that once the general idea has been approved). To get that to work I used a few tricks (e.g. adding an optional doc parameter to _is_at_section to allow calling it on a different reader), so it might need some refactoring before being truly ready.

cc @Carreau, my main motivation was trying to get velin to auto-fix this

keewis · 2021-05-24T23:18:04Z

yielding StopIteration seems like a bug:

numpydoc/numpydoc/docscrape.py

Lines 214 to 219 in 265ab91

    if name.startswith('..'):  # index section
        yield name, data[1:]
    elif len(data) < 2:
        yield StopIteration
    else:
        yield name, self._strip(data[2:])

because it will cause this to fail with an obscure error:

numpydoc/numpydoc/docscrape.py

Lines 380 to 381 in 265ab91

    sections = list(self._read_sections())
    section_names = set([section for section, content in sections])

I think this should be fixed, either by changing the yield value to yield StopIteration, StopIteration or yield "", [], or by raising an error instead (if that's what that was supposed to indicate).
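To make the failure mode concrete, here is a minimal sketch outside of numpydoc. The functions `read_sections` and `read_sections_fixed` and the `chunks` data are hypothetical stand-ins, not the actual `_read_sections` implementation; they only mirror the branch structure quoted above. The fixed variant uses the `yield "", []` option, which is just one of the alternatives mentioned.

```python
def read_sections(chunks):
    """Hypothetical stand-in that mirrors the quoted branches."""
    for name, data in chunks:
        if len(data) < 2:
            # This yields the exception class as an ordinary value;
            # it does not stop the generator.
            yield StopIteration
        else:
            yield name, data[2:]


def read_sections_fixed(chunks):
    """Same logic, but always yields a (name, content) pair."""
    for name, data in chunks:
        if len(data) < 2:
            yield "", []
        else:
            yield name, data[2:]


# Assumed example input: one well-formed section and one that is too short.
chunks = [
    ("Parameters", ["Parameters", "----------", "x : int"]),
    ("Broken", ["Broken"]),
]

sections = list(read_sections(chunks))
try:
    names = {section for section, content in sections}
    error = None
except TypeError as err:
    # Unpacking the StopIteration class into two names fails here.
    error = err

fixed_names = {section for section, content in read_sections_fixed(chunks)}
```

With the original branch, the unpacking in the set comprehension raises a `TypeError` that says nothing about docstrings or sections, which is why the error looks obscure from the caller's side.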

keewis · 2022-01-16T12:51:26Z

Ping. Does anyone have any comments on this?

rgommers · 2022-01-19T10:01:10Z

> Ping. Does anyone have any comments on this?

I think the discussion in gh-316 shows this is not desirable. Detecting in order to raise a better warning or even an exception makes sense though.

keewis · 2023-01-03T16:49:51Z

apologies for pinging and then forgetting about it for a year.

I implemented the requested change, such that it will now warn on every missing empty line (between summary and the first section, or between two sections).

The current implementation comes with a (slight?) performance regression because NumpyDocString._is_at_section is now called for every line, and I had to do a slightly ugly trick to make sure _is_at_section doesn't swallow empty lines.

Instead, I could also imagine doing a two-pass implementation: find all separators (multiple - or = per line) and check if those belong to a section which is not preceded by an empty line in the first pass, then run the actual extraction in the second pass. That might make the code a bit simpler and potentially faster, but every line would be visited twice (I'm not sure how much of an issue that would be).
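A rough sketch of what the first pass of such a two-pass approach could look like, written independently of numpydoc's reader classes. All names here (`is_underline`, `find_unseparated_sections`, `warn_unseparated_sections`) are hypothetical, and the underline check is a simplifying assumption: it treats any line made up only of `-` or only of `=` as a separator and does not compare its length to the header.

```python
import warnings


def is_underline(line):
    """Return True for a line made up only of '-' or only of '=' (two or more)."""
    stripped = line.strip()
    return len(stripped) >= 2 and set(stripped) in ({"-"}, {"="})


def find_unseparated_sections(lines):
    """First pass: indices of section headers not preceded by an empty line."""
    missing = []
    for i in range(1, len(lines)):
        if not is_underline(lines[i]):
            continue
        header = i - 1
        if not lines[header].strip():
            continue  # an underline without header text is not a section
        if header > 0 and lines[header - 1].strip():
            missing.append(header)
    return missing


def warn_unseparated_sections(lines):
    """Emit one warning per offending header; extraction would run afterwards."""
    for index in find_unseparated_sections(lines):
        warnings.warn(
            f"missing empty line before section {lines[index].strip()!r}"
        )


# Assumed example docstring, split into lines.
example = [
    "Summary line.",
    "Parameters",
    "----------",
    "x : int",
    "",
    "Returns",
    "-------",
    "int",
]
```

Here `find_unseparated_sections(example)` flags only the `Parameters` header, since `Returns` is preceded by an empty line. The second pass would then be the existing extraction, unchanged.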

Edit: I think the CI failures are unrelated (not sure, though)

keewis added 6 commits May 24, 2021 19:18

check that a section without a preceding blank line works

a924f1b

also check that removing the blank line between sections works

11a67c8

allow passing a different reader to _is_at_section

95ca79d

rewrite read_to_next_section in terms of _is_at_section

c7d223f

rewrite _parse_summary to only work on the summary

db4d8c8

make sure tracebacks are not detected as sections

4b43870

keewis added 10 commits January 3, 2023 12:59

expect warnings instead

7d768ba

Merge branch 'main' into section-detection

2339e1f

expect python warnings instead of sphinx warning log messages

845f0e0

remove the logging setup fixture

d0a2c1e

"read to next empty line" function that warns about missing empty lines

cc83cb5

make the custom function a method

5941477

detect missing empty lines between summary and the first section

21655e8

make the condition a bit clearer

022b4b9

return an empty string if peeking results in a negative list index

fa621ae

blank → empty

08e2ee2

Merge branch 'main' into section-detection

3c86eb7
