Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #(959): Fix TermStats.TermText access, add CLI comments and 1 CLI bug fix #963

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

rclabo
Copy link
Contributor

@rclabo rclabo commented Sep 25, 2024

  • You've read the Contributor Guide and Code of Conduct.
  • You've included unit or integration tests for your change, where applicable.
  • You've included inline docs for your change, where applicable.
  • There's an open issue for the PR that you are making. If you'd like to propose a change, please open an issue to discuss the change or find an existing issue.

Summary of the changes (Less than 80 chars)

Fixes #959: Fix TermStats.TermText access, add CLI comments and 1 CLI bug fix

Description

  1. Converts TermStats.TermText into a public property and changes it's case to follow .NET conventions.
  2. Code sweep of all the Main(args) methods that relate to the lucene-cli so they now have XML summary comments that indicate that they are not intended for direct use but rather are used by the lucene-cli. If the Main method is in a static class that looks to be designed solely for use from the command line then similar comments are added to the class.
  3. In the process of doing the code sweep to add comments, I discovered one cli command Lucene,Net.Cli.BenchmarkFindQualityQueriesCommand that was wired up to the wrong command class. So I fixed this bug as well by wiring it up the right one which is QualityQueriesFinder.Main(args).
  4. the 4th commit was due to realizing that I had one failing test due to me making TermStats.TermText a public field rather than a public property. So this commit fixes that and the failing unit test.

Copy link
Contributor

@NightOwl888 NightOwl888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

A few things that could be changed on all/most of these:

  1. It would be helpful would be to drop a link to the exact command in the documentation. While I was going to suggest /absoluteLatest here, it would be best if we linked these comments to the same version of the app that they occur in. Especially for version-sensitive commands such as index fix or index merge, which require the same version of lucene-cli to be installed as the version of Lucene.NET they are using. This is one of the downsides of not being able to run these commands on the software directly - it is up to the users to get the version right. If we had a single Powershell script that is run as part of the release procedure, we could update these versioned URLs and keep them in sync with the version that the user has installed (or if on the website, the version that they are currently viewing). Note that we could hold off on this and do it in a later PR.
  2. Many of these classes are for direct use, as they have public static methods that can be called when referenced as a DLL. Technically, there is nothing stopping someone from creating a custom wrapper console app that cascades the call to Main(object[]). Instead of focusing on what is not intended, perhaps we should just focus on the fact that the main method is intended to be called through lucene-cli and usually the page for that command has links and/or documentation that explains how to use the command.
  3. Some of the commands had no documentation that I could find, so they do little more than explain how to run the command (not what it is used for). At the time lucene-cli was documented, LLMs didn't exist. Perhaps we can provide more info if one or more of them is able to provide info.
  4. Change the wording from we provide a <see href="https://www.nuget.org/packages/lucene-cli">lucene-cli</see> with a command to we provide a <a href="https://learn.microsoft.com/en-us/dotnet/core/tools/global-tools">.NET tool</a>, <see href="https://www.nuget.org/packages/lucene-cli"\>lucene-cli</see\>, with a command.

The demos are a special case. See my inline comment for those.

/// method within a DLL can't be directly called from the command line so we
/// provide a <see href="https://www.nuget.org/packages/lucene-cli">lucene-cli</see>
/// with a command that maps to that method: analysis stempel-compile-stems.
/// </para>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is the correct way to use an HTML paragraph tag, but I have found that using a single closed tag between paragraphs also works (<para/>) and replaces the <p> tag they use in Java without the need of an additional closing tag. Since that is the way it is used in most of the other code comments, could you please change them in this PR, as well?

Copy link
Contributor Author

@rclabo rclabo Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, I went back and forth about that. but I like the <para>text </para> approach better in cases where a chunk of the text is luceneNET specific.

@@ -25,7 +26,7 @@ public class Configuration : ConfigurationBase
{
public Configuration(CommandLineOptions options)
{
this.Main = (args) => QueryDriver.Main(args);
this.Main = (args) => QualityQueriesFinder.Main(args);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@@ -20,6 +20,13 @@ namespace Lucene.Net.Analysis.Ja.Util
* limitations under the License.
*/

/// <summary>
/// LUCENENET specific: This class is not for direct use. In the Java implementation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, the DictionaryBuilder.Build() method can be used from a calling app by calling into the DLL.

And it looks like all of the dependent types are public, so this could be used as a console app by copy and paste if one were to also bring in all of the dependencies.

Maybe just delete this doc comment. There is a lot that could go here to explain how to use this, but I see that the blog post I used to test it is listed in our docs for this command: https://lucenenet.apache.org/docs/4.8.0-beta00016/cli/analysis/kuromoji-build-dictionary.html#description.

I also looked up the procedure on ChatGPT, if you are interested: https://chatgpt.com/share/66f50a60-8e38-8005-9c29-f7633024787c

Maybe we could explain here or on the command below that the instructions on how to use this are on that page, since if a user starts here, they won't see the usage instructions.

@rclabo
Copy link
Contributor Author

rclabo commented Sep 26, 2024

Hi Shad,
I appreciate the time you've taken to review the PR, but I have to admit that some of the feedback feels a bit discouraging. The comments in this contribution aim to significantly improve the documentation for these methods. While it's true that many of these static main(args) methods could technically be called by application code, they were designed with command-line usage in mind. I believe these comments provide valuable context for developers who might find such methods unusual in a .NET DLL. By pointing them to the Lucene CLI and even specifying the relevant commands, these comments help to clarify an otherwise puzzling aspect of the project.

In my view, the PR as it stands makes a positive contribution and moves the project forward. Rather than focusing on further refinement of the comments, I would suggest that the primary question should be whether this contribution, overall, improves the project. If it does, I believe it warrants approval.

Thanks again for your consideration!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

It's not possible to extract terms from HighFreqTerms
2 participants