Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VIVO 3828: Remove SDB in preparation to moving to new Jena. #374

Open
wants to merge 4 commits into
base: jena4-upgrade
Choose a base branch
from

Conversation

kaladay
Copy link
Contributor

@kaladay kaladay commented Feb 28, 2023

VIVO GitHub issue: vivo-project/VIVO#3828

What does this pull request do?

The Jena version being switched to (Jena 4) has removed support for SDB.

The current forms of DatasetWrapperFactory and StaticDatasetFactory should no longer be needed.

The SDB related code has been stripped out.
Many of the classes with "SDB" in the name that actually provide more than just SDB have been re-created.
These recreations are generally child classes of a similarly named Jena class.
These recreates have "DB" in their name rather than "SDB".

The DatasetFactory.createMem() is deprecated and may not be available in Jena 4.
Attempts to replace this with createTxnMem() have revealed problems that are potentially transaction related.
Replacing with createGeneral() might be possible but this is not done to avoid introducing more potential problems.

Notable points in regards to replacing DatasetFactory.createMem():

  1. The method is deprecated.
  2. The DatasetFactory.createGeneral() is the compatible equivalent.
  3. The DatasetFactory.createGeneral() better supports TDB (which is the main reason for including in this commit set).
  4. The DatasetFactory.createTxnMem() is the more recommended alternative and is transactional.
  5. This better prepares the code for the upcoming Jena upgrade changes.

There are some serious existing problems with closing dataset connections (and related).
The (now removed) DatasetWrapperFactory probably should be rewritten to provide a new dataset each time rather than copying an existing one.
The close() functionality appears to not be being called for TDB due to the SDB conditions.
With this SDB condition remove the connections would the become closed when closed() is called.
This has exposed several problems that both tests and runtime expose.
This problem is considered out of scope for the changes and the close code is commented out with a FIXME statement.

The documentation for example.runtime.properties refers to VitroConnection.DataSource.* only being used by SDB but this is incorrect.
The OpenSocial code appears to talk directly to an SQL database using these properties.
Update the documentation in this regard and replace the references to SDB with OpenSocial.

Remove no longer necessary SDB helper code that is added to TDB as well because it "shouldn't hurt".

Remove much of the documentation and installation functionality using or referencing SDB.

The IndividualDaoSDB.java file is implemting getAllIndividualUris() and getUpdatedSinceIterator() via the now removed IndivdiualSDB class.
This functionality is used by RebuildIndexTask and UpdateUrisTask classes.

The new *DB classes now have java docs.
These java docs have a bare-bones level of detail.

How should this be tested?

Load the i18n data and try to use via the web interface.

Additional Notes:

The DatasetWrapperFactory that is removed in this PR is a factory of DatasetWrapper as described. However, the DatasetWrapper being passed has a local reference of the Dataset. The same Dataset, as far as I can tell, is being passed around. Creating a new DatasetWrapper for the same Dataset does not seem helpful when if that same Dataset is closed anywhere, all of the different DatasetWrapper instantiations holding a reference to that same Dataset will in actuality could become closed even though their respective wrapper does not necessirly have the closed boolean set.

This behavior is a bit confusing to me so I removed the implementation as I described above under the "What does this pull request do?" section. If this is doing something I am not seeing or undersanding, then the DatasetWrapperFactory could be restored. I think it would be worth while to consider creating/instantiating a new Dataset when the a new DatasetWrapper is created via the DatasetWrapperFactory. This would possible be of help in resolving the closure problems described above.

The StaticDatasetFactory is confusing to me. This class appears to provide a non-static instance of a Dataset being shared globally and is treated as if it were static. I am uncertain on the implications of this design decision.

I described above about the deprecation of DatasetFactory.createMem();.
Further investigation has revealed that DatasetFactory.createMem(); is in fact removed as of Jena 4.0.0 and greater. The migration of that method call will be performed in the subsequent Jena migration related PRs.

Interested parties

@brianjlowe
@vivo-project/vivo-committers

The Jena version being switched to (Jena 4) has removed support for SDB.

The current forms of `DatasetWrapperFactory` and `StaticDatasetFactory` should no longer be needed.

The SDB related code has been stripped out.
Many of the classes with "SDB" in the name that actually provide more than just SDB have been re-created.
These recreations are generally child classes of a similarly named Jena class.
These recreates have "DB" in their name rather than "SDB".

The `DatasetFactory.createMem()` is deprecated and may not be available in Jena 4.
Attempts to replace this with `createTxnMem()` have revealed problems that are potentially transaction related.
Replacing with `createGeneral()` might be possible but this is not done to avoid introducing more potential problems.

Notable points in regards to replacing `DatasetFactory.createMem()`:
1) The method is deprecated.
2) The `DatasetFactory.createGeneral()` is the compatible equivalent.
3) The `DatasetFactory.createGeneral()` better supports TDB (which is the main reason for including in this commit set).
4) The `DatasetFactory.createTxnMem()` is the more recommended alternative and is transactional.
5) This better prepares the code for the upcoming Jena upgrade changes.

There are some serious existing problems with closing dataset connections (and related).
The (now removed) `DatasetWrapperFactory` probably should be rewritten to provide a new dataset each time rather than copying an existing one.
The `close()` functionality appears to not be being called for TDB due to the SDB conditions.
With this SDB condition the connections are now being closed.
This has exposed several problems that both tests and runtime expose.
This problem is considered out of scope for the changes and the close code is commented out with a FIXME statement.

The documentation for `example.runtime.properties` refers to `VitroConnection.DataSource.*` only being used by SDB but this is incorrect.
The OpenSocial code appears to talk directly to an SQL database using these properties.
Update the documentation in this regard and replace the references to SDB with OpenSocial.

Remove no longer necessary SDB helper code that is added to TDB as well because it "shouldn't hurt".

Remove much of the documentation and installation functionality using or referencing SDB.

The `IndividualDaoSDB.java` file is implemting `getAllIndividualUris()` and `getUpdatedSinceIterator()` via the now removed `IndivdiualSDB` class.
This functionality is used by `RebuildIndexTask` and `UpdateUrisTask` classes.

The new *DB classes now have java docs.
These java docs have a bare-bones level of detail.
@kaladay kaladay changed the base branch from main to jena4-upgrade March 6, 2023 21:41
@kaladay kaladay marked this pull request as ready for review March 6, 2023 21:45
@chenejac chenejac requested a review from brianjlowe March 7, 2023 07:26
Copy link
Contributor

@chenejac chenejac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaladay thanks a lot for this really great PR. I have posted three small comments/suggestions. Also, I was wondering for a moment whether some *DB files should be continuation of *SDB files (by using mvn rm), but it looks as it is not so similar, it is more re-creation and just extraction of smaller part of SDB classes, therefore no needs for any change there.

Comment on lines +55 to +57
// FIXME: This is causing NPEs and connection already closed errors in tests.
// Not closing the dataset can result in excessive memory or other resource usage.
/*if (!closed) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, it means multiple instances of DatasetWrapper class are used in VIVO, and those instances share the same Dataset instance, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is my understanding of the situation.

In addition to that, I suspect some of the problems are possibly race conditions as well.
The query execution may close its connection but it might do so after the dataset has been closed. This was observed by adding a system out print statement before the query execution and the NPE stopped happening.

kaladay and others added 2 commits March 9, 2023 08:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cleaning up the Jena SDB related code
2 participants