Merge pull request #8302 from IQSS/develop

v5.9

kcondon authored Dec 9, 2021
2 parents 9161cd6 + bbe58f2 commit fb24c87

Showing 103 changed files with 4,341 additions and 603 deletions.
1 change: 0 additions & 1 deletion conf/docker-aio/0prep_deps.sh

    @@ -6,7 +6,6 @@ wdir=`pwd`
     
     if [ ! -e dv/deps/payara-5.2021.5.zip ]; then
         echo "payara dependency prep"
    -    # no more fiddly patching :)
         wget https://s3-eu-west-1.amazonaws.com/payara.fish/Payara+Downloads/5.2021.5/payara-5.2021.5.zip -O dv/deps/payara-5.2021.5.zip
     fi
6 changes: 3 additions & 3 deletions conf/docker-aio/1prep.sh

    @@ -12,10 +12,10 @@ cd ../../
     cp -r scripts conf/docker-aio/testdata/
     cp doc/sphinx-guides/source/_static/util/createsequence.sql conf/docker-aio/testdata/doc/sphinx-guides/source/_static/util/
     
    -wget -q https://downloads.apache.org/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz
    -tar xfz apache-maven-3.6.3-bin.tar.gz
    +wget -q https://downloads.apache.org/maven/maven-3/3.8.4/binaries/apache-maven-3.8.4-bin.tar.gz
    +tar xfz apache-maven-3.8.4-bin.tar.gz
     mkdir maven
    -mv apache-maven-3.6.3/* maven/
    +mv apache-maven-3.8.4/* maven/
     echo "export JAVA_HOME=/usr/lib/jvm/jre-openjdk" > maven/maven.sh
     echo "export M2_HOME=../maven" >> maven/maven.sh
     echo "export MAVEN_HOME=../maven" >> maven/maven.sh
2 changes: 1 addition & 1 deletion conf/docker-aio/c8.dockerfile

    @@ -30,7 +30,7 @@ RUN cd /opt ; unzip /tmp/dv/deps/payara-5.2021.5.zip ; ln -s /opt/payara5 /opt/g
     # this dies under Java 11, do we keep it?
     #COPY domain-restmonitor.xml /opt/payara5/glassfish/domains/domain1/config/domain.xml
     
    -RUN sudo -u postgres /usr/pgsql-13/bin/initdb -D /var/lib/pgsql/13/data
    +RUN sudo -u postgres /usr/pgsql-13/bin/initdb -D /var/lib/pgsql/13/data -E 'UTF-8'
     
     # copy configuration related files
     RUN cp /tmp/dv/pg_hba.conf /var/lib/pgsql/13/data/
2 changes: 1 addition & 1 deletion doc/release-notes/5.8-release-notes.md

    @@ -6,7 +6,7 @@ This release brings new features, enhancements, and bug fixes to the Dataverse S
     
     ### Support for Data Embargoes
     
    -The Dataverse Software now supports file-level embargoes. The ability to set embargoes, up to a maximum duration (in months), can be configured by a Dataverse installation administrator. For more information, see the [Embargoes section](https://guides.dataverse.org/en/5.8/user/dataset-management.rst#embargoes) of the Dataverse Software Guides.
    +The Dataverse Software now supports file-level embargoes. The ability to set embargoes, up to a maximum duration (in months), can be configured by a Dataverse installation administrator. For more information, see the [Embargoes section](https://guides.dataverse.org/en/5.8/user/dataset-management.html#embargoes) of the Dataverse Software Guides.
     
     - Users can configure a specific embargo, defined by an end date and a short reason, on a set of selected files or an individual file, by selecting the 'Embargo' menu item and entering information in a popup dialog. Embargoes can only be set, changed, or removed before a file has been published. After publication, only Dataverse installation administrators can make changes, using an API.
171 changes: 171 additions & 0 deletions doc/release-notes/5.9-release-notes.md
# Dataverse Software 5.9

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

## Release Highlights

### Dataverse Collection Page Optimizations

The Dataverse Collection page, which also serves as the search page and the homepage in most Dataverse installations, has been optimized, with a specific focus on reducing the number of queries for each page load. These optimizations will be more noticeable on Dataverse installations with higher traffic.

### Support for HTTP "Range" Header for Partial File Downloads

Dataverse now supports the HTTP "Range" header, which allows users to download parts of a file. Here are some examples:

- `bytes=0-9` gets the first 10 bytes.
- `bytes=10-19` gets 10 bytes from the middle.
- `bytes=-10` gets the last 10 bytes.
- `bytes=9-` gets everything from offset 9 to the end, i.e., all bytes except the first 9.

Only a single range is supported. For more information, see the [Data Access API](https://guides.dataverse.org/en/5.9/api/dataaccess.html) section of the API Guide.
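The range arithmetic above is easy to get wrong by one, so here is a small self-contained sketch of the `bytes=start-end` semantics (an illustration only, not Dataverse's server-side code):

```python
def apply_range(data: bytes, range_header: str) -> bytes:
    """Return the slice of `data` described by a single HTTP Range header.

    Illustrates the "bytes=start-end" semantics only; the real handling
    happens inside the Dataverse Data Access API.
    """
    spec = range_header[len("bytes="):]
    start, _, end = spec.partition("-")
    if start == "":                   # suffix form "bytes=-N": the last N bytes
        return data[-int(end):]
    if end == "":                     # open form "bytes=N-": offset N to the end
        return data[int(start):]
    return data[int(start):int(end) + 1]  # both endpoints are inclusive

data = bytes(range(100))
assert apply_range(data, "bytes=0-9") == data[0:10]     # first 10 bytes
assert apply_range(data, "bytes=10-19") == data[10:20]  # 10 bytes from the middle
assert apply_range(data, "bytes=-10") == data[90:]      # last 10 bytes
assert apply_range(data, "bytes=9-") == data[9:]        # all but the first 9 bytes
```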

### Support for Optional External Metadata Validation Scripts

The Dataverse Software now allows an installation administrator to provide custom scripts for additional metadata validation when datasets are being published and/or when Dataverse collections are being published or modified. The Harvard Dataverse Repository has been using this mechanism to combat content that violates our Terms of Use, specifically spam content. All the validation or verification logic is defined in these external scripts, thus making it possible for an installation to add checks custom-tailored to its needs.

Please note that only the metadata are subject to these validation checks. This does not check the content of any uploaded files.

For more information, see the [Database Settings](https://guides.dataverse.org/en/5.9/installation/config.html) section of the Guide. The new settings are listed below, in the "New JVM Options and DB Settings" section of these release notes.
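As a sketch of what such an external script might look like, here is a hypothetical validator that rejects spammy dataset titles. The invocation contract assumed here (a path to a JSON metadata export as the first argument; exit status 0 for pass, non-zero for fail) and the metadata shape are illustrative assumptions, so check the Database Settings section of the guide for the actual interface:

```python
#!/usr/bin/env python3
"""Hypothetical external metadata validator: reject spam-looking titles.

Assumes the script is handed a path to a JSON metadata export and signals
failure with a non-zero exit status; verify the real contract in the guide.
"""
import json
import sys

BLOCKLIST = ("buy now", "free download", "casino")

def title_is_spam(metadata: dict) -> bool:
    # Hypothetical export shape: {"title": "..."} -- adjust to the real format.
    title = metadata.get("title", "").lower()
    return any(term in title for term in BLOCKLIST)

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        sys.exit(1 if title_is_spam(json.load(f)) else 0)
```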

### Displaying Author's Identifier as Link

In the dataset page's metadata tab, the author's identifier is now displayed as a clickable link, which points to the author's profile page in the external service (ORCID, VIAF, etc.) in cases where the identifier scheme provides a resolvable landing page. If the identifier does not match the expected scheme, a link is not shown.
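A minimal sketch of this behavior follows, with hypothetical scheme-to-URL patterns; the real list of supported schemes and their resolvers lives in Dataverse's metadata configuration, not in this snippet:

```python
import re
from typing import Optional

# Hypothetical resolver patterns and expected identifier formats.
RESOLVERS = {
    "ORCID": ("https://orcid.org/{id}",
              re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")),
    "VIAF": ("https://viaf.org/viaf/{id}", re.compile(r"^\d+$")),
}

def author_identifier_link(scheme: str, identifier: str) -> Optional[str]:
    """Return a profile URL, or None when the scheme is unknown or the
    identifier does not match the expected format (no link is shown)."""
    entry = RESOLVERS.get(scheme)
    if entry is None:
        return None
    url_pattern, id_format = entry
    if not id_format.match(identifier):
        return None
    return url_pattern.format(id=identifier)

assert author_identifier_link("ORCID", "0000-0002-1825-0097") == \
    "https://orcid.org/0000-0002-1825-0097"
assert author_identifier_link("ORCID", "not-an-orcid") is None
```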

### Auxiliary File API Enhancements

This release includes updates to the Auxiliary File API. These updates include:

- Auxiliary files can now also be associated with non-tabular files
- Auxiliary files can now be deleted
- Duplicate Auxiliary files can no longer be created
- A new API has been added to list Auxiliary files by their origin
- Some auxiliary files were being saved with the wrong content type (MIME type), but now the user can supply the content type on upload, overriding the type that would otherwise be assigned
- Improved error reporting
- A bugfix involving checksums for Auxiliary files

Please note that the Auxiliary files feature is experimental and is designed to support integration with tools from the [OpenDP Project](https://opendp.org). If the API endpoints are not needed they can be blocked.
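As a sketch, the calls might be assembled as below. The endpoint paths and the `curl` upload shown in the comment are assumptions based on this release's API Guide and should be verified there:

```python
# Hypothetical base URL of a Dataverse installation.
BASE = "https://demo.dataverse.example"

def aux_file_url(file_id: int, format_tag: str, format_version: str) -> str:
    """Assumed path for uploading/downloading/deleting one auxiliary file."""
    return f"{BASE}/api/access/datafile/{file_id}/auxiliary/{format_tag}/{format_version}"

def aux_files_by_origin_url(file_id: int, origin: str) -> str:
    """Assumed path for the new list-by-origin endpoint."""
    return f"{BASE}/api/access/datafile/{file_id}/auxiliary/{origin}"

# Hypothetical upload via curl (the content type is now user-suppliable):
#   curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
#     -F 'file=@dp.json;type=application/json' \
#     "$SERVER/api/access/datafile/42/auxiliary/dpJson/v1"

assert aux_file_url(42, "dpJson", "v1") == \
    "https://demo.dataverse.example/api/access/datafile/42/auxiliary/dpJson/v1"
```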

## Major Use Cases and Infrastructure Enhancements

Newly-supported major use cases in this release include:

- The Dataverse collection page has been optimized, resulting in quicker load times on one of the most common pages in the application (Issue #7804, PR #8143)
- Users will now be able to specify a certain byte range in their downloads via API, allowing for downloads of file parts. (Issue #6397, PR #8087)
- A Dataverse installation administrator can now set up metadata validation for datasets and Dataverse collections, allowing for publish-time and create-time checks for all content. (Issue #8155, PR #8245)
- Users will be provided with clickable links to authors' ORCIDs and other IDs in the dataset metadata (Issue #7978, PR #7979)
- Users will now be able to associate Auxiliary files with non-tabular files (Issue #8235, PR #8237)
- Users will no longer be able to create duplicate Auxiliary files (Issue #8235, PR #8237)
- Users will be able to delete Auxiliary files (Issue #8235, PR #8237)
- Users can retrieve a list of Auxiliary files based on their origin (Issue #8235, PR #8237)
- Users will be able to supply the content type of Auxiliary files on upload (Issue #8241, PR #8282)
- The indexing process has been updated so that datasets with fewer files are indexed first, resulting in fewer failures and making it easier to identify problematically large datasets. (Issue #8097, PR #8152)
- Users will no longer be able to create metadata records with problematic special characters, which would later require Dataverse installation administrator intervention and a database change (Issue #8018, PR #8242)
- The Dataverse software will now appropriately recognize files with the .geojson extension as GeoJSON files rather than "unknown" (Issue #8261, PR #8262)
- A Dataverse installation administrator can now retrieve more information about role deletion from the ActionLogRecord (Issue #2912, PR #8211)
- Users will be able to use a new role to allow a user to respond to file download requests without also giving them the power to manage the dataset (Issue #8109, PR #8174)
- Users will no longer be forced to update their passwords when moving from Dataverse 3.x to Dataverse 4.x (PR #7916)
- Improved accessibility of buttons on the Dataset and File pages (Issue #8247, PR #8257)
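As an illustration of the special-character screening mentioned in the list above, here is a hypothetical check that flags control characters and unassigned code points; the characters Dataverse actually rejects are defined by its own validation, not by this snippet:

```python
import unicodedata

def has_problematic_chars(value: str) -> bool:
    """Flag non-printable control characters (tab/newline/CR excepted) and
    unassigned code points in a metadata value. Illustrative only."""
    for ch in value:
        if ch in "\t\n\r":
            continue
        if unicodedata.category(ch) in ("Cc", "Cn"):
            return True
    return False

assert has_problematic_chars("bad \x07 title")          # BEL control character
assert not has_problematic_chars("Ocean temperature data (2019)")
```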

## Notes for Dataverse Installation Administrators

### Indexing Performance on Datasets with Large Numbers of Files

We discovered that whenever a full reindexing needs to be performed, datasets with large numbers of files take an exceptionally long time to index. For example, in the Harvard Dataverse Repository, a dataset with 25,000 files takes several hours. In situations where the Solr index needs to be erased and rebuilt from scratch (such as a Solr version upgrade or a corrupt index), this can significantly delay the repopulation of the search catalog.

We are still investigating the reasons behind this performance issue. For now, even though some improvements have been made, a dataset with thousands of files is still going to take a long time to index. In this release, we've made a simple change to the reindexing process, to index any such datasets at the very end of the batch, after all the datasets with fewer files have been reindexed. This does not improve the overall reindexing time, but will repopulate the bulk of the search index much faster for the users of the installation.
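The reordering described above amounts to sorting the batch by file count so that small datasets are indexed first and the largest land at the end; a simplified sketch (the real indexing service is considerably more involved):

```python
def reindex_order(datasets: list) -> list:
    """Order a reindexing batch so datasets with fewer files go first,
    repopulating the bulk of the search index as quickly as possible."""
    return sorted(datasets, key=lambda d: d["file_count"])

batch = [
    {"doi": "doi:10.0/A", "file_count": 25_000},  # hypothetical DOIs
    {"doi": "doi:10.0/B", "file_count": 3},
    {"doi": "doi:10.0/C", "file_count": 140},
]
assert [d["doi"] for d in reindex_order(batch)] == \
    ["doi:10.0/B", "doi:10.0/C", "doi:10.0/A"]
```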

### Custom Analytics Code Changes

This release fixes a bug in the sample analytics code that broke tracking within the dataset files table. You should update your installation's custom analytics code to restore that tracking.

For more information, see the documentation and sample analytics code snippet provided in the [Installation Guide](http://guides.dataverse.org/en/5.9/installation/config.html#web-analytics-code). The updated code can be used with any Dataverse Software version 5.4 or later.

### New ManageFilePermissions Permission

Dataverse can now support a use case in which an Admin or Curator would like to delegate the ability to grant access to restricted files to other users. This can be implemented by creating a custom role (e.g. DownloadApprover) that has the new ManageFilePermissions permission. This release introduces the new permission, and a Flyway script adjusts the existing Admin and Curator roles so they continue to have the ability to grant file download requests.
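A sketch of what such a custom role might look like as JSON; the alias and the role-creation endpoint shown in the comment are assumptions to be checked against the Native API guide:

```python
import json

# Hypothetical "approver" role built around the new permission.
role = {
    "alias": "downloadApprover",   # hypothetical alias
    "name": "Download Approver",
    "description": "Can respond to restricted-file download requests "
                   "without full dataset management rights.",
    "permissions": ["ManageFilePermissions"],
}
payload = json.dumps(role)

# Assumed creation call (verify the endpoint in the Native API guide):
#   curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-Type: application/json" \
#     -X POST "$SERVER/api/roles?dvo=root" --upload-file role.json

assert json.loads(payload)["permissions"] == ["ManageFilePermissions"]
```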

### Thumbnail Defaults

New *default* values have been added for the JVM settings `dataverse.dataAccess.thumbnail.image.limit` and `dataverse.dataAccess.thumbnail.pdf.limit`, of 3MB and 1MB respectively. This means that, *unless specified otherwise* by the JVM settings already in your domain configuration, the application will skip attempting to generate thumbnails for image files and PDFs that are above these size limits.
In previous versions, if these limits were not explicitly set, the application would attempt to create thumbnails for files of unlimited size, which would occasionally cause problems with very large images.
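The decision logic amounts to a simple size check. A sketch follows; the defaults mirror this section, but the actual enforcement happens inside Dataverse, not in installation-side code:

```python
from typing import Optional

# Defaults described above: 3MB for images, 1MB for PDFs.
DEFAULT_LIMITS = {"image": 3 * 1024**2, "pdf": 1 * 1024**2}

def should_generate_thumbnail(kind: str, size_bytes: int,
                              configured_limit: Optional[int] = None) -> bool:
    """Skip thumbnail generation when the file exceeds the configured
    (or default) size limit for its kind ("image" or "pdf")."""
    limit = configured_limit if configured_limit is not None else DEFAULT_LIMITS[kind]
    return size_bytes <= limit

assert should_generate_thumbnail("image", 2 * 1024**2)    # under the 3MB default
assert not should_generate_thumbnail("pdf", 5 * 1024**2)  # over the 1MB default
# An explicit JVM setting still wins over the default:
assert should_generate_thumbnail("pdf", 5 * 1024**2, configured_limit=10 * 1024**2)
```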

## New JVM Options and DB Settings

The following DB settings allow configuration of the external metadata validator:

- :DataverseMetadataValidatorScript
- :DataverseMetadataPublishValidationFailureMsg
- :DataverseMetadataUpdateValidationFailureMsg
- :DatasetMetadataValidatorScript
- :DatasetMetadataValidationFailureMsg
- :ExternalValidationAdminOverride

See the [Database Settings](https://guides.dataverse.org/en/5.9/installation/config.html) section of the Guides for more information.
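Database settings like these are applied through the admin settings API (`PUT /api/admin/settings/:Name`). A minimal sketch of building such a request with the standard library; the validator script path below is hypothetical:

```python
from urllib import request

def put_setting(server: str, name: str, value: str) -> request.Request:
    """Build (but do not send) a PUT to the Dataverse admin settings API.
    Pass the result to request.urlopen() from a host where the admin API
    is unblocked to actually apply the setting."""
    return request.Request(f"{server}/api/admin/settings/{name}",
                           data=value.encode(), method="PUT")

req = put_setting("http://localhost:8080",
                  ":DatasetMetadataValidatorScript",
                  "/usr/local/bin/validate_dataset_metadata.py")  # hypothetical path
assert req.get_method() == "PUT"
assert req.full_url.endswith("/api/admin/settings/:DatasetMetadataValidatorScript")
```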

## Notes for Developers and Integrators

Two sections of the Developer Guide have been updated:

- Instructions on how to sync a PR in progress with develop have been added in the version control section
- Guidance on avoiding inefficiencies in JSF render logic has been added to the "Tips" section

## Complete List of Changes

For the complete list of code changes in this release, see the [5.9 Milestone](https://github.com/IQSS/dataverse/milestone/100?closed=1) in GitHub.

For help with upgrading, installing, or general questions please post to the [Dataverse Community Google Group](https://groups.google.com/forum/#!forum/dataverse-community) or email [email protected].

## Installation

If this is a new installation, please see our [Installation Guide](https://guides.dataverse.org/en/5.9/installation/). Please also contact us to get added to the [Dataverse Project Map](https://guides.dataverse.org/en/5.9/installation/config.html#putting-your-dataverse-installation-on-the-map-at-dataverse-org) if you have not done so already.

## Upgrade Instructions

0\. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the [Dataverse Software 5 Release Notes](https://github.com/IQSS/dataverse/releases/tag/v5.0). After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.9.

If you are running Payara as a non-root user (and you should be!), **remember not to execute the commands below as root**. Use `sudo` to change to that user first. For example, `sudo -i -u dataverse` if `dataverse` is your dedicated application user.

In the following commands we assume that Payara 5 is installed in `/usr/local/payara5`. If not, adjust as needed.

`export PAYARA=/usr/local/payara5`

(or `setenv PAYARA /usr/local/payara5` if you are using a `csh`-like shell)

1\. Undeploy the previous version.

- `$PAYARA/bin/asadmin list-applications`
- `$PAYARA/bin/asadmin undeploy dataverse<-version>`

2\. Stop Payara and remove the generated directory

- `service payara stop`
- `rm -rf $PAYARA/glassfish/domains/domain1/generated`

3\. Start Payara

- `service payara start`

4\. Deploy this version.

- `$PAYARA/bin/asadmin deploy dataverse-5.9.war`

5\. Restart Payara

- `service payara stop`
- `service payara start`

6\. Run ReExportall to update JSON Exports

Follow the directions in the [Admin Guide](http://guides.dataverse.org/en/5.9/admin/metadataexport.html?highlight=export#batch-exports-through-the-api).

## Additional Release Steps

(for installations collecting web analytics)

1\. Update custom analytics code per the [Installation Guide](http://guides.dataverse.org/en/5.9/installation/config.html#web-analytics-code).

(for installations with GeoJSON files)

1\. Redetect GeoJSON files to update the type from "Unknown" to GeoJSON, following the directions in the [API Guide](https://guides.dataverse.org/en/5.9/api/native-api.html#redetect-file-type).

2\. Kick off a full reindex, following the directions in the [Admin Guide](http://guides.dataverse.org/en/5.9/admin/solr-search-index.html).
    @@ -117,7 +117,7 @@
         var row = target.parents('tr')[0];
         if(row != null) {
             //finds the file id/DOI in the Dataset page
    -        label = $(row).find('div.file-metadata-block > a').attr('href');
    +        label = $(row).find('td.col-file-metadata a').attr('href');
         } else {
             //finds the file id/DOI in the file page
             label = $('#fileForm').attr('action');
7 changes: 7 additions & 0 deletions doc/sphinx-guides/source/admin/dataverses-datasets.rst

    @@ -33,6 +33,13 @@ Removes a link between a Dataverse collection and another Dataverse collection.
     
         curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE http://$SERVER/api/dataverses/$linked-dataverse-alias/deleteLink/$linking-dataverse-alias
     
    +List Dataverse Collection Links
    +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    +
    +Provides information about whether a certain Dataverse collection ($dataverse-alias) is linked to or links to another collection. Only accessible to superusers. ::
    +
    +    curl -H "X-Dataverse-key:$API_TOKEN" http://$SERVER/api/dataverses/$dataverse-alias/links
    +
     Add Dataverse Collection RoleAssignments to Dataverse Subcollections
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
24 changes: 24 additions & 0 deletions doc/sphinx-guides/source/admin/monitoring.rst
    @@ -121,3 +121,27 @@ EJB Timers
     Should you be interested in monitoring the EJB timers, this script may be used as an example:
     
     .. literalinclude:: ../_static/util/check_timer.bash
    +
    +AWS RDS
    +-------
    +
    +Some installations of Dataverse use AWS's "database as a service" offering called RDS (Relational Database Service) so it's worth mentioning some monitoring tips here.
    +
    +There are two documents that are especially worth reviewing:
    +
    +- `Monitoring an Amazon RDS DB instance <https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Monitoring.html>`_: The official documentation.
    +- `Performance Monitoring Workshop for RDS PostgreSQL and Aurora PostgreSQL <https://rdspg-monitoring.workshop.aws/en/intro.html>`_: A workshop that steps through practical examples and even includes labs featuring tools to generate load.
    +
    +Tips:
    +
    +- Enable **Performance Insights**. The `product page <https://aws.amazon.com/rds/performance-insights/>`_ includes a `video from 2017 <https://youtu.be/4462hcfkApM>`_ that is still compelling today. For example, the `Top SQL <https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.UsingDashboard.Components.AvgActiveSessions.TopLoadItemsTable.TopSQL.html>`_ tab shows the SQL queries that are contributing the most to database load. There's also a `video from 2018 <https://www.youtube.com/watch?v=yOeWcPBT458>`_ mentioned in the `overview <https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.Overview.html>`_ that's worth watching.
    +
    +- Note that Performance Insights is only available for `PostgreSQL 10 and higher <https://aws.amazon.com/about-aws/whats-new/2018/04/rds-performance-insights-on-rds-for-postgresql/>`_ (also mentioned `in docs <https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.Overview.Engines.html>`_). Version 11 has digest statistics enabled automatically but there's an `extra step <https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.UsingDashboard.AnalyzeDBLoad.AdditionalMetrics.PostgreSQL.html#USER_PerfInsights.UsingDashboard.AnalyzeDBLoad.AdditionalMetrics.PostgreSQL.digest>`_ for version 10.
    +- `Performance Insights policies <https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.access-control.html>`_ describes how to give access to Performance Insights to someone who doesn't have full access to RDS (``AmazonRDSFullAccess``).
    +
    +- Enable the **slow query log** and consider using pgbadger to analyze the log files. Set ``log_min_duration_statement`` to "5000", for example, to log all queries that take 5 seconds or more. See `enable query logging <https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_LogAccess.Concepts.PostgreSQL.html#USER_LogAccess.Concepts.PostgreSQL.Query_Logging>`_ in the user guide or `slides <https://rdspg-monitoring.workshop.aws/en/postgresql-logs/enable-slow-query-log.html>`_ from the workshop for details. Using pgbadger is also mentioned as a `common DBA task <https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.html#Appendix.PostgreSQL.CommonDBATasks.Badger>`_.
    +- Use **CloudWatch**. CloudWatch gathers metrics about CPU utilization from the hypervisor for a DB instance. It's a separate service to log into so access can be granted more freely than to RDS. See `CloudWatch docs <https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/monitoring-cloudwatch.html>`_.
    +- Use **Enhanced Monitoring**. Enhanced Monitoring gathers its metrics from an agent on the instance. See `Enhanced Monitoring docs <https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Monitoring.OS.html>`_.
    +- It's possible to view and act on **RDS Events** such as snapshots, parameter changes, etc. See `Working with Amazon RDS events <https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.access-control.html>`_ for details.
    +- RDS monitoring is available via API and the ``aws`` command line tool. For example, see `Retrieving metrics with the Performance Insights API <https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.API.html>`_.
    +- To play with monitoring RDS using a server configured by `dataverse-ansible <https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible>`_ set ``use_rds`` to true to skip some steps that aren't necessary when using RDS. See also the :doc:`/developers/deployment` section of the Developer Guide.