Dataverse Software 5.7

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Experimental Support for External Vocabulary Services

Dataverse can now be configured to associate specific metadata fields with third-party vocabulary services to provide an easy way for users to select values from those vocabularies. The mapping involves use of external Javascripts. Two such scripts have been developed so far: one for vocabularies served via the SKOSMOS protocol and one allowing people to be identified via their ORCID. The guides contain info about the new :CVocConf setting used for configuration and additional information about this functionality. Scripts, examples, and additional documentation are available at the GDCC GitHub Repository.

Please watch the online presentation, read the document with requirements and join the Dataverse Working Group on Ontologies and Controlled Vocabularies if you have some questions and want to contribute.

This functionality was initially developed by Data Archiving and Networked Services (DANS-KNAW), the Netherlands, and funded by SSHOC, "Social Sciences and Humanities Open Cloud". SSHOC has received funding from the European Union’s Horizon 2020 project call H2020-INFRAEOSC-04-2018, grant agreement #823782. It was further improved by the Global Dataverse Community Consortium (GDCC) and extended with the support of semantic search.

Curation Status Labels

A new :AllowedCurationLabels setting allows a sysadmins to define one or more sets of labels that can be applied to a draft Dataset version via the user interface or API to indicate the status of the dataset with respect to a defined curation process.

Labels are completely customizable (alphanumeric or spaces, up to 32 characters, e.g. "Author contacted", "Privacy Review", "Awaiting paper publication"). Superusers can select a specific set of labels, or disable this functionality per collection. Anyone who can publish a draft dataset (e.g. curators) can set/change/remove labels (from the set specified for the collection containing the dataset) via the user interface or via an API. The API also would allow external tools to search for, read and set labels on Datasets, providing an integration mechanism. Labels are visible on the Dataset page and in Dataverse collection listings/search results. Internally, the labels have no effect, and at publication, any existing label will be removed. A reporting API call allows admins to get a list of datasets and their curation statuses.

The Solr schema must be updated as part of installing the release of Dataverse containing this feature for it to work.

Major Use Cases

Newly-supported major use cases in this release include:

Administrators will be able to set up integrations with external vocabulary services, allowing for autocomplete-assisted metadata entry, metadata standardization, and better integration with other systems (Issue #7711, PR #7946)
Users viewing datasets in the root Dataverse collection will now see breadcrumbs that have have a link back to the root Dataverse collection (Issue #7527, PR #8078)
Users will be able to more easily differentiate between datasets and files through new iconography (Issue #7991, PR #8021)
Users retrieving large guestbooks over the API will experience fewer failures (Issue #8073, PR #8084)
Dataverse collection administrators can specify which language will be used when entering metadata for new Datasets in a collection, based on a list of languages specified by the Dataverse installation administrator (Issue #7388, PR #7958)
- Users will see the language used for metadata entry indicated at the document or element level in metadata exports (Issue #7388, PR #7958)
- Administrators will now be able to specify the language(s) of controlled vocabulary entries, in addition to the installation's default language (Issue #6751, PR #7959)
Administrators and curators can now receive notifications when a dataset is created (Issue #8069, PR #8070)
Administrators with large files in their installation can disable the automatic checksum verification process at publish time (Issue #8043, PR #8074)

Notes for Dataverse Installation Administrators

Dataset Creation Notifications for Administrators

A new :SendNotificationOnDatasetCreation setting has been added. When true, administrators and curators (those who can publish the dataset) will get a notification when a new dataset is created. This makes it easier to track activity in a Dataverse and, for example, allow admins to follow up when users do not publish a new dataset within some period of time.

Skip Checksum Validation at Publish Based on Size

When a user requests to publish a dataset, the time taken to complete the publishing process varies based on the dataset/datafile size.

With the additional settings of :DatasetChecksumValidationSizeLimit and :DataFileChecksumValidationSizeLimit, the checksum validation can be skipped while publishing.

If the Dataverse administrator chooses to set these values, it's strongly recommended to have an external auditing system run periodically in order to monitor the integrity of the files in the Dataverse installation.

Guestbook Response API Update

With this release the Retrieve Guestbook Responses for a Dataverse Collection API will no longer produce a file by default. You may specify an output file by adding a -o $YOURFILENAME to the curl command.

Dynamic JavaServer Faces Configuration Options

This release includes a new way to easily change JSF settings via MicroProfile Config, especially useful during development.
See the development guide on "Debugging" for more information.

Enhancements to DDI Metadata Exports

Several changes have been made to the DDI exports to improve support for internationalization and to improve compliance with CESSDA requirements. These changes include:

Addition of xml:lang attributes specifying the dataset metadata language at the document level and for individual elements such as title and description
Specification of controlled vocabulary terms in duplicate elements in multiple languages (in the installation default langauge and, if different, the dataset metadata language)

While these changes are intended to improve harvesting and integration with external systems, they could break existing connections that make assumptions about the elements and attributes that have been changed.

New JVM Options and DB Settings

:SendNotificationOnDatasetCreation - A boolean setting that, if true will send an email and notification to additional users when a Dataset is created. Messages go to those, other than the dataset creator, who have the ability/permission necessary to publish the dataset.
:DatasetChecksumValidationSizeLimit - disables the checksum validation while publishing for any dataset size greater than the limit.
:DataFileChecksumValidationSizeLimit - Disables the checksum validation while publishing for any datafiles greater than the limit.
:CVocConf - A JSON-structured setting that configures Dataverse to associate specific metadatablock fields with external vocabulary services and specific vocabularies/sub-vocabularies managed by that service.
:MetadataLanguages - Sets which languages can be used when entering dataset metadata.
:AllowedCurationLabels - A JSON Object containing lists of allowed labels (up to 32 characters, spaces allowed) that can be set, via API or UI by users with the permission to publish a dataset. The set of labels allowed for datasets can be selected by a superuser - via the Dataverse collection page (Edit/General Info) or set via API call.

Notes for Tool Developers and Integrators

Bags Now Support File Paths

The original Bag generation code stored all dataset files directly under the /data directory. With the addition in Dataverse of a directory path for files and then a change to allow files with different paths to have the same name, archival Bags will now use the directory path from Dataverse to avoid name collisions within the /data directory. Prior to this update, Bags from Datasets with multiple files with the same name would have been created with only one of the files with that name (with warnings in the log, but still generating the Bag).

Complete List of Changes

For the complete list of code changes in this release, see the 5.7 Milestone in Github.

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email [email protected].

Installation

If this is a new installation, please see our Installation Guide.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.7.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version.

$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara and remove the generated directory

service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

service payara start

4. Deploy this version.

$PAYARA/bin/asadmin deploy dataverse-5.7.war

5. Restart payara

service payara stop
service payara start

Additional Release Steps

1. Replace Solr schema.xml to allow Curation Labels to be used. See specific instructions below for those installations with custom metadata blocks (1a) and those without (1b).

1a.

For installations with Custom Metadata Blocks:

-stop solr instance (usually service solr stop, depending on solr installation/OS, see the Installation Guide

add the following line to your schema.xml:

<field name="externalStatus" type="string" stored="true" indexed="true" multiValued="false"/>
restart solr instance (usually service solr start, depending on solr/OS)

1b.

For installations without Custom Metadata Blocks: