Merge pull request #7796 from IQSS/develop
v5.4.1
kcondon authored Apr 13, 2021
2 parents ea91390 + 473cde6 commit 80361bf
Showing 18 changed files with 151 additions and 38 deletions.
46 changes: 46 additions & 0 deletions doc/release-notes/5.4.1-release-notes.md
@@ -0,0 +1,46 @@
# Dataverse Software 5.4.1

This release provides a fix for a regression introduced in 5.4 and implements a few other small changes. Please use 5.4.1 for production deployments instead of 5.4.

## Release Highlights

### API Backwards Compatibility Maintained

The syntax in the example in the [Basic File Access](https://guides.dataverse.org/en/5.4.1/api/dataaccess.html#basic-file-access) section of the Dataverse Software Guides will continue to work.
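
For example, a request of the form `GET http://$SERVER/api/access/datafile/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB` (note the trailing slash before the query string, as shown in earlier versions of that guide) continues to work in 5.4.1.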

## Complete List of Changes

For the complete list of code changes in this release, see the [5.4.1 Milestone](https://github.com/IQSS/dataverse/milestone/95?closed=1) in GitHub.

For help with upgrading, installing, or general questions please post to the [Dataverse Community Google Group](https://groups.google.com/forum/#!forum/dataverse-community) or email [email protected].

## Installation

If this is a new installation, please see our [Installation Guide](https://guides.dataverse.org/en/5.4.1/installation/).

## Upgrade Instructions

0\. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the [Dataverse Software 5 Release Notes](https://github.com/IQSS/dataverse/releases/tag/v5.0). After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.4.1.

1\. Undeploy the previous version.

- `$PAYARA/bin/asadmin list-applications`
- `$PAYARA/bin/asadmin undeploy dataverse<-version>`
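
For example, if the previous version was 5.4, the application name is likely `dataverse-5.4`, so the command would be `$PAYARA/bin/asadmin undeploy dataverse-5.4` (confirm the exact name in the `list-applications` output).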

2\. Stop Payara and remove the generated directory

- `service payara stop`
- `rm -rf $PAYARA/glassfish/domains/domain1/generated`

3\. Start Payara

- `service payara start`

4\. Deploy this version.

- `$PAYARA/bin/asadmin deploy dataverse-5.4.1.war`

5\. Restart Payara

- `service payara stop`
- `service payara start`
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/admin/make-data-count.rst
@@ -129,7 +129,7 @@ Populate Views and Downloads Nightly

Running ``main.py`` to create the SUSHI JSON file and the subsequent calling of the Dataverse Software API to process it should be added as a cron job.

The Dataverse Software provides example scripts that run the steps to process new accesses and uploads and update your Dataverse installation's database (`counter_daily.sh</_static/util/counter_daily.sh>`) and to retrieve citations for all Datasets from DataCite (`counter_weekly.sh</_static/util/counter_weekly.sh>`). These scripts should be configured for your environment and can be run manually or as cron jobs.
The Dataverse Software provides example scripts that run the steps to process new accesses and uploads and update your Dataverse installation's database :download:`counter_daily.sh <../_static/util/counter_daily.sh>` and to retrieve citations for all Datasets from DataCite :download:`counter_weekly.sh <../_static/util/counter_weekly.sh>`. These scripts should be configured for your environment and can be run manually or as cron jobs.
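
As an illustration (the installation paths, log locations, and schedule below are assumptions rather than part of the shipped scripts), the cron entries might look like:

.. code-block:: bash

    # process new accesses and uploads nightly
    0 1 * * * /usr/local/dataverse/counter_daily.sh >> /var/log/counter_daily.log 2>&1
    # refresh DataCite citations for all Datasets weekly
    0 2 * * 0 /usr/local/dataverse/counter_weekly.sh >> /var/log/counter_weekly.log 2>&1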

Sending Usage Metrics to the DataCite Hub
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Expand Down
7 changes: 7 additions & 0 deletions doc/sphinx-guides/source/api/client-libraries.rst
@@ -37,3 +37,10 @@ Java
https://github.com/IQSS/dataverse-client-java is the official Java library for Dataverse Software APIs.

`Richard Adams <http://www.researchspace.com/electronic-lab-notebook/about_us_team.html>`_ from `ResearchSpace <http://www.researchspace.com>`_ created and maintains this library.

Ruby
----

https://github.com/libis/dataverse_api is a Ruby gem for Dataverse Software APIs. It is registered as a library on RubyGems (https://rubygems.org/search?query=dataverse).

The gem is created and maintained by the LIBIS team (https://www.libis.be) at the University of Leuven (https://www.kuleuven.be).
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/api/dataaccess.rst
@@ -87,7 +87,7 @@ Basic access URI:

Example: Getting the file whose DOI is *10.5072/FK2/J8SJZB* ::

GET http://$SERVER/api/access/datafile/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB
GET http://$SERVER/api/access/datafile/:persistentId?persistentId=doi:10.5072/FK2/J8SJZB
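
A minimal curl sketch of the same request (the server name and local output file name are assumptions)::

    curl -L -o J8SJZB.tab "https://demo.dataverse.org/api/access/datafile/:persistentId?persistentId=doi:10.5072/FK2/J8SJZB"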


Parameters:
4 changes: 4 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -1246,6 +1246,8 @@ When adding a file to a dataset, you can optionally specify the following:
- The "File Path" of the file, indicating which folder the file should be uploaded to within the dataset.
- Whether or not the file is restricted.
Note that when a Dataverse instance is configured to use S3 storage with direct upload enabled, there is API support to send a file directly to S3. This is more complex and is described in the :doc:`/developers/s3-direct-upload-api` guide.
In the curl example below, all of the above are specified but they are optional.
.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of ``export`` below.
@@ -1959,6 +1961,8 @@ Replacing Files
Replace an existing file where ``ID`` is the database id of the file to replace or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. Requires the ``file`` to be passed as well as a ``jsonString`` expressing the new metadata. Note that metadata such as description, directoryLabel (File Path) and tags are not carried over from the file being replaced.
Note that when a Dataverse instance is configured to use S3 storage with direct upload enabled, there is API support to send a replacement file directly to S3. This is more complex and is described in the :doc:`/developers/s3-direct-upload-api` guide.
A curl example using an ``ID``
.. code-block:: bash
4 changes: 2 additions & 2 deletions doc/sphinx-guides/source/conf.py
@@ -65,9 +65,9 @@
# built documents.
#
# The short X.Y version.
version = '5.4'
version = '5.4.1'
# The full version, including alpha/beta/rc tags.
release = '5.4'
release = '5.4.1'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
57 changes: 46 additions & 11 deletions doc/sphinx-guides/source/developers/s3-direct-upload-api.rst
@@ -1,13 +1,13 @@
Direct DataFile Upload API
==========================
Direct DataFile Upload/Replace API
==================================

The direct Datafile Upload API is used internally to support direct upload of files to S3 storage and by tools such as the DVUploader.

Direct upload involves a series of three activities, each involving interacting with the server for a Dataverse installation:

* Requesting initiation of a transfer from the server
* Use of the pre-signed URL(s) returned in that call to perform an upload/multipart-upload of the file to S3
* A call to the server to register the file as part of the dataset and/or to cancel the transfer
* A call to the server to register the file as part of the dataset/replace a file in the dataset or to cancel the transfer

This API is only enabled when a Dataset is configured with a data store supporting direct S3 upload.
Administrators should be aware that partial transfers, where a client starts uploading the file/parts of the file and does not contact the server to complete/cancel the transfer, will result in data stored in S3 that is not referenced in the Dataverse installation (e.g. should be considered temporary and deleted.)
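
As an illustrative sketch only (the bucket name and object key are copied from the examples further down this page, and any cleanup policy is left to the administrator), the tags on a directly uploaded object can be inspected with the AWS CLI; direct uploads are tagged ``dv-state=temp`` when the client sends them, as shown in the upload example below:

.. code-block:: bash

    # inspect the dv-state tag on a direct-upload object
    aws s3api get-object-tagging \
        --bucket demo-dataverse-bucket \
        --key 10.5072/FK2FOQPJS/177883b000e-49cedef268ac
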
@@ -24,7 +24,7 @@ To initiate a transfer of a file to S3, make a call to the Dataverse installatio
export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
export SIZE=1000000000
curl -H 'X-Dataverse-key:$API_TOKEN' "$SERVER_URL/api/datasets/:persistentId/uploadurls?persistentId=$PERSISTENT_IDENTIFIER&size=$SIZE"
curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/uploadurls?persistentId=$PERSISTENT_IDENTIFIER&size=$SIZE"
The response to this call, assuming direct uploads are enabled, will be one of two forms:

@@ -63,13 +63,23 @@ Multiple URLs: when the file must be uploaded in multiple parts. The part size i
In the example responses above, the URLs, which are very long, have been omitted. These URLs reference the S3 server and the specific object identifier that will be used, starting with, for example, https://demo-dataverse-bucket.s3.amazonaws.com/10.5072/FK2FOQPJS/177883b000e-49cedef268ac?...

The client must then use the URL(s) to POST the file, or if the file is larger than the specified partSize, parts of the file.
The client must then use the URL(s) to PUT the file, or if the file is larger than the specified partSize, parts of the file.

In the multipart case, the client must send each part and collect the 'eTag' responses from the server. To successfully conclude the multipart upload, the client must call the 'complete' URI, sending a json object including the part eTags:
In the single part case, only one call to the supplied URL is required:

.. code-block:: bash
curl -X PUT "$SERVER_URL/api/datasets/mpload?..." -d '{"1":"\<eTag1 string\>","2":"\<eTag2 string\>","3":"\<eTag3 string\>","4":"\<eTag4 string\>","5":"\<eTag5 string\>"}'
curl -H 'x-amz-tagging:dv-state=temp' -X PUT -T <filename> "<supplied url>"
In the multipart case, the client must send each part and collect the 'eTag' responses from the server. The calls for this are the same as the one for the single part case except that each call should send a <partSize> slice of the total file, with the last part containing the remaining bytes.
The responses from the S3 server for these calls will include the 'eTag' for the uploaded part.
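
A rough sketch of that part-upload loop in bash (the local file name, the part size, and the ``PART_URLS`` array holding the pre-signed part URLs returned by the ``uploadurls`` call are assumptions for illustration):

.. code-block:: bash

    # split the file into partSize-byte slices; the last slice holds the remaining bytes
    split -b 1073741824 file1.bin part_
    i=1
    for part in part_*; do
        # S3 returns the part's eTag in the ETag response header of each PUT
        etag=$(curl -si -X PUT -T "$part" "${PART_URLS[$i]}" \
            | tr -d '\r' | awk -F': ' 'tolower($1)=="etag" {print $2}')
        echo "\"$i\":$etag"   # collect these values for the 'complete' call below
        i=$((i+1))
    done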

To successfully conclude the multipart upload, the client must call the 'complete' URI, sending a json object including the part eTags:

.. code-block:: bash
curl -X PUT "$SERVER_URL/api/datasets/mpload?..." -d '{"1":"<eTag1 string>","2":"<eTag2 string>","3":"<eTag3 string>","4":"<eTag4 string>","5":"<eTag5 string>"}'
If the client is unable to complete the multipart upload, it should call the abort URL:

Expand All @@ -87,7 +97,6 @@ jsonData normally includes information such as a file description, tags, provena
* "storageIdentifier" - String, as specified in prior calls
* "fileName" - String
* "mimeType" - String
* "fileSize" - number of bytes
* fixity/checksum: either:

* "md5Hash" - String with MD5 hash value, or
@@ -100,12 +109,38 @@ The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.Data
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
export JSON_DATA={"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42", "fileName":"file1.txt", "mimeType":"text/plain", "fileSize":"27", "checksum": {"@type": "SHA-1", "@value": "123456"}}
export JSON_DATA="{'description':'My description.','directoryLabel':'data/subdir1','categories':['Data'], 'restrict':'false', 'storageIdentifier':'s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42', 'fileName':'file1.txt', 'mimeType':'text/plain', 'checksum': {'@type': 'SHA-1', '@value': '123456'}}"
curl -X POST -H 'X-Dataverse-key: $API_TOKEN' "$SERVER_URL/api/datasets/:persistentId/add?persistentId=#PERSISTENT_IDENTIFIER" -F 'jsonData=$JSON_DATA'
curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/add?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"
Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method.
With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.

Replacing an existing file in the Dataset
-----------------------------------------

Once the file exists in the S3 bucket, a final API call is needed to register it as a replacement for an existing file. This call is the same call used to replace a file in a Dataverse installation but, rather than sending the file bytes, additional metadata is added using the "jsonData" parameter.
jsonData normally includes information such as a file description, tags, provenance, whether the file is restricted, whether to allow the mimetype to change (forceReplace=true), etc. For direct uploads, the jsonData object must also include values for:

* "storageIdentifier" - String, as specified in prior calls
* "fileName" - String
* "mimeType" - String
* fixity/checksum: either:

* "md5Hash" - String with MD5 hash value, or
* "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings

The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.DataFile.CheckSumType class and currently include MD5, SHA-1, SHA-256, and SHA-512.
Note that the API call does not validate that the file matches the hash value supplied. If a Dataverse instance is configured to validate file fixity hashes at publication time, a mismatch would be caught at that time and cause publication to fail.

.. code-block:: bash
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export FILE_IDENTIFIER=5072
export JSON_DATA="{'description':'My description.','directoryLabel':'data/subdir1','categories':['Data'], 'restrict':'false', 'forceReplace':'true', 'storageIdentifier':'s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42', 'fileName':'file1.txt', 'mimeType':'text/plain', 'checksum': {'@type': 'SHA-1', '@value': '123456'}}"
curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/files/$FILE_IDENTIFIER/replace" -F "jsonData=$JSON_DATA"

Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method.
With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
2 changes: 2 additions & 0 deletions doc/sphinx-guides/source/installation/config.rst
@@ -2225,6 +2225,8 @@ This is the local file system path to be used with the LocalSubmitToArchiveComma

These are the bucket and project names to be used with the GoogleCloudSubmitToArchiveCommand class. Further information is in the :ref:`Google Cloud Configuration` section above.

.. _:InstallationName:

:InstallationName
+++++++++++++++++

3 changes: 2 additions & 1 deletion doc/sphinx-guides/source/versions.rst
@@ -6,8 +6,9 @@ Dataverse Software Documentation Versions

This list provides a way to refer to the documentation for previous versions of the Dataverse Software. In order to learn more about the updates delivered from one version to another, visit the `Releases <https://github.com/IQSS/dataverse/releases>`__ page in our GitHub repo.

- 5.4
- 5.4.1

- `5.4 </en/5.4/>`__
- `5.3 </en/5.3/>`__
- `5.2 </en/5.2/>`__
- `5.1.1 </en/5.1.1/>`__
2 changes: 1 addition & 1 deletion pom.xml
@@ -7,7 +7,7 @@
-->
<groupId>edu.harvard.iq</groupId>
<artifactId>dataverse</artifactId>
<version>5.4</version>
<version>5.4.1</version>
<packaging>war</packaging>
<name>dataverse</name>
<properties>
3 changes: 0 additions & 3 deletions scripts/dev/dev-rebuild.sh
@@ -52,9 +52,6 @@ cd scripts/api
./setup-all.sh --insecure -p=admin1 | tee /tmp/setup-all.sh.out
cd ../..

echo "Loading SQL reference data..."
psql -U $DB_USER $DB_NAME -f scripts/database/reference_data.sql

echo "Creating SQL sequence..."
psql -U $DB_USER $DB_NAME -f doc/sphinx-guides/source/_static/util/createsequence.sql

7 changes: 6 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/api/Access.java
@@ -275,13 +275,18 @@ private DataFile findDataFileOrDieWrapper(String fileId){
@Produces({"application/xml"})
public DownloadInstance datafile(@PathParam("fileId") String fileId, @QueryParam("gbrecs") boolean gbrecs, @QueryParam("key") String apiToken, @Context UriInfo uriInfo, @Context HttpHeaders headers, @Context HttpServletResponse response) /*throws NotFoundException, ServiceUnavailableException, PermissionDeniedException, AuthorizationRequiredException*/ {

// check first if there's a trailing slash, and chop it:
while (fileId.lastIndexOf('/') == fileId.length() - 1) {
fileId = fileId.substring(0, fileId.length() - 1);
}

if (fileId.indexOf('/') > -1) {
// This is for embedding folder names into the Access API URLs;
// something like /api/access/datafile/folder/subfolder/1234
// instead of the normal /api/access/datafile/1234 notation.
// this is supported only for recreating folders during recursive downloads -
// i.e. they are embedded into the URL for the remote client like wget,
// but can be safely ignored here.
// but can be safely ignored here.
fileId = fileId.substring(fileId.lastIndexOf('/') + 1);
}

28 changes: 23 additions & 5 deletions src/main/java/edu/harvard/iq/dataverse/api/Files.java
@@ -218,11 +218,27 @@ public Response replaceFileInDataset(
}

// (3) Get the file name and content type
if(null == contentDispositionHeader) {
return error(BAD_REQUEST, "You must upload a file.");
String newFilename = null;
String newFileContentType = null;
String newStorageIdentifier = null;
if (null == contentDispositionHeader) {
if (optionalFileParams.hasStorageIdentifier()) {
newStorageIdentifier = optionalFileParams.getStorageIdentifier();
// ToDo - check that storageIdentifier is valid
if (optionalFileParams.hasFileName()) {
newFilename = optionalFileParams.getFileName();
if (optionalFileParams.hasMimetype()) {
newFileContentType = optionalFileParams.getMimeType();
}
}
} else {
return error(BAD_REQUEST,
"You must upload a file or provide a storageidentifier, filename, and mimetype.");
}
} else {
newFilename = contentDispositionHeader.getFileName();
newFileContentType = formDataBodyPart.getMediaType().toString();
}
String newFilename = contentDispositionHeader.getFileName();
String newFileContentType = formDataBodyPart.getMediaType().toString();

// (4) Create the AddReplaceFileHelper object
msg("REPLACE!");
@@ -254,14 +270,16 @@
addFileHelper.runForceReplaceFile(fileToReplaceId,
newFilename,
newFileContentType,
newStorageIdentifier,
testFileInputStream,
optionalFileParams);
}else{
addFileHelper.runReplaceFile(fileToReplaceId,
newFilename,
newFileContentType,
newStorageIdentifier,
testFileInputStream,
optionalFileParams);
optionalFileParams);
}

msg("we're back.....");