Merge pull request #7796 from IQSS/develop
v5.4.1
kcondon authored Apr 13, 2021
2 parents ea91390 + 473cde6 commit 80361bf
Showing 18 changed files with 151 additions and 38 deletions.
46 changes: 46 additions & 0 deletions doc/release-notes/5.4.1-release-notes.md
@@ -0,0 +1,46 @@
# Dataverse Software 5.4.1

This release provides a fix for a regression introduced in 5.4 and implements a few other small changes. Please use 5.4.1 for production deployments instead of 5.4.

## Release Highlights

### API Backwards Compatibility Maintained

The syntax in the example in the [Basic File Access](https://guides.dataverse.org/en/5.4.1/api/dataaccess.html#basic-file-access) section of the Dataverse Software Guides will continue to work.
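
For example, a request of the form `GET http://$SERVER/api/access/datafile/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB` (note the trailing slash before the query string, as shown in earlier versions of that guide) continues to work in 5.4.1.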

## Complete List of Changes

For the complete list of code changes in this release, see the [5.4.1 Milestone](https://github.com/IQSS/dataverse/milestone/95?closed=1) in GitHub.

For help with upgrading, installing, or general questions please post to the [Dataverse Community Google Group](https://groups.google.com/forum/#!forum/dataverse-community) or email [email protected].

## Installation

If this is a new installation, please see our [Installation Guide](https://guides.dataverse.org/en/5.4.1/installation/).

## Upgrade Instructions

0\. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the [Dataverse Software 5 Release Notes](https://github.com/IQSS/dataverse/releases/tag/v5.0). After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.4.1.

1\. Undeploy the previous version.

- `$PAYARA/bin/asadmin list-applications`
- `$PAYARA/bin/asadmin undeploy dataverse<-version>`
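
For example, if the previous version was 5.4, the application name is likely `dataverse-5.4`, so the command would be `$PAYARA/bin/asadmin undeploy dataverse-5.4` (confirm the exact name in the `list-applications` output).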

2\. Stop Payara and remove the generated directory

- `service payara stop`
- `rm -rf $PAYARA/glassfish/domains/domain1/generated`

3\. Start Payara

- `service payara start`

4\. Deploy this version.

- `$PAYARA/bin/asadmin deploy dataverse-5.4.1.war`

5\. Restart Payara

- `service payara stop`
- `service payara start`
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/admin/make-data-count.rst
@@ -129,7 +129,7 @@ Populate Views and Downloads Nightly

Running ``main.py`` to create the SUSHI JSON file and the subsequent calling of the Dataverse Software API to process it should be added as a cron job.

The Dataverse Software provides example scripts that run the steps to process new accesses and uploads and update your Dataverse installation's database (`counter_daily.sh</_static/util/counter_daily.sh>`) and to retrieve citations for all Datasets from DataCite (`counter_weekly.sh</_static/util/counter_weekly.sh>`). These scripts should be configured for your environment and can be run manually or as cron jobs.
The Dataverse Software provides example scripts that run the steps to process new accesses and uploads and update your Dataverse installation's database :download:`counter_daily.sh <../_static/util/counter_daily.sh>` and to retrieve citations for all Datasets from DataCite :download:`counter_weekly.sh <../_static/util/counter_weekly.sh>`. These scripts should be configured for your environment and can be run manually or as cron jobs.
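
As an illustration (the installation paths, log locations, and schedule below are assumptions rather than part of the shipped scripts), the cron entries might look like:

.. code-block:: bash

    # process new accesses and uploads nightly
    0 1 * * * /usr/local/dataverse/counter_daily.sh >> /var/log/counter_daily.log 2>&1
    # refresh DataCite citations for all Datasets weekly
    0 2 * * 0 /usr/local/dataverse/counter_weekly.sh >> /var/log/counter_weekly.log 2>&1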

Sending Usage Metrics to the DataCite Hub
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Expand Down
7 changes: 7 additions & 0 deletions doc/sphinx-guides/source/api/client-libraries.rst
@@ -37,3 +37,10 @@ Java
https://github.com/IQSS/dataverse-client-java is the official Java library for Dataverse Software APIs.

`Richard Adams <http://www.researchspace.com/electronic-lab-notebook/about_us_team.html>`_ from `ResearchSpace <http://www.researchspace.com>`_ created and maintains this library.

Ruby
----

https://github.com/libis/dataverse_api is a Ruby gem for Dataverse Software APIs. It is registered as a library on RubyGems (https://rubygems.org/search?query=dataverse).

The gem is created and maintained by the LIBIS team (https://www.libis.be) at the University of Leuven (https://www.kuleuven.be).
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/api/dataaccess.rst
@@ -87,7 +87,7 @@ Basic access URI:

Example: Getting the file whose DOI is *10.5072/FK2/J8SJZB* ::

GET http://$SERVER/api/access/datafile/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB
GET http://$SERVER/api/access/datafile/:persistentId?persistentId=doi:10.5072/FK2/J8SJZB
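
A minimal curl sketch of the same request (the server name and local output file name are assumptions)::

    curl -L -o J8SJZB.tab "https://demo.dataverse.org/api/access/datafile/:persistentId?persistentId=doi:10.5072/FK2/J8SJZB"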


Parameters:
4 changes: 4 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -1246,6 +1246,8 @@ When adding a file to a dataset, you can optionally specify the following:
- The "File Path" of the file, indicating which folder the file should be uploaded to within the dataset.
- Whether or not the file is restricted.
Note that when a Dataverse instance is configured to use S3 storage with direct upload enabled, there is API support to send a file directly to S3. This is more complex and is described in the :doc:`/developers/s3-direct-upload-api` guide.
In the curl example below, all of the above are specified but they are optional.
.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of ``export`` below.
@@ -1959,6 +1961,8 @@ Replacing Files
Replace an existing file where ``ID`` is the database id of the file to replace or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. Requires the ``file`` to be passed as well as a ``jsonString`` expressing the new metadata. Note that metadata such as description, directoryLabel (File Path) and tags are not carried over from the file being replaced.
Note that when a Dataverse instance is configured to use S3 storage with direct upload enabled, there is API support to send a replacement file directly to S3. This is more complex and is described in the :doc:`/developers/s3-direct-upload-api` guide.
A curl example using an ``ID``
.. code-block:: bash
4 changes: 2 additions & 2 deletions doc/sphinx-guides/source/conf.py
@@ -65,9 +65,9 @@
# built documents.
#
# The short X.Y version.
version = '5.4'
version = '5.4.1'
# The full version, including alpha/beta/rc tags.
release = '5.4'
release = '5.4.1'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
57 changes: 46 additions & 11 deletions doc/sphinx-guides/source/developers/s3-direct-upload-api.rst
@@ -1,13 +1,13 @@
Direct DataFile Upload API
==========================
Direct DataFile Upload/Replace API
==================================

The direct Datafile Upload API is used internally to support direct upload of files to S3 storage and by tools such as the DVUploader.

Direct upload involves a series of three activities, each involving interacting with the server for a Dataverse installation:

* Requesting initiation of a transfer from the server
* Use of the pre-signed URL(s) returned in that call to perform an upload/multipart-upload of the file to S3
* A call to the server to register the file as part of the dataset and/or to cancel the transfer
* A call to the server to register the file as part of the dataset/replace a file in the dataset or to cancel the transfer

This API is only enabled when a Dataset is configured with a data store supporting direct S3 upload.
Administrators should be aware that partial transfers, where a client starts uploading the file/parts of the file and does not contact the server to complete/cancel the transfer, will result in data stored in S3 that is not referenced in the Dataverse installation (e.g. should be considered temporary and deleted.)
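
As an illustrative sketch only (the bucket name and object key are copied from the examples further down this page, and any cleanup policy is left to the administrator), the tags on a directly uploaded object can be inspected with the AWS CLI; direct uploads are tagged ``dv-state=temp`` when the client sends them, as shown in the upload example below:

.. code-block:: bash

    # inspect the dv-state tag on a direct-upload object
    aws s3api get-object-tagging \
        --bucket demo-dataverse-bucket \
        --key 10.5072/FK2FOQPJS/177883b000e-49cedef268ac
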
@@ -24,7 +24,7 @@ To initiate a transfer of a file to S3, make a call to the Dataverse installatio
export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
export SIZE=1000000000
curl -H 'X-Dataverse-key:$API_TOKEN' "$SERVER_URL/api/datasets/:persistentId/uploadurls?persistentId=$PERSISTENT_IDENTIFIER&size=$SIZE"
curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/uploadurls?persistentId=$PERSISTENT_IDENTIFIER&size=$SIZE"
The response to this call, assuming direct uploads are enabled, will be one of two forms:

@@ -63,13 +63,23 @@ Multiple URLs: when the file must be uploaded in multiple parts. The part size i
In the example responses above, the URLs, which are very long, have been omitted. These URLs reference the S3 server and the specific object identifier that will be used, starting with, for example, https://demo-dataverse-bucket.s3.amazonaws.com/10.5072/FK2FOQPJS/177883b000e-49cedef268ac?...

The client must then use the URL(s) to POST the file, or if the file is larger than the specified partSize, parts of the file.
The client must then use the URL(s) to PUT the file, or if the file is larger than the specified partSize, parts of the file.

In the multipart case, the client must send each part and collect the 'eTag' responses from the server. To successfully conclude the multipart upload, the client must call the 'complete' URI, sending a json object including the part eTags:
In the single part case, only one call to the supplied URL is required:

.. code-block:: bash
curl -X PUT "$SERVER_URL/api/datasets/mpload?..." -d '{"1":"\<eTag1 string\>","2":"\<eTag2 string\>","3":"\<eTag3 string\>","4":"\<eTag4 string\>","5":"\<eTag5 string\>"}'
curl -H 'x-amz-tagging:dv-state=temp' -X PUT -T <filename> "<supplied url>"
In the multipart case, the client must send each part and collect the 'eTag' responses from the server. The calls for this are the same as the one for the single part case except that each call should send a <partSize> slice of the total file, with the last part containing the remaining bytes.
The responses from the S3 server for these calls will include the 'eTag' for the uploaded part.
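
A rough sketch of that part-upload loop in bash (the local file name, the part size, and the ``PART_URLS`` array holding the pre-signed part URLs returned by the ``uploadurls`` call are assumptions for illustration):

.. code-block:: bash

    # split the file into partSize-byte slices; the last slice holds the remaining bytes
    split -b 1073741824 file1.bin part_
    i=1
    for part in part_*; do
        # S3 returns the part's eTag in the ETag response header of each PUT
        etag=$(curl -si -X PUT -T "$part" "${PART_URLS[$i]}" \
            | tr -d '\r' | awk -F': ' 'tolower($1)=="etag" {print $2}')
        echo "\"$i\":$etag"   # collect these values for the 'complete' call below
        i=$((i+1))
    done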

To successfully conclude the multipart upload, the client must call the 'complete' URI, sending a json object including the part eTags:

.. code-block:: bash
curl -X PUT "$SERVER_URL/api/datasets/mpload?..." -d '{"1":"<eTag1 string>","2":"<eTag2 string>","3":"<eTag3 string>","4":"<eTag4 string>","5":"<eTag5 string>"}'
If the client is unable to complete the multipart upload, it should call the abort URL:

Expand All @@ -87,7 +97,6 @@ jsonData normally includes information such as a file description, tags, provena
* "storageIdentifier" - String, as specified in prior calls
* "fileName" - String
* "mimeType" - String
* "fileSize" - number of bytes
* fixity/checksum: either:

* "md5Hash" - String with MD5 hash value, or
@@ -100,12 +109,38 @@ The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.Data
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
export JSON_DATA={"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42", "fileName":"file1.txt", "mimeType":"text/plain", "fileSize":"27", "checksum": {"@type": "SHA-1", "@value": "123456"}}
export JSON_DATA="{'description':'My description.','directoryLabel':'data/subdir1','categories':['Data'], 'restrict':'false', 'storageIdentifier':'s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42', 'fileName':'file1.txt', 'mimeType':'text/plain', 'checksum': {'@type': 'SHA-1', '@value': '123456'}}"
curl -X POST -H 'X-Dataverse-key: $API_TOKEN' "$SERVER_URL/api/datasets/:persistentId/add?persistentId=#PERSISTENT_IDENTIFIER" -F 'jsonData=$JSON_DATA'
curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/add?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"
Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method.
With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.

Replacing an existing file in the Dataset
-----------------------------------------

Once the file exists in the S3 bucket, a final API call is needed to register it as a replacement for an existing file. This call is the same call used to replace a file in a Dataverse installation but, rather than sending the file bytes, additional metadata is added using the "jsonData" parameter.
jsonData normally includes information such as a file description, tags, provenance, whether the file is restricted, whether to allow the mimetype to change (forceReplace=true), etc. For direct uploads, the jsonData object must also include values for:

* "storageIdentifier" - String, as specified in prior calls
* "fileName" - String
* "mimeType" - String
* fixity/checksum: either:

* "md5Hash" - String with MD5 hash value, or
* "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings

The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.DataFile.CheckSumType class and currently include MD5, SHA-1, SHA-256, and SHA-512.
Note that the API call does not validate that the file matches the hash value supplied. If a Dataverse instance is configured to validate file fixity hashes at publication time, a mismatch would be caught at that time and cause publication to fail.

.. code-block:: bash
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export FILE_IDENTIFIER=5072
export JSON_DATA="{'description':'My description.','directoryLabel':'data/subdir1','categories':['Data'], 'restrict':'false', 'forceReplace':'true', 'storageIdentifier':'s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42', 'fileName':'file1.txt', 'mimeType':'text/plain', 'checksum': {'@type': 'SHA-1', '@value': '123456'}}"
curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/files/$FILE_IDENTIFIER/replace" -F "jsonData=$JSON_DATA"

Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method.
With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
2 changes: 2 additions & 0 deletions doc/sphinx-guides/source/installation/config.rst
@@ -2225,6 +2225,8 @@ This is the local file system path to be used with the LocalSubmitToArchiveComma

These are the bucket and project names to be used with the GoogleCloudSubmitToArchiveCommand class. Further information is in the :ref:`Google Cloud Configuration` section above.

.. _:InstallationName:

:InstallationName
+++++++++++++++++

3 changes: 2 additions & 1 deletion doc/sphinx-guides/source/versions.rst
@@ -6,8 +6,9 @@ Dataverse Software Documentation Versions

This list provides a way to refer to the documentation for previous versions of the Dataverse Software. In order to learn more about the updates delivered from one version to another, visit the `Releases <https://github.com/IQSS/dataverse/releases>`__ page in our GitHub repo.

- 5.4
- 5.4.1

- `5.4 </en/5.4/>`__
- `5.3 </en/5.3/>`__
- `5.2 </en/5.2/>`__
- `5.1.1 </en/5.1.1/>`__
2 changes: 1 addition & 1 deletion pom.xml
@@ -7,7 +7,7 @@
-->
<groupId>edu.harvard.iq</groupId>
<artifactId>dataverse</artifactId>
<version>5.4</version>
<version>5.4.1</version>
<packaging>war</packaging>
<name>dataverse</name>
<properties>
3 changes: 0 additions & 3 deletions scripts/dev/dev-rebuild.sh
@@ -52,9 +52,6 @@ cd scripts/api
./setup-all.sh --insecure -p=admin1 | tee /tmp/setup-all.sh.out
cd ../..

echo "Loading SQL reference data..."
psql -U $DB_USER $DB_NAME -f scripts/database/reference_data.sql

echo "Creating SQL sequence..."
psql -U $DB_USER $DB_NAME -f doc/sphinx-guides/source/_static/util/createsequence.sql

7 changes: 6 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/api/Access.java
@@ -275,13 +275,18 @@ private DataFile findDataFileOrDieWrapper(String fileId){
@Produces({"application/xml"})
public DownloadInstance datafile(@PathParam("fileId") String fileId, @QueryParam("gbrecs") boolean gbrecs, @QueryParam("key") String apiToken, @Context UriInfo uriInfo, @Context HttpHeaders headers, @Context HttpServletResponse response) /*throws NotFoundException, ServiceUnavailableException, PermissionDeniedException, AuthorizationRequiredException*/ {

// check first if there's a trailing slash, and chop it:
while (fileId.lastIndexOf('/') == fileId.length() - 1) {
fileId = fileId.substring(0, fileId.length() - 1);
}

if (fileId.indexOf('/') > -1) {
// This is for embedding folder names into the Access API URLs;
// something like /api/access/datafile/folder/subfolder/1234
// instead of the normal /api/access/datafile/1234 notation.
// this is supported only for recreating folders during recursive downloads -
// i.e. they are embedded into the URL for the remote client like wget,
// but can be safely ignored here.
// but can be safely ignored here.
fileId = fileId.substring(fileId.lastIndexOf('/') + 1);
}

28 changes: 23 additions & 5 deletions src/main/java/edu/harvard/iq/dataverse/api/Files.java
@@ -218,11 +218,27 @@ public Response replaceFileInDataset(
}

// (3) Get the file name and content type
if(null == contentDispositionHeader) {
return error(BAD_REQUEST, "You must upload a file.");
String newFilename = null;
String newFileContentType = null;
String newStorageIdentifier = null;
if (null == contentDispositionHeader) {
if (optionalFileParams.hasStorageIdentifier()) {
newStorageIdentifier = optionalFileParams.getStorageIdentifier();
// ToDo - check that storageIdentifier is valid
if (optionalFileParams.hasFileName()) {
newFilename = optionalFileParams.getFileName();
if (optionalFileParams.hasMimetype()) {
newFileContentType = optionalFileParams.getMimeType();
}
}
} else {
return error(BAD_REQUEST,
"You must upload a file or provide a storageidentifier, filename, and mimetype.");
}
} else {
newFilename = contentDispositionHeader.getFileName();
newFileContentType = formDataBodyPart.getMediaType().toString();
}
String newFilename = contentDispositionHeader.getFileName();
String newFileContentType = formDataBodyPart.getMediaType().toString();

// (4) Create the AddReplaceFileHelper object
msg("REPLACE!");
@@ -254,14 +270,16 @@
addFileHelper.runForceReplaceFile(fileToReplaceId,
newFilename,
newFileContentType,
newStorageIdentifier,
testFileInputStream,
optionalFileParams);
}else{
addFileHelper.runReplaceFile(fileToReplaceId,
newFilename,
newFileContentType,
newStorageIdentifier,
testFileInputStream,
optionalFileParams);
optionalFileParams);
}

msg("we're back.....");