
Releases: IQSS/dataverse

v6.2

02 Apr 15:02
a218417

Dataverse 6.2

Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.2 rather than the list of releases, which will cut them off.

This release brings new features, enhancements, and bug fixes to the Dataverse software.
Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.


💡Release Highlights

Search and Facet by License

Licenses have been added to the search facets in the search side panel, so datasets can be filtered by license (e.g., CC0).

Datasets with Custom Terms are aggregated under the "Custom Terms" value of this facet. See the Licensing section of the guide for more details on configured Licenses and Custom Terms.

For more information, see #9060.

Licenses can also be used to filter Search API results with the fq parameter, for example /api/search?q=*&fq=license%3A%22CC0+1.0%22 for CC0 1.0. See the Search API guide for more examples.

For more information, see #10204.
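As a sketch of how the fq value in the example above is built, the shell functions below URL-encode a license filter, writing spaces as "+" to match the CC0 1.0 example. The function names and the default SERVER_URL (http://localhost:8080) are hypothetical; adjust them for your installation.

```shell
#!/usr/bin/env bash
# Percent-encode a query value (form style: space -> '+').
urlencode() {
  local LC_ALL=C            # iterate over bytes, not multibyte characters
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;                  # unreserved: keep as-is
      ' ') out+='+' ;;                               # space -> '+'
      *) out+=$(printf '%%%02X' "'$c") ;;            # anything else -> %XX
    esac
  done
  printf '%s\n' "$out"
}

# Build a Search API URL filtering on a license name (hypothetical helper).
license_search_url() {
  local server="${SERVER_URL:-http://localhost:8080}"
  printf '%s/api/search?q=*&fq=%s\n' "$server" "$(urlencode "license:\"$1\"")"
}
```

For example, curl "$(license_search_url 'CC0 1.0')" should request the same URL as the CC0 1.0 example above.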

When Returning Datasets to Authors, Reviewers Can Add a Note to the Author

The popup for returning a dataset to its author now lets the reviewer type a message explaining the reasons for the return and any edits needed. The message is sent to the author by email.

Please note that this message is mandatory, but you can still enter a brief, meaningful comment that suits your situation, such as "The author would like to modify his dataset", "Files are missing", "Nothing to report" or "A curation report with comments and suggestions/instructions will follow in another email".

For more information, see #10137.

Support for Using Multiple PID Providers

This release adds support for using multiple PID (DOI, Handle, PermaLink) providers, multiple PID provider accounts
(managing a given protocol, authority, separator, shoulder combination), assigning PID provider accounts to specific collections,
and supporting transferred PIDs (where a PID is managed by an account when its authority, separator, and/or shoulder don't match
the combination where the account can mint new PIDs). It also adds the ability for additional provider services beyond the existing
DataCite, EZId, Handle, and PermaLink providers to be dynamically added as separate jar files.

These changes require per-provider settings rather than the global PID settings previously supported. While backward compatibility
for installations using a single PID provider account is provided, updating to the new MicroProfile settings is highly recommended
and will be required in a future version.
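To give a feel for the per-provider style, the sketch below prints JVM options for one PermaLink provider. The setting names (dataverse.pid.providers, dataverse.pid.default-provider, and dataverse.pid.<id>.type/label/authority/shoulder) are our reading of the 6.2 PID settings and should be checked against the guides; the provider id "perma1", the authority, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Print JVM options for a single PID provider (setting names assumed, verify them).
# Arguments: provider id, type (e.g. perma), label, authority, shoulder.
pid_provider_opts() {
  local id="$1" type="$2" label="$3" authority="$4" shoulder="$5"
  printf '%s\n' \
    "-Ddataverse.pid.${id}.type=${type}" \
    "-Ddataverse.pid.${id}.label=${label}" \
    "-Ddataverse.pid.${id}.authority=${authority}" \
    "-Ddataverse.pid.${id}.shoulder=${shoulder}"
}

# List all provider ids (comma-separated) and choose the default one.
# Arguments: default id, then every provider id.
pid_global_opts() {
  local default="$1"; shift
  local IFS=,               # "$*" joins the remaining ids with commas
  printf '%s\n' "-Ddataverse.pid.providers=$*" "-Ddataverse.pid.default-provider=${default}"
}
```

Each printed line could then be passed to asadmin create-jvm-options; remember that asadmin requires colons in values to be escaped.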

For more information, see the PID settings documentation.

New MicroProfile settings

Rate Limiting

The option to rate limit has been added to prevent users from overtaxing the system, either deliberately or through runaway automated processes.
Rate limiting can be configured per tier, with tier 0 reserved for guest users and tiers 1 and above for authenticated users.
Superuser accounts are exempt from rate limiting.

Rate limits can be imposed on command APIs by configuring the tier, the command, and the hourly limit in the database.
Two database settings configure rate limiting: :RateLimitingDefaultCapacityTiers and :RateLimitingCapacityByTierAndAction. If either of these settings exists in the database, rate limiting is enabled; if neither exists, rate limiting is disabled.

For more details, see the rate limiting section of the guides.
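A minimal sketch of configuring both settings, assuming the formats described in the rate limiting guide: :RateLimitingDefaultCapacityTiers as a comma-separated list of hourly limits indexed by tier, and :RateLimitingCapacityByTierAndAction as a JSON array of objects with tier, limitPerHour, and actions. The helper names, limits, and chosen command names are illustrative only.

```shell
#!/usr/bin/env bash
# Build one {tier, limitPerHour, actions} entry (format assumed from the guide).
# Arguments: tier, hourly limit, then one or more command names.
rate_limit_entry() {
  local tier="$1" limit="$2"; shift 2
  local actions="" a
  for a in "$@"; do
    actions+="${actions:+, }\"$a\""   # comma-separate after the first item
  done
  printf '{"tier": %d, "limitPerHour": %d, "actions": [%s]}\n' "$tier" "$limit" "$actions"
}

# Apply both settings (defined, not run). Example values: 10000 calls/hour
# for tier 0 and 20000 for tier 1, plus tighter per-command limits.
apply_rate_limits() {
  local server="${SERVER_URL:-http://localhost:8080}"
  curl -X PUT -d '10000,20000' "$server/api/admin/settings/:RateLimitingDefaultCapacityTiers"
  curl -X PUT \
    -d "[$(rate_limit_entry 0 10 GetDatasetCommand), $(rate_limit_entry 1 30 GetDatasetCommand UpdateDatasetVersionCommand)]" \
    "$server/api/admin/settings/:RateLimitingCapacityByTierAndAction"
}
```

Deleting both settings with curl -X DELETE should disable rate limiting again, per the rule above.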

Simplified SMTP Configuration

With this release, we deprecate the usage of asadmin create-javamail-resource to configure Dataverse to send mail using your SMTP server and provide a simplified, standard alternative using JVM options or MicroProfile Config.

At this point, no action is required if you want to keep your current configuration.
Warnings will show in your server logs to inform and remind you about the deprecation.
A future major release of Dataverse may remove this configuration method.

Please do take the opportunity to update your SMTP configuration. Details can be found in the Installation Guide, starting with the SMTP/Email Configuration section.

Once reconfiguration is complete, you should remove the legacy, unused configuration. First, run asadmin delete-javamail-resource mail/notifyMailSession as described in the 6.2 guides. Then run curl -X DELETE http://localhost:8080/api/admin/settings/:SystemEmail, as this database setting has been replaced with dataverse.mail.system-email as described below.

Please note: because email has been delivered to spam folders when the "From" address in the mail envelope and the mail session configuration didn't match (#4210), as of this version the sole source for the "From" address is the setting dataverse.mail.system-email, once you migrate to the new way of configuration.
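A sketch of the migration, assuming a Payara 6 install at /usr/local/payara6. dataverse.mail.system-email comes from this note; dataverse.mail.mta.host is assumed from the SMTP/Email Configuration section and should be checked there. The helper names and ASADMIN variable are hypothetical. asadmin create-jvm-options treats ":" as a separator, so colons in values are escaped.

```shell
#!/usr/bin/env bash
# Escape colons as '\:' for asadmin create-jvm-options.
asadmin_escape() {
  printf '%s\n' "$1" | sed 's/:/\\:/g'
}

# Defined, not run. Arguments: system email address, SMTP host.
migrate_smtp() {
  local system_email="$1" smtp_host="$2"
  local asadmin="${ASADMIN:-/usr/local/payara6/bin/asadmin}"
  "$asadmin" create-jvm-options "-Ddataverse.mail.system-email=$(asadmin_escape "$system_email")"
  # Setting name assumed from the Installation Guide; verify before use.
  "$asadmin" create-jvm-options "-Ddataverse.mail.mta.host=$(asadmin_escape "$smtp_host")"
  # Remove the legacy configuration, following the steps in this note.
  "$asadmin" delete-javamail-resource mail/notifyMailSession
  curl -X DELETE http://localhost:8080/api/admin/settings/:SystemEmail
  # Restart Payara afterwards so the new JVM options take effect.
}
```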

New SMTP settings:

Binder Redirect

If your installation is configured to use Binder, you should remove the old "girder_ythub" tool and replace it with the tool described at https://github.com/IQSS/dataverse-binder-redirect

For more information, see #10360.

Optional Croissant 🥐 Exporter Support

When a Dataverse installation is configured to use a metadata exporter for the Croissant format, the content of the JSON-LD in the <head> of dataset landing pages will be replaced with that format. However, both JSON-LD and Croissant will still be available for download from the dataset page and API.

For more information, see #10382.

Harvesting Handle Missing Controlled Values

Datasets can now be harvested with controlled vocabulary values that exist in the originating Dataverse installation but not in the harvesting Dataverse installation. For more information, see the documented changes to the endpoint.

Add .QPJ and .QMD Extensions to Shapefile Handling

Support for .qpj and .qmd files in shapefile uploads has been introduced, ensuring that these files are properly recognized and handled as part of geospatial datasets in Dataverse.

For more information, see #10305.

Ingested Tabular Data Files Can Be Stored With the Variable Name Header

Tabular Data Ingest can now save the generated archival files with the list of variable names added as the first tab-delimited line.

The Access API can take advantage of direct download for .tab files saved with these headers on S3, since the headers no longer have to be generated and added to the streamed content on the fly.

This behavior is controlled by the new setting :StoreIngestedTabularFilesWithVarHeaders. It is false by default, preserving the legacy behavior. When enabled, Dataverse handles both newly ingested files and any existing legacy files stored without these headers, transparently to the user. For example, the Access API will continue delivering tab-delimited files with this header line, whether it needs to add the header dynamically for legacy files or reads complete files directly from storage for files stored with it.
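The shell sketch below illustrates the difference: for a legacy file, a header line of variable names has to be prepended while streaming, which is what the Access API does on the fly. The helper names are hypothetical; the settings call uses the standard admin settings endpoint with an assumed SERVER_URL.

```shell
#!/usr/bin/env bash
# Illustration only: emit a tab-delimited header of variable names, then the
# stored rows, as is done on the fly for legacy files saved without a header.
# Arguments: data file, then the variable names in column order.
stream_with_header() {
  local file="$1"; shift
  local IFS=$'\t'           # "$*" joins the variable names with tabs
  printf '%s\n' "$*"
  cat -- "$file"
}

# Store headers for future ingests (defined, not run).
enable_var_headers() {
  curl -X PUT -d true \
    "${SERVER_URL:-http://localhost:8080}/api/admin/settings/:StoreIngestedTabularFilesWithVarHeaders"
}
```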

We are planning to add an API for converting existing legacy tabular files in a future release.

For more information, see #10282.

Uningest/Reingest Options Available in the File Page Edit Menu

New Uningest/Reingest options are available in the File Page Edit menu. Ingest errors can be cleared by users who can publish the associated dataset and by superusers, allowing a successful ingest to be undone or retried (e.g., after a Dataverse version update or if ingest size limits are changed).

The /api/files//uningest API now also allows users who can publish the dataset to undo an ingest failure.

For more information, see #10319.

Sphinx Guides Now Support Markdown Format and Tabs

Our guides now support the Markdown format with the extension .md. Additionally, an option to create tabs in the guides using Sphinx Tabs has been added. (You can see the tabs in action in the "dev usage" page of the Container Guide.) To continue building the guides, you will need to install this new dependency by re-running:

pip install -r requirements.txt

For more information, see #10111.

Number of Concurrent Indexing Operations Now Configurable

A new MicroProfile setting called dataverse.solr.concurrency.max-async-indexes has been added that controls the maximum number of simultaneously running asynchronous dataset index operations (defaults to 4).
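MicroProfile settings like this one can usually also be supplied as environment variables. Assuming the standard MicroProfile Config mapping (every non-alphanumeric character becomes "_" and the name is upper-cased), the hypothetical helper below derives the variable name.

```shell
#!/usr/bin/env bash
# Map a MicroProfile Config property name to its environment variable form.
mp_env_name() {
  local name="${1//[^[:alnum:]]/_}"   # replace non-alphanumerics with '_'
  printf '%s\n' "$name" | tr '[:lower:]' '[:upper:]'
}
```

For example, export "$(mp_env_name dataverse.solr.concurrency.max-async-indexes)=8" before starting Payara; 8 is only an example value. Setting the JVM option -Ddataverse.solr.concurrency.max-async-indexes=8 is the alternative.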

For more information, see #10388.



...


v6.1

12 Dec 23:27
1f9e10c

Dataverse 6.1

Please see Dataverse 6.1 deployment challenges for information about a patch that fixes some issues in this release.

Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.1 rather than the list of releases, which will cut them off.

This release brings new features, enhancements, and bug fixes to the Dataverse software.
Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release highlights

Guestbook at request

Dataverse can now be configured (via the dataverse.files.guestbook-at-request option) to display any configured guestbook to users when they request restricted files (new functionality) or when they download files (previous behavior).

The global default defined by this setting can be overridden at the collection level on the collection page and at the individual dataset level by a superuser using the API. The default, showing guestbooks when files are downloaded, remains as it was in prior Dataverse versions.

For details, see dataverse.files.guestbook-at-request and PR #9599.

Collection-level storage quotas

This release adds support for defining storage size quotas for collections. Please see the API guide for details. This is an experimental feature that has not yet been used in production on any real-life Dataverse instance, but we are planning to try it out at Harvard/IQSS.

Please note that this release includes a database update (via a Flyway script) that will calculate the storage sizes of all the existing datasets and collections on the first deployment. On a large production database with tens of thousands of datasets this may add a couple of extra minutes to the first, initial deployment of Dataverse 6.1.

For details, see Storage Quotas for Collections in the Admin Guide.

Globus support (experimental), continued

Globus support in Dataverse has been expanded to include support for using file-based Globus endpoints, including the case where files are stored on tape and are not immediately accessible, and the case of referencing files stored on remote Globus endpoints. Support for using the Globus S3 Connector with an S3 store has been retained but requires changes to the Dataverse configuration. Please note:

  • Globus functionality remains experimental/advanced in that it requires significant setup, differs in multiple ways from other file storage mechanisms, and may continue to evolve with the potential for backward incompatibilities.
  • The functionality is configured per store and replaces the previous single-S3-Connector-per-Dataverse-instance model.
  • Adding files to a dataset and accessing files are supported via the Dataverse user interface through a separate dataverse-globus app.
  • The functionality is also accessible via APIs (combining calls to the Dataverse and Globus APIs).

Backward incompatibilities:

  • The configuration for use of a Globus S3 Connector has changed and is aligned with the standard store configuration mechanism.
  • The new functionality is incompatible with older versions of the dataverse-globus app, and the Globus-related functionality in the UI will only function correctly if a Dataverse 6.1 compatible version of the dataverse-globus app is configured.

New JVM options:

  • A new "globus" store type and associated store-related options have been added. These are described in the File Storage section of the Installation Guide.
  • dataverse.files.globus-cache-maxage - specifies the number of minutes Dataverse will wait between when an initial request for a file transfer occurs and when that transfer must begin.

Obsolete Settings: the :GlobusBasicToken, :GlobusEndpoint, and :GlobusStores settings are no longer used.

Further details can be found in the Big Data Support section of the Developer Guide.

Alternative Title now allows multiple values

Alternative Title now allows multiple values. Note that JSON used to create a dataset with an Alternative Title must be changed. See "Backward incompatibilities" below and PR #9440 for details.
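A sketch of the changed field, assuming the standard dataset JSON layout (typeName, multiple, typeClass, value): a repeatable field has "multiple": true and an array "value", whereas a single-valued field used "multiple": false and a plain string. The helper is hypothetical and does not JSON-escape its inputs, so avoid quotes and backslashes in titles.

```shell
#!/usr/bin/env bash
# Print an alternativeTitle field in the multi-value form (layout assumed).
# Arguments: one or more titles (not JSON-escaped).
alt_title_field() {
  local vals="" v
  for v in "$@"; do
    vals+="${vals:+, }\"$v\""   # comma-separate after the first title
  done
  printf '{"typeName": "alternativeTitle", "multiple": true, "typeClass": "primitive", "value": [%s]}\n' "$vals"
}
```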

External tools: configure tools now available at the dataset level

Read/write "configure" tools (a type of external tool) are now available at the dataset level. They appear under the "Edit Dataset" menu. See External Tools in the Admin Guide and PR #9925.

S3 out-of-band upload

In some situations, direct upload might not work from the UI, e.g., when S3 storage is not accessible from the internet. This release adds an option to allow direct uploads via API only. This way, a third-party application can use direct upload from within the internal network, while no direct download is available to users via the UI.
By default, Dataverse supports uploading files via the add a file to a dataset API. With S3 stores, a direct upload process can be enabled to allow sending the file directly to the S3 store (without any intermediate copies on the Dataverse server).
With the upload-out-of-band option enabled, it is also possible for file upload to be managed manually or via third-party tools, with the Adding the Uploaded file to the Dataset API call (described in the Direct DataFile Upload/Replace API page) used to add metadata and inform Dataverse that a new file has been added to the relevant store.

JSON Schema for datasets

Functionality has been added to help validate dataset JSON prior to dataset creation. There are two new API endpoints in this release. The first takes in a collection alias and returns a custom dataset schema based on the required fields of the collection. The second takes in a collection alias and a dataset JSON file and does an automated validation of the JSON file against the custom schema for the collection. In this release functionality is limited to JSON format validation and validating required elements. Future releases will address field types, controlled vocabulary, etc. See Retrieve a Dataset JSON Schema for a Collection in the API Guide and PR #10109.

OpenID Connect (OIDC) improvements

Using MicroProfile Config for provisioning

With this release it is possible to provision a single OIDC-based authentication provider by using MicroProfile Config instead of or in addition to the classic Admin API provisioning.

If you are using an external OIDC provider component as an identity management system and/or broker to other authentication providers such as Google, eduGain SAML and so on, this might make your life easier during instance setups and reconfiguration. You no longer need to generate the necessary JSON file.

Adding PKCE Support

Some OIDC providers require using PKCE as an additional security layer. As of this version, you can enable support for this on any OIDC provider you configure. (Note that OAuth2 providers have not been upgraded.)

For both features, see the OIDC section of the Installation Guide and PR #9273.
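For readers new to PKCE, the sketch below shows what the S256 method of RFC 7636 computes: the code challenge is the unpadded base64url encoding of the SHA-256 hash of the client's random code verifier. Dataverse performs this itself once PKCE is enabled; the function is only an illustration and assumes openssl is installed.

```shell
#!/usr/bin/env bash
# Compute a PKCE S256 code_challenge from a code_verifier.
pkce_s256() {
  printf '%s' "$1" \
    | openssl dgst -sha256 -binary \
    | openssl base64 -A \
    | tr '/+' '_-' \
    | tr -d '='
  # base64 -> base64url: '/' -> '_', '+' -> '-', padding removed
}
```

With the verifier from RFC 7636 Appendix B, dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk, the result should be E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM.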

Solr improvements

As of this release, application-side support has been added for the "circuit breaker" mechanism in Solr that makes it drop requests more gracefully when the search engine is experiencing load issues.

Please see the Installing Solr section of the Installation Guide.

New release of Dataverse Previewers (including a Markdown previewer)

Version 1.4 of the standard Dataverse Previewers from https://github.com/gdcc/dataverse-previewers is available. The new version supports the use of signedUrls rather than API keys when previewing restricted files (including files in draft dataset versions). Upgrading is highly recommended. Please note:

  • SignedUrls can now be used with PrivateUrl access tokens, which allows PrivateUrl users to view previewers that are configured to use SignedUrls. See #10093.
  • Launching a dataset-level configuration tool will automatically generate an API token when needed. This is consistent with how other types of tools work. See #10045.
  • There is now a Markdown (.md) previewer.

New or improved APIs

The development of a new UI for Dataverse is driving the addition or improvement of many APIs.

New API endpoints

  • deaccessionDataset (/api/datasets/{id}/versions/{versionId}/deaccession): version deaccessioning through API (Given a dataset and a version).
  • /api/files/{id}/downloadCount
  • /api/files/{id}/dataTables
  • /api/files/{id}/metadata/tabularTags New endpoint to set tabular file tags.
  • canManageFilePermissions (/access/datafile/{id}/userPermissions) Added for getting user permissions on a file.
  • getVersionFileCounts (/api/datasets/{id}/versions/{versionId}/files/counts): Giv...

v6.0

08 Sep 17:47
5f2413b

Dataverse 6.0

Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.0 rather than the list of releases, which will cut them off.

This is a platform upgrade release. Payara, Solr, and Java have been upgraded. No features have been added to the Dataverse software itself. Only a handful of bugs were fixed.

Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project!

Release Highlights (Major Upgrades, Breaking Changes)

This release contains major upgrades to core components. Detailed upgrade instructions can be found below.

Runtime

  • The required Java version has been increased from version 11 to 17.
    • See PR #9764 for details.
  • Payara application server has been upgraded to version 6.2023.8.
    • This is a required update.
    • Please note that Payara Community 5 has reached end of life.
    • See PR #9685 and PR #9795 for details.
  • Solr has been upgraded to version 9.3.0.
    • See PR #9787 for details.
  • PostgreSQL 13 remains the tested and supported version.
    • See the PostgreSQL section of the Installation Guide for details.

Development

  • Removal of Vagrant and Docker All In One (docker-aio), deprecated in Dataverse v5.14. See PR #9838 and PR #9685 for details.
  • All tests have been migrated to use JUnit 5 exclusively from now on. See PR #9796 for details.

Installation

If this is a new installation, please follow our Installation Guide. Please don't be shy about asking for help if you need it!

Once you are in production, we would be delighted to update our map of Dataverse installations around the world to include yours! Please create an issue or email us at support@dataverse.org to join the club!

You are also very welcome to join the Global Dataverse Community Consortium (GDCC).

Upgrade Instructions

Upgrading requires a maintenance window and downtime. Please plan ahead, create backups of your database, etc.

These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 5.14.

Upgrade from Java 11 to Java 17

Java 17 is now required for Dataverse. Solr can run under Java 11 or Java 17 but the latter is recommended. In preparation for the Java upgrade, stop both Dataverse/Payara and Solr.

  1. Undeploy Dataverse, if deployed, using the unprivileged service account.

    sudo -u dataverse /usr/local/payara5/bin/asadmin list-applications

    sudo -u dataverse /usr/local/payara5/bin/asadmin undeploy dataverse-5.14

  2. Stop Payara 5.

    sudo -u dataverse /usr/local/payara5/bin/asadmin stop-domain

  3. Stop Solr 8.

    sudo systemctl stop solr.service

  4. Install Java 17.

    Assuming you are using RHEL or a derivative such as Rocky Linux:

    sudo yum install java-17-openjdk

  5. Set Java 17 as the default.

    Assuming you are using RHEL or a derivative such as Rocky Linux:

    sudo alternatives --config java

  6. Test that Java 17 is the default.

    java -version
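To script the check in step 6, the hypothetical helper below extracts the major version from java -version output (which goes to stderr), e.g. openjdk version "17.0.8" gives 17. It assumes the usual quoted version string; legacy "1.8.0_xxx" strings map to 8.

```shell
#!/usr/bin/env bash
# Read `java -version` output on stdin and print the major version.
java_major_version() {
  local v
  v=$(awk -F'"' '/version/ {print $2; exit}')   # text inside the first quotes
  case "$v" in
    1.*) v="${v#1.}" ;;                          # legacy 1.x numbering
  esac
  printf '%s\n' "${v%%[.+_-]*}"                  # keep digits before . + _ -
}
```

For example: [ "$(java -version 2>&1 | java_major_version)" -ge 17 ] && echo "Java 17 or later is the default".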

Upgrade from Payara 5 to Payara 6

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

  1. Download Payara 6.2023.8.

    curl -L -O https://nexus.payara.fish/repository/payara-community/fish/payara/distributions/payara/6.2023.8/payara-6.2023.8.zip

  2. Unzip it to /usr/local (or your preferred location).

    sudo unzip payara-6.2023.8.zip -d /usr/local/

  3. Change ownership of the unzipped Payara to your "service" user ("dataverse" by default).

    sudo chown -R dataverse /usr/local/payara6

  4. Undeploy Dataverse, if deployed, using the unprivileged service account.

    sudo -u dataverse /usr/local/payara5/bin/asadmin list-applications

    sudo -u dataverse /usr/local/payara5/bin/asadmin undeploy dataverse-5.14

  5. Stop Payara 5, if running.

    sudo -u dataverse /usr/local/payara5/bin/asadmin stop-domain

  6. Copy Dataverse-related lines from Payara 5 to Payara 6 domain.xml.

    sudo -u dataverse cp /usr/local/payara6/glassfish/domains/domain1/config/domain.xml /usr/local/payara6/glassfish/domains/domain1/config/domain.xml.orig

    sudo egrep 'dataverse|doi' /usr/local/payara5/glassfish/domains/domain1/config/domain.xml > lines.txt

    sudo vi /usr/local/payara6/glassfish/domains/domain1/config/domain.xml

    If any JVM options reference the old payara5 path (/usr/local/payara5) be sure to change it to payara6.

    The lines will appear in two sections, examples shown below (but your content will vary).

    Section 1: system properties (under <server name="server" config-ref="server-config">)

    <system-property name="dataverse.db.user" value="dvnuser"></system-property>
    <system-property name="dataverse.db.host" value="localhost"></system-property>
    <system-property name="dataverse.db.port" value="5432"></system-property>
    <system-property name="dataverse.db.name" value="dvndb"></system-property>
    <system-property name="dataverse.db.password" value="dvnsecret"></system-property>
    

    Note: if you used the Dataverse installer, you won't have a dataverse.db.password property. See "Create password aliases" below.

    Section 2: JVM options (under <java-config classpath-suffix="" debug-options="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=9009" system-classpath="">, the one under <config name="server-config">, not under <config name="default-config">)

    <jvm-options>-Ddataverse.files.directory=/usr/local/dvn/data</jvm-options>
    <jvm-options>-Ddataverse.files.file.type=file</jvm-options>
    <jvm-options>-Ddataverse.files.file.label=file</jvm-options>
    <jvm-options>-Ddataverse.files.file.directory=/usr/local/dvn/data</jvm-options>
    <jvm-options>-Ddataverse.rserve.host=localhost</jvm-options>
    <jvm-options>-Ddataverse.rserve.port=6311</jvm-options>
    <jvm-options>-Ddataverse.rserve.user=rserve</jvm-options>
    <jvm-options>-Ddataverse.rserve.password=rserve</jvm-options>
    <jvm-options>-Ddataverse.auth.password-reset-timeout-in-minutes=60</jvm-options>
    <jvm-options>-Ddataverse.timerServer=true</jvm-options>
    <jvm-options>-Ddataverse.fqdn=dev1.dataverse.org</jvm-options>
    <jvm-options>-Ddataverse.siteUrl=https://dev1.dataverse.org</jvm-options>
    <jvm-options>-Ddataverse.files.storage-driver-id=file</jvm-options>
    <jvm-options>-Ddoi.username=testaccount</jvm-options>
    <jvm-options>-Ddoi.password=notmypassword</jvm-options>
    <jvm-options>-Ddoi.baseurlstring=https://mds.test.datacite.org/</jvm-options>
    <jvm-options>-Ddoi.dataciterestapiurlstring=https://api.test.datacite.org</jvm-options>
    
  7. Check the Xmx setting in domain.xml.

    Under /usr/local/payara6/glassfish/domains/domain1/config/domain.xml, check the Xmx setting under <config name="server-config">, where you put the JVM options, not the one under <config name="default-config">. Note that there are two such settings, and you want to adjust the one in the stanza with Dataverse options. This sets the JVM heap size; a good rule of thumb is half of your system's total RAM. You may specify the value in MB (8192m) or GB (8g).

  8. Copy jhove.conf and jhoveConfig.xsd from Payara 5, edit and change payara5 to payara6.

    sudo cp /usr/local/payara5/glassfish/domains/domain1/config/jhove* /usr/local/payara6/glassfish/domains/domain1/config/

    sudo chown dataverse /usr/local/payara6/glassfish/domains/domain1/config/jhove*

    sudo -u dataverse vi /usr/local/payara6/glassfish/domains/domain1/config/jhove.conf

  9. Copy logos from Payara 5 to Payara 6.

    These logos are for collections (dataverses).

    sudo -u dataverse cp -r /usr/local/payara5/glassfish/domains/domain1/docroot/logos /usr/local/payara6/glassfish/domains/domain1/docroot

  10. If you are using Make Data Count (MDC), edit :MDCLogPath.

    Your :MDCLogPath database setting might be pointing to a Payara 5 directory such as /usr/local/payara5/glassfish/domains/domain1/logs. If so, edit this to be Payara 6. You'll probably want to copy your logs over as well.

  11. If you've enabled access logging or any other site-specific configuration, be sure to preserve them. For instance, the default domain.xml includes

         <http-service>
         <access-log></access-log>
    

    but you may wish to include

         <http-service access-logging-enabled="true">
         <access-log format="%client.name% %datetime% %request% %status% %response.length% %header.user-agent% %header.referer% %cookie.JSESSIONID% %header.x-forwarded-for%"></access-log>
    

    Be sure to keep a previous copy of your domain.xml for reference.

  12. Update systemd unit file (or other init system) from /usr/local/payara5 to /usr/local/payara6, if applicable.

  13. Start Payara.

    sudo -u dataverse /usr/local/payara6/bin/asadmin start-domain

  14. Create a Java mail resource, replacing "localhost" for mailhost with your mail relay server, and replacing "localhost" for fromaddress with the FQDN of your Dataverse server.

    `sudo -u dataverse /usr/local/payara6/bin/asadmin create-javamail-resource --mailhost "localhost" --mailuser "dataversenotify" --fromaddress "do-not-reply@l...

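The heap-size rule of thumb from step 7 of the Payara upgrade (about half of total RAM) can be computed as below. This is a hypothetical helper that only prints a suggestion to copy into domain.xml; it reads MemTotal in kB from a meminfo-style file, and /proc/meminfo exists on Linux only.

```shell
#!/usr/bin/env bash
# Suggest an -Xmx value of half the total RAM, in megabytes.
suggest_xmx() {
  local meminfo="${1:-/proc/meminfo}" kb
  kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")   # total RAM in kB
  [ -n "$kb" ] || return 1
  printf -- '-Xmx%dm\n' $(( kb / 2 / 1024 ))       # half, converted to MB
}
```

On a machine with 16 GiB of RAM this prints -Xmx8192m, matching the 8192m example in step 7.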

v5.14

04 Aug 20:35
9f4ddbb

Dataverse Software 5.14

(If this note appears truncated on the GitHub Releases page, you can view it in full in the source tree: https://github.com/IQSS/dataverse/blob/master/doc/release-notes/5.14-release-notes.md)

This release brings new features, enhancements, and bug fixes to the Dataverse software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Please note that, as an experiment, the sections of this release note are organized in a different order. The Upgrade and Installation sections are at the top, with the detailed sections highlighting new features and fixes further down.

Installation

If this is a new installation, please see our Installation Guide. Please don't be shy about asking for help if you need it!

After your installation has gone into production, you are welcome to add it to our map of installations by opening an issue in the dataverse-installations repo.

Upgrade Instructions

0. These instructions assume that you are upgrading from 5.13. If you are running an earlier version, the only safe way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to 5.14.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version.

  • $PAYARA/bin/asadmin undeploy dataverse-5.13

2. Stop Payara and remove the generated directory

  • service payara stop
  • rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

  • service payara start

4. Deploy this version.

  • $PAYARA/bin/asadmin deploy dataverse-5.14.war

5. Restart Payara

  • service payara stop
  • service payara start

6. Update the Citation metadata block: (the update makes the field Series repeatable)

  • wget https://github.com/IQSS/dataverse/releases/download/v5.14/citation.tsv
  • curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"

If you are running an English-only installation, you are finished with the citation block. Otherwise, download the updated citation.properties file and place it in the dataverse.lang.directory; /home/dataverse/langBundles used in the example below.

  • wget https://github.com/IQSS/dataverse/releases/download/v5.14/citation.properties
  • cp citation.properties /home/dataverse/langBundles

7. Update Solr schema.xml to allow multiple series to be used. See specific instructions below for those installations without custom metadata blocks (7a) and those with custom metadata blocks (7b).

7a. For installations without custom or experimental metadata blocks:

  • Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)

  • Replace schema.xml

    • cp /tmp/dvinstall/schema.xml /usr/local/solr/solr-8.11.1/server/solr/collection1/conf
  • Start Solr instance (usually service solr start, depending on Solr/OS)

7b. For installations with custom or experimental metadata blocks:

  • Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)

  • There are 2 ways to regenerate the schema: Either by collecting the output of the Dataverse schema API and feeding it to the update-fields.sh script that we supply, as in the example below (modify the command lines as needed):

	wget https://raw.githubusercontent.com/IQSS/dataverse/master/conf/solr/8.11.1/update-fields.sh
	chmod +x update-fields.sh
	curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-8.11.1/server/solr/collection1/conf/schema.xml

OR, alternatively, you can edit the following lines in your schema.xml by hand as follows (to indicate that series and its components are now multiValued="true"):

     <field name="series" type="string" stored="true" indexed="true" multiValued="true"/>
     <field name="seriesInformation" type="text_en" multiValued="true" stored="true" indexed="true"/>
     <field name="seriesName" type="text_en" multiValued="true" stored="true" indexed="true"/>
  • Restart Solr instance (usually service solr restart depending on solr/OS)
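Instead of editing schema.xml by hand in 7b, the three series fields can be switched with sed. This sketch assumes each field is declared on a single line with an explicit multiValued="false"; if your schema omits the attribute, nothing changes, so review the result before restarting Solr. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print schema.xml with series, seriesInformation and seriesName set to
# multiValued="true"; all other lines pass through unchanged.
make_series_multivalued() {
  sed -E '/<field name="series(Information|Name)?"/ s/multiValued="false"/multiValued="true"/' "$1"
}
```

Usage, keeping a backup: cp schema.xml schema.xml.bak && make_series_multivalued schema.xml.bak > schema.xml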

8. Run ReExportAll to update dataset metadata exports. Follow the directions in the Admin Guide.

9. If your installation did not have :FilePIDsEnabled set, you will need to set it to true to keep file PIDs enabled:

  curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:FilePIDsEnabled

10. If your installation uses Handles as persistent identifiers (instead of DOIs): remember to upgrade your Handles service installation to a currently supported version.

Generally, Handles is known to work reliably even when running older versions that haven't been officially supported in years. We still recommend checking on your service and upgrading to a supported version (the latest version is 9.3.1, https://www.handle.net/hnr-source/handle-9.3.1-distribution.tar.gz, as of writing this). An older version may seem to be running just fine, but keep in mind that it may stop working unexpectedly at any moment because of an incompatibility introduced in a Java rpm upgrade, or something similarly unpredictable.

Handles is also very good about backward compatibility: in most cases you can simply stop the old version, unpack the new version from the distribution, and start it on the existing config and database files, and it will keep working. However, it is a good idea to keep up with the recommended format upgrades, for the sake of efficiency and to avoid unexpected surprises should the Handle developers eventually drop the old database format, for example. We recommend two specific things:

1) Make sure your service is using a JSON version of the siteinfo bundle (i.e., if you are still using siteinfo.bin, convert it to siteinfo.json and remove the binary file from the service directory).
2) Make sure you are using the newer bdbje database format for your handles catalog (i.e., if you still have the files handles.jdb and nas.jdb in your server directory, convert them to the new format).

Follow the simple conversion instructions in the file README.txt in the Handles software distribution. Make sure to stop the service before converting the files, and make sure to have a full backup of the existing server directory, just in case. Do not hesitate to contact Handles support with any questions you may have, as they are very responsive and helpful.
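Before converting, you may want to list which of the legacy-format files named above are still present. This is a minimal sketch: the function name is hypothetical, HANDLE_SERVER_DIR stands for your Handle server directory, and the conversion itself should still follow README.txt in the distribution.

```shell
# Hypothetical helper: print the legacy-format Handle files found in a
# server directory (siteinfo.bin, handles.jdb, nas.jdb). Prints nothing
# if none are present.
find_legacy_handle_files() {
  local dir="$1" f
  for f in siteinfo.bin handles.jdb nas.jdb; do
    if [ -e "${dir}/${f}" ]; then
      echo "${f}"
    fi
  done
}

# Example:
#   find_legacy_handle_files "$HANDLE_SERVER_DIR"
```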

New JVM Options and MicroProfile Config Options

The following PID provider options are now available. See the section "Changes to PID Provider JVM Settings" below for more information.

  • dataverse.pid.datacite.mds-api-url
  • dataverse.pid.datacite.rest-api-url
  • dataverse.pid.datacite.username
  • dataverse.pid.datacite.password
  • dataverse.pid.handlenet.key.path
  • dataverse.pid.handlenet.key.passphrase
  • dataverse.pid.handlenet.index
  • dataverse.pid.permalink.base-url
  • dataverse.pid.ezid.api-url
  • dataverse.pid.ezid.username
  • dataverse.pid.ezid.password

The following MicroProfile Config options have been added as part of Signposting support. See the section "Signposting for Dataverse" below for details.

  • dataverse.signposting.level1-author-limit
  • dataverse.signposting.level1-item-limit

The following JVM options are described in the "Creating datasets with incomplete metadata through API" section below.

  • dataverse.api.allow-incomplete-metadata
  • dataverse.ui.show-validity-filter
  • dataverse.ui.allow-review-for-incomplete

The following JVM/MicroProfile setting is for External Exporters. See "Mechanism Added for Adding External Exporters" below.

  • dataverse.spi.export.directory

The following JVM/MicroProfile settings are for handling of support emails. See "Contact Email Improvements" below.

  • dataverse.mail.support-email
  • dataverse.mail.cc-support-on-contact-emails

The following JVM/MicroProfile setting is for extracting a geospatial bounding box even if S3 direct upload is enabled.

  • dataverse.netcdf.geo-extract-s3-direct-upload
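Options read through MicroProfile Config can usually also be supplied as environment variables, which is convenient in containers. Under the MicroProfile Config naming rule, every non-alphanumeric character becomes an underscore and the name is upper-cased. The helper below is a hypothetical sketch of that rule; whether a particular option listed above is honored from the environment depends on it actually being looked up via MicroProfile Config, so please check the Installation Guide for each one.

```shell
# Hypothetical helper: convert a MicroProfile Config property name to its
# environment-variable form (non-alphanumerics -> "_", then upper-case).
mpconfig_env_name() {
  printf '%s\n' "$1" | sed 's/[^A-Za-z0-9]/_/g' | tr '[:lower:]' '[:upper:]'
}

# Example:
#   mpconfig_env_name dataverse.pid.datacite.mds-api-url
#   # prints DATAVERSE_PID_DATACITE_MDS_API_URL
```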

Backward Incompatibilities

The following list of potential backward incompatibilities references the sections of the "Detailed Release Highlights..." portion of the document further below where the corresponding changes are explained in detail.

Using the new External Exporters framework

Care should be taken when replacing Dataverse's internal metadata export formats as third party code, including other third party Exporters, may depend on the contents of those export formats. When replacing an existing format, one must also remember to delete the cached metadata export files or run the r...


v5.13

14 Feb 15:52
79d6e57

Dataverse Software 5.13

This release brings new features, enhancements, and bug fixes to the Dataverse software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Schema.org Improvements (Some Backward Incompatibility)

The Schema.org metadata used as an export format and also embedded in dataset pages has been updated to improve compliance with Schema.org's schema and Google's recommendations for Google Dataset Search.

Please be advised that these improvements may break integrations that rely on the old, less compliant structure. For details see the "backward incompatibility" section below. (Issue #7349)

Folder Uploads via Web UI (dvwebloader, S3 only)

For installations using S3 for storage and with direct upload enabled, a new tool called DVWebloader can be enabled that allows web users to upload a folder with a hierarchy of files and subfolders while retaining the relative paths of files (similarly to how the DVUploader tool does it on the command line, but with the convenience of using the browser UI). See Folder Upload in the User Guide for details. (PR #9096)

Long Descriptions of Collections (Dataverses) are Now Truncated

Like datasets, long descriptions of collections (dataverses) are now truncated by default but can be expanded with a "read full description" button. (PR #9222)

License Sorting

Superusers can now sort the licenses shown in the UI dropdown. See the Sorting Licenses section of the Installation Guide for details. (PR #8697)

Metadata Field Production Location Now Repeatable, Facetable, and Enabled for Advanced Search

Depositors can now click the plus sign to enter multiple instances of the metadata field "Production Location" in the citation metadata block. Additionally, this field now appears on the Advanced Search page and can be added to the list of search facets. (PR #9254)

Support for NetCDF and HDF5 Files

NetCDF and HDF5 files are now detected based on their content rather than just their file extension. Both "classic" NetCDF 3 files and more modern NetCDF 4 files are detected based on content. Detection for older HDF4 files is only done through the file extension ".hdf", as before.
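Content-based detection generally relies on the signature bytes at the start of a file: classic NetCDF files begin with "CDF" followed by a version byte (0x01, 0x02, or 0x05 for the CDF-5 variant), and HDF5 files, which include NetCDF 4 files, begin with the 8-byte HDF5 signature. The sketch below illustrates the idea for checking your own files. It is not Dataverse's detection code, the function name is hypothetical, and it ignores HDF5 files whose signature sits after a user block (at offset 512, 1024, ...).

```shell
# Hypothetical helper: classify a file by its leading signature bytes.
# Prints "netcdf-classic", "hdf5" (this includes NetCDF 4) or "unknown".
sniff_netcdf_hdf5() {
  local f="$1" head8 head4
  # First 8 bytes as a lowercase hex string, e.g. "894844460d0a1a0a".
  head8=$(head -c 8 "$f" | od -An -tx1 | tr -d ' \n')
  head4=${head8:0:8}
  case "$head4" in
    # "CDF" + version byte 0x01, 0x02 or 0x05
    43444601|43444602|43444605) echo "netcdf-classic"; return ;;
  esac
  if [ "$head8" = "894844460d0a1a0a" ]; then
    echo "hdf5"
  else
    echo "unknown"
  fi
}

# Example:
#   sniff_netcdf_hdf5 /path/to/file.nc
```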

For NetCDF and HDF5 files, an attempt will be made to extract metadata in NcML (XML) format and save it as an auxiliary file. There is a new NcML previewer available in the dataverse-previewers repo.

An extractNcml API endpoint has been added, which is especially useful for installations with existing NetCDF and HDF5 files: after upgrading, they can iterate through these files and try to extract an NcML file from each one.
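A loop over such files might look like the sketch below. It assumes the endpoint is POST /api/files/{id}/extractNcml with an API token in the X-Dataverse-key header, which is how we read the API Guide; please confirm the path and required permissions there. The function names, SERVER_URL, API_TOKEN and the file IDs are placeholders, and how you collect the IDs (for example, from a database query) is up to you.

```shell
# Hypothetical helper: build the extractNcml URL for one file ID.
# A trailing slash on the server URL is tolerated.
extract_ncml_url() {
  printf '%s/api/files/%s/extractNcml\n' "${1%/}" "$2"
}

# Hypothetical helper: call the endpoint for each file ID given.
extract_ncml_all() {
  local server="$1" token="$2" id
  shift 2
  for id in "$@"; do
    echo "File ${id}:"
    curl -s -H "X-Dataverse-key:${token}" -X POST "$(extract_ncml_url "$server" "$id")"
    echo
  done
}

# Example:
#   extract_ncml_all "$SERVER_URL" "$API_TOKEN" 42 43 44
```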

See the NetCDF and HDF5 section of the User Guide for details. (PR #9239)

Support for .eln Files (Electronic Laboratory Notebooks)

The .eln file format is used by Electronic Laboratory Notebooks as an exchange format for experimental protocols, results, sample descriptions, etc...

Improved Security for External Tools

External tools can now be configured to use signed URLs to access the Dataverse API as an alternative to API tokens. This eliminates the need for tools to have access to the user's API token in order to access draft or restricted datasets and datafiles. Signed URLs can be transferred via POST or via a callback when triggering a tool via GET. See Authorization Options in the External Tools documentation for details. (PR #9001)

Geospatial Search (API Only)

Geospatial search is supported via the Search API using two new parameters: geo_point and geo_radius.

The fields that are geospatially indexed are "West Longitude", "East Longitude", "North Latitude", and "South Latitude" from the "Geographic Bounding Box" field in the geospatial metadata block. (PR #8239)
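As an illustration, assuming geo_point takes "latitude,longitude" and geo_radius a distance in kilometers (check the Search API guide for the exact semantics), a query URL could be built like this. The helper name and the sample coordinates are hypothetical.

```shell
# Hypothetical helper: build a Search API URL for a geospatial query.
# Arguments: server URL, latitude, longitude, radius (assumed km).
geo_search_url() {
  printf '%s/api/search?q=*&geo_point=%s,%s&geo_radius=%s\n' "${1%/}" "$2" "$3" "$4"
}

# Example:
#   curl "$(geo_search_url http://localhost:8080 42.3 -71.1 1.5)"
```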

Reproducibility and Code Execution with Binder

Binder has been added to the list of external tools that can be added to a Dataverse installation. From the dataset page, you can launch Binder, which spins up a computational environment in which you can explore the code and data in the dataset, or write new code, such as a Jupyter notebook. (PR #9341)

CodeMeta (Software) Metadata Support (Experimental)

Experimental support for research software metadata deposits has been added.

By adding a metadata block for CodeMeta, we take another step toward adding first-class support of diverse FAIR objects, such as research software and computational workflows.

There is more work underway to make Dataverse installations around the world "research software ready."

Note: Like the metadata block for computational workflows before, CodeMeta is listed under Experimental Metadata in the guides. Experimental means it's brand new, opt-in, and might need future tweaking based on experience of usage in the field. We hope for feedback from installations on the new metadata block to optimize and lift it from the experimental stage. (PR #7877)

Mechanism Added for Stopping a Harvest in Progress

It is now possible for a sysadmin to stop a long-running harvesting job. See Harvesting Clients in the Admin Guide for more information. (PR #9187)

API Endpoint Listing Metadata Block Details has been Extended

The API endpoint /api/metadatablocks/{block_id} has been extended to include the following fields:

  • controlledVocabularyValues: All possible values for fields with a controlled vocabulary. For example, the values "Agricultural Sciences", "Arts and Humanities", etc. for the "Subject" field.
  • isControlledVocabulary: Whether or not this field has a controlled vocabulary.
  • multiple: Whether or not the field supports multiple values.

See Metadata Blocks in the API Guide for details. (PR #9213)

Advanced Database Settings

You can now enable advanced database connection pool configurations useful for debugging and monitoring as well as other settings. Of particular interest may be sslmode=require, though installations already setting this parameter in the Postgres connection string will need to move it to dataverse.db.parameters. See the new Database Persistence section of the Installation Guide for details. (PR #8915)

Support for Cleaning up Leftover Files in Dataset Storage

Experimental feature: leftover files in a dataset's storage location that are not in that dataset's file list, but are named following the Dataverse technical convention for dataset files, can be removed with the new Cleanup Storage of a Dataset API endpoint.

OAI Server Bug Fixed

A bug introduced in 5.12 was preventing the Dataverse OAI server from serving incremental harvesting requests from clients. It was fixed in this release (PR #9316).

Major Use Cases and Infrastructure Enhancements

Changes and fixes in this release not already mentioned above include:

  • Administrators can configure an alternative storage location where files uploaded via the UI are temporarily stored during the transfer from client to server. (PR #8983, See also Configuration Guide)
  • To improve performance, Dataverse estimates download counts. This release includes an update that makes the estimate more accurate. (PR #8972)
  • Direct upload and out-of-band uploads can now be used to replace multiple files with one API call (complementing the prior ability to add multiple new files). (PR #9018)
  • A persistent identifier, CSRT, has been added to the Related Publication field's ID Type child field. For datasets published with CSRT IDs, Dataverse will also include them in the datasets' Schema.org metadata exports. (Issue #8838)
  • Datasets that are part of linked dataverse collections will now be displayed in their linking dataverse collections.

New JVM Options and MicroProfile Config Options

The following JVM option is now available:

  • dataverse.personOrOrg.assumeCommaInPersonName - the default is false

The following MicroProfile Config options are now available (these can be treated as JVM options):

  • dataverse.files.uploads - alternative storage location of generated temporary files for UI file uploads
  • dataverse.api.signing-secret - used by signed URLs
  • dataverse.solr.host
  • dataverse.solr.port
  • dataverse.solr.protocol
  • dataverse.solr.core
  • dataverse.solr.path
  • dataverse.rserve.host

The following existing JVM options are now available via MicroProfile Config:

  • dataverse.siteUrl
  • dataverse.fqdn
  • dataverse.files.directory
  • dataverse.rserve.host
  • dataverse.rserve.port
  • dataverse.rserve.user
  • dataverse.rserve.password
  • dataverse.rserve.tempdir
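When setting any of these as JVM options through asadmin, note that asadmin treats an unescaped colon as a separator between options, so colons inside values (as in URLs) need a backslash. The helper below only prints the command for review; its name and the example values are hypothetical, and PAYARA defaults to the path used elsewhere in these notes.

```shell
# Hypothetical helper: print an asadmin create-jvm-options command for
# one option, escaping ":" in the value as "\:" for asadmin.
jvm_option_cmd() {
  local name="$1" value="$2" escaped
  escaped=$(printf '%s' "$value" | sed 's/:/\\:/g')
  printf "%s/bin/asadmin create-jvm-options '-D%s=%s'\n" \
    "${PAYARA:-/usr/local/payara5}" "$name" "$escaped"
}

# Example:
#   jvm_option_cmd dataverse.siteUrl https://dataverse.example.edu
#   jvm_option_cmd dataverse.personOrOrg.assumeCommaInPersonName true
```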

Notes for Developers and Integrato...


v5.12.1

07 Nov 13:51
cf90431

Dataverse Software 5.12.1

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Bug Fix for "Internal Server Error" When Creating a New Remote Account

Unfortunately, as of 5.11 new remote users have seen "Internal Server Error" when creating an account (or checking notifications just after creating an account). Remote users are those who log in with institutional (Shibboleth), OAuth (ORCID, GitHub, or Google) or OIDC providers.

This is a transient error that can be worked around by reloading the page in the browser (or logging out and back in again), but it's obviously a very poor user experience and a bad first impression. This bug is the primary reason we are putting out this patch release. Other features and bug fixes are coming along for the ride.

Ability to Disable OAuth Sign Up While Allowing Existing Accounts to Log In

A new setting, :AllowRemoteAuthSignUp, provides a mechanism for disabling new account sign-ups for specific OAuth2 authentication providers (ORCID, GitHub, Google, etc.) while still allowing logins for already-existing accounts that use this authentication method.

See the Installation Guide for more information on the setting.
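For illustration, assuming the setting takes a JSON object with a "default" key plus one key per provider (which is how we read the Installation Guide; please confirm the exact format and provider ids there), the payload could be built and applied as follows. The helper name and the provider id google are assumptions.

```shell
# Hypothetical helper: build the JSON value for :AllowRemoteAuthSignUp.
# $1 is the default ("true" or "false"); any further arguments are
# provider ids whose sign-ups should be disabled.
allow_remote_signup_payload() {
  local json="{\"default\":\"$1\"" p
  shift
  for p in "$@"; do
    json="${json},\"${p}\":\"false\""
  done
  printf '%s}\n' "$json"
}

# Example (allow sign-ups by default, but not via Google):
#   curl -X PUT -d "$(allow_remote_signup_payload true google)" \
#     http://localhost:8080/api/admin/settings/:AllowRemoteAuthSignUp
```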

Production Date Now Used for Harvested Datasets in Addition to Distribution Date (oai_dc format)

This release fixes the year displayed in the citation for harvested datasets, especially those harvested in the oai_dc format.

For normal datasets, the date used is the "citation date" which is by default the publication date (the first release date) unless you change it.

However, for a harvested dataset, the distribution date was used instead and this date is not always present in the harvested metadata.

Now, the production date is used for harvested datasets in addition to the distribution date when harvesting with the oai_dc format.

Publication Date Now Used for Harvested Dataset if Production Date is Not Set (oai_dc format)

For exports and harvesting in oai_dc format, if "Production Date" is not set, "Publication Date" is now used instead. This change is reflected in the Dataverse 4+ Metadata Crosswalk linked from the Appendix of the User Guide.

Major Use Cases and Infrastructure Enhancements

Changes and fixes in this release include:

  • Users creating an account by logging in with Shibboleth, OAuth, or OIDC should not see errors. (Issue #9029, PR #9030)
  • When harvesting datasets, I want the Production Date if I can't get the Distribution Date (PR #8732)
  • When harvesting datasets, I want the Publication Date if I can't get the Production Date (PR #8733)
  • As a sysadmin I'd like to disable (temporarily or permanently) sign ups from OAuth providers while allowing existing users to continue to log in from that provider (PR #9112)
  • As a C/C++ developer I want to use Dataverse APIs (PR #9070)

New DB Settings

The following DB settings have been added:

  • :AllowRemoteAuthSignUp

See the Database Settings section of the Guides for more information.

Complete List of Changes

For the complete list of code changes in this release, see the 5.12.1 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.

Upgrade Instructions

Upgrading requires a maintenance window and downtime. Please plan ahead, create backups of your database, etc.

0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.12.1.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version

    $PAYARA/bin/asadmin list-applications
    $PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara and remove the generated directory

    service payara stop
    rm -rf $PAYARA/glassfish/domains/domain1/generated

6. Start Payara

    service payara start

7. Deploy this version.

    $PAYARA/bin/asadmin deploy dataverse-5.12.1.war

8. Restart Payara

    service payara stop
    service payara start
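The steps above can be strung together in a small script. This is a sketch under stated assumptions: the function name is hypothetical, it only prints the commands unless RUN=1 is set, and you must pass the old application name shown by asadmin list-applications. It runs the same commands as the numbered steps, in the same order, and stops at the first failure when executing.

```shell
# Hypothetical helper: print (default) or run (RUN=1) the upgrade steps.
# Arguments: old application name, optional war file name.
upgrade_dataverse() {
  local old_app="$1" war="${2:-dataverse-5.12.1.war}"
  local payara="${PAYARA:-/usr/local/payara5}" step
  [ -n "$old_app" ] || { echo "usage: upgrade_dataverse OLD_APP [WAR]" >&2; return 2; }
  local -a steps=(
    "$payara/bin/asadmin undeploy $old_app"
    "service payara stop"
    "rm -rf $payara/glassfish/domains/domain1/generated"
    "service payara start"
    "$payara/bin/asadmin deploy $war"
    "service payara stop"
    "service payara start"
  )
  for step in "${steps[@]}"; do
    if [ "${RUN:-0}" = 1 ]; then
      echo "+ $step"
      eval "$step" || return 1   # stop at the first failing step
    else
      echo "$step"
    fi
  done
}

# Example (review first, then execute):
#   upgrade_dataverse dataverse-5.12
#   RUN=1 upgrade_dataverse dataverse-5.12
```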

Upcoming Versions of Payara

With the recent release of Payara 6 (Payara 6.2022.1 being the first version), the days of free-to-use Payara 5.x Platform Community versions are numbered. Specifically, Payara's blog post says, "Payara Platform Community 5.2022.4 has been released today as the penultimate Payara 5 Community release."

Given the end of free-to-use Payara 5 versions, we plan to get the Dataverse software working on Payara 6 (#8305), which will require substantial efforts from the IQSS team and community members, as this also means shifting our app to be a Jakarta EE 10 application (upgrading from EE 8). We are currently working out the details and will share news as soon as we can. Rest assured we will do our best to provide you with a smooth transition. You can follow along in Issue #8305 and related pull requests and you are, of course, very welcome to participate by testing and otherwise contributing, as always.

v5.12

05 Oct 14:18
71341c0

Dataverse Software 5.12

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Support for Globus

Globus can be used to transfer large files. Part of "Harvard Data Commons Additions" below.

Support for Remote File Storage

Dataset files can be stored at remote URLs. Part of "Harvard Data Commons Additions" below.

New Computational Workflow Metadata Block

The new Computational Workflow metadata block will allow depositors to effectively tag datasets as computational workflows.

To add the new metadata block, follow the instructions in the Admin Guide: https://guides.dataverse.org/en/5.12/admin/metadatacustomization.html

The location of the new metadata block tsv file is scripts/api/data/metadatablocks/computational_workflow.tsv. Part of "Harvard Data Commons Additions" below.
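Loading the block typically means POSTing the tsv file to the admin datasetfield/load endpoint, as described in the metadata customization section of the Admin Guide. A minimal sketch, assuming an installation reachable at http://localhost:8080; the function name is hypothetical. The Admin Guide also covers updating the Solr schema after a block is added.

```shell
# Hypothetical helper: load a metadata block tsv file into an installation.
# Arguments: path to the tsv file, optional server URL.
load_metadata_block() {
  local tsv="$1" server="${2:-http://localhost:8080}"
  [ -f "$tsv" ] || { echo "no such file: $tsv" >&2; return 1; }
  curl -s "${server%/}/api/admin/datasetfield/load" -X POST \
    -H "Content-type: text/tab-separated-values" --data-binary "@${tsv}"
}

# Example:
#   load_metadata_block scripts/api/data/metadatablocks/computational_workflow.tsv
```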

Support for Linked Data Notifications (LDN)

Linked Data Notifications (LDN) is a standard from the W3C. Part of "Harvard Data Commons Additions" below.

Harvard Data Commons Additions

As reported at the 2022 Dataverse Community Meeting, the Harvard Data Commons project has supported a wide range of additions to the Dataverse software that improve support for Big Data, Workflows, Archiving, and interaction with other repositories. In many cases, these additions build upon features developed within the Dataverse community by Borealis, DANS, QDR, TDL, and others. Highlights from this work include:

  • Initial support for Globus file transfer to upload to and download from a Dataverse-managed S3 store. The current implementation disables file restriction and embargo on Globus-enabled stores.
  • Initial support for Remote File Storage. This capability, enabled via a new RemoteOverlay store type, allows a file stored in a remote system to be added to a dataset (currently only via API) with download requests redirected to the remote system. Use cases include referencing public files hosted on external web servers as well as support for controlled access managed by Dataverse (e.g. via restricted and embargoed status) and/or by the remote store.
  • Initial support for computational workflows, including a new metadata block and detected filetypes.
  • Support for archiving to any S3 store using Dataverse's RDA-conformant BagIT file format (a BagPack).
  • Improved error handling and performance in archival bag creation and new options such as only supporting archiving of one dataset version.
  • Additions/corrections to the OAI-ORE metadata format (which is included in archival bags) such as referencing the name/mimetype/size/checksum/download URL of the original file for ingested files, the inclusion of metadata about the parent collection(s) of an archived dataset version, and use of the URL form of PIDs.
  • Display of archival status within the dataset page versions table, richer status options including success, pending, and failure states, with a complete API for managing archival status.
  • Support for batch archiving via API as an alternative to the current options of configuring archiving upon publication or archiving each dataset version manually.
  • Initial support for sending and receiving Linked Data Notification messages indicating relationships between a dataset and external resources (e.g. papers or other datasets) that can be used to trigger additional actions, such as the creation of a back-link to provide, for example, bi-directional linking between a published paper and a Dataverse dataset.
  • A new capability to provide custom per field instructions in dataset templates
  • The following file extensions are now detected:
    • wdl=text/x-workflow-description-language
    • cwl=text/x-computational-workflow-language
    • nf=text/x-nextflow
    • Rmd=text/x-r-notebook
    • rb=text/x-ruby-script
    • dag=text/x-dagman
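To check which of your existing files fall under these new types, a lookup mirroring the list above might look like this. It is an illustration only (the function name is hypothetical), not how Dataverse performs detection internally; note that the Rmd match is case-sensitive as written.

```shell
# Hypothetical helper: map a file name's extension to the MIME types
# listed above; prints "unknown" for anything else.
mime_for_extension() {
  case "${1##*.}" in
    wdl) echo "text/x-workflow-description-language" ;;
    cwl) echo "text/x-computational-workflow-language" ;;
    nf)  echo "text/x-nextflow" ;;
    Rmd) echo "text/x-r-notebook" ;;
    rb)  echo "text/x-ruby-script" ;;
    dag) echo "text/x-dagman" ;;
    *)   echo "unknown" ;;
  esac
}

# Example:
#   mime_for_extension analysis.Rmd
```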

Improvements to Fields that Appear in the Citation Metadata Block

Grammar, style and consistency improvements have been made to the titles, tooltip description text, and watermarks of metadata fields that appear in the Citation metadata block.

This includes fields that dataset depositors can edit in the Citation Metadata accordion (i.e. fields controlled by the citation.tsv and citation.properties files) and fields whose values are system-generated, such as the Dataset Persistent ID, Previous Dataset Persistent ID, and Publication Date fields whose titles and tooltips are configured in the bundles.properties file.

The changes should provide clearer information to curators, depositors, and people looking for data about what the fields are for.

A new page in the Style Guides called "Text" has also been added. The new page includes a section called "Metadata Text Guidelines" with a link to a Google Doc where the guidelines are being maintained for now since we expect them to be revised frequently.

New Static Search Facet: Metadata Types

A new static search facet has been added to the search side panel. This new facet is called "Metadata Types" and is driven from metadata blocks. When a metadata field value is inserted into a dataset, an entry for the metadata block it belongs to is added to this new facet.

This new facet needs to be configured before it appears in the search side panel. The configuration specifies which metadata blocks a dataverse collection shows in the facet, and it is inherited by child dataverses.

To configure the new facet, use the Metadata Block Facet API: https://guides.dataverse.org/en/5.12/api/native-api.html#set-metadata-block-facet-for-a-dataverse-collection

Broader MicroProfile Config Support for Developers

As of this release, many JVM options can be set using any MicroProfile Config Source.

Currently this change is only relevant to developers but as settings are migrated to the new "lookup" pattern documented in the Consuming Configuration section of the Developer Guide, anyone installing the Dataverse software will have much greater flexibility when configuring those settings, especially within containers. These changes will be announced in future releases.

Please note that an upgrade to Payara 5.2021.8 or higher is required to make use of this. Payara 5.2021.5 threw exceptions, as explained in PR #8823.

HTTP Range Requests: New HTTP Status Codes and Headers for Datafile Access API

The Basic File Access resource for datafiles (/api/access/datafile/$id) was slightly modified in order to comply better with the HTTP specification for range requests.

If the request contains a "Range" header:

  • The returned HTTP status is now 206 (Partial Content) instead of 200
  • A "Content-Range" header is returned containing information about the returned bytes
  • An "Accept-Ranges" header with value "bytes" is returned

CORS rules/headers were modified accordingly:

  • The "Range" header is added to "Access-Control-Allow-Headers"
  • The "Content-Range" and "Accept-Ranges" headers are added to "Access-Control-Expose-Headers"
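To see the new behavior from the client side, request a byte range and check for a 206 status. The helper below also computes the Content-Range value a server should return for a simple "first-last" range, following the HTTP range request specification (a last position beyond the end of the file is clamped to the final byte). It is a hypothetical sketch; SERVER_URL and ID in the commented curl line are placeholders.

```shell
# Hypothetical helper: Content-Range value for a "first-last" byte range
# (suffix and open-ended ranges are not handled) on a file of a given size.
content_range_for() {
  local first="${1%-*}" last="${1#*-}" size="$2"
  # Clamp a last-byte position past the end of the file to the final byte.
  if [ "$last" -ge "$size" ]; then
    last=$((size - 1))
  fi
  printf 'bytes %s-%s/%s\n' "$first" "$last" "$size"
}

# Example request (a range-aware response should report 206):
#   curl -s -o /dev/null -w '%{http_code}\n' -H "Range: bytes=0-99" \
#     "$SERVER_URL/api/access/datafile/$ID"
```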

This new functionality has enabled an external tool, a Zip Previewer and file extractor for zip files.

File Type Detection When File Has No Extension

File types are now detected based on the filename when the file has no extension.

The following filenames are now detected:

  • Makefile=text/x-makefile
  • Snakemake=text/x-snakemake
  • Dockerfile=application/x-docker-file
  • Vagrantfile=application/x-vagrant-file

These are defined in MimeTypeDetectionByFileName.properties.

Upgrade to Payara 5.2022.3 Highly Recommended

With lots of bug and security fixes included, we encourage everyone to upgrade to Payara 5.2022.3 as soon as possible. See below for details.

Major Use Cases and Infrastructure Enhancements

Changes and fixes in this release include:

  • Administrators can configure an S3 store used in Dataverse to support users uploading/downloading files via Globus File Transfer. (PR #8891)
  • Administrators can configure a RemoteOverlay store to allow files that remain hosted by a remote system to be added to a dataset. (PR #7325)
  • Administrators can configure the Dataverse software to send archival Bag copies of published dataset versions to any S3-compatible service. (PR #8751)
  • Users can see information about a dataset's parent collection(s) in the OAI-ORE metadata export. (PR #8770)
  • Users and administrators can now use the OAI-ORE metadata export to retrieve and assess the fixity of the original file (for ingested tabular files) via the included checksum. (PR #8901)
  • Archiving via RDA-conformant Bags is more robust and is more configurable. (PR #8773, #8747, #8699, #8609, #8606, #8610)
  • Users and administrators can see the archival status of the versions of the datasets they manage in the dataset page version table. (PR #8748, #8696)
  • Administrators can configure messaging between their Dataverse installation and other repositories that may hold related resources or services interested in activity within that installation. (PR #8775)
  • Collection managers can create templates that include custom instructions on how to fill out specific metadata fields.
  • Dataset update API users are given more information when the dataset they are updating is out of compliance with Terms of Access requirements (Issue #8859)
  • Adds...

v5.11.1

02 Aug 18:20
02e3e92

Dataverse Software 5.11.1

This is a bug fix release of the Dataverse Software. The .war file for v5.11 will no longer be made available and installations should upgrade directly from v5.10.1 to v5.11.1. To do so you will need to follow the instructions for installing release 5.11 using the v5.11.1 war file. (Note specifically the upgrade steps 6-9 from the 5.11 release note; most importantly, the ones related to the citation block and the Solr schema). If you had previously installed v5.11 (no longer available), follow the simplified instructions below.

Release Highlights

Dataverse Software 5.11 contains two critical issues that are fixed in this release.

First, if you delete a file from a published version of a dataset, the file will be deleted from the file system (or S3) and lose its "owner id" in the database. For details, see Issue #8867.

Second, if you are a superuser, it's possible to click "Delete Draft" and delete a published dataset if it has restricted files. For details, see #8845 and #8742.

Notes for Dataverse Installation Administrators

Identifying Datasets with Deleted Files

If you have been running 5.11, check if any files show "null" for the owner id. The "owner" of a file is the parent dataset:

select * from dvobject where dtype = 'DataFile' and owner_id is null;

For any of these files, change the owner id to the database id of the parent dataset. In addition, the file on disk (or in S3) is likely gone. Look at the "storageidentifier" field from the query above to determine the location of the file then restore the file from backup.

Identifying Datasets Superusers May Have Accidentally Destroyed

Check the "actionlogrecord" table for DestroyDatasetCommand. While these "destroy" entries are normal when a superuser uses the API to destroy datasets, an entry is also created if a superuser has accidentally deleted a published dataset in the web interface with the "Delete Draft" button.

Complete List of Changes

For the complete list of code changes in this release, see the 5.11.1 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.11.1. To upgrade from 5.10.1, follow the instructions for installing release 5.11 using the v5.11.1 war file. If you had previously installed v5.11 (no longer available), follow the simplified instructions below.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version.

  • $PAYARA/bin/asadmin list-applications
  • $PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara and remove the generated directory

  • service payara stop
  • rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

  • service payara start

4. Deploy this version.

  • $PAYARA/bin/asadmin deploy dataverse-5.11.1.war

5. Restart Payara

  • service payara stop
  • service payara start

v5.11

13 Jun 20:49
21ac7e1

Dataverse Software 5.11

Please note: We have removed the 5.11 war file and dvinstall.zip because there are very serious bugs in the 5.11 release. For the upgrade instructions below, please use the 5.11.1 war file instead. New installations should start with 5.11.1. The bugs are explained in the 5.11.1 release notes.

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Terms of Access or Request Access Required for Restricted Files

Beginning in this release, datasets with restricted files must have either Terms of Access or Request Access enabled. This change is to ensure that for each file in a Dataverse installation there is a clear path to get to the data, either through requesting access to the data or to provide context about why requesting access is not enabled.

Published datasets are not affected by this change. Datasets that are in draft and that have neither Terms of Access nor Request Access enabled must be updated to select one or the other (or both). Otherwise, these datasets cannot be further edited or published. Dataset authors can tell whether their dataset is affected by the presence of the following message at the top of their dataset page (when they are logged in):

"Datasets with restricted files are required to have Request Access enabled or Terms of Access to help people access the data. Please edit the dataset to confirm Request Access or provide Terms of Access to be in compliance with the policy."

At this point, authors should click "Edit Dataset" then "Terms" and then check the box for "Request Access" or fill in "Terms of Access for Restricted Files" (or both). Afterwards, authors will be able to further edit metadata and publish.
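The policy reduces to a simple decision rule for draft datasets. The function below is hypothetical and is not Dataverse's own implementation; it only illustrates the logic described above: a draft passes if it has no restricted files, or if Request Access is enabled, or if Terms of Access for Restricted Files is filled in (both is also fine).

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the 5.11 rule for *draft* datasets.
# Published datasets are not affected by the change.
# Arguments: has_restricted_files (true|false), request_access (true|false),
#            terms_of_access (free text, may be empty)
draft_meets_access_rule() {
  local has_restricted="${1:-false}" request_access="${2:-false}" terms="${3:-}"
  if [ "$has_restricted" != "true" ]; then
    return 0   # no restricted files, nothing to check
  fi
  if [ "$request_access" = "true" ] || [ -n "$terms" ]; then
    return 0   # at least one path to the data exists
  fi
  return 1     # author must edit Terms before editing further or publishing
}
```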

In the "Notes for Dataverse Installation Administrators" section, we have provided a query to help proactively identify datasets that need to be updated.

See also Issue #8191 and PR #8308.

Muting Notifications

Users can control which notifications they receive if the system is configured to allow this. See also Issue #7492 and PR #8530.

Major Use Cases and Infrastructure Enhancements

Changes and fixes in this release include:

  • Terms of Access or Request Access required for restricted files. (Issue #8191, PR #8308)
  • Users can control which notifications they receive if the system is configured to allow this. (Issue #7492, PR #8530)
  • A 500 error was occurring when creating a dataset if a template did not have an associated "termsofuseandaccess". See "Legacy Templates Issue" below for details. (Issue #8599, PR #8789)
  • Tabular ingest can be skipped via API. (Issue #8525, PR #8532)
  • The "Verify Email" button has been changed to "Send Verification Email" and, rather than sometimes showing a popup, now always sends a fresh verification email (and invalidates previous verification emails). (Issue #8227, PR #8579)
  • For Shibboleth users, the emailconfirmed timestamp is now set on login and the UI should show "Verified". (Issue #5663, PR #8579)
  • Information about the license selection (or custom terms) is now available in the confirmation popup when contributors click "Submit for Review". Previously, this was only available in the confirmation popup for the "Publish" button, which contributors do not see. (Issue #8561, PR #8691)
  • For installations configured to support multiple languages, controlled vocabulary fields that do not allow multiple entries (e.g. journalArticleType) are now indexed properly. (Issue #8595, PR #8601, PR #8624)
  • Two-letter ISO-639-1 codes for languages are now supported in metadata imports and harvesting. (Issue #8139, PR #8689)
  • The API endpoint for listing notifications has been enhanced to show the subject, text, and timestamp of notifications. (Issue #8487, PR #8530)
  • The API Guide has been updated to explain that the Content-type header is now (as of Dataverse 5.6) necessary to create datasets via native API. (Issue #8663, PR #8676)
  • Admin API endpoints have been added to find and delete dataset templates. (Issue #8600, PR #8706)
  • The BagIt file handler detects and transforms zip files with a BagIt package format into Dataverse data files, validating checksums along the way. See the BagIt File Handler section of the Installation Guide for details. (Issue #8608, PR #8677)
  • For BagIt Export, the number of threads used when zipping data files into an archival bag is now configurable using the :BagGeneratorThreads database setting. (Issue #8602, PR #8606)
  • PostgreSQL 14 can now be used (though we've tested mostly with 13). PostgreSQL 10+ is required. (Issue #8295, PR #8296)
  • As always, widgets can be embedded in an <iframe> HTML tag, but the HTTP header "Content-Security-Policy" is now sent on non-widget pages to prevent them from being embedded. (PR #8662)
  • URIs in the experimental Semantic API have changed (details below). (Issue #8533, PR #8592)
  • Installations running Make Data Count can upgrade to Counter Processor-0.1.04. (Issue #8380, PR #8391)
  • PrimeFaces, the UI framework we use, has been upgraded from 10 to 11. (Issue #8456, PR #8652)
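For the item on skipping tabular ingest via API, a call to the native API's add-file endpoint might look like the sketch below. It assumes that `tabIngest` is passed inside the `jsonData` form field, as in the API Guide's "Add a File to a Dataset" example; please confirm against the guide for your version. `SERVER_URL`, `API_TOKEN`, the persistent identifier, and the file name are placeholders, and `DRY_RUN=1` is a hypothetical convenience that only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: upload a file without tabular ingest.
# Assumes the "tabIngest" key in jsonData; SERVER_URL and API_TOKEN are placeholders.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: add_file_skip_ingest <persistent-id> <path-to-file>
add_file_skip_ingest() {
  local pid="${1:-}" file="${2:-}"
  run curl -s -X POST \
    -H "X-Dataverse-key:${API_TOKEN:-}" \
    -F "file=@$file" \
    -F 'jsonData={"tabIngest":"false"}' \
    "$SERVER_URL/api/datasets/:persistentId/add?persistentId=$pid"
}
```

For example, `add_file_skip_ingest doi:10.5072/FK2/EXAMPLE survey.csv` (a made-up PID) would store `survey.csv` as-is instead of converting it to a tabular data file.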

Notes for Dataverse Installation Administrators

Identifying Datasets Requiring Terms of Access or Request Access Changes

In support of the change to require either Terms of Access or Request Access for all restricted files (see above for details), we have provided a query to identify datasets in your installation where at least one restricted file has neither Terms of Access nor Request Access enabled:

https://github.com/IQSS/dataverse/blob/v5.11/scripts/issues/8191/datasets_without_toa_or_request_access

This will allow you to reach out to those dataset owners as appropriate.

Legacy Templates Issue

When custom license functionality was added, dataverses that had older legacy templates as their default template would not allow the creation of a new dataset (500 error).

This occurred because those legacy templates did not have an associated termsofuseandaccess linked to them.

In this release, we run a script that creates a default empty termsofuseandaccess for each of these templates and links them.

Note that the termsofuseandaccess records created this way default to the license with id=1 (CC0) and set fileaccessrequest to false.

See also Issue #8599 and PR #8789.

PostgreSQL Version 10+ Required

This release upgrades the bundled PostgreSQL JDBC driver to support major version 14.

Note that the newer PostgreSQL driver required a Flyway version bump, which entails positive and negative consequences:

  • The newer version of Flyway supports PostgreSQL 14 and includes a number of security fixes.
  • As of version 8.0 the Flyway Community Edition dropped support for PostgreSQL 9.6 and older.

This means that as foreshadowed in the 5.10 and 5.10.1 release notes, version 10 or higher of PostgreSQL is now required. For suggested upgrade steps, please see "PostgreSQL Update" in the release notes for 5.10: https://github.com/IQSS/dataverse/releases/tag/v5.10
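Before upgrading, it can help to confirm which PostgreSQL major version the server actually runs. The sketch below reads `server_version_num` (for example 130004 for 13.4 and 90624 for 9.6.24) and compares it against 100000, the value for PostgreSQL 10. The function names are hypothetical, and the psql connection options you pass (user, host, database) are installation-specific assumptions.

```shell
#!/usr/bin/env bash
# Sketch: confirm the PostgreSQL server is version 10 or newer.

# Returns 0 if the numeric server_version_num is >= 100000 (PostgreSQL 10).
pg_version_ok() {
  local num="${1:-}"
  case "$num" in
    ''|*[!0-9]*) echo "not a version number: '$num'" >&2; return 2 ;;
  esac
  [ "$num" -ge 100000 ]
}

# Extra arguments are passed through to psql, e.g. -U <user> <database>.
check_postgres() {
  local num
  num="$(psql -tAc 'SHOW server_version_num;' "$@")" || return 2
  if pg_version_ok "$num"; then
    echo "PostgreSQL $num is new enough"
  else
    echo "PostgreSQL $num is too old; upgrade to 10 or newer first" >&2
    return 1
  fi
}
```

Comparing `server_version_num` rather than the human-readable `server_version` avoids parsing strings such as "9.6.24" or "13.4 (Ubuntu ...)".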

Counter Processor 0.1.04 Support

This release includes support for counter-processor-0.1.04 for processing Make Data Count metrics. If you are running Make Data Count support, you should reinstall/reconfigure counter-processor as described in the latest Guides. (For existing installations, note that counter-processor-0.1.04 requires a newer version of Python, so you will need to follow the full counter-processor install. Also note that if you configure the new version the same way, it will reprocess the days in the current month when it is first run. This is normal and will not affect the metrics in Dataverse.)

New JVM Options and DB Settings

The following DB settings have been added:

  • :ShowMuteOptions
  • :AlwaysMuted
  • :NeverMuted
  • :CreateDataFilesMaxErrorsToDisplay
  • :BagItHandlerEnabled
  • :BagValidatorJobPoolSize
  • :BagValidatorMaxErrors
  • :BagValidatorJobWaitInterval
  • :BagGeneratorThreads

See the Database Settings section of the Guides for more information.
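Database settings are set through the admin settings API (`PUT /api/admin/settings/<name>`). These calls are typically made on the server itself against localhost, since installations commonly restrict the admin API. The sketch below is a hypothetical helper; the base URL is an assumption, and any values you pass should be checked against the Database Settings section for what each setting accepts.

```shell
#!/usr/bin/env bash
# Sketch: set a Dataverse database setting via the admin API.
DV_URL="${DV_URL:-http://localhost:8080}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: set_db_setting <:SettingName> <value>
set_db_setting() {
  local name="${1:-}" value="${2:-}"
  if [ -z "$name" ]; then
    echo "usage: set_db_setting <:SettingName> <value>" >&2
    return 1
  fi
  run curl -s -X PUT -d "$value" "$DV_URL/api/admin/settings/$name"
}
```

For example, `set_db_setting :ShowMuteOptions true` turns on the mute options, and `set_db_setting :BagGeneratorThreads 2` sets the number of threads used when zipping files into an archival bag (the value 2 is illustrative only).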

Notes for Developers and Integrators

See the "Backward Incompatibilities" section below.

Backward Incompatibilities

Semantic API Changes

This release includes an update to the experimental semantic API and the underlying assignment of URIs to metadata block terms that are not explicitly mapped to terms in community vocabularies. The change affects the output of the OAI_ORE metadata export, the OAI_ORE file in archival bags, and the input/output allowed for those terms in the semantic API.

For those updating integration code or existing files intended for input into this release of Dataverse, URIs of the form...

https://dataverse.org/schema/<block name>/<parentField name>#<childField title>

and

https://dataverse.org/schema/<block name>/<Field title>

...are both replaced with URIs of the form...

https://dataverse.org/schema/<block name>/<Field name>.
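In the new scheme only the block name and the field name matter. The hypothetical helpers below, using made-up names such as `myBlock` and `myField`, sketch how integration code might build the new URIs and flag leftovers of the retired `#` form when auditing stored OAI_ORE files. Note that the other retired form (a field title in place of the field name) cannot be detected by pattern alone; it has to be compared against the metadata block's actual field names.

```shell
#!/usr/bin/env bash
# Sketch: build and check term URIs after the 5.11 Semantic API change.
SCHEMA_BASE="https://dataverse.org/schema"

# New form: https://dataverse.org/schema/<block name>/<Field name>
term_uri() {
  local block="${1:-}" field_name="${2:-}"
  printf '%s/%s/%s\n' "$SCHEMA_BASE" "$block" "$field_name"
}

# Returns 0 for the retired "<parentField name>#<childField title>" form.
is_legacy_child_uri() {
  case "${1:-}" in
    "$SCHEMA_BASE"/*/*'#'*) return 0 ;;
    *) return 1 ;;
  esac
}
```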

Create Dataset API Requires Content-type Header (Since 5.6)

Due to a code change introduced in Dataverse 5.6, calls to the native API without the Content-type header will fail to create a dataset. The API Guide has been updated to indicate the necessity of this header: https://guides.dataverse.org/en/5.11/api/native-api.html#create-a-dataset-in-a-dataverse-collection
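The sketch below follows the shape of the API Guide's "create a dataset" call and shows where the header goes. `SERVER_URL`, `API_TOKEN`, the parent collection alias, and the JSON file name are placeholders, and `DRY_RUN=1` is a hypothetical convenience that prints the command instead of sending it.

```shell
#!/usr/bin/env bash
# Sketch: create a dataset in a collection via the native API.
# Since Dataverse 5.6 this call fails without the Content-type header.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: create_dataset <parent-collection-alias> <dataset.json>
create_dataset() {
  local parent="${1:-}" json_file="${2:-}"
  run curl -s -X POST \
    -H "X-Dataverse-key:${API_TOKEN:-}" \
    -H "Content-type:application/json" \
    --upload-file "$json_file" \
    "$SERVER_URL/api/dataverses/$parent/datasets"
}
```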

Complete List of Changes

...


v5.10.1

06 Apr 19:49
b844672

Dataverse Software 5.10.1

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Bug Fix for Request Access

Dataverse Software 5.10 contains a bug where the "Request Access" button doesn't work from the file listing on the dataset page if the dataset contains custom terms. This has been fixed in PR #8555.

Bug Fix for Searching and Selecting Controlled Vocabulary Values

Dataverse Software 5.10 contains a bug where the search option is no longer present when selecting from more than ten controlled vocabulary values. This has been fixed in PR #8521.

Major Use Cases and Infrastructure Enhancements

Changes and fixes in this release include:

  • Users can use the "Request Access" button when the dataset has custom terms. (Issue #8553, PR #8555)
  • Users can search when selecting from more than ten controlled vocabulary values. (Issue #8519, PR #8521)
  • The default file categories ("Documentation", "Data", and "Code") can be redefined through the :FileCategories database setting. (Issue #8461, PR #8478)
  • Documentation on troubleshooting Excel ingest errors was improved. (PR #8541)
  • Internationalized controlled vocabulary values can now be searched. (Issue #8286, PR #8435)
  • Curation labels can be internationalized. (Issue #8381, PR #8466)
  • "NONE" is no longer accepted as a license using the SWORD API (since 5.10). See "Backward Incompatibilities" below for details. (Issue #8551, PR #8558)

Notes for Dataverse Installation Administrators

PostgreSQL Version 10+ Required Soon

Because 5.10.1 is a bug fix release, an upgrade to PostgreSQL is not required. However, this upgrade is still coming in the next non-bug fix release. For details, please see the release notes for 5.10: https://github.com/IQSS/dataverse/releases/tag/v5.10

Payara Upgrade

You may notice that the Payara version used in the install scripts has been updated from 5.2021.5 to 5.2021.6. This was to address a bug where it was not possible to easily update the logging level. For existing installations, this release does not require upgrading Payara and a Payara upgrade is not part of the Upgrade Instructions below. For more information, see PR #8508.

New JVM Options and DB Settings

The following DB settings have been added:

  • :FileCategories - The default list of the pre-defined file categories ("Documentation", "Data" and "Code") can now be redefined with a comma-separated list (e.g. 'Docs,Data,Code,Workflow').

See the Database Settings section of the Guides for more information.
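A minimal sketch of redefining the categories with the example list above, then reading the setting back. It assumes the admin settings API is reachable at `http://localhost:8080`; the helper name and `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: redefine the default file categories, then read the setting back.
DV_URL="${DV_URL:-http://localhost:8080}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: set_file_categories 'Docs,Data,Code,Workflow'
set_file_categories() {
  local categories="${1:-}"   # comma-separated list
  if [ -z "$categories" ]; then
    echo "usage: set_file_categories <comma-separated list>" >&2
    return 1
  fi
  run curl -s -X PUT -d "$categories" "$DV_URL/api/admin/settings/:FileCategories" || return 1
  run curl -s "$DV_URL/api/admin/settings/:FileCategories"
}
```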

Notes for Developers and Integrators

In the "Backward Incompatibilities" section below, note changes in the API regarding licenses and the SWORD API.

Backward Incompatibilities

As of Dataverse 5.10, "NONE" is no longer supported as a valid license when creating a dataset using the SWORD API. The API Guide has been updated to reflect this. Additionally, if you specify an invalid license, a list of available licenses will be returned in the response.

Complete List of Changes

For the complete list of code changes in this release, see the 5.10.1 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.10.1.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version.

  • $PAYARA/bin/asadmin list-applications
  • $PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara and remove the generated directory

  • service payara stop
  • rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

  • service payara start

4. Deploy this version.

  • $PAYARA/bin/asadmin deploy dataverse-5.10.1.war

5. Restart Payara

  • service payara stop
  • service payara start
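Step 1 needs the exact name of the deployed application, which `asadmin list-applications` reports. The hypothetical helpers below pick that name out and undeploy it. They assume the command prints one application per line with the name in the first column, and that the Dataverse application's name starts with `dataverse` (as with the war files named in these notes); please verify both on your system before relying on them.

```shell
#!/usr/bin/env bash
# Sketch: find and undeploy the currently deployed Dataverse application.
PAYARA="${PAYARA:-/usr/local/payara5}"

# Reads "asadmin list-applications" output on stdin and prints the names
# of applications whose first column starts with "dataverse".
dataverse_apps() {
  awk '$1 ~ /^dataverse/ { print $1 }'
}

undeploy_old_dataverse() {
  local app
  for app in $("$PAYARA/bin/asadmin" list-applications | dataverse_apps); do
    "$PAYARA/bin/asadmin" undeploy "$app" || return 1
  done
}
```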