Specification DRAFT

This is a DRAFT specification, not intended for use or reference.

Introduction

This specification documents BagIt Enabled Deposit (BAGEND), a profile for BagIt that incorporates semantic description of the bag payload, specifically supporting the transfer of publications, data, bibliographic metadata, contributors, and their funding sources.

It provides a low barrier to entry for producers and consumers of conformant bags by allowing the use of idiomatic JSON for semantic description. It also introduces linked data, which allows advanced use cases to be developed and implemented on an as-needed basis.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Relationship to BagIt

Packages conformant with this specification MUST conform to BagIt version 1.0 (RFC 8493), and MUST be valid and complete bags. Bags conforming to this specification MUST advertise their compliance by including a BagIt-Profile-Identifier tag in bag-info.txt with the value http://bagend.io/bagit-profile/0.1/. Consumers MAY ascribe the semantics present in the BAGEND Resource Model to standard BagIt tags present in bag-info.txt.

If the directory /resources/bagend exists (relative to the base directory of the bag) and a BagIt-Profile-Identifier tag exists with a value of http://bagend.io/bagit-profile/0.1/ in bag-info.txt, then files contained in this directory SHOULD be interpreted according to this specification as entities of the BAGEND Resource Model describing the content contained in the BagIt payload directory, /data.

If /resources/bagend contains a Submission from the BAGEND Resource Model, then the full path (relative to the base directory of the bag) to the file containing the Submission MUST be specified by the BagEnd-Submission-File tag in bag-info.txt. The identity of the Submission resource SHOULD be specified by the BagEnd-Submission-Resource tag in bag-info.txt.
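As an illustration of the tag checks described above, the following minimal sketch parses the text of bag-info.txt and returns the Submission file path only when the bag advertises the BAGEND profile. The function names and the sample file name in the usage note are hypothetical. The parser is deliberately simple: it handles "Label: value" lines and whitespace-indented continuation lines, compares labels exactly as written, and is not a complete RFC 8493 implementation.

```python
PROFILE_ID = "http://bagend.io/bagit-profile/0.1/"


def parse_bag_info(text):
    """Parse bag-info.txt content into a dict mapping each label to a list of values."""
    tags = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and last is not None:
            # Continuation line: fold it into the previous value.
            tags[last][-1] += " " + line.strip()
            continue
        label, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Label: value" line; skipped in this sketch
        last = label.strip()
        tags.setdefault(last, []).append(value.strip())
    return tags


def submission_file(tags):
    """Return the BagEnd-Submission-File path if the bag advertises the profile, else None."""
    if PROFILE_ID not in tags.get("BagIt-Profile-Identifier", []):
        return None
    values = tags.get("BagEnd-Submission-File", [])
    return values[0] if values else None
```

For example, given a bag-info.txt containing the profile identifier and the line "BagEnd-Submission-File: /resources/bagend/submission.jsonld" (a hypothetical file name), submission_file(parse_bag_info(text)) would return that path.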

TODO: verbiage about the exploded bag directory name matching the serialized bag file basename?

BAGEND Resource Model

Bags conforming to the BAGEND profile SHOULD serialize and transmit an instance of the Resource Model with the package. If the Resource Model is included, it MUST be located under the directory (relative to the base directory of the bag) named /resources/bagend. The Resource Model, if present, MUST contain at most one Submission. The Submission, if present, MUST contain exactly one Article.
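The cardinality constraints above could be checked along the following lines. This is a sketch under stated assumptions: it treats the resources as already-parsed JSON, assumes entity types are exposed through the JSON-LD "@type" keyword using the compact names "Submission" and "Article", and assumes the Article is embedded within the Submission object. The function names are hypothetical.

```python
def nodes_of_type(node, type_name):
    """Recursively collect every JSON object whose "@type" includes type_name."""
    found = []
    if isinstance(node, dict):
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if type_name in types:
            found.append(node)
        for value in node.values():
            found.extend(nodes_of_type(value, type_name))
    elif isinstance(node, list):
        for item in node:
            found.extend(nodes_of_type(item, type_name))
    return found


def check_cardinality(resources):
    """Return a list of violation messages for the parsed resources (empty when none are found)."""
    problems = []
    submissions = nodes_of_type(resources, "Submission")
    if len(submissions) > 1:
        problems.append("more than one Submission")
    for submission in submissions:
        # Assumes the Article is embedded inside the Submission object.
        if len(nodes_of_type(submission, "Article")) != 1:
            problems.append("Submission does not contain exactly one Article")
    return problems
```

A consumer that links rather than embeds the Article would need to resolve references before applying a check like this.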

Resources MUST be serialized as JSON-LD. Compliant serializations SHOULD be in compacted form, and MUST use the context located at http://bagend.io/model/0.1/context.jsonld for representing the instances and relationships of the Resource Model. Arbitrary keys and values MAY be present in resource serializations, and resource consumers SHOULD ignore any unrecognized elements.

Compliant serializations MAY embed the context directly in the resource serialization, or include and reference a copy of the context in the /resources/bagend directory to facilitate off-line processing.

Resources serialized as files SHOULD use the file extension .jsonld.

Resources SHOULD be serialized in a single file, using embedding (as described in JSON-LD 1.1 §4.5) instead of linking between discrete files in order to manifest relationships between model elements.
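To make the single-file, embedded form concrete, the sketch below builds a compacted document in which the Article and a File are nested inside the Submission, referencing the context by URL. Only the context URL and the entity names come from this specification. The property names ("article", "title", "file", "path"), the function name, and the sample values are hypothetical, since the terms defined by the context are not reproduced here.

```python
import json

CONTEXT_URL = "http://bagend.io/model/0.1/context.jsonld"


def build_submission(article_title, file_path):
    """Build one compacted document with the Article and File embedded in the Submission."""
    return {
        "@context": CONTEXT_URL,
        "@type": "Submission",
        # "article", "title", "file" and "path" are hypothetical term names.
        "article": {
            "@type": "Article",
            "title": article_title,
            "file": {"@type": "File", "path": file_path},
        },
    }


document = build_submission("An Example Article", "data/manuscript.pdf")
serialized = json.dumps(document, indent=2)
```

The resulting text would be written to a file with the .jsonld extension under /resources/bagend. Because the relationships are expressed by nesting, a consumer can read them with an ordinary JSON parser.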

Consumers MAY process the Resource Model as JSON without interpreting it as Linked Data.

Model Description

This section is non-normative.

The BAGEND Resource Model is used to describe the custodial content of the bag, and is composed of the following conceptual entities:

Table 1: Overview of the Resource Model entities

Resource Model Entity | Description
Agreement | Represents terms or conditions that govern the submission of the Article. Encapsulates any and all forms of agreement or contract related to the Submission, Article, or Data Files.
Article | The intellectual content being published, typically a copy of an article that has been accepted for publication, post peer-review. Represents the Author Accepted Manuscript or Published Article. The primary intellectual output captured by a Submission.
Award | Represents funding that enabled or contributed to the research documented or performed in the article. The award or awards that funded the research represented in the Article or data files.
Contract | Represents legal contracts such as Terms of Service, a License, etc. that are specifically agreed to in the process of performing a submission.
File | Encapsulates the technical metadata of a finite, ordered stream of binary digits contained within the bag payload directory. An Article may link to a File representing its content, supporting figures, tables, or data. Represents the binary files and their roles associated with an Article. These would be considered part of the custodial content of a package.
Journal | Encapsulates a Journal and its metadata. Metadata describing a Journal in print and/or electronic form.
Organization | Represents an entity unified by management, vision, or legal framework that may act as an agent. Represents an organization related to the research or funding of the Article.
Person | Represents an individual agent that contributed to the Submission in some way. Accounts for people and their roles.
Publication | Represents the Article in the context of a printed publication.
Submission | Aggregates the entities of the Resource Model as a cohesive whole and provides a place to record provenance detail. Represents a submission to an agency, containing the Article, Data Files, Award information, people involved in the submission process, and any agreements or contracts signed as part of the submission.

Alignment with BagIt

This section is non-normative.

If a BagIt-Profile-Identifier tag exists with a value of http://bagend.io/bagit-profile/0.1/ in bag-info.txt, then the consumer may apply semantics of the Resource Model to bag metadata where their fields align. This may be useful when a bag which conforms to the BAGEND profile does not contain a Resource Model instance. If values present in bag metadata conflict with values in an existing Resource Model, the Resource Model values take precedence.

Table 2: Semantic alignment of BagIt metadata tags and Resource Model elements

BagIt Tag | Description | Resource Model Element
Source-Organization, Organization-Address | Described by BagIt as "Organization transferring the content." | The affiliated Organization of the Submission Contact
Contact-Name, Contact-Phone, Contact-Email | Described by BagIt as "Person at the source organization who is responsible for the content transfer." | Submission Contact
External-Description | Described by BagIt as "A brief explanation of the contents and provenance." | Submission Description
External-Identifier | Described by BagIt as "A sender-supplied identifier for the bag." | Submission Correlation ID, or any other suitable Submission identifier
Bagging-Date | Described by BagIt as "Date (YYYY-MM-DD) that the content was prepared for transfer." | Submission Creation Date
Internal-Sender-Identifier | Described by BagIt as "An alternate sender-specific identifier for the content and/or bag." | Any suitable Submission identifier
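The precedence rule for this alignment can be sketched as a simple merge in which values from bag-info.txt act as defaults and values from the Resource Model override them. The field names on the right-hand side of the mapping ("description", "correlationId", and so on) and the function name are hypothetical, as is the flat dictionary used to stand in for a Submission. Bag tags are assumed to be supplied as a dictionary with one value per label.

```python
# Hypothetical mapping from bag-info.txt labels to flat Submission field names.
ALIGNMENT = {
    "External-Description": "description",
    "External-Identifier": "correlationId",
    "Bagging-Date": "created",
    "Contact-Name": "contactName",
    "Contact-Email": "contactEmail",
}


def effective_submission(bag_tags, submission=None):
    """Merge aligned bag-info.txt values with a Submission; Submission values win on conflict."""
    merged = {}
    for label, field in ALIGNMENT.items():
        if label in bag_tags:
            merged[field] = bag_tags[label]
    # Values from an existing Resource Model take precedence over bag metadata.
    merged.update(submission or {})
    return merged
```

When no Resource Model instance is present, the result contains only the values derived from bag metadata.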