DataONE Python Products

DataONE provides a number of products implemented in Python and Java, as part of the Investigator Toolkit (ITK). Potential users of these products include software developers, Member Node partners and end users. Only the Python products are outlined in this document.

For software developers, DataONE provides development libraries implemented in Python. These provide functionality commonly needed by projects that interact with the DataONE infrastructure. Applications implemented in Python should use these libraries rather than interact directly with the infrastructure, as this is likely to reduce development effort.

For Member Node partners, DataONE provides a Member Node (MN) implemented in Python, called Generic Member Node (GMN).

Lastly, DataONE provides various tools intended for end users, also implemented in Python. These include ONEDrive and the DataONE Command Line Interface (CLI).

Utilities (for end users)

DataONE Command Line Utilities and Examples

The DataONE Utilities / Examples package contains command line utilities for interacting with the DataONE infrastructure.

The utilities are implemented using the DataONE Common and Client libraries for Python. Effort has been put into keeping the implementations clear and easy to read, allowing the utilities to also serve as examples on how to use the DataONE Python libraries.

After setup, all the utilities are in the shell's search path, so they can be started from any directory. If the utilities were installed in a virtual environment, that environment must be active for the commands to work.

DataONE ONEDrive

DataONE ONEDrive enables DataONE objects stored in Zotero citation manager libraries to be accessed like regular files on Windows, Mac OS X and Linux systems. This allows users to open remote DataONE objects locally and work with them as if they reside on the user’s computer. For instance, a spreadsheet that is stored on a Member Node can be opened directly in Excel.

DataONE objects can be added to a Zotero library via the ONEMercury search tool. Objects can also be added in all the other ways that Zotero supports. ONEDrive connects to a Zotero library and makes all DataONE objects within the library accessible as regular files. Zotero collections are represented as folders in the ONEDrive filesystem.

DataONE Command Line Interface

The DataONE Command Line Interface (CLI) enables operations to be performed against the DataONE infrastructure from the command line. Supported operations include creating and retrieving DataONE objects, searching, updating access control rules and retrieving statistics.

Member Node (for Member Node partners)

Generic Member Node (GMN)

The Generic Member Node (GMN) is a DataONE Member Node (MN). It provides an implementation of the MN APIs and can be used by organizations to expose their science data to DataONE if they do not wish to create their own native MN.

GMN can be used as a standalone MN, or it can be used to expose data that is already available on the web to DataONE. In the latter case, GMN provides a DataONE compatible interface to the existing data and does not store the data itself.

GMN can also be used as a workbench or reference for a third-party MN implementation. If an organization wishes to donate storage space to DataONE, GMN can be set up as a replication target.

Python Libraries (for software developers)

DataONE Common Library for Python

The DataONE Common Library for Python is a component of the DataONE Investigator Toolkit (ITK). It forms the foundation on which higher level components in the DataONE Python stack are built. It provides functionality commonly needed by clients, servers and other applications that interact with the DataONE infrastructure, including:

  • Serializing, deserializing, validating and type conversions for the DataONE XML types

  • Parsing and generating X.509 v3 certificates with DataONE extension

  • Parsing and generating OAI-ORE Resource Maps as used by DataONE

  • Utilities for working with XML documents, URLs, date-times, etc., in the context of DataONE
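
As a small illustration of the kind of date-time handling mentioned in the last item, the following sketch normalizes timestamps to UTC using only the Python standard library. It does not use the DataONE Common Library itself; the function name to_utc_iso8601 is hypothetical, and treating naive timestamps as UTC is an assumption made for this example.

```python
from datetime import datetime, timedelta, timezone


def to_utc_iso8601(dt):
    """Return dt as an ISO 8601 string in UTC.

    Naive datetimes are assumed to already be in UTC (an assumption made
    for this sketch only).
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).isoformat()


# A timestamp recorded in a zone seven hours behind UTC.
local = datetime(2014, 3, 1, 12, 0, 0, tzinfo=timezone(timedelta(hours=-7)))
print(to_utc_iso8601(local))
```

Normalizing to a single time zone before serializing makes timestamps from different sources directly comparable.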

DataONE Client Library for Python

The DataONE Client Library for Python works together with the DataONE Common Library for Python to provide functionality commonly needed by client software that connects to DataONE nodes.

The main functionality provided by this library is a complete set of wrappers for all DataONE API methods. There are many details related to interacting with the DataONE API, such as creating MIME multipart messages, encoding parameters into URLs and handling Unicode. The wrappers hide these details, allowing the developer to communicate with nodes by calling native Python methods which take and return native Python objects.
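
To give a sense of one detail the wrappers take care of, the sketch below percent-encodes an identifier that contains reserved and non-ASCII characters before placing it in a URL. It uses only the Python standard library and is not the Client Library's own code. The base URL and the object_url helper are hypothetical, and the /v2/object/ path is assumed here for illustration.

```python
from urllib.parse import quote

# Hypothetical Member Node base URL, for illustration only.
BASE_URL = "https://mn.example.org/mn"


def object_url(base_url, pid):
    """Build a URL for retrieving an object, percent-encoding the identifier.

    safe="" ensures that "/" and ":" inside the identifier are encoded too,
    so the identifier occupies exactly one path segment. Non-ASCII
    characters are encoded as UTF-8 bytes.
    """
    return "{}/v2/object/{}".format(base_url.rstrip("/"), quote(pid, safe=""))


print(object_url(BASE_URL, "doi:10.5063/AA/é.1"))
```

Without the encoding step, the slashes in the identifier would be interpreted as path separators and the request would address a different resource.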

The wrappers also convert any errors received from the nodes into native exceptions, enabling clients to use Python’s concise exception handling system to handle errors.
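
The general idea of turning an error response into a native exception can be sketched as follows. This is a standalone illustration using only the standard library, not the Client Library's implementation: the NodeError class and raise_for_error function are hypothetical, and the shape of the error document is an assumption.

```python
import xml.etree.ElementTree as ET


class NodeError(Exception):
    """Hypothetical stand-in for a library exception class."""

    def __init__(self, name, error_code, detail_code, description):
        super().__init__("{} ({}): {}".format(name, error_code, description))
        self.name = name
        self.error_code = error_code
        self.detail_code = detail_code
        self.description = description


def raise_for_error(status_code, body):
    """Raise NodeError if the response indicates failure; otherwise return None."""
    if status_code < 400:
        return None
    root = ET.fromstring(body)
    raise NodeError(
        name=root.get("name"),
        error_code=int(root.get("errorCode")),
        detail_code=root.get("detailCode"),
        description=(root.findtext("description") or "").strip(),
    )


# Assumed shape of an error document returned by a node.
ERROR_XML = (
    '<error name="NotFound" errorCode="404" detailCode="1020">'
    "<description>No object with that identifier</description>"
    "</error>"
)

try:
    raise_for_error(404, ERROR_XML)
except NodeError as e:
    print("Request failed:", e.name, e.detail_code)
```

With this pattern, calling code handles failures with ordinary try/except blocks instead of inspecting status codes and parsing response bodies itself.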

DataONE Science Metadata Validator for Python

The DataONE Science Metadata library for Python is a component of the DataONE Investigator Toolkit (ITK). It currently provides schema validation of DataONE Science Metadata XML documents.

DataONE Test Utilities

The DataONE Test Utilities package contains various utilities for testing DataONE infrastructure components and clients. These include the Instance Generator, used for creating randomized System Metadata documents, and the Stress Tester, used for stress testing Member Node implementations. The Stress Tester can open many concurrent connections to a Member Node and create any number of randomly generated objects while simultaneously running queries and object retrievals. The package also contains various other utilities.
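
The concurrency pattern behind this kind of stress test can be sketched with the standard library alone. The example below is not the Stress Tester's code: fake_create is a stand-in for a real call to a Member Node, and all names in it are hypothetical.

```python
import random
import string
from concurrent.futures import ThreadPoolExecutor


def random_pid(rng, length=12):
    """Generate a random identifier for a test object."""
    letters = (rng.choice(string.ascii_lowercase) for _ in range(length))
    return "test_" + "".join(letters)


def fake_create(pid):
    """Stand-in for a call that would create an object on a Member Node."""
    return (pid, "created")


def run_stress(n_objects, n_workers, seed=0):
    """Issue n_objects create calls using up to n_workers concurrent threads."""
    rng = random.Random(seed)
    pids = [random_pid(rng) for _ in range(n_objects)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in the same order as the input pids.
        return list(pool.map(fake_create, pids))


results = run_stress(n_objects=100, n_workers=8)
print(len(results), "calls completed")
```

Seeding the random generator keeps a test run reproducible, which helps when a failure seen under load needs to be investigated afterwards.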

DataONE Dev Tools

DataONE CSW Harvester