d1_util package

DataONE Utilities and Examples.

A collection of scripts intended to be useful both as command line utilities and as examples on how to interact with the DataONE infrastructure via the DataONE Python stack.

Although this directory is not a package, this __init__.py file is required for pytest to be able to reach test directories below this directory.

Submodules

d1_util.cert_check_cn module

Submit a PEM (Base64) encoded X.509 v3 certificate to a CN for validation.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Submit a PEM (Base64) encoded X.509 v3 certificate, optionally containing a DataONE SubjectInfo extension, to a CN to check if it passes validation and to determine which DataONE subjects are authenticated by it.

Example

$ cert-check-cn --cert-pub /tmp/x509up_u1000

Notes:

  • Both the public and private key of the certificate are required. They may be in the same file, in which case only the --cert-pub option is required.

  • See check_x509_certificate_local.py for how to process the lists of equivalent identities and group memberships in a DataONE SubjectInfo extension into a list of authenticated DataONE subjects.

d1_util.cert_check_cn.main()

d1_util.cert_check_local module

Parse a PEM (Base64) encoded X.509 v3 certificate.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Parse a PEM (Base64) encoded X.509 v3 certificate, optionally containing a DataONE SubjectInfo extension, to determine which DataONE subjects are authenticated by it.

  • Process the lists of equivalent identities and group memberships in a DataONE SubjectInfo extension into a list of authenticated DataONE subjects.

Notes:

  • This does not require the private key of the certificate and does not validate the certificate.

d1_util.cert_check_local.main()

d1_util.cert_create_ca module

Generate a self signed root Certificate Authority (CA) certificate.

The certificate can be used to issue locally trusted certificates and to sign CSRs.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Use the d1_common.cert.x509 module to create a local self-signed CA certificate.

d1_util.cert_create_ca.main()
d1_util.cert_create_ca.create_ca(args)
exception d1_util.cert_create_ca.CACreateError

Bases: Exception

d1_util.cert_create_csr module

Generate a Certificate Signing Request (CSR) for a Member Node Client Side Certificate, suitable for submitting to DataONE.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Use the d1_common.cert.x509 module to create a local Certificate Signing Request (CSR).

d1_util.cert_create_csr.main()
d1_util.cert_create_csr.create_csr(args)
exception d1_util.cert_create_csr.CSRCreateError

Bases: Exception

d1_util.cert_sign_csr module

Sign a Certificate Signing Request (CSR).

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Use the d1_common.cert.x509 module to sign a Certificate Signing Request (CSR) using a local CA.

d1_util.cert_sign_csr.main()
d1_util.cert_sign_csr.sign_csr(args)
d1_util.cert_sign_csr.assert_valid_path(p)
exception d1_util.cert_sign_csr.CSRSignError

Bases: Exception

d1_util.check_object_checksums module

Compare Science Object checksums for replicas on CNs and MNs.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Download Science Object checksums from CNs and MNs

d1_util.check_object_checksums.main()
d1_util.check_object_checksums.log_dict(d)

d1_util.check_scimeta_indexing module

d1_util.check_x509_certificate_cn module

Submit a PEM (Base64) encoded X.509 v3 certificate to a CN for validation.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Submit a PEM (Base64) encoded X.509 v3 certificate, optionally containing a DataONE SubjectInfo extension, to a CN to check if it passes validation and to determine which DataONE subjects are authenticated by it.

Notes:

  • This requires the private key of the certificate, and the CN validates the certificate.

  • See check_x509_certificate_local.py for how to process the lists of equivalent identities and group memberships in a DataONE SubjectInfo extension into a list of authenticated DataONE subjects.

d1_util.check_x509_certificate_cn.main()

d1_util.check_x509_certificate_local module

Parse a PEM (Base64) encoded X.509 v3 certificate.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Parse a PEM (Base64) encoded X.509 v3 certificate, optionally containing a DataONE SubjectInfo extension, to determine which DataONE subjects are authenticated by it.

  • Process the lists of equivalent identities and group memberships in a DataONE SubjectInfo extension into a list of authenticated DataONE subjects.

Notes:

  • This does not require the private key of the certificate and does not validate the certificate.

d1_util.check_x509_certificate_local.main()

d1_util.compare_object_lists module

d1_util.create_data_packages module

Create Data Package (Resource Map) on Member Node.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Create Data Packages (Resource Maps) from local files

  • Upload local files and Data Packages to a Member Node

Operation:

Data packages are created from files in a folder provided by the user. Files with the same basename are combined into a package, with the basename being the name of the package.

Example:

The files

myfile.1.txt myfile.2.txt myfile.jpg

would be grouped into a package because they share the same basename. First, each of the files would be uploaded to the Member Node separately. The full filename is used as the PID.

For each file, a system metadata file is generated, based on information from the file and from a set of fixed settings.

Then, a package for all the files is generated. System metadata is generated for the package, and the package is uploaded to the Member Node.
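The grouping rule described above can be sketched with the standard library alone. This is a minimal illustration, not the module's implementation: it assumes that "basename" means the part of the filename before the first dot (which is what makes myfile.1.txt and myfile.jpg land in the same group), and the function names are hypothetical.

```python
import os
from collections import defaultdict


def basename_before_first_dot(file_path):
    """Return the filename up to, but not including, its first dot.

    Assumption: this matches the "basename" used in the example above.
    """
    return os.path.basename(file_path).split(".", 1)[0]


def group_files_by_basename(file_names):
    """Map each package name to the sorted list of files that belong to it."""
    groups = defaultdict(list)
    for name in file_names:
        groups[basename_before_first_dot(name)].append(name)
    return {package: sorted(members) for package, members in groups.items()}


groups = group_files_by_basename(
    ["myfile.1.txt", "myfile.2.txt", "myfile.jpg", "other.csv"]
)
for package_name, members in sorted(groups.items()):
    print(package_name, members)
```

Each key would become the name of one package, and each filename in the group would be used as the PID of the corresponding uploaded object.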

d1_util.create_data_packages.main()
d1_util.create_data_packages.create_science_object_on_member_node(client, file_path)
d1_util.create_data_packages.create_package_on_member_node(client, files_in_group)
d1_util.create_data_packages.create_resource_map_for_pids(package_pid, pids)
d1_util.create_data_packages.generate_system_metadata_for_science_object(pid, format_id, science_object)
d1_util.create_data_packages.generate_sys_meta(pid, format_id, size, md5, now)
d1_util.create_data_packages.generate_public_access_policy()
d1_util.create_data_packages.find_file_groups(directory_path)
d1_util.create_data_packages.find_files_in_group(directory_path, group)
d1_util.create_data_packages.group_name(file_path)
d1_util.create_data_packages.base_name_without_extension(file_path)

d1_util.create_object_on_member_node module

Create Science Object on Member Node.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Upload a local file to a Member Node as a Science Object

Operation:

  • The first time the script is run, a message indicating that the object was successfully created should be displayed, and the object should become available on the Member Node.

  • If the script is then launched again without changing the identifier (PID), an IdentifierNotUnique exception should be returned. This indicates that the identifier is now in use by the previously created object.

  • Any other errors will also be returned as DataONE exceptions.

d1_util.create_object_on_member_node.main()

d1_util.delete_all_objects_of_type module

Delete all Science Objects of specific type from Member Node.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Retrieve a list of all objects with specific FormatID on a Member Node

  • Delete all objects with a specific FormatID from a Member Node

Notes:

  • Do NOT use this script to delete undesired objects from a production Member Node!

  • The objects are deleted with the MNStorage.delete() API method. The API method is intended to be called only by CNs under specific circumstances. In a stand-alone or non-production environment, the API can be used for removing objects from a Member Node.

  • MNStorage.delete() is only available to subjects which have delete permission on the node.

  • To delete all the objects on the node, remove the formatId and replicaStatus parameters in the listObjects() call below.

  • The Member Node object list is retrieved in small sections, called pages. Because removing objects may, depending on the implementation of listObjects(), cause the contents in each page to shift, the entire list of objects to delete is created first and then the deletions are performed in a separate step. This could require a lot of memory if running on a server with a large number of objects. In that case, an alternative implementation is to delete the objects as they are discovered and repeat the process until no more objects to delete are found.

  • The listObjects() Member Node API method may not be efficiently implemented by all Member Nodes as it is intended primarily for use by Coordinating Nodes.
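The paging note above can be illustrated without a real Member Node. The sketch below uses a hypothetical in-memory class, FakeMemberNode, whose list_objects(start, count) only imitates a paged object list; it is not the DataONE client API. It contrasts deleting while paging, which skips objects when pages shift, with collecting the full list first.

```python
class FakeMemberNode:
    """In-memory stand-in for a Member Node (hypothetical, illustration only)."""

    def __init__(self, pids):
        self._pids = list(pids)

    def list_objects(self, start, count):
        """Return one page of identifiers as a new list."""
        return self._pids[start:start + count]

    def delete(self, pid):
        self._pids.remove(pid)

    def total(self):
        return len(self._pids)


def delete_while_paging(node, page_size=3):
    """Naive approach: deletions shift later objects into pages already passed."""
    start = 0
    while True:
        page = node.list_objects(start, page_size)
        if not page:
            break
        for pid in page:
            node.delete(pid)
        start += page_size


def collect_then_delete(node, page_size=3):
    """Build the complete list first, then delete in a separate step."""
    pids = []
    start = 0
    while True:
        page = node.list_objects(start, page_size)
        if not page:
            break
        pids.extend(page)
        start += len(page)
    for pid in pids:
        node.delete(pid)
    return pids


naive_node = FakeMemberNode(f"pid-{i}" for i in range(10))
delete_while_paging(naive_node)

safe_node = FakeMemberNode(f"pid-{i}" for i in range(10))
deleted = collect_then_delete(safe_node)

print("left behind by naive approach:", naive_node.total())
print("left behind by collect-then-delete:", safe_node.total())
```

The cost of the second approach is that the whole identifier list is held in memory, which is the trade-off the note describes.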

Operation:

  • Configure the script in the Config section below

d1_util.delete_all_objects_of_type.main()
class d1_util.delete_all_objects_of_type.MemberNodeObjectDeleter(base_url)

Bases: object

delete_objects_from_member_node()

d1_util.display_node_status module

Display node status.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Retrieve a list of all DataONE Nodes

  • Get and display key metrics for each of the Nodes.

Notes:

  • See the description for the CERTIFICATE setting below for limitations in the information displayed by this script.

Operation:

  • Configure the script in the Config section below

d1_util.display_node_status.main()
d1_util.display_node_status.get_node_list_from_coordinating_node()
d1_util.display_node_status.get_cn_metrics(cn)
d1_util.display_node_status.get_mn_metrics(mn)
d1_util.display_node_status.print_capabilities(client)
d1_util.display_node_status.get_gen_metrics(client, node)
d1_util.display_node_status.get_ping(client)
d1_util.display_node_status.get_number_of_objects(client)
d1_util.display_node_status.get_number_of_log_records(client)
d1_util.display_node_status.is_member_node(node)
d1_util.display_node_status.is_coordinating_node(node)

d1_util.download_all_objects module

Download all Science Objects in a DataONE environment.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Retrieve a list of all DataONE Member Nodes

  • Retrieve a list of all objects of specific FormatID on each of those Member Nodes

  • Retrieve and examine the System Metadata for each of the listed objects

  • Based on information in the System Metadata, determine if the corresponding object should be downloaded

  • Download the corresponding object

Notes:

  • This approach retrieves object lists directly from each Member Node and is mainly suitable in special situations where a 3rd party wishes to examine the overall state of objects in DataONE, for instance, for creating statistics or data quality reports.

  • This approach uses the listObjects() Member Node API method, which has limited filtering facilities. The example shows how to use this filtering to list objects that are of a specific type (FormatID) and that are native to the Member Node (i.e., not replicas). If a more specific set of objects is desired, it is better to use DataONE’s query interface, which offers much richer filtering facilities.

  • It is not possible to filter out non-public objects with listObjects(). Instead, this script attempts to download the object’s System Metadata and checks for NotAuthorized exceptions.

  • If a completely unfiltered object list is required, simply remove the formatId and replicaStatus parameters in the listObjects() call below.

  • The Member Node object list is retrieved in small sections, called pages. The objects on each page are processed before retrieving the next page.

  • The listObjects() Member Node API method may not be efficiently implemented by all Member Nodes as it is intended primarily for use by Coordinating Nodes.

  • The listObjects() method may miss objects that are created while the method is in use.
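Two of the notes above, processing one page at a time and detecting non-public objects through NotAuthorized, can be sketched together. Everything here is a stand-in: the NotAuthorized class is defined locally rather than imported from the DataONE libraries, and fake_list_page and fake_get_sysmeta are hypothetical callables that imitate a node.

```python
class NotAuthorized(Exception):
    """Local stand-in for the DataONE NotAuthorized exception (hypothetical)."""


def iter_pages(list_page, page_size):
    """Yield one page at a time.

    The next page is only requested once the caller has finished with the
    current one, so only a single page is held in memory.
    """
    start = 0
    while True:
        page = list_page(start, page_size)
        if not page:
            return
        yield page
        start += len(page)


def find_readable_pids(list_page, get_sysmeta, page_size=2):
    """Return the PIDs whose System Metadata could be retrieved."""
    readable = []
    for page in iter_pages(list_page, page_size):
        for pid in page:
            try:
                get_sysmeta(pid)
            except NotAuthorized:
                continue  # Treated as non-public; skip the download.
            readable.append(pid)
    return readable


# Fake node contents, for illustration only.
ALL_PIDS = ["a", "b", "c", "d", "e"]
PRIVATE_PIDS = {"b", "e"}


def fake_list_page(start, count):
    return ALL_PIDS[start:start + count]


def fake_get_sysmeta(pid):
    if pid in PRIVATE_PIDS:
        raise NotAuthorized(pid)
    return {"identifier": pid}


result = find_readable_pids(fake_list_page, fake_get_sysmeta)
print(result)
```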

d1_util.download_all_objects.main()
d1_util.download_all_objects.get_node_list_from_coordinating_node()
d1_util.download_all_objects.is_member_node(node)
class d1_util.download_all_objects.MemberNodeObjectDownloader(node)

Bases: object

download_objects_from_member_node()
download_d1_object(pid)

d1_util.download_and_display_data_package module

Download and display Data Package (Resource Map) from Member Node.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Download a Data Package (Resource Map)

  • Parse and display the Resource Map.

Operation:

  • Configure the script in the Config section below

d1_util.download_and_display_data_package.main()

d1_util.download_mn_objects module

Download Science Objects from a Member Node.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Retrieve a list of all objects on a MN

  • Retrieve the bytes and System Metadata for each object

Operation:

  • Configure the script in the Config section below

d1_util.download_mn_objects.main()
class d1_util.download_mn_objects.MemberNodeObjectDownloader(base_url, download_folder, object_id_filter_list=None, max_object_size=None)

Bases: object

download_all()
exception d1_util.download_mn_objects.DownloadError

Bases: Exception

d1_util.download_sciobj module

Download Science Objects from a Member Node or Coordinating Node.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Download a Science Object from a MN or CN.

d1_util.download_sciobj.main()

d1_util.download_server_certs module

Download the issuer CA X.509 certificates for all DataONE nodes.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Download server side certificates from DataONE nodes

  • Parse certificates to find the issuer CA certificate URLs

  • Download the CA certs

Operation:

This process downloads the server side certificates from the DataONE nodes and parses them to find the issuer CA certificate URLs. It then downloads the CA certs.

The CA certs can then be installed as trusted CAs in the local environment in order to ensure that the DataONE client library trusts all server side certs currently in use in DataONE.

To install the CA bundles on Ubuntu and derived distributions, move the files to:

/usr/local/share/ca-certificates

Then run:

update-ca-certificates

update-ca-certificates only processes “.crt” files, so this script saves the certificates with that extension.
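A minimal sketch of the saving step, assuming the certificate is available as DER bytes and should be written in PEM form with a ".crt" extension. The function name and the way the node ID is turned into a filename are hypothetical; only ssl.DER_cert_to_PEM_cert() is a real standard library call. The bytes used in the demonstration are a placeholder, not a real certificate.

```python
import os
import ssl
import tempfile


def save_cert_as_crt(der_bytes, node_id, download_dir_path):
    """Write DER bytes as PEM text to <download_dir_path>/<safe node id>.crt."""
    # Replace characters such as ":" so the node ID is usable as a filename.
    safe_name = "".join(c if c.isalnum() else "_" for c in node_id)
    crt_path = os.path.join(download_dir_path, safe_name + ".crt")
    with open(crt_path, "w", encoding="ascii") as f:
        f.write(ssl.DER_cert_to_PEM_cert(der_bytes))
    return crt_path


with tempfile.TemporaryDirectory() as tmp_dir:
    # Placeholder bytes standing in for a downloaded certificate.
    crt_path = save_cert_as_crt(b"\x30\x03\x02\x01\x00", "urn:node:EXAMPLE", tmp_dir)
    crt_name = os.path.basename(crt_path)
    with open(crt_path, encoding="ascii") as f:
        pem_text = f.read()

print(crt_name)
```

In real use, download_dir_path would be a directory whose contents are later moved to /usr/local/share/ca-certificates.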

d1_util.download_server_certs.main()
d1_util.download_server_certs.download_server_cert(base_url, node_id, download_dir_path)

d1_util.download_sysmeta module

Download and display Science Metadata.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Download Science Metadata from a MN or CN.

  • Format the Science Metadata XML document for display or save to disk.

d1_util.download_sysmeta.main()

d1_util.download_sysmeta_multiproc module

Bulk download System Metadata objects from MN.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Use the multiprocessed System Metadata iterator to efficiently perform bulk downloads of System Metadata from a Member Node

d1_util.download_sysmeta_multiproc.main()
d1_util.download_sysmeta_multiproc.parse_cmd_line()
exception d1_util.download_sysmeta_multiproc.SysMetaRetrieveError

Bases: Exception

d1_util.download_test_docs module

Download randomly selected Science Metadata objects from CN.

This is an example on how to use the DataONE Science Metadata library for Python. It shows how to:

  • Query the DataONE Solr index for a random selection of object identifiers for a given formatId.

  • Download objects with high bandwidth throughput by using the async DataONEClient to perform concurrent downloads.

async d1_util.download_test_docs.main()
async d1_util.download_test_docs.validate_bulk(client, out_dir_path, format_id, pid_count, solr_client, progress_logger, task_name)
async d1_util.download_test_docs.download(client, out_dir_path, pid, format_id, progress_logger)
async d1_util.download_test_docs.save_xml(out_dir_path, pid, sciobj_f)
async d1_util.download_test_docs.get_random_pid_list(solr_client, format_id, pid_count)

Query Solr for a list of randomly selected PIDs of objects with a given formatId.

d1_util.find_gmn_instances module

d1_util.generate_data_package_from_stream module

d1_util.generate_object_stats module

Generate statistics for Science Objects on a given set of Member Nodes.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Aggregate values from System Metadata on a set of Member Nodes

Operation:

  • Configure the script in the Config section below

d1_util.generate_object_stats.main()
d1_util.generate_object_stats.find_object_size_stats_node_all(gmn_node_list)
d1_util.generate_object_stats.find_object_size_stats_node(gmn_dict)
d1_util.generate_object_stats.max_size_sysmeta_list(sysmeta_pyxb_list, sysmeta_pyxb, max_size=10)
d1_util.generate_object_stats.log_dict(d)

d1_util.jwt_token_tasks module

Perform various operations on JSON Web Tokens (JWTs).

This is an example on how to use the DataONE Client and Common libraries for Python.

d1_util.jwt_token_tasks.main()
d1_util.jwt_token_tasks.validate_and_decode(jwt_bu64, cert_obj)

Example for validating the signature of a JWT using only the cryptography library.

Note that this does NOT validate the claims in the claim set.
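To show what a signature check operates on, the sketch below splits a compact JWT into its three base64url segments using only the standard library. It does not verify anything and is not the function documented above; the helper names and the example token are hypothetical. A verifier would check the returned signature against signed_bytes using the public key of the certificate.

```python
import base64
import json


def b64url_decode(segment):
    """Decode a base64url segment, restoring the padding that JWTs omit."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(data):
    """Encode bytes as unpadded base64url, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def split_jwt(jwt_str):
    """Return (header dict, claim set dict, signed bytes, signature bytes).

    No signature or claim validation is performed here.
    """
    header_b64, payload_b64, signature_b64 = jwt_str.split(".")
    header = json.loads(b64url_decode(header_b64))
    claims = json.loads(b64url_decode(payload_b64))
    # The signature covers the first two segments exactly as transmitted.
    signed_bytes = (header_b64 + "." + payload_b64).encode("ascii")
    return header, claims, signed_bytes, b64url_decode(signature_b64)


# Hand-built example token with a placeholder signature.
example_jwt = ".".join(
    [
        b64url_encode(json.dumps({"alg": "RS256", "typ": "JWT"}).encode("utf-8")),
        b64url_encode(json.dumps({"sub": "CN=Example User", "exp": 0}).encode("utf-8")),
        b64url_encode(b"not-a-real-signature"),
    ]
)

header, claims, signed_bytes, signature = split_jwt(example_jwt)
print(header["alg"], claims["sub"], len(signature))
```

Because the claim set is decoded separately from the signature check, claims such as the expiration time still have to be validated in their own step.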

d1_util.jwt_token_tasks.find_valid_combinations(cert_file_name_list, jwt_file_name_list)

Given a list of cert and JWT file names, print a list showing each combination along with indicators for combinations where the JWT signature was successfully validated with the cert.

d1_util.jwt_token_tasks.download_cn_certs()
d1_util.jwt_token_tasks.jwt_cleanup()
d1_util.jwt_token_tasks.cert_cleanup()
d1_util.jwt_token_tasks.filename_from_cert_obj(cert_obj)

d1_util.parse_format_id_list module

Parse ObjectFormatList XML doc with XPath.

This is an example on how to use the DataONE Client and Common libraries for Python. It shows how to:

  • Extract formatIds from a DataONE ObjectFormatList using XPath
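As a rough illustration of the XPath approach, the sketch below uses the limited XPath support in xml.etree.ElementTree. The sample document is a simplified, hand-written stand-in for an ObjectFormatList: it omits the XML namespace and most fields of a real DataONE response, so the paths would need adjusting for real data.

```python
import xml.etree.ElementTree as ET

# Simplified sample; a real ObjectFormatList is namespaced and has more fields.
SAMPLE_XML = """\
<objectFormatList count="3" start="0" total="3">
  <objectFormat>
    <formatId>eml://ecoinformatics.org/eml-2.1.1</formatId>
    <formatName>Ecological Metadata Language, version 2.1.1</formatName>
    <formatType>METADATA</formatType>
  </objectFormat>
  <objectFormat>
    <formatId>text/csv</formatId>
    <formatName>Comma Separated Values Text</formatName>
    <formatType>DATA</formatType>
  </objectFormat>
  <objectFormat>
    <formatId>http://www.openarchives.org/ore/terms</formatId>
    <formatName>Object Reuse and Exchange Vocabulary</formatName>
    <formatType>RESOURCE</formatType>
  </objectFormat>
</objectFormatList>
"""

root = ET.fromstring(SAMPLE_XML)

# Every formatId in the list.
all_format_ids = [el.text for el in root.findall("./objectFormat/formatId")]

# Only the formatIds of entries whose formatType is METADATA.
metadata_format_ids = [
    el.text
    for el in root.findall("./objectFormat[formatType='METADATA']/formatId")
]

print(all_format_ids)
print(metadata_format_ids)
```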

d1_util.parse_format_id_list.main()
d1_util.parse_format_id_list.get_scimeta_format_id_list(xsd_path)

d1_util.resolve_package_identifier module

Resolve an OAI-ORE Resource Map (data package) identifier to download URL for a BagIt ZIP archive of the package.

This is an example on how to use the DataONE Client and Common libraries for Python.

d1_util.resolve_package_identifier.main()

d1_util.solr_query module

Solr query.

This is an example on how to use the DataONE Client Library for Python. It shows how to:

  • Query DataONE’s Solr index

  • Display the results

d1_util.solr_query.main()

d1_util.validate_system_metadata module

d1_util.xml_apply_xslt module

Apply XSLT transform to XML document.

This is an example on how to use the DataONE Science Metadata library for Python. It shows how to:

  • Deserialize, process and serialize XML docs.

  • Apply an XSLT transform.

  • Display or save the resulting XML doc.

d1_util.xml_apply_xslt.main()
exception d1_util.xml_apply_xslt.ResolveError

Bases: Exception

d1_util.xml_remove_empty_elements module

Remove empty elements from XML that might interfere with XSD schema validation.

Overall formatting is maintained.

This is an example on how to use the DataONE Science Metadata library for Python. It shows how to:

  • Deserialize, process and serialize XML docs.

  • Apply an XSLT transform which removes empty elements from XML.

  • Display or save the resulting XML doc.

d1_util.xml_remove_empty_elements.main()
exception d1_util.xml_remove_empty_elements.ResolveError

Bases: Exception

d1_util.xml_strip_whitespace module

Strip whitespace that might interfere with XSD schema validation.

Overall formatting is maintained. Note that pretty printing the doc is likely to add the stripped whitespace back in.

This is an example on how to use the DataONE Science Metadata library for Python. It shows how to:

  • Deserialize, process and serialize XML docs.

  • Apply an XSLT transform which strips potentially problematic whitespace.

  • Display or save the resulting XML doc.

d1_util.xml_strip_whitespace.main()
exception d1_util.xml_strip_whitespace.ResolveError

Bases: Exception