d1_scimeta package

DataONE Science Metadata Library.

Although this directory is not a package, this __init__.py file is required so that pytest can reach the test directories below it.

Submodules

d1_scimeta.gen_root_xsd module

Generate preliminary root XSD docs for each formatId that is supported for validation.

The XSD docs must be further edited by hand so that they correctly import all the namespaces required for validating each formatId.

d1_scimeta.gen_root_xsd.main()
d1_scimeta.gen_root_xsd.gen_target_ns_to_rel_xsd_path_tup(branch_path, root_xsd_path)
d1_scimeta.gen_root_xsd.gen_xsd_tree(ns_xsd_path_tup)
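A root XSD of this kind is essentially an xs:schema element with one xs:import per namespace. The following is a minimal sketch of that shape using the standard library's xml.etree.ElementTree instead of lxml. The function name build_root_xsd, the input shape (a tuple of (namespace, relative XSD path) pairs, modeled on the name gen_target_ns_to_rel_xsd_path_tup()) and the example paths are assumptions for illustration, not the module's actual code.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"


def build_root_xsd(ns_xsd_path_tup):
    """Hypothetical sketch: build a root XSD that imports each namespace.

    ns_xsd_path_tup: iterable of (target_namespace, relative_xsd_path) pairs.
    Returns the serialized XSD as a str.
    """
    # Serialize the XML Schema namespace with the conventional "xs" prefix.
    ET.register_namespace("xs", XS_NS)
    schema_el = ET.Element(f"{{{XS_NS}}}schema")
    for ns, rel_xsd_path in ns_xsd_path_tup:
        # One xs:import per namespace; schemaLocation is relative to the root XSD.
        ET.SubElement(
            schema_el,
            f"{{{XS_NS}}}import",
            {"namespace": ns, "schemaLocation": rel_xsd_path},
        )
    return ET.tostring(schema_el, encoding="unicode")


# Hypothetical example paths.
print(build_root_xsd((
    ("http://www.isotc211.org/2005/gmd", "gmd/gmd.xsd"),
    ("http://www.isotc211.org/2005/gmi", "gmi/gmi.xsd"),
)))
```

As noted above, a generated file like this is only preliminary and still needs manual review.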

d1_scimeta.lxml_validate module

d1_scimeta.lxml_validate.main()

d1_scimeta.schema_prepare module

d1_scimeta.schema_resolve module

Determine which schema locations will be accessed when validating a given XML doc.

Recursively follows the schemaLocation of xs:include and xs:import elements and issues warnings if XSD docs are missing, invalid, or require network access.

Intended for troubleshooting of validation issues.

d1_scimeta.schema_resolve.main()
d1_scimeta.schema_resolve.resolve_schemas(xsd_uri_tup, schema_branch_path, visited_xsd_uri_set=None, indent=1)
d1_scimeta.schema_resolve.load_schema(xsd_uri)
d1_scimeta.schema_resolve.download_schema(xsd_url)
exception d1_scimeta.schema_resolve.ResolveError

Bases: Exception
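The recursive walk described for this module can be sketched as a depth-first traversal with a visited set, which also guards against include cycles. To stay self-contained, this sketch reads XSD docs from an in-memory dict keyed by URI instead of from disk or the network (standing in for load_schema() and download_schema()). The function name resolve_sketch and the warning strings are hypothetical.

```python
import urllib.parse
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def resolve_sketch(xsd_uri, xsd_docs, visited=None, indent=1, warnings=None):
    """Hypothetical sketch: follow xs:include/xs:import schemaLocations.

    xsd_docs: dict mapping XSD URI -> XSD text (stands in for loading).
    Returns (set of visited URIs, list of warning strings).
    """
    visited = set() if visited is None else visited
    warnings = [] if warnings is None else warnings
    if xsd_uri in visited:
        return visited, warnings  # Already processed; stops cycles.
    visited.add(xsd_uri)
    print("  " * indent + xsd_uri)
    if urllib.parse.urlparse(xsd_uri).scheme in ("http", "https"):
        warnings.append(f"Requires network access: {xsd_uri}")
    if xsd_uri not in xsd_docs:
        warnings.append(f"Missing XSD: {xsd_uri}")
        return visited, warnings
    try:
        xsd_root = ET.fromstring(xsd_docs[xsd_uri])
    except ET.ParseError as e:
        warnings.append(f"Invalid XSD: {xsd_uri}: {e}")
        return visited, warnings
    # xs:include and xs:import are top-level children of xs:schema.
    for el in xsd_root:
        if el.tag in (XS + "include", XS + "import") and el.get("schemaLocation"):
            # Resolve the location relative to the XSD that references it.
            child_uri = urllib.parse.urljoin(xsd_uri, el.get("schemaLocation"))
            resolve_sketch(child_uri, xsd_docs, visited, indent + 1, warnings)
    return visited, warnings
```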

d1_scimeta.util module

d1_scimeta.util.get_xsi_schema_location_tup(xml_tree)

Extract xsi:schemaLocation from the root of an XML doc.

The root schemaLocation consists of (namespace, uri) pairs, stored as a whitespace-separated list of strings, that designate the XSD namespaces and schema locations required for validation.

For the schemaLocation attribute of xs:include and xs:import elements in XSD docs, see get_xs_include_xs_import_schema_location_tup().

Parameters

xml_tree

Returns

tuple of (namespace, uri) 2-tuples.

Examples

xsi:schemaLocation="
  http://www.isotc211.org/2005/gmi http://files.axds.co/isobio/gmi/gmi.xsd
  http://www.isotc211.org/2005/gmd http://files.axds.co/isobio/gmd/gmd.xsd
">

->

(
  ('http://www.isotc211.org/2005/gmi', 'http://files.axds.co/isobio/gmi/gmi.xsd'),
  ('http://www.isotc211.org/2005/gmd', 'http://files.axds.co/isobio/gmd/gmd.xsd'),
)
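Splitting the attribute into pairs can be sketched as follows, using xml.etree.ElementTree rather than lxml. The function name parse_xsi_schema_location is hypothetical; it only illustrates the pairing described above, not the library's implementation.

```python
import xml.etree.ElementTree as ET

XSI_SCHEMA_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"


def parse_xsi_schema_location(xml_root):
    """Hypothetical sketch: xsi:schemaLocation as a tuple of (ns, uri) 2-tuples."""
    loc_str = xml_root.get(XSI_SCHEMA_LOCATION, "")
    # The value is a whitespace-separated list: ns1 uri1 ns2 uri2 ...
    tokens = loc_str.split()
    if len(tokens) % 2:
        raise ValueError("xsi:schemaLocation must contain (namespace, uri) pairs")
    return tuple(zip(tokens[0::2], tokens[1::2]))


doc = """<gmi:MI_Metadata
  xmlns:gmi="http://www.isotc211.org/2005/gmi"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="
    http://www.isotc211.org/2005/gmi http://files.axds.co/isobio/gmi/gmi.xsd
    http://www.isotc211.org/2005/gmd http://files.axds.co/isobio/gmd/gmd.xsd
  "/>"""
print(parse_xsi_schema_location(ET.fromstring(doc)))
```

A document without xsi:schemaLocation yields an empty tuple in this sketch.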

d1_scimeta.util.get_xs_include_xs_import_schema_location_tup(xsd_tree)

Extract the schemaLocation attribute of the xs:include and xs:import elements in an XSD doc.

Each schemaLocation consists of a single URI.

For the xsi:schemaLocation in the root of XML docs, see get_xsi_schema_location_tup().
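A minimal ElementTree sketch of this extraction follows. The function name list_include_import_locations is hypothetical. Since schemaLocation is optional on xs:import, the sketch skips imports that omit it.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def list_include_import_locations(xsd_root):
    """Hypothetical sketch: schemaLocation of each xs:include/xs:import.

    Only direct children of xs:schema are checked, which is where XSD
    allows these elements to appear.
    """
    return tuple(
        el.get("schemaLocation")
        for el in xsd_root
        if el.tag in (XS + "include", XS + "import") and el.get("schemaLocation")
    )
```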

d1_scimeta.util.get_root_ns(xml_tree)

Extract the root namespace for the XML doc.

Returns

The namespace URI that is bound to the root element's prefix by an xmlns declaration on the root element.

Return type

str

Examples

<xs:schema targetNamespace="http://www.w3.org/XML/1998/namespace" xmlns:xs="http://www.w3.org/2001/XMLSchema">

-> http://www.w3.org/2001/XMLSchema

d1_scimeta.util.get_target_ns(xml_tree)

Extract the target namespace for the XML doc.

Returns

Extracted from the targetNamespace attribute of the root element.

If the root element does not have a targetNamespace attribute, an empty string ("") is returned.

Return type

str

Examples

<xs:schema targetNamespace="http://www.w3.org/XML/1998/namespace" xmlns:xs="http://www.w3.org/2001/XMLSchema">

-> http://www.w3.org/XML/1998/namespace
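Both get_root_ns() and get_target_ns() can be sketched with xml.etree.ElementTree, which resolves prefixes during parsing and stores the root namespace in the tag itself. The function names root_ns_sketch and target_ns_sketch are hypothetical.

```python
import xml.etree.ElementTree as ET


def root_ns_sketch(xml_root):
    """Hypothetical: namespace URI of the root element, or "" if it has none."""
    # ElementTree has already resolved the prefix; the tag is in Clark
    # notation, e.g. "{http://www.w3.org/2001/XMLSchema}schema".
    tag = xml_root.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def target_ns_sketch(xml_root):
    """Hypothetical: targetNamespace attribute of the root element, or ""."""
    return xml_root.get("targetNamespace", "")


xsd_root = ET.fromstring(
    '<xs:schema targetNamespace="http://www.w3.org/XML/1998/namespace"'
    ' xmlns:xs="http://www.w3.org/2001/XMLSchema"/>'
)
print(root_ns_sketch(xsd_root))    # http://www.w3.org/2001/XMLSchema
print(target_ns_sketch(xsd_root))  # http://www.w3.org/XML/1998/namespace
```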

d1_scimeta.util.get_abs_root_xsd_path(format_id)

Get the absolute path to the root XSD for a formatId.

Returns

Path to the pre-generated XSD that should import all XSDs required for validating any XML doc of the given formatId.

E.g.:

format_id = http://www.isotc211.org/2005/gmd -> /d1_scimeta/ext/isotc211.xsd

Return type

str

d1_scimeta.util.get_schema_name(format_id)

Get the name of the schema directory for a formatId.

Returns

The name (not path) of the root directory for the XSD files for a given formatId.

This is also the basename of the root XSD file for the formatId.

E.g.:

format_id = http://www.isotc211.org/2005/gmd -> isotc211

Return type

str

d1_scimeta.util.get_supported_format_id_list()

Get the list of formatIds that are supported by the validator.

Returns

List of the formatId strings that can be passed to the validate*() functions.

Return type

list of str

d1_scimeta.util.is_installed_scimeta_format_id(format_id)

Return True if validation is supported for format_id.

d1_scimeta.util.gen_abs_xsd_path_list(branch_path)

Generate a list of absolute paths to the XSD files under branch_path.

Excludes *.ORIGINAL.* files, which are inferred from their .xsd counterparts.
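A directory walk that produces such a list can be sketched with os.walk(). The function name list_xsd_paths is hypothetical, and the exclusion rule (skip any file name containing ".ORIGINAL.") is an assumption based on the pattern above. Note that a name like gmd.ORIGINAL.xsd also ends in .xsd, so the explicit exclusion is needed.

```python
import os


def list_xsd_paths(branch_path):
    """Hypothetical sketch: sorted absolute paths of *.xsd files under branch_path.

    Files whose names contain ".ORIGINAL." are skipped (assumed rule).
    """
    xsd_path_list = []
    for dir_path, _dir_names, file_names in os.walk(branch_path):
        for file_name in file_names:
            if file_name.endswith(".xsd") and ".ORIGINAL." not in file_name:
                xsd_path_list.append(os.path.abspath(os.path.join(dir_path, file_name)))
    return sorted(xsd_path_list)
```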

d1_scimeta.util.get_abs_schema_branch_path(format_id)

Get the absolute path to the branch holding all the XSD files for a single formatId.

The returned path always has a trailing slash.

Returns

Absolute path to the schema branch for the formatId.

E.g.:

format_id = http://www.isotc211.org/2005/gmd -> /schema/lib_scimeta/src/d1_scimeta/schema/isotc211/

Return type

str

d1_scimeta.util.gen_xsd_name_dict(branch_path, xsd_path_list)

Generate a dict that maps XSD names to the absolute paths of the XSD files.

The key is the part of the XSD path that follows branch_path.

E.g.:

path = /schema/isotc211/gmd/applicationSchema.xsd
-> key = /gmd/applicationSchema.xsd
   val = /schema/isotc211/gmd/applicationSchema.xsd

d1_scimeta.util.gen_rel_xsd_path(branch_path, xsd_path)

Generate the part of the XSD path that follows branch_path.

Parameters
  • branch_path – str Absolute path to a branch holding all the XSD files for a single formatId.

  • xsd_path – str Absolute path to an XSD file under the branch_path.

Returns

str

E.g.:

branch_path = /schema/isotc211/
xsd_path = /schema/isotc211/gmd/applicationSchema.xsd
-> gmd/applicationSchema.xsd

Return type

str
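The same result can be sketched with os.path.relpath(). The function name rel_xsd_path_sketch is hypothetical, as is the choice to raise ValueError when xsd_path is not under branch_path.

```python
import os


def rel_xsd_path_sketch(branch_path, xsd_path):
    """Hypothetical sketch: the part of xsd_path that follows branch_path."""
    rel_path = os.path.relpath(xsd_path, branch_path)
    # A result that climbs out of branch_path means xsd_path is not under it.
    if rel_path == os.pardir or rel_path.startswith(os.pardir + os.sep):
        raise ValueError(f"{xsd_path} is not under {branch_path}")
    return rel_path
```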

d1_scimeta.util.get_rel_path(parent_xsd_path, child_xsd_path)

Generate a relative path suitable for use as a schemaLocation URI.

Parameters
  • parent_xsd_path – str Abs path to XSD file that has the schemaLocation.

  • child_xsd_path – str Abs path to XSD file that the schemaLocation should be rewritten to.

Returns

Relative path

E.g.:

parent = /schema/isotc211/gmd/maintenance.xsd
child = /schema/isotc211/gmd/citation.xsd
-> ../gmd/citation.xsd

Return type

str
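One way to compute such a path is os.path.relpath() against the directory of the parent XSD, converted to forward slashes because schemaLocation is a URI reference. The function name rel_schema_location is hypothetical. It returns the shortest relative path, so for the example above it yields citation.xsd, which resolves to the same file as ../gmd/citation.xsd.

```python
import os


def rel_schema_location(parent_xsd_path, child_xsd_path):
    """Hypothetical sketch: child path relative to the parent XSD's directory."""
    rel_path = os.path.relpath(child_xsd_path, os.path.dirname(parent_xsd_path))
    # schemaLocation is a URI reference, so always use forward slashes.
    return rel_path.replace(os.sep, "/")
```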

d1_scimeta.util.gen_abs_uri(abs_url_or_path, rel_path)

Create an absolute URL or local filesystem path given at least one absolute component.

Parameters
  • abs_url_or_path – str URL or absolute filesystem path

  • rel_path – str URL or relative filesystem path

Returns

Absolute URL or filesystem path.
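A minimal sketch of this resolution uses urllib.parse.urljoin() for URLs and os.path for local paths. The function name abs_uri_sketch is hypothetical. It also assumes that abs_url_or_path is the location of the referencing document itself (not its directory) and that only http and https count as URLs.

```python
import os
import urllib.parse


def abs_uri_sketch(abs_url_or_path, rel_path):
    """Hypothetical sketch: resolve rel_path against a referencing document.

    Assumes abs_url_or_path is the URL or absolute path of the document
    that contains the reference, not its directory.
    """
    if urllib.parse.urlparse(rel_path).scheme in ("http", "https"):
        return rel_path  # Already an absolute URL.
    if urllib.parse.urlparse(abs_url_or_path).scheme in ("http", "https"):
        return urllib.parse.urljoin(abs_url_or_path, rel_path)
    # Local file: resolve relative to the document's directory.
    return os.path.normpath(os.path.join(os.path.dirname(abs_url_or_path), rel_path))
```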

d1_scimeta.util.get_xsd_path(xsd_name_dict, uri)

Get the absolute path to the XSD whose key in xsd_name_dict matches the end of the URI.

Works for file paths, URLs and URIs. E.g.:

http://www.w3.org/2001/xml.xsd -> key xml.xsd
xml.xsd -> schema/_cache/http_www.w3.org_2001__xml.xsd
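The suffix lookup can be sketched as follows. The function name find_xsd_path is hypothetical, and two rules are assumptions: a key must match at a "/" boundary (so myxml.xsd does not match xml.xsd), and the longest matching key wins when several keys match.

```python
def find_xsd_path(xsd_name_dict, uri):
    """Hypothetical sketch: path of the XSD whose key matches the end of uri."""

    def matches(key):
        key = key.lstrip("/")  # Keys may start with "/".
        return uri == key or uri.endswith("/" + key)

    matching_keys = [key for key in xsd_name_dict if matches(key)]
    if not matching_keys:
        raise KeyError(f"No XSD matches URI: {uri}")
    # Prefer the most specific (longest) key.
    return xsd_name_dict[max(matching_keys, key=len)]
```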

d1_scimeta.util.load_xml_file_to_tree(xml_path)
d1_scimeta.util.parse_xml_bytes(xml_bytes, xml_path)

Parse XML bytes to tree.

Passing in the path to the file enables relative imports to work.

d1_scimeta.util.get_error_log_as_str(lxml_obj)

Create a basic message with results from the last XMLParser() or lxml.etree.XMLSchema() run.

lxml.etree.XMLParser(), lxml.etree.XMLSchema() and some exception objects, such as lxml.etree.XMLSchemaParseError(), have an error_log attribute that contains a list of errors and warnings from the most recent run.

Each error entry in the list has the following attributes:

  • message: the message text
  • domain: the domain ID (see the lxml.etree.ErrorDomains class)
  • type: the message type ID (see the lxml.etree.ErrorTypes class)
  • level: the log level ID (see the lxml.etree.ErrorLevels class)
  • line: the line at which the message originated (if applicable)
  • column: the character column at which the message originated (if applicable)
  • filename: the name of the file in which the message originated (if applicable)

For convenience, there are also three properties that provide readable names for the ID values:

  • domain_name
  • type_name
  • level_name

Parameters

lxml_obj – lxml.etree.XMLParser() or lxml.etree.XMLSchema()

Returns

Selected elements from the error_log of the lxml_obj.

Return type

str
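A formatter of this kind can be sketched using only the entry attributes listed above. The function name format_error_log and the output layout are hypothetical. Because it relies only on those attributes, it works on any object whose error_log yields such entries; in real use you might pass an lxml.etree.XMLSchema object after its validate() call has returned False.

```python
def format_error_log(lxml_obj):
    """Hypothetical sketch: one "line:column: LEVEL: message" line per entry.

    Reads only the line, column, level_name and message attributes of the
    entries in lxml_obj.error_log.
    """
    return "\n".join(
        f"{entry.line}:{entry.column}: {entry.level_name}: {entry.message}"
        for entry in lxml_obj.error_log
    )
```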

d1_scimeta.util.load_bytes_from_file(xml_path)
d1_scimeta.util.save_tree_to_file(xml_tree, xml_path)

Write pretty formatted XML tree to file.

d1_scimeta.util.save_bytes_to_file(xml_path, xml_bytes)

Write bytes to file.

d1_scimeta.util.dump_pretty_tree(xml_tree, msg_str='XML Tree', logger=<bound method Logger.debug of <Logger d1_scimeta.util (DEBUG)>>)
d1_scimeta.util.dump(o, msg_str='Object dump')
d1_scimeta.util.pretty_format_tree(xml_tree)
d1_scimeta.util.is_valid_xml_file(xml_path)
d1_scimeta.util.is_valid_xsd_file(xsd_path)
d1_scimeta.util.is_url(s)

Return True if s is a URL.

d1_scimeta.util.strip_whitespace(xml_tree)

Strip whitespace that might interfere with XSD validation, while maintaining the overall formatting.

E.g., whitespace around the value of a gco:DateTime element causes validation to fail:

<gco:DateTime>

2011-03-18T15:39:17Z

</gco:DateTime>

This changes it to:

<gco:DateTime>2011-03-18T15:39:17Z</gco:DateTime>

Parameters

xml_tree

Returns

stripped xml_tree
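A minimal ElementTree sketch of this idea strips only the text of leaf elements, so the indentation held in parent text and in tails survives. The function name strip_leaf_whitespace is hypothetical. It does not honor xml:space="preserve" or other cases where whitespace in a leaf is significant.

```python
import xml.etree.ElementTree as ET


def strip_leaf_whitespace(xml_root):
    """Hypothetical sketch: strip leading/trailing whitespace in leaf text.

    Only .text of elements without children is changed; .tail values and the
    .text of parent elements, which hold the indentation, are left as-is.
    """
    for el in xml_root.iter():
        if len(el) == 0 and el.text is not None:
            el.text = el.text.strip()
    return xml_root


doc = '<gco:DateTime xmlns:gco="http://www.isotc211.org/2005/gco">\n  2011-03-18T15:39:17Z\n</gco:DateTime>'
print(strip_leaf_whitespace(ET.fromstring(doc)).text)  # 2011-03-18T15:39:17Z
```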

d1_scimeta.util.remove_empty_elements(xml_tree)

Remove empty elements that might interfere with XSD validation, while maintaining the overall formatting.

Parameters

xml_tree

Returns

stripped xml_tree
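Removal can be sketched as a bottom-up recursion, so that a parent that becomes empty once its children are removed is removed as well. The function name remove_empty_elements_sketch and the definition of "empty" (no children, no attributes, no non-whitespace text) are assumptions. Keeping elements that carry attributes preserves, for example, elements whose only content is an attribute such as gco:nilReason.

```python
import xml.etree.ElementTree as ET


def remove_empty_elements_sketch(elem):
    """Hypothetical sketch: recursively remove empty descendants of elem.

    "Empty" is assumed to mean: no children, no attributes and no
    non-whitespace text. The root element itself is never removed.
    Note: a removed element's .tail is discarded too, which is fine for
    indentation whitespace but would drop mixed-content text.
    """
    for child in list(elem):  # Iterate over a copy; we remove while looping.
        remove_empty_elements_sketch(child)
        if len(child) == 0 and not child.attrib and not (child.text or "").strip():
            elem.remove(child)
    return elem


root = ET.fromstring("<a><b/><c>x</c><d><e/></d></a>")
print(ET.tostring(remove_empty_elements_sketch(root)))  # b'<a><c>x</c></a>'
```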

d1_scimeta.util.apply_xslt_transform(xml_tree, xslt_path)
d1_scimeta.util.create_lxml_obj(xml_tree, lxml_obj_class)

Create an object from an lxml class whose constructor takes a tree as its argument.

Parameters
  • xml_tree – lxml.etree tree

  • lxml_obj_class – lxml class to instantiate, e.g.:

    lxml.etree.XMLSchema
    lxml.etree.XSLT

exception d1_scimeta.util.SciMetaError

Bases: Exception

d1_scimeta.validate module

Validate Science Metadata.

Usage:

    import logging

    import d1_scimeta.util
    import d1_scimeta.validate

    log = logging.getLogger(__name__)

    try:
        d1_scimeta.validate.assert_valid(format_id, xml)
    except d1_scimeta.util.SciMetaError as e:
        log.error(e)

d1_scimeta.validate.assert_valid(format_id, xml)

Validate a Science Metadata XML doc.

Parameters
  • format_id – str DataONE formatId. Must be one of the keys from the format_id_to_schema.json document. E.g., http://www.isotc211.org/2005/gmd.

  • xml – str, bytes or tree

    str: Path to XML file to validate.
    bytes: UTF-8 encoded bytes of XML doc to validate.
    tree: lxml.etree of XML doc to validate.

Raises

d1_scimeta.util.SciMetaError – On validation error.

Returns

None on successful validation.

Return type

None
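Since xml may be given in three forms, a validator first has to normalize it to a tree. That step can be sketched as follows, using xml.etree.ElementTree in place of lxml. The function name to_tree is hypothetical, and treating any non-str, non-bytes value as an already-parsed tree is an assumption.

```python
import xml.etree.ElementTree as ET


def to_tree(xml):
    """Hypothetical sketch: normalize the accepted forms of `xml` to a tree.

    str   -> path to an XML file
    bytes -> UTF-8 encoded XML doc
    other -> assumed to already be a parsed tree, returned unchanged
    """
    if isinstance(xml, str):
        return ET.parse(xml)
    if isinstance(xml, bytes):
        return ET.ElementTree(ET.fromstring(xml))
    return xml
```

Passing bytes rather than a path loses the file location, which is why validate_bytes() accepts an optional xml_path for resolving relative references.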

d1_scimeta.validate.validate_tree(format_id, xml_tree)
d1_scimeta.validate.validate_bytes(format_id, xml_bytes, xml_path=None)
d1_scimeta.validate.validate_path(format_id, xml_path)
d1_scimeta.validate.apply_xerces_adaption_schema_transform(root_xsd_path, xml_tree)