d1_scimeta package¶
DataONE Science Metadata Library.
Although this directory is not a package, this __init__.py file is required for pytest to be able to reach test directories below this directory.
Subpackages¶
Submodules¶
d1_scimeta.gen_root_xsd module¶
Generate preliminary root XSD docs for each formatId that is supported for validation.
The XSD docs must be further edited by hand to correctly import all namespaces required for validation of each given formatId.
-
d1_scimeta.gen_root_xsd.
main
()¶
-
d1_scimeta.gen_root_xsd.
gen_target_ns_to_rel_xsd_path_tup
(branch_path, root_xsd_path)¶
-
d1_scimeta.gen_root_xsd.
gen_xsd_tree
(ns_xsd_path_tup)¶
d1_scimeta.schema_prepare module¶
d1_scimeta.schema_resolve module¶
Determine which schema locations will be accessed when validating a given XML doc.
Recursively follows xs:include and xs:import schemaLocation references and issues warnings if XSD docs are missing, invalid or require network access.
Intended for troubleshooting of validation issues.
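The recursive walk described above can be illustrated with a minimal sketch. It is not the module's implementation: the function name `sketch_resolve_schemas`, the in-memory `xsd_docs` dict standing in for the filesystem, and the `urn:example:b` namespace are all assumptions made for illustration, and the standard library `xml.etree.ElementTree` is used in place of lxml.

```python
import posixpath
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"
REF_TAGS = ("{%s}include" % XS_NS, "{%s}import" % XS_NS)


def sketch_resolve_schemas(xsd_path, xsd_docs, visited, missing):
    """Depth-first walk over schemaLocation references starting at xsd_path."""
    if xsd_path in visited:
        return  # Already handled; this also breaks include/import cycles.
    visited.add(xsd_path)
    if xsd_path not in xsd_docs:
        # schema_resolve is documented to warn about missing XSD docs;
        # this sketch only records the path.
        missing.append(xsd_path)
        return
    root = ET.fromstring(xsd_docs[xsd_path])
    for child in root:
        loc = child.get("schemaLocation") if child.tag in REF_TAGS else None
        if loc:
            # Relative locations resolve against the referencing document.
            base_dir = posixpath.dirname(xsd_path)
            child_path = posixpath.normpath(posixpath.join(base_dir, loc))
            sketch_resolve_schemas(child_path, xsd_docs, visited, missing)


# Hypothetical in-memory "filesystem" of XSD docs, for illustration only.
xsd_docs = {
    "/schema/a/root.xsd": (
        '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
        '<xs:include schemaLocation="types.xsd"/>'
        '<xs:import namespace="urn:example:b" schemaLocation="../b/other.xsd"/>'
        "</xs:schema>"
    ),
    "/schema/a/types.xsd": (
        '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
        '<xs:include schemaLocation="root.xsd"/>'
        "</xs:schema>"
    ),
}
visited, missing = set(), []
sketch_resolve_schemas("/schema/a/root.xsd", xsd_docs, visited, missing)
```

Tracking visited paths in a set keeps the walk finite when two XSD docs reference each other, which is presumably the role of the `visited_xsd_uri_set` parameter of `resolve_schemas()`.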
-
d1_scimeta.schema_resolve.
main
()¶
-
d1_scimeta.schema_resolve.
resolve_schemas
(xsd_uri_tup, schema_branch_path, visited_xsd_uri_set=None, indent=1)¶
-
d1_scimeta.schema_resolve.
load_schema
(xsd_uri)¶
-
d1_scimeta.schema_resolve.
download_schema
(xsd_url)¶
-
exception
d1_scimeta.schema_resolve.
ResolveError
¶ Bases:
Exception
d1_scimeta.util module¶
-
d1_scimeta.util.
get_xsi_schema_location_tup
(xml_tree)¶ Extract xsi:schemaLocation from the root of an XML doc.
The root schemaLocation consists of (namespace, uri) pairs stored as a whitespace-separated list of strings and designates the XSD namespaces and schema locations required for validation.
For schemaLocation in xs:include and xs:import in XSD docs, see get_xs_include_xs_import_schema_location_tup().
- Parameters
xml_tree
- Returns
tup of 2-tups.
Examples
xsi:schemaLocation="http://www.isotc211.org/2005/gmi http://files.axds.co/isobio/gmi/gmi.xsd http://www.isotc211.org/2005/gmd http://files.axds.co/isobio/gmd/gmd.xsd">
->
(
  ('http://www.isotc211.org/2005/gmi', 'http://files.axds.co/isobio/gmi/gmi.xsd'),
  ('http://www.isotc211.org/2005/gmd', 'http://files.axds.co/isobio/gmd/gmd.xsd'),
)
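As a rough illustration of the pairing described above, the following sketch splits the attribute value on whitespace and groups consecutive tokens. It assumes the standard library `xml.etree.ElementTree` rather than lxml, and `sketch_get_xsi_schema_location_tup` is a hypothetical name, not the library's function.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def sketch_get_xsi_schema_location_tup(root):
    """Return ((namespace, uri), ...) from the root's xsi:schemaLocation."""
    raw = root.get("{%s}schemaLocation" % XSI_NS, "")
    tokens = raw.split()
    # Pair consecutive tokens; zip() drops a trailing unpaired token.
    return tuple(zip(tokens[0::2], tokens[1::2]))


xml_doc = """<gmi:MI_Metadata
    xmlns:gmi="http://www.isotc211.org/2005/gmi"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
      http://www.isotc211.org/2005/gmi http://files.axds.co/isobio/gmi/gmi.xsd
      http://www.isotc211.org/2005/gmd http://files.axds.co/isobio/gmd/gmd.xsd
    "/>"""
pairs = sketch_get_xsi_schema_location_tup(ET.fromstring(xml_doc))
```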
-
d1_scimeta.util.
get_xs_include_xs_import_schema_location_tup
(xsd_tree)¶ Extract the schemaLocation attribute from xs:include and xs:import elements in an XSD doc.
Each schemaLocation consists of a single URI.
For schemaLocation in the root of XML docs, see get_xsi_schema_location_tup().
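A minimal sketch of this kind of extraction, again assuming the standard library `xml.etree.ElementTree` in place of lxml. The name `sketch_get_include_import_locations` and the sample XSD are illustrative assumptions; an xs:import without a schemaLocation is simply skipped here.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"


def sketch_get_include_import_locations(xsd_root):
    """Collect schemaLocation values from top-level xs:include and xs:import."""
    locations = []
    for child in xsd_root:
        if child.tag in ("{%s}include" % XS_NS, "{%s}import" % XS_NS):
            loc = child.get("schemaLocation")
            if loc is not None:
                locations.append(loc)
    return tuple(locations)


xsd_doc = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.isotc211.org/2005/gmd">
  <xs:include schemaLocation="citation.xsd"/>
  <xs:import namespace="http://www.isotc211.org/2005/gco"
             schemaLocation="../gco/gco.xsd"/>
  <xs:import namespace="http://www.w3.org/XML/1998/namespace"/>
</xs:schema>"""
locations = sketch_get_include_import_locations(ET.fromstring(xsd_doc))
```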
-
d1_scimeta.util.
get_root_ns
(xml_tree)¶ Extract the root namespace for the XML doc.
- Returns
Extracted from the prefix used for the root element which is also declared as an xmlns in the root element.
- Return type
str
Examples
<xs:schema targetNamespace="http://www.w3.org/XML/1998/namespace" xmlns:xs="http://www.w3.org/2001/XMLSchema">
-
d1_scimeta.util.
get_target_ns
(xml_tree)¶ Extract the target namespace for the XML doc.
- Returns
Extracted from the targetNamespace attribute of the root element.
If the root element does not have a targetNamespace attribute, return an empty string, “”.
- Return type
str
Examples:
<xs:schema targetNamespace="http://www.w3.org/XML/1998/namespace" xmlns:xs="http://www.w3.org/2001/XMLSchema">
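The difference between the root namespace and the target namespace can be seen by applying both ideas to the example element above. This is only a sketch using the standard library `xml.etree.ElementTree`. The function names are hypothetical, and returning an empty string for an unqualified root element is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET


def sketch_get_root_ns(root):
    """Namespace bound to the prefix of the root element."""
    # ElementTree encodes a qualified tag as "{namespace}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""  # Assumption: unqualified root yields an empty string.


def sketch_get_target_ns(root):
    """Value of the targetNamespace attribute, or "" if it is absent."""
    return root.get("targetNamespace", "")


xsd_doc = (
    '<xs:schema targetNamespace="http://www.w3.org/XML/1998/namespace" '
    'xmlns:xs="http://www.w3.org/2001/XMLSchema"/>'
)
root = ET.fromstring(xsd_doc)
root_ns = sketch_get_root_ns(root)
target_ns = sketch_get_target_ns(root)
no_target_ns = sketch_get_target_ns(ET.fromstring("<doc/>"))
```

For this element, the root namespace is the one bound to the `xs` prefix, while the target namespace comes from the attribute value.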
-
d1_scimeta.util.
get_abs_root_xsd_path
(format_id)¶ Get abs path to root XSD by formatId.
- Returns
str
Path to the pre-generated XSD that should import all XSDs required for validating any XML of the given formatId.
- E.g.:
format_id = http://www.isotc211.org/2005/gmd -> /d1_scimeta/ext/isotc211.xsd
- Return type
xsd_path
-
d1_scimeta.util.
get_schema_name
(format_id)¶ Get the directory name of a schema by formatId.
- Returns
str
The name (not path) of the root directory for the XSD files for a given formatId.
This is also the basename of the root XSD file for a given formatId.
- E.g.:
format_id = http://www.isotc211.org/2005/gmd -> isotc211
- Return type
schema_dir_name
-
d1_scimeta.util.
get_supported_format_id_list
()¶ Get list of formatIds that are supported by the validator.
- Returns
list
List of the formatId strings that can be passed to the validate*() functions.
- Return type
list of format_id
-
d1_scimeta.util.
is_installed_scimeta_format_id
(format_id)¶ Return True if validation is supported for format_id.
-
d1_scimeta.util.
gen_abs_xsd_path_list
(branch_path)¶ Generate a list of abs paths to XSD files under branch_path.
Excludes *.ORIGINAL.* files, which are inferred from their .xsd counterparts.
-
d1_scimeta.util.
get_abs_schema_branch_path
(format_id)¶ Get absolute path to a branch holding all the XSD files for a single formatId.
The returned path will always have a trailing slash.
- Returns
str
E.g.:
format_id = http://www.isotc211.org/2005/gmd -> /schema/lib_scimeta/src/d1_scimeta/schema/isotc211/
- Return type
abs_xsd_path
-
d1_scimeta.util.
gen_xsd_name_dict
(branch_path, xsd_path_list)¶ Generate a dict of XSD name to abs path to the XSD file.
The key is the part of the XSD path that follows under branch_path.
- E.g.:
path = /schema/isotc211/gmd/applicationSchema.xsd -> key = /gmd/applicationSchema.xsd val = /schema/isotc211/gmd/applicationSchema.xsd
-
d1_scimeta.util.
gen_rel_xsd_path
(branch_path, xsd_path)¶ Generate the relative part of the XSD path that follows under branch_path.
- Parameters
branch_path – str Absolute path to a branch holding all the XSD files for a single formatId.
xsd_path – str Absolute path to an XSD file under the branch_path.
- Returns
str
- E.g.:
branch_path = /schema/isotc211/ xsd_path = /schema/isotc211/gmd/applicationSchema.xsd -> gmd/applicationSchema.xsd
- Return type
path
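The mapping in the example above matches what `posixpath.relpath()` computes. The sketch below assumes POSIX-style paths. `sketch_gen_rel_xsd_path` is a hypothetical name and may differ from how the library derives the path.

```python
import posixpath


def sketch_gen_rel_xsd_path(branch_path, xsd_path):
    """Part of xsd_path that follows under branch_path."""
    return posixpath.relpath(xsd_path, branch_path)


rel = sketch_gen_rel_xsd_path(
    "/schema/isotc211/", "/schema/isotc211/gmd/applicationSchema.xsd"
)
```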
-
d1_scimeta.util.
get_rel_path
(parent_xsd_path, child_xsd_path)¶ Generate a relative path suitable for use as a schemaLocation URI.
- Parameters
parent_xsd_path – str Abs path to XSD file that has the schemaLocation.
child_xsd_path – str Abs path to XSD file that the schemaLocation should be rewritten to.
- Returns
Relative path
- E.g.:
parent = /schema/isotc211/gmd/maintenance.xsd child = /schema/isotc211/gmd/citation.xsd -> ../gmd/citation.xsd
- Return type
str
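One way to derive such a relative schemaLocation is to compute the child's path relative to the directory that holds the parent. The sketch below assumes POSIX-style paths and uses a hypothetical function name. Note that for two files in the same directory it yields the shortest form (`citation.xsd`), whereas the example above shows `../gmd/citation.xsd`. Both resolve to the same file, so the library evidently builds the path differently. The `gco/gco.xsd` path is an illustrative assumption.

```python
import posixpath


def sketch_get_rel_path(parent_xsd_path, child_xsd_path):
    """Path to the child XSD, relative to the directory of the parent XSD."""
    parent_dir = posixpath.dirname(parent_xsd_path)
    return posixpath.relpath(child_xsd_path, parent_dir)


parent = "/schema/isotc211/gmd/maintenance.xsd"
sibling = sketch_get_rel_path(parent, "/schema/isotc211/gmd/citation.xsd")
cousin = sketch_get_rel_path(parent, "/schema/isotc211/gco/gco.xsd")

# The documented form "../gmd/citation.xsd" resolves to the same file.
documented = posixpath.normpath(
    posixpath.join(posixpath.dirname(parent), "../gmd/citation.xsd")
)
```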
-
d1_scimeta.util.
gen_abs_uri
(abs_url_or_path, rel_path)¶ Create an absolute URL or local filesystem path given at least one absolute component.
- Parameters
abs_url_or_path – str URL or absolute filesystem path
rel_path – URL or relative filesystem path
- Returns
Absolute URL or filesystem path.
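A minimal sketch of this kind of resolution, using `urllib.parse.urljoin()` for URLs and `posixpath` for filesystem paths. The function names are hypothetical, and two behaviors are assumptions of the sketch rather than documented facts: the base is treated as the location of a file (so the relative part resolves against its directory), and a `rel_path` that is already absolute is returned unchanged.

```python
import posixpath
from urllib.parse import urljoin, urlparse


def sketch_is_url(s):
    """True for http(s) URLs with a host part."""
    parts = urlparse(s)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def sketch_gen_abs_uri(abs_url_or_path, rel_path):
    if sketch_is_url(rel_path) or posixpath.isabs(rel_path):
        return rel_path  # Already absolute; nothing to resolve.
    if sketch_is_url(abs_url_or_path):
        return urljoin(abs_url_or_path, rel_path)
    # Assumption: the base names a file, so resolve against its directory.
    base_dir = posixpath.dirname(abs_url_or_path)
    return posixpath.normpath(posixpath.join(base_dir, rel_path))


abs_url = sketch_gen_abs_uri(
    "http://files.axds.co/isobio/gmi/gmi.xsd", "../gmd/gmd.xsd"
)
abs_path = sketch_gen_abs_uri(
    "/schema/isotc211/gmd/maintenance.xsd", "../gco/gco.xsd"
)
```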
-
d1_scimeta.util.
get_xsd_path
(xsd_name_dict, uri)¶ Get abs path to the XSD that has a key that matches the end of the URI.
Works for file paths, URLs and URIs. E.g.:
http://www.w3.org/2001/xml.xsd -> xml.xsd xml.xsd -> schema/_cache/http_www.w3.org_2001__xml.xsd
-
d1_scimeta.util.
load_xml_file_to_tree
(xml_path)¶
-
d1_scimeta.util.
parse_xml_bytes
(xml_bytes, xml_path)¶ Parse XML bytes to tree.
Passing in the path to the file enables relative imports to work.
-
d1_scimeta.util.
get_error_log_as_str
(lxml_obj)¶ Create a basic message with results from the last XMLParser() or lxml.etree.XMLSchema() run.
lxml.etree.XMLParser(), lxml.etree.XMLSchema() and some exception objects, such as lxml.etree.XMLSchemaParseError() have an error_log attribute which contains a list of errors and warnings from the most recent run.
Each error element in the list has attributes:
message: the message text
domain: the domain ID (see the lxml.etree.ErrorDomains class)
type: the message type ID (see the lxml.etree.ErrorTypes class)
level: the log level ID (see the lxml.etree.ErrorLevels class)
line: the line at which the message originated (if applicable)
column: the character column at which the message originated (if applicable)
filename: the name of the file in which the message originated (if applicable)
For convenience, there are also three properties that provide readable names for the ID values:
domain_name, type_name, level_name
- Parameters
lxml_obj – lxml.etree.XMLParser() or lxml.etree.XMLSchema()
- Returns
Selected elements from the error_log of the lxml_obj.
- Return type
str
-
d1_scimeta.util.
load_bytes_from_file
(xml_path)¶
-
d1_scimeta.util.
save_tree_to_file
(xml_tree, xml_path)¶ Write pretty formatted XML tree to file.
-
d1_scimeta.util.
save_bytes_to_file
(xml_path, xml_bytes)¶ Write bytes to file.
-
d1_scimeta.util.
dump_pretty_tree
(xml_tree, msg_str='XML Tree', logger=<bound method Logger.debug of <Logger d1_scimeta.util (DEBUG)>>)¶
-
d1_scimeta.util.
dump
(o, msg_str='Object dump')¶
-
d1_scimeta.util.
pretty_format_tree
(xml_tree)¶
-
d1_scimeta.util.
is_valid_xml_file
(xml_path)¶
-
d1_scimeta.util.
is_valid_xsd_file
(xsd_path)¶
-
d1_scimeta.util.
is_url
(s)¶ Return True if s is a URL.
-
d1_scimeta.util.
strip_whitespace
(xml_tree)¶ Strip whitespace that might interfere with validation from XSD while maintaining overall formatting.
E.g., whitespace in a gco:DateTime trips up the validation:
<gco:DateTime>
  2011-03-18T15:39:17Z
</gco:DateTime>
This changes it to:
<gco:DateTime>2011-03-18T15:39:17Z</gco:DateTime>
- Parameters
xml_tree
- Returns
stripped xml_tree
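The effect described above can be approximated by stripping the text of leaf elements only, which leaves the indentation between elements untouched. This is a sketch under that assumption, using the standard library `xml.etree.ElementTree` and a hypothetical function name. The library may select the text to strip differently.

```python
import xml.etree.ElementTree as ET


def sketch_strip_leaf_whitespace(root):
    """Strip leading/trailing whitespace from the text of childless elements."""
    for el in root.iter():
        if len(el) == 0 and el.text:
            el.text = el.text.strip()
    return root


xml_doc = """<gmd:dateStamp xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gco:DateTime>
    2011-03-18T15:39:17Z
  </gco:DateTime>
</gmd:dateStamp>"""
root = sketch_strip_leaf_whitespace(ET.fromstring(xml_doc))
date_time = root[0]
```

Only `date_time.text` changes; the whitespace that indents the child inside its parent is stored on the parent and is left as-is.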
-
d1_scimeta.util.
remove_empty_elements
(xml_tree)¶ Remove empty elements that might interfere with validation from XSD while maintaining overall formatting.
- Parameters
xml_tree
- Returns
stripped xml_tree
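What counts as "empty" is not specified above. The sketch below assumes an element is empty when it has no children, no attributes, and no non-whitespace text, and it prunes bottom-up so that a parent left empty by the removal of its children is removed as well. It uses the standard library `xml.etree.ElementTree`, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def sketch_remove_empty_elements(parent):
    """Remove descendants with no children, attributes, or meaningful text."""
    # Iterate over a copy, since children are removed during the loop.
    for child in list(parent):
        sketch_remove_empty_elements(child)  # Prune grandchildren first.
        has_text = bool(child.text and child.text.strip())
        if len(child) == 0 and not has_text and not child.attrib:
            parent.remove(child)
    return parent


xml_doc = '<root><a/><b>  </b><c>text</c><d><e/></d><f id="1"/></root>'
root = sketch_remove_empty_elements(ET.fromstring(xml_doc))
remaining = [el.tag for el in root]
```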
-
d1_scimeta.util.
apply_xslt_transform
(xml_tree, xslt_path)¶
-
d1_scimeta.util.
create_lxml_obj
(xml_tree, lxml_obj_class)¶ Create an object from an lxml class that takes a tree as parameter.
- Parameters
xml_tree – etree
lxml_obj_class – lxml class that takes a tree as parameter, e.g. lxml.etree.XMLSchema or lxml.etree.XSLT
-
exception
d1_scimeta.util.
SciMetaError
¶ Bases:
Exception
d1_scimeta.validate module¶
Validate Science Metadata.
Usage:

    import logging

    import d1_scimeta.util
    import d1_scimeta.validate

    log = logging.getLogger(__name__)

    try:
        d1_scimeta.validate.assert_valid(format_id, xml)
    except d1_scimeta.util.SciMetaError as e:
        log.error(e)
-
d1_scimeta.validate.
assert_valid
(format_id, xml)¶ Validate a Science Metadata XML file.
- Parameters
format_id – str DataONE formatId. Must be one of the keys from the format_id_to_schema.json document. E.g., http://www.isotc211.org/2005/gmd.
xml – str, bytes or tree
str: Path to XML file to validate.
bytes: UTF-8 encoded bytes of XML doc to validate.
tree: lxml.etree of XML doc to validate.
- Raises
d1_scimeta.util.SciMetaError – On validation error.
- Returns
None on successful validation.
-
d1_scimeta.validate.
validate_tree
(format_id, xml_tree)¶
-
d1_scimeta.validate.
validate_bytes
(format_id, xml_bytes, xml_path=None)¶
-
d1_scimeta.validate.
validate_path
(format_id, xml_path)¶
-
d1_scimeta.validate.
apply_xerces_adaption_schema_transform
(root_xsd_path, xml_tree)¶