d1_common.iter package¶
This package contains iterators that provide a convenient way to retrieve and iterate over Node contents.
Although this directory is not a package, this __init__.py file is required for pytest to be able to reach test directories below this directory.
Submodules¶
d1_common.iter.bytes module¶
Generator that returns a bytes object in chunks.
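As a minimal sketch of the idea, a bytes object can be yielded in fixed-size chunks by slicing. This is an illustration only and not the module's actual API; `iter_bytes_chunks` and its `chunk_size` parameter are hypothetical names.

```python
def iter_bytes_chunks(data, chunk_size=1024):
    """Yield successive slices of ``data``, each at most ``chunk_size`` bytes.

    Hypothetical helper for illustration; not part of d1_common.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


chunks = list(iter_bytes_chunks(b"abcdefghij", chunk_size=4))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```

Only the last chunk is shorter than `chunk_size`, and an empty input yields no chunks at all.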
d1_common.iter.path module¶
Generator that resolves a list of file and dir paths and returns file paths with optional filtering and client feedback.
-
d1_common.iter.path.path_generator(path_list, include_glob_list=None, exclude_glob_list=None, recursive=True, ignore_invalid=False, default_excludes=True, return_entered_dir_paths=False, return_skipped_dir_paths=False)¶
- Parameters
path_list – list of str
List of file- and dir paths. File paths are used directly and dirs are searched for files.
path_list does not accept glob patterns, as it’s more convenient to let the shell expand glob patterns to directly specified files and dirs. E.g., to use a glob to select all .py files in a subdir, the command may be called with sub/dir/*.py, which the shell expands to a list of files, which are then passed to this function. The shell can also expand glob patterns to dir paths or a mix of file and dir paths.
The paths should be Unicode or UTF-8 strings. Tilde (“~”) to home expansion is performed on the paths.
include_glob_list – list of str
exclude_glob_list – list of str
Patterns ending with “/” are matched only against dir names. All other patterns are matched only against file names.
If the include list contains any file patterns, files must match one or more of the patterns in order to be returned.
If the include list contains any dir patterns, dirs must match one or more of the patterns in order for the recursive search to descend into them.
The exclude list works in the same way except that matching files and dirs are excluded instead of included. If both include and exclude lists are specified, files and dirs must both match the include and not match the exclude patterns in order to be returned or descended into.
recursive – bool
True (default): Search subdirectories
False: Do not search subdirectories
ignore_invalid – bool
True: Invalid paths in path_list are ignored.
False (default): EnvironmentError is raised if any of the paths in path_list do not reference an existing file or dir.
default_excludes – bool
True: A list of glob patterns for files and dirs that should typically be ignored is added to any exclude patterns passed to the function. These include dirs such as .git and backup files, such as files appended with “~”.
False: No files or dirs are excluded by default.
return_entered_dir_paths – bool
False: Only file paths are returned.
True: Directory paths are also returned.
return_skipped_dir_paths – bool
False: Paths of skipped dirs are not returned.
True: Paths of skipped dirs are returned.
The iterator never descends into excluded dirs, and by default, does not return the paths of excluded dirs. However, the client may need to get the paths of dirs that were excluded instead of dirs that were included. E.g., when looking for dirs to delete.
- Returns
File path iterator
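The include and exclude rules described for include_glob_list and exclude_glob_list could be sketched roughly as follows. This is an illustration built on the standard `fnmatch` module, not the library's implementation; the helper names `split_globs`, `matches_any`, and `is_selected` are assumptions made for this example.

```python
import fnmatch


def split_globs(glob_list):
    """Split patterns into (file_globs, dir_globs). Dir patterns end with "/"."""
    glob_list = glob_list or []
    file_globs = [g for g in glob_list if not g.endswith("/")]
    dir_globs = [g.rstrip("/") for g in glob_list if g.endswith("/")]
    return file_globs, dir_globs


def matches_any(name, glob_list):
    """Return True if ``name`` matches at least one of the glob patterns."""
    return any(fnmatch.fnmatchcase(name, g) for g in glob_list)


def is_selected(name, is_dir, include_glob_list=None, exclude_glob_list=None):
    """Return True if a file or dir name (not a full path) passes the filters."""
    inc_file, inc_dir = split_globs(include_glob_list)
    exc_file, exc_dir = split_globs(exclude_glob_list)
    include = inc_dir if is_dir else inc_file
    exclude = exc_dir if is_dir else exc_file
    # An include list with no patterns of the relevant kind places no restriction.
    if include and not matches_any(name, include):
        return False
    return not matches_any(name, exclude)


print(is_selected("setup.py", False, include_glob_list=["*.py"]))  # True
print(is_selected(".git", True, exclude_glob_list=[".git/"]))      # False
```

Note that a file pattern such as `*.py` in the include list does not restrict which dirs are descended into, because dir names are only tested against patterns ending with “/”.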
Notes
During iteration, the iterator can be prevented from descending into a directory by sending a “skip” flag when the iterator yields the directory path. This allows the client to decide whether a directory should be iterated based on, for instance, which files are present in it. This can be used in conjunction with the include and exclude glob lists. Note that, in order to receive directory paths that can be skipped, return_entered_dir_paths must be set to True.
The regular for...in syntax does not support sending the “skip” flag back to the iterator. Instead, use a pattern like:
    itr = d1_common.iter.path.path_generator(..., return_entered_dir_paths=True)
    try:
        path = next(itr)
        while True:
            skip_dir = determine_if_dir_should_be_skipped(path)
            # send() returns the next path, which becomes the path to examine
            path = itr.send(skip_dir)
    except (KeyboardInterrupt, StopIteration):
        pass
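To see how a “skip” flag travels back into a generator, here is a self-contained toy that follows the same send-based protocol. It walks an in-memory tree rather than the file system; `walk_tree` and the sample `tree` are hypothetical and are not part of d1_common.

```python
def walk_tree(tree, prefix=""):
    """Yield dir paths (ending with "/") and file paths from a nested dict.

    A dict value is a dir, any other value is a file. After a dir path is
    yielded, the caller may send True to prevent descending into that dir.
    """
    for name, value in sorted(tree.items()):
        path = prefix + name
        if isinstance(value, dict):
            skip = yield path + "/"
            if skip:
                continue
            # "yield from" forwards values passed to send() to the sub-generator.
            yield from walk_tree(value, path + "/")
        else:
            yield path


tree = {"a": {"x.txt": None}, "b": {"y.txt": None}, "z.txt": None}

visited = []
itr = walk_tree(tree)
try:
    path = next(itr)
    while True:
        visited.append(path)
        # Skip the dir "a/". For file paths the sent value is ignored.
        path = itr.send(path == "a/")
except StopIteration:
    pass

print(visited)  # ['a/', 'b/', 'b/y.txt', 'z.txt']
```

The dir "a/" is still reported, but its contents are never visited, which mirrors why the dir paths must be returned before they can be skipped.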
Glob patterns are matched only against file and directory names, not the full paths.
Paths passed directly in path_list are not filtered.
The same file can be returned multiple times if path_list contains duplicated file paths or dir paths, or dir paths that implicitly include the same subdirs.
include_glob_list and exclude_glob_list are handy for filtering the files found in dir searches.
Remember to escape the include and exclude glob patterns on the command line so that they’re not expanded by the shell.
-
class d1_common.iter.path.ArgParser(description_str=None, formatter_class=<class 'argparse.RawDescriptionHelpFormatter'>, **val_dict)¶
Bases: object
An argparse.ArgumentParser populated with a standard set of command line arguments for controlling the path generator from the command line.
The script that instantiates this class will typically add its own specific arguments by making additional parser.add_argument() calls.
When creating the path_generator, simply pass parser.path_arg_dict to path_generator() via argument unpacking.
Example
    import d1_common.iter.path

    parser = d1_common.iter.path.ArgParser(
        __doc__,
        # Set non-configurable values
        include_glob_list=['*.py'],
        return_entered_dir_paths=True,
    )
    # Add command specific arguments
    parser.add_argument(...)
    # Create the path_generator and iterate over the resulting paths
    for p in d1_common.iter.path.path_generator(**parser.path_arg_dict):
        print(p)
-
ARG_DICT = {
    'default_excludes': ('--no-default-excludes', {'action': 'store_false', 'help': "Don't add default glob exclude patterns"}),
    'exclude_glob_list': ('--exclude', {'default': ['*egg-info/', '.git/', '.idea/', '__pycache__/', '.eggs/', '.pytest_cache/', 'build/', 'dist/', 'doc/', 'generated/', 'migrations/', '*~', '*.bak', '*.tmp', '*.pyc'], 'nargs': '+', 'metavar': 'glob', 'help': 'Exclude glob patterns'}),
    'ignore_invalid': ('--ignore-invalid', {'action': 'store_true', 'help': 'Ignore invalid paths'}),
    'include_glob_list': ('--include', {'nargs': '+', 'metavar': 'glob', 'help': 'Exclude glob patterns'}),
    'path_list': ('path', {'nargs': '+', 'help': 'File or directory path'}),
    'recursive': ('--no-recursive', {'action': 'store_false', 'help': 'Do not search directories recursively'}),
    'return_entered_dir_paths': ('--return-entered-dir-paths', {'action': 'store_true', 'help': 'Return the paths of dirs that the generator enters'}),
    'return_skipped_dir_paths': ('--return-skipped-dir-paths', {'action': 'store_true', 'help': 'Return the paths of dirs that the generator enters'}),
}¶
-
__init__(description_str=None, formatter_class=<class 'argparse.RawDescriptionHelpFormatter'>, **val_dict)¶
Create an ArgumentParser populated with a standard set of command line arguments for controlling the path generator from the command line.
- Parameters
description_str – Description of the command. The description is included in the automatically generated help message.
formatter_class – Modify the help message format. See the argparse module for available Formatter classes.
fixed value overrides – Passing any of these arguments causes the provided value to be used when instantiating the path generator, and the corresponding command line argument to become hidden and unavailable.
fixed_path_list, fixed_exclude_glob_list, fixed_include_glob_list, fixed_recursive, fixed_ignore_invalid, fixed_default_excludes, fixed_return_entered_dir_paths, fixed_return_skipped_dir_paths
default value overrides – Passing any of these arguments causes the provided value to be used as the default. The corresponding command line argument is still available and can be used to override the default value.
default_path_list, default_exclude_glob_list, default_include_glob_list, default_recursive, default_ignore_invalid, default_default_excludes, default_return_entered_dir_paths, default_return_skipped_dir_paths
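The difference between the two kinds of override can be illustrated with plain `argparse`. The sketch below is not how ArgParser is implemented; `build_parser` and `resolve_recursive` are hypothetical helpers, and only the `--no-recursive` option from ARG_DICT is modelled.

```python
import argparse


def build_parser(fixed_recursive=None, default_recursive=True):
    """Build a parser that exposes --no-recursive only if no fixed value is set."""
    parser = argparse.ArgumentParser()
    if fixed_recursive is None:
        # No fixed value: the option stays available, with an adjustable default.
        parser.add_argument(
            "--no-recursive",
            dest="recursive",
            action="store_false",
            default=default_recursive,
        )
    return parser


def resolve_recursive(argv, fixed_recursive=None, default_recursive=True):
    """Return the effective ``recursive`` value for the given command line."""
    parser = build_parser(fixed_recursive, default_recursive)
    args = parser.parse_args(argv)
    if fixed_recursive is not None:
        # A fixed value always wins; the option was never added to the parser.
        return fixed_recursive
    return args.recursive


print(resolve_recursive([]))                       # True
print(resolve_recursive(["--no-recursive"]))       # False
print(resolve_recursive([], fixed_recursive=False))  # False
```

With a fixed value the option is simply not registered, so supplying it on the command line would be rejected by `argparse`, which corresponds to the argument becoming unavailable.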
-
add_argument(*arg_list, **arg_dict)¶
Add command specific arguments.
-
property args¶
Get complete command line arguments as a Namespace object.
- Returns
Complete command line arguments.
This is an exact representation of the parsed command line and does not include any fixed value substitutions from the val_dict passed to __init__().
- Return type
Namespace
-
property path_arg_dict¶
Get command line arguments as a dict suitable for passing to a path_generator create call via argument unpacking.
- Returns
Arguments valid for passing to path_generator() create call.
The dict will include any fixed value substitutions that were passed to __init__() via the val_dict.
- Return type
dict
d1_common.iter.stream module¶
-
class d1_common.iter.stream.StreamIterator(stream, chunk_size=1024)¶
Bases: object
Generator that returns a stream in chunks.
In this context, a stream is anything with a read() method and, if the client requires it, a way to determine the total number of elements that will be returned by the read() method at any point during iteration.
Typical sources for streams are files and HTTP responses.
-
__init__(stream, chunk_size=1024)¶
Args:
stream: Object with read() method, such as an open file.
chunk_size: int
Max number of elements to return in each chunk. The last chunk will normally be smaller. Other chunks may be smaller as well, but never empty.
-
property size¶
Returns:
int : The total number of bytes that will be returned by the iterator.
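A minimal sketch of chunked reading from any object with a read() method, using `io.BytesIO` as the stream. This is an illustration, not the StreamIterator implementation; `iter_stream_chunks` is a hypothetical name.

```python
import io


def iter_stream_chunks(stream, chunk_size=1024):
    """Yield non-empty chunks read from ``stream`` until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # read() returns an empty result at end of stream.
            break
        yield chunk


stream = io.BytesIO(b"0123456789")
chunks = list(iter_stream_chunks(stream, chunk_size=4))
print(chunks)  # [b'0123', b'4567', b'89']
```

Stopping on the first empty read is what guarantees that no empty chunk is ever yielded, even when a read returns fewer elements than requested.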
d1_common.iter.string module¶
Generator that returns the Unicode characters of a str in chunks.
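A sketch of chunking applied to a str. Because slicing a str works on Unicode code points rather than encoded bytes, a multi-byte character is never split across chunks. `iter_str_chunks` is a hypothetical name, not the module's API.

```python
def iter_str_chunks(text, chunk_size=1024):
    """Yield successive slices of ``text``, each at most ``chunk_size`` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(text), chunk_size):
        yield text[offset:offset + chunk_size]


# "\u00ef" and "\u00e9" each take two bytes in UTF-8 but count as one character.
text = "na\u00efve caf\u00e9"
str_chunks = list(iter_str_chunks(text, chunk_size=4))
print(str_chunks)
```

Here the 10-character string encodes to 12 bytes in UTF-8, yet the chunk boundaries fall after every fourth character regardless of how many bytes each character needs.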