Custom Parsers#
In the situation where the output files from a process are not processable by any of the built-in parsers, a custom parser can be created to extract the information of interest.
File Parsers#
File parsers are used for tracking, they take the path of an identified candidate as an argument, parse the data from that file and return a key-value mapping of the data of interest. To create a custom parser you will need to use the multiparser.parsing.file.file_parser decorator. The function should take an argument input_file which is the file path, and allow an arbitrary number of additional arguments (**_) to be compatible with the decorator. It should return either:
- Two dictionaries containing relevant metadata (usually left blank), and the parsed information:
{...}, {...}. - A single dictionary representing the metadata, and a list of dictionaries (for cases where multiple lines are read in a single parse and these should be kept separate):
{...}, [{...}, ...]
from typing import Any
import multiparser.parsing.file as mp_file_parse
@mp_file_parse.file_parser
def parse_user_file_type(input_file: str, **_) -> tuple[dict[str, Any], dict[str, Any]]:
with open(file_path) as in_f:
file_lines = in_f.readlines()
data = {}
for line in file_lines:
key, value = line.split(":", 1)
data[key.strip().lower().replace(" ", "_")] = value.strip()
return {}, data
To use the parser within a FileMonitor session:
with multiparser.FileMonitor(timeout=10) as monitor:
monitor.track(path_glob_exprs="custom_file.log", parser_func=parser_user_file_type)
monitor.run()
In the case where you would like your parser function to accept additional keyword arguments you can add these
to the definition and pass them to tracking using the parser_kwargs argument.
Empty Data
The per_thread_callback of FileMonitor only executes when new data is parsed from a modified file,
not when the file is marked as modified. Therefore a custom parser which either does not return data, or
returns explicitly an empty dictionary will never trigger the per_thread_callback method.
Log Parsers#
In the case where the custom parser will be used in file "tailing", that is read only the latest information appended to the file, the multiparser.parsing.tail.log_parser decorator is used when defining the function. The function should take an argument file_content which is a string containing the latest read content, and allow an arbitrary number of additional arguments (**_) to be compatible with the decorator. It should return either:
- Two dictionaries containing relevant metadata (usually left blank), and the parsed information:
{...}, {...}. - A single dictionary representing the metadata, and a list of dictionaries (for cases where multiple lines are read in a single parse and these should be kept separate:
{...}, [{...}, ...].
from typing import Any
import multiparser.parsing.tail as mp_tail_parse
@mp_tail_parse.log_parser
def parse_user_file_type(file_content: str, **_) -> tuple[dict[str, Any], dict[str, Any]]:
file_lines = file_content.split("\n")
data = {}
for line in file_lines:
key, value = line.split(":", 1)
data[key.strip().lower().replace(" ", "_")] = value.strip()
return {}, data
To use the parser within a FileMonitor session:
with multiparser.FileMonitor(timeout=10) as monitor:
monitor.tail(path_glob_exprs="custom_file.log", parser_func=parser_user_file_type)
monitor.run()
In the case where you would like your parser function to accept additional keyword arguments you can add these
to the definition and pass them to tracking using the parser_kwargs argument.
Class methods as parsers#
When custom parsers are decorated any positional and keyword arguments are propagated through, i.e.:
will be called as:
This allows class methods to also work as custom parsers because self is handled correctly:
Empty Data
The per_thread_callback of FileMonitor only executes when new data is parsed from a modified file,
not when the file is marked as modified. Therefore a custom parser which either does not return data, or
returns explicitly an empty dictionary will never trigger the per_thread_callback method.