Functions | |
| def | index_summary (run_metrics, level='Lane', columns=None, dtype='f4', extra) |
| def | index_summary_columns |
| def | summary (run_metrics, level='Total', columns=None, dtype='f4', ignore_missing_columns=True, extra) |
| def | load_summary_metrics () |
| def | summary_columns |
| def | indexing (run_metrics, per_sample=True, dtype='f4', stype='O', extra) |
| def | imaging (run_metrics, dtype='f4', extra) |
| def | imaging_columns (run_metrics, extra) |
| def | read (run, valid_to_load=None, requires=None, search_paths=None, extra) |
| def | read_metric |
| def | create_valid_to_load (interop_prefixes) |
| def | enable_metrics (valid_to_load, interop_prefixes) |
| def | load_to_string_list (valid_to_load) |
| def | group_from_filename (filename) |
| def | load_imaging_metrics () |
Variables | |
| tuple | _summary_levels = ('Total', 'NonIndex', 'Read', 'Lane', 'Surface') |
| tuple | _index_summary_levels = ('Lane', 'Barcode') |
Detailed Description
@package interop {#interop_core}
Core routines to simplify using the InterOp Library
InterOp is built around a single data structure called a `run_metrics` object. This contains the full set of InterOps
along with the RunInfo.xml and some of the RunParameters.xml.
A run metrics object can be read in as follows:
>>> from interop import read
>>> run_metrics = read("some/path/run_folder_name") # doctest: +SKIP
Core routines take the run_metrics object and convert it into a table represented by a structured NumPy array. This can,
in turn, be converted to a pandas DataFrame or any other data structure.
The core routines include the following:
>>> from interop import index_summary
>>> index_summary(run_metrics_with_indexing)
array([(1, 0.46, 1015.56, 520.67, 1536.22, 1800., 2000.)],
dtype=[('Lane', '<u2'), ('Mapped Reads Cv', '<f4'), ('Max Mapped Reads', '<f4'), ('Min Mapped Reads', '<f4'), ('Total Fraction Mapped Reads', '<f4'), ('Total Pf Reads', '<f4'), ('Total Reads', '<f4')])
>>> from interop import summary
>>> summary(run_metrics_example)
array([(0.37, 6.67, 0., 0., 0.)],
dtype=[('Error Rate', '<f4'), ('First Cycle Intensity', '<f4'), ('Projected Yield G', '<f4'), ('Reads', '<f4'), ('Reads Pf', '<f4')])
>>> from interop import indexing
>>> indexing(run_metrics_with_indexing)
array([(1., 1101., 'ATCACGAC-AAGGTTCA', '1', 4570., 900., 507.78),
(1., 1101., 'ATCACGAC-GGGGGGGG', '2', 2343., 900., 260.33),
(1., 1102., 'ATCACGAC-AAGGTTCA', '1', 4570., 0., 0. ),
(1., 1102., 'ATCACGAC-GGGGGGGG', '2', 2343., 0., 0. )],
dtype=[('Lane', '<f4'), ('Tile', '<f4'), ('Barcode', 'O'), ('SampleID', 'O'), ('Cluster Count', '<f4'), ('Cluster Count PF', '<f4'), ('% Demux', '<f4')])
>>> from interop import imaging
>>> imaging(run_metrics_example)
rec.array([(1., 1101., 1., 1., 1., 0.1, 10., 10., 25. , 33.3, 33.3, 33.3, 0., 10., 10., nan, nan, nan, nan, nan, nan, nan, nan, nan, 1., 1., 1.),
(1., 1101., 2., 1., 2., 0.2, 5., 15., 12.5, 42.9, 28.6, 28.6, 0., 5., 15., nan, nan, nan, nan, nan, nan, nan, nan, nan, 1., 1., 1.),
(1., 1101., 3., 1., 3., 0.3, 10., 10., 25. , 33.3, 50. , 16.7, 0., 10., 10., nan, nan, nan, nan, nan, nan, nan, nan, nan, 1., 1., 1.),
(1., 1101., 4., 2., 1., 0.4, 10., 5., 25. , 16.7, 50. , 33.3, 0., 10., 5., nan, nan, nan, nan, nan, nan, nan, nan, nan, 1., 1., 1.),
(1., 1101., 5., 3., 1., 0.5, 15., 5., 37.5, 20. , 40. , 40. , 0., 15., 5., nan, nan, nan, nan, nan, nan, nan, nan, nan, 1., 1., 1.)],
dtype=[('Lane', '<f4'), ('Tile', '<f4'), ('Cycle', '<f4'), ('Read', '<f4'), ('Cycle Within Read', '<f4'), ('Error Rate', '<f4'), ('P90/green', '<f4'), ('P90/blue', '<f4'), ('% No Calls', '<f4'), ('% Base/A', '<f4'), ('% Base/C', '<f4'), ('% Base/G', '<f4'), ('% Base/T', '<f4'), ('Fwhm/green', '<f4'), ('Fwhm/blue', '<f4'), ('Corrected/A', '<f4'), ('Corrected/C', '<f4'), ('Corrected/G', '<f4'), ('Corrected/T', '<f4'), ('Called/A', '<f4'), ('Called/C', '<f4'), ('Called/G', '<f4'), ('Called/T', '<f4'), ('Signal To Noise', '<f4'), ('Surface', '<f4'), ('Swath', '<f4'), ('Tile Number', '<f4')])
Any of the core routines above can take a `run_metrics` object or a string containing a file path to a valid run folder.
>>> ar = index_summary("some/path/run_folder_name") # doctest: +SKIP
The structured NumPy array can be converted to a Pandas DataFrame as follows:
>>> import pandas as pd # doctest: +SKIP
>>> df = pd.DataFrame(ar) # doctest: +SKIP
For more information see the documentation around each function below.
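Putting the pieces together, here is a minimal end-to-end sketch (the run folder path is a placeholder): read the InterOp files once, then reuse the resulting `run_metrics` object for several tables instead of re-reading them from disk.
>>> from interop import read, summary, imaging
>>> run_metrics = read("some/path/run_folder_name")  # doctest: +SKIP
>>> summary_table = summary(run_metrics)  # run-level summary table  # doctest: +SKIP
>>> imaging_table = imaging(run_metrics)  # per-tile, per-cycle imaging table  # doctest: +SKIP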
Function Documentation
| def core.create_valid_to_load | ( | interop_prefixes | ) |
Create a list of metrics that are valid to load with the InterOp library
A list of valid metric names can be obtained using `list_interop_files`
>>> from interop import create_valid_to_load
>>> int(create_valid_to_load(['Extraction'])[0])
0
>>> create_valid_to_load(0)
Traceback (most recent call last):
...
TypeError: Parameter valid_to_load must be a collection of values
:param interop_prefixes: list of strings containing InterOp metric names
:return: py_interop_run.uchar_vector
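As an additional hedged sketch, the result can be round-tripped through `load_to_string_list` (documented below) to confirm which metric groups ended up enabled:
>>> from interop import create_valid_to_load, load_to_string_list
>>> valid_to_load = create_valid_to_load(['Error', 'Extraction'])  # enable two metric groups
>>> sorted(load_to_string_list(valid_to_load))
['Error', 'Extraction']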
| def core.enable_metrics | ( | valid_to_load, interop_prefixes | ) |
Enable metrics in valid_to_load
>>> from interop import enable_metrics, load_to_string_list
>>> import interop.py_interop_run as interop_run
>>> valid_to_load = interop_run.uchar_vector(interop_run.MetricCount, 0)
>>> load_to_string_list(enable_metrics(valid_to_load, 'Extraction'))
['Extraction']
>>> load_to_string_list(enable_metrics(valid_to_load, ['Error', 'Q']))
['Error', 'Extraction', 'Q']
Nothing changes when passing in an empty list
>>> load_to_string_list(enable_metrics(valid_to_load, []))
['Error', 'Extraction', 'Q']
Here are some example exceptions when an improper parameter is given
>>> enable_metrics(valid_to_load, None)
Traceback (most recent call last):
...
TypeError: 'NoneType' object is not iterable
>>> enable_metrics(None, [])
Traceback (most recent call last):
...
TypeError: Parameter valid_to_load must be of type interop.py_interop_run.uchar_vector
>>> enable_metrics("None", [])
Traceback (most recent call last):
...
TypeError: Parameter valid_to_load must be of type interop.py_interop_run.uchar_vector
:param valid_to_load: interop_run.uchar_vector (boolean array)
:param interop_prefixes: list of metrics to enable
:return: interop_run.uchar_vector (It is updated in-place so the return can be ignored)
| def core.group_from_filename | ( | filename | ) |
Get the metric group id from an InterOp filename path
>>> from interop import group_from_filename
>>> import interop.py_interop_run as interop_run
>>> group_from_filename("some/path/run/InterOp/ExtractionMetricsOut.bin")
2
>>> interop_run.Extraction
2
This group id can be used to load a metric from a binary buffer as in `interop.core.read_metric`
:param filename: path to interop metric
:return: interop_run.metric_group
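As an illustrative sketch (the InterOp directory path is a placeholder), `group_from_filename` can be mapped over the files in a run's InterOp folder to see which metric group id each file corresponds to:
>>> import glob, os
>>> from interop import group_from_filename
>>> for path in sorted(glob.glob("some/path/run/InterOp/*.bin")):  # doctest: +SKIP
...     print(os.path.basename(path), group_from_filename(path))  # e.g. ExtractionMetricsOut.bin 2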
| def core.imaging | ( | run_metrics, dtype = 'f4', extra | ) |
Convert InterOp run_metrics (or read run_metrics from disk) to a numpy structured array containing the imaging
table
We can read an imaging table directly from a run folder. Note, this does not load all metrics, only those required
by the imaging table. See `load_imaging_metrics` for that list.
Also note that loading only tile level metrics (e.g. metrics without cycles) will result in an empty table. This is
a limitation of the imaging table.
>>> from interop import imaging
>>> from interop import load_imaging_metrics
>>> import interop.py_interop_run_metrics as interop_metrics
>>> import numpy as np
>>> ar = imaging("some/path/run_folder_name") # doctest: +SKIP
The above function is equivalent to
>>> ar = imaging("some/path/run_folder_name", valid_to_load=load_imaging_metrics()) # doctest: +SKIP
We can select a subset of metrics to include based on metric groups
>>> ar = imaging("some/path/run_folder_name", valid_to_load=['Error']) # doctest: +SKIP
See `read` below for more examples.
The following example will rely on an existing run_metrics object (possibly created by the `read` function below).
>>> ar = imaging(run_metrics_example)
>>> ar
rec.array([(1., 1101., 1., 1., 1., 0.1, 10., 10., 25. , 33.3, 33.3, 33.3, 0., 10., 10., nan, nan, nan, nan, nan, nan, nan, nan, nan, 1., 1., 1.),
(1., 1101., 2., 1., 2., 0.2, 5., 15., 12.5, 42.9, 28.6, 28.6, 0., 5., 15., nan, nan, nan, nan, nan, nan, nan, nan, nan, 1., 1., 1.),
(1., 1101., 3., 1., 3., 0.3, 10., 10., 25. , 33.3, 50. , 16.7, 0., 10., 10., nan, nan, nan, nan, nan, nan, nan, nan, nan, 1., 1., 1.),
(1., 1101., 4., 2., 1., 0.4, 10., 5., 25. , 16.7, 50. , 33.3, 0., 10., 5., nan, nan, nan, nan, nan, nan, nan, nan, nan, 1., 1., 1.),
(1., 1101., 5., 3., 1., 0.5, 15., 5., 37.5, 20. , 40. , 40. , 0., 15., 5., nan, nan, nan, nan, nan, nan, nan, nan, nan, 1., 1., 1.)],
dtype=[('Lane', '<f4'), ('Tile', '<f4'), ('Cycle', '<f4'), ('Read', '<f4'), ('Cycle Within Read', '<f4'), ('Error Rate', '<f4'), ('P90/green', '<f4'), ('P90/blue', '<f4'), ('% No Calls', '<f4'), ('% Base/A', '<f4'), ('% Base/C', '<f4'), ('% Base/G', '<f4'), ('% Base/T', '<f4'), ('Fwhm/green', '<f4'), ('Fwhm/blue', '<f4'), ('Corrected/A', '<f4'), ('Corrected/C', '<f4'), ('Corrected/G', '<f4'), ('Corrected/T', '<f4'), ('Called/A', '<f4'), ('Called/C', '<f4'), ('Called/G', '<f4'), ('Called/T', '<f4'), ('Signal To Noise', '<f4'), ('Surface', '<f4'), ('Swath', '<f4'), ('Tile Number', '<f4')])
>>> ar.dtype
dtype((numpy.record, [('Lane', '<f4'), ('Tile', '<f4'), ('Cycle', '<f4'), ('Read', '<f4'), ('Cycle Within Read', '<f4'), ('Error Rate', '<f4'), ('P90/green', '<f4'), ('P90/blue', '<f4'), ('% No Calls', '<f4'), ('% Base/A', '<f4'), ('% Base/C', '<f4'), ('% Base/G', '<f4'), ('% Base/T', '<f4'), ('Fwhm/green', '<f4'), ('Fwhm/blue', '<f4'), ('Corrected/A', '<f4'), ('Corrected/C', '<f4'), ('Corrected/G', '<f4'), ('Corrected/T', '<f4'), ('Called/A', '<f4'), ('Called/C', '<f4'), ('Called/G', '<f4'), ('Called/T', '<f4'), ('Signal To Noise', '<f4'), ('Surface', '<f4'), ('Swath', '<f4'), ('Tile Number', '<f4')]))
We can convert the numpy array to a Pandas DataFrame as follows:
>>> import pandas as pd # doctest: +SKIP
>>> df = pd.DataFrame(ar) # doctest: +SKIP
>>> df # doctest: +SKIP
Lane ... Tile Number
0 1.0 ... 1.0
1 1.0 ... 1.0
2 1.0 ... 1.0
3 1.0 ... 1.0
4 1.0 ... 1.0
<BLANKLINE>
[5 rows x 27 columns]
You can also change the dtype of the resulting array.
>>> imaging(run_metrics_example, dtype=np.float32)
rec.array([(1., 1101., 1., 1., 1., 0.1, 10., 10., 25. , 33.3, 33.3, 33.3, 0., 10., 10., nan, nan, nan, nan, nan, nan, nan, nan, nan, 1., 1., 1.),
(1., 1101., 2., 1., 2., 0.2, 5., 15., 12.5, 42.9, 28.6, 28.6, 0., 5., 15., nan, nan, nan, nan, nan, nan, nan, nan, nan, 1., 1., 1.),
(1., 1101., 3., 1., 3., 0.3, 10., 10., 25. , 33.3, 50. , 16.7, 0., 10., 10., nan, nan, nan, nan, nan, nan, nan, nan, nan, 1., 1., 1.),
(1., 1101., 4., 2., 1., 0.4, 10., 5., 25. , 16.7, 50. , 33.3, 0., 10., 5., nan, nan, nan, nan, nan, nan, nan, nan, nan, 1., 1., 1.),
(1., 1101., 5., 3., 1., 0.5, 15., 5., 37.5, 20. , 40. , 40. , 0., 15., 5., nan, nan, nan, nan, nan, nan, nan, nan, nan, 1., 1., 1.)],
dtype=[('Lane', '<f4'), ('Tile', '<f4'), ('Cycle', '<f4'), ('Read', '<f4'), ('Cycle Within Read', '<f4'), ('Error Rate', '<f4'), ('P90/green', '<f4'), ('P90/blue', '<f4'), ('% No Calls', '<f4'), ('% Base/A', '<f4'), ('% Base/C', '<f4'), ('% Base/G', '<f4'), ('% Base/T', '<f4'), ('Fwhm/green', '<f4'), ('Fwhm/blue', '<f4'), ('Corrected/A', '<f4'), ('Corrected/C', '<f4'), ('Corrected/G', '<f4'), ('Corrected/T', '<f4'), ('Called/A', '<f4'), ('Called/C', '<f4'), ('Called/G', '<f4'), ('Called/T', '<f4'), ('Signal To Noise', '<f4'), ('Surface', '<f4'), ('Swath', '<f4'), ('Tile Number', '<f4')])
Here is the output if an empty run_metrics was provided
>>> imaging(interop_metrics.run_metrics())
array([], dtype=float64)
Here is an example exception if an improper input is given
>>> imaging(None)
Traceback (most recent call last):
...
ValueError: Expected interop.py_interop_run_metrics.run_metrics or str for `run_metrics`
:param run_metrics: py_interop_run_metrics.run_metrics or str file path to a run folder
:param dtype: data type for the array (Default: 'f4')
:param extra: all extra parameters are passed to `read` if the first parameter is a str file path to a run folder
:return: structured np.array with column names and dtype
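As a further hedged illustration (assuming pandas is installed and `ar` is the imaging table shown above), per-cycle aggregates fall out of a simple groupby:
>>> import pandas as pd  # doctest: +SKIP
>>> df = pd.DataFrame(ar)  # doctest: +SKIP
>>> df.groupby('Cycle')['Error Rate'].mean()  # mean error rate per cycle  # doctest: +SKIP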
| def core.imaging_columns | ( | run_metrics, extra | ) |
Get a list of imaging table columns
>>> from interop import imaging_columns
>>> from interop import load_imaging_metrics
>>> import interop.py_interop_run_metrics as interop_metrics
>>> import numpy as np
>>> ar = imaging_columns("some/path/run_folder_name") # doctest: +SKIP
The above function is equivalent to
>>> ar = imaging_columns("some/path/run_folder_name", valid_to_load=load_imaging_metrics()) # doctest: +SKIP
We can select a subset of metrics to include based on metric groups
>>> ar = imaging_columns("some/path/run_folder_name", valid_to_load=['Error']) # doctest: +SKIP
See `read` below for more examples.
The following example will rely on an existing run_metrics object (possibly created by the `read` function below).
>>> imaging_columns(run_metrics_example)
['Lane', 'Tile', 'Cycle', 'Read', 'Cycle Within Read', 'Error Rate', 'P90/green', 'P90/blue', '% No Calls', '% Base/A', '% Base/C', '% Base/G', '% Base/T', 'Fwhm/green', 'Fwhm/blue', 'Corrected/A', 'Corrected/C', 'Corrected/G', 'Corrected/T', 'Called/A', 'Called/C', 'Called/G', 'Called/T', 'Signal To Noise', 'Surface', 'Swath', 'Tile Number']
:param run_metrics: py_interop_run_metrics.run_metrics or str file path to a run folder
:param extra: all extra parameters are passed to `read` if the first parameter is a str file path to a run folder
:return: list of string headers
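A small sketch pairing `imaging_columns` with `imaging`: the header list matches the field names of the structured array, so it can be used, for example, to label a plain (non-structured) array of values.
>>> from interop import imaging
>>> list(imaging(run_metrics_example).dtype.names) == imaging_columns(run_metrics_example)
True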
| def core.index_summary | ( | run_metrics, level = 'Lane', columns = None, dtype = 'f4', extra | ) |
Index summary table
>>> from interop import index_summary
>>> ar = index_summary("some/path/run_folder_name") # doctest: +SKIP
>>> index_summary(run_metrics_with_indexing)
array([(1, 0.46, 1015.56, 520.67, 1536.22, 1800., 2000.)],
dtype=[('Lane', '<u2'), ('Mapped Reads Cv', '<f4'), ('Max Mapped Reads', '<f4'), ('Min Mapped Reads', '<f4'), ('Total Fraction Mapped Reads', '<f4'), ('Total Pf Reads', '<f4'), ('Total Reads', '<f4')])
>>> index_summary(run_metrics_with_indexing, level='Barcode')
array([(1, 18280., 1015.56, 1., 'ATCACGAC', 'AAGGTTCA', 'TSCAIndexes', '1'),
(1, 9372., 520.67, 2., 'ATCACGAC', 'GGGGGGGG', 'TSCAIndexes', '2')],
dtype=[('Lane', '<u2'), ('Cluster Count', '<f4'), ('Fraction Mapped', '<f4'), ('Id', '<f4'), ('Index1', 'O'), ('Index2', 'O'), ('Project Name', 'O'), ('Sample Id', 'O')])
>>> index_summary(run_metrics_with_indexing, columns=['Total Fraction Mapped Reads'])
array([(1, 1536.22)],
dtype=[('Lane', '<u2'), ('Total Fraction Mapped Reads', '<f4')])
>>> index_summary(run_metrics_with_indexing, columns=['Incorrect'])
Traceback (most recent call last):
...
ValueError: Column `Incorrect` not found in: ['Mapped Reads Cv', 'Max Mapped Reads', 'Min Mapped Reads', 'Total Fraction Mapped Reads', 'Total Pf Reads', 'Total Reads'] - column not consistent with level or misspelled
>>> index_summary(run_metrics_with_indexing, level='Incorrect')
Traceback (most recent call last):
...
ValueError: level=Incorrect not in ('Lane', 'Barcode')
:param run_metrics: py_interop_run_metrics.run_metrics or string run folder path
:param level: level of the data to summarize, valid values include: 'Lane', 'Barcode' (Default: Lane)
:param columns: list of columns (valid values depend on the level) see `summary_columns`
:param dtype: data type for the array (Default: 'f4')
:param extra: all extra parameters are passed to `read` if the first parameter is a str file path to a run folder
:return: structured np.array with column names and dtype
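As a hedged follow-on (assuming pandas is available), the per-barcode summary can be converted to a DataFrame and ranked by how well each sample demultiplexed:
>>> import pandas as pd  # doctest: +SKIP
>>> df = pd.DataFrame(index_summary(run_metrics_with_indexing, level='Barcode'))  # doctest: +SKIP
>>> df.sort_values('Fraction Mapped', ascending=False)  # doctest: +SKIP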
| def core.index_summary_columns | ( | level = 'Lane', ret_dict = False | ) |
List the columns of the `index_summary` table
>>> from interop import index_summary_columns
>>> index_summary_columns()
('Mapped Reads Cv', 'Max Mapped Reads', 'Min Mapped Reads', 'Total Fraction Mapped Reads', 'Total Pf Reads', 'Total Reads')
>>> index_summary_columns('Barcode')
('Cluster Count', 'Fraction Mapped', 'Id', 'Index1', 'Index2', 'Project Name', 'Sample Id')
:param level: level of the data to summarize, valid values include: 'Lane', 'Barcode' (Default: Lane)
:param ret_dict: if true, return a dict mapping from column name to method name (Default: False)
:return: tuple of columns (or dictionary mapping column name to method depending on `ret_dict` parameter)
| def core.indexing | ( | run_metrics, per_sample = True, dtype = 'f4', stype = 'O', extra | ) |
Convert InterOp run_metrics (or read run_metrics from disk) to a numpy structured array containing an
indexing table
We can read an indexing table directly from a run folder. Note, this does not load all metrics, only those required
by the indexing table, e.g. IndexMetricsOut.bin
>>> from interop import indexing
>>> ar = indexing("some/path/run_folder_name") # doctest: +SKIP
Note that `valid_to_load` in `read` is ignored.
We can also convert a `run_metrics` object to an indexing table as follows
>>> ar = indexing(run_metrics_with_indexing)
>>> ar
array([(1., 1101., 'ATCACGAC-AAGGTTCA', '1', 4570., 900., 507.78),
(1., 1101., 'ATCACGAC-GGGGGGGG', '2', 2343., 900., 260.33),
(1., 1102., 'ATCACGAC-AAGGTTCA', '1', 4570., 0., 0. ),
(1., 1102., 'ATCACGAC-GGGGGGGG', '2', 2343., 0., 0. )],
dtype=[('Lane', '<f4'), ('Tile', '<f4'), ('Barcode', 'O'), ('SampleID', 'O'), ('Cluster Count', '<f4'), ('Cluster Count PF', '<f4'), ('% Demux', '<f4')])
The `indexing` function also provides an overall sample view by setting `per_sample=False`.
>>> ar = indexing(run_metrics_with_indexing, per_sample=False)
>>> ar
array([(1., 1101., 1000., 900., 768.11), (1., 1102., 0., 0., 0. )],
dtype=[('Lane', '<f4'), ('Tile', '<f4'), ('Cluster Count', '<f4'), ('Cluster Count PF', '<f4'), ('% Demux', '<f4')])
:param run_metrics: py_interop_run_metrics.run_metrics or string run folder path
:param per_sample: return demux per sample (Default: True)
:param dtype: data type for the array (Default: 'f4')
:param stype: string type for the array (Default: 'O')
:param extra: all extra parameters are passed to `read` if the first parameter is a str file path to a run folder
:return: structured np.array with column names and dtype
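As a hedged follow-on (assuming pandas is available), the per-tile indexing table can be aggregated into total PF cluster counts per barcode:
>>> import pandas as pd  # doctest: +SKIP
>>> df = pd.DataFrame(indexing(run_metrics_with_indexing))  # doctest: +SKIP
>>> df.groupby('Barcode')['Cluster Count PF'].sum()  # doctest: +SKIP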
| def core.load_imaging_metrics | ( | ) |
List of valid imaging metrics to load
>>> from interop import load_to_string_list
>>> from interop import load_imaging_metrics
>>> load_to_string_list(load_imaging_metrics())
['CorrectedInt', 'Error', 'Extraction', 'Image', 'Q', 'Tile', 'QByLane', 'QCollapsed', 'EmpiricalPhasing', 'DynamicPhasing', 'ExtendedTile']
:return: valid_to_load
| def core.load_summary_metrics | ( | ) |
List of valid summary metrics to load
>>> from interop import load_to_string_list
>>> from interop import load_summary_metrics
>>> load_to_string_list(load_summary_metrics())
['CorrectedInt', 'Error', 'Extraction', 'Q', 'Tile', 'QByLane', 'QCollapsed', 'EmpiricalPhasing', 'ExtendedTile']
:return: valid_to_load
| def core.load_to_string_list | ( | valid_to_load | ) |
Create a string list of names for each enabled metric in `valid_to_load`
>>> from interop import create_valid_to_load, load_to_string_list
>>> import interop.py_interop_run as interop_run
>>> valid_to_load = create_valid_to_load('Extraction')
>>> load_to_string_list(valid_to_load)
['Extraction']
>>> valid_to_load = interop_run.uchar_vector(interop_run.MetricCount, 1)
>>> load_to_string_list(valid_to_load)
['CorrectedInt', 'Error', 'Extraction', 'Image', 'Index', 'Q', 'Tile', 'QByLane', 'QCollapsed', 'EmpiricalPhasing', 'DynamicPhasing', 'ExtendedTile', 'SummaryRun']
:param valid_to_load: boolean buffer
:return: list of strings containing the name of each metric enabled in `valid_to_load`
| def core.read | ( | run, valid_to_load = None, requires = None, search_paths = None, extra | ) |
Read InterOp metrics into a run_metrics object
- A list of valid `valid_to_load` names can be obtained using `list_interop_files`
- If run is `interop.py_interop_run_metrics.run_metrics` then run is returned.
- If an InterOp file is missing from the `requires` list, then an empty run_metrics object is returned
Read in all metrics from a run folder
>>> from interop import read
>>> metrics = read("some/path/run_folder_name") # doctest: +SKIP
Read in only ErrorMetricsOut.bin in a run folder
>>> metrics = read("some/path/run_folder_name", valid_to_load=['Error']) # doctest: +SKIP
Read in ErrorMetricsOut.bin and ExtractionMetricsOut.bin, but if ErrorMetricsOut.bin is missing, return an empty run_metrics object
>>> metrics = read("some/path/run_folder_name", valid_to_load=['Error', 'Extraction'], requires=['Error']) # doctest: +SKIP
Read in IndexMetricsOut.bin and search for it outside the run folder in `fastq/reports`
>>> metrics = read("some/path/run_folder_name", valid_to_load=['Index'], search_paths=['fastq/reports']) # doctest: +SKIP
Read in a run folder that is not found
>>> metrics = read("some/non/existing/run_folder_name")
Traceback (most recent call last):
...
interop.py_interop_run.xml_file_not_found_exception: cannot open file some/non/existing/run_folder_name/RunInfo.xml
Read from a None object
>>> metrics = read(None)
Traceback (most recent call last):
...
ValueError: invalid null reference in method 'run_metrics_read', argument 2 of type 'std::string const &'
:param run: string path including name of run folder (or run_metrics object)
:param valid_to_load: list of strings containing InterOp metric names (Default: None, load everything)
:param requires: list of required metric (Default: None, check nothing)
:param search_paths: list of paths to search when looking for `IndexMetricsOut.bin` (Default: None, do not search)
:return: interop.py_interop_run_metrics.run_metrics
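Building on the documented behavior above, a minimal sketch (the run folder path is a placeholder) that detects the empty run_metrics returned when a required InterOp file is missing, by checking that the imaging table comes back empty:
>>> from interop import read, imaging
>>> metrics = read("some/path/run_folder_name", valid_to_load=['Error'], requires=['Error'])  # doctest: +SKIP
>>> if imaging(metrics).size == 0:  # doctest: +SKIP
...     print("ErrorMetricsOut.bin was missing, so an empty run_metrics object was returned")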
| def core.read_metric | ( | filename, run_metrics = None, finalize = False | ) |
Read a specific metric from a file into a run_metrics object
This function allows incremental reading of metric files from disk. The last call should set
`finalize=True`.
Read in `ErrorMetricsOut.bin` into a run_metrics object and finalize since this is the only metric we plan to read
>>> from interop import read_metric
>>> metrics = read_metric("some/path/run_folder_name/InterOp/ErrorMetricsOut.bin", finalize=True) # doctest: +SKIP
:param filename: path to InterOp file
:param run_metrics: existing run_metrics object (Default None, one will be created)
:param finalize: if true, then call finalize_after_load (last call to `read_metric` should set finalize=True)
:return: interop.py_interop_run_metrics.run_metrics
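Based on the incremental-reading note above, a hedged sketch (paths are placeholders) that accumulates two metric files into one run_metrics object, finalizing only on the last call:
>>> from interop import read_metric
>>> metrics = read_metric("some/path/run_folder_name/InterOp/ErrorMetricsOut.bin")  # doctest: +SKIP
>>> metrics = read_metric("some/path/run_folder_name/InterOp/ExtractionMetricsOut.bin", run_metrics=metrics, finalize=True)  # doctest: +SKIP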
| def core.summary | ( | run_metrics, level = 'Total', columns = None, dtype = 'f4', ignore_missing_columns = True, extra | ) |
Generate a summary table with the given level, columns and dtype from a run_metrics object or run_folder path
Note that not all columns will be included if InterOp files are missing or purposely excluded using `valid_to_load`.
The following examples show the different levels at which the data can be summarized:
- Total (Default)
- NonIndex
- Read
- Lane
- Surface
>>> from interop import summary
>>> ar = summary("some/path/run_folder_name") # doctest: +SKIP
>>> ar = summary("some/path/run_folder_name", valid_to_load=['Error']) # doctest: +SKIP
>>> summary(run_metrics_example)
array([(0.37, 6.67, 0., 0., 0.)],
dtype=[('Error Rate', '<f4'), ('First Cycle Intensity', '<f4'), ('Projected Yield G', '<f4'), ('Reads', '<f4'), ('Reads Pf', '<f4')])
>>> summary(run_metrics_example, 'Total')
array([(0.37, 6.67, 0., 0., 0.)],
dtype=[('Error Rate', '<f4'), ('First Cycle Intensity', '<f4'), ('Projected Yield G', '<f4'), ('Reads', '<f4'), ('Reads Pf', '<f4')])
>>> summary(run_metrics_example, 'NonIndex')
array([(0.2, 10., 0., 0., 0.)],
dtype=[('Error Rate', '<f4'), ('First Cycle Intensity', '<f4'), ('Projected Yield G', '<f4'), ('Reads', '<f4'), ('Reads Pf', '<f4')])
>>> summary(run_metrics_example, 'Read')
array([(1, 78, 0.2, 10., 0., 0., 0.), (2, 89, 0.4, 5., 0., 0., 0.),
(3, 89, 0.5, 5., 0., 0., 0.)],
dtype=[('ReadNumber', '<u2'), ('IsIndex', 'u1'), ('Error Rate', '<f4'), ('First Cycle Intensity', '<f4'), ('Projected Yield G', '<f4'), ('Reads', '<f4'), ('Reads Pf', '<f4')])
>>> summary(run_metrics_example, 'Lane')
array([(1, 78, 1, 0.2, 10., 0., 0., 0., 1.),
(2, 89, 1, 0.4, 5., 0., 0., 0., 1.),
(3, 89, 1, 0.5, 5., 0., 0., 0., 1.)],
dtype=[('ReadNumber', '<u2'), ('IsIndex', 'u1'), ('Lane', '<u2'), ('Error Rate', '<f4'), ('First Cycle Intensity', '<f4'), ('Projected Yield G', '<f4'), ('Reads', '<f4'), ('Reads Pf', '<f4'), ('Tile Count', '<f4')])
For a run with a single surface, as in this example, nothing is reported.
>>> summary(run_metrics_example, 'Surface')
array([], dtype=float64)
We can select specific columns using the `columns` parameter
>>> summary(run_metrics_example, 'Total', columns=['First Cycle Intensity', 'Error Rate'])
array([(6.67, 0.37)],
dtype=[('First Cycle Intensity', '<f4'), ('Error Rate', '<f4')])
If a column's values are NaN, or missing, then it will automatically be excluded
>>> summary(run_metrics_example, 'Total', columns=['% Aligned', 'Error Rate'])
array([(0.37,)], dtype=[('Error Rate', '<f4')])
To include missing columns, set `ignore_missing_columns=False`
>>> summary(run_metrics_example, 'Total', ignore_missing_columns=False, columns=['% Aligned', 'Error Rate'])
array([(nan, 0.37)], dtype=[('% Aligned', '<f4'), ('Error Rate', '<f4')])
>>> summary(run_metrics_example, 'Total', columns=['Incorrect'])
Traceback (most recent call last):
...
ValueError: Column `Incorrect` not found in: ['Error Rate', 'First Cycle Intensity', '% Aligned', '% >= Q30', '% Occupancy Proxy', '% Occupied', 'Projected Yield G', 'Yield G'] - column not consistent with level or misspelled
:param run_metrics: py_interop_run_metrics.run_metrics or string run folder path
:param level: level of the data to summarize, valid values include: 'Total', 'NonIndex', 'Read', 'Lane', 'Surface' (Default: Total)
:param columns: list of columns (valid values depend on the level) see `summary_columns`
:param dtype: data type for the array (Default: 'f4')
:param ignore_missing_columns: ignore missing columns, e.g. those with NaN values (Default: True)
:param extra: all extra parameters are passed to `read` if the first parameter is a str file path to a run folder
:return: structured np.array with column names and dtype
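As a hedged follow-on (assuming pandas is available), the lane-level summary converts to a DataFrame in the usual way, which makes per-lane slicing straightforward:
>>> import pandas as pd  # doctest: +SKIP
>>> lane_df = pd.DataFrame(summary(run_metrics_example, level='Lane'))  # doctest: +SKIP
>>> lane_df[['Lane', 'ReadNumber', 'Error Rate']]  # doctest: +SKIP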
| def core.summary_columns | ( | level = 'Total', ret_dict = False | ) |
Get a list of column names supported at each level of the summary table
>>> from interop import summary_columns
The default columns are for the run/read levels ('Total', 'NonIndex' and 'Read')
>>> summary_columns()
('Cluster Count', 'Cluster Count Pf', 'Error Rate', 'First Cycle Intensity', '% Aligned', '% >= Q30', '% Occupancy Proxy', '% Occupied', 'Projected Yield G', 'Reads', 'Reads Pf', 'Yield G')
>>> summary_columns(level='Total')
('Cluster Count', 'Cluster Count Pf', 'Error Rate', 'First Cycle Intensity', '% Aligned', '% >= Q30', '% Occupancy Proxy', '% Occupied', 'Projected Yield G', 'Reads', 'Reads Pf', 'Yield G')
>>> summary_columns(level='NonIndex')
('Cluster Count', 'Cluster Count Pf', 'Error Rate', 'First Cycle Intensity', '% Aligned', '% >= Q30', '% Occupancy Proxy', '% Occupied', 'Projected Yield G', 'Reads', 'Reads Pf', 'Yield G')
>>> summary_columns(level='Read')
('Cluster Count', 'Cluster Count Pf', 'Error Rate', 'First Cycle Intensity', '% Aligned', '% >= Q30', '% Occupancy Proxy', '% Occupied', 'Projected Yield G', 'Reads', 'Reads Pf', 'Yield G')
The lane/surface levels give another set of columns for the summary table
>>> summary_columns(level='Lane')
('Cluster Count', 'Cluster Count Pf', 'Density', 'Density Pf', 'Error Rate', 'Error Rate 100', 'Error Rate 35', 'Error Rate 50', 'Error Rate 75', 'First Cycle Intensity', '% Aligned', '% >= Q30', '% Occupied', '% Pf', 'Phasing', 'Phasing Offset', 'Phasing Slope', 'Prephasing', 'Prephasing Offset', 'Prephasing Slope', 'Projected Yield G', 'Reads', 'Reads Pf', 'Tile Count', 'Yield G')
>>> summary_columns(level='Surface')
('Cluster Count', 'Cluster Count Pf', 'Density', 'Density Pf', 'Error Rate', 'Error Rate 100', 'Error Rate 35', 'Error Rate 50', 'Error Rate 75', 'First Cycle Intensity', '% Aligned', '% >= Q30', '% Occupied', '% Pf', 'Phasing', 'Phasing Offset', 'Phasing Slope', 'Prephasing', 'Prephasing Offset', 'Prephasing Slope', 'Projected Yield G', 'Reads', 'Reads Pf', 'Tile Count', 'Yield G')
:param level: level of the data to summarize, valid values include: 'Total', 'NonIndex', 'Read', 'Lane', 'Surface' (Default: Total)
:param ret_dict: if true, return a dict mapping from column name to method name (Default: False)
:return: tuple of columns - each column is a tuple, or a tuple of lambda functions that take the run_info as an argument
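A small sketch using the column listings above: a membership test shows whether a given column is available at a particular level before requesting it from `summary`.
>>> from interop import summary_columns
>>> 'Phasing' in summary_columns(level='Lane')
True
>>> 'Phasing' in summary_columns(level='Total')
False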
Variable Documentation
| tuple _index_summary_levels = ('Lane', 'Barcode') |
| tuple _summary_levels = ('Total', 'NonIndex', 'Read', 'Lane', 'Surface') |