poprox_recommender.lkpipeline#
A vendored copy of LensKit’s pipeline abstraction, without trainability support.
Classes
Pipeline – LensKit recommendation pipeline.
Exceptions
PipelineError – Pipeline configuration errors.
PipelineWarning – Pipeline configuration and setup warnings.
- class poprox_recommender.lkpipeline.Pipeline(name=None, version=None)#
Bases:
object
LensKit recommendation pipeline. This is the core abstraction for using LensKit models and other components to produce recommendations in a useful way. It allows you to wire together components in (mostly) arbitrary graphs, train them on data, and serialize pipelines to disk for use elsewhere.
If you have a scoring model and just want to generate recommendations with a default setup and minimal configuration, see topn_pipeline().
- Parameters:
name (str | None)
version (str | None)
- meta(*, include_hash=True)#
Get the metadata (name, version, hash, etc.) for this pipeline without returning the whole config.
- Parameters:
include_hash (bool) – Whether to include a configuration hash in the metadata.
- Return type:
PipelineMeta
- node(node, *, missing='error')#
Get the pipeline node with the specified name. If passed a node, it returns the node or fails if the node is not a member of the pipeline.
- create_input(name, *types)#
Create an input node for the pipeline. Pipelines expect their inputs to be provided when they are run.
- Parameters:
name (str) – The name of the input.
types (type) – The allowable types of the input.
- Returns:
A pipeline node representing this input.
- Raises:
ValueError – a node with the specified name already exists.
- Return type:
Node[T]
- literal(value, *, name=None)#
Create a literal node (a node with a fixed value).
Note
Literal nodes cannot be serialized with get_config() or save_config().
- Parameters:
value (T)
name (str | None)
- Return type:
LiteralNode[T]
- set_default(name, node)#
Set the default wiring for a component input. Components that declare an input parameter with the specified name but no configured input will be wired to this node.
This is intended to be used for things like wiring up user parameters to semi-automatically receive the target user’s identity and history.
- get_default(name)#
Get the default wiring for an input name.
- alias(alias, node)#
Create an alias for a node. After aliasing, the node can be retrieved from node() using either its original name or its alias.
- Parameters:
- Raises:
ValueError – if the alias is already used as an alias or node name.
- Return type:
None
- add_component(name, obj, **inputs)#
Add a component and connect it into the graph.
- Parameters:
name (str) – The name of the component in the pipeline.
obj – The component itself.
inputs (Node[Any] | str | object) – The component’s input wiring, as in connect().
- Returns:
The node representing this component in the pipeline.
- Return type:
Node[ND]
- replace_component(name, obj, **inputs)#
Replace a component in the graph. The new component must have a type that is compatible with the old component. The old component’s input connections will be replaced (as the new component may have different inputs), but any connections that use the old component to supply an input will use the new component instead.
- connect(obj, **inputs)#
Provide additional input connections for a component that has already been added. See pipeline-connections for details.
- Parameters:
obj (str | Node[Any]) – The name or node of the component to wire.
inputs (Node[Any] | str | object) – The component’s input wiring. For each keyword argument in the component’s function signature, that argument can be provided here with an input that the pipeline will provide to that argument of the component when the pipeline is run.
- component_configs()#
Get the configurations for the components. This includes only the component configurations; it does not include pipeline inputs or wiring.
- clone(how='config')#
Clone the pipeline, optionally including trained parameters.
The how parameter controls how the pipeline is cloned and what is available in the cloned pipeline. It can be one of the following values:
"config"
Create fresh component instances using the configurations of the components in this pipeline. When applied to a trained pipeline, the clone does not have the original’s learned parameters. This is the default clone method.
"pipeline-config"
Round-trip the entire pipeline through get_config() and from_config().
- get_config(*, include_hash=True)#
Get this pipeline’s configuration for serialization. The configuration consists of all inputs and components along with their configurations and input connections. It can be serialized to disk (in JSON, YAML, or a similar format) to save a pipeline.
The configuration does not include any trained parameter values, although the configuration may include things such as paths to checkpoints to load such parameters, depending on the design of the components in the pipeline.
Note
Literal nodes (from literal(), or literal values wired to inputs) cannot be serialized, and this method will fail if they are present in the pipeline.
- Parameters:
include_hash (bool)
- Return type:
PipelineConfig
- config_hash()#
Get a hash of the pipeline’s configuration to uniquely identify it for logging, version control, or other purposes.
The hash format and algorithm are not guaranteed, but are stable within a LensKit version. For the same version of LensKit and component code, the same configuration will produce the same hash, so long as there are no literal nodes. Literal nodes will usually hash consistently, but since literals other than basic JSON values are hashed by pickling, hash stability depends on the stability of the pickle bytestream.
In LensKit 2024.1, the configuration hash is computed by computing the JSON serialization of the pipeline configuration without a hash and returning the hex-encoded SHA256 hash of that configuration.
- Return type:
str
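The hashing scheme described above (JSON-serialize the configuration, then hex-encode its SHA-256 digest) can be sketched in plain Python. This is an illustrative stand-in, not the vendored implementation; details like key ordering and separators are assumptions made here for determinism:

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Serialize a configuration dict to JSON, then return the
    hex-encoded SHA-256 hash of that serialization."""
    # sort_keys makes the hash independent of key insertion order
    # (an assumption of this sketch, not a documented guarantee).
    data = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


# Equal configurations hash identically regardless of key order.
h1 = config_hash({"name": "demo", "version": "1.0"})
h2 = config_hash({"version": "1.0", "name": "demo"})
assert h1 == h2
```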
- run(*nodes, **kwargs)#
Run the pipeline and obtain the return value(s) of one or more of its components. See pipeline-execution for details of the pipeline execution model.
- Parameters:
kwargs (object) – The pipeline’s inputs, as defined with create_input().
- Returns:
The pipeline result. If zero or one nodes are specified, the result is returned as-is. If multiple nodes are specified, their results are returned in a tuple.
- Raises:
PipelineError – when there is a pipeline configuration error (e.g. a cycle).
ValueError – when one or more required inputs are missing.
TypeError – when one or more required inputs has an incompatible type.
other – exceptions thrown by components are passed through.
- Return type:
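The execution model can be illustrated with a self-contained toy: components are plain functions, wiring maps each keyword argument to another node name, and running a node recursively resolves its dependencies, drawing input-node values from the run-time keyword arguments. This is a simplified sketch of the behavior described above, not the vendored implementation:

```python
# Toy pipeline: each component node records its function and the
# wiring of its keyword arguments to other node names.
components: dict = {}


def add_component(name, func, **inputs):
    components[name] = (func, inputs)


def run(node, **kwargs):
    """Resolve a node: input nodes come from kwargs, component
    nodes are computed by recursively resolving their wiring."""
    if node in kwargs:  # an input node supplied at run time
        return kwargs[node]
    func, inputs = components[node]
    args = {arg: run(src, **kwargs) for arg, src in inputs.items()}
    return func(**args)


# Wire a two-stage pipeline: score an item, then format the result.
add_component("score", lambda item: len(item) * 10, item="item")
add_component("report", lambda s: f"score={s}", s="score")

result = run("report", item="news")  # resolves score first, then report
```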
- run_all(*nodes, **kwargs)#
Run all nodes in the pipeline, or all nodes required to fulfill the requested nodes, and return a mapping with the full pipeline state (the data attached to each node). This is useful in cases where client code needs to be able to inspect the data at arbitrary steps of the pipeline. It differs from run() in two ways:
It returns the data from all nodes as a mapping (dictionary-like object), not just the specified nodes as a tuple.
If no nodes are specified, it runs all nodes instead of only the last node. This has the consequence of running nodes that are not required to fulfill the last node (such scenarios typically result from using use_first_of()).
- Parameters:
- Returns:
The full pipeline state, with default set to the last node specified (either the last node in nodes, or the last node added to the pipeline).
- Return type:
- exception poprox_recommender.lkpipeline.PipelineError#
Bases:
Exception
Pipeline configuration errors.
Note
This exception is only to note problems with the pipeline configuration and structure (e.g. circular dependencies). Errors running the pipeline are raised as-is.
- exception poprox_recommender.lkpipeline.PipelineWarning#
Bases:
Warning
Pipeline configuration and setup warnings. We also emit warnings to the logger in many cases, but this allows critical ones to be visible even if the client code has not enabled logging.
Note
This warning is only to note problems with the pipeline configuration and structure (e.g. circular dependencies). Errors running the pipeline are raised as-is.
- class poprox_recommender.lkpipeline.Node(name, *, types=None)#
Bases:
Generic[ND]
Representation of a single node in a Pipeline.
- class poprox_recommender.lkpipeline.Configurable(*args, **kwargs)#
Bases:
Protocol
Interface for configurable objects such as pipeline components with settings or hyperparameters. A configurable object supports two operations:
saving its configuration with get_config().
creating a new instance from a saved configuration with the class method from_config().
An object must implement both of these methods to be considered configurable. Components extending the Component class automatically have working versions of these methods if they define their constructor parameters and fields appropriately.
Note
Configuration data should be JSON-compatible (strings, numbers, etc.).
- classmethod from_config(cfg)#
Reinstantiate this component from configuration values.
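A minimal object satisfying this protocol might look like the following sketch. The class and its field are hypothetical, chosen only to show the get_config()/from_config() round trip:

```python
class TruncateConfig:
    """Hypothetical configurable object: its settings round-trip
    through get_config() and from_config() as a JSON-compatible dict."""

    def __init__(self, length: int = 10):
        self.length = length

    def get_config(self) -> dict:
        # Save settings as JSON-compatible values (strings, numbers, etc.).
        return {"length": self.length}

    @classmethod
    def from_config(cls, cfg: dict) -> "TruncateConfig":
        # Reinstantiate from a previously saved configuration.
        return cls(**cfg)


# Round-trip: the restored instance has the same configuration.
original = TruncateConfig(length=5)
restored = TruncateConfig.from_config(original.get_config())
assert restored.get_config() == original.get_config()
```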
- class poprox_recommender.lkpipeline.PipelineConfig(**data)#
Bases:
BaseModel
Root type for serialized pipeline configuration. A pipeline config contains the full configuration, components, and wiring for the pipeline, but does not contain the components’ trained parameter values.
- Parameters:
data (Any)
- meta: PipelineMeta#
Pipeline metadata.
- inputs: list[PipelineInput]#
Pipeline inputs.
- components: OrderedDict[str, PipelineComponent]#
Pipeline components, with their configurations and wiring.
- literals: dict[str, PipelineLiteral]#
Literals
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}#
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'aliases': FieldInfo(annotation=dict[str, str], required=False, default_factory=dict), 'components': FieldInfo(annotation=OrderedDict[str, PipelineComponent], required=False, default_factory=OrderedDict), 'defaults': FieldInfo(annotation=dict[str, str], required=False, default_factory=dict), 'inputs': FieldInfo(annotation=list[PipelineInput], required=False, default_factory=list), 'literals': FieldInfo(annotation=dict[str, PipelineLiteral], required=False, default_factory=dict), 'meta': FieldInfo(annotation=PipelineMeta, required=True)}#
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- class poprox_recommender.lkpipeline.Lazy(*args, **kwargs)#
Type for accepting lazy inputs from the pipeline runner. If your function may or may not need one of its inputs, declare the type with this to only run it as needed:
def my_component(input: str, backup: Lazy[str]) -> str:
    if input == 'invalid':
        return backup.get()
    else:
        return input
- get()#
Get the value behind this lazy instance.
- Return type:
T
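A minimal stand-in for this behavior, assuming get() simply defers a wrapped computation until it is requested, could look like the following sketch (LazySketch is illustrative, not the vendored Lazy class):

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class LazySketch(Generic[T]):
    """Sketch of a lazy input: wraps a zero-argument thunk and only
    invokes it when get() is called."""

    def __init__(self, thunk: Callable[[], T]):
        self._thunk = thunk

    def get(self) -> T:
        return self._thunk()


calls = []


def expensive_backup() -> str:
    calls.append(1)  # record that the backup actually ran
    return "fallback"


def my_component(input: str, backup: LazySketch[str]) -> str:
    return backup.get() if input == "invalid" else input


assert my_component("ok", LazySketch(expensive_backup)) == "ok"
assert calls == []  # the backup was never evaluated
assert my_component("invalid", LazySketch(expensive_backup)) == "fallback"
```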
- class poprox_recommender.lkpipeline.Component(*args, **kwargs)#
Bases:
Configurable, Generic[COut]
Base class for pipeline component objects. Any component that is not just a function should extend this class.
Components are Configurable. The base class provides default implementations of get_config() and from_config() that inspect the constructor arguments and instance variables to automatically provide configuration support. By default, all constructor parameters will be considered configuration parameters, and their values will be read from instance variables of the same name. Components can also define EXTRA_CONFIG_FIELDS and IGNORED_CONFIG_FIELDS class variables to modify this behavior. Missing attributes are silently ignored.
To work as components, derived classes also need to implement a __call__ method to perform their operations.
- EXTRA_CONFIG_FIELDS: ClassVar[list[str]] = []#
Names of instance variables that should be included in the configuration dictionary even though they do not correspond to named constructor arguments.
Note
This is rarely needed, and usually needs to be coupled with **kwargs in the constructor to make the resulting objects constructible.
- IGNORED_CONFIG_FIELDS: ClassVar[list[str]] = []#
Names of constructor parameters that should be excluded from the configuration dictionary.
- get_config()#
Get the configuration by inspecting the constructor and instance variables.
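The inspection strategy described above can be approximated in plain Python with inspect.signature: pair each constructor parameter name with the instance variable of the same name, silently skipping missing attributes. This is a simplified sketch of the idea, not the vendored code, and TopKSelector is a hypothetical component used only for illustration:

```python
import inspect


def get_config_sketch(component) -> dict:
    """Build a configuration dict by reading, for each constructor
    parameter, the instance variable of the same name; missing
    attributes are silently ignored."""
    sig = inspect.signature(type(component).__init__)
    cfg = {}
    for name in sig.parameters:
        if name in ("self", "args", "kwargs"):
            continue  # skip self and catch-all parameters
        if hasattr(component, name):
            cfg[name] = getattr(component, name)
    return cfg


class TopKSelector:  # hypothetical component for illustration
    def __init__(self, k: int = 10, metric: str = "score"):
        self.k = k
        self.metric = metric

    def __call__(self, items: list) -> list:
        return sorted(items, reverse=True)[: self.k]


assert get_config_sketch(TopKSelector(k=3)) == {"k": 3, "metric": "score"}
```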
Modules