Module `futureexpert.matcher`

Contains the models with the configuration for the matcher and the result format.

Classes

class ActualsCovsConfiguration (**data: Any)

class ActualsCovsConfiguration(BaseModel):
    """Configuration of actuals and covariates via name and lag.

    Parameters
    ----------
    actuals_name: builtins.str
        Name of the time series.
    covs_configurations: builtins.list[futureexpert.shared_models.CovariateRef]
        List of Covariates.
    """
    actuals_name: str
    covs_configurations: list[CovariateRef]

Configuration of actuals and covariates via name and lag.

Parameters

actuals_name : builtins.str: Name of the time series.
covs_configurations : builtins.list[CovariateRef]: List of Covariates.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

Ancestors

pydantic.main.BaseModel

Class variables

var actuals_name : str
var covs_configurations : list[CovariateRef]
var model_config

class CovariateRankingDetails (**data: Any)

class CovariateRankingDetails(BaseModel):
    """Final rank for a given set of covariates.

    Parameters
    ----------
    rank: futureexpert.shared_models.PositiveInt
        Rank for the given set of covariates.
    covariates: builtins.list[futureexpert.shared_models.Covariate]
        Used covariates (might be zero or more than one).
    """
    model_config = ConfigDict(arbitrary_types_allowed=True)
    rank: ValidatedPositiveInt
    covariates: list[Covariate]

Final rank for a given set of covariates.

Parameters

rank : PositiveInt: Rank for the given set of covariates.
covariates : builtins.list[Covariate]: Used covariates (might be zero or more than one).

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

Ancestors

pydantic.main.BaseModel

Class variables

var covariates : list[Covariate]
var model_config
var rank : PositiveInt

class LagSelectionConfig (**data: Any)

class LagSelectionConfig(BaseModel):
    """Configures covariate lag selection.

    Parameters
    ----------
    fixed_lags: typing.Optional[builtins.list[builtins.int]]
        Lags that are tested in the lag selection.
    min_lag: typing.Optional[builtins.int]
        Minimum lag tested in the lag selection. For example, a lag of 3 means the covariate
        is shifted 3 data points into the future.
    max_lag: typing.Optional[builtins.int]
        Maximum lag tested in the lag selection. For example, a lag of 12 means the covariate
        is shifted 12 data points into the future.
    """
    min_lag: Optional[int] = None
    max_lag: Optional[int] = None
    fixed_lags: Optional[list[int]] = None

    @model_validator(mode='after')
    def _check_range(self) -> Self:
        if (self.min_lag is None) ^ (self.max_lag is None):
            raise ValueError(
                'If one of `min_lag` and `max_lag` is set the other one also needs to be set.')

        if self.min_lag and self.max_lag:
            if self.fixed_lags is not None:
                raise ValueError('Fixed lags and min/max lag are mutually exclusive.')
            if self.max_lag < self.min_lag:
                raise ValueError('max_lag needs to be greater or equal to min_lag.')
            lag_range = abs(self.max_lag - self.min_lag) + 1
            if lag_range > 15:
                raise ValueError(f'Only 15 lags are allowed to be tested. The requested range has length {lag_range}.')

        if self.fixed_lags and len(self.fixed_lags) > 15:
            raise ValueError(
                f'Only 15 lags are allowed to be tested. The provided fixed lags has length {len(self.fixed_lags)}.')

        return self

Configures covariate lag selection.

Parameters

fixed_lags : typing.Optional[builtins.list[builtins.int]]: Lags that are tested in the lag selection.
min_lag : typing.Optional[builtins.int]: Minimum lag tested in the lag selection. For example, a lag of 3 means the covariate is shifted 3 data points into the future.
max_lag : typing.Optional[builtins.int]: Maximum lag tested in the lag selection. For example, a lag of 12 means the covariate is shifted 12 data points into the future.
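
To make the lag semantics concrete, here is a minimal sketch in plain Python that is not part of futureexpert. The helper name `shift_into_future` and the use of integer period indices as time stamps are assumptions made only for illustration.

```python
def shift_into_future(observations: dict[int, float], lag: int) -> dict[int, float]:
    """Shift a covariate `lag` data points into the future.

    `observations` maps an integer period index to a value (hypothetical
    representation). After the shift, the value observed at period t is
    aligned with period t + lag.
    """
    return {period + lag: value for period, value in observations.items()}


covariate = {0: 10.0, 1: 11.0, 2: 12.0}
shifted = shift_into_future(covariate, lag=3)
print(shifted)  # {3: 10.0, 4: 11.0, 5: 12.0}
```

With a lag of 3, the value observed at period 0 lines up with the actuals at period 3.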

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

Ancestors

pydantic.main.BaseModel

Class variables

var fixed_lags : list[int] | None
var max_lag : int | None
var min_lag : int | None
var model_config
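
The checks performed by `_check_range` in the source above can be restated without pydantic. The following is only a sketch: `candidate_lags` and `MAX_TESTED_LAGS` are hypothetical names, the error messages are shortened, and the returned list of lags is an illustrative addition that the real validator does not produce.

```python
from typing import Optional

MAX_TESTED_LAGS = 15  # limit enforced by the validator in the source above


def candidate_lags(min_lag: Optional[int] = None,
                   max_lag: Optional[int] = None,
                   fixed_lags: Optional[list[int]] = None) -> Optional[list[int]]:
    """Hypothetical stand-in that repeats the checks of `_check_range`."""
    # Setting exactly one of the two bounds is an error.
    if (min_lag is None) ^ (max_lag is None):
        raise ValueError('min_lag and max_lag must be set together.')

    # Truthiness check as in the source: a bound of 0 skips this branch.
    if min_lag and max_lag:
        if fixed_lags is not None:
            raise ValueError('Fixed lags and min/max lag are mutually exclusive.')
        if max_lag < min_lag:
            raise ValueError('max_lag must be greater than or equal to min_lag.')
        if max_lag - min_lag + 1 > MAX_TESTED_LAGS:
            raise ValueError('At most 15 lags may be tested.')

    if fixed_lags and len(fixed_lags) > MAX_TESTED_LAGS:
        raise ValueError('At most 15 lags may be tested.')

    # Illustrative addition: list the lags that would be tested.
    if fixed_lags is not None:
        return list(fixed_lags)
    if min_lag is not None and max_lag is not None:
        return list(range(min_lag, max_lag + 1))
    return None  # nothing configured


print(candidate_lags(min_lag=1, max_lag=15))  # accepted: exactly 15 lags
print(candidate_lags(fixed_lags=[1, 3, 6]))   # accepted: explicit lags

for bad in ({'min_lag': 1},
            {'min_lag': 1, 'max_lag': 16},
            {'min_lag': 1, 'max_lag': 3, 'fixed_lags': [2]}):
    try:
        candidate_lags(**bad)
    except ValueError as err:
        print(f'{bad} rejected: {err}')
```

A range is inclusive on both ends, so `min_lag=1, max_lag=15` is the widest range of this kind that passes.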

class MatcherConfig (**data: Any)

class MatcherConfig(BaseConfig):
    """Configuration for a MATCHER run.

    Parameters
    ----------
    title: builtins.str
        A short description of the report.
    actuals_version: builtins.str
        The version ID of the actuals.
    covs_versions: builtins.list[builtins.str]
        List of versions of the covariates.
    actuals_filter: builtins.dict[builtins.str, typing.Any]
        Filter criterion for the actuals time series. The given actuals version is
        automatically added as an additional filter criterion. Possible filter criteria are all fields
        of the TimeSeries class, e.g. {'name': 'Sales'}.
        For more complex filters, see: https://www.mongodb.com/docs/manual/reference/operator/query/#query-selectors
    covs_filter: builtins.dict[builtins.str, typing.Any]
        Filter criterion for the covariate time series. The given covariate version is
        automatically added as an additional filter criterion. Possible filter criteria are all fields
        of the TimeSeries class, e.g. {'name': 'Sales'}.
        For more complex filters, see: https://www.mongodb.com/docs/manual/reference/operator/query/#query-selectors
    max_ts_len: typing.Optional[builtins.int]
        Only the most recent observations of the actuals time series, up to this number, are used.
        See the variable MAX_TS_LEN_CONFIG for the allowed configuration.
    lag_selection: futureexpert.matcher.LagSelectionConfig
        Configuration of covariate lag selection.
    evaluation_start_date: typing.Optional[builtins.str]
        Optional start date for the evaluation. The input should be in the ISO format
        with date and time, "YYYY-mm-DDTHH:MM:SS", e.g., "2024-01-01T16:40:00".
        Actuals and covariate observations prior to this start date are dropped.
    evaluation_end_date: typing.Optional[builtins.str]
        Optional end date for the evaluation. The input should be in the ISO format
        with date and time, "YYYY-mm-DDTHH:MM:SS", e.g., "2024-01-01T16:40:00".
        Actuals and covariate observations after this end date are dropped.
    max_publication_lag: builtins.int
        Maximal publication lag for the covariates. The publication lag of a covariate
        is the number of most recent observations (compared to the actuals) that are
        missing for the covariate. E.g., if the actuals (for monthly granularity) end
        in April 2023 but the covariate ends in February 2023, the covariate has a
        publication lag of 2.
    post_selection_queries: builtins.list[builtins.str]
        List of queries that are executed on the ranking summary DataFrame. Only ranking entries that
        match the queries are kept. The query strings need to satisfy the pandas query syntax
        (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html). Here are the columns
        of the ranking summary DataFrame that you might want to filter on:

        Column Name          |      Data Type   |    Description
        -----------------------------------------------------------------------------------------------
        Lag                  |          Int64   |    Lag of the covariate.
        Rank                 |        float64   |    Rank of the model.
        BetterThanNoCov      |           bool   |    Indicates whether the model is better than the non-cov model.
    enable_leading_covariate_selection: builtins.bool
        When True, all covariates that, after the lag is applied, do not have at least one more
        data point beyond the time period covered by the actuals are removed from the candidate
        covariates passed to covariate selection.
    fixed_season_length: typing.Optional[builtins.int]
        An optional parameter specifying the length of a season in the dataset.
    pool_covs: typing.Optional[builtins.list[futureexpert.pool.PoolCovDefinition]]
        List of covariate definitions.
    db_name: typing.Optional[builtins.str]
        Only accessible for internal use. Name of the database to use for storing the results.
    """
    title: str
    actuals_version: str
    covs_versions: list[str] = Field(default_factory=list)
    actuals_filter: dict[str, Any] = Field(default_factory=dict)
    covs_filter: dict[str, Any] = Field(default_factory=dict)
    max_ts_len: Annotated[
        Optional[int], pydantic.Field(ge=1, le=1500)] = None
    lag_selection: LagSelectionConfig = LagSelectionConfig()
    evaluation_start_date: Optional[str] = None
    evaluation_end_date: Optional[str] = None
    max_publication_lag: int = 2
    post_selection_queries: list[str] = []
    enable_leading_covariate_selection: bool = True
    fixed_season_length: Optional[int] = None
    pool_covs: Optional[list[PoolCovDefinition]] = None
    db_name: Optional[str] = None

    @model_validator(mode='after')
    def _validate_post_selection_queries(self) -> Self:
        # Validate the post-selection queries.
        invalid_queries = []
        columns = {
            'Lag': 'int',
            'Rank': 'float',
            'BetterThanNoCov': 'bool'
        }
        # Create an empty DataFrame with the specified column names and data types
        validation_df = pd.DataFrame(columns=columns.keys()).astype(columns)
        for postselection_query in self.post_selection_queries:
            try:
                validation_df.query(postselection_query, )
            except Exception:
                invalid_queries.append(postselection_query)

        if len(invalid_queries):
            raise ValueError("The following post-selection queries are invalidly formatted: "
                             f"{', '.join(invalid_queries)}. ")

        return self

Configuration for a MATCHER run.

Parameters

title : builtins.str

A short description of the report.

actuals_version : builtins.str

The version ID of the actuals.

covs_versions : builtins.list[builtins.str]

List of versions of the covariates.

actuals_filter : builtins.dict[builtins.str, typing.Any]

Filter criterion for the actuals time series. The given actuals version is automatically added as an additional filter criterion. Possible filter criteria are all fields of the TimeSeries class, e.g. {'name': 'Sales'}. For more complex filters, see: https://www.mongodb.com/docs/manual/reference/operator/query/#query-selectors

covs_filter : builtins.dict[builtins.str, typing.Any]

Filter criterion for the covariate time series. The given covariate version is automatically added as an additional filter criterion. Possible filter criteria are all fields of the TimeSeries class, e.g. {'name': 'Sales'}. For more complex filters, see: https://www.mongodb.com/docs/manual/reference/operator/query/#query-selectors

max_ts_len : typing.Optional[builtins.int]

Only the most recent observations of the actuals time series, up to this number, are used. See the variable MAX_TS_LEN_CONFIG for the allowed configuration.

lag_selection : LagSelectionConfig

Configuration of covariate lag selection.

evaluation_start_date : typing.Optional[builtins.str]

Optional start date for the evaluation. The input should be in the ISO format with date and time, "YYYY-mm-DDTHH:MM:SS", e.g., "2024-01-01T16:40:00". Actuals and covariate observations prior to this start date are dropped.

evaluation_end_date : typing.Optional[builtins.str]

Optional end date for the evaluation. The input should be in the ISO format with date and time, "YYYY-mm-DDTHH:MM:SS", e.g., "2024-01-01T16:40:00". Actuals and covariate observations after this end date are dropped.
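
As a rough illustration of the two evaluation dates, the sketch below builds strings in the same shape as the example value "2024-01-01T16:40:00" with the standard library and then drops observations outside the window. The sample observations and the in-memory filtering are assumptions for illustration; how futureexpert applies the dates internally is not shown in this module.

```python
from datetime import datetime

# Produce ISO strings with date and time, matching the documented example.
evaluation_start_date = datetime(2024, 1, 1, 16, 40).isoformat(timespec='seconds')
evaluation_end_date = datetime(2024, 6, 30, 23, 59, 59).isoformat(timespec='seconds')
print(evaluation_start_date)  # 2024-01-01T16:40:00

# Hypothetical observations as (time stamp, value) pairs.
observations = [
    (datetime(2023, 12, 1), 1.0),  # prior to the start date: dropped
    (datetime(2024, 2, 1), 2.0),   # inside the window: kept
    (datetime(2024, 7, 1), 3.0),   # after the end date: dropped
]

start = datetime.fromisoformat(evaluation_start_date)
end = datetime.fromisoformat(evaluation_end_date)
kept = [(ts, value) for ts, value in observations if start <= ts <= end]
print(kept)
```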

max_publication_lag : builtins.int

Maximal publication lag for the covariates. The publication lag of a covariate is the number of most recent observations (compared to the actuals) that are missing for the covariate. E.g., if the actuals (for monthly granularity) end in April 2023 but the covariate ends in February 2023, the covariate has a publication lag of 2.
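
The monthly example above can be checked with a few lines of standard-library Python. The helper `months_between` is a hypothetical name, and the final comparison assumes that a covariate is only considered when its publication lag does not exceed `max_publication_lag`; that reading is an interpretation of the description, not something the source code in this module confirms.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Number of whole calendar months from `earlier` to `later`."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


actuals_end = date(2023, 4, 1)    # actuals end in April 2023
covariate_end = date(2023, 2, 1)  # covariate ends in February 2023

publication_lag = max(0, months_between(covariate_end, actuals_end))
print(publication_lag)  # 2

max_publication_lag = 2  # default value of MatcherConfig.max_publication_lag
print(publication_lag <= max_publication_lag)  # True
```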

post_selection_queries : builtins.list[builtins.str]

List of queries that are executed on the ranking summary DataFrame. Only ranking entries that match the queries are kept. The query strings need to satisfy the pandas query syntax (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html). Here are the columns of the ranking summary DataFrame that you might want to filter on:

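Column Name          |      Data Type   |    Description
-----------------------------------------------------------------------------------------------
Lag                  |          Int64   |    Lag of the covariate.
Rank                 |        float64   |    Rank of the model.
BetterThanNoCov      |           bool   |    Indicates whether the model is better than the non-cov model.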
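
A short sketch of how such queries behave, using `pandas.DataFrame.query` on a made-up ranking summary with the columns Lag, Rank, and BetterThanNoCov. The sample rows are invented, the Lag column uses the default int64 dtype instead of Int64 for simplicity, and applying the queries one after another is an assumption about how several queries are combined.

```python
import pandas as pd

# Hypothetical ranking summary with the three documented columns.
ranking_summary = pd.DataFrame({
    'Lag': [0, 3, 6, 12],
    'Rank': [1.0, 2.0, 3.0, 4.0],
    'BetterThanNoCov': [True, True, False, True],
})

post_selection_queries = ['Lag >= 3', 'BetterThanNoCov == True']

# Keep only the rows that satisfy every query (assumed combination rule).
filtered = ranking_summary
for query in post_selection_queries:
    filtered = filtered.query(query)

print(filtered['Lag'].tolist())  # [3, 12]
```

A query that names a column outside this set, or that is not valid pandas query syntax, is the kind of input the validator in the source rejects.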

enable_leading_covariate_selection : builtins.bool

When True, all covariates that, after the lag is applied, do not have at least one more data point beyond the time period covered by the actuals are removed from the candidate covariates passed to covariate selection.
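
The rule can be pictured with integer period indices. This is a sketch under stated assumptions: `is_leading` is a hypothetical helper, and time is reduced to period numbers rather than real time stamps.

```python
def is_leading(covariate_last_period: int, lag: int, actuals_last_period: int) -> bool:
    """True if the lagged covariate reaches at least one period beyond the actuals."""
    return covariate_last_period + lag > actuals_last_period


actuals_last_period = 100

# Covariate ends at period 99; with lag 3 it covers up to period 102.
print(is_leading(99, lag=3, actuals_last_period=actuals_last_period))  # True

# With lag 1 it only reaches period 100, so it would be removed.
print(is_leading(99, lag=1, actuals_last_period=actuals_last_period))  # False
```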

fixed_season_length : typing.Optional[builtins.int]

An optional parameter specifying the length of a season in the dataset.

pool_covs : typing.Optional[builtins.list[PoolCovDefinition]]

List of covariate definitions.

db_name : typing.Optional[builtins.str]

Only accessible for internal use. Name of the database to use for storing the results.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

Ancestors

BaseConfig
pydantic.main.BaseModel

Class variables

var actuals_filter : dict[str, typing.Any]
var actuals_version : str
var covs_filter : dict[str, typing.Any]
var covs_versions : list[str]
var db_name : str | None
var enable_leading_covariate_selection : bool
var evaluation_end_date : str | None
var evaluation_start_date : str | None
var fixed_season_length : int | None
var lag_selection : LagSelectionConfig
var max_publication_lag : int
var max_ts_len : int | None
var model_config
var pool_covs : list[PoolCovDefinition] | None
var post_selection_queries : list[str]
var title : str

class MatcherResult (**data: Any)

class MatcherResult(BaseModel):
    """Result of a covariate matcher run and the corresponding input data.

    Parameters
    ----------
    actuals: futureexpert.shared_models.TimeSeries
        Time series for which the matching was performed.
    ranking: builtins.list[futureexpert.matcher.CovariateRankingDetails]
        Ranking of the different covariate and non-covariate models.
    """
    actuals: TimeSeries
    ranking: list[CovariateRankingDetails]

    def convert_ranking_to_forecast_config(self) -> ActualsCovsConfiguration:
        """Converts MATCHER results into the input format for the FORECAST.

        Returns
        -------
        futureexpert.matcher.ActualsCovsConfiguration
        """
        covs_config = [CovariateRef(name=cov.ts.name, lag=cov.lag) for r in self.ranking for cov in r.covariates]
        return ActualsCovsConfiguration(actuals_name=self.actuals.name,
                                        covs_configurations=covs_config)

Result of a covariate matcher run and the corresponding input data.

Parameters

actuals : TimeSeries: Time series for which the matching was performed.
ranking : builtins.list[CovariateRankingDetails]: Ranking of the different covariate and non-covariate models.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

Ancestors

pydantic.main.BaseModel

Class variables

var actuals : TimeSeries
var model_config
var ranking : list[CovariateRankingDetails]

Methods

def convert_ranking_to_forecast_config(self) ‑> ActualsCovsConfiguration

def convert_ranking_to_forecast_config(self) -> ActualsCovsConfiguration:
    """Converts MATCHER results into the input format for the FORECAST.

    Returns
    -------
    futureexpert.matcher.ActualsCovsConfiguration
    """
    covs_config = [CovariateRef(name=cov.ts.name, lag=cov.lag) for r in self.ranking for cov in r.covariates]
    return ActualsCovsConfiguration(actuals_name=self.actuals.name,
                                    covs_configurations=covs_config)

Converts MATCHER results into the input format for the FORECAST.

Returns

ActualsCovsConfiguration
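
The method flattens the covariates of all ranking entries into one list of name and lag pairs. The sketch below reproduces that comprehension with small dataclass stand-ins; `Ts`, `Cov`, and `RankEntry` are hypothetical substitutes for the real TimeSeries, Covariate, and CovariateRankingDetails models, and the sample names are invented.

```python
from dataclasses import dataclass


@dataclass
class Ts:  # stand-in for TimeSeries (only the name is needed here)
    name: str


@dataclass
class Cov:  # stand-in for Covariate
    ts: Ts
    lag: int


@dataclass
class RankEntry:  # stand-in for CovariateRankingDetails
    rank: int
    covariates: list[Cov]


ranking = [
    RankEntry(1, [Cov(Ts('temperature'), 2)]),
    RankEntry(2, []),  # a model without covariates contributes nothing
    RankEntry(3, [Cov(Ts('price_index'), 0), Cov(Ts('temperature'), 1)]),
]

# Same nesting order as in the method: ranking entries first, then covariates.
covs_config = [(cov.ts.name, cov.lag) for r in ranking for cov in r.covariates]
print(covs_config)  # [('temperature', 2), ('price_index', 0), ('temperature', 1)]
```

Entries with an empty covariate list disappear from the result, and a covariate that appears in several ranking entries with different lags is listed once per entry.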