Module futureexpert.matcher

Contains the models with the configuration for the matcher and the result format.

Classes

class ActualsCovsConfiguration (**data: Any)
class ActualsCovsConfiguration(BaseModel):
    """Configuration of actuals and covariates via name and lag.

    Parameters
    ----------
    actuals_name: builtins.str
        Name of the time series.
    covs_configurations: builtins.list
        List of Covariates.
    """
    actuals_name: str
    covs_configurations: list[CovariateRef]

Configuration of actuals and covariates via name and lag.

Parameters

actuals_name : builtins.str
Name of the time series.
covs_configurations : builtins.list
List of Covariates.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Ancestors

  • pydantic.main.BaseModel

Class variables

var actuals_name : str
var covs_configurations : list[CovariateRef]
var model_config
class CovariateRankingDetails (**data: Any)
class CovariateRankingDetails(BaseModel):
    """Final rank for a given set of covariates.

    Parameters
    ----------
    rank: futureexpert.shared_models.PositiveInt
        Rank for the given set of covariates.
    covariates: builtins.list
        Used covariates (might be zero or more than one).
    """
    model_config = ConfigDict(arbitrary_types_allowed=True)
    rank: ValidatedPositiveInt
    covariates: list[Covariate]

Final rank for a given set of covariates.

Parameters

rank : PositiveInt
Rank for the given set of covariates.
covariates : builtins.list
Used covariates (might be zero or more than one).

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Ancestors

  • pydantic.main.BaseModel

Class variables

var covariates : list[Covariate]
var model_config
var rank : PositiveInt
class LagSelectionConfig (**data: Any)
class LagSelectionConfig(BaseModel):
    """Configures covariate lag selection.

    Parameters
    ----------
    fixed_lags: typing.Optional
        Lags that are tested in the lag selection.
    min_lag: typing.Optional
        Minimal lag that is tested in the lag selection. For example, a lag 3 means the covariate
        is shifted 3 data points into the future.
    max_lag: typing.Optional
        Maximal lag that is tested in the lag selection. For example, a lag 12 means the covariate
        is shifted 12 data points into the future.
    """
    min_lag: Optional[int] = None
    max_lag: Optional[int] = None
    fixed_lags: Optional[list[int]] = None

    @model_validator(mode='after')
    def _check_range(self) -> Self:
        if (self.min_lag is None) ^ (self.max_lag is None):
            raise ValueError(
                'If one of `min_lag` and `max_lag` is set the other one also needs to be set.')

        # Compare against None explicitly so that a lag of 0 is validated as well.
        if self.min_lag is not None and self.max_lag is not None:
            if self.fixed_lags is not None:
                raise ValueError('Fixed lags and min/max lag are mutually exclusive.')
            if self.max_lag < self.min_lag:
                raise ValueError('max_lag needs to be greater than or equal to min_lag.')
            lag_range = abs(self.max_lag - self.min_lag) + 1
            if lag_range > 15:
                raise ValueError(f'Only 15 lags are allowed to be tested. The requested range has length {lag_range}.')

        if self.fixed_lags and len(self.fixed_lags) > 15:
            raise ValueError(
                f'Only 15 lags are allowed to be tested. The provided fixed lags has length {len(self.fixed_lags)}.')

        return self

Configures covariate lag selection.

Parameters

fixed_lags : typing.Optional
Lags that are tested in the lag selection.
min_lag : typing.Optional
Minimal lag that is tested in the lag selection. For example, a lag 3 means the covariate is shifted 3 data points into the future.
max_lag : typing.Optional
Maximal lag that is tested in the lag selection. For example, a lag 12 means the covariate is shifted 12 data points into the future.
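
As a rough illustration of what a lag means, the following sketch shifts a covariate by a given number of data points into the future. The helper shift_into_future and the sample values are hypothetical and not part of futureexpert; aligning values by list position is an assumption made for illustration only.

```python
from typing import Optional


def shift_into_future(values: list[float], lag: int) -> list[Optional[float]]:
    """Shift a series `lag` data points into the future (hypothetical helper).

    The value observed at position t is aligned with position t + lag.
    The first `lag` positions have no covariate value and are filled with None.
    """
    if lag < 0:
        raise ValueError("This sketch only handles non-negative lags.")
    return [None] * lag + list(values)


covariate = [10.0, 11.0, 12.0, 13.0]
shifted = shift_into_future(covariate, lag=3)
print(shifted)  # [None, None, None, 10.0, 11.0, 12.0, 13.0]
```

With a lag of 3, the covariate value observed at position 0 lines up with position 3 of the actuals.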

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Ancestors

  • pydantic.main.BaseModel

Class variables

var fixed_lags : list[int] | None
var max_lag : int | None
var min_lag : int | None
var model_config
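
The validation rules of LagSelectionConfig can be sketched without pydantic as a plain function. This is a minimal stand-in, not the library's implementation: the name check_lag_selection is hypothetical, and returning the list of tested lags assumes that a min/max range expands to every integer lag in between. The sketch compares against None explicitly, so a min_lag of 0 is validated like any other value.

```python
from typing import Optional

MAX_TESTED_LAGS = 15  # limit taken from the error messages in the source listing


def check_lag_selection(min_lag: Optional[int] = None,
                        max_lag: Optional[int] = None,
                        fixed_lags: Optional[list[int]] = None) -> list[int]:
    """Return the lags that would be tested, or raise ValueError (hypothetical helper)."""
    # min_lag and max_lag only make sense as a pair.
    if (min_lag is None) != (max_lag is None):
        raise ValueError("min_lag and max_lag must be set together.")
    if min_lag is not None and max_lag is not None:
        if fixed_lags is not None:
            raise ValueError("Fixed lags and min/max lag are mutually exclusive.")
        if max_lag < min_lag:
            raise ValueError("max_lag must be greater than or equal to min_lag.")
        lags = list(range(min_lag, max_lag + 1))
    else:
        lags = list(fixed_lags or [])
    if len(lags) > MAX_TESTED_LAGS:
        raise ValueError(f"Only {MAX_TESTED_LAGS} lags may be tested, got {len(lags)}.")
    return lags


print(check_lag_selection(min_lag=1, max_lag=12))  # lags 1 through 12
print(check_lag_selection(fixed_lags=[1, 3, 6]))   # [1, 3, 6]
try:
    check_lag_selection(min_lag=0, max_lag=20)     # a range of 21 lags is too long
except ValueError as err:
    print(err)
```
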
class MatcherConfig (**data: Any)
class MatcherConfig(BaseConfig):
    """Configuration for a MATCHER run.

    Parameters
    ----------
    title: builtins.str
        A short description of the report.
    actuals_version: builtins.str
        The version ID of the actuals.
    covs_versions: builtins.list
        List of versions of the covariates.
    actuals_filter: builtins.dict
        Filter criteria for the actuals time series. The given actuals version is
        automatically added as an additional filter criterion. Possible filter criteria are all fields
        of the TimeSeries class, e.g. {'name': 'Sales'}.
        For more complex filters, see: https://www.mongodb.com/docs/manual/reference/operator/query/#query-selectors
    covs_filter: builtins.dict
        Filter criteria for the covariate time series. The given covariate version is
        automatically added as an additional filter criterion. Possible filter criteria are all fields
        of the TimeSeries class, e.g. {'name': 'Sales'}.
        For more complex filters, see: https://www.mongodb.com/docs/manual/reference/operator/query/#query-selectors
    max_ts_len: typing.Optional
        At most this number of most recent observations of the actuals time series is used. Check the variable MAX_TS_LEN_CONFIG
        for allowed configuration.
    lag_selection: futureexpert.matcher.LagSelectionConfig
        Configuration of covariate lag selection.
    evaluation_start_date: typing.Optional
        Optional start date for the evaluation. The input should be in the ISO format
        with date and time, "YYYY-mm-DDTHH:MM:SS", e.g., "2024-01-01T16:40:00".
        Actuals and covariate observations prior to this start date are dropped.
    evaluation_end_date: typing.Optional
        Optional end date for the evaluation. The input should be in the ISO format
        with date and time, "YYYY-mm-DDTHH:MM:SS", e.g., "2024-01-01T16:40:00".
        Actuals and covariate observations after this end date are dropped.
    max_publication_lag: builtins.int
        Maximal publication lag for the covariates. The publication lag of a covariate
        is the number of most recent observations (compared to the actuals) that are
        missing for the covariate. E.g., if the actuals (for monthly granularity) end
        in April 2023 but the covariate ends in February 2023, the covariate has a
        publication lag of 2.
    post_selection_queries: builtins.list
        List of queries that are executed on the ranking summary DataFrame. Only ranking entries that
        match the queries are kept. The query strings need to satisfy the pandas query syntax
        (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html). Here are the columns
        of the ranking summary DataFrame that you might want to filter on:

        Column Name          |      Data Type   |    Description
        -----------------------------------------------------------------------------------------------
        Lag                  |          Int64   |    Lag of the covariate.
        Rank                 |        float64   |    Rank of the model.
        BetterThanNoCov      |           bool   |    Indicates whether the model is better than the non-cov model.
    enable_leading_covariate_selection: builtins.bool
        When True, covariates that, after the lag is applied, do not have at least one
        data point beyond the time period covered by the actuals are removed from the
        candidate covariates passed to covariate selection.
    fixed_season_length: typing.Optional
        An optional parameter specifying the length of a season in the dataset.
    pool_covs: typing.Optional
        List of covariate definitions.
    db_name: typing.Optional
        Only accessible for internal use. Name of the database to use for storing the results.
    """
    title: str
    actuals_version: str
    covs_versions: list[str] = Field(default_factory=list)
    actuals_filter: dict[str, Any] = Field(default_factory=dict)
    covs_filter: dict[str, Any] = Field(default_factory=dict)
    max_ts_len: Annotated[
        Optional[int], pydantic.Field(ge=1, le=1500)] = None
    lag_selection: LagSelectionConfig = LagSelectionConfig()
    evaluation_start_date: Optional[str] = None
    evaluation_end_date: Optional[str] = None
    max_publication_lag: int = 2
    post_selection_queries: list[str] = []
    enable_leading_covariate_selection: bool = True
    fixed_season_length: Optional[int] = None
    pool_covs: Optional[list[PoolCovDefinition]] = None
    db_name: Optional[str] = None

    @model_validator(mode='after')
    def _validate_post_selection_queries(self) -> Self:
        # Validate the post-selection queries.
        invalid_queries = []
        columns = {
            'Lag': 'int',
            'Rank': 'float',
            'BetterThanNoCov': 'bool'
        }
        # Create an empty DataFrame with the specified column names and data types
        validation_df = pd.DataFrame(columns=columns.keys()).astype(columns)
        for postselection_query in self.post_selection_queries:
            try:
                validation_df.query(postselection_query, )
            except Exception:
                invalid_queries.append(postselection_query)

        if len(invalid_queries):
            raise ValueError("The following post-selection queries are invalidly formatted: "
                             f"{', '.join(invalid_queries)}. ")

        return self
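
The evaluation_start_date and evaluation_end_date fields in the listing above take ISO date-time strings such as "2024-01-01T16:40:00". The following sketch shows how such strings can be parsed with the standard library and used to drop observations outside the window. The sample observations are made up, and treating both boundaries as inclusive is an assumption rather than documented behavior.

```python
from datetime import datetime

# Hypothetical observations as (timestamp, value) pairs.
observations = [
    (datetime(2023, 12, 1), 95.0),
    (datetime(2024, 2, 1), 101.0),
    (datetime(2024, 5, 1), 108.0),
    (datetime(2024, 8, 1), 112.0),
]

start = datetime.fromisoformat("2024-01-01T16:40:00")
end = datetime.fromisoformat("2024-06-30T00:00:00")

# Keep observations inside the window; inclusive boundaries are an assumption.
kept = [(ts, value) for ts, value in observations if start <= ts <= end]
print([value for _, value in kept])  # [101.0, 108.0]
```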

Configuration for a MATCHER run.

Parameters

title : builtins.str
A short description of the report.
actuals_version : builtins.str
The version ID of the actuals.
covs_versions : builtins.list
List of versions of the covariates.
actuals_filter : builtins.dict
Filter criteria for the actuals time series. The given actuals version is automatically added as an additional filter criterion. Possible filter criteria are all fields of the TimeSeries class, e.g. {'name': 'Sales'}. For more complex filters, see: https://www.mongodb.com/docs/manual/reference/operator/query/#query-selectors
covs_filter : builtins.dict
Filter criteria for the covariate time series. The given covariate version is automatically added as an additional filter criterion. Possible filter criteria are all fields of the TimeSeries class, e.g. {'name': 'Sales'}. For more complex filters, see: https://www.mongodb.com/docs/manual/reference/operator/query/#query-selectors
max_ts_len : typing.Optional
At most this number of most recent observations of the actuals time series is used. Check the variable MAX_TS_LEN_CONFIG for allowed configuration.
lag_selection : LagSelectionConfig
Configuration of covariate lag selection.
evaluation_start_date : typing.Optional
Optional start date for the evaluation. The input should be in the ISO format with date and time, "YYYY-mm-DDTHH:MM:SS", e.g., "2024-01-01T16:40:00". Actuals and covariate observations prior to this start date are dropped.
evaluation_end_date : typing.Optional
Optional end date for the evaluation. The input should be in the ISO format with date and time, "YYYY-mm-DDTHH:MM:SS", e.g., "2024-01-01T16:40:00". Actuals and covariate observations after this end date are dropped.
max_publication_lag : builtins.int
Maximal publication lag for the covariates. The publication lag of a covariate is the number of most recent observations (compared to the actuals) that are missing for the covariate. E.g., if the actuals (for monthly granularity) end in April 2023 but the covariate ends in February 2023, the covariate has a publication lag of 2.
post_selection_queries : builtins.list

List of queries that are executed on the ranking summary DataFrame. Only ranking entries that match the queries are kept. The query strings need to satisfy the pandas query syntax (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html). Here are the columns of the ranking summary DataFrame that you might want to filter on:

Column Name | Data Type | Description
Lag | Int64 | Lag of the covariate.
Rank | float64 | Rank of the model.
BetterThanNoCov | bool | Indicates whether the model is better than the non-cov model.

enable_leading_covariate_selection : builtins.bool
When True, covariates that, after the lag is applied, do not have at least one data point beyond the time period covered by the actuals are removed from the candidate covariates passed to covariate selection.
fixed_season_length : typing.Optional
An optional parameter specifying the length of a season in the dataset.
pool_covs : typing.Optional
List of covariate definitions.
db_name : typing.Optional
Only accessible for internal use. Name of the database to use for storing the results.
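
The publication lag example from the max_publication_lag description can be worked through as a small calculation. This is a sketch for monthly data only: the helpers months_between and publication_lag are hypothetical, and comparing the result against the limit with <= is an assumption about how the limit is applied.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Number of calendar months from `earlier` to `later`."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def publication_lag(actuals_end: date, covariate_end: date) -> int:
    """Monthly observations missing at the end of the covariate (hypothetical helper)."""
    return max(0, months_between(covariate_end, actuals_end))


# Actuals end in April 2023, the covariate ends in February 2023.
lag = publication_lag(actuals_end=date(2023, 4, 1), covariate_end=date(2023, 2, 1))
print(lag)  # 2

max_publication_lag = 2  # default value of the max_publication_lag field
print(lag <= max_publication_lag)  # True
```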

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Ancestors

  • BaseConfig

Class variables

var actuals_filter : dict[str, typing.Any]
var actuals_version : str
var covs_filter : dict[str, typing.Any]
var covs_versions : list[str]
var db_name : str | None
var enable_leading_covariate_selection : bool
var evaluation_end_date : str | None
var evaluation_start_date : str | None
var fixed_season_length : int | None
var lag_selection : LagSelectionConfig
var max_publication_lag : int
var max_ts_len : int | None
var model_config
var pool_covs : list[PoolCovDefinition] | None
var post_selection_queries : list[str]
var title : str
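
To show how the fields fit together, here is a sketch of keyword arguments for a MATCHER configuration, written as a plain dictionary so that it runs without futureexpert installed. All values are hypothetical; in particular the version IDs are placeholders, and the three query strings are only examples of pandas query syntax over the Lag, Rank and BetterThanNoCov columns.

```python
# Hypothetical values; replace the placeholder version IDs with your own.
matcher_kwargs = {
    "title": "Sales vs. economic indicators",
    "actuals_version": "<actuals-version-id>",
    "covs_versions": ["<covariate-version-id>"],
    "actuals_filter": {"name": "Sales"},
    "lag_selection": {"min_lag": 1, "max_lag": 12},
    "evaluation_start_date": "2024-01-01T16:40:00",
    "max_publication_lag": 2,
    "post_selection_queries": ["Lag >= 1", "Rank <= 5", "BetterThanNoCov == True"],
}

# With futureexpert installed, the dictionary could be passed on as
# MatcherConfig(**matcher_kwargs); pydantic validates nested dictionaries
# such as "lag_selection" into the corresponding model.
for name, value in matcher_kwargs.items():
    print(f"{name}: {value!r}")
```
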
class MatcherResult (**data: Any)
class MatcherResult(BaseModel):
    """Result of a covariate matcher run and the corresponding input data.

    Parameters
    ----------
    actuals: futureexpert.shared_models.TimeSeries
        Time series for which the matching was performed.
    ranking: builtins.list
        Ranking of the different covariate and non-covariate models.
    """
    actuals: TimeSeries
    ranking: list[CovariateRankingDetails]

    def convert_ranking_to_forecast_config(self) -> ActualsCovsConfiguration:
        """Converts MATCHER results into the input format for the FORECAST."""
        covs_config = [CovariateRef(name=cov.ts.name, lag=cov.lag) for r in self.ranking for cov in r.covariates]
        return ActualsCovsConfiguration(actuals_name=self.actuals.name,
                                        covs_configurations=covs_config)

Result of a covariate matcher run and the corresponding input data.

Parameters

actuals : TimeSeries
Time series for which the matching was performed.
ranking : builtins.list
Ranking of the different covariate and non-covariate models.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Ancestors

  • pydantic.main.BaseModel

Class variables

var actuals : TimeSeries
var model_config
var ranking : list[CovariateRankingDetails]

Methods

def convert_ranking_to_forecast_config(self) ‑> ActualsCovsConfiguration
def convert_ranking_to_forecast_config(self) -> ActualsCovsConfiguration:
    """Converts MATCHER results into the input format for the FORECAST."""
    covs_config = [CovariateRef(name=cov.ts.name, lag=cov.lag) for r in self.ranking for cov in r.covariates]
    return ActualsCovsConfiguration(actuals_name=self.actuals.name,
                                    covs_configurations=covs_config)

Converts MATCHER results into the input format for the FORECAST.
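
The conversion flattens the covariates of all ranking entries into one list of name and lag pairs. The sketch below reproduces that flattening with dataclass stand-ins, so it runs without futureexpert; the stub classes and the function flatten_ranking are hypothetical and only mirror the attributes used in the method (ts.name, lag, covariates).

```python
from dataclasses import dataclass


@dataclass
class TimeSeriesStub:      # stand-in for TimeSeries
    name: str


@dataclass
class CovariateStub:       # stand-in for Covariate (has `ts` and `lag`)
    ts: TimeSeriesStub
    lag: int


@dataclass
class RankingEntryStub:    # stand-in for CovariateRankingDetails
    rank: int
    covariates: list[CovariateStub]


def flatten_ranking(ranking: list[RankingEntryStub]) -> list[dict]:
    """Collect name and lag of every covariate across all ranks, in ranking order."""
    return [{"name": cov.ts.name, "lag": cov.lag}
            for entry in ranking for cov in entry.covariates]


ranking = [
    RankingEntryStub(rank=1, covariates=[CovariateStub(TimeSeriesStub("Indicator A"), lag=3)]),
    RankingEntryStub(rank=2, covariates=[]),  # model without covariates
    RankingEntryStub(rank=3, covariates=[CovariateStub(TimeSeriesStub("Indicator B"), lag=1)]),
]
print(flatten_ranking(ranking))
# [{'name': 'Indicator A', 'lag': 3}, {'name': 'Indicator B', 'lag': 1}]
```

Ranking entries without covariates contribute nothing to the list, and the rank itself is not carried over into the result.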