Manifest Format

POPROX experiments are described in a manifest format. Manifests are specified in TOML, and typically written as Markdown files with embedded TOML code blocks that are extracted to form the manifest data (like literate programming). See Experiment Model for a description of the model.

This document specifies the TOML layout for the manifest data.

Warning

This document is not current — it is Michael’s initial draft, but the specification has evolved since then.

Experiment Metadata

Each experiment has metadata describing the experiment and the team responsible for it. This is typically at the top of the Markdown file, although TOML does not impose an order.

experiment table

The experiment table contains the basic experiment metadata. For example:

[experiment]
# this ID is provided by POPROX when you register an experiment
id = "37DFE15F-2DBE-4F5C-AC18-AFC5031747FA"
# a short description that will appear in POPROX dashboards, etc.
description = "Better news through diversified transmogrification"
status = "draft"

The following fields are defined:

  • id: the experiment UUID

  • description: a short description for display in the POPROX interface.

  • status: the experiment status (coordinate with POPROX team on updates to this).

  • predecessor: the UUID of the previous experiment for linked experiments.

  • pred_aliases: a table mapping short alias strings to UUIDs for experiments in this experiment’s predecessor chain.

  • random.seed: a random seed for the randomized elements of experiment setup. This is primarily for internal purposes to make user allocation deterministic while debugging. The random seed will be passed to seedbank and used to initialize a NumPy Generator.

Todo

List valid status values

owner table

The owner table specifies the team responsible for the experiment:

[owner]
# obtained from the account information
team_id = "3C435603-63FD-4F96-A0D5-ADE527B43A01"

Right now, team_id is the only supported field.

User Specification

User specification has 3 components:

  • Tags: specifying tags that can apply to users and the criteria for applying them

  • Filters: the filters for including or excluding users

  • Groups: the different user groups

User Expressions

Both tags and filters use the Common Expression Language (CEL) to express conditions over user attributes. These expressions are evaluated when user selection and group assignment run.

The full set of attributes is defined elsewhere. Experiment-independent attributes are defined on the user object; commonly-used ones include:

  • user.active: whether the user meets POPROX’s definition of an active user

  • user.account_age: the user’s account age in days

  • user.state: the state in which the user lives

  • user.age: the user’s age in years

  • user.tags: an object mapping tags to booleans, where a tag maps to true if it is present.

  • user.n_unique_articles: the number of unique articles the user has opened

  • user.last.month.n_unique_articles: the number of unique articles the user has opened in the last month

The last accessor has several subproperties for different analysis periods:

  • week — the last week

  • month — the last month

  • quarter — the last 3 months

  • year — the last year

For example, an experiment to see if a new recommendation can increase activity of low-activity users may want to select users who have not clicked any articles in the last month:

filter = "user.last.month.n_unique_articles == 0"

Results of predecessor experiment(s) can be accessed in two ways:

  • The prev_exp object has the results from the predecessor experiment (if one is defined).

  • The past_exps object has the results from all experiments in the predecessor chain (e.g. predecessor of the predecessor). If pred_aliases is defined, then this is an object whose attribute names are the specified aliases and values are the experiment results.

The attributes and values depend on the precise experiment, but include the results of measurements.

Todo

Define full set!

Todo

Define how precisely we export measurements from one experiment to another.

Note

Investigate whether missing attributes are null or an error for tags.

General User Options

The users table holds the tags, filter, and groups tables described in the sections below, along with an optional size field.

If size is specified, then a random set of users of the specified size is selected from the users who pass the filters. If size is not specified, it is derived from the sizes of the groups; this is the most common design.

Tagging Users

Users can be tagged, and these tags can be used in other filters and selections. Tags are specified using the users.tags table; each key is a tag, and is a table with the following values:

  • include: a CEL condition defining the users to tag with this tag

  • exclude: a CEL condition defining users to exclude

A user is tagged with the tag if they both satisfy the inclusion criteria and do not satisfy the exclusion criteria.

Important

The user.tags field is not available in the tag expressions, because it hasn’t been computed yet.

The following defines two very simple, mutually-exclusive tags for the recent activity level of users:

[users.tags.high]
include = "user.last.month.n_unique_articles >= 10"
exclude = "!user.active"

[users.tags.low]
include = "user.last.month.n_unique_articles < 10"
exclude = "!user.active"
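The include/exclude rule can be sketched in Python. This is only an illustration and not the POPROX implementation: the CEL expressions are replaced by hand-written Python predicates, the flat keys `active` and `n_unique_articles_last_month` are hypothetical stand-ins for user.active and user.last.month.n_unique_articles, and the sketch assumes every tag has an include condition.

```python
from typing import Callable, Mapping

User = Mapping[str, object]
Condition = Callable[[User], bool]

# Hypothetical Python stand-ins for the CEL conditions in the example above.
TAGS: dict[str, dict[str, Condition]] = {
    "high": {
        "include": lambda u: u["n_unique_articles_last_month"] >= 10,
        "exclude": lambda u: not u["active"],
    },
    "low": {
        "include": lambda u: u["n_unique_articles_last_month"] < 10,
        "exclude": lambda u: not u["active"],
    },
}


def compute_tags(user: User, tags=TAGS) -> dict[str, bool]:
    """Map each tag to True if the user passes include and does not match exclude."""
    result = {}
    for name, spec in tags.items():
        included = spec["include"](user)
        excluded = "exclude" in spec and spec["exclude"](user)
        result[name] = included and not excluded
    return result


busy = {"active": True, "n_unique_articles_last_month": 14}
quiet = {"active": True, "n_unique_articles_last_month": 2}
lapsed = {"active": False, "n_unique_articles_last_month": 0}

print(compute_tags(busy))    # {'high': True, 'low': False}
print(compute_tags(quiet))   # {'high': False, 'low': True}
print(compute_tags(lapsed))  # {'high': False, 'low': False}
```

The result has the same shape as user.tags: a mapping from tag names to booleans. The inactive user receives neither tag because the exclusion condition matches.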

Filtering Users

Users can also be filtered for inclusion into the experiment. Filters are specified with the users.filter table:

[users.filter]
include = "…"
exclude = "…"
in_experiment = ""
active = true
tagged_with_any = ["tag1", "tag2"]
tagged_with_all = ["tag1", "tag2"]

All fields are optional (as is the entire users.filter table). The fields are defined as follows:

  • include: a CEL condition defining the users to include. Can be a list, in which case the conditions are combined with AND.

  • exclude: a CEL condition defining the users to exclude. Can be a list, in which case the conditions are combined with AND.

  • in_experiment: an experiment UUID or alias, matching users who were in the specified experiment.

  • active: a convenience condition; if true, only active users are included (if false, only inactive users are included).

  • tagged_with_any: a list of tags; a user is included if they are tagged with any of the tags.

  • tagged_with_all: a list of tags; a user is included only if they are tagged with all specified tags.

Users are included if and only if they satisfy all criteria. As a shorthand, if filter is specified as a string rather than a table, it is equivalent to a filter with that string in the include value (allowing the filter to be specified as a single CEL expression).
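As a rough sketch of how a filter specification might be handled after the TOML is parsed, the Python below normalizes the string shorthand into a table and checks the fields that do not need a CEL evaluator. The function names are hypothetical, and evaluating include, exclude, and in_experiment is deliberately left out because it depends on machinery this document does not specify.

```python
def normalize_filter(spec):
    """Return a filter table with include/exclude always stored as lists."""
    if spec is None:
        return {}
    if isinstance(spec, str):
        # String shorthand: a single CEL expression used as the include value.
        spec = {"include": spec}
    table = dict(spec)
    for key in ("include", "exclude"):
        if isinstance(table.get(key), str):
            table[key] = [table[key]]
    return table


def passes_simple_criteria(table, *, active, tags):
    """Check the non-CEL fields; every field that is present must be satisfied."""
    if "active" in table and table["active"] != active:
        return False
    if "tagged_with_any" in table:
        if not any(tags.get(t, False) for t in table["tagged_with_any"]):
            return False
    if "tagged_with_all" in table:
        if not all(tags.get(t, False) for t in table["tagged_with_all"]):
            return False
    return True


print(normalize_filter("user.last.month.n_unique_articles == 0"))
# {'include': ['user.last.month.n_unique_articles == 0']}

flt = normalize_filter({"active": True, "tagged_with_any": ["high", "low"]})
print(passes_simple_criteria(flt, active=True, tags={"high": False, "low": True}))   # True
print(passes_simple_criteria(flt, active=False, tags={"high": False, "low": True}))  # False
```

Storing include and exclude as lists in every case keeps later evaluation code from having to distinguish the single-expression and list forms.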

Grouping Users

Experiments almost always require dividing users into groups; an experiment is required to have at least one group. Groups are named, and are defined under the users.groups table, as sub-tables named with the group name. For example, the following will select 2 identical user groups a and b of size 25:

[users.groups.a]
size = 25

[users.groups.b]
identical_to = "a"

The following keys are available for groups:

  • identical_to: the name of a group whose settings are used for this group. If specified, no other keys can be specified. This is to make it easy to make multiple identical groups.

  • size: the number of users to put in this group.

  • filter: a filter for inclusion in the group, of the same format as users.filter.

  • strata: a specification of strata for stratified sampling, based on a set of mutually-exclusive tags.

Important

The name default is reserved and cannot be used to name a group.
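The identical_to rule and the reserved name can be illustrated with a short Python sketch operating on the parsed users.groups table. The function name is hypothetical, and following chains of identical_to references (b identical to a, c identical to b) is an assumption; this document does not say whether chains are allowed.

```python
def resolve_groups(groups):
    """Expand identical_to references so every group carries its own settings."""
    if "default" in groups:
        raise ValueError("the group name 'default' is reserved")

    def settings_for(name, seen=()):
        if name not in groups:
            raise ValueError(f"unknown group {name!r}")
        if name in seen:
            raise ValueError(f"identical_to cycle involving {name!r}")
        spec = groups[name]
        if "identical_to" not in spec:
            return dict(spec)  # shallow copy of the target's settings
        if len(spec) > 1:
            raise ValueError(
                f"group {name!r}: identical_to cannot be combined with other keys"
            )
        return settings_for(spec["identical_to"], seen + (name,))

    return {name: settings_for(name) for name in groups}


groups = {"a": {"size": 25}, "b": {"identical_to": "a"}}
print(resolve_groups(groups))  # {'a': {'size': 25}, 'b': {'size': 25}}
```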

Stratified Sampling

Stratified sampling is handled through mutually-exclusive tags; in order to use it, you first need to specify user tags. There are two ways to specify stratified samples, both of which use the strata field:

  • If strata is a list of tags, then an equal number of users is sampled from each tag. The group size must be divisible by the number of tags.

  • If strata is a table, then its keys are tags, and its values are the number of users to sample with that tag.

For example, to do a stratified sample with 50 high-activity and 100 low-activity users using the tags specified above, we can write the following:

[users.groups.a.strata]
high = 50
low = 100

The strata specification will be propagated through identical_to, so both groups a and b will have identical stratified samples.

If the tags specified in strata are not mutually-exclusive (there exists a user who passes the filters and has more than one strata tag), then the user assignment will fail with an error.
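The two forms of strata and the mutual-exclusivity check can be sketched as follows. This is an illustrative sketch with hypothetical function names, not the POPROX implementation; `user_tags` is assumed to map user IDs to the tag-to-boolean mapping described for user.tags.

```python
def strata_counts(strata, size=None):
    """Return a {tag: count} table for either form of the strata field."""
    if isinstance(strata, dict):
        # Table form: the per-tag counts are given directly.
        return dict(strata)
    if not strata:
        raise ValueError("strata list is empty")
    if size is None:
        raise ValueError("a list of strata needs a group size")
    if size % len(strata) != 0:
        raise ValueError(f"size {size} is not divisible by {len(strata)} strata")
    per_tag = size // len(strata)
    return {tag: per_tag for tag in strata}


def check_mutually_exclusive(user_tags, strata):
    """Raise if any user carries more than one of the strata tags."""
    for user_id, tags in user_tags.items():
        matched = [t for t in strata if tags.get(t, False)]
        if len(matched) > 1:
            raise ValueError(f"user {user_id} has several strata tags: {matched}")


print(strata_counts(["high", "low"], size=50))    # {'high': 25, 'low': 25}
print(strata_counts({"high": 50, "low": 100}))    # {'high': 50, 'low': 100}
```

With the table form, the group size follows from the counts (150 users in the example above), which is why a strata table counts as giving the group a size.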

Group Sizes

In almost all cases, one of the following must be true:

  • users.size is specified

  • the experiment has a predecessor

  • each group has a size, either from size, identical_to, or strata

The only exceptions are for POPROX internal use, or carefully-designed experiments in collaboration with POPROX.
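A validator might check the three conditions above along these lines. This is a sketch under assumptions: `manifest` is the parsed TOML as nested dictionaries, the function name is hypothetical, and a strata list without a size is treated as not providing a size.

```python
def sizes_are_determined(manifest):
    """Return True if at least one of the three size conditions holds."""
    users = manifest.get("users", {})
    if "size" in users:
        return True
    if "predecessor" in manifest.get("experiment", {}):
        return True

    groups = users.get("groups", {})

    def group_has_size(name, seen=()):
        spec = groups.get(name)
        if spec is None or name in seen:
            return False
        if "identical_to" in spec:
            # The size is inherited from the referenced group.
            return group_has_size(spec["identical_to"], seen + (name,))
        return "size" in spec or isinstance(spec.get("strata"), dict)

    return bool(groups) and all(group_has_size(name) for name in groups)


ok = {"users": {"groups": {"a": {"size": 25}, "b": {"identical_to": "a"}}}}
missing = {"users": {"groups": {"a": {"size": 25}, "b": {}}}}
print(sizes_are_determined(ok))       # True
print(sizes_are_determined(missing))  # False
```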

Important

When we implement this, we need to throw a very loud warning if we don’t have sizes.

Users are assigned to groups in the following way:

  1. Compute the set of groups that can contain each user.

  2. Check user/group mapping. Users should be eligible for all groups, for precisely one group, or for all but one group; if this is not satisfied, allocation will raise a warning for review.

  3. Fill each sized group in alphabetical order by sampling the specified number of users (possibly stratified) uniformly at random without replacement from the set of eligible users.

  4. Allocate each unallocated user, in lexicographical order of user ID, among the unsized groups for which they are eligible; if a user is eligible for more than one group, select the group at random.

Note

The all-but-1 group option in (2) is to allow for designs where we allocate the users matching criteria to a set of groups, and have a last group to specify what to do with users who don’t match the normal allocation criteria.

Note

The ordering specifications in (3) and (4) are just to make the process deterministic for debugging and auditing.

Note

At implementation, we need good ways to audit group membership.
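The allocation steps above can be sketched in Python to make the ordering and sampling concrete. This is a simplified illustration, not the POPROX implementation: it uses the standard library random module rather than a NumPy Generator, omits stratified sampling, takes the step 1 eligibility mapping as an input, and assumes users already placed in a sized group are removed from the pool for later groups. All names are hypothetical.

```python
import random


def assign_groups(eligible, sizes, seed=0):
    """Allocate users to groups.

    eligible: {user_id: set of group names the user may join} (output of step 1)
    sizes:    {group name: size, or None for an unsized group}
    Returns (assignment, warnings); assignment maps user_id -> group name.
    """
    rng = random.Random(seed)
    n_groups = len(sizes)

    # Step 2: flag users not eligible for all, exactly one, or all but one group.
    warnings = [
        uid
        for uid, groups in sorted(eligible.items())
        if len(groups) not in {1, n_groups - 1, n_groups}
    ]

    assignment = {}

    # Step 3: fill sized groups in alphabetical order, sampling without replacement.
    for name in sorted(g for g, size in sizes.items() if size is not None):
        pool = sorted(
            u for u, gs in eligible.items() if name in gs and u not in assignment
        )
        if len(pool) < sizes[name]:
            raise ValueError(
                f"group {name!r} needs {sizes[name]} users, only {len(pool)} eligible"
            )
        for uid in rng.sample(pool, sizes[name]):
            assignment[uid] = name

    # Step 4: place remaining users, in user ID order, into unsized groups.
    unsized = {g for g, size in sizes.items() if size is None}
    for uid in sorted(eligible):
        if uid in assignment:
            continue
        options = sorted(eligible[uid] & unsized)
        if options:
            assignment[uid] = rng.choice(options)

    return assignment, warnings


users = {f"u{i:02d}": {"a", "b", "rest"} for i in range(10)}
assignment, warnings = assign_groups(users, {"a": 3, "b": 3, "rest": None}, seed=42)
print(sorted(assignment.items()))
print(warnings)
```

Sorting the candidate pools and user IDs before drawing from the seeded generator is what makes repeated runs with the same seed produce the same allocation.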

Recommender Specification

The recommenders table specifies the different recommendation treatments to which a user group can be assigned. Each recommender is named, and its name is used as its key in the recommenders table. There is a designated recommender, baseline, that is the POPROX baseline experience.

Important

It is an error for an experiment to specify a recommender named baseline.

Each recommender is defined by an endpoint implementing the POPROX management and recommendation APIs, defined in experimenter endpoints. For example, the following defines a recommender named “DivMMR”:

[recommenders.DivMMR]
endpoint = "https://poprox.cloud.drexel.edu/newsrec2024/mmr/"
modifies = "ranking"

The following fields are defined:

  • endpoint: the URL to the REST endpoint.

  • modifies: which stage(s) of the recommendation process are modified in this experiment.

  • description: a brief (one-line) description for display in reports and dashboards.
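A manifest checker could enforce the reserved name and do a basic sanity check on endpoints, roughly as sketched below. Treating endpoint as required and restricting it to absolute HTTP(S) URLs are assumptions made for this sketch; the function name is hypothetical.

```python
from urllib.parse import urlparse


def validate_recommenders(recommenders):
    """Return a list of problems found in the parsed recommenders table."""
    problems = []
    for name, spec in recommenders.items():
        if name == "baseline":
            problems.append("'baseline' is reserved for the POPROX baseline experience")
            continue
        # Assumption: every recommender needs an absolute HTTP(S) endpoint URL.
        url = urlparse(spec.get("endpoint", ""))
        if url.scheme not in ("http", "https") or not url.netloc:
            problems.append(f"{name}: endpoint must be an absolute HTTP(S) URL")
    return problems


recommenders = {
    "DivMMR": {
        "endpoint": "https://poprox.cloud.drexel.edu/newsrec2024/mmr/",
        "modifies": "ranking",
    }
}
print(validate_recommenders(recommenders))  # []
```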

The modifies Key

The modifies key deserves a little more discussion. This indicates to POPROX how the recommender deviates from the baseline experience.

For example, specifying ranking says that the experiment only modifies the ranking logic, but candidate selection and final result display are using the POPROX standard implementations.

Experimenter implementations are required to provide complete results, but modifies will be used in validation to ensure that the results of the recommender are the equivalent of what would be produced using POPROX implementations for the unmodified components. This is primarily to help audit and validate experiments, to make sure that experimenters are using the current versions of baseline components, and that their logic is not accidentally modifying parts of the recommendation experience they didn’t intend to modify (to the extent we can detect).

Note

Validation of results with respect to modifies is best-effort. For example, if the algorithm specifies that it modifies ranking, we don’t have a way to validate that the ranking used the baseline candidate selector. We can validate that the recommender used the baseline result display logic.

Measurements

Todo

Write out the measurement spec.

Phases and Assignments

The final piece of an experiment is the phases. As noted in the experiment model, the experiment progresses through a series of phases, and in each phase, each group is assigned to recommendations and measurements.

Note

In POPROX 1.0, phases proceed in lockstep. Future POPROX versions may allow users or groups to progress through the experiment at different speeds based on things like click activity.

The phases table defines these phases. It begins by specifying the phases in order:

[phases]
sequence = [
    "experiment",
    "followup",
]

The sequence field is a list of phase names in the order in which they proceed.

Defining Phases

Each phase is defined as a separate entry in the phases table, with the following fields:

  • assignments: a table of assignments, each of which assigns a group to a recommender and/or measurements.

  • measures: a list of measurements applied to all users in the study.

The keys of the assignments table are the group names or the special name default, which assigns all groups that aren’t specifically assigned. Each value is another table with the following keys:

  • recommender: the recommender to assign the group to.

  • measures: a list of group-specific measurements.

For example, the following will assign groups a and b to the baseline and DivMMR recommenders, respectively:

[phases.experiment.assignments.a]
recommender = "baseline"

[phases.experiment.assignments.b]
recommender = "DivMMR"
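The fallback to the special default entry can be sketched as a lookup over a parsed phase table. The function name is hypothetical; mapping a group to None when it has neither its own entry nor a default, and rejecting assignments for undefined groups, are assumptions of this sketch rather than documented behavior.

```python
def resolve_assignments(phase, group_names):
    """Return {group: assignment table}, falling back to the 'default' entry."""
    assignments = phase.get("assignments", {})
    unknown = set(assignments) - set(group_names) - {"default"}
    if unknown:
        raise ValueError(f"assignments for undefined groups: {sorted(unknown)}")
    fallback = assignments.get("default")
    # A group with no entry and no default maps to None (assumption).
    return {group: assignments.get(group, fallback) for group in group_names}


phase = {
    "assignments": {
        "a": {"recommender": "baseline"},
        "default": {"recommender": "DivMMR"},
    }
}
for group, assignment in resolve_assignments(phase, ["a", "b", "c"]).items():
    print(group, assignment["recommender"])
# a baseline
# b DivMMR
# c DivMMR
```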

The measures list, either on a phase or on an assignment, specifies the survey measurements to take. These can be either standard surveys or surveys specified in the experiment.

Todo

Finish this, and also review with survey team.