Manifest Format#
POPROX experiments are described in a manifest format. Manifests are specified in TOML, and typically written as Markdown files with embedded TOML code blocks that are extracted to form the manifest data (like literate programming). See Experiment Model for a description of the model.
This document specifies the TOML layout for the manifest data.
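As a rough illustration of the literate-programming extraction step, the sketch below pulls TOML code blocks out of a Markdown document and concatenates them. It assumes the blocks are fenced and tagged `toml`; the function name `extract_toml` and the sample document are hypothetical, not part of POPROX.

```python
import re

# Assumption: manifest TOML lives in fenced code blocks tagged "toml".
FENCE = "`" * 3
TOML_BLOCK = re.compile(
    r"^" + FENCE + r"toml[ \t]*\n(.*?)^" + FENCE + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_toml(markdown: str) -> str:
    """Concatenate the bodies of all TOML code blocks, in document order."""
    return "\n".join(m.group(1) for m in TOML_BLOCK.finditer(markdown))


# Hypothetical manifest document with prose between two TOML blocks.
doc = "\n".join([
    "# My experiment",
    "",
    FENCE + "toml",
    "[experiment]",
    'status = "draft"',
    FENCE,
    "",
    "Some prose between the blocks.",
    "",
    FENCE + "toml",
    "[owner]",
    'team_id = "3C435603-63FD-4F96-A0D5-ADE527B43A01"',
    FENCE,
])
manifest_toml = extract_toml(doc)
```

The resulting string can then be handed to any TOML parser (for example `tomllib.loads` on Python 3.11 or later) to obtain the manifest data.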
Warning
This document is not current — it is Michael’s initial draft, but the specification has evolved since then.
Experiment Metadata#
Each experiment has metadata describing the experiment and the team responsible for it. This is typically at the top of the Markdown file, although TOML does not impose an order.
experiment table#
The experiment table contains the basic experiment metadata. For example:
[experiment]
# this ID is provided by POPROX when you register an experiment
id = "37DFE15F-2DBE-4F5C-AC18-AFC5031747FA"
# a short description that will appear in POPROX dashboards, etc.
description = "Better news through diversified transmogrification"
status = "draft"
The following fields are defined:
id: the experiment UUID.
description: a short description for display in the POPROX interface.
status: the experiment status (coordinate with the POPROX team on updates to this).
predecessor: the UUID of the previous experiment, for linked experiments.
pred_aliases: a table mapping short alias strings to UUIDs for experiments in this experiment’s predecessor chain.
random.seed: a random seed for the randomized elements of experiment setup. This is primarily for internal purposes, to make user allocation deterministic while debugging. The random seed will be passed to seedbank and used to initialize a NumPy Generator.
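A minimal sketch of how the UUID-valued fields above might be checked once the table is parsed into a dictionary. The helper names are hypothetical, and treating id as required is an assumption; this draft does not say which fields are mandatory.

```python
import uuid


def is_uuid(value) -> bool:
    """Return True if value is a string that parses as a UUID."""
    try:
        uuid.UUID(value)
        return True
    except (ValueError, TypeError, AttributeError):
        return False


def check_experiment(table: dict) -> list:
    """Collect problems with the UUID-valued fields of the experiment table."""
    problems = []
    # Assumption: id is required (the draft does not state this explicitly).
    if not is_uuid(table.get("id")):
        problems.append("id must be a UUID string")
    if "predecessor" in table and not is_uuid(table["predecessor"]):
        problems.append("predecessor must be a UUID string")
    for alias, target in table.get("pred_aliases", {}).items():
        if not is_uuid(target):
            problems.append(f"pred_aliases.{alias} must be a UUID string")
    return problems


example = {
    "id": "37DFE15F-2DBE-4F5C-AC18-AFC5031747FA",
    "description": "Better news through diversified transmogrification",
    "status": "draft",
}
problems = check_experiment(example)
```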
Todo
List valid status objects
owner table#
The owner table specifies the team responsible for the experiment:
[owner]
# obtained from the account information
team_id = "3C435603-63FD-4F96-A0D5-ADE527B43A01"
Right now, team_id is the only supported field.
User Specification#
User specification has 3 components:
Tags: specifying tags that can apply to users and the criteria for applying them
Filters: the filters for including or excluding users
Groups: the different user groups
User Expressions#
Both tags and filters use the Common Expression Language to express conditions over user attributes. These expressions are evaluated when the user selection and group assignment runs.
The full set of attributes is defined elsewhere. Experiment-independent attributes are defined on the user object; commonly used ones include:
user.active: whether the user meets POPROX’s definition of an active user
user.account_age: the user’s account age in days
user.state: the state in which the user lives
user.age: the user’s age in years
user.tags: an object mapping tags to booleans, where a tag maps to true if it is present.
user.n_unique_articles: the number of unique articles the user has opened
user.last.month.n_unique_articles: the number of unique articles the user has opened in the last month
The last accessor has several subproperties for different analysis periods:
week: the last week
month: the last month
quarter: the last 3 months
year: the last year
For example, an experiment to see if a new recommendation can increase activity of low-activity users may want to select users who have not clicked any articles in the last month:
filter = "user.last.month.n_unique_articles == 0"
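To make the shape of the user object concrete, here is a plain-Python stand-in for a few of the attributes above, with a predicate equivalent to the example filter. This is not CEL evaluation; the class names and sample users are hypothetical, and only a subset of attributes is modeled.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Activity within one analysis period."""
    n_unique_articles: int


@dataclass
class Last:
    """The analysis periods exposed under user.last."""
    week: Window
    month: Window
    quarter: Window
    year: Window


@dataclass
class User:
    active: bool
    n_unique_articles: int
    last: Last


def no_recent_clicks(user: User) -> bool:
    # Python equivalent of: user.last.month.n_unique_articles == 0
    return user.last.month.n_unique_articles == 0


# Hypothetical users: one with no opens in the last month, one with several.
lapsed = User(True, 40, Last(Window(0), Window(0), Window(3), Window(40)))
engaged = User(True, 40, Last(Window(2), Window(9), Window(20), Window(40)))
```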
Results of predecessor experiment(s) can be accessed in two ways:
The prev_exp object has the results from the predecessor experiment (if one is defined).
The past_exps object has the results from all experiments in the predecessor chain (e.g. predecessor of the predecessor). If pred_aliases is defined, then this is an object whose attribute names are the specified aliases and whose values are the experiment results.
The attributes and values depend on the precise experiment, but include the results of measurements.
Todo
Define full set!
Todo
Define how precisely we export measurements from one experiment to another.
Note
Investigate whether missing attributes are null or an error for tags.
General User Options#
The users table has several fields:
size (uncommon): the number of users to select.
tags: the tag specifications
filter: the user filter specification
groups (required): the user group specification
If size is specified, then a random set of users of the specified size is selected from the users who pass the filters. If size is not specified, it is derived from the sizes of the groups; this is the most common design.
Tagging Users#
Users can be tagged, and these tags can be used in other filters and selections. Tags are specified using the users.tags table; each key is a tag name, and its value is a table with the following fields:
include: a CEL condition defining the users to tag with this tag
exclude: a CEL condition defining users to exclude
A user is tagged with the tag if they both satisfy the inclusion criteria and do not satisfy the exclusion criteria.
Important
The user.tags field is not available in the tag expressions, because it hasn’t been computed yet.
The following defines two very simple, mutually exclusive tags for the recent activity level of users:
[users.tags.high]
include = "user.last.month.n_unique_articles >= 10"
exclude = "!user.active"
[users.tags.low]
include = "user.last.month.n_unique_articles < 10"
exclude = "!user.active"
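The include/exclude rule for tags can be sketched as follows. Python callables stand in for the CEL conditions, the user is a plain dictionary with hypothetical keys, and treating a missing include as "everyone" is an assumption this draft does not settle.

```python
def compute_tags(user: dict, tag_specs: dict) -> dict:
    """Map each tag name to True if the user satisfies include and not exclude."""
    tags = {}
    for name, spec in tag_specs.items():
        include = spec.get("include")
        exclude = spec.get("exclude")
        # Assumption: a missing include matches everyone, a missing exclude no one.
        included = include(user) if include else True
        excluded = exclude(user) if exclude else False
        tags[name] = included and not excluded
    return tags


# Callables mirroring the high/low tags above ("month_articles" is a
# hypothetical stand-in for user.last.month.n_unique_articles).
tag_specs = {
    "high": {
        "include": lambda u: u["month_articles"] >= 10,
        "exclude": lambda u: not u["active"],
    },
    "low": {
        "include": lambda u: u["month_articles"] < 10,
        "exclude": lambda u: not u["active"],
    },
}

active_reader = {"active": True, "month_articles": 12}
inactive_reader = {"active": False, "month_articles": 12}
```

Note that an inactive user receives neither tag, since both definitions exclude inactive users.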
Filtering Users#
Users can also be filtered for inclusion into the experiment. Filters are specified with the users.filter table:
[users.filter]
include = "…"
exclude = "…"
in_experiment = ""
active = true
tagged_with_any = ["tag1", "tag2"]
tagged_with_all = ["tag1", "tag2"]
All fields are optional (as is the entire users.filter table). The fields are defined as follows:
include: a CEL condition defining the users to include. Can be a list, in which case the conditions are combined with AND.
exclude: a CEL condition defining the users to exclude. Can be a list, in which case the conditions are combined with AND.
in_experiment: an experiment UUID or alias, matching users who were in the specified experiment.
active: a convenience condition; if true, only active users are included (if false, only inactive users are included).
tagged_with_any: a list of tags; a user is included if they are tagged with any of the tags.
tagged_with_all: a list of tags; a user is included only if they are tagged with all specified tags.
Users are included if and only if they satisfy all criteria. As a shorthand, if filter is specified as a string rather than a table, it is equivalent to a filter with that string in the include value (allowing the filter to be specified as a single CEL expression).
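A sketch of how a loader might normalize the string shorthand and evaluate the non-CEL fields. The function names are hypothetical, the user is a plain dictionary, and CEL evaluation of include and exclude is deliberately left out.

```python
def normalize_filter(spec) -> dict:
    """Expand the string shorthand and make include/exclude always lists."""
    if spec is None:
        return {}
    if isinstance(spec, str):
        # Shorthand: a bare string is the include condition.
        spec = {"include": spec}
    out = dict(spec)
    for key in ("include", "exclude"):
        if isinstance(out.get(key), str):
            out[key] = [out[key]]
    return out


def passes_simple_fields(user: dict, flt: dict) -> bool:
    """Check the active and tagged_with_* criteria; all present criteria must hold."""
    if "active" in flt and user["active"] != flt["active"]:
        return False
    # user["tags"] maps tag names to booleans, like user.tags.
    present = {tag for tag, on in user["tags"].items() if on}
    any_of = flt.get("tagged_with_any")
    if any_of is not None and not present.intersection(any_of):
        return False
    all_of = flt.get("tagged_with_all")
    if all_of is not None and not present.issuperset(all_of):
        return False
    return True


sample_user = {"active": True, "tags": {"high": True, "low": False}}
```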
Grouping Users#
Experiments almost always require dividing users into groups; an experiment is required to have at least one group.
Groups are named, and are defined under the users.groups table, as sub-tables named with the group name. For example, the following will select 2 identical user groups a and b of size 25:
[users.groups.a]
size = 25
[users.groups.b]
identical_to = "a"
The following keys are available for groups:
identical_to: the name of a group whose settings are used for this group. If specified, no other keys can be specified. This is to make it easy to make multiple identical groups.
size: the number of users to put in this group.
filter: a filter for inclusion in the group, of the same format as users.filter.
strata: a specification of strata for stratified sampling, based on a set of mutually exclusive tags.
Important
The name default is reserved and cannot be used to name a group.
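One way identical_to could be resolved after parsing, shown as a sketch. The function name is hypothetical, and allowing chains of identical_to references (while rejecting cycles) is an assumption; the draft only shows a single hop.

```python
def resolve_groups(groups: dict) -> dict:
    """Replace identical_to references with a copy of the referenced settings."""
    resolved = {}
    for name, spec in groups.items():
        if name == "default":
            raise ValueError("the group name 'default' is reserved")
        seen = [name]
        while "identical_to" in spec:
            if len(spec) > 1:
                raise ValueError(
                    f"group {seen[-1]!r}: identical_to cannot be combined with other keys"
                )
            target = spec["identical_to"]
            if target not in groups:
                raise ValueError(f"group {seen[-1]!r} refers to unknown group {target!r}")
            if target in seen:
                raise ValueError(f"identical_to cycle involving {target!r}")
            seen.append(target)
            # Assumption: follow chains until a concrete definition is found.
            spec = groups[target]
        resolved[name] = dict(spec)
    return resolved


groups = {"a": {"size": 25}, "b": {"identical_to": "a"}}
resolved = resolve_groups(groups)
```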
Stratified Sampling#
Stratified sampling is handled through mutually-exclusive tags; in order to use it, you first need to specify user tags.
There are two ways to specify stratified samples, both of which use the strata field:
If strata is a list of tags, then an equal number of users are sampled from each tag. The size must be divisible by the number of tags.
If strata is a table, then its keys are tags, and its values are the number of users to sample with that tag.
For example, to do a stratified sample with 50 high-activity and 100 low-activity users using the tags specified above, we can write the following:
[users.groups.a.strata]
high = 50
low = 100
The strata specification will be propagated through identical_to, so both groups a and b will have identical stratified samples.
If the tags specified in strata are not mutually exclusive (there exists a user who passes the filters and has more than one strata tag), then the user assignment will fail with an error.
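The two strata forms and the mutual-exclusivity check can be sketched like this. Function names and the input shapes (a mapping from user ID to the set of tags the user carries) are hypothetical.

```python
def strata_counts(strata, size=None) -> dict:
    """Return the number of users to sample per tag for either strata form."""
    if isinstance(strata, dict):
        # Table form: counts are given explicitly.
        return dict(strata)
    if not strata:
        raise ValueError("strata list must not be empty")
    if size is None:
        raise ValueError("a strata list requires the group size")
    if size % len(strata) != 0:
        raise ValueError("size must be divisible by the number of strata tags")
    per_tag = size // len(strata)
    return {tag: per_tag for tag in strata}


def check_exclusive(user_tags: dict, strata_tags) -> None:
    """Fail if any user carries more than one of the strata tags."""
    wanted = set(strata_tags)
    for user_id, tags in user_tags.items():
        hits = sorted(wanted & set(tags))
        if len(hits) > 1:
            raise ValueError(f"user {user_id} has multiple strata tags: {hits}")


counts = strata_counts({"high": 50, "low": 100})
group_size = sum(counts.values())
```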
Group Sizes#
In almost all cases, one of the following must be true:
users.size is specified
the experiment has a predecessor
each group has a size, either from size, identical_to, or strata
The only exceptions are for POPROX internal use, or carefully-designed experiments in collaboration with POPROX.
Important
When we implement this, we need to throw a very loud warning if we don’t have sizes.
Users are assigned to groups in the following way:
1. Compute the set of groups that can contain each user.
2. Check the user/group mapping. Each user should be eligible for all groups, for precisely one group, or for all but one group; if this is not satisfied, allocation will raise a warning for review.
3. Fill each sized group, in alphabetical order, by sampling the specified number of users (possibly stratified) uniformly at random without replacement from the set of eligible users.
4. Allocate each unallocated user, in lexicographical order of user ID, among the unsized groups for which they are eligible; if a user is eligible for more than one group, select the group at random.
Note
The all-but-1 group option in (2) is to allow for designs where we allocate the users matching criteria to a set of groups, and have a last group to specify what to do with users who don’t match the normal allocation criteria.
Note
The ordering specifications in (3) and (4) are just to make the process deterministic for debugging and auditing.
Note
At implementation, we need good ways to audit group membership.
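The allocation procedure in this section can be sketched as below. This is an illustration under several assumptions, not the POPROX implementation: eligibility is taken as precomputed input, stratification is omitted, sized groups sample only from users not yet allocated, and the standard-library random.Random stands in for the seeded NumPy Generator. All names are hypothetical.

```python
import random
import warnings


def assign_groups(eligibility: dict, sizes: dict, seed: int) -> dict:
    """Assign users to groups.

    eligibility maps user ID -> set of group names the user may join
    (the result of step 1); sizes maps group name -> size, or None for
    an unsized group.
    """
    rng = random.Random(seed)
    groups = sorted(sizes)
    n_groups = len(groups)

    # Step 2: each user should fit all groups, exactly one, or all but one.
    allowed = {1, n_groups - 1, n_groups} - {0}
    for user_id in sorted(eligibility):
        if len(eligibility[user_id]) not in allowed:
            warnings.warn(f"user {user_id} has an unusual group eligibility")

    assignment = {}

    # Step 3: fill sized groups in alphabetical order, without replacement.
    for group in groups:
        size = sizes[group]
        if size is None:
            continue
        # Assumption: users already placed in an earlier group are not reused.
        pool = sorted(
            u for u, elig in eligibility.items()
            if group in elig and u not in assignment
        )
        if len(pool) < size:
            raise ValueError(f"not enough eligible users for group {group!r}")
        for user_id in rng.sample(pool, size):
            assignment[user_id] = group

    # Step 4: place remaining users, in user-ID order, into unsized groups.
    unsized = [g for g in groups if sizes[g] is None]
    for user_id in sorted(eligibility):
        if user_id in assignment:
            continue
        options = [g for g in unsized if g in eligibility[user_id]]
        if options:
            assignment[user_id] = rng.choice(options)
    return assignment


users = {f"u{i:02d}": {"a", "b", "rest"} for i in range(10)}
sizes = {"a": 3, "b": 3, "rest": None}
assignment = assign_groups(users, sizes, seed=42)
```

Sorting the candidate pool and iterating users in ID order means the same seed always yields the same assignment, which is the property the ordering rules are meant to provide.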
Recommender Specification#
The recommenders table specifies the different recommendation treatments to which a user group can be assigned. Each recommender is named, and its name is used as its key in the recommenders table. There is a designated recommender, baseline, that is the POPROX baseline experience.
Important
It is an error for an experiment to specify a recommender named baseline.
Each recommender is defined by an endpoint implementing the POPROX management and recommendation APIs, defined in experimenter endpoints. For example, the following defines a recommender named “DivMMR”:
[recommenders.DivMMR]
endpoint = "https://poprox.cloud.drexel.edu/newsrec2024/mmr/"
modifies = "ranking"
The following fields are defined:
endpoint: the URL to the REST endpoint.
modifies: which stage(s) of the recommendation process are modified in this experiment.
description: a brief (one-line) description for display in reports and dashboards.
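A small sketch of checks a manifest loader might run on the recommenders table. The function name is hypothetical; requiring an endpoint and restricting it to http or https URLs are assumptions not stated in this draft.

```python
from urllib.parse import urlparse


def check_recommenders(recommenders: dict) -> list:
    """Collect problems with the recommenders table."""
    problems = []
    for name, spec in recommenders.items():
        if name == "baseline":
            problems.append("'baseline' is reserved for the POPROX baseline experience")
        # Assumption: every recommender needs an http(s) endpoint URL.
        url = urlparse(spec.get("endpoint", ""))
        if url.scheme not in ("http", "https") or not url.netloc:
            problems.append(f"{name}: endpoint must be an http(s) URL")
    return problems


recommenders = {
    "DivMMR": {
        "endpoint": "https://poprox.cloud.drexel.edu/newsrec2024/mmr/",
        "modifies": "ranking",
    }
}
recommender_problems = check_recommenders(recommenders)
```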
The modifies Key#
The modifies key deserves a little more discussion. It indicates to POPROX how the recommender deviates from the baseline experience.
For example, specifying ranking says that the experiment only modifies the ranking logic, while candidate selection and final result display use the POPROX standard implementations.
Experimenter implementations are required to provide complete results, but modifies will be used in validation to ensure that the results of the recommender are equivalent to what would be produced using POPROX implementations for the unmodified components. This is primarily to help audit and validate experiments, to make sure that experimenters are using the current versions of baseline components, and that their logic is not accidentally modifying parts of the recommendation experience they didn’t intend to modify (to the extent we can detect).
Note
Validation of results with respect to modifies is best-effort. For example, if the algorithm specifies that it modifies ranking, we don’t have a way to validate that the ranking used the baseline candidate selector. We can validate that the recommender used the baseline result display logic.
Measurements#
Todo
Write out the measurement spec.
Phases and Assignments#
The final piece of an experiment is the phases. As noted in the experiment model, the experiment progresses through a series of phases, and in each phase, each group is assigned to recommendations and measurements.
Note
In POPROX 1.0, phases proceed in lockstep. Future POPROX versions may allow users or groups to progress through the experiment at different speeds based on things like click activity.
The phases table defines these phases. It begins by specifying the phases in order:
[phases]
sequence = [
"experiment",
"followup",
]
The sequence field is a list of phase names in the order in which they proceed.
Defining Phases#
Each phase is defined as a separate entry in the phases table, with the following fields:
assignments: a table of assignments, each of which assigns a group to a recommender and/or measurements.
measures: a list of measurements applied to all users in the study.
The keys of the assignments table are the group names or the special name default, which assigns all groups that aren’t specifically assigned.
Each value is another table with the following keys:
recommender: the recommender to assign the group to.
measures: a list of group-specific measurements.
For example, the following will assign groups a and b to the baseline and DivMMR recommenders, respectively:
[phases.experiment.assignments.a]
recommender = "baseline"
[phases.experiment.assignments.b]
recommender = "DivMMR"
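How the default assignment might be resolved for each group is sketched below. The function name and survey names are hypothetical. Two behaviors are assumptions rather than part of this draft: phase-level measures are combined with group-specific ones, and a group with no assignment and no default is treated as an error.

```python
def resolve_assignments(phase: dict, group_names) -> dict:
    """Work out the recommender and measures for every group in one phase."""
    assignments = phase.get("assignments", {})
    default = assignments.get("default")
    phase_measures = phase.get("measures", [])
    resolved = {}
    for group in group_names:
        # Groups without their own entry fall back to the default entry.
        spec = assignments.get(group, default)
        if spec is None:
            # Assumption: an unassigned group with no default is an error.
            raise ValueError(f"group {group!r} has no assignment and no default")
        resolved[group] = {
            "recommender": spec.get("recommender"),
            # Assumption: phase-wide measures come first, then group-specific ones.
            "measures": list(phase_measures) + list(spec.get("measures", [])),
        }
    return resolved


phase = {
    "measures": ["weekly_survey"],
    "assignments": {
        "a": {"recommender": "baseline"},
        "default": {"recommender": "DivMMR", "measures": ["exit_survey"]},
    },
}
phase_plan = resolve_assignments(phase, ["a", "b"])
```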
The measures list, either on a phase or on an assignment, specifies the survey measurements to take. These can be either standard surveys or surveys specified in the experiment.
Todo
Finish this, and also review with survey team.