adopy.base

This module contains the three basic classes of ADOpy: Task, Model, and Engine. These classes provide the built-in functionality for Adaptive Design Optimization.

Note

The three basic classes are defined in the adopy.base module (i.e., adopy.base.Task, adopy.base.Model, and adopy.base.Engine). However, for convenience, users can import them directly as adopy.Task, adopy.Model, and adopy.Engine.

from adopy import Task, Model, Engine
# works the same as
from adopy.base import Task, Model, Engine

Task

class adopy.base.Task(designs, responses, name=None)
Bases: object

A task object stores information for a specific experimental task, including labels of design variables (designs), labels of response variables (responses), and the task name (name).

Changed in version 0.4.0: The responses argument is changed to the labels of response variables, instead of the possible values of a single response variable.

Parameters
designs – Labels of design variables in the task.
responses – Labels of response variables in the task (e.g., choice, rt).
name – Name of the task.
Examples
>>> task = Task(name='Task A',
...             designs=['d1', 'd2'],
...             responses=['choice'])
>>> task
Task('Task A', designs=['d1', 'd2'], responses=['choice'])
>>> task.name
'Task A'
>>> task.designs
['d1', 'd2']
>>> task.responses
['choice']

property name
Name of the task. If it has no name, returns None.

property designs
Labels for design variables of the task.

property responses
Labels of response variables in the task.

extract_responses(data)
Extract response grids from the given data.

Parameters
data – A data object that contains key-value pairs or columns corresponding to the response variables.

Returns
ret – An ordered dictionary of grids for response variables.
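
As a rough illustration (not from the reference itself), a pandas DataFrame whose columns include the task's response labels is assumed to work as the data object; the values below are made up:

import pandas as pd
from adopy import Task

task = Task(name='Task A', designs=['d1', 'd2'], responses=['choice'])

# Hypothetical trial-by-trial data with one column per design and response label.
data = pd.DataFrame({
    'd1': [0.1, 0.2, 0.3],
    'd2': [1.0, 2.0, 3.0],
    'choice': [0, 1, 1],
})

grids = task.extract_responses(data)  # an ordered dictionary of grids for the response variables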

Model

class adopy.base.Model(task, params, func=None, name=None)
Bases: object

A base class for a model in the ADOpy package.

Its initialization requires up to 4 arguments: task, params, func (optional), and name (optional).

task is an instance of the adopy.base.Task class that this model is for. params is a list of model parameters, given as a list of their labels, e.g., ['alpha', 'beta']. name is the name of this model, which is optional for its functioning.

The most important argument is func, which calculates the log likelihood given design values, parameter values, and response values as its input. The arguments of the function should include the design variables and response variables (defined in the task instance) and the model parameters (given as params). The order of the arguments does not matter. If func is not given, the model provides the log likelihood of a random noise. A simple example is given as follows:

def compute_log_lik(design1, design2,
                    param1, param2, param3,
                    response1, response2):
    # ... calculate the log likelihood ...
    return log_lik

Warning

Since version 0.4.0, the func argument should be defined to compute the log likelihood, instead of the probability of a binary response variable. Also, it should include the response variables as arguments. These changes might break existing code written for previous versions of ADOpy.

Changed in version 0.4.0: The func argument is changed to the log likelihood function, instead of the probability function of a single binary response.

Parameters
task (Task) – Task object that this model object is for.
params – Labels of model parameters in the model.
func (function, optional) – A function to compute the log likelihood given a model, denoted as \(L(\mathbf{x} | \mathbf{d}, \mathbf{\theta})\), where \(\mathbf{x}\) is a response vector, \(\mathbf{d}\) is a design vector, and \(\mathbf{\theta}\) is a parameter vector. Note that the function arguments should include all design, parameter, and response variables.
name (Optional[str]) – Name of the model.
Examples
>>> task = Task(name='Task A', designs=['x1', 'x2'], responses=['y'])
>>> def calculate_log_lik(y, x1, x2, b0, b1, b2):
...     import numpy as np
...     from scipy.stats import bernoulli
...     logit = b0 + b1 * x1 + b2 * x2
...     p = np.divide(1, 1 + np.exp(-logit))
...     return bernoulli.logpmf(y, p)
>>> model = Model(name='Model X', task=task, params=['b0', 'b1', 'b2'],
...               func=calculate_log_lik)
>>> model.name
'Model X'
>>> model.task
Task('Task A', designs=['x1', 'x2'], responses=['y'])
>>> model.params
['b0', 'b1', 'b2']
>>> model.compute(y=1, x1=1, x2=-1, b0=1, b1=0.5, b2=0.25)
-0.251929081345373
>>> calculate_log_lik(y=1, x1=1, x2=-1, b0=1, b1=0.5, b2=0.25)
-0.251929081345373

property name
Name of the model. If it has no name, returns None.

property task
Task instance for the model.

property params
Labels for model parameters of the model.

compute(*args, **kargs)
Compute the log likelihood of obtaining responses with given designs and model parameters. This function provides the same result as the func argument given at initialization. If the likelihood function is not given for the model, it returns the log probability of a random noise.

Warning

Since version 0.4.0, the compute() function should compute the log likelihood, instead of the probability of a binary response variable. Also, it should include the response variables as arguments. These changes might break existing code written for previous versions of ADOpy.

Changed in version 0.4.0: Provide the log likelihood instead of the probability of a binary response.
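
For instance, compute() can be called with keyword arguments named after the design, parameter, and response variables, as in the Examples block above. A minimal sketch of the no-func fallback, with made-up argument values:

from adopy import Task, Model

task = Task(name='Task A', designs=['x1', 'x2'], responses=['y'])

# A model defined without `func`: compute() then returns the log probability
# of a random noise, as described above.
model_noise = Model(name='Noise model', task=task, params=['b0', 'b1'])
log_lik = model_noise.compute(y=1, x1=0.5, x2=-0.5, b0=0.0, b1=1.0)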

Engine

class adopy.base.Engine(task, model, grid_design, grid_param, grid_response, noise_ratio=1e-07, dtype=<class 'numpy.float32'>)
Bases: object

A base class for an ADO engine to compute optimal designs.
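
As a rough sketch of how an engine might be set up for the Model from the Examples above, the grid definitions are assumed here to be dictionaries mapping each variable label to its candidate values; the particular ranges and step sizes are made up for illustration:

import numpy as np
from adopy import Task, Model, Engine

task = Task(name='Task A', designs=['x1', 'x2'], responses=['y'])
model = Model(name='Model X', task=task, params=['b0', 'b1', 'b2'],
              func=calculate_log_lik)  # the log likelihood function defined in the Model examples above

engine = Engine(
    task=task,
    model=model,
    grid_design={'x1': np.linspace(0, 1, 11),    # hypothetical design grid
                 'x2': np.linspace(-1, 1, 11)},
    grid_param={'b0': np.linspace(-2, 2, 11),    # hypothetical parameter grid
                'b1': np.linspace(-2, 2, 11),
                'b2': np.linspace(-2, 2, 11)},
    grid_response={'y': [0, 1]},                 # possible binary responses
)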

property task
Task instance for the engine.

property model
Model instance for the engine.

property grid_design
Grid space for design variables, generated from the grid definition given as grid_design at initialization.

property grid_param
Grid space for model parameters, generated from the grid definition given as grid_param at initialization.

property grid_response
Grid space for response variables, generated from the grid definition given as grid_response at initialization.

property log_prior
Log prior probabilities on the grid space of model parameters, \(\log p_0(\theta)\). These log probabilities correspond to the grid points defined in grid_param.

property log_post
Log posterior probabilities on the grid space of model parameters, \(\log p(\theta)\). These log probabilities correspond to the grid points defined in grid_param.

property prior
Prior probabilities on the grid space of model parameters, \(p_0(\theta)\). These probabilities correspond to the grid points defined in grid_param.

property post
Posterior probabilities on the grid space of model parameters, \(p(\theta)\). These probabilities correspond to the grid points defined in grid_param.

property marg_post
Marginal posterior distributions for each parameter.

property log_lik
Log likelihood \(\log p(y | d, \theta)\) for all discretized values of \(y\), \(d\), and \(\theta\).

property marg_log_lik
Marginal log likelihood \(\log p(y | d)\) for all discretized values of \(y\) and \(d\).

property ent
Entropy \(H(Y(d) | \theta) = -\sum_y p(y | d, \theta) \log p(y | d, \theta)\) for all discretized values of \(d\) and \(\theta\).

property ent_marg
Marginal entropy \(H(Y(d)) = -\sum_y p(y | d) \log p(y | d)\) for all discretized values of \(d\), where \(p(y | d)\) indicates the marginal likelihood.

property ent_cond
Conditional entropy \(H(Y(d) | \Theta) = \sum_\theta p(\theta) H(Y(d) | \theta)\) for all discretized values of \(d\), where \(p(\theta)\) indicates the posterior distribution for model parameters.

property mutual_info
Mutual information \(I(Y(d); \Theta) = H(Y(d)) - H(Y(d) | \Theta)\), where \(H(Y(d))\) indicates the marginal entropy and \(H(Y(d) | \Theta)\) indicates the conditional entropy.
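
Conceptually, the optimal design returned by get_design('optimal') corresponds to the point on the design grid with the largest mutual information. A rough sketch of that relationship, assuming mutual_info is an array indexed along the design grid of the engine built above:

import numpy as np

# Index of the design-grid point maximizing the mutual information (a conceptual sketch).
best_idx = np.argmax(engine.mutual_info)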

property post_mean
A vector of estimated means for the posterior distribution. Its length is num_params.

property post_cov
An estimated covariance matrix for the posterior distribution. Its shape is (num_params, num_params).

property post_sd
A vector of estimated standard deviations for the posterior distribution. Its length is num_params.
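
For the three-parameter engine sketched above, the posterior summaries could be inspected roughly like this after some updates:

# A sketch, assuming `engine` is the Engine built above and has already been
# updated with some observations:
print(engine.post_mean)   # length-3 vector of posterior means (b0, b1, b2)
print(engine.post_sd)     # length-3 vector of posterior standard deviations
print(engine.post_cov)    # 3 x 3 posterior covariance matrix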

property dtype
The desired data-type for the internal vectors and matrices, e.g., numpy.float64. Default is numpy.float32.

New in version 0.4.0.
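
The dtype is set through the constructor argument shown in the class signature above; a minimal sketch with made-up task, model, and grids:

import numpy as np
from adopy import Task, Model, Engine

task = Task(designs=['x1'], responses=['y'])
model = Model(task=task, params=['b0'])   # no func: random-noise log likelihood

engine64 = Engine(task=task, model=model,
                  grid_design={'x1': np.linspace(0, 1, 5)},
                  grid_param={'b0': np.linspace(-1, 1, 5)},
                  grid_response={'y': [0, 1]},
                  dtype=np.float64)       # internal arrays use double precision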

reset()
Reset the engine to its initial state.

get_design(kind='optimal')
Choose a design of one of the following types:

'optimal' (default): an optimal design \(d^*\) that maximizes the mutual information.
'random': a randomly chosen design.

Parameters
kind ({'optimal', 'random'}, optional) – Type of design to choose. Default is 'optimal'.

Returns
design (Dict[str, any] or None) – A chosen design vector to use for the next trial. Returns None if there is no design available.
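
For example, continuing the sketched engine from above, a design for the next trial might be requested like this:

# A sketch, assuming `engine` is the Engine instance built above.
design = engine.get_design()                 # same as engine.get_design('optimal')
random_design = engine.get_design('random')

# `design` is a dict keyed by the design labels (e.g. {'x1': ..., 'x2': ...}),
# or None if no design is available.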

update(design, response)
Update the posterior probabilities \(p(\theta | y, d^*)\) for all discretized values of \(\theta\).

\[p(\theta | y, d^*) \propto p(y | \theta, d^*) \, p(\theta)\]

# Given a design and a response as `design` and `response`,
# the engine can update the posterior probabilities with the following line:
engine.update(design, response)

It can also take multiple observations to update the posterior probabilities. Multiple pairs of designs and responses should be given as a list of designs and a list of responses, passed to the design and response arguments, respectively.

\[\begin{split}\begin{aligned} p\big(\theta | y_1, \ldots, y_n, d_1^*, \ldots, d_n^*\big) &\propto p\big( y_1, \ldots, y_n | \theta, d_1^*, \ldots, d_n^* \big) p(\theta) \\ &= p(y_1 | \theta, d_1^*) \cdots p(y_n | \theta, d_n^*) \, p(\theta) \end{aligned}\end{split}\]

# Given a list of designs and corresponding responses as below:
designs = [design1, design2, design3]
responses = [response1, response2, response3]

# the engine can update with multiple observations:
engine.update(designs, responses)

Parameters
design (dict or pandas.Series or list of designs) – Design vector for the given response.
response (dict or pandas.Series or list of responses) – Any kind of observed response.
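
Taken together, a single ADO trial loop might look roughly like the following sketch, where get_response_from_participant is a hypothetical placeholder for however the experiment actually collects a response:

# A rough sketch of an ADO experiment loop, continuing the hypothetical `engine` above.
for trial in range(100):
    design = engine.get_design('optimal')
    if design is None:
        break

    # `get_response_from_participant` is a hypothetical placeholder for the
    # experiment-specific code that presents `design` and records a response
    # (e.g. a dict like {'y': 1}).
    response = get_response_from_participant(design)

    engine.update(design, response)

print(engine.post_mean)   # posterior means of the model parameters after the loop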