
Parameter estimates and model evidence

The two critical results of any computational model are (1) how well the model fits the data (model evidence) and (2) the estimated parameters of the model. In this section, we cover both.

Parameter estimates

We load a builtin dataset and fit the default ReMeta model.

ds = remeta.load_dataset('default')
rem = remeta.ReMeta(optim_type2_gridsearch=False)
rem.fit(ds.stimuli, ds.choices, ds.confidence)

To access the results, it is recommended to first invoke the summary() method on the ReMeta instance:

result = rem.summary()

The final parameter estimates are accessible via

result.params
{'type1_noise': np.float64(0.5095607807701794), 'type1_bias': np.float64(-0.09903165092790608), 'type2_noise': np.float64(0.26434207348657335), 'type2_criteria': array([0.26752641, 0.51154427, 0.75795668])}

In addition, each fitting stage (type 1 and type 2) can be accessed separately.

result.type1.params
{'type1_noise': np.float64(0.5095607807701794), 'type1_bias': np.float64(-0.09903165092790608)}
result.type2.params
{'type2_noise': np.float64(0.26434207348657335), 'type2_criteria': array([0.26752641, 0.51154427, 0.75795668])}

Uncertainty of parameter estimates

For each parameter estimate, ReMeta computes a standard error, reflecting the uncertainty in the parameter estimate:

result.params_se
{'type1_noise': np.float64(0.017564176485066366), 'type1_bias': np.float64(0.01897664187387822), 'type2_noise': np.float64(0.031464384754221524), 'type2_criteria': array([0.01401871, 0.01208081, 0.00817252])}

The standard error is computed based on the Hessian matrix, which is the second derivative (i.e. curvature) of the log likelihood $\ell(\boldsymbol{\theta})$ evaluated at the estimated parameters $\boldsymbol{\hat{\theta}}$:

$$
H(\boldsymbol{\hat{\theta}}) = \frac{\partial^2 \ell(\boldsymbol{\theta})}{\partial \boldsymbol{\theta} \, \partial \boldsymbol{\theta}^\top} \Bigg|_{\boldsymbol{\theta} = \boldsymbol{\hat{\theta}}}
$$

The Hessian is a $k \times k$ matrix, with $k$ being the number of parameters. By inverting the negative Hessian, we obtain the covariance matrix:

$$
\hat{\mathrm{Cov}}(\boldsymbol{\hat{\theta}}) = \left( - H(\boldsymbol{\hat{\theta}}) \right)^{-1}
$$

The estimated standard error $\hat{\mathrm{SE}}$ of the $j$-th parameter $\hat{\theta}_j$ is the square root of the $j$-th diagonal element of the covariance matrix:

$$
\hat{\mathrm{SE}}(\hat{\theta}_j) = \sqrt{ \left[ \hat{\mathrm{Cov}}(\boldsymbol{\hat{\theta}}) \right]_{jj} }
$$

The Hessian of the log likelihood evaluated at $\boldsymbol{\hat{\theta}}$ measures the local curvature of the likelihood surface at that point. A steeper curvature indicates that even small deviations from $\boldsymbol{\hat{\theta}}$ are inconsistent with the observed data, and thus implies higher certainty about our particular parameter estimate. In contrast, a flatter curvature indicates that the log likelihood changes only gradually near $\boldsymbol{\hat{\theta}}$, meaning that a wider range of parameter values is consistent with the data and we are less certain about our particular parameter estimate.
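The chain from Hessian to covariance matrix to standard errors can be sketched in a few lines of NumPy. This is an illustration of the equations above, not the ReMeta internals; the Hessian values below are hypothetical (chosen for a two-parameter fit, e.g. type1_noise and type1_bias):

```python
import numpy as np

# Hypothetical 2x2 Hessian of the log likelihood at the estimate
# (values are made up for illustration):
hessian = np.array([[-3200.0,   150.0],
                    [  150.0, -2800.0]])

# Covariance matrix: inverse of the negative Hessian
cov = np.linalg.inv(-hessian)

# Standard errors: square roots of the diagonal elements
se = np.sqrt(np.diag(cov))

print(se)
```

A steeper curvature (larger magnitude of the diagonal Hessian entries) shrinks the diagonal of the covariance matrix and thus the standard errors, matching the intuition given above.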

Subject versus group level

When the model fit includes group-level data (with random or fixed effects), a third level in the result object becomes relevant, which separates subject-level and group-level information. For this purpose, we load a built-in dataset with three participants:

ds2 = remeta.load_dataset('group')
----------------------------------
..Generative model:
    Type 1 noise distribution: normal
    Type 2 noise type: report
    Type 2 noise distribution: beta_mode
..Generative parameters:
    type1_noise: 0.5
    type1_bias: -0.1
    type2_noise: 0.3
    type2_criteria: [0.25 0.5  0.75]
        [extra] Criterion bias: 0.0000
        [extra] Criterion-based confidence bias: 0.0000
..Descriptive statistics:
    No. subjects: 3
    No. samples: 1000
    Accuracy: 85.6% correct
    d': 2.2
    Choice bias: -4.1%
    Confidence: 0.53
    M-Ratio: 0.28
    AUROC2: 0.59
----------------------------------

For this dataset, we fit the type1_bias as a random effect:

cfg = remeta.Configuration()
cfg.param_type1_bias.group = 'random'
cfg.optim_type2_gridsearch = False
rem = remeta.ReMeta(cfg)
rem.fit(ds2.stimuli, ds2.choices, ds2.confidence)
result2 = rem.summary()

Group-level parameters are first fitted at an individual level, to provide suitable initial values for the group-level estimate. The result object always contains information for both levels, accessible as follows.

result2.subject.params
[{'type1_noise': np.float64(0.46208938432786706), 'type1_bias': np.float64(-0.09931352775816989), 'type2_noise': np.float64(0.31585988970367485), 'type2_criteria': array([0.26465114, 0.50571586, 0.76037933])}, {'type1_noise': np.float64(0.4739633340174307), 'type1_bias': np.float64(-0.08224188059838966), 'type2_noise': np.float64(0.27279449366817915), 'type2_criteria': array([0.26281465, 0.48883183, 0.75365125])}, {'type1_noise': np.float64(0.5094581873264677), 'type1_bias': np.float64(-0.13417843272424754), 'type2_noise': np.float64(0.30140098040048996), 'type2_criteria': array([0.22286925, 0.50141797, 0.76257958])}]
result2.group.params
[{'type1_noise': np.float64(0.4620991504236927), 'type1_bias': np.float64(-0.10397343552490379)}, {'type1_noise': np.float64(0.47447815250071174), 'type1_bias': np.float64(-0.10188457437583853)}, {'type1_noise': np.float64(0.5098555076382547), 'type1_bias': np.float64(-0.10805906886853289)}]

In this case, the type 2 stage was not fitted at the group level, and thus the group-level parameters only include type 1 parameters.

The following works as well:

result2.type1.subject.params
[{'type1_noise': np.float64(0.46208938432786706), 'type1_bias': np.float64(-0.09931352775816989)}, {'type1_noise': np.float64(0.4739633340174307), 'type1_bias': np.float64(-0.08224188059838966)}, {'type1_noise': np.float64(0.5094581873264677), 'type1_bias': np.float64(-0.13417843272424754)}]
result2.type1.group.params
[{'type1_noise': np.float64(0.4620991504236927), 'type1_bias': np.float64(-0.10397343552490379)}, {'type1_noise': np.float64(0.47447815250071174), 'type1_bias': np.float64(-0.10188457437583853)}, {'type1_noise': np.float64(0.5098555076382547), 'type1_bias': np.float64(-0.10805906886853289)}]
result2.type2.subject.params
[{'type2_noise': np.float64(0.31585988970367485), 'type2_criteria': array([0.26465114, 0.50571586, 0.76037933])}, {'type2_noise': np.float64(0.27279449366817915), 'type2_criteria': array([0.26281465, 0.48883183, 0.75365125])}, {'type2_noise': np.float64(0.30140098040048996), 'type2_criteria': array([0.22286925, 0.50141797, 0.76257958])}]

Since the type 2 level did not include group-level parameters, result2.type2.group is empty (i.e., None):

result2.type2.group is None
True

Model evidence

ReMeta is fundamentally based on a frequentist framework and uses maximum likelihood estimation to fit parameters.

The likelihood at the type 1 stage is the probability of decisions $D$ given the stimuli $x$ and the type 1 parameters $\boldsymbol{\theta}_1$. Maximum likelihood estimation tries to find the set of parameters $\boldsymbol{\hat{\theta}}_1$ that maximizes the likelihood under the sampling distribution $f(D \mid x, \boldsymbol{\theta}_1)$:

$$
\boldsymbol{\hat{\theta}}_1 = \arg\max_{\boldsymbol{\theta}_1} f(D \mid x, \boldsymbol{\theta}_1)
$$

The likelihood at the type 2 stage is the probability of reported confidence ratings $C$ given the type 1 decision values $y$ and the type 2 parameters $\boldsymbol{\theta}_2$:

$$
\boldsymbol{\hat{\theta}}_2 = \arg\max_{\boldsymbol{\theta}_2} f(C \mid y, \boldsymbol{\theta}_2) \, f(y)
$$
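The principle of maximum likelihood estimation can be illustrated with a simplified stand-in for the type 1 stage. The sketch below is not the actual ReMeta likelihood; it assumes a basic probit-style model in which the choice probability is $\Phi((x - \text{bias})/\text{noise})$, simulates decisions, and recovers the parameters by minimizing the negative log likelihood with SciPy:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Simulate decisions from a simplified probit-style type 1 model
# (hypothetical stand-in, not the ReMeta generative model):
true_noise, true_bias = 0.5, -0.1
x = rng.uniform(-1, 1, size=2000)                  # stimuli
p = norm.cdf((x - true_bias) / true_noise)         # P(choice = 1)
choices = (rng.random(2000) < p).astype(int)       # simulated decisions

def neg_loglik(theta):
    log_noise, bias = theta                        # log-noise keeps noise > 0
    noise = np.exp(log_noise)
    p = np.clip(norm.cdf((x - bias) / noise), 1e-10, 1 - 1e-10)
    return -np.sum(choices * np.log(p) + (1 - choices) * np.log(1 - p))

res = minimize(neg_loglik, x0=[0.0, 0.0], method='Nelder-Mead')
noise_hat, bias_hat = np.exp(res.x[0]), res.x[1]
print(noise_hat, bias_hat)   # close to the generative values (0.5, -0.1)
```

Note that the optimizer works on the log of the noise parameter, a common trick to enforce positivity without a constrained optimizer.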

Likelihoods are computed for the type 1 and type 2 stage separately and are represented as such in the result object:


print(f'Type 1 log likelihood (overall): {result.type1.loglik:.3f}')
print(f'Type 1 log likelihood (per sample): {result.type1.loglik_per_sample:.3f}')
print(f'Type 2 log likelihood (overall): {result.type2.loglik:.3f}')
print(f'Type 2 log likelihood (per sample): {result.type2.loglik_per_sample:.3f}')
Type 1 log likelihood (overall): -717.841
Type 1 log likelihood (per sample): -0.359
Type 2 log likelihood (overall): -3425.602
Type 2 log likelihood (per sample): -1.713

Higher likelihoods indicate better fits. When comparing fits across datasets of different size (e.g. from different studies), it is best to assess the likelihood per sample.

In addition, ReMeta reports the Akaike information criterion (AIC) and the Bayesian information criterion (BIC):

$$
\mathrm{AIC} = 2k - 2 \log L(\boldsymbol{\hat{\theta}})
$$

where $k$ is the number of parameters and $L(\boldsymbol{\hat{\theta}})$ is the likelihood of the estimated parameters $\boldsymbol{\hat{\theta}}$.

$$
\mathrm{BIC} = k \log n - 2 \log L(\boldsymbol{\hat{\theta}})
$$

where $n$ is the number of samples.
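Both criteria are straightforward to compute from a log likelihood. The sketch below applies the formulas to the type 1 fit of this section, with $k = 2$ parameters (type1_noise and type1_bias) and $n = 2000$ samples (consistent with the overall and per-sample log likelihoods reported here):

```python
import numpy as np

# Type 1 log likelihood, number of parameters, and number of samples
# taken from the example fit in this section:
loglik, k, n = -717.841, 2, 2000

aic = 2 * k - 2 * loglik          # AIC = 2k - 2 log L
bic = k * np.log(n) - 2 * loglik  # BIC = k log n - 2 log L

print(f'AIC: {aic:.2f}')   # 1439.68
print(f'BIC: {bic:.2f}')   # 1450.88
```

The BIC penalizes additional parameters more heavily than the AIC whenever $\log n > 2$, i.e. for any dataset with more than about 7 samples.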


print(f'Type 1 AIC (overall): {result.type1.aic:.2f}')
print(f'Type 1 AIC (per sample): {result.type1.aic_per_sample:.4f}')
print(f'Type 2 AIC (overall): {result.type2.aic:.2f}')
print(f'Type 2 AIC (per sample): {result.type2.aic_per_sample:.4f}\n')

print(f'Type 1 BIC (overall): {result.type1.bic:.2f}')
print(f'Type 1 BIC (per sample): {result.type1.bic_per_sample:.4f}')
print(f'Type 2 BIC (overall): {result.type2.bic:.2f}')
print(f'Type 2 BIC (per sample): {result.type2.bic_per_sample:.4f}')
Type 1 AIC (overall): 1439.68
Type 1 AIC (per sample): 0.7198
Type 2 AIC (overall): 6855.20
Type 2 AIC (per sample): 3.4276

Type 1 BIC (overall): 1450.88
Type 1 BIC (per sample): 0.7254
Type 2 BIC (overall): 6866.41
Type 2 BIC (per sample): 3.4332

Here, lower AIC and BIC values indicate better fits.

If a group-level model was fitted for a stage, the result object contains the model evidence of both the subject-level fit and the group-level fit:

for s in range(result2.nsubjects):
    print(f'[subject {s}] Subject-level log likelihood: {result2.type1.subject.loglik[s]:.2f}')
[subject 0] Subject-level log likelihood: -326.13
[subject 1] Subject-level log likelihood: -333.80
[subject 2] Subject-level log likelihood: -361.38
for s in range(result2.nsubjects):
    print(f'[subject {s}] Group-level log likelihood: {result2.type1.group.loglik[s]:.2f}')
[subject 0] Group-level log likelihood: -326.14
[subject 1] Group-level log likelihood: -334.08
[subject 2] Group-level log likelihood: -361.86

Note that the group-level log likelihood is expected to be somewhat lower than the subject-level log likelihood, since group-level fits constrain individual parameter estimates toward the group. It is precisely the goal of group-level fits to avoid overfitting to individual subjects.