Group estimation and priors

Often, data from single participants are not sufficient for precise parameter estimation. In this section, we introduce to methods to address this: group estimation (random effects or group-level fixed effects) and priors.

Group estimation¶

To use group-level information we need to pass 2d data to ReMeta (n_subjects x n_samples) and specify the group attribute for parameters.

The fit method of ReMeta accepts data as either 1d arrays (single participant) or as 2d arrays (group data). To this aim, we simulate type 1 data for 4 participants:

np.random.seed(42)
cfg = remeta.Configuration()
cfg.skip_type2 = True
params_true = dict(
    type1_noise=0.5,
    type1_bias=0.1,
)
cfg.true_params = params_true
data = remeta.simulate(nsubjects=5, nsamples=200, params=params_true, cfg=cfg)

----------------------------------
..Generative parameters:
    Type 1 noise distribution: normal
    type1_noise: 0.5
    type1_bias: 0.1
..Descriptive statistics:
    No. subjects: 5
    No. samples: 200
    Accuracy: 82.1% correct
    d': 1.9
    Choice bias: 5.1%
----------------------------------

Note that

data.stimuli.shape

(5, 200)

We fit a default ReMeta model:

rem = remeta.ReMeta(cfg=cfg)
rem.fit(data.stimuli, data.choices, data.confidence)
result = rem.summary()

Dataset characteristics:
    No. subjects: 5
    No. samples: [200, 200, 200, 200, 200]
    Accuracy: 82.1% correct
    d': 1.908
    Choice bias: 5.1%

+++ Type 1 level +++
  Subject-level estimation (MLE)
     Subject 1 / 5
     Subject 2 / 5
     Subject 3 / 5
     Subject 4 / 5
     Subject 5 / 5
    .. finished (0.6 secs).
  Final report
    Subject 1 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.604 ± 0.073 (true: 0.500)
            [subject] type1_bias: 0.169 ± 0.065 (true: 0.100)
        [subject] Log-likelihood: -90.64 (per sample: -0.4532)
        [subject] Fitting time: 0.23 secs
        Log-likelihood using true params: -92.56 (per sample: -0.4628)
    Subject 2 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.403 ± 0.030 (true: 0.500)
            [subject] type1_bias: 0.077 ± 0.048 (true: 0.100)
        [subject] Log-likelihood: -64.10 (per sample: -0.3205)
        [subject] Fitting time: 0.09 secs
        Log-likelihood using true params: -65.84 (per sample: -0.3292)
    Subject 3 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.510 ± 0.049 (true: 0.500)
            [subject] type1_bias: 0.161 ± 0.057 (true: 0.100)
        [subject] Log-likelihood: -79.39 (per sample: -0.3969)
        [subject] Fitting time: 0.07 secs
        Log-likelihood using true params: -79.97 (per sample: -0.3998)
    Subject 4 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.636 ± 0.083 (true: 0.500)
            [subject] type1_bias: 0.070 ± 0.067 (true: 0.100)
        [subject] Log-likelihood: -94.99 (per sample: -0.4749)
        [subject] Fitting time: 0.07 secs
        Log-likelihood using true params: -97.54 (per sample: -0.4877)
    Subject 5 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.508 ± 0.048 (true: 0.500)
            [subject] type1_bias: 0.081 ± 0.056 (true: 0.100)
        [subject] Log-likelihood: -80.71 (per sample: -0.4036)
        [subject] Fitting time: 0.07 secs
        Log-likelihood using true params: -80.78 (per sample: -0.4039)
Type 1 level finished

In case of a group-level fit, the result returned by the summary() method is a list of length nsubjects. We can print the final parameter estimates more cleanly as follows:

for s in range(result.nsubjects):
    print(f'Subject {s}')
    for k, v in result.type1.params[s].items():
        print(f'\t{k}: {v:.3f} ± {result.type1.params_se[s][k]:.3f}')

Subject 0
	type1_noise: 0.604 ± 0.073
	type1_bias: 0.169 ± 0.065
Subject 1
	type1_noise: 0.403 ± 0.030
	type1_bias: 0.077 ± 0.048
Subject 2
	type1_noise: 0.510 ± 0.049
	type1_bias: 0.161 ± 0.057
Subject 3
	type1_noise: 0.636 ± 0.083
	type1_bias: 0.070 ± 0.067
Subject 4
	type1_noise: 0.508 ± 0.048
	type1_bias: 0.081 ± 0.056

In this section, we exemplarily focus on the type1_bias parameter which was set to 0.1 in the simulated data. The fitted parameters for type1_bias vary strongly around 0.1. In line with this variability, the standard errors of the parameter estimates are big.

Yet, this is no fitting error, since the empirical log-likelihood is always hight than one for the true parameters. We can verify this as follows:

for s in range(result.nsubjects):
    print(f'Subject {s}')
    print(f'\tLog likelihood of true parameters: {result.type1.loglik_true[s]:.2f}')
    print(f'\tLog likelihood of estimated parameters: {result.type1.loglik[s]:.2f}')

Subject 0
	Log likelihood of true parameters: -92.56
	Log likelihood of estimated parameters: -90.64
Subject 1
	Log likelihood of true parameters: -65.84
	Log likelihood of estimated parameters: -64.10
Subject 2
	Log likelihood of true parameters: -79.97
	Log likelihood of estimated parameters: -79.39
Subject 3
	Log likelihood of true parameters: -97.54
	Log likelihood of estimated parameters: -94.99
Subject 4
	Log likelihood of true parameters: -80.78
	Log likelihood of estimated parameters: -80.71

The issue is rather that few samples are often not representative of the ground truth. By contrast, the more samples, the less likely that the data — in aggregate — behave substantially different than the ground truth.

Random effects¶

An elegant way to regularize unreliable parameter estimates at the individual level are hierarchical random effect models. For random effects parameters, the likelihood computation comprises not only the likelihood given the individual data of a participant, but also the likelihood under a population distribution.

What is elegant about random effects models is that this population distribution is itself learned from the data, with the only “prio”" that the distribution is Gaussian (at least in the frequentist domain).

In this way, extreme estimates for individual subjects are “tamed” (regularized), since they tend to be unlikely under the population distribution. Nevertheless, random effects models leave enough room for interindividual variability between participants.

In our example above, we specify the type 1 bias parameter as a random effects group parameter:

cfg.param_type1_bias.group = 'random'

We restart the fitting procedure with this new setting:

rem = remeta.ReMeta(cfg=cfg)
rem.fit(data.stimuli, data.choices, data.confidence)
result = rem.summary()

Dataset characteristics:
    No. subjects: 5
    No. samples: [200, 200, 200, 200, 200]
    Accuracy: 82.1% correct
    d': 1.908
    Choice bias: 5.1%

+++ Type 1 level +++
  Subject-level estimation (MLE)
     Subject 1 / 5
     Subject 2 / 5
     Subject 3 / 5
     Subject 4 / 5
     Subject 5 / 5
    .. finished (0.4 secs).

  Group-level optimization (MLE / MAP)
        [20:31:46] Iteration 1 / 30 (Convergence: 0.00059327)
        [20:31:47] Iteration 11 / 30 (Convergence: 0.00011138)
        [20:31:47] Iteration 21 / 30 (Convergence: 0.00004453)
    .. finished (0.4 secs).
  Final report
    Subject 1 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.604 ± 0.073 (true: 0.500)
            [subject] type1_bias: 0.169 ± 0.065 (true: 0.100)
        [subject] Log-likelihood: -90.64 (per sample: -0.4532)
        [subject] Fitting time: 0.08 secs
        Parameters estimates (group-level fit)
            [subject] type1_noise: 0.600 ± 0.071 (true: 0.500)
            [group=random] type1_bias: 0.115 ± 0.019 (true: 0.100)
        [final] Log-likelihood: -91.00 (per sample: -0.455)
        Log-likelihood using true params: -92.56 (per sample: -0.4628)
    Subject 2 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.403 ± 0.030 (true: 0.500)
            [subject] type1_bias: 0.077 ± 0.048 (true: 0.100)
        [subject] Log-likelihood: -64.10 (per sample: -0.3205)
        [subject] Fitting time: 0.07 secs
        Parameters estimates (group-level fit)
            [subject] type1_noise: 0.406 ± 0.045 (true: 0.500)
            [group=random] type1_bias: 0.106 ± 0.018 (true: 0.100)
        [final] Log-likelihood: -64.26 (per sample: -0.3213)
        Log-likelihood using true params: -65.84 (per sample: -0.3292)
    Subject 3 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.510 ± 0.049 (true: 0.500)
            [subject] type1_bias: 0.161 ± 0.057 (true: 0.100)
        [subject] Log-likelihood: -79.39 (per sample: -0.3969)
        [subject] Fitting time: 0.07 secs
        Parameters estimates (group-level fit)
            [subject] type1_noise: 0.508 ± 0.057 (true: 0.500)
            [group=random] type1_bias: 0.115 ± 0.019 (true: 0.100)
        [final] Log-likelihood: -79.71 (per sample: -0.3985)
        Log-likelihood using true params: -79.97 (per sample: -0.3998)
    Subject 4 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.636 ± 0.083 (true: 0.500)
            [subject] type1_bias: 0.070 ± 0.067 (true: 0.100)
        [subject] Log-likelihood: -94.99 (per sample: -0.4749)
        [subject] Fitting time: 0.06 secs
        Parameters estimates (group-level fit)
            [subject] type1_noise: 0.641 ± 0.078 (true: 0.500)
            [group=random] type1_bias: 0.107 ± 0.019 (true: 0.100)
        [final] Log-likelihood: -95.14 (per sample: -0.4757)
        Log-likelihood using true params: -97.54 (per sample: -0.4877)
    Subject 5 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.508 ± 0.048 (true: 0.500)
            [subject] type1_bias: 0.081 ± 0.056 (true: 0.100)
        [subject] Log-likelihood: -80.71 (per sample: -0.4036)
        [subject] Fitting time: 0.06 secs
        Parameters estimates (group-level fit)
            [subject] type1_noise: 0.509 ± 0.058 (true: 0.500)
            [group=random] type1_bias: 0.107 ± 0.019 (true: 0.100)
        [final] Log-likelihood: -80.81 (per sample: -0.4041)
        Log-likelihood using true params: -80.78 (per sample: -0.4039)
Type 1 level finished

In the current example, all participant estimates for type1_bias are pretty similar, reflecting the fact that the data were in fact generated by the same ground truth model:

for s in range(result.nsubjects):
    print(f'Subject {s}')
    for k, v in result.type1.params[s].items():
        print(f'\t{k}: {v:.6f}')

Subject 0
	type1_noise: 0.600090
	type1_bias: 0.114900
Subject 1
	type1_noise: 0.405529
	type1_bias: 0.105509
Subject 2
	type1_noise: 0.508127
	type1_bias: 0.115239
Subject 3
	type1_noise: 0.641339
	type1_bias: 0.106636
Subject 4
	type1_noise: 0.509449
	type1_bias: 0.106799

This observation is matched by an inspection of the population estimate for type1_nonlinear_gain:

print(f'Estimated population mean: {result.params_random_effect.mean['type1_bias']:.3f}')
print(f'Estimated population SD: {result.params_random_effect.std['type1_bias']:.3f}')

Estimated population mean: 0.110
Estimated population SD: 0.019

Fixed effects¶

Another option to tackle unreliable individual parameter estimates is to fit a single parameter to the entire group. In ReMeta, this is possible by setting the group attribute of the parameter to 'fixed':

cfg.param_type1_bias.group = 'fixed'

We fit the model as usual:

rem = remeta.ReMeta(cfg=cfg)
rem.fit(data.stimuli, data.choices, data.confidence)
result = rem.summary()

Dataset characteristics:
    No. subjects: 5
    No. samples: [200, 200, 200, 200, 200]
    Accuracy: 82.1% correct
    d': 1.908
    Choice bias: 5.1%

+++ Type 1 level +++
  Subject-level estimation (MLE)
     Subject 1 / 5
     Subject 2 / 5
     Subject 3 / 5
     Subject 4 / 5
     Subject 5 / 5
    .. finished (0.4 secs).

  Group-level optimization (MLE / MAP)
        [20:31:47] Iteration 1 / 30
        [20:31:47] Iteration 11 / 30
        [20:31:47] Iteration 21 / 30
    .. finished (0.2 secs).
  Final report
    Subject 1 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.604 ± 0.073 (true: 0.500)
            [subject] type1_bias: 0.169 ± 0.065 (true: 0.100)
        [subject] Log-likelihood: -90.64 (per sample: -0.4532)
        [subject] Fitting time: 0.07 secs
        Parameters estimates (group-level fit)
            [subject] type1_noise: 0.600 ± 0.071 (true: 0.500)
            [group=fixed] type1_bias: 0.109 ± 0.026 (true: 0.100)
        [final] Log-likelihood: -91.07 (per sample: -0.4554)
        Log-likelihood using true params: -92.56 (per sample: -0.4628)
    Subject 2 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.403 ± 0.030 (true: 0.500)
            [subject] type1_bias: 0.077 ± 0.048 (true: 0.100)
        [subject] Log-likelihood: -64.10 (per sample: -0.3205)
        [subject] Fitting time: 0.07 secs
        Parameters estimates (group-level fit)
            [subject] type1_noise: 0.406 ± 0.045 (true: 0.500)
            [group=fixed] type1_bias: 0.109 ± 0.026 (true: 0.100)
        [final] Log-likelihood: -64.31 (per sample: -0.3215)
        Log-likelihood using true params: -65.84 (per sample: -0.3292)
    Subject 3 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.510 ± 0.049 (true: 0.500)
            [subject] type1_bias: 0.161 ± 0.057 (true: 0.100)
        [subject] Log-likelihood: -79.39 (per sample: -0.3969)
        [subject] Fitting time: 0.07 secs
        Parameters estimates (group-level fit)
            [subject] type1_noise: 0.508 ± 0.057 (true: 0.500)
            [group=fixed] type1_bias: 0.109 ± 0.026 (true: 0.100)
        [final] Log-likelihood: -79.79 (per sample: -0.399)
        Log-likelihood using true params: -79.97 (per sample: -0.3998)
    Subject 4 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.636 ± 0.083 (true: 0.500)
            [subject] type1_bias: 0.070 ± 0.067 (true: 0.100)
        [subject] Log-likelihood: -94.99 (per sample: -0.4749)
        [subject] Fitting time: 0.07 secs
        Parameters estimates (group-level fit)
            [subject] type1_noise: 0.642 ± 0.078 (true: 0.500)
            [group=fixed] type1_bias: 0.109 ± 0.026 (true: 0.100)
        [final] Log-likelihood: -95.17 (per sample: -0.4758)
        Log-likelihood using true params: -97.54 (per sample: -0.4877)
    Subject 5 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.508 ± 0.048 (true: 0.500)
            [subject] type1_bias: 0.081 ± 0.056 (true: 0.100)
        [subject] Log-likelihood: -80.71 (per sample: -0.4036)
        [subject] Fitting time: 0.07 secs
        Parameters estimates (group-level fit)
            [subject] type1_noise: 0.510 ± 0.058 (true: 0.500)
            [group=fixed] type1_bias: 0.109 ± 0.026 (true: 0.100)
        [final] Log-likelihood: -80.83 (per sample: -0.4042)
        Log-likelihood using true params: -80.78 (per sample: -0.4039)
Type 1 level finished

Now, the parameter type1_bias is fitted to the entire group dataset and thus the parameter is identical for each participant. The final estimate of 0.109 much closer to the ground truth value of 0.1. We once again print the final parameter more cleanly:

for s in range(result.nsubjects):
    print(f'Subject {s}')
    for k, v in result.type1.params[s].items():
        print(f'\t{k}: {v:.3f}')

Subject 0
	type1_noise: 0.600
	type1_bias: 0.109
Subject 1
	type1_noise: 0.406
	type1_bias: 0.109
Subject 2
	type1_noise: 0.508
	type1_bias: 0.109
Subject 3
	type1_noise: 0.642
	type1_bias: 0.109
Subject 4
	type1_noise: 0.510
	type1_bias: 0.109

Note that even though parameter recovery improved, the log-likelihood of this group fit is worse (i.e. lower) than the single-subject fit:

for s in range(result.nsubjects):
    print(f'Subject {s}')
    print(f'\tnegll(subject fit): {result.type1.subject.loglik[s]:.3f}')
    print(f'\tnegll(group fit): {result.type1.group.loglik[s]:.3f}')

Subject 0
	negll(subject fit): -90.645
	negll(group fit): -91.072
Subject 1
	negll(subject fit): -64.101
	negll(group fit): -64.308
Subject 2
	negll(subject fit): -79.385
	negll(group fit): -79.794
Subject 3
	negll(subject fit): -94.989
	negll(group fit): -95.167
Subject 4
	negll(subject fit): -80.711
	negll(group fit): -80.834

Yet, in this case this effectively means that the model is not overfit to random peculiarities of each subject’s data and better fits the broad trends in the group data.

Priors¶

Priors present another way to inform and regularize point estimates of participants. If there is good reason from prior literature or a prior study to assume a prior distribution for a parameter, one can perform Maximum A Posteriori estimation (MAP) instead of Maximum Likelihood estimation (MLE). In Remeta this is possible by specifying the prior attribute of a parameter. In the following example, we delete the previous random effect for type1_bias and specify a prior instead - a tuple of the form (prior_mean, prior_std).

Specifically, we assume that the bias will be on average 0 with a standard deviation 0.05 (under a normal model). This is a unrealistically strong prior, but it serves for demonstration.

cfg.param_type1_bias.group = None
cfg.param_type1_bias.prior = (0, 0.05)
rem = remeta.ReMeta(cfg=cfg)
rem.fit(data.stimuli, data.choices, data.confidence)
result = rem.summary()

Dataset characteristics:
    No. subjects: 5
    No. samples: [200, 200, 200, 200, 200]
    Accuracy: 82.1% correct
    d': 1.908
    Choice bias: 5.1%

+++ Type 1 level +++
  Subject-level estimation (MLE)
     Subject 1 / 5
     Subject 2 / 5
     Subject 3 / 5
     Subject 4 / 5
     Subject 5 / 5
    .. finished (0.4 secs).
  Final report
    Subject 1 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.605 ± 0.072 (true: 0.500)
            [subject+prior=N(0,0.05)] type1_bias: 0.064 ± 0.040 (true: 0.100)
        [subject] Log-likelihood: -92.80 (per sample: -0.464)
        [subject] Fitting time: 0.05 secs
        Log-likelihood using true params: -94.56 (per sample: -0.4728)
    Subject 2 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.403 ± 0.030 (true: 0.500)
            [subject+prior=N(0,0.05)] type1_bias: 0.038 ± 0.035 (true: 0.100)
        [subject] Log-likelihood: -64.69 (per sample: -0.3235)
        [subject] Fitting time: 0.07 secs
        Log-likelihood using true params: -67.84 (per sample: -0.3392)
    Subject 3 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.512 ± 0.049 (true: 0.500)
            [subject+prior=N(0,0.05)] type1_bias: 0.070 ± 0.037 (true: 0.100)
        [subject] Log-likelihood: -81.63 (per sample: -0.4082)
        [subject] Fitting time: 0.04 secs
        Log-likelihood using true params: -81.97 (per sample: -0.4098)
    Subject 4 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.636 ± 0.082 (true: 0.500)
            [subject+prior=N(0,0.05)] type1_bias: 0.025 ± 0.040 (true: 0.100)
        [subject] Log-likelihood: -95.34 (per sample: -0.4767)
        [subject] Fitting time: 0.10 secs
        Log-likelihood using true params: -99.54 (per sample: -0.4977)
    Subject 5 / 5
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.509 ± 0.048 (true: 0.500)
            [subject+prior=N(0,0.05)] type1_bias: 0.035 ± 0.038 (true: 0.100)
        [subject] Log-likelihood: -81.29 (per sample: -0.4064)
        [subject] Fitting time: 0.11 secs
        Log-likelihood using true params: -82.78 (per sample: -0.4139)
Type 1 level finished

According to our prior, a null effect for the type1_bias should be most likely. Indeed, as seen in the following output, the type1_bias is biased towards 0 compared to the original estimates without a prior:

for s in range(result.nsubjects):
    print(f'Subject {s}')
    for k, v in result.type1.params[s].items():
        print(f'\t{k}: {v:.3f}')

Subject 0
	type1_noise: 0.605
	type1_bias: 0.064
Subject 1
	type1_noise: 0.403
	type1_bias: 0.038
Subject 2
	type1_noise: 0.512
	type1_bias: 0.070
Subject 3
	type1_noise: 0.636
	type1_bias: 0.025
Subject 4
	type1_noise: 0.509
	type1_bias: 0.035