Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Group estimation and priors

Often, data from single participants are not sufficient for precise parameter estimation. In this section, we introduce to methods to address this: group estimation (random effects or group-level fixed effects) and priors.

Group estimation

To use group-level information we need to pass 2d data to ReMeta (n_subjects x n_samples) and specify the group attribute for parameters.

The fit method of ReMeta accepts data as either 1d arrays (single participant) or as 2d arrays (group data). To this aim, we simulate type 1 data for 4 participants:

np.random.seed(42)
cfg = remeta.Configuration()
cfg.skip_type2 = True
params_true = dict(
    type1_noise=0.5,
    type1_bias=0.1,
)
cfg.true_params = params_true
data = remeta.simulate(n_subjects=5, n_samples=200, params=params_true, cfg=cfg)
----------------------------------
..Generative parameters:
    Type 1 noise distribution: normal
    type1_noise: 0.5
    type1_bias: 0.1
..Descriptive statistics:
    No. subjects: 5
    No. samples: 200
    Accuracy: 82.1% correct
    d': 1.9
    Choice bias: 5.1%
----------------------------------

Note that

data.stimuli.shape
(5, 200)

We fit a default ReMeta model:

rem = remeta.ReMeta(cfg=cfg)
rem.fit(data.stimuli, data.choices, data.confidence)
result = rem.summary()

In case of a group-level fit, the result returned by the summary() method is a list of length nsubjects. We can print the final parameter estimates more cleanly as follows:

for s in range(result.n_subjects):
    print(f'Subject {s}')
    for k, v in result.type1.params[s].items():
        print(f'\t{k}: {v:.3f} ± {result.type1.params_se[s][k]:.3f}')
Subject 0
	type1_noise: 0.604 ± 0.073
	type1_bias: 0.169 ± 0.065
Subject 1
	type1_noise: 0.403 ± 0.045
	type1_bias: 0.077 ± 0.050
Subject 2
	type1_noise: 0.510 ± 0.058
	type1_bias: 0.161 ± 0.058
Subject 3
	type1_noise: 0.636 ± 0.077
	type1_bias: 0.070 ± 0.066
Subject 4
	type1_noise: 0.508 ± 0.058
	type1_bias: 0.081 ± 0.057

In this section, we exemplarily focus on the type1_bias parameter which was set to 0.1 in the simulated data. The fitted parameters for type1_bias vary strongly around 0.1. In line with this variability, the standard errors of the parameter estimates are big.

Yet, this is no fitting error, since the empirical log-likelihood is always hight than one for the true parameters. We can verify this as follows:

for s in range(result.n_subjects):
    print(f'Subject {s}')
    print(f'\tLog likelihood of true parameters: {result.type1.loglik_true[s]:.2f}')
    print(f'\tLog likelihood of estimated parameters: {result.type1.loglik[s]:.2f}')
Subject 0
	Log likelihood of true parameters: -92.56
	Log likelihood of estimated parameters: -90.64
Subject 1
	Log likelihood of true parameters: -65.84
	Log likelihood of estimated parameters: -64.10
Subject 2
	Log likelihood of true parameters: -79.97
	Log likelihood of estimated parameters: -79.39
Subject 3
	Log likelihood of true parameters: -97.54
	Log likelihood of estimated parameters: -94.99
Subject 4
	Log likelihood of true parameters: -80.78
	Log likelihood of estimated parameters: -80.71

The issue is rather that few samples are often not representative of the ground truth. By contrast, the more samples, the less likely that the data — in aggregate — behave substantially different than the ground truth.

Random effects

An elegant way to regularize unreliable parameter estimates at the individual level are hierarchical random effect models. For random effects parameters, the likelihood computation comprises not only the likelihood given the individual data of a participant, but also the likelihood under a population distribution.

What is elegant about random effects models is that this population distribution is itself learned from the data, with the only “prio”" that the distribution is Gaussian (at least in the frequentist domain).

In this way, extreme estimates for individual subjects are “tamed” (regularized), since they tend to be unlikely under the population distribution. Nevertheless, random effects models leave enough room for interindividual variability between participants.

In our example above, we specify the type 1 bias parameter as a random effects group parameter:

cfg.param_type1_bias.group = 'random'

We restart the fitting procedure with this new setting:

rem = remeta.ReMeta(cfg=cfg)
rem.fit(data.stimuli, data.choices, data.confidence)
result = rem.summary()

In the current example, all participant estimates for type1_bias are pretty similar, reflecting the fact that the data were in fact generated by the same ground truth model:

for s in range(result.n_subjects):
    print(f'Subject {s}')
    for k, v in result.type1.params[s].items():
        print(f'\t{k}: {v:.6f}')
Subject 0
	type1_noise: 0.600143
	type1_bias: 0.111268
Subject 1
	type1_noise: 0.405660
	type1_bias: 0.106200
Subject 2
	type1_noise: 0.508257
	type1_bias: 0.111492
Subject 3
	type1_noise: 0.641400
	type1_bias: 0.106824
Subject 4
	type1_noise: 0.509499
	type1_bias: 0.106910

This observation is matched by an inspection of the population estimate for type1_bias:

print(f'Estimated population mean: {result.params_random_effect.mean['type1_bias']:.3f}')
print(f'Estimated population SD: {result.params_random_effect.std['type1_bias']:.3f}')
Estimated population mean: 0.108
Estimated population SD: 0.014

Fixed effects

Another option to tackle unreliable individual parameter estimates is to fit a single parameter to the entire group. In ReMeta, this is possible by setting the group attribute of the parameter to 'fixed':

cfg.param_type1_bias.group = 'fixed'

We fit the model as usual:

rem = remeta.ReMeta(cfg=cfg)
rem.fit(data.stimuli, data.choices, data.confidence)
result = rem.summary()

Now, the parameter type1_bias is fitted to the entire group dataset and thus the parameter is identical for each participant. The final estimate of 0.109 much closer to the ground truth value of 0.1. We once again print the final parameter more cleanly:

for s in range(result.n_subjects):
    print(f'Subject {s}')
    for k, v in result.type1.params[s].items():
        print(f'\t{k}: {v:.3f}')
Subject 0
	type1_noise: 0.600
	type1_bias: 0.109
Subject 1
	type1_noise: 0.406
	type1_bias: 0.109
Subject 2
	type1_noise: 0.508
	type1_bias: 0.109
Subject 3
	type1_noise: 0.642
	type1_bias: 0.109
Subject 4
	type1_noise: 0.510
	type1_bias: 0.109

Note that even though parameter recovery improved, the log-likelihood of this group fit is worse (i.e. lower) than the single-subject fit:

for s in range(result.n_subjects):
    print(f'Subject {s}')
    print(f'\tnegll(subject fit): {result.type1.subject.loglik[s]:.3f}')
    print(f'\tnegll(group fit): {result.type1.group.loglik[s]:.3f}')
Subject 0
	negll(subject fit): -90.645
	negll(group fit): -91.076
Subject 1
	negll(subject fit): -64.101
	negll(group fit): -64.303
Subject 2
	negll(subject fit): -79.385
	negll(group fit): -79.799
Subject 3
	negll(subject fit): -94.989
	negll(group fit): -95.164
Subject 4
	negll(subject fit): -80.711
	negll(group fit): -80.831

Yet, in this case this effectively means that the model is not overfit to random peculiarities of each subject’s data and better fits the broad trends in the group data.

Priors

Priors present another way to inform and regularize point estimates of participants. If there is good reason from prior literature or a prior study to assume a prior distribution for a parameter, one can perform Maximum A Posteriori estimation (MAP) instead of Maximum Likelihood estimation (MLE). In Remeta this is possible by specifying the prior attribute of a parameter. In the following example, we delete the previous random effect for type1_bias and specify a prior instead - a tuple of the form (prior_mean, prior_std).

Specifically, we assume that the bias will be on average 0 with a standard deviation 0.05 (under a normal model). This is a unrealistically strong prior, but it serves for demonstration.

cfg.param_type1_bias.group = None
cfg.param_type1_bias.prior = (0, 0.05)
rem = remeta.ReMeta(cfg=cfg)
rem.fit(data.stimuli, data.choices, data.confidence)
result = rem.summary()

According to our prior, a null effect for the type1_bias should be most likely. Indeed, as seen in the following output, the type1_bias is biased towards 0 compared to the original estimates without a prior:

for s in range(result.n_subjects):
    print(f'Subject {s}')
    for k, v in result.type1.params[s].items():
        print(f'\t{k}: {v:.3f}')
Subject 0
	type1_noise: 0.605
	type1_bias: 0.064
Subject 1
	type1_noise: 0.403
	type1_bias: 0.038
Subject 2
	type1_noise: 0.512
	type1_bias: 0.070
Subject 3
	type1_noise: 0.636
	type1_bias: 0.025
Subject 4
	type1_noise: 0.509
	type1_bias: 0.035