MCMC Tutorial¶
This tutorial describes the available options when running an MCMC
with mc3
. The following sections make up a script meant to be run
from the Python interpreter or in a Python script. At the
bottom of this page you can see the entire script.
Preamble¶
In this tutorial, we will use the following function to create and fit a synthetic dataset following a quadratic behavior:
def quad(p, x):
    """
    Quadratic polynomial function.

    Parameters
    ----------
    p: Polynomial constant, linear, and quadratic coefficients.
    x: Array of independent variables where to evaluate the polynomial.

    Returns
    -------
    y: Polynomial evaluated at x: y(x) = p0 + p1*x + p2*x^2
    """
    y = p[0] + p[1]*x + p[2]*x**2.0
    return y
Argument Inputs¶
From the shell, the arguments can be input as command-line arguments.
However, in this case, the best option is to specify all inputs in a
configuration file. An mc3
configuration file follows the
configparser
standard format described here.
To see all the available options, run:
mc3 --help
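For reference, a minimal configuration-file sketch could look like the following. The [MCMC] section header and the option names below are assumptions chosen to mirror the mc3.sample() arguments used later in this tutorial; check mc3 --help for the definitive names:
[MCMC]
# Assumed option names, mirroring the mc3.sample() arguments (sketch only):
data      = data.npz
indparams = indp.npz
func      = quad quadratic ./
params    = params.txt
sampler   = snooker
nsamples  = 1e5
nchains   = 14
burnin    = 1000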
From the Python interpreter, the arguments must be input as function arguments. To see the available options, run:
import mc3
help(mc3.sample)
Input Data¶
The data
and uncert
arguments (required) define the dataset
to be fitted and its \(1\sigma\) uncertainties, respectively.
Each one of these arguments is a 1D float ndarray:
import numpy as np

# Preamble (create a synthetic dataset, in a real scenario you would
# get your dataset from your own data analysis pipeline):
np.random.seed(314)
x = np.linspace(0, 10, 1000)
p0 = [3, -2.4, 0.5]
y = quad(p0, x)
uncert = np.sqrt(np.abs(y))
error = np.random.normal(0, uncert)
data = y + error
Note
Alternatively, the data
argument can be a string
specifying a Numpy npz filename containing the data and
uncert arrays. See the Input Data File Section below.
Modeling Function¶
The func
argument (required) defines the parameterized modeling
function fitting the data. The user can either set func
as a
callable, e.g.:
# Define the modeling function as a callable:
func = quad
or as a tuple of strings pointing to the modeling function, e.g.:
# A three-element tuple indicates the function's name, the module
# name (without the '.py' extension), and the path to the module.
func = ('quad', 'quadratic', '/path/to/quadratic/model/')
# If the module is already within the scope of the Python path,
# the user can set func as a two-element tuple:
func = ('quad', 'quadratic')
Note
The only requirement for the modeling function is that its
arguments follow the same structure as the callable in
scipy.optimize.leastsq
, i.e., the first argument is a 1D
iterable containing the fitting parameters.
The indparams
and indparams_dict
arguments (optional) can be
used to pass any additional positional or keyword arguments to func
,
respectively. Ultimately, the modeling function must be able to be
called as: model = func(params, *indparams, **indparams_dict)
# List of additional arguments of func (if necessary):
indparams = [x]
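If the modeling function also accepted keyword arguments, they could be passed through indparams_dict. A hypothetical sketch (the 'scale' keyword is made up for illustration and is not needed by the quad function used here):
# Hypothetical keyword arguments for a func that accepted a 'scale' keyword:
# indparams_dict = {'scale': 1.0}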
Fitting Parameters¶
The params
argument (required) is a 1D float ndarray containing
the initial-guess values for the model fitting parameters.
# Array of initial-guess values of fitting parameters:
params = np.array([ 10.0, -2.0, 0.1])
The pmin
and pmax
arguments (optional) are 1D float ndarrays
that set lower and upper boundaries explored by the MCMC, for each
fitting parameter (same size as params
).
# Lower and upper boundaries for the MCMC exploration:
pmin = np.array([-10.0, -20.0, -10.0])
pmax = np.array([ 40.0, 20.0, 10.0])
If a proposed step falls outside the set boundaries,
that iteration is automatically rejected.
The default values for each element of pmin
and pmax
are
-np.inf
and +np.inf
, respectively.
Parameters Stepping Behavior¶
The pstep
argument (optional) is a 1D float ndarray that defines
the stepping behavior of the fitting parameters over the parameter
space. This argument actually serves a dual purpose:
Stepping Behavior¶
First, it can keep a fitting parameter fixed by setting its pstep
value to zero, for example:
# Keep the third parameter fixed:
pstep = np.array([1.0, 0.5, 0.0])
It can force a fitting parameter to share its value with another
parameter by setting its pstep
value equal to the negative index
of the sharing parameter, for example:
# Make the third parameter share the value of the second parameter:
pstep = np.array([1.0, 0.5, -2])
Otherwise, a positive pstep
value leaves the parameter as a free
fitting parameter:
# Parameters' stepping behavior:
pstep = np.array([1.0, 0.5, 0.1])
Stepping Scale¶
pstep
also sets the step size of the free parameters. For a
differential-evolution run (e.g., sampler = 'snooker'
), mc3
starts the MCMC drawing samples from a normal distribution for each
parameter, whose standard deviation is set by the pstep
values.
For a classic Metropolis random walk (sampler = 'mrw'
), the pstep
values set the standard deviation of the Gaussian proposal jumps for
each parameter.
For more details on the MCMC algorithms, see Sampler Algorithm.
Parameter Priors¶
The prior
, priorlow
, and priorup
arguments (optional) are
1D float ndarrays that set the prior estimate, lower uncertainty, and
upper uncertainty of the fitting parameters. mc3
supports two
types of priors:
A priorlow
value of zero (default) defines a uniform prior between
the parameter boundaries. This is appropriate when there is no prior
knowledge for a parameter \(\theta\):
\(p(\theta) = \frac{1}{\theta_{\rm max} - \theta_{\rm min}}\)
Positive priorlow
and priorup
values define a Gaussian
prior for a parameter \(\theta\):
\(p(\theta) = A \exp\left(\frac{-(\theta - \theta_{p})^{2}}{2\sigma_{p}^{2}}\right)\)
where prior
sets the prior value \(\theta_{p}\), and
priorlow
and priorup
set the lower and upper \(1\sigma\) prior uncertainties,
\(\sigma_{p}\), of the prior (depending on whether the proposed value
\(\theta\) is lower or higher than \(\theta_{p}\), respectively).
The leading factor is given by: \(A =
2/(\sqrt{2\pi}(\sigma_{\rm up}+\sigma_{\rm lo}))\) (see [Wallis2014]), which reduces to
the familiar Gaussian normal distribution when \(\sigma_{\rm up}
= \sigma_{\rm lo}\).
For example, to explicitly set uniform priors for all parameters:
# Parameter prior probability distributions:
prior = np.array([ 0.0, 0.0, 0.0])
priorlow = np.array([ 0.0, 0.0, 0.0])
priorup = np.array([ 0.0, 0.0, 0.0])
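Or, for instance, to place a Gaussian prior on the quadratic coefficient while keeping uniform priors on the other two parameters (illustrative values only; this tutorial uses uniform priors throughout):
# E.g., a 0.5 +/- 0.1 Gaussian prior on the third parameter (illustrative only):
# prior    = np.array([0.0, 0.0, 0.5])
# priorlow = np.array([0.0, 0.0, 0.1])
# priorup  = np.array([0.0, 0.0, 0.1])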
Parameter Names¶
The pnames
argument (optional) defines the names of the model
parameters to be shown in the screen output and figure labels. The
screen output will display up to 11 characters. For figures, the
texnames
argument (optional) enables names using LaTeX syntax, for
example:
# Parameter names:
pnames = ['y0', 'alpha', 'beta']
texnames = [r'$y_{0}$', r'$\alpha$', r'$\beta$']
If texnames = None
, it defaults to pnames
. If pnames =
None
, it defaults to texnames
. If both arguments are None
,
they default to a generic [Param 1, Param 2, ...]
list.
Sampler Algorithm¶
The sampler
argument (required) defines the sampler algorithm
for the MCMC:
# Sampler algorithm, choose from: 'snooker', 'demc' or 'mrw'.
sampler = 'snooker'
The standard Differential-Evolution MCMC algorithm (sampler = 'demc'
,
[terBraak2006]) proposes for each chain \(i\) in state
\(\mathbf{x}_{i}\) the jump:
\(\mathbf{x}^{*} = \mathbf{x}_{i} + \gamma (\mathbf{x}_{R1} - \mathbf{x}_{R2}) + \mathbf{e}\)
where \(\mathbf{x}_{R1}\) and \(\mathbf{x}_{R2}\) are randomly
selected without replacement from the population of current states
except \(\mathbf{x}_{i}\). This implementation adopts
\(\gamma=f_{\gamma} 2.38/\sqrt{2 N_{\rm free}}\), with \(N_{\rm
free}\) the number of free parameters; and
\(\mathbf{e}\sim \mathcal{N}(0, \sigma^2)\), with \(\sigma=f_{e}\)
pstep
, where the scaling factors are defaulted to
\(f_{\gamma}=1.0\) and \(f_{e}=0.0\) (see Fine-tuning).
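The proposal above can be illustrated with a minimal NumPy sketch; this is only a schematic of the jump rule described here, not mc3's actual implementation:
import numpy as np

def demc_proposal(x, i, pstep, fgamma=1.0, fepsilon=0.0):
    """Schematic DEMC jump for chain i; x holds the current states, shape (nchains, nfree)."""
    nchains, nfree = x.shape
    gamma = fgamma * 2.38 / np.sqrt(2.0*nfree)
    # Pick two distinct chains R1 and R2, both different from i:
    others = [j for j in range(nchains) if j != i]
    r1, r2 = np.random.choice(others, size=2, replace=False)
    # Supporting jump e ~ N(0, (fepsilon*pstep)^2), which is zero by default:
    e = np.random.normal(0.0, fepsilon*np.asarray(pstep, dtype=float))
    return x[i] + gamma*(x[r1] - x[r2]) + e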
If sampler = 'snooker'
(recommended), mc3
will use the
DEMC-zs algorithm with snooker proposals (see [terBraakVrugt2008]).
If sampler = 'mrw'
, mc3
will use the classical Metropolis-Hastings
algorithm with Gaussian proposal distributions. I.e., in each
iteration and for each parameter, \(\theta\), the MCMC will propose
jumps, drawn from
Gaussian distributions centered at the current value, \(\theta_0\), with
a standard deviation, \(\sigma\), given by the values in the pstep
argument:
\(\theta^{*} \sim \mathcal{N}(\theta_{0}, \sigma^{2})\)
Note
For sampler=snooker
, an MCMC works well with 3 chains or
more. For sampler=demc
, [terBraak2006] suggest using
\(2 N_{\rm free}\) chains. From experience, I recommend
the snooker
, as it is more efficient than most other
MCMC random walks.
MCMC Configuration¶
The following arguments set the MCMC chains configuration:
# MCMC setup:
nsamples = 1e5
burnin = 1000
nchains = 14
ncpu = 7
thinning = 1
The nsamples
argument (required for MCMC runs) sets the total
number of MCMC samples to compute.
The burnin
argument (optional, default: 0) sets the number
of burned-in (removed) iterations at the beginning of each chain.
The nchains
argument (optional, default: 7) sets the number
of parallel chains to use.
The ncpu
argument (optional, default: nchains
) sets the number of
CPUs to use for the chains. mc3
runs on multiple processors
through the multiprocessing
Python Standard-Library package
(additionally, the central MCMC hub will use one extra CPU. Thus, the
total number of CPUs used is ncpu
+ 1).
Note
If ncpu+1
is greater than the number of available CPUs
in the machine, mc3
will cap ncpu
to the number of
available CPUs minus one. To keep a good balance, I
recommend setting nchains
equal to a multiple of
ncpu
, as in the example above.
The thinning
argument (optional, default: 1) sets the chains
thinning factor (discarding all but every thinning
-th sample).
To reduce memory usage, when thinning is requested, only the thinned samples
are stored (and returned).
Note
Thinning is often unnecessary for a DE run, since this algorithm significantly reduces the sampling autocorrelation.
Pre-MCMC Setup¶
The following arguments set how the code draws the initial values for the MCMC chains:
# MCMC initial draw, choose from: 'normal' or 'uniform'
kickoff = 'normal'
# DEMC snooker pre-MCMC sample size:
hsize = 10
The starting points of the MCMC chains come from a random draw, set by
the kickoff
argument (optional, default: ‘normal’). This can be a
Normal-distribution draw centered at params
with standard
deviation pstep
; or it can be a uniform draw between pmin
and
pmax
.
The snooker DEMC, in particular, needs an initial sample, set by the
hsize
argument (optional, default: 10). The draws from this
initial sample do not count for the posterior-distribution statistics.
Usually, these variables do not have a significant impact on the outputs. Thus, they can be left at their default values.
Optimization¶
When not None, the leastsq
argument (optional, default: None) runs
a least-squares optimization before the MCMC:
# Optimization before MCMC, choose from: 'lm' or 'trf':
leastsq = 'lm'
chisqscale = False
Set leastsq='lm'
to
use the Levenberg-Marquardt algorithm via Scipy’s leastsq,
or set leastsq='trf'
to use the Trust Region Reflective algorithm
via Scipy’s least_squares.
Fixed and shared values apply during the optimization (see
Stepping Behavior), as well as the priors (see Parameter Priors).
Note
From the scipy documentation: Levenberg-Marquardt ‘doesn’t handle bounds’ but is ‘the most efficient method for small unconstrained problems’; whereas the Trust Region Reflective algorithm is a ‘Generally robust method, suitable for large sparse problems with bounds’.
The chisqscale
argument (optional, default: False) is a flag to
scale the data uncertainties to enforce a reduced \(\chi^{2}\)
equal to \(1.0\). The scaling is applied by multiplying all
uncertainties by a common scale factor.
Convergence¶
mc3
checks for convergence through the Gelman-Rubin test
([GelmanRubin1992]):
# MCMC Convergence:
grtest = True
grbreak = 1.01
grnmin = 0.5
The grtest
argument (optional, default: False), when True, runs
the Gelman-Rubin convergence test. Values larger than 1.01 are
indicative of non-convergence. See [GelmanRubin1992] for further
information. The Gelman-Rubin test is computed every 10% of the MCMC
exploration.
The grbreak
argument (optional, default: 0.0) sets a convergence
threshold to stop an MCMC when GR drops below grbreak
. Reasonable
values seem to be \({\rm grbreak} \lesssim 1.01\). The default
behavior is not to break (grbreak = 0.0
).
The grnmin
argument (optional, default: 0.5) sets a minimum number
of valid samples (after burning and thinning) required for
grbreak
. If grnmin
is greater than one, it defines the
minimum number of samples to run before breaking out of the MCMC. If
grnmin
is lower than one, it defines the fraction of the total
samples to run before breaking out of the MCMC.
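For example (illustrative values):
# Require at least half of the total samples before grbreak can trigger:
grnmin = 0.5
# Or require an absolute minimum of 5000 valid samples instead:
# grnmin = 5000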
Wavelet-Likelihood MCMC¶
The wlike
argument (optional, default: False) allows mc3
to
implement the Wavelet-based method to account for time-correlated noise.
When using this method, the user must append the three additional fitting
parameters (\(\gamma, \sigma_{r}, \sigma_{w}\)) from [CarterWinn2009]
to the end of the params
array. Likewise, add the corresponding values
to the pmin
, pmax
, pstep
, prior
, priorlow
,
and priorup
arrays.
For further information see [CarterWinn2009].
This tutorial won’t use the wavelet method:
# Carter & Winn (2009) Wavelet-likelihood method:
wlike = False
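For reference only, a hypothetical setup with the wavelet likelihood enabled would extend the parameter arrays with three extra entries; the numerical values below are arbitrary placeholders, not recommendations:
# Hypothetical sketch (not used in this tutorial):
# wlike = True
# params = np.array([10.0, -2.0, 0.1,  1.0, 0.1, 1.0])  # + (gamma, sigma_r, sigma_w)
# pstep  = np.array([ 1.0,  0.5, 0.1,  0.1, 0.01, 0.1])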
Fine-tuning¶
The \(f_{\gamma}\) and \(f_{e}\) factors scale the DEMC proposal distributions.
fgamma = 1.0 # Scale factor for DEMC's gamma jump.
fepsilon = 0.0 # Jump scale factor for DEMC's "e" distribution
The default \(f_{\gamma} = 1.0\) value is set such that the MCMC acceptance rate approaches 25–40%. Therefore, most of the time, the user won’t need to modify this. Only if the acceptance rate is very low do we recommend setting \(f_{\gamma} < 1.0\). The \(f_{e}\) factor sets the jump scale for the \(\mathbf e\) distribution, which must have a small variance compared to the posterior. For further information, see [terBraak2006].
Logging¶
If not None, the log
argument (optional, default: None) stores the
screen output into a log file. log
can either be a string
of the filename where to store the log, or an
mc3.utils.Log
object (see API).
# Logging:
log = 'MCMC_tutorial.log'
Outputs¶
The following arguments set the output files produced by mc3
:
# File outputs:
savefile = 'MCMC_tutorial.npz'
plots = True
theme = 'indigo'
statistics = 'med_central'
rms = True
The savefile
argument (optional, default: None) defines the
.npz
file name where to store the MCMC outputs. This file
contains the following keys:
posterior: thinned posterior distribution of shape [nsamples, nfree], including burn-in phase.
zchain: chain indices for the posterior samples.
zmask: posterior mask to remove the burn-in.
chisq: \(\chi^2\) values for the posterior samples.
log_post: log of the posterior for the sample (as defined here).
burnin: number of burned-in samples per chain.
ifree: Indices of the free parameters.
pnames: Parameter names.
texnames: Parameter names in LaTeX format.
medianp: median of the marginal posteriors for each model parameter.
meanp: mean of the marginal posteriors for each model parameter.
stdp: standard deviation of the marginal posteriors for each model parameter.
median_low_bounds: lower boundary of the marginal central 68%-quantile credible interval for each model parameter.
median_high_bounds: upper boundary of the marginal central 68%-quantile credible interval.
hpd_low_bounds: lower boundary of the marginal 68%-HPD credible interval for each model parameter.
hpd_high_bounds: upper boundary of the marginal 68%-HPD credible interval.
stddev_residuals: standard deviation of the residuals.
acceptance_rate: sample’s acceptance rate.
best_log_post: optimal log of the posterior in the sample (see here).
bestp: model parameters for the best_log_post sample.
best_model: model evaluated at bestp.
best_chisq: \(\chi^2\) for the best_log_post sample.
red_chisq: reduced chi-squared: \(\chi^2/(N_{\rm data}-N_{\rm free})\) for the best_log_post sample.
BIC: Bayesian Information Criterion: \(\chi^2 + N_{\rm free} \log(N_{\rm data})\) for the best_log_post sample.
chisq_factor: Uncertainties scale factor to enforce \(\chi^2_{\rm red} \equiv 1\).
Note
Note that if there are fixed or shared parameters, then the
number of free parameters won’t be the same as the number of model
parameters. The posterior
array in the output includes only
the free parameters, whereas the other arrays include all model
parameters. Fixed and shared parameters can be filtered out using
the ifree
mask.
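As a usage sketch (assuming the savefile produced later in this tutorial), the outputs can be read back with NumPy, and the burn-in can be removed with the zmask array:
import numpy as np

# Load the saved MCMC outputs (keys as listed above):
mcmc = np.load('MCMC_tutorial.npz')
bestp = mcmc['bestp']
# Thinned posterior of the free parameters, with the burn-in removed:
posterior = mcmc['posterior'][mcmc['zmask']]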
A file ‘rootname_statistics.txt’ summarizing all statistics in
machine-readable and LaTeX formats will always be produced. The root
name of this file is taken from savefile
without the file
extension.
Setting plots=True
(optional argument, default: False) will
generate figures for the MCMC chain traces, the marginalized posterior
histograms, and pair-wise posteriors. Setting the ioff
argument
to True (optional, default: False) will turn the interactive display
mode off. The theme
argument enables the user to set a color
argument enables the user to set a color
theme for the plots.
The statistics
argument sets the statistics to use in the plots (parameter estimates and credible intervals). Select from:
statistics = med_central: Median and central quantile credible intervals (default)
statistics = max_like: Max marginal likelihood (mode) and HPD credible intervals
statistics = global_max_like: Max a posteriori (best-fit) and HPD credible intervals
Set the rms
argument (optional, default: False) to True to compute
and plot the time-averaging test for time-correlated noise (see
[Winn2008]).
MCMC Run¶
Putting it all together, here’s a Python script to run an mc3
retrieval explicitly defining all the variables described above:
import sys
import numpy as np
import mc3
def quad(p, x):
    """
    Quadratic polynomial function.

    Parameters
    ----------
    p: Polynomial constant, linear, and quadratic coefficients.
    x: Array of independent variables where to evaluate the polynomial.

    Returns
    -------
    y: Polynomial evaluated at x: y(x) = p0 + p1*x + p2*x^2
    """
    y = p[0] + p[1]*x + p[2]*x**2.0
    return y
# ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
# Preamble (create a synthetic dataset, in a real scenario you would
# get your dataset from your own data analysis pipeline):
np.random.seed(314)
x = np.linspace(0, 10, 1000)
p0 = [3, -2.4, 0.5]
y = quad(p0, x)
uncert = np.sqrt(np.abs(y))
error = np.random.normal(0, uncert)
data = y + error
# ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
# Define the modeling function as a callable:
func = quad
# List of additional arguments of func (if necessary):
indparams = [x]
# Array of initial-guess values of fitting parameters:
params = np.array([ 10.0, -2.0, 0.1])
# Lower and upper boundaries for the MCMC exploration:
pmin = np.array([-10.0, -20.0, -10.0])
pmax = np.array([ 40.0, 20.0, 10.0])
# Parameters' stepping behavior:
pstep = np.array([1.0, 0.5, 0.1])
# Parameter prior probability distributions:
prior = np.array([ 0.0, 0.0, 0.0])
priorlow = np.array([ 0.0, 0.0, 0.0])
priorup = np.array([ 0.0, 0.0, 0.0])
# Parameter names:
pnames = ['y0', 'alpha', 'beta']
texnames = [r'$y_{0}$', r'$\alpha$', r'$\beta$']
# Sampler algorithm, choose from: 'snooker', 'demc' or 'mrw'.
sampler = 'snooker'
# MCMC setup:
nsamples = 1e5
burnin = 1000
nchains = 14
ncpu = 7
thinning = 1
# MCMC initial draw, choose from: 'normal' or 'uniform'
kickoff = 'normal'
# DEMC snooker pre-MCMC sample size:
hsize = 10
# Optimization before MCMC, choose from: 'lm' or 'trf':
leastsq = 'lm'
chisqscale = False
# MCMC Convergence:
grtest = True
grbreak = 1.01
grnmin = 0.5
# Logging:
log = 'MCMC_tutorial.log'
# File outputs:
savefile = 'MCMC_tutorial.npz'
plots = True
theme = 'indigo'
statistics = 'med_central'
rms = True
# Carter & Winn (2009) Wavelet-likelihood method:
wlike = False
# Run the MCMC:
output = mc3.sample(
    data=data, uncert=uncert, func=func, params=params,
    indparams=indparams, pmin=pmin, pmax=pmax, pstep=pstep,
    pnames=pnames, texnames=texnames,
    prior=prior, priorlow=priorlow, priorup=priorup,
    sampler=sampler, nsamples=nsamples, nchains=nchains,
    ncpu=ncpu, burnin=burnin, thinning=thinning,
    leastsq=leastsq, chisqscale=chisqscale,
    grtest=grtest, grbreak=grbreak, grnmin=grnmin,
    hsize=hsize, kickoff=kickoff,
    wlike=wlike, log=log,
    plots=plots, theme=theme, statistics=statistics,
    savefile=savefile, rms=rms,
)
This routine returns a dictionary containing the outputs listed in Outputs. The screen output should look like this:
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Multi-core Markov-chain Monte Carlo (mc3).
Version 3.1.0.
Copyright (c) 2015-2023 Patricio Cubillos and collaborators.
mc3 is open-source software under the MIT license (see LICENSE).
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Least-squares best-fitting parameters:
[ 3.02203328 -2.3897706 0.49543328]
Yippee Ki Yay Monte Carlo!
Start MCMC chains (Thu Mar 30 01:14:54 2023)
[: ] 10.0% completed (Thu Mar 30 01:14:55 2023)
Out-of-bound Trials:
[0 0 0]
Best Parameters: (chisq=1035.2269)
[ 3.02203328 -2.3897706 0.49543328]
[:: ] 20.0% completed (Thu Mar 30 01:14:55 2023)
Out-of-bound Trials:
[0 0 0]
Best Parameters: (chisq=1035.2269)
[ 3.02203328 -2.3897706 0.49543328]
Gelman-Rubin statistics for free parameters:
[1.0321583 1.02847767 1.0228459 ]
[::: ] 30.0% completed (Thu Mar 30 01:14:56 2023)
Out-of-bound Trials:
[0 0 0]
Best Parameters: (chisq=1035.2269)
[ 3.02203328 -2.3897706 0.49543328]
Gelman-Rubin statistics for free parameters:
[1.0048345 1.00476673 1.004175 ]
All parameters converged to within 1% of unity.
[:::: ] 40.0% completed (Thu Mar 30 01:14:56 2023)
Out-of-bound Trials:
[0 0 0]
Best Parameters: (chisq=1035.2269)
[ 3.02203328 -2.3897706 0.49543328]
Gelman-Rubin statistics for free parameters:
[1.00292334 1.00357332 1.00374888]
All parameters converged to within 1% of unity.
[::::: ] 50.0% completed (Thu Mar 30 01:14:57 2023)
Out-of-bound Trials:
[0 0 0]
Best Parameters: (chisq=1035.2269)
[ 3.02203328 -2.3897706 0.49543328]
Gelman-Rubin statistics for free parameters:
[1.00158522 1.00181851 1.00164976]
All parameters converged to within 1% of unity.
[:::::: ] 60.0% completed (Thu Mar 30 01:14:57 2023)
Out-of-bound Trials:
[0 0 0]
Best Parameters: (chisq=1035.2269)
[ 3.02203328 -2.3897706 0.49543328]
Gelman-Rubin statistics for free parameters:
[1.00058249 1.0005924 1.00048652]
All parameters converged to within 1% of unity.
All parameters satisfy the GR convergence threshold of 1.01, stopping
the MCMC.
MCMC Summary:
-------------
Number of evaluated samples: 61899
Number of parallel chains: 14
Average iterations per chain: 4421
Burned-in iterations per chain: 1000
Thinning factor: 1
MCMC sample size (thinned, burned): 47899
Acceptance rate: 28.82%
Parameter name best fit median 1sigma_low 1sigma_hi S/N
--------------- ----------- ----------------------------------- ---------
y0 3.0220e+00 3.0222e+00 -1.2339e-01 1.2645e-01 24.6
alpha -2.3898e+00 -2.3898e+00 -7.3264e-02 7.2663e-02 33.9
beta 4.9543e-01 4.9538e-01 -8.7058e-03 8.8865e-03 57.6
Best-parameter's chi-squared: 1035.2269
Best-parameter's -2*log(posterior): 1035.2269
Bayesian Information Criterion: 1055.9502
Reduced chi-squared: 1.0383
Standard deviation of residuals: 2.77253
For a detailed summary with all parameter posterior statistics see
MCMC_tutorial_statistics.txt
Output sampler files:
MCMC_tutorial_statistics.txt
MCMC_tutorial.npz
MCMC_tutorial_trace.png
MCMC_tutorial_pairwise_posterior.png
MCMC_tutorial_marginal_posterior.png
MCMC_tutorial_RMS.png
MCMC_tutorial.log
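For instance, a few of the returned values (see the keys listed in Outputs) can be accessed directly from the returned dictionary:
# Inspect some of the outputs returned by mc3.sample():
bestp = output['bestp']
stdp = output['stdp']
posterior = output['posterior']
print(bestp, stdp)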
Resume a Previous Run¶
It is also possible to add more samples to a previous run (identified
by the .npz output file name). To do this, set resume = True.
Ideally, keep the same number of MCMC chains from the previous run to avoid
any conflict with the shape of the posterior.
This resumed run will append nsamples
samples to the posterior
output from the previous run (overwriting all output files).
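A minimal sketch of such a resumed call, assuming resume is passed as an argument to mc3.sample() and reusing the setup and savefile from the run above:
# Add more samples to the previous run (sketch, same setup as above):
output = mc3.sample(
    data=data, uncert=uncert, func=func, params=params,
    indparams=indparams, sampler=sampler, nsamples=1e5,
    nchains=nchains, ncpu=ncpu, burnin=burnin,
    savefile='MCMC_tutorial.npz', resume=True,
)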
Inputs from Files¶
The data
, uncert
, indparams
, params
, pmin
, pmax
,
stepsize
, prior
, priorlow
, and priorup
input arrays
can optionally be given as input files.
Furthermore, multiple input arguments can be combined into a single file.
Input Data File¶
The data
, uncert
, and indparams
inputs can be provided as
binary numpy
.npz
files.
data
and uncert
can be stored together into a single file.
An indparams
input file contains the list of independent variables
(must be a list, even if there is a single independent variable).
The utils
sub-package of mc3
provides utility functions to save
and load these files. This script shows how to create data
and
indparams
input files:
import numpy as np
import mc3

# Modeling function (as defined in the Preamble above):
def quad(p, x):
    return p[0] + p[1]*x + p[2]*x**2.0

# Create a synthetic dataset using a quadratic polynomial curve:
x = np.linspace(0.0, 10, 1000)
p0 = [3, -2.4, 0.5]
y = quad(p0, x)
uncert = np.sqrt(np.abs(y))
error = np.random.normal(0, uncert)
data = y + error

# data.npz contains the data and uncertainty arrays:
mc3.utils.savebin([data, uncert], 'data.npz')
# indp.npz contains a list of variables:
mc3.utils.savebin([x], 'indp.npz')
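Assuming mc3.utils.loadbin is the counterpart reader of savebin, the arrays can then be read back like this:
# Read the binary files back (assuming loadbin mirrors savebin):
data, uncert = mc3.utils.loadbin('data.npz')
indparams = mc3.utils.loadbin('indp.npz')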
Model Parameters¶
The params
, pmin
, pmax
, stepsize
,
prior
, priorlow
, and priorup
inputs
can be provided as plain ASCII files.
For simplicity, all of these input arguments can be combined into
a single file.
In the params
file, each line corresponds to one model
parameter, whereas each column corresponds to one of the input array arguments.
This input file can hold as few or as many of these argument arrays,
as long as they are provided in that exact order.
Empty or comment lines are allowed (and ignored by the reader).
A valid params file looks like this:
# params pmin pmax stepsize
10 -10 40 1.0
-2.0 -20 20 0.5
0.1 -10 10 0.1
Alternatively, the utils
sub-package of mc3
provides utility
functions to save and load these files:
params = [ 10, -2.0, 0.1]
pmin = [-10, -20, -10]
pmax = [ 40, 20, 10]
stepsize = [ 1, 0.5, 0.1]
# Store ASCII arrays:
mc3.utils.saveascii([params, pmin, pmax, stepsize], 'params.txt')
Then, to run the MCMC, simply provide the input file names to the mc3
routine:
# Set arguments as the file names:
data = 'data.npz'
indparams = 'indp.npz'
params = 'params.txt'
# Run the MCMC:
mc3_output = mc3.sample(data=data, func=func, params=params,
    indparams=indparams, sampler=sampler, nsamples=nsamples, nchains=nchains,
    ncpu=ncpu, burnin=burnin, leastsq=leastsq, chisqscale=chisqscale,
    grtest=grtest, grbreak=grbreak, grnmin=grnmin,
    log=log, plots=plots, savefile=savefile, rms=rms)