MCMC Tutorial

This tutorial describes the available options when running an MCMC with mc3. The following sections make up a script meant to be run from the Python interpreter or in a Python script. At the bottom of this page you can see the entire script.

Preamble

In this tutorial, we will use the following function to create and fit a synthetic dataset following a quadratic behavior:

def quad(p, x):
    """
    Quadratic polynomial function.

    Parameters
        p: Polynomial constant, linear, and quadratic coefficients.
        x: Array of dependent variables where to evaluate the polynomial.
    Returns
        y: Polynomial evaluated at x:  y(x) = p0 + p1*x + p2*x^2
    """
    y = p[0] + p[1]*x + p[2]*x**2.0
    return y

Argument Inputs

From the shell, the arguments can be input as command-line arguments. However, in this case, the best option is to specify all inputs in a configuration file. An mc3 configuration file follows the configparser standard format described here. To see all the available options, run:

mc3 --help

From the Python interpreter, the arguments must be input as function arguments. To see the available options, run:

import mc3
help(mc3.sample)

Input Data

The data and uncert arguments (required) define the dataset to be fitted and its \(1\sigma\) uncertainties, respectively. Each of these arguments is a 1D float ndarray:

# Preamble (create a synthetic dataset, in a real scenario you would
# get your dataset from your own data analysis pipeline):
np.random.seed(314)
x  = np.linspace(0, 10, 1000)
p0 = [3, -2.4, 0.5]
y  = quad(p0, x)

uncert = np.sqrt(np.abs(y))
error  = np.random.normal(0, uncert)
data   = y + error

Note

Alternatively, the data argument can be a string specifying a Numpy npz filename containing the data and uncert arrays. See the Input Data File Section below.

Modeling Function

The func argument (required) defines the parameterized modeling function fitting the data. The user can either set func as a callable, e.g.:

# Define the modeling function as a callable:
func = quad

or as a tuple of strings pointing to the modeling function, e.g.:

# A three-element tuple indicates the function's name, the module
# name (without the '.py' extension), and the path to the module.
func = ('quad', 'quadratic', '/path/to/quadratic/model/')

# If the module is already within the scope of the Python path,
# the user can set func as a two-element tuple:
func = ('quad', 'quadratic')

Note

The only requirement for the modeling function is that its arguments follow the same structure as the callable in scipy.optimize.leastsq, i.e., the first argument is a 1D iterable containing the fitting parameters.

The indparams and indparams_dict arguments (optional) can be used to pass any additional arguments to func, as a list or as keyword arguments, respectively. Ultimately, mc3 must be able to call the modeling function as: model = func(params, *indparams, **indparams_dict)

# List of additional arguments of func (if necessary):
indparams = [x]
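
With this setup, mc3 will evaluate the model internally as model = func(params, *indparams). A quick sanity check before sampling is to make that call directly, here using the quad function, p0 coefficients, and x array defined above:

# Equivalent to mc3's internal call, model = func(params, *indparams):
model = quad(p0, x)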

Fitting Parameters

The params argument (required) is a 1D float ndarray containing the initial-guess values for the model fitting parameters.

# Array of initial-guess values of fitting parameters:
params = np.array([ 10.0, -2.0, 0.1])

The pmin and pmax arguments (optional) are 1D float ndarrays that set the lower and upper boundaries explored by the MCMC for each fitting parameter (same size as params).

# Lower and upper boundaries for the MCMC exploration:
pmin = np.array([-10.0, -20.0, -10.0])
pmax = np.array([ 40.0,  20.0,  10.0])

If a proposed step falls outside the set boundaries, that iteration is automatically rejected. The default values for each element of pmin and pmax are -np.inf and +np.inf, respectively.

Parameters Stepping Behavior

The pstep argument (optional) is a 1D float ndarray that defines the stepping behavior of the fitting parameters over the parameter space. This argument actually serves a dual purpose:

Stepping Behavior

First, it can keep a fitting parameter fixed by setting its pstep value to zero, for example:

# Keep the third parameter fixed:
pstep = np.array([1.0, 0.5, 0.0])

Second, it can force a fitting parameter to share its value with another parameter by setting its pstep value to the negative of the (one-based) index of the parameter to share, for example:

# Make the third parameter share the value of the second parameter:
pstep = np.array([1.0, 0.5, -2])

Otherwise, a positive pstep value leaves the parameter as a free fitting parameter:

# Parameters' stepping behavior:
pstep = np.array([1.0, 0.5, 0.1])

Stepping Scale

pstep also sets the step size of the free parameters. For a differential-evolution run (e.g., sampler = 'snooker'), mc3 starts the MCMC drawing samples from a normal distribution for each parameter, whose standard deviation is set by the pstep values. For a classic Metropolis random walk (sampler = 'mrw'), the pstep values set the standard deviation of the Gaussian proposal jumps for each parameter.

For more details on the MCMC algorithms, see Sampler Algorithm.

Parameter Priors

The prior, priorlow, and priorup arguments (optional) are 1D float ndarrays that set the prior estimate, lower uncertainty, and upper uncertainty of the fitting parameters. mc3 supports two types of priors:

A priorlow value of zero (default) defines a uniform prior between the parameter boundaries. This is appropriate when there is no prior knowledge for a parameter \(\theta\):

\[p(\theta) = \frac{1}{\theta_{\rm max} - \theta_{\rm min}},\]

Positive priorlow and priorup values define a Gaussian prior for a parameter \(\theta\):

\[p(\theta) = A \exp\left(\frac{-(\theta-\theta_{p})^{2}}{2\sigma_{p}^{2}}\right),\]

where prior sets the prior value \(\theta_{p}\), and priorlow and priorup set the lower and upper \(1\sigma\) prior uncertainties, \(\sigma_{p}\) (depending on whether the proposed value \(\theta\) is lower or higher than \(\theta_{p}\), respectively). The leading factor is given by \(A = 2/(\sqrt{2\pi}(\sigma_{\rm up}+\sigma_{\rm lo}))\) (see [Wallis2014]), which reduces to the familiar Gaussian normal distribution when \(\sigma_{\rm up} = \sigma_{\rm lo}\):

\[p(\theta) = \frac{1}{\sqrt{2\pi}\sigma_{p}} \exp\left(\frac{-(\theta-\theta_{p})^{2}}{2\sigma_{p}^{2}}\right),\]

For example, to explicitly set uniform priors for all parameters:

# Parameter prior probability distributions:
prior    = np.array([ 0.0, 0.0, 0.0])
priorlow = np.array([ 0.0, 0.0, 0.0])
priorup  = np.array([ 0.0, 0.0, 0.0])
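
Alternatively, positive priorlow and priorup values place a Gaussian prior on a parameter. For example (the specific prior values below are hypothetical, chosen only for illustration), a 3.0 ± 0.1 Gaussian prior on the first parameter while keeping the others uniform:

# Hypothetical example: Gaussian prior of 3.0 +/- 0.1 on the first
# parameter, uniform priors (priorlow = 0) for the others:
prior    = np.array([ 3.0, 0.0, 0.0])
priorlow = np.array([ 0.1, 0.0, 0.0])
priorup  = np.array([ 0.1, 0.0, 0.0])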

Parameter Names

The pnames argument (optional) defines the names of the model parameters to be shown in the screen output and figure labels. The screen output will display up to 11 characters. For figures, the texnames argument (optional) enables names using LaTeX syntax, for example:

# Parameter names:
pnames = ['y0', 'alpha', 'beta']
texnames = [r'$y_{0}$', r'$\alpha$', r'$\beta$']

If texnames = None, it defaults to pnames. If pnames = None, it defaults to texnames. If both arguments are None, they default to a generic [Param 1, Param 2, ...] list.

Sampler Algorithm

The sampler argument (required) defines the sampler algorithm for the MCMC:

# Sampler algorithm, choose from: 'snooker', 'demc' or 'mrw'.
sampler = 'snooker'

The standard Differential-Evolution MCMC algorithm (sampler = 'demc', [terBraak2006]) proposes for each chain \(i\) in state \(\mathbf{x}_{i}\):

\[\mathbf{x}^* = \mathbf{x}_i + \gamma (\mathbf{x}_{R1}-\mathbf{x}_{R2}) + \mathbf{e},\]

where \(\mathbf{x}_{R1}\) and \(\mathbf{x}_{R2}\) are randomly selected without replacement from the population of current states except \(\mathbf{x}_{i}\). This implementation adopts \(\gamma = f_{\gamma}\, 2.38/\sqrt{2 N_{\rm free}}\), with \(N_{\rm free}\) the number of free parameters; and \(\mathbf{e} \sim \mathcal{N}(0, \sigma^2)\), with \(\sigma = f_{e}\,{\rm pstep}\), where the scaling factors default to \(f_{\gamma}=1.0\) and \(f_{e}=0.0\) (see Fine-tuning).
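
Schematically, a single 'demc' proposal for chain \(i\) could look like the following numpy sketch (an illustration of the equation above, not mc3's actual implementation; states and i are assumed to hold the current state of every chain and the current-chain index):

# Illustrative sketch of one DEMC proposal (not mc3's internal code):
nchains, nfree = states.shape
gamma = fgamma * 2.38 / np.sqrt(2*nfree)
others = np.setdiff1d(np.arange(nchains), [i])
r1, r2 = np.random.choice(others, 2, replace=False)
e = np.random.normal(0.0, fepsilon*pstep)
proposal = states[i] + gamma*(states[r1] - states[r2]) + e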

If sampler = 'snooker' (recommended), mc3 will use the DEMC-zs algorithm with snooker proposals (see [terBraakVrugt2008]).

If sampler = 'mrw', mc3 will use the classical Metropolis-Hastings algorithm with Gaussian proposal distributions. I.e., in each iteration and for each parameter, \(\theta\), the MCMC will propose jumps, drawn from Gaussian distributions centered at the current value, \(\theta_0\), with a standard deviation, \(\sigma\), given by the values in the pstep argument:

\[q(\theta) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( -\frac{(\theta-\theta_0)^2}{2 \sigma^2}\right)\]

Note

For sampler=snooker, an MCMC works well with 3 chains or more. For sampler=demc, [terBraak2006] suggests using \(2 N_{\rm free}\) chains. From experience, I recommend snooker, as it is more efficient than most other MCMC random walks.

MCMC Configuration

The following arguments set the MCMC chains configuration:

# MCMC setup:
nsamples =  1e5
burnin   = 1000
nchains  =   14
ncpu     =    7
thinning =    1

The nsamples argument (required for MCMC runs) sets the total number of MCMC samples to compute.

The burnin argument (optional, default: 0) sets the number of burned-in (removed) iterations at the beginning of each chain.

The nchains argument (optional, default: 7) sets the number of parallel chains to use.

The ncpu argument (optional, default: nchains) sets the number of CPUs to use for the chains. mc3 runs on multiple processors through the multiprocessing Python Standard-Library package (additionally, the central MCMC hub will use one extra CPU; thus, the total number of CPUs used is ncpu + 1).

Note

If ncpu+1 is greater than the number of available CPUs in the machine, mc3 will cap ncpu to the number of available CPUs minus one. To keep a good balance, I recommend setting nchains to a multiple of ncpu, as in the example above.

The thinning argument (optional, default: 1) sets the chains' thinning factor (discarding all but every thinning-th sample). To reduce memory usage, only the thinned samples are stored (and returned).

Note

Thinning is often unnecessary for a DE run, since this algorithm significantly reduces the sampling autocorrelation.
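
As a rough bookkeeping check (the exact rounding may differ in mc3, and grbreak can stop the run early, as in the example run below), the expected posterior sample size follows from these values:

# Expected posterior sample size if the MCMC runs to completion:
iter_per_chain = int(nsamples) // nchains              # 100000 // 14 = 7142
posterior_size = nchains * (iter_per_chain - burnin) // thinning   # 85988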

Pre-MCMC Setup

The following arguments set how the code draws the initial values for the MCMC chains:

# MCMC initial draw, choose from: 'normal' or 'uniform'
kickoff = 'normal'
# DEMC snooker pre-MCMC sample size:
hsize = 10

The starting points of the MCMC chains come from a random draw, set by the kickoff argument (optional, default: 'normal'). This can be a normal-distribution draw centered at params with standard deviation pstep, or a uniform draw between pmin and pmax.

The snooker DEMC, in particular, needs an initial sample, set by the hsize argument (optional, default: 10). The draws from this initial sample do not count toward the posterior-distribution statistics.

Usually, these variables do not have a significant impact on the outputs, so they can be left at their default values.
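
Conceptually, the two kickoff options correspond to draws like these (an illustrative sketch, not mc3's actual code):

# Illustrative sketch of the initial draws for the chains:
# 'normal':  jitter around params (fixed parameters, pstep=0, get no jitter):
start_normal = params + pstep * np.random.randn(nchains, len(params))
# 'uniform': uniform draws between the parameter boundaries:
start_uniform = np.random.uniform(pmin, pmax, (nchains, len(params)))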

Optimization

When not None, the leastsq argument (optional, default: None) runs a least-squares optimization before the MCMC:

# Optimization before MCMC, choose from: 'lm' or 'trf':
leastsq = 'lm'
chisqscale = False

Set leastsq='lm' to use the Levenberg-Marquardt algorithm via Scipy's leastsq, or set leastsq='trf' to use the Trust Region Reflective algorithm via Scipy's least_squares. Fixed and shared values apply during the optimization (see Stepping Behavior), as do the priors (see Parameter Priors).

Note

From the scipy documentation: Levenberg-Marquardt ‘doesn’t handle bounds’ but is ‘the most efficient method for small unconstrained problems’; whereas the Trust Region Reflective algorithm is a ‘Generally robust method, suitable for large sparse problems with bounds’.

The chisqscale argument (optional, default: False) is a flag to scale the data uncertainties to enforce a reduced \(\chi^{2}\) equal to \(1.0\). The scaling is applied by multiplying all uncertainties by a common scale factor.
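
Conceptually, that common factor is the square root of the best-fit reduced chi-squared; a minimal sketch (best_chisq, ndata, and nfree are illustrative names, not mc3 variables):

# Conceptual sketch of the uncertainty scaling (illustrative names):
chisq_factor = np.sqrt(best_chisq / (ndata - nfree))
uncert_scaled = uncert * chisq_factor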

Convergence

mc3 checks for convergence through the Gelman-Rubin test ([GelmanRubin1992]):

# MCMC Convergence:
grtest = True
grbreak = 1.01
grnmin = 0.5

The grtest argument (optional, default: False), when True, runs the Gelman-Rubin convergence test. Values larger than 1.01 are indicative of non-convergence. See [GelmanRubin1992] for further information. The Gelman-Rubin test is computed every 10% of the MCMC exploration.

The grbreak argument (optional, default: 0.0) sets a convergence threshold to stop an MCMC when GR drops below grbreak. Reasonable values seem to be \({\rm grbreak} \lesssim 1.01\). The default behavior is not to break (grbreak = 0.0).

The grnmin argument (optional, default: 0.5) sets a minimum number of valid samples (after burn-in and thinning) required for grbreak. If grnmin is greater than one, it defines the minimum number of samples to run before breaking out of the MCMC. If grnmin is lower than one, it defines the fraction of the total samples to run before breaking out of the MCMC.
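
For reference, here is a minimal sketch of the Gelman-Rubin statistic for a single parameter (a simplified textbook version, not mc3's exact implementation):

import numpy as np

def gelman_rubin(chains):
    """
    Simplified Gelman-Rubin statistic for a single parameter.
    chains: 2D array of shape [nchains, nsamples], burn-in removed.
    """
    nchains, nsamples = chains.shape
    chain_means = chains.mean(axis=1)
    # W: average within-chain variance;  B/n: variance of the chain means:
    W = chains.var(axis=1, ddof=1).mean()
    B_over_n = chain_means.var(ddof=1)
    # Pooled estimate of the marginal posterior variance:
    var_plus = (nsamples-1)/nsamples * W + B_over_n
    # Potential scale reduction factor (near 1.0 indicates convergence):
    return np.sqrt(var_plus / W)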

Wavelet-Likelihood MCMC

The wlike argument (optional, default: False) allows mc3 to implement the wavelet-based method to account for time-correlated noise. When using this method, the user must append the three additional fitting parameters (\(\gamma, \sigma_{r}, \sigma_{w}\)) from [CarterWinn2009] to the end of the params array. Likewise, add the corresponding values to the pmin, pmax, pstep, prior, priorlow, and priorup arrays. For further information see [CarterWinn2009].

This tutorial won’t use the wavelet method:

# Carter & Winn (2009) Wavelet-likelihood method:
wlike = False

Fine-tuning

The \(f_{\gamma}\) and \(f_{e}\) factors scale the DEMC proposal distributions.

fgamma   = 1.0  # Scale factor for DEMC's gamma jump.
fepsilon = 0.0  # Jump scale factor for DEMC's "e" distribution

The default \(f_{\gamma} = 1.0\) value is set such that the MCMC acceptance rate approaches 25–40%. Therefore, most of the time, the user won't need to modify it. Only if the acceptance rate is very low, we recommend setting \(f_{\gamma} < 1.0\). The \(f_{e}\) factor sets the jump scale for the \(\mathbf e\) distribution, which has to have a small variance compared to the posterior. For further information, see [terBraak2006].

Logging

If not None, the log argument (optional, default: None) stores the screen output into a log file. log can either be a string with the filename where the log will be stored, or an mc3.utils.Log object (see API).

# Logging:
log = 'MCMC_tutorial.log'

Outputs

The following arguments set the output files produced by mc3:

# File outputs:
savefile = 'MCMC_tutorial.npz'
plots = True
theme = 'indigo'
statistics = 'med_central'
rms = True

The savefile argument (optional, default: None) defines the .npz file name where the MCMC outputs will be stored. This file contains the following key–items:

  • posterior: thinned posterior distribution of shape [nsamples, nfree], including the burn-in phase.
  • zchain: chain indices for the posterior samples.
  • zmask: posterior mask to remove the burn-in.
  • chisq: \(\chi^2\) values for the posterior samples.
  • log_post: log of the posterior for the sample (as defined here).
  • burnin: number of burned-in samples per chain.
  • ifree: Indices of the free parameters.
  • pnames: Parameter names.
  • texnames: Parameter names in LaTeX format.
  • medianp: median of the marginal posteriors for each model parameter.
  • meanp: mean of the marginal posteriors for each model parameter.
  • stdp: standard deviation of the marginal posteriors for each model parameter.
  • median_low_bounds: lower boundary of the marginal central 68%-quantile credible interval for each model parameter.
  • median_high_bounds: upper boundary of the marginal central 68%-quantile credible interval.
  • hpd_low_bounds: lower boundary of the marginal 68%-HPD credible interval for each model parameter.
  • hpd_high_bounds: upper boundary of the marginal 68%-HPD credible interval.
  • stddev_residuals: standard deviation of the residuals.
  • acceptance_rate: sample’s acceptance rate.
  • best_log_post: optimal log of the posterior in the sample (see here).
  • bestp: model parameters for the best_log_post sample.
  • best_model: model evaluated at bestp.
  • best_chisq: \(\chi^2\) for the best_log_post sample.
  • red_chisq: reduced chi-squared: \(\chi^2/(N_{\rm data}-N_{\rm free})\) for the best_log_post sample.
  • BIC: Bayesian Information Criterion: \(\chi^2 + N_{\rm free} \log(N_{\rm data})\) for the best_log_post sample.
  • chisq_factor: uncertainty scale factor to enforce \(\chi^2_{\rm red} \equiv 1\).

Note

Note that if there are fixed or shared parameters, then the number of free parameters won’t be the same as the number of model parameters. The posterior array in the output includes only the free parameters, whereas the other arrays include all model parameters. Fixed and shared parameters can be filtered out using the ifree mask.
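
For example, a saved run can be loaded back with numpy, and the free parameters extracted with the ifree indices (a minimal sketch):

import numpy as np

# Load a saved MCMC run:
mcmc = np.load('MCMC_tutorial.npz')
posterior = mcmc['posterior'][mcmc['zmask']]  # burn-in removed
bestp_free = mcmc['bestp'][mcmc['ifree']]     # free parameters only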

A file 'rootname_statistics.txt', summarizing all statistics in machine-readable and LaTeX formats, will always be produced. The root name of this file is taken from savefile, without the file extension.

Setting plots=True (optional argument, default: False) will generate figures for the MCMC chain traces, the marginalized posterior histograms, and the pair-wise posteriors. Setting the ioff argument to True (optional, default: False) will turn the interactive plotting mode off. The theme argument enables the user to set a color theme for the plots.

The statistics argument sets the statistics to use in the plots (parameter estimates and credible intervals). Select from:

  • med_central: median and central-quantile credible intervals (default).
  • max_like: maximum of the marginal likelihood (mode) and HPD credible intervals.
  • global_max_like: maximum a posteriori (best fit) and HPD credible intervals.

Set the rms argument (optional, default: False) to True to compute and plot the time-averaging test for time-correlated noise (see [Winn2008]).

MCMC Run

Putting it all together, here’s a Python script to run an mc3 retrieval explicitly defining all the variables described above:

import sys
import numpy as np
import mc3


def quad(p, x):
    """
    Quadratic polynomial function.

    Parameters
        p: Polynomial constant, linear, and quadratic coefficients.
        x: Array of dependent variables where to evaluate the polynomial.
    Returns
        y: Polynomial evaluated at x:  y(x) = p0 + p1*x + p2*x^2
    """
    y = p[0] + p[1]*x + p[2]*x**2.0
    return y

# ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
# Preamble (create a synthetic dataset, in a real scenario you would
# get your dataset from your own data analysis pipeline):
np.random.seed(314)
x  = np.linspace(0, 10, 1000)
p0 = [3, -2.4, 0.5]
y  = quad(p0, x)

uncert = np.sqrt(np.abs(y))
error  = np.random.normal(0, uncert)
data   = y + error
# ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

# Define the modeling function as a callable:
func = quad

# List of additional arguments of func (if necessary):
indparams = [x]

# Array of initial-guess values of fitting parameters:
params = np.array([ 10.0, -2.0, 0.1])
# Lower and upper boundaries for the MCMC exploration:
pmin = np.array([-10.0, -20.0, -10.0])
pmax = np.array([ 40.0,  20.0,  10.0])
# Parameters' stepping behavior:
pstep = np.array([1.0, 0.5, 0.1])

# Parameter prior probability distributions:
prior    = np.array([ 0.0, 0.0, 0.0])
priorlow = np.array([ 0.0, 0.0, 0.0])
priorup  = np.array([ 0.0, 0.0, 0.0])

# Parameter names:
pnames = ['y0', 'alpha', 'beta']
texnames = [r'$y_{0}$', r'$\alpha$', r'$\beta$']

# Sampler algorithm, choose from: 'snooker', 'demc' or 'mrw'.
sampler = 'snooker'

# MCMC setup:
nsamples =  1e5
burnin   = 1000
nchains  =   14
ncpu     =    7
thinning =    1

# MCMC initial draw, choose from: 'normal' or 'uniform'
kickoff = 'normal'
# DEMC snooker pre-MCMC sample size:
hsize = 10

# Optimization before MCMC, choose from: 'lm' or 'trf':
leastsq = 'lm'
chisqscale = False

# MCMC Convergence:
grtest = True
grbreak = 1.01
grnmin = 0.5

# Logging:
log = 'MCMC_tutorial.log'

# File outputs:
savefile = 'MCMC_tutorial.npz'
plots = True
theme = 'indigo'
statistics = 'med_central'
rms = True

# Carter & Winn (2009) Wavelet-likelihood method:
wlike = False

# Run the MCMC:
output = mc3.sample(
    data=data, uncert=uncert, func=func, params=params,
    indparams=indparams, pmin=pmin, pmax=pmax, pstep=pstep,
    pnames=pnames, texnames=texnames,
    prior=prior, priorlow=priorlow, priorup=priorup,
    sampler=sampler, nsamples=nsamples,  nchains=nchains,
    ncpu=ncpu, burnin=burnin, thinning=thinning,
    leastsq=leastsq, chisqscale=chisqscale,
    grtest=grtest, grbreak=grbreak, grnmin=grnmin,
    hsize=hsize, kickoff=kickoff,
    wlike=wlike, log=log,
    plots=plots, theme=theme, statistics=statistics,
    savefile=savefile, rms=rms,
)

This routine returns a dictionary containing the outputs listed in Outputs. The screen output should look like this:

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
  Multi-core Markov-chain Monte Carlo (mc3).
  Version 3.1.0.
  Copyright (c) 2015-2023 Patricio Cubillos and collaborators.
  mc3 is open-source software under the MIT license (see LICENSE).
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Least-squares best-fitting parameters:
  [ 3.02203328 -2.3897706   0.49543328]

Yippee Ki Yay Monte Carlo!
Start MCMC chains  (Thu Mar 30 01:14:54 2023)

[:         ]  10.0% completed  (Thu Mar 30 01:14:55 2023)
Out-of-bound Trials:
[0 0 0]
Best Parameters: (chisq=1035.2269)
[ 3.02203328 -2.3897706   0.49543328]

[::        ]  20.0% completed  (Thu Mar 30 01:14:55 2023)
Out-of-bound Trials:
[0 0 0]
Best Parameters: (chisq=1035.2269)
[ 3.02203328 -2.3897706   0.49543328]
Gelman-Rubin statistics for free parameters:
[1.0321583  1.02847767 1.0228459 ]

[:::       ]  30.0% completed  (Thu Mar 30 01:14:56 2023)
Out-of-bound Trials:
[0 0 0]
Best Parameters: (chisq=1035.2269)
[ 3.02203328 -2.3897706   0.49543328]
Gelman-Rubin statistics for free parameters:
[1.0048345  1.00476673 1.004175  ]
All parameters converged to within 1% of unity.

[::::      ]  40.0% completed  (Thu Mar 30 01:14:56 2023)
Out-of-bound Trials:
[0 0 0]
Best Parameters: (chisq=1035.2269)
[ 3.02203328 -2.3897706   0.49543328]
Gelman-Rubin statistics for free parameters:
[1.00292334 1.00357332 1.00374888]
All parameters converged to within 1% of unity.

[:::::     ]  50.0% completed  (Thu Mar 30 01:14:57 2023)
Out-of-bound Trials:
[0 0 0]
Best Parameters: (chisq=1035.2269)
[ 3.02203328 -2.3897706   0.49543328]
Gelman-Rubin statistics for free parameters:
[1.00158522 1.00181851 1.00164976]
All parameters converged to within 1% of unity.

[::::::    ]  60.0% completed  (Thu Mar 30 01:14:57 2023)
Out-of-bound Trials:
[0 0 0]
Best Parameters: (chisq=1035.2269)
[ 3.02203328 -2.3897706   0.49543328]
Gelman-Rubin statistics for free parameters:
[1.00058249 1.0005924  1.00048652]
All parameters converged to within 1% of unity.

All parameters satisfy the GR convergence threshold of 1.01, stopping
the MCMC.

MCMC Summary:
-------------
  Number of evaluated samples:        61899
  Number of parallel chains:             14
  Average iterations per chain:        4421
  Burned-in iterations per chain:      1000
  Thinning factor:                        1
  MCMC sample size (thinned, burned): 47899
  Acceptance rate:   28.82%

Parameter name     best fit   median      1sigma_low   1sigma_hi        S/N
--------------- -----------  -----------------------------------  ---------
y0               3.0220e+00   3.0222e+00 -1.2339e-01  1.2645e-01       24.6
alpha           -2.3898e+00  -2.3898e+00 -7.3264e-02  7.2663e-02       33.9
beta             4.9543e-01   4.9538e-01 -8.7058e-03  8.8865e-03       57.6

  Best-parameter's chi-squared:       1035.2269
  Best-parameter's -2*log(posterior): 1035.2269
  Bayesian Information Criterion:     1055.9502
  Reduced chi-squared:                   1.0383
  Standard deviation of residuals:  2.77253

For a detailed summary with all parameter posterior statistics see
MCMC_tutorial_statistics.txt

Output sampler files:
  MCMC_tutorial_statistics.txt
  MCMC_tutorial.npz
  MCMC_tutorial_trace.png
  MCMC_tutorial_pairwise_posterior.png
  MCMC_tutorial_marginal_posterior.png
  MCMC_tutorial_RMS.png
  MCMC_tutorial.log
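
Since mc3.sample() returns these outputs as a dictionary, the results can also be accessed directly from the returned object, for example:

# Access the MCMC outputs from the returned dictionary:
bestp = output['bestp']          # parameters of the best_log_post sample
posterior = output['posterior']  # thinned posterior distribution
stdp = output['stdp']            # marginal-posterior standard deviations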

Resume a Previous Run

It is also possible to add more samples to a previous run (identified by the .npz output file name). To do this, set resume = True. Ideally, keep the same number of MCMC chains as in the previous run to avoid conflicts with the shape of the posterior. This resumed run will append nsamples new samples to the posterior of the previous run (overwriting all output files).
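
A minimal sketch of such a resumed run, reusing the setup from the tutorial above:

# Append nsamples more samples to the run stored in 'MCMC_tutorial.npz':
output = mc3.sample(
    data=data, uncert=uncert, func=func, params=params,
    indparams=indparams, pstep=pstep, sampler=sampler,
    nsamples=nsamples, nchains=nchains, burnin=burnin,
    savefile='MCMC_tutorial.npz', resume=True,
)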


Inputs from Files

The data, uncert, indparams, params, pmin, pmax, pstep, prior, priorlow, and priorup input arrays can optionally be given as input files. Furthermore, multiple input arguments can be combined into a single file.

Input Data File

The data, uncert, and indparams inputs can be provided as binary numpy .npz files. data and uncert can be stored together into a single file. An indparams input file contains the list of independent variables (it must be a list, even if there is a single independent variable).

The utils sub-package of mc3 provides utility functions to save and load these files. This script shows how to create data and indparams input files:

import numpy as np
import mc3

# (Uses the quad() function defined in the Preamble above.)
# Create a synthetic dataset using a quadratic polynomial curve:
x  = np.linspace(0.0, 10, 1000)
p0 = [3, -2.4, 0.5]
y  = quad(p0, x)

uncert = np.sqrt(np.abs(y))
error  = np.random.normal(0, uncert)
data   = y + error

# data.npz contains the data and uncertainty arrays:
mc3.utils.savebin([data, uncert], 'data.npz')
# indp.npz contains a list of variables:
mc3.utils.savebin([x], 'indp.npz')

Model Parameters

The params, pmin, pmax, pstep, prior, priorlow, and priorup inputs can be provided as plain ASCII files. For simplicity, all of these input arguments can be combined into a single file.

In the params file, each line corresponds to one model parameter, and each column corresponds to one of the input array arguments. This input file can hold as few or as many of these argument arrays as needed, as long as they are provided in that exact order. Empty or comment lines are allowed (and ignored by the reader). A valid params file looks like this:

#       params            pmin            pmax           pstep
            10             -10              40             1.0
          -2.0             -20              20             0.5
           0.1             -10              10             0.1

Alternatively, the utils sub-package of mc3 provide utility functions to save and load these files:

params = [ 10, -2.0,  0.1]
pmin   = [-10,  -20, -10]
pmax   = [ 40,   20,  10]
pstep  = [  1,  0.5,  0.1]

# Store ASCII arrays:
mc3.utils.saveascii([params, pmin, pmax, pstep], 'params.txt')

Then, to run the MCMC simply provide the input file names to the mc3 routine:

# Set arguments as the file names:
data      = 'data.npz'
indparams = 'indp.npz'
params    = 'params.txt'

# Run the MCMC:
mc3_output = mc3.sample(data=data, func=func, params=params,
    indparams=indparams, sampler=sampler, nsamples=nsamples,  nchains=nchains,
    ncpu=ncpu, burnin=burnin, leastsq=leastsq, chisqscale=chisqscale,
    grtest=grtest, grbreak=grbreak, grnmin=grnmin,
    log=log, plots=plots, savefile=savefile, rms=rms)