We have a set of observed population numbers
\begin{equation}
\opop=\left[\begin{array}{c} \opop_1\\ \vdots\\ \opop_\nop \end{array}\right]
\end{equation}
over \(\nop\) age intervals \([a_i,a_i+n_i)\). The objective is to ungroup the population into single years of age \(\pop_x\), \(x=0,1,2,\ldots,\maxage\), out to a maximum age \(\maxage\). In the Penalised Composite Link Method (PCLM) of Rizzi et al (2015) the ungrouped numbers are expressed in the form
\begin{equation}
\label{eqn:pclm}
\log\pop_x = \bmat_x\mmult\bvec ,
\end{equation}
where \(\bmat_x\) is a row vector of basis splines (B-Splines) evaluated at age \(x\) and \(\bvec\) is a column vector of B-Spline weights to be determined. The weights are found by maximising the penalised log-likelihood function
\begin{equation}
\loglik(\bvec) = \ploglik(\bvec) - \frac{\pen}{2}\,\bvec'\mmult \diff'\mmult \diff\mmult\bvec ,
\end{equation}
where \(\pen\) is the penalty and \(\diff\) is the \(k\)-order difference matrix. Assuming the observed population numbers follow a Poisson distribution, the unpenalised log-likelihood is, up to an additive constant,
\begin{equation}
\ploglik(\bvec)=\ones'\mmult (\opop \log\apop-\apop),
\end{equation}
with the product and the logarithm taken elementwise. Here \(\ones\) is a vector of ones and \(\apop\) is the vector of \(\nop\) smoothed abridged numbers, given in terms of \(\pop_x\) by the sum
\begin{equation}
{}_n\apop_a = \sum_{a\le x< a+n} \pop_x .
\end{equation}
Assuming \(\loglik(\bvec)\) is maximised at a stationary point, we get the following nonlinear equation for \(\bvec\):
\begin{equation}
\label{eqn:nonlinear}
\gmat'(\bvec)\mmult\vmat\mmult (\opop-\apop)-\pen \diff'\mmult \diff\mmult \bvec=0,
\end{equation}
where \(\vmat\) is the identity matrix and \(\gmat(\bvec)\) is the matrix of logarithmic derivatives
\begin{equation}
\gmat(\bvec) = \frac{1}{\apop}\grad{\apop}.
\end{equation}
This equation can be solved by iterated linear regressions as follows.
Given an approximation \(\bar{\bvec}\), the updated value \(\bvec\) is calculated by solving the linear equation
\begin{equation}
\label{eqn:linear}
\qmat(\bar{\bvec})\mmult \bvec = b(\bar{\bvec}),
\end{equation}
where
\begin{eqnarray}
\qmat(\bvec)&=&\gmat'(\bvec)\mmult\wmat(\bvec)\mmult\gmat(\bvec) + \pen \diff'\mmult\diff\\
b(\bvec)&=& \gmat'(\bvec)\mmult\vmat\mmult (\opop-\apop) + \gmat'(\bvec)\mmult\wmat(\bvec)\mmult\gmat(\bvec)\mmult\bvec
\end{eqnarray}
and
\begin{equation}
\wmat(\bvec)=\diag(\apop)\mmult\vmat .
\end{equation}
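The iteration above can be sketched in a few lines of NumPy. This is only an illustrative sketch for the Poisson case (\(\vmat\) equal to the identity), not the implementation used by the software: the function name `pclm_fit`, the composition matrix `C` that sums single years into age groups, the flat starting value, and the use of an identity basis in place of B-Splines in the demonstration are all assumptions made here for illustration.

```python
import numpy as np

def pclm_fit(y, C, B, D, lam, max_its=15):
    """Iterate Q(b_bar) b = rhs(b_bar) for the Poisson case (hypothetical helper).

    y   : observed grouped numbers, length = number of age groups
    C   : 0/1 matrix summing single ages into groups (assumed name)
    B   : basis matrix, one row per single year of age
    D   : difference matrix acting on the weights
    lam : penalty
    """
    y = np.asarray(y, dtype=float)
    # Assumed starting value: spread the total evenly over single ages.
    beta = np.full(B.shape[1], np.log(y.sum() / C.shape[1]))
    P = lam * D.T @ D
    V = np.eye(len(y))                      # Poisson case: V is the identity
    for _ in range(max_its):
        pop = np.exp(B @ beta)              # single-year numbers
        gamma = C @ pop                     # smoothed abridged numbers
        G = (C * pop) @ B / gamma[:, None]  # logarithmic derivatives of gamma
        W = np.diag(gamma) @ V
        GWG = G.T @ W @ G
        Q = GWG + P
        rhs = G.T @ V @ (y - gamma) + GWG @ beta
        beta = np.linalg.solve(Q, rhs)
    return np.exp(B @ beta)

# Demonstration: four five-year groups ungrouped into ages 0..19.
C = np.kron(np.eye(4), np.ones((1, 5)))     # shape (4, 20)
B = np.eye(20)                              # identity basis, for illustration only
D = np.diff(np.eye(20), n=2, axis=0)        # second-order differences
y = np.array([500.0, 400.0, 300.0, 200.0])
pop = pclm_fit(y, C, B, D, lam=10.0, max_its=30)
```

At a stationary point the fitted group totals `C @ pop` sum to the observed total, because a constant shift of the weights is not penalised by the difference matrix.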
The default settings of the method can be altered using an extrainfo string of the form PCLM:ExtraInfo, where ExtraInfo is an optional comma-separated list of quoted strings of the form "Parameter=Value".
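As a small illustration of this format, the following sketch assembles an extrainfo string from a set of settings. The helper `build_extrainfo` is hypothetical and exists only for this example; the exact treatment of whitespace, and the form of the string when no settings are given, are not specified here, so the sketch declines to guess the latter.

```python
def build_extrainfo(settings):
    """Join Parameter=Value pairs into a PCLM:ExtraInfo string (hypothetical helper)."""
    if not settings:
        # The form of the string without settings is not covered by this sketch.
        raise ValueError("this sketch expects at least one setting")
    quoted = [f'"{name}={value}"' for name, value in settings.items()]
    return "PCLM:" + ",".join(quoted)

example = build_extrainfo({"Degree": 3, "Order": 2, "Penalty": "AIC"})
print(example)
```

The parameter names used in the example are the ones described in the sections that follow.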
Degree
By default PCLM uses cubic B-Splines (\(\degree=3\)). The degree of the splines can be changed with the extrainfo string "Degree=\(\degree\)".
Order
By default PCLM uses a second-order penalty (\(\order=2\)), where the order is the order \(k\) of the difference matrix \(\diff\). The order can be changed with the extrainfo string "Order=\(\order\)".
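To make the role of the order concrete, the sketch below builds a difference matrix of a given order by repeatedly differencing the rows of an identity matrix. The function name `difference_matrix` is an assumption made for this example and is not part of the software.

```python
import numpy as np

def difference_matrix(n_weights, order=2):
    """Difference matrix of the given order acting on n_weights spline weights."""
    # Differencing the identity 'order' times along the rows yields a matrix
    # of shape (n_weights - order, n_weights).
    return np.diff(np.eye(n_weights), n=order, axis=0)

D1 = difference_matrix(5, order=1)   # rows of the form [-1, 1, 0, 0, 0]
D2 = difference_matrix(5, order=2)   # rows of the form [1, -2, 1, 0, 0]
```

A second-order matrix annihilates weights that are linear in their index, so such weights incur no penalty.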
Knots
There are two ways to set the B-Spline knots. The first way is to set the maximum age using the extrainfo string "MaxAge=\(\maxage\)". The internal points are then equally spaced upwards from \(0\) as follows
\begin{eqnarray}
x_0&=&0\\
x_i &=& x_{i-1}+\knotspacing,\quad i=1,\ldots,\nintern\\
x_{\nintern+1}&=&\maxage
\end{eqnarray}
where
\begin{equation}
\nintern=\ceil{\frac{\maxage}{\knotspacing}}-1
\end{equation}
and the spacing \(\knotspacing\) is specified by the extrainfo string "KnotSpacing=\(\knotspacing\)". The final interval \([x_\nintern,\maxage]\) is shorter than \(\knotspacing\) whenever \(\maxage\) is not a multiple of \(\knotspacing\). The default values are \(\maxage=110\) and \(\knotspacing=2.5\). The second way is to set the knots explicitly using the extrainfo string "Knots=\(x_0\)|\(x_1\)|\(x_2\)|\(\ldots\)|\(x_\nintern\)|\(\maxage\)".
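The knot rule can be checked with a short sketch. The function names `pclm_knots` and `knots_setting` are hypothetical, and the number formatting used for the "Knots=" string is an assumption; the sketch only reproduces the arithmetic described above.

```python
import math

def pclm_knots(max_age=110.0, knot_spacing=2.5):
    """Equally spaced knots from 0, closed off at max_age (hypothetical helper)."""
    n_internal = math.ceil(max_age / knot_spacing) - 1
    # x_0 = 0 and x_i = x_{i-1} + knot_spacing for i = 1..n_internal
    knots = [i * knot_spacing for i in range(n_internal + 1)]
    knots.append(float(max_age))            # final knot at the maximum age
    return knots

def knots_setting(knots):
    """Format a knot list as an explicit "Knots=..." setting (assumed formatting)."""
    return "Knots=" + "|".join(f"{k:g}" for k in knots)

default_knots = pclm_knots()                # defaults: 110 and 2.5
uneven_knots = pclm_knots(100, 3)           # last interval runs from 99 to 100
small_setting = knots_setting(pclm_knots(10, 5))
```

With the default values the rule gives 45 knots, the last two being 107.5 and 110.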
Penalty
By default PCLM automatically calculates the penalty \(\pen\) by minimising the Bayesian information criterion ("Penalty=BIC"). An alternative is the Akaike information criterion ("Penalty=AIC").
The penalty can also be specified explicitly using "Penalty=\(\pen\)".
Error Type
Rizzi et al (2015) originally modelled the observed population with a Poisson distribution, but the method is flexible enough to incorporate alternative assumptions. For example, if the values \(\opop\) are normally distributed with variance \(\var\) then
\begin{equation}
\ploglik(\bvec)=-\ones'\mmult \frac{1}{2\var}(\opop-\apop)^2 .
\end{equation}
The PCLM equations are still valid with the matrix \(\vmat\) changed from the identity to
\begin{equation}
\vmat=\diag(\apop/\var).
\end{equation}
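The form of \(\vmat\) in the normal case can be checked by differentiating the log-likelihood directly. The short derivation below is a sketch that assumes the variance may differ between age groups, as the notation \(\diag(\apop/\var)\) suggests; the index \(i\) over age groups is introduced here only for illustration.
\begin{eqnarray}
\grad{\ploglik} &=& \sum_{i=1}^{\nop}\frac{\opop_i-\apop_i}{\var_i}\,\grad{\apop_i}\\
&=& \sum_{i=1}^{\nop}\left(\frac{1}{\apop_i}\grad{\apop_i}\right)\frac{\apop_i}{\var_i}\,(\opop_i-\apop_i)\\
&=& \gmat'(\bvec)\mmult\diag(\apop/\var)\mmult(\opop-\apop)
\end{eqnarray}
This is the first term of the stationarity equation with \(\vmat=\diag(\apop/\var)\). The same steps applied to the Poisson log-likelihood give a factor \((\opop_i-\apop_i)/\apop_i\) in place of \((\opop_i-\apop_i)/\var_i\), which is why \(\vmat\) is the identity in that case.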
By default PCLM uses a Poisson distribution ("ErrType=POISSON"). Users can specify a normal distribution by setting "ErrType=NORMAL".
Var
When \(\opop\) is modelled by a normal distribution, the variance is set using the extrainfo string "Var=VarPCurveName", where VarPCurveName is the name of a PCURVE object with the age-specific variances in the Numbers column. Alternatively, a flat variance \(\beta\) can be specified using "Var=\(\beta\)". By default PCLM uses a flat variance with \(\beta=1000\).
MaxIts
The number of iterations used to solve for \(\bvec\) can be set using the extrainfo string "MaxIts=\(\maxits\)". The default is \(\maxits=15\).