Panel data arise when time series are measured on a collection of
units. When the time series for each unit is modeled as a partially
observed Markov process (POMP) the collection of these models is a
PanelPOMP. The panelPomp
package provides facilities for
inference on panel data using PanelPOMP models. Monte Carlo methods used
for POMP models require adaptation for PanelPOMP models due to the
higher dimensionality of panel data. This package builds on the
functionality and tools of the popular pomp
R package,
providing a computationally efficient framework that can be used to
simulate from, fit, and diagnose PanelPOMP models. As such, a basic
working knowledge of the pomp
package is recommended. Here
we cover some of the necessary basics. See Getting
Started With pomp
for an introductory Vignette to the
pomp
package.
Before discussing features of the panelPomp
package, we
describe a mathematical notation that is helpful in communicating
details about panelPomp
code and models. The general scope
of the panelPomp
package requires notation concerning
random variables and their densities in arbitrary spaces. The notation
below allows us to talk about these things using the language of
mathematics, enabling precise description of models and algorithms.
Units of the panel can be identified with numeric labels {1, 2, …, U}, which we also write as 1 : U. Let Nu be the number of measurements collected on unit u, allowing for the possibility that a different number of observations are collected for each unit. The data from the entire panel are written as y1 : U, 1 : Nu* = {y1, 1*, y2, 1*, …, yU, 1*, y1, 2*…, yu, Nu*} where the nth observation from unit u (denoted as yu, n*) is collected at time tu, n with tu, 1 < tu, 2 < … < tu, Nu. Observation times {tu, n} are often equally spaced, but this general framework and notation permit unequally spaced observations times both across and within units. The data from unit u are modeled as a realization of an observable stochastic process Yu, 1 : Nu which is dependent on a latent Markov process {Xu(t), tu, 0 ≤ t ≤ tu, Nu} defined subsequent to an initial time tu, 0 ≤ tu, 1. Requiring that {Xu(t)} and {Yu, i, i ≠ n} are independent of Yu, n given Xu(tu, n), for each n ∈ 1 : Nu, completes the partially observed Markov process (POMP) model structure for unit u. For a PanelPOMP we require additionally that all units are modeled as independent.
While the latent process may exist between measurement times, its value at measurement times is of particular interest. We write Xu, n = Xu(tu, n) to denote the latent process at the observation times. We suppose that Xu, n and Yu, n take values in arbitrary spaces 𝕏u and 𝕐u respectively. Using the independence of units, conditional independence of the observable random variables, and the Markov property of the latent states, the joint distribution of the entire collection of latent variables X = {Xu, 0 : Nu}u = 1U and observable variables Y = {Yu, 1 : Nu}u = 1U can be written as: $$ f_{\mathbf{X}\mathbf{Y}}(\mathbf{x}, \mathbf{y}) = \prod_{u = 1}^U f_{X_{u, 0}}(x_{u,0}; \theta)\prod_{n = 1}^{N_u} f_{Y_{u, n}|X_{u, n}}(y_{u, n}|x_{u, n}; \theta)f_{X_{u, n}|X_{u, n-1}}(x_{u, n}|x_{u, n-1}; \theta), $$ where θ ∈ ℝD is a possibly unknown parameter vector. This representation is useful as it demonstrates how any PanelPOMP model can be fully described using three primary components: the transition densities fXu, n|Xu, n − 1(xu, n|xu, n − 1; θ), measurement densities fYu, n|Xu, n(yu, n|xu, n : θ), and initialization densities fXu, 0(xu, 0; θ). Each class of densities are permitted to depend arbitrarily on u and n, allowing non-stationary models and the inclusion of covariate time series. In addition to continuous-time dynamics, the framework includes discrete-time dynamic models by specifying Xu, 0 : Nu directly without ever defining {Xu(t), tu, 0 ≤ t ≤ tu, Nu}.
The marginal density of Yu, 1 : Nu
at yu, 1 : Nu
is fYu, 1 : Nu(yu, 1 : Nu; θ)
and the likelihood function for unit u is Lu(θ) = fYu, 1 : Nu(yu, 1 : Nu*; θ).
The likelihood for the entire panel is $L(\theta) = \prod_{u=1}^{U} L_{u}(\theta)$,
and any solution θ̂ = arg max L(θ)
is a maximum likelihood estimate (MLE). The log likelihood is ℓ(θ) = log L(θ).
We also permit the possibility that some parameters may affect only a
subset of units, so that the parameter vector can be written as θ = (ϕ, ψ1, …, ψU),
where the important densities described above can be written as Then,
ψu is a
vector of unit-specific parameters for unit u, and ϕ is a shared parameter
vector. We suppose ϕ ∈ ℝA and ψ ∈ ℝB, so the
dimension of the parameter vector θ is D = A + BU.
In practice, the densities in Eqs. @ref(eq:proc)–@ref(eq:init) serve two
primary roles in PanelPOMP models: evaluation and simulation. The way
these fundamental goals are represented in the panelPomp
package is described in Table @ref(tab:funs).
Method | Mathematical terminology |
---|---|
rprocess |
Simulate from Eq. @ref(eq:proc): fXu, n|Xu, n − 1(xu, n|xu, n − 1; θ) |
dprocess |
Evaluate Eq. @ref(eq:proc): fXu, n|Xu, n − 1(xu, n|xu, n − 1; θ) |
rmeasure |
Simulate from Eq. @ref(eq:meas): fYu, n|Xu, n(yu, n|xu, n; ϕ, ψu) |
dmeasure |
Evaluate Eq. @ref(eq:meas): fYu, n|Xu, n(yu, n|xu, n; ϕ, ψu) |
rinit |
Simulate from Eq. @ref(eq:init): fXu, 0(xu, 0; ϕ, ψu) |
dinit |
Evaluate Eq. @ref(eq:init): fXu, 0(xu, 0; ϕ, ψu) |
Each independent unit in a panel is POMP model, represented using the
pomp
package. Each pomp
object contains the
same components described in Table @ref(tab:funs), with the exception
that parameters cannot be shared across individual units in the panel.
As such, the functions listed in Table @ref(tab:funs) are available in
the panelPomp
package through the pomp
package. Additional functions of interest that are not listed in Table
@ref(tab:funs) include: rprior()
and dprior()
,
enabling the use of Bayesian analysis if desired;
emeasure()
and vmeasure()
, which describe the
conditional expectation and covariance of the measurement model for
algorithms that rely on these values (such as the Kalman filter).
panelPomp
objectsThe panelPomp
package is written in a functional object
oriented programming framework. Key to the most important features of
the package is the panelPomp
class, which is implemented
using the S4
system. This S4
class contains three slots:
unit_objects
: A list of pomp
objects.shared
: a named numeric vector containing the names and
values of parameters that are shared for each unit of the panel.specific
: a numeric matrix with row and column names;
row names correspond to the parameter names, and column names to the
unit names of the panel.Notably, the functions listed in Table @ref(tab:funs) are not part of
a panelPomp
object directly, rather they are part of the
individual unit objects saved in the slot unit_objects
.
These objects can be extracted using the extracter function:
unit_objects(<object>)
.
panelPomp
objectsThe fundamental mathematical functions that define a PanelPOMP model
are made available via the pomp
objects in the
unit_objects
slot. As such, constructing a
panelPomp
object is simple if you are already familiar with
constructing pomp
objects, or if you already have access to
pomp
objects. Here we describe how to create a
panelPomp
object if the unit-specific pomp
objects are already created. In the next sub-section, we give a
brief demonstration of how to construct a panelPomp
object
from scratch, including each individual pomp
object. Here,
we construct a panelPomp
object representing a panel of
stochastic Gompertz population models with log-normal measurement error.
The latent state process is defined as: Xn + 1 = K1 − SXnSϵn,
where S = exp−r and
the ϵn are
i.i.d. lognormal random variables with variance σ2. The measurement model
for the observed variables Yn are
distributed as: Yn ∼ Lognormal(log Xn, τ).
The parameters of this model are:
This particular model class has a constructor function
gompertz()
from the pomp
package. Here, we
create 5 unique instances of this model, and use these instances to
create a single panelPomp
object:
mod1 <- pomp::gompertz() # Using default values
mod2 <- pomp::gompertz(K = 2, r = 0.01) # Overwriting some of the defaults
mod3 <- pomp::gompertz(K = 1.5, sigma = 0.15, X_0 = 5)
mod4 <- pomp::gompertz(K = 1.5, r = 0.05, X_0 = 5)
mod5 <- pomp::gompertz(K = 5, sigma = 0.08)
panelMod1 <- panelPomp(
object = list(mod1, mod2, mod3, mod4, mod5)
)
One important thing to note above the above construction is that each
individual model already has parameter values present. When this is the
case, the panelPomp()
constructor sets all parameters to be
unit specific:
print(specific(panelMod1))
#> unit
#> parameter unit1 unit2 unit3 unit4 unit5
#> K 1.0 2.00 1.50 1.50 5.00
#> r 0.1 0.01 0.10 0.05 0.10
#> sigma 0.1 0.10 0.15 0.10 0.08
#> tau 0.1 0.10 0.10 0.10 0.10
#> X_0 1.0 1.00 5.00 5.00 1.00
print(shared(panelMod1))
#> numeric(0)
In the panelMod1
object, all five parameters are listed
as unit specific. Notably, because τ was not modified in any of the
unit specific objects, it has the same value across all five units. In
such cases, it might make sense to list parameters that have the same
value across all units as shared parameters, which can be done
in the model constructor:
panelMod2 <- panelPomp(
object = list(mod1, mod2, mod3, mod4, mod5),
shared = c("tau" = 0.1)
)
specific(panelMod2)
#> unit
#> parameter unit1 unit2 unit3 unit4 unit5
#> K 1.0 2.00 1.50 1.50 5.00
#> r 0.1 0.01 0.10 0.05 0.10
#> sigma 0.1 0.10 0.15 0.10 0.08
#> X_0 1.0 1.00 5.00 5.00 1.00
shared(panelMod2)
#> tau
#> 0.1
In this case we did not need to explicitly specify unit-specific
parameters; if parameter values are present in the unit
pomp
objects that comprise the panel, parameters are
assumed to be unit-specific unless otherwise specified. However, it is
possible to explicitly provide a matrix of unit specific parameters in
the constructor, if desired. This is especially important if the
individual pomp
objects that make up the panel have missing
parameter values.
Unit-specific parameters can be expressed in two ways: as a matrix
with rows corresponding to parameter values and columns the
corresponding unit (as seen above), or as a named numeric vector that
follows the convention
<param>[<unit name>]
:
specific(panelMod2, format = 'vector')
#> K[unit1] r[unit1] sigma[unit1] X_0[unit1] K[unit2] r[unit2]
#> 1.00 0.10 0.10 1.00 2.00 0.01
#> sigma[unit2] X_0[unit2] K[unit3] r[unit3] sigma[unit3] X_0[unit3]
#> 0.10 1.00 1.50 0.10 0.15 5.00
#> K[unit4] r[unit4] sigma[unit4] X_0[unit4] K[unit5] r[unit5]
#> 1.50 0.05 0.10 5.00 5.00 0.10
#> sigma[unit5] X_0[unit5]
#> 0.08 1.00
It is often convenient to modify which parameters are shared and
which are unit-specific on existing panelPomp
objects
rather than creating new objects from scratch. This can be done with the
shared<-
and specific<-
setter
functions:
shared(panelMod2) <- c('r' = 0.05, 'sigma' = 0.1)
specific(panelMod2) <- c('tau[unit1]' = 0.11, 'tau[unit4]' = 0.09)
print(shared(panelMod2))
#> r sigma
#> 0.05 0.10
print(specific(panelMod2))
#> unit
#> param unit1 unit2 unit3 unit4 unit5
#> K 1.00 2.0 1.5 1.50 5.0
#> X_0 1.00 1.0 5.0 5.00 1.0
#> tau 0.11 0.1 0.1 0.09 0.1
Notice above that if a shared parameter (tau
) is changed
to a unit-specific parameter and not all values of the unit-specific
parameter are explicitly provided, the parameters that are not specified
default to the original shared value. The unit-specific setter function
also works in matrix format:
specific(panelMod2) <- matrix(
data = rbind(c(1.24, 1.78),
c( 2, 3)),
nrow = 2,
dimnames = list(
param = c("K", "X_0"),
unit = c('unit2', 'unit4')
)
)
specific(panelMod2)
#> unit
#> param unit1 unit2 unit3 unit4 unit5
#> K 1.00 1.24 1.5 1.78 5.0
#> X_0 1.00 2.00 5.0 3.00 1.0
#> tau 0.11 0.10 0.1 0.09 0.1
Neither the shared<-
nor the
specific<-
setter functions allow a user to add new
parameters (or unit names) that are not already part of the model: