Learning the complex features of large spatio-temporal data sets often requires models with a large number of parameters. However, inference for such models is often infeasible because of the computational and memory costs of maximum likelihood estimation (MLE). The class of marginally parameterized (MP) models is introduced, for which estimation can be performed efficiently using stepwise maximum likelihood estimation (SMLE) applied to a sequence of marginal likelihood functions. Conditions under which the stepwise estimators are consistent are provided, and it is shown that this class of models includes the diagonal vector autoregressive moving average model. It is demonstrated that the parameters of this model can be estimated at least three orders of magnitude faster with SMLE than with MLE, with only a small loss in statistical efficiency. An MP model is applied to a spatio-temporal global climate data set consisting of over five million data points, and it is demonstrated that estimation can be achieved in less than one hour on a laptop with a dual-core 2.9 GHz processor. © 2020 Elsevier B.V. All rights reserved.
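
The stepwise idea can be illustrated with a minimal sketch. It is not the paper's SMLE implementation: it assumes a simpler diagonal VAR(1) model (no moving average part), a conditional Gaussian likelihood, and hypothetical function names (`simulate_diagonal_var1`, `stepwise_fit`) chosen for illustration. Each series is first fitted on its own, which identifies its autoregressive coefficient; the innovation covariance is then estimated with those coefficients held fixed, so no joint optimization over all parameters is needed.

```python
import numpy as np


def simulate_diagonal_var1(phi, sigma, n_steps, rng):
    """Simulate y_t = diag(phi) y_{t-1} + eps_t with eps_t ~ N(0, sigma)."""
    dim = len(phi)
    chol = np.linalg.cholesky(sigma)  # sigma must be positive definite
    y = np.zeros((n_steps, dim))
    for t in range(1, n_steps):
        y[t] = phi * y[t - 1] + chol @ rng.standard_normal(dim)
    return y


def stepwise_fit(y):
    """Two-step fit: univariate margins first, then cross-covariances."""
    prev, curr = y[:-1], y[1:]
    # Step 1: treat each series separately. The conditional Gaussian MLE of
    # an AR(1) coefficient is the least-squares slope of y_t on y_{t-1}.
    phi_hat = (prev * curr).sum(axis=0) / (prev ** 2).sum(axis=0)
    # Step 2: hold phi_hat fixed and estimate the innovation covariance
    # from the one-step-ahead residuals.
    resid = curr - phi_hat * prev
    sigma_hat = resid.T @ resid / len(resid)
    return phi_hat, sigma_hat


# Illustrative parameter values (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
phi_true = np.array([0.8, -0.3, 0.5])
sigma_true = np.array([[1.0, 0.4, 0.2],
                       [0.4, 1.0, 0.3],
                       [0.2, 0.3, 1.0]])
y = simulate_diagonal_var1(phi_true, sigma_true, 20000, rng)
phi_hat, sigma_hat = stepwise_fit(y)
print(phi_hat)
print(sigma_hat)
```

In this toy setting both steps have closed forms, so the cost grows linearly with the number of observations. The joint MLE would instead estimate the coefficients and the covariance together, which is generally more efficient statistically but more expensive computationally.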