This thesis studies lasso-based and SCAD-based penalties and their applications to genetic research and economic forecasting. Motivated by functional genome-wide association studies (fGWAS), a procedure that penalizes both the linear regression coefficients and the inverse covariance matrix is proposed. Motivated by macroeconomic variable forecasting, the differences in forecasting performance between lasso regression and SCAD regression are examined, and the improvements in forecast accuracy resulting from the inclusion of group structure, residual bootstrap and forecast combination are investigated.
The first proposed procedure applies lasso-based penalties to both the coefficients and the inverse covariance matrix estimate in nonparametric varying-coefficient models for fGWAS, in order to select SNPs that are significantly associated with a phenotypic trait of interest based on a limited number of measurements. The genetic effects of the SNPs are time-varying and the phenotypic trait is measured repeatedly. The procedure provides satisfactory variable selection results in simulation, facilitates model interpretation and enhances variable selection power through sparse inverse covariance matrix estimation.
The rest of the thesis concerns penalized linear regression in macroeconomic forecasting. Lasso and SCAD regressions are first examined as two alternative forecasting methods. Based on comparisons in simulation and real examples, the SCAD penalty is recommended over the lasso penalty for macroeconomic data, because such data are grouped and the risk of model misspecification has to be considered. Following this recommendation, the SCAD penalty is further extended to group SCAD regression, SCAD regression with residual bootstrap and group SCAD regression with residual bootstrap. The group SCAD penalty provides more consistent variable selection results and enhances model interpretability. Residual bootstrap increases model selection stability. Group SCAD regression with residual bootstrap and SCAD regression significantly improve forecast accuracy for most macroeconomic variables. Finally, the combination of forecasts from SCAD-related models and the dynamic factor model is studied. The combined forecast shows advantages over the individual forecasts in out-of-sample forecasting.