Applications of L1 regularisation

Authors

  • M. R. Osborne Mathematical Sciences Institute, Australian National University, ACT 0200
  • Tania Prvan Department of Statistics, Macquarie University, NSW 2109

DOI:

https://doi.org/10.21914/anziamj.v52i0.3805

Keywords:

variable selection, l1 regularisation

Abstract

The lasso algorithm for variable selection in linear models, introduced by Tibshirani, works by imposing an $l_1$~norm bound constraint on the variables in a least squares model and then tuning the model estimation calculation using this bound. The introduction of the bound can be interpreted as a regularisation step. It leads to a quadratic program which, for each value of the bound, is solved by a straightforward modification of a standard active set algorithm. Considerable interest was generated by the discovery that the complete solution trajectory parametrised by this bound is piecewise linear and can be calculated very efficiently: essentially it takes no more work than the solution of either the unconstrained least squares problem or the quadratic program at a single bound value. This has resulted in the study both of the selection problem for different objective and constraint choices and of applications to such areas as data compression and the generation of sparse solutions of very underdetermined systems. One important class of generalisations is to quantile regression estimation problems. The original continuation idea extends to these polyhedral objectives in an interesting two phase procedure which involves both the constrained and Lagrangian forms of the problem at each step; however, it is significantly less computationally effective than the original algorithm for least squares objectives. In contrast, the piecewise linear estimation problem can be solved for each value of the $l_1$~bound by a relatively efficient simplicial descent algorithm, and this can be used to explore trajectory information in a manner that is at least competitive with the homotopy algorithm in this context. The form of line search used in the descent steps has an important bearing on the effectiveness of the algorithm. A comparison is given of the relative performance of the simplicial descent algorithm and an interior point method on the piecewise linear estimation problem.
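To fix notation for the problems described above, the following display is a minimal sketch in standard notation (the symbols $X$, $y$, $\beta$, $t$, $\lambda$, $\tau$ and $\rho_\tau$ are not taken from the paper): the constrained and Lagrangian forms of the lasso, and the $l_1$-constrained quantile regression problem addressed by the polyhedral generalisation.

\[
\begin{aligned}
&\text{lasso (constrained form):} && \min_{\beta}\ \tfrac12\,\|y - X\beta\|_2^2 \ \ \text{subject to}\ \|\beta\|_1 \le t,\\
&\text{lasso (Lagrangian form):} && \min_{\beta}\ \tfrac12\,\|y - X\beta\|_2^2 + \lambda\,\|\beta\|_1,\\
&\text{quantile regression lasso:} && \min_{\beta}\ \sum_{i=1}^{n} \rho_\tau\!\left(y_i - x_i^{\mathsf T}\beta\right) \ \ \text{subject to}\ \|\beta\|_1 \le t,
\quad \rho_\tau(r) = r\,\bigl(\tau - \mathbf{1}(r<0)\bigr).
\end{aligned}
\]

For a fixed bound $t$ the quantile regression lasso is a linear program, which is why both simplicial (simplex-type) descent and interior point methods apply to it. The sketch below is not the simplicial descent algorithm discussed in the abstract; it simply solves the linear program at a single bound value with scipy.optimize.linprog, using the standard splitting $\beta = \beta^+ - \beta^-$ and $y - X\beta = u - v$ to remove the absolute values. The function name and the synthetic data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_lasso_lp(X, y, t, tau=0.5):
    """Solve min_beta sum_i rho_tau(y_i - x_i'beta) subject to ||beta||_1 <= t
    as a linear program (an illustrative sketch, not the paper's algorithm)."""
    n, p = X.shape
    # Decision vector z = [beta_plus (p), beta_minus (p), u (n), v (n)], all >= 0,
    # with beta = beta_plus - beta_minus and residual y - X beta = u - v.
    c = np.concatenate([np.zeros(2 * p), tau * np.ones(n), (1.0 - tau) * np.ones(n)])
    # Equality constraints: X beta_plus - X beta_minus + u - v = y.
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    b_eq = y
    # l1 bound: sum(beta_plus + beta_minus) <= t.
    A_ub = np.concatenate([np.ones(2 * p), np.zeros(2 * n)]).reshape(1, -1)
    b_ub = np.array([t])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    z = res.x
    return z[:p] - z[p:2 * p]

# Solving at a few bound values gives snapshots of the coefficient trajectory
# that the homotopy and two-phase procedures described above track as a whole.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = X @ np.array([2.0, 0.0, -1.5, 0.0, 0.0]) + 0.1 * rng.standard_normal(50)
for t in (0.5, 1.0, 2.0, 4.0):
    print(t, np.round(quantile_lasso_lp(X, y, t), 3))
```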
References

  • I. Barrodale and F. D. K. Roberts. An improved algorithm for $l_1$ linear approximation. SIAM J. Numer. Anal., 10:839--848, 1973.
  • P. Bloomfield and W. L. Steiger. Least Absolute Deviations. Birkhäuser, Boston, 1983.
  • H. D. Bondell and B. J. Reich. Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with {OSCAR}. Biometrics, 64:115--123, 2008. doi:10.1111/j.1541-0420.2007.00843.x
  • E. Candes and T. Tao. The Dantzig selector: statistical estimation when $p$ is much larger than $n$. Ann. Statist., 35(6):2313--2351, 2007. doi:10.1214/009053606000001523
  • D. L. Donoho and Y. Tsaig. Fast solution of $l_1$-norm minimization problems when the solution may be sparse. IEEE Trans. Inf. Theory, 54:4789--4812, 2008. doi:10.1109/TIT.2008.929958
  • R. Koenker. Quantile Regression. Cambridge University Press, 2005.
  • Y. Li and J. Zhu. {$L_1$}-norm quantile regression. J. Comp. Graph. Stat., 17(1):163--185, 2008. doi:10.1198/106186008X289155
  • M. R. Osborne. Simplicial algorithms for minimizing polyhedral functions. Cambridge University Press, 2001.
  • M. R. Osborne, Brett Presnell, and B. A. Turlach. On the Lasso and its dual. J. Comp. Graph. Stat., 9(2):319--337, 2000. http://www.jstor.org/stable/1390657
  • M. R. Osborne and B. A. Turlach. A homotopy algorithm for the quantile regression lasso and related piecewise linear problems. J. Comp. Graph. Stat., 2010. Accepted for publication. doi:10.1198/jcgs.2011.09184
  • S. Portnoy and R. Koenker. The Gaussian hare and the Laplacian tortoise: computability of squared error vs absolute error estimates. Stat. Sci., 12:279--300, 1997. http://www.jstor.org/stable/2246216
  • S. Rosset and J. Zhu. Piecewise linear regularised solution paths. Ann. Stat., 35(3):1012--1030, 2007. doi:10.1214/009053606000001370
  • R. Tibshirani. Regression shrinkage and selection via the lasso. J. R. Stat. Soc., Ser. B, 58(1):267--288, 1996. http://www.jstor.org/stable/2346178
  • B. A. Turlach, W. N. Venables, and S. J. Wright. Simultaneous variable selection. Technometrics, 47(3):349--363, 2005. doi:10.1198/004017005000000139
  • Y. Yao and Y. Lee. Another look at linear programming for feature selection via methods of regularization. Technical Report 800, Department of Statistics, Ohio State University, 2007.
  • M. Yuan and H. Zou. Efficient global approximation of generalised nonlinear {$\ell _1$} regularised solution paths and its applications. J. Am. Stat. Assoc., 104:1562--1573, 2009. doi:10.1198/jasa.2009.tm08287
  • J. Zhu, T. Hastie, S. Rosset, and R. Tibshirani. $l_1$-norm support vector machines. Adv. Neural Inf. Process. Syst., 16:49--56, 2004. http://books.nips.cc/papers/files/nips16/NIPS2003_AA07.pdf
  • H. Zou and T. Hastie. Regularization and variable selection via the elastic net. J. R. Stat. Soc., Ser. B, 67:301--320, 2005. http://www.jstor.org/stable/3647580
  • H. Zou and M. Yuan. Regularised simultaneous model selection in multiple quantiles regression. Comp. Stat. Data Anal., 52:5296--5304, 2008. doi:10.1016/j.csda.2008.05.013

Published

2011-10-17

Issue

Vol. 52

Section

Proceedings Computational Techniques and Applications Conference