Reduced-Rank Modeling for High-Dimensional Model-Based Clustering

Authors

  • Lei Yang, Department of Environmental Medicine, New York University, New York, NY, USA
  • Junhui Wang, Department of Mathematics, City University of Hong Kong, Kowloon Tong, Hong Kong
  • Shiqian Ma, Department of Mathematics, University of California, Davis, CA 95616, USA

DOI:

https://doi.org/10.4208/jcm.1708-m2016-0830

Keywords:

Clustering, Gaussian mixture model, Group Lasso, ADMM, Reduced-rank model.

Abstract

Model-based clustering is widely used in the statistical literature and typically models the data with a Gaussian mixture model. As a consequence, it requires estimating a large number of parameters, especially when the data dimension is relatively large. In this paper, a reduced-rank model and group-sparsity regularization are incorporated into model-based clustering; together they substantially reduce the number of parameters and thus facilitate high-dimensional clustering and variable selection simultaneously. We propose an EM algorithm for this task, in which the M-step is solved using alternating minimization. One of the alternating steps involves both a nonsmooth function and a nonconvex constraint, and thus we propose a linearized alternating direction method of multipliers (ADMM) for solving it. This leads to an efficient algorithm whose subproblems are all easy to solve. In addition, a model selection criterion based on the concept of clustering stability is developed for tuning the clustering model. The effectiveness of the proposed method is supported by a variety of simulated and real examples, as well as by its asymptotic estimation and selection consistencies.
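
To illustrate why group-sparsity regularization can yield subproblems that are "easy to solve", the sketch below implements row-wise group soft-thresholding, the standard closed-form proximal operator of a group Lasso penalty. It is not taken from the paper: the function name `group_soft_threshold`, the layout with one row of cluster means per variable, and the threshold `lam` are assumptions made for illustration only.

```python
import math


def group_soft_threshold(rows, lam):
    """Proximal operator of lam * sum_j ||rows[j]||_2, applied row by row."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        # A row whose Euclidean norm is at most lam is set to zero as a whole;
        # any other row is shrunk toward zero by the common factor 1 - lam/norm.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out


# Hypothetical matrix of cluster means: 3 variables (rows) by 2 clusters (columns).
means = [[3.0, 4.0], [0.3, -0.4], [0.0, 0.0]]
shrunk = group_soft_threshold(means, lam=1.0)

# Variables whose whole row survives the shrinkage count as "selected".
selected = [j for j, row in enumerate(shrunk) if any(v != 0.0 for v in row)]
```

Because a variable's entire row is either kept or zeroed at once, a penalty of this kind removes uninformative variables from all clusters together, which is one common way clustering and variable selection can be coupled.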

Published

2018-09-17

Section

Articles