Text
Detecting Meaningful Clusters From High-Dimensional Data: A Strongly Consistent Sparse Center-Based Clustering Approach
In the context of high-dimensional clustering, feature weighting has gained considerable importance over the years as a way to capture the relative importance of different features in revealing the cluster structure of a dataset. However, the popular techniques in this area either fail to perform feature selection or do not preserve the simplicity of Lloyd's heuristic for solving k-means and related problems. In this paper, we propose the Lasso Weighted k-means (LW-k-means) algorithm, a simple yet efficient sparse clustering procedure for high-dimensional data in which the number of features (p) can be much larger than the number of observations (n). The LW-k-means method imposes an ℓ1 regularization term directly on the feature weights to induce feature selection in a sparse clustering framework. We develop a simple block-coordinate descent algorithm, with time complexity resembling that of Lloyd's method, to optimize the proposed objective. In addition, we establish the strong consistency of the LW-k-means procedure. Such an analysis of large-sample properties is generally not available for conventional sparse k-means algorithms. LW-k-means is tested on a number of synthetic and real-life datasets. A detailed experimental analysis shows that the method is highly competitive against the baselines as well as the state-of-the-art procedures for center-based high-dimensional clustering, in terms of both clustering accuracy and computational time.
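The abstract does not state the LW-k-means objective, so the following Python sketch only illustrates the general idea of a block-coordinate descent that alternates Lloyd-style assignment and center updates with a closed-form, lasso-thresholded weight update. The objective used here is an assumption made for illustration, not the paper's formulation: minimize, over assignments, centers and weights w_l ≥ 0, the sum over features of (w_l² + (λ/p)·w_l)·D_l − α·w_l, where D_l is the average within-cluster squared deviation of feature l. The function names, the parameters `lam` and `alpha`, and the deterministic farthest-point seeding are likewise hypothetical choices.

```python
import numpy as np


def farthest_point_init(X, k):
    """Deterministic seeding (an illustrative choice): start at X[0], then
    repeatedly add the point farthest from the centers chosen so far."""
    centers = [X[0]]
    d = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        idx = int(d.argmax())
        centers.append(X[idx])
        d = np.minimum(d, ((X - X[idx]) ** 2).sum(axis=1))
    return np.array(centers, dtype=float)  # copies the selected rows


def lasso_weighted_kmeans(X, k, lam, alpha=1.0, n_iter=100):
    """Sketch of a sparse feature-weighted k-means under the assumed objective
    sum_l [(w_l**2 + (lam/p) * w_l) * D_l - alpha * w_l],  w_l >= 0."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    centers = farthest_point_init(X, k)
    w = np.full(p, 1.0 / p)  # start from uniform feature weights
    labels = None
    for _ in range(n_iter):
        # Block 1: assign each point to the nearest center under the
        # per-feature cost implied by the current weights.
        cost = w ** 2 + (lam / p) * w
        dist = (((X[:, None, :] - centers[None, :, :]) ** 2) * cost).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        same_labels = labels is not None and np.array_equal(new_labels, labels)
        labels = new_labels

        # Block 2: cluster means minimize the objective for fixed labels
        # and weights; an empty cluster keeps its previous center.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)

        # Block 3: closed-form weight update. Setting the derivative
        # 2*w*D + (lam/p)*D - alpha to zero and clipping at zero gives a
        # soft-thresholding rule: features whose dispersion D_l exceeds
        # alpha * p / lam receive weight exactly zero.
        D = np.maximum(((X - centers[labels]) ** 2).mean(axis=0), 1e-12)
        new_w = np.maximum(alpha / D - lam / p, 0.0) / 2.0
        if not new_w.any():
            break  # penalty removed every feature; keep the last usable weights
        same_weights = np.allclose(new_w, w)
        w = new_w
        if same_labels and same_weights:
            break
    return labels, centers, w
```

Each pass costs O(n·k·p), the same order as one Lloyd iteration, which matches the abstract's remark about time complexity. Because the thresholding rule in this sketch depends on the raw within-cluster dispersion, it assumes features on comparable scales, and `lam` must be tuned so that the threshold `alpha * p / lam` falls between the dispersion of informative and uninformative features.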
Barcode | Collection Type | Call Number | Location | Status |
---|---|---|---|---|
art142912 | null | Article | Gdg9-Lt3 | Available but not for loan (No Loan) |
No other versions available