Skip to main content
Log in

Strong consistency of factorial \(K\)-means clustering

  • Published:
Annals of the Institute of Statistical Mathematics Aims and scope Submit manuscript

Abstract

Factorial \(k\)-means (FKM) clustering is a method for clustering objects in a low-dimensional subspace. The advantage of this method is that the partition of objects and the low-dimensional subspace reflecting the cluster structure are obtained, simultaneously. In some cases that reduced \(k\)-means (RKM) clustering does not work well, FKM clustering can discover the cluster structure underlying a lower dimensional subspace. Conditions that ensure the almost sure convergence of the estimator of FKM clustering as the sample size increases unboundedly are derived. The result is proved for a more general model including FKM clustering. Moreover, it is also shown that there exist some cases in which RKM clustering becomes equivalent to FKM clustering as the sample size goes to infinity.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

Explore related subjects

Discover the latest articles and news from researchers in related subjects, suggested using machine learning.

References

  • Akama, Y., Irie, K., Kawamura, A., Uwano, Y. (2010). VC dimensions of principal component analysis. Discrete and Computational Geometry, 44(3), 589–598.

    Google Scholar 

  • Arabie, P., Hubert, L. (1994). Cluster Analysis in Marketing Research. In R. P. Bagozzi (Ed.), Advanced methods of marketing research (pp. 160–189). Oxford: Blackwell.

  • Chang, W. (1983). On using principal components before separating a mixture of two multivariate normal distributions. Applied Statistics, 32(3), 267–275.

    Article  MATH  MathSciNet  Google Scholar 

  • De Soete, G., Carroll, J. D. (1994). \(K\)-means clustering in a low-dimensional Euclidean space. In E. Diday, Y. Lechevallier, M. Schader, P. Bertrand, & B. Burtschy (Eds.), New approaches in classification and data analysis (pp. 212–219). Berlin: Springer.

  • Dehardt, J. (1971). Generalizations of the Glivenko-Cantelli theorem. Annals of Mathematical Statistics, 42(6), 2050–2055.

    Article  MATH  MathSciNet  Google Scholar 

  • Devroye, L., Györfi, L., Lugosi, G. (1996).  A probabilistic theory of pattern recognition. New York: Springer.

  • Peskir, G. (2000). From uniform laws of large numbers to uniform ergodic theorems. Lecture notes series. 66, University of Aarhus, Department of Mathematics.

  • Pfanzagl, J. (1994). Parametric statistical theory. Berlin: de Gruyter.

  • Pollard, D. (1981). Strong consistency of \(k\)-means clustering. Annals of Statistics, 9(1), 135–140.

    Article  MATH  MathSciNet  Google Scholar 

  • Pollard, D. (1982). Quantization and the method of \(k\)-means. IEEE Transactions of Information Theory, 28(2), 199–205.

    Article  MATH  MathSciNet  Google Scholar 

  • Terada, Y. (2014). Strong consistency of reduced \(k\)-means clustering. To appear in Scandinavian Journal of Statistics.

  • Timmerman, M. E., Ceulemans, E., Kiers, H. A. L., Vichi, M. (2010). Factorial and reduced \(K\)-means reconsidered. Computational Statistics and Data Analysis, 54(7), 1858–1871.

    Google Scholar 

  • Vichi, M., Kiers, H. A. L. (2001). Factorial \(k\)-means analysis for two-way data. Computational Statistics and Data Analysis, 37(1), 49–64.

    Google Scholar 

Download references

Acknowledgments

The author wishes to express his thanks to Dr. Michio Yamamoto for his helpful comments and discussions related to Example 2. The author gratefully acknowledges the helpful comments and suggestions of the two anonymous reviewers. Moreover, the author also thank Professor Yutaka Kano for his supervision of this research. This work was supported by Grant-in-Aid for JSPS Fellows Number \(24\cdot 2466\).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yoshikazu Terada.

Appendix: Existence of \(\Theta ^{\prime }\)

Appendix: Existence of \(\Theta ^{\prime }\)

Here, we prove the existence of population global optimizers.

Lemma 5

Suppose that \(\int \psi (\Vert \varvec{x}\Vert )P(d\varvec{x})<\infty \). There exists \(M>0\) such that

$$\begin{aligned} \inf _{A\in \mathcal {O}(p\times q)}\Psi (F^{\prime },A,P) > \inf _{\theta \in \Theta _k^*(M)}\Psi (\theta ,P) \end{aligned}$$

for all \(F^{\prime }\in \mathcal {R}_k\) satisfying \(F^{\prime }\cap B_q(M)=\emptyset \).

Proof

Conversely, suppose that, for all \(M>0\), there exists \(F^{\prime }\in \mathcal {R}_k\) such that \(F^{\prime }\cap B_q(M)=\emptyset \) and

$$\begin{aligned} \inf _{A\in \mathcal {O}(p\times q)}\Psi (F^{\prime },A,P) \le \inf _{\theta \in \Theta _k^*(M)}\Psi (\theta ,P). \end{aligned}$$

Choose \(r>0\) to satisfy that the ball \(B_p(r)\) has a positive \(P\) measure; that is \(P(B_p(r))>0\). Let \(M\) be sufficiently large such that \(M>r\) and that it satisfies inequality (3). Since \(\Vert A^T\varvec{x}-\varvec{f}\Vert \ge \Vert \varvec{f}\Vert -\Vert A^{T}\varvec{x}\Vert >M-r\) for all \(\varvec{f}\notin B_{q}(M)\) and all \(\varvec{x}\in B_p(r)\), we have

$$\begin{aligned} \int \psi (\Vert \varvec{x}\Vert )P(\varvec{x})&\ge \inf _{\theta \in \Theta _k^*(M)}\Psi (\theta ,P) \ge \inf _{A\in \mathcal {O}(p\times q)}\Psi (F^{\prime },A,P)\\&\ge \inf _{A\in \mathcal {O}(p\times q)}\int \nolimits _{\varvec{x}\in B_p(r)}\min _{\varvec{f}\in F^{\prime }}\psi \big (\big \Vert A^T\varvec{x}-\varvec{f}\big \Vert \big )P(\text {d}\varvec{x})\\&\ge \phi (M-r)P(B_p(r)). \end{aligned}$$

This is a contradiction. \(\square \)

Lemma 6

Suppose that \(\int \psi (\Vert \varvec{x}\Vert )P(d\varvec{x})<\infty \), and for \(j=2,3,\dots ,k-1\), \(m_j(P) > m_{k}(P)\). There exists \(M>0\) such that, for all \(F^{\prime }\in \mathcal {R}_k\) satisfying \(F^{\prime }\not \subset B_q(5M)\),

$$\begin{aligned} \inf _{A\in \mathcal {O}(p\times q)}\Psi (F^{\prime },A,P) > \inf _{\theta \in \Theta _k^*(5M)}\Psi (\theta ,P). \end{aligned}$$

Proof

Choose \(M>0\) to be sufficiently large to satisfy inequalities (3) and (4). Suppose that, for all \(M>0\), there exists \(F^{\prime }\in \mathcal {R}_k\) satisfying \(F^{\prime }\not \subset B_q(5M)\) and

$$\begin{aligned} \inf _{A\in \mathcal {O}(p\times q)}\Psi (F^{\prime },A,P) \le \inf _{\theta \in \Theta _k^*(5M)}\Psi (\theta ,P). \end{aligned}$$

Let \(\mathcal {R}_k^{\prime }\) be the set of such \(F^{\prime }\) and then

$$\begin{aligned} m_{k}(P)=\inf _{\theta \in \mathcal {R}_k^{\prime }\times \mathcal {O}(p\times q)}\Psi (\theta ,P). \end{aligned}$$

According to Lemma 5, each \(F^{\prime }\in \mathcal {R}_k^{\prime }\) includes at least one point on \(B_{q}(M)\), say \(\varvec{f}_1\). For all \(\varvec{x}\) satisfying \(\Vert \varvec{x}\Vert <2M\) and all \(A\in \mathcal {O}(p\times q)\), we obtain

$$\begin{aligned} \big \Vert A^T\varvec{x}-\varvec{f}\big \Vert >3M \quad \text {for all } \varvec{f}\not \in B_q(5M) \end{aligned}$$

and

$$\begin{aligned} \big \Vert A^T\varvec{x}-\varvec{g}\big \Vert <3M \quad \text {for all } \varvec{g}\in B_q(M). \end{aligned}$$

Thus,

$$\begin{aligned} \int \nolimits _{\Vert \varvec{x}\Vert < 2M} \min _{\varvec{f}\in F^{\prime }}\psi \big (\big \Vert A^T\varvec{x}-\varvec{f}\big \Vert \big )P(d\varvec{x}) = \int \nolimits _{\Vert \varvec{x}\Vert < 2M} \min _{\varvec{f}\in F^{*}}\psi \big (\big \Vert A^T\varvec{x}-\varvec{f}\big \Vert \big )P(d\varvec{x}), \end{aligned}$$

where the set \(F^{*}\) is obtained by deleting all points outside \(B_q(5M)\) from \(F^{\prime }\). Since \( \int _{\Vert \varvec{x}\Vert \ge 2M}\psi (\Vert A^T\varvec{x}-\varvec{f}_1\Vert )P(d\varvec{x}) \le \lambda \int _{\Vert \varvec{x}\Vert \ge 2M}\psi (\Vert \varvec{x}\Vert )P(d\varvec{x}) \), we obtain that

$$\begin{aligned}&\Psi (F_{k}^{\prime },A,P) + \lambda \int \nolimits _{\Vert \varvec{x}\Vert \ge 2M}\psi (\Vert \varvec{x}\Vert )P(\text {d}\varvec{x})\\&\ge \int \nolimits _{\Vert \varvec{x}\Vert < 2M} \min _{\varvec{f}\in F^{*}}\psi \big (\big \Vert A^T\varvec{x}-\varvec{f}\big \Vert \big )P(\text {d}\varvec{x}) + \int \nolimits _{\Vert \varvec{x}\Vert \ge 2M}\psi \big (\big \Vert A^T\varvec{x}-\varvec{f}_1\big \Vert \big )P(\text {d}\varvec{x})\\&\ge \Psi (F^{*},A,P) \ge m_{k-1}(P) \end{aligned}$$

for all \(A\in \mathcal {O}(p\times q)\). It follows that \(m_k(P)+\epsilon \le m_{k-1}(P)\), which is a contradiction. \(\square \)

Let us consider \(M>0\) to be sufficiently large to satisfy inequalities (3) and (4). Write \(\Theta _k:=\mathcal {R}_k^*(5M)\times \mathcal {O}(p\times q)\). Proposition 1 and Corollary 1 can be proved in the same way as Proposition 1 and Corollary 1 in Terada (2014).

Proof of Proposition 1

According to Lemma 6,

$$\begin{aligned} \inf _{\theta \in \Xi _k}\Psi (\theta ,P)=\inf _{\theta \in \Theta _k}\Psi (\theta ,P). \end{aligned}$$

Moreover, for any \(\theta \in (\mathcal {R}_k\setminus \mathcal {R}_k^*(5M))\times \mathcal {O}(p\times q)\), \(m_k(P)<\Psi (\theta ,P)\). Thus, we only have to prove \(\Theta ^{\prime }\ne \emptyset \).

Let \(C:=\{\Psi (\theta ,P)\mid \theta \in \Theta _k\}\) and then \(m_k(P)=\inf C\). By the definition of the infimum, for all \(x > m_k(P)\), there exists \(c\in C\) such that \(c<x\). By the axiom of choice, we can obtain a sequence \(\{c_n\}_{n\in \mathbb {N}}\) such that \(c_n \rightarrow m_k(P)\) as \(n\rightarrow \infty \). Using the axiom of choice again, we can obtain a sequence \(\{\theta _{n}\}_{n\in \mathbb {N}}\) such that \(\Psi (\theta _n,P)\rightarrow m_k(P)\) as \(n\rightarrow \infty \).

By the compactness of \(\Theta _k\), there exists a convergent subsequence of \(\{\theta _n\}_{n\in \mathbb {N}}\), say \(\{\theta _{n_i}\}_{i\in \mathbb {N}}\). Let \(\theta _{*}\in \Theta _k\) denote the limit of subsequence \(\{\theta _{n_i}\}_{i\in \mathbb {N}}\), i.e., \(\theta _{m_i}\rightarrow \theta _*\) as \(i\rightarrow \infty \). Since \(\Psi (\cdot ,P)\) is continuous on \(\Theta _{k}\), \(\Psi (\theta _*,P)=m_k(P)\). Hence, we obtain \(\Theta ^{\prime }\ne \emptyset \). \(\square \)

Proof of Corollary 1

Let \(\Theta _\epsilon :=\{\theta _k\in \Theta _k \mid \Psi (\theta _k,P)=m_k(P)\}\). Conversely, suppose that there exists \(\epsilon >0\) such that \(\inf _{\theta \in \Theta _{\epsilon }}\Psi (\theta ,P)=\inf _{\theta \in \Theta ^{\prime }}\Psi (\theta ,P)\). By the definition of the infimum, there exists a sequence \(\{\theta _n\}_{n\in \mathbb {N}}\) on \(\Theta _{\epsilon }\) such that \(\Psi (\theta _n,P)\rightarrow m_k(P)\) as \(n\rightarrow \infty \). By compactness of \(\Theta _k\), there exists a convergent subsequence of \(\{\theta _n\}_{n\in \mathbb {N}}\), say \(\{\theta _{m_i}\}_{i \in \mathbb {N}}\). Let \(\theta _{*}\in \Theta _k\) denote the limit of subsequence \(\{\theta _{m_i}\}_{i\in \mathbb {N}}\). Since \(\theta _{m_i}\rightarrow \theta _*\) as \(i\rightarrow \infty \), we have \(d(\theta _{m_i},\theta _*)<\epsilon \) for a sufficiently large \(i\), which is a contradiction. \(\square \)

About this article

Cite this article

Terada, Y. Strong consistency of factorial \(K\)-means clustering. Ann Inst Stat Math 67, 335–357 (2015). https://doi.org/10.1007/s10463-014-0454-0

Download citation

  • Received:

  • Revised:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10463-014-0454-0

Keywords