A Constant-Factor Approximation Algorithm for Co-clustering

by Aris Anagnostopoulos, Anirban Dasgupta, and Ravi Kumar

Theory of Computing, Volume 8(26), pp. 597-622, 2012

Bibliography with links to cited articles

[1]   Deepak Agarwal and Srujana Merugu: Predictive discrete latent factor models for large scale dyadic data. In Proc. 13th Internat. Conf. on Knowledge Discovery and Data Mining (KDD’07), pp. 26–35, 2007. [doi:10.1145/1281192.1281199]

[2]   Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preyas Popat: NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75(2):245–248, 2009. [doi:10.1007/s10994-009-5103-0]

[3]   Arindam Banerjee, Inderjit S. Dhillon, Joydeep Ghosh, Srujana Merugu, and Dharmendra S. Modha: A generalized maximum entropy approach to Bregman co-clustering and matrix approximation. Journal of Machine Learning Research, 8:1919–1986, 2007. Preliminary version in KDD’04. [ACM:1314498.1314563]

[4]   Nikhil Bansal, Avrim Blum, and Shuchi Chawla: Correlation clustering. Machine Learning, 56(1-3):89–113, 2004. Preliminary version in FOCS’02. [doi:10.1023/B:MACH.0000033116.57574.95]

[5]   Peter Brucker: On the complexity of clustering problems. In Optimization and Operations Research: Proc. of a Workshop held at the University of Bonn, pp. 45–54. Springer, 1977.

[6]   Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys: A constant-factor approximation algorithm for the k-median problem. J. Comput. System Sci., 65(1):129–149, 2002. Preliminary version in STOC’99. [doi:10.1006/jcss.2002.1882]

[7]   Yizong Cheng and George M. Church: Biclustering of expression data. In Proc. 8th Internat. Conf. on Intelligent Systems for Molecular Biology (ISMB’00), pp. 93–103. AAAI Press, 2000. [ACM:645635.660833]

[8]   Hyuk Cho, Inderjit S. Dhillon, Yuqiang Guan, and Suvrit Sra: Minimum sum-squared residue co-clustering of gene expression data. In Proc. 4th SIAM Internat. Conf. on Data Mining (SDM’04), pp. 114–125. SIAM, 2004. Accessible at http://siam.org/proceedings/datamining/2004/dm04_011choh.pdf.

[9]   Sanjoy Dasgupta: The hardness of k-means clustering. Technical Report CS2008-0916, University of California, San Diego, 2008. Accessible at http://csetechrep.ucsd.edu/Dienst/UI/2.0/Describe/ncstrl.ucsd_cse/CS2008-0916.

[10]   Inderjit S. Dhillon: Co-clustering documents and words using bipartite spectral graph partitioning. In Proc. 7th Internat. Conf. on Knowledge Discovery and Data Mining (KDD’01), pp. 269–274. ACM Press, 2001. [doi:10.1145/502512.502550]

[11]   Petros Drineas, Alan M. Frieze, Ravi Kannan, Santosh Vempala, and V. Vinay: Clustering large graphs via the singular value decomposition. Machine Learning, 56(1–3):9–33, 2004. Preliminary version in SODA’99. [doi:10.1023/B:MACH.0000033113.59016.96]

[12]   Richard O. Duda, Peter E. Hart, and David G. Stork: Pattern Classification. Wiley Interscience, 2nd edition, 2000.

[13]   Uriel Feige and Shimon Kogan: Hardness of approximation of the balanced complete bipartite subgraph problem. Technical Report MCS04-04, Department of Computer Science and Applied Math., The Weizmann Institute of Science, 2004. Accessible at http://www.wisdom.weizmann.ac.il/math/reports/years/2004.shtml.

[14]   Robert J. Fowler, Mike Paterson, and Steven L. Tanimoto: Optimal packing and covering in the plane are NP-complete. Inform. Process. Lett., 12(3):133–137, 1981. [doi:10.1016/0020-0190(81)90111-3]

[15]   Bin Gao, Tie-Yan Liu, Xin Zheng, Qian-Sheng Cheng, and Wei-Ying Ma: Consistent bipartite graph co-partitioning for star-structured high-order heterogeneous data co-clustering. In Proc. 11th Internat. Conf. on Knowledge Discovery and Data Mining (KDD’05), pp. 41–50, 2005. [doi:10.1145/1081870.1081879]

[16]   Sreenivas Gollapudi, Ravi Kumar, and D. Sivakumar: Programmable clustering. In Proc. 25th Symp. on Principles of Database Systems (PODS’06), pp. 348–354, 2006. [doi:10.1145/1142351.1142400, ACM:1142351.1142400]

[17]   John A. Hartigan: Direct clustering of a data matrix. J. Amer. Stat. Assoc., 67(337):123–129, 1972. [doi:10.1080/01621459.1972.10481214]

[18]   Saeed Hassanpour: Computational complexity of bi-clustering. Master’s thesis, University of Waterloo, 2007. Accessible at http://uwspace.uwaterloo.ca/bitstream/10012/3359/1/Thesis.pdf.

[19]   Anil K. Jain, M. Narasimha Murty, and Patrick J. Flynn: Data clustering: A review. ACM Comput. Surv., 31(3):264–323, 1999. [doi:10.1145/331499.331504]

[20]   Michel Jambu and Marie-Odile Lebeaux: Cluster Analysis and Data Analysis. Elsevier Science Ltd., 1983.

[21]   Stefanie Jegelka, Suvrit Sra, and Arindam Banerjee: Approximation algorithms for tensor clustering. In Proc. 20th Internat. Conf. on Algorithmic Learning Theory (ALT’09), pp. 368–383. Springer, 2009. [doi:10.1007/978-3-642-04414-4_30]

[22]   Jon Kleinberg: An impossibility theorem for clustering. In Proc. Advances in Neural Information Processing Systems (NIPS’02), pp. 446–453. MIT Press, 2002.

[23]   Yuval Kluger, Ronen Basri, Joseph T. Chang, and Mark Gerstein: Spectral biclustering of microarray data: Coclustering genes and conditions. Genome Research, 13:703–716, 2003. [doi:10.1101/gr.648603]

[24]   Amit Kumar, Yogish Sabharwal, and Sandeep Sen: A simple linear time (1 + ϵ)-approximation algorithm for k-means clustering in any dimensions. In Proc. 45th FOCS, pp. 454–462. IEEE Comp. Soc. Press, 2004. [doi:10.1109/FOCS.2004.7]

[25]   Sara C. Madeira and Arlindo L. Oliveira: Biclustering algorithms for biological data analysis: A survey. IEEE/ACM Trans. Comput. Biology Bioinform., 1(1):24–45, 2004. [doi:10.1109/TCBB.2004.2]

[26]   Meena Mahajan, Prajakta Nimbhorkar, and Kasturi Varadarajan: The planar k-means problem is NP-hard. Theoret. Comput. Sci., 442:13–21, 2012. Preliminary version in WALCOM’09. [doi:10.1016/j.tcs.2010.05.034]

[27]   Nimrod Megiddo and Kenneth J. Supowit: On the complexity of some common geometric location problems. SIAM J. Comput., 13(1):182–196, 1984. [doi:10.1137/0213014]

[28]   Nina Mishra, Dana Ron, and Ram Swaminathan: A new conceptual clustering framework. Machine Learning, 56(1-3):115–151, 2004. Preliminary version in COLT’03. [doi:10.1023/B:MACH.0000033117.77257.41]

[29]   René Peeters: The maximum edge biclique problem is NP-complete. Discr. Appl. Math., 131(3):651–654, 2003. [doi:10.1016/S0166-218X(03)00333-0]

[30]   Kai Puolamäki, Sami Hanhijärvi, and Gemma C. Garriga: An approximation ratio for biclustering. Inform. Process. Lett., 108(2):45–49, 2008. [doi:10.1016/j.ipl.2008.03.013]

[31]   Ron Shamir, Roded Sharan, and Dekel Tsur: Cluster graph modification problems. Discr. Appl. Math., 144(1–2):173–182, 2004. Preliminary version in WG’02. [doi:10.1016/j.dam.2004.01.007]

[32]   Amos Tanay, Roded Sharan, and Ron Shamir: Biclustering algorithms: A survey. In Srinivas Aluru, editor, Handbook of Computational Molecular Biology. Chapman & Hall/CRC, Computer and Information Science Series, 2005.

[33]   Andrea Vattani: The hardness of k-means clustering in the plane. Manuscript, accessible at http://cseweb.ucsd.edu/˜avattani/papers/kmeans_hardness.pdf.

[34]   Jiong Yang, Haixun Wang, Wei Wang, and Philip S. Yu: Enhanced biclustering on expression data. In Proc. 3rd IEEE Conf. on Bioinformatics and Bioengineering (BIBE’03), pp. 321–327. IEEE Comp. Soc. Press, 2003. [doi:10.1109/BIBE.2003.1188969]

[35]   Jiong Yang, Wei Wang, Haixun Wang, and Philip S. Yu: δ-clusters: Capturing subspace correlation in a large data set. In Proc. 18th Internat. Conf. on Data Engineering (ICDE’02), pp. 517–528. IEEE Comp. Soc. Press, 2002. [doi:10.1109/ICDE.2002.994771]

[36]   Hanson Zhou and David P. Woodruff: Clustering via matrix powering. In Proc. 23rd Symp. on Principles of Database Systems (PODS’04), pp. 136–142. ACM Press, 2004. [doi:10.1145/1055558.1055579]