Why Simple Hash Functions Work: Exploiting the Entropy in a Data Stream

by Kai-Min Chung, Michael Mitzenmacher, and Salil Vadhan

Theory of Computing, Volume 9(30), pp. 897-945, 2013

Bibliography with links to cited articles

[1]   Noga Alon, Martin Dietzfelbinger, Peter Bro Miltersen, Erez Petrank, and Gábor Tardos: Linear hash functions. J. ACM, 46(5):667–683, 1999. Preliminary version in STOC’97. [doi:10.1145/324133.324179]

[2]   Yossi Azar, Andrei Z. Broder, Anna R. Karlin, and Eli Upfal: Balanced allocations. SIAM J. Comput., 29(1):180–200, 1999. Preliminary version in STOC’94. [doi:10.1137/S0097539795288490]

[3]   Charles H. Bennett, Gilles Brassard, and Jean-Marc Robert: Privacy amplification by public discussion. SIAM J. Comput., 17(2):210–229, 1988. Preliminary version in CRYPTO’85. [doi:10.1137/0217014]

[4]   Burton H. Bloom: Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, 1970. [doi:10.1145/362686.362692]

[5]   Avrim Blum and Joel Spencer: Coloring random and semi-random k-colorable graphs. J. Algorithms, 19(2):204–234, 1995. [doi:10.1006/jagm.1995.1034]

[6]   Andrei Z. Broder and Michael Mitzenmacher: Using multiple hash functions to improve IP lookups. In Proc. 20th Ann. Joint Conf. IEEE Computer and Communications Societies (INFOCOM’01), pp. 1454–1463. IEEE Comp. Soc. Press, 2001. [doi:10.1109/INFCOM.2001.916641]

[7]   Andrei Z. Broder and Michael Mitzenmacher: Network applications of Bloom filters: A survey. Internet Mathematics, 1(4):485–509, 2004. [doi:10.1080/15427951.2004.10129096]

[8]   J. Lawrence Carter and Mark N. Wegman: Universal classes of hash functions. J. Comput. System Sci., 18(2):143–154, 1979. Preliminary version in STOC’77. [doi:10.1016/0022-0000(79)90044-8]

[9]   Benny Chor and Oded Goldreich: Unbiased bits from sources of weak randomness and probabilistic communication complexity. SIAM J. Comput., 17(2):230–261, 1988. Preliminary version in FOCS’85. [doi:10.1137/0217015]

[10]   Sarang Dharmapurikar, Praveen Krishnamurthy, Todd S. Sproull, and John W. Lockwood: Deep packet inspection using parallel Bloom filters. IEEE Micro, 24(1):52–61, 2004. Preliminary version in HotI’03. [doi:10.1109/MM.2004.1268997]

[11]   Martin Dietzfelbinger and Philipp Woelfel: Almost random graphs with simple hash functions. In Proc. 35th STOC, pp. 629–638. ACM Press, 2003. [doi:10.1145/780542.780634]

[12]   Yevgeniy Dodis, Rosario Gennaro, Johan Håstad, Hugo Krawczyk, and Tal Rabin: Randomness extraction and key derivation using the CBC, Cascade and HMAC modes. In Proc. 24th. Ann. Internat. Cryptology Conference (CRYPTO’04), pp. 494–510. Springer, 2004. [doi:10.1007/978-3-540-28628-8_30]

[13]   Richard Durrett: Probability: Theory and Examples. Third Edition. Duxbury, 2004.

[14]   Alison L. Gibbs and Francis Edward Su: On choosing and bounding probability metrics. International Statistical Review, 70(3):419–435, 2002. See also at arXiv. [doi:10.1111/j.1751-5823.2002.tb00178.x]

[15]   Gaston H. Gonnet: Expected length of the longest probe sequence in hash code searching. J. ACM, 28(2):289–304, 1981. [doi:10.1145/322248.322254]

[16]   Gaston H. Gonnet and Ricardo Baeza-Yates: Handbook of algorithms and data structures: in Pascal and C. Addison-Wesley Longman Publishing Co., Inc. Boston, MA, USA, 1991. [ACM:103324]

[17]   Johan Håstad, Russell Impagliazzo, Leonid A. Levin, and Michael Luby: A pseudorandom generator from any one-way function. SIAM J. Comput., 28(4):1364–1396, 1999. Preliminary versions in STOC’89 and STOC’90. [doi:10.1137/S0097539793244708]

[18]   Russell Impagliazzo and David Zuckerman: How to recycle random bits. In Proc. 30th FOCS, pp. 248–253. IEEE Comp. Soc. Press, 1989. [doi:10.1109/SFCS.1989.63486]

[19]   Adam Kirsch and Michael Mitzenmacher: Less hashing, same performance: Building a better Bloom filter. Random Structures & Algorithms, 33(2):187–218, 2008. Preliminary version in ESA’06. [doi:10.1002/rsa.20208]

[20]   Donald E. Knuth: The Art of Computer Programming, Volume 3: Sorting and Searching. Addison Wesley Longman Publishing Co., Inc. Redwood City, CA, USA, 1998. [ACM:280635]

[21]   S. Muthu Muthukrishnan: Data Streams: Algorithms and Applications. Foundations and Trends in Theoretical Computer Science, 1(2):117–236, 2005. Preliminary version in SODA’03. [doi:10.1561/0400000002]

[22]   Noam Nisan and Amnon Ta-Shma: Extracting randomness: A survey and new constructions. J. Comput. System Sci., 58(1):148–173, 1999. Preliminary versions in CCC’96 and STOC’96. [doi:10.1006/jcss.1997.1546]

[23]   Noam Nisan and David Zuckerman: Randomness is linear in space. J. Comput. System Sci., 52(1):43–52, 1996. [doi:10.1006/jcss.1996.0004]

[24]   Anna Pagh and Rasmus Pagh: Uniform hashing in constant time and optimal space. SIAM J. Comput., 38(1):85–96, 2008. Preliminary version in STOC’03. [doi:10.1137/060658400]

[25]   Anna Pagh, Rasmus Pagh, and Milan Ružić: Linear probing with constant independence. SIAM J. Comput., 39(3):1107–1120, 2009. Preliminary version in STOC’07. [doi:10.1137/070702278]

[26]   Rasmus Pagh and Flemming Friche Rodler: Cuckoo hashing. J. Algorithms, 51(2):122–144, 2004. Preliminary version in ESA’01. [doi:10.1016/j.jalgor.2003.12.002]

[27]   Mihai Pǎtraşcu and Mikkel Thorup: On the k-independence required by linear probing and minwise independence. In Proc. 37th Internat. Colloq. on Automata, Languages and Programming (ICALP’10), pp. 715–726, 2010. Extended version on arXiv. [doi:10.1007/978-3-642-14165-2_60]

[28]   Martin Raab and Angelika Steger: “Balls into bins”—a simple and tight analysis. In Proc. 2nd Internat. Workshop on Randomization and Computation (RANDOM’98), pp. 159–170. Springer, 1998. [doi:10.1007/3-540-49543-6_13]

[29]   Jaikumar Radhakrishnan and Amnon Ta-Shma: Bounds for dispersers, extractors, and depth-two superconcentrators. SIAM J. Discrete Math., 13(1):2–24, 2000. Preliminary version in FOCS’97. [doi:10.1137/S0895480197329508]

[30]   M. V. Ramakrishna: Hashing practice: analysis of hashing and universal hashing. In Proc. 1988 ACM SIGMOD Internat. Conf. on Management of Data (SIGMOD’88), pp. 191–199, New York, NY, USA, 1988. ACM Press. [doi:10.1145/50202.50223]

[31]   M. V. Ramakrishna: Practical performance of Bloom filters and parallel free-text searching. Commun. ACM, 32(10):1237–1239, 1989. [doi:10.1145/67933.67941]

[32]   M. V. Ramakrishna, E. Fu, and E. Bahcekapili: Efficient hardware hashing functions for high performance computers. IEEE Trans. Comput., 46(12):1378–1381, 1997. [doi:10.1109/12.641938]

[33]   Leonid Reyzin: A note on the statistical difference of small direct products. Technical Report BUCS-TR-2004-032, Boston University Computer Science Department, 2004. Boston University.

[34]   Amit Sahai and Salil Vadhan: Manipulating statistical difference. In Panos Pardalos, Sanguthevar Rajasekaran, and José Rolim, editors, Proc. DIMACS Workshop on Randomization Methods in Algorithm Design (DIMACS’97), volume 43 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pp. 251–270. Amer. Math. Soc., 1997.

[35]   Miklos Santha and Umesh V. Vazirani: Generating quasi-random sequences from semi-random sources. J. Comput. System Sci., 33(1):75–87, 1986. Preliminary version in FOCS’84. [doi:10.1016/0022-0000(86)90044-9]

[36]   Jeanette P. Schmidt and Alan Siegel: The analysis of closed hashing under limited randomness (extended abstract). In Proc. 22nd STOC, pp. 224–234. ACM Press, 1990. [doi:10.1145/100216.100245]

[37]   Ronen Shaltiel: Recent developments in explicit constructions of extractors. In Current and Trends in Theoretical Computer Science: The Challenge of the New Century. Volume 1: Algorithms and Complexity, pp. 189–228. World Scientific, 2002. Available at World Scientific.

[38]   Alan Siegel: On universal classes of extremely random constant-time hash functions. SIAM J. Comput., 33(3):505–543, 2004. [doi:10.1137/S0097539701386216]

[39]   Daniel A. Spielman and Shang-Hua Teng: Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. J. ACM, 51(3):385–463, 2004. Preliminary version in STOC’01. See also at arXiv. [doi:10.1145/990308.990310]

[40]   Mikkel Thorup and Yin Zhang: Tabulation-based 5-independent hashing with applications to linear probing and second moment estimation. SIAM J. Comput., 41(2):293–331, 2012. Preliminary work in ALENEX’10 and SODA’04. [doi:10.1137/100800774]

[41]   Berthold Vöcking: How asymmetry helps load balancing. J. ACM, 50(4):568–589, 2003. Preliminary version in FOCS’99. [doi:10.1145/792538.792546]

[42]   Mark N. Wegman and J. Lawrence Carter: New hash functions and their use in authentication and set equality. J. Comput. System Sci., 22(3):265–279, 1981. Preliminary version in FOCS’79. [doi:10.1016/0022-0000(81)90033-7]

[43]   Philipp Woelfel: Asymmetric balanced allocation with simple hash functions. In Proc. 17th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA’06), pp. 424–433. ACM Press, 2006. [ACM:1109557.1109605]

[44]   David Zuckerman: Simulating BPP using a general weak random source. Algorithmica, 16(4/5):367–391, 1996. Preliminary version in FOCS’91. [doi:10.1007/BF01940870]