Efficient algorithms for new computational models

Advances in hardware design and manufacturing often lead to new ways in which problems can be solved computationally. In this thesis we explore fundamental problems in three computational models that are based on such recent advances. The first model is based on new chip architectures that place multiple independent processing units on one chip, allowing unprecedented parallelism in hardware. We provide new scheduling algorithms for this computational model. The second model is motivated by peer-to-peer networks, where countless (often inexpensive) computing devices cooperate in distributed applications without any central control. We state and analyze new algorithms for load balancing and for locality-aware distributed data storage in peer-to-peer networks. The last model is based on extensions of the streaming model and is an attempt to capture the class of problems that can be solved efficiently on massive data sets. We give a number of algorithms for this model and compare it to other models that have been proposed for computation on massive data sets. Our algorithms and complexity results for these computational models follow the central thesis that modeling real-world computational structures is an important part of theoretical computer science, and that this effort is richly rewarded by a plethora of interesting and challenging problems. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
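As background for the peer-to-peer model, the following is a minimal Python sketch of consistent hashing, the key-to-node mapping used by distributed hash tables such as Chord, in which each key is stored at the first node clockwise from its hash on a ring. It is not the load-balancing algorithm of this thesis; the class and function names, the SHA-1 hash, and the 32-bit ring size are illustrative assumptions. The `arc_fractions` method measures how much of the address space each node is responsible for, which is the quantity that load-balancing schemes try to even out.

```python
import bisect
import hashlib

# Illustrative assumption: a ring of 2^32 positions.
RING_BITS = 32
RING_SIZE = 1 << RING_BITS


def ring_position(name: str) -> int:
    """Map a string to a point on the ring [0, 2^32) using the first 4 bytes of SHA-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class ConsistentHashRing:
    """Hypothetical minimal ring: each key is stored at its successor node.

    Assumes node names hash to distinct positions (collisions are ignored).
    """

    def __init__(self, node_names):
        # Sorted (position, name) pairs so successor lookup is a binary search.
        self._points = sorted((ring_position(n), n) for n in node_names)
        if not self._points:
            raise ValueError("ring needs at least one node")
        self._positions = [p for p, _ in self._points]

    def owner(self, key: str) -> str:
        """Return the first node at or clockwise after the key's position."""
        pos = ring_position(key)
        i = bisect.bisect_left(self._positions, pos)
        if i == len(self._positions):
            i = 0  # wrap around past the largest node position
        return self._points[i][1]

    def arc_fractions(self) -> dict:
        """Fraction of the ring each node owns: the arc (predecessor, node]."""
        if len(self._points) == 1:
            return {self._points[0][1]: 1.0}  # a lone node owns the whole ring
        fractions = {}
        for i, (pos, name) in enumerate(self._points):
            prev = self._positions[i - 1]  # for i == 0 this wraps to the last node
            fractions[name] = ((pos - prev) % RING_SIZE) / RING_SIZE
        return fractions


if __name__ == "__main__":
    ring = ConsistentHashRing([f"node-{i}" for i in range(8)])
    print(ring.owner("some-object"))
    print(max(ring.arc_fractions().values()))
```

When n nodes are placed at random positions, the largest arc is, with high probability, about a logarithmic factor larger than the average share of 1/n, so some nodes store noticeably more keys than others; this imbalance is the kind of effect that load-balancing algorithms for peer-to-peer systems aim to reduce.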
