# Browsing by Author "Agarwal, Pankaj Kumar"


Item (Open Access): **Algorithms for Analyzing Spatio-Temporal Data** (2018). Nath, Abhinandan.

In today's age, huge data sets are becoming ubiquitous. In addition to their size, these data sets are often noisy, contain outliers, and are incomplete, all of which make analyzing them challenging. We apply geometric techniques to tackle some of these challenges, with an emphasis on designing provably efficient algorithms. Our work takes two broad approaches: distributed algorithms, and concise descriptors for big data sets.

With the massive amounts of data available today, it is common to store and process data using multiple machines. Parallel programming frameworks such as MapReduce and its variants are becoming popular for handling such large data. We present the first provably efficient algorithms to compute, store, and query data structures for range queries and approximate nearest-neighbor queries in a popular parallel computing abstraction that captures the salient features of MapReduce and other massively parallel communication (MPC) models. Our algorithms are competitive with their classical counterparts in terms of running time and workload.

We propose parallel algorithms in the MPC model for processing large terrain elevation data (represented as a 3D point cloud) that are too big to fit on one machine. In particular, we present a simple randomized algorithm to compute the Delaunay triangulation of the $xy$-projections of the input points. Next we describe an efficient algorithm to compute the contour tree (a topological descriptor that succinctly encodes the contours of a terrain) of the resulting triangulated terrain.

We then look at comparing real-valued functions by computing a distance function between their merge trees (a small descriptor that succinctly captures the sublevel sets of a function). Merge trees are robust to noise in the data, and in some cases can be used effectively as proxies for the data sets themselves. We also use them to give an algorithm for computing the Gromov-Hausdorff distance, a natural way to measure the distance between two metric spaces. We give the first proof of hardness and the first non-trivial approximation algorithm for computing the Gromov-Hausdorff distance between metric spaces defined on trees, where the distance between two points is the length of the unique path between them in the tree.
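The information a merge tree records can be illustrated with a minimal union-find sketch on a 1D function sampled along a path graph: as the threshold sweeps upward, sublevel-set components are born at local minima and merge at saddles. This is only an illustrative toy (the function name and the 1D setting are assumptions, not the thesis's algorithm):

```python
def sublevel_merge_events(heights):
    """Track how sublevel-set components appear and merge as a threshold
    sweeps upward -- the events a merge tree encodes. `heights` are
    samples of a 1D function on a path graph (neighbors i-1, i+1)."""
    order = sorted(range(len(heights)), key=lambda i: heights[i])
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    events = []  # (height, "birth" or "merge")
    for i in order:
        parent[i] = i
        roots = {find(j) for j in (i - 1, i + 1) if j in parent}
        if not roots:
            events.append((heights[i], "birth"))   # local minimum
        else:
            for r in roots:
                parent[r] = i
            if len(roots) == 2:
                events.append((heights[i], "merge"))  # saddle joins two components
    return events
```

For example, the samples `[1, 3, 0, 4, 2, 5]` have three local minima and two merges, matching the leaves and internal nodes of the corresponding merge tree.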

Finally, we look at the problem of capturing portions shared among a large number of input trajectories. We formulate this as subtrajectory clustering: the clustering of subsequences of trajectories. We propose a new model in which each cluster of subtrajectories is represented by a \emph{pathlet}, a sequence of points that is not necessarily a subsequence of any input trajectory. We present a single objective function for finding the optimal collection of pathlets that best represents the trajectories, taking into account noise and other artifacts of the data. We show that the subtrajectory clustering problem is NP-hard and present fast approximation algorithms. We further improve the running time of our algorithm when the input trajectories are ``well-behaved''. We also present experimental results on both real and synthetic data sets.

Item (Open Access): **Algorithms for Clustering, Partitioning, and Planning** (2023). Taylor, Erin C.

We explore three geometric optimization problems with multifaceted goals, including efficiency, fairness, solution quality, and interpretability. The problems we study pose significant challenges because they involve complex geometric objects that interact in high-dimensional spaces. To contend with these challenges, our work exploits realistic input structure to design algorithms that provide provably good solutions; where such structure is unavailable, we approach the general problem with heuristic or approximation schemes.

The first problem domain addressed is trajectory clustering, where the goal is to partition noisy trajectories (polygonal curves) into clusters based on shared structure and to find a representative trajectory for each cluster. We present a near-linear time (1+ε)-approximation algorithm for k-median clustering of polygonal trajectories under the discrete Fréchet distance, and of finite point sets under the Hausdorff distance. The algorithm leverages the geometric properties of the trajectories, specifically that centers come from a "simpler" metric space, to achieve efficient clustering results.
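The discrete Fréchet distance used above has a classic O(|P||Q|) dynamic program, which can serve as a baseline for experimenting with such clustering (a textbook sketch, not the thesis's approximation algorithm; the sample curves are hypothetical):

```python
from math import dist

def discrete_frechet(P, Q):
    """Discrete Frechet distance between polygonal curves P and Q
    (sequences of 2D points): the smallest max pairwise distance over
    all monotone simultaneous walks along the two curves."""
    n, m = len(P), len(Q)
    INF = float("inf")
    dp = [[INF] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(P[i], Q[j])
            if i == 0 and j == 0:
                dp[i][j] = d
            else:
                best = INF
                if i > 0:
                    best = min(best, dp[i - 1][j])
                if j > 0:
                    best = min(best, dp[i][j - 1])
                if i > 0 and j > 0:
                    best = min(best, dp[i - 1][j - 1])
                dp[i][j] = max(d, best)  # bottleneck, not sum
    return dp[n - 1][m - 1]
```

For two parallel horizontal segments one unit apart, the distance is exactly 1.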

The second problem we consider is redistricting: partitioning a given population into geographic regions in a fair and representative way. Redistricting involves drawing boundaries for electoral districts, which can significantly impact political representation and decision-making processes. We introduce a new model of fairness for redistricting and investigate the existence and computability of fair partitions, in both one and two dimensions. We also propose a new method of generating redistricting plans that leads to more compact (geometrically "nice") districts.

The final problem addressed is optimal motion planning for multiple robots in polygonal environments with obstacles. Each robot is represented by a simple polygonal object, such as a unit disc or square. We present algorithms for generating motion plans that approximately minimize the sum of distances traveled from a start configuration to a target configuration. For the simplified case of two robots with given paths that the robots must traverse, we also give an algorithm to minimize the makespan (latest arrival time) of the schedule.

Item (Open Access): **Algorithms for continuous queries: A geometric approach** (2013). Yu, Albert.

There has been an unprecedented growth in both the amount of data and the number of users interested in different types of data. Users often want to keep track of the data that match their interests over a period of time. A continuous query, once issued by a user, maintains the matching results for the user as new data (as well as updates to the existing data) continue to arrive in a stream. However, supporting potentially millions of continuous queries is a huge challenge. This dissertation addresses the problem of scalably processing a large number of continuous queries over a wide-area network.

Conceptually, the task of supporting distributed continuous queries can be divided into two components--event processing (computing the set of affected users for each data update) and notification dissemination (notifying the affected users). The first part of this dissertation focuses on event processing. Since interacting with large-scale data can easily frustrate and overwhelm users, top-k queries have attracted considerable interest from the database community, as they allow users to focus on the top-ranked results only. However, it is nearly impossible to find a common set of top-ranked data that everyone is interested in; therefore, users are allowed to specify their interests through different forms of preferences, such as personalized ranking functions and range selections. This dissertation presents geometric frameworks, data structures, and algorithms for answering several types of preference queries efficiently. Experimental evaluations show that our approaches outperform previous ones by orders of magnitude.

The second part of the dissertation presents comprehensive solutions to the problem of processing and notifying a large number of continuous range top-k queries across a wide-area network. Simple solutions include using a content-driven network to notify all continuous queries whose ranges contain the update (ignoring top-k), or using a server to compute only the affected continuous queries and notifying them individually. The former solution generates too much network traffic, while the latter overwhelms the server. This dissertation presents a geometric framework which allows the set of affected continuous queries to be described succinctly with messages that can be efficiently disseminated using content-driven networks. Fast algorithms are also developed to reformulate each update into a set of messages whose number is provably optimal, with or without knowing all continuous queries.

The final component of this dissertation is the design of a wide-area dissemination network for continuous range queries. In particular, this dissertation addresses the problem of assigning users to servers in a wide-area content-based publish/subscribe system. A good assignment should consider both users' interests and locations, and balance multiple performance criteria, including bandwidth, delay, and load balance. This dissertation presents a Monte Carlo approximation algorithm as well as a simple greedy algorithm. The Monte Carlo algorithm jointly considers the performance criteria to find a broker-subscriber assignment and provides theoretical performance guarantees. Using this algorithm as a yardstick, the greedy algorithm is also shown to work well across a wide range of workloads.

Item (Open Access): **Algorithms for Geometric Matching, Clustering, and Covering** (2016). Pan, Jiangwei.

With the popularization of GPS-enabled devices such as mobile phones, location data are becoming available at an unprecedented scale. The locations may be collected from many different sources such as vehicles moving around a city, user check-ins in social networks, and geo-tagged micro-blogging photos or messages. Besides the longitude and latitude, each location record may also have a timestamp and additional information such as the name of the location. Time-ordered sequences of these locations form trajectories, which together contain useful high-level information about people's movement patterns.

The first part of this thesis focuses on a few geometric problems motivated by the matching and clustering of trajectories. We first give a new algorithm for computing a matching between a pair of curves under existing models such as dynamic time warping (DTW). The algorithm is more efficient than standard dynamic programming algorithms both theoretically and practically. We then propose a new matching model for trajectories that avoids the drawbacks of existing models. For trajectory clustering, we present an algorithm that computes clusters of subtrajectories, which correspond to common movement patterns. We also consider trajectories of check-ins, and propose a statistical generative model, which identifies check-in clusters as well as the transition patterns between the clusters.
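As a point of reference for the matching models discussed above, the standard DTW dynamic program fits in a few lines (a textbook baseline, not the thesis's faster algorithm; the scalar sequences and cost function below are illustrative assumptions):

```python
def dtw(P, Q, d=lambda a, b: abs(a - b)):
    """Dynamic time warping distance between sequences P and Q: the sum
    of pairwise costs over the best monotone alignment, via the classic
    O(|P||Q|) dynamic program."""
    INF = float("inf")
    n, m = len(P), len(Q)
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = d(P[i - 1], Q[j - 1]) + min(
                dp[i - 1][j],       # repeat P[i-1]
                dp[i][j - 1],       # repeat Q[j-1]
                dp[i - 1][j - 1])   # advance both
    return dp[n][m]
```

Note the contrast with the Fréchet distance: DTW sums costs along the alignment, so repeated elements are "free" when they match exactly, e.g. `dtw([1, 2, 3], [1, 2, 2, 3])` is 0.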

The second part of the thesis considers the problem of covering shortest paths in a road network, motivated by an EV charging-station placement problem. More specifically, a subset of vertices in the road network is selected for charging stations so that every shortest path contains enough charging stations and can be traveled by an EV without draining its battery. We first introduce a general technique for the geometric set cover problem. This technique leads to near-linear-time approximation algorithms, which are the state-of-the-art algorithms for this problem in either running time or approximation ratio. We then use this technique to develop a near-linear-time algorithm for the shortest-path cover problem.
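The set-cover connection can be made concrete with the classic greedy algorithm, which repeatedly picks the set covering the most uncovered elements and achieves a (ln n + 1)-approximation (a textbook baseline rather than the near-linear-time geometric technique described above; the tiny universe below is hypothetical):

```python
def greedy_set_cover(universe, sets):
    """Greedy set cover: repeatedly pick the set covering the most
    still-uncovered elements. Returns indices of chosen sets."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(sets)), key=lambda i: len(uncovered & sets[i]))
        if not uncovered & sets[best]:
            raise ValueError("universe not coverable by the given sets")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen
```

In the charging-station analogy, elements would be shortest paths and each set the collection of paths through one candidate vertex.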

Item (Open Access): **Algorithms for Rectangular Robot Motion Planning** (2023). Steiger, Alexander John.

With the proliferation of robot deployment in large-scale operations, such as warehouse management and manufacturing, there is an increasing need for multi-robot motion planning solutions. The basic multi-robot motion planning problem is to decide whether robots can translate from given start positions to given target positions without colliding with each other or any obstacles in the environment on their way, and if so, to compute such collision-free paths. Together, such paths compose a collision-free motion plan.

In practice, there are other qualities to consider besides being collision-free, such as the lengths of the paths, the overall time for all robots to follow their paths, and robustness. We explore this problem space, called optimal multi-robot motion planning. In light of previous hardness results, we focus on instances where the robot(s) and obstacles are assumed to have simple but realistic geometric properties with the goal of describing algorithms that are provably efficient and/or compute high-quality solutions. A frequent aspect of our work is to suppose that the input objects (robots and/or obstacles) are rectangular objects with bounded aspect ratio; that is, the side lengths of the objects are similar in proportion to each other.

Motivated by applications to motion planning and beyond, we explore the problem of computing and decomposing the common exterior of a set of n 3D axis-aligned cubes. We first present an algorithm to compute the common exterior whose running time is nearly optimal and output-sensitive: by exploiting the geometry of the cubes, it depends linearly on the complexity k of the common exterior rather than on the worst-case complexity Θ(n^2).

Second, we describe an algorithm to further process the common exterior of n axis-aligned cubes into a partition composed of simple regions, specifically axis-aligned boxes. Our partition has size near-linear in the complexity k of the union, and Ω(k) is a lower bound on the size of any such partition. We also describe a simpler algorithm to partition the common exterior of n axis-aligned boxes of arbitrary sizes into O(n^(3/2) + k) axis-aligned boxes, which is optimal in the sense that min{n^(3/2), k} is a worst-case lower bound. In the context of motion planning, our algorithms can be used to compute a small partition of the free space for a box-shaped robot amid obstacles approximated by a set of boxes.

Lastly, we consider a special case of the min-sum motion planning problem, which falls under the umbrella of optimal motion planning. The goal is to compute a motion plan that minimizes the total length of the paths followed by the robots. This problem is well studied for a single robot, for which polynomial-time algorithms are known. However, for multiple robots we are not aware of any polynomial-time algorithm, nor do we know whether the problem is NP-hard. We consider instances of this problem for congruent square robots amid polygonal obstacles in the plane. Our result is a fully polynomial-time approximation scheme (FPTAS) for these instances, which is the first (1+ε)-approximation algorithm for an optimal motion planning problem involving two robots moving in a polygonal environment.

Item (Open Access): **Computational Journalism: from Answering Questions to Questioning Answers and Raising Good Questions** (2015). Wu, You.

Our media is saturated with claims of ``facts'' made from data. Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning more subtle qualities of the resulting claims, e.g., is a claim ``cherry-picking''? This thesis proposes a Query Response Surface (QRS) based framework that models claims based on structured data as parameterized queries. A key insight is that we can learn a lot about a claim by perturbing its parameters and seeing how its conclusion changes. This framework lets us formulate and tackle practical fact-checking tasks --- reverse-engineering vague claims and countering questionable claims --- as computational problems. Within the QRS framework, we take one step further and propose a problem, along with efficient algorithms, for finding high-quality claims of a given form from data, i.e., raising good questions in the first place. This is achieved by using a limited number of high-valued claims to represent high-valued regions of the QRS. Besides the general-purpose high-quality claim-finding problem, lead-finding can be tailored towards specific claim-quality measures, also defined within the QRS framework. An example of uniqueness-based lead-finding is presented for ``one-of-the-few'' claims, yielding interpretable high-quality claims and an adjustable mechanism for ranking objects, e.g., NBA players, based on the claims that can be made for them. Finally, we study the use of visualization as a powerful way of conveying the results of a large number of claims. An efficient two-stage sampling algorithm is proposed for generating the input of a 2D scatter plot with heatmap, evaluating only a limited amount of data while preserving the two essential visual features, namely outliers and clusters.
For all the problems, we present real-world examples and experiments that demonstrate the power of our model, efficiency of our algorithms, and usefulness of their results.
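The parameter-perturbation insight can be sketched on a toy window-average claim: wiggle the window endpoints and measure how often the conclusion survives. This is a hypothetical illustration of the idea, not the QRS framework's actual machinery (the series, window, and threshold below are made up):

```python
def claim_robustness(series, i, j, threshold, wiggle=1):
    """Perturb the endpoints of the claim 'the average of series[i..j]
    exceeds threshold' by up to `wiggle` positions and return the
    fraction of valid nearby windows where the claim still holds.
    A low fraction suggests the original window was cherry-picked."""
    holds, total = 0, 0
    for di in range(-wiggle, wiggle + 1):
        for dj in range(-wiggle, wiggle + 1):
            a, b = i + di, j + dj
            if 0 <= a <= b < len(series):
                total += 1
                window = series[a:b + 1]
                if sum(window) / len(window) > threshold:
                    holds += 1
    return holds / total
```

For the spiky series `[0, 0, 10, 0, 0]`, the claim "the window [2, 2] averages above 5" holds in only 1 of the 6 nearby windows, flagging it as fragile.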

Item (Open Access): **Efficient Algorithms for Querying Large and Uncertain Data** (2020). Sintos, Stavros.

Query processing is an important problem in many research fields, including database systems, data mining, and geometric computing. The goal is to preprocess input data into an index so that queries over the data can be answered efficiently. In many real applications we need to handle large amounts of data and/or data that are ambiguous because of human errors and data integration. As data sets become increasingly large and complex, queries are also becoming complex; hence, new challenging problems have emerged in the area of query processing.

Data summarization accelerates expensive queries by running the query procedure on a small data summary rather than the entire large dataset. The first part of this thesis studies the problem of finding small, high-quality data summaries in Euclidean space. Summarization can be applied either to the entire dataset or to the points in a query range given by the user. Efficient algorithms are proposed to compute a small summary of the entire dataset so that top-$k$ or user-preference queries are answered efficiently, with strong quality guarantees, by looking only at the summary. Such a summary can also be used in multi-criteria decision problems. Furthermore, near-linear-space indexes are designed so that, given a query rectangle, a diverse high-valued summary of $k$ points in the query range is returned efficiently.
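One classical summary relevant to such preference queries is the skyline (the Pareto-optimal points): for any scoring function that is monotone in every coordinate, the top-ranked point always lies on the skyline, so dominated points can be discarded. A brute-force sketch (illustrative only; the thesis's summaries have stronger guarantees and sizes):

```python
def skyline(points):
    """Return the Pareto-optimal (skyline) points: those not dominated
    in every coordinate by some other point. Brute force, O(n^2)."""
    def dominates(p, q):
        # p dominates q if p is at least as large everywhere and differs
        return all(a >= b for a, b in zip(p, q)) and p != q
    return [p for p in points
            if not any(dominates(q, p) for q in points)]
```

For instance, with points scored by any weighted sum of coordinates with nonnegative weights, the maximizer is guaranteed to be one of the skyline points.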

The second part of the thesis focuses on queries over uncertain data. Uncertainty is typically captured using stochastic data models, and querying such data requires either statistics about the probabilistic behavior of the underlying data or data cleaning to reduce the uncertainty of the query answer. Small-size indexes are built so that, given a query rectangle, statistics of the MAX (or top-$k$) operator among the points in the query range can be computed in sublinear time. In addition, approximation algorithms are proposed for selecting data to clean in order to minimize the variance of the result of a query function over a probabilistic database. While most of the methods hold for general functions, the main focus is on minimizing the variance in fact-checking operations.
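For independent uncertain values, the exact distribution of the MAX operator follows from the product rule P(max ≤ v) = Π_i P(X_i ≤ v). A small sketch of that computation (a naive illustration, not the thesis's sublinear-time index):

```python
def max_distribution(points):
    """Exact distribution of MAX over independent uncertain values.
    Each point is a dict {value: probability}. Returns {value: P(max = value)},
    computed via P(max <= v) = product of per-point CDFs at v."""
    values = sorted({v for p in points for v in p})

    def cdf(p, v):
        return sum(pr for val, pr in p.items() if val <= v)

    out, prev = {}, 0.0
    for v in values:
        cum = 1.0
        for p in points:
            cum *= cdf(p, v)
        out[v] = cum - prev  # P(max = v) = P(max <= v) - P(max < v)
        prev = cum
    return out
```

For two points, one equal to 1 or 3 with probability 1/2 each and one fixed at 2, the maximum is 2 or 3 with probability 1/2 each.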

Item (Open Access): **Flood Risk Analysis on Terrains** (2021). Lowe, Aaron S.

An important problem in terrain analysis is modeling how water flows across a terrain and creates floods by filling up depressions. This thesis examines a number of flood-risk related problems. One such problem is answering terrain-flood queries: given a terrain represented as a triangulated xy-monotone surface, a rain distribution, and a volume of rain, determine which portions of the terrain are flooded.

The first part of this thesis develops efficient algorithms for terrain-flood queries under the single-flow direction (SFD) and multiflow-directions (MFD) models, in which water at a point flows along a single downslope edge or multiple downslope edges respectively. Algorithms are given for the more specific case of the SFD model, and then it is shown how to answer queries in the more general case under the MFD model.
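The SFD idea can be sketched as flow accumulation on a vertex graph: each vertex passes all of its water to one downslope neighbor (here the lowest neighbor stands in for the steepest-descent edge, since no edge lengths are modeled). This toy, with hypothetical vertex names, is illustrative only, not the thesis's query structure:

```python
def sfd_flow(heights, neighbors, rain=None):
    """Single-flow-direction (SFD) flow accumulation. `heights` maps
    vertex -> elevation; `neighbors` maps vertex -> adjacent vertices;
    `rain` maps vertex -> incoming rain (default 1 unit everywhere).
    Vertices are processed from highest to lowest, so all upstream flow
    is settled before being passed on."""
    if rain is None:
        rain = {v: 1.0 for v in heights}
    flow = dict(rain)
    for v in sorted(heights, key=heights.get, reverse=True):
        downs = [u for u in neighbors.get(v, ()) if heights[u] < heights[v]]
        if downs:
            flow[min(downs, key=heights.get)] += flow[v]
    return flow
```

Under the MFD model the same loop would instead split `flow[v]` among all downslope neighbors, e.g. in proportion to their drops.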

Available terrain data is also often subject to uncertainty, which must be incorporated into the terrain analysis. For instance, digital elevation models of terrains have to be refined to incorporate underground pipes, tunnels, and waterways under bridges, but there is often uncertainty in their existence. By representing the uncertainty in the terrain data explicitly, we can develop methods for flood-risk analysis that properly incorporate terrain uncertainty when reporting which areas are at risk of flooding.

The second part of the thesis shows how the flood-risk algorithms can be extended to handle ``uncertain'' terrains, using a standard Monte Carlo method.
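The Monte Carlo pattern is: sample a concrete terrain from the per-vertex height distributions, run a deterministic flood test, and average. A minimal sketch with a simplistic flood test (water at a fixed level spreads from a source through vertices below that level; the graph and distributions are hypothetical, and the thesis's flooding model is more elaborate):

```python
import random

def flood_probability(height_dists, neighbors, source, target, level,
                      trials=2000, seed=0):
    """Monte Carlo estimate of the probability that `target` floods when
    water at elevation `level` spreads from `source` through vertices
    whose sampled height is below `level`.
    `height_dists` maps vertex -> list of (height, probability)."""
    rng = random.Random(seed)

    def sample(dist):
        r, acc = rng.random(), 0.0
        for h, p in dist:
            acc += p
            if r <= acc:
                return h
        return dist[-1][0]

    hits = 0
    for _ in range(trials):
        h = {v: sample(d) for v, d in height_dists.items()}
        stack, seen = [source], set()
        while stack:  # DFS through flooded vertices
            v = stack.pop()
            if v in seen or h[v] >= level:
                continue
            seen.add(v)
            stack.extend(neighbors.get(v, ()))
        hits += target in seen
    return hits / trials
```

With enough trials the estimate concentrates around the true flooding probability; standard Chernoff bounds quantify the additive error.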

Finally, the third part of the thesis develops efficient algorithms for flow-query problems: determining how much water flows over a given vertex or edge as a function of time. We show how to compute the 1D flow rate and also develop a model for computing 2D channels.

A number of the algorithms are implemented, and their efficacy and efficiency are tested on real terrains of different types (urban, suburban, and mountainous).

Item (Open Access): **Geometric Approximation Algorithms - A Summary Based Approach** (2012). Raghvendra, Sharathkumar.

Large scale geometric data is ubiquitous. In this dissertation, we design algorithms and data structures to process large-scale geometric data efficiently. We design algorithms for some fundamental geometric optimization problems that arise in motion planning, machine learning, and computer vision.

For a stream S of n points in d-dimensional space, we develop (single-pass) streaming algorithms for maintaining extent measures such as the minimum enclosing ball and diameter. Our streaming algorithms use working space polynomial in d and sub-linear in n. For the problems of computing the diameter, width, and minimum enclosing ball of S, we obtain lower bounds on the worst-case approximation ratio of any streaming algorithm that uses space polynomial in d. On the positive side, we design a summary called the blurred ball cover and use it for answering approximate farthest-point queries and maintaining an approximate minimum enclosing ball and diameter of S. We describe a streaming algorithm for maintaining a blurred ball cover whose working space is linear in d and independent of n.
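The flavor of such single-pass guarantees can be seen in a simple streaming rule for the minimum enclosing ball, attributed to Zarrabi-Zadeh and Chan: when a point falls outside the current ball, grow the ball just enough to contain both the old ball and the new point, giving a 3/2-approximation in any dimension. This is a baseline sketch, not the blurred ball cover itself:

```python
from math import dist

def streaming_meb(points):
    """Single-pass approximate minimum enclosing ball (center, radius).
    On a point p outside the current ball B(c, r), the new ball is the
    smallest ball containing both B(c, r) and p: radius (r + d)/2 with
    the center moved (d - r)/2 toward p, where d = |p - c|."""
    it = iter(points)
    c = list(next(it))
    r = 0.0
    for p in it:
        d = dist(c, p)
        if d > r:
            t = (d - r) / (2.0 * d)  # fraction of the way from c to p
            c = [ci + t * (pi - ci) for ci, pi in zip(c, p)]
            r = (r + d) / 2.0
    return c, r
```

The update maintains the invariant that the ball contains every point seen so far, using O(d) working space independent of n.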

For a set P of k pairwise-disjoint convex obstacles in 3-dimensions, we design algorithms and data structures for computing a Euclidean shortest path between a source s and a destination t. The running time of our algorithm is linear in n, and the size and query time of our data structure are independent of n. We follow a summary-based approach, i.e., we quickly compute a small sketch Q of P whose size is independent of n and then compute approximate shortest paths with respect to Q.

For d-dimensional point sets A and B with |A| = |B| = n, and for a parameter ε > 0, we give an algorithm to compute an ε-approximate minimum-weight perfect matching of A and B under a distance function d(·,·) in time O(n^{1.5} τ(n)); here τ(n) is the query/update time of a dynamic weighted nearest-neighbor data structure under d(·,·). When A and B are point sets from a bounded integer grid, for the L_1 and L_∞ norms, our algorithm computes a minimum-weight perfect matching of A and B in time O(n^{1.5}). Our algorithm also extends to a generalization of matching called the transportation problem.

We also present an O(n polylog n)-time algorithm that computes, under any L_p norm, an ε-approximate minimum-weight perfect matching of A and B with high probability; all previous algorithms take Ω(n^{1.5}) time. We approximate the L_p norm using a distance function based on a randomly shifted quad-tree. The algorithm iteratively generates an approximate minimum-cost augmenting path under the new distance function in time proportional to the length of the path. We show that the total length of the augmenting paths generated by the algorithm is O(n log n), implying a near-linear running time.

All the problems mentioned above have a history of more than two decades, and the algorithms presented here improve previous work by an order of magnitude. Many of these improvements are obtained via new geometric techniques that might have broader applications and are of independent interest.

Item (Open Access): **Geometric Computing over Uncertain Data** (2015). Zhang, Wuzhou.

Entering the era of big data, human beings are faced with an unprecedented amount of geometric data. Many computational challenges arise in processing this new deluge of geometric data. A critical one is data uncertainty: the data is inherently noisy and inaccurate, and often incomplete. The past few decades have witnessed the influence of geometric algorithms in various fields, including GIS, spatial databases, and computer vision. Yet most existing geometric algorithms are built on the assumption that the data is precise, and they are incapable of properly handling data in the presence of uncertainty. This thesis explores a few algorithmic challenges in what we call geometric computing over uncertain data.

We study the nearest-neighbor searching problem, which returns the nearest neighbor of a query point in a set of points, in a probabilistic framework. This thesis investigates two different nearest-neighbor formulations: expected nearest neighbor (ENN), where we consider the expected distance between each input point and a query point, and probabilistic nearest neighbor (PNN), where we estimate the probability of each input point being the nearest neighbor of a query point.

For the ENN problem, we consider a probabilistic framework in which the location of each input point and/or query point is specified as a probability density function, and the goal is to return the point that minimizes the expected distance. We present methods for computing an exact ENN or an ε-approximate ENN, for a given error parameter 0 < ε < 1, under different distance functions. These methods build an index of near-linear size and answer ENN queries in polylogarithmic or sublinear time, depending on the underlying function. As far as we know, these are the first nontrivial methods for answering exact or ε-approximate ENN queries with provable performance guarantees. Moreover, we extend our results to answer exact or ε-approximate k-ENN queries. Notably, when only the query points are uncertain, we obtain state-of-the-art results for top-k aggregate (group) nearest-neighbor queries in the L1 metric using the weighted SUM operator.
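For discretely distributed points, the ENN definition itself reduces to minimizing a probability-weighted sum of distances, which a brute-force sketch makes explicit (linear scan for illustration; the indexes above answer the same query in polylogarithmic or sublinear time):

```python
from math import dist

def expected_nn(uncertain_points, q):
    """Expected nearest neighbor of query q. Each uncertain point is a
    list of (location, probability) pairs; returns the index of the
    point minimizing the expected distance to q."""
    def expected_distance(p):
        return sum(prob * dist(loc, q) for loc, prob in p)
    return min(range(len(uncertain_points)),
               key=lambda i: expected_distance(uncertain_points[i]))
```

Note that the ENN need not contain the single closest possible location: a point split between a near and a far location can lose to a point sitting at a moderate fixed distance.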

For the PNN problem, we consider a probabilistic framework in which the location of each input point is specified as a probability distribution function. We present efficient algorithms for (i) computing all points that are nearest neighbors of a query point with nonzero probability; (ii) estimating, within a specified additive error, the probability of a point being the nearest neighbor of a query point; and (iii) using these estimates to return the point most likely to be the nearest neighbor, or all points whose probability of being the nearest neighbor exceeds a given threshold. We also present experimental results that demonstrate the effectiveness of our approach.

We study the convex-hull problem, which asks for the smallest convex set that contains a given point set, in a probabilistic setting. In our framework, the uncertainty of each input point is described by a probability distribution over a finite number of possible locations, including a null location to account for non-existence of the point. Our results include both exact and approximation algorithms for computing the probability of a query point lying inside the convex hull of the input, time-space tradeoffs for membership queries, a connection between Tukey depth and membership queries, as well as a new notion of β-hull that may be a useful representation of uncertain hulls.

We study contour trees of terrains, which encode the topological changes of the level set of the height value ℓ as we raise ℓ from -∞ to +∞ on the terrain, in a probabilistic setting. We consider a terrain that is defined by linearly interpolating each triangle of a triangulation. In our framework, the uncertainty lies in the height of each vertex in the triangulation, and we assume that it is described by a probability distribution. We first show that the probability of a vertex being a critical point, and the expected number of nodes (resp. edges) of the contour tree, can be computed exactly and efficiently. Then we present efficient sampling-based methods for estimating, with high probability, (i) the probability that two points lie on an edge of the contour tree, within additive error; and (ii) the expected distance of two points p, q, and the probability that the distance of p, q is at least ℓ on the contour tree, within additive and/or relative error, where the distance of p and q on a contour tree is defined as the difference between the maximum height and the minimum height on the unique path from p to q in the contour tree.

Item (Open Access): **Geometric Hitting Sets and Their Variants** (2011). Ganjugunte, Shashidhara Krishnamurthy.

This thesis explores a few geometric optimization problems that arise in robotics and sensor networks. In particular, we present efficient algorithms for the hitting-set problem and the budgeted hitting-set problem.

Given a set of objects and a collection of subsets of the objects, called ranges, the hitting-set problem asks for a minimum number of objects that intersect all the subsets in the collection. In geometric settings, the objects are typically a set of points and the ranges are defined by a set of geometric regions (e.g., disks or polygons); that is, the subset of points lying in each region forms a range.

The first result of this thesis is an efficient algorithm for an instance of the hitting-set problem in which both the set of points and the set of ranges are implicitly defined. Namely, we are given a convex polygonal robot and a set of convex polygonal obstacles, and we wish to find a small number of congruent copies of the robot that intersect all the obstacles.

Next, motivated by the application of sensor placement in sensor networks, we study the so-called ``art-gallery'' problem. Given a polygonal environment, we wish to place the minimum number of guards so that every point in the environment is visible from at least one guard. This problem can be formulated as a hitting-set problem. We present a sampling-based algorithm for this problem and study several of its extensions.

Next, we study the geometric hitting-set problem in a dynamic setting, where the objects and/or the ranges change with time and the goal is to maintain a hitting set. We present algorithms that maintain a small hitting set with sub-linear update time.

Finally, we consider the budgeted hitting-set problem, in which we are asked to choose a bounded number of objects that intersect as many ranges as possible. Motivated by applications in network vulnerability analysis, we study this problem in a probabilistic setting.
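The budgeted variant is a maximum-coverage objective, for which the classic greedy rule (always pick the object hitting the most not-yet-hit ranges) gives a (1 - 1/e)-approximation. A small sketch, with a hypothetical toy instance (the thesis's probabilistic setting is richer than this deterministic version):

```python
def budgeted_hitting(objects, ranges, budget):
    """Greedy budgeted hitting set: choose at most `budget` objects to
    hit (intersect) as many ranges as possible. `ranges` is a list of
    sets of objects. Returns (chosen objects, number of ranges hit)."""
    chosen, hit = [], set()
    for _ in range(budget):
        def gain(o):
            return sum(1 for i, r in enumerate(ranges)
                       if i not in hit and o in r)
        best = max(objects, key=gain)
        if gain(best) == 0:
            break  # nothing left to gain
        chosen.append(best)
        hit |= {i for i, r in enumerate(ranges) if best in r}
    return chosen, len(hit)
```

The (1 - 1/e) bound is tight for greedy on this objective, which is why the budgeted problem is genuinely harder than plain hitting set.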

Item (Open Access): **Small and Stable Descriptors of Distributions for Geometric Statistical Problems** (2009). Phillips, Jeff M.

This thesis explores how to sparsely represent distributions of points for geometric statistical problems. A coreset C is a small summary of a point set P such that if a certain statistic is computed on P and C, then the difference in the results is guaranteed to be bounded by a parameter ε. Two examples of coresets are ε-samples and ε-kernels. An ε-sample can estimate the density of a point set in any range from a geometric family of ranges (e.g., disks, axis-aligned rectangles). An ε-kernel approximates the width of a point set in all directions. Both coresets have size that depends only on ε, the error parameter, not the size of the original data set. We demonstrate several improvements to these coresets and how they are useful for geometric statistical problems.

We reduce the size of ε-samples for density queries with axis-aligned rectangles to nearly the square root of the size required for more general families of shapes, such as disks. We also show how to construct ε-samples of probability distributions.

We show how to maintain “stable” ε-kernels; that is, if the point set P changes by a small amount, then the ε-kernel also changes by only a small amount. This is useful in surveillance-tracking problems, and the stability property leads to more efficient algorithms for maintaining ε-kernels.

We next study when the input point sets are uncertain and their uncertainty is modeled by probability distributions. Statistics on these point sets (e.g., radius of smallest enclosing ball) do not have exact answers, but rather distributions of answers. We describe data structures to represent approximations of these distributions and algorithms to compute them. We also show how to create distributions of ε-kernels and ε-samples for these uncertain data sets.

Finally, we examine a spatial anomaly detection problem: computing a spatial scan statistic. The input is a point set P and measurements on the point set. The spatial scan statistic finds the range (e.g., an axis-aligned bounding box) where the measurements inside the range are the most different from measurements outside of the range. We show how to compute this statistic efficiently while allowing for a bounded amount of approximation error. This result generalizes to several statistical models and types of input point sets.

Item (Open Access): **Two Algorithmic Schemes for Geometric Bipartite Matching and Transportation** (2020). Xiao, Allen.

Given n red points and n blue points in d-dimensional space and a distance function, the geometric bipartite matching problem is to find a matching of red points to blue points such that the sum of distances between the matched pairs is minimized. The general graph version, minimum-cost bipartite matching, can be solved in polynomial time using the classical Hungarian algorithm, which yields a cubic-time algorithm for the geometric matching problem. We present several algorithms for computing optimal and near-optimal geometric bipartite matchings and for their fractional generalization, geometric transportation. We also present fast algorithms for partial matching, i.e., matching only k pairs for a given k ≤ n.
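The problem statement is easy to pin down on tiny instances with an O(n!) brute force over all red-to-blue assignments, which is useful as a correctness reference when testing faster algorithms (an illustrative sketch only; it stands in for neither the Hungarian algorithm nor the thesis's schemes):

```python
from itertools import permutations
from math import dist

def exact_min_matching(red, blue):
    """Exact minimum-cost geometric bipartite matching by brute force.
    Returns (cost, perm) where red[i] is matched to blue[perm[i]].
    O(n!) -- tiny inputs only."""
    n = len(red)
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        cost = sum(dist(red[i], blue[perm[i]]) for i in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm
```

For two coincident red/blue pairs listed in opposite orders, the optimal matching swaps them for total cost 0.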

Finally, we consider the case when the red points are allowed to translate (rigidly) and the goal is to find a minimum-cost partial matching over all translations of the red points. We take the cost of a matching to be the sum of squared edge costs (RMS). For a given translation t, let f(t) denote the cost of an optimal partial matching (for a fixed k). A long-standing open problem is whether the number of local minima of f(t) is polynomial; the best upper bounds are exponential. We show that there is an approximation g(t) of f(t) that has only a quadratic number of local minima.