I expect to graduate at the end of 2017.
I'm a 4th year PhD candidate in Computer Science at Hasselt University and a proud member of its Databases and Theoretical Computer Science group, where I work with professor Frank Neven (my advisor). I'm also a PhD Fellow of the Research Foundation - Flanders (FWO). I conduct research in the area of principles of database management and enjoy working on topics related to cloud computing and big data. Prior to starting a PhD I obtained an MSc degree in computer science, databases.
During the months February till April 2016 I visited UWDB, the database group of the University of Washington in Seattle, to work with professor Dan Suciu. I also collaborate with professor Thomas Schwentick from tu Dortmund. I'm currently visiting the Database Group of the University of Wisconsin-Madison to work with professor Paris Koutris.
I expect to graduate at the end of 2017.
Hasselt University, Agoralaan, Building D, B-3590 Diepenbeek, Belgium Room: D6 Tel: +3211268209 E-mail: Firstname.Lastname (at) uhasselt.be
We study the optimal communication cost for computing a full conjunctive query Q over p distributed servers. Two prior results were known. First, for one-round algorithms over skew-free data the optimal communication cost per server is m/p^1/tau* , where m is the size of the largest input relation, and tau* is the fractional vertex covering number of the query hypergraph. Second, for multi-round algorithms and unrestricted database instances, it was shown that any algorithm requires at least m/p^1/rho* communication cost per server, where rho* is the fractional edge covering number of the query hypergraph; but no matching algorithms were known for this case (except for two restricted queries: chains and cycles). In this paper we describe a multi-round algorithm that computes any query with load m/p^1/rho*(Q) per server, in the case when all input relations are binary. Thus, we prove this to be the optimal load for all queries over binary input relations. Our algorithm represents a non-trivial extension of previous algorithms for chains and cycles, and exploits some unique properties of graphs, which no longer hold for hyper-graphs.
Single-round multiway join algorithms first reshuffle data over many servers and then evaluate the query at hand in a parallel and communication-free way. A key question is whether a given distribution policy for the reshuffle is adequate for computing a given query, also referred to as parallel-correctness. This paper extends the study of the complexity of parallel-correctness and its constituents, parallel-soundness and parallel-completeness, to unions of conjunctive queries with and without negation. As a by-product it is shown that the containment problem for conjunctive queries with negation is coNEXPTIME-complete.
A dominant cost for query evaluation in modern massively distributed systems is the number of communication rounds. For this reason, there is a growing interest in single-round multiway join algorithms where data is first reshuffled over many servers and then evaluated in a parallel but communication-free way. The reshuffling itself is specified as a distribution policy. We introduce a correctness condition, called parallel-correctness, for the evaluation of queries w.r.t. a distribution policy. We study the complexity of parallel- correctness for conjunctive queries as well as transferability of parallel-correctness between queries. We also investigate the complexity of transferability for certain families of distribution policies, including, for instance, the Hypercube distribution.
We investigate the class D of queries that distribute over components. These are the queries that can be evaluated by taking the union of the query results over the connected components of the database instance. We show that it is undecidable whether a (positive) Datalog program distributes over components. Additionally, we show that connected Datalog with negation (the fragment of Datalog with negation where all rules are connected) provides an effective syntax for Datalog with negation programs that distribute over components under the stratified as well as under the well-founded semantics. As a corollary, we obtain a simple proof for one of the main results in previous work, namely, that the classic win-move query is in F2 (a particular class of coordination-free queries).
In a distributed context where data is dispersed over many computing nodes, monotone queries can be evaluated in an eventually consistent and coordination-free manner through a simple but naive broadcasting strategy which makes all data available on every computing node. In this paper, we investigate more economical broadcasting strategies for full conjunctive queries without self-joins that only transmit a part of the local data necessary to evaluate the query at hand. We consider oblivious broadcasting strategies which determine which local facts to broadcast independent of the data at other computing nodes. We introduce the notion of broadcast dependency set (BDS) as a sound and complete formalism to represent locally optimal oblivious broadcasting functions. We provide algorithms to construct a BDS for a given conjunctive query and study the complexity of various decision problems related to these algorithms.
The CALM-conjecture, first stated by Hellerstein and proved in its revised form by Ameloot et al. within the framework of relational transducer networks, asserts that a query has a coordination-free execution strategy if and only if the query is monotone. Zinn et al. extended the framework of relational transducer networks to allow for specific data distribution strategies and showed that the non-monotone win-move query is coordination-free for domain-guided data distributions. In this paper, we complete the story by equating increasingly larger classes of coordination-free computations with increasingly weaker forms of mono-tonicity and make Datalog variants explicit that capture each of these classes. One such fragment is based on stratified Datalog where rules are required to be connected with the exception of the last stratum. In addition, we characterize coordination-freeness as those computations that do not require knowledge about all other nodes in the network, and therefore, can not globally coordinate. The results in this paper can be interpreted as a more fine-grained answer to the CALM-conjecture.