This paper presents ETSCH, a novel paradigm for processing large graphs. ETSCH departs from the vertex-based approach of BSP frameworks like PREGEL in two ways: first, the units of computation are not the vertices, but rather a collection of subgraphs, obtained by partitioning the input graph, second, the subgraphs are obtained through an edge-partitioning algorithm, in which edges, rather than vertices, are subdivided into disjoint subsets Global computations over the graph are then easily expressed using classical centralized algorithms executed on each of the partitions, with the only additional burden of specifying simple reconciliation procedures when vertices are replicated in multiple computing nodes. The ETSCH paradigm has been implemented both on top of existing frameworks like HADOOP, SPARK, as a stand-alone service based on AKKA, a toolkit for building distributed message-driven applications. When considering problems like single-source shortest path, PageRank, our experiments show that solutions based on ETSCH/HADOOP, ETSCH/SPARK already outperform the standard solutions to the same problems in HADOOP, SPARK, respectively. But it is our AKKA implementation that really shines: the execution time on graphs with millions of edges falls down from from thousands of seconds (ETSCH/HADOOP) to tens of seconds (ETSCH/SPARK) to seconds (ETSCH/AKKA), while easily scaling to graphs with billions of edges. ETSCH/AKKA is also faster than other partition-centric frameworks like BLOGEL, GPS.
ETSCH: Partition-centric Graph Processing
Simone Centellegher
2016-01-01
Abstract
This paper presents ETSCH, a novel paradigm for processing large graphs. ETSCH departs from the vertex-based approach of BSP frameworks like PREGEL in two ways: first, the units of computation are not the vertices, but rather a collection of subgraphs, obtained by partitioning the input graph, second, the subgraphs are obtained through an edge-partitioning algorithm, in which edges, rather than vertices, are subdivided into disjoint subsets Global computations over the graph are then easily expressed using classical centralized algorithms executed on each of the partitions, with the only additional burden of specifying simple reconciliation procedures when vertices are replicated in multiple computing nodes. The ETSCH paradigm has been implemented both on top of existing frameworks like HADOOP, SPARK, as a stand-alone service based on AKKA, a toolkit for building distributed message-driven applications. When considering problems like single-source shortest path, PageRank, our experiments show that solutions based on ETSCH/HADOOP, ETSCH/SPARK already outperform the standard solutions to the same problems in HADOOP, SPARK, respectively. But it is our AKKA implementation that really shines: the execution time on graphs with millions of edges falls down from from thousands of seconds (ETSCH/HADOOP) to tens of seconds (ETSCH/SPARK) to seconds (ETSCH/AKKA), while easily scaling to graphs with billions of edges. ETSCH/AKKA is also faster than other partition-centric frameworks like BLOGEL, GPS.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.