IRIS Institutional Research Information System

This paper presents ETSCH, a novel paradigm for processing large graphs. ETSCH departs from the vertex-based approach of BSP frameworks like PREGEL in two ways: first, the units of computation are not the vertices, but rather a collection of subgraphs, obtained by partitioning the input graph, second, the subgraphs are obtained through an edge-partitioning algorithm, in which edges, rather than vertices, are subdivided into disjoint subsets Global computations over the graph are then easily expressed using classical centralized algorithms executed on each of the partitions, with the only additional burden of specifying simple reconciliation procedures when vertices are replicated in multiple computing nodes. The ETSCH paradigm has been implemented both on top of existing frameworks like HADOOP, SPARK, as a stand-alone service based on AKKA, a toolkit for building distributed message-driven applications. When considering problems like single-source shortest path, PageRank, our experiments show that solutions based on ETSCH/HADOOP, ETSCH/SPARK already outperform the standard solutions to the same problems in HADOOP, SPARK, respectively. But it is our AKKA implementation that really shines: the execution time on graphs with millions of edges falls down from from thousands of seconds (ETSCH/HADOOP) to tens of seconds (ETSCH/SPARK) to seconds (ETSCH/AKKA), while easily scaling to graphs with billions of edges. ETSCH/AKKA is also faster than other partition-centric frameworks like BLOGEL, GPS.

ETSCH: Partition-centric Graph Processing

Alessio Guerrieri;Alberto Montresor;Simone Centellegher

2016-01-01

Abstract

This paper presents ETSCH, a novel paradigm for processing large graphs. ETSCH departs from the vertex-based approach of BSP frameworks like PREGEL in two ways: first, the units of computation are not the vertices, but rather a collection of subgraphs, obtained by partitioning the input graph, second, the subgraphs are obtained through an edge-partitioning algorithm, in which edges, rather than vertices, are subdivided into disjoint subsets Global computations over the graph are then easily expressed using classical centralized algorithms executed on each of the partitions, with the only additional burden of specifying simple reconciliation procedures when vertices are replicated in multiple computing nodes. The ETSCH paradigm has been implemented both on top of existing frameworks like HADOOP, SPARK, as a stand-alone service based on AKKA, a toolkit for building distributed message-driven applications. When considering problems like single-source shortest path, PageRank, our experiments show that solutions based on ETSCH/HADOOP, ETSCH/SPARK already outperform the standard solutions to the same problems in HADOOP, SPARK, respectively. But it is our AKKA implementation that really shines: the execution time on graphs with millions of edges falls down from from thousands of seconds (ETSCH/HADOOP) to tens of seconds (ETSCH/SPARK) to seconds (ETSCH/AKKA), while easily scaling to graphs with billions of edges. ETSCH/AKKA is also faster than other partition-centric frameworks like BLOGEL, GPS.

Scheda breve

Scheda completa

Scheda completa (DC)

Anno

2016

Appare nelle tipologie:

4.1 Contributo in Atti di convegno

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11582/347888

Citazioni

ND

social impact