Distributed Graph Mining: A Study of Performance Advantages in Distributed Data Mining Paradigms when Processing Graphs Using Pagerank on a Single Node Cluster

Distributed data mining is a relatively new and steadily growing area of computer science, driven by the need to gather and process distributed data using clusters.

This report presents the properties of graph-structured data and the paradigms suited to processing it efficiently, based on a theoretical study applied in practical tests on a single-node cluster.

The results show how performance varies when processing graph data with different open-source frameworks and different numbers of input shards.

One conclusion of the study is that, on a single machine, distributed data mining paradigms developed specifically for graph data offer no real performance advantage.
Source: KTH
Authors: Abdlwafa, Alan | Edman, Henrik
