Welcome to G-thinker!

Mining from a big graph those subgraphs that satisfy certain requirement is compute-intensive. A natural solution to tackle the high complexity is to utilizes many CPU cores for parallel execution. Unfortunately, existing Big Data frameworks are dominantly data-intensive where communication caused by data movement is the bottleneck, and CPU cores are underutilized. Solving compute-intensive graph mining problems using these frameworks often results in even worse performance.

For example, Chu and Cheng reported that the state-of-the-art MapReduce algorithm for triangle counting uses 1,636 machines and takes 5.33 minutes on a small graph, on which their single-threaded algorithm uses less than 0.5 minute. McSherry et al. reported that existing "think like a vertex" parallel graph processing systems (designed for data-intensive iterative computation) is comparable and sometimes beaten by a single-threaded solution.

Our G-thinker system fundamentally changes the data-intensive design of existing Big Data frameworks. It adopts a task-based execution engine that is able to keep CPUs in a cluster fully utilized even with a budget limit on memory consumption. It also provides a subgraph-centric API that is natural for writing graph mining algorithms. G-thinker is orders of magnitude faster than existing systems and scales to graphs orders of magnitude larger given the same hardware environment.

A G-thinker Demo

Updates

We now have an improved version of G-thinker that does better in load balancing (see our PVLDB 2020 paper below for the details), which can be accessed from this GitHub link.

Related Publications

  • Guimu Guo, Da Yan: Systems and Algorithms for Massively Parallel Graph Mining (Tutorial). IEEE BigData 2020. [PPT], [VIDEO1], [VIDEO2]
  • Guimu Guo, Da Yan, M. Tamer Özsu, Zhe Jiang, Jalal Khalil: Scalable Mining of Maximal Quasi-Cliques: An Algorithm-System Codesign Approach. Proc. VLDB Endow. 14(4): 573-585 (2020) [PDF]
  • Da Yan, Guimu Guo, Md Mashiur Rahman Chowdhury, M. Tamer Özsu, Wei-Shinn Ku, John C. S. Lui: G-thinker: A Distributed Framework for Mining Subgraphs in a Big Graph. ICDE 2020: 1369-1380 [PDF], [VIDEO]
  • Da Yan, Guimu Guo, Md Mashiur Rahman Chowdhury, M. Tamer Özsu, John C. S. Lui, Weida Tan: T-thinker: a task-centric distributed framework for compute-intensive divide-and-conquer algorithms. PPoPP 2019: 411-412 [PDF]