Graph Generators

Gelly provides a collection of scalable graph generators. Each generator is

  • parallelizable, in order to create large datasets
  • scale-free, generating the same graph regardless of parallelism
  • thrifty, using as few operators as possible

Graph generators are configured using the builder pattern. The parallelism of generator operators can be set explicitly by calling setParallelism(parallelism). Lowering the parallelism will reduce the allocation of memory and network buffers.

Graph-specific configuration must be called first, then configuration common to all generators, and lastly the call to generate(). The following example configures a grid graph with two dimensions, configures the parallelism, and generates the graph.

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

boolean wrapEndpoints = false;

int parallelism = 4;

Graph<LongValue,NullValue,NullValue> graph = new GridGraph(env)
    .addDimension(2, wrapEndpoints)
    .addDimension(4, wrapEndpoints)
    .setParallelism(parallelism)
    .generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.GridGraph

val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment

wrapEndpoints = false

val parallelism = 4

val graph = new GridGraph(env.getJavaEnv).addDimension(2, wrapEndpoints).addDimension(4, wrapEndpoints).setParallelism(parallelism).generate()

Complete Graph

An undirected graph connecting every distinct pair of vertices.

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

long vertexCount = 5;

Graph<LongValue,NullValue,NullValue> graph = new CompleteGraph(env, vertexCount)
    .generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.CompleteGraph

val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment

val vertexCount = 5

val graph = new CompleteGraph(env.getJavaEnv, vertexCount).generate()
0 1 2 3 4

Cycle Graph

An undirected graph where all edges form a single cycle.

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

long vertexCount = 5;

Graph<LongValue,NullValue,NullValue> graph = new CycleGraph(env, vertexCount)
    .generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.CycleGraph

val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment

val vertexCount = 5

val graph = new CycleGraph(env.getJavaEnv, vertexCount).generate()
0 1 2 3 4

Empty Graph

The graph containing no edges.

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

long vertexCount = 5;

Graph<LongValue,NullValue,NullValue> graph = new EmptyGraph(env, vertexCount)
    .generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.EmptyGraph

val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment

val vertexCount = 5

val graph = new EmptyGraph(env.getJavaEnv, vertexCount).generate()
0 1 2 3 4

Grid Graph

An undirected graph connecting vertices in a regular tiling in one or more dimensions. Each dimension is configured separately. When the dimension size is at least three the endpoints are optionally connected by setting wrapEndpoints. Changing the following example to addDimension(4, true) would connect 0 to 3 and 4 to 7.

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

boolean wrapEndpoints = false;

Graph<LongValue,NullValue,NullValue> graph = new GridGraph(env)
    .addDimension(2, wrapEndpoints)
    .addDimension(4, wrapEndpoints)
    .generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.GridGraph

val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment

val wrapEndpoints = false

val graph = new GridGraph(env.getJavaEnv).addDimension(2, wrapEndpoints).addDimension(4, wrapEndpoints).generate()
0 1 2 3 4 5 6 7

Hypercube Graph

An undirected graph where edges form an n-dimensional hypercube. Each vertex in a hypercube connects to one other vertex in each dimension.

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

long dimensions = 3;

Graph<LongValue,NullValue,NullValue> graph = new HypercubeGraph(env, dimensions)
    .generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.HypercubeGraph

val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment

val dimensions = 3

val graph = new HypercubeGraph(env.getJavaEnv, dimensions).generate()
0 1 2 3 4 5 6 7

Path Graph

An undirected Graph where all edges form a single path.

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

long vertexCount = 5

Graph<LongValue,NullValue,NullValue> graph = new PathGraph(env, vertexCount)
    .generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.PathGraph

val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment

val vertexCount = 5

val graph = new PathGraph(env.getJavaEnv, vertexCount).generate()
0 1 2 3 4

RMat Graph

A directed or undirected power-law graph generated using the Recursive Matrix (R-Mat) model.

RMat is a stochastic generator configured with a source of randomness implementing the RandomGenerableFactory interface. Provided implementations are JDKRandomGeneratorFactory and MersenneTwisterFactory. These generate an initial sequence of random values which are then used as seeds for generating the edges.

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

RandomGenerableFactory<JDKRandomGenerator> rnd = new JDKRandomGeneratorFactory();

int vertexCount = 1 << scale;
int edgeCount = edgeFactor * vertexCount;

Graph<LongValue,NullValue,NullValue> graph = new RMatGraph<>(env, rnd, vertexCount, edgeCount)
    .generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.RMatGraph

val env = ExecutionEnvironment.getExecutionEnvironment

val vertexCount = 1 << scale
val edgeCount = edgeFactor * vertexCount

val graph = new RMatGraph(env.getJavaEnv, rnd, vertexCount, edgeCount).generate()

The default RMat contants can be overridden as shown in the following example. The contants define the interdependence of bits from each generated edge’s source and target labels. The RMat noise can be enabled and progressively perturbs the contants while generating each edge.

The RMat generator can be configured to produce a simple graph by removing self-loops and duplicate edges. Symmetrization is performed either by a “clip-and-flip” throwing away the half matrix above the diagonal or a full “flip” preserving and mirroring all edges.

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

RandomGenerableFactory<JDKRandomGenerator> rnd = new JDKRandomGeneratorFactory();

int vertexCount = 1 << scale;
int edgeCount = edgeFactor * vertexCount;

boolean clipAndFlip = false;

Graph<LongValue,NullValue,NullValue> graph = new RMatGraph<>(env, rnd, vertexCount, edgeCount)
    .setConstants(0.57f, 0.19f, 0.19f)
    .setNoise(true, 0.10f)
    .generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.RMatGraph

val env = ExecutionEnvironment.getExecutionEnvironment

val vertexCount = 1 << scale
val edgeCount = edgeFactor * vertexCount

clipAndFlip = false

val graph = new RMatGraph(env.getJavaEnv, rnd, vertexCount, edgeCount).setConstants(0.57f, 0.19f, 0.19f).setNoise(true, 0.10f).generate()

Singleton Edge Graph

An undirected graph containing isolated two-paths.

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

long vertexPairCount = 4

// note: configured with the number of vertex pairs
Graph<LongValue,NullValue,NullValue> graph = new SingletonEdgeGraph(env, vertexPairCount)
    .generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.SingletonEdgeGraph

val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment

val vertexPairCount = 4

// note: configured with the number of vertex pairs
val graph = new SingletonEdgeGraph(env.getJavaEnv, vertexPairCount).generate()
0 1 2 3 4 5 6 7

Star Graph

An undirected graph containing a single central vertex connected to all other leaf vertices.

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

long vertexCount = 6;

Graph<LongValue,NullValue,NullValue> graph = new StarGraph(env, vertexCount)
    .generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.StarGraph

val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment

val vertexCount = 6

val graph = new StarGraph(env.getJavaEnv, vertexCount).generate()
0 1 2 3 4 5

Back to top