GraphImpl (Spark 1.3.1 JavaDoc)

Object
- org.apache.spark.graphx.Graph<VD,ED>
- - org.apache.spark.graphx.impl.GraphImpl<VD,ED>

All Implemented Interfaces:

java.io.Serializable
```
public class GraphImpl<VD,ED>
extends Graph<VD,ED>
implements scala.Serializable
```
An implementation of Graph to support computation on graphs.
Graphs are represented using two RDDs: vertices, which contains vertex attributes and the routing information for shipping vertex attributes to edge partitions, and replicatedVertexView, which contains edges and the vertex attributes mentioned by each edge.

See Also:
Serialized Form

Method Summary

Methods
Modifier and Type	Method and Description
`<A> VertexRDD<A>`	`aggregateMessagesWithActiveSet(scala.Function1<EdgeContext<VD,ED,A>,scala.runtime.BoxedUnit> sendMsg, scala.Function2<A,A,A> mergeMsg, TripletFields tripletFields, scala.Option<scala.Tuple2<VertexRDD<?>,EdgeDirection>> activeSetOpt, scala.reflect.ClassTag<A> evidence$11)` Aggregates values from the neighboring edges and vertices of each vertex.
`static <VD,ED> GraphImpl<VD,ED>`	`apply(RDD<Edge<ED>> edges, VD defaultVertexAttr, StorageLevel edgeStorageLevel, StorageLevel vertexStorageLevel, scala.reflect.ClassTag<VD> evidence$14, scala.reflect.ClassTag<ED> evidence$15)` Create a graph from edges, setting referenced vertices to `defaultVertexAttr`.
`static <VD,ED> GraphImpl<VD,ED>`	`apply(RDD<scala.Tuple2<Object,VD>> vertices, RDD<Edge<ED>> edges, VD defaultVertexAttr, StorageLevel edgeStorageLevel, StorageLevel vertexStorageLevel, scala.reflect.ClassTag<VD> evidence$18, scala.reflect.ClassTag<ED> evidence$19)` Create a graph from vertices and edges, setting missing vertices to `defaultVertexAttr`.
`static <VD,ED> GraphImpl<VD,ED>`	`apply(VertexRDD<VD> vertices, EdgeRDD<ED> edges, scala.reflect.ClassTag<VD> evidence$20, scala.reflect.ClassTag<ED> evidence$21)` Create a graph from a VertexRDD and an EdgeRDD with arbitrary replicated vertices.
`Graph<VD,ED>`	`cache()` Caches the vertices and edges associated with this graph at the previously-specified target storage levels, which default to `MEMORY_ONLY`.
`void`	`checkpoint()` Mark this Graph for checkpointing.
`EdgeRDDImpl<ED,VD>`	`edges()` An RDD containing the edges and their associated attributes.
`static <VD,ED> GraphImpl<VD,ED>`	`fromEdgePartitions(RDD<scala.Tuple2<Object,EdgePartition<ED,VD>>> edgePartitions, VD defaultVertexAttr, StorageLevel edgeStorageLevel, StorageLevel vertexStorageLevel, scala.reflect.ClassTag<VD> evidence$16, scala.reflect.ClassTag<ED> evidence$17)` Create a graph from EdgePartitions, setting referenced vertices to `defaultVertexAttr`.
`static <VD,ED> GraphImpl<VD,ED>`	`fromExistingRDDs(VertexRDD<VD> vertices, EdgeRDD<ED> edges, scala.reflect.ClassTag<VD> evidence$22, scala.reflect.ClassTag<ED> evidence$23)` Create a graph from a VertexRDD and an EdgeRDD with the same replicated vertex type as the vertices.
`scala.collection.Seq<String>`	`getCheckpointFiles()` Gets the names of the files to which this Graph was checkpointed.
`Graph<VD,ED>`	`groupEdges(scala.Function2<ED,ED,ED> merge)` Merges multiple edges between two vertices into a single edge.
`boolean`	`isCheckpointed()` Return whether this Graph has been checkpointed or not.
`<ED2> Graph<VD,ED2>`	`mapEdges(scala.Function2<Object,scala.collection.Iterator<Edge<ED>>,scala.collection.Iterator<ED2>> f, scala.reflect.ClassTag<ED2> evidence$6)` Transforms each edge attribute using the map function, passing it a whole partition at a time.
`<A> VertexRDD<A>`	`mapReduceTriplets(scala.Function1<EdgeTriplet<VD,ED>,scala.collection.Iterator<scala.Tuple2<Object,A>>> mapFunc, scala.Function2<A,A,A> reduceFunc, scala.Option<scala.Tuple2<VertexRDD<?>,EdgeDirection>> activeSetOpt, scala.reflect.ClassTag<A> evidence$10)` Aggregates values from the neighboring edges and vertices of each vertex.
`<ED2> Graph<VD,ED2>`	`mapTriplets(scala.Function2<Object,scala.collection.Iterator<EdgeTriplet<VD,ED>>,scala.collection.Iterator<ED2>> f, TripletFields tripletFields, scala.reflect.ClassTag<ED2> evidence$7)` Transforms each edge attribute a partition at a time using the map function, passing it the adjacent vertex attributes as well.
`<VD2> Graph<VD2,ED>`	`mapVertices(scala.Function2<Object,VD,VD2> f, scala.reflect.ClassTag<VD2> evidence$5, scala.Predef.$eq$colon$eq<VD,VD2> eq)` Transforms each vertex attribute in the graph using the map function.
`<VD2,ED2> Graph<VD,ED>`	`mask(Graph<VD2,ED2> other, scala.reflect.ClassTag<VD2> evidence$8, scala.reflect.ClassTag<ED2> evidence$9)` Restricts the graph to only the vertices and edges that are also in `other`, but keeps the attributes from this graph.
`<U,VD2> Graph<VD2,ED>`	`outerJoinVertices(RDD<scala.Tuple2<Object,U>> other, scala.Function3<Object,VD,scala.Option<U>,VD2> updateF, scala.reflect.ClassTag<U> evidence$12, scala.reflect.ClassTag<VD2> evidence$13, scala.Predef.$eq$colon$eq<VD,VD2> eq)` Joins the vertices with entries in the `other` RDD and merges the results using `updateF`.
`Graph<VD,ED>`	`partitionBy(PartitionStrategy partitionStrategy)` Repartitions the edges in the graph according to `partitionStrategy`.
`Graph<VD,ED>`	`partitionBy(PartitionStrategy partitionStrategy, int numPartitions)` Repartitions the edges in the graph according to `partitionStrategy`.
`Graph<VD,ED>`	`persist(StorageLevel newLevel)` Caches the vertices and edges associated with this graph at the specified storage level, ignoring any target storage levels previously set.
`ReplicatedVertexView<VD,ED>`	`replicatedVertexView()`
`Graph<VD,ED>`	`reverse()` Reverses all edges in the graph.
`Graph<VD,ED>`	`subgraph(scala.Function1<EdgeTriplet<VD,ED>,Object> epred, scala.Function2<Object,VD,Object> vpred)` Restricts the graph to only the vertices and edges satisfying the predicates.
`RDD<EdgeTriplet<VD,ED>>`	`triplets()` Returns an RDD that brings edges together with their source and destination vertices.
`Graph<VD,ED>`	`unpersist(boolean blocking)` Uncaches both vertices and edges of this graph.
`Graph<VD,ED>`	`unpersistVertices(boolean blocking)` Uncaches only the vertices of this graph, leaving the edges alone.
`VertexRDD<VD>`	`vertices()` An RDD containing the vertices and their associated attributes.

Methods inherited from class org.apache.spark.graphx.Graph
aggregateMessages, fromEdges, fromEdgeTuples, graphToGraphOps, mapEdges, mapTriplets, mapTriplets, ops

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Method Detail
  - apply
```
public static <VD,ED> GraphImpl<VD,ED> apply(RDD<Edge<ED>> edges,
                             VD defaultVertexAttr,
                             StorageLevel edgeStorageLevel,
                             StorageLevel vertexStorageLevel,
                             scala.reflect.ClassTag<VD> evidence$14,
                             scala.reflect.ClassTag<ED> evidence$15)
```
    Create a graph from edges, setting referenced vertices to `defaultVertexAttr`.
  - fromEdgePartitions
```
public static <VD,ED> GraphImpl<VD,ED> fromEdgePartitions(RDD<scala.Tuple2<Object,EdgePartition<ED,VD>>> edgePartitions,
                                          VD defaultVertexAttr,
                                          StorageLevel edgeStorageLevel,
                                          StorageLevel vertexStorageLevel,
                                          scala.reflect.ClassTag<VD> evidence$16,
                                          scala.reflect.ClassTag<ED> evidence$17)
```
    Create a graph from EdgePartitions, setting referenced vertices to `defaultVertexAttr`.
  - apply
```
public static <VD,ED> GraphImpl<VD,ED> apply(RDD<scala.Tuple2<Object,VD>> vertices,
                             RDD<Edge<ED>> edges,
                             VD defaultVertexAttr,
                             StorageLevel edgeStorageLevel,
                             StorageLevel vertexStorageLevel,
                             scala.reflect.ClassTag<VD> evidence$18,
                             scala.reflect.ClassTag<ED> evidence$19)
```
    Create a graph from vertices and edges, setting missing vertices to `defaultVertexAttr`.
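    Both raw-RDD `apply` overloads give `defaultVertexAttr` to every vertex that an edge references but that has no supplied attribute. Below is a minimal plain-Java model of that rule, not Spark code. A `Map<Long, VD>` stands in for the vertex RDD and a `List` of a hypothetical `Edge` class stands in for `RDD<Edge<ED>>`. Partitioning, storage levels, and `ClassTag` evidence are ignored. The model assumes at most one supplied entry per vertex ID. Passing an empty vertex map models the edges-only overload.

```java
import java.util.*;

/** Plain-Java model (not Spark code) of how the apply overloads fill in missing vertices. */
class GraphConstructionModel {
    /** Hypothetical stand-in for org.apache.spark.graphx.Edge. */
    static final class Edge<ED> {
        final long srcId, dstId;
        final ED attr;

        Edge(long srcId, long dstId, ED attr) {
            this.srcId = srcId;
            this.dstId = dstId;
            this.attr = attr;
        }
    }

    /**
     * Supplied vertices keep their attributes (isolated ones included); every other
     * vertex ID referenced by an edge receives defaultVertexAttr.
     */
    static <VD, ED> Map<Long, VD> buildVertices(
            Map<Long, VD> vertices, List<Edge<ED>> edges, VD defaultVertexAttr) {
        Map<Long, VD> result = new TreeMap<>(vertices);
        for (Edge<ED> e : edges) {
            result.putIfAbsent(e.srcId, defaultVertexAttr);
            result.putIfAbsent(e.dstId, defaultVertexAttr);
        }
        return result;
    }
}
```

For example, supplying only vertices 1 ("alice") and 4 ("dave") with edges 1→2 and 2→3 yields four vertices, where 2 and 3 carry the default attribute.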
  - apply
```
public static <VD,ED> GraphImpl<VD,ED> apply(VertexRDD<VD> vertices,
                             EdgeRDD<ED> edges,
                             scala.reflect.ClassTag<VD> evidence$20,
                             scala.reflect.ClassTag<ED> evidence$21)
```
    Create a graph from a VertexRDD and an EdgeRDD with arbitrary replicated vertices. The VertexRDD must already be set up for efficient joins with the EdgeRDD by calling VertexRDD.withEdges or an appropriate VertexRDD constructor.
  - fromExistingRDDs
```
public static <VD,ED> GraphImpl<VD,ED> fromExistingRDDs(VertexRDD<VD> vertices,
                                        EdgeRDD<ED> edges,
                                        scala.reflect.ClassTag<VD> evidence$22,
                                        scala.reflect.ClassTag<ED> evidence$23)
```
    Create a graph from a VertexRDD and an EdgeRDD with the same replicated vertex type as the vertices. The VertexRDD must already be set up for efficient joins with the EdgeRDD by calling VertexRDD.withEdges or an appropriate VertexRDD constructor.
  - vertices
```
public VertexRDD<VD> vertices()
```
    Description copied from class: Graph
    
    An RDD containing the vertices and their associated attributes.
    
    Specified by:
    
    vertices in class Graph<VD,ED>
    
    Returns:
    an RDD containing the vertices in this graph
  - replicatedVertexView
```
public ReplicatedVertexView<VD,ED> replicatedVertexView()
```
  - edges
```
public EdgeRDDImpl<ED,VD> edges()
```
    Description copied from class: Graph
    
    An RDD containing the edges and their associated attributes. The entries in the RDD contain just the source id and target id along with the edge data.
    
    Specified by:
    
    edges in class Graph<VD,ED>
    
    Returns:
    an RDD containing the edges in this graph
    See Also:
    Edge for the edge type; Graph.triplets() to get an RDD that contains all the edges along with their vertex data.
  - triplets
```
public RDD<EdgeTriplet<VD,ED>> triplets()
```
    Returns an RDD that brings edges together with their source and destination vertices.
    
    Specified by:
    
    triplets in class Graph<VD,ED>
    
    Returns:
    an RDD containing edge triplets
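    Conceptually, `triplets()` joins every edge with the attributes of its two endpoints. The sketch below models that join in plain Java and is not Spark code. `Edge` and `Triplet` are hypothetical stand-ins for `Edge` and `EdgeTriplet`, and the `toString` format is the model's own choice.

```java
import java.util.*;

/** Plain-Java model (not Spark code) of triplets(): each edge joined with both endpoint attributes. */
class TripletsModel {
    /** Hypothetical stand-in for org.apache.spark.graphx.Edge. */
    static final class Edge<ED> {
        final long srcId, dstId;
        final ED attr;

        Edge(long srcId, long dstId, ED attr) {
            this.srcId = srcId;
            this.dstId = dstId;
            this.attr = attr;
        }
    }

    /** Hypothetical stand-in for EdgeTriplet: the edge plus its source and destination attributes. */
    static final class Triplet<VD, ED> {
        final long srcId, dstId;
        final VD srcAttr, dstAttr;
        final ED attr;

        Triplet(long srcId, VD srcAttr, long dstId, VD dstAttr, ED attr) {
            this.srcId = srcId;
            this.srcAttr = srcAttr;
            this.dstId = dstId;
            this.dstAttr = dstAttr;
            this.attr = attr;
        }

        @Override
        public String toString() {
            return "((" + srcId + "," + srcAttr + "),(" + dstId + "," + dstAttr + ")," + attr + ")";
        }
    }

    static <VD, ED> List<Triplet<VD, ED>> triplets(Map<Long, VD> vertices, List<Edge<ED>> edges) {
        List<Triplet<VD, ED>> out = new ArrayList<>(edges.size());
        for (Edge<ED> e : edges) {
            // Look up both endpoint attributes. Per the class description, GraphImpl keeps
            // these alongside the edges in replicatedVertexView rather than joining each time.
            out.add(new Triplet<>(e.srcId, vertices.get(e.srcId), e.dstId, vertices.get(e.dstId), e.attr));
        }
        return out;
    }
}
```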
  - persist
```
public Graph<VD,ED> persist(StorageLevel newLevel)
```
    Description copied from class: Graph
    
    Caches the vertices and edges associated with this graph at the specified storage level, ignoring any target storage levels previously set.
    
    Specified by:
    
    persist in class Graph<VD,ED>
    
    Parameters:
    newLevel - the level at which to cache the graph.
    
    Returns:
    A reference to this graph for convenience.
  - cache
```
public Graph<VD,ED> cache()
```
    Description copied from class: Graph
    
    Caches the vertices and edges associated with this graph at the previously-specified target storage levels, which default to MEMORY_ONLY. This is used to pin a graph in memory, enabling multiple queries to reuse the same construction process.
    
    Specified by:
    
    cache in class Graph<VD,ED>
  - checkpoint
```
public void checkpoint()
```
    Description copied from class: Graph
    
    Mark this Graph for checkpointing. It will be saved to a file inside the checkpoint directory set with SparkContext.setCheckpointDir(), and all references to its parent RDDs will be removed. It is strongly recommended that this Graph be persisted in memory; otherwise, saving it to a file will require recomputation.
    
    Specified by:
    
    checkpoint in class Graph<VD,ED>
  - isCheckpointed
```
public boolean isCheckpointed()
```
    Description copied from class: Graph
    
    Return whether this Graph has been checkpointed or not. This returns true iff both the vertices RDD and edges RDD have been checkpointed.
    
    Specified by:
    
    isCheckpointed in class Graph<VD,ED>
  - getCheckpointFiles
```
public scala.collection.Seq<String> getCheckpointFiles()
```
    Description copied from class: Graph
    
    Gets the names of the files to which this Graph was checkpointed. (The vertices RDD and edges RDD are checkpointed separately.)
    
    Specified by:
    
    getCheckpointFiles in class Graph<VD,ED>
  - unpersist
```
public Graph<VD,ED> unpersist(boolean blocking)
```
    Description copied from class: Graph
    
    Uncaches both vertices and edges of this graph. This is useful in iterative algorithms that build a new graph in each iteration.
    
    Specified by:
    
    unpersist in class Graph<VD,ED>
  - unpersistVertices
```
public Graph<VD,ED> unpersistVertices(boolean blocking)
```
    Description copied from class: Graph
    
    Uncaches only the vertices of this graph, leaving the edges alone. This is useful in iterative algorithms that modify the vertex attributes but reuse the edges. This method can be used to uncache the vertex attributes of previous iterations once they are no longer needed, improving GC performance.
    
    Specified by:
    
    unpersistVertices in class Graph<VD,ED>
  - partitionBy
```
public Graph<VD,ED> partitionBy(PartitionStrategy partitionStrategy)
```
    Description copied from class: Graph
    
    Repartitions the edges in the graph according to partitionStrategy.
    
    Specified by:
    
    partitionBy in class Graph<VD,ED>
    
    Parameters:
    partitionStrategy - the partitioning strategy to use when partitioning the edges in the graph.
  - partitionBy
```
public Graph<VD,ED> partitionBy(PartitionStrategy partitionStrategy,
                       int numPartitions)
```
    Description copied from class: Graph
    
    Repartitions the edges in the graph according to partitionStrategy.
    
    Specified by:
    
    partitionBy in class Graph<VD,ED>
    
    Parameters:
    partitionStrategy - the partitioning strategy to use when partitioning the edges in the graph.
    numPartitions - the number of edge partitions in the new graph.
  - reverse
```
public Graph<VD,ED> reverse()
```
    Description copied from class: Graph
    
    Reverses all edges in the graph. If this graph contains an edge from a to b, then the returned graph contains an edge from b to a.
    
    Specified by:
    
    reverse in class Graph<VD,ED>
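    Below is a plain-Java model of `reverse()`, not Spark code, where `Edge` is a hypothetical stand-in. Each edge keeps its attribute while its endpoints swap, and the vertex set is untouched, so reversing twice restores the original edges.

```java
import java.util.*;

/** Plain-Java model (not Spark code) of reverse(). */
class ReverseModel {
    /** Hypothetical stand-in for org.apache.spark.graphx.Edge. */
    static final class Edge<ED> {
        final long srcId, dstId;
        final ED attr;

        Edge(long srcId, long dstId, ED attr) {
            this.srcId = srcId;
            this.dstId = dstId;
            this.attr = attr;
        }

        @Override
        public String toString() {
            return "Edge(" + srcId + "," + dstId + "," + attr + ")";
        }
    }

    /** Swaps source and destination of every edge; attributes are kept as-is. */
    static <ED> List<Edge<ED>> reverse(List<Edge<ED>> edges) {
        List<Edge<ED>> out = new ArrayList<>(edges.size());
        for (Edge<ED> e : edges) {
            out.add(new Edge<>(e.dstId, e.srcId, e.attr));
        }
        return out;
    }
}
```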
  - mapVertices
```
public <VD2> Graph<VD2,ED> mapVertices(scala.Function2<Object,VD,VD2> f,
                              scala.reflect.ClassTag<VD2> evidence$5,
                              scala.Predef.$eq$colon$eq<VD,VD2> eq)
```
    Description copied from class: Graph
    
    Transforms each vertex attribute in the graph using the map function.
    
    Specified by:
    
    mapVertices in class Graph<VD,ED>
    
    Parameters:
    f - the function from a vertex ID and its current attribute to a new vertex value
  - mapEdges
```
public <ED2> Graph<VD,ED2> mapEdges(scala.Function2<Object,scala.collection.Iterator<Edge<ED>>,scala.collection.Iterator<ED2>> f,
                           scala.reflect.ClassTag<ED2> evidence$6)
```
    Description copied from class: Graph
    
    Transforms each edge attribute using the map function, passing it a whole partition at a time. The map function is given an iterator over edges within a logical partition as well as the partition's ID, and it should return a new iterator over the new values of each edge. The new iterator's elements must correspond one-to-one with the old iterator's elements. If adjacent vertex values are desired, use mapTriplets.
    
    Specified by:
    
    mapEdges in class Graph<VD,ED>
    
    Parameters:
    f - a function that takes a partition id and an iterator over all the edges in the partition, and must return an iterator over the new values for each edge in the order of the input iterator
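    The `mapEdges` contract pairs the i-th value returned by `f` with the i-th edge of the partition. The plain-Java model below is not Spark code. It represents each edge partition as a `List` of a hypothetical `Edge` class and zips `f`'s output back onto the edges. The model throws when `f` returns too few or too many values. That check is the model's own addition, not documented Spark behavior.

```java
import java.util.*;
import java.util.function.BiFunction;

/** Plain-Java model (not Spark code) of the partition-wise mapEdges contract. */
class MapEdgesModel {
    /** Hypothetical stand-in for org.apache.spark.graphx.Edge. */
    static final class Edge<ED> {
        final long srcId, dstId;
        final ED attr;

        Edge(long srcId, long dstId, ED attr) {
            this.srcId = srcId;
            this.dstId = dstId;
            this.attr = attr;
        }
    }

    /**
     * Calls f once per partition with (partition id, iterator over that partition's edges)
     * and pairs the returned values with the original edges, position by position.
     */
    static <ED, ED2> List<List<Edge<ED2>>> mapEdges(
            List<List<Edge<ED>>> partitions,
            BiFunction<Integer, Iterator<Edge<ED>>, Iterator<ED2>> f) {
        List<List<Edge<ED2>>> result = new ArrayList<>(partitions.size());
        for (int pid = 0; pid < partitions.size(); pid++) {
            List<Edge<ED>> part = partitions.get(pid);
            Iterator<ED2> newAttrs = f.apply(pid, part.iterator());
            List<Edge<ED2>> newPart = new ArrayList<>(part.size());
            for (Edge<ED> e : part) {
                // One-to-one contract: exactly one new value per input edge, in input order.
                if (!newAttrs.hasNext()) throw new IllegalStateException("f returned too few values");
                newPart.add(new Edge<>(e.srcId, e.dstId, newAttrs.next()));
            }
            if (newAttrs.hasNext()) throw new IllegalStateException("f returned too many values");
            result.add(newPart);
        }
        return result;
    }
}
```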
  - mapTriplets
```
public <ED2> Graph<VD,ED2> mapTriplets(scala.Function2<Object,scala.collection.Iterator<EdgeTriplet<VD,ED>>,scala.collection.Iterator<ED2>> f,
                              TripletFields tripletFields,
                              scala.reflect.ClassTag<ED2> evidence$7)
```
    Description copied from class: Graph
    
    Transforms each edge attribute a partition at a time using the map function, passing it the adjacent vertex attributes as well. The map function is given an iterator over edge triplets within a logical partition and should yield a new iterator over the new values of each edge in the order in which they are provided. If adjacent vertex values are not required, consider using mapEdges instead.
    
    Specified by:
    
    mapTriplets in class Graph<VD,ED>
    
    Parameters:
    f - the iterator transform
    tripletFields - which fields should be included in the edge triplet passed to the map function. If not all fields are needed, specifying this can improve performance.
  - subgraph
```
public Graph<VD,ED> subgraph(scala.Function1<EdgeTriplet<VD,ED>,Object> epred,
                    scala.Function2<Object,VD,Object> vpred)
```
    Description copied from class: Graph
    Restricts the graph to only the vertices and edges satisfying the predicates. The resulting subgraph satisfies
```
 V' = {v in V : vpred(v)}
 E' = {(u,v) in E : epred((u,v)) && vpred(u) && vpred(v)}
```
    Specified by:
    
    subgraph in class Graph<VD,ED>
    
    Parameters:
    epred - the edge predicate, which takes a triplet and evaluates to true if the edge is to remain in the subgraph. Note that only edges where both vertices satisfy the vertex predicate are considered.
    vpred - the vertex predicate, which takes a vertex object and evaluates to true if the vertex is to be included in the subgraph
    
    Returns:
    the subgraph containing only the vertices and edges that satisfy the predicates
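    The V' and E' definitions translate directly into two filtering passes. This plain-Java model is not Spark code. It uses hypothetical `Edge`, `Triplet`, and `Graph` classes and a `TreeMap` for the vertex set. `vpred` receives the vertex ID and attribute, mirroring the `Function2<Object,VD,Object>` signature.

```java
import java.util.*;
import java.util.function.BiPredicate;
import java.util.function.Predicate;

/** Plain-Java model (not Spark code) of the V' and E' definitions for subgraph. */
class SubgraphModel {
    static final class Edge<ED> {
        final long srcId, dstId;
        final ED attr;
        Edge(long srcId, long dstId, ED attr) { this.srcId = srcId; this.dstId = dstId; this.attr = attr; }
    }

    /** Hypothetical stand-in for EdgeTriplet. */
    static final class Triplet<VD, ED> {
        final long srcId, dstId;
        final VD srcAttr, dstAttr;
        final ED attr;
        Triplet(long srcId, VD srcAttr, long dstId, VD dstAttr, ED attr) {
            this.srcId = srcId; this.srcAttr = srcAttr;
            this.dstId = dstId; this.dstAttr = dstAttr; this.attr = attr;
        }
    }

    static final class Graph<VD, ED> {
        final Map<Long, VD> vertices;
        final List<Edge<ED>> edges;
        Graph(Map<Long, VD> vertices, List<Edge<ED>> edges) { this.vertices = vertices; this.edges = edges; }
    }

    static <VD, ED> Graph<VD, ED> subgraph(
            Graph<VD, ED> g, Predicate<Triplet<VD, ED>> epred, BiPredicate<Long, VD> vpred) {
        // V': vertices that satisfy vpred.
        Map<Long, VD> keptVertices = new TreeMap<>();
        for (Map.Entry<Long, VD> v : g.vertices.entrySet()) {
            if (vpred.test(v.getKey(), v.getValue())) keptVertices.put(v.getKey(), v.getValue());
        }
        // E': edges whose endpoints are both in V' and that also satisfy epred.
        List<Edge<ED>> keptEdges = new ArrayList<>();
        for (Edge<ED> e : g.edges) {
            if (!keptVertices.containsKey(e.srcId) || !keptVertices.containsKey(e.dstId)) continue;
            Triplet<VD, ED> t = new Triplet<>(e.srcId, g.vertices.get(e.srcId), e.dstId, g.vertices.get(e.dstId), e.attr);
            if (epred.test(t)) keptEdges.add(e);
        }
        return new Graph<>(keptVertices, keptEdges);
    }
}
```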
  - mask
```
public <VD2,ED2> Graph<VD,ED> mask(Graph<VD2,ED2> other,
                          scala.reflect.ClassTag<VD2> evidence$8,
                          scala.reflect.ClassTag<ED2> evidence$9)
```
    Description copied from class: Graph
    
    Restricts the graph to only the vertices and edges that are also in other, but keeps the attributes from this graph.
    
    Specified by:
    
    mask in class Graph<VD,ED>
    
    Parameters:
    other - the graph to project this graph onto
    
    Returns:
    a graph with vertices and edges that exist in both the current graph and other, with vertex and edge data from the current graph
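    Below is a plain-Java model of `mask`, not Spark code, with hypothetical `Edge` and `Graph` classes. As a simplifying assumption, an edge "exists in other" when `other` has an edge with the same (source, destination) pair. Attributes always come from the receiving graph, so `other`'s attribute types play no role.

```java
import java.util.*;

/** Plain-Java model (not Spark code) of mask. */
class MaskModel {
    static final class Edge<ED> {
        final long srcId, dstId;
        final ED attr;
        Edge(long srcId, long dstId, ED attr) { this.srcId = srcId; this.dstId = dstId; this.attr = attr; }
    }

    static final class Graph<VD, ED> {
        final Map<Long, VD> vertices;
        final List<Edge<ED>> edges;
        Graph(Map<Long, VD> vertices, List<Edge<ED>> edges) { this.vertices = vertices; this.edges = edges; }
    }

    static <VD, ED, VD2, ED2> Graph<VD, ED> mask(Graph<VD, ED> g, Graph<VD2, ED2> other) {
        // Keep vertices whose IDs also appear in other, with this graph's attributes.
        Map<Long, VD> vertices = new TreeMap<>();
        for (Map.Entry<Long, VD> v : g.vertices.entrySet()) {
            if (other.vertices.containsKey(v.getKey())) vertices.put(v.getKey(), v.getValue());
        }
        // Model assumption: edges match on their (srcId, dstId) pair.
        Set<List<Long>> otherPairs = new HashSet<>();
        for (Edge<ED2> e : other.edges) otherPairs.add(List.of(e.srcId, e.dstId));
        List<Edge<ED>> edges = new ArrayList<>();
        for (Edge<ED> e : g.edges) {
            if (otherPairs.contains(List.of(e.srcId, e.dstId))) edges.add(e);
        }
        return new Graph<>(vertices, edges);
    }
}
```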
  - groupEdges
```
public Graph<VD,ED> groupEdges(scala.Function2<ED,ED,ED> merge)
```
    Description copied from class: Graph
    
    Merges multiple edges between two vertices into a single edge. For correct results, the graph must have been partitioned using partitionBy.
    
    Specified by:
    
    groupEdges in class Graph<VD,ED>
    
    Parameters:
    merge - the user-supplied commutative and associative function to merge edge attributes for duplicate edges.
    
    Returns:
    The resulting graph with a single edge for each (source, dest) vertex pair.
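    Below is a plain-Java model of the merge step, not Spark code, where `Edge` is a hypothetical stand-in. It collapses duplicates within a single list that plays the role of one edge partition. A per-partition pass like this cannot merge duplicates that sit in different partitions, which is presumably why the graph must first be repartitioned with `partitionBy`.

```java
import java.util.*;
import java.util.function.BinaryOperator;

/** Plain-Java model (not Spark code) of groupEdges applied to one edge partition. */
class GroupEdgesModel {
    static final class Edge<ED> {
        final long srcId, dstId;
        final ED attr;

        Edge(long srcId, long dstId, ED attr) {
            this.srcId = srcId;
            this.dstId = dstId;
            this.attr = attr;
        }

        @Override
        public String toString() {
            return "Edge(" + srcId + "," + dstId + "," + attr + ")";
        }
    }

    /** Collapses edges with the same (srcId, dstId) into one, combining attributes with merge. */
    static <ED> List<Edge<ED>> groupEdges(List<Edge<ED>> partition, BinaryOperator<ED> merge) {
        // LinkedHashMap keeps the first-seen order of each (srcId, dstId) pair.
        Map<List<Long>, ED> merged = new LinkedHashMap<>();
        for (Edge<ED> e : partition) {
            merged.merge(List.of(e.srcId, e.dstId), e.attr, merge);
        }
        List<Edge<ED>> out = new ArrayList<>(merged.size());
        for (Map.Entry<List<Long>, ED> en : merged.entrySet()) {
            out.add(new Edge<>(en.getKey().get(0), en.getKey().get(1), en.getValue()));
        }
        return out;
    }
}
```

Because `merge` may see duplicates in any order, it should be commutative and associative, as the parameter description requires; `Integer::sum` qualifies.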
  - mapReduceTriplets
```
public <A> VertexRDD<A> mapReduceTriplets(scala.Function1<EdgeTriplet<VD,ED>,scala.collection.Iterator<scala.Tuple2<Object,A>>> mapFunc,
                                 scala.Function2<A,A,A> reduceFunc,
                                 scala.Option<scala.Tuple2<VertexRDD<?>,EdgeDirection>> activeSetOpt,
                                 scala.reflect.ClassTag<A> evidence$10)
```
    Description copied from class: Graph
    
    Aggregates values from the neighboring edges and vertices of each vertex. The user supplied mapFunc function is invoked on each edge of the graph, generating 0 or more "messages" to be "sent" to either vertex in the edge. The reduceFunc is then used to combine the output of the map phase destined to each vertex.
    This method has been deprecated since 1.2.0 because of SPARK-3936; use aggregateMessages instead.
    
    Specified by:
    
    mapReduceTriplets in class Graph<VD,ED>
    
    Parameters:
    mapFunc - the user defined map function which returns 0 or more messages to neighboring vertices
    reduceFunc - the user defined reduce function which should be commutative and associative and is used to combine the output of the map phase
    activeSetOpt - an efficient way to run the aggregation on a subset of the edges if desired. This is done by specifying a set of "active" vertices and an edge direction. The mapFunc function will then run only on edges that touch active vertices in the specified direction. If the direction is In, mapFunc will only be run on edges with destination in the active set. If the direction is Out, mapFunc will only be run on edges originating from vertices in the active set. If the direction is Either, mapFunc will be run on edges with *either* vertex in the active set. If the direction is Both, mapFunc will be run on edges with *both* vertices in the active set. The active set must have the same index as the graph's vertices.
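    The map and reduce phases can be modeled in plain Java as below. This is not Spark code. `Edge` and `Triplet` are hypothetical stand-ins, and messages are `Map.Entry<Long, A>` pairs of (target vertex ID, message). `activeSetOpt` is left out. New code should prefer `aggregateMessages`, as the deprecation note says.

```java
import java.util.*;
import java.util.function.BinaryOperator;
import java.util.function.Function;

/** Plain-Java model (not Spark code) of the map and reduce phases of mapReduceTriplets. */
class MapReduceTripletsModel {
    static final class Edge<ED> {
        final long srcId, dstId;
        final ED attr;
        Edge(long srcId, long dstId, ED attr) { this.srcId = srcId; this.dstId = dstId; this.attr = attr; }
    }

    /** Hypothetical stand-in for EdgeTriplet. */
    static final class Triplet<VD, ED> {
        final long srcId, dstId;
        final VD srcAttr, dstAttr;
        final ED attr;
        Triplet(long srcId, VD srcAttr, long dstId, VD dstAttr, ED attr) {
            this.srcId = srcId; this.srcAttr = srcAttr;
            this.dstId = dstId; this.dstAttr = dstAttr; this.attr = attr;
        }
    }

    static <VD, ED, A> Map<Long, A> mapReduceTriplets(
            Map<Long, VD> vertices, List<Edge<ED>> edges,
            Function<Triplet<VD, ED>, List<Map.Entry<Long, A>>> mapFunc,
            BinaryOperator<A> reduceFunc) {
        Map<Long, A> result = new TreeMap<>();
        for (Edge<ED> e : edges) {
            Triplet<VD, ED> t = new Triplet<>(e.srcId, vertices.get(e.srcId), e.dstId, vertices.get(e.dstId), e.attr);
            // Map phase: zero or more (target vertex ID, message) pairs per edge.
            for (Map.Entry<Long, A> msg : mapFunc.apply(t)) {
                // Reduce phase: combine messages addressed to the same vertex.
                result.merge(msg.getKey(), msg.getValue(), reduceFunc);
            }
        }
        // Vertices that received no message do not appear in the result.
        return result;
    }
}
```

For example, a `mapFunc` of `t -> List.of(Map.entry(t.dstId, 1))` with `Integer::sum` as `reduceFunc` computes in-degrees. Vertices with no incoming edges are absent from the result.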
  - aggregateMessagesWithActiveSet
```
public <A> VertexRDD<A> aggregateMessagesWithActiveSet(scala.Function1<EdgeContext<VD,ED,A>,scala.runtime.BoxedUnit> sendMsg,
                                              scala.Function2<A,A,A> mergeMsg,
                                              TripletFields tripletFields,
                                              scala.Option<scala.Tuple2<VertexRDD<?>,EdgeDirection>> activeSetOpt,
                                              scala.reflect.ClassTag<A> evidence$11)
```
    Description copied from class: Graph
    
    Aggregates values from the neighboring edges and vertices of each vertex. The user-supplied sendMsg function is invoked on each edge of the graph, generating 0 or more messages to be sent to either vertex in the edge. The mergeMsg function is then used to combine all messages destined to the same vertex.
    This variant can take an active set to restrict the computation and is intended for internal use only.
    
    Specified by:
    
    aggregateMessagesWithActiveSet in class Graph<VD,ED>
    
    Parameters:
    sendMsg - runs on each edge, sending messages to neighboring vertices using the EdgeContext.
    mergeMsg - used to combine messages from sendMsg destined to the same vertex. This combiner should be commutative and associative.
    tripletFields - which fields should be included in the EdgeContext passed to the sendMsg function. If not all fields are needed, specifying this can improve performance.
    activeSetOpt - an efficient way to run the aggregation on a subset of the edges if desired. This is done by specifying a set of "active" vertices and an edge direction. The sendMsg function will then run only on edges that touch active vertices in the specified direction. If the direction is In, sendMsg will only be run on edges with destination in the active set. If the direction is Out, sendMsg will only be run on edges originating from vertices in the active set. If the direction is Either, sendMsg will be run on edges with *either* vertex in the active set. If the direction is Both, sendMsg will be run on edges with *both* vertices in the active set. The active set must have the same index as the graph's vertices.
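    The four direction cases can be modeled in plain Java. This is not Spark code. `Direction` mirrors `EdgeDirection`. `Context` is a hypothetical, much-reduced stand-in for `EdgeContext` that keeps only `sendToSrc`/`sendToDst`. The active set is a plain `Set<Long>` wrapped in an `Optional` to mirror `activeSetOpt`. `tripletFields` is ignored.

```java
import java.util.*;
import java.util.function.BinaryOperator;
import java.util.function.Consumer;

/** Plain-Java model (not Spark code) of aggregateMessagesWithActiveSet's edge filtering. */
class ActiveSetModel {
    /** Mirrors the four EdgeDirection cases. */
    enum Direction { IN, OUT, EITHER, BOTH }

    static final class Edge<ED> {
        final long srcId, dstId;
        final ED attr;
        Edge(long srcId, long dstId, ED attr) { this.srcId = srcId; this.dstId = dstId; this.attr = attr; }
    }

    /** Hypothetical stand-in for EdgeContext: exposes one triplet and routes messages into an inbox. */
    static final class Context<VD, ED, A> {
        final long srcId, dstId;
        final VD srcAttr, dstAttr;
        final ED attr;
        private final Map<Long, A> inbox;
        private final BinaryOperator<A> mergeMsg;

        Context(Edge<ED> e, VD srcAttr, VD dstAttr, Map<Long, A> inbox, BinaryOperator<A> mergeMsg) {
            this.srcId = e.srcId; this.dstId = e.dstId; this.attr = e.attr;
            this.srcAttr = srcAttr; this.dstAttr = dstAttr;
            this.inbox = inbox; this.mergeMsg = mergeMsg;
        }

        void sendToSrc(A msg) { inbox.merge(srcId, msg, mergeMsg); }
        void sendToDst(A msg) { inbox.merge(dstId, msg, mergeMsg); }
    }

    /** Decides whether sendMsg runs on the edge srcId -> dstId. */
    static boolean runsOn(long srcId, long dstId, Set<Long> active, Direction dir) {
        switch (dir) {
            case IN:     return active.contains(dstId);
            case OUT:    return active.contains(srcId);
            case EITHER: return active.contains(srcId) || active.contains(dstId);
            case BOTH:   return active.contains(srcId) && active.contains(dstId);
            default:     throw new IllegalArgumentException("unknown direction " + dir);
        }
    }

    /** An empty Optional means "no active set": sendMsg runs on every edge. */
    static <VD, ED, A> Map<Long, A> aggregate(
            Map<Long, VD> vertices, List<Edge<ED>> edges,
            Consumer<Context<VD, ED, A>> sendMsg, BinaryOperator<A> mergeMsg,
            Optional<Set<Long>> activeSet, Direction dir) {
        Map<Long, A> inbox = new TreeMap<>();
        for (Edge<ED> e : edges) {
            if (activeSet.isPresent() && !runsOn(e.srcId, e.dstId, activeSet.get(), dir)) continue;
            sendMsg.accept(new Context<>(e, vertices.get(e.srcId), vertices.get(e.dstId), inbox, mergeMsg));
        }
        return inbox;
    }
}
```

Take edges 1→2, 2→3, 3→1, an active set of {1}, and a `sendMsg` that calls `sendToDst`. `Direction.OUT` runs `sendMsg` only on 1→2 and delivers to vertex 2. `Direction.IN` runs it only on 3→1 and delivers to vertex 1.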
  - outerJoinVertices
```
public <U,VD2> Graph<VD2,ED> outerJoinVertices(RDD<scala.Tuple2<Object,U>> other,
                                      scala.Function3<Object,VD,scala.Option<U>,VD2> updateF,
                                      scala.reflect.ClassTag<U> evidence$12,
                                      scala.reflect.ClassTag<VD2> evidence$13,
                                      scala.Predef.$eq$colon$eq<VD,VD2> eq)
```
    Description copied from class: Graph
    
    Joins the vertices with entries in the other RDD and merges the results using updateF. The input table should contain at most one entry for each vertex. If other provides no entry for a particular vertex in the graph, updateF receives None.
    
    Specified by:
    
    outerJoinVertices in class Graph<VD,ED>
    
    Parameters:
    other - the table to join with the vertices in the graph. The table should contain at most one entry for each vertex.
    updateF - the function used to compute the new vertex values. It is invoked for all vertices, even those that do not have a corresponding entry in other.
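    Below is a plain-Java model of the outer join, not Spark code. `java.util.Optional` plays the role of `scala.Option`. `UpdateF` is a hypothetical functional interface that mirrors `updateF`'s (vertex ID, attribute, optional value) signature.

```java
import java.util.*;

/** Plain-Java model (not Spark code) of outerJoinVertices. */
class OuterJoinVerticesModel {
    /** Mirrors updateF: (vertex id, current attribute, optional joined value) -> new attribute. */
    @FunctionalInterface
    interface UpdateF<VD, U, VD2> {
        VD2 apply(long vid, VD attr, Optional<U> joined);
    }

    static <VD, U, VD2> Map<Long, VD2> outerJoinVertices(
            Map<Long, VD> vertices, Map<Long, U> other, UpdateF<VD, U, VD2> updateF) {
        Map<Long, VD2> result = new TreeMap<>();
        for (Map.Entry<Long, VD> v : vertices.entrySet()) {
            // updateF runs for every vertex; Optional.empty() plays the role of Scala's None.
            // Entries in other whose IDs are not graph vertices are never looked up.
            Optional<U> joined = Optional.ofNullable(other.get(v.getKey()));
            result.put(v.getKey(), updateF.apply(v.getKey(), v.getValue(), joined));
        }
        return result;
    }
}
```

For example, joining vertices {1: "a", 2: "b"} with `other` = {1: 5, 9: 7} using `(vid, attr, deg) -> attr + deg.orElse(0)` gives {1: "a5", 2: "b0"}. The entry for ID 9 is ignored because 9 is not a vertex.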
