VertexRDD (Spark 2.1.3 JavaDoc)

Object
- org.apache.spark.rdd.RDD<scala.Tuple2<Object,VD>>
- - org.apache.spark.graphx.VertexRDD<VD>

All Implemented Interfaces: java.io.Serializable

Direct Known Subclasses: VertexRDDImpl

public abstract class VertexRDD<VD>
extends RDD<scala.Tuple2<Object,VD>>

Extends RDD[(VertexId, VD)] by ensuring that there is only one entry for each vertex and by pre-indexing the entries for fast, efficient joins. Two VertexRDDs with the same index can be joined efficiently. All operations except reindex preserve the index. To construct a VertexRDD, use the VertexRDD object.

Additionally, stores routing information to enable joining the vertex attributes with an EdgeRDD.

See Also:

Serialized Form

Example:

Construct a VertexRDD from a plain RDD:


 // Construct an initial vertex set
 val someData: RDD[(VertexId, SomeType)] = loadData(someFile)
 val vset = VertexRDD(someData)
 // If there were redundant values in someData we would merge them with a reduceFunc first
 // (the apply overloads that accept a merge function also require an EdgeRDD)
 val vset2 = VertexRDD(someData.reduceByKey(reduceFunc))
 // Finally we can use the VertexRDD to index another dataset
 val otherData: RDD[(VertexId, OtherType)] = loadData(otherFile)
 val vset3 = vset2.innerJoin(otherData) { (vid, a, b) => b }
 // Now we can construct very fast joins between the two sets, which share an index
 val vset4: VertexRDD[(SomeType, Option[OtherType])] =
   vset2.leftJoin(vset3) { (vid, a, bOpt) => (a, bOpt) }
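
The example above relies on placeholders (`loadData`, `SomeType`, `OtherType`). A minimal runnable sketch of the same idea follows, assuming `spark-core` and `spark-graphx` are on the classpath and a local master is acceptable. The object name, the helper `joinNamesWithAges`, and the sample data are all hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

object VertexRDDJoinSketch {
  // Hypothetical helper: joins made-up name and age data by vertex id.
  def joinNamesWithAges(
      sc: SparkContext): (Map[VertexId, (String, Int)], Map[VertexId, (String, Option[Int])]) = {
    val names: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
    val ages: RDD[(VertexId, Int)] = sc.parallelize(Seq((1L, 30), (3L, 41)))

    // Index the names once; later joins reuse this index.
    val vset: VertexRDD[String] = VertexRDD(names)

    // innerJoin keeps only the vertices present on both sides.
    val inner = vset.innerJoin(ages) { (vid, name, age) => (name, age) }
    // leftJoin keeps every vertex of vset; a missing match arrives as None.
    val left = vset.leftJoin(ages) { (vid, name, ageOpt) => (name, ageOpt) }

    (inner.collect().toMap, left.collect().toMap)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("vertexrdd-join-sketch")
    val sc = new SparkContext(conf)
    try {
      val (inner, left) = joinNamesWithAges(sc)
      println(inner) // entries for ids 1 and 3 only
      println(left)  // entries for ids 1, 2 and 3; id 2 carries None
    } finally sc.stop()
  }
}
```

Both joins return a `VertexRDD`, so their results can be joined again without rebuilding the index.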

Constructor Summary

Constructors
Constructor and Description

VertexRDD(SparkContext sc, scala.collection.Seq<Dependency<?>> deps)

Method Summary

Methods
Modifier and Type	Method and Description
`static RDD<T>`	`$plus$plus(RDD<T> other)`
`static <U> U`	`aggregate(U zeroValue, scala.Function2<U,T,U> seqOp, scala.Function2<U,U,U> combOp, scala.reflect.ClassTag<U> evidence$30)`
`abstract <VD2> VertexRDD<VD2>`	`aggregateUsingIndex(RDD<scala.Tuple2<Object,VD2>> messages, scala.Function2<VD2,VD2,VD2> reduceFunc, scala.reflect.ClassTag<VD2> evidence$12)` Aggregates vertices in `messages` that have the same ids using `reduceFunc`, returning a VertexRDD co-indexed with `this`.
`static <VD> VertexRDD<VD>`	`apply(RDD<scala.Tuple2<Object,VD>> vertices, scala.reflect.ClassTag<VD> evidence$14)` Constructs a standalone `VertexRDD` (one that is not set up for efficient joins with an `EdgeRDD`) from an RDD of vertex-attribute pairs.
`static <VD> VertexRDD<VD>`	`apply(RDD<scala.Tuple2<Object,VD>> vertices, EdgeRDD<?> edges, VD defaultVal, scala.reflect.ClassTag<VD> evidence$15)` Constructs a `VertexRDD` from an RDD of vertex-attribute pairs.
`static <VD> VertexRDD<VD>`	`apply(RDD<scala.Tuple2<Object,VD>> vertices, EdgeRDD<?> edges, VD defaultVal, scala.Function2<VD,VD,VD> mergeFunc, scala.reflect.ClassTag<VD> evidence$16)` Constructs a `VertexRDD` from an RDD of vertex-attribute pairs.
`static RDD<T>`	`cache()`
`static <U> RDD<scala.Tuple2<T,U>>`	`cartesian(RDD<U> other, scala.reflect.ClassTag<U> evidence$5)`
`static void`	`checkpoint()`
`static RDD<T>`	`coalesce(int numPartitions, boolean shuffle, scala.Option<PartitionCoalescer> partitionCoalescer, scala.math.Ordering<T> ord)`
`static boolean`	`coalesce$default$2()`
`static scala.Option<PartitionCoalescer>`	`coalesce$default$3()`
`static scala.math.Ordering<T>`	`coalesce$default$4(int numPartitions, boolean shuffle, scala.Option<PartitionCoalescer> partitionCoalescer)`
`static Object`	`collect()`
`static <U> RDD<U>`	`collect(scala.PartialFunction<T,U> f, scala.reflect.ClassTag<U> evidence$29)`
`scala.collection.Iterator<scala.Tuple2<Object,VD>>`	`compute(Partition part, TaskContext context)` Provides the `RDD[(VertexId, VD)]` equivalent output.
`static SparkContext`	`context()`
`static long`	`count()`
`static PartialResult<BoundedDouble>`	`countApprox(long timeout, double confidence)`
`static double`	`countApprox$default$2()`
`static long`	`countApproxDistinct(double relativeSD)`
`static long`	`countApproxDistinct(int p, int sp)`
`static double`	`countApproxDistinct$default$1()`
`static scala.collection.Map<T,Object>`	`countByValue(scala.math.Ordering<T> ord)`
`static scala.math.Ordering<T>`	`countByValue$default$1()`
`static PartialResult<scala.collection.Map<T,BoundedDouble>>`	`countByValueApprox(long timeout, double confidence, scala.math.Ordering<T> ord)`
`static double`	`countByValueApprox$default$2()`
`static scala.math.Ordering<T>`	`countByValueApprox$default$3(long timeout, double confidence)`
`static scala.collection.Seq<Dependency<?>>`	`dependencies()`
`abstract VertexRDD<VD>`	`diff(RDD<scala.Tuple2<Object,VD>> other)` For each vertex present in both `this` and `other`, `diff` returns only those vertices with differing values; for values that are different, keeps the values from `other`.
`abstract VertexRDD<VD>`	`diff(VertexRDD<VD> other)` For each vertex present in both `this` and `other`, `diff` returns only those vertices with differing values; for values that are different, keeps the values from `other`.
`static RDD<T>`	`distinct()`
`static RDD<T>`	`distinct(int numPartitions, scala.math.Ordering<T> ord)`
`static scala.math.Ordering<T>`	`distinct$default$2(int numPartitions)`
`VertexRDD<VD>`	`filter(scala.Function1<scala.Tuple2<Object,VD>,Object> pred)` Restricts the vertex set to the set of vertices satisfying the given predicate.
`static T`	`first()`
`static <U> RDD<U>`	`flatMap(scala.Function1<T,scala.collection.TraversableOnce<U>> f, scala.reflect.ClassTag<U> evidence$4)`
`static T`	`fold(T zeroValue, scala.Function2<T,T,T> op)`
`static void`	`foreach(scala.Function1<T,scala.runtime.BoxedUnit> f)`
`static void`	`foreachPartition(scala.Function1<scala.collection.Iterator<T>,scala.runtime.BoxedUnit> f)`
`static <VD> VertexRDD<VD>`	`fromEdges(EdgeRDD<?> edges, int numPartitions, VD defaultVal, scala.reflect.ClassTag<VD> evidence$17)` Constructs a `VertexRDD` containing all vertices referred to in `edges`.
`static scala.Option<String>`	`getCheckpointFile()`
`static int`	`getNumPartitions()`
`static StorageLevel`	`getStorageLevel()`
`static RDD<Object>`	`glom()`
`static <K> RDD<scala.Tuple2<K,scala.collection.Iterable<T>>>`	`groupBy(scala.Function1<T,K> f, scala.reflect.ClassTag<K> kt)`
`static <K> RDD<scala.Tuple2<K,scala.collection.Iterable<T>>>`	`groupBy(scala.Function1<T,K> f, int numPartitions, scala.reflect.ClassTag<K> kt)`
`static <K> RDD<scala.Tuple2<K,scala.collection.Iterable<T>>>`	`groupBy(scala.Function1<T,K> f, Partitioner p, scala.reflect.ClassTag<K> kt, scala.math.Ordering<K> ord)`
`static <K> scala.runtime.Null$`	`groupBy$default$4(scala.Function1<T,K> f, Partitioner p)`
`static int`	`id()`
`abstract <U,VD2> VertexRDD<VD2>`	`innerJoin(RDD<scala.Tuple2<Object,U>> other, scala.Function3<Object,VD,U,VD2> f, scala.reflect.ClassTag<U> evidence$10, scala.reflect.ClassTag<VD2> evidence$11)` Inner joins this VertexRDD with an RDD containing vertex attribute pairs.
`abstract <U,VD2> VertexRDD<VD2>`	`innerZipJoin(VertexRDD<U> other, scala.Function3<Object,VD,U,VD2> f, scala.reflect.ClassTag<U> evidence$8, scala.reflect.ClassTag<VD2> evidence$9)` Efficiently inner joins this VertexRDD with another VertexRDD sharing the same index.
`static RDD<T>`	`intersection(RDD<T> other)`
`static RDD<T>`	`intersection(RDD<T> other, int numPartitions)`
`static RDD<T>`	`intersection(RDD<T> other, Partitioner partitioner, scala.math.Ordering<T> ord)`
`static scala.math.Ordering<T>`	`intersection$default$3(RDD<T> other, Partitioner partitioner)`
`static boolean`	`isCheckpointed()`
`static boolean`	`isEmpty()`
`static scala.collection.Iterator<T>`	`iterator(Partition split, TaskContext context)`
`static <K> RDD<scala.Tuple2<K,T>>`	`keyBy(scala.Function1<T,K> f)`
`abstract <VD2,VD3> VertexRDD<VD3>`	`leftJoin(RDD<scala.Tuple2<Object,VD2>> other, scala.Function3<Object,VD,scala.Option<VD2>,VD3> f, scala.reflect.ClassTag<VD2> evidence$6, scala.reflect.ClassTag<VD3> evidence$7)` Left joins this VertexRDD with an RDD containing vertex attribute pairs.
`abstract <VD2,VD3> VertexRDD<VD3>`	`leftZipJoin(VertexRDD<VD2> other, scala.Function3<Object,VD,scala.Option<VD2>,VD3> f, scala.reflect.ClassTag<VD2> evidence$4, scala.reflect.ClassTag<VD3> evidence$5)` Left joins this RDD with another VertexRDD with the same index.
`static RDD<T>`	`localCheckpoint()`
`static <U> RDD<U>`	`map(scala.Function1<T,U> f, scala.reflect.ClassTag<U> evidence$3)`
`static <U> RDD<U>`	`mapPartitions(scala.Function1<scala.collection.Iterator<T>,scala.collection.Iterator<U>> f, boolean preservesPartitioning, scala.reflect.ClassTag<U> evidence$6)`
`static <U> boolean`	`mapPartitions$default$2()`
`static <U> boolean`	`mapPartitionsInternal$default$2()`
`static <U> RDD<U>`	`mapPartitionsWithIndex(scala.Function2<Object,scala.collection.Iterator<T>,scala.collection.Iterator<U>> f, boolean preservesPartitioning, scala.reflect.ClassTag<U> evidence$9)`
`static <U> boolean`	`mapPartitionsWithIndex$default$2()`
`static <U> boolean`	`mapPartitionsWithIndexInternal$default$2()`
`abstract <VD2> VertexRDD<VD2>`	`mapValues(scala.Function1<VD,VD2> f, scala.reflect.ClassTag<VD2> evidence$2)` Maps each vertex attribute, preserving the index.
`abstract <VD2> VertexRDD<VD2>`	`mapValues(scala.Function2<Object,VD,VD2> f, scala.reflect.ClassTag<VD2> evidence$3)` Maps each vertex attribute, additionally supplying the vertex ID.
`static T`	`max(scala.math.Ordering<T> ord)`
`static T`	`min(scala.math.Ordering<T> ord)`
`abstract VertexRDD<VD>`	`minus(RDD<scala.Tuple2<Object,VD>> other)` Set difference on vertex ids: returns only the vertices of `this` whose VertexId does not appear in `other`.
`abstract VertexRDD<VD>`	`minus(VertexRDD<VD> other)` Set difference on vertex ids: returns only the vertices of `this` whose VertexId does not appear in `other`.
`static void`	`name_$eq(String x$1)`
`static String`	`name()`
`static scala.Option<Partitioner>`	`partitioner()`
`static Partition[]`	`partitions()`
`static RDD<T>`	`persist()`
`static RDD<T>`	`persist(StorageLevel newLevel)`
`static RDD<String>`	`pipe(scala.collection.Seq<String> command, scala.collection.Map<String,String> env, scala.Function1<scala.Function1<String,scala.runtime.BoxedUnit>,scala.runtime.BoxedUnit> printPipeContext, scala.Function2<T,scala.Function1<String,scala.runtime.BoxedUnit>,scala.runtime.BoxedUnit> printRDDElement, boolean separateWorkingDir, int bufferSize, String encoding)`
`static RDD<String>`	`pipe(String command)`
`static RDD<String>`	`pipe(String command, scala.collection.Map<String,String> env)`
`static scala.collection.Map<String,String>`	`pipe$default$2()`
`static scala.Function1<scala.Function1<String,scala.runtime.BoxedUnit>,scala.runtime.BoxedUnit>`	`pipe$default$3()`
`static scala.Function2<T,scala.Function1<String,scala.runtime.BoxedUnit>,scala.runtime.BoxedUnit>`	`pipe$default$4()`
`static boolean`	`pipe$default$5()`
`static int`	`pipe$default$6()`
`static String`	`pipe$default$7()`
`static scala.collection.Seq<String>`	`preferredLocations(Partition split)`
`static RDD<T>[]`	`randomSplit(double[] weights, long seed)`
`static long`	`randomSplit$default$2()`
`static T`	`reduce(scala.Function2<T,T,T> f)`
`abstract VertexRDD<VD>`	`reindex()` Construct a new VertexRDD that is indexed by only the visible vertices.
`static RDD<T>`	`repartition(int numPartitions, scala.math.Ordering<T> ord)`
`static scala.math.Ordering<T>`	`repartition$default$2(int numPartitions)`
`abstract VertexRDD<VD>`	`reverseRoutingTables()` Returns a new `VertexRDD` reflecting a reversal of all edge directions in the corresponding `EdgeRDD`.
`static RDD<T>`	`sample(boolean withReplacement, double fraction, long seed)`
`static long`	`sample$default$3()`
`static void`	`saveAsObjectFile(String path)`
`static void`	`saveAsTextFile(String path)`
`static void`	`saveAsTextFile(String path, Class<? extends org.apache.hadoop.io.compress.CompressionCodec> codec)`
`static RDD<T>`	`setName(String _name)`
`static <K> RDD<T>`	`sortBy(scala.Function1<T,K> f, boolean ascending, int numPartitions, scala.math.Ordering<K> ord, scala.reflect.ClassTag<K> ctag)`
`static <K> boolean`	`sortBy$default$2()`
`static <K> int`	`sortBy$default$3()`
`static SparkContext`	`sparkContext()`
`static RDD<T>`	`subtract(RDD<T> other)`
`static RDD<T>`	`subtract(RDD<T> other, int numPartitions)`
`static RDD<T>`	`subtract(RDD<T> other, Partitioner p, scala.math.Ordering<T> ord)`
`static scala.math.Ordering<T>`	`subtract$default$3(RDD<T> other, Partitioner p)`
`static Object`	`take(int num)`
`static Object`	`takeOrdered(int num, scala.math.Ordering<T> ord)`
`static Object`	`takeSample(boolean withReplacement, int num, long seed)`
`static long`	`takeSample$default$3()`
`static String`	`toDebugString()`
`static JavaRDD<T>`	`toJavaRDD()`
`static scala.collection.Iterator<T>`	`toLocalIterator()`
`static Object`	`top(int num, scala.math.Ordering<T> ord)`
`static String`	`toString()`
`static <U> U`	`treeAggregate(U zeroValue, scala.Function2<U,T,U> seqOp, scala.Function2<U,U,U> combOp, int depth, scala.reflect.ClassTag<U> evidence$31)`
`static <U> int`	`treeAggregate$default$4(U zeroValue)`
`static T`	`treeReduce(scala.Function2<T,T,T> f, int depth)`
`static int`	`treeReduce$default$2()`
`static RDD<T>`	`union(RDD<T> other)`
`static RDD<T>`	`unpersist(boolean blocking)`
`static boolean`	`unpersist$default$1()`
`abstract VertexRDD<VD>`	`withEdges(EdgeRDD<?> edges)` Prepares this VertexRDD for efficient joins with the given EdgeRDD.
`static <U> RDD<scala.Tuple2<T,U>>`	`zip(RDD<U> other, scala.reflect.ClassTag<U> evidence$10)`
`static <B,V> RDD<V>`	`zipPartitions(RDD<B> rdd2, boolean preservesPartitioning, scala.Function2<scala.collection.Iterator<T>,scala.collection.Iterator<B>,scala.collection.Iterator<V>> f, scala.reflect.ClassTag<B> evidence$11, scala.reflect.ClassTag<V> evidence$12)`
`static <B,V> RDD<V>`	`zipPartitions(RDD<B> rdd2, scala.Function2<scala.collection.Iterator<T>,scala.collection.Iterator<B>,scala.collection.Iterator<V>> f, scala.reflect.ClassTag<B> evidence$13, scala.reflect.ClassTag<V> evidence$14)`
`static <B,C,V> RDD<V>`	`zipPartitions(RDD<B> rdd2, RDD<C> rdd3, boolean preservesPartitioning, scala.Function3<scala.collection.Iterator<T>,scala.collection.Iterator<B>,scala.collection.Iterator<C>,scala.collection.Iterator<V>> f, scala.reflect.ClassTag<B> evidence$15, scala.reflect.ClassTag<C> evidence$16, scala.reflect.ClassTag<V> evidence$17)`
`static <B,C,V> RDD<V>`	`zipPartitions(RDD<B> rdd2, RDD<C> rdd3, scala.Function3<scala.collection.Iterator<T>,scala.collection.Iterator<B>,scala.collection.Iterator<C>,scala.collection.Iterator<V>> f, scala.reflect.ClassTag<B> evidence$18, scala.reflect.ClassTag<C> evidence$19, scala.reflect.ClassTag<V> evidence$20)`
`static <B,C,D,V> RDD<V>`	`zipPartitions(RDD<B> rdd2, RDD<C> rdd3, RDD<D> rdd4, boolean preservesPartitioning, scala.Function4<scala.collection.Iterator<T>,scala.collection.Iterator<B>,scala.collection.Iterator<C>,scala.collection.Iterator<D>,scala.collection.Iterator<V>> f, scala.reflect.ClassTag<B> evidence$21, scala.reflect.ClassTag<C> evidence$22, scala.reflect.ClassTag<D> evidence$23, scala.reflect.ClassTag<V> evidence$24)`
`static <B,C,D,V> RDD<V>`	`zipPartitions(RDD<B> rdd2, RDD<C> rdd3, RDD<D> rdd4, scala.Function4<scala.collection.Iterator<T>,scala.collection.Iterator<B>,scala.collection.Iterator<C>,scala.collection.Iterator<D>,scala.collection.Iterator<V>> f, scala.reflect.ClassTag<B> evidence$25, scala.reflect.ClassTag<C> evidence$26, scala.reflect.ClassTag<D> evidence$27, scala.reflect.ClassTag<V> evidence$28)`
`static RDD<scala.Tuple2<T,Object>>`	`zipWithIndex()`
`static RDD<scala.Tuple2<T,Object>>`	`zipWithUniqueId()`
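
Several of the index-preserving operations summarized above (`aggregateUsingIndex`, `mapValues`, `diff`, `minus`) can be illustrated with a small sketch. It assumes `spark-core` and `spark-graphx` on the classpath and a local master. The object name, the `run` helper, and the sample data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

object VertexRDDIndexOpsSketch {
  // Hypothetical helper returning (aggregated sums, changed vertices, remaining vertices).
  def run(sc: SparkContext): (Map[VertexId, Int], Map[VertexId, String], Map[VertexId, String]) = {
    val base: VertexRDD[String] =
      VertexRDD(sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c"))))

    // aggregateUsingIndex: messages with the same id are reduced; ids that are
    // not in base's index (9 here) are dropped, and ids without messages are absent.
    val messages: RDD[(VertexId, Int)] =
      sc.parallelize(Seq((1L, 1), (1L, 2), (3L, 5), (9L, 7)))
    val sums = base.aggregateUsingIndex(messages, (a: Int, b: Int) => a + b)

    // mapValues preserves the index, so base and updated are co-indexed.
    val updated = base.mapValues((id: VertexId, v: String) => if (id == 2L) "B" else v)
    // diff keeps only vertices whose value differs, taking the value from `updated`.
    val changed = base.diff(updated)

    // minus removes every vertex whose id appears in the other RDD.
    val toRemove: RDD[(VertexId, String)] = sc.parallelize(Seq((2L, "x"), (4L, "y")))
    val remaining = base.minus(toRemove)

    (sums.collect().toMap, changed.collect().toMap, remaining.collect().toMap)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("vertexrdd-index-ops-sketch")
    val sc = new SparkContext(conf)
    try {
      val (sums, changed, remaining) = run(sc)
      println(sums)      // ids 1 and 3
      println(changed)   // id 2 only
      println(remaining) // ids 1 and 3
    } finally sc.stop()
  }
}
```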

Methods inherited from class org.apache.spark.rdd.RDD
aggregate, cache, cartesian, checkpoint, coalesce, collect, collect, context, count, countApprox, countApproxDistinct, countApproxDistinct, countByValue, countByValueApprox, dependencies, distinct, distinct, doubleRDDToDoubleRDDFunctions, first, flatMap, fold, foreach, foreachPartition, getCheckpointFile, getNumPartitions, getStorageLevel, glom, groupBy, groupBy, groupBy, id, intersection, intersection, intersection, isCheckpointed, isEmpty, iterator, keyBy, localCheckpoint, map, mapPartitions, mapPartitionsWithIndex, max, min, name, numericRDDToDoubleRDDFunctions, partitioner, partitions, persist, persist, pipe, pipe, pipe, preferredLocations, randomSplit, rddToAsyncRDDActions, rddToOrderedRDDFunctions, rddToPairRDDFunctions, rddToSequenceFileRDDFunctions, reduce, repartition, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, setName, sortBy, sparkContext, subtract, subtract, subtract, take, takeOrdered, takeSample, toDebugString, toJavaRDD, toLocalIterator, top, toString, treeAggregate, treeReduce, union, unpersist, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipWithIndex, zipWithUniqueId

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait

Constructor Detail

VertexRDD

public VertexRDD(SparkContext sc,
         scala.collection.Seq<Dependency<?>> deps)

Method Detail

apply
```
public static <VD> VertexRDD<VD> apply(RDD<scala.Tuple2<Object,VD>> vertices,
                       scala.reflect.ClassTag<VD> evidence$14)
```
Constructs a standalone VertexRDD (one that is not set up for efficient joins with an EdgeRDD) from an RDD of vertex-attribute pairs. Duplicate entries are removed arbitrarily.

Parameters:
vertices - the collection of vertex-attribute pairs
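evidence$14 - the ClassTag for VD; in Scala it is supplied implicitly by the compiler

Returns:
a standalone VertexRDD with one entry per distinct vertex id in vertices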
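
As an illustration of the deduplication described above, here is a minimal sketch, assuming `spark-core` and `spark-graphx` on the classpath and a local master. The object name, the `build` helper, and the sample pairs are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

object StandaloneVertexRDDSketch {
  // Builds a standalone VertexRDD from made-up pairs in which id 1 appears twice.
  def build(sc: SparkContext): VertexRDD[String] = {
    val pairs: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "a"), (1L, "b"), (2L, "c")))
    VertexRDD(pairs)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("standalone-vertexrdd-sketch")
    val sc = new SparkContext(conf)
    try {
      val vertices = build(sc)
      // One entry per vertex id; which attribute survives for id 1 is unspecified.
      vertices.collect().foreach(println)
    } finally sc.stop()
  }
}
```

Because the surviving attribute is arbitrary, code that cares about the result should merge duplicates explicitly before calling `apply`.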

apply
```
public static <VD> VertexRDD<VD> apply(RDD<scala.Tuple2<Object,VD>> vertices,
                       EdgeRDD<?> edges,
                       VD defaultVal,
                       scala.reflect.ClassTag<VD> evidence$15)
```
Constructs a VertexRDD from an RDD of vertex-attribute pairs. Duplicate vertex entries are removed arbitrarily. The resulting VertexRDD will be joinable with edges, and any missing vertices referred to by edges will be created with the attribute defaultVal.

Parameters:
vertices - the collection of vertex-attribute pairs
edges - the EdgeRDD that these vertices may be joined with
defaultVal - the vertex attribute to use when creating missing vertices
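evidence$15 - the ClassTag for VD; in Scala it is supplied implicitly by the compiler

Returns:
a VertexRDD joinable with edges, containing the given vertices plus any vertices referred to only by edges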
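
The following sketch shows how vertices that appear only in the edges receive `defaultVal`. It assumes `spark-core` and `spark-graphx` on the classpath and a local master, and it builds the `EdgeRDD` with `EdgeRDD.fromEdges`. The object name, the `build` helper, and the sample data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, EdgeRDD, VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

object VertexRDDWithEdgesSketch {
  // Only vertex 1 has an explicit attribute; 2 and 3 are known from the edges alone.
  def build(sc: SparkContext): VertexRDD[String] = {
    val edges: EdgeRDD[Int] =
      EdgeRDD.fromEdges[Int, String](sc.parallelize(Seq(Edge(1L, 2L, 0), Edge(2L, 3L, 0))))
    val known: RDD[(VertexId, String)] = sc.parallelize(Seq((1L, "a")))
    // Third argument is defaultVal, used for vertices missing from `known`.
    VertexRDD(known, edges, "unknown")
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("vertexrdd-with-edges-sketch")
    val sc = new SparkContext(conf)
    try {
      // Vertex 1 keeps "a"; vertices 2 and 3 are created with "unknown".
      build(sc).collect().foreach(println)
    } finally sc.stop()
  }
}
```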

apply
```
public static <VD> VertexRDD<VD> apply(RDD<scala.Tuple2<Object,VD>> vertices,
                       EdgeRDD<?> edges,
                       VD defaultVal,
                       scala.Function2<VD,VD,VD> mergeFunc,
                       scala.reflect.ClassTag<VD> evidence$16)
```
Constructs a VertexRDD from an RDD of vertex-attribute pairs. Duplicate vertex entries are merged using mergeFunc. The resulting VertexRDD will be joinable with edges, and any missing vertices referred to by edges will be created with the attribute defaultVal.

Parameters:
vertices - the collection of vertex-attribute pairs
edges - the EdgeRDD that these vertices may be joined with
defaultVal - the vertex attribute to use when creating missing vertices
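mergeFunc - a commutative, associative function used to merge the attributes of duplicate vertices
evidence$16 - the ClassTag for VD; in Scala it is supplied implicitly by the compiler

Returns:
a VertexRDD joinable with edges, with duplicate vertices merged and missing vertices filled in with defaultVal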
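
A short sketch of merging duplicates with `mergeFunc` follows, assuming `spark-core` and `spark-graphx` on the classpath and a local master. The object name, the `build` helper, and the sample data are hypothetical; addition is used here only because it is commutative and associative.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, EdgeRDD, VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

object VertexRDDMergeSketch {
  // Vertex 1 appears twice and is merged by addition; vertex 3 exists only in the edges.
  def build(sc: SparkContext): VertexRDD[Int] = {
    val edges: EdgeRDD[Int] =
      EdgeRDD.fromEdges[Int, Int](sc.parallelize(Seq(Edge(2L, 3L, 0))))
    val counts: RDD[(VertexId, Int)] =
      sc.parallelize(Seq((1L, 1), (1L, 2), (2L, 10)))
    // Arguments: vertices, edges, defaultVal, mergeFunc.
    VertexRDD(counts, edges, 0, (a: Int, b: Int) => a + b)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("vertexrdd-merge-sketch")
    val sc = new SparkContext(conf)
    try {
      // Vertex 1 holds 1 + 2, vertex 2 holds 10, vertex 3 holds the default 0.
      build(sc).collect().foreach(println)
    } finally sc.stop()
  }
}
```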

fromEdges
```
public static <VD> VertexRDD<VD> fromEdges(EdgeRDD<?> edges,
                           int numPartitions,
                           VD defaultVal,
                           scala.reflect.ClassTag<VD> evidence$17)
```
Constructs a VertexRDD containing all vertices referred to in edges. The vertices will be created with the attribute defaultVal. The resulting VertexRDD will be joinable with edges.

Parameters:
edges - the EdgeRDD referring to the vertices to create
numPartitions - the desired number of partitions for the resulting VertexRDD
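defaultVal - the vertex attribute assigned to every created vertex
evidence$17 - the ClassTag for VD; in Scala it is supplied implicitly by the compiler

Returns:
a VertexRDD joinable with edges, containing every vertex referred to in edges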
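
As a sketch of `fromEdges`, the snippet below derives the vertex set purely from an edge list. It assumes `spark-core` and `spark-graphx` on the classpath and a local master. The object name, the `build` helper, and the sample edges are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, EdgeRDD, VertexRDD}

object VertexRDDFromEdgesSketch {
  // Every endpoint of an edge becomes a vertex carrying the default attribute.
  def build(sc: SparkContext): VertexRDD[Int] = {
    val edges: EdgeRDD[Int] =
      EdgeRDD.fromEdges[Int, Int](sc.parallelize(Seq(Edge(1L, 2L, 0), Edge(2L, 3L, 0))))
    // Arguments: edges, numPartitions, defaultVal.
    VertexRDD.fromEdges(edges, 2, 0)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("vertexrdd-from-edges-sketch")
    val sc = new SparkContext(conf)
    try {
      // Vertices 1, 2 and 3, each with attribute 0.
      build(sc).collect().foreach(println)
    } finally sc.stop()
  }
}
```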

partitioner

public static scala.Option<Partitioner> partitioner()

sparkContext

public static SparkContext sparkContext()

id
```
public static int id()
```

name
```
public static String name()
```

name_$eq

public static void name_$eq(String x$1)

setName

public static RDD<T> setName(String _name)

persist

public static RDD<T> persist(StorageLevel newLevel)

persist
```
public static RDD<T> persist()
```

cache
```
public static RDD<T> cache()
```

unpersist

public static RDD<T> unpersist(boolean blocking)

getStorageLevel

public static StorageLevel getStorageLevel()

dependencies

public static final scala.collection.Seq<Dependency<?>> dependencies()

partitions

public static final Partition[] partitions()

getNumPartitions

public static final int getNumPartitions()

preferredLocations

public static final scala.collection.Seq<String> preferredLocations(Partition split)

iterator

public static final scala.collection.Iterator<T> iterator(Partition split,
                                    TaskContext context)

map

public static <U> RDD<U> map(scala.Function1<T,U> f,
             scala.reflect.ClassTag<U> evidence$3)

flatMap

public static <U> RDD<U> flatMap(scala.Function1<T,scala.collection.TraversableOnce<U>> f,
                 scala.reflect.ClassTag<U> evidence$4)

distinct

public static RDD<T> distinct(int numPartitions,
              scala.math.Ordering<T> ord)

distinct
```
public static RDD<T> distinct()
```

repartition

public static RDD<T> repartition(int numPartitions,
                 scala.math.Ordering<T> ord)

coalesce

public static RDD<T> coalesce(int numPartitions,
              boolean shuffle,
              scala.Option<PartitionCoalescer> partitionCoalescer,
              scala.math.Ordering<T> ord)

sample

public static RDD<T> sample(boolean withReplacement,
            double fraction,
            long seed)

randomSplit

public static RDD<T>[] randomSplit(double[] weights,
                   long seed)

takeSample

public static Object takeSample(boolean withReplacement,
                int num,
                long seed)

union

public static RDD<T> union(RDD<T> other)

$plus$plus

public static RDD<T> $plus$plus(RDD<T> other)

sortBy

public static <K> RDD<T> sortBy(scala.Function1<T,K> f,
                boolean ascending,
                int numPartitions,
                scala.math.Ordering<K> ord,
                scala.reflect.ClassTag<K> ctag)

intersection

public static RDD<T> intersection(RDD<T> other)

intersection

public static RDD<T> intersection(RDD<T> other,
                  Partitioner partitioner,
                  scala.math.Ordering<T> ord)

intersection

public static RDD<T> intersection(RDD<T> other,
                  int numPartitions)

glom
```
public static RDD<Object> glom()
```

cartesian

public static <U> RDD<scala.Tuple2<T,U>> cartesian(RDD<U> other,
                                   scala.reflect.ClassTag<U> evidence$5)

groupBy

public static <K> RDD<scala.Tuple2<K,scala.collection.Iterable<T>>> groupBy(scala.Function1<T,K> f,
                                                            scala.reflect.ClassTag<K> kt)

groupBy

public static <K> RDD<scala.Tuple2<K,scala.collection.Iterable<T>>> groupBy(scala.Function1<T,K> f,
                                                            int numPartitions,
                                                            scala.reflect.ClassTag<K> kt)

groupBy

public static <K> RDD<scala.Tuple2<K,scala.collection.Iterable<T>>> groupBy(scala.Function1<T,K> f,
                                                            Partitioner p,
                                                            scala.reflect.ClassTag<K> kt,
                                                            scala.math.Ordering<K> ord)

pipe

public static RDD<String> pipe(String command)

pipe

public static RDD<String> pipe(String command,
               scala.collection.Map<String,String> env)

pipe

public static RDD<String> pipe(scala.collection.Seq<String> command,
               scala.collection.Map<String,String> env,
               scala.Function1<scala.Function1<String,scala.runtime.BoxedUnit>,scala.runtime.BoxedUnit> printPipeContext,
               scala.Function2<T,scala.Function1<String,scala.runtime.BoxedUnit>,scala.runtime.BoxedUnit> printRDDElement,
               boolean separateWorkingDir,
               int bufferSize,
               String encoding)

mapPartitions

public static <U> RDD<U> mapPartitions(scala.Function1<scala.collection.Iterator<T>,scala.collection.Iterator<U>> f,
                       boolean preservesPartitioning,
                       scala.reflect.ClassTag<U> evidence$6)

mapPartitionsWithIndex

public static <U> RDD<U> mapPartitionsWithIndex(scala.Function2<Object,scala.collection.Iterator<T>,scala.collection.Iterator<U>> f,
                                boolean preservesPartitioning,
                                scala.reflect.ClassTag<U> evidence$9)

zip

public static <U> RDD<scala.Tuple2<T,U>> zip(RDD<U> other,
                             scala.reflect.ClassTag<U> evidence$10)

zipPartitions

public static <B,V> RDD<V> zipPartitions(RDD<B> rdd2,
                         boolean preservesPartitioning,
                         scala.Function2<scala.collection.Iterator<T>,scala.collection.Iterator<B>,scala.collection.Iterator<V>> f,
                         scala.reflect.ClassTag<B> evidence$11,
                         scala.reflect.ClassTag<V> evidence$12)

zipPartitions

public static <B,V> RDD<V> zipPartitions(RDD<B> rdd2,
                         scala.Function2<scala.collection.Iterator<T>,scala.collection.Iterator<B>,scala.collection.Iterator<V>> f,
                         scala.reflect.ClassTag<B> evidence$13,
                         scala.reflect.ClassTag<V> evidence$14)

zipPartitions

public static <B,C,V> RDD<V> zipPartitions(RDD<B> rdd2,
                           RDD<C> rdd3,
                           boolean preservesPartitioning,
                           scala.Function3<scala.collection.Iterator<T>,scala.collection.Iterator<B>,scala.collection.Iterator<C>,scala.collection.Iterator<V>> f,
                           scala.reflect.ClassTag<B> evidence$15,
                           scala.reflect.ClassTag<C> evidence$16,
                           scala.reflect.ClassTag<V> evidence$17)

zipPartitions

public static <B,C,V> RDD<V> zipPartitions(RDD<B> rdd2,
                           RDD<C> rdd3,
                           scala.Function3<scala.collection.Iterator<T>,scala.collection.Iterator<B>,scala.collection.Iterator<C>,scala.collection.Iterator<V>> f,
                           scala.reflect.ClassTag<B> evidence$18,
                           scala.reflect.ClassTag<C> evidence$19,
                           scala.reflect.ClassTag<V> evidence$20)

zipPartitions

public static <B,C,D,V> RDD<V> zipPartitions(RDD<B> rdd2,
                             RDD<C> rdd3,
                             RDD<D> rdd4,
                             boolean preservesPartitioning,
                             scala.Function4<scala.collection.Iterator<T>,scala.collection.Iterator<B>,scala.collection.Iterator<C>,scala.collection.Iterator<D>,scala.collection.Iterator<V>> f,
                             scala.reflect.ClassTag<B> evidence$21,
                             scala.reflect.ClassTag<C> evidence$22,
                             scala.reflect.ClassTag<D> evidence$23,
                             scala.reflect.ClassTag<V> evidence$24)

zipPartitions

public static <B,C,D,V> RDD<V> zipPartitions(RDD<B> rdd2,
                             RDD<C> rdd3,
                             RDD<D> rdd4,
                             scala.Function4<scala.collection.Iterator<T>,scala.collection.Iterator<B>,scala.collection.Iterator<C>,scala.collection.Iterator<D>,scala.collection.Iterator<V>> f,
                             scala.reflect.ClassTag<B> evidence$25,
                             scala.reflect.ClassTag<C> evidence$26,
                             scala.reflect.ClassTag<D> evidence$27,
                             scala.reflect.ClassTag<V> evidence$28)

foreach

public static void foreach(scala.Function1<T,scala.runtime.BoxedUnit> f)

foreachPartition

public static void foreachPartition(scala.Function1<scala.collection.Iterator<T>,scala.runtime.BoxedUnit> f)

collect
```
public static Object collect()
```

toLocalIterator

public static scala.collection.Iterator<T> toLocalIterator()

collect

public static <U> RDD<U> collect(scala.PartialFunction<T,U> f,
                 scala.reflect.ClassTag<U> evidence$29)

subtract

public static RDD<T> subtract(RDD<T> other)

subtract

public static RDD<T> subtract(RDD<T> other,
              int numPartitions)

subtract

public static RDD<T> subtract(RDD<T> other,
              Partitioner p,
              scala.math.Ordering<T> ord)

reduce

public static T reduce(scala.Function2<T,T,T> f)

treeReduce

public static T treeReduce(scala.Function2<T,T,T> f,
           int depth)

fold

public static T fold(T zeroValue,
     scala.Function2<T,T,T> op)

aggregate

public static <U> U aggregate(U zeroValue,
              scala.Function2<U,T,U> seqOp,
              scala.Function2<U,U,U> combOp,
              scala.reflect.ClassTag<U> evidence$30)

treeAggregate

public static <U> U treeAggregate(U zeroValue,
                  scala.Function2<U,T,U> seqOp,
                  scala.Function2<U,U,U> combOp,
                  int depth,
                  scala.reflect.ClassTag<U> evidence$31)

count
```
public static long count()
```

countApprox

public static PartialResult<BoundedDouble> countApprox(long timeout,
                                       double confidence)

countByValue

public static scala.collection.Map<T,Object> countByValue(scala.math.Ordering<T> ord)

countByValueApprox

public static PartialResult<scala.collection.Map<T,BoundedDouble>> countByValueApprox(long timeout,
                                                                      double confidence,
                                                                      scala.math.Ordering<T> ord)

countApproxDistinct

public static long countApproxDistinct(int p,
                       int sp)

countApproxDistinct

public static long countApproxDistinct(double relativeSD)

zipWithIndex

public static RDD<scala.Tuple2<T,Object>> zipWithIndex()

zipWithUniqueId

public static RDD<scala.Tuple2<T,Object>> zipWithUniqueId()

take
```
public static Object take(int num)
```

first
```
public static T first()
```

top

public static Object top(int num,
         scala.math.Ordering<T> ord)

takeOrdered

public static Object takeOrdered(int num,
                 scala.math.Ordering<T> ord)

max

public static T max(scala.math.Ordering<T> ord)

min

public static T min(scala.math.Ordering<T> ord)

isEmpty
```
public static boolean isEmpty()
```

saveAsTextFile

public static void saveAsTextFile(String path)

saveAsTextFile

public static void saveAsTextFile(String path,
                  Class<? extends org.apache.hadoop.io.compress.CompressionCodec> codec)

saveAsObjectFile

public static void saveAsObjectFile(String path)

keyBy

public static <K> RDD<scala.Tuple2<K,T>> keyBy(scala.Function1<T,K> f)

checkpoint
```
public static void checkpoint()
```

localCheckpoint

public static RDD<T> localCheckpoint()

isCheckpointed

public static boolean isCheckpointed()

getCheckpointFile

public static scala.Option<String> getCheckpointFile()

context
```
public static SparkContext context()
```

toDebugString
```
public static String toDebugString()
```

toString
```
public static String toString()
```

toJavaRDD
```
public static JavaRDD<T> toJavaRDD()
```

sample$default$3
```
public static long sample$default$3()
```

mapPartitionsWithIndex$default$2
```
public static <U> boolean mapPartitionsWithIndex$default$2()
```

unpersist$default$1
```
public static boolean unpersist$default$1()
```

distinct$default$2
```
public static scala.math.Ordering<T> distinct$default$2(int numPartitions)
```

coalesce$default$2
```
public static boolean coalesce$default$2()
```

coalesce$default$3
```
public static scala.Option<PartitionCoalescer> coalesce$default$3()
```

coalesce$default$4
```
public static scala.math.Ordering<T> coalesce$default$4(int numPartitions,
                                        boolean shuffle,
                                        scala.Option<PartitionCoalescer> partitionCoalescer)
```

repartition$default$2
```
public static scala.math.Ordering<T> repartition$default$2(int numPartitions)
```

subtract$default$3
```
public static scala.math.Ordering<T> subtract$default$3(RDD<T> other,
                                        Partitioner p)
```

intersection$default$3
```
public static scala.math.Ordering<T> intersection$default$3(RDD<T> other,
                                            Partitioner partitioner)
```

randomSplit$default$2
```
public static long randomSplit$default$2()
```

sortBy$default$2
```
public static <K> boolean sortBy$default$2()
```

sortBy$default$3
```
public static <K> int sortBy$default$3()
```

mapPartitions$default$2
```
public static <U> boolean mapPartitions$default$2()
```

groupBy$default$4
```
public static <K> scala.runtime.Null$ groupBy$default$4(scala.Function1<T,K> f,
                                        Partitioner p)
```

pipe$default$2
```
public static scala.collection.Map<String,String> pipe$default$2()
```

pipe$default$3
```
public static scala.Function1<scala.Function1<String,scala.runtime.BoxedUnit>,scala.runtime.BoxedUnit> pipe$default$3()
```

pipe$default$4
```
public static scala.Function2<T,scala.Function1<String,scala.runtime.BoxedUnit>,scala.runtime.BoxedUnit> pipe$default$4()
```

pipe$default$5
```
public static boolean pipe$default$5()
```

pipe$default$6
```
public static int pipe$default$6()
```

pipe$default$7
```
public static String pipe$default$7()
```

treeReduce$default$2
```
public static int treeReduce$default$2()
```

treeAggregate$default$4
```
public static <U> int treeAggregate$default$4(U zeroValue)
```

countApprox$default$2
```
public static double countApprox$default$2()
```

countByValue$default$1
```
public static scala.math.Ordering<T> countByValue$default$1()
```

countByValueApprox$default$2
```
public static double countByValueApprox$default$2()
```

countByValueApprox$default$3
```
public static scala.math.Ordering<T> countByValueApprox$default$3(long timeout,
                                                  double confidence)
```

takeSample$default$3
```
public static long takeSample$default$3()
```

countApproxDistinct$default$1
```
public static double countApproxDistinct$default$1()
```

mapPartitionsWithIndexInternal$default$2
```
public static <U> boolean mapPartitionsWithIndexInternal$default$2()
```

mapPartitionsInternal$default$2
```
public static <U> boolean mapPartitionsInternal$default$2()
```

compute
```
public scala.collection.Iterator<scala.Tuple2<Object,VD>> compute(Partition part,
                                                         TaskContext context)
```
Provides the RDD[(VertexId, VD)] equivalent output.

Specified by:
compute in class RDD<scala.Tuple2<Object,VD>>

Parameters:
part - (undocumented)
context - (undocumented)

Returns:
(undocumented)

reindex
```
public abstract VertexRDD<VD> reindex()
```
Constructs a new VertexRDD that is indexed by only the visible vertices. The resulting VertexRDD is based on a different index and can no longer be joined quickly with this RDD.

Returns:
(undocumented)

filter
```
public VertexRDD<VD> filter(scala.Function1<scala.Tuple2<Object,VD>,Object> pred)
```
Restricts the vertex set to the set of vertices satisfying the given predicate. This operation preserves the index for efficient joins with the original RDD, and it sets bits in the bitmask rather than allocating new memory.
It is declared and defined here to allow refining the return type from RDD[(VertexId, VD)] to VertexRDD[VD].

Overrides:
filter in class RDD<scala.Tuple2<Object,VD>>

Parameters:
pred - the user defined predicate, which takes a tuple to conform to the RDD[(VertexId, VD)] interface

Returns:
(undocumented)
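
A minimal Scala sketch of filter followed by reindex, assuming Spark GraphX is on the classpath and a local master is acceptable. The object name FilterSketch, the helper evenVertices, and the sample data are illustrative and not part of the API.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.VertexRDD

object FilterSketch {
  // Hypothetical helper: keeps only the vertices whose ID is even.
  // The result is still a VertexRDD and shares the index of `vertices`.
  def evenVertices(vertices: VertexRDD[Int]): VertexRDD[Int] =
    vertices.filter { case (id, _) => id % 2 == 0 }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("filter-sketch")
    val sc = new SparkContext(conf)
    try {
      // Sample data (assumption): vertices 1..6 with attribute id * 10.
      val vertices = VertexRDD(sc.parallelize((1L to 6L).map(id => (id, id.toInt * 10))))
      val evens = evenVertices(vertices)
      // reindex() builds a fresh index over the visible vertices only,
      // so `compact` no longer shares an index with `vertices`.
      val compact = evens.reindex()
      println(evens.map(_._1).collect().sorted.mkString(", "))
      println(compact.count())
    } finally {
      sc.stop()
    }
  }
}
```

Keeping the filtered result co-indexed is useful when it will be joined back to the original; reindex is the better fit when most vertices were dropped and the original index is no longer needed.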

mapValues
```
public abstract <VD2> VertexRDD<VD2> mapValues(scala.Function1<VD,VD2> f,
                             scala.reflect.ClassTag<VD2> evidence$2)
```
Maps each vertex attribute, preserving the index.

Parameters:
f - the function applied to each value in the RDD
evidence$2 - (undocumented)

Returns:
a new VertexRDD with values obtained by applying f to each of the entries in the original VertexRDD

mapValues
```
public abstract <VD2> VertexRDD<VD2> mapValues(scala.Function2<Object,VD,VD2> f,
                             scala.reflect.ClassTag<VD2> evidence$3)
```
Maps each vertex attribute, additionally supplying the vertex ID.

Parameters:
f - the function applied to each ID-value pair in the RDD
evidence$3 - (undocumented)

Returns:
a new VertexRDD with values obtained by applying f to each of the entries in the original VertexRDD. The resulting VertexRDD retains the same index.
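
The two mapValues overloads can be sketched in Scala as follows, assuming Spark GraphX is on the classpath. MapValuesSketch, doubled, and labelled are hypothetical names used only for illustration; the parameter types on the function literals are written out so that the intended overload is unambiguous.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}

object MapValuesSketch {
  // One-argument overload: transforms the attribute only.
  def doubled(vertices: VertexRDD[Int]): VertexRDD[Int] =
    vertices.mapValues((v: Int) => v * 2)

  // Two-argument overload: the vertex ID is supplied as well.
  def labelled(vertices: VertexRDD[Int]): VertexRDD[String] =
    vertices.mapValues((id: VertexId, v: Int) => s"$id:$v")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("mapvalues-sketch")
    val sc = new SparkContext(conf)
    try {
      // Sample data (assumption): two vertices with integer attributes.
      val vertices = VertexRDD(sc.parallelize(Seq((1L, 10), (2L, 20))))
      doubled(vertices).collect().sortBy(_._1).foreach(println)
      labelled(vertices).collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```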

minus
```
public abstract VertexRDD<VD> minus(RDD<scala.Tuple2<Object,VD>> other)
```
Acts as a set difference on vertex IDs: returns only those vertices whose VertexId is present in this but not in other.

Parameters:
other - an RDD to run the set operation against

Returns:
(undocumented)

minus
```
public abstract VertexRDD<VD> minus(VertexRDD<VD> other)
```
Acts as a set difference on vertex IDs: returns only those vertices whose VertexId is present in this but not in other.

Parameters:
other - a VertexRDD to run the set operation against

Returns:
(undocumented)
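
As a rough Scala illustration of minus, assuming Spark GraphX is on the classpath: vertices whose IDs also appear in the other dataset are removed. MinusSketch, onlyInLeft, and the sample IDs are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

object MinusSketch {
  // Hypothetical helper: IDs present in `left` but absent from `right`.
  def onlyInLeft(left: VertexRDD[Int], right: RDD[(VertexId, Int)]): Set[VertexId] =
    left.minus(right).map(_._1).collect().toSet

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("minus-sketch")
    val sc = new SparkContext(conf)
    try {
      // Sample data (assumption): IDs 1..3 on the left, 2..4 on the right.
      val left = VertexRDD(sc.parallelize(Seq((1L, 0), (2L, 0), (3L, 0))))
      val right: RDD[(VertexId, Int)] = sc.parallelize(Seq((2L, 0), (3L, 0), (4L, 0)))
      println(onlyInLeft(left, right))
    } finally {
      sc.stop()
    }
  }
}
```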

diff
```
public abstract VertexRDD<VD> diff(RDD<scala.Tuple2<Object,VD>> other)
```
For each vertex present in both this and other, diff returns only those vertices with differing values; for values that are different, keeps the values from other. This is only guaranteed to work if the VertexRDDs share a common ancestor.

Parameters:
other - the other RDD[(VertexId, VD)] to diff against

Returns:
(undocumented)

diff
```
public abstract VertexRDD<VD> diff(VertexRDD<VD> other)
```
For each vertex present in both this and other, diff returns only those vertices with differing values; for values that are different, keeps the values from other. This is only guaranteed to work if the VertexRDDs share a common ancestor.

Parameters:
other - the other VertexRDD to diff against

Returns:
(undocumented)
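
A small Scala sketch of diff, assuming Spark GraphX is on the classpath. The second VertexRDD is derived from the first with mapValues so that both share a common ancestor, as the description requires. DiffSketch, changed, and the sample attributes are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}

object DiffSketch {
  // Hypothetical helper: rewrites the attribute of vertex 2, then diffs
  // the original against the modified copy. Only vertices whose values
  // differ are returned, carrying the value from the modified copy.
  def changed(original: VertexRDD[String]): VertexRDD[String] = {
    val modified = original.mapValues((id: VertexId, v: String) => if (id == 2L) "B" else v)
    original.diff(modified)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("diff-sketch")
    val sc = new SparkContext(conf)
    try {
      // Sample data (assumption): three vertices with string attributes.
      val original = VertexRDD(sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c"))))
      changed(original).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```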

leftZipJoin
```
public abstract <VD2,VD3> VertexRDD<VD3> leftZipJoin(VertexRDD<VD2> other,
                                   scala.Function3<Object,VD,scala.Option<VD2>,VD3> f,
                                   scala.reflect.ClassTag<VD2> evidence$4,
                                   scala.reflect.ClassTag<VD3> evidence$5)
```
Left joins this VertexRDD with another VertexRDD that has the same index. This function fails unless both VertexRDDs share the same index. The resulting vertex set contains an entry for each vertex in this. If other is missing any vertex in this VertexRDD, f is passed None.

Parameters:
other - the other VertexRDD with which to join.
f - the function mapping a vertex id and its attributes in this and the other vertex set to a new vertex attribute.
evidence$4 - (undocumented)
evidence$5 - (undocumented)

Returns:
a VertexRDD containing the results of f

leftJoin
```
public abstract <VD2,VD3> VertexRDD<VD3> leftJoin(RDD<scala.Tuple2<Object,VD2>> other,
                                scala.Function3<Object,VD,scala.Option<VD2>,VD3> f,
                                scala.reflect.ClassTag<VD2> evidence$6,
                                scala.reflect.ClassTag<VD3> evidence$7)
```
Left joins this VertexRDD with an RDD containing vertex attribute pairs. If the other RDD is backed by a VertexRDD with the same index then the efficient leftZipJoin implementation is used. The resulting VertexRDD contains an entry for each vertex in this. If other is missing any vertex in this VertexRDD, f is passed None. If other contains duplicate entries for a vertex, one is picked arbitrarily.

Parameters:
other - the other RDD with which to join
f - the function mapping a vertex id and its attributes in this and the other vertex set to a new vertex attribute.
evidence$6 - (undocumented)
evidence$7 - (undocumented)

Returns:
a VertexRDD containing all the vertices in this VertexRDD with the attributes emitted by f.
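
The left-join behavior described above can be sketched in Scala as follows, assuming Spark GraphX is on the classpath. LeftJoinSketch, addBonus, and the sample data are hypothetical; vertices missing from the other RDD receive None and here fall back to 0.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

object LeftJoinSketch {
  // Hypothetical helper: adds a bonus to each vertex attribute where one exists.
  // Every vertex of `base` appears in the result; extra IDs in `bonus` are ignored.
  def addBonus(base: VertexRDD[Int], bonus: RDD[(VertexId, Int)]): VertexRDD[Int] =
    base.leftJoin(bonus) { (id, value, bonusOpt) => value + bonusOpt.getOrElse(0) }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("leftjoin-sketch")
    val sc = new SparkContext(conf)
    try {
      // Sample data (assumption): vertex 4 exists only in `bonus`.
      val base = VertexRDD(sc.parallelize(Seq((1L, 10), (2L, 20), (3L, 30))))
      val bonus: RDD[(VertexId, Int)] = sc.parallelize(Seq((2L, 5), (4L, 7)))
      addBonus(base, bonus).collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```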

innerZipJoin
```
public abstract <U,VD2> VertexRDD<VD2> innerZipJoin(VertexRDD<U> other,
                                  scala.Function3<Object,VD,U,VD2> f,
                                  scala.reflect.ClassTag<U> evidence$8,
                                  scala.reflect.ClassTag<VD2> evidence$9)
```
Efficiently inner joins this VertexRDD with another VertexRDD sharing the same index. See innerJoin for the behavior of the join.

Parameters:
other - (undocumented)
f - (undocumented)
evidence$8 - (undocumented)
evidence$9 - (undocumented)

Returns:
(undocumented)

innerJoin
```
public abstract <U,VD2> VertexRDD<VD2> innerJoin(RDD<scala.Tuple2<Object,U>> other,
                               scala.Function3<Object,VD,U,VD2> f,
                               scala.reflect.ClassTag<U> evidence$10,
                               scala.reflect.ClassTag<VD2> evidence$11)
```
Inner joins this VertexRDD with an RDD containing vertex attribute pairs. If the other RDD is backed by a VertexRDD with the same index then the efficient innerZipJoin implementation is used.

Parameters:
other - an RDD containing vertices to join. If there are multiple entries for the same vertex, one is picked arbitrarily. Use aggregateUsingIndex to merge multiple entries.
f - the join function applied to corresponding values of this and other
evidence$10 - (undocumented)
evidence$11 - (undocumented)

Returns:
a VertexRDD co-indexed with this, containing only vertices that appear in both this and other, with values supplied by f
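
For comparison with leftJoin, a minimal Scala sketch of innerJoin, assuming Spark GraphX is on the classpath. InnerJoinSketch, sumMatching, and the sample data are illustrative; only vertices present on both sides survive.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

object InnerJoinSketch {
  // Hypothetical helper: sums the two attributes of every vertex found in both inputs.
  def sumMatching(base: VertexRDD[Int], other: RDD[(VertexId, Int)]): VertexRDD[Int] =
    base.innerJoin(other) { (id, a, b) => a + b }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("innerjoin-sketch")
    val sc = new SparkContext(conf)
    try {
      // Sample data (assumption): only vertex 2 appears on both sides.
      val base = VertexRDD(sc.parallelize(Seq((1L, 10), (2L, 20), (3L, 30))))
      val other: RDD[(VertexId, Int)] = sc.parallelize(Seq((2L, 5), (4L, 7)))
      sumMatching(base, other).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```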

aggregateUsingIndex
```
public abstract <VD2> VertexRDD<VD2> aggregateUsingIndex(RDD<scala.Tuple2<Object,VD2>> messages,
                                       scala.Function2<VD2,VD2,VD2> reduceFunc,
                                       scala.reflect.ClassTag<VD2> evidence$12)
```
Aggregates vertices in messages that have the same ids using reduceFunc, returning a VertexRDD co-indexed with this.

Parameters:
messages - an RDD containing messages to aggregate, where each message is a pair of its target vertex ID and the message data
reduceFunc - the associative aggregation function for merging messages to the same vertex
evidence$12 - (undocumented)

Returns:
a VertexRDD co-indexed with this, containing only vertices that received messages. For those vertices, their values are the result of applying reduceFunc to all received messages.
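
A short Scala sketch of aggregateUsingIndex, assuming Spark GraphX is on the classpath. It merges several messages per vertex with addition; AggregateSketch, sumMessages, and the sample messages are hypothetical names and data.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

object AggregateSketch {
  // Hypothetical helper: sums all messages addressed to the same vertex.
  // The result reuses the index of `vertices` and holds only vertices
  // that received at least one message.
  def sumMessages(vertices: VertexRDD[Int], messages: RDD[(VertexId, Double)]): VertexRDD[Double] =
    vertices.aggregateUsingIndex(messages, (a: Double, b: Double) => a + b)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("aggregate-sketch")
    val sc = new SparkContext(conf)
    try {
      // Sample data (assumption): vertex 2 receives no message.
      val vertices = VertexRDD(sc.parallelize(Seq((1L, 0), (2L, 0), (3L, 0))))
      val messages: RDD[(VertexId, Double)] =
        sc.parallelize(Seq((1L, 1.0), (1L, 2.0), (3L, 5.0)))
      sumMessages(vertices, messages).collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because the result is co-indexed with the receiver, a subsequent innerJoin or leftJoin between the two can take the efficient zip-join path.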

reverseRoutingTables
```
public abstract VertexRDD<VD> reverseRoutingTables()
```
Returns a new VertexRDD reflecting a reversal of all edge directions in the corresponding EdgeRDD.

Returns:
(undocumented)

withEdges
```
public abstract VertexRDD<VD> withEdges(EdgeRDD<?> edges)
```
Prepares this VertexRDD for efficient joins with the given EdgeRDD.
