API Reference
Spark.JavaRDD - Type.

Pure wrapper around JavaRDD

Spark.SparkContext - Type.

Wrapper around JavaSparkContext

Spark.SparkContext - Method.

Params:

  • master - address of the application master. Currently only local and standalone modes are supported; default is 'local'

  • appname - name of the application
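
A minimal driver-side sketch, assuming the keyword arguments above; the Spark.init() call follows the package's documented setup and the application name is illustrative:

    using Spark

    Spark.init()   # boot the JVM bridge
    sc = SparkContext(master="local", appname="demo")
    # ... run jobs against sc ...
    close(sc)      # shut the context down when done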

Base.close - Method.

Close SparkContext

Base.collect - Method.

Collect all elements of rdd on the driver machine

Base.collect - Method.

Collect all elements of rdd on the driver machine

Base.count - Method.

Count number of elements in this RDD

Base.map - Method.

Apply function f to each element of rdd

Base.reduce - Method.

Reduce elements of rdd using specified function f
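
Taken together, these primitives support a small end-to-end job. The sketch below assumes the sc from the constructor example above, the text_file constructor documented later in this section, and an illustrative file path:

    txt = text_file(sc, "file:///var/log/syslog")   # illustrative path
    lens = map(txt, line -> length(line))           # length of every line
    count(lens)                                     # number of lines
    reduce(lens, +)                                 # total number of characters
    collect(lens)                                   # all lengths, on the driver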

Spark.cache - Method.

Persist this RDD with the default storage level (MEMORY_ONLY)

Spark.cache - Method.

Persist this RDD with the default storage level (MEMORY_ONLY)

Spark.cartesian - Method.

Create a pair RDD with every combination of the values of rdd1 and rdd2
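
For example, assuming rdd1 and rdd2 are two existing (small) RDDs:

    pairs = cartesian(rdd1, rdd2)   # one element per (x, y) combination
    collect(pairs)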

Spark.coalesce - Method.

Return a new RDD that is reduced into num_partitions partitions.

Spark.flat_map - Method.

Similar to map, but each input item can be mapped to 0 or more output items (so f should return an iterator rather than a single item)

Spark.flat_map_pair - Method.

Similar to map, but each input item can be mapped to 0 or more output items (so f should return an iterator of pairs rather than a single item)
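
A sketch of both variants, reusing txt from the earlier example (the pair variant's name is taken from the entry above):

    words = flat_map(txt, line -> split(line))                               # 0..n words per line
    word_pairs = flat_map_pair(txt, line -> [(w, 1) for w in split(line)])   # pairs instead of items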

Spark.group_by_key - Method.

When called on a dataset of (K, V) pairs, returns a dataset of (K, [V]) pairs.
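
For instance, assuming word_pairs from the flat_map sketch:

    groups = group_by_key(word_pairs)   # (word, [1, 1, ...]) for every distinct word
    collect(groups)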

Spark.id - Method.

Return the id of the rdd

Spark.map_pair - Method.

Apply function f to each element of rdd

Spark.map_partitions - Method.

Apply function f to each partition of rdd. f should be of type (iterator) -> iterator

Spark.map_partitions_pair - Method.

Apply function f to each partition of rdd. f should be of type (iterator) -> iterator

Spark.map_partitions_with_index - Method.

Apply function f to each partition of rdd. f should be of type (index, iterator) -> iterator

Spark.num_partitions - Method.

Returns the number of partitions of this RDD.
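
A sketch of the partition-level variants, assuming the function names above and lens from the map/reduce example; per-partition sums are just an illustration:

    part_sums = map_partitions(lens, it -> [sum(it)])                     # one sum per partition
    tagged = map_partitions_with_index(lens, (i, it) -> [(i, sum(it))])   # keep the partition index
    num_partitions(lens)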

Spark.pipe - Method.

Return an RDD created by piping elements to a forked external process.
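
A sketch with an assumed signature pipe(rdd, command), reusing words from the flat_map example; the external command is illustrative:

    # assumed signature: pipe(rdd, command)
    upper = Spark.pipe(words, "tr a-z A-Z")   # run each element through an external process
    collect(upper)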

Spark.reduce_by_key - Method.

When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function func, which must be of type (V,V) => V.
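
The classic word count, again assuming word_pairs from the flat_map sketch:

    counts = reduce_by_key(word_pairs, +)   # (word, total) for every distinct word
    collect(counts)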

Spark.repartition - Method.

Return a new RDD that has exactly num_partitions partitions.
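
For example (in Spark generally, coalesce avoids a full shuffle while repartition may trigger one; this sketch assumes the same semantics here; Spark.coalesce is qualified to avoid clashing with Base.coalesce):

    narrow = Spark.coalesce(counts, 2)   # shrink to 2 partitions
    wide = repartition(counts, 8)        # exactly 8 partitions
    num_partitions(wide)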

Spark.share_variable - Method.

Makes the value of data available on workers as symbol name
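
A sketch with an assumed argument order share_variable(sc, name, data):

    # assumed signature: share_variable(sc, name, data)
    share_variable(sc, :stop_words, ["the", "a", "an"])
    # worker-side closures can then refer to the value as stop_words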

Spark.text_file - Method.

Create RDD from a text file

Iterates over the iterators contained within an iterator (i.e. flattens one level of nesting)

Spark.JavaPairRDD - Type.

Pure wrapper around JavaPairRDD

Spark.PipelinedPairRDD - Type.

Julia type to handle pair RDDs. Can pipeline operations to reduce interprocess IO.

Spark.PipelinedPairRDD - Method.

Params:

  • parentrdd - parent RDD

  • func - function of type (index, iterator) -> iterator to apply to each partition

Spark.PipelinedRDD - Type.

Julia type to handle RDDs. Can pipeline operations to reduce interprocess IO.

Spark.PipelinedRDD - Method.

Params:

  • parentrdd - parent RDD

  • func - function of type (index, iterator) -> iterator to apply to each partition

Spark.add_file - Method.

Add file to SparkContext. This file will be downloaded to each executor's work directory

Spark.add_jar - Method.

Add JAR file to SparkContext. Classes from this JAR will then be available to all tasks
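
For example, with illustrative paths (argument order assumed as (sc, path)):

    add_file(sc, "/tmp/lookup.csv")      # fetched into each executor's work directory
    add_jar(sc, "/tmp/extra-udfs.jar")   # classes become visible to all tasks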

Chains two partition functions together

Collects the RDD to the Julia process by serializing all values via a single byte array

Collects the RDD to the Julia process via a Julia iterator that fetches one row at a time. This avoids creating a byte array containing all rows at once.

Spark.collect_itr - Method.

Collect all elements of rdd on the driver machine

Spark.collect_itr - Method.

Collect all elements of rdd on the driver machine

Spark.context - Method.

Get SparkContext of this RDD

Creates a function that operates on a whole partition from an element-by-element flat_map function

Creates a function that operates on a whole partition from an element-by-element map function

Spark.deserialized - Method.

Return an object deserialized from an array of bytes

Spark.readobj - Method.

Read a data object from an IO stream. Returns a code and a byte array:

  • if code is negative, it is treated as a special command code

  • if code is positive, it is treated as the array length

Spark.serialized - Method.

Return serialized object as an array of bytes
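
A round trip through these internal helpers, assuming they accept any serializable Julia value:

    bytes = Spark.serialized([1, 2, 3])   # Vector{UInt8}
    Spark.deserialized(bytes)             # [1, 2, 3] again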

Spark.writeobj - Method.

Write object to stream