Spark.FlatMapIterator
Spark.JavaPairRDD
Spark.JavaRDD
Spark.PipelinedPairRDD
Spark.PipelinedPairRDD
Spark.PipelinedRDD
Spark.PipelinedRDD
Spark.SparkContext
Spark.SparkContext
Base.close
Base.collect
Base.collect
Base.count
Base.map
Base.reduce
Spark.add_file
Spark.add_jar
Spark.cache
Spark.cache
Spark.cartesian
Spark.chain_function
Spark.coalesce
Spark.collect_internal
Spark.collect_internal_itr
Spark.collect_itr
Spark.collect_itr
Spark.context
Spark.create_flat_map_function
Spark.create_map_function
Spark.deserialized
Spark.flat_map
Spark.flat_map_pair
Spark.group_by_key
Spark.id
Spark.map_pair
Spark.map_partitions
Spark.map_partitions_pair
Spark.map_partitions_with_index
Spark.num_partitions
Spark.pipe
Spark.readobj
Spark.reduce_by_key
Spark.repartition
Spark.serialized
Spark.share_variable
Spark.text_file
Spark.writeobj
Spark.JavaRDD
— Type. Pure wrapper around JavaRDD
Spark.SparkContext
— Type. Wrapper around JavaSparkContext
Spark.SparkContext
— Method. Params:
master - address of the application master. Currently only local and standalone modes are supported. Default is 'local'
appname - name of the application
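For illustration, a minimal sketch of creating and closing a context (the master URL and application name here are placeholder values):

```julia
using Spark
Spark.init()

sc = SparkContext(master="local", appname="MyApp")
# ... build and run RDD pipelines here ...
close(sc)
```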
Base.close
— Method. Close the SparkContext
Base.collect
— Method. Collect all elements of rdd on the driver machine
Base.collect
— Method. Collect all elements of rdd on the driver machine
Base.count
— Method. Count the number of elements in this RDD
Base.map
— Method. Apply function f to each element of rdd
Base.reduce
— Method. Reduce elements of rdd using the specified function f
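Taken together with text_file, a minimal sketch of the map/reduce flow (the input path is hypothetical):

```julia
using Spark
Spark.init()
sc = SparkContext(master="local")

txt = text_file(sc, "file:///tmp/input.txt")   # hypothetical path
lens = map(txt, line -> length(line))          # per-element transform
total = reduce(lens, +)                        # aggregate with a binary function

close(sc)
```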
Spark.cache
— Method. Persist this RDD with the default storage level (MEMORY_ONLY)
Spark.cache
— Method. Persist this RDD with the default storage level (MEMORY_ONLY)
Spark.cartesian
— Method. Create a pair RDD with every combination of the values of rdd1 and rdd2
Spark.coalesce
— Method. Return a new RDD that is reduced into num_partitions partitions.
Spark.flat_map
— Method. Similar to map, but each input item can be mapped to 0 or more output items (so f should return an iterator rather than a single item)
Spark.flat_map_pair
— Method. Similar to map, but each input item can be mapped to 0 or more output items (so f should return an iterator of pairs rather than a single item)
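A sketch of flat_map, where one input line yields zero or more output words (the path is hypothetical):

```julia
using Spark
Spark.init()
sc = SparkContext(master="local")

txt = text_file(sc, "file:///tmp/input.txt")
# Each line maps to an iterable of words, which flat_map flattens.
words = flat_map(txt, line -> split(line))

close(sc)
```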
Spark.group_by_key
— Method. When called on a dataset of (K, V) pairs, returns a dataset of (K, [V]) pairs.
Spark.id
— Method. Return the id of the rdd
Spark.map_pair
— Method. Apply function f to each element of rdd
Spark.map_partitions
— Method. Apply function f to each partition of rdd. f should be of type (iterator) -> iterator
Spark.map_partitions_pair
— Method. Apply function f to each partition of rdd. f should be of type (iterator) -> iterator
Spark.map_partitions_with_index
— Method. Apply function f to each partition of rdd. f should be of type (index, iterator) -> iterator
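A sketch of the partition-function shape expected here; a generator expression keeps the result an iterator (setup and path as in the earlier examples, hypothetical):

```julia
using Spark
Spark.init()
sc = SparkContext(master="local")

txt = text_file(sc, "file:///tmp/input.txt")
# f receives the partition index and an iterator, and returns an iterator.
tagged = map_partitions_with_index(txt,
    (idx, it) -> ("partition $idx: $line" for line in it))

close(sc)
```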
Spark.num_partitions
— Method. Returns the number of partitions of this RDD.
Spark.pipe
— Method. Return an RDD created by piping elements to a forked external process.
Spark.reduce_by_key
— Method. When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function func, which must be of type (V,V) => V.
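Combining map_pair and reduce_by_key gives the usual word-count pattern; a sketch assuming the (rdd, f) argument order mirrors map (path hypothetical):

```julia
using Spark
Spark.init()
sc = SparkContext(master="local")

txt = text_file(sc, "file:///tmp/input.txt")
words = flat_map(txt, line -> split(line))
pairs = map_pair(words, w -> (w, 1))           # build (K, V) pairs
counts = reduce_by_key(pairs, +)               # aggregate values per key

close(sc)
```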
Spark.repartition
— Method. Return a new RDD that has exactly num_partitions partitions.
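Continuing the earlier sketch (txt as above), a contrast of the two; the partition counts are arbitrary, and Spark.coalesce is qualified to avoid clashing with Base.coalesce:

```julia
fewer = Spark.coalesce(txt, 2)   # shrink to 2 partitions
exact = repartition(txt, 8)      # exactly 8 partitions
num_partitions(exact)            # -> 8
```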
Spark.share_variable
— Method. Make the value of data available on workers under the symbol name
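A hedged sketch, assuming the argument order is share_variable(sc, name, data) per the description above (sc as in the earlier examples):

```julia
# Assumption: (sc, name, data) argument order.
share_variable(sc, :threshold, 10)
# Task functions running on workers can then refer to the shared
# value under the symbol `threshold`.
```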
Spark.text_file
— Method. Create an RDD from a text file
Spark.FlatMapIterator
— Type. Iterates over the iterators within an iterator
Spark.JavaPairRDD
— Type. Pure wrapper around JavaPairRDD
Spark.PipelinedPairRDD
— Type. Julia type to handle pair RDDs. Can handle pipelining of operations to reduce interprocess IO.
Spark.PipelinedPairRDD
— Method. Params:
parentrdd - parent RDD
func - function of type (index, iterator) -> iterator to apply to each partition
Spark.PipelinedRDD
— Type. Julia type to handle RDDs. Can handle pipelining of operations to reduce interprocess IO.
Spark.PipelinedRDD
— Method. Params:
parentrdd - parent RDD
func - function of type (index, iterator) -> iterator to apply to each partition
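In normal use these pipelined RDDs are created internally by map, flat_map, and friends; still, a sketch of the partition-function shape the constructor expects:

```julia
# (index, iterator) -> iterator; a generator keeps evaluation lazy.
func = (index, it) -> (uppercase(line) for line in it)
piped = PipelinedRDD(parent_rdd, func)   # parent_rdd assumed to exist
```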
Spark.add_file
— Method. Add a file to the SparkContext. The file will be downloaded to each executor's work directory
Spark.add_jar
— Method. Add a JAR file to the SparkContext. Classes from this JAR will then be available to all tasks
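A sketch with hypothetical paths (sc as in the earlier examples):

```julia
add_file(sc, "/path/to/lookup.csv")     # shipped to executors' work dirs
add_jar(sc, "/path/to/extra-lib.jar")   # classes become visible to tasks
```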
Spark.chain_function
— Method. Chain two partition functions together
Spark.collect_internal
— Method. Collects the RDD to the Julia process by serializing all values via a byte array
Spark.collect_internal_itr
— Method. Collects the RDD to the Julia process via a Julia iterator that fetches one row at a time. This avoids creating a byte array containing all rows at once.
Spark.collect_itr
— Method. Collect all elements of rdd on the driver machine
Spark.collect_itr
— Method. Collect all elements of rdd on the driver machine
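A sketch of iterating lazily instead of materializing everything at once (lens as in the earlier map example):

```julia
# collect(lens) builds the full result; collect_itr yields row by row.
for x in collect_itr(lens)
    println(x)
end
```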
Spark.context
— Method. Get the SparkContext of this RDD
Spark.create_flat_map_function
— Method. Creates a function that operates on a partition from an element-by-element flat_map function
Spark.create_map_function
— Method. Creates a function that operates on a partition from an element-by-element map function
Spark.deserialized
— Method. Return an object deserialized from an array of bytes
Spark.readobj
— Method. Read a data object from an IO stream. Returns a code and a byte array:
if the code is negative, it is treated as a special command code
if the code is positive, it is treated as the array length
Spark.serialized
— Method. Return the object serialized as an array of bytes
Spark.writeobj
— Method. Write an object to a stream
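A round-trip sketch, assuming serialized and deserialized are inverses of each other:

```julia
bytes = Spark.serialized([1, 2, 3])   # object -> byte array (assumed)
obj   = Spark.deserialized(bytes)     # byte array -> object
@assert obj == [1, 2, 3]
```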