Spark.FlatMapIterator
Spark.JavaPairRDD
Spark.JavaRDD
Spark.PipelinedPairRDD
Spark.PipelinedPairRDD
Spark.PipelinedRDD
Spark.PipelinedRDD
Spark.SparkContext
Spark.SparkContext
Base.close
Base.collect
Base.collect
Base.count
Base.map
Base.reduce
Spark.add_file
Spark.add_jar
Spark.cache
Spark.cache
Spark.cartesian
Spark.chain_function
Spark.coalesce
Spark.collect_internal
Spark.collect_internal_itr
Spark.collect_itr
Spark.collect_itr
Spark.context
Spark.create_flat_map_function
Spark.create_map_function
Spark.deserialized
Spark.flat_map
Spark.flat_map_pair
Spark.group_by_key
Spark.id
Spark.map_pair
Spark.map_partitions
Spark.map_partitions_pair
Spark.map_partitions_with_index
Spark.num_partitions
Spark.pipe
Spark.readobj
Spark.reduce_by_key
Spark.repartition
Spark.serialized
Spark.share_variable
Spark.text_file
Spark.writeobj
Spark.JavaRDD
— Type. Pure wrapper around JavaRDD
Spark.SparkContext
— Type. Wrapper around JavaSparkContext
Spark.SparkContext
— Method. Params:
master - address of the application master. Currently only local and standalone modes are supported. Default is 'local'
appname - name of the application
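For illustration, a minimal sketch of creating and closing a context (the master URL and application name here are placeholder values):

```julia
using Spark
Spark.init()

sc = SparkContext(master="local", appname="MyApp")
# ... build and run RDD pipelines here ...
close(sc)
```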
Base.close
— Method. Close the SparkContext
Base.collect
— Method. Collect all elements of rdd on the driver machine
Base.collect
— Method. Collect all elements of rdd on the driver machine
Base.count
— Method. Count the number of elements in this RDD
Base.map
— Method. Apply function f to each element of rdd
Base.reduce
— Method. Reduce elements of rdd using the specified function f
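Taken together with text_file, a minimal sketch of the map/reduce flow (the input path is hypothetical):

```julia
using Spark
Spark.init()
sc = SparkContext(master="local")

txt = text_file(sc, "file:///tmp/input.txt")   # hypothetical path
lens = map(txt, line -> length(line))          # per-element transform
total = reduce(lens, +)                        # aggregate with a binary function

close(sc)
```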
Spark.cache
— Method. Persist this RDD with the default storage level (MEMORY_ONLY)
Spark.cache
— Method. Persist this RDD with the default storage level (MEMORY_ONLY)
Spark.cartesian
— Method. Create a pair RDD with every combination of the values of rdd1 and rdd2
Spark.coalesce
— Method. Return a new RDD that is reduced into num_partitions partitions.
Spark.flat_map
— Method. Similar to map, but each input item can be mapped to 0 or more output items (so f should return an iterator rather than a single item)
Spark.flat_map_pair
— Method. Similar to map, but each input item can be mapped to 0 or more output items (so f should return an iterator of pairs rather than a single item)
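A sketch of flat_map, where one input line yields zero or more output words (the path is hypothetical):

```julia
using Spark
Spark.init()
sc = SparkContext(master="local")

txt = text_file(sc, "file:///tmp/input.txt")
# Each line maps to an iterable of words, which flat_map flattens.
words = flat_map(txt, line -> split(line))

close(sc)
```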
Spark.group_by_key
— Method. When called on a dataset of (K, V) pairs, returns a dataset of (K, [V]) pairs.
Spark.id
— Method. Return the id of the rdd
Spark.map_pair
— Method. Apply function f to each element of rdd
Spark.map_partitions
— Method. Apply function f to each partition of rdd. f should be of type (iterator) -> iterator
Spark.map_partitions_pair
— Method. Apply function f to each partition of rdd. f should be of type (iterator) -> iterator
Spark.map_partitions_with_index
— Method. Apply function f to each partition of rdd. f should be of type (index, iterator) -> iterator
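A sketch of the partition-function shape expected here; a generator expression keeps the result an iterator (setup and path as in the earlier examples, hypothetical):

```julia
using Spark
Spark.init()
sc = SparkContext(master="local")

txt = text_file(sc, "file:///tmp/input.txt")
# f receives the partition index and an iterator, and returns an iterator.
tagged = map_partitions_with_index(txt,
    (idx, it) -> ("partition $idx: $line" for line in it))

close(sc)
```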
Spark.num_partitions
— Method. Returns the number of partitions of this RDD.
Spark.pipe
— Method. Return an RDD created by piping elements to a forked external process.
Spark.reduce_by_key
— Method. When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function func, which must be of type (V,V) => V.
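Combining map_pair and reduce_by_key gives the usual word-count pattern; a sketch assuming the (rdd, f) argument order mirrors map (path hypothetical):

```julia
using Spark
Spark.init()
sc = SparkContext(master="local")

txt = text_file(sc, "file:///tmp/input.txt")
words = flat_map(txt, line -> split(line))
pairs = map_pair(words, w -> (w, 1))           # build (K, V) pairs
counts = reduce_by_key(pairs, +)               # aggregate values per key

close(sc)
```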
Spark.repartition
— Method. Return a new RDD that has exactly num_partitions partitions.
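Continuing the earlier sketch (txt as above), a contrast of the two; the partition counts are arbitrary, and Spark.coalesce is qualified to avoid clashing with Base.coalesce:

```julia
fewer = Spark.coalesce(txt, 2)   # shrink to 2 partitions
exact = repartition(txt, 8)      # exactly 8 partitions
num_partitions(exact)            # -> 8
```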
Spark.share_variable
— Method. Make the value of data available on workers under the symbol name
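A hedged sketch, assuming the argument order is share_variable(sc, name, data) per the description above (sc as in the earlier examples):

```julia
# Assumption: (sc, name, data) argument order.
share_variable(sc, :threshold, 10)
# Task functions running on workers can then refer to the shared
# value under the symbol `threshold`.
```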
Spark.text_file
— Method. Create an RDD from a text file
Spark.FlatMapIterator
— Type. Iterates over the iterators within an iterator
Spark.JavaPairRDD
— Type. Pure wrapper around JavaPairRDD
Spark.PipelinedPairRDD
— Type. Julia type to handle pair RDDs. Can handle pipelining of operations to reduce interprocess IO.
Spark.PipelinedPairRDD
— Method. Params:
parentrdd - parent RDD
func - function of type (index, iterator) -> iterator to apply to each partition
Spark.PipelinedRDD
— Type. Julia type to handle RDDs. Can handle pipelining of operations to reduce interprocess IO.
Spark.PipelinedRDD
— Method. Params:
parentrdd - parent RDD
func - function of type (index, iterator) -> iterator to apply to each partition
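In normal use these pipelined RDDs are created internally by map, flat_map, and friends; still, a sketch of the partition-function shape the constructor expects:

```julia
# (index, iterator) -> iterator; a generator keeps evaluation lazy.
func = (index, it) -> (uppercase(line) for line in it)
piped = PipelinedRDD(parent_rdd, func)   # parent_rdd assumed to exist
```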
Spark.add_file
— Method. Add a file to the SparkContext. The file will be downloaded to each executor's work directory
Spark.add_jar
— Method. Add a JAR file to the SparkContext. Classes from this JAR will then be available to all tasks
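A sketch with hypothetical paths (sc as in the earlier examples):

```julia
add_file(sc, "/path/to/lookup.csv")     # shipped to executors' work dirs
add_jar(sc, "/path/to/extra-lib.jar")   # classes become visible to tasks
```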
Spark.chain_function
— Method. Chain two partition functions together
Spark.collect_internal
— Method. Collects the RDD to the Julia process by serializing all values via a byte array
Spark.collect_internal_itr
— Method. Collects the RDD to the Julia process via a Julia iterator that fetches one row at a time. This avoids creating a byte array containing all rows at once.
Spark.collect_itr
— Method. Collect all elements of rdd on the driver machine
Spark.collect_itr
— Method. Collect all elements of rdd on the driver machine
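A sketch of iterating lazily instead of materializing everything at once (lens as in the earlier map example):

```julia
# collect(lens) builds the full result; collect_itr yields row by row.
for x in collect_itr(lens)
    println(x)
end
```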
Spark.context
— Method. Get the SparkContext of this RDD
Spark.create_flat_map_function
— Method. Creates a function that operates on a partition from an element-by-element flat_map function
Spark.create_map_function
— Method. Creates a function that operates on a partition from an element-by-element map function
Spark.deserialized
— Method. Return an object deserialized from an array of bytes
Spark.readobj
— Method. Read a data object from an IO stream. Returns a code and a byte array:
if the code is negative, it is treated as a special command code
if the code is positive, it is treated as the array length
Spark.serialized
— Method. Return the object serialized as an array of bytes
Spark.writeobj
— Method. Write an object to a stream
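A round-trip sketch, assuming serialized and deserialized are inverses of each other:

```julia
bytes = Spark.serialized([1, 2, 3])   # object -> byte array (assumed)
obj   = Spark.deserialized(bytes)     # byte array -> object
@assert obj == [1, 2, 3]
```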