Spark.JavaRDD — Type. Pure wrapper around JavaRDD.
Spark.SparkContext — Type. Wrapper around JavaSparkContext.
Spark.SparkContext — Method. Params:
master - address of the application master. Currently only local and standalone modes are supported. Defaults to 'local'.
appname - name of the application.
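A minimal end-to-end sketch using this constructor. The keyword arguments follow the params above; the Spark.init() JVM bootstrap and the file path are assumptions based on older Spark.jl releases:

```julia
using Spark
Spark.init()   # assumption: JVM must be initialised before creating a context

sc = SparkContext(master="local", appname="demo")

rdd = text_file(sc, "file:///tmp/input.txt")   # placeholder path
println(count(rdd))                            # number of lines in the file

close(sc)                                      # release the JVM-side context
```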
Base.close — Method. Close the SparkContext.
Base.collect — Method. Collect all elements of the RDD on the driver machine.
Base.count — Method. Count the number of elements in this RDD.
Base.map — Method. Apply function f to each element of the RDD.
Base.reduce — Method. Reduce the elements of the RDD using the specified function f.
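Together these support the usual driver-side workflow. A sketch, reusing the sc context from the example above:

```julia
lines   = text_file(sc, "file:///tmp/input.txt")  # placeholder path
lengths = map(lines, line -> length(line))        # per-line character counts
total   = reduce(lengths, +)                      # aggregated on the cluster
sample  = collect(lengths)                        # bring all values to the driver
println(total, " characters across ", count(lines), " lines")
```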
Spark.cache — Method. Persist this RDD with the default storage level (MEMORY_ONLY).
Spark.cartesian — Method. Create a pair RDD containing every combination of the values of rdd1 and rdd2.
Spark.coalesce — Method. Return a new RDD that is reduced into num_partitions partitions.
Spark.flat_map — Method. Similar to map, but each input item can be mapped to 0 or more output items (so f should return an iterator rather than a single item).
Spark.flat_map_pair — Method. Similar to map, but each input item can be mapped to 0 or more output items (so f should return an iterator of pairs rather than a single item).
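Splitting lines into words is the canonical flat_map example. A sketch, where lines is a text-file RDD as above and representing pairs as tuples is an assumption:

```julia
# Each line yields zero or more words; split returns an iterable, as required.
words = flat_map(lines, line -> split(line))

# flat_map_pair has the same shape, but every yielded item must be a pair.
pairs = flat_map_pair(lines, line -> [(word, 1) for word in split(line)])
```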
Spark.group_by_key — Method. When called on a dataset of (K, V) pairs, returns a dataset of (K, [V]) pairs.
Spark.id — Method. Return the ID of this RDD.
Spark.map_pair — Method. Apply function f to each element of the RDD; f should return a key-value pair.
Spark.map_partitions — Method. Apply function f to each partition of the RDD; f should be of type (iterator) -> iterator.
Spark.map_partitions_pair — Method. Apply function f to each partition of the RDD; f should be of type (iterator) -> iterator.
Spark.map_partitions_with_index — Method. Apply function f to each partition of the RDD; f should be of type (index, iterator) -> iterator.
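A sketch of the partition-level variants. Because f sees a whole partition, one-time setup (e.g. opening a connection) can happen once per partition rather than once per element:

```julia
# Tag each partition's element count with its index: f is (index, iterator) -> iterator.
per_part = map_partitions_with_index(rdd,
    (index, it) -> [(index, length(collect(it)))])

# Without the index, f is (iterator) -> iterator; returning a lazy generator is fine.
shouted = map_partitions(rdd, it -> (uppercase(x) for x in it))
```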
Spark.num_partitions — Method. Return the number of partitions of this RDD.
Spark.pipe — Method. Return an RDD created by piping elements to a forked external process.
Spark.reduce_by_key — Method. When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function f, which must be of type (V, V) -> V.
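The pair operations compose into the usual word count. A sketch, under the same tuple-pair assumption as above:

```julia
lines  = text_file(sc, "file:///tmp/input.txt")                # placeholder path
pairs  = flat_map_pair(lines, l -> [(w, 1) for w in split(l)])
counts = reduce_by_key(pairs, +)                               # (word, total) pairs
println(collect(counts))

# group_by_key would instead yield (word, [1, 1, ...]) with values unaggregated.
```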
Spark.repartition — Method. Return a new RDD that has exactly num_partitions partitions.
Spark.share_variable — Method. Make the value of data available on workers as the symbol name.
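A sketch of sharing a lookup table with workers; the argument order and how worker-side closures resolve the symbol are assumptions:

```julia
lookup = Dict("a" => 1, "b" => 2)
share_variable(sc, :lookup, lookup)   # assumed argument order: context, name, data
# Task closures running on workers can then refer to `lookup` by that symbol.
```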
Spark.text_file — Method. Create an RDD from a text file.
Spark.FlatMapIterator — Type. Iterates over the iterators within an iterator.
Spark.JavaPairRDD — Type. Pure wrapper around JavaPairRDD.
Spark.PipelinedPairRDD — Type. Julia type to handle pair RDDs. Can pipeline operations to reduce interprocess IO.
Spark.PipelinedPairRDD — Method. Params:
parentrdd - parent RDD
func - function of type (index, iterator) -> iterator to apply to each partition
Spark.PipelinedRDD — Type. Julia type to handle RDDs. Can pipeline operations to reduce interprocess IO.
Spark.PipelinedRDD — Method. Params:
parentrdd - parent RDD
func - function of type (index, iterator) -> iterator to apply to each partition
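To see why pipelining matters, consider two chained maps; conceptually the PipelinedRDD fuses them into a single partition function, so intermediate values never cross the Julia/JVM process boundary (an illustrative sketch):

```julia
# Two chained element functions...
lengths_plus_one = map(map(rdd, line -> length(line)), n -> n + 1)
# ...run as one composed pass, n -> length(line) + 1, over each partition.
```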
Spark.add_file — Method. Add a file to the SparkContext. This file will be downloaded to each executor's work directory.
Spark.add_jar — Method. Add a JAR file to the SparkContext. Classes from this JAR will then be available to all tasks.
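Both helpers take the context and a path (a sketch with placeholder paths):

```julia
add_file(sc, "/tmp/lookup.csv")   # shipped to each executor's work directory
add_jar(sc, "/tmp/udfs.jar")      # JVM classes become visible to all tasks
```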
Spark.chain_function — Method. Chain two partition functions together.
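Conceptually, chaining composes two (index, iterator) -> iterator functions so the second consumes the first's output directly (an illustrative sketch, not the package's exact implementation):

```julia
# Feed the output iterator of f into g, preserving the partition-function shape.
chain_sketch(f, g) = (index, it) -> g(index, f(index, it))
```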
Spark.collect_internal — Method. Collect the RDD to the Julia process by serialising all values via a single byte array.
Spark.collect_internal_itr — Method. Collect the RDD to the Julia process via a Julia iterator that fetches one row at a time. This avoids creating a byte array containing all rows at once.
Spark.collect_itr — Method. Collect all elements of the RDD on the driver machine, as an iterator.
Spark.context — Method. Get the SparkContext of this RDD.
Spark.create_flat_map_function — Method. Create a function that operates on a partition from an element-by-element flat_map function.
Spark.create_map_function — Method. Create a function that operates on a partition from an element-by-element map function.
Spark.deserialized — Method. Return an object deserialized from an array of bytes.
Spark.readobj — Method. Read a data object from an IO stream. Returns a code and a byte array:
if the code is negative, it is treated as a special command code
if the code is positive, it is treated as the array length
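A sketch of reading under this framing; the 4-byte big-endian length prefix is an assumption about the wire format:

```julia
function readobj_sketch(io::IO)
    code = Int(ntoh(read(io, Int32)))   # assumed: 4-byte big-endian prefix
    code < 0 && return code, UInt8[]    # special command code, no payload
    return code, read(io, code)         # otherwise `code` bytes of payload follow
end
```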
Spark.serialized — Method. Return the serialized object as an array of bytes.
Spark.writeobj — Method. Write an object to a stream.
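The serialization helpers can be illustrated together. A sketch; using the standard library's Serialization module is an assumption about the codec, and writeobj_sketch is the write-side counterpart of readobj_sketch above:

```julia
using Serialization

# Round trip for serialized/deserialized via Julia's built-in serializer.
serialized_sketch(obj)     = (buf = IOBuffer(); serialize(buf, obj); take!(buf))
deserialized_sketch(bytes) = deserialize(IOBuffer(bytes))

# Write side: length prefix (assumed 4-byte big-endian), then the bytes.
function writeobj_sketch(io::IO, bytes::Vector{UInt8})
    write(io, hton(Int32(length(bytes))))
    write(io, bytes)
end

@assert deserialized_sketch(serialized_sketch([1, 2, 3])) == [1, 2, 3]
```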