In order to keep a fair amount of consistency between the Scala and Java APIs, some
of the features that allow a high-level of expressiveness in Scala have been left
out from the standard APIs for both batch and streaming.
If you want to enjoy the full Scala experience you can choose to opt-in to
extensions that enhance the Scala API via implicit conversions.
To use all the available extensions, you can just add a simple import for the
DataSet API
or the DataStream API
Alternatively, you can import individual extensions a-là-carte to only use those
you prefer.
Accept partial functions
Normally, both the DataSet and DataStream APIs don’t accept anonymous pattern
matching functions to deconstruct tuples, case classes or collections, like the
following:
This extension introduces new methods in both the DataSet and DataStream Scala API
that have a one-to-one correspondance in the extended API. These delegating methods
do support anonymous pattern matching functions.
DataSet API
Method
Original
Example
mapWith
map (DataSet)
mapPartitionWith
mapPartition (DataSet)
flatMapWith
flatMap (DataSet)
filterWith
filter (DataSet)
reduceWith
reduce (DataSet, GroupedDataSet)
reduceGroupWith
reduceGroup (GroupedDataSet)
groupingBy
groupBy (DataSet)
sortGroupWith
sortGroup (GroupedDataSet)
combineGroupWith
combineGroup (GroupedDataSet)
projecting
apply (JoinDataSet, CrossDataSet)
projecting
apply (CoGroupDataSet)
DataStream API
Method
Original
Example
mapWith
map (DataStream)
mapPartitionWith
mapPartition (DataStream)
flatMapWith
flatMap (DataStream)
filterWith
filter (DataStream)
keyingBy
keyBy (DataStream)
mapWith
map (ConnectedDataStream)
flatMapWith
flatMap (ConnectedDataStream)
keyingBy
keyBy (ConnectedDataStream)
reduceWith
reduce (KeyedDataStream, WindowedDataStream)
foldWith
fold (KeyedDataStream, WindowedDataStream)
applyWith
apply (WindowedDataStream)
projecting
apply (JoinedDataStream)
For more information on the semantics of each method, please refer to the
DataSet and DataStream API documentation.
To use this extension exclusively, you can add the following import:
for the DataSet extensions and
The following snippet shows a minimal example of how to use these extension
methods together (with the DataSet API):