Rodrigo Hernández Mota
whoami
Data Engineer / Scientist at LeadGenius
Predictive Model Markup Language (PMML) is:
"an XML-based language that provides a way for applications to define statistical and data-mining models as well as to share models between PMML-compliant applications."
Integration with the most popular ML frameworks via JPMML:
We can perform model scoring either with a stream-processing engine or a stream-processing library.
We can use Akka Streams, a stream-processing library built on Akka actors (see syntax example).
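Here is a minimal sketch of per-record scoring in a stream, under stated assumptions. To keep it runnable without dependencies, a plain Scala `Iterator` stands in for an Akka Streams `Source`. A fixed linear function, built from the hypothetical `weights` and `intercept`, stands in for a model evaluator loaded from a PMML file. In Akka Streams, the `map(score)` step would sit in a `Flow` stage.

```scala
// Hypothetical record and result types, for illustration only
final case class Record(id: String, features: Map[String, Double])
final case class Score(id: String, value: Double)

// Hypothetical model: a fixed linear function standing in for a PMML-backed evaluator
val weights: Map[String, Double] = Map("visits" -> 0.5, "emails" -> 2.0)
val intercept: Double = 1.0

def score(r: Record): Score = {
  // Missing input fields default to 0.0
  val linear = weights.map { case (name, w) => w * r.features.getOrElse(name, 0.0) }.sum
  Score(r.id, intercept + linear)
}

// Stand-in for an unbounded source: the Iterator yields records one at a time
def incoming: Iterator[Record] = Iterator(
  Record("a", Map("visits" -> 4.0, "emails" -> 1.0)),
  Record("b", Map("visits" -> 0.0))
)

// source -> map(score) -> sink: each record is scored lazily as it is pulled
val scores: List[Score] = incoming.map(score).toList
```

Because each element is scored independently as it arrives, the same `score` function can be reused unchanged whether the source is a finite batch or a live stream.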
According to their website,
"Apache Spark is a unified analytics engine for large-scale data processing."
Spark ML is a practical and scalable machine learning library built on top of Spark's Dataset API.
Dataset[A].map(fn: A => B): Dataset[B]
Dataset[A].flatMap(fn: A => TraversableOnce[B]): Dataset[B]
Dataset[A].filter(fn: A => Boolean): Dataset[A]
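To make these signatures concrete, the sketch below applies the same combinators to an ordinary Scala `Seq`. A `Dataset` exposes the same shapes but evaluates lazily and distributes the work across a cluster. The `Lead` case class and its data are hypothetical.

```scala
// Hypothetical record type and data, for illustration only
final case class Lead(company: String, employees: Int, tags: Seq[String])

val leads: Seq[Lead] = Seq(
  Lead("Acme", 120, Seq("saas", "b2b")),
  Lead("Globex", 8, Seq("retail")),
  Lead("Initech", 45, Seq("saas"))
)

// map: A => B, one output per input
val names: Seq[String] = leads.map(_.company)

// flatMap: A => collection of B, results concatenated into a single Seq[B]
val allTags: Seq[String] = leads.flatMap(_.tags)

// filter: A => Boolean, keeps only the elements for which the predicate holds
val midSize: Seq[Lead] = leads.filter(_.employees >= 20)
```

With Spark, a `Dataset[Lead]` created via `spark.createDataset(leads)` (after `import spark.implicits._`) should accept the same three calls.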
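DataFrame = Dataset[Row]
Transformer: converts one DataFrame into another via transform()
Estimator: learns from a DataFrame via fit() and returns a Transformer (a model)
Pipeline: an ordered sequence of Transformers and Estimators whose fit() returns a PipelineModel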
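The relationship between Transformers, Estimators and Pipelines can be sketched in plain Scala. The types below (`Features`, `Table`, `Stage`, `Transformer`, `Estimator`, `Pipeline`, `PipelineModel`) are hypothetical single-machine stand-ins, not the `org.apache.spark.ml` classes. They only mirror the contract: an Estimator's `fit` produces a Transformer, and fitting a Pipeline yields a model that chains Transformers in order. `Doubler` and `MeanCenterer` are made-up example stages.

```scala
// Hypothetical stand-ins for Spark ML's abstractions (NOT org.apache.spark.ml)
type Features = Map[String, Double]
type Table = Seq[Features]

trait Stage
// Transformer: turns one table into another
trait Transformer extends Stage { def transform(t: Table): Table }
// Estimator: learns from a table and returns a fitted Transformer (a "model")
trait Estimator extends Stage { def fit(t: Table): Transformer }

// A fitted pipeline applies its Transformers in order
final case class PipelineModel(stages: Seq[Transformer]) extends Transformer {
  def transform(t: Table): Table = stages.foldLeft(t)((acc, s) => s.transform(acc))
}

// A Pipeline is itself an Estimator: fitting walks the stages in order and
// fits each Estimator on the output of the stages before it
final case class Pipeline(stages: Seq[Stage]) extends Estimator {
  def fit(t: Table): PipelineModel = {
    val (fitted, _) = stages.foldLeft((Vector.empty[Transformer], t)) {
      case ((acc, data), tr: Transformer) => (acc :+ tr, tr.transform(data))
      case ((acc, data), est: Estimator) =>
        val model = est.fit(data)
        (acc :+ model, model.transform(data))
    }
    PipelineModel(fitted)
  }
}

// Hypothetical example stages
final case class Doubler(col: String) extends Transformer {
  def transform(t: Table): Table = t.map(r => r + (s"${col}_x2" -> r(col) * 2))
}

final case class MeanCenterer(col: String) extends Estimator {
  def fit(t: Table): Transformer = {
    val mean = t.map(r => r(col)).sum / t.size // learned during fit
    new Transformer {
      def transform(t2: Table): Table = t2.map(r => r + (col -> (r(col) - mean)))
    }
  }
}

val training: Table = Seq(Map("x" -> 1.0), Map("x" -> 3.0))
val model = Pipeline(Seq(Doubler("x"), MeanCenterer("x"))).fit(training)
val scored = model.transform(Seq(Map("x" -> 5.0)))
```

Fitting on x = 1.0 and 3.0 learns a mean of 2.0, so scoring x = 5.0 yields x = 3.0 and x_x2 = 10.0. The fitted `PipelineModel` is the artifact that later gets exported to PMML.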
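import org.jpmml.sparkml.PMMLBuilder
// schema: StructType of the training DataFrame; pipelineModel: the fitted PipelineModel
val pmmlBuilder = new PMMLBuilder(schema, pipelineModel)
val pmml = pmmlBuilder.build() // returns an org.dmg.pmml.PMML object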
See the official jpmml-sparkml GitHub repository for the complete list of supported PipelineStage types.
We can use Openscoring, a Java-based REST web service, as the scoring engine for the resulting PMML model.
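The following sketch builds the two HTTP requests involved, without sending them, using only the JDK's `java.net.http` client. It assumes an Openscoring server on `localhost:8080` and a hypothetical model id `LeadScore`. As we understand the Openscoring REST API, a model is deployed with `PUT /openscoring/model/{id}` (PMML body) and scored with `POST` to the same URL (JSON body with an `id` and an `arguments` object). Check the Openscoring documentation for your version before relying on these paths.

```scala
import java.net.URI
import java.net.http.HttpRequest
import java.net.http.HttpRequest.BodyPublishers

// Assumptions: Openscoring on localhost:8080, hypothetical model id "LeadScore".
// Requests are only built here; nothing is sent over the network.
val baseUrl = "http://localhost:8080/openscoring/model/LeadScore"

// Deploy: PUT the PMML document (e.g. the XML serialized from PMMLBuilder's result)
def deployRequest(pmmlXml: String): HttpRequest =
  HttpRequest.newBuilder(URI.create(baseUrl))
    .header("Content-Type", "text/xml")
    .PUT(BodyPublishers.ofString(pmmlXml))
    .build()

// Evaluation body: a record id plus the model's input fields.
// Naive JSON: assumes field names need no escaping.
def evaluationJson(recordId: String, args: Map[String, Double]): String = {
  val argsJson = args.map { case (k, v) => "\"" + k + "\": " + v }.mkString(", ")
  s"""{"id": "$recordId", "arguments": {$argsJson}}"""
}

// Score: POST the JSON evaluation request to the same model URL
def evaluateRequest(recordId: String, args: Map[String, Double]): HttpRequest =
  HttpRequest.newBuilder(URI.create(baseUrl))
    .header("Content-Type", "application/json")
    .POST(BodyPublishers.ofString(evaluationJson(recordId, args)))
    .build()

val req = evaluateRequest("record-001", Map("visits" -> 4.0, "emails" -> 1.0))
// Sending requires a running server, e.g.:
// java.net.http.HttpClient.newHttpClient().send(req, java.net.http.HttpResponse.BodyHandlers.ofString())
```

Keeping the scoring engine behind a REST endpoint means any client, not only JVM code, can request predictions from the exported model.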