Nonlinear regression using Spark – Part 2: sum-of-squares objective functions

Constantinos Voglis | Data Science, Spark | 4 Comments

This post is the second in a series that discusses algorithmic and implementation issues in nonlinear regression using Spark. In the previous post we identified a small window for contributing to Spark MLlib by adding methods for nonlinear regression, starting with the definition and implementation of a general nonlinear model. We remind the reader that regression is essentially an …

Classification in Spark 2.0: “Input validation failed” and other wondrous tales

Christos - Iraklis Tsatsoulis | Data Science, Spark | 7 Comments

Spark 2.0 was released last July but, despite the numerous improvements and new features, several annoyances remain and can cause headaches, especially in the Spark machine learning APIs. Today we’ll have a look at some of them, inspired by a recent answer of mine to a Stack Overflow question (the question was about Spark 1.6 but, as …

Nonlinear regression using Spark – Part 1: Nonlinear models

Constantinos Voglis | Spark | 2 Comments

Regression constitutes a very important topic in supervised learning. Its goal is to predict the value of one or more continuous target variables (responses) given the value of a $D$-dimensional vector $\mathbf{x}$ of input variables (predictors). More specifically, given a training data set comprising $N$ observations $\{\mathbf{x}_n\}$, where $n = 1, \dots, N$, together with corresponding target values $\{t_n\}$, the goal is to predict the …