Nonlinear regression using Spark – Part 2: sum-of-squares objective functions

Constantinos Voglis | Data Science, Spark | 4 Comments

This post is the second in a series that discusses algorithmic and implementation issues in nonlinear regression using Spark. In the previous post we identified a small window for contributing to Spark MLlib by adding methods for nonlinear regression, starting with the definition and implementation of a general nonlinear model. We remind the reader that regression is essentially an …

How to evaluate R models in Azure Machine Learning Studio

Constantinos Voglis | Azure Machine Learning Studio, Data Science, R | 6 Comments

Azure Machine Learning Studio is a GUI-based integrated development environment for constructing and operationalizing machine learning workflows. The basic computational unit of an Azure ML Studio workflow (or Experiment) is a module, which implements machine learning algorithms, data conversion and transformation functions, etc. Modules can be connected by data flows, thus implementing a machine learning pipeline. A typical pipeline in …

Nonlinear regression using Spark – Part 1: Nonlinear models

Constantinos Voglis | Spark | 2 Comments

Regression is a very important topic in supervised learning. Its goal is to predict the value of one or more continuous target variables (responses) given the value of a multidimensional vector of input variables (predictors). More specifically, given a training data set of observations together with their corresponding target values, the goal is to predict the …

Development and deployment of Spark applications with Scala, Eclipse, and sbt – Part 2: A Recommender System

Constantinos Voglis | Big Data, Spark | 11 Comments

In our previous post, we demonstrated how to set up the necessary software components so that we can develop and deploy Spark applications with Scala, Eclipse, and sbt. We also included an example of a simple application. In this post, we take this demonstration one step further. We discuss a more serious application, a recommender system, and present the …

Development and deployment of Spark applications with Scala, Eclipse, and sbt – Part 1: Installation & configuration

Constantinos Voglis | Big Data, Spark | 23 Comments

The purpose of this tutorial is to set up the necessary environment for developing and deploying Spark applications with Scala. Specifically, we are going to use the Eclipse IDE to develop applications and deploy them with spark-submit. The glue that ties everything together is the sbt interactive build tool. The sbt tool provides plugins used to: Create an Eclipse …