Dataframes from CSV files in Spark 1.5: automatic schema extraction, neat summary statistics, & elementary data exploration

Christos - Iraklis TsatsoulisBig Data, Spark 25 Comments

In a previous post, we glimpsed briefly at creating and manipulating Spark dataframes from CSV files. In the couple of months since, Spark has already gone from version 1.3.0 to 1.5, with more than 100 built-in functions introduced in Spark 1.5 alone; so, we thought it is a good time for revisiting the subject, this time also utilizing the external …