Spark data frames from CSV files: handling headers & column types

Christos - Iraklis Tsatsoulis Big Data, Spark 15 Comments

If you come from the R (or Python/pandas) universe, like me, you probably take it for granted that working with CSV files is one of the most natural and straightforward things in a data analysis context. Indeed, if you have your data in a CSV file, practically the only thing you have to do from R is to fire …

Cloudera Manager configuration issues in Oracle Big Data Lite VM 4.1.0

Christos - Iraklis Tsatsoulis Big Data, Hadoop 2 Comments

Oracle has recently announced the release of a new version (4.1.0) of its Big Data Lite VM. Compared to the previous release (4.0.1), it ships more recent versions of Oracle Enterprise Linux (6.5), Oracle NoSQL Database (3.2.5), the Cloudera distribution of Apache Hadoop (CDH 5.3.0), and Cloudera Manager (5.3.0). The new version of CDH, by itself, also brings forward several …