Spark data frames from CSV files: handling headers & column types

Christos - Iraklis Tsatsoulis | Big Data, Spark | 16 Comments

If you come from the R (or Python/pandas) universe, like me, you probably take it for granted that working with CSV files is one of the most natural and straightforward things in a data analysis context. Indeed, if you have your data in a CSV file, practically the only thing you have to do from R is to fire …

ViewCriteria issue when using same attribute twice (Oracle JDeveloper 12.1.3.0)

Rigas Papazisis | Fusion Middleware, Oracle ADF | 1 Comment

Some days ago, in an Oracle MAF technical workshop, I witnessed one of those awkward moments when you are certain that something should work correctly in a presentation, but it doesn’t. While showing us some basic functionality in Oracle ADF 12c, the instructor implemented a simple query using ViewCriteria. He used an OR conjunction but the query seemed to react …

Undocumented behavior of ore.make.names() function in Oracle R Enterprise

Christos - Iraklis Tsatsoulis | Oracle R | 2 Comments

While working with some data in Hive recently using the Oracle R Connectors for Hadoop (ORCH), I tried to use the ore.make.names function (from the OREbase package). The function creates valid column names for ore.frame objects. Here is a reproducible example, copied straight from the function documentation: Experimenting a little, I discovered that ore.make.names becomes functional after executing ore.connect. Indeed, …

Oracle R Enterprise issues in Oracle Big Data Lite VM 4.1.0

Christos - Iraklis Tsatsoulis | Oracle R | 4 Comments

In the previous post, we examined some configuration issues with Cloudera Manager and Hadoop services in the latest release of Oracle Big Data Lite VM (4.1.0). In this post we report issues with Oracle R Enterprise, and the remedies we applied. It turns out that if we load the ORE package in R, we subsequently cannot use the help system …

Cloudera Manager configuration issues in Oracle Big Data Lite VM 4.1.0

Christos - Iraklis Tsatsoulis | Big Data, Hadoop | 2 Comments

Oracle has recently announced the release of a new version (4.1.0) of its Big Data Lite VM. Compared to the previous release (4.0.1), we now have more recent versions of Oracle Enterprise Linux (6.5), Oracle NoSQL database (3.2.5), Cloudera distribution of Apache Hadoop (CDH 5.3.0) and Cloudera Manager (5.3.0). The new version of CDH, by itself, also brings forward several …