Bulk load data to HBase in Oracle Big Data Appliance

Christos - Iraklis Tsatsoulis Big Data, HBase

I recently ran into an issue while trying to bulk load some data to HBase in Oracle Big Data Appliance. Following is a reproducible description and solution, using the current version of Oracle Big Data Lite VM (4.4.0).

Enabling HBase in Oracle Big Data Lite VM

(Feel free to skip this section if you do not use Oracle Big Data …

Installing the additional R packages in Oracle Big Data Lite VM 4.4.0

Christos - Iraklis Tsatsoulis R

In the just-released version 4.4.0 of Oracle Big Data Lite VM, as in the previous one (4.3.0.1), there is a rather large number of additional R packages to be installed by the provided script install_additional_packages.sh: 28 packages, not counting their dependencies (the respective number in version 4.2.1 was only 10). Unfortunately, what has also changed is the form of …

Using ROracle with Oracle Instant Client 12c

Christos - Iraklis Tsatsoulis Oracle R, R

The other day, while setting up the new Oracle R Enterprise (ORE) 1.5 client packages on a Linux server, we installed Oracle DB Instant Client 12.1, as advised in the relevant documentation. The problem was that ORE failed to load, due to a ROracle failure. Truth is, the file libclntsh.so.11.1 did not exist, but this was expected, simply due …

Querying Big Data SQL tables with Oracle R Enterprise

Christos - Iraklis Tsatsoulis Big Data, Oracle Big Data SQL, Oracle R

I was wondering recently whether I could use Oracle R Enterprise (ORE) to query Big Data SQL tables (i.e. Oracle Database external tables based on HDFS or Hive data), since I had never seen such a combination mentioned in the relevant Oracle documentation and white papers. I am happy to announce that the answer is an unconditional yes. In this …

Nonlinear regression using Spark – Part 1: Nonlinear models

Constantinos Voglis Spark

Regression constitutes a very important topic in supervised learning. Its goal is to predict the value of one or more continuous target variables (responses) given the value of a $D$-dimensional vector $\mathbf{x}$ of input variables (predictors). More specifically, given a training data set comprising $N$ observations $\{\mathbf{x}_n\}$, where $n = 1, \ldots, N$, together with corresponding target values $\{t_n\}$, the goal is to predict the …
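To make the regression setting concrete, here is a minimal sketch in plain NumPy (not Spark, which the post targets) of fitting a model that is nonlinear in its parameters, $y(x; a, b) = a e^{b x}$, to observations $\{(x_n, t_n)\}$ by Gauss-Newton least squares with step halving, and then predicting $t$ for a new $x$. The exponential model, the synthetic data, the starting values and the name `fit_exponential` are illustrative assumptions, not taken from the post.

```python
import numpy as np


def fit_exponential(x, t, a0=1.0, b0=0.1, n_iter=100):
    """Least-squares fit of t ~ a * exp(b * x) by Gauss-Newton with step halving.

    Illustrative sketch: the model form and the starting values are assumptions.
    """
    params = np.array([a0, b0], dtype=float)

    def sse(p):
        # Sum of squared residuals for parameters p = (a, b)
        return float(np.sum((t - p[0] * np.exp(p[1] * x)) ** 2))

    current = sse(params)
    for _ in range(n_iter):
        a, b = params
        e = np.exp(b * x)
        residual = t - a * e
        # Jacobian of the model a * exp(b * x) with respect to (a, b)
        J = np.column_stack([e, a * x * e])
        # Gauss-Newton step: least-squares solution of J @ step = residual
        step, *_ = np.linalg.lstsq(J, residual, rcond=None)
        # Halve the step until the sum of squares does not increase
        scale = 1.0
        while scale > 1e-10:
            candidate = params + scale * step
            value = sse(candidate)
            if value <= current:
                break
            scale /= 2
        else:
            break  # no acceptable step found: stop iterating
        params, current = candidate, value
    return params


# Noiseless synthetic data generated with a = 2, b = 0.5 (illustrative values)
x = np.linspace(0.0, 2.0, 50)
t = 2.0 * np.exp(0.5 * x)
a_hat, b_hat = fit_exponential(x, t)
# Predict the response for a new input value
t_new = a_hat * np.exp(b_hat * 2.5)
print(a_hat, b_hat, t_new)
```

The step halving is a deliberately simple safeguard: a pure Gauss-Newton step can overshoot when the starting point is far from the solution, while a halved step that does not increase the sum of squares keeps the iteration monotone.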

Caution when installing Oracle R Distribution in Oracle Linux using Yum

Christos - Iraklis Tsatsoulis Oracle R

Last week we tried to install Oracle R Distribution (ORD) on Oracle Linux 7.1 using Yum, which is the installation method recommended by Oracle. After closely following the instructions provided in the documentation, instead of Oracle R Distribution 3.2.0 we found ourselves with the latest (3.2.3) version of GNU R installed. What had happened is that in our /etc/yum.repos.d, …

Limitations of Spark MLlib linear algebra module

Christos - Iraklis Tsatsoulis Spark

A couple of days ago I stumbled upon some unexpected behavior of Spark MLlib (v. 1.5.2) while trying some ultra-simple operations on vectors in PySpark: the unary minus operator (-) for vectors fails, giving errors for expressions like -x and -y+x, although x-y behaves as expected. The result of the last operation, …
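One Python mechanism that produces exactly this pattern of symptoms is sketched below; it is offered as an assumption, not as a reading of the MLlib 1.5.2 source. In Python, `x - y` dispatches to `__sub__`, while `-x` dispatches to a separate `__neg__` method, so a class that implements the former but not the latter supports `x - y` yet raises `TypeError` for `-x` and therefore also for `-y + x`. The classes `Vec` and `FixedVec` are hypothetical.

```python
class Vec:
    """Hypothetical vector type that defines binary minus but not unary minus."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def __add__(self, other):
        # x + y dispatches here
        return type(self)(a + b for a, b in zip(self.values, other.values))

    def __sub__(self, other):
        # x - y dispatches here
        return type(self)(a - b for a, b in zip(self.values, other.values))

    # No __neg__: evaluating -x raises TypeError


class FixedVec(Vec):
    """The same hypothetical type with unary minus defined explicitly."""

    def __neg__(self):
        # -x dispatches here
        return type(self)(-a for a in self.values)


x, y = Vec([1, 2]), Vec([3, 5])
print((x - y).values)  # [-2.0, -3.0]
for label, expr in (("-x", lambda: -x), ("-y+x", lambda: -y + x)):
    try:
        expr()
    except TypeError as err:
        print(label, "->", err)

fx, fy = FixedVec([1, 2]), FixedVec([3, 5])
print((-fy + fx).values)  # [-2.0, -3.0]
```

Under this assumption, the fix on the library side is simply to define `__neg__`; on the user side, `x - y` style expressions (e.g. rewriting `-y + x` as `x - y`) avoid the missing operator.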

How NOT to perform feature selection!

Christos - Iraklis Tsatsoulis Data Science

Cross-validation (CV) is nowadays widely used for model assessment in predictive analytics tasks; nevertheless, cases where it is incorrectly applied are not uncommon, especially when building the predictive model includes a feature selection stage. I was reminded of such a situation while reading this recent Revolution Analytics blog post, where CV is used to assess both the feature selection …
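To illustrate the pitfall, here is a minimal NumPy sketch (our own illustration, not code from either post). On pure-noise data, where no feature carries any information about the labels, selecting features once on the full data set and then cross-validating gives an optimistic accuracy, whereas repeating the selection inside each training fold gives an estimate near the 50% chance level. The correlation-based selection, the nearest-centroid classifier, the data sizes and all function names are illustrative assumptions.

```python
import numpy as np


def select_top_k(X, y, k):
    """Indices of the k features most correlated (in absolute value) with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(scores)[-k:]


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Fit one centroid per class on the training data and score the test data."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(X_test - c0, axis=1)
    d1 = np.linalg.norm(X_test - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return float(np.mean(pred == y_test))


def cv_accuracy(X, y, k, n_folds=5, select_inside=True, seed=0):
    """K-fold CV accuracy, with feature selection inside each fold or once on all data."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    # WRONG variant: selection sees the labels of samples later used for testing
    leaked = None if select_inside else select_top_k(X, y, k)
    accs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        # RIGHT variant: select features using the training fold only
        cols = select_top_k(X[train_idx], y[train_idx], k) if select_inside else leaked
        accs.append(nearest_centroid_accuracy(X[train_idx][:, cols], y[train_idx],
                                              X[test_idx][:, cols], y[test_idx]))
    return float(np.mean(accs))


# Pure noise: labels are independent of every feature, so true accuracy is 50%
rng = np.random.default_rng(42)
X = rng.standard_normal((50, 5000))
y = np.repeat([0, 1], 25)
print("selection outside CV:", cv_accuracy(X, y, k=20, select_inside=False))
print("selection inside CV: ", cv_accuracy(X, y, k=20, select_inside=True))
```

The only difference between the two runs is where `select_top_k` is called; the gap between the two printed accuracies is the optimism introduced by letting the feature selection stage see the test folds.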

Oracle R Enterprise 1.4: ore.make.names does not work for Oracle DB connections

Christos - Iraklis Tsatsoulis Oracle R

I have previously reported some unexpected behavior of the Oracle R Enterprise 1.4 ore.make.names function; back then, however, I had only tried it with Hive connections. Today I tried to use it with an Oracle database connection, and it doesn't seem to work. Here is a reproducible example in Oracle Big Data Lite VM 4.2.1, using the …