Streaming data from Raspberry Pi to Oracle NoSQL via Node-RED

Christos - Iraklis Tsatsoulis Internet of Things, Node-RED, Oracle NoSQL, Raspberry Pi

Starting from version 4.2, Oracle NoSQL offers drivers for Node.js and Python, in addition to the existing ones for Java, C, and C++; this is good news for data science people, like myself, since we are normally not accustomed to coding in Java or C/C++. So, I thought I would build a short demo project, putting to the test both the …

sparklyr: a test drive on YARN

Christos - Iraklis Tsatsoulis R, Spark 2 Comments

sparklyr is a new R front-end for Apache Spark, developed by the good people at RStudio. It offers much more functionality than the existing SparkR interface by Databricks, allowing dplyr-based data transformations as well as access to the machine learning libraries of both Spark and H2O Sparkling Water. Moreover, the latest RStudio IDE v1.0 now offers native support …

Classification in Spark 2.0: “Input validation failed” and other wondrous tales

Christos - Iraklis Tsatsoulis Data Science, Spark 7 Comments

Spark 2.0 was released last July but, despite the numerous improvements and new features, several annoyances remain and can cause headaches, especially in the Spark machine learning APIs. Today we’ll have a look at some of them, inspired by a recent answer of mine to a Stack Overflow question (the question was about Spark 1.6 but, as …

Installing the additional R packages in Oracle Big Data Lite VM 4.5.0

Christos - Iraklis Tsatsoulis R 2 Comments

Oracle has just released version 4.5.0 of the Big Data Lite VM which, when it comes to R, still suffers from the issues we had pinpointed for the previous version 4.4.0 (and then some). The first attempt to install the additional packages fails with a ‘cannot open URL’ error: Fortunately, the warning about the proxy helps to locate the issue, …

Bulk load data to HBase in Oracle Big Data Appliance

Christos - Iraklis Tsatsoulis Big Data, HBase 1 Comment

I recently ran into an issue while trying to bulk load some data to HBase in Oracle Big Data Appliance. What follows is a reproducible description and solution using the current version of Oracle Big Data Lite VM (4.4.0). Enabling HBase in Oracle Big Data Lite VM (Feel free to skip this section if you do not use Oracle Big Data …

Installing the additional R packages in Oracle Big Data Lite VM 4.4.0

Christos - Iraklis Tsatsoulis R

In the just-released version 4.4.0 of Oracle Big Data Lite VM, as in the previous one (4.3.0.1), there is a rather large number of additional R packages to be installed by the provided script install_additional_packages.sh, i.e. 28 packages without counting their dependencies (the respective number in version 4.2.1 was only 10). Unfortunately, what has also changed is the form of …

Using ROracle with Oracle Instant Client 12c

Christos - Iraklis Tsatsoulis Oracle R, R

The other day, while setting up the new Oracle R Enterprise (ORE) 1.5 client packages on a Linux server, we installed the Oracle DB Instant Client v. 12.1, as advised in the relevant documentation. Problem was, ORE failed to load, in fact due to an ROracle failure: Truth is, the file libclntsh.so.11.1 did not exist, but this was expected, simply due …

Querying Big Data SQL tables with Oracle R Enterprise

Christos - Iraklis Tsatsoulis Big Data, Oracle Big Data SQL, Oracle R 1 Comment

I was wondering recently if I could use Oracle R Enterprise (ORE) to query Big Data SQL tables (i.e. Oracle Database external tables based on HDFS or Hive data), since I have never seen such a combination mentioned in the relevant Oracle documentation and white papers. I am happy to announce that the answer is an unconditional yes. In this …