Graph analysis of Stack Overflow tags with Oracle PGX – Part 2: Incremental Updates

Panagiotis Konstantinidis Data Engineering, Oracle NoSQL, Oracle PGX

This second part of our three-part blog post series (see Part 1) deals with incremental data updates. In our scenario we assume that we acquire small batches of data updates using some kind of web scraping mechanism. We will not deal with the details of that mechanism, as it is beyond the scope of this …

Graph analysis of Stack Overflow tags with Oracle PGX – Part 1: Data Engineering

Panagiotis Konstantinidis Big Data, Data Engineering, Oracle NoSQL, Oracle PGX

Introduction
Oracle Parallel Graph Analytics (PGX) is a toolkit for graph analysis, both for running algorithms such as PageRank and for performing SQL-like pattern matching against graphs. Extreme performance is achieved through algorithm parallelization, and graphs can be loaded from a variety of sources, such as flat files and SQL and NoSQL databases. So, to get a deeper feel, …

Enabling the Green-Marl compiler for Parallel Graph Analytics in Oracle Big Data Lite VM

Panagiotis Konstantinidis Oracle Big Data Spatial & Graph

Recently, I began working with Parallel Graph Analytics (PGX) on my Oracle Big Data Lite (BDL) VM version 4.7.0.1. I was especially curious about the capabilities of a PGX component called Green-Marl (GM), a domain-specific language designed specifically for graph data analysis. It was claimed to extend PGX's capabilities and to “implement algorithms with no limit”. The last claim in particular …

Generative Adversarial Networks (GANs) by Udacity – The complete YouTube playlist

Christos - Iraklis Tsatsoulis Data Science, Deep Learning

Have you heard of Generative Adversarial Networks (GANs)? If you are in the machine learning & data science business, you should – it has been argued that GANs will change the world, and Yann LeCun, one of the pioneers of modern deep learning and currently Director of Artificial Intelligence Research at Facebook, has called them “the coolest idea in deep learning …

Streaming data from Raspberry Pi to Oracle NoSQL via Node-RED

Christos - Iraklis Tsatsoulis Internet of Things, Node-RED, Oracle NoSQL, Raspberry Pi

Starting with version 4.2, Oracle NoSQL offers drivers for Node.js and Python, in addition to the existing ones for Java, C, and C++. This is good news for data science people like myself, since we are usually not accustomed to coding in Java or C/C++. So I decided to build a short demo project, putting to the test both the …

Workaround for disclosed af:query issue on ADF 12.2.1

Michael Koniotakis Oracle ADF

While migrating ADF applications from 11 to 12, we ran into a known bug in ADF 12.2.1, Bug 22469635: “AF:QUERY COMPONENT BUTTONS DON’T WORK WHEN ‘DISCLOSED’ SET TO ‘FALSE’”. Since this has not been resolved yet, we needed a workaround. There were search criteria with disclosed="false" on pages with a rich layout, which the user should expand only if he …

sparklyr: a test drive on YARN

Christos - Iraklis Tsatsoulis R, Spark

sparklyr is a new R front-end for Apache Spark, developed by the good people at RStudio. It offers much more functionality than the existing SparkR interface by Databricks, allowing both dplyr-based data transformations and access to the machine learning libraries of Spark and H2O Sparkling Water. Moreover, the latest RStudio IDE v1.0 now offers native support …

Nonlinear regression using Spark – Part 2: sum-of-squares objective functions

Constantinos Voglis Data Science, Spark

This post is the second in a series that discusses algorithmic and implementation issues in nonlinear regression with Spark. In the previous post we identified a small opening for contributing to Spark MLlib: adding methods for nonlinear regression, starting with the definition and implementation of a general nonlinear model. We remind the reader that regression is essentially an …
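For reference, the sum-of-squares objective named in the title is, in its standard textbook form, the quantity a nonlinear least-squares fit minimizes. The notation below (model $f$, parameter vector $\boldsymbol{\theta}$, and $n$ observations $(\mathbf{x}_i, y_i)$) is our own and may differ from the one used in the post:

$$
S(\boldsymbol{\theta}) = \sum_{i=1}^{n} r_i(\boldsymbol{\theta})^2 = \sum_{i=1}^{n} \bigl(y_i - f(\mathbf{x}_i; \boldsymbol{\theta})\bigr)^2,
\qquad
\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} S(\boldsymbol{\theta})
$$

When $f$ is linear in $\boldsymbol{\theta}$, this reduces to ordinary least squares, which has a closed-form solution; for a general nonlinear $f$ there is usually no closed form, so $S(\boldsymbol{\theta})$ is minimized iteratively.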

Classification in Spark 2.0: “Input validation failed” and other wondrous tales

Christos - Iraklis Tsatsoulis Data Science, Spark

Spark 2.0 was released last July but, despite its numerous improvements and new features, several annoyances remain that can cause headaches, especially in the Spark machine learning APIs. Today we’ll look at some of them, inspired by a recent answer of mine to a Stack Overflow question (the question was about Spark 1.6 but, as …