```python
>>> from pyspark.mllib.linalg import Vectors
>>> x = Vectors.dense([0.0, 1.0, 0.0, 7.0, 0.0])
>>> x
DenseVector([0.0, 1.0, 0.0, 7.0, 0.0])
>>> -x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: func() takes exactly 2 arguments (1 given)
>>> y = Vectors.dense([2.0, 0.0, 3.0, 4.0, 5.0])
>>> y
DenseVector([2.0, 0.0, 3.0, 4.0, 5.0])
>>> x-y
DenseVector([-2.0, 1.0, -3.0, 3.0, -5.0])
>>> -y+x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: func() takes exactly 2 arguments (1 given)
>>> -1*x
DenseVector([-0.0, -1.0, -0.0, -7.0, -0.0])
```
Clearly, the unary operator - (minus) fails for vectors, giving errors for expressions like -x and -y+x, even though x-y behaves as expected. The result of the last operation, -1*x, although mathematically “correct”, includes minus signs for the zero entries, which again is normally not expected.
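The minus signs on the zero entries are not specific to Spark: they come from IEEE 754 floating-point arithmetic, which has a signed zero. Multiplying +0.0 by -1 yields -0.0, a value that compares equal to 0.0 but prints with a sign. The sketch below uses plain Python lists as stand-ins for the vector entries (an assumption made so it runs without Spark; Python floats follow the same IEEE 754 rules) and shows that subtracting from +0.0, instead of multiplying by -1, keeps the zeros unsigned.

```python
import math

# Plain Python floats stand in for the entries of x (assumption: same IEEE 754 semantics)
values = [0.0, 1.0, 0.0, 7.0, 0.0]

# Multiplying by -1 flips the sign bit of every entry, including the zeros
scaled = [-1 * v for v in values]
print(scaled)                    # [-0.0, -1.0, -0.0, -7.0, -0.0]

# -0.0 is equal to 0.0; only the sign bit differs
print(-0.0 == 0.0)               # True
print(math.copysign(1.0, -0.0))  # -1.0

# 0.0 - 0.0 gives +0.0 under the default rounding mode,
# so subtracting from zero negates without producing -0.0
negated = [0.0 - v for v in values]
print(negated)                   # [0.0, -1.0, 0.0, -7.0, 0.0]
```

By the same arithmetic, subtracting x from an all-zeros vector with the binary minus (which, as x-y shows, does work) should presumably give a negated vector without -0.0 entries; treat that as an untested assumption rather than documented MLlib behavior.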
I thought I had discovered a bug, so I opened an issue in the Spark JIRA, titled Unary operator “-” fails for MLlib vectors.
One hour later, the issue was closed as “Not an issue”.
Joseph K. Bradley, a frequent Spark contributor working for Databricks and the person who had closed the issue, commented as follows:
There simply isn’t a unary operation. There are ongoing discussions about turning MLlib vectors and matrices into a full-fledged local linear algebra library, but currently, you could convert to numpy/scipy and use those library for pyspark.
That was a surprise. I had certainly done a fair amount of searching, both in the documentation and on Google, before raising the issue, without results; I tried again, hoping to find some trace of the issue or of the “ongoing discussions” mentioned in Bradley’s comment.
This second search attempt was fruitless, just like the first one; I could not find any mention of this supposedly known issue, even in the recently published academic papers on MLlib and its linalg submodule (Bradley is a co-author of the former).
I commented back in the Spark JIRA:
If this is the case, then a warning/clarification in the documentation wouldn’t hurt – Spark users are not supposed to be aware of the internal “ongoing discussions” between Spark developers (BTW, any relevant link would be very welcome – I could not find any mention in MLlib & Breeze docs, neither in the recent preprint papers on linalg & MLlib).
All in all, I suggest you re-open the issue with a different type (it’s not a bug, as you say), and the required resolution being a notification in the relevant docs (“don’t try this…, because…”).
Fortunately, Bradley agreed that this was a good point; he reopened the issue, changing its title to Document limitations of MLlib local linear algebra.
So, until this clarification finds its way into the documentation, you now know that Spark MLlib’s local linear algebra types are meant to provide only simple functionality; they are not a full-fledged local linear algebra library.
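Why does -x fail with a complaint about argument counts while x-y works? In Python, x - y is dispatched to the binary method __sub__ (which receives two arguments), whereas -x is dispatched to the unary method __neg__, which receives only the vector itself. The sketch below uses two hypothetical classes, MiniVector and BrokenVector, that are not part of PySpark and make no claim about how MLlib's DenseVector is written internally. MiniVector defines all three operators, so x-y, -x and -y+x all work; BrokenVector binds __neg__ to a function that expects a second argument, which yields a TypeError of the same general shape as the one in the session.

```python
class MiniVector:
    """Hypothetical minimal dense vector for illustration; not MLlib's code."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def __add__(self, other):
        # x + y calls type(x).__add__(x, y): a binary operator, two arguments
        return MiniVector(a + b for a, b in zip(self.values, other.values))

    def __sub__(self, other):
        # x - y calls type(x).__sub__(x, y): also binary
        return MiniVector(a - b for a, b in zip(self.values, other.values))

    def __neg__(self):
        # -x calls type(x).__neg__(x): unary, so only `self` is passed.
        # Subtracting from +0.0 keeps zero entries as +0.0 instead of -0.0.
        return MiniVector(0.0 - a for a in self.values)

    def __repr__(self):
        return "MiniVector(%r)" % (self.values,)


class BrokenVector(MiniVector):
    """Hypothetical: __neg__ bound to a function that expects two arguments."""

    def _binary_helper(self, other):
        return MiniVector(a - b for a, b in zip(self.values, other.values))

    # Python calls __neg__ with `self` only, so `other` is never supplied
    __neg__ = _binary_helper


x = MiniVector([0.0, 1.0, 0.0, 7.0, 0.0])
y = MiniVector([2.0, 0.0, 3.0, 4.0, 5.0])
print(x - y)   # MiniVector([-2.0, 1.0, -3.0, 3.0, -5.0])
print(-y + x)  # MiniVector([-2.0, 1.0, -3.0, 3.0, -5.0])
print(-x)      # MiniVector([0.0, -1.0, 0.0, -7.0, 0.0])

try:
    -BrokenVector([1.0])
except TypeError as err:
    # The exact wording of the message depends on the Python version
    print("TypeError:", err)
```

The point of the design is simply that binary and unary minus are separate hooks in Python: supporting one says nothing about the other.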