Wednesday, March 2, 2016

Pandas and Spark DataFrame

Spark DataFrame is a great way to do data analytics over big data, and many of its APIs are similar (though slightly different) to those of the well-adopted Python package Pandas. Recently, I have been working with both of them quite frequently, and I have found it very easy to mistake one API for the other.
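As a small illustration of how the same method name can behave differently, here is a minimal sketch on the Pandas side. The sample data and column names are made up for this example, and the Spark behavior is only described in comments (it is not executed here, and `sdf` is a hypothetical Spark DataFrame).

```python
import pandas as pd

# Hypothetical sample data, made up for this illustration.
pdf = pd.DataFrame({"name": ["a", "b", "c"], "score": [1.0, None, 3.0]})

# Pandas: count() returns the number of non-null values per column.
non_null_counts = pdf.count()
print(non_null_counts["score"])

# Pandas: the number of rows comes from len() or .shape instead.
n_rows = len(pdf)
print(n_rows)

# Pandas: head() with no argument returns a DataFrame of up to 5 rows.
first_rows = pdf.head()
print(type(first_rows).__name__)

# Spark (not run here): on a Spark DataFrame `sdf`,
#   sdf.count() returns the total number of rows as an int, and
#   sdf.head()  returns a single Row rather than a DataFrame.
```

So code that relies on `count()` or `head()` may run without error in both libraries and still mean something different in each.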

Here are several great posts comparing Pandas and Spark DataFrames:

@chris_bour/6-differences-between-pandas-and-spark-dataframes

from-pandas-to-apache-sparks-dataframe

pandarize-spark-dataframes
