Spark DataFrame is a great way to do data analytics over big data, and many of its APIs look similar to (though not identical to) those of the widely adopted Python package Pandas. Recently, I have been working with both of them quite frequently, and I found that it is very easy to mistake one for the other.
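To give a feel for the kind of mix-up I mean, here is a minimal sketch that calls the same method names on both libraries. It assumes pandas and PySpark are installed (the Spark half also needs a local Java runtime); the column names, sample values, and the function names `pandas_side` and `spark_side` are made up for illustration.

```python
import pandas as pd


def pandas_side():
    """pandas: eager evaluation, and DataFrames can be mutated in place."""
    pdf = pd.DataFrame({"a": [1, 2, None], "b": ["x", "y", "z"]})
    non_null_counts = pdf.count()   # Series: non-null count per column
    first_rows = pdf.head(2)        # a pandas DataFrame
    pdf["c"] = pdf["a"] * 2         # adds the column to pdf itself
    return non_null_counts, first_rows, pdf


def spark_side():
    """PySpark: lazy evaluation, and DataFrames are immutable."""
    # Imported here so the pandas half still runs without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("pandas-vs-spark")  # hypothetical app name
        .getOrCreate()
    )
    sdf = spark.createDataFrame(
        [(1.0, "x"), (2.0, "y"), (None, "z")], ["a", "b"]
    )
    row_count = sdf.count()                   # int: total rows, nulls included
    first_rows = sdf.head(2)                  # list of Row objects, not a DataFrame
    sdf2 = sdf.withColumn("c", sdf["a"] * 2)  # returns a new DataFrame; sdf is unchanged
    result = (row_count, first_rows, sdf.columns, sdf2.columns)
    spark.stop()
    return result


if __name__ == "__main__":
    counts, head_rows, pdf = pandas_side()
    print(counts)
    print(type(head_rows))
    print(list(pdf.columns))
```

The method names match (`count`, `head`), but the return types do not: in Pandas `count()` gives per-column non-null counts, while in Spark it gives a single row count, and `head(2)` gives a DataFrame in one and a list of `Row` objects in the other.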
Here are several great posts comparing Pandas and Spark DataFrames:
@chris_bour/6-differences-between-pandas-and-spark-dataframes
from-pandas-to-apache-sparks-dataframe
pandarize-spark-dataframes