PySpark MLlib tutorial

If you’re familiar with Spark, you probably know it offers a Python API named PySpark that lets developers use the existing Spark libraries from Python.

I have released a tutorial on the Spark machine learning library (MLlib). It is not intended to explain ML theory, although it does include some. It is mostly a collection of examples showing how to manipulate data structures to feed the algorithms implemented by the library.

The corresponding Jupyter Notebook is available here. If you want to clone the repo:

git clone https://github.com/juanmanuel-tirado/pyspark-tutorial

Additionally, you can see the rendered version below.

