
Does a song with a long title have a longer duration? A PySpark lesson


Following the previous post on basic ML with PySpark, this tutorial shows how to run day-to-day data analytics. We explore the FMA (Free Music Archive) dataset to check whether the length of a song's title correlates with its duration.

I think this is a nice example of how to chain a few PySpark data transformations into a complete data analysis.
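As a rough idea of what such an analysis can look like, here is a minimal sketch, not the notebook's actual code. It assumes a DataFrame with a string column `title` and a numeric column `duration`; those column names, the helper names `pearson` and `title_length_vs_duration`, and the toy values are all hypothetical. The PySpark part derives a title-length column with `F.length` and asks `DataFrame.stat.corr` for the Pearson coefficient. The plain-Python `pearson` function computes the same statistic, so a small sample can be checked by hand.

```python
import math


def pearson(xs, ys):
    """Plain-Python Pearson correlation, useful to sanity-check a small sample."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


def title_length_vs_duration(tracks_df, title_col="title", duration_col="duration"):
    """Pearson correlation between title length and duration on a Spark DataFrame.

    The column names are assumptions; adapt them to the schema you actually load.
    """
    # Imported lazily so that `pearson` above stays usable without Spark installed.
    from pyspark.sql import functions as F

    prepared = (
        tracks_df
        # Number of characters in the title (null titles give a null length).
        .withColumn("title_length", F.length(F.col(title_col)))
        # Make sure the duration is numeric before correlating.
        .withColumn(duration_col, F.col(duration_col).cast("double"))
        # Drop rows where either value is missing.
        .dropna(subset=["title_length", duration_col])
    )
    return prepared.stat.corr("title_length", duration_col)


# Made-up toy values, only to show the shape of the computation.
toy_titles = ["Intro", "A Much Longer Song Title", "Mid Title"]
toy_durations = [60.0, 300.0, 180.0]
toy_r = pearson([len(t) for t in toy_titles], toy_durations)
```

With a Spark session available, `title_length_vs_duration(tracks)` would return a single float between -1 and 1 for a DataFrame `tracks` that has the assumed columns. A value near zero suggests that title length and duration are not linearly related.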

The corresponding Jupyter Notebook is available here. To clone the repository:

git clone https://github.com/juanmanuel-tirado/pyspark-tutorial

Additionally, you can see the rendered version below.

Thanks for reading.

