Does a song with a long title have a longer duration? A PySpark lesson
Tutorials
python
spark
data
analytics
Following the previous post on basic ML with PySpark, we continue with a tutorial on how to run day-to-day data analytics. In this case, we explore the FMA (Free Music Archive) dataset to find out whether the length of a song's title correlates with its duration.
I think this is a nice example of how to chain a few PySpark data transformations into a data analysis.
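To give an idea of the kind of transformation involved, here is a minimal sketch, not the notebook's actual code. It assumes a DataFrame with a text column and a numeric duration column. The column names `title` and `duration`, and the function names `title_duration_corr` and `pearson`, are hypothetical. The Spark function derives the title length with `length`, casts the duration to a double, drops incomplete rows, and asks Spark for the Pearson correlation. The plain-Python `pearson` helper is only there to sanity-check the result on a handful of rows.

```python
import math


def title_duration_corr(df, title_col="title", duration_col="duration"):
    """Pearson correlation between title length and duration on a Spark DataFrame.

    Column names are assumptions; adapt them to the schema you actually load.
    """
    # Imported here so the plain-Python helper below works without Spark installed.
    from pyspark.sql import functions as F

    prepared = df.select(
        F.length(F.col(title_col)).alias("title_length"),
        F.col(duration_col).cast("double").alias("duration_s"),
    ).dropna()  # discard rows with a missing title or a non-numeric duration
    return prepared.stat.corr("title_length", "duration_s")


def pearson(xs, ys):
    """Reference Pearson correlation for small in-memory samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)
```

Given a DataFrame `tracks` with those two columns, `title_duration_corr(tracks)` returns a single float between -1 and 1. A value close to 0 would suggest that title length tells us little about duration.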
The corresponding Jupyter Notebook is available here. If you want to clone the repo:
git clone https://github.com/juanmanuel-tirado/pyspark-tutorial
Additionally, you can see the rendered version below.
Thanks for reading,