Then what is the first chord? The beginning is labelled "Start". What does this mean? Either fewer than 25% of songs start on the 1 (no way this is possible), or the second chord is mislabeled as a chord change when it stays the same, or the songs with a 1 in the second column started on a chord other than the root and, instead of listing that chord, the data was thrown away. Something is messed up here.
> Very lazily, I just normalized all probabilities across each transition so that each transition “mega bar” is kind-of the same height. I’m sure there’s a better way to do it, the community is invited to improve!
The size of the bars is normalized, so it isn't necessarily indicative of probabilities. That's why the I chord isn't any bigger than the rest.
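Here is a rough sketch of one possible reading of that normalization, where each transition column is rescaled to the same total height. The counts, the column names and the `normalize_column` helper are all made up for illustration; this is not the author's code or data.

```python
# Hypothetical transition counts, invented for illustration only.
columns = {
    "Start -> chord 1": {"I": 50, "IV": 20, "V": 20, "vi": 10},
    "chord 1 -> chord 2": {"I": 5, "IV": 10, "V": 25, "vi": 10},
}


def normalize_column(counts, height=1.0):
    """Scale one column so that its bars sum to `height`."""
    total = sum(counts.values())
    return {chord: height * n / total for chord, n in counts.items()}


# Every column ends up with the same total height, whatever its raw count.
bar_heights = {name: normalize_column(c) for name, c in columns.items()}
```

Under this reading, the I bar in the first column (50 songs) and the V bar in the second column (25 songs) come out the same height, so bar sizes only compare within a column, not across columns.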
The 1 is the key, which is to say, the lowest note played in the progression. If it's less than 25% then that means that 4-chord progressions frequently do not choose their minimum as the first note.
If you were a musician, you'd know that in popular music the vast majority of songs start on the 1 chord. Classical music probably does this less frequently, but it's nowhere near as low as 1/5 of the time.
Someone asked this on Reddit[1]. The answer the author gave was: "the 'start' is whatever the api gave for each song, which in turn is the first chord that was entered by the user that uploaded that song, which is of course, not standardized. So it's for sure not the start of songs, or the bridge or whatever... it's this and that."
The way I interpreted it, the first chord starts at the second column. The "start" state is necessary so you can see the distribution of songs among the first chords.
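To make that interpretation concrete, here is a minimal sketch of a synthetic "Start" state in a transition count table. The progressions are invented toy data, and `START`, `transition_counts` and `first_chord_distribution` are hypothetical names, not anything from the original project.

```python
from collections import Counter, defaultdict

START = "Start"  # synthetic state prepended to every progression

# Hypothetical progressions, invented for illustration only.
progressions = [
    ["I", "V", "vi", "IV"],
    ["I", "IV", "V", "I"],
    ["vi", "IV", "I", "V"],
    ["IV", "I", "V", "vi"],
]


def transition_counts(progs):
    """Count chord-to-chord transitions, including Start -> first chord."""
    counts = defaultdict(Counter)
    for prog in progs:
        states = [START] + prog
        for prev, nxt in zip(states, states[1:]):
            counts[prev][nxt] += 1
    return counts


def first_chord_distribution(progs):
    """Share of progressions that open on each chord (the Start row)."""
    row = transition_counts(progs)[START]
    total = sum(row.values())
    return {chord: n / total for chord, n in row.items()}


dist = first_chord_distribution(progressions)
```

With this toy data the Start row splits as half I, a quarter vi and a quarter IV, which is exactly the "distribution of songs among the first chords" that the first column would show.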
This would mean that less than 25% of songs start on the root chord. That can't be true. I would expect a distribution of about 90% 1 chord with a smattering of others taking up the rest.