Alias | Papers | StartYear | Stirling Diversity Index | h-index
Alias is a letter plus a number, to protect our subjects; Papers is the number of publications; StartYear is the year of their earliest publication; the Stirling Diversity Index is from the last post; and the h-index is a common measure of academic success, defined as the largest number h such that h of a person's papers have each been cited at least h times. So, for example, the h-index of a person whose publications have been cited (7, 6, 3, 1, 1, 0) times is 3, from the papers that have been cited 7, 6, and 3 times.
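To make the h-index definition concrete, here is a minimal sketch in plain Python. The function name `h_index` is my own and is not taken from the analysis code behind this post; it simply sorts the citation counts from highest to lowest and counts how many papers have at least as many citations as their rank.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            # The top `rank` papers all have at least `rank` citations.
            h = rank
        else:
            # Counts only decrease from here, so no larger h is possible.
            break
    return h


# The example from the text: citation counts (7, 6, 3, 1, 1, 0).
print(h_index([7, 6, 3, 1, 1, 0]))  # 3
```

The fourth paper has only 1 citation, which is less than its rank of 4, so the count stops at 3.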
I created an anonymized data file, 3-scientometrics101_EDA.csv, to use going forward.
And as always, the first step is to just take a look at the data with .head(), .describe(), and a quick pairplot. One thing to note is that a few of the people we sampled aren't in the dataset at all, because they didn't have any papers indexed by Web of Science.
Or, since this is a proof-of-concept, press on regardless.
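The first look described above can be sketched roughly as follows, assuming pandas and seaborn are installed. The helper names (`quick_look`, `missing_aliases`, `pair_plot`) are hypothetical, and I am assuming the CSV has a column named `Alias` as in the column list at the top; adjust if the file differs.

```python
import pandas as pd


def quick_look(df):
    """Return the first rows and the summary statistics of a DataFrame."""
    return {"head": df.head(), "describe": df.describe()}


def missing_aliases(sampled, df, alias_col="Alias"):
    """List sampled aliases that have no row in the dataset.

    `alias_col` is an assumed column name, not confirmed by the file.
    """
    present = set(df[alias_col])
    return [alias for alias in sampled if alias not in present]


def pair_plot(df):
    """Draw pairwise scatter plots of the numeric columns."""
    import seaborn as sns  # imported lazily so the helpers above work without it

    return sns.pairplot(df)


# Typical use, assuming the file sits in the working directory:
# df = pd.read_csv("3-scientometrics101_EDA.csv")
# summary = quick_look(df)
# print(summary["head"])
# print(summary["describe"])
# pair_plot(df)
```

Passing the list of sampled aliases to `missing_aliases` gives a quick count of how many people dropped out because nothing of theirs was indexed.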