METHODS

Author interests were vectorized using word2vec with the window size set to len(interests), so that every interest in an author's list falls within the context of every other and word order is effectively irrelevant. Author institutions were categorically encoded using a dictionary of CUNY institutions plus an 'External' label for co-authors outside CUNY. To reduce dimensionality, the graph was embedded using the FastRP graph projection algorithm. Several graph algorithms were then run, among them link prediction, community detection, and betweenness centrality. Future work will investigate author interests within communities, as well as the evolution of communities over time (using the years of co-publications).
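
The effect of the window choice and the institution encoding can be illustrated with a minimal sketch. It does not train word2vec; it only enumerates the (target, context) pairs that a skip-gram style window would produce. The interest strings, the institution subset, the integer codes, and the helper names context_pairs and encode_institution are all hypothetical and are not taken from the study.

```python
def context_pairs(interests, window):
    """List the (target, context) pairs seen by a symmetric context window."""
    pairs = []
    for i, target in enumerate(interests):
        lo = max(0, i - window)
        hi = min(len(interests), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, interests[j]))
    return pairs


# Hypothetical interest list for one author.
interests = ["graph theory", "nlp", "bibliometrics", "topology"]

# Window equal to the list length: every interest is context for every other.
full = context_pairs(interests, window=len(interests))
# A narrow window only pairs neighbours, so list order would matter.
narrow = context_pairs(interests, window=1)

# Reversing the list leaves the set of full-window pairs unchanged.
order_free = set(full) == set(context_pairs(interests[::-1], len(interests)))

# Hypothetical subset of the institution dictionary; codes are illustrative.
INSTITUTION_CODES = {"Hunter College": 0, "City College": 1, "External": 2}


def encode_institution(name):
    """Map an institution to its code, falling back to the 'External' label."""
    return INSTITUTION_CODES.get(name, INSTITUTION_CODES["External"])


print(len(full), len(narrow), order_free)
```

In a word2vec implementation that exposes a window parameter (gensim's Word2Vec is one example), the same idea would presumably correspond to passing the list length as that parameter.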

Community Detection

Distinct communities of authors were detected using the Louvain method, with each edge weighted by the number of publications the two authors co-authored. Future work will investigate shared interests within communities, as well as how communities change over time, using the years of publication.
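
The Louvain method searches for a partition that maximizes weighted modularity. The sketch below is not the Louvain procedure itself; it only computes that objective for a given partition, to show how co-publication counts enter as edge weights. The toy graph, the author labels, and the function name weighted_modularity are assumptions made for illustration, and the graph is assumed to be undirected with no self-loops.

```python
from collections import defaultdict


def weighted_modularity(edges, community):
    """Modularity Q of a partition of an undirected weighted graph.

    edges: list of (u, v, weight), each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    m = sum(w for _, _, w in edges)      # total edge weight
    strength = defaultdict(float)        # weighted degree per node
    internal = defaultdict(float)        # edge weight inside each community
    for u, v, w in edges:
        strength[u] += w
        strength[v] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    total = defaultdict(float)           # summed strength per community
    for node, s in strength.items():
        total[community[node]] += s
    return sum(internal[c] / m - (total[c] / (2 * m)) ** 2 for c in total)


# Toy co-authorship graph: two tight triangles joined by one weak tie.
# The weight stands for the number of co-authored publications.
edges = [
    ("a", "b", 3), ("b", "c", 3), ("a", "c", 3),
    ("d", "e", 3), ("e", "f", 3), ("d", "f", 3),
    ("c", "d", 1),
]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}

q_split = weighted_modularity(edges, split)
q_merged = weighted_modularity(edges, merged)
print(q_split > q_merged)
```

Heavier weights inside the triangles raise the modularity of the two-community partition, which is why frequent co-authors tend to be grouped together.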

Betweenness Centrality

Betweenness centrality was calculated for all authors using Brandes’ algorithm. The authors displayed are those most influential to the flow of ideas between distant communities: the average and the mode of the network’s betweenness scores were both ~500, whereas these authors have betweenness scores of over 10,000.
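
For reference, Brandes' algorithm can be sketched in a few lines for the unweighted, undirected case. Whether the study treated edges as weighted or normalized the scores is not stated, so this version (raw, unnormalized scores on an unweighted toy graph) is an assumption, as are the node labels and the function name betweenness.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm on an unweighted, undirected graph {node: [neighbours]}."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:              # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}


# Toy graph: two triangles bridged by the edge c-d.
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
scores = betweenness(adj)
print(scores)
```

In this toy graph only the two bridging nodes receive a nonzero score, mirroring how authors who connect otherwise distant communities stand out.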

Link Prediction

Link prediction was run to predict future collaborations based on authors’ communities and interests (vectorized with word2vec), after a FastRP graph embedding was applied to reduce dimensionality. Many of the predicted links are existing collaborations that were missing from Google Scholar.
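
One simple way to turn node embeddings into link predictions is to rank unconnected author pairs by the cosine similarity of their vectors. The scoring function used in the study is not specified, so the sketch below is an assumed stand-in rather than the method applied here; the two-dimensional vectors, the author names, and the helper predict_links are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_links(embeddings, existing_edges, top_k=3):
    """Rank author pairs that are not yet connected by embedding similarity."""
    known = {frozenset(edge) for edge in existing_edges}
    candidates = [
        (cosine(embeddings[a], embeddings[b]), a, b)
        for a, b in combinations(sorted(embeddings), 2)
        if frozenset((a, b)) not in known
    ]
    return sorted(candidates, reverse=True)[:top_k]


# Hypothetical low-dimensional embeddings standing in for FastRP output.
embeddings = {
    "ana": [1.0, 0.0],
    "ben": [0.9, 0.1],
    "cy": [0.0, 1.0],
    "dee": [0.1, 0.9],
}
existing = [("ana", "ben")]  # collaborations already present in the graph

top = predict_links(embeddings, existing, top_k=2)
print(top)
```

A highly ranked pair that is absent from the graph is either a plausible future collaboration or, as noted above, an existing one that the source data missed.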