
SCRAPING

Scraping was done with SerpAPI from a Python script. The first loop queries Google Scholar for all authors who have either "Math" or "Computer Science" in their title and a cuny.edu email address, and collects their author IDs. Google Scholar does not provide a way to search for authors by institution and department directly, so any CUNY authors whose profiles did not match this pattern were missed. The same loop then collects every article published by the authors of interest and builds a dictionary of co-author IDs from each article's list of authors (this was done because Google's native co-authors list is very incomplete).
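The filtering and co-author steps of the first loop can be sketched roughly as follows. This is a minimal illustration and not the original script: it works on already-fetched records and makes no network calls. The record layout (the `affiliations`, `email`, and `author_ids` keys), the keyword list, and both function names are assumptions made for the example.

```python
from collections import defaultdict

# Assumed keywords; the write-up only says "Math or Computer Science".
TITLE_KEYWORDS = ("math", "computer science")


def is_author_of_interest(profile):
    """Return True if a profile matches the title and email pattern.

    `profile` is a hypothetical dict with an "affiliations" string
    (the author's title line) and an "email" string such as
    "Verified email at cuny.edu".
    """
    title = (profile.get("affiliations") or "").lower()
    email_words = (profile.get("email") or "").lower().split()
    domain = email_words[-1] if email_words else ""
    has_keyword = any(keyword in title for keyword in TITLE_KEYWORDS)
    is_cuny = domain == "cuny.edu" or domain.endswith(".cuny.edu")
    return has_keyword and is_cuny


def build_coauthor_map(articles_by_author):
    """Map each author ID to the set of other author IDs on their articles.

    `articles_by_author` is a hypothetical dict of
    {author_id: [{"author_ids": [...]}, ...]}.
    """
    coauthors = defaultdict(set)
    for author_id, articles in articles_by_author.items():
        for article in articles:
            for other_id in article.get("author_ids", []):
                if other_id != author_id:
                    coauthors[author_id].add(other_id)
    return dict(coauthors)


# Made-up records, for illustration only.
profiles = [
    {"author_id": "A1", "affiliations": "Professor of Mathematics, City College",
     "email": "Verified email at ccny.cuny.edu"},
    {"author_id": "A2", "affiliations": "Professor of History",
     "email": "Verified email at cuny.edu"},
    {"author_id": "A3", "affiliations": "Computer Science, Example University",
     "email": "Verified email at example.edu"},
]
author_ids = [p["author_id"] for p in profiles if is_author_of_interest(p)]

coauthor_map = build_coauthor_map(
    {"A1": [{"author_ids": ["A1", "B7"]}, {"author_ids": ["A1", "B7", "C2"]}]}
)
```

Deriving co-authors from the article author lists, rather than from the profile's own co-author panel, is what lets the map include collaborators who never appear in Google's native list.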

Once the first loop has collected IDs for all the authors and co-authors of interest, the second loop collects each author's listed interests and preprocesses some of the data before returning a dataframe.
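One way the interest preprocessing could look is sketched below. The write-up does not say which cleaning steps were applied, so the normalization here (trimming whitespace, lowercasing, dropping duplicates) is an assumption, as are the `interests_to_rows` name and the input record layout. The function returns plain row dicts; passing them to `pandas.DataFrame(rows)` would produce a dataframe.

```python
def interests_to_rows(authors):
    """Flatten author records into one row per (author, interest) pair.

    `authors` is a hypothetical list of dicts with "author_id", "name",
    and an "interests" list of strings.
    """
    rows = []
    for author in authors:
        seen = set()
        for raw in author.get("interests") or []:
            # Assumed cleaning: collapse whitespace and lowercase.
            interest = " ".join(raw.split()).lower()
            if not interest or interest in seen:
                continue
            seen.add(interest)
            rows.append({
                "author_id": author["author_id"],
                "name": author["name"],
                "interest": interest,
            })
    return rows


# Made-up records, for illustration only.
rows = interests_to_rows([
    {"author_id": "A1", "name": "Author One",
     "interests": [" Graph Theory ", "graph theory", "Machine  Learning"]},
    {"author_id": "A2", "name": "Author Two", "interests": []},
])
```

In this long format, authors with no listed interests produce no rows, so they would need to be joined back in separately if they should appear in the final table.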
