Publication patterns (Part 1)
Why are publication patterns relevant for understanding the gender gap in science?
Successful academic careers are strongly tied to a prolific scholarly record. Scientific publications are not only the major outlet for scholarly communication but are also regarded as a proxy for a researcher’s scientific credentials, and they play a key role in achieving and maintaining a successful career in academia.
Decisions on tenure and other academic promotions are mostly based on evaluations of the candidate’s research portfolio, with special attention to research output such as journal articles, grants, and conference presentations, as well as to the visibility or renown of the scholar. Thus, an understanding of publication practices in various STEM disciplines, obtained through measurable data on research output, is of great interest to academic institutions, science policy makers, and researchers alike.
Moreover, examining exhaustive data sources gives a full picture of the situation and offers the opportunity to undertake longitudinal studies.
What disciplines have you studied in the publication analyses and what data sources have you used?
The selection of disciplines is limited by the availability of suitable data sources, which must be accessible (preferably as open data or at least operated by a scientific institution), represent a discipline comprehensively, and provide sufficiently good data quality. Through established cooperation, we gained access to two representative, high-quality databases: zbMATH for publications in Mathematics and ADS (Astrophysics Data System) for literature in Astronomy and Astrophysics. Furthermore, we used data from the open-access e-print archive arXiv to study publication patterns in Theoretical Physics.
In order to better explore the participation of women as authors in well-known journals, we enriched the arXiv data with data from CrossRef. Additionally, we retrieved data from CrossRef for selected renowned chemistry journals, since we had no access to a comprehensive data source for Chemistry.
What are the methods you used to identify female authors?
Bibliographic metadata do not include the authors’ gender, so this information had to be inferred. Usually, an author’s name is the only piece of information that can provide an indication of gender.
For the present data we have combined responses from different gender assignment services that we had benchmarked as part of the project. As a result of the gender assignment procedure, all author names are tagged as “female”, “male”, or “unknown”.
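As a rough illustration of how responses from several services might be merged into one of the three tags, here is a minimal Python sketch. The text above does not say which combination rule was used, so the majority vote, the function name `combine_gender_labels`, and the `min_agreement` threshold are all assumptions made for this example only.

```python
from collections import Counter


def combine_gender_labels(responses, min_agreement=2):
    """Merge per-service labels for one author name into a single tag.

    Hypothetical rule (not taken from the project): a simple majority
    vote over the "female"/"male" answers. "unknown" answers and any
    unexpected values are ignored when counting votes.
    """
    votes = Counter(r for r in responses if r in ("female", "male"))
    if not votes:
        # No service returned a usable label.
        return "unknown"

    ranked = votes.most_common(2)
    top_label, top_count = ranked[0]

    # The services disagree evenly: do not guess.
    if len(ranked) == 2 and ranked[1][1] == top_count:
        return "unknown"

    # Too few services support the winning label.
    if top_count < min_agreement:
        return "unknown"

    return top_label


if __name__ == "__main__":
    print(combine_gender_labels(["female", "female", "unknown"]))
    print(combine_gender_labels(["female", "male", "unknown"]))
```

Under this assumed rule, a name is tagged “unknown” whenever the services disagree evenly or too few of them return a label, which trades coverage for fewer wrong assignments.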
Plenty of issues arise in connection with Automated Gender Recognition (AGR). Names are not always “uniquely” associated with one gender, and name-based assignment works better for some countries than for others, which biases the results. For instance, authors of Chinese ancestry are more often assigned the label “unknown” because “gender marking” is lost during transliteration.
Furthermore, all AGR approaches, whether they build on names or on physiological features such as facial images or voice, only allow for a binary definition of gender, which fundamentally excludes individuals who do not conform to this societal concept. Despite these issues, we performed name-based gender recognition because gender differences can be observed in various aspects of academic life and need to be explained.
How many references did you analyze in the Publication Pattern task?
We analyzed millions of references. For example, the zbMATH data set that we used to study publication patterns in mathematics comprises 3,083,185 documents, corresponding to 5,273,035 instances of authorship.
Marie-Françoise Roy and Colette Guillopé