Inferring Gender from Author Names with Local LLMs: A Multi-Model Evaluation

Herrero Solana, Víctor; González-Salmón, Elvira; Robinson García, Nicolás

dc.contributor.author	Herrero Solana, Víctor
dc.contributor.author	González-Salmón, Elvira
dc.contributor.author	Robinson García, Nicolás
dc.date.accessioned	2026-02-12T07:44:51Z
dc.date.available	2026-02-12T07:44:51Z
dc.date.issued	2026
dc.identifier.citation	Herrero Solana, V.; González-Salmón, E. and Robinson García, N. (2026). Inferring Gender from Author Names with Local LLMs: A Multi-Model Evaluation.	es_ES
dc.identifier.uri	https://hdl.handle.net/10481/110902
dc.description.abstract	Gender identification of researchers is a common practice in scientometric studies examining inequalities in science. The most widely used approach relies on inferring gender from author names using commercial APIs or name-gender dictionaries, which often lack transparency and reproducibility. This study explores the use of local open-weight Large Language Models (LLMs) as an alternative for name-based gender classification. We evaluate 25 models from seven leading families (Llama, Gemma, Phi, Mistral, Qwen, DeepSeek, and Yi), ranging from 270 million to 70 billion parameters, using a reference dataset of nearly 200,000 names across 195 countries extracted from Wikidata. Results show that top-performing models achieve F1-Scores above 0.93 for both gender categories, positioning local LLMs as a viable, cost-effective, and reproducible alternative to proprietary tools. A critical performance threshold emerges at approximately 7 billion parameters, above which all models achieve acceptable results, with diminishing returns beyond 12-14 billion. All models exhibit systematic gender bias, showing higher precision for men and higher recall for women, indicating a tendency to classify ambiguous names as male. Mistral-Nemo-12b emerges as the optimal choice, balancing accuracy, computational efficiency, and gender equity.	es_ES
dc.language.iso	eng	es_ES
dc.rights	Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License	es
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/	es
dc.subject	Generative AI	es_ES
dc.subject	Local Large Language Models	es_ES
dc.subject	Gender Assignment Algorithms	es_ES
dc.title	Inferring Gender from Author Names with Local LLMs: A Multi-Model Evaluation	es_ES
dc.type	journal article	es_ES
dc.rights.accessRights	open access	es_ES
dc.type.hasVersion	SMUR	es_ES
dc.identifier.url	https://zenodo.org/records/18610104

Files in this item

Name: manuscript.pdf
Size: 809.0 KB
Format: PDF
Description: Main article

This item appears in the following collection(s)

EC3 - Articles

Except where otherwise noted, this item's license is described as Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License