An artificial intelligence algorithm developed by a Meta researcher will attempt to reduce the gender gap in Wikipedia articles. Angela Fan, an AI expert, created a model capable of generating biographies of women based on information from the web and writing them in the encyclopedia format.
Using artificial intelligence, the model searches for relevant information about the person on the internet, builds a biography and integrates a citation system that links to sources. According to Fan, the system is a response to the lack of representation on Wikipedia.
Of all the biographies in the encyclopedia, barely a fifth are about women. A Wikimedia report found that 15% of editors are women and that white men from Europe and North America make up the majority of Wikimedians.
This matters because who edits influences which biographies and other Wikipedia articles get published.
How do you write a Wikipedia biography using artificial intelligence?
WikiSum process for creating a Wikipedia biography. Image: Meta
The algorithm captures relevant information about the person, writes the paragraph and integrates citations that link to the source. The model follows the structure of a Wikipedia biography (Early Years, Education, Career, Awards, etc.) and generates each section in turn.
The information is drawn from the content of the first 10 Google results. According to the researcher, the section-by-section text generation uses a caching mechanism similar to that of Transformer-XL, a machine learning model that can process natural language beyond a fixed-length context.
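The flow described above (retrieve evidence, write one section at a time, attach a citation) can be sketched in a few lines. This is only an illustrative toy, not the researcher's model: the `Source` class, the `retrieve` and `write_biography` functions, and the word-overlap ranking are all assumptions made for the example. A real system would call a search engine and a neural text generator, and the plain list used here as a "cache" is a crude stand-in for the cached states a Transformer-XL-style model would reuse.

```python
from dataclasses import dataclass

# Section headings mentioned in the article; the real model may use others.
SECTIONS = ["Early life", "Education", "Career", "Awards"]

@dataclass
class Source:
    url: str   # where the evidence came from (used as the citation)
    text: str  # the evidence itself

def tokens(text):
    """Lowercased word set with surrounding punctuation stripped."""
    return {w.strip(".,;:()").lower() for w in text.split()} - {""}

def retrieve(name, heading, sources, k=10):
    """Toy stand-in for a web search: keep sources that mention the person
    and share at least one word with the heading, best matches first."""
    wanted = tokens(heading)
    scored = []
    for s in sources:
        if name.lower() in s.text.lower():
            overlap = len(wanted & tokens(s.text))
            if overlap > 0:
                scored.append((overlap, s))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]

def write_biography(name, sources):
    """Build the biography section by section, citing each sentence's source."""
    cache = []      # sentences already written, visible to later sections
    biography = []  # (heading, sentence, citation URL) triples
    for heading in SECTIONS:
        for source in retrieve(name, heading, sources):
            if source.text not in cache:  # do not repeat an earlier section
                biography.append((heading, source.text, source.url))
                cache.append(source.text)
                break
    return biography

if __name__ == "__main__":
    # Hypothetical evidence about a made-up person.
    demo = [
        Source("https://example.org/a", "Ada Example began her career as an engineer."),
        Source("https://example.org/b", "Ada Example received her education in Lagos."),
    ]
    for heading, sentence, url in write_biography("Ada Example", demo):
        print(f"{heading}: {sentence} [{url}]")
```

Sections with no matching evidence are simply skipped in this sketch, which mirrors the point made later in the article: if the web has little to say about a person, there is little for the model to write.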
The model is not a definitive solution to the gender gap, since it has limitations. According to Fan, an evaluation of its performance found that 68 percent of the text generated for a biography was not found in the reference text.
Example of a biography created with WikiSum. The text in orange is a “hallucination” and cannot be verified. Image: Meta
After reviewing the content, the researchers found that many sentences were partially verifiable, while others, considered “hallucinations”, could not be fully verified.
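The article does not say how the share of unsupported text was computed. As a rough illustration of the idea, one could count the generated words that never appear in the reference text. The function name `unsupported_fraction` and the word-level comparison are assumptions made for this sketch; a real evaluation would likely compare phrases or sentences and involve human review.

```python
def words(text):
    """Lowercased words with surrounding punctuation stripped."""
    cleaned = (w.strip(".,;:()\"'").lower() for w in text.split())
    return [w for w in cleaned if w]

def unsupported_fraction(generated, reference):
    """Share of generated words that do not occur anywhere in the reference."""
    generated_words = words(generated)
    if not generated_words:
        return 0.0
    reference_words = set(words(reference))
    missing = [w for w in generated_words if w not in reference_words]
    return len(missing) / len(generated_words)

if __name__ == "__main__":
    # Hypothetical texts: one of the four generated words is unsupported.
    print(unsupported_fraction("Ada studied in Paris.", "Ada studied in Lagos."))
```

A high value from a check like this does not prove the text is false, only that the reference does not back it up, which is why the partially verifiable sentences mentioned above still need a human verifier.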
An open source model to reduce the gender gap
The dataset is open source and includes 1,527 biographies distributed by region and interests. The model represents a starting point for creators and verifiers to publish more biographies of women in the encyclopedia.
It is worth noting that the algorithm contends not only with a lack of representation on Wikipedia but also with the scarcity of content about important women on the web. According to the researcher, existing articles often lack enough information or prioritize a woman's personal life over her achievements.
If this pattern does not change in the articles the algorithm draws on, it will learn and replicate the same bad practice.