Voice corpora are not the only male-biased databases we’re using to produce what turn out to be male-biased algorithms. Text corpora (made up of a wide variety of texts from novels, to newspaper articles, to legal textbooks) are used to train translation software, CV-scanning software, and web search algorithms. And they are riddled with gendered data gaps. Searching the BNC34 (100 million words from a wide range of late twentieth-century texts) I found that female pronouns consistently appeared at around half the rate of male pronouns.35 The 520-million-word Corpus of Contemporary American English (COCA) also has a 2:1 male to female pronoun ratio despite including texts as recent as 2015.36 Algorithms trained on these gap-ridden corpora are being left with the impression that the world actually is dominated by men.
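For readers who want to see what that kind of measurement actually involves, here is a rough sketch in Python – not the count I ran on the BNC or COCA, which require licences, but the same idea applied to NLTK’s freely available Brown corpus as a stand-in:

```python
# A minimal sketch of a male/female pronoun count, using NLTK's freely
# available Brown corpus as a stand-in for the BNC/COCA.
from collections import Counter

import nltk
nltk.download("brown", quiet=True)
from nltk.corpus import brown

MALE = {"he", "him", "his", "himself"}
FEMALE = {"she", "her", "hers", "herself"}

counts = Counter(word.lower() for word in brown.words())
male_total = sum(counts[p] for p in MALE)
female_total = sum(counts[p] for p in FEMALE)

print(f"male pronouns:   {male_total}")
print(f"female pronouns: {female_total}")
print(f"male:female ratio ~ {male_total / female_total:.2f}:1")
```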
Image datasets also seem to have a gender data gap problem: a 2017 analysis of two commonly used datasets containing ‘more than 100,000 images of complex scenes drawn from the web, labeled with descriptions’ found that images of men greatly outnumber images of women.37 A University of Washington study similarly found that women were under-represented on Google Images across the forty-five professions they tested, with CEO being the most divergent result: 27% of CEOs in the US are female, but women made up only 11% of the Google Image search results.38 Searching for ‘author’ also delivered an imbalanced result, with only 25% of the Google Image results for the term being female compared to 56% of actual US authors, and the study also found that, at least in the short term, this discrepancy did affect people’s views of a field’s gender proportions. For algorithms, of course, the impact will be more long term.
As well as under-representing women, these datasets are misrepresenting them. A 2017 analysis of common text corpora found that female names and words (‘woman’, ‘girl’, etc.) were more associated with family than career; it was the opposite for men.39 A 2016 analysis of a popular publicly available dataset based on Google News found that the top occupation linked to women was ‘homemaker’ and the top occupation linked to men was ‘maestro’.40 Also included in the top ten gender-linked occupations were philosopher, socialite, captain, receptionist, architect and nanny – I’ll leave it to you to guess which were male and which were female. The 2017 image dataset analysis also found that the activities and objects included in the images showed a ‘significant’ gender bias.41 One of the researchers, Mark Yatskar, saw a future where a robot trained on these datasets who is unsure of what someone is doing in the kitchen ‘offers a man a beer and a woman help washing dishes’.42
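This kind of finding can be reproduced by anyone with a laptop, because the word vectors trained on Google News were publicly released. The sketch below is an illustration rather than the researchers’ own code (and the model download runs to roughly 1.6 GB), but it shows the kind of probe involved:

```python
# A hedged sketch of probing word-embedding associations, using the publicly
# released word2vec vectors trained on Google News. Illustrative only;
# this is not the 2016 paper's own code.
import gensim.downloader as api

model = api.load("word2vec-google-news-300")  # ~1.6 GB download

# "man is to computer_programmer as woman is to ...?"
print(model.most_similar(positive=["woman", "computer_programmer"],
                         negative=["man"], topn=5))

# Which occupations sit closer to 'she' than to 'he'?
for job in ["homemaker", "nurse", "receptionist", "architect", "philosopher"]:
    gap = model.similarity(job, "she") - model.similarity(job, "he")
    print(f"{job:>14}: she-vs-he similarity gap = {gap:+.3f}")
```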
These cultural stereotypes can be found in artificial intelligence (AI) technologies already in widespread use. For example, when Londa Schiebinger, a professor at Stanford University, used translation software to translate a newspaper interview with her from Spanish into English, both Google Translate and Systran repeatedly used male pronouns to refer to her, despite the presence of clearly gendered terms like ‘profesora’ (female professor).43 Google Translate will also convert Turkish sentences with gender-neutral pronouns into English stereotypes. ‘O bir doktor’ (which means ‘S/he is a doctor’) is translated into English as ‘He is a doctor’, while ‘O bir hemşire’ (which means ‘S/he is a nurse’) is rendered ‘She is a nurse’. Researchers have found the same behaviour for translations into English from Finnish, Estonian, Hungarian and Persian.
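Why does a translator default this way? In caricature, because a data-driven system that has to invent a gender for a genderless pronoun simply picks whichever English pronoun appears more often next to the surrounding words in its training corpus. The toy sketch below (with invented counts, and nothing like the internals of a real translation system) makes the logic visible:

```python
# A toy caricature of the stereotype default: when the source pronoun carries
# no gender, pick whichever English pronoun co-occurs more often with the
# profession in the training data. All counts below are invented.
corpus_counts = {
    ("he", "doctor"): 820, ("she", "doctor"): 310,
    ("he", "nurse"): 90,   ("she", "nurse"): 640,
}

def translate_gender_neutral(profession: str) -> str:
    """Render Turkish 'O bir <profession>' into English with a guessed pronoun."""
    he_count = corpus_counts.get(("he", profession), 0)
    she_count = corpus_counts.get(("she", profession), 0)
    pronoun = "He" if he_count >= she_count else "She"
    return f"{pronoun} is a {profession}"

print(translate_gender_neutral("doctor"))  # -> "He is a doctor"
print(translate_gender_neutral("nurse"))   # -> "She is a nurse"
```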
The good news is that we now have this data – but whether or not coders will use it to fix their male-biased algorithms remains to be seen. We have to hope that they will, because machines aren’t just reflecting our biases. Sometimes they are amplifying them – and by a significant amount. In the 2017 images study, pictures of cooking were over 33% more likely to involve women than men, but algorithms trained on this dataset connected pictures of kitchens with women 68% of the time. The paper also found that the higher the original bias, the stronger the amplification effect, which perhaps explains how the algorithm came to label a photo of a portly balding man standing in front of a stove as female. Kitchen > male pattern baldness.
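The amplification measurement itself is simple arithmetic: compare how often an activity co-occurs with women in the training labels against how often the trained model predicts a woman for that activity. Here is a toy version, with made-up counts rather than the study’s actual figures:

```python
# A toy illustration of the bias-amplification measurement: training-set
# gender ratio for an activity versus the model's predicted ratio.
# The counts are invented for illustration, not taken from the 2017 paper.
def bias_toward_women(woman_count: int, man_count: int) -> float:
    """Fraction of images of an activity whose agent is labelled a woman."""
    return woman_count / (woman_count + man_count)

# Hypothetical counts for a 'cooking' activity.
train_bias = bias_toward_women(woman_count=580, man_count=420)  # bias in the data
pred_bias = bias_toward_women(woman_count=700, man_count=300)   # bias in predictions

print(f"training-set bias:     {train_bias:.2f}")
print(f"model prediction bias: {pred_bias:.2f}")
print(f"amplification:         {pred_bias - train_bias:+.2f}")
```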
James Zou, assistant professor of biomedical data science at Stanford, explains why this matters. He gives the example of someone searching for ‘computer programmer’ on a program trained on a dataset that associates that term more closely with a man than a woman.44 The algorithm could deem a male programmer’s website more relevant than a female programmer’s – ‘even if the two websites are identical except for the names and gender pronouns’. So a male-biased algorithm trained on corpora marked by a gender data gap could literally do a woman out of a job.
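Zou’s point can be seen in miniature. In the sketch below the vectors are invented, two-dimensional stand-ins for real embeddings, but the mechanism is the one he describes: if ‘he’ sits closer to ‘programmer’ than ‘she’ does, a relevance score built on those embeddings ranks an otherwise identical male page higher.

```python
# A toy sketch of embedding-based relevance ranking with biased vectors.
# The 2-d vectors are invented for illustration.
import numpy as np

toy_vecs = {
    "computer":   np.array([0.9, 0.1]),
    "programmer": np.array([0.8, 0.2]),
    "he":         np.array([0.7, 0.3]),   # biased: close to 'programmer'
    "she":        np.array([0.2, 0.8]),   # biased: far from 'programmer'
}

def doc_vector(words):
    """Represent a query or page as the average of its word vectors."""
    return np.mean([toy_vecs[w] for w in words], axis=0)

def relevance(query_words, doc_words):
    """Cosine similarity between query and page vectors."""
    q, d = doc_vector(query_words), doc_vector(doc_words)
    return float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))

query = ["computer", "programmer"]
page_male = ["he", "programmer"]      # identical pages except the pronoun
page_female = ["she", "programmer"]

print(relevance(query, page_male))    # higher score
print(relevance(query, page_female))  # lower score
```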
But web search is only scraping the surface of how algorithms are already guiding decision-making. According to the Guardian, 72% of US CVs never reach human eyes,45 and robots are already involved in the interview process with their algorithms trained on the posture, facial expressions and vocal tone of ‘top-performing employees’.46 Which sounds great – until you start thinking about the potential data gaps: did the coders ensure that these top-performing employees were gender and ethnically diverse and, if not, does the algorithm account for this? Has the algorithm been trained to account for socialised gender differences in tone and facial expression? We simply don’t know, because the companies developing these products don’t share their algorithms – but let’s face it, based on the available evidence, it seems unlikely.
AI systems have been introduced to the medical world as well, to guide diagnoses – and while this could ultimately be a boon to healthcare, it currently feels like hubris.47 The introduction of AI to diagnostics seems to be accompanied by little to no acknowledgement of the well-documented and chronic gaps in medical data when it comes to women.48 And this could be a disaster. It could, in fact, be fatal – particularly given what we know about machine learning amplifying already-existing biases. With our body of medical knowledge being so heavily skewed towards the male body, AIs could make diagnosis for women worse, rather than better.
And, at the moment, barely anyone is even aware that we have a major problem brewing here. The authors of the 2016 Google News study pointed out that not a single one of the ‘hundreds of papers’ about the applications for word-association software recognised how ‘blatantly sexist’ the datasets are. The authors of the image-labelling paper similarly noted that they were ‘the first to demonstrate structured prediction models amplify bias and the first to propose methods for reducing this effect’.
Our current approach to product design is disadvantaging women. It’s affecting our ability to do our jobs effectively – and sometimes to even get jobs in the first place. It’s affecting our health, and it’s affecting our safety. And perhaps worst of all, the evidence suggests that when it comes to algorithm-driven products, it’s making our world even more unequal. There are solutions to these problems if we choose to acknowledge them, however. The authors of the women = homemaker paper devised a new algorithm that reduced gender stereotyping (e.g. ‘he is to doctor as she is to nurse’) by over two-thirds, while leaving gender-appropriate word associations (e.g. ‘he is to prostate cancer as she is to ovarian cancer’) intact.49 And the authors of the 2017 study on image interpretation devised a new algorithm that decreased bias amplification by 47.5%.
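The core idea behind the debiasing approach those researchers described can be sketched briefly: find a ‘gender direction’ in the embedding space and strip that component out of words that ought to be gender-neutral (occupations, for instance), while leaving genuinely gendered words alone. The code below is a simplified illustration with toy vectors, not the published algorithm itself:

```python
# A hedged sketch of the idea behind embedding debiasing: remove the component
# along a 'gender direction' from words that should be gender-neutral.
# The 4-d vectors are invented for illustration.
import numpy as np

def neutralise(word_vec: np.ndarray, gender_dir: np.ndarray) -> np.ndarray:
    """Remove the component of word_vec that lies along the gender direction."""
    gender_dir = gender_dir / np.linalg.norm(gender_dir)
    return word_vec - np.dot(word_vec, gender_dir) * gender_dir

he = np.array([1.0, 0.2, 0.0, 0.1])
she = np.array([-1.0, 0.2, 0.1, 0.1])
doctor = np.array([0.4, 0.9, 0.3, 0.2])   # should carry no gender

gender_direction = he - she
debiased_doctor = neutralise(doctor, gender_direction)

# ~0: the gender component has been projected out of 'doctor'.
print(np.dot(debiased_doctor, gender_direction))
```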
CHAPTER 9
A Sea of Dudes
When Janica Alvarez was trying to raise funds for her tech start-up Naya Health Inc. in 2013, she struggled to get investors to take her seriously. In one meeting, ‘investors Googled the product and ended up on a porn site. They lingered on the page and started cracking jokes’, leaving Alvarez feeling like she was ‘in the middle of a fraternity’.1 Other investors were ‘too grossed out to touch her product or pleaded ignorance’, with one male investor saying ‘I’m not touching that; that’s disgusting.’2 And what was this vile, ‘disgusting’ and incomprehensible product Alvarez was pitching? Reader, it was a breast pump.