Facebook is embarking on a major overhaul of its algorithms that detect hate speech, according to internal documents, reversing years of so-called “race-blind” practices.
Those practices resulted in the company being more vigilant about removing slurs lobbed against white users while flagging and deleting innocuous posts by people of color on the platform.
The overhaul, which is known as the WoW Project and is in its early stages, involves re-engineering Facebook’s automated moderation systems to get better at detecting and automatically deleting hateful language that is considered “the worst of the worst,” according to internal documents describing the project obtained by The Washington Post. The “worst of the worst” includes slurs directed at Black people, Muslim people, people of more than one race, the LGBTQ community, and Jewish people, according to the documents.
As one way to assess severity, Facebook assigned different types of attacks numerical scores weighted based on their perceived harm. For example, the company’s systems would now place a higher priority on automatically removing statements such as “Gay people are disgusting” than “Men are pigs.”
Facebook has long banned hate speech, defined as violent or dehumanizing speech based on race, gender, sexuality, and other protected characteristics. The company, which owns Instagram, applies the same hate speech policies there. But before the overhaul, the company’s algorithms and policies did not make a distinction between groups that were more likely to be targets of hate speech and those that have not been historically marginalized. Comments like “White people are stupid” were treated the same as anti-Semitic or racist slurs.
In the first phase of the project, which was announced internally to a small group in October, engineers said they had changed the company’s systems to deprioritize policing contemptuous comments about “whites,” “men” and “Americans.” Facebook still considers such attacks to be hate speech, and users can still report them to the company. However, the company’s technology now treats them as “low-sensitivity,” or less likely to be harmful, so that they are no longer automatically deleted by the company’s algorithms. That means roughly 10,000 fewer posts are now being deleted each day, according to the documents.
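The documents do not spell out how the scores or the “low-sensitivity” label are computed. As a rough illustration only, the idea of weighting attacks by perceived harm and exempting lower-scoring ones from automatic deletion could be sketched as follows. Every name, weight, and threshold here is a hypothetical stand-in, not Facebook’s actual system.

```python
from dataclasses import dataclass

# Hypothetical harm weights; the documents do not disclose the real values.
HARM_WEIGHTS = {
    "attack_on_marginalized_group": 1.0,
    "attack_on_other_group": 0.3,
}

# Hypothetical cutoff at or above which a flagged post is removed automatically.
AUTO_REMOVE_THRESHOLD = 0.5


@dataclass
class FlaggedPost:
    post_id: str
    attack_type: str               # key into HARM_WEIGHTS
    classifier_confidence: float   # 0.0 to 1.0, assumed output of a detector


def severity(post: FlaggedPost) -> float:
    """Weight the detector's confidence by the perceived harm of the attack type."""
    return HARM_WEIGHTS[post.attack_type] * post.classifier_confidence


def route(post: FlaggedPost) -> str:
    """Return 'auto_remove' for high-severity posts, otherwise 'report_only'."""
    if severity(post) >= AUTO_REMOVE_THRESHOLD:
        return "auto_remove"
    # Still treated as a violation, but acted on only if a user reports it.
    return "report_only"


def prioritize(posts: list[FlaggedPost]) -> list[FlaggedPost]:
    """Order flagged posts so the highest-severity ones are handled first."""
    return sorted(posts, key=severity, reverse=True)
```

In this sketch, a post in the lower-weighted category is never removed automatically, however confident the detector is, which mirrors the reported change: such posts remain violations but are acted on only after a user report.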
The shift is a response to a racial reckoning within the company as well as years of criticism from civil rights advocates that content from Black users is disproportionately removed, particularly when they use the platform to describe experiences of discrimination.
Some civil rights advocates said the change was overdue.
“To me this is confirmation of what we’ve been demanding for years, an enforcement regime that takes power and historical dynamics into account,” said Arisha Hatch, vice president at the civil rights group Color of Change, who reviewed the documents on behalf of The Post but said she did not know about the changes.
“We know that hate speech targeted towards underrepresented groups can be the most harmful, which is why we have focused our technology on finding the hate speech that users and experts tell us is the most serious,” said Facebook spokeswoman Sally Aldous. “Over the past year, we’ve also updated our policies to catch more implicit hate speech, such as content depicting Blackface, stereotypes about Jewish people controlling the world, and banned Holocaust denial.”
Because describing experiences of discrimination can involve critiquing white people, Facebook’s algorithms often automatically removed that content, demonstrating the ways in which even advanced artificial intelligence can be overzealous in tackling nuanced topics.
“We can’t combat systemic racism if we can’t talk about it, and challenging white supremacy and white men is an important part of having dialogue about racism,” said Danielle Citron, a law professor specializing in free speech at Boston University Law School, who also reviewed the documents. “But you can’t have the conversation if it is being filtered out, bizarrely, by overly blunt hate speech algorithms.”
In addition to deleting comments protesting racism, Facebook’s approach has at times resulted in a stark contrast between its automated takedowns and users’ actual reports about hate speech. At the height of the nationwide protests in June over the killing of George Floyd, an unarmed Black man, for example, the top three derogatory terms Facebook’s automated systems removed were “white trash,” a gay slur, and “cracker,” according to an internal chart obtained by The Post and first reported by NBC News in July. During that time period, slurs targeted at people in marginalized groups, including Black people, Jewish people, and transgender people, were taken down less frequently.
As protests over Floyd’s death sparked national soul-searching in June, Facebook employees raged against the company’s choices to leave up racially divisive comments by President Trump, who condemned protesters. They also debated the limits of personal expressions of solidarity, like allowing Black Lives Matter and Blue Lives Matter slogans as people’s internal profile pictures. Black employees met with senior executives to express frustration over the company’s policies.
In July, Facebook advertisers organized a high-profile boycott over civil rights issues, which put pressure on the company to improve its treatment of marginalized groups. The company was also bitterly criticized by its own independent auditors in a searing civil rights report, which found Facebook’s hate speech policies to be a “tremendous setback” when it came to protecting its users of color. More than a dozen employees have quit in protest over the company’s policies on hate speech. An African American manager filed a civil rights complaint against the company in July, alleging racial bias in recruiting and hiring.
Complaints by Black users continue, with some saying they are seeing posts removed with increased frequency even as the WoW Project gets underway.
In one instance in November, Facebook-owned Instagram removed a post from a man who asked his followers to “Thank a Black woman for saving our country,” according to screenshots posted by social media users at that time. The user received a notice that said “This Post Goes Against Our Community Guidelines” on hate speech, according to the screenshots.