Losing Face

How a facial recognition mismatch can ruin your life.

Ava Kofman, The Intercept, October 13, 2016

It was just after sundown when a man knocked on Steve Talley’s door in south Denver. The man claimed to have hit Talley’s silver Jeep Cherokee and asked him to assess the damage. So Talley, wearing boxers and a tank top, went outside to take a look.

Seconds later, he was knocked to the pavement outside his house. Flash bang grenades detonated, temporarily blinding and deafening him. Three men dressed in black jackets, goggles, and helmets repeatedly hit him with batons and the butts of their guns. He remembers one of the men telling him, “So you like to fuck with my brothers in blue!” while another stood on his face and cracked two of his teeth. “You’ve got the wrong guy,” he remembers shouting. “You guys are crazy.”

Talley was driven to a Denver detention center, where he was booked for two bank robberies — the first on May 14 and the second on September 5, 2014, 10 days before his arrest — and for assaulting an officer during the second robbery.

After surveillance camera images of the September robbery were publicly distributed, three of Talley’s acquaintances called in with tips to the police hotline, noting similarities between Talley’s appearance and the robber’s. A detective then showed photographs of both the May and September robbers to Talley’s estranged ex-wife. “That is Steven,” she told him. “That is my ex-husband.”


The identifications justified Talley’s detention, even though he claimed he had been at work as a financial adviser for Transamerica Capital when the May robbery took place. Talley said he was held for nearly two months in a maximum security pod and was released only after his public defender obtained his employer’s surveillance records. In a time-stamped audio recording from 11:12 a.m. on the day of the May robbery, Talley could be heard at his desk trying to sell mutual funds to a potential client. Nine miles north, a white male wearing a black baseball cap, red athletic jacket, white shorts, and black sneakers entered a U.S. Bank, where he threatened the teller, hid $2,475 in his shirt, wrestled with an off-duty officer, and jumped down a flight of 10 stairs to the parking lot. At the same time as Talley was trying to close a deal, parking lot surveillance tapes show the robber tumbling with the officer, escaping his grip, and jogging away.

Talley was released in November, and the charges were apparently dropped. In the months that followed, a series of medical exams revealed that Talley had sustained several injuries on the night of his arrest, including a broken sternum, several broken teeth, four ruptured disks, blood clots in his right leg, nerve damage in his right ankle, and a possibly fractured penis. “I didn’t even know you could break a penis,” he told me.

But while voice recordings had exculpated Talley, an appeal to other, seemingly objective markers of his identity would soon be used to implicate him again. Nearly a year after his release from jail, Talley was arrested a second time on December 10, 2015, and charged with the aggravated bank robbery that had taken place the morning of September 5, 2014.

This time around, Denver prosecutors obtained what looked like damning forensic evidence of their own. The detective assigned to Talley’s case, Jeffery Hart, had requested that an FBI facial examiner manually compare stills from the banks’ grainy surveillance videos to several pictures of Talley — a tall, broad-shouldered white man with short blond hair, mild blue eyes, and a square jaw.

A comparison chart displaying photos of Steve Talley alongside still images from footage of the suspect in the September 2014 robbery. Image: FEDERAL BUREAU OF INVESTIGATION

The FBI analysis concluded that Talley’s face did not match the May robber’s, but that he and the September robber shared multiple corresponding characteristics, including the shape of the head, chin, jaw line, mole marks, and ear features. “The questioned individual depicted” in the September images, the report concluded, “appears to be Talley.”

Except that it wasn’t. Again.

Steve Talley is hardly the first person to be arrested for the errors of a forensic evaluation. More than half of the exonerations analyzed by the Innocence Project have involved cases where forensic experts cited flawed or exaggerated evidence, and in 2009 a landmark paper by the National Academy of Sciences stated what many had long suspected: Apart from DNA testing, no other forensic method could reliably and consistently “demonstrate a connection between evidence and a specific individual or source.”

The report launched the forensic science community into a crisis of interpretation, with many questioning whether its methods should be deemed “sciences” at all. Last year, the FBI announced that virtually all of its hair analysis testimony had been scientifically indefensible, while the Texas Forensic Science Commission recently recommended banning bite-mark evidence from court. In September, the President’s Council of Advisors on Science and Technology issued a report that firmly concluded that forensic techniques relying on visual patterns fell short of scientific standards and relied on the subjective opinions of law enforcement.

But while the accuracy of other visual, pattern-matching methods like blood-spatter analysis has been subject to vigorous public debate, the fallibility of facial comparison, or facial identification, has received less attention.

This may be because comparing facial images can seem like an easy or even intuitive task. “We assume, wrongly, that we are good at recognizing faces,” said David White, an Australian psychologist who researches facial perception. In reality, however, we are for the most part terrible at comparing photographs, video stills, and composite images of unfamiliar faces — and this remains the case even with high-quality, full frontal images.

The photographs of Talley and the suspects were sent to the FBI’s Forensic, Audio, Video and Image Analysis Unit, where trained forensic examiners manually compare points of similarity between faces to help investigators confirm or eliminate the identities of potential suspects. After selecting frames from the video, they work back and forth between evidence from the crime scene and images of their suspect to develop a conclusion regarding the type and number of similarities.

While the FBI has been comparing facial images since at least the 1950s, FAVIAU was formed in 2000 to merge video and analysis units spread across the agency into a main office in Quantico, Virginia. Examiners there compare not only faces but also the voices and heights of suspects submitted by law enforcement for terrorism, homicide, armed robbery, and financial fraud cases, among others. As of 2012, the FBI unit had around five employees comparing faces, all of whom had undergone a two-year training program, but according to a source familiar with the agency’s workings, the unit has since expanded. The methods used by the FBI and other independent examiners typically follow the ACE-VR method, which stipulates that examiners analyze, compare, and evaluate the questioned images from the crime scene against known images of the suspect. They then verify and peer review the analysis.

But even this method — undertaken in ideal conditions — remains vulnerable. No threshold currently exists for the number of points of similarity necessary to constitute a match. Even when agencies like the FBI do institute classification guidelines, subjective comparisons have been shown to differ greatly from examiner to examiner. And the appearance of differences, or similarities, between faces can often depend on photographic conditions outside of the examiner’s control, such as perspective, lighting, image quality, and camera angle. Given these contingencies, most analysts do not ultimately provide a judgment as to the identity of the face in question, only as to whether the features that appear to be present are actually there.

“You step back and let the argument be made by the prosecutor,” explained Grant Fredericks, a video analyst who teaches widely and has worked with the Texas Forensic Science Commission. “It is dangerous for a video examiner to tell the court that the person on video is the defendant. If it were that easy, there would be little need for trials in a surveillance society and that’s a frightening thought.”

“As an analytical scientist, whenever someone gives me absolute certainty, my red flag goes up,” said Jason Latham, who worked as a biochemist prior to becoming a forensic scientist and certified video examiner. “When I came from analytical sciences to forensic sciences, I was like some of these guys are not scientists. They are voodoo witchcraft.”

Forensic reports generally provide few details about the methods they use to arrive at points of similarity. But in Talley’s case, the FBI examiner’s report displayed a high degree of certainty. George Reis, a facial examiner who has testified more than 50 times for state, federal, and military courts throughout the country on forensic visual comparisons, pointed out that the report on Talley’s case was vague. “It is generally considered best practice to be specific in reports and to point out features of similarity, as well as differences, in any comparison illustration or chart,” Reis noted. “In the Talley case no such markings exist. The video frames that were used in the FBI illustration were of poor quality and limited value.”

In 2009, following the National Academy of Sciences’ call for stricter scientific standards to underpin forensic techniques, the FBI formed the Facial Identification Scientific Working Group to recommend uniform standards and best practices for the subjective practice of facial comparison. But the working group’s mission soon ran up against an objective difficulty: Like some other forensic sciences, facial comparison lacks a statistical basis from which its conclusions may be drawn.

This is, in part, because no one knows the probability of a given feature’s distinctiveness. As a FAVIAU slide on the “Individualization of People from Images” explained, “Lack of statistics means: conclusions are ultimately opinion-based.” To remedy this flaw, a 2008 FBI report recommended that the agency undertake research to quantify the frequency of facial features. But such efforts, which have been underway since at least the late 19th century, have so far proved inconclusive.

“What is similar enough? Nobody can tell you. It’s in the eye of the beholder,” said Itiel Dror, a cognitive neuroscientist at University College London. “You need to know that if this person has a right nostril bigger than the left nostril, are the chances one out of a million or is it every second person?”

What this means, in practice, is that the likelihood that two different images are of the same person can only be expressed as a subjective estimate. Unlike in DNA analysis, where match probabilities are grounded in population statistics, the relative certainty of an examiner’s conclusions is determined by the person performing the analysis.

“The examiner must also judge how likely they would be to observe a given feature in the same person, relative to observing [it] in two different people,” explained White. “And one problem I see is that the statistics necessary to make that judgment objectively aren’t available and who knows if they ever will be.”
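White is describing what statisticians call a likelihood ratio. As a minimal worked sketch (the symbols are illustrative and do not come from the FBI or from White), let $F$ be an observed feature, $H_s$ the hypothesis that both images show the same person, and $H_d$ the hypothesis that they show two different people:

$$
\mathrm{LR} = \frac{P(F \mid H_s)}{P(F \mid H_d)}
$$

Taking Dror’s nostril example, and assuming for illustration only that the feature would always be visible in a second image of the same person, so that $P(F \mid H_s) = 1$, a feature shared by every second person gives $\mathrm{LR} = 1 / 0.5 = 2$, while one shared by one person in a million gives $\mathrm{LR} = 1 / 10^{-6} = 10^{6}$. The denominator is exactly the kind of population statistic that, in White’s words, isn’t available for facial features.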

In the past decade, studies have shown that irrelevant or contextual information about a case can influence a forensic examiner’s conclusions. When people are shown two faces and told they are related, they are more likely to describe them as similar even when the hereditary context is fabricated. And the more ambiguous the quality of the evidence, the more likely such contextual information will influence the expert’s conclusions, according to Dror, who has advised forensic agencies internationally on how to reduce bias. In the case of faces, the FBI’s facial identification working group has acknowledged that “the lower the quality of the image being used in a comparison, the weaker the conclusion that can be drawn.” But in many investigations, where images are collected from surveillance videos without the suspect’s consent, poor quality is unavoidable.

Several forensic experts I consulted noted that the images produced in Talley’s case from both banks’ CCTV footage were subpar, at best. In the stills taken from the banks’ cameras, the suspect’s pose differs significantly and the images, even with enhancement, are blurry. In both Denver robberies, the suspects are captured from a high angle, wearing baseball caps. The September suspect also wears sunglasses. According to several examiners, the caps and sunglasses together occlude features such as the shape of the head and ears, greatly reducing the viability of the images.

Varied illumination also presents a challenge to the accuracy of both manual and automated facial comparisons — so much so that changing the illumination can be more misleading than substituting an entirely different face. But the most intractable problem posed by video evidence is compression. Compressing video images into a usable size often results in the removal, corruption, or distortion of the very skin, vein, and mole patterns that examiners use to individuate subjects. Facial marks like freckles or moles are considered to be some of the best candidates for individuation, but they are also the most vulnerable to erasure and distortion when recorded by CCTV. When CCTV systems compress video data, they generate ambiguous dark or light spots on the image; at other times, compression algorithms will generate spots even in the absence of a mole.

Such ambiguity leaves room for suggestive interpretation on the part of experts testifying for both the defense and the prosecution. The FBI examiner relied on four points of similarity between Talley and the September robber in the comparison chart accompanying his report; two of these suggested the presence of moles on the robber’s face. But at the preliminary hearing, Talley’s defense attorney Benjamin Hartford pointed out that the robber caught on camera appeared to lack the distinctive mole Talley has on his right cheek. “If they missed a mole on a guy’s face,” he later told The Intercept, “I don’t know how anyone can trust this.” Detective Hart conceded in his testimony that Talley’s mole was not visible in the images shown of the suspect’s face. At the same time, he explained that the resolution was inevitably lower in the surveillance images, which could affect the visibility of key details.

In his classes on video analysis, where students have included FAVIAU examiners, Grant Fredericks has advised that examiners verify that marks in the same location appear across multiple images in order to avoid mistaking an artifact or shadow from the video processing for the presence, or absence, of a mole. “I’ve identified moles but only when they’re on multiple images and move with the body,” he said, noting that persistent shadows sometimes get mistaken for moles.

Talley’s case is not alone in raising questions about the reliability of forensic facial identification. In U.K. courts, several cases have unfolded in which expert witnesses using the same techniques came to different opinions on the stand, leading judges to request further research into identification from CCTV footage.

Glenn Porter, a facial comparison expert and researcher who has testified in Australian courts for both the prosecution and the defense, has published extensively on the unreliability of facial identification, especially with CCTV images. His studies have faulted examiners for misunderstanding photographic evidence, deploying highly subjective unstandardized methods, and lacking clear validation. These problems, he writes, “may result in evidence derived from CCTV or other photographic sources being misrepresented, exaggerated or erroneous. … This situation presents a serious risk of misidentification of persons of interest, which can lead to wrongful convictions.”

“The fact that somebody might look the same on video means nothing,” Fredericks cautioned. “It means nothing. There has to be more consistency.”

The FBI declined to comment on the Talley case specifically or to answer any general questions about FAVIAU’s methods.

While Detective Hart was working to rebuild the case against Talley, Talley was attempting to rebuild his life. He hadn’t paid rent during his two months in jail, so he was living in homeless shelters. The money he had saved was gone.

Potential employers in the financial industry would express interest in Talley’s application only to rescind offers after conducting a background check. “The charges could not have been worse,” Talley said. Despite a résumé that listed his former positions as a financial consultant and analyst for E-Trade, Curian Capital, and Morgan Stanley, he could not find work. “I think if I had been charged with murder, it would have been easier than being a serial bank robber, because in terms of handling money and being in the financial industry, the fiduciary trust is totally broken.”

Following his release from jail after his first arrest, Talley filed a series of complaints with the Denver Police Department’s internal affairs bureau, seeking justice for what he alleged was a pattern of misconduct and mistreatment. Some of Talley’s complaints against the department appear to be unsubstantiated. His emails to the department were angry and accusatory — deploying multiple fonts, colors, and styles. He left several belligerent voicemails and was often agitated on calls with officers. “You guys are the dumbest cops in the world,” he remembers telling Hart. In a complaint filed in May 2015, he wrote: “I still have not received a single apology from them for the hell they put both me and my family through. I still continue to suffer from the injuries and the consequences of their actions.” Whatever apology Talley was seeking, however, was not forthcoming. Instead, he says he felt “constantly threatened by the Denver police.” On one phone call, Talley recalls Hart telling him, “I’m going to throw your ass back in jail, we’re going to refile.” Hart testified in court that the calls were “contentious” but denied making any threats.

The internal affairs bureau, for its part, investigated and dismissed many of Talley’s claims, but it did confirm three critical allegations.

The first: An after-action report from the Denver Police Department states that the city’s SWAT team saw Talley as an escape risk and notes explicitly that agents used two Noise Flash Diversionary Devices, also known as flashbang grenades, during his arrest. The internal affairs bureau also confirmed that investigators ran a fingerprint left by the May robbery suspect against Talley’s fingerprints — which the police had on file from his job and from two prior DUIs. There was no match between the prints, but Talley was still kept in custody.

Most important, however, were Talley’s allegations concerning the identification process. Talley filed a complaint with the Office of the Independent Monitor, charging that Hart did not follow a standard blind procedure when he personally presented the six-person photographic lineup to Bonita Shipp, the teller at the bank robbed in September. At that time, Shipp identified Talley with 85 percent certainty as the man who had robbed her. No other personnel from either bank identified Talley in a lineup.

After noting that allegations against officers “must be proven by a preponderance of the evidence,” the bureau determined that Hart’s decision to show the lineup to Shipp himself, rather than through a blind investigator, was “improper” and not what was “expected of a Denver police officer.” Shipp said that Hart told her he had already arrested Talley when he pointed to him in the lineup. Hart was disciplined and a written note about the misconduct was added to his permanent file by internal affairs.

Between filing complaints and attending medical visits, Talley was having a rough time. He had a series of run-ins with law enforcement where he was charged with trespassing, disturbing the peace, attempting to influence a public servant, and loitering, charges that he attributed, in part, to the fact that he was now homeless. But when we spoke about his various charges, he could be quick to claim an implausible amount of blamelessness. Trying to get his suits back from the home where he once lived, for instance, he was caught by witnesses kicking down a fence, actions that he later described in milder terms.

In seeking redress for his injuries, Talley was quick-tempered and easily frustrated by what he repeatedly referred to as the “gross incompetence” of the police department. But in TV interviews about his case with local news media, he told his story with calm, considerate eloquence.

It was through this latter route that his story eventually reached Maureen Cain, a program director for the Colorado Criminal Defense Institute who had been looking for local cases of mistaken identity. Cain was working with the Innocence Project to develop best practices for suspect identifications across the country.

“His story was so painful because he was going through a hard time in his life to begin with and you put this arrest on top of it, which was so wrong. It really makes you question our justice system in a fundamental way and you start to think of how many other people suffer like Mr. Talley because of bad identification,” Cain told me. “You think every day they are going to figure out this mistake and they don’t.”

In March 2015, Cain got in touch with Talley to see if he wanted to be part of an effort to legislate that police departments draft and follow written procedures for eyewitness identifications and lineups. Talley agreed to help and later that month, he told the story of his arrest before the House Judiciary Committee. “He came across kind of as a common man,” Cain recalled of his testimony. “And his life was upended so drastically.” Half an hour later, the bill passed the House unanimously.

Cain connected Talley with a pro bono clinic to help people remove arrests without convictions from their records. Talley hoped that clearing his charges by the end of 2015 would make him less of a “persona non grata” to potential employers. Time was of the essence: His financial licenses were set to expire several months later.

Which is why the timing of his second arrest in December 2015 could not have been worse. It made the expungement process that was underway impossible. At best, Talley would now lose his financial licenses. At worst, he would be convicted for a robbery he did not commit.

A comparison chart displaying a photo of Talley alongside a still image from footage of the suspect in the September 2014 robbery. Image: FEDERAL BUREAU OF INVESTIGATION

The January preliminary hearing on the charge that Talley had committed the September robbery did not go as the prosecution planned. First there was the matter of the analysis of his cellphone, which inconclusively showed two calls during and just after the robbery in the area of both his home and the U.S. Bank. But Talley lived directly around the corner from the bank he had allegedly robbed, less than 0.1 mile away as the crow flies. He testified that he missed the first phone call registered by the tower because he had left his phone at home to charge while he was driving to a church food bank to pick up groceries. A sign-up sheet maintained by the food bank shows that a volunteer checked in Talley on September 5, 2014, but it does not specify his exact time of arrival.

The forensic facial comparison analysis was the other piece of new evidence, but its conclusions were still bound up in the complications of Talley’s first arrest: Investigators had originally arrested Talley based on the premise that the robberies were committed by the same person; the facial comparison now stated otherwise, pinning him as the suspect in the second robbery exclusively. Benjamin Hartford, Talley’s lawyer during the case, believed investigators used the forensic analysis to conveniently cover for “the egg on their face” and that the robberies were, in fact, committed by the same stranger.

The FBI’s facial analysis was further called into question in court, when the prosecution’s star witness directly contradicted its conclusions. When Bonita Shipp (the sole witness to the September 5 robbery, who had previously identified Talley from Hart’s photographic lineup) took the stand, she testified that Talley was not the same man who threatened her and robbed her station.

Shipp and other tellers at the bank had been required to undergo suspect-recognition training. According to the internal bank form tellers fill out after each robbery, Shipp originally described the suspect as 6 feet, 175 pounds, with a slender build. But the man who stood before her, she noted, did not fit this description. Talley stood just under 6 feet 4 inches and weighed between 230 and 250 pounds. He did not, in her opinion, appear to be a slender man.

It wasn’t just Talley’s weight or height that eliminated him, but also his teeth. Shipp recalled that the robber’s teeth were not visible even when he grinned. And in the cross-examination with the prosecutor, Shipp said that she had not previously told anybody about the robber’s hands. “When he reached his hands over the counter,” she told the DA, “I could see through his surgical gloves, and I could — he had like marks on his hands.”

The markings were moles and freckles, which she believed she would recognize if presented again with the robber’s hands. At the hearing, Talley offered to show Shipp his hands, and she examined them. “It’s not him,” she told the courtroom. “It’s not the guy who robbed me.” The prosecutor, Shipp recalled, went slack-jawed.

Shipp later told me that the most remarkable thing about the robber was that his face was completely unremarkable. “I immediately kept trying to identify him but this guy had no moles, no tattoos, his complexion was very clear,” she recalled. “I kept looking at his nose. He had a perfect nose: It wasn’t long, it wasn’t wide, he didn’t have big nostrils. Mr. Talley was a lot taller. And Mr. Talley has a horrible nose — no offense to him — but he has a long hunky nose, and he has a mole in the side of his face.” A mole, she explained, that was missing from the September robber. It was only the sun spots on his hands that distinguished him.

When asked about her initial identification of Talley, Shipp said she had examined Hart’s photographic lineup for a few minutes at the bank, during her workday. “It looked a lot like him. I’ll have to admit that. There were a lot of things that looked like him,” she said, adding that she had told the detective at the time she would have preferred to see the suspect in person. When she did see Mr. Talley in court, any doubts left her mind. “Mr. Talley had big broad shoulders and the robber didn’t,” she told me. “He was just a medium-sized guy.”

After Shipp’s testimony, the judge concluded that it was “unlikely” prosecutors would convict Talley. And yet his case remained open.

The Denver District Attorney’s Office released a statement after the preliminary hearing explaining that Shipp’s testimony “was a surprise to the prosecutor. … She will now have to assess the case in light of our burden of proof.”

Further proof was not forthcoming. In March, at a second preliminary hearing, FAVIAU examiners compared Talley’s height to that of the September robber and concluded that they differed by three inches. In April, the prosecution announced that they would be dismissing the case against Talley. But in July, before he could resume and complete his expungement process, his financial licenses expired. By then, he had been out of work for more than a year.

Two years after the night that began his ordeal, Talley sued the Denver Police Department, the FBI (which participated in the joint Safe Streets Task Force that arrested him), and the city of Denver on September 14, 2016. “It’s been very stressful. I’ve been somewhat relieved that it’s been finally filed. However, all the media and with the addition of the anniversary date of event has brought me recent ‘flashbacks’ of the incident,” Talley wrote me a few weeks ago. He is seeking $10 million in damages.

In response to a series of questions about the lawsuit’s allegations of police brutality, Hart’s investigation, and departmental corruption, the Denver Police Department declined to comment, stating that “it would be inappropriate to comment on a pending lawsuit out of respect for the legal process. Upon conclusion of the legal proceedings, the department will gladly address any public concerns regarding this matter.”

Steve Talley in a photo from 2011. Image: STEVE TALLEY

It’s not that forensic face analysts are unaware of the pitfalls of their practice. But because judges often admit expert witnesses with dubious qualifications, policing the field’s professionalism often falls to the analysts themselves. Forensic video analyst George Reis, who has been practicing in the field for three decades, recalled working on several cases where the evidence used by another expert to make a positive identification was inadequate and even one case where the expert was not an analyst at all but a plastic surgeon.

The FBI and professional associations like the International Association for Identification and the Law Enforcement and Emergency Services Video Association offer training programs for experts, but such certification is not required to testify in court. “In this field there are a lot of people who practice with absolutely no background or experience or training and have no idea what the necessary conditions for individualization are,” Reis said. When untrained or inexperienced examiners make egregious mistakes, he added, it reflects badly on the field.

While the practice of comparing faces depends on a combination of innate skill and trained expertise, few studies have actually tested the accuracy of trained experts. One study that did test experts found that passport officers performed the same as untrained students — that is, very poorly — even in recognition scenarios that resembled their jobs. Another study, conducted as a response to the National Academy of Sciences report, determined that highly trained members of the FBI’s forensics working group were more accurate on average than untrained counterparts — achieving an average misclassification rate of about 7 percent. But exactly how one achieves greater perceptual expertise is largely unknown.

Scientists have only recently discovered that facial recognition ability exists along a spectrum. Just as there are people who are completely face blind, there are also individuals who wield exceptional, preternatural skill in recognizing faces. The London Metropolitan Police has administered tests to form a selective bureau of officers, the first of its kind, filled with these “super-recognizers.” Many super-recognizers display higher accuracy with images in varied conditions than even the most refined algorithms, and David White, the Australian scientist, has worked with several of them to gain insights into the nature of human recognition abilities. But it’s unclear if other departments will follow the Met’s lead in testing and trusting them.

The forensic comparison and video analysts who spoke with me emphasized the steps they took to guard against bias: limiting their knowledge of the case to only the relevant evidence at hand, securing the original format of the video, admitting when the evidence was insufficient.

“Bias can lead to error if you think you know the right answer and are supposed to know the right answer,” Jason Latham explained. He said that his clients sometimes get frustrated because he avoids hearing prejudicial information before conducting his analysis. In 2015 the National Commission on Forensic Science dictated that fingerprint analysts be provided with only the information necessary to their analysis, but such steps have only taken the form of recommendations for facial examiners. Meanwhile, the Organization of Scientific Area Committees for Forensic Science started work last year to update the Facial Identification Scientific Working Group guidelines and standards. The updated documents have not yet been released.

Given the problems inherent in facial image comparison, replacing human judgments with computer calculations would seem like an obvious way to avoid the problems of Talley’s case. Unlike manual facial comparisons, automated facial recognition systems use algorithms to search a database of faces and rank the results by the likelihood of a correct match. According to a Government Accountability Office report published in May, the FBI’s Criminal Justice Information Services Division has spent about $55 million on developing face recognition systems since 2010. In addition to signing agreements to access the FBI’s systems, several dozen police departments across the U.S. have started to roll out automated face recognition systems of their own.
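To show what "ranked" means in practice, here is a minimal Python sketch of a one-to-many search. For illustration only, it assumes each face has already been reduced to a fixed-length feature vector and uses cosine similarity as a stand-in for a vendor's match score. Real systems rely on proprietary feature extractors and scoring, and the names `cosine_similarity` and `rank_candidates` are hypothetical. The search always returns a top-k list, whether or not the true subject is in the database at all.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two (non-zero) feature vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(
    probe: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Hypothetical one-to-many search: score the probe against every gallery face
    and return the k highest-scoring subjects, best first.

    The list is filled even if nobody in the gallery is the person in the probe image,
    so a human reviewer still has to decide whether any candidate is a true match.
    """
    scores = [(subject_id, cosine_similarity(probe, vec)) for subject_id, vec in gallery.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    # Toy two-dimensional "face vectors", invented for illustration.
    gallery = {"A": [1.0, 0.0], "B": [0.6, 0.8], "C": [0.0, 1.0]}
    probe = [0.8, 0.6]
    for subject_id, score in rank_candidates(probe, gallery, k=2):
        print(subject_id, round(score, 2))
```

The toy vectors only show the mechanics. The design point is that the output is a ranked shortlist, not a verdict.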

The ascendancy of computer recognition for broad queries is all but inevitable: Algorithms can search millions of faces in seconds — a task that would otherwise take multiple humans multiple lifetimes. And while image quality still presents the same challenges for computers as it does for humans, many algorithmic face recognition systems surpass human performance for images with regular lighting and standardized poses.

“Typically, the forensics community relied on experts in a binary way: Is this the same guy or not the same guy?” Anil K. Jain, one of the world’s leading pioneers of face recognition technology, explained in an interview. “The focus has shifted to ‘How can you be so sure? Give us some confidence level.’ The forensic community needs to accept that examiners can make mistakes, and they need to say, ‘How can we avoid that?’” Because empirical data supporting forensic opinions are scarce, he hopes that data analysis by large computer systems will support the development of probabilistic conclusions for courtrooms.

But experts warn that the same flawed system that sent Talley to jail will not disappear with the advent of automated recognition. If anything, these flaws may be exacerbated. The reason is practical: The conclusions of any automated system ultimately depend on the judgments of human reviewers to evaluate and verify that the correct subject is present in the computer’s list of possible matches.

“There has been very little consideration of that part of the process,” said White. “And even in reports on the reliability of these systems, it’s very much focused on reliability of the algorithm. But once the algorithm has generated a set of possible matches, a human must adjudicate. And we know that the average human is very bad at this.”

As a 2014 paper by the National Institute of Standards and Technology concluded, “The accuracy with which human reviewers can reliably adjudicate the most-similar faces returned in a large-population one-to-many search remains poorly quantified.” When it was quantified, in a study the following year, White and his co-authors recommended halving the accuracy rates reported in other tests of the algorithms. Humans made errors in one out of every two candidate lists.

Such a drastic re-evaluation of the accuracy of current systems would spell bad news for the FBI’s digital systems, which, according to the most recent numbers available, provide a correct match 85 percent of the time. The agency’s face recognition software has access to 411 million images as part of its Next Generation Identification system, a decadelong effort to build the world’s largest database of human identifiers. State, local, and federal law enforcement agencies can search the faces in the database for a range of cases, from DMV fraud to missing persons to immigration claims. And as researchers have repeatedly shown, the potential for false matches only increases with the size of the dataset: The more faces there are to search, the more likely it is that two different people will look alike and be mistaken for one another. Yet human examiners are currently not required to adopt a higher threshold of similarity to justify their decisions when looking at candidate lists, even when the likelihood of such chance resemblances increases.
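The dataset-size effect can be made concrete with a back-of-the-envelope model. Assume, hypothetically, that every comparison against a different person has the same small, independent chance p of scoring above the match threshold. For a gallery of N faces, the expected number of false candidates is then N × p, and the chance of at least one false candidate is 1 - (1 - p)^N. The Python sketch below uses an illustrative p of one in a million, not a published FBI figure. Independence is a simplifying assumption, and the function names are hypothetical.

```python
import math


def prob_at_least_one_false_match(false_match_rate: float, gallery_size: int) -> float:
    """P(at least one false candidate) = 1 - (1 - p)^N, assuming independent comparisons.

    log1p/expm1 keep the result accurate when p is tiny and N is huge.
    """
    return -math.expm1(gallery_size * math.log1p(-false_match_rate))


def expected_false_matches(false_match_rate: float, gallery_size: int) -> float:
    """Expected number of non-matching faces that clear the threshold: N * p."""
    return false_match_rate * gallery_size


if __name__ == "__main__":
    p = 1e-6  # hypothetical per-comparison false-match rate, for illustration only
    for n in (1_000, 1_000_000, 411_000_000):
        print(
            f"{n:>11,} faces: P(>=1 false candidate) = {prob_at_least_one_false_match(p, n):.4f}, "
            f"expected false candidates = {expected_false_matches(p, n):,.1f}"
        )
```

With that made-up rate, a gallery of 1,000 faces produces a false candidate about 0.1 percent of the time and a gallery of a million about 63 percent of the time. A gallery the size of the 411 million images in Next Generation Identification all but guarantees one, with roughly 411 expected. A fixed threshold that looks strict for a small gallery is loose for a large one.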

The automated ranking of candidates may also subtly bias examiners, as a study with fingerprint examiners working with automated systems has shown. Itiel Dror has recommended agencies consider randomizing the candidate lists so that examiners will not be biased when comparing faces ranked at the top of the list.
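One way Dror's suggestion could be carried out is sketched below in Python. The sketch strips the algorithm's scores and shuffles the candidates before an examiner sees them, so that position on the list carries no hint of the system's preference. It is illustrative only. `blind_candidate_list` is a hypothetical name, and the article does not describe how any agency would actually implement this.

```python
import random
from typing import List, Optional, Sequence, Tuple


def blind_candidate_list(
    ranked: Sequence[Tuple[str, float]],
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical blinding step: drop the algorithm's scores and shuffle the order.

    `ranked` is a list of (subject_id, score) pairs, best first. The input is not
    modified. A seed can be recorded so the presented order is reproducible for audit.
    """
    ids = [subject_id for subject_id, _score in ranked]  # discard scores
    rng = random.Random(seed)
    rng.shuffle(ids)  # in-place shuffle of our copy
    return ids


if __name__ == "__main__":
    ranked = [("B", 0.96), ("A", 0.80), ("C", 0.60)]
    print(blind_candidate_list(ranked, seed=7))
```

Recording the seed is a design choice. It keeps the shuffled order reproducible if the examination is later challenged in court.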

Despite these concerns about accuracy, official standards for human-computer collaborations have yet to be developed. Nicole Spaun, a former FBI image examiner who has published several leading studies on forensic facial identification, has been surprised by how many police departments have installed face recognition systems without also instituting the proper training to go along with them. To this end, Spaun is currently working at MorphoTrak, a biometrics vendor, to develop training so that the users of any computer system are a priority rather than an “afterthought.”

“There are a lot of people being thrown at face recognition systems who may know how to use a camera but don’t know anything about the science of imaging,” she explained. While the temptation to make a definitive identification is strong, Spaun says that most of her courses involve explaining to people that certainty is rare. “Looking at my own driver’s license photo I barely see the moles that I know are there,” she said. “What I find myself saying to a lot of people is, ‘No, you’re not going to be able to positively identify a person.’”

According to an FBI spokesperson, FAVIAU does not plan to replace humans with automated facial recognition searching, but it may use systems to locate “better potential candidate matches than the current subject of the examination, which could provide support for eliminating the current subject from consideration.”

But even with automated facial recognition technology in place, false matches will, according to Jain, continue to be “a valid concern.” “Biometrics systems can make errors so we should be open to someone complaining that they are put in the wrong place at the wrong time,” he said. “The ‘fingerprints don’t lie’ attitude has to change. If someone is claiming that they have a perfect system, that attitude needs to be corrected.”

Jennifer Lynch, an attorney at the Electronic Frontier Foundation who works on face recognition, is concerned that human biases may even exacerbate the errors of technology. The seemingly unassailable combination of human expertise and technology may create legal situations where the burden of proof shifts onto the defendant, she explained. False matches end up forcing citizens to prove that they aren’t who examiners (and, increasingly, their algorithmic partners) say they are. In other words, what happened to Steve Talley could happen to others again and again.

This article was reported in partnership with The Investigative Fund at The Nation Institute, now known as Type Investigations.
