If you thought IBM's use of “quietly scraped” Flickr images to train facial recognition systems was bad, it gets worse. Our research, which will be reviewed for publication this summer, indicates that the U.S. government, researchers, and corporations have used images of immigrants, abused children, and dead people to test their facial recognition systems, all without consent. The very group the U.S. government has tasked with regulating the facial recognition industry is perhaps the worst offender when it comes to using images sourced without the knowledge of the people in the photographs.

The National Institute of Standards and Technology, a part of the U.S. Department of Commerce, maintains the Facial Recognition Verification Testing program, the gold-standard test for facial recognition technology. This program helps software companies, researchers, and designers evaluate the accuracy of their facial recognition software by running it through a series of challenges against large groups of images (data sets) that contain faces from various angles and in various lighting conditions. NIST has multiple data sets, each with a name identifying its provenance, so that tests can include people of various ages and in different settings. Scoring well on the tests by providing the fastest and most accurate facial recognition is a massive boost for any company, with both private industry and government customers looking at the tests to determine which systems they should purchase. In some cases, cash prizes as large as $25,000 are awarded. With or without a monetary reward, a high score on the NIST tests essentially functions as the tech equivalent of a Good Housekeeping seal or an “A+” Better Business Bureau rating. Companies often tout their test scores in press releases. In recognition of the organization’s existing market-approval role, a recent executive order put NIST at the head of regulatory efforts around facial recognition technology, and A.I. more broadly.

Through a mix of publicly released documents and materials obtained through the Freedom of Information Act, we’ve found that the Facial Recognition Verification Testing program depends on images of children who have been exploited for child pornography; U.S. visa applicants, especially those from Mexico; and people who have been arrested and are now deceased. Additional images are drawn from Department of Homeland Security documentation of travelers boarding aircraft in the U.S. and individuals booked on suspicion of criminal activity.

When a company, university group, or developer wants to test a facial recognition algorithm, it sends that software to NIST, which then runs it against its photograph collections to determine how well the program performs in terms of accuracy, speed, storage and memory consumption, and resilience. An “input pattern,” or single photo, is selected, and then the software is run against one or all of the databases held by NIST. For instance, one measure, known as the false non-match rate, is the probability that the software fails to recognize a face that does have a match in the database. Results are then posted on an agency leaderboard, where developers can see how they’ve performed relative to other developers. In some respects, this is like more familiar product testing, except that none of the people whose faces are used in the testing know about, let alone have consented to, it.
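To make the false non-match rate concrete, here is a minimal sketch of how such a figure can be computed. It assumes the software outputs a similarity score for each comparison between two photos of the same person (a "genuine" pair), with higher scores meaning more similar, and that a match is declared when the score meets a fixed threshold. The function name, scores, and threshold are hypothetical illustrations, not NIST's actual procedure, which is considerably more involved.

```python
def false_non_match_rate(genuine_scores: list[float], threshold: float) -> float:
    """Fraction of same-person comparisons the software fails to match.

    Hypothetical sketch: each score is a similarity score for a pair of
    photos known to show the same person; a score below the threshold
    counts as a missed match (a "false non-match").
    """
    if not genuine_scores:
        raise ValueError("need at least one same-person comparison")
    misses = sum(1 for score in genuine_scores if score < threshold)
    return misses / len(genuine_scores)


# Five made-up same-person comparisons; two fall below the threshold.
scores = [0.91, 0.42, 0.88, 0.30, 0.77]
print(false_non_match_rate(scores, threshold=0.5))  # prints 0.4
```

Raising the threshold makes the software more cautious about declaring matches, which pushes this rate up; that trade-off is why such a rate is usually reported alongside how often the software wrongly matches two different people.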

Altogether, NIST data sets contain millions of pictures of people. Any one of us might end up as testing material for the facial recognition industry, perhaps captured in moments of extraordinary vulnerability and then further exploited by the very government sectors tasked with protecting the public. Not only that, but NIST also actively releases some of those data sets for public consumption, allowing any private citizen or corporation to download, store, and use them to build facial recognition systems, with the photographic subjects none the wiser. (The child exploitation images are not released.) There is no way of telling how many commercial systems use this data, but multiple academic projects certainly do.