Evaluating Facial Recognition Technology


A Protocol for Performance Assessment in New Domains 

 

Introduction

 

Facial recognition technology (FRT), the set of computer vision techniques used to identify individuals from images, has proliferated throughout society. Individuals use FRT to unlock smartphones, computer appliances, and cars. Retailers use FRT to monitor stores for shoplifters and perform more targeted advertising. Banks use FRT as an identification mechanism at ATMs. Airports and airlines use FRT to identify travelers.

FRT has been used in a range of contexts, including high-stakes situations where the output of the software can substantially affect a person’s life: being detained overnight at an airport or being falsely accused of a crime, as was the case for Robert Williams and Michael Oliver. A 2016 study reports that one out of two Americans is involved in a “perpetual line-up” (i.e., an ongoing virtual police lineup), since local and federal law enforcement regularly perform facial recognition-based searches on their databases to aid in ongoing investigations. Beyond the effects of current use of FRT, widening the deployment of FRT to continuous surveillance of the public has the potential to change our use of public spaces, our expectations of privacy, our sense of dignity, and the right to assemble.

The widespread use of FRT in high-stakes contexts has led to a loud call to regulate the technology, not only from civil society organizations but also from the creators and vendors of FRT themselves. IBM, for instance, has discontinued its sale of “general purpose facial recognition software,” stating that “now is the time to begin a national dialogue on whether and how facial recognition technology should be employed by domestic law enforcement agencies,” and offering to work with Congress to this end. Amazon initiated a one-year moratorium on police use of its facial recognition technology, calling for “governments [to] put in place stronger regulations to govern the ethical use of facial recognition technology.” Microsoft, too, announced that it will not sell FRT software to police departments "until we have a national law in place, grounded in human rights."

Numerous pieces of state and federal legislation in the US echo this call. Many propose a moratorium on government use of FRT until comprehensive guidelines can be set. One U.S. Senate bill proposes to bar federal agencies and federally funded programs from using FRT. The state of Massachusetts has proposed restricting state usage of FRT, and the City of San Francisco enacted legislation to prohibit municipal departments from using FRT.

All of us support these calls for rigorous reflection about the use of FRT. One common thread throughout nearly all proposed and passed pieces of legislation is a need to understand the accuracy of facial recognition systems within the exact context of their intended use. The federal Facial Recognition Technology Warrant Act, for example, calls for “independent tests of the performance of the system in typical operational conditions” in order to receive a warrant to use facial recognition for a given task within the government; the Ethical Use of Facial Recognition Act calls for a moratorium on government use of FRT until regulatory guidelines can be established to prevent “inaccurate results”; the State of Washington requires FRT vendors to enable “legitimate, independent and reasonable tests” for “accuracy and unfair performance differences across distinct subpopulations;” the state of Massachusetts proposes “standards for minimum accuracy rates” as a condition for FRT use in the state. The push for accuracy testing is not unique to the United States. The European Union Agency for Fundamental Rights has similarly emphasized the need to make accuracy assessments for different population groups, and the European Commission emphasizes the need to demonstrate robustness and accuracy with AI systems.

Understanding true in-domain accuracy, that is, the accuracy of an FRT deployment in a specific context, is crucial for all stakeholders to have a grounded understanding of the capabilities of the technology. FRT vendors require objective, standardized accuracy tests to meaningfully compete based on technological improvements. FRT users require in-domain accuracy to acquire FRT platforms that are of highest value in the posited application. Civil society groups, academics, and the public would benefit from a common understanding of the capabilities and limitations of the technology in order to properly assess risks and benefits. Therefore, we made a concerted effort to examine this specific question, in hopes of better understanding the operational dynamics in the field.

Although it may seem simple at first glance, understanding performance of facial recognition for a given real-world task (e.g., identifying individuals from stills of closed-circuit television video capture) is not in fact an easy undertaking. Many FRT vendors advertise stunning performance of their software. And to be sure, we have witnessed dramatic advances in computer vision over the past decade, but these claims of accuracy are not necessarily indicative of how the technology will work in the field. The context in which accuracy is measured is often vastly different from the context in which FRT is applied. For instance, FRT vendors may train their models on well-lit, clear images, with the software operated properly by machine learning professionals, but during deployment, clients such as law enforcement may use FRT on live video from police body cameras, with the output later evaluated by officers with no technical training. The accuracy of FRT in one domain does not translate to its uses in other domains, and changing context can significantly impact performance, as is well known in the computer science literature.

One central concern of such cross-domain performance, which has given rise to profound criticisms of FRT, is that models may exhibit sharply different performance across demographic groups. Models trained disproportionately on light-skinned individuals, for instance, may perform poorly on dark-skinned individuals. A leading report found that false positive rates varied by factors of 10 to 100 across demographic groups, with such errors being “highest in West and East African and East Asian people, and lowest in Eastern European individuals.” In this White Paper, we characterize this gulf between the contexts in which facial recognition technology is created and deployed as stemming from two sources: domain shifts arising from data differences across domains, and institutional shifts in how humans incorporate FRT output in decisions. We outline concrete, actionable methods to assess deployment-domain accuracy of FRT.
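
To make the notion of a false positive rate that differs across demographic groups concrete, the following is a minimal sketch, not the method of the report cited above. It assumes a hypothetical set of comparisons between images known to show different people, each tagged with a group label and the similarity score returned by a matcher, plus an assumed decision threshold. The function names, group labels, scores, and threshold are all illustrative.

```python
from collections import defaultdict


def false_positive_rate_by_group(impostor_scores, threshold):
    """Per-group false positive rate.

    impostor_scores: iterable of (group, score) pairs, one per comparison
    between images known to show DIFFERENT people (hypothetical input format).
    A comparison counts as a false positive when score >= threshold.
    """
    totals = defaultdict(int)
    false_positives = defaultdict(int)
    for group, score in impostor_scores:
        totals[group] += 1
        if score >= threshold:
            false_positives[group] += 1
    return {group: false_positives[group] / totals[group] for group in totals}


def max_disparity_ratio(rates):
    """Ratio of the highest to the lowest per-group rate.

    Returns None when the lowest rate is zero, since the ratio is undefined.
    """
    lowest = min(rates.values())
    if lowest == 0:
        return None
    return max(rates.values()) / lowest


# Made-up scores for illustration only; they are not measurements.
scores = [
    ("group_a", 0.91), ("group_a", 0.40), ("group_a", 0.85), ("group_a", 0.30),
    ("group_b", 0.20), ("group_b", 0.88), ("group_b", 0.10), ("group_b", 0.35),
]
rates = false_positive_rate_by_group(scores, threshold=0.8)
print(rates)                       # {'group_a': 0.5, 'group_b': 0.25}
print(max_disparity_ratio(rates))  # 2.0
```

In this toy data one group's false positive rate is twice the other's; the factors of 10 to 100 quoted above describe disparities of the same kind, measured on real systems at far larger scale.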

In our view, the ability to evaluate the accuracy of FRT is critical to the normative debates surrounding FRT. First, if a system simply does not perform as billed, and if accuracy differs dramatically across demographic groups, poor performance may disqualify an FRT system from use and obviate the need for other normative considerations. Second, performance interacts directly with normative questions. For example, lower accuracy heightens concerns about the cost of misidentification. Higher accuracy, on the other hand, amplifies concerns over surveillance, privacy, and freedom of expression. The central role of accuracy in these debates likely explains why so much proposed legislation has called for rigorous assessments of performance and is why we have tailored this White Paper to the subject.

Of course, many other considerations factor into the adoption of FRT. Concerns over privacy, consent, transparency, and biased usage all significantly complicate the use of FRT systems, independent of accuracy. While such concerns are critical to a meaningful discussion about FRT, they fall outside the direct scope of this White Paper. The scope here remains intentionally narrow, as consensus on how to assess the operational limits of the technology can be crafted more readily than consensus on wide-ranging normative commitments regarding the technology. For a broader normative assessment, each individual use case must necessarily be judged by the potential harms and benefits along all of these dimensions, and we point readers to broader discussions in the references cited throughout this White Paper.

 


A WHITE PAPER FOR STANFORD’S INSTITUTE FOR HUMAN-CENTERED ARTIFICIAL INTELLIGENCE

Daniel E. Ho

Emily Black

Maneesh Agrawala

Li Fei-Fei
