The Legal Turing Test - Artificial Judgement on Trial

There are efforts to introduce artificial intelligence to our legal and policing systems. Although there is still a long way to go, we need to ask ourselves now: what needs to be done so that society benefits from Artificial Judges?

10. Apr 2018 · Lisa Schurrer · Stan Kerstjens

It did not look as she had expected. Instead of a room filled with wooden benches, the police led her into what seemed to be a movie set. Cameras and microphones pointed at every corner of the room, and the tables were covered with black wires and devices she did not recognise. As she was seated, the judge spoke: “Scientists are developing an Artificial Judge. This Artificial Judge is an algorithm that combines all case files with all observations made in this room: facial expressions, blood pressure, body temperature and, of course, what is said during the hearing. The Artificial Judge then processes this information with intricate machine learning algorithms and reaches a verdict, just as a human judge would. As part of a series of trials, today’s trial will be judged both by me, representing the human judge, and by the Artificial Judge, to see whether we come to similar conclusions. To ensure independence, the Artificial Judge and I will not exchange information or discuss opinions. For this trial, only my judgement will be legally binding, and it will not be influenced by the conclusions of the Artificial Judge. My apologies for the distracting environment. In the interest of the study and the verdict, please proceed as naturally as possible.”

This scenario sounds like science fiction. However, efforts to introduce artificial intelligence to our legal and policing systems already exist. For example, Precobs has been predicting burglaries in the canton of Zurich since 2014 (1), and COMPAS (US) and HART (UK) have been estimating the probability that convicts will reoffend, thereby influencing sentences (2) (for more examples, see 3 and 4). Although there is still a long way to go before we reach automated judgement as imagined above, it is important to think about what these systems would need in order to benefit society rather than burden it.

About effectiveness and efficiency of the legal system

There are two main motivations for introducing automated systems into our legal system: efficiency and effectiveness. A more efficient judicial system can handle more cases more quickly and at lower cost, for instance by reading hundreds of pages in seconds, a goal nobody would argue against. But some expect an intelligent system to be not only more efficient but also better, and therefore to improve the quality of the legal system. For instance, an A.I. system could reduce human mistakes in simple ways, such as preventing clerical errors, but also in more complicated ways, such as making judgements more objective by reducing bias related to race or gender.

The Problem of Biases, Lunch Breaks and the Black Box

Let’s have a closer look at how we define a “good” or “just” judgement. A good judgement might not be the strict implementation of rules but the deliberate application of a set of norms to a specific case and context. Some people argue that a computer will never capture concepts claimed to be exclusively human, such as empathy, compassion and the spirit of the law. To avoid unanswerable questions about the capabilities of artificial computing systems, such as “can a computer feel compassion?”, the mathematician Alan Turing devised what he called the Imitation Game, now usually referred to as the Turing Test. Put crudely, Turing’s argument states that if the behaviour of an artificial system is indistinguishable from that of the real system it imitates, the two are identical for all practical intents and purposes. In this line of thought, it does not matter whether the judge is human or artificial, or whether it has feelings, empathy and a sense of the spirit of the law; the only thing that matters is that its output is indistinguishable from that of a human. When a machine behaves differently from a human, it fails the Turing Test. But this does not mean that the machine is inferior to the human. It could also mean that the artificial judgement was not influenced by emotions, stereotypes or even a missed lunch break: one study found that human judges give more lenient decisions at the start of the day and immediately after a scheduled break in court proceedings (5).

The desire to optimise judgement with an algorithmic system is based on the idea that a pure calculation is more objective than a human decision-making process. This is not necessarily the case, as the findings of the organisation ProPublica show: COMPAS reproduced existing biases. The system attributed much higher risks of recidivism to black convicts than to white convicts accused of similar crimes (6). Even so, the artificial judge at least has no interest in hiding its bias, whereas a human judge may, consciously or unconsciously, want to hide his or her prejudices. But the opposite is also possible: a human judge might be aware of his or her biases and try to confront them honestly. An automated system, however, is hardly equipped with this ability to examine its own judgements critically. Now, is the process of reasoning and self-critique necessary for a good judgement? It might already be enough to provide transparency about the reasons for a judgement, so that the convict can understand and accept the sentence (7). Some methods in Artificial Intelligence are essentially a black box to humans, meaning that there is no obvious way to understand how decisions are made; there is nothing but the bare result. It will be a challenge for designers of an Artificial Judge to reconcile black-box methods with the ability to properly explain judgements and, hence, to convince legal experts and the public.

Final Remarks

We shouldn’t stop thinking at the walls of the courtroom: how do sentences influence society as a whole? A sentence can have implications for society; it can, for example, be a landmark judgement that reflects changes in society or even anticipates them. The judiciary is one pillar of a democratic state, alongside the legislative and executive branches. We need to take this responsibility into account when introducing AI to the legal system. Indeed, this very notion of responsibility might be a good tool for differentiating between the levels and tasks we want to automate and organise more efficiently and those that ought to remain tied to human judgement, for human judgement is still the basic ingredient of any notion of justice.

Want to learn more about this topic? We will be discussing "Algorithmic Justice" on the 19th of April.


(1) N.N., »Der Wetterbericht für Einbrüche«, In: SRF, online: [Accessed 15.3.2018].

(2) See the website of the company, online: [Accessed 15.3.2018]; Adam Liptak, »Sent to Prison by a Software Program’s Secret Algorithms«, In: The New York Times, 1.5.2017, online: [Accessed 15.3.2018]; and Matt Burgess, »UK police are using AI to inform custodial decisions – but it could be discriminating against the poor«, In: Wired, 1.3.2018, online: [Accessed 15.3.2018].

(3) Vikram Dodd, »Met police to use facial recognition software at Notting Hill carnival«, In: The Guardian, 5.8.2017, online: [Accessed 15.3.2018] and Robert Scammell, »Facial recognition surveillance glasses used by Chinese railway police«, online: [Accessed 15.3.2018].

(4) Ministry of Justice (ed.), Transforming our justice system: assisted digital strategy, automatic online conviction and statutory standard penalty, and panel composition in tribunals. Government response, February 2017, p. 8, online: [Accessed 15.3.2018].

(5) Shai Danziger, Jonathan Levav and Liora Avnaim-Pesso, »Extraneous factors in judicial decisions«, In: PNAS, 26.4.2011, Vol. 108, No. 17, pp. 6889–6892, online: [Accessed 15.3.2018].

(7) So far, private companies have treated their algorithms as trade secrets. In the case of COMPAS, it is therefore not possible to explicate the factors that led to a specific risk estimate. The case of a convict claiming his right to see the factors behind his risk score received considerable media attention. See: [14.3.2018] and [15.3.2018]. Even though the appeal was not successful, experts such as the law professor Frank Pasquale conclude: “[The] basic constitutional principle gives defendants a right to understand what they are charged with, and what the evidence against them is. A secret risk assessment algorithm that offers a damning score is analogous to evidence offered by an anonymous expert, whom one cannot cross-examine.” See: [Accessed 15.3.2018].


Lisa Schurrer (27) earned her bachelor's degree in cultural studies at the universities of Leipzig and Prague and is currently studying History and Philosophy of Knowledge at ETH Zurich. In her master's thesis, she examines the effects of self-learning systems on society, focusing on the domain of political action.

Stan Kerstjens


This blog post reflects the personal opinion of the authors and does not necessarily correspond to that of reatch or its members.