One Test for the Resilience of a Role to AI Automation

What is the ratio of the time and expertise required to verify the correctness of a role’s output to the time and expertise required to produce that output? If that ratio is high, meaning that verification is difficult relative to production, then all else being equal you should expect your role to be more resilient to automation. The reasoning here is straightforward: humans will continue to verify the work LLMs produce even after LLMs are doing the producing. That status quo will likely be maintained for years.

Two Reasons to Think Humans Will Be in the Loop

The first is that the technology is error-prone. In particular, it is prone to certain kinds of errors that humans don’t normally make. The best known is hallucination, where models make up detailed information wholesale. The phenomenon is broader than that, however. LLMs and humans have different areas of relative strength and weakness. This is not surprising, since the underlying reasoning mechanisms (a biological brain vs. an artificial neural network) are so different. Consequently, LLMs may benefit from human review even if their unassisted error rate is lower than that of their human counterparts. LLMs checking other LLMs are less helpful because their errors tend to be closely correlated.

The second reason is social. People want clear liability. People don’t trust new technology. People are uncomfortable with massive disruptions to the labor market, especially when the disruptions affect high-status jobs. The EU has already passed legislation requiring human oversight for high-stakes decision making, and protectionist professional licensing has a long history. Once unemployment rates start spiking, so will the political will to act. Expect these kinds of legal protections to become more stringent and widespread.

While verification requirements alone might go a long way towards satisfying concerns about liability and trust, they are clearly insufficient to assuage concerns about the labor market. To assuage that fear, you’d have to stop automation altogether. So why expect humans in the loop rather than AI out of the loop? Because banning AI outright would be practically impossible outside of a sci-fi Butlerian Jihad scenario where all computers are destroyed. LLMs are widely distributed software, and the individual and commercial incentives for their use will not go away just because of political antipathy. From an enforcement perspective, it is much easier to verify that human labor or oversight entered into the equation than it is to verify that AI didn’t.

A Complicating Factor: AI Assistance in the Verification Process

Of course, the human effort required to verify something may itself be sensitive to technological factors. A recent example comes from the world of math. Verification in mathematics can be extremely difficult, in terms of both required expertise and labor time (see the dispute over the correctness of a claimed proof of the abc conjecture, which has been ongoing for years). That might be changing in the era of proof assistants. These are just special kinds of programming languages. You translate your proof into the language and it checks each step, verifying in a reliable and deterministic way that the theorem stated in the program really follows.
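To make the idea concrete, here is a minimal sketch assuming Lean 4 as the proof assistant (the essay does not name a specific one, and the theorem name two_add_two is made up for illustration). The checker accepts the first statement because the proof term really establishes it, and it would reject the commented-out one.

```lean
-- Accepted: both sides reduce to the same natural number,
-- so `rfl` (reflexivity) is a valid proof.
theorem two_add_two : 2 + 2 = 4 := rfl

-- Rejected if uncommented: the two sides do not reduce to the same value,
-- so the checker reports an error instead of accepting the claim.
-- theorem two_add_two_bad : 2 + 2 = 5 := rfl
```

The point is not the arithmetic but the division of labor: the human reads the one-line statement, and the machine is responsible for the proof.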

Proof assistants, which could in principle work for any theorem, are not new to the scene. The problem is that translating a human proof into a proof assistant’s language is typically extremely labor intensive. Proving a new theorem from the ground up might take millions of lines. However, the longer these systems exist, the more viable they become. People share their work. New results are generally not proven from scratch, but on the basis of existing theorems. If someone has already shared the code for an existing theorem, you don’t need to start from scratch, and the inventory of already encoded theorems is growing all the time.
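A small sketch of that reuse, again assuming Lean 4 for illustration: the theorem below (swap_outer is a hypothetical name) is proven in one line by appealing to Nat.add_comm, a commutativity lemma that already ships with Lean’s core library, rather than by re-deriving commutativity from the definition of addition.

```lean
-- `a + b + c` parses as `(a + b) + c`, so commutativity applied to
-- the pair `(a + b)` and `c` gives exactly the statement we want.
theorem swap_outer (a b c : Nat) : a + b + c = c + (a + b) :=
  Nat.add_comm (a + b) c
```

The more lemmas of this kind already exist in shared libraries, the less of a new proof has to be encoded by hand.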

That alone might never be enough to make machine-checked proofs practical in many cases, because the labor required to encode even the novel part of a proof can be too costly to bother with. This is where LLMs enter the equation. If an LLM can write a proof AND translate it into proof-assistant code, all that will be left for the human practitioner to verify is that the theorem the LLM claims to have proven matches the theorem stated in the code. No one need bother to glance at the actual proof to be sure of the accuracy of the result. This sort of idea is common in the software development world.

Example Case: Software Development

One of the most central concepts in software development is abstraction. The idea here is that we do not need to think about every step in a computation. Consider the operation of taking the logarithm of a number. The kinds of programming languages most developers use day to day have a built-in function that performs this operation, so that it appears to the programmer that the computer needs only one instruction. At the level of the processor, where the computation is really executed, this operation actually involves many instructions. The detailed operation of the processor is abstracted away. The everyday developer doesn’t need to think about it or even understand it, just as with the previously proven theorems discussed in the previous section. Software is built on layers and layers of these abstractions.
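A rough Python sketch of the logarithm example follows. The function ln_series is a hypothetical hand-rolled implementation written only for illustration. It is not how a processor or Python’s math library actually computes logarithms, but it gives a sense of how many steps can hide behind a single call to math.log.

```python
import math


def ln_series(x: float, terms: int = 50) -> float:
    """Hypothetical hand-rolled natural log for x > 0.

    Uses the series ln(x) = 2 * sum_{k>=0} y**(2k+1) / (2k+1),
    where y = (x - 1) / (x + 1). Converges fastest for x near 1.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    y = (x - 1.0) / (x + 1.0)
    total = 0.0
    power = y  # holds y**(2k+1) for the current k
    for k in range(terms):
        total += power / (2 * k + 1)
        power *= y * y
    return 2.0 * total


# From the programmer's point of view, the built-in is one instruction.
builtin_result = math.log(2.0)

# The hand-rolled version spells out a loop of multiplications and divisions.
manual_result = ln_series(2.0)
```

For inputs near 1 the two results agree to many decimal places, yet the caller of math.log never has to know that any series, loop, or hardware instruction is involved.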

Here’s the rub: heavy use of abstractions, combined with automated tests that make it easy to see how those abstractions behave, makes software easy to verify. Verifying that an abstraction is implemented correctly boils down to verifying the tests, which can be very simple even if what is tested requires a complex underlying implementation. The general testing model is to feed the module of code that implements the abstraction an input and check that it provides the expected output; this is itself done programmatically. There may be some expertise involved in anticipating which inputs may cause problems and knowing what their output ought to be, but it is often rote. The correctness of an implementation of an abstraction only needs to be checked once, after which it can be reused over and over again or used to create higher-level abstractions. The verification-to-production ratio in software, then, is very low as a rule. Software development may be an especially vulnerable variety of white collar work for this reason.
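The input/expected-output testing model can be sketched in a few lines of Python. Everything here is hypothetical and chosen only for illustration: merge_sort stands in for a module with a nontrivial implementation, and CASES and run_cases stand in for a test suite. Notice that a reviewer can judge the test table without reading the sorting logic at all.

```python
def merge_sort(items):
    """Hypothetical module under test: returns a new sorted list."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    # Repeatedly take the smaller head element of the two sorted halves.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


# The test table: each entry is (input, expected output).
CASES = [
    ([], []),
    ([1], [1]),
    ([3, 1, 2], [1, 2, 3]),
    ([2, 2, 1], [1, 2, 2]),
    ([-1, 5, 0, -7], [-7, -1, 0, 5]),
]


def run_cases():
    """Return the failing cases as (input, expected, actual); empty means all passed."""
    failures = []
    for given, expected in CASES:
        actual = merge_sort(given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures
```

Calling run_cases() and checking that it returns an empty list is the whole verification step. The table of cases is far shorter and simpler than the implementation it checks, which is the asymmetry the paragraph above describes.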