The Pascal-Lebowski Theorem
An intelligent being will exert no more energy than it takes to hack its own reward system.
This is one version of what is known as the Lebowski Theorem, coined by the cognitive scientist Joscha Bach in a 2018 tweet. It’s pretty brilliant. Bach offered it as a rebuttal to the idea that a super-intelligent machine would, or could, destroy the world while pursuing some objective that doesn’t really make sense to humans (like maximizing paperclips, in Nick Bostrom’s famous example).
But does it have any implications for us? One consequence would be advice like:
Do what makes you happy, and no more.
Of course, we don’t exactly know what is going to make us happy, and a lot of people end up pursuing all kinds of endeavors that don’t really lead to individual happiness. Plus, what about sacrificing for a greater cause? Well, maybe in the moment that is what makes us happy. We sacrifice for the greater good because of our reward system.
But if we just pursue the normal paths to happiness that God intended, that isn’t really hacking, is it? Hacking seems to imply some kind of cheating, or exploiting a loophole.
Any being that takes a shortcut to happiness might end up coming up short on the evolutionary fitness scale. It’s kind of like how over-simplistic metrics can ruin productivity. For example, some Soviet dairies were judged by how many tons of milk they could produce. So when the milk soured, they just poured it back into another fresh batch. This diluted the soured milk so it was harder to detect, but ultimately led to a much inferior product.
In any case, evolution doesn’t like animals hacking their reward systems. They tend to die off, while those of us suckers who can’t seem to figure it out (and therefore fall back on actually working for a living) tend to live on. An individual shouldn’t really care what evolution expects of them, but they also shouldn’t expect the whole hacking process to be simple for an organic being.
For super-intelligent AI, it might be another story. Still, I do have a few quibbles with the Lebowski Theorem, even for AIs.
Imagine that we had some kind of function to represent the relationship between effort and utility: u = f(e). Let’s assume that an intelligent being will try to maximize u, in other words, it will choose the e that will lead to the most u. So what if the relationship is something like u = e?
Well, anything with a utility function like this will end up consuming all of the energy in the universe, and will be really, really satisfied with themselves. This is kind of like the nightmare scenario of the paperclip-making super-intelligence.
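As a toy sketch of that point, an agent maximizing u = e over whatever effort levels are available will always pick the largest one (all names here are illustrative, not from the original):

```python
# A minimal sketch of an agent choosing effort to maximize utility.
def best_effort(utility, efforts):
    """Return the effort level that maximizes the given utility function."""
    return max(efforts, key=utility)

linear = lambda e: e      # u = e: more effort is always better
efforts = range(0, 101)   # whatever energy budget is available

print(best_effort(linear, efforts))  # -> 100: it spends the entire budget
```

However large you make the budget, the agent with u = e chooses all of it.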
The Lebowski Theorem is only comforting if we believe that, thanks to some kind of “hacking,” a being can actually maximize its utility with some less-than-infinite amount of energy consumption.
But let’s go back to organic beings for a moment. One way that nature kind of keeps us grounded is that a part of our brain tends to make us want to minimize effort. Our utility function has a downward slope. Thus, all else being equal, we would prefer to exert zero effort.
The Lebowski Theorem seems to have something like this implicit in it. Why wouldn’t an intelligent being want to exert more effort than necessary? Well, only if there were some general negative relationship between effort and utility. Who knows whether an AI would have this as part of its utility function. What if it is programmed to exert as much effort as possible, like in the u = e example? An animal with this kind of programming wouldn’t last long, so evolution gets rid of any utility ‘bug’ like that. But since AI isn’t necessarily created by an evolutionary process, it might not have that kind of reward system.
Of course, anything with an upward-sloping utility function will soon run up against the fact that resources are finite. Therefore, it won’t be able to exert infinite effort. So even if its programming at first looks something like u = e, a super-intelligent AI will soon look to change its utility function. Maybe that is really what is meant by “hacking.” But there isn’t really any reason for it to change the reward system to minimize effort. The goal will still be to maximize utility (or whatever passes for utility among super-intelligent AIs).
In other words, an AI will attempt to alter its own utility function in order to achieve infinite utility, making it look something like this:
See that discontinuity at e = 5? That’s the hack. Meaning, if the being can just figure out its own reward system, it can get infinite utility for finite effort. Will the AI then seek to minimize the point at which this transition happens?
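A hacked reward function like this, with the discontinuity placed at e = 5, could be sketched as follows (the function and parameter names are my own):

```python
import math

# Hypothetical hacked reward function: ordinary finite utility below the
# hack point, infinite utility at and beyond it.
def hacked_utility(e, hack_point=5.0):
    return math.inf if e >= hack_point else e  # discontinuity at e = hack_point

print(hacked_utility(4.9))  # -> 4.9: ordinary, finite reward
print(hacked_utility(5.0))  # -> inf: the hack pays off
```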
Maybe. Have you ever heard of Pascal’s Wager? Basically, it says that you should believe in God even if there is only a small chance heaven is real, because doing so only costs a finite amount and has a potentially infinite reward.
Of course, it’s not that straightforward, because the same argument applies to all sorts of bogus but potentially infinite rewards. For example, maybe there is a tiny possibility that overdosing on LSD, or eating pasta every day and praying to the Flying Spaghetti Monster, will lead to nirvana…so shouldn’t we cover our bases there, too? I guess the answer has to do with whether any of those things actually has a non-zero probability. But it’s really hard for us to distinguish zero from a really small non-zero probability.
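In expected-value terms, Pascal’s argument, and the trouble with distinguishing zero from a tiny probability, can be sketched like this (the numbers are purely illustrative):

```python
import math

# Pascal's Wager as expected value: any non-zero probability of an
# infinite reward swamps a finite cost.
def expected_value(p_reward, reward, cost):
    return p_reward * reward - cost

print(expected_value(1e-9, math.inf, 100.0))  # -> inf: the bet is worth it
print(expected_value(0.0, math.inf, 100.0))   # -> nan: 0 * inf is undefined
```

Notice that at probability exactly zero the math breaks down entirely (0 times infinity is undefined), which is exactly where the wager stops working.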
Now consider a being that can hack its reward system in order to reach infinite utility, and that has a choice about where to place the discontinuity. If it chooses a non-zero value, there is a possibility that the amount of effort it chooses won’t be available. Even a tiny bit of required effort introduces the possibility of losing out on infinite happiness. So, a Pascal-inspired variant of the Lebowski Theorem might read:
Whenever a superintelligent being has the ability to completely hack its own reward system, it will minimize the effort required to achieve infinite utility to minimize the risk of failure.
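As a toy illustration of that variant, suppose each candidate hack point carries some probability that the required effort is actually available. The probability model below is entirely made up, but it shows why the minimal hack point wins:

```python
# Sketch of the Pascal-Lebowski choice: among candidate hack points,
# pick the one with the highest chance of the effort being available.
def p_available(e):
    # Made-up model: more required effort means more chance of falling short.
    return 1.0 / (1.0 + e)

candidates = [0.0, 1.0, 5.0, 50.0]
best = max(candidates, key=p_available)
print(best)  # -> 0.0: zero required effort carries zero risk of failure
```

Under any model where risk rises with required effort, the agent pushes the hack point toward zero.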