
In today’s column, I identify three eras underlying the advent of so-called “thinking time” when it comes to using generative AI and large language models (LLMs) and discuss the changes in prompting that will arise soon accordingly.
Let’s talk about it.
This analysis of an innovative AI breakthrough is part of my ongoing Forbes column coverage on the latest in AI including identifying and explaining various impactful AI complexities (see the link here). For my extensive discussion of fifty key prompting strategies and the value of solid prompt engineering, see the link here.
When you use modern-day generative AI, the latest versions tend to have a chain-of-thought (CoT) capability that is now built into the AI, see my in-depth explanation at the link here.
The essence of the chain-of-thought approach is that the AI derives a series of steps to follow when trying to process the prompt that a user has entered. You can somewhat liken this to human decision-making and problem-solving in the broad sense of coming up with logical steps to figure things out (though do not anthropomorphize AI; AI isn't on par with how humans think).
In the AI community, this extended use of processing time has been given all kinds of fanciful titles, such as providing additional test-time compute or so-called thinking, reasoning, or inference time. I disfavor the phrasing of thinking/reasoning/inference time since those words smack of human thought. I also find the phrase test-time compute quite questionable: testing is usually something done before a system or app is rolled out to users, whereas "test-time" here seemingly also encompasses when users are actively utilizing a supposedly fielded system.
I prefer the admittedly bland but apt indication that you are simply providing more processing time for the AI to dig deeper into deriving an answer or solution.
Imagine things this way.
You are using an AI app that plays chess. If you timebox the app to a split second of processing time, there isn't enough time to examine a plethora of possible chess moves. The AI must cut off its look-ahead since the processing time doesn't allow for many calculations. On the other hand, you might tell the AI that it can take up to five minutes to make a chess move. This would allow a bunch more processing time to explore a wide variety of chess moves and potentially derive a better choice for the move at hand.
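To make the timeboxing concrete, here is a minimal sketch in Python of the underlying mechanism, assuming hypothetical legal_moves and evaluate helpers that stand in for a real chess engine. The time budget simply caps how many plies of look-ahead get finished.

```python
import time

def pick_move(position, legal_moves, evaluate, time_budget_s):
    # Iterative deepening under a hard time budget: search one ply deeper
    # on each pass, keeping the best move from the last completed pass.
    # `legal_moves` and `evaluate` are hypothetical stand-ins for a real
    # chess engine's move generator and scoring function.
    deadline = time.monotonic() + time_budget_s
    best_so_far, depth = None, 1
    while best_so_far is None or time.monotonic() < deadline:
        scored = [(evaluate(position, move, depth), move)
                  for move in legal_moves(position)]
        best_so_far = max(scored, key=lambda pair: pair[0])[1]
        depth += 1
    return best_so_far

# A split-second budget cuts off the look-ahead at a shallow depth;
# a five-minute budget lets the search dig far deeper before answering.
```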
Some would say this is adding thinking time, reasoning time, test-time compute, and so on. I’d suggest we could also be more plainspoken and note this as extending the run-time, real-time processing, or execution time allotment.
You decide which vernacular is the most parsimonious and, frankly, honest.
Since the AI community seems to have decided that they like the impressive-sounding parlance of thinking time, I am going to proceed to use that catchphrase in this discussion since I will be showing generative AI examples entailing this topic. Please know that I use that phrase with great reluctance. In your mind, construe the thinking time as simply the processing time, thanks.
I have identified that there are three eras associated with the amount of thinking time for generative AI:

(1) First era: the AI maker sets the thinking time, and the user gets little or no say.

(2) Second era: the user selects the thinking time, such as via a low/medium/high picklist or a sliding scale.

(3) Third era: the thinking time is collaboratively ascertained between the user and the AI, with the user retaining the final say.
Right now, we are primarily in the first era.
It goes like this. You ask generative AI a question, and the AI runs for some amount of thinking time to figure out your answer. The AI maker has decided how much thinking time you will be allowed. To some extent, you can pay a fee to get extended thinking time; otherwise, the amount is set by the AI maker.
This has led us to the second era.
After realizing that users might want to set how much thinking time is to be consumed, AI makers are now variously implementing the option for users to choose the amount of thinking time. For example, you might choose between Low, Medium, and High. That's the ChatGPT approach for the moment. Another angle is a sliding scale, which is what Anthropic is moving toward.
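Under the hood, a second-era choice typically rides along with the prompt as a request parameter. Here's a rough sketch of that plumbing; the client and the parameter names (effort, budget_scale) are hypothetical stand-ins, since the exact knobs and ranges vary by AI maker.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ThinkingConfig:
    effort: str = "medium"              # picklist style: "low" | "medium" | "high"
    budget_scale: Optional[int] = None  # sliding-scale style: e.g., 1 through 10

def ask(client, prompt: str, thinking: ThinkingConfig):
    # `client.complete` is a hypothetical call; the point is that the
    # user-chosen thinking time rides along with every prompt.
    return client.complete(
        prompt=prompt,
        thinking={"effort": thinking.effort, "budget_scale": thinking.budget_scale},
    )

# An easy factual question merits a low setting:
# answer = ask(client, "What is the capital of France?", ThinkingConfig(effort="low"))
```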
You will see in a moment that this ability to choose the desired thinking time isn’t all that it’s cracked up to be. Hang in there.
I predict we are heading at a rapid pace toward a third era.
In the third era, the amount of thinking time will be collaboratively ascertained on a human-AI basis. In short, the generative AI will discuss the thinking-time aspects with the user, and the user then gets the final say on the amount of thinking time to be utilized for a given prompt or set as a default.
My logic for saying that AI makers are inevitably and soon going to shift into my defined third era is that users of generative AI are going to express their dissatisfaction with the second-era approach. Few AI makers will admit that there is such discontent; they will merely proclaim that they have enhanced how thinking time is set. Sure, whatever gets us to improvements for those using generative AI. Spin away.
The first era is what you have already experienced when using most of the conventional generative AI. No need to dwell on that era. Let’s jump into the slowly emerging second era and see some examples.
The ability to pick from low, medium, or high is straightforward and provides a useful illustration of the second-era approach. There is either a picklist presented to you for each prompt, or the AI directly asks you which you prefer.
Here is an example of generative AI asking for your preference.

Generative AI response: "Before I proceed with your prompt, how much thinking time would you like me to use: Low, Medium, or High?"

My entered reply: "Go with low."
Observe that I told generative AI that I wanted to go with low for the prompt I am entering.
My basis for choosing low is that the question I am going to ask is pretty easy. Since I am simply asking for the capital of France, I expect that the thinking time should be quite minimal. No sense in telling the AI to go with high when I can guess that the needed thinking time isn't going to be notable.
This brings up an important point.
Some generative AI apps will do whatever you say, such that if you say high, the AI will potentially whirl and calculate far beyond the true amount of time needed. You have essentially told the AI that this is okay with you. The problem is that you might end up paying for extra cycles of processing that you really didn’t need to have expended. It’s kind of a rip-off.
The retort by the AI maker is that if the user has said they want a high amount of thinking time, by gosh, that’s on their shoulders. It is up to the user to realize what they are doing. Plus, the added thinking time can be construed as a handy double-check. You are getting the AI to do a lengthier exploration and as a result, you can be happier with the answer given.
Mull that over.
Though there is a bit of a splash made about using a sliding scale instead of the low, medium, and high, the rub is still about the same.
Take a look at this example.

My entered prompt: "On a scale of 1 to 10, use a thinking time of 7 for the question I am about to ask."
You can certainly tout the extravagance of being able to use a sliding scale, which might be an actual bar on the screen with a slider or could be, as I've shown above, the entry of a number. In reality, with a scale of 1 to 10, you could reasonably assume that 1 is low, 5 is medium, and 10 is high. Your ability to choose those numbers or something in between might be nice, but it doesn't get us to the moon, if you know what I mean.
The essence is that the selection of the thinking time is still on the shoulders of the user.
I will now shift into the approaching third era.
Let’s continue my example using the sliding scale. Suppose that the AI consumed the 7 as the amount of thinking time, but the answer wasn’t quite as good as it could have been if I had given a higher number. Please know that in the first and second eras, you would not be informed that your choice of time was a tad low. Tough luck to you.
In the third era, something like this would happen.

Generative AI response: "I used the thinking time of 7 that you specified. Heads-up: I detected that a deeper exploration was feasible, and a setting of 9 would likely have produced a stronger answer."
You can see that the AI not only proceeded as I originally instructed, but it also detected that there was more that could have been done to give a stronger answer. The AI kindly informed me accordingly.
I then opted to do a rerun with a 9 as the amount of thinking time.
This showcases the third era as consisting of human-AI collaboration in establishing the thinking time. Rather than the first era where the AI makes the choice, and the second era where the user makes the choice (though somewhat blindly), the third era entails the AI and the user working hand-in-hand to figure out the thinking time. Nice.
You might have had some heartburn that the AI informed me after the fact that my 7 was less than what might have been a better choice. I reran the prompt with a 9, but I had already incurred the cost and delay associated with my prompt that said to use 7. You might say that I am doubling my cost and that this seems unfair.
I agree.
The third era will introduce the midstream capability of adjusting thinking time. So, for this next example, envision that my initial prompt of 7 was accepted and the AI got underway.
Here's what might have happened.

Generative AI response (midstream): "I am partway through processing at your specified thinking time of 7. I believe a stronger answer is within reach if you allow me to go to 9. Shall I proceed at 9?"

My entered reply: "Yes, go ahead."
The beauty is that I don’t incur a complete rerun. Midstream of processing, the AI came back to me and asked if I was willing to up the ante to 9. I said yes.
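As a rough sketch of the plumbing for such a midstream check-in, consider the loop below. The session object, event kinds, and method names are all hypothetical; the point is the protocol of pause, ask, resume, with no full rerun.

```python
def run_with_midstream_checkin(session, prompt, budget):
    # All names here are hypothetical. The AI starts under the user's
    # chosen budget; if it judges partway through that a higher budget
    # would yield a stronger answer, it pauses and asks for approval.
    job = session.start(prompt, thinking_budget=budget)
    while not job.done:
        event = job.next_event()
        if event.kind == "budget_checkin":
            reply = input(f"{event.message} Raise to {event.suggested}? (y/n) ")
            if reply.strip().lower() == "y":
                job.raise_budget(event.suggested)  # continues in place, no rerun
            else:
                job.continue_as_is()
    return job.answer
```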
Cynical readers might right away be bellowing that this is going to incentivize the AI makers to convince users to increase their thinking times, perhaps just to make a buck. I get that. There is little doubt that the AI could be tilted to ask the user for more thinking time even when the request is misleading or an outright lie. Ka-ching goes the cash register for the AI maker. It will be hard for an average user to discern whether they are being honestly told to increase their time or are being tricked into doing so.
The saving grace, perhaps, would be that AI makers doing this tomfoolery are taking a huge reputational risk if it is discovered they are purposely gaming users. Possibly lawsuits and maybe criminal action could be in their future for such deceptions of users (for more on the evolving realm of AI and the law, see my analysis at the link here).
We’ll have to wait and see how this pans out.
Part of the issue with my having stipulated the 7 as my desired thinking time for my prompt was that I had to take a wild guess about the matter.
Consider things this way.
You go to a car mechanic to fix your car. Suppose the car mechanic asks you how much you are willing to spend to fix the car. That seems zany. The car mechanic ought to give you an estimate. Few people would magically know how much they think the car fix is going to cost. It doesn’t make much sense to do things that way.
The same will hold true in the third era of generative AI thinking time.
Here is an example.

My entered prompt: "Before you work on my question, give me an estimate of the thinking time that will be needed."

Generative AI response: "My preliminary estimate is that a low amount of thinking time will suffice for your question. Shall I proceed on that basis?"

My entered reply: "Yes, approved."
This makes a lot more sense. I was able to provide my prompt and get a preliminary estimate. I approved the estimate.
Once the AI gets underway, if it determines that the estimate was not sufficient, it will come back to me midstream and let me know. I could then adjust if desired.
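In code terms, that flow amounts to an estimate-then-approve handshake. The sketch below uses hypothetical calls (estimate_thinking, run) as placeholders for whatever a given AI maker ships.

```python
def ask_with_estimate(session, prompt):
    # Hypothetical API: the AI quotes a thinking-time estimate up front,
    # the way a mechanic quotes a repair, rather than making the user guess.
    estimate = session.estimate_thinking(prompt)
    print(f"Estimated thinking time: {estimate.budget} on a 1-10 scale, "
          f"approximate cost: {estimate.cost}")
    if input("Approve this estimate? (y/n) ").strip().lower() != "y":
        return None
    # If the estimate proves insufficient, the midstream check-in applies.
    return session.run(prompt, thinking_budget=estimate.budget)
```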
A user who frequently utilizes generative AI might get tired of continually dealing with estimates and approvals for thinking time. It could be exasperating and irksome.
In the third era, the AI will keep track of how things are going and make recommendations to the user.
Consider this example.

Generative AI response: "Looking across your recent prompts, nearly all of them needed only a low amount of thinking time. I recommend we set Low as your default. I will alert you whenever a prompt seems to warrant more."
The AI has handily determined that my best bet is to generally be at Low. This isn’t rigid. The AI will adjust, and I can adjust.
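One plausible way the AI could arrive at such a recommendation is by tracking which thinking-time budgets sufficed for past prompts and defaulting to the observed sweet spot. Here's a small self-contained sketch of that bookkeeping; the categories and numbers are illustrative.

```python
from collections import defaultdict
from statistics import median

class BudgetAdvisor:
    # Tracks which thinking-time budgets sufficed for past prompts, per
    # rough category, and recommends a default for new prompts.
    def __init__(self):
        self.history = defaultdict(list)

    def record(self, category: str, budget_that_sufficed: int) -> None:
        self.history[category].append(budget_that_sufficed)

    def recommend(self, category: str, fallback: int = 5) -> int:
        past = self.history.get(category)
        return round(median(past)) if past else fallback

advisor = BudgetAdvisor()
advisor.record("simple-factual", 2)
advisor.record("simple-factual", 1)
advisor.record("simple-factual", 2)
print(advisor.recommend("simple-factual"))  # -> 2, i.e., keep the default low
```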
If you haven’t been dealing with thinking time when using generative AI, you now know what’s coming up. I trust that you are prepared for the changes afoot.
My expectation is that we will advance quickly to the third era. No sense in making life harder for users by getting mired in the first era or the second era. It’s time to move on.
A final comment for now.
Henry Ford famously said this: “Coming together is a beginning; keeping together is progress; working together is success.”
The same applies to working with generative AI. Human-AI collaboration is the best path toward success. Humans will be happy, and I suppose the AI will be "happy," though let's not hand out that emotion to non-sentient AI. We must keep our heads and minds clear on what contemporary AI can and cannot do.
That’s a good use of our thinking time.