Do anything: How ChatGPT turned generative AI into an "anything tool"

Until recently, AI models were specialized tools. Modern LLMs are different.

Haomiao Huang - Aug 23, 2023 11:30 am UTC

Image credit: Aurich Lawson | Getty Images

The chief technology officer of a robotics startup told me earlier this year, "We thought we'd have to do a lot of work to build 'ChatGPT for robotics.' Instead, it turns out that, in a lot of cases, ChatGPT is ChatGPT for robotics."

Until recently, AI models were specialized tools. Using AI in a particular area, like robotics, meant spending time and money creating AI models specifically and only for that area. For example, Google's AlphaFold, an AI model for predicting protein folding, was trained using protein structure data and is only useful for working with protein structures.

So this founder thought that to benefit from generative AI, the robotics company would need to create its own specialized generative AI models for robotics. Instead, the team discovered that for many cases, they could use off-the-shelf ChatGPT for controlling their robots without the AI having ever been specifically trained for it.

I've heard similar things from technologists working on everything from health insurance to semiconductor design. To create ChatGPT, a chatbot that lets humans use generative AI by simply having a conversation, OpenAI needed to change large language models (LLMs) like GPT-3 to become more responsive to human interaction.

But perhaps inadvertently, these same changes let the successors to GPT-3, like GPT-3.5 and GPT-4, be used as powerful, general-purpose information-processing tools: tools that aren't dependent on the knowledge the AI model was originally trained on or the applications the model was trained for. This requires using the AI models in a completely different way: programming instead of chatting, new data instead of training. But it's opening the way for AI to become general purpose rather than specialized, more of an "anything tool."

Now, an important caveat in a time of AI hype: When I say "general purpose" and "anything tool," I mean in the way CPUs are general purpose vs., say, specialized signal-processing chips. They are tools that can be used for a wide variety of tasks, not all-powerful and all-knowing. And just like good programmers don't deploy code to production without code review and unit tests, AI output will need its own processes and procedures. The applications I talk about below are tools for multiplying human productivity, not autonomous agents running amok. But the important thing is to recognize what AI can usefully do.

So to that end, how did we get here?

Fundamentals: Probability, gradient descent, and fine-tuning

Let's take a moment to touch on how the LLMs that power generative AI work and how they're trained.

LLMs like GPT-4 are probabilistic; they take an input and predict the probability of words and phrases relating to that input. They then generate an output that is most likely to be appropriate given the input. It's like a very sophisticated auto-complete: Take some text, and give me what comes next. Fundamentally, it means that generative AI doesn't live in a context of "right and wrong" but rather "more and less likely."
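To make the auto-complete idea concrete, here is a minimal sketch in Python. The probability table is invented for illustration (a real LLM computes these values with a neural network over a very large vocabulary), and the names `next_word_probs`, `most_likely`, and `sample` are hypothetical.

```python
import random

# Hypothetical probabilities for the word that follows "The cat sat on the".
# These numbers are made up for illustration, not taken from any real model.
next_word_probs = {"mat": 0.6, "sofa": 0.25, "roof": 0.1, "moon": 0.05}

def most_likely(probs):
    """Return the single most probable next word."""
    return max(probs, key=probs.get)

def sample(probs, rng):
    """Draw one next word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
print(most_likely(next_word_probs))
print([sample(next_word_probs, rng) for _ in range(5)])
```

Nothing in the table is marked right or wrong. "moon" is simply much less likely than "mat," and sampling will still pick it occasionally.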

Being probabilistic has strengths and weaknesses. The weaknesses are well-known: Generative AI can be unpredictable and inexact, prone to not just producing bad output but producing it in ways you'd never expect. But it also means the AI can be unpredictably powerful and flexible in ways that traditional, rule-based systems can't be. We just need to shape that randomness in a useful way.

Here's an analogy. Before quantum mechanics, physicists thought the universe worked in predictable, deterministic ways. The randomness of the quantum world came as a shock at first, but we learned to embrace quantum weirdness and then use it practically. Quantum tunneling is fundamentally stochastic, but it can be guided so that particles jump in predictable patterns. This is what led to semiconductors and the chips powering the device you're reading this article on. Don't just accept that God plays dice with the universe; learn how to load the dice.

The same thing applies to AI. We train the neural networks that LLMs are made of using a technique called gradient descent. Gradient descent looks at the outputs a model is producing, compares that with training data, and then calculates a direction to adjust the neural network's parameters so that the outputs become more "correct," that is, so that they look more like the training data the AI is given. In the case of our magic auto-complete, a more correct answer means output text that is more likely to follow the input.
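The loop described above can be sketched with a toy model that has a single parameter. This is an illustration under simplifying assumptions, not how any production LLM is trained: the model is just `y = w * x`, the "training data" is three made-up points, and the function names and learning rate are hypothetical choices.

```python
# Toy training data generated by the rule y = 3 * x (an assumption for the demo).
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

def loss(w):
    """Mean squared difference between the model's outputs and the data."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def gradient(w):
    """Derivative of the loss with respect to w: which way is 'uphill'."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def train(w=0.0, learning_rate=0.05, steps=100):
    """Repeatedly nudge w a small step in the direction that lowers the loss."""
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w

w = train()
print(w, loss(w))
```

Starting from `w = 0.0`, each step shrinks the gap between the model's outputs and the data, and `w` settles close to 3. A real neural network does the same thing with billions of parameters at once.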

Probabilistic math is a great way for computers to deal with words; computing how likely some words are to follow other words is just counting, and "how many" is a lot easier for a computer to work with than "more right" or "more wrong." Produce output, compare with the training data, and adjust. Rinse and repeat, making many small, incremental improvements, and eventually you'll turn a neural network that spits out gibberish into something that produces coherent sentences. And this technique can also be adapted to pictures, DNA sequences, and more.
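As a rough illustration of "just counting," the sketch below tallies which word follows which in a tiny made-up sentence and turns the tallies into probabilities. This is a simple count table rather than a neural network, so it is a deliberate simplification of what an LLM does; the corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict

def bigram_counts(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def next_word_probabilities(counts, word):
    """Turn raw follow-counts for `word` into probabilities."""
    followers = counts[word]
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()}

corpus = "the cat sat on the mat and the cat slept"  # made-up example text
counts = bigram_counts(corpus)
probs = next_word_probabilities(counts, "the")
print(probs)
```

In this corpus, "the" is followed by "cat" twice and "mat" once, so "cat" gets a probability of two-thirds and "mat" one-third.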
