The remarkable results achieved by transformer-based models such as GPT-2 and GPT-3 have drawn the research community toward large language models (LLMs), and ChatGPT’s recent success and popularity have only increased that interest. In-context learning and chain-of-thought prompting are two further discoveries that have significantly improved the accuracy of these models. These techniques go beyond simple question answering, where an input prompt containing a question is used to output a reasonable answer.
Although these prompting tactics have been effective in enhancing performance, current transformer-based LLMs can only condition on an input string of bounded length, which limits the computations they can represent. Put another way, any deterministic language model that conditions on strings of bounded length is computationally limited, because such a model is equivalent to a finite automaton. To counter this, researchers have looked into adding an external feedback loop to LLMs, where the model outputs are supplied as inputs after some post-processing. However, the question of whether this method substantially broadens the set of computations a model can perform is still open.
Researchers from Google Brain and the University of Alberta collaborated to address this question. They added an external read-write memory to an LLM to verify that it could emulate any algorithm on any input. Their research is summarised in the paper “Memory Augmented Large Language Models are Computationally Universal,” which shows that an LLM enhanced with an associative read-write memory is computationally universal.
Flan-U-PaLM 540B was the LLM of choice for the researchers. The underlying idea is to use a simple stored instruction computer to link the LLM and the associative memory. This lets the model’s outputs and the input prompts forwarded to it interact in a loop. The external associative memory can be thought of as a dictionary whose keys are variable names or address locations and whose values are the stored contents. Each parsing step between the language model and the memory is performed with regular expression matches.
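The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's construction: `toy_model` is a hypothetical stand-in for the language model, and the `@name` substitution syntax, the `name=value` assignment format, and the `op` and `template` memory keys are all invented for this example.

```python
import re

# Hypothetical stand-in for the language model: a fixed function from a
# prompt string to an output string. A real system would call an LLM here.
def toy_model(prompt: str) -> str:
    n = int(re.search(r"count=(\d+)", prompt).group(1))
    if n >= 3:
        return 'op="halt"'
    return f'count={n + 1}\nop="step"'

# Assumed output format: one `name=value` assignment per line, where the
# value is either a quoted string or an unsigned integer.
ASSIGN = re.compile(r'^(\w+)=(?:"([^"]*)"|(\d+))$', re.MULTILINE)

def run(model, memory, max_steps=100):
    """Drive the model in a loop, using `memory` (a dict) as the
    associative read-write store."""
    for _ in range(max_steps):
        if memory.get("op") == "halt":
            break
        # Read step: substitute memory values for @name placeholders.
        prompt = re.sub(r"@(\w+)",
                        lambda m: str(memory[m.group(1)]),
                        memory["template"])
        output = model(prompt)
        # Write step: parse assignments with a regex and store them.
        for name, text, digits in ASSIGN.findall(output):
            memory[name] = int(digits) if digits else text
    return memory

final = run(toy_model, {"op": "step", "count": 0, "template": "count=@count"})
print(final["count"], final["op"])
```

The model itself never changes between iterations; all state lives in the dictionary, which is what lets a fixed model take part in an open-ended computation.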
Once the stored instruction computer is in place, a specific “prompt program” is developed to direct the system to simulate the execution of a universal Turing machine. Demonstrating the simulation’s reliability then comes down to examining a finite number of prompt-result patterns and confirming that the language model generates the appropriate output for each of the finitely many possible input strings. One of the work’s primary strengths is that it does not involve any extra “training” of the language model or alteration of its pre-trained weights. Instead, the construction depends exclusively on creating a type of stored instruction computer that can then be programmed with certain prompts.
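Because the set of possible prompts is finite, the verification step amounts to an exhaustive check. The sketch below shows the idea for a toy two-state machine; the transition table, the prompt and result formats, and `stub_model` are all hypothetical and are not taken from the paper.

```python
from itertools import product

STATES = ["A", "B"]
SYMBOLS = ["0", "1"]

# Hypothetical transition table for a toy machine:
# (state, symbol) -> (symbol to write, head move, next state)
EXPECTED = {
    ("A", "0"): ("1", "+1", "B"),
    ("A", "1"): ("1", "-1", "B"),
    ("B", "0"): ("1", "-1", "A"),
    ("B", "1"): ("1", "+1", "halt"),
}

def make_prompt(state, symbol):
    # Assumed prompt format for this example.
    return f"state={state} symbol={symbol}"

def format_result(write, move, next_state):
    # Assumed result format for this example.
    return f"write={write} move={move} state={next_state}"

def stub_model(prompt):
    # Stand-in for the language model; a real check would query the LLM.
    fields = dict(part.split("=") for part in prompt.split())
    return format_result(*EXPECTED[(fields["state"], fields["symbol"])])

def verify(model):
    """Query the model on every possible prompt and collect mismatches."""
    failures = []
    for state, symbol in product(STATES, SYMBOLS):
        prompt = make_prompt(state, symbol)
        want = format_result(*EXPECTED[(state, symbol)])
        got = model(prompt)
        if got != want:
            failures.append((prompt, want, got))
    return failures

print(verify(stub_model))
```

An empty failure list means the model reproduced every transition. With a real LLM in place of `stub_model`, the same loop would confirm or refute each prompt-result pattern one at a time.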
This study differs from earlier research on the computational universality of models. The main contrast is that the researchers showed how external memory augmentation can elicit universal computational behavior from a fixed language model with fixed pre-trained weights. The findings demonstrate that large language models, as they currently exist, are already computationally universal as long as they have access to unbounded external memory.
Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Goa. She is passionate about the fields of Machine Learning, Natural Language Processing and Web Development. She enjoys learning more about the technical field by participating in several challenges.