As rumors and reports swirl about the difficulty top AI companies face in developing newer, more powerful large language models (LLMs), the spotlight is increasingly shifting toward alternatives to the “Transformer,” the architecture underpinning most of the current generative AI boom, introduced by Google researchers in the seminal 2017 paper “Attention Is All You Need.”
As described in that paper and in subsequent work, a transformer is a deep learning neural network architecture that processes sequential data, such as text or time-series information.
Now, MIT-birthed startup Liquid AI has introduced STAR (Synthesis of Tailored Architectures), an innovative framework designed to automate the generation and optimization of AI model architectures.
The STAR framework leverages evolutionary algorithms and a numerical encoding system to address the complex challenge of balancing quality and efficiency in deep learning models.
According to Liquid AI’s research team, which includes Armin W. Thomas, Rom Parnichkun, Alexander Amini, Stefano Massaroli, and Michael Poli, STAR’s approach represents a shift from traditional architecture design methods.
Instead of relying on manual tuning or predefined templates, STAR uses a hierarchical encoding technique—referred to as “STAR genomes”—to explore a vast design space of potential architectures.
These genomes enable iterative optimization processes such as recombination and mutation, allowing STAR to synthesize and refine architectures tailored to specific metrics and hardware requirements.
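To make the idea of evolving a numerically encoded architecture more concrete, here is a minimal Python sketch of recombination and mutation over a flat list of integers. It is a toy under stated assumptions, not Liquid AI's implementation: the operator vocabulary, the genome layout, and `toy_fitness` are all hypothetical stand-ins, and a real system would score each candidate by training and evaluating a model.

```python
import random

# Hypothetical operator vocabulary; each gene is an index into this list.
OPERATORS = ["attention", "recurrence", "convolution"]


def random_genome(rng, length=8):
    """Create a random genome: one operator index per layer."""
    return [rng.randrange(len(OPERATORS)) for _ in range(length)]


def mutate(genome, rng, rate=0.2):
    """Resample each gene independently with probability `rate`."""
    return [
        rng.randrange(len(OPERATORS)) if rng.random() < rate else gene
        for gene in genome
    ]


def recombine(parent_a, parent_b, rng):
    """Single-point crossover: prefix from one parent, suffix from the other."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def toy_fitness(genome):
    """Placeholder score (assumption): reward variety between neighbouring
    layers and penalise attention layers as a stand-in for cache cost."""
    variety = sum(1 for left, right in zip(genome, genome[1:]) if left != right)
    attention_count = genome.count(0)
    return variety - 0.5 * attention_count


def evolve(generations=20, population_size=16, seed=0):
    """Keep the best half each generation and refill with mutated offspring."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=toy_fitness, reverse=True)
        history.append(toy_fitness(population[0]))
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            parent_a, parent_b = rng.sample(parents, 2)
            children.append(mutate(recombine(parent_a, parent_b, rng), rng))
        population = parents + children
    population.sort(key=toy_fitness, reverse=True)
    history.append(toy_fitness(population[0]))
    return population[0], history


best_genome, best_scores = evolve()
print([OPERATORS[gene] for gene in best_genome])
```

Because the surviving parents are carried over unchanged, the best score in this sketch can never get worse from one generation to the next.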
90% cache size reduction versus traditional ML Transformers
Liquid AI’s initial focus for STAR has been on autoregressive language modeling, an area where traditional Transformer architectures have long been dominant.
In tests conducted during their research, the Liquid AI team demonstrated STAR’s ability to generate architectures that consistently outperformed highly optimized Transformer++ and hybrid models.
For example, when optimizing for quality and cache size, STAR-evolved architectures achieved cache size reductions of up to 37% compared to hybrid models and 90% compared to Transformers. Despite these efficiency improvements, the STAR-generated models maintained or exceeded the predictive performance of their counterparts.
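Some back-of-the-envelope arithmetic shows why swapping out attention layers shrinks the inference cache so sharply. The sketch below uses the standard key-value cache formula for attention layers. The layer counts and dimensions are illustrative assumptions, not figures from Liquid AI's paper, and the small fixed-size state that recurrent or convolutional layers keep is ignored.

```python
def kv_cache_bytes(n_attention_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Size of the key-value cache for standard attention layers.

    The factor of 2 accounts for storing both keys and values.
    bytes_per_value=2 assumes 16-bit floats.
    """
    return (2 * n_attention_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical 24-layer model where every layer uses attention.
full_attention = kv_cache_bytes(n_attention_layers=24, n_kv_heads=16,
                                head_dim=64, seq_len=4096)

# Same hypothetical model, but only 3 of the 24 layers keep attention.
mostly_cache_free = kv_cache_bytes(n_attention_layers=3, n_kv_heads=16,
                                   head_dim=64, seq_len=4096)

reduction = 1 - mostly_cache_free / full_attention
print(f"{full_attention / 2**20:.0f} MiB -> {mostly_cache_free / 2**20:.0f} MiB "
      f"({reduction:.1%} smaller)")
```

In this made-up configuration the cache scales linearly with the number of attention layers, so keeping 3 of 24 cuts it by 87.5%.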
Similarly, when tasked with optimizing for model quality and size, STAR reduced parameter counts by up to 13% while still improving performance on standard benchmarks.
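Optimizing for two goals at once, such as quality and size, means there is rarely a single winner. One common way to compare candidates is Pareto dominance: a design is kept only if no other design is at least as good on both measures and strictly better on one. The sketch below illustrates that idea. It is not taken from Liquid AI's code, and the candidate names and numbers are invented for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    perplexity: float  # lower is better (quality proxy)
    cache_mb: float    # lower is better (efficiency proxy)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and better on one."""
    no_worse = a.perplexity <= b.perplexity and a.cache_mb <= b.cache_mb
    strictly_better = a.perplexity < b.perplexity or a.cache_mb < b.cache_mb
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Invented numbers, purely for illustration.
candidates = [
    Candidate("transformer", 10.0, 400.0),
    Candidate("hybrid", 10.0, 60.0),
    Candidate("evolved_a", 9.8, 40.0),
    Candidate("evolved_b", 9.7, 55.0),
    Candidate("evolved_c", 9.9, 50.0),
]

front = pareto_front(candidates)
print([c.name for c in front])
```

Here `evolved_a` and `evolved_b` both survive because each beats the other on one objective, which is the kind of trade-off a multi-objective search leaves for the practitioner to choose between.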
The research also highlighted STAR’s ability to scale its designs. A STAR-evolved model scaled from 125 million to 1 billion parameters delivered comparable or superior results to existing Transformer++ and hybrid models, all while significantly reducing inference cache requirements.
Re-architecting AI model architecture
Liquid AI stated that STAR is rooted in a design theory that incorporates principles from dynamical systems, signal processing, and numerical linear algebra.
This foundational approach has enabled the team to develop a versatile search space for computational units, encompassing components such as attention mechanisms, recurrences, and convolutions.
One of STAR’s distinguishing features is its modularity, allowing the framework to encode and optimize architectures across multiple hierarchical levels. This capability provides insights into recurring design motifs and enables researchers to identify effective combinations of architectural components.
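As a rough illustration of how a numeric encoding can be read at more than one level, the Python sketch below decodes a flat list of integers into per-block settings and then counts which operator pairs recur next to each other. The three-gene layout (`operator`, `width_multiplier`, `shares_weights_with`) is a hypothetical simplification chosen for this example, not the actual STAR genome format.

```python
from collections import Counter

# Hypothetical mapping from gene value to operator class.
OPERATOR_CLASSES = {0: "attention", 1: "recurrence", 2: "convolution"}


def decode(genome, genes_per_block=3):
    """Split a flat genome into blocks of (operator, width, sharing) genes."""
    if len(genome) % genes_per_block != 0:
        raise ValueError("genome length must be a multiple of genes_per_block")
    blocks = []
    for start in range(0, len(genome), genes_per_block):
        operator_id, width_multiplier, shares_with = genome[start:start + genes_per_block]
        blocks.append({
            "operator": OPERATOR_CLASSES[operator_id],
            "width_multiplier": width_multiplier,
            # A negative value means the block does not share weights (assumption).
            "shares_weights_with": shares_with if shares_with >= 0 else None,
        })
    return blocks


def adjacent_motifs(blocks):
    """Count how often each ordered pair of neighbouring operators occurs."""
    operators = [block["operator"] for block in blocks]
    return Counter(zip(operators, operators[1:]))


# Four blocks: recurrence, convolution, recurrence, convolution.
example_genome = [1, 2, -1,  2, 1, -1,  1, 2, 0,  2, 1, 1]
example_blocks = decode(example_genome)
print(adjacent_motifs(example_blocks).most_common(1))
```

Counting motifs over many high-scoring genomes, rather than the single one shown here, is one simple way such an analysis could surface combinations that keep reappearing.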
What’s next for STAR?
STAR’s ability to synthesize efficient, high-performing architectures has potential applications far beyond language modeling. Liquid AI envisions this framework being used to tackle challenges in various domains where the trade-off between quality and computational efficiency is critical.
While Liquid AI has yet to disclose specific plans for commercial deployment or pricing, the research findings signal a significant advancement in the field of automated architecture design. For researchers and developers looking to optimize AI systems, STAR could represent a powerful tool for pushing the boundaries of model performance and efficiency.
With its open research approach, Liquid AI has published the full details of STAR in a peer-reviewed paper, encouraging collaboration and further innovation. As the AI landscape continues to evolve, frameworks like STAR are poised to play a key role in shaping the next generation of intelligent systems. STAR might even herald the birth of a new post-Transformer architecture boom — a welcome winter holiday gift for the machine learning and AI research community.