As training ever-larger transformer-based models encounters diminishing returns, a novel blockchain protocol could advance AI by shifting the emphasis to optimizing neural network architectures, harnessing the decentralized computational power of blockchain technology. The protocol would replace the arbitrary hashing puzzles of proof-of-work with a new objective: improving the benchmark scores of AI models on standardized datasets, with interfaces such as the Open Neural Network Exchange (ONNX) used to define the architectures. The economic potential of blockchain could draw a diverse range of players into the field, sparking a competitive drive to develop more efficient and effective neural networks, potentially giving blockchain a purpose beyond digital currency while democratizing the field of AI.
The remarkable progress in large language models (LLMs) can be traced to two key developments: the discovery of the right neural network architecture, based on transformers and attention mechanisms as outlined in the groundbreaking paper "Attention is All You Need" (2017)[^1^], and the subsequent scaling of that architecture. Training ever-larger models on ever-expanding datasets, with the network architecture itself kept largely fixed, has produced significant advances.
A recent Nature feature, “In AI, is bigger always better?”[^2^], surveyed the scaling properties of LLMs. The breakthrough came once model size reached a few billion parameters, and the race for ever-bigger models is ongoing. However, this approach faces constraints, as Sam Altman explained in a recent interview: “I think we’re at the end of the era where it’s gonna be these giant models, and we’ll make them better in other ways.”[^3^]
A significant constraint is memory, as all parameters must be stored in memory. While hardware will undoubtedly improve following Moore's Law, the largest models will remain accessible primarily to major corporations, creating a bottleneck. Furthermore, we are approaching the limits of available datasets as the internet is exhaustively mined for data. Although incorporating video data could expand data quantity, its semantic value is in doubt. There are indications that training larger transformer-based models on fixed datasets is already experiencing diminishing returns.
The human brain demonstrates that limited input from the sensory organs can be coupled with a larger, more complex neuronal architecture to achieve advanced cognition. The brain has about 86 billion neurons and 100 trillion synapses, making it roughly two orders of magnitude larger than GPT-4. The amount of written data a high-achieving human can consume over a lifetime is about 4 billion tokens (assuming a 600-words-per-minute reading speed and 4 hours of reading a day); the amount of visual information taken in over a lifespan is on the order of 1 trillion tokens, only a fraction of which is probably of any use. The brain therefore seems to fall into the paradigm of a larger, more complicated model trained on a smaller dataset. This suggests that, after size, the next low-hanging fruit for enhancing LLMs is optimizing the network itself.
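As a quick back-of-the-envelope check of these figures, the short Python sketch below reproduces the lifetime-reading estimate; the reading lifespan and tokens-per-word ratio are additional assumptions not stated above.

```python
# Back-of-the-envelope estimate of lifetime reading input, using the
# assumptions above plus an assumed reading lifespan and tokenization ratio.
WORDS_PER_MINUTE = 600
HOURS_PER_DAY = 4
READING_YEARS = 60          # assumed reading lifespan (not stated in the text)
TOKENS_PER_WORD = 1.3       # rough GPT-style tokens-per-word ratio (assumption)

words_per_day = WORDS_PER_MINUTE * 60 * HOURS_PER_DAY
lifetime_words = words_per_day * 365 * READING_YEARS
lifetime_tokens = lifetime_words * TOKENS_PER_WORD

print(f"lifetime reading input: {lifetime_tokens:.1e} tokens")  # ~4e9, matching the text
```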
Some promising research in optimizing network architecture includes Neural Architecture Search (NAS) and AutoML, which employ machine learning algorithms to discover optimal network structures. However, they have yet to produce breakthroughs on the scale of transformers. Notably, the human brain resembles recurrent neural networks (RNNs) rather than transformers and is quite efficient to run, consuming under 100 watts of power compared with roughly 40 kW for a small GPT-3 engine in continuous use.
Enter Bitcoin and Blockchain Technology
With a market cap of nearly $500 billion out of a total cryptocurrency market of about $1.2 trillion, Bitcoin still relies on proof-of-work validation. This immense computational power could potentially be harnessed to run algorithms, such as an evolutionary search, that develop improved neural networks.
Blockchain technology has struggled to find a definitive purpose beyond its most successful use case: digital money. Other applications, such as Decentralized Autonomous Organizations (DAOs), have largely failed to take off. Even Bitcoin's role as a form of currency remains uncertain due to its persistent price volatility and lack of widespread adoption.
A novel crypto protocol, building on the cryptographic core of Bitcoin's blockchain, could potentially transform the field of AI by harnessing the power of decentralized computing to develop and optimize neural network architectures. This approach would repurpose the immense computational resources used by blockchain participants, known as miners, and direct them towards advancing large language models (LLMs).
At the heart of this proposal lies the proof-of-work concept, which currently requires miners to solve computationally costly puzzles based on the 256-bit Secure Hash Algorithm (SHA-256). Instead of these arbitrary hashing tasks, the new crypto protocol would focus on improving the benchmark scores of AI models on a medium-size standardized dataset, such as a 10-gigabyte text corpus.
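A minimal sketch of the contrast, assuming a hypothetical `BenchmarkProof` structure: the classic check mirrors Bitcoin's double-SHA-256 target test, while the proposed check accepts a block only if the submitted model beats the current benchmark threshold.

```python
import hashlib
from dataclasses import dataclass

def classic_pow_valid(block_header: bytes, target: int) -> bool:
    """Bitcoin-style check: the double SHA-256 of the header must fall below a target."""
    digest = hashlib.sha256(hashlib.sha256(block_header).digest()).digest()
    return int.from_bytes(digest, "big") < target

@dataclass
class BenchmarkProof:
    model_hash: str          # hash committing to the submitted model
    benchmark_score: float   # accuracy on the standardized corpus, re-checked by peers

def benchmark_pow_valid(proof: BenchmarkProof, current_threshold: float) -> bool:
    """Proposed check: the submitted model must beat the current benchmark threshold."""
    return proof.benchmark_score > current_threshold
```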
A standardized interface, like the Open Neural Network Exchange (ONNX) protocol[^4^], could be employed to specify neural network architectures. These structures would be easily verifiable and publicly visible, allowing for both financial rewards and bragging rights to motivate miners to invest in the endeavor. Furthermore, those who discover valuable improvements in LLM models might opt to monetize their advancements through copyrighting rather than merely posting them on the blockchain.
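For illustration, a candidate architecture could be committed to the chain as a hash of its serialized ONNX graph; the sketch below uses the `onnx` Python package, with `candidate.onnx` as a placeholder file name.

```python
import hashlib
import onnx

# Commit a candidate architecture to the chain as the SHA-256 of its
# serialized ONNX graph; "candidate.onnx" is a placeholder file name.
model = onnx.load("candidate.onnx")
onnx.checker.check_model(model)      # basic structural validity check

model_bytes = model.SerializeToString()
model_hash = hashlib.sha256(model_bytes).hexdigest()
print(f"on-chain commitment: {model_hash}")
```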
Many technical details of this proposal will need to be worked out to ensure the blockchain is functional, secure, and provides the right incentives. However, widely adopted LLM benchmarks such as GLUE and SuperGLUE[^5^] are a good starting point for designing an automated model-verification system.
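As a rough illustration of how such verification might look, the sketch below scores a stand-in predictor on one GLUE task (SST-2) using the Hugging Face `datasets` package; in the real protocol the predictor would be the submitted ONNX model.

```python
from datasets import load_dataset   # Hugging Face `datasets` package

def predict(sentence: str) -> int:
    """Stand-in for running the submitted ONNX model; always predicts 'positive'."""
    return 1

# Score the stand-in predictor on the SST-2 validation split of GLUE.
val = load_dataset("glue", "sst2", split="validation")
correct = sum(predict(ex["sentence"]) == ex["label"] for ex in val)
print(f"SST-2 accuracy: {correct / len(val):.3f}")
```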
Rather than constructing complex held-out test sets to guard against the overfitting that plagues rapidly growing models, a simpler method would be to benchmark LLMs through in-sample testing, with overfitting kept in check by capping the parameter count at a relatively small size, for instance 10 billion, to limit model complexity.
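Enforcing such a cap is straightforward if models are exchanged as ONNX files; a minimal sketch, with the 10-billion figure taken from the text and `candidate.onnx` again a placeholder:

```python
import math
import onnx

PARAMETER_CAP = 10_000_000_000   # illustrative 10-billion-parameter limit

def parameter_count(model: onnx.ModelProto) -> int:
    """Count stored weights by summing the element counts of all graph initializers."""
    return sum(math.prod(init.dims) for init in model.graph.initializer)

model = onnx.load("candidate.onnx")   # placeholder file name
assert parameter_count(model) <= PARAMETER_CAP, "model exceeds the parameter cap"
```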
In practice, LLMs would be submitted to the blockchain via a unique hash value, and the submitted models would then be compared on a randomly selected in-sample test. Verification involves running each model on the chosen test set, a relatively inexpensive operation. The winner would be the model with the best combination of accuracy and performance, and it would earn the right to mint a blockchain coin and write the next block in the chain.
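A simplified sketch of this selection step, assuming the random slice is drawn from a seed all validators share (e.g. derived from the previous block hash); the accuracy/speed weighting is purely illustrative.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Submission:
    model_hash: str
    evaluate: Callable[[Sequence], float]   # runs the model on a test slice, returns accuracy

def pick_winner(submissions: List[Submission], dataset: Sequence,
                slice_size: int = 10_000, seed: int = 0) -> Submission:
    """Score every submission on the same randomly drawn in-sample slice and rank
    by a blend of accuracy and runtime. The seed would be shared by all validators,
    e.g. derived from the previous block hash, so everyone draws the same slice."""
    test_slice = random.Random(seed).sample(list(dataset), slice_size)

    def score(sub: Submission) -> float:
        start = time.perf_counter()
        accuracy = sub.evaluate(test_slice)
        runtime = time.perf_counter() - start
        return accuracy - 0.01 * runtime     # illustrative accuracy/speed trade-off

    return max(submissions, key=score)
```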
This benchmarking process can adapt over time. For example, the accuracy threshold can be raised or lowered to influence the overall speed and advancement of the blockchain – an operation mimicking Bitcoin's mining-difficulty adjustment. The simpler the verification process and the closer it stays to existing standard protocols, the more robust, scalable, and ultimately successful the blockchain will become.
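A toy version of such a threshold adjustment, loosely mimicking Bitcoin's difficulty retarget; the target block time, step size, and clamps are all arbitrary assumptions.

```python
TARGET_BLOCK_TIME = 14 * 24 * 3600   # e.g. one improved model every two weeks (assumption)

def adjust_threshold(current_threshold: float, observed_block_time: float) -> float:
    """Raise the accuracy threshold when blocks arrive too fast and relax it when they
    arrive too slowly, loosely mimicking Bitcoin's mining-difficulty retarget."""
    ratio = TARGET_BLOCK_TIME / observed_block_time
    step = 0.001 * (ratio - 1.0)             # illustrative adjustment step
    step = max(-0.005, min(0.005, step))     # clamp the move, as Bitcoin clamps its retarget
    return min(1.0, max(0.0, current_threshold + step))
```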
Although proof-of-work presents an appealing starting point for redirecting the computational power of blockchain towards optimizing neural networks, the proof-of-stake incentive structure of blockchains, such as Ethereum or Solana, could also be adopted. Most of the computation for training neural nets will occur off-chain, while the primary on-chain task will involve validation or benchmarking of models. The incentive for model validation can be structured similarly to proof-of-stake blockchains, with payments deriving from transaction fees and validation rewards. The primary incentive for model builders will remain the minting of cryptocurrency.
By harnessing the economic potential of blockchain technology, a diverse range of players could be attracted to the field, igniting a competition to develop more efficient and effective neural networks. In doing so, blockchain may finally discover a purpose beyond digital currency – propelling innovation and progress in the realm of artificial intelligence and large language models.
Footnotes:
[^1^]: https://arxiv.org/abs/1706.03762
[^2^]: https://www.nature.com/articles/d41586-023-00641-w
[^3^]: https://techcrunch.com/2023/04/14/sam-altman-size-of-llms-wont-matter-as-much-moving-forward/
[^4^]: https://onnx.ai/
[^5^]: https://gluebenchmark.com/ and https://super.gluebenchmark.com/