20 June 2024

Learnings from doing ML on web3 data

From the experiments we ran on web3 data, we concluded that the best way to approach web3 is to think of it as beta-level, next-generation web infrastructure.

We have been running experiments using large language models (LLMs) to model web3 data. In the process, we have discovered a few things about doing machine learning (ML) in the context of web3, and we would like to share them with the community.

Here’s what we’ve learned:

Web3 data is not optimized for batch read

Web3 data is not optimized for batch reads. Instead, the few systems that exist are designed for writes. The core use case for a blockchain is storing data, and to do that you write to the chain. ML use cases, by contrast, require batch reads.
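To make the read problem concrete, here is a minimal sketch of what pulling a range of blocks looks like over Ethereum's JSON-RPC interface. It assumes the node accepts JSON-RPC 2.0 batch requests; the helper name `build_block_batch` and the block numbers are ours, for illustration only. Note that even when batched into one HTTP call, the read is still expressed one block at a time.

```python
import json


def build_block_batch(start_block: int, count: int) -> list[dict]:
    """Build one JSON-RPC 2.0 batch requesting `count` consecutive blocks."""
    return [
        {
            "jsonrpc": "2.0",
            "id": offset,  # lets the caller match each response to its request
            "method": "eth_getBlockByNumber",
            # Block number as a hex quantity; False asks for transaction hashes only.
            "params": [hex(start_block + offset), False],
        }
        for offset in range(count)
    ]


# Hypothetical range, chosen only for illustration.
batch = build_block_batch(start_block=19_000_000, count=3)
payload = json.dumps(batch)  # body of a single HTTP POST to a node's RPC endpoint
```

A training set that spans millions of blocks would need many such payloads, which is why ML work tends to start by copying chain data into a store that is built for scans.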

No standardized data schema for all blockchains

There is no standardized data schema for all blockchains. For example, Ethereum has a different data model than Solana. This means that if you are building a generalized, blockchain-level solution, you will have to customize your code at the data layer for each blockchain that diverges. This isn’t necessarily a dealbreaker, but in software design we prefer solutions that generalize and take minimal effort to accommodate new examples of the same primitive.
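As a rough sketch of what that per-chain customization looks like, the adapters below map an Ethereum transaction and a Solana transaction onto one shared record. The `NormalizedTx` type, the adapter names, and the choice of fields are our own assumptions for illustration. The input field names follow the JSON shapes returned by each chain's RPC API, simplified to the few fields used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedTx:
    """Hypothetical chain-agnostic record (not a standard schema)."""
    chain: str
    tx_id: str
    sender: str
    position: int  # block number on Ethereum, slot on Solana


def from_ethereum(tx: dict) -> NormalizedTx:
    return NormalizedTx(
        chain="ethereum",
        tx_id=tx["hash"],
        sender=tx["from"],
        position=int(tx["blockNumber"], 16),  # hex-encoded quantity
    )


def from_solana(tx: dict) -> NormalizedTx:
    message = tx["transaction"]["message"]
    return NormalizedTx(
        chain="solana",
        tx_id=tx["transaction"]["signatures"][0],
        sender=message["accountKeys"][0],  # first account key is the fee payer
        position=tx["slot"],
    )


# One adapter per chain: every new chain means writing another entry here.
ADAPTERS = {"ethereum": from_ethereum, "solana": from_solana}


def normalize(chain: str, raw_tx: dict) -> NormalizedTx:
    return ADAPTERS[chain](raw_tx)
```

Everything downstream of `normalize` can be shared, but the adapter layer itself grows with every chain whose data model differs.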

Ethereum and Solana are both blockchains, and both are supposed to enable a metaverse: a world where you can take your data with you anywhere you want on the internet. That day has not yet arrived.

Non-human-readable code

Even though web3 is supposed to be open, many smart contracts do not publish their source code. What one can reliably get access to is the bytecode of the smart contract, and bytecode is not human-readable.
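To give a sense of what working from bytecode involves, here is a minimal disassembler sketch for EVM bytecode. It is an illustration, not a complete tool: the opcode table covers only a handful of instructions, and the function name `disassemble` is ours. The one structural rule it relies on is that opcodes 0x60 through 0x7f (PUSH1 to PUSH32) are followed by 1 to 32 bytes of inline data.

```python
# Partial opcode table; anything missing is reported as UNKNOWN.
OPCODES = {
    0x00: "STOP",
    0x01: "ADD",
    0x34: "CALLVALUE",
    0x35: "CALLDATALOAD",
    0x50: "POP",
    0x51: "MLOAD",
    0x52: "MSTORE",
    0x54: "SLOAD",
    0x55: "SSTORE",
    0x56: "JUMP",
    0x57: "JUMPI",
    0x5B: "JUMPDEST",
    0xF3: "RETURN",
    0xFD: "REVERT",
}


def disassemble(bytecode_hex: str) -> list[str]:
    """Turn a hex string of EVM bytecode into a list of opcode mnemonics."""
    code = bytes.fromhex(bytecode_hex.removeprefix("0x"))
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry inline operand bytes
            size = op - 0x5F
            operand = code[i + 1 : i + 1 + size]
            out.append(f"PUSH{size} 0x{operand.hex()}")
            i += 1 + size
        else:
            out.append(OPCODES.get(op, f"UNKNOWN_0x{op:02x}"))
            i += 1
    return out


print(disassemble("0x6080604052"))
```

For the input above, the result is `PUSH1 0x80`, `PUSH1 0x40`, `MSTORE`. That is a long way from source code: variable names, types, and control-flow structure are all gone, and recovering them is the hard part.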

Very nascent developer tooling

The developer tooling landscape for web3 is nascent. For example, if you want to decompile bytecode back to source code, the tools available for the task are not robust. There are more tasks like this that should be fairly easy to do but are not.

From the experiments we ran on web3 data, we concluded that the best way to approach web3 is to think of it as beta-level, next-generation web infrastructure. We are still deep in the building phase when it comes to web3 infrastructure. Unfortunately, we as a web3 community have treated it as if it were ready for production. Quite a few significant things still need to be invented before we unleash it on the consumer web.
