Large model skepticism is not AI pessimism

I've been working on a longer AI-related post for a few months now. I might eventually just release it to get it over with, but I think I'm just too far past the Dunning-Kruger hump to be happy with anything very technical I write about AI at this point. So this is a shorter post about general principles instead.

I would probably say I am mostly an AI optimist. I have obviously seen a lot of very interesting and important projects in academia that leverage machine learning methods, and I even think pretrained large models like GPT can be very useful for natural language applications. But I am still fairly skeptical of a lot of the post-2023 LLM-centered AI hype. Why is that?

To explain, I've always found it useful to look back at the history of cloud adoption as a framework. When people and businesses started integrating computers into their daily lives, they were pushed to develop what might be called "computer literacy" skills. Non-technical people in the early days of the internet eventually had to learn how a filesystem worked, how to configure their operating system, how filesharing worked, or even more technical things like HTML to set up a web presence on Geocities or what have you.[1]

As we transitioned to the more "modern" internet, though, the mass adoption of centralized cloud tools, on top of related phenomena like social media, interrupted that growing curiosity about how computers worked. That was kind of their appeal: you don't have to learn how to install Word or some other word processor anymore, you just go to the Google Docs website. Lowering the barrier to entry meant that stuff like the Google Suite or Facebook could attract more users than Office or WordPress, which in turn meant more profits. If you were a "computer optimist" before, this turn would be discouraging, since it seems to restrict the horizon of possibilities for computer use to only those things these hegemonic companies deem acceptable.

My concerns with current-day AI innovations are similar. LLMs are the centerpiece of basically every big AI player nowadays, and by their very nature they are centralizing: training a large model is completely infeasible for individuals, and even inference for models like ChatGPT is still too resource-intensive to run anywhere but dedicated datacenters. To me, it seems like the horizon of possibilities for AI has shrunk from the late 2010s to now. We've gone from the exciting prospect of general-purpose tools like TensorFlow or PyTorch, which let anyone build models from any kind of data, to a much narrower, LLM-dominated landscape.
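To make that contrast concrete, here's a rough sketch of the kind of thing those general-purpose libraries let a single person do: train a small model on whatever data they happen to have, end to end, on their own machine. The data, architecture, and hyperparameters below are made up for illustration; the point is that the whole loop fits in a few lines and runs on a laptop.

```python
# Sketch: a tiny PyTorch classifier trained on stand-in data.
# Everything here (shapes, layers, hyperparameters) is arbitrary;
# the point is that an individual can train a model locally.
import torch
from torch import nn

# Pretend dataset: 256 examples, 10 features each, binary labels.
X = torch.randn(256, 10)
y = torch.randint(0, 2, (256,)).float()

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(20):
    optimizer.zero_grad()
    logits = model(X).squeeze(1)  # shape (256,)
    loss = loss_fn(logits, y)
    loss.backward()
    optimizer.step()

print(f"final training loss: {loss.item():.3f}")
```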

I particularly worry about this because, going back to the cloud comparison, mass cloud adoption also ended up making it actively costly to go against the grain. Economies of scale mean that sure, you can host your web server locally, but it's going to be pricier, have worse uptime, and be worse in plenty of other ways compared to just using Azure. We're kind of seeing the same thing start to happen with NLP tasks: sure, you can pay an AI engineer a lot of money to build an NLP model for your specific use case, but it's going to be cheaper to just get some web dev to make a ChatGPT API call and consult the millennia of dead labor encoded within.
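To illustrate how lopsided that trade-off feels in practice, here's roughly what the "just call ChatGPT" path looks like, sketched with the official openai Python client. The model name, prompts, and sentiment-classification task are placeholders, and it assumes an OPENAI_API_KEY in the environment; the point is that a handful of pasted lines replaces what used to be a bespoke NLP project.

```python
# Sketch: outsourcing an NLP task to a hosted LLM with a single API call.
# Assumes `pip install openai` and an OPENAI_API_KEY environment variable;
# the model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Classify the sentiment of the user's message as positive or negative."},
        {"role": "user", "content": "The delivery was late and the box was crushed."},
    ],
)

print(response.choices[0].message.content)
```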

This is low effort, so no conclusion.

[1] As a zoomer who is very "computer literate", I do feel like people overromanticize this period and pretend older generations were better with computers than they actually were. But it is a fact that you needed to learn those things if you wanted to do them.