
Why Most Image-Gen Startups Will Die Off

Right after the crypto-winter, a new fad begins

Years ago, when I was working at a gaming MNC, we tried using Generative Adversarial Networks (GANs) to solve an image generation problem in a real business use case: automating idle animations for 2D images that had not already been rigged in SPINE or LIVE2D, with the aim of replacing a legion of 30 artists. Our software worked and solved the problem, but the solution was ultimately abandoned.

A retrospective led us to realise that while we had provided an image generation platform with complex features to support the use case, artists were asking whether we could integrate it into Photoshop instead, and paying for it was not on the table. People overestimate how much money individual artists are willing to pay for tools, especially since artists are underpaid as an industry.

Image generation is not a new tool in the AI utility belt, but recent advances in text-to-image models such as Stable Diffusion and DALL-E have provided more powerful and intuitive interfaces to these models. This led to a blossoming of startups attempting to build the next AI unicorn on top of them, aka ImageGen startups. ImageGen startups mostly target the research phase of art and design, where the assumption is that with a sufficiently well-crafted prompt, the generated image should be close enough to the final creation.
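
To make that workflow concrete, here is a minimal sketch of the prompt-driven loop these startups are selling, using the open-source diffusers library. The checkpoint name, prompt, and sampling parameters are illustrative assumptions, not any particular product's stack.

```python
# Minimal text-to-image sketch with Hugging Face diffusers (assumed stack).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The "research phase" bet: keep polishing this prompt until the output
# is close enough to the final creation.
prompt = "concept art of a 2D game character in an idle pose, clean line art, soft lighting"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("reference.png")
```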

This assumes three things.

  1. Artists enjoy talking to customers and figuring out their requirements.
  2. The styles can be sufficiently close to whatever they need that change is minimal.
  3. Internet artists are happy with their art taken as input for AI models.

There is also a zeroth assumption: that everything works 90% of the time as a baseline, aka you can get the result you want as long as you spend enough time and effort polishing the prompt.

Each of these couldn't be further from the truth.

While artists are better than a layman at crafting prompts for Stable Diffusion or DALL-E, they aren't wordsmiths, aka they don't want to be writing, they want to be drawing! If artists enjoyed creating with words, why not become authors instead? Artists want to get from zero to art as soon as possible without sacrificing the artistic process, and the tool that provides the least friction helps the most. Least friction, not in the sense of time and money, but in the sense of fulfilling the satisfaction of creation. This is currently best done through a stylus and screen, where each stroke is in and of itself part of the process. Prompts take most of that process away.

While style blending is possible, I doubt Stable Diffusion or other ImageGen base models understand style guides and how to craft them, let alone provide enough layers so artists can work with the generated art. Imagine a new artist joining a team with an existing style guide, and how much work that person needs to do to understand how and where to contribute. ImageGen startups want their solution to be this new artist, and yet fail to understand why this is the right solution to the wrong problem.
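
For context, the closest thing the base models offer to style blending is image-to-image generation: feed in an existing piece plus a prompt and let the model re-render it. Here is a minimal sketch of that with diffusers; the file names, prompt, and strength value are assumptions for illustration, and the output is a single flattened image rather than the layered, style-guide-aware asset an art team actually needs.

```python
# Minimal image-to-image sketch with Hugging Face diffusers (assumed stack).
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Start from an existing piece in the team's style and nudge it with a prompt.
init_image = Image.open("house_style_sample.png").convert("RGB").resize((512, 512))
result = pipe(
    prompt="same character, same palette, winter outfit",
    image=init_image,
    strength=0.6,  # how far the model is allowed to drift from the source image
    guidance_scale=7.5,
).images[0]

# The result is one flattened image: no layers, no awareness of the style guide.
result.save("restyled.png")
```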

Text-to-image model creators are already being sued, and for good reason. Plagiarism is already a touchy subject among internet artists, and using implicitly copyrighted art for "research purposes" does not play well with artists who are not even credited in the final creation of whatever solution these startups build. This won't change, and plagiarism activism has only grown as more and more artists speak up against these acts of desecration.

So now, back to the point about targeting the research phase. What competition is there, and where do artists get their inspiration from? Pinterest, Behance, and other social sites are where artists can find well-crafted, dissected answers to what they are seeking, and those websites are free. Essentially, websites like http://playground.ai cannot compete, or will likely end up looking like every other image gen site, because they are tackling the wrong problem with the wrong tool, and image generation models are not cheap to host or train. Experience places the operational cost of managing the hardware and a team of image MLOps, DevOps, AI engineering, AI scientists, and software developers at a ballpark of 5x-10x that of traditional software development.

So... how can a layman use ImageGen to provide references to artists? Sure, you can, if you're willing to pay for it, but if you're already going to pay an artist, why pay twice? To be honest, image gen would be useful if people focused on the hard parts of art creation: 3D/2.5D modelling from 2D concept art. How do you turn 2D art into layers for LIVE2D or SPINE, and then into a 3D model? That would simplify the pipeline for many studios and greatly reduce operational costs.

I can't stress enough the importance of understanding a process before tackling it with tech, especially for startups that lack the cash and time to burn through an idea.
