AI in Music: An Artist's Sous-Chef or Skynet 1.0?
As OpenAI dominates headlines and mindshare, the impending impact on the creative industries cannot be overstated. ChatGPT’s implications for songwriting - and writing in general - go without saying.
And in 10 seconds, ChatGPT has blessed us with the crucial seed for a Platinum track. Biebs circa 2012 vibes.
Admittedly, I have been wary of this type of technology in the context of music creation. The notion of instantaneous and fully automated track production, without any real need for technical ability (shoutout Rick Rubin), is equal parts exciting and demoralizing. Whereas pioneers like Amper (now owned by Shutterstock) cater to more of a professional end user, there is a new breed of highly consumerized AI engines. These engines - companies like Soundful, Boomy, and MusicStar.AI - don’t just lower the barrier to entry for creation, they open up the floodgates.
They are prolific and expeditious content generators, especially when compared to their more traditional counterparts that rely on human production. Soundful boasts the capacity to create 39k tracks per day, which is more than royalty-free music hub Epidemic Sound’s entire reported catalog size of 36k tracks.
Whether the output from these AI engines is commercially viable on its own remains up for debate. But one need only listen to Yung Lean’s ‘Ginseng Strip 2002’ once to realize that TikTok can create viral hits out of tracks that might not have been up to snuff in a prior era. Spotify and other DSPs are already inundated by a glut of content (100k+ daily uploads), which makes breaking artists increasingly difficult; the question becomes whether AI engines will do more to complement real artistry or further accentuate our content oversupply issue. In a year-end letter to employees, Universal Music Group CEO Lucian Grainge cited the oversupply issue as a cause of an eroding consumer experience and, more importantly, fewer royalties back to deserving artists.
Consumers are increasingly being guided by algorithms to lower-quality functional content that in some cases can barely pass for “music.” Let me explain. In order to entice consumers to subscribe, platforms naturally exploit the music of those artists who have large and passionate fan bases. But then, once those fans have subscribed, consumers are often guided by algorithms to generic music that lacks a meaningful artistic context, is less expensive for the platform to license or, in some cases, has been commissioned directly by the platform.
It’s difficult to envision a future in which AI engines aren’t exacerbating this problem.
Man v. Machine
Whether or not you believe AI-induced job/income displacement is overblown in the near term, there is a real example that could begin to play out in the music industry. Royalty-free music hubs like Epidemic Sound and Artlist (i.e., the traditional counterparts) have built unicorn businesses by paying artists upfront for content that the companies then charge a subscription for. These upfront payments range from $1.2k-$6k and historically didn’t include any downstream participation (e.g., streaming or sync revenues from the track). After catching flak for not having enough of an artist-first remuneration model, Epidemic Sound amended its model to include a 50/50 royalty split for select artists/tracks. On average, artists working with Epidemic Sound make $35k annually, and top earners eclipse $200k. That’s a nice income stream, especially in an era when indies are challenged to make a living from streaming.
In 2021, Epidemic Sound incurred a $40mm operating loss on nearly $70mm in revenue. The culprit? Increasing content costs to spur growth of its curated and premium catalog via a more artist-friendly model. Naturally, AI engines are targeting Epidemic Sound, Artlist, and others as customers for their content. As I see it, Epidemic Sound has a few choices: 1) Move quickly to acquire one of these AI engines; 2) Become a SaaS customer of one or more AI engines; or 3) Sit back and watch as the AI engines undercut it on price (Soundful is already in market with a B2C offering that’s 50% cheaper per month than Epidemic Sound) - the advantage of significantly lower content costs - and eat into its market share. In any case, it doesn’t bode well for the human artist, a less efficient cost center in this scenario. And I fear we move to a place where production music hubs paying human artists becomes more of an ESG-esque virtue signal than a primary content source.
It’s ultimately the customers of these AI engines that stand to benefit in an increasingly competitive market through access to larger, more diverse libraries at a cheaper monthly cost. I’m keen to see what will ultimately differentiate engines from one another. Will it be access to and usage of commercial music to train the AI, thereby leading to a higher-quality output? Will it be a seamless UX in the form of the simplest creation workflows? Do the engines ultimately become commoditized and stuck in a race to the bottom a la distributors?
While AI might threaten a creative process rooted in first principles, it will also undoubtedly empower artists in ways we probably can’t even predict yet. A talented singer/songwriter on a shoestring budget could spin up a beat without a producer. Producers can quickly workshop ideas to inspire content that is then expanded upon in a DAW. An existing artist could do a deal with an engine and encourage her fans to remix the content for further amplification. We’re just getting started.
Ownership
Perhaps the most important question around all of this is who owns the content created using AI? In the case of the AI music engines, it is the companies that own it, at least initially; a core business model tactic centers around charging users for ownership of the copyright. On Soundful it will cost you $50 - the equivalent of just over 14k streams on Spotify.
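That 14k-stream equivalence is easy to sanity-check. A back-of-envelope sketch, assuming Spotify's commonly cited average payout of roughly $0.0035 per stream (an industry estimate, not an official rate):

```python
# How many Spotify streams does Soundful's $50 copyright fee represent?
COPYRIGHT_FEE = 50.00          # Soundful's price to own the copyright
PAYOUT_PER_STREAM = 0.0035     # assumed average Spotify payout per stream (estimate)

streams_needed = COPYRIGHT_FEE / PAYOUT_PER_STREAM
print(round(streams_needed))   # roughly 14,286 streams
```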
But who (or what) would be entitled to own the composition above? OpenAI? Poo Bear? Me? Sam Altman?
In the case of OpenAI and ChatGPT, the answer remains somewhat opaque. There are several illuminating excerpts from the terms of use, however:
As between the parties and to the extent permitted by applicable law, you own all Input, and subject to your compliance with these Terms, OpenAI hereby assigns to you all its right, title and interest in and to Output. OpenAI may use Content as necessary to provide and maintain the Services, comply with applicable law, and enforce our policies. You are responsible for Content, including for ensuring that it does not violate any applicable law or these Terms (Content; Your Content).
It goes without saying that I own the input prompts. OpenAI is essentially granting me ownership of the output with some important caveats: 1) the company retains the ability to use the output (as ‘Content’) to provide and maintain its Services, and 2) the onus is on me to ensure that the output doesn’t infringe on anyone else’s work. The latter presents a bit of a conundrum seeing as the output was clearly influenced by Poo Bear’s copyrighted works. The question then becomes whether OpenAI and other AI companies are built on the back of rampant copyright infringement or fair use of those works, which aims to encourage the use of copyright-protected work to promote freedom of expression. This same question underpins a litany of ongoing lawsuits against AI companies. Chief among them, Getty Images is a plaintiff in multiple lawsuits against Stability AI, the company behind image generator Stable Diffusion, for unlawful use of Getty’s image repository to train its engine (Getty Images vs. Stability AI).
In the same way that Getty Images is accusing Stability AI of copyright infringement via unlicensed usage of its image repository, would Poo Bear’s publisher, BMG, have a case against OpenAI? It’s difficult to say that the above composition on its own, although clearly inspired by Poo Bear’s prior works, constitutes a derivative work. After all, there is nothing in the composition that explicitly copies lines from his prior works. But the notion that a potentially endless amount of output inspired by his work could be created without any compensation back to him doesn’t seem fair.
Daniel Gervais, an IP law professor at Vanderbilt University, has delved into how fair use applies in AI. One of the most important factors, according to Gervais, is whether the use threatens the livelihood of the original creator. He uses an example that’s particularly relevant in the context of Poo Bear:
If you give an AI 10 Stephen King novels and say, ‘Produce a Stephen King novel,’ then you’re directly competing with Stephen King. Would that be fair use? Probably not.
In the most equitable scenario, OpenAI would license the composition (and other content) from BMG. You can imagine Poo Bear being compensated differently based on query types. Queries that explicitly ask for his style, like mine, would yield a higher royalty, whereas queries that tap his works without an explicit reference (e.g., write me a short pop song) would yield a lower royalty. This tiered compensation would only be possible if AI engines had robust content ID infrastructure a la YouTube. In the absence of such a feature, rights holders could instead propose a blanket license, or a fixed fee that’s irrespective of how frequently the content is used. This is the current licensing framework for brick-and-mortar businesses that use commercial music, as there is no reliable way at scale to track the content being played within these locations.
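The tiered idea can be sketched in a few lines. Everything here is hypothetical - the rates, the detection-by-name shortcut, and the per-generation fee are invented for illustration, and a real system would need actual content ID rather than string matching:

```python
# Hypothetical tiered licensing: the royalty rate depends on whether a query
# explicitly names a writer whose catalog trained the model. Rates and the
# name-matching heuristic are illustrative assumptions, not real terms.
EXPLICIT_STYLE_RATE = 0.05   # query names the writer ("in the style of ...")
AMBIENT_RATE = 0.01          # writer's works merely influenced the output

def royalty(query: str, writer: str, fee_per_generation: float) -> float:
    """Return the writer's hypothetical royalty for one generated track."""
    explicit = writer.lower() in query.lower()
    rate = EXPLICIT_STYLE_RATE if explicit else AMBIENT_RATE
    return fee_per_generation * rate

print(royalty("a pop song in the style of Poo Bear", "Poo Bear", 50.0))  # 2.5
print(royalty("write me a short pop song", "Poo Bear", 50.0))            # 0.5
```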
Whether this equitable scenario is actually realistic is another question; the breadth of content - 300 billion words - inputted to train ChatGPT and other AI engines could make licensing a Sisyphean endeavor.
Watermarking
Another excerpt from OpenAI’s terms of use sheds light on what I can and can’t do with the output:
You may not represent that output from the Services was human-generated when it is not (Usage Requirements; Restrictions).
Seeing as the Copyright Act only protects original works by a human author, users technically cannot register a copyright in the output. But how would anyone know?
A cryptographic watermark is said to be on ChatGPT’s product roadmap. Invisible to the human reader, the watermark would embed a ‘pseudorandom’ statistical pattern into the output’s distribution of words that would otherwise appear random. Those with access to the corresponding private key would be able to identify text as output created by the platform. While a noble start, this safeguard is far from foolproof; Altman admits that all it would take to defeat it is a ‘determined’ person.
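The mechanism is easier to grasp with a toy model. A minimal sketch, assuming a keyed pseudorandom function biases which of several plausible next words the model emits - the vocabulary, key, and scoring threshold are all invented for illustration, and real proposals operate on model token probabilities rather than word lists:

```python
import hmac, hashlib, random

SECRET_KEY = b"demo-key"  # only the key holder can detect the watermark

def prf(key, context, token):
    """Keyed pseudorandom score in [0, 1) for a token given its context."""
    digest = hmac.new(key, (context + "|" + token).encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def watermarked_choice(key, context, candidates):
    """Among equally plausible next words, pick the one the PRF scores highest.
    Without the key the choice looks random; with it, scores skew high."""
    return max(candidates, key=lambda t: prf(key, context, t))

def detect(key, tokens):
    """Average PRF score of each token given its predecessor.
    Unwatermarked text averages ~0.5; watermarked text scores well above."""
    scores = [prf(key, tokens[i - 1], tokens[i]) for i in range(1, len(tokens))]
    return sum(scores) / len(scores)

# Toy "model": at each step it proposes four candidate words.
vocab = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"]
rng = random.Random(0)

def generate(n, watermark=True):
    tokens = ["start"]
    for _ in range(n):
        candidates = rng.sample(vocab, 4)
        if watermark:
            tokens.append(watermarked_choice(SECRET_KEY, tokens[-1], candidates))
        else:
            tokens.append(rng.choice(candidates))
    return tokens

print(detect(SECRET_KEY, generate(80, watermark=True)))   # high, around 0.8
print(detect(SECRET_KEY, generate(80, watermark=False)))  # near 0.5
```

The sketch also shows why the safeguard is fragile: paraphrasing or reordering the output scrambles the (context, token) pairs the detector depends on, which is all a ‘determined’ person needs to do.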
On the flip side, the proliferation of generative content also means it becomes increasingly important for us to sign our work. In a December blog post titled Sign Everything, Fred Wilson makes the case for signing posts and storing the content on chain. Doing so, according to Wilson, will help ‘manage our identity and humanity.’ For those throwing stones at an already beleaguered crypto industry for its ‘lack of use cases,’ well, here you go.
Conclusion
Perhaps I’m falling victim to the Luddite fallacy with my concerns. This type of myopia has plagued the music industry as technological advances - in both products and business models - have occurred. It took decades for the synthesizer, invented in 1952, to gain popularity across music genres. Initially, some traditional musicians and critics saw the synth as a threat to ‘real’ music, a replacement for live instruments, and a sign of the commodification of music. The digital consumption era was initially marred by The Pirate Bay and Napster before the advent of streaming.
But never before have we seen technology that can automate every facet of the creation process. Time will tell whether this automation will be a net positive for the music industry. My hope is that we get to a state where creators are complemented, not supplanted, by this technology, and fairly compensated for their contributions in shaping it.
Sources (of inspiration):
Lucian Grainge: Streaming Needs a New Model - Music Business Worldwide
AI Art Tools Targeted with Lawsuit - The Verge
Making a Living in Music: How Epidemic Sound Works With Artists
Epidemic Sound 2021 Annual Report
Who Ultimately Owns the Content Created by ChatGPT and Others? - Forbes