The Ear Economy

Sandeep Murthy, Atharva Purandare
17th September 2021

Communication remains at the core of who we are as humans. We write. We write to friends, family, and now internet strangers through letters, telegrams, text messages, and tweets. Not necessarily in that order. We write stories in newspapers, books, and blogs to communicate through generations. We have always communicated visually, first through hieroglyphics and now more efficiently through paintings, doodles, GIFs, NFTs, and videos. We make music. We all hum, whistle, sing in the shower, and sing along to the car radio or our favorite Spotify playlist. And most importantly, we talk- in person, through calls, FaceTime. Sometimes even through body language. Communication allows us to truly be the social animals we are.


How is Audio relevant again?

In 2021, thanks to the rise of social audio platforms, audio as a communication medium is back in fashion and is enjoying its time in the spotlight. But does audio really need to be disrupted?

Short answer: Yes.

Across all content forms, mediums have successfully transitioned from physical to digital. Within digital, platforms have shifted from revolving around Platform Generated Content (PGC) to User Generated Content (UGC). Within audio, there exists a glaring whitespace in the UGC space and this is what social audio apps are building for. This provides an exciting opportunity for new content platforms to emerge.

Source: Rex Woodbury and his piece on Social Audio

With data getting cheaper and faster (as low as $0.05/GB in India) and pictures/videos arguably being a more efficient way to communicate, it is crazy to think that an opportunity in audio exists in our modern world. Despite the meteoric rise in video content, the audio space globally continues to thrive, not just exist. And there is more to it than just social audio.


1) Podcasts- The global podcast market is projected to reach $41BN by 2026. 2020 was a fantastic year for podcasts- global podcast downloads increased 189% from Jan ‘20 to Nov ‘20. With everyone confined to their houses, and no time being spent traveling, driving, or running outside, people’s behavior shifted from consumption to creation. Don’t believe us? There was a time when Amazon ran out of podcast mics!


India has emerged as the third-largest podcast listening market in the world after China and the US, with listeners increasing to 58MM in 2020. According to a report by Spotify, 45% of Indian millennials and GenZ listened to at least 5 podcast genres regularly in 2020.

2) Radio- Radio as a medium registered a 23% rise in listenership in 2020 in India. A recent study said the average time spent on Radio has increased by 23% to 2.4 hrs every day during the lockdown. This is 2nd only to television which witnessed a 25% growth to 3.3 hours per day. Radio enjoys immense distribution power in India- 40MM listeners tuned in to Radio Mirchi in Dec ‘19. Every week.

3) Audio Streaming- Gone are the days of physical records and vinyl contributing to the music industry’s growth. In 2019, music streaming contributed 80% to the overall $20BN industry. This is evident in the growth of Spotify, the largest music streaming platform globally- the platform has seen 30% YoY growth in MAUs ever since 2015. Interestingly, this phenomenon is also seen in Ximalaya, the largest non-music audio streaming platform in the world. Despite this crazy momentum, their numbers pale in comparison to video-streaming platforms, leaving a lot of headroom for audio streaming platforms to grow.



MAUs of Ximalaya and Spotify have witnessed Year-on-Year growth

Drivers of the Audio Ecosystem

Clearly, there is a lot going on here. What is driving this growth in the audio space and attracting all this attention? We believe there are 3 major drivers at work here-


1) The Rise of Audio Tech- Hardware enablers, especially earphones, are causing changes in the way audio is consumed. Apple is on track to sell 100MM pairs of AirPods next year, capturing ~70% of the wireless headphone market. AirPods currently generate more revenue than Adobe, Uber, Snapchat, Twitter, and even Spotify. Back home, there is a similar story playing out in the earphones space. With a 37% market share in Indian personal audio products, BoAt recorded revenue of $93MM in FY20, growing 193% YoY. The homegrown D2C brand has more market share than Realme, JBL, Sony, OnePlus, and Samsung combined.


BoAt dominates the earphones market in India

Smart speakers have also been a common sight in a lot of households. Amazon Alexa + Google Assistant combined sold 150MM units last year, and are on track to do 300MM by 2025.

2) Screen Fatigue- Screens have become an extension of our bodies, especially so in the lockdown. We work on our screens, attend school, stay connected with loved ones, date, work out, and now also book vaccine appointments. This has led to immense screen fatigue. Audio is a good respite from this- we don’t need to look at our screens to consume content.

3) Audio’s Asynchronous Nature- All of us divide our attention in a given day across 3 different types of activities- Synchronous, Semi-Synchronous, and Asynchronous. Whether you’re watching Netflix, playing Fortnite, reading on a Kindle, or in a Zoom meeting, all your attention is directed to the activity. It is synchronous. When you’re waiting for your bus to arrive or just walking down the street, you sporadically scroll through Instagram, TikTok, and Twitter- you need to give it only part of your attention and can continue doing your primary activity (semi-synchronous). Audio, as a medium, has a unique quality- it is completely asynchronous in nature. When you listen to music, podcasts, and audiobooks, you can continue to run, lie down in bed, drive, travel, do laundry, or wash dishes.

Thus, we would argue that the biggest enablers for audio platforms are your daily chores- and whether you decide to fill up those hours of your chores with some content. All of these factors acting together are a strong multiplier for the larger audio ecosystem.


Market Map



We see 5 major buckets in the audio space- Streaming Platforms, Distribution (Hosting & Editing) platforms, Long Tail listening apps, Production houses, and Closed audio apps. There are, of course, further nuances in each bucket, but they are too broad and not relevant to this discussion. There are 2 important buckets here that are consumer-facing and relevant for our thesis-

1) Streaming Platforms (Music: Spotify, Gaana, Saavn, etc., and Non-Music Audio: Ximalaya)

2) Closed Audio Platforms (UGC: Social Audio + PGC Content Platforms)

Expanding on the latter, the differentiation between Live / Pre-Recorded and PGC / UGC is essential to understand. Mapping this out further-


The closed audio ecosystem globally occupies only 2 quadrants currently

It is interesting to see that as the ecosystem stands right now, it only occupies 2 quadrants in this matrix- a platform either streams Pre-Recorded & Professionally Generated Content or Live & User Generated Content. We believe this binary nature of categorization will soon change- the eventual evolution of these platforms will lead to them existing in all 4 quadrants, with everyone wanting to capture the UGC whitespace in audio. But first, let’s dive deeper into streaming platforms.


Streaming Platforms

Music streaming services have become omnipresent. For as little as $2/month, you can listen to music from your favorite artists anywhere you like, whenever you want to. Moreover, they’ll help you discover new artists according to your taste and you can even see what your friends are playing. Neat, right? What’s not to like? Services like Spotify and Apple Music have become second nature to the everyday music listener, and in the past decade, streaming has not only dominated music consumption but also the entire industry.


Source: The state of the music industry


There are 2 protagonists in the music industry’s story (sadly artists aren’t one of them)- Music Publishers and Record Labels. At a basic level there are two distinct copyrights:

a) Song – a.k.a. the ‘composition’ which may include lyrics
b) Sound recording – a.k.a. ‘master’ or ‘master recording’

Without the song, there can be no sound recording. A record label represents only the sound recording (master). On the other hand, a music publisher is responsible for all recordings of a song, including covers by other artists. The record label controlling a cover version will usually be different from the record label that controls the original artist’s recording. The music publisher controlling the underlying song remains constant irrespective of the recording. Universal Music Group, Sony Music, and Warner Music Group function as both publishers and record labels and are collectively referred to as the ‘Big 3’ of this world.



Labels and Publishers act as intermediaries between the streaming platforms and the artists & songwriters. But is this ‘representation’ really beneficial for the artists monetarily? Not really- by the time the money paid out by these platforms, in the form of royalties, trickles down to artists, it’s a small fraction of how much the platforms, labels, and publishers make. Some reports claim that Spotify pays 0.6 to 0.8 cents per stream to artists, which is absurdly low. Solving this warrants a larger conversation around how the music industry is structured.



Source: The Opportunity in Social Audio


On the other side of the table, streaming platforms are required to pay out a hefty amount in the form of royalties to these music publishers and record labels. This has led to weak gross margins (~22% for Spotify) and often hampers profitability. How can unit economics be improved for these businesses?


The Streaming Wars: how to improve Gross Margins?

Spotify is currently the largest music streaming platform globally, with 356MM MAUs last quarter. Since it gives away most of its revenue in the form of royalties, Spotify has lower Gross Margins than other subscription content companies (Spotify: ~22%, Netflix: ~40%, Calm: ~80%). When you compare streaming services across content forms, the difference in numbers is even more evident.

Source: Company filings

We believe there are 3 realistic ways music streaming platforms can improve their low gross margins-

1) Launch their ‘private’ label (Music Label): The Big 3 record labels (Sony, Universal, and Warner) own 70% of all the music in the world. Currently, Spotify and all large music streaming platforms are dependent on them for music streaming rights. This leads to a major cost in the form of royalties. With streaming now accounting for a majority of the industry’s revenue, the power dynamics need to reverse- the labels need streaming platforms. Top Indian platforms like Gaana, Saavn, and Wynk can invest in new music talent (with better data informing its investments) thus enabling them to promote their own roster of artists under their own label. Artists will also want to work with them because the platforms have a direct connection to their fans and are responsible for their revenue.


2) Introduce Non-Music content: One way to reduce royalties is to, well, not pay them at all. The rise of Non-Music content like Podcasts removes the inherent dependencies on labels and brings in a new vertical for audio content. As we mentioned earlier, Podcast consumption has shot up significantly in the past year. Apple Music and Spotify’s focus on podcasts also opens up another viable monetisation channel for the platforms.

3) Introduce Original Content: Finally, introducing Freemium Podcasts on top of signing up exclusive creators ushers in the era of non-music audio content. Signing up exclusive creators also opens up the top of the funnel, with the creators bringing with them their millions of fans (Eg- Joe Rogan, Michelle Obama, Kim Kardashian).
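To make the margin argument behind option 2 concrete, here is a minimal, hypothetical sketch of a revenue-weighted (blended) gross margin. It is an illustration and not a forecast. As stand-ins, it uses Spotify's ~22% gross margin for music and Ximalaya's 49% gross margin (discussed later in this piece) for non-music audio. The function name and the revenue mixes are our own assumptions.

```python
def blended_gross_margin(non_music_share, music_margin=0.22, non_music_margin=0.49):
    """Revenue-weighted gross margin for a given non-music revenue share (0 to 1).

    Default margins are stand-ins taken from figures cited in this piece
    (Spotify ~22%, Ximalaya 49%); real platform margins will differ.
    """
    if not 0 <= non_music_share <= 1:
        raise ValueError("non_music_share must be between 0 and 1")
    return (1 - non_music_share) * music_margin + non_music_share * non_music_margin


# Hypothetical revenue mixes, chosen purely for illustration.
for share in (0.0, 0.1, 0.25, 0.5):
    margin = blended_gross_margin(share)
    print(f"Non-music share {share:.0%}: blended gross margin ~{margin:.1%}")
```

Under these assumptions, every 10 percentage points of revenue that moves from music to non-music audio adds roughly 2.7 points of gross margin.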

In India, ‘audio’ streaming is often used interchangeably with ‘Music’ streaming, and we’re extremely bullish that in the coming years, the biggest driver of growth in the streaming industry is going to be the rise of non-music audio, led by the growth of Podcasts. This will also lead to better margins. But for platforms like Gaana, Saavn, Wynk, and Spotify that are largely built around music, how do you introduce and perfect the non-music engine? We looked at how this has played out globally, and there was one clear answer-

Consolidation, Consolidation, Consolidation


Integrating non-music audio has been largely achieved through aggressive consolidation globally. Spotify spent ~$1BN acquiring 10 companies in the last 3 years. We found an interesting pattern in the way these streaming platforms evolve, grow, and acquire over time- to eventually complete the vertical stack. It starts with having Platform Generated Content (PGC) in the form of music (Spotify, Amazon Music). To enable creators to come on board and create their content, called Professionally User Generated Content (PUGC), there needs to be robust Hosting, Distribution, Editing platforms + Production Houses (Anchor, Megaphone, Gimlet, MGM, Parcast) and thus, these PGC platforms integrate vertically backward through acquisition. To complete the stack upwards (consumer-facing), platforms like Spotify have also acquired social audio platforms, thus building out User Generated Content (UGC), and completing the stack.


The Audio ‘stack’ being completed by various companies to enable PGC + PUGC + UGC



Consolidation, as a means to complete the stack, has been observed globally in the audio space

Although not a lot of this has been seen in India yet, we expect to see a lot more acquisitions and consolidation in the Indian audio ecosystem in the coming years. There have been early signs, with Pratilipi acquiring IVM Podcasts, and PocketFM and Pratilipi raising funding from Times Internet, which owns Gaana, the leading music streaming platform in India.

Ximalaya: The largest Non-Music Audio Platform

This gradual shift towards non-music audio raises the question- Has any platform managed to achieve significant scale? Does it solve the problems highlighted above? Enter Ximalaya, the largest online audio platform in China in terms of MAUs and revenue. In 2020, users spent an astounding 1.5TN minutes listening to Ximalaya’s audio content, accounting for approximately 75% of total mobile listening time amongst all online audio platforms in China. Insights from Ximalaya’s success are key to understanding the future of audio in India, both in streaming and social audio.

Source: Company filings

Ximalaya’s FY21 Revenue came in at $621MM and only 43% of this was driven by subscriptions (as compared to ~90% in music streaming platforms). Further, it had a Gross Margin of 49% and a thriving content creator base (enabling UGC) of 5.2MM. There are 2 important questions to answer here- If not subscription, where is the revenue coming from? What are the different content types and who are the content creators?

1. Revenue Streams- Ximalaya has diversified revenue verticals, quite different from the models in the West, which largely depend on Advertising + Subscription. It has 5 revenue streams:

a) Subscription- Ximalaya has 13.3MM subscribers, making subscriptions the single largest source of revenue at 43%.

b) Advertising- This is driven by display and audio ads on the platform and accounts for 26% of revenue. They have also introduced programmatic advertising, which now accounts for ~50% of advertising revenue.

c) Live Streaming- They have ~3.5MM live-streaming MAUs. Consumers tip these live creators in the form of virtual gifts and items, which Ximalaya then takes a cut from. This accounts for 18% of revenue.

d) Educational services- Ximalaya has packaged its courses into subject-specific boot camps and vocational training programs for children. This is akin to an audio-only MasterClass. It accounts for just 6% of revenue.

e) Others- Revenue from the sale of IoT devices or fees they receive for the rights to convert their audio content to text. This accounts for 7% of revenue.
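As a rough sanity check on the split above, the stated shares can be turned into approximate dollar figures. This is a minimal sketch: the $621MM total and the percentages come from the text, the variable names are ours, and the per-stream dollar amounts are derived approximations from rounded shares rather than reported figures.

```python
# FY21 revenue in $MM, as stated in the text.
TOTAL_REVENUE_MM = 621

# Revenue shares as stated in the text (percent of FY21 revenue).
shares_pct = {
    "Subscription": 43,
    "Advertising": 26,
    "Live Streaming": 18,
    "Educational services": 6,
    "Others": 7,
}

# The stated shares should account for all of the revenue.
total_pct = sum(shares_pct.values())

# Approximate dollar revenue per stream (derived, not reported).
revenue_mm = {name: TOTAL_REVENUE_MM * pct / 100 for name, pct in shares_pct.items()}

# Programmatic ads are ~50% of advertising revenue per the text.
programmatic_mm = revenue_mm["Advertising"] * 0.5

for name, value in revenue_mm.items():
    print(f"{name:<22} {shares_pct[name]:>3}%  ~${value:,.0f}MM")
print(f"Total share: {total_pct}%")
print(f"Programmatic advertising: ~${programmatic_mm:,.0f}MM")
```

The shares add up to 100%, and the non-subscription streams together account for 57% of revenue.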


2. Content and its Creators- Ximalaya has over 280MM tracks of audio content across 100 genres. These unique content types are made by various kinds of content creators on the platform. Ximalaya features Podcasts, Audiobooks, Audio Entertainment (traditional Chinese comedies), Educational Content, and Audio Live Streaming. Because of this unique, non-music nature of the content, it opens up untapped revenue streams. They have 3 types of content being generated on the platform:

a) Professionally generated content (PGC)- This includes content secured via partnerships and licensing deals with top-tier publishers, online literature platforms, content creators, and key opinion leaders, including getting the rights to audiobooks.

b) Professional user-generated content (PUGC)- Ximalaya has a marketplace that matches content creators with high-quality copyrighted content that they can produce. They also train these content creators and help them produce content. This allows these creators to have a higher reach and better monetization than they might have otherwise had.

c) User-generated content (UGC)- It is easy for anybody to create, upload, and get distribution for a wide variety of audio content. The UGC creators help maintain a breadth of content on the platform and also serve as a funnel into PUGC creators (many of the PUGC creators started as UGC).

While most streaming services are largely PGC, Ximalaya has taken a different route and is a true pioneer in a strategy we’re extremely bullish about in the overall audio space: The PGC + UGC + PUGC approach. How does this strategy work out in actual audio apps? Let’s dive deeper into the Closed Audio Platforms (UGC + PGC).

Closed Audio Platforms (UGC + PGC)

There is something unique about audio-only content. Unlike video content, which captures our eyes and ears, audio targets only one of our senses: hearing. Think about it- If you’re watching a movie on Netflix with terrible content but with a decent soundtrack and a background score, chances are, you’ll keep watching the film. If you’re out jogging and a terrible song comes on, you will quickly skip to the next one. Because only one of our senses is engaged, the quality threshold for audio content is set much higher than that for video. Consumers simply won’t tolerate bad content. Swipe. Onto the next Podcast. Since it’s professionally made, the best quality content almost always ends up being PGC. Thus, while we believe that the eventual transition from PGC to PUGC to UGC is essential to scale, PGC remains the hook to acquire and retain audio listeners.

Going back to Ximalaya for a second, this strategy also holds true for them. There are 2.1k PGC creators, 4.6k PUGC creators, and 5.1MM UGC creators on their platform. Despite the huge difference, UGC creators’ content makes up only 52% of the listening time. Thus, 48% of the listening time is driven by the 0.1% of creators (PUGC + PGC), who invariably have better quality content. Let’s look at how this strategy works out in the UGC-Live space.



48% of the listening time is driven by only 0.1% of content creators (PGC + PUGC)
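The 0.1% figure can be reproduced from the creator counts quoted above. A minimal sketch: the counts and the 52% / 48% listening split are from the text, while the variable names and the "concentration" ratio are our own illustration.

```python
# Creator counts as stated in the text.
pgc_creators = 2_100
pugc_creators = 4_600
ugc_creators = 5_100_000

professional = pgc_creators + pugc_creators
all_creators = professional + ugc_creators

# Share of creators that are PGC or PUGC, in percent.
professional_share_pct = 100 * professional / all_creators

# Listening time split as stated in the text.
ugc_listening_pct = 52
professional_listening_pct = 100 - ugc_listening_pct

# How over-represented PGC + PUGC creators are in listening time
# relative to their share of the creator base (our own derived ratio).
concentration = professional_listening_pct / professional_share_pct

print(f"PGC + PUGC share of creators: {professional_share_pct:.2f}%")
print(f"PGC + PUGC share of listening time: {professional_listening_pct}%")
print(f"Listening-time share is ~{concentration:.0f}x the creator share")
```

PGC and PUGC creators come to about 0.13% of all creators, which rounds to the 0.1% quoted above.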


UGC and Live

This year has seen an explosion in the growth of social audio features, platforms, and services. As we stated earlier, these social audio platforms exist in the 1st quadrant occupying the ‘Live’ and ‘UGC’ space. We’re looking at this UGC-first space through 2 lenses: 1) Is social audio a feature or a platform?  2) What are the use cases of the app?


The latter is easier to answer: while all the current platforms are building horizontal content since it is early days, we believe the true moat of a platform will only be established when Audio becomes verticalized.

The various companies coming up in this space, all with similar features, will render horizontal content meaningless. Discord is a great example of being synonymous with social audio within the vertical of Gaming.

In the feature/platform debate, at least in the form it currently exists in, social audio as a feature is a lot more feasible than having it as a sole platform. As mentioned above, the eventual scale on an audio platform comes from adopting the PGC + UGC + PUGC strategy. Firstly, having PGC content on your platform is the best way to retain an audience and establish quality content. On top of this, having social audio (UGC) as a feature helps open up the top of the funnel further and lets you truly be creator-first without worrying about monetization (which will come through Virtual Gifting/Tipping).


Social Audio (UGC) as a feature- The UGC channel becomes a flywheel to better Gross Margins through PUGC. When a platform with established PGC content and an existing revenue stream launches UGC, there is no pressure to generate revenue for the creators or the UGC platform as it is being funded by the parent company. UGC platforms become akin to a talent contest- the best performing creators are selected by the platform to create PUGC. Creators also have an incentive to use the platform: to get noticed by the platform. This completes the flywheel and helps achieve the UGC + PGC + PUGC growth model, ensuring both retention and engagement.



Social Audio as a feature helps drive this virtuous cycle

That being said, we believe all social audio platforms currently lack moderation, curation, and discoverability. There are also a lot of concerns that remain unanswered around monetization, the ephemeral nature of the rooms, curation of the news feed, lack of proof of work for creators (lack of recording), quality of content, and having a specific use case. Monetization has been an especially tough nut to crack owing to the fact that Indian users are simply not used to live tipping (except the casual ‘Superchat’ on YouTube). The largest social audio platform, Lizhi, did $232MM of revenue in FY21. 99% of this came through virtual tipping and it also has a much larger userbase than Clubhouse- coming in at 58MM MAUs in FY21. Clubhouse in India has 8MM DAUs. Assuming a 40% DAU/MAU ratio for Clubhouse (i.e. ~20MM MAUs), this makes Lizhi roughly 3x larger and with an audience much more accustomed to Virtual Gifting. Lizhi, despite being a standalone platform, has succeeded in retaining users and dominating this space by playing largely on the entertainment use case, enabling PUGC and PGC, and most importantly- giving users a strong creation tool.
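The Lizhi vs. Clubhouse comparison above rests on an assumed DAU/MAU ratio, so it is worth seeing how sensitive the result is to that assumption. A minimal sketch: the 8MM DAUs, 58MM MAUs, and 40% ratio come from the text, while the alternative ratios of 30% and 50% are hypothetical values we added for illustration.

```python
clubhouse_dau_mm = 8          # Clubhouse India DAUs in MM, from the text
lizhi_mau_mm = 58             # Lizhi MAUs in MM, from the text
assumed_dau_mau_ratio = 0.40  # assumption stated in the text


def implied_mau(dau_mm, dau_mau_ratio):
    """Back out MAUs from DAUs and an assumed DAU/MAU ratio."""
    return dau_mm / dau_mau_ratio


clubhouse_mau_mm = implied_mau(clubhouse_dau_mm, assumed_dau_mau_ratio)
ratio = lizhi_mau_mm / clubhouse_mau_mm
print(f"Implied Clubhouse MAUs: ~{clubhouse_mau_mm:.0f}MM")
print(f"Lizhi is ~{ratio:.1f}x larger by MAUs")

# Sensitivity check with hypothetical alternative ratios (not from the text).
for r in (0.30, 0.40, 0.50):
    multiple = lizhi_mau_mm / implied_mau(clubhouse_dau_mm, r)
    print(f"DAU/MAU {r:.0%}: Lizhi is ~{multiple:.1f}x larger")
```

At the 40% assumption, Clubhouse works out to about 20MM MAUs and Lizhi to about 2.9x its size. A lower DAU/MAU ratio narrows the gap, while a higher one widens it.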

Thus, we strongly believe that the horizontal nature of social audio platforms has been commoditized. All the platforms also feel largely similar from a feature perspective except for the occasional gamification. Platforms that go vertical and introduce better quality, original content, and hence make users stick faster, will win within their vertical. This is because audio, as a medium, places a lot more emphasis on quality than other content forms owing to only one sense (hearing) being engaged. Clubhouse leads in the number of users currently, but if it doesn’t iterate on the problems, the winner for India for the entertainment use case will be Twitter Spaces for an India1 audience since it already attracts a similar crowd. Further, to enable PUGC/PGC on social audio platforms, a Creators’ Fund needs to be set up (similar to TikTok). Thus, Venture Capital money would be largely used to fund these Creator Funds. We are bullish on the future of social audio being verticalized and a mixture of PGC+UGC+PUGC.


The future of Social Audio is verticalized and a mixture of PGC+UGC+PUGC


PGC and Pre-Recorded

The rise in prominence of social audio platforms is intuitive to understand- there was a whitespace to be built for and several companies stepped up. Why do we even need PGC, Pre-Recorded platforms? Don’t Spotify + Radio fill the whitespace? The answer is no. While a lot of audio platforms, including radio, exist in the country, there is a dearth of audio-first content in India. While the use cases of music, podcasts, and audiobooks are solved by Spotify, Gaana, JioSaavn, and Audible, we believe the world of entertainment and knowledge remains unexplored. This is exactly where these PGC platforms come in.


While radio has a large audience in India (51MM), the barriers to entry are incredibly high: the last FM Radio spectrum auctions cost $160MM. Radio also faces a lot of legal scrutiny around the use of IPs and commentary around socially relevant topics/politics. This leads to a majority of the content on radio being music-first, leading to further whitespace in the entertainment, knowledge, and conversational verticals.

PocketFM and KukuFM have a similar approach to the entertainment whitespace: vernacular fictional storytelling. Unlike music streaming apps, where premium subscriptions remove ads, on these platforms a lot of interesting content is kept behind a paywall. Well-produced (PGC) long-form audio content has emerged as a clear winner in the storytelling space. Both these platforms have a strong focus on exclusive content and have backward-integrated studios (PocketStudio and KukuStudio). This helps them build PUGC content. This shift to PUGC and UGC will be done through the conversational vertical as highlighted above, further cementing their position in the UGC, live space alongside the PGC play.

We have observed that building strong IP is a north star metric at an early stage. PocketFM having Tencent on its cap table gives them another massive advantage- access to Tencent’s global IP pool, which they can license later. Another Tencent-backed company, Pratilipi, is a market leader in the written vernacular stories space, and with its acquisition of IVM Podcasts, has entered the audio space with PratilipiFM.

Audio, as a content form, is one of the most intuitive mediums out there. We believe it is also a space where there is a lot of room for innovation- social audio being the most prominent example. If you’re a founder building something fascinating in this sector, especially in any of the above whitespaces, or just someone who wants to chat about the space and the companies, we’d love to talk (and hear!). Drop us an email at



  1. Across all content forms, companies have transitioned from having Platform Generated Content (PGC) to User Generated Content (UGC). Within audio, there’s an opportunity for a new content platform to emerge.
  2. This opportunity in audio is driven by the omnipresence of audio tech, extreme screen fatigue, and the asynchronous nature of audio platforms.
  3. This hands-free experience means that audio apps generally don’t compete with a vast competitive library of other startups. Instead, they compete with your time while you’re busy doing chores.
  4. Diving deeper into the audio ecosystem, there are 2 major groups of companies: 1) UGC and live & 2) PGC and pre-recorded. We believe this binary nature of categorisation will soon change- the eventual evolution of these platforms will lead to them existing in all 4 quadrants (PGC/UGC, Live/Pre-Recorded), with everyone wanting to capture the UGC whitespace in audio.
  5. ‘Audio’ streaming is often used interchangeably with ‘Music’ streaming and in the coming years, the biggest driver of growth in the streaming industry is going to be the rise of non-music audio, led by the growth of podcasts. This will also lead to better margins.
  6. This growth will be driven by an approach unique to the audio industry: PGC + PUGC + UGC (As seen in Ximalaya, the largest audio platform in China).
  7. Integrating non-music audio has been largely achieved through aggressive consolidation globally, and remains inevitable in this space.
  8. While UGC + PUGC remains pivotal for the growth of audio, PGC remains the best way to hook the users initially. With audio, only one sense is stimulated (ears), so there is no replacement for good quality content, unlike video (ears + eyes) where the bar is set lower.
  9. We believe the future of audio is also highly verticalized. The music and podcasting verticals will be captured by the current music streaming platforms. Audible has a monopoly in the audiobooks vertical. Entertainment, knowledge, and conversational (social) remain up for grabs.
  10. Within conversational, UGC live platforms are on the rise. We strongly believe in audio as a feature over audio as a platform. Having UGC as a feature helps open up the top of the funnel and lets you truly be creator-first. This also achieves the UGC + PGC + PUGC growth model, ensuring both retention and engagement. There is also no pressure to generate revenue for the UGC platform as it is being funded by the parent company.
  11. We believe UGC live platforms will also need to have PGC/PUGC eventually and Clubhouse will be the winner for India1 Entertainment market and Sharechat a potential winner for India 2/3 in the Conversational (Social) use case.
  12. There is a dearth of audio-first content in India. This is being tackled by PGC, pre-recorded platforms like KukuFM, PocketFM, Storytel, Headfone, etc. PocketFM, tackling the storytelling space through original content, is a leader in the space with a much higher scale than the competition.
  13. PocketFM seems well placed to win the space because of 1) Business model: subscription, advertising, licensing, virtual gifting (through a UGC play) and 2) Investors: having Tencent on the cap table gives them access to their global pool of IPs.
  14. Tencent-backed Pratilipi, the leader in vernacular written content space, is also gearing up to enter the audio space with its acquisition of IVM Podcasts. Tencent and Times Internet have now backed Gaana, PocketFM, and Pratilipi hinting at a possible consolidation.