Identifying Advertisements in Podcasts

URL
Document type: Master Thesis
Institute: Fachbereich Informatik
Language: English
Year of creation: 2025
Publication date:
Free keywords (German): Sprachverarbeitung, Podcasts, Werbung, Transformer, LanguageBind
Free keywords (English): Language processing, Podcasts, Advertisement, Transformer, LanguageBind
DDC subject group: Computer science
BK classification: 54.75

Abstract (English):

As podcasting developed from a passion project for hobbyists into a vastly popular entertainment medium, advertiser spending grew to nearly $2 billion by 2024. Today, podcasting is a popular form of entertainment, with close to 100 million users in the United States. Identifying advertisements in podcasts is a challenging task due to the heterogeneous nature of both content and advertisements. Because advertisements are embedded directly in the audio stream, identifying them requires segmenting the content based on features extracted from the raw audio signal as well as from transcriptions. Due to the lack of readily available podcast datasets with advertisement annotations, we create a dataset of popular podcast shows in the U.S., manually annotating over 150 episodes. As manual dataset annotation is tedious and time-consuming, we leverage annotations from an open-source ad database for YouTube videos to build a second, larger dataset of podcast-like YouTube videos. Exploring the created datasets, we find differences in ad-related statistics between sources, e.g. the number of advertisements per episode, the number of consecutive advertisements, and advertisement type and duration. We introduce the local classifier model architecture, which uses the multimodal Transformer model LanguageBind to generate embeddings from audio and text data in order to classify single-sentence input samples from the in-domain audio podcast dataset. Multimodal embeddings outperform embeddings computed from single modalities. We also concatenate input samples to increase model context in the superlocal architecture, but are unable to meaningfully improve on the single-sentence results. Due to factors such as subpar annotation quality and the lack of advertiser-produced advertisements in the out-of-domain YouTube dataset, training models on out-of-domain data to transfer learned feature characteristics for in-domain inference proved unsuccessful.
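For illustration only, the following is a minimal PyTorch sketch of the kind of sentence-level fusion classifier the abstract describes: per-sentence audio and text embeddings (in the thesis produced by LanguageBind) are concatenated and passed through a small classification head to decide advertisement vs. content. The embedding dimension, hidden size, and head layout here are assumptions for the example, not the architecture evaluated in the thesis; random tensors stand in for real embeddings.

```python
import torch
import torch.nn as nn

class LocalAdClassifier(nn.Module):
    """Binary ad/content classifier over fused audio + text sentence embeddings."""

    def __init__(self, embed_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * embed_dim, hidden_dim),  # fuse the two modalities
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_dim, 1),              # logit: advertisement vs. content
        )

    def forward(self, audio_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # Concatenate the per-sentence embeddings along the feature axis.
        fused = torch.cat([audio_emb, text_emb], dim=-1)
        return self.head(fused).squeeze(-1)

# Example: classify a batch of 4 sentence-level samples.
# In practice the embeddings would come from LanguageBind's audio and text encoders.
model = LocalAdClassifier()
audio_emb = torch.randn(4, 768)
text_emb = torch.randn(4, 768)
logits = model(audio_emb, text_emb)
probs = torch.sigmoid(logits)  # probability that each sentence is part of an ad
```

A superlocal variant as described in the abstract would concatenate the embeddings of neighbouring sentences before the classification head to give the model more context.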

Copyright notice

For documents made available in electronic form via data networks, the German Copyright Act (UrhG) applies without restriction. In particular:

Individual reproductions, e.g. copies and printouts, may only be made for private and other personal use (Section 53 UrhG). The production and distribution of further reproductions is permitted only with the express consent of the author.

The user is personally responsible for compliance with the legal provisions and can be held liable in case of misuse.