Suno - CA AB 2013 Disclosure
Datasets Used for Suno Music Generative AI Models
This document provides information for California consumers, pursuant to California Civil Code Section 3110 et seq., on how we, Suno, Inc. and our affiliates and subsidiaries (“Suno”), use data to train Suno’s generative artificial intelligence services.
Dataset sources: Suno’s music generative AI models are trained on publicly available music files and related metadata accessible on third-party websites on the open internet. Suno abides by all paywalls, password protections, and the like. In particular, Suno does not create login credentials for the purpose of obtaining training data from credential-walled websites. Certain of Suno’s post-training datasets may also include user Content (as defined in the Suno Terms of Service) and User Activity Information (as defined in the Suno Privacy Notice).
Intended purpose: Suno uses the collected data to train its music generative AI models, which are intended to create novel music from text prompts.
Data points, including counts and types: Suno’s training data consists of tens of millions of public music audio files and corresponding textual metadata that help teach the models what certain genres or types of music sound like. Suno also uses User Activity Information and Content to improve its models; the volume of this data is consistent with the number of outputs that have been generated using the Service at any given time.
Inclusion of public domain or IP-protected data: Suno trains its models on datasets consisting of songs that may be from the public domain and/or that may be subject to intellectual property protection.
Acquisition of data: As described above, Suno’s models were developed with publicly available data, obtained in a manner that abided by all paywalls, password protections, and the like, as well as with Content and User Activity Information collected pursuant to Suno’s Terms of Service and Privacy Notice.
Inclusion of personal information or aggregate consumer information: Suno’s training datasets primarily consist of publicly available materials, which are not personal information as defined in subdivision (v) of California Civil Code Section 1798.140. Suno does not knowingly include in its training datasets aggregate consumer information as defined in subdivision (b) of California Civil Code Section 1798.140.
Suno’s training datasets also contain Content and User Activity Information. Suno removes structured identifying information, such as usernames, from this information prior to use for training.
Cleaning, processing, and other modification to datasets: Suno undertook processing steps to associate audio files with their related metadata. Prior to training, Suno also organizes, cleans, filters, and processes the collected datasets to remove junk or other low-quality data and improve their usefulness for model training.
Data collection time period: Suno has been collecting data to train its music generative AI models since Spring 2023. Its data collection efforts remain ongoing as it continues to develop new versions of its models.
Dates the datasets were first used: Suno began collecting the datasets in Spring 2023 and began using them for model development shortly thereafter. Data collection and use remain ongoing as Suno continues to develop new versions of its models.
Use of Synthetic Data: Suno uses synthetic data in the development of its generative artificial intelligence systems and services.
