Platforms clash in the struggle for control over information
From large tech companies to startups, makers of artificial intelligence applications are actively licensing e-books, images, videos, audio and other data from data brokers. Shutterstock, for example, has agreements with Meta (recognized as extremist and banned in Russia), Google, Amazon and Apple to supply millions of images for training their models, and OpenAI has partnered with several news organizations to train its models on news archives.
In many cases, data creators and owners have no way of seeing how their information changes hands, often without their consent or even their knowledge. A startup called Vana aims to change that.
Anna Kazlauskas and Art Abal teamed up in 2021 to found Vana. Kazlauskas, an MIT graduate with degrees in computer science and economics, previously worked on launching a fintech company, while Abal, who has a corporate law background, began his career in law and then worked on impact sourcing at the data annotation company Appen.
Together they created the Vana platform, which allows users to “pool” their data, such as chats, audio recordings and photos, into datasets that can then be used to train generative models. In addition, the company aims to create more personalized experiences for users, such as daily motivational voice messages based on their goals and “art” apps tailored to their style preferences.
Vana presents its platform and API to developers as follows: “Vana's API brings together users' cross-platform personal data so you can personalize your application. The app gets instant access to the user's personalized AI model or underlying data, simplifying onboarding and eliminating computational overhead issues. We believe that users should be able to bring their personal data out of walled gardens like Instagram, Facebook and Google into your app, so you can create amazing, personalized experiences from the very first user interaction with your product.”
Creating a Vana account is simple. Once their email address is verified, users can attach data to a digital avatar (such as selfies, descriptions and voice recordings) and try apps built using Vana's platform and datasets. The applications range from ChatGPT-style chatbots and interactive storybooks to a Hinge profile generator.
The question, however, is why, in an age of growing awareness of data privacy and of threats from ransomware, anyone would be willing to hand their personal information to an anonymous startup, especially one funded by venture capital.
Vana, which has raised $20 million to date from Paradigm, Polychain Capital and other investors, says it is focused on ensuring data privacy and trust for its users, and promises to handle their information securely and ethically.
Anna Kazlauskas emphasized that the platform's main principle is to restore users' control over their own data. Vana users can host their data themselves rather than storing it on third-party servers, and they control how and when it is shared with apps and developers. Kazlauskas argues that because Vana charges users a monthly subscription (starting at $3.99) and charges developers a “data transaction fee” (for example, for sharing datasets used to train AI models), the company has no incentive to exploit users or their personal data.
While Vana doesn't currently sell user data to companies to train generative AI models, it is committed to enabling users to do so themselves, starting with their Reddit posts.
This month Vana launched the Reddit Data DAO, a program that aggregates the Reddit data of multiple users (including their posting history and karma) and allows them to decide jointly how the combined data is used. After requesting their data through their Reddit account and uploading it to the DAO, users can vote with other DAO members on decisions to license that data to generative AI companies for shared profit. This is partly a response to Reddit's recent moves to monetize data on its platform.
Reddit previously provided open access to its public data for training generative AI. However, a change in company policy before its IPO led to the commercialization of that data. Since the policy change, Reddit has received over $203 million in licensing fees from companies including Google.
“The main task of the DAO is to free user data from the major platforms seeking to accumulate and monetize it. This project is part of our commitment to helping users combine their data into their own sets to train AI models,” Kazlauskas notes.
Reddit, which has no official partnership with Vana, expressed dissatisfaction with the DAO. The platform blocked the Vana subreddit dedicated to discussing the DAO, and a Reddit representative accused Vana of “exploiting” its data export system, which was built to comply with data privacy regulations such as GDPR and the California Consumer Privacy Act.
A Reddit spokesperson stressed that the company's data processing mechanisms allow it to restrict the sharing of public information with organizations, and that Reddit does not share sensitive personal data with commercial enterprises. When users request an export of their data, Reddit provides non-public personal information in accordance with applicable law. Direct partnerships between Reddit and trusted organizations with clear terms and responsibilities are key to preventing misuse and abuse of data, the spokesperson said.
Anna Kazlauskas suggests that the DAO's growth may influence how much Reddit can charge for its data. The DAO currently has just over 141,000 members, which is only a small portion of Reddit's user base of 73 million, and some of those members may be bots or duplicate accounts.
This raises the question of how any payments the DAO receives from data buyers would be distributed fairly. Currently, the DAO rewards users with cryptocurrency “tokens” corresponding to their karma on Reddit. However, karma may not always reflect the quality of a user's data contributions, especially in smaller Reddit communities where it can be difficult to earn.
Kazlauskas suggests that DAO members could also share cross-platform and demographic data, making the DAO more valuable and encouraging participation. That, in turn, would require additional user trust in Vana and responsible handling of confidential data.