The US Navy wants to create an archive of at least 350 billion social media posts from around the world in order to study how people talk online.
The Navy did not specify which social media platform(s) it plans to collect the data from.
The posts must be publicly available, come from at least 100 different countries and include at least 60 different languages.
They should also date between 2014 and 2016.
The details were revealed in a document from the Naval Postgraduate School seeking a firm to provide the data.
Additional Requirements
- the posts must come from at least 200 million unique users
- no more than 30% can come from any single country
- at least 50% must be in a language other than English
- location information must be included in at least 20% of the records
Private messages and user information will not form part of the database, according to the Naval Postgraduate School document.
“Social media data allows us, for the first time, to measure how colloquial expressions and slang evolve over time, across a diverse array of human societies, so that we can begin to understand how and why communities come to be formed around certain forms of discourse rather than others,” T Camber Warren, the project’s lead researcher, told Bloomberg.
The US Navy was behind the creation of Tor, the anonymous browsing network, in 2002.