Visualizing The 50 Biggest Data Breaches From 2004–2021

This graphic visualizes the 50 largest data breaches, by entity and sector, since 2004.

As our world has become increasingly reliant on technology and data stored online, data breaches have become an omnipresent threat to users, businesses, and government agencies. In 2021, a new record was set with more than 5.9 billion user records stolen.

This graphic by Chimdi Nwosu visualizes the 50 largest data breaches since 2004, along with the sectors most impacted. Data was aggregated from company statements and news reports.

Understanding the Basics of Data Breaches

A data breach is an incident in which sensitive or confidential information is copied, transmitted or stolen by an unauthorized entity. This can occur as a result of malware attacks, payment card fraud, insider leaks, or unintended disclosure.

The targeted data is often customer PII (personally identifiable information), employee PII, intellectual property, corporate data or government agency data.

Data breaches can be perpetrated by lone hackers, organized cybercrime groups, or even national governments. Stolen information can then be used in other criminal enterprises such as identity theft or credit card fraud, or it can be held for ransom.

Notable Data Breaches Since 2004

The largest data breach recorded occurred in 2013, when all three billion Yahoo accounts had their information compromised. In that cyberattack, the hackers were able to gather the personal information and passwords of users. While the full extent of the Yahoo data breach is still not known, subsequent cybercrimes across the globe have been linked to the stolen information.

Here are the 50 largest data breaches by number of user records stolen from 2004–2021.

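| Rank | Entity | Sector | Records Compromised | Year |
|---|---|---|---|---|
| 1 | Yahoo | Web | 3.0B | 2013 |
| 2 | River City Media | Web | 1.4B | 2017 |
| 3 | Aadhaar | Government | 1.1B | 2018 |
| 4 | First American Corporation | Finance | 885M | 2019 |
| 5 | Spambot | Web | 711M | 2017 |
| 6 | LinkedIn | Web | 700M | 2021 |
| 7 | Facebook | Tech | 533M | 2021 |
| 8 | Yahoo | Web | 500M | 2014 |
| 9 | Marriott International | Retail | 500M | 2018 |
| 10 | Syniverse | Telecoms | 500M | 2021 |
| 11 | Facebook | Web | 419M | 2019 |
| 12 | Friend Finder Network | Web | 412M | 2016 |
| 13 | OxyData | Tech | 380M | 2019 |
| 14 | MySpace | Web | 360M | 2016 |
| 15 | Exactis | Data | 340M | 2018 |
| 16 | Twitter | Tech | 330M | 2018 |
| 17 | Airtel | Telecoms | 320M | 2019 |
| 18 | Indian citizens | Web | 275M | 2019 |
| 19 | Wattpad | Web | 270M | 2020 |
| 20 | Microsoft | Web | 250M | 2019 |
| 21 | Experian Brazil | Finance | 220M | 2021 |
| 22 | Chinese resume leak | Web | 202M | 2019 |
| 23 | Court Ventures | Finance | 200M | 2013 |
| 24 | Apollo | Tech | 200M | 2018 |
| 25 | Deep Root Analytics | Web | 198M | 2015 |
| 26 | Zynga | Gaming | 173M | 2019 |
| 27 | VK | Web | 171M | 2016 |
| 28 | Equifax | Finance | 163M | 2017 |
| 29 | Dubsmash | Web | 162M | 2019 |
| 30 | Massive American business hack | Finance | 160M | 2013 |
| 31 | MyFitnessPal | App | 150M | 2018 |
| 32 | eBay | Web | 145M | 2014 |
| 33 | Canva | Web | 139M | 2019 |
| 34 | Heartland | Finance | 130M | 2009 |
| 35 | Nametests | App | 120M | 2018 |
| 36 | Tetrad | Finance | 120M | 2020 |
| 37 | LinkedIn | Web | 117M | 2016 |
| 38 | Pakistani mobile operators | Telecoms | 115M | 2020 |
| 39 | ElasticSearch | Tech | 108M | 2019 |
| 40 | Capital One | Finance | 106M | 2019 |
| 41 | Thailand visitors | Government | 106M | 2021 |
| 42 | Firebase | App | 100M | 2018 |
| 43 | Quora | Web | 100M | 2018 |
| 44 | Rambler.ru | Web | 98M | 2012 |
| 45 | TK / TJ Maxx | Retail | 94M | 2007 |
| 46 | MyHeritage | Web | 92M | 2018 |
| 47 | AOL | Web | 92M | 2004 |
| 48 | Dailymotion | Web | 85M | 2016 |
| 49 | Anthem | Health | 80M | 2015 |
| 50 | Sony PlayStation Network | Gaming | 77M | 2011 |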

The massive Yahoo hack accounted for roughly 30% of the 9.9 billion user records stolen from the Web sector, by far the most impacted sector. Based on the table above, the next most-impacted sectors were Finance and Tech, with roughly 2 billion and 1.6 billion records stolen, respectively.
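
The Web-sector total and Yahoo's share of it can be checked against the table above. The following is a minimal Python sketch, assuming the sector labels exactly as they appear in the table; the dictionary keys, variable names, and the `share_of_total` helper are illustrative and not part of the original dataset.

```python
# Records compromised, in millions, for every row labeled "Web" in the table above.
# Keys are illustrative labels (entity plus year), not official incident names.
WEB_SECTOR_RECORDS_M = {
    "Yahoo 2013": 3000,
    "River City Media 2017": 1400,
    "Spambot 2017": 711,
    "LinkedIn 2021": 700,
    "Yahoo 2014": 500,
    "Facebook 2019": 419,
    "Friend Finder Network 2016": 412,
    "MySpace 2016": 360,
    "Indian citizens 2019": 275,
    "Wattpad 2020": 270,
    "Microsoft 2019": 250,
    "Chinese resume leak 2019": 202,
    "Deep Root Analytics 2015": 198,
    "VK 2016": 171,
    "Dubsmash 2019": 162,
    "Ebay 2014": 145,
    "Canva 2019": 139,
    "LinkedIn 2016": 117,
    "Quora 2018": 100,
    "Rambler.ru 2012": 98,
    "MyHeritage 2018": 92,
    "AOL 2004": 92,
    "Dailymotion 2016": 85,
}


def share_of_total(records, key):
    """Return the fraction of the summed records contributed by one entry."""
    return records[key] / sum(records.values())


# Sum of all Web-sector rows, in millions of records.
web_total_m = sum(WEB_SECTOR_RECORDS_M.values())
# Fraction of that total attributable to the 2013 Yahoo breach.
yahoo_share = share_of_total(WEB_SECTOR_RECORDS_M, "Yahoo 2013")

print(f"Web sector total: {web_total_m / 1000:.1f}B records")
print(f"Yahoo 2013 share of Web sector: {yahoo_share:.0%}")
```

Summing only the rows that appear in the top-50 table gives a Web total of about 9.9 billion records and a Yahoo share of about 30%, in line with the figures quoted above. Breaches outside the top 50 are not included in this check.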

Although these three sectors had the highest totals of user data lost, that doesn’t necessarily imply they have weaker security measures. Instead, it can probably be attributed to the sheer number of user records they compile.

Not all infamous data breaches are of a large scale. A smaller data breach in 2014 made headlines when Apple’s iCloud was hacked and the personal pictures of roughly 200 celebrities were disseminated across the internet. Although this highly targeted hack only affected a few hundred people, it highlighted how invasive and damaging data breaches can be to users.

The Cost of Data Breaches to Businesses

Every year, data breaches cost businesses billions of dollars to prevent and contain, while also eroding consumer trust and potentially hurting customer retention.

A 2021 IBM security report estimated that the average cost per data breach for companies in 2020 was $4.2 million, which represents a 10% increase from 2019. That increase is mainly attributed to the added security risk associated with having more people working remotely due to the COVID-19 pandemic.

Measures to Improve Data Security

Completely preventing data breaches is essentially impossible, as cybercrime enterprises are often persistent, dynamic, and sophisticated. Nevertheless, businesses can seek out innovative methods to prevent exposure of data and mitigate potential damages.

For example, after the iCloud attack in 2014, Apple began avidly encouraging users to adopt two-factor authentication in an effort to strengthen data security.

Regardless of the measures businesses take, the unfortunate reality is that data breaches are a cost of doing business in the modern world and will continue to be a concern to both companies and users.

This article was published as a part of Visual Capitalist's Creator Program, which features data-driven visuals from some of our favorite Creators around the world.

The Evolution of Media: Visualizing a Data-Driven Future

Media and information delivery is transforming at an increasing pace. Here’s why the future will be more data-driven, transparent, and verifiable.

In today’s highly-connected and instantaneous world, we have access to a massive amount of information at our fingertips.

Historically, however, this hasn’t always been the case.

Travel back just 20 years to 2002, and you’d notice the vast majority of people were still waiting on the daily paper or the evening news to help fill the information void.

In fact, for most of 2002, Google was trailing in search engine market share behind Yahoo! and MSN. Meanwhile, early social media incarnations (MySpace, Friendster, etc.) were just starting to come online, and Facebook, YouTube, Twitter, and the iPhone did not yet exist.

The Waves of Media So Far

Every so often, the dominant form of communication is upended by new technological developments and changing societal preferences.

These transitions seem to be happening faster over time, aligning with the accelerated progress of technology.

  • Proto-Media (50,000+ years)
    Humans could only spread their message through human activity. Speech, oral tradition, and manually written text were the most common mediums to pass on a message.
  • Analog and Early Digital Media (1430-2004)
    The invention of the printing press, and later the radio, television, and computer, unlocked powerful forms of cheap, one-way communication to the masses.
  • Connected Media (2004-current)
    The birth of Web 2.0 and social media enables participation and content creation for everyone. One tweet, blog post, or TikTok video by anyone can go viral, reaching the whole world.

Each new wave of media comes with its own pros and cons.

For example, Connected Media was a huge step forward in that it enabled everyone to be a part of the conversation. On the other hand, algorithms and the sheer amount of content to sift through have created a lot of downsides as well. To name just a few problems with media today: filter bubbles, sensationalism, clickbait, and so on.

Before we dive into what we think is the next wave of media, let’s first break down the common attributes and problems with prior waves.

Wave Zero: Proto-Media

Before the first wave of media, amplifying a message took devotion and a lifetime.

Add in the fact that even by the year 1500, only 4% of global citizens lived in cities, and you can see how hard it would be to communicate effectively with the masses during this era.

Or, to paint a more vivid picture of what proto-media was like: information could only travel as fast as the speed of a horse.

Wave 1: Analog and Early Digital Media

In this first wave, new technological advancements enabled wide-scale communication for the first time in history.

Newspapers, books, magazines, radios, televisions, movies, and early websites all fit within this framework, enabling the owners of these assets to broadcast their message at scale.

With large amounts of infrastructure required to print books or broadcast television news programs, it took capital or connections to gain access. For this reason, large corporations and governments were usually the gatekeepers, and ordinary citizens had limited influence.

| Attribute | Description |
|---|---|
| 📡 Information Flow | One-way |
| 💰 Barriers to Entry | Very high |
| 📰 Distribution | Controlled by mass media companies and government |
| 🏆 Incentive | To cast a wide net, and to not alienate viewers or advertisers |

Importantly, these mediums only allowed one-way communication: they could broadcast a message, but the general public was restricted in how it could respond (e.g., a letter to the editor or a phone call to a radio station).

Wave 2: Connected Media

Innovations like Web 2.0 and social media changed the game.

Starting in the mid-2000s, barriers to entry began to drop, and it eventually became free and easy for anyone to broadcast their opinion online. As the internet exploded with content, sorting through it became the number one problem to solve.

For better or worse, algorithms began to feed people what they loved, so they could consume even more. The ripple effect of this was that everyone competing for eyeballs suddenly found themselves optimizing content to try and “win” the algorithm game to get virality.

| Attribute | Description |
|---|---|
| 📡 Information Flow | Two-way |
| 💰 Barriers to Entry | Very low |
| 📰 Distribution | Controlled by technology companies and algorithms |
| 🏆 Incentive | To cast a narrow net, to engage and mobilize a specific audience |

Viral content is often engaging and interesting, but it comes with tradeoffs. Content can be made artificially engaging by sensationalizing, using clickbait, or playing loose with the facts. It can be ultra-targeted to resonate emotionally within one particular filter bubble. It can be designed to enrage a certain group, and mobilize them towards action—even if it is extreme.

Despite the many benefits of Connected Media, we are seeing more polarization than ever before in society. Groups of people can’t relate to each other or discuss issues, because they can’t even agree on basic facts.

Perhaps most frustrating of all? Many people don’t know they are deep within their own bubble in which they are only fed information they agree with. They are unaware that other legitimate points of view exist. Everything is black and white, and grey thinking is rarer and rarer.

Wave 3: Data Media

Between 2015 and 2025, the amount of data captured, created, and replicated globally will increase by 1,600%.

For the first time ever, a significant quantity of data is becoming “open source” and available to anyone. There have been massive advancements in how to store and verify data, and even the ownership of information can now be tracked on the blockchain. Both media and the population are becoming more data literate, and they are also becoming aware of the societal drawbacks stemming from Connected Media.

As this new wave emerges, it’s worth examining some of its attributes and connecting concepts in more detail:

  • Transparency:
    Data-literate users will begin to demand that data be transparent and originate from trustworthy, factual sources. Or if a source is not rock solid, users will demand that limitations of methodology or possible biases are openly revealed and discussed.
  • Verifiability and Trust:
    How do we know data shown is legitimate and bona fide? Platforms and media will increasingly want to prove to users that data has been verified, going all the way back to the original source.
  • Decentralization and Web3:
    Anyone can tap into large amounts of public data available today, which means that reporting, analysis, ideas, and insights can come from a growing set of actors. Web3 and decentralized ledgers will allow us to provide trust, attribution, accountability, and even ownership of content when necessary. This can remove the middleman, which is often large tech companies, and can allow users to monetize their content more directly.
  • Data Storytelling:
    As data literacy grows, data storytelling is emerging as a key approach to making sense of vast amounts of data by combining data visualization, narrative, and powerful insights.
  • Data Creator Economy:
    Democratized data and the rise of storytelling are intersecting to create a potential new ecosystem for data storytellers. This is increasingly what we are focused on at Visual Capitalist, and we encourage you to support our Kickstarter project on this (just 6 days left, as of publishing time).
  • Open-Ended Ecosystem:
    Just as open source has revolutionized the software industry, we will begin to see more and more data available broadly. Incentives may shift in some cases from keeping data proprietary to getting it out in the open so that others can use, remix, and publish it while attributing it back to the original source.
  • Data > Opinion:
    Data Media will have a bias towards facts over opinion. It’s less about punditry, bias, spin, and telling others what they should think, and more about allowing an increasingly data literate population to have access to the facts themselves, and to develop their own nuanced opinion on them.
  • Global Data Standards:
    As data continues to proliferate, it will be important to codify and unify it when possible. This will lead to global standards that will make communicating it even easier.

Early Pioneers of Data Media

The Data Media ecosystem is just beginning to emerge, but here are some early pioneers we like:

  • Our World in Data:
    Led by economist Max Roser, OWiD is doing an excellent job amalgamating global economic data in one place, and making it easy for others to remix and communicate those insights effectively.
  • USAFacts:
    USAFacts was founded by Steve Ballmer of Microsoft fame as a non-partisan source of U.S. government data.
  • FRED:
    This tool by the Federal Reserve Bank of St. Louis is just one example of many tools that have cropped up over the years to democratize data that were previously proprietary or hard to access. Other similar tools have been created by the IMF, World Bank, and so on.
  • FiveThirtyEight:
    FiveThirtyEight uses statistical analysis, data journalism, and predictions to cover politics, sports, and other topics in a unique way.
  • FlowingData:
    At FlowingData, data viz expert Nathan Yau explores a wide variety of data and visualization themes.
  • Data Journalists:
    There are incredible data journalists at publications like The Economist, The Washington Post, The New York Times, and Reuters who are tapping into the early beginnings of what is possible. Many of these publications also made their COVID-19 work freely available during the pandemic, which is certainly commendable.

Growth in data journalism and the emergence of these pioneers help give you a sense of the beginnings of Data Media, but we believe they are only scratching the surface of what is possible.

What Data Media is Not

In a sense, it’s easier to define what Data Media isn’t.

Data Media is not partisan pundits arguing over each other on a newscast, and it’s not fake news, misinformation, or clickbait that is engineered to drive easy clicks. Data Media is not an echo chamber that only reinforces existing biases. Because data is also less subjective, it’s less likely to be censored in the way we see today.

Data is not perfect, but it can help change the conversations we are having as a society to be more constructive and inclusive. We hope you agree!

33 Problems With Media in One Chart

In this infographic, we catalog 33 problems with the social and mass media ecosystem.

One of the hallmarks of democratic society is a healthy, free-flowing media ecosystem.

In times past, that media ecosystem would include various mass media outlets, from newspapers to cable TV networks. Today, the internet and social media platforms have greatly expanded the scope and reach of communication within society.

Of course, journalism plays a key role within that ecosystem. High-quality journalism and the unprecedented transparency of social media keep power structures in check, and sometimes these forces can drive genuine societal change. Reporters bring us news from the front lines of conflict, and uncover hard truths through investigative journalism.

That said, these positive impacts are sometimes overshadowed by harmful practices and negative externalities occurring in the media ecosystem.

The graphic above is an attempt to catalog problems within the media ecosystem as a basis for discussion. Many of the problems are easy to understand once they’re identified. However, in some cases, there is an interplay between these issues that is worth digging into. Below are a few of those instances.

Editor’s note: For a full list of sources, please go to the end of this article. If we missed a problem, let us know!

Explicit Bias vs. Implicit Bias

Broadly speaking, bias in media breaks down into two types: explicit and implicit.

Publishers with explicit biases will overtly dictate the types of stories that are covered in their publications and control the framing of those stories. They usually have a political or ideological leaning, and these outlets will use narrative fallacies or false balance in an effort to push their own agenda.

Unintentional filtering or skewing of information is referred to as implicit bias, and this can manifest in a few different ways. For example, a publication may turn a blind eye to a topic or issue because it would paint an advertiser in a bad light. These are called no-fly zones, and given the financial struggles of the news industry, these no-fly zones are becoming increasingly treacherous territory.

Misinformation vs. Disinformation

Both of these terms imply that information being shared is not factually sound. The key difference is that misinformation is unintentional, and disinformation is deliberately created to deceive people.

Fake news stories, and concepts like deepfakes, fall into the latter category. We broke down the entire spectrum of fake news, and how to spot it, in a previous infographic.

Simplify, Simplify

Mass media and social feeds are the ultimate Darwinian scenario for ideas.

Through social media, stories are shared widely by many participants, and the most compelling framing usually wins out. More often than not, it’s the pithy, provocative posts that spread the furthest. This process strips context away from an idea, potentially warping its meaning.

Video clips shared on social platforms are a prime example of context stripping in action. An (often shocking) event occurs, and it generates a massive amount of discussion despite the complete lack of context.

This unintentionally encourages viewers to stereotype the people in the video and bring their own preconceived ideas to the table to help fill in the gaps.

Members of the media are also looking for punchy story angles to capture attention and prove the point they’re making in an article. This can lead to cherry-picking facts and ideas. Cherry-picking is especially problematic because the facts are often correct, so they make sense at face value; however, they lack important context.

Simplified models of the world make for compelling narratives, like good-vs-evil, but situations are often far more complex than what meets the eye.

The News Media Squeeze

It’s no secret that journalism is facing lean times. Newsrooms are operating with much smaller teams and budgets, and one result is ‘churnalism’. This term refers to the practice of publishing articles directly from wire services and public relations releases.

Churnalism not only replaces more rigorous forms of reporting but also acts as an avenue for advertising and propaganda that is harder to distinguish from the news.

The increased sense of urgency to drive revenue is causing other problems as well. High-quality content is increasingly being hidden behind paywalls.

The end result is a two-tiered system, with subscribers receiving thoughtful, high-quality news, and everyone else accessing shallow or sensationalized content. That “everyone else” isn’t just people with lower incomes; it also largely includes younger people. The average age of today’s paid news subscriber is 50 years old, raising questions about the future of the subscription business model.

For outlets that rely on advertising, desperate times have called for desperate measures. User experience has taken a backseat to ad impressions, with ad clutter (e.g. auto-play videos, pop-ups, and prompts) interrupting content at every turn. Meanwhile, in the background, third-party trackers are still watching your every digital move, despite all the privacy opt-in prompts.

How Can We Fix the Problems with Media?

With great influence comes great responsibility. There is no easy fix to the issues that plague news and social media. But the first step is identifying these issues, and talking about them.

The more media literate we collectively become, the better equipped we will be to reform these broken systems, and push for accuracy and transparency in the communication channels that bind society together.

Sources and further reading:

Veils of Distortion: How the News Media Warps our Minds by John Zada
Hate Inc. by Matt Taibbi
Manufacturing Consent by Edward S. Herman and Noam Chomsky
The Truth Matters: A Citizen’s Guide to Separating Facts from Lies and Stopping Fake News in its Tracks by Bruce Bartlett
Active Measures: The Secret History of Disinformation and Political Warfare by Thomas Rid
The Twittering Machine by Richard Seymour
After the Fact by Nathan Bomey
Ten Arguments for Deleting Your Social Media Accounts Right Now by Jaron Lanier
Zucked by Roger McNamee
Antisocial: Online Extremists, Techno-Utopians, and the Hijacking of the American Conversation by Andrew Marantz
Social media is broken by Sara Brown
The U.S. Media’s Problems Are Much Bigger than Fake News and Filter Bubbles by Bharat N. Anand
What’s Wrong With the News? by FAIR
Is the Media Doomed? by Politico
The Implied Truth Effect by Gordon Pennycook, Adam Bear, Evan T. Collins, David G. Rand

 
