
US Social Security: Data Availability and Ethical Use



Did you know that Social Security data is used for much more than computing benefits?



Social Security datasets are now needed by more kinds of organizations, and for more purposes, than ever before. Social Security was originally intended only to provide benefits for retired, unemployed and disadvantaged Americans. Over time, however, it has grown to cover more, including health insurance and its trust fund, hence the universality of its use.
Gradually, the Social Security number became the most widely adopted identification number in the United States, used by both government agencies and the private sector. The significance of this 9-digit number carries over to the data connected to it that comes from the Social Security Administration (SSA). That data covers not only Social Security beneficiaries but also their employers, their income levels and much more.
Besides computing benefit amounts for citizens eligible for the social insurance programs, Social Security administrative data is relevant to private businesses, governmental and non-governmental organizations, and research labs that conduct studies for policy evaluation, innovation and development. The datasets provided by the SSA fall into 3 categories: Public, Restricted Public, or Non-Public.

Data Availability.

Despite the turbulent events that the United States has witnessed at both the national and global level over the previous 5 years, the Social Security Administration managed to provide a growing number of datasets. The nature and proportion of these datasets, however, vary over time.

Figure 1: Social Security datasets’ percentages (2015-2020)


Figure 1 shows a noticeable pattern: the share of Public datasets among all SSA datasets rose from 41.01% in 2015 to 71.51% in 2020, while the shares of Non-Public and Restricted Public datasets each fell, proportionally less, to about half of their 2015 values.
Whatever the nature of the events that occurred throughout these 6 fiscal years, they open up space for research studies on a wide range of topics. R&D professionals tend to link Social Security data with their own survey data to conduct their studies, because the scope of variables in the SSA’s public datasets is restricted.
Restricted Public and Non-Public datasets, on the other hand, cover more ground in terms of features, which is why they are rarely available to the public. The previous chart bears this out: their shares were largest in 2015, at 50.73% and 8.26% respectively, out of a total of 10,791 datasets.
The progressive change in these numbers raises the question of what has happened in the US since 2016 that could have driven this change or been affected by it.

Data Use.

Ethical Data Use.

Figure 2: Number of SSA Public Datasets (2016-2020)

The chart shows a recurring pattern in the number of public datasets from 2016 to 2020, specifically in the 4th quarter of every year. It also highlights a few events that alarmed American citizens and that could plausibly explain the observed pattern.

The SSA minimizes data politicization during presidential elections

SSA restricts over 100 datasets prior to Trump’s presidential election in 2016

One month before the 2016 presidential election won by Donald Trump, the number of publicly released datasets dropped from 969 to 811. That is over 100 datasets that were either restricted to a certain number of people or organizations, classified as non-public to minimize the risk of data exploitation and politicization for election polls, or disposed of entirely by the Social Security Administration.
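As a rough illustration of the arithmetic behind this drop, the short Python sketch below computes the absolute and relative change between the two counts quoted above. The function name `dataset_drop` is hypothetical and is not part of any SSA tooling; only the figures 969 and 811 come from the article.

```python
def dataset_drop(before: int, after: int) -> tuple[int, float]:
    """Return the absolute drop and the drop as a percentage of `before`."""
    if before <= 0:
        raise ValueError("before must be a positive count")
    absolute = before - after
    percent = absolute / before * 100
    return absolute, percent


# Counts quoted in the article for the month before the 2016 election.
absolute, percent = dataset_drop(969, 811)
print(f"{absolute} fewer public datasets ({percent:.2f}% drop)")
```

Under these figures the drop is 158 datasets, or roughly 16% of the earlier count, which is consistent with the "over 100" stated above.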

SSA shows a decrease of 52 public datasets prior to Biden’s election in 2020

The drop in the number of public Social Security datasets recurred one month before the 2020 presidential election won by Joe Biden, with a decrease of 52 datasets from September to November 2020. Between the two elections, however, the size of the drop gradually shrank to below 100, especially after the national Cambridge Analytica data scandal in March 2018.
Documents from Cambridge Analytica in London revealed that the firm improperly obtained and used over 87 million Facebook user profiles in a transaction with Donald Trump’s presidential campaign, in which the scraped private Facebook data was used to build voter profiles and assist the candidate in the 2016 US presidential election. This national data privacy crisis proved that data exploitation is not limited to commercial purposes and that it can target American citizens without their consent.

Social Security data contributes to making the workplace safer

152 public datasets were restricted during the “Me Too” Movement in the US

The same phenomenon occurred, with a drop rate of 13.18%, at the beginning of the US “Me Too” movement, from September to October 2017. “Me Too” is a global movement that condemns sexual harassment and gives a voice to its victims.
This movement was led by activists, the victims themselves, and lawyers who offered their support with pro bono cases and needed access to more data about their clients’ workplaces, especially in cases of sexual harassment in a working environment. The sudden decrease could reflect the restriction of some of these 152 public Social Security datasets, either by choice of the SSA or at the request of the victims’ representatives.
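The article gives the size of this drop (152 datasets) and its rate (13.18%) but not the counts before and after. As a back-of-the-envelope check, the Python sketch below infers the starting count implied by those two figures. The helper `implied_start` is hypothetical, and the result is an inference from the quoted numbers, not a value reported by the SSA.

```python
def implied_start(dropped: int, rate_percent: float) -> float:
    """Infer the starting count from an absolute drop and its percentage rate."""
    if rate_percent <= 0:
        raise ValueError("rate_percent must be positive")
    return dropped / (rate_percent / 100)


# Figures quoted in the article for the start of the "Me Too" movement.
start = round(implied_start(152, 13.18))
print(f"Implied count before the drop: about {start}")
print(f"Implied count after the drop: about {start - 152}")
```

Because the rate is rounded to two decimals, the inferred counts are approximate.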

Social Security Administration reacts to data privacy

SSA gives access to 1312 public datasets on Data Privacy Day in January 2019

Subsequently, National Data Privacy Day on January 28th, 2019 was like no other in the US. After the Cambridge Analytica scandal, many institutions, including the Social Security Administration, emphasized its importance and spread awareness about the cause.
The SSA became more dedicated to protecting Personally Identifiable Information (PII) and to securing the public Social Security datasets portal for more ethical use in the future. Accordingly, it publicly released 1312 datasets in January 2019, a record-breaking number since the start of 2016.

Crisis & Social Impact.

Figure 3: Social Security public datasets released in 2020


Social Security data becomes more available during Covid-19 crisis

SSA’s monthly public datasets reach their maximum of 1469 since the pandemic

The year 2020 was a very turbulent one for the world, and in particular for the United States, where many events affected almost every American citizen. The Covid-19 pandemic, which began in January 2020, was a global health crisis that influenced how early employees with chronic diseases decided to retire and claim their Social Security benefits from the SSA. This increased the urgency of access to Social Security data, especially for national health researchers in the USA who were trying to put an end to the “infodemic” during the crisis by preventing the spread of misleading or fake information and news.
Until the 2020 presidential election, the number of public datasets released by the SSA in 2020 increased continuously, from 1430 to a maximum of 1469, in order to satisfy these research needs. Moreover, non-governmental organizations needed Social Security data in 2020 as they advocated for equal health care opportunities, such as access to the first Covid-19 vaccinations, starting in December 2020, for all US Social Security beneficiaries. This will probably result in a new rise at the beginning of 2021.

Social Security data contributes to #BLM Movement

SSA publicly releases 1457 datasets during #BLM movement

Protests for the global Black Lives Matter movement, which was initiated in the USA shortly after the death of George Floyd in May 2020, generated a large number of #BlackLivesMatter petitions and fundraising campaigns across the internet, as well as numerous awareness-raising events about ending racism globally. The movement also took the BLM cause to an international level, which perhaps pushed the SSA to move some datasets from being restricted or non-public to being publicly available to everyone everywhere.
This is reflected in fiscal year 2020 by the exceptionally high number of public datasets released in May 2020, namely 1457. The data was used to highlight the gap between white and Black people’s wages, benefits and much more, with the purpose of raising awareness about the details of racism present in an American’s everyday activities.

TikTok App challenges data protection values of the SSA

SSA releases 1463 public datasets during TikTok US data harvest allegations

The issue of uncertain data privacy was brought up once again with the TikTok US data harvest risks. The main concern of American users was that history with Facebook could repeat itself with TikTok after July 2020, when the number of publicly available Social Security datasets reached an unusual 1463. By the end of 2020, data had become extremely accessible, but the concern was not the quantity of the data but rather its use. Researchers claim that the TikTok app starts collecting all types of data the moment it is installed on the device and uses it to display custom advertisements.
The Social Security Administration works tirelessly to secure American citizens’ private data and to publicly release other data with their consent. But what would be the point if social media giants such as TikTok can collect and share this data with third parties in a matter of seconds due to ambiguous privacy policies and terms of service?


The Social Security Administration played an active role in 2020 in releasing public data to support Research & Development departments in different fields in the United States. These include global health, which is the world’s priority during the Covid-19 crisis, and many other fields. The data that is publicly released on a yearly basis also contributes to recovery from other types of crises, and to raising awareness about data privacy threats from social media or content creation giants and about data transparency in policy making and elections in the US. As for the restrictions the administration places on private datasets, some protect US citizens’ data from political exploitation. Others, however, prevent NGOs from tackling certain topics and keep the data from speaking for the voiceless. The SSA needs to open calls for datasets in which every external organization can make a data usage proposal for certain restricted or non-public datasets added by the administration. How else would Social Security data be as transparent as the SSA claims? How else would the world use the data for positive social change and ethical purposes?



Sarra Hannachi is a Master of Science student in Business Analytics at Tunis Business School, Tunisia and a Data Storytelling Intern at DataEthics4All. She’s passionate about statistics and data science and about organizations that apply their data practices ethically and for the greater good. Sarra is also the leader of a Data Science Club, “DSC TBS”, on her campus for business students looking to enhance their skills in Data Science & AI and to enrich their knowledge of these fields.

