Link rot, also known as link death or reference rot, happens when hyperlinks stop working because the web page or file they pointed to has been moved or is no longer available. A link that no longer works is called a broken, dead, or orphaned link.
The rate at which links stop working is a topic of research because it affects the internet’s ability to keep information accessible. Estimates of this rate vary widely among studies. Experts have cautioned that link rot could cause important historical data to vanish, which could have implications for the legal system and academic research.
When a link is broken, some websites redirect users to the home page instead of showing an error, which can be confusing and makes it hard to find the correct URL.
Prevalence
Several studies have looked at how often link rot happens on the World Wide Web, in academic papers that use URLs as references and in digital libraries.
A 2002 study found that link rot in digital libraries is much slower than on the web. About 3% of the objects in these libraries were inaccessible after one year, which corresponds to a half-life of almost 23 years.
A 2003 study of the web found that approximately one out of every 200 links broke each week, suggesting a half-life of 138 weeks. This rate was supported by a 2016-2017 study of links in the Yahoo! Directory, which had stopped being updated in 2014 after 21 years of development. That study found that the half-life of the directory’s links was two years.
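The half-life figures quoted above follow from the loss rates if links are assumed to disappear at a constant rate per period (simple exponential decay). That assumption, and the function name `half_life`, are illustrative and not taken from the studies themselves. A minimal sketch of the arithmetic:

```python
import math

def half_life(loss_fraction_per_period: float) -> float:
    """Number of periods until half the links are gone.

    Assumes a constant fraction of the surviving links is lost each period.
    """
    if not 0 < loss_fraction_per_period < 1:
        raise ValueError("loss fraction must be strictly between 0 and 1")
    return math.log(0.5) / math.log(1 - loss_fraction_per_period)

# Digital libraries: 3% of objects lost per year.
print(round(half_life(0.03), 1))   # 22.8 (years)

# The web: 1 in 200 links lost per week.
print(round(half_life(1 / 200)))   # 138 (weeks)
```

Real link loss is unlikely to be perfectly exponential, so these numbers are best read as rough summaries rather than predictions.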
Different types of web links can have vastly different lifespans. For example, a link’s expected lifespan can fall above or below the average depending on the file type it points to or on whether it is hosted by an academic institution.
Studies have shown that URLs selected for publication tend to last longer than the average URL. For instance, a 2015 study found that links in the references of articles from major open-access publishers had a half-life of about 14 years. This is consistent with an earlier study that found half of the URLs cited in D-Lib Magazine articles were still active 10 years after publication.
While some studies have found higher rates of link rot in academic literature, they typically suggest a half-life of four years or more. For example, a study in BMC Bioinformatics analyzed links in abstracts from Thomson Reuters’s Web of Science and found that the median lifespan of web pages was 9.3 years, with only 62% being archived.
A 2021 study on external links in New York Times articles published between 1996 and 2019 found a half-life of about 15 years, with significant variation between topics. It also noted that 13% of the links that still worked no longer led to the original content, a phenomenon called content drift.
In a 2013 study, nearly half of the links in U.S. Supreme Court opinions were found to be dead.
A study in 2023 focusing on United States COVID-19 dashboards discovered that 23% of state dashboards available in February 2021 were no longer accessible at their original URLs by April 2023.
According to Pew Research, 38% of web pages that existed in 2013 were no longer available in 2023.
Why Links Stop Working
Link rot can happen for several reasons. The web page that a link is supposed to lead to might be deleted. The server hosting the page might fail, be taken down or move to a new domain name.
As early as 1999, it was recognized that, given the amount of information that can be stored on a single drive, losing data to a hard drive failure could be as catastrophic as the burning of the Library of Alexandria.
Sometimes a domain name’s registration expires or is transferred to someone else.
Some of these issues will cause a link to show an error like “HTTP 404 Not Found.” Others might lead the link to go to content different from what the author intended.
There are several other reasons why links can break:
- Websites may restructure, changing their URLs (for example, moving from domain.net/pine_tree to domain.net/tree/pine).
- Content that used to be free may be put behind a paywall.
- Changes in server architecture may cause server-side code, such as PHP scripts, to behave differently.
- Dynamic content such as search results may change regularly.
- The target page or its content may be deleted.
- Links containing user-specific information like a login name may not work for others.
- Content filters or firewalls may block access to the link.
- Domain name registrations may expire.
Preventing & Detecting
To prevent link rot, you can focus on placing content where it’s more likely to remain accessible, creating links that are less likely to break, preserving existing links, and fixing links that lead to relocated or removed content.
Creating URLs that won’t change over time is key to preventing link rot, a principle advocated by Tim Berners-Lee and other internet pioneers.
Strategies for creating links include:
- Linking to primary sources and stable websites rather than secondary sources.
- Avoiding links to resources on personal pages.
- Using clean URLs or techniques like URL normalization to ensure links remain valid.
- Using permanent identifiers like ARKs, DOIs or PURLs.
- Avoiding linking to non-webpage documents.
- Avoiding deep linking.
- Linking to web archives like the Internet Archive or WebCite to preserve content.
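As an illustration of the URL normalization mentioned in the list above, the following sketch reduces equivalent spellings of a URL to one canonical form using only Python's standard library. The function name `normalize_url` and the particular rules chosen (lowercase scheme and host, drop default ports, drop fragments, use "/" for an empty path) are assumptions for the example and cover only a small part of what full normalization can involve.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports that can be dropped without changing the resource.
DEFAULT_PORTS = {"http": 80, "https": 443}

def normalize_url(url: str) -> str:
    """Return a canonical form of an http(s) URL.

    Limitations of this sketch: user info (user:pass@) is discarded and
    IPv6 host literals are not handled.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it differs from the scheme's default.
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    # An empty path and "/" name the same resource.
    path = parts.path or "/"
    # The fragment is never sent to the server, so it is dropped.
    return urlunsplit((scheme, netloc, path, parts.query, ""))

print(normalize_url("HTTP://Example.COM:80/tree/pine#top"))
# http://example.com/tree/pine
```

Storing and comparing links in a normalized form like this makes it easier to spot that two differently written links point at the same page.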
To protect existing links, you can:
- Use redirection mechanisms like HTTP 301 to automatically send users and search engines to the new location of content.
- Use content management systems that can update links automatically when content is moved or replaced.
- Include search features on error pages (HTTP 404) to help users find what they’re looking for.
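The HTTP 301 approach in the list above can be sketched with Python's built-in `http.server` module. The `MOVED` table, the helper `redirect_target`, and the port are hypothetical; the example paths reuse the pine_tree restructuring mentioned earlier. A production site would normally configure redirects in its web server or content management system instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical table of retired paths and their new locations.
MOVED = {"/pine_tree": "/tree/pine"}

def redirect_target(path: str) -> Optional[str]:
    """Return the new location for a retired path, or None if unknown."""
    return MOVED.get(path)

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # self.path includes any query string; this sketch matches it as-is.
        target = redirect_target(self.path)
        if target is not None:
            # 301 tells browsers and search engines the move is permanent.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
        else:
            self.send_error(404, "Not Found")

def serve(port: int = 8000) -> None:
    """Run the redirecting server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling `serve()` and requesting `/pine_tree` should produce a 301 response whose `Location` header is `/tree/pine`, while unknown paths get a 404.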
Detecting broken links can be done manually or automatically. Automated methods include plugins for content management systems and standalone tools like Xenu’s Link Sleuth. However, automated checks might miss links that return a “soft 404” (an error page served with a success status) or links that return a “200 OK” response but lead to content that has changed.
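A minimal automated check might look like the sketch below. It is not how any particular tool works: the names `classify` and `check`, the phrases in `SOFT_404_HINTS`, and the home-page heuristic are all assumptions chosen for illustration. It flags hard errors, guesses at soft 404s from the page text, and notices deep links that end up on a site's home page. It cannot detect content drift, which requires comparing a page with an earlier copy.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit

# Assumed phrases hinting at an error page served with a 200 status.
SOFT_404_HINTS = ("page not found", "no longer available")

def classify(status: int, requested_url: str, final_url: str, body: str) -> str:
    """Label a fetched link as broken, soft-404, redirected-to-home, or ok."""
    if status >= 400:
        return "broken"
    text = body.lower()
    if any(hint in text for hint in SOFT_404_HINTS):
        return "soft-404"
    # A deep link that lands on the site root was probably redirected
    # to the home page after its target disappeared.
    requested_path = urlsplit(requested_url).path
    final_path = urlsplit(final_url).path
    if requested_path not in ("", "/") and final_path in ("", "/"):
        return "redirected-to-home"
    return "ok"

def check(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL (following redirects) and classify the result."""
    request = urllib.request.Request(url, headers={"User-Agent": "link-check-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Read only the first 64 KiB; enough for the text heuristics.
            body = response.read(65536).decode("utf-8", errors="replace")
            return classify(response.status, url, response.geturl(), body)
    except urllib.error.HTTPError as err:
        return classify(err.code, url, url, "")
    except OSError:
        # DNS failures, refused connections, and timeouts all end up here.
        return "broken"
```

Keeping `classify` separate from the network call makes the heuristics easy to adjust and to test without fetching anything.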
“Link death” refers to hyperlinks that no longer lead to their intended destination, often because the target page has been removed or its URL has changed.