In 2023, my team and I began work on perhaps one of the most challenging content audits ever conducted on the HubSpot blog. We have carried out content audits in the past – but not in this form.
We conducted the audit in three phases:
- Phase 1 dealt with our oldest content.
- Phase 2 rated our worst-performing content.
- Phase 3 assessed the value of our topic clusters.
Ultimately, we reviewed over 10,000 blog post URLs and over 450 topic clusters.
In this post I will focus on the first phase of our audit. I’ll walk you through how we reviewed and took action on our oldest content. I will also share the results we found.
But first let me provide some background on why we decided to conduct an audit of this magnitude.
Why we ran the audit
It all started in early 2023. Back then, my team was called the “Historical Optimization Team” and we sat at the intersection of HubSpot’s SEO and blogging teams.
We were responsible for updating and optimizing our existing blog posts and finding growth opportunities in our library. (We have since evolved into what is now the EN Blog Strategy team.)
If you’re new here, the HubSpot blog is HUGE.
For context, the blog had 13,822 pages in February 2023, the month we began our audit. Mateusz Makosiewicz of Ahrefs even declared it the “biggest corporate blog… of all time” in an SEO case study earlier this year.
Although we are fortunate to have high domain authority and generate millions of visits each month, a blog of this size is not without its challenges.
As our library ages, the opportunity for new content in our blog properties and clusters shrinks.
That’s why we decided to audit our library to find optimization opportunities.
We hypothesized that by examining the oldest 4,000 URLs in our library, we could uncover “greenspace” and “quasi-greenspace”: topics we’ve covered but haven’t capitalized on as well as we could.
Although this only made up about a third of our content library, we believed we could uncover some traffic opportunities and give our blog a boost.
Around the same time, we began to feel the impact of Google’s March 2023 Core Update, which emphasized experience, and our technical SEO team immediately began addressing it.
However, another part of this algorithm update emphasized the freshness and usefulness of the content. In other words, how current and useful our content is to our readers.
We really felt a sense of urgency here.
With over 4,000 URLs published between 2006 and 2015, we already knew that this slice of our content was neither current nor helpful.
So we got to work and spent ten weeks reviewing these blog posts.
Later, we added phases two and three to our plan so we could also address unhelpful content and clusters.
How we reviewed our oldest content
1. Define our goals.
Before we started reviewing the content, it was important for us to set the goals.
For some publishers, the goal of a content audit may be to improve on-page SEO, increase user engagement, align content with marketing goals, or identify content gaps.
For this particular audit, our goals were to uncover “greenspace” and “quasi-greenspace” in our blog library and to improve the overall freshness of our content.
We also had to determine the scope of our audit.
There is no right or wrong approach. Depending on your goals and the size of your website, you can audit the whole site at once.
You can also start with a small part of your website (e.g. product pages or specific topic clusters) and build from there.
Because HubSpot has such a large content library, we decided to limit this review to our oldest 4,000 URLs. Not only was this easier to manage than checking all of our content in one audit, but it also targeted URLs that were more likely to benefit from an update or cleanup.
We also knew that we would later deal with the rest of our library in phases two and three.
2. Collect our content inventory.
After determining our goals and scope, we needed to collect the oldest 4,000 blog posts and put them into a spreadsheet.
This process may vary depending on the tools and CMS you use. Here’s how we did it in Content Hub:
1. Sign in to HubSpot and navigate to the Blog page in Content Hub.
2. Open the Actions dropdown menu and click Export blog posts.
3. Choose a file format and click Export. This sends all of your blog post data to your email address. You’ll also receive a notification in HubSpot when your export is ready.
4. Download your export and open it in your favorite spreadsheet software (I’m usually a Google Sheets girl, but I had to use Microsoft Excel since the file was so large).
5. Review each column in the sheet and delete those that are not relevant to your audit. We immediately deleted the following:
- Post SEO titles
- Meta description
- Last modified date
- Post text
- Featured image URL
- Head HTML
- Archived
6. After removing the irrelevant columns, what was left was:
- Blog name
- Post title
- Tags
- Post language
- Post URL
- Author
- Publish date
- Status
7. Filter the Post language column for EN posts only. Once the sheet is filtered, delete the column.
8. Filter the Status column for PUBLISHED only. Once the sheet is filtered, delete the column.
9. Sort the sheet by Publish date from oldest to newest.
10. Select the first 4,000 rows, then copy and paste them into a separate sheet.
11. Name the new sheet Content Audit Master.
If you prefer, you can also create a custom report in Content Hub and select only the fields you want to include in the audit, so you don’t have to filter as much when setting up your sheet.
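If your export is too large for a spreadsheet to handle comfortably, the same filter-and-sort steps can be scripted. The following is only a minimal sketch, not the process we used: the column names, the ISO date format, and the sample rows are all assumptions, so check them against your own export.

```python
import csv
import io
from datetime import date

# Hypothetical export sample; real column names and date formats may differ.
SAMPLE_EXPORT = """Post URL,Post language,Status,Publish date
https://example.com/a,en,PUBLISHED,2009-05-01
https://example.com/b,de,PUBLISHED,2007-01-15
https://example.com/c,en,DRAFT,2006-03-10
https://example.com/d,en,PUBLISHED,2006-11-30
https://example.com/e,en,PUBLISHED,2015-08-20
"""

def oldest_published_en(rows, limit):
    """Keep published English posts, sort oldest first, return the first `limit`."""
    kept = [
        row for row in rows
        if row["Post language"].lower() == "en"
        and row["Status"].upper() == "PUBLISHED"
    ]
    kept.sort(key=lambda row: date.fromisoformat(row["Publish date"]))
    return kept[:limit]

rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
audit_master = oldest_published_en(rows, limit=2)  # we used 4,000 in our audit
for row in audit_master:
    print(row["Publish date"], row["Post URL"])
```

For a real export, you would replace the sample string with the downloaded file and raise `limit` to the size of your audit.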
3. Retrieve the data.
After gathering all the content required for our audit, we needed to collect relevant data for each blog post.
For this audit, we kept it pretty simple and only analyzed total organic traffic from the previous calendar year, total backlinks, and total keywords.
We did this because our recommended actions for each URL were determined during post-evaluation. (We’ll cover this in the next step.)
We pulled organic traffic data from Google Search Console and used a VLOOKUP to match each URL with its corresponding number of clicks.
Then we obtained backlink and keyword data by pasting our audit URLs into Ahrefs’ Batch Analysis tool and exporting the data to our spreadsheet.
At the time of our audit, the Batch Analysis tool could only analyze up to 200 URLs at a time, so we had to repeat this step 20 times until we had data for every URL.
Luckily, Ahrefs has since released a Batch Analysis 2.0 tool that can analyze up to 1,000 URLs at once. So if we were to perform a similar audit in the future, retrieving this data would take much less time.
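To make the batching math concrete, here is a small sketch of the two mechanical parts of this step: splitting the URL list into tool-sized batches, and matching clicks to URLs the way a VLOOKUP would. The URLs and click counts are made up, and defaulting unmatched URLs to 0 clicks is my own assumption (a VLOOKUP would return an error instead).

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def attach_clicks(urls, clicks_by_url):
    """Dictionary lookup standing in for VLOOKUP; unmatched URLs get 0 clicks."""
    return {url: clicks_by_url.get(url, 0) for url in urls}

# Placeholder URLs standing in for the 4,000 audit URLs.
urls = [f"https://example.com/post-{i}" for i in range(4000)]

batches_of_200 = list(chunked(urls, 200))    # the limit we worked with
batches_of_1000 = list(chunked(urls, 1000))  # the newer, larger limit
print(len(batches_of_200), len(batches_of_1000))

# Hypothetical click data for illustration only.
sample_clicks = {"https://example.com/post-0": 120, "https://example.com/post-2": 7}
clicks = attach_clicks(urls[:3], sample_clicks)
print(clicks)
```

With 4,000 URLs, the 200-URL limit works out to 20 batches, while a 1,000-URL limit needs only 4.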
4. Evaluate the content.
Next, we rated each piece of content based on the data we collected. We then evaluated the post itself to determine the following:
- Type of content
- Degree of freshness
- Organic potential
Type of content
There are many different types of blog posts on the HubSpot blog, each serving a unique purpose. Labeling each post helped us determine its relevance and became a key factor in our decision to update or prune it.
While this is not a complete list of all the content types you could find on the HubSpot blog, for the purposes of this review we have narrowed it down to the following:
- Educational: A topic that can educate the user about a pain point or problem they know they have.
- Thought leadership: A topic that can educate the user about a pain point or issue they didn’t know existed until an expert brought it to their attention.
- Business update: A HubSpot-related piece of news or a press release that is likely not evergreen.
- Newsjacking: An industry-related piece of news or a press release that is likely not evergreen.
- Research: A collection of data or results from an experiment intended to educate the reader. The topic may or may not be evergreen, but the content is not, so it needs to be updated to stay current.
Degree of freshness
Because the posts in this review have not been updated for a long time, none of them could be considered 100% “fresh.” However, we considered different types of freshness when determining the actions required for the URLs.
For example, some topics, like Google+, are so outdated that an update would be silly. However, many topics were still relevant, even if our content was not.
The following scale helped us make decisions about whether the URL had value in terms of freshness:
- Outdated: The topic is out of date, and an update may not be possible.
- Stale: The topic is evergreen, but the content would need a major update to be competitive.
- Relatively fresh: The topic is evergreen, and only a moderate update is needed to make the content competitive.
Organic potential
To determine the organic potential of each URL, we had to ask ourselves the following question: Will anyone search for the content on Google?
- Yes: Someone would search for it, so we should optimize or recycle the content.
- No: No one would search for it. There is no point in optimizing or recycling the content because there is no viable focus keyword.
For all posts marked “Yes” for organic potential, we recommended a focus keyword for the newly optimized content to compete for. To do this, we evaluated the existing title, slug and content. We then conducted keyword research on Ahrefs and checked the Google SERP for that query.
We also included the monthly search volume (MSV) of the focus keyword to help prioritize which updates to make first. To do this, we inserted the recommended keyword into Ahrefs’ Keywords Explorer and added the MSV to our master sheet.
As an added precaution, we also checked every post marked “Yes” for organic potential for signs of keyword cannibalization. There are several ways to do this:
- Do a site search and see if any URLs appear for the focus keyword.
- Paste the focus keyword into Google Search Console to see if any URLs appear.
- Paste the focus keyword into Ahrefs’ Keywords Explorer and scroll down to Position history. Search for your domain name, then filter by Top 20 and your desired time frame (I usually check the last six months). If multiple URLs are found, this may indicate cannibalization.
If the focus keyword was flagged for cannibalization, we either found another focus keyword or determined that the URL should be redirected to the newer post.
If no cannibalization was detected, we had the green light to continue updating the post.
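If you have ranking data in a spreadsheet export, the "multiple URLs for one keyword" check can be sketched in a few lines. This is an illustration only: the `(keyword, url, position)` row shape and the sample data are assumptions, not the format of any particular tool's export.

```python
from collections import defaultdict

def find_cannibalization(rankings, max_position=20):
    """Return keywords for which more than one URL ranks within `max_position`."""
    urls_by_keyword = defaultdict(set)
    for keyword, url, position in rankings:
        if position <= max_position:
            urls_by_keyword[keyword].add(url)
    return {
        keyword: sorted(urls)
        for keyword, urls in urls_by_keyword.items()
        if len(urls) > 1
    }

# Hypothetical ranking rows: (keyword, url, position).
sample_rankings = [
    ("content audit", "https://example.com/old-audit-post", 14),
    ("content audit", "https://example.com/new-audit-guide", 3),
    ("blog seo", "https://example.com/blog-seo", 5),
    ("blog seo", "https://example.com/seo-tips", 48),  # outside the top 20
]

flagged = find_cannibalization(sample_rankings)
print(flagged)
```

A flagged keyword is a prompt for manual review, not proof of cannibalization. Two URLs can rank for the same query while serving different intents.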
5. Recommend an action.
Once a post was fully evaluated, we translated our findings into a recommended action.
Each URL was assigned to one of the following categories:
- Hold: No action is required as both the content and URL are good.
- Optimize: The content is good, but outdated in terms of freshness or SEO practices. Keep the spirit of the article, but update and re-optimize to improve performance.
- Recycle: The content is unsalvageable, but the URL still has value (in terms of backlinks or keyword opportunities). Create new content from scratch but keep the URL the same.
- Prune: Neither the content nor the URL has any value from an organic perspective.
Audit insights
Of the 4,000 URLs we reviewed, 951 (23.78%) were categorized as posts with organic potential and recommended for optimization or recycling. Additionally, 2,888 URLs (about 72.2% of the audit) were recommended for pruning.
These posts either had no organic potential, posed a risk of cannibalization, or were so out of date that there was no point in updating them.
The remaining 161 URLs either required no action or were already redirected.
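As a quick sanity check, the three groups add up to the full 4,000 URLs, and the shares can be recomputed from the counts. The dictionary labels below are just shorthand I chose for this sketch.

```python
# Counts reported in the audit; labels are shorthand for this sketch.
counts = {
    "optimize_or_recycle": 951,
    "prune": 2888,
    "no_action_or_already_redirected": 161,
}

total = sum(counts.values())
shares = {label: count / total * 100 for label, count in counts.items()}

print(total)
for label, share in shares.items():
    print(f"{label}: {share:.1f}%")
```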
How we took action
The action taken on a URL was determined based on its potential for organic traffic.
The URLs with organic potential were submitted to our blogging team and recommended for optimization or recycling.
Meanwhile, the URLs with no organic potential were submitted to our SEO team and recommended for archiving or redirecting.
First, let’s walk through how we took action on the posts that were recommended to be optimized or recycled.
Taking action on content with organic potential
Before diving into any of the 951 posts with organic potential, we needed to figure out the following:
- Our capacity for strategic analysis and brief writing
- The capacity of our internal writing staff and available freelancers
- Our ability to process the updates
We coordinated with stakeholders and determined that in 2024 we only had the bandwidth to update 240 posts (in addition to the dozens of blog posts we update each month). This initiative was known internally as the “De-Age the Blog Project” and was led by my EN Blog Strategy teammate Kimberly Turner.
Once we knew how many posts we could take on, we had to narrow down our priorities. We did this by assessing the level of lift required for each post update:
- Simple update: The content updates required are relatively minimal, making it suitable for freelancers.
- Complex update: The content updates required are extensive, making them more suitable for in-house authors.
- Recycle: The content cannot be saved, but the URL can. Rewrite the post from scratch but keep the URL the same.
- No opportunity: The post is not worth updating.
We originally prioritized updating the simplest URLs first, but we later changed our strategy to target the URLs with the highest MSV potential, regardless of update complexity.
We did this because we wanted to get the most out of our updates.
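In spreadsheet terms, this prioritization is a filter followed by a descending sort on MSV and a cutoff at our capacity. A minimal sketch, assuming each post is a record with hypothetical `url`, `msv`, and `complexity` fields:

```python
def prioritize(posts, capacity):
    """Drop posts with no opportunity, sort by MSV (highest first), keep `capacity`."""
    candidates = [p for p in posts if p["complexity"] != "No opportunity"]
    candidates.sort(key=lambda p: p["msv"], reverse=True)
    return candidates[:capacity]

# Hypothetical posts for illustration only.
sample_posts = [
    {"url": "https://example.com/a", "msv": 900, "complexity": "Simple update"},
    {"url": "https://example.com/b", "msv": 5400, "complexity": "Complex update"},
    {"url": "https://example.com/c", "msv": 12000, "complexity": "No opportunity"},
    {"url": "https://example.com/d", "msv": 2400, "complexity": "Recycle"},
]

queue = prioritize(sample_posts, capacity=2)  # our real capacity was 240 posts
print([post["url"] for post in queue])
```

Note that complexity only acts as a filter here. It no longer affects the order, which mirrors the switch from "simplest first" to "highest MSV first."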
De-Age the Blog results
We originally expected these updates to be completed by the end of the first half of 2024, but we had to change our strategy… again.
Like many other publishers, we have felt the impact of the March 2024 Google Core Update and the introduction of AI Overviews.
After putting the De-Age the Blog project on hold while we addressed those issues, we deprioritized the project entirely in favor of higher-impact workstreams.
SEO, am I right? It always keeps you on your toes.
Even though we had to cancel the project before it was completed, we were still able to complete 76 post updates. Six months after implementing the updates, cumulative monthly traffic to these posts had increased by 458%.
This shows that updating even a small portion of URLs can make a big difference.
Taking action on content without organic potential
While the De-Age the Blog project was underway, we also took action on the 2,888 URLs that were recommended for pruning.
Because the initial audit did not specify how to prune each URL, we had to re-examine every URL to determine how we would prune it.
Here’s how we evaluated the posts:
- Archive (404): The URL has fewer than 10 backlinks and the backlink profile has no value.
- Redirect (301): The URL has more than 10 backlinks and/or the backlink profile is valuable.
How exactly did we determine the value of a backlink profile? Rory Hope, HubSpot’s head of SEO, recommended the following steps:
1. Log in to Ahrefs and enter the URL in the Site Explorer search bar.
2. Select Overview from the left sidebar.
3. Scroll down and click Backlink profile.
4. Continue scrolling down and select By DR under Referring domains.
5. Review all referring domains with a DR above 50.
6. Open each referring domain with a DR above 50 by clicking on the number.
7. Analyze the Referring page.
Select Redirect (301) if:
- The Referring page link comes from a domain that still receives Domain traffic.
Select Archive (404) if:
- The Referring page link appears to be “spam.” You can determine this by asking the following questions:
- Does this website only publish low-quality guest posts (SEO driven) on many different topics?
- Does this website still publish content? If not, ignore it.
- The Referring page comes from a website that links to many EN blog posts via an automated RSS-style link system.
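The archive-or-redirect rule can be expressed as a small function. This is a rough sketch under stated assumptions, not our actual tooling: the `ReferringDomain` fields are hypothetical stand-ins for the manual checks above, and because the rule as written does not cover a URL with exactly 10 backlinks and no valuable domains, the sketch sends that case to manual review.

```python
from dataclasses import dataclass

@dataclass
class ReferringDomain:
    # Hypothetical fields summarizing the manual checks for one domain.
    domain_rating: int
    has_traffic: bool
    looks_spammy: bool

def profile_is_valuable(domains, min_dr=50):
    """A profile counts as valuable if any DR > min_dr domain has traffic and is not spam."""
    return any(
        d.domain_rating > min_dr and d.has_traffic and not d.looks_spammy
        for d in domains
    )

def prune_action(backlinks, domains):
    """Apply the stated rule: redirect on > 10 backlinks and/or a valuable profile."""
    if backlinks > 10 or profile_is_valuable(domains):
        return "Redirect (301)"
    if backlinks < 10:
        return "Archive (404)"
    # Exactly 10 backlinks with no valuable domains is not covered by the rule.
    return "Manual review"

strong = [ReferringDomain(domain_rating=72, has_traffic=True, looks_spammy=False)]
spammy = [ReferringDomain(domain_rating=65, has_traffic=False, looks_spammy=True)]

print(prune_action(3, strong))
print(prune_action(3, spammy))
print(prune_action(25, spammy))
```

In practice, the `looks_spammy` judgment is the hard part and still needs a human to answer the questions listed above.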
Additionally, all URLs labeled “Redirect (301)” required a new URL for the redirect.
When choosing a new URL, we did our best to select the most relevant and similar page. If we couldn’t find one, we redirected to the pillar page of the cluster the post belonged to.
If for some reason the URL didn’t belong to a cluster or didn’t have a pillar page, we redirected it to the HubSpot blog home page.
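That fallback order (most similar page, then pillar page, then blog home page) is a simple chain of checks. A minimal sketch, where the function name, the cluster lookup table, and every URL are hypothetical:

```python
def choose_redirect_target(similar_url, cluster, pillar_pages, blog_home):
    """Pick a redirect target: similar page first, then cluster pillar, then blog home."""
    if similar_url:
        return similar_url
    pillar = pillar_pages.get(cluster) if cluster else None
    if pillar:
        return pillar
    return blog_home

# Hypothetical lookup of cluster name -> pillar page URL.
pillar_pages = {"email marketing": "https://example.com/email-marketing-guide"}
blog_home = "https://example.com/blog"

print(choose_redirect_target("https://example.com/newer-post", "email marketing", pillar_pages, blog_home))
print(choose_redirect_target(None, "email marketing", pillar_pages, blog_home))
print(choose_redirect_target(None, "social media", pillar_pages, blog_home))
print(choose_redirect_target(None, None, pillar_pages, blog_home))
```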
Some content types were easier to decide on than others. For example, we were able to automatically assign 301 redirects to URLs that were flagged for cannibalization during the initial audit. We also automatically assigned 404s to URLs with fewer than 10 backlinks that were labeled Newsjacking or Business update.
Everything else was checked manually to ensure accuracy. To simplify the evaluation process, we followed a decision tree.
It took my team about two and a half weeks to make sure each URL had the correct label. In the end, we assigned 1,675 URLs to be redirected and 1,210 URLs to be unpublished and archived.
After each URL was evaluated, we were finally ready to take action.
After coordinating with Rory and our principal technical SEO strategist, Sylvain Charbit, we decided to prune the URLs in batches rather than all at once. This would allow us to better monitor the impact of redirecting and archiving a large amount of content.
We originally planned to run our prunes in five batches over a five-week period, which would give us time to monitor performance in the weeks in between.
Batches one and two contained URLs intended to be archived and unpublished, and batches three through five contained URLs intended for 301 redirects.
Because so many URLs needed to be unpublished and archived, we worked with HubSpot’s Digital Experience team developers to create a script that would automatically unpublish and archive URLs and redirect them to our 404 page.
Then we were able to implement the 301 redirects with the bulk URL redirect tool in Content Hub.
Note: Although we were able to work through this process internally and complete it before our deadline, I want to acknowledge that manually evaluating over 2,000 URLs can be tedious and time-consuming.
Depending on your resources and the scope of your audit, you may want to hire a freelancer to help your team handle such a large task.
Content cleanup results
Although we have successfully implemented each batch, this process has not been without some obstacles.
In the middle of our cleanup plan, Google released the March 2024 Core Algorithm Update. In the end, we put our cleanup plan on hold so we could better monitor performance during the update.
Once the update was complete, we continued the rest of our cleanup until it was complete.
Due to the volatile search landscape in 2024, we did not see the traffic increases we had hoped for after the prune was complete. However, we did see a big win in the overall freshness of our blog content.
At the start of our 2023 audit, we calculated the freshness of our content library by looking at the publish date of each URL and counting the number of days since its last update.
Let’s say the current date is November 12, 2024 and you have a post that was last updated on February 19, 2008. As of the 2024 date, the 2008 post is 16.7 years or 6,111 days old.
Once we had the total age for every post on the HubSpot blog, we averaged those numbers to find the average age of our content library, which was 2,088 days (5.7 years).
Since pruning 2,888 URLs (and updating hundreds of URLs from this audit and beyond), the average age of the HubSpot blog has dropped to 1,747 days, 341 days younger than when we started.
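If you want to run the same freshness calculation on your own library, it comes down to date subtraction and an average. Here is a minimal sketch using the example dates above. The three-post library is made up, and Python's calendar-day subtraction yields 6,111 days for the example post.

```python
from datetime import date

def age_in_days(last_updated, today):
    """Number of calendar days between the last update and `today`."""
    return (today - last_updated).days

def average_age(last_updated_dates, today):
    """Mean age in days across a list of last-updated dates."""
    ages = [age_in_days(d, today) for d in last_updated_dates]
    return sum(ages) / len(ages)

today = date(2024, 11, 12)
example_age = age_in_days(date(2008, 2, 19), today)
print(example_age, round(example_age / 365.25, 1))

# Hypothetical three-post library, for illustration only.
library = [date(2008, 2, 19), date(2020, 6, 1), date(2024, 1, 15)]
print(round(average_age(library, today)))

# Change in average age reported for the HubSpot blog.
print(2088 - 1747)
```

Archiving or redirecting the oldest post removes it from the list, which is why pruning alone can pull the average down even before any updates ship.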
As content freshness and usefulness play an even larger role in search algorithms, being almost a year younger can make a big difference.
What’s next?
At the beginning of this post, I mentioned that this audit was just one of three my team worked on in 2024.
Our Phase 2 audit focuses on the lowest-performing posts that were not included in Phase 1, totaling over 6,000 URLs. Phase 3 will then evaluate the value of our blog’s topic clusters.
We are still working on the results of these audits, but I am very excited to share the process and findings with you once completed.
Ultimately, content review is a task that never really gets done – especially when working with large libraries. You complete one audit, then move on to the next one.
Although the work can be tedious, improving content quality, user experience and performance is worth the effort.