Dealing with 404s in your Kentico Event Log
You have too many 404 “Page Not Found” warnings in your Kentico event log. It’s impossible to review them easily without exporting to spreadsheets and doing some fancy filtering, and even then you’re faced with hundreds or thousands of different combinations of URLs, IPs, User Agents, referrers, and it’s all too hard. It may have even got to the point where there are so many, you’re considering just turning off the logging of 404s completely (admit it).
Best practice is to log all 404s so you can act on the ones that point to real problems, such as broken links or missing images on your site. But by extension, it’s also best practice to minimise the number of 404s being logged on your site (ideally to zero). So we are left with the paradoxical guidelines:
- You should log all 404s
- You should not have 404s in your log
Why is it an issue?
What’s the big deal? Why can’t we just have a simple 404 page that doesn’t tax the server too much, then log the thousands of 404s we’re getting and filter our event log reports to only show the ones we’re interested in?
Anatomy of a 404
Every time a request is made for a URL that doesn’t exist on a Kentico website, Kentico does more than just log the request and return a 404 page. It tries really hard to find exactly what the visitor is looking for! On a single request, Kentico might have to:
- Check to see if a page exists in the database with that URL alias
- Check to see if any other pages in the database have this URL configured as an alias
- Check for matching files in that location on disk on the web server
- Log a “warning” in the Event Log in the database
- Render the 404 “page not found” page, just like any other page
So you can see a 404 response is likely to be quite a bit more performance-hungry than a regular page load, even in a well-configured site with proper caching set up. Kentico solution architects advise that a 404 response results in at least five database queries, or many more if your 404 page itself requires database access (which it commonly does).
On a high traffic site, your Event Log can also fill up very quickly if it’s logging thousands of 404s. This can create performance issues if it gets very large. If you have limited the size of the Event Log using Kentico settings, you might also find that your log is full of 404s and only spans a couple of days, which renders the log far less useful to you when an actual error occurs.
It’s not all doom and gloom - 404s do exist for a reason! Sometimes shared links will be accidentally concatenated, typos will be typed, and very commonly old pages will still be indexed in search engines when they don’t actually exist any more on a website. In these situations, a legitimate site user could very well end up trying to load the page, and they need an appropriate, helpful response. By the same token, a search engine or other robot might really appreciate receiving a 404 “page not found” status code from you, so they can dutifully update their index to remove that page.
You should also be very interested in these “good 404s” being logged, because you can often take action to reduce how many of them occur, for example by adding an alias for a page that no longer exists, so that people are automatically redirected to the new version of the page.
Many 404s aren’t actually generated by real site visitors. Bots, crawlers and script kiddies abound out there on the Internet. Any systems administrator will be familiar with seeing pages of server logs filled with seemingly random attempts by automated tools to probe their sites for vulnerabilities. A prolific example is scripts scanning websites all over the web, testing for the login pages of popular systems such as WordPress, in an attempt to then find vulnerabilities in those sites (and hack in!).
These spurious or malicious requests often occur in waves, and in much greater quantities than those generated by human visitors. It’s not uncommon to suddenly see hundreds or thousands of requests come in to a high-traffic website, with seemingly no sensible trigger or origin.
For these “bad 404s”, it is not important what the 404 page itself looks like, because no one is actually looking at it! We should be more interested in minimising the impact these requests can have on our site, and focusing on the traffic that we care about. In fact, we may actually want to consider returning some other response than a 404, in an attempt to send a message to any bots or scripts that they are not welcome here.
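As a rough illustration of telling the two kinds apart, here is a minimal Python sketch that sorts a list of 404 URLs (for example, copied out of an Event Log export) into “worth reviewing” and “probably bots”. The patterns and the function names `is_probably_bad` and `triage` are my own assumptions for the example, not anything built into Kentico, and you would want to tune the patterns to your own site.

```python
import re
from collections import Counter

# Hypothetical probe patterns: URLs a site that does not run WordPress or PHP
# would never legitimately serve. Adjust these to suit your own site.
BAD_URL_PATTERNS = [
    re.compile(r"wp-(login|admin|content|includes)", re.IGNORECASE),
    re.compile(r"\.php(\?|$)", re.IGNORECASE),
    re.compile(r"/(phpmyadmin|\.env|\.git)", re.IGNORECASE),
]


def is_probably_bad(url: str) -> bool:
    """Return True if the URL matches any of the probe patterns."""
    return any(pattern.search(url) for pattern in BAD_URL_PATTERNS)


def triage(urls):
    """Split 404 URLs into (good, bad) Counters keyed by URL."""
    good, bad = Counter(), Counter()
    for url in urls:
        bucket = bad if is_probably_bad(url) else good
        bucket[url] += 1
    return good, bad


if __name__ == "__main__":
    sample = [
        "/wp-login.php",
        "/wp-login.php",
        "/about-us/old-team-page",
        "/phpMyAdmin/index.php",
    ]
    good, bad = triage(sample)
    print("Worth reviewing:", good.most_common())
    print("Probably bots:", bad.most_common())
```

Counting by URL rather than just listing them makes the waves of repeated probes stand out from the one-off typos and stale links.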
There are a bunch of techniques you can use to reduce the number of 404s in your Event Log. Each one has its own strengths and weaknesses, and is more suitable for a particular situation. It’s quite likely that you’ll need to implement two or more strategies to get a really nice, clean Event Log with only entries you care about.
URL aliases are a great solution for the “good 404s” - real visitors reaching the wrong URL for some reason. Kentico has in-built support for adding multiple URL aliases for a page, and having those alternate URLs automatically redirect the user to the correct address, using a best-practice “301 Moved Permanently” response.
To set up:
- Ensure “Use 301 Redirects” and “Redirect Page to Primary URL” settings are enabled
- Go to the page that does exist, and add URL Aliases for those other URLs you want to capture (that are triggering 404s)
Pros:
- Any editor can do it
- Follows best practices (e.g. Google’s for old to new URL redirection)
Cons:
- Just as performance-hungry as the original 404
- Very difficult to manage for hundreds or thousands of URLs
- Not suitable for handling bad 404s
301 Redirection Modules
There are a number of bulk 301 redirection modules available for Kentico in the Marketplace. These are most commonly used when launching a new site with a different sitemap to the old one. You would set up a whole bunch of redirections from the old pages to corresponding new pages, and let this module take care of them. Similarly, you could set up a large number of redirections from URLs that you drew from your 404 logs, and map them to existing pages in your site.
Pros:
- Allows bulk management of good 404s
- Follows best practices for 301 redirection
Cons:
- Similar performance cost to regular 404 responses
- Not suitable for handling bad 404s
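If you are mapping a long list of 404 URLs to existing pages, a first pass can be automated. The sketch below is only one possible approach, assuming you have exported the missing URLs and the live page URLs as plain lists: it uses Python’s standard `difflib.get_close_matches` to suggest the closest live URL for each missing one. The function name `suggest_redirects` and the sample URLs are hypothetical, and every suggestion still needs a human check before it goes into a redirection module.

```python
from difflib import get_close_matches


def suggest_redirects(missing_urls, live_urls, cutoff=0.6):
    """Map each missing URL to its closest live URL, or None if nothing is close.

    cutoff is the minimum similarity (0 to 1) a live URL must reach to be
    suggested; raise it to get fewer, safer suggestions.
    """
    suggestions = {}
    for url in missing_urls:
        matches = get_close_matches(url, live_urls, n=1, cutoff=cutoff)
        suggestions[url] = matches[0] if matches else None
    return suggestions


if __name__ == "__main__":
    live = ["/about-us/our-team", "/contact", "/products"]
    missing = ["/about-us/our-teem", "/wp-login.php"]
    for old, new in suggest_redirects(missing, live).items():
        print(old, "->", new if new else "(no suggestion, probably a bad 404)")
```

URLs that get no suggestion at all are often the bad 404s, which is a handy side effect when you are sorting through a long export.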
HttpRedirect elements in the web.config file
This is another solution to the “old site” URL redirection scenario. Any ASP.NET site hosted on IIS can have redirects configured directly in its web.config file.
For bad 404s, where the actual response is not as important and we just don’t want to spend effort processing the request, it is also possible to just redirect problem URLs to an empty “black hole” HTML file. This sounds a little dodgy... and it possibly is. But it has the desired effect!
Pros:
- Super easy, just list the redirects in web.config
- Nothing gets logged in the Kentico Event Log, and no database queries are executed, so it’s excellent for performance
Cons:
- Not best practice for 404s, as the empty HTML page returns an empty 200 OK response
- Changes require access to the web server’s files, which can be dangerous
- Changes will restart your application, and your site will be unresponsive while it restarts
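Typing out dozens of these entries by hand gets tedious, so here is a small Python sketch (Python 3.9 or later) that generates them from a dictionary of old path to new URL. It assumes the IIS HTTP Redirection feature is installed and uses the standard `httpRedirect` element inside a `location` block; the function name `build_redirect_locations`, the sample paths and the `/blackhole.html` file are made up for the example. The output is meant to be reviewed and merged into your existing web.config, not used as a replacement for it.

```python
import xml.etree.ElementTree as ET


def build_redirect_locations(redirects):
    """Return a <configuration> element with one <location> block per redirect.

    redirects maps an old site-relative path to the URL it should go to.
    """
    config = ET.Element("configuration")
    for old_path, new_url in redirects.items():
        # The location path is relative to the site root, without a leading slash.
        location = ET.SubElement(config, "location", path=old_path.lstrip("/"))
        web_server = ET.SubElement(location, "system.webServer")
        ET.SubElement(
            web_server,
            "httpRedirect",
            enabled="true",
            destination=new_url,
            exactDestination="true",  # do not append the requested path
            httpResponseStatus="Permanent",  # 301 Moved Permanently
        )
    return config


if __name__ == "__main__":
    config = build_redirect_locations({
        "/old-page.aspx": "/new-page",
        "/wp-login.php": "/blackhole.html",  # hypothetical empty file
    })
    ET.indent(config)  # pretty-print; requires Python 3.9+
    print(ET.tostring(config, encoding="unicode"))
```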
Large enterprise sites will sometimes have a Web Application Firewall (WAF) installed between the Kentico web server and the wider web. Some of these WAFs have very powerful features that can do everything from Request Filtering based on IP ranges, to switching and even virus scanning. Some can even keep themselves up to date with known lists of bad IPs, user agents and URL patterns, and filter them out before they even reach the Kentico website.
Pros:
- Very robust and high-performance
- Excellent solution for bad 404s
Cons:
- Difficult, requires advanced (expensive) hosting config
- Not manageable by Kentico users
- Not applicable for managing good 404s
Don’t log 404s in Event Log, create your own report
I know this goes against one of the guidelines listed at the top of this post, but not without reason! It’s a hot tip from fellow Kentico MVP Brian McKeiver. It elegantly solves the issue of your Event Log filling up with 404s, by not logging them in the Event Log! Under this scenario, you still log all visits using Web Analytics, and use the Kentico Reporting module to keep track of your 404s instead of using the Event Log.
Pros:
- Keeps your Event Log clear of all 404s so you can concentrate on other errors
- Makes reporting and reviewing your 404 logs very easy as you can simply subscribe to a report and get it emailed
Cons:
- Doesn’t solve the performance problems associated with mass 404s
- Still has potential for bad 404s to overwhelm your good ones in reports
- Only applicable if you have Web Analytics enabled within Kentico
My Request Filtering module
My personal favourite solution for handling bad 404s is the Request Filtering module that I developed myself to run within the Kentico admin interface. It aims to bring some of the power of WAF-style Request Filtering to Kentico users without the need for expensive hardware, and allows users to configure rules without restarting their application or affecting performance.
Read the full release post on my module here:
Kentico Request Filtering Module
Pros:
- Excellent performance - Kentico not even involved when requests are intercepted
- Returns an empty 403 response, in line with best practices for rejected URLs (rather than redirected)
- Supports more than just exact URLs, e.g. regular expressions, user agent strings, HTTP referrers and IP addresses
- Changes do not require an application restart - can tweak real-time on live site
- End result: Kentico event log still has good 404s in it, with bad 404s blocked at the front gate and not even logged
Cons:
- Not suitable for managing good 404s - should be used in conjunction with another technique
- Potentially dangerously powerful - a badly-designed rule could break your site
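To give a feel for how rule-based request filtering works in general, here is a small Python sketch. To be clear, this is not the module’s code (the module runs inside Kentico), and the `Request`, `Rule` and `filter_request` names are invented for the illustration. It simply shows the idea of matching a request against URL, user agent, referrer and IP rules, and answering with a 403 before the CMS ever sees it.

```python
import ipaddress
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    url: str
    user_agent: str = ""
    referrer: str = ""
    ip: str = "0.0.0.0"


@dataclass
class Rule:
    """A hypothetical filter rule; any condition left as None is ignored."""
    url_pattern: Optional[str] = None         # regular expression
    user_agent_pattern: Optional[str] = None  # regular expression
    referrer_pattern: Optional[str] = None    # regular expression
    network: Optional[str] = None             # CIDR range, e.g. "203.0.113.0/24"

    def matches(self, request: Request) -> bool:
        checks = []
        for pattern, value in (
            (self.url_pattern, request.url),
            (self.user_agent_pattern, request.user_agent),
            (self.referrer_pattern, request.referrer),
        ):
            if pattern is not None:
                checks.append(re.search(pattern, value, re.IGNORECASE) is not None)
        if self.network is not None:
            address = ipaddress.ip_address(request.ip)
            checks.append(address in ipaddress.ip_network(self.network))
        # Every configured condition must hold; an empty rule matches nothing.
        return bool(checks) and all(checks)


def filter_request(request: Request, rules) -> Optional[int]:
    """Return 403 if any rule matches, or None to pass the request through."""
    return 403 if any(rule.matches(request) for rule in rules) else None


if __name__ == "__main__":
    rules = [
        Rule(url_pattern=r"wp-login\.php"),
        Rule(network="203.0.113.0/24"),
    ]
    print(filter_request(Request(url="/wp-login.php"), rules))
    print(filter_request(Request(url="/about-us", ip="198.51.100.1"), rules))
```

The “empty rule matches nothing” check is a small guard against the last con in the list above: a rule with no conditions would otherwise block every request on the site.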
Start the Discussion
Do you battle overwhelming 404s on your sites? Are you using one of the above methods, or something else? Let me know in the comments!