In times of JavaScript crawling and indexing, it is crucial to understand how Google deals with the two HTML versions of each page of a website that uses JavaScript: the pre-DOM and the post-DOM HTML. Out of curiosity, we set up a little test to find out which of the two versions Google uses to interpret hreflang annotations implemented in the head of the HTML document.
If you are not familiar with the basics of JavaScript SEO, or if pre-DOM and post-DOM HTML don’t mean anything to you yet, we highly recommend Justin Briggs’ article “The core principles of SEO for JavaScript”, a piece that everybody interested in the more technical side of SEO should read and digest.
The idea for this little test emerged while we were preparing a different experiment that we are going to publish soon: We had been wondering if it’s possible to implement hreflang with Google Tag Manager (Spoiler: the answer is yes!).
In order to test this on our own website, searchVIU.com, we first of all had to remove the hreflang annotations that were already in place. Due to our technical setup (hreflang had been automatically added by the great multilingual WordPress plugin Polylang), we did not find a quick and easy way to remove hreflang from our website. So we decided to just use Google Tag Manager to remove the hreflang annotations from our website before adding them again (also with Google Tag Manager).
Removing hreflang from the website with Google Tag Manager
The configuration in Google Tag Manager for removing all hreflang tags from the HTML code of a page is extremely simple:
The script you see in the screenshot removes all link elements that have hreflang attributes from the HTML source code (hat tip to Sam Nemzer for sharing the scripts we used as inspiration for this in his masterpiece “How to Implement SEO Changes Using Google Tag Manager”).
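For readers who cannot see the screenshot, the removal logic can be sketched roughly as follows. This is not our exact script, only a minimal illustration of the idea. The function name removeHreflangLinks and the two small interfaces are hypothetical and exist only so the sketch does not depend on browser typings.

```typescript
// Minimal structural types (hypothetical) so the sketch compiles without DOM typings.
interface RemovableNode {
  remove(): void;
}
interface QueryableDocument {
  querySelectorAll(selector: string): ArrayLike<RemovableNode>;
}

// Removes every <link> element that carries an hreflang attribute from the
// live DOM and returns how many elements were removed.
// The raw HTML delivered by the server is not touched by this.
function removeHreflangLinks(doc: QueryableDocument): number {
  const found = doc.querySelectorAll("link[hreflang]");

  // Copy the matches first so that removing nodes cannot disturb the iteration.
  const links: RemovableNode[] = [];
  for (let i = 0; i < found.length; i++) {
    links.push(found[i]);
  }

  for (let i = 0; i < links.length; i++) {
    links[i].remove();
  }
  return links.length;
}
```

In a browser, the call would be removeHreflangLinks(document). To run it from Google Tag Manager, the code would have to be compiled to plain JavaScript and placed in a script element inside a Custom HTML tag. That wiring is an assumption here and is not shown.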
This implementation with Google Tag Manager results in a conflict between the pre- and the post-DOM HTML: The hreflang annotations are present in the HTML source document (pre-DOM), but absent in the rendered HTML (post-DOM). Let’s break this down for those of you that are less familiar with these concepts:
You can access the pre-DOM HTML code of a page by right-clicking in your browser and selecting “View source code” (or the equivalent term your browser uses). On searchVIU.com, while this test is still active (maybe not anymore right now, depending on when you are reading this), the hreflang annotations are visible in the pre-DOM HTML source document of each page:
In the rendered HTML version (the post-DOM HTML), which you can access by right-clicking in your browser and selecting “Inspect” (or the equivalent term your browser uses), the hreflang annotations are absent, as they have been removed with JavaScript via Google Tag Manager. In this screenshot you see the same area of the HTML code, but there are no hreflang annotations to be seen:
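The same comparison can be made programmatically once you have both versions as strings. The sketch below assumes that you have already obtained the raw HTML (the server response) and the rendered HTML (for example, the serialized DOM copied from the browser's developer tools). The function names are hypothetical, and the regular expressions are a simplification; a real audit should use a proper HTML parser.

```typescript
// Extracts the hreflang values of all <link ... hreflang="..."> tags in an HTML string.
// Regex-based on purpose to keep the sketch short; it does not handle every edge case.
function extractHreflangValues(html: string): string[] {
  const values: string[] = [];
  const linkTag = /<link\b[^>]*>/gi;
  let tag: RegExpExecArray | null;
  while ((tag = linkTag.exec(html)) !== null) {
    const attr = /\shreflang\s*=\s*["']?([^"'\s>]+)/i.exec(tag[0]);
    if (attr !== null) {
      values.push(attr[1].toLowerCase());
    }
  }
  return values;
}

// Returns the hreflang values that are present in the raw (pre-DOM) HTML
// but missing from the rendered (post-DOM) HTML.
function hreflangLostInRendering(rawHtml: string, renderedHtml: string): string[] {
  const rendered = extractHreflangValues(renderedHtml);
  return extractHreflangValues(rawHtml).filter((value) => rendered.indexOf(value) === -1);
}
```

For a page set up like ours during this test, hreflangLostInRendering would return every language code found in the source document, because none of the annotations survive rendering.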
How does Google deal with hreflang being present in the HTML source, but absent in the rendered HTML?
To be honest, we had no clue how this would work out. Would Google ignore the hreflang annotations in the HTML source because they were missing in the rendered HTML? We were of course hoping this would be the case, otherwise we wouldn’t have set up this test.
The simple answer to this question, at least as far as hreflang in our test is concerned, is that Google doesn’t care about the HTML source document (the pre-DOM HTML) and only looks at the rendered HTML (the post-DOM HTML).
Shortly after removing the hreflang annotations, Google started ranking the root URL with an English snippet in position 2 in the German search results (on google.de), right after the German home page:
Before the change, with hreflang still in place, only the German version ranked in the German search results, as you would expect from a correct hreflang implementation.
We, of course, wanted more proof for this, so we kept an eye on the hreflang report in Google Search Console. For about two weeks after the removal of the hreflang annotations, nothing happened. We had already started to think that the bad international SERP display we had been suffering from since the change might have been a coincidence, but then things finally started to move:
This, together with the fact that our international versions have been displayed incorrectly since the change, is enough evidence to say that Google ignores our HTML source document and only pays attention to the rendered HTML for the interpretation of hreflang. Quite interesting, huh?
Conclusion and outlook
We were surprised and happy to find out that Google behaves this way. We thought that Google might process both the pre- and the post-DOM HTML and use some of the information it finds in both. But then again, it would probably be quite difficult for them to determine which version to trust if there are conflicts. So it looks like they are moving (or have moved) to strictly working with the rendered HTML and ignoring the pre-DOM HTML. In hindsight, this probably makes sense and we should have expected the results we saw.
What do you think? Have you made similar observations? Or contrary ones? Do you know of current cases where Google ignores the rendered HTML and uses the pre-DOM HTML? We would love to hear from you in the comments.
Also, if you’re interested in part two of this experiment, stay tuned! We will implement hreflang using Google Tag Manager next and share a step-by-step guide on this blog to show you how to do it yourself. We obviously expect Google to process hreflang annotations implemented this way just fine after having seen the results above, but we’ll let you know how it goes anyhow. Just follow us on Twitter, Facebook, or wherever you like, so you don’t miss it!
Hi Eoghan, great article thanks for this. There is one thing I find a little confusing. When you were doing this test you removed the hreflang implementation from the html code (pre-DOM HTML), is that right? Then Google started to rank the English domain in the German SERP, right?
So my point is that you can only remove a code implementation from the HTML code anyway (you cannot remove anything from DOM) because DOM is what browser is loading. Do you understand where I am coming from? So if you remove something from the pre-DOM it won’t exist within the post-DOM anyway. So how do you know that Google only cares about post-DOM not the pre-DOM? Maybe I am confusing myself but I appreciate if you could clear this up for me 😉
Hi Gokhan,
Thank you very much for your question. I think one sloppy formulation in my article might have confused you:
“The script you see in the screenshot removes all link elements that have hreflang attributes from the HTML source code”
I do see how this might be misunderstood (sorry about that!). It is very important to note that the hreflang attributes, in this experiment, remained present in the raw HTML, but were removed in the process of building the rendered HTML. The result of this is described in the following part of the article:
“This implementation with Google Tag Manager results in a conflict between the pre- and the post-DOM HTML: The hreflang annotations are present in the HTML source document (pre-DOM), but absent in the rendered HTML (post-DOM).”
So the hreflang annotations were always present in the HTML that was fetched from the server (pre-DOM / raw HTML), but were removed from the rendered HTML (post-DOM). After they were removed from there, they stopped working, so the conclusion was that Google only uses the rendered HTML, but not the initial HTML, to look for hreflang.
We also conducted more experiments after this one that all pointed in the same direction:
https://www.searchviu.com/en/javascript-seo-experiments-google-tag-manager/
https://www.searchviu.com/en/javascript-canonical-tags/
Please let me know if you have any further questions.
Thank you so much Eoghan, it makes sense now. It’s interesting to see that Google only cares about rendered HTML (DOM).