Over the years, domain monitoring has become a standard industry practice, and an important component of most online brand protection services. Nonetheless, there are a few questions about the inner workings of domain name monitoring that we keep on getting asked:
- How do Domain Name Crawlers work?
- Why are mentions of a brand name in a subdomain not detected?
- Why do some domain monitoring tools not crawl certain ccTLDs?
These are some of the questions we address in this article.
Let’s start with a definition:
Domain Monitoring is the act of searching for relevant domains in domain registry databases.
The easy part: What you consider a relevant domain depends on your objective. As we are discussing domain monitoring in a brand protection context, relevant here means a risk to your brand e.g. those domains that infringe your trademark rights. If you are using domain monitoring as part of your competitive intelligence, relevance would doubtlessly have a different meaning.
Let us now look at the other part of the definition. What are domain registry databases, and how can we search them?
To keep track of who has registered a domain, when, and for how long, registries keep databases that store this information. Whether and how we can search those registry databases depends on whether the registries give access to the public. Here, not all domain endings are equal. Let us elaborate a bit.
Domain names can be divided into two groups: those that are governed by ICANN (generic Top Level Domains, gTLDs) and those that are governed by national governments (country code Top Level Domains, ccTLDs). It’s easy to separate them. All two-character TLDs are ultimately under the governance of countries (well-known examples are .cn, .ru, .de and .uk), while all TLDs with three or more characters, such as .com, .info, .shop, .app and .swiss, follow regulations set out by ICANN.
This setup creates a domain world where there are unified rules regarding registry databases for all gTLDs and a fragmented landscape for ccTLDs.
Luckily, many ccTLDs voluntarily follow ICANN processes for many aspects of TLD governance, but by no means all of them do.
This is important to start with because it explains why getting access to registry databases is not as straightforward as it may seem.
How do Domain Name Crawlers work?
Let’s start with the case of databases that are accessible by the public.
Whenever a domain is licensed to a registrant, the registrar marks the domain as registered in the registry database, together with other pertinent information, such as the registration date, expiry and registrant data. From that point on the public can look up the WHOIS for the domain. Domain monitoring tools search through a list of all domains registered since the bots last crawled, which we call here newly registered domains. The list of domains you find in your domain monitoring tool results page is a subset of all newly registered domains. Exactly which domains end up on your results page depends on the way your crawler is set to filter the results, but most brands are interested in finding domains that include their trademark, or domains that look confusingly similar to their trademark.
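As a rough illustration of what “newly registered domains” means here, the sketch below compares two snapshots of a registry's domain list and keeps only the names that appeared since the last crawl. It assumes the registry data is available as simple lists of domain names; the function name `newly_registered` and the sample domains are hypothetical and not taken from any particular tool.

```python
def newly_registered(previous_snapshot, current_snapshot):
    """Return domains present in the current snapshot but not in the previous one."""
    # Normalise case and whitespace so the same name is never counted twice.
    previous = {d.strip().lower() for d in previous_snapshot}
    current = {d.strip().lower() for d in current_snapshot}
    return sorted(current - previous)


# Hypothetical snapshots taken at the last crawl and at the current crawl.
last_crawl = ["example.com", "jaguar-parts.com"]
this_crawl = ["example.com", "jaguar-parts.com", "Jaguar-Deals.com", "zoo-tickets.com"]

new_domains = newly_registered(last_crawl, this_crawl)
print(new_domains)
```

Only this difference is passed on to the filtering step, which keeps the amount of data to analyze manageable.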
Filtering newly registered domains happens via an algorithm using so-called “Boolean operators” (AND, OR, NOT). You are familiar with those from any monitoring tool. Rules such as “does not include keyword” (negative keywords) help you minimize the number of “false positive” results: data that is actually irrelevant even though it contains your trademark.
Example: The car brand Jaguar may want to exclude all results that include “zoo”, “safari” or other keywords that are very likely to be associated with something other than the car brand.
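The Jaguar example can be sketched as a small keyword filter. This is a minimal illustration rather than how any specific monitoring tool is implemented; the function `matches` and the sample domain names are made up for the example.

```python
def matches(domain, include_any, exclude_any=()):
    """Boolean filter: (kw1 OR kw2 ...) AND NOT (neg1 OR neg2 ...)."""
    name = domain.lower()
    has_brand = any(keyword in name for keyword in include_any)
    has_negative = any(keyword in name for keyword in exclude_any)
    return has_brand and not has_negative


# Hypothetical list of newly registered domains.
candidates = ["jaguar-leasing.com", "jaguarzoo.org", "safari-jaguar.net", "bikes.info"]

hits = [d for d in candidates if matches(d, ["jaguar"], ["zoo", "safari"])]
print(hits)
```

The negative keywords remove the zoo and safari domains, so only the result that is likely to relate to the car brand is left for an analyst to review.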
Besides Boolean operators, domain crawlers also use algorithms that identify domain names that may be confusingly similar to your brand without containing an exact match of your brand name. A domain name can be confusingly similar to your brand on different levels, including phonetically and visually.
Phonetic similarity
The French words “deux mains” (two hands) and “demain” (tomorrow) sound very similar, even though the words appear quite different visually. When you are reading a word which is phonetically similar to a brand, you may automatically make a connection to that brand, especially if it is followed by a brand-related keyword. In practice, this is rarely used in scams, because it’s rather difficult to come up with good examples, so most people are underwear of it. 😛
Visual proximity
A much more commonly used way of creating proximity is by means of “look”. When reading, we mainly focus on the first and last letters of a word, followed by the presence of the characters we would expect to find in it, while the exact order of the characters is less relevant for our understanding of the word (see Typoglycemia).
Cybersquatters take advantage of this cognitive trait by registering domains that appear to be a known brand, while there is actually a small difference such as an omitted or additional letter, a doubled letter, or a swap for a visually similar letter from the same or even from a different script.
Jaquar.com - Can you spot the mistake?
Depending on the brand you are monitoring for, the algorithm has either a strict or loose configuration. You want to catch all relevant results and omit as many false positives as possible to save analyst time when filtering through the result list.
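One common way to approximate this kind of visual proximity is edit distance: the number of single-character insertions, deletions or substitutions needed to turn one string into another. The sketch below is only an illustration of the idea, not the algorithm any particular crawler uses. The `max_distance` parameter is a hypothetical stand-in for the strict or loose configuration mentioned above, and the function compares only the label to the left of the first dot.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def looks_similar(domain, brand, max_distance=1):
    """Compare the label left of the first dot with the brand name."""
    label = domain.lower().split(".")[0]
    return edit_distance(label, brand.lower()) <= max_distance


print(looks_similar("jaquar.com", "jaguar"))                  # strict setting
print(looks_similar("jagr.com", "jaguar"))                    # strict setting
print(looks_similar("jagr.com", "jaguar", max_distance=2))    # looser setting
```

A low threshold keeps the result list short but may miss more creative variations, whereas a higher threshold catches more variants at the price of more false positives, especially for short brand names.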
How does domain name monitoring for ccTLDs differ?
ccTLDs, on the other hand, are not obliged to make their lists of registered domains available to the public. This means that not all ccTLD databases can be actively crawled, and other heuristic techniques are applied to monitor ccTLD domains that contain keywords such as your brand.
One such heuristic is to check whether a set of potentially interesting domains is registered under the TLD. For example, the monitoring tool could use domains that have been registered in gTLDs and contain your brand as a sample, then check whether any of those names are unavailable under the ccTLD in question. If the domain is unavailable, this is an indication that this particular domain is registered and therefore may need further attention from your brand protection team.
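A minimal sketch of this heuristic is shown below. It reuses the labels of hits found under gTLDs as candidates and then asks a checker whether each candidate appears to be taken under the ccTLD. All function names are hypothetical. The default checker, `resolves`, uses a DNS lookup as a rough stand-in for an availability check, which is an assumption made for the example: a registered domain does not have to resolve, so a real tool would more likely query the registry's own availability or WHOIS service where one is offered.

```python
import socket


def resolves(domain):
    """Rough stand-in for 'is registered': does the name resolve in DNS?"""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except socket.gaierror:
        return False


def cctld_candidates(gtld_hits, cctld):
    """Reuse second-level labels seen under gTLDs as candidates for a ccTLD."""
    labels = {hit.lower().split(".")[0] for hit in gtld_hits}
    return sorted(f"{label}.{cctld}" for label in labels)


def probe(gtld_hits, cctld, is_registered=resolves):
    """Return the candidate names that appear to be taken under the ccTLD."""
    return [d for d in cctld_candidates(gtld_hits, cctld) if is_registered(d)]


# Offline demonstration with a made-up set of taken names instead of live DNS.
taken = {"jaguar-deals.de"}
flagged = probe(["jaguar-deals.com", "jaguar-parts.net"], "de",
                is_registered=lambda name: name in taken)
print(flagged)
```

Passing the checker in as a parameter keeps the candidate generation separate from the registry-specific lookup, which is the part that has to be adapted for each ccTLD.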
The heuristic ccTLD monitoring approach is less robust than searching through public databases: some registries block monitoring tools if they receive too many requests, while others require a license. It is therefore laborious to fine-tune the approach for each registry, leading to a trade-off between the cost of adding an extra ccTLD and the value you get from monitoring less common extensions such as Ethiopian or Burmese domain registrations.
Therefore, domain monitoring software always includes gTLDs, but how many ccTLDs are included differs a lot from provider to provider. It is up to each brand to assess whether the extra domain monitoring coverage is worth the extra expense.
Why are Subdomains not included in conventional Domain Monitoring?
It is important to understand where exactly domain monitoring crawlers are searching to understand the limitations of their results.
Let’s have a look at the structure of a URL:
protocol://sub-domain.second-level-domain.TLD/Folder1/Folder2
Protocol: Specifies the process of how computers communicate with each other. In most instances this will be http or https.
Sub-Domain: As admin of a domain, one can create as many and whatever kind of sub-domains one wants. Some commonly used subdomains are www, mail, test, app. Subdomains can be used to point to different servers and therefore services. You might be accessing “drive.google.com” or “docs.google.com” running under different subdomains of google.com.
Second-level-Domain: The part that you can register to get an exclusive license to use the name.
Top-Level Domain or First-Level Domain (TLD): The TLD follows the second-level domain and delimits a space under the governance of the TLD registry.
/FolderStructure: Directs a query to the right folder and file to fetch the requested data under that URL, also called Path.
Note: When we refer to domain name or domain, we talk about the combination of second-level and first-level domain, so our domain name is thomsentrampedach.com.
Which of the URL components is covered by domain monitoring crawlers?
Crawlers will only look at the domain name part, because that’s the only part that is available in the searchable registry database.
In other words, sub-domains and the path or folder structure are not analyzed by conventional domain monitoring solutions.
This creates several challenges for brand protection teams because brand risks may well occur in the subdomain or the URL:
Brand risk indication in subdomain: BlackFriday.JaguarOfficial.VintageCars.com
Brand risk indication in URL-path: VintageCars.com/Account/Invoices/JaguarXJ/unpaid.aspx
Conventional domain monitoring tools would miss the examples given here.
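To make the limitation concrete, the sketch below splits the two example URLs into their components and checks the brand only against the registered domain, which is the only part a registry database contains. It is an illustration under simplifying assumptions: an `https://` scheme is added so the standard URL parser can read the examples, and the registered domain is taken to be the last two labels of the host, which does not hold for endings such as .co.uk. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into subdomain, registered domain and path.

    Assumption: the registered domain is the last two labels of the host.
    """
    parts = urlsplit(url)
    labels = parts.hostname.split(".")  # hostname is returned in lower case
    return {
        "subdomain": ".".join(labels[:-2]),
        "domain": ".".join(labels[-2:]),
        "path": parts.path,
    }


def seen_by_domain_monitoring(url, brand):
    """Conventional monitoring only sees the registered domain name."""
    return brand.lower() in split_url(url)["domain"]


examples = [
    "https://BlackFriday.JaguarOfficial.VintageCars.com",
    "https://VintageCars.com/Account/Invoices/JaguarXJ/unpaid.aspx",
]
for url in examples:
    print(split_url(url), seen_by_domain_monitoring(url, "jaguar"))
```

In both cases the brand appears only in the subdomain or the path, so a check against the domain name alone returns no match.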
In a future article we will describe how Passive DNS and advanced search commands in search engines can help mitigate these other risks.
—
At Thomsen Trampedach we offer a customized domain monitoring service balancing effectiveness and cost. If you are looking to protect your brand from abuse within domain names, let’s have a chat and explore your needs.