SEO Chapter 2: Robots.txt and Sitemap.xml

After analyzing the domain name, general design, and URL format, my colleagues and I look at a potential client's robots.txt and sitemap. This is helpful because it starts to give you an idea of how much (or little) the developers of the site cared about SEO. A robots.txt file is a very basic step webmasters can take to work with search engines. The text file, which should be located in the root directory of the website (http://www.example.com/robots.txt), is based on an informal protocol used to tell search engines which directories and files they are allowed and disallowed to access. The presence (or absence) of this file gives you a rough hint of whether the developers of a given site made SEO a priority.
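
To illustrate, a minimal robots.txt might look like the following. The directories here are hypothetical; User-agent and Disallow are the core directives of the informal protocol, while Allow and Sitemap are widely supported extensions rather than part of the original standard.

```
# Applies to all crawlers
User-agent: *

# Keep crawlers out of private or duplicate-content areas
Disallow: /admin/
Disallow: /search/

# Explicitly re-allow a subdirectory of a disallowed path
Allow: /search/help/

# Point crawlers at the XML sitemap
Sitemap: http://www.example.com/sitemap.xml
```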

Because this is a book for advanced SEOs, I will not go into this protocol in detail. (If you want more information, check out http://www.robotstxt.org or http://googlewebmastercentral.blogspot.com/2008/06/improving-on-robots-exclusion-protocol.html.) Instead, I will tell you a cautionary tale.

Bit.ly is a very popular URL shortening service. Due to its connections with Twitter.com, it is quickly becoming one of the most linked websites on the Web. One reason for this is its flexibility: it has a feature that lets users pick their own URL. For example, when linking to my website I might choose http://bit.ly/SexyMustache. Unfortunately, Bit.ly forgot to block certain URLs, and someone was able to create a shortened URL for http://bit.ly/robots.txt. This opened up the possibility for that person to control how robots were allowed to crawl Bit.ly. Oops! This is a great example of why knowing even the basics of SEO is essential for web-based business owners.
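
The underlying fix is straightforward: when users can choose their own paths, reserve the filenames that have special meaning to crawlers. Here is a minimal sketch in Python; the function name and the reserved list are my own illustration, not Bit.ly's actual code.

```python
# Paths that have special meaning to crawlers and browsers and should
# never be claimable as custom short URLs (an illustrative list).
RESERVED_SLUGS = {"robots.txt", "sitemap.xml", "favicon.ico", "crossdomain.xml"}

def is_valid_custom_slug(slug: str) -> bool:
    """Reject user-chosen slugs that would shadow special files."""
    return slug.lower() not in RESERVED_SLUGS

# Usage:
print(is_valid_custom_slug("SexyMustache"))  # True
print(is_valid_custom_slug("robots.txt"))    # False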

After taking a quick glance at the robots.txt file, SEO professionals tend to look at the default location for a sitemap (http://www.example.com/sitemap.xml). When I do this, I don't spend a lot of time analyzing it (that comes later, if the owners of the website become a client); instead, I skim through it to see if I can glean any information about the setup of the site. A lot of times, it will quickly show me whether the website has information hierarchy issues. Specifically, I am looking at how the URLs relate to each other. A good example of information hierarchy would be www.example.com/mammal/dogs/english-springer-spaniel.html, whereas a bad example would be www.example.com/node?type=6&kind=7. Notice that in the bad example the search engines can't extract any semantic value from the URL; URLs like that are a sign that a website has information hierarchy issues. The sitemap can give you a quick idea of the URL formation of the website.
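
For reference, here is what an entry for the good URL above might look like in a sitemap.xml file. The urlset format is defined by the sitemaps.org protocol; the date and values shown are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- A descriptive, hierarchical URL: its structure alone tells
       crawlers this page is about a specific dog breed. -->
  <url>
    <loc>http://www.example.com/mammal/dogs/english-springer-spaniel.html</loc>
    <lastmod>2010-01-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```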

Action Checklist

When viewing a website from the 100-foot level, be sure to take the following actions:

• Decide if the domain name is appropriate for the given site based on the criteria outlined in this chapter

• Based on your initial reaction, decide if the graphical design of the website is appropriate

• Check for the common canonicalization errors

• Check to see if a robots.txt file exists and get an idea of how important SEO was to the website developers (a quick way to check for this file and the sitemap is sketched after this list)

• If inclined, check to see if a sitemap.xml file exists, and if it does, skim through it to get an idea of how the search engines might see the hierarchy of the website
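
As noted above, the last two checks are easy to automate. Here is a minimal sketch in Python; it only confirms that the files respond with HTTP 200, the site URL is a placeholder, and a real audit would go on to read the contents.

```python
# Quick existence check for robots.txt and sitemap.xml.
import urllib.request
import urllib.error

def exists(url: str) -> bool:
    """Return True if the URL responds with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.URLError:
        return False

site = "http://www.example.com"
for path in ("/robots.txt", "/sitemap.xml"):
    print(path, "found" if exists(site + path) else "missing")
```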

This section dealt with some of the first elements I look at when viewing a client's site from an SEO perspective: domain name, design, canonicalization, robots.txt, and sitemaps. This initial look is intended to be just a high-level viewing of the site.

In the next section I focus on specific webpages on websites and take you even closer to piecing the SEO puzzle together.
