
SEO Chapter 2: The Importance of Good Site Architecture

Before you start examining a website from this level, let me explain the importance of good site architecture.

While writing this book I am working with a large client that is totally befuddled by its poor rankings. (Note: This client had me sign a nasty looking non-disclosure agreement, so I am unable to reveal its name.) The company's homepage is literally one of the most linked-to pages on the entire Internet and at one point had the elusive PageRank 10. One of its current strategies is to leverage its homepage's link popularity to bolster a large group of pages optimized for ultra competitive keywords. It wants to cast a wide net with the optimized pages and drive a large amount of search engine-referred traffic to its product pages.

It is a great idea, but with the current execution, it has no chance of working.


The problem is that the website lacks any kind of traditional site architecture. The link juice (ranking power) coming from the hundreds of thousands of domains that link to this company's homepage has no way of traveling to the other webpages on this domain. All of the link juice is essentially bottled up at the front door.

Its content is located on at least 20 different domains, and there is no global navigation that leads users or search engines from the homepage down to categorized pages. The company's online presence is more like a thousand islands rather than the super continent it could be. It is an enormous waste of resources and is directly affecting the company's bottom line in a real way.

When explaining site architecture to clients, I start out by asking them to visualize a website like an ant hill. All of the chambers are like webpages and the tunnels are like internal links. I then have them imagine a little boy pouring water into the ant hill. He pours it down the main entrance and wants to have it fill all of the chambers. (As a side note, scientists actually have done this with cement to study the structure of ant metropolises. In one case, they had to pour 10 tons of liquid cement into an ant hill before it filled all of the chambers.) In this analogy the water represents the flow of link juice to webpages. As discussed earlier, this link juice (popularity) is essential for rankings.

The optimal structure for a website (or ant hill, if you must) would look similar to a pyramid.

This structure allows the most possible juice to get to all of the website's pages with the fewest number of links. This means that every page on the website gets some ranking benefit from the homepage.



A pyramid structure for a website allows the most possible link juice to get to all the website's pages with the fewest number of links.

NOTE Homepages are almost always the most linked-to pages on a domain. This is because they are the most convenient (the shortest) URL to link to when referring to the website online.

Evaluating Homepages

Now that we are on the same page about site architecture, we can move forward. Once I get to this level of analysis, I start really looking at the site architecture. Obviously, this starts at the homepage.

Ideally, the homepage should link to every single category of pages on a website. Normally, this is accomplished with a global navigation menu (global meaning it is on every web page on the domain). This is easy to do with small websites because if they have less than 150 pages, the homepage could directly link to all of them. (Note this is only a good idea if the homepage has enough links pointing at it to warrant this. Remember the little boy and the ant hill; link popularity is analogous to the amount of water the little boy has. If he doesn't have enough, he can't fill every chamber.)

SEO Chapter 2: Robots.txt and Sitemap.xml

After analyzing the domain name, general design, and URL format, my colleagues and I look at a potential client's robots.txt and sitemap. This is helpful because it starts to give you an idea of how much (or little) the developers of the site cared about SEO. A robots.txt file is a very basic step webmasters can take to work with search engines. The text file, which should be located in the root directory of the website (http://www.example.com/robots.txt), is based on an informal protocol used for telling search engines which directories and files they are allowed and disallowed from accessing. The inclusion of this file gives you a rough hint of whether or not the developers of the given site made SEO a priority.
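To make the convention concrete, here is a minimal, hypothetical robots.txt; the paths are invented for illustration:

```text
# Hypothetical robots.txt served from http://www.example.com/robots.txt
# Rules apply to all crawlers
User-agent: *
Disallow: /admin/
Disallow: /search-results/

# Optional: point crawlers at the XML sitemap (discussed later)
Sitemap: http://www.example.com/sitemap.xml
```

Even an empty robots.txt at the root is a signal that someone on the development team thought about crawlers at all.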

Because this is a book for advanced SEOs, I will not go into this protocol in detail. (If you want more information, check out http://www.robotstxt.org or http://googlewebmastercentral.blogspot.com/2008/06/improving-on-robots-exclusion-protocol.html.) Instead, I will tell you a cautionary tale.

Bit.ly is a very popular URL shortening service. Due to its connections with Twitter.com, it is quickly becoming one of the most linked-to websites on the Web. One reason for this is its flexibility. It has a feature where users can pick their own URL. For example, when linking to my website I might choose http://bit.ly/SexyMustache. Unfortunately, Bit.ly forgot to block certain URLs, and someone was able to create a shortened URL for http://bit.ly/robots.txt. This opened up the possibility for that person to control how robots were allowed to crawl Bit.ly. Oops! This is a great example of why knowing even the basics of SEO is essential for web-based business owners.

After taking a quick glance at the robots.txt file, SEO professionals tend to look at the default location for a sitemap (http://www.example.com/sitemap.xml). When I do this, I don't spend a lot of time analyzing it (that comes later, if the owners of that website become a client); instead, I skim through it to see if I can glean any information about the setup of the site. A lot of times, it will quickly show me if the website has information hierarchy issues. Specifically, I am looking for how the URLs relate to each other. A good example of information hierarchy would be www.example.com/mammal/dogs/english-springer-spaniel.html, whereas a bad example would be www.example.com/node?type=6&kind=7. Notice that in the bad example the search engines can't extract any semantic value from the URL. The sitemap can give you a quick idea of the URL formation of the website.

URLs like this one are a sign a website has information hierarchy issues because search engines can't extract any semantic value from the URL.
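For reference, a minimal sitemap follows the XML format below. This is an invented example using the good URL from earlier; the `lastmod` date is illustrative:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/mammal/dogs/english-springer-spaniel.html</loc>
    <lastmod>2010-01-15</lastmod>
  </url>
</urlset>
```

Skimming the `<loc>` entries in a file like this is usually enough to see whether the URLs encode a meaningful hierarchy or just database IDs.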

Action Checklist

When viewing a website from the 100-foot level, be sure to take the following actions:

• Decide if the domain name is appropriate for the given site based on the criteria outlined in this chapter

• Based on your initial reaction, decide if the graphical design of the website is appropriate

• Check for the common canonicalization errors

• Check to see if a robots.txt exists and get an idea of how important SEO was to the website developers.

• If inclined, check to see if a sitemap.xml file exists, and if it does, skim through it to get an idea of how the search engines might see the hierarchy of the website.

This section dealt with some of the first elements of a site that I examine when approaching a client's site from an SEO perspective: domain name, design, canonicalization, robots.txt, and sitemaps. This initial look is intended to be just a high-level view of the site.

In the next section I focus on specific webpages on websites and take you even closer to piecing the SEO puzzle together.

SEO Chapter 2: Duplication and Canonicalization

After analyzing a website's domain name and general design, my colleagues and I check for one of the most common SEO mistakes on the Internet: canonicalization. For SEOs, canonicalization refers to individual webpages that can be loaded from multiple URLs.

NOTE In this discussion, "canonicalization" simply refers to the concept of picking an authoritative version of a URL and propagating its usage, as opposed to using other variants of that URL. On the other hand, the book discusses the specific canonical link element in several places, including in Chapter 5.

Remember that in Chapter 1 I discussed popularity? (Come on, it hasn't been that long.) What do you think happens when links that are intended to go to the same page get split up among multiple URLs? You guessed it: the popularity of the pages gets split up. Unfortunately for web developers, this happens far too often because the default settings for web servers create this problem. The following lists show the negative SEO effects of using the default settings on the two most common web servers:

Apache web server:

• http://www.example.com/

• http://www.example.com/index.html

• http://example.com/

• http://example.com/index.html

Microsoft Internet Information Services (IIS):

• http://www.example.com/

• http://www.example.com/default.asp (or .aspx, depending on the version)

• http://example.com/

• http://example.com/default.asp (or .aspx)

Or any combination with different capitalization.

Each of these URLs spreads out the value of inbound links to the homepage. This means that if the homepage has 100 links to these various URLs, the major search engines only give them credit separately, not in a combined manner.
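On Apache, one common remedy is a pair of mod_rewrite rules in an .htaccess file that 301-redirect the non-www host and the index filename to the canonical homepage. The rules below are a sketch of a typical setup, not a drop-in fix; example.com stands in for the real domain:

```apache
RewriteEngine On

# 301-redirect example.com/* to www.example.com/*
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# 301-redirect the index filename to the root URL
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]
```

With rules like these in place, inbound links to any of the duplicate URLs consolidate on a single canonical homepage.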

NOTE Don't think it can happen to you? Go to http://www.mattcutts.com and wait for the page to load. Now, go to http://mattcutts.com and notice what happens. Look at that, canonicalization issues. What's the significance of this example? Matt Cutts is the head of Google's web spam team and helped write many of the algorithms we SEOs study. If he is making this mistake, odds are your less informed clients are as well.

Luckily for SEOs, web developers developed methods for redirection so that URLs can be changed and combined. Two primary types of server redirects exist—301 redirects and 302 redirects:

• A 301 indicates an HTTP status code of "Moved Permanently."

• A 302 indicates a status code of "Temporarily Moved."

Other redirect methods exist, such as the meta refresh and various JavaScript relocation commands. Avoid these methods. Not only do they not pass any authority from origin to destination, but engines are unreliable about following the redirect path.

TIP You can read all of the HTTP status codes at

http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html.

Though the difference between 301 and 302 redirects appears to be merely semantics, the actual results are dramatic. Google decided a long time ago to not pass link juice (ranking power) equally between normal links and server redirects. At SEOmoz, I did a considerable amount of testing around this subject and have concluded that 301 redirects pass between 90 percent and 99 percent of their value, whereas 302 redirects pass almost no value at all. Because of this, my co-workers and I always look to see how non-canonicalized pages are being redirected.

It's not just semantics. How a page is redirected (whether by a 301 or a 302 redirect) matters.

WARNING Older versions of IIS use 302 redirects by default. D'oh! Be sure to look out for this. You can see worthless redirects all around popular IIS-powered websites like microsoft.com and myspace.com. The value of these redirects is being completely negated by a single value difference!


Canonicalization is not limited to the inclusion of letters. It also dictates forward slashes in URLs. Try going to http://www.google.com and notice that you will automatically get redirected to http://www.google.com/ (notice the trailing forward slash). This is happening because technically this is the correct format for the URL. Although this is a problem that is largely solved by the search engines already (they know that www.google.com is intended to mean the same as www.google.com/), it is still worth noting because many servers will automatically 301 redirect from the version without the trailing slash to the correct version. By doing this, a link pointing to the wrong version of the URL loses between 1 percent and 10 percent of its worth due to the 301 redirect. The takeaway here is that whenever possible, it is better to link to the version with the forward slash. There is no reason to lose sleep over this (because the engines have mostly solved the problem), but it is still a point to consider.
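As a rough sketch of the idea (my own illustration, not code from any SEO toolset), a couple of these normalization rules can be expressed with Python's standard urllib:

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize(url: str) -> str:
    """Normalize a URL: lowercase the scheme and host, and add the
    trailing slash that the technically correct homepage URL requires."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    if path == "":
        path = "/"  # http://www.google.com -> http://www.google.com/
    return urlunsplit((scheme.lower(), netloc.lower(), path, query, fragment))
```

Run through this function, the mixed-case and slash-less variants of a homepage URL all collapse to a single canonical form.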

CROSSREF The right and wrong usage of 301 and 302 redirects is discussed in Chapter 3. The correct syntax and usage of the canonical link element is discussed in Chapter 5.

SEO Chapter 2: Don't Fool Yourself, Looks Matter

I once talked to a website owner who had an 80 percent bounce rate on his homepage and figured it was normal. Can you imagine if 80 percent of the people who looked at you immediately ran in the opposite direction? This isn't normal. Web design is an element of SEO that many amateur SEOs miss. It doesn't matter if you can get high rankings if none of the searchers stays on the given webpage after clicking through.

SEO-friendly web design is a lot like getting a prom date; appearance matters. People make decisions about the credibility of a website the instant the page loads. Like people, credible websites have a very specific look and feel to them. They generally have a clear logo in the top left, and a navigation bar horizontally on the top of the page or vertically on the left-hand side. They have less than five colors in their layout (not including images), and they have clear, readable text.

Would you feel comfortable leaving your children with a person in a bright orange prison jumpsuit? Of course not! In the same way, visitors to websites are not going to feel comfortable if they are greeted with pop-ups, loud music, and a multicolored skull logo.

Of course those are extreme examples. The common mistakes that I see are more along the lines of the following:

• Lack of focus

• Crowded text

• Slow loading times

• Auto-playing music

• Unclear navigation

• Excess redirects

As an SEO, you need to stress the importance of good design. Though it may be fun and exciting to stretch the limits, it is not fun to be poor because 80 percent of your client's would-be customers leave the website directly after entering.

SEO Chapter 2: Relearning How You See the Web

In This Chapter 

• Analyzing how a website fits in its "web neighborhood"

• Viewing websites like an SEO

• Assessing good site architecture and webpages from an SEO perspective

• Assessing website content like an SEO

When people surf the Internet, they generally view each domain as its own island of information. This works perfectly well for the average surfer but is a big mistake for beginner SEOs. Websites, whether they like it or not, are interconnected. This is a key perspective shift that is essential for understanding SEO.

Take Facebook, for example. It started out as a "walled garden" with all of its content hidden behind a login. It thought it could be different and remain completely independent. This worked for a while, and Facebook gained a lot of popularity. Eventually, an ex-Googler and his friend became fed up with the locked-down communication silo of Facebook and started a wide open website called Twitter. Twitter grew even faster than Facebook and challenged it as the media darling. Twitter was smart and made its content readily available to both developers (through APIs) and search engines (through indexable content).

Facebook responded with Facebook Connect (which enables people to log in to Facebook through other websites) and opened its chat protocol so its users could communicate outside of the Facebook domain. It also made a limited amount of information about users visible to search engines. Facebook is now accepting its place in the Internet community and is benefiting from its decision to embrace other websites. What it misjudged early on was that websites are best when they are interconnected. Being able to see this connection is one of the skills that separates SEO professionals from SEO fakes.

I highly recommend writing down everything you notice in a section of a notebook identified with the domain name and date of viewing.

In this chapter you learn the steps that the SEO professionals at SEOmoz go through either before meeting with a client or at the first meeting (depending on the contract). When you view a given site in the way you are about to learn in this chapter, you need to take detailed notes. You are likely going to notice a lot about the website that can use improvement, and you need to capture this information before details distract you.

Keep Your Notes Simple

The purpose of the notebook is simplicity and the ability to go back frequently and review your notes. If actual physical writing isn't your thing, consider a low-tech text editor on your computer, such as Windows Notepad or the Mac's TextEdit.

Bare-bones solutions like a notebook or text editor help you avoid the distraction of the presentation itself and focus on the important issues: the characteristics of the website that you're evaluating.

If you think it will be helpful and you have Internet access readily available, I recommend bringing up a website you are familiar with while reading through this chapter. If you choose to do this, be sure to take a lot of notes in your notebook so you can review them later.

The 1,000-Foot View—Understanding the Neighborhood

Before I do any work on a website I try to get an idea of where it fits into the grand scheme of things on the World Wide Web. The easiest way to do this is to run searches for some of the competitive terms in the website's niche. If you imagine the Internet as one giant city, you can picture domains as buildings. The first step I take before working on a client's website is figuring out in which neighborhood its building (domain) resides.

This search result page is similar to seeing a map of the given Internet neighborhood. You usually can quickly identify the neighborhood anchors (due to their link popularity) and specialists in the top 10 (due to their relevancy). You can also start to get an idea of the maturity of the result based on the presence of spam or low-quality websites.

During client meetings, when I look at the search engine result page for a competitive term like advertising, I am not looking for websites to visit but rather trying to get a general idea of the maturity of the Internet neighborhood. I am very vocal when I am doing this and have been known to question out loud, "How did that website get there?" A couple times, the client momentarily thought I was talking about his website and had a quick moment of panic. In reality, I am commenting on a spam site I see rising up the results.

Also, take note that regardless of whether or not you are logged into a Google account, the search engine will automatically customize your search results based on the links you click most. This can be misleading because it will make your favorite websites rank higher for you than they do for the rest of the population. To turn this off, append "&pws=0" to the end of the Google URL.


Along with looking at the results themselves, I look at the other data present on the page. The amount of advertisements on the search result gives a rough idea of how competitive it is. For example, a search for buy viagra will return a full page height worth of ads, whereas a search for women that look like Drew Carey won't likely return any. This is because more people are searching for the blue pill than are searching for large, bald women with nerd glasses.

In addition to the ads, I also look for signs of temporal algorithms. Temporal algorithms are ranking equations that take into account the element of time with regards to relevancy. These tend to manifest themselves as news results and blog posts.

Taking Advantage of Temporal Algorithms

You can use the temporal algorithms to your advantage. I accidentally did this once with great success. I wrote a blog post about Michael Jackson's death and its effect on the search engines a day after he died. As a result of temporal algorithms my post ranked in the top 10 for the query "Michael Jackson" for a short period following his death. Because of this high ranking, tens of thousands of people read my article. I thought it was because I was so awesome, but after digging into my analytics I realized it was because of unplanned use of the temporal algorithms. If you are a blogger, this tactic of quickly writing about news events can be a great traffic booster.

After scanning search result pages for the given website's niche, I generally get a sense for that neighborhood of the Internet. The important takeaway is to get an idea of the level of competition, not to figure out the ins and outs of how specific websites are ranking. That comes later.

Easy De-Personalization in Firefox and Chrome

Most SEOs perform searches dozens or hundreds of times per day, and when you do, it's important that de-personalized results appear so that you see what a "typical" searcher would see, as opposed to search results influenced by your own search history.

Firefox is a terrific browser for SEOs for many reasons, but one of its most helpful features is the ability to search right from the address field of the browser, the area at the top of the browser where you normally see the URL of the web page you're on. Better yet, with a little customization, you can easily perform Google searches that are de-personalized (although not de-geotargeted).

1. From the Bookmarks | Organize Bookmarks... menu, select any bookmarks folder in the left pane. (Do not simply select the All Bookmarks folder, because it won't work.)

2. Right-click the folder and select New Bookmark...

3. Add the following values to the fields:

Name: Google de-personalized search

Location: http://www.google.com/search?&q=%s&pws=0

Tags: (Optional. Add any tags you want.)

Keyword: g

Description: (Optional. Use this to describe the search.)

4. Click Add.

That's it. Now, go to the Address field in Firefox (where you see a URL at the top of the browser) and type something like this:

g hdmi cables

This tells Google (g) to search for "hdmi cables". More important, because your Location field included &pws=0, that URL parameter will carry over to your search result. From now on, if you want to perform a de-personalized Google search, simply type "g" (no quotes) and the query term in your URL field.

Use this process for creating as many custom searches as you like, keeping these important factors in mind:

1. The Location field must contain the exact URL of the search result, with the exception of the %s variable, which will be replaced with your query term automatically.

2. The Keyword field is where you'll type before your search query to tell Firefox which custom query you'll be running. Be brief and accurate. I use terms like "b" for Bing, "tc" for text cache, and so on.

This functionality carries over to Google's Chrome browser too, because Chrome can import bookmarks from any other browser you use. If you're a Chrome user, simply import your Firefox bookmarks from the Chrome | Import Bookmarks and Settings menu, and you can search from the Chrome address bar just like you did in Firefox.

Action Checklist

When viewing a website from the 1,000-foot level, be sure to complete the following:

• Search for the broadest keyword that the given site might potentially rank for

• Identify the maturity of the search engine results page (SERP) based on the criteria listed in this chapter

• Identify major competitors and record them in a list for later competitive analysis

This section discussed analyzing websites at their highest level. At this point, the details don't matter. Rather it is macro patterns that are important. The following sections dive deeper into the website and figure out how everything is related. Remember, search engines use hundreds of metrics to rank websites. This is possible because the same website can be viewed many different ways.

The 100-Foot View—The Website


When professional SEOs first come to a website that they plan to work with, they view it through a very different lens than if they were just idly surfing. They instinctively start viewing it from the perspective of a search engine. The following are the elements that my colleagues and I pay the most attention to.

SEO Chapter 2: How Important Is a Domain Name?

I could probably write an entire book on this subject. (Hear that, Wiley Publishing? That's the sound of money.) From a marketing perspective, a domain name is the single most important element of a website. Unlike a brick-and-mortar company, websites don't have visual cues closely associated with them. Whereas potential customers can use visual cues to identify whether a physical building is more likely a barber shop or a bank, they are not able to tell the difference between domain names. All domain names use the exact same format: http:// subdomain (optional) dot root domain dot TLD. Take, for example, http://www.google.com or http://www.bing.com. To an outsider, there is no reason to think that either of these resources would be a search engine. They don't contain the word search, and if their brands weren't as strong as they are, their gibberish names wouldn't mean anything to anyone. In fact, if you look at the top 100 most linked-to domains on the Internet, you see this trend over and over again: Wikipedia, YouTube, W3, Amazon, Macromedia, MSN, Flickr, Twitter, Digg, Technorati, IMDB, eBay—the list goes on.
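The "subdomain dot root domain dot TLD" format can be illustrated with a naive split (my own sketch; real parsing requires the public suffix list, since TLDs like .co.uk span more than one label):

```python
def split_hostname(hostname: str):
    """Naively split a hostname into (subdomain, root domain, TLD).
    Assumes a single-label TLD such as .com; illustration only."""
    parts = hostname.lower().split(".")
    subdomain = ".".join(parts[:-2])  # empty string when there is none
    root_domain = parts[-2]
    tld = parts[-1]
    return subdomain, root_domain, tld
```

Applied to www.google.com, this yields "www", "google", and "com": nothing in any of those labels hints that the site is a search engine.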

This is where people get confused. They see websites like this and think that the domain name doesn't matter. They register domains that are hard to pronounce (SEOmoz) or hard to spell (Picnik) and figure they don't have to worry. The problem is they don't realize that the popular websites got popular not because of their domain names, but rather despite their domain names. Google was such an outstanding product with a plan that was executed so well that it could have been named BackRub and still been successful. (Note: It was originally called BackRub. I am just amusing myself.)

As an SEO, if you find yourself in the position of changing or choosing a domain name, you need to make a difficult decision. How confident are you in the client's idea? Is it an idea that serves the entire world, or is it only useful to a few thousand people? If the website is world changing, it might actually benefit from a gibberish name. If the name is gibberish and very successful, people naturally start to associate its name with its service. For example, Google is now synonymous with "search." However, if the idea doesn't end up being world changing (and most websites aren't), a gibberish domain name can hurt the website. What are the odds that the general populace will type in spoke.com (a real website) to find personal profiles?

A nonsensical domain name can hurt a website, making it harder for people (and search engines) to find that site and associate with the concepts that the site focuses on.

For the vast majority of websites, a "search friendly" domain name is best. The search engines will always be constrained by the fact that many people search for exact URLs when they want to go to websites. Of course, the most relevant and popular result for the query "myspace.com" would be www.myspace.com. You can use this to your advantage.

Say your clients own a hotel in Seattle. For them, the best domain name would be www.seattlehotel.com so that they could rank for the query Seattle Hotel. They should not worry about becoming a verb because the demand is not high enough for their service and the benefits of an exact match domain name outweigh the chances of their website changing the world. Need more proof? The domain names porn.com and sex.com sold for $9.5 million and $12 million, respectively.

NOTE For a while, the most searched-for term on both Yahoo! and MSN was Google. People would search for the search leader in Yahoo! and MSN, click through to google.com, and then type their search query. This bothered Yahoo! so much that it eventually put a Yahoo! search bar as the number one result for Google.

But what if a killer domain name is not available? You are not alone. As of the time of writing, all of the combinations for .com domains with three or fewer characters were already owned. If you can't get seattlehotel.com, you will just need to be more creative. To limit your ability to hurt yourself by being "too creative," I advise you to look out for the following when registering a domain name:

• Avoid hyphens: In domain names, hyphens detract from credibility and act as a spam indicator.

• Avoid generic, uncommon top-level domains (TLDs): Like hyphens, TLDs such as .info, .cc, .ws, and .name are spam indicators.

• Avoid domain names longer than 15 characters: People are lazy; don't try to make them type a novel just to access your website.

• Be aware of permutations: The owners of ExpertsExchange.com built a sizable brand before they realized their domain name could be misconstrued as ExpertSexChange.com.

This advice about domains applies mostly to people who are either starting out from scratch, or for whom purchasing a better domain is an option. If you're an SEO, you'll probably have clients that are stuck with the domain they have, either due to branding or financial constraints. If that's you, never fear. While a smartly chosen, keyword-rich domain is often an ideal situation, plenty of sites succeed without one. I doubt, for example, that Amazon.com is on the lookout for a more book- or electronics-based domain name.