Before you start examining a website from this level, let me explain the importance of good site architecture.
While writing this book I am working with a large client that is totally befuddled by its poor rankings. (Note: This client had me sign a nasty looking non-disclosure agreement, so I am unable to reveal its name.) The company's homepage is literally one of the most linked-to pages on the entire Internet and at one point had the elusive PageRank 10. One of its current strategies is to leverage its homepage's link popularity to bolster a large group of pages optimized for ultra competitive keywords. It wants to cast a wide net with the optimized pages and drive a large amount of search engine-referred traffic to its product pages.
It is a great idea, but with the current execution, it has no chance of working.
The problem is that the website lacks any kind of traditional site architecture. The link juice (ranking power) coming from the hundreds of thousands of domains that link to this company's homepage has no way of traveling to the other webpages on this domain. All of the link juice is essentially bottled up at the front door.
Its content is located on at least 20 different domains, and there is no global navigation that leads users or search engines from the homepage down to categorized pages. The company's online presence is more like a thousand islands than the supercontinent it could be. It is an enormous waste of resources and is directly affecting the company's bottom line in a real way.
When explaining site architecture to clients, I start out by asking them to visualize a website like an ant hill. All of the chambers are like webpages and the tunnels are like internal links. I then have them imagine a little boy pouring water into the ant hill. He pours it down the main entrance and wants to have it fill all of the chambers. (As a side note, scientists actually have done this with cement to study the structure of ant metropolises. In one case, they had to pour 10 tons of liquid cement into an ant hill before it filled all of the chambers.) In this analogy the water represents the flow of link juice to webpages. As discussed earlier, this link juice (popularity) is essential for rankings.
The optimal structure for a website (or ant hill, if you must) would look similar to a pyramid.
This structure allows the most possible juice to get to all of the website's pages with the fewest number of links. This means that every page on the website gets some ranking benefit from the homepage.
A pyramid structure for a website allows the most possible link juice to get to all the website's pages with the fewest number of links.
NOTE Homepages are almost always the most linked-to pages on a domain. This is because they are the most convenient (the shortest) URL to link to when referring to the website online.
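To make the ant-hill analogy a little more concrete, here is a minimal sketch (mine, not from any search engine's documentation) that pushes 100 units of "link juice" down a three-level pyramid. The site structure and the 50/50 split between value a page keeps and value it passes on are illustrative assumptions, not a published ranking formula.

```python
# Minimal sketch: how link juice from a well-linked homepage might spread
# through a pyramid-shaped site. The 50/50 keep/pass split is an
# illustrative assumption only.

site = {
    "home":  ["cat-a", "cat-b"],       # homepage links to category pages
    "cat-a": ["prod-1", "prod-2"],     # categories link to product pages
    "cat-b": ["prod-3", "prod-4"],
    "prod-1": [], "prod-2": [], "prod-3": [], "prod-4": [],
}

def distribute(page, juice, scores):
    """Each page keeps half of its incoming juice and splits the rest
    evenly among the pages it links to."""
    scores[page] = scores.get(page, 0.0) + juice / 2
    links = site[page]
    if links:
        share = (juice / 2) / len(links)
        for child in links:
            distribute(child, share, scores)

scores = {}
distribute("home", 100.0, scores)      # 100 units of juice hit the homepage
for page, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page:>7}: {value:5.1f}")
```

Run it and every category and product page inherits a share of the homepage's value. Remove the internal links and everything below the homepage gets nothing, which is exactly the bottled-up-at-the-front-door problem described above.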
Evaluating Homepages
Now that we are on the same page about site architecture, we can move forward. Once I get to this level of analysis, I start really looking at the site architecture. Obviously, this starts at the homepage.
Ideally, the homepage should link to every single category of pages on a website. Normally, this is accomplished with a global navigation menu (global meaning it is on every web page on the domain). This is easy to do with small websites because if they have less than 150 pages, the homepage could directly link to all of them. (Note this is only a good idea if the homepage has enough links pointing at it to warrant this. Remember the little boy and the ant hill; link popularity is analogous to the amount of water the little boy has. If he doesn't have enough, he can't fill every chamber.)
SEO Chapter-2: Robots.txt and Sitemap.xml
After analyzing the domain name, general design, and URL format, my colleagues and I look at a potential client's robots.txt and sitemap. This is helpful because it starts to give you an idea of how much (or little) the developers of the site cared about SEO. A robots.txt file is a very basic step webmasters can take to work with search engines. The text file, which should be located in the root directory of the website (http://www.example.com/robots.txt), is based on an informal protocol used for telling search engines which directories and files they are allowed and disallowed from accessing. The inclusion of this file gives you a rough hint of whether or not the developers of the given site made SEO a priority.
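For reference, here is roughly what a simple robots.txt looks like. The directory names are hypothetical, and the Allow line and trailing comments are widely supported conventions rather than part of the original protocol, so treat this as a sketch rather than a template.

```
# Hypothetical robots.txt for http://www.example.com/robots.txt
User-agent: *                 # rules below apply to all crawlers
Disallow: /admin/             # keep crawlers out of the admin area
Disallow: /search-results/    # internal search pages add no value to an index
Allow: /                      # everything else may be crawled

Sitemap: http://www.example.com/sitemap.xml
```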
Because this is a book for advanced SEOs, I will not go into this protocol in detail. (If you want more information, check out http://www.robotstxt.org or http://googlewebmastercentral.blogspot.com/2008/06/improving-on-robots-exclusion-protocol.html.) Instead, I will tell you a cautionary tale.
Bit.ly is a very popular URL shortening service. Due to its connections with Twitter.com, it is quickly becoming one of the most linked websites on the Web. One reason for this is its flexibility. It has a feature where users can pick their own URL. For example, when linking to my website I might choose http://bit.ly/SexyMustache. Unfortunately, Bit.ly forgot to block certain URLs, and someone was able to create a shortened URL for http://bit.ly/robots.txt. This opened up the possibility for that person to control how robots were allowed to crawl Bit.ly. Oops! This is a great example of why knowing even the basics of SEO is essential for web-based business owners.
After taking a quick glance at the robots.txt file, SEO professionals tend to look at the default location for a sitemap (http://www.example.com/sitemap.xml). When I do this, I don't spend a lot of time analyzing it (that comes later, if the owner of that website becomes a client); instead, I skim through it to see if I can glean any information about the setup of the site. A lot of times, it will quickly show me if the website has information hierarchy issues. Specifically, I am looking for how the URLs relate to each other. A good example of information hierarchy would be www.example.com/mammal/dogs/english-springer-spaniel.html, whereas a bad example would be www.example.com/node?type=6&kind=7. Notice that on the bad example the search engines can't extract any semantic value from the URL. The sitemap can give you a quick idea of the URL formation of the website.
URLs like this one are a sign a website has information hierarchy issues because search engines can't extract any semantic value from the URL.
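To show what I mean by skimming a sitemap for hierarchy, here is a skeleton sitemap.xml built around the hypothetical "good" URLs from above; only the required <urlset>, <url>, and <loc> elements are shown.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sitemap for www.example.com; the URL paths mirror the
     site's information hierarchy, which is what I skim for. -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/mammal/</loc></url>
  <url><loc>http://www.example.com/mammal/dogs/</loc></url>
  <url><loc>http://www.example.com/mammal/dogs/english-springer-spaniel.html</loc></url>
</urlset>
```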
Action Checklist
When viewing a website from the 100-foot level, be sure to take the following actions:
• Decide if the domain name is appropriate for the given site based on the criteria outlined in this chapter
• Based on your initial reaction, decide if the graphical design of the website is appropriate
• Check for the common canonicalization errors
• Check to see if a robots.txt exists and get an idea of how important SEO was to the website developers
• If inclined, check to see if a sitemap.xml file exists, and if it does, skim through it to get an idea of how the search engines might see the hierarchy of the website
This section dealt with some of the first elements my colleagues and I examine when first looking at a client's site from an SEO perspective: domain name, design, canonicalization, robots.txt, and sitemaps. This initial look is intended to be just a high-level viewing of the site.
In the next section I focus on specific webpages on websites and take you even closer to piecing the SEO puzzle together.
SEO Chapter 2: Duplication and Canonicalization
After analyzing a website's domain name and general design, my colleagues and I check for one of the most common SEO mistakes on the Internet, canonicalization. For SEOs, canonicalization refers to individual webpages that can be loaded from multiple URLs.
NOTE In this discussion, "canonicalization" simply refers to the concept of picking an authoritative version of a URL and propagating its usage, as opposed to using other variants of that URL. On the other hand, the book discusses the specific canonical link element in several places, including in Chapter 5.
Remember that in Chapter 1 I discussed popularity? (Come on, it hasn't been that long.) What do you think happens when links that are intended to go to the same page get split up among multiple URLs? You guessed it: the popularity of the pages gets split up. Unfortunately for web developers, this happens far too often because the default settings for web servers create this problem. The following lists show the negative SEO effects of using the default settings on the two most common web servers:
Apache web server:
http://www.example.com/
http://www.example.com/index.html
http://example.com/
http://example.com/index.html
Microsoft Internet Information Services (IIS):
http://www.example.com/
http://www.example.com/default.asp (or .aspx depending on the version)
http://example.com/
http://example.com/default.asp (or .aspx)
Or any combination with different capitalization.
Each of these URLs spreads out the value of inbound links to the homepage. This means that if the homepage has 100 links to these various URLs, the major search engines only give them credit separately, not in a combined manner.
NOTE Don't think it can happen to you? Go to http://www.mattcutts.com and wait for the page to load. Now, go to http://mattcutts.com and notice what happens. Look at that, canonicalization issues. What's the significance of this example? Matt Cutts is the head of Google's web spam team and helped write many of the algorithms we SEOs study. If he is making this mistake, odds are your less informed clients are as well.
Luckily for SEOs, web developers created methods for redirection so that URLs can be changed and combined. Two primary types of server redirects exist—301 redirects and 302 redirects:
• A 301 indicates an HTTP status code of "Moved Permanently."
• A 302 indicates a status code of "Moved Temporarily" (labeled "Found" in HTTP/1.1).
Other redirect methods exist, such as the meta refresh and various JavaScript relocation commands. Avoid these methods. Not only do they not pass any authority from origin to destination, but engines are unreliable about following the redirect path.
TIP You can read all of the HTTP status codes at http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html.
Though the difference between 301 and 302 redirects appears to be merely semantics, the actual results are dramatic. Google decided a long time ago to not pass link juice (ranking power) equally between normal links and server redirects. At SEOmoz, I did a considerable amount of testing around this subject and have concluded that 301 redirects pass between 90 percent and 99 percent of their value, whereas 302 redirects pass almost no value at all. Because of this, my co-workers and I always look to see how non-canonicalized pages are being redirected.
It's not just semantics. How a page is redirected (whether by a 301 or a 302 redirect) matters.
WARNING Older versions of IIS use 302 redirects by default. D'oh! Be sure to look out for this. You can see worthless redirects all around popular IIS-powered websites like microsoft.com and myspace.com. The value of these redirects is being completely negated by a single value difference!
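Checking which status code a URL actually returns takes only a few lines. Here is a sketch using Python's standard library; the host and path are placeholders for whatever non-canonical URL you want to test.

```python
# Sketch: see whether a URL answers with a 301 or a 302 (or no redirect at all).
# "example.com" and "/index.html" are placeholders, not a real test case.
import http.client

conn = http.client.HTTPConnection("example.com")
conn.request("GET", "/index.html")
resp = conn.getresponse()

# http.client does not follow redirects, so the raw status code is visible.
print(resp.status, resp.reason)        # e.g. "301 Moved Permanently"
print(resp.getheader("Location"))      # where the redirect points, if anywhere
```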
Canonicalization is not limited to hostnames and filenames. It also dictates forward slashes in URLs. Try going to http://www.google.com and notice that you will automatically get redirected to http://www.google.com/ (notice the trailing forward slash). This is happening because technically this is the correct format for the URL. Although this is a problem that is largely solved by the search engines already (they know that www.google.com is intended to mean the same as www.google.com/), it is still worth noting because many servers will automatically 301 redirect from the version without the trailing slash to the correct version. By doing this, a link pointing to the wrong version of the URL loses between 1 percent and 10 percent of its worth due to the 301 redirect. The takeaway here is that whenever possible, it is better to link to the version with the forward slash. There is no reason to lose sleep over this (because the engines have mostly solved the problem), but it is still a point to consider.
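On Apache, the usual fix for the non-www and index-page variants listed earlier is a pair of 301 rules in .htaccess. The following is a sketch that assumes www.example.com is the preferred hostname and that mod_rewrite is enabled; it is not a drop-in config for every server.

```
# Hypothetical .htaccess sketch: collapse common duplicate URLs with 301s
# (assumes mod_rewrite is available and www.example.com is the canonical host).
RewriteEngine On

# example.com/anything -> www.example.com/anything
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# /index.html -> /
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]
```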
CROSSREF The right and wrong usage of 301 and 302 redirects is discussed in Chapter 3. The correct syntax and usage of the canonical link element is discussed in Chapter 5.
SEO Chapter-2: Don't Fool Yourself, Looks Matter
I once talked to a website owner who had an 80 percent bounce rate on his homepage and figured it was normal. Can you imagine if 80 percent of the people who looked at you immediately ran in the opposite direction? This isn't normal. Web design is an element of SEO that many amateur SEOs miss. It doesn't matter if you can get high rankings if none of the searchers stays on the given webpage after clicking through.
SEO-friendly web design is a lot like getting a prom date; appearance matters. People make decisions about the credibility of a website the instant the page loads. Like people, credible websites have a very specific look and feel to them. They generally have a clear logo in the top left, and a navigation bar horizontally on the top of the page or vertically on the left-hand side. They have fewer than five colors in their layout (not including images), and they have clear, readable text.
Would you feel comfortable leaving your children with a person in a bright orange prison jumpsuit? Of course not! In the same way, visitors to websites are not going to feel comfortable if they are greeted with pop-ups, loud music, and a multicolored skull logo.
Of course those are extreme examples. The common mistakes that I see are more along the line of the following:
• Lack of focus
• Crowded text
• Slow loading times
• Auto-playing music
• Unclear navigation
• Excess redirects
As an SEO, you need to stress the importance of good design. Though it may be fun and exciting to stretch the limits, it is not fun to be poor because 80 percent of your client's would-be customers leave the website directly after entering.
SEO Chapter-2: Relearning How You See the Web
In This Chapter
• Analyzing how a website fits in its "web neighborhood"
• Viewing websites like an SEO
• Assessing good site architecture and webpages from an SEO perspective
• Assessing website content like an SEO
When people surf the Internet, they generally view each domain as its own island of information. This works perfectly well for the average surfer but is a big mistake for beginner SEOs. Websites, whether they like it or not, are interconnected. This is a key perspective shift that is essential for understanding SEO.
Take Facebook, for example. It started out as a "walled garden" with all of its content hidden behind a login. It thought it could be different and remain completely independent. This worked for a while, and Facebook gained a lot of popularity. Eventually, an ex-Googler and his friend became fed up with the locked-down communication silo of Facebook and started a wide open website called Twitter. Twitter grew even faster than Facebook and challenged it as the media darling. Twitter was smart and made its content readily available to both developers (through APIs) and search engines (through indexable content).
Facebook responded with Facebook Connect (which enables people to log in to Facebook through other websites) and opened its chat protocol so its users could communicate outside of the Facebook domain. It also made a limited amount of information about users visible to search engines. Facebook is now accepting its place in the Internet community and is benefiting from its decision to embrace other websites. What it misjudged early on was that websites are best when they are interconnected. Being able to see this connection is one of the skills that separates SEO professionals from SEO fakes.
I highly recommend writing down everything you notice in a section of a notebook identified with the domain name and date of viewing.
In this chapter you learn the steps that the SEO professionals at SEOmoz go through either before meeting with a client or at the first meeting (depending on the contract). When you view a given site in the way you are about to learn in this chapter, you need to take detailed notes. You are likely going to notice a lot about the website that can use improvement, and you need to capture this information before details distract you.
Keep Your Notes Simple
The purpose of the notebook is simplicity and the ability to go back frequently and review your notes. If actual physical writing isn't your thing, consider a low-tech text editor on your computer, such as Windows Notepad or the Mac's TextEdit.
Bare-bones solutions like a notebook or text editor help you avoid the distraction of the presentation itself and focus on the important issues: the characteristics of the website that you're evaluating.
If you think it will be helpful and you have Internet access readily available, I recommend bringing up a website you are familiar with while reading through this chapter. If you choose to do this, be sure to take a lot of notes in your notebook so you can review them later.
The 1,000-Foot View—Understanding the Neighborhood
Before I do any work on a website I try to get an idea of where it fits into the grand scheme of things on the World Wide Web. The easiest way to do this is to run searches for some of the competitive terms in the website's niche. If you imagine the Internet as one giant city, you can picture domains as buildings. The first step I take before working on a client's website is figuring out in which neighborhood its building (domain) resides.
This search result page is similar to seeing a map of the given Internet neighborhood. You usually can quickly identify the neighborhood anchors (due to their link popularity) and specialists in the top 10 (due to their relevancy). You can also start to get an idea of the maturity of the result based on the presence of spam or low-quality websites.
During client meetings, when I look at the search engine result page for a competitive term like advertising, I am not looking for websites to visit but rather trying to get a general idea of the maturity of the Internet neighborhood. I am very vocal when I am doing this and have been known to question out loud, "How did that website get there?" A couple times, the client momentarily thought I was talking about his website and had a quick moment of panic. In reality, I am commenting on a spam site I see rising up the results.
Also, take note that regardless of whether or not you are logged into a Google account, the search engine will automatically customize your search results based on the links you click most. This can be misleading because it will make your favorite websites rank higher for you than they do for the rest of the population. To turn this off, append "&pws=0" to the end of the Google URL.
Along with looking at the results themselves, I look at the other data present on the page. The amount of advertisements on the search result gives a rough idea of how competitive it is. For example, a search for buy viagra will return a full page height worth of ads, whereas a search for women that look like Drew Carey won't likely return any. This is because more people are searching for the blue pill than are searching for large, bald women with nerd glasses.
In addition to the ads, I also look for signs of temporal algorithms. Temporal algorithms are ranking equations that take into account the element of time with regards to relevancy. These tend to manifest themselves as news results and blog posts.
Taking Advantage of Temporal Algorithms
You can use the temporal algorithms to your advantage. I accidentally did this once with great success. I wrote a blog post about Michael Jackson's death and its effect on the search engines a day after he died. As a result of temporal algorithms my post ranked in the top 10 for the query "Michael Jackson" for a short period following his death. Because of this high ranking, tens of thousands of people read my article. I thought it was because I was so awesome, but after digging into my analytics I realized it was because of unplanned use of the temporal algorithms. If you are a blogger, this tactic of quickly writing about news events can be a great traffic booster.
After scanning search result pages for the given website's niche, I generally get a sense for that neighborhood of the Internet. The important takeaway is to get an idea of the level of competition, not to figure out the ins and outs of how specific websites are ranking. That comes later.
Easy De-Personalization in Firefox and Chrome
Most SEOs perform searches dozens or hundreds of times per day, and when you do, it's important that de-personalized results appear so that you see what a "typical" searcher would see, as opposed to search results influenced by your own search history.
Firefox is a terrific browser for SEOs for many reasons, but one of its most helpful features is the ability to search right from the address field of the browser, the area at the top of the browser where you normally see the URL of the web page you're on. Better yet, with a little customization, you can easily perform Google searches that are de-personalized (although not de-geotargeted).
1. From the Bookmarks | Organize Bookmarks... menu, select any bookmarks folder in the left pane. (Do not simply select the All Bookmarks folder, because it won't work.)
2. Right-click the folder and select New Bookmark...
3. Add the following values to the fields:
Name: Google de-personalized search
Location: http://www.google.com/search?&q=%s&pws=0
Tags: (Optional. Add any tags you want.)
Keyword: g
Description: (Optional. Use this to describe the search.)
4. Click Add.
That's it. Now, go to the Address field in Firefox (where you see a URL at the top of the browser) and type something like this:
g hdmi cables
This tells Google (g) to search for "hdmi cables". More important, because your Location field included &pws=0, that URL parameter will carry over to your search result. From now on, if you want to perform a de-personalized Google search, simply type "g" (no quotes) and the query term in your URL field.
Use this process for creating as many custom searches as you like, keeping these important factors in mind:
1. The Location field must contain the exact URL of the search result, with the exception of the %s variable, which will be replaced with your query term automatically.
2. The Keyword field is where you'll type before your search query to tell Firefox which custom query you'll be running. Be brief and accurate. I use terms like "b" for Bing, "tc" for text cache, and so on.
This functionality carries over to Google's Chrome browser too, because Chrome can import bookmarks from any other browser you use. If you're a Chrome user, simply import your Firefox bookmarks from the Chrome | Import Bookmarks and Settings menu, and you can search from the Chrome address bar just like you did in Firefox.
Action Checklist
When viewing a website from the 1,000-foot level, be sure to complete the following:
• Search for the broadest keyword that the given site might potentially rank for
• Identify the maturity of the search engine results page (SERP) based on the criteria listed in this chapter
• Identify major competitors and record them in a list for later competitive analysis
This section discussed analyzing websites at their highest level. At this point, the details don't matter. Rather it is macro patterns that are important. The following sections dive deeper into the website and figure out how everything is related. Remember, search engines use hundreds of metrics to rank websites. This is possible because the same website can be viewed many different ways.
The 100-Foot View—The Website
When professional SEOs first come to a website that they plan to work with, they view it through a very different lens than if they were just idly surfing. They instinctively start viewing it from the perspective of a search engine. The following are the elements that my colleagues and I pay the most attention to.
SEO Chapter 2: How Important Is a Domain Name?
I could probably write an entire book on this subject. (Hear that, Wiley Publishing? That's the sound of money.) From a marketing perspective, a domain name is the single most important element of a website. Unlike a brick-and-mortar company, websites don't have visual cues closely associated with them. Whereas potential customers can use visual cues to identify if a physical building is more likely a barber shop or a bank, they are not able to tell the difference between domain names. All domain names use the exact same format: http://, an optional subdomain, a dot, the root domain, a dot, and the TLD. Take, for example, http://www.google.com or http://www.bing.com. To an outsider, there is no reason to think that either of these resources would be a search engine. They don't contain the word search, and if their brands weren't as strong as they are, their gibberish names wouldn't mean anything to anyone. In fact, if you look at the top 100 most linked-to domains on the Internet, you see this trend over and over again: Wikipedia, YouTube, W3, Amazon, Macromedia, MSN, Flickr, Twitter, Digg, Technorati, IMDB, eBay—the list goes on.
This is where people get confused. They see websites like this and think that the domain name doesn't matter. They register domains that are hard to pronounce (SEOmoz) or hard to spell (Picnik) and figure they don't have to worry. The problem is they don't realize that the popular websites got popular not because of their domain names, but rather despite their domain names. Google was such an outstanding product with a plan that was executed so well that it could have been named BackRub and still been successful. (Note: It was originally called BackRub. I am just amusing myself.)
As an SEO, if you find yourself in the position of changing or choosing a domain name, you need to make a difficult decision. How confident are you in the client's idea? Is it an idea that serves the entire world, or is it only useful to a few thousand people? If the website is world changing, it might actually benefit from a gibberish name. If the name is gibberish and very successful, people naturally start to associate its name with its service. For example, Google is now synonymous with "search." However, if the idea doesn't end up being world changing (and most websites aren't), a gibberish domain name can hurt the website. What are the odds that the general populace will type in spoke.com (a real website) to find personal profiles?
A nonsensical domain name can hurt a website, making it harder for people (and search engines) to find that site and associate it with the concepts that the site focuses on.
For the vast majority of websites, a "search friendly" domain name is best. The search engines will always be constrained by the fact that many people search for exact URLs when they want to go to websites. Of course, the most relevant and popular result for the query "myspace.com" would be www.myspace.com. You can use this to your advantage.
Say your clients own a hotel in Seattle. For them, the best domain name would be www.seattlehotel.com so that they could rank for the query Seattle Hotel. They should not worry about becoming a verb because the demand is not high enough for their service, and the benefits of an exact match domain name outweigh the chances of their website changing the world. Need more proof? The domain names porn.com and sex.com sold for $9.5 million and $12 million, respectively.
NOTE For a while, the most searched-for term on both Yahoo! and MSN was Google. People would search for the search leader in Yahoo! and MSN, click through to google.com, and then type their search query. This bothered Yahoo! so much that it eventually put a Yahoo! search bar as the number one result for Google.
But what if a killer domain name is not available? You are not alone. As of the time of writing, all of the combinations for .com domains with three or fewer characters were already owned. If you can't get seattlehotel.com, you will just need to be more creative. To limit your ability to hurt yourself by being "too creative," I advise you to look out for the following when registering a domain name:
• Avoid hyphens: In domain names, hyphens detract from credibility and act as a spam indicator.
• Avoid generic, uncommon top-level domains (TLDs): Like hyphens, TLDs such as .info, .cc, .ws, and .name are spam indicators.
• Avoid domain names longer than 15 characters: People are lazy; don't try to make them type a novel just to access your website.
• Be aware of permutations: The owners of ExpertsExchange.com built a sizable brand before they realized their domain name could be misconstrued as ExpertSexChange.com.
This advice about domains applies mostly to people who are either starting out from scratch, or for whom purchasing a better domain is an option. If you're an SEO, you'll probably have clients that are stuck with the domain they have, either due to branding or financial constraints. If that's you, never fear. While a smartly chosen, keyword-rich domain is often an ideal situation, plenty of sites succeed without one. I doubt, for example, that Amazon.com is on the lookout for a more book- or electronics-based domain name.
SEO Chapter-1: Link Relevancy
As search engines matured, they started identifying more metrics for determining rankings. One that stood out among the rest was link relevancy.
The difference between link relevancy and link popularity (discussed in the previous section) is that link relevancy does not take into account the power of the link. Instead, it is a natural phenomenon that works when people link out to other content.
Let me give you an example of how it works. Say I own a blog where I write about whiteboard markers. (Yes, I did just look around my office for an example to use, and yes, there are actually people who blog about whiteboard markers. I checked.) Ever inclined to learn more about my passion for these magical writing utensils, I spend part of my day reading online what other people have to say about whiteboard markers.
On my hypothetical online reading journey, I find an article about the psychological effects of marker color choice. Excited, I go back to my website to blog about the article so (both of) my friends can read about it. Now here is the critical takeaway. When I write the blog post and link to the article, I get to choose the anchor text. I could choose something like "click here," but more likely I choose something that is relevant to the article. In this case I choose "psychological effects of marker color choice." Someone else who links to the article might use the link anchor text "marker color choice and the effect on the brain."
People have a tendency to link to content using the anchor text of either the domain name or the title of the page. Use this to your advantage by including keywords you want to rank for in these two elements.
This human-powered information is essential to modern-day search engines. These descriptions are relatively unbiased and produced by real people. This metric, in combination with complicated natural language processing, makes up the lion's share of relevancy indicators online.
Other important relevancy indicators are link sources and information hierarchy. For example, the search engines can also use the fact that I linked to the color choice article from a blog about whiteboard markers to supplement their understanding of relevancy. Similarly, they can use the fact that the original article was located at the URL www.example.com/vision/color/ to determine the high-level positioning and relevancy of the content. As you read later in this book (Chapter 2 specifically), these secrets are essential for SEOs to do their job.
Beyond specific anchor text, proximal text—the certain number of characters preceding and following the link itself—has some value. Something that's logical but annoying is when people use a verb as anchor text, such as "Frank said ..." or "Jennifer wrote ...", using "said" or "wrote" as the anchor text pointing back to the post. In a situation like that, engines have figured out how to apply the context of the surrounding copy to the link.
SEO Chapter-1: The Secrets of Relevancy
In the previous section, I discussed how popular pages (as judged by links) rank higher. By this logic, you might expect that the Internet's most popular pages would rank for everything. To a certain extent they do (think Wikipedia!), but the reason they don't dominate the rankings for every search result page is that search engines put a lot of emphasis on determining relevancy.
Text Is the Currency of the Internet
Relevancy is the measurement of the theoretical distance between two corresponding items with regard to relationship. Luckily for Google and Microsoft, modern-day computers are quite good at calculating this measurement for text.
By my estimations, Google owns and operates well over a million servers. The electricity to power these servers is likely one of Google's larger operating expenses. This energy limitation has helped shape modern search engines by putting text analysis at the forefront of search. Quite simply, it takes less computing power and is much simpler programmatically to determine relevancy between a text query and a text document than it is between a text query and an image or video file. This is the reason why text results are so much more prominent in search results than videos and images.
As of this writing, the most recent time that Google publicly released the size of its indices was in 2006. At that time it released the numbers shown in Table 1-1.
Table 1-1: Size of Google Indices
Data                    Size in Terabytes
Crawl Index             800
Google Analytics        200
Google Base             2
Google Earth            70
Orkut                   9
Personalized Search     4
So what does this emphasis on textual content mean for SEOs? To me, it indicates that my time is better spent optimizing text than images or videos. This strategy will likely have to change in the future as computers get more powerful and energy efficient, but for right now text should be every SEO's primary focus.
This is especially true until Google finds better ways to interpret and grade non-textual media.
But Why Content?
The most basic structure a functional website could take would be a blank page with a URL. For example purposes, pretend your blank page is on the fake domain www.WhatIsJessicaSimpsonThinking.com. (Get it? It is a blank page.) Unfortunately for the search engines, clues like top-level domains (.com, .org, and so on), domain owners (WHOIS records), code validation, and copyright dates are poor signals for determining relevancy. This means your page with the dumb domain name needs some content before it is able to rank in search engines.
The search engines must use their analysis of content as their primary indication of relevancy for determining rankings for a given search query. For SEOs, this means the content on a given page is essential for manipulating—that is, earning—rankings. In the old days of AltaVista and
other search engines, SEOs would just need to write "Jessica Simpson" hundreds of times on the site to make it rank #1 for that query. What could be more relevant for the query "Jessica Simpson" than a page that says Jessica Simpson 100 times? (Clever SEOs will realize the answer is a page that says "Jessica Simpson" 101 times.) This metric, called keyword density, was quickly manipulated, and the search engines of the time diluted the power of this metric on rankings until it became almost useless. Similar dilution has happened to the keywords meta tag, some kinds of internal links, and H1 tags.
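For historical flavor, here is roughly what that old keyword-density metric looked like, and why stuffing a phrase into a page gamed it so easily. The spammy sample text is invented for the demonstration.

```python
# A sketch of the old, easily gamed keyword-density metric described above.
def keyword_density(text, phrase):
    """Share of the page's words taken up by occurrences of `phrase`."""
    words = text.lower().split()
    phrase_words = phrase.lower().split()
    n = len(phrase_words)
    hits = sum(1 for i in range(len(words) - n + 1)
               if words[i:i + n] == phrase_words)
    return hits * n / len(words) if words else 0.0

spam = ("Jessica Simpson " * 100) + "buy posters here"
print(f"{keyword_density(spam, 'Jessica Simpson'):.2%}")
# roughly 98% -- trivially manipulated, which is why the metric was diluted
```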
Despite being more sophisticated, modern-day search engines still work essentially the same way they did in the past—by analyzing content on the page.
Hey, Ben Stein, thanks for the history lesson, but how does this apply to modern search engines? The funny thing is that modern-day search engines still work essentially the same way they did back in the time of keyword density. The big difference is that they are now much more sophisticated. Instead of simply counting the number of times a word or phrase is on a webpage, they use natural language processing algorithms and other signals on a page to determine relevancy. For example, it is now fairly trivial for search engines to determine that a piece of content is about Jessica Simpson if it mentions related phrases like "Nick Lachey" (her ex-husband), "Ashlee Simpson" (her sister), and "Chicken of the Sea" (she is infamous for thinking the tuna brand "Chicken of the Sea" was made from chicken). The engines can do this for a multitude of languages and with astonishing accuracy.
Don't believe me? Try going to Google right now and searching related:www.jessicasimpson.com. If your results are like mine, you will see websites about her movies, songs, and sister. Computers are amazing things. In addition to the words on a page, search engines use signals like image meta information (the alt attribute), link profile and site architecture, and information hierarchy to determine how relevant a given page that mentions "Jessica" is to a search query for "The Simpsons."
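The sketch below is a deliberately naive illustration of that disambiguation idea: counting co-occurring related phrases to decide whether an ambiguous page is about the singer or the cartoon. The phrase lists and scoring are invented for the example and bear no resemblance to a real engine's entity data.

```python
# A toy disambiguation sketch: related phrases that co-occur with an
# ambiguous term push the page toward one interpretation or the other.
RELATED = {
    "jessica simpson": ["nick lachey", "ashlee simpson", "chicken of the sea"],
    "the simpsons":    ["homer", "springfield", "bart"],
}

def topic_scores(page_text):
    text = page_text.lower()
    return {topic: sum(text.count(phrase) for phrase in phrases)
            for topic, phrases in RELATED.items()}

page = ("Jessica talked about Nick Lachey and her sister Ashlee Simpson, "
        "then joked about Chicken of the Sea.")
print(topic_scores(page))
# {'jessica simpson': 3, 'the simpsons': 0} -> the page is about the singer
```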
SEO Chapter 1: Domain and Page Popularity
There are hundreds of factors that help engines decide how to rank a page. And in general, those hundreds of factors can be broken into two categories—relevance and popularity (or "authority"). For the purposes of this demonstration you will need to completely ignore relevancy for a second. (Kind of like the search engine Ask.com.) Further, within the category of popularity, there are two primary types—domain popularity and page popularity. Modern search engines rank pages by a combination of these two kinds of popularity metrics. These metrics are measurements of link profiles. To rank number one for a given query you need to have the highest amount of total popularity on the Internet. (Again, bear with me as we ignore relevancy for this section.)
This is very clear if you start looking for patterns in search result pages. Have you ever noticed that popular domains like Wikipedia.org tend to rank for everything? This is because they have an enormous amount of domain popularity. But what about those competitors who outrank me for a specific term with a practically unknown domain? This happens when they have an excess of page popularity. See Figure 1-1.
Figure 1-1: Graph showing different combinations of relevancy and popularity metrics that can be used to achieve high rankings. [Bar chart of link popularity, split into domain popularity and page popularity, for en.wikipedia.org/, get.adobe.com/reader/, and awesome.com/.]
Although en.wikipedia.org has a lot of domain popularity and get.adobe.com/reader/ has a lot of page popularity, www.awesome.com ranks higher because it has a higher total amount of popularity. This fact and relevancy metrics (discussed later in this chapter) are the essence of Search Engine Optimization. (Shoot! I unveiled it in the first chapter, now what am I going to write about?)
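To make the figure's point concrete, here is a toy model that ranks the three URLs purely by the sum of their domain and page popularity, ignoring relevancy just as this section does. The numbers are invented to mirror Figure 1-1, not real link counts.

```python
# A toy model of ranking by total popularity (domain + page), with
# relevancy deliberately ignored. Scores are invented for illustration.
pages = {
    "en.wikipedia.org/":     {"domain_popularity": 95, "page_popularity": 10},
    "get.adobe.com/reader/": {"domain_popularity": 30, "page_popularity": 70},
    "awesome.com/":          {"domain_popularity": 60, "page_popularity": 55},
}

def total_popularity(scores):
    return scores["domain_popularity"] + scores["page_popularity"]

ranking = sorted(pages, key=lambda url: total_popularity(pages[url]), reverse=True)
for position, url in enumerate(ranking, start=1):
    print(position, url, total_popularity(pages[url]))
# awesome.com/ ranks first because its combined popularity is highest
```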
Popularity Top Ten Lists
The top 10 most linked-to domains on the Internet (at the time of writing) are:
• Google.com
• Adobe.com
• Yahoo.com
• Blogspot.com
• Wikipedia.org
• YouTube.com
• W3.org
• Myspace.com
• Wordpress.com
• Microsoft.com
The top 10 most linked-to pages on the Internet (at the time of writing) are:
• http://wordpress.org/
• http://www.google.com/
• http://www.adobe.com/products/acrobat/readstep2.html
• http://www.miibeian.gov.cn/
• http://validator.w3.org/check/referer
• http://www.statcounter.com/
• http://jigsaw.w3.org/css-validator/check/referer
• http://www.phpbb.com/
• http://www.yahoo.com/
• http://del.icio.us/post
Source: SEOmoz's Linkscape—Index of the World Wide Web
Before I summarize I would like to nip the PageRank discussion in the bud. Google releases its PageRank metric through a browser toolbar. This is not the droid you are looking for. That green bar represents only a very small part of the overall search algorithm.
Not only that, but at any given time, the TBPR (Toolbar PageRank) value you see may be 60-90 days old or more, and it's a single-digit representation of what's probably a very long decimal value.
Just because a page has a PageRank of 5 does not mean it will outrank all pages with a PageRank of 4. Keep in mind that major search engines do not want you to reverse engineer their algorithms. As such, publicly releasing a definitive metric for ranking would be idiotic from a business perspective. If there is one thing that Google is not, it's idiotic.
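It is widely assumed (though Google has never confirmed the details) that the toolbar digit is a roughly logarithmic bucketing of a far more precise internal value. The sketch below uses an invented base purely to show why two pages with the same green bar can sit on very different underlying scores.

```python
# Assumed (not confirmed) log-style bucketing of a precise internal score
# into the 0-10 toolbar digit. The base of 8 is invented for illustration.
import math

def toolbar_bucket(internal_score, base=8.0):
    """Map a raw score (>= 1) onto a 0-10 scale under a log-style bucketing."""
    return min(10, int(math.log(internal_score, base)))

print(toolbar_bucket(5_000))   # 4
print(toolbar_bucket(30_000))  # also 4 -- same green bar, ~6x the raw score
```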
Google makes scraping (automatically requesting and distributing) its PageRank metric difficult. To get around the limitations, you need to write a program that requests the metric from Google and identifies itself as the Google Toolbar.
In my opinion, hyperlinks are the most important factor when it comes to ranking web pages. This is the result of them being difficult to manipulate. Modern search engines look at link profiles from many different perspectives and use those relationships to determine rank. The takeaway for you is that time spent earning links is time well spent. In the same way that a rising tide raises all ships, popular domains raise all pages. Likewise, popular pages raise the given domain metrics.
In the next section I want you to take a look into the pesky missing puzzle piece of this chapter: relevancy. I am going to discuss how it interacts with popularity, and I may or may not tell you another fairy tale.
SEO Chapter 1: Understanding Search Engine Optimization
At Google, search engineers talk about "80-20" problems. They are describing situations where the last 20 percent of the problem is 80 percent of the work. Learning SEO is one of these problems. Eighty percent of the knowledge SEOs need is available online for free. Unfortunately, the remaining 20 percent takes the majority of the time and energy to find and understand. My goal with this book is to solve this problem by making the last 20 percent as easy to get as the first 80 percent. Though I don't think I will be able to cover the entire 20 percent (some of it comes from years of practice), I am going to write as much actionable advanced material as humanly possible.
This book is for those who already know the basics of SEO and are looking to take their skills to the next level. Before diving in, try reading the following list:
• robots.txt
• sitemap
• nofollow
• 301 redirect
• canonicalization
If you are not sure what any of the items in this list are, you should go
over to the nearest computer and read the article "The Beginner's Guide to SEO" at
http://www.seomoz.org/article/beginners-guide-to-search-engine-optimization
This free article can teach you everything you need to know to use this book to its fullest. Done with that? Great, now we can begin.
The Secrets of Popularity
Once upon a time there were two nerds at Stanford working on their PhDs. (Now that I think about it, there were probably a lot more than two nerds at Stanford.) Two of the nerds at Stanford were not satisfied with the current options for searching online, so they attempted to develop a better way.
Being long-time academics, they eventually decided to take the way academic papers were organized and apply that to webpages. A quick and fairly objective way to judge the quality of an academic paper is to see how many times other academic papers have cited it. This concept was easy to replicate online because the original purpose of the Internet was to share academic resources between universities. The citations manifested themselves as hyperlinks once they went online. One of the nerds came up with an algorithm for calculating these values on a global scale, and they both lived happily ever after.
Of course, these two nerds were Larry Page and Sergey Brin, the founders of Google, and the algorithm that Larry invented that day was what eventually became PageRank. Long story short, Google ended up becoming a big deal and now the two founders rent an airstrip from NASA so they have somewhere to land their private jets. (Think I am kidding? See http://searchengineland.com/your-guide-to-the-google-jet-12161.)
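For readers who want to see the citation-counting idea in code, below is a compact, textbook-style power-iteration version of PageRank run over a tiny invented link graph. It is a simplified sketch of the published algorithm, not Google's production system.

```python
# A simplified sketch of the PageRank idea: a page is important if
# important pages link to it. Textbook power iteration, tiny toy graph.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

# A hypothetical four-page site where most pages link to the homepage.
graph = {
    "homepage": ["products"],
    "products": ["homepage"],
    "blog":     ["homepage", "products"],
    "about":    ["homepage"],
}
for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(f"{page:9s} {score:.3f}")
```

The homepage ends up with the highest score because the most rank flows into it, which echoes the earlier point that homepages are almost always a domain's most linked-to pages.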
Relevance, Speed, and Scalability
Hypothetically, the most relevant search engine would have a team of experts on every subject in the entire world—a staff large enough to read, study, and evaluate every document published on the web so they could return the most accurate results for each query submitted by users.
The fastest search engine, on the other hand, would crawl a new URL the very second it's published and introduce it into the general index immediately, available to appear in query results only seconds after it goes live.
The challenge for Google and all other engines is to find the balance between those two scenarios: to combine rapid crawling and indexing with a relevance algorithm that can be instantly applied to new content. In other words, they're trying to build scalable relevance. With very few exceptions, Google is uninterested in hand-removing (or hand-promoting) specific content. Instead, its model is built around identifying characteristics in web content that indicate the content is especially relevant or irrelevant, so that content all across the web with those same characteristics can be similarly promoted or demoted.
This book frequently discusses the benefits of content created with the user in mind. To some hardcore SEOs, Google's "think about the user" mantra is corny; they'd much prefer to know a secret line of code or server technique that bypasses the intent of creating engaging content.
While it may be corny, Google's focus on creating relevant, user-focused content really is the key to its algorithm of scalable relevance. Google is constantly trying to find ways to reward content that truly answers users' questions and ways to minimize or filter out content built for content's sake. While this book discusses techniques for making your content visible and accessible to engines, remember that means talking about content constructed with users in mind, designed to be innovative, helpful, and to serve the query intent of human users. It might be corny, but it's effective.
That fateful day, the Google Guys capitalized on the mysterious power of links. Although a webmaster can easily manipulate everything (word choice, keyword placement, internal links, and so on) on his or her own website, it is much more difficult to influence inbound links. This natural link profile acts as an extremely good metric for identifying legitimately popular pages.
NOTE Google's PageRank was actually named after its creator, Larry Page. Originally the algorithm was named BackRub after its emphasis on backlinks. Later, its name was changed to PageRank because of its connections to Larry Page's last name and the ability for the algorithm to rank pages.
Larry Page's original paper on PageRank, "The Anatomy of a Large-Scale Hypertextual Web Search Engine," is still available online. If you are interested in reading it, it is available on Stanford's website at http://infolab.stanford.edu/~backrub/google.html. It is highly technical, and I have used it on more than one occasion as a sleep aid. It's worth noting that the original PageRank as described in this paper is only a tiny part of Google's modern-day search algorithm.
Now wait a second—isn't this supposed to be a book for advanced SEOs? Then why am I explaining to you the value of links? Relax, there is a method to my madness. Before I am able to explain the more advanced secrets, I need to make sure we are on the same page.
As modern search engines evolved, they started to take into account the link profile of both a given page and its domain. They found out that the relationship between these two indicators was itself a very useful metric for ranking webpages.
Why This Book Is Better Than Other SEO Books
Modern SEO is complicated, fast moving, and rife with misconceptions. This makes it extremely difficult to learn. When I began researching this book, I read all of the major SEO books that were available. I quickly found that they were full of theory and lacked actionable steps to really help the reader master the subject.
I wrote this book with the goal of building the bridge between theory and action by bringing together all of the best sources of information I have found and putting them in a format that makes it easy to understand and, more importantly, do SEO like a professional. This emphasis on action follows the steps I originally used to learn SEO. I believe this focus on process followed by explanation is unique among SEO books on the market, and I believe it will make the difference that allows you to outrank your competition.
SEO: Who This Book Is For
This book is for the SEO who already knows the basics and wants to take that knowledge to the next level so that they can make more money. In the SEO industry, the best way I have found to do this is SEO consulting.
This book is written as a guide to becoming an SEO consultant, or for those who want to use the strategies of professional SEO consultants. It clearly lays out the processes and perspectives I used at SEOmoz when I did consulting for some of the most well-known websites on the Internet. It is intended for those who love the Internet and strive to influence how it operates.
SEO: Read This First
Why would someone like myself want to publish my SEO secrets for the world to read? Doesn't this destroy my competitive advantage? Won't I surely go broke and starve on the street? Won't my friends mock me and my family disown me?
For two reasons, the answer is probably not.
• The first reason is the size of the market. The Internet is incredibly large and growing at an astounding rate. The market for SEO is following a similar path. There is absolutely no way I could work for all of the websites that need SEO consulting. As such, I am happy to pass the work on to others and teach them how to succeed. It is no money out of my pocket, and it makes me feel like I am contributing to a greater good. I learned most of what I know about SEO from others and, as such, feel obligated to spread the knowledge.
• The second reason has to do with SEOmoz, the company I used to work for. SEOmoz provides tools to help SEOs do their jobs. As such, it is to my advantage to promote and train other SEOs. Just like Google benefits from getting more people online, I benefit from teaching others how to do SEO. You may choose to use SEOmoz's competitors' services or you may not. That is completely up to you, and I will do my best to show you all the available options.