Thursday, 15 December 2011
How to Fix Crawl Errors in Google Webmaster Tools
Posted by Joe Robison
This post was originally in YouMoz, and was promoted to the main blog because it provides great value and interest to our community. The author's views are entirely his or her own and may not reflect the views of SEOmoz, Inc.
Seeing 12,000 crawl errors staring back at you in Webmaster Tools can make eradicating them seem like an insurmountable task. The key is to know which errors are the most crippling to your site, and which ones are simply informational and can be brushed aside so you can deal with the real meaty problems. It’s important to keep a close eye on your errors because of the impact they have on your users and on Google’s crawler.
Having thousands of 404 errors, especially for URLs that are indexed or linked to from other pages, makes for a poor user experience. If visitors land on multiple 404 pages in one session, their trust in your site decreases, which leads to frustration and bounces.
You also don’t want to miss out on the link juice from other sites that point to a dead URL on your site. If you fix that crawl error and redirect it to a good URL, you can capture that link to help your rankings.
Additionally, Google does have a set crawl budget allotted to your site, and if a lot of the robot’s time is spent crawling your error pages, it doesn’t have the time to get to your deeper, more valuable pages that are actually working.
Without further ado, here are the main categories that show up in the crawl errors report of Google Webmaster Tools:
HTTP
This section usually lists pages that returned errors such as 403s, which are not the biggest problems in Webmaster Tools. For a list of all the HTTP status codes, check out Google’s own help pages. Also check out SEO Gadget’s amazing Server Headers 101 infographic on SixRevisions.
In Sitemaps
Errors in sitemaps are often caused by old sitemaps that have since 404’d, or pages listed in the current sitemap that return a 404 error. Make sure that all the links in your sitemap are quality working links that you want Google to crawl.
One frustrating thing Google does is continually crawl old sitemaps that you have since deleted, to check that the sitemap and its URLs are in fact dead. If you have removed an old sitemap from Webmaster Tools and you don’t want it crawled, make sure you let that sitemap 404 and that you are not redirecting it to your current sitemap.
From Google employee Susan Moskwa:
“The best way to stop Googlebot from crawling URLs that it has discovered in the past is to make those URLs (such as your old Sitemaps) 404. After seeing that a URL repeatedly 404s, we stop crawling it. And after we stop crawling a Sitemap, it should drop out of your "All Sitemaps" tab.”
Not Followed
Most of these errors are caused by redirect problems. Minimize redirect chains, set any redirect timer to a short period, and don’t use meta refreshes in the head of your pages.
Matt Cutts has a good YouTube video on redirect chains; start at 2:45 if you want to skip ahead.
Google crawler exhausted after a redirect chain.
What to watch for after implementing redirects:
- When you redirect pages permanently, make sure they return the proper HTTP status code, 301 Moved Permanently.
- Make sure you do not have any redirect loops, where the redirects point back to themselves.
- Make sure the redirects point to valid pages and not 404 pages, or other error pages such as 503 (server error) or 403 (forbidden).
- Make sure your redirects actually point to a page and are not empty.
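The checklist above can be automated. Here is a minimal sketch, not a definitive implementation: `trace_redirects` and the `fetch` hook are hypothetical names I made up for illustration, and `fetch(url)` is assumed to make one request without following redirects and return the status code plus the Location header. The hop limit of 5 is also an assumption, not a documented Googlebot number.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical hook: fetch(url) -> (status_code, location_header_or_None),
# performing ONE request and never following redirects itself.
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = (301, 302, 303, 307, 308)


def trace_redirects(start: str, fetch: Fetch, max_hops: int = 5):
    """Follow a redirect chain hop by hop and return (hops, verdict)."""
    hops: List[Tuple[str, int]] = []
    seen = set()
    url = start
    while True:
        if url in seen:
            # The chain points back at a URL we already visited.
            return hops, "loop"
        seen.add(url)
        status, location = fetch(url)
        hops.append((url, status))
        if status in REDIRECT_CODES:
            if not location:
                # A redirect that does not point anywhere.
                return hops, "empty redirect target"
            if len(hops) > max_hops:
                return hops, "chain too long"
            url = location
            continue
        if status == 200:
            # Every hop before the final page should be a permanent 301.
            if any(code != 301 for _, code in hops[:-1]):
                return hops, "non-301 redirect in chain"
            return hops, "ok"
        # The chain ended on a 404, 403, 503 or some other error page.
        return hops, f"ends in error {status}"
```

To run it against a live site you would supply your own `fetch`, for example one built on Python's `http.client`, which does not follow redirects on its own.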
Tools to use:
- Check your redirects with a response header checker tool like URI Valet or the Check Server Headers Tool.
- Screaming Frog is an excellent tool to check which pages on your site are showing a 301 redirect, and which ones are showing 404 errors or 500 errors. The free version caps out at 500 pages per site; beyond that you need to buy the full version.
- The SiteOpSys Search Engine Indexing Checker is an excellent tool where you can put in a list of your URLs that you submitted as redirects. This tool will allow you to check your URLs in bulk to see which ones are indexing and which ones are not. If your original URLs that you had redirected are no longer indexing that means Google removed the old URL from its index after it saw the 301 redirect and you can remove that redirect line from your .htaccess file now.
Examine your site in the text-only version: view the cached version of the site from its Google SERP listing, then select the text-only version. Make sure you can see all your links and that they are not hidden by JavaScript, Flash, cookies, session IDs, DHTML, or frames.
Always use absolute rather than relative links. If content scrapers copy your images or links, they may reproduce your relative links on their own site, and if those are improperly parsed you may see Not Followed errors show up in your Webmaster Tools. This has happened with one of our sites, and it is almost impossible to find the source link that caused the error.
Not Found
Not found errors are by and large 404 errors on your site. 404 errors can occur a few ways:
- You delete a page on your site and do not 301 redirect it
- You change the name of a page on your site and don’t 301 redirect it
- You have a typo in an internal link on your site, which links to a page that doesn’t exist
- Someone else from another site links to you but has a typo in their link
- You migrate a site to a new domain and the subfolders do not match up exactly
Best practice: if you are getting good links to a 404’d page, you should 301 redirect it to the page the link was supposed to go to, or if that page has been removed then to a similar or parent page. You do not have to 301 redirect all 404 pages. This can in fact slow down your site if you have way too many redirects. If you have an old page or a large set of pages that you want completely erased, it is ok to let these 404. It is actually the Google recommended way to let the Googlebot know which pages you do not want anymore.
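As a rough illustration of that rule, the sketch below emits redirect lines only for 404’d URLs that have inbound links and a known replacement, and leaves everything else to 404. The function name, the input shapes and the example.com URLs are all my own assumptions; the output uses Apache’s `Redirect 301` directive from mod_alias, which is only one of several ways to write these rules in an .htaccess file.

```python
from typing import Dict, Iterable, List, Tuple


def redirect_rules(not_found: Iterable[Tuple[str, int]],
                   targets: Dict[str, str]) -> List[str]:
    """Build .htaccess lines for 404'd paths that are worth redirecting.

    not_found: (path, number_of_inbound_links) pairs, e.g. compiled by hand
               from your crawl error report (hypothetical input format).
    targets:   path -> absolute URL of the page the link should have reached.
    """
    rules = []
    for path, inbound_links in not_found:
        target = targets.get(path)
        # No links, or no sensible replacement page: let the URL keep 404ing.
        if inbound_links > 0 and target:
            rules.append(f"Redirect 301 {path} {target}")
    return rules


if __name__ == "__main__":
    report = [("/old-post", 12), ("/retired", 0), ("/typo", 3)]
    mapping = {
        "/old-post": "http://www.example.com/new-post",
        "/retired": "http://www.example.com/",
    }
    for line in redirect_rules(report, mapping):
        print(line)
```

In this sample only /old-post gets a rule: /retired has no inbound links, and /typo has no target chosen yet.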
There is an excellent Webmaster Central Blog post on how Google views 404 pages and handles them in webmaster tools. Everyone should read it as it dispels the common “all 404s are bad and should be redirected” myth.
Rand also has a great post on whether 404s are always bad for SEO.
Restricted by robots.txt
These errors are more informational: they show that some of your URLs are blocked by your robots.txt file. The first step is to check your robots.txt file and make sure you really do want to block the URLs listed.
Sometimes there will be URLs listed in here that are not explicitly blocked by the robots.txt file. These should be looked at on an individual basis as some of them may have strange reasons for being in there. A good way to investigate is to run the questionable URLs through URI Valet and look at the response code. Also check your .htaccess file to see if a rule is redirecting the URL.
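If you would rather test a list of URLs against your robots.txt rules locally, Python’s standard library includes a parser. This is a minimal sketch: `blocked_urls` is a name I made up, the example.com URLs are placeholders, and Python’s parser does not necessarily match Googlebot’s behavior on every rule (wildcard patterns in particular), so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls, user_agent: str = "Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    candidates = [
        "http://www.example.com/private/report.html",
        "http://www.example.com/blog/post",
    ]
    print(blocked_urls(robots, candidates))
```

Any URL that Webmaster Tools reports as restricted but that this check does not flag is a good candidate for the individual investigation described above.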
Soft 404s
If you have pages with very thin content, or pages that look like a landing page, they may be categorized as soft 404s. This classification is not ideal. If you want a page to 404, make sure it returns a hard 404. If one of your main content pages is listed as a soft 404, fix that page so it no longer gets this error.
If you are returning a 404 page and it is listed as a Soft 404, it means that the HTTP response header does not carry the 404 Not Found status code. Google recommends “that you always return a 404 (Not found) or a 410 (Gone) response code in response to a request for a non-existing page.”
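One quick way to check this is to request a URL that cannot possibly exist and look at the status code that comes back. The sketch below assumes a hypothetical `get_status(url)` hook that returns the final HTTP status code for a request; the function name and the random-path trick are my own illustration, not something Google documents.

```python
import uuid
from typing import Callable


def returns_hard_404(base_url: str, get_status: Callable[[str], int]) -> bool:
    """Probe a made-up URL and report whether the site answers 404 or 410."""
    # A random 32-character path is assumed not to exist on any real site.
    probe = f"{base_url.rstrip('/')}/{uuid.uuid4().hex}"
    return get_status(probe) in (404, 410)


if __name__ == "__main__":
    # Stand-in for a site that serves its "not found" page with a 200 status.
    soft_site = lambda url: 200
    print(returns_hard_404("http://www.example.com", soft_site))
```

If this returns False, your error page is being served with a 200 (or redirected somewhere that returns a 200), which is the situation that leads to Soft 404 listings.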
We saw a bunch of these errors with one of our clients when we redirected a ton of broken URLs to a temporary landing page which only had an image and a few lines of text. Google saw this as a custom 404 page, even though it was just a landing page, and categorized all the redirecting URLs as Soft 404s.
Timed Out
If a page takes too long to load, Googlebot will stop trying to crawl it after a while. Check your server logs for any issues, and check the load speed of the pages that are timing out.
Types of timed out errors:
- DNS lookup timeout – the Googlebot request could not get to your domain’s server, check DNS settings. Sometimes this is on Google’s end if everything looks correct on your side. Pingdom has an EXCELLENT tool to check out the DNS health of your domain and it will show you any issues that pop up.
- URL timeout – an error from one of your specific pages, not the whole domain.
- Robots.txt timeout – If your robots.txt file exists but the server timed out when Google tried to crawl it, Google will postpone the crawl of your site until it can reach the robots.txt file to make sure it doesn’t crawl any URLs that were blocked by the robots.txt file. Note that if you do not have a robots.txt and Google gets a 404 from trying to access your robots.txt, it will continue on to crawl the site as it assumes that the file doesn’t exist.
Unreachable
Unreachable errors can occur from internal server errors or DNS issues. A page can also be labeled as Unreachable if the robots.txt file is blocking the crawler from visiting a page. Possible errors that fall under the unreachable heading are “No response”, “500 error”, and “DNS issue” errors.
There is a long list of possible reasons for unreachable errors, so rather than list it here, I’ll point you to Google’s own reference guide here. Rand also touched on the impact of server issues back in 2008.
Conclusion
Google Webmaster Tools is far from perfect. While we all appreciate Google’s transparency with showing us what they are seeing, there are still some things that need to be fixed. To start with, Google is the best search engine in the universe, yet you cannot search through your error reports to find that one URL from a month ago that was keeping you up at night. At least they could have supplemented this with good pagination, but nope you have to physically click through 20 pages of data to get to page 21. One workaround for this is to edit the page number by editing the end of the URL string that shows what part of the errors list you are looking at. You can download all of the data into an Excel document, which is the best solution, but Google should still upgrade Webmaster Tools to allow searching from within the application.
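Once the data is downloaded, searching it yourself takes only a few lines. This sketch assumes the export is saved as CSV with a column named "URL"; the column names and sample rows are assumptions for illustration, so adjust them to whatever headers your download actually contains.

```python
import csv
import io


def find_errors(csv_text: str, needle: str, url_column: str = "URL"):
    """Return every row whose URL contains `needle` (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if needle.lower() in row[url_column].lower()]


if __name__ == "__main__":
    # Made-up sample in place of a real export file.
    sample = (
        "URL,Response Code,Detected\n"
        "http://www.example.com/old-page,404,2011-11-02\n"
        "http://www.example.com/about,500,2011-11-15\n"
    )
    for row in find_errors(sample, "old-page"):
        print(row["URL"], row["Response Code"], row["Detected"])
```

For a real file, read its contents with `open(path, newline="").read()` and pass the text in.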
Also, the owner of the site should have the ability to delete ALL sitemaps on the domain they own, even if someone else uploaded it a year ago. Currently you can only delete the sitemap that you yourself uploaded through your Webmaster Tools account. If Jimmy from Agency X uploaded an image sitemap a year ago before you let them go, this will still show up in the All Sitemaps tab. The solution to get rid of it is to let the sitemap 404 and it will drop off eventually but it can be a thorn in your side to have to see it every day until it leaves.
Perhaps, as Bing starts to upgrade its own Webmaster Tools, we will begin to see some more competition between the two search engines in their product offerings. Then one day, just maybe, we will get complete transparency and complete control of our sites in the search engines.
Give me some feedback!
What successes or obstacles have you run into when troubleshooting Webmaster Tools errors?
Any recommendations for new users to this powerful but perplexing tool?
Passive Online Income vs Sustainable Online Income
Is there such a thing as "passive" income? Generally no. A person can cash out existing brand equity and exposure, but if they cash out too aggressively and/or do not reinvest enough then they are ultimately cashing out their market position and will eventually fade.
Does Google Make "Passive" Income?
Online there are some network effects that are hard to beat. MySpace had them over Facebook & only lost due to years of systematic incompetence & mismanagement. But if you are boastful about your business model competition will come and eat your lunch. Look at all the Groupon clones. And even Google has to claw and fight for every percent of search marketshare.
A person could say "well Google makes passive income" and I would counter that with "not really."
So far this month Google has made about a dozen search interface changes or tests & the underlying relevancy algorithms have likely had at least 3x or 4x as much change.
Keeping Google's Marketshare Costs Big Money
The propaganda Google spreads includes statements like: "users keep coming back to Google even though they have a choice of a search engine every time they open a browser"
While Google maintains that their monopolist marketshare is due to user appreciation of superior technology, a ton of their exposure is paid for. I was helping a friend set up a new laptop and the amount of Google added to the machine made me feel like Google is the new Norton or Symantec.
If you use the Internet Explorer browser to access the web it comes with a Google Toolbar.
That toolbar defaults to enhanced features enabled.
Google not only pays to be the default search provider, but as part of that they also pay to have competition removed from the default options list!
Google also pays for Chrome to be installed in the laptop.
If you are curious enough to click on the pinned Chrome logo then when it opens they try to set it as your default browser.
If you do use Chrome regularly you see Chrome store ads bundled right into the browser.
Ads are also included within the interface of their online tools. For example, if you use Google Analytics they may recommend you try AdSense, AdWords, or their affiliate network.
The act of logging out of 1 Google service may trigger ads for another.
Google bundled chat into Google+ & they settled FTC charges over bundling Google Buzz into Gmail, a violation of users' privacy.
Google's doodle drawings on their homepage may also promote their other offerings
Even if you don't use Chrome or the Google Toolbar in Internet Explorer then whenever you use Google they suggest setting it to your home page.
And even if you don't change your homepage, Google paid to be the default search box on Toshiba's default start page!
If you manage to somehow avoid all the above Google payola then they also pay other browsers (like Firefox) to be the default search service. Further, they then wait for those 3rd party browser plugins to have security issues & then do a bundled cross-promotion there, thus turning competing browsers into ads for more Google crap.
And when you go to update Flash, look where they tell you to search from
If your default search provider isn't Google when you install Chrome they use an option screen to help you change it, with Google being the first choice
Either Google is fibbing when they state how much of their existing marketshare is due to superior quality service, or they are hedging a risk of losing marketshare to Bing by buying placement everywhere they can. And to me this really highlights one of the big issues with truly "passive" online income. In spite of Google's success (& the great network effects they enjoy), even Google feels the need to spend hundreds of millions of Dollars a year buying exposure for their own browser, buying default search provider exposure in 3rd party browsers, and ensuring new computers are filled with promotional Google crapware.
Google also uses their browser's start screen to push beyond software into hardware...a cautionary tale for Android manufacturers after seeing Google acquire Motorola Mobility.
This sort of cross promotion is everywhere, from ads on Youtube promoting Chrome
to Gmail ads highlighting featured Youtube videos
and Google+ games having Chrome ads integrated as special items in the game
right on through to Google buying display ads promoting display ads.
Facebook realizes how powerful this cross-integration is & thus buys ads on Youtube as well.
But if you want to leave Google's ecosystem it takes a lot of effort, as Google is willing to advertise the Google alternative aggressively wherever they can.
Google recently extended their ecosystem of cross-referencing further by automatically adding Google Related to Google Chrome & the Google Toolbar, which recommends Google content within the browser no matter where you are on the web.
Google's bundling not only follows users around the web & personalizes ads, but it also bakes right into the core of their relevancy algorithms. Eric Schmidt stated "the internet would be better if we knew you were a real person rather than a dog or a fake person. Some people are just evil and we should be able to ID them and rank them downward."
Either you sign up for a Google Profile or you suffer the consequences! Forbes published (then quickly pulled) an alarming article titled "Stick Google Plus Buttons On Your Pages, Or Your Search Traffic Dies." Wired followed up spreading a similar message & a new Google trusted stores rating system for merchants was also spotted.
With so many attempts at lock-in there is no surprise that some other browsers which have partnered with Google are considering moving on.
This is not to say that Bing doesn't do marketing as well. They just are not as slick about it.
Policing Advertisers Costs Billions
In addition to evolving their core relevancy algorithm, Google has to police advertisers who are willing to be deceptive, market counterfeit goods & use the lowest common denominator. When Google is too loose that can cost them a pretty penny: they just paid a $500 million fine to the US government for ads from Canadian pharmacies. The DOJ claimed Mr. Page knew what was going on:
Mr. Neronha said those efforts amounted to "window-dressing," allowing Google to continue earning revenues from the allegedly illicit ad sales even as it professed to be taking action against them. Google employees helped undercover Justice Department agents in the sting operation evade controls designed to stop companies from advertising illegally, he said.
"Suffice it to say that this is not two or three rogue employees at the customer service level doing this on their own," Mr. Neronha said in an interview. "This was a corporate decision to engage in this conduct."
Likewise, it costs Google a lot of money to deal with lawsuits that arise due to their business practices & lack of respect for copyright with photos, books & videos. They eventually had to develop an expensive video footprinting technology to adopt DRM features on Youtube.
And building the partnerships Google has to run Youtube isn't easy. They pay something like a half-cent per video view & if you create a site with a "no soup for you" message (like the above Google page) for markets where the finances do not work out then you are violating their search guidelines by cloaking, whereas Google overly-promotes YouTube in the search results and is free to count ad views as video views (once again, against Google's guidelines).
New Niche? New Lawsuits
Eric Schmidt highlighted how the lobbyists write the laws & then Google went out and hired over a dozen lobbyist firms. Anything that disintermediates search costs Google a cut of revenues.
While Groupon is still unproven as a business model, Google was willing to spend $6 billion to buy it in order to avoid the risk of missing out on a new form of local ads.
Mobile search now represents 12% of the search market. To lock in their dominant search position on the new devices, Google:
- built a new operating system to give away for free
- paid carriers a revenue share (in addition to giving it away)
- likely violated Oracle patents (that will likely cost them in the B's)
- had other patent issues which required Google to spend $12.5 billion buying Motorola (that is nearly 1/3 of the cash Google has built up through their IPO & saved profits in the 10-year history of the company)
Sneaky ISPs Redirecting Search Traffic
What is worse for Google is that, in spite of their default status, their huge ad budget, and being large enough to be sued regularly, all of that isn't enough to keep all the traffic they pay for, as there is widespread hijacking of search traffic by ISPs.
Google Isn't Passive, but ___ Is
Google may have bitten off more than they could chew & are certainly doing anything but being passive. But maybe some other companies that make great money are doing so passively. Offline that is certainly true in many instances, but online passive companies tend to disappear.
Look at all the work Yahoo! has done with their news box & their sports vertical, yet when you back out the cash on the books & the foreign investments the company isn't valued at much above $0. AOL has also cratered. In spite of their huge traffic streams they are not growing with the market due to search bypassing them & niche players picking them apart one vertical at a time. Running a portal profitably & sustainably is anything but passive.
Even deep into the long tail at the other end of the equation the profits may be every bit as scarce. Demand Media's accounting techniques show that they were far better at growing revenues than growing profits & the company may never be profitable.
The Limits of "Search"
Google & Bing keep eating more of the value chain through content scraping & a more interactive search experience that includes new ad formats, like coupons & product ads with pictures.
In addition, search companies are challenging the boundaries of search by creating vertical media & ad networks that compete against a wide array of publisher websites.
The Huffington Post
Autonomy / Fast Search
Groupon
BankRate
MapQuest + TomTom
The Yellow Pages
Dell / HP
That "Shady" Competitor
When Google talks about "protecting users" one of the case studies / angles they push is the health angle:
The paid post at the top happens to be about brain tumors, which is a really serious subject. If you are searching for information about brain cancer or radiosurgery, you probably don't want a company buying links in an attempt to show up higher in search engines. Other paid posts might not be as starkly life-or-death, but they can still pollute the ecology of the web.
While Google was using the life-or-death approach to policing link buying outside of their AdWords ad network, Google was knowingly selling search ads to Canadian "pharmacies" providing illicit drugs in the US. The official settlement document lists how Google insiders knew work-arounds to the automated systems & were working directly on managing the ad accounts associated with the illegal activities. Google had done so for over a half-decade & only changed their approach *after* they knew a sting operation was underway.
For those scoring at home, this has been Google's approach to the health vertical:
- 3rd parties buying links that *could* influence search results for important health topics = morally reprehensible
- Google selling links *within* the search results for important health topics to criminal organizations = totally reasonable
Given the above investigation, it is not surprising that they shut down their health records initiative. They had already spent all their credibility.
Google may protect you from some third parties, but Google can not protect you from Google. :D
Not only can Google hardcode the algorithms toward promoting certain websites (while editorially discriminating against other webmasters for doing the exact same thing), but Google also actively invests in the publishing ecosystem, which pits them directly against anyone who doesn't receive their largesse.
Webmasters are told that having networks of similar websites is spammy. And yet, Google invests in a company that owns about 7 copies of the exact same business model in the exact same niche as a roll up.
As we saw with BeatThatQuote, Google owned-and-operated websites are penalized for a shorter time than other websites are for the same offense. It was only through *repeated* exposure of the absurdity on SEO blogs that Google decided to treat their own property like they treat a typical webmaster.
You can also do nothing wrong, but have your model undermined by looking too similar to a company that is exploiting Google's relevancy weaknesses & forces Google to apply retribution. A lot of small ecommerce sites were purged in the content farm update. What is so sad about that is that if not for accounting games & selling stock as a business model a lot of the biggest "success" stories in the content farm might not even exist.
While the above section focuses on Google, it could be about any competing business that touches the web...a bank which uses bogus accounting driving smaller banks out of business, a company that receives no bid government contracts associated with bribes & uses those "profits" to price dump in related fields, an ISP redirecting your traffic, etc. No matter how clean a business model looks at a glance, there is some gray area where businesses meet & exceed the numbers quarter after quarter.
Look, for example, at the sorts of links NetZero puts in some of their customer emails
And those links point at the illegal "fake news" styled $1 trials (with endless unstoppable recurring billing).
Look closely at any mainstream media site & you will run into those ads.
Are Passive Revenues Impossible?
It really comes down to how you define passive.
If your site doesn't evolve & isn't aggressively marketed then eventually a search engine or another competitor will pick away at your advantages until you are soon found ranking #2 then #3 then #7 then #20 then invisible. Or you might get clipped by an algorithm all at once in a sudden stop torch job that makes your site essentially invisible, or it may be a slow & painful death by a thousand cuts.
This is one of the reasons I generally prefer to have a site with a 30% or 50% profit margin over one with a 90% or 95% profit margin. Sure high margins are great while they last, but if you don't reinvest enough over time an algorithm or a competitor will eventually torch some of those high margin projects.
When it comes to online income, passive and reliable are not synonyms.
If you saved the margins you made while they were there then you are lucky, whereas if you adjusted your lifestyle to that level of income & didn't save anything then dark times lie ahead.
It turns out having passive frugal spending habits & active savings habits are crucial if your lifestyle relies on "passive" income. ;)
Source: http://feedproxy.google.com/~r/seobook/seobook/~3/VY5T_EAZ330/passive-income