If you write an article and post it to your blog, you might be creating duplicates of your own content without being aware of it.
Whenever the Search Engines detect duplicates among your pages, they automatically select which page to rank and put the rest into their supplemental index, which we can call the black hole of Search Engine traffic.
It’s important you understand how to avoid duplicating your own content so that you can stay in control of which of your pages rank in the Search Engines.
The two most important potential causes are described below.
The first is URL variation. The Search Engines view all the following URLs as *separate* pages even though they all actually point to the same page:
http://yourdomain.com
http://yourdomain.com/
http://yourdomain.com/index.html
http://www.yourdomain.com
http://www.yourdomain.com/
http://www.yourdomain.com/index.html
If you or other bloggers link to your site using a variety of these different URLs, you’ll not only be diluting PageRank on your web site, you also stand the chance of having your content labelled as duplicate.
Google is known to be well aware of this problem and is working out how to solve it. Even so, I urge you not to leave it to fate. Take control of the situation as soon as you are able to.
Luckily the workaround is really simple and only requires a small piece of code to be added to the htaccess file on your webserver.
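The htaccess rule itself depends on how your server is set up, so it is not reproduced here. As a rough illustration of what such a redirect should achieve, the following Python sketch maps every URL variant listed above to one single address. The choice of the www form as the preferred host, the index.html file name and the function name canonical_url are all assumptions made for this example, not something your server requires.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: the www form is the preferred host,
# and index.html is the only default document name in use.
BARE_HOST = "yourdomain.com"
CANONICAL_HOST = "www." + BARE_HOST
INDEX_FILES = {"index.html"}


def canonical_url(url: str) -> str:
    """Return the single preferred form of a URL on this site."""
    parts = urlsplit(url)

    # Send the bare domain to the www host.
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST

    # An empty path and "/" are the same page.
    path = parts.path or "/"

    # Drop a trailing default document such as /index.html.
    head, _, tail = path.rpartition("/")
    if tail in INDEX_FILES:
        path = head + "/"

    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


if __name__ == "__main__":
    for variant in (
        "http://yourdomain.com",
        "http://yourdomain.com/index.html",
        "http://www.yourdomain.com/",
    ):
        print(variant, "->", canonical_url(variant))
```

Under these assumptions, all six URLs in the list come out as http://www.yourdomain.com/, which is the behaviour you want from whatever redirect you put in place.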
The second cause is duplication within your own site. Are you 100% sure you don’t have duplicates of your own articles within your own sites?
If you are using one of the popular free content management systems (e.g. WordPress) your site might already be suffering from this.
A good example is WordPress, the favorite Blog management system, which automatically creates archive and category pages on your Blog. With the default settings of WordPress, these archive and category pages contain duplicates of the exact same posts that appear elsewhere in your Blog.
Once Google finds all the duplicate versions of your post, its spider attempts to work out which page to rank and places all the rest into the supplemental index.
This might not sound like a major problem, because one way or another your content is still getting ranked, but if it is left up to the Search Engine bots you may not get the page *you* want to rank.
Some content management systems create other kinds of internal duplicates, such as different formats of the same page (e.g. PDF, text, Word doc).
Perform the following Google search if you want to see how many pages your website has in the supplemental index:
site:www.yourdomain.com ***-view
Any pages listed from this search will be pages of your website that Google has chosen to move to its supplemental index. (You’ll see the green text ‘Supplemental Result’ under each result).
Pages in the supplemental index are known to get hardly any traffic, if any at all, until they move out of the supplemental index. Many people report that this is hard to achieve.
The workaround for this issue is to tell the Search Engine spiders to ignore specific locations on your website. This will enable you to control which pages will be indexed and ranked.
You can do this by adding the following code to a ‘robots.txt’ file at the root of your website:
User-agent: *
Disallow: /example/directory/
Disallow: /another/example/directory/
Disallow: /one/more/example/directory/
The first part ‘User-agent: *’ causes the following statements to apply to all search engine bots that read the robots.txt file.
The ‘Disallow: /…/’ lines are where you list each directory location on your webserver that you want the Search Engine bots to ignore (i.e. NOT crawl, and so normally not index).
So in our example above we’re telling all search engine bots *not* to index any webpage or indexable file located in the three stated directory locations on our website.
Getting this right could really help you control which pages are chosen by the SE’s to rank in their results.
If you are not sure how all this works, please do take the time to understand robots.txt properly before you implement it. Search Google for info on robots.txt and also check out the Wikipedia article.
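Before you upload a robots.txt file, it can help to check that the rules block what you expect and nothing more. Here is a minimal sketch using the robots.txt parser from Python's standard library. It reuses the three example directories from above; the helper name is_crawlable and the test paths are made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this article, held in a string so that
# nothing needs to be fetched from a live website.
ROBOTS_TXT = """\
User-agent: *
Disallow: /example/directory/
Disallow: /another/example/directory/
Disallow: /one/more/example/directory/
"""


def is_crawlable(path: str, robots_txt: str = ROBOTS_TXT, agent: str = "*") -> bool:
    """Return True if the given bot may fetch the path under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths, chosen only to exercise the rules above.
    for path in ("/example/directory/page.html", "/my-post.html"):
        print(path, "crawlable:", is_crawlable(path))
```

A page inside one of the three listed directories should come back as not crawlable, while a page anywhere else on the site should still be allowed. If a page you want ranked shows up as blocked, fix the Disallow lines before the file goes live.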
If you’re putting in all the effort required to make sure your content is unique, you really don’t want to fall at this last hurdle.
I hope that, if you weren’t aware of these potential pitfalls before, you will take the simple action necessary to ensure you don’t fall victim to self-imposed duplicate content.