Since Google Panda, website owners have become more aware of duplicate content and the need to keep website content unique, compelling and high quality. Prior to Panda, duplicate content was more of a nuisance; now it is a serious problem for webmasters, especially bloggers and content writers who depend on good Google rankings. Duplicate content from scrapers lowers your rank, and it can cause your pages to be deindexed by Google. The duplicate content issue is widely misunderstood. Most people just want to write and have an audience, but that audience and traffic depend on Google. Hopefully, we can demystify the crawling process and explain why your unique blog content isn't recognized as unique by the search engines.
What is Duplicate Content?
Before you learn how to deal with stolen content, it helps to understand how the crawler works and why it doesn't know you are the author. The way to identify and understand duplicate content is to think like a robot. What does a crawler do? How does the crawler "find" your content? It's not easy to do, but it is necessary to understand your problem. Most people consider duplicate content a simple copy and paste of someone else's Web page. If you wrote it, Google should identify the original owner, right? It's not as simple as we humans would like to think. There are several reasons why Google cannot identify the original owner. Here are a few:
Duplicate Product Descriptions across Multiple Sites
This issue gets its own sub-head to stand out from the others, because it comes up all the time in Webmaster support. One common thread on Google's Webmaster forums involves writers duplicating their own content on sites stronger than their own, such as Amazon. Or the website owner uses the same product descriptions provided by the product manufacturer. It's not plagiarism and not content theft, but this is where thinking like a bot comes in handy. Imagine all you know how to do is identify content. If you're a bot and you see the same content across thousands of sites, you consider that content duplicated. Site owners must create unique descriptions for every product they carry. There are millions of ecommerce sites, and many new ecommerce site owners wind up deindexed just for using product descriptions that thousands of other sites (sometimes only a few dozen) have also used.
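To make the "think like a bot" idea concrete, here is a minimal, illustrative sketch of one way software can flag identical content across pages. Real search engines use far more sophisticated (and undisclosed) methods, so treat this purely as an analogy:

```python
# Minimal, illustrative sketch of how software can flag identical content
# across pages. Real search engines use far more sophisticated methods;
# this is only an analogy.

def shingles(text, size=5):
    """Break text into overlapping word n-grams ("shingles")."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(text_a, text_b):
    """Jaccard similarity of two pages' shingle sets (0.0 to 1.0)."""
    a, b = shingles(text_a), shingles(text_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two sites reuse the manufacturer's description; a third wrote its own.
page_1 = "This widget is a durable stainless steel tool for everyday kitchen use"
page_2 = "This widget is a durable stainless steel tool for everyday kitchen use"
page_3 = "A handmade ceramic mug glazed in blue and fired twice for extra strength"

print(similarity(page_1, page_2))  # identical copy -> 1.0
print(similarity(page_1, page_3))  # original writing -> 0.0
```

To a bot running a check like this, it makes no difference which of the thousand identical pages came first; all it sees is the same shingles everywhere.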
How Do I Deal with a Content Thief?
If the content is scraped across dozens of sites, it's sometimes better simply to rewrite it, provided it's only a few pages. The most effective way to deal with a content thief or web scraper, however, is to file a DMCA takedown notice. Our service handles these requests, but there are also steps you can take yourself to deal with web scrapers. It should be pointed out that filing a false DMCA notice carries legal consequences, so don't do it.
You can file a one-off DMCA notice yourself on Google's DMCA takedown notice page, but for sites that rank high, you must watch for scrapers every day to ensure new ones are not popping up while you aren't looking. That is where a DMCA protection service comes in handy. We run daily reports on your content to check for duplicates, and we send DMCA notices on your behalf each time we find a new scraper.
You can use the contact form on the right side of our blog or contact us through our DMCA contact form on the home page.
A DMCA takedown notice can be a powerful tool against unauthorized use of copyrighted material. In some cases, the hosting provider of a site that receives a notice will take the entire site down immediately, which may not be the intent of the person sending the notice. Senders can also be held liable for filing false notices. Because of these issues, we often get questions about when you should file a DMCA takedown notice. That can only be determined on a case-by-case basis, but there are common factors we always consider.
Copyright Ownership
Only the copyright holder or an authorized agent can file a DMCA takedown notice, so this is the first thing we always look at. As a corollary, we also check whether the material is copyrightable. Titles, for example, cannot be copyrighted, so we won't file a notice against someone who uses the same post title as one of our customers.
Fair Use
In the United States, fair use is a defense to liability for copyright infringement. It is essentially an admission that copyrighted material was used but that the use in question was fair. Some U.S. courts have ruled that a copyright holder must do a fair use analysis before sending a DMCA takedown notice. To keep our clients in compliance with those rulings, we always factor in fair use. If only a few lines of text are quoted from a copyrighted work, we will usually consider that fair use as long as there is proper attribution.
Nature of the Infringing Site
Some people inadvertently infringe copyrights, either out of ignorance of the law or because they don't know the material in question is copyrighted. If a site looks to have inadvertently posted a single infringing photo, we might suggest that an email to the webmaster could remove the content faster and more amicably. However, when we come across a scraper site or a site that regularly posts infringing material, we don't hesitate to send a DMCA takedown notice.
Location of the Hosting Provider
The DMCA is a U.S. law and applies only to sites hosted in the United States. While some foreign hosting providers will honor DMCA takedown notices, not all will. Many countries have their own procedures for removing infringing material from the Internet, and to the extent those procedures resemble the DMCA, hosts may choose to treat a DMCA notice as effective under their own national laws. For a proper DMCA takedown notice, however, the infringing site should be hosted in the United States.
Time and Expense
If the same material is posted in several places online, it can get expensive and time-consuming to continuously file DMCA takedown notices. We always look for other cases of infringement beyond what our clients know about so that we can give the client an indication of the process involved and let the client decide how to proceed. If the copyrighted material is a book, the expense and time might be worth it for the client. If the material is a blog post, in some cases, the client may prefer to rewrite their own post to make it different from all the copies.
Those are a few of the factors Dooplee looks at; there are others that may need to be considered at times. But in deciding whether to file your own DMCA takedown notices or to ask Dooplee to file on your behalf, these factors should inform your decision. Fill out our form to file a DMCA takedown notice and have your site evaluated.
photo by: www.anna-OM-line.com (sxc.hu)
When we read about copyright issues, it’s easy to get lost in the details. With all the talk about piracy, SOPA, PROTECT-IP, copyleft, the DMCA, copywrong, and Creative Commons, we sometimes forget that at its heart the copyright debate is an argument about who gets to choose how creative works are distributed. In this post, I’ll take a brief look at the extremes on both sides and what they are really saying about choice.
The Copyright Holders:
At one extreme of the copyright debate are the people who seek to enforce their current copyrights and, in some cases, to expand copyright laws. They believe that the creator of a work should have the right to choose how it is distributed. In this view, even if the creator transfers the copyright to someone else, like a record company, the creator is still exercising her right to choose; she is saying that the record company should be the one to choose. At its extreme, this position holds that no one else should ever have the choice of how a work gets distributed. We can see this in software licenses that restrict any transfer of the software, although few people on this side of the debate argue for the extremes of software licensing.
The File Uploaders:
At the other extreme of the copyright debate are the people who upload copyrighted work to file-sharing sites. The file uploader’s belief can be summarized as: no one has a right to exclusively choose how a work is distributed, not even the creator; it’s a choice that belongs to us all collectively and individually. When someone brags that he, as a copyright pirate, is doing the moral thing by uploading a copyrighted novel to a torrent site, this belief is what he is really bragging about; he believes that he is performing a moral duty to enforce the collective right to choose.
In the Middle:
These are just the two extreme positions. The actual debate is more nuanced, but almost every stance can be seen as a variation on one of these two positions on choice. When an author uses a Creative Commons license for his work, is he not stating that he, as the creator, has the right to determine how it will be distributed? Of course he is. He is choosing to state that his work should be distributed according to the terms of whichever Creative Commons license he picked. In terms of the position on choice, there is far less difference between the Creative Commons advocate and an RIAA lawyer than either would like to believe. The real difference between the two isn't about choice; it's about distribution methods and money.
Which Side is Right?
As with most moral debates, we can't simply declare one side right and the other wrong; each side's judgment follows from its premise about who gets to choose. But when we frame copyright issues in terms of choice, we can see how fundamentally opposed the two positions are at their extremes and what that implies. The file uploader cannot be reconciled with the copyright enforcer unless one of them changes his position about who gets to choose how creative works are distributed. We can't bring the pirates into the fold by granting them distribution licenses or limited rights to distribute works on their sites; they don't believe anyone has the right to grant such things. Fortunately, we can also see that, despite differences on other issues, some people on seemingly opposing sides of the copyright debates share the same fundamental assumption about to whom the choice belongs: the creator.
Next time you disagree with someone over a copyright question, ask yourself what the other person believes about copyright and choice. You might find some common ground to work with.
Most people create a blog, start typing, publish and send it off into the void. Most writers know that, behind the scenes, there is some process that tells Google to index the content. How well the content ranks is another story for another time, but several factors affect rank, and one is the number of Web pages that contain the same data, or content, as others. If you publish content on a blog, keeping track of where that content goes is a daily process for protecting copyright. Not only does scraping affect your rank, it also infringes your copyrights.
How Does a Scraper Work?
There are several "out-of-the-box" scrapers available. These programs install on a site such as a WordPress blog, sometimes referred to as an "autoblog." The scraper software lets the user choose the content he wants to scrape and how often he wants to steal it. Typically, the scraper uses the website's RSS feed. RSS feeds are XML files with a standard format, which makes it easy to download the content and store it on the scraper site.
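Because RSS is a standardized XML format, reading every post out of a feed takes only a few lines of code. This sketch uses a made-up feed to show how little work a scraper has to do; a real scraper would fetch the live feed URL on a schedule:

```python
# Sketch of how scraper software reads an RSS feed. The feed below is a
# made-up example; a real scraper would fetch the live feed URL.
import xml.etree.ElementTree as ET

FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>My Original Post</title>
      <link>https://example.com/my-original-post</link>
      <description>The full article text, ready to be copied...</description>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(FEED)
for item in root.iter("item"):
    title = item.findtext("title")
    body = item.findtext("description")
    print(title, "->", len(body), "characters available to copy")
```

Every `<item>` hands the scraper a clean title, link and body with no parsing effort at all, which is why the RSS feed is the first thing scraper software targets.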
More advanced scrapers take both the content and the code from the target site. As long as your site has a standard format and pattern, the scraper can tell the difference between the content and the HTML. What makes these scrapers so effective is that most scraped sites are WordPress or Blogger blogs. Because these sites have a standard setup, the software programmer can write software that lets the user scrape specific pages.
How to Stop Scrapers from Stealing Your Work
The first step in blocking content scrapers is to check your RSS feed file. You can typically see the file by clicking the RSS feed button on your site. Are you publishing the entire article or just the first paragraph? Don't publish the entire article in your RSS feed. Instead, publish the first 50-100 words and add a link. The link back to your site helps keep the scraper from showing higher in the SERPs than your own content, and as the scraper takes your content, you at least get a backlink for your efforts. The links might get devalued once the scraper is finally caught and penalized for duplicate content, but you still get some benefit from the backlink.
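The excerpt-plus-link advice can be sketched like this. On a real WordPress site you would switch the feed to "summary" mode in the settings rather than writing code, so this is purely illustrative:

```python
# Illustrative sketch of the excerpt-plus-link advice: publish only the
# first ~75 words in the feed and append a link back to the original.
# A real WordPress site would use the built-in summary feed setting.

def feed_excerpt(full_text, url, max_words=75):
    words = full_text.split()
    excerpt = " ".join(words[:max_words])
    if len(words) > max_words:
        excerpt += "..."
    return f'{excerpt} <a href="{url}">Read the full article</a>'

article = " ".join(f"word{i}" for i in range(200))  # stand-in for a long post
print(feed_excerpt(article, "https://example.com/my-post"))
```

Anyone lifting this feed gets a teaser plus a link pointing back at you instead of the full article.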
The next step is to monitor your content and file DMCA takedown notices as you find scraper sites in the search engines. This is an ongoing battle, but it stops the content from multiplying. Sometimes, a scraper is scraped by another scraper, so leaving your content up on an infringing site can lead to multiple sites having copies of your own content.
If you need help with a DMCA takedown notice, fill out the form on the home page with the information regarding the duplicate content site. The DMCA process can take up to two weeks to finally take down your content, but it works to stop scrapers from stealing your content.
Nothing stings more than finding your content, as a webmaster or writer, scraped and spun on another website. Scraper sites (also called "autoblogs") use RSS feeds and other software to take content off your site and post it on their own. Although there is no duplicate content penalty per se in Google, scraped content chips away at your rank as the duplicates are "averaged" out by the search engine algorithm. You can combat scrapers by checking for duplicate content regularly and filing a DMCA takedown notice when you find scraper sites.
Google Search Alerts
Google Alerts emails you whenever a matching page is found in the index. Alerts arrive at your Google account daily, so each time the bot finds plagiarized content, you can click the link and check the page for your copyrighted material. This process does not remove content from the search engine, but it helps you monitor content being taken or scraped from your site.
To use Google Alerts, enter a sentence from your content, typically copied from the first paragraph. Some scrapers take the first paragraph and link back to you; this is typical of the more scrupulous scrapers, but it still helps you find each site copying your content. Make sure you place the sentence in quotes when you add it to Google Alerts. Without the quotes, you receive alerts for content that is merely similar to yours, which is annoying.
Use the Google Search Engine
A manual Google search works the same way as a Google alert: you type a sentence from your content into the search engine to find duplicate content. The difference between the two is that Google Alerts is automated.
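A tiny sketch of building such a search by hand: wrapping the copied sentence in double quotes makes Google match it as an exact phrase rather than loosely similar text. The sentence here is a made-up example:

```python
# Build an exact-phrase Google search URL for a sentence copied from a
# post. The quotes force exact matching; the sentence is a made-up example.
from urllib.parse import quote_plus

sentence = "Scraper sites use RSS feeds to take content off of your site"
query = f'"{sentence}"'  # exact-match phrase
url = "https://www.google.com/search?q=" + quote_plus(query)
print(url)
```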
Duplicate Content Checkers Such as Copyscape
Copyscape and other duplicate content checkers cost money, but they are effective at finding duplicate content. Copyscape and iThenticate use a percentage flag: if a page matches your content above a certain percentage threshold, you can assume that search engine bots also see it as a duplicate. Copyscape gives you a number of free searches each month, but once you exceed that number, you must pay a monthly fee.
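The "percentage flag" idea can be sketched as follows. Copyscape and iThenticate use their own proprietary matching and thresholds, so this one-directional word-sequence check is only an illustration:

```python
# Sketch of a "percentage flag": measure what fraction of a suspect
# page's word sequences also appear in your original, and flag it above
# some threshold. Real services use proprietary matching; this is only
# an illustration.

def duplicate_percentage(original, candidate, ngram=6):
    """Percent of the candidate's 6-word sequences found in the original."""
    def grams(text):
        w = text.lower().split()
        return {tuple(w[i:i + ngram]) for i in range(len(w) - ngram + 1)}
    orig, cand = grams(original), grams(candidate)
    if not cand:
        return 0.0
    return 100.0 * len(cand & orig) / len(cand)

THRESHOLD = 50.0  # assumed cut-off; real services pick their own
post = "ten quick tips for keeping your blog content safe from scraper sites"
copied = "ten quick tips for keeping your blog content safe from scraper sites"
print(duplicate_percentage(post, copied))  # a verbatim copy scores 100.0
print(duplicate_percentage(post, copied) >= THRESHOLD)  # True -> flag it
```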
So what do you do after you find a scraper site? Dooplee provides a takedown service that contacts the host with your complaint. In most cases, the provider hosting the scraper site removes the content from the website. If we do not get a response, we file DMCA takedown notices with the search engines. We file on your behalf, and we get the content removed from the major search engines, including Yahoo, Google and Bing. Once the site is removed from the major search engines, it cannot draw traffic, which renders the stolen content and website useless.
We also handle copyright infringements for eBook theft. Some scrapers take your eBook content and post it on a website. The point of the scraper is to use your content for ad-clicks. If the scraper site ranks above your site, it is even more important to remove the site from the index. We can have these sites removed from the index. However, make sure you monitor your site for more scraped content after the scraper is removed.
The Web has a chronic problem with scraped content, so monitoring your site or eBook is the best way to keep your content safe from scrapers.