The Essential Guide to Good Bots, Content Harvesters and Comment Spammers

According to Incapsula's 2014 Bot Traffic Report, 56% of internet traffic was generated by bots. Bots are automated software applications that perform repetitive work on the Internet, usually the most mundane and time-consuming tasks. In this article we present the two faces of bot usage: the benefits of working properly with good bots, and awareness of and protection from bad bots, specifically content harvesters and comment spammers.

Good bots vs Bad bots

If you run an e-commerce site, manage a company website or own a personal website, you have probably heard about the impact of bots on the online world. Bots crawl the internet 24/7, visiting sites and performing different tasks. One of the most important tasks performed by bots is search engine indexing. The bots that do this are called good bots, also known as search engine crawlers or spiders; the most important of them are Googlebot, Baidu Spider and MSN Bot/Bingbot. On the other hand, according to the Incapsula report, nearly 3 out of every 10 visitors to your site are trying to steal information, break security tools or pretend to be something they are not. These are known as bad bots, which include content harvesters and comment spammers.

Figure: bot.png. Source: 2014 Bot Traffic Report: https://www.incapsula.com/blog/bot-traffic-report-2014.html#sthash.o7cfB0oe.dpuf

Good bots

Good bots are, simply put, programs that search engines launch to populate their index with relevant websites. Although several important good bots crawl the internet, the one with the most impact on your site's performance is Googlebot. Googlebot is Google's web crawling bot; it crawls the Internet in search of new pages and websites to add to Google's index. To get the most out of Googlebot, your site needs to be SEO friendly. There are several ways to make sure that your site is getting the best indexing by Googlebot. You can check how Googlebot is crawling your site using the statistics provided by Google Webmaster Tools.

Verify your site with Google Search Console

Below is a short step-by-step tutorial on how to verify your site with Google Webmaster Tools, recently renamed Google Search Console.

Step 1: Register for Google Webmaster Tools (Google Search Console)

Log in to your Google account and open Google Webmaster Tools.

Enter your website domain name in the box and click Add Property.

Step 2: Verify the website ownership

Follow the instructions shown in the picture below to confirm that you own the website.

If all the steps succeed, you will see a confirmation message when you press the Verify button.

Googlebot will start to review your site and will help Google update search results with your new content. However, your website will not be indexed and available through Google Search instantly. The indexing process may take anywhere from 48 hours to a few days to complete.

If you own an OpenCart store, you can use the SEO Pack Pro module to get the most out of the good bots. We also suggest reading this article: https://isenselabs.com/posts/boosting-sales-in-opencart-seo-tips-and-social-media-activity-part-1

Bad bots

According to the Incapsula report, malicious bots on the internet fall into four types: Impersonators, Hacking Tools, Scrapers and Spammers. These bots make up about one third of all website traffic.

Impersonators are the more sophisticated DDoS, ad fraud and malicious scanning bots, because they try to appear as legitimate users. They target all types of websites. Hacking tools, on the other hand, focus more on CMS-based websites (WordPress, Drupal...). The damage they do consists of data theft and site or server hijacking. E-commerce sites are at greater risk of being attacked by scraper bots. These bots harvest content and reverse engineer the pricing and business models of the sites. Spammers, which account for 0.5% of all website visits, post comment spam and phishing links and can get a site blacklisted by search engines. For the rest of this article, we analyse the most frequent bots on e-commerce sites: scrapers and spammers.

Content Harvesters

These bots are web scrapers or harvesters: automated programs that extract information from websites. This is not necessarily a bad activity; price comparison sites, for instance, rely on the technique. On the other hand, if you are putting valuable content online, content harvesters could pose a real threat to your business. The sites these bots attack most often are e-commerce sites. They use the stolen content to intercept web traffic. Furthermore, they merge content from different sources into new content so that they can avoid duplicate content penalties themselves. Besides stealing your content, these bots can hurt your SEO ranking: your website could be penalized for duplicate content and even fail to appear in search engine rankings.

To prevent these attacks before they cause damage, you need to be proactive about protecting your site. Prevention is better than cure.

To find out whether your content has been duplicated elsewhere, you can use an online plagiarism detection tool. One option is Copyscape: enter your site URL and it checks for duplicated content.
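As a rough illustration of what a duplicate-content check does, the sketch below compares a passage from your own page with a passage found elsewhere and reports how similar they are. It uses Python's standard difflib module. The function name similarity_ratio, the sample texts and the 0.8 threshold are assumptions made for this example, and this is not a description of how Copyscape itself works.

```python
from difflib import SequenceMatcher


def similarity_ratio(original: str, candidate: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two texts."""
    # Normalise case and whitespace so trivial edits do not hide a copy.
    a = " ".join(original.lower().split())
    b = " ".join(candidate.lower().split())
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical sample texts, for illustration only.
my_text = "Our handmade leather wallet is stitched by hand in small batches."
scraped_text = "Our handmade leather wallet is  stitched by hand in SMALL batches."
unrelated_text = "Cheap flights to sunny destinations all year round."

# 0.8 is an arbitrary threshold chosen for this sketch.
THRESHOLD = 0.8
is_probable_copy = similarity_ratio(my_text, scraped_text) >= THRESHOLD
```

A character-level comparison like this only works for short passages you already suspect; finding copies across the web still requires a search-based service.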

 

 

If you have identified the scraper bots by their IP address, you can keep that address away from your content directly from your .htaccess file by redirecting its requests elsewhere. All you have to do is add the following code:

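# Enable the rewrite engine (requires mod_rewrite)
RewriteEngine On

# Match the banned IP exactly; the dots are escaped so they match literally
RewriteCond %{REMOTE_ADDR} ^69\.16\.226\.12$

# Redirect every request from that IP to another URL
RewriteRule ^(.*)$ http://newfeedurl.com/feed [R=302,L]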

Here 69.16.226.12 is the banned IP and http://newfeedurl.com/feed is the URL that its requests are redirected to.
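To find candidate IPs for a rule like the one above, you can count how many requests each address makes in your web server's access log. The sketch below is a minimal Python example. It assumes log lines in the common Apache format, where the client IP is the first field; the function name top_requesters and the sample lines (which use documentation-only IP ranges) are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_requesters(log_lines: Iterable[str], limit: int = 5) -> List[Tuple[str, int]]:
    """Count requests per client IP and return the busiest addresses."""
    counts: Counter = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1  # first field is assumed to be the client IP
    return counts.most_common(limit)


# Hypothetical log excerpt; the IPs come from documentation-only ranges.
sample_log = [
    '203.0.113.7 - - [10/Oct/2015:13:55:36 +0000] "GET /product/1 HTTP/1.1" 200 2326',
    '203.0.113.7 - - [10/Oct/2015:13:55:37 +0000] "GET /product/2 HTTP/1.1" 200 2310',
    '198.51.100.4 - - [10/Oct/2015:13:55:38 +0000] "GET / HTTP/1.1" 200 5120',
    '203.0.113.7 - - [10/Oct/2015:13:55:39 +0000] "GET /product/3 HTTP/1.1" 200 2298',
]

busiest = top_requesters(sample_log)
```

In practice you would pass an open log file instead of the sample list. A high request count alone does not prove that an address is a scraper, since search engine crawlers are heavy requesters too, so check an address before you block it.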

For more advanced techniques for protecting your site from harvester bots, follow this article.

Comment Spammers

You have probably noticed suspicious comments in your website's forum, blog or comment section, usually unrelated to the topic under discussion. These unwanted comments are usually spam created by comment spammer bots.

Note: comment spammers can also be human.

By publishing on your site, spammers gain these benefits:

1. They can achieve a slightly higher search engine ranking.

2. They can generate traffic and sometimes even make real sales.

The spammer's purpose is not to degrade your site; what they want is to make more profit. The figure below shows an example of a spam comment on a WordPress site:

There are several security measures that can protect your site from comment spammers. These techniques try to stop the spam before the comment is posted. Below we present some of the most common techniques for protecting a site from comment spammers.

Use of CAPTCHA  

One of the most popular security measures today is the CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), a form of Turing test. A CAPTCHA code is usually an image of randomly generated letters and numbers. The user must type its content into a box in order to complete the registration process or to post a comment.

Passing this check makes it very likely that the commenter is a human and not a bot. On the other hand, commenters sometimes find this technique annoying, and it may discourage them from posting a comment.
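The text side of a CAPTCHA can be sketched in a few lines of Python. The example below only generates a random challenge string and checks the user's answer; rendering the string as a distorted image, storing it in the visitor's session and expiring it are left out. The names generate_challenge and is_correct_answer, the six-character length and the case-insensitive comparison are assumptions made for this illustration, not part of any particular CAPTCHA library.

```python
import hmac
import secrets
import string

# Letters and digits, minus characters that are easy to confuse in an image.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "O0I1"
)


def generate_challenge(length: int = 6) -> str:
    """Return a random string of letters and digits to show to the user."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def is_correct_answer(expected: str, answer: str) -> bool:
    """Compare the typed answer with the challenge, ignoring case and spaces."""
    cleaned = answer.strip().upper()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode(), cleaned.encode())


# Generate one challenge; a real site would keep it server-side per visitor.
challenge = generate_challenge()
```

The challenge must stay on the server and only its image should reach the browser; otherwise a bot could simply read the expected answer from the page.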

Authentication use

This is also a simple approach: the site owner requires users to provide a username and a password before posting a comment. If the administrator then finds a user spamming the site, he or she can ban that username or the corresponding email address.

This reduces the number of potential spammers, since spammers do not want to be identified or to spend a lot of time signing up. They would rather attack easy targets. As with CAPTCHA, this method has the disadvantage of discouraging users from posting comments.

Social logins authentication

To overcome the problems of traditional authentication, a new form of authentication has gained popularity in recent years. According to research by Janrain and Blue Research in 2011, 77 percent of customers prefer social login to the traditional way of authenticating. Social login, a form of single sign-on, allows users to log in to a website using their social networking profiles. Because it is single sign-on, users do not need to enter their authentication information multiple times, as the login credentials are remembered across multiple sites. When authenticating with Facebook, the user's profile information is retrieved automatically, and a single button press can complete the registration process. According to research by Gigya, the majority of social networkers worldwide use their Facebook IDs for social sign-in.

Conclusions

To keep your website successful, you need to be wary of the bots crawling the internet. To get quality traffic, optimize for the good bots and try to exclude the bad ones. Make your site SEO friendly, protect it from content harvesters and take the common spamming techniques into consideration to avoid most spam.

If you are running an OpenCart store, we recommend the SEO Pack Pro extension to get the most out of the crawling bots. To protect your store from unwanted bots, use the BotBlocker module.
