25 Years of Programming
An open source source for C, C++, OWL, BASIC, MDB, XLS, DOT, and more...

Google/StopBadware says "This site may harm your computer", and it's YOUR site! What to do, step by step.

Here are reasons why your website can be flagged with the "This site may harm your computer" warning in Google search results, in approximate order of likelihood:

  1. Your site contains an outlink to another site that is flagged because it falls into one of the categories below. This is statistically the most likely cause, by the following reasoning: if you outlink to 9 other websites, you can get flagged if any of the 10 sites (yours plus the 9) has badware on it. If each site is equally likely to be the culprit, there is only a 10% chance it is your site and a 90% chance it is one of the others. The more outlinks you have, the more likely they are the problem. Adjust the probabilities based on what you know about the sites you link to. If you only outlink to Google, for example, you can be reasonably sure the outlinks are not the problem. On the other hand, if you allow user-generated content such as blog comments and forum posts, it is even more likely that outlinks are the problem, and you should check every comment and every post (especially spam posts) for possible links to malicious sites.
     
  2. Your site was hacked. Malicious code was inserted onto your pages by somebody else. Your pages are now dangerous to your site visitors (and to you). This is a very common reason for the badware flag.
     
  3. Your pages have normally non-malicious iframes or JavaScript whose content is served from another website ("third party site") with a "src=http://othersite" property, or it has PHP code that is hosted on another website and included into your pages with a PHP include(). However, the other website was hacked and has become malicious. Your pages are now dangerous to your site visitors because instead of the advertisements, hit counters, website lists, or other third party content your visitors were supposed to get from the remote site, they are now receiving viruses, spyware, or other bad things.
     
  4. Your pages trigger the loading of out-of-date versions of Flash files or .swf files that are scripted to cause malicious behavior. This has been a particular problem with Flash advertising.

StopBadware and Google describe the criteria they use to determine whether a website is contributing to the badware problem.

Why antivirus scanning might not find the bad code

The first idea that occurs to many webmasters is to do an AV scan on the site. In many cases, this will not find the problem. The next sections explain why.

Scanning your website files on the server

Scanning your site with an antivirus program will only work if the site is actually hosting the virus, which it often isn't.

More likely, the virus itself is hosted on another computer. Your pages have been injected with iframe or JavaScript code that refers indirectly (with src=) to the virus on the other website. Thus, the AV program on your server sees only iframes and JavaScript which don't trigger virus alerts because they aren't viruses.

The remote viruses aren't pulled in until the page is loaded into a visitor's browser. Then their browser fetches the code referred to by the src= property, and then they get a virus alert.

You can, of course, scan your site with an antivirus program if you want, but if it finds no viruses, that does not mean the site is clean.

Downloading your site files to your PC and scanning them there

Using a tool like FTP, Wget, or cURL to download the source code of your pages to your local PC and AV-scanning them there is also unlikely to find the virus, for the same reason given above: the actual virus is probably not on the pages.

Wget in recursive download mode can retrieve all linked files, including ones from remote sites and including remote script files, but if some of them are viruses, you will be downloading them directly to your PC, which is an unnecessary risk.

Risky - browse your pages with an antivirus program running on your PC

If you are determined to use the facilities of an AV program to scan your site, you can browse the site as if you were an ordinary visitor. This is risky because an increasing number of viruses are "polymorphic". Their code changes so frequently (every day or every time they are served) that antivirus programs can't keep up and do a poor job of detecting them. If you go ahead anyway, take these precautions:

  1. Make sure your PC is fully patched with all the latest security updates.
  2. Make sure your antivirus software is up to date with the latest definitions.
  3. Set all your browser security settings to their highest levels, including turning JavaScript (or "active scripting") OFF.
  4. Go to each of your site's pages with your browser.
  5. If there is badware there, you are just as vulnerable as any other visitor.
  6. If your AV pops up an alert, that's a good sign you've found the problem.
  7. If your AV doesn't pop up an alert, that does not mean the site is clean. As mentioned, AV programs may not detect these viruses. The viruses may be encrypted, and your AV might not detect them until they are decrypted, which requires JavaScript. But if you enable JavaScript, you might discover the virus by getting infected with it!

In summary, antivirus scanning is not definitive. If it finds a problem, that's useful. If it doesn't find a problem, it means nothing because the virus might be on a different website, or encrypted and polymorphic, or the site's problem might not be a virus at all. It might be a bad outlink.

Instead, inspect your source files manually.

Searching your pages for badware

1) Discover which pages are flagged for badware

Use either one of these methods:

  • In any Google search box, enter:  site:yoursite.com
  • Look up your site in the StopBadware Clearinghouse database. Try both the www and non-www forms. A search for one doesn't find the other.

Whichever method you use, note the pages that are flagged. If it's only a few, make a list. If it's all of them, that could be important because the badware link might be something common to all of them.

1a) Suggested procedure

Start by doing each of the steps in this article in only a preliminary manner. Once you know what you're looking for, it is very easy to discover suspicious code.

Therefore, the first time through, don't spend too much time on a "fine-toothed comb" approach to every step. Just try to discover quickly:

  1. Do my pages have actual badware links on them? (Examine a few pages.)
  2. Did any of the sites I link to get hacked and become bad? (Look them up in Google.)
  3. Do my forum messages or blog comments have spam posts with badware links? (Look at the most recent posts.)
  4. Have my advertisers or other third-party content providers "gone bad"? (Do a Google search to see if other webmasters are posting in forums about problems.)

If your quick preview turns up nothing, come back and do each check more thoroughly.

2) Prepare to search the pages for badware

It is important to view and search the server-side source code of your pages, not the HTML output that you see with the View Source command in a browser.

View Source can allow you to think the page is clean even though it's not.

Some exploits put malicious code on pages only under certain circumstances, and your particular viewing might not meet those circumstances.

Viewing the code on your server allows you to see ALL the code, even if it is only put on the pages sometimes.

3) How to search pages for malicious code

One page at a time (recommended)

Ways to view the source code one page at a time:

Start with the most central and important flagged page, such as your home page.

Where to look:

Malicious code is often inserted into web page files by robots, without concern for placement (or W3C validation). Common locations are

  • At the very top of the file.
  • Just before or after the <body> or </body> tags.
  • At the very bottom of the file, after the </html> tag.

If your pages normally validate at W3C, do a check there on the badware-flagged pages. Any errors might point to the bad code.
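
The placement check can also be roughed out in a few lines of Python. This is only a sketch: the function name suspicious_placement is made up for this example, and it assumes each file is a complete HTML page with a single <html> element. It reports anything found after </html> and any script or iframe that appears before <html>.

```python
import re

def suspicious_placement(source):
    """Return notes about markup found where injected code often lands."""
    notes = []
    lower = source.lower()
    # Anything other than whitespace after the closing </html> tag is unusual.
    end = lower.rfind("</html>")
    if end != -1 and source[end + len("</html>"):].strip():
        notes.append("content after </html>")
    # A script or iframe ahead of the doctype or <html> tag is unusual too.
    start = re.search(r"<!doctype|<html", lower)
    head = lower[:start.start()] if start else ""
    if re.search(r"<(script|iframe)\b", head):
        notes.append("script or iframe before <html>")
    return notes
```

An empty list does not mean the page is clean. It only means nothing was found in these two spots.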

Multi-file searching

With multi-file searching, a program searches multiple files for the search string you specify, and reports all the instances it finds.

If you're already familiar with how to use the following programs, you might find a multi-file search easy to do. Otherwise, I recommend the "one page at a time" method above, at least to start with.

Ways to obtain the source code for searching multiple files at once:

Dedicated server

Do your search directly on the server with an operating system tool like grep.

Shared server

If you're already familiar with cron and grep, you can create a cron job to do the grep search as though you had shell (command line) access.

Otherwise, download the pages (or your entire site) to your PC with a tool like FTP, Wget, or cURL so you can search them there.

After downloading the pages to your PC, you can use FrontPage, Expression Web, Dreamweaver, grep, or a similar utility to search all of your site's pages at once with single commands. Even Windows Explorer is better than nothing.

Search strings

These are useful search strings, whether you are searching one file at a time or all files at once:

  • "<iframe" quickly turns up malicious links, which are often in iframes.
  • "src=" finds iframes and JavaScript references, because both use this property.
  • "http://" finds references to remote websites.
  • "script language=" finds scripts. Also search for "<script", because not every script tag carries a language attribute.

Make sure all instances of src= and http:// refer to files on your site or to external sites you know and trust.

Some common trusted ones that are not a problem are pagead2.googlesyndication.com (if you use AdSense) and www.google-analytics.com (Google Analytics).
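
If you would rather script the multi-file search than use an editor, something like the following Python sketch could do it. The function names (scan_text, unknown_hosts, scan_tree), the list of file extensions, and the idea of a TRUSTED_HOSTS set are all assumptions made for this example. The search strings and the two trusted hosts are the ones given above.

```python
import os
import re
from urllib.parse import urlparse

SEARCH_STRINGS = ["<iframe", "src=", "http://", "script language="]
# Hosts named in this article as normally safe; add your own as needed.
TRUSTED_HOSTS = {"pagead2.googlesyndication.com", "www.google-analytics.com"}

def scan_text(text):
    """Return (line_number, search_string, line) for every hit in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        for needle in SEARCH_STRINGS:
            if needle in lowered:
                hits.append((number, needle, line.strip()))
    return hits

def unknown_hosts(text, own_hosts=()):
    """Return remote hosts referenced in text that are neither yours nor trusted."""
    hosts = set()
    for url in re.findall(r"https?://[^\s\"'<>)]+", text, flags=re.IGNORECASE):
        host = (urlparse(url).hostname or "").lower()
        if host and host not in TRUSTED_HOSTS and host not in own_hosts:
            hosts.add(host)
    return sorted(hosts)

def scan_tree(root, extensions=(".html", ".htm", ".php", ".js")):
    """Print every hit in every matching file under root."""
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(extensions):
                path = os.path.join(folder, name)
                with open(path, encoding="utf-8", errors="replace") as handle:
                    text = handle.read()
                for number, needle, line in scan_text(text):
                    print(f"{path}:{number}: [{needle}] {line}")
```

Run scan_tree on a downloaded copy of the server-side source, not on pages saved from a browser. The output is a list of lines to read by eye, not a verdict.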

3a) What do malicious or invisible iframes look like?

iframe code looks like this. If you don't recognize remotesite.com, the code is suspicious. This is an "invisible iframe" because of the width and height settings:

<iframe src="http://remotesite.com/path/file" width="0" height="0" frameborder="0"></iframe>
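
To pick out invisible iframes automatically, a sketch using Python's standard html.parser module might look like this. The class and function names are invented for the example. Treating a width or height of 0 or 1, or a style of display:none or visibility:hidden, as "invisible" is my assumption. The sample above only shows the width="0" height="0" case.

```python
from html.parser import HTMLParser

class IframeFinder(HTMLParser):
    """Collect (src, looks_invisible) for every iframe tag."""

    def __init__(self):
        super().__init__()
        self.iframes = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        attributes = dict(attrs)
        style = (attributes.get("style") or "").replace(" ", "").lower()
        # Assumed signs of a hidden frame: tiny dimensions or a hiding style.
        tiny = attributes.get("width") in ("0", "1") or attributes.get("height") in ("0", "1")
        hidden = "display:none" in style or "visibility:hidden" in style
        self.iframes.append((attributes.get("src"), tiny or hidden))

def find_iframes(html):
    finder = IframeFinder()
    finder.feed(html)
    return finder.iframes
```

Any iframe that comes back flagged and points to a site you don't recognize deserves a close look.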

3b) What do malicious JavaScript references look like?

JavaScript references to external sites look like this. If you don't recognize remotesite.com, the code is suspicious. This code calls and runs a JS script that is hosted on a website that isn't yours (and which might be a hacked or otherwise malicious site). After a visitor loads your page, their browser fetches this JavaScript from the other site and then runs it:

<script language="JavaScript" src="http://remotesite.com/path/file.js"></script>

3c) What does malicious or obfuscated JavaScript code look like?

Malicious JavaScript code directly on your pages (rather than being called by reference as described above) is often "obfuscated", "obscured", or "encoded" to make it hard to tell what it does. It looks like an undecipherable jumble, like this (the sample has been greatly shortened and mangled to make it nonfunctional). Code like this is always suspicious and must be investigated:

<script language="JavaScript">function nbsp() {var t,o,l,i,j;var s=''; s+='47116101120'; s+='09711409­9111'; s=s+'120340321­1910'; s=s+'71210581101­11112'; s=s+'9062032'; t='';l=s.length;i=0; while(i<(l-1)) {for(j=0;j<3;j++){t+=s.charAt(i);i++;} if((t-nescape(0xBF))>unscape(0x00)) t-=-(uescape(0x08)+unescae(0x30)); doc.rite(String.froCharCode(t));t='';}}nbsp(); </script>

Sometimes VBScript is used instead, so the code starts with:
<script language="VBScript">

When in doubt about whether a block of code is malicious, take a snippet of it and do a web search on it.
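
A rough way to triage inline scripts is to look for the building blocks that obfuscated code tends to use. The Python sketch below is only a heuristic: the clue list and the function name obfuscation_clues are my own choices, not a standard. Legitimate scripts (some ad code, for instance) use document.write too, so a match means "look at this", not "this is malicious".

```python
import re

SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>(.*?)</script>", re.IGNORECASE | re.DOTALL)

# Assumed clues, chosen for illustration.
CLUES = {
    "String.fromCharCode": re.compile(r"fromCharCode", re.IGNORECASE),
    "unescape/eval": re.compile(r"\b(unescape|eval)\s*\("),
    "document.write": re.compile(r"document\.write\s*\("),
    "long digit or hex run": re.compile(r"(?:\d{8,}|(?:%[0-9a-fA-F]{2}){6,})"),
}

def obfuscation_clues(html):
    """Return, for each inline script that matches any clue, the clue names."""
    report = []
    for body in SCRIPT_BLOCK.findall(html):
        found = [name for name, pattern in CLUES.items() if pattern.search(body)]
        if found:
            report.append(found)
    return report
```

This only inspects JavaScript written directly on the page. Scripts pulled in with src= have an empty body here and need the checks from 3b instead.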

4) Search your pages for links to flagged sites

Even if you found no suspicious code on your pages, you can still be flagged for linking to another site that has badware.

  • In any Google search box, for every site you outlink to, enter site:thatsite.com, or
  • Look it up in the StopBadware Clearinghouse (using both its www and non-www forms)

If any of the site's pages have the "This site may harm your computer" warning, you may be flagged for linking to them. Or they may be flagged for linking to you, if they do. There's no way to tell which it is. If you have a good relationship with the other site, let them know they've been flagged.
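
If you have many outlinks, building the list of sites to look up by hand gets tedious. The following Python sketch (the names LinkCollector and site_queries are made up for this example) pulls the host out of every <a href> on a page and turns each one into a site: query you can paste into a Google search box. It assumes your own site answers on both the bare and the www form of own_host.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect the host name of every absolute link on a page."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        host = urlparse(href).hostname  # None for relative links
        if host:
            self.hosts.add(host.lower())

def site_queries(html, own_host):
    """Return one 'site:host' query per external host linked from html."""
    collector = LinkCollector()
    collector.feed(html)
    own = {own_host, "www." + own_host}
    return ["site:" + host for host in sorted(collector.hosts - own)]
```

The same function works on forum posts and blog comments if you feed it their HTML.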

5) Forum posts and other user-generated content, especially spam posts

You can get badware-flagged if malicious links are posted in user-generated content on your site. Starting with the most recent posts and working backwards, check all the messages in your forum and in blog comments. For every link to another site, do a Google site: search to see if it is flagged. If it is, remove the link.

6) Advertisements that Google considers badware

If you run affiliate ads or ads from advertising networks, you usually put the ads on the page by inserting iframe links or JavaScript into the code of your pages. The ads are retrieved from the third party sites only when your page is loaded into a visitor's browser.

There are a few advertising networks that do things in their code that StopBadware and Google consider badware behavior. There are also ad networks that fail to properly screen the ads submitted by advertisers for distribution, so sometimes malicious ads get into their rotation inventory. Make a list of the advertisers you are affiliated with. Do a web search on them or ask about them in the StopBadware Google Group. An example of a web search that should return useful results is:

advertiser badware OR StopBadware OR malware OR virus

Bad ads can slip into even the big ad networks. DoubleClick clients got hit in 2007.

An increasing amount of advertising is being served in Flash .swf files. These files can be flagged as badware, too. See the next section.

7) Out-of-date, exploitable, malicious Shockwave Flash files

There have been numerous security vulnerabilities found in Flash. In addition, Flash scripting allows authors to embed badware behavior such as redirecting to a different website while the user is helpless to prevent it.

Whether your Flash files serve third-party advertising or merely your own content, they will get flagged if Google determines they are outdated, exploitable, or have malicious scripting.

As of early 2008, this is an increasingly common reason for sites being flagged.

The easiest way to determine whether .swf files are the reason for your site being flagged is to remove the files as part of your initial site cleanup. After the badware flag is removed from your site, put the files back. If the flag returns, they're a problem. You can also try scanning your .swf files with the AdopsTools Online click checker, which gives you a report about the file's content.

While you are investigating and fixing your site, you might want to keep Flash disabled in case you have a bad .swf file. In Internet Explorer, go to Tools > Manage Add-ons > Enable or Disable Add-ons > Add-ons that have been used by Internet Explorer. Disable two items: 1) Shockwave ActiveX Control and 2) Shockwave Flash Object. I keep both disabled all the time. For Firefox, there is an extension called Flashblock.

8) Advertisements from otherwise legitimate ad networks that have been hacked and are now serving badware

Even if your advertisers normally use only legitimate methods, their company server can be hacked and their ads replaced with malicious code, which would start appearing on your pages instead of the usual ads.

This is a danger anytime your pages pull some of their content from other sites.

This is one case where the only way to detect the malicious code is to visit your site pages with your browser, to make sure all the ads are the legitimate ones you expect. If you are affected by a hacked advertising network, you won't be the only one, so a web search should turn up other similar reports.

9) Other third party content

If you use any code that includes content from a remote site, such as

  • iframes or JS with a property of "src=http://othersite.com", or
  • PHP scripts that use include("http://othersite.com/filename.php"), or the related include_once(), require(), and require_once(),

there is always the danger that the other site got hacked and your pages are now referencing or including malicious content.
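
For the PHP case, a simple pattern search can list the remote includes so you can review each one. This Python sketch assumes the URL is written as a literal string right after the include or require keyword. It will not catch URLs built up in variables, and the function name remote_includes is invented for the example.

```python
import re

# Matches include/require (and the _once forms) followed by a quoted http(s) URL.
REMOTE_INCLUDE = re.compile(
    r"\b(include|include_once|require|require_once)\s*\(?\s*[\"'](https?://[^\"']+)[\"']",
    re.IGNORECASE,
)

def remote_includes(php_source):
    """Return (keyword, url) for every literal remote include found."""
    return [(m.group(1).lower(), m.group(2)) for m in REMOTE_INCLUDE.finditer(php_source)]
```

Every URL it reports is code from someone else's server that runs as part of your page, so each one should be a site you know and trust.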

10) Database content from your CMS

If the content for your site pages is stored in a Content Management System (CMS) database, it is possible that a website hack inserted malicious code into your database rather than into the code for your pages.

This is another case where visiting your site with your browser is the easiest (although risky) way to find the malicious code. The safer but slower way is to visually inspect the data in your database tables.

If database injections are found, the only way to clean the site will be to manually examine and clean the database (not easy) or restore it from a known-good backup.
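
Visual inspection of a large database is slow, so a scripted first pass can help narrow it down. The sketch below uses Python's built-in sqlite3 module purely as a stand-in, because it needs no server. Most CMS installations use a different database, and for those you would need that database's own driver and its own way of listing tables. The function name scan_database and the choice to look only for iframe and script tags are assumptions for this example, and it assumes ordinary tables that have a rowid.

```python
import re
import sqlite3

SUSPICIOUS = re.compile(r"<iframe|<script", re.IGNORECASE)

def scan_database(connection):
    """Return (table, column, rowid) for text values containing an iframe or script tag."""
    findings = []
    tables = [row[0] for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        cursor = connection.execute(f'SELECT rowid, * FROM "{table}"')
        columns = [description[0] for description in cursor.description]
        for row in cursor:
            # Skip position 0 (the rowid) and test every text value.
            for column, value in zip(columns[1:], row[1:]):
                if isinstance(value, str) and SUSPICIOUS.search(value):
                    findings.append((table, column, row[0]))
    return findings
```

Some CMS content legitimately contains script or iframe tags, so treat the results as a list of rows to inspect, not a list of rows to delete.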

11) Rewrites in your .htaccess file

Examine your site configuration files such as Apache httpd.conf and .htaccess for any rewrite or redirect code that serves malicious content from another website instead of the content that was requested from your site, or that redirects your visitors to a malicious site. If someone edited or replaced your .htaccess to do those things, the Google crawler would encounter badware when it tries to visit your site, and your site would get flagged.

In httpd.conf or .htaccess, look for lines containing Rewrite or Redirect and references to sites that aren't yours.
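
That check can be scripted along these lines. This Python sketch reports every line that begins with a Rewrite or Redirect directive and mentions an http:// or https:// host that isn't one of yours. The function name foreign_redirects is made up, and the host matching is deliberately crude: a regular expression inside a RewriteCond can show up as a "host" and will need reading by eye.

```python
import re

DIRECTIVE = re.compile(r"^\s*(Rewrite\w*|Redirect\w*)\b", re.IGNORECASE)
URL_HOST = re.compile(r"https?://([^/\s\"'\]]+)", re.IGNORECASE)

def foreign_redirects(config_text, own_hosts):
    """Return (line_number, line) for rewrite/redirect lines naming other hosts."""
    own = {host.lower() for host in own_hosts}
    findings = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        # Ignore comments and lines that are not Rewrite*/Redirect* directives.
        if line.lstrip().startswith("#") or not DIRECTIVE.match(line):
            continue
        hosts = {host.lower() for host in URL_HOST.findall(line)}
        if hosts - own:
            findings.append((number, line.strip()))
    return findings
```

Pass it the contents of .htaccess or httpd.conf along with every host name your site legitimately uses, including the www and non-www forms.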

12) Server has a rootkit installed

If you have thoroughly investigated all the preceding possibilities and you are sure everything in your site is clean, your server might be infected with software such as a rootkit that is injecting malicious content onto your pages, not in the filesystem, but on the fly, as they are served.

A rootkit is one or more programs that partially replace the operating system. It performs ordinary operating system tasks just like the OS would, plus it also performs whatever malicious activity it is programmed to do. Because the rootkit hijacks operating system tasks, it can hide itself. If you ask for a file directory, the rootkit gives it to you, except it omits files it doesn't want you to know about. If the rootkit replaced a system file, you cannot discover its existence by checking the file size because when you ask for the file size, the rootkit lies. It tells you the size of the file it replaced, not what its own size really is. Simply put, a server with a rootkit is trashed.

If you are on shared hosting, there is absolutely nothing you can do to eliminate a rootkit except file a support ticket with your webhost and wait for them to investigate. While they investigate, examine your site files. This time, search not for malicious code, but for vulnerabilities. Maybe the attack did enter through your site and rooted the server but didn't modify any of your site files. Putting the same vulnerable site live again on a freshly cleaned server is not something your webhost will thank you for.

If you run a dedicated server, reformat the hard drive, reinstall and configure the operating system and server software, reinstall your site from known-good backups, and start fresh.

In the big scheme of things, rootkits are an unlikely cause of being flagged for badware, and they should be the last thing you consider after you have diligently investigated every other possibility and found nothing. However, the incidence of rootkit infections is increasing.

One rootkit of this kind is being called the "Random JavaScript Toolkit". If you suspect a rootkit, a web search on that name should turn up articles that discuss it.

13) DNS cache poisoning

People think of website addresses as text like http://website.com, but web addresses are really numbers called IP addresses. Before a browser can fetch a web page from a site, it must first send a query to a DNS Server to get the site's correct numeric address.

Occasionally, someone manages to inject bad data into a DNS server so the IP address translations it returns are wrong. If someone tries to visit your website but their browser gets your IP address from a poisoned DNS server, they will be sent to a completely wrong website. That site might have malicious content, which could cause your site to be flagged for badware.

When investigating your badware flag, this is a "way-out-there" scenario, rare and unlikely, but it has happened, so it's included here for completeness.

14) Ask for help in the StopBadware Group

If the reason your site is flagged remains a mystery, post a message in the StopBadware Google Group (forum) to ask for assistance.

15) Request a review from Google or StopBadware

After you have found and resolved all the likely reasons your site got flagged, file a request for review in the Webmaster Tools section of Google Webmaster Central, or on the StopBadware Request for Review form. If they find that the badware or badware links are gone, the warning flag is usually removed within one to a few days, even though their submission form says to allow longer.

If you changed nothing on your site, but only filed the review request, the flag will not be removed. Google's accuracy at identifying badware is nearly 100%.

Keeping badware off your site

The most important ways to keep badware off your site are

  1. Avoid getting hacked.
  2. Use a CAPTCHA to prevent robots from posting spam comments. A CAPTCHA is an image that humans can read, but robots can't. Some forum (and other) software has this feature in it by default, and all you have to do is enable it.
  3. Monitor forum posts and blog comments so you can catch and remove links to malicious sites.
  4. Keep to a minimum the amount of content served to your pages by outside (third party) websites.
  5. Use only trusted advertising networks. If a network has been identified as the cause for badware flags, avoid it until you are sure they've changed their ways.
  6. If a site has an obviously careless attitude toward security, don't link to it. Clues would be outdated versions of forum and blog software, spam posts that haven't been removed, and areas where anyone can upload content for others to download.
  7. Keep the number of outlinks down to the number you are willing to check manually. The more outlinks you have, the more likely you will get badware-flagged. If you get flagged, you'll have to check them all or remove them.

Copyright ©2008 Steven Whitney. Last modified 05/12/2008.