25 Years of Programming
An open source source for C, C++, OWL, BASIC, MDB, XLS, DOT, and more...
Google/StopBadware says "This site may harm your computer", and it's YOUR site! What to do, step by step

Here are reasons why your website can be flagged with the "This site may harm your computer" warning in Google search results, in approximate order of likelihood:
StopBadware and Google describe the criteria they use to determine whether a website is contributing to the badware problem.

Why antivirus scanning might not find the bad code

The first idea that occurs to many webmasters is to run an antivirus (AV) scan on the site. In many cases, this will not find the problem. The next sections explain why.

Scanning your website files on the server

Scanning your site with an antivirus program will only work if the site is actually hosting the virus, which it often isn't. More likely, the virus itself is hosted on another computer, and your pages have been injected with iframe or JavaScript code that refers to it indirectly (with src=). The AV program on your server therefore sees only iframes and JavaScript, which don't trigger virus alerts because they aren't viruses. The remote virus isn't pulled in until the page is loaded into a visitor's browser. The browser then fetches the code referred to by the src= attribute, and that is when the visitor gets a virus alert. You can, of course, scan your site with an antivirus program if you want, but if it finds no viruses, that does not mean the site is clean.

Downloading your site files to your PC and scanning them there

Using a tool like FTP, Wget, or cURL to download the source code of your pages to your local PC and AV-scanning them there is also unlikely to find the virus, for the same reason: the actual virus is probably not on your pages. Wget in recursive download mode, with its --span-hosts option, can retrieve all linked files, including ones from remote sites and remote script files, but if some of them are viruses, you will be downloading them directly to your PC, which is an unnecessary risk.

Risky: browse your pages with an antivirus program running on your PC

If you are determined to use an AV program to scan your site, you can browse the site as if you were an ordinary visitor. This is risky because an increasing number of viruses are "polymorphic".
Their code changes so frequently (every day or every time they are served) that antivirus programs can't keep up and do a poor job of detecting them:
In summary, antivirus scanning is not definitive. If it finds a problem, that's useful. If it doesn't, that means nothing, because the virus might be on a different website, or encrypted and polymorphic, or the site's problem might not be a virus at all; it might be a bad outlink. Instead, inspect your source files manually.

Searching your pages for badware

1) Discover which pages are flagged for badware

Use either one of these methods:
Whichever method you use, note the pages that are flagged. If there are only a few, make a list. If all of them are flagged, that could be important, because the badware link might be something common to all of them.

1a) Suggested procedure

Start by doing each of the steps in this article only in a preliminary way. Once you know what you're looking for, suspicious code is very easy to spot, so the first time through, don't spend too much time going over every step with a fine-toothed comb. Just try to discover quickly:
If your quick preview turns up nothing, come back and do each check more thoroughly.

2) Prepare to search the pages for badware

It is important to view and search the server-side source code of your pages, not the HTML output that you see with the View Source command in a browser. View Source can lead you to think the page is clean even though it's not. Some exploits put malicious code on pages only under certain circumstances, and your particular viewing might not meet those circumstances. Viewing the code on your server lets you see ALL the code, even if it is only added to the pages some of the time.

3) How to search pages for malicious code

One page at a time (recommended)

Ways to view the source code one page at a time:
Start with the most central and important flagged page, such as your home page.

Where to look: Malicious code is often inserted into web page files by robots, without concern for placement (or W3C validation). Common locations are
If your pages normally validate at W3C, run the badware-flagged pages through the W3C validator. Any errors might point to the bad code.

Multi-file searching

With multi-file searching, a program searches multiple files for the search string you specify and reports all the instances it finds. If you're already familiar with the following programs, you might find a multi-file search easy to do. Otherwise, I recommend the "one page at a time" method above, at least to start with.

Ways to obtain the source code for searching multiple files at once:

Dedicated server: Do your search directly on the server with an operating system tool like grep.

Shared server: If you're already familiar with cron and grep, you can create a cron job to run the grep search as though you had shell (command line) access. Otherwise, download the pages (or your entire site) to your PC so you can search them there. Download with:
After downloading the pages to your PC, you can use FrontPage, Expression Web, Dreamweaver, grep, or a similar utility to search all of your site's pages at once with a single command. Even Windows Explorer is better than nothing.

Search strings

These are useful search strings, whether you are searching one file at a time or all files at once:
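If you have Python 3 available (on a dedicated server, in a cron job, or on your PC after downloading the site), a short script can do the same job as a recursive grep. This is only a sketch: the patterns are my assumptions, chosen to match the kinds of code described in 3a to 3c (iframes, src= attributes pointing to absolute http:// or https:// addresses, and helpers often used by obfuscated JavaScript such as eval, unescape, String.fromCharCode, and document.write), and the list of file extensions is a guess you should adjust for your site. Every hit still has to be inspected by eye; most will be legitimate code.

```python
#!/usr/bin/env python3
"""Recursive search of site files for strings that often indicate badware."""
import os
import re
import sys

# Assumed patterns; adjust them for your own site.
SUSPICIOUS_PATTERNS = {
    # Any iframe deserves a look (see 3a).
    "iframe": re.compile(r"<iframe\b", re.IGNORECASE),
    # src= pointing at an absolute http:// or https:// address (see 3a and 3b).
    "external src": re.compile(r"""src\s*=\s*["']?\s*https?://""", re.IGNORECASE),
    # Helpers commonly used to unpack obfuscated JavaScript (see 3c).
    "obfuscation helper": re.compile(
        r"\b(?:eval|unescape|fromCharCode)\s*\(|document\.write", re.IGNORECASE
    ),
}

# File types to search (an assumption; add any other page or template types you use).
EXTENSIONS = {".htm", ".html", ".shtml", ".php", ".asp", ".js", ".inc", ".tpl"}


def search_site(root):
    """Yield (path, line_number, label, line_text) for every pattern match under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() not in EXTENSIONS:
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps oddly encoded files from stopping the scan.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    for label, pattern in SUSPICIOUS_PATTERNS.items():
                        if pattern.search(line):
                            yield path, number, label, line.strip()


if __name__ == "__main__":
    site_root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, number, label, text in search_site(site_root):
        # Truncate long lines; obfuscated code is often one enormous line.
        print(f"{path}:{number}: [{label}] {text[:120]}")
```

Run it with the site's root folder as the only argument. A page that lights up with several labels on the same line is the best place to start reading.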
Make sure all instances of src= and http:// refer to files on your site or to external sites you know and trust. Some common trusted ones that are not a problem are pagead2.googlesyndication.com (if you use AdSense) and www.google-analytics.com (Google Analytics).

3a) What do malicious or invisible iframes look like?

iframe code looks like this. If you don't recognize remotesite.com, the code is suspicious. This is an "invisible iframe" because of the width and height settings:

<iframe src="http://remotesite.com/path/file" width="0" height="0" frameborder="0"></iframe>

3b) What do malicious JavaScript references look like?

JavaScript references to external sites look like this. If you don't recognize remotesite.com, the code is suspicious. This code calls and runs a JS script that is hosted on a website that isn't yours (and which might be a hacked or otherwise malicious site). After a visitor loads your page, their browser fetches this JavaScript from the other site and then runs it:

<script language="JavaScript" src="http://remotesite.com/path/file.js"></script>

3c) What does malicious or obfuscated JavaScript code look like?

Malicious JavaScript code placed directly on your pages (rather than called by reference as described above) is often "obfuscated" (also called "obscured" or "encoded") to make it hard to tell what it does. It looks like an undecipherable jumble, like this (this has been greatly shortened and mangled to make it nonfunctional).
Code like this is always suspicious and must be investigated:

<script language="JavaScript">function nbsp() {var t,o,l,i,j;var s=''; s+='47116101120'; s+='097114099111'; s=s+'1203403211910'; s=s+'7121058110111112'; s=s+'9062032'; t='';l=s.length;i=0; while(i<(l-1)) {for(j=0;j<3;j++){t+=s.charAt(i);i++;} if((t-nescape(0xBF))>unscape(0x00)) t-=-(uescape(0x08)+unescae(0x30)); doc.rite(String.froCharCode(t));t='';}}nbsp(); </script>

Sometimes VBScript is used instead, so the code starts with:

When in doubt about whether a block of code is malicious, take a snippet of it and do a web search on it.

4) Search your pages for links to flagged sites

If you found no suspicious code on your pages, your site might instead be flagged for linking to another site that has badware.
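To make the checks in 3a, 3b, and 4 less tedious, a script can list every off-site host a page refers to. This sketch assumes Python 3 and uses only its standard html.parser and urllib.parse modules. It reports each src= that loads from a host not on your trusted list, flags zero-size iframes, and collects all external hosts (from both src= and href=) so you can run a Google site: search on each one. TRUSTED_HOSTS is an assumption: example.com is a placeholder for your own domain, and the two Google hosts are the trusted examples mentioned in step 3. You can feed it the text of a page file taken from your server.

```python
#!/usr/bin/env python3
"""List off-site hosts referenced by a page and flag risky iframes and scripts."""
import sys
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hosts you trust. "example.com" is a placeholder for your own domain; the two
# Google hosts are the trusted examples from the article. Matching is exact,
# so list "www.example.com" separately if your pages use it.
TRUSTED_HOSTS = {
    "example.com",
    "pagead2.googlesyndication.com",
    "www.google-analytics.com",
}


class ReferenceAuditor(HTMLParser):
    """Collects external hosts from src=/href= and records warnings."""

    def __init__(self):
        super().__init__()
        self.external_hosts = set()  # every off-site host mentioned on the page
        self.warnings = []           # (line_number, message) pairs

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        line, _offset = self.getpos()  # position of the tag being handled
        for name in ("src", "href"):
            url = attributes.get(name)
            host = (urlparse(url).hostname or "") if url else ""
            if not host:
                continue  # missing or relative URL: it stays on your own site
            self.external_hosts.add(host)
            if name == "src" and host not in TRUSTED_HOSTS:
                self.warnings.append((line, f"<{tag}> loads from untrusted host {host}"))
        if tag == "iframe" and "0" in (attributes.get("width"), attributes.get("height")):
            self.warnings.append((line, "zero-size (invisible) iframe"))


def audit(html_text):
    """Parse html_text and return the finished ReferenceAuditor."""
    auditor = ReferenceAuditor()
    auditor.feed(html_text)
    auditor.close()
    return auditor


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        report = audit(handle.read())
    for line, message in report.warnings:
        print(f"line {line}: {message}")
    print("External hosts to check with a site: search:")
    for host in sorted(report.external_hosts):
        print("  " + host)
```

The host list is deliberately broader than the warnings: a link (href=) to a badware site can get you flagged even though it loads nothing, so every host in it is worth a site: search.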
If any of the other site's pages have the "This site may harm your computer" warning, you may be flagged for linking to them. Or, if they link to you, they may be flagged for linking to you. There's no way to tell which it is. If you have a good relationship with the other site, let them know they've been flagged.

5) Forum posts and other user-generated content, especially spam posts

You can get badware-flagged if malicious links are posted in user-generated content on your site. Starting with the most recent posts and working backwards, check all the messages in your forum and in blog comments. For every link to another site, do a Google site: search to see if it is flagged. If it is, remove the link.

6) Advertisements that Google considers badware

If you run affiliate ads or ads from advertising networks, you usually put the ads on the page by inserting iframe links or JavaScript into the code of your pages. The ads are retrieved from the third-party sites only when your page is loaded into a visitor's browser. A few advertising networks do things in their code that StopBadware and Google consider badware behavior. There are also ad networks that fail to properly screen the ads submitted by advertisers, so malicious ads sometimes get into their rotation inventory.

Make a list of the advertisers you are affiliated with. Do a web search on them or ask about them in the StopBadware Google Group. An example of a web search that should return useful results is:

advertiser badware OR StopBadware OR malware OR virus

Bad ads can slip into even the big ad networks; DoubleClick clients got hit in 2007. An increasing amount of advertising is being served in Flash .swf files. These files can be flagged as badware, too. See the next section.

7) Out-of-date, exploitable, malicious Shockwave Flash files

There have been numerous security vulnerabilities found in Flash.
In addition, Flash scripting allows authors to embed badware behavior, such as redirecting to a different website while the user is helpless to prevent it. Whether your Flash files serve third-party advertising or merely your own content, they will get flagged if Google determines they are outdated, exploitable, or contain malicious scripting. As of early 2008, this is an increasingly common reason for sites being flagged.

The easiest way to determine whether .swf files are the reason for your site being flagged is to remove the files as part of your initial site cleanup. After the badware flag is removed from your site, put the files back. If the flag returns, they're a problem. You can also try scanning your .swf files with the AdopsTools Online click checker, which gives you a report about the file's content. These links might help you investigate further:
Technical articles:
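The remove-and-restore test for .swf files in step 7 can be scripted so that every file goes back exactly where it came from. This is a sketch assuming Python 3; quarantine_swf and restore_swf are hypothetical helper names, not part of any tool mentioned here. Put the quarantine folder outside your public web directory; otherwise the files are still reachable on your site and the test proves nothing.

```python
#!/usr/bin/env python3
"""Move .swf files out of a site and back again, keeping their relative paths."""
import shutil
import sys
from pathlib import Path


def _move_swf_files(source_root, target_root):
    """Move every .swf file under source_root to the same relative path under target_root."""
    source_root, target_root = Path(source_root), Path(target_root)
    # Build the complete list first, so moving files can't disturb the directory walk.
    # The suffix check is case-insensitive, so .SWF files are caught too.
    swf_files = sorted(
        p for p in source_root.rglob("*") if p.is_file() and p.suffix.lower() == ".swf"
    )
    moved = []
    for swf in swf_files:
        relative = swf.relative_to(source_root)
        destination = target_root / relative
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(swf), str(destination))
        moved.append(relative)
    return moved


def quarantine_swf(site_root, quarantine_dir):
    """Take all .swf files off the site. quarantine_dir must be outside the web root."""
    return _move_swf_files(site_root, quarantine_dir)


def restore_swf(site_root, quarantine_dir):
    """Put quarantined .swf files back where they came from."""
    return _move_swf_files(quarantine_dir, site_root)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python swf_quarantine.py quarantine|restore SITE_ROOT QUARANTINE_DIR
    action, site, vault = sys.argv[1:]
    helpers = {"quarantine": quarantine_swf, "restore": restore_swf}
    for path in helpers[action](site, vault):
        print(path)
```

Keep the printed list from the quarantine run: if the flag returns after you restore the files, it tells you exactly which .swf files to examine one by one.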
While you are investigating and fixing your site, you might want to keep Flash disabled in case you have a bad .swf file. In Internet Explorer, go to Tools > Manage Add-ons > Enable or Disable Add-ons > Add-ons that have been used by Internet Explorer. Disable two items: Shockwave ActiveX Control and Shockwave Flash Object. I keep both disabled all the time. For Firefox, there is an extension called Flashblock.

8) Advertisements from otherwise legitimate ad networks that have been hacked and are now serving badware

Even if your advertisers normally use only legitimate methods, their company servers can be hacked and their ads replaced with malicious code, which would start appearing on your pages instead of the usual ads. This is a danger anytime your pages pull some of their content from other sites. This is one case where the only way to detect the malicious code is to visit your site pages with your browser, to make sure all the ads are the legitimate ones you expect. If you are affected by a hacked advertising network, you won't be the only one, so a web search should turn up other similar reports.

9) Other third-party content

If you use any code that includes content from a remote site, such as
there is always the danger that the other site got hacked and your pages are now referencing or including malicious content.

10) Database content from your CMS

If the content for your site pages is stored in a Content Management System (CMS) database, a website hack may have inserted malicious code into your database rather than into the code for your pages. This is another case where visiting your site with your browser is the easiest (although risky) way to find the malicious code. The safer but slower way is to visually inspect the data in your database tables. If database injections are found, the only way to clean the site will be to manually examine and clean the database (not easy) or restore it from a known-good backup.

11) Rewrites in your .htaccess file

Examine your site configuration files, such as Apache httpd.conf and .htaccess, for any rewrite or redirect code that serves malicious content from another website instead of the content that was requested from your site, or that redirects your visitors to a malicious site. If someone edited or replaced your .htaccess to do those things, the Google crawler would encounter badware when it tries to visit your site, and your site would get flagged. In httpd.conf or .htaccess, look for lines containing Rewrite or Redirect (directives such as RewriteRule, RewriteCond, Redirect, and RedirectMatch) that refer to sites that aren't yours.

12) Server has a rootkit installed

If you have thoroughly investigated all the preceding possibilities and you are sure everything in your site is clean, your server might be infected with software such as a rootkit that is injecting malicious content onto your pages, not in the filesystem, but on the fly, as they are served. A rootkit is one or more programs that partially replace the operating system. It performs ordinary operating system tasks just like the OS would, and it also performs whatever malicious activity it is programmed to do. Because the rootkit hijacks operating system tasks, it can hide itself.
If you ask for a directory listing, the rootkit gives it to you, except that it omits files it doesn't want you to know about. If the rootkit replaced a system file, you cannot discover its existence by checking the file size, because when you ask for the file size, the rootkit lies: it reports the size of the file it replaced, not its own size.

Simply put, a server with a rootkit is trashed. If you are on shared hosting, there is absolutely nothing you can do to eliminate a rootkit except file a support ticket with your webhost and wait for them to investigate. While they investigate, examine your site files. This time, search not for malicious code but for vulnerabilities. Maybe the attack did enter through your site and rooted the server without modifying any of your site files. Putting the same vulnerable site live again on a freshly cleaned server is not something your webhost will thank you for.

If you run a dedicated server, reformat the hard drive, reinstall and configure the operating system and server software, reinstall your site from known-good backups, and start fresh.

In the big scheme of things, rootkits are an unlikely cause of being flagged for badware, and a rootkit should be the last thing you consider, after you have diligently investigated every other possibility and found nothing. However, the incidence of rootkit infections is increasing. You might find the following articles interesting or useful if you suspect a rootkit. They discuss one that is being called "Random JavaScript Toolkit":
13) DNS cache poisoning

People think of website addresses as text like http://website.com, but computers on the Internet are really located by numbers called IP addresses. Before a browser can fetch a web page from a site, it must first send a query to a DNS server to get the site's numeric address. Occasionally, someone manages to inject bad data into a DNS server so that the IP addresses it returns are wrong. If someone tries to visit your website but their browser gets the address for your site from a poisoned DNS server, they will be sent to a completely different website. That site might have malicious content, which could cause your site to be flagged for badware. When investigating your badware flag, this is a "way-out-there" scenario, rare and unlikely, but it has happened, so it's included here for completeness.

14) Ask for help in the StopBadware Group

If the reason your site is flagged remains a mystery, post a message in the StopBadware Google Group (forum) to ask for assistance.

15) Request a review from Google or StopBadware

After you have found and resolved all the likely reasons your site got flagged, file a request for review in the Webmaster Tools section of Google Webmaster Central, or on the StopBadware Request for Review form. If they find that the badware or badware links are gone, the warning flag is usually removed within one to a few days, even though their submission form says to allow longer. If you changed nothing on your site but only filed the review request, the flag will not be removed. Google's accuracy at identifying badware is nearly 100%.

Links and resources:
Keeping badware off your site

The most important ways to keep badware off your site are
Comments, questions, and discussion are welcome in the Forum.