25 Years of Programming Community Forum
Topic: opinions on .htaccess
jimlongo
« on: January 12, 2010, 06:59:26 AM »

Hi Steve, I found some of these suggestions when scouring the internet for ways to secure my website.
I wonder if you have any opinions on them - I will admit I understand very little of regular expressions and don't understand many of these directives.

Quote
ServerSignature Off
RewriteCond %{REQUEST_METHOD}  ^(HEAD|TRACE|DELETE|TRACK) [NC,OR]
RewriteCond %{THE_REQUEST}     ^.*(\\r|\\n|%0A|%0D).* [NC,OR]

RewriteCond %{HTTP_REFERER}    ^(.*)(<|>|’|%0A|%0D|%27|%3C|%3E|%00).* [NC,OR]
RewriteCond %{HTTP_COOKIE}     ^.*(<|>|’|%0A|%0D|%27|%3C|%3E|%00).* [NC,OR]
RewriteCond %{REQUEST_URI}     ^/(,|;|:|<|>|'>|'<|/|\\\.\.\\).{0,9999}.* [NC,OR]

RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^(java|curl|wget).* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(winhttp|HTTrack|clshttp|archiver|loader|email|harvest|extract|grab|miner).* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(libwww-perl|curl|wget|python|nikto|scan).* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(<|>|’|%0A|%0D|%27|%3C|%3E|%00).* [NC,OR]

RewriteCond %{QUERY_STRING}    ^.*(;|<|>|’|”|\)|%0A|%0D|%22|%27|%3C|%3E|%00).*(/\*|union|select|insert|cast|set|declare|drop|update|md5|benchmark).* [NC,OR]
RewriteCond %{QUERY_STRING}    ^.*(localhost|loopback|127\.0\.0\.1).* [NC,OR]
RewriteCond %{QUERY_STRING}    ^.*\.[A-Za-z0-9].* [NC,OR]
RewriteCond %{QUERY_STRING}    ^.*(<|>|’|%0A|%0D|%27|%3C|%3E|%00).* [NC]

Thanks,
jim
SteveW
Administrator
« Reply #1 on: January 13, 2010, 03:11:20 AM »

Jim, that's a really good question, and I'll try to answer in some depth although I'm not sure I'll manage to do it all in one sitting.

I've seen this code before and suspect you probably got it at http://docs.joomla.org/Htaccess_examples_%28security%29. What I'll try to do is put the code in bold text and insert some comments about it in plain text.  

The only thing obviously wrong with the code at first glance is that, as-is, it lacks a needed line and is thus incomplete and doesn't do anything! The author most likely assumed that the user would supply the missing line, or insert the provided code inside an existing code block that already has it, or create a custom action line depending on what action they want taken, but people unfortunately do use the provided code without doing any of those things.

The RewriteCond lines define conditions under which the final action, a "RewriteRule" line, should be taken. Without a RewriteRule line, the conditions are set up, but nothing gets done. (mod_rewrite also has to be switched on with a "RewriteEngine On" line somewhere above the conditions, if your .htaccess doesn't already have one.) The last line should be one that "forbids" the request. It sends a "403 - Forbidden" error page to the requester:

RewriteRule .* - [F,L]
 
An important, if complex, reference to all this is at http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html. It has links to other pages with examples, etc. A way to get started understanding mod_rewrite might be to start with the code you provided and try to decode it a line at a time while using the Apache web page as a study guide and reference.  

What you're basically doing is testing various aspects of the request and rejecting the request if it matches certain conditions that are usually or always suspicious, associated only with hack attempts.  The format is:

RewriteCond [What part of the request is being tested]  [regular expression to test against] [flags]
One or more of the above lines...
RewriteRule [action to take, be it a Rewrite or a Redirect, a Forbid, or something else]

In the flags, NC="no case" (case-insensitive match). OR means "if this condition matches, OR the next one does..." Normally, RewriteConds form an "AND" chain. The RewriteRule is only applied if ALL the conditions are met. An OR flag makes it operate differently.  Flags are discussed at http://httpd.apache.org/docs/2.2/rewrite/rewrite_flags.html.
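
To make the OR chain concrete, here is a rough model of it in Python. It is only a sketch of the logic, not how Apache implements it: the three conditions are a cut-down, hypothetical subset of the list above, the request is represented as a plain dictionary, and re.IGNORECASE stands in for the NC flag.

```python
import re

# Hypothetical, cut-down version of the chain. Every condition except the
# last carries OR, so the rule fires as soon as ANY one of them matches.
CONDITIONS = [
    ("method", re.compile(r"^(HEAD|TRACE|DELETE|TRACK)", re.IGNORECASE)),
    ("user_agent", re.compile(r"^(java|curl|wget).*", re.IGNORECASE)),
    ("query", re.compile(r"^.*(localhost|loopback|127\.0\.0\.1).*", re.IGNORECASE)),
]


def is_forbidden(request):
    """Model of an OR chain followed by 'RewriteRule .* - [F,L]'."""
    # search() is unanchored, like a RewriteCond pattern without ^ or $.
    return any(pattern.search(request[field]) for field, pattern in CONDITIONS)


normal = {"method": "GET", "user_agent": "Mozilla/5.0", "query": "page=2"}
probe = {"method": "GET", "user_agent": "Mozilla/5.0", "query": "url=http://127.0.0.1/"}

print(is_forbidden(normal))  # False
print(is_forbidden(probe))   # True
```

Without the OR flags the chain would behave like all() instead of any(), and a request would have to trip every condition at once before being forbidden.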

The first and most important thing to note is that since the code you provided doesn't take any action, you might be using it and think that it must be working great because it's not having any adverse effects on your server.  As soon as you add the RewriteRule line, it WILL start forbidding requests.

Whenever you make a change to .htaccess, it is very important to immediately start watching your access logs to see what effect the change is having.  If you make a mistake, Apache might start forbidding all requests, or even stop serving pages at all, which you would want to know right away! On a cPanel server, the best place to watch this is at cPanel > Latest Visitors, where you can see the requests your server is receiving, and its responses, in real time.

You should also request a few pages yourself to test the new configuration: request pages having the characteristics you are trying to ban (to make sure they are properly banned), and request a few normal pages (to make sure they are still being properly served).  And watch many requests in Latest Visitors to make sure your legitimate visitors are still receiving the pages they ask for.
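
The "request a few pages yourself" step can be scripted. The following is a minimal sketch, assuming Python 3 with only the standard library; the function names and the user-agent string are made up for illustration. One thing to watch: Python's default user-agent contains the word "Python", which the user-agent conditions in this ruleset would forbid, so the sketch sends a browser-like string instead.

```python
import urllib.error
import urllib.request

# Hypothetical browser-like User-Agent. urllib's default contains "Python",
# which the user-agent conditions would forbid before anything else is tested.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) TestClient"


def fetch_status(url, user_agent=BROWSER_UA):
    """Return the HTTP status code the server sends for a GET of url."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 403, 404 and similar arrive as exceptions


def unexpected_responses(expected, fetch=fetch_status):
    """Return (url, wanted, got) for every URL whose status differs."""
    problems = []
    for url, wanted in expected.items():
        got = fetch(url)
        if got != wanted:
            problems.append((url, wanted, got))
    return problems
```

To use it, pass a dictionary that maps each test URL on your own site to the status you expect (200 for a normal page, 403 for one carrying a banned pattern) and look at whatever comes back in the list. An empty list means every URL answered as expected. It does not replace watching the logs, since it only covers the requests you thought of.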

I'll take a break for now, with the only code revision so far being the addition of a final line:

RewriteRule .* - [F,L]

If you have any questions about specific lines or parts of lines, feel free to ask, and I can make those higher priority. This is complicated, and the full commentary is going to take a while.
« Last Edit: January 13, 2010, 05:31:08 AM by SteveW »
SteveW
« Reply #2 on: January 13, 2010, 05:05:28 AM »

Next line is explained at http://httpd.apache.org/docs/2.2/mod/core.html. Not particularly important, but harmless:

ServerSignature Off

HTTP methods are described at http://en.wikipedia.org/wiki/HTTP_method#Request_methods. I don't know about the desirability or necessity of banning all these methods. HEAD allows someone to determine if the page exists without fetching the whole thing. PUT and DELETE should not be allowed, but if the server is otherwise properly configured, those won't be allowed anyway. If unsure, I suppose this is a second line of defense. Note the addition of PUT:

RewriteCond %{REQUEST_METHOD}  ^(HEAD|TRACE|DELETE|TRACK|PUT) [NC,OR]

Desirable. Line ends (carriage returns, line feeds) should not be allowed in requests. Whenever you ban a character, you should also ban it in upper/lower case and also in its encoded form. A carriage return (CR) can be sent as a character or encoded as %0D or as %0d (its ASCII code in hexadecimal). A line feed (LF) is %0A or %0a. As a more general example (of a printable alphanumeric character), the letter A could be sent as A or %41 or a or %61. Refer to an ASCII code chart such as at http://en.wikipedia.org/wiki/Ascii, though I'm sure you can find better formatted ones. The [NC] flag takes care of the possible variations in upper/lower case.
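
If you'd rather not read the codes off a chart, they can be checked in a few lines of Python. This is just a quick sketch using the standard library; percent_encode is a helper name invented here.

```python
from urllib.parse import unquote


def percent_encode(char):
    """Return the %XX form of a single ASCII character."""
    return f"%{ord(char):02X}"


print(percent_encode("\r"))  # %0D  (carriage return)
print(percent_encode("\n"))  # %0A  (line feed)
print(percent_encode("A"))   # %41
print(percent_encode("a"))   # %61

# Decoding ignores the case of the hex digits, which is why a filter
# needs to catch both %0D and %0d.
print(unquote("%0d") == unquote("%0D") == "\r")  # True
```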

RewriteCond %{THE_REQUEST}     ^.*(\\r|\\n|%0A|%0D).* [NC,OR]

I did not study these lines closely. See contents within {} for the part of the request being tested, and refer to an ASCII chart to see what characters are being banned. These are generally desirable characters to ban. These lines (especially the third) do not ban the chars in all their possible encodings.

RewriteCond %{HTTP_REFERER}    ^(.*)(<|>|’|%0A|%0D|%27|%3C|%3E|%00).* [NC,OR]
RewriteCond %{HTTP_COOKIE}     ^.*(<|>|’|%0A|%0D|%27|%3C|%3E|%00).* [NC,OR]
RewriteCond %{REQUEST_URI}     ^/(,|;|:|<|>|'>|'<|/|\\\.\.\\).{0,9999}.* [NC,OR]

This bans any request that does not provide a user-agent string such as Internet Explorer, Firefox, Googlebot, etc. I doubt that this is necessary. The situation it addresses is relatively rare. I am not sure this ban is harmless.

#RewriteCond %{HTTP_USER_AGENT} ^$ [OR]

These lines ban specific words in user-agent strings. The last one bans certain embedded special characters. Note that curl and wget are unnecessarily duplicated in two lines.

RewriteCond %{HTTP_USER_AGENT} ^(java|curl|wget).* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(winhttp|HTTrack|clshttp|archiver|loader|email|harvest|extract|grab|miner).* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(libwww-perl|curl|wget|python|nikto|scan).* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(<|>|’|%0A|%0D|%27|%3C|%3E|%00).* [NC,OR]

This line (possibly displayed as 2 lines below) bans certain special characters (punctuation) in the query string followed by various words that are associated with SQL injection attacks.

RewriteCond %{QUERY_STRING}    ^.*(;|<|>|’|”|\)|%0A|%0D|%22|%27|%3C|%3E|%00).*(/\*|union|select|insert|cast|set|declare|drop|update|md5|benchmark).* [NC,OR]

The author must have encountered attacks using these words:

RewriteCond %{QUERY_STRING}    ^.*(localhost|loopback|127\.0\.0\.1).* [NC,OR]

Bans a period followed by any alphanumeric character. Seems overly general to me. Ok if your legitimate query strings never contain a period.

RewriteCond %{QUERY_STRING}    ^.*\.[A-Za-z0-9].* [NC,OR]

Bans carriage returns, line feeds, other punctuation.
Notice that this line, the last in the series of RewriteCond lines, does not (and must not) have "OR" in its flags field.

RewriteCond %{QUERY_STRING}    ^.*(<|>|’|%0A|%0D|%27|%3C|%3E|%00).* [NC]

It would be good to add a line to ban question marks in the query string, as well.
And another to ban parentheses in the query string.

RewriteRule .* - [F,L]

When you enable these bans, you need to watch your logs closely. For example, parentheses and brackets [] are also useful to ban, but you cannot do that if any of your legitimate page names do contain them. That's why copy and paste code like this can be dangerous. The original author cannot take your specific circumstances into account, so copying and pasting .htaccess code without understanding it can result in undesirable effects.

The code you started with is entirely in the spirit of what a well designed .htaccess for improved security will look like. It filters things found in malicious requests, and it takes into account the multiple possible encodings that a character might have. It's a good example to learn from and to use as a guide in designing your own. While designing your own, your HTTP access logs are an important data source. They will help you locate and determine the characteristics of attacks being launched against your site, and when you're considering a particular filter, they are a place where you can search to see if that filter would accidentally ban legitimate requests.
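
Searching the logs for what a candidate filter would catch can be done with a short script. Here is a sketch in Python, assuming the log is in Apache's Common or Combined Log Format; the three sample lines are invented for illustration, and the candidate pattern is the simplified form of the "period followed by an alphanumeric" query string test.

```python
import re

# Matches the quoted request line of a Common/Combined Log Format entry.
REQUEST_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[0-9.]+"')


def query_strings_matching(log_lines, candidate):
    """Return the request targets whose query string matches the candidate filter."""
    pattern = re.compile(candidate, re.IGNORECASE)  # IGNORECASE plays the role of [NC]
    hits = []
    for line in log_lines:
        found = REQUEST_LINE.search(line)
        if not found:
            continue  # not a line we can parse; skip it
        target = found.group("target")
        _, _, query = target.partition("?")
        if query and pattern.search(query):
            hits.append(target)
    return hits


# Invented sample entries, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [12/Jan/2010:06:59:26 +0000] "GET /index.php?file=report.pdf HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [12/Jan/2010:07:01:02 +0000] "GET /index.php?page=2 HTTP/1.1" 200 734 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [12/Jan/2010:07:03:40 +0000] "GET /index.php?id=1%27%20union%20select HTTP/1.1" 200 90 "-" "Mozilla/5.0"',
]

print(query_strings_matching(SAMPLE_LOG, r"\.[a-z0-9]"))
# ['/index.php?file=report.pdf']
```

In this made-up sample the filter catches an ordinary-looking download link and misses the injection attempt, which is exactly the kind of thing you want to find out before the filter goes live.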

Regular expressions can be difficult, but they are so powerful and useful that time invested in becoming familiar with them is well spent. One starting point is http://en.wikipedia.org/wiki/Regex. Apache uses the particularly flexible and powerful "Perl-Compatible Regular Expressions" (PCRE), http://en.wikipedia.org/wiki/PCRE. After a bit of starting familiarity, the Linux manual page called "pcrepattern" is the reference I use often. It's a standard Linux man page; one location for it is http://linux.die.net/man/3/pcrepattern.

« Last Edit: January 13, 2010, 06:16:56 AM by SteveW »
jimlongo
« Reply #3 on: January 13, 2010, 08:12:57 AM »

Thanks for the great explanation. You're right that some lines won't work in a (my) particular situation.

I also found a more concise approach yesterday that might be interesting. Again, one line in it needed to be removed for my application, but it seems more focused than what I had put above.  It can be found at the Perishable Press website:
http://perishablepress.com/press/2009/03/16/the-perishable-press-4g-blacklist/
« Last Edit: January 14, 2010, 08:43:09 AM by jimlongo »
SteveW
« Reply #4 on: January 14, 2010, 01:58:16 AM »

The Perishable Press method looks good, and their format is neat and potentially easier to understand (depending on the groupings and format you prefer). There are usually multiple ways to do anything in .htaccess. Using a format that's easy for you to read makes the code easier to revise later, if needed. In addition to their completed 4G Blacklist at http://perishablepress.com/press/2009/03/16/the-perishable-press-4g-blacklist/, there is also an explanatory walkthrough of its methods at http://perishablepress.com/press/2009/02/03/eight-ways-to-blacklist-with-apaches-mod_rewrite/, and there are other related articles in the site, as well. The author also posted notes about customization needed for using it with Joomla and WordPress. Someone trying to put together an .htaccess file will find those articles useful to improving their understanding. The light text on dark background made my eyes go buggy quickly, though, and I stopped reading before I really wanted to.

-----

Here are some notes about regular expressions.

^ and $, if used, "anchor" (fix the location of, or "must match") the start and end of the target string.

A regex of:
dog
will return true if "dog" is anywhere within the target string.

^dog
returns true only if the target string starts with "dog".

dog$
returns true only if the target string ends with "dog".

^dog$
returns true only if the target string is exactly "dog".
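
The four anchoring cases can be tried out directly. This is a small sketch in Python, whose re module behaves the same as PCRE for patterns this simple; the sample target strings are arbitrary.

```python
import re

PATTERNS = ["dog", "^dog", "dog$", "^dog$"]
TARGETS = ["dog", "dogma", "bulldog", "hotdogs"]


def matches(pattern, target):
    """True if the pattern is found in the target (unanchored unless ^ or $ is used)."""
    return re.search(pattern, target) is not None


for pattern in PATTERNS:
    print(pattern, [target for target in TARGETS if matches(pattern, target)])
# dog ['dog', 'dogma', 'bulldog', 'hotdogs']
# ^dog ['dog', 'dogma']
# dog$ ['dog', 'bulldog']
# ^dog$ ['dog']
```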

When two or more terms are inside parentheses with vertical lines between them, the test will return true if any one of them appears at the given location:
(dog|cat|mouse)

A period . (if it does not have a backslash in front of it) will match any character.
An asterisk * means "zero or more occurrences" of the preceding "term" (piece of a regex).
You often see the two used together like this:
.*
which means "any sequence of characters, 0 or more characters long".

If you want to apply the * to a longer sequence, put that sequence inside parentheses so they get grouped together as one term:
(the)*
will match 0 or more occurrences of "the", including: (no text at all, 0 occurrences), the (1 occurrence), thethe, thethethe...

Characters inside brackets, like [ABCabc], will match any one of the enclosed characters. You can also specify a range: [A-Z], which means any one of the capital letters A through Z, and you can specify multiple ranges, like [A-Za-z], which matches any upper or lowercase letter.
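
Here is the same kind of quick check for alternation, the .* wildcard, grouping with *, and bracketed character classes. Again this is only an illustrative Python sketch; the helper names and sample strings are invented.

```python
import re


def found(pattern, target):
    """True if the pattern occurs anywhere in the target."""
    return re.search(pattern, target) is not None


def whole(pattern, target):
    """True if the pattern accounts for the entire target."""
    return re.fullmatch(pattern, target) is not None


# Alternation: any one of the listed terms will do.
print(found(r"(dog|cat|mouse)", "a cat sat"))    # True
print(found(r"(dog|cat|mouse)", "a bird sang"))  # False

# .* is "any sequence of characters, 0 or more characters long".
print(found(r"^a.*z$", "az"))      # True
print(found(r"^a.*z$", "a to z"))  # True

# * applied to a parenthesized group repeats the whole group.
print(whole(r"(the)*", ""))        # True
print(whole(r"(the)*", "thethe"))  # True
print(whole(r"(the)*", "theth"))   # False

# A bracketed class matches exactly one of the listed characters.
print(whole(r"[A-Za-z]", "q"))     # True
print(whole(r"[A-Za-z]", "7"))     # False
```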

Let's look at an example line from above and take its parts one at a time:

RewriteCond %{QUERY_STRING}    ^.*\.[A-Za-z0-9].* [NC,OR]

This test checks the request's query string to see if it contains:

"Starts with" any sequence of characters.
followed by
a literal period (the leading backslash is required when you want a period to mean "a period" instead of "any character").
followed by
any upper or lowercase letter or a digit
followed by
any sequence of characters
all tested in a case-insensitive manner.

Although this code does its job, some nitpicking will allow some additional explanation.

1. The sequence ^.* is unnecessary and can be eliminated. Remember that a regex of dog matches "dog" anywhere in the target string, since it is not anchored with ^ or $ to either the start or end of the string, nor is it fixed in place relative to any other text. That's the same as "Starts with" any sequence of characters.

2. [A-Za-z0-9] allows for upper and lower case, but the NC flag also takes care of that, so [a-z0-9] will do.

3. The trailing .* (any sequence of some, or no, chars) is also unnecessary, since it inherently means that any trailing chars are completely optional and irrelevant to the match.

So what we're left with is a simpler and easier to comprehend:

RewriteCond %{QUERY_STRING}    \.[a-z0-9]   [NC,OR]
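
A way to convince yourself that the simplified pattern behaves like the original is to run both against the same query strings. This is a sketch in Python, with re.IGNORECASE standing in for the NC flag; the sample query strings are invented, and agreeing on a handful of samples is a sanity check rather than a proof.

```python
import re

ORIGINAL = re.compile(r"^.*\.[A-Za-z0-9].*", re.IGNORECASE)
SIMPLIFIED = re.compile(r"\.[a-z0-9]", re.IGNORECASE)

# Invented sample query strings, including some edge cases.
SAMPLES = ["file=report.PDF", "page=2", "a=1.5&b=2", "trailing=dot.", "x=..%2F", ""]

for query in SAMPLES:
    before = ORIGINAL.search(query) is not None
    after = SIMPLIFIED.search(query) is not None
    print(repr(query), before, after)
# 'file=report.PDF' True True
# 'page=2' False False
# 'a=1.5&b=2' True True
# 'trailing=dot.' False False
# 'x=..%2F' False False
# '' False False
```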

You can see that many of the above lines can be similarly simplified, which makes them less intimidating-looking.

You can make sense of a regular expression (or build one) by examining (or building) its component pieces one by one. What it requires is a willingness to not look at it as an intimidating jumble of text, but to pay attention to its details and work out its meaning a step at a time. With practice, it gets easier.

Like the asterisk, there are some other characters that have special meanings as wildcards or other things. When you want them not to have their special meanings, and be treated as ordinary characters, you must precede them with a backslash. I won't make a list here, but that will explain why sometimes in code you see backslashes.  

In this line

RewriteCond %{REQUEST_URI}     ^/(,|;|:|<|>|'>|'<|/|\\\.\.\\).{0,9999}.* [NC,OR]

the {0,9999} is another type of "quantifier", similar to the asterisk. It determines how many occurrences of the preceding group to match. In this case, it means 0 to 9999 occurrences. The group that it quantifies is simply the period that precedes it, so it means a string of any characters 0 to 9999 characters long.  
« Last Edit: September 17, 2011, 03:01:02 AM by SteveW »
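
A bounded quantifier is easy to experiment with. The sketch below (Python, with a shortened and hypothetical version of the alternation) shows the bound at work on its own, and also that in this particular line it limits nothing, because the .* that follows it will absorb whatever the bounded part leaves over.

```python
import re

# On its own, the bound matters: at most 3 characters are accepted.
print(re.fullmatch(r".{0,3}", "") is not None)      # True
print(re.fullmatch(r".{0,3}", "abc") is not None)   # True
print(re.fullmatch(r".{0,3}", "abcd") is not None)  # False

# Shortened, hypothetical version of the REQUEST_URI test, with and
# without the trailing quantifiers.
WITH_TAIL = re.compile(r"^/(,|;|:).{0,9999}.*", re.IGNORECASE)
WITHOUT_TAIL = re.compile(r"^/(,|;|:)", re.IGNORECASE)

long_uri = "/;" + "x" * 20000  # longer than the 9999-character bound
for uri in ["/;", "/:8080", long_uri, "/index.php"]:
    print(WITH_TAIL.search(uri) is not None, WITHOUT_TAIL.search(uri) is not None)
# True True
# True True
# True True
# False False
```

So the same nitpick as before applies: the trailing .{0,9999}.* can be dropped without changing which requests match.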