25 Years of Programming
PHP security: how to use data validation to avoid Remote File Inclusion (RFI) vulnerabilities in your code, with examples
A remote file inclusion (RFI) vulnerability is a security flaw in programming code. Whenever a script receives data from outside itself, there is a danger that the data was sent by a malicious attacker, a hacker, who designed it to corrupt the execution of the script and trick it into doing something it wasn't supposed to do. One of the things a corrupted script can be tricked into doing is to fetch a file from a distant website (that is the "remote file") and include() it into the body of the corrupted script (that is the "inclusion"). Whatever program code is in the remote (but now local!) file becomes part of the corrupted script, and it executes right along with all the other code. RFI therefore allows hackers to run their code on your server, with the same access permissions (to folders and files) that your own code has.
The key to the success of an RFI attack is that the hacker must be able to send the URL of the remote file into your script, disguised as innocent data. That's easy. All they have to do is find (or guess) the avenues by which your script accepts incoming data, make note of the variable names you use (or guess, using common names), and then start sending your script ordinary requests of the type it normally expects, but with one difference: the value of every variable they send is the URL of the remote script they want your script to execute. They have no control over whether your script actually uses the incoming data in PHP include(), include_once(), require(), or require_once() statements (or their equivalents in other languages), but that is so often the case that this is a high percentage play for them.
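To make the mechanism concrete, here is a minimal sketch of the kind of code an attacker hopes to find. The function name color_file_to_include(), the colors/ folder, and the attacker.example URL are all hypothetical, invented for this illustration. The function only returns the string that a careless script would hand to include(), so nothing is actually fetched or executed.

```php
<?php
// HYPOTHETICAL ILLUSTRATION: RETURNS THE PATH A CARELESS SCRIPT WOULD include().
// NOTHING IS ACTUALLY INCLUDED HERE.
function color_file_to_include(array $Query)
{
    // NO VALIDATION: WHATEVER ARRIVED IN 'color' IS USED AS-IS
    if(isset($Query['color']))
        return $Query['color'];
    // NOTHING SUBMITTED: FALL BACK TO A LOCAL FILE
    return 'colors/red.txt';
}

// A NORMAL REQUEST NAMES A LOCAL FILE
echo color_file_to_include(array('color' => 'colors/blue.txt')), "\n";

// AN ATTACKER SENDS A URL INSTEAD. A VULNERABLE SCRIPT WOULD NOW TRY TO
// include() THAT URL AND RUN WHATEVER CODE THE REMOTE FILE CONTAINS.
echo color_file_to_include(array('color' => 'http://attacker.example/evil.txt')), "\n";
```

The script cannot tell the two requests apart, because it never asks whether the incoming value is one it was designed to handle.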
An important defense against RFI attack is to write your scripts to examine every incoming variable to ensure that its data type, character composition, format, and value are "legal" according to the characteristics your script expects that variable to have. If an incoming variable is not what you expect, your script must not use it. This is called data validation or "sanitizing" or "scrubbing". It is the topic of this article.
Untrusted data (from outside the script) requires validation
When you set a variable explicitly in a script with: $a = 4; that data is considered trusted. You are in control of it. You set it yourself, and presumably not maliciously. Likewise, if you read data from a file or other source that you completely control, that data is trusted. Data is "untrusted" when it comes from a source you don't control completely. Common ways PHP scripts receive untrusted data from the outside world:
Before each of these incoming variables is used in a script, it is necessary to ensure that it has a value in the set of, or within the range of, the legitimate values your script is designed to handle for that variable. If it is not, you should instead give it a safe default value, or not use it at all, or reject the submission and inform the user that the input was invalid, whichever option is appropriate to your application. Some of the ways you can test variables include:
The example code will show methods of doing all these tests, but first let's do some experimenting to see what RFI is all about.
Valid and invalid data in form submissions, and an RFI demonstration
Although this page doesn't have a form on it, it is designed to handle form submissions using HTTP GET requests. You submit the data manually by copying and pasting URLs for this page into your browser's address bar. The URLs have the same format as ones generated by a browser when you submit a form. Doing it manually will help you understand what an RFI attack is and how it works. The hypothetical form has two fields:
Here are some example URLs to paste into your browser address bar. You'll see the result of your "form submissions" at the top of the resulting page.
http://25yearsofprogramming.com/blog/2011/20110124.htm?age=25&color=blue
http://25yearsofprogramming.com/blog/2011/20110124.htm?age=50&color=3
http://25yearsofprogramming.com/blog/2011/20110124.htm?age=noyb&color=violet
So far, all seems quiet. Whatever you enter, the output you get is an age and a color. If you try to do something invalid, you get a default age and a default color. Big deal.
http://25yearsofprogramming.com/blog/2011/20110124.htm?age=75&color=htpp://25yearsofprogramming.com/robots.txt
What happened??! You've got a lot of nerve! You hacked my website!! OK, not really. We're pretending. What happened was this:
If my original code (in pseudo-code form) were:
fetch_the_color_file();
place_its_text_on_the_page();
do_more_stuff();
it could then become:
fetch_the_color_file();
place_its_text_on_the_page(); // but it's PHP code!, so it runs and does this:
    make_a_list_of_all_files_in_my_site();
    for(every_file)
    {
        open_the_file_in_append_mode();
        add_a_virus_infected_iframe_to_the_bottom();
        save_the_file();
    }
do_more_stuff();
All that extra code came from the file on your website. It got inserted right into the middle of my own code, and it ran. Now every single page of my site has a virus-infected iframe in it. That's what I get for failing to make sure that what you sent me was a legitimate color!
PHP $_GET[''] Data Validation Example Code
The examples show several methods of validating $_GET[''] variables. The same methods apply to $_POST[''] or the others. Links go to documentation pages at php.net. The basic strategy is the same for all methods:
After you remove the comments, you'll see that none of the examples have much code.
If there are many legal values, you could keep the list in a file and use the file() function to read it into the array when you need it.
<?php
// LOCAL VARIABLE WITH ITS LEGITIMATE DEFAULT VALUE
$Color = 'red';
// ARRAY OF ALL POSSIBLE LEGAL VALUES FOR THE VARIABLE
$LegalColors = array
(
    'red',
    'blue',
    'green'
);
// IF USER SUBMITTED A COLOR VALUE, AND IT IS A STRING
// (A REQUEST LIKE ?color[]=x ARRIVES AS AN ARRAY, WHICH trim() CANNOT HANDLE)
if(isset($_GET['color']) && is_string($_GET['color']))
{
    // REMOVE IRRELEVANT LEADING/TRAILING WHITESPACE FROM THE INCOMING TEXT
    $_GET['color'] = trim($_GET['color']);
    // CHECK AGAINST THE LEGAL-VALUES ARRAY, WITH STRICT TYPE CHECKING
    if(in_array($_GET['color'], $LegalColors, TRUE))
    {
        // TRANSFER THE INCOMING VALUE TO THE LOCAL VARIABLE
        $Color = $_GET['color'];
    }
    // AN else {} HERE COULD ABORT THE SCRIPT IF THE VALUE WAS ILLEGAL
}
?>
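The suggestion above about keeping a long list of legal values in a file could be sketched as follows. The helper name load_legal_values() is hypothetical, and the list file is assumed to hold one legal value per line. The temporary file is created only so the sketch can run by itself, and $Incoming stands in for an already trimmed $_GET['color'].

```php
<?php
// HYPOTHETICAL HELPER: READ ONE LEGAL VALUE PER LINE FROM A FILE YOU CONTROL
function load_legal_values($Path)
{
    $Lines = file($Path);
    if($Lines === FALSE)
        return array(); // UNREADABLE FILE: NO VALUE IS LEGAL
    $Values = array();
    foreach($Lines as $Line)
    {
        // STRIP LINE ENDINGS AND STRAY WHITESPACE, SKIP BLANK LINES
        $Line = trim($Line);
        if($Line !== '')
            $Values[] = $Line;
    }
    return $Values;
}

// FOR THE DEMONSTRATION ONLY: CREATE A TEMPORARY LIST FILE
$ListFile = tempnam(sys_get_temp_dir(), 'col');
file_put_contents($ListFile, "red\nblue\n\ngreen\n");
$LegalColors = load_legal_values($ListFile);
unlink($ListFile);

$Color = 'red';     // DEFAULT VALUE
$Incoming = 'blue'; // STANDS IN FOR THE TRIMMED $_GET['color']
if(in_array($Incoming, $LegalColors, TRUE))
    $Color = $Incoming;
echo $Color, "\n";  // PRINTS blue
```

The list file itself must be one you completely control, or it becomes another source of untrusted data.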
This example uses a regular expression to test against all possible legal values. That is an exact duplication of the array validation method above, but regex testing can be used more flexibly than that: you can test for variations and patterns rather than against specific entire strings. Be sure that your regular expression matches all the possible legal values, but nothing else.
<?php
$Color = 'red';
// is_string() GUARDS AGAINST ARRAY INPUT SUCH AS ?color[]=x
if(isset($_GET['color']) && is_string($_GET['color']))
{
    $_GET['color'] = trim($_GET['color']);
    if(preg_match('/^(red|blue|green)$/u', $_GET['color']))
        $Color = $_GET['color'];
}
?>
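<?php
// OTHER USEFUL REGULAR EXPRESSIONS. A WEB SEARCH WILL FIND MANY COMMON ONES.
// ASSUMES $_GET['var'] HAS ALREADY PASSED isset() AND is_string() AND BEEN trim()MED.
$IsUpperAlpha   = preg_match('/^[A-Z]{1,8}$/u', $_GET['var']) === 1;     // 1-8 UPPERCASE ALPHABETIC
$IsAlphanumeric = preg_match('/^[A-Z0-9]{1,8}$/ui', $_GET['var']) === 1; // 1-8 UPPER/lower ALPHANUMERIC
?>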
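As an illustration of testing for a pattern rather than for specific entire strings, here is a sketch that accepts any color written as a # followed by six hexadecimal digits. The helper name is_hex_color() and the choice of this particular format are assumptions made for the example, not part of the hypothetical form described earlier.

```php
<?php
// HYPOTHETICAL HELPER: ACCEPT A COLOR AS '#' FOLLOWED BY EXACTLY SIX HEX DIGITS
function is_hex_color($Value)
{
    // REJECT ARRAYS AND OTHER NON-STRINGS BEFORE THEY REACH preg_match()
    if(!is_string($Value))
        return FALSE;
    // \z ANCHORS AT THE VERY END OF THE STRING, SO A TRAILING NEWLINE IS NOT ACCEPTED
    return preg_match('/^#[0-9A-Fa-f]{6}\z/', $Value) === 1;
}

var_dump(is_hex_color('#00FF7f'));                          // bool(true)
var_dump(is_hex_color('blue'));                             // bool(false)
var_dump(is_hex_color('http://attacker.example/evil.txt')); // bool(false)
```

A single pattern covers millions of legal values here, which a legal-values array could not do practically, and a URL still cannot match it.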
The switch method allows some additional flexibility: you can translate incoming values to different values for internal use. This example, in addition to allowing the color names, allows numeric color values of 1,2,3 and uses the cases to translate them to red,blue,green for internal use. If I only used the numbers in publicly visible URLs, I could prevent anyone knowing what values they are translated to internally.
<?php
$Color = 'red';
// is_string() GUARDS AGAINST ARRAY INPUT SUCH AS ?color[]=x
if(isset($_GET['color']) && is_string($_GET['color']))
{
    $_GET['color'] = trim($_GET['color']);
    switch($_GET['color'])
    {
        case 'red':
        case '1':
            $Color = 'red';
            break;
        case 'blue':
        case '2':
            $Color = 'blue';
            break;
        case 'green':
        case '3':
            $Color = 'green';
            break;
        default:
            // YOU COULD ABORT SCRIPT HERE
            break;
    }
}
?>
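The same translation can also be sketched with a lookup table instead of a switch, which may be easier to maintain when there are many values. The names $ColorMap and translate_color() are hypothetical. Unlike the switch, which compares loosely, the array lookup only accepts the exact keys listed.

```php
<?php
// HYPOTHETICAL LOOKUP TABLE: PUBLIC VALUE => INTERNAL VALUE
$ColorMap = array
(
    'red'   => 'red',   '1' => 'red',
    'blue'  => 'blue',  '2' => 'blue',
    'green' => 'green', '3' => 'green'
);

// HYPOTHETICAL HELPER: TRANSLATE AN INCOMING VALUE, OR RETURN THE DEFAULT
function translate_color($Incoming, array $Map, $Default)
{
    // ONLY STRINGS CAN BE LEGAL; ARRAYS AND MISSING VALUES GET THE DEFAULT
    if(is_string($Incoming))
    {
        $Incoming = trim($Incoming);
        if(isset($Map[$Incoming]))
            return $Map[$Incoming];
    }
    return $Default;
}

echo translate_color('2', $ColorMap, 'red'), "\n";       // PRINTS blue
echo translate_color(' green ', $ColorMap, 'red'), "\n"; // PRINTS green
echo translate_color('violet', $ColorMap, 'red'), "\n";  // PRINTS red
```

Whatever arrives, the value that leaves the function is always one of the strings written in the table or the default.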
<?php
$Age = 0;
// IF USER SUBMITTED AN AGE VALUE, AND IT IS A STRING (NOT AN ARRAY LIKE ?age[]=1)
if(isset($_GET['age']) && is_string($_GET['age']))
{
    $_GET['age'] = trim($_GET['age']);
    // "IF THE INPUT CONSISTS OF 1 TO 3 DIGITS"
    if(preg_match('/^[0-9]{1,3}$/u', $_GET['age']))
    {
        // FORCE THE VARIABLE TO THE REQUIRED TYPE
        settype($_GET['age'], 'integer');
        // TEST FOR ACCEPTABLE MINIMUM, MAXIMUM VALUES
        if(($_GET['age'] >= 0) && ($_GET['age'] <= 114))
        {
            // ACCEPT THE VALUE TO OUR LOCAL VARIABLE
            $Age = (int)$_GET['age'];
        }
    }
    // AGAIN, IF INCOMING VALUE WASN'T VALID, LOCAL $Age WASN'T CHANGED.
}
?>
This alternative uses a PHP 5.2+ validating "filter function" to validate an integer with less code:
<?php
$Age = 0;
// RETURN VALUE IS THE VALIDATED INTEGER (ON SUCCESS), OR FALSE, OR NULL
$i = filter_input(INPUT_GET, 'age',
    FILTER_VALIDATE_INT,
    array('options'=>array('min_range'=>0, 'max_range'=>114)));
if(($i !== FALSE) && ($i !== NULL))
    $Age = $i;
?>
The following code is equivalent. I currently recommend using it instead because it appears to me that filter_var is more reliable, predictable, and portable than filter_input. The user comments at php.net about filter_input (see the link) mention odd behavior that I've also experienced.
We must use isset() because reading $_GET['age'] when it isn't set makes PHP raise an undefined index error before filter_var even sees the value. In the example, the default $Age of 0 is used if $_GET['age'] is not set, and is also specified as the default value if it is set but invalid.
<?php
$Age = 0;
if(isset($_GET['age']))
    $Age = filter_var($_GET['age'],
        FILTER_VALIDATE_INT,
        array('options'=>array('default'=>$Age, 'min_range'=>0, 'max_range'=>114)));
?>
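If several integer fields need the same treatment, the filter_var approach can be wrapped in a small function. This is only a sketch: the name get_int_in_range() and its parameters are hypothetical, and it takes the source array as an argument (you would pass $_GET) so that it can be tried out without a real request.

```php
<?php
// HYPOTHETICAL HELPER: RETURN $Source[$Name] AS AN INTEGER BETWEEN $Min AND $Max,
// OR $Default IF IT IS MISSING, NOT AN INTEGER, OR OUT OF RANGE
function get_int_in_range(array $Source, $Name, $Min, $Max, $Default)
{
    // MISSING VALUES AND ARRAYS (e.g. ?age[]=1) GET THE DEFAULT IMMEDIATELY
    if(!isset($Source[$Name]) || !is_scalar($Source[$Name]))
        return $Default;
    return filter_var($Source[$Name],
        FILTER_VALIDATE_INT,
        array('options'=>array('default'=>$Default, 'min_range'=>$Min, 'max_range'=>$Max)));
}

var_dump(get_int_in_range(array('age'=>'25'), 'age', 0, 114, 0));   // int(25)
var_dump(get_int_in_range(array('age'=>'noyb'), 'age', 0, 114, 0)); // int(0)
var_dump(get_int_in_range(array('age'=>'200'), 'age', 0, 114, 0));  // int(0)
var_dump(get_int_in_range(array(), 'age', 0, 114, 0));              // int(0)
```

In a real script the call would look like $Age = get_int_in_range($_GET, 'age', 0, 114, 0);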
More defenses against RFI
Two other methods of RFI defense can serve as backup, in case you make a mistake in your script and allow some variables to go unvalidated, or in case an application you use contains not-yet-discovered RFI vulnerabilities:
Notes
Suggestions, comments, questions welcome in the Forum.
Copyright ©2012 Steven Whitney. Last modified Sun 07/29/2012 10:57:32 -0700.