I defined the $FullpathExcludeRegexes array 3 times (each occurrence overrides the previous one) to make it clear that you can use a different list of exclusions for each search, but don't have to.
The most thorough way to define your own list is to define it in all 3 of the places shown. In practice, that means writing it once and then copying the code to the 2 other places.
However, if you want to use the same exclusion list for all the searches, you can also just define your list in the first instance, and then comment out or delete the 2 later instances where $FullpathExcludeRegexes is redefined.
Or, if you're really only interested in the results of the malicious snippets search (the last of the 3), you can just define your exclusion list in the array that precedes that search. Your definition there will override the previous definitions, "just in time" for doing the malicious snippets search.
Of course, you can also keep a master copy of the script, and then use a copy of it each time you run it. In the copy, you can delete all the code for the first two searches if you don't need to do them.
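If you want one list for all three searches but would rather not copy or delete code, one possible alternative is to keep the list in a variable of your own and assign it at each of the 3 places. This is only a sketch: $MyExcludeRegexes is a made-up name that does not exist in the script, and the two patterns are placeholders.

```php
<?php
// Hypothetical variable (NOT part of the original script):
// define your exclusions once, near the top of the script.
$MyExcludeRegexes = array
(
    '#^/home/userid/public_html/25years/blog/#',
    '#lookforbadguys\.php$#i'
);

// Then, at each of the 3 places where the script defines the list,
// replace the array definition with a plain assignment.
$FullpathExcludeRegexes = $MyExcludeRegexes;

echo count($FullpathExcludeRegexes), " exclusion patterns in use\n";
```

If you later decide that one search needs its own list after all, you can put a normal array definition back at that one place.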
Excluding folders
Each regex entry in the exclusion list is matched against the full path of each file. For example, here is a fullpath:
/home/userid/public_html/25years/blog/2010/20100315.htm
To exclude this one file by name only, the regex could be
'#20100315\.htm$#'
The $ at the end means that the .htm must be the very end of the string. There must be nothing more after it. Without the $ to mark the end, that regex would also match and exclude the (unlikely) path /blog/20100315.htm/somefile.php, as well as a file named 20100315.html
If there are other folders that might have a file with that name, you could make the regex more specific with any of these, depending on how specific you need to be:
'#/2010/20100315\.htm$#'
'#/blog/2010/20100315\.htm$#'
'#^/home/userid/public_html/25years/blog/2010/20100315\.htm$#'
The third one has a ^ which marks the beginning of the string just like $ marks the end. In this case, the file will match the regex only if its entire path+filename exactly matches that whole string. That is the safest way to do a match.
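If you want to check your patterns before putting them in the script, a small test like the following may help. It is a sketch, not part of the script: it runs the four file patterns above through PHP's preg_match() against the example fullpath and against a second path. The second path (with /archive/ in it) is invented for illustration.

```php
<?php
$path = '/home/userid/public_html/25years/blog/2010/20100315.htm';
// Hypothetical second file with the same name in a different place
$otherPath = '/home/userid/public_html/archive/blog/2010/20100315.htm';

$patterns = array
(
    '#20100315\.htm$#',
    '#/2010/20100315\.htm$#',
    '#/blog/2010/20100315\.htm$#',
    '#^/home/userid/public_html/25years/blog/2010/20100315\.htm$#'
);

$results = array();
foreach ($patterns as $regex) {
    // preg_match() returns 1 on a match and 0 on no match
    $results[$regex] = array(preg_match($regex, $path), preg_match($regex, $otherPath));
    echo $regex, "\n  example path: ", $results[$regex][0],
         '   other path: ', $results[$regex][1], "\n";
}

// Effect of the ending $ anchor on the "unlikely" path from the text
$unlikely = '/blog/20100315.htm/somefile.php';
$withoutAnchor = preg_match('#20100315\.htm#', $unlikely);
$withAnchor = preg_match('#20100315\.htm$#', $unlikely);
echo 'without $: ', $withoutAnchor, '   with $: ', $withAnchor, "\n";
```

All four patterns match the example path, but only the fully anchored one leaves the same-named file under /archive/ alone.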
The same principle applies to matching directories. To exclude the /blog/ directory, you could use this
'#/blog/#'
But if you have more than one /blog/ directory (in different paths), you might have to be more specific about which one to exclude, with this regex:
'#public_html/25years/blog/#'
or even this
'#^/home/userid/public_html/25years/blog/#'
Note that it has a starting ^ anchor, but in the case of a directory, it must not have an ending $ anchor because the string that this regex is being matched against will certainly have a filename after the ending "/".
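The directory rules can be tried out the same way. This sketch uses preg_match() on the example fullpath and on one invented path (/shop/blog/ is hypothetical) to show why a directory regex must not end with $, and how a more specific regex tells two /blog/ directories apart.

```php
<?php
$file = '/home/userid/public_html/25years/blog/2010/20100315.htm';
// Hypothetical file in a different /blog/ directory
$otherBlogFile = '/home/userid/public_html/shop/blog/index.php';

// Directory regex with a starting ^ anchor and no ending $ anchor
$withoutDollar = preg_match('#^/home/userid/public_html/25years/blog/#', $file);

// The same regex with an ending $ can never match a file inside the
// directory, because the filename always follows the final "/"
$withDollar = preg_match('#^/home/userid/public_html/25years/blog/$#', $file);

// The short regex matches both /blog/ directories...
$shortOnOther = preg_match('#/blog/#', $otherBlogFile);
// ...while the more specific one matches only the intended directory
$specificOnOther = preg_match('#public_html/25years/blog/#', $otherBlogFile);
$specificOnFile = preg_match('#public_html/25years/blog/#', $file);

echo 'without $: ', $withoutDollar, '   with $: ', $withDollar, "\n";
echo 'short regex on other blog: ', $shortOnOther, "\n";
echo 'specific regex on other blog: ', $specificOnOther,
     '   on intended blog: ', $specificOnFile, "\n";
```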
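In the array definition, every entry except the last one must be followed by a comma, because the entries form a comma-separated list (PHP also allows a comma after the last entry, but does not require it). So here is an example with some of the lines from above: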
$FullpathExcludeRegexes = array
(
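'#lookforbadguys\.php$#i',
// matches any lookforbadguys.php file, in any folder (the "i" makes it case-insensitive)
'#20100315\.htm$#',
// matches any file whose name ends with 20100315.htm, in any folder
'#^/home/userid/public_html/25years/blog/#'
// matches everything inside this one /blog/ directory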
);
To completely avoid any possibility of ambiguity, you can make it a habit to always use full paths:
$FullpathExcludeRegexes = array
(
'#^/home/userid/public_html/lookforbadguys\.php$#i',
'#^/home/userid/public_html/25years/blog/2010/20100315\.htm$#',
'#^/home/userid/public_html/25years/blog/#'
);
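To see what a finished list would exclude, you can run some sample paths through it. The sketch below assumes that a file is skipped as soon as any one regex in the list matches its full path. The function isExcluded() is a hypothetical helper written for this example, and the script's own matching code may be organized differently. The sample paths other than the 20100315.htm one are invented.

```php
<?php
// Hypothetical helper (not taken from the script): returns true when
// at least one regex in the list matches the full path.
function isExcluded($fullpath, array $regexes)
{
    foreach ($regexes as $regex) {
        if (preg_match($regex, $fullpath) === 1) {
            return true;
        }
    }
    return false;
}

$FullpathExcludeRegexes = array
(
    '#^/home/userid/public_html/lookforbadguys\.php$#i',
    '#^/home/userid/public_html/25years/blog/2010/20100315\.htm$#',
    '#^/home/userid/public_html/25years/blog/#'
);

$samples = array
(
    '/home/userid/public_html/LookForBadGuys.php',            // excluded ("i" flag)
    '/home/userid/public_html/25years/blog/2010/20100315.htm', // excluded
    '/home/userid/public_html/25years/index.php',             // searched
    '/home/userid/public_html/tools/lookforbadguys.php'       // searched (different folder)
);

foreach ($samples as $sample) {
    echo isExcluded($sample, $FullpathExcludeRegexes) ? 'excluded: ' : 'searched: ';
    echo $sample, "\n";
}
```

The last sample shows the benefit of full paths: a copy of lookforbadguys.php in another folder is still searched.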
As an advanced example, the regexes can do more complex things. This one excludes .htm, .html, .php and .txt files in the /blog/ folder itself (and also .HTM, .HTML, .PHP and .TXT, because the "i" at the end means case-insensitive). Files in its subfolders, such as /blog/subfolder/, are not excluded, because [^/]+ cannot match across a "/":
$FullpathExcludeRegexes = array
(
'#^/home/userid/public_html/25years/blog/[^/]+\.(html?|php|txt)$#i'
);
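A quick check of that advanced regex, again as a sketch with invented filenames (index.html, NOTES.TXT, photo.jpg and subfolder/index.html are only examples):

```php
<?php
$regex = '#^/home/userid/public_html/25years/blog/[^/]+\.(html?|php|txt)$#i';
$blog = '/home/userid/public_html/25years/blog/';

// Hypothetical filenames, relative to the /blog/ folder
$names = array
(
    'index.html',           // in /blog/ itself, listed extension
    'NOTES.TXT',            // upper case, matched because of the "i" flag
    'photo.jpg',            // extension not in the list
    'subfolder/index.html', // in a subfolder: [^/]+ cannot cross the "/"
    '2010/20100315.htm'     // also in a subfolder
);

$matched = array();
foreach ($names as $name) {
    // preg_match() returns 1 on a match and 0 on no match
    $matched[$name] = preg_match($regex, $blog . $name);
    echo $matched[$name] === 1 ? 'excluded: ' : 'searched: ', $blog . $name, "\n";
}
```

Only the first two files are excluded. To exclude the subfolders as well, you would use the plain directory regex shown earlier instead.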