25 Years of Programming
How to replace FrontPage included content webbots with PHP includes, and a walkthrough of a Regular Expressions Find and Replace

Introduction

FrontPage included content webbots are just text in the source code of a web page. PHP includes are also just text (PHP code) in the source code, so the procedure for replacing one with the other is basically just a text search and replace. It is made more complicated because a) the text to search for varies due to relative path references in the webbot code, and b) the text to search for varies depending on whether you are searching in a page that is Open in code view or in a File that is not open in code view.

Some additional detail is provided here in case this is the first time you've used PHP.

There are several ways to do this conversion, ranging from all-manual to all-automated. At the end of this article is a detailed walkthrough of how to use Regular Expressions to convert every webbot include on your website to its PHP equivalent with one Find and Replace operation.

You can also convert FrontPage Shared Borders to PHP includes, but it is less confusing if you do it in two steps: first convert the Shared Borders to webbot include pages, then convert the webbot include pages to PHP includes, as described here.

Advantages of replacing webbot includes with PHP

PHP includes can do things that FrontPage includes can't:
Disadvantages of replacing webbot includes with PHP
Should I use PHP or Apache Server Side Includes?

I'd strongly recommend PHP. The capabilities of SSI are very limited. PHP is an entire programming language that opens a whole new world of things you can do with your website, and yet the code you need to learn to use PHP includes is minimal, no more difficult than SSI. Starting with "includes" is an easy way to begin learning about PHP.

If you decide to use Apache SSI, the procedures below are basically the same, except that the replacement code will be different. See the Apache SSI documentation for more information.

How webbot includes work

When you add a FrontPage include webbot on your page (Insert > Web Component > Included Content > Page), FrontPage puts into your code a line that looks something like these. It uses a relative path from the document file to the file being included:

<!--webbot bot="Include"
U-Include="inc/example.htm" TAG="BODY" -->

These are HTML comments (because they are inside <!-- and -->). It is only FrontPage that interprets them as instructions.

When you save the page, FrontPage translates the webbots to HTML as it saves the file, as follows: It retrieves everything between the <body></body> tags of the file being referenced and inserts it directly into the file being saved, delimited by an opening comment tag that is slightly modified from the ones shown above and a newly-added closing comment tag, so that the "included" text is sandwiched between two comments. What it puts into the file looks like this:

<!--webbot bot="Include"
U-Include="inc/example.htm" TAG="BODY" startspan -->

You can see that text if you open the file in Notepad instead of in FrontPage.

When you open the file again in the FrontPage editor, it removes the closing comment tag and the contents of the included file, and it restores the opening webbot comment tag to the form it had when you originally inserted it. Then it displays the result to you in the Code View pane for editing. It is not showing you the exact text that is in the file when it's saved to disk. That's why you must use Notepad to view the file's true text content.

Whenever you modify and save an included page, FrontPage searches your site for pages that include it, and then updates them with the new content. This is why there are several seconds of processing whenever you save an include page.

Important things to note:
How PHP includes work

When your server receives a request for a file, it first sends the file through the PHP interpreter, which executes any PHP code that is embedded in the file's text. One PHP statement is include(filename). PHP reads filename, executes any PHP code it contains, and inserts the result into the page at that point. Then it gives the page back to Apache, which sends it to whoever requested it.
Preparation before you replace FrontPage includes with PHP

Please see recommended PHP configuration settings for php.ini and .htaccess. Those settings are important because although "includes" are an easy way to get started with PHP, they can also open a security hole to your website if the PHP configuration is wrong.

That article also shows you the .htaccess lines for instructing Apache to send all .htm pages through the PHP interpreter as if they had .php extensions. A file that uses PHP code must either have a .php extension or be processed by Apache as if it did. Renaming existing files can have many negative consequences, so processing .htm files as .php is the best solution.

How to replace a FrontPage include with PHP

1) Convert the include file from .htm to .php
Explanation:
2) Three ways to replace the webbot include with a PHP include

They are in order of increasing complexity. Understand each one before you move on to the next.

A) ...in a page that is currently Open in FrontPage Code View (manually)
B) ...in all pages that are currently Open in FrontPage Code View (automated)

This is basically the same as above, except you'll be using Find and Replace.
C) ...in a closed File that is NOT open in FrontPage Code View (automated)

When you use FrontPage Find and Replace on either Current Page or Open Pages, as you did above, you are searching for text as it appears in code view in the FrontPage editor. The All Pages and Selected Pages options search for text in files that are not currently open in code view. As noted earlier, FrontPage automatically changes the webbot text while it is saving the file, so the text we search for will have to be different.

You can verify this easily:
To Find and Replace in one or more closed Files:

Before you do a Find and Replace that will affect a large number of files, you should make a backup copy of your entire site. A mistake in a global Find and Replace can make a mess in a hurry.
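If you prefer to script the backup, the following is a minimal Python sketch, not something FrontPage provides. The function name backup_site and the folder names are made up for illustration, and it assumes you are working on a local copy of the site on disk.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_site(site_dir: Path, backup_root: Path) -> Path:
    """Copy the whole site folder to a new timestamped folder and return its path."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{site_dir.name}-backup-{stamp}"
    # copytree raises an error if target already exists, so an earlier
    # backup is never overwritten.
    shutil.copytree(site_dir, target)
    return target


if __name__ == "__main__":
    # Demo on a throwaway folder (hypothetical names); point site_dir at
    # your own local copy of the site instead.
    with tempfile.TemporaryDirectory() as tmp:
        site = Path(tmp) / "mysite"
        (site / "inc").mkdir(parents=True)
        (site / "inc" / "example.htm").write_text("<body>menu</body>")
        copy = backup_site(site, Path(tmp) / "backups")
        print(sorted(p.name for p in copy.rglob("*")))
```

Keeping the backup outside the site folder means the Find and Replace cannot touch it.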
Handling relative path variations: Even though the webbot code can contain relative path variations such as <!--webbot bot="Include"
U-Include="../../../../inc/example.htm" TAG="BODY" --> you can use Find and Replace to do the replacements in Files.
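The reason one replacement can serve every variation is that the PHP include used in this article is written relative to the document root, so it is the same no matter how deep the including page sits. A small Python sketch of that mapping follows. It is only an illustration: the function name php_include_for is made up, and it assumes that the path left after dropping the leading "../" segments is the include file's path from the site root.

```python
import posixpath


def php_include_for(u_include: str) -> str:
    """Build a document-root PHP include from a webbot U-Include value.

    Assumption: once leading "../" segments are dropped, what remains is
    the include file's path from the site root.
    """
    path = u_include
    while path.startswith("../"):
        path = path[3:]  # drop one leading "../"
    stem, ext = posixpath.splitext(path)
    if ext.lower() != ".htm":
        raise ValueError(f"expected an .htm include, got {u_include!r}")
    return "<?php include($_SERVER['DOCUMENT_ROOT'] . '/" + stem + ".php'); ?>"


if __name__ == "__main__":
    # Both spellings of the same include file give the same PHP line.
    print(php_include_for("inc/example.htm"))
    print(php_include_for("../../../../inc/example.htm"))
```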
Cleaning up after the conversion

After you have copied all your .htm includes to .php files and changed all your webbot includes to PHP includes, go to the Website > Hyperlinks report and check your old .htm include files to make sure there are no remaining hyperlinks pointing to them. If you won't be using them anymore, delete the ones with no hyperlinks, and investigate the ones that still have hyperlinks.

Finding and Replacing large text blocks

Text longer than 4096 characters is too long to paste into the Find What box, and you'll need to use one of these alternative methods to search for it:

1) Delete most of the text from the include file

You've already copied your .htm include file to a .php file. If you aren't going to be using the .htm file anymore for other purposes, it is now expendable because you're going to delete it anyway, after you've completed the conversion to PHP includes.

The real goal of the Find and Replace operation is to replace the webbot tags (which refer to the old include file by name) with PHP code (which refers to the new include file by its name). The fact that the webbot code embedded in the files also happens to contain the complete text of the included file is irrelevant. Therefore, you can simply shorten that text, as follows:
2) Do the Find and Replace using Regular Expressions (regex)

Regular expressions are one of the most powerful, and probably underused, features of FrontPage Find and Replace. They take some getting used to, but they can do things that can't be done any other way. Even if it takes hours of testing to develop one good regex to automate a particular task, it is well worth it if it saves even more hours of manual labor, which it often does.

Two examples will be given. The variation you need for your purposes might fall somewhere between them in complexity, but there should be enough example code here to be a good starting point and reference. Both examples operate on closed files.

A) A simple example: Find and Replace all instances of 1 webbot

This example uses a regular expression to search for the exact text of the opening webbot tag and the exact text of the closing webbot tag, with wildcards for anything that happens to be between them. Assume that the webbot code you want to replace (which you obtained in Notepad) is this, similar to the earlier example:

<!--webbot bot="Include"
U-Include="inc/example.htm" TAG="BODY" startspan -->

This is a regular expression that will find it:

<!--webbot bot="Include" U-Include="inc/example\.htm" TAG="BODY" startspan -->.@\n(.@\n)@(.@)<!--webbot bot="Include" i-checksum="46121" endspan -->

The replacement string, same as earlier, will be

<?php include($_SERVER['DOCUMENT_ROOT'] . '/inc/example.php'); ?>

See the walkthrough in the next section for how to interpret the parts of this regular expression.

B) Complex example: Find and Replace every FrontPage include in your website with its corresponding PHP include

This is the regex for the Find What box:

<!--webbot bot="Include" U-Include="(\.\./)*{.@\.}htm" TAG="BODY" startspan -->.@\n(.@\n)@(.@)<!--webbot bot="Include" i-checksum="[0-9]@" endspan -->

Walk-through:

<!--webbot bot="Include" U-Include="

Text like this, with no regex operators in it, is plain text that must be matched exactly. This first plain text section begins the search for the opening webbot tag.

(\.\./)*

This matches any number of leading "../" that might be present in a relative path in the webbot code. In regular expressions, some characters have special meanings unless you specify that you want to search for them as literal text. The period is a regex wildcard that means "any character". We indicate that we want to search for two literal periods by preceding each ("escaping" it) with a backslash. The enclosing parentheses make the "../" term into a single group to which we can apply the * operator. The * operator means "zero or more occurrences of the group that precedes it".

{.@\.}htm

In this expression, we start matching the name of the include file. The first period means "any character". The @ operator, like the * operator, means zero or more occurrences, but it is not "greedy" like the * operator is: it matches as few characters as it can. The "\." after the @ is an "escaped" literal period, meaning we want the search for "zero or more any-characters" to stop when it encounters the first actual period.
But note that a period is itself an "any-character". Using the greedy * operator here would run the risk that the period might be gobbled up into the search string being accumulated by the any-chars part of the search. Using the non-greedy ".@" guarantees that the any-chars search will end as soon as it encounters a period; the ".@" will be satisfied, and the next part of the regex matching will continue on from there. The next thing it hits is the period (as just discussed), and then the literal text "htm", the file extension of your include file.

The braces {} have special significance. They indicate that anything matched by the expression within them should be stored into a variable. Consider what is going to be in that variable: we've already matched and discarded any number of leading "../" of the relative path, if any. Now we've stored any number of characters after that, up to and including a trailing period, into the variable. Therefore, our variable is going to contain the entire path and name of the include file, except for its htm extension. We are going to use this later in the Replace With string, except we'll give it a "php" extension.

" TAG="BODY" startspan -->

The above is just more literal text to match.

.@\n

\n is the regex for a newline (CR, LF, CRLF, or whatever it is). This is the newline at the end of the line containing the opening webbot tag. However, there might be some other characters (probably whitespace) between the end of the tag and the newline, so we must allow for that with another any-chars expression. Because a newline is itself an any-char, we again use the non-greedy ".@" form.

(.@\n)@

Having finished the line containing the opening webbot tag, we now move on to all the lines that follow it (which is the text from the included file), up to but not including the closing webbot tag.
The ".@\n" of this line is identical to what we just saw, and means zero or more any-chars and then a newline: in other words, it will match "any line", even a blank one of just a newline. We put that expression in parentheses to group it, and apply the non-greedy @ operator to it to indicate "zero or more lines containing any, or no, content".

Using the @ operator here is very important, as I discovered. In my first attempt, I used the asterisk operator and searched a file that contained three occurrences of my include file, like this:

<!--webbot bot="Include"
U-Include="inc/example.htm" TAG="BODY" startspan --> <!--webbot bot="Include"
U-Include="inc/example.htm" TAG="BODY" startspan --> <!--webbot bot="Include"
U-Include="inc/example.htm" TAG="BODY" startspan -->

The Find and Replace operation replaced all 3 occurrences of the text with just a single PHP include. That was because * is greedy: The "(.*\n)*" expression matched any line at all, even ones containing the closing webbot tag, and it didn't stop matching lines until it reached the last of the closing webbot tags. At that point, it couldn't move on with more matches until it treated the last webbot tag it saw as a match, so that's what it did. And it treated everything between the first opening webbot tag and the last closing webbot tag as one single match of the regular expression. If the above 3 blocks had not been consecutive, the Find and Replace would have gutted my source file, removing not just the webbots, but all my other content between them, too.
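The same greedy versus non-greedy difference can be reproduced outside FrontPage. The sketch below uses Python's re module, where *? plays the role that @ plays in FrontPage. The page text is invented for the demonstration: each webbot tag is written on one line, the checksum is the one from the simple example, and ordinary content sits between the three blocks.

```python
import re

OPEN_TAG = '<!--webbot bot="Include" U-Include="inc/example.htm" TAG="BODY" startspan -->'
CLOSE_TAG = '<!--webbot bot="Include" i-checksum="46121" endspan -->'
PHP = "<?php include($_SERVER['DOCUMENT_ROOT'] . '/inc/example.php'); ?>"

# Three expanded includes (made-up content) with ordinary page content between them.
block = f"{OPEN_TAG}\n<p>menu</p>\n{CLOSE_TAG}\n"
page = f"{block}<p>one</p>\n{block}<p>two</p>\n{block}"

# Greedy: Python's * behaves like FrontPage's *.
greedy = re.compile(re.escape(OPEN_TAG) + r".*\n(.*\n)*(.*)" + re.escape(CLOSE_TAG))
# Non-greedy: Python's *? behaves like FrontPage's @.
lazy = re.compile(re.escape(OPEN_TAG) + r".*?\n(.*?\n)*?(.*?)" + re.escape(CLOSE_TAG))

# A function is used as the replacement so PHP is inserted literally.
greedy_result, greedy_count = greedy.subn(lambda m: PHP, page)
lazy_result, lazy_count = lazy.subn(lambda m: PHP, page)

print("greedy replacements:", greedy_count)
print("non-greedy replacements:", lazy_count)
print("content between blocks kept by greedy:", "<p>one</p>" in greedy_result)
print("content between blocks kept by non-greedy:", "<p>one</p>" in lazy_result)
```

With the greedy pattern the whole span from the first opening tag to the last closing tag is treated as one match, so the content between the blocks is lost; the non-greedy pattern replaces each block separately.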
(.@)

The closing webbot tag might be indented or have other characters in front of it, so we apply another non-greedy any-chars search. It will stop when it finds something that starts matching the next search term, which is...

<!--webbot bot="Include" i-checksum="

just more literal text to match.

[0-9]@

The closing webbot tag contains a numeric checksum, which varies. The [] brackets indicate a regex "charset". It will result in a successful match if the next character is one of the ones within the brackets. The 0-9 notation used here indicates a range, any digit from 0 to 9. The @ again means zero or more. Matching will stop when it encounters the first non-digit character, which is...

" endspan -->

the final literal text of the closing webbot. We're almost done!

The Replace With text:

The Replace With text also contains a regular expression:

<?php include($_SERVER['DOCUMENT_ROOT'] . '/\1php'); ?>

Remember that when we were Finding, we stored the full path and filename of the original include file into a variable. Now we get to use it. FrontPage refers to this variable as "\1". The \ in a Replace With indicates a variable, and the 1 is because this is the first (and only) variable we created.

Everything in the line above except \1 is literal text to insert. Note that when we come to the path, we insert a leading "/". This is because it wasn't present in the string we copied, and PHP's $_SERVER['DOCUMENT_ROOT'] variable doesn't provide it either. Then we insert the path and file name that we copied previously, including its trailing period, and we append the php extension.

The result is that the above regular expression Find and Replace will replace every instance of every FrontPage webbot include with its equivalent PHP include!

Before you do it:
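One way to rehearse the replacement on sample text before running it in FrontPage is to translate the pattern into another regex dialect and try it there. The sketch below is such a translation into Python's re module, offered as an approximation rather than an exact equivalent of FrontPage's engine: the braces {} become a capture group (), the non-greedy @ becomes *?, and \1 is written \g<1>. The sample page text and the names WEBBOT_INCLUDE, PHP_INCLUDE and convert are invented for the illustration.

```python
import re

# Python translation of the Find What pattern from the walkthrough.
WEBBOT_INCLUDE = re.compile(
    r'<!--webbot bot="Include" U-Include="(?:\.\./)*(.*?\.)htm" TAG="BODY" startspan -->'
    r'.*?\n(?:.*?\n)*?.*?'
    r'<!--webbot bot="Include" i-checksum="[0-9]*?" endspan -->'
)

# Python translation of the Replace With text; \g<1> is the stored path,
# which still ends with its trailing period.
PHP_INCLUDE = r"<?php include($_SERVER['DOCUMENT_ROOT'] . '/\g<1>php'); ?>"


def convert(html: str) -> str:
    """Replace every expanded webbot include in html with a PHP include."""
    return WEBBOT_INCLUDE.sub(PHP_INCLUDE, html)


if __name__ == "__main__":
    # Made-up page text containing one expanded webbot include.
    sample = (
        "<p>before</p>\n"
        '<!--webbot bot="Include" U-Include="../inc/example.htm" TAG="BODY" startspan -->\n'
        "<p>included menu</p>\n"
        '<!--webbot bot="Include" i-checksum="46121" endspan -->\n'
        "<p>after</p>\n"
    )
    print(convert(sample))
```

If the output of a rehearsal like this is not what you expect, it is much cheaper to find out on a sample string than on the whole site.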
FrontPage regular expressions tips
Questions, comments? Try the discussion forum.