25 Years of Programming
An open source source for C, C++, OWL, BASIC, MDB, XLS, DOT, and more...

How to replace FrontPage included content webbots with PHP includes, and a walkthrough of a Regular Expressions Find and Replace

Introduction

FrontPage included content webbots are just text in the source code of a web page. PHP includes are also just text (PHP code) in the source code, so the procedure for replacing one with the other is basically just a text search and replace.

It is made more complicated because a) the text to search for varies due to relative path references in the webbot code, and b) the text to search for varies depending on whether you are searching in a page that is Open in code view or in a File that is not open in code view.

Some additional detail is provided here in case this is the first time you've used PHP.

There are several ways to do this conversion, ranging from all-manual to all-automated. At the end of this article is a detailed walkthrough of how to use Regular Expressions to convert every webbot include on your website to its PHP equivalent with one Find and Replace operation. 

You can also convert FrontPage Shared Borders to PHP includes, but it is less confusing if you do it in two steps: first convert the Shared Borders to webbot include pages, then convert the webbot include pages to PHP includes, as described here.

Advantages of replacing webbot includes with PHP

PHP includes can do things that FrontPage includes can't:

  1. PHP can choose at serve-time which file to include (such as for ad rotation).
     
  2. PHP allows you to always include the same file by name but to periodically change what is in that file (which you can do by cron job or PHP script). The way that FrontPage implements its "includes" (discussed below) makes that impossible.
     
  3. If you plan to migrate your site from FrontPage to another web design program such as Expression Web or Dreamweaver, the transition will be easier if you switch to a new include method first. Either of those programs will successfully import FrontPage webbot includes (because all they really are is HTML comments), but they won't be includes anymore, just text in the web pages. If you want to change the contents of (what used to be) the included page, you'll have to search all the pages in your site for exactly that text. Changing the include page itself won't propagate the change throughout your site anymore because the include page is no longer involved in the process at all.

Disadvantages of replacing webbot includes with PHP

  1. FrontPage cannot show a preview of PHP-included pages. Neither can Internet Explorer or Firefox by themselves. To preview what your pages really look like, you'll need to install WAMP (Windows Apache MySQL PHP) on your local computer (or LAMP on Linux, or MAMP on Mac).
     
  2. FrontPage cannot check for broken links in PHP code. When you use webbot includes, FrontPage monitors the hyperlinks. A reference to a nonexistent file triggers a Broken Hyperlink warning on your Website > Reports page. When using PHP, you have to monitor your PHP error log.
     
  3. If you plan to migrate away from FrontPage along a Microsoft path (Expression Web, SharePoint, Windows server), consider Microsoft's ASP (Active Server Pages) as a possible alternative to PHP.

Should I use PHP or Apache Server Side Includes?

I'd strongly recommend PHP. The capabilities of SSI are very limited. PHP is an entire programming language that opens a whole new world of things you can do with your website, and yet the code you need to learn to use PHP includes is minimal, no more difficult than SSI. Starting with "includes" is an easy way to begin learning about PHP.

If you decide to use Apache SSI, the procedures below are basically the same, except that the replacement code will be different. See the Apache SSI documentation for more information.

How webbot includes work

When you add a FrontPage include webbot on your page (Insert > Web Component > Included Content > Page), FrontPage puts into your code a line that looks something like one of these. It uses a relative path from the document file to the file being included:

<!--webbot bot="Include" U-Include="inc/example.htm" TAG="BODY" -->
<!--webbot bot="Include" U-Include="../inc/example.htm" TAG="BODY" -->
<!--webbot bot="Include" U-Include="../../inc/example.htm" TAG="BODY" -->

These are HTML comments (because they are inside <!-- and -->). It is only FrontPage that interprets them as instructions. When you save the page, FrontPage translates the webbots to HTML as it saves the file, as follows: 

It retrieves everything between the <body></body> tags of the file being referenced and inserts it directly into the file being saved, sandwiched between two comments: an opening comment tag that is slightly modified from the ones shown above, and a newly added closing comment tag. What it puts into the file looks like this:

<!--webbot bot="Include" U-Include="inc/example.htm" TAG="BODY" startspan -->

This is the text that it copied from the "include" file.

<!--webbot bot="Include" i-checksum="46121" endspan -->

You can see that text if you open the file in Notepad instead of in FrontPage.
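To make the save-time expansion concrete, here is a rough Python model of what FrontPage writes to disk for one webbot. It is a sketch, not FrontPage's actual code: the function name expand_webbot is hypothetical, and because the i-checksum calculation isn't described here, the checksum is simply passed in as a parameter.

```python
import re

def expand_webbot(webbot_line: str, include_html: str, checksum: str) -> str:
    """Hypothetical model of the text FrontPage saves for one include webbot."""
    # FrontPage uses only what is between <body ...> and </body> in the include file.
    m = re.search(r"<body[^>]*>(.*?)</body\s*>", include_html, re.IGNORECASE | re.DOTALL)
    body = m.group(1) if m else ""
    # The opening comment gets "startspan" added in front of its closing "-->".
    opening = webbot_line.replace(" -->", " startspan -->", 1)
    # A new closing comment is appended; the checksum here is just a placeholder input.
    closing = '<!--webbot bot="Include" i-checksum="' + checksum + '" endspan -->'
    return opening + body + closing


if __name__ == "__main__":
    line = '<!--webbot bot="Include" U-Include="inc/example.htm" TAG="BODY" -->'
    include_file = "<html><head><title>Example</title></head><body>\nIncluded text.\n</body></html>"
    print(expand_webbot(line, include_file, "46121"))
```

The <html>, <head>, and <title> of the include file never reach the saved page; only the body text does, wrapped in the startspan and endspan comments.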

When you open the file again in the FrontPage editor, it removes the closing comment tag and the contents of the included file, and it restores the opening webbot comment tag to the form it had when you originally inserted it. Then it displays the result to you in the Code View pane for editing. It is not showing you the exact text that is in the file when it's saved to disk. That's why you must use Notepad to view the file's true text content.

Whenever you modify and save an included page, FrontPage searches your site for pages that include it, and then updates them with the new content. This is why there are several seconds of processing whenever you save an include page.

Important things to note:

  1. FrontPage includes are static, not dynamic. By the time you upload the main page to your server, it has the "included" page already embedded in it. At serve-time, the server simply serves the page as-is, without any special processing.
     
  2. A FrontPage web page document contains different text depending on whether you view it in the FrontPage editor or in Notepad. Why this difference is so significant will become evident when we start trying to Find and Replace the text. What we search for has to be different, depending on where we're searching! 

How PHP includes work

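When your server receives a request for a file that it is configured to process with PHP, it first sends the file through the PHP interpreter, which executes any PHP commands that are embedded in the file's text. One PHP command is include(filename). PHP reads filename, runs any PHP code it contains, and inserts the result into the page at the point where the include command was. Then it gives the page back to Apache, which sends it to whoever requested it. This happens every time the page is requested, so PHP includes are dynamic: change the include file on the server, and the next request for the page shows the change.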

How Apache SSI includes work

The textual format of an Apache include is similar to a FrontPage include: the include command is contained within an HTML comment tag. However, SSI includes are dynamic like PHP, not static like FrontPage: Apache fetches and inserts the included document just before the file is sent out.

Preparation before you replace FrontPage includes with PHP

Please see recommended PHP configuration settings for php.ini and .htaccess. Those settings are important because although "includes" are an easy way to get started with PHP, they can also open a security hole to your website if the PHP configuration is wrong.

That article also shows you the .htaccess lines for instructing Apache to send all .htm pages through the PHP interpreter as if they had .php extensions. A file that uses PHP code must either have a .php extension or be processed by Apache as if it did. Renaming existing files changes their URLs, which breaks incoming links, bookmarks, and search engine listings, so processing .htm files as .php is the best solution.

How to replace a FrontPage include with PHP

1) Convert the include file from .htm to .php

  1. In the FrontPage Folder List pane, click the include file to select it.
  2. Type Ctrl+C to copy it, then Ctrl+V to paste the copy as a new file.
  3. Rename the copy to be the same as the original, but with a .php extension.
  4. Open the new file to edit it in code view.
  5. Select and delete everything from the top of the file down to and including the <body> tag.
  6. Select and delete the closing </body> tag and everything below it.

Explanation:

The FrontPage webbot was only using the contents of the "body" tag, and ignoring the rest of the file. PHP will pull in the whole file, so you need to remove the extraneous elements that make this a standalone HTML page. If you don't, the main page will end up containing its own <html><head>, and <body> tags, and then the <html><head>, and <body> tags from the include file, too, resulting in a corrupt and incorrect page. Your .php file should contain only the exact text that you want to be included. It CAN, and probably will, contain HTML tags, text, and other elements of an HTML page. It just can't contain the tags that define the structure of an independent HTML page.
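The two deletions in steps 5 and 6 can also be expressed in a few lines of Python, which may help if you want to check a converted file or prepare many of them outside FrontPage. This is a minimal sketch: the function name is hypothetical, and it assumes each include file has exactly one <body>...</body> pair.

```python
import re

def htm_include_to_php(htm_text: str) -> str:
    """Return only what lies between <body ...> and </body> (steps 5 and 6 above)."""
    start = re.search(r"<body[^>]*>", htm_text, re.IGNORECASE)
    end = re.search(r"</body\s*>", htm_text, re.IGNORECASE)
    if start is None or end is None or end.start() < start.end():
        raise ValueError("no <body>...</body> pair found")
    # Everything before <body> (doctype, <html>, <head>) and after </body> is dropped.
    return htm_text[start.end():end.start()]
```

You would then save the returned text under the .php name, for example with pathlib's Path.write_text, after checking the result by eye.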

2) Three ways to replace the webbot include with a PHP include

They are in order of increasing complexity. Understand each one before you move on to the next.

A) ...in a page that is currently Open in FrontPage Code View (manually)

  1. Find the file containing the webbot you want to replace, open it in code view, and navigate to the webbot, which will look like this:

<!--webbot bot="Include" U-Include="inc/example.htm" TAG="BODY" -->

Underneath the webbot, type this line, but using your path and filename:

<?php include($_SERVER['DOCUMENT_ROOT'] . '/inc/example.php'); ?>

  • <?php and ?> are the opening and closing PHP tags. PHP code always goes inside them.
     
  • $_SERVER['DOCUMENT_ROOT'] is a PHP variable that contains the full path, on the server's disk, of your website's top level directory (on many hosts, something like /home/yourname/public_html). PHP include paths are disk paths, not web addresses, so a bare path such as '/inc/example.php' would point to an inc folder at the root of the server's disk, not in your website. The actual path to your public_html folder depends on your server's installation.

    $_SERVER['DOCUMENT_ROOT'] allows you to create paths that reliably start at your top level folder, regardless of where that folder sits on the server's disk and regardless of whether it is called public_html or something else.
     
  • The period is the PHP string concatenation operator.
     
  • The rest is the path to your include file. $_SERVER['DOCUMENT_ROOT'] does not contain a trailing "/", so you must supply the leading "/" in your path.

This is the best way to specify the path to an include file in PHP. You can also use a relative path such as "../inc/example.php", but this very quickly makes things complicated. In a webbot include, FrontPage automatically creates or adjusts relative links for you in the code of each individual page. PHP does not, so you must calculate each relative path yourself. Furthermore, a relative path beginning with "./" or "../" is resolved at serve-time against PHP's current working directory, which is normally the folder of the main page that was requested, not the folder of the include file. This makes a big difference when you use nested includes (an include file that includes another file).

  2. Delete the webbot line just above your PHP code, and save the file.
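The difference between the two kinds of path can be modeled in a few lines of Python. This is a simplified sketch, not PHP's actual lookup: it assumes a hypothetical document root of /home/user/public_html, handles only paths that are absolute or start with "./" or "../", and treats the main page's folder as PHP's current working directory (the usual case under Apache). Paths without a leading "./" or "../" also go through PHP's include_path setting, which this sketch ignores.

```python
import posixpath

def resolve_php_include(main_page: str, include_arg: str, document_root: str) -> str:
    """Simplified model of where include(include_arg) looks on disk.

    main_page is the requested page's path from the top level folder, e.g. "/a/b/page.htm".
    """
    if posixpath.isabs(include_arg):
        # Absolute disk path, e.g. $_SERVER['DOCUMENT_ROOT'] . '/inc/example.php'.
        return posixpath.normpath(include_arg)
    # "./" and "../" paths are taken relative to the main page's folder,
    # even when the include() call sits inside a nested include file.
    main_dir = posixpath.dirname(posixpath.join(document_root, main_page.lstrip("/")))
    return posixpath.normpath(posixpath.join(main_dir, include_arg))


if __name__ == "__main__":
    root = "/home/user/public_html"  # hypothetical DOCUMENT_ROOT
    # Built from DOCUMENT_ROOT: the same file no matter which page asks for it.
    print(resolve_php_include("/a/b/page.htm", root + "/inc/example.php", root))
    # A nested include in /inc/menu.php that says '../images/nav.php' (meaning
    # /images/nav.php) is resolved against /a/b/, not against /inc/.
    print(resolve_php_include("/a/b/page.htm", "../images/nav.php", root))
```

The second call lands in /a/images/ instead of /images/, which is exactly the nested-include trap described above.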

You have replaced one webbot include! To test it, publish your new .php include file and the main file that includes it, and go to the main file in your browser.

B) ...in all pages that are currently Open in FrontPage Code View (automated) 

This is basically the same as above, except you'll be using Find and Replace.

  1. Open in code view all the pages containing a webbot that you want to replace. It is best to work on the files in one folder at a time, because the webbot code contains relative path references that could look like any of these:

<!--webbot bot="Include" U-Include="inc/example.htm" TAG="BODY" -->
<!--webbot bot="Include" U-Include="../inc/example.htm" TAG="BODY" -->
<!--webbot bot="Include" U-Include="../../inc/example.htm" TAG="BODY" -->

The actual line, whatever it is, will be the same for all files that are in the same folder, because the relative path for all of them is the same.

  2. Copy the entire line of webbot code, including the opening < and closing >.
  3. Open Find and Replace (Ctrl+H).
  4. Paste the webbot code into the Find What box.
  5. Type the PHP include line into the Replace With box. In the above example, it was:

<?php include($_SERVER['DOCUMENT_ROOT'] . '/inc/example.php'); ?>

Since you're not using relative paths in your PHP, the PHP replacement text will be the same regardless of what relative path was being used in the webbot code.

Tip: to enter a newline into the Find or Replace box, use Shift+Enter.

  6. Then execute the Find and Replace in Open pages.
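Here is a small Python model of this procedure: it builds the webbot line as it appears in Code View for a page in a given folder, and does a plain-text replace on a set of "open" pages. The function names and the dict-of-pages representation are hypothetical, and it assumes FrontPage writes an ordinary relative path, as in the examples above. It shows why pages in the same folder share one Find string while the Replace string never changes.

```python
import posixpath

PHP_LINE = "<?php include($_SERVER['DOCUMENT_ROOT'] . '/inc/example.php'); ?>"

def webbot_line_for(page_path: str, include_path: str = "/inc/example.htm") -> str:
    """Code View text of the webbot for a page at page_path (paths from the top level folder)."""
    rel = posixpath.relpath(include_path, start=posixpath.dirname(page_path))
    return '<!--webbot bot="Include" U-Include="' + rel + '" TAG="BODY" -->'

def replace_in_open_pages(pages: dict, find: str, replace: str) -> int:
    """Literal Find and Replace across the given pages; returns the number of replacements."""
    total = 0
    for name in pages:
        total += pages[name].count(find)
        pages[name] = pages[name].replace(find, replace)
    return total
```

For example, webbot_line_for("/a/one.htm") and webbot_line_for("/a/two.htm") return the same "../inc/example.htm" line, while a page in /a/b/ gets the "../../" version; all of them are replaced with the same PHP_LINE.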

C) ... in a closed File that is NOT open in FrontPage Code View (automated)

When you use FrontPage Find and Replace on either Current Page or Open Pages, as you did above, you are searching for text as it appears in code view in the FrontPage editor.

The All Pages and Selected Pages options search for text in files that are not currently open in code view. As noted earlier, FrontPage automatically changes the webbot text while it is saving the file, so the text we search for will have to be different.

You can verify this easily:

  1. In FrontPage Code View, open a page containing a webbot include.
  2. Copy the webbot code.
  3. Open the Find dialog (Ctrl+F) and do a search for the webbot code. It will be found.
  4. Now close the file.
  5. Click the file in the Folder List pane to select it.
  6. Now do a Find in Selected file(s) for the same webbot code.
  7. This time, it will not be found, because that text is no longer in the file!
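In Python terms, the experiment above amounts to the following. The saved text is a made-up example in the format shown earlier; the point is that the Code View form of the opening tag never appears in the saved file, because "startspan" now sits in front of the closing -->.

```python
code_view_text = '<!--webbot bot="Include" U-Include="inc/example.htm" TAG="BODY" -->'

saved_file_text = (
    '<!--webbot bot="Include" U-Include="inc/example.htm" TAG="BODY" startspan -->\n'
    'This is the text that it copied from the "include" file.\n'
    '<!--webbot bot="Include" i-checksum="46121" endspan -->\n'
)

# Find in Current Page / Open Pages searches text like code_view_text: found.
# Find in All Pages / Selected Pages searches text like saved_file_text: not found.
print(code_view_text in saved_file_text)  # False
```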

To Find and Replace in one or more closed Files:

Before you do a Find and Replace that will affect a large number of files, you should make a backup copy of your entire site. A mistake in a global Find and Replace can make a mess in a hurry.

  1. Find a file containing the webbot include that you want to replace.
  2. Open the file in Notepad.
  3. Locate the code for this webbot in the file. Make sure it's the right one.
  4. Remember that the webbot in the file now consists of an opening webbot tag (which has the text "startspan" in it), then the text from the included file, then the closing webbot tag (which has the text "endspan" in it). Copy it all, from the opening < of the opening webbot tag to the closing > of the closing webbot tag. It is this text that we are finding and replacing.
  5. Close the file in Notepad.
  6. Launch Find and Replace (Ctrl+H).
  7. Paste the text you copied into the Find box. Make sure it all went in. The Find box has a limit of 4096 characters. If your selection was longer and didn't paste in its entirety, use one of the alternative methods described below in Finding and Replacing large text blocks.
  8. Enter the replacement text (your PHP code) in the Replace With box.
  9. Make sure All Pages or Selected Page(s), and Find in Source Code, are selected, and click Replace All.
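Outside FrontPage, the same closed-file replacement is an exact-text replace of the whole startspan...endspan block. This sketch models that in Python and also checks the block against the 4096-character Find What limit mentioned in step 7, so you know in advance when you'll need the methods in Finding and Replacing large text blocks. The names are hypothetical, and the limit is the figure given in this article, not something Python imposes.

```python
PHP_LINE = "<?php include($_SERVER['DOCUMENT_ROOT'] . '/inc/example.php'); ?>"
FIND_WHAT_LIMIT = 4096  # Find What box limit stated in step 7

def fits_find_box(block: str) -> bool:
    """True if the copied startspan...endspan text can be pasted into Find What."""
    return len(block) <= FIND_WHAT_LIMIT

def replace_block(file_text: str, block: str, replacement: str = PHP_LINE) -> str:
    """Replace every exact copy of the saved webbot block with the PHP include line."""
    if not fits_find_box(block):
        raise ValueError("block too long for Find What; shorten it or use a regex")
    return file_text.replace(block, replacement)
```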

Handling relative path variations:

Even though the webbot code can contain relative path variations such as

<!--webbot bot="Include" U-Include="../../../../inc/example.htm" TAG="BODY" -->
<!--webbot bot="Include" U-Include="../../../inc/example.htm" TAG="BODY" -->
<!--webbot bot="Include" U-Include="../../inc/example.htm" TAG="BODY" -->
<!--webbot bot="Include" U-Include="../inc/example.htm" TAG="BODY" -->
<!--webbot bot="Include" U-Include="inc/example.htm" TAG="BODY" -->

you can use Find and Replace to do the replacements in Files. 

  1. Find the page on your site where that particular webbot include has the longest relative path (such as the first one above).
  2. Open the file in Notepad, copy the webbot code, and do the Find and Replace in Files as described above.
  3. Then go into the Find box, remove one of the "../", and do the Find and Replace again. The replacement PHP code will be the same each time because it doesn't use relative paths.
  4. Repeat the previous step until you have done them all.
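The loop in steps 3 and 4 can be sketched like this: start from the deepest copy of the block, then strip one "../" at a time until none remain, replacing with the same PHP line each time. The function names are hypothetical, and the sketch assumes, like the procedure above, that the copies differ only in the number of leading "../".

```python
PHP_LINE = "<?php include($_SERVER['DOCUMENT_ROOT'] . '/inc/example.php'); ?>"

def find_variants(deepest_block: str) -> list:
    """The deepest block, then the same block with one fewer leading "../" each time."""
    variants = [deepest_block]
    current = deepest_block
    while 'U-Include="../' in current:
        current = current.replace('U-Include="../', 'U-Include="', 1)
        variants.append(current)
    return variants

def replace_all_depths(file_text: str, deepest_block: str) -> str:
    """Run one Find and Replace per variant; the replacement never changes."""
    for find in find_variants(deepest_block):
        file_text = file_text.replace(find, PHP_LINE)
    return file_text
```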

Cleaning up after the conversion

After you have copied all your .htm includes to .php files and changed all your webbot includes to PHP includes, go to the Website > Hyperlinks report and check your old .htm include files to make sure there are no remaining hyperlinks pointing to them. If you won't be using them anymore, delete the ones with no hyperlinks, and investigate the ones that still have hyperlinks.
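If you'd like a second check outside FrontPage, a short Python sketch can list the pages whose text still mentions an old include file name. It complements, rather than replaces, the Hyperlinks report: the function name and the dict of page texts are hypothetical, and it only finds the name as plain text.

```python
import re

def pages_still_referencing(pages: dict, old_name: str) -> list:
    """Names of pages whose text contains old_name as a whole file name."""
    # Not preceded or followed by a letter, digit, underscore or hyphen, so that
    # "example.htm" does not match "myexample.htm" or "example.html".
    pattern = re.compile(r"(?<![\w-])" + re.escape(old_name) + r"(?![\w-])")
    return sorted(name for name, text in pages.items() if pattern.search(text))
```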


Finding and Replacing large text blocks

Text longer than 4096 characters is too long to paste into the Find What box, and you'll need to use one of these alternative methods to search for it:

1) Delete most of the text from the include file

You've already copied your .htm include file to a .php file. If you aren't going to be using the .htm file anymore for other purposes, it is now expendable because you're going to delete it anyway, after you've completed the conversion to PHP includes.

The real goal of the Find and Replace operation is to replace the webbot tags (which refer to the old include file by name) with PHP code (which refers to the new include file by its name). The fact that the webbot code embedded in the files also happens to contain the complete text of the included file is irrelevant. Therefore, you can simply shorten that text, as follows:

  1. Open the .htm include file and delete most or all of the content between the <body> and </body> tags. Don't delete the <body> and </body> tags themselves.
  2. Close the file and wait while FrontPage updates all the files that include that file.
  3. The amount of text you need to search for in each file is now reduced to less than 4096 characters, and you can search for it normally.

2) Do the Find and Replace using Regular Expressions (regex)

Regular expressions are one of the most powerful, and probably underused, features of FrontPage Find and Replace. They take some getting used to, but they can do things that can't be done any other way. Even if it takes hours of testing to develop one good regex to automate a particular task, it is well worth it if it saves even more hours of manual labor, which it often does.

Two examples will be given. The variation you need for your purposes might fall somewhere between them in complexity, but there should be enough example code here to be a good starting point and reference.

Both examples operate on closed files.

A) A simple example: Find and Replace all instances of 1 webbot

This example uses a regular expression that matches the exact text of the opening webbot tag and the exact text of the closing webbot tag, with wildcards for anything that happens to be between them.

Assume that the webbot code you want to replace (which you obtained in Notepad) is this, similar to the earlier example:

<!--webbot bot="Include" U-Include="inc/example.htm" TAG="BODY" startspan -->

This is the text that it copied from the include file.
This is some more text, to make it an example with multiple lines.
And some more.

<!--webbot bot="Include" i-checksum="46121" endspan -->

This is a regular expression that will find it:

<!--webbot bot="Include" U-Include="inc/example\.htm" TAG="BODY" startspan -->.@\n(.@\n)@(.@)<!--webbot bot="Include" i-checksum="46121" endspan -->

The replacement string, same as earlier, will be

<?php include($_SERVER['DOCUMENT_ROOT'] . '/inc/example.php'); ?>

See the walkthrough in the next section for how to interpret the parts of this regular expression.
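If you want to experiment with this pattern outside FrontPage, it translates fairly directly to Python's re module. This is a translation, not FrontPage's engine: the non-greedy ".@" becomes ".*?", the grouped "(.@\n)@" becomes "(?:.*?\n)*?", and the sketch relies on Python's default behavior, where "." does not match a newline, so each line is consumed explicitly by "\n". The names are hypothetical.

```python
import re

PHP_LINE = "<?php include($_SERVER['DOCUMENT_ROOT'] . '/inc/example.php'); ?>"

# FrontPage: ...startspan -->.@\n(.@\n)@(.@)<!--webbot ... endspan -->
SIMPLE = re.compile(
    r'<!--webbot bot="Include" U-Include="inc/example\.htm" TAG="BODY" startspan -->'
    r'.*?\n(?:.*?\n)*?(.*?)'
    r'<!--webbot bot="Include" i-checksum="46121" endspan -->'
)

def replace_simple(page_text: str) -> str:
    # Passing a function means nothing in PHP_LINE is treated as a group reference.
    return SIMPLE.sub(lambda m: PHP_LINE, page_text)
```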

B) Complex example: Find and Replace every FrontPage include in your website with its corresponding PHP include

This is the regex for the Find What box. Its parts are explained one at a time in the walk-through below:

<!--webbot bot="Include" U-Include="(\.\./)*{.@\.}htm" TAG="BODY" startspan -->.@\n(.@\n)@(.@)<!--webbot bot="Include" i-checksum="[0-9]@" endspan -->

Walk-through:

<!--webbot bot="Include" U-Include="

This is plain text that must be matched exactly, as are the other literal sections below. This first plain text section begins the search for the opening webbot tag.

(\.\./)*

This matches any number of leading "../" that might be present in a relative path in the webbot code. In regular expressions, some characters have special meanings unless you specify that you want to search for them as literal text. The period is a regex wildcard that means "any character". We indicate that we want to search for two literal periods by preceding each ("escaping" it) with a backslash. The enclosing parentheses make the "../" term into a single group to which we can apply the * operator. The * operator means "zero or more occurrences of the group that precedes it".

{.@\.}htm

In this expression, we start matching the name of the include file. The first period means "any character". The @ operator, like the * operator, means zero or more occurrences, but it is not "greedy" like the * operator is: The "\." after the @ is an "escaped" literal period, meaning we want the search for "zero or more any-characters" to stop when it encounters the first actual period. But note that a period is itself an "any-character". Using the greedy * operator here would run the risk that the period might be gobbled up into the search string being accumulated by the any-chars part of the search. Using the non-greedy ".@" guarantees that the any-chars search will end as soon as it encounters a period; the ".@" will be satisfied, and the next part of the regex matching will continue on from there. The next thing it hits is the period (as just discussed), and then the literal text "htm", the file extension of your include file.

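The braces {} have special significance. They indicate that anything matched by the expression within them should be stored into a variable. Consider what is going to be in that variable: we've already matched and discarded any number of leading "../" of the relative path, if any. Now we've stored any number of characters after that, up to and including a trailing period, into the variable. Therefore, our variable is going to contain the entire path and name of the include file, except for its htm extension. We are going to use this later in the Replace With string, except we'll give it a "php" extension. This works only if what remains after the "../" are discarded is the path from your top level folder, which is true when your include files live in a top level folder such as /inc/, as in these examples. A page in /a/ that includes /a/sub/menu.htm would have U-Include="sub/menu.htm", and this regex would turn that into /sub/menu.php, which is wrong.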

" TAG="BODY" startspan -->

The above is just more literal text to match.

.@\n

\n is the regex expression for a newline (CR, LF, CRLF, or whatever it is). This is the newline at the end of the line containing the opening webbot tag. However, there might be some other characters (probably whitespace) between the end of the tag and the newline, so we must allow for that with another any-chars expression. Because a newline is itself an any-char, we again use the non-greedy ".@" form.

(.@\n)@

Having finished the line containing the opening webbot tag, we now move on to all the lines that follow it (which is the text from the included file), up to but not including the closing webbot tag. The ".@\n" of this line is identical to what we just saw previously, and means zero or more any-chars and then a newline: in other words, it will match "any line", even a blank one of just a newline. We put that expression in parentheses to group it, and apply the non-greedy @ operator to it to indicate "zero or more lines containing any, or no, content".  

Using the @ operator here is very important, as I discovered. In my first attempt, I used the asterisk operator and searched a file that contained three occurrences of my include file, like this:

<!--webbot bot="Include" U-Include="inc/example.htm" TAG="BODY" startspan -->
This is some text.
<!--webbot bot="Include" i-checksum="46121" endspan -->

<!--webbot bot="Include" U-Include="inc/example.htm" TAG="BODY" startspan -->
This is some text.
<!--webbot bot="Include" i-checksum="46121" endspan -->

<!--webbot bot="Include" U-Include="inc/example.htm" TAG="BODY" startspan -->
This is some text.
<!--webbot bot="Include" i-checksum="46121" endspan -->

The Find and Replace operation replaced all 3 occurrences of the text with just a single PHP include. That was because * is greedy: The "(.*\n)*" expression matched any line at all, even ones containing the closing webbot tag, and it didn't stop matching lines until it reached the last of the closing webbot tags. At that point, it couldn't move on with more matches until it treated the last webbot tag it saw as a match, so that's what it did. And it treated everything between the first opening webbot tag and the last closing webbot tag as one single match of the regular expression. If the above 3 blocks had not been consecutive, the Find and Replace would have gutted my source file, removing not just the webbots, but all my other content between them, too.

  • Before you do regex Find and Replace, back up your entire site!
  • After you do regex Find and Replace, check the result!
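The greedy-versus-non-greedy difference is easy to reproduce with Python's re module, which uses "*" for greedy and "*?" for non-greedy (FrontPage's "@"). This sketch, with hypothetical names, runs both forms of the pattern over the three consecutive blocks shown above and counts the matches.

```python
import re

BLOCK = ('<!--webbot bot="Include" U-Include="inc/example.htm" TAG="BODY" startspan -->\n'
         'This is some text.\n'
         '<!--webbot bot="Include" i-checksum="46121" endspan -->\n')
PAGE = BLOCK + "\n" + BLOCK + "\n" + BLOCK

OPEN = r'<!--webbot bot="Include" U-Include="inc/example\.htm" TAG="BODY" startspan -->'
CLOSE = r'<!--webbot bot="Include" i-checksum="46121" endspan -->'

# First attempt: greedy "(.*\n)*" runs on to the LAST closing tag.
GREEDY = re.compile(OPEN + r".*\n(?:.*\n)*.*" + CLOSE)
# Corrected: non-greedy, the equivalent of FrontPage's ".@\n(.@\n)@(.@)".
LAZY = re.compile(OPEN + r".*?\n(?:.*?\n)*?.*?" + CLOSE)

print(len(GREEDY.findall(PAGE)))  # 1: all three blocks swallowed as one match
print(len(LAZY.findall(PAGE)))    # 3: one match per block
```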

(.@)

The closing webbot tag might be indented or have other characters in front of it, so we apply another non-greedy any-chars search. It will stop when it finds something that starts matching the next search term, which is...

<!--webbot bot="Include" i-checksum="

just more literal text to match.

[0-9]@

The closing webbot tag contains a numeric checksum, which varies. The [] brackets indicate a regex "charset". It will result in a successful match if the next character is one of the ones within the brackets. The 0-9 notation used here indicates a range, any digit from 0 to 9. The @ again means zero or more. Matching will stop when it encounters the first non-digit character, which is...

" endspan -->

the final literal text of the closing webbot. We're almost done!

The Replace With text:

The Replace With text also contains a regular expression:

<?php include($_SERVER['DOCUMENT_ROOT'] . '/\1php'); ?>

Remember that when we were Finding, we stored the full path and filename of the original include file into a variable. Now we get to use it. FrontPage refers to this variable as "\1". The \ in a Replace With indicates a variable, and the 1 is because this is the first (and only) variable we created. 

Everything in the line above except \1 is literal text to insert. Note that when we come to the path, we insert a leading "/". This is because it wasn't present in the string we copied, and PHP's $_SERVER['DOCUMENT_ROOT'] variable doesn't provide it either. Then we insert the path and file name that we copied previously, including its trailing period, and we append "php", giving the file a .php extension.

The amazing result is that the above regular expression Find and Replace will replace every instance of every FrontPage webbot include with its equivalent PHP include! 

Before you do it:

  1. Make sure that's really what you want,
  2. Back up your entire website,
  3. Test it on one or a few pages to make sure the result is correct.
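Following the advice to test first, you can also try the complete pattern on sample text outside FrontPage. Here it is translated to Python's re module: FrontPage's braces {} become a capturing group (), \1 becomes match.group(1), and "@" becomes "*?". It's a sketch with hypothetical names, and it assumes that removing the leading "../" leaves a path from your top level folder (true here because the include files are in /inc/).

```python
import re

# Python version of the FrontPage Find What regex:
# U-Include="(\.\./)*{.@\.}htm" ... .@\n(.@\n)@(.@) ... i-checksum="[0-9]@" endspan -->
FIND = re.compile(
    r'<!--webbot bot="Include" U-Include="(?:\.\./)*(.*?\.)htm" TAG="BODY" startspan -->'
    r'.*?\n(?:.*?\n)*?.*?'
    r'<!--webbot bot="Include" i-checksum="[0-9]*?" endspan -->'
)

def to_php(match):
    # group(1) plays the role of FrontPage's \1: path and name, including the trailing period.
    return "<?php include($_SERVER['DOCUMENT_ROOT'] . '/" + match.group(1) + "php'); ?>"

def convert_page(page_text: str) -> str:
    """Replace every saved include webbot block with the matching PHP include."""
    return FIND.sub(to_php, page_text)
```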

FrontPage regular expressions tips

  • In FrontPage 2003, the Find and Replace box has some help with regular expressions. Next to the "Find what:" box, click on the top (right-pointing) arrow to open a list of available operators.
     
  • For descriptions of additional regex operators not in that help box, open Help (F1, or Help > Microsoft Office FrontPage Help), then search on: regular. One of the results is "Regular expressions". Click on that.
     
  • FrontPage regex operators, although useful, are less full-featured and somewhat nonstandard compared to those available in Unix and Linux, PHP and Perl, and even in other Microsoft products such as the Visual C++ editor.
     
  • In some situations, rather than searching for text that does contain a given string, it's necessary to find text that does not contain a string or where one string is not followed by a particular second string. These are called negative lookahead assertions, and FrontPage regular expressions can do them.

    As a very practical example, let's say you have a website that has been hit with an iframe injection attack, such that it has many pages containing malicious iframes. You want to find them. The problem is that you can't just search for <iframe tags because your pages also have legitimate iframes from amazon and google. How can you tell the malicious iframes apart from the legitimate ones? You need a regular expression that will find iframe tags but ignore the ones from those two legitimate sites. Here it is:

    <iframe~(.@(amazon|google)).@</iframe>

    <iframe begins the matching process. The tilde operator ~ (called "Prevent match" in FrontPage help) starts a negative lookahead assertion. It is like an "inner" regular expression. The text following <iframe is tested against this regular expression to see if it matches. It uses a non-greedy any-chars search .@ followed by a parenthesized "alternation" that matches either "amazon" or "google". The tilde operator causes this inner regular expression to work backwards: if it succeeds, it causes the outer regular expression to fail, so these harmless iframes won't appear in the search results; if it fails, it causes the outer regex to succeed, so any iframe not from those two sites will appear in the search results. The remaining part of the regex matches everything up to the first closing </iframe> tag encountered, so that if you're doing a Find and Replace, the whole iframe will be matched and can be easily deleted.
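The same idea in Python's re module, where a negative lookahead is written (?!...) instead of FrontPage's ~(). This is a translation for experimenting, with hypothetical names and sample URLs. In the Python version the lookahead can see to the end of the line, past the </iframe>, so a malicious iframe that shares a line with an unrelated amazon or google link would be missed; the second pattern restricts the check to the iframe's opening tag, which avoids that.

```python
import re

# Direct translation of <iframe~(.@(amazon|google)).@</iframe>
MALICIOUS = re.compile(r"<iframe(?!.*?(?:amazon|google)).*?</iframe>")

# Variant: only look for amazon/google inside the opening <iframe ...> tag itself.
OPENING_TAG_ONLY = re.compile(r"<iframe(?![^>]*(?:amazon|google))[^>]*>.*?</iframe>")

SAMPLE = (
    '<iframe src="http://rcm.amazon.com/e/cm?t=x"></iframe>\n'
    '<iframe src="http://evil.example/x.php" width=1 height=1></iframe>\n'
    '<iframe src="https://www.google.com/maps/embed"></iframe>\n'
)

for m in MALICIOUS.finditer(SAMPLE):
    print(m.group(0))  # only the evil.example iframe
```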

Questions, comments? Try the discussion forum.


 
