25 Years of Programming
An open source source for C, C++, OWL, BASIC, MDB, XLS, DOT, and more...

How to convert Amazon.com affiliate ads into valid HTML 4.01 Transitional

Did you try to validate the HTML on your web pages at W3C and get dozens or hundreds of errors because of your Amazon.com affiliate advertising code? It's fairly easy to fix the errors and make your code validate as HTML 4.01 Transitional.

It cannot validate as HTML 4.01 Strict because the Amazon code uses <iframe>, which is not legal in Strict. If you are currently using Strict, revise your DOCTYPE to declare the document as HTML 4.01 Transitional.

The purpose of declaring a DOCTYPE is to provide browsers with a predefined standard that tells them how your code should be interpreted and therefore how the page should be displayed. Which standard you use is not important to browsers or search engines. You need to use whatever standard renders the page properly.

The important thing is to use some standard. Asking which standard is "best" is like asking whether your page should be in English or in French. It depends on who the page is for, but if you choose English, it should be written in proper English. If you choose French, it should be written in proper French.

If you have a page that you cannot validate to any standard, it is better to omit the DOCTYPE altogether, rather than declare a DOCTYPE and then fail to meet it. When you don't declare a DOCTYPE, browsers go into "quirks mode" and attempt to render the page the best they can, which is usually reasonably good. When you declare a DOCTYPE, it says, "Don't use quirks mode," even if quirks mode might help render the page better.

Why Amazon HTML code doesn't validate

There are two aspects of the Amazon code that must be fixed:

  1. Unencoded ampersands (&) must be changed to HTML entities (&amp;).
  2. The border="0" attribute must be removed from <iframe> tags.

The rest of this article gives code examples and shows how to use Find and Replace to revise the code efficiently. I used FrontPage 2003, but other design programs such as Expression Web and Dreamweaver have similar Search and Replace methods.

Amazon Product Links and Search Box Links

When you get your code from the Associate Central Build Links interface, it looks something like this:

<iframe src="http://rcm.amazon.com/e/cm?t=YourAmazonID&o=1&p=8&l=as1&asins=B000SZZPHA&fc1=000000&IS2=1&lt1=_top&lc1=0000FF&bc1=000000&bg1=FFFFFF&f=ifr" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"></iframe>

At the W3C validator, every unencoded ampersand generates 3 errors and an informational message:

  • cannot generate system identifier for general entity "o"
  • general entity "o" not defined and no default entity
  • reference to entity "o" for which no system identifier could be generated
  • entity was defined here.

The above code for one Amazon ad has 11 ampersands and generates 44 messages!

The solution for the above example is to replace every & with &amp;
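As a rough check of the arithmetic above, here is a minimal Python sketch that counts the ampersands in the sample ad and encodes them. It only illustrates the idea and is not part of the FrontPage workflow. The names AD_CODE, MESSAGES_PER_AMPERSAND, and encode_all_ampersands are made up for this example.

```python
# Sample ad code from above (placeholder associate ID and product ID).
AD_CODE = (
    '<iframe src="http://rcm.amazon.com/e/cm?t=YourAmazonID&o=1&p=8&l=as1'
    '&asins=B000SZZPHA&fc1=000000&IS2=1&lt1=_top&lc1=0000FF&bc1=000000'
    '&bg1=FFFFFF&f=ifr" style="width:120px;height:240px;" scrolling="no" '
    'marginwidth="0" marginheight="0" frameborder="0"></iframe>'
)

MESSAGES_PER_AMPERSAND = 4  # 3 errors + 1 informational message


def encode_all_ampersands(html):
    """Blindly replace every & with &amp;.

    Safe only when the text contains no HTML entities at all.
    """
    return html.replace("&", "&amp;")


ampersands = AD_CODE.count("&")
print(ampersands, "ampersands,",
      ampersands * MESSAGES_PER_AMPERSAND, "validator messages")
print(encode_all_ampersands(AD_CODE))
```

A blind replace like this is fine for a single ad that contains no entities. It is not safe for whole pages, for the reasons given in the next section.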

Search and Replace throughout the site

It's more difficult to do this replacement in every ad throughout a website, however. It's basically just a Find and Replace operation, but:

  • You can't replace unconditionally all & with &amp; because you have to avoid changing other HTML entities. You must not convert &nbsp; to &amp;nbsp; or &quot; to &amp;quot;.
  • Amazon Text Links ads contain a mixture of unencoded ampersands and encoded ones. You can't replace all the & with &amp; because that will turn all the &amp; into &amp;amp;, which is wrong.
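If you can run a script instead of a plain Find and Replace, both problems can be sidestepped with a regular expression that skips ampersands already followed by something that looks like an entity. The following Python sketch shows one way to do it. The names BARE_AMPERSAND and encode_bare_ampersands are invented for this example, and the pattern is a heuristic rather than a full HTML parser.

```python
import re

# A "bare" ampersand is one that does NOT start a named (&nbsp;),
# decimal (&#160;) or hexadecimal (&#xA0;) character reference.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def encode_bare_ampersands(html):
    """Replace only the ampersands that are not already part of an entity."""
    return BARE_AMPERSAND.sub("&amp;", html)


sample = 'a&nbsp;b &quot;c&quot; <a href="x?o=1&p=8&amp;l=as1&lt1=_top">link</a>'
print(encode_bare_ampersands(sample))
```

Because already encoded ampersands are skipped, running the function twice gives the same result as running it once. The heuristic would miss a query parameter that happens to look like an entity (a name followed directly by a semicolon), so still inspect the results.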

Fully automated

Before you do a global Find and Replace that will have effects throughout your site, always make a backup copy of your entire website in case something goes wrong.

If each of your product link ads is stored in its own include file (which is a desirable organization for other reasons), the task is easy:

  1. Group all the include files together so your Find and Replace will operate only on those files and no others. In FrontPage, you can either select those files in the Folder List Pane so you can Find and Replace in "Selected page(s)", or open all of them in Code View so you can do your Find and Replace in "Open page(s)".
  2. Use a preliminary Find operation to make sure none of the files already contains the string &amp; (so you don't convert them to &amp;amp;).
  3. Use a preliminary Find operation to verify that there are 11 ampersands in each file. (FrontPage reports the number of instances found in each file.)
  4. If everything looks ok, do a Find and Replace on all files. Find: &  Replace: &amp;
  5. Visually inspect a few files to make sure it worked properly. If you're unsure, try putting one of the new ads on a page and validate it at W3C.
  6. If it worked, publish your revised files to their proper locations.

I was able to make 1700 replacements of & to &amp; with a single Find and Replace, which I was glad not to have to do manually.
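The same checks can be scripted. The following Python sketch mirrors the steps above: it refuses to touch anything unless every file is free of &amp; and has exactly 11 ampersands. It makes several assumptions that may not hold on your site: the include files sit in one folder, match *.htm, are UTF-8 encoded, and each holds a single product link. The function names are made up for this example.

```python
from pathlib import Path

EXPECTED_AMPERSANDS = 11  # ampersands in one product link, as counted above


def check_include_file(text):
    """Return the reasons (if any) why a blind replace is unsafe for this file."""
    problems = []
    if "&amp;" in text:
        problems.append("already contains &amp;")
    count = text.count("&")
    if count != EXPECTED_AMPERSANDS:
        problems.append(f"{count} ampersands, expected {EXPECTED_AMPERSANDS}")
    return problems


def fix_include_files(folder, pattern="*.htm", dry_run=True):
    """Check every matching file; rewrite them only if all of them pass.

    Returns {file name: [problems]}; an empty dict means every file was clean.
    """
    texts = {path: path.read_text(encoding="utf-8")
             for path in sorted(Path(folder).glob(pattern))}
    report = {}
    for path, text in texts.items():
        problems = check_include_file(text)
        if problems:
            report[path.name] = problems
    if not report and not dry_run:
        for path, text in texts.items():
            path.write_text(text.replace("&", "&amp;"), encoding="utf-8")
    return report
```

Call fix_include_files("ads") first with the default dry_run=True to see the report, and only then with dry_run=False, working on a backup copy as recommended above. The folder name "ads" is only a placeholder.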

Partially automated

If your situation is more complicated, you should still be able to automate it at least partially. You just need to make sure to replace everything that needs replacing, and not corrupt anything else as a side effect.

If you've been consistent in how you created your product links at Amazon, most of the ad code will be the same in every ad. The text that needs revision and is likely to be the same in all your ads is the two runs of parameters on either side of the product ID: from &o=1 through &asins=, and from &fc1= through &f=ifr. Your ad code will not be exactly the same as this:

<iframe src="http://rcm.amazon.com/e/cm?t=YourAmazonID&o=1&p=8&l=as1&asins=B000SZZPHA&fc1=000000&IS2=1&lt1=_top&lc1=0000FF&bc1=000000&bg1=FFFFFF&f=ifr" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"></iframe>

It is extremely unlikely that these two strings of consecutive characters appear anywhere except in your Amazon ads, so you can find and replace them as follows. The steps are not repeated here, but do some "test Finds" first, as described in the previous section, before doing the final Find and Replace:

1) Find (the original string with ampersands):

&o=1&p=8&l=as1&asins=

Replace With (each ampersand replaced by an HTML entity):

&amp;o=1&amp;p=8&amp;l=as1&amp;asins=

2) Find (the original string with ampersands):

&fc1=000000&IS2=1&lt1=_top&lc1=0000FF&bc1=000000&bg1=FFFFFF&f=ifr

Replace With (each ampersand replaced by an HTML entity):

&amp;fc1=000000&amp;IS2=1&amp;lt1=_top&amp;lc1=0000FF&amp;bc1=000000&amp;bg1=FFFFFF&amp;f=ifr
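For comparison, the two replacements above can be sketched in Python as follows. The strings are the ones from this example, so yours will differ, and the names FIND_1, FIND_2, and fix_known_strings are made up for illustration.

```python
# The two fixed runs of parameters from the Find steps above.
FIND_1 = "&o=1&p=8&l=as1&asins="
FIND_2 = "&fc1=000000&IS2=1&lt1=_top&lc1=0000FF&bc1=000000&bg1=FFFFFF&f=ifr"

# Each "Replace With" string is the Find string with & encoded as &amp;.
REPLACEMENTS = [(s, s.replace("&", "&amp;")) for s in (FIND_1, FIND_2)]


def fix_known_strings(html):
    """Replace the known runs and return (new_text, number_of_replacements)."""
    total = 0
    for find, replace_with in REPLACEMENTS:
        total += html.count(find)  # the "test Find": how many hits?
        html = html.replace(find, replace_with)
    return html, total
```

Because only these exact strings are touched, entities elsewhere on the page (such as &nbsp;) are left alone, and running the function a second time finds nothing more to replace.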

Using a Regular Expressions Find and Replace

Even if you have to spend a few hours learning to work with regular expressions, they are so powerful and time saving that you will eventually get all your time back, plus you'll know something new.

The following is an example of a site-wide regex Find and Replace for Amazon code. Your actual search and replace strings will probably not be the same, and if there are variations in your ads (attributes of the iframe, for example), you'll need to add more regex clauses to accommodate the variations. The regex-specific parts of the Find string are the backslash escapes and the braces around the product ID pattern:

Find:

<iframe src="http\://rcm\.amazon\.com/e/cm\?t=YourAmazonID&o=1&p=8&l=as1&asins={[A-Za-z0-9]@}&fc1=000000&IS2=1&lt1=_top&lc1=0000FF&bc1=000000&bg1=FFFFFF&f=ifr" style="width\:120px;height\:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"></iframe>

\ means "interpret the next character as a literal instead of a special regex character". The {[A-Za-z0-9]@} stores the product ID (which is different for each ad) to a variable so we can use it in the Replace operation. (I used FrontPage regex syntax, in which braces tag the matched text and @ means "as few repetitions as needed".)

Replace With:

<iframe src="http://rcm.amazon.com/e/cm?t=YourAmazonID&amp;o=1&amp;p=8&amp;l=as1&amp;asins=\1&amp;fc1=000000&amp;IS2=1&amp;lt1=_top&amp;lc1=0000FF&amp;bc1=000000&amp;bg1=FFFFFF&amp;f=ifr" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"></iframe>

\1 inserts the product ID at the appropriate location. 
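FrontPage's regex syntax is unusual, so here is a rough equivalent in Python's re syntax. It is a sketch under a few assumptions: it matches only the src URL rather than the whole iframe tag, it uses + (one or more) where the FrontPage pattern uses @, and the names PREFIX, SUFFIX, AD_URL, and fix_ad_urls are invented for this example.

```python
import re

# The fixed text on either side of the product ID (placeholder associate ID).
PREFIX = "http://rcm.amazon.com/e/cm?t=YourAmazonID&o=1&p=8&l=as1&asins="
SUFFIX = "&fc1=000000&IS2=1&lt1=_top&lc1=0000FF&bc1=000000&bg1=FFFFFF&f=ifr"

# re.escape() does the job of the backslashes in the FrontPage pattern;
# ([A-Za-z0-9]+) captures the product ID as group 1.
AD_URL = re.compile(re.escape(PREFIX) + r"([A-Za-z0-9]+)" + re.escape(SUFFIX))

# \g<1> re-inserts the captured product ID, like \1 in the Replace With box.
REPLACEMENT = (PREFIX.replace("&", "&amp;")
               + r"\g<1>"
               + SUFFIX.replace("&", "&amp;"))


def fix_ad_urls(html):
    """Encode the ampersands in every matching ad URL; return (text, hits)."""
    return AD_URL.subn(REPLACEMENT, html)
```

The hit count returned by subn plays the same role as FrontPage's count of instances found, so you can compare it with the number of ads you expect.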

Recommended Product Links

After converting all the ampersands as described above, also remove the border="0" attribute, whose function is duplicated anyway by style="border:none;".

The Amazon-provided code for Recommended Product Links looks like this:

<iframe src="http://rcm.amazon.com/e/cm?t=YourAmazonID&o=1&p=14&l=st1&mode=books&search=neural%20networks&fc1=000000&lt1=_blank&lc1=3366FF&bg1=FFFFFF&f=ifr" marginwidth="0" marginheight="0" width="160" height="600" border="0" frameborder="0" style="border:none;" scrolling="no"></iframe>
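Removing the attribute can also be scripted. The Python sketch below strips border="0" from iframe opening tags only, so a border="0" on an img tag is left alone. It assumes the attribute is written exactly as border="0" with double quotes. The names are made up for this example.

```python
import re

# Matches an <iframe ...> opening tag (assumes no ">" inside attribute values).
IFRAME_OPEN_TAG = re.compile(r"<iframe\b[^>]*>", re.IGNORECASE)

# The leading whitespace keeps this from matching inside frameborder="0".
BORDER_ATTR = re.compile(r'\s+border="0"', re.IGNORECASE)


def strip_iframe_border(html):
    """Remove border="0" from <iframe> opening tags and nowhere else."""
    return IFRAME_OPEN_TAG.sub(
        lambda match: BORDER_ATTR.sub("", match.group(0)), html
    )
```

The frameborder="0" attribute and the style="border:none;" declaration are not affected, because the pattern requires whitespace immediately before the word border and the exact text ="0" after it.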

Text Links

Text Links code has a mix of unencoded (&) and encoded (&amp;) ampersands. Be careful to replace only the unencoded ones. Also create a value for alt, something like alt="Amazon.com":

<a href="http://www.amazon.com/gp/search?ie=UTF8&keywords=artificial%20intelligence&tag=YourAmazonID&index=books&linkCode=ur2&camp=1789&creative=9325">Books on Artificial Intelligence</a><img src="http://www.assoc-amazon.com/e/ir?t=YourAmazon-ID&amp;l=ur2&amp;o=1" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />

If your Text Links are placed on individual pages rather than stored in include files, you will probably have to do the ampersand replacements manually. If the affected files are simple enough, the following procedure automates it, but note step 1!

  1. Make sure that the files you are about to operate on do not have any HTML entities in them at all except for &amp;.
  2. Change all &amp; to a string you don't use anywhere else such as @@@#####@@@ to get them out of the way temporarily.
  3. Change all & to &amp;
  4. Change all @@@#####@@@ back to &amp;
  5. If you accidentally changed some existing HTML entities that you hadn't noticed, such as &nbsp; (which would now be &amp;nbsp;), you can do a Find and Replace to change &amp;nbsp; back to &nbsp;.
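The placeholder procedure above can be sketched in Python as follows. It carries the same precondition as step 1 (no entities other than &amp; in the text), and the repair in step 5 is shown only for &nbsp;. The names PLACEHOLDER and encode_mixed_ampersands are made up for this example.

```python
PLACEHOLDER = "@@@#####@@@"  # any string that appears nowhere in your files


def encode_mixed_ampersands(html):
    """Encode bare & while leaving existing &amp; untouched (steps 2 to 5)."""
    if PLACEHOLDER in html:
        raise ValueError("placeholder already occurs in the text; pick another")
    html = html.replace("&amp;", PLACEHOLDER)    # step 2: park the encoded ones
    html = html.replace("&", "&amp;")            # step 3: encode the bare ones
    html = html.replace(PLACEHOLDER, "&amp;")    # step 4: restore the parked ones
    html = html.replace("&amp;nbsp;", "&nbsp;")  # step 5: undo one known accident
    return html
```

The check at the top guards against the one way the swap itself can corrupt a file: a placeholder that already occurs in the text would be turned into &amp; in step 4.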

Omakase and Context Links

These both use JavaScript and don't require conversion.

Testing

Make sure your ads function the same as before and watch your stats to make sure clicks are properly recorded.


Copyright ©2008 Steven Whitney. Last modified 06/30/2008.