25 Years of Programming
An open source source for C, C++, OWL, BASIC, MDB, XLS, DOT, and more...
Online calculator to count repeated words and phrases in text or web page HTML source code

Instructions

Enter text in the box and click Submit. The analysis is displayed in the text box.
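As a rough illustration of the kind of analysis described above, here is a minimal Python sketch that counts repeated words and phrases in a block of text. It is not the Phrase Counter source code. The function name `count_repeated_phrases`, the `max_words` and `min_count` parameters, and the tokenizing rule are all assumptions made for the example.

```python
import re
from collections import Counter


def count_repeated_phrases(text, max_words=3, min_count=2):
    """Count words and phrases (1 to max_words words long) that repeat in text.

    Hypothetical helper for illustration; not the Phrase Counter implementation.
    """
    # Assumed tokenizing rule: lowercase, then treat runs of letters,
    # digits, and apostrophes as words. Punctuation is discarded.
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for n in range(1, max_words + 1):
        # Slide a window of n words across the text.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    # Keep only the words and phrases seen at least min_count times.
    return {phrase: c for phrase, c in counts.items() if c >= min_count}


sample = "Search engine optimization is hard. Search engine rankings change."
repeated = count_repeated_phrases(sample)
# Most frequent first, ties broken alphabetically.
for phrase, count in sorted(repeated.items(), key=lambda kv: (-kv[1], kv[0])):
    print(count, phrase)
```

Because punctuation is discarded, this sketch lets a phrase run across a sentence boundary. A more careful version would probably split on sentences first.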
Phrase Counter background

The program began as a small part of an artificial intelligence project, to explore whether analyzing repeated words and phrases might help summarize text to get a sense of what it is about. It seems to be useful for that, at least as a quick and easy first step in the analysis. It might prove even more useful when combined with analysis of word meanings, relationships, and grammatical constructions, tasks that are more complex and difficult, and that I made some progress with in my "WTalk.cpp" chatterbot project.

Using Phrase Counter for search engine optimization

One day, I realized that a list of word and phrase repetitions has a more immediately practical application: analysis of keyword density on a web page. "Keywords" are the words and phrases that search engines extract from a web page to determine what it is about (sound familiar?), and to determine how well the content of the page matches a particular search query that their users might enter, that is, how "relevant" the page is for the query. If the query is about "search engine optimization" or "SEO", the search engine will look in its index for pages mentioning those words. A page mentioning those terms many times is probably about that topic, so it is highly relevant. A page that only mentions them once or twice might be mostly about some other topic, with only tangential references to SEO, so it is less relevant.

That would make it sound as though the best way to make your page relevant to a particular search query is to use your targeted search terms as many times as humanly possible in your text, right? Maybe, but that is called "keyword stuffing", and it is not the way to achieve top rankings! Although it can make your page look highly relevant, it is also an indicator to search engines that your page is probably low quality, causing it to drop far down in search results.

There are usually many relevant pages for a search query.
Determining which ones to show first in the results is based on factors other than keywords. It has to do with how good an authority the search engine thinks the page is likely to be for the topic. A highly relevant but poor quality page might not appear in the search results at all.

The formula that a search engine uses for estimating the likely quality of a page, and therefore how high to place it in search results, is known as its ranking algorithm. The algorithms of different search engines are probably very different from each other. How much (or whether) a particular SEO technique will affect rankings at one or more search engines is something that can only be determined by experimentation.

Adjusting keyword density (the number of repetitions of keywords and phrases) to an optimum level (not too much, not too little) is a technique used by SEO consultants to make a page look relevant for targeted search queries while trying to avoid penalties for keyword stuffing. Some webmasters in competitive market niches adjust their keyword densities to match those of competitor sites that rank above them in search results. I have even seen recommendations online that a web page should consist of a specific percentage of targeted keywords. The exact percentage varies over time, according to what people think is working best at Google.

Using Phrase Counter to improve AdSense targeting

AdSense advertising is "contextually targeted", which means that AdSense ads on a page are supposed to be about the same topics that the page is about. That means Google must determine, in a way not as specific as matching text against a search query, what the page is generally "about". Word and phrase repetitions must play a part in that determination.

It has happened several times that I've published a new web page, only to discover that the AdSense targeting was poor, not relevant to the topic at all.
It was easy to see where AdSense was going wrong, usually by misinterpreting the meaning or context of a repeated word. More than once, I was able to fix the problem, and trigger correctly targeted ads, by changing one repeated word on the page, or by changing every occurrence of an ambiguous word to a two-word phrase that made the meaning clear and that could not be misinterpreted.

I still have some pages where AdSense ads are not about the topics I'd expect, and not what I'd prefer them to be. When I ran the text of one of the pages through Phrase Counter, I was surprised to see that the most often repeated words were not what I expected, and not tightly focused on the topic of the page. A human reader couldn't mistake what the page is about, but it appears that the method AdSense uses for determining the page topic might be misinterpreting it. I'm not in general much of an advocate for the importance of keyword density, but in this case it does appear that the lack of it might be creating confusion, and an adjustment might be called for.

If the full text of a page can't be keyword-adjusted for better AdSense targeting, there is always AdSense section targeting. It allows you to manually tag the text that you want AdSense to either emphasize or ignore for the purpose of targeting ads.

One factor that can affect page topic interpretation for AdSense purposes more than for search engine purposes is keyword "heat". One or two occurrences of a high-heat keyword can potentially trigger off-topic AdSense ads. That is, some keywords seem to be so important to AdSense that a single occurrence can outrank other words and phrases on the page even if they are repeated.

Notes about Phrase Counter
Please submit bug reports, feature suggestions, and other feedback in the Discussion Forum.
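For readers who want to try the keyword density idea from the search engine optimization section, the sketch below shows one possible way to express it as a percentage. Definitions of density vary. This version assumes density means the share of the page's words taken up by occurrences of the target phrase. The function name `keyword_density` and the sample text are made up for the example, and nothing here reflects how any search engine actually measures it.

```python
import re


def keyword_density(text, phrase):
    """Return (occurrences, percent of total words covered by the phrase).

    Hypothetical helper; the density formula is one assumed convention.
    """
    # Assumed tokenizing rule: lowercase words of letters, digits, apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0, 0.0
    n = len(target)
    # Count every position where the phrase's words appear in order.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    # Each occurrence accounts for n of the page's words.
    return hits, 100.0 * hits * n / len(words)


page_text = "SEO tips: good SEO takes time, and bad SEO gets penalized."
hits, percent = keyword_density(page_text, "SEO")
print(f"{hits} occurrences, {percent:.1f}% of words")
```

Running the same function over a competitor's page text would give a figure to compare against, which is roughly the matching technique some webmasters are described as using.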
Copyright ©2012 Steven Whitney. Last modified Thu 04/26/2012 05:23:39 -0700.