25 Years of Programming
An open source source for C, C++, OWL, BASIC, MDB, XLS, DOT, and more...
Online calculator to count occurrences of search engine query strings in website access logs
Instructions

Paste lines from your website's HTTP access log into the box above, then click Analyze. The lines you copy and paste should look like the following line from a typical Combined Log Format log. Be sure to copy from the correct log type (an HTTP access log, not an FTP log). This example is a referral from Google. The query, carried in the q= parameter of the referring URL, holds the search terms that brought the user to the page, which is what this calculator extracts and reports:

111.222.333.444 - - [12/Jan/2011:13:27:05 -0700] "GET /blog/20070705.htm HTTP/1.1" 200 79205 "http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=server+hacking+protection" "Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0"

For where to find your HTTP logs and how to download and unzip them, see here.

The only limit to the number of log lines you can paste is whatever JavaScript and your browser can handle. The more lines you paste (and the longer the time period they cover), the more comprehensive the report will be.

The report is output into the same box where you pasted the input data. After a few status lines at the top, it is in TAB-separated multi-column format that should paste easily into any spreadsheet program. For easier viewing of the text, you can copy it into a text editor, or, in Firefox 4, use the textarea's drag handle to expand it to fit the text.

Description

Popular web log analyzer programs provide reports of the top queries that people used for finding "your site" at search engines. Sometimes it is more useful to know ALL the queries that people used, and not just for "your site" in general, but for each individual page. That's what this online calculator is for. For every page in your site, it tells you ALL the search queries that brought people to it. That can help with search engine optimization. It tells you whether your page is ranking for broad queries (one or two search words) or mostly for long-tail queries (multiple search terms, a narrowly focused query).
When your page ranks well for broad queries, it generally gets more traffic than if it ranks only for longer, more specific queries.

The calculator extracts the lines whose referer field contains a search engine query that a visitor used to find the page, decodes the query to make it readable, and creates sorted, TAB-separated, multi-column output. The columns are the page request, the search count, the number of words in the query, the number of characters in the query, and the search string.
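The extraction and decoding step described above can be sketched roughly as follows. This is a minimal illustration, not the calculator's actual source: the function name `extractQuery` and the regular expressions are hypothetical, and it assumes the search engine passes the query in a `q` parameter, as in the Google example. Other engines may use a different parameter name.

```javascript
// Hypothetical sketch, not the calculator's real code.
// Captures the requested page and the referer field of a Combined Log Format line.
const LOG_LINE = /"[A-Z]+ (\S+)[^"]*" \d{3} \S+ "([^"]*)"/;

function extractQuery(line) {
  const m = LOG_LINE.exec(line);
  if (!m) return null;                        // not a recognizable access-log line
  const page = m[1];
  const referer = m[2];
  // Assumption: the query is in a "q" parameter, as in the Google example.
  const q = /[?&]q=([^&#]*)/.exec(referer);
  if (!q || q[1] === "") return null;         // no search query in the referer
  let query;
  try {
    // "+" stands for a space in a query string; %XX escapes are decoded after that,
    // so an encoded plus sign (%2B) survives as a literal "+".
    query = decodeURIComponent(q[1].replace(/\+/g, " "));
  } catch (e) {
    return null;                              // malformed percent-encoding
  }
  query = query.trim().replace(/\s+/g, " ");  // collapse runs of whitespace
  if (query === "") return null;
  return {
    page: page,
    query: query,
    words: query.split(" ").length,           // whitespace-separated words
    chars: query.length                       // characters, spaces included
  };
}
```

Counting words as whitespace-separated tokens and characters as the full string length (spaces included) agrees with the WORDS and CHARS values in the example report, for instance 2 and 19 for "frontpage .htaccess".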
The word and character counts can be used for creating alternative sort orders in a spreadsheet.

Notes
Bug reports, feature suggestions, comments, questions can be submitted in the discussion forum.

Search Engine Queries tabulator - Perl script

HTTP logs tend to be very large, and it is only a small percentage of the lines that contain the information this calculator needs. The limitation of doing the task in a browser is that so many lines must be pasted into the box even though most of them are discarded. That's no problem for this Perl script that you run from the command line. It reads one or more log files and produces the same output as the JavaScript calculator. On my PC, it processes a 130MB log file in about 40 seconds.

In the first column of this example report, the pages are sorted ascending alphabetically (only one page shown below). The most frequent search queries for each page are listed first, and within each count the queries are sorted alphabetically:

PAGEREQUEST         SEARCHCOUNT  WORDS  CHARS  SEARCHSTRING
/blog/20061231.htm  3            2      19     frontpage .htaccess
/blog/20061231.htm  2            4      30     correct htaccess for frontpage
/blog/20061231.htm  2            5      35     front page extensions .htacess file
/blog/20061231.htm  2            2      18     htaccess frontpage
/blog/20061231.htm  2            3      21     mod rewrite frontpage
/blog/20061231.htm  1            3      20     .htaccess front page
/blog/20061231.htm  1            2      19     .htaccess frontpage
/blog/20061231.htm  1            3      30     .htaccess frontpage extensions
/blog/20061231.htm  1            4      29     allow htaccess with frontpage
/blog/20061231.htm  1            6      38     block address in htaccess in frontpage
/blog/20061231.htm  1            8      51     can i still use .htaccess with frontpage ext
/blog/20061231.htm  1            4      36     cpanel htaccess frontpage extensions
/blog/20061231.htm  1            5      33     front page extensions + redirects
/blog/20061231.htm  1            3      22     frontpage and htaccess
/blog/20061231.htm  1            3      30     frontpage extensions .htaccess
/blog/20061231.htm  1            6      32     how to use frontpage with cpanel
/blog/20061231.htm  1            4      26     htaccess deny ip frontpage
/blog/20061231.htm  1            4      34     htaccess with frontpage extensions

You must have the Perl language installed on your PC to run this script. I developed and tested this script with Perl 5.10.0 in Ubuntu Linux and with ActivePerl 5.10.1 in Windows.

Example usage:

perl -WT SearchEngineQueryStrings.pl [options] [logfile1...] [> outfile]
Options:
  -r|--regex="REGEXP"   Report page requests matching this regex. Default=".*"
                        Example (pages ending in .htm or .html): "\.html?$"
  -c|--case-sensitive   Makes the -r option case-sensitive
                        so that \.htm$ matches .htm files but not .HTM files.
  -m|--mincount=INT     Report queries occurring at least this many times. Default=1
  -v|--verbose          Show column headings.
  -h|--help             Show this help and exit.
It uses this Perl module:

use Getopt::Long;

The program's code structure and formatting are more like typical C++ formatting than typical Perl formatting.
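To show how the page filter, the minimum count, and the report's sort order fit together, here is a rough sketch in JavaScript (the calculator's language, not the script's Perl). It is an illustration under assumptions, not the script's actual code: the function name `tabulate` and the option names `regex`, `caseSensitive`, and `minCount` are hypothetical stand-ins for `-r`, `-c`, and `-m`, the input is assumed to be already-extracted page/query pairs, and strings are compared in plain character-code order.

```javascript
// Hypothetical sketch; option names mirror -r, -c and -m but are not the script's own.
function tabulate(hits, options) {
  const opts = options || {};
  const regex = opts.regex === undefined ? ".*" : opts.regex;
  const minCount = opts.minCount === undefined ? 1 : opts.minCount;
  // Page matching is case-insensitive unless caseSensitive is set (like -c).
  const pageFilter = new RegExp(regex, opts.caseSensitive ? "" : "i");

  // Count how often each (page, query) pair occurs.
  const counts = new Map();
  for (const hit of hits) {
    if (!pageFilter.test(hit.page)) continue;
    const key = JSON.stringify([hit.page, hit.query]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }

  // Keep only the pairs that occur at least minCount times.
  const rows = [];
  for (const [key, count] of counts) {
    if (count < minCount) continue;
    const [page, query] = JSON.parse(key);
    rows.push({ page: page, count: count, query: query });
  }

  // Sort by page ascending, then count descending, then query ascending.
  const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  rows.sort((a, b) =>
    cmp(a.page, b.page) || b.count - a.count || cmp(a.query, b.query));

  // One TAB-separated line per row: page, count, words, chars, query.
  return rows.map(r =>
    [r.page, r.count, r.query.split(/\s+/).length, r.query.length, r.query].join("\t"));
}
```

For example, `tabulate(hits, { regex: "\\.html?$", minCount: 2 })` would report only queries seen at least twice on pages ending in .htm or .html. Comparing strings by character code puts ".htaccess front page" ahead of ".htaccess frontpage", as in the example report.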
Copyright ©2011 Steven Whitney. Last modified Mon 04/11/2011 03:37:49 -0700.