25 Years of Programming
An open source source for C, C++, OWL, BASIC, MDB, XLS, DOT, and more...

WinHelp for the Natural Language Processing chatbot program

This Rich Text Format WTalk.rtf file is the source for the WinHelp file for the large WTalk.cpp Borland C++ ObjectWindows natural language processing chatbot project. This is the Help file that is launched when the user presses the F1 key.

The text describes the project and how it works, supplementing the information on the project's home page. In addition, this help file was written at a time when I understood the workings of the program better than I do now.

This RTF document is included in the project's zip download.

WTalk.rtf is based on the Helpfile.dot Microsoft Word template.


# $ K + WTalk Help Contents

Overview                            HID_OVERVIEW

File Formats (.PFS and .IFS)        HID_FILEFORMATS
File Menu Commands                  HID_FILEMENUCOMMANDS
Rules Menu Commands                 HID_RULESMENUCOMMANDS
Parse Menu Commands                 HID_PARSEMENUCOMMANDS
Rule Editor Dialog                  HID_RULEEDITORDIALOG
Add Word to Dictionary Dialog       HID_ADDWORDDIALOG
References and Further Reading      HID_FURTHERREADING

# $ K + Overview

WTALK is an attempt at a program that acquires and uses knowledge about the world to carry on an "interesting" conversation with you.

Most of the program's data is stored in a Microsoft Access database called WTALK.MDB. The program uses Windows Dynamic Data Exchange (DDE) to manage its database. You can also make manual changes to the database at any time, even while the WTALK program is running.

The underlying philosophy of WTALK is that it is much more important for an artificial intelligence program to learn than it is for it to have clever algorithms, or at least that any energy spent on clever algorithms should be on ones that allow it to learn! Clever techniques require constant revision by the programmer, who winds up being the one doing the learning.

WTALK has the following learning methods:

1. Rule Bid Adjustments = Learning From Feedback

WTALK uses both a "lottery" system and a "bucket brigade" system as developed by John Holland and described in the book Complexity, The Emerging Science at the Edge of Order and Chaos.

The program has a number of rules that suggest what it should do under various circumstances, but what it actually does at any given time is determined by a lottery in which the strongest rules have the best chance of winning.

When a rule is invoked and leads to good results, its bid value is increased. When a rule wins, it also pays a portion of its own bid value to the rule that won the immediately preceding lottery, on the grounds that the earlier rule's action may have helped get the current rule invoked. Because the earlier rule may have contributed to the most recent positive outcome, it should receive some of the reward, too.
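The lottery and the payment to the preceding winner can be sketched as follows. This is a minimal illustration in Python, not WTalk's actual code: the Rule class, the function names, and the 25% share are all assumptions made for the example.

```python
import random


class Rule:
    """A rule with an id and a bid value (hypothetical structure)."""

    def __init__(self, rule_id, bid):
        self.rule_id = rule_id
        self.bid = bid


def run_lottery(rules, rng):
    """Pick one rule at random, weighted by bid, so stronger rules win more often."""
    weights = [rule.bid for rule in rules]
    return rng.choices(rules, weights=weights, k=1)[0]


def pay_predecessor(winner, previous_winner, share=0.25):
    """Bucket brigade step: the winner passes part of its bid to the prior winner."""
    if previous_winner is None or previous_winner is winner:
        return 0.0
    payment = winner.bid * share  # the share is an assumed value
    winner.bid -= payment
    previous_winner.bid += payment
    return payment


def reward(rule, amount):
    """Raise the bid of a rule whose action led to good results."""
    rule.bid += amount
```

A conversation turn would call run_lottery, then pay_predecessor with the winner of the previous turn, and later reward if the reply is rated well. The payment only moves bid value between rules; new value enters the system only through reward.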

2. Grammatically Correct Phrase Structures = Learning By Example

When informed that a particular phrase of a sentence is grammatically correct, WTALK infers that the structure of the phrase is generally good, and saves the structure in a list of known-good ones so it can recognize it and accept it as good when encountered again.
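One way to picture the list of known-good structures is as a set of word-type sequences keyed by phrase type. The sketch below is in Python and is only an illustration: the LegalPhrases class name borrows the name of the table mentioned under Tips on Revising, but its methods and the NOUN tag are assumptions, not the program's real schema.

```python
class LegalPhrases:
    """An in-memory stand-in for a table of known-good phrase structures."""

    def __init__(self):
        self._known = set()

    def learn(self, phrase_type, word_types):
        """Record the structure of a phrase the user confirmed as correct."""
        self._known.add((phrase_type, tuple(word_types)))

    def is_known_good(self, phrase_type, word_types):
        """Return True if this exact structure was confirmed before."""
        return (phrase_type, tuple(word_types)) in self._known
```

For example, after learn("NOUNPHR", ["DET", "NOUN"]) is called for "the dog", the same structure is accepted for "a park" without further confirmation.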

3. Information Content of Words and Phrases = Learning By Example, Context, and Inference

WTALK started with a set of words known to reliably carry, identify, or flag a particular type of information. For example, the words "during" or "when" are reliably associated with a time period, information type "WHEN". When these words are used in a phrase, that entire phrase is marked as information type WHEN, as the most likely correct assumption. Then other individual words in the phrase containing the flag word are also updated in the database with information type WHEN.

In this manner, when a phrase is identified and processed, infotypes carried by its individual words get propagated throughout the phrase, and then propagated back to individual words, and thus the infotypes of words get propagated throughout the system.

As one example, given the phrase "to the park", it knows that in this usage "to" indicates WHERE, which is propagated to the noun "park". After the database update, it subsequently knows that this definition of "park" carries the infotype WHERE, and should therefore be considered a place. In this manner, it is able to extract certain information about a word by examining how it is used, rather than by being explicitly told or having to ask.
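The propagation described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the FLAG_WORDS table, the function name, and the dictionary standing in for the database are all hypothetical, and the sketch tags every non-flag word in the phrase, whereas the real program may be more selective (for instance, tagging only the noun).

```python
# Flag words taken from the examples in the text; the real list is larger.
FLAG_WORDS = {"during": "WHEN", "when": "WHEN", "to": "WHERE"}


def propagate_infotype(phrase_words, word_infotypes):
    """Mark the phrase with its flag word's infotype and pass it to the other words.

    word_infotypes maps each word to the set of infotypes seen for it so far
    and is updated in place. Returns the phrase's infotype, or None if the
    phrase contains no flag word.
    """
    words = [word.lower() for word in phrase_words]
    phrase_type = next((FLAG_WORDS[w] for w in words if w in FLAG_WORDS), None)
    if phrase_type is None:
        return None
    for word in words:
        if word not in FLAG_WORDS:
            word_infotypes.setdefault(word, set()).add(phrase_type)
    return phrase_type
```

Calling propagate_infotype(["to", "the", "park"], db) returns "WHERE" and leaves "park" recorded with infotype WHERE for later sentences.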

In addition to normal conversation, you can enter these commands:

BYE              End the session.
LIST or FACTS    List facts database.
TOKS             Show tokens from latest sentence.
MACHINE          Show state machine contents.
STATES           Show state variables (StateArray).

If you ask questions in the following formats, the program will try to find factual answers to them:

WHO (verb phrase)?
WHAT DID (noun phrase) DO?
WHY DID (noun phrase) (verb phrase)?
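As a rough illustration of how input could be matched against these three formats, here is a Python sketch using regular expressions. The pattern names and the classify_question function are assumptions for the example. Note that separating the noun phrase from the verb phrase in a WHY question needs a real parse, so the sketch returns them together.

```python
import re

# Patterns are tried in order; each captures the part the program would look up.
QUESTION_PATTERNS = [
    ("WHAT_DID", re.compile(r"^WHAT DID (?P<noun_phrase>.+?) DO\??$", re.IGNORECASE)),
    ("WHY_DID", re.compile(r"^WHY DID (?P<rest>.+?)\??$", re.IGNORECASE)),
    ("WHO", re.compile(r"^WHO (?P<verb_phrase>.+?)\??$", re.IGNORECASE)),
]


def classify_question(text):
    """Return (kind, captured parts) for a recognized question, else None."""
    text = text.strip()
    for kind, pattern in QUESTION_PATTERNS:
        match = pattern.match(text)
        if match:
            return kind, match.groupdict()
    return None
```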

Scoring of the Program's Reply (0-5):

These buttons enter your text into the program and simultaneously allow you to rate how the program's response affected your interest in the conversation, for any reason. The program can't see or hear your reactions, so this is its substitute. Enter your text with the button that indicates how you score its previous response.

If its reply was coherent, or on the topic, or appropriate, or you found it funny for any reason, you should give it a good score. It uses your scores to decide how, when, or whether to use that response again.

Guidelines:

0 Worst. Use only to indicate unintelligible nonsense.
1 Bad
2 Poor
3 Average or Neutral
4 Good
5 Excellent
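The help text does not say how a score changes a rule's bid. One plausible scheme, shown here in Python purely as an assumption, treats 3 as neutral, raises the bid for higher scores, and lowers it for lower ones. The STEP constant and the adjust_bid name are invented for the example.

```python
NEUTRAL_SCORE = 3  # "Average or Neutral" in the guidelines above
STEP = 2.0         # hypothetical bid change per score point


def adjust_bid(bid, score):
    """Return the rule's new bid after the user rates its reply from 0 to 5."""
    if score not in range(6):
        raise ValueError("score must be an integer from 0 to 5")
    # Clamp at zero so a badly rated rule cannot end up with a negative bid.
    return max(0.0, bid + (score - NEUTRAL_SCORE) * STEP)
```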

# $ K + Files and File Formats

Files Used with the Program

All data files other than the .MDB database are text files, and can be edited with any text editor.

ANSYES.DAT, ANSNO.DAT, ANSIND.DAT

Words and phrases that, if found within the first few words of a sentence, probably indicate that the sentence is an answer to a question, the effective answer being yes, no, or indeterminate. Entries are ordered from shortest to longest so that longer ones (probably more accurate) override any determinations made on the basis of the shorter ones being found.
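The "longer entries override shorter ones" idea can be sketched as below. This Python example is an approximation, not the program's code: the sample entries, the four-word window, and the classify_answer name are assumptions, and the three files are merged so that the longest matching entry wins regardless of which list it came from.

```python
import re


def classify_answer(sentence, word_lists, window=4):
    """Return the answer type of the longest entry found near the sentence start.

    word_lists maps an answer type (e.g. "YES") to its list of entries.
    Returns None when no entry matches.
    """
    words = re.findall(r"[a-z']+", sentence.lower())
    head = " " + " ".join(words[:window]) + " "
    best_kind, best_length = None, 0
    for kind, entries in word_lists.items():
        for entry in entries:
            # Pad with spaces so "no" does not match inside "not".
            if f" {entry} " in head and len(entry) > best_length:
                best_kind, best_length = kind, len(entry)
    return best_kind
```

With "of course" listed as a yes entry and "of course not" as a no entry, the input "Of course not, I never did" is classified as no, because the longer entry overrides the shorter one.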

I am not sure this is the best scheme. It may be that all the entries should be stored in one file, along with the type of answer that each entry indicates, so that the entire set can be ordered from least reliable to most reliable. But this would prevent keeping them in WordLists.

YNQSTART.DAT

Words that commonly begin questions whose answer should be Yes or No.

INFINIVB.DAT

Verbs often or usually followed by an infinitive. This list helps decide when "to" is a VBHELP instead of a PREP.
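A minimal sketch of that decision, in Python: if the word before "to" is in the infinitive-verb list, tag "to" as VBHELP, otherwise as PREP. The sample verbs and the tag_to function are assumptions for illustration, and a real parser would likely weigh more context than the single preceding word.

```python
# A few sample entries; the actual contents of INFINIVB.DAT are not shown here.
INFINITIVE_VERBS = {"want", "need", "try", "hope"}


def tag_to(words, index):
    """Tag the "to" at words[index] as VBHELP or PREP from the preceding word."""
    if words[index].lower() != "to":
        raise ValueError("words[index] must be 'to'")
    previous = words[index - 1].lower() if index > 0 else ""
    return "VBHELP" if previous in INFINITIVE_VERBS else "PREP"
```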

PRONOUNS.DAT

Pronouns. "This", "that", "these", and "those" may be more difficult to handle because they also serve as DET (determiners, articles). These four words, or some of the pronouns generally, may have to be broken out into their own files.

DIALOG.USR

A log of your entire conversation, labeled as to who said what.

RULES.DAT

The set of rules used by the program to determine its actions and construct its replies.

Numbering conventions (each number shown starts a section):

0 -     Not used. Don't use.
1 -     Fundamental rules essential to the operation of the system. These must all be locked, and must never be deleted from the file.
100 -   Rules (usually locked) that respond (with stock answers) to key words or phrases in the user's input.
400 -   Generic conversation starters, usually invoked when user appears to be unresponsive.
1000 -  This block was once assigned to a set of conversation starters that is no longer used. It is available for reassignment.
3000 -  New user-created rules.
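Given these starting numbers, the section a rule id belongs to can be found with a sorted lookup. The Python sketch below assumes each section runs up to the next section's starting number; the short section labels and the section_for function are invented for the example.

```python
from bisect import bisect_right

# Starting id of each section, in ascending order, with a short label for each.
SECTION_STARTS = [0, 1, 100, 400, 1000, 3000]
SECTION_NAMES = [
    "not used",
    "fundamental",
    "keyword responses",
    "conversation starters",
    "available for reassignment",
    "user-created",
]


def section_for(rule_id):
    """Return the label of the numbering section that contains rule_id."""
    if rule_id < 0:
        raise ValueError("rule ids are non-negative")
    return SECTION_NAMES[bisect_right(SECTION_STARTS, rule_id) - 1]
```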

# $ K + File Menu Commands

Open

Allows you to specify one or more files to read from disk as though you typed the input yourself. However, the program does not respond to each file line as it is read.

Exit

Exits the program.

# $ K + Rules Menu Commands

Create

If there was something in what you said that you think the program should have noticed and responded to, you can create a new rule that will catch and respond to that situation the next time. The rule specifies what circumstances should trigger it, and what its response should be.

Create invokes the Rule Editor (HID_RULEEDITORDIALOG) so you can create the new rule. For convenience, the editor is initialized to contain a rule, in valid format, that is based on your most recent reply. You can modify it however you want.

Edit

Allows you to edit a rule whose id number you specify. The default number is the id of the most recent lottery winner. When you are through reviewing rules, press Cancel in the number entry dialog box.

To delete a rule, change both its Bid and Locked values to 0. The program will then delete it automatically.

# $ K + Parse Menu Commands

Is Perfect

Informs the program that its grammatical parsing and information content interpretation of your input (displayed in the Parse Window) is exactly correct. You should always invoke Parse|Is Perfect when the parse is correct. This reinforces the rules that produced the correct parsing and performs some important updates to database information.

Revise

Allows you to Task-Switch to a Microsoft Access database form to revise the program's grammatical parsing of your input. It will use your revised information to infer new rules about various aspects of the words and phrases in the sentence. The program will not, however, re-analyze and re-parse your previous input, so any changes will not be reflected in the Parse Window unless you re-enter your sentence.

You should use Revise whenever you have the time. Doing so improves the program's parsing ability by teaching it new phrase constructions.

Tips on Revising

1. In a phrase such as "the man who had a dog", include the subordinate clause (starting with "who") as part of the NOUNPHR it is adjacent to. This forces these wordtype sequences into the LegalPhrases table as parts of NOUNPHR sequences, where they are useful, instead of as SUBORD, where they are not. The main use of marking a subordinate clause is to exclude it during parsing, so the elements of the main sentence can be extracted correctly.

# $ K + Rule Editor Dialog

The easiest type of Rule to create is one that searches for and responds to a key word in your input. As a keyword, choose a word or phrase from your sentence that the program should watch for. It should be a significant one. When it sees this word or phrase, it will consider using the response. (It might not actually use it each time. That is determined by a lottery after it considers multiple factors.)

Type the new reply exactly as it should appear, including upper/lower case.

For now, if the rule read from your edited text is not valid, you lose it (you lose its text). Just in case, you can protect your edits by Edit|Copy to clipboard before closing the dialog. Then, if anything goes wrong, you can paste the text back on re-entry. You cannot close the dialog with OK until the text contains a valid rule.

Operators:

LT          1    Less Than
LE          2    Less Than or Equal
EQ          3    Equals
GE          4    Greater Than or Equal
GT          5    Greater Than
NE          6    Not Equal
CONTAINS    7    Contains
DNC         8    Does Not Contain
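To show what these operators mean in practice, here is a small Python sketch that maps each numeric code to its name and a comparison function. It illustrates the semantics only: how WTalk stores and evaluates conditions inside a rule is not described here, so the OPERATORS table layout and the evaluate function are assumptions.

```python
import operator

# Numeric code -> (name, comparison). Codes and names follow the table above.
OPERATORS = {
    1: ("LT", operator.lt),
    2: ("LE", operator.le),
    3: ("EQ", operator.eq),
    4: ("GE", operator.ge),
    5: ("GT", operator.gt),
    6: ("NE", operator.ne),
    7: ("CONTAINS", operator.contains),
    8: ("DNC", lambda left, right: right not in left),
}


def evaluate(code, left, right):
    """Apply the operator with the given numeric code to left and right."""
    _name, compare = OPERATORS[code]
    return compare(left, right)
```

For example, evaluate(7, "I took the dog to the park", "dog") is true, which is the kind of test a keyword rule would make against the user's input.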

You can create a modified COPY of an existing rule:

  1. Use Rules|Edit (to specify the id# of the rule to copy)
  2. In the Rule Editor, change its text, AND change the ID to an unused #. (You must make sure the # is not a duplicate.)

# $ K + Add Word to Dictionary Dialog

(This dialog box is no longer used.)

If the program encounters a word it does not know, you will be asked to Task-Switch to Microsoft Access to fill in the blanks in a table. Use one of these methods to Task-Switch:

1. Alt+TAB: Hold down the ALT key and then press TAB repeatedly to cycle through a list of running applications; release both keys when you reach the one you want.

2. Ctrl+ESC: Press the CTRL and ESC keys simultaneously; then choose the application you want from the task list dialog box.

# $ K + For Further Reading

Books:

Complexity, The Emerging Science at the Edge of Order and Chaos, M. Mitchell Waldrop, Touchstone, 1992.

Describes "classifier systems" developed by John Holland, and other rule-based decision-making systems. In deciding what to do in a given situation, WTalk uses a rule-based approach in the spirit of a classifier system, but in which the rules themselves are coded and analyzed in English instead of in a code peculiar to the program.

Chaos Under Control, The Art and Science of Complexity, David Peak and Michael Frame, W. H. Freeman, 1994.

Brief discussion of classifier systems.

Artificial Intelligence Using C, Herbert Schildt, Osborne McGraw-Hill, 1987.

Demonstrates with program listings a number of artificial intelligence methods, including some for natural language processing. Has a short Eliza-like conversational program, and in other program listings demonstrates state-machine, context-free recursive-descent, and noise-disposal NLP parsers.

Magazine Articles:

Other Computer Programs:

Eliza, Joseph Weizenbaum, M.I.T.

 
