25 Years of Programming
An open source source for C, C++, OWL, BASIC, MDB, XLS, DOT, and more...

Natural language parsing (NLP) chatbot program - data files

These are the few data files that haven't yet been transferred to the WTalk.mdb database, probably because they are auto-loaded at program startup into WordLists, a class that hasn't yet been modified to read its data from the database instead of from text files.

A couple of these files show the "80/20 rule" that I sometimes applied while developing this program. This is the old adage that says that in any job the first 20% of the effort (or time, or money...) gets 80% of the job done.

There are some grammatical constructions that are difficult to deal with using formal rules. Instead of inventing overly contrived grammatical rules to deal with them, why not just handle them on a case-by-case basis? Intuitively, the answer is "because there are way too many of them to deal with individually." But it turns out that a relatively small number of special cases (the 20% of the effort) can handle the majority of these constructions that arise in normal conversation (the 80% of the job).

EXPAND.DAT

This is a list of common contractions. The program expands them when they are encountered. Only one expansion is allowed for each contraction, so the program can make a mistake when a contraction is ambiguous (for example "he's", which can mean "he is" or "he has" but is always expanded to "he is").

"aren't","are not"
"can't","can not"
"couldn't","could not"
"didn't","did not"
"doesn't","does not"
"don't","do not"
"hadn't","had not"
"haven't","have not"
"he'd","he would"
"he'll","he will"
"he's","he is"
"here's","here is"
"how's","how is"
"i'd","I would"
"i'll","I will"
"i'm","I am"
"i've","I have"
"isn't","is not"
"it'd","it would"
"it'll","it will"
"it's","it is"
"let's","let us"
"o'clock","o'clock"
"she'd","she would"
"she'll","she will"
"she's","she is"
"shouldn't","should not"
"that's","that is"
"there's","there is"
"they'd","they would"
"they'll","they will"
"they're","they are"
"they've","they have"
"wasn't","was not"
"we'd","we would"
"we'll","we will"
"we're","we are"
"weren't","were not"
"what's","what is"
"when's","when is"
"where's","where is"
"who's","who is"
"why's","why is"
"won't","will not"
"wouldn't","would not"
"you'd","you would"
"you'll","you will"
"you're","you are"
"you've","you have"

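As a rough illustration of how a file in this format might be consumed, the Python sketch below parses the quoted pairs and replaces each contraction with its single expansion. The helper names (`load_pairs`, `expand_contractions`) and the word-matching pattern are assumptions made for this example, not the program's actual code, and only a few entries from the list are inlined.

```python
import csv
import io
import re

# A few entries in the same quoted, comma-separated format as EXPAND.DAT.
SAMPLE = '''"can't","can not"
"i'm","I am"
"it's","it is"
"o'clock","o'clock"
'''

def load_pairs(text):
    """Parse "key","value" lines into a dict (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text))
    return {row[0]: row[1] for row in reader if len(row) == 2}

def expand_contractions(sentence, table):
    """Replace every word that matches a table key, ignoring case."""
    def swap(match):
        word = match.group(0)
        return table.get(word.lower(), word)
    # Words may contain apostrophes, so match letters and apostrophes together.
    return re.sub(r"[A-Za-z']+", swap, sentence)

table = load_pairs(SAMPLE)
print(expand_contractions("I'm sure it's late, but I can't stop.", table))
```

Because each key maps to exactly one value, a lookup table like this cannot represent an ambiguous contraction, which is the limitation described above.
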
INFINIVB.DAT

This is a list of verbs that are frequently followed by the infinitive form of another verb, as in "wanted to write". This is one of those situations where you'd think there would be a zillion of them, but instead this relatively short list accounts for the majority of occurrences of this construction in normal conversation, and any others can easily be added.

Identifying this construction is helpful because it eliminates the danger of misinterpreting the "to" as a preposition, especially if the infinitive it precedes has the potential of being interpreted as a noun.

"wanted"
"want"
"wanting"
"wants"
"try"
"tries"
"tried"
"trying"
"have"
"has"
"had"
"having"
"fight"
"fights"
"fought"
"fighting"
"attempt"
"attempts"
"attempted"
"attempting"
"wish"
"wishes"
"wished"
"wishing"
"begin"
"begins"
"began"
"begun"
"beginning"
"start"
"starts"
"started"
"starting"
"like"
"likes"
"liked"

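The check described above can be sketched in a few lines of Python. This is only an assumed approach (look at the word immediately before each "to"); the function name `infinitive_to_positions` is hypothetical, and the set holds just a subset of the list.

```python
# A subset of INFINIVB.DAT; the real list is longer.
INFINITIVE_VERBS = {
    "want", "wants", "wanted", "wanting",
    "try", "tries", "tried", "trying",
    "like", "likes", "liked",
}

def infinitive_to_positions(words):
    """Return the index of every "to" that directly follows a listed verb."""
    hits = []
    for i in range(1, len(words)):
        if words[i].lower() == "to" and words[i - 1].lower() in INFINITIVE_VERBS:
            hits.append(i)
    return hits

# "dance" could be a noun or a verb; the preceding verb settles it.
print(infinitive_to_positions("I wanted to dance".split()))
print(infinitive_to_positions("I went to the dance".split()))
```

In the first sentence the "to" at index 2 is flagged as an infinitive marker, so "dance" can be read as a verb. In the second, "went" is not in the list, so the "to" is left to be treated as a preposition.
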
ISWORDS.DAT

This is a list of various past, present, and future forms of "to be" and some of its synonyms. The program uses these to help identify sentences of the "equivalency" type, in which the clause that follows the verb describes the subject of the sentence. In the future this will be useful for building the list of attributes that is created for that subject in the database. As with the INFINIVB.DAT verbs above, there are still some to add to this list.

"am"
"are"
"be"
"became"
"become"
"get"
"gets"
"got"
"is"
"was"
"were"

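A minimal Python sketch of how an "equivalency" sentence might be split into a subject and its description follows. Splitting at the first listed word is an assumption made for illustration, and `split_equivalency` is a hypothetical name; the program's real parser is not shown on this page.

```python
# The full contents of ISWORDS.DAT as listed above.
IS_WORDS = {"am", "are", "be", "became", "become", "get", "gets",
            "got", "is", "was", "were"}

def split_equivalency(sentence):
    """Return (subject, verb, description) at the first is-word, or None."""
    words = sentence.rstrip(".!?").split()
    for i, word in enumerate(words):
        # Require at least one word on each side of the verb.
        if word.lower() in IS_WORDS and 0 < i < len(words) - 1:
            return " ".join(words[:i]), word, " ".join(words[i + 1:])
    return None

print(split_equivalency("My brother is a carpenter."))
print(split_equivalency("I like music."))
```

The first call yields the subject "My brother" and the description "a carpenter", which is the kind of pair that could later be stored as an attribute of the subject. The second sentence contains no listed word, so it is not treated as an equivalency.
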
PRONOUNS.DAT

Self-explanatory: a list of personal and demonstrative pronouns.

"I"
"you"
"he"
"she"
"it"
"we"
"they"
"me"
"him"
"her"
"us"
"them"
"those"
"these"
"that"
"this"

REVERSE.DAT

A list of context reversal words that the program uses when echoing your sentence back to you. If you say "I am...", the program says "You are..."

"am","are"
"are","am"
"i","you"
"me","you"
"my","your"
"myself","yourself"
"ourselves","yourselves"
"us","you"
"was","were"
"we","you"
"you","I"
"your","my"
"yourself","myself"

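The reversal can be sketched in Python as a single pass over the words, using the table above. Looking each word up exactly once is an assumption of this sketch (it keeps "I" from becoming "you" and then "I" again), and `reverse_context` is a hypothetical name.

```python
# The pairs from REVERSE.DAT as listed above.
REVERSALS = {
    "am": "are", "are": "am", "i": "you", "me": "you", "my": "your",
    "myself": "yourself", "ourselves": "yourselves", "us": "you",
    "was": "were", "we": "you", "you": "I", "your": "my",
    "yourself": "myself",
}

def reverse_context(sentence):
    """Swap each listed word; every word is looked up once, so a word
    that has been swapped is never swapped back."""
    words = sentence.split()
    return " ".join(REVERSALS.get(w.lower(), w) for w in words)

print("You said " + reverse_context("I am tired of my job") + ".")
```

This prints "You said you are tired of your job." Words that are not in the table pass through unchanged.
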
STARTERS.DAT

Some generic stock conversation starters that the program can fall back on if the user isn't saying much. (Each of these appears to have a rule in RULES.DAT. Not sure why this text file for them still exists.)

So, what have you been doing since the last time we talked?
Do you have anything in particular you would like to talk about?
Tell me some more about yourself.
How have you been sleeping?
Do you dream a lot?
Is there a dream you would like to tell me about?
What are you aware of right now at this moment?
Tell me something about your history, if you like.
Tell me a bit about your family, if you would like to.
Tell me about some things you like to do.
What sorts of things do you find particularly satisfying?

YNQSTART.DAT

If the user's input begins with one of these words, it probably indicates that the user is asking a type of question that solicits a yes or no answer.

"do"
"would"
"can"
"did"
"will"
"should"
"does"
"could"

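A check like this needs only the first word of the input. The Python sketch below is one assumed way to write it; `looks_like_yes_no_question` is a hypothetical name.

```python
# The words from YNQSTART.DAT as listed above.
YNQ_STARTERS = {"do", "would", "can", "did", "will", "should", "does", "could"}

def looks_like_yes_no_question(text):
    """True if the first word of the input is one of the starter words."""
    words = text.strip().split()
    return bool(words) and words[0].lower() in YNQ_STARTERS

print(looks_like_yes_no_question("Do you dream a lot?"))
print(looks_like_yes_no_question("What are you aware of right now?"))
```

The result is only a hint ("probably indicates"): an input such as "Can openers are useful." starts with a listed word but is not a question.
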
STATES.DAT

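A snapshot of the program's environmental state variables at a given moment. The first line gives the number of states that follow. Each remaining line has the format [variable name],relation,[value], where the relation is a numeric code (3 means equals).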

11
[usrname],3,[Steve]
[usrdelay],3,[00004]
[usrscore],3,[00003]
[sprev],3,[that is enough .]
[s],3,[bye]
[slenchars],3,[00003]
[slenwords],3,[00002]
[sentype],3,[00001]
[factcount],3,[00031]
[pgmprev],3,[You said that is enough .   Please continue.]
[pgmlast],3,[Bye.]

These 12 lines translate as:

  1. There are 11 states recorded.
  2. The user's name is Steve.
  3. The user took 4 seconds to make the last reply.
  4. The user scored the program's last output as 3 (Average).
  5. The user's previous input was "That is enough."
  6. The user's current input is "bye".
  7. The user's current input is 3 characters long.
  8. The user's current input is 2 words long (including the program-added period not shown).
  9. The user's current input is of sentence type 1 (statement).
  10. There are 31 facts built from the user's input in this conversation.
  11. The previous thing the program said was, "You said that is enough. Please continue."
  12. The last thing the program said was, "Bye."
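
A file in this layout could be read with a short Python sketch like the one below. The regular expression and the `parse_states` name are assumptions for illustration, and the sample is a shortened copy of the snapshot above (so its count line is 3 rather than 11).

```python
import re

# A shortened sample in the same layout as STATES.DAT.
SAMPLE = """3
[usrname],3,[Steve]
[usrdelay],3,[00004]
[sprev],3,[that is enough .]
"""

# [name],relation,[value] with the value allowed to contain spaces.
LINE = re.compile(r"^\[(.*?)\],(\d+),\[(.*)\]$")

def parse_states(text):
    """Return {name: (relation, value)} for the states the count line announces."""
    lines = text.splitlines()
    count = int(lines[0])
    states = {}
    for line in lines[1:1 + count]:
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized state line: {line!r}")
        name, relation, value = match.groups()
        states[name] = (int(relation), value)
    return states

states = parse_states(SAMPLE)
print(states["usrname"])
print(int(states["usrdelay"][1]))
```

Values are kept as strings because some states hold text and others hold zero-padded numbers; converting with `int` is left to the caller.
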

RULES.DAT

These are a couple of entries from the text file that stores the rules. Each line of data is followed by a line explaining it.

3203
The highest number ever assigned to a Rule.
176
The number of existing Rules.
1 712 125 1
Rule #1, its bid strength and history.
1
There is 1 condition:
[factcount],5,[00000]
If there are more than 0 facts accumulated...
1
There is 1 action constituting the response:
[askaboutpreviousstatement]
Run the subroutine with this name.

 

3 260 8 1
Rule #3, its bid strength and history.
1
1 condition:
[sentype],3,[00002]
If the user's input was type 2 (a question)...
1
Take this 1 action:
[If I were asking you that, what would you tell me?]
Output this literal text.
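
To show how conditions like these might be tested against the state variables, here is a small Python sketch. It covers only the two relation codes explained on this page (3 for equals, 5 for more than); comparing the zero-padded values as integers is an assumption, and the names `condition_holds` and `rule_fires` are hypothetical.

```python
def condition_holds(condition, states):
    """Evaluate one (name, relation, value) condition against the states."""
    name, relation, value = condition
    current = states[name]
    if relation == 3:                      # 3 means equals
        return current == value
    if relation == 5:                      # 5 means more than (assumed numeric)
        return int(current) > int(value)
    raise ValueError(f"relation code {relation} is not described here")

def rule_fires(conditions, states):
    """A rule is a candidate only if all of its conditions hold."""
    return all(condition_holds(c, states) for c in conditions)

# State values taken from the STATES.DAT snapshot above.
states = {"factcount": "00031", "sentype": "00001"}
rule1 = [("factcount", 5, "00000")]   # more than 0 facts accumulated
rule3 = [("sentype", 3, "00002")]     # input was type 2 (a question)

print(rule_fires(rule1, states))
print(rule_fires(rule3, states))
```

With the snapshot values, Rule #1's condition holds (31 facts) and Rule #3's does not (the input was a statement, not a question). How the bid strength chooses among several rules whose conditions hold is not covered by this sketch.
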

 

 

Copyright ©2007 Steven Whitney. Last modified 09/25/2007.