25 Years of Programming
An open source source for C, C++, OWL, BASIC, MDB, XLS, DOT, and more...
Natural language parsing (NLP) chatbot program - data files

These are the few data files that haven't yet been transferred to the WTalk.mdb database, probably because they are auto-loaded at program startup into WordLists, a class that hasn't been modified to obtain its data from the database instead of from the text files.

A couple of these files show the "80/20 rule" that I sometimes applied while developing this program. This is the old adage that in any job the first 20% of the effort (or time, or money...) gets 80% of the job done. Some grammatical constructions are difficult to deal with using formal rules. Instead of inventing overly contrived grammatical rules for them, why not just handle them on a case-by-case basis? Intuitively, the answer is "because there are way too many of them to deal with individually". But it turns out that a relatively small number of special cases (the 20% of the effort) can handle the majority of these constructions that arise in normal conversation (the 80% of the job).

EXPAND.DAT

This is a list of common contractions. The program expands them when they are encountered. Only one expansion is allowed for each contraction, so when a contraction is ambiguous (for example, "he's" can mean "he is" or "he has") the program can make a mistake.

"aren't","are not"
"can't","can not"
"couldn't","could not"
"didn't","did not"
"doesn't","does not"
"don't","do not"
"hadn't","had not"
"haven't","have not"
"he'd","he would"
"he'll","he will"
"he's","he is"
"here's","here is"
"how's","how is"
"i'd","I would"
"i'll","I will"
"i'm","I am"
"i've","I have"
"isn't","is not"
"it'd","it would"
"it'll","it will"
"it's","it is"
"let's","let us"
"o'clock","o'clock"
"she'd","she would"
"she'll","she will"
"she's","she is"
"shouldn't","should not"
"that's","that is"
"there's","there is"
"they'd","they would"
"they'll","they will"
"they're","they are"
"they've","they have"
"wasn't","was not"
"we'd","we would"
"we'll","we will"
"we're","we are"
"weren't","were not"
"what's","what is"
"when's","when is"
"where's","where is"
"who's","who is"
"why's","why is"
"won't","will not"
"wouldn't","would not"
"you'd","you would"
"you'll","you will"
"you're","you are"
"you've","you have"

INFINIVB.DAT

This is a list of verbs that are frequently followed by the infinitive form of another verb, such as: wanted to write. This is one of those situations where you'd think there would be a zillion of them, but this relatively short list accounts for the majority of occurrences of the construction in normal conversation, and any others can easily be added. Identifying this construction is helpful because it eliminates the danger of misinterpreting the "to" as a preposition, especially when the infinitive that follows it could also be read as a noun.

"wanted" "want" "wanting" "wants"
"try" "tries" "tried" "trying"
"have" "has" "had" "having"
"fight" "fights" "fought" "fighting"
"attempt" "attempts" "attempted" "attempting"
"wish" "wishes" "wished" "wishing"
"begin" "begins" "began" "begun" "beginning"
"start" "starts" "started" "starting"
"like" "likes" "liked"

ISWORDS.DAT

This is a list of various past, present, and future forms of "to be" and some of its synonyms.
The program uses these to help identify sentences of the "equivalency" type, in which the clause that follows the verb describes the subject of the sentence. In the future this will be useful for building the list of attributes that is created for that subject in the database. As with the INFINIVB.DAT verbs above, there are still some to add to this list.

"am" "are" "be" "became" "become" "get" "gets" "got" "is" "was" "were"

PRONOUNS.DAT

Self-explanatory: a list of pronouns.

"I" "you" "he" "she" "it" "we" "they" "me" "him" "her" "us" "them" "those" "these" "that" "this"

REVERSE.DAT

A list of context reversal words that the program uses when echoing your sentence back to you. If you say "I am...", the program says "You are...".

"am","are"
"are","am"
"i","you"
"me","you"
"my","your"
"myself","yourself"
"ourselves","yourselves"
"us","you"
"was","were"
"we","you"
"you","I"
"your","my"
"yourself","myself"

STARTERS.DAT

Some generic stock conversation starters that the program can fall back on if the user isn't saying much. (Each of these appears to have a rule in RULES.DAT. Not sure why this text file for them still exists.)

So, what have you been doing since the last time we talked?
Do you have anything in particular you would like to talk about?
Tell me some more about yourself.
How have you been sleeping?
Do you dream a lot?
Is there a dream you would like to tell me about?
What are you aware of right now at this moment?
Tell me something about your history, if you like.
Tell me a bit about your family, if you would like to.
Tell me about some things you like to do.
What sorts of things do you find particularly satisfying?

YNQSTART.DAT

If the user's input begins with one of these words, the user is probably asking a question that solicits a yes or no answer.

"do" "would" "can" "did" "will" "should" "does" "could"

STATES.DAT

A snapshot of the program's environmental state variables at a given moment.
The format is [variable name], relation (3 means equals), [value]. The leading 11 is presumably the number of entries that follow.

11
[usrname],3,[Steve]
[usrdelay],3,[00004]
[usrscore],3,[00003]
[sprev],3,[that is enough .]
[s],3,[bye]
[slenchars],3,[00003]
[slenwords],3,[00002]
[sentype],3,[00001]
[factcount],3,[00031]
[pgmprev],3,[You said that is enough . Please continue.]
[pgmlast],3,[Bye.]

These 12 lines translate as:
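A line-by-line reading of the snapshot can also be produced mechanically. The following Python sketch is not the program's own code; it is a minimal illustration that assumes the first line is a record count, that every other line has the form [name],relation,[value], and that 3 is the only relation code whose meaning is known. The names parse_states and describe are hypothetical.

```python
import re

# The snapshot exactly as listed in STATES.DAT.
SAMPLE_STATES_DAT = """11
[usrname],3,[Steve]
[usrdelay],3,[00004]
[usrscore],3,[00003]
[sprev],3,[that is enough .]
[s],3,[bye]
[slenchars],3,[00003]
[slenwords],3,[00002]
[sentype],3,[00001]
[factcount],3,[00031]
[pgmprev],3,[You said that is enough . Please continue.]
[pgmlast],3,[Bye.]
"""

# [name],relation,[value]; the value may itself contain commas or periods.
RECORD = re.compile(r"^\[([^\]]*)\],(\d+),\[(.*)\]$")

# Only relation code 3 (equals) is documented; other codes are unknown here.
RELATIONS = {3: "="}


def parse_states(text):
    """Return a list of (name, relation, value) tuples from a snapshot."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    declared = int(lines[0])  # assumption: the first line is the record count
    records = []
    for ln in lines[1:]:
        match = RECORD.match(ln)
        if match is None:
            raise ValueError("unrecognized record: " + ln)
        records.append((match.group(1), int(match.group(2)), match.group(3)))
    if len(records) != declared:
        raise ValueError("record count does not match the first line")
    return records


def describe(records):
    """Render each record as 'name = value' (or with a placeholder relation)."""
    out = []
    for name, relation, value in records:
        symbol = RELATIONS.get(relation, "?" + str(relation) + "?")
        out.append(name + " " + symbol + " " + value)
    return out


if __name__ == "__main__":
    for line in describe(parse_states(SAMPLE_STATES_DAT)):
        print(line)
```

Keeping the relation as an integer rather than hard-coding equality leaves room for whatever other relation codes the program may use.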
RULES.DAT

These are a couple of entries from the text file that stores the rules.
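Returning to the word lists: the contraction expansion (EXPAND.DAT) and context reversal (REVERSE.DAT) described earlier are both plain table lookups. The Python sketch below shows one way they could be chained. It is an illustration under assumptions, not the program's actual WordLists code: the function names are hypothetical, only a few EXPAND.DAT pairs are included, and the first entry wins if a key appears twice.

```python
import csv
import io
import re

# A handful of pairs copied from EXPAND.DAT (the real file is longer).
EXPAND_SAMPLE = '''"can't","can not"
"i'm","I am"
"i've","I have"
"it's","it is"
"you're","you are"
'''

# The REVERSE.DAT pairs as listed on this page.
REVERSE_SAMPLE = '''"am","are"
"are","am"
"i","you"
"me","you"
"my","your"
"myself","yourself"
"ourselves","yourselves"
"us","you"
"was","were"
"we","you"
"you","I"
"your","my"
"yourself","myself"
'''


def load_pairs(text):
    """Parse "key","value" lines into a dict keyed by the lowercased key."""
    pairs = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) == 2:
            # Assumption: only one value per key, so the first entry is kept.
            pairs.setdefault(row[0].lower(), row[1])
    return pairs


# A word is a run of letters, optionally with one embedded apostrophe.
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def substitute(sentence, table):
    """Swap every word found in table, in one left-to-right pass.

    A single pass matters for REVERSE.DAT: "you" -> "I" must not be
    swapped back to "you" by a later rule.
    """
    return WORD.sub(lambda m: table.get(m.group(0).lower(), m.group(0)), sentence)


def echo_back(sentence, expansions, reversals):
    """Expand contractions first, then reverse the point of view."""
    return substitute(substitute(sentence, expansions), reversals)


EXPANSIONS = load_pairs(EXPAND_SAMPLE)
REVERSALS = load_pairs(REVERSE_SAMPLE)

if __name__ == "__main__":
    print(echo_back("I'm sure you're tired of my job", EXPANSIONS, REVERSALS))
```

Expanding before reversing means the reversal table never needs its own entries for contractions such as "i'm". The sketch makes no attempt to restore capitalization at the start of a sentence.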
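The YNQSTART.DAT and INFINIVB.DAT lists described earlier support two simple checks: whether a sentence opens like a yes/no question, and whether a "to" is an infinitive marker rather than a preposition. A rough Python sketch of both follows. The function names and the tokenizer are assumptions made for illustration; the word lists are copied from this page.

```python
import re

# Words from YNQSTART.DAT.
YNQ_START = {"do", "would", "can", "did", "will", "should", "does", "could"}

# Words from INFINIVB.DAT.
INFINITIVE_VERBS = set("""
wanted want wanting wants try tries tried trying have has had having
fight fights fought fighting attempt attempts attempted attempting
wish wishes wished wishing begin begins began begun beginning
start starts started starting like likes liked
""".split())


def tokenize(sentence):
    """Lowercase the sentence and split it into words (hypothetical helper)."""
    return re.findall(r"[a-z']+", sentence.lower())


def is_yes_no_question(sentence):
    """True if the first word is one of the YNQSTART.DAT words."""
    words = tokenize(sentence)
    return bool(words) and words[0] in YNQ_START


def infinitive_markers(sentence):
    """Return the indexes of each "to" that directly follows a listed verb.

    Such a "to" is taken as introducing an infinitive, not as a preposition.
    """
    words = tokenize(sentence)
    return [
        i
        for i in range(1, len(words))
        if words[i] == "to" and words[i - 1] in INFINITIVE_VERBS
    ]


if __name__ == "__main__":
    print(is_yes_no_question("Do you dream a lot?"))
    print(infinitive_markers("I wanted to write a letter"))
    print(infinitive_markers("I walked to work"))
```

Both checks are deliberately shallow, in keeping with the 80/20 approach: a verb missing from the list simply goes undetected until it is added.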