25 Years of Programming
An open source source for C, C++, OWL, BASIC, MDB, XLS, DOT, and more...

Essays on Complex Systems Part 2 - Patterns and Pattern Recognition

Learning as Pattern Recognition and Pattern Recognition as Data Filtering

Pattern recognition is the most fundamental mental process, and therefore probably the most basic instinctive drive, because every organism must abstract pattern data from its environment in order to function at all.

Pattern recognition is the ability to filter a noisy signal.

Raw sensory data has no cohesion or utility. It's just a collection of points. A raw signal is filtered when it has produced a coherent signal that does have utility. The resulting signal doesn't need to have any resemblance to the original signal; it only needs to have resulted from its presence.

What part of the signal constitutes noise and what part is useful depends on the circumstances at the time, and so does the method used for filtering. (Are reflexes the most primitive, and highest priority, filtering method? That would imply an "interrupt-driven" type of system. Also, several filters may be operating at once on the same signal, in parallel, each "tuned" to a different aspect of the signal -- see diagram in "Neural Nets" paper file. All of them generate interrupts as their outputs, and the highest priority one receives the attention: a "classifier" bidding system.)
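A minimal sketch of the interrupt-style arrangement described above, in Python. The `Filter` class, the three example detectors, and their priority numbers are illustrative assumptions rather than part of any existing program. The only point is that every filter sees the same signal, and the highest-priority filter that fires receives the attention.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Filter:
    name: str
    priority: int  # assumption: a larger number means a stronger bid
    detect: Callable[[Sequence[float]], bool]


def highest_priority_interrupt(signal: Sequence[float],
                               filters: List[Filter]) -> Optional[str]:
    """Run every filter on the same signal and return the winning filter's name."""
    fired = [f for f in filters if f.detect(signal)]  # each firing is an "interrupt"
    if not fired:
        return None
    return max(fired, key=lambda f: f.priority).name


# Hypothetical filters, each tuned to a different aspect of the signal.
FILTERS = [
    Filter("reflex: sudden spike", 10, lambda s: max(s) - min(s) > 5.0),
    Filter("rising trend", 3, lambda s: all(b > a for a, b in zip(s, s[1:]))),
    Filter("flat", 1, lambda s: max(s) - min(s) < 0.1),
]

print(highest_priority_interrupt([0, 1, 9], FILTERS))
```

In the last line both the spike filter and the trend filter fire, and the reflex-like one wins because it bids higher.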

(Unrelated to this particular discussion, also consider conditioned responses. Their existence may predispose to a mindset quick to recognize or consider causation.)

The instinct to recognize patterns and infer causation probably accounts for the universal existence of superstitions. It may be that you don't learn about causation; it is an instinct. The lack of causation may be the thing one must learn over time.

An intelligent system should also be filtering continuously. At any given moment, its outputs are those consistent with its current best guess as to what the presenting situation is. As the situation becomes clearer, the responses become more refined. E.g., you see someone far away who you think might be someone you know. Your responses change as you get nearer and it becomes clearer whether that's the person or not.

All the filtering and processing boils down to one thing: the one resultant action out of all possible actions. That is, the system as a whole acts as a giant filter, turning the raw data into a single action. The giant filter consists of many smaller filters.

The process of filtering seems related to that of setting up a figure/ground relationship, so related principles and ideas might be useful.

The stimulus strings in a classifier system are filters, and demonstrate the essence of what filtering has to do. Each rule states "If such and such is the situation, then this would be a good thing to do." It functions as a filter because it only becomes activated when the input data contains a certain pattern.
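As a rough illustration of a stimulus string acting as a filter, here is a small Python sketch. It assumes the common classifier-system convention of 0/1 strings with "#" as a "don't care" position; the text above does not specify a syntax. The rules and their action labels are made up for the example.

```python
def matches(stimulus: str, message: str) -> bool:
    """True if the message contains the pattern the stimulus string looks for.

    Assumption: '#' in the stimulus matches either bit in that position.
    """
    return len(stimulus) == len(message) and all(
        s in ("#", m) for s, m in zip(stimulus, message)
    )


# Hypothetical rules: (stimulus string, action to propose when activated).
RULES = [
    ("00000001", "respond to upward move"),
    ("0000000#", "note that a price-move status was posted"),
    ("1#######", "handle some other module's message"),
]


def activated(message: str) -> list:
    """Return the actions of every rule whose stimulus string matches."""
    return [action for stimulus, action in RULES if matches(stimulus, message)]


print(activated("00000001"))
```

A rule stays silent for every message that lacks its pattern, which is what makes it a filter.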

How can you "notice" patterns that you aren't already trained to look for specifically?

 Since pattern recognition is filtering, this implies the ability to generate new filters.

 Thus, the process of generating such filters might be considered in light of the "elemental unit" model of behavior. That is, to have the best ability to generate flexible or novel filters, you must have a library of very basic filtering techniques that can be combined to build more sophisticated filters, and thereby identify less obvious, more deeply hidden patterns.
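One way to picture a library of basic filtering techniques being combined into more sophisticated filters is plain function composition. The sketch below is in Python, and all the elemental steps and the three composed filters are hypothetical names chosen for illustration.

```python
from functools import reduce


def compose(*steps):
    """Chain elemental steps left to right into one filter."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


# Elemental units (assumed for this sketch).
def differences(xs):
    return [b - a for a, b in zip(xs, xs[1:])]


def signs(xs):
    return [(x > 0) - (x < 0) for x in xs]  # -1, 0 or 1 per element


def all_equal(xs):
    return len(set(xs)) <= 1


def flips_every_step(xs):
    return len(xs) >= 2 and all(a != 0 and a == -b for a, b in zip(xs, xs[1:]))


def all_positive(xs):
    return all(x > 0 for x in xs)


# More sophisticated filters built only from the elemental units.
is_monotonic = compose(differences, signs, all_equal)       # one direction, or flat
is_zigzag = compose(differences, signs, flips_every_step)   # up, down, up, down...
is_accelerating = compose(differences, differences, signs, all_positive)

print(is_zigzag([1, 3, 2, 4, 3]), is_accelerating([1, 2, 4, 8]))
```

Because the composed filters are just sequences of steps, a program could also generate new sequences and test them, rather than relying on a fixed set.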

(This will be key, and most complicated, in a stock market analysis system, since you don't know ahead of time which variables might be relevant. Instead of programming in a bunch of ratios you know about, you must have a general-purpose ratio-and-formula generating routine that can manipulate the data in any possible novel way and then test the results for relevance. And for technical analysis, do you use 50-day moving averages, or 65, or 120, or what? You must be able to manipulate the available data in any possible manner. Random note: A possible methodology for a stock market program: implement it as a Classifier system. Incoming data is preprocessed by independent modules that perform their calculations and then post bulletin board messages about their results. E.g. was the last price move up or down? If up, post "00000001". If down, post "00000000". It's a status message. The main classifier set may or may not have a rule that pays any attention to it. Ideally, you would have a system for generating the novel calculations, creating the modules to perform them, and for assigning unique messages for them to post as their status reports.)
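The bulletin-board idea in the note above might be sketched like this in Python. The "00000001" / "00000000" messages for the last price move come from the text. Everything else is an assumption made for the example: the scheme of using the upper seven bits as a module number, the moving-average modules, and all function names.

```python
def moving_average(prices, window):
    """Mean of the last `window` prices; the window length is arbitrary."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return sum(prices[-window:]) / window


def make_message(module_id, status):
    """Assumed scheme: upper 7 bits identify the module, lowest bit is its status."""
    if not 0 <= module_id < 128:
        raise ValueError("only 128 module ids fit in an 8-bit message")
    return format((module_id << 1) | int(status), "08b")


def last_move_up(prices):
    return prices[-1] > prices[-2]


def above_average_module(window):
    """Generate a module that reports whether the last price is above its moving average."""
    def module(prices):
        return prices[-1] > moving_average(prices, window)
    return module


def build_modules(windows):
    # Module 0 is the last-move module, so it posts 00000001 (up) or 00000000 (not up).
    return [last_move_up] + [above_average_module(w) for w in windows]


def post_all(prices, modules):
    """Each module runs independently and posts one status message."""
    return [make_message(i, module(prices)) for i, module in enumerate(modules)]


modules = build_modules([2, 4])  # could just as well be 50, 65, 120, ...
print(post_all([10, 11, 12, 11, 13], modules))
```

The window lengths are ordinary data here, so a generating routine could propose new ones and the classifier rules would decide over time which messages deserve attention.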

When building a library of basic filters, first try to develop a comprehensive set of pattern types (spatial, sequential, repetitive, analogous; there may be a number of ways to class them). Creating the filters then becomes a matter of creating routines to trap these patterns.

Many patterns, of all types, are actually fairly simple, and are perceived once the data has been highly derived (abstracted). E.g. "alternating": patterns of alternating "anythings" are readily perceived in any data set in any medium. Often, to look for a pattern, you don't need to describe a thing in much detail at all. All you must do is identify it uniquely enough that you recognize another occurrence of it. In the simplest case, you only need to identify it as a "thing", noting the simple fact that it is something that is there.

A pattern exists when you can extract a subset of the full dataset, and apply it multiple times to recreate the full set. This is an operational definition that you can use to search for patterns, and you can first filter, transform, chunk, condense, or summarize the data any way you want.
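The operational definition above translates fairly directly into a search. The Python sketch below looks for the smallest leading subset that, repeated a whole number of times, recreates the full data set. The function names and the "U"/"D" summarizing step are assumptions for illustration.

```python
def smallest_repeating_unit(data):
    """Return the shortest prefix that tiles `data` exactly, or None if there is none.

    Works on strings and lists. The unit must repeat at least twice.
    """
    n = len(data)
    for size in range(1, n // 2 + 1):
        if n % size == 0 and data[:size] * (n // size) == data:
            return data[:size]
    return None


def ups_and_downs(values):
    """One possible summarizing transform: keep only the direction of each step."""
    return "".join("U" if b > a else "D" for a, b in zip(values, values[1:]))


raw = [1, 5, 2, 9, 3, 7, 4]
print(smallest_repeating_unit(raw))                 # no unit in the raw values
print(smallest_repeating_unit(ups_and_downs(raw)))  # a unit appears after summarizing
```

The raw numbers never repeat, but once they are condensed to directions the same search finds a unit, which is the reason for allowing any filtering or transformation first.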

What is a pattern type? Characteristics that can be used to describe any distribution of any collection of things.

Look for patterns in what media? (Any data stream.)

  1. Visual field (bitmap data)
  2. Sound (digital audio data)
  3. Language (text) (letter, phoneme, word, phrase, subject matter)
  4. Physical occurrences, events (correlations, inference of causation, etc.)
  5. Similarities of events (patterns in what happens)
  6. Temporal (patterns of timing)
  7. Any 1-dimensional numeric stream
  8. Any 2-dimensional field or 3-dimensional space
  9. Any set or collection of traits, features, or other data points (not necessarily numeric)

Basic Pattern Types (Gödel, Escher, Bach, p. 344, has interesting related discussion):

(These should be as general as possible, applicable to any data stream.

Make a spreadsheet with the above media types on one axis and the below pattern types on the other, and fill the cells with examples of their occurrences or descriptions of what forms the patterns.

Also for the pattern types develop pattern templates that an AI system might use to recognize each pattern. E.g. "Alternating" = "01010101".

The goal is to develop a routine for each type of data stream that can summarize it with this same kind of basic representation, so it can be compared with these generic templates.

This seems to me to be one of the most basic and most important types of processing that our senses are always doing, and should be a high priority.

All incoming data streams are initially reduced to these basic representations and scanned for recognized patterns. If a pattern is discerned that is currently of high utility, we pay more attention to that data stream.

Minor note: Our attention may also be attracted if the data stream is particularly novel, that is, if our pattern matching failed particularly badly on it.

Also, part of the learning process might be described as a process of building more complex pattern matching systems. That is, you start with these most basic pattern types, then for a particular subject, or in a particular context, or whatever, you become sensitive to increasingly more complex variations of the basic types, so that the pattern matching apparatus in that particular area becomes more intricate and developed.

As an example, for a foreign language (or even first language), at first you may do well to recognize the sounds of individual letters, so your pattern matching is very crude.

After a while, you become sensitive to whole words -- that is, you recognize and process them without having to break them down into their component sounds.

As an analogy, instead of "letter-watching" robots monitoring the data stream and each raising a flag when the letter it's watching for is encountered, there are now a larger number of more highly specialized "word-watching" robots that do the same thing, and which make the letter-watching robots obsolete.

Later, you become sensitive to entire phrases. This process is the same as that of "chunking".)

  • Alternating
  • Groupings, Clusters
  • dense/sparse
  • Increasing/Decreasing
  • linear/exponential/etc.
  • Discontinuities
    • between 2 uniform areas (edge) (in vision could be horizontal, vertical, or diagonal either direction)
    • Central "on" area surrounded by "off"
    • Central "off" area surrounded by "on"
  • Uniform
  • High or Low contrast
  • Smooth or Rough
  • Regular/Irregular
  • Widely/Narrowly spaced
  • Brownian, 1/f, 1/f squared, fractal, etc.
  • Simple vs. Compound
  • Proportioned (e.g. 1:1, 2:1, etc.)
  • Ordinary vs. Unusual
  • Symmetrical vs. unsymmetrical (or symmetry types, radial, bilateral, horizontal, vertical)
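A hedged sketch of the template idea for one kind of data stream (a 1-dimensional numeric stream), in Python. The "01010101" template for "Alternating" is from the notes above. The other three templates, the chunk-and-threshold summarizing routine, and the function names are assumptions made for this example.

```python
# Only "alternating" comes from the text; the rest are assumed encodings.
TEMPLATES = {
    "alternating": "01010101",
    "edge": "00001111",        # discontinuity between 2 uniform areas
    "on-center": "00111100",   # central "on" area surrounded by "off"
    "off-center": "11000011",  # central "off" area surrounded by "on"
}


def summarize(stream, width=8):
    """Reduce a numeric stream to `width` bits: 1 where a chunk's mean exceeds the overall mean."""
    n = len(stream)
    if n < width:
        raise ValueError("stream is shorter than the template width")
    overall = sum(stream) / n
    bits = []
    for i in range(width):
        chunk = stream[i * n // width:(i + 1) * n // width]
        bits.append("1" if sum(chunk) / len(chunk) > overall else "0")
    return "".join(bits)


def similarity(bits, template):
    """Fraction of positions where the summary agrees with the template."""
    return sum(a == b for a, b in zip(bits, template)) / len(template)


def best_match(stream):
    bits = summarize(stream, 8)
    scores = [(name, similarity(bits, t)) for name, t in TEMPLATES.items()]
    return max(scores, key=lambda pair: pair[1])


print(best_match([0] * 10 + [5] * 10))
```

Reducing everything to a fixed width loses detail: an alternation much faster than the chunk size averages out and is missed, so a fuller version would probably summarize at several scales.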

Instead of trying to have one pattern-recognizing system recognize all possible patterns, you should have specialized subsystems, each trained on only one pattern. That way, the same data run through all the networks might generate multiple match messages.

[From a PBS show: Categorizing, organizing, and manipulating data to facilitate the identification of patterns in it is called "data mining", popularized by IBM in a series of ads. By now, probably a fully formed branch of mathematics. Should be particularly useful in that it automates some stages of scientific/statistical inquiry. Raw data is everywhere. The scientist's problem is identifying "what's there that's worth looking at; what clues are there to avenues that might lead to useful analyses?" Related to my idea for a statistical calculator to evolve relevant equations (see stats.doc).]

Create various pattern-recognizing neurons as described in GEB. For "on/off-center" neurons, each will have to monitor an area of 9 pixels. Then their outputs will have to be sent to "complex" cells (or small networks). Study carefully the patterns on p.345, and what the activity levels would be of on/off-center neurons focused on them. E.g. the half-field split light/dark would produce large areas of no activity (no contrast to report), but along the split there would be great activity, and the distribution of that activity would be picked up by the next stage of processing (?).
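To check the expectation about the half-field split, here is a small Python sketch of an on-center unit watching 9 pixels. The response rule (center value minus the mean of the 8 surrounding pixels) is an assumption chosen for simplicity, not a formula from GEB. An off-center unit would be the negative of it.

```python
def on_center_response(image, row, col):
    """Center pixel minus the mean of its 8 neighbours (assumed response rule)."""
    center = image[row][col]
    surround = [
        image[r][c]
        for r in (row - 1, row, row + 1)
        for c in (col - 1, col, col + 1)
        if (r, c) != (row, col)
    ]
    return center - sum(surround) / 8


def activity_map(image):
    """Responses for every pixel that has a full 3x3 neighbourhood (fields overlap)."""
    rows, cols = len(image), len(image[0])
    return [
        [on_center_response(image, r, c) for c in range(1, cols - 1)]
        for r in range(1, rows - 1)
    ]


# Half-field split: left half dark (0), right half light (1).
split = [[0, 0, 0, 1, 1, 1] for _ in range(4)]
for line in activity_map(split):
    print(line)
```

With this rule the uniform regions produce zero activity and only the units straddling the boundary respond, with opposite signs on the two sides. That distribution is what a next-stage "complex" cell would receive.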

Some initial layers will have to be carefully wired: At the first level there must be 1 neuron per pixel. Otherwise there's no way to get the pixel data in. But beyond that level, you can create specialized "complex" neurons by how you wire their inputs. On/off center neurons would have 9 inputs (the center and surrounding areas). Other neurons might receive input from all the pixels (whole screen), or half (half screen). Some of these neurons, additionally, may not be the standard "Node" structs (in Neural.cpp), but will have to be highly specialized to do their computations. Remember that the fields covered by neurons can overlap. You don't have to exactly tile the field with them.

Start out with 1-dimensional linear patterns: lines, dashed or dotted lines, etc.

  • Allow reducing a DIB (device-independent bitmap) of arbitrary size to one with fixed dimensions for comparison to the templates. An interesting way to resolve problems of scale might be to both reduce and enlarge one of the images. If either transformation results in a better statistical match (as below), then keep going in that direction. The same method works for focusing on some region of the test image: move around, going towards any region that produces an increasingly good match. (A better idea is using scalable metafile objects, or scalable math equations describing object outlines; idea is on paper somewhere.)
  • Develop a statistical method to measure how similar two bitmaps are to each other. Assume black and white images, but leave it open for colors (approximate color matching). The goal is to be able to take a bitmap and use the correlation to decide "how well does this picture match, by various measures, this other picture of an idealized tree?" Parameters to consider: angularity, average color of the entire field (one mean color), number of colors used (in the palette), color frequency distribution (spectral analysis), linearity, centralization, and many of the various pattern types enumerated above.
  • CorelDraw has a number of pattern and texture bitmaps on CD, and many more in the Fill tool, which might be appropriated for use as template patterns, as could many of the clipart drawings.
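The first two items above could be prototyped along these lines in Python. The bitmap is assumed to be a list of rows of numbers (0 = black, 1 = white). Block averaging is used for the size reduction and the Pearson correlation coefficient for the similarity measure. Both are choices made for this sketch, and only reduction (not enlargement) is covered.

```python
from math import sqrt


def reduce_bitmap(bitmap, out_rows, out_cols):
    """Shrink a bitmap to fixed dimensions by averaging rectangular blocks."""
    rows, cols = len(bitmap), len(bitmap[0])
    if out_rows > rows or out_cols > cols:
        raise ValueError("this sketch only reduces, it does not enlarge")
    reduced = []
    for i in range(out_rows):
        r0, r1 = i * rows // out_rows, (i + 1) * rows // out_rows
        line = []
        for j in range(out_cols):
            c0, c1 = j * cols // out_cols, (j + 1) * cols // out_cols
            cells = [bitmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            line.append(sum(cells) / len(cells))
        reduced.append(line)
    return reduced


def correlation(a, b):
    """Pearson correlation of two same-sized bitmaps, in the range -1 to 1."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("bitmaps must have the same dimensions")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a uniform image has no variation to correlate
    return cov / sqrt(vx * vy)


template = [[0, 1], [0, 1]]                  # idealized "edge" template
picture = [[0, 0, 1, 1] for _ in range(4)]   # larger test image
print(correlation(reduce_bitmap(picture, 2, 2), template))
```

A single correlation figure covers only one of the measures listed. The other parameters (angularity, color distribution, and so on) would each need their own routine.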

An important problem for filters is that often the raw data isn't sufficient for particular purposes. You may need to preprocess it first, so that the filter operates on data derived from the raw data. For that you need operators, and for operators to work, they must know what kind of data is there (that is, what kind of transformations make sense to perform). The ideal situation would be to have your pattern types stored in the most generic possible form. Then incoming data would be transformed in whatever appropriate way is required to make it able to match the patterns you are on the lookout for. (It is reasonable to expect that you must have transformation methods for every data type your program handles, just as people have unique data-handling methods -- entire brain areas -- for managing each type of sensory input, visual, auditory, etc.)

In a classifier system, the stimulus strings are the filters, and are genetically mixed, so you do have the ability to create new ones.

  • Related to the above is the problem of allowing variables to exist or be created that you don't envision from the start, and how to save and access them without having explicitly programmed them in. One possibility is to set aside an area of fixed size in memory and place your struct into it, so that you can access the "members", but then allow the data within the coincident area of memory to be genetically scrambled. That is, allow access to this memory by two methods: ordered (the struct) and random (as simply an array of bytes that can be mixed up). You could also allow random access to the data block: go look at whatever is at location X, and interpret it as an integer, or a double, or whatever, regardless of how it was stored there, and even regardless of whether the starting byte you choose is the real start boundary of the data you stored there. All the members would have to be numeric (no strings, char arrays, or pointers!). This idea might have a lot of problems, and is recorded here only so it isn't forgotten. It is a possible way to create the kind of internal "program within the program" that Adapt.cpp has, but without having to write an entire new functional subunit language, as Adapt.cpp does. But with so much flexibility and so little structure, it may be unrealistic to expect anything useful to evolve within it in any reasonable time. A somewhat more structured alternative might be to use arrays of unions as the variables. Each union would have a flag element to indicate which type is currently stored there, and thus how it can be used. This would also allow storing pointers (such as char*) there. (This is heading towards something like what Microsoft's Variant type probably is.)
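The two access methods (ordered and random) can be tried out without writing any C, using Python's `struct` module on a `bytearray`. The three-member layout, the member names, and the single-bit-flip mutation are assumptions for this sketch. It does not reproduce anything in Adapt.cpp.

```python
import random
import struct

# Hypothetical struct: int32, float64, int32, little-endian with no padding.
LAYOUT = "<idi"
BLOCK_SIZE = struct.calcsize(LAYOUT)  # 16 bytes


def make_block(count, weight, flag):
    """Place the 'struct' into a fixed-size area of memory."""
    block = bytearray(BLOCK_SIZE)
    struct.pack_into(LAYOUT, block, 0, count, weight, flag)
    return block


def read_members(block):
    """Ordered access: interpret the block as the struct."""
    return struct.unpack_from(LAYOUT, block, 0)


def read_int_at(block, offset):
    """Random access: read 4 bytes at any offset as an integer, aligned or not."""
    return struct.unpack_from("<i", block, offset)[0]


def read_double_at(block, offset):
    """Random access: read 8 bytes at any offset as a double."""
    return struct.unpack_from("<d", block, offset)[0]


def mutate(block, rng):
    """'Genetic scrambling' reduced to its simplest form: flip one random bit."""
    index = rng.randrange(len(block))
    block[index] ^= 1 << rng.randrange(8)


block = make_block(7, 2.5, -1)
print(read_members(block))
print(read_int_at(block, 2))  # starts in the middle of the first member
mutate(block, random.Random(0))
print(read_members(block))    # still three numbers, whatever they now mean
```

Because every member is numeric, any scrambled block still decodes to something (possibly a NaN or an enormous value). That is the flexibility described above, and also the reason useful structure may be slow to evolve.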

Applicability of Perl regular expressions

1-21-2010

A method of abstracting data that would allow using Perl regular expressions to identify patterns in it. 

  1. Use edge detection (more generally, boundary detection) to bundle subsets of the raw data into the "objects" (not necessarily physical, but logical) that will be of interest. Often, the boundaries are already known at least at one level of description, if not at others. For example, in a bitmap, each pixel is an already-isolated object of interest, while the larger-scale objects/features in the image (the things that it is a picture of) are not and must be edge-detected.
  2. Analyze each object to construct a list of its attributes, the things it "is, has, or does". Using the conventions of my WTalk project, these would be the supersets of which it is a member, its adjectival attributes, its capabilities, any other measurements of interest.
  3. Create a coding scheme amenable to analysis with regular expressions, by mapping each attribute to a 1-character (ideally) code such as the letter "a".
  4. The attributes of neighboring objects can now be encoded into strings of standardized context-independent characters. A line of alternating red and blue pixels would encode as "abababab". The same encoding describes any two alternating musical notes, people standing in line who are short/tall/short/tall, or male/female/male/female.   
  5. Use regular expressions to test pattern-matching assertions about the object collection. The regex for alternating would be "(ab){2,}". Matching can be done even in the presence of noise: "(.*?a.*?b){2,}" detects alternating a's and b's even if there are intervening irrelevant characters. Note that "." also matches a and b themselves, so this looser form accepts a string like "aabaab" as well; "([^ab]*a[^ab]*b){2,}" is stricter and allows only other characters in between. Analysis (such as counting) of the intervening characters gives an indication of how purely or strongly the pattern matches the "alternating" prototype, which is something humans detect instinctively.
  6. The Perl regexes themselves can potentially be mutated and evolved with genetic algorithms, according to which ones prove useful.
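Steps 3 to 5 can be tried out with any regex engine. The sketch below uses Python's `re` module, whose syntax agrees with Perl's for these expressions. Three things are assumptions of the example: assigning letters in first-seen order, the purity measure, and the use of a stricter noisy-alternation regex that keeps stray a's and b's out of the gaps.

```python
import re
import string


def encode(attributes):
    """Map each distinct attribute to a one-character code, in first-seen order."""
    codes = {}
    out = []
    for attr in attributes:
        if attr not in codes:
            # Assumption: at most 26 distinct attributes (a-z).
            codes[attr] = string.ascii_lowercase[len(codes)]
        out.append(codes[attr])
    return "".join(out)


STRICT_ALTERNATING = re.compile(r"(ab){2,}")
NOISY_ALTERNATING = re.compile(r"(?:[^ab]*a[^ab]*b){2,}")


def alternation_purity(encoded):
    """Share of the matched span that is a's and b's; None if no alternation is found."""
    match = NOISY_ALTERNATING.search(encoded)
    if match is None:
        return None
    span = match.group(0)
    return sum(ch in "ab" for ch in span) / len(span)


pixels = encode(["red", "blue"] * 4)
queue = encode(["short", "tall", "short", "tall"])
print(pixels, queue)                       # same kind of string for both media
print(bool(STRICT_ALTERNATING.fullmatch(pixels)))
print(alternation_purity("axbxaxb"))       # alternation with noise in between
```

A purity of 1.0 means a clean "abab..." run, and lower values mean more intervening characters. This is one simple way to put a number on how strongly the data matches the "alternating" prototype.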

----------

  • A learning system is more powerful if it can:
  1. Address (track) a larger number of the variables relevant to its survival, i.e. it has an extensive sensory system, or it has a larger data set, i.e. it is larger.
  2. Extract the relevant patterns from the data stream most efficiently.
  3. "Appreciate" finer distinctions in the data stream. This requires a good sensory system, requires flexible filtering, and probably requires a large memory.

 
