25 Years of Programming
An open source source for C, C++, OWL, BASIC, MDB, XLS, DOT, and more...
Essays on Complex Systems
Part 7 - Notes and plans from the artificial neural network C++ programs

While building WNeural.cpp version 2.0, I found that the following site had helpful plain-language descriptions of functions and procedures: http://www.dontveter.com/bpr/public2.html

Technical Notes and Comments

Network architecture

Recurrent network (only)

I have the impression that a plain feedforward network, which WNeural.cpp now is, is nothing but a fancy, albeit nonlinear, deterministic calculator. Although such an architecture lets the network be analyzed and optimized with an amazing array of advanced mathematical methods, and lets it be put to use in a wide variety of practical applications, it takes it in the wrong direction if what you want is to model the machinery of a real brain. A real brain is a recurrent network: its output at any given moment is a sample taken in the midst of its ongoing calculations, and its operation is not deterministic. It does not reliably produce the same result when given the same inputs. So although deterministic ANN models have allowed mathematicians to tame the beast and put it to use, they make the system incapable of spontaneity and creativity, much of which is based on the ability to make mistakes and be inconsistent. Much human innovation is caused by mistakes (mutations!) in performing a procedure. A brain-modeling ANN should be unreliable and inconsistent. Some of the optimizing considerations of backprop networks (e.g. numerical conditioning) relate to obtaining the "correct" answer. They might have little applicability in a recurrent brain-modeling network. There should be no local minima, since there is no single correct answer and no error calculation or resolution.

Synaptogenesis

Regarding how to create new synapses: synapse formation is the result of bidirectional signaling between two neurons.
You should have some measure that causes two nodes to want to connect to each other. You would think that a cell that was being overstimulated (overworked) would want new inhibitory connections (perhaps because overuse of its firing machinery depletes resources, and the lack of them causes inhibitory-synapse-generating chemicals to be produced?), and that an understimulated one would want new excitatory ones (perhaps a buildup of unused calcium, or whatever, causes excitatory receptors to be created?). [Idea: have a variable called "calcium" that is depleted when a node fires and has an optimum level between X and Y. Maybe simulate the refractory period by disallowing firing if the level is too low? If the level is chronically too low, create a new synapse to a nearby node that has the opposite problem? (Remember, this is for a recurrent network, not backprop.)] It's realistic: bored neurons do create new excitatory synapses. It would also be desirable to have a measure that causes connections to break. Do overstimulated neurons disengage themselves? Also try to maintain a consistent excitatory/inhibitory synapse balance, maybe a 50/50 ratio.

Backpropagation

Is it unrealistic to make a "forward pass" through the network? Real neurons are pumps operating completely asynchronously. A neuron receives injections of chemicals on no fixed schedule, and if at any moment its threshold is exceeded, it fires. [Is there any mechanism that causes its buildup of chemicals to decay, or does it just build up until it fires? The fact that neurons fire continuously argues for the latter; high inputs just make the buildup faster, so the neuron fires more often. So should a neuron's "calcium" level be retained from one pass to the next and not fall to 0 until it fires? But if so, then multiple forward passes on the same input data will occasionally generate a new and different answer when sluggish nodes finally accumulate enough to fire. An accumulator like this whose stored level also slowly decays is called a "leaky integrator" (Grossberg), and it looks too hard to implement.
It also introduces behavior similar to loops - there's no endpoint in the calculation.]

Node connections and strength

A node's output should never be negative, so don't use -1 for off. A real neuron has a firing rate that has to be modeled somehow, and the rate can only be positive. A "stronger" connection is simply the result of more (duplicate) connections between the same two points, and the changes that result from learning (these stronger connections) are changes in the expression of genes in the neurons (Eric Kandel, Charlie Rose, 8/7/2001). WEIGHT models the number of synapses between any two neurons, not the firing rate. The logistic function models the firing rate realistically: its output always lies between 0 and 1 and is never exactly 0, never Off.

Network wiring

Brain neuron connections aren't permanent. They connect and disconnect, possibly many times a second (research at Princeton). What triggers connection and disconnection? This is like my ripping out connections that aren't helping.

(The following paragraph relates to Version 1.0, which used random connections between nodes and allowed new connections to be created at runtime. It no longer applies to Version 2.0, which is a fully connected, regularly wired net.) The program should be much more aggressive about changing connections: it should readily create new ones, maybe after each unsuccessful pass, and after a set is solved it should rip out connections and nodes that were never active (in any of the training sets). It should try to reduce itself to the minimum size required for its problem set, but readily expand when necessary. Adding nodes or connections is called "constructive learning" or "growing networks".

Ideas for Changes

4/23/07: WNeural is way too sensitive to the initial random weights and other values. Sometimes it cycles rapidly. Other times the display quickly freezes into one pattern. It IS cycling; it just can't solve the Case. These are probably local minima. Would it help to widen the range of the random initial weights?
Maybe I just didn't let it run long enough. The problem is that the net's functioning and backprop are too deterministic and too optimized toward a goal. If it could just wander around the problem space when it gets stuck, it would eventually stumble onto a good path (like my robot program, whose wanderings were designed to become more erratic and random when more conservative path-finding methods failed to get around an obstacle). As is typical of AI practice, smart people develop smart algorithms and thereby prove how intelligent they are, but they don't end up with artificial intelligence. They're smart, but their program isn't. The program is deprived of the opportunity to be intelligent because its own intelligence was stamped out of it. Intelligence HAS to be evolutionary. There is no other way. There must be randomness in the methods (to provide the creativity), and there must be a lock-in method of preserving the beneficial random mutations (which is learning).

As I recall, apart from the messy complications of training sets and other peripherals, the basic network is pretty simple. Could it be converted to PHP for server use? Yes: FANN (at SourceForge) is an ANN library written in C, with bindings for PHP and other languages, that may be worth looking at (http://leenissen.dk/fann/).

I could model an emotional state (or anything else) by using an extra input node encoding the state (happy, angry, etc.), which would affect the net's behavior. However, what I'd really want is a happiness value for the network (part of the network class, but not an input-node value) that indicates how well the net thinks it's functioning. For example, if happiness is very low, it could start ripping apart synapses, or creating new nodes, or whatever. But maybe this should be part of the classifier system managing the network. In the previous version (Version 1), a sort of "happiness" value was hard-coded as the loop counter: if it took too long, it got frustrated and started creating new connections and nodes.
See the other to-dos in Complex.doc (here), in the Patterns section. Adapt the Environment class from Classify.cpp to generate random problem sets for the net to solve.

Image recognition

For more complicated graphic images, use Corel clip art reduced to 100 x 100 pixels in black and white. (Color works best and grayscale next best. The method of conversion to B&W works well, but it produces a blotchy image, not the silhouette I want.) For any complex graphic, you must be able to zoom into or out of the image, or any portion of it, in the attempt to trigger a response, because the original training was probably done at one size only. Training at multiple sizes would take a very long time, since each size is a completely different training set. It should be easier to allow resizing the input. Need a way to measure the "confidence" (or strength) of a response. The focus region can be moved, or the zoom continued in or out, as long as the confidence increases. The Corel library has many examples that would be interesting to try (see turned-down pages). For example, there are lots of different horses that are simple, basic, and similar enough that if the technique works at all, it should work on them. Then there are also somewhat similar pictures of dogs, pigs, etc., whose necks and legs are never as long as the horses'. Training would consist of presenting multiple horse pictures and looking for a number as the output for each. Then try presenting horse pictures it hasn't seen, the other animals, and totally dissimilar pictures. The goal would be for it to be more likely to respond "horse" to anything with that general shape, and not to respond that way to the totally dissimilar pictures. (It is interesting to note how easy it is to identify the pages of grouped images in the Corel book. Items in various categories do have very similar characteristics, in coloring, in the angularity or roundedness of the images, etc.: e.g. plants, fish, birds, boats, airplanes.
But even a page of foods is easily identified as such, even before you identify any single item on the page.) Before you rush off to figure out, and code for later use, what these similarities are, remember that the whole point of using the neural net is simply to make this information available to the net and see if it can group the images the same way we do.

For more complex images, you can't treat the entire field of view as a single bitmap that demands a single identification. You need to be able to pick out pieces of the image independently and identify them. Making sense of the image involves figuring out how the pieces relate to each other. For example, a picture of a cyclist isn't perceived all at once as a cyclist. You see a human form and a bicycle form, and know it's a cyclist. And each of those two forms is in turn composed of independent elements that serve to identify it: arms, legs, head, wheels, frame, etc. So ideally you should start parsing an image by trying to isolate its elements, even if they're run together or overlapping, using similarities of color, texture, and outline. Then for each element, see what identifications or associations it triggers. Then see if those identified elements are commonly grouped into a single known object. This would be a task for parallel networks, each trained on a different feature, as described elsewhere.

Try to think up a way to use the bucket brigade instead of backprop; it would better implement the idea of reinforcement rather than error correction, which seems artificial. However, reinforcement doesn't seem to be a relevant concept for neural nets, although it was Hebb's original idea of how a real brain worked! Could conceive of either weight or output (?) as a fixed quantity of something (such as ions) that is transferred between nodes.
A node could either transfer some of its weight back to the node that led to it each time that path is taken, or it could deplete its store when another node gets input from it. Could also apportion the output among the nodes it feeds, such that the total of the portions equals the node's output. (Currently the full output goes to each node.) This would incorporate the idea of ions breaking into smaller groups as they travel down different paths, while there's only a fixed number of them. (I don't know whether their effect at their destination is proportional to their numbers, however.) This idea would also introduce the idea of a flow of something through the network, which should make backward connections possible again.

Could try a companion variable (stored in Synapse with the weight) that specifies the reluctance of that weight to be changed, to implement the idea of positive reinforcement and possibly the idea below of washing the net with fixer. (Something similar is already being done by neural net researchers, although it is closer to per-weight adaptive learning rates than to what is called momentum; momentum adds a fraction of each weight's previous change to its current change.) The more times the weight participates in successful outcomes (solutions), the more resistant the weight should be to change. Maybe make it a percentage modifier of the basic computation:

inlist[i].strength += changeability * error * inlist[i].from->output;

Changeability starts at 100% in a new net, then is reduced toward a limit of zero if the weight is so successful that it shouldn't be changed; any modifications must then be made elsewhere, or new nodes or connections are needed in the net.

A visual afterimage seems to persist until you notice it, however long that takes, and then disappears. Is there some kind of acknowledgement system, such that when a neuron receives input, it turns the donor neuron off? Even if not, it's an interesting idea for an input mechanism (which is the reason for this note).

Conceptualizations

A neural net node is a transistor (which has nonlinear characteristics), and each connection has a variable resistor in it (also nonlinear). Is the bias a bias voltage?
An ANN is basically a digital computer simulation of an analog computer.

The lack of backpropagation in the brain is given as evidence that the BP model is unrealistic, but what if my "fixer" model below is the backprop? A set of actions leads to an outcome that produces an emotional response, which leads to the production of hormones. These are the fixer chemicals. Positive ones strengthen recent interneuron connections, and negative ones weaken or destroy them. The amount of hormone (the strength of the response) would constitute the calculated amount of error to backpropagate. There HAS to be a feedback mechanism in the system, and this seems the most plausible one.

Talk.doc: "The Computer's Experience" helps to envision how a neural net's inputs combine and interact to produce a state, and how the net could generate an output word that describes it. Inputs to a node are something like an election combined with a Classifier lottery.
After a "model" network is trained, its learning is finished, but a real-life learning entity must have self-training procedures so learning can continue. Actually, it will always continue to generate some response to every input. As noted elsewhere, it's the simulated environment's responsibility to respond to the network's actions. And there must be, built into the network (or any learning entity), methods for determining whether the environment's response constituted reward or punishment: "Am I better off now for having done X?" In biology, we have a lot of systems that do this, i.e. that answer the question "Are things going in a good direction or a bad direction?" (pain sensors, hunger pangs, thirst, ...), and a good many have nothing to do with learning or intelligence OR emergence; they're hard-wired in. But we also have a pleasure center in the brain, some of whose responses are probably learned and probably also develop through the center's connections to the other, more basic systems (pain, etc.). In an artificial learning system, if analogs to the most basic systems (pain, etc., reflexive or non-cognitive systems) are desired, they have to be programmed, because they aren't emergent. An analog to the pleasure center, or to any cognitive aspect of reward or punishment, probably (fortunately!) should not be programmed in, because it is emergent. Whether it can emerge may be related to how much access the entity has to information about its own internal state (its self-awareness), and that raises the messy problem of how much self-awareness needs to be programmed in (perhaps no more than the reflexive systems). But leaving that aside for now (acknowledging that some of this awareness does have to be programmed), I suspect that an analog of the pleasure center, and thus a cognitive internal value system, will develop.
The problem of self-awareness may not be too difficult: pain, for example, could simply be a numeric register that can be altered by the "environment" object (and perhaps even by the network itself), and whose output is sent into one of the network's input nodes, where it becomes part of the network's "knowledge" about the overall state of the system (including itself). Or in a classifier system, information about "pain" would simply exist as a bulletin-board message.

How are paths reinforced?
The "current state" presumably contains within it a residual memory of the original problem, memory of key components of the solution steps taken, and the final system state.
This entire state is washed with the fixer. But if the entire system is washed, how do you know what the wash should stick to? That is, if you put your hand on the stove and get burned, you don't do it again, but you were also breathing, and you don't stop doing that. There should be some kind of masking agent, like wax, that will determine what the wash will affect and what it won't.
One criterion might be how recently a particular neuron system was active and/or how strongly it has fired recently (?); that is, the wash affects (rewards or punishes) those parts of the system that were firing very recently, or those that recently fired strongly, which amounts to rewarding or punishing whatever was "on your mind" most recently. So when a neuron fires, it should "bob up" for a period of time, available to be washed with its reward or punishment, but as time passes, its availability should fade. A result of this method is that whenever the system starts to re-approach a negatively tagged state (but how does it know it's close?), there should be a resistance to going in that direction: it will do things to get away from that state. Basically, the system should develop an idea of what its "happy" state is and continue trying to maintain it. All human memory (even short-term) is active, not the simple byproduct of the brain being in a particular state. If you're knocked unconscious, you don't remember the last thing you did, so some sort of fixing process was interrupted.
The more specialized the input cells, the less processing is needed: reflexes require no processing; some eye cells detect motion only, etc.

Study "autoassociative memory" and unsupervised learning. Autoassociators sound dumb (output = input), but they're not a linear pass-through (which would be dumb, because it would pass every input through unchanged). They won't merely pass through an unknown input; they'll perform a calculation on it. What will happen when the unknown input is similar to one or more of the learned inputs? If the outputs are those of the learned things, then you have an ability to generalize and group, and also the ability to "free-associate", as in the paragraphs below.

Along the same lines, the architecture you'd use for the ideas below is a very large array of small parallel networks that all share the same inputs but have their own hidden nodes and output nodes, which aren't interconnected. So you'd have a small network to recognize c-a-t (it would only take a few nodes) and one to recognize d-o-g. On a c-a-t input, one would output 1 and the other 0. (Related to the discussion in Gödel, Escher, Bach.) After training, you have a net that is "tuned" to give a response to a particular input set, such as the sound of a cat producing a recalled image of a cat. That circuit could be considered resonant to a certain input set, or "tuned" to the cat concept. For text, all you'd need would be a separate translator network whose function was to translate the input "c-a-t" into whatever inputs the tuned "cat" circuit needed to resonate, and to feed that translated output directly to the tuned network's input. Thus the response would be identical whether the input was auditory or textual. Any other sense could do the same, provided it had a translator that would produce the input the tuned "cat" circuit needed to resonate. This is much like the idea of cell assemblies corresponding to a "symbol" or concept.
These cell assemblies just have their input sensors waving in the neural net "breeze" (more accurately, they are constantly poked with inputs from the net), and when a particular pattern appears, they fire off their output. Sets of these tuned cell assemblies firing simultaneously could combine to make larger ideas, even seemingly comical ones: taking the grandmother example from Hofstadter, if your grandmother has features reminiscent of both a cat and an octopus, sufficient to fire both those cell assemblies, then that combination will, for you, constitute the "grandmother" concept. That is, larger concepts may result from very peculiar sets of smaller concepts, and that would account for "associative" memory: the whole idea of "that reminds me of...". It also accounts for what happens when you try to recall something you're having trouble remembering: one of the component cell assemblies starts firing, and you feel you're close to it. You "try" to remember other related aspects of the memory, which probably causes lots of random firings, which might eventually produce the inputs needed to activate another related cell assembly; the activation spreads, and you feel the memory becoming more "complete" and filled in. This organization also accounts for how people can have similar but not identical versions of a concept. Your grandmother may be so catlike that "cat" is part of the concept for everyone who knows her, while the octopus association may be yours alone. For that matter, "aquarium" might be part of your association, not because she reminds you of one, but because you have a strong memory of looking at one with her once.

But why do these assemblies become co-resonant? How do their inputs and outputs adapt to each other such that when one becomes active, it tends to excite one that it has become associated with, but not another that it hasn't? This may be completely emergent, just resulting from the adjustment of connecting weights.
So might the development of these cycling cell assemblies: heavily weighted to each other but not to "outsiders". The question of two ideas becoming associated still isn't answered, though, because currently the net gets trained to produce the "correct" output. (No, maybe it is answered: remember that the net has been trained for both cat and octopus, meaning that an input with properties of both WILL cause paths to be taken that activate BOTH concepts simultaneously. Yes, but... when the input is presented at a moment when it's very catlike but not very octopuslike, you are still able to artificially produce the association; you can remember that sometimes you are aware of the octopus-like qualities.) Thus, when two cell assemblies frequently resonate together, they acquire the ability to cause each other to resonate. That is, if A frequently causes B and C, then later B can cause C directly and C can cause B directly. There does need to be a procedure for "absolutely" reinforcing connections: creating, and then strengthening, connections between cell assemblies solely because they are active at the same time. Also, should there be a method for preserving connections, protecting them from backprop? If there isn't, a single backprop run to correct a single output error could wipe out vital connections that took a long time to learn. This does not happen in the brain. Maybe strengthening and preservation have something to do with transferring "memory" from short term to long term. There should be some sort of path-reinforcing procedure. Figure out how the usage of ->output and ->strength relates to the Requirements For Learning.

Idea for a bionic artificial eye

Use a CCD digital camera to gather pixel information. For each pixel, trail a microscopic wire into the optic nerve (or even farther back), such that when it is activated it generates a voltage no greater than an ordinary nerve would.
Just leave the ends of the wires embedded in the nerve tissue without worrying about how to connect them up. The idea is to depend on the ability of neurons to interconnect themselves. Neurons won't be able to form new biological connections directly to the wires, but the wires will cause local stimulation of the neurons in their immediate vicinity. The question (and hope) is whether the neurons farther down the line that can make use of this available information will wire themselves to the neurons being stimulated by the wires, thus actually bringing the information into the network. If it did work, it would be a fascinating question what the new information would be perceived as. That is, it might prove useful to the network, and thus be incorporated into it, giving the user the benefit of knowledge that is the equivalent of sight, but there's no guarantee that it would be subjectively perceived in the same way as ordinary sight. If it worked, it would also have other applications, such as for hearing, or even for providing hardware for sensory capabilities that people don't normally have, such as perceiving infrared or ultraviolet light, or whatever.
Training a network is learning by example. With consistent inputs presented to it, you keep retraining the network until the response it produces matches the example. The information from all available sensors must get into the system; you can't block it too close to the sensors themselves. But once it gets in, that information may be important to one part of the net and irrelevant to another. Where it is irrelevant, low weights must block it out. Where it is important, high weights must encourage its passage. This assumes that parts of the net become specialized, which is probably how it should be.

Training sets

Other problems to solve:
Applications of neural networks

It only makes sense to use ANNs for applications they are known to be particularly well suited for. For a given task, you must choose an ANN architecture appropriate to the problem, build it (or modify an existing ANN program for it), determine the coding schemes of your input and output data, and possibly transform all your input data into a form usable by the net. Every one of these tasks is a nuisance. For many problems, you're better off just using standard statistical analysis methods or writing an entire (non-ANN) program to do the job; it will be simpler. Because the rules for all the above nuisance tasks are well studied and mostly deterministic, you'd think there would be an ANN program that could make some of those choices and build itself to suit a particular task, but so far I haven't run across any. Everyone builds their own. A real problem can be converted into a training problem, much as SpamAssassin is trained: you give it an unknown and it makes a prediction. If the prediction was wrong (e.g. it said not spam), then you can give it a new training case with the previous inputs and the now-known correct answer. With a neural net, though, you must then retrain it on all the previous training cases along with the new one until it solves them all.

Classification problems
[End of NEURAL.DOC.]

Copyright ©2010 Steven Whitney. Last modified Thu 10/21/2010 02:08:01 -0700.