|
25 Years of Programming
An open source source for C, C++, OWL, BASIC, MDB, XLS, DOT, and more... |
Home Projects Up Sitemap Search Blog Forum+Chat About Us Privacy Terms of Use Feedback FAQ Images Services Ads Donate Humor |
|
|
Essays on Complex Systems Part 4 - GeneticsReal Genetics (Biology)How Species DivergeIf the dividing line between species is whether they can successfully reproduce together, then how does the split occur? It can't be a sudden, deciding, mutation in one individual, because it couldn't breed with others to pass it on. And you can't expect it to occur simultaneously in more than one individual. Yet, when two species diverge, there does have to be a point where they can interbreed, and then a successive point where they can't, and a specific cause of the change. One thing seems certain: they must stop interbreeding long before it is impossible to do so. Otherwise, through the interbreeding itself, they would remain compatible. So there must be an initial social split, one group either physically or socially separating from the other, which then allows a physical divergence to take place. That's the key. It really is a divergence of entire populations, not of individuals. All the individuals within each population remain compatible with each other, but the social or physical separation of the populations allows them to diverge. An example of how this can be misunderstood: Upright walking is often cited as something that caused humans to diverge from other apes, especially because it brought us out of the trees, while they remained there. But this is exactly backwards. Upright walking didn't drive humans out of the trees, nor even allow them to leave. They had to leave the trees first. It was only after the behavioral move out of the trees (and coincidentally away from the other apes) that mutations in the direction of upright walking would have been selected for. Another example was described in Scientific American Frontiers (Spiders) April 1999: 2 types of males in 1 species of spiders, with different courting behavior, both successful, but 1 safer for the male. A more obvious example is the evolution of American English and its many dialects from the original British. Separation has to precede the differentiation, else the population would simply remain homogeneous. Related question: with TV and increased mobility, do children today have less pronounced regional accents, and are regional idioms disappearing, as language fuses back into one? Notable exception: "ebonics", discussed elsewhere as an example of isolation producing divergence. Once a population is physically scattered, it seems inevitable that they will diverge, just through random variation. So any event, even a short term one (flood, a river becoming uncrossable, drought, earthquakes, volcanoes), could initiate the divergence, as long as the scattered populations don't reassemble for a long time. It would seem to be a corollary that the more separated ecosystems there are, the more different species there should be. ---------- Species classification is somewhat arbitrary, based only on the portion of DNA that allows interbreeding. You could have a huge divergence in the rest of the DNA only, producing radically different members of the same species (dogs), or a divergence in only the reproductive sections, producing individuals of different species but identical in every other respect. A number of mutations are said to produce sterility as a side effect, but are the individuals only deemed sterile because they can't produce offspring with the majority of other existing individuals, or because they actually lack something essential to reproduction? If the former, and if they could theoretically produce offspring if a compatible individual existed, then they're not sterile; they're a new species.
Misc.
Genetic Algorithms
Backtracking (from AI) is a method of organizing the functional subunits that will form a solution. Genetic algorithms seem to be a competing method. Each is an intelligence strategy, in competition with other strategies.
Basic Computing Instructions: OpcodesWhen I was developing "functional subunits" for the agents in Adapt.cpp, I realized at one point that the basic instructions that the agents seemed to need were similar to instructions in CPU instruction sets. On reviewing those sets, it further appeared that most of the "computations" that I could think of that were performed by us or by other living things could be described in terms of these basic instructions. It was a revelation to me that this set of instructions, which I would have considered an artificial one, might describe fundamental units of computation, regardless of context. In the opcode list below, the string manipulation instructions and some others are omitted because they seem overly dependent on the computer context. Other instructions are included because some seemingly "naturally occurring" computations can't be done without them, but not because they themselves seem to be inherent. Still others are included just because they seem useful. Most are extremely "elemental". The TI59 (TI-59 Programmable Calculator) opcodes are more self-contained (less dependent on each other), and proved somewhat more useful as a model for Adapt.cpp. Math functions:
Logical functions:
Program flow:
Somewhat related to this are the elemental transformations that might be performed upon a data set, such as insert a step, delete a step, etc. Many of these are implemented in Adapt.cpp. See Typogenetic Code (Godel Escher Bach:508ff) for ideas for others. Using OpcodesOpcodes must be designed such that they can be randomly shuffled through the genetic algorithm, and yet still always make enough sense that they don't crash the parent program, even if they are nonsense in the language they're written in. Not a trivial problem; time consuming. When creating an opcode set, there is a trade-off involving the size of the subunits: If the subunits are large, a solution can evolve quickly (because there aren't many different combinations to try out), but the program won't be very flexible, because the individual steps in the subunits can't be broken out and rearranged. If the subunits are very small, the program can be extremely flexible, but a solution might take forever to evolve. If you have any idea of what might constitute a solution to the problem, you must design the opcodes such that that solution could evolve in a reasonable time. For example, using 100 different opcodes, a 13 step program is one of 1026 possible. It is absurd to expect it ever to evolve: at a million cells per day, it will take about 274 quadrillion years. One possible compromise might be to use low level opcodes, but to place them into initial random programs in known-good (or suspected good) groups, as though you were actually using larger, more capable opcodes. Thus, you get a head start (doesn't take forever to evolve), but the steps are still mutable at a low level. This might also help with another similar problem, the relative size of mutations. Just as you can only shuffle whole opcodes, you can also only mutate whole opcodes, which makes all mutations large. If the system is far from a solution, large mutations may help since anything new is likely to be better, and large steps are better still. But if the system is already close to a solution, they will likely just tear it to pieces since anything new is likely to be bad, and large steps are even worse! (as happened with the "mybest" program). For fine tuning, small mutations must be possible. Interesting idea is to change mutation size based on a measure of existing success. (What measure?) ADAPT.DOC (Section of COMPLEX.DOC) - Project notes on the C++ program Adapt.cppThe purpose of Adapt.cpp (MSDOS) and WAdapt.cpp (Windows) is to provide a laboratory for studying natural selection and evolution not just as processes that supposedly occur in living populations, but as a fundamental natural law that describes the process of change in any pattern, living or not, that exists within any environment; and especially for studying how differences in various factors (elemental subunit size, genetic mating method, mutation rate, gene segmenting scheme) affect the efficiency of the process. To whatever degree it succeeds, it may simulate biological evolution (in that the cells aren't biological), but with regard to the broader process, it demonstrates it. That is, it isn't a simulator. Basic questions are:
The experimental answer to all is emphatically Yes. What's more interesting is how the various factors contribute, either enabling or inhibiting it. Technical Notes and CommentsAn explicit same-species test is meaningless. Don't use it. Even in biology, the test is solely pragmatic: Are offspring produced? Do they live? In computer, no offspring = objects are so different that no meaningful merger is possible (string DNA vs array of 5 doubles). And do they live = after creation, do they do anything other than get depleted by runcost? Which is a behavioral test. There's no way to determine species by examining DNA. (That is, there's no way to tell whether the offspring of 2 cells will be viable until you've created and tested it.) 2 cells with significantly different DNA can have functionally equivalent behavior. Any differing pgm steps that aren't particularly significant are analogous to eye color, etc. Mutation rate is now based on pgm length (quantity of genetic material available to mutate). Otherwise, shorter pgms have a much higher rate, and longer pgms hardly mutate at all. This should be a rule for all programs using mutation. This program displays two frustrating characteristics that I thought were weaknesses and flaws in the program design, but that I later heard discussed by Stephen Jay Gould in an interview as being characteristic of natural evolution:
There must be a net energy increase into the system for the population to grow. With no "plants" to eat, the easiest method is to give children more energy than they cost. But excessive growth generates numerous cells that get culled before they can run their programs, which wastes time. The network of energy inputs and drains seems to be important in any system. If population can't be stabilized here, add "Total System Energy" to reports or output to energy.log for Excel study and graphing. The energy in/outflow balance here is probably like lambda, determining population behavior (death, growth, & how interesting the behavior is). If necessary, replace artificial culling and topping up with an automatic adjustment so I don't have to fiddle with the parameters.
Adapt already selects very well for cell programs that exploit the rules, and changes in the rules produce almost immediate changes in the programs. In one long run, 2 species developed.
Other runs have been overly lopsided, producing only eaters or only maters.
Partial rewards are at least as important here as in Classify. If the environment or the rules are so unforgiving that a cell is either totally fit or dead, it's the same all-or-nothing problem. There won't be any evolution. It works best to the extent that less than perfectly fit cells can at least squeak by better than some others, i.e. a partial or graded reward system. The problem with killing off an old cell to make room for each new one was that since each new cell got one chance to run its program, it potentially allowed a single loop to last forever, with an endless chain of new children. No longer a problem because of energy threshold required for mating and random selection of which cell to run. MINPOP should be low. Periodic population crashes make accumulating energy useful, for surviving the lean periods, and the lower the population goes, the more severely you select for the best accumulators. A denser screen makes the "adjacent" opcodes effective. I think there are now entire programming languages (such as for defining objects used in computer games) dedicated to the types of things this program uses: objects moving around in space, governed by rules of interaction with the environment and other objects. And event-driven. One game even boasts of intelligent adversaries, implying the objects can learn. Check into it. All heading and other calculations use the correct spatial coordinates. Heading is only translated to match the inverted Y axis in reports to user. Other things to remain aware of: when testing for neighbors, Above is calculated as (y - 1), which is the point above it on the screen. Potential problem? Currently, think not. There must be no extra cost for moving and turning, and no per-step cost: Since all cells start with the same energy, if a brand new cell attacks another brand new cell, it will always lose and be eaten because of the extra cost of getting there, and the movers, who have the only decent chance (at least in the starting population) die out. Cells that don't eat will now weaken at the same rate, except that cells that execute many useless opcodes have a disadvantage, as they should, because they don't accomplish as much in the allotted time (or stepcount). OpcodesEvery opcode should leave, as a result code in ax, the most useful information it has about what it just did, formatted (ranged) to be most easily used for program branching control in the case that the opcode is followed by an ax test instruction. Remember to add sense() at end of any case that might have caused changes to the environment that are detectable by sense(). "if(test) repeat previous step" is not used because there's no way to change the data being tested, and an unconditional "repeat" is just an endless loop. You can use both "elemental" opcodes and "macro" opcodes that could be defined in terms of the elemental ones. Cells behave more interestingly when the macro opcodes are available. Organization of Genetic MaterialAgent programs are now segmented, modeled after the segmenting of chromosomes into genes, containing embedded delimiters to allow steps of a sequence to tend to cling together and get copied as a unit. This is necessary if insertions and deletions are allowed, because the same sequence may start at different locations in 2 cells, and step-by-step copying won't reproduce it. If program steps occupy fixed locations, this will happen automatically due to a good sequence's prevalence in the population. But then pgm must start at maximum size from the beginning, which, besides being wasteful, is unrealistic. My attempts at manually creating "good" cell programs, before segmenting was instituted, were torn apart by mutations. However good the program, having to mate with less effective ones ruins it. This presents an interesting problem: a program can be effective, yet not robust enough to survive. Is there in general less precision and more redundancy in high mutation-producing environments or in populations with much variation? One changed step can turn a good program into a useless one, so the fact that two programs look similar doesn't tell you anything about their relative fitness. It also means that there may be no steady progression toward improved fitness because there is no reward for a partially good solution. A program could thrash around "near" a good solution for quite a while, with no improvement at all. Then the last essential step falls into place, and it's a terrific program. Do a long program run from scratch (starting with one program step). The nature and progression of programs evolved from scratch may be different from those evolved starting from, say, 10 steps. More redundancy? Hardier sequences? One of the surprises from the Human Genome Project was the discovery that 50% of human DNA appears to be useless filler, parasitic hitchhiker DNA segments with no function except to preserve themselves in our DNA. They replicate and copy themselves to new locations. This would be consistent with the direction Adapt programs seemed to tend to go: when there's no overseeing organizer, you can wind up with redundancy, filler, obsolete sequences no longer used (i.e. no longer expressed, but that could become active again under different circumstances), all commingled in a hodge-podge mess. This, in turn, is consistent with the discovery that a gene can stop being expressed if a "stop" codon gets inserted into it, blocking its functionality. It's how traits go inactive. Example: human smell is currently 1/1000 as acute as it once was, but the coding for it is still there and could be reactivated. It is remarkable that a program as crude and unfinished as Adapt already shows parallels to so many natural phenomena like this. It certainly is generally on the right track. In brief reviews of programs, it looks like unnecessary zeroes (not actually useful in defining segments) do get squeezed out of programs. Find out: in sexual reproduction, at what level does the mix-and-match occur? Whole chromosomes only? Seems likely, which means you get huge blocks of DNA from each parent, not thousands of little segments cut and pasted together. Ideas for ChangesProgram Operationit could also have a stack (a stack container of doubles) to push ax onto for memory, or at least one additional test register. Or ax could be a StatArray, not sure what for! As long as any cell eats any other, overpopulation presents no actual food problem (just the opposite!), so don't try to simulate one (such as with Cull). It's not the real situation. Agent Behavioral Rules
Once program is fully operational: do successful programs exhibit these characteristics? Results and AnalysisChi-squared analysis should be particularly suitable for ending populations, since pgm steps are assigned randomly. At each pgm location; for opcode sequences; for segment sequences. Run 4 windows (populations) from either identical or random starting populations. Do they evolve in similar directions? There's an interesting energy economy in the program, with characteristics similar to a monetary economy. Energy tends to concentrate. Those who have it get more, and become unbeatable. Each cell is a little factory that consumes other cells as food and transforms their energy into a product (a cell more like itself than the one it ate!). The program should function well as a closed system, but tends not to. It is more robust if there is a net flow into it (inflation), especially since energy that gets concentrated is effectively removed, unavailable, and must be replaced. Energy flow (analogous to money velocity) is important to its dynamism; without runcost and complementary energy inputs, a population of cells could last indefinitely, but doing nothing, no growth, movement, change, evolution. [End of ADAPT.DOC] |
|
|
|
|