Since my previous (and first) post I began working on the actual simulation for the cells, but then soon ran into an obvious issue: the brains of the cells. For cellular automata it is sufficient to make a simple state machine in which cells make decisions using a sort of tiny program, but because I am using a reaction-diffusion simulation, this doesn't seem to be an ideal solution, because the output needs to be between 0 and 1, as I will explain later. My first assumption is that neural networks would be the ideal solution, since they work with non-binary results, though upon looking into the issue, it seemed that the ideal neuron was a perceptron, a simple neuron that sums weighted inputs and then runs a function on this value. Perceptrons originally were designed to output a 1 or 0 depending on if they exceed a threshold value, but they were modified to use sigmoid functions so that they can be trained using back-propagation (specifically the one 1/(1+e^-a)). In my situation, natural selection will train the creatures, since no "correct" output exists, thus I cannot use back-propogation. Another consideration would be recurrent neural networks, which are extremely difficult to train using back-propogatioin, yet they are widely used, so it seems that they would be very helpful to my situation, but they actually are complicated to process, and since I want to have lots of creatures at once, I believe that this is not the best option.
One thing I realized in my research is that a lot of effort was developed on algorithms that can be trained easily using back-propagation. One type of system that completely ignores that prospect is using, which I used with most of my older evolutionary celluar automata, is a genetic program. I believe that this is certainly a good option, so much that I wrote a whole blog post on them before I wrote this one, but as I was concluding the post I realized that this might not be a good option. After that, I did even more research into alternative algorithms, but I came up with very little that I hadn't already tried myself or read about online. I was going to use a genetic program, so I started writing the code, thinking of how I might just do it in the process. I needed to allocate space for so many small nodes that if I dynamically allocated memory from the heap for every node, I would end up wasting most of it in the overhead of allocation. It eventually forced me to accept that I would have to have some fixed maximum amount of inputs to a node. The main reason this was a problem was because I was still trying to incorporate perceptrons into my model. In addition, I would absolutely need to process the input, which in my case is all of the chemical concentrations, in a special way, since it was a scalar value that could be almost anywhere, while the output needed to be somewhere between 0 and 1, which works perfectly for a sigmoid function. Ultimately, I read up on the amazing things that deep neural networks were capable of, and then I was partially convinced, but not entirely.
One thing I felt was not considered in deep neural networks was the significance of multiplexers in logic. A multiplexer (MUX, for short) is normally used in digital logic to select from inputs using selector bits, and they can literally do everything. I really felt that I had to add these into the mix, though I can't put my finger exactly on what problem they would solve, but I just noticed that perceptrons would have a hard time selecting between two completely independent decisions, since all they can do is combine multiple decisions together depending on their weights, rather than choose one over the other. So I conceived of a hybrid model.
My model has three layers to it, which I will name according to convention. It has a fairly large input layer, which is designed to create multiple separable regions in the input space that can be used in subsequent calculations. The outputs of this layer are binary, so as to be computationally efficient, though I may switch later to sigmoid functions just to see how much better they are, but my understanding is that the purpose of the curve is just to help make back-propogation easier. The next layer is the hidden layer, which in my case is actually a series of layers each containing one muxeptron, which is a concept I imagined. Each one of these hidden layers can reach back to receive input from any previous layer, which I feel is very important to keep the input relevant even near the end of the calculation, but also so calculations can be reused. Muxeptrons are an invention I created for the purpose of adding MUXs into the mix. Each muxeptron contains three perceptrons; one perceptron will either fire or not fire; depending on whether the first perceptron fires or does not fire, one of the other two perceptrons will be chosen to be used as the output. Finally, the last layer is the output layer. Because my outputs need to be scalar values between 0 and 1, I figured this was a very good time to use a sigmoid function. The output layer can only access previous layers and uses normal perceptrons that use a sigmoid function, which is very basic and works properly to my needs. Since not all nodes will be used in the computation, I chose to compute from the output backwards. Computation begins at the output and as inputs are required, they are calculated accordingly.
Another thing I decided was that if a node was unconnected from the network, I would leave it unconnected. This seems strange, but my reasoning is that this will simulate junk DNA to some extent. In addition, if a new node is added to the network through a mutation, I will simply add it to the network unconnected; a future mutation may cause it to become connected to the network, which allows the network to gain complexity without significantly altering its function.
The purpose of this brain is to control the diffusion coefficients for each cell connection in the mesh. In each simulation step, the cells will run the "brain" algorithm to produce diffusion coefficients, which are intended to simulate active transport, though only for outgoing diffusion (I may change this in the future so that cells can transport chemicals inside of them). As in real biology, this active transport will require energy, which is also going to be a chemical (namely chemical 0, otherwise known as food) in the reaction-diffusion engine. When a cell produces a value of 0, it will force there to be no diffusion, while a value of 1 will force all chemicals to diffuse out of the cell in one cycle. However, a cell that actually managed to produce a 1 or 0 exactly (which should be impossible with the sigmoid function) would kill itself immediately because the food cost would be infinitely high. A natural rate of diffusion is established that is closer to 0 than 1 at the initialization of the simulation, and if a cell matches this rate of diffusion exactly, it will not have a food cost, thus cells should adapt to "understand" this value, possibly generating it and selecting it with their MUXs to avoid food costs when food is low.
The only things a cell can read from its environment are just the amount of chemicals inside of its own cell (including food, which IS a chemical). This means that a cell will have to use chemicals to send signals through the mesh.
Although this is unimplemented as of now, I want to share my thoughts on how the chemical reactions and physics calculations will occur. When the simulation is initialized, I will generate a series of random chemicals, then I will generate an arbitrary amount of chemical reactions that allow a few elements to combine in a 1:1 ratio into another element, so that there is conservation of chemicals. After the chemicals are all generated and organized in this fashion so that chemicals cannot be created or destroyed through chemical reactions, I will define a catalyst matrix so that each chemical acts to some degree as a catalyst for every available chemical reaction. This catalyst matrix will be multiplied by the vector of chemicals in every node, so that reaction rate constant can be determined. The reaction rate will still depend on the concentrations of the input chemicals to the reaction. I will probably bring these concentrations to the power of their ratio in the reaction, but I have a feeling that this will slow down my simulation substantially, so I might come up with an alternate solution later on after profiling.
As for the physics simulation, every chemical will interact with every other chemical. Two base force vectors will be computed: one "gravitational" force vector that makes both particles attract or repel and a "magnetic" force vector that is computed as the cross product of the velocities of the particles. Both forces will then be individually multiplied by the summation of the products of each of the two chemical concentrations multiplied by a constant coefficient for that particular force. Finally, the force vectors will be summed and then divided by the inverse of the distance squared so that they follow the inverse square law and have conservation of energy (to some extent). Opposite forces will be applied to each of the two nodes as particles. Also, forces will decrease linearly starting at some radius that I will define so that particle interactions do not get too strong and fling particles all over the place at high speed.
My main concern with the physics simulation is that multiplying all the chemicals together and by their physics coefficients might be very computationally expensive for large amounts of chemicals, so I might impose a limited set of physics interactions just like I did with the chemical reactions, but I will only do this if profiling shows this to be a significant burden, which it may not since it is only doing many multiplications, which aren't too difficult to compute, while the magnitude of the distance requires a square root to compute, which IS a requirement, because I have to divide the vector three times by this value to get from actual magnitude to inverse magnitude squared, thus I cannot just use it in its squared form. As things need to change in the simulation, I will update them and make them work a bit faster. This project will absolutely be using OpenCL eventually, but I will begin writing everything in C++ just to see how things turn out. I will also eventually optimize this by using a barnes-hut simulation so that the particle interactions don't get to expensive at large values.
As I said before, I will keep posting my ideas on this blog regarding this project. I have a feeling that very interesting things will result from the simulation. If I like it enough, maybe I will turn it into a game. Once the code is more fleshed-out, I will consider making it open-source, in which case I am likely to release it under the BSD license.
No comments:
Post a Comment