Identifying the Essence of a Function Clone

The first step in developing Automatic Function Multi-versioning (AFMV) is to identify whether the clones of a function are essentially the same or not. The easiest way we could go about doing this is to try to go through each function clone's GIMPLE statements line by line and compare them against other clones. If no differences are found, then we can assume that these clones are essentially the same.

The Issue

One issue we have with this though is that functions that are supposed to be "essentially" the same semantically look different after being converted to GIMPLE.

As you can see here, these lines look semantically different because they are using different variables names, etc.; however, when you look a bit closer you may notice that these lines are pretty much logically the same. This "logical essence" is what we want to try compare between clones, but how do we identify this?

The Solution

After thinking and doing some experimentation, I have come to realize that maybe instead of comparing the actual gimple statements, we could instead compare the "type" of gimple statement used. Because at the end of the day, despite using different names for variable or object properties, a GIMPLE assignment is still a GIMPLE assignment. A GIMPLE switch is still a GIMPLE switch. With this in mind, we could essentially tell that two clones are the same if they have the same order of GIMPLE statement types used.

I managed to create a pass that does just that and identifies this "essence" for any function clone and print it to a dump file. The code for the pass is available in the following GitHub gist:

https://gist.github.com/JonPeeAir/cc020a67e321ad9a0bdfe881e82243f1

Note: These files should be added / is part of the gcc subdirectory withing the gcc codebase, i.e., gcc/gcc/

In here, the "essence" of a function is basically a string of numbers that represent the order of gimple types used. When we visually compare this with other functions clones, those that have matching string are what we consider essentially the same and those that do not are considered vastly different.

Here are function clones that should be the same, so they should be pruned.

Here are function clones that should NOT be the same, so they should NOT be pruned.

What's next?

So far, the pass I made only calculates this "essence" of a function but is not yet able to compare it with other functions clones. I have been struggling to get this done, and I suspect that it might be because I am using a GIMPLE pass when I should instead be using an IPA pass. I am not quite sure though and will keep experimenting. I will continue my work on comparing function "essences" and will post updates on my next post.

For now, thanks for reading!

Comments

Popular posts from this blog

Lab 03 - Breakout Game

Building GCC

Lab 04 - Diving into 64-bit Assembly