Rust-based Bruteforcing tool

24/02/2024

Context

During the early part of 2024 I attended a intensive summer school program called National Computer Science School, or NCSS for short, at the University of New South Wales. This particular stream of the School for focussed on Cybersecurity. During my there we were given a wide range of tasks and challenges, ranging from cracking codes, open source intelligence, lock picking, social activities and learning some basics about how operations are carried out on computers. All whilst meeting tons of enthusiastic and interesting people. During the duration of the school a challenge was set to create a machine code program that could take the most possible 'steps'. This could either be achieved by using understanding of common and abstract processes used in programming or by creating a tool that could test all possible programs and figure out how many steps each would take and then return the best one.

During the school, I found it incredible difficult to find good programs and opted to create a program in python which could attempt to brute force it, unfortunately it was very slow, not capable of using the full power of computers at our disposal and ran into memory issue often. Since I knew that Python could be very slow because it is interpreted(I know it is a little more complicated than this, but for the sake of simplicity...) a compiled language would be optimal for this purpose. I knew a little bit of C++ but it notoriously difficult to comprehend, and I'd heard of another language that is growing in popularity for its speed and security, Rust! So after the summer school had finished, I decided it was time to learn.

The Tool

Initially the entire program was modelled on the version, I'd written in Python, which I had compared to the results I was getting from the one provided to us the school. I'd also like to make clear that I will not be releasing this code because that takes away from the fun of creating the programs. Because I knew that this general design worked and should still be relatively fast. I first created a data type which would be the emulator, containing all the variable and functions necessary to allow it to act as a perfect emulation of the chips we were using in the school. Then with a few modification here and there the code was optimised fast.

Then I need to find a way to get every possible program in sequence, without filling memory because, some basic math dictates that if the program were to load every possible program into memory it could take 16 exabytes of memory, which is just extremely unrealistic. On the flip side it is unlikely that the program could every run to the full extent but still. There is no need for it there are more efficient ways. So by store a number which can represent every possible program, this number can be iterated over with a for loop. This means that program 0 represents an empty program and so on. I achieved this by using some simple math, by using remainder theorem, I could generate programs here is an example.

The number 2051 in a system that uses 4 bit bytes would represent the program. 3 0 8 0 0 0 this is because 2048 is divisible by 2⁴ = 16 and by remainder theorem 2051 = 128 × 16 + 3 the remainder takes the first position then the loop begins over. It takes 128 the result becomes 128 = 8 × 16 + 0 So next instruction of the program is 0. Then since the 8 cannot be divided by 16 to give a whole number the final number of the program become 8. This is how every number can represent a different program. Now the program to bruteforce only need to store a number for every possible program.

Then because this is a process that can be very easily parallelised to work across multiple threads which is great because Rust is all about fearless concurrency. Each thread would iterate over this number offset by their thread number given by the order of their creation and step over the list by the number of threads, therefore evenly spreading the workload. This is because in list of numbers there are regions programs that infinitely loop and also those that perform well. Additionally this could mean that there must be some sort of pattern, which could be interesting area of research and testing with a Generative Adversarial Network which would attempt to predict good programs through the scoring of predictions.

Caveats

Obviously because there are large number of combinations now matter how much compute power one has there is point were in classical computing it become highly impractical to continue attempt combinations, due to the exponential nature of the process. In my testing programs of 5 instructions long could be found in seconds and 9 instructions long could take around a week. So just by extrapolating the results 10; would take months, 11; would take years, 12; would take around a lifetime, 13; more than millennium and so on. Although this could become an issue for passwords once quantum computing becomes mainstream, and quantum resilient algorithms will have to be applied. Thankfully they aren't yet strong enough to be applied to password cracking tasks.

Possible Improvements

It would not be that difficult to apply some type of algorithm for a variety of tasks that the brute force has to carry out, some to recognise programs that contain subsets of previously already run programs for example [1,1,7,8,0] which when compared to [1,1,7,8,0,0,7] is in all essence the same program. Because the final 7 in the program is never run, although the program would still test this, it will end up repeating the steps. Which is unnecessary, therefore, it should be possible for it to recognise that it is unnecessary to test.

Also the program encounters a large number of infinite loops which waste time. A type of loop detection which looks at the previous state of the system and if it is same as the last time it looped the system should automatically recognise it as an infinite loop and then quit there.

Final a machine learning model could be implemented to generate likely programs that would perform well. Data for which could be collected from slight modifications of this program, which instead of threads collecting only the top scoring program, it would collect all programs above a certain threshold.