Master Thesis in Software Security
1. Representation of Android apps
We expect the research effort invested in this master thesis to yield a scalable, robust and accurate neural network for representation learning of Android apps. This master thesis could be a first step towards revolutionising machine learning-based analysis of Android apps, just as BERT revolutionised the NLP field. Concretely, we hope the resulting model will advance research on malware detection and dissection, vulnerability identification, repackaging detection, and many other downstream tasks.
2. Force execution of Android apps
The research question investigated in this master thesis is how we can infer the right inputs to reach a specific location in the code of an Android app. This would be useful, for instance, to trigger a malicious payload or to check for vulnerabilities in suspicious locations. Besides generating the right inputs, the app under test would also need some additional instructions to bypass counter-analysis measures such as logic bombs.
3. Unified representation of code
In the Android ecosystem, developers can use various programming languages, which makes a holistic analysis of Android apps challenging. For instance, we can use a static analyser to inspect the dex code, but we need another analyser to inspect native libraries, which are released as binary components. The problem is that by using two analysers, we lose the “flow” that actually exists between both worlds (i.e., between dex code and native code). This master thesis aims at developing a universal (or unified) code representation that could be used for both dex code and native libraries. A single analyser could then be run on this unified representation, allowing a full analysis of the apps.
4. Mining publicly available pastebin data
This master thesis aims at analysing the public data available on pastebin-like websites (e.g., pastebin.com, pastebin.fr, etc.). The goal is to assess the potential risks of this publicly available information. Thousands of “pastes” are created every day, especially by developers sharing code samples, so the data is a wealth of information. It can be mined for security- and privacy-sensitive data, private code and malicious code, and used to compute statistics. The student should be able to automatically extract relevant information for further analyses.
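As a rough illustration of what automatic extraction could look like, the sketch below scans the text of a single paste for a few kinds of sensitive content. The pattern set and the function name scan_paste are assumptions made for this example, not part of the project; a real study would need a much broader and validated set of patterns, and a way to collect the pastes in the first place.

```python
import re

# Illustrative patterns only (assumed for this sketch, not an exhaustive list).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan_paste(text):
    """Return {category: sorted unique matches} for each pattern that matched."""
    findings = {}
    for name, pattern in PATTERNS.items():
        matches = sorted(set(pattern.findall(text)))
        if matches:
            findings[name] = matches
    return findings
```

Calling scan_paste on the raw text of each collected paste gives a per-paste dictionary that can then be aggregated into statistics.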
5. Comparative study of machine learning-based Android malware detectors
The focus is on machine learning and its reliability. We have recently replicated five state-of-the-art Android malware detectors that rely on machine learning. We want to perform a comparative study of those approaches: although they use different feature sets (and hence different dimensions/spaces), do they 'learn' the same things?
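One possible starting point, offered here only as an assumption about how the question could be operationalised, is to compare the predictions of the detectors on the same test set. The sketch below computes, for each pair of detectors, the fraction of apps on which they agree and the Jaccard overlap of the sets of apps they flag as malware. The function names and the 0/1 label encoding (1 = malware) are hypothetical.

```python
from itertools import combinations


def agreement(a, b):
    """Fraction of apps on which two detectors output the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("prediction lists must be non-empty and equally long")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def jaccard_detected(a, b):
    """Jaccard overlap of the sets of apps flagged as malware (label 1)."""
    flagged_a = {i for i, label in enumerate(a) if label == 1}
    flagged_b = {i for i, label in enumerate(b) if label == 1}
    union = flagged_a | flagged_b
    # Two detectors that flag nothing are treated as fully overlapping.
    return len(flagged_a & flagged_b) / len(union) if union else 1.0


def pairwise_report(predictions):
    """Map each detector pair to (agreement, Jaccard of flagged apps)."""
    return {
        (m, n): (
            agreement(predictions[m], predictions[n]),
            jaccard_detected(predictions[m], predictions[n]),
        )
        for m, n in combinations(sorted(predictions), 2)
    }
```

High agreement with low Jaccard overlap would suggest that two detectors mostly agree on benign apps but catch different malware, which is one way the detectors could differ in what they have learned.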
6. Running Android Malware
The focus is on security and systems. We have millions of Android malware samples. What kind of information could we extract from them dynamically (i.e., by running the apps), and how? There is also an ethical aspect to this topic: what precautions must be taken to run Android malware safely?
7. Extracting and storing Android app information for future querying
The focus is on security, data management and architecture. Android researchers need to download Android apps from repositories on a daily basis. The challenge is to select apps that match specific characteristics (e.g., does the app use obfuscation? does it use reflection APIs?).
The goal of this master thesis is therefore to develop a solution that automatically extracts relevant information from Android apps using high-performance computing clusters. The student will have the opportunity to design and implement the storage architecture on high-quality lab servers. Eventually, a query API should be developed to ease the work of researchers worldwide.
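To make the idea of a query API more concrete, here is a minimal sketch that stores two extracted characteristics per app in SQLite and lets a researcher select apps by those characteristics. The table layout, column names and function names are all assumptions for illustration. The actual storage architecture is for the student to design and would need to scale far beyond a single SQLite file.

```python
import sqlite3

# Hypothetical feature columns; a real store would track many more.
COLUMNS = ("uses_obfuscation", "uses_reflection")


def create_store(path=":memory:"):
    """Open (or create) the store and make sure the apps table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS apps ("
        "sha256 TEXT PRIMARY KEY, "
        "uses_obfuscation INTEGER NOT NULL, "
        "uses_reflection INTEGER NOT NULL)"
    )
    return conn


def add_app(conn, sha256, **features):
    """Insert or update one app; missing features default to False."""
    values = [int(bool(features.get(column, False))) for column in COLUMNS]
    conn.execute("INSERT OR REPLACE INTO apps VALUES (?, ?, ?)", [sha256, *values])


def find_apps(conn, **criteria):
    """Return the hashes of apps matching every given characteristic."""
    unknown = set(criteria) - set(COLUMNS)
    if unknown:
        # Column names are checked against a fixed list before being
        # interpolated into the SQL string; values are bound as parameters.
        raise ValueError(f"unknown characteristics: {sorted(unknown)}")
    where = " AND ".join(f"{column} = ?" for column in criteria) or "1"
    rows = conn.execute(
        f"SELECT sha256 FROM apps WHERE {where} ORDER BY sha256",
        [int(bool(value)) for value in criteria.values()],
    )
    return [row[0] for row in rows]
```

With such a store, a researcher could call find_apps(conn, uses_reflection=True) to obtain only the apps worth downloading, instead of fetching and analysing every app locally.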