Thesis or project ideas from my research
Send me an email if you are interested in a particular topic.
Well, perhaps not really new in the case of eBPF, but still exciting. eBPF is the cornerstone of Cloudflare's L4 DDoS protection at the edge. And Firecracker has just been announced at re:Invent as a new, lightweight micro-VM for serverless functions. See "How to drop 8 million packets/sec" and "AWS releases Firecracker, a Rust-based micro-VM". Let me know if you would like to give them a test drive.
I recently came across an article by Murat Demirbas in which he describes his experience with the formal verification of distributed systems. He applied it to Paxos and other complicated consensus protocols. This could be a starting point for further research: "Why you should use modeling with distributed systems". Let me know if you would like to give it a test drive.
Frequently over the last couple of years I have had a gut feeling that we should look more carefully at Byzantine consensus protocols instead of always relying on a simple fail-stop error model. The Morning Paper's discussion of "Decentralization in Bitcoin and Ethereum Networks" (Gencer et al., FC'18) points to an interesting use of Byzantine protocols in the blockchain. The measurements show how large a share of the blocks is controlled by a small group of miners. The group is so small that the energy-hungry proof-of-work mining process seems like total overkill, and the paper suggests that a Byzantine quorum of about 20 miners would already distribute control better.

Another very interesting measurement shows that ping latencies between Bitcoin miners are short and uniform, much shorter than in Ethereum. This is not a technical problem for the blockchain algorithm, but it raises an interesting question: how do we know that the few important miners really are independent identities? In other words, the Sybil attack question raises its head. The short ping distances suggest that most important miners are located in one geographical area, which means that they could be controlled by a single state. So replacing the proof-of-work algorithm with distributed consensus would only solve the energy question, not the possible control of the blockchain by external entities. What if ping distances were much more diverse? First, ping times can always be faked, at least in the direction of slower answers. Second, what stops a state from running mining nodes in other states? This would make the whole system look much more distributed than it really is. And as far as I know, there is currently no general solution to the Sybil attack (one entity controlling several different identities) in peer-to-peer systems, let alone to political control of miners or collusion among them.
This shows nicely that Byzantine protocols are not a panacea for distributed systems. But there are many cases where collusion orchestrated by an external superpower can reasonably be ruled out, and there Byzantine protocols certainly help against a small number of attackers and make distributed protocols much more reliable. A good starting point is, again, The Morning Paper's discussion of "Practical Byzantine Fault Tolerance" by Castro and Liskov.
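The quorum arithmetic behind PBFT is compact enough to check by hand. A minimal sketch, assuming the classic bound from Castro and Liskov that n replicas tolerate at most f = floor((n - 1) / 3) Byzantine faults, and that any two quorums must overlap in at least f + 1 replicas so that at least one correct replica sits in both. The function names are my own.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f with n >= 3f + 1 (the PBFT resilience bound)."""
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Smallest quorum q such that any two quorums share at least f + 1 replicas.

    Two quorums of size q overlap in at least 2q - n replicas, so we need
    2q - n >= f + 1, i.e. q = ceil((n + f + 1) / 2). For n = 3f + 1 this is 2f + 1.
    """
    f = max_byzantine_faults(n)
    return (n + f + 2) // 2


if __name__ == "__main__":
    for n in (4, 7, 20):
        print(n, max_byzantine_faults(n), quorum_size(n))
```

For the 20 miners mentioned above this gives f = 6 and quorums of 14: the system stays safe as long as at most six miners collude, and since 14 <= 20 - 6 it can still make progress when those six stay silent.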
Addendum: I just realized that George Bissias, A. Pinar Ozisik, Brian N. Levine and Marc Liberatore have written a paper on Sybil-Resistant Mixing for Bitcoin, which could offer a solution in the case of Bitcoin.
While measuring dust levels has become rather easy thanks to "Feinstaub selber messen" (measure fine dust yourself), nothing comparable is available for measuring noise or for counting and analysing traffic, at least not at decent prices and not for Linux/Raspberry Pi users. The project has basically two parts. One: measure noise levels reliably, with A- and C-weighting. This makes it possible to monitor, for example, trucks driving through villages at night to avoid toll fees. The measurements need to be automatic, around the clock and timestamped. Data could either be stored on a USB stick or micro SD card, or transmitted somewhere via Ethernet, WLAN or GSM. PoE or a battery pack might be necessary to allow easy installation.
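The two core computations of such a noise meter are small. A minimal sketch, assuming the samples have already been calibrated to sound pressure in pascals (the calibration itself is hardware-specific and not shown): the standard A- and C-weighting curves from IEC 61672 as gains per frequency, and the equivalent continuous level Leq. A real meter would run the signal through a weighting filter before computing Leq; the helper names are hypothetical.

```python
import math

P_REF = 20e-6  # reference sound pressure in pascals (0 dB SPL)


def a_weighting_db(f: float) -> float:
    """Gain of the A-weighting curve (IEC 61672) at frequency f in Hz."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.00  # offset normalises the curve to 0 dB at 1 kHz


def c_weighting_db(f: float) -> float:
    """Gain of the C-weighting curve (IEC 61672) at frequency f in Hz."""
    f2 = f * f
    rc = (12194.0 ** 2 * f2) / ((f2 + 20.6 ** 2) * (f2 + 12194.0 ** 2))
    return 20.0 * math.log10(rc) + 0.06  # offset normalises the curve to 0 dB at 1 kHz


def leq_db(samples_pa: list[float]) -> float:
    """Equivalent continuous sound level of calibrated samples, in dB SPL."""
    mean_square = sum(p * p for p in samples_pa) / len(samples_pa)
    return 10.0 * math.log10(mean_square / P_REF ** 2)


if __name__ == "__main__":
    sample_rate = 48000
    # One second of a 1 kHz tone with 1 Pa amplitude.
    tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(sample_rate)]
    print(f"Leq: {leq_db(tone):.1f} dB SPL")
    print(f"A(100 Hz): {a_weighting_db(100.0):.1f} dB")
```

A 1 Pa sine gives roughly 91 dB SPL; both curves are close to 0 dB at 1 kHz, while A-weighting attenuates 100 Hz by about 19 dB, which is why truck rumble measured in dB(A) and dB(C) can differ noticeably.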
Second: based on the raw noise data, a neural network should distinguish the vehicle types: truck, tractor, regular car, motorbike. For trucks, perhaps also two weight classes. A counter is needed as well. This allows a detailed traffic profile of a city or village. Of course, this part of the traffic analysis can happen offline. Ideally, the results should be available in the cloud.
The full monty would be if the software could recognize the same vehicle across measurement stations. This would give a detailed profile of the routes taken by cars and, e.g., would also allow the detection of toll avoidance. And while we are at it: the Doppler effect might even allow us to measure the speed of vehicles from the shift in frequency as they approach and then pass the microphone.
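One way the speed estimate could work, sketched under strong assumptions: the vehicle emits a stable tone (for example an engine harmonic) and passes close to the microphone, so the frequency heard while approaching, f_a = f0 * c / (c - v), and while receding, f_r = f0 * c / (c + v), follow the textbook Doppler formulas for a moving source. Combining the two eliminates the unknown source frequency f0 and gives v = c * (f_a - f_r) / (f_a + f_r). The function name and the speed of sound of 343 m/s are assumptions.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C (assumption)


def speed_from_doppler(f_approach: float, f_recede: float,
                       c: float = SPEED_OF_SOUND) -> float:
    """Source speed in m/s from the tone frequency heard before and after pass-by.

    Assumes a stationary microphone close to the road, so the vehicle moves
    almost directly towards and then away from it.
    """
    return c * (f_approach - f_recede) / (f_approach + f_recede)


if __name__ == "__main__":
    # Hypothetical 100 Hz engine tone from a vehicle at 20 m/s (72 km/h).
    v, f0 = 20.0, 100.0
    f_a = f0 * SPEED_OF_SOUND / (SPEED_OF_SOUND - v)
    f_r = f0 * SPEED_OF_SOUND / (SPEED_OF_SOUND + v)
    print(f"{speed_from_doppler(f_a, f_r) * 3.6:.1f} km/h")  # prints 72.0 km/h
```

With a microphone further from the road, the observed shift shrinks with the angle to the vehicle, so a real system would have to fit the whole frequency curve over time rather than compare two values.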
So far I have not been able to find anything like this in the public domain. It would be no problem to buy a sound level meter for this project, but again, most of them only work with Windows and are of no help here. Another alternative could be to use cheap smartphones for this purpose, or a microphone (outside) connected to a Raspberry Pi (inside). Let me know if you are considering this project and we can discuss alternatives.
On my first day at work in 1986 I saw DAC, a grammar-based HDLC protocol implementation written by my friend Claus Gittinger. He used Yacc to generate the code. The advantages were clear: if some weird behavior showed up, we looked at the grammar to see whether there was a protocol violation. This made the protocol very easy to debug and to adapt to changes in the standard. Nowadays, DISEL (based on Coq) seems to allow even more: see "Programming and proving with distributed protocols" (Sergey et al., POPL'18). Its dependent types make sure that the implementation is correct without refinement, and verification is compositional. Let me know if you would like to give it a test drive.
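The idea of checking protocol behavior against a grammar can be tried out on a small scale. A minimal sketch, not the original DAC code: a regular expression over a trace of HDLC frame types (SABM, UA, I, RR, RNR, REJ, DISC) describes a heavily simplified connection life cycle, and any trace that does not match points to a protocol violation. The grammar and the function name are illustrative assumptions; real HDLC also involves sequence numbers, timers and retransmissions, and a Yacc grammar can express far more than a regular expression.

```python
import re

# Hypothetical, heavily simplified connection grammar:
# set up the link, exchange data and supervisory frames, then disconnect.
CONNECTION = re.compile(r"SABM UA(?: (?:I|RR|RNR|REJ))* DISC UA")


def conforms(trace: list[str]) -> bool:
    """True if the sequence of frame types is a sentence of the grammar above."""
    return CONNECTION.fullmatch(" ".join(trace)) is not None


if __name__ == "__main__":
    print(conforms(["SABM", "UA", "I", "RR", "I", "RR", "DISC", "UA"]))  # True
    print(conforms(["SABM", "I", "UA", "DISC", "UA"]))  # False: data before UA
```

When a trace is rejected, the grammar itself documents what was expected, which is exactly the debugging advantage described above.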
Secure processing needs a secure core in the system, a root of trust (RoT). This applies to cars as well as to PCs; see "Tesla's RoT". Let me know if you would like to build something on a Raspberry Pi in an IoT context.
It looks like Meltdown, Spectre and other attacks have finally triggered some fundamental research on hardware and software for secure systems: see the System Security Integrated Through Hardware and Firmware (SSITH) program and the Cyber Grand Challenge (CGC). Let me know if you would like to look into the hardware or the software side (theorem provers, deep learning).
Distributed systems have always been somewhat weak when it comes to modeling their domain. TLA+ could change this. Amazon uses it for complex services, and it is used in classes on distributed systems, e.g. by Murat Demirbas. Your task would be to give it a test drive in a demonstration project.
Currently, a group at HdM is building a universal sensor unit. Several sensors can be installed to collect data on NO2, CO2, noise, temperature, location, other chemicals etc. In the simplest version, someone has to come by with a smartphone to upload the data to a cloud service. If more energy is available, the unit can push data to the cloud by itself via WiFi or GSM. All sensors should have a USB connector. The unit can also be used to measure fine dust particles, using a sensor from Stuttgart.
The paper on timekeeping in VMware VMs made me rather skeptical about the performance of distributed algorithms (e.g. failure detection) in settings with unstable time. The thesis would set up some VMs and test the timing behavior of distributed algorithms for consensus or failure detection. Do jumps in time affect only liveness, or even consistency?
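The effect to be investigated can be reproduced in miniature. A minimal sketch of a timeout-based heartbeat failure detector with an injectable clock; the class names are hypothetical. A simulated forward jump of the clock, as might happen when a paused VM resynchronizes its time, makes the detector suspect a node that may still be alive. Whether such false suspicions cost only liveness or can also endanger consistency depends on how the protocol acts on them, which is exactly the question of the thesis. Reading a monotonic clock such as Python's time.monotonic() avoids jumps caused by adjusting the wall clock, but not every timing anomaly a hypervisor can introduce.

```python
from typing import Callable


class FakeClock:
    """Manually driven clock so that time jumps can be injected in tests."""

    def __init__(self, start: float = 0.0) -> None:
        self.now = start

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class HeartbeatDetector:
    """Suspects a node if no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout: float, clock: Callable[[], float]) -> None:
        self.timeout = timeout
        self.clock = clock
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node: str) -> None:
        self.last_seen[node] = self.clock()

    def suspected(self, node: str) -> bool:
        last = self.last_seen.get(node)
        return last is None or self.clock() - last > self.timeout


if __name__ == "__main__":
    clock = FakeClock()
    detector = HeartbeatDetector(timeout=2.0, clock=clock)
    detector.heartbeat("node-a")
    clock.advance(1.0)
    print(detector.suspected("node-a"))  # False: last heartbeat is 1 s old
    clock.advance(30.0)  # simulated clock jump, e.g. after the VM was paused
    print(detector.suspected("node-a"))  # True, although node-a may be alive
```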
Using bots to collect data from social networks, e.g. to investigate the creation and spreading of fake news.
How to simulate distributed algorithms and their failure models to get a handle on complex protocols like Paxos. Or how to simulate a large scale site using components.
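Such a simulation does not have to be large to be useful. A minimal discrete-time sketch, assuming a single live node that sends one heartbeat per tick over a channel that loses each message independently with a fixed probability and has no delay (both simplifications): it counts how often a timeout-based detector would wrongly suspect the node. Function and parameter names are hypothetical; the fixed seed keeps runs reproducible, so different failure models and timeouts can be compared on the same loss pattern.

```python
import random


def count_false_suspicions(loss_prob: float, timeout: float, interval: float = 1.0,
                           ticks: int = 10_000, seed: int = 1) -> int:
    """Count how often a timeout detector would wrongly suspect a live node.

    The node sends one heartbeat per tick; the simulated channel drops each
    heartbeat independently with probability loss_prob and has zero delay.
    A false suspicion is counted whenever the gap between two delivered
    heartbeats exceeds the timeout.
    """
    rng = random.Random(seed)
    last_delivery_tick = 0
    suspicions = 0
    for tick in range(1, ticks + 1):
        if rng.random() < loss_prob:
            continue  # heartbeat lost by the simulated network
        if (tick - last_delivery_tick) * interval > timeout:
            suspicions += 1
        last_delivery_tick = tick
    return suspicions


if __name__ == "__main__":
    for timeout in (1.5, 2.5, 3.5):
        print(timeout, count_false_suspicions(0.3, timeout))
```

Longer timeouts reduce false suspicions but delay the detection of real crashes; swapping in other failure models (bursty loss, message delay, clock jumps) only requires changing the loop body.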
Social GUI construction and application integration for large-scale multi-touch installations. Social analysis of interaction patterns, technical design and implementation. The scale of those devices makes single-user concepts useless and requires group interfaces.
This has been a sore spot for quite a while: to enable discussions at events, everybody needs access to a microphone (this is also needed for streaming). Unfortunately, until now this has been done with long poles carrying microphones that helpers hold out into the audience, with large numbers of stationary microphones hanging from the ceiling, or by having speakers leave their seats and walk over to fixed microphones. All of this is either too expensive or too clumsy, given our narrow rows of seats and the need to hold events in a different room each time.
My idea involves throwable microphones that can be centrally controlled and are activated by squeezing the ball. Most discussions involve only a few people at a few locations within a larger room, so a small number of these ball mics would be distributed across the room and speakers could get access quickly. One variant uses balloons hanging from the ceiling.
The article "the problem of many speakers" gives an explanation of the context.
If you are interested in language processing, take a look at the Forum Open Language Translation (FOLT). They are working on open-source translation support software called TMOSS (Translation Memory Open Source System) and are looking for participants and developers. You can download an exposé of their ideas.
1. Security Evaluation of XML Firewalls and Web Application Firewall (WAF) Products:
- create a market overview
- describe current technologies
- describe upcoming technologies
Prepare a decision on which product best meets the company's requirements. The company is a large, global enterprise. The thesis includes theoretical parts as well as the integration of WAF technology into the infrastructure. The company could probably create two theses from this one.
2. Modeling of Large-Scale System Architectures: definition of a generic "Dictionary of elementary functions" and creation of a guide on how to define and model a functional model. This thesis should create the "role model" for functional modeling within a company. No specific operating system knowledge is required.
4. User Analytics: Investigate methods to track user behavior, not only in web applications. This includes the generation of attention metadata from behavior data (log files, clicks etc.) and the analytical methods behind it. Application instrumentation etc. is also a topic. Classic tools like WebTrends are too narrow in this case. Search applications are one example, but the results should also be applied to other types of applications. Read "Programming Collective Intelligence" to get an idea of the machine learning methods that are useful for tracking users. Read "Findability" to get an idea of their application in search engines.
5. Performance Analytics, Reporting and Monitoring in a Global Enterprise: This work deals with the problem of performance in distributed environments with multiple communication protocols and architectures. The goal is to create a monitoring system that allows tracking of complete business processes and the generation of high-level events from the combination of several low-level events (complex event processing). Applicants should be familiar with J2EE and perhaps other types of middleware and have an interest in broad problem spaces. Analytics and reporting are also involved.
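The core of complex event processing, deriving a high-level event from several correlated low-level ones, fits in a few lines. A minimal sketch, assuming each low-level event carries a correlation id for the business process it belongs to; the event kinds, the three-step process and the time limit are all hypothetical. A production system would use a CEP engine with sliding windows and handling for late or out-of-order events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    correlation_id: str  # identifies one business process instance
    kind: str
    timestamp: float  # seconds


# Hypothetical business process made up of three low-level steps.
REQUIRED_STEPS = frozenset({"order_received", "payment_ok", "shipped"})


def derive_high_level_events(events, max_duration: float):
    """Emit (process id, high-level kind, duration) once all steps were seen."""
    started: dict[str, float] = {}
    seen: dict[str, set] = {}
    derived = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        started.setdefault(ev.correlation_id, ev.timestamp)
        steps = seen.setdefault(ev.correlation_id, set())
        steps.add(ev.kind)
        if REQUIRED_STEPS <= steps:
            duration = ev.timestamp - started.pop(ev.correlation_id)
            del seen[ev.correlation_id]
            kind = "process_completed" if duration <= max_duration else "process_too_slow"
            derived.append((ev.correlation_id, kind, duration))
    return derived


if __name__ == "__main__":
    log = [
        Event("A", "order_received", 0.0),
        Event("B", "order_received", 0.5),
        Event("A", "payment_ok", 1.0),
        Event("A", "shipped", 2.0),
        Event("B", "payment_ok", 5.0),
        Event("B", "shipped", 30.0),
    ]
    for item in derive_high_level_events(log, max_duration=10.0):
        print(item)
```

Only the derived events (a completed process, a process that was too slow) would be reported upwards, which keeps monitoring at the level of business processes instead of individual messages.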
Technical, legal and organizational aspects of e-voting in a university environment. Security analysis, policy etc. Evaluation of open source e-voting programs.
Use random access methods to uncover security problems in applications and system software.
Test peer-to-peer gaming frameworks for reliability and performance. Investigate bottlenecks and design issues with MMOGs.
Use the new language by OMG for system architecture modelling - together with some excellent simulators and generators from Ilogix.
(1) High-Availability Infrastructures, small and embedded systems (together with Stephan Rupp, Kontron) (2) Software-Technik Praktikum: compare concepts like OSGi with middleware for high-availability (with Stephan Rupp, Kontron).
Take Kevin Mitnick's book "The Art of Deception" and create a classification of the user errors that were exploited by Mitnick. Build a conceptual user model from them and apply it to browser security. Does browser security work on the same semantic level? Is there a huge gap between the resources the browser tries to protect and how people think and act? Speculate on better machine representations of users' semantic models. This could be done together with a student from the usability or information design faculties.
Develop new Fuzzing Tools and embed them into a new testing methodology.
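To get a feeling for what such a tool does at its core, here is a minimal mutation fuzzer. It is only a sketch: the target parse_record is a hypothetical toy parser with a deliberately planted bug (it trusts a length byte), and the fuzzer simply overwrites one random byte of a valid seed input per run. A ValueError counts as a clean rejection; any other exception, here an IndexError, is recorded as a finding. Real fuzzers such as AFL or libFuzzer add coverage feedback, corpus management and crash triage on top of this loop.

```python
import random
from typing import Callable


def parse_record(data: bytes) -> int:
    """Hypothetical toy parser for [length byte][payload]; returns the last payload byte.

    Planted bug: the announced length is trusted without checking it against
    the number of payload bytes actually present.
    """
    if not data:
        raise ValueError("empty record")
    length = data[0]
    payload = data[1:1 + length]
    return payload[length - 1]  # IndexError if the length byte lies (or is 0)


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(target: Callable[[bytes], object], seed_input: bytes,
         iterations: int, seed: int = 0) -> list[tuple[bytes, str]]:
    """Run mutated inputs against target and collect unexpected exceptions."""
    rng = random.Random(seed)
    findings = []
    for _ in range(iterations):
        candidate = mutate(seed_input, rng)
        try:
            target(candidate)
        except ValueError:
            continue  # clean rejection of bad input: not a finding
        except Exception as exc:  # anything else counts as a crash
            findings.append((candidate, type(exc).__name__))
    return findings


if __name__ == "__main__":
    crashes = fuzz(parse_record, b"\x03abc", iterations=1000)
    print(len(crashes), "crashing inputs, e.g.", crashes[:1])
```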
Take the thesis from Mirko Bleyh on the modeling of operational aspects as a starting point to model security aspects.
One day the bandwidth to the internet may not be large enough for search engine companies: too many new pages join the web every day. How could we use the distributed computing power of edge machines for a better search index? What if we combine distributed hash table technology with a SETI@home-like approach? Take a look at Grub or YaCy. And don't forget the tradingcenter project by Ron Kutschke and Markus Block, which implements a distributed auction platform and could be used as a starting point for a distributed search engine.
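The distributed hash table part can be illustrated with consistent hashing, the placement scheme used by DHTs such as Chord: search terms and peers are hashed onto the same ring, and the index entries for a term belong to the first peer at or after the term's position. A minimal sketch; the peer names and the choice of SHA-1 are assumptions, and a real DHT adds routing tables, replication and handling of peers that come and go. The property that matters for volunteer edge machines is that a joining peer only takes over keys from its successor, while all other assignments stay put.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a peer name or search term onto a 160-bit ring using SHA-1."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class HashRing:
    def __init__(self, peers: list[str]) -> None:
        # Sorted (position, peer) pairs represent the ring.
        self._ring = sorted((ring_position(p), p) for p in peers)

    def add_peer(self, peer: str) -> None:
        bisect.insort(self._ring, (ring_position(peer), peer))

    def responsible_peer(self, term: str) -> str:
        """First peer clockwise from the term's position, wrapping around."""
        pos = ring_position(term)
        idx = bisect.bisect_left(self._ring, (pos, ""))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = HashRing(["peer-a", "peer-b", "peer-c"])
    for term in ("distributed", "search", "index"):
        print(term, "->", ring.responsible_peer(term))
```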
Use something like the UIMA framework (see IBM alphaworks) to design a system where "higher" modules (e.g. a semantic module) can give hints back to "lower" modules like word recognizers or taggers.
Enterprise-level security is far from easy. Architectures like J2EE and .NET try to hide the complexity while still allowing all the flexibility needed: code access security, JAAS, identities and run-as modes, delegation and tracking, backend access and registries. And tons of APIs to encrypt, create secure sessions, declare or program calls and so on. Add interoperability via GSS-API or the Web services interfaces, and developers face quite some challenges. One of the best books on J2EE 1.3 security that I have found so far is, surprisingly, a book on mainframe security: the z/OS WebSphere Application Server V5 and J2EE 1.3 Security Handbook, with its additional material. It shows how requests flow from the DMZ through web and application servers to backend services and databases on mainframes. And it gives a good explanation of how identities, roles and privileges change during such a flow, based on the capabilities of J2EE and its declarative or programmatic features.
The additional material is also very interesting. Some people at IBM tried to verify all security-related interfaces with example programs, called SWIPE. The code is available, and my idea was to turn it into a learning facility for secure programming. It could start as a Software-Technik-Praktikum at HdM, where a group of students could try to improve the demo application(s) and learn a lot about security APIs and infrastructure. GO AND PORT IT TO JBOSS!!
In the meantime, Dennis Pilipchuk is exploring Web services security and the available frameworks, which could go into the demonstration software as well.
Anand Raman wrote a nice piece on how to create an anonymous authentication module, and manages to explain basic J2EE security principles at the same time.
The result would be a much better understanding of security infrastructure and programming.
It is not only universities that live from a network of partners, friends and former members. Joint projects, sponsoring etc. all need dependable data about contacts, interests and other information. The goal of this project would be to build an information base that can not only keep simple information about persons but can also store the relations between partners. Topic Maps could be one approach here. The frontend should be web-based. Access control should allow some of the information to be public and other (strategic) parts to be available only to authorized persons. Students could use this base to find business partners for their theses. University personnel could use it to raise funds or plan events.
The idea is to use smartphones, generic UI devices and wireless headsets in helmets to provide groups of motorbikers with a cheap group communication feature based on Bluetooth wireless networks. A more detailed description can be found here.
Investigate the technological and social mechanisms behind the new darknets - cryptographically closed environments for content swapping etc.
For this thesis, the student should perform a domain analysis with the goal of finding commonalities and variations within a certain domain. An existing application from this business domain should then be analysed for existing (or missing) hot spots, and the results combined with those of the domain analysis. What could a software production line in this business domain look like? What other types of applications in this domain would then be possible?
I did a little analysis of a network of sites using 0190 dialers to rip off the unsuspecting: The truth behind "alkoholikerinnen.de". I noticed that the GUI design tries everything to obscure the fact that these sites want you to download and use 0190 dialers without noticing it. In many cases price information is either missing or printed in silver, which is especially easy to overlook. Then I came across an excellent article on user interaction design and security by Ka-Ping Yee, "User Interaction Design for Secure Systems", and found lots of good ideas there.
Yee lists 10 principles of sound interaction design:
Users must understand when and what kind of authority they grant to other actors, such as programs. This is perhaps the most important point, and it also seems to require a concept of capabilities as a means of restricting the delegation of authority. In most systems users have few choices when they run a program, because the program automatically inherits all of the user's rights. So there is an interesting architectural problem behind this as well. See our current work on capabilities.
The others include:
- Path of Least Resistance
- Trusted Path (or why an NT login requires Ctrl-Alt-Del to be pressed)
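Yee's point that users should know what authority they hand to a program fits well with capability-style design: instead of inheriting all of the user's rights, a program receives only references to what it needs, and those references can be revoked. A minimal sketch of the revocable forwarder ("caretaker") pattern from the object-capability literature; the names are hypothetical, and Python itself cannot enforce this discipline (untrusted code could still import os), so real confinement needs support from the language runtime or the operating system.

```python
from typing import Callable


def make_revocable(target: Callable[..., object]):
    """Return (proxy, revoke): proxy forwards calls to target until revoke() is called."""
    state = {"target": target}

    def proxy(*args, **kwargs):
        current = state["target"]
        if current is None:
            raise PermissionError("capability has been revoked")
        return current(*args, **kwargs)

    def revoke() -> None:
        state["target"] = None

    return proxy, revoke


def untrusted_plugin(append_line: Callable[[str], object]) -> None:
    """Hypothetical program: it was only given the power to append lines."""
    append_line("hello from the plugin")


if __name__ == "__main__":
    log: list[str] = []
    append, revoke = make_revocable(log.append)
    untrusted_plugin(append)  # granted: the plugin may append to this one log
    revoke()  # the user withdraws the authority again
    try:
        untrusted_plugin(append)
    except PermissionError as exc:
        print("denied:", exc)
    print(log)
```

The grant is explicit (the user decides which function to pass) and revocable, which is the kind of visible, limited delegation the principle asks for.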
Some examples that come to mind: the typical dialog for establishing an SSL session does not convey the most important point, namely that SSL does NOT guarantee that the receiver really is who you THINK it is, and that it is YOUR job to verify the identity. If you don't understand this point, go and read Eric Rescorla's wonderful book "SSL and TLS: Designing and Building Secure Systems" or have a look at my lecture on web application security.
You could try to come up with a better dialog e.g.