RSDA

 

2nd IEEE International Workshop on Reliability and Security Data Analysis
 

RSDA Welcome and Opening

Monday, Nov. 3, 09:15 - 09:30 - Hotel RC - Catalana Room

Antonio Pecchia, Olivier Thonnard. Welcome Message

 

RSDA #1: Vulnerability Analysis

Monday, Nov. 3, 09:30 - 10:30 - Hotel RC - Catalana Room

Chair: Antonio Pecchia, Federico II University of Naples

1.     Bindu Madhavi Padmanabhuni and Hee Beng Kuan Tan. Predicting Buffer Overflow Vulnerabilities through Mining Light-Weight Static Code Attributes

2.     Jeffrey Stuckman and James Purtilo. Mining security vulnerabilities from Linux distribution metadata

 

RSDA Keynote 1

Monday, Nov. 3, 11:00 - 11:30 - Hotel RC - Catalana Room

Chair: Olivier Thonnard, Symantec Research Labs/Amadeus

Leyla Yumer. Keynote: From Script-kiddies to Cyberwars...

 

RSDA #2: Security Data Mining

Monday, Nov. 3, 11:30 - 12:30 - Hotel RC - Catalana Room

Chair: Olivier Thonnard, Symantec Research Labs/Amadeus

1.     Carlo Schäfer. Detection of Compromised Email Accounts used by a Spam Botnet with the Country Counting and the Theoretical Geographical Travelling Speed Extracted from Metadata

2.     Andrea Paudice, Santonu Sarkar and Domenico Cotroneo. An Experiment with Conceptual Clustering for the Analysis of Security Alerts

 

RSDA #3: Data-driven Analysis

Monday, Nov. 3, 14:00 - 15:30 - Hotel RC - Catalana Room

Chair: Marcello Cinque, Federico II University of Naples

1.     Xin Chen, Charng-Da Lu and Karthik Pattabiraman. Failure Prediction of Jobs in Compute Clouds: A Google Cluster Case Study

2.     Cagatay Turkay, Stephen Mason, Bojan Cukic and Ilir Gashi. Supporting Decision-making for Biometric System Deployment through Visual Analysis

3.     João Matos, Nuno Coração and João Garcia. Record and Replay GUI-based Applications with Less Overhead

 

RSDA #4: Secure and Reliable Design

Monday, Nov. 3, 16:00 - 18:00 - Hotel RC - Catalana Room

Chair: Keun Soo Yim, Google

1.     Stefano Rosiello, Amish Choudhary, Arpan Roy and Rajeshwari Ganesan. Black Box vs. White Box Testing of an Enterprise SaaS Service

2.     Peter T. Breuer and Jonathan P. Bowen. Avoiding Hardware Aliasing: Verifying RISC Machine and Assembly Code for Encrypted Computing

3.     Ingo Stierand, Sunil Malipatlolla, Sibylle Fröschle, Alexander Stühring and Stefan Henkler. Integrating the Security Aspect into Design Space Exploration of Embedded Systems

4.     Anna Kobusinska. On the equivalence of SOA rollback-recovery consistency models

 

RSDA Keynote 2

Tuesday, Nov. 4, 14:00 - 15:00 - CC - Room A

Chair: Domenico Cotroneo, Federico II University of Naples

Brendan Murphy. Keynote: The role of data analytics in reliability and security verification

 

RSDA #5: Anomaly Detection

Tuesday, Nov. 4, 15:00 - 15:30 - CC - Room A

Chair: Domenico Cotroneo, Federico II University of Naples

1.     Flavio Frattini, Santonu Sarkar, Jyotiska Nath Khasnabish and Stefano Russo. Using Invariants for Anomaly Detection: The Case Study of a SaaS Application

 

RSDA #6: Internet Threats and Countermeasures

Tuesday, Nov. 4, 16:00 - 18:00 - CC - Room A

Chair: Ilir Gashi, City University London

1.     Andrea Di Florio, Nino Vincenzo Verde, Antonio Villani, Domenico Vitali and Luigi Mancini. Bypassing Censorship: a proven tool against the recent Internet censorship in Turkey

2.     Miranda Mowbray and Josiah Hagen. Finding Domain-Generation Algorithms by Looking at Length Distributions

3.     Wim Mees and Thibault Debatty. Multi-Agent System for APT Detection

4.     Cesario Di Sarno, Nadir Murru, Felicita Di Giandomenico, Silvano Chiaradonna, Alessia Garofalo and Gianfranco Cerullo. Power Grid Outlier Treatment through Kalman Filter

 

RSDA Closing Remarks

Tuesday, Nov. 4, 18:00 - 18:15 - CC - Room A


Predicting Buffer Overflow Vulnerabilities through Mining Light-Weight Static Code Attributes
Static code attributes are widely used in defect prediction studies as an abstraction model because they capture general properties of a program. To counter buffer overflow exploits, programmers use buffer size checks and input validation schemes. In this paper, we propose light-weight, easily extracted static code attributes that characterize the buffer overflow safety mechanisms and input validation checks implemented in the code. We then apply data mining methods to the collected attributes to predict buffer overflows in application programs. In our experiments across five applications, our best classifier achieved a recall of 95% and a precision above 80%, suggesting that the proposed static code attributes are effective indicators for predicting buffer overflows.

Mining security vulnerabilities from Linux distribution metadata
Security vulnerability research has long been hindered by the difficulty of obtaining structured, detailed data on individual vulnerabilities in sufficient quantities for analysis. We mined vulnerabilities from the historical changelog data of Linux distribution packages, tapping a previously unexplored source of security data. Changelogs provide a unified view of an application's evolution, version branching structure, and vulnerability patching history, allowing large-scale compilation of data on the susceptible (and, in some cases, non-susceptible) versions of the original application for each vulnerability. We then compiled vulnerability datasets for multiple releases of Debian and Ubuntu Linux and analyzed trends in vulnerability patching over time. Patching practices in Debian and Ubuntu were similar, and patch rates stayed constant throughout each distribution's lifetime.

Detection of Compromised Email Accounts used by a Spam Botnet with Country Counting and Theoretical Geographical Travelling Speed Extracted from Metadata
Seventy-six percent of spam and phishing emails originate from botnets. Botnets use compromised email accounts to send junk mail through other SMTP servers to its destination. Research commonly focuses on the rapid detection of compromised accounts to protect the integrity of other systems. One way to do this is to scan the email content, or to limit the number of messages that can be sent from an IP address or an account within a specified time period; an anomaly is detected when the limit is reached or spam emails are identified. The objective of the presented research is to detect the anomaly using geolocation and country counting, without knowledge of the email content. A second method, called Theoretical Geographical Travelling Speed, was developed to raise the detection rate without false negatives. The proposed method detects a compromised account seven times faster than the default rate limit.
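
To illustrate the travelling-speed idea, the following is a minimal Python sketch, not the author's implementation. It assumes each login carries a timestamp in hours and a geolocated latitude/longitude, computes great-circle distance with the haversine formula, and flags consecutive logins whose implied speed exceeds a threshold. The Login type, the function names and the 1000 km/h threshold are hypothetical choices made for illustration. Country counting is shown simply as the number of distinct countries seen for an account.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0
# Hypothetical threshold (not from the paper): roughly airliner cruising speed.
MAX_PLAUSIBLE_SPEED_KMH = 1000.0


@dataclass
class Login:
    timestamp_h: float  # login time in hours (assumed unit)
    lat: float          # geolocated latitude of the client IP
    lon: float          # geolocated longitude of the client IP
    country: str        # geolocated country code


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def travel_speed_violations(logins):
    """Return (previous, current, speed) for consecutive logins that imply impossible travel."""
    ordered = sorted(logins, key=lambda entry: entry.timestamp_h)
    flagged = []
    for prev, cur in zip(ordered, ordered[1:]):
        dist = haversine_km(prev.lat, prev.lon, cur.lat, cur.lon)
        dt = cur.timestamp_h - prev.timestamp_h
        if dt > 0:
            speed = dist / dt
        else:
            # Simultaneous logins from two different places imply infinite speed.
            speed = math.inf if dist > 0 else 0.0
        if speed > MAX_PLAUSIBLE_SPEED_KMH:
            flagged.append((prev, cur, speed))
    return flagged


def distinct_countries(logins):
    """Country counting: number of distinct countries seen for one account."""
    return len({entry.country for entry in logins})
```

For example, a login geolocated in Rome followed one hour later by a login from New York implies a speed of several thousand km/h and would be flagged, whereas two logins from the same place a few hours apart would not.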

An Experiment with Conceptual Clustering for the Analysis of Security Alerts
In response to attacks against corporate and enterprise networks, administrators deploy intrusion detection systems, monitors, vulnerability scanners and logging systems. These systems monitor and record host and network device activity, searching for signs of anomalies and security incidents. In doing so, they generally produce a huge number of alerts that overwhelm security analysts. This paper proposes applying a conceptual clustering technique to filter alerts and presents the results obtained on seven months of security alerts generated in a real, large-scale SaaS Cloud system. The technique has proven useful in supporting the manual analysis activities conducted by the operations team of the reference Cloud system.

Failure Prediction of Jobs in Compute Clouds: A Google Cluster Case Study
Most cloud computing clusters are built from unreliable, commercial off-the-shelf components. The high failure rates of their hardware and software components result in frequent node and application failures. It is therefore important to predict application failures before they occur, to avoid wasting resources. In this paper, we investigate how to identify application failures based on resource usage measurements from the Google cluster traces. We apply recurrent neural networks to the resource usage measurements and generate features to classify the input resource usage time series into different classes. Our results show that the model is able to predict failures of batch applications, which are the dominant jobs in the Google cluster. Moreover, we explore early classification to identify failures, and find that the prediction algorithm gives the cloud system enough time to take proactive action well before applications terminate, yielding average resource savings of 6% to 10%.

Supporting Decision-making for Biometric System Deployment through Visual Analysis
Deploying a biometric system in a specific environment is not straightforward. Based on pre-deployment performance test results, a decision maker needs to select sensors and matching algorithms in terms of cost, expected false-match and false-non-match rates, and the underlying quality factors, which depend on operational scenarios, personnel training, demographics, etc. In this paper, we investigate information aggregation through visualization of fingerprint authentication experiments from a large-scale data collection with 494 participants. The data was collected using four biometric image capture devices. Each fingerprint image was analyzed with two image quality algorithms, and the matching scores were generated using three different matchers. Additionally, we collected demographic characteristics, such as gender, age, ethnicity, height and weight, and analyzed their impact on system performance.

Record and Replay GUI-based Applications with Less Overhead
Debugging is typically a hard and time-consuming task. Fault-replication mechanisms facilitate debugging by providing software developers with an error’s “steps-to-reproduce”. The main challenge of fault replication is the overhead imposed by recording all non-deterministic events of an execution, such as thread interleavings and user interaction with the application. The overhead imposed by user input is especially significant for GUI-based applications. This paper proposes a new approach to recording and replaying user interactions with the GUI that significantly reduces the amount of recorded information. We developed an open-source implementation of an execution-recording framework and evaluated it on a test bed that includes real bugs from well-known applications. On average, the number of recorded events was reduced by a factor of 3567.

Combining Black Box Testing with White Box Code Analysis: A Heterogeneous Approach for Testing Enterprise SaaS Applications
Faulty enterprise applications may produce incorrect outputs or perform below service expectations due to code vulnerabilities that do not show up in standard code analyzers (e.g., CAST [1]). A tester can uncover such functionality or performance issues through black box functional testing and fault injection. Then, based on the specific test scenario, targeted white box code analysis can identify the code errors causing the functionality or performance issue. In this paper, we apply such a heterogeneous testing approach, combining black box testing with white box code analysis, to test an enterprise licensing application (ELA). We describe experiments designed to uncover functionality and performance issues in ELA and then explore the code errors causing these issues. We find that our approach enables faster detection and fixing of application performance and functionality errors than white box code analysis alone.

Avoiding Hardware Aliasing: Verifying RISC Machine and Assembly Code for Encrypted Computing
'Hardware aliasing' classically arises when a processor, through failure or design, has fewer address bits than are required to address each memory location uniquely; nowadays it also arises in the context of homomorphically encrypted computing. In such contexts, different physical locations may sporadically be accessed by the same address, due to factors that are not directly under the programmer's control but are still deterministic. We check RISC machine and assembly code to ensure that each memory address is calculated in exactly the same way at a write and at the next read from it, which allows programs to work correctly even in a hardware aliasing context.

Integrating the Security Aspect into Design Space Exploration of Embedded Systems
Conventionally, design space exploration (DSE) in embedded system design treats performance, energy and cost as the main optimization objectives. However, in many domains, such as modern-day cars, security is becoming increasingly significant. At the same time, including security adds a new dimension to the complexity of already large design spaces, so automated support for it is highly desirable. The goal of this work is to integrate security constraints into an automated DSE process to obtain an architecture that is both cost-optimized and secure. Specifically, for a given system, our approach defines a formal notion of security, which, along with other parameters, is fed as input to the DSE process to obtain an architecture satisfying the defined security and real-time requirements. We also evaluate the proposed approach on an example automotive embedded system.

On the equivalence of SOA rollback-recovery consistency models
Different recovery consistency models define when the recovered state of SOA processing is perceived as consistent. In this paper, we prove that, with an appropriate definition of services, the relaxed rollback-recovery models are equivalent to the strict recovery model, and that they allow a consistent service state to be recovered at lower cost than the strict recovery consistency model.

Using Invariants for Anomaly Detection: The Case Study of a SaaS Application
Invariants represent properties of a system that are expected to hold when everything goes well. Thus, the violation of an invariant most likely corresponds to the occurrence of an anomaly in the system. In this paper, we discuss the accuracy and completeness of an anomaly detection system based on invariants. Our case study is a back-end operation of a SaaS platform. The results show the soundness of the approach, and we discuss the impact of the invariant mining strategy on detection capabilities, both in terms of accuracy and of time to reveal violations.
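
As a rough illustration of invariant-based detection, the sketch below mines one simple kind of invariant, a near-constant ratio between two monitored counters, from fault-free samples, and then reports which invariants a new sample violates. This is an assumed, swapped-in mining strategy chosen for brevity, not necessarily the one studied in the paper; the metric names in the usage example and the 5% tolerance are hypothetical.

```python
from itertools import combinations


def mine_ratio_invariants(samples, rel_tol=0.05):
    """Mine invariants of the form a ≈ k * b from fault-free samples.

    samples: non-empty list of dicts mapping metric name -> value, all with the same keys.
    A pair (a, b) is kept if every observed ratio a/b lies within rel_tol of the mean ratio.
    """
    metrics = sorted(samples[0])
    invariants = {}
    for a, b in combinations(metrics, 2):
        if any(s[b] == 0 for s in samples):
            continue  # ratio undefined for some sample; skip this pair
        ratios = [s[a] / s[b] for s in samples]
        mean = sum(ratios) / len(ratios)
        if mean != 0 and all(abs(r - mean) <= rel_tol * abs(mean) for r in ratios):
            invariants[(a, b)] = mean
    return invariants


def find_violations(sample, invariants, rel_tol=0.05):
    """Return the (a, b) pairs whose mined ratio does not hold in `sample`."""
    violated = []
    for (a, b), k in invariants.items():
        if sample[b] == 0 or abs(sample[a] / sample[b] - k) > rel_tol * abs(k):
            violated.append((a, b))
    return violated
```

For instance, if training runs always show as many responses as requests and twice as many database queries, a run with 150 requests but only 100 responses violates the request/response and query/response invariants, while an unrelated metric such as CPU load never becomes an invariant in the first place. How invariants are mined (which relations, what tolerance, how much training data) directly affects what can be detected, which is the trade-off the abstract refers to.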

Bypassing Censorship: a proven tool against the recent Internet censorship in Turkey
Users of mobile devices face great difficulty circumventing Internet censorship technologies that violate human rights. Mobile users do not have full control of their own systems and, in many cases, cannot even change the configuration imposed by their 3G/4G providers. Such limitations allow providers acting under government authority to enforce specific Internet filtering mechanisms and to prevent access to censored material. In this paper, we survey the events related to the Internet censorship that took place in Turkey during the first months of 2014, and we introduce DNSet, an Android app that Turkish citizens have used to successfully circumvent that censorship. In particular, DNSet allows mobile users to easily change the DNS server imposed by their 3G/4G providers without requiring administrative rights on the device (i.e., without rooting it). We report on data and information collected anonymously through the DNSet application. Furthermore, we raise the suspicion that some censorship activities in Turkey began at least a month before the official ban on Twitter.

Finding Domain-Generation Algorithms by Looking at Length Distribution
To detect malware that uses domain fluxing to circumvent blacklisting, it is useful to be able to discover new domain-generation algorithms (DGAs) that are being used to produce algorithmically generated domains (AGDs). This paper presents a procedure for discovering DGAs from Domain Name System (DNS) query data. It works by identifying client IP addresses with an unusual distribution of second-level string lengths in the domain names they query. Running this fairly simple procedure on five days of data from a large enterprise network uncovered 19 different DGAs, nine of which had not previously been identified. Samples and statistical information about the DGA domains are given.
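
The core idea, flagging clients whose second-level label lengths are distributed unusually, can be sketched as follows. This is a minimal Python illustration under stated assumptions rather than the authors' procedure: the second-level label is naively taken as the label left of the last dot (so suffixes such as co.uk are not handled), clients are compared against the population-wide length distribution using total variation distance, and the threshold of 0.5 and the minimum of 20 queries are hypothetical parameters.

```python
from collections import Counter, defaultdict


def second_level_label(domain):
    """Naive second-level label: the label left of the last one (ignores suffixes like co.uk)."""
    labels = domain.lower().rstrip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def length_distribution(domains):
    """Normalized histogram of second-level label lengths for a non-empty list of domains."""
    counts = Counter(len(second_level_label(d)) for d in domains)
    total = sum(counts.values())
    return {length: c / total for length, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def unusual_clients(queries, threshold=0.5, min_queries=20):
    """queries: iterable of (client_ip, domain) pairs.

    Returns {client_ip: distance} for clients whose length distribution is far
    from the population-wide one (hypothetical threshold and minimum volume).
    """
    by_client = defaultdict(list)
    for ip, domain in queries:
        by_client[ip].append(domain)
    baseline = length_distribution([d for ds in by_client.values() for d in ds])
    flagged = {}
    for ip, domains in by_client.items():
        if len(domains) < min_queries:
            continue  # too few queries for a meaningful histogram
        dist = total_variation(length_distribution(domains), baseline)
        if dist > threshold:
            flagged[ip] = dist
    return flagged
```

A client that queries many algorithmically generated names of one fixed length concentrates its histogram on a single length and therefore stands out from clients whose traffic mixes ordinary domain names of varied lengths; the flagged clients are then candidates for manual inspection of the domains they queried.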

Multi-Agent System for APT Detection
Advanced Persistent Threats (APTs) are targeted cyber attacks carried out over a long period of time by highly skilled attackers. The ever-increasing number of successful attacks indicates that classical network protection solutions (firewalls, intrusion detection systems, proxies, etc.) are no longer sufficient. Therefore, in this paper we propose a new system that combines multiple approaches using advanced aggregation techniques to achieve better detection performance. We also test the system on real data from a small corporate network and show that it attains a high ratio of detection probability to false alarm probability.

Power Grid Outlier Treatment through Kalman Filter
Power grid monitoring is a very challenging task, needed to avoid both disasters and economic damage, because many critical services, e.g., health and welfare, rely on the power grid as a dependable utility. A Wide Area Monitoring System (WAMS) is a technological solution for power grid monitoring. WAMS is based on two components: the Phasor Measurement Unit (PMU) and the Phasor Data Concentrator (PDC). PMUs are deployed within the power grid to provide synchronized measurements of both the magnitude and the phase angle of the electric signal. The PDC analyzes the measurements provided by different PMUs to assess the global status of the power grid. In this paper, we propose an enhanced WAMS architecture based on the Kalman filter to improve state estimation and fault detection within the power grid; the Kalman filter reduces the effect of noise on the measurements gathered by PMUs. Experimental results show the effectiveness of the proposed solution.
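
To show how a Kalman filter can suppress noise and isolate outliers in a stream of PMU readings, here is a minimal scalar sketch. It assumes a random-walk model for a single quantity (for example, a voltage magnitude in per-unit), hypothetical process and measurement noise variances, and a 3-sigma innovation gate that skips outlying measurements; the filter design in the paper may differ.

```python
def kalman_filter(measurements, process_var=1e-4, meas_var=0.01, gate=3.0, x0=None, p0=1.0):
    """Scalar random-walk Kalman filter with innovation gating.

    Model (assumed): x_k = x_{k-1} + w_k, z_k = x_k + v_k,
    with w_k ~ N(0, process_var) and v_k ~ N(0, meas_var).
    A measurement whose innovation exceeds `gate` standard deviations is treated
    as an outlier: it is reported and the prediction is kept without an update.
    Returns (estimates, outlier_indices).
    """
    x = measurements[0] if x0 is None else x0
    p = p0
    estimates, outliers = [], []
    for i, z in enumerate(measurements):
        # Predict: the state carries over, its uncertainty grows by the process noise.
        p = p + process_var
        # Innovation (measurement residual) and its variance.
        y = z - x
        s = p + meas_var
        if y * y > gate * gate * s:
            outliers.append(i)  # reject the measurement, keep the prediction
        else:
            k = p / s           # Kalman gain
            x = x + k * y       # corrected state estimate
            p = (1 - k) * p     # corrected estimate variance
        estimates.append(x)
    return estimates, outliers
```

For instance, with readings around 1.0 and a single spike of 5.0, the spike's innovation falls far outside the gate, so it is reported as an outlier and the estimate stays near 1.0; the remaining readings are smoothed toward their common value.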