Workshop on Knowledge Discovery and Cybersecurity

By Felipe Zipitría | April 20, 2016

Program

09:00 - Formal Concept Analysis (FCA) - Basics and Beyond

Amedeo Napoli (INRIA Nancy, France)

Knowledge discovery in large and complex datasets is one of the main topics addressed by so-called Data Science, but it is also a topic of major interest for the Science of Knowledge (or Artificial Intelligence). Indeed, data and knowledge interact: knowledge discovery is applied to datasets and has a direct impact on the design of knowledge bases (or ontologies). Following this idea, it is useful to have at hand a generic formalism that can support knowledge discovery as well as knowledge representation and reasoning. Accordingly, in this presentation we introduce Formal Concept Analysis (FCA), a mathematical formalism for data and knowledge processing. FCA starts with a binary table composed of objects and attributes and outputs a concept lattice, where each concept is made of an intent (i.e. the description of the concept in terms of attributes) and an extent (i.e. the objects that are instances of the concept). Intents and extents are two dual facets of a concept that apply naturally in knowledge representation. There are two main variations of FCA: Relational Concept Analysis (RCA) for dealing with relational data, and Pattern Structures (PS) for dealing with complex data (numbers, sequences, trees, graphs). We will discuss the usability of FCA and its variations in knowledge discovery and knowledge engineering through various tasks and applications, such as data and text mining, information retrieval, biclustering and recommendation, and extraction of functional dependencies. Finally, the structure of a concept lattice can be visualized, which allows a suggestive interpretation by human agents, while it can also be processed by software agents.
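
To make the notions of intent and extent concrete, the following is a minimal sketch of how the formal concepts of a small binary table could be enumerated by brute force. The toy context (animals and their attributes) and all function names are assumptions made for illustration; they do not come from the talk, and practical FCA tools use far more efficient algorithms.

```python
from itertools import combinations

# Hypothetical toy context: each object maps to the set of attributes it has.
CONTEXT = {
    "duck": {"flies", "swims"},
    "eagle": {"flies", "hunts"},
    "shark": {"swims", "hunts"},
}


def common_attributes(objects, context):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(attributes, context):
    """Objects that have every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects.

    Brute force: exponential in the number of objects, for illustration only.
    """
    concepts = set()
    objects = sorted(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(CONTEXT), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

Each pair produced is closed in both directions: the intent is exactly the set of attributes shared by the extent, and the extent is exactly the set of objects having the intent. Ordering these pairs by inclusion of their extents gives the concept lattice.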

10:30 - Coffee Break

11:00 - FCA and Biclustering - An application to entity resolution in Starcraft 2

Víctor Codocedo (LORIA/INRIA/CNRS - Nancy, France)

Formal Concept Analysis is a data analysis and mining formalism based on the extraction of maximal rectangles from an object-attribute incidence table, or formal context. The pattern structure model generalizes FCA, adapting it to deal with complex object descriptions such as numerical data, sequences, intervals, graphs and partitions. In this talk we provide an introduction to the model of pattern structures and, in particular, we discuss how it can be exploited to enumerate similar-value biclusters from a matrix. Biclusters are a special type of cluster in which both the object and the attribute spaces are cut simultaneously. Biclusters have proven very useful for Information Retrieval tasks such as indexing and, in the field of Bioinformatics, for gene mining. We provide an example of the usefulness of biclusters for entity resolution in video games, that is, unifying the different avatars of a professional player by recognizing their playing style. We explain this scenario and evaluate the approach using game traces of the video game Starcraft 2.
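
As a rough illustration of what a similar-value bicluster is, the sketch below enumerates them by brute force on a tiny matrix. It assumes one common definition (a set of rows and columns whose cells differ by at most a tolerance `theta`); the definition, the matrix, and the function names are assumptions for illustration, and the exhaustive search shown here is not the pattern-structure method presented in the talk.

```python
from itertools import combinations


def is_similar(matrix, rows, cols, theta):
    """True if all selected cells lie within `theta` of each other."""
    values = [matrix[r][c] for r in rows for c in cols]
    return max(values) - min(values) <= theta


def similar_value_biclusters(matrix, theta):
    """Return the maximal (rows, cols) pairs whose cells differ by at most theta.

    Exhaustive search over all row and column subsets: only for tiny matrices.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    found = []
    for r_size in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), r_size):
            for c_size in range(1, n_cols + 1):
                for cols in combinations(range(n_cols), c_size):
                    if is_similar(matrix, rows, cols, theta):
                        found.append((frozenset(rows), frozenset(cols)))
    # Keep only biclusters that are not contained in a larger one.
    return {
        b for b in found
        if not any(b != o and b[0] <= o[0] and b[1] <= o[1] for o in found)
    }


if __name__ == "__main__":
    # Hypothetical data: rows are objects, columns are numerical attributes.
    matrix = [
        [1, 2, 9],
        [2, 1, 8],
        [7, 8, 9],
    ]
    for rows, cols in similar_value_biclusters(matrix, theta=1):
        print(sorted(rows), sorted(cols))
```

Note that both the rows and the columns are selected at once, which is what distinguishes a bicluster from an ordinary cluster of rows.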

11:45 - Autonomic Communications are alive, they are named Control Theory for Communication Services

Javier Baliosian (Instituto de Computación, UdelaR, Uruguay)

More than ten years ago, Autonomic Communications (AC) were considered the solution for reducing the cost and complexity of managing (then) future heterogeneous networks and ubiquitous computing devices. In general, an autonomic system is one that is self-configuring, self-optimising, self-healing and self-protecting. These systems are supposed to require little administration and to be managed mostly through policies. Once the usual academic hype had passed, the practical and theoretical difficulties of building a working and large-enough autonomic communication system became obvious, and the research effort moved to other issues. However, AC's objectives are still there and, as services and their enabling infrastructures become more and more complex, the need to take slow and fragile human decision making out of the management loop is even greater than it was ten years ago.

12:30 - 14:30 Free time for lunch

14:30 - Revisiting architectural tactics for security

Hernán Astudillo (Universidad Técnica Federico Santa María, Chile)

Architectural tactics are design decisions that improve some quality factor of a system. Since their initial proposal, they have been formalized, compared with architectural patterns, and associated with architectural styles, but the initial set of security tactics has been refined only once. A collaboration of software architecture and security researchers examined this set from the perspective of security research, and we concluded that some tactics are really principles or policies, some are not necessary, and others do not cover the functions needed to secure systems, which makes the set of little use to designers. In this presentation we will look at a refined set and classification of architectural tactics for security, show how to realize them using security patterns, and briefly comment on ongoing work in experimental software engineering to validate the classification and methods.

15:15 - Using ontology technology for the identification and classification of computer security attacks

Juan Diego Campo and Marcelo Rodríguez (Instituto de Computación, UdelaR, Uruguay)

In the field of cybersecurity it is increasingly necessary to standardize and structure information in order to provide better treatment of and response to computer incidents. Along these lines, MITRE has proposed the Structured Threat Information Expression (STIX), a structured language for describing cyber threat information so that it can be shared, stored, and analyzed in a consistent manner. In particular, a STIX declaration may embody a list of known attack patterns, which are required to be expressed using CAPEC. An attack pattern is a description of a common method for exploiting software; CAPEC is a dictionary and classification taxonomy of known attacks. On the other hand, the massive development of web technologies has radically changed the way users access and use computing resources. Security mechanisms are needed that are capable of protecting the information and activities of end users, as well as those of the organizations with which they are interconnected. In this talk we present the outcomes of an experiment we carried out on the detection and classification of security attacks using ontology concepts and mechanisms. We put forward a case study concerned with security threats to web systems. We describe a model that was developed to partially represent the HTTP protocol and some attacks that can be carried out on it, and we illustrate the use of attack patterns and CAPEC to provide an initial classification and reporting of those attacks.

16:00 - Automated detection of web application attacks using machine learning techniques

Rodrigo Martínez (Instituto de Computación, UdelaR, Uruguay)

The constant growth in the use of Web Applications, together with the evolution of available web technologies, has opened up a new range of possibilities. It is known that Web Application vulnerabilities are proliferating, and promptly identifying and fixing them is often impossible or cost-prohibitive. Web Application Firewall (WAF) technology has been proposed as a means to strengthen the security of (Web) applications with no need to modify their code. A WAF makes it possible to perform a real-time analysis of the security behaviour of an application using a configurable rule set that helps predict, and usually prevent, attacks against the application based on the information embodied in the HTTP requests and responses of an application's run. However, the lack of flexibility of the rule set frequently hinders the detection of new types of attacks and also gives rise to False Positives, which can eventually turn the WAF into a Denial of Service tool. In this talk we discuss the application of machine learning techniques during the real-time analysis performed by a widely deployed WAF, called Mod_Security, not only to improve its detection capabilities but also to decrease its False Positive rate.

16:30 - End of workshop