Is Big Data an opportunity to spot outliers in a time-constrained world?

An outlier or black swan (Black swan theory, 2012) is a strange value, and it is difficult to tell when one is going to occur.

The black swan theory, or theory of black swan events, is a metaphor that describes an event that comes as a surprise (to the observer), has a major impact, and is often inappropriately rationalized after the fact with the benefit of hindsight. For instance (Taleb, 2007), the September 11 attacks were a kind of outlier: in theory nobody could foresee them, they were the first of their kind, and they had a major impact on the whole world.

There is a heated debate over whether you should or should not apply this technology to your business. If your business is related to human behaviour and its dynamics, pattern discovery or complex dynamic systems, we believe the answer could be “yes”: the whole body of information is essential, because outliers are intrinsically inside it.

A few days ago we listened to Leslie Valiant of Harvard at an Alan Turing symposium held at the “Fundación Ramón Areces” (Valiant L, 2012), talking about evolution and mathematics, functions and objectives, time and space: minimal changes in the target function or in the nature of the algorithm could trigger unpredictable changes. Several interesting papers also discuss critical transitions, such as (Lade SJ, Gross T, 2012): “Critical transitions are sudden, long-term changes in complex systems that occur when a threshold is crossed”, or (Safarzyńska K et al., 2012): “… interest has arisen in the study of large-scale socio-technical transitions to an environmentally sustainable economy”.

Statistics provides us with survey methodology, which can help us obtain a perfectly representative sample. However, we are sometimes unable to reach that ideal, and classical statistics often treats outliers as residuals and disregards them. Today we are able to harvest a huge amount of information and process it in near real time (e.g. by means of Storm and Hadoop), checking the whole dataset to discover potential outliers. Perhaps we are in the process of creating or applying specific algorithms to detect these outliers, black swans, etc., and to react in a time-constrained world.
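To make the sampling argument concrete, here is a minimal Python sketch, not the authors' method. It swaps in a standard robust rule, the modified z-score based on the median absolute deviation (Iglewicz and Hoaglin), to scan every record, and compares that with scanning a 1% systematic sample. The function name `mad_outliers`, the 3.5 threshold and the synthetic data are all assumptions for illustration; on a Storm or Hadoop cluster the same per-record test would be distributed, which this single-process sketch does not attempt.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Indices whose modified z-score, 0.6745 * |x - median| / MAD, exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: at least half of the values equal the median.
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical toy data: 100,000 ordinary readings cycling between 100 and 106...
values = [100 + (i % 7) for i in range(100_000)]
values[73_421] = 400  # ...plus a single "black swan"

full_scan = mad_outliers(values)     # every record is checked
sample = values[::100]               # 1% systematic sample (every 100th record)
sampled_scan = mad_outliers(sample)  # only the sampled records are checked

print("full scan:", full_scan)
print("1% sample:", sampled_scan)
```

On this toy data the full scan reports index 73421, while the sample never contains the anomalous record and so reports nothing. A real sample would be random rather than systematic, but any single rare event still lands in a 1% random sample only 1% of the time.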

As Pedro Bernal Gutiérrez (former director of the Spanish Centre for National Defence Studies and lieutenant colonel in the Spanish Air Force) explained a few days ago in an excellent talk on cryptography about the Bomba and Colossus machines (Colossus, 2012) of the Second World War, reaction time is important from a military point of view. It is also important for our business (from fraud detection to changes in consumer habits), where we need to react in time to these crucial, strange and beautiful outliers or black swan events. This could be achieved through big data technology, exploring the whole search space looking for them.
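Reacting in time suggests testing each record as it arrives instead of waiting for a batch job. The following is a minimal sketch under stated assumptions, not a Storm API: the hypothetical `StreamingOutlierDetector` keeps a running mean and variance with Welford's online algorithm and flags any value more than `k` standard deviations from the running mean. The values of `k` and `warmup` and the synthetic stream are illustrative choices.

```python
import math


class StreamingOutlierDetector:
    """Flags values far from the running mean, updated with Welford's online algorithm."""

    def __init__(self, k=4.0, warmup=30):
        self.k = k                      # how many standard deviations count as "strange"
        self.warmup = max(2, warmup)    # observations to see before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                   # running sum of squared deviations from the mean

    def update(self, x):
        """Return True if x looks like an outlier, then fold x into the statistics."""
        is_outlier = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation so far
            is_outlier = std > 0 and abs(x - self.mean) > self.k * std
        # Welford's update: O(1) time and memory per record.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_outlier


# Hypothetical stream: ordinary readings between 100 and 106 plus one black swan.
stream = [100 + (i % 7) for i in range(1_000)]
stream[500] = 400

detector = StreamingOutlierDetector()
alerts = [i for i, x in enumerate(stream) if detector.update(x)]
print("alerts at positions:", alerts)
```

Because flagged values are still folded into the statistics, one extreme event inflates the variance for a while. Excluding flagged values would avoid that, at the cost of never adapting if the "outlier" is in fact the start of a critical transition.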


(Black swan theory, 2012) Black swan theory.

(Taleb, 2007) The Black Swan: Second Edition: The Impact of the Highly Improbable.

(Valiant L, 2012) La evolución biológica como forma de aprender [Biological evolution as a way of learning].

(Lade SJ, Gross T, 2012) Early warning signals for critical transitions: a generalized modeling approach.

(Safarzyńska K et al., 2012) Evolutionary theorizing and modeling of sustainability transitions.

(Colossus, 2012) Colossus computer.

(Meta S. Brown, 2012) Big Data Blasphemy: Why Sample?


A Telecommunications Engineer from the UPM, Carlos Navarro has been working at Paradigma for 7 years. In that time he has taken part in many projects, including semantic technology, influence measurement and Big Data projects, as well as European projects such as MixedEmotions, in which he was the architecture lead for the development of a microservices platform for emotion extraction.

Rubén Abad: Some time ago I moved to Madrid from Valencia in search of challenges. My professional career has grown in parallel with the definitive establishment of the Internet, and I have worked at every level associated with analysing it: from data collection to visualization, the area I am currently focused on. I divide my time between work, tinkering with Arduino, hunting for the best tapa and cycling around Madrid.
Mario Muñoz

Roberto Maestre works, together with his colleagues at Paradigma Labs, in the fields of natural language processing, network analysis, information crawling and the semantic web. He studied Computer Science at the UPM and is currently doing his PhD on algebraic models for building expert systems and automated reasoning at the DIA FI-UPM. He previously worked at the CSIC on the ESF TECT project, related to the study of dynamic cooperation networks. Always ready to try a new technology or put a theory to the test.

