D4D Challenge completed!!

As we explained in a previous post, over the last few months we have been working on a project for the Orange D4D Challenge. Our main task has been to analyze and visualize the mobile communication datasets provided (collected in Ivory Coast from December 2011 to April 2012), looking for relevant and original findings for the society of this West African country. In other words, we wanted to present our conclusions in a clear and friendly way that helps the government and NGOs make better-informed decisions.


Therefore, the project can be said to have two dimensions:

a) Scientific side: gathering information on similar research projects (behavioural data, commuting inferred from mobile data, people dynamics); using different tools and strategies to handle such large amounts of data efficiently (Big Data, Hadoop, Pig); evaluating diverse visualization options (Excel and R charts, Gephi, GIS tools like qGIS, uDig, ArcGIS, Leaflet, Polymaps, D3.js…); and reflecting on the kind of conclusions drawn and their possible interpretations.

b) Cooperative side: mobile communication data are plentiful and their structure is really simple. There are a great many applications where this sort of data can play an important role. Moreover, since communication is something so closely related to all of us, many of the inferred ideas can be quickly tied to ordinary people's daily lives. Leaving aside the solutions of major interest to companies (improving business based on potential customers' behavior, habits and trends, building more sophisticated and customized marketing campaigns…), we have focused on those that can help make people's day-to-day lives fairer and more comfortable, especially in underdeveloped countries (detecting commuting patterns that allow improvements in public transport policy, more adequate urban planning, identifying heavy usage of hospitals, police stations…).

Let’s describe how the project was approached and developed:

1) Studying related research projects, both from private companies and from universities.
2) Storing and processing the datasets with cutting-edge technologies: Hadoop/Pig, Mongo, Python, Git…
3) Statistics: normalizations, means, dispersions, medians…
4) Charts: Excel, R, Python
5) Visualizations: network diagrams (Gephi), kernel density estimation maps (qGIS, ArcGIS)…
6) Web: customizable and interactive animations that make it easier to display and spread the conclusions reached (Leaflet, D3.js, CartoCSS, TileMill, Mapnik, Polymaps)
7) Paper: collecting all our discoveries to write a final report (LaTeX)
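To make step 3 more concrete, here is a minimal Python sketch of the kind of summary statistics listed there. It is only an illustration: the call records, the antenna identifiers, and the helper names `calls_per_antenna` and `min_max_normalize` are hypothetical and are not taken from the D4D datasets or from our actual code.

```python
from collections import Counter
from statistics import mean, median, pstdev

# Hypothetical call records as (antenna_id, hour_of_day) pairs.
# These values are invented for illustration only.
calls = [
    ("A", 8), ("A", 8), ("A", 9), ("A", 18),
    ("B", 8), ("B", 13),
    ("C", 18), ("C", 18), ("C", 19), ("C", 19), ("C", 20), ("C", 8),
]


def calls_per_antenna(records):
    """Count how many calls each antenna originated."""
    return Counter(antenna for antenna, _hour in records)


def min_max_normalize(counts):
    """Rescale counts to the [0, 1] range (min-max normalization)."""
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    return {
        key: (value - lowest) / span if span else 0.0
        for key, value in counts.items()
    }


counts = calls_per_antenna(calls)
normalized = min_max_normalize(counts)
summary = {
    "mean": mean(counts.values()),
    "median": median(counts.values()),
    # Population standard deviation used here as the dispersion measure.
    "dispersion": pstdev(counts.values()),
}

print(normalized)
print(summary)
```

Min-max normalization is just one possible choice; with real data, normalizing by population or by number of subscribers per region could be more meaningful.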

The whole process produced many interesting findings and ideas:

a) A mathematical model, designed and implemented by us, to detect geospatial-temporal commuting patterns.
b) A distinction between commuters and non-commuters, together with their evolution throughout each day and for each city.
c) Identification of time periods (hours, weekdays) according to the number of phone calls, along with the regions or cities where those calls originate.
d) A set of charts and maps that illustrate the model above and make it easier to draw interesting conclusions.
e) Discovery of the daily commuting pattern for this specific dataset (morning peak, central valley, evening peak).
f) An online application to display all this information in a friendly and customizable way.
g) An outline of new open R&D lines of work with high potential (clustering, replicating the algorithms with other datasets, tessellations, use of DTW & LCS operators…)
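As a rough illustration of finding (e), the following Python sketch locates a morning peak, a central valley and an evening peak in a 24-hour call profile. It is not the mathematical model mentioned in (a): the function name `daily_pattern`, the hour windows used for "morning" and "evening", and the sample profile are all assumptions made for this example.

```python
def daily_pattern(hourly_calls, morning=range(5, 12), evening=range(16, 23)):
    """Return (morning_peak_hour, valley_hour, evening_peak_hour).

    hourly_calls: list of 24 call counts, index = hour of day.
    morning / evening: hour windows searched for each peak (assumed values).
    """
    if len(hourly_calls) != 24:
        raise ValueError("expected one call count per hour (24 values)")

    # Hour with the most calls inside each window.
    morning_peak = max(morning, key=lambda hour: hourly_calls[hour])
    evening_peak = max(evening, key=lambda hour: hourly_calls[hour])

    # The valley is the quietest hour strictly between the two peaks.
    between = range(morning_peak + 1, evening_peak)
    valley = min(between, key=lambda hour: hourly_calls[hour])

    return morning_peak, valley, evening_peak


# Hypothetical hourly call counts, invented for illustration only.
profile = [2, 1, 1, 1, 2, 5, 12, 30, 42, 35, 25, 22,
           20, 18, 21, 24, 30, 38, 45, 33, 20, 12, 6, 3]

print(daily_pattern(profile))
```

Fixed hour windows keep the sketch short, but they would need tuning per city and per day of the week before being applied to real data.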

Summing up, we are really glad and satisfied with the work we carried out. It has been a fantastic opportunity that allowed us to learn a lot in different areas of knowledge. The key to all of this was undoubtedly motivation: from the very beginning and throughout the whole challenge we have been thrilled to collaborate and, at the same time, eager to learn from each other.


Raúl Lario holds a degree in Telecommunications Engineering from the Universidad Politécnica de Madrid. He has worked as a Developer, Analyst and Scrum Master on Java web projects: the Simyo portal, a CMS for Periodista Digital. He is also interested in Agile Methodologies, team and project management, and client liaison. He currently works on R&D at Paradigma Labs, moving into the Python world and studying semantics topics (NLP, opinion mining, Linked Data, Tagging, Classification, NER...) to design and implement REST services that can be marketed as SaaS.

, Adrin Mart

Some time ago I moved from Valencia to Madrid in search of challenges. My professional career has grown in parallel with the definitive establishment of the Internet, and I have worked at every level associated with its analysis: from data collection to visualization, the area I am currently focused on. I split my time between work, tinkering with Arduino, hunting for the best tapa and cycling around Madrid.

, Mario Muñoz

Roberto Maestre works, together with his colleagues at Paradigma Labs, in the fields of natural language processing, network analysis, information crawling and the semantic web. He studied Computer Science at the UPM and is currently pursuing his PhD in the field of algebraic models for building expert systems and automated reasoning systems at the DIA FI-UPM. He previously worked at the CSIC on the ESF's TECT project, related to the study of dynamic cooperation networks. He is always willing to try a new technology or put a theory to the test.

