5 Big Data questions for Paco Nathan

Curating the program of a large conference like Big Data Spain is as exciting a challenge as it is a huge responsibility. Would you expose an audience of 800+ international Big Data experts to a self-described "Evil Mad Scientist" for the opening keynote of a two-day event?

This is exactly what I did by inviting Paco Nathan to join us at the 2014 and 2015 editions of the conference. To my credit, Paco's suave manners and his immense reputation in the industry are as far from the image of Goethe's Faust as it gets. He holds a BS in Math Sciences and an MS in Computer Science from Stanford. His 30+ years of tech experience range from Bell Labs to start-ups and cover areas like distributed systems, functional programming, cloud computing, machine learning and analytics.

Paco is currently Director of O'Reilly Learning, with HQ in Sebastopol, California. Two weeks ago in Madrid, I asked Paco whether his speaking commitments and innumerable projects leave him any time at all to actually sit down and do any programming. He told me that he and his team are indeed very much involved in just that. Judging from the packed theatre at his workshop Crash Introduction to Apache Spark at #BDS15 and from our post-event survey, Paco's mastery of functional languages and Scala meets the most demanding expectations of the sharpest developers.

5 questions about Big Data

As a testament to Paco's kindness, he found the time to answer five questions about Big Data. Our last question is unashamedly selfish: I tapped into Paco's experience in large conferences in the US and Europe to improve our own.

Question #1: Data Science or Data Analytics

Analytics is an older term, perhaps broader in usage, though much narrower in scope. At least from my POV, the word analytics invokes organizations where the leadership knew about *data modeling*, *business intelligence*, *data warehouses*, etc., which were dominant approaches from the 1990s and earlier. Trying to compete today in global markets, where companies of the generation of Uber and Salesforce have ample power and flexibility, implies using much more contemporary practices that scale. It also implies avoiding silos, both in terms of data management and thinking -- ergo Data Science.

Question #2: Marketing materials, like fishermen, have a reputation for exaggerating. Please provide an example of a truly “massive” data stream whose management and analysis would benefit from the current state-of-the-art in Big Data technologies.

Genomics, neuroscience, cancer research, bioinformatics, and life sciences in general. We're seeing big wins in genomics, due to the fact that the data rates are enormous, and yet Big Data technologies provide the needed tooling in a cost-effective way to enable advances in science. Check out talks by Timothy Danford in particular. CodeNeuro and work by Jeremy Freeman's lab in neuroscience provide another example of how large the data rates are in life sciences, and how much important work can be accomplished using Big Data technologies that simply were not feasible before.

Question #3: Please explain what an "almost-truism" about data is and provide your favourite example.

Sounds like a talk at Galvanize in Seattle this summer :) In that talk I attempted to give guidance to students in a new Data Science fellowship program, discussing important but non-intuitive trends in the industry. Most especially, I wanted to avoid making black and white distinctions, since some of these trends represent subtle nuances. That's crucial for those just entering the field. Perhaps adding some humor, too.

Note the *inceptionism-atop-impressionism* used for illustrations in the slide deck. That should be a hint about the overall theme! Therefore, in some sense, #05: Code Inceptionism is the thesis. Data Science is about *recontextualizing* at its heart: universities promote *decontextualizing* subjects into disciplines. That's important, and foundational. Data Science is an applied thing, a complement to university foundations.

Even so, #09: Learning Curves are Forever represents my passion among those eleven topics. This is where I choose to show off some pyrotechnics by applying advanced data science on large scale social problems that matter.

Question #4: How do you manage simultaneously to help (and influence) corporations and start-ups alike? Does size matter for Big Data to be a disruptive force for an organisation or company?

That question had worried me early on: those two audiences would seem in conflict. However, more recently I've seen how one neuroscience researcher with one microscope can generate more data than all of a large bank within days of experimentation. So the dividing lines of large vs. small organization seem to dissolve in the context of data. The disruptive force is more likely to be about changing process within an organization, which changes access to combining data that was previously in silos or not collected at all, which changes the thinking about how to leverage that data. Size of total available market is perhaps the larger question there.

Question #5: This year in Madrid, Big Data Spain falls in the same week as CPhI Worldwide, a huge pharma congress. The audience of tens of thousands of attendees at CPhI dwarfs BDS15's 800+ delegates from hundreds of companies and its 50 speakers from all over the world. Hotel rooms became very scarce weeks ago. How can Big Data Spain become the most relevant annual event in Europe in Big Data, if not the largest?

I see some excellent attributes of Big Data Spain that can scale well. First is location in España! So many speakers whom I know in EU *prefer* to visit Spain over other EU conference destinations -- which tend to be more in the north. Who wouldn't? :) I'm so glad to visit Spain.

Another aspect is language. The dual language content in Spanish and English provides much breadth of accessibility. It could be a good idea to pay for quality video transcriptions of the presentations. I can imagine that would help considerably for cross-language access.

One interesting point that I've noticed about working in conferences in both Spain and Latin America is the velocity of content spreading through social media. Latin countries of Europe -- Spain, Portugal, Italy, perhaps France -- share more than languages with Latin America: there is culture in common, the ties of family and friends, many companies operating on both sides of the Atlantic, exchanges among universities, etc. That can work to great advantage for thought leadership about Big Data based in Spain, where it serves as a focal point throughout Latin America as well and gains momentum for conference speakers, sponsoring companies, etc. Access to quality content in Spanish now extends into much of the United States as well. Leverage that advantage!

Pricing is a third consideration. As the conference grows, ostensibly so will sponsorship, and hopefully the pricing can be kept so amazingly affordable. Support for students would be great too. In some Big Data conferences in the US, we see VC firms holding contests for grad students to get fellowships to attend from afar: full travel expenses, stipends, guests of honor at investor network dinners, etc. That serves the students, the investors, and the richness of the conference. It could also help reinforce how Spain becomes a destination for a more global audience.

I would also suggest considering a livestream component of the conference, so that those who cannot attend in person may have some experience at remote sites. This gets used so effectively for Cassandra Summit, for example, where it nearly doubles the attendance. Again, that helps build better reach for the speakers and sponsors, and extends thought leadership across a broader audience.

Looking ahead

With these ideas and our commitment to quality of the program, the event will keep growing in reach, impact and substance. Help us remain relevant by joining the BigDataSpain (English) and Big Data Hispano (español) groups on LinkedIn.

Thank you very much to Paco Nathan, and for the dozens of suggestions from some of the 877 attendees, 50 speakers and 16 sponsors at the fourth edition of the conference. Follow @pacoid for more insights (no DMs please). This is all for now.