Is the current hype about Deep Learning justified?

Some key takeaways from the Cambridge Spark London Data Science Summit 2017

Cambridge Spark

2017 London Data Science Summit at the Royal Statistical Society

Dr. Karthik Tadinada, Director of Data Science at Featurespace, offered insight into real-time fraud detection and the signals to look out for when identifying fraudulent behaviour. For example, one approach to detecting application fraud is not only to aggregate device data and third-party data (e.g. consortium data to establish an individual’s credibility), but also to gather website behavioural data to identify changes in interaction. Behavioural data is a rich source of signals and is particularly good at picking up the predictable mouse trail patterns of bots. Additional signals to look out for include:

  • concurrences, such as how much data you are getting for an individual email address; aggregate similar or sequential email addresses as well
  • patterns developing, such as auto-generated attributes and repetitive purchasing patterns
  • “out of pattern” behaviour: look at what is congruous with recent history, since people are mostly creatures of habit
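As a rough illustration of the “out of pattern” idea, the sketch below flags a new value that sits far from a customer’s recent history. It is not Featurespace’s method: the function name `out_of_pattern`, the z-score rule, the threshold of 3 and the purchase amounts are all assumptions made up for this example.

```python
from statistics import mean, stdev


def out_of_pattern(history, new_value, threshold=3.0):
    """Flag new_value if it lies more than `threshold` sample standard
    deviations away from the mean of the recent history.

    Hypothetical helper: a simple z-score rule chosen for illustration.
    """
    if len(history) < 2:
        return False  # not enough history to judge
    spread = stdev(history)
    if spread == 0:
        # Perfectly constant history: any different value is out of pattern.
        return new_value != history[0]
    return abs(new_value - mean(history)) / spread > threshold


# Made-up purchase amounts for one customer.
recent_purchases = [21.5, 19.0, 23.2, 20.1, 22.4]

print(out_of_pattern(recent_purchases, 22.0))   # close to recent habits
print(out_of_pattern(recent_purchases, 250.0))  # far outside recent habits
```

A production system would presumably combine many such signals and update them in real time; this only shows the “compare against recent history” step in isolation.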

We used really cool and trendy statistics from the 1930s, which incidentally most data scientists are using as well.

Prof. Mark Girolami, Fellow, Alan Turing Institute

Mark presented another approach to fraud, outlining how his research team framed counterfeit currency detection as a statistical hypothesis testing problem. They didn’t follow the Deep Learning route, as they needed to understand and interpret the model in order to develop a very flexible, data-driven solution that could be deployed across various countries. The methodology included:

  • mixture model clustering to represent the characterisation of the bank notes
  • optimisation theory to segment images of the notes into regions and optimise the tests on each region

Real-world applications of Deep Learning were also covered. Dr. Miriam Redi, Research Scientist at the Wikimedia Foundation, demonstrated how machine vision frameworks can extract intangible properties of images, such as beauty and sentiment.

Marcin Druzkowski, Machine Learning Engineer at Ocado Technology, then spoke about using Convolutional Neural Networks to automatically tag and classify customer emails by priority in order to improve customer service. Marcin stressed the importance of ensuring models are reproducible and maintainable. For example, to support reproducibility at Ocado Technology, every model in production is kept with a JSON file recording which code the model was trained with, as well as details about the environment (Docker image), the data, and the parameters used.
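To make the idea concrete, here is a minimal sketch of what such a metadata file could look like. The talk did not describe the actual schema, so every field name, the `ModelRecord` class, the commit id and the image name below are hypothetical placeholders.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ModelRecord:
    """Hypothetical metadata stored next to a production model."""
    model_name: str
    code_commit: str    # version-control revision of the training code
    docker_image: str   # environment the model was trained in
    data_sha256: str    # fingerprint of the training data
    parameters: dict = field(default_factory=dict)


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying the exact training data."""
    return hashlib.sha256(data).hexdigest()


def to_json(record: ModelRecord) -> str:
    """Serialise the record with sorted keys so diffs stay stable."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# All values below are invented for illustration.
record = ModelRecord(
    model_name="email-priority-cnn",
    code_commit="0123abc",
    docker_image="registry.example.com/train:1.0",
    data_sha256=fingerprint(b"example training data"),
    parameters={"learning_rate": 0.001, "epochs": 10},
)

print(to_json(record))
```

Storing a hash of the data rather than the data itself keeps the file small while still making it possible to check that a retraining run used the same inputs.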

Patrick Short, CEO of Heterogeneous, further reinforced that there is no Data Science without reproducibility. It is essential when deploying Machine Learning in industry, and it is just as critical in the field of genetics, where findings need to be replicated before “rolling out to generate excitement.” Patrick discussed the Machine Learning methods used to conduct Genome-Wide Association Studies. As with the research undertaken by Mark and Karthik, the techniques used are not particularly sophisticated: they are simply linear models that test whether individuals carrying a given genetic variation differ systematically in the trait being studied, in order to identify variations associated with a particular disease.
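The kind of linear model described here can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the pipeline Patrick presented: genotypes are coded as allele counts (0, 1 or 2), the trait is a single number per individual, and the function name `association_test`, the variant names and all data are made up. Real studies also adjust for covariates and for the very large number of variants tested.

```python
from math import sqrt


def association_test(genotypes, trait):
    """Ordinary least squares of trait on allele count for one variant.

    Returns (slope, t_statistic); a large |t| suggests the trait differs
    systematically with the number of copies of the variant.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_t = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (t - mean_t) for g, t in zip(genotypes, trait))
    slope = sxy / sxx
    intercept = mean_t - slope * mean_g
    residual_ss = sum(
        (t - (intercept + slope * g)) ** 2 for g, t in zip(genotypes, trait)
    )
    std_error = sqrt(residual_ss / (n - 2) / sxx)
    return slope, slope / std_error


# Toy data: one trait measured on six individuals, two candidate variants.
trait = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
variants = {
    "variant_a": [0, 0, 1, 1, 2, 2],  # tracks the trait closely
    "variant_b": [0, 2, 1, 1, 2, 0],  # unrelated to the trait
}

results = {name: association_test(g, trait) for name, g in variants.items()}
for name, (slope, t_stat) in results.items():
    print(f"{name}: slope={slope:.3f}, t={t_stat:.2f}")
```

The same test is simply repeated for every variant in the genome, which is why the individual model can stay this simple.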

Patrick Short, CEO of Heterogeneous

Considering the range of techniques and applications that were presented at the Data Science Summit, it is clear that deciding which techniques you apply all comes down to the problem you are trying to solve.

It’s very tempting to go to the latest research paper and find the latest “state-of-the-art” model to apply, but this is not the way to go. We should always question what we really want to check.

Marcin Druzkowski, Machine Learning Engineer, Ocado Technology

Be sure to join us at the next Data Science Summit to hear about real-world machine learning applications from industry experts.

Our next Summit will take place in Manchester, March 2018. Book your ticket now!
