Automated Machine Learning has become a hot topic over the past two years. Players like DataRobot, H2O or SparkBeyond are pushing it onto the Data Science agenda, and company managers are buying into the marketing message of “Democratizing AI”, which conjures up the expectation that data-savvy business experts with access to such tools can substitute for the Data Scientists a company is missing. That is a claim we have heard again and again with every new generation of Visual Data Analytics platforms since their first appearance in the late 1990s.
Time for a reality check!
In this session, I am going to show and discuss results we obtained at Deutsche Telekom by employing one of the most sophisticated Auto Machine Learning tools currently available on-premise. Benchmarking the tool on several application domains against human Data Scientists as well as other Auto ML tools, I will put our findings into perspective with the Data Mining life cycle and show where Auto ML tools actually provide substantial support, but also where they fall short of the expectations and high hopes. Finally, I will conclude with an outlook on the role of Data Scientists and the future relevance of Automated Machine Learning.
From search engines via natural language processing to genome research, so-called “Knowledge Graphs” are becoming more and more prevalent. But what exactly is a “Knowledge Graph”, what components does it consist of, and how can I utilise one to deliver value for my business problems? If you are interested in any of these questions, this session is exactly for you. By the end of the session, participants will have jointly created their first Knowledge Graph based on open source technologies, will have a good understanding of basic terms such as “Knowledge Graph”, “Ontology” and “Semantic Reasoning”, and will be able to acquire further information and put it into a broader context.
For successful and interactive participation, the following are required: a laptop, a Docker installation (incl. Docker Compose), a Git client and an open mind 🙂
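To preview the terminology, here is a minimal sketch in pure Python (no dependencies; the classes, relations and entities are invented for illustration) of a knowledge graph as a set of subject-predicate-object triples, together with a toy "semantic reasoner" that infers entity types along transitive subClassOf edges:

```python
# A knowledge graph as a set of (subject, predicate, object) triples.
triples = {
    ("TelecomCompany", "subClassOf", "Company"),
    ("Company", "subClassOf", "Organisation"),
    ("DeutscheTelekom", "isA", "TelecomCompany"),
}

def superclasses(cls):
    """Follow 'subClassOf' edges transitively (a toy semantic reasoner)."""
    found = set()
    frontier = {cls}
    while frontier:
        nxt = {o for (s, p, o) in triples if p == "subClassOf" and s in frontier}
        frontier = nxt - found
        found |= nxt
    return found

def infer_types(entity):
    """All types of an entity: asserted 'isA' plus inferred superclasses."""
    direct = {o for (s, p, o) in triples if p == "isA" and s == entity}
    return direct | {sup for t in direct for sup in superclasses(t)}

print(infer_types("DeutscheTelekom"))
# → {'TelecomCompany', 'Company', 'Organisation'} (in some order)
```

Real systems store such triples in an RDF triple store and express the ontology in a standard like RDFS or OWL; the session's open source stack will cover that properly.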
A major requirement for Credit Scoring models is of course to provide a risk prediction that is as accurate as possible. In addition, regulators demand these models to be transparent and auditable. Thus, in Credit Scoring very simple Predictive Models such as Logistic Regression or Decision Trees are still widely used and the superior predictive power of modern Machine Learning algorithms cannot be fully leveraged. A lot of potential is therefore missed, leading to higher reserves or more credit defaults.
This talk presents an overview of techniques that can make “black box” machine learning models transparent and demonstrates how they can be applied in Credit Scoring. We use the DALEX set of tools to compare a traditional scoring approach with state-of-the-art Machine Learning models and assess both approaches in terms of interpretability and predictive power. Results show that a comparable degree of interpretability can be achieved while machine learning techniques keep their ability to improve predictive power.
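As a flavor of model-agnostic interpretability, the following sketch computes permutation importance, the idea behind DALEX's variable-importance plots, on a synthetic dataset (pure NumPy; the feature names and the "black box" are invented, and this is not the DALEX API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic applicants: income, debt ratio, and a pure noise feature (made up).
n = 5000
X = rng.normal(size=(n, 3))
# Defaults depend on the first two features only.
logits = 1.5 * X[:, 0] - 2.0 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

def black_box(X):
    """Stand-in for any opaque model: here, the true scoring rule."""
    return (1.5 * X[:, 0] - 2.0 * X[:, 1] > 0).astype(int)

def permutation_importance(model, X, y, feature, n_rounds=10):
    """Accuracy drop when one feature's values are shuffled."""
    base = (model(X) == y).mean()
    drops = []
    for _ in range(n_rounds):
        Xp = X.copy()
        Xp[:, feature] = rng.permutation(Xp[:, feature])
        drops.append(base - (model(Xp) == y).mean())
    return float(np.mean(drops))

for j, name in enumerate(["income", "debt_ratio", "noise"]):
    print(name, round(permutation_importance(black_box, X, y, j), 3))
```

The noise feature's importance comes out as zero while the two informative features show a clear accuracy drop, which is exactly the kind of evidence regulators can audit without opening the model itself.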
12:05 - 12:50
Barcamp #5: Auto Forecast - program conception/design (Bianca)
For a long time, Recurrent Neural Networks formed the basis of state-of-the-art models for processing sequential data such as text or audio signals. For Natural Language Processing tasks such as machine translation, language understanding and text generation, they have recently been superseded by models like BERT, GPT-2 and XLNet. All of these are based on the Transformer architecture, which, thanks to its self-attention mechanism, can do entirely without recurrent and convolutional layers.
The goal of this talk is to build an intuitive understanding of the building blocks of the Transformer architecture using use-case scenarios, and to explain their advantages and disadvantages compared to RNNs.
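As an intuition for the central building block, here is a minimal single-head scaled dot-product self-attention sketch in NumPy (no masking, no multi-head logic, random weights; purely illustrative, not a production Transformer layer):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one sequence (single head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq_len, seq_len) token-to-token scores
    weights = softmax(scores, axis=-1)  # each row is a distribution over tokens
    return weights @ V, weights         # every output mixes information from all tokens

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))          # 4 token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8); each row of `weights` sums to 1
```

Because every token attends to every other token in one matrix multiplication, no recurrence over time steps is needed, which is what makes the architecture so parallelizable compared to RNNs.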
Image recognition is one of the most active applications of Machine Learning and Artificial Intelligence. In many cases, the quality of current techniques surpasses even human capabilities. A current trend here is software that runs smoothly even on mobile and embedded hardware.
One popular method for quickly finding objects in images is YOLO ("you only look once"). After a brief introduction to image recognition with neural networks, we will apply YOLO in practice. We have prepared code snippets for you that make it as easy as possible to get started with YOLO. We will give an introduction to how you can use these snippets, and then the floor will be yours.
During the session, we would like you to form small groups, brainstorm interesting use cases and then start prototyping them. To fuel your imagination, we are bringing a unique data source.
The minimum requirement for participating in the session is a laptop and access to Google Colab (i.e. a Google account). If you like, you can also try to set up the project dependencies on your own laptop, ideally a Linux machine with Python 3.x, an environment manager (e.g. Anaconda/Miniconda) and Git installed.
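To illustrate the kind of post-processing a YOLO-style detector relies on, here is a minimal NumPy sketch of intersection-over-union and non-max suppression (box coordinates and scores are made up; this is not the session's prepared code):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box, drop strong overlaps, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order):
        best = order[0]
        keep.append(int(best))
        order = np.array([i for i in order[1:]
                          if iou(boxes[best], boxes[i]) < iou_thresh])
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]  -- box 1 overlaps box 0 too strongly
```

A YOLO network emits many candidate boxes per image in a single forward pass; this filtering step is what turns them into the clean detections you see drawn on screen.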
The eCommerce sector is one of the key pioneers when it comes to the use of data science and machine learning. In this slot, Benjamin Aunkofer will introduce three of the most valuable and beneficial use cases you can pitch, plan and realize in eCommerce companies using Clustering, Classification and Regression models based on Ensemble Learning and Deep Learning:
1. Forecasting of Revenues, Order Cancelations and Returns in Controlling
2. Item Display Order Ranking (for dynamic shop-pages) in Product Management
3. Probabilistic Attribution Modelling for improved budget allocation in Marketing
14:55 - 15:40
Application of Neural Nets in Content Classification
Larger companies face the problem of large numbers of incoming documents, which tie up significant resources for reading them and routing them to the appropriate departments. This presentation discusses a solution for incoming emails at ERGO Nuremberg. This location receives approximately 1,000 emails per day, which used to be read and routed manually. With email volumes expected to rise, a solution based on current machine learning approaches had to be implemented to support the company. This talk will cover the data preparation, the architecture and the final operationalization of this model in detail. The additional challenges of data science in the insurance sector with regard to data protection law will be discussed, along with the technical requirements, constraints and the usual pitfalls.
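To illustrate the email-routing task in its simplest form, here is a minimal bag-of-words Naive Bayes sketch in pure Python (the emails and department names are invented, and this classical baseline is not the neural network approach of the talk):

```python
import math
from collections import Counter, defaultdict

# Toy training set: (email text, target department). All examples invented.
train = [
    ("invoice payment overdue reminder", "billing"),
    ("please send the invoice again", "billing"),
    ("claim damage car accident report", "claims"),
    ("report water damage claim form", "claims"),
]

# Multinomial Naive Bayes with Laplace smoothing.
word_counts = defaultdict(Counter)
doc_counts = Counter()
for text, dept in train:
    word_counts[dept].update(text.split())
    doc_counts[dept] += 1
vocab = {w for c in word_counts.values() for w in c}

def route(text):
    """Pick the department with the highest log-probability for the text."""
    best, best_lp = None, -math.inf
    for dept in doc_counts:
        total = sum(word_counts[dept].values())
        lp = math.log(doc_counts[dept] / len(train))  # class prior
        for w in text.split():
            lp += math.log((word_counts[dept][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = dept, lp
    return best

print(route("overdue invoice"))        # billing
print(route("accident damage claim"))  # claims
```

In production, the hard parts are exactly what the talk covers: preparing messy real-world text, choosing a stronger (neural) architecture, and operationalizing the model under data protection constraints.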
14:55 - 15:40
Barcamp #9: Presentation - Winter is coming (Paavo)
Deutsche Telekom is the fourth-largest telecommunications company in the world, with millions of customers using its roaming mobile services every day. This presentation details how Deutsche Telekom builds and deploys machine learning capabilities for its commercial roaming business.
I will give an overview of our machine learning use cases and dive into the challenges of at-scale deployment, including several aspects of business integration, data quality monitoring, feature engineering, machine learning model life cycle management, package management, process automation, code reproducibility and continuous integration. You will learn about the end-to-end machine learning workflow that we built to deliver capabilities from forecasting to anomaly detection, and how we integrate different open source data science and data visualization tools.
Finally, I will cover lessons learned on the challenges of integrating open source solutions into enterprise IT, providing practical examples of pitfalls and how we addressed them.
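As a flavor of one such capability, here is a minimal sliding-window z-score anomaly detector in NumPy (the traffic numbers are synthetic and this is not Deutsche Telekom's actual pipeline):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic daily roaming-traffic counts (invented numbers), with one spike.
traffic = rng.normal(1000, 30, size=60)
traffic[45] = 1400  # injected anomaly

def zscore_anomalies(series, window=14, threshold=4.0):
    """Flag points that deviate strongly from the preceding window's mean/std."""
    flags = []
    for t in range(window, len(series)):
        hist = series[t - window:t]
        z = abs(series[t] - hist.mean()) / (hist.std() + 1e-9)
        if z > threshold:
            flags.append(t)
    return flags

print(zscore_anomalies(traffic))  # the injected spike at index 45 is flagged
```

Even a baseline this simple raises the deployment questions the talk is about: how the threshold is monitored, how flagged days feed back into the business process, and how the whole thing is kept reproducible.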
16:00 - 16:45
Barcamp #11: Why you shouldn't be a data scientist (Philipp)
Deutsche Post DHL Group employs over half a million people around the globe across postal services, international express delivery, forwarding and contract logistics. Not surprisingly, our activities generate a lot of data. We believe that our data is the key to unlocking untapped potential in our company – but how can we leverage our data assets to drive decision making? How can we build AI products and deploy them in our business processes?
We will try to answer these questions, provide real-world examples and share our key learnings from leveraging data science in logistics at scale.