Program - c/o data science

PROGRAM 2019

C/O DATA SCIENCE 2019: Tuesday, 12 November
08:00 – 09:00

Registration
09:00 – 09:45

Camp Opening

By Thomas Löchte Informationsfabrik GmbH, Gabor Kotalik Deutsche Telekom

More information coming soon.

OPENING + CLOSING KEYNOTE
09:55 – 10:40

Automated Machine Learning – Hype or Next Big Thing?

By Mathias Kirsten Deutsche Telekom

Automated Machine Learning has become a hot topic in the past two years. Vendors such as DataRobot, H2O and SparkBeyond are pushing the topic onto the Data Science agenda, and company managers are buying into the marketing message of “Democratizing AI”, which raises the expectation that missing Data Scientists can be replaced by giving data-savvy business experts access to such tools. That claim is one we have heard again and again with every new generation of Visual Data Analytics platforms since they first appeared in the late 1990s.
Time for a reality check!

In this session, I am going to show and discuss results we obtained at Deutsche Telekom using one of the most sophisticated Automated Machine Learning tools currently available on-premise. Benchmarking the tool in several application domains against human Data Scientists as well as other Auto ML tools, I will put our findings into the perspective of the Data Mining life cycle and show where Auto ML tools actually provide substantial support, but also where they fall short of expectations and high hopes. Finally, I will conclude with an outlook on the role of Data Scientists and the future relevance of Automated Machine Learning.

TALKS
09:55 – 10:40

Barcamp #1

By YOURSELF
09:55 – 10:40

Barcamp #2: Continual Learning (Jemia)

By YOURSELF
09:55 – 12:50

Hacksession: Knowledge Graph 101

By Joshua Görner

From search engines and natural language processing to genome research, so-called “Knowledge Graphs” are becoming more and more prevalent. But what exactly is a “Knowledge Graph”, what components does it consist of, and how can I use one to deliver value for my business problems? If you are interested in any of these questions, this session is for you. By the end of the session, participants will have jointly created their first Knowledge Graph based on open-source technologies, will have a good understanding of basic terms such as “Knowledge Graph”, “Ontology” and “Semantic Reasoning”, and will be able to find further information and put it into a broader context.

For successful and interactive participation, you will need: a laptop, a Docker installation (incl. Docker Compose), a Git client and an open mind 🙂
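A Knowledge Graph can be pictured as a set of (subject, predicate, object) triples, with an ontology declaring how classes relate; semantic reasoning then derives facts that nobody stated explicitly. The sketch below illustrates this in plain Python, assuming just two RDFS-style rules (transitive `rdfs:subClassOf`, and instances inheriting the types of superclasses). The `ex:` names are hypothetical, and the session itself works with open-source tooling rather than this code.

```python
from collections import defaultdict

# Predicate names follow RDF/RDFS conventions; everything prefixed "ex:" is hypothetical.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer(triples):
    """Return the input triples plus those entailed by two RDFS-style rules:
    (A subClassOf B), (B subClassOf C) -> (A subClassOf C)
    (x type A), (A subClassOf B)       -> (x type B)
    Rules are applied repeatedly until no new triple appears (a fixpoint)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        superclasses = defaultdict(set)
        for s, p, o in facts:
            if p == SUBCLASS:
                superclasses[s].add(o)
        new = set()
        for s, p, o in facts:
            if p in (SUBCLASS, TYPE):
                for sup in superclasses.get(o, ()):
                    new.add((s, p, sup))
        if not new <= facts:
            facts |= new
            changed = True
    return facts


# Ontology (schema) and instance data are both just triples.
ontology = {
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:Employee", SUBCLASS, "ex:Person"),
}
data = {
    ("ex:alice", TYPE, "ex:Employee"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
}
kg = infer(ontology | data)
alice_types = sorted(o for s, p, o in kg if s == "ex:alice" and p == TYPE)
print(alice_types)  # ['ex:Agent', 'ex:Employee', 'ex:Person']
```

Nobody asserted that `ex:alice` is an `ex:Agent`; the reasoner derives it from the class hierarchy. Production systems delegate this to triple stores and reasoners instead of a hand-written loop.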

HACKSESSIONS
10:40 – 11:10

Coffee Break & Networking
11:10 – 11:55

Barcamp #3: Collaboration & Reproducibility Hands-On (Steffen)

By YOURSELF
11:10 – 11:55

Barcamp #4: Discussion – Fuel level data generated by tank systems: What kind of information can you generate from the data? (Helen)

By YOURSELF
11:10 – 11:55

Credit Scoring and XAI

By Michael Bücker Fachhochschule Münster

A major requirement for Credit Scoring models is, of course, to provide a risk prediction that is as accurate as possible. In addition, regulators require these models to be transparent and auditable. As a result, very simple Predictive Models such as Logistic Regression or Decision Trees are still widely used in Credit Scoring, and the superior predictive power of modern Machine Learning algorithms cannot be fully leveraged. A lot of potential is therefore missed, leading to higher reserves or more credit defaults.

This talk presents an overview of techniques that can make “black box” machine learning models transparent and demonstrates how they can be applied in Credit Scoring. We use the DALEX set of tools to compare a traditional scoring approach with state-of-the-art Machine Learning models and assess both approaches in terms of interpretability and predictive power. The results show that a comparable degree of interpretability can be achieved while machine learning techniques retain their advantage in predictive power.
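One family of model-agnostic explanations offered by toolkits such as DALEX is permutation-based variable importance: shuffle one feature across applicants and measure how much the model's loss worsens. The plain-Python sketch below shows the idea; it is not DALEX's API, and the scorecard coefficients, feature names and synthetic applicants are hypothetical.

```python
import math
import random


def logistic_score(row):
    """Hypothetical scorecard: probability of default from two features.
    'application_id' is deliberately ignored by the model."""
    z = -2.0 + 0.8 * row["debt_ratio"] - 0.05 * row["years_employed"]
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(model, rows, labels):
    """Mean binary cross-entropy of the model's predictions."""
    eps = 1e-12
    total = 0.0
    for row, y in zip(rows, labels):
        p = min(max(model(row), eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)


def permutation_importance(model, rows, labels, feature, n_repeats=20, seed=0):
    """Mean increase in loss when one feature's values are shuffled across rows.
    Works for any model callable, which is what makes it model-agnostic."""
    rng = random.Random(seed)
    baseline = log_loss(model, rows, labels)
    increases = []
    for _ in range(n_repeats):
        values = [r[feature] for r in rows]
        rng.shuffle(values)
        permuted = [{**r, feature: v} for r, v in zip(rows, values)]
        increases.append(log_loss(model, permuted, labels) - baseline)
    return sum(increases) / n_repeats


# Synthetic applicants whose default labels are drawn from the scorecard itself.
rng = random.Random(42)
rows = [
    {
        "debt_ratio": rng.uniform(0, 5),
        "years_employed": rng.uniform(0, 30),
        "application_id": rng.randint(10000, 99999),
    }
    for _ in range(500)
]
labels = [1 if rng.random() < logistic_score(r) else 0 for r in rows]

importance = {
    f: permutation_importance(logistic_score, rows, labels, f)
    for f in ("debt_ratio", "years_employed", "application_id")
}
for feature, value in importance.items():
    print(f"{feature:15s} {value:.4f}")
```

Because the procedure only calls the model, the same function can rank features of a gradient-boosted model and of a logistic scorecard, which is the kind of side-by-side comparison the talk describes.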

TALKS
12:05 – 12:50

Barcamp #5: Auto Forecast – program conception/design (Bianca)

By YOURSELF
12:05 – 12:50

Barcamp #6: Auto ML + automated data processing & FE (Johannes)

By YOURSELF
12:05 – 12:50

Introduction to Transformer-Based Sequence Models

By Juri Wiens REWE Digital

For a long time, Recurrent Neural Networks formed the basis of state-of-the-art models for processing sequential data such as text or audio signals. For Natural Language Processing tasks such as machine translation, language understanding and text generation, they have recently been superseded by models such as BERT, GPT-2 and XLNet. All of these are based on the Transformer architecture, which, by using the self-attention mechanism, can dispense with recurrent and convolutional layers entirely.

The goal of the talk is to make the building blocks of the Transformer architecture intuitively understandable through use-case scenarios, and to explain their advantages and disadvantages compared to RNNs.
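Self-attention lets every position compute a weighted average over all positions in a single step, instead of passing a hidden state along the sequence as an RNN does. Below is a minimal single-head sketch in plain Python of the scaled dot-product formulation, softmax(QKᵀ/√d_k)·V. The toy token vectors and identity projection matrices are illustrative assumptions; real models use learned projections, multiple heads, positional encodings and feed-forward layers.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x (n x d_model).
    Returns the output vectors and the attention weight matrix (n x n)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    out, weights = [], []
    for qi in q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(len(v[0]))])
    return out, weights


# Three toy "tokens" with d_model = 2 and identity projections (illustrative only).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

Every output row depends on all input positions at once, which is why the computation parallelises across the sequence; the cost is the n × n weight matrix, which grows quadratically with sequence length.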

TALKS
12:50 – 14:00

Lunch Break & Networking
14:00 – 16:45

Hacksession: Image Recognition

By Thorben Jensen Informationsfabrik

Image recognition is one of the most active applications of Machine Learning and Artificial Intelligence. In many cases, the quality of current techniques surpasses even human capabilities. A current trend here is software that runs smoothly even on mobile and embedded hardware.

One popular method for quickly detecting objects in images is YOLO (“you only look once”). After a brief introduction to image recognition with neural networks, we will apply them in practice. We have prepared code snippets that make it as easy as possible to start using YOLO. We will show you how to use these snippets, and then the floor is yours.

During the session, we would like you to form small groups, brainstorm interesting use cases and then start prototyping them. To fuel your imagination, we are bringing a unique data source.

The minimum requirement for participating in the session is a laptop and access to Google Colab (i.e. a Google account). If you like, you can also try to set up the project dependencies on your own laptop, ideally a Linux machine with Python 3.x, an environment manager (e.g. Anaconda/Miniconda) and Git installed.
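A single YOLO pass typically produces several overlapping candidate boxes for the same object, so detections are usually post-processed with non-maximum suppression (NMS) based on intersection over union (IoU). The plain-Python sketch below shows greedy NMS. The (x1, y1, x2, y2) box format, the 0.5 threshold and the example detections are assumptions, and this is not the session's prepared snippet code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, discard boxes that overlap it
    by at least iou_threshold, and repeat on what remains.
    Returns the indices of the kept boxes in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


# Two near-duplicate detections of one object plus a separate object (made-up values).
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.75, 0.8]
keep = non_max_suppression(boxes, scores)
print(keep)  # [0, 2]
```

Box 1 overlaps box 0 with an IoU of about 0.82, so it is suppressed; box 2 does not overlap at all and survives. Lowering the threshold suppresses more aggressively, which helps with duplicates but can merge genuinely adjacent objects.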

HACKSESSIONS
14:00 – 16:45

Barcamp #7: Discussion – Introduce Predictive Maintenance (Manuel)

By YOURSELF
14:00 – 14:45

Barcamp #8: Data Processing (Miles)

By YOURSELF
14:00 – 14:45

Top 3 Cutting Edge Use Cases for eCommerce

By Benjamin Aunkofer DATANOMIQ GmbH

The eCommerce sector is one of the key pioneers in the use of data science and machine learning. In this slot, Benjamin Aunkofer will introduce three of the most valuable and beneficial use cases you can pitch, plan and realize in eCommerce companies, using Clustering, Classification and Regression Models based on Ensemble Learning and Deep Learning:
1. Forecasting of Revenues, Order Cancelations and Returns in Controlling
2. Item Display Order Ranking (for dynamic shop-pages) in Product Management
3. Probabilistic Attribution Modelling for improved budget allocation in Marketing
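The abstract does not say which attribution model is meant in the third use case. One widely used data-driven option, swapped in here as an assumption, is Shapley-value attribution: each marketing channel receives its average marginal contribution to conversions over all orders in which channels could be added. The channel names and conversion counts below are invented for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_attribution(channels, value):
    """Exact Shapley value per channel.
    value(frozenset_of_channels) -> conversions attributed to that channel set."""
    n = len(channels)
    result = {}
    for c in channels:
        others = [o for o in channels if o != c]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Weight = share of channel orderings in which exactly s precedes c.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(s | {c}) - value(s))
        result[c] = total
    return result


# Hypothetical conversions observed for journeys touching exactly these channels.
conversions = {
    frozenset(): 0,
    frozenset({"search"}): 40,
    frozenset({"social"}): 20,
    frozenset({"email"}): 10,
    frozenset({"search", "social"}): 70,
    frozenset({"search", "email"}): 55,
    frozenset({"social", "email"}): 35,
    frozenset({"search", "social", "email"}): 100,
}
credit = shapley_attribution(["search", "social", "email"], lambda s: conversions[s])
for channel, share in credit.items():
    print(f"{channel:7s} {share:.2f}")
```

The shares add up to the 100 conversions of the full channel mix, which makes the result directly usable for splitting a budget. Exact enumeration grows exponentially with the number of channels, so practical implementations sample orderings or use alternatives such as Markov-chain removal effects.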

TALKS
14:55 – 15:40

Application of Neural Nets in Content Classification

By Felix Jungmann ERGO, Robert Mertzig ERGO

Large companies receive high volumes of incoming documents, and reading them and forwarding them to the responsible departments ties up significant resources. This presentation describes an approach to handling incoming emails at ERGO in Nuremberg. The site receives approximately 1,000 emails per day, which used to be read and forwarded manually. With the email volume expected to increase, a solution based on current machine learning approaches had to be implemented to support the company. The talk covers the data preparation, the architecture and the final operationalization of the model in detail. It also discusses the additional challenges that data protection law poses for data science in insurance, as well as the technical requirements, limitations and usual pitfalls.
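The talk routes emails with neural networks. A much simpler model for the same routing task, often used as a reference baseline before training a neural model, is multinomial naive Bayes. The sketch below is that swapped-in baseline, not ERGO's model; it assumes whitespace tokenization, add-one smoothing, and invented department labels and example emails.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case whitespace tokenization (a deliberately crude assumption)."""
    return text.lower().split()


class NaiveBayesRouter:
    """Multinomial naive Bayes with add-one smoothing that routes a text to a label."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            # Log prior plus log likelihood of each token under this label.
            score = math.log(count / n_docs)
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = label, score
        return best


# Invented training emails and department labels.
texts = [
    "please change my bank account details",
    "update bank account for premium payments",
    "car accident damage claim attached photos",
    "water damage claim for my apartment",
    "cancel my contract at the end of the year",
    "i want to cancel the policy",
]
labels = ["billing", "billing", "claims", "claims", "cancellation", "cancellation"]
router = NaiveBayesRouter().fit(texts, labels)
print(router.predict("new bank account for payments"))
```

A baseline like this makes it easy to quantify how much a neural model actually adds, which matters when it has to be justified against extra infrastructure and data-protection effort.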

TALKS
14:55 – 15:40

Barcamp #9: Presentation – Winter is coming (Paavo)

By YOURSELF
14:55 – 15:40

Barcamp #10:

By YOURSELF

Session still available

BARCAMP

Where: Barcamp #2 Entrance
15:40 – 16:00

Coffee Break & Networking
16:00 – 16:45

Scaling Machine Learning in Production at Deutsche Telekom Global Carrier

By Gabor Kotalik Deutsche Telekom

Deutsche Telekom is the 4th biggest telecommunications company in the world, with millions of customers using its roaming mobile services every day. This presentation details how Deutsche Telekom builds and deploys machine learning capabilities for its commercial roaming business.
I will give an overview of our machine learning use cases and dive into the challenges of at-scale deployment, including business integration, data quality monitoring, feature engineering, machine learning model life cycle management, package management, process automation, code reproducibility and continuous integration. You will learn about the end-to-end machine learning workflow that we built to deliver capabilities ranging from forecasting to anomaly detection, and about the integration of different open-source data science and data visualization tools.
Finally, I will cover lessons learned from integrating open-source solutions into enterprise IT, with practical examples of pitfalls and how we addressed them.
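Data quality monitoring is one of the deployment challenges listed above. One common drift check, assumed here rather than taken from the talk, is the Population Stability Index (PSI), which compares how a feature is distributed over fixed bins in a reference period and in new data. The plain-Python sketch below uses illustrative bin edges and samples. The frequently quoted thresholds (below 0.1 stable, above 0.25 significant shift) are rules of thumb, not a standard.

```python
import math


def bin_shares(values, edges, floor=1e-6):
    """Share of values per bin [edges[i], edges[i+1]); the last bin also includes
    its right edge. Values outside the edges are ignored."""
    counts = [0] * (len(edges) - 1)
    for x in values:
        for i in range(len(edges) - 1):
            in_last = i == len(edges) - 2 and x == edges[-1]
            if edges[i] <= x < edges[i + 1] or in_last:
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    # A small floor keeps the logarithm defined when a bin is empty.
    return [max(c / total, floor) for c in counts]


def population_stability_index(reference, current, edges):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    e = bin_shares(reference, edges)
    a = bin_shares(current, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


edges = list(range(11))  # ten bins of width 1 covering [0, 10]
reference = [i % 10 for i in range(1000)]              # uniform over the bins
shifted = [min(9, i % 10 + 3) for i in range(1000)]    # mass pushed upwards
psi_same = population_stability_index(reference, reference, edges)
psi_shifted = population_stability_index(reference, shifted, edges)
print(f"unchanged: {psi_same:.3f}  shifted: {psi_shifted:.3f}")
```

Running such a check per feature on every scoring batch and alerting above a chosen threshold is one way to automate monitoring; the bin edges should be fixed from the reference data so that successive PSI values are comparable.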

TALKS
16:00 – 16:45

Barcamp #11: Why you shouldn’t be a data scientist (Philipp)

By YOURSELF
16:00 – 16:45

Barcamp #12: Labeling a large amount of data without manual labeling (Robert)

By YOURSELF
17:00 – 17:45

Building AI-Driven Products At Scale – Key Learnings From Applied Data Science In Logistics

By Katrin König Deutsche Post DHL Group

Deutsche Post DHL Group employs over half a million people around the globe across postal services, international express delivery, forwarding and contract logistics. Not surprisingly, our activities generate a lot of data. We believe that our data is the key to unlocking untapped potential in our company, but how can we leverage our data assets to drive decision making? How can we build AI products and deploy them in our business processes?

We try to answer these questions, provide real world examples and share our key learnings of leveraging data science in logistics at scale.

OPENING + CLOSING KEYNOTE
17:45 – 20:00

Camp Closing
