«Creating trust through human-centric AI»: Under this slogan, the European Commission presented its proposal for regulating artificial intelligence (the AI regulation) last week. This historic step positions Europe as the first continent to uniformly regulate AI and the handling of personal data. With this landmark regulation, Europe aims to set standards for the use of data and AI, even beyond its own borders. This step is the right one. AI is a catalyst of the digital transformation, with lasting implications for the economy, society, and the environment. Clear rules for the use of this technology are therefore a must. They allow Europe to position itself as a progressive location that is ready for the digital age. In its current form, however, the proposal still raises a number of questions about its practical implementation. And Europe cannot afford to compromise its digital competitiveness in the ever-widening race with America and China.

Transparency about the risks of AI

Two central proposals of the AI regulation for creating trust

To create trust in AI products, the proposed AI regulation relies on two central approaches: monitoring the risks of artificial intelligence while cultivating an «ecosystem of AI excellence». Specifically, the proposal includes a ban on using AI for manipulative or discriminatory purposes, or for evaluating behavior through a «social scoring system». Use cases that do not fall into this category are still to be examined for risks and placed on a vaguely defined risk scale. High-risk applications are subject to special requirements, whose fulfillment is to be verified both before and after deployment.

It is crucial that AI applications are to be assessed case by case instead of being regulated across the board. Just last year, a European Commission white paper called for the blanket classification of all applications in business areas such as health care or transportation. Such sweeping classification by industry, regardless of the actual use case, would be obstructive and would have meant structural disadvantages for entire industries on the continent. Case-by-case assessment allows agile and innovative AI development in all sectors while holding all industries to the same standards for approving high-risk applications.

A clear definition of the risks of an AI application is missing

However, the proposed AI regulation lacks a clear definition of «high risk». Since developers themselves are responsible for assessing their applications, a clearly defined scale for evaluating risk is indispensable. Articles 6 and 7 do describe risks and give examples of «high-risk applications», but no process for assessing the risk of an AI application is defined. Start-ups and smaller companies in particular, which are strongly represented among AI developers, depend on clear processes and standards so as not to fall behind large corporations with corresponding resources. Practical guidelines for risk assessment are needed for this.

If a use case is classified as a «high-risk application», various requirements regarding data governance and risk management must be met before the product can be brought to market. For instance, the training data sets used must demonstrably be checked for biases and one-sided tendencies. The model architecture and training parameters must also be documented. After deployment, a degree of human oversight over the model's decisions must be ensured.

Accountability for AI products is a lofty and important goal. Once again, however, the practical implementation of these requirements is questionable. Many modern AI systems no longer use the traditional approach of training and test data; reinforcement learning, for example, relies on exploratory training through feedback instead of a static, auditable data set. Advances in explainable AI are steadily prying open opaque black-box models and allow ever more conclusions about the importance of variables in a model's decision process, but the complex model architectures and training processes of many modern neural networks make individual decisions of such a model barely reconstructible in a way that is meaningful to humans.

Requirements are also placed on the accuracy of predictions and classifications. This poses particular challenges for developers, because no AI system is perfectly accurate. Nor does this expectation exist in practice: misclassifications are often planned for in such a way that they carry as little weight as possible for the use case at hand. It is therefore essential that accuracy requirements for predictions and classifications be set case by case, with the specific application in mind, and that blanket values be avoided.

Enabling AI excellence

Europe is falling behind

With these requirements, the proposed AI regulation aims to build trust in the technology through transparency and accountability. This is a first step in the right direction toward «AI excellence». Beyond regulation, however, Europe as an AI location must also become more attractive to developers and investors.

According to a recently published study by the Center for Data Innovation, Europe is already falling behind both the United States and China in the contest for global leadership in AI. China has meanwhile overtaken Europe in the number of published studies and publications on artificial intelligence and taken the global lead. European AI companies also attract considerably less investment than their American counterparts. They invest less money in research and development and are acquired less often than their American peers.

A step in the right direction: supporting research and innovation

The Commission's proposal recognizes that more support for AI development is needed for excellence in the European market. It promises regulatory sandboxes, i.e., legal leeway for developing and testing innovative AI products, as well as the co-financing of research and testing facilities for AI. This is intended to make start-ups and smaller companies in particular more competitive and to drive more European innovation.

These are necessary steps to put Europe on the path to AI excellence, but they are not enough. AI developers need easier access to markets outside the EU, which also means simplifying data flows across borders. The ability to expand into the US and collaborate with Silicon Valley is particularly important for the digital industry, given how interconnected digital products and services are.

What is entirely missing from the proposed AI regulation is education about AI and its potential and risks outside expert circles. As artificial intelligence increasingly permeates all areas of everyday life, this becomes ever more important: to strengthen trust in new technologies, they must first be understood. Educating people about both the potential and the limits of AI is an essential step toward demystifying artificial intelligence and thereby building trust in the technology.

Potential not yet exhausted

With this proposal, the European Commission recognizes that artificial intelligence will shape the future of the European market. Guidelines for a technology of this magnitude are important, as is the promotion of innovation. For these strategies to bear fruit, their practical implementation must be unambiguously feasible for start-ups and SMEs as well. Europe has plenty of potential for AI excellence. With clear rules and incentives, it can be realized.

Making computers see may sound like science fiction to many. By «seeing», we do not mean filming with a webcam, but understanding image material. In fact, such technologies have long been at work behind the scenes of many everyday services. Social networks have been recognizing friends and acquaintances in photos for years, and modern smartphones can be unlocked with one's face instead of a PIN code. Beyond these small everyday conveniences, the rapidly growing field of computer vision holds far greater potential for industrial use. The specialized processing of image material promises both to simplify and automate many repetitive processes and to relieve experts and skilled personnel by supporting their decisions.

The foundations of image recognition and computer vision were laid back in the 1970s. Only in recent years, however, has the field increasingly found application outside of research. In our work as a data science & AI consultancy here at STATWORX, we have already encountered a number of interesting computer vision use cases. This post presents five selected and particularly promising use cases from various industries that are either already in production or promise major changes in their respective fields in the coming years.


Computer Vision Use Cases

1. Retail: Customer Behavior Tracking

Online shops like Amazon have long been able to exploit the analytical capabilities of their digital platforms. Customer behavior can be analyzed in detail and the user experience optimized accordingly. Brick-and-mortar retail also tries to optimize and ideally shape its customers' experience, but until now, the tools to automatically capture how people interact with displayed items have been missing. Computer vision can now close this gap for retail, at least in part.

Combined with existing security cameras, algorithms can automatically evaluate video footage and thus study customer behavior inside the store. For example, the current number of people in the store can be counted at any time, an obvious application during the COVID-19 pandemic, with its limits on the maximum number of visitors allowed in shops. More interesting, though, are analyses at the individual level, such as the route a customer takes through the store and its departments. These make it possible to optimize store design, layout, and product placement, to avoid congestion in busy departments, and to improve the overall customer experience. Revolutionary is the ability to track the attention that individual shelves and products receive: specialized algorithms can capture people's gaze direction and thus measure how long any given object is looked at by passers-by.
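The people-counting step described above can be sketched in a few lines. This is a minimal, hypothetical illustration in Python: the object detector itself is assumed to exist, and we only post-process its per-frame detections into an occupancy count; all class names, confidence values, and the threshold are invented.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str        # predicted class, e.g. "person" (hypothetical naming)
    confidence: float # detector confidence in [0, 1]

def count_people(detections, min_confidence=0.5):
    """Count detections classified as 'person' above a confidence threshold."""
    return sum(
        1 for d in detections
        if d.label == "person" and d.confidence >= min_confidence
    )

# One video frame's worth of made-up detections:
frame = [
    Detection("person", 0.91),
    Detection("person", 0.45),        # too uncertain, ignored
    Detection("shopping_cart", 0.88), # wrong class, ignored
    Detection("person", 0.77),
]

print(count_people(frame))  # → 2
```

A real system would run this per video frame and could, for instance, trigger an entry-stop signal once the count reaches the legal maximum.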

With this technology, brick-and-mortar retail now has the chance to catch up with e-commerce and evaluate customer behavior within its stores in detail. This enables not only higher sales but also shorter dwell times and a better distribution of customers across the sales floor.

Figure 1: Customer behavior tracking with computer vision
(https://www.youtube.com/watch?v=jiaNA1hln5I)

2. Agriculture: Detecting Wheat Rust with Computer Vision

Modern technologies allow farms to cultivate ever larger fields efficiently. At the same time, these areas must be monitored for pests and plant diseases, because if overlooked, plant diseases can lead to painful harvest losses.

Machine learning helps here: drones, satellite imagery, and remote sensors can generate large amounts of data. Modern technology simplifies the collection of various measurements, parameters, and statistics that can be monitored automatically. Despite planting ever larger fields, farms thus have a round-the-clock overview of soil conditions, irrigation levels, plant health, and local temperatures. Machine learning algorithms evaluate this data, so that a farm can react early to potential trouble spots and allocate its available resources efficiently.
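The automated monitoring described above can be illustrated with a deliberately simple rule-based sketch in Python. Real systems would use far richer models; the metric names, threshold values, and field names below are all invented for illustration.

```python
# Hypothetical alert rules: each metric maps to a predicate that
# returns True when the reading indicates a potential problem.
ALERT_RULES = {
    "soil_moisture": lambda v: v < 0.20,  # too dry
    "temperature":   lambda v: v > 35.0,  # heat stress
    "plant_health":  lambda v: v < 0.60,  # vegetation index dropping
}

def flag_fields(readings):
    """Return {field: [triggered metrics]} for fields needing attention."""
    alerts = {}
    for field, metrics in readings.items():
        triggered = [m for m, rule in ALERT_RULES.items()
                     if m in metrics and rule(metrics[m])]
        if triggered:
            alerts[field] = triggered
    return alerts

# Made-up sensor readings for two fields:
readings = {
    "field_north": {"soil_moisture": 0.35, "temperature": 28.0, "plant_health": 0.85},
    "field_south": {"soil_moisture": 0.15, "temperature": 36.5, "plant_health": 0.90},
}

print(flag_fields(readings))  # → {'field_south': ['soil_moisture', 'temperature']}
```

In practice, the predicates would be replaced by trained models, but the post-processing pattern of turning continuous measurements into actionable per-field alerts stays the same.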

Computer vision is particularly interesting for agriculture because analyzing image material makes it possible to detect plant diseases at an early stage. Just a few years ago, plant diseases were often only detected once they had already spread. With computer-vision-based early-warning systems, large-scale outbreaks can now be detected and stopped early. Farms not only lose less of their harvest; they also save on countermeasures such as pesticides, since comparatively smaller areas need to be treated.

The automated detection of wheat rust in particular has received a great deal of attention within the computer vision community. Several variants of this aggressive fungus infest grain in East Africa, around the Mediterranean, and in Central Europe, causing large losses of wheat harvests. Since the pest is clearly visible on the stalks and leaves of grain, trained image recognition algorithms can detect it early and help prevent its further spread.

Figure 2: Detecting wheat rust with computer vision
(https://www.kdnuggets.com/2020/06/crop-disease-detection-computer-vision.html)

3. Health Care: Image Segmentation of Scans

The potential of computer vision in health care is enormous, and the possible applications countless. Medical diagnostics relies heavily on the study of images, scans, and photographs. Analyzing ultrasound images, MRI, and CT scans is part of the standard repertoire of modern medicine. Computer vision technologies promise not only to simplify this process but also to prevent misdiagnoses and reduce treatment costs. Computer vision is not meant to replace medical professionals but to ease their work and support their decisions. Image segmentation aids diagnostics by detecting and coloring relevant regions on 2D or 3D scans, making black-and-white images easier to study.

The most recent use case for this technology comes from the COVID-19 pandemic. Image segmentation can support doctors and scientists in identifying COVID-19 and in analyzing and quantifying the infection and the course of the disease. A trained image recognition algorithm identifies suspicious regions on CT scans of the lung and then determines their size and volume, so that the disease progression of affected patients can be tracked clearly.
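The quantification step mentioned above can be sketched as follows. Assuming the segmentation model outputs a binary mask (1 = tissue flagged by the model, 0 = clear), the affected share of the scan is a simple reduction over that mask. The tiny 2D "mask" below is invented; real masks would be full-resolution 2D or 3D arrays.

```python
def affected_share(mask):
    """Return (flagged_pixels, fraction_of_scan) for a binary 2D mask."""
    flagged = sum(sum(row) for row in mask)
    total = sum(len(row) for row in mask)
    return flagged, flagged / total

# Toy 4x4 segmentation output from a hypothetical lung-CT model:
mask = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

pixels, share = affected_share(mask)
print(pixels, share)  # → 4 0.25
```

Comparing this share across scans taken at different dates is one simple way to follow how an infection grows or recedes over time.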

The benefit for monitoring a new disease is huge. Computer vision not only makes it easier for doctors to diagnose the disease and monitor it during therapy; the technology also generates valuable data for studying the disease and its course. Research benefits from the collected data and imagery as well, leaving more time for experiments and tests instead of data collection.

4. Automotive Industry: Object Detection and Classification in Traffic

Self-driving cars are definitely among the AI use cases that have received the most media attention in recent years. This is probably explained more by the futuristic appeal of autonomous driving than by the technology's actual consequences. At its core, it bundles several machine learning problems, but computer vision forms an important centerpiece of their solution.

The algorithm steering the car (the so-called «agent») must be aware of the car's surroundings at all times. The agent needs to know how the road runs, where other cars are nearby, how far away potential obstacles and objects are, and how fast these objects are moving, in order to constantly adapt to an ever-changing environment. To this end, autonomous vehicles are equipped with extensive camera arrays that film their surroundings in all directions. The resulting footage is then monitored in real time by an image recognition algorithm. As with customer behavior tracking, this requires the algorithm to search not just static images but a constant stream of images for relevant objects and to classify them.

Figure 5: Object detection and classification in road traffic
(https://miro.medium.com/max/1000/1*Ivhk4q4u8gCvsX7sFy3FsQ.png)

This technology already exists and is used industrially. The difficulty of road traffic stems from its complexity, its volatility, and the challenge of training an algorithm so that even the agent's failure in complex edge cases can be ruled out. This exposes the Achilles heel of computer vision: the need for large amounts of training data, which is costly to generate for road traffic.

5. Fitness: Human Pose Estimation

The fitness industry has been undergoing digital transformation for years. New training programs and trends reach audiences of millions via YouTube, training progress is tracked and evaluated with apps, and since the beginning of the corona crisis at the latest, virtual training and home workouts have enjoyed massive popularity. In strength training especially, personal trainers remain indispensable at the gym because of the high risk of injury. That may change: while checking one's own posture and position on video is already common, computer vision makes it possible to evaluate such footage more precisely than the human eye can.

The technology used here resembles the attention tracking already introduced for retail. Human pose estimation lets an algorithm detect and estimate a person's posture and pose on video by determining the positions of the joints and their orientation relative to one another. Since the algorithm has learned what the ideal, safe execution of an exercise looks like, deviations from it can be detected and highlighted automatically. Implemented in a smartphone app, this can happen in real time with immediate warning signals, so that dangerous mistakes trigger a warning right away instead of being analyzed only after the fact. This promises to reduce the risk of injury in strength training considerably: training without a personal trainer becomes safer, and the cost of safe strength training drops.
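The form check described above can be sketched from the pose estimator's output. Assuming the model yields 2D joint coordinates, we can compute the angle at the knee from hip, knee, and ankle keypoints and flag a squat that does not reach sufficient depth. The coordinates and the 100-degree threshold are invented for illustration; a real app would calibrate such rules per exercise.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the points a-b-c."""
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0])
        - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    ang = abs(ang)
    return 360 - ang if ang > 180 else ang

def squat_deep_enough(hip, knee, ankle, max_knee_angle=100.0):
    """A smaller knee angle means a deeper squat."""
    return joint_angle(hip, knee, ankle) <= max_knee_angle

# Made-up keypoints (x, y) from a single video frame:
hip, knee, ankle = (0.0, 1.0), (0.5, 0.5), (0.5, 0.0)
print(joint_angle(hip, knee, ankle))   # roughly 135 degrees
print(squat_deep_enough(hip, knee, ankle))  # → False
```

Running this per frame would let the app raise its warning the moment the rep bottoms out above the target depth.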

Human pose estimation is another step toward digital fitness training. Smartphones are already well established in fitness training, and apps that make training safer should find great favor with the broad user base.

Figure 6: Real-time analysis of movement sequences with computer vision
(https://mobidev.biz/wp-content/uploads/2020/07/detect-mistakes-knee-cave.gif)

Summary

Computer vision is a versatile and promising field of machine learning. It promises to solve a wide range of problems across diverse sectors and industries. Processing image and video material in real time enables the solution of problems far more complex than conventional data formats allow, bringing the state of machine learning ever closer to truly «intelligent» systems. Everyday interfaces to computer vision are already becoming more and more common, a trend that only seems set to accelerate in the coming years.

The examples presented here are only the tip of the iceberg. In fact, every one of these industries is making major efforts to use computer vision technology to make existing processes more efficient. Currently, many initiatives aim to lift computer vision into the third dimension, processing 3D models in addition to photos and scans. Demand for industrial 3D image processing is growing in surveying, medicine, and robotics alike. Processing 3D imagery will receive plenty of attention in the coming years, as many problems can only be solved efficiently in 3D.

 

In my previous blog post, I showed you how to run your R scripts inside a Docker container. For many of the projects we work on here at STATWORX, we end up using the RShiny framework to build our R scripts into interactive applications. Using containerization for the deployment of ShinyApps has a multitude of advantages. There are the usual suspects such as easy cloud deployment, scalability, and easy scheduling, but it also addresses one of RShiny's essential drawbacks: Shiny creates only a single R session per app, meaning that if multiple users access the same app, they all work with the same R session, which leads to all kinds of problems. With the help of Docker, we can circumvent this issue by starting a container instance for every user, giving each user access to their own instance of the app and their own corresponding R session.

If you’re not familiar with building R scripts into a Docker image or with Docker terminology, I would recommend first reading my previous blog post.

So let’s move on from simple R-scripts and run entire ShinyApps in Docker now!

The Setup

Setting up a project

It is highly advisable to use RStudio’s project setup when working with ShinyApps, especially when using Docker. Not only do projects make it easy to keep your RStudio neat and tidy, but they also allow us to use the renv package to set up a package library for our specific project. This will come in especially handy when installing the needed packages for our app to the Docker image.

For demonstration purposes, I decided to use an example app created in a previous blog post, which you can clone from the STATWORX GitHub repository. It is located in the "example-app" subfolder and consists of the three typical scripts used by ShinyApps (global.R, ui.R, and server.R) as well as files belonging to the renv package library. If you choose to use the example app linked above, you won’t have to set up your own RStudio project; instead, open "example-app.Rproj", which opens the project context I have already set up. If you choose to work along with an app of your own and haven’t created a project for it yet, you can set up your own by following the instructions provided by RStudio.

Setting up a package library

The RStudio project I provided already comes with a package library stored in the renv.lock file. If you prefer to work with your own app, you can create your own renv.lock file by installing the renv package from within your RStudio project and executing renv::init(). This initializes renv for your project and creates a renv.lock file in your project root folder. You can find more information on renv over at RStudio’s introduction article on it.

The Dockerfile

The Dockerfile is once again the centerpiece of creating a Docker image. Previously we only built a single script into an image; we now aim to repeat this process for an entire app. The step from a single script to a folder with multiple scripts is small, but some significant changes are needed to make our app run smoothly.

# Base image https://hub.docker.com/u/rocker/
FROM rocker/shiny:latest

# system libraries of general use
## install debian packages
RUN apt-get update -qq && apt-get -y --no-install-recommends install \
    libxml2-dev \
    libcairo2-dev \
    libsqlite3-dev \
    libmariadbd-dev \
    libpq-dev \
    libssh2-1-dev \
    unixodbc-dev \
    libcurl4-openssl-dev \
    libssl-dev

## update system libraries
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get clean

# copy necessary files
## app folder
COPY /example-app ./app
## renv.lock file
COPY /example-app/renv.lock ./renv.lock

# install renv & restore packages
RUN Rscript -e 'install.packages("renv")'
RUN Rscript -e 'renv::consent(provided = TRUE)'
RUN Rscript -e 'renv::restore()'

# expose port
EXPOSE 3838

# run app on container start
CMD ["R", "-e", "shiny::runApp('/app', host = '0.0.0.0', port = 3838)"]

The base image

The first difference is in the base image. Because we’re dockerizing a ShinyApp here, we can save ourselves a lot of work by using the rocker/shiny base image. This image handles the necessary dependencies for running a ShinyApp and comes with multiple R packages already pre-installed.

Necessary files

It is necessary to copy all relevant scripts and files for your app to your Docker image, so the Dockerfile does precisely that by copying the entire folder containing the app to the image.

We can also make use of renv to handle package installation for us. This is why we first copy the renv.lock file to the image separately. We also need to install the renv package separately by using the Dockerfile’s ability to execute R-code by prefacing it with RUN Rscript -e. This package installation allows us to then call renv directly and restore our package library inside the image with renv::restore(). Now our entire project package library will be installed in our Docker image, with the exact same version and source of all the packages as in your local development environment. All this with just a few lines of code in our Dockerfile.

Starting the App at Runtime

At the very end of our Dockerfile, we tell the container to execute the following R-command:

shiny::runApp('/app', host = '0.0.0.0', port = 3838)

The first argument allows us to specify the file path to our scripts, which in our case is /app. For the exposed port, I chose 3838, as this is the default for Shiny Server, but it can be freely changed to whatever suits you best.

With the final command in place, every container based on this image will start the app in question automatically at runtime (and of course close it again once it’s terminated).

The Finishing Touches

With the Dockerfile set up we’re now almost finished. All that remains is building the image and starting a container of said image.

Building the image

We open the terminal, navigate to the folder containing our new Dockerfile, and start the building process:

docker build -t my-shinyapp-image . 

Starting a container

After the building process has finished, we can now test our newly built image by starting a container:

docker run -d --rm -p 3838:3838 my-shinyapp-image

And there it is, running on localhost:3838.

docker-shiny-app-example

Outlook

Now that you have your ShinyApp running inside a Docker container, it is ready for deployment! Having containerized our app already makes this process a lot easier; there are further tools we can employ to ensure state of the art security, scalability, and seamless deployment. Stay tuned until next time, when we’ll go deeper into the full range of RShiny and Docker capabilities by introducing ShinyProxy.

At STATWORX, deploying our project results with the help of Shiny has become part of our daily business. Shiny is a great way of letting users interact with their own data and the data science products that we provide.

Applying the philosophy of reactivity to your app’s UI is an interesting way of bringing your apps closer in line with the spirit of the Shiny package. Shiny was designed to be reactive, so why limit this to only the server-side of your app? Introducing dynamic UI elements to your apps will help you reduce visual clutter, make for cleaner code and enhance the overall feel of your applications.

I have previously discussed the advantages of using renderUI in combination with lapply and do.call in the first part of this series on dynamic UI elements in Shiny. Building onto this I would like to expand our toolbox for reactive UI design with a few more options.

The objective

In this particular case, we’re trying to build an app where one input reacts to another input dynamically. Let’s assume we’d like to present the user with multiple options to choose from in the shape of a selectInput. Let’s also assume that one of those options calls for additional input from the user, say a comment explaining the selection in more detail. One way to do this would be to add a static textInput or similar to the app. A much more elegant solution would be to conditionally render the second input so that it only appears if the proper option has been selected. The image below shows how this would look in practice.

shiny-app-dynamic-ui-elements

There are multiple ways of going about this in Shiny. I’d like to introduce two of them to you, both of which lead to the same result but with a few key differences between them.

A possible solution: req

What req is usually used for

req is a function from the Shiny package whose purpose is to check whether certain requirements are met before proceeding with your calculations inside a reactive environment. Usually this is used to avoid red error messages popping up in your ShinyApp UI when an element of your app depends on an input that doesn’t have a set value yet. You may have seen one of these before:

shiny-error

These errors usually disappear once you have assigned a value to the needed inputs. req makes it so that your desired output is only calculated once its required inputs have been set, thus offering an elegant way to avoid the rather garish looking error messages in your app’s UI.

How we can make use of req

In terms of reactive UI design we can make use of req’s functionality to introduce conditional statements to our uiOutputs. This is achieved by using renderUI and req in combination as shown in the following example:

output$conditional_comment <- renderUI({
    # specify condition
    req(input$select == "B")

    # execute only if condition is met
    textAreaInput(inputId = "comment", 
                  label = "please add a comment", 
                  placeholder = "write comment here") 
  })

Within req, the condition to be met is specified, and the rest of the code inside the reactive environment created by renderUI is only executed if that condition is met. What is nice about this solution is that as long as the condition has not been met, no red error messages or other visual clutter will pop up in your app, unlike the errors we saw at the beginning of this section.

A simple example app

Here’s the complete code for a small example app:

library(shiny)
library(shinydashboard)

ui <- dashboardPage(

  dashboardHeader(),
  dashboardSidebar(
    selectInput(inputId = "select", 
                label = "please select an option", 
                choices = LETTERS[1:3]),
    uiOutput("conditional_comment")
  ),
  dashboardBody(
    uiOutput("selection_text"),
    uiOutput("comment_text")
  )
)

server <- function(input, output) {

  output$selection_text <- renderUI({
    paste("The selected option is", input$select)
  })

  output$conditional_comment <- renderUI({
    req(input$select == "B")
    textAreaInput(inputId = "comment", 
                  label = "please add a comment", 
                  placeholder = "write comment here")
  })

  output$comment_text <- renderText({
    input$comment
  })
}

shinyApp(ui = ui, server = server)

If you try this out by yourself you will find that the comment box isn’t hidden or disabled when it isn’t being shown, it simply doesn’t exist unless the selectInput takes on the value "B". That is because the uiOutput object containing the desired textAreaInput isn’t rendered unless the condition stated inside req is satisfied.

The popular choice: conditionalPanel

Out of all the tools available for reactive UI design this is probably the most widely used. The results obtained with conditionalPanel are quite similar to what req allowed us to do in the example above, but there are a few key differences.

How does this differ from req?

conditionalPanel was designed specifically to enable Shiny programmers to conditionally show or hide UI elements. Unlike the req method, conditionalPanel is evaluated within the UI part of the app, meaning that it doesn’t rely on renderUI to conditionally render the various inputs of the shinyverse. But wait, you might ask, how can Shiny evaluate any conditions in the UI side of the app? Isn’t that sort of thing always done in the server part? Well yes, that is true if the expression is written in R. To get around this, conditionalPanel relies on JavaScript to evaluate its conditions. After stating the condition in JS, we can add any given UI elements to our conditionalPanel as shown below:

conditionalPanel(
      # specify condition
      condition = "input.select == 'B'",

      # execute only if condition is met
      textAreaInput(inputId = "comment", 
                    label = "please add a comment", 
                    placeholder = "write comment here")
    )

This code chunk displays the same behaviour as the example shown in the last chapter, with one major difference: it is now part of our Shiny app’s UI function, unlike the req solution, which was a uiOutput calculated in the server part of the app and later passed to our UI function as a list element.
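
Because the condition is written in JavaScript, it can also combine several inputs at once. As a small hedged sketch (the checkbox input "show_details" is a made-up example, not part of the app above), a panel could be shown only when option "B" is selected and a checkbox is ticked:

```r
library(shiny)

# hypothetical example: "show_details" is an invented checkbox input,
# combined with the existing select input via the JS && operator
conditionalPanel(
  condition = "input.select == 'B' && input.show_details == true",
  textAreaInput(inputId = "details",
                label = "please add details")
)
```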

A simple example app:

Rewriting the app to include conditionalPanel instead of req yields a script that looks something like this:

library(shiny)
library(shinydashboard)

ui <- dashboardPage(

  dashboardHeader(),
  dashboardSidebar(
    selectInput(inputId = "select", 
                label = "please select an option", 
                choices = LETTERS[1:3]),
    conditionalPanel(
      condition = "input.select == 'B'",
      textAreaInput(inputId = "comment", 
                    label = "please add a comment", 
                    placeholder = "write comment here")
    )
  ),
  dashboardBody(
    uiOutput("selection_text"),
    textOutput("comment_text")
    )
)

server <- function(input, output) {

  output$selection_text <- renderUI({
    paste("The selected option is", input$select)
  })

  output$comment_text <- renderText({
    input$comment
  })
}

shinyApp(ui = ui, server = server)

With these two simple examples, we have demonstrated multiple ways of letting your displayed UI elements react to how a user interacts with your app, both on the server side and on the UI side of the application. In order to keep things simple, I have used a basic textAreaInput for this demonstration, but both renderUI and conditionalPanel can hold much more than just a simple input element.

So get creative and utilize these tools, maybe even in combination with the functions from part 1 of this series, to make your apps even shinier!

At STATWORX, we regularly deploy our project results with the help of Shiny. It’s not only an easy way of letting potential users interact with your R-code, but it’s also fun to design a good-looking app.

One of Shiny’s biggest strengths is its inherent reactivity; after all, being reactive to user input is a web application’s prime purpose. Unfortunately, many apps seem to make use of Shiny’s responsiveness only on the server side while keeping the UI completely static. This isn’t necessarily bad: some apps wouldn’t profit from having dynamic UI elements, and adding them regardless could make the app feel gimmicky. But in many cases, adding reactivity to the UI results not only in less clutter on the screen but also in cleaner code. And we all like that, don’t we?

A toolbox for reactivity: renderUI

Shiny natively provides convenient tools to make the UI of any app reactive to input. In today’s blog entry, we are going to look at the renderUI function in conjunction with lapply and do.call.

renderUI is helpful because it frees us from having to define in advance what kind of object our render function will produce. renderUI can render any UI element. We could, for example, let the type of content of our uiOutput be reactive to an input instead of being set in stone.
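
As a quick sketch of that flexibility (the input and output names here are invented for illustration, not taken from the apps below), a single renderUI call could return an entirely different input type depending on what the user selects:

```r
library(shiny)

ui <- fluidPage(
  selectInput("input_type", "Type of input to render:",
              choices = c("numeric", "text", "slider")),
  uiOutput("flexible_input")
)

server <- function(input, output) {
  # the *type* of the rendered element depends on another input
  output$flexible_input <- renderUI({
    switch(input$input_type,
           "numeric" = numericInput("dyn", "Enter a number", value = 0),
           "text"    = textInput("dyn", "Enter some text"),
           "slider"  = sliderInput("dyn", "Pick a value",
                                   min = 0, max = 10, value = 5))
  })
}

shinyApp(ui = ui, server = server)
```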

Introducing reactivity with lapply

Imagine a situation where you’re tasked with building a dashboard showing the user three different KPIs for three different countries. The most obvious approach would be to specify the position of each KPI box on the UI side of the app and to create each element on the server side with the help of shinydashboard::renderValueBox, as seen in the example below.

The common way

library(shiny)
library(shinydashboard)

ui <- dashboardPage(

  dashboardHeader(),
  dashboardSidebar(),

  dashboardBody(column(width = 4, 
                       fluidRow(valueBoxOutput("ch_1", width = 12)),
                       fluidRow(valueBoxOutput("jp_1", width = 12)),
                       fluidRow(valueBoxOutput("ger_1", width = 12))),
                column(width = 4,
                       fluidRow(valueBoxOutput("ch_2", width = 12)),
                       fluidRow(valueBoxOutput("jp_2", width = 12)),
                       fluidRow(valueBoxOutput("ger_2", width = 12))),
                column(width = 4, 
                       fluidRow(valueBoxOutput("ch_3", width = 12)),
                       fluidRow(valueBoxOutput("jp_3", width = 12)),
                       fluidRow(valueBoxOutput("ger_3", width = 12)))
  )
)

server <- function(input, output) {

  output$ch_1 <- renderValueBox({
    valueBox(value = "CH",
             subtitle = "Box 1")
  })

  output$ch_2 <- renderValueBox({
    valueBox(value = "CH",
             subtitle = "Box 2")
  })

  output$ch_3 <- renderValueBox({
    valueBox(value = "CH",
             subtitle = "Box 3",
             width = 12)
  })

  output$jp_1 <- renderValueBox({
    valueBox(value = "JP",
             subtitle = "Box 1",
             width = 12)
  })

  output$jp_2 <- renderValueBox({
    valueBox(value = "JP",
             subtitle = "Box 2",
             width = 12)
  })

  output$jp_3 <- renderValueBox({
    valueBox(value = "JP",
             subtitle = "Box 3",
             width = 12)
  })

  output$ger_1 <- renderValueBox({
    valueBox(value = "GER",
             subtitle = "Box 1",
             width = 12)
  })

  output$ger_2 <- renderValueBox({
    valueBox(value = "GER",
             subtitle = "Box 2",
             width = 12)
  })

  output$ger_3 <- renderValueBox({
    valueBox(value = "GER",
             subtitle = "Box 3",
             width = 12)
  })
}

shinyApp(ui = ui, server = server)

This might be a working solution to the task at hand, but it is hardly an elegant one. The valueboxes take up a large amount of space in our app and even though they can be resized or moved around, we always have to look at all the boxes, regardless of which ones are currently of interest. The code is also highly repetitive and largely consists of copy-pasted code chunks. A much more elegant solution would be to only show the boxes for each unit of interest (in our case countries) as chosen by the user. Here’s where renderUI comes in.

renderUI not only allows us to render UI objects of any type but also integrates well with the lapply function. This means that we don’t have to render every valuebox separately, but let lapply do this repetitive job for us.

The reactive way

Assuming we have any kind of input named "select" in our app, the following code chunk will generate a valuebox for each element selected with that input. The generated boxes show the name of each individual element as their value and have their subtitle set to "Box 1".

lapply(seq_along(input$select), function(i) {
      fluidRow(
        valueBox(value = input$select[i],
               subtitle = "Box 1",
               width = 12)
      )
    })

How does this work exactly? The lapply function iterates over each element of our input "select" and executes whatever code we feed it once per element. In our case, that means lapply takes the elements of our input and creates a valuebox embedded in a fluidRow for each of them (technically, it just spits out the corresponding HTML code that would create that).

This has multiple advantages:

  • Only boxes for chosen elements are shown, reducing visual clutter and showing what really matters.
  • We have effectively condensed 3 renderValueBox calls into a single renderUI call, reducing copy-pasted sections in our code.
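
The iteration pattern itself can be checked in plain base R, independent of Shiny. In this minimal sketch, the character vector stands in for input$select, and one list element is produced per selected country, just like one fluidRow-wrapped valuebox per selection above:

```r
# stand-in for input$select
selected <- c("CH", "JP", "GER")

# lapply returns a list with one element per country
boxes <- lapply(seq_along(selected), function(i) {
  paste0("valueBox for ", selected[i])
})

length(boxes)  # 3
boxes[[2]]     # "valueBox for JP"
```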

If we apply this to our app our code will look something like this:

library(shiny)
library(shinydashboard)

ui <- dashboardPage(
  dashboardHeader(),

  dashboardSidebar(
    selectizeInput(
      inputId = "select",
      label = "Select countries:",
      choices = c("CH", "JP", "GER"),
      multiple = TRUE)
  ),

  dashboardBody(column(4, uiOutput("ui1")),
                column(4, uiOutput("ui2")),
                column(4, uiOutput("ui3")))
  )

server <- function(input, output) {

  output$ui1 <- renderUI({
    req(input$select)

    lapply(seq_along(input$select), function(i) {
      fluidRow(
        valueBox(value = input$select[i],
               subtitle = "Box 1",
               width = 12)
        )
    })
  })

  output$ui2 <- renderUI({
    req(input$select)

    lapply(seq_along(input$select), function(i) {
      fluidRow(
        valueBox(value = input$select[i],
               subtitle = "Box 2",
               width = 12)
      )
    })
  })

  output$ui3 <- renderUI({
    req(input$select)

    lapply(seq_along(input$select), function(i) {
      fluidRow(
        valueBox(value = input$select[i],
               subtitle = "Box 3",
               width = 12)
      )
    })
  })
}

shinyApp(ui = ui, server = server)

The UI now dynamically responds to our inputs in the selectizeInput. This means that users can still show all KPI boxes if needed, but they won’t have to. In my opinion, this flexibility is what Shiny was designed for: letting users interact with R code dynamically. We have also effectively cut down on copy-pasted code by 66% already! There is still some repetition in the multiple renderUI calls, but the server side of our app is already much easier to read and make sense of than in the static version of our previous app.

dynamic-ui-with-selectizeInput

Beyond lapply: Going further with do.call

We have just seen that, with the help of lapply, renderUI can dynamically generate entire UI elements. That is, however, not the full extent of what renderUI can do. Individual parts of a UI element can also be generated dynamically, with the help of functions that let us pass those dynamically generated parts as arguments to the function call creating the element. Within the reactive context of renderUI we can call functions at will, which means we have more tools than just lapply at our disposal. Enter do.call. The do.call function executes a function call by passing a list of arguments to said function. This may sound like function-ception, but bear with me.

Following the do.call

Assume that we’d like to create a tabsetPanel, but instead of specifying the number of tabs shown we let the users decide. The solution to this task is a two-step process.

  1. We use lapply to iterate over a user-chosen number to create the specified amount of tabs.
  2. We use do.call to execute the shiny::tabsetPanel function, with the tabs from step 1 passed to it as arguments.

This would look something like this:

# create tabs from input
myTabs <- lapply(1:input$slider, function(i) {

  tabPanel(title = glue("Tab {i}"),
           h3(glue("Content {i}"))
  )
})

# execute tabsetPanel with tabs added as arguments
do.call(tabsetPanel, myTabs)
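
The do.call mechanics themselves are plain base R and can be tried outside of Shiny. In this sketch, unnamed list elements become positional arguments and named list elements become named arguments of the called function:

```r
# do.call(f, args) is equivalent to f(args[[1]], args[[2]], ...),
# with named list elements passed as named arguments
args_list <- list("a", "b", "c", sep = "-")
result <- do.call(paste, args_list)
result  # "a-b-c"
```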

This creates the HTML for a tabsetPanel with a user-chosen number of tabs that all have a unique title and can be filled with content. You can try it out with this example app:

library(shiny)
library(shinydashboard)
library(glue)

ui <- dashboardPage(
  dashboardHeader(),

  dashboardSidebar(
    sliderInput(inputId = "slider", label = NULL, min = 1, max = 5, value = 3, step = 1)
  ),

  dashboardBody(
    fluidRow(
      box(width = 12,
          p(mainPanel(width = 12,
                      column(width = 6, uiOutput("reference")),
                      column(width = 6, uiOutput("comparison"))
          )
          )
      )
    )
  )
)

server <- function(input, output) {

  output$reference <- renderUI({
    tabsetPanel(
      tabPanel(
        "Reference",
        h3("Reference Content"))
    )
  })

  output$comparison <- renderUI({
    req(input$slider)

    myTabs <- lapply(1:input$slider, function(i) {

      tabPanel(title = glue("Tab {i}"),
               h3(glue("Content {i}"))
      )
    })
    do.call(tabsetPanel, myTabs)
  })
}

shinyApp(ui = ui, server = server)
dynamic-ui-with-do.call

As you can see, renderUI offers a very flexible and dynamic approach to UI design when used in conjunction with lapply and the more advanced do.call.

Try using these tools next time you build an app and bring the same reactivity to Shiny’s UI as you’re already used to utilizing in its server part.

Since its release in 2014, Docker has become an essential tool for deploying applications. At STATWORX, R is part of our daily toolset. Clearly, many of us were thrilled to learn about RStudio’s Rocker Project, which makes containerizing R code easier than ever.

Containerization is useful in a lot of different situations. To me, it is very helpful when I’m deploying R code in a cloud computing environment, where the coded workflow needs to be run on a regular schedule. Docker is a perfect fit for this task for two reasons: On the one hand, you can simply schedule a container to be started at your desired interval. On the other hand, you always know what behavior and what output to expect, because of the static nature of containers. So if you’re tasked with deploying a machine-learning model that should regularly make predictions, consider doing so with the help of Docker. This blog entry will guide you through the entire process of getting your R script to run in a Docker container one step at a time. For the sake of simplicity, we’ll be working with a local dataset.

I’d like to start off by emphasizing that this blog entry is not a general Docker tutorial. If you don’t really know what images and containers are, I recommend that you take a look at the Docker Curriculum first. If you’re interested in running an RStudio session within a Docker container, I suggest you pay a visit to the OpenSciLabs Docker Tutorial instead. This blog specifically focuses on containerizing an R script and eventually executing it automatically each time the container is started, without any user interaction, thus eliminating the need for the RStudio IDE. The syntax used in the Dockerfile and the command line will only be treated briefly here, so it’s best to get familiar with the basics of Docker before reading any further.

What we’ll need

For the entire procedure we’ll be needing the following:

  • An R script which we’ll build into an image
  • A base image on top of which we’ll build our new image
  • A Dockerfile which we’ll use to build our new image

You can clone all following files and the folder structure I used from the STATWORX GitHub Repository.

building an R script into an image

The R script

We’re working with a very simple R script that imports a dataframe, manipulates it, creates a plot based on the manipulated data and, in the end, exports both the plot and the data it is based on. The dataframe used for this example is the US 500 Records dataset provided by Brian Dunning. If you’d like to work along, I recommend copying this dataset into the 01_data folder.

library(readr)
library(dplyr)
library(ggplot2)
library(forcats)

# import dataframe
df <- read_csv("01_data/us-500.csv")

# manipulate data
plot_data <- df %>%
  group_by(state) %>%
  count()

# save manipulated data to output folder
write_csv(plot_data, "03_output/plot_data.csv")

# create plot based on manipulated data
plot <- plot_data %>% 
  ggplot()+
  geom_col(aes(fct_reorder(state, n), 
               n, 
               fill = n))+
  coord_flip()+
  labs(
    title = "Number of people by state",
    subtitle = "From US-500 dataset",
    x = "State",
    y = "Number of people"
  )+ 
  theme_bw()

# save plot to output folder
ggsave("03_output/myplot.png", plot = plot, width = 10, height = 8, dpi = 100)

This creates a simple bar plot based on our dataset:

example plot from US-500 dataset

We use this script not only to run R code inside a Docker container, but we also want to run it on data from outside our container and afterward save our results.

The base image

The DockerHub page of the Rocker project lists all available Rocker repositories. Seeing as we’re using tidyverse packages in our script, the rocker/tidyverse image seems an obvious choice. The problem with this repository is that it also includes RStudio, which is not something we want for this specific project. That means we’ll have to work with the r-base repository instead and build our own tidyverse-enabled image. We can pull the rocker/r-base image from DockerHub by executing the following command in the terminal:

docker pull rocker/r-base

This will pull the Base-R image from the Rocker DockerHub repository. We can run a container based on this image by typing the following into the terminal:

docker run -it --rm rocker/r-base

Congratulations! You are now running R inside a Docker container! The terminal was turned into an R console, which we can now interact with thanks to the -it argument. The --rm argument makes sure the container is automatically removed once we stop it. You’re free to experiment with your containerized R session, which you can exit by executing the q() function from the R console. You could, for example, start installing the packages you need for your workflow with install.packages(), but that’s usually a tedious and time-consuming task. It is better to build your desired packages into the image itself, so you don’t have to bother with manually installing the packages you need every time you start a container. For that, we need a Dockerfile.

The Dockerfile

With a Dockerfile, we tell Docker how to build our new image. A Dockerfile is a plain text file that, by convention, is named "Dockerfile" (no file extension) and by default is assumed to be located in the build-context root directory (which in our case would be the "R-Script in Docker" folder). First, we have to define the image on top of which we’d like to build ours. Depending on how we’d like our image to be set up, we give it a list of instructions so that running containers will be as smooth and efficient as possible. In this case, I’d like to base our new image on the previously discussed rocker/r-base image. Next, we replicate the local folder structure, so we can specify the directories we want in the Dockerfile. After that we copy the files which we want our image to have access to into said directories – this is how you get your R script into the Docker image. Furthermore, this spares us from having to manually install packages after starting a container, as we can prepare a second R script that takes care of the package installation. Simply copying that R script is not enough; we also need to tell Docker to run it automatically when building the image. And that’s our first Dockerfile!

# Base image https://hub.docker.com/u/rocker/
FROM rocker/r-base:latest

## create directories
RUN mkdir -p /01_data
RUN mkdir -p /02_code
RUN mkdir -p /03_output

## copy files
COPY /02_code/install_packages.R /02_code/install_packages.R
COPY /02_code/myScript.R /02_code/myScript.R

## install R-packages
RUN Rscript /02_code/install_packages.R

Don’t forget to prepare and save your install_packages.R script, in which you specify which R packages need to be pre-installed in your image. In our case the file looks like this:

install.packages("readr")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("forcats")

Building and running the image

Now we have assembled all necessary parts for our new Docker image. Use the terminal to navigate to the folder where your Dockerfile is located and build the image with

docker build -t myname/myimage .

The process will take a while due to the package installation. Once it’s finished we can test our new image by starting a container with

docker run -it --rm -v ~/"R-Script in Docker"/01_data:/01_data -v ~/"R-Script in Docker"/03_output:/03_output myname/myimage

The -v arguments signal to Docker which local folders to map to the corresponding folders inside the container. This is important because we want to both get our dataframe into the container and save the output of our workflow locally, so it isn’t lost once the container is stopped.

This container can now interact with our dataframe in the 01_data folder and has a copy of our workflow script inside its own 02_code folder. Telling R to source("02_code/myScript.R") will run the script and save the output into the container’s 03_output folder, from where it also appears in our local 03_output folder thanks to the volume mapping.

running a container based on our image

Improving on what we have

Now that we have tested and confirmed that our R script runs as expected when containerized, there are only a few things missing.

  1. We don’t want to manually have to source the script from inside the container, but have it run automatically whenever the container is started.

We can achieve this very easily by simply adding the following command to the end of our Dockerfile:

## run the script
CMD Rscript /02_code/myScript.R

This points to the location of our script within the folder structure of our container and tells Docker to run it with Rscript whenever the container is started. Making changes to our Dockerfile, of course, means that we have to rebuild our image, and that in turn means starting the slow process of pre-installing our packages all over again. This is tedious, especially if chances are that there will be further revisions of any of the components of our image down the road. That’s why I suggest we

  2. Create an intermediary Docker image in which we install all important packages and dependencies, so that we can then build our final, desired image on top of it.

This way we can quickly rebuild our image within seconds, which allows us to freely experiment with our code without having to sit through Docker installing packages over and over again.

Building an intermediary image

The Dockerfile for our intermediary image looks very similar to the previous example. Because I decided to modify my install_packages.R script to include the entire tidyverse for future use, I also needed to install a few Debian packages the tidyverse depends on. Not all of these are 100% necessary, but all of them should be useful in one way or another.

# Base image https://hub.docker.com/u/rocker/
FROM rocker/r-base:latest

## install debian packages
RUN apt-get update -qq && apt-get -y --no-install-recommends install \
    libxml2-dev \
    libcairo2-dev \
    libsqlite3-dev \
    libmariadbd-dev \
    libpq-dev \
    libssh2-1-dev \
    unixodbc-dev \
    libcurl4-openssl-dev \
    libssl-dev

## copy files
COPY 02_code/install_packages.R /install_packages.R

## install R-packages
RUN Rscript /install_packages.R

I build the image by navigating to the folder where my Dockerfile sits and executing the Docker build command again:

docker build -t oliverstatworx/base-r-tidyverse .

I have also pushed this image to my DockerHub, so if you ever need a base-R image with the tidyverse pre-installed, you can simply build on top of my image without having to go through the hassle of building it yourself.

Now that the intermediary image has been built we can change our original Dockerfile to build on top of it instead of rocker/r-base and remove the package-installation because our intermediary image already takes care of that. We also add the last line that automatically starts running our script whenever the container is started. Our final Dockerfile should look something like this:

# Base image https://hub.docker.com/u/oliverstatworx/
FROM oliverstatworx/base-r-tidyverse:latest

## create directories
RUN mkdir -p /01_data
RUN mkdir -p /02_code
RUN mkdir -p /03_output

## copy files
COPY /02_code/myScript.R /02_code/myScript.R

## run the script
CMD Rscript /02_code/myScript.R
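
A general Docker detail worth noting here (this is standard Docker behaviour, not specific to our setup): a CMD instruction is replaced entirely by any command passed to docker run, while an ENTRYPOINT makes the container behave more like a fixed executable. If you want to guarantee the script always runs, a sketch of an alternative last line would be:

```dockerfile
## alternative to CMD: arguments given to `docker run` are appended
## to this command instead of replacing it
ENTRYPOINT ["Rscript", "/02_code/myScript.R"]
```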

The final touches

Since we built our image on top of an intermediary image containing all needed packages, we can now easily modify parts of our final image to our liking. I like making my R script less verbose by suppressing warnings and messages that are no longer of interest (since I already tested the image and know that everything works as expected) and adding messages that tell the user which part of the script the running container is currently executing.

suppressPackageStartupMessages(library(readr))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(forcats))

options(scipen = 999,
        readr.num_columns = 0)

print("Starting Workflow")

# import dataframe
print("Importing Dataframe")
df <- read_csv("01_data/us-500.csv")

# manipulate data
print("Manipulating Data")
plot_data <- df %>%
  group_by(state) %>%
  count()

# save manipulated data to output folder
print("Writing manipulated Data to .csv")
write_csv(plot_data, "03_output/plot_data.csv")

# create plot based on manipulated data
print("Creating Plot")
plot <- plot_data %>% 
  ggplot()+
  geom_col(aes(fct_reorder(state, n), 
               n, 
               fill = n))+
  coord_flip()+
  labs(
    title = "Number of people by state",
    subtitle = "From US-500 dataset",
    x = "State",
    y = "Number of people"
  )+ 
  theme_bw()

# save plot to output folder
print("Saving Plot")
ggsave("03_output/myplot.png", plot = plot, width = 10, height = 8, dpi = 100)
print("Workflow Finished")

After navigating to the folder where our Dockerfile is located, we rebuild our image once more with docker build -t myname/myimage . Then we start a container based on our image and map the 01_data and 03_output folders to our local directories, so we can import our data and save the created output locally:

docker run -it --rm -v ~/"R-Script in Docker"/01_data:/01_data -v ~/"R-Script in Docker"/03_output:/03_output myname/myimage

Congratulations, you now have a clean Docker image that not only automatically runs your R script whenever a container is started, but also tells you exactly which part of the code it is executing via console messages. Happy docking!

Since its release in 2014, Docker has become an essential tool for deploying applications. At STATWORX, R is part of our daily toolset. Clearly, many of us were thrilled to learn about RStudio’s Rocker Project, which makes containerizing R code easier than ever.

Containerization is useful in a lot of different situations. To me, it is very helpful when I’m deploying R code in a cloud computing environment, where the coded workflow needs to be run on a regular schedule. Docker is a perfect fit for this task for two reasons: On the one hand, you can simply schedule a container to be started at your desired interval. On the other hand, you always know what behavior and what output to expect, because of the static nature of containers. So if you’re tasked with deploying a machine-learning model that should regularly make predictions, consider doing so with the help of Docker. This blog entry will guide you through the entire process of getting your R script to run in a Docker container one step at a time. For the sake of simplicity, we’ll be working with a local dataset.

I’d like to start off with emphasizing that this blog entry is not a general Docker tutorial. If you don’t really know what images and containers are, I recommend that you take a look at the Docker Curriculum first. If you’re interested in running an RStudio session within a Docker container, then I suggest you pay a visit to the OpenSciLabs Docker Tutorial instead. This blog specifically focuses on containerizing an R script to eventually execute it automatically each time the container is started, without any user interaction – thus eliminating the need for the RStudio IDE. The syntax used in the Dockerfile and the command line will only be treated briefly here, so it’s best to get familiar with the basics of Docker before reading any further.

What we’ll need

For the entire procedure we’ll be needing the following:

You can clone all following files and the folder structure I used from the STATWORX GitHub Repository.

building an R script into an image

The R script

We’re working with a very simple R script that imports a dataframe, manipulates it, creates a plot based on the manipulated data and, in the end, exports both the plot and the data it is based on. The dataframe used for this example is the US 500 Records dataset provided by Brian Dunning. If you’d like to work along, I’d recommend you to copy this dataset into the 01_data folder.

library(readr)
library(dplyr)
library(ggplot2)
library(forcats)

# import dataframe
df <- read_csv("01_data/us-500.csv")

# manipulate data
plot_data <- df %>%
  group_by(state) %>%
  count()

# save manipulated data to output folder
write_csv(plot_data, "03_output/plot_data.csv")

# create plot based on manipulated data
plot <- plot_data %>% 
  ggplot()+
  geom_col(aes(fct_reorder(state, n), 
               n, 
               fill = n))+
  coord_flip()+
  labs(
    title = "Number of people by state",
    subtitle = "From US-500 dataset",
    x = "State",
    y = "Number of people"
  )+ 
  theme_bw()

# save plot to output folder
ggsave("03_output/myplot.png", width = 10, height = 8, dpi = 100)

This creates a simple bar plot based on our dataset:

example plot from US-500 dataset

We use this script not only to run R code inside a Docker container, but we also want to run it on data from outside our container and afterward save our results.

The base image

The DockerHub page of the Rocker project lists all available Rocker repositories. Seeing as we’re using Tidyverse-packages in our script the rocker/tidyverse image should be an obvious choice. The problem with this repository is that it also includes RStudio, which is not something we want for this specific project. This means that we’ll have to work with the r-base repository instead and build our own Tidyverse-enabled image. We can pull the rocker/r-base image from DockerHub by executing the following command in the terminal:

docker pull rocker/r-base

This will pull the Base-R image from the Rocker DockerHub repository. We can run a container based on this image by typing the following into the terminal:

docker run -it --rm rocker/r-base

Congratulations! You are now running R inside a Docker container! Thanks to the -it argument, the terminal has turned into an interactive R console, and the --rm argument makes sure the container is automatically removed once we stop it. You’re free to experiment with your containerized R session, which you can exit by calling the q() function from the R console. You could, for example, start installing the packages you need for your workflow with install.packages(), but that’s usually a tedious and time-consuming task. It is better to build your desired packages into the image itself, so you don’t have to bother with manually installing them every time you start a container. For that, we need a Dockerfile.

The Dockerfile

With a Dockerfile, we tell Docker how to build our new image. A Dockerfile is a plain text file called Dockerfile (no file extension) that is by default assumed to be located in the build-context root directory (which in our case is the “R-Script in Docker” folder). First, we define the image on top of which we’d like to build ours; here, I’ll base our new image on the previously discussed rocker/r-base image. Next, we replicate the local folder structure inside the image by specifying the directories we want in the Dockerfile. After that, we copy the files we want our image to have access to into those directories – this is how you get your R script into the Docker image. It also lets us avoid installing packages manually after starting a container, because we can prepare a second R script that takes care of the package installation. Simply copying that script is not enough; we also need to tell Docker to run it while building the image. And that’s our first Dockerfile!

# Base image https://hub.docker.com/u/rocker/
FROM rocker/r-base:latest

## create directories
RUN mkdir -p /01_data
RUN mkdir -p /02_code
RUN mkdir -p /03_output

## copy files
COPY 02_code/install_packages.R /02_code/install_packages.R
COPY 02_code/myScript.R /02_code/myScript.R

## install R-packages
RUN Rscript /02_code/install_packages.R

Don’t forget to prepare and save the appropriate install_packages.R script, in which you specify the R packages you need pre-installed in your image. In our case the file looks like this:

install.packages("readr")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("forcats")

Building and running the image

Now we have assembled all necessary parts for our new Docker image. Use the terminal to navigate to the folder where your Dockerfile is located and build the image with

docker build -t myname/myimage .

The process will take a while due to the package installation. Once it’s finished we can test our new image by starting a container with

docker run -it --rm -v ~/"R-Script in Docker"/01_data:/01_data -v ~/"R-Script in Docker"/03_output:/03_output myname/myimage

The -v arguments signal to Docker which local folders to map to the corresponding folders inside the container. This is important because we want to get our dataframe into the container and also save the output of our workflow locally, so it isn’t lost once the container is stopped.
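Each -v argument follows the host_path:container_path pattern, and because our host path contains spaces it has to be quoted. A small sketch that only assembles and prints the command (rather than executing it, so no Docker daemon is needed to inspect it):

```shell
# Host project folder (contains spaces, hence the quoting below)
host_dir="$HOME/R-Script in Docker"

# Each -v argument has the form host_path:container_path
data_mount="$host_dir/01_data:/01_data"
output_mount="$host_dir/03_output:/03_output"

# Print the full command for inspection instead of running it
echo docker run -it --rm -v "$data_mount" -v "$output_mount" myname/myimage
```

Everything left of the colon refers to the host machine; everything right of it refers to the container’s filesystem.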

This container can now interact with our dataframe in the 01_data folder and has a copy of our workflow-script inside its own 02_code folder. Telling R to source("02_code/myScript.R") will run the script and save the output into the 03_output folder, from where it will also be copied to our local 03_output folder.

running a container based on our image

Improving on what we have

Now that we have tested and confirmed that our R script runs as expected when containerized, only a few things are missing.

  1. We don’t want to manually have to source the script from inside the container, but have it run automatically whenever the container is started.

We can achieve this very easily by simply adding the following command to the end of our Dockerfile:

## run the script
CMD Rscript /02_code/myScript.R

This points to the location of our script within the folder structure of our container, marks it as R code, and tells Docker to run it whenever the container is started. Making changes to our Dockerfile, of course, means that we have to rebuild our image, which in turn means sitting through the slow package installation all over again. This is tedious, especially if further revisions to any component of our image are likely down the road. That’s why I suggest we

  2. Create an intermediary Docker image where we install all important packages and dependencies so that we can then build our final, desired image on top.

This way we can quickly rebuild our image within seconds, which allows us to freely experiment with our code without having to sit through Docker installing packages over and over again.

Building an intermediary image

The Dockerfile for our intermediary image looks very similar to our previous example. Because I decided to modify my install_packages.R script to install the entire tidyverse for future use, I also needed to install a few Debian packages the tidyverse depends on. Not all of these are strictly necessary, but all of them should be useful in one way or another.

# Base image https://hub.docker.com/u/rocker/
FROM rocker/r-base:latest

## install debian packages
RUN apt-get update -qq && apt-get -y --no-install-recommends install \
  libxml2-dev \
  libcairo2-dev \
  libsqlite3-dev \
  libmariadbd-dev \
  libpq-dev \
  libssh2-1-dev \
  unixodbc-dev \
  libcurl4-openssl-dev \
  libssl-dev

## copy files
COPY 02_code/install_packages.R /install_packages.R

## install R-packages
RUN Rscript /install_packages.R

I build the image by navigating to the folder where my Dockerfile sits and executing the Docker build command again:

docker build -t oliverstatworx/base-r-tidyverse .

I have also pushed this image to my DockerHub, so if you ever need a base-R image with the tidyverse pre-installed, you can simply build on top of my image without having to go through the hassle of building it yourself.

Now that the intermediary image has been built we can change our original Dockerfile to build on top of it instead of rocker/r-base and remove the package-installation because our intermediary image already takes care of that. We also add the last line that automatically starts running our script whenever the container is started. Our final Dockerfile should look something like this:

# Base image https://hub.docker.com/u/oliverstatworx/
FROM oliverstatworx/base-r-tidyverse:latest

## create directories
RUN mkdir -p /01_data
RUN mkdir -p /02_code
RUN mkdir -p /03_output

## copy files
COPY 02_code/myScript.R /02_code/myScript.R

## run the script
CMD Rscript /02_code/myScript.R

The final touches

Since we built our image on top of an intermediary image that contains all the packages we need, we can now easily modify parts of our final image to our liking. I like making my R script less verbose by suppressing warnings and messages that are no longer of interest (since I have already tested the image and know that everything works as expected), and by adding messages that tell the user which part of the script the running container is currently executing.

suppressPackageStartupMessages(library(readr))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(forcats))

options(scipen = 999,
        readr.num_columns = 0)

print("Starting Workflow")

# import dataframe
print("Importing Dataframe")
df <- read_csv("01_data/us-500.csv")

# manipulate data
print("Manipulating Data")
plot_data <- df %>%
  group_by(state) %>%
  count()

# save manipulated data to output folder
print("Writing manipulated Data to .csv")
write_csv(plot_data, "03_output/plot_data.csv")

# create plot based on manipulated data
print("Creating Plot")
plot <- plot_data %>% 
  ggplot()+
  geom_col(aes(fct_reorder(state, n), 
               n, 
               fill = n))+
  coord_flip()+
  labs(
    title = "Number of people by state",
    subtitle = "From US-500 dataset",
    x = "State",
    y = "Number of people"
  )+ 
  theme_bw()

# save plot to output folder
print("Saving Plot")
ggsave("03_output/myplot.png", plot = plot, width = 10, height = 8, dpi = 100)
print("Workflow Finished")

After navigating to the folder where our Dockerfile is located, we rebuild our image once more with docker build -t myname/myimage . Once again, we start a container based on our image and map the 01_data and 03_output folders to our local directories. This way we can import our data and save our created output locally:

docker run -it --rm -v ~/"R-Script in Docker"/01_data:/01_data -v ~/"R-Script in Docker"/03_output:/03_output myname/myimage

Congratulations, you now have a clean Docker image that not only automatically runs your R script whenever a container is started, but also tells you exactly which part of the code it is executing via console messages. Happy docking!