Privacy and Security


“Workday will be unveiling a new Data-as-a-Service offering, for which benchmarking will be the first service.” The benchmarking service lets customers select metrics that they wish to contribute to the service. In return, they gain access to benchmarking data on the same metrics from peer groups.

Workday customers can answer analytical questions such as “How does my company’s turnover rate compare to small technology companies?” or “What’s the median profit margin of similar-size companies in my industry and location?”

We were challenged to build a data-sharing platform that could satisfy these requirements when our fearless leader Joe Korngiebel (then SVP of User Experience and now CTO) announced Data-as-a-Service, based on customer demand, at Workday Rising in September 2016.


The Architecture

The diagram below depicts the Data-as-a-Service (DaaS) architecture in a layered structure. This layering standardizes data collection and data access controls, and provides:


  • A global data architecture (single schema) for aggregate (de-identified) data.
  • Scalable and secure cloud data storage.
  • Rationalized definitions across customers (a common taxonomy).


Stateless Warehouse

The warehouse sits on top of Amazon Redshift, which lets it scale to meet growing demand. In this sense, the Data-as-a-Service Platform uses scalable infrastructure components of a public cloud, such as load balancers and managed services.


Asynchronous Data Contribution

The Push components consist of all the software subsystems that we need to curate, de-identify, validate, and contribute customer data for the data sets that customers have opted into. The push job runs asynchronously on each customer tenant that has opted into at least one data set, and contributes only the data sets that the tenant has opted into. The data collection frequency and periodicity are governed by each data set. This model builds resilience into the architecture, and handles:

  • Privacy, Ethics and Compliance (PEC) requirements, including letting customers opt out and be forgotten.
  • Disaster recovery, and long-running or intermittently failing jobs.
  • Schema changes and new data sets with every Workday deployment, as well as bug fixes.
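
The push model described above can be sketched roughly as follows. This is a minimal illustration, not Workday's implementation: the `Tenant` record, the `contribute` and `run_push` functions, the `send` callback, and the retry count are all hypothetical names and values chosen for the example. It shows one asynchronous job per opted-in data set, with intermittent failures retried and tenants that have not opted in skipped.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Tenant:
    # Hypothetical tenant record: a name and the data sets it has opted into.
    name: str
    opted_in: set = field(default_factory=set)


async def contribute(tenant, data_set, send, max_attempts=3):
    """Contribute one data set for one tenant, retrying intermittent failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            await send(tenant.name, data_set)
            return True
        except ConnectionError:
            if attempt == max_attempts:
                return False
            await asyncio.sleep(0)  # placeholder for a real backoff delay
    return False


async def run_push(tenants, send):
    """Run one push cycle; tenants with no opt-ins produce no jobs."""
    jobs = {}
    for tenant in tenants:
        for data_set in sorted(tenant.opted_in):
            jobs[(tenant.name, data_set)] = contribute(tenant, data_set, send)
    results = await asyncio.gather(*jobs.values())
    return dict(zip(jobs.keys(), results))
```

Because each (tenant, data set) pair is its own job, a failure in one contribution does not block the others, which is one way to read the resilience claim above.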


Realtime Reporting Requirements

The Pull components consist of the runtime query request, query parameters, privacy controls, and a query DSL (domain-specific language). Workday applications connect to the DaaS Data Warehousing System to issue real-time analytical queries using the Workday microservice for DaaS.

The microservice interfaces with other Workday services that are only within the Workday network, and with Amazon Simple Storage Service and Redshift services that are within the Workday Amazon VPC (Virtual Private Cloud). This API-driven access allows any Workday application to interact with the data in the DaaS Platform within Workday data centers, providing flexibility to other Workday services. Three API layers are defined for interacting with DaaS data sets:

  1. Native REST call: Service-to-service access.
  2. Application layer access: Internal Workday XpressO API.
  3. Framework services access: Low level Java API access.
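
To make the Pull components more concrete, here is a small sketch of how a query request, its parameters, and its privacy controls might be bundled into a body for a service-to-service call. Everything in it is an assumption for illustration: the `BenchmarkQuery` and `PrivacyControls` classes, their field names, and the JSON shape are hypothetical and do not describe the actual DaaS query DSL or REST API.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class PrivacyControls:
    # Hypothetical control: minimum number of contributing tenants per result.
    min_contributors: int = 5


@dataclass(frozen=True)
class BenchmarkQuery:
    # All field names here are illustrative, not the actual DaaS query DSL.
    data_set: str
    metric: str
    aggregate: str = "median"
    filters: dict = field(default_factory=dict)
    privacy: PrivacyControls = field(default_factory=PrivacyControls)

    def to_json(self) -> str:
        """Serialize the query as a JSON body for a service-to-service call."""
        return json.dumps(asdict(self), sort_keys=True)


# Example: "How does my turnover rate compare to small technology companies?"
query = BenchmarkQuery(
    data_set="workforce",
    metric="turnover_rate",
    filters={"industry": "technology", "size": "small"},
)
```

Carrying the privacy controls inside the request keeps them attached to every query, regardless of which of the three access layers issued it.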


Privacy and Security

On a software-as-a-service platform, tenant data is strictly segregated to maintain separation between the data of each tenant. In the benchmarking use case, it is desirable to share certain measure data for comparison purposes and to get a more complete view of a situation (for example, salary surveys or other industry benchmarks). This sharing requires preparation on the tenant side to scrub the data of any proprietary or sensitive information. Because of this, the sharing needs to take place periodically, with an extract and transformation job that passes the data through the de-identification filters and aggregation functions before sending the contribution.
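
As a rough sketch of that extract and transformation step, the code below strips identifying fields from each record and then reduces the records to per-group summaries before anything leaves the tenant. The field list in `IDENTIFYING_FIELDS`, the function names, and the choice of count and median as aggregates are assumptions made for the example, not the filters or functions Workday uses.

```python
from statistics import median

# Hypothetical list of fields treated as identifying; a real list would be
# defined per data set.
IDENTIFYING_FIELDS = {"worker_id", "name", "email"}


def de_identify(record):
    """Return a copy of the record without identifying fields."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}


def aggregate(records, group_by, measure):
    """Reduce row-level records to one summary per group."""
    groups = {}
    for record in records:
        groups.setdefault(record[group_by], []).append(record[measure])
    return {
        key: {"count": len(values), "median": median(values)}
        for key, values in groups.items()
    }


def build_contribution(records, group_by, measure):
    """Apply the de-identification filter, then the aggregation function."""
    return aggregate([de_identify(r) for r in records], group_by, measure)
```

Only the output of `build_contribution` would be sent, so row-level data never needs to cross the tenant boundary in this sketch.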


Permission and Configurable Security

The framework supports segmented security at subcategory granularity. More enhancements in this area, such as intersection security for geographic location or worker type, will surface as we add more functional areas. The diagram below depicts the level of control that customers’ administrators can implement for their tenant.


Authorization

The Data-as-a-Service Platform is accessible only via the Workday microservice. Authorization is at the service level between Workday services and the microservice. The connection from the microservice to Redshift stays within a Virtual Private Cloud and uses an SSL certificate together with username and password authentication.

Encryption

The data is transferred from the customer via TLS (Transport Layer Security, HTTPS). It is then encrypted at rest using AWS KMS keys.

Privacy Controls


After much research, we have identified two methods of protecting the customers’ anonymity:

  • Differential privacy and error injection.
  • Thresholding as a strategy.
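
The two methods can be illustrated with a short sketch. It assumes the Laplace mechanism as the form of error injection, which is a common choice in differential privacy but is not confirmed by this post; the function names, the `epsilon` parameter, and the `min_contributors` threshold are likewise hypothetical.

```python
import random


def noisy_count(true_count, epsilon, rng):
    """Release a count with Laplace noise of scale 1/epsilon.

    Assumes each contributor changes the count by at most 1 (sensitivity 1).
    The difference of two independent exponential draws with rate epsilon
    follows a Laplace distribution centered at 0 with scale 1/epsilon.
    """
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


def release(value, contributors, min_contributors):
    """Thresholding: suppress results backed by too few contributors."""
    if contributors < min_contributors:
        return None
    return value


rng = random.Random()
# A peer group with only 3 contributors is suppressed entirely.
suppressed = release(0.12, contributors=3, min_contributors=5)
# A larger peer group is released, here with a noisy contributor count.
published = release(noisy_count(42, epsilon=1.0, rng=rng), 42, 5)
```

A smaller `epsilon` injects more noise and gives stronger protection at the cost of accuracy, while the threshold guards against peer groups so small that a single company could be inferred.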


Use Cases That Are Prime for the DaaS Platform


Benchmarking solution: A global automated data collection and analytical system that allows Workday customers to compare their company performance indicators to their peers in their industry.


Single version for 3rd-party data: The DaaS Platform can act as a single global, secure and performant store with key-value pair search capabilities. For example, Workday applications can utilize publicly available data for supply chain vendor integration or geo-location data sets.


Marketplace for configuration data across customer tenants: The DaaS Platform can be enhanced to store and allow sharing of configuration data across customer tenants. For example, professional services firms can share high-value custom report definitions, or a customer can share configurations that they find useful.


Billing and metering: The Workday Cloud Platform API usage and metering data sets are currently hosted on the DaaS Platform. This can be expanded to be a single billing system for all Workday applications that require a billing solution.


Near-real-time performance optimizations: Workday services can register incremental usage statistics with the DaaS Platform. Those services can then perform data-driven runtime analysis and make adjustments by utilizing the real-time aspect of the DaaS query capabilities. Examples are dynamic ordering of report filters at report runtime based on field execution statistics, or transaction commit logic that can be optimized per task.
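
As one possible reading of the report filter example, the sketch below orders filters so that the most selective ones run first, based on per-field statistics. The `FilterStats` record, its fields, and the "lowest pass rate first" heuristic are assumptions made for illustration; the post does not say which statistics are collected or how the ordering is computed.

```python
from dataclasses import dataclass


@dataclass
class FilterStats:
    # Hypothetical per-field statistics a service might register with DaaS.
    field_name: str
    rows_seen: int
    rows_passed: int

    @property
    def pass_rate(self):
        # A filter with no history is treated as passing everything,
        # so it sorts last until statistics accumulate.
        return self.rows_passed / self.rows_seen if self.rows_seen else 1.0


def order_filters(stats):
    """Put the most selective filters (lowest pass rate) first."""
    return [s.field_name for s in sorted(stats, key=lambda s: s.pass_rate)]


stats = [
    FilterStats("region", rows_seen=1000, rows_passed=600),
    FilterStats("worker_type", rows_seen=1000, rows_passed=50),
    FilterStats("new_field", rows_seen=0, rows_passed=0),
]
ordered = order_filters(stats)
```

Running the most selective filter first shrinks the row set early, so later filters have less work to do.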