Azure Data Factory Date Parameters with Timezone Sensitive Default or Adhoc Values

Date and time values have been the bane of programmers’ existence since Pandora opened a box several millennia back……at least it feels that long. I wanted to share with you how we implemented flexible date parameters for an Azure Data Factory pipeline in a recent project.

Names, places, industries, widgets and other identifying information have been changed for entertainment purposes.

The Scenario

Hakamada Industries wants to keep track of their Big Kahuna Surfboards manufacturing subsidiary. Hakamada’s corporate offices are in Tokyo (UTC+9) and Big Kahuna Surfboards operates in Honolulu (UTC-10).

Big Kahuna Surfboards has a REST API that will take StartDate and EndDate parameters in Hawaiian Standard Time and return data that has been updated in the specified time-frame.

Mr. Hakamada wants to have an automated pipeline that grabs the updated data from the API daily, and he also wants to be able to grab data from the API for whatever date range he wants. Hakamada Industries would prefer Azure Data Factory to be used because that is where all of their other ETL processes live. Most Azure resources and services use Coordinated Universal Time (UTC) by default.

For those of you keeping track at home, we’re dealing with Tokyo (UTC+9), Azure (UTC) and Honolulu (UTC-10)

The wrinkle

Dealing with different time-zones was one of the date/time challenges we faced in this project. On top of that we had to work around  limitations and assumptions made by an API that was out of our control.

DateTimes or Time values cannot be specified in Big Kahuna Surfboards’ API due to the ancient setup they are running. Despite their API not accepting them, DateTimes are ultimately what are being queried on the back end by the API. This means that providing the API with a StartDate and EndDate of 2000/10/10 will not return all of the surfboards made on October 10 2000, because both dates translate to 2000/10/10 00:00:00.

No one makes surfboards at midnight, and even if they did, the API would only return the surfboards made in that 1 second.

Because of this wrinkle, when looking for the most recent full day of surfboard production, the StartDate must be: TODAY – 1.

Back to the requirements and best practices

For normal automated runs, the StartDate will be yesterday and EndDate will be today. For an adhoc run, the StartDate and EndDate will be specified by the user or process calling the pipeline. For simplicity’s sake, we will assume whoever or whatever is performing adhoc runs is taking the eccentricities of the API into account when providing dates.

In being good programmers and following the DRY principle (Do not Repeat Yourself), we created one pipeline to handle the automated and adhoc workloads.

I know the example is a bit convoluted, and I appreciate you rolling with it. Rest assured we faced an API like this in real life.

Azure Data Factory (ADF)

Lets jump in to the Azure Data Factory. We’ll start with the blank pipeline named BigKahunaHakamada.

Step 1 – Add Date Parameters

We are going to add two string parameters to our pipeline as pDate1 and pDate2. The default values for these parameters will be “yyyy-mm-dd”.

Why string parameters? ADF does not allow date parameters at this time. ADF does have built in conversion functions from string to date so we will be able to treat them as proper dates later in the pipeline.

Why “yyyy-MM-dd”? While not the most common date format for most humans in most parts of the world, “yyyy-mm-dd” is one the most common computer date formats. It is officially known as  ISO 8601, I’m sure her mom calls her something else, and it is the date format required by the REST API. The “yyyy-mm-dd” default value also gives Mr. Hakamada, or whoever might be putting their date range into the pipeline manually, a guide as to the date format to use.

Step 2 – Add Date Variables

Now we are going to add two date variables called vDate1 and vDate2. They will also have a data type of string but will not have default values. vDate1 will be the StartDate passed to the API and vDate2 will be the EndDate.

The data type choices for variables are fewer than parameters. Only string, boolean, and array are available. Unlike parameters, variables can be updated in the pipeline.

Step 3 – Add an “If Condition” Activity Container

We are going to add an “If Condition” activity container to the canvas and call it “If DefaultDateParams“.  The If Condition activity will evaluate a boolean expression and then perform one set of activities if the expression is TRUE, or a different set of activities if the expression is FALSE.

Step 4 – Configure “If DefaultDateParams”

Step 4a – Set “If” Expression
Our expression is going to check to see if a specific date has been specified for pDate1 or if it is still “yyyy-mm-dd”.

The expression would be:

@equals(pipeline().parameters.pDate1,’yyyy-mm-dd’)

Step 4b – Configure ‘True’ Activities
In our scenario, if the value for pDate1 is “yyyy-mm-dd” this is most likely a scheduled pipeline run, or a run that is only looking to grab the most recent full day’s data. vDate1 should be yesterday and vDate2 should be today, both in Hawaiian Standard Time.

We will add a “Set variable” activity for vDate1 as a “True” activity inside the “If DefaultDateParams” container and name it vDate1ToYday. The value will be the dynamic content expression:

@formatDateTime(subtractFromTime(convertTimeZone(utcnow(),’UTC’,’Hawaiian Standard Time’),1,’Day’),’yyyy-MM-dd’)

 

There is a lot to unpack there. Let’s start with formatDateTime(….,’yyyy-MM-dd’). The formatDateTime() conversion function takes a datetime value and converts it to a string format specified in the second part of the function. vDate1 is a string, so the expression needs to return a string representation of the date. Since the Big Kahuna Surfboards API expects date parameters in ISO 8601 format, that is what we convert it to.

You will notice that we specify the output format as ‘yyyy-MM-dd’ and we have the default for pDate1 and pDate2 as ‘yyyy-mm-dd’.  The capitalized ‘MM’ is correct. Lowercase ‘m’ means “minutes” and would not return the correct value. Since end users will most likely not see the expression inside the pipeline, but will see the default parameter when triggering a pipeline from the ADF front end, we decided to go with the more user-friendly default parameters where everything is lower case. The case for the default parameter does not matter since the default value will never be directly set to a variable.

Next, there is the datetime timezone conversion. Hawaiian Standard Time is 10 hours behind UTC, meaning that for over 40% of the day, the current UTC date is not the current date in Hawaiian Standard time. We cannot simply take the current UTC date and subtract 1 day to get yesterday. Big Kahuna Surfboards’ API runs off of a server in Hawaii and we do not have access to modify the API. We have to talk to it in Hawaiian dates. The convertTimeZone() function will convert a datetime from one timezone to another. This part of the function will convert the current UTC datetime to the current Hawaiian Standard Time datetime:
convertTimeZone(utcnow(),’UTC’,’Hawaiian Standard Time’)

However, we do not want the current date, we want yesterday’s date. The subtractFromTime() function takes a datetime and subtracts the specified periods. In this case subtractFromTime(<converted Hawaiian datetime>,1,’Day’).

In summary, we used the following functions to set vDate1 to yesterday’s Hawaiian date if a user-defined value was not defined for pDate1.

  • formatDateTime() – to return a string
  • convertTimeZone() – to get the current date in Hawaii Standard Time
  • subtractFromTime() – to arrive at yesterday by subtracting 1 day.

Now we will add another “Set variable” activity in the True pathway. We will name it vDate2TovDate1 and set vDate2 equal to the current Hawaiian date:

@formatDateTime(convertTimeZone(utcnow(),’UTC’,’Hawaiian Standard Time’),’yyyy-MM-dd’)

Step 4c – Configure ‘False’ Activities
Luckily, the ‘False‘ activities are more straightforward. In our scenario these ‘False‘ activities will be executed when an end user or external process is providing a date range. We just need to set the variables vDate1 and vDate2 to match their parameter counterparts pDate1 and pDate2.

We’ll place two Set variable activities in the “If” container for vDate1 and vDate2 respectively. The dynamic formula for the vDateTopDate1 activity is:
@formatDateTime(pipeline().parameters.pDate1,’yyyy-MM-dd’)

You could probably guess the formula for the vDate2TopDate2 activity, but you can check out the screenshot if you want the spoilers.

Since setting these values are not dependent on each other, we can just place them both inside the “If” container. There is no need to link them to each other or put one in front of the other.

At this point we have set up a pipeline that will take two parameters and set two variables based on the parameter values.

Step 5 – Plug the date values into the API

The Azure Data Factory configuration for retrieving the data from an API will vary from API to API. To keep things simple for this example, we will make a GET request using the Web activity and provide the date parameters vDate1 and vDate2 as request header values.

Wrapping Up

And there you have it. This pipeline can provide default dates to an API based on the current date, and take time-zones into account. The same pipeline can also call the API with a date range provided to the pipeline at runtime.

Azure Data Factory is simplifying data movement and we would be excited to talk to you about how Azure Data Factory could help your organization. Head over to our contact page and let us know what ideas you have that we could help you help you realize.

But we’re not done yet. A super-fantastic feature of Azure Data Factory is that all of the work done in the screenshots can be easily replicated, shared, deployed, and a lot of other fancy DevOps verbs, using JSON.

Check it out:

{
    "$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "factoryName": {
            "type": "string",
            "metadata": "Data Factory name"
        }
    },
    "variables": {
        "factoryId": "[concat('Microsoft.DataFactory/factories/', parameters('factoryName'))]"
    },
    "resources": [
        {
            "name": "[concat(parameters('factoryName'), '/BigKahunaHakamada')]",
            "type": "Microsoft.DataFactory/factories/pipelines",
            "apiVersion": "2018-06-01",
            "properties": {
                "activities": [
                    {
                        "name": "If DefaultDateParams",
                        "type": "IfCondition",
                        "dependsOn": [],
                        "userProperties": [],
                        "typeProperties": {
                            "expression": {
                                "value": "@equals(pipeline().parameters.pDate1,'yyyy-mm-dd')",
                                "type": "Expression"
                            },
                            "ifFalseActivities": [
                                {
                                    "name": "vDate1TopDate1",
                                    "type": "SetVariable",
                                    "dependsOn": [],
                                    "userProperties": [],
                                    "typeProperties": {
                                        "variableName": "vDate1",
                                        "value": {
                                            "value": "@formatDateTime(pipeline().parameters.pDate1,'yyyy-MM-dd')",
                                            "type": "Expression"
                                        }
                                    }
                                },
                                {
                                    "name": "vDate2TopDate2",
                                    "type": "SetVariable",
                                    "dependsOn": [],
                                    "userProperties": [],
                                    "typeProperties": {
                                        "variableName": "vDate2",
                                        "value": {
                                            "value": "@formatDateTime(pipeline().parameters.pDate2,'yyyy-MM-dd')",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            ],
                            "ifTrueActivities": [
                                {
                                    "name": "vDate1ToYday",
                                    "type": "SetVariable",
                                    "dependsOn": [],
                                    "userProperties": [],
                                    "typeProperties": {
                                        "variableName": "vDate1",
                                        "value": {
                                            "value": "@formatDateTime(subtractFromTime(convertTimeZone(utcnow(),'UTC','Hawaiian Standard Time'),1,'Day'),'yyyy-MM-dd')",
                                            "type": "Expression"
                                        }
                                    }
                                },
                                {
                                    "name": "vDate2ToToday",
                                    "type": "SetVariable",
                                    "dependsOn": [
                                        {
                                            "activity": "vDate1ToYday",
                                            "dependencyConditions": [
                                                "Succeeded"
                                            ]
                                        }
                                    ],
                                    "userProperties": [],
                                    "typeProperties": {
                                        "variableName": "vDate2",
                                        "value": {
                                            "value": "@formatDateTime(convertTimeZone(utcnow(),'UTC','Hawaiian Standard Time'),'yyyy-MM-dd')",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            ]
                        }
                    },
                    {
                        "name": "GetBigKahuna",
                        "type": "WebActivity",
                        "dependsOn": [
                            {
                                "activity": "If DefaultDateParams",
                                "dependencyConditions": [
                                    "Succeeded"
                                ]
                            }
                        ],
                        "policy": {
                            "timeout": "7.00:00:00",
                            "retry": 0,
                            "retryIntervalInSeconds": 30,
                            "secureOutput": false,
                            "secureInput": false
                        },
                        "userProperties": [],
                        "typeProperties": {
                            "url": "https://bigkahuna.com/api/surfboards",
                            "method": "GET",
                            "headers": {
                                "StartDate": {
                                    "value": "@variables('vDate1')",
                                    "type": "Expression"
                                },
                                "EndDate": {
                                    "value": "@variables('vDate2')",
                                    "type": "Expression"
                                }
                            }
                        }
                    }
                ],
                "parameters": {
                    "pDate1": {
                        "type": "string",
                        "defaultValue": "yyyy-mm-dd"
                    },
                    "pDate2": {
                        "type": "string",
                        "defaultValue": "yyyy-mm-dd"
                    }
                },
                "variables": {
                    "vDate1": {
                        "type": "String"
                    },
                    "vDate2": {
                        "type": "String"
                    }
                },
                "annotations": [],
                "lastPublishTime": "2020-08-28T22:17:17Z"
            },
            "dependsOn": []
        }
    ]
}

Import that JSON as an Azure Data Factory ARM Template and you will have the same sample pipeline in your own environment.

For more examples of real life data problems solved for fictionalized use cases, keep coming back to https://blog.mi4.com.