Data Wrangling in Oracle Analytics Cloud (OAC) Using Data Flows
As the need for instant reporting has started to take shape, Oracle Analytics Cloud (OAC) provides end users the ability to manipulate data sets to make them available for visualizations. In OAC, the user now can create a data flow. A data flow is OAC’s user-friendly version of a data transformation tool. Although this provides a simple interface, it still offers considerable functionality to cleanse and transform data:
To create a data flow using the “Oracle DV Desktop” tool, go to the top right corner of the home page and click the “Create” button. From there, select the “Data Flow” button. It will then prompt the user to select a data source to start with, and can begin adding steps to transform the data:
The user is not limited to only Microsoft Excel workbooks as a data set. Oracle provides all of the following connections below to begin a transformation:
Once the user has selected the data set to start with, they can create row-based and column-based manipulations of the data set. Below are all of the steps that can be used in the “Data Flows” tool:
One feature that is pivotal in this solution is column selection. There will be times where columns are included in a data set that are not of any use to the end user. This feature allows the user to select only the columns that are useful to the discovery of data that interests them. Add the “Select Columns” step to the data flow, and then add only the columns necessary on the right side of the screen:
Another key capability is filtering the data set. Sometimes there will be outliers in data sets that are not necessary for reporting needs. To filter, select the “Add Step” button and select the “Filter” option. From there, select the column(s) that should be filtered and add the filter parameters.
Data flows allow the user to create a binned column based off an existing measure value with up to 99 different segments. To create a binning column, click the “Add New Step” button. From there select the “Bin” option and the column to bin.
Three options for setting the bin ranges are demonstrated below:
Manual: The user defines where the binning makes the most sense to be done. The user also gets to name the bins to their liking.
Equal Width: Takes the highest and lowest measure value, and then bins equally by how many the user defines.
Equal Height: Takes the total amount and makes an equal amount of values in each bin.
The results of the complete data flow can either be saved locally in Oracle Data Visualization (DV) Desktop or stored in a database using a previously created connection. When saving to a database, the “USERID” in the connection configuration must have “CREATE TABLE” and “INSERT” privileges granted. The database can be either an on-premises or cloud implementation in Oracle Database-as-a-Service (DBaaS):
There are two options for getting the data flow to run at any time. The user can either execute or run the data flow. Executing the data flow is done via the “Editing the Data Flow” functionality. This is the easiest option if the user has made changes or has recently created the new data set and would like to see its new output. Running the data flow can be done at any time via the “Data” tab in the DV Desktop instance. The user can run a selected data flow by right-clicking on it and selecting “Run.” This option may be more useful if there are multiple data flows that need to be run.
This post describes some of the more commonly-used steps for data manipulation that are available in Data Flows. There are much more advanced steps like “Analyzing Sentiment” and “Time-Series Forecasting” that puts Data Flows into the next level of “on the go” data manipulation tools.
For a more in-depth look at the Oracle OAC Data Flows option, follow this link** to a recently recorded Performance Architects webinar that includes a demo of DV Data Flows in action.
**Please note, a valid Performance Architects Learning Center account is needed to view the webinar recording mentioned in this blog post.