As more organizations move their business intelligence (BI) environments to the cloud, loading and accessing enterprise data will become as important as the front-end visualizations. Oracle’s BI Cloud Service (BICS) offers several options for those data requirements that go beyond simple data upload. Each has a specific purpose, features, benefits, and limitations. One of the more powerful options for true enterprise-level data transfer to the cloud is Oracle’s Data Sync tool.
Data Sync Overview
Data Sync provides a full-featured data transfer tool with a client interface that allows for scheduling load jobs that efficiently move data from flat files, database tables, and other cloud data sources into the BICS Database Schema Service or Oracle Database Cloud Service. It can also directly load data as a data set source for the Visual Analyzer projects that are available in BICS.
It includes many of the features found in other data loading tools: logging of load job execution steps, restarting after failures, incremental loading of new or modified data, and configuring the sequence of load operations. A Data Sync solution would more likely lend itself either to new development of data load processes or to a more agile analytics environment that allowed for changing processes and data models more rapidly than would be possible with an on-premise database.
Data Sync Configuration Steps
Data Sync’s primary function is to upload data into a BICS environment. Data can be loaded from flat files (CSV or XLSX), relational database sources (either tables, views, or SQL statements that are executed dynamically), Oracle Transactional BI (OTBI), JDBC data sources (e.g., MongoDB, Impala, SalesForce, RedShift), or the Oracle Service Cloud.
Uploaded data can be stored in cloud-based tables or data sets accessible to the data visualization components.
Data Sync can:
Load data sources other than Oracle in addition to data files or Oracle tables
Execute incremental data loads or rolling deletes and insert / append strategies
Merge data from multiple sources
Schedule data loads
Data Sync is installed on a local computer running either the Windows or Linux operating systems. Prior to installing Data Sync, ensure that Java Development Kit (JDK) 1.7 or later is installed on the local computer. It must be the JDK and not a JRE. It is also necessary to validate that the user account that will be used to access the BICS database schema has the proper permission. Work with your cloud administrator to request permission to upload data to Oracle BICS by assigning the BI Data Load Author application role to the account.
To upload data to a data set instead of a table, the BI Advanced Content Author application role should be assigned.
Download the Data Sync software from Oracle Technology Network. Currently located at: Data Sync Download on OTN
Unzip BICSDataSync.Zip to a local directory (no spaces in directory name)
Set the JAVA_HOME variable in config.bat or config.sh to point to JDK location
Copy database-specific JDBC drivers to Data Sync’s \lib directory
Data Sync is comprised of both a server component and a client GUI interface. To start Data Sync and its server component, run datasync.bat (Windows) or datasync.sh (Linux/UNIX) from the Data Sync installation directory. The Data Sync icon displays in the system icon tray to indicate that the server is up and running.
To access the client interface, click on the icon and choose Start UI to open the Data Sync client.
Click on the icon and choose Exit to stop the Data Sync server.
Data Sync Updates
Data Sync automatically checks against the cloud environment to see if a new version is available prior to each load job executing. It is possible that a new version of Data Sync has been installed in the cloud that is incompatible with the local version. If the versions match, the data load continues unaffected. If the minor version of the tool is changed in the cloud, it indicates a patch is available. This triggers a one-time alert and an optional email if the Data Sync server is configured for emailing notifications. If the version change indicates a major version release, an alert is created and an email sent. The data load job is stopped and will not run until the new version is installed.
New versions (minor or major) are simple to install by following the standard installation process into a new home directory and then by copying the existing environment configuration into the new installation.
Data Sync Terminology
Connection: Defines the data sources and target databases.
Project: A container which describes and organizes the information related to a data load. There can be any number of projects within a Data Sync instance.
Job: The mechanism for uploading all the data sources defined in a project to BICS.
When moving data to the cloud, a load strategy defines how that data is loaded from the data source into the target. Incremental data loading is available as long as there is a column on the source table which uniquely identifies each row (primary) as well as another column with a “Date/Timestamp” data type that can be used to identify which rows have been added or modified since the previous load. During the initial load job, the full source table is transmitted to the cloud. In subsequent loads, the last update date is compared to a stored timestamp for
Load strategy options include: