How to Avoid Painful Data Problems by Following These Best Practices
Many companies struggle to get visibility into their data, but it doesn't have to be that hard. By following a few best practices, data integration becomes easy to set up and manage, giving you insights across all your systems.
Traceability
Most people think that you just get data from an API, stuff it into a database and then you are done. Voilà! Instant insights!
The reality is that data integration and aggregation is a little more complicated than that.
When we import data, we do several things in order to ensure that we can trace back the history of the data over time:
- Job History: We store a history of what data was picked up, from where, and when. If we need to re-load the data because of an incomplete file, this makes it easy to delete the bad data and re-load it.
- Source Systems Logging: Data often needs to be aggregated from multiple systems for useful reporting. When we aggregate data, we also record which system each record was sourced from, making it easy to see which systems are reporting which numbers.
- File Archiving: Occasionally data gets corrupted and needs to be re-loaded from source systems. We archive the original raw data from imported systems as files so we can recover it from the original source without issue.
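The job-history idea above can be sketched with a small table. This is a minimal illustration using SQLite; the table and column names are assumptions for the example, not a fixed schema.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical job_history table: one row per file picked up from a source.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE job_history (
        job_id    INTEGER PRIMARY KEY,
        source    TEXT NOT NULL,  -- which system the data came from
        file_name TEXT NOT NULL,  -- which file was picked up
        loaded_at TEXT NOT NULL,  -- when it was loaded (UTC)
        status    TEXT NOT NULL   -- 'complete' or 'incomplete'
    )
""")

def record_job(source, file_name, status):
    # Record what was picked up, from where, and when.
    conn.execute(
        "INSERT INTO job_history (source, file_name, loaded_at, status) "
        "VALUES (?, ?, ?, ?)",
        (source, file_name, datetime.now(timezone.utc).isoformat(), status),
    )

record_job("billing_api", "invoices_2024-01-01.csv", "complete")
record_job("billing_api", "invoices_2024-01-02.csv", "incomplete")

# An incomplete load is easy to find, so its rows can be deleted and re-loaded.
bad = conn.execute(
    "SELECT file_name FROM job_history WHERE status = 'incomplete'"
).fetchall()
```

With this history in place, the "delete bad data and reload" step becomes a query against `job_history` rather than guesswork.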
Re-Runnability
We build import processes that can be re-run when necessary (either at the database level or at the day level). To do this we follow a few best practices when importing data:
- Stage Data: We stage the data before importing it into the destination table. This helps us verify that the upload was complete before adding the data to existing tables in the database.
- Insert vs. Update: Depending on timing, data often needs to be updated rather than inserted. We follow best practices for identifying which records should be updated vs. inserted.
- Raw Data vs. Aggregated Data: We always store a raw data table per API that serves as the 'source of truth.' From there, data is aggregated and correlated into other tables for performance. This makes it easy to re-load aggregated data if failures happen along the way.
Logging
We build standardized logging at the database and application level so we can trace back the history of every job. Some of the logs we create are:
- Application-Level Logging: We store logs at the application level that show the overall health of all integration processes.
- Module-Level Logging: We create a log file per process that stores key updates as the process runs. This gives insights into errors that happen within each individual job.
- Job-Level Logging: We write database log records showing run times, file processing history, and source and destination files. This makes it easy to verify that each file processed completely, and to compare job run times and file sizes over time.
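The module-level logging described above (one log per process) can be sketched with Python's standard logging library. The process name and messages here are made up for the example.

```python
import logging

def get_process_logger(process_name, log_path=None):
    # One named logger per integration process; in production each would
    # write to its own file (pass log_path), here we fall back to stderr.
    logger = logging.getLogger(process_name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on re-use
        handler = (logging.FileHandler(log_path) if log_path
                   else logging.StreamHandler())
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(name)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger

log = get_process_logger("billing_import")
log.info("picked up invoices_2024-01-02.csv")
log.error("row 511: missing customer_id, skipping")
```

Each process gets its own named logger (and its own file), so errors within an individual job are easy to isolate.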
Quality Checking
We build checks into our processes to ensure the quality of the data across calculated tables, including:
- Duplicate Checking: We check for duplicate inserts as part of our denormalization process to ensure that data isn't double-counted.
- Total Checking: We verify that totals of key metrics match in all denormalized tables to ensure that calculations are correct.
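Both checks above can be sketched as simple comparisons. This example treats rows as plain dicts for clarity; in practice these checks would typically run as SQL against the warehouse, and the field names are assumptions.

```python
raw = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": 25.0},
]
denormalized = [
    {"order_id": 1, "amount": 10.0, "region": "EU"},
    {"order_id": 2, "amount": 25.0, "region": "US"},
]

def check_duplicates(rows, key):
    # No key value may appear twice, or the data gets double-counted.
    ids = [r[key] for r in rows]
    return len(ids) == len(set(ids))

def check_totals(source_rows, derived_rows, metric):
    # Key-metric totals must match between the raw and denormalized tables.
    return (sum(r[metric] for r in source_rows)
            == sum(r[metric] for r in derived_rows))

assert check_duplicates(denormalized, "order_id")
assert check_totals(raw, denormalized, "amount")
```

If either check fails, the job can stop before the bad numbers ever reach a report.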
Alerting
We create alerts via channels like SMS, Slack and email when errors occur, making sure that problems can be quickly identified and resolved.
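As one sketch of the Slack channel, an alert can be posted to a Slack incoming webhook with the standard library alone. The webhook URL below is a placeholder, and the pluggable `opener` parameter is an assumption added for testability; email or SMS would follow the same shape with a different sender.

```python
import json
import urllib.request

def format_alert(job, error):
    # Slack incoming webhooks accept a JSON body with a "text" field.
    return {"text": f":rotating_light: job '{job}' failed: {error}"}

def send_slack_alert(payload, webhook_url, opener=urllib.request.urlopen):
    # POST the JSON payload; `opener` is injectable so this can be tested
    # without a network call.
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return opener(req)
```

Wiring this into the logging layer (fire an alert whenever a job logs an error) is what turns the logs above into problems that get noticed and resolved quickly.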