Share on facebook
Share on google
Share on twitter
Share on linkedin

Data Integration Best Practices

How to Avoid Painful Data Problems by Following these Best Practices

Many companies struggle to get visibility to their data but it doesn't have to be that hard. By following a few best practices, data integration can be easy to set up and manage, making it easy to get insights across all your systems.

Traceability

Most people think that you just get data from an API, stuff it into a database and then you are done. Voilà! Instant insights!
The reality is that data integration and aggregation is a little more complicated than that.
When we import data, we do several things in order to ensure that we can trace back the history of the data over time:

  • Job History: We store the history of what data was picked up from where and when. If we need to re-load the data because of an incomplete file, this makes it easy to delete bad data and reload.
  • Source Systems Logging: Many times data needs to be aggregated from multiple systems in for useful reporting. When we aggregate data, we also identify which system the data was sourced from, making to easy to see what systems are reporting what numbers.
  • File Archiving: Occasionally data gets corrupt and needs to be re-loaded from source systems. We store the original raw data from imported systems as files so we can recover the original data from the original source without problem.

Re-Runnability

We build import processes that can be re-run when necessary (either at the database level or at the day-level.) To do this we follow a few best-practices when importing data:

  • Stage Data: We stage the data before importing it into the destination table. This helps us verify the upload of the data was complete before adding it to existing tables in the database.
  • Insert vs. Update: Many times (depending on timing) data needs to be updated instead of inserted. We follow best-practices for identifying which records should be updated vs. inserted.
  • Raw Data vs. Aggregated Data: We always store a raw data table per api which is the 'source of truth.' From here data is aggregated and correlated into other tables for performance. This makes it easy to re-load aggregated data if failures happen along the way.

Logging

We built standardized logging at the database and application level in order to trace back history of every job. Some of the logs that we create are:

  • Application-Level Logging: We store logs at the application level that show the overall health of all integration processes.
  • Module-Level Logging: We create a log file per process that stores key updates as the process runs. This gives insights into errors that happen within each individual job.
  • Job-Level Logging: We create database log updates showing the run times, file process history and source and destination files making it easy to verify the entire file processed correctly and making it easy to compare job run times and file sizes over time.

Quality Checking

We create checks as part of our processes that ensure the quality of the data across calculated tables including:

  • Duplicate Checking: We check for duplicate inserts as part of our denormalization process to ensure that data isn't double-counted.
  • Total Checking: We verify that totals of key metrics match in all denormalized tables to ensure that calculations are correct.

Alerting

We create alerts via channels like SMS, Slack and email when errors occur, making sure that problems can be quickly identified and resolved.

Interested in Data Integration?

Schedule a free consultation to learn more about our data integration services and platform.
Share on facebook
Share on google
Share on twitter
Share on linkedin

More to explorer

AI

How to Predict Customer Behavior using AI

Predicting customer behavior can be easy if you are analyzing your customers and leads. Using advanced segmentation, you can predict how your customer will respond in a number of scenarios including churn, offers, upsells and more.

Read More »
AI

Becoming AI-Driven in 2020

AI is still in the very early stages of adoption at most companies or a ‘wish list’ item for others.

For even more companies it’s a source of confusion – what is AI, what does it do, how can I leverage it for my business, and how can I prepare?

Read More »

This website uses cookies to monitor and measure traffic and optimize our site.

Free Book

The AI Revolution
The Future of Profit

Enter your email to receive the link instantly.

You'll also get a free copy of our book Marketing AI: 90 Days to AI-Driven once it's released.

We might send you a product update every once in a while, but we promise not to spam you.