What Is Data Integration and Why Would I Do It?
The Digital Archaeological Record (tDAR) originated out of the attempt to solve a major research challenge in archaeology: how to synthesize systematically collected data recorded using different coding conventions across multiple datasets and sites. Over the years, while tDAR’s mission has evolved to also include preservation of and access to data, data integration has remained at the forefront.
tDAR has a built-in data integration tool that lets users combine disparate datasets into a single, new dataset. This is an excellent way to use multiple datasets in tDAR for larger-scale, comparative research.
Data Integration combines datasets using an Ontology. In tDAR, an Ontology is like a coding key that specifies how variables in different datasets are related. The Ontology is used to “translate” your multiple datasets into a single document. For more about Ontologies, visit “Creating a New Ontology.” For an example of the power of Ontologies and Data Integration in tDAR, see Neusius et al. 2019. When planning for Data Integration, you may also find it useful to look at Ontologies already uploaded to tDAR by searching for Ontologies related to the material types or kinds of datasets you want to integrate (e.g., lithics or fauna).
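As a rough illustration of the “coding key” idea, the sketch below maps the local codes of two hypothetical faunal datasets onto shared terms and stacks the translated rows into one table. The dataset contents, column names, and the `translate` helper are all invented for illustration; this is not tDAR’s implementation or file format.

```python
# Hypothetical example: two datasets record the same taxa with different codes.
site_a = [{"taxon_code": "1", "count": 12}, {"taxon_code": "2", "count": 3}]
site_b = [{"species": "ODVI", "nisp": 7}, {"species": "MEGA", "nisp": 5}]

# The "ontology" here is just a shared vocabulary; each dataset gets a coding
# key that maps its local codes onto that vocabulary (all values are made up).
key_a = {"1": "White-tailed deer", "2": "Wild turkey"}
key_b = {"ODVI": "White-tailed deer", "MEGA": "Wild turkey"}


def translate(rows, code_column, count_column, key, source):
    """Rewrite each row using the shared term instead of the local code."""
    return [
        {"source": source, "taxon": key[row[code_column]], "count": row[count_column]}
        for row in rows
    ]


# Stack the translated rows from both datasets into a single table.
integrated = (
    translate(site_a, "taxon_code", "count", key_a, "site_a")
    + translate(site_b, "species", "nisp", key_b, "site_b")
)

for row in integrated:
    print(row)
```

Once both datasets use the same vocabulary, rows from either source can be compared or counted together, which is the point of building the Ontology first.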
When your Data Integration is complete, your results can be downloaded and loaded into SAS, SPSS, or R for analysis.
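If you prefer a scripting language for a quick first look, a downloaded result could be summarized along the following lines. This is a minimal sketch that assumes the results are available as a CSV file with a header row; the column names and the `tally_column` helper are hypothetical, and an in-memory string stands in for the downloaded file.

```python
import csv
import io
from collections import Counter


def tally_column(csv_file, column):
    """Count how often each value occurs in one column of a CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Stand-in for a downloaded result file; the column names are hypothetical.
sample = io.StringIO(
    "source,taxon,count\n"
    "site_a,White-tailed deer,12\n"
    "site_b,White-tailed deer,7\n"
    "site_b,Wild turkey,5\n"
)

print(tally_column(sample, "taxon"))
```

With a real file, the same function can be given a file opened with `open(path, newline="")`, where `path` points to wherever you saved the download.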
Performing Data Integration
Prior to beginning the data integration, make sure that you have 1) selected the datasets you want to integrate, and 2) created an Ontology for the datasets. Once you have completed those steps, you are ready to begin.
To perform a data integration:
- Create or choose the datasets you wish to integrate.
- Go to the Integrate page…
  a. Click on the “Start Now” link.
  b. You should now be on a blank “Dataset Integration” page.
- Fill out the “Integration Name” and “Description” fields (any non-blank name is fine).
- Add the datasets you have chosen…
  a. Click on “Add Datasets…”
  b. Check the checkboxes for the two datasets you created or chose. (If they are not showing up, make sure they are not still in Draft status.)
- Add an integration column…
  a. Click on the button labeled “Add Integration Column”.
  b. Choose an ontology name (if you used the provided sample files, there should be one option here).
  c. After you choose the ontology, a table should appear below the form, containing a list of all the possible ontology values.
  d. At the top of the ontology list, beside the label “Select values that appear in…”, click on the “Any column” button.
- Add a display column…
  a. Click on the button labeled “Add Display Column”. A new tab should appear in the section below with two dropdown boxes.
  b. Choose a value for each dropdown (for example, the two sample datasets have columns named “name” and “game”, respectively).
- Click the blue “Integrate” button.
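To make the roles of the integration column and the display column more concrete, here is a conceptual model of the steps above. It is a sketch under stated assumptions, not tDAR’s code: it assumes an integration column is translated through each dataset’s coding key, that “Any column” selects every ontology value used by at least one dataset, and that a display column is carried over unchanged. All data and function names are hypothetical.

```python
# Conceptual model only; the names and data are hypothetical, not tDAR's API.
datasets = {
    "dataset_1": {
        "rows": [{"name": "Alpha", "type": "A"}, {"name": "Beta", "type": "B"}],
        "integration_column": "type",
        "display_column": "name",
        "coding_key": {"A": "Category 1", "B": "Category 2"},
    },
    "dataset_2": {
        "rows": [{"game": "Chess", "class": "x"}, {"game": "Go", "class": "z"}],
        "integration_column": "class",
        "display_column": "game",
        "coding_key": {"x": "Category 1", "z": "Category 3"},
    },
}


def values_in_any_column(datasets):
    """Ontology values used by at least one dataset (assumed "Any column" behavior)."""
    found = set()
    for ds in datasets.values():
        key = ds["coding_key"]
        found.update(key[row[ds["integration_column"]]] for row in ds["rows"])
    return found


def integrate(datasets, selected_values):
    """Build one table: a translated integration column plus a display column."""
    result = []
    for name, ds in datasets.items():
        for row in ds["rows"]:
            value = ds["coding_key"][row[ds["integration_column"]]]
            if value in selected_values:
                result.append({
                    "dataset": name,
                    "integrated_value": value,
                    "display": row[ds["display_column"]],  # copied as-is
                })
    return result


selected = values_in_any_column(datasets)
for row in integrate(datasets, selected):
    print(row)
```

In this model, narrowing `selected` to fewer values would drop the rows whose translated value is not selected, while the display values are never translated.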
Congratulations! You have integrated your data and are ready to begin analyzing the new dataset.