There are multiple methods that can be used to combine or append two or more datasets in SAS. In this article you’ll learn 5 simple ways to combine and append SAS datasets as follows.
We will explore each of these methods to combine and append datasets in SAS using the following sample data sets work.University1 and work.University2
/* create sample datasets */ data University1; input student $ grade $; datalines; Jackobe A Kjell B Morten A Steven C Peter A ; proc print; title 'Sample Dataset: work.University1'; run; data University2; input student $ grade $; datalines; Anna B Jan A Nils A Kristin C Michael A ; proc print; title 'Sample Dataset: work.University2'; run;
You can append data vertically using the SET statement in the data step. It reads an observation from one or more SAS data sets. You can also say, it stacks the multiple datasets one after another.
The best part is you can combine or append many datasets by listening after the SET statement.
In this example two sample datasets University1 and University2 will be stacked vertically and created a new dataset named work.CombinedUniversity
/* Appending Or Concatenating Two Datasets Vertically - Stacking the datasets */ data CombinedUniversity; set University1 University2; run; /* view dataset */ proc print data=CombinedUniversity; title 'CombinedUniversity Dataset'; run;
This method is quite similar to the previous one. The only minor difference here is it stack the rows after sorting the dataset. It means rows from multiple datasets will append together but after sorting it out on a specified BY variable in the data step.
The BY statement controls the operation of a SET, MERGE, MODIFY, or UPDATE statement in the DATA step and sets up special grouping variables.
Prerequisite: Before you use BY statement to combine or append multiple datasets with SET statement, all of your input datasets must be sorted on BY variable.
In this example two input datasets University1 and University2 must be sorted on BY variable “student” using the PROC SORT procedure.
/* sorting datasets on BY variable */ proc sort data=University1; by student; run; proc sort data=University2; by student; run; /* Appending Or Concatenating Stacked Sorted Data Values */ data CombinedUniversity; set University1 University2; by student; run; /* view dataset */ proc print data=CombinedUniversity; run;
If you compare the output of this query with the previous method, you can clearly see the sequence of the rows has changed due to the sorted stacked method used to combine and append datasets.
This is a very popular SAS procedure when it comes to appending the two datasets. The only drawback with this method is, it can only appends two datasets at a time. Whereas first methods can combine and append two or more dataset at the same time.
You could append more than two datasets using the proc append method but you have to write this procedure multiple times in order to cover all the datasets which you want to append.
In the proc append procedure base=dataset is the master dataset and data=dataset is the dataset which is to be appended to the master dataset.
If the master-dataset doesn’t exist then it creates a new dataset and appends the data. In this example work.CombinedUniversity dataset does not exist in the beginning so when the first proc append procedure runs it creates work.CombinedUniversity dataset and appends work.University1 dataset.
The second proc append procedure appends the work.University2 dataset to the CombinedUniversity dataset.
/* PROC APPEND To Concatenate Datasets */ proc append base=CombinedUniversity data=University1; run; proc append base=CombinedUniversity data=University2; run; /* view dataset */ proc print data=CombinedUniversity; run;
Did you know? How to use the PROC APPEND in 10 different ways.
This is a slightly different version of the proc append procedure. With the PROC DATASETS procedure you can use the APPEND statement to append multiple datasets.
Here also for the first time if base=dataset doesn’t exist then it creates a new dataset and appends data from data=dataset. You must write the multiple append statement to append more than one datasets.
/* PROC DATASETS To Concatenate Datasets */ proc datasets lib=work memtype=data; append base=CombinedUniversity data=University1; append base=CombinedUniversity data=University2; run; quit; /* view dataset */ proc print data=CombinedUniversity; title 'CombinedUniversity Dataset'; run;
You can use the PROC SQL method to combine and append multiple datasets in SAS. There are two simple methods which can be used to combine and append SAS datasets using proc sql procedure.
1. PROC SQL With UNION ALL option
You can select all the observations from two or more datasets and combine them together using the UNION ALL option.
The UNION option processes unique rows from both the queries. But you can preserve duplicate rows as well using ALL keyword.
The optional ALL keyword preserves the duplicate rows , reduces the execution by one step, and thereby improves the query-expression’s performance.
/* PROC SQL To Concatenate Datasets: using union all */ proc sql; create table combineduniversity as select * from University1 union all select * from University2; quit; proc print; title 'A UNION ALL B'; run;
2. PROC SQL With OUTER UNION Corr Option
The OUTER UNION option concatenates the query results. The CORR which is also known as CORRESPONDING option causes PROC SQL to match the columns in table-expressions by name and not by ordinal position.
/* PROC SQL To Concatenate Datasets: using outer union corr */ proc sql; create table CombinedUniversity as select * from University1 outer union corR select * from University2 ; quit; proc print; title ' A OUTER UNION CORR B'; run;