These pages should (1) get students started using SAS, and (2) address the idiosyncrasies of our system.
The computer manuals you should be using (and if you want to buy some, be careful: Student Stores still has manuals from Release 6.03 for sale) are: SAS/STAT User's Guide, Version 6, Fourth Edition, Volumes 1 and 2, and SAS/STAT Software: Changes and Enhancements through Release 6.12 for the statistics packages; SAS/GRAPH Software, Version 6, First Edition, Volumes 1 and 2 for the graphics packages; SAS/IML User's Guide for the matrix language; and we don't have an INSIGHT manual in the department (but it's pretty straightforward). Personally, I just use the Help screens for figuring out the syntax. Also, the SAS/Language and SAS/Procedures guides may come in handy.
Oh yeah, Dan used to be a SAS-user---he may remember something.
Okay, the easy stuff. To start the display-manager system you can either
This brings up several SAS windows: the Program Editor, the LOG window, and the OUTPUT window. You write and submit programs from the Program Editor, check the error messages in the LOG window, and view your results in the OUTPUT window. To exit the window system, choose FILE:EXIT from the Program Editor window.
If you wish to run a program in the background (overnight, say), then the command to enter from a shell is:
nice sas filename.ext &
nice lowers the job's priority so other people can still use the machine, filename.ext is the name of the program you wish to run, and the ampersand runs the job in the background and returns control to your shell. If you want to stop this job, bring up a shell (on the same machine you submitted the program from, e.g., Savage), enter ps -x, find the PID for "sas filename.ext" (pretend it's 4321), and enter kill 4321
NOTE: do not use kill -9 4321, because then SAS gets no chance to clean up after itself and leaves garbage (its temporary work files) in the swap space.
The pulldown menus do the usual thing (don't be afraid to browse around--at worst you'll try to launch a slow-loading program).
The "00001", "00002", ... on the left-hand side of the editor is the line-command area. To learn how to use these, go into the help screens (pull-down Help: Extended Help: SAS Language: Text Editor) and look up the Line Commands. For example, to insert a blank line before a line of code, type IB in the numbered area and hit the Return key. Use these commands to copy, delete, split text, join text, indent, unindent,...
Use the "Insert" key on the keyboard to toggle insert/overprint.
Your first programs (in 105, anyway) will probably use PROC REG, the regression package. When you become more familiar with regression and SAS, you will probably move to PROC GLM.
The easiest way to learn a language is to jump right in; here's a simple regression program:
data one;
  input y x1 x2;
  x1x2=x1*x2;
  cards;
8 1 3
7 3 5
4 6 3
1 7 4
;
proc print data=one;
proc reg data=one;
  model y=x1 x2 x1x2;
run;
SAS consists of DATA steps and PROC steps--do all your data entry and manipulation within the DATA step and do your analyses with PROC steps. Here we create a data set called ONE (this is only available in the current session--we'll talk about reading in external files in a minute); the CARDS statement (a bit archaic--they want you to use DATALINES now) says that the following lines, up to the line containing the semicolon, are the data. You have to create interaction terms in the DATA step for use in PROC REG (GLM can create interaction and nesting terms on the fly), hence the x1x2 line (* = multiplication, of course). The RUN statement tells SAS to execute what I've typed so far. SAS automatically executes a step whenever it sees a new DATA or PROC statement, but since nothing follows the PROC REG step, we have to tell it explicitly to run the regression statements. The PROC PRINT statement simply prints the data set to the OUTPUT window.
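For comparison, here is a sketch of the same program written the way the newer documentation prefers: DATALINES instead of CARDS, one observation per data line, and an explicit RUN after every step so nothing depends on SAS noticing the next step boundary. The data set and variable names are the ones from the example above. (With four observations and four parameters there are no error degrees of freedom, so treat the REG output as a syntax check rather than a real analysis.)

```sas
/* Same regression as above, using DATALINES and explicit RUN statements */
data one;
  input y x1 x2;
  x1x2 = x1*x2;     /* PROC REG needs the interaction built in the DATA step */
  datalines;
8 1 3
7 3 5
4 6 3
1 7 4
;
run;

proc print data=one;
run;

proc reg data=one;
  model y = x1 x2 x1x2;
run;
quit;             /* PROC REG is interactive; QUIT ends it */
```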
To submit a job you have four choices (at least):
To submit PART of a job (say you only want to submit the first 10 lines of code in your Program Editor)
Don't like using "Ctrl F" and "Ctrl H"? Redefine the Keys, say to "Ctrl S" and "Ctrl D" (whatever, it's your set-up).
To recall a previously run program, either
To open a saved program, either
The output printed in the OUTPUT window is displayed for viewing on a computer--to have it set for printing (page breaks, width of the page) use the pulldown menus from the Program Editor: Globals: Options: Global options and set LINESIZE to 76 or 80, and PAGESIZE to 60 or 66 or... While we're here, you can use the radio buttons to turn off the DATE that gets printed on the output and turn off the NUMBERing of the pages.
By the way, if you just print from the windows without thinking about what you're doing, you'll probably print 1000 pages with only 15 lines of output per page. Either clear the OUTPUT window before your final run, or just output the information to an ASCII file and fix it up first.
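The same page settings can be made in code with the LINESIZE=, PAGESIZE=, NODATE and NONUMBER system options, and PROC PRINTTO sends procedure output straight to an ASCII file so you can fix it up before printing. A sketch; the file name results.lst is a placeholder:

```sas
/* Page dimensions for printing; drop the date and page numbers */
options linesize=80 pagesize=60 nodate nonumber;

/* Route procedure output to a file (NEW overwrites instead of appending) */
proc printto print='~/path/results.lst' new;
run;

proc reg data=one;
  model y = x1 x2 x1x2;
run;
quit;

/* PROC PRINTTO with no options sends output back to the OUTPUT window */
proc printto;
run;
```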
PROC REG prints high-resolution graphics (well, experimentally in v6.11 but it's full-production in v6.12). This includes quantile-quantile plots (which were a pain to deal with before they added these options). To get graphics to display, run:
proc reg data=one graphics;
  model y=x1 x2 x1x2;
  plot p.*r. nqq.*r.;
run;
This creates predicted by residual plots and q-q plots for the residuals. A SAS/GRAPH window will be launched when you run this. If you want to save the image as a GIF file, go ahead and use the pull-down menus (I think it's, from the Graphics window, FILE:Export), HOWEVER if you want to save as a postscript file, READ THE GRAPHICS SECTION, and DO NOT SAVE FROM THIS WINDOW!!
Oh yeah, reading from/writing to external files. If your data is stored in ~/path/file.dat and you want to massage the data and save to ~/path2/file.out, submit something like:
data one;
  infile '~/path/file.dat';
  input y x1 x2;
  z=y-x1;
  file '~/path2/file.out';
  put z 5.2;
run;
INFILE names the file to read, you still have to tell SAS what's in the file (but now you don't need that CARDS statement), FILE names the file to write to, PUT prints Z to the output file, and 5.2 says to print it as a number with 5 places, 2 of which are to the right of the decimal point (leaving 2 for the left of the decimal point and 1 for the decimal point). For more on the DATA step and input and output formats, check out the SAS Language manual (it's around here somewhere), OR use the Help screens.
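To write several variables per line in fixed columns, give each one its own w.d format and use @n column pointers. A sketch that reads data set ONE and writes y, x1 and x2 in a 6.2 format (6 characters wide, 2 decimals) starting at columns 1, 8 and 15; the output path is a placeholder:

```sas
/* Write three columns in fixed positions, one observation per line */
data _null_;
  set one;
  file '~/path2/columns.out';          /* placeholder output path */
  put @1 y 6.2 @8 x1 6.2 @15 x2 6.2;   /* @n = start writing at column n */
run;
```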
The PROCs also can create data sets, and these can be written out to a flat file. For example,
proc reg data=one outest=ests;
  model y=x1;
  output out=out1 p=p r=r;
proc print data=out1;
proc print data=ests;
filename myout '~/path/filename.ext';
data _null_;
  set ests;
  file myout;
  put intercept x1;
run;
OUTEST=ESTS sends the parameter estimates (one observation per model) to the SAS data set ESTS, and the OUTPUT statement in REG sends a different data set, containing the predicted values (P=) and residuals (R=), to the SAS data set OUT1. The DATA step then writes ESTS to a flat file: _NULL_ means that no SAS data set is created, FILENAME associates the fileref MYOUT with the file, and FILE MYOUT opens it for writing.
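The OUTPUT data set can be written out the same way. A sketch that writes the observed values, predicted values and residuals from OUT1 (the names p and r come from the OUTPUT statement above) to a flat file; the fileref RESOUT and the path are placeholders:

```sas
filename resout '~/path/resid.dat';   /* placeholder path */

data _null_;
  set out1;
  file resout;
  put y 8.3 +1 p 8.3 +1 r 8.3;        /* observed, predicted, residual; +1 skips a column */
run;
```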
That's enuf! Read the manuals for anything else. Oh, REG is described in Volume 2 and REG Graphics is in the C&E manual.
We already mentioned running graphics through PROC REG; you can also create graphics using special procedures. For example:
proc gplot data=one; plot y*x1; run;
That's all there is to plotting points. Of course everybody wants to do more, so read the SAS/GRAPH manuals for manipulating legends, horizontal and vertical axis labels and formatting, notes, lines, splines, Greek text, making 3D plots...I'll be writing about how to create PostScript images, and that's all. Oh, and printing multiple plots on a page, which is not easy to do with SAS.
It has come to my attention (thanks, Mandy) that the SYMBOLn statements will not be honored in a graph unless you include a color specification:
symbol1 v=dot c=black;
I never noticed. Also, if you want to create black-and-white images with different symbols instead of the default different colors, you can try making your graphics after submitting
goptions colors=(black);
This will force SAS to use only black characters, and it will cycle through the plus, x, circle, square,... symbols (in some order)--see the graphics manual (Volume 1) for the different symbols you can use.
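Putting the SYMBOL and COLORS= advice together, here is a sketch of a black-and-white overlay plot that tells the two series apart by plot symbol only. It assumes data set ONE from the first example; the symbol choices are arbitrary:

```sas
goptions colors=(black);

/* SYMBOL1 goes with the first plot request, SYMBOL2 with the second */
symbol1 v=dot    c=black;
symbol2 v=circle c=black;

proc gplot data=one;
  plot y*x1 y*x2 / overlay;   /* both pairs drawn on one set of axes */
run;
quit;
```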
If you save Postscript images from the SAS/GRAPH window, your files will be HUGE! Run the following script (or something like it) before you run your program in order to create reasonably sized PS files:
goptions reset=all dev=psepsf gaccess=sasgaedt gsfname=gsasfile gsfmode=replace
         hsize=3 in vsize=2.5 in htext=3.0 pct htitle=3.5 pct
         vorigin=0 in horigin=0 in rotate=portrait
         noprompt nodisplay border lfactor=1
         ftitle=swiss ftext=swiss colors=(black) cback='CXFFFFFF';
filename gsasfile "~/path/imagename.eps";
GOPTIONS means graphics options.
DEV= names the device driver--in this case, an encapsulated PostScript printer. I also use psll for PostScript images and imggif for GIF images.
GSFNAME= matches the fileref in the FILENAME statement.
GSFMODE= either replaces the file or appends to the end of it.
HSIZE= and VSIZE= give the dimensions of the saved image.
HTEXT= and HTITLE= give the relative sizes of any text and titles included in the image.
VORIGIN= and HORIGIN= determine the position of the image on the page (0,0 works for inclusion in LaTeX documents).
ROTATE= gives landscape or portrait images.
NODISPLAY will not attempt to launch a GRAPH window.
BORDER draws a line around the image (NOBORDER undoes this).
LFACTOR= sets the width of any drawn lines.
FTITLE= and FTEXT= name the fonts used for the title and the text.
COLORS= is set to black for a black/white image (shades of gray are fine--check out the graphics manuals...I think gray, ligr, and medgr are three valid color names. However, to print grays nicely on a PostScript printer from LaTeX, you should set the resolution to 600 dpi both on the printer and in the dvips options [this last sentence was for LaTeX users only]).
CBACK='CXFFFFFF' sets the background to a true white (the GRAPH window is off-white, so anything you print from there--DON'T--will have a grayish background).
After running that script (save it somewhere!) the command
proc gplot data=one;
  plot y*x1;
run;
will print the plot to the file ~/path/imagename.eps. All following images will ALSO be written there, so make sure to change the names. If you want to remap the GSFNAME=GSASFILE to a different file, the manual claims
filename gsasfile clear;
filename gsasfile "~/path/imagename2.eps";
will do the trick, but it never seems to work for me. I usually just copy the images to wherever I want them to be. To stop printing to a PostScript file and display on the SAS/GRAPH window again, submit
goptions reset=all;
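Since clearing and redefining GSASFILE is unreliable, one workaround (my suggestion, not the manual's recipe) is to give each image its own fileref and point GSFNAME= at it before each plot. GOPTIONS changes only the options you name, so the rest of the PostScript settings stay in effect. This assumes you have run the long GOPTIONS script above and not yet reset it; the filerefs FIG1/FIG2 and the file names are placeholders:

```sas
/* One fileref per image */
filename fig1 "~/path/imagename1.eps";
filename fig2 "~/path/imagename2.eps";

goptions gsfname=fig1;      /* next plot goes to imagename1.eps */
proc gplot data=one;
  plot y*x1;
run;
quit;

goptions gsfname=fig2;      /* next plot goes to imagename2.eps */
proc gplot data=one;
  plot y*x2;
run;
quit;
```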
AND, to launch a Ghostview of your postscript image right from SAS, just submit (this should work outside of DATA and PROC steps, and from inside IML)
x ghostview ~/path/imagename.eps ;
The SAS/GRAPH window that is automatically displayed is actually a view into the WORK.GSEG (default) graphics catalog. A catalog is just a place to store lots of images, output, programs,... The WORK prefix says the catalog will be destroyed at the end of the session; any other prefix will save a catalog. For example, if you save an image to a SASUSER.IMAGES catalog, there will be a file called ~/sasuser/images.ssct001 (or something like this). How do you save images to special catalogs? Easy; for the previous gplot example, you could submit
proc gplot data=one gout=sasuser.images;
  plot y*x1;
run;
You can save PROC REG images similarly. One word of warning: these catalogs can get huge--and since I strongly suggest you DON'T print from them there's no reason to use them, unless you want to print multiple images on a single page (in which case, use the WORK catalog; e.g. replace "sasuser.images" in the above code with "work.images").
To view all the catalogs and their contents, use the pull-down menus from the Program Editor: Globals: Access: Display libraries. Double clicking on the underline will open a catalog, typing 'D' on the line will delete that entry. To kill the window, either use the pulldown Edit: End or define a button on your toolbox.
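The same housekeeping can be done in code. A sketch, assuming your graphs were saved to WORK.IMAGES as in the GOUT= example above: PROC CATALOG lists the entries, and GREPLAY's DELETE statement removes them.

```sas
/* List the entries (saved graphs) in the catalog */
proc catalog catalog=work.images;
  contents;
run;
quit;

/* Delete every graph in the catalog; NOFS = no full-screen window */
proc greplay igout=work.images nofs;
  delete _all_;
run;
quit;
```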
I just made a set of gifs for fun, and it seemed like a decent topic to add to this file.
I have a data set consisting of x-y coordinates and functions f(x,y), fmin(x,y), fmax(x,y), and fest(x,y). I could create a bunch of 3d plots and try to visually compare them, but instead I want to take slices through the surface at various x-values, and display the cross-sectioned f*y plots. AND, I want these slices saved as a bunch of gif images, so I can use the gifmerge utility to make a slide-show. (Okay, messing with gifmerge is a bit low-tech these days---just humor me.) In 6.12, we use the IMGGIF device to make gifs, and we'll use macros to make the appropriate slices. I think the later versions of 6.12 have a GIF device, but this is not available at the STAT department yet (I'm trying to get Anne to reinstall SAS, since we have that nasty PostScript problem now).
So, use the same goptions you used to create PostScript images, but change a few of the parameters:
goptions dev=imggif
hsize=5 in
vsize=4 in
rotate=landscape
htext=3.5 pct
htitle=4 pct;
Add whatever axis and symbol commands you want to use:
axis1 order=0 to .025 by 0.005;
legend1 label=none;
symbol1 c=black i=join;
symbol2 c=red i=join;
symbol3 c=green i=join;
symbol4 c=blue i=join;
And a macro that does the following: it steps through the x-values (from -20 to 20, the range of x in my example); picks a prefix for the GIF file name (the first nine files get an extra '0' in the prefix, so that when I run gifmerge all the images show up in the proper order); resets the output file name; and plots the data with a WHERE statement that selects just the slice x=x-value. The QUIT statement ends GPLOT so that the file name can be reset for the next plot. After the loop finishes, the X command runs UNIX shell commands from SAS, so I run gifmerge and then clean the other GIFs out of my directory (actually, I didn't try this, but it should work in principle). Note that if gifmerge does not work, you may have to run your GIFs (after recreating them) through xv, resave them as GIFs, and then run gifmerge again. Trust me, it works. Now that I think about it, I know SAS tends to add blank lines all over PostScript files; maybe it adds them to GIF images too, and gifmerge chokes on them?
%macro plotem;
%do x=-20 %to 20 %by 1;
%let j=%eval(&x+21);
%if %eval(&j<10) %then %let prefix = graf0;
%else %let prefix = graf;
filename gsasfile "/home/Mickey/students/derr/&prefix&j..gif";
proc gplot data=all2;
where x=&x;
plot (f fmin fmax fest)*y / vaxis=axis1 overlay legend=legend1;
run;
quit;
%end;
x gifmerge &prefix.*.gif > merged.gif;
x rm -f &prefix.*.gif;
%mend;
%plotem;
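If your release supports the %SYSFUNC macro function (an assumption; check before relying on it), the Z2. format gives zero-padded numbers directly and avoids the two-prefix trick. A sketch that only writes the file names it would use to the LOG; the macro name NAMES is made up for the example:

```sas
%macro names;
  %do x=-20 %to 20;
    /* Z2. writes an integer in two columns with a leading zero: 1 -> 01 */
    %let j=%sysfunc(putn(%eval(&x+21), z2.));
    %put graf&j..gif;
  %end;
%mend names;
%names
```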
May I first suggest creating a bunch of postscript images, then using LaTeX to place them on a page?
If you just want to plot, say, all possible pairs of 5 variables, then consider using INSIGHT. It's very user-friendly...
Otherwise, it's time to talk about PROC GREPLAY (this is NOT user-friendly)
Suppose you want to print 6 graphs on one page, in 3 rows and 2 columns. You first create your graphs and save them to a SAS catalog. Then place the six plots in the proper positions by replaying them into a template created with TDEF. To create a template, just describe where the corners of each plot are to be placed ("llx" = "lower-left x-value", in percent of the page). The template here is named T3L3R, with the description "three left three right".
Here's the program that accomplishes this; explanations follow.
%let gout=gout=work.gseg;
filename gsasfile '/u/sasred/unc/grafjunk';
goptions dev=psll gaccess=sasgaedt gsfname=gsasfile gsfmode=append rotate=landscape;
proc greplay nofs tc=work.tempcat igout=work.bin &gout;
  tdef t3l3r des='three left three right'
    1/llx=0  lly=67 ulx=0  uly=100 lrx=50  lry=67 urx=50  ury=100
    2/llx=50 lly=67 ulx=50 uly=100 lrx=100 lry=67 urx=100 ury=100
    3/llx=0  lly=34 ulx=0  uly=67  lrx=50  lry=34 urx=50  ury=67
    4/llx=50 lly=34 ulx=50 uly=67  lrx=100 lry=34 urx=100 ury=67
    5/llx=0  lly=0  ulx=0  uly=34  lrx=50  lry=0  urx=50  ury=34
    6/llx=50 lly=0  ulx=50 uly=34  lrx=100 lry=0  urx=100 ury=34
  ;
  template t3l3r;
  treplay 1:gplot 2:gplot1 3:gplot2 4:gplot3 5:gplot4 6:gplot5;
run;
quit;
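The GREPLAY step above expects six graphs named GPLOT through GPLOT5 to be sitting in WORK.BIN already. One way to get them there is a single GPLOT step with six PLOT statements, run before the GREPLAY step. This sketch assumes WORK.BIN starts out empty (SAS names the graphs GPLOT, GPLOT1, ... in the order they are drawn, and picks other names if those are taken); the variable pairs are just the ones in data set ONE:

```sas
goptions nodisplay;               /* store the graphs without showing them */
proc gplot data=one gout=work.bin;
  plot y*x1;                      /* stored as GPLOT  */
  plot y*x2;                      /* stored as GPLOT1 */
  plot y*x1x2;                    /* stored as GPLOT2 */
  plot x1*x2;                     /* stored as GPLOT3 */
  plot x1*x1x2;                   /* stored as GPLOT4 */
  plot x2*x1x2;                   /* stored as GPLOT5 */
run;
quit;
goptions display;
```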
For more examples, here is a link to a cribsheet I wrote while figuring out GREPLAY.
This is very much like MatLab--except the syntax is just different enough to make knowing both packages a real pain. For example, in IML you use nrow(matrix) to count the number of rows in a matrix, but in MatLab you use size(matrix,1).
To use the SAS matrix language, enter
proc iml;
from the Program Editor and you can begin. To quit a session, enter
quit;
The IML session is interactive--you can write and test your coding on the fly. For example, to perform a regression on the ONE dataset, you can enter the following:
proc iml;
use one;
read all var{y x1 x2};
print y x1 x2;
x=j(nrow(x1),1,1) || x1 || x2;
bhat=inv(x`*x)*x`*y;
print bhat;
The USE and READ statements allow you to use a previously defined SAS data set. (Why can't you use just one statement? Because that'd be easy.) The variables Y, X1, and X2 are read into column vectors, and || concatenates the two x-vectors and a vector of 1s (so X is an nx3 matrix). In IML, the matrix transpose is the backward-quote, not the usual quote (it's on the ~ key). * is matrix multiplication, while # is element-by-element multiplication (i.e. 3#x multiplies every element in the matrix x by 3). The PRINT statement prints the regression estimates to the OUTPUT window immediately. You can continue inputting different commands until you enter the QUIT statement, or you can write an entire program and submit it in one fell swoop (esp. for long batch jobs).
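Continuing the IML regression, here is a sketch that also computes the residuals, the error variance s2 = e`e/(n-p), and the standard errors sqrt(diag(s2*inv(X`X))). It assumes data set ONE from the first example; SSQ (sum of squares) and VECDIAG (diagonal of a square matrix) are standard IML functions:

```sas
proc iml;
  use one;
  read all var{y x1 x2};
  x    = j(nrow(x1),1,1) || x1 || x2;   /* design matrix with intercept column */
  bhat = inv(x`*x) * x`*y;              /* least-squares estimates             */
  e    = y - x*bhat;                    /* residuals                           */
  df   = nrow(x) - ncol(x);             /* error degrees of freedom, n - p     */
  s2   = ssq(e) / df;                   /* estimated error variance            */
  se   = sqrt(vecdiag(s2 * inv(x`*x))); /* standard errors of the estimates    */
  print bhat se;
quit;
```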
Creating graphics in IML is painful--SAS refers to them as "graphics primitives," and they are! Your best bet is to output the appropriate data to a SAS data set then use a SAS/GRAPH procedure (GPLOT or G3D, say) to create the image. Or input the data set to MatLab. If you're stubborn like me, here's more-or-less what has to be done within an IML session.
Suppose you created an n x 2 matrix DATA of points to be plotted in the rectangle (0,xmax) x (0,ymax). Submit, within an IML session,
xmax=data[<>,1]; /* maximum value in the first column */
ymax=data[<>,2];
call gstart; /* begin the graphics routines */
call gopen(,1);
wd=(0||0)//(xmax||ymax); /* define the window to display */
call gwindow(wd);
call gport({5 5,95 95}); /* define how much of the window the plot uses */
call gpoint(data[,1], data[,2],,'black',,wd); /* plot the points */
call gshow; /* display the image */
call gstop; /* end the graphics routines */
The GPOINT call is the only command that says "plot these points"; the rest of the commands are a hassle. There are other options for making the image pretty, adding axes and titles, etc., and other calls for connecting the points, shading in an area, etc.
If you want to print multiple plots on the same page (without using GREPLAY or LaTeX), you can play around with viewports and windows--it takes some doing but it can be done (maybe I'll include a program that does this later...).
To save your image as a Postscript file, read Creating Postscript Images.
This is not explained well anywhere in the manuals (but it IS explained, somewhere). In order to sort a vector, use
a={6 3 7 56 6 2 4 9 7};
b=a;
a[rank(a)]=b;
print a;
To sort a matrix based on the first column, execute
a={6 4 , 2 1, 5 7, 9 2, 6 2};
b=a;
a[rank(a[,1]),]=b;
print a;
To sort a matrix based on one column, and within the sort based on another column, you may want to use the following module.
start SortMat(data,col1,col2);
/*==============================================================
| Sort a numeric matrix by col1, and within col1 by col2
| (a very slow and primitive routine: bubble sort on col2).
| Works on a local copy M, because IML passes module arguments
| by reference and would otherwise sort the caller's matrix too.
===============================================================*/
m=data;
/* sort by first column */
b=m;
m[rank(m[,col1]),]=b;
/* sort within first col by second col */
nswitch=1;
if col2>0 then do until(nswitch=0);
  nswitch=0;
  do i=1 to nrow(m)-1;
    if m[i,col1]=m[i+1,col1] then do;
      if m[i,col2]>m[i+1,col2] then do;
        temp=m[i,];
        m[i,]=m[i+1,];
        m[i+1,]=temp;
        nswitch=nswitch+1;
      end;
    end;
  end;
end;
return(m);
finish;
To execute this, insert the module at the top of your session/program (well, you have to execute it before you use it, anyway), and simply invoke
b=SortMat(a,1,2);
print a b;
I seem to recall this actually works, but you'd better test it first. You can store modules in a library somewhere so you don't have to copy them into your programs, but I prefer saving them in a flat file and using the %include '~/path/filename.ext' command instead (e.g., save this as ~/path/sortit.sas and, after the proc iml command, type in the line %inc '~/path/sortit.sas';). If this made absolutely no sense to you, forget I mentioned it.
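If the module is too slow, a common alternative (not the method above) is to let PROC SORT do the work: write the matrix out to a data set, sort it by both columns, and read it back. A sketch, where the data set name TMP and the column names C1 and C2 are made up for the example:

```sas
proc iml;
  a = {6 4, 2 1, 5 7, 9 2, 6 2};
  create tmp from a[colname={'c1' 'c2'}];  /* matrix -> SAS data set WORK.TMP */
  append from a;
  close tmp;
quit;

proc sort data=tmp;
  by c1 c2;                                /* by column 1, then column 2 */
run;

proc iml;
  use tmp;
  read all var{c1 c2} into a;              /* data set -> matrix */
  print a;
quit;
```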
INSIGHT's pretty cool--rotating 3D plots, point-and-click GLM-type analyses, kernel density estimation (and a bunch of relatives), univariate and multivariate analyses, line plotting and scatter plotting (and, e.g., if you want to plot 3 variables against each other, it'll create a scatter-plot matrix), box plots (yahoo.)... The data interface is spreadsheet-ish, but you can do a fair amount with the menus. Basically, the best way to use it is to load in a data set and start playing.
To get into INSIGHT, use the pull-downs: Globals: Analyze: Interactive Data Analysis. You will be prompted for a data set to load--or you can create a new one. Data sets are stored in SAS libraries: the data sets you've just created in the current session (ONE) are in the WORK library, while you can save some for posterity in the SASUSER library or in a library that you name yourself.
Aside: to permanently save a data set in a library, use two-level names, like
data sasuser.savit; set one; run;
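To keep data sets in a directory of your own rather than in SASUSER, assign a libref with LIBNAME and use it as the first level of the name. A sketch, where the libref MYLIB and the directory ~/sasdata are placeholders (the directory has to exist already):

```sas
libname mylib '~/sasdata';

data mylib.savit;     /* saved as a file in ~/sasdata, survives the session */
  set one;
run;

/* In a later session, reissue the LIBNAME statement and use it again */
proc print data=mylib.savit;
run;
```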
Once you have your data set loaded, just start playing.
I generally use Macros when I want to run the same program many times (e.g. for 1000 simulations). To do this you can write
%let numruns=1000;
%macro runit;
%do i=1 %to &numruns;
<insert program code here>
print &i otherstuff;
%end;
%mend;
%runit
%macro...%mend; define the macro, and %runit runs it; note that you do NOT need a semicolon after %runit. The macro variables &numruns and &i are referred to with an initial ampersand: the %let statement defines &numruns, while the %do...%end loop defines &i. You can also print the macro variable values.
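Here is a runnable version of that skeleton, as a sketch: each pass simulates 20 standard normal values with RANNOR (using &i as the seed so the runs are reproducible), computes their mean with PROC MEANS, and appends it to a data set ALLMEANS. The data set names and numruns=5 are illustrative; delete ALLMEANS before rerunning, since PROC APPEND adds to whatever is already there.

```sas
%let numruns=5;

%macro runit;
  %do i=1 %to &numruns;
    /* Simulate 20 standard normal draws for this run */
    data sim;
      do obs=1 to 20;
        x = rannor(&i);
        output;
      end;
    run;

    /* Sample mean for this run */
    proc means data=sim noprint;
      var x;
      output out=onemean mean=xbar;
    run;

    /* One row per run accumulates in ALLMEANS (created on the first pass) */
    proc append base=allmeans data=onemean;
    run;

    %put Finished run &i of &numruns;
  %end;
%mend runit;

%runit

proc print data=allmeans;
run;
```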
For people who have gotten around to the SYMGET and SYMPUT commands, let me just say that creating a macro like this:
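%macro runit;
do iii=1 to 1000;
call symput('jjj',trim(left(char(iii))));
print "&jjj";
<insert program code here>
end;
%mend;
%runit
will NOT work. The reference &jjj is resolved when SAS reads and compiles the loop, before CALL SYMPUT has assigned anything, so every pass through the loop sees the same (old or undefined) value. That is why the outer loop has to be a MACRO %DO loop: the macro processor then generates a separate copy of the code for each value of III, and each copy is resolved on its own.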
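The timing rule is easiest to see in a DATA step: a value set with CALL SYMPUT becomes available to macro references only in code that SAS reads after the step has finished running. A sketch using data set ONE (the macro variable name NOBS is made up for the example):

```sas
/* Store the number of observations in ONE in the macro variable NOBS */
data _null_;
  set one end=last;
  if last then call symput('nobs', trim(left(put(_n_, 8.))));
run;

/* These lines are read after the step above has run, so &nobs resolves */
%put ONE has &nobs observations;

proc print data=one(obs=&nobs);
run;
```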
There's an AWFUL lot more in here, some of which is quite useful. Hope you enjoy reading the manuals (I think we have one...I know I do). For debugging purposes, go into the pulldown menus from the Program Editor: Globals: Options: Global Options and select MPRINT and SYMBOLGEN from the radio buttons (MLOGIC is sometimes good, but it prints a lot of garbage).
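The same debugging switches can be set in code with the MPRINT, SYMBOLGEN and MLOGIC system options (the NO versions turn them off). A sketch, with a throwaway macro named DEMO just to have something to trace; it assumes data set ONE exists:

```sas
/* Show generated code (MPRINT) and macro variable resolution (SYMBOLGEN) in the LOG */
options mprint symbolgen;
/* options mlogic;   traces %IF/%DO logic too, but it is very verbose */

%macro demo;
  %let n=3;
  proc print data=one(obs=&n);
  run;
%mend demo;
%demo

options nomprint nosymbolgen;
```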
Sometimes your programs will crash because the system has run out of space. There are several causes and several work-arounds:
If these don't work, try changing your code to use more CPU and less memory, complain to the system administrator and the computer committee, or wait and try again after midnight.
As I said earlier, I tend to use the help screens before resorting to the manuals. To bring the syntax helps up, use the pulldowns from the Program Editor: Help: Extended Help. Click on
Hope this helps...