Coding qualitative data is a time-consuming and often costly aspect of the survey process. In this post, I will provide feedback on IBM SPSS Text Analytics for Surveys (STAS), which is software designed specifically for coding survey data. The benefits of using this software include improved efficiency and consistency when coding text.
STAS allows you to set up a customizable coding algorithm to code your data. You can create your own coding scheme, or you can use one that is already built in. The built-in coding mechanism can identify responses that mention a place, a name, a positive or negative opinion, an opinion of how affordable an item is, and a number of other types of responses. One benefit of STAS is that it recognizes synonyms and alternative versions of words, making the task of setting up a coding algorithm more efficient.
I recently used STAS for two very different projects and found that it worked great for some purposes and not so great for others. If you are considering STAS, you may want to answer the following questions to inform your decision.
1. How much data do I have?
STAS is optimized for coding 1,000 responses or fewer, each about 40 characters long. I used it to code about 5,000 open-ended survey responses at a time and it still worked reasonably well.
On the other hand, I also used STAS to code over 100,000 Tweets for research into whether Twitter data can supplement survey data. I set up the coding system (STAS calls this a “Text Analysis Package”) with a subset of Tweets, then I used it to code 25,000 Tweets at a time. Initially, I tried coding all 100,000+ Tweets simultaneously, but STAS could not handle it, so I resigned myself to coding 25,000 at a time and waiting an hour or two while each batch was running.
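If you prepare your data with a script before importing it, splitting a large file into batches is easy to automate. The following is a minimal Python sketch of that idea, not anything built into STAS: the `batches` helper and the stand-in `tweets` list are hypothetical names used only for illustration, and the batch size of 25,000 simply mirrors what worked for me.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical stand-in data; in practice this would be the exported Tweets.
tweets = [f"tweet {i}" for i in range(100_500)]

# Split into batches of 25,000 and report how many items landed in each.
batch_sizes = [len(batch) for batch in batches(tweets, 25_000)]
print(batch_sizes)  # [25000, 25000, 25000, 25000, 500]
```

Each batch could then be saved to its own file and coded separately with the same Text Analysis Package.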
2. Will I be coding more data?
Depending on how complicated your dataset is, setting up the coding system can take a lot of time. You’ll want to consider whether it is worth the initial time investment to set up the coding system, because for a low volume of complicated data, manual coding may be more efficient. For longitudinal studies especially, it might be worth that initial investment of time because once the system is set up you will be able to code subsequent waves of data with minimal effort.
3. How much will the responses vary?
I found that STAS worked best for coding demographic data because the range of responses was limited. It worked reasonably well for coding types of injuries (also somewhat limited), but it did not work as well for coding relationships between people. These responses ranged from simple responses such as “mother,” “friend,” or “teacher,” to complicated responses that were typically unrepeated in the dataset. The complicated responses were along the lines of “my best friend’s boyfriend’s step-mom’s boss.” I’m sure STAS can be set up to accurately code these sorts of responses, but I had a hard time figuring it out (see #5 below) and quite frankly, it’s probably more efficient to manually code these responses.
4. Would I be relying on STAS for sentiment analysis?
STAS is capable of coding sentiment, but I would test it carefully to see how it works with your data. As part of our Twitter research, we manually coded a random sample of 500 Tweets in our dataset and found that STAS sentiment coding was in agreement with manual coding only 44% of the time. STAS would likely perform better with survey data than Tweets, which often use unconventional language, but I would still recommend proceeding with caution if you plan to use STAS for sentiment analysis.
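If you want to run the same kind of check on your own data, the comparison is simple to script once you have the manual codes and the automated codes side by side. Here is a minimal Python sketch of a percent-agreement calculation. The function name, the sentiment labels, and the five example responses are all made up for illustration; they are not our Twitter data and not STAS output.

```python
from typing import Sequence


def percent_agreement(manual: Sequence[str], automated: Sequence[str]) -> float:
    """Return the share of items where the two sets of codes match exactly."""
    if len(manual) != len(automated):
        raise ValueError("both code lists must have the same length")
    if not manual:
        raise ValueError("need at least one coded item")
    matches = sum(m == a for m, a in zip(manual, automated))
    return matches / len(manual)


# Hypothetical codes for five responses, in the same order in both lists.
manual_codes = ["positive", "negative", "neutral", "negative", "positive"]
automated_codes = ["positive", "neutral", "neutral", "positive", "positive"]

agreement = percent_agreement(manual_codes, automated_codes)
print(f"{agreement:.0%}")  # 60%
```

Simple percent agreement does not correct for matches that would happen by chance, so treat it as a first look rather than a full reliability assessment.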
5. Am I able to take a course on STAS?
It took several days of working with test data and poring over the user’s manual (which I was not impressed with) for me to really figure out what to do with STAS. I know enough about STAS to get by, but I have also come to realize just how much I don’t know about STAS. I encourage you to take a STAS course if you are able. Learning new software is usually easy for me, but without any training, I really struggled with STAS.
If you decide to proceed with STAS for coding your data, here’s one tip as you get started. Run your data through spell check (e.g. in Word or Excel) before importing into STAS. STAS catches many spelling errors, but not all. Anything you can correct will speed up the coding process.
Do you have additional STAS tips to share? Has your experience with STAS been similar to or different from mine? Would you recommend something besides STAS for coding?
Please comment below!