Using the Internet for Primary Research Data Collection
"Things you may not know"
by
Tregg Farmer
President, InfoTek Research Group, Inc.
Originally Published in 1998
An estimated 75 million people in the U.S. will be connected to the Internet by the end of 1998. Imagine having the capability, at your fingertips, to reach these individuals almost instantaneously and obtain feedback on research issues. The truth is, however, that given the infancy of the Internet's infrastructure, researchers are currently unable to take full advantage of its potential. This has sparked an interesting debate within the research profession over the Internet's current capability to collect quality research data. This, in turn, has resulted in only a small percentage of market research firms using what we believe will become one of the most valuable tools in a researcher's tool kit.
One common thread binds these venturesome firms together: a belief that this trend will cause the largest paradigm shift in market research since telephones reached high penetration in the late 1950s.
Regarding the future of Internet data collection, there are two important questions to ask:
First, are research suppliers prepared and competent to provide quality Internet data collection?
Second, and most importantly, are clients willing to embrace this new medium as a viable, reliable alternative or addition to the industry’s existing research tools?
The objective of this paper is to give the reader a better understanding of the benefits and drawbacks of Internet research, along with an idea of when it is best to use this data collection approach. It is our desire to expand the acceptance and quality of Internet data collection by empowering the researcher and client with an "acid test" of when to use or not use the Internet for research. Specifically, we plan to address the following:
A word on the current Internet participant mindset…
Participants contacted over the Internet are ‘of a different breed’. This is a fact that must not be ignored when preparing a research plan that involves the Internet. All too often, researchers and clients fail to either recognize or admit that the typical Internet research respondent has salient differences that must be taken into account when setting objectives, designing the study and instrument and, most importantly, when interpreting the results and drawing conclusions.
We have empirical evidence that suggests the following about Internet user characteristics.
What type of past research methodology does Internet data collection best resemble?
Generally speaking, data collection on the Internet falls in the same category as mail data collection via disk-by-mail (DBM), with both methodologies having the following differences and similarities.
The good points…
The not-so-good points…
Most researchers who offer Web-based fielding will say "Almost any study can be fielded using the Internet." Of course, common sense leaves us skeptical. Let’s be honest: currently, some can but most cannot. For which ones, then, CAN the Internet be used for primary research?
Studies currently vary from simple six-question surveys to on-line discussions or focus groups. Ultimately, what works and what doesn’t is up to the user of the research and what he or she feels comfortable with.
As a general rule, a researcher must ask at least three questions before selecting the Internet for data collection.
The limitations of Internet research along these lines have been eased somewhat by a sampling concept called 'sifting.' Sifting is used when a universe of potential respondents can be 'over-sampled'. It is the process of including only those participants from the sample who accurately represent the population you are trying to represent. For example, if a researcher wanted to ask questions of females between the ages of 30 and 45 who own a PC, an 'invitation banner' could be placed on an Internet search engine in the PC category. Everyone who 'hits' the invitation and completes the questionnaire would then be sent through a sifting process that excludes everyone who does not fit the profile.
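The sifting step can be pictured as a simple screening filter applied to every completed questionnaire. The sketch below is only an illustration, assuming each complete is stored as a record with hypothetical fields `gender`, `age`, and `owns_pc`; the names `Respondent`, `fits_profile`, and `sift` are ours, not part of any InfoTek system.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    """One completed questionnaire (hypothetical fields for illustration)."""
    respondent_id: int
    gender: str
    age: int
    owns_pc: bool


def fits_profile(r: Respondent) -> bool:
    """Target profile from the example: females aged 30-45 who own a PC."""
    return r.gender == "female" and 30 <= r.age <= 45 and r.owns_pc


def sift(completes: list[Respondent]) -> list[Respondent]:
    """Keep respondents who match the target profile; exclude everyone else."""
    return [r for r in completes if fits_profile(r)]


completes = [
    Respondent(1, "female", 34, True),
    Respondent(2, "male", 38, True),
    Respondent(3, "female", 52, True),
    Respondent(4, "female", 41, False),
    Respondent(5, "female", 45, True),
]
print([r.respondent_id for r in sift(completes)])  # [1, 5]
```

Because sifting happens after completion, respondents who do not qualify still spend time on the questionnaire; their answers are simply dropped from the analysis sample.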
In some cases, however, the target market has a large enough Internet user base that ‘sifting’ is not required. The best example is users of Internet yellow pages.
If the material is confidential, the Internet should not be used, because the distribution of the material cannot be controlled. An alternative such as central-location interviewing is better suited to such a study.
In the future, many 'confidentiality' roadblocks are expected to be eased to some extent, for example by restricting the ability to save or print a page from the Internet. As with any research method, however, complete confidentiality of research material will never be possible.
Highly involved studies are not recommended for the Internet at this time. InfoTek has experimented with instrument complexity and has found that any instrument taking more than 15 minutes to complete has a very high probability of non-response error (error caused by unanswered questions or instruments that are not fully completed).
Relative to other methodologies, the 'point of diminishing returns' comes earlier. As an example, InfoTek received a request from a client to have respondents review several Web sites and compare them to each other for relative appeal and usefulness. The results ultimately revealed that the wait times in bouncing from site to site prevented hundreds of respondents from completing the study.
Overall, we believe that using a little research 'common sense' can prevent stepping over these boundaries. If the researcher is ever in doubt, we strongly recommend a well-executed pre-test.
Which types of questions work best?
Since using the Internet is most similar to using disk-by-mail (DBM) for data collection, the researcher can use the same thought process when it comes to formulating questions. To assist in this process, the table below shows several types of questions along with their relevance and effectiveness on the Internet. Based on our experience, each is graded on a typical participant's ability to interpret the question and its response format.
| Question Type | Example | Grade | Reasoning for Grade |
| Single response, dichotomous or multichotomous question | Male or Female; pick one from a list of many | A | Internet programming allows us to force someone to pick only one from the list |
| Scaled question (ordinal data) | 1-7 importance scale | A | Again, participants can only select one point on the scale |
| Paired comparison or trade-off | Which one of the two do you prefer? | A | Again, Internet programming allows us to force someone to pick only one |
| Multiple comparison | Which one of these do you prefer? | B | Added complexity increases response error |
| Multiple response | Choose one or more from a list | B- | As with mail, the burden is on the respondent to read the entire list |
| Ranking (ordinal data) | First pick, second pick, etc. | B- | More complex response process, but works as well on the Net as with other methodologies. Being able to provide color graphics or pictures adds to the appeal of this question type |
| Open-ended response (short answer) | In one or two sentences, why do you feel this way? | C | Having to type a response is an inhibitor and increases non-response error in the instrument |
| Open-ended response (long answer) | In detail, why do you feel this way? | D | Very high non-response error |
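Several of the grades above rest on the form's ability to enforce a valid response format. In an HTML form, a single-response item is normally a group of radio buttons, which allows only one selection; the same rules can also be checked on the server. The following is a minimal sketch, assuming responses arrive as lists of option labels; all function names are hypothetical, and the ranking check assumes respondents rank every option.

```python
def validate_single(selected: list[str], options: list[str]) -> bool:
    """Single-response or paired-comparison item: exactly one listed option."""
    return len(selected) == 1 and selected[0] in options


def validate_scale(value: int, low: int = 1, high: int = 7) -> bool:
    """Scaled item: exactly one point on the scale (default 1-7)."""
    return low <= value <= high


def validate_multiple(selected: list[str], options: list[str]) -> bool:
    """Multiple-response item: one or more distinct options from the list."""
    return (
        len(selected) >= 1
        and len(set(selected)) == len(selected)
        and all(s in options for s in selected)
    )


def validate_ranking(ranked: list[str], options: list[str]) -> bool:
    """Ranking item (assumed full ranking): each option ranked exactly once."""
    return len(ranked) == len(options) and set(ranked) == set(options)


sex = ["Male", "Female"]
print(validate_single(["Female"], sex))          # True
print(validate_single(["Male", "Female"], sex))  # False
print(validate_scale(8))                         # False
```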
Which Internet data collection methodologies work best?
There are several Internet data collection interfaces. And, as with other methodologies in the past, researchers have differing opinions as to which ones work best on the Internet. Our goal is to give an unbiased opinion, presented below, of which methodologies work best. Please note that our opinion includes the opinions of many clients as well. We give each method two grades: the first represents the methodology as it stands today, and the second represents what we think its potential is in the future.
The reader will note that none of the methodologies currently receives an "A" grade. This is because the drawbacks of today's Internet infrastructure do not allow for 'perfect' data collection. Of course, as the Internet becomes more secure, easier to navigate, and used by more people, this is expected to change dramatically.
| Method | Description | Current | Future | Positive | Negative |
| Flat File Instrument | A simple questionnaire posted as a form on a Web page. It does not contain any interactive programming. | B | B+ | Internet users are most familiar with this format, and its simple design keeps programming costs to a minimum. | Not being interactive (e.g., no skips) limits the type and number of questions a researcher can include. |
| Interactive Instrument | Similar to the flat file, except that it uses server- and/or client-based programming to mirror the capabilities of CAPI instruments. This includes skip patterns, using responses to earlier questions in later questions, etc. | C- | A | Once the Internet's infrastructure becomes more robust, with greater bandwidth and better client and server software, a CAPI interface will be very well received. | Currently, the many problems of the Internet (e.g., speed, loss of connection, cookie screeners, reliability of the data transmission network) make it difficult to administer the lengthy, complex instruments CAPI is well known for today. |
| On-Line Chat used for Qualitative Discussion | An on-line chat forum is created by allowing several individuals to type messages to each other. This usually includes a session moderator. | D- | B- | Creating interaction among participants is a trademark of qualitative research. However, the researcher currently choosing this method must lower his/her expectations in order to be satisfied. In the future, the Internet's speed and video capability should increase the dynamic nature of this method to the point where true interactivity can be achieved. | In most cases, the speed of interaction allowed by the current text-only interface does not come close to gathering true qualitative data. |
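To make the CAPI-style features of the interactive instrument more concrete, here is a minimal sketch of skip patterns and response piping. The question ids, wording, brand name, and the `Question`/`administer` names are all hypothetical, and answers are scripted rather than collected from a live respondent.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Answers = dict[str, str]


@dataclass
class Question:
    qid: str
    text: str  # may contain {qid} placeholders piped from earlier answers
    next_qid: Callable[[Answers], Optional[str]]  # skip logic; None ends the instrument


def administer(questions: dict[str, Question], start: str,
               scripted: Answers) -> list[tuple[str, str]]:
    """Walk the instrument, following skip patterns and piping earlier answers.

    Answers come from `scripted` here; a live instrument would collect them
    from the respondent. Returns the (question id, displayed wording) pairs asked.
    """
    answers: Answers = {}
    asked: list[tuple[str, str]] = []
    qid: Optional[str] = start
    while qid is not None:
        q = questions[qid]
        asked.append((qid, q.text.format(**answers)))  # pipe earlier responses
        answers[qid] = scripted[qid]
        qid = q.next_qid(answers)  # apply the skip pattern
    return asked


questions = {
    "Q1": Question("Q1", "Do you own a PC? (yes/no)",
                   lambda a: "Q2" if a["Q1"] == "yes" else "Q3"),
    "Q2": Question("Q2", "What brand is your PC?", lambda a: "Q2a"),
    "Q2a": Question("Q2a", "How satisfied are you with your {Q2} PC? (1-7)",
                    lambda a: None),
    "Q3": Question("Q3", "Do you plan to buy a PC in the next 12 months? (yes/no)",
                   lambda a: None),
}

for qid, wording in administer(questions, "Q1", {"Q1": "yes", "Q2": "Acme", "Q2a": "6"}):
    print(qid, wording)
```

Each extra branch and piped field is another round trip in a Web-based instrument, which is one reason the table rates the interactive approach low today despite its future potential.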
How can the Internet be combined with other methodologies?
Many studies require a combination of data collection methodologies. For example, the phone-fax-phone method is used quite often. In this procedure, a participant is screened and recruited over the phone, sent a fax instrument and phoned back to collect the answers.
Can the Internet be used in a similar fashion? The answer is 'yes' in most cases, but doing so requires a very different understanding of the roadblocks that need to be overcome.
Example: Participants can be recruited via the phone and then asked to go to an Internet page containing an instrument (provided they have access to the Internet). However, studies conducted by InfoTek using this approach have not been as productive as those using the fax. The reason is that participants are less likely to review an instrument on the Internet than one sent to them by fax.
On average, in this combination, we have found that a participant is about 25% less likely to complete an Internet instrument than a fax instrument. Why is this? We believe the drawback lies in asking participants to actively log on to the Internet and find the location of the page. This differs from a fax instrument, which in most cases arrives at the participant's desk with little or no effort on his or her part. Thus, the difference is that the fax instrument is passive: a respondent does not have to spend any time seeking it out. The Internet instrument, on the other hand, requires active involvement that may last as long as 3-4 minutes.
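Reading "about 25% less likely" as a relative reduction of the fax completion rate (not a drop of 25 percentage points), the effect on a phone-recruited sample can be worked out directly. The sketch below assumes a hypothetical 60% fax completion rate, since the paper does not report the base rate; `expected_completes` is an illustrative name.

```python
def expected_completes(recruits: int, fax_rate: float,
                       relative_drop: float = 0.25) -> tuple[float, float]:
    """Return (fax completes, Internet completes) for a phone-recruited sample.

    relative_drop=0.25 encodes "about 25% less likely" as a relative reduction
    of the fax completion rate, not a drop of 25 percentage points.
    """
    fax = recruits * fax_rate
    internet = fax * (1 - relative_drop)
    return fax, internet


# Hypothetical inputs: 200 phone recruits, 60% fax completion rate.
print(expected_completes(200, 0.60))  # (120.0, 90.0)
```

In this example, a researcher switching from fax to Internet would need roughly a third more recruits (200 / 0.75, about 267) to keep 120 completes.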
Internet data collection can also be used in combination with qualitative work, such as focus groups or one-on-ones. Again, if the researcher thinks of Internet data collection as a speedy and interactive DBM survey, it can be used in combination with almost any methodology.
What high-level statistics can be employed?
The most-cited modeling technique programmed for the Internet is some form of Conjoint or Choice Modeling. Currently, Full-Profile Conjoint and Discrete Choice Modeling are the easiest techniques to program. It is our opinion that interactive techniques such as Sawtooth's Adaptive Conjoint Analysis (ACA) are not currently viable alternatives. However, these interactive methods will most likely be programmed for the Internet in the future.
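Discrete choice data are commonly analyzed with a multinomial logit model, in which each full product profile gets a utility equal to the weighted sum of its attribute levels, and choice shares follow from those utilities. The sketch below shows only that calculation for one choice set; the attributes and part-worths are made up for illustration, not estimates from any study, and estimating the part-worths themselves would require a statistics package.

```python
import math


def utility(profile: dict[str, float], part_worths: dict[str, float]) -> float:
    """Linear utility: sum of part-worth x attribute level."""
    return sum(part_worths[name] * level for name, level in profile.items())


def logit_shares(profiles: list[dict[str, float]],
                 part_worths: dict[str, float]) -> list[float]:
    """Multinomial logit choice probabilities for one choice set."""
    utils = [utility(p, part_worths) for p in profiles]
    top = max(utils)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(u - top) for u in utils]
    total = sum(weights)
    return [w / total for w in weights]


# Made-up part-worths: price in hundreds of dollars, 1 = bundled modem.
part_worths = {"price": -0.8, "modem": 1.2}
choice_set = [
    {"price": 15, "modem": 1},  # full profile A
    {"price": 12, "modem": 0},  # full profile B
]
print([round(s, 3) for s in logit_shares(choice_set, part_worths)])
```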
As with any paper-and-pencil instrument, almost any parametric or non-parametric statistical procedure can be conducted using data from an Internet instrument. Simple methods such as significance testing and Chi-square analysis are commonly used. At the other end of the spectrum, the researcher can employ grouping procedures such as Cluster Analysis.
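As an example of the simpler end of that spectrum, the Pearson chi-square statistic for a cross-tabulation can be computed directly from the counts. This is a sketch with hypothetical counts (two respondent groups by a yes/no answer); a statistics package would normally also report the p-value. It assumes every row and column total is positive.

```python
def chi_square(table: list[list[float]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical cross-tab: rows = respondent group, columns = "yes" / "no".
counts = [[30, 20],
          [20, 30]]
stat, dof = chi_square(counts)
print(stat, dof)  # 4.0 1
# 4.0 exceeds 3.841, the 5% critical value for 1 degree of freedom.
```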
What will the future of Internet research most likely look like?
Imagine almost all families in the U.S. being seamlessly connected to the Internet via one of their PCs or televisions, all of which are interfaced via cable, fiber or wireless connectivity to the Internet. Imagine being able to identify and pre-screen thousands of potential research respondents in seconds before even asking for their participation. Imagine from that point requesting participation via another keystroke. Finally, within 24 hours, over 1,000 qualified participants have responded to your survey. Another likely development will be longitudinal studies in which people participate in multiple phases of a project.
Another likely scenario is people from any location in the U.S. or other countries participating in an on-line videoconference about a new idea or product. Participants will be able to sit at their desktop, laptop or handheld PC, regardless of location, and engage in a lively discussion of the merits of, and their reactions to, the questions and stimuli. These conversations will involve 3D or holographic images along with sound, video, and text in point-to-multipoint, real-time sessions.
Will it be this easy in the future? Maybe. One thing is certain, however: data collection will shift in this direction to some extent. Researchers not currently working with this 'new kid on the block' will most likely be left behind to some extent by those who see its future capabilities. One last note. We should all keep in mind what Wayne Gretzky is known for saying: "The key to success is skating to where you think the puck will be, not where it is at right now."
Tregg Farmer is the founder and current president of InfoTek Research Group, Inc., based in Yakima, Washington. InfoTek specializes in conducting primary research on high technology products and services. Over the past 15 years, Tregg has assisted Fortune 500 companies as well as startup companies in designing and implementing Internet data collection procedures and programs. In addition, InfoTek assists these organizations in developing many of the hardware and software products for the Internet's infrastructure, which gives InfoTek insight into which data collection options will likely be most plausible in the future.