
An innovative randomized response model based on a customizable random tool

  • Ahmad M. Aboalkhair,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology

    aaboalkhair@kfu.edu.sa (AMA); mzayed@kfu.edu.sa (MAZ); malrawad@kfu.edu.sa (MA).

    Affiliations Department of Quantitative Methods, School of Business, King Faisal University, Al-Ahsa, Saudi Arabia; Department of Applied Statistics and Insurance, Faculty of Commerce, Mansoura University, Mansoura, Egypt

  • Mohammad A. Zayed,

    Roles Formal analysis, Investigation, Methodology, Resources

    Affiliations Department of Quantitative Methods, School of Business, King Faisal University, Al-Ahsa, Saudi Arabia; Department of Applied Statistics and Insurance, Faculty of Commerce, Mansoura University, Mansoura, Egypt

  • Tamer Elbayoumi,

    Roles Visualization, Writing – original draft, Writing – review & editing

    Affiliations Department of Applied Statistics and Insurance, Faculty of Commerce, Mansoura University, Mansoura, Egypt; Department of Mathematics and Statistics, North Carolina A & T State University, Greensboro, North Carolina, United States of America

  • Abdullah Alnefaie,

    Roles Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Quantitative Methods, School of Business, King Faisal University, Al-Ahsa, Saudi Arabia

  • Mahmaod Alrawad,

    Roles Funding acquisition, Project administration, Writing – review & editing

    Affiliations Department of Quantitative Methods, School of Business, King Faisal University, Al-Ahsa, Saudi Arabia; College of Business Administration and Economics, Al-Hussein Bin Talal University, Ma’an, Jordan

  • Ahmed M. Elshehawey

    Roles Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Applied, Mathematical & Actuarial Statistics, Faculty of Commerce, Damietta University, New Damietta, Egypt

Abstract

This paper proposes an innovative randomized response model based on a customizable random tool. The proposed model provides a general framework that includes several pioneering randomized response models as special cases and generates new efficient models. A theoretical and numerical comparison of the efficiency of one of these newly generated models with other groundbreaking models shows that the new model is more efficient. Ethical considerations and the privacy protection offered by the proposed model are also examined.

1. Introduction

In sample surveys, individuals may prefer to withhold information or give false answers when an interviewer asks about sensitive matters such as drug use, psychiatric conditions, infidelity, delinquency, criminal abortion, illegitimacy, or even political party affiliation. The resulting evasive response bias can be difficult to evaluate. Warner [1] proposed reducing this bias by guaranteeing the interviewee’s privacy through the randomized response technique (RRT). In such designs, individuals keep their personal information private because their answers pass through a random mechanism, which addresses the problem of evasive response bias.

The RRT aims to minimize or avoid response errors when individuals are asked about sensitive topics. The basic idea of a randomized response design is to collect information indirectly, by asking questions whose answers the interviewer cannot attribute to the interviewee with certainty. Consequently, under the RRT interviewees are expected to answer truthfully, which makes valid estimation possible.

Although Warner’s technique allows answers on sensitive matters to be collected while preserving anonymity, its estimates have an inflated standard error because of the added randomization. Following Warner’s initial proposal, many researchers have extended the RRT in different directions. Their focus has consistently been on reducing estimation variance and improving model efficiency, which they have pursued by choosing design parameters according to criteria that minimize variance, by adopting alternative estimation methods, and, above all, by proposing design modifications to Warner’s original model.

Design modification is the approach taken by most studies that seek to improve the efficiency of the RRT, and several authors have proposed modifications to Warner’s model to enhance its efficiency [2–19].

Aboalkhair et al. [20] introduced an efficient model through a design modification based on three randomizing devices. They showed that their method is an efficient alternative to the models of Mangat & Singh and Warner. The study established that a multi-stage randomization device, especially one with more stages, raises the chance that the sensitive question is selected without significantly affecting interviewees’ trust in the device or their honesty, and hence enhances the effectiveness of the RR technique. The present research builds on that work, and its aim is to create a generalized randomized response model.

2. Previous pioneering models

2.1. Warner’s model

The groundbreaking RR model was proposed by Warner [1] to estimate π, the proportion of individuals who possess a sensitive attribute. With suitable changes of notation, Warner’s estimator of π is:

(1)

and variance given by:

(2)
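Because the displayed equations did not survive extraction, the following is a minimal Python sketch of Warner’s estimator in its standard textbook form rather than in this paper’s notation. It assumes that `p` is the probability that the device shows the statement “I have D” and that `alpha_hat` is the observed proportion of “Yes” answers; the function names are hypothetical. The estimator is π̂ = (α̂ − (1 − p))/(2p − 1), with variance π(1 − π)/n + p(1 − p)/(n(2p − 1)²), and it requires p ≠ 0.5.

```python
def warner_estimate(alpha_hat: float, p: float) -> float:
    """Warner's estimator of pi from the observed proportion of 'Yes' answers.

    p is the probability that the device shows "I have D" (assumed notation).
    """
    if p == 0.5:
        raise ValueError("p = 0.5 makes pi unidentifiable")
    return (alpha_hat - (1.0 - p)) / (2.0 * p - 1.0)


def warner_variance(pi: float, p: float, n: int) -> float:
    """Variance of Warner's estimator: sampling term plus randomization term."""
    return pi * (1.0 - pi) / n + p * (1.0 - p) / (n * (2.0 * p - 1.0) ** 2)


# Example: 34% "Yes" answers with p = 0.7 give an estimate close to 0.1.
print(warner_estimate(0.34, 0.7))
print(warner_variance(0.1, 0.7, 1000))
```

The second term of the variance is the price paid for privacy: it vanishes only as p approaches 0 or 1, where the device no longer hides anything.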

2.2. Mangat & Singh’s model

Mangat & Singh [5] introduced an efficient two-stage RR model. The estimate of π in Mangat & Singh’s design is:

(3)

with variance given by:

(4)
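As with Warner’s model, the displayed formulas are missing, so here is a hedged sketch of Mangat & Singh’s estimator in the notation commonly used for it, not necessarily this paper’s: `T` is the probability that the first device asks the sensitive question directly, and `P` is the probability that the second (Warner) device shows “I have D”. Then α = Tπ + (1 − T)[Pπ + (1 − P)(1 − π)], and solving for π gives the moment estimator below. The function names are hypothetical.

```python
def ms_estimate(alpha_hat: float, T: float, P: float) -> float:
    """Mangat & Singh estimator; T = P(direct question), P = Warner probability."""
    denom = 2.0 * P - 1.0 + 2.0 * T * (1.0 - P)
    if denom == 0.0:
        raise ValueError("design makes pi unidentifiable")
    return (alpha_hat - (1.0 - T) * (1.0 - P)) / denom


def ms_variance(pi: float, T: float, P: float, n: int) -> float:
    """Variance of the Mangat & Singh estimator."""
    c = (1.0 - T) * (1.0 - P)  # probability of answering "I do not have D"
    denom = 2.0 * P - 1.0 + 2.0 * T * (1.0 - P)
    return pi * (1.0 - pi) / n + c * (1.0 - c) / (n * denom ** 2)


def warner_variance(pi: float, p: float, n: int) -> float:
    """Warner's variance, included for comparison."""
    return pi * (1.0 - pi) / n + p * (1.0 - p) / (n * (2.0 * p - 1.0) ** 2)


# T = 0 switches off the direct question, so the design collapses to Warner's.
print(ms_variance(0.1, 0.0, 0.7, 1000), warner_variance(0.1, 0.7, 1000))
# With T = 0.3 the same Warner device gives a smaller variance.
print(ms_variance(0.1, 0.3, 0.7, 1000))
```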

Mangat & Singh [5] demonstrated that their model outperforms Warner’s original model when feasible values of its two design probabilities are chosen appropriately.

2.3. Aboalkhair’s model

Aboalkhair et al. [20] proposed an efficient model based on a three-stage random tool. For a random sample of n interviewees, Aboalkhair’s estimator of π and its variance, with appropriate changes of notation, are:

(5)

and,

(6)

In the following section, we propose a generalization of Aboalkhair’s model that includes earlier models, such as those of Warner and Mangat & Singh, as special cases and from which new efficient RR models can be generated.

3. The suggested model

3.1. Model description

To estimate π, the proportion of individuals in a specific population who have a sensitive attribute (D), each interviewee in a selected random sample is given a customizable j-stage random tool, as depicted in Fig 1. At the first stage (S1), the interviewee uses the tool to choose randomly between two options: the first is a yes/no question asking whether he/she has the sensitive attribute, and the second instructs him/her to proceed to the next stage. At each intermediate stage Si (i = 2, …, j − 1), the interviewee faces the same choice. If the interviewee reaches the final stage (Sj), he/she is given a yes/no question about the sensitive attribute, as in Warner’s original model.

The probability of a “Yes” answer (α) is:

(7)

where:

: The probability that the question asking whether the interviewee has the sensitive attribute appears at stage g, for g = 1, 2, 3, …, j and .

The estimator of π is as follows:

(8)

where denotes the ratio of ‘yes’ answers in the sample and .

Setting j = 1 in Eq. (8), we get

which coincides with Eq. (1). Setting j = 2, we get

which coincides with Eq. (3), and setting j = 3 gives Eq. (5).
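The displayed formulas of this section did not survive extraction, so the sketch below works under an explicitly assumed parameterization rather than the paper’s own Eq. (7). It takes q_g to be the probability that stage g does not ask the direct question: at stages 1 to j − 1 the interviewee is sent onward, and at the final Warner stage the complementary statement “I do not have D” is shown. This reading is chosen because it makes all four combinations listed in Section 6 lead to the same overall value 0.1. Under it, an interviewee ends on the complementary statement with probability c = q_1 q_2 ⋯ q_j and otherwise answers the direct question truthfully, so α = (1 − c)π + c(1 − π). The function names are hypothetical.

```python
from math import prod


def complement_prob(q: list[float]) -> float:
    """c = q_1 * ... * q_j: probability of ending on 'I do not have D' (assumed)."""
    if not q or any(not 0.0 < x < 1.0 for x in q):
        raise ValueError("each q_g must lie strictly between 0 and 1")
    return prod(q)


def yes_probability(pi: float, q: list[float]) -> float:
    """alpha = (1 - c) * pi + c * (1 - pi) under the assumed parameterization."""
    c = complement_prob(q)
    return (1.0 - c) * pi + c * (1.0 - pi)


def estimate_pi(alpha_hat: float, q: list[float]) -> float:
    """Moment estimator obtained by solving alpha = c + (1 - 2c) * pi for pi."""
    c = complement_prob(q)
    if c == 0.5:
        raise ValueError("c = 0.5 makes pi unidentifiable")
    return (alpha_hat - c) / (1.0 - 2.0 * c)


q = [0.5, 0.5, 0.5, 0.8]           # four stages, c = 0.1
alpha = yes_probability(0.2, q)    # 0.9 * 0.2 + 0.1 * 0.8
print(alpha, estimate_pi(alpha, q))
```

Under this assumption, a one-element list reproduces Warner’s estimator with p = 1 − q_1, and a two-element list reproduces Mangat & Singh’s estimator with T = 1 − q_1 and P = 1 − q_2 (the standard notations of those two papers). This matches the reductions to Eqs. (1) and (3) described above.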

3.2. Properties of the suggested estimator

Since , therefore is an unbiased estimator of α, and the variance of is:

Theorem 1. The variance of the proposed estimator is

(9)

Proof. Utilizing Eq. (8), is

(10)

As , then

(11)

Substituting Eq. (11) into Eq. (10) gives

(12)

Using Eq. (7), can be calculated as,

(13)

Eq. (9) then follows by substituting Eq. (13) into Eq. (12).
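The images holding Eqs. (9)–(13) did not survive extraction. As a hedged illustration of the steps of the proof, the derivation below assumes a parameterization in which q_g is the probability that stage g does not present the direct question (at the final stage, the probability of the complementary statement “I do not have D”). An interviewee then ends on the complementary statement with probability c = q_1 q_2 ⋯ q_j. This is a sketch of the argument’s structure, not a reproduction of the paper’s Eq. (9).

```latex
\begin{aligned}
\alpha &= (1-c)\pi + c(1-\pi) = c + (1-2c)\pi, \qquad
\hat{\pi} = \frac{\hat{\alpha}-c}{1-2c}, \qquad
V(\hat{\alpha}) = \frac{\alpha(1-\alpha)}{n},\\
V(\hat{\pi}) &= \frac{V(\hat{\alpha})}{(1-2c)^2}
 = \frac{\left[c+(1-2c)\pi\right]\left[(1-c)-(1-2c)\pi\right]}{n(1-2c)^2}
 = \frac{\pi(1-\pi)}{n} + \frac{c(1-c)}{n(1-2c)^2}.
\end{aligned}
```

The last step uses the expansion of the numerator as c(1 − c) + (1 − 2c)²π(1 − π). Setting c = 1 − p recovers Warner’s variance in his notation, and setting c = (1 − T)(1 − P) recovers Mangat & Singh’s variance in theirs.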

Setting j = 1 in Eq. (9), we get

which coincides with Eq. (2). Setting j = 2, we get

which coincides with Eq. (4), and setting j = 3 gives Eq. (6).

Theorem 2. has an unbiased estimator given by

(14)

Proof. The result holds by taking the expected value of Eq. (14).
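Eq. (14) is not recoverable from the extracted text. The sketch below checks one natural candidate under an assumed parameterization, in which q_g is the probability that stage g does not ask the direct question, so the complementary statement is reached with probability c = q_1 q_2 ⋯ q_j. The candidate is v̂ = α̂(1 − α̂)/((n − 1)(1 − 2c)²). It is unbiased because E[α̂(1 − α̂)] = α − (α(1 − α)/n + α²) = α(1 − α)(n − 1)/n. The code verifies this exactly by summing over the binomial distribution of the “Yes” count instead of simulating. The function names are hypothetical.

```python
from math import comb


def variance_estimate(alpha_hat: float, c: float, n: int) -> float:
    """Assumed unbiased estimator of V(pi_hat) for a design with overall c."""
    return alpha_hat * (1.0 - alpha_hat) / ((n - 1) * (1.0 - 2.0 * c) ** 2)


def true_variance(pi: float, c: float, n: int) -> float:
    """V(pi_hat) = pi(1 - pi)/n + c(1 - c)/(n(1 - 2c)^2)."""
    return pi * (1.0 - pi) / n + c * (1.0 - c) / (n * (1.0 - 2.0 * c) ** 2)


def expected_variance_estimate(pi: float, c: float, n: int) -> float:
    """Exact expectation of variance_estimate: the 'Yes' count is Binomial(n, alpha)."""
    alpha = c + (1.0 - 2.0 * c) * pi
    return sum(
        comb(n, k) * alpha**k * (1.0 - alpha) ** (n - k) * variance_estimate(k / n, c, n)
        for k in range(n + 1)
    )


pi, c, n = 0.25, 0.1, 50
print(expected_variance_estimate(pi, c, n), true_variance(pi, c, n))
```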

3.3. Efficiency comparison of the suggested estimator

The proposed model with j stages is more efficient than the model with j − g stages if and only if:

(15)
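Eq. (15) is missing from the extracted text. Under an assumed parameterization in which each design reaches the complementary statement “I do not have D” with an overall probability c (the product of its q_g values, if q_g is the probability that stage g does not ask the direct question), both variances share the term π(1 − π)/n. The comparison therefore reduces to the randomization term c(1 − c)/(1 − 2c)². Because c(1 − c) = (1 − (1 − 2c)²)/4, that term equals 1/(4(1 − 2c)²) − 1/4. The longer design is therefore more efficient exactly when |1 − 2c| is larger. The sketch, with hypothetical function names, shows that adding stages helps when c is below 1/2 but can hurt when it is above.

```python
from math import prod


def penalty(c: float) -> float:
    """Randomization part of n * V(pi_hat): c(1 - c) / (1 - 2c)^2."""
    return c * (1.0 - c) / (1.0 - 2.0 * c) ** 2


def longer_design_wins(c_long: float, c_short: float) -> bool:
    """Equivalent to penalty(c_long) < penalty(c_short)."""
    return abs(1.0 - 2.0 * c_long) > abs(1.0 - 2.0 * c_short)


q = [0.5, 0.5, 0.5, 0.8]
print(longer_design_wins(prod(q), prod(q[1:])))   # c = 0.1 vs 0.2
print(longer_design_wins(0.9 * 0.9, 0.9))         # c = 0.81 vs 0.9
```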

4. Privacy protection

All randomized response models are subject to several ethical considerations, whose main purpose is to help researchers balance their need to elicit sensitive information against the ethical treatment of participants. First, respondents must agree voluntarily and explicitly to participate, after being informed about the nature of the RR technique, how it will be implemented, its purpose, and their freedom to decline. The researcher must make the details of data collection and publication clear by describing the purpose of the study and how its results will be shared.

Researchers also need to consider the possible effects of sensitive questions on participants and try to minimize any resulting harm or distress. All studies using randomized response techniques must receive approval from institutional review boards or ethics committees, so that the research meets ethical standards and protects the rights and welfare of participants. Individual responses must remain anonymous and untraceable, with particular attention to protecting respondents’ privacy.

4.1. Privacy protection measure

Preserving the interviewee’s privacy is a fundamental aspect of the randomized response technique. Several privacy measures have been proposed, including those of Anderson [21], Lanke [22], Leysieffer and Warner [23], and Zhimin & Zaizai [24]. Following the last of these, the privacy measure for Warner’s model is:

(16)

Also, the measure of privacy protection for Mangat & Singh’s model can be expressed as follows:

(17)

and for Aboalkhair’s model:

(18)

For the suggested model, the design probabilities are:

and

Then, the privacy protection measure is:

where

(19)

To verify Eq. (19), set j = 1, j = 2 and j = 3: the resulting protection measures for Warner’s, Mangat & Singh’s, and Aboalkhair’s estimators coincide with Eqs. (16), (17), and (18), respectively.

A relationship can be established between this privacy protection measure and the efficiency of each of the four models, as follows:

(20)(21)(22)(23)

It is clear from Eqs. (20)–(23) that as the value of the privacy measure decreases, the efficiency of the corresponding estimator also decreases. Moreover, Zhimin & Zaizai [24] showed that smaller values of their measure indicate a higher level of privacy protection for interviewees. A balance between efficiency and privacy protection is therefore required.
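Eq. (19) itself is not recoverable here, so the sketch below does not implement the Zhimin & Zaizai measure. Instead, it assumes a parameterization in which q_g is the probability that stage g does not present the direct question, so the complementary statement “I do not have D” is reached with probability c = q_1 q_2 ⋯ q_j. Under that assumption, the design probabilities are P(Yes | D) = 1 − c and P(Yes | not D) = c. The code computes the posterior probabilities of D given each answer by Bayes’ rule, the quantities on which, for example, Lanke’s [22] measure is built. It also computes the randomization part of the variance, which makes the trade-off concrete. The function names are hypothetical.

```python
from math import prod


def posteriors(pi: float, c: float) -> tuple[float, float]:
    """P(D | Yes) and P(D | No) when P(Yes | D) = 1 - c and P(Yes | not D) = c."""
    alpha = pi * (1.0 - c) + (1.0 - pi) * c
    return pi * (1.0 - c) / alpha, pi * c / (1.0 - alpha)


def penalty(c: float) -> float:
    """Randomization part of n * V(pi_hat); smaller means more efficient."""
    return c * (1.0 - c) / (1.0 - 2.0 * c) ** 2


pi = 0.2
for q in ([0.1] * 4, [0.5, 0.5, 0.5, 0.8], [0.9] * 4):
    c = prod(q)
    post_yes, post_no = posteriors(pi, c)
    print(f"c={c:.4f}  P(D|Yes)={post_yes:.3f}  P(D|No)={post_no:.3f}  penalty={penalty(c):.4f}")
```

As c shrinks, the penalty falls but a “Yes” answer reveals D almost with certainty, which illustrates the balance described above.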

5. Suggested RR model with a four-stage random tool

To illustrate the suggested model with a j-stage random tool, we consider the case of four stages (j = 4). At the first stage (S1), the interviewee uses the tool to choose randomly between two options: the first is a yes/no question asking whether he/she has the sensitive attribute, and the second instructs him/her to proceed to the next stage. At stages S2 and S3, the interviewee faces the same choice as at the first stage. If the interviewee reaches the final stage (S4), he/she is given a yes/no question about the sensitive attribute, as in Warner’s original model.

The probability of a “Yes” response (α) is:

(24)

where:

: The probability that the question asking whether the interviewee has the sensitive attribute appears at stage g, for g = 1, 2, 3, 4 and .

The estimator suggested for is:

(25)

where is the proportion of ‘yes’ answers in the sample.

5.1. Properties of the estimator

Corollary 1. The variance of the proposed estimator is

(26)

Corollary 2. has an unbiased estimator given by

(27)

5.2. Efficiency comparison

Here, we identify the conditions under which the suggested estimator with a four-stage random tool outperforms the estimators proposed by Warner, Mangat & Singh, and Aboalkhair.

The suggested model is more efficient than Warner’s model if and only if:

(28)

This is achievable by selecting appropriate values for while maintaining a suitable practicable value for .

The difference in efficiency between the suggested estimator (P) and Warner’s estimator (W) across feasible q_1 values and varying q_2, q_3 and q_4 values is illustrated in Fig 2a–2d. The vertical axis shows the efficiency difference, and the other two axes show the values of and . Parts a, b, c and d of the figure correspond to , respectively, and to practical values of (less than 0.5). All differences are positive, indicating a clear advantage for the suggested estimator in every case.

Fig 2. (a–d) The difference in efficiency between the suggested estimator (P) and Warner’s estimator (W) across feasible q_1 values and varying q_2, q_3 and q_4 values.

https://doi.org/10.1371/journal.pone.0319780.g002

From Fig 2a–2d, it can be noted that:

  1. The estimates from the suggested model exhibit superior efficiency compared to those of Warner’s for and all values of the probabilities .
  2. When increases from 0.1 to 0.4 and fixing , the efficiency difference between the suggested estimate and that of Warner’s also increases.
  3. When any three of the four probabilities are fixed, the efficiency difference between the suggested estimate and Warner’s estimate increases as the fourth probability decreases from 0.9 to 0.1. This is mainly because the efficiency of Warner’s estimate is fixed, whereas the variance of the suggested estimate always decreases when any of these probabilities decreases.
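The observations above can be reproduced qualitatively under stated assumptions. The sketch assumes, purely for illustration, that q_g is the probability that stage g does not ask the direct question, so the four-stage design reaches the complementary statement with probability c = q_1 q_2 q_3 q_4. It also assumes that the Warner design being compared uses q_1 as its probability of the complementary statement. The π(1 − π)/n terms then cancel, and n times the variance difference V_W − V_P depends only on the q values. The grid and function names are hypothetical and are not the values behind Fig 2.

```python
from itertools import product
from math import prod


def penalty(c: float) -> float:
    """c(1 - c)/(1 - 2c)^2: randomization part of n * V(pi_hat)."""
    return c * (1.0 - c) / (1.0 - 2.0 * c) ** 2


def n_times_diff_w_minus_p(q1: float, q2: float, q3: float, q4: float) -> float:
    """n * [V_W - V_P]; the pi(1 - pi)/n terms cancel, so pi drops out."""
    return penalty(q1) - penalty(prod((q1, q2, q3, q4)))


grid = (0.1, 0.5, 0.9)
diffs = {
    (q1, q2, q3, q4): n_times_diff_w_minus_p(q1, q2, q3, q4)
    for q1 in (0.1, 0.2, 0.3, 0.4)
    for q2, q3, q4 in product(grid, repeat=3)
}
# Is the suggested design better at every grid point?
print(min(diffs.values()) > 0)
# Does lowering q4 from 0.9 to 0.1 widen the gap?
print(diffs[(0.3, 0.5, 0.5, 0.1)] > diffs[(0.3, 0.5, 0.5, 0.9)])
```

Positivity follows from c_P < q_1 < 0.5 and from the penalty increasing on [0, 0.5).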

The suggested model is more efficient than Mangat & Singh’s model (MS) if:

(29)

This is achievable by selecting appropriate values for while maintaining a suitable practicable value for and .

The difference in efficiency between the suggested estimator (P) and Mangat & Singh’s estimator (MS) across feasible q_1 values and varying q_2, q_3 and q_4 values is illustrated in Fig 3a–3d. The same settings as in Fig 2 are used. Positive values indicate an efficiency advantage for the suggested estimator.

Fig 3. (a–d) The difference in efficiency between the suggested estimator (P) and Mangat & Singh’s estimator (MS) across feasible q_1 values and varying q_2, q_3 and q_4 values.

https://doi.org/10.1371/journal.pone.0319780.g003

From Fig 3a–3d, it can be noted that:

  1. The estimates from the suggested model are more efficient than that of MS for all values of , and at practicable values range (0.1 to 0.4) of ,.
  2. When fixing and any set of two out the three probabilities , the efficiency difference between the suggested estimate and that of MS increases as the fourth probability increases from 0.1 to 0.4.
  3. When fixing , and any of the remaining two probabilities , the efficiency difference between the suggested estimate and that of MS increases as the fourth probability decreases from 0.9 to 0.1. This is mainly because the variance of the MS estimate is fixed, whereas that of the suggested estimate always decreases when any of these probabilities decreases.

The suggested estimator is more efficient than Aboalkhair’s estimator if and only if:

(30)

This is achievable by selecting appropriate values for while maintaining a suitable practicable value for , and .

The difference in efficiency between the suggested estimator (P) and Aboalkhair’s estimator (AK) across feasible q_1 values and varying q_2, q_3 and q_4 values is illustrated in Fig 4a–4d. In this comparison too, all differences are positive, favoring the proposed estimator.

Fig 4. (a–d) The difference in efficiency between the suggested estimator (P) and Aboalkhair’s estimator (AK) across feasible q_1 values and varying q_2, q_3 and q_4 values.

https://doi.org/10.1371/journal.pone.0319780.g004

From Fig 4a–4d, it can be noted that:

  1. The estimates from the generalized suggested model are more efficient than that of AK for all values of and at practicable values range (0.1 to 0.4) of ,.
  2. When fixing and any set of two out the three probabilities , the efficiency difference between the suggested estimate and that of AK increases as the fourth probability increases from 0.1 to 0.4.
  3. The efficiency difference between the suggested estimate and that of AK always increases as decreases and fixing all other probabilities .
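The comparisons with Mangat & Singh and with Aboalkhair can be viewed together as a nested chain, since the text states that j = 1, 2, 3 give Warner’s, Mangat & Singh’s and Aboalkhair’s models. The sketch assumes, for illustration only, that q_g is the probability that stage g does not ask the direct question, so a k-stage design reaches the complementary statement with probability c_k = q_1 ⋯ q_k. It also assumes that the competing designs share the leading probabilities. With q_1 below 0.5, every extra stage multiplies c by a factor below 1. The variances then decrease along the chain, in line with the ordering stated in the Discussion. The values of q, π and n are hypothetical.

```python
from math import prod


def penalty(c: float) -> float:
    """Randomization part of n * V(pi_hat)."""
    return c * (1.0 - c) / (1.0 - 2.0 * c) ** 2


def nested_variances(pi: float, q: list[float], n: int) -> list[float]:
    """Variances of the 1-, 2-, ..., j-stage designs built from q[:1], q[:2], ..., q."""
    return [pi * (1.0 - pi) / n + penalty(prod(q[:k])) / n for k in range(1, len(q) + 1)]


v_w, v_ms, v_ak, v_p = nested_variances(0.2, [0.3, 0.5, 0.5, 0.8], 1000)
print(v_w, v_ms, v_ak, v_p)
print(v_w > v_ms > v_ak > v_p)
```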

5.3. Practical guidelines for applying the suggested model in real-world scenarios

To estimate the proportion of individuals with a sensitive trait using the proposed model in real-world settings, for example with a four-stage random tool, we recommend the following practical guidelines:

  • The survey administrator customizes the random device to an appropriate number of stages (j), based on an assessment of real-world circumstances, to balance efficiency and simplicity (j = 4 in this case).
  • For the customized random device, the survey administrator sets a probability for selecting a delicate statement in each stage () where .
  • A suitable random sample of ‘n’ participants is chosen.
  • At the beginning of the trial, a concise overview is provided, outlining the entire process and emphasizing the design’s focus on safeguarding privacy.
  • Each participant receives ‘Yes’ and ‘No’ cards, along with a four-stage random device.
  • They are instructed to pick a card based on the random device’s outcome and their actual status regarding the sensitive attribute.
  • Depending on the random device’s result, the process may end at any stage (g = 1,2,3,4).
  • Participants discreetly place their chosen card into a container without disclosing to the interviewer their selection or at which stage the process has ended.
  • The proportion of individuals with the sensitive trait and the variance of its estimator are estimated from the sample outcomes using Eqs. (25) and (27).
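The procedure above can be rehearsed by simulation before fieldwork. The sketch below is a hypothetical Monte Carlo run, not the authors’ software. It assumes that q_g is the probability that stage g does not ask the direct question (at the final Warner stage, the probability of the statement “I do not have D”). It uses the estimator π̂ = (α̂ − c)/(1 − 2c) and the variance estimate α̂(1 − α̂)/((n − 1)(1 − 2c)²), with c = q_1 q_2 ⋯ q_j, which are derived under that same assumption. The names `simulate_answer`, `run_survey` and the seed are illustrative.

```python
import random
from math import prod


def simulate_answer(has_d: bool, q: list[float], rng: random.Random) -> bool:
    """One pass through the hypothetical j-stage device; True means a 'Yes' card."""
    for q_g in q[:-1]:
        if rng.random() >= q_g:  # probability 1 - q_g: direct question at this stage
            return has_d
    # Final Warner stage: "I have D" w.p. 1 - q_j, "I do not have D" w.p. q_j.
    return has_d if rng.random() >= q[-1] else not has_d


def run_survey(pi: float, q: list[float], n: int, seed: int = 2025) -> tuple[float, float]:
    """Simulate n interviewees; return (pi_hat, unbiased variance estimate)."""
    rng = random.Random(seed)
    yes = sum(simulate_answer(rng.random() < pi, q, rng) for _ in range(n))
    alpha_hat = yes / n
    c = prod(q)
    pi_hat = (alpha_hat - c) / (1.0 - 2.0 * c)
    var_hat = alpha_hat * (1.0 - alpha_hat) / ((n - 1) * (1.0 - 2.0 * c) ** 2)
    return pi_hat, var_hat


pi_hat, var_hat = run_survey(pi=0.3, q=[0.5, 0.5, 0.5, 0.8], n=20_000)
print(f"pi_hat = {pi_hat:.4f}, standard error = {var_hat ** 0.5:.4f}")
```

The interviewer only ever sees the count of “Yes” cards. The stage at which each interviewee stopped is never recorded, which mirrors the container step in the guidelines.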

6. Discussion

The efficiency comparisons indicate that the suggested RR model with a four-stage random tool is a more efficient alternative to the models of Warner [1], Mangat & Singh [5], and Aboalkhair [20]. In turn, Aboalkhair’s model is a more efficient alternative to the models of Warner and Mangat & Singh, and Mangat & Singh’s model is a more efficient alternative to Warner’s.

Setting each of the probabilities to 0.1 appears to give the highest efficiency but the weakest basis for privacy protection. Conversely, setting each of them to 0.9 appears to give the strongest privacy protection at the cost of the lowest efficiency. Hence, choosing any of the combinations (0.5, 0.5, 0.5, 0.8), (0.5, 0.5, 0.8, 0.5), (0.5, 0.8, 0.5, 0.5) or (0.8, 0.5, 0.5, 0.5) for the four probabilities is a reasonable compromise. This choice makes both the privacy and the efficiency of the suggested model comparable to those of Warner’s model under a corresponding choice of its design probability. It is also more likely to achieve the desired goal of posing the sensitive question without making interviewees overly suspicious, which encourages their cooperation.
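A quick arithmetic check helps explain why these four combinations behave alike. The sketch assumes that q_g is the probability that stage g does not ask the direct question, so the overall probability of the complementary statement is c = q_1 q_2 q_3 q_4. Under that assumption, every listed combination gives c = 0.1. That is the same design probability, and hence the same variance, as a single Warner device showing “I do not have D” with probability 0.1. The two extremes give c = 0.0001 (nearly direct questioning) and c = 0.6561. The `penalty` name is hypothetical.

```python
from itertools import permutations
from math import prod


def penalty(c: float) -> float:
    """Randomization part of n * V(pi_hat); smaller means more efficient."""
    return c * (1.0 - c) / (1.0 - 2.0 * c) ** 2


combos = sorted(set(permutations([0.5, 0.5, 0.5, 0.8])))
for q in combos:
    print(q, round(prod(q), 12), round(penalty(prod(q)), 6))

# Extremes discussed in the text: all probabilities 0.1 versus all 0.9.
print(round(penalty(0.1 ** 4), 6), round(penalty(0.9 ** 4), 6))
```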

From Fig 5, it can be deduced that the suggested model is more efficient than the models of Mangat & Singh [5] and Aboalkhair [20] when , regardless of the value of . Also, when and , the efficiency of the suggested model equals that of Warner’s model at , Mangat & Singh’s model at , and Aboalkhair’s model at . These outcomes confirm the core idea of the generalized model: efficiency can be increased by using a larger number of random devices while assigning a low probability to choosing the sensitive question.

Fig 5. (a, b) The variances of the proposed estimator, Aboalkhair’s estimator, Mangat & Singh’s estimator, and Warner’s estimator at selected values of q_1, q_2, q_3 and q_4.

https://doi.org/10.1371/journal.pone.0319780.g005

7. Limitations and future research

A possible limitation of the suggested model is that, as a general framework for some earlier models and a generator of new efficient models, it assumes that respondents answer with complete honesty. For highly sensitive matters, however, respondents may not be fully truthful. This opens an avenue for future research: revising the model to accommodate incomplete truthfulness, which would make it more suitable for accurately estimating the prevalence of extremely sensitive characteristics.

References

  1. Warner SL. Randomized response: A survey technique for eliminating evasive answer bias. J Am Stat Assoc. 1965;60(309):63–9.
  2. Greenberg BG, Abul-Ela A-LA, Simmons WR, Horvitz DG. The unrelated question randomized response model: Theoretical framework. J Am Stat Assoc. 1969;64(326):520–39.
  3. Moors JJA. Optimization of the unrelated question randomized response model. J Am Stat Assoc. 1971;66(335):627–9.
  4. Raghavarao D. On an estimation problem in Warner’s randomized response technique. Biometrics. 1978;34(1):87.
  5. Mangat NS, Singh R. An alternative randomized response procedure. Biometrika. 1990;77(2):439–42.
  6. Kuk AYC. Asking sensitive questions indirectly. Biometrika. 1990;77(2):436–8.
  7. Mangat NS. An improved randomized response strategy. J R Stat Soc B. 1994;56:93–95.
  8. Singh S, Singh R, Mangat NS. Some alternative strategies to Moors’ model in randomized response sampling. J Stat Plan Inference. 2000;83(1):243–55.
  9. Bhargava M, Singh R. A modified randomization device for Warner’s model. Statistica. 2000;60:315–22.
  10. Singh S, Horn S, Singh R, Mangat NS. On the use of modified randomization device for estimating the prevalence of a sensitive attribute. Stat Transit. 2003;6:515–22.
  11. Huang K. A survey technique for estimating the proportion and sensitivity in a dichotomous finite population. Statistica Neerlandica. 2004;58(1):75–82.
  12. Gupta S, Shabbir J, Iembo R. Modifications to Warner’s model using blank cards. Am J Math Manag Sci. 2006;26(1–2):185–96.
  13. Gjestvang CR, Singh S. A new randomized response model. J R Stat Soc B. 2006;68(3):523–30.
  14. Perri PF. Modified randomized devices for Simmons’ model. MAS. 2008;3(3):233–9.
  15. Batool F, Shabbir J, Hussain Z. An improved binary randomized response model using six decks of cards. Commun Stat Simul Comput. 2016;46(4):2548–62.
  16. Singh HP, Gorey SM. A new efficient unrelated randomized response model. Commun Stat Theory Methods. 2017;46(24):12059–74.
  17. Narjis G, Shabbir J. Estimation of population proportion and sensitivity level using optional unrelated question randomized response techniques. Commun Stat Simul Comput. 2018;49(12):3212–26.
  18. Singh GN, Suman S. A modified two-stage randomized response model for estimating the proportion of stigmatized attribute. J Appl Stat. 2018;46(6):958–78.
  19. Aboalkhair AM, Zayed MA, Al-Nefaie AH, Alrawad M, Elshehawey AM. A novel efficient randomized response model designed for attributes of utmost sensitivity. Heliyon. 2024;10(20):e39082. pmid:39640826
  20. Aboalkhair AM, Elshehawey AM, Zayed MA. A new improved randomized response model with application to compulsory motor insurance. Heliyon. 2024;10(5):e27252. pmid:38486730
  21. Anderson H. Estimation of a proportion through randomized response. Int Stat Rev. 1976;44(2):213.
  22. Lanke J. On the degree of protection in randomized interviews. Int Stat Rev. 1976;44(2):197.
  23. Leysieffer FW, Warner SL. Respondent jeopardy and optimal designs in randomized response models. J Am Stat Assoc. 1976;71(355):649–56.
  24. Zhimin H, Zaizai Y. Measure of privacy in randomized response model. Qual Quant. 2012;46(4):1167–80.