Abstract
This paper suggests an innovative randomized response model that uses a customizable random tool. The suggested model offers a general framework that includes some pioneering randomized response models as special cases and generates new efficient models. Theoretical and numerical comparisons between one of these newly generated models and other pioneering models show that the new model is more efficient. The ethical considerations and privacy protection of the suggested model are also examined.
Citation: Aboalkhair AM, Zayed MA, Elbayoumi T, Alnefaie A, Alrawad M, Elshehawey AM (2025) An innovative randomized response model based on a customizable random tool. PLoS ONE 20(4): e0319780. https://doi.org/10.1371/journal.pone.0319780
Editor: Sara Hemati, SKUMS: Shahrekord University of Medical Science, IRAN, ISLAMIC REPUBLIC OF
Received: November 4, 2024; Accepted: February 9, 2025; Published: April 18, 2025
Copyright: © 2025 Aboalkhair et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This research was funded through the annual funding track by the Deanship of Scientific Research, from the vice presidency for graduate studies and scientific research, King Faisal University, Saudi Arabia [KFU250553].
Competing interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
1. Introduction
In sample surveys, individuals facing an interviewer may prefer to withhold information or give false answers to certain questions. Examples include drug use, psychiatric conditions, infidelity, delinquency, criminal abortion, illegitimacy, and even political party affiliation. Evasive response bias can be difficult to evaluate. Warner [1] proposed a way to reduce this bias: protecting the interviewee's privacy through the randomized response technique (RRT). In such setups, individuals keep their personal information private by responding through a random mechanism, which helps address evasive response bias.
The RRT in surveys aims to minimize or avoid response errors when individuals are questioned about sensitive topics. The basic idea behind a randomized response design is to collect information indirectly, by asking questions whose answers the interviewer cannot know with certainty. When the RRT is used, interviewees are therefore expected to provide truthful information that can aid in estimation.
Warner's technique makes it possible to collect answers on sensitive matters while preserving anonymity, but its estimates have a larger standard error because of the random tool. Following Warner's initial proposal, many researchers have extended the RRT in different directions. Their focus has consistently been on reducing estimation variance and improving model efficiency. They pursue this through several means, such as choosing parameters by variance-minimizing criteria and using alternative estimation methods, but primarily by proposing design modifications to Warner's original model.
Design modification is the primary approach most studies take to improve the efficiency of the RRT. Several authors have suggested different modifications to Warner's model with the goal of enhancing its efficiency [2–19].
Aboalkhair et al. [20] introduced an innovative and efficient model through a design modification based on three randomizing devices. They showed that their method is an efficient alternative to the models introduced by Mangat & Singh and by Warner. The study established that a multi-stage randomization instrument, especially one with more stages, raises the chance of the sensitive question being selected. It does so without significantly affecting interviewees' trust in the tool or their honesty, and thereby enhances the effectiveness of the RR technique. The present research builds on this previous work, and its aim is to create a generalized randomized response model.
2. Previous pioneering models
2.1. Warner’s model
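The groundbreaking RR model was suggested by Warner [1] to estimate the proportion π of individuals who possess a sensitive attribute. With suitable changes of notation, Warner's estimator of π is:
$$\hat{\pi}_W = \frac{\hat{\lambda} - (1-P_1)}{2P_1 - 1}, \qquad P_1 \neq \tfrac{1}{2}, \tag{1}$$
where $\hat{\lambda}$ is the proportion of “Yes” answers in a random sample of $n$ interviewees and $P_1$ is the probability that the randomizing device selects the sensitive statement. Its variance is:
$$V(\hat{\pi}_W) = \frac{\pi(1-\pi)}{n} + \frac{P_1(1-P_1)}{n(2P_1-1)^2}. \tag{2}$$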
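Eqs. (1) and (2) translate directly into code. The following is a minimal Python sketch. It assumes the notation above, where $P_1$ is the probability that Warner's device selects the sensitive statement. The function names and the survey counts in the usage lines are hypothetical and chosen only for illustration.

```python
def warner_estimate(lam_hat: float, p1: float) -> float:
    """Warner's estimator of pi, Eq. (1); p1 = P(device selects the sensitive statement)."""
    if p1 == 0.5:
        raise ValueError("P1 = 0.5 makes Warner's estimator undefined")
    return (lam_hat - (1.0 - p1)) / (2.0 * p1 - 1.0)


def warner_variance(pi: float, p1: float, n: int) -> float:
    """Variance of Warner's estimator, Eq. (2)."""
    return pi * (1 - pi) / n + p1 * (1 - p1) / (n * (2 * p1 - 1) ** 2)


# Hypothetical survey: 420 'Yes' answers out of n = 1000, with P1 = 0.7.
n = 1000
lam_hat = 420 / n
print(round(warner_estimate(lam_hat, 0.7), 4))   # 0.3
print(warner_variance(0.3, 0.7, n))              # sampling variance at pi = 0.3
```

The second term of Eq. (2) is the price paid for privacy. It grows without bound as $P_1$ approaches 0.5.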
2.2. Mangat & Singh’s model
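Mangat & Singh [5] introduced an efficient two-stage RR model. Each interviewee answers the sensitive question directly with probability $P_1$ and otherwise uses Warner's device, which presents the sensitive statement with probability $P_2$. With this notation, the estimate of π in Mangat & Singh's design is:
$$\hat{\pi}_{MS} = \frac{\hat{\lambda} - (1-P_1)(1-P_2)}{2P_2 - 1 + 2P_1(1-P_2)}, \tag{3}$$
with variance given by:
$$V(\hat{\pi}_{MS}) = \frac{\pi(1-\pi)}{n} + \frac{(1-P_1)(1-P_2)\left[1-(1-P_1)(1-P_2)\right]}{n\left[2P_2-1+2P_1(1-P_2)\right]^2}. \tag{4}$$
Mangat & Singh [5] demonstrated that, for appropriately selected feasible values of $P_1$ and $P_2$, their model outperforms Warner's original model.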
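To make the comparison with Warner's model concrete, the sketch below evaluates Eqs. (2) and (4) for one hypothetical setting: π = 0.3 and n = 1000, with Warner's $P_1$ = 0.7 against Mangat & Singh's $P_1$ = 0.3 and $P_2$ = 0.7. The function names are illustrative. This is a numerical check under assumed values, not a reproduction of Mangat & Singh's analysis.

```python
def warner_variance(pi: float, p: float, n: int) -> float:
    """Eq. (2): Warner's variance with design probability p."""
    return pi * (1 - pi) / n + p * (1 - p) / (n * (2 * p - 1) ** 2)


def ms_variance(pi: float, p1: float, p2: float, n: int) -> float:
    """Eq. (4): Mangat & Singh variance.

    p1: probability of the direct sensitive question at stage one;
    p2: probability of the sensitive statement on Warner's device at stage two.
    """
    b = (1 - p1) * (1 - p2)                  # P(Yes | not D): reach stage two, draw the complement
    denom = 2 * p2 - 1 + 2 * p1 * (1 - p2)   # algebraically equal to 1 - 2b
    return pi * (1 - pi) / n + b * (1 - b) / (n * denom ** 2)


pi, n = 0.3, 1000
v_w = warner_variance(pi, 0.7, n)
v_ms = ms_variance(pi, 0.3, 0.7, n)
print(f"Warner: {v_w:.6f}  Mangat & Singh: {v_ms:.6f}")
```

Under this notation, suppose the Warner-device probability exceeds 0.5 and is the same in both models. The Mangat & Singh variance is then always the smaller one. The product $(1-P_1)(1-P_2)$ lies below $1-P_2 < 1/2$, and $x(1-x)/(1-2x)^2$ increases on $(0, 1/2)$.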
2.3. Aboalkhair’s model
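Aboalkhair [20] suggested an efficient model that uses a three-stage random tool. The sensitive question is presented directly with probability $P_1$ at the first stage and $P_2$ at the second stage. The third stage is a Warner device that presents the sensitive statement with probability $P_3$. Using a random sample of $n$ interviewees, Aboalkhair's estimate of π and its variance, with appropriate changes of notation, are:
$$\hat{\pi}_{AK} = \frac{\hat{\lambda} - (1-P_1)(1-P_2)(1-P_3)}{1 - 2(1-P_1)(1-P_2)(1-P_3)}, \tag{5}$$
and
$$V(\hat{\pi}_{AK}) = \frac{\pi(1-\pi)}{n} + \frac{(1-P_1)(1-P_2)(1-P_3)\left[1-(1-P_1)(1-P_2)(1-P_3)\right]}{n\left[1-2(1-P_1)(1-P_2)(1-P_3)\right]^2}. \tag{6}$$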
In the following section, we propose a generalized version of Aboalkhair's model. It includes previous models, such as those suggested by Warner and by Mangat & Singh, as special cases, and new efficient RR models can be generated from it.
3. The suggested model
3.1. Model description
To estimate π, the proportion of individuals in a specific population who have a sensitive attribute (D), each interviewee in a selected random sample is given a customizable j-stage random tool, as depicted in Fig 1. At the first stage (S1), the interviewee randomly chooses between two options. The first option is a yes/no question asking whether he/she has the sensitive attribute. The second option instructs him/her to proceed to the next stage. At each intermediate stage Si (i = 2, …, j−1), the interviewee faces the same choice. If the interviewee reaches the final stage (Sj), he/she is given a yes/no question about the sensitive attribute through a randomizing device, as in Warner's original model.
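The probability of a “Yes” answer (α) is:
$$\begin{aligned}
\alpha &= \pi\sum_{g=1}^{j-1}P_g\prod_{k=1}^{g-1}(1-P_k) + \prod_{k=1}^{j-1}(1-P_k)\left[P_j\pi + (1-P_j)(1-\pi)\right]\\
&= \pi\left[1-2\prod_{g=1}^{j}(1-P_g)\right] + \prod_{g=1}^{j}(1-P_g),
\end{aligned}\tag{7}$$
where:
$P_g$: the probability that the question asking whether the interviewee has the sensitive attribute appears at stage g, g = 1, 2, …, j, with $0 < P_g < 1$. At the final stage, $P_j$ is the probability that Warner's device presents the sensitive statement rather than its complement, and an empty product equals 1.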
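A quick way to check the simplification in Eq. (7) is to compute α stage by stage, following an interviewee's path through the tool, and compare the result with the closed form. The sketch assumes the parameterization described above. The list `probs` holds $P_1, \dots, P_j$, and its last entry is the Warner-device probability. The function names are hypothetical.

```python
import math
from typing import Sequence


def alpha_stagewise(pi: float, probs: Sequence[float]) -> float:
    """P(Yes) accumulated stage by stage for a j-stage tool.

    probs[g - 1] is P_g; the last entry belongs to the Warner device at stage j.
    """
    reach = 1.0   # probability of reaching the current stage
    alpha = 0.0
    for p in probs[:-1]:
        alpha += reach * p * pi   # direct sensitive question, answered truthfully
        reach *= 1.0 - p          # interviewee is sent on to the next stage
    p_last = probs[-1]
    # Final stage: sensitive statement w.p. P_j, its complement otherwise.
    alpha += reach * (p_last * pi + (1.0 - p_last) * (1.0 - pi))
    return alpha


def alpha_closed_form(pi: float, probs: Sequence[float]) -> float:
    """Eq. (7): alpha = pi * (1 - 2B) + B, with B = prod(1 - P_g)."""
    b = math.prod(1.0 - p for p in probs)
    return pi * (1.0 - 2.0 * b) + b


for design in ([0.7], [0.3, 0.7], [0.2, 0.4, 0.6], [0.5, 0.5, 0.5, 0.8]):
    assert math.isclose(alpha_stagewise(0.25, design), alpha_closed_form(0.25, design))
```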
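The estimator of π is as follows:
$$\hat{\pi}_P = \frac{\hat{\lambda} - \prod_{g=1}^{j}(1-P_g)}{1 - 2\prod_{g=1}^{j}(1-P_g)}, \tag{8}$$
where $\hat{\lambda}$ denotes the proportion of ‘yes’ answers in the sample and $\prod_{g=1}^{j}(1-P_g) \neq \tfrac{1}{2}$.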
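Setting j = 1 in Eq. (8) gives
$$\hat{\pi}_P = \frac{\hat{\lambda} - (1-P_1)}{2P_1 - 1},$$
which coincides with Eq. (1). Setting j = 2 gives
$$\hat{\pi}_P = \frac{\hat{\lambda} - (1-P_1)(1-P_2)}{2P_2 - 1 + 2P_1(1-P_2)},$$
which coincides with Eq. (3), and setting j = 3 gives Eq. (5).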
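The reductions of Eq. (8) to Eqs. (1) and (3) can be confirmed numerically. This minimal sketch implements Eq. (8) for any number of stages in a hypothetical `pi_hat` function. It then compares the result with the Warner and Mangat & Singh formulas at illustrative parameter values.

```python
import math
from typing import Sequence


def pi_hat(lam_hat: float, probs: Sequence[float]) -> float:
    """Eq. (8): estimator of pi for a j-stage tool with P_g = probs[g - 1]."""
    b = math.prod(1.0 - p for p in probs)
    if math.isclose(b, 0.5):
        raise ValueError("prod(1 - P_g) = 1/2 makes the estimator undefined")
    return (lam_hat - b) / (1.0 - 2.0 * b)


lam = 0.46   # hypothetical observed proportion of 'Yes' answers

# j = 1 reproduces Warner's estimator, Eq. (1)
assert math.isclose(pi_hat(lam, [0.8]), (lam - (1 - 0.8)) / (2 * 0.8 - 1))

# j = 2 reproduces the Mangat & Singh estimator, Eq. (3)
p1, p2 = 0.3, 0.8
ms = (lam - (1 - p1) * (1 - p2)) / (2 * p2 - 1 + 2 * p1 * (1 - p2))
assert math.isclose(pi_hat(lam, [p1, p2]), ms)
print(pi_hat(lam, [0.2, 0.4, 0.6, 0.8]))   # a four-stage design, for comparison
```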
3.2. Properties of the suggested estimator
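Since the number of “Yes” answers in the sample follows a binomial distribution with parameters n and α, $E(\hat{\lambda}) = \alpha$, so $\hat{\lambda}$ is an unbiased estimator of α. By Eqs. (7) and (8), $E(\hat{\pi}_P) = \pi$, so $\hat{\pi}_P$ is an unbiased estimator of π. Its variance is given by the following theorem.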
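Theorem 1. The variance of the proposed estimator is
$$V(\hat{\pi}_P) = \frac{\pi(1-\pi)}{n} + \frac{\prod_{g=1}^{j}(1-P_g)\left[1-\prod_{g=1}^{j}(1-P_g)\right]}{n\left[1-2\prod_{g=1}^{j}(1-P_g)\right]^2}. \tag{9}$$
Proof. Using Eq. (8),
$$V(\hat{\pi}_P) = \frac{V(\hat{\lambda})}{\left[1-2\prod_{g=1}^{j}(1-P_g)\right]^2}. \tag{10}$$
As
$$V(\hat{\lambda}) = \frac{\alpha(1-\alpha)}{n}, \tag{11}$$
substituting Eq. (11) into Eq. (10) gives
$$V(\hat{\pi}_P) = \frac{\alpha(1-\alpha)}{n\left[1-2\prod_{g=1}^{j}(1-P_g)\right]^2}. \tag{12}$$
Using Eq. (7), $\alpha(1-\alpha)$ can be calculated as
$$\alpha(1-\alpha) = \pi(1-\pi)\left[1-2\prod_{g=1}^{j}(1-P_g)\right]^2 + \prod_{g=1}^{j}(1-P_g)\left[1-\prod_{g=1}^{j}(1-P_g)\right]. \tag{13}$$
Eq. (9) is then obtained by substituting Eq. (13) into Eq. (12).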
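For readers checking the proof, the algebra behind Eq. (13) can be written out as follows. Here $B = \prod_{g=1}^{j}(1-P_g)$ is a shorthand introduced only for this derivation, so that Eq. (7) reads $\alpha = \pi(1-2B) + B$:

$$
\begin{aligned}
1-\alpha &= 1 - B - \pi(1-2B) = (1-\pi)(1-2B) + B,\\
\alpha(1-\alpha) &= \left[\pi(1-2B) + B\right]\left[(1-\pi)(1-2B) + B\right]\\
&= \pi(1-\pi)(1-2B)^2 + B(1-2B)\left[\pi + (1-\pi)\right] + B^2\\
&= \pi(1-\pi)(1-2B)^2 + B(1-B).
\end{aligned}
$$

The last line is Eq. (13). Dividing it by $n(1-2B)^2$, as in Eq. (12), gives Eq. (9).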
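Setting j = 1 in Eq. (9) gives
$$V(\hat{\pi}_P) = \frac{\pi(1-\pi)}{n} + \frac{P_1(1-P_1)}{n(2P_1-1)^2},$$
which coincides with Eq. (2). Setting j = 2 gives
$$V(\hat{\pi}_P) = \frac{\pi(1-\pi)}{n} + \frac{(1-P_1)(1-P_2)\left[1-(1-P_1)(1-P_2)\right]}{n\left[2P_2-1+2P_1(1-P_2)\right]^2},$$
which coincides with Eq. (4), and setting j = 3 gives Eq. (6).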
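Theorem 2. $V(\hat{\pi}_P)$ has an unbiased estimator given by
$$\hat{V}(\hat{\pi}_P) = \frac{\hat{\lambda}(1-\hat{\lambda})}{(n-1)\left[1-2\prod_{g=1}^{j}(1-P_g)\right]^2}. \tag{14}$$
Proof. Take the expected value of Eq. (14) and use $E\left[\hat{\lambda}(1-\hat{\lambda})\right] = \frac{(n-1)}{n}\alpha(1-\alpha)$. The result is Eq. (12).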
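Theorem 2 can also be checked without simulation. Because the number of “Yes” answers is assumed binomial with parameters n and α, the expectation of the estimator in Eq. (14) can be summed exactly over all possible counts and compared with Eq. (9). The sketch below does this for one illustrative design. The function names and parameter values are assumptions made for the example.

```python
import math
from typing import Sequence


def variance(pi: float, probs: Sequence[float], n: int) -> float:
    """Theorem 1, Eq. (9)."""
    b = math.prod(1.0 - p for p in probs)
    return pi * (1 - pi) / n + b * (1 - b) / (n * (1 - 2 * b) ** 2)


def variance_estimate(lam_hat: float, probs: Sequence[float], n: int) -> float:
    """Theorem 2, Eq. (14): lam_hat(1 - lam_hat) / ((n - 1)(1 - 2B)^2)."""
    b = math.prod(1.0 - p for p in probs)
    return lam_hat * (1 - lam_hat) / ((n - 1) * (1 - 2 * b) ** 2)


def expected_variance_estimate(pi: float, probs: Sequence[float], n: int) -> float:
    """Exact E[Eq. (14)] under a Binomial(n, alpha) count of 'Yes' answers."""
    b = math.prod(1.0 - p for p in probs)
    alpha = pi * (1 - 2 * b) + b
    return sum(
        math.comb(n, k) * alpha**k * (1 - alpha) ** (n - k) * variance_estimate(k / n, probs, n)
        for k in range(n + 1)
    )


probs, pi, n = [0.3, 0.4, 0.5], 0.2, 60
print(expected_variance_estimate(pi, probs, n), variance(pi, probs, n))   # equal up to rounding
```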
4. Privacy protection
In the randomized response technique, all types of models are subject to several ethical considerations. Their main purpose is to help researchers balance their need to elicit sensitive information against the ethical treatment of participants. First, respondents must agree to participate voluntarily and explicitly, after being informed about the nature and implementation of the RR technique, its purpose, and their freedom to refuse. The researcher must make the details of data gathering and publication clear by describing the purpose of the study and how its results will be shared.
Researchers also need to consider the possible effects of asking sensitive questions and to minimize any associated harm or distress to participants. All studies using randomized response techniques must receive ethical approval from institutional review boards or ethics committees. This ensures that the research meets ethical standards that protect the rights and welfare of participants. Individual responses must remain anonymous and untraceable, with particular attention to protecting respondents' privacy.
4.1. Privacy protection measure
One of the fundamental aspects of the randomized response technique is preserving interviewees' privacy. Several privacy measures have been suggested, for example by Anderson [21], Lanke [22], Leysieffer and Warner [23], and Zhimin & Zaizai [24]. Based on the latter approach, the privacy measure for Warner's model is:
Also, the measure of privacy protection for Mangat & Singh’s model can be expressed as follows:
and for Aboalkhair’s model:
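For the suggested model, the design probabilities are:
$$P(\text{Yes} \mid D) = 1 - \prod_{g=1}^{j}(1-P_g)$$
and
$$P(\text{Yes} \mid \bar{D}) = \prod_{g=1}^{j}(1-P_g).$$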
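The design probabilities above also show how efficiency and privacy pull in opposite directions. As an illustration only, and not the Zhimin & Zaizai measure of Eq. (19), the sketch computes $P(\text{Yes} \mid D)$, $P(\text{Yes} \mid \bar{D})$, and the posterior probability $P(D \mid \text{Yes})$. The posterior is the kind of respondent-jeopardy quantity considered by Leysieffer and Warner [23]. The function names and the two compared designs are hypothetical.

```python
import math
from typing import Sequence, Tuple


def design_probabilities(probs: Sequence[float]) -> Tuple[float, float]:
    """Return (P(Yes | D), P(Yes | not D)) for the j-stage tool."""
    b = math.prod(1.0 - p for p in probs)   # reach the last stage and draw the complement
    return 1.0 - b, b


def posterior_given_yes(pi: float, probs: Sequence[float]) -> float:
    """P(D | Yes) by Bayes' rule: an illustrative jeopardy quantity, not Eq. (19)."""
    yes_d, yes_not_d = design_probabilities(probs)
    return pi * yes_d / (pi * yes_d + (1 - pi) * yes_not_d)


for design in ([0.7], [0.5, 0.5, 0.5, 0.8]):   # hypothetical Warner vs four-stage design
    print(design, round(posterior_given_yes(0.1, design), 4))
```

With π = 0.1, the four-stage design (0.5, 0.5, 0.5, 0.8) gives $P(D \mid \text{Yes}) = 0.8125$. A Warner device with $P_1 = 0.7$ gives about 0.206. The more efficient design therefore reveals more about a respondent who answers “Yes”.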
Then, the privacy protection measure is:
where
To verify the validity of Eq. (19), set j = 1, j = 2, and j = 3. This yields the protection measures for Warner's, Mangat & Singh's, and Aboalkhair's estimates given by Eqs. (16), (17), and (18), respectively.
A relationship can be established between the privacy protection measure discussed above and the efficiency of each of the four models. These relationships are as follows:
It is clear from Eqs. (20–23) that as the values of the privacy measure decrease, the efficiency of the corresponding estimator
also decreases. Moreover, Zhimin & Zaizai [24] demonstrated that interviewees receive a higher level of privacy protection when their privacy measure takes smaller values. A balance between efficiency and privacy is therefore required.
5. Suggested RR model with a four-stage random tool
To give the suggested j-stage model a concrete form, consider the case where the number of stages is set to four (j = 4). At the initial stage (S1), the interviewee randomly chooses between two options: a yes/no question asking whether he/she has the sensitive attribute, or an instruction to proceed to the next stage. At each of stages S2 and S3, he/she faces the same choice as at the first stage. If the interviewee reaches the final stage (S4), he/she is given a yes/no question about the sensitive attribute through a randomizing device, as in Warner's original model.
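The probability of a “Yes” response (α) is:
$$\alpha = \pi\left[1 - 2(1-P_1)(1-P_2)(1-P_3)(1-P_4)\right] + (1-P_1)(1-P_2)(1-P_3)(1-P_4),$$
where:
$P_g$: the probability that the question asking whether the interviewee has the sensitive attribute appears at stage g, g = 1, 2, 3, 4, with $0 < P_g < 1$.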
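The suggested estimator of π is:
$$\hat{\pi}_P = \frac{\hat{\lambda} - (1-P_1)(1-P_2)(1-P_3)(1-P_4)}{1 - 2(1-P_1)(1-P_2)(1-P_3)(1-P_4)},$$
where $\hat{\lambda}$ is the proportion of ‘yes’ answers in the sample.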
5.1. Properties of the estimator
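Corollary 1. The variance of the proposed estimator is
$$V(\hat{\pi}_P) = \frac{\pi(1-\pi)}{n} + \frac{(1-P_1)(1-P_2)(1-P_3)(1-P_4)\left[1-(1-P_1)(1-P_2)(1-P_3)(1-P_4)\right]}{n\left[1-2(1-P_1)(1-P_2)(1-P_3)(1-P_4)\right]^2}.$$
Corollary 2. $V(\hat{\pi}_P)$ has an unbiased estimator given by
$$\hat{V}(\hat{\pi}_P) = \frac{\hat{\lambda}(1-\hat{\lambda})}{(n-1)\left[1-2(1-P_1)(1-P_2)(1-P_3)(1-P_4)\right]^2}.$$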
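A worked example with hypothetical numbers shows how the four-stage estimator and the Corollary 2 variance estimator are computed in practice. The design (0.5, 0.5, 0.5, 0.8) and the count of 140 “Yes” answers out of n = 500 are invented for illustration and are not data from this study.

```python
import math

# Hypothetical design and survey outcome (illustrative values only).
probs = (0.5, 0.5, 0.5, 0.8)   # P_1, P_2, P_3, P_4
n, yes_count = 500, 140

b = math.prod(1.0 - p for p in probs)   # (1-P1)(1-P2)(1-P3)(1-P4) = 0.025
lam_hat = yes_count / n                 # observed proportion of 'Yes' answers
pi_hat = (lam_hat - b) / (1.0 - 2.0 * b)                             # four-stage estimator
var_hat = lam_hat * (1 - lam_hat) / ((n - 1) * (1 - 2 * b) ** 2)    # Corollary 2

print(f"pi_hat = {pi_hat:.4f}, estimated s.e. = {math.sqrt(var_hat):.4f}")
```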
5.2. Efficiency comparison
Here, we identify the circumstances in which the suggested estimator with a four-stage random tool outperforms the estimators suggested by Warner, Mangat & Singh, and Aboalkhair.
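The suggested model is more efficient than Warner's model iff:
$$V(\hat{\pi}_P) < V(\hat{\pi}_W).$$
This can be achieved by selecting appropriate values for the four stage probabilities of the suggested tool while keeping a suitable, practicable value for the design probability of Warner's device.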
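Every estimator compared in this subsection has a variance of the form of Eq. (9). The three efficiency conditions can therefore be reduced to a single inequality. The derivation below is a sketch under that observation. $B_P = (1-P_1)(1-P_2)(1-P_3)(1-P_4)$, $B_C$, and $f$ are shorthand introduced here and are not the paper's notation. $B_C$ is the analogous product for the competing model: $1-P$ for a Warner device with design probability $P$, and the product of two or three complements of stage probabilities for the Mangat & Singh and Aboalkhair models. The common term $\pi(1-\pi)/n$ cancels, and $x(1-x) = \left[1-(1-2x)^2\right]/4$:

$$
\begin{aligned}
V(\hat{\pi}_P) < V(\hat{\pi}_C) &\iff f(B_P) < f(B_C), \qquad f(x) = \frac{x(1-x)}{(1-2x)^2},\\
f(x) &= \frac{1-(1-2x)^2}{4(1-2x)^2} = \frac{1}{4(1-2x)^2} - \frac{1}{4},\\
\text{hence}\quad V(\hat{\pi}_P) < V(\hat{\pi}_C) &\iff \lvert 1-2B_P\rvert > \lvert 1-2B_C\rvert.
\end{aligned}
$$

For instance, a Warner device with $P = 0.7$ has $\lvert 1-2B_C\rvert = 0.4$. Any four-stage design with $B_P < 0.3$ is then more efficient.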
The difference in efficiency between the suggested estimator (P) and Warner’s estimator (W) across feasible values and varying
and
values is illustrated in Fig 2a–2d. The vertical axis indicates efficiency difference, and the other two axes indicate the values of
and
. Parts a, b, c, and d of the figure are for
respectively, and for practical values of
(less than 0.5). Positive values indicate a clear advantage in favor of the suggested estimator in all cases.
It can be noted that (Fig 2a–2d):
- The estimates from the suggested model are more efficient than those of Warner for
and all values of the probabilities
.
- When
increases from 0.1 to 0.4 and fixing
, the efficiency difference between the suggested estimate and that of Warner’s also increases.
- When any three of the four probabilities
are fixed, the efficiency difference between the suggested estimate and that of Warner increases as the fourth probability decreases from 0.9 to 0.1. This is mainly because the efficiency of Warner's estimate is fixed, whereas that of the suggested estimate always decreases when any of the
decreases.
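The suggested model is more efficient than Mangat & Singh's model (MS) if:
$$V(\hat{\pi}_P) < V(\hat{\pi}_{MS}).$$
This can be achieved by selecting appropriate values for the four stage probabilities of the suggested tool while keeping suitable, practicable values for the two design probabilities of Mangat & Singh's model.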
The difference in efficiency between the suggested estimator (P) and Mangat & Singh’s estimator (MS) across feasible values and varying
and
values is illustrated in Fig 3 (a–d). Same settings as in Fig 2 are used for
. Positive values indicate the advantage of the suggested estimator in terms of efficiency.
It can be noted that (Fig 3a–3d):
- The estimates from the suggested model are more efficient than those of MS for all values of
,
and at practicable values (0.1 to 0.4) of
,
.
- When fixing
and any two of the three probabilities
, the efficiency difference between the suggested estimate and that of MS increases as the fourth probability increases from 0.1 to 0.4.
- When fixing
,
and any of the remaining two probabilities
, the efficiency difference between the suggested estimate and that of MS increases as the fourth probability decreases from 0.9 to 0.1. This is mainly because the variance of the MS estimate is fixed, whereas that of the suggested estimate always decreases when any of the
decreases.
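The suggested estimator is more efficient than Aboalkhair's estimator iff:
$$V(\hat{\pi}_P) < V(\hat{\pi}_{AK}).$$
This can be achieved by selecting appropriate values for the four stage probabilities of the suggested tool while keeping suitable, practicable values for the three design probabilities of Aboalkhair's model.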
The difference in efficiency between the suggested estimator (P) and Aboalkhair’s estimator (AK) across feasible values and varying
and
values is illustrated in Fig 4a–4d. In this comparison as well, all differences were positive in favor of the proposed estimator.
It can be noted that (Fig 4a–4d):
- The estimates from the generalized suggested model are more efficient than those of AK for all values of
and at practicable values (0.1 to 0.4) of
,
.
- When fixing
and any two of the three probabilities
, the efficiency difference between the suggested estimate and that of AK increases as the fourth probability increases from 0.1 to 0.4.
- The efficiency difference between the suggested estimate and that of AK always increases as
decreases and fixing all other probabilities
.
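Every estimator compared in this section has variance $\pi(1-\pi)/n + B(1-B)/[n(1-2B)^2]$, where $B$ is the model-specific product of the complements of its stage probabilities. The comparison therefore comes down to $\lvert 1-2B\rvert$: the larger it is, the more efficient the estimator. The sketch below checks this shortcut against a direct comparison of variances. It uses one hypothetical four-stage design and one hypothetical configuration of each competitor. The designs, the names, and π = 0.2 with n = 1000 are illustrative assumptions, not the settings used in Figs 2–4.

```python
import math
from typing import Dict, List, Sequence


def device_term(probs: Sequence[float]) -> float:
    """B = prod(1 - P_g); a Warner device on its own is the one-stage case [P]."""
    return math.prod(1.0 - p for p in probs)


def variance(pi: float, probs: Sequence[float], n: int) -> float:
    """Common variance form: pi(1-pi)/n + B(1-B) / (n (1-2B)^2)."""
    b = device_term(probs)
    return pi * (1 - pi) / n + b * (1 - b) / (n * (1 - 2 * b) ** 2)


def beats(proposed: Sequence[float], competitor: Sequence[float]) -> bool:
    """Shortcut: the design with the larger |1 - 2B| has the smaller variance."""
    return abs(1 - 2 * device_term(proposed)) > abs(1 - 2 * device_term(competitor))


proposed = [0.5, 0.5, 0.5, 0.8]           # hypothetical four-stage design
competitors: Dict[str, List[float]] = {   # hypothetical competitor settings
    "Warner": [0.7],
    "Mangat & Singh": [0.3, 0.7],
    "Aboalkhair": [0.3, 0.3, 0.7],
}
for name, design in competitors.items():
    by_rule = beats(proposed, design)
    by_variance = variance(0.2, proposed, 1000) < variance(0.2, design, 1000)
    assert by_rule == by_variance   # the shortcut agrees with the full comparison
    print(f"{name}: suggested design more efficient = {by_rule}")
```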
5.3. Practical guidelines for applying the suggested model in real-world scenarios
The following practical guidelines show how the proportion of individuals with a sensitive trait can be estimated with the proposed model in real-world settings, taking a four-stage random tool as an example:
- The survey administrator customizes the random device to an optimal number of stages (j), based on their assessment of real-world circumstances, to balance efficiency and simplicity (j = 4 in this case).
- For the customized random device, the survey administrator sets a probability $P_g$ of selecting the sensitive statement at each stage (g = 1, 2, 3, 4), where $0 < P_g < 1$.
- A suitable random sample of ‘n’ participants is chosen.
- At the beginning of the trial, a concise overview is provided, outlining the entire process and emphasizing the design’s focus on safeguarding privacy.
- Each participant receives ‘Yes’ and ‘No’ cards, along with a four-stage random device.
- They are instructed to pick a card based on the random device’s outcome and their actual status regarding the sensitive attribute.
- Depending on the random device’s result, the process may end at any stage (g = 1,2,3,4).
- Participants discreetly place their chosen card into a container without disclosing to the interviewer their selection or at which stage the process has ended.
- The proportion of individuals with the sensitive trait and the variance of its estimator are estimated from the sample outcomes using Eqs. (24) and (27).
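The guidelines can be rehearsed in a small simulation before a survey is fielded. The sketch assumes every simulated participant follows the tool honestly, which is the model's complete-truthfulness assumption. It uses a hypothetical true proportion π = 0.15 and the design (0.5, 0.5, 0.5, 0.8), then applies the four-stage estimator and its variance estimator. `respond` and `run_survey` are hypothetical helper names.

```python
import math
import random
from typing import Sequence, Tuple


def respond(has_trait: bool, probs: Sequence[float], rng: random.Random) -> bool:
    """One participant's card under the j-stage tool (True means 'Yes')."""
    for p in probs[:-1]:
        if rng.random() < p:   # direct sensitive question appears at this stage
            return has_trait
    # Final stage: Warner device, sensitive statement w.p. P_j, its complement otherwise.
    if rng.random() < probs[-1]:
        return has_trait
    return not has_trait


def run_survey(true_pi: float, probs: Sequence[float], n: int, seed: int = 1) -> Tuple[float, float]:
    """Simulate n truthful participants; return (pi_hat, estimated variance)."""
    rng = random.Random(seed)
    yes = sum(respond(rng.random() < true_pi, probs, rng) for _ in range(n))
    lam_hat = yes / n
    b = math.prod(1.0 - p for p in probs)
    pi_hat = (lam_hat - b) / (1 - 2 * b)
    var_hat = lam_hat * (1 - lam_hat) / ((n - 1) * (1 - 2 * b) ** 2)
    return pi_hat, var_hat


pi_hat, var_hat = run_survey(true_pi=0.15, probs=[0.5, 0.5, 0.5, 0.8], n=2000)
print(f"pi_hat = {pi_hat:.3f} (s.e. {math.sqrt(var_hat):.3f})")
```

Only the container of cards is observed. The stage at which each participant stopped never enters the estimator, which mirrors the privacy step in the guidelines.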
6. Discussion
Comparisons of efficiency indicate that the suggested RR model with a four-stage random tool offers a more effective substitute for all of Warner [1], Mangat & Singh [5], and Aboalkhair [20] models. Furthermore, Aboalkhair’s model proves to be a more effective substitute for models suggested by Warner and Mangat & Singh. Similarly, Mangat & Singh’s model offers a more effective substitute for Warner’s model.
Setting each of the probabilities to 0.1 appears to be the most efficient choice but the least favorable one for privacy protection. On the other hand, setting each of the probabilities
to 0.9 appears optimal for protecting privacy, because it yields the lowest efficiency. Hence, choosing any of the combinations (0.5, 0.5, 0.5, 0.8), (0.5, 0.5, 0.8, 0.5), (0.5, 0.8, 0.5, 0.5), or (0.8, 0.5, 0.5, 0.5) as values for the probabilities
and
is quite rational. This choice keeps both the privacy and the efficiency of the suggested model comparable to those of Warner's model when
. It is also more likely to achieve the goal of eliciting answers to the sensitive question without making interviewees too suspicious, which encourages their cooperation.
From Fig 5, it can be deduced that the suggested model exhibits superior efficiency compared to those of Mangat & Singh [5] and Aboalkhair [20] when , regardless of the value of
. Also, when
and
, the efficiency of the suggested model is equivalent to that of Warner’s model at
, Mangat & Singh’s model at
, and Aboalkhair’s model at
. These outcomes confirm the core idea of the generalized model: efficiency can be increased by using more random devices while assigning a low probability to choosing the sensitive question.
7. Limitations and future research
A possible limitation of the suggested model is that it serves as a general framework for some earlier models, and as a generator of new efficient models, only when complete truthfulness is assumed. For highly sensitive matters, however, respondents may not be completely truthful. This opens an avenue for future research: revising the model to accommodate incomplete truthfulness, which would make it more suitable for estimating extremely sensitive characteristics.
References
- 1. Warner SL. Randomized response: A survey technique for eliminating evasive answer bias. J Am Stat Assoc. 1965;60(309):63–9.
- 2. Greenberg BG, Abul-Ela A-LA, Simmons WR, Horvitz DG. The unrelated question randomized response model: Theoretical framework. J Am Stat Assoc. 1969;64(326):520–39.
- 3. Moors JJA. Optimization of the unrelated question randomized response model. J Am Stat Assoc. 1971;66(335):627–9.
- 4. Raghavarao D. On an estimation problem in Warner’s randomized response technique. Biometrics. 1978;34(1):87.
- 5. Mangat NS, Singh R. An alternative randomized response procedure. Biometrika. 1990;77(2):439–42.
- 6. Kuk AYC. Asking sensitive questions indirectly. Biometrika. 1990;77(2):436–8.
- 7. Mangat NS. An improved randomized response strategy. J R Stat Soc B. 1994;56:93–5.
- 8. Singh S, Singh R, Mangat NS. Some alternative strategies to Moors’ model in randomized response sampling. J Stat Plan Inference. 2000;83(1):243–55.
- 9. Bhargava M, Singh R. A modified randomization device for Warner’s model. Statistica. 2000;60:315–22.
- 10. Singh S, Horn S, Singh R, Mangat NS. On the use of modified randomization device for estimating the prevalence of a sensitive attribute. Stat Transit. 2003;6:515–22.
- 11. Huang K. A survey technique for estimating the proportion and sensitivity in a dichotomous finite population. Statistica Neerlandica. 2004;58(1):75–82.
- 12. Gupta S, Shabbir J, Iembo R. Modifications to Warner’s model using blank cards. Am J Math Manag Sci. 2006;26(1–2):185–96.
- 13. Gjestvang CR, Singh S. A new randomized response model. J R Stat Soc B. 2006;68(3):523–30.
- 14. Perri PF. Modified randomized devices for Simmons’ model. MAS. 2008;3(3):233–9.
- 15. Batool F, Shabbir J, Hussain Z. An improved binary randomized response model using six decks of cards. Commun Stat Simul Comput. 2016;46(4):2548–62.
- 16. Singh HP, Gorey SM. A new efficient unrelated randomized response model. Commun Stat Theory Methods. 2017;46(24):12059–74.
- 17. Narjis G, Shabbir J. Estimation of population proportion and sensitivity level using optional unrelated question randomized response techniques. Commun Stat Simul Comput. 2018;49(12):3212–26.
- 18. Singh GN, Suman S. A modified two-stage randomized response model for estimating the proportion of stigmatized attribute. J Appl Stat. 2018;46(6):958–78.
- 19. Aboalkhair AM, Zayed MA, Al-Nefaie AH, Alrawad M, Elshehawey AM. A novel efficient randomized response model designed for attributes of utmost sensitivity. Heliyon. 2024;10(20):e39082. pmid:39640826
- 20. Aboalkhair AM, Elshehawey AM, Zayed MA. A new improved randomized response model with application to compulsory motor insurance. Heliyon. 2024;10(5):e27252. pmid:38486730
- 21. Anderson H. Estimation of a proportion through randomized response. Int Stat Rev. 1976;44(2):213.
- 22. Lanke J. On the degree of protection in randomized interviews. Int Stat Rev. 1976;44(2):197.
- 23. Leysieffer FW, Warner SL. Respondent jeopardy and optimal designs in randomized response models. J Am Stat Assoc. 1976;71(355):649–56.
- 24. Zhimin H, Zaizai Y. Measure of privacy in randomized response model. Qual Quant. 2012;46(4):1167–80.