***************************************************************************************
* SIMULATION OF THE AGREEMENT BETWEEN TWO NEEDS MEASURES IN RAPID NEEDS ASSESSMENTS:  *
* 1. SEVERITY (ordinal; independent for each need variable) and                       *  
* 2. PRIORITY (interval-level under Borda count assumptions; relative to other needs) *
* under various levels of measurement error.                                          *
*                                                                                     *
* Aldo Benini for the Assessment Capacity Project (ACAPS), Geneva, Switzerland        *
* as part of a review of the Syria J-RANS II Assessment.                              *
*                                                                                     *
* Version: 25 June 2013. USER TO REPLACE C:\... WITH OWN WORKING DIRECTORY!           *
***************************************************************************************

set more off

**************************************************************************
* PART 1: SIMULATED NEEDS VARIABLES, INITIALLY WITHOUT MEASUREMENT ERROR *
**************************************************************************

* Generating the random variates for needs, the derived needs scores and (Borda-scored) priorities
* Working with 7 different need sectors; no measurement error at this stage.
* The needs sectors are not substantively identified here.
* In the J-RANS II they were: Health, food security, nutrition, WASH, shelter/NFI, education, protection.
* However, severity scores were elicited for five sectors only. This simulation is not limited in this way.
* Assuming continuous underlying need variables, weakly correlated (0.4 between any two of them), which is
* close to the median Spearman's rank correlation observed among sector severity scores in the J-RANS II.

* Setting a working directory:
cd C:\...

* Correlation structure stored in:
use "C:\...\130623_1136AB_CorrelationStructure.dta", clear

mkmat need1-need7, matrix(Needscorr)
mkmat nmeans, matrix(Needmeans)
set obs 100
corr2data needreal1 - needreal7,  corr(Needscorr) seed(1002) means(Needmeans)
summ needreal*
corr needreal*
drop need1-need7 nmeans
matrix drop Needscorr
matrix drop Needmeans

save "C:\...\130623_1145AB_NeedsIn100CommunitiesSimul.dta", replace
* Save working copy for the error simulation:
save "C:\...\130623_1447AB_SeverityPriorityCorr_w_Error.dta", replace

* Create the categorical severity scores,
* between 1 (no shortages) and 5 (many people are already dying as a result of shortages [in the sector in question])
forvalues i = 1(1)7 {
gen needscore`i' = needreal`i' if needreal`i'  ~=.
replace needscore`i' = round(needscore`i')
replace needscore`i' = 1 if  needscore`i' < 1
replace needscore`i' = 5 if  needscore`i' > 5 & needreal`i'  ~=.
}
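
* Optional sanity check (a sketch added in editing, not part of the original run):
* after rounding and truncation, every non-missing severity score should be an integer
* between 1 and 5. "assert" stops the do-file if this does not hold.
forvalues i = 1(1)7 {
assert inrange(needscore`i', 1, 5) & needscore`i' == round(needscore`i') if needscore`i' ~= .
}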

* Create a rank variable for every underlying need variable, with 7 being the highest need
rowranks needreal1-needreal7, gen(needrank1-needrank7)
	* "rowranks" is not a standard Stata command. It was written by Nicholas J. Cox and presented
	* in his "Speaking Stata: Rowwise", The Stata Journal, 2009, 9/1, 137-157. To install it, type
	* "findit pr0046" in the command box (without the quotation marks) and go from there.

* Correlations between each scored need and the ranks:
* [Of interest only because below we generate a Borda score limited to three priorities]
forvalues i = 1(1)7 {
di "Correlation for need `i' :"
spearman needscore`i' needrank`i'
}

* Generate Borda scores, but score only the highest three ranks, as 3, 2 and 1, and set the others to zero:
forvalues i = 1(1)7 {
gen Borda3opt_`i' = needrank`i' - 4
replace Borda3opt_`i' = 0 if Borda3opt_`i' <0
}
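
* Optional check of the truncated Borda scores (an editing sketch; the variable name
* BordaRowSum is a hypothetical helper and is dropped again right away).
* Assuming no ties among the seven underlying needs within a community, each community
* awards exactly one 3, one 2 and one 1, so the row total should be 3 + 2 + 1 = 6.
forvalues i = 1(1)7 {
assert inlist(Borda3opt_`i', 0, 1, 2, 3)
}
egen BordaRowSum = rowtotal(Borda3opt_1-Borda3opt_7)
assert BordaRowSum == 6
drop BordaRowSum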

************************************************
* SUMMARY STATISTICS AND MEASURES OF AGREEMENT *
************************************************
capture drop Needs67SevPriAgree
* In case this variable already exists. 

* Summary statistics of interest:
tabstat needreal* needscore* Borda3opt_*, statistics( mean p50 min max sd) c(s)
* The results - in this particular simulation setup - show concordance between
* severity and (Borda-scored) priority for the two highest needs, needs #6 and 7.
* This is based on the comparison of medians of the ordinal severity scores and
* the Borda scores (for the latter, it makes no difference whether the ordinal [medians] or cardinal
* [means] interpretation is used).

* Therefore we look at the degree of concordance over all cases:

* 1. by means of the correlations between scored needs and the truncated Borda score:
	forvalues i = 1(1)7 {
	di "Correlation for need `i' :"
	spearman needscore`i' Borda3opt_`i'
	}
* 2. by the extent to which needs #6 and 7 are correspondingly rated in severity and priority scores:
	gen byte Needs67SevPriAgree = ( needscore6 + needscore7 >= 8) *( Borda3opt_6+ Borda3opt_7 == 5) ///
	 +  ( needscore6 + needscore7 < 8) *( Borda3opt_6 + Borda3opt_7 < 5) /* The "*" and "+" outside the parentheses work as logical operators. */
	summ Needs67SevPriAgree
* [The agreement is not 100% because: 1. rounding and truncation in severity scores, 2. interference by other needs variables in the Borda scores.]
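
* To see where the disagreements fall, the two conditions behind the agreement indicator can be
* cross-tabulated (an editing sketch; Sev67High and Pri67Top are hypothetical helper names).
* Because the two Borda scores can sum to at most 3 + 2 = 5, the share of cases on the main
* diagonal of this 2 x 2 table equals the mean of Needs67SevPriAgree reported above.
	gen byte Sev67High = (needscore6 + needscore7 >= 8)
	gen byte Pri67Top  = (Borda3opt_6 + Borda3opt_7 == 5)
	tabulate Sev67High Pri67Top, cell
	drop Sev67High Pri67Top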

save "C:\...\130623_1447AB_SeverityPriorityCorr_w_Error.dta", replace

*************************************************************************
* COMBINED GRAPH, TO DEMONSTRATE CONNECTION BETW. SEVERITY AND PRIORITY *
* Exemplified with highest need (need # 7) and lowest (need # 1)        *
* Produces raw graph, to be edited for titles                           *
*************************************************************************

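foreach v in group1 group7 tag1 tag7 group1total group7total one {
capture drop `v'
}
* In case these variables already exist. [Dropped one by one: a single "capture drop" of several names drops none of them if any one is missing.]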

gen byte one = 1

*Panel 1:
* Underlying needs, lowest vs. highest:
twoway scatter needreal7 needreal1, name(scatterreal, replace)
* graph save scatterreal "C:\...\ScatterNeedreal7vs1.gph", replace

* Panel 2:
* Bubble graph severity scores cross-tabulation
preserve
contract needscore7 needscore1
twoway scatter needscore7 needscore1 [aw=_freq], yscale(range(0.5 5.5)) ylabel(1(1)5) xscale(range(0.5 5.5)) xlabel(1(1)5) name(scatterscores, replace)
* graph save scatterscores "C:\...\ScatterNeedscore7vs1.gph", replace
restore

* Panel 3:
* Bubble graph Borda scores cross-tabulation
preserve
contract Borda3opt_7 Borda3opt_1
twoway scatter Borda3opt_7 Borda3opt_1 [aw=_freq], yscale(range(-0.5 3.5)) xscale(range(-0.5 3.5)) ylabel(0(1)3) xlabel(0(1)3) name(scatterBorda, replace)
*graph save scatterBorda "C:\...\ScatterBorda7vs1.gph", replace
restore

* Auxiliary variables needed to produce two Bubble graphs crossing severity and priority (Borda) scores,
* one for need 1, one for need 7:
egen group7 = group( needscore7 Borda3opt_7)
egen tag7 = tag( group7)
egen group1 = group( needscore1 Borda3opt_1)
egen tag1 = tag( group1)
bysort group7: egen group7total = total(one)
bysort group1: egen group1total = total(one)

* Panel 4:
* Two combined Bubble graphs for severity vs. priority:
twoway scatter Borda3opt_7 needscore7 [aw = group7total] if tag7, yscale(range(-0.5 3.5)) xscale(range(0.5 5.5)) ylabel(0(1)3) xlabel(1(1)5) name(Bubble7, replace)
* graph save Bubble7 "C:\...\Bubble7.gph", replace
twoway scatter Borda3opt_1 needscore1 [aw = group1total] if tag1, yscale(range(-0.5 3.5)) xscale(range(0.5 5.5)) ylabel(0(1)3) xlabel(1(1)5) name(Bubble1, replace)
* graph save Bubble1 "C:\...\Bubble1.gph", replace
graph combine Bubble1 Bubble7, name(TwoBubbles, replace)
* graph save TwoBubbles "C:\...\TwoBubbles.gph", replace

* Combined raw graph:
graph combine scatterreal scatterscores scatterBorda TwoBubbles
* graph save Graph "C:\...\CombinedFourPanelsEdited.gph", replace

drop group7 - group1total one
save "C:\...\130623_1447AB_SeverityPriorityCorr_w_Error.dta", replace

***********************************************************
* PART 2: MEASUREMENT ERRORS IN SCORING AND PRIORITIZING  *
***********************************************************

* PRELIMINARIES:

* Create an empty shell for the collection of simulation results at the bottom of the do file:
	clear
	gen recno = _n
	forvalues i = 1(1)7 {
	gen NeedSc`i'Med = .
	gen PriorSc`i'Med = .
	gen PriorSc`i'Mean = .
	gen Corr`i' = .
	}
	gen Needs67_agree = .
	save CollectSimResults2, replace


* THE PROGRAM FOR THE SIMULATION PART:

* The program part, which the simulate command (below) calls to generate observations with measurement error.
capture program drop SeverPriorCorrel /* capture ignores the error if there is no program "SeverPriorCorrel" to drop */

* Create variables for the error-laden needs score base and needs priority base
use "C:\...\130623_1447AB_SeverityPriorityCorr_w_Error.dta", clear

	forvalues i = 1(1)7 {
	gen needscore`i'err = .
	gen priorbase`i'err = . /* The error-laden underlying needs variables on which to compute priorities, then Borda scores. */
	gen Borda3opt_`i'err = .
	}

save "C:\...\130623_1447AB_SeverityPriorityCorr_w_Error.dta", replace

program SeverPriorCorrel, rclass
version 12

* Define the program's arguments:
args errormultsever errormultprior
* These are names for the error multiplication factors used in the formulas below. The simulate command will pass values to them.
* Further below, in the simulation part, errormultsever will be represented by "k", errormultprior by "j".

* Access the data file with the "observed" variables:
use "C:\...\130623_1447AB_SeverityPriorityCorr_w_Error.dta", clear

* Housekeeping stuff:
 	capture drop severerr*
	capture drop priorerr*
	capture drop needrankerr*
	capture drop Borda3opt*err
	capture drop Needs67ErrAgree
 
* Generate error factors, and hence the severity and Borda scores with errors:
	forvalues i = 1(1)7 {

	* Create the severity scores with error:
	gen severerr`i' = rnormal() * 0.25 * `errormultsever' /* Error factor */
	* The simulation steps up errormultsever from 0 to 4; in other words, at the maximum
	* the error component has the same SD (= 0.25 * 4 = 1) as the simulated needs.
	* This would make for rather stark mean absolute errors; the first two steps up
	* (up to half the SD of the simulated needs) should be more realistic.
	* [The same remark holds for the priorities with error below.]
	
	* ADDITIVE ERROR MODEL
	replace needscore`i'err = severerr`i' + needreal`i'
	replace needscore`i'err = round(needscore`i'err)
	replace needscore`i'err = 1 if  needscore`i'err < 1
	replace needscore`i'err = 5 if  needscore`i'err > 5 & needreal`i'  ~=. /* "& needreal`i'  ~=." only for good manners. Not really needed. */


	* Create the bases with error for the Borda scoring:
	* [We assume that the scoring and priority setting errors are independent.]
	gen priorerr`i' = rnormal() * 0.25 * `errormultprior' /* Error factor; see remarks above. */
	replace priorbase`i'err = priorerr`i' + needreal`i'
	}
	
	rowranks priorbase*err, gen(needrankerr1-needrankerr7)
	
	forvalues i = 1(1)7 {
	gen Borda3opt_`i'err = needrankerr`i' - 4
	replace Borda3opt_`i'err = 0 if Borda3opt_`i'err < 0
	}

*************************************************************************
* Calculate statistics on the error-laden severity and priority scores: *
*************************************************************************
	
* Medians of severity scores; medians and means of Borda scores:	
	forvalues i = 1(1)7 {
	summ needscore`i'err, detail
	return scalar needsc`i'err_med = r(p50)
	summ Borda3opt_`i'err, detail
	return scalar Borda3opt_`i'err_med = r(p50)
	return scalar Borda3opt_`i'err_mean = r(mean)
	}

* Spearman's rank order correlations between the severity scores and the modified Borda scores:
	forvalues i = 1(1)7 {
	spearman needscore`i'err Borda3opt_`i'err 
	return scalar SpearmanCorr`i' = r(rho)
	}

* Agreement statistic for needs #6 and 7, to either be the highest on both severity and priority, or neither:
	gen byte Needs67ErrAgree = ( needscore6err + needscore7err >= 8) * ( Borda3opt_6err + Borda3opt_7err == 5) ///
		+ ( needscore6err + needscore7err < 8) * ( Borda3opt_6err+ Borda3opt_7err < 5)	
	summarize Needs67ErrAgree
	return scalar Needs67Agree = r(mean)

 
 end

* TESTING THE PROGRAM
* Unstar these three lines if you want to test the program:
* set seed 1111
* SeverPriorCorrel 1 1 /*Arbitrary test values for the two arguments. Because > 0, they do cause observed and true to differ, as intended. */
* exit

***************************
* PART 3: THE SIMULATION  *
***************************

* The simulation part, in which forvalues steps the measurement error factors for
* the severity and modified Borda scores from 0 (no error) to 4:

* Simulation command:

forvalues k = 0/4 {
	forvalues j = 0/4 {
	local seedi = 1234 + `k' + 10 * `j' /*Changes the random number seed at the start of each simulation run as we augment the error mult. factor.*/
	set seed `seedi'
	simulate    NeedSc1Med = r(needsc1err_med) ///
				NeedSc2Med = r(needsc2err_med) ///
				NeedSc3Med = r(needsc3err_med) ///
				NeedSc4Med = r(needsc4err_med) ///
				NeedSc5Med = r(needsc5err_med) ///
				NeedSc6Med = r(needsc6err_med) ///
				NeedSc7Med = r(needsc7err_med) ///
				PriorSc1Med = r(Borda3opt_1err_med) ///
				PriorSc2Med = r(Borda3opt_2err_med) ///
				PriorSc3Med = r(Borda3opt_3err_med) ///
				PriorSc4Med = r(Borda3opt_4err_med) ///
				PriorSc5Med = r(Borda3opt_5err_med) ///
				PriorSc6Med = r(Borda3opt_6err_med) ///
				PriorSc7Med = r(Borda3opt_7err_med) ///
				PriorSc1Mean = r(Borda3opt_1err_mean) ///
				PriorSc2Mean = r(Borda3opt_2err_mean) ///
				PriorSc3Mean = r(Borda3opt_3err_mean) ///
				PriorSc4Mean = r(Borda3opt_4err_mean) ///
				PriorSc5Mean = r(Borda3opt_5err_mean) ///
				PriorSc6Mean = r(Borda3opt_6err_mean) ///
				PriorSc7Mean = r(Borda3opt_7err_mean) ///		
				Corr1 = r(SpearmanCorr1) ///
				Corr2 = r(SpearmanCorr2) ///
				Corr3 = r(SpearmanCorr3) ///
				Corr4 = r(SpearmanCorr4) ///
				Corr5 = r(SpearmanCorr5) ///
				Corr6 = r(SpearmanCorr6) ///
				Corr7 = r(SpearmanCorr7) ///
				Needs67_agree = r(Needs67Agree) ///
				, reps(100) nodots: SeverPriorCorrel `k' `j'
				/* Not very elegant, but all attempts to generate this list with forvalues caused errors. */
	summarize
	tempfile results
		gen byte ScoreErrorFactor = `k'
		gen byte PriorErrorFactor = `j'
		save "`results'", replace
	use CollectSimResults2, clear
	append using "`results'"
	replace recno = _n
	save CollectSimResults2, replace
	
	}   
}
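
* A possible way to shorten the long expression list in the simulate command above
* (an editing sketch, not the original author's code, and untested against the original data):
* collect the 29 expressions in a local macro with forvalues and pass the macro to simulate.
* The macro name "explist" is an arbitrary choice. The result variables would then be ordered
* need by need rather than statistic by statistic; append matches variables by name,
* so the collection file would be unaffected.
local explist
forvalues i = 1(1)7 {
local explist `explist' NeedSc`i'Med = r(needsc`i'err_med)
local explist `explist' PriorSc`i'Med = r(Borda3opt_`i'err_med)
local explist `explist' PriorSc`i'Mean = r(Borda3opt_`i'err_mean)
local explist `explist' Corr`i' = r(SpearmanCorr`i')
}
local explist `explist' Needs67_agree = r(Needs67Agree)
macro list _explist
* With the macro defined before the k and j loops, the simulate call inside them would shrink to:
* simulate `explist', reps(100) nodots: SeverPriorCorrel `k' `j'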

***********************
* Tables of interest: *
***********************

* 1. Summary stats:
bysort ScoreErrorFactor PriorErrorFactor: summ NeedSc*Med PriorSc*Med PriorSc*Mean 

* 2. Correlation between severity and priority scores - Robustness to measurement error:
* [Example: Highest need (need #7).]
table ScoreErrorFactor PriorErrorFactor, c(mean Corr7)

* 3. Agreement, as regards needs #6 and 7, between severity and priority rating - in response to error levels:
table ScoreErrorFactor PriorErrorFactor, c(mean Needs67_agree)
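
* A compact alternative view of tables 2 and 3 (an editing sketch): one row for each of the
* 25 combinations of error factors, with the mean correlation for need #7 and the mean
* agreement on needs #6 and 7. It relies only on collapse and list; to our knowledge the
* c() option of table used above belongs to the syntax that preceded Stata 17, so this
* version may be handy in newer releases. preserve/restore leaves the collected results untouched.
preserve
collapse (mean) Corr7 Needs67_agree, by(ScoreErrorFactor PriorErrorFactor)
list ScoreErrorFactor PriorErrorFactor Corr7 Needs67_agree, sepby(ScoreErrorFactor) noobs
restore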


* Housekeeping:
set more on
* Unstar "exit" if the variables with error of the last simulation run are to be kept.
* exit
use "C:\...\130623_1447AB_SeverityPriorityCorr_w_Error.dta", clear
capture drop Needs67SevPriAgree
capture drop needscore1err - Borda3opt_7err
save "C:\...\130623_1447AB_SeverityPriorityCorr_w_Error.dta", replace




