Computing with words for student peer assessment in oral presentation

Peer assessment of oral presentations can motivate students and give them a greater sense of responsibility. In recent years, various methods have been proposed for evaluating peers. In this paper, a novel online peer assessment method for oral presentations is proposed using perceptual computing. One output of the proposed system is a numerical score for the overall assessment of a student's presentation, which allows comparison and ranking of student performance. In addition, the system produces a linguistic evaluation that describes the student's performance. A case study has been conducted to show the effectiveness of the proposed method, and its results are analyzed and discussed.


INTRODUCTION
Students are evaluated and graded mainly on the basis of human judgments, which tend to be subjective. Peer assessment techniques provide a good alternative by distributing the evaluation task among many individuals instead of relying on a single judge, and by aggregating different opinions into the final ranking. Peer assessment is a process in which individuals are rated by their peers (Boud & Holmes, 1981).
Peer assessment may be performed in numerous ways: assessment of groups or of individuals, of written or oral assignments, by peers with the same or with different capabilities, performed anonymously or non-anonymously, etc. (Topping, 2010). Significant research has shown the effectiveness of peer assessment in teaching and learning many subjects. Falchikov (1995) noted that peer assessment can serve at least five distinct objectives: as a social control tool, as an assessment tool, as a learning tool, as a learning-how-to-assess tool, and as an active participation tool.
In general, peer assessment is of two types: (i) students assess other students' work in a class; (ii) students assess other students' performance within the same group (Falchikov, 1995; Chai, Tay, & Lim, 2015). The rapid development of Internet technologies has led to a dramatic change in educational methods over the last decade. Online peer evaluation, which is a more effective way to collect peer feedback than face-to-face assessment, has received considerable attention (Pangaro, 2000). This article focuses on summative assessment in which students judge their peers' oral presentations online against specific criteria.
Numbers have traditionally been used to assess student performance. However, as Yang, Wang, Xu, and Chin (2006) point out, it may be difficult to define evaluation scores as independent crisp sets, so it is more natural to define evaluation scores with ambiguous and subjective linguistic terms whose meanings may overlap. In the last decade, many efforts have been made to reduce uncertainty in the educational evaluation process by applying fuzzy set theory.
The use of techniques related to fuzzy sets in educational assessment is not new (Adachi et al., 2010; Bai & Chen, 2008; Biswas, 1995; Saleh & Kim, 2009). Fuzzy set theory is an efficient and effective way to represent uncertainty and fuzzy terms in assessment environments (Ma & Zhou, 2000). Compared to approaches that depend on numerical grading scores, fuzzy sets offer an alternative: a linguistic evaluation in which "fuzzy" words are used instead of numbers throughout the evaluation process.
Compared to type-1 fuzzy sets (T1 FSs), interval type-2 fuzzy sets (IT2 FSs) provide the ability to model second-order uncertainties. Given the usefulness and flexibility of IT2 FSs, a perceptual computing method is suggested.
In this paper, we present a computing-with-words (CWW) model based on the Perceptual Computer architecture. All linguistic terms are elicited from the students' own point of view and are characterized by interval type-2 fuzzy sets (IT2 FSs).
The rest of this paper is organized as follows. In Section 2, some definitions and basics regarding type-1 fuzzy sets and interval type-2 fuzzy sets are briefly reviewed. In Section 3, some recent literature on fuzzy self and peer assessment is reviewed. Some CWW applications are reviewed in Section 4. In Section 5, the complete approach for peer assessment using Per-C is proposed. In Section 6, the proposed approach is validated on empirical examples. Finally, conclusions are drawn in Section 7.

PRELIMINARIES
In recent years, type-2 fuzzy sets (T2 FSs) have been considered by many researchers for implementing systems with intrinsic uncertainties. Since using general type-2 fuzzy sets results in high computational cost and complexity, interval type-2 fuzzy sets (IT2 FSs) have received more attention. To make this paper self-contained, Section 2.1 briefly reviews some definitions and basics regarding type-1 fuzzy sets and interval type-2 fuzzy sets.

Definitions
Definition 1 (Chen & Lee, 2010): A type-1 fuzzy set A in the universe of discourse U is a normal type-1 fuzzy set iff ∃x ∈ U such that µ_A(x) = 1, i.e., max_{x∈U} µ_A(x) = 1, where µ_A denotes the membership function of the type-1 fuzzy set A.
Definition 2 (Chen & Lee, 2010): A type-2 fuzzy set Ã in the universe of discourse U can be represented by a type-2 membership function µ_Ã, as follows:

Ã = {((x, u), µ_Ã(x, u)) | ∀x ∈ U, ∀u ∈ J_x ⊆ [0, 1], 0 ≤ µ_Ã(x, u) ≤ 1}   (1)

where J_x denotes an interval in [0, 1]. The type-2 fuzzy set Ã can also be represented as follows:

Ã = ∫_{x∈U} ∫_{u∈J_x} µ_Ã(x, u) / (x, u)   (2)

where J_x ⊆ [0, 1] and ∫∫ denotes the union over all admissible x and u. For discrete universes of discourse, ∫ is replaced by ∑.
In other words, when all the secondary grades µ_Ã(x, u) equal 1, Ã is an interval type-2 fuzzy set:

Ã = ∫_{x∈U} ∫_{u∈J_x} 1 / (x, u)   (3)

An IT2 FS is then completely described by two type-1 fuzzy sets (T1 FSs), whose membership functions µ_Ã^L(x) and µ_Ã^U(x) are called the lower membership function (LMF) and the upper membership function (UMF) of Ã, respectively (Figure 1).
Definition 4 (Chaturvedi et al., 2017; Chen & Lee, 2010; Mendel & Rajati, 2014): An IT2 FS Ã is described by its footprint of uncertainty FOU(Ã), which is bounded by the LMF µ_Ã^L and the UMF µ_Ã^U of Ã, as shown in Figure 1. Both µ_Ã^L and µ_Ã^U are T1 FSs, and

FOU(Ã) = ∪_{x∈U} [µ_Ã^L(x), µ_Ã^U(x)]   (4)

where ∪ is a set-theoretic union.

Definition 5 (Mendel & Rajati, 2014): The FOU of a nine-parameter trapezoidal interval type-2 fuzzy set Ã can be represented as

Ã = ((a, b, c, d; 1), (e, f, g, i; h))   (5)

where (a, b, c, d) are the parameters of the UMF, with height 1, and (e, f, g, i, h) are the parameters of the LMF, with height h. Figure 2 shows the FOU of a trapezoidal IT2 FS with nine parameters.

Figure 2. The FOU of a nine-parameter trapezoidal IT2 FS (Mendel & Rajati, 2014)

Definition 6 (Wu & Mendel, 2007b): One of the most popular defuzzification methods for T1 FSs is the centroid:

c(A) = ∫_x x µ_A(x) dx / ∫_x µ_A(x) dx   (6)

A common defuzzification method for IT2 FSs is the center of centroid (COC), based on the centroids of all the embedded T1 FSs A_e of a given IT2 FS. The centroid C(Ã) of an IT2 FS Ã is

C(Ã) = ∪_{A_e} c(A_e) = [c_l(Ã), c_r(Ã)]   (7)

where ∪ is the union operation, and c_l(Ã) and c_r(Ã) are the minimum and maximum values of all the centroids, respectively; they can be computed efficiently using the Enhanced Karnik-Mendel (EKM) algorithm (Wu & Mendel, 2009b). The COC of Ã is then computed as the center of the centroid interval of Ã:

c(Ã) = (c_l(Ã) + c_r(Ã)) / 2   (8)

Definition 7 (Wu & Mendel, 2009a): The Jaccard similarity measure s(Ã, B̃) between two IT2 FSs Ã and B̃ is defined as

s(Ã, B̃) = (∫_x min(µ_Ã^U(x), µ_B̃^U(x)) dx + ∫_x min(µ_Ã^L(x), µ_B̃^L(x)) dx) / (∫_x max(µ_Ã^U(x), µ_B̃^U(x)) dx + ∫_x max(µ_Ã^L(x), µ_B̃^L(x)) dx)   (9)

For discrete universes of discourse, ∫ is replaced by ∑.
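As a concrete illustration of Definition 6, the sketch below computes the centroid interval and COC of a discretized IT2 FS. It is a minimal stand-in, not the EKM algorithm cited above: because the extrema over all embedded T1 FSs occur at switch-point configurations, exhaustively enumerating the switch point is exact on a sampled grid (at O(n²) cost, where EKM converges much faster). All grid and membership values are illustrative.

```python
def centroid_t1(x, mu):
    """Discretized centroid of a T1 FS: sum(x*mu) / sum(mu)."""
    return sum(xi * mi for xi, mi in zip(x, mu)) / sum(mu)

def centroid_it2(x, lmf, umf):
    """Centroid interval [c_l, c_r] of an IT2 FS sampled on grid x.
    c_l uses the UMF left of the switch point and the LMF right of it;
    c_r is the mirror case. k = -1 covers the no-switch configurations."""
    n = len(x)
    def wa(theta):
        return sum(xi * t for xi, t in zip(x, theta)) / sum(theta)
    c_l = min(wa([umf[i] if i <= k else lmf[i] for i in range(n)])
              for k in range(-1, n))
    c_r = max(wa([lmf[i] if i <= k else umf[i] for i in range(n)])
              for k in range(-1, n))
    return c_l, c_r

def coc(x, lmf, umf):
    """Center of centroid, as in (8)."""
    c_l, c_r = centroid_it2(x, lmf, umf)
    return (c_l + c_r) / 2
```

For a symmetric FOU the COC coincides with the axis of symmetry, which gives a quick sanity check of the enumeration.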
Definition 8 (Wu & Mendel, 2007a): The linguistic weighted average (LWA) is defined as

Ỹ_LWA = (∑_{i=1}^{n} X̃_i W̃_i) / (∑_{i=1}^{n} W̃_i)   (10)

where X̃_i and W̃_i, i = 1, …, n, are words modeled by IT2 FSs; Ỹ_LWA is also an IT2 FS. Note that crisp numbers, intervals, and T1 FSs are special cases of IT2 FSs, so any weighted average (WA) model containing these types can be computed using the LWA.
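At each α-level, the LWA reduces to a weighted average of intervals. The sketch below computes that interval weighted average by brute force: since y = Σx_i·w_i / Σw_i is monotone in each x_i and, for fixed other weights, attains its extrema at an endpoint of each w_i interval, checking all 2ⁿ weight-endpoint combinations is exact for small n. It is an illustrative stand-in for the α-cut decomposition plus EKM used in the full LWA.

```python
from itertools import product

def interval_weighted_average(xs, ws):
    """xs: list of (x_l, x_r) score intervals; ws: list of (w_l, w_r)
    weight intervals. Returns the tightest [y_l, y_r] containing
    sum(x_i * w_i) / sum(w_i) over all admissible x_i, w_i."""
    lo, hi = float("inf"), float("-inf")
    for wc in product(*ws):  # every combination of weight endpoints
        s = sum(wc)
        # y is increasing in each x_i, so the extremes use the x endpoints
        lo = min(lo, sum(x[0] * w for x, w in zip(xs, wc)) / s)
        hi = max(hi, sum(x[1] * w for x, w in zip(xs, wc)) / s)
    return lo, hi
```

With crisp (degenerate) weight intervals this collapses to an ordinary weighted average of intervals, matching the remark that crisp numbers and intervals are special cases of IT2 FSs.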

FUZZY SELF AND PEER ASSESSMENT
In assessing a student's performance, we must assess his or her abilities, competencies, and skills. Since these concepts are ambiguous, an important question is how to model the assessors' opinions. Fuzzy logic is useful for modeling human uncertainty, especially in decision situations; it is an efficient and effective way to represent uncertainty and ambiguous terms in the evaluation environment. The use of fuzzy logic in student assessment makes it more flexible and reliable. Biswas (1995) proposed a new evaluation approach using fuzzy set theory in education, with two methods, a fuzzy evaluation method and a generalized fuzzy evaluation method, for evaluating students' answer scripts. Using fuzzy sets, Chen and Lee (1999) proposed two methods to address a drawback of Biswas' method, in which students with the same total score could receive two different fuzzy scores.
In (Echauz & Vachtsevanos, 1995) a fuzzy grading system was proposed that converted traditional scores into letter grades.
Using a fuzzy rule-based system (FRBS), Bai and Chen (2008) suggested a new scheme to rank students, considering characteristics of the questions such as difficulty, importance, and complexity. A fuzzy set approach was proposed by Ma and Zhou (2000) to assess student-centered learning outcomes using evaluations by their peers and lecturer. Wang and Chen (2008) proposed a significantly more flexible and intelligent method for evaluating students' answer scripts using fuzzy numbers associated with the evaluator's degrees of confidence. They noted that although the methods presented in (Chen & Lee, 1999) are faster and fairer in student evaluation, those methods have two drawbacks: (1) they cannot deal with situations where the evaluating values are represented by fuzzy numbers associated with degrees of confidence between zero and one, and (2) they do not consider the evaluator's degree of optimism in evaluating students' answer scripts. Bai et al. (2009) proposed a method to automatically construct the grade membership functions of fuzzy rules for evaluating students' learning achievement. Saleh and Kim (2009) proposed a system for evaluating student performance based on Mamdani's fuzzy inference engine and the center of gravity (COG) defuzzification technique.
A new fuzzy peer assessment methodology, which synthesizes Per-C with a fuzzy ranking algorithm and accounts for the vagueness and imprecision of the words used throughout the evaluation process in a cooperative learning environment, is proposed in (Chai et al., 2015).

APPLICATION OF PER-C
A specific architecture for computing with words (CWW) using interval type-2 fuzzy sets (IT2 FSs), called the Perceptual Computer (Per-C for short), is depicted in Figure 3 (Mendel, 2007). The Per-C consists of three components: an encoder, a CWW engine, and a decoder. Perceptions (i.e., granulated terms, words) both activate the Per-C and are output by it, so a human can interact with the Per-C using only a vocabulary of words. Linguistic grades, or words, are converted to IT2 FSs by the encoder. The CWW engine operates on the encoder outputs. The decoder maps the CWW engine's output into a recommendation, which can be a word, a rank, or a class. The vocabulary is context-dependent: it is what a human uses to interface with the Per-C and what the Per-C uses to communicate its findings back to the human in a user-friendly manner. Several research approaches have successfully implemented Per-C for solving multi-person, multi-criteria decision-making problems with uncertain and subjective evaluation data.
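The three-component loop described above can be sketched as a minimal pipeline. For brevity, this toy encodes each word as a crisp interval rather than an IT2 FS, the engine is a plain interval weighted average standing in for the LWA, and the decoder matches by interval midpoint rather than Jaccard similarity; the codebook words and ranges are hypothetical.

```python
class PerC:
    """Minimal Perceptual Computer skeleton: encoder -> CWW engine -> decoder.
    Illustrative only; a real Per-C works on IT2 FSs throughout."""

    def __init__(self, codebook):
        self.codebook = codebook  # encoder vocabulary: word -> (left, right)

    def encode(self, word):
        """Encoder: map a word to its interval model."""
        return self.codebook[word]

    def engine(self, intervals, weights):
        """CWW engine: weighted average of interval endpoints."""
        tot = sum(weights)
        lo = sum(a * w for (a, _), w in zip(intervals, weights)) / tot
        hi = sum(b * w for (_, b), w in zip(intervals, weights)) / tot
        return (lo, hi)

    def decode(self, interval):
        """Decoder: codebook word whose interval midpoint is closest."""
        mid = sum(interval) / 2
        return min(self.codebook,
                   key=lambda w: abs(sum(self.codebook[w]) / 2 - mid))
```

The point of the skeleton is the closed vocabulary loop: words go in, and a word (plus an interval that can be ranked) comes back out.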
An energy management approach with awareness of user satisfaction, called the Perceptual Computing Power Management Approach (Per-C PMA), is presented in (Muhuri, Gupta, & Mendel, 2018); it processes user feedback given in linguistic terms using perceptual computing to generate suitable recommendations for energy saving. The Journal Publication Judgment Advisor demonstrates the application of Per-C to the article-reviewing process of a journal: it assists editors in managing the review process and is representative of other distributed and hierarchical decision-making applications. A hierarchical decision-making framework has also been proposed for the evaluation of weapon systems, describing the perceptual computing approach to the missile-evaluation problem. (Safarzadegan Gilan, Sebt, & Shahhosseini, 2012) presents a CWW approach, based on the Perceptual Computer (Per-C) architecture and the linguistic weighted average (LWA), for competency-based selection of human resources in construction firms. (Acosta, Wu, & Forrest, 2010) highlights the particular benefits offered by Per-C and IT2 FSs for evaluating environmental risks. Besides that, Per-C has been used to solve a number of hierarchical group decision-making problems, such as selecting a suitable location for an international logistics center (Han & Mendel, 2012).

PEER ASSESSMENT: PER-C APPROACH
In the proposed approach, each presentation is assessed by all other peers. Students are scored using an assessment questionnaire with the twelve criteria listed in Table 1. We aimed at a presenter-oriented assessment; therefore, the presenter's teaching abilities, appropriate use of body language, and communication are included in the assessment process. The questionnaire's linguistic scores are: Excellent, Good, Average, Below Average, and Poor. The instructor can assign a predefined linguistic weight to each criterion, and each peer can likewise be tagged with a predefined weight. This section discusses the details of the proposed approach for assessing a student's oral presentation based on the Per-C model.

Encoder
The encoder maps linguistic terms (words) into their corresponding IT2 FSs, leading to the generation of the codebook. A vocabulary of application-related words is needed for designing the encoder. Two different codebooks are needed: one for the words used by the peers, and another for the weights; these codebooks contain five and seven words, respectively. The codebooks' vocabularies were selected based on consultations with domain experts. To generate the codebooks, students were asked to determine the endpoints of the interval associated with each qualitative phrase; the intervals collected for each word were then used to create its FOU by means of the EIA method. These FOUs, along with their words, are stored as the codebook required by the CWW engine and decoder. See (Wu, Mendel, & Coupland, 2012) for more details on how to process the data and map each word to an IT2 FS. The word names and associated FOUs are illustrated in Figures 4 and 5.

Table 1. Assessment criteria
Criteria 1: Subject of presentation
Criteria 2: How to express content
Criteria 3: Classroom management
Criteria 4: Observance of etiquette and movements
Criteria 5: Ability to attract the audience
Criteria 6: Proportion of topic and audience
Criteria 7: Ability to attract the audience
Criteria 8: Timing
Criteria 9: Innovation in speech
Criteria 10: Ability to answer questions
Criteria 11: Technical and general knowledge on the topic
Criteria 12: Use of technological presentation materials

Figure 4. Linguistic terms of weights (extremely insignificant, extremely low, so insignificant, insignificant, relatively insignificant, very low, low, relatively low, medium, high, relatively high, very high, extremely high)

Computing-with-words
Since all scores and weights are modeled with IT2 FSs, the CWW engine uses the linguistic weighted average (LWA). To aggregate the assessments, we first compute each peer's assessment independently using (11); then the assessments of all peers are aggregated using (12) to obtain the final result.

ỹ_r = (∑_{i=1}^{12} X̃_{r,i} W̃_i) / (∑_{i=1}^{12} W̃_i)   (11)

where X̃_{r,i} denotes the linguistic score given by student r to the assessed student on criterion i, and W̃_i denotes the weight of criterion i (i = 1, …, 12). For example, X̃_{2,1} is the linguistic grade given on the first criterion by student #2. Equation (11) is applied in the same way for every assessing peer. The results from all R peers are then aggregated using (12):

Ỹ_R = (∑_{r=1}^{R} ỹ_r W̃_r) / (∑_{r=1}^{R} W̃_r)   (12)

where W̃_r denotes the weight of student r. In this paper, all students are equally weighted. Figure 6 illustrates how ỹ_r and Ỹ_R are calculated.
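The two-stage aggregation of (11) and (12) can be sketched with crisp intervals. The score intervals below are hypothetical stand-ins for the IT2 FS word models, only two criteria are shown instead of twelve, and peers are equally weighted as in the paper.

```python
# Hypothetical interval models for the five score words
SCORES = {"Poor": (0, 2.5), "Below Average": (2, 4.5), "Average": (4, 6.5),
          "Good": (6, 8.5), "Excellent": (8.5, 10)}

def peer_score(words, crit_weights):
    """Stage 1, as in (11): weighted average of one peer's criterion words."""
    tot = sum(crit_weights)
    lo = sum(SCORES[w][0] * cw for w, cw in zip(words, crit_weights)) / tot
    hi = sum(SCORES[w][1] * cw for w, cw in zip(words, crit_weights)) / tot
    return lo, hi

def overall_score(peer_words, crit_weights):
    """Stage 2, as in (12): equally-weighted mean over all peers."""
    peers = [peer_score(w, crit_weights) for w in peer_words]
    n = len(peers)
    return (sum(p[0] for p in peers) / n, sum(p[1] for p in peers) / n)
```

The resulting interval plays the role of Ỹ_R: its center can be ranked, and the interval itself can be matched back to a word by the decoder.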

Decoder
In this research, the decoder plays two roles: first, mapping the CWW engine's output FOU (the IT2 FS Ỹ_R) into a recommendation in terms of words; second, ranking the aggregated outcomes of the students. The former maps Ỹ_R to a word of the codebook using the Jaccard similarity measure in (9) (Wu & Mendel, 2009a).
To rank the aggregated outcomes of the students, we use the centroid-based ranking method (Wu & Mendel, 2009a), which ranks IT2 FSs by their centers of centroids computed using (8).
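The word-matching step can be sketched as follows: evaluate the Jaccard similarity of (9) on a sampled grid against every codebook word and return the best match. The triangular FOUs used in the proof-of-concept test are illustrative, not the study's codebook.

```python
def jaccard(fou_a, fou_b, grid):
    """Discretized Jaccard similarity, as in (9). Each fou maps x to a
    (lower, upper) membership pair."""
    num = den = 0.0
    for x in grid:
        la, ua = fou_a(x)
        lb, ub = fou_b(x)
        num += min(ua, ub) + min(la, lb)
        den += max(ua, ub) + max(la, lb)
    return num / den

def decode_word(fou_y, codebook, grid):
    """Return the codebook word most similar to the engine output fou_y."""
    return max(codebook, key=lambda w: jaccard(fou_y, codebook[w], grid))
```

Because the Jaccard measure equals 1 only for identical FOUs and 0 for disjoint ones, the returned word is the closest linguistic description of Ỹ_R in the codebook.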

RESULTS AND DISCUSSION
A simulation study was conducted with data from the Research and Technical Presentation course for bachelor students in computer engineering. The task was a 20-minute oral presentation (plus 5 minutes for questions) in a class of 26 students. To make peer assessment more effective, a website (in Persian) was designed that allowed students to submit their linguistic scores at a convenient time. After each presentation, students had the opportunity to express their views, normally within two days of the presentation. The peer-review interface includes an assessment questionnaire designed by the teacher; the questionnaire has the twelve items described in Table 1.
To perform the computation in the CWW engine, the weights of the criteria and of the peers are first determined by the teacher. The linguistic weights of the criteria are given in Table 2-a; the weights of the peers are assumed to be equal in this simulation. The calculated results for the 26 students can be seen in Table 2-b. It is also possible to map the scores into linguistic words: the outputs, represented as IT2 FSs, together with their COCs and the linguistic words obtained using (9), are shown in Figure 7. Students can then be ranked by COC using (8). A noteworthy point about these grades is that, in the conventional method, grades may be very close or identical, which makes ranking difficult, whereas in the proposed method the corresponding scores are 17.672 and 17.2563, respectively, which can be clearly distinguished and compared.

CONCLUSIONS
In this paper, a peer review method for online oral presentations using Per-C is proposed. The linguistic weighted average (LWA) and a ranking method based on the COC are used in the proposed method. Per-C is also able to recommend a linguistic word equivalent to a score using the Jaccard similarity measure.
A real case study has been conducted on an engineering course (the Research and Technical Presentation course). The results of the conventional and proposed methodologies are compared, analyzed, and discussed in detail. The positive results indicate that the proposed method is a potential solution for performing fuzzy peer assessment tasks in shared learning environments, and that it can address the shortcomings associated with the use of crisp numbers in peer assessment. In our research, a small population of students was used as a case study to demonstrate the effectiveness of the proposed method; this could be assessed further through a comprehensive study with a larger population. Also, the weights of the criteria were determined by the teacher; the assessment questionnaires could instead be prepared so that each peer can assign a desired weight to each criterion.