Academic Plagiarism Detection: A Systematic Literature Review

ACM Comput. Surv., Vol. 52, No. 6, Article 112, Publication date: October 2019. DOI: https://doi.org/10.1145/3345317

This article summarizes the research on computational methods to detect academic plagiarism by systematically reviewing 239 research papers published between 2013 and 2018. To structure the presentation of the research contributions, we propose novel technically oriented typologies for plagiarism prevention and detection efforts, the forms of academic plagiarism, and computational plagiarism detection methods. We show that academic plagiarism detection is a highly active research field. Over the period we review, the field has seen major advances regarding the automated detection of strongly obfuscated and thus hard-to-identify forms of academic plagiarism. These improvements mainly originate from better semantic text analysis methods, the investigation of non-textual content features, and the application of machine learning. We identify a research gap in the lack of methodologically thorough performance evaluations of plagiarism detection systems. Concluding from our analysis, we see the integration of heterogeneous analysis methods for textual and non-textual content features using machine learning as the most promising area for future research contributions to improve the detection of academic plagiarism further.

ACM Reference format: Tomáš Foltýnek, Norman Meuschke, and Bela Gipp. 2019. Academic Plagiarism Detection: A Systematic Literature Review. ACM Comput. Surv. 52, 6, Article 112 (October 2019), 42 pages. https://doi.org/10.1145/3345317

INTRODUCTION

Academic plagiarism is one of the severest forms of research misconduct (a “cardinal sin”) [ 14 ] and has strong negative impacts on academia and the public. Plagiarized research papers impede the scientific process, e.g., by distorting the mechanisms for tracing and correcting results. If researchers expand or revise earlier findings in subsequent research, then papers that plagiarized the original paper remain unaffected. Wrong findings can spread and affect later research or practical applications [ 90 ]. For example, in medicine or pharmacology, meta-studies are an important tool to assess the efficacy and safety of medical drugs and treatments. Plagiarized research papers can skew meta-studies and thus jeopardize patient safety [ 65 ].

Furthermore, academic plagiarism wastes resources. For example, Wager [ 261 ] quotes a journal editor stating that 10% of the papers submitted to the respective journal suffered from plagiarism of an unacceptable extent. In Germany, the ongoing crowdsourcing project VroniPlag has investigated more than 200 cases of alleged academic plagiarism (as of July 2019). Even in the best case, i.e., if the plagiarism is discovered, reviewing and punishing plagiarized research papers and grant applications still requires considerable effort from the reviewers, affected institutions, and funding agencies. The cases reported in VroniPlag showed that investigations into plagiarism allegations often require hundreds of work hours from affected institutions.

If plagiarism remains undiscovered, then the negative effects are even more severe. Plagiarists can unduly receive research funds and career advancements as funding agencies may award grants for plagiarized ideas or accept plagiarized research papers as the outcomes of research projects. The artificial inflation of publication and citation counts through plagiarism can further aggravate the problem. Studies showed that some plagiarized papers are cited at least as often as the original [ 23 ]. This phenomenon is problematic, since citation counts are widely used indicators of research performance, e.g., for funding or hiring decisions.

From an educational perspective, academic plagiarism is detrimental to competence acquisition and assessment. Practicing is crucial to human learning. If students receive credit for work done by others, then an important extrinsic motivation for acquiring knowledge and competences is reduced. Likewise, the assessment of competence is distorted, which again can result in undue career benefits for plagiarists.

The problem of academic plagiarism is not new but has been present for centuries. However, the rapid and continuous advancement of information technology (IT), which offers convenient and instant access to vast amounts of information, has made plagiarizing easier than ever. At the same time, IT also facilitated the detection of academic plagiarism. As we present in this article, hundreds of researchers address the automated detection of academic plagiarism and publish hundreds of research papers a year.

The high intensity and rapid pace of research on academic plagiarism detection make it difficult for researchers to get an overview of the field. Published literature reviews alleviate the problem by summarizing previous research, critically examining contributions, explaining results, and clarifying alternative views [ 212 , 40 ]. Literature reviews are particularly helpful for young researchers and researchers who newly enter a field. Often, these two groups of researchers contribute new ideas that keep a field alive and advance the state of the art.

In 2013, we provided a first descriptive review of the state of the art in academic plagiarism detection [ 160 ]. Given the rapid development of the field, we see the need for a follow-up study to summarize the research since 2013. Therefore, this article provides a systematic qualitative literature review [ 187 ] that critically evaluates the capabilities of computational methods to detect plagiarism in academic documents and identifies current research trends and research gaps.

The literature review at hand answers the following research questions:

  • Did researchers propose conceptually new approaches for this task?
  • Which improvements to existing detection methods have been reported?
  • Which research gaps and trends for future research are observable in the literature?

To answer these questions, we organize the remainder of this article as follows. The section Methodology describes our procedure and criteria for data collection. The following section, Related Literature Reviews, summarizes the contributions of our review compared to topically related reviews published since 2013. The section Overview of the Research Field describes the major research areas in the field of academic plagiarism detection. The section Definition and Typology of Plagiarism introduces our definition and a three-layered model for addressing plagiarism (methods, systems, and policies). The section Review of Plagiarism Typologies synthesizes the classifications of plagiarism found in the literature into a technically oriented typology suitable for our review. The section Plagiarism Detection Methods is the core of this article. For each class of computational plagiarism detection methods, the section provides a description and an overview of research papers that employ the method in question. The section Plagiarism Detection Systems discusses the application of detection methods in plagiarism detection systems. The Discussion section summarizes the advances in plagiarism detection research and outlines open research questions.

METHODOLOGY

To collect the research papers included in our review, we performed a keyword-based automated search [ 212 ] using Google Scholar and Web of Science. We limited the search period to 2013 through 2018 (inclusive). However, papers that introduced a novel concept or approach often predate 2013. To ensure that our survey covers all relevant primary literature, we included such seminal papers regardless of their publication date.

Google Scholar indexes major computer science literature databases, including IEEE Xplore, ACM Digital Library, ScienceDirect, SpringerLink, and TandFonline, as well as grey literature. Fagan [ 68 ] provides an extensive list of “recent studies [that] repeatedly find that Google Scholar's coverage meets or exceeds that of other search tools, no matter what is identified by target samples, including journals, articles, and citations” [ 68 ]. Therefore, we consider Google Scholar as a meta-database that meets the search criteria recommended in the guidelines for conducting systematic literature reviews [ 40 , 137 ]. Using Google Scholar also addresses the “lack of conformity, especially in terms of searching facilities, across commonly used digital libraries,” which Brereton et al. [ 40 ] identified as a hindrance to systematic literature reviews in computer science.

Criticism of using Google Scholar for literature research includes that the system's relevance ranking assigns too much importance to citation count [ 68 ], i.e., the number of citations a paper receives. Moreover, Google Scholar covers predatory journals [ 31 ]. Most guidelines for systematic reviews, therefore, recommend using additional search tools despite the comprehensive coverage of Google Scholar [ 68 ]. Following this recommendation, we additionally queried Web of Science. Since we seek to cover the most influential papers on academic plagiarism detection, we consider a relevance ranking based on citation counts as an advantage rather than a disadvantage. Hence, we used the relevance ranking of Google Scholar and ranked search results from Web of Science by citation count. We excluded all papers (11) that appeared in venues mentioned in Beall's List of Predatory Journals and Publishers.

Our procedure for paper collection consisted of the five phases described hereafter. We reviewed the first 50 search results when using Google Scholar and the first 150 search results when using Web of Science.

In the first phase, we sought to include existing literature reviews on plagiarism detection for academic documents. Therefore, we queried Google Scholar using the following keywords: plagiarism detection literature review, similarity detection literature review, plagiarism detection state of art, similarity detection state of art, plagiarism detection survey, similarity detection survey.

In the second phase, we added topically related papers using the following rather general keywords: plagiarism, plagiarism detection, similarity detection, extrinsic plagiarism detection, external plagiarism detection, intrinsic plagiarism detection, internal plagiarism detection.

After reviewing the papers retrieved in the first and second phases, we defined the structure of our review and adjusted the scope of our data collection as follows:

  • We focused our search on plagiarism detection for text documents and hence excluded papers addressing other tasks, such as plagiarism detection for source code or images. We also excluded papers focusing on corpora development.
  • We excluded papers addressing policy and educational issues related to plagiarism detection to sharpen the focus of our review on computational detection methods.

Having made these adjustments to our search strategy, we started the third phase of the data collection. We queried Google Scholar with the following keywords related to specific sub-topics of plagiarism detection, which we had identified as important during the first and second phases: semantic analysis plagiarism detection, machine-learning plagiarism detection.

In the fourth phase, we sought to prevent selection bias from exclusively using Google Scholar by querying Web of Science using the keyword plagiarism detection.

In the fifth phase, we added to our dataset papers from the search period that are topically related to papers we had already collected. To do so, we included relevant references of collected papers and papers that publishers’ systems recommended as related to papers in our collection. Following this procedure, we included notebook papers of the annual PAN and SemEval workshops. To ensure the significance of research contributions, we excluded papers that were not referenced in the official overview papers of the PAN and SemEval workshops or reported results below the baseline provided by the workshop organizers. For the same reason, we excluded papers that do not report experimental evaluation results.

To ensure the consistency of paper processing, the first author read all papers in the final dataset and recorded the paper's key content in a mind map. All authors continuously reviewed, discussed, and updated the mind map. Additionally, we maintained a spreadsheet to record the key features of each paper (task, methods, improvements, dataset, results, etc.).

Table 1 and Table 2 list the numbers of papers retrieved and processed in each phase of the data collection.

Table 1.
1) Google Scholar: reviews 66 28 38 38
2) Google Scholar: related papers 143 54 89 23 104
3) Google Scholar: sub-topics 49 42 111
4) Web of Science 134 82 52 35 128
5) Processing stage 126 126 254

Table 2.
Papers identified by keyword-based automated search: 128
Papers collected through references and automated recommendations: 126
Inaccessible papers: 3
Excluded papers: 12
- Reviews and general papers: 35
- Papers containing experiments (included in overview tables): 204
    – Extrinsic PD: 136
    – Intrinsic PD: 67
    – Both extrinsic and intrinsic PD: 1
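
As a quick consistency check, the figures in Table 2 can be related to the 239 reviewed papers stated in the abstract. The derivation below assumes that the inaccessible and excluded papers are subtracted from the combined total of both collection routes; this reading is inferred from the numbers and is not stated explicitly in the table.

$$128 + 126 = 254, \qquad 254 - 3 - 12 = 239 = 35 + 204, \qquad 204 = 136 + 67 + 1$$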

Methodological Risks

The main risks for systematic literature reviews are incompleteness of the collected data and deficiencies in the selection, structure, and presentation of the content.

We addressed the risk of data incompleteness mainly by using two of the most comprehensive databases for academic literature—Google Scholar and Web of Science. To achieve the best possible coverage, we queried the two databases with keywords that we gradually refined in a multi-stage process, in which the results of each phase informed the next phase. By including all relevant references of papers that our keyword-based search had retrieved, we leveraged the knowledge of domain experts, i.e., the authors of research papers and literature reviews on the topic, to retrieve additional papers. We also included the content-based recommendations provided by the digital library systems of major publishers, such as Elsevier and ACM. We are confident that this multi-faceted and multi-stage approach to data collection yielded a set of papers that comprehensively reflects the state of the art in detecting academic plagiarism.

To mitigate the risk of subjectivity regarding the selection and presentation of content, we adhered to best practice guidelines for conducting systematic reviews and investigated the taxonomies and structure put forward in related reviews. We present the insights of the latter investigation in the following section.

RELATED LITERATURE REVIEWS

Table 3 lists related literature reviews in chronological order and categorized according to (i) the plagiarism detection (PD) tasks the review covers (PD for text documents, PD for source code, other PD tasks), (ii) whether the review includes descriptions or evaluations of productive plagiarism detection systems, and (iii) whether the review addresses policy issues related to plagiarism and academic integrity. All reviews are “narrative” according to the typology of Pare et al. [ 187 ]. Two of the reviews (References [ 61 ] and [ 48 ]) cover articles that appeared at venues included in Beall's List of Predatory Journals and Publishers.

Table 3.
Review                            PD: text   PD: source code   PD: other tasks   PD systems   Policy issues
Meuschke and Gipp [ 160 ]         YES        NO                NO                YES          NO
Chong [ 47 ]                      YES        NO                NO                NO           NO
Eisa et al. [ 61 ]                YES        NO                YES               NO           NO
Agarwal and Sharma [ 8 ]          YES        YES               NO                YES          NO
Chowdhury et al. [ 48 ]           YES        YES               NO                YES          NO
Kanjirangat and Gupta [ 251 ]     YES        YES               NO                YES          NO
Velasquez et al. [ 256 ]          YES        NO                NO                YES          YES
Hourrane and Benlahmar [ 114 ]    YES        NO                NO                NO           NO

Our previous review article [ 160 ] surveyed the state of the art in detecting academic plagiarism, presented plagiarism detection systems, and summarized evaluations of their detection effectiveness. We outlined the limitations of text-based plagiarism detection methods and suggested that future research should focus on semantic analysis approaches that also include non-textual document features, such as academic citations.

The main contribution of Chong [ 47 ] is an extensive experimental evaluation of text preprocessing methods as well as shallow and deep NLP techniques. However, the paper also provides a sizable state-of-the-art review of plagiarism detection methods for text documents.

Eisa et al. [ 61 ] defined a clear methodology and meticulously followed it but did not include a temporal dimension. Their well-written review provides comprehensive descriptions and a useful taxonomy of features and methods for plagiarism detection. The authors concluded that future research should consider non-textual document features, such as equations, figures, and tables.

Agarwal and Sharma [ 8 ] focused on source code PD but also gave a basic overview of plagiarism detection methods for text documents. Technologically, source code PD and PD for text are closely related, and many plagiarism detection methods for text can also be applied for source code PD [ 57 ].

Chowdhury et al. [ 48 ] provided a comprehensive list of available plagiarism detection systems.

Kanjirangat and Gupta [ 251 ] summarized plagiarism detection methods for text documents that participated in the PAN competitions and compared four plagiarism detection systems.

Velasquez et al. [ 256 ] proposed a new plagiarism detection system but also provided an extensive literature review that includes a typology of plagiarism and an overview of six plagiarism detection systems.

Hourrane and Benlahmar [ 114 ] described individual research papers in detail but did not provide an abstraction of the presented detection methods.

The literature review at hand extends and improves the reviews outlined in Table 3 as follows:

  • We include significantly more papers than other reviews.
  • Our literature survey is the first that analyses research contributions during a specific period to provide insights on the most recent research trends.
  • Our review is the first that adheres to the guidelines for conducting systematic literature surveys.
  • We introduce a three-layered conceptual model to describe and analyze the phenomenon of academic plagiarism comprehensively.

OVERVIEW OF THE RESEARCH FIELD

The papers we retrieved during our research fall into three broad categories: plagiarism detection methods, plagiarism detection systems, and plagiarism policies. Ordering these categories by the level of abstraction at which they address the problem of academic plagiarism yields the three-layered model shown in Figure 1. We propose this model to structure and systematically analyze the large and heterogeneous body of literature on academic plagiarism.

Fig. 1.

Layer 1: Plagiarism detection methods subsumes research that addresses the automated identification of potential plagiarism instances. Papers falling into this layer typically present methods that analyze textual similarity at the lexical, syntactic, and semantic levels, as well as similarity of non-textual content elements, such as citations, figures, tables, and mathematical formulae. To this layer, we also assign papers that address the evaluation of plagiarism detection methods, e.g., by providing test collections and reporting on performance comparisons. The research contributions in Layer 1 are the focus of this survey.

Layer 2: Plagiarism detection systems encompasses applied research papers that address production-ready plagiarism detection systems, as opposed to the research prototypes that are typically presented in papers assigned to Layer 1. Production-ready systems implement the detection methods included in Layer 1, visually present detection results to the users, and should be able to identify duly quoted text. Turnitin LLC is the market leader for plagiarism detection services. The company's plagiarism detection system Turnitin is most frequently cited in papers included in Layer 2 [ 116 , 191 , 256 ].

Layer 3: Plagiarism policies subsumes papers that research the prevention, detection, prosecution, and punishment of plagiarism at educational institutions. Typical papers in Layer 3 investigate students’ and teachers’ attitudes toward plagiarism (e.g., Reference [ 75 ]), analyze the prevalence of plagiarism at institutions (e.g., Reference [ 50 ]), or discuss the impact of institutional policies (e.g., Reference [ 183 ]).

The three layers of the model are interdependent and essential to analyze the phenomenon of academic plagiarism comprehensively. Plagiarism detection systems (Layer 2) depend on reliable detection methods (Layer 1), which in turn would be of little practical value without production-ready systems that employ them. Using plagiarism detection systems in practice would be futile without the presence of a policy framework (Layer 3) that governs the investigation, documentation, prosecution, and punishment of plagiarism. The insights derived from analyzing the use of plagiarism detection systems in practice (Layer 3) also inform the research and development efforts for improving plagiarism detection methods (Layer 1) and plagiarism detection systems (Layer 2).

Continued research in all three layers is necessary to keep pace with the behavior changes that are a typical reaction of plagiarists when being confronted with an increased risk of discovery due to better detection technology and stricter policies. For example, improved plagiarism detection capabilities led to a rise in contract cheating, i.e., paying ghostwriters to produce original works that the cheaters submit as their own [ 177 ]. Many researchers agree that counteracting these developments requires approaches that integrate plagiarism detection technology with plagiarism policies.

Originally, we intended to survey the research in all three layers. However, the extent of the research fields is too large to cover all of them comprehensively in one survey. Therefore, the current article surveys plagiarism detection methods and systems. A future survey will cover the research on plagiarism policies.

DEFINITION AND TYPOLOGY OF PLAGIARISM

In accordance with Fishman, we define academic plagiarism as the use of ideas, content, or structures without appropriately acknowledging the source to benefit in a setting where originality is expected [ 279 ]. We used a nearly identical definition in our previous survey [ 160 ], because it describes the full breadth of the phenomenon. The definition includes all forms of intellectual contributions in academic documents regardless of their presentation, e.g., text, figures, tables, and mathematical formulae, and their origin. Other definitions of academic plagiarism often include the notion of theft (e.g., References [ 13 , 38 , 116 , 146 , 188 , 274 , 252 ]), i.e., require intent and limit the scope to reusing the content of others. Our definition also includes self-plagiarism, unintentional plagiarism, and plagiarism with the consent of the original author.

Review of Plagiarism Typologies

Aside from a definition, a typology helps to structure the research and facilitates communication on a phenomenon [ 29 , 261 ]. Researchers proposed a variety of typologies for academic plagiarism. Walker [ 263 ] coined a typology from a plagiarist's point of view, which is still recognized by contemporary literature [ 51 ]. Walker's typology distinguishes between:

  • Sham paraphrasing (presenting copied text as a paraphrase by leaving out quotations)
  • Illicit paraphrasing
  • Other plagiarism (plagiarizing with the original author's consent)
  • Verbatim copying (without reference)
  • Recycling (self-plagiarism)
  • Ghostwriting
  • Purloining (copying another student's assignment without consent)

All typologies we encountered in our research categorize verbatim copying as one form of academic plagiarism. Alfikri and Ayu Purwarianti [ 13 ] additionally distinguished the following as separate forms of academic plagiarism: the partial copying of smaller text segments, two forms of paraphrasing that differ regarding whether the sentence structure changes, and translations. Velasquez et al. [ 256 ] distinguished verbatim copying and technical disguise, combined paraphrasing and translation into one form, and categorized the deliberate misuse of references as a separate form. Weber-Wulff [ 265 ] and Chowdhury and Bhattacharyya [ 48 ] likewise categorized referencing errors as a form of plagiarism. Many authors agreed on classifying idea plagiarism as a separate form of plagiarism [ 47 , 48 , 114 , 179 , 252 ]. Mozgovoy et al. [ 173 ] presented a typology that consolidates other classifications into five forms of academic plagiarism:

  • Verbatim copying
  • Hiding plagiarism instances by paraphrasing
  • Technical tricks exploiting weaknesses of current plagiarism detection systems
  • Deliberately inaccurate use of references
  • Tough plagiarism

“Tough plagiarism” subsumes the forms of plagiarism that are difficult to detect for both humans and computers, like idea plagiarism, structural plagiarism, and cross-language plagiarism [ 173 ].

The typology of Eisa et al. [ 61 ], which originated from a typology by Alzahrani et al. [ 21 ], distinguishes only two forms of plagiarism: literal plagiarism and intelligent plagiarism. Literal plagiarism encompasses near copies and modified copies, whereas intelligent plagiarism includes paraphrasing, summarization, translation, and idea plagiarism.

Our Typology of Plagiarism

Since we focus on reviewing plagiarism detection technology, we exclusively consider technical properties to derive a typology of academic plagiarism forms. From a technical perspective, several distinctions that are important from a policy perspective are irrelevant or at least less important. Technically irrelevant properties of plagiarism instances are whether:

  • the original author permitted the reuse of content;
  • the suspicious document and its potential source have the same author(s), i.e., whether similarities in the documents’ content may constitute self-plagiarism.

Properties of minor technical importance are:

  • how much of the content represents potential plagiarism;
  • whether a plagiarist uses one or multiple sources. Detecting compilation plagiarism (also referred to as shake-and-paste, patch-writing, remix, mosaic or mash-up) is impossible at the document level but requires an analysis on the level of paragraphs or sentences.

Both properties are of little technical importance, since similar methods are employed regardless of the extent of plagiarism and whether it may originate from one or multiple source documents.

Our typology of academic plagiarism derives from the generally accepted layers of natural language: lexis, syntax, and semantics. Ultimately, the goal of language is expressing ideas [ 96 ]. Therefore, we extend the classic three-layered language model to four layers and categorize plagiarism forms according to the language layer they affect. We order the resulting plagiarism forms increasingly by their level of obfuscation:

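  • Characters-preserving plagiarism
      • Literal plagiarism (copy and paste)
      • Possibly with mentioning the source
  • Syntax-preserving plagiarism
      • Technical disguise
      • Synonym substitution
  • Semantics-preserving plagiarism
      • Translation
      • Paraphrase (mosaic, clause quilts)
  • Idea-preserving plagiarism
      • Structural plagiarism
      • Using concepts and ideas only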

Characters-preserving plagiarism includes, aside from verbatim copying, plagiarism forms in which sources are mentioned, like “pawn sacrifice” and “cut and slide” [ 265 ].

Syntax-preserving plagiarism often results from employing simple substitution techniques, e.g., using regular expressions. Basic synonym substitution approaches operate in the same way; however, employing more sophisticated substitution methods has become typical.

Semantics-preserving plagiarism refers to sophisticated forms of obfuscation that involve changing both the words and the sentence structure but preserve the meaning of passages. In agreement with Velasquez et al. [ 256 ], we consider translation plagiarism as a semantics-preserving form of plagiarism, since a translation can be seen as the ultimate paraphrase. In the section devoted to semantics-based plagiarism detection methods, we will also show a significant overlap in the methods for paraphrase detection and cross-language plagiarism detection.

Idea-preserving plagiarism (also referred to as template plagiarism or boilerplate plagiarism) includes cases in which plagiarists use the concept or structure of a source and describe it entirely in their own words. This form of plagiarism is difficult to identify and even harder to prove.

Ghostwriting [ 47 , 114 ] describes the hiring of a third party to write genuine text [ 50 , 263 ]. It is the only form of plagiarism that is undetectable by comparing a suspicious document to a likely source. Currently, the only technical option for discovering potential ghostwriting is to compare stylometric features of a possibly ghost-written document with documents certainly written by the alleged author.

PLAGIARISM DETECTION APPROACHES

Conceptually, the task of detecting plagiarism in academic documents consists of locating the parts of a document that exhibit indicators of potential plagiarism and subsequently substantiating the suspicion through more in-depth analysis steps [ 218 ]. From a technical perspective, the literature distinguishes the following two general approaches to plagiarism detection.

The extrinsic plagiarism detection approach compares suspicious documents to a collection of documents assumed to be genuine (reference collection) and retrieves all documents that exhibit similarities above a threshold as potential sources [ 252 , 235 ].
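
The thresholding idea behind the extrinsic approach can be sketched in a few lines of Python. The snippet below is a minimal illustration, not a method taken from the reviewed literature: the choice of Jaccard similarity over word 3-grams, the threshold of 0.2, the toy documents, and all function names are assumptions made for this example.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of word n-grams of a lowercased text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def potential_sources(suspicious, reference_collection, threshold=0.2, n=3):
    """Return (doc_id, score) pairs whose similarity reaches the threshold.

    The threshold is an arbitrary illustration, not a recommended setting.
    """
    query = word_ngrams(suspicious, n)
    scored = (
        (doc_id, jaccard(query, word_ngrams(text, n)))
        for doc_id, text in reference_collection.items()
    )
    return sorted(
        ((doc_id, score) for doc_id, score in scored if score >= threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Toy reference collection (hypothetical documents).
reference_collection = {
    "doc_a": "The quick brown fox jumps over the lazy dog near the river bank.",
    "doc_b": "Plagiarism detection compares documents to a reference collection.",
}
suspicious = "A quick brown fox jumps over the lazy dog in the park."
hits = potential_sources(suspicious, reference_collection)
```

In this toy case, only doc_a shares word 3-grams with the suspicious text and is returned as a potential source. A purely lexical measure like this one would miss the more strongly obfuscated plagiarism forms described in the typology above.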

The intrinsic plagiarism detection approach exclusively analyzes the input document, i.e., does not perform comparisons to documents in a reference collection. Intrinsic detection methods employ a process known as stylometry to examine linguistic features of a text [ 90 ]. The goal is to identify changes in writing style, which the approach considers as indicators for potential plagiarism [ 277 ]. Passages with linguistic differences can become the input for an extrinsic plagiarism analysis or be presented to human reviewers. Hereafter, we describe the extrinsic and intrinsic approaches to plagiarism detection in more detail.
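
To illustrate the intrinsic idea of spotting style changes within a single document, the following sketch computes two very simple stylometric features per text segment and flags segments that deviate strongly from the document average. The two features, the z-score threshold of 1.5, the toy segments, and the function names are assumptions chosen for brevity; the stylometric methods in the literature use far richer feature sets.

```python
import re
import statistics


def style_features(segment):
    """Two simple features: mean word length and mean sentence length in words."""
    words = re.findall(r"[A-Za-z']+", segment)
    sentences = [s for s in re.split(r"[.!?]+", segment) if s.strip()]
    mean_word_len = sum(len(w) for w in words) / len(words) if words else 0.0
    mean_sent_len = len(words) / len(sentences) if sentences else 0.0
    return (mean_word_len, mean_sent_len)


def style_outliers(segments, z_threshold=1.5):
    """Return indices of segments whose features deviate from the document mean."""
    features = [style_features(s) for s in segments]
    flagged = set()
    for dim in range(len(features[0])):
        values = [f[dim] for f in features]
        mean = statistics.mean(values)
        stdev = statistics.pstdev(values)
        if stdev == 0:
            continue  # no variation in this feature, nothing to flag
        for i, value in enumerate(values):
            if abs(value - mean) / stdev > z_threshold:
                flagged.add(i)
    return sorted(flagged)


# Toy document: five segments in a plain style and one in a dense style.
segments = [
    "The cat sat on the mat. The dog ran to the door.",
    "The sun set on the bay. The boy ran to the boat.",
    "The man sat on the bus. The kid ran to the gate.",
    "Notwithstanding considerable methodological heterogeneity, contemporary "
    "investigations demonstrate substantial improvements.",
    "The fox hid in the den. The owl sat on the post.",
    "The pig lay in the mud. The hen ran to the barn.",
]
suspicious_segments = style_outliers(segments)  # -> [3]
```

The flagged segment would then be passed on to an extrinsic analysis or to a human reviewer, as described above.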

Extrinsic Plagiarism Detection

The reference collection to which extrinsic plagiarism detection approaches compare the suspicious document is typically very large, e.g., a significant subset of the Internet for production-ready plagiarism detection systems. Therefore, pairwise comparisons of the input document to all documents in the reference collection are often computationally infeasible. To address this challenge, most extrinsic plagiarism detection approaches consist of two stages: candidate retrieval (also called source retrieval) and detailed analysis (also referred to as text alignment) [ 197 ]. The candidate retrieval stage efficiently limits the collection to a subset of potential source documents. The detailed analysis stage then performs elaborate pairwise document comparisons to identify parts of the source documents that are similar to parts of the suspicious document.

Candidate Retrieval.  Given a suspicious input document and a querying tool, e.g., a search engine or database interface, the task in the candidate retrieval stage is to retrieve from the reference collection all documents that share content with the input document [ 198 ]. Many plagiarism detection systems use the APIs of Web search engines instead of maintaining their own reference collections and querying tools.

Recall is the most important performance metric for the candidate retrieval stage of the extrinsic plagiarism detection process, since the subsequent detailed analysis cannot identify source documents missed in the first stage [ 105 ]. The number of queries issued is another typical metric to quantify the performance in the candidate retrieval stage. Keeping the number of queries low is particularly important if the candidate retrieval approach involves Web search engines, since such engines typically charge for issuing queries.
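
The two metrics named above, recall and the number of queries issued, can be computed as in the following sketch. The search function is a hypothetical stand-in for a search engine API backed by a tiny hard-coded index; the document identifiers, queries, and ground-truth sources are invented for illustration.

```python
def candidate_recall(retrieved_ids, true_source_ids):
    """Fraction of true source documents that appear among the retrieved candidates."""
    true_sources = set(true_source_ids)
    if not true_sources:
        return 1.0  # nothing to find; a convention chosen for this sketch
    return len(true_sources & set(retrieved_ids)) / len(true_sources)


class CountingSearch:
    """Wraps a search function and counts how many queries were issued."""

    def __init__(self, search_fn):
        self._search_fn = search_fn
        self.queries_issued = 0

    def __call__(self, query):
        self.queries_issued += 1
        return self._search_fn(query)


def toy_search(query):
    """Hypothetical stand-in for a search engine API: keyword lookup in a tiny index."""
    index = {"fox": ["doc_a"], "stylometry": ["doc_b", "doc_c"], "citation": ["doc_d"]}
    results = []
    for term in query.lower().split():
        results.extend(index.get(term, []))
    return results


search = CountingSearch(toy_search)
retrieved = set()
for query in ["quick fox", "stylometry features"]:
    retrieved.update(search(query))

# Assumed ground truth: the documents that were actually reused.
recall = candidate_recall(retrieved, {"doc_a", "doc_b", "doc_d"})
```

Here two queries retrieve two of the three true sources, so recall is 2/3. Issuing a third query for "citation" would raise recall to 1.0 at the cost of one more query, which is the trade-off the paragraph above describes.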

Detailed Analysis.  The set of documents retrieved in the candidate retrieval stage is the input to the detailed analysis stage. Formally, the task in the detailed analysis stage is defined as follows. Let $d_q$ be a suspicious document. Let $D = \lbrace d_s \;|\; s = 1 \ldots n \rbrace$ be a set of potential source documents. Determine whether a fragment $s_q \in d_q$ is similar to a fragment $s \in d_s$ ($d_s \in D$) and identify all such pairs of fragments $(s_q, s)$ [ 202 ]. Eventually, an expert should determine whether the identified pairs $(s_q, s)$ constitute legitimate content re-use, plagiarism, or false positives [ 29 ]. The detailed analysis typically consists of three steps [ 197 ]:

  • Seeding : Finding parts of the content in the input document (the seed) within a document of the reference collection
  • Extension : Extending each seed as far as possible to find the complete passage that may have been reused
  • Filtering : Excluding fragments that do not meet predefined criteria (e.g., that are too short), and handling of overlapping passages

The most common strategy for the extension step is the so-called rule-based approach. The approach merges seeds if they occur next to each other in both the suspicious and the source document and if the size of the gap between the passages is below a threshold [ 198 ].
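The rule-based extension step can be illustrated with a minimal sketch. It assumes that each seed is given as character offsets in both documents; the names `Seed`, `merge_seeds`, and `gap_threshold` are hypothetical and not taken from any reviewed system.

```python
from typing import List, Tuple

# A seed is a pair of matching spans: (start, end) in the suspicious document
# followed by (start, end) in the source document, with exclusive ends.
Seed = Tuple[int, int, int, int]


def merge_seeds(seeds: List[Seed], gap_threshold: int) -> List[Seed]:
    """Merge seeds that lie next to each other in both documents."""
    merged: List[Seed] = []
    for q_start, q_end, s_start, s_end in sorted(seeds):
        if merged:
            pq_start, pq_end, ps_start, ps_end = merged[-1]
            gap_q = q_start - pq_end  # gap in the suspicious document
            gap_s = s_start - ps_end  # gap in the source document
            in_order = s_start >= ps_start
            if in_order and gap_q < gap_threshold and gap_s < gap_threshold:
                # Extend the previous passage instead of starting a new one.
                merged[-1] = (pq_start, max(pq_end, q_end),
                              ps_start, max(ps_end, s_end))
                continue
        merged.append((q_start, q_end, s_start, s_end))
    return merged


seeds = [(0, 10, 100, 110), (12, 20, 113, 120), (50, 60, 300, 310)]
print(merge_seeds(seeds, gap_threshold=5))
```

In this example, the first two seeds are separated by gaps of 2 and 3 characters and are merged, whereas the third seed is too far away and stays a separate passage.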

Paraphrase Identification is often a separate step within the detailed analysis stage of extrinsic plagiarism detection methods but is also a research field in its own right. The task in paraphrase identification is determining semantically equivalent sentences in a set of sentences [ 71 ]. SemEval is a well-known conference series that addresses paraphrase identification for tweets [ 9 , 222 ]. Identifying semantically equivalent tweets is more difficult than identifying semantically equivalent sentences in academic documents due to out-of-vocabulary words, abbreviations, and slang terms that are frequent in tweets [ 24 ]. Al-Samadi et al. [ 9 ] provided a thorough review of the research on paraphrase identification.

Intrinsic Plagiarism Detection

The concept of intrinsic plagiarism detection was introduced by Meyer zu Eissen and Stein [ 277 ]. Whereas extrinsic plagiarism detection methods search for similarities across documents, intrinsic plagiarism detection methods search for dissimilarities within a document. A crucial presumption of the intrinsic approach is that authors have different writing styles that allow identifying the authors. Juola provides a comprehensive overview of stylometric methods to analyze and quantify writing style [ 127 ].

Intrinsic plagiarism detection consists of two tasks [ 200 , 233 ]:

  • Style breach detection : Delineating passages with different writing styles
  • Author identification : Identifying the author of documents or passages

Author identification furthermore subsumes two specialized tasks:

  • Author clustering : Grouping documents or passages by authorship
  • Author verification : Deciding whether an input document was authored by the same person as a set of sample documents

Style Breach Detection.  Given a suspicious document, the goal of style-breach detection is identifying passages that exhibit different stylometric characteristics [ 233 ].

Most of the algorithms for style breach detection follow a three-step process [ 214 ]:

  • Text segmentation based on paragraphs, (overlapping) sentences, character or word n-grams
  • Feature space mapping , i.e., computing stylometric measures for segments
  • Clustering segments according to observed critical values
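The three steps can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: segments are paragraphs, the only stylometric feature is the average word length, and the third step is reduced to flagging segments whose feature deviates from the document mean by more than a z-score threshold. All function names and the threshold are hypothetical.

```python
import statistics
from typing import List


def segment(text: str) -> List[str]:
    """Step 1: split the text into paragraphs separated by blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def avg_word_length(segment_text: str) -> float:
    """Step 2: map a segment to a single stylometric feature."""
    words = segment_text.split()
    return sum(len(w) for w in words) / len(words)


def flag_outliers(text: str, z_threshold: float = 1.5) -> List[int]:
    """Step 3: return the indices of segments with a deviating style."""
    features = [avg_word_length(s) for s in segment(text)]
    if len(features) < 2:
        return []
    mean = statistics.mean(features)
    stdev = statistics.pstdev(features)
    if stdev == 0:
        return []  # all segments look alike under this feature
    return [i for i, f in enumerate(features)
            if abs(f - mean) / stdev > z_threshold]


document = "\n\n".join([
    "the cat sat", "the dog ran", "extraordinary extraordinary",
    "her hat was red", "his cup was big",
])
print(flag_outliers(document))  # index of the paragraph with long words
```

Real systems combine many features and use proper clustering or outlier detection; the single feature here only serves to make the pipeline visible.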

Author Clustering typically follows the style breach detection stage and employs pairwise comparisons of passages identified in the previous stage to group them by author [ 247 ]. For each pair of passages, a similarity measure is computed that considers the results of the feature space mapping in the style-breach detection stage. Formally, for a given set of documents or passages $D$, the task is to find the decomposition of this set $D_1, D_2, \ldots, D_n$, such that:

  • $D = \bigcup_{i = 1}^{n} D_i$
  • $D_i \cap D_j = \emptyset$ for each $i \ne j$
  • All documents of the same class have the same author
  • For each pair of documents from different classes, the authors are different

Author Verification is typically defined as the prediction of whether two pieces of text were written by the same person. In practice, author verification is a one-class classification problem [ 234 ] that assumes all documents in a set have the same author. By comparing the writing style at the document level, outliers can be detected that may represent plagiarized documents. This method can reveal ghostwriting [ 127 ], unless the same ghost-writer authored all documents in the set.

Author Identification (also referred to as author classification) takes multiple document sets as input. Each set of documents must have been verifiably written by a single author. The task is assigning documents with unclear authorship to the stylistically most similar document set. Each authorship identification problem for which the set of candidate authors is known is easily transformable into multiple authorship verification problems [ 128 ]. An open-set variant of the author identification problem allows for a suspicious document with an author that is not included in any of the input sets [ 234 ].

Several other stylometry-based tasks, e.g., author profiling, exist. However, we limit the descriptions in the next section to methods whose main application is plagiarism detection. We refer readers interested in related tasks to the overview paper of PAN’17 [ 200 ].

PLAGIARISM DETECTION METHODS

We categorize plagiarism detection methods and structure their description according to our typology of plagiarism. Lexical detection methods exclusively consider the characters in a document. Syntax-based detection methods consider the sentence structure, i.e., the parts of speech and their relationships. Semantics-based detection methods compare the meaning of sentences, paragraphs, or documents. Idea-based detection methods go beyond the analysis of text in a document by considering non-textual content elements like citations, images, and mathematical content. Before presenting details on each class of detection methods, we describe preprocessing strategies that are relevant for all classes of detection methods.

Preprocessing

The initial preprocessing steps applied as part of plagiarism detection methods typically include document format conversions and information extraction. Before 2013, researchers described the extraction of text from binary document formats like PDF and DOC as well as from structured document formats like HTML and DOCX in more detail than in more recent years (e.g., Reference [ 49 ]). Most research papers on text-based plagiarism detection methods we review in this article do not describe any format conversion or text extraction procedures. We attribute this development to the technical maturity of text extraction approaches. For plagiarism detection approaches that analyze non-textual content elements, e.g., academic citations and references [ 90 , 91 , 161 , 191 ], images [ 162 ], and mathematical content [ 163 , 165 ], document format conversion and information extraction still present significant challenges.

Specific preprocessing operations heavily depend on the chosen approach. The aim is to remove noise while keeping the information required for the analysis. For text-based detection methods, typical preprocessing steps include lowercasing, punctuation removal, tokenization, segmentation, number removal or number replacement, named entity recognition, stop word removal, stemming or lemmatization, Part of Speech (PoS) tagging, and synset extension. Approaches using synset extension typically employ thesauri like WordNet [ 69 ] to assign the identifier of the class of synonymous words to which a word in the text belongs. The synonymous words can then be considered for similarity calculation. Detection methods operating on the lexical level usually perform chunking as a preprocessing step. Chunking groups text elements into sets of given lengths, e.g., word n-grams, line chunks, or phrasal constituents in a sentence [ 47 ].
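A few of these steps can be sketched with the Python standard library alone. The stop word list, the `num` placeholder for numbers, and the function names are illustrative assumptions; the reviewed papers typically rely on NLP libraries instead.

```python
import re
from typing import List, Tuple

# Tiny illustrative stop word list; real systems use larger,
# language-specific lists.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def preprocess(text: str) -> List[str]:
    """Lowercase, replace numbers, strip punctuation, drop stop words."""
    text = text.lower()
    text = re.sub(r"\d+", " num ", text)   # number replacement
    text = re.sub(r"[^\w\s]", " ", text)   # punctuation removal
    tokens = text.split()                  # whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]


def word_ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Chunk a token list into overlapping word n-grams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = preprocess("The cat sat in 2 of the hats!")
print(tokens)
print(word_ngrams(tokens, 2))
```

Stemming, lemmatization, PoS tagging, and synset extension are omitted here because they require linguistic resources.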

Some detection approaches, especially in intrinsic plagiarism detection, limit preprocessing to a minimum to not lose potentially useful information [ 9 , 67 ]. For example, intrinsic detection methods typically do not remove punctuation.

All preprocessing steps we described represent standard procedures in Natural Language Processing (NLP); hence, well-established, publicly available software libraries support these steps. The research papers we reviewed predominantly used the multilingual and multifunctional text processing pipelines Natural Language Toolkit (Python) or Stanford CoreNLP library (Java). Commonly applied syntax analysis tools include Penn Treebank, Citar, TreeTagger, and Stanford parser. Several papers present resources for Arabic [ 33 , 34 , 227 ] and Urdu [ 54 ] language processing.

Lexical Detection Methods

Lexical detection methods exclusively consider the characters in a text for similarity computation. The methods are best suited for identifying copy-and-paste plagiarism that exhibits little to no obfuscation. To detect obfuscated plagiarism, the lexical detection methods must be combined with more sophisticated NLP approaches [ 9 , 67 ]. Lexical detection methods are also well-suited to identify homoglyph substitutions, which are a common form of technical disguise. The only paper in our collection that addressed the identification of technically disguised plagiarism is Reference [ 19 ]. The authors used a list of confusable Unicode characters and applied approximate word n-gram matching using the normalized Hamming distance.
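The following sketch illustrates the two ingredients mentioned above: a table of confusable characters and the normalized Hamming distance. It is not the implementation of Reference [ 19 ]; the five-entry `CONFUSABLES` table and the function names are assumptions made for illustration.

```python
# Hypothetical, tiny excerpt of a confusables table (Cyrillic -> Latin).
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
}


def normalize(word: str) -> str:
    """Replace known homoglyphs with their Latin look-alikes."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in word)


def normalized_hamming(a: str, b: str) -> float:
    """Share of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal lengths")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


disguised = "pl\u0430gi\u0430rism"  # two Latin letters swapped for Cyrillic
print(normalized_hamming(disguised, "plagiarism"))             # small, nonzero
print(normalized_hamming(normalize(disguised), "plagiarism"))  # zero
```

A small but nonzero distance between visually identical words is a signal of technical disguise, whereas exact string matching would simply miss the word.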

Lexical detection approaches typically fall into one of the three categories we describe in the following: n-gram comparisons, vector space models, and querying search engines.

N-gram Comparisons.  Comparing n-grams refers to determining the similarity of sequences of $n$ consecutive entities, which are typically characters or words and less frequently phrases or sentences. N-gram comparisons are widely applied for candidate retrieval or the seeding phase of the detailed analysis stage in extrinsic monolingual and cross-language detection approaches as well as in intrinsic detection.

Approaches using n-gram comparisons first split a document into (possibly overlapping) n-grams, which they use to create a set-based representation of the document or passage (“fingerprint”). To enable efficient retrieval, most approaches store fingerprints in index data structures. To speed up the comparison of individual fingerprints, some approaches hash or compress the n-grams that form the fingerprints. Hashing or compression reduces the lengths of the strings under comparison and allows performing computationally more efficient numerical comparisons. However, hashing introduces the risk of false positives due to hash collisions. Therefore, hashed or compressed fingerprinting is more commonly applied for the candidate retrieval stage, in which achieving high recall is more important than achieving high precision.
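A minimal sketch of hashed character n-gram fingerprinting follows. The use of CRC32 as the hash function, the n-gram length of 5, and the containment measure are assumptions chosen for brevity, not choices reported by the reviewed papers.

```python
import zlib
from typing import Set


def fingerprint(text: str, n: int = 5) -> Set[int]:
    """Return the set of hashed, overlapping character n-grams."""
    grams = (text[i:i + n] for i in range(len(text) - n + 1))
    # CRC32 maps each n-gram to a 32-bit integer; distinct n-grams may
    # collide, which is the source of false positives noted above.
    return {zlib.crc32(gram.encode("utf-8")) for gram in grams}


def containment(suspicious: Set[int], source: Set[int]) -> float:
    """Share of the suspicious fingerprint that also occurs in the source."""
    if not suspicious:
        return 0.0
    return len(suspicious & source) / len(suspicious)


a = fingerprint("the quick brown fox jumps")
b = fingerprint("the quick red fox jumps")
print(round(containment(a, b), 2))
```

Comparing integers instead of strings keeps set intersection cheap, which matters when fingerprints of many documents are stored in an index.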

Fingerprinting is the most popular method for assessing local lexical similarity [ 104 ]. However, recent research has focused increasingly on detecting obfuscated plagiarism. Thus n-gram fingerprinting is often restricted to the preprocessing stage [ 20 ] or used as a feature for machine learning [ 7 ]. Character n-gram comparisons can be applied to cross-language plagiarism detection (CLPD) if the languages in question exhibit a high lexical similarity, e.g., English and Spanish [ 79 ].

Table 4 presents papers employing word n-grams; Table 5 lists papers using character n-grams, and Table 6 shows papers that employ hashing or compression for n-gram fingerprinting.

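Table 4. Papers employing word n-grams
Approach | Task | Method variation | Papers
Extrinsic | Document-level detection | Stop words removed | [ , , , , , , ]
Extrinsic | Document-level detection | Stop word n-grams | [ ]
Extrinsic | Candidate retrieval | Stop words removed | [ ]
Extrinsic | Candidate retrieval | All word n-grams and stop word n-grams | [ ]
Extrinsic | Detailed analysis | All word n-grams | [ , , ]
Extrinsic | Detailed analysis | Stop words removed | [ , ]
Extrinsic | Detailed analysis | All word n-grams, stop word n-grams, and named entity n-grams | [ ]
Extrinsic | Detailed analysis | Numerous n-gram variations | [ , ]
Extrinsic | Detailed analysis | Context n-grams | [ , ]
Extrinsic | Paraphrase identification | All word n-grams | [ , ]
Extrinsic | Paraphrase identification | Combination with ESA | [ ]
Extrinsic | CLPD | Stop words removed | [ ]
Intrinsic | Author identification | Overlap in LZW dictionary | [ ]
Intrinsic | Author verification | Word n-grams | [ , , , , , , ]
Intrinsic | Author verification | Stop word n-grams | [ , , ]
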
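Table 5. Papers employing character n-grams (CNG)
Approach | Task | Method variation | Papers
Extrinsic | Document-level detection | Pure character n-grams | [ , ]
Extrinsic | Document-level detection | Overlap in LZW dictionary | [ ]
Extrinsic | Document-level detection | Machine learning | [ ]
Extrinsic | Document-level detection | Combined with Bloom filters | [ ]
Extrinsic | Detailed analysis | Hashed character n-grams | [ ]
Extrinsic | Paraphrase identification | Feature for machine learning | [ ]
Extrinsic | Cross-language PD | Cross-language CNG | [ , , , ]
Intrinsic | Style-breach detection | CNG as stylometric features | [ , ]
Intrinsic | Author identification | Bit n-grams | [ ]
Intrinsic | Author verification | CNG as stylometric features | [ , , , , ], [ , , , , , , , , , , , ]
Intrinsic | Author clustering | CNG as stylometric features | [ , , , , ]
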
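Table 6. Papers employing hashing or compression for n-gram fingerprinting
Task | Method | Papers
Document-level detection | Hashing | [ , , , ]
Candidate retrieval | Hashing | [ , , , ]
Detailed analysis | Hashing | [ , , , ]
Document-level detection | Compression | [ ]
Author identification | Compression | [ , , , ]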

Vector Space Models (VSM) are a classic retrieval approach that represents texts as high-dimensional vectors [ 249 ]. In plagiarism detection, words or word n-grams typically form the dimensions of the vector space, and the components of a vector undergo term frequency–inverse document frequency (tf-idf) weighting [ 249 ]. Idf values are derived either from the suspicious document or from the corpus [ 205 , 238 ]. The similarity of vector representations, typically quantified using the cosine measure, i.e., the cosine of the angle between the vectors, is used as a proxy for the similarity of the documents the vectors represent.
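As an illustration, a tf-idf weighted vector space model with cosine similarity can be sketched in a few lines. The smoothed idf formula and the function names are assumptions; the reviewed papers use various weighting variants.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Represent tokenized documents as sparse tf-idf vectors."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf (an assumption) so that terms occurring in every
    # document keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()}
            for doc in docs]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [["copy", "paste", "text"], ["copy", "paste", "text"], ["new", "idea"]]
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Identical documents yield a cosine close to 1, and documents without shared terms yield 0; a similarity threshold then decides which pairs are passed on.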

Most approaches employ predefined similarity thresholds to retrieve documents or passages for subsequent processing. Kanjirangat and Gupta [ 249 ] and Ravi et al. [ 208 ] follow a different approach. They divide the set of source documents into K clusters by first selecting K centroids and then assigning each document to the group whose centroid is most similar. The suspicious document is used as one of the centroids and the corresponding cluster is passed on to the subsequent processing stages.

VSM remain popular and well-performing approaches not only for detecting copy-and-paste plagiarism but also for identifying obfuscated plagiarism as part of a semantic analysis. VSM are also frequently applied in intrinsic plagiarism detection. A typical approach is to represent sentences as vectors of stylometric features to find outliers or to group stylistically similar sentences.

Table 7 presents papers that employ VSM for extrinsic plagiarism detection; Table 8 lists papers using VSM for intrinsic plagiarism detection.

Table 7. Papers employing VSM for extrinsic plagiarism detection
Task | Unit | Variation | Papers
Document-level detection | sentence | Combination of similarity metrics | [ ]
Document-level detection | sentence | VSM as a bitmap; compressed for comparison | [ ]
Document-level detection | sentence | Machine learning to set similarity thresholds | [ ]
Document-level detection | word | Synonym replacement | [ ]
Document-level detection | word, sentence | Fuzzy set of WordNet synonyms | [ ]
Candidate retrieval | word | Vectors of word n-grams | [ , , ]
Candidate retrieval | word | K-means clustering of vectors to find documents most similar to the input doc. | [ , ]
Candidate retrieval | word | Z-order mapping of multidimensional vectors to scalar and subsequent filtering | [ ]
Candidate retrieval | word | Topic-based segmentation; re-ranking of results based on the proximity of terms | [ ]
Detailed analysis | sentence | Pure VSM | [ , , , ]
Detailed analysis | sentence | Adaptive adjustment of parameters to detect the type of obfuscation | [ , ]
Detailed analysis | sentence | Hybrid similarity (Cosine + Jaccard) | [ ]
Detailed analysis | word | Pure VSM | [ ]
Paraphrase identification | sentence | Semantic role annotation | [ ]

Table 8. Papers employing VSM for intrinsic plagiarism detection
Task | Unit | Variation | Papers
Style-breach detection | word | Word frequencies | [ ]
Style-breach detection | word | Vectors of lexical and syntactic features | [ , ]
Style-breach detection | sentence | Vectors of word embeddings | [ ]
Style-breach detection | sentence | Vectors of lexical features | [ ]
Style-breach detection | sliding window | Vectors of lexical features | [ ]
Author clustering | document | Vectors of lexical features | [ , , , , ]
Author clustering | document | Word frequencies | [ ]
Author clustering | document | Word embeddings | [ ]
Author verification | document | Word frequencies | [ ]
Author verification | document | Vectors of lexical features | [ , , , ]
Author verification | document | Vectors of lexical and syntactic features | [ , , , ]
Author verification | document | Vectors of syntactic features | [ ]

Querying Web Search Engines.  Many detection methods employ Web search engines for candidate retrieval, i.e., for finding potential source documents in the initial stage of the detection process. The strategy for selecting the query terms from the suspicious document is crucial for the success of this approach. Table 9 gives an overview of the strategies for query term selection employed by papers in our collection.

Table 9. Strategies for query term selection
Strategy | Papers
Querying the words with the highest tf-idf value | [ , , , , , ]
Querying the least frequent words | [ , ]
Querying the least frequent strings | [ ]
Querying the words with the highest tf-idf value as well as noun phrases | [ , , ]
Querying the nouns and most frequent words | [ ]
Querying the nouns and verbs | [ ]
Querying the nouns, verbs, and adjectives | [ , , , ]
Querying the nouns, facts (dates, names, etc.) as well as the most frequent words | [ ]
Querying keywords and the longest sentence in a paragraph | [ , ]
Comparing different querying heuristics | [ ]
Incrementing passage length and passage selection heuristics | [ ]
Query expansion by words from UMLS Meta-thesaurus | [ ]
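The most frequently used strategy, querying the words with the highest tf-idf value, can be sketched as follows. The background corpus used to estimate idf, the smoothing, and the name `top_tfidf_terms` are assumptions for illustration.

```python
import math
from collections import Counter
from typing import List


def top_tfidf_terms(suspicious: List[str], corpus: List[List[str]],
                    k: int = 3) -> List[str]:
    """Return the k terms of the suspicious document with the highest tf-idf."""
    n = len(corpus)
    tf = Counter(suspicious)

    def idf(term: str) -> float:
        df = sum(term in doc for doc in corpus)
        return math.log((1 + n) / (1 + df))  # smoothed idf (assumption)

    # Sort by descending tf-idf; ties are broken alphabetically.
    ranked = sorted(tf, key=lambda t: (-tf[t] * idf(t), t))
    return ranked[:k]


corpus = [["the", "method", "works"], ["the", "results", "hold"],
          ["the", "data", "show"]]
suspicious = ["the", "stylometry", "the", "method", "stylometry"]
print(top_tfidf_terms(suspicious, corpus, k=2))
```

Selecting only a few high-scoring terms per query helps to keep the number of queries issued to the search engine low.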

Intrinsic detection approaches can employ Web search engines to realize the General Impostors Method. This method transforms the one-class verification problem regarding an author's writing style into a two-class classification problem. The method extracts keywords from the suspicious document to retrieve a set of topically related documents from external sources, the so-called “impostors.” The method then quantifies the “average” writing style observable in impostor documents, i.e., the distribution of stylistic features to be expected. Subsequently, the method compares the stylometric features of passages from the suspicious document to the features of the “average” writing style in impostor documents. This way, the method distinguishes the stylistic features that are characteristic of an author from the features that are specific to the topic [ 135 ]. Koppel and Winter present the method in detail [ 146 ]. Detection approaches implementing the general impostors method achieved excellent results in the PAN competitions, e.g., winning the competition in 2013 and 2014 [ 128 , 232 ]. Table 10 presents papers using this method.

Table 10. Papers using the General Impostors Method
Task | Papers
Author verification | [ , , , , ]

Syntax-based Methods

Syntax-based detection methods typically operate on the sentence level and employ PoS tagging to determine the syntactic structure of sentences [ 99 , 245 ]. The syntactic information helps to address morphological ambiguity during the lemmatization or stemming step of preprocessing [ 117 ], or to reduce the workload of a subsequent semantic analysis, typically by exclusively comparing the pairs of words belonging to the same PoS class [ 102 ]. Many intrinsic detection methods use the frequency of PoS tags as a stylometric feature.

The method of Tschuggnall and Specht [ 245 ] relies solely on the syntactic structure of sentences. Table 11 presents an overview of papers using syntax-based methods.

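Table 11. Papers using syntax-based methods
Approach | Method | Variation | Papers
Extrinsic | PoS tagging | Addressing morphological ambiguity | [ , ]
Extrinsic | PoS tagging | Word comparisons within the same PoS class only | [ , ]
Extrinsic | PoS tagging | Combined with stop-words | [ ]
Extrinsic | PoS tagging | Comparing PoS sequences | [ ]
Extrinsic | PoS tagging | Combination with PPM compression | [ ]
Intrinsic | PoS tags as stylometric features | PoS frequency | [ , , , , ]
Intrinsic | PoS tags as stylometric features | PoS n-gram frequency | [ , , , , , , , ]
Intrinsic | PoS tags as stylometric features | PoS frequency, PoS n-gram frequency, starting PoS tag | [ ]
Intrinsic | Comparing syntactic trees | Direct comparison | [ , ]
Intrinsic | Comparing syntactic trees | Integrated syntactic graphs | [ ]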

Semantics-based Methods

Papers presenting semantics-based detection methods are the largest group in our collection. This finding reflects the importance of detecting obfuscated forms of academic plagiarism, for which semantics-based detection methods are the most promising approach [ 216 ]. Semantics-based methods operate on the hypothesis that the semantic similarity of two passages depends on the occurrence of similar semantic units in these passages. The semantic similarity of two units derives from their occurrence in similar contexts.

Many semantics-based methods use thesauri (e.g., WordNet or EuroVoc). Including semantic features, like synonyms, hypernyms, and hyponyms, in the analysis improves the performance of paraphrase identification [ 9 ]. Using a canonical synonym for each word helps detect synonym-replacement obfuscation and reduces the vector space dimension [ 206 ]. Sentence segmentation and text tokenization are crucial parameters for all semantics-based detection methods. Tokenization extracts the atomic units of the analysis, which are typically either words or phrases. Most papers in our collection use words as tokens.

Employing established semantic text analysis methods like Latent Semantic Analysis (LSA), Explicit Semantic Analysis (ESA), and word embeddings for extrinsic plagiarism detection is a popular and successful approach. This group of methods follows the idea of “distributional semantics,” i.e., terms co-occurring in similar contexts tend to convey a similar meaning. In the reverse conclusion, distributional semantics assumes that similar distributions of terms indicate semantically similar texts. The methods differ in the scope within which they consider co-occurring terms. Word embeddings consider only the immediately surrounding terms, LSA analyzes the entire document and ESA uses an external corpus.

Latent Semantic Analysis is a technique to reveal and compare the underlying semantic structure of texts [ 55 ]. To determine the similarity of term distributions in texts, LSA computes a matrix, in which rows represent terms, columns represent documents and the entries of the matrix typically represent log-weighted tf-idf values [ 46 ]. LSA then employs Singular Value Decomposition (SVD) or similar dimensionality reduction techniques to find a lower-rank approximation of the term-document matrix by reducing the number of rows (i.e., pruning less relevant terms) while maintaining the similarity distribution between columns (i.e., the text representations). The terms remaining after the dimensionality reduction are assumed to be most representative of the semantic meaning of the text. Hence, comparing the rank-reduced matrix-representations of texts allows computing the semantic similarity of the texts [ 46 ].
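For reference, the rank reduction is commonly written as a truncated SVD. The notation below is introduced here for illustration and is not taken from the reviewed papers: $X$ is the $m \times n$ term-document matrix, $k$ the chosen rank, $e_j$ the $j$-th standard basis vector, and $\hat{d}_j$ the reduced representation of document $j$. In this standard formulation, each document is represented in $k$ latent dimensions, and documents are compared using the cosine of their reduced vectors.

```latex
X \approx X_k = U_k \Sigma_k V_k^{\top}, \qquad
\hat{d}_j = \Sigma_k V_k^{\top} e_j, \qquad
\mathrm{sim}(d_i, d_j) =
  \frac{\hat{d}_i \cdot \hat{d}_j}
       {\lVert \hat{d}_i \rVert \, \lVert \hat{d}_j \rVert}
```

Here, $U_k$ and $V_k$ contain the first $k$ left and right singular vectors of $X$, and $\Sigma_k$ is the diagonal matrix of the $k$ largest singular values.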

LSA can reveal similarities between texts that traditional vector space models cannot express [ 116 ]. The ability of LSA to address synonymy is beneficial for paraphrase identification. For example, Satyapanich et al. [ 222 ] considered two sentences as paraphrases if their LSA similarity is above a threshold. While LSA performs well in addressing synonymy, its ability to reflect polysemy is limited [ 55 ].

Ceska [ 46 ] first applied LSA for plagiarism detection. AlSallal et al. [ 15 ] proposed a novel weighting approach that assigns higher weights to the most common terms and used LSA as a stylometric feature for intrinsic plagiarism detection. Aldarmaki and Diab [ 11 ] used weighted matrix factorization—a method similar to LSA—for cross-language paraphrase identification. Table 12 lists other papers employing LSA for extrinsic and intrinsic plagiarism detection.

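Table 12. Papers employing LSA
Approach | Task | Variation | Papers
Extrinsic | Document-level detection | LSA with phrase tf-idf | [ , ]
Extrinsic | Document-level detection | LSA in combination with other methods | [ ]
Extrinsic | Candidate retrieval | LSA only | [ ]
Extrinsic | Paraphrase identification | LSA only | [ ]
Extrinsic | Paraphrase identification | LSA with machine learning | [ , , , ]
Extrinsic | Paraphrase identification | Weighted matrix factorization | [ ]
Intrinsic | Document-level detection | LSA with stylometric features | [ ]
Intrinsic | Author identification | LSA with machine learning | [ , ]
Intrinsic | Author identification | LSA at CNG level | [ ]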

Explicit Semantic Analysis is an approach to model the semantics of a text in a high-dimensional vector space of semantic concepts [ 82 ]. Semantic concepts are the topics in a man-made knowledge base corpus (typically Wikipedia or other encyclopedias). Each article in the knowledge base is an explicit description of the semantic content of the concept, i.e., the topic of the article [ 163 ]. ESA builds a “semantic interpreter” that allows representing texts as concept vectors whose components reflect the relevance of the text for each of the semantic concepts, i.e., knowledge base articles [ 82 ]. Applying vector similarity measures, such as the cosine metric, to the concept vectors then allows determining the texts’ semantic similarity.

Table 13 shows detection methods that employed ESA depending on the corpus used to build the semantic interpreter. Constructing the semantic interpreter from multilingual corpora, such as Wikipedia, allows the application of ESA for cross-language plagiarism detection [ 78 ]. ESA has several applications beyond PD, e.g., when applied for document classification, ESA achieved a precision above 95% [ 124 , 174 ].

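Table 13. Papers employing ESA, by corpus used to build the semantic interpreter
Corpus | Papers
Wikipedia (monolingual) | [ , , ]
Wikipedia (cross-language) | [ , ]
Wikipedia + FanFiction | [ ]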

The Information Retrieval-based semantic similarity approach proposed by Itoh [ 120 ] is a generalization of ESA. The method models a text passage as a set of words and employs a Web search engine to obtain a set of relevant documents for each word in the set. The method then computes the semantic similarity of the text passages as the similarity of the document sets obtained, typically using the Jaccard metric. Table 14 presents papers that also follow this approach.

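Table 14. Papers employing Information Retrieval-based semantic similarity
Document sets | Papers
Articles from Wikipedia | [ ]
Synonyms from Farsnet | [ ]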
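The Information Retrieval-based similarity computation can be sketched as follows. The Web search engine is replaced by a hypothetical in-memory lookup (`TOY_INDEX`, `toy_search`), and the per-word result sets are merged by union, which is an assumption about how the document sets are combined.

```python
from typing import Callable, Dict, Iterable, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieved_documents(passage: Iterable[str],
                        search: Callable[[str], Set[str]]) -> Set[str]:
    """Collect the documents a search function returns for each word."""
    docs: Set[str] = set()
    for word in set(passage):
        docs |= search(word)
    return docs


def ir_similarity(passage_a: Iterable[str], passage_b: Iterable[str],
                  search: Callable[[str], Set[str]]) -> float:
    """Similarity of two passages via the documents their words retrieve."""
    return jaccard(retrieved_documents(passage_a, search),
                   retrieved_documents(passage_b, search))


# Hypothetical stand-in for a Web search engine: word -> document ids.
TOY_INDEX: Dict[str, Set[str]] = {
    "car": {"d1", "d2"},
    "automobile": {"d1", "d2", "d3"},
    "banana": {"d9"},
}


def toy_search(word: str) -> Set[str]:
    return TOY_INDEX.get(word, set())


print(ir_similarity(["car"], ["automobile"], toy_search))
print(ir_similarity(["car"], ["banana"], toy_search))
```

Although "car" and "automobile" share no characters, they retrieve largely the same documents and therefore receive a high similarity.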

Word embeddings is another semantic analysis approach that is conceptually related to ESA. While ESA considers term occurrences in each document of the corpus, word embeddings exclusively analyze the words that surround the term in question. The idea is that terms appearing in proximity to a given term are more characteristic of the semantic concept represented by the term in question than more distant words. Therefore, terms that frequently co-occur in proximity within texts should also appear closer within the vector space [ 73 ]. In cross-language plagiarism detection, word embeddings outperformed other methods when syntactic weighting was employed [ 73 ]. Table 15 summarizes papers that employ word embeddings.

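Table 15. Papers employing word embeddings
Approach | Task | Papers
Extrinsic | Candidate retrieval | [ ]
Extrinsic | Cross-language PD | [ ]
Intrinsic | Paraphrase identification | [ , , , , ]
Intrinsic | Style-breach detection | [ ]
Intrinsic | Author clustering | [ ]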

Word Alignment is a semantic analysis approach widely used for machine translation [ 240 ] and paraphrase identification. Words are aligned, i.e., marked as related, if they are semantically similar. Semantic similarity of two words is typically retrieved from an external database, like WordNet. The semantic similarity of two sentences is then computed as the proportion of aligned words. Word alignment approaches achieved the best performance for the paraphrase identification task at SemEval 2014 [ 240 ] and were among the top-performing approaches at SemEval-2015 [ 9 , 242 ].
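A minimal sketch of alignment-based sentence similarity follows. The synonym table stands in for an external database like WordNet, and the normalization over the words of both sentences is one possible reading of "proportion of aligned words"; both are assumptions.

```python
from typing import List

# Hypothetical stand-in for a lexical database such as WordNet.
SYNONYMS = {
    frozenset({"car", "automobile"}),
    frozenset({"buy", "purchase"}),
}


def related(a: str, b: str) -> bool:
    """Two words are aligned if they are identical or listed as synonyms."""
    return a == b or frozenset({a, b}) in SYNONYMS


def alignment_similarity(s1: List[str], s2: List[str]) -> float:
    """Proportion of words in both sentences that have an aligned partner."""
    if not s1 and not s2:
        return 0.0
    aligned_1 = sum(any(related(w, v) for v in s2) for w in s1)
    aligned_2 = sum(any(related(w, v) for v in s1) for w in s2)
    return (aligned_1 + aligned_2) / (len(s1) + len(s2))


s1 = ["they", "buy", "a", "car"]
s2 = ["they", "purchase", "an", "automobile"]
print(alignment_similarity(s1, s2))
```

In the example, three of the four words in each sentence find a partner, so the similarity is 0.75 even though only one word is literally identical.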

Cross-language alignment-based similarity analysis (CL-ASA) is a variation of the word alignment approach for cross-language semantic analysis. The approach uses a parallel corpus to compute the probability that a word $x$ in the suspicious document is a valid translation of the term $y$ in a potential source document for all terms in the suspicious and the source documents. The sum of the translation probabilities yields the probability that the suspicious document is a translation of the source document [ 28 ]. Table 16 presents papers using word alignment and CL-ASA.

Table 16. Papers using word alignment and CL-ASA
Method | Papers
Word alignment only | [ , ]
Word alignment-based modification of Jaccard and Levenshtein measure | [ ]
Word alignment in combination with machine learning | [ , , ]
CL-ASA | [ , ]
Translation + word alignment | [ ]
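The summation at the core of CL-ASA can be sketched as follows. The translation table is a hypothetical stand-in for probabilities estimated from a parallel corpus, and the result is an unnormalized score; any length model or normalization used by actual CL-ASA implementations is omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical translation probabilities for (English word, Spanish word)
# pairs, standing in for values estimated from a parallel corpus.
TRANSLATION_PROB: Dict[Tuple[str, str], float] = {
    ("house", "casa"): 0.9,
    ("home", "casa"): 0.6,
    ("dog", "perro"): 0.95,
}


def translation_score(suspicious: List[str], source: List[str],
                      table: Dict[Tuple[str, str], float]) -> float:
    """Sum the translation probabilities over all word pairs."""
    return sum(table.get((x, y), 0.0) for x in suspicious for y in source)


print(translation_score(["house", "dog"], ["casa", "perro"], TRANSLATION_PROB))
print(translation_score(["tree"], ["casa", "perro"], TRANSLATION_PROB))
```

Word pairs that are missing from the table contribute nothing, so unrelated documents receive a score near zero.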

Graph-based Semantic Analysis. Knowledge graph analysis (KGA) represents a text as a weighted directed graph, in which the nodes represent the semantic concepts expressed by the words in the text and the edges represent the relations between these concepts [ 79 ]. The relations are typically obtained from publicly available corpora, such as BabelNet or WordNet. Determining the edge weights is the major challenge in KGA. Traditionally, edge weights were computed from analyzing the relations between concepts in WordNet [ 79 ]. Salvador et al. [ 79 ] improved the weighting procedure by using continuous skip-grams that additionally consider the context in which the concepts appear. Applying graph similarity measures yields a semantic similarity score for documents or parts thereof (typically sentences).

Inherent characteristics of KGA like word sense disambiguation, vocabulary expansion, and language independence are highly beneficial for plagiarism detection. Thanks to these characteristics, KGA is resistant to synonym replacements and syntactic changes. Using multilingual corpora allows the application of KGA for cross-language PD [ 79 ]. KGA achieves high detection effectiveness if the text is translated literally; for paraphrased translations, the results are worse [ 77 ].

The universal networking language approach proposed by Avishek and Bhattacharyyan [ 53 ] is conceptually similar to KGA. The method constructs a dependency graph for each sentence and then compares the lexical, syntactic, and semantic similarity separately. Kumar [ 147 ] used semantic graphs for the seeding phase of the detailed analysis stage. In those graphs, the nodes corresponded to all words in a document or passage. The edges represented the adjacency of the words. The edge weights expressed the semantic similarity of words based on the probability that the words occur in a 100-word window within a corpus of DBpedia articles. Overlapping passages in two documents were identified using the minimum weight bipartite clique cover.

Table 17 presents detection methods that employ graph-based semantic analysis.

Table 17. Papers employing graph-based semantic analysis
Task | Method | Papers
Document-level detection | Knowledge graph analysis | [ ]
Detailed analysis | Semantic graphs | [ ]
Detailed analysis | Word n-gram graphs for sentences | [ ]
Paraphrase identification | Knowledge graph analysis | [ ]
Paraphrase identification | Universal networking language | [ ]
Cross-language plagiarism detection | Knowledge graph analysis | [ , , ]

Semantic Role Labeling (SRL) determines the semantic roles of terms in a sentence, e.g., the subject, object, events, and relations between these entities, based on roles defined in linguistic resources, such as PropBank or VerbNet. The goal is to extract “who” did “what” to “whom” “where” and “when” [ 188 ]. The first step in SRL is PoS tagging and syntax analysis to obtain the dependency tree of a sentence. Subsequently, the semantic annotation is performed [ 71 ].

Paul and Jamal [ 188 ] used SRL in combination with sentence ranking for document-level plagiarism detection. Hamza and Salim [ 182 ] employed SRL to extract arguments from sentences, which they used to quantify and compare the syntactic and semantic similarity of the sentences. Ferreira et al. [ 71 ] obtained the similarity of sentences by combining various features and measures using machine learning. Table 18 lists detection approaches that employ SRL.

Table 18. Papers employing SRL
Task | Papers
Document-level detection | [ , ]
Paraphrase identification | [ ]

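Table 19. Papers proposing idea-based detection methods
Scope | Method | Papers
Monolingual plagiarism detection | Citation-based PD | [ , , , , ]
Monolingual plagiarism detection | Math-based PD | [ , ]
Monolingual plagiarism detection | Image-based PD | [ ]
Cross-lingual plagiarism detection | CbPD | [ ]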

Idea-based Methods

Idea-based methods analyze non-textual content elements to identify obfuscated forms of academic plagiarism. The goal is to complement detection methods that analyze the lexical, syntactic, and semantic similarity of text to identify plagiarism instances that are hard to detect both for humans and for machines. Table 19 lists papers that proposed idea-based detection methods.

Citation-based plagiarism detection (CbPD), proposed by Gipp et al. [ 91 ], analyzes patterns of in-text citations in academic documents, i.e., identical citations occurring in proximity or in a similar order within two documents. The idea is that in-text citations encode semantic information language-independently. Thus, analyzing in-text citation patterns can indicate shared structural and semantic similarity among texts. Assessing semantic and structural similarity using citation patterns requires significantly less computational effort than approaches for semantic and syntactic text analysis [ 90 ]. Therefore, CbPD is applicable for the candidate retrieval and the detailed analysis stage [ 161 ] of monolingual [ 90 , 93 ] and cross-lingual [ 92 ] detection methods. For weakly obfuscated instances of plagiarism, CbPD achieved results comparable to those of lexical detection methods; for paraphrased and idea plagiarism, CbPD outperformed lexical detection methods in the experiments of Gipp et al. [ 90 , 93 ]. Moreover, the visualization of citation patterns was found to facilitate the inspection of the detection results by humans, especially for cases of structural and idea plagiarism [ 90 , 93 ]. Pertile et al. [ 191 ] confirmed the positive effect of combining citation and text analysis on the detection effectiveness and devised a hybrid approach using machine learning. CbPD can also alert a user when the in-text citations are inconsistent with the list of references. Such inconsistencies may be caused by mistake or introduced deliberately to obfuscate plagiarism.
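To make the notion of citations occurring "in a similar order" more concrete, the following minimal sketch compares two documents by the longest common subsequence of their in-text citation sequences. It is an illustration only, not the implementation of Gipp et al.; the function names, the citation keys, and the normalization by the shorter sequence are assumptions made for this example.

```python
from typing import List, Sequence


def longest_common_citation_sequence(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return one longest common subsequence of two in-text citation sequences."""
    rows, cols = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover the shared citations in order.
    shared: List[str] = []
    i = j = 0
    while i < rows and j < cols:
        if a[i] == b[j]:
            shared.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return shared


def citation_order_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Share of the shorter citation sequence that reappears in the same order.

    The normalization is an assumption for this sketch, not a published measure.
    """
    if not a or not b:
        return 0.0
    return len(longest_common_citation_sequence(a, b)) / min(len(a), len(b))


# Hypothetical citation keys in order of appearance in each document.
suspicious = ["smith2001", "lee2010", "kim2005", "doe1999", "wu2012"]
source = ["lee2010", "chen2008", "kim2005", "wu2012"]
print(longest_common_citation_sequence(suspicious, source))
print(citation_order_similarity(suspicious, source))
```

Because the comparison operates on citation identifiers rather than words, the same sketch applies unchanged when the two documents are written in different languages.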

Meuschke et al. [ 163 ] proposed mathematics-based plagiarism detection (MathPD) as an extension of CbPD for documents in the Science, Technology, Engineering and Mathematics (STEM) fields. Mathematical expressions share many properties of academic citations, e.g., they are essential components of academic STEM documents, are language-independent, and contain rich semantic information. Furthermore, some disciplines, such as mathematics and physics, use academic citations sparsely [ 167 ]. Therefore, a citation-based analysis alone is less likely to reveal suspicious content similarity for these disciplines [ 163 , 165 ]. Meuschke et al. showed that an exclusive math-based similarity analysis performed well for detecting confirmed cases of academic plagiarism in STEM documents [ 163 ]. Combining a math-based and a citation-based analysis further improved the detection performance for confirmed cases of plagiarism [ 165 ].

Image-based plagiarism detection analyzes graphical content elements. While a large variety of methods to retrieve similar images have been proposed [ 56 ], few studies investigated the application of content-based image retrieval approaches for academic plagiarism detection. Meuschke et al. [ 162 ] is the only such study we encountered during our data collection. The authors proposed a detection approach that integrates established image retrieval methods with novel similarity assessments for images that are tailored to plagiarism detection. The approach has been shown to retrieve both copied and altered figures.

Ensembles of Detection Methods

Each class of detection methods has characteristic strengths and weaknesses. Many authors showed that combining detection methods achieves better results than applying the methods individually [ 7 , 62 , 78 , 128 , 133 , 234 , 242 , 273 , 275 ]. By assembling the best-performing detection methods in PAN 2014, the organizers of the workshop created a meta-system that performed best overall [ 232 ].

In intrinsic plagiarism detection, combining feature analysis methods is a standard approach [ 233 ], since an author's writing style always comprises a multitude of stylometric features [ 127 ]. Many recent author verification methods employ machine learning to select the best-performing feature combination [ 234 ].

In general, there are three ways of combining plagiarism detection methods:

  • Using adaptive algorithms that determine the obfuscation strategy, choose the detection method, and set similarity thresholds accordingly
  • Using an ensemble of detection methods whose results are combined using static weights
  • Using machine learning to determine the best-performing combination of detection methods

The winning approach at PAN 2014 and 2015 [ 216 ] used an adaptive algorithm. After finding the seeds of overlapping passages, the authors extended the seeds using two different thresholds for the maximum gap. Based on the length of the passages, the algorithm automatically recognized different plagiarism forms and set the parameters for the VSM-based detection method accordingly.
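The seed-extension step can be illustrated with a small sketch that merges seed positions into passages under a maximum-gap threshold and runs the merge with two thresholds. This is a simplified, one-dimensional illustration, not the algorithm of the cited approach: the function name `extend_seeds`, the representation of seeds as sentence indices, and both threshold values are assumptions, and the subsequent recognition of plagiarism forms from passage lengths is not modeled.

```python
from typing import List, Tuple


def extend_seeds(seed_positions: List[int], max_gap: int) -> List[Tuple[int, int]]:
    """Merge seed positions into (start, end) passages.

    Two neighboring seeds join the same passage when their positions
    differ by at most max_gap.
    """
    passages: List[Tuple[int, int]] = []
    for pos in sorted(set(seed_positions)):
        if passages and pos - passages[-1][1] <= max_gap:
            # Close enough to the current passage: extend its end.
            passages[-1] = (passages[-1][0], pos)
        else:
            # Too far away: start a new passage.
            passages.append((pos, pos))
    return passages


# Hypothetical sentence indices of seeds found in a suspicious document.
seeds = [3, 4, 6, 20, 21, 29]
strict = extend_seeds(seeds, max_gap=2)    # small gap: short, compact passages
lenient = extend_seeds(seeds, max_gap=10)  # large gap: longer, sparser passages
print(strict)
print(lenient)
```

Comparing the passage lengths obtained under the two thresholds is one conceivable signal for choosing downstream parameters, which is the adaptive idea the paragraph above describes.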

The “linguistic knowledge approach” proposed by Abdi et al. [ 2 ] exemplifies an ensemble of detection methods. The method combines the analysis of syntactic and semantic sentence similarity using a linear combination of two similarity metrics: (i) the cosine similarity of semantic vectors and (ii) the similarity of syntactic word order vectors [ 2 ]. The authors showed that the method outperformed other participants on the PAN-10 and PAN-11 corpora. Table 20 lists other ensembles of detection methods.

Table 20.
Task | Method | Type of ensemble | Papers
Document-level detection | | Linguistic knowledge | [ ]
Candidate retrieval | Querying a Web search engine | Combination of querying heuristics | [ ]
Detailed analysis | Vector space model | Adaptive algorithm | [ , , ]
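An ensemble with static weights, such as the linear combination of semantic and word-order similarity described above, can be sketched as follows. The sketch is a hedged illustration rather than the method of Abdi et al.: the word-order formula (one minus the norm of the difference divided by the norm of the sum), the weight `alpha`, and all input vectors are assumptions chosen for the example.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equally long vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def word_order_similarity(r1: Sequence[float], r2: Sequence[float]) -> float:
    """Assumed word-order measure: 1 - ||r1 - r2|| / ||r1 + r2||."""
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(r1, r2)))
    total = math.sqrt(sum((x + y) ** 2 for x, y in zip(r1, r2)))
    return 1.0 - diff / total if total else 0.0


def combined_similarity(sem1, sem2, order1, order2, alpha: float = 0.8) -> float:
    """Static linear combination; alpha is a hypothetical weight."""
    semantic = cosine_similarity(sem1, sem2)
    syntactic = word_order_similarity(order1, order2)
    return alpha * semantic + (1.0 - alpha) * syntactic


# Hypothetical semantic vectors and word-position vectors for two sentences
# that use the same words in a different order.
print(combined_similarity([1, 2], [1, 2], [1, 2, 3], [1, 3, 2]))
```

With a fixed `alpha`, the weighting stays the same for every sentence pair, which is what distinguishes this kind of ensemble from the adaptive and learned combinations listed earlier.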

Machine-learning approaches for plagiarism detection typically train a classification model that combines a given set of features. The trained model can then be used to classify other datasets. Support vector machine (SVM) is the most popular model type for plagiarism detection tasks. SVM uses statistical learning to find a hyperplane that separates the classes in the training data with the largest possible margin, i.e., the largest distance between the hyperplane and the closest training samples. Choosing the hyperplane is the main challenge for correct data classification [ 66 ].

Machine-learning approaches are very successful in intrinsic plagiarism detection. Supervised machine-learning methods, specifically random forests, were the best-performing approach at the intrinsic detection task of the PAN 2015 competition [ 233 ]. The best-known method for author verification is unmasking [ 232 ], which uses an SVM classifier to distinguish the stylistic features of the suspicious document from a set of documents for which the author is known. The idea of unmasking is to train and run the classifier and then remove the most significant features of the classification model and rerun the classification. If the classification accuracy drops significantly, then the suspicious and known documents are likely from the same author; otherwise, they are likely written by different authors [ 232 ]. There is no consensus on the stylometric features that are most suitable for authorship identification [ 158 ]. Table 21 gives an overview of intrinsic detection methods that employ machine-learning techniques.

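Table 21.
Task | Classifier | Features | Papers
Style-breach detection | Gradient Boosting Regression Trees | Lexical, syntax | [ ]
Author identification | SVM | Semantic (LSA) | [ ]
Author clustering | Recurrent ANN | Lexical | [ ]
Author clustering | SVM | Lexical, syntax | [ ]
Author verification | Recurrent ANN | Lexical | [ ]
Author verification | k-nearest neighbor | Lexical | [ ]
Author verification | k-nearest neighbor | Lexical, syntax | [ ]
Author verification | Homotopy-based classification | Lexical | [ ]
Author verification | Naïve Bayes | Lexical | [ ]
Author verification | SVM | Lexical, syntax | [ , , , , ]
Author verification | Equal error rate | Lexical | [ ]
Author verification | Decision Tree | Lexical | [ ]
Author verification | Random Forest | Lexical, syntax | [ , , ]
Author verification | Genetic algorithm | Lexical, syntax | [ , ]
Author verification | Multilayer perceptron | Lexical, semantic (LSA) | [ ]
Author verification | Many | Lexical | [ , ]
Author verification | Many | Lexical, syntax | [ ]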

For extrinsic plagiarism detection, the application of machine learning has been studied for various components of the detection process [ 208 ]. Gharavi et al. [ 88 ] used machine learning to determine the suspiciousness thresholds for a vector space model. Zarrella et al. [ 273 ] won the SemEval competition in 2015 with their ensemble of seven algorithms; most of them used machine learning. While Hussain and Suryani [ 116 ] successfully used an SVM classifier for the candidate retrieval stage, Williams et al. [ 269 ] compared many supervised machine-learning methods and concluded that applying them for classifying and ranking Web search engine results did not improve candidate retrieval. Kanjirangat and Gupta [ 252 ] used a genetic algorithm to detect idea plagiarism. The method randomly chooses a set of sentences as chromosomes. The sentence sets that are most descriptive of the entire document are combined and form the next generation. In this way, the method gradually extracts the sentences that represent the idea of the document and can be used to retrieve similar documents.

Sánchez-Vega et al. [ 218 ] proposed a method termed rewriting index that evaluates the degree of membership of each sentence in the suspicious document to a possible source document. The method uses five different Turing machines to uncover verbatim copying as well as basic transformations on the word level (insertion, deletion, substitution). The output values of the Turing machines are used as the features to train a Naïve Bayes classifier and identify reused passages.

In the approach of Afzal et al. [ 5 ], the linear combination of supervised and unsupervised machine-learning methods outperformed each of the methods applied individually. In the experiments of Alfikri and Purwarianti [ 13 ], SVM classifiers outperformed Naïve Bayes classifiers. In the experiments of Subroto and Selamat [ 236 ], the best performing configuration was a hybrid model that combined SVM and an artificial neural network (ANN). El-Alfy et al. [ 62 ] found that an abductive network outperformed SVM. However, as shown in Table 22 , SVM is the most popular classifier for extrinsic plagiarism detection methods. Machine learning appears to be more beneficial when applied for the detailed analysis, as indicated by the fact that most extrinsic detection methods apply machine learning for that stage (cf. Table 22 ).

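Table 22.
Task | Classifier | Features | Papers
Document-level detection | SVM | Semantic | [ , ]
Document-level detection | SVM, Naïve Bayes | Lexical, semantic | [ ]
Document-level detection | Decision tree, k-nearest neighbor | Syntax | [ ]
Document-level detection | Naïve Bayes, SVM, Decision tree | Lexical, syntax | [ ]
Document-level detection | Many | Semantic (CbPD) | [ ]
Candidate retrieval | SVM | Lexical | [ ]
Candidate retrieval | Linear discriminant analysis | Lexical, syntax | [ ]
Candidate retrieval | Genetic algorithm | Lexical, syntax | [ ]
Detailed analysis | Logical regression model | Lexical, syntax, semantic | [ ]
Detailed analysis | Naïve Bayes | Lexical | [ ]
Detailed analysis | Naïve Bayes, Decision Tree, Random Forest | Lexical | [ ]
Detailed analysis | SVM | Lexical, semantic | [ ]
Paraphrase identification | SVM | Lexical | [ ]
Paraphrase identification | SVM | Lexical, semantic | [ , ]
Paraphrase identification | SVM | Lexical, syntax, semantic | [ , , ]
Paraphrase identification | SVM | MT metrics | [ ]
Paraphrase identification | SVM | ML with syntax and semantic features | [ ]
Paraphrase identification | k-nearest neighbor, SVM, artificial neural network | Lexical | [ ]
Paraphrase identification | SVM, Random forest, Gradient boosting | Lexical, syntax, semantic, MT metrics | [ ]
Paraphrase identification | SVM, MaxEnt | Lexical, syntax, semantic | [ ]
Paraphrase identification | Abductive networks | Lexical | [ ]
Paraphrase identification | Linear regression | Lexical, syntax, semantic | [ ]
Paraphrase identification | L2-regularized logistic regression | Lexical, syntax, semantic, ML | [ ]
Paraphrase identification | Ridge regression | Lexical, semantic | [ ]
Paraphrase identification | Gaussian process regression | Lexical, semantic | [ ]
Paraphrase identification | Isotonic regression | Semantic | [ ]
Paraphrase identification | Artificial neural network | Lexical, semantic | [ ]
Paraphrase identification | Deep neural network | Syntax, semantic | [ ]
Paraphrase identification | Deep neural network | Semantic | [ ]
Paraphrase identification | Decision Tree | Semantic | [ ]
Paraphrase identification | Decision Tree | Lexical, syntax, semantic | [ , ]
Paraphrase identification | Random Forest | Semantic, MT metrics | [ ]
Paraphrase identification | Many | Lexical, semantic | [ , ]
Paraphrase identification | Many | Lexical, syntax, semantic | [ , ]
Cross-language PD | Artificial neural networks | Semantic | [ ]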

Evaluation of Plagiarism Detection Methods

The availability of datasets for development and evaluation is essential for research on natural language processing and information retrieval. The PAN series of benchmark competitions is a comprehensive and well‑established platform for the comparative evaluation of plagiarism detection methods and systems [ 197 ]. The PAN test datasets contain artificially created monolingual (English, Arabic, Persian) and, to a lesser extent, cross-language plagiarism instances (German and Spanish to English) with different levels of obfuscation. The papers included in this review that present lexical, syntactic, and semantic detection methods mostly use PAN datasets or the Microsoft Research Paraphrase corpus. Authors presenting idea-based detection methods that analyze non-textual content features or cross-language detection methods for non-European languages typically use self-created test collections, since the PAN datasets are not suitable for these tasks. A comprehensive review of corpus development initiatives is out of the scope of this article.

Since plagiarism detection is an information retrieval task, precision, recall, and F‑measure are typically employed to evaluate plagiarism detection methods. A notable use-case-specific extension of these general performance measures is the PlagDet metric. Potthast et al. introduced the metric to evaluate the performance of methods for the detailed analysis stage in extrinsic plagiarism detection [ 201 ]. A method may detect only a fragment of a plagiarism instance or report a coherent instance as multiple detections. To account for these possibilities, Potthast et al. included the granularity score as part of the PlagDet metric. The granularity score is the ratio of the number of detections a method reports to the true number of plagiarism instances.
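As a worked illustration, the sketch below combines precision, recall, and granularity into a single score using the commonly stated form of PlagDet, i.e., the F-measure divided by log2(1 + granularity). The function names and the example values are assumptions for this illustration; computing precision, recall, and granularity from character-level annotations is not shown.

```python
import math


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """F1 discounted by granularity: F1 / log2(1 + granularity).

    A granularity of 1 (one detection per plagiarism instance) leaves F1
    unchanged; fragmented detections (granularity > 1) lower the score.
    """
    if granularity < 1:
        raise ValueError("granularity is expected to be at least 1")
    return f_measure(precision, recall) / math.log2(1 + granularity)


# Hypothetical values: the same precision and recall, reported either as
# one detection per instance or as three fragments per instance.
print(plagdet(0.8, 0.8, 1.0))
print(plagdet(0.8, 0.8, 3.0))
```

The logarithm dampens the penalty, so a method that splits each instance into three detections loses half of its F-measure rather than two thirds.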

PLAGIARISM DETECTION SYSTEMS

Plagiarism detection systems implement (some of) the methods described in the previous sections. To be applicable in practice, the systems must address the tradeoff between detection performance and processing speed [ 102 ], i.e., find sources of plagiarism with reasonable computational costs.

Most systems are Web-based; some can run locally. The systems typically highlight the parts of a suspicious document that likely originate from another source as well as which source that is. Understanding how the source was changed is often left to the user. Providers of plagiarism detection systems, especially of commercial systems, rarely publish information on the detection methods they employ [ 85 , 256 ]. Thus, estimating to what extent plagiarism detection research influences practical applications is difficult.

Velásquez et al. [ 256 ] presented text-matching software and described its functionality, which includes the recognition of quotes. The system achieved excellent results in the PAN-10 and PAN-11 competitions. The authors have since commercialized the system [ 195 ].

Academics and practitioners are naturally interested in which detection system achieves the best results. Weber-Wulff and her team performed the most methodologically sound investigation of this question in 2004, 2007, 2008, 2010, 2011, 2012, and 2013 [ 266 ]. In their latest benchmark evaluation, the group compared 15 systems using documents written in English and German.

Chowdhury and Bhattacharyya [ 48 ] provided an exhaustive list of currently available plagiarism detection systems. Unfortunately, the description of each system is short, and the authors did not provide performance comparisons. Pertile et al. [ 191 ] summarized the basic characteristics of 17 plagiarism detection systems. Kanjirangat and Gupta [ 251 ] compared four publicly available systems. They used four test documents that contained five forms of plagiarism (copy-and-paste, random obfuscation, translation to Hindi and back, summarization). All systems failed to identify plagiarism instances other than copy-and-paste and random obfuscation.

There is consensus in the literature that the inability of plagiarism detection systems to identify obfuscated plagiarism is currently their most severe limitation [ 88 , 251 , 266 ].

In summary, there is a lack of systematic and methodologically sound performance evaluations of plagiarism detection systems, since the benchmark comparisons of Weber-Wulff ended in 2013. This lack is problematic, since plagiarism detection systems are typically a key building block of plagiarism policies. Plagiarism detection methods and plagiarism policies are the subjects of extensive research. We argue that plagiarism detection systems should be researched just as extensively but are currently not.

In this section, we summarize the advancements in the research on methods to detect academic plagiarism that our review identified. Figure 2 depicts the suitability of the methods discussed in the previous sections for identifying the plagiarism forms presented in our typology. As shown in the figure, n-gram comparisons are well suited for detecting character-preserving plagiarism and partially suitable for identifying ghostwriting and syntax-preserving plagiarism. Stylometry is routinely applied for intrinsic plagiarism detection and can reveal ghostwriting and copy-and-paste plagiarism. Vector space models have a wide range of applications but appear not to be particularly beneficial for detecting idea plagiarism. Semantics-based methods are tailored to the detection of semantics-preserving plagiarism, yet also perform well for character-preserving and syntax-preserving forms of plagiarism. Non-textual feature analysis and machine learning are particularly beneficial for detecting strongly obfuscated forms of plagiarism, such as semantics-preserving and idea-preserving plagiarism. However, machine learning is a universal approach that also performs well for less strongly disguised forms of plagiarism.

Fig. 2.

The first observation of our literature survey is that ensembles of detection methods tend to outperform approaches based on a single method [ 93 , 161 ]. Chong experimented with numerous methods for preprocessing as well as with shallow and deep NLP techniques [ 47 ]. He tested the approaches on both small and large-scale corpora and concluded that a combination of string-matching and deep NLP techniques achieves better results than applying the techniques individually.

Machine-learning approaches represent the logical evolution of the idea to combine heterogeneous detection methods. Since our previous review in 2013, unsupervised and supervised machine-learning methods have found increasingly widespread adoption in plagiarism detection research and significantly increased the performance of detection methods. Baroni et al. [ 27 ] provided a systematic comparison of vector-based similarity assessments. The authors were particularly interested in whether unsupervised count-based approaches like LSA achieve better results than supervised prediction-based approaches like Softmax. They concluded that the prediction-based methods outperformed their count-based counterparts in precision and recall while requiring similar computational effort. We expect that the research on applying machine learning for plagiarism detection will continue to grow significantly in the future.

Considering the heterogeneous forms of plagiarism (see the typology section), the static one-size-fits-all approach observable in most plagiarism detection methods before 2013 is increasingly replaced by adaptive detection algorithms. Many recent detection methods first seek to identify the likely obfuscation method and then apply the appropriate detection algorithm [ 79 , 198 ], or at least dynamically adjust the parameters of the detection method [ 216 ].

Graph-based methods operating on the syntactic and semantic levels achieve results comparable to those of other semantics-based methods. Mohebbi and Talebpour [ 168 ] successfully employed graph-based methods to identify paraphrases. Franco-Salvador et al. [ 79 ] demonstrated the suitability of knowledge graph analysis for cross-language plagiarism detection.

Several researchers showed the benefit of analyzing non-textual content elements to improve the detection of strongly obfuscated forms of plagiarism. Gipp et al. demonstrated that analyzing in-text citation patterns achieves higher detection rates than lexical approaches for strongly obfuscated forms of academic plagiarism [ 90 , 92 – 94 ]. The approach is computationally modest and reduces the effort required of users for investigating the detection results. Pertile et al. [ 191 ] combined lexical and citation-based approaches to improve detection performance. Eisa et al. [ 61 ] strongly advocated for additional research on analyzing non-textual content features. The research by Meuschke et al. on analyzing images [ 162 ] and mathematical expressions [ 164 ] confirms that non-textual detection methods significantly enhance the detection capabilities. Following the trend of combining detection methods, we see the analysis of non-textual content features as a promising component of future integrated detection approaches.

Surprisingly many papers in our collection addressed plagiarism detection for Arabic and Persian texts (e.g., References [ 22 , 118 , 231 , 262 ]). The interest in plagiarism detection for the Arabic language led the organizers of the PAN competitions to develop an Arabic corpus for intrinsic plagiarism detection [ 34 ]. In 2015, the PAN organizers also introduced a shared task on plagiarism detection for Arabic texts [ 32 ], followed by a shared task for Persian texts one year later [ 22 ]. While these are promising steps toward improving plagiarism detection for Arabic, Wali et al. [ 262 ] noted that the availability of corpora and lexicons for Arabic is still insufficient when compared to other languages. This lack of resources and the complex linguistic features of the Arabic language cause plagiarism detection for Arabic to remain a significant research challenge [ 262 ].

For cross-language plagiarism detection methods, Ferrero et al. [ 74 ] introduced a five-class typology that still reflects the state of the art: cross-language character n-grams (CL-CNG), cross-language conceptual thesaurus-based similarity (CL-CTS), cross-language alignment-based similarity analysis (CL-ASA), cross-language explicit semantic analysis (CL-ESA), and translation with monolingual analysis (T+MA). Franco-Salvador et al. [ 80 ] showed that the performance of these methods varies depending on the language and corpus. The observation that the combination of detection methods improves the detection performance also holds for the cross-language scenario [ 80 ]. In the analysis of Ferrero et al. [ 74 ], the detection performance of methods depended exclusively on the size of the chosen chunk and not on the language or the dataset. Translation with monolingual analysis is a widely used approach. For the cross-language detection task (Spanish–English) at the SemEval competition in 2016, most of the participants applied a machine translation from Spanish to English and then compared the sentences in English [ 7 ]. However, some authors do not consider this approach as cross-language plagiarism detection but as monolingual plagiarism detection with translation as a preprocessing step [ 80 ].
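Of the five classes, cross-language character n-grams (CL-CNG) is the simplest to illustrate: both texts are reduced to character n-gram frequency vectors, which are then compared directly without translation. The sketch below is a minimal illustration under assumed settings (trigrams, lowercasing, punctuation replaced by spaces, cosine similarity); the function names and example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams after lowercasing and replacing punctuation runs by a space."""
    normalized = re.sub(r"[^\w]+", " ", text.lower()).strip()
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cl_cng_similarity(text_a: str, text_b: str, n: int = 3) -> float:
    """Cosine similarity of the character n-gram count vectors of two texts."""
    va, vb = char_ngrams(text_a, n), char_ngrams(text_b, n)
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


# Hypothetical sentences: a Spanish sentence, its English counterpart,
# and an unrelated German sentence.
spanish = "La información es importante"
english = "The information is important"
german = "Zwei Hunde schlafen"
print(cl_cng_similarity(spanish, english))
print(cl_cng_similarity(spanish, german))
```

The sketch also hints at the limitation of the class: it relies on shared character sequences, so it can only be expected to help for language pairs with lexical and orthographic overlap.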

For intrinsic plagiarism detection, authors predominantly use lexical and syntax-based text analysis methods. Widely analyzed lexical features include character n-grams, word frequencies, as well as the average lengths of words, sentences, and paragraphs [ 247 ]. The most common syntax-based features include PoS tag frequencies, PoS tag pair frequencies, and PoS structures [ 247 ]. At the PAN competitions, methods that analyzed lexical features and employed simple clustering algorithms achieved the best results [ 200 ].
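The lexical features named above are inexpensive to compute, as the following minimal sketch suggests. It extracts the average word length, the average sentence length in words, and the most frequent character n-grams from one text segment. The function name, the naive sentence splitting on ".", "!", and "?", and the parameter defaults are assumptions for this illustration; an intrinsic detector would compute such features per segment and compare them across the document.

```python
import re
from collections import Counter
from typing import Dict, List, Union


def lexical_style_features(text: str, n: int = 3, top_k: int = 5) -> Dict[str, Union[float, List[str]]]:
    """Compute a few simple lexical style features for one text segment."""
    # Naive sentence splitting; real systems would use a proper tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    lowered = text.lower()
    ngrams = Counter(lowered[i:i + n] for i in range(len(lowered) - n + 1))
    return {
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "top_char_ngrams": [gram for gram, _ in ngrams.most_common(top_k)],
    }


# Hypothetical segment of a document under inspection.
features = lexical_style_features("I am here. You are there!")
print(features)
```

Large differences in such feature values between segments of the same document are the kind of signal that style-based methods look for.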

For the author verification task, the most successful methods treated the problem as a binary classification task. They adopted the extrinsic verification paradigm by using texts from other authors to identify features that are characteristic of the writing style of the suspected author [ 233 ]. The general impostors method is a widely used and largely successful realization of this approach [ 135 , 146 , 159 , 224 ].

From a practitioner's perspective, intrinsic detection methods exhibit several shortcomings. First, stylometric comparisons are inherently error-prone for documents collaboratively written by multiple authors [ 209 ]. This shortcoming is particularly critical, since most scientific publications have multiple authors [ 39 ]. Second, intrinsic methods are not well suited for detecting paraphrased plagiarism, i.e., instances in which authors illegitimately reused content from other sources that they presented in their own words. Third, the methods are generally not reliable enough for practical applications yet. Author identification methods achieve a precision of approximately 60%, author profiling methods of approximately 80% [ 200 ]. These values are sufficient for raising suspicion and encouraging further examination but not for proving plagiarism or ghostwriting. The availability of methods for automated author obfuscation aggravates the problem. The most effective methods can mislead the identification systems in almost half of the cases [ 199 ]. Fourth, intrinsic plagiarism detection approaches cannot point an examiner to the source document of potential plagiarism. If a stylistic analysis raised suspicion, then extrinsic detection methods or other search and retrieval approaches are necessary to discover the potential source document(s).

Other Applications of Plagiarism Detection Methods

Aside from extrinsic and intrinsic plagiarism detection, the methods described in this article have numerous other applications such as machine translation [ 67 ], author profiling for marketing applications [ 211 ], spam detection [ 248 ], law enforcement [ 127 , 211 ], identifying duplicate accounts in internet fora [ 4 ], identifying journalistic text reuse [ 47 ], patent analysis [ 1 ], event recognition based on tweet similarity [ 24 , 130 ], short answer scoring based on paraphrase identification [ 242 ], or native language identification [ 119 ].

In 2010, Mozgovoy et al. [ 173 ] proposed a roadmap for the future development of plagiarism detection systems. They suggested the inclusion of syntactic parsing, considering synonym thesauri, employing LSA to discover “tough plagiarism,” intrinsic plagiarism detection, and tracking citations and references. As our review of the literature shows, all these suggestions have been realized. Moreover, the field of plagiarism detection has made a significant leap in detection performance thanks to machine learning.

In 2015, Eisa et al. [ 61 ] praised the effort invested into improving text-based plagiarism detection but noted a critical lack of “techniques capable of identifying plagiarized figures, tables, equations and scanned documents or images.” While Meuschke et al. [ 163 , 165 ] proposed initial approaches that addressed these suggestions and achieved promising results, most of the research still addresses text-based plagiarism detection only.

A generally observable trend is that approaches that integrate different detection methods—often with the help of machine learning—achieve better results. In line with this observation, we see a large potential for the future improvement of plagiarism detection methods in integrating non-textual analysis approaches with the many well-performing approaches for the analysis of lexical, syntactic, and semantic text similarity.

To summarize the contributions of this article, we refer to the four questions Kitchenham et al. [ 138 ] suggested to assess the quality of literature reviews:

  • “Are the review's inclusion and exclusion criteria described and appropriate?
  • Is the literature search likely to have covered all relevant studies?
  • Did the reviewers assess the quality/validity of the included studies?
  • Were the basic data/studies adequately described?”

We believe that the answers to these four questions are positive for our survey. Our article summarizes previous research and identifies research gaps to be addressed in the future. We are confident that this review will help researchers newly entering the field of academic plagiarism detection to get oriented and will help experienced researchers to identify related works. We hope that our findings will aid in the development of more effective and efficient plagiarism detection methods and systems that will then facilitate the implementation of plagiarism policies.

  • Assad Abbas, Limin Zhang, and Samee U. Khan. 2014. A literature review on the state-of-the-art in patent analysis. World Pat. Inf. 37 (2014), 3–13. DOI: 10.1016/j.wpi.2013.12.006
  • Asad Abdi, Norisma Idris, Rasim M. Alguliyev, and Ramiz M. Aliguliyev. 2015. PDLK: Plagiarism detection using linguistic knowledge. Expert Syst. Appl. 42, 22 (2015), 8936–8946. DOI: 10.1016/j.eswa.2015.07.048
  • Samira Abnar, Mostafa Dehghani, Hamed Zamani, and Azadeh Shakery. 2014. Expanded n-grams for semantic text alignment—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Sadia Afroz, Aylin Caliskan Islam, Ariel Stolerman, Rachel Greenstadt, and Damon McCoy. 2014. Doppelgänger finder: Taking stylometry to the underground. In Proceedings of the 2014 IEEE Symposium on Security and Privacy. 212–226.
  • Naveed Afzal, Yanshan Wang, and Hongfang Liu. 2016. MayoNLP at SemEval-2016 Task 1: Semantic textual similarity based on lexical semantic net and deep learning semantic model. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16). 674–679.
  • Basant Agarwal, Heri Ramampiaro, Helge Langseth, and Massimiliano Ruocco. 2018. A deep network model for paraphrase detection in short text messages. Inf. Process. Manag. 54, 6 (2018), 922–937. DOI: 10.1016/j.ipm.2018.06.005
  • Eneko Agirre, Carmen Banea, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Rada Mihalcea, German Rigau, and Janyce Wiebe. 2016. Semeval-2016 task 1: Semantic textual similarity, monolingual and cross-lingual evaluation. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16). 497–511.
  • Mayank Agrawal and Dilip Kumar Sharma. 2016. A state of art on source code plagiarism detection. In Proceedings of the 2016 2nd International Conference on Next Generation Computing Technologies (NGCT’16). 236–241. DOI: 10.1109/NGCT.2016.7877421
  • Mohammad Al-Smadi, Zain Jaradat, Mahmoud Al-Ayyoub, and Yaser Jararweh. 2017. Paraphrase identification and semantic text similarity analysis in arabic news tweets using lexical, syntactic, and semantic features. Inf. Process. Manag. 53, 3 (2017), 640–652. DOI: 10.1016/j.ipm.2017.01.002
  • Houda Alberts. 2017. Author clustering with the aid of a simple distance measure—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Hanan Aldarmaki and Mona Diab. 2016. GWU NLP at SemEval-2016 Shared Task 1: Matrix factorization for crosslingual STS. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16). 663–667.
  • Mahmoud Alewiwi, Cengiz Orencik, and Erkay Savas. 2016. Efficient top-k similarity document search utilizing distributed file systems and cosine similarity. Cluster Comput. 19, 1 (2016), 109–126. DOI: 10.1007/s10586-015-0506-0
  • Zakiy Firdaus Alfikri and Ayu Purwarianti. 2014. Detailed analysis of extrinsic plagiarism detection system using machine learning approach (naive bayes and svm). Indones. J. Electr. Eng. Comput. Sci. 12, 11 (2014), 7884–7894.
  • Muna Alsallal, Rahat Iqbal, Saad Amin, and Anne James. 2013. Intrinsic plagiarism detection using latent semantic indexing and stylometry. In Proceedings of the 2013 6th International Conference on Developments in eSystems Engineering. 145–150. DOI: 10.1109/DeSE.2013.34
  • Muna AlSallal, Rahat Iqbal, Saad Amin, Anne James, and Vasile Palade. 2016. An integrated machine learning approach for extrinsic plagiarism detection. In Proceedings of the 2016 9th International Conference on Developments in eSystems Engineering (DeSE’16). 203–208. DOI: 10.1109/DeSE.2016.1
  • Muna AlSallal, Rahat Iqbal, Vasile Palade, Saad Amin, and Victor Chang. 2019. An integrated approach for intrinsic plagiarism detection. Fut. Gener. Comput. Syst. 96 (2019), 700–712. DOI: 10.1016/j.future.2017.11.023
  • Miguel A. Álvarez-Carmona, Marc Franco-Salvador, Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Paolo Rosso, and Luis Villaseñor-Pineda. 2018. Semantically-informed distance and similarity measures for paraphrase plagiarism identification. J. Intell. Fuzzy Syst. 34, 5 (2018), 2983–2990.
  • Faisal Alvi, Mark Stevenson, and Paul Clough. 2014. Hashing and merging heuristics for text reuse detection. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) . 939–946.
  • Faisal Alvi, Mark Stevenson, and Paul Clough. 2017. Plagiarism detection in texts obfuscated with homoglyphs. In Advances in Information Retrieval . 669–675.
  • Salha Alzahrani. 2015. Arabic plagiarism detection using word correlation in N-Grams with K-Overlapping approach—Working notes for PAN-AraPlagDet at FIRE 2015. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’15) .
  • Salha M. Alzahrani, Naomie Salim, and Ajith Abraham. 2012. Understanding plagiarism linguistic patterns, textual features, and detection methods. IEEE Trans. Syst. Man, Cybern. C Appl. Rev. 42, 2 (2012), 133–149.
  • Habibollah Asghari, Salar Mohtaj, Omid Fatemi, Heshaam Faili, Paolo Rosso, and Martin Potthast. 2016. Algorithms and corpora for persian plagiarism detection. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’16) . 61.
  • Duygu Ataman, Jose G. C. De Souza, Marco Turchi, and Matteo Negri. 2016. FBK HLT-MT at SemEval-2016 Task 1: Cross-lingual semantic similarity measurement using quality estimation features and compositional bilingual word embeddings. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 570–576.
  • Farzindar Atefeh and Wael Khreich. 2015. A survey of techniques for event detection in twitter. Comput. Intell. 31, 1 (2015), 132–164. DOI: 10.1111/coin.12017
  • Douglas Bagnall. 2015. Author identification using multi-headed recurrent neural networks—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Douglas Bagnall. 2016. Authorship clustering using multi-headed recurrent neural networks—Notebook for PAN at CLEF 2016. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16) .
  • Marco Baroni, Georgiana Dinu, and Germán Kruszewski. 2014. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) . 238–247.
  • Alberto Barrón-Cedeño, Parth Gupta, and Paolo Rosso. 2013. Methods for cross-language plagiarism detection. Knowl.-Based Syst. 50 (2013), 211–217. DOI: 10.1016/j.knosys.2013.06.018
  • Alberto Barrón-Cedeño, Marta Vila, M. Antònia Martí, and Paolo Rosso. 2013. Plagiarism meets paraphrasing: insights for the next generation in automatic plagiarism detection. Comput. Linguist. 39, 4 (2013), 917–947. DOI: 10.1162/COLI_a_00153
  • Alberto Bartoli, Alex Dagri, Andrea De Lorenzo, Eric Medvet, and Fabiano Tarlao. 2015. An author verification approach based on differential features—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Jeffrey Beall. 2016. Best practices for scholarly authors in the age of predatory journals. Ann. R. Coll. Surg. Engl. 98, 2 (2016), 77–79.
  • Imene Bensalem, Imene Boukhalfa, Paolo Rosso, Lahsen Abouenour, Kareem Darwish, and Salim Chikhi. 2015. Overview of the AraPlagDet PAN@FIRE2015 shared task on arabic plagiarism detection. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’15) .
  • Imene Bensalem, Salim Chikhi, and Paolo Rosso. 2013. Building arabic corpora from wikisource. In Proceedings of the 2013 ACS International Conference on Computer Systems and Applications (AICCSA’13) . 1–2. DOI: 10.1109/AICCSA.2013.6616474
  • Imene Bensalem, Paolo Rosso, and Salim Chikhi. 2013. A new corpus for the evaluation of arabic intrinsic plagiarism detection. In Information Access Evaluation: Multilinguality, Multimodality, and Visualization . 53–58.
  • Imene Bensalem, Paolo Rosso, and Salim Chikhi. 2014. Intrinsic plagiarism detection using n-gram classes. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP’14) . 1459–1464.
  • Ergun Bicici. 2016. RTM at SemEval-2016 Task 1: Predicting semantic similarity with referential translation machines and related statistics. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 758–764.
  • Victoria Bobicev. 2013. Authorship detection with PPM—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Hadj Ahmed Bouarara, Amine Rahmani, Reda Mohamed Hamou, and Abdelmalek Amine. 2014. Machine learning tool and meta-heuristic based on genetic algorithms for plagiarism detection over mail service. In Proceedings of the 2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS’14) . 157–162. DOI: 10.1109/ICIS.2014.6912125
  • Barry Bozeman, Daniel Fay, and Catherine P. Slade. 2013. Research collaboration in universities and academic entrepreneurship: The-state-of-the-art. J. Technol. Transf. 38, 1 (2013), 1–67. DOI: 10.1007/s10961-012-9281-8
  • Pearl Brereton, Barbara A. Kitchenham, David Budgen, Mark Turner, and Mohamed Khalil. 2007. Lessons from applying the systematic literature review process within the software engineering domain. J. Syst. Softw. 80, 4 (2007), 571–583. DOI: 10.1016/j.jss.2006.07.009
  • Tomáš Brychcín and Lukáš Svoboda. 2016. UWB at SemEval-2016 Task 1: Semantic textual similarity using lexical, syntactic, and semantic information. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 588–594.
  • Davide Buscaldi, Joseph Le Roux, Jorge J. García Flores, and Adrian Popescu. 2013. LIPN-CORE: Semantic text similarity using n-grams, wordnet, syntactic analysis, ESA and information retrieval based features. In Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics . 63.
  • Esteban Castillo, Ofelia Cervantes, Darnes Vilariño, David Pinto, and Saul León. 2014. Unsupervised method for the authorship identification task—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Daniel Castro, Yaritza Adame, María Pelaez, and Rafael Muñoz. 2015. Authorship verification, combining linguistic features and different similarity functions—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Daniele Cerra, Mihai Datcu, and Peter Reinartz. 2014. Authorship analysis based on data compression. Pattern Recogn. Lett. 42 (2014), 79–84. DOI: 10.1016/j.patrec.2014.01.019
  • Zdenek Ceska. 2008. Plagiarism detection based on singular value decomposition. In Advances in Natural Language Processing . Springer, 108–119.
  • Man Yan Miranda Chong. 2013. A study on plagiarism detection and plagiarism direction identification using natural language processing techniques. Ph.D. Thesis. University of Wolverhampton.
  • Hussain A. Chowdhury and Dhruba K. Bhattacharyya. 2016. Plagiarism: Taxonomy, tools and detection techniques. In Proceedings of the 19th National Convention on Knowledge, Library and Information Networking (NACLIN’16) .
  • Daniela Chudá, Jozef Lačný, Maroš Maršalek, Pavel Michalko, and Ján Súkeník. 2013. Plagiarism detection in slovak texts on the web. In Proceedings of the Conference on Plagiarism across Europe and Beyond . 249–260.
  • Guy J. Curtis and Joseph Clare. 2017. How prevalent is contract cheating and to what extent are students repeat offenders? J. Acad. Ethics 15, 2 (2017), 115–124. DOI: 10.1007/s10805-017-9278-x
  • Guy J. Curtis and Lucia Vardanega. 2016. Is plagiarism changing over time? A 10-year time-lag study with three points of measurement. High. Educ. Res. Dev. 35, 6 (2016), 1167–1179. DOI: 10.1080/07294360.2016.1161602
  • Michiel van Dam. 2013. A basic character n-gram approach to authorship verification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Avishek Dan and Pushpak Bhattacharyya. 2013. Cfilt-core: Semantic textual similarity using universal networking language. In Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics (*SEM’13) . 216–220.
  • Ali Daud, Wahab Khan, and Dunren Che. 2017. Urdu language processing: a survey. Artif. Intell. Rev. 47, 3 (2017), 279–311. DOI: 10.1007/s10462-016-9482-x
  • Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard A. Harshman. 1990. Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41, 6 (1990), 391. DOI: 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
  • T. Dharani and I. Laurence Aroquiaraj. 2013. A survey on content based image retrieval. In Proceedings of the 2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering . 485–490. DOI: 10.1109/ICPRIME.2013.6496719
  • Michal Ďuračík, Emil Kršák, and Patrik Hrkút. 2017. Current trends in source code analysis, plagiarism detection and issues of analysis big datasets. Proc. Eng. 192 (2017), 136–141. DOI: 10.1016/j.proeng.2017.06.024
  • Nava Ehsan and Azadeh Shakery. 2016. Candidate document retrieval for cross-lingual plagiarism detection using two-level proximity information. Inf. Process. Manag. 52, 6 (2016), 1004–1017. DOI: 10.1016/j.ipm.2016.04.006
  • Nava Ehsan and Azadeh Shakery. 2016. A pairwise document analysis approach for monolingual plagiarism detection. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’16) . 145–148.
  • Nava Ehsan, Frank Wm. Tompa, and Azadeh Shakery. 2016. Using a dictionary and n-gram alignment to improve fine-grained cross-language plagiarism detection. In Proceedings of the 2016 ACM Symposium on Document Engineering (DocEng’16) . 59–68. DOI: 10.1145/2960811.2960817
  • Taiseer Abdalla Elfadil Eisa, Naomie Salim, and Salha Alzahrani. 2015. Existing plagiarism detection techniques: A systematic mapping of the scholarly literature. Online Inf. Rev. 39, 3 (2015), 383–400.
  • El-Sayed M. El-Alfy, Radwan E. Abdel-Aal, Wasfi G. Al-Khatib, and Faisal Alvi. 2015. Boosting paraphrase detection through textual similarity metrics with abductive networks. Appl. Soft Comput. 26 (2015), 444–453. DOI: 10.1016/j.asoc.2014.10.021
  • Victoria Elizalde. 2013. Using statistic and semantic analysis to detect plagiarism—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Victoria Elizalde. 2014. Using noun phrases and tf-idf for plagiarized document retrieval—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Erik von Elm, Greta Poglia, Bernhard Walder, and Martin R. Tramèr. 2004. Different patterns of duplicate publication: An Analysis of articles used in systematic reviews. JAMA 291, 8 (2004), 974–980. DOI: 10.1001/jama.291.8.974
  • Fezeh Esteki and Faramarz Safi Esfahani. 2016. A plagiarism detection approach based on SVM for persian texts. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’16) . 149–153.
  • Asli Eyecioglu and Bill Keller. 2015. Twitter paraphrase identification with simple overlap features and SVMs. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15) . 64–69.
  • Jody Condit Fagan. 2017. An evidence-based review of academic web search engines, 2014–2016: Implications for librarians’ practice and research agenda. Inf. Technol. Libr. 36, 2 (2017), 7.
  • Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database (Language, Speech, and Communication) . The MIT Press.
  • Vanessa Wei Feng and Graeme Hirst. 2013. Authorship verification with entity coherence and other rich linguistic features—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Rafael Ferreira, George D. C. Cavalcanti, Fred Freitas, Rafael Dueire Lins, Steven J. Simske, and Marcelo Riss. 2018. Combining sentence similarities measures to identify paraphrases. Comput. Speech Lang. 47 (2018), 59–73. DOI: 10.1016/j.csl.2017.07.002
  • Jérémy Ferrero, Frédéric Agnes, Laurent Besacier, and Didier Schwab. 2017. CompiLIG at SemEval-2017 Task 1: Cross-language plagiarism detection methods for semantic textual similarity. arXiv:1704.01346 .
  • Jérémy Ferrero, Frédéric Agnes, Laurent Besacier, and Didier Schwab. 2017. Using word embedding for cross-language plagiarism detection. arXiv:1702.03082 .
  • Jérémy Ferrero, Laurent Besacier, Didier Schwab, and Frédéric Agnes. 2017. Deep investigation of cross-language plagiarism detection methods. arXiv:1705.08828 .
  • Tomáš Foltýnek and Irene Glendinning. 2015. Impact of policies for plagiarism in higher education across europe: Results of the project. Acta Univ. Agric. Silvic. Mendel. Brun. 63, 1 (2015), 207–216.
  • Marc Franco-Salvador, Parth Gupta, and Paolo Rosso. 2013. Cross-language plagiarism detection using a multilingual semantic network. In Advances in Information Retrieval . 710–713.
  • Marc Franco-Salvador, Parth Gupta, and Paolo Rosso. 2014. Knowledge graphs as context models: Improving the detection of cross-language plagiarism with paraphrasing. In Bridging Between Information Retrieval and Databases: PROMISE Winter School 2013 , Nicola Ferro (ed.). Springer-Verlag, Berlin, 227–236. DOI: 10.1007/978-3-642-54798-0_12
  • Marc Franco-Salvador, Parth Gupta, Paolo Rosso, and Rafael E. Banchs. 2016. Cross-language plagiarism detection over continuous-space- and knowledge graph-based representations of language. Knowl.-Based Syst. 111 (2016), 87–99. DOI: 10.1016/j.knosys.2016.08.004
  • Marc Franco-Salvador, Paolo Rosso, and Manuel Montes-y-Gómez. 2016. A systematic study of knowledge graph analysis for cross-language plagiarism detection. Inf. Process. Manag. 52, 4 (2016), 550–570. DOI: 10.1016/j.ipm.2015.12.004
  • Marc Franco-Salvador, Paolo Rosso, and Roberto Navigli. 2014. A knowledge-based representation for cross-language document retrieval and categorization. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics . 414–423.
  • Jordan Fréry, Christine Largeron, and Mihaela Juganaru-Mathieu. 2014. UJM at CLEF in Author Identification—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Evgeniy Gabrilovich and Shaul Markovitch. 2007. Computing semantic relatedness using wikipedia-based explicit semantic analysis. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI’07) . 1606–1611.
  • Jean-Gabriel Ganascia, Pierre Glaudes, and Andrea Del Lungo. 2014. Automatic detection of reuses and citations in literary texts. Lit. Linguist. Comput. 29, 3 (2014), 412–421. DOI: 10.1093/llc/fqu020
  • Yasmany García-Mondeja, Daniel Castro-Castro, Vania Lavielle-Castro, and Rafael Muñoz. 2017. Discovering author groups using a b-compact graph-based clustering—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Urvashi Garg and Vishal Goyal. 2016. Maulik: A plagiarism detection tool for hindi documents. Ind. J. Sci. Technol. 9, 12 (2016).
  • Shahabeddin Geravand and Mahmood Ahmadi. 2014. An efficient and scalable plagiarism checking system using bloom filters. Comput. Electr. Eng. 40, 6 (2014), 1789–1800.
  • M. R. Ghaeini. 2013. Intrinsic author identification using modified weighted KNN—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Erfaneh Gharavi, Kayvan Bijari, Kiarash Zahirnia, and Hadi Veisi. 2016. A deep learning approach to persian plagiarism detection. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’16) . 154–159.
  • Lee Gillam. 2013. Guess again and see if they line up: Surrey's runs at plagiarism detection—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Bela Gipp. 2014. Citation-based Plagiarism Detection: Detecting Disguised and Cross-language Plagiarism Using Citation Pattern Analysis . Springer Vieweg Research. Retrieved from http://www.springer.com/978-3-658-06393-1 .
  • Bela Gipp and Norman Meuschke. 2011. Citation pattern matching algorithms for citation-based plagiarism detection: Greedy citation tiling, citation chunking and longest common citation sequence. In Proceedings of the 11th ACM Symposium on Document Engineering . 249–258. DOI: 10.1145/2034691.2034741
  • Bela Gipp, Norman Meuschke, and Joeran Beel. 2011. Comparative evaluation of text- and citation-based plagiarism detection approaches using guttenplag. In Proceedings of the 11th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’11) . 255–258. DOI: 10.1145/1998076.1998124
  • Bela Gipp, Norman Meuschke, and Corinna Breitinger. 2014. Citation‐based plagiarism detection: Practicability on a large‐scale scientific corpus. J. Assoc. Inf. Sci. Technol. 65, 8 (2014), 1527–1540. DOI: 10.1002/asi.23228
  • Bela Gipp, Norman Meuschke, Corinna Breitinger, Jim Pitman, and Andreas Nürnberger. 2014. Web-based demonstration of semantic similarity detection using citation pattern visualization for a cross language plagiarism case. In Proceedings of the International Conference on Enterprise Information Systems (ICEIS’14) . 677–683. DOI: 10.5220/0004985406770683
  • Goran Glavaš, Marc Franco-Salvador, Simone P. Ponzetto, and Paolo Rosso. 2018. A resource-light method for cross-lingual semantic textual similarity. Knowl.-Based Syst. 143 (2018), 1–9. DOI: 10.1016/j.knosys.2017.11.041
  • Lila Gleitman and Anna Papafragou. 2005. Language and thought. In The Cambridge Handbook of Thinking and Reasoning , Keith J. Holyoak and Robert G. Morrison (eds.). Cambridge University Press, 633–661.
  • Demetrios G. Glinos. 2014. A hybrid architecture for plagiarism detection—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) . 958–965.
  • Helena Gómez-Adorno, Yuridiana Alemán, Darnes Vilariño Ayala, Miguel A Sanchez-Perez, David Pinto, and Grigori Sidorov. 2017. Author clustering using hierarchical clustering analysis—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Helena Gómez-Adorno, Grigori Sidorov, David Pinto, and Ilia Markov. 2015. A graph based authorship identification approach—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Philipp Gross and Pashutan Modaresi. 2014. Plagiarism alignment detection by merging context seeds—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Deepa Gupta, Vani Kanjirangat, and L. M. Leema. 2016. Plagiarism detection in text documents using sentence bounded stop word n-grams. J. Eng. Sci. Technol. 11, 10 (2016), 1403–1420.
  • Deepa Gupta, Vani Kanjirangat, and Charan Kamal Singh. 2014. Using natural language processing techniques and fuzzy-semantic similarity for automatic external plagiarism detection. In Proceedings of the 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI’14) . 2694–2699. DOI: 10.1109/ICACCI.2014.6968314
  • Josue Gutierrez, Jose Casillas, Paola Ledesma, Gibran Fuentes, and Ivan Meza. 2015. Homotopy based classification for author verification task—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Yaakov HaCohen-Kerner and Aharon Tayeb. 2017. Rapid detection of similar peer-reviewed scientific papers via constant number of randomized fingerprints. Inf. Process. Manag. 53, 1 (2017), 70–86. DOI: 10.1016/j.ipm.2016.06.007
  • Matthias Hagen, Martin Potthast, and Benno Stein. 2015. Source retrieval for plagiarism detection from large web corpora: Recent approaches. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Osama Haggag and Samhaa El-Beltagy. 2013. Plagiarism candidate retrieval using selective query formulation and discriminative query scoring. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Oren Halvani and Lukas Graner. 2017. Author clustering based on compression-based dissimilarity scores—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Oren Halvani and Martin Steinebach. 2014. VEBAV - A simple, scalable and fast authorship verification scheme—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Oren Halvani, Martin Steinebach, and Ralf Zimmermann. 2013. Authorship verification via k-nearest neighbor estimation—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Oren Halvani and Christian Winter. 2015. A generic authorship verification scheme based on equal error rates—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Christian Hänig, Robert Remus, and Xose De La Puente. 2015. Exb themis: Extensive feature extraction from word alignments for semantic textual similarity. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15) . 264–268.
  • Sarah Harvey. 2014. Author verification using PPM with parts of speech tagging—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Hua He, John Wieting, Kevin Gimpel, Jinfeng Rao, and Jimmy Lin. 2016. UMD-TTIC-UW at SemEval-2016 Task 1: Attention-based multi-perspective convolutional neural networks for textual similarity measurement. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 1103–1108.
  • Oumaima Hourrane and El Habib Benlahmar. 2017. Survey of plagiarism detection approaches and big data techniques related to plagiarism candidate retrieval. In Proceedings of the 2nd International Conference on Big Data, Cloud and Applications (BDCA’17) . 15:1–15:6. DOI: 10.1145/3090354.3090369
  • Manuela Hürlimann, Benno Weck, Esther van den Berg, Simon Šuster, and Malvina Nissim. 2015. GLAD: Groningen lightweight authorship detection—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Syed Fawad Hussain and Asif Suryani. 2015. On retrieving intelligently plagiarized documents using semantic similarity. Eng. Appl. Artif. Intell. 45 (2015), 246–258. DOI: 10.1016/j.engappai.2015.07.011
  • Ashraf S. Hussein. 2015. A plagiarism detection system for arabic documents. In Intelligent Systems 2014 , D. Filev, J. Jabłkowski, J. Kacprzyk, M. Krawczak, I. Popchev, L. Rutkowski, V. Sgurev, E. Sotirova, P. Szynkarczyk, and S. Zadrozny (eds.). Springer International Publishing, 541–552.
  • Ashraf S. Hussein. 2015. Arabic document similarity analysis using n-grams and singular value decomposition. In Proceedings of the 2015 IEEE 9th International Conference on Research Challenges in Information Science (RCIS’15) . 445–455. DOI: 10.1109/RCIS.2015.7128906
  • Radu Tudor Ionescu, Marius Popescu, and Aoife Cahill. 2014. Can characters reveal your native language? A language-independent approach to native language identification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP’14) . 1363–1373.
  • Hideo Itoh. 2016. RICOH at SemEval-2016 Task 1: IR-based semantic textual similarity estimation. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 691–695.
  • Magdalena Jankowska, Vlado Kešelj, and Evangelos Milios. 2013. Proximity based one-class classification with common n-gram dissimilarity for authorship verification task—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Magdalena Jankowska, Vlado Kešelj, and Evangelos Milios. 2014. Ensembles of proximity-based one-class classifiers for author verification—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Arun Jayapal and Binayak Goswami. 2013. Vector space model and overlap metric for author identification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Zhuoren Jiang, Miao Chen, and Xiaozhong Liu. 2014. Semantic annotation with rescoredESA: Rescoring concept features generated from explicit semantic analysis. In Proceedings of the 7th International Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR’14) . 25–27. DOI: 10.1145/2663712.2666192
  • M. A. C. Jiffriya, M. A. C. Akmal Jahan, and Roshan G. Ragel. 2014. Plagiarism detection on electronic text based assignments using vector space model. In Proceedings of the 7th International Conference on Information and Automation for Sustainability . 1–5. DOI: 10.1109/ICIAFS.2014.7069593
  • M. A. C. Jiffriya, M. A. C. Akmal Jahan, Roshan G. Ragel, and Sampath Deegalla. 2013. AntiPlag: Plagiarism detection on electronic submissions of text based assignments. In Proceedings of the 2013 IEEE 8th International Conference on Industrial and Information Systems . 376–380. DOI: 10.1109/ICIInfS.2013.6732013
  • Patrick Juola. 2017. Detecting contract cheating via stylometric methods. In Proceedings of the Conference on Plagiarism across Europe and Beyond . 187–198. Retrieved from https://plagiarism.pefka.mendelu.cz/files/proceedings17.pdf .
  • Patrick Juola and Efstathios Stamatatos. 2013. Overview of the author identification task at PAN 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Rune Borge Kalleberg. 2015. Towards detecting textual plagiarism using machine learning methods. University of Agder. Retrieved from https://brage.bibsys.no/xmlui/bitstream/handle/11250/299460/RuneBorgeKalleberg.pdf?sequence=1 .
  • Rafael-Michael Karampatsis. 2015. CDTDS: Predicting paraphrases in twitter via support vector regression. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15) . 75–79.
  • Daniel Karaś, Martyna Śpiewak, and Piotr Sobecki. 2017. OPI-JSA at CLEF 2017: Author clustering and style breach detection—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Roman Kern. 2013. Grammar checker features for author identification and author profiling—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Imtiaz H. Khan, Muazzam A. Siddiqui, Kamal M. Jambi, Muhammad Imran, and Abobakr A. Bagais. 2014. Query optimization in Arabic plagiarism detection: An empirical study. Int. J. Intell. Syst. Appl. 7, 1 (2014), 73.
  • Jamal Ahmad Khan. 2017. Style breach detection: An unsupervised detection model—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Mahmoud Khonji and Youssef Iraqi. 2014. A Slightly-modified GI-based Author-verifier with Lots of Features (ASGALF)—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Khadijeh Khoshnavataher, Vahid Zarrabi, Salar Mohtaj, and Habibollah Asghari. 2015. Developing monolingual persian corpus for extrinsic plagiarism detection using artificial obfuscation—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Barbara Kitchenham. 2004. Procedures for performing systematic reviews. Keele University Technical Report TR/SE-0401. Keele University. 33.
  • Barbara Kitchenham, O. Pearl Brereton, David Budgen, Mark Turner, John Bailey, and Stephen Linkman. 2009. Systematic literature reviews in software engineering—A systematic literature review. Inf. Softw. Technol. 51, 1 (2009), 7–15. DOI: 10.1016/j.infsof.2008.09.009
  • Mirco Kocher. 2016. UniNE at CLEF 2016: Author clustering—Notebook for PAN at CLEF 2016. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16) .
  • Mirco Kocher and Jacques Savoy. 2015. UniNE at CLEF 2015: Author identification—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Mirco Kocher and Jacques Savoy. 2017. UniNE at CLEF 2017: Author clustering—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Leilei Kong, Yong Han, Zhongyuan Han, Haihao Yu, Qibo Wang, Tinglei Zhang, and Haoliang Qi. 2014. Source retrieval based on learning to rank and text alignment based on plagiarism type recognition for plagiarism detection—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Leilei Kong, Zhimao Lu, Yong Han, Haoliang Qi, Zhongyuan Han, Qibo Wang, Zhenyuan Hao, and Jing Zhang. 2015. Source retrieval and text alignment corpus construction for plagiarism detection—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Leilei Kong, Zhimao Lu, Haoliang Qi, and Zhongyuan Han. 2014. Detecting high obfuscation plagiarism: Exploring multi-features fusion via machine learning. Int. J. u-and e-Serv. Sci. Technol. 7, 4 (2014), 385–396.
  • Leilei Kong, Haoliang Qi, Cuixia Du, Mingxing Wang, and Zhongyuan Han. 2013. Approaches for source retrieval and text alignment of plagiarism detection—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Moshe Koppel and Yaron Winter. 2014. Determining if two documents are written by the same author. J. Assoc. Inf. Sci. Technol. 65, 1 (2014), 178–187.
  • Niraj Kumar. 2014. A graph based automatic plagiarism detection technique to handle artificial word reordering and paraphrasing. In Computational Linguistics and Intelligent Text Processing . 481–494.
  • Marcin Kuta and Jacek Kitowski. 2014. Optimisation of character n-gram profiles method for intrinsic plagiarism detection. In Artificial Intelligence and Soft Computing . 500–511.
  • Mikhail Kuznetsov, Anastasia Motrenko, Rita Kuznetsova, and Vadim Strijov. 2016. Methods for intrinsic plagiarism detection and author diarization. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16) . 912–919. Retrieved from http://ceur-ws.org/Vol-1609/ .
  • Robert Layton, Paul Watters, and Richard Dazeley. 2013. Local n-grams for author identification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Paola Ledesma, Gibran Fuentes, Gabriela Jasso, Angel Toledo, and Ivan Meza. 2013. Distance learning for author verification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Taemin Lee, Jeongmin Chae, Kinam Park, and Soonyoung Jung. 2013. CopyCaptor: Plagiarized source retrieval system using global word frequency and local feedback—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Chi-kiu Lo, Cyril Goutte, and Michel Simard. 2016. CNRC at SemEval-2016 task 1: Experiments in crosslingual semantic textual similarity. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 668–673.
  • Tara C. Long, Mounir Errami, Angela C. George, Zhaohui Sun, and Harold R. Garner. 2009. Responding to possible plagiarism. Science 323, 5919 (2009), 1293–1294. DOI: 10.1126/science.1167408
  • Ahmed Magooda, Ashraf Y. Mahgoub, Mohsen Rashwan, Magda B. Fayek, and Hazem Raafat. 2015. RDI System for extrinsic plagiarism detection (RDI_RED)—Working Notes for PAN-AraPlagDet at FIRE 2015. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’15) .
  • Peyman Mahdavi, Zahra Siadati, and Farzin Yaghmaee. 2014. Automatic external persian plagiarism detection using vector space model. In Proceedings of the 2014 4th International eConference on Computer and Knowledge Engineering (ICCKE’14) . 697–702.
  • Ashraf Y. Mahgoub, Ahmed Magooda, Mohsen Rashwan, Magda B. Fayek, and Hazem Raafat. 2015. RDI System for intrinsic plagiarism detection (RDI_RID)—Working Notes for PAN-AraPlagDet at FIRE 2015. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’15) .
  • Promita Maitra, Souvick Ghosh, and Dipankar Das. 2015. Authorship verification - an approach based on random forest—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Cristhian Mayor, Josue Gutierrez, Angel Toledo, Rodrigo Martinez, Paola Ledesma, Gibran Fuentes, and Ivan Meza. 2014. A single author style representation for the author verification task—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Norman Meuschke and Bela Gipp. 2013. State-of-the-art in detecting academic plagiarism. Int. J. Educ. Integr. 9, 1 (2013), 50–71.
  • Norman Meuschke and Bela Gipp. 2014. Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space. In Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries . 197–200.
  • Norman Meuschke, Christopher Gondek, Daniel Seebacher, Corinna Breitinger, Daniel A. Keim, and Bela Gipp. 2018. An adaptive image-based plagiarism detection approach. In Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’18) . DOI: 10.1145/3197026.3197042
  • Norman Meuschke, Moritz Schubotz, Felix Hamborg, Tomáš Skopal, and Bela Gipp. 2017. Analyzing mathematical content to detect academic plagiarism. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM’17) . 2211–2214. DOI: 10.1145/3132847.3133144
  • Norman Meuschke, Nicolas Siebeck, Moritz Schubotz, and Bela Gipp. 2017. Analyzing semantic concept patterns to detect academic plagiarism. In Proceedings of the 6th International Workshop on Mining Scientific Publications (WOSP’17) . 46–53. DOI: 10.1145/3127526.3127535
  • Norman Meuschke, Vincent Stange, Moritz Schubotz, Michael Kramer, and Bela Gipp. 2019. Improving academic plagiarism detection for STEM documents by analyzing mathematical content and citations. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL’19) .
  • Pashutan Modaresi and Philipp Gross. 2014. A language independent author verifier using fuzzy c-means clustering—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • H. F. Moed, W. J. M. Burger, J. G. Frankfort, and A. F. J. Van Raan. 1985. The application of bibliometric indicators: Important field- and time-dependent factors to be considered. Scientometrics 8, 3–4 (1985), 177–203. DOI: 10.1007/BF02016935
  • Majid Mohebbi and Alireza Talebpour. 2016. Texts semantic similarity detection based graph approach. Int. Arab J. Inf. Technol. 13, 2 (2016), 246–251.
  • Mozhgan Momtaz, Kayvan Bijari, Mostafa Salehi, and Hadi Veisi. 2016. Graph-based approach to text alignment for plagiarism detection in persian documents. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’16) . 176–179.
  • Erwan Moreau, Arun Jayapal, and Carl Vogel. 2014. Author verification: exploring a large set of parameters using a genetic algorithm—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Erwan Moreau, Arun Jayapal, Gerard Lynch, and Carl Vogel. 2015. Author verification: Basic stacked generalization applied to predictions from a set of heterogeneous learners—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Erwan Moreau and Carl Vogel. 2013. Style-based distance features for author verification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Maxim Mozgovoy, Tuomo Kakkonen, and Georgina Cosma. 2010. Automatic student plagiarism detection: Future perspectives. J. Educ. Comput. Res. 43, 4 (2010), 511–531.
  • Aibek Musaev, De Wang, Saajan Shridhar, and Calton Pu. 2015. Fast text classification using randomized explicit semantic analysis. In Proceedings of the 2015 IEEE International Conference on Information Reuse and Integration . 364–371. DOI: 10.1109/IRI.2015.62
  • El Moatez Billah Nagoudi, Ahmed Khorsi, Hadda Cherroun, and Didier Schwab. 2018. 2L-APD: A Two-Level Plagiarism Detection System for Arabic Documents. Cybern. Inf. Technol. 18, 1 (2018), 124–138. DOI: 10.2478/cait-2018-0011
  • Rao Muhammad Adeel Nawab, Mark Stevenson, and Paul Clough. 2017. An IR-based approach utilizing query expansion for plagiarism detection in MEDLINE. IEEE/ACM Trans. Comput. Biol. Bioinforma. 14, 4 (2017), 796–804. DOI: 10.1109/TCBB.2016.2542803
  • Philip M. Newton. 2018. How common is commercial contract cheating in higher education and is it increasing? A Systematic Review. Front. Educ. 3 (2018). DOI: 10.3389/feduc.2018.00067
  • Le Thanh Nguyen, Nguyen Xuan Toan, and Dinh Dien. 2016. Vietnamese plagiarism detection method. In Proceedings of the 7th Symposium on Information and Communication Technology (SoICT’16) . 44–51. DOI: 10.1145/3011077.3011109
  • Gabriel Oberreuter and Juan D. Velásquez. 2013. Text mining applied to plagiarism detection: The use of words for detecting deviations in the writing style. Exp. Syst. Appl. 40, 9 (2013), 3756–3763.
  • Milan Ojsteršek, Janez Brezovnik, Mojca Kotar, Marko Ferme, Goran Hrovat, Albin Bregant, and Mladen Borovič. 2014. Establishing of a slovenian open access infrastructure: A technical point of view. Program 48, 4 (2014), 394–412. DOI: 10.1108/PROG-02-2014-0005
  • Adeva Oktoveri, Agung Toto Wibowo, and Ari Moesriami Barmawi. 2014. Non-relevant document reduction in anti-plagiarism using asymmetric similarity and AVL tree index. In Proceedings of the 2014 5th International Conference on Intelligent and Advanced Systems (ICIAS’14) . 1–5. DOI: 10.1109/ICIAS.2014.6869547
  • Ahmed Hamza Osman and Naomie Salim. 2013. An improved semantic plagiarism detection scheme based on Chi-squared automatic interaction detection. In Proceedings of the 2013 International Conference on Computing, Electrical and Electronic Engineering (ICCEEE’13) . 640–647. DOI: 10.1109/ICCEEE.2013.6634015
  • Caleb Owens and Fiona A. White. 2013. A 5‐year systematic strategy to reduce plagiarism among first‐year psychology university students. Aust. J. Psychol. 65, 1 (2013), 14–21. DOI: 10.1111/ajpy.12005
  • María Leonor Pacheco, Kelwin Fernandes, and Aldo Porco. 2015. Random forest with increased generalization: A universal background approach for authorship verification—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Yurii Palkovskii and Alexei Belov. 2013. Using hybrid similarity methods for plagiarism detection—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Yurii Palkovskii and Alexei Belov. 2014. Developing high-resolution universal multi-type n-gram plagiarism detector. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) . 984–989.
  • Guy Paré, Marie-Claude Trudel, Mirou Jaana, and Spyros Kitsiou. 2015. Synthesizing information systems knowledge: A typology of literature reviews. Inf. Manag. 52, 2 (2015), 183–199. DOI: 10.1016/j.im.2014.08.008
  • Merin Paul and Sangeetha Jamal. 2015. An Improved SRL based plagiarism detection technique using sentence ranking. Procedia Comput. Sci. 46 (2015), 223–230. DOI: 10.1016/j.procs.2015.02.015
  • Ellie Pavlick, Pushpendre Rastogi, Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2015. PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing . 425–430.
  • Jian Peng, Kim-Kwang Raymond Choo, and Helen Ashman. 2016. Bit-level n-gram based forensic authorship analysis on social media: Identifying individuals from linguistic profiles. J. Netw. Comput. Appl. 70 (2016), 171–182. DOI: 10.1016/j.jnca.2016.04.001
  • Solange de L. Pertile, Viviane P. Moreira, and Paolo Rosso. 2015. Comparing and combining content‐ and citation‐based approaches for plagiarism detection. J. Assoc. Inf. Sci. Technol. 67, 10 (2015), 2511–2526. DOI: 10.1002/asi.23593
  • Solange de L. Pertile, Paolo Rosso, and Viviane P. Moreira. 2013. Counting co-occurrences in citations to identify plagiarised text fragments. In Proceedings of the International Conference of the Cross-Language Evaluation Forum for European Languages . 150–154.
  • Timo Petmanson. 2013. Authorship identification using correlations of frequent features—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Mohammad Taher Pilehvar, David Jurgens, and Roberto Navigli. 2013. Align, disambiguate and walk: A unified approach for measuring semantic similarity. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) . 1341–1351.
  • Gaspar Pizarro V. and Juan D. Velásquez. 2017. Docode 5: Building a real-world plagiarism detection system. Eng. Appl. Artif. Intell. 64 (Jun. 2017), 261–271. DOI: 10.1016/j.engappai.2017.06.001
  • Juan-Pablo Posadas-Durán, Grigori Sidorov, Ildar Batyrshin, and Elibeth Mirasol-Meléndez. 2015. Author verification using syntactic n-grams—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Martin Potthast, Tim Gollub, Matthias Hagen, Martin Tippmann, Johannes Kiesel, Paolo Rosso, Efstathios Stamatatos, and Benno Stein. 2013. Overview of the 5th International Competition on Plagiarism Detection. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Martin Potthast, Matthias Hagen, Anna Beyer, Matthias Busse, Martin Tippmann, Paolo Rosso, and Benno Stein. 2014. Overview of the 6th International Competition on Plagiarism Detection. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Martin Potthast, Matthias Hagen, and Benno Stein. 2016. Author Obfuscation: Attacking the state of the art in authorship verification. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16) .
  • Martin Potthast, Francisco Rangel, Michael Tschuggnall, Efstathios Stamatatos, Paolo Rosso, and Benno Stein. 2017. Overview of PAN’17: Author identification, author profiling, and author obfuscation. In Proceedings of the 7th International Conference of the CLEF Initiative . DOI: 10.1007/978-3-319-65813-1_25
  • Martin Potthast, Benno Stein, Alberto Barrón-Cedeño, and Paolo Rosso. 2010. An Evaluation framework for plagiarism detection. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters (COLING’10) . 997–1005.
  • Martin Potthast, Benno Stein, Andreas Eiselt, Alberto Barrón-Cedeño, and Paolo Rosso. 2009. Overview of the 1st international competition on plagiarism detection. In Proceedings of the SEPLN 09 Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN’09) . 1–9.
  • Amit Prakash and Sujan Kumar Saha. 2014. Experiments on document chunking and query formation for plagiarism source retrieval—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Piotr Przybyła, Nhung T. H. Nguyen, Matthew Shardlow, Georgios Kontonatsios, and Sophia Ananiadou. 2016. NaCTeM at SemEval-2016 Task 1: Inferring sentence-level semantic similarity from an ensemble of complementary lexical and sentence-level features. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 614–620.
  • Javad Rafiei, Salar Mohtaj, Vahid Zarrabi, and Habibollah Asghari. 2015. Source retrieval plagiarism detection based on noun phrase and keyword phrase extraction—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Shima Rakian, Esfahani Faramarz Safi, and Hamid Rastegari. 2015. A Persian fuzzy plagiarism detection approach. J. Inf. Syst. Telecommun. 3, 3 (2015), 182–190.
  • N. Riya Ravi and Deepa Gupta. 2015. Efficient paragraph based chunking and download filtering for plagiarism source retrieval—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • N. Riya Ravi, Vani Kanjirangat, and Deepa Gupta. 2016. Exploration of fuzzy C means clustering algorithm in external plagiarism detection system. In Intelligent Systems Technologies and Applications . Springer, 127–138.
  • Andi Rexha, Stefan Klampfl, Mark Kröll, and Roman Kern. 2015. Towards authorship attribution for bibliometrics using stylometric features. In Proceedings of the Conference on Computational Linguistics and Bibliometrics co-located with the International Conference on Scientometrics and Informetrics (CLBib@ ISSI) . 44–49.
  • Diego Antonio Rodríguez Torrejón and José Manuel Martín Ramos. 2014. CoReMo 2.3 Plagiarism detector text alignment module—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Paolo Rosso, Francisco Rangel, Martin Potthast, Efstathios Stamatatos, Michael Tschuggnall, and Benno Stein. 2016. Overview of PAN’16. In Experimental IR Meets Multilinguality, Multimodality, and Interaction . 332–350.
  • Frantz Rowe. 2014. What literature review is not: Diversity, boundaries and recommendations. Eur. J. Inf. Syst. 23, 3 (2014), 241–255. DOI: 10.1057/ejis.2014.7
  • Barbara Rychalska, Katarzyna Pakulska, Krystyna Chodorowska, Wojciech Walczak, and Piotr Andruszkiewicz. 2016. Samsung Poland NLP Team at SemEval-2016 Task 1: Necessity for diversity; combining recursive autoencoders, WordNet and ensemble methods to measure semantic similarity. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 602–608.
  • Kamil Safin and Rita Kuznetsova. 2017. Style breach detection with neural sentence embeddings—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Anuj Saini and Aayushi Verma. 2016. Anuj@ DPIL-FIRE2016: a novel paraphrase detection method in hindi language using machine learning. In Proceedings of the Forum for Information Retrieval Evaluation . 141–152.
  • Miguel A. Sanchez-Perez, Alexander Gelbukh, and Grigori Sidorov. 2015. Dynamically adjustable approach through obfuscation type recognition—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Miguel A Sanchez-Perez, Grigori Sidorov, and Alexander F Gelbukh. 2014. A winning approach to text alignment for text reuse detection at PAN 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) . 1004–1011.
  • Fernando Sánchez-Vega, Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda, and Paolo Rosso. 2013. Determining and characterizing the reused text for plagiarism detection. Exp. Syst. Appl. 40, 5 (2013), 1804–1813. DOI: 10.1016/j.eswa.2012.09.021
  • Yunita Sari and Mark Stevenson. 2015. A machine learning-based intrinsic method for cross-topic and cross-genre authorship verification—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Yunita Sari and Mark Stevenson. 2016. Exploring word embeddings and character n-grams for author clustering—Notebook for PAN at CLEF 2016. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16) .
  • Satyam, Anand, Arnav Kumar Dawn, and Sujan Kumar Saha. 2014. Statistical analysis approach to author identification using latent semantic analysis—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Taneeya Satyapanich, Hang Gao, and Tim Finin. 2015. Ebiquity: Paraphrase and semantic similarity in twitter using skipgrams. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15) . 51–55.
  • Andreas Schmidt, Reinhold Becker, Daniel Kimmig, Robert Senger, and Steffen Scholz. 2014. A concept for plagiarism detection based on compressed bitmaps. In Proceedings of the 6th International Conference on Advances in Databases, Knowledge, and Data Applications . 30–34.
  • Shachar Seidman. 2013. Authorship verification using the impostors method—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Prasha Shrestha, Suraj Maharjan, and Thamar Solorio. 2014. Machine translation evaluation metric for text alignment—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Prasha Shrestha and Thamar Solorio. 2013. Using a variety of n-grams for the detection of different kinds of plagiarism. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Muazzam Ahmed Siddiqui, Imtiaz Hussain Khan, Kamal Mansoor Jambi, Salma Omar Elhaj, and Abobakr Bagais. 2014. Developing an arabic plagiarism detection corpus. Comput. Sci. Inf. Technol. 4, 2014 (2014), 261–269. DOI: 10.5121/csit.2014.41221
  • L. Sindhu and Sumam Mary Idicula. 2015. Fingerprinting based detection system for identifying plagiarism in malayalam text documents. In Proceedings of the 2015 International Conference on Computing and Network Communications (CoCoNet’15) . 553–558. DOI: 10.1109/CoCoNet.2015.7411242
  • Abdul Sittar, Hafiz Rizwan Iqbal, and Rao Muhammad Adeel Nawab. 2016. Author diarization using cluster-distance approach. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16) . 1000–1007.
  • Sidik Soleman and Ayu Purwarianti. 2014. Experiments on the Indonesian plagiarism detection using latent semantic analysis. In Proceedings of the 2014 2nd International Conference on Information and Communication Technology (ICoICT’14) . 413–418. DOI: 10.1109/ICoICT.2014.6914098
  • Hussein Soori, Michal Prilepok, Jan Platos, Eshetie Berhan, and Vaclav Snasel. 2014. Text similarity based on data compression in Arabic. In AETA 2013: Recent Advances in Electrical Engineering and Related Sciences . Springer, 211–220.
  • Efstathios Stamatatos, Walter Daelemans, Ben Verhoeven, Martin Potthast, Benno Stein, Patrick Juola, Miguel A. Sanchez-Perez, and Alberto Barrón-Cedeño. 2014. Overview of the author identification task at PAN 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Efstathios Stamatatos, Martin Potthast, Francisco Rangel, Paolo Rosso, and Benno Stein. 2015. Overview of the PAN/CLEF 2015 Evaluation Lab. In Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the 6th International Conference of the CLEF Initiative (CLEF’15) . 518–538. DOI: 10.1007/978-3-319-24027-5_49
  • Efstathios Stamatatos, Walter Daelemans, Ben Verhoeven, Patrick Juola, Aurelio López-López, Martin Potthast, and Benno Stein. 2015. Overview of the author identification task at PAN 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Benno Stein, Sven zu Eissen, and Martin Potthast. 2007. Strategies for retrieving plagiarized documents. In Proceedings of the 30th Annual International ACM SIGIR Conference . 825–826. DOI: 10.1145/1277741.1277928
  • Imam Much Ibnu Subroto and Ali Selamat. 2014. Plagiarism detection through internet using hybrid artificial neural network and support vectors machine. Telecommun. Comput. Electron. Control. 12, 1 (2014), 209–218.
  • Šimon Suchomel and Michal Brandejs. 2014. Heterogeneous queries for synoptic and phrasal search—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Šimon Suchomel and Michal Brandejs. 2015. Improving synoptic querying for source retrieval. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Šimon Suchomel, Jan Kasprzak, and Michal Brandejs. 2013. Diverse queries and feature type selection for plagiarism discovery—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • M. D. Arafat Sultan, Steven Bethard, and Tamara Sumner. 2014. DLS@CU: Sentence similarity from word alignment. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval’14) . 241–246.
  • M. D. Arafat Sultan, Steven Bethard, and Tamara Sumner. 2014. Back to basics for monolingual alignment: Exploiting word similarity and contextual evidence. Trans. Assoc. Comput. Linguist. 2 (2014), 219–230.
  • M. D. Arafat Sultan, Steven Bethard, and Tamara Sumner. 2015. DLS@CU: Sentence similarity from word alignment and semantic vector composition. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15) . 148–153.
  • Junfeng Tian and Man Lan. 2016. ECNU at SemEval-2016 Task 1: Leveraging word embedding from macro and micro views to boost performance for semantic textual similarity. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 621–627.
  • Diego A. Rodríguez Torrejón and José Manuel Martín Ramos. 2013. Text alignment module in CoReMo 2.1 plagiarism detector. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Michael Tschuggnall and Günther Specht. 2013. Detecting plagiarism in text documents through grammar-analysis of authors. Datenbanksysteme für Business, Technologie und Web (BTW) 2028 , Volker Markl, Gunter Saake, Kai-Uwe Sattler, Gregor Hackenbroich, Bernhard Mitschang, Theo Härder, and Veit Köppen (Eds.). Gesellschaft für Informatik e.V., 241–259.
  • Michael Tschuggnall and Günther Specht. 2013. Using grammar-profiles to intrinsically expose plagiarism in text documents. In Natural Language Processing and Information Systems . 297–302.
  • Michael Tschuggnall, Efstathios Stamatatos, Ben Verhoeven, Walter Daelemans, Günther Specht, Benno Stein, and Martin Potthast. 2017. Overview of the author identification task at PAN-2017: Style breach detection and author clustering. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Alper Kursat Uysal and Serkan Gunal. 2014. Text classification using genetic algorithm oriented latent semantic features. Exp. Syst. Appl. 41, 13 (2014), 5938–5947. DOI: 10.1016/j.eswa.2014.03.041
  • Vani Kanjirangat and Deepa Gupta. 2014. Using K-means cluster based techniques in external plagiarism detection. In Proceedings of the 2014 International Conference on Contemporary Computing and Informatics (IC3I’14) . 1268–1273. DOI: 10.1109/IC3I.2014.7019659
  • Vani Kanjirangat and Deepa Gupta. 2015. Investigating the impact of combined similarity metrics and POS tagging in extrinsic text plagiarism detection system. In Proceedings of the 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI’15) . 1578–1584. DOI: 10.1109/ICACCI.2015.7275838
  • Vani Kanjirangat and Deepa Gupta. 2016. Study on extrinsic text plagiarism detection techniques and tools. J. Eng. Sci. Technol. Rev. 9, 5 (2016), 9–23.
  • Vani Kanjirangat and Deepa Gupta. 2017. Detection of idea plagiarism using syntax–semantic concept extractions with genetic algorithm. Exp. Syst. Appl. 73 (2017), 11–26. DOI: 10.1016/j.eswa.2016.12.022
  • Vani Kanjirangat and Deepa Gupta. 2017. Identifying document-level text plagiarism: A two-phase approach. J. Eng. Sci. Technol. 12, 12 (2017), 3226–3250.
  • Vani Kanjirangat and Deepa Gupta. 2017. Text plagiarism classification using syntax based linguistic features. Exp. Syst. Appl. 88 (2017), 448–464. DOI: 10.1016/j.eswa.2017.07.006
  • Anna Vartapetiance and Lee Gillam. 2013. A textual modus operandi: surrey's simple system for author identification—notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Juan D Velásquez, Yerko Covacevich, Francisco Molina, Edison Marrese-Taylor, Cristián Rodríguez, and Felipe Bravo-Marquez. 2016. DOCODE 3.0 (DOcument COpy DEtector): A system for plagiarism detection by applying an information fusion process from multiple documental data sources. Inf. Fus. 27 (2016), 64–75. DOI: 10.1016/j.inffus.2015.05.006
  • Ondřej Veselý, Tomáš Foltýnek, and Jiří Rybička. 2013. Source retrieval via naïve approach and passage selection heuristics—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Darnes Vilariño, David Pinto, Helena Gómez, Saúl León, and Esteban Castillo. 2013. Lexical-syntactic and graph-based features for authorship verification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Ngoc Phuoc An Vo, Octavian Popescu, and Tommaso Caselli. 2014. FBK-TR: SVM for semantic relatedeness and corpus patterns for RTE. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval’14) . 289–293.
  • Hai Hieu Vu, Jeanne Villaneau, Farida Saïd, and Pierre-François Marteau. 2014. Sentence similarity by combining explicit semantic analysis and overlapping N-grams. In Text, Speech and Dialogue . 201–208.
  • Elizabeth Wager. 2014. Defining and responding to plagiarism. Learn. Publ. 27, 1 (2014), 33–42. DOI: 10.1087/20140105
  • Wafa Wali, Bilel Gargouri, and Abdelmajid Ben Hamadou. 2015. Supervised learning to measure the semantic similarity between arabic sentences. In Computational Collective Intelligence . 158–167.
  • John Walker. 1998. Student plagiarism in universities: What are we doing about it? High. Educ. Res. Dev. 17, 1 (1998), 89–106. DOI: 10.1080/0729436980170105
  • Shuai Wang, Haoliang Qi, Leilei Kong, and Cuixia Nu. 2013. Combination of VSM and jaccard coefficient for external plagiarism detection. In Proceedings of the 2013 International Conference on Machine Learning and Cybernetics . 1880–1885. DOI: 10.1109/ICMLC.2013.6890902
  • Debora Weber-Wulff. 2014. False feathers: A Perspective on Academic Plagiarism . Springer, Berlin.
  • Debora Weber-Wulff, Christopher Möller, Jannis Touras, and Elin Zincke. 2013. Plagiarism Detection Software Test 2013. Retrieved from http://plagiat.htw-berlin.de/wp-content/uploads/Testbericht-2013-color.pdf .
  • Agung Toto Wibowo, Kadek W. Sudarmadi, and Ari M. Barmawi. 2013. Comparison between fingerprint and winnowing algorithm to detect plagiarism fraud on Bahasa Indonesia documents. In Proceedings of the 2013 International Conference of Information and Communication Technology (ICoICT’13) . 128–133. DOI: 10.1109/ICoICT.2013.6574560
  • Kyle Williams, Hung-Hsuan Chen, Sagnik Ray Chowdhury, and C. Lee Giles. 2013. Unsupervised ranking for plagiarism source retrieval—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Kyle Williams, Hung-Hsuan Chen, and C. Lee Giles. 2014. Classifying and ranking search engine results as potential sources of plagiarism. In Proceedings of the 2014 ACM Symposium on Document Engineering (DocEng’14) . 97–106. DOI: 10.1145/2644866.2644879
  • Kyle Williams, Hung-Hsuan Chen, and C. Lee Giles. 2014. Supervised ranking for plagiarism source retrieval—notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, and Peter Clark. 2013. A lightweight and high performance monolingual word aligner. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) . 702–707.
  • Takeru Yokoi. 2015. Sentence-based plagiarism detection for japanese document based on common nouns and part-of-speech structure. In Intelligent Software Methodologies, Tools and Techniques . 297–308.
  • Guido Zarrella, John Henderson, Elizabeth M. Merkhofer, and Laura Strickhart. 2015. MITRE: Seven systems for semantic similarity in tweets. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15) . 12–17.
  • Chunxia Zhang, Xindong Wu, Zhendong Niu, and Wei Ding. 2014. Authorship identification from unstructured texts. Knowl.-Based Syst. 66 (2014), 99–111. DOI: 10.1016/j.knosys.2014.04.025
  • Jiang Zhao and Man Lan. 2015. Ecnu: Leveraging word embeddings to boost performance for paraphrase in twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15) . 34–39.
  • Valentin Zmiycharov, Dimitar Alexandrov, Hristo Georgiev, Yasen Kiprov, Georgi Georgiev, Ivan Koychev, and Preslav Nakov. 2016. Experiments in authorship-link ranking and complete author clustering—Notebook for PAN at CLEF 2016. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16) .
  • Sven Meyer Zu Eissen and Benno Stein. 2006. Intrinsic plagiarism detection. In Proceedings of the European Conference on Information Retrieval . 565–569.
  • Denis Zubarev and Ilya Sochenkov. 2014. Using sentence similarity measure for plagiarism source retrieval—notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Teddi Fishman. 2009. ‘We know it when we see it’ is not good enough: Toward a standard definition of plagiarism that transcends theft, fraud, and copyright. In Proceedings of the 4th Asia Pacific Conference on Educational Integrity (4APCEI’09) . 5.
  • 1 http://de.vroniplag.wikia.com .
  • 2 https://beallslist.weebly.com/standalone-journals.html .
  • 3 https://www.ldc.upenn.edu/ .
  • 4 http://github.com/danieldk/citar .
  • 5 http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/ .
  • 6 http://nlp.stanford.edu/software/lex-parser.shtml .
  • 7 https://publications.europa.eu/en/web/eu-vocabularies/th-dataset/-/resource/dataset/eurovoc .
  • 8 https://babelnet.org/ .
  • 9 https://wiki.dbpedia.org/ .
  • 10 http://verbs.colorado.edu/~mpalmer/projects/ace.html .
  • 11 http://verbs.colorado.edu/~mpalmer/projects/verbnet.html .
  • 12 https://pan.webis.de/data.html .
  • 13 https://www.microsoft.com/en-us/download/details.aspx?id=52398 .

This work was supported by the EU ESF grant CZ.02.2.69/0.0/0.0/16_027/0007953 “MENDELU international development.”

Authors’ addresses: T. Foltýnek, Department of Informatics, Faculty of Business and Economics, Mendel University in Brno, Zemědělská 1, 613 00 Brno, Czechia; email: [email protected] ; N. Meuschke and B. Gipp, Data & Knowledge Engineering Group, University of Wuppertal, School of Electrical, Information and Media Engineering, Rainer-Gruenter-Str. 21, D-42119 Wuppertal, Germany; emails: [email protected] , [email protected] , [email protected] , [email protected] .

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

This work is licensed under a Creative Commons Attribution-Share Alike International 4.0 License.

©2019 Copyright held by the owner/author(s). 0360-0300/2019/10-ART112 $15.00 DOI: https://doi.org/10.1145/3345317

Publication History: Received March 2019; revised August 2019; accepted August 2019

SciSpace Resources

Plagiarism in Research — The Complete Guide [eBook]

Deeptanshu D

Plagiarism in research

Plagiarism can be described as the not-so-subtle art of stealing an already existing work, violating the principles of academic integrity and fairness. There's no denying that we see further by standing on the shoulders of giants, and when writing research prose, we often need to look at the world through their lens. However, in this process, many students and researchers, knowingly or otherwise, resort to plagiarism.

In many instances, plagiarism is intentional, whether through direct copying or paraphrasing. Unfortunately, there are also times when it happens unintentionally. Regardless of the intent, plagiarism goes against the ethos of the scientific world and is considered a severe moral and disciplinary offense.

The good news is that you can avoid plagiarism and even work around it. So, if you're keen on publishing unplagiarized papers and maintaining academic integrity, you've come to the right place.

With this comprehensive ebook on plagiarism, we intend to help you understand what constitutes plagiarism in research, why it happens, plagiarism concepts and types, how you can prevent it, and much more.

What is plagiarism?

Plagiarism is defined as representing part or all of someone else's work as your own. Whether the work is published or unpublished, this could include ideas, verbatim text, infographics, and so on. Academic writing is no different. However, it is not considered plagiarism if most of your work is original and the borrowed part is diligently cited.

What counts as plagiarism can vary from discipline to discipline. In mathematics or engineering, for example, there are times when you have to reproduce entire equations or proofs, which can take up a significant chunk of your paper. Again, that does not constitute plagiarism, provided you add an analysis or rebuttal to it.

That said, there are some objective parameters defining plagiarism. Get to know them, and your life as a researcher will be much smoother.

Common types of plagiarism

Plagiarism often creeps into academic works in various forms, from complete plagiarism to accidental plagiarism.

The types of plagiarism vary depending on two critical aspects: the writer's intention and the degree to which the prose is plagiarized. These aspects help institutions and publishers define plagiarism types more accurately.

Common forms of Plagiarism

The agreed-upon forms of plagiarism that occur in research writing include:

1. Global or Complete Plagiarism

Global or complete plagiarism is inarguably the most severe form of plagiarism; it is as good as stealing. It happens when an author blatantly copies somebody else's work in its entirety and passes it off as their own.

Since complete plagiarism is always committed deliberately and disguises the ownership of the work, it is treated as a copyright violation and can lead to intellectual property disputes and legal battles. It also brings lasting repercussions such as a damaged reputation, expulsion, or job loss.

2. Verbatim or Direct Plagiarism

Verbatim or direct plagiarism happens when you copy a part of someone else's work word for word without providing adequate credit or attribution. The ideas, structure, and diction in your work would match the original author's work. Even if you were to change a few words or the position of sentences here and there, the final result remains the same.

The best way to avoid this is to minimize copy-pasting entire paragraphs and use it only when the situation calls for it. And when you do so, use quotation marks and in-text citations, crediting the original source.

3. Source-based Plagiarism

Source-based plagiarism results from an author trying to mislead readers about, or disguise, the actual source of their work. Say you write a paper, giving enough citations, but when the editor or peer reviewers try to cross-check your references, they find a dead end or incorrect information. Another instance is when you use both primary and secondary data to support your argument but only cite the former with no reference for the latter.

In both cases, the information provided is either irrelevant or misleading. You may have cited it, but it does not support the text completely.

Similarly, another type of plagiarism is called data manipulation and counterfeiting. Data manipulation is creating your own data and results. In contrast, data counterfeiting is omitting or adulterating key findings to suit your expected outcomes.

Using unreliable sources in a research study is a grave violation. Particularly in the medical field, misrepresented data can lead to legal issues, and misinterpreting it can lead to flawed clinical trials, which can have grave consequences.

4. Paraphrasing Plagiarism

Paraphrasing plagiarism is one of the more common types of plagiarism. It refers to when an author copies ideas, thoughts, and inferences, rephrases sentences, and then claims ownership.

Compared to verbatim plagiarism, paraphrasing plagiarism involves changing words, sentences, or semantics, or translating texts. The general idea or the topic of the thesis, however, remains the same, and as clever as it may seem, it can still be detected.

More often, authors commit paraphrasing plagiarism by reading a few sources and rewriting them in their own words without due citation. This can lead the reader to believe that the idea was the author's own when it wasn’t.

5. Mosaic or Patchwork Plagiarism

One of the more mischievous ways to avoid writing original work is mosaic plagiarism. Patchwork or mosaic plagiarism occurs when an author stitches together a research paper by borrowing pieces from multiple sources and presenting the result as their own creation. Sure, the author can add a few new words and phrases, but the meat of the paper is stolen.

It’s common for authors to refer to various sources during the research. But to patch them together and form a new paper from them is wrong.

Mosaic plagiarism can be difficult to detect, so authors, too confident in themselves, often resort to it. However, these days, there are plenty of online tools like Turnitin, Enago, and EasyBib that identify patchwork and correctly point to the sources from which you have borrowed.

6. Ghostwriting

Outside of the academic world, ghostwriting is entirely acceptable. Leaders do it, politicians do it, and artists do it. In academia, however, ghostwriting is a breach of conduct that tarnishes the integrity of a student or a researcher.

Ghostwriting is the act of using an unacknowledged person’s assistance to complete a paper. This happens in two ways. In the first, an author has their paper’s foundation laid out but pays someone else to write, edit, and proofread. In the second, they pay someone to write the whole article from scratch.

In either case, it’s utterly unacceptable since the whole point of a paper is to exhibit an author's original thoughts presented by them. Ghostwriting, thus, raises a serious question about the academic capabilities of an author.

7. Self-plagiarism

This may surprise many, but rehashing previous works, even if they are your own, is also considered plagiarism. The biggest reason why self-plagiarism is a problem is that you’re trying to claim credit for something that you have already received credit for.

Authors often borrow their past data or experiment results, use them in their current work, and present them as brand new. Some may even reuse ideas, cues, or phrases from their old published works.

The extent to which reuse counts as self-plagiarism is still under debate and depends on the volume of work that has been copied. Additionally, many academic and non-academic journals have set a fixed limit on what percentage of self-plagiarism is acceptable. Unless you have properly declared the reuse of old data through citations and quotation marks, it will fall under the scope of self-plagiarism.

8. Accidental Plagiarism

Apart from the intentional forms of plagiarism, there’s also accidental plagiarism. As the name suggests, it happens inadvertently. Unwitting paraphrasing, missing in-text or end-of-text citations, and not using quotation blocks all fall into this category.

While writing your academic papers, you have to stay cautious to avoid accidental plagiarism. The best way to do this is by going through your article thoroughly. Proofread as if your life depended on it, and check whether you’ve given citations where required.

Why is it important to avoid research plagiarism?

As a scholar, you must be aware that the sole purpose of any article or academic writing is to present an original idea to its readers. When the prose is plagiarized, it strips the author of credibility, discredits the source, and leaves the reader misinformed, which goes against the ethos of academic institutions.

Here are a few reasons why you should avoid research plagiarism:

Critical analysis is important

While writing research papers, an author must dive deep into finding various sources, like scholarly articles, especially peer-reviewed ones. You are expected to examine the sources keenly to understand the gaps in the chosen topic and formulate your research questions.

Crafting critical questions related to the field of study is essential as it displays your understanding and the analysis you employed to decipher the problems in the chosen topic. When you do this, your chances of being published improve, and it’s also good for your long-term career growth.

Streamlined scholarly communication

An extended form of scholarly communication is established when you respond and craft your academic work based on what others have previously done in a particular domain. By appropriately using others' work, i.e., through citations, you acknowledge the tasks done before you and how they helped shape your work. Moreover, citations expand the doorway for readers to learn more about a topic from the beginning to the current state. Plagiarism prevents this.

Credibility in originality

Originality is invaluable in the research community. From your thesis topic and fresh methodology to new data, conclusion, and tone of writing, the more original your paper is, the more people are intrigued by it. And as long as your paper is backed by credible sources, it further solidifies your academic integrity. Plagiarism can hinder these.

How does plagiarism happen?

Even though plagiarism is a cardinal sin and plagiarized academic writing is consistently rejected, it still happens. So the question is, what makes people resort to plagiarism?

Some of the reasons why authors resort to plagiarism include:

  • Lack of knowledge about plagiarism
  • Accidentally copying a work
  • Forgetting to cite a source
  • Desire to excel among peers
  • A false belief that no one will catch them
  • No interest in academic work and just taking that as an assignment
  • Using shortcuts in the form of self-plagiarism
  • Fear of failing

Whatever reason an author may have, plagiarism can never be justified. It is seen as an unfair advantage and as disrespect to those who have put blood, sweat, and tears into doing their due diligence. Additionally, remember that readers, universities, and publishers are only interested in your genuine ideas, and you are evaluated as an author on that basis.

Consequences of plagiarism

We have reiterated enough that plagiarism is objectionable and has consequences. But what exactly are the consequences? Well, that depends on who the author is and the type of plagiarism.

For minor offenses like accidental plagiarism or missing citations, a slap on the wrist in the form of feedback from the editor or peers is the norm. For major cases, let’s take a look:

For students

  • Poor grades

Even if you are a first-timer, your professor may choose to fail you, which can have a detrimental effect on your scores.

  • Failing a course

It is not rare for professors to fail Ph.D. and graduate students when caught plagiarizing. Not only does this hurt your academics, but it also extends the duration of your study by a year.

  • Disciplinary action

Every university or academic institution has strict policies and regulations regarding plagiarism. If caught, an author may have to face the academic review committee, which decides their future. The outcomes in typical cases range from poor grades and failing a year to being barred from any academic or research-related work.

  • Expulsion from the university

A university may resort to expulsion only in the worst of cases, like copyright violation or Intellectual Property theft.

  • Tarnished academic reputation

This just might be the most consequential of all scenarios. It takes a lifetime to build a great impression but a few seconds to tarnish it. Many academics lose their peers' trust and find it hard to recover. Moreover, background checks for future jobs or fellowships become a nightmare.

For universities

A university is built on reputation. Letting plagiarism slide is the quickest way to tarnish that reputation. This leads to less interest from top talent and publishers and to trouble finding grant money.

Prospective students turning away from a university means losing out on tuition money. This further drives experienced faculty away. And the cycle continues.

For researchers

  • Legal battles

Since plagiarism can amount to copyright infringement, researchers may face legal battles if their academic work is believed to be plagiarized. There is no shortage of case studies, like those of Doris Kearns Goodwin or Mark Chabedi, where authors used another person's work without permission and claimed it as their own. Such cases can lead to legal issues, including fines, bans from writing and research, and sometimes even imprisonment.

  • Professional reputation

Publishers and journals will not engage authors with a past of plagiarism to produce content under their brand name. Also, if the author is a professor or a fellow, it can lead to contract termination.

How to avoid plagiarism in research?

The simplest way to avoid plagiarism would be to put in the work. Do original research, collect new data, and derive new conclusions. If you use references, keep track of each and every single one and cite them in your paper.

To ensure that your academic writing or research paper is unique and free from any type of plagiarism, incorporate the following tips:

  • Pay adequate attention to your references

Writing a paper requires extraordinary research. So, it’s understandable when researchers sometimes lose track of their references. This often leads to accidental plagiarism.

So, instead of falling into this trap, maintain lists or take notes of your references while doing your research. This will help you when you’re writing your citations.

  • Find credible sources

Always refer to credible sources, whether a paper, a conference proceeding, or an infographic. These will present unbiased evidence and accurate experimental results, with facts that back the claims made in your paper.

  • Proper use of paraphrasing, quotations, and citations

It’s borderline impossible to avoid using direct references in your paper, especially if you’re providing a critical analysis or a rebuttal to an already existing article. So, to avoid being accused of plagiarism, use quotation marks when using text verbatim.

If you’re paraphrasing, use citations so that everyone knows that it’s not your idea. Credit the original author and a secondary source, if any. Publishers usually have guidelines about how to cite. There are many different styles, such as APA, MLA, and Chicago. Be on top of what your publisher demands.

Readers usually prefer paraphrasing to quotations, especially for long passages. The reason is straightforward: paraphrasing displays your understanding of the original work's meaning and your interpretation of it in the current context.

  • Review and recheck your work multiple times

Before submitting the final version, you must subject your work to scrutiny, and do so multiple times. The more you do it, the lower your chances of committing accidental plagiarism. To ensure that your final work is free of any type of plagiarism, check that:

  • There are no misplaced or missed citations
  • The paraphrased text does not closely resemble the original text
  • You don’t have any wrongful references
  • You’re not missing quotation marks or failing to provide the author's credentials after quotation marks
  • You use a plagiarism checker
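
One rough way to check the second point, that a paraphrase does not closely resemble the original, is to compare the two texts word by word. The sketch below is only an illustration using Python's standard difflib module. The function names and the 0.6 threshold are assumptions made here, not an accepted standard, and a low score never replaces a citation.

```python
from difflib import SequenceMatcher


def resemblance(original, paraphrase):
    """Return a 0..1 score for how much the two texts share word sequences."""
    # Compare lowercase word lists so capitalization does not affect the score.
    a = original.lower().split()
    b = paraphrase.lower().split()
    return SequenceMatcher(None, a, b).ratio()


def too_close(original, paraphrase, threshold=0.6):
    """Flag a paraphrase whose score meets the (assumed) threshold."""
    return resemblance(original, paraphrase) >= threshold
```

A paraphrase flagged by `too_close` is worth rewriting from your own understanding of the source, or quoting directly with quotation marks.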

On top of these, read your university or your publisher’s policies. All of them have their sets of rules about what’s acceptable and what’s not. They also define the punishment for any offense, factoring in its degree.

  • Use Online Tools

After receiving your article, most universities, publishers, and other institutions will run it through plagiarism checkers, including AI detectors, to detect all types of plagiarism. These plagiarism checkers work by measuring similarities between your article and previously published works in their database. If the two are found to be similar, your paper is deemed plagiarized.

You can always save yourself from embarrassment by staying a step ahead. Use a plagiarism checker before you submit your paper. Using plagiarism checker tools, you can quickly identify if you have committed plagiarism. Then, no one except you will know about it, and you will have a chance to correct yourself.
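
To get a feel for how matching a paper against a database of earlier works can operate, here is a minimal sketch in Python. It assumes one simple technique, overlapping three-word sequences (word n-grams), which is not necessarily what any commercial checker uses. The function names and the in-memory `corpus` dictionary are hypothetical.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of word n-grams in the text, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(submission, source, n=3):
    """Fraction of the submission's n-grams that also occur in the source."""
    sub = word_ngrams(submission, n)
    if not sub:
        return 0.0  # text shorter than n words has nothing to compare
    return len(sub & word_ngrams(source, n)) / len(sub)


def check_against_corpus(submission, corpus, n=3):
    """Score the submission against every document in corpus (name -> text).

    Returns (name, score) pairs sorted from most to least similar.
    """
    scores = {name: containment(submission, text, n) for name, text in corpus.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `check_against_corpus(your_text, {"paper_a": text_a, "paper_b": text_b})` ranks the stored documents by how much of your text they cover. A high score only signals overlap worth inspecting; properly quoted and cited passages will also match.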

Best Plagiarism Checkers in 2023

Plagiarism checkers are an incredibly convenient tool for improving academic writing. Therefore, here are some of the best plagiarism checkers for academic writing.

Turnitin's iThenticate

This is one of the best plagiarism checkers for your academic paper and a good fit for academic writers, researchers, and scholars.

Turnitin’s iThenticate claims to cross-check your paper against 99 billion+ current and archived web pages, 1.8 billion student papers, and best-in-class scholarly content from top publishers in every major discipline and dozens of languages.

The iThenticate plagiarism checker is now available on SciSpace.

Grammarly

Grammarly serves as a one-stop solution for better writing. Through Grammarly, you can make your paper have fewer grammatical errors, better clarity, and, yes, be plagiarism-free.

Grammarly's plagiarism checker compares your paper to billions of web pages and existing papers online. It points out all the sentences which need a citation, giving you the original source as well. On top of this, Grammarly also rates your document for an originality score.

ProWritingAid

ProWritingAid is another AI writing assistant that offers a plethora of tools to better your document. One of its paid services is the ProWritingAid Plagiarism Checker, which helps authors find out how much of their work is plagiarized.

Once you scan your document, the plagiarism checker gives you details like the percentage of non-original text, how much of that is quoted, and how much is not. It will also give you links so you can cite them as required.

EasyBib Plagiarism Checker

EasyBib Plagiarism Checker compares your writing sample with billions of available sources online to detect plagiarism at every level. You'll be notified which phrases are too similar to current research and literature, prompting a possible rewrite or additional citation.

Moreover, you'll get feedback on your paper's inconsistencies, such as changes in text, formatting, or style. These small details could suggest possible plagiarism within your assignment.

Plagiarism CheckerX

Working on the same principle of scanning and matching against various sources, the distinguishing feature of Plagiarism CheckerX is that you can download and use it whenever you wish. It is slightly faster than others and never stores your data, so you need not worry about your data being exposed.

Compilatio Magister

Compilatio Magister is a plagiarism checker designed explicitly for teaching professionals. It lets you access turnkey educational resources, check for plagiarism against thousands of documents, and seek reliable and accurate analysis reports.

Quick Wrap Up

In the world of academia, the spectre of plagiarism lurks. But fear not: armed with awareness and the right plagiarism checkers, you have the power to conquer this foe.

Even though plenty of students or researchers believe they can get away with it, it’s never the case. You owe it to yourself and everyone who has invested time and resources in you to publish original, plagiarism-free research work every time.

Throughout this eBook, we have explored the depths of plagiarism, unraveling its consequences and the importance of originality. Many universities have specific classes and workshops discussing plagiarism to create ample awareness of the subject. Thus, you should continue to be honourable in this regard and write papers from the heart.

Frequently Asked Questions (FAQs)

1. How to paraphrase without plagiarizing?

  • Understand the original text completely.
  • Write the idea in your own words without looking at the original text.
  • Change the structure of sentences, not just individual words.
  • Use synonyms wisely and ensure the context remains the same.
  • Lastly, always cite the original source.

Even when paraphrasing, it's important to attribute ideas to the original author.

2. How to avoid plagiarism in research?

  • Understand what constitutes plagiarism.
  • Always give proper credit to the original authors when quoting or paraphrasing their work.
  • Use plagiarism checker tools to ensure your work is original.
  • Keep track of your sources throughout your research.
  • Quote and paraphrase accurately.

3. Examples of plagiarism?

  • Copying and pasting text directly from a source without quotation or citation.
  • Paraphrasing someone else's work without correct citation.
  • Presenting someone else's work or ideas as your own.
  • Recycling or self-plagiarism, where you reuse your previous work without citing it.

4. How much plagiarism is allowed in a research paper?

In the academic world, the goal is always to strive for 0% plagiarism. However, sometimes, minor plagiarism can occur unintentionally, such as when common phrases are matched in plagiarism software. Most institutions and publishers will allow a small percentage, typically under 10%, for such instances. Remember, this doesn't mean you can deliberately plagiarize 10% of your work.

5. What are the four types of plagiarism?

  • Direct Plagiarism definition: This occurs when one directly copies someone else's work word-for-word without giving credit.
  • Mosaic Plagiarism definition: This happens when someone borrows phrases from a source without using quotation marks, or finds synonyms for the author's language while keeping the same general structure and meaning.
  • Accidental Plagiarism definition: This happens when a person neglects to cite their sources, or misquotes their sources, or unintentionally paraphrases a source by using similar words, groupings, or phrases without attribution.
  • Self-Plagiarism definition: This happens when someone recycles their own work from a previous paper or study and presents it as new content without citing the original.

6. How much copying is considered plagiarism?

Any amount of copying can be considered plagiarism if you're presenting someone else's work as your own without attribution. Even a single sentence copied without proper citation can be seen as plagiarism. The key is to always give credit where it's due.

7. How to check plagiarism in a research paper?

There are numerous online tools and software that you can use to check plagiarism in a research paper. Some popular ones include Grammarly and Copyscape. These tools compare your paper with millions of other documents on the web and in databases to identify any matches. You can also use SciSpace paraphraser to rephrase the content and keep it unique.

Citing Sources

Best Practices for Avoiding Plagiarism

The entire section below came from a research guide from Iowa State University. To avoid plagiarism, one must provide a reference to that source to indicate where the original information came from (see the "Source:" section below).

"There are many ways to avoid plagiarism, including developing good research habits, good time management, and taking responsibility for your own learning. Here are some specific tips:

  • Don't procrastinate with your research and assignments. Good research takes time. Procrastinating makes it likely you'll run out of time or be unduly pressured to finish. This sort of pressure can often lead to sloppy research habits and bad decisions. Plan your research well in advance, and seek help when needed from your professor, from librarians and other campus support staff.
  • Commit to doing your own work. If you don't understand an assignment, talk with your professor. Don't take the "easy way" out by asking your roommate or friends for copies of old assignments. A different aspect of this is group work. Group projects are very popular in some classes on campus, but not all. Make sure you clearly understand when your professor says it's okay to work with others on assignments and submit group work on assignments, versus when assignments and papers need to represent your own work.
  • Be 100% scrupulous in your note taking. As you prepare your paper or research, and as you begin drafting your paper. One good practice is to clearly label in your notes your own ideas (write "ME" in parentheses) and ideas and words from others (write "SMITH, 2005" or something to indicate author, source, source date). Keep good records of the sources you consult, and the ideas you take from them. If you're writing a paper, you'll need this information for your bibliographies or references cited list anyway, so you'll benefit from good organization from the beginning.
  • Cite your sources scrupulously. Always cite other people's work, words, ideas and phrases that you use directly or indirectly in your paper. Regardless of whether you found the information in a book, article, or website, and whether it's text, a graphic, an illustration, chart or table, you need to cite it. When you use words or phrases from other sources, these need to be in quotes. Current style manuals are available at most reference desks and online. They may also give further advice on avoiding plagiarism.
  • Understand good paraphrasing. Simply using synonyms or scrambling an author's words and phrases and then using these "rewrites" uncredited in your work is plagiarism, plain and simple. Good paraphrasing requires that you genuinely understand the original source, that you are genuinely using your own words to summarize a point or concept, and that you insert in quotes any unique words or phrases you use from the original source. Good paraphrasing also requires that you cite the original source. Anything less and you veer into the dangerous territory of plagiarism."

Source: Vega García, S.A. (2012). Understanding plagiarism: Information literacy guide. Iowa State University. Retrieved from http://instr.iastate.libguides.com/content.php?pid=10314. [Accessed January 3, 2017]

Plagiarism Prevention

  • Plagiarism Prevention (onlinecolleges.net) This resource provides information about preventing plagiarism, understanding the various types of plagiarism, and learning how to cite properly to avoid plagiarism.

UCLA has a campuswide license to Turnitin.com. Faculty may turn in student papers electronically, where the text can be compared with a vast database of other student papers, online articles, general Web pages, and other sources. Turnitin.com then produces a report for the instructor indicating whether the paper was plagiarized and if so, how much.

For more information, go to Turnitin.com.

  • Last Updated: May 17, 2024 2:33 PM
  • URL: https://guides.library.ucla.edu/citing

A publication of the Harvard College Writing Program.

Harvard Guide to Using Sources 

How to Avoid Plagiarism

It's not enough to know why plagiarism is taken so seriously in the academic world or to know how to recognize it. You also need to know how to avoid it. The simplest cases of plagiarism to avoid are the intentional ones: If you copy a paper from a classmate, buy a paper from the Internet, copy whole passages from a book, article, or Web site without citing the author, you are plagiarizing. Here's the best advice you'll ever receive about avoiding intentional plagiarism: If you're tempted to borrow someone else's ideas or plagiarize in any way because you're pressed for time, nervous about how you're doing in a class, or confused about the assignment, don't do it. The problems you think you're solving by plagiarizing are really minor compared to the problems you will create for yourself by plagiarizing. In every case, the consequences of plagiarism are much more serious than the consequences of turning in a paper late or turning in a paper you're not satisfied to have written.

The consequences of accidental plagiarism are equally daunting and should be avoided at all costs. Whether or not you intended to plagiarize, you will still be held responsible. As a member of an intellectual community you are expected to respect the ideas of others in the same way that you would respect any other property that didn't belong to you, and this is true whether you plagiarize on purpose or by accident. The best way to make sure you don't plagiarize due to confusion or carelessness is to 1) understand what you're doing when you write a paper and 2) follow a method that is systematic and careful as you do your research. In other words, if you have a clear sense of what question you're trying to answer and what knowledge you're building on, and if you keep careful, clear notes along the way, it's much easier to use sources effectively and responsibly and, most of all, to write a successful paper. If you have questions about plagiarism at any point in your research or writing process, ask. It's always better to ask questions than it is to wait for an instructor to respond to work that you have turned in for a grade. Once you have turned in your final work, you will be held responsible for misuse of sources.

With these principles in mind, here are some guidelines for conducting research responsibly:

Keep track of your sources; print electronic sources

While it's easy enough to keep a stack of books or journal articles on your desk where you can easily refer back to them, it's just as important to keep track of electronic sources. When you save a PDF of a journal article, make sure you put it into a folder on your computer where you'll be able to find it. When you consult a Web site, log the Web address in a separate document from the paper you're writing so that you'll be able to return to the Web site and cite it correctly. You should also print the relevant pages from any Web sites you use, making sure you note the complete URL and the date on which you printed the material. Because electronic sources aren't stable and Web pages can be deleted without notice, beware of directing your readers to sources that might have disappeared. Check when the Web site you're using was last updated and update the URLs as you work and once again right before you submit your essay. If an electronic source disappears before you submit your work, you will need to decide whether or not to keep the source in your paper. If you have printed the source and can turn it in with your paper, you should do so. If you have not printed the source, you should consult your instructor about whether or not to use that source in your paper.

The library has several helpful resources for managing your sources, including RefWorks.

Keep sources in correct context

Whenever you consult a source, you should make sure you understand the context, both of the ideas within a source and of the source itself. You should also be careful to consider the context in which a source was written. For example, a book of essays published by an organization with a political bias might not present an issue with adequate complexity for your project.

The question of context can be more complicated when you're working with Internet sources than with print sources because you may see one Web page as separate from an entire Web site and use or interpret that page without fully understanding or representing its context. For example, a definition of "communism" taken from a Web site with a particular political agenda might provide one interpretation of the meaning of the word—but if you neglect to mention the context for that definition you might use it as though it's unbiased when it isn't. Likewise, some Internet searches will take you to a URL that's just one Web page within a larger Web site; be sure to investigate and take notes on the context of the information you're citing.

Research can often turn out to be more time-consuming than you anticipate. Budget enough time to search for sources, to take notes, and to think about how to use the sources in your essay. Moments of carelessness are more common when you leave your essay until the last minute and are tired or stressed. Honest mistakes can lead to charges of plagiarism just as dishonesty can; be careful when note-taking and when incorporating ideas and language from electronic sources so you always know what language and ideas are yours and what belongs to a source.

Don't cut and paste: File and label your sources

Never cut and paste information from an electronic source straight into your own essay, and never type verbatim sentences from a print source straight into your essay. Instead, open a separate document on your computer for each source so you can file research information carefully. When you type or cut and paste into that document, make sure to include the full citation information for the print source or the full URL and the date you copied the page(s). For Web sources, make sure to cite the page from which you're taking information, which may not necessarily be the home page of the site you're using. Use logical and precise names for the files you create, and add citation information and dates. This allows you to retrieve the files easily, deters you from accidentally deleting files, and helps you keep a log of the order in which your research was conducted. It's a good idea to add a note to each file that describes how you might use the information in that file. Remember: you're entering a conversation with your sources, and accurate file names and notes can help you understand and engage that conversation. And, of course, always remember to back up your files.

Keep your own writing and your sources separate

Work with either the printed copy of your source(s) or (in the case of online sources), the copy you pasted into a separate document—not the online version—as you draft your essay. This precaution not only decreases the risk of plagiarism but also enables you to annotate your sources in various ways that will help you understand and use them most effectively in your essay.

Keep your notes and your draft separate

Be careful to keep your research notes separate from your actual draft at all stages of your writing process. This will ensure that you don't cut language from a source and paste it into your paper without proper attribution. If you work from your notes, you're more likely to keep track of the boundaries between your own ideas and those in a source.

Paraphrase carefully in your notes; acknowledge your sources explicitly when paraphrasing

When you want to paraphrase material, it's a good idea first to paste the actual quotation into your notes (not directly into your draft) and then to paraphrase it (still in your notes). Putting the information in your own words will help you make sure that you've thought about what the source is saying and that you have a good reason for using it in your paper. Remember to use some form of notation in your notes to indicate what you've paraphrased and mention the author's name within the material you paraphrase. You should also include all citation information in your notes.

When you decide to use paraphrased material in your essay, make sure that you avoid gradually rewording the paraphrased material from draft to draft until you lose sight of the fact that it's still a paraphrase. Also, avoid excessive paraphrasing in which your essay simply strings together a series of paraphrases. When the ideas taken from your sources start to blend in deceptively with your own thinking, you will have a more difficult time maintaining the boundaries between your ideas and those drawn from sources. Finally, whenever you paraphrase, make sure you indicate, at each logical progression, that the ideas are taken from an authored source.

Avoid reading a classmate's paper for inspiration

If you're in a course that requires peer review or workshops of student drafts, you are going to read your classmates' work and discuss it. This is a productive way of exchanging ideas and getting feedback on your work. If you find, in the course of this work, that you wish to use someone else's idea at some point in your paper (you should never use someone else's idea as your thesis, but there may be times when a classmate's idea would work as a counterargument or other point in your paper), you must credit that person the same way you would credit any other source. On the other hand, if you find yourself reading someone else's paper because you're stuck on an assignment and don't know how to proceed, you may end up creating a problem for yourself because you might unconsciously copy that person's ideas. When you're stuck, make an appointment with your instructor or go to the Writing Center for advice on how to develop your own ideas.

Don't save your citations for later

Never paraphrase or quote from a source without immediately adding a citation. You should add citations in your notes, in your response papers, in your drafts, and in your revisions. Without them, it's too easy to lose track of where you got a quotation or an idea and to end up inadvertently taking credit for material that's not your own.

Quote your sources properly

Always use quotation marks for directly quoted material, even for short phrases and key terms.

Keep a source trail

As you write and revise your essay, make sure that you keep track of your sources in your notes and in each successive draft of your essay. You should begin this process early, even before you start writing your draft. Even after you've handed in your essay, keep all of your research notes and drafts. You ought to be able to reconstruct the path you took from your sources to your notes and from your notes to your drafts and revision. These careful records and clear boundaries between your writing and your sources will help you avoid plagiarism. And if you are called upon to explain your process to your instructor, you'll be able to retrace the path you took when thinking, researching, and writing, from the essay you submitted back through your drafts and to your sources.

Open Access

Peer-reviewed

Research Article

Factors influencing plagiarism in higher education: A comparison of German and Slovene students

Roles Conceptualization, Data curation, Formal analysis, Project administration, Supervision, Writing – original draft, Writing – review & editing

Affiliation Department of Personnel and Education, Faculty of Organizational Sciences, University of Maribor, Kranj, Slovenia

Roles Funding acquisition, Writing – original draft, Writing – review & editing

Affiliation Faculty of Natural Sciences and Mathematics, University of Maribor, Maribor, Slovenia; School of Electronic and Information Engineering, Beihang University, Beijing, China

Roles Conceptualization, Data curation, Investigation, Supervision, Writing – original draft, Writing – review & editing

Affiliation Department of Economics and Law, Frankfurt University of Applied Sciences, Frankfurt, Germany

Roles Data curation, Formal analysis, Investigation, Writing – original draft

Affiliation Department of Methodology, Faculty of Organizational Sciences, University of Maribor, Kranj, Slovenia

Roles Formal analysis, Resources, Writing – original draft

Roles Funding acquisition, Project administration, Supervision, Writing – original draft

Roles Conceptualization, Formal analysis, Methodology, Supervision, Writing – original draft, Writing – review & editing

  • Eva Jereb, 
  • Matjaž Perc, 
  • Barbara Lämmlein, 
  • Janja Jerebic, 
  • Marko Urh, 
  • Iztok Podbregar, 
  • Polona Šprajc

PLOS

  • Published: August 10, 2018
  • https://doi.org/10.1371/journal.pone.0202252

Over the past decades, plagiarism has been classified as a multi-layer phenomenon of dishonesty that occurs in higher education. A number of research papers have identified a host of factors such as gender, socialisation, efficiency gain, motivation for study, methodological uncertainties or easy access to electronic information via the Internet and new technologies, as reasons driving plagiarism. The paper at hand examines whether such factors are still effective and if there are any differences between German and Slovene students’ factors influencing plagiarism. A quantitative paper-and-pencil survey was carried out in Germany and Slovenia in 2017/2018 academic year, with a sample of 485 students from higher education institutions. The major findings of this research reveal that easy access to information-communication technologies and the Web is the main reason driving plagiarism. In that regard, there are no significant differences between German and Slovene students in terms of personal factors such as gender, motivation for study, and socialisation. In this sense, digitalisation and the Web outrank national borders.

Citation: Jereb E, Perc M, Lämmlein B, Jerebic J, Urh M, Podbregar I, et al. (2018) Factors influencing plagiarism in higher education: A comparison of German and Slovene students. PLoS ONE 13(8): e0202252. https://doi.org/10.1371/journal.pone.0202252

Editor: Andreas Wedrich, Medizinische Universitat Graz, AUSTRIA

Received: May 21, 2018; Accepted: July 6, 2018; Published: August 10, 2018

Copyright: © 2018 Jereb et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: MP was supported by the Slovenian Research Agency (Grant Nos. J1-7009 and P5-0027), http://www.arrs.gov.si/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Many of those who teach in higher education have encountered the phenomenon of plagiarism as a form of dishonesty in the classroom. According to the Oxford English Dictionary online 2017, the term plagiarism is defined as ‘the practice of taking someone else's work or ideas and passing them off as one's own’. Perrin, Larkham and Culwin define plagiarism as the use of an author's words, ideas, reflections and thoughts without proper acknowledgment of the author [ 1 – 3 ]. Koul et al. define plagiarism as a form of cheating and theft since in cases of plagiarism one person takes credit for another person’s intellectual work [ 4 ]. According to Fishman, ‘Plagiarism occurs when someone: 1) uses words, ideas, or work products; 2) attributable to another identifiable person or source; 3) without attributing the work to the source from which it was obtained; 4) in a situation in which there is a legitimate expectation of original authorship; 5) in order to obtain some benefit, credit, or gain which need not be monetary’ [ 5 ]. But why do students use someone else's words or ideas and pass them off as their own? Which factors influence this behaviour? The main focus of our research is to discover the factors influencing plagiarism and to see whether they differ between German and Slovene students.

Koul et al. pointed out that particular circumstances or events should be considered in the definition of plagiarism since plagiarism may vary across cultures and societies [ 4 ]. Hall has described Eastern cultures (the Middle East, Asia, Africa, South America) and Western cultures (North America and much of Europe) using the idea of ‘context’, which refers to the framework, background, and surrounding circumstances in which an event takes place [ 6 ]. Western societies are generally ‘low context’ societies. In other words, people in Western societies play by external rules (e.g., honour codes against plagiarism), and decisions are based on logic, facts, and directness. Eastern societies are generally ‘high context’ societies, meaning that people in Eastern societies put strong emphasis on relational concerns, and decisions are based on personal relationships. Nisbett et al. have suggested that differences between Westerners and Easterners may arise from people being socialised into different worldviews, cognitive processes and habits of mind [ 7 ]. In Germany, there has been ongoing reflection on academic plagiarism and other dishonest research practices since the late 19th century [ 8 ]. However, according to Ruiperez and Garcia-Cabrero, in Germany, 2011 became a landmark year with the appearance of an extensive public debate about plagiarism—brought back into the limelight because of an investigation into the incumbent German Defence Minister’s doctoral thesis [ 9 ]. Aside from the numerous cases of plagiarism detected in academic work since 2011, several initiatives have enriched the debate on academic plagiarism. For example, the development of a consolidated cooperative textual research methodology using a specific Wiki called ‘VroniPlag’ has made Germany one of the most advanced European countries in terms of combating these practices. Similar to Germany, Slovenia has also paid increased attention to plagiarism in recent years. 
The debate about plagiarism became public after it was discovered that certain Slovene politicians had resorted to academic plagiarism. Today, universities in Slovenia use a variety of tools (Turnitin, plagiarism plug-ins for Moodle, plagiarisma.net, etc.) in order to detect plagiarism. The focus of this research is to investigate the factors influencing plagiarism and whether these factors differ between Slovene and German students. The research questions (RQ) of the study were divided into three groups:

  • RQ group 1: Which factors influence plagiarism in higher education?
  • RQ group 2: Are there any differences between male and female students regarding factors influencing plagiarism? Are the factors influencing plagiarism connected with specific areas of study (technical sciences, social sciences, natural sciences)?
  • RQ group 3: Does the students’ motivation affect their factors influencing plagiarism? Are there any differences between male and female students regarding this?

In addition, for all three research question groups, we also wanted to know if there were any differences between the German and Slovene students.

Theoretical background

Plagiarism is a highly complex phenomenon and, as such, it is likely that there is no single explanation for why individuals engage in plagiarist behaviours [ 10 ]. The situation is often complex and multi-dimensional, with no simple cause-and-effect link [ 11 ].

McCabe et al. noted that individual factors (e.g. gender, average grade, work ethic, self-esteem), institutional factors (e.g., faculty response to cheating, sanction threats, honour codes) and contextual factors (e.g., peer cheating behaviours, peer disapproval of cheating behaviours, perceived severity of penalties for cheating) influence cheating behaviour [ 12 ]. Giluk and Postlethwaite also related individual characteristics and situational factors to cheating—individual characteristics such as gender, age, ability, personality, and extracurricular involvement; and situational factors such as honour codes, penalties, and risk of detection [ 13 ]. The study of Jereb et al. also revealed that specific individual characteristics pertaining to men and women influence plagiarism [ 14 ]. Newstead et al. suggested that gender differences (plagiarism is more frequent among boys), age differences (plagiarism is more frequent among younger students), and academic performance differences (plagiarism is more frequent among lower performers) are specific factors for plagiarism [ 15 ]. Gerdeman stated that the following five student characteristic variables are frequently related to the incidence of dishonest behaviour: academic achievement, age, social activities, study major, and gender [ 16 ].

One of the factors influencing plagiarism could be that students do not have a clear understanding of what constitutes plagiarism and how it can be avoided [ 17 , 18 ]. According to Hansen, students don’t fully understand what constitutes plagiarism [ 19 ]. Park cites a genuine lack of understanding as one of the reasons for plagiarism: some students plagiarise unintentionally when they are not familiar with proper ways of quoting, paraphrasing, citing and referencing and/or when they are unclear about the meaning of ‘common knowledge’ and the expression ‘in their own words’ [ 11 ].

Furthermore, it is important to remember that, in our current day and age, information is easily accessed through new technologies. In addition, as Koul et al. have stated, the belief that we as people have greater ownership of information than we have paid for may influence attitudes towards plagiarism [ 4 ]. Many other authors have also stated that the Internet has increased the potential for plagiarism, since information is easily accessed through new technologies [ 14 , 20 , 21 , 22 ]. Indeed, the Internet grants easy access to an enormous amount of knowledge and learning materials. This provides an opportunity for students to easily cut, paste, download and plagiarise information [ 21 , 23 ]. Online resources are available 24/7 and enable a flood of information, which is also constantly updated. Given students' ease of access to both digital information and sophisticated digital technologies, several researchers have noted that students may be more likely to ignore academic ethics and to engage in plagiarism than would otherwise be the case [ 24 ].

In a study of the level of plagiarism in higher education, Tayraukham found that students with performance goals were more likely to indulge in plagiarism behaviours than students who wanted to achieve mastery of a particular subject [ 25 ]. Most of the students plagiarised in order to provide the right responses to study questions, with the ultimate goal of getting higher grades—rather than gaining expertise in their subjects of study. Anderman and Midgley observed that a relatively higher performance-oriented classroom climate increases cheating behaviour; while a higher mastery-oriented classroom climate decreases cheating behaviour [ 26 ]. Park also claimed that one of the reasons that students plagiarise is efficiency gain, that is, that students plagiarise in order to get a better grade and save time [ 11 ]. Songsriwittaya et al. stated that what motivates students to plagiarise is the goal of getting good grades and comparing their success with that of their peers [ 27 ]. The study of Ramzan et al. also revealed that the societal and family pressures of getting higher grades influence plagiarism [ 21 ]. Such pressures sometimes push students to indulge in unfair means such as plagiarism as a shortcut to performing better in exams or producing a certain number of publications. Engler et al. and Hard et al. tended to agree with this idea, stating that plagiarism arises out of social norms and peer relationships [ 28 , 29 ]. Park also stated that there are many calls on students’ time, including peer pressure for maintaining an active social life, commitment to college sports and performance activities, family responsibilities, and pressure to complete multiple work assignments in short amounts of time [ 11 ]. Šprajc et al. agreed that students are under an enormous amount of pressure from family, peers, and instructors, to compete for scholarships, admissions, and, of course, places in the job market [ 30 ]. 
This affects students’ time management and can lead to plagiarism. In addition to time pressures, Franklin-Stokes and Newstead found another six major reasons given by students to explain cheating behaviours: the desire to help a friend, a fear of failure, laziness, extenuating circumstances, the possibility of reaping a monetary reward, and because ‘everybody does it’ [ 31 ].

Another common reason for plagiarism is the poor preparation of lecture notes, which can lead to the inadequate referencing of texts [ 32 ]. Šprajc et al. found that too many assignments given within a short time frame push students to plagiarise [ 30 ]. Poor explanations, bad teaching, and dissatisfaction with course content can also drive students to plagiarise. Park pointed to students’ attitudes towards teachers and classes as a further reason [ 11 ]. Some students cheat because they have negative attitudes towards assignments and tasks that teachers believe to have meaning but students do not [ 33 ]. Cheating tends to be more common in classes where the subject matter seems unimportant or uninteresting to students, or where the teacher seems uninterested or permissive [ 16 ].

Park mentioned students’ academic skills (researching and writing skills, knowing how to cite, etc.) as another reason for plagiarism [ 11 ]. New students and international students whose first language is not English need to transition to the research culture by understanding the necessity of doing research, and the practice and skills required to do so, in order to avoid unintentional plagiarism [ 21 ]. According to Park, for some students plagiarism is a tangible way of showing dissent and expressing a lack of respect for authority [ 11 ]. Some students deny to themselves that they are cheating or find ways of legitimising their behaviour by passing the blame on to others. Other factors influencing plagiarism are temptation and opportunity. It is both easier and more tempting for students to plagiarise since information has become readily accessible with the Internet and Web search tools, making it faster and easier to find information and copy it. In addition, some people believe that since the Internet is free for all and in the public domain, copying from the Internet requires no citation or acknowledgement of the source [ 34 ]. To some students, the benefits of plagiarising outweigh the risks, particularly if they think there is little or no chance of getting caught and there is little or no punishment if they are indeed caught [ 35 ].

Another factor influencing plagiarism could be higher education institutions’ attitudes towards plagiarism, that is, whether or not they have clear policies regarding plagiarism and its consequences. The effective communication of policies, increased student awareness of penalties, and enforcement of these penalties tend to reduce dishonest behaviour [ 36 ]. Ramzan et al. [ 21 ] mentioned the research of Razera et al., who found that Swedish students and teachers need training to understand and avoid plagiarism [ 37 ]. In order to deal with plagiarism, teachers want and need a clear set of policies regarding detection tools, and extensive training in the use of detection software and systems. According to Ramzan et al., Dawson and Overfield determined that students are aware that plagiarism is bad but that they are not clear on what constitutes plagiarism and how to avoid it [ 21 , 38 ]. In Dawson and Overfield’s study, students required teachers to also observe the rules set up to avoid plagiarism and be consistently kept aware of plagiarism, in order to enforce the university’s resolve to control this academic misconduct.

Based on this literature review and our experience in higher education teaching, we determined that the following factors influence plagiarism: students’ individual factors, information-communication technologies (ICT) and the Web, regulation, students’ academic skills, teaching factors, different forms of pressure, student pride, and other reasons. The statements used in the instrument we developed and the results of our research are presented in the following sections.

Participants

The paper-and-pencil survey was carried out in the 2017/18 academic year at the University of Maribor in Slovenia and at the Frankfurt University of Applied Sciences in Germany. Students were verbally informed of the nature of the research and invited to freely participate. They were assured of anonymity. The study was approved by the Ethical Committee for Research in Organizational Sciences at the Faculty of Organizational Sciences, University of Maribor.

A sample of 191 students from Slovenia (SLO) (99 males (51.8%) and 92 females (48.2%)) and 294 students from Germany (GER) (115 males (39.1%) and 171 females (58.2%)) participated in this study. Slovene students’ ages ranged from 19 to 36 years, with a mean of 21 years and 1 month (M = 21.12, SD = 1.770), and German students’ ages ranged from 18 to 40 years, with a mean of 22 years and 10 months (M = 22.84, SD = 3.406).

About half (49.2%) of the Slovene participants were social sciences students, 34.9% were technical sciences students, and 15.9% were natural sciences students. More than half (58.5%) of the German participants were social sciences students, 32% were technical sciences students and 2% were natural sciences students. More than half of the Slovene students (53.4%) attended blended learning, and 46.6% attended classic learning. The majority of German students (87.8%) attended classic learning, and 6.8% attended blended learning.

More than half of the Slovene students (61.6%) were working at the time of the study, and 39.8% of all participants had scholarships. In Germany, more than half the students (65.0%) were also working at the time of the study, but only 10.2% of all the German participants had scholarships. More than two thirds (68.9%) of the Slovene students were highly motivated for study and 31.1% less so; 32.6% of the students spent 2 or fewer hours per day on the Internet, 41.6% spent between 2 and 5 hours, and 25.8% spent 5 or more hours. Likewise, more than two thirds (73.1%) of the German students were highly motivated for study and 23.8% less so; 33.3% of the students spent 2 or fewer hours per day on the Internet, 32.3% spent between 2 and 5 hours, and 27.9% spent 5 or more hours. The general data can be seen in S1 Table.
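The shares and age figures reported above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the helper names `share` and `years_months` are hypothetical, and the counts are the ones given in this section.

```python
def share(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


def years_months(mean_age):
    """Split a decimal age in years into whole years and whole months."""
    years = int(mean_age)
    months = int((mean_age - years) * 12)
    return years, months


# Counts as reported in the text.
slo_total, slo_male, slo_female = 191, 99, 92
ger_total, ger_male, ger_female = 294, 115, 171

print(share(slo_male, slo_total), share(slo_female, slo_total))
print(share(ger_male, ger_total), share(ger_female, ger_total))
# German respondents not covered by the reported gender counts.
print(ger_total - ger_male - ger_female)
print(years_months(21.12), years_months(22.84))
```

The German gender counts sum to 286 of 294, which is why the two German percentages add up to 97.3% rather than 100%; the text does not say how the remaining 8 respondents answered.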

The questionnaire contained closed questions referring to (i) general/individual data (gender, age, area of study, method of study, working status, scholarship, motivation for study, average time spent on the internet) and to factors influencing plagiarism: (ii) ICT and Web, (iii) regulation, (iv) academic skills, (v) teaching factors, (vi) pressure, (vii) pride, (viii) other reasons. The items in groups (ii) to (viii) used a 5-point Likert scale from strongly disagree (1) to strongly agree (5), with larger values indicating stronger agreement.

The statements used in the survey were as follows:

  • 1.1 It is easy for me to copy/paste due to contemporary technology
  • 1.2 I do not know how to cite electronic information
  • 1.3 It is hard for me to keep track of information sources on the web
  • 1.4 I can easily access research material using the Internet
  • 1.5 Easy access to new technologies
  • 1.6 I can easily translate information from other languages
  • 1.7 I can easily combine information from multiple sources
  • 1.8 It is easy to share documents, information, data
  • 2.1 There is no teacher control on plagiarism
  • 2.2 There is no faculty regulation against plagiarism
  • 2.3 There is no university regulation against plagiarism
  • 2.4 There are no penalties
  • 2.5 There are no honour codes relating to plagiarism
  • 2.6 There are no electronic systems of control
  • 2.7 There is no systematic tracking of violators
  • 2.8 I will not get caught
  • 2.9 I am not aware of penalties
  • 2.10 I do not understand the consequences
  • 2.11 The penalties are minor
  • 2.12 The gains are higher than the losses
  • 3.1 I run out of time
  • 3.2 I am unable to cope with the workload
  • 3.3 I do not know how to cite
  • 3.4 I do not know how to find research materials
  • 3.5 I do not know how to research
  • 3.6 My reading comprehension skills are weak
  • 3.7 My writing skills are weak
  • 3.8 I sometimes have difficulty expressing my own ideas
  • 4.1 The tasks are too difficult
  • 4.2 Poor explanation—bad teaching
  • 4.3 Too many assignments in a short amount of time
  • 4.4 Plagiarism is not explained
  • 4.5 I am not satisfied with course content
  • 4.6 Teachers do not care
  • 4.7 Teachers do not read students' assignments
  • 5.1 Family pressure
  • 5.2 Peer pressure
  • 5.3 Under stress
  • 5.4 Faculty pressure
  • 5.5 Money pressure
  • 5.6 Afraid to fail
  • 5.7 Job pressure
  • 6.1 I do not want to look stupid in front of peers
  • 6.2 I do not want to look stupid in front of my professor
  • 6.3 I do not want to embarrass my family
  • 6.4 I do not want to embarrass myself
  • 6.5 I focus on how my competences will be judged relative to others
  • 6.6 I am focused on learning according to self-set standards
  • 6.7 I fear asking for help
  • 6.8 My fear of performing poorly motivates me to plagiarise
  • 6.9 Assigned academic work will not help me personally/professionally
  • 7.1 I do not want to work hard
  • 7.2 I do not want to learn anything, just pass
  • 7.3 My work is not good enough
  • 7.4 It is easier to plagiarise than to work
  • 7.5 To get a better/higher mark (score)

All statistical tests were performed with SPSS at the significance level of 0.05. Parametric tests (Independent-Samples t-Test and One-Way ANOVA) were selected for normal and near-normal distributions of the responses. Nonparametric tests (Mann-Whitney Test, Kruskal-Wallis Test, Friedman’s ANOVA) were used for significantly non-normal distributions. The Chi-Square Test was used to investigate the independence between variables.

The average values for the groups (and standard deviations) of the responses referring to the factors influencing plagiarism can be seen in Table 1 (descriptive statistics for all statements can be seen in S2 Table), shown separately for Slovene and German students. An Independent Samples t-test was conducted to compare the average values of the responses, and thus to evaluate for which statements these differed significantly between the Slovene and German students.

https://doi.org/10.1371/journal.pone.0202252.t001

According to Friedman’s ANOVA (see Table 2), the Slovene students’ factors influencing plagiarism can be formed into four homogeneous subsets, where in each subset the distributions of the average values for the responses are not significantly different. At the top of the list is the existence of ICT and the Web (group 1). The second subset consists of teaching factors (group 4). The third subset is composed of academic skills, other reasons, and pride, in order from highest to lowest (groups 3, 7 and 6). The fourth subset is composed of other reasons, pride, pressure, and regulation, respectively (groups 7, 6, 5 and 2).

https://doi.org/10.1371/journal.pone.0202252.t002

For the Slovene students, ICT and the Web were detected as the dominant factors influencing plagiarism and, as such, we investigated them in greater detail. A Friedman Test (Chi-Square = 7.180, p = .066) confirmed that the distributions of the responses to the statements 1.1, 1.4, 1.5 and 1.8 (those with the highest sample means) are not significantly different. Consequently, the average values (means) of the responses to the statements 1.1, 1.4, 1.5 and 1.8 are not significantly different. The average values of the responses for all the other statements (1.7, 1.6, 1.2, and 1.3, listed in descending order of sample means) are significantly lower. A Mann-Whitney Test showed that there is no statistically significant difference between the distributions of the responses in the group of ICT and Web reasons considering gender (male, female) and motivation for study (lower, higher). For statement 1.2, a Kruskal-Wallis Test (Chi-Square = 7.466, p = .024) confirmed that there are different distributions for the responses when the area of study is considered (technical sciences, social sciences, natural sciences).

According to Friedman’s ANOVA (see Table 3), the German students’ factors influencing plagiarism can be formed into five homogeneous subsets, where in each subset the distributions of the average values for the responses are not significantly different. At the top of the list is the existence of ICT and the Web (group 1). The second subset is composed of pressure and pride, in order from highest to lowest (groups 5 and 6). The third subset consists of pride, teaching factors and other reasons, respectively (groups 6, 4 and 7). The fourth subset is composed of teaching factors, other reasons and academic skills, in order from highest to lowest (groups 4, 7 and 3). Finally, the last subset consists of regulation (group 2).

https://doi.org/10.1371/journal.pone.0202252.t003

As with the Slovene students, ICT and the Web were detected as the dominant factors influencing plagiarism for the German students. A Friedman Test (Chi-Square = 5.815, p = .055) confirmed that the distributions of the responses to the statements 1.4, 1.5 and 1.8 (those with the highest sample means) are not significantly different. Consequently, the average values (means) of the responses to the statements 1.4, 1.5 and 1.8 are not significantly different. The average values of the responses for all the other statements (1.1, 1.7, 1.6, 1.2, and 1.3, listed in descending order of sample means) are significantly lower. A Wilcoxon Signed Ranks Test also confirmed that the distributions of the responses to the statements 1.6 and 1.7 are not statistically significantly different (Z = -0.430, p = .667). The same holds for statements 1.2 and 1.3 (Z = -0.407, p = .684). A Mann-Whitney Test showed that there is no statistically significant difference between the distributions of the responses in the group of ICT and Web reasons considering gender (male, female), area of study (technical and social sciences; students of natural sciences were omitted due to the small sample size) and motivation for study.

ICT and Web reasons were detected as the dominant factors influencing plagiarism for Slovene and German students. As can be seen in Table 1, there are significant differences (t = 4.177, p < .001) between the Slovene and German students regarding this factor. It seems that the Slovene students (M = 3.69, SD = 0.56) attribute greater importance to the ICT and Web reasons than the German students (M = 3.47, SD = 0.55). There are also significant differences (t = 5.137, p < .001) between the Slovene and German students regarding regulation. It seems that the Slovene students (M = 2.35, SD = 0.63) attribute greater importance to regulation reasons than the German students (M = 2.05, SD = 0.61). Both, however, consider this factor to have the lowest impact on plagiarism overall. There are no significant differences (t = 1.939, p = .053) between the Slovene students (M = 2.56, SD = 0.67) and the German students (M = 2.44, SD = 0.68) regarding academic skills. The Slovene students (M = 2.87, SD = 0.68) attribute greater importance to teaching factors than the German students (M = 2.56, SD = 0.72), and the differences are significant (t = 4.827, p < .001). There are significant differences (t = -3.522, p < .001) between the Slovene and German students regarding pressure, with the German students (M = 2.71, SD = 0.91) attributing greater importance to this reason than the Slovene students (M = 2.42, SD = 0.86). The same goes for pride: the German students (M = 2.67, SD = 0.80) attribute greater importance to pride reasons than the Slovene students (M = 2.43, SD = 0.84), and the differences are significant (t = -3.032, p = .003). There are no significant differences (t = -0.836, p = .404) between the Slovene students (M = 2.47, SD = 0.82) and the German students (M = 2.54, SD = 0.94) regarding other factors influencing plagiarism.
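To make the comparison concrete, the following sketch recomputes a two-sample t statistic from the reported summary statistics for the ICT and Web group. It assumes the classic pooled-variance (Student) form of the test and that all 191 Slovene and 294 German participants answered; neither assumption is confirmed by the text, and the function name `pooled_t` is hypothetical.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic for two independent samples with pooled variance."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df

# ICT and Web group: Slovene vs. German students (means and SDs as reported).
# ASSUMPTION: full samples of 191 and 294; the effective n may be smaller.
t_ict, df_ict = pooled_t(3.69, 0.56, 191, 3.47, 0.55, 294)
print(f"t = {t_ict:.2f}, df = {df_ict}")
```

Under these assumptions the sketch gives a t value of roughly 4.27, close to but not identical with the reported 4.177. A gap of this size would be expected from rounded means and standard deviations and from missing responses, so the sketch is a plausibility check rather than a reproduction of the SPSS output.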

We conducted an Independent Samples t-test to compare the average time (in hours) spent per day on the Internet by the Slovene students with that of the German students. The test was significant, t = -2.064, p = .004. The Slovene students on average spent less time on the Internet ( M = 3.52, SD = 2.23) than the German students ( M = 4.09, SD = 3.72).

The average values of the responses for individual statements according to gender (male, female) and the significances for the t-test of equality of means are shown in S3 Table for the Slovene students and in S4 Table for the German students. The average values of the responses for these statements are significantly different. They are higher for males than for females (except in the case of statement 3.8 for the Slovene students and 4.1 for the German students). Slovene and German male students think that they will not get caught and that the gains are higher than the losses. Both also think that teachers do not read students’ assignments.

The average values of the responses for individual statements according to area of study (technical sciences, social sciences, natural sciences) and the results for ANOVA for the Slovene students are shown in S5 Table . Gabriel's post hoc test was used to confirm the differences between groups. The significant difference between the students of technical sciences and the students of social sciences was confirmed for all statements listed in S5 Table . There were higher average values of responses for the students of technical sciences. The only significant difference between the students of technical sciences and the students of natural sciences was confirmed for statement 5.6 (there were higher average values of responses for the students of technical sciences). No other pairs of group means were significantly different.

The average values of the responses for individual statements according to area of study (technical sciences, social sciences) and the significances for the t-test of equality of means for German students are shown in S6 Table . For German students, only technical and social sciences were considered because of the low number of natural sciences students. The average values of responses for these statements are significantly different. They were higher for the students of technical sciences than for the students of social sciences.

The average values of the responses for individual statements according to the motivation of the students (lower, higher) and the significances for t-Test of equality of means are shown in S7 Table for the Slovene students and in S8 Table for the German students. The average values of the responses for these statements are significantly different. They were higher for students with lower motivation for both groups of students, except in the case of statements 2.1 and 6.6 for Slovene students.

We conducted an Independent Samples t-test to compare the average time (in hours) spent per day on the Internet by groups of low motivated students with groups of highly motivated students. For Slovene students, the test was not significant, t = -1.423, p = .156. For German students, the test was significant, t = 2.298, p = .024. Students with lower motivation for study ( M = 5.24, SD = 4.84) on average spent more time on the Internet than those with higher motivation for study ( M = 3.76, SD = 3.27).

The Chi-Square Test of Independence was used to determine whether there is an association between gender (male, female) and motivation for study (lower, higher). There was a significant association between gender and motivation for the Slovene students ( Chi-Square = 4.499, p = .034). Indeed, it was more likely for females to have a high motivation for study (76.9%) than for males to have a high motivation for study (61.6%). For the German students, the test was not significant ( Chi-Square = 0.731, p = .393).
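The test of independence for the Slovene students can be sketched as follows. The 2x2 counts are not given in the text; they are reconstructed here from the reported percentages (61 of 99 males and 70 of 91 females highly motivated, which implies one missing response) and should be read as an assumption. The sketch also assumes Yates' continuity correction, which statistical packages commonly report for 2x2 tables; the function name `chi_square_2x2` is hypothetical.

```python
import math

def chi_square_2x2(table, yates=True):
    """Chi-square test of independence for a 2x2 table (1 degree of freedom)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                # Yates' continuity correction shrinks each deviation by 0.5.
                diff = max(diff - 0.5, 0.0)
            chi2 += diff ** 2 / expected
    # For 1 degree of freedom, the upper tail of the chi-square
    # distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: males, females; columns: higher motivation, lower motivation.
# ASSUMPTION: counts reconstructed from the reported percentages
# (61/99 = 61.6%, 70/91 = 76.9%); they are not listed in the text.
slovene_counts = [[61, 38], [70, 21]]
chi2, p = chi_square_2x2(slovene_counts)
print(f"Chi-Square = {chi2:.3f}, p = {p:.3f}")
```

With the reconstructed counts and the continuity correction, the sketch yields a statistic of about 4.499 with p of about .034, which agrees with the reported values; without the correction the statistic would be larger.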

In this study, we aimed to explore the factors that influence plagiarism among students. An international comparison between German and Slovene students was made. Our research draws on students from two universities from the two considered countries that cover all traditional subjects of study. In this regard the conclusions are representative and statistically relevant, although we of course cannot exclude the possibility of small deviations if other or more institutions were considered. Taken as a whole, there are no major differences between German and Slovene students when it comes to motivation for study and working habits. In both cases, more than two thirds of the students were highly motivated for study and more than 60% were working during their time of study. About 33% of the surveyed students spend on average two or fewer hours a day on the Internet, and about one quarter spend on average more than five hours a day on the Internet.

When it comes to explaining plagiarism in higher education, the German and Slovene students equally indicated the ease of use of information-communication technologies and the Web as the top cause for their behaviour, which is consistent with current contributions to the topic of plagiarism worldwide. Indeed, our findings reinforce the notion that new technologies and the Web have a strong influence on students and are the main driver behind plagiarism [ 20 , 21 , 22 ]. An academic moral panic has been caused by the arrival in higher education of a new generation of younger students [ 39 ], deemed to be ‘digital natives’ [ 40 ] and allegedly endowed with an inherent ability for using information-communication technologies (ICT). This younger generation is dubbed ‘Generation Me’ [ 41 ], and it is believed that their expectations, interactions and learning processes have been affected by ICT. Introna et al., Ma et al., and Yeo agree that the understanding of the concept of plagiarism through the use of ICT is the main contributor to it being a problem [ 42 , 43 , 44 ]. The effortless use of ICT such as the Internet has made it easy for students to retrieve information with a simple click of the mouse [ 45 , 46 ].

The Slovene students in our study nominated the teaching factor as the second most important reason for plagiarism. This result is also found in other studies, namely those of Šprajc et al. [ 30 ] and Barnas [ 47 ]. Young people in Slovenia are, like in other Western societies, given a prolonged period of identity exploration and self-focus, i.e., freedom from institutional demands and obligations, competence, and freedom to decide for themselves [ 48 , 49 ]. The results of the German students, however, contradict the finding that teaching factors are among the most important factors influencing plagiarism. Indeed, after ICT and the Web, the most important factors influencing plagiarism for the German students are pressure and pride rather than teaching factors. Overall though, the findings for both the German students and the Slovene students are in line with e.g. Koul et al., who suggest that factors influencing plagiarism may vary across cultures [ 4 ]. Among German students, pressure and pride rank second and third in importance, which is consistent with Rothenberg’s statement that in Germany today ‘pride could be expressed for individual accomplishments’ [ 50 ]. As far as the Slovene students are concerned, Kondrič et al. presumed that there is a specific set of values in Slovenia, which perhaps intensifies the distinction between the collectivist culture of former socialist countries and the individualism of Western countries [ 51 ]. This might shed light on why the Slovene students consider teaching factors as being one of the most important factors influencing plagiarism.

Furthermore, several studies have implied that individual characteristics, especially gender, play an important role when it comes to plagiarism [ 12 , 13 , 15 , 16 ]. A number of studies from around the world have shown that men plagiarise more frequently than women do. For example, reviews of North American research into conventional plagiarism have indicated that male students cheat more often than female students [ 12 ]. The results we found are basically in line with these findings. Since the average values of responses are significantly different for male and female students, gender seems to play an important role in terms of plagiarism.

Park pointed out that one reason for plagiarism is efficiency gain [ 11 ]. About 15 years after this statement, the study at hand provides empirical evidence that efficiency gain due to different forms of pressure is still a factor that influences students’ behaviour in terms of plagiarism. Lack of knowledge and uncertainties about methodologies are additional factors that are frequently recognized as reasons for plagiarism [ 11 , 17 , 18 ]. The results at hand support these studies, since the responses about e.g. academic skills demonstrate students’ lack of knowledge.

Another interesting finding of our study shows that students with a lower motivation for study spend more time on the Internet, which complements our finding that the Internet is one of the simplest solutions for studying. The German students showed a somewhat higher level of motivation to study than the Slovene students, but the difference is not statistically significant.

We would nevertheless like to draw attention to the perceived difference in the factors influencing plagiarism: teaching factors and academic skills for the Slovene students, and pride and pressure for the German students. This perceived difference between students is one of the social dimensions that can serve as a tool to promote true motivation for study and proper orientation without ethically disputable solutions (such as plagiarism). It therefore makes sense to guide and educate students in the use of information technology from the beginning of their education, and in doing so to build responsible individuals who will not treat technology and the Internet as a negative tool for studying and succeeding, but as an aid for solving problems and making decisions in the right way. The main aim of this research into Slovene and German students was to increase understanding of students’ attitudes towards plagiarism and, above all, to identify the reasons that lead students to plagiarise. On this basis, we want to show how the promotion of non-plagiarism can be developed in a way that is more acceptable and more understandable in each country and adequately controlled on a personal and institutional level.

Conclusions

In contrast to a number of preliminary studies, the major findings of this research paper indicate that new technologies and the Web have a strong and significant influence on plagiarism, whereas in this specific context gender and socialisation factors do not play a significant role. Since the majority of the students in our study believe that new technologies and the Web have a strong influence on plagiarism, we can assume that technological progress and globalisation have started breaking down national frontiers and crossing cultural boundaries. These findings have also created the impression that at universities the gender gap is not predominant in all areas as it might be in society.

Nevertheless, some minor results in our study indicate that there are still some differences between Slovene and German students. For example, it seems that in Slovenia, teaching factors have a greater influence on plagiarism than in Germany. In Germany, by contrast, the focus should rest on the implementation and publication of a code of ethics, and on training students to deal with pressure.

This research focuses on only two countries, Slovenia and Germany. Thus, the findings at hand are not necessarily generalizable, though they do manifest a certain trend in terms of the reasons why students resort to plagiarism. Furthermore, the results could be a starting point for additional comparative studies between different European regions. In particular, further research into the influence of digitalization and the Web on plagiarism, and the role of socialisation and gender factors on plagiarism, could contribute to the discourse on plagiarism in higher education institutions.

Understanding the reasons behind plagiarism and fostering awareness of the issue among students might help prevent future academic misconduct through increased support and guidance during students’ time studying at the university. In this sense, further reflection on preventive measures is required. Indeed, rather than focusing on the detection of plagiarism, focusing on preventive measures could have a positive effect on good scientific practice in the near future.

Supporting information

S1 Table. Frequency distributions of the study variables.

https://doi.org/10.1371/journal.pone.0202252.s001

S2 Table. Descriptive statistics for items referring to the factors influencing plagiarism, by nationality and results of the t-Test.

https://doi.org/10.1371/journal.pone.0202252.s002

S3 Table. Descriptive statistics for items referring to the factors influencing plagiarism, by gender and results of the t-Test (SLO).

https://doi.org/10.1371/journal.pone.0202252.s003

S4 Table. Descriptive statistics for items referring to the factors influencing plagiarism, by gender and results of the t-Test (GER).

https://doi.org/10.1371/journal.pone.0202252.s004

S5 Table. Descriptive statistics for items referring to the factors influencing plagiarism, by area of study and results of the One-Way ANOVA (SLO).

https://doi.org/10.1371/journal.pone.0202252.s005

S6 Table. Descriptive statistics for items referring to the factors influencing plagiarism, by study area and results of the t-Test (GER).

https://doi.org/10.1371/journal.pone.0202252.s006

S7 Table. Descriptive statistics for items referring to the factors influencing plagiarism, by motivation and results of the t-Test (SLO).

https://doi.org/10.1371/journal.pone.0202252.s007

S8 Table. Descriptive statistics for items referring to the factors influencing plagiarism, by motivation and results of the t-Test (GER).

https://doi.org/10.1371/journal.pone.0202252.s008

S1 File. Individual data.

https://doi.org/10.1371/journal.pone.0202252.s009

  • 1. Perrin R. Pocket guide to APA style. 3rd ed. Boston, MA: Wadsworth; 2009.
  • 5. Fishman T. We Know it When We See it is not Good Enough: Toward a Standard Definition of Plagiarism that Transcends Theft, Fraud, and Copyright. Paper presented at the 4th Asia Pacific Conference on Educational Integrity, NSW, Australia. 2009. Available from: http://www.bmartin.cc/pubs/09-4apcei/4apcei-Fishman.pdf
  • 6. Hall TE. Beyond culture. New York: Anchor Books; 1979.
  • 8. Schwinges RC. (Ed.). Examen, Titel, Promotionen. Akademisches und Staatliches Qualifikationswesen vom 13. bis zum 21. Jahrhundert [Examinations, Titles, Doctorates, Academic and Government Qualifications from the 13th to the 21st Century]. Basilea: Schwabe; 2007. https://doi.org/10.1093/acprof:osobl/9780199694-044.003.0009
  • 16. Gerdeman RD. Academic dishonesty and the community college, ERIC Digest, ED447840; 2000. Available from: https://www.ericdigests.org/2001-3/college.htm
  • 27. Songsriwittaya A, Kongsuwan S, Jitgarum K, Kaewkuekool S, Koul R. Engineering Students' Attitude towards Plagiarism a Survey Study. Korea: ICEE & ICEER; 2009.
  • 37. Razera D, Verhage H, Pargman TC, Ramberg R. Plagiarism awareness, perception, and attitudes among students and teachers in Swedish higher education-a case study. Paper Presented at the 4th International Plagiarism Conference-Towards an authentic future. Newcastle Upon Tyne, UK. 2010. Available from http://www.plagiarismadvice.org/researchpapers/item/plagiarism-awareness
  • 39. Bennett S, Maton K. Intellectual field or faith-based religion: moving on from the idea of ‘digital natives’. In Thomas M, editor. Deconstructing digital natives. London: Routledge; 2011. pp. 169–185.
  • 40. Prensky M. Digital wisdom and homo sapiens digital. In Thomas M, editor. Deconstructing digital natives. London: Routledge; 2011. pp. 15–29.
  • 42. Introna L, Hayes N, Blair L, Wood E. Cultural attitudes towards plagiarism. Lancaster: University of Lancaster; 2003.
  • 48. Puklek-Levpušček M, Zupančič M. Slovenia. In Arnett JJ, editor. International encyclopedia on adolescence. New York: Routledge; 2007. pp. 866–877. 10.1007/s10734-011-9481-4.
  • 49. Zupančič M. Razvojno obdobje prehoda v odraslost—temeljne značilnosti. [Developmental period of transition to adulthood—basic characteristics]. In Puklek-Levpušček M, Zupančič M, editors. Študenti na prehodu v odraslost [Students in transition to adulthood]. Ljubljana: Znanstveno raziskovalni inštitut Filozofske fakultete; 2011. pp. 9–38.

Plagiarism in research

Affiliation.

  • 1 Department of Learning, Informatics, Management and Ethics, Stockholm Centre for Healthcare Ethics, Karolinska Institutet, 171 77, Stockholm, Sweden.
  • PMID: 24993050
  • DOI: 10.1007/s11019-014-9583-8

Plagiarism is a major problem for research. There are, however, divergent views on how to define plagiarism and on what makes plagiarism reprehensible. In this paper we explicate the concept of "plagiarism" and discuss plagiarism normatively in relation to research. We suggest that plagiarism should be understood as "someone using someone else's intellectual product (such as texts, ideas, or results), thereby implying that it is their own" and argue that this is an adequate and fruitful definition. We discuss a number of circumstances that make plagiarism more or less grave and the plagiariser more or less blameworthy. As a result of our normative analysis, we suggest that what makes plagiarism reprehensible as such is that it distorts scientific credit. In addition, intentional plagiarism involves dishonesty. There are, furthermore, a number of potentially negative consequences of plagiarism.

Scribbr Plagiarism Checker

Plagiarism checker software for students who value accuracy

Extensive research shows that Scribbr's plagiarism checker, in partnership with Turnitin, detects plagiarism more accurately than other tools, making it the no. 1 choice for students.

What you get with a premium plagiarism check

Plagiarism Checker

Catch accidental plagiarism with high accuracy with Scribbr’s Plagiarism Checker in partnership with Turnitin.

AI Detector

Detect AI-generated content, like ChatGPT3.5 and GPT4, with Scribbr’s AI Detector.

AI Proofreader

Find and fix spelling and grammar issues with Scribbr’s AI Proofreader.

* Only available when uploading an English .docx (Word) document

How Scribbr detects plagiarism better

Powered by leading plagiarism checking software

Scribbr is an authorized partner of Turnitin, a leader in plagiarism prevention. Its software detects everything from exact word matches to synonym swapping.

Access to exclusive content databases

Your submissions are compared to the world’s largest content database, covering 99 billion webpages, 8 million publications, and over 20 languages.

Comparison against unpublished works

You can upload your previous assignments, referenced works, or a classmate’s paper or essay to catch (self-)plagiarism that is otherwise difficult to detect.

Turnitin Similarity Report

The Scribbr Plagiarism Checker is perfect for you if you:

  • Are a student writing an essay or paper
  • Value the confidentiality of your submissions
  • Prefer an accurate plagiarism report
  • Want to compare your work against publications

This tool is not for you if you:

  • Prefer a free plagiarism checker despite a less accurate result
  • Are a copywriter, SEO, or business owner

Get started

Trusted by students and academics worldwide

University applicant checking their essay for plagiarism

University applicants

Ace your admissions essay to your dream college.

Compare your admissions essay to billions of web pages, including other essays.

  • Avoid having your essay flagged or rejected for accidental plagiarism.
  • Make a great first impression on the admissions officer.

Student checking for plagiarism

Submit your assignments with confidence.

Detect plagiarism using software similar to what most universities use.

  • Spot missing citations and improperly quoted or paraphrased content.
  • Avoid grade penalties or academic probation resulting from accidental plagiarism.

Academic working to prevent plagiarism

Take your journal submission to the next level.

Compare your submission to millions of scholarly publications.

  • Protect your reputation as a scholar.
  • Get published by the journal of your choice.

Money-back guarantee

Happiness guarantee

Scribbr’s services are rated 4.9 out of 5 based on 13,583 reviews. We aim to make you just as happy. If not, we’re happy to refund you!

Privacy guarantee

Your submissions will never be added to our content database, and you’ll never get a 100% match at your academic institution.

Price per document

Prices are per check, not a subscription

  • Turnitin-powered plagiarism checker
  • Access to 99.3B web pages & 8M publications
  • Comparison to private papers to avoid self-plagiarism
  • Downloadable plagiarism report
  • Live chat with plagiarism experts
  • Private and confidential

Volume pricing available for institutions. Get in touch.

Request volume pricing

Institutions interested in buying more than 50 plagiarism checks can request a discounted price. Please fill in the form below.

Avoiding accidental plagiarism

You don't need a plagiarism checker, right?

You would never copy-and-paste someone else’s work, you’re great at paraphrasing, and you always keep a tidy list of your sources handy.

But what about accidental plagiarism? It’s more common than you think! Maybe you paraphrased a little too closely, or forgot that last citation or set of quotation marks.

Even if you did it by accident, plagiarism is still a serious offense. You may fail your course, or be placed on academic probation. The risks just aren’t worth it.

Scribbr & academic integrity

Scribbr is committed to protecting academic integrity. Our plagiarism checker software, Citation Generator, proofreading services, and free Knowledge Base content are designed to help educate and guide students in avoiding unintentional plagiarism.

We make every effort to prevent our software from being used for fraudulent or manipulative purposes.

Ask our team

Want to contact us directly? No problem. We are always here for you.

Frequently asked questions

No, the Self-Plagiarism Checker does not store your document in any public database.

In addition, you can delete all your personal information and documents from the Scribbr server as soon as you’ve received your plagiarism report.

Scribbr’s Plagiarism Checker is powered by elements of Turnitin’s Similarity Checker, namely the plagiarism detection software and the Internet Archive and Premium Scholarly Publications content databases.

The add-on AI detector is powered by Scribbr’s proprietary software.

Extensive testing proves that Scribbr’s plagiarism checker is one of the most accurate plagiarism checkers on the market in 2022.

The software detects everything from exact word matches to synonym swapping. It also has access to a full range of source types, including open- and restricted-access journal articles, theses and dissertations, websites, PDFs, and news articles.

At the moment we do not offer a monthly subscription for the Scribbr Plagiarism Checker. This means you won’t be charged on a recurring basis: you only pay for what you use. We believe this provides you with the flexibility to use our service as frequently or infrequently as you need, without being tied to a contract or recurring fee structure.

You can find an overview of the prices per document here:

Small document (up to 7,500 words) $19.95
Normal document (7,500-50,000 words) $29.95
Large document (50,000+ words) $39.95

Please note that we can’t give refunds if you bought the plagiarism check thinking it was a subscription service as communication around this policy is clear throughout the order process.

Your document will be compared to the world’s largest and fastest-growing content database, containing over:

  • 99.3 billion current and historical webpages.
  • 8 million publications from more than 1,700 publishers such as Springer, IEEE, Elsevier, Wiley-Blackwell, and Taylor & Francis.

Note: Scribbr does not have access to Turnitin’s global database with student papers. Only your university can add and compare submissions to this database.

Scribbr’s plagiarism checker offers complete support for 20 languages, including English, Spanish, German, Arabic, and Dutch.

The add-on AI Detector and AI Proofreader are only available in English.

The complete list of supported languages:

If your university uses Turnitin, the result will be very similar to what you see at Scribbr.

The only possible difference is that your university may compare your submission to a private database containing previously submitted student papers. Scribbr does not have access to these private databases (and neither do other plagiarism checkers).

To cater to this, we have the Self-Plagiarism Checker at Scribbr. Just upload any document you used and start the check. You can repeat this as often as you like with all your sources. With your Plagiarism Check order, you get a free pass to use the Self-Plagiarism Checker. Simply upload them to your similarity report and let us do the rest!

Your writing stays private. Your submissions to Scribbr are not published in any public database, so no other plagiarism checker (including those used by universities) will see them.

Free plagiarism checker by EasyBib

Check for plagiarism, grammar errors, and more.

  • Expert Check

Check for accidental plagiarism

Avoid unintentional plagiarism. Check your work against billions of sources to ensure complete originality.

Find and fix grammar errors

Turn in your best work. Our smart proofreader catches even the smallest writing mistakes so you don't have to.

Get expert writing help

Improve the quality of your paper. Receive feedback on your main idea, writing mechanics, structure, conclusion, and more.

What students are saying about us

"Caught comma errors that I actually struggle with even after proofreading myself."

- Natasha J.

"I find the suggestions to be extremely helpful especially as they can instantly take you to that section in your paper for you to fix any and all issues related to the grammar or spelling error(s)."

- Catherine R.

Check for unintentional plagiarism

Easily check your paper for missing citations and accidental plagiarism with the EasyBib plagiarism checker. The EasyBib plagiarism checker:

  • Scans your paper against billions of sources.
  • Identifies text that may be flagged for plagiarism.
  • Provides you with a plagiarism score.

You can submit your paper at any hour of the day and quickly receive a plagiarism report.

What is the EasyBib plagiarism checker? 

Most basic plagiarism checkers review your work and calculate a percentage, meaning how much of your writing is indicative of original work. But, the EasyBib plagiarism checker goes way beyond a simple percentage. Any text that could be categorized as potential plagiarism is highlighted, allowing you time to review each warning and determine how to adjust it or how to cite it correctly.

You’ll even see the sources against which your writing is compared and the actual word for word breakdown. If you determine that a warning is unnecessary, you can waive the plagiarism check suggestion.
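To make the idea of a similarity percentage concrete, here is a minimal sketch in Python. It is not EasyBib's method, which this page does not describe; it assumes a simple word-trigram overlap measure, and the function names `word_ngrams` and `overlap_percentage` are hypothetical. Note that it reports the share of matched text, which is the complement of the "original work" percentage mentioned above.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams found in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_percentage(submission, source, n=3):
    """Percentage of the submission's n-grams that also occur in the source.

    Assumption: shared word n-grams stand in for "matching text". Real
    checkers use far larger indexes and more robust matching.
    """
    submitted = word_ngrams(submission, n)
    if not submitted:
        # Too short to form a single n-gram: report no overlap.
        return 0.0
    shared = submitted & word_ngrams(source, n)
    return 100.0 * len(shared) / len(submitted)


if __name__ == "__main__":
    source = "Plagiarism is the act of presenting someone else's work as your own."
    submission = "Plagiarism is the act of presenting another person's work as your own."
    print(f"{overlap_percentage(submission, source):.1f}% of trigrams match the source")
```

A single number like this hides where the matches occur, which is why a highlighted, passage-by-passage report is more useful for deciding what to rewrite or cite.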

Plagiarism is unethical because it doesn’t credit those who created the original work; it violates intellectual property and serves to benefit the perpetrator. It is a severe enough academic offense that many faculty members use their own plagiarism checking tool for their students’ work. With the EasyBib plagiarism checker, you can stay one step ahead of your professors and catch citation mistakes and accidental plagiarism before you submit your work for grading.

Why use a plagiarism checker? 

Imagine – it’s finals week and the final research paper of the semester is due in two days. You, being quite familiar with this high-stakes situation, hit the books, and pull together a ten-page, last-minute masterpiece using articles and materials from dozens of different sources.

However, in those late, coffee-fueled hours, are you fully confident that you correctly cited all the different sources you used? Are you sure you didn’t accidentally forget any? Are you confident that your teacher’s plagiarism tool will give your paper a 0% plagiarism score?

That’s where the EasyBib plagiarism checker comes in to save the day. One quick check can help you address all the above questions and put your mind at ease.

What exactly is plagiarism? 

Plagiarism has a number of possible definitions; it involves more than just copying someone else’s work. Improper citing, patchworking, and paraphrasing could all lead to plagiarism in one of your college assignments. Below are some common examples of accidental plagiarism.

Quoting or paraphrasing without citations

Not including in-text citations is another common type of accidental plagiarism. Quoting is taking verbatim text from a source. Paraphrasing is when you’re using another source to take the same idea but put it in your own words. In both cases, it’s important to always cite where those ideas are coming from. The EasyBib plagiarism checker can help alert you to when you need to accurately cite the sources you used.

Patchwork plagiarism

When writing a paper, you’re often sifting through multiple sources and tabs from different search engines. It’s easy to accidentally string together pieces of sentences and phrases into your own paragraphs. You may change a few words here and there, but it’s similar to the original text. Even though it’s accidental, it is still considered plagiarism. It’s important to clearly state when you’re using someone else’s words and work.

Improper citations

Depending on the class, professor, subject, or teacher, there are multiple correct citation styles and preferences. Some examples of common style guides that are followed for citations include MLA, APA, and Chicago style. When citing resources, it’s important to cite them accurately. Incorrect citations could make it impossible for a reader to track down a source and it’s considered plagiarism. There are EasyBib citation tools to help you do this.

Don’t fall victim to plagiarism pitfalls. Most of the time, you don’t even mean to commit plagiarism; rather, you’ve read so many sources from different search engines that it gets difficult to determine an original thought or well-stated fact versus someone else’s work. Or worse, you assume a statement is common knowledge, when in fact, it should be attributed to another author.

When in doubt, cite your source!

Time for a quick plagiarism quiz! 

Which of the following requires a citation?

  • A chart or graph from another source
  • A paraphrase of an original source
  • Several sources’ ideas summarized into your own paragraph
  • A direct quote
  • All of the above

If you guessed the last option, “All of the above,” then you’d be correct. Correct punctuation and citation of another individual’s ideas, quotes, and graphics are a pillar of good academic writing.

What if you copy your own previous writing?

Resubmitting your own original work for another class’s assignment is a form of self-plagiarism, so don’t cut corners in your writing. Draft an original piece for each class or ask your professor if you can incorporate your previous research.

What features are available with the EasyBib plagiarism checker? 

Along with providing warnings and sources for possible plagiarism, the EasyBib plagiarism checker works alongside the other EasyBib tools, including a grammar checker and a spell checker. You’ll receive personalized feedback on your thesis and writing structure too!

The plagiarism checker compares your writing sample with billions of available sources online so that it detects plagiarism at every level. You’ll be notified of which phrases are too similar to current research and literature, prompting a possible rewrite or additional citation. You’ll also get feedback on your paper’s inconsistencies, such as changes in text, formatting, or style. These small details could suggest possible plagiarism within your assignment.

And speaking of citations, there are also EasyBib citation tools available. They help you quickly build your bibliography and avoid accidental plagiarism. Make sure you know which citation format your professor prefers!

Great! How do I start? 

Simply copy and paste or upload your essay into the checker at the top of this page. You’ll receive the first five grammar suggestions for free! To try the plagiarism checker for free, start your EasyBib Plus three-day free trial.* If you love the product and decide to opt for premium services, you’ll have access to unlimited writing suggestions and personalized feedback.

The EasyBib plagiarism checker is conveniently available 24 hours a day and seven days a week. You can cancel anytime. Check your paper for free today!

*See Terms and Conditions

Visit www.easybib.com for more information on helpful EasyBib writing and citing tools.

For informational guides on writing and citing, visit the EasyBib guides homepage.

Enago Academy

How to Avoid Plagiarism in Research Papers (Part 1)

Writing a research paper poses challenges in gathering literature and providing evidence to make your paper stronger. Drawing upon previously established ideas and values and adding pertinent information to your paper are necessary steps, but these need to be done with caution to avoid falling into the trap of plagiarism. In order to understand how to avoid plagiarism, it is important to know the different types of plagiarism that exist.

What is Plagiarism in Research?

Plagiarism is the unethical practice of using the words or ideas of another author/researcher, or your own previous works, without proper acknowledgment, whether deliberately or by accident. Considered a serious academic and intellectual offense, plagiarism can result in highly negative consequences such as paper retractions and loss of author credibility and reputation. It is currently a grave problem in academic publishing and a major reason for paper retractions.

It is thus imperative for researchers to increase their understanding of plagiarism. In some cultures, academic traditions and nuances may not insist on authentication by citing the source of words or ideas. However, this form of validation is a prerequisite in the global academic code of conduct. Non-native English speakers face the greater challenge of communicating their technical content in English as well as complying with ethical rules. The digital age affects plagiarism too. Researchers have easy access to material and data on the internet, which makes it easy to copy and paste information.

Related: Conducting literature survey and wish to learn more about scientific misconduct? Check out this resourceful infographic today!

How Can You Avoid Plagiarism in a Research Paper?

Guard yourself against plagiarism, however accidental it may be. Here are some guidelines to avoid plagiarism.

1. Paraphrase your content

  • Do not copy–paste the text verbatim from the reference paper. Instead, restate the idea in your own words.
  • Understand the idea(s) of the reference source well in order to paraphrase correctly.
  • Examples of good paraphrasing can be found here ( https://writing.wisc.edu/Handbook/QPA_paraphrase.html ).

2. Use Quotations

Use quotes to indicate that the text has been taken from another paper. The quotes should be exactly the way they appear in the paper you take them from.

3. Cite your Sources – Identify what does and does not need to be cited

  • The best way to avoid the misconduct of plagiarism is by self-checking your documents using plagiarism checker tools.
  • Any words or ideas that are not your own but taken from another paper need to be cited.
  • Cite Your Own Material: If you are using content from your previous paper, you must cite yourself. Using material you have published before without citation is called self-plagiarism.
  • The scientific evidence you gathered after performing your tests should not be cited.
  • Facts or common knowledge need not be cited. If unsure, include a reference.

4. Maintain records of the sources you refer to

  • Use citation software like EndNote or Reference Manager to manage the citations used for the paper.
  • Use multiple references for the background information/literature survey. For example, rather than referencing a review, the individual papers should be referred to and cited.

5. Use plagiarism checkers

You can use various plagiarism detection tools such as iThenticate or HelioBLAST (formerly eTBLAST) to see how much of your paper is plagiarised.

Tip: While it is perfectly fine to survey previously published work, it is not alright to paraphrase it with extensive similarity. Most plagiarism occurs in the literature review section of a document (manuscript, thesis, etc.). Therefore, if you read the original work carefully, try to understand the context, take good notes, and then express it to your target audience in your own language (without forgetting to cite the original source), you will never be accused of plagiarism (at least for the literature review section).

Caution: The above statement is valid only for the literature review section of your document. You should NEVER EVER use someone else’s original results and pass them off as yours!

What strategies do you adopt to maintain content originality? What advice would you share with your peers? Please feel free to comment in the section below.

If you would like to know more about patchwriting, quoting, paraphrasing and more, read the next article in this series!


APA 7th Edition - Citing Sources (Arnold Bernhard Library)

Academic Integrity

What is plagiarism, types of plagiarism, plagiarism spectrum, what is common knowledge, when to cite, check your understanding of plagiarism, learning commons at qu.

At Quinnipiac, our community has chosen integrity as one of its guiding principles. Please follow this link to QU's Academic Integrity page:

  • Quinnipiac's Academic Integrity Policy This policy, overseen and administered by the Office of Academic Innovation and Effectiveness, is part of the larger educational effort at Quinnipiac University in which community members learn and practice ethical behavior. All members of the Quinnipiac University community are expected to commit themselves to personal and academic integrity.

An important aspect of maintaining academic integrity is to avoid plagiarism, which is the uncredited use (both intentional and unintentional) of someone else's words or ideas.

Plagiarism is the act of presenting someone else's work or ideas as your own, without giving proper credit. This includes copying text directly, paraphrasing without citation, or using information without acknowledging the source.

Intentional Plagiarism Examples:

Copying someone else’s research and turning it in as your own

Buying a paper from a private or commercial source

Copying out paragraphs, or even sentences, of articles or books and incorporating them into your own paper without documentation

Lifting ideas or interpretations from other sources without acknowledgment

Using a paper you submitted for one assignment for another assignment

Unintentional Plagiarism Examples:

Forgetting to put quotation marks around a sentence

Paraphrasing too closely to the wording or sentence structure of a source

Quoting something without giving a source

Failing to cite an idea you clearly did not come up with on your own

Plagiarism Spectrum.

Turnitin. (2021). Plagiarism Spectrum 2.0. https://www.turnitin.com/resources/plagiarism-spectrum-2-0

Common knowledge is information that is widely known and accepted by the general public.

For example, it is common knowledge that:

  • The Earth revolves around the sun.
  • George Washington was the first president of the United States.
  • Water freezes at 32 °F / 0 °C.

Common knowledge facts do NOT require citation. However, it's essential to use good judgment when determining if something is truly common knowledge. If you're unsure, it's always better to cite your source.

You MUST include an in-text citation along with a corresponding end reference citation (Bibliography, Works Cited page, etc.) when:

  • You use a direct quotation, even if it is in quotation marks
  • You use facts that are not common knowledge
  • You paraphrase the author’s idea(s)
  • You have changed some of the author’s words (i.e., used synonyms)
  • You use key words or phrases from the author
  • You mention the author’s name in your sentence
  • You have written a sentence that mostly consists of your own thoughts, but you have made a reference to another author’s idea

The staff of the Learning Commons provide invaluable support for the QU community, including:

  • Peer Academic Support / Tutoring
  • Support for Students with Disabilities
  • Academic Coaching & Outreach

If you need tutoring assistance, please contact the Learning Commons via their email:  [email protected]

Please remember that tutors will not copy-edit your papers for you, but they can look for and give advice on copy-editing problems. The tutor email box is checked daily. Feedback turnaround time can be up to 48 hours, though it can be longer over the weekends or in busy times. If you anticipate needing a tutor, make contact with the Learning Center email address 5 days before the assignment is due.

You can also view the Learning Commons page in MyQ via this link: Learning Commons in MyQ

Web Resources

  • Purdue OWL: Plagiarism This resource explores plagiarism and how to avoid it.
  • Last Updated: Aug 13, 2024 9:27 AM
  • URL: https://libraryguides.quinnipiac.edu/APA7th

IEEE - Advancing Technology for Humanity

Identifying Plagiarism

When does plagiarism occur? Is there an established percentage, a rule of thumb, a saturation point that we can use to determine when plagiarism has taken place? Or is it simply that "plagiarism is plagiarism"? The answer may lie somewhere between the stark (and perhaps too simple) dictum and the convenience of ready-made measures. In most cases, the dictum can be applied appropriately: plagiarism is plagiarism. However, there are in fact degrees of plagiarism: one can steal an entire paper, or a section of a paper, or a page, a paragraph or a sentence. Even copying phrases without credit and quotation marks can be considered plagiarism. In other words, paraphrasing done improperly can qualify as plagiarism.

So, there are several basic factors to consider when evaluating a case of possible plagiarism:

  • Amount or quantity (full paper, a section of a paper, a page, a paragraph, a sentence, phrases)
  • Use of quotation marks for all copied text
  • Appropriate placement of credit notices
  • Improper paraphrasing

Possible plagiarism scenarios

Potentially complicating the effort to identify plagiarism is the fact that each of the above basic factors can be combined with other factors, creating a range of possible plagiarism scenarios. Here, then, is a full list of possible scenarios, starting with the worst case:

  • Uncredited Verbatim Copying of a Full Paper, or Uncredited Verbatim Copying of a Major Portion (more than 50%) within a Single Paper--An instance is where a large section of the original paper is copied without quotation marks, credit notice, reference, and bibliography. This case also includes instances where different portions of a paper are copied without attribution from a number of papers by other authors, and the sum of plagiarized material is more than 50%, or Uncredited Verbatim Copying within More than a Single Paper by the Same Author(s)--This includes instances where more than one paper by the offending author(s) has been found to contain plagiarized content, and all the percentages of plagiarized material in each of the discovered papers sum to greater than 50%.
  • Uncredited Verbatim Copying of a Large Portion (greater than 20% and up to 50%) within a Paper--An instance is where a section of the original paper is copied from another paper without quotation marks, credit notice, reference, and bibliography. This case also includes instances where different portions of a paper are copied without attribution from a number of papers by other authors, and the sum of copying results in a large portion of plagiarized material (up to 50%) in the paper, or Uncredited Verbatim Copying within More than One Paper by the Same Author(s)--This includes instances where the sum of plagiarized material from the different papers would constitute the equivalent of a large portion (greater than 20% and up to 50%) of the discovered paper with the fewest words.
  • Uncredited Verbatim Copying of Individual Elements (Paragraph(s), Sentence(s), Illustration(s), etc.) Resulting in a Significant Portion (up to 20%) within a Paper--An instance could be where portions of an original paper are used in another paper without quotation marks, credit notice, reference, and bibliography.
  • Uncredited Improper Paraphrasing of Pages or Paragraphs. Instances of improper paraphrasing occur when only a few words and phrases have been changed or when the original sentence order has been rearranged; no credit notice or reference appears with the text.
  • Credited Verbatim Copying of a Major Portion of a Paper without Clear Delineation. Instances could include sections of an original paper copied from another paper; credit notice is used but absence of quotation marks or offset text does not clearly reference or identify the specific, copied material.
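
The thresholds in the list above can be sketched as a small decision function. This is only an illustration, not an IEEE tool: the `Finding` type, its field names, and the 1-to-5 numbering (position in the list, worst case first) are assumptions made for the example. The sketch also simplifies the last scenario by ignoring how large the credited portion is.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One suspected case. All field names are hypothetical."""
    copied_fraction: float          # share of the paper copied verbatim, 0.0 to 1.0
    credited: bool                  # a credit notice or reference is present
    delineated: bool                # quotation marks or offset text mark the copy
    improper_paraphrase: bool = False


def scenario(f: Finding) -> int:
    """Return the 1-based position in the list above (1 = worst), or 0 if none applies."""
    if not 0.0 <= f.copied_fraction <= 1.0:
        raise ValueError("copied_fraction must be between 0 and 1")
    if f.copied_fraction > 0 and not f.credited:
        if f.copied_fraction > 0.50:   # major portion or full paper
            return 1
        if f.copied_fraction > 0.20:   # large portion
            return 2
        return 3                       # individual elements, up to 20%
    if f.improper_paraphrase and not f.credited:
        return 4                       # uncredited improper paraphrasing
    if f.copied_fraction > 0 and f.credited and not f.delineated:
        return 5                       # credited, but copy not clearly delineated
    return 0


print(scenario(Finding(copied_fraction=0.35, credited=False, delineated=False)))
```

Note that a boundary value such as exactly 50% falls into the second scenario, matching the "greater than 20% and up to 50%" wording.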

The extreme and more obvious cases notwithstanding, the above scenarios provide us with some basic determining factors we can use when attempting to deal with allegations of plagiarism between authors.

  • Amount or quantity does not play a part in defining plagiarism. However, the amount of material plagiarized should play an important part in determining the appropriate corrective action.
  • Credit notices or references are not sufficient to deflect a charge of plagiarism if quotation marks or offset text have not been used to identify the specific material being copied.
  • Paraphrasing can leave an author open to a charge of plagiarism if he or she has changed only a few words or phrases or has only rearranged the original sentence order. Even a proper paraphrasing of the original text can lead to a charge of plagiarism if the original source is not properly cited.

Any discussion on a subject such as plagiarism must be founded on a few, basic ideas on which all can agree. A discussion will help refine our understanding, but we need to start with some accepted basics.

Consistency

One such idea, as already mentioned, is that plagiarism is plagiarism, regardless of the amount having been copied. However, scale is important, especially in trying to determine an appropriate corrective action. Introducing scale as an important consideration also brings the idea of "consistency" into the discussion. Until the "Guidelines for Adjudicating Different Levels of Plagiarism" had been developed and approved, there had not been any measure or method for linking "scale" with a corresponding corrective action so that consistent and fair judgments may be reached across all IEEE organizational units and over the years. Early in the discussion, consistency was seen as a critically important subject for the successful development of effective guidelines.

Are there valid exceptions to the rules against plagiarism in technical writing?

The fundamental nature of scientific/technical writing on and reporting of research results is that so much of it is closely based on the archival literature. Is it not required for new work to call upon and use the work that has already been published, at least in order to establish a necessary level of authentication and validation? New work depends on the very close and careful use of the archive. Therefore, are exceptions to be made for scientific/technical writing where the rules against plagiarism are concerned?

Similarly, some opinion has it that since technical writing is not "literary" writing, i.e., not at the level of Shakespeare, it is therefore acceptable to use a "certain amount" of someone else's text without having to indicate the specific text, especially when a citation or reference appears in the vicinity of the copied material. The same school of thought would argue that the use of quotation marks and/or indented text to signify the use of someone else's text would interrupt the flow of the writing, would interfere with the reader's comprehension of the work, especially since there would be, by necessity, so much of it (quote marks or indents). Again, should the nature of technical/scientific/archival writing allow exceptions to the proper use of, in this case, quotation marks and/or indented text?

Paraphrasing

Paraphrasing will always be a difficult area to adjudicate. Since plagiarism involves not only the unacknowledged reuse of someone else's words but also someone's ideas, it is possible to render a properly paraphrased section of text and still be open to a charge of plagiarism if proper credit for the idea has not been given. Even so, we should be able to agree that changing only a few words or phrases or only rearranging the original sentence order of another author's work will be defined as plagiarism.

Quetext

Recognizing & Avoiding Plagiarism in Your Research Paper

  • Posted on January 26, 2024

Plagiarism in research is unfortunately still a serious problem today. Research papers with plagiarism contain unauthorized quoting from other authors; the writer may even try to pass off others’ work as their own. This damages not only the individual’s reputation but also that of the entire class, school, or field, because one can never fully trust that the writer’s work is genuine. Naturally, you don’t want to contribute to that problem.

Unfortunately, plagiarism doesn’t have to be intentional to be damaging. College students and even professionals often fail to properly cite their sources for anything that isn’t common knowledge. While accidental plagiarism is more innocent, it is no less dangerous, as it can still get you in a great deal of academic trouble.

The good news is, as long as you put research integrity first, and do your plagiarism due diligence, you shouldn’t have anything to worry about.

Ready to learn all about research paper citation rules, and how to avoid getting caught in this trap? Let’s take a look.

What is Plagiarism in Research?

Plagiarism is the act of using someone else’s work or intellectual property without acknowledging what you’re doing. Before you can truly understand plagiarism rules, it’s critical to know what plagiarism is in academic writing. In a nutshell, work is considered plagiarized if it is not your own original work and is also not cited as someone else’s.

Plagiarism is unethical because it takes the blood, sweat, and tears of other writers and passes it off as yours, without real effort on your part. This can dilute someone else’s standing or lead to confusion down the line about where credit is due. If readers can’t tell whose work led to certain academic papers, data sets, or theories, the science and art worlds suffer.

Science writers, journalists, marketing experts, medical and dental researchers, and students, among others, have all been stung by misunderstanding these rules. Even a simple copy and paste without attribution or referencing the original author is enough to signal professional and/or academic dishonesty.

The bottom line is, if you’re using someone else’s text word-for-word, you absolutely must note where that work came from. This protects the ideas of others and upholds publication ethics for all of us. That means, of course, you need to spot any plagiarism red flags from the get-go.

What Types of Plagiarism Can Occur in a Research Paper?

Some of the most common forms of plagiarism that occur in research papers and other forms of academic writing are:

  • In-text citations, in parentheses, without a corresponding citation in a bibliography or works cited page, which means people have trouble finding the true source
  • Citing work incorrectly
  • Not following the prescribed citation style, whether that’s APA, MLA, or Chicago, making it difficult or impossible for others to find the source
  • Paraphrasing someone else’s work too closely without citing the source
  • Using data or statistics from someone else without a proper citation
  • Following the format of someone else’s work in a section or in the paper as a whole
  • Attributing research to the wrong person, such as cutting and pasting someone else’s quote and attributing it either to an incorrect author, or simply not providing attribution at all
  • Relying too heavily on just a few sources, meaning you are taking their ideas wholesale

The truth is, most of this plagiarism isn’t even on purpose. Indeed, unintentional plagiarism, where you don’t realize that you have committed it, is a major source of confusion in academia. Self-plagiarism, which is reusing your own work without citing it, poses a problem too. This is definitely considered plagiarism even though you are the original author. Although reusing your old work is allowed with citations, doing so without them passes off old work as original, which has two drawbacks:

  • You are not obeying the spirit of the assignment, which is to put in the time to create something new with your own ideas.
  • You create downstream confusion when people are searching for your work, which conflicts with the entire goal of citing sources.

Direct plagiarism also occurs; however, direct plagiarism is intentional. Intentional thievery is even worse because it is often disguised by the person committing it and therefore more harmful to the original author. Again, this leads to severe moral and ethical problems, as it dilutes the hard work of others. Considering the fact that it’s generally quite easy to detect direct plagiarism, it’s worthwhile for students to realize that committing plagiarism intentionally is never worth it.

In summary, there are many examples of plagiarism of which to be aware. All of these can lead to serious trouble if you’re not continually wary of the traps that lead to plagiarism in a research paper. Students should know that Blackboard and other online academic portals check for plagiarism. Professionals should know that serious plagiarizing can cost them licenses, grants, and standing among their peers.

In other words, it’s no joke. To avoid potential consequences, keep an eye out for the following plagiarism research paper warning signs.

Warning Signs of Plagiarism in a Research Paper

To avoid plagiarism, research papers must be free of uncited work that uses the ideas of others. That means indicating the original source every single time you use one, with a proper citation, in the correct style as dictated by your professor or industry.

While unintentional plagiarism can happen to anyone, knowing its signs can help students and professional authors realize when they need to rewrite or add a citation to their writing. That will help you stay on the good side of academia, respect others’ work and ensure your own work is always improving. When reviewing your paper, look for the following signs that you may have failed to cite sources properly.

Infrequent Use of Citations

If you simply don’t have very many citations in a long research paper, you are likely using the ideas of others without proper credit. Most well-researched 10-page papers use dozens of sources. That indicates that the author is weaving together others’ work to express their own ideas.

However, if your work contains only five or six citations, chances are you are relying too heavily on ideas that are not your own. This indicates that you need to search more carefully for ideas that belong to others in your writing and cite them. You should probably also seek additional sources to support your argument.

Using Words That You Don’t Normally Use

Any section of your paper that contains a smattering of words that don’t fit into your existing vocabulary hints that you’ve likely nabbed them from somewhere else. While it’s fine (and good!) to build your word bank, inserting non-typical words into your text is a good indication that you are also inserting the ideas of others without credit. Comb over such sections carefully to ensure you have properly credited the original writer.

Changes in Tone and Sentence Structure

As with words you don’t use, tone and sentence structure that is alien to your writing should be a red flag. Look carefully at these sections, asking yourself:

  • Are any of these sentences just reconstructions of someone else’s writing?
  • If I rewrote this idea from the ground up, would it sound different?
  • Is this the tone I’m even going for in this paper?

Changes in Font

A change in the font used in your research paper is a dead giveaway. It indicates clearly that you have copied and pasted something into your paper, be that from an outside source or your own previous work. If you spot such a section, you should either rewrite it or source it accurately, and be sure to change the font to match the rest of your paper.

Tips to Avoid Plagiarizing

Avoiding plagiarism is truly easy. Simply provide citations for all research and ideas that you didn’t create yourself, in the correct styles. These styles include APA, most common for science and medical writing; MLA, common for the arts; and Chicago Style, usually used for publishing. You can also use the following tips for beating a plagiarism checker:

  • Paraphrase the thoughts of others in your own words instead of copying their work verbatim. This reduces the chances that your work will pull up in a search ahead of theirs, which is the fair thing to do. Make sure that you don’t confuse paraphrasing with complete freedom to forgo citations, though, as both are important together.
  • Link your own ideas together using the ideas of others, but rely most heavily on your original work. Others’ thoughts and words should be used to support yours, not vice versa. Before you turn to sources for your paper, outline your own approach thoroughly. This will minimize the chances of unintended theft and maximize the impact of your contributions.
  • Always use quotation marks if you are using someone’s ideas word for word. Depending on the citation style you are using, you may instead use blocked and indented text to indicate a quote from someone else. Be sure to format your paper correctly, according to the style that has been assigned to you by a professor or superior.
  • Never use words you’re not familiar with. Not only can that lead to you expressing your ideas incorrectly, but it can also trigger plagiarism checkers if you haven’t made ideas your own.
  • Provide a full works-cited or bibliography page with every assignment you submit. Again, adhere to the citation style that was given to you, which will allow others to easily locate the sources you used. Make sure to properly cite sources in the text, footnotes, and at the end of your paper, as dictated by your style guide.
  • Be honest with professors or bosses. If you truly cannot finish something in time and are motivated to act unethically, resist the urge and take your concerns to the proper authority figure. Even turning in a botched assignment is far better for your reputation and your own ethics than using someone else’s work without the proper citations.
  • Use a citation checker to ensure that you haven’t ripped off someone else’s work without meaning to. This protects them, protects you, and protects academics as a whole.

Remember, as long as you go into an assignment with the intention to create something original that reflects your honest opinion, you will likely be fine. However, you do yourself a huge disservice if you don’t take that extra step and check your sources with a plagiarism checker.

Using Quetext To Avoid Plagiarizing in Research Papers

A citation generator can help professionals, researchers, and university students alike cite web pages, journal articles, books, newspapers, and more. With proper citation, you’ll never have to worry about accusations of plagiarism again.

Using the Quetext plagiarism checker before submitting the assignment can provide reassurance that no unauthorized quoting is taking place in your research paper. It will help you by flagging any spots in your paper that still require citations, such as a missing attribution for a book or original source.

Quetext is not only reliable, but it is also easy to use. If plagiarism of any kind is detected, the tool automatically generates the proper citation, in the required citation style, right inside the text. The citation generator will create the citations your paper needs in APA, MLA, or Chicago Style. All you have to do is enter the citation components, and voilà: your works cited page, bibliography, footnotes, and the paper as a whole will appear in the proper style.

The tool works whether the source is private or published, personal, academic, professional, or anything else. Now you can be sure to honor others’ work and avoid any negative consequences from plagiarised work. This will keep you in good standing with academic institutions and free you from any shadow of scientific misconduct.

As long as you make the effort to do your own work, respect your school’s academic integrity and use a plagiarism checker, you should have nothing to worry about. Don’t wait any longer to get peace of mind … start today.


Free Online Plagiarism Checker

The plagiarism score is a weighted average of all matches in your text. For example, if half of your paper is 100% plagiarized, your score would be 50%.
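
The weighted average described above can be worked through in a few lines. This is a minimal sketch, not the checker's actual formula: it assumes each match is weighted by its length in words, that matches do not overlap, and that similarity is given as a fraction between 0 and 1. The function name `plagiarism_score` is hypothetical.

```python
def plagiarism_score(total_words, matches):
    """Length-weighted average similarity, as a percentage of the whole text.

    matches: iterable of (matched_words, similarity) pairs, where similarity
    is a fraction in [0, 1]. Matches are assumed not to overlap.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    weighted = sum(words * similarity for words, similarity in matches)
    return 100.0 * weighted / total_words


# Half of a 1000-word paper matches a source at 100% similarity.
print(plagiarism_score(1000, [(500, 1.0)]))
```

Under these assumptions the example reproduces the 50% figure from the text; unmatched words simply contribute zero to the sum.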


Get a 100% accurate report from an advanced AI-powered writing assistant. Our plagiarism checker works with all common file formats.

  • Deep Search
  • Check in real time
  • Data Safety

How to avoid plagiarism?

Proper citation style.

Avoid plagiarism by always listing the source and formatting it correctly when you are note-taking. Take care of the proper formatting and citation style when using content from outside sources.

Write on your own

Avoid borrowing and overusing large pieces of the content from outside sources, especially from Wikipedia. Write your own thoughts and use sources only to support your opinion (remember to cite it though!).

Rewriting Service

A PapersOwl expert can rewrite up to 75% of your content, edit and proofread your paper to make it plagiarism free and ready to use.

Editing Service

A PapersOwl expert can edit up to 50% of your content, proofread and polish your paper to make it plagiarism free and ready to use.

Writing Service

A PapersOwl expert can rewrite your paper from scratch according to instructions and guidelines and make it plagiarism free and ready to use.

Plagiarism Checker Review

Get speed and uniqueness when you use the free PapersOwl plagiarism checker that accepts an unlimited word count compared to other platforms.

Features compared (any plagiarism checker vs. the PapersOwl plagiarism checker):

  • Free
  • 100% uniqueness
  • High-quality check
  • Swift Check
  • Identify original sources
  • No word limit
  • Available 24/7

Online Plagiarism Checker For Students

Writing an academic paper can be challenging when you’re not sure if it’s original enough to pass a plagiarism check. Of course, students take information from various sites before writing their own text. Sometimes certain parts end up very similar to the sources, which can make a professor think the work was simply copied from somewhere. That’s why it’s crucial for any modern college or university student to ensure that their work has 100% original content to maintain academic integrity.

Luckily, a free plagiarism checker online can solve this issue quickly and easily. Many cheap essay writing services use a plagiarism checker for research papers. However, students sometimes forget that they should too. But with so many options that pop up when you ask Google to “check my paper for plagiarism”, how do you choose the right one for detection? We’ve got the solution in the form of PapersOwl’s free plagiarism checker tool! Our simple tool makes it convenient to check any writing task without having to spend a dime. It works quickly and highly accurately, ensuring that you get the top grade you deserve. So, if you want to check plagiarism online before turning your task in, head over to our website and get started!

Accurate Check for Plagiarism with Percentage

Many students wishing to produce original content aren’t quite sure how to get an exact percentage of plagiarised text in their work. This percentage is important since many universities have a certain limit of non-unique words you can have in your essay for it to be considered okay. If your plagiarism search doesn’t give you the exact percentage, you can’t be sure if your assignment will go through or not.

When using a free plagiarism tool, it’s essential to have this data provided to you. Only when you have it can you decide which parts to change and which ones to chuck out to achieve your desired results. Plagiarized content is a big issue in modern educational institutions, so getting reliable and trustworthy results is vital. This is the most essential requirement when you check plagiarism.

PapersOwl’s plagiarism detection tool gives you all the information you need to fix plagiarized content. Whether you’ve fallen victim to accidental plagiarism or have tried to make your life easier by copying some text from different sources, you’ll get an accurate percentage with our plagiarism checker online. If you’re wondering how to check a paper for plagiarism, it’s nothing complicated at all! Simply visit our site, paste your whole essay into the relevant text box or upload the text file, click on Check For Plagiarism, and you’ll get accurate plagiarism results in a matter of seconds. You’ll see the problematic parts highlighted, with links to where similar content exists. Our service also gives you the option to check your essay for plagiarism and then hire a professional paper writer to fix your task quickly if you're busy with other things!

The Fastest Plagiarism Checker Online

Gaining insight into duplicate content only works if you get your results quickly. There are many free plagiarism tools online that promise to do the job for you. However, a lot of them are clunky, slow, and inaccurate. How can you produce original work without similarity detection you can trust?

PapersOwl stands out in this regard because it will detect plagiarism in seconds. This is a plagiarism scanner that’s able to perform a Swift Check to give you a uniqueness check right there and then. It also conducts a Deep Search, going through millions of sources on the internet to check for plagiarism. A document of about 1500 words takes only about 10 seconds to get processed! You get a clear plagiarism score of how much text is plagiarized and how much is original. All the sources that your essay matches are listed based on how much similarity there is in your academic writing. And on top of that, you get a handy Make It Unique button that’ll take you to an order page where you can ask our expert writers to rewrite your work and make it 100% unique.

All of this is done almost instantly, allowing students to continue doing assignments without missing a beat. Not every plagiarism detection software works this quickly, making ours the best one you’ll ever use.

Plagiarism Checker Helps Boost Your Grade

A lot of students make the mistake of considering their papers automatically free from plagiarism. After all, they’ve written it themselves, so how could it be problematic? What they don’t realize is that it’s very easy to borrow some information mistakenly. Turning such a paper in can cause multiple problems, as your professor might think you haven’t done the work at all.

That is why you should always use a plagiarism scanner to test for plagiarized content in your college papers. Our online plagiarism checker for students is designed for this exact purpose. A simple, free plagiarism check could help you check plagiarism, fix any mistakes you see, and submit high-quality text that no one will question.

Our plagiarism detector has a lot going for it. It makes plagiarism detection easier than ever before. Instead of copying and pasting each passage individually into Google, simply upload the whole file into our free plagiarism checker for students, and you don’t have to do anything else. All the matches are highlighted so you know what to change.

The plagiarism test will give you a uniqueness percentage too. This will help you figure out where you stand and how much time you need to adjust anything if required. So, using our free online checker to review your writing is essential. This way, you’ll submit the task only when you’re sure it meets the level of uniqueness required by your school. As a result, your grades can improve when you check for plagiarism.

Free Tools for Writing

PapersOwl is a well-known provider of all types of academic papers.

  • Research paper
  • Dissertation

and many more

  • Stuck with a lot of homework assignments?
  • Worried about making your work 100% plagiarism free?
  • Looking for writing help at an affordable price?

How Does Plagiarism Checker Work?

  • If you already have a completed text, just copy and paste the whole thing into the input box of the chosen plagiarism tool or website, choose suitable settings (if any), then press “check for plagiarism”. It is quite simple and takes just a few moments.
  • Once you have pressed “check for plagiarism”, the system will analyze your text and compare it with different sources to find similarities. As a rule, the duration depends on the text’s length. A standard free online plagiarism checker with percentage can give you the result within five minutes or less.
  • When the system finishes, you will be taken to the report page, which contains a comprehensive report on your work, a percentage of its uniqueness, and a list of sources in which similarities were detected. Often, such tools also highlight the overlaps that were found.
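
The compare-and-report steps above can be illustrated with a toy word n-gram comparison. This is a sketch under stated assumptions, not how any particular commercial checker works: real tools are proprietary and search far larger indexes. The function names, the choice of 3-word shingles, and the report layout are all hypothetical.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_report(submission, sources, n=3):
    """Compare a submission with named source texts.

    Returns a uniqueness percentage and a list of (source_name, percent of
    the submission's n-grams also found in that source), highest first.
    """
    sub = word_ngrams(submission, n)
    if not sub:  # too short to form a single n-gram
        return {"uniqueness": 100.0, "matches": []}
    matched = set()
    per_source = []
    for name, text in sources.items():
        shared = sub & word_ngrams(text, n)
        if shared:
            per_source.append((name, 100.0 * len(shared) / len(sub)))
            matched |= shared
    per_source.sort(key=lambda item: item[1], reverse=True)
    uniqueness = 100.0 * (1 - len(matched) / len(sub))
    return {"uniqueness": uniqueness, "matches": per_source}


report = overlap_report(
    "The quick brown fox jumps over the lazy dog",
    {"a": "A quick brown fox jumps high", "b": "nothing in common here"},
)
print(round(report["uniqueness"], 1), report["matches"])
```

Exact n-gram matching like this only catches verbatim overlap; paraphrased passages would pass undetected, which is one reason results differ between tools.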

As you can see, it is simple. However, for the best and most reliable result, you have to be careful. There are tons of programs and online tools that can be used, but keep in mind that many of them work differently and not all are good for you. To be confident in the accuracy of the result, you need to select the best plagiarism checker, because only professional, high-quality software can detect all similarities and give you a reasoned assessment.

Polish your paper and get rid of plagiarism!

We’ll change up to 75% of your paper, edit and proofread it.

  • Reliable Editors
  • Any Field of Study
  • Fair Prices

Free Plagiarism Checker is rated 4.9/5 based on 772 user reviews.

Want your voice to count in? Send us your review with all the details.

Advantages Of Plagiarism Checker By PapersOwl

Why choose us? Our service offers a professional online plagiarism checker with report that will provide you with a comprehensive report to make you confident in the 100% uniqueness of your paper. Our free plagiarism checker for students guarantees the best check and here are the key advantages of using our tool that prove this:

You don’t need to pay anything to check your paper for plagiarism because we know the value of original and unique works.

One of the main benefits of our online plagiarism checker is that it works so fast that you will not even have time to make yourself a cup of coffee while it analyzes your text, and it is safe!

We use the latest algorithms and software to provide you with an advanced check and to help you produce high-quality papers.

It is simple in use and won’t take much time!

Many students have already confirmed that our free tool is a great and convenient feature that helped them detect and fix errors that could lead to a failure. With us, you will no longer need to look for a different scanner!


Plagiarism: Causes and Possible Solutions

  • January 2021
  • SSRN Electronic Journal

Reshad Rayhan at Universiti Teknologi Malaysia

  • Universiti Teknologi Malaysia

Anas Amer at Universiti Teknologi Malaysia

  • Open access
  • Published: 23 August 2024

Knowledge and practices of plagiarism among journal editors of Nepal

  • Krishna Subedi (ORCID: orcid.org/0000-0001-5409-1751) 1,
  • Nuwadatta Subedi 2 &
  • Rebicca Ranjit 3

Research Integrity and Peer Review volume 9, Article number: 9 (2024)

This study was conducted to assess the knowledge and ongoing practices of plagiarism among the journal editors of Nepal.

This web-based, questionnaire-based analytical cross-sectional study was conducted among journal editors working across various journals in Nepal. All journal editors from NepJOL-indexed journals in Nepal who provided e-consent were included in the study using a convenience sampling technique.

A final set of questionnaires was prepared using Google Forms, including six knowledge questions, three practice questions (with subsets) for authors, and four (with subsets) for editors. These were distributed to journal editors in Nepal via email, Facebook Messenger, Viber, and WhatsApp. Reminders were sent weekly, up to three times.

Data analysis was done in R software. Frequencies and percentages were calculated for the demographic variables, correct responses regarding knowledge, and practices related to plagiarism. Independent t-test and one-way ANOVA were used to compare mean knowledge with demographic variables. For all tests, statistical significance was set at p  < 0.05.

A total of 147 participants completed the survey. The mean age of the participants was 43.61 ± 8.91 years. Nearly all participants were aware of plagiarism, and most had heard of both Turnitin and iThenticate. Slightly more than three-fourths correctly identified that citation and referencing can avoid plagiarism. The overall mean knowledge score was 5.32 ± 0.99, with no significant differences across demographic variables.

As authors, 4% admitted to copying sections of others' work without acknowledgment and reusing their own published work without proper citations. Just over one-fifth did not use plagiarism detection software when writing research articles. Fewer than half reported that their journals used authentic plagiarism detection software.

Four-fifths of the participants suspected plagiarism in the manuscripts assigned through their journal. Three out of every five participants reported the plagiarism they found in a manuscript to the respective authors. Nearly all participants believe every journal must have plagiarism-detection software.

Conclusions

Although journal editors' knowledge and practices regarding plagiarism appear to be high, they are still not satisfactory. It is strongly recommended that journals use authentic plagiarism detection software and that editors be adequately trained and keep their knowledge of it up to date.


Introduction

With the rise in the number of publications, misconduct in research is increasing, which is a global threat to evidence-based research [ 1 ]. The National Academy of Sciences in the United States (US) in 1992 defined misconduct in science as “fabrication, falsification, or plagiarism, in proposing, performing, or reporting research” [ 2 ]. Plagiarism is possibly the most serious and widely recognized violation of ethical standards [ 3 ].

The World Association of Medical Editors has defined plagiarism as the “use of others' published and unpublished ideas or words (or other intellectual property) without attribution or permission, and presenting them as new and original rather than derived from an existing source” [ 4 ]. The US Office of Research Integrity (ORI) defined plagiarism as “both the theft or misappropriation of intellectual property and the substantial unattributed textual copying of another's work. This does not pertain to authorship or credit disputes” [ 5 ]. Self-plagiarism occurs when an author reuses sections of their previous writings on the same subject in another publication without providing proper citation using quotation marks [ 4 ].

Poor journal quality and lack of education regarding plagiarism are two of the many reasons for plagiarism [ 6 ]. To overcome this problem, software (iThenticate, Turnitin, Grammarly, PlagScan, Plagiarism Scanner, etc.) has been developed to detect plagiarism [ 7 , 8 ].

Though the exact prevalence of plagiarism in Nepal is not known, several incidents related to plagiarism across universities have been reported [ 9 ]. Seven researchers, including professors and PhD students, were penalized after plagiarism was detected in Nepal [ 10 ].

To date, no published literature is available on the knowledge and practices of editors regarding plagiarism in Nepal. Therefore, this study was conducted to assess the knowledge and ongoing practices of plagiarism among the journal editors of Nepal.

Study design, setting, and participants

This was a web-based analytical cross-sectional questionnaire-based study conducted among journal editors working across various journals in Nepal. The data collection was done from 1st December 2023 to 30th April 2024.

All Nepali journals listed in Nepal Journal Online (NepJOL) with available Email IDs of the editorial team on their website and journals that have updated their website after 2020 were included. All journal editors from NepJOL-indexed journals in Nepal who provided e-consent were included in the study using a convenience sampling technique.

Data collection technique

Demographic characteristics including age, sex, education, province, duration of working in the journal, and number of publications were recorded.

The questionnaires included knowledge and self-reported practice components. The knowledge component included ten items, some taken from previous research [ 11 ] and some prepared by the authors. The self-reported practice component covered practice as an author (six items) and practice as a journal editor (four items). Content validity of the questionnaire was assessed by sending the questions to five experts. Lynn indicated that at least three experts are required and that five experts provide a sufficient level of agreement, whereas using more than 10 experts is of no use in calculating content validity [ 12 ]. Each member of the panel was asked to rate every item on three scales [ 13 ]. The essentiality scale asked whether the skill (or knowledge) measured by the item is essential for measuring knowledge and practice of plagiarism among journal editors: 1 = Not essential; 2 = Useful but not essential; 3 = Essential. The relevance scale was: 1 = Not relevant; 2 = Somewhat relevant (need some revision); 3 = Quite relevant (need minor revision); 4 = Very relevant. The clarity scale was: 1 = Not clear; 2 = Item needs some revision; 3 = Very clear.

Content validity index (CVI): CVI is the most widely reported approach for content validity in instrument development and can be computed using the Item-CVI (I-CVI). I-CVI is computed as the number of experts giving a rating of “very relevant” for each item divided by the total number of experts. Values range from 0 to 1: the item is relevant if I-CVI > 0.79, the item needs revision if I-CVI is between 0.70 and 0.79, and the item is eliminated if the value is below 0.70 [ 14 , 15 ]. Items with an I-CVI ≥ 0.78 are considered to have excellent content validity [ 15 , 16 ].
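The I-CVI rule described above can be sketched in a few lines. This is a minimal illustration that follows the definition given in the text (share of experts rating an item 4, "very relevant"); the function names and the expert ratings are hypothetical and are not data from the study.

```python
def item_cvi(ratings, very_relevant=4):
    """I-CVI: number of experts rating the item 'very relevant' / number of experts."""
    return sum(1 for r in ratings if r == very_relevant) / len(ratings)


def classify_item(i_cvi):
    """Apply the cut-offs quoted in the text."""
    if i_cvi > 0.79:
        return "relevant"
    if i_cvi >= 0.70:
        return "needs revision"
    return "eliminate"


# Hypothetical relevance ratings (1-4) from a five-expert panel.
expert_ratings = {
    "item_1": [4, 4, 4, 4, 4],
    "item_2": [4, 4, 4, 4, 3],
    "item_3": [4, 3, 4, 2, 3],
}

results = {
    item: (item_cvi(ratings), classify_item(item_cvi(ratings)))
    for item, ratings in expert_ratings.items()
}
for item, (score, decision) in results.items():
    print(f"{item}: I-CVI = {score:.2f} -> {decision}")
```

With five experts the I-CVI can only take the values 0, 0.2, 0.4, 0.6, 0.8, and 1.0, so a single expert rating an item below "very relevant" already lowers it to 0.8.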

Questions were distributed to five experts for content validation through email. The experts chosen were highly knowledgeable in research and plagiarism and had experience working as editors for both national and international journals. Experts provided their opinions via email, and their responses were analyzed for the I-CVI. Two questions from the knowledge section and three questions from the practice as an author section were removed as their I-CVI score was less than 1.0. Therefore, the final questionnaire included eight questions for knowledge, three questions (with subsets) for practice as an author, and four questions (with subsets) for practice as an editor. Two questions from the knowledge section (1. Are you aware of plagiarism? 2. Have you heard about any plagiarism detection software?) were moved to the demographic section as they could not measure knowledge. Therefore, a total of six questions remained in the knowledge section. Each of the six questions had a single correct answer, coded as one for correct and zero for incorrect. An overall composite score was then calculated by summing the individual scores for each question. The highest possible knowledge score for each individual was six.
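The scoring rule can be sketched as follows. This is an illustration only: the answer key, the question identifiers, and the three example participants are hypothetical placeholders, since the text does not list the actual items or responses.

```python
from statistics import mean, stdev

# Hypothetical answer key: six knowledge questions, one correct answer each.
ANSWER_KEY = {f"q{i}": "agree" for i in range(1, 7)}


def knowledge_score(responses, answer_key=ANSWER_KEY):
    """Score 1 per correct answer and 0 otherwise, then sum (maximum 6)."""
    return sum(1 for q, correct in answer_key.items() if responses.get(q) == correct)


all_correct = {q: "agree" for q in ANSWER_KEY}
participants = [
    all_correct,                                          # scores 6
    {**all_correct, "q3": "disagree"},                    # scores 5
    {**all_correct, "q1": "disagree", "q6": "disagree"},  # scores 4
]

scores = [knowledge_score(p) for p in participants]
print("Scores:", scores)
print(f"Mean ± SD: {mean(scores):.2f} ± {stdev(scores):.2f}")
```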

The prepared questionnaires underwent pilot testing among journal editors of a medical journal to assess readability and comprehension. Items in the questionnaire that were found to be confusing to the editors were subsequently revised.

The final questionnaires were prepared using Google Forms and sent via email, Facebook Messenger, Viber, and WhatsApp to the various journal editors in Nepal. A total of 396 journals were listed in NepJOL. Of these, 16 were no longer being published, 12 had not updated their journal since 2020, two had changed their name, and 60 had no contact list on their website as of March 15, 2024. Therefore, a total of 306 journals were selected, and 497 editors were contacted using their email IDs. For some journals, only the email IDs of the Editor-in-Chief and/or managing editors were available, not those of the whole editorial team. In such cases, an email was sent to the designated address with a request to circulate the link to the editorial team members. NepJOL is a comprehensive database that features journals published in Nepal across various academic disciplines. All materials on NepJOL are freely available for viewing, searching, and browsing. However, the copyright of all content is retained by the journals or authors. This resource is managed by the Tribhuvan University Central Library and hosted by Ubiquity Press [ 17 ].
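As a quick check of the journal selection arithmetic reported above, the counts can be tallied as follows. The figures come from the text; the dictionary labels are shorthand introduced here for illustration.

```python
listed_journals = 396

# Exclusion counts as reported in the text.
exclusions = {
    "no longer being published": 16,
    "not updated since 2020": 12,
    "changed name": 2,
    "no contact list on website": 60,
}

selected_journals = listed_journals - sum(exclusions.values())
print(f"Excluded: {sum(exclusions.values())}, selected: {selected_journals}")
```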

The questionnaire was sent a maximum of three times, once a week, as a reminder. Editors who did not respond even after three reminders were not considered in the analysis.

Dependent Variables: Knowledge and practice of journal editors.

Independent Variables: Sex, role in a journal, working province, working experience in journal (in years), and number of publications.

Ethical consideration and informed consent

Ethical clearance was obtained from the Gandaki Medical College Institutional Review Committee (ref no: 08/080/081-F). Electronic informed consent was taken from all participants before starting the survey. The survey was anonymous, and confidentiality was ensured.

Statistical analysis

All data in the Microsoft Excel spreadsheet linked to the online survey Google form was imported into R. The frequencies and percentages were calculated for background characteristics, knowledge, and practice scores of plagiarism. Independent t-test and one-way ANOVA were used to compare mean knowledge with demographic variables. For all tests, statistical significance was set at p  < 0.05.
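The two tests named above can be illustrated with a small sketch of their test statistics. The study ran its analysis in R; the Python below is only an illustration, the equal-variance (pooled) form of the t-test is an assumption because the text does not say which variant was used, and the scores are hypothetical. P-values are not computed here because they require the t and F distributions (for example via `scipy.stats.ttest_ind` and `scipy.stats.f_oneway`).

```python
from statistics import mean


def pooled_t_statistic(a, b):
    """Independent two-sample t statistic, assuming equal variances."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


def anova_f_statistic(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical knowledge scores (0-6) for two demographic groups.
group_a = [6, 5, 5, 6, 4, 5]
group_b = [5, 6, 6, 4, 5, 6]

t = pooled_t_statistic(group_a, group_b)
f = anova_f_statistic([group_a, group_b])
print(f"t = {t:.4f}, F = {f:.4f}")
```

For exactly two groups the ANOVA F statistic equals the square of the pooled t statistic, which is a convenient sanity check; ANOVA becomes necessary when a variable such as province has more than two levels.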

The reliability of the factors and scales was based on I-CVI value.

A total of 147 participants completed the survey with a response rate of 29.58% (147/497). The mean age of the participants was found to be 43.61 ± 8.91 (ranging from 22.0 to 67.0) years. More than two-thirds of the participants were male. Bagmati province accounted for over half of the participants, while Madhesh province represented less than 3%. Just over half of the participants had completed master's level education. Approximately half comprised the editorial team members. Slightly more than half of the participants were affiliated with biomedical journals. More than six out of every ten participants had published 10 or more research articles. Nearly all participants were aware of plagiarism, and the majority had heard of both plagiarism software: Turnitin and iThenticate (Table  1 ).
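The reported response rate follows directly from the two counts given above; a one-line check, using only figures stated in the text:

```python
completed = 147   # participants who completed the survey
contacted = 497   # editors contacted by email

response_rate = 100 * completed / contacted
print(f"Response rate: {response_rate:.2f}%")
```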

The majority of participants correctly answered questions about plagiarism, with almost everyone agreeing that plagiarism can be a severe form of ethical misconduct. Additionally, slightly more than three-fourths of participants correctly identified that citation and referencing can be used to avoid plagiarism (Table  2 ).

As authors, 4% reported having copied and pasted a section of someone else's work without acknowledgment and quotation, and a similar proportion had reused their own published work without proper citations and references. Just over one-fifth of the participants did not use plagiarism detection software when writing research articles. Among those who did use such software, two-fifths used freely available online tools, while nearly a quarter used Turnitin, and another quarter used iThenticate (Table  3 ).

Fewer than half of the participants indicated that the journals they worked for used authentic plagiarism detection software. Among them, almost half reported that their journal used iThenticate. Almost 18% did not mention the name of the software their journal was using.

Four-fifths of the participants suspected plagiarism in the manuscripts assigned through their journal. Three out of every five participants reported the plagiarism they found in a manuscript to the respective authors. Nearly all participants believe it is necessary for every journal to have plagiarism detection software (Table  4 ).

The overall mean knowledge score of the participants was 5.32 ± 0.99. No significant difference was found in mean knowledge across various demographic variables (Table  5 ).

This study is unique compared to others on similar topics because it exclusively involves journal editors, whereas previous studies have not focused specifically on this group.

The reason for not conducting similar studies on journal editors might be the assumption that editors are already well aware of plagiarism, making it seem unnecessary to study their knowledge on the topic.

However, the authors of this study believe that not all editors and journals may be fully informed about plagiarism, and even if they are aware, they may not be practicing proper plagiarism control. It is crucial for those in central roles to thoroughly understand and implement anti-plagiarism measures. This ensures they can identify and minimize plagiarism in manuscripts submitted to their journals.

Due to a lack of similar studies, comparisons are made with the few available studies. A study conducted by Smart et al. among journal editors found that 2–5% of submitted manuscripts were plagiarized [ 18 ].

The results of the study showed that overall knowledge and practice related to plagiarism appear to be high.

Bagmati province accounted for over half of the participants, while Madhesh province had less than 3%. Bagmati Province is the most populous in Nepal, and most developmental and research activities are highly centralized there compared to other provinces. Additionally, Kathmandu, the capital city of Nepal, is located in Bagmati Province, where a larger number of journals and editors are based. This could explain the higher number of participants from this province. The lack of personal communication with editors from other provinces might be another contributing factor.

Nearly one in seven participants disagreed that using others' images or videos without receiving proper permission or providing appropriate citations is plagiarism. While this number may seem low in general, it is relatively high for journal editors. Journal editors should be well-trained and regularly updated on issues of plagiarism.

Almost 15% of the participants disagreed that paraphrasing or quoting can be used to avoid plagiarism, which is higher than in a study by Phyo et al. [ 11 ]. The reason may be that most of the editors have completed master's or Ph.D. courses and have already done research, whereas the study by Phyo et al. involved postgraduate students.

More than one-fifth of the participants disagreed that citation and referencing can be used to avoid plagiarism, which is lower than in the study by Phyo et al. [ 11 ].

Around one in eleven disagreed that plagiarism detection software can be used to avoid or detect plagiarism, which is lower than in the study by Phyo et al. [ 11 ]. This supports the authors' opinion that not all editors are fully aware of or trained in handling plagiarism. Therefore, it is crucial for all journal editors to receive training and updates on plagiarism to effectively manage manuscripts and check for plagiarism. Another reason may be the accuracy of software detection. Some software may not accurately detect plagiarism and can incorrectly flag properly cited and referenced material as non-original content [ 19 ].

Almost 5% disagreed that authors reusing their previously written work or data in a ‘new’ written article without citation and referencing is plagiarism. This percentage is lower compared to university students, where one-quarter of the participants did not know that self-plagiarism is considered plagiarism [ 20 ].

Almost all agreed that plagiarism can be a very serious form of ethical misconduct. It is universally acknowledged that plagiarism is a serious ethical misconduct. Authors should be fully aware of this before writing a research manuscript to minimize or avoid instances of plagiarism.

Practice as an author

Almost 4% had copied and pasted a section of someone else’s work without acknowledgment and quotation, and a similar proportion had reused their own work published in one journal without proper citations and references. There are no directly comparable studies. However, a study by Gupta et al. [ 21 ] reported that slightly less than one-fifth of the participants, who were editors and researchers, had published articles containing copied parts.

Just over one-fifth of the participants did not use plagiarism detection software when writing research articles, which is similar to the study by Gupta et al. [ 21 ], where one-fourth of the participants did not use any form of plagiarism detection software.

Practice as an editor

Fewer than half of the participants indicated that the journals they worked for used authentic plagiarism detection software. It is crucial for every journal to use authentic plagiarism detection software, as freely available online tools may not accurately detect all instances of plagiarism [ 22 ]. Cost may be a factor in choosing plagiarism detection software. Individuals can use freely available tools cautiously, but it is always recommended that journals or institutions use authentic, reliable software.

Four-fifths of them suspected plagiarism in the manuscripts assigned to them, which is higher than the findings of Smart et al., where just under two-thirds reported experiencing some plagiarized submissions. The larger percentage in this study may be because participants only suspected plagiarism, while in the study by Smart et al., they reported confirmed cases of plagiarism [ 18 ]. This indicates that a significant number of manuscripts were suspected of plagiarism. To confirm these suspicions, reliable software should be used before corresponding with the authors.

Three out of every five participants reported the plagiarism they found in a manuscript to the respective authors. It is recommended to report detected plagiarism to both the author and the journal. Failure to do so can harm the author’s career and damage the journal’s reputation.

The primary reason that not all editors were well-informed about plagiarism may be that they were trained in editorial processing but did not receive specific training on plagiarism.

Limitations

Due to the use of convenience sampling and social media for data collection, the survey may have primarily attracted participants who were genuinely interested and had better knowledge. Those with less knowledge might not have participated, potentially leading to over-reporting. Social desirability bias could also have occurred, which may have led to more positive responses to both the knowledge and the practice questions. Since this study includes only journal editors from Nepal, its findings cannot be generalized beyond the country. However, because the study participants include editors working in various areas of Nepal and covering a wide range of disciplines, the results could be generalized to the Nepalese population.

Although journal editors' knowledge and practices regarding plagiarism appear to be high, they are still not satisfactory. It is strongly recommended that journals use authentic plagiarism detection software and that editors be adequately trained and keep their knowledge of it up to date. Authors should also be aware of plagiarism and its consequences when writing and submitting a research manuscript to a journal.

Availability of data and materials

Data will be made available upon reasonable request to the corresponding author (Krishna Subedi).

Abbreviations

CVI: Content validity index

NepJOL: Nepal Journal Online

ORI: Office of Research Integrity

US: United States

References

1. Farthing MJG. Research misconduct: a grand global challenge for the 21st century: research misconduct. J Gastroenterol Hepatol. 2014;29(3):422–7.

2. National Academy of Sciences. Responsible science: ensuring the integrity of the research process: volume I. Washington, DC: National Academy Press; 1992. Available from: https://www.ncbi.nlm.nih.gov/books/NBK234523/pdf/Bookshelf_NBK234523.pdf . Accessed 9 Aug 2023.

3. Roig M. Avoiding plagiarism, self-plagiarism, and other questionable writing practices: a guide to ethical writing. The Office of Research Integrity (ORI); 2015. Available from: https://www.cse.msu.edu/~alexliu/plagiarism.pdf . Accessed 9 Aug 2023.

4. World Association of Medical Editors. Recommendations on publication ethics policies for medical journals. Available from: https://wame.org/recommendations-on-publication-ethics-policies-for-medical-journals . Accessed 10 Aug 2023.

5. The Office of Research Integrity. ORI policy on plagiarism. Available from: https://ori.hhs.gov/ori-policy-plagiarism . Accessed 10 Aug 2023.

6. Roka YB. Plagiarism: types, causes and how to avoid this worldwide problem. Nepal J Neurosci. 2017;14(3):2–6. Available from: https://pdfs.semanticscholar.org/2ab2/361ea075a3f5db8456b4414505077b9f6c69.pdf . Accessed 11 Aug 2023.

7. Chowdhury HA, Bhattacharyya DK. Plagiarism: taxonomy, tools and detection techniques. arXiv preprint arXiv:1801.06323. 2018. Available from: https://arxiv.org/ftp/arxiv/papers/1801/1801.06323.pdf . Accessed 11 Aug 2023.

8. Khaled F, Al-Tamimi MS. Plagiarism detection methods and tools: an overview. Iraqi J Sci. 2021:2771–83. Available from: https://www.iasj.net/iasj/download/32f83e0c6cbbc13c . Accessed 11 Aug 2023.

9. Roka YB. The recent trend of plagiarism in Nepal. Nepal J Neurosci. 2017;14(3):1–1.

10. Retraction Watch. Seven barred from research after plagiarism, duplications in eleven papers. Available from: https://retractionwatch.com/2021/04/05/retired-professor-banned-from-research-after-plagiarism-duplications-in-eleven-papers/ . Accessed 2 Aug 2024.

11. Phyo EM, Lwin T, Tun HP, Oo ZZ, Mya KS, Silverman H. Knowledge, attitudes, and practices regarding plagiarism of postgraduate students in Myanmar. Account Res. 2022;10:1–20.

12. Lynn MR. Determination and quantification of content validity. Nurs Res. 1986;35(6):382–6.

13. Lawshe CH. A quantitative approach to content validity. Pers Psychol. 1975;28(4):563–75. Available from: https://parsmodir.com/wp-content/uploads/2015/03/lawshe.pdf . Accessed 6 Aug 2023.

14. Zamanzadeh V, Ghahramanian A, Rassouli M, Abbaszadeh A, Alavi-Majd H, Nikanfar AR. Design and implementation content validity study: development of an instrument for measuring patient-centered communication. J Caring Sci. 2015;4(2):165–78.

15. Rodrigues IB, Adachi JD, Beattie KA, MacDermid JC. Development and validation of a new tool to measure the facilitators, barriers and preferences to exercise in people with osteoporosis. BMC Musculoskelet Disord. 2017;18(1):540.

16. Shi J, Mo X, Sun Z. [Content validity index in scale development]. Zhong Nan Da Xue Xue Bao Yi Xue Ban. 2012;37(2):152–5.

17. Nepal Journals Online. About. Available from: https://www.nepjol.info/index.php/index/about . Accessed 2 Aug 2024.

18. Smart P, Gaston T. How prevalent are plagiarized submissions? Global survey of editors. Learned Publishing. 2019;32(1):47–56.

19. Anson CM, Kruse O. Plagiarism detection and intertextuality software. In: Digital writing technologies in higher education: theory, research, and practice. Cham: Springer International Publishing; 2023. p. 231–243.

20. Clarke O, Chan WYD, Bukuru S, Logan J, Wong R. Assessing knowledge of and attitudes towards plagiarism and ability to recognize plagiaristic writing among university students in Rwanda. High Educ. 2023;85(2):247–63.

21. Gupta L, Tariq J, Yessirkepov M, Zimba O, Misra DP, Agarwal V, et al. Plagiarism in non-anglophone countries: a cross-sectional survey of researchers and journal editors. J Korean Med Sci. 2021;36(39):e247.

22. Anil A, Saravanan A, Singh S, Shamim MA, Tiwari K, Lal H, et al. Are paid tools worth the cost? A prospective cross-over study to find the right tool for plagiarism detection. Heliyon. 2023;9(9):e19194.

Acknowledgements

The authors acknowledge all the participants.

Author information

Authors and affiliations.

Department of Community Dentistry, Gandaki Medical College Teaching Hospital and Research Centre, Pokhara, Nepal

Krishna Subedi

Department of Forensic Medicine, Gandaki Medical College Teaching Hospital and Research Centre, Pokhara, Nepal

Nuwadatta Subedi

Department of Periodontics, Gandaki Medical College Teaching Hospital and Research Centre, Pokhara, Nepal

Rebicca Ranjit


Contributions

KS participated in selecting the research title, conducted statistical analysis, contributed to the study design, and drafted the manuscript. KS, NS, and RR conducted the studies, literature search, and participated in data collection. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Krishna Subedi .

Ethics declarations

Ethics approval and consent to participate, consent for publication.

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .


About this article

Cite this article.

Subedi, K., Subedi, N. & Ranjit, R. Knowledge and practices of plagiarism among journal editors of Nepal. Res Integr Peer Rev 9, 9 (2024). https://doi.org/10.1186/s41073-024-00149-5

Received: 22 May 2024

Accepted: 15 August 2024

Published: 23 August 2024

DOI: https://doi.org/10.1186/s41073-024-00149-5


  • Journal editors
  • Knowledge and practices

Research Integrity and Peer Review

ISSN: 2058-8615

J Korean Med Sci, v.35(27); 2020 Jul 13

Similarity and Plagiarism in Scholarly Journal Submissions: Bringing Clarity to the Concept for Authors, Reviewers and Editors

Aamir Raoof Memon

Institute of Physiotherapy & Rehabilitation Sciences, Peoples University of Medical & Health Sciences for Women, Nawabshah (Shaheed Benazirabad), Sindh, Pakistan.

INTRODUCTION

What constitutes plagiarism? What are the methods to detect plagiarism? How do “plagiarism detection tools” assist in detecting plagiarism? What is the difference between plagiarism and similarity index? These are probably the most common questions about plagiarism that experts in scientific writing face, yet many people do not know a definitive answer to them. According to a report published in 2018, retractions of papers for plagiarism have sharply increased over the last two decades, with higher rates in developing and non-English speaking countries. 1 Several studies have reported similar findings, with Iran, China, India, Japan, Korea, Italy, Romania, Turkey, and France among the countries with the highest number of retractions due to plagiarism. 1 , 2 , 3 , 4 A study reported that duplication of text, figures or tables without appropriate referencing accounted for 41.3% of post-2009 retractions of papers published from India. 5 In Pakistan, the Journal of Pakistan Medical Association started a special section titled “Learning Research” and published a couple of papers on research writing skills, research integrity and scientific misconduct. 6 , 7 However, the problem has not been adequately addressed, and specific issues about it remain unresolved and unclear. According to unpublished data based on 1,679 students from four universities of Pakistan, 85.5% did not have a clear understanding of the difference between similarity index and plagiarism. Smart et al. 8 in their global survey of editors reported that around 63% had experienced some plagiarized submissions, with Asian editors experiencing the highest levels of plagiarized/duplicated content. In some papers, journals from non-English speaking countries have specifically discussed cases of plagiarized submissions and have highlighted the drawbacks of relying on similarity checking programs. 9 , 10 , 11 The cases of plagiarism in non-English speaking countries carry a strong message for honest researchers: they should improve their English writing skills and credit the sources they use by properly citing and referencing them. 12

Despite the accumulating literature on plagiarism from non-Anglophone countries, the answers to the aforementioned questions remain unclear. To answer them, it is important to have a thorough understanding of plagiarism and to bring clarity to the less known issues about it. Therefore, this paper aims to 1) define plagiarism and describe the growth in its prevalence and in the literature on it; 2) explain the difference between similarity and plagiarism; 3) discuss the role of similarity checking tools in detecting plagiarism and the flaws of relying on them completely; and 4) discuss the phenomenon called Trojan citation. At the end, suggestions are provided for authors and editors from developing countries so that this issue may be collectively addressed.

Defining plagiarism and its prevalence in manuscripts

To begin with, plagiarism may be defined as “ when somebody presents the published or unpublished work of others, including ideas, scholarly text, images, research design and data, as new and original rather than crediting the existing source of it. ” 13 The common types of plagiarism, including direct, mosaic, paraphrasing, intentional (covert) or unintentional (accidental) plagiarism, and self-plagiarism, have been discussed in previous reviews. 14 , 15 , 16

Evidence suggests that the first paper accused of plagiarism was published in 1979 and that there has been a substantial growth in the cases of plagiarism over time. 1 , 2 , 3 , 4 , 5 , 8 , 17 Previous studies have pointed out that plagiarism is prevalent in developing and non-English speaking countries, but the occurrence of plagiarism in developed countries suggests that it is rather a global problem. 1 , 2 , 3 , 4 , 18 , 19 , 20 As of today (1 April 2020), a search of the Retraction Database ( http://retractiondatabase.org/RetractionSearch.aspx? ) for papers retracted for plagiarism found 2,280 documents. Similarly, a Scopus search for plagiarism in the title of journal articles found 2,159 results. This suggests that the number of papers retracted for plagiarism is in fact higher than the number of papers published on this issue. However, what we see now may not necessarily be the full picture, i.e., the cases of plagiarism might be more numerous than we know. Certainly, a database search for papers tagged for plagiarism is limited to indexed journals only, which keeps non-indexed journals (both low-quality and deceptive journals) out of focus. 5 , 21 Moreover, journal coverage may vary from one database to another, as reported in a recent paper on research dissemination in South Asia. 22 Therefore, both the prevalence of plagiarism and the literature published on it, as reported by database searches, are most likely “ understated as of today .” 5

Reasons for plagiarism: lack of understanding and poor citing practices

Although the reasons for plagiarism are complex, previous papers have suggested possible causes for plagiarism by authors. 16 , 23 , 24 , 25 , 26 One major but less known reason might be that students, naïve researchers, and even some faculty members either lack clarity about what constitutes plagiarism or are unable to differentiate between similarity index and plagiarism. 24 , 26 , 27 For example, a recent online survey of participants in the AuthorAID MOOC on Research Writing found that 84.4% of the survey participants were unaware of the difference between similarity index and plagiarism, though almost all of them had reported having an understanding of plagiarism. 24 The same paper reported that one in three participants admitted that they had plagiarized at some point during their academic career. 24 Therefore, it is important to have clarity about what constitutes plagiarism and about the difference between similarity index and plagiarism so that the increasing rates of plagiarism can be deterred.

The ‘existing source’ or ‘original source’ in the definition of plagiarism refers to the main (primary) source and not the (secondary) source from which the author extracts the information. For example, someone cites a paper for a passage on the mechanism by which exercise affects sleep, but the cited paper aims to determine the prevalence of sleep disorders and exercise level rather than the mechanistic association. A thorough evaluation finds that the cited paper had used text from another review paper that discussed the mechanisms relating sleep to exercise behavior. This phenomenon of improper secondary (or indirect) citations may be common among students and novice researchers, particularly from developing countries, and should be discouraged. 27

SIMILARITY INDEX

Plagiarism vs. similarity index and the role of similarity checking tools

Plagiarism as defined above refers to the intentional (covert) or unintentional (accidental) theft of published or unpublished intellectual property (i.e., words or ideas), whereas similarity index refers to “ the extent of overlap or match between an author's work compared to other existing sources (books, websites, student thesis, and research articles) in the databases of similarity checking tools. ” 9 , 24 Advances in information technology have given researchers access to various freely available (e.g., Viper, eTBLAST/HelioBLAST, PlagScan, PlagiarismDetect, Antiplagiat, Plagiarisma, DupliChecker) and subscription-based (e.g., iThenticate, Turnitin, Similarity Check) similarity checking tools. 8 , 24 Many journal editors use iThenticate and/or Similarity Check (Crossref) to screen submitted manuscripts for similarity, whereas Turnitin is commonly used by universities and faculty to assess text similarity in students' work; however, there is a fairness issue in that not every journal or university, particularly those from developing countries, can afford to pay for these subscription-based services. 28 For instance, an online survey found that only about 18% of participants could use Turnitin through their university subscription. 24 Another problem is the way these tools are commonly referred to, i.e., as plagiarism detection tools, plagiarism checking software, or plagiarism detection programs. Based on the function they perform, it would be more appropriate to call them similarity checking tools, similarity checkers, text-matching tools, or simply text-duplicity detection tools. 5 , 8 , 23 In other words, these tools help locate matching or overlapping text (similarity) in submitted work, without directly flagging plagiarism. 24
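To make the notion of a similarity index concrete, the following is a minimal sketch that treats it as the percentage of a manuscript's word trigrams that also appear in a pool of comparison texts. This is an illustrative assumption only: the function names, the trigram size, and the sample texts are hypothetical, and commercial tools such as iThenticate or Turnitin use their own proprietary matching methods and databases.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity_index(manuscript, sources, n=3):
    """Percentage of manuscript n-grams that also occur in any source text."""
    target = word_ngrams(manuscript, n)
    if not target:
        return 0.0
    pool = set()
    for source_text in sources:
        pool |= word_ngrams(source_text, n)
    return 100.0 * len(target & pool) / len(target)


# Hypothetical texts, for illustration only.
manuscript = "Regular exercise improves sleep quality in adults with insomnia."
sources = [
    "Several trials show that regular exercise improves sleep quality.",
    "Insomnia is a common sleep disorder.",
]
print(round(similarity_index(manuscript, sources), 1))
```

The number produced this way only measures overlapping text. It says nothing about whether the overlapping passage was quoted and cited properly, which is why a similarity figure on its own cannot establish plagiarism.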

Taking Turnitin as an example, these tools display text similarity through color codes, each linked to its online source; details have been described elsewhere. 23 , 28 Journal editors, universities and some organizations consider text above specific cutoff values for the percentage of similarity as problematic. According to one paper, 5% or less text similarity (overlap of the text in the manuscript with text in the online literature) is acceptable to some journal editors, while others might want to put the manuscript under scrutiny if the text similarity is over 20%. 29 , 30 Another paper observed that journal editors tend to reject a manuscript if text similarity is above 10%. 31 The study on participants completing the AuthorAID MOOC on Research Writing also found that some participants reported that their institutions consider text similarity of less than 20% as acceptable. 24 As an example, the guidelines of the University Grants Commission of India treat similarity up to 10% as acceptable or minor (Level 0), but anything above is categorized into different levels (based on the percentages), each with a separate list of repercussions for students and researchers. 32 This approach might miss cases where the acceptable similarity of 10% comes from a single source, especially if editors rely on the numbers only. In addition, this approach has the potential to punish authors who have not committed plagiarism at all. To illustrate this, the randomly written text presented in Fig. 1 would be considered plagiarism based on the rule of cutoff values. Some authors opine that text with over four consecutive matching words, or a number of matching word strings, should be treated as plagiarized. 28 , 33 This again is not a good idea, as the text “the International Physical Activity Questionnaire was used to measure …” would be the same in several papers, but this is definitely not plagiarism because the methodology of different papers on the same topic could be similar; so, the decision should not be based on the numbers reported by similarity detection tools. 28 Therefore, it would be prudent not to set any cutoff values for text similarity, as doing so will lead to a slippery slope (“a course of action that seems to lead inevitably from one action or result to another with unintended consequences”, as defined by the Merriam-Webster Dictionary) and give “a sense of impunity to the perpetrators.” 32
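The single-source problem described above can be sketched in a few lines. The example below assumes a hypothetical similarity report that lists matched word counts per source (with no overlap between matches); the function name, the 10% cutoff taken from the Level 0 example, and the 0.8 concentration factor are all illustrative assumptions, not part of any tool or guideline.

```python
def triage(total_words, matched_words_by_source, overall_cutoff=10.0):
    """Summarise a similarity report without treating the cutoff as a verdict."""
    if total_words <= 0 or not matched_words_by_source:
        return {"overall_percent": 0.0, "top_source": None,
                "top_source_percent": 0.0, "under_cutoff": True,
                "concentrated": False}
    overall = 100.0 * sum(matched_words_by_source.values()) / total_words
    top_source, top_words = max(matched_words_by_source.items(),
                                key=lambda kv: kv[1])
    top_percent = 100.0 * top_words / total_words
    return {
        "overall_percent": overall,
        "top_source": top_source,
        "top_source_percent": top_percent,
        "under_cutoff": overall <= overall_cutoff,
        # Assumed rule of thumb: most of the overlap comes from one source.
        "concentrated": top_words > 0 and top_percent >= 0.8 * overall,
    }


# Two hypothetical 3,000-word manuscripts, both with 10% overall similarity.
single = triage(3000, {"paper_x": 290, "paper_y": 10})
spread = triage(3000, {f"source_{i}": 30 for i in range(10)})
print(single["under_cutoff"], single["concentrated"])
print(spread["under_cutoff"], spread["concentrated"])
```

Both manuscripts pass the same numeric cutoff, yet only the first draws nearly all of its overlap from one paper. Even so, the flag is at most a prompt for a human reader to inspect the matched passages, in line with the argument that decisions should not rest on the numbers alone.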

Fig. 1. [Image not reproduced; original file: jkms-35-e217-g001.jpg]

Drawbacks of similarity checking tools

There are a few drawbacks to relying completely on similarity checking tools. First, these tools are not foolproof and might miss incidents of translational plagiarism and figure plagiarism. 24 Translational plagiarism is the most invisible type of copying in non-Anglophone countries, where an article published in a language other than English is copied (with or without minor modifications) and published in an English journal or vice versa. 10 This is indeed an extremely difficult type of plagiarism to detect, and different approaches to address it (e.g., use of Google Translate) have been recently reported. 34 , 35 Nevertheless, there might be some cases where this practice may be acceptable, such as publishing policy papers. For example, “Identifying predatory or pseudo-journals” was published in International Journal of Occupational and Environmental Medicine , National Medical Journal of India , and Biochemia Medica in 2017 by authors affiliated with the World Association of Medical Editors (WAME), and “The revised guidelines of the Medical Council of India for academic promotions: Need for a rethink” was published in over ten journals during 2016 by four journal editors and endorsed by some (not all) members of the Indian Association of Medical Journal Editors. Second, text similarity in some parts of the manuscript (i.e., methods and results) should be weighed differently from that in other sections (i.e., introduction and discussion) and the conclusions. 31 In addition, based on the personal experience of the author of this paper, some individuals might use a sophisticated technique to avoid detection of high similarity through the use of inappropriate synonyms, jargon, and deliberate grammatical and structural errors in the text of the manuscript. Third, plagiarism of ideas may be missed by these tools as they can only detect plagiarism of words. 23 , 32 Therefore, similarity checking tools tend to underestimate plagiarized text or sometimes flag non-plagiarized material as problematic ( Fig. 1 ). 24 , 36 It should be noted that these tools serve only as an aid to identify suspected instances of plagiarism, and the text of the manuscript should always be evaluated by experts, so “a careful human cannot be replaced.” 31 , 37 A few papers published in the Journal of Korean Medical Science have presented examples where plagiarized content was missed by similarity checking tools and noticed later after a careful examination of the text. 9 , 10 Finally, plagiarism of unpublished work cannot be detected by these tools as they are limited to online sources only. 23 This is particularly important in the context of developing countries where research theses/dissertations of students are not deposited in research repositories, and where commercial, predatory editing and brokering services exist. 10 , 38 For example, the research repository of the Higher Education Commission of Pakistan allows deposition of doctoral theses only, and fewer than five universities (out of over 150) across the country have a research repository allowing for deposition of scholarly content. 38 Recently, a strange trend has emerged of predatory editing and brokering services that offer clones of previously published papers or unpublished work to non-Anglophone or lazy authors who want a quick and easy route to publications for promotion and career advancement. 10 Although plagiarism of unpublished work would not be easy for experts to detect, it may be possible through their previous experience and scholarly networks.
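The synonym-substitution problem mentioned above can be illustrated with a small sketch. It assumes a simple exact-match comparison of word trigrams, which is a stand-in chosen for illustration and not how any particular commercial tool works; the function names and sentences are hypothetical.

```python
import re


def shingles(text, n=3):
    """Return the set of lowercase word n-grams in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_percent(candidate, source, n=3):
    """Percentage of candidate n-grams found verbatim in the source."""
    cand = shingles(candidate, n)
    if not cand:
        return 0.0
    return 100.0 * len(cand & shingles(source, n)) / len(cand)


# Hypothetical sentences: one copied verbatim, one disguised with word swaps.
source = ("Physical activity was measured using a validated "
          "questionnaire in all participants.")
copied = source
disguised = ("Bodily activity was gauged using one validated "
             "survey in every participant.")

print(overlap_percent(copied, source))     # verbatim copy
print(overlap_percent(disguised, source))  # same idea, swapped words
```

Under this exact-match assumption the disguised sentence shares no trigram with the source, although it conveys the same idea. This is the kind of case where a careful human reader is needed.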

TROJAN CITATION: PERSONAL EXPERIENCE

A recent experience worth discussing in the context of plagiarism is the Trojan citation, where someone “ makes reference to a source one time to in order to evade detection (by editors and readers) of bad intentions and provide cover for a deeper, more pervasive plagiarism. ” 39 This practice is particularly common among those who intend to deceive readers and game the system. A few months ago, the author of this paper was invited by a journal to review a manuscript on predatory publishing. The content of the manuscript appeared suspicious but was not labelled “plagiarized” during the first round of the review. However, during the second round, it was noticed that this was a case of Trojan citation, where the author(s) cited the main source for a minor point and copied the major part of the manuscript, with slight modification, from a paper published in Biochemia Medica (a Croatian journal). 40 The editor of the journal was informed about this and the manuscript was rejected without further processing. This example suggests that careful human intervention by experts is required to identify cases of plagiarism.
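As a rough illustration of the pattern described above, the sketch below flags sources that account for a large share of matched text but are cited at most once. The report structure, function name, and thresholds are hypothetical assumptions introduced here; the output is only a pointer for expert review and cannot by itself establish a Trojan citation.

```python
def trojan_citation_candidates(sources, min_overlap=15.0, max_citations=1):
    """Return ids of sources with heavy overlap but almost no in-text citations."""
    return sorted(
        source_id
        for source_id, info in sources.items()
        if info["overlap_percent"] >= min_overlap
        and info["citations"] <= max_citations
    )


# Hypothetical per-source data for one manuscript.
report = {
    "ref_12": {"citations": 1, "overlap_percent": 38.0},
    "ref_03": {"citations": 6, "overlap_percent": 4.0},
    "ref_27": {"citations": 0, "overlap_percent": 2.0},
}
print(trojan_citation_candidates(report))
```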

In conclusion, what we know about the growth in the prevalence of plagiarism may be ‘just the tip of the iceberg’. Therefore, a collective contribution from authors, reviewers, and editors, particularly from the Asia-Pacific region, is required. Authors from the Asia-Pacific region and developing countries with expertise on this topic should play their role by supporting journal editors and through their mentorship skills. Furthermore, senior researchers should encourage and help their honors and master's students to publish their unpublished work before it gets stolen by commercial brokering agencies. They should also work in close collaboration with universities and organizations related to higher education in countries where this issue is not properly addressed, and should facilitate education and training sessions on plagiarism, as previous evidence suggests that workshops and online training sessions may be helpful. 5 On the other hand, journal editors from the Asia-Pacific region and developing countries should not judge manuscripts solely on the basis of the percentage of similarity reported by similarity checking services. They should maintain a database of experts of their own so that, for example, manuscripts about plagiarism in scientific writing can be sent for review to experts on this subject. As journal editors may not be experts in all fields, networking and seeking help from experts would be helpful in avoiding cases of plagiarism in the future. It would be appropriate for journal editors and trainee editors, particularly from resource-limited countries, to be educated about the concept of scientific misconduct and the advancement in knowledge around this area. Moreover, journal editors should publish and publicly discuss cases of plagiarism as a learning experience for others. The Journal of Korean Medical Science has used this approach regarding cases of plagiarism, which other journals from the region are encouraged to adopt. 9 , 10 Likewise, a paper discussing case scenarios of salami publication (i.e., “ a distinct form of redundant publication which is usually characterized by similarity of hypothesis, methodology or results but not text similarity ”) serves as a good example of how journal editors may help authors use their mentorship skills and support journals in educating researchers. 41 There should be strict penalties for cases of plagiarism, and measures to protect whistleblowers should be in place and enforced. By doing so, ill-intentioned and lazy authors who bypass the system would be punished and honest authors would be served. Thus, the take-home message for editors from the Asia-Pacific region is that a collective effort and commitment from authors, reviewers, editors and policy-makers is required to address the problem of plagiarism, especially in developing and non-English speaking countries.

Disclosure: The author has no potential conflicts of interest to disclose.

American Psychological Association

In-Text Citations

In scholarly writing, it is essential to acknowledge how others contributed to your work. By following the principles of proper citation, writers ensure that readers understand their contribution in the context of the existing literature—how they are building on, critically examining, or otherwise engaging the work that has come before.

APA Style provides guidelines to help writers determine the appropriate level of citation and how to avoid plagiarism and self-plagiarism.

We also provide specific guidance for in-text citation, including formats for interviews, classroom and intranet sources, and personal communications; in-text citations in general; and paraphrases and direct quotations.

COMMENTS

  1. Plagiarism detection and prevention: a primer for researchers

    Plagiarism is an ethical misconduct affecting the quality, readability, and trustworthiness of scholarly publications. Improving researcher awareness of plagiarism of words, ideas, and graphics is essential for avoiding unacceptable writing practices.

  2. What is plagiarism and how to avoid it?

    In 1999, the Committee on Publication Ethics (COPE) 5, 6 defined plagiarism as "Plagiarism ranges from the unreferenced use of others' published and unpublished ideas including research grant applications to submission under new authorship of a complex paper, sometimes in a different language.

  3. How to Avoid Plagiarism

    Learn how to avoid plagiarism in your academic papers by keeping track of your sources, quoting and paraphrasing correctly, and citing your sources properly. Find out how to use a plagiarism checker and AI tools responsibly.

  4. What Is Plagiarism?

    Plagiarism means using someone else's work without giving them proper credit. In academic writing, plagiarizing involves using words, ideas, or information from a source without citing it correctly. In practice, this can mean a few different things. Examples of plagiarism.

  5. (PDF) Plagiarism in research

    Abstract Plagiarism is a major problem for research. There are, however, divergent views on how to define plagiarism and on what makes plagiarism reprehensible. In this paper we explicate the ...

  6. Academic Plagiarism Detection: A Systematic Literature Review

    This article summarizes the research on computational methods to detect academic plagiarism by systematically reviewing 239 research papers published between 2013 and 2018. To structure the presentation of the research contributions, we propose novel technically oriented typologies for plagiarism prevention and detection efforts, the forms of ...

  7. Plagiarism in Research

    Learn what plagiarism is, why it happens, and how to avoid it in research writing. This ebook covers plagiarism concepts, types, examples, and prevention tips for academic integrity.

  8. Knowing and Avoiding Plagiarism During Scientific Writing

    In 1999, the Committee on Publication Ethics (COPE) defined plagiarism as, "plagiarism ranges from the unreferenced use of others' published and unpublished ideas, including research grant applications to submission under "new" authorship of a complete paper, sometimes in a different language.

  9. Research Guides: Citing Sources: How to Avoid Plagiarism

    To avoid plagiarism, one must provide a reference to that source to indicate where the original information came from (see the "Source:" section below). "There are many ways to avoid plagiarism, including developing good research habits, good time management, and taking responsibility for your own learning. Here are some specific tips:

  10. How to Avoid Plagiarism

    It's not enough to know why plagiarism is taken so seriously in the academic world or to know how to recognize it. You also need to know how to avoid it. The simplest cases of plagiarism to avoid are the intentional ones: If you copy a paper from a classmate, buy a paper from the Internet, copy whole passages from a book, article, or Web site without citing the author, you are plagiarizing.

  11. Plagiarism in research

    Plagiarism is a major problem for research. There are, however, divergent views on how to define plagiarism and on what makes plagiarism reprehensible. In this paper we explicate the concept of "plagiarism" and discuss plagiarism normatively in relation to research. We suggest that plagiarism should be understood as "someone using someone else's intellectual product (such as texts ...

  12. Factors influencing plagiarism in higher education: A comparison of

    Over the past decades, plagiarism has been classified as a multi-layer phenomenon of dishonesty that occurs in higher education. A number of research papers have identified a host of factors such as gender, socialisation, efficiency gain, motivation for study, methodological uncertainties or easy access to electronic information via the Internet and new technologies, as reasons driving ...

  13. Plagiarism in research

    Abstract Plagiarism is a major problem for research. There are, however, divergent views on how to define plagiarism and on what makes plagiarism reprehensible. In this paper we explicate the concept of "plagiarism" and discuss plagiarism normatively in relation to research.

  14. Free Plagiarism Checker in Partnership with Turnitin

    Our plagiarism checker, AI Detector, Citation Generator, proofreading services, paraphrasing tool, grammar checker, summarize, and free Knowledge Base content are designed to help students produce quality academic papers. We make every effort to prevent our software from being used for fraudulent or manipulative purposes.

  15. Plagiarism Checker: Free Scan for Plagiarism

    Free plagiarism checker by EasyBib Our plagiarism checker detects plagiarism in your work and checks for common writing errors.

  16. How to Avoid Plagiarism in Research Papers (Part 1)

    Writing a research paper poses challenges in gathering literature and providing evidence for making your paper stronger. Drawing upon previously established ideas and values and adding pertinent information in your paper are necessary steps, but these need to be done with caution without falling into the trap of plagiarism. In order to understand how to avoid plagiarism, it is important to ...

  17. Avoid Plagiarism

    Intentional Plagiarism Examples: Copying someone else's research and turning it in as your own. Buying a paper from a private or commercial source. Copying out paragraphs, or even sentences, of articles or books and incorporating them into your own paper without documentation. Lifting ideas or interpretations from other sources without ...

  18. Plagiarism in Scientific Research and Publications and How to Prevent

    There are ways to avoid plagiarism, and should just be followed simple steps when writing a paper. There are several ways to avoid plagiarism ( 1, 6 ): Paraphrasing - When information is found that is great for research, it is read and written with own words. Quote - Very efficient way to avoid plagiarism.

  19. Full article: The case for academic plagiarism education: A PESA

    It is the argument of this paper that plagiarism education needs to be taught by examining plagiarism in the historical emergence of academic culture as a quasi-legal system together with its different genres and its academic norms, ethics, and procedures that govern the acceptability or non-acceptability of various practices of academic writing.

  20. IEEE

    How to identify and avoid potential plagiarism scenarios in papers.

  21. Recognizing & Avoiding Plagiarism in Your Research Paper

    Recognizing and Avoiding Plagiarism in Your Research Paper. Plagiarism in research is unfortunately still a serious problem today. Research papers with plagiarism contain unauthorized quoting from other authors; the writer may even try to pass off others' work as their own. This damages the individual's reputation, but also the entire class ...

  22. Free Plagiarism Checker Online for Students

    Advanced Online Plagiarism Checker for Students. Check your Paper and get a Report with Plagiarism Percentage. Free Usage ⌛Quick Results ☝️ High Quality.

  23. Plagiarism: Causes and Possible Solutions

    understood to be able to reduce plagiarism worldwide. This paper aims to investigate the probable causes of plagiarism and to offer solutions which could further reduce the occurrence of ...

  24. Knowledge and practices of plagiarism among journal editors of Nepal

    With the rise in the number of publications, misconduct in research is increasing, which is a global threat to evidence-based research. The National Academy of Sciences in the United States (US) in 1992 defined misconduct in science as "fabrication, falsification, or plagiarism, in proposing, performing, or reporting research". Plagiarism is possibly the most serious and widely ...

  25. Similarity and Plagiarism in Scholarly Journal Submissions: Bringing

    INTRODUCTION What constitutes plagiarism? What are the methods to detect plagiarism? How do "plagiarism detection tools" assist in detecting plagiarism? What is the difference between plagiarism and similarity index? These are probably the most common questions regarding plagiarism that many research experts in scientific writing are usually faced with, but a definitive answer to them is ...

  26. In-text citations

    APA Style provides guidelines to help writers determine the appropriate level of citation and how to avoid plagiarism and self-plagiarism. We also provide specific guidance for in-text citation, including formats for interviews, classroom and intranet sources, and personal communications; in-text citations in general; and paraphrases and direct quotations.

  27. PlagiarismCheck Review: Trusted Plagiarism Checker for Your ...

    Plagiarism is the improper borrowing of someone else's ideas, words, data, or research results without properly acknowle ... They often need to write such papers as essays, research papers, and ...

  28. Full article: Prevalence of plagiarism in hijacked journals: A text

    The higher frequency of plagiarism in papers associated with developing countries can be attributed to less defined ethical norms in research (Kurt Citation 2018), tolerance for misconduct due to cultural and organizational characteristics (Ison, Citation 2018), different attitude to questionable research practices, for example self-plagiarism ...

  29. How to prove your innocence after a false positive from Turnitin

    AI detectors like Turnitin and GPTZero suffer from false positives that can accuse innocent students of cheating. Here's the advice of academics, AI scientists and students on how to deal with it.