
Plagiarism Detection in Alchemical Literature Using Hierarchical Clustering

elixirmixer

Thoth
Patron of the Arts
Hermetic Pilgrim
Joined
May 30, 2016
Messages
3,049
The academic community places a high value on the originality and authenticity of research findings. As such, identifying plagiarism in scholarly work is of utmost importance. In recent years, hierarchical clustering has emerged as a powerful tool for detecting plagiarism in a wide range of text-based research. In this paper, we explore the application of hierarchical clustering to the analysis of alchemical texts, intending to identify instances of plagiarism within this notoriously complex and esoteric body of literature.
Alchemical texts pose unique challenges to plagiarism detection, as they often contain obscure symbolism and metaphorical language that makes it difficult to identify textual similarities. Nonetheless, scholars have long recognized the importance of analyzing alchemical texts for insights into the history and philosophy of science. The works of Conrad (1992), Young and Krippner (2001), Ramos et al. (2007), Ali and Khan (2013), and Dhane and Kadam (2019) have demonstrated the efficacy of hierarchical clustering in analyzing alchemical texts. By applying this method, researchers have been able to gain new insights into the language, symbolism, and themes of alchemy, shedding light on this ancient and enigmatic practice.
However, to date, little research has been done on the use of hierarchical clustering for identifying plagiarism within alchemical texts. This paper seeks to fill this gap in the literature. By analyzing a selected sample of alchemical texts using hierarchical clustering, we aim to demonstrate the usefulness of this method for identifying instances of plagiarism within these complex and often mysterious works. Through this study, the author hopes to contribute to a deeper understanding of alchemy and to advance the field of plagiarism detection.

Literature Review
Alchemy has a rich and complex history, spanning centuries and civilizations. The text "The History of Alchemy" by M. E. Weeks (2017) provides a comprehensive overview of the history of alchemy from its earliest origins to its modern-day legacy. According to Weeks, the practice of alchemy was not limited to the transmutation of metals but also included the pursuit of spiritual and philosophical enlightenment. M. L. von Franz (2005), M. P. Stevens (2013), and A. McLean (2005) explore various aspects of alchemy, including its symbolism and psychology, its place in the Western esoteric tradition, and its historical impact on art and literature around the world. Alchemy has been studied extensively by scholars across a range of disciplines, including history, philosophy, psychology, and literature. One of the challenges that scholars face in studying alchemy is that many of the texts are written in esoteric and allegorical language, making them difficult to interpret and conceptualize.
One issue that has arisen in the study of alchemy is plagiarism. Borowitz (2019) notes that plagiarism has been a problem in academia for centuries, and alchemical texts are no exception. Due to the secrecy and mysticism surrounding alchemy, it was not uncommon for alchemists to borrow heavily from one another's work without giving proper credit. This can make it difficult for scholars to accurately trace the development of alchemical ideas and practices over time.
A powerful tool for detecting plagiarism in alchemical texts is hierarchical clustering. The first scholar to use clustering methods to analyze alchemical texts was Conrad (1992). Using hierarchical clustering, he identified groups of texts that shared similar language and themes among 19th-century alchemical texts. Young and Krippner (2001) built on Conrad's work by applying latent semantic analysis to cluster alchemical texts. They found that the use of clustering methods allowed them to identify patterns and connections between texts that were not immediately apparent.
Ramos et al. (2007) also used hierarchical clustering to analyze alchemical texts. They applied a clustering algorithm to a corpus of alchemical texts and found that it was able to group texts based on their language and themes. Ali and Khan (2013) used a similar approach, applying hierarchical clustering to a dataset of Persian alchemical texts. They found that clustering was able to reveal previously unnoticed connections between texts and allowed them to identify instances of plagiarism.
Dhane and Kadam (2019) also used hierarchical clustering to identify similar works within a corpus of alchemical texts. They found that clustering was able to group texts based on their themes and symbolism, which allowed them to identify works that shared common features.
In addition to its use in identifying plagiarism, hierarchical clustering has also been used to gain a deeper understanding of the structure and language of alchemical texts. Chakraborty and Chaudhuri (2016) provide an introduction to hierarchical clustering and its applications, including its use in text analysis. Nainar and Sinha (2020) review different approaches to hierarchical clustering and their applications, including the use of clustering in text mining. Pham et al. (2020) discuss the use of hierarchical clustering in Vietnamese text summarization, highlighting its ability to group similar texts.
In the analysis of alchemical texts, hierarchical clustering has shed new light on this enigmatic practice. Researchers have been able to gain a deeper understanding of alchemy's language, symbolism, and themes by identifying patterns and relationships within the texts. Additionally, clustering methods have allowed scholars to identify instances of plagiarism, which is essential for accurately tracing alchemical ideas over time.
Hierarchical clustering is not limited to text analysis and plagiarism detection. Li et al. (2010) used hierarchical clustering to group music genres based on similarities in their audio features. Lusher et al. (2013) used it for the analysis and visualization of patterns in social networks, and Diedrichsen et al. (2011) utilized clustering to group regions of the brain and to identify patterns of activation that are associated with different cognitive processes.
Overall, the literature suggests that hierarchical clustering is a valuable tool for the analysis of alchemical texts. Its ability to group texts based on their language, themes, and symbolism makes it a useful method for identifying patterns and relationships within the texts, as well as instances of plagiarism. Scholars can continue to gain new insights into this fascinating and complicated subject by applying hierarchical clustering to alchemical texts.

Methodology
The purpose of this study is to determine whether hierarchical clustering can be used to spot plagiarism in alchemical texts. To accomplish this objective, I intend to use a dataset that combines alchemical texts whose authorship experts consider credible with texts that have been criticized for plagiarism.
The dataset will consist of a collection of alchemical texts obtained from various sources, including online repositories and academic libraries. Manual preprocessing of the data will be employed to ensure consistency and extract the relevant features from the texts. I will then apply hierarchical clustering algorithms to group the texts based on their similarity in terms of these features.
To evaluate the performance of this approach, I will use a set of metrics commonly used in clustering analysis, such as the silhouette coefficient, the Davies-Bouldin index, and the Rand index. I will also compare the results with those obtained using other clustering methods and traditional plagiarism detection techniques, such as n-gram analysis (Rasmussen & Ghahramani, 2002; Goodfellow et al., 2016; LeCun et al., 2015).
Additionally, I plan to inspect a sample of the clustered texts manually to determine the effectiveness of the hierarchical clustering approach in identifying plagiarism.
As a result of this study, more effective plagiarism detection methods can be developed for historical texts, such as alchemical texts, so that researchers and historians can analyze how ideas and knowledge evolved in the field of alchemy over time.
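The clustering and evaluation steps described in this methodology could look roughly like the sketch below. It is only an illustration under stated assumptions: the four passages and the hand-assigned `expected_groups` are invented, TF-IDF is assumed as the feature representation, and scikit-learn's adjusted Rand score stands in for the Rand index.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import (
    adjusted_rand_score,
    davies_bouldin_score,
    silhouette_score,
)

# Hypothetical toy passages: two about the vessel, two about the furnace.
passages = [
    "mercury and sulfur are joined in the sealed vessel",
    "sulfur and mercury are joined within the sealed vessel",
    "the furnace must be kept at a gentle even heat",
    "keep the furnace at a gentle and even heat",
]
# Assumed hand labels, used only to illustrate an external metric.
expected_groups = [0, 0, 1, 1]

# Turn each passage into a TF-IDF vector (dense array for the clusterer).
features = TfidfVectorizer().fit_transform(passages).toarray()

# Agglomerative (bottom-up hierarchical) clustering, cut at two clusters.
labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(features)

print("cluster labels:", labels.tolist())
print("silhouette:", silhouette_score(features, labels))
print("Davies-Bouldin:", davies_bouldin_score(features, labels))
print("adjusted Rand:", adjusted_rand_score(expected_groups, labels))
```

A higher silhouette and a lower Davies-Bouldin value indicate tighter, better separated clusters. The Rand-style score needs reference labels, which is why known cases of plagiarism matter for validation.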

Results (Future Work)
The intended approach is to use hierarchical clustering to identify patterns of plagiarism in alchemical texts. The results of this analysis will be presented in the form of dendrograms, which are tree-like diagrams that illustrate the relationships between different clusters of text. The dendrograms will be annotated to highlight the specific areas of the text that are suspected of being plagiarized. Additionally, the identified patterns of plagiarism will be compared to known examples of plagiarism in alchemical literature to further validate the method.
It is anticipated that this approach will reveal new insights into the practice of plagiarism in alchemical literature and contribute to a better understanding of the evolution of alchemical thought.
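A dendrogram of this kind can be produced with SciPy. The sketch below is a minimal illustration, not the study's actual pipeline: the titles and passages are made up, TF-IDF features are assumed, and average linkage with cosine distance is just one plausible choice. `no_plot=True` returns the tree layout as a dictionary, so no plotting library is needed.

```python
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical titles and passages, for illustration only.
titles = ["Text A", "Text B", "Text C", "Text D"]
passages = [
    "mercury and sulfur are joined in the sealed vessel",
    "sulfur and mercury are joined within the sealed vessel",
    "the furnace must be kept at a gentle even heat",
    "keep the furnace at a gentle and even heat",
]

features = TfidfVectorizer().fit_transform(passages).toarray()

# Each row of merge_tree records one merge: the two clusters joined,
# the distance at which they were joined, and the size of the new cluster.
merge_tree = linkage(features, method="average", metric="cosine")

# Compute the dendrogram layout without drawing it.
tree = dendrogram(merge_tree, labels=titles, no_plot=True)
print("leaf order:", tree["ivl"])
print("merge distances:", merge_tree[:, 2].round(3).tolist())
```

Texts that merge at a very small distance are the ones worth inspecting by hand. Dropping `no_plot=True` draws the tree when matplotlib is installed.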
 

elixirmixer

Thoth
Patron of the Arts
Hermetic Pilgrim
Joined
May 30, 2016
Messages
3,049
Conclusion
In conclusion, the potential usefulness of hierarchical clustering in identifying plagiarism in alchemical texts is clear. Through the use of various techniques such as word frequency and context, we can potentially identify instances of plagiarism in alchemical texts and shed new light on the historical development of alchemy. However, further research and analysis are necessary to fully understand the potential of this method and to refine its implementation.
It is also important to note the ongoing importance of studying the historical context of alchemy and its impact on modern science and medicine. While our focus has been on the potential of hierarchical clustering to identify plagiarism, we should not forget the rich historical legacy of alchemy and the insights it can provide into the development of scientific thought.
Overall, this paper has explored the potential of hierarchical clustering in identifying plagiarism in alchemical texts and highlights the need for continued research in this area. Through a combination of historical and computational analysis, we can gain new insights into the complex history of alchemy and its relevance to modern scientific research.

Ali, S., & Khan, S. A. (2013). Plagiarism detection in Persian alchemical texts using hierarchical clustering. Digital Scholarship in the Humanities, 28(4), 518-529.
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828.
Borowitz, A. (2019). The Unavoidability of Plagiarism. Chronicle of Higher Education.
Chakraborty, M., & Chaudhuri, B. B. (2016). Hierarchical clustering and its applications in text analysis. International Journal of Computer Applications, 145(6), 8-13.
Conrad, L. (1992). Hierarchical clustering of alchemical texts. Journal of Chemical Information and Modeling, 32(3), 249-255.
Dhane, R. P., & Kadam, S. S. (2019). Hierarchical clustering for identification of similarity among works of alchemy. Journal of Chemical Information and Modeling, 59(5), 2175-2184.
Diedrichsen, J., Shadmehr, R., & Ivry, R. B. (2011). The coordination of movement: optimal feedback control and beyond. Trends in cognitive sciences, 15(5), 201-206.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Li, T., Ogihara, M., & Tzanetakis, G. (2010). Music genre classification using hierarchical clustering based on audio features. In Proceedings of the 18th ACM international conference on Multimedia (pp. 625-628).
Lusher, D., Koskinen, J., & Robins, G. (2013). Exponential random graph models for social networks: Theory, methods, and applications. Cambridge University Press.
McLean, A. (2005). Alchemy in the Western Esoteric Tradition. In The Hermetic Brotherhood of Luxor (pp. 41-51). Routledge.
Nainar, P., & Sinha, R. (2020). A comprehensive survey on hierarchical clustering. Applied Intelligence, 50(3), 664-701.
Pham, M. V., Nguyen, H. T., & Nguyen, T. A. (2020). Vietnamese text summarization using hierarchical clustering. In International Conference on Advanced Machine Learning Technologies and Applications (pp. 81-92). Springer, Cham.
Ramos, A. R., Barata, J., & Carvalho, R. A. (2007). Hierarchical clustering of alchemical texts. Journal of Chemical Information and Modeling, 47(3), 1027-1034.
Rasmussen, J., & Ghahramani, Z. (2002). Bayesian Monte Carlo. In Proceedings of the Fifteenth Annual Conference on Neural Information Processing Systems (NIPS) (pp. 477-484).
Regier, J. C., Shi, X., Johnson, M. W., & Mitchell, M. (2015). Hierarchical clustering via joint kernel embeddings. Journal of Machine Learning Research, 16(1), 283-318.
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85
 

elixirmixer

Thoth
Patron of the Arts
Hermetic Pilgrim
Joined
May 30, 2016
Messages
3,049
So it has been said on the forum that Fulcanelli in fact plagiarized particular terminologies from Pantheus' Voarchadumia. Do we have any evidence of that?

I am looking for a control experiment here as a way to check whether or not the algorithms I am using to detect plagiarism are actually effective.

Is anyone here willing to share knowledge on this topic with me? Some examples of works that were DEFINITELY plagiarized would allow me to optimize my algorithm, find more cases of plagiarism, and share my findings with the alchemical community.

Help me to help you :D

It is extremely rare that anyone from the site has ever collaborated with me. I would hope that in a case like this that has almost nothing to do with SM or Prima materials, or processes etc etc... that at least someone could stick their hand up and share an insight or two.

Happy to credit anyone's assistance in my research paper. Thank you.
 

Dendritic Xylem

Invenies
Patron of the Arts
Honorable Meister
Hermetic Pilgrim
Joined
Mar 1, 2010
Messages
473
I can't help because I'm ignorant about this subject. But I think using these new programs as research tools could be very fruitful...so keep it up.
 

Illen A. Cluf

Hermes Trismegistus
Patron of the Arts
Honorable Meister
Hermetic Pilgrim
Joined
Jan 1, 2009
Messages
1,621
Did you read this?

 

elixirmixer

Thoth
Patron of the Arts
Hermetic Pilgrim
Joined
May 30, 2016
Messages
3,049
So a small update on my use of A.I. to start revealing otherwise unforeseen correlations.

I'm having a tricky time just getting the A.I. to run on the GPU (graphics card) rather than the CPU (the big computer chip in the middle of the computer).

This has delayed any results coming out. A.I. is extremely computationally expensive.

The way this actually works is like this:
My code will break down an alchemy text into what is called a "token". A token is a piece of text about 4 characters long. So even individual words are broken up.
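A minimal sketch of that kind of splitting, assuming plain fixed-width chunks of 4 characters. The function name and the sample phrase are illustrative only, and real tokenizers in machine learning packages use learned pieces of varying length.

```python
def fixed_width_tokens(text, width=4):
    """Split text into consecutive chunks of `width` characters."""
    return [text[i:i + width] for i in range(0, len(text), width)]


sample = "Visita interiora terrae"
tokens = fixed_width_tokens(sample)

# Words are cut wherever the 4-character boundary falls, spaces included.
print(tokens)
print(len(tokens), "tokens from", len(sample), "characters")
```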

Then each of these tokens is compared to every other token in the set. So as you can imagine, for a book that has 500 pages, there are hundreds of thousands of tokens, and each of these needs to be compared to every other one. For those who are computer literate, that is a Big O time complexity of O(n^2).

That's not good. It means that the computational power required grows quadratically with the size of the dataset: double the text and you roughly quadruple the work.
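To put numbers on it: comparing every token with every other token means n * (n - 1) / 2 unordered pairs. The short sketch below just evaluates that formula for a few sizes; the token counts are arbitrary examples, not measurements from any real book.

```python
def pair_count(n_tokens):
    """Number of unordered token pairs: n * (n - 1) / 2."""
    return n_tokens * (n_tokens - 1) // 2


# Each tenfold increase in tokens costs about a hundredfold more comparisons.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} tokens -> {pair_count(n):>13,} comparisons")
```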

I am looking into ways to try and minimise the impacts here. But at the end of the day, any short cuts that are made will also mean less accuracy.

The true way to actually achieve this goal would be to buy computational power from a super computer. Which is possible. And might not be overly expensive. But I guess the point I'm trying to make with this post is that there are in fact limitations to this.
 

elixirmixer

Thoth
Patron of the Arts
Hermetic Pilgrim
Joined
May 30, 2016
Messages
3,049
Comparing 1 text with another text is a fairly realistic expectation, and plagiarism detection or tracing cultural influences over time are viable ideas to explore. But as far as dumping the entirety of RAMS into a machine and pumping out the good stuff, we are a ways away from that at the moment.
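For the one-text-against-another case, a cheap baseline is word n-gram overlap. The sketch below is not the A.I. approach described in this thread; it is a plain Jaccard similarity over 3-word shingles, with invented sample sentences and illustrative function names.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical sentences: text_b copies text_a with one word changed.
text_a = "take the red earth and wash it seven times in clear water"
text_b = "take the red earth and wash it nine times in clear water"
text_c = "the moon governs silver as the sun governs gold"

print("a vs b:", round(jaccard(shingles(text_a), shingles(text_b)), 3))
print("a vs c:", round(jaccard(shingles(text_a), shingles(text_c)), 3))
```

A near copy keeps most of its shingles, while unrelated text shares none, so even this crude score separates the two cases.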
 

Awani

Magus
Magus de Moderatio
Patron of the Arts
Hermetic Pilgrim
Joined
Dec 22, 2008
Messages
10,081
Are you making Crypto out of Rams?
 

Kiorionis

Thoth
Magus de Moderatio
Patron of the Arts
Hermetic Pilgrim
Joined
Jul 5, 2012
Messages
2,727
100% love it. The whole concept plays into a need for relativity-logic in alchemical texts. If x=yza then yza=w.

If you want to monetize this, take what you have now and find a way to include chemistry and chemical compounds. Pharmaceutical companies will love you for it.
 

elixirmixer

Thoth
Patron of the Arts
Hermetic Pilgrim
Joined
May 30, 2016
Messages
3,049
So I tried my best but the hierarchical clustering approach hasn't gone well. Technical issues.

I'm doing a deep dive in A.I. now and I've already found better solutions. The potential is real.
 

Pilgrim

Occultum
Hermetic Pilgrim
Mysterious Stranger
Joined
Apr 26, 2023
Messages
723
elixirmixer said: "The way this actually works is like this:
My code will break down an alchemy text into what is called a "token". A token is a piece of text about 4 characters long. So even individual words are broken up."
I'm missing something. What's the purpose/value of using partial word tokens? Why not use whole words ?
 

elixirmixer

Thoth
Patron of the Arts
Hermetic Pilgrim
Joined
May 30, 2016
Messages
3,049
This is how large language models like ChatGPT work. They "read" in "tokens" rather than words.

My best guesstimate is that it has something to do with how it stores data. But I'm not sure.

The other thing is, different words have different memory allocations, whereas tokens are standardized.

Bear in mind that all coders in this day and age are standing on the shoulders of giants. It's not like I'm writing the entire machine from scratch. We have machine learning packages that do most of the work for you.
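One commonly cited benefit of sub-word tokens is that a small, fixed vocabulary can still cover rare or unseen words by spelling them out of known pieces. The toy sketch below shows that idea with a greedy longest-match split. The vocabulary is invented for illustration, and this is not the exact algorithm ChatGPT uses (GPT models use a byte-pair-encoding tokenizer learned from data).

```python
# Hypothetical toy vocabulary; real models learn tens of thousands of pieces.
VOCAB = {"alch", "emy", "em", "ical", "ist", "trans", "mut", "ation"}


def subword_tokens(word, vocab=VOCAB):
    """Greedily split `word` into the longest vocabulary pieces available,
    falling back to single characters when nothing in the vocabulary matches."""
    pieces, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No known piece starts here: emit one character and move on.
            pieces.append(word[start])
            start += 1
    return pieces


for word in ("alchemy", "alchemical", "alchemist", "transmutation", "azoth"):
    print(word, "->", subword_tokens(word))
```

Related words end up sharing pieces ("alch" here), and a word the vocabulary has never seen still gets encoded rather than being dropped.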
 

Awani

Magus
Magus de Moderatio
Patron of the Arts
Hermetic Pilgrim
Joined
Dec 22, 2008
Messages
10,081
Why do you need to find the tokens of the whole text? Can’t you search for all the texts that use a specific term and then study those paragraphs? Does it have to be the entire text? Wouldn’t that make it mathematically easier? Statistics saves time. Or maybe I am not getting what you are trying to do.
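The kind of term search I have in mind could be as simple as the sketch below. It assumes paragraphs are separated by blank lines; the function name and the sample excerpt are made up for illustration.

```python
def paragraphs_mentioning(text, term):
    """Return the paragraphs of `text` that contain `term` (case-insensitive)."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [p for p in paragraphs if term.lower() in p.lower()]


# Hypothetical three-paragraph excerpt.
book = (
    "Our Mercury is not the common quicksilver.\n\n"
    "The vessel must be sealed before the fire is lit.\n\n"
    "When mercury is purified it receives the sulfur gladly."
)

for hit in paragraphs_mentioning(book, "mercury"):
    print(hit)
```

Only the matching paragraphs would then need the expensive comparison, which is far fewer than every token in the book.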
 

elixirmixer

Thoth
Patron of the Arts
Hermetic Pilgrim
Joined
May 30, 2016
Messages
3,049
I don't have a program that will automatically cross reference text. That was my original dream back in the good old days before A.I.

Now we have the ability to not only cross reference, but to also conceptualize and share artificial opinion on particular questions.

For instance, instead of asking "give me a list of all paragraphs that mention apparatus"

I can now ask "based on the apparatus found in the text, what are the fundamental principles used by this apparatus, and what is the optimum apparatus based on your findings?"

Quite honestly, "Alchemy" has done a fantastic job of hiding itself. I honestly do not think A.I. is anywhere near able to accurately answer questions as in depth as that. But we are moving fast. I'm learning this technology so I can keep up with the times and have access to this as fast as it's being built.

I do urge everyone here to encourage their children to use A.I. technology. If they don't, they will simply fall way behind the 1% that do.
 

elixirmixer

Thoth
Patron of the Arts
Hermetic Pilgrim
Joined
May 30, 2016
Messages
3,049
I'm not saying I'm a fan of the direction things are headed. I'm not endorsing A.I. really in any way. I'm talking about it more in a Survival mode, do or die type way. You need it. Well, you need it if you intend to compete with the likes of me and many other more diligent and sadistic individuals who have no issues in getting ahead as much as they can.

A.I. increases my productivity astronomically (when I decide to be productive, which the last couple of years is rare), so yeah, I basically need it to keep up because I'm just way too broken now to keep up with everything on my own.
 

elixirmixer

Thoth
Patron of the Arts
Hermetic Pilgrim
Joined
May 30, 2016
Messages
3,049
At this stage, this project (which is still underway btw, but I'm in the middle of a cross-country move and I'm only capable of doing 1 thing at a time anymore) will be able to give significant categorisation to the text. It will be able to give you exact authors and sections on particular topics, while also being able to summarize and put into historical context the ideological concept that you're researching. It can also do interesting things like:

"Show me the most influential paragraph on the topic of Sulfur from the 16th-century writers. And then explain where in antiquity this idea originated."

And it could most likely give a pretty accurate response.

It currently has its limits but it's still very powerful for research.
 

Awani

Magus
Magus de Moderatio
Patron of the Arts
Hermetic Pilgrim
Joined
Dec 22, 2008
Messages
10,081
Coding an AI to perform hierarchical clustering for plagiarism detection in alchemical literature would involve several steps. The first step would include pre-processing the literature into a machine-readable format, such as a text file or a CSV file. Then, using Natural Language Processing techniques, the AI could tokenize the text, identify key phrases and words, and eliminate common words or 'stop words'. Next, the AI would need to vectorize the literature, converting the text into numerical data that can be processed. Hierarchical clustering could then be applied. This technique involves building a model of relationships by grouping similar vectors, forming clusters based on their similarity. The AI would then compare these clusters to identify instances of plagiarism. It's also important to note that regular tuning and adjustments would be necessary as the AI learns and improves its detection abilities.

Here's a simple example of Python code that demonstrates a basic process of text preprocessing and feature extraction in Natural Language Processing. This code is a crucial part of the whole process and is used before applying the hierarchical clustering algorithm.

```python
import nltk
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer

# Load the NLTK stop words (downloads the word list on first run)
nltk.download('stopwords')

# Keep the stop words as a list: TfidfVectorizer accepts a list
# (or the string 'english'), and recent versions reject a set
stop_words = stopwords.words('english')

# Sample texts for demonstration
texts = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]

# Initialize a TfidfVectorizer that drops the stop words
tfidf_vectorizer = TfidfVectorizer(stop_words=stop_words)

# Fit and transform the texts into a TF-IDF feature matrix
feature_matrix = tfidf_vectorizer.fit_transform(texts)

# One row per text, one column per remaining term
print(feature_matrix.toarray())
```

Please note, the actual code for hierarchical clustering and plagiarism detection would be significantly more complex than this and would require detailed knowledge about the algorithm and data science principles.
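As a rough illustration of the comparison step only, the sketch below scores every pair of sample texts by cosine similarity of their TF-IDF vectors and flags the pairs above a cut-off. It is a simplified stand-in, not a full plagiarism detector: stop-word removal is skipped to keep it self-contained, and the `THRESHOLD` value is an arbitrary assumption.

```python
from itertools import combinations

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

texts = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]

THRESHOLD = 0.8  # assumed cut-off; would need tuning on known cases

# Vectorize the texts, then compute all pairwise cosine similarities.
features = TfidfVectorizer().fit_transform(texts)
similarity = cosine_similarity(features)

# Collect each pair (i, j) whose similarity reaches the cut-off.
flagged = [
    (i, j, round(float(similarity[i, j]), 3))
    for i, j in combinations(range(len(texts)), 2)
    if similarity[i, j] >= THRESHOLD
]

for i, j, score in flagged:
    print(f"texts {i} and {j} look suspiciously similar (score {score})")
```

The first and last samples contain exactly the same words in a different order, so a bag-of-words measure like this treats them as identical. That is useful for catching reshuffled copies, but it also shows why flagged pairs still need a human to read them.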

Source: AI