In what we can already call our fifth session, we started off speaking only English and reflected on the previous session. We learned a lot about linked data and how to build a linked data model by writing RDF in the Turtle syntax. We also did a great co-creation exercise with Stanford's Protégé, something I think is very important for this course to succeed. I really enjoyed learning a bit about writing code. The way linked data is built from triples of subject, predicate and object was also new to me, and it turned out to be really interesting.
Ivar gave us a short summary of the upcoming exercises and subjects. He talked about the start of the final project and the article, which needs to be around 5000 words long and about a self-chosen subject. We also talked about project management, which is important for running this project smoothly. Project management is about how a project is organized, planned, prepared and completed. An important project management tool is the Kanban board. It keeps a project organized, which helps create a good working environment. The picture below is one of the variants of the Kanban board. Each task is placed on the board in the column for the stage it is currently in. David, Stan and I will definitely use a Kanban board for our project.
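To make the subject, predicate and object idea a bit more concrete, here is a small sketch in plain Python that stores a few triples and writes them out in Turtle. It is only an illustration under stated assumptions: the `ex:` namespace (`http://example.org/`) and the resources in it are hypothetical, `foaf:` is the real FOAF vocabulary, and the helper functions are made up for this example rather than taken from any RDF library.

```python
# A minimal sketch of linked data as subject-predicate-object triples,
# serialized by hand to Turtle. The ex: namespace and its resources are
# hypothetical examples; foaf: is the real FOAF vocabulary.

PREFIXES = {
    "ex": "http://example.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Each triple is (subject, predicate, object). An object is either a
# prefixed name such as "ex:David" or a plain string literal.
triples = [
    ("ex:Brent", "foaf:name", "Brent van Dijken"),
    ("ex:Brent", "foaf:knows", "ex:David"),
    ("ex:Brent", "foaf:knows", "ex:Stan"),
    ("ex:David", "foaf:name", "David"),
]


def is_prefixed_name(term):
    """True if the term looks like prefix:local with a known prefix."""
    prefix, sep, _ = term.partition(":")
    return sep == ":" and prefix in PREFIXES


def format_term(term):
    """Leave prefixed names as-is; quote everything else as a string literal."""
    if is_prefixed_name(term):
        return term
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_turtle(triples, prefixes):
    """Group triples by subject and join predicate-object pairs with ' ;'."""
    lines = [f"@prefix {p}: <{iri}> ." for p, iri in prefixes.items()]
    lines.append("")
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))
    for subject, pairs in by_subject.items():
        body = " ;\n    ".join(f"{p} {format_term(o)}" for p, o in pairs)
        lines.append(f"{subject} {body} .")
    return "\n".join(lines)


print(to_turtle(triples, PREFIXES))
```

Every line of the output is one subject with its predicate-object pairs, which is the same structure we wrote by hand in the session.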
Then Simon, Ivar, David, Stan and I talked about “the Article”, and they showed us a few things about writing one. Simon and Ivar showed us that Google Scholar is a great place to find information and articles, and they gave us a button that gives access to documents that are not freely available to everyone. Among other things, they also taught us the basics of writing an article using the APA research paper model. They taught us these things because David, Stan and I are three secondary school students taking part in the U-Talent academy. We are doing this course for our thesis (and for our “Profiel Werkstuk”), so we do not yet know everything about writing a proper article.
Then we started preparing, with quite some difficulty, for Dr Viola's workshop. We used the package manager Homebrew, and Dr Viola began talking about topic modelling. I will give a summary of what she taught us.
A topic is “a number of words that are related to each other” or “a set of terms that are likely to occur together”. A topic modelling tool like Mallet takes unstructured text, without machine-readable annotations, and looks for topics in it. The tool does not know the meaning of the words. Instead, the model runs statistical calculations to determine which words are most likely to occur together and groups them into clusters; as defined above, such a cluster is what we call a topic.
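Mallet's topic models are based on latent Dirichlet allocation (LDA), fitted with Gibbs sampling. The sketch below is a toy version of that idea in plain Python, not Mallet's own code: every word starts in a random topic, and the sampler repeatedly moves each word towards topics that are common in its document and already contain that word, so words that occur together end up clustered. The function name `lda_gibbs`, the prior values `alpha` and `beta`, and the tiny corpus are all hypothetical choices for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, top_n=5, seed=0):
    """Toy LDA via collapsed Gibbs sampling; returns the top words of each topic.

    docs: list of documents, each a list of word tokens (assumed to be
    lower-cased and stripped of stop words). alpha and beta are the
    Dirichlet priors on document-topic and topic-word distributions.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # words in doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]   # times word w is assigned to topic k
    topic_total = [0] * num_topics                       # total words assigned to topic k

    # Start by assigning every word occurrence to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the current assignment out of the counts...
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # ...then resample it: topics that are common in this document
                # and already contain this word get a higher weight.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # A topic is reported as its most frequently assigned words.
    return [
        [w for w, c in topic_word[k].most_common() if c > 0][:top_n]
        for k in range(num_topics)
    ]


# Hypothetical mini-corpus with two themes (a market and public transport).
docs = [
    "market stall vendor cheese market".split(),
    "vendor cheese flowers market stall".split(),
    "tram bus station traffic tram".split(),
    "bus station traffic cyclists tram".split(),
]
for k, words in enumerate(lda_gibbs(docs, num_topics=2)):
    print(f"topic {k}: {' '.join(words)}")
```

Note that the code never looks at what the words mean; it only counts which words share documents, which is exactly why you still need to know your data to interpret the topics.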
Topic modelling has many purposes. It is very useful if you need to examine a large corpus in which topics are hard to find by hand, if you are looking for general information and patterns, or if you simply do not have much time. However, topic modelling is less useful if your research is about collocations, or if it is highly interpretive and purely qualitative.
Then we continued with topic modelling using Mallet. Mallet finds the topics without any intervention of the researcher during the calculation, which makes the results less dependent on the researcher's own expectations. In Mallet you can vary the number of topics. You then need to look at the resulting topics and decide which number of topics gives you the best quality of results. The topics are really useful, but you still need to know your data; otherwise you won't understand the topics.
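Because choosing the number of topics comes down to running the model several times and comparing the results, a small helper that prepares one Mallet run per topic count can save some typing. This is a hedged sketch: the `import-dir` and `train-topics` commands and the flags used here come from Mallet's command-line interface, but the Mallet path, the folder `neude_articles` and all file names are hypothetical assumptions. The script only builds and prints the commands; it does not run Mallet.

```python
import shlex

# Assumptions: run from the Mallet installation directory on macOS/Linux;
# the corpus folder and all file names below are hypothetical.
MALLET = "bin/mallet"
CORPUS_DIR = "neude_articles"   # hypothetical folder of plain-text articles
MALLET_FILE = "neude.mallet"    # hypothetical name for the imported corpus


def import_command():
    """Import a folder of text files into Mallet's format (done once)."""
    return [MALLET, "import-dir", "--input", CORPUS_DIR, "--output", MALLET_FILE,
            "--keep-sequence", "--remove-stopwords"]


def train_command(num_topics):
    """Train a topic model with a given number of topics, one output set per run."""
    return [MALLET, "train-topics", "--input", MALLET_FILE,
            "--num-topics", str(num_topics),
            "--output-topic-keys", f"keys_{num_topics}.txt",
            "--output-doc-topics", f"composition_{num_topics}.txt"]


print(shlex.join(import_command()))
for k in (5, 10, 20):  # candidate topic counts to compare
    print(shlex.join(train_command(k)))
```

Keeping a separate `keys_<k>.txt` file per run makes it easy to put the topic lists next to each other and judge which number of topics reads best. To actually execute a command, each list could be passed to `subprocess.run(cmd, check=True)`.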
Next time I would like to be informed beforehand, so that we can prepare before the session and avoid problems like the ones we had installing Homebrew during the course.
We could use topic modelling to create an organized overview of all the different newspaper articles related to the Neude. We will definitely try to use topic modelling in our project.
By Brent van Dijken