The story of my bachelor thesis about Software Repository Mining

Open Source Community

This post is about my bachelor thesis, a conference, incredible people from Spain, some academic papers, a lot of open source, and how things can turn out if you just go with the flow :)

Everything started in February 2013. At that time, I was studying business informatics alongside my full-time job as a software developer at wmdb Systems GmbH (mainly as a web developer for TYPO3-related projects).

The standard duration of this degree program is seven semesters. In February 2013, I was in my 5th semester and had passed every exam so far. Right on (regular) schedule.

In the 7th semester, I would have to write my bachelor thesis: a scientific paper of ~40-60 pages on a specific topic. I thought a lot about possible topics, and I knew that I wanted to write something about software, software development, etc. But I did not have a specific topic in mind at that time.

On 2nd & 3rd February 2013, I visited FOSDEM in Brussels, Belgium, for the first time with some of my friends. It is not far from my hometown (~200 km), so it was a very short ride. And due to the wide range of topics, the conference sounded very interesting!

At the conference, I attended many talks about different topics. But one talk was special for me: Do you want to measure your project? by Jesus M. Gonzalez-Barahona (Video, Slides). Jesus talked about MetricsGrimoire, a toolset to crawl data produced during software development, and VizGrimoire, a toolset to visualize the collected data. Some of the MetricsGrimoire tools are CVSAnalY to crawl version control systems (CVS, Subversion, Git, …), Bicho to extract data from bug trackers (Jira, Redmine, Mantis, …), and MLStats to get email content from mailing lists.
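To give a rough idea of what "crawling a VCS" means, here is a minimal Python sketch that counts commits per author from `git log` output. This is not how CVSAnalY works internally; the function names `read_author_log` and `commits_per_author` and the sample data are made up for illustration only.

```python
import subprocess
from collections import Counter


def read_author_log(repo_path: str) -> str:
    """Return one author name per commit, as printed by `git log`.

    Hypothetical helper: assumes `git` is installed and `repo_path`
    points to a local Git repository.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=format:%an"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def commits_per_author(log_output: str) -> Counter:
    """Count commits per author from newline-separated author names."""
    authors = (line.strip() for line in log_output.splitlines())
    return Counter(name for name in authors if name)


# Made-up sample data, so the sketch runs without a repository.
sample_log = "Alice\nBob\nAlice\n"
counts = commits_per_author(sample_log)
print(counts.most_common())
```

For a real repository, something like `commits_per_author(read_author_log("/path/to/repo"))` should give the same kind of result. The real tools go much further and store the extracted data in a database for later analysis.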

After every conference, I'm very motivated to start new things and take a more in-depth look at the topics of the talks I attended. FOSDEM was no exception. After the conference, I took a closer look at CVSAnalY. I downloaded it, installed it, and started to crawl some repositories I knew (e.g. TYPO3.CMS.git). Sixteen days after the conference, I made my first contribution to CVSAnalY: a short note about MySQL's max_allowed_packet setting.

At that time, I had not written a single line of Python. But every programmer should learn one new language per year anyway. So I started to learn Python and kept contributing to CVSAnalY. Around the same time (Apr 19, 2013), I started TYPO3-Analytics, a project to analyze and visualize various data sources of the TYPO3 ecosystem based on open/standard APIs. CVSAnalY was integrated into this analysis suite.

From my point of view, analyzing the data that is produced during software development was a fascinating topic because I'm a software developer. And how cool is it to get new information and knowledge from data you have produced yourself?!? So I continued to develop TYPO3-Analytics and started to do a little research into the Mining Software Repositories field.

Thanks to this enthusiasm, I found the topic for my bachelor thesis: Software Repository Mining - Concept and Potentials. In October / November 2013, I looked for a professor as an adviser, registered the topic at my university, started my research, and wrote the thesis. I continued to work on TYPO3-Analytics and CVSAnalY. During this time, I found a lot of exciting papers about programming topics. Here is a small list of these papers (if you want more, just ping me ;)):

On 1st & 2nd February 2014, FOSDEM took place again in Brussels, Belgium. Thanks to the conversations on the MetricsGrimoire mailing list, I knew that people from Bitergia (@jgbarah, @sanacl, @dizquierdo, etc.) would be there as well. I tweeted that I wanted to meet them, and ~1 hour later we had a lovely chat in the cafeteria of the University of Brussels. I showed them the concepts of TYPO3-Analytics and got positive feedback.

In the ~13 months since FOSDEM 2013, a lot has happened. Along the way, I picked up or got to know …

  • new tools (CVSAnaly, Bicho, …)
  • a new programming language (Python)
  • a lot of programming experience
  • concepts with new tools (TYPO3-Analytics, RabbitMQ, supervisord, Vagrant, Chef, …)
  • new friendly and interesting people (Jesus, Luis, Daniel, …)
  • a fresh business concept (Bitergia)
  • much new knowledge about an exciting research field I did not know before (Mining Software Repositories)

And finally, I earned my Bachelor of Science degree today.

The whole story might have turned out differently if I had not visited FOSDEM in February 2013. A huge THANK YOU to everyone who was part of this story (tool creators, speakers, Twitter users, IRC chat participants, and other members of the open source community). The whole time was a lot of fun. Of course, there was stress, too. But the fun far outweighed it.

And what is the conclusion of this story? I do not know. Maybe: visit conferences and be part of an amazing community? I do not know. But I can recommend giving it a try! If you need help getting started because you do not know how: ping me :) I will help you get your foot in the door!