ResearchPod

Fuzzy Logic: Opinion uncertainty and helping cities make better decisions


Almost every city in the modern world faces the difficult challenge of understanding its citizens’ opinions and turning them into meaningful decisions.

Miloš Švaňa, a PhD student at Technical University of Ostrava, has chosen this topic for his dissertation, with the aim of developing a framework for social media data analysis. The framework uses fuzzy logic and fuzzy set theory to help municipalities understand the needs of their citizens and to use this information to make better decisions.

Read more from Miloš in an upcoming chapter of Springer Fuzzy Management Methods.

For more on fuzzy logic, visit the FMsquare Foundation website.

Hello, and welcome to ResearchPod. Thank you for listening, and joining us today.

Almost every city in the modern world faces the difficult challenge of understanding its citizens’ opinions and turning them into meaningful decisions. The natural language that we normally use to express ourselves introduces a lot of uncertainty, and this uncertainty grows even further when we consider the opinions of multiple people at once.

Generally speaking, there are two ways to deal with this issue. First, we can try to mitigate the uncertainty by forcing people to express their opinions in a different way, for example, by asking survey respondents to answer a fixed set of questions by choosing from predetermined answers.

The second option is to embrace the uncertainty of natural language and even use it to improve our decisions. One of the many advantages of this shift in mindset is that we don’t have to ask people to actively participate. We can gather a lot of information by analyzing existing natural language data, such as posts on social media.

However, if we want to properly incorporate uncertainty into our decision making, we need to switch to a different toolkit. Fuzzy logic and fuzzy sets are great candidates.

As it turns out, this area is a bit underdeveloped, at least when it comes to tools that can be used in practical municipal decision making. That’s why Miloš Švaňa, a PhD student at Technical University of Ostrava, decided to focus on this topic in his dissertation. His goal was to develop a framework for social media data analysis that would use fuzzy set theory to help municipalities understand the needs of their citizens and use this information to make better decisions.

This framework processes social media data in three steps. The first step involves using sentiment analysis and topic modeling tools to analyze input data, mainly social media posts. 

Most sentiment analysis tools simply classify documents as positive, negative, and sometimes neutral. However, for the purposes of later steps, a method capable of expressing sentiment as a number was required. 

After evaluating a few alternatives, Švaňa chose the sentiment analysis module provided by TextBlob, a text processing library for the Python programming language. This method is lexicon-based, which means that it relies on a list of words with sentiment values assigned to them and calculates the overall sentiment of a document by aggregating the sentiment values of the individual words. The result is a value from minus one to plus one, where minus one represents absolutely negative sentiment, zero represents neutral sentiment, and plus one represents absolutely positive sentiment.
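The lexicon-based idea can be illustrated with a minimal sketch. The tiny lexicon and the helper `lexicon_polarity` are hypothetical, and plain averaging is a simplification: TextBlob's own lexicon is much larger, and it also takes modifiers such as intensifiers and negations into account. With the real library, the score is available as `TextBlob(text).sentiment.polarity`.

```python
import re

# Hypothetical toy lexicon: word -> polarity in [-1, 1].
# These values are invented for illustration, not taken from TextBlob.
TOY_LEXICON = {
    "great": 0.8,
    "good": 0.7,
    "slow": -0.3,
    "terrible": -1.0,
}


def lexicon_polarity(text: str) -> float:
    """Average the polarity of known words; return 0.0 (neutral) if none are known."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    if not scores:
        return 0.0
    # An average of values in [-1, 1] also stays within [-1, 1].
    return sum(scores) / len(scores)


print(lexicon_polarity("Great trams, but terrible roads"))
```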

Topic modeling methods allow us to detect the topics discussed in the input set of social media posts and tell us which of the extracted topics are discussed in each individual post. Each topic is usually represented by a small set of its most representative words.

Again, there are many different approaches to topic modeling, each with its own strengths and weaknesses. Švaňa chose a method called BERTopic, which is built on top of BERT (Bidirectional Encoder Representations from Transformers), a much smaller predecessor to modern language models such as GPT, Gemini, or Claude. One of the most important characteristics of this method is that it performs soft topic assignment. In other words, it assigns each social media post to multiple topics with different membership degrees.

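The second phase of the framework is called fuzzy aggregation. The data produced by sentiment analysis and topic modeling is used to construct a “triangular fuzzy number” for each topic, describing the overall sentiment expressed towards that topic. A triangular fuzzy number is defined by three values: the lower and upper bounds of its support, where membership drops to zero, and its core, where membership reaches one.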
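For reference, the standard membership function of a triangular fuzzy number with support from l to r and core c, assuming l < c < r (the symbol names are our own labels), is:

```latex
\mu(x) =
\begin{cases}
\dfrac{x - l}{c - l}, & l \le x \le c, \\[4pt]
\dfrac{r - x}{r - c}, & c < x \le r, \\[4pt]
0, & \text{otherwise.}
\end{cases}
```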

Unlike a simple aggregate such as the arithmetic mean, this fuzzy number captures not only an average sentiment but also the diversity of opinions, one of the sources of uncertainty we discussed at the beginning.

To find the core of this triangular fuzzy number, Švaňa simply calculates the weighted mean of the sentiment values, as determined by TextBlob, of all social media posts belonging to a given topic. The weight of each post is the degree to which it belongs to that topic.

Determining the support is a bit more complicated. The two boundaries of the support interval are determined separately by calculating what could be called a weighted standard semi-deviation. You are probably familiar with semi-variance. This metric, commonly used in risk assessment, is calculated just like ordinary variance, but it only considers data points on one side of the mean: either to the left of it or to the right of it.

Weighted semi-deviation extends this idea in two ways: first, it uses the topic membership degrees as weights, and second, it takes the square root of the result, the same step that turns variance into standard deviation.
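In formula form, the construction might look as follows. The notation is ours: s_i is the TextBlob polarity of post i, w_ik is its membership degree in topic k, and n is the number of posts. Two details are assumptions rather than something stated in the episode: the semi-variances are normalized by the total membership weight, and the support bounds are the core minus the lower semi-deviation and the core plus the upper one.

```latex
\begin{aligned}
c_k &= \frac{\sum_{i=1}^{n} w_{ik}\, s_i}{\sum_{i=1}^{n} w_{ik}}, \\
\sigma_k^{-} &= \sqrt{\frac{\sum_{i:\, s_i < c_k} w_{ik}\,(s_i - c_k)^2}{\sum_{i=1}^{n} w_{ik}}}, \qquad
\sigma_k^{+} = \sqrt{\frac{\sum_{i:\, s_i > c_k} w_{ik}\,(s_i - c_k)^2}{\sum_{i=1}^{n} w_{ik}}}, \\
\text{support}_k &= \left[\, c_k - \sigma_k^{-},\; c_k + \sigma_k^{+} \,\right].
\end{aligned}
```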

In his dissertation, Švaňa illustrates many desirable properties of this approach. For example, the length of the support interval is maximized if we are aggregating an equal number of posts with absolutely positive and absolutely negative sentiment.
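The construction can also be sketched in Python to check the property above on toy data. This is not Švaňa's code: the function name `sentiment_tfn` is hypothetical, the polarities and membership degrees are invented, and the semi-variances are assumed to be normalized by the total membership weight.

```python
import math
from typing import Sequence, Tuple


def sentiment_tfn(
    polarities: Sequence[float], memberships: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (left, core, right) of a triangular fuzzy number for one topic.

    Assumptions (not confirmed by the source): semi-variances are normalized
    by the total membership weight, and the support bounds are
    core - lower semi-deviation and core + upper semi-deviation.
    """
    total = sum(memberships)
    if total <= 0:
        raise ValueError("memberships must have a positive sum")
    pairs = list(zip(polarities, memberships))
    # Core: membership-weighted mean of the post polarities.
    core = sum(m * s for s, m in pairs) / total
    # Weighted semi-variances of the posts below and above the core.
    lower_var = sum(m * (s - core) ** 2 for s, m in pairs if s < core) / total
    upper_var = sum(m * (s - core) ** 2 for s, m in pairs if s > core) / total
    # Square roots turn the semi-variances into semi-deviations.
    return core - math.sqrt(lower_var), core, core + math.sqrt(upper_var)


# Equal numbers of absolutely negative and absolutely positive posts.
polarized = sentiment_tfn([-1.0, -1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0])
# A milder, invented mix of opinions, for comparison.
mixed = sentiment_tfn([-0.5, 0.0, 0.2, 0.9], [1.0, 0.5, 0.8, 1.0])
print(polarized, mixed)
```

Under these assumptions, the polarized topic gets a support from about -0.71 to +0.71, a width of √2. For polarities between minus one and plus one, that is the widest support this construction can produce: the two semi-variances add up to the weighted variance, which cannot exceed one, and the sum of their square roots is largest when they are equal. The milder mix yields a noticeably narrower support.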

Triangular fuzzy numbers, or TFNs, are a great way to capture sentiment diversity, but they are not the best way to present this information to the end user. Decision makers who want to use social media sentiment are very likely not experts in fuzzy logic. Moreover, if we visualize the TFNs for multiple topics in a single plot, the resulting image quickly gets messy and hard to read.

Švaňa addresses this issue in the third and final phase of the framework, where he determines the levels of positive and negative opinion expressed towards each topic. These two values can then be presented to the end user in many forms, for example, in a table.

The levels of positive and negative opinion are calculated as degrees of similarity between the triangular fuzzy number representing sentiment towards a given topic and fuzzy sets representing the concepts of positive and negative opinion. To find the membership functions of these two fuzzy sets, Švaňa conducted a survey in which respondents rated the amount of positive and negative opinion in randomly selected real tweets. He then found the best-fitting trapezoidal membership functions for both fuzzy sets, defined on the universe of sentiment polarities provided by the TextBlob library.

Interestingly, the positive and negative opinion membership functions were not symmetrical. The negative opinion membership function reached far deeper into positive polarity values than the positive opinion membership function did into negative polarity values. One could speculate that people are more sensitive to the negative aspects of social media posts than to the positive ones.
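A trapezoidal membership function is straightforward to write down. The sketch below uses deliberately asymmetric placeholder parameters, invented for illustration; they are not the values Švaňa fitted from his survey, and the helper name `trapezoid` is hypothetical.

```python
from typing import Callable


def trapezoid(a: float, b: float, c: float, d: float) -> Callable[[float], float]:
    """Build a trapezoidal membership function.

    Membership is 0 outside (a, d), 1 on [b, c], and linear on the two slopes.
    Setting a == b (or c == d) gives a shoulder that stays at 1 up to the edge.
    """
    def mu(x: float) -> float:
        if b <= x <= c:
            return 1.0
        if a < x < b:
            return (x - a) / (b - a)
        if c < x < d:
            return (d - x) / (d - c)
        return 0.0
    return mu


# Hypothetical placeholder parameters on the polarity universe [-1, 1], NOT the
# fitted values. The negative set reaches further into positive polarities
# than the positive set reaches into negative ones.
negative_opinion = trapezoid(-1.0, -1.0, -0.2, 0.5)
positive_opinion = trapezoid(-0.1, 0.6, 1.0, 1.0)

for polarity in (-0.8, 0.0, 0.3, 0.8):
    print(polarity, negative_opinion(polarity), positive_opinion(polarity))
```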

After the two sets were defined, it was possible to calculate the degree of similarity between each of them and the triangular fuzzy number representing sentiment towards a specific topic, and thereby determine the levels of positive and negative opinion.

There are many ways of calculating similarity between fuzzy sets. What worked best for Švaňa’s use case was an approach originally proposed by Lee-Kwang, Song, and Lee in 1994. Simply put, to determine the degree of similarity, we find the intersection of the two sets using the standard minimum t-norm and then determine its height, that is, its highest membership degree. The research covered today used the result of this simple procedure as the level of positive or negative opinion.
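The similarity measure described above, the height of the minimum intersection, can be approximated numerically by evaluating both membership functions on a fine grid over the polarity range. This is a sketch: the grid approximation, the helper names, the example triangular fuzzy number, and the trapezoid parameters are all hypothetical, not taken from the research.

```python
from typing import Callable

Membership = Callable[[float], float]


def tfn(left: float, core: float, right: float) -> Membership:
    """Triangular membership function with support [left, right] and peak at core."""
    def mu(x: float) -> float:
        if x == core:
            return 1.0
        if left < x < core:
            return (x - left) / (core - left)
        if core < x < right:
            return (right - x) / (right - core)
        return 0.0
    return mu


def trapezoid(a: float, b: float, c: float, d: float) -> Membership:
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear slopes between."""
    def mu(x: float) -> float:
        if b <= x <= c:
            return 1.0
        if a < x < b:
            return (x - a) / (b - a)
        if c < x < d:
            return (d - x) / (d - c)
        return 0.0
    return mu


def similarity(mu_a: Membership, mu_b: Membership,
               lo: float = -1.0, hi: float = 1.0, steps: int = 2001) -> float:
    """Height of the min-intersection of two fuzzy sets, approximated on a grid."""
    xs = (lo + (hi - lo) * i / (steps - 1) for i in range(steps))
    return max(min(mu_a(x), mu_b(x)) for x in xs)


# Hypothetical sentiment TFN for one topic and placeholder opinion sets.
topic = tfn(-0.3, 0.1, 0.5)
positive_opinion = trapezoid(-0.1, 0.6, 1.0, 1.0)
negative_opinion = trapezoid(-1.0, -1.0, -0.2, 0.5)
print(similarity(topic, positive_opinion), similarity(topic, negative_opinion))
```

An exact implementation would instead compute where the slopes of the two functions cross; the grid version trades a small approximation error, which can only underestimate the height, for simplicity.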

He did, however, apply one small transformation. It turned out that the levels of positive and negative opinion across different topics were “clumped” together: their values were very similar. So, to help decision makers distinguish between topics, the values were transformed to cover the entire interval from zero to one.
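The episode does not say exactly how the values were stretched. Min-max rescaling is one common way to do it; the sketch below assumes it, applied to one kind of level (positive or negative) at a time. The function name, topic names, and values are hypothetical.

```python
from typing import Dict


def rescale_to_unit(levels: Dict[str, float]) -> Dict[str, float]:
    """Min-max rescale topic levels so the lowest maps to 0 and the highest to 1."""
    lo, hi = min(levels.values()), max(levels.values())
    if hi == lo:
        # All topics identical: nothing to spread out, so map everything to 0.5
        # (an arbitrary choice for this sketch).
        return {topic: 0.5 for topic in levels}
    return {topic: (v - lo) / (hi - lo) for topic, v in levels.items()}


# Hypothetical, tightly clumped levels of positive opinion for three topics.
clumped = {"topic_a": 0.62, "topic_b": 0.58, "topic_c": 0.60}
scaled = rescale_to_unit(clumped)
print(scaled)
```

Min-max rescaling preserves the ordering of topics and stretches the gaps between them proportionally, which is what makes small differences visible; the flip side is that a single outlier topic can compress all the others.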

Taking this research into the real world, Švaňa focused on two Czech cities: Ostrava and Brno. He analyzed tweets published from these two locations between January and March 2023. The framework detected dozens of topics in both cities, and discussing the results in detail would be quite a feat, so let’s focus on a few of the most interesting findings.

One of them is the comparison of opinions on healthcare and education in the two cities. As evidenced by both a lower level of negative opinion and a higher level of positive opinion, people in Ostrava seem to perceive education more positively than healthcare. In Brno, the perception is reversed. This is surprising because, at least when it comes to universities, Brno is typically ranked higher than Ostrava. Given what was observed, it is possible that Brno has some issues at lower levels of education.

Švaňa suggests that a result like this shouldn’t be seen as conclusive evidence that an expensive intervention is needed. Instead, it should be interpreted as a trigger for allocating some time, money, and human resources to a deeper investigation into what is going on with education in Brno.

If we look at the most controversial topics in both cities, fortunately most of them are quite benign. People tend to have diverging opinions on food. Citizens of Brno also disagreed about photography and air travel. In contrast, people in Ostrava argued about blocking users on social media and about appearance and personality.

The proposed framework is still a prototype, and Švaňa intends to keep improving it on many fronts, including the core methodology and the user experience. People involved in actual municipal decision making have offered important suggestions: they would like to use the framework to analyze data from other sources, such as free-form survey responses, and to analyze how sentiment towards selected topics develops over time.

At its fuzzy core, the framework could be improved by incorporating other sources of uncertainty. For example, does it make sense to express the sentiment of an individual post as a single number? Given that people can interpret the same piece of text differently, it might be better to use fuzzy sets to capture this ambiguity. 

Švaňa sees many opportunities to apply fuzzy logic in sentiment analysis, as well as to use sentiment analysis to benefit all people, not just advertisers. Can we really capture the richness of human language in a single number or label, as current approaches do?

Should we use our technologies and skills to make people buy things they don’t need? Or can we do better? 

That’s all for this episode of ResearchPod. Thanks very much for listening. You can find more from Miloš in an upcoming chapter of Springer Fuzzy Management Methods, linked to in the show notes of this episode. For more on fuzzy logic, head to the FMsquare Foundation website, fmsquare dot org. Or search ResearchPod dot org for “Fuzzy”. And, as always, stay tuned for more of the latest science. See you again soon.