From the 7th to the 8th February 2012 I attended an Industry Forum for Web Science in Southampton. I presented a poster on reading hyperlinks on the Web and using the methodology of eye tracking to measure the impact of hyperlinks on reading. Below is a link to a pdf of the poster.
Fitzsimmons, G., Weal, M. & Drieghe, D. (2013, February). On Measuring the Impact of Hyperlinks on Reading. Poster presented at the Industry Forum for Web Science, Southampton.
The Importance of Politics to Web Science
Web Science is a new and exciting research field, and has been created in response to the massive impact that the modern Web has upon our lives. Web Science takes an interdisciplinary approach – i.e., not just studying the software or hardware that creates the Web, but tackling what the Web is, and what influence it has, from a broad range of perspectives. These include, but are not limited to, areas as diverse as archaeology, sociology, psychology, economics, and many others. In this blog post, I will be focusing on how politics is important to Web Science. Politics is a particularly interesting topic in relation to Web Science because of the immediate impact that political issues and elections can have upon society. I will be addressing the following two inter-related questions:
- Does the Web influence how the government interacts with the people?
- Does the Web influence how the people view the government, politicians, or political issues?
I will address these questions in turn, highlighting how they have been studied so far. I will then close with some concluding thoughts and suggestions for additional reading.
Does the Web influence how the government interacts with the people?
The Web is a fantastic service for allowing large numbers of people to interact with the government. To date, the government has made a number of attempts at collecting and collating vast quantities of information and presenting them online. One good example of this is the Directgov website. Though it is clear that this website is comprehensive, it is not necessarily very easy to navigate, or very easy to use to find information.
A more direct method for the government to interact with people comes in the form of petitions and requests. There is an official government e-petitions website where petitions can be put forward and signatures added. Widely supported petitions have already resulted in direct action from the government. For example, Alan Turing, the famous computer scientist, was given a public apology by Gordon Brown for his treatment prior to his suicide.
Turning to local government, many local councils now allow for online services such as online commenting on planning applications, which again makes it easier for the local population to have their voices heard in how their local area is governed.
Challenges for the Future
It remains an open question whether the government-provisioned websites such as Directgov are actually effective. To date, there has been relatively little research done to examine how easily these sites can be navigated, or the impact that they have. Furthermore, there are broader issues that need to be considered in relation to these sites. Although Internet access is now widespread, there are still many individuals who are not able to access the Internet: this is known as the digital divide. As can be expected, anyone without Internet access will be left behind in terms of being able to utilise government websites and the services that they have on offer; as a result, they will be unable to make their voices heard (e.g., with petitions).
Elsewhere, a new form of government-interaction website has begun to emerge, in the form of TheyWorkForYou.com. This site is aimed at enabling people to discover exactly what their local MPs have been doing, including how they have been voting, what debates they have attended, and so on. Though sites of this type are rather new and have not yet been studied in depth, the claims made by the charity which runs the site (mySociety.org) are impressive in terms of getting the government and people to interact (see Figure).
What does this mean for Web Science? The fact that people can now directly interact with the government online, and find out what the government is doing, suggests that this area of politics is important to Web Science, though, as already noted, much of this remains to be researched in full.
Does the Web influence how the people view the government, politicians, or political issues?
The rise of social websites (e.g., Facebook and Twitter) has had an influence on how people view the government, politicians and political issues. There are many ways in which people can be reached with political information on social websites, including friends commenting on political issues or sharing politics-related links from elsewhere on the Web. People can also follow politicians’ accounts directly. During the 2010 elections, David Cameron was one of the first politicians to use online videos for advertising, on his YouTube channel, WebCameron. Nick Clegg had daily videos on YouTube on the LibDem channel during his campaign and Gordon Brown also appeared in his own YouTube videos via the official Number 10 channel. Each had their own take on the videos, trying to show themselves in a different light; e.g., David Cameron had home-movie style shots showing him washing up like an ‘everyman’. This was an important component in him reaching out to the masses, rather than appearing as a wealthy member of the upper class.
Future Challenges: The Good and the Bad
With the rise of social websites, we have seen both the good and the bad in relation to politics. It has been argued by many that without the availability of social networks to allow rapid organisation between large numbers of individuals, the revolutions that took place in the Middle East during 2011 would not have been possible. This demonstrates clearly how politics can be very important indeed to Web Science. However, it has also been argued that social websites were used by rioters in the UK to organise where to meet and what locations to target, though there was a positive side to this particular story: the London riot clean-up operation used social networking as a platform to organise a clean-up after the riots. This shows that both mass unrest and the response to it can be organised through social networking.
Elsewhere, politicians keen to embrace the Web have also found that their efforts do not always go as planned. Gordon Brown’s video on the MPs’ expenses scandal was highlighted as bad PR, not for its content, but because of Gordon Brown’s awkwardness during the video. A spoof of the video was created and the message of the original was lost, showing that putting things online does not always work out well.
Much of what has been discussed above rests on conjecture or suggestions relating to cause and effect, or assumes that certain websites are effective without a great deal of hard evidence. Though from these examples it is clear that the study of politics is important to Web Science, and has much to teach Web Scientists, there are still a large number of unanswered questions. For example: How many people visit Directgov? Do they find it useful? To what extent does Web use actually correlate with (or cause) voting for one political party over another? Addressing these questions is made particularly difficult as the Web is developing so rapidly, and, furthermore, many of the answers to these questions are bound up in the blogs, Facebook statuses and Tweets of many millions of people, which are hard to access and difficult to examine in a scientific manner.
Cohen, S., & Eimicke, W. (2003). The future of e-government: A project of potential trends and issues. Proceedings of the 36th Hawaii International Conference on System Sciences, IEEE Xplore database.
Towner, T.L., & Dulio, D.A. (2011). The Web 2.0 election: Does the online medium matter? Journal of Political Marketing, 10, 165-188.
We live in the Information Age: we have access to more information and more data than people have ever been able to access before. Most people would agree that this has had huge benefits. For example, in the past, if we had wanted to find out something, or read up on a topic (such as about a famous person or battle), we would have had to search an Encyclopaedia for an answer. If the Encyclopaedia had no answers for us, we may have had to go down to our library to research the answers we were looking for. But that is no longer the case. With the creation and expansion of the modern Web, we can find out information after typing just a few words into a search engine.
However, with the expansion of the Web and information storage, we are facing a new set of problems. We now have vast quantities of data (more than we can really imagine or comprehend) and are creating more and more each day. A good example is that most modern smartphones have data storage capacities that dwarf that of the average home PC from just five years ago. At a recent conference, Eric Schmidt, executive chairman of Google, commented that we create as much information in 2 days as we did up to 2003, which is around 5 exabytes of information created every 2 days! With the proliferation of social websites, the amount of information and data being created has expanded even more rapidly:
- Facebook has over 250 million photos uploaded per day on average
- Twitter sees 50 million tweets per day
- YouTube has 48 hours of video uploaded every minute (which is nearly 8 years of content a day)
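The "nearly 8 years" figure for YouTube can be checked with a few lines of arithmetic. This is just a quick sanity check using the upload rate quoted above and assuming a 365-day year:

```python
# Sanity check of the YouTube figure quoted in the list above.
HOURS_UPLOADED_PER_MINUTE = 48        # upload rate quoted above
MINUTES_PER_DAY = 24 * 60

# Hours of video arriving in one day.
hours_per_day = HOURS_UPLOADED_PER_MINUTE * MINUTES_PER_DAY

# Convert hours -> days -> years (assuming a 365-day year).
years_per_day = hours_per_day / 24 / 365

print(f"{hours_per_day} hours of video per day, about {years_per_day:.2f} years")
```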
It’s perhaps no surprise that Google’s Chief Economist, Hal Varian, has said that dealing with all this data will be “the sexy job” of the next 10 years. He has argued that all companies will need data scientists to keep up with the ever-growing requirements of the information that we are generating.
In this article, I will explore the various algorithms that are being used to deal with the massive increases in data that we are creating, covering distributed processing, Semantic Web and natural language processing.
Distributed processing – Multicore processing and Cloud computing
What is it?
A standard algorithm would operate in serial - in other words, the algorithm would run through the data available, piece by piece, until it was finished. However, this can be slow with large datasets, so one solution to dealing with massive datasets is to split up the task of dealing with those datasets into smaller pieces. This can be achieved with distributed processing, which involves running distributed algorithms on a number of different processors, either using a single machine (called parallel processing), or across multiple machines (cloud computing). The algorithm splits up the task and runs on the different processors, vastly speeding up the computation process.
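To make the split-and-combine idea concrete, here is a minimal sketch in Python. It counts words in a list of lines, first serially and then by cutting the list into chunks that are handed to a pool of workers. The function names and the sample data are made up for illustration. It uses a thread pool purely to show the structure; for genuinely CPU-heavy work in Python you would probably swap in a process pool or a cluster framework instead.

```python
from concurrent.futures import ThreadPoolExecutor

def count_words(chunk):
    """Sub-task: count the words in one chunk of lines."""
    return sum(len(line.split()) for line in chunk)

def split_into_chunks(items, n_chunks):
    """Cut the dataset into roughly equal, non-overlapping pieces."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def serial_count(lines):
    """Serial version: one worker walks through all of the data."""
    return count_words(lines)

def distributed_count(lines, n_workers=4):
    """Split the data, run the sub-tasks on a pool of workers, then combine."""
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    return sum(partial_counts)

# Made-up sample data: 3,000 short lines.
lines = ["the web is big", "data keeps growing", "algorithms must scale"] * 1000
print(serial_count(lines), distributed_count(lines))
```

Both versions give the same answer; the only difference is that the second one can hand each chunk to a different worker.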
How will it help?
Using distributed algorithms can save time and money compared to serial algorithms. Distributed algorithms were recently used to calculate the 2,000,000,000,000,000th digit of pi (that’s the two quadrillionth digit). “It took 23 days on 1,000 of Yahoo’s computers – on a standard PC, the calculation would have taken 500 years.” In case you were wondering, the digit was ‘0’.
What are the Limitations?
There are challenges in creating distributed processing algorithms because they are much more complex than serial algorithms. Given that parallel algorithms work by cutting up a single large task into many smaller tasks, it’s vital that the algorithm will still work on the data when it has been broken down into smaller pieces. Whenever one part of the algorithm is unable to finish because it is waiting for information from another part of the algorithm, there is lag, which can be costly, so co-ordinating the distribution process is vital. Problems can also arise when different processors which are working on different sub-tasks all try to modify the same data at the same time. This means that the most efficient way to run a distributed algorithm is to keep the parts independent of each other, so they can be completed separately and then combined when all sub-tasks have finished.
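The difference between sharing data and keeping sub-tasks independent can be sketched as follows. This is a toy illustration with made-up function names, not a benchmark. In the first version every worker updates one shared total, so each update has to be guarded by a lock and workers end up waiting for each other. In the second version each worker returns its own partial result and the results are only combined at the end.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def total_with_shared_state(chunks):
    """Every worker modifies the same total, so each update needs a lock."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        for value in chunk:
            with lock:          # workers queue up here, which is the 'lag'
                total += value

    with ThreadPoolExecutor(max_workers=max(1, len(chunks))) as pool:
        list(pool.map(work, chunks))   # list() waits for all workers to finish
    return total

def total_with_independent_parts(chunks):
    """Each worker sums its own chunk; results are combined once at the end."""
    with ThreadPoolExecutor(max_workers=max(1, len(chunks))) as pool:
        partial_totals = list(pool.map(sum, chunks))
    return sum(partial_totals)

# Made-up data: the numbers 0..299 split into three chunks.
chunks = [list(range(0, 100)), list(range(100, 200)), list(range(200, 300))]
print(total_with_shared_state(chunks), total_with_independent_parts(chunks))
```

Both give the same total, but the second needs no co-ordination between workers until the final combining step.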
Semantic Web
What is it?
Distributed algorithms are used widely for various tasks in crunching data. But let’s turn now to searching for information within the masses of data that we are generating. When we use a search engine, we enter a string of words that we are searching for, and the search engine dutifully goes off and tries to find the information we are looking for. But there’s a problem. The search engine doesn’t ‘understand’ the pages it is examining, so it can often suggest pages that we don’t want.
One solution to this that has been proposed is the Semantic Web. The Semantic Web approach promotes detailed formatting of data and Web pages to give rise to an ‘intelligent’ version of the Web. When Web pages are formatted using Semantic content, a Semantic search engine will be able to understand the content on each page and then be more able to locate what users are searching for. Even better, it will be able to hunt down words and phrases related to what is being searched for.
Within the Semantic Web, an ontology (a structure of knowledge) defines the entities, classes, relationships and rules within a specific domain of knowledge. An ontology is created using RDF (Resource Description Framework, a framework for describing data such as title, type of content, etc.) and OWL (Web Ontology Language, a language designed to be read by computer applications to help them ‘understand’ the information), which together give a hierarchical description of structured data.
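As a rough illustration of the idea, the sketch below models a tiny ontology in plain Python. Each fact is a (subject, predicate, object) triple, which is the basic shape RDF uses, and a small class hierarchy lets a query for one class also find items of its subclasses. This is a toy model only: it does not use real RDF or OWL syntax or any RDF library, and all the names in it (`ex:Novel`, `subClassOf`, and so on) are made up for the example.

```python
# Toy in-memory 'ontology': every fact is a (subject, predicate, object) triple.
# All identifiers are hypothetical and exist only for this example.
triples = {
    ("ex:Novel", "subClassOf", "ex:Book"),
    ("ex:Book", "subClassOf", "ex:Publication"),
    ("ex:emma", "type", "ex:Novel"),
    ("ex:emma", "title", "Emma"),
    ("ex:emma", "author", "Jane Austen"),
}

def objects(subject, predicate):
    """All objects linked to a subject by a given predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def superclasses(cls):
    """Follow subClassOf links upwards to collect every ancestor class."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for parent in objects(current, "subClassOf"):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found

def instances_of(cls):
    """Everything typed as cls, or as any subclass of cls."""
    return {
        s for s, p, o in triples
        if p == "type" and (o == cls or cls in superclasses(o))
    }

print(instances_of("ex:Publication"))
print(objects("ex:emma", "author"))
```

Because the hierarchy is stored alongside the data, a search for publications finds the novel even though it was never labelled as a publication directly.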
How will it help?
A useful analogy is to think of the Semantic Web operating rather like organising a library into meaningful sub-sections, making it possible to browse through the related content and books without having to exhaustively search every book in the building. In other words, by including additional, meaningful data in Web content, searches will be more accurately able to pinpoint what the user is trying to find. Search engines which incorporate semantic information should be able to suggest answers from other, related words or phrases, helping you to find what you are looking for much more easily.
What are the Limitations?
The main limitation is that the semantic data needs to be set up, maintained and updated. At the moment semantic data is not on every Web page, and it would take time to add the information. This is a classic problem with adopting new forms of technology.
Natural Language Processing
What is it?
Another approach to using more ‘intelligent’ algorithms comes in the form of Natural Language Processing (NLP). This involves mining facts from unstructured data, which is useful because naturally-occurring language data is very common on open-ended information sources such as the Web. NLP uses machine learning algorithms to learn, piece by piece, a model of human language, and derives information from the models that are generated. It is a branch of artificial intelligence which utilises algorithms that can learn over time based upon the data that they receive.
NLP algorithms are capable of many tasks, such as:
- Relationship extraction – given a chunk of text it can work out relationships in the text. Then if you ask “Who is the wife of David Cameron?” after giving it a news report about David Cameron and his family it will be able to work it out from text.
- Question answering – given natural language questions it should be able to automatically answer them.
Many more tasks that they are capable of are listed here.
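To give a flavour of relationship extraction and question answering, here is a deliberately simple sketch. It uses a hand-written pattern rather than machine learning, so it is far cruder than a real NLP system, and the pattern, function names and sample sentence are all made up for illustration.

```python
import re

# Hypothetical hand-written pattern for a capitalised name such as "David Cameron".
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# Matches "<Name>, the wife of <Name>" or "<Name> is the wife of <Name>".
WIFE_PATTERN = re.compile(rf"({NAME})(?:,| is) the wife of ({NAME})")

def extract_spouses(text):
    """Relationship extraction: map each husband to the wife named in the text."""
    return {husband: wife for wife, husband in WIFE_PATTERN.findall(text)}

def answer_wife_question(question, text):
    """Question answering for one question shape: 'Who is the wife of X?'"""
    match = re.fullmatch(rf"Who is the wife of ({NAME})\?", question)
    if not match:
        return None  # not a question this toy system understands
    return extract_spouses(text).get(match.group(1))

# Made-up sentence standing in for a news report.
report = "Samantha Cameron, the wife of David Cameron, attended the event."
print(answer_wife_question("Who is the wife of David Cameron?", report))
```

A real NLP system would learn patterns like this from large amounts of text rather than having them written by hand, which is what lets it cope with the many different ways the same relationship can be phrased.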
Perhaps the most recent example of NLP algorithms in use is Siri, the Apple application which can understand and complete tasks spoken in natural language. It can also learn as you use it, e.g. remembering people. If you say ‘call my wife’, it can remember who she is and link the phrase to her name, so in future saying ‘call my wife’ will call her without you having to say her full name.
How will it help?
NLP can be used to answer complex questions whose answers are embedded within the open-ended, unstructured language and information that is predominant on the modern Web. It can therefore rapidly answer questions that a simple Web search may not be able to address very easily. Unlike semantic content, we don’t need to add the data formatting ourselves: NLP algorithms can work it out for themselves.
What are the Limitations?
As with the distributed processing algorithms, NLP algorithms are complex and difficult to create. They require a massive corpus of data to be trained effectively, and take time to ‘understand’ the data they have available to them. Furthermore, it has been reported that these algorithms also have heavy data loads when used with Web applications. For example, iPhones using Siri gobble up twice as much data as the previous iPhone model. This means that mobile service providers may soon have to expand the data transfer speed and bandwidth of their networks to keep up with the data requirements of these NLP algorithms.
The future of algorithms involves focusing on the massive task of dealing with the substantial amount of data being generated. We need to improve technological power through distributed processing and cloud computing so that algorithms can be faster and more efficient. Improvement in hardware is not enough though, so we also need to focus on how the data is organised to make it easier to sort and search. The Semantic Web will help with this, and future algorithms can take full advantage of it. Finally, natural language processing is a step forward in complex and advanced algorithms, creating a way of searching and sorting various data that makes interacting with technology more natural and intuitive.