What is Open Data & How to Use It
Open data (public data) is information that anyone can access, use and share. Common forms include government data and data created by NGOs such as UNICEF. Open data is useful, for example, for increasing transparency and improving the efficiency of public services.
What is Open Data?
Open data is data that is free to use and republish without restrictions from copyright, patents or other controls. An important source is open government data (OGD), which allows citizens to access governmental data relevant to their lives free of charge. OGD aims to empower citizens and help small businesses to grow.
Open data is based on the idea that certain data should be free for everyone to access. Many advocates, such as scientists, want to see more open data so that they can easily follow new scientific findings, and they stress that this will lead to quicker breakthroughs. The Human Genome Project is one example of where this has been implemented.
Advocates argue that open data will lead to greater technological innovation, economic growth and socio-economic well-being. One area that has expanded rapidly is free online educational courses, which, it can be argued, will lead to a better-educated and more skilled society. Advocates also argue that if data is based on public facts, funded with public money, or created by government agencies, then it should be free to the public.
One argument against open data is that agencies need to charge for data in order to have the resources to publish more of it. It’s also argued that some data is private and should remain so, with access restricted to certain groups.
Open data is defined by the Open Definition, which can be summed up as: “Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness).”
The main tenets are as follows. The data should be available as a whole, at no more than a reasonable reproduction cost, and in a convenient and modifiable form. Reuse and redistribution should be permitted, including intermixing with other datasets. Everyone should be free to use, reuse and distribute the data: no groups should be excluded, such as those wanting to use the data for commercial purposes, and use should not be limited to particular purposes, e.g. only educational ones.
Open data should also be interoperable, meaning that different systems or organisations can work together and combine (intermix) different datasets. This ability to plug different systems together is vital for building complex systems that provide better services and products.
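As a rough illustration of intermixing, the sketch below joins two small datasets on a shared identifier. The CSV contents, the column names (region_code, population, bus_stops) and the helper functions are all hypothetical; real datasets usually need extra work to agree on a common key.

```python
import csv
import io

# Hypothetical sample data standing in for two openly published CSV files.
POPULATION_CSV = """region_code,population
A1,12000
B2,8500
C3,4300
"""

BUS_STOPS_CSV = """region_code,bus_stops
A1,24
B2,10
"""


def read_by_key(csv_text, key):
    """Parse CSV text into a dict keyed on the shared identifier column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def combine(left, right):
    """Join two keyed datasets, keeping only identifiers present in both."""
    combined = []
    for identifier in sorted(left.keys() & right.keys()):
        combined.append({**left[identifier], **right[identifier]})
    return combined


population = read_by_key(POPULATION_CSV, "region_code")
bus_stops = read_by_key(BUS_STOPS_CSV, "region_code")
rows = combine(population, bus_stops)

# Derive a new measure that neither dataset contains on its own.
for row in rows:
    people_per_stop = int(row["population"]) / int(row["bus_stops"])
    print(row["region_code"], round(people_per_stop, 1))
```

Region C3 is dropped because it appears in only one dataset, which is one reason shared, well-documented identifiers matter for interoperability.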
Who uses Open Data and for what use cases?
Open data is already being used to hold governments to account, add value to citizens’ lives, fight crime, prevent fraud, keep people healthy, save lives in natural disasters and get people from A to B, among other uses. Other proven benefits include self-empowerment, improved efficiency and effectiveness, new forms of measurement, and new knowledge from combining diverse datasets.
Businesses of all sizes are using open data, both governmental and non-governmental, to create innovative products and services. Open data makes it easier for businesses to find and access the data they need, so they can spend more time being productive. Companies are using open data to fill gaps in markets, identify new business opportunities and develop new business models, which in turn increases revenue and creates wider social, economic and environmental benefits. This is why many SMEs, especially those offering consultancy services, use open data to grow their businesses.
The main types of open data used by businesses are geospatial data, transport data, environmental data and demographic data, although many more types are used to a lesser degree.
It’s impossible to predict all the uses open data will be put to in the future, as innovations often come from unlikely places. What we can say is that the more data is openly available, the more likely such innovations become.
What are typical Open Data classifications?
Typical open data attributes include:
- Available- free to use
- Accessible- in an easily accessed form
- Reusable- can be republished in another form
- Redistributable- can be mixed with other data
- Unrestricted- can be used for both commercial and non-commercial purposes
How is Open Data typically collected?
Open data is typically made publicly available as follows:
- It is published through a data repository or the publisher’s website.
- It is assigned an open data licence. It is either placed in the public domain with a PDDL (Public Domain Dedication and Licence), or given an Open Database Licence, under which re-users are required to attribute the source and share any changes under the same licence. Examples of licences are the Database Contents Licence (DbCL) and the Open Data Commons Open Database Licence (ODbL).
- The raw data is prepared for publication by attending to quality, technical openness and legal openness, and by adding metadata.
- Re-users decide which categories of open data have most value to their business and choose open data accordingly.
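The licence obligations described above can be sketched as a simple lookup that a re-user might run against a dataset's metadata. This is only an illustration: the field names, the dataset records and the reuse_obligations helper are hypothetical, and the licence texts themselves remain the authority on what is required.

```python
# Illustrative summary of the two licence types described in the text.
# Field names are hypothetical; always check the actual licence text.
LICENCES = {
    "PDDL": {"attribution_required": False, "share_alike": False},
    "ODbL": {"attribution_required": True, "share_alike": True},
}


def reuse_obligations(dataset):
    """Return plain-language obligations for a dataset's declared licence."""
    licence = dataset.get("licence")
    if licence not in LICENCES:
        return ["Licence missing or unrecognised: confirm terms with the publisher"]
    terms = LICENCES[licence]
    obligations = []
    if terms["attribution_required"]:
        obligations.append("Attribute the source")
    if terms["share_alike"]:
        obligations.append("Share adapted versions under the same licence")
    return obligations


# Hypothetical metadata records for three datasets.
print(reuse_obligations({"title": "Street map", "licence": "ODbL"}))
print(reuse_obligations({"title": "Air quality", "licence": "PDDL"}))
print(reuse_obligations({"title": "Unlabelled file"}))
```

Treating a missing licence as a blocker, rather than assuming the data is open, is a deliberately cautious choice in this sketch.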
How to assess the quality of Open Data?
Good-quality open data is essential for success. When assessing the quality of open data, consider its accuracy and timeliness, and whether it is clean, complete and consistent. Checks that should be performed include:
- Check for completeness- The open data should have a header with a description, which is also given in the metadata. It should be labelled with a version number that is updated with each change, so you can keep track of changes. It should state its origin, what it is about and why it has been published, and it should be given a status.
- Check for cleanliness- Are there empty data fields? Is the data correct? Are there erroneous or duplicate entries? Is there privacy-sensitive information? Does it violate legal openness?
- Check for accuracy- Is the data accurate enough for its intended purpose and to be reliable? Does the data need aggregation or disaggregation?
- Check for timeliness- Is the data updated regularly? Is there a process in place to ensure this happens automatically?
- Check for consistency- Is the data presented consistently? Make sure the same standard is used for each dataset.
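Some of these checks can be partly automated. The sketch below is a minimal example covering cleanliness, timeliness and consistency, assuming the dataset has already been loaded as a list of dictionaries. The sample records, the "updated" column and the 365-day threshold are hypothetical; accuracy and completeness of documentation still need human review.

```python
from datetime import date

# Hypothetical records standing in for rows of a published dataset.
records = [
    {"id": "1", "name": "Central Library", "updated": "2024-01-10"},
    {"id": "2", "name": "", "updated": "2024-01-10"},
    {"id": "2", "name": "", "updated": "2024-01-10"},
    {"id": "3", "name": "North Clinic", "updated": "2021-06-01"},
]


def empty_fields(rows):
    """Cleanliness: count fields whose value is blank."""
    return sum(1 for row in rows for value in row.values() if not value.strip())


def duplicate_rows(rows):
    """Cleanliness: count rows that exactly repeat an earlier row."""
    seen = set()
    duplicates = 0
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            duplicates += 1
        seen.add(fingerprint)
    return duplicates


def stale_rows(rows, today, max_age_days):
    """Timeliness: count rows last updated more than max_age_days ago."""
    stale = 0
    for row in rows:
        age = today - date.fromisoformat(row["updated"])
        if age.days > max_age_days:
            stale += 1
    return stale


def consistent_columns(rows):
    """Consistency: check that every row uses the same set of column names."""
    return len({frozenset(row) for row in rows}) <= 1


report = {
    "empty_fields": empty_fields(records),
    "duplicate_rows": duplicate_rows(records),
    "stale_rows": stale_rows(records, date(2024, 3, 1), max_age_days=365),
    "consistent_columns": consistent_columns(records),
}
print(report)
```

A fixed reference date is passed in rather than read from the system clock, so the timeliness check gives the same result each time it is run on the same data.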
How is Open Data typically priced?
Open data is typically priced in one of the following ways:
- It is often free to the re-user.
- Sometimes a small charge is made to cover the costs of publishing, so the original publisher can afford to publish more data.
- Sometimes a larger charge is made if the open data is to be used for commercial purposes.
Common funding models for publishing open data include:
- Government open data model, provided through public funds
- Community volunteers model, such as OpenStreetMap
- Advertising model- where costs to publishers are recovered through advertising revenue
Costs are mainly incurred by the publisher. In the case of governments and other public organisations, this means citizens. Publishers accept this cost in return for benefits such as promoting transparency, improving efficiency, providing more effective services, encouraging innovation and creating new business opportunities.
However, when the costs of publishing open data are prohibitive for publishers, re-users can face hidden costs.
These can include:
- Publishers publish less data, so there is less open data available.
- Publishers publish less frequently, so data is out of date.
- Poorer communities can’t afford to publish as much open data as richer communities, so poorer communities lose out.
- Commercial companies put pressure on governments and other publishers to publish the datasets that are useful for their purposes first, so the general public loses out.
Greater transparency about the cost to governments and other publishers of providing open data would help alleviate these problems.
What are the common challenges when buying Open Data?
Common challenges with open data include:
- Inaccurate data
- Inconsistent data
- Out-of-date data
- Data in a form that can’t be easily accessed.
Other challenges include:
- Availability of open data
- Licences that don’t always allow the open data to be used for commercial purposes.
- Open data that can’t always be traced back to source.
- Labelling- open data is only useful if it can be clearly identified as open and traced back to its source, so that publishers are acknowledged when it is reused. Data isn’t always properly labelled.
- Licensing- open data needs to be clearly defined as open and unrestricted for reuse. Data needs to be clearly licensed either as all rights reserved or under an open licence.
- Research ethics restrictions- while open data is a valuable asset, steps must be taken to safeguard certain information, such as personal information, and to make sure proper consents for usage are in place. Other considerations are ensuring that data is correctly identified as open and that linkage (interoperability) protocols are in place. Best practices for privacy and confidentiality should always be observed.
What to ask Open Data providers?
- How accurate is the open data?
- How consistent is the data?
- Is the open data updated regularly?
- Is the data easily accessed?