Data collection for analysis

Julius Anthony
5 min read · Mar 31, 2023


Once the research problem has been properly identified, data collection is the first step of the data analysis process.

It is the process of gathering and measuring data from numerous sources. The objective of data collection is to obtain high-quality data that will enable the analysis to answer the research questions in a convincing and credible way. This data can be used for analysis, decision-making, monitoring and evaluation. Data can be qualitative (descriptive, conceptual and not represented in numbers) or quantitative (numerical in nature).

Data Collection Methods

There are various ways of collecting data.

One of them is surveys. These are digital and physical questionnaires that gather both qualitative and quantitative data. Surveys can be conducted through various methods such as in-person interviews, telephone interviews, online questionnaires, or paper-based forms.

A paper-based survey form

When using surveys as a data collection method, it is important to look out for biases. Biases are errors or distortions in the data collection process that can lead to inaccurate results.

For example, collection bias occurs when you write questions using biased wording. Such wording pushes a particular perspective and influences survey answers. Take this survey question:

“Do you support the new government policy aimed at reducing taxes and promoting economic growth?”

The question already frames the government policy as a positive thing, without stating any of the policy’s potential drawbacks. This question nudges respondents toward more positive responses, even if they have reservations or alternative perspectives on the policy.

So, it is recommended to phrase your survey questions as neutrally as you possibly can, to avoid introducing preconceived notions or assumptions. A better way to word the question would be:

“What are your thoughts on the new government policy regarding taxes and its impact on economic growth?”

Subject bias is another form of bias that affects surveys. It occurs when the subjects answering the survey give answers they deem socially acceptable or expected, rather than their actual answers. Subject bias can arise from various psychological and social factors, and it can lead to inaccurate or unreliable survey results. For example, when asked about their exercise habits, respondents may overstate the frequency or intensity of their physical activity to align with the expectation of being health-conscious. A related case is when respondents agree with a survey question regardless of their actual beliefs or knowledge.

Another bias is selection bias, where the respondents chosen are not representative of the target population.
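One rough way to look for selection bias is to compare the make-up of your respondents with what you already know about the target population. The sketch below is only an illustration: the age groups, the population shares and the respondent counts are all made-up assumptions, not real figures.

```python
from collections import Counter

# Hypothetical figures for illustration only.
# Share of each age group in the target population (assumed to be known).
population_share = {"18-29": 0.30, "30-49": 0.40, "50+": 0.30}

# Age group recorded for each of 100 imaginary survey respondents.
respondent_groups = ["18-29"] * 60 + ["30-49"] * 30 + ["50+"] * 10

counts = Counter(respondent_groups)
total = sum(counts.values())

# Positive gap: group is over-represented; negative gap: under-represented.
gaps = {}
for group, expected in population_share.items():
    observed = counts.get(group, 0) / total
    gaps[group] = round(observed - expected, 2)

print(gaps)
```

A large gap for any group suggests the sample leans toward or away from that group, so the results may need reweighting or further collection before they are trusted.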

These biases can heavily sway the outcome of your survey, and for this reason, consider pairing survey data with behavioral data from other collection methods to get the full picture.

Another method of data collection is forms, one of the most straightforward ways to collect data. Online forms help in gathering qualitative data about users, specifically demographic data or contact information. They are simple to set up and generally cheap. There are multiple tools you can use to create forms, such as Google Forms, Microsoft Forms, Typeform, JotForm and Wufoo, each with a range of functions to help you maximise the use of forms as a data collection tool.

When creating a form, it is important to make the questions simple to understand and relevant to your research questions. Consider your target audience and avoid unnecessary questions that might frustrate your respondents. A badly developed form also heavily sways the outcome of the data collection process!

A Google form

A little tip: opt for user-friendly and aesthetically pleasing designs, and ensure the user experience is as smooth as possible.

Online tracking is also a reliable data collection method that uses cookies and pixels. Cookies are small text files that websites store on your computer or device when you visit them. These files contain small pieces of information that the website wants to remember about you. Cookies can be used to keep track of your session, personalise your preferences and gather data. Pixels serve a similar purpose, but they are tiny pieces of code embedded in web pages that allow websites to collect data about users and track their activity. Cookies and pixels can be used to track users’ behavioural data on your website, including which parts interest them the most, how long they spend on certain pages and whether they appear confused when using it. Because it records what users actually do rather than what they report, this is a very effective method for data collection.
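To make the idea of a cookie more concrete, here is a minimal sketch that reads one using Python's standard-library `http.cookies` module. The header value, including the `session_id` name and its contents, is a hypothetical example and not taken from any real website.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header value, similar to what a site might send.
raw_header = "session_id=abc123; Path=/; Max-Age=3600; HttpOnly"

cookie = SimpleCookie()
cookie.load(raw_header)

# Each cookie is stored as a "morsel": a value plus its attributes.
morsel = cookie["session_id"]
print("value:", morsel.value)         # the information the site remembers
print("path:", morsel["path"])        # part of the site the cookie applies to
print("max-age:", morsel["max-age"])  # lifetime in seconds, kept as a string
```

The value is what lets a website recognise the same visitor across page views, which is what makes session tracking possible.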

Web scraping is a data collection method that involves extracting information from websites. It is a process where software, known as a web scraper or crawler, automatically visits websites, navigates their pages, and collects data from the HTML or other structured formats.

Web scraping allows you to gather data from multiple websites quickly and efficiently, eliminating the need for manual data entry or browsing through individual web pages. It can be used to extract various types of data, such as text, images, prices, reviews, contact details, or any other data available on a website.

It’s important to note that not all websites permit scraping, and it’s always advisable to check a website’s terms of service or consult legal experts to ensure compliance. Some web scraping tools include BeautifulSoup (a Python library), Scrapy (a Python framework), Selenium, Octoparse and ParseHub.
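As a rough illustration of what a scraper does once it has a page's HTML, the sketch below pulls product names and prices out of a small snippet. It uses Python's standard-library `html.parser` instead of BeautifulSoup or Scrapy so that it runs without extra installs. The HTML, the class names and the `ProductParser` helper are all hypothetical, and a real scraper would first download the page.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would download this first.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Notebook</span><span class="price">4.50</span></li>
  <li class="product"><span class="name">Pen</span><span class="price">1.20</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects the text inside <span class="name"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.current_field = None  # field whose text we are currently inside
        self.rows = []             # one dict per product found

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "product":
            self.rows.append({})   # start a new product record
        elif tag == "span" and attrs.get("class") in ("name", "price"):
            self.current_field = attrs["class"]

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans we care about.
        if self.current_field and self.rows:
            self.rows[-1][self.current_field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self.current_field = None


parser = ProductParser()
parser.feed(SAMPLE_HTML)

# Convert prices from text to numbers so they are ready for analysis.
products = [{"name": r["name"], "price": float(r["price"])} for r in parser.rows]
print(products)
```

Dedicated libraries handle messy, real-world HTML far more gracefully than this, which is one reason they are usually preferred for anything beyond a toy example.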

When selecting a tool, consider factors such as your technical proficiency, the complexity of the scraping task, the need for coding or visual interfaces, and the ability of the web scraping tool to handle large-scale scraping tasks efficiently and effectively. There are multiple tutorials that teach web scraping well, such as the video tutorial by freeCodeCamp and the ones on their website.

You can also monitor social media channels for follower engagement to track data about your audience’s interests and motivations. Many social media platforms have analytics built in, but there are also third-party social platforms that give more detailed, organized insights pulled from multiple channels.

Observations, interviews and focus groups are also fairly accurate methods of collecting data.

Regardless of the method used, it is important to ensure that the data collected is accurate, reliable, and relevant to the research question or problem.

Try your best to understand your data sources so that you can spot errors in the data collection process more easily. Data collection is a crucial stage of the data analysis process, and mistakes made here tend to carry over into every later stage, be it Exploratory Data Analysis or Machine Learning. It is really helpful to develop a set of standards for this process to ensure consistency across your dataset.

Data quality can be improved by using standardized procedures, training data collectors, and using appropriate data validation and cleaning techniques. It is also important to consider ethical issues related to data collection, such as informed consent and confidentiality.
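Data validation can start very simply. The sketch below checks form responses for a few common problems. The field names, the loose email pattern and the accepted age range are assumptions made for illustration, so adapt them to whatever standards you set for your own dataset.

```python
import re

# Deliberately loose pattern: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_response(row):
    """Return a list of problems found in one form response (empty if none)."""
    problems = []
    if not row.get("name", "").strip():
        problems.append("name is missing")
    if not EMAIL_PATTERN.match(row.get("email", "")):
        problems.append("email looks malformed")
    try:
        age = int(row.get("age", ""))
        if not 0 < age < 120:  # assumed plausible range
            problems.append("age out of range")
    except ValueError:
        problems.append("age is not a whole number")
    return problems


# Hypothetical responses, stored as text the way a form export often is.
responses = [
    {"name": "Ada", "email": "ada@example.com", "age": "36"},
    {"name": "", "email": "not-an-email", "age": "abc"},
]

for row in responses:
    print(row, validate_response(row))
```

Running checks like these as the data arrives, rather than at the end, makes it easier to trace a problem back to its source while it can still be fixed.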

And after the data collection process has been carried out, the next step is to clean the data.

