Every OSINT expert was once a beginner. Learn these need-to-know OSINT basics to jump-start your OSINT journey.
As the world continues to digitize, gathering intelligence from online public data sources becomes necessary for organizations and individuals alike. For organizations, open-source intelligence (OSINT) provides a cost-effective and fast way to get valuable information about different aspects of their business operations. Individual users rely on OSINT in some form every day when they search for information online. Knowing OSINT tools and techniques significantly simplifies the search process when users need to find something online.
In this article, I will give a gentle introduction to the world of OSINT gathering, mentioning its types, search methods, and tools, and suggesting a methodology when gathering intelligence from publicly available data.
What is OSINT?
In a nutshell, open-source intelligence refers to intelligence gathered from publicly available data. The majority of OSINT data is freely accessible. However, some exceptions exist, such as books, journals, and other content residing behind a paywall. This content is still considered part of OSINT, but you need to pay to access it – for example, by purchasing a subscription to a digital library.
OSINT types
We started this article by talking about the volume of digital data currently available. However, OSINT is not only acquired from online sources; offline sources also play an essential role in OSINT gathering. Here are the primary sources for OSINT:
- Internet – this is the most-used source of OSINT data. It includes all publicly available data online, such as news websites, blogs, discussion forums, social media platforms, government databases (deep web), dark web networks (Tor, I2P, Freenet) and anything else accessible over the internet.
- Traditional media news – such as paper newspapers, TV, radio broadcasts and street advertisements.
- Grey literature – this includes academic journals, books, and dissertations. Some of this content requires payment to access.
- Corporate records, business meetings, conference proceedings, press releases, tax filings and annual reports
- Geospatial data – such as data acquired from online maps and satellite images
Who needs OSINT?
OSINT can be used across different sectors and industries. Almost all businesses and government agencies need to gather OSINT. Here are some examples of who might benefit from OSINT:
- Government agencies
- Law enforcement, intelligence services and military organizations
- Threat intelligence (Cybersecurity) consultancy firms
- Non-profit organizations – such as the UN and Red Cross
- Media companies – such as newspapers, TV, radio and advertisement companies
- Academia and researchers in different fields and studies
- Digital forensics investigators
- Financial institutions such as banks and insurance companies
- Healthcare organizations
- Enterprises operating in different industries and fields – such as oil, telecom and internet service providers
OSINT gathering methodology
A methodology gives your search a predefined path to follow. An OSINT methodology is vital for the following reasons:
- It provides a road map for collecting data from various open-source repositories.
- It helps ensure the data's integrity and accuracy by identifying reputable sources before data collection begins.
- It allows OSINT gatherers to focus their efforts on specific objectives or goals.
- It keeps the process consistent across different OSINT investigation cases. This is important when doing OSINT within an organization.
- It takes data privacy acts and compliance regulations into account, which helps OSINT gatherers avoid breaching any law during OSINT investigations.
Here is a suggested search methodology for OSINT gathering:
Define scope
The first thing we need to do is define our search scope. Without a clear scope, we risk collecting a large volume of irrelevant data, significantly increasing the time required to finish the investigation and wasting precious resources (time, tools and personnel).
Data collection
In the second phase, we leverage the vast resources of OSINT to gather intelligence about our target. It involves the following activities:
Use search engines
Conventional search engines such as Google, Bing and Yahoo! will be used to find information about our target. We can use Google advanced search queries, also known as Google Dorks, to refine our search and return more precise results.
Here are some examples of using Google Dorks to find specific information about the domain name authentic8.com:
site:authentic8.com intext:"confidential" | Searches all pages on authentic8.com for the keyword "confidential"
site:authentic8.com ext:doc OR ext:docx | Searches for all MS Word files hosted on authentic8.com
site:authentic8.com intitle:"index of" | Reveals directory listings on authentic8.com, which can expose sensitive files or documents
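When the same dorks need to be run against several domains, composing them programmatically avoids typos. The following is a minimal sketch, not an official Google API: the helper names `build_dork` and `search_url` are hypothetical, and it only builds query strings and a search URL to open manually in a browser.

```python
from urllib.parse import urlencode


def build_dork(site, intext=None, intitle=None, extensions=()):
    """Compose an advanced-search query restricted to one site (hypothetical helper)."""
    parts = [f"site:{site}"]
    if intext:
        parts.append(f'intext:"{intext}"')
    if intitle:
        parts.append(f'intitle:"{intitle}"')
    if extensions:
        # Several file types are combined with OR, as in the examples above.
        parts.append(" OR ".join(f"ext:{ext}" for ext in extensions))
    return " ".join(parts)


def search_url(query):
    """Return a Google search URL for the query, URL-encoding special characters."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    for query in (
        build_dork("authentic8.com", intext="confidential"),
        build_dork("authentic8.com", extensions=("doc", "docx")),
        build_dork("authentic8.com", intitle="index of"),
    ):
        print(query, "->", search_url(query))
```

Keeping the query builder separate from the URL builder makes it easy to reuse the same dork on another search engine that supports similar operators.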
Social media platforms
Checking the social media platforms of the target entity can reveal important information about their business, social relationships, mailing address, occupations, interests and affiliations, and activities, to name a few. Users also post all types of content on these platforms, such as photos, videos, and geolocation data. Digital files can be analyzed for metadata, and hashtags allow us to identify trending topics.
Each social media platform has built-in search functionality that we can use as a starting point. In previous guides, I have covered how to investigate two popular social media platforms: Discord and Mastodon.
Public records databases
Government databases contain a plethora of information across different disciplines. Here are some examples of public databases and what we can get from them:
- Vital records: contain information such as birth certificates, death certificates, marriage licenses and divorce records. Examples of such databases include FamilySearch, Ancestry.com and the National Center for Health Statistics
- Business records: contain information about companies, such as corporate registrations, business licenses, and filings. Examples of corporate sources include OpenCorporates, Corporation Wiki and UK Companies House
- Legal records: contain information about legal cases and proceedings. In the USA, we can get these records through Public Access to Court Electronic Records (PACER)
- Criminal records: contain information about criminal cases, warrants and wanted persons. In the USA, we can find links to criminal records across the country in the Searchsystems directory, in addition to the National Sex Offender Registry public website
- Property records: contain information about property ownership. Examples include US Realty Records, Homes.com and Redfin Corporation (Canada and USA)
News websites and archives
News websites provide information about current events and trends, and may contain valuable information about political and business events worldwide. Here are some links to news sources:
- NewsLibrary.com
- National Archives News
- Google News Archive Search
- Links to a large number of news sources around the world from Cornell University
WHOIS
WHOIS records provide valuable information about domain names and their registrants. Here are the types of information we can get from these databases:
- Registrar information – the company through which the domain name was registered
- Registrant information – information about the registrant (an individual or a company), such as name, their company (if applicable), email address, postal address, and phone number
- Technical and Administrative contacts
- Creation and expiration dates
- Status of the registered domain name - active, on hold, or pending deletion
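WHOIS responses are usually plain text made of "Key: Value" lines, so the fields listed above can be pulled out with a few lines of code. The sketch below is an illustration under stated assumptions: the sample response is invented (not real registration data), the field labels follow a layout commonly seen for .com domains and vary between registries, and `parse_whois` is a hypothetical helper.

```python
from collections import defaultdict

# Invented sample for illustration only; real labels and values vary by registry.
SAMPLE_RESPONSE = """\
Domain Name: EXAMPLE.COM
Registrar: Example Registrar, Inc.
Creation Date: 2000-01-01T00:00:00Z
Registry Expiry Date: 2030-01-01T00:00:00Z
Domain Status: clientDeleteProhibited
Domain Status: clientTransferProhibited
>>> illustrative footer line that should be ignored <<<
"""


def parse_whois(text):
    """Collect "Key: Value" lines into a dict of lists (a key may repeat)."""
    record = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comment lines and footer lines.
        if not line or line.startswith(("%", "#", ">>>")):
            continue
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            record[key.strip()].append(value.strip())
    return dict(record)


if __name__ == "__main__":
    for key, values in parse_whois(SAMPLE_RESPONSE).items():
        print(f"{key}: {', '.join(values)}")
```

Values are stored as lists because fields such as the domain status often appear more than once in a single response.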
Here are some links for popular WHOIS databases:
Data verification
In this phase, the OSINT analyst verifies the accuracy and reliability of the gathered data. It is vital to cross-reference information from multiple independent sources to ensure its credibility. It is also critical to understand that not all information collected from public sources is reliable. For instance, threat actors may spread misinformation to mislead investigations, Artificial Intelligence (AI) tools may produce inaccurate results, and biases may creep in at any point in the OSINT gathering process.
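Cross-referencing can be tracked in a simple, mechanical way: record which source reported which claim, then flag the claims that only one source supports. The following is a minimal sketch with hypothetical source names and claims. Note that the code only counts distinct source names; judging whether two sources are truly independent remains a manual task.

```python
from collections import defaultdict


def corroborate(observations, min_sources=2):
    """Map each claim to True if at least `min_sources` distinct sources report it."""
    sources_by_claim = defaultdict(set)
    for source, claim in observations:
        sources_by_claim[claim].add(source)  # a set ignores repeats from one source
    return {
        claim: len(sources) >= min_sources
        for claim, sources in sources_by_claim.items()
    }


if __name__ == "__main__":
    # Hypothetical observations, for illustration only.
    observations = [
        ("news_site_a", "HQ is in Berlin"),
        ("company_registry", "HQ is in Berlin"),
        ("news_site_a", "HQ is in Berlin"),
        ("forum_post", "CEO resigned"),
    ]
    for claim, confirmed in corroborate(observations).items():
        print(f"{claim}: {'corroborated' if confirmed else 'needs another source'}")
```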
In this phase, OSINT gatherers may need to use fact-checking websites, such as:
- PolitiFact – fact-checks claims in different areas, including politics, coronavirus and immigration issues
- FactCheck.org – USA politics
- Snopes – focuses on urban legends, hoaxes, and folklore
- Duke Reporters' Lab – links to numerous fact-checking websites
- Media Bias/Fact Check (MBFC)
Checking previous versions of websites is also essential to verify facts:
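One way to look up an earlier version of a page programmatically is the Internet Archive's Wayback Machine availability endpoint. This is an assumption on my part, since no specific archive service is named here. The sketch below builds the request URL and extracts the closest snapshot from a parsed JSON response; the helper names are hypothetical, and the `lookup` function performs a network request, so it is defined but not called.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed service: the Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD[hhmmss] string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return (snapshot_url, timestamp) from a parsed response, or None if absent."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def lookup(page_url, timestamp=None):
    """Query the endpoint over the network and return the closest snapshot, if any."""
    with urlopen(availability_url(page_url, timestamp), timeout=10) as response:
        return closest_snapshot(json.load(response))


if __name__ == "__main__":
    print(availability_url("example.com", "20200101"))
```

Comparing the archived snapshot with the live page can show whether a claim was added, edited or removed after the fact.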
Data analysis
Now, the collected data is organized in a structured manner for analysis according to its relevance, source, and topic. The following tasks are involved in this phase:
- Categorizing and indexing collected data
- Identifying connections, relationships, and patterns within the collected data
- Uncovering hidden connections within the data
- Researching and adding contextual information to enhance understanding of the collected data
- Applying analytical techniques – such as sentiment analysis and social network analysis
- Drawing conclusions and establishing a hypothesis based on the analyzed data
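As a small illustration of the social network analysis mentioned above, the collected relationships can be stored as pairs of entities and ranked by how many distinct connections each entity has. This is a minimal sketch with invented entity names; real investigations typically use dedicated graph tools, and the helper names here are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph as a mapping of entity -> set of neighbours."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_ranking(graph):
    """Return (entity, distinct connections) pairs, most connected first."""
    return sorted(
        ((node, len(neighbours)) for node, neighbours in graph.items()),
        key=lambda item: (-item[1], item[0]),  # ties are broken alphabetically
    )


if __name__ == "__main__":
    # Invented relationships, for illustration only.
    edges = [
        ("alice", "acme_corp"),
        ("bob", "acme_corp"),
        ("alice", "bob"),
        ("carol", "acme_corp"),
    ]
    for entity, degree in degree_ranking(build_graph(edges)):
        print(entity, degree)
```

Entities at the top of the ranking are natural starting points when looking for hidden connections in the rest of the data.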
Reporting and presentation
The last phase involves presenting the findings in a formal report. We should keep technical jargon to a minimum so that non-technical readers can understand our report without further clarification.
After finishing the OSINT gathering task, it is essential to stay current on the latest search tools and techniques so that OSINT gatherers can keep their skills up to date with technological advancements.
OSINT is a powerful tool that allows individuals and organizations to gather valuable intelligence from publicly available data sources. By leveraging the vast amount of information available on the internet, social media platforms, public databases, news archives, and other open sources, OSINT practitioners can uncover insights, identify patterns, and deliver their findings in a formal report that helps decision-makers make more informed decisions.
However, it's essential to approach your OSINT gathering task with a well-defined methodology. Following a structured approach similar to the one mentioned in this article will make you more confident that the gathered data is accurate, reliable, and compliant with relevant laws and regulations.