
Penetration testing is well-documented as a concept and poorly documented as a practice. Most resources describe what it is, categorise it into types, and list tools. Very few explain how to actually do it against a web application.
What you set up before you start, what you look for at each phase, and what good output looks like at the end: that is what most guides skip. This blog walks you through the full process with enough specificity to be useful whether you are running a test yourself, evaluating an automated tool, or briefing a vendor on what you need.
What software penetration testing is actually trying to do
Software penetration testing is an authorised, structured attempt to compromise an application the same way a real attacker would. The goal is not to produce a list of potential vulnerabilities; that is what a vulnerability scanner does. The goal is to validate which vulnerabilities are genuinely exploitable, demonstrate the realistic impact of each one, and give the team owning the application a clear, prioritised path to fixing them.
This distinction matters. A scanner can tell you that a parameter might be vulnerable to SQL injection. A penetration test tells you whether it is, what data an attacker could extract if they exploited it, and whether that vulnerability can be combined with something else to produce a worse outcome than either issue would cause alone. The output of a pen test is actionable evidence, not a probability distribution.
Phase 1: Scoping and pre-engagement
Of all the phases in a penetration test, this phase is the one most teams skip or underinvest in. That is a mistake because everything that follows is only as good as what gets defined here. A poorly scoped test misses critical surface area, generates findings against environments that do not reflect production, and can cause unintended disruption when the rules of engagement were never clearly agreed. The phase feels administrative, but it is foundational.
Defining the target surface
The first decision is what you are testing. For a web application, that means specifying which URLs, subdomains, API endpoints, and authentication states are in scope. A single production application might have a public-facing frontend, an authenticated user portal, an admin interface, and a REST API, each with a different attack surface.
Be explicit. If a subdomain is not listed as in scope, a diligent tester will leave it alone. If it should be in scope but is not listed, it might not get tested at all. The target surface definition is what the tester uses to build their test plan.
Defining rules of engagement
Rules of engagement define what the tester is permitted to do. At a minimum, this includes whether the test is black-box, grey-box, or white-box; whether techniques like social engineering or phishing are in scope; whether denial-of-service testing is allowed; and how the tester should respond if they uncover an active intrusion during the engagement.
These constraints shape the entire methodology. If they are unclear or undocumented, the test either becomes too limited to be useful or too risky to run safely. Everything here needs to be agreed on and in writing before testing begins.
Setting up test credentials and environments
Most meaningful vulnerabilities exist behind authentication, so testing requires dedicated accounts across roles. Use least privilege, avoid real user data, and manage credentials securely through environment variables or a secrets manager (such as AWS Secrets Manager). Generate separate API keys or tokens for testing.
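As a sketch of the credential-handling side, the helper below reads role-based test credentials from environment variables and fails loudly when they are missing. The `PENTEST_*` naming convention here is an assumption for illustration, not a standard:

```python
import os


def load_test_credentials(role: str) -> dict:
    """Read test-account credentials for a given role from environment
    variables, raising if they are missing. The PENTEST_* variable names
    are a hypothetical convention -- use whatever your team has agreed on."""
    prefix = f"PENTEST_{role.upper()}"
    username = os.environ.get(f"{prefix}_USER")
    password = os.environ.get(f"{prefix}_PASS")
    if not username or not password:
        raise RuntimeError(
            f"Missing credentials for role '{role}'; "
            f"set {prefix}_USER and {prefix}_PASS"
        )
    return {"role": role, "username": username, "password": password}


# Example: export PENTEST_ADMIN_USER=... and PENTEST_ADMIN_PASS=...
# before running the test harness, then call load_test_credentials("admin").
```

Failing fast on a missing variable is deliberate: a test run that silently falls back to an unauthenticated session will quietly skip everything behind login.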
The environment should mirror production but remain isolated. A staging or QA setup is ideal. If testing in production is unavoidable, define safeguards like rate limits, monitoring, and rollback plans to prevent disruption.
Phase 2: Reconnaissance and application mapping
Before probing for vulnerabilities, the tester needs to understand the attack surface they are working with. This phase is about building an accurate picture of how the application behaves, how it is structured, and what technologies it relies on. Skipping it and jumping straight to scanning is how entire sections of an application go untested.
Passive reconnaissance
Passive reconnaissance gathers information without directly interacting with the application in a way that triggers logs. In software penetration testing, this includes DNS enumeration to identify subdomains and infrastructure, certificate transparency logs to uncover unadvertised hostnames, and analysis of JavaScript files that may expose internal endpoints, third-party integrations, or leftover development comments. Archive crawls and public repository searches can reveal older versions of the application or leaked configuration data. It establishes a baseline before active testing begins.
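The certificate transparency step can be sketched as a small parsing function. The input is assumed to follow the crt.sh-style JSON shape, where each record's `name_value` field may hold several newline-separated hostnames, including wildcards:

```python
def extract_subdomains(ct_entries, domain):
    """Collect unique hostnames belonging to `domain` from certificate
    transparency log records. Assumes crt.sh-style entries whose
    `name_value` field may contain several newline-separated names."""
    found = set()
    for entry in ct_entries:
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().lstrip("*.")  # drop wildcard prefix
            if name == domain or name.endswith("." + domain):
                found.add(name)
    return sorted(found)
```

Feeding this the JSON results of a certificate transparency search often surfaces staging or admin hostnames that never appear in DNS brute-force wordlists.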
Active application mapping
Active mapping involves interacting with the application to catalogue endpoints, parameters, and functionality. In software penetration testing, this includes crawling pages, intercepting requests through a proxy to capture hidden parameters, and walking through all workflows: registration, login, account management, data submission, and admin functions.
The goal is completeness. Many high-impact issues, especially in authorization, only appear when the tester understands exactly what is exposed and under which user contexts.
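A minimal sketch of the cataloguing step, using only the Python standard library. Fetching pages is deliberately left out; plug the parser into whatever HTTP client or proxy you already use:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect in-scope links and form targets from one page -- the core
    step of a breadth-first crawl. Only endpoints on the same host as the
    base URL are kept, which enforces the scope definition from phase 1."""

    def __init__(self, base_url):
        super().__init__()
        self.base = base_url
        self.scope = urlparse(base_url).netloc
        self.endpoints = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            target = attrs.get("href")
        elif tag == "form":
            target = attrs.get("action")
        else:
            target = None
        if target:
            absolute = urljoin(self.base, target)
            if urlparse(absolute).netloc == self.scope:
                self.endpoints.add(absolute)
```

Feeding each fetched page into a `LinkCollector` and queueing any endpoint not yet visited gives you a basic crawler; form actions matter as much as links because they mark the endpoints that accept input.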
Technology fingerprinting
Identifying the technology stack narrows what to test. In software penetration testing, server headers, cookies, error messages, and response patterns reveal frameworks, platforms, and dependencies. A WordPress instance presents a very different attack surface from a React frontend backed by a GraphQL API. Each comes with its own vulnerability classes and testing approach. Known CVEs in detected components, framework versions, and JavaScript dependencies are flagged here for validation in the next phase.
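A simple header-based fingerprint check might look like the sketch below. The signature table is a small illustrative sample, not a complete fingerprint database:

```python
def fingerprint(headers):
    """Map HTTP response headers to likely technologies. The signature
    table is an illustrative sample; real fingerprinting also inspects
    cookies, error pages, and response body patterns."""
    signatures = {
        "x-powered-by": lambda v: v,                     # e.g. "PHP/8.1", "Express"
        "server": lambda v: v,                           # e.g. "nginx/1.25.3"
        "x-aspnet-version": lambda v: f"ASP.NET {v}",
        "x-drupal-cache": lambda v: "Drupal",
    }
    detected = []
    for header, value in headers.items():
        interpret = signatures.get(header.lower())
        if interpret:
            detected.append(interpret(value))
    return detected
```

Once a framework or version string is detected, it becomes a lookup key against known CVE databases for validation in the next phase.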
Phase 3: Vulnerability Identification
This is the core of the engagement. The tester works systematically through every major vulnerability category, anchored to the OWASP Top 10 throughout, and goes beyond it for API-specific and business logic issues.
Input validation and injection testing
SQL injection, command injection, XML injection, and server-side template injection are all tested by sending malformed or adversarial input to every parameter the application accepts. This includes form fields, URL parameters, HTTP headers, and API request bodies. The test is looking for application behavior that suggests input is being interpreted rather than treated as data.
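A rough sketch of the detection side follows. The probe strings and error signatures are illustrative samples only; a real engagement uses far larger lists and also compares status codes, response lengths, and timing:

```python
# Illustrative probe payloads -- a real test suite would be much larger.
SQLI_PROBES = ["'", "' OR '1'='1", "1; SELECT 1"]

# Substrings that commonly appear in database error pages.
ERROR_SIGNATURES = [
    "sql syntax", "unclosed quotation", "odbc",
    "sqlite error", "psycopg2", "ora-01756",
]


def looks_injectable(baseline_body: str, probe_body: str) -> bool:
    """Crude heuristic: a probe response that differs from the baseline
    AND contains a database error string suggests the input is being
    interpreted rather than treated as data."""
    if probe_body == baseline_body:
        return False
    lowered = probe_body.lower()
    return any(sig in lowered for sig in ERROR_SIGNATURES)
```

The baseline comparison matters: an error page that appears for every input, malicious or not, tells you nothing, which is exactly the kind of false positive that separates validated pen test findings from raw scanner output.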
Authentication and session testing
Authentication testing covers password policy enforcement, account lockout behavior, session token entropy, session fixation, token expiry, and multi-factor authentication bypass. For multi-role applications, it also verifies that authenticating at one role level does not grant access to functions reserved for higher roles.
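One measurable slice of this is session token entropy. The sketch below estimates the Shannon entropy of a captured token as a rough proxy for randomness; OWASP's session management guidance calls for at least 64 bits of entropy in session identifiers:

```python
import math
from collections import Counter


def shannon_entropy_bits(token: str) -> float:
    """Per-character Shannon entropy multiplied by token length: a rough
    proxy for token randomness. Sequential or templated session IDs score
    far below the 64+ bits OWASP recommends for session identifiers.
    (This measures character diversity in one sample, not the true
    entropy of the generator -- treat low scores as a red flag, not
    high scores as proof of safety.)"""
    counts = Counter(token)
    n = len(token)
    per_char = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return per_char * n
```

In practice a tester collects many tokens and also checks for structure across them (timestamps, counters, encoded user IDs), since a token can look random in isolation while being trivially predictable in sequence.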
Access control and authorization testing
This is where insecure direct object references (IDOR), horizontal privilege escalation, and broken function-level authorization are found. The technique is systematic: log in as a low-privilege user, identify resources that belong to other users or roles, and attempt to access them by manipulating identifiers. This category of vulnerability is responsible for a significant proportion of real-world data exposure and cannot be found by scanners.
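The systematic technique above can be sketched as a small loop. The `fetch` callable is injected so the example stays transport-agnostic, and the URL template is hypothetical:

```python
def probe_idor(fetch, resource_template, own_id, foreign_ids):
    """Attempt to access other users' resources while authenticated as a
    low-privilege user. `fetch` is any callable that takes a URL and
    returns an HTTP status code (injected so this sketch works with any
    HTTP client). Any foreign ID answered with 200 is a candidate IDOR
    finding to be validated manually."""
    # Sanity check: we must be able to read our own resource first,
    # otherwise a blanket 403 would look like a clean result.
    assert fetch(resource_template.format(id=own_id)) == 200, \
        "Cannot read our own resource; check the session"
    return [fid for fid in foreign_ids
            if fetch(resource_template.format(id=fid)) == 200]
```

Each hit then needs manual confirmation that the response actually contains another user's data, not a generic page, before it becomes a reportable finding.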
Business logic testing
Business logic vulnerabilities do not appear in any signature database. The tester needs to understand how the application is supposed to work and then look for ways to make it behave differently: skipping workflow steps, manipulating pricing logic, accessing functions in the wrong sequence, or submitting values outside expected ranges. This phase requires the most manual judgment and the most application-specific knowledge.
API-specific testing
Applications exposing REST or GraphQL APIs require a separate and distinct testing methodology from standard web application testing. Key test cases include broken object-level authorization (BOLA), where an API endpoint returns data it should not based on the object identifier in the request; excessive data exposure, where API responses return more fields than the frontend displays; missing rate limiting that enables enumeration or credential stuffing; and GraphQL introspection left enabled in production, exposing schema details that should not be publicly accessible.
The OWASP API Security Top 10 is the relevant framework here. Many web application scanners do not cover APIs adequately, and API security requires explicit attention and the right toolset.
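The GraphQL introspection case is one of the easiest to check: send the standard introspection query and see whether schema data comes back. A minimal sketch of the response check:

```python
import json

# The standard GraphQL introspection query (trimmed to the minimum
# needed to detect whether introspection answers at all).
INTROSPECTION_QUERY = {"query": "{ __schema { types { name } } }"}


def introspection_enabled(response_body: str) -> bool:
    """True if a GraphQL endpoint answered the introspection query with
    schema data instead of an error. Seeing this succeed against a
    production endpoint is a reportable finding."""
    try:
        data = json.loads(response_body).get("data") or {}
    except json.JSONDecodeError:
        return False
    return data.get("__schema") is not None
```

POST `INTROSPECTION_QUERY` as JSON to the GraphQL endpoint and pass the response body to `introspection_enabled`; a `True` result means the full schema, including internal types and mutations, is publicly enumerable.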
Phase 4: Exploitation and impact validation
The distinction between a vulnerability assessment and a penetration test is exploitation. A penetration test does not stop at identifying potential weaknesses; it validates that they are real and demonstrates their actual impact.
Validating findings, not just flagging them
A pen test finding without exploitation evidence is a hypothesis. Exploitation confirms the vulnerability exists, establishes the conditions required to trigger it, and demonstrates what an attacker could achieve. This is what separates a pen test report from scanner output. A scanner flags a potential SQL injection. A pen test confirms it, demonstrates data extraction, and documents exactly what database objects are accessible. The difference matters enormously for triage and remediation prioritisation: a confirmed, demonstrated finding is unambiguous, while a potential finding requires investigation before anyone can act on it.
Chaining vulnerabilities
Real attacks rarely rely on a single vulnerability. A pen test should attempt to chain findings: combining a low-severity information disclosure with a medium-severity authentication bypass to demonstrate a high-severity attack path.
This kind of chained exploitation is one of the most valuable outputs of a manual pen test and one that automated scanners cannot replicate. It reflects how real threat actors operate and often reveals that the aggregate risk of a set of findings is substantially higher than the sum of individual CVSS scores would suggest.
Documenting impact in business terms
Every exploited finding should be documented with both its technical detail and its business impact framing.
“SQL injection in the search parameter” is a technical finding. “SQL injection in the search parameter allows unauthenticated read access to the user database, including email addresses and hashed passwords” is a pen test finding. The business impact framing is what makes findings actionable for stakeholders who are not parsing CVE descriptions and it is the framing that determines whether a critical vulnerability gets the resources and urgency it requires.
Phase 5: Reporting and remediation
The report is not the conclusion; it is the beginning of remediation. What it contains and how findings are acted on determines whether the engagement produces lasting value.
What a pen test report should contain
A complete pen test report includes risk-ranked findings with CVSS scores, exploitation evidence, business impact statements, and specific remediation guidance for each finding. It should also include an executive summary for non-technical stakeholders and a technical appendix with full reproduction steps.
Findings should be mapped to the OWASP Top 10 and any applicable compliance frameworks where relevant. A report that lists vulnerability names without evidence and remediation advice without context is not a useful deliverable.
Remediation and retesting
Each finding should be assigned to an owner, triaged by risk level, and remediated in priority order, critical and high findings first. After remediation, findings should be retested to confirm the fix is effective and has not introduced new issues.
Many vendors include one retest cycle in their engagement fee. Confirm this before signing an agreement. A patch that closes one vulnerability but opens another is not a net improvement.
What comes after the report
A point-in-time pen test captures the application at one moment. Applications that continue to ship code need continuous coverage between engagements — a report that was accurate in January may not reflect the application in March. Automated penetration testing integrated into the CI/CD pipeline runs attack simulations on every significant release, catching regressions before they reach production. This is how the output of a pen test becomes an ongoing security posture rather than a compliance artifact.
For a clearer picture of where this fits relative to static analysis, the distinction between DAST and SAST is worth understanding before you configure your pipeline.
Where agentic AI pentesting fits in the process
The five phases above describe a complete manual engagement. For teams shipping code continuously, the question is how to maintain that standard between annual or biannual engagements, and that is where agentic AI penetration testing enters the picture.
Agentic AI penetration testing tools cover phases two through four at speed and scale: application mapping, vulnerability identification across known categories, and authenticated attack simulation against web applications and APIs. Their primary value is in the continuous layer: running the same methodology on every release rather than once or twice a year, ensuring that the attack surface is being assessed as consistently as it is being extended.
Tools like Beagle Security are built for this layer, providing agentic AI penetration testing of web applications and APIs without requiring a manual engagement each time code ships and without exposing application data to third-party LLMs, which matters when the application under test handles sensitive data.
What automation does not replace is judgment. Business logic testing, vulnerability chaining, and findings that need narrative context still require a human tester who understands the application. The practical answer for most teams is both: a manual pen test for depth, and Agentic AI penetration testing to hold the line between engagements.
Final thoughts
Software penetration testing is not a tool you run or a box you check. It is a structured process: scoped precisely, mapped thoroughly, exploited deliberately, and reported in terms that drive real remediation. The phases in this guide exist because skipping any one of them is how critical vulnerabilities get missed, or found too late.
The honest limitation is that even a rigorous software penetration testing engagement is a snapshot. The moment code ships again, that picture starts to age. Business logic shifts, access control changes, and an injection flaw that wasn't in scope last quarter may be in production today.
That is where agentic AI penetration testing closes the gap: running the same attack methodology continuously across your web applications and APIs, catching what changes between manual engagements before an attacker does. If you are looking for an accessible way to get started with automated penetration testing, Beagle Security offers a 14-day free trial and a free interactive demo to help you evaluate whether it fits your workflow before you commit.
FAQs
What is a penetration test in software?
A software penetration test is a simulated attack on your application carried out by security professionals. The goal is to find vulnerabilities before real attackers do, confirm whether they can actually be exploited, and document what the impact would be. The result is a clear, evidence-backed report your team can act on.
What is the timeline for software penetration testing?
Most web application penetration tests take between 5 and 10 business days to complete. The exact timeline depends on how large the application is, how many user roles exist, and whether APIs are in scope. Report delivery and retesting add time on top of that.