Getting started with web scraping can seem daunting, but WebStruct.AI makes it simple with natural language commands. This tutorial will walk you through creating your first scraping pipeline from start to finish.
Step 1: Setting Up Your Account
First, create your free WebStruct.AI account:
- Visit the WebStruct.AI homepage
- Click "Get Started Free"
- Complete the registration process
- Verify your email address
Step 2: Understanding the Dashboard
Once logged in, you'll see the main dashboard with two key sections:
- New Scrape: Where you create new scraping jobs
- Scrape History: View and manage your previous scrapes
Step 3: Your First Scrape
Let's create a simple scrape to extract product information from an e-commerce site.
Choose Your Target URL
For this example, we'll use a product listing page. Enter the URL in the "Website URL" field:
https://example-store.com/products
Write Your Command
In the "Scraping Command" field, describe what you want to extract in natural language:
"Extract all product names, prices, and customer ratings from this page"
Step 4: Understanding Commands
WebStruct.AI uses natural language processing to understand your commands. Here are some effective command patterns:
Basic Extraction
- "Get all article titles and publication dates"
- "Extract product names and prices"
- "Find all email addresses and phone numbers"
Specific Targeting
- "Get the top 10 search results with titles and URLs"
- "Extract only products with ratings above 4 stars"
- "Find all job postings in the technology category"
Complex Queries
- "Extract product details including name, price, description, and availability status"
- "Get all news articles with headlines, summaries, authors, and publication dates"
Step 5: Running Your Scrape
After entering your URL and command:
- Click "Start Scraping"
- Monitor the job status in real time
- Wait for completion (usually 30 seconds to 2 minutes)
Step 6: Reviewing Results
Once complete, you can:
- View extracted data in the dashboard
- Download results as CSV or JSON
- Analyze data quality and completeness (see the sketch after this list)
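If you download the JSON export, a few lines of Python make it easy to spot gaps before you rely on the data. This is a minimal sketch; the filename and the field names (name, price, rating) are assumptions based on the example command in Step 3, so adjust them to match your actual export.

import json

# Load the JSON export downloaded from the dashboard
# (filename and field names are assumptions -- match them to your export)
with open("scrape_results.json") as f:
    products = json.load(f)

expected_fields = ["name", "price", "rating"]
for field in expected_fields:
    missing = sum(1 for p in products if not p.get(field))
    print(f"{field}: {missing} of {len(products)} records missing")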
Step 7: Building Automation
For recurring scraping needs, consider:
API Integration
Use our REST API to automate scraping from your applications:
POST /api/v1/scrape
{
  "url": "https://example.com",
  "command": "Extract all product data",
  "format": "json"
}
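For instance, submitting a job from Python with the requests library might look like the sketch below. The endpoint and request body come from the example above; the base URL, the Authorization header, and the shape of the response are assumptions, so check the API documentation for exact details.

import requests

API_BASE = "https://api.webstruct.ai"  # assumed base URL -- check the API docs
API_KEY = "your-api-key"               # assumed auth scheme -- check the API docs

response = requests.post(
    f"{API_BASE}/api/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "url": "https://example.com",
        "command": "Extract all product data",
        "format": "json",
    },
)
response.raise_for_status()
job = response.json()
print(job)  # response shape (e.g. a job ID and status) depends on the actual API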
Webhooks
Set up webhooks to receive notifications when scrapes complete.
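A webhook is simply an HTTP endpoint on your side that WebStruct.AI calls when a job finishes. Here is a minimal receiver sketch using Flask; the payload fields (job_id, status) are assumptions about what the notification might contain, so verify them against the webhook documentation.

from flask import Flask, request

app = Flask(__name__)

@app.route("/webstruct-webhook", methods=["POST"])
def handle_webhook():
    payload = request.get_json()
    # Payload fields below are assumptions -- confirm against the webhook docs
    print(f"Job {payload.get('job_id')} finished with status {payload.get('status')}")
    return "", 204

if __name__ == "__main__":
    app.run(port=8000)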
Common Challenges & Solutions
Dynamic Content
If a page loads content with JavaScript, mention this in your command:
"Wait for the page to fully load, then extract all product information"
Pagination
For multi-page results:
"Extract all products from this page and follow pagination links"
Data Quality
Always review your results, and refine your commands when fields come back empty or inaccurate.
Best Practices
- Start with simple commands and gradually increase complexity
- Test on a few pages before scaling up
- Be specific about the data you need
- Respect website rate limits and terms of service (see the sketch after this list)
- Regularly monitor and maintain your scraping workflows
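To put the rate-limit advice into practice, space out your API calls when submitting many pages. This sketch reuses the endpoint from Step 7 and simply sleeps between submissions; the five-second pace is an arbitrary assumption, not a documented limit, as are the base URL and auth header.

import time
import requests

urls = [
    "https://example-store.com/products?page=1",
    "https://example-store.com/products?page=2",
]

for url in urls:
    requests.post(
        "https://api.webstruct.ai/api/v1/scrape",        # assumed base URL
        headers={"Authorization": "Bearer your-api-key"},  # assumed auth scheme
        json={"url": url, "command": "Extract all product names and prices", "format": "json"},
    )
    time.sleep(5)  # pause between jobs to stay well under any rate limit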
Next Steps
Now that you've created your first scraping pipeline:
- Experiment with different websites and commands
- Explore our API documentation for advanced features
- Consider upgrading to Pro for higher limits and priority support
- Join our community Discord for tips and best practices
Happy scraping!