HakHub LLC

Welcome to HakHub 🛡️ – Your Ally in the Cybersecurity Frontier.

01/02/2025

Python Tips For Beginners

3 Essential Python Tips Every Beginner Should Learn

Python is one of the most beginner-friendly programming languages, making it a favorite among new developers. But even with its simplicity, there are many handy tips and tricks that can level up your coding skills. Whether you're exploring Python for the first time or looking for a refresher, here are three essential tips every beginner should master.

---

# # # 1. Master the Art of List Comprehensions

List comprehensions are a concise way to create lists in Python. They allow you to replace traditional `for` loops with a single line of code that’s both readable and efficient.

# # # # Example:

```python
# Traditional way to create a list of squares
squares = []
for i in range(10):
    squares.append(i**2)

# Using list comprehension
squares = [i**2 for i in range(10)]
```

Why it’s Useful:

- Saves time and lines of code.
- Makes your code more Pythonic and easier to read.

# # # # Pro Tip: You can also add conditions to your list comprehensions:

```python
even_squares = [i**2 for i in range(10) if i % 2 == 0]
```

This will create a list of squares for even numbers only.
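
As a further sketch (the `words` list is made up for illustration), a comprehension can transform and filter in the same expression, and the same syntax also builds dictionaries:

```python
# Hypothetical sample data for illustration
words = ["apple", "kiwi", "banana", "fig"]

# Transform (upper-case) and filter (longer than 3 letters) in one expression
long_upper = [w.upper() for w in words if len(w) > 3]

# Dictionary comprehension: map each word to its length
lengths = {w: len(w) for w in words}

print(long_upper)
print(lengths)
```

If a comprehension grows past one condition and one transformation, a regular `for` loop is usually easier to read.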

---

# # # 2. Use F-Strings for Cleaner String Formatting

Introduced in Python 3.6, f-strings are the most intuitive way to format strings. They’re not only cleaner but also faster than older methods like `.format()` or `%` formatting.

# # # # Example:

```python
name = "Alice"
age = 25

# Using f-strings
print(f"My name is {name} and I am {age} years old.")
```

Why it’s Useful:

- Embeds variables directly into strings.
- Improves readability and reduces chances of errors.

# # # # Pro Tip: You can also use expressions inside f-strings:

```python
print(f"Next year, I will be {age + 1} years old.")
```
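
F-strings also accept format specifiers after a colon. The short sketch below uses made-up values (`price` is not part of the example above) to show rounding, padding, and the `!r` conversion:

```python
price = 19.5
name = "Alice"

two_decimals = f"{price:.2f}"   # fixed to two decimal places
padded = f"{name:>10}"          # right-aligned in a 10-character field
quoted = f"{name!r}"            # repr() of the value, quotes included

print(two_decimals)
print(padded)
print(quoted)
```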

---

# # # 3. Leverage Python’s Built-in Functions

Python comes with a rich set of built-in functions that can save you time and effort. As a beginner, familiarize yourself with some of the most commonly used ones, like `len()`, `sum()`, `max()`, and `min()`.

# # # # Example:

```python
numbers = [10, 20, 30, 40, 50]

# Get the total
total = sum(numbers)

# Find the largest number
largest = max(numbers)

# Find the smallest number
smallest = min(numbers)

print(f"Total: {total}, Largest: {largest}, Smallest: {smallest}")
```

Why it’s Useful:

- Simplifies common tasks without writing extra code.
- Makes your programs more efficient and readable.

# # # # Pro Tip: Explore the Python documentation to discover even more built-in functions that can make your life easier. For example, `zip()` for combining lists or `sorted()` for sorting.
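
Here is a small sketch of those two built-ins working together. The names and scores are invented for illustration:

```python
names = ["Alice", "Bob", "Cara"]
scores = [82, 95, 78]

# zip() pairs items from both lists position by position
pairs = list(zip(names, scores))

# sorted() returns a new list; key picks the score, reverse puts the highest first
ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)

for name, score in ranked:
    print(f"{name}: {score}")
```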

---

# # # Conclusion

Learning Python is a rewarding journey, and mastering these tips will give you a solid foundation. By using list comprehensions, f-strings, and built-in functions, you’ll write cleaner, more efficient, and professional-looking code in no time.

Start practicing these techniques in your next project, and don’t forget to share your progress with the HakHub community on Reddit and Facebook!

---

What are your favorite Python tips? Share them in the comments below!

01/01/2025

Another day trying to figure out Scrapling. I have been reading through the docs and setting up the environment on both Linux and Windows to see how straightforward everything actually is, or is not, for someone who just opens their computer and navigates to GitHub to test-drive something they might not be into, or who is relatively new at what they want to be doing. I spent some time today messing around with it again, then I found the code and started reading some of that along with the docs, and I feel like I am starting over again.

12/30/2024

I realized I hadn't really followed up on the walk-through of the Scrapling code review. I wanted to at the very least discover whether any of this s**t worked out of the box, in its most basic form, using the code examples throughout the repository, and so far I have only found issues. Take it for what it's worth: I am no expert, but to anyone at a similar experience level, good luck. I have been going through each section and trying to make sense of it; most of this is what I have been sharing.

The first thing I did was run `mkdir project_directory`, `cd project_directory`, and `touch scraper.py`, followed by `python -m venv .venv` and `code .` to create a virtual environment and open the folder in VS Code.

The installation was straightforward, as easy as you can imagine when it comes to installing a Python package. Make sure you have Python installed (for this it was 3.9 or better), then use pip to install the package. Afterward it was `scrapling install`, and that was that.
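
For anyone who wants to sanity-check the setup before going further, here is a rough sketch using only the standard library. The helper name `check_environment` is my own, and the 3.9 minimum is simply the version mentioned above:

```python
import importlib.util
import sys

def check_environment(min_version=(3, 9), package="scrapling"):
    """Return (python_ok, package_found) for the current interpreter."""
    python_ok = sys.version_info >= min_version
    # find_spec returns None when a top-level package is not installed
    package_found = importlib.util.find_spec(package) is not None
    return python_ok, package_found

python_ok, package_found = check_environment()
print(f"Python {sys.version_info.major}.{sys.version_info.minor} new enough: {python_ok}")
print(f"scrapling importable: {package_found}")
```

Run it with the interpreter inside the virtual environment, otherwise it reports on the system Python instead.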

12/26/2024

Day Two Conclusion Scrapling Code Review

# # # Conclusion for Day Two

Today, we focused on refining the web scraping script using Scrapling to extract quotes from `https://quotes.toscrape.com`. We encountered and resolved several issues:

1. **Debugging CSS Extraction**:
- Initially, we faced an issue where the `.text::text` selection returned a `TextHandler` object without a `.text` attribute. This was resolved by directly handling the `TextHandler` object and using its string representation for debugging.

2. **Duplicate Output**:
- The script produced duplicate results for the CSS extraction section. This was due to an unintentional repeated block of code. The duplicate was identified and removed, streamlining the extraction process.

3. **Validation of Extraction Methods**:
- Three different methods—CSS selectors, XPath, and `find_all`—were implemented and validated. Each method successfully extracted the same set of quotes, confirming the reliability of the scraping logic.

4. **Stealthy Headers**:
- Adding `stealthy_headers=True` to the fetcher's `get` call made the request look more like one from a real browser (the log shows a Google search referer being sent), which reduces the chance of being blocked by the server.
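
The validation in point 3 can also be automated rather than compared by eye. The sketch below is one possible way to do it; `same_results` is a hypothetical helper, and the three lists are stand-in data shaped like the script's output, not real scraped results:

```python
def same_results(*result_sets):
    """Return True when every list of quote dicts equals the first one."""
    first, *rest = result_sets
    return all(first == other for other in rest)

# Hypothetical stand-in data shaped like the script's output
css_quotes = [{"text": "q1", "author": "A", "tags": ["x"]}]
xpath_quotes = [{"text": "q1", "author": "A", "tags": ["x"]}]
find_all_quotes = [{"text": "q1", "author": "A", "tags": ["x", "y"]}]

print(same_results(css_quotes, xpath_quotes))                   # identical lists
print(same_results(css_quotes, xpath_quotes, find_all_quotes))  # tags differ in the last one
```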

# # # Next Steps
- **Push to Repository or Keep Locally?**
  - If the script is part of a collaborative project or if it adds significant improvements over the previous version, it’s a good idea to **push it to the repository**. However, ensure the following before pushing:
    - Update the README with instructions for running the script.
    - Remove debug statements or leave them optionally enabled (e.g., using a flag or logging levels).
    - Test the script in a clean environment to confirm it works seamlessly.

If this is more of a personal experiment or if the changes are minor, you can keep it locally for now and push it after further iterations.
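
One way to leave debug statements optionally enabled is a command-line flag that sets the logging level. This is a minimal sketch with the standard library; the `--debug` flag and the `scraper` logger name are assumptions, not something the current script defines:

```python
import argparse
import logging

def configure_logging(argv=None):
    """Parse an optional --debug flag and return a logger at the matching level."""
    parser = argparse.ArgumentParser(description="Quote scraper")
    parser.add_argument("--debug", action="store_true", help="show debug output")
    args = parser.parse_args(argv)
    level = logging.DEBUG if args.debug else logging.INFO
    # force=True replaces any handlers configured earlier (Python 3.8+)
    logging.basicConfig(level=level, format="%(levelname)s: %(message)s", force=True)
    return logging.getLogger("scraper")

log = configure_logging([])  # pass None instead to read the real command line
log.debug("hidden unless --debug is passed")
log.info("visible at the default level")
```

With this in place, the `print` calls used for debugging can become `log.debug(...)` calls and stay in the script.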

# # # Summary
This was a productive day that helped refine the scraping script and improve its reliability. Each extraction method is now functional, with consistent results. The next focus could be on error handling, adding tests, or expanding functionality (e.g., handling pagination or exporting data to a file).
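
As a starting point for the export idea, here is a small sketch that writes the list of quote dictionaries to a JSON file using only the standard library. `save_quotes` and the output file name are hypothetical; the sample entry is copied from the script's output:

```python
import json
import tempfile
from pathlib import Path

def save_quotes(quotes, path):
    """Write a list of quote dicts to a UTF-8 JSON file and return the path."""
    path = Path(path)
    # ensure_ascii=False keeps the curly quotes readable in the file
    path.write_text(json.dumps(quotes, ensure_ascii=False, indent=2), encoding="utf-8")
    return path

sample = [{
    "text": "“A day without sunshine is like, you know, night.”",
    "author": "Steve Martin",
    "tags": ["humor", "obvious", "simile"],
}]

out = save_quotes(sample, Path(tempfile.gettempdir()) / "quotes_example.json")
print(out.read_text(encoding="utf-8"))
```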


12/26/2024

Day Two Continued

Great! Let’s refine the script step by step as we walk through each method Scrapling offers. Along the way, we’ll clean up the output and make it more structured and advanced. Here's how we'll approach this:

---

# # # Step 1: Fetching the Web Page

The initial step is fetching the page. This step has been confirmed to work correctly, as evidenced by the log message indicating a successful HTTP 200 response. No changes are needed here.

---

# # # Step 2: CSS Selector (`page.css`)

# # # # Current Behavior
The `page.css('.quote .text::text')` method extracts all quotes as plain text. The output is readable but not structured.

# # # # Refinement
We’ll structure the output into a list of dictionaries, each containing the quote text, author, and tags. This improves readability and usability.

# # # # Updated Code
```python
# Extract quotes, authors, and tags using CSS selectors
# Note: the run log later in this post shows .text() failing on these ::text results
quotes = []
for quote in page.css('.quote'):
    text = quote.css_first('.text::text').text()
    author = quote.css_first('.author::text').text()
    tags = [tag.text() for tag in quote.css('.tag::text')]
    quotes.append({"text": text, "author": author, "tags": tags})

# Print structured output
print("Extracted Quotes (Structured):")
for quote in quotes:
    print(f"- Quote: {quote['text']}")
    print(f" Author: {quote['author']}")
    print(f" Tags: {', '.join(quote['tags'])}")
```

---

# # # Step 3: XPath (`page.xpath`)

# # # # Current Behavior
The `page.xpath('//span[@class="text"]/text()')` method extracts only the quote text, similar to the CSS selector.

# # # # Refinement
We’ll use XPath to extract the same structured data (quote, author, tags) for comparison with the CSS method.

# # # # Updated Code
```python
# Extract quotes, authors, and tags using XPath
quotes_xpath = []
for quote in page.xpath('//div[@class="quote"]'):
    text = quote.xpath('.//span[@class="text"]/text()')[0]
    author = quote.xpath('.//small[@class="author"]/text()')[0]
    tags = quote.xpath('.//div[@class="tags"]/a[@class="tag"]/text()')
    quotes_xpath.append({"text": text, "author": author, "tags": tags})

# Print structured output for XPath
print("Extracted Quotes (XPath):")
for quote in quotes_xpath:
    print(f"- Quote: {quote['text']}")
    print(f" Author: {quote['author']}")
    print(f" Tags: {', '.join(quote['tags'])}")
```

---

# # # Step 4: Chained Selectors (`page.css().css()`)

# # # # Current Behavior
Chained selectors achieve the same result as `page.css('.quote .text::text')`, but they are slower and less efficient.

# # # # Refinement
Since chained selectors are redundant for this use case, we’ll skip restructuring this part but include a note in the code explaining the inefficiency.

---

# # # Step 5: `find_all`

# # # # Current Behavior
The `find_all` method returns raw HTML elements, which are less readable.

# # # # Refinement
We’ll parse the HTML elements into structured data (quote, author, tags) for consistency with the previous methods.

# # # # Updated Code
```python
# Extract quotes, authors, and tags using find_all
# Note: the run log later in this post shows .text() raising "not callable" here
quotes_find_all = []
for quote in page.find_all('div', {'class': 'quote'}):
    text = quote.find('span', class_='text').text()
    author = quote.find('small', class_='author').text()
    tags = [tag.text() for tag in quote.find_all('a', class_='tag')]
    quotes_find_all.append({"text": text, "author": author, "tags": tags})

# Print structured output for find_all
print("Extracted Quotes (find_all):")
for quote in quotes_find_all:
    print(f"- Quote: {quote['text']}")
    print(f" Author: {quote['author']}")
    print(f" Tags: {', '.join(quote['tags'])}")
```

---

# # # Step 6: Raw HTML Output

# # # # Current Behavior
Raw HTML elements are printed directly. This is helpful for debugging but not for final output.

# # # # Refinement
We’ll use `.prettify()` and `.attrib` only for debugging purposes and include a toggle to enable/disable it.

# # # # Updated Code
```python
# Debug: Print first quote element's raw HTML
debug_mode = True  # Toggle: set to False to skip the raw HTML dump
if debug_mode:
    first_quote = page.css_first('.quote')
    print("First Quote Element (Raw HTML):")
    print(first_quote.prettify())
```

---
Consolidated and Refined Script

Below is the finalized version of the script, incorporating refinements for structured outputs and debugging options:

```python
from scrapling import Fetcher

fetcher = Fetcher(auto_match=False)

# Fetch a web page and create an Adaptor instance
page = fetcher.get('https://quotes.toscrape.com/', stealthy_headers=True)
# Get all strings in the full page
page.get_all_text(ignore_tags=('script', 'style'))

# Get all quotes, any of these methods will return a list of strings (TextHandlers)
quotes = page.css('.quote .text::text')  # CSS selector
quotes = page.xpath('//span[@class="text"]/text()')  # XPath
quotes = page.css('.quote').css('.text::text')  # Chained selectors
quotes = [element.text for element in page.css('.quote .text')]  # Slower than bulk query above

# Get the first quote element
quote = page.css_first('.quote')  # / page.css('.quote').first / page.css('.quote')[0]

# Tired of selectors? Use find_all/find
quotes = page.find_all('div', {'class': 'quote'})
# Same as
quotes = page.find_all('div', class_='quote')
quotes = page.find_all(['div'], class_='quote')
quotes = page.find_all(class_='quote')  # and so on...

# Working with elements
quote.html_content  # Inner HTML
quote.prettify()  # Prettified version of Inner HTML
quote.attrib  # Element attributes
quote.path  # DOM path to element (List)
```

The script has been consolidated and refined into a more structured and user-friendly format. The updates include support for CSS, XPath, and `find_all` methods, as well as a debugging option for raw HTML inspection. The problem is that it doesn't work the way it's supposed to, so we have to keep going with this.

The issue with the edits stopping the script from working could stem from subtle changes in how data is extracted or stored. Let's simplify the debugging and ensure consistent output.

I'll revise the problematic parts of the script and ensure all methods (CSS, XPath, and `find_all`) are debugged thoroughly.

Here's a revised script:

```python
from scrapling import Fetcher

fetcher = Fetcher(auto_match=False)

# Fetch a web page and create an Adaptor instance
page = fetcher.get('https://quotes.toscrape.com/', stealthy_headers=True)

# Extract quotes using CSS selectors
quotes_css = []
try:
    for quote in page.css('.quote'):
        text = quote.css_first('.text::text').text() if quote.css_first('.text::text') else "N/A"
        author = quote.css_first('.author::text').text() if quote.css_first('.author::text') else "N/A"
        tags = [tag.text() for tag in quote.css('.tag::text')] if quote.css('.tag::text') else []
        quotes_css.append({"text": text, "author": author, "tags": tags})
except Exception as e:
    print(f"Error with CSS extraction: {e}")

# Print CSS extraction
print("\nExtracted Quotes (CSS):")
for quote in quotes_css:
    print(f"- Quote: {quote['text']}")
    print(f" Author: {quote['author']}")
    print(f" Tags: {', '.join(quote['tags'])}")

# Extract quotes using XPath
quotes_xpath = []
try:
    for quote in page.xpath('//div[@class="quote"]'):
        text = quote.xpath('.//span[@class="text"]/text()')[0] if quote.xpath('.//span[@class="text"]/text()') else "N/A"
        author = quote.xpath('.//small[@class="author"]/text()')[0] if quote.xpath('.//small[@class="author"]/text()') else "N/A"
        tags = quote.xpath('.//div[@class="tags"]/a[@class="tag"]/text()') if quote.xpath('.//div[@class="tags"]/a[@class="tag"]/text()') else []
        quotes_xpath.append({"text": text, "author": author, "tags": tags})
except Exception as e:
    print(f"Error with XPath extraction: {e}")

# Print XPath extraction
print("\nExtracted Quotes (XPath):")
for quote in quotes_xpath:
    print(f"- Quote: {quote['text']}")
    print(f" Author: {quote['author']}")
    print(f" Tags: {', '.join(quote['tags'])}")

# Extract quotes using find_all
quotes_find_all = []
try:
    for quote in page.find_all('div', {'class': 'quote'}):
        text = quote.find('span', class_='text').text() if quote.find('span', class_='text') else "N/A"
        author = quote.find('small', class_='author').text() if quote.find('small', class_='author') else "N/A"
        tags = [tag.text() for tag in quote.find_all('a', class_='tag')] if quote.find_all('a', class_='tag') else []
        quotes_find_all.append({"text": text, "author": author, "tags": tags})
except Exception as e:
    print(f"Error with find_all extraction: {e}")

# Print find_all extraction
print("\nExtracted Quotes (find_all):")
for quote in quotes_find_all:
    print(f"- Quote: {quote['text']}")
    print(f" Author: {quote['author']}")
    print(f" Tags: {', '.join(quote['tags'])}")
```

# # # Key Fixes:
1. **Error Handling:**
- Wrap extraction loops in `try-except` blocks to catch and report errors without stopping execution.

2. **Null Checks:**
- Ensure all `.text()` or `.xpath()` calls are safely checked to avoid `AttributeError` when elements are missing.

3. **Consistent Logging:**
- Print outputs for each method even if a failure occurs to help debug inconsistencies.
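
The null checks above evaluate every selector twice, once for the test and once for the value. A small helper can do the same job with one lookup. This is only a sketch: `first_or_default` is a hypothetical name, and plain lists stand in for the list-like results that calls such as `quote.xpath(...)` return:

```python
def first_or_default(values, default="N/A"):
    """Return the first item of a sequence as a string, or default if it is empty or None."""
    if not values:
        return default
    return str(values[0])

# Plain lists stand in for selector results here
print(first_or_default(["Albert Einstein", "ignored"]))
print(first_or_default([]))
print(first_or_default(None))
```

In the script this would read `text = first_or_default(quote.xpath(...))`, which keeps each line shorter and avoids repeating the query.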

Run this and let me know the results!

└─$ /home/fahm/scrapling_reddit/venv/bin/python /home/fahm/scrapling_reddit/scraper.py
[2024-12-26 04:34:19] INFO: Fetched (200) (referer: https://www.google.com/search?q=toscrape)
Error with CSS extraction: 'TextHandler' object has no attribute 'text'

Extracted Quotes (CSS):

Extracted Quotes (XPath):
- Quote: “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
Author: Albert Einstein
Tags: change, deep-thoughts, thinking, world
- Quote: “It is our choices, Harry, that show what we truly are, far more than our abilities.”
Author: J.K. Rowling
Tags: abilities, choices
- Quote: “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
Author: Albert Einstein
Tags: inspirational, life, live, miracle, miracles
- Quote: “The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
Author: Jane Austen
Tags: aliteracy, books, classic, humor
- Quote: “Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
Author: Marilyn Monroe
Tags: be-yourself, inspirational
- Quote: “Try not to become a man of success. Rather become a man of value.”
Author: Albert Einstein
Tags: adulthood, success, value
- Quote: “It is better to be hated for what you are than to be loved for what you are not.”
Author: André Gide
Tags: life, love
- Quote: “I have not failed. I've just found 10,000 ways that won't work.”
Author: Thomas A. Edison
Tags: edison, failure, inspirational, paraphrased
- Quote: “A woman is like a tea bag; you never know how strong it is until it's in hot water.”
Author: Eleanor Roosevelt
Tags: misattributed-eleanor-roosevelt
- Quote: “A day without sunshine is like, you know, night.”
Author: Steve Martin
Tags: humor, obvious, simile
Error with find_all extraction: 'TextHandler' object is not callable

Extracted Quotes (find_all):

(venv) ┌──(venv)(fahm㉿MININT-FCGQJBQ)-[~/scrapling_reddit]
└─$

The output indicates two issues:

1. **CSS Extraction Error**:
The error message `'TextHandler' object has no attribute 'text'` means that `css_first('.text::text')` already returns a `TextHandler`, and the lookup of `text` on that object fails. The first guess is that `.text` should be accessed as a property (without parentheses); the next run tests that.

2. **`find_all` Extraction Error**:
The error `'TextHandler' object is not callable` implies that `.text` on the element returned by `find` is a property that already yields a `TextHandler`, so `.text()` tries to call that object as if it were a function.
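
Both messages can be reproduced without any scraping, which makes them easier to reason about. The sketch below uses stand-in classes (`FakeTextHandler` and `FakeElement` are invented for this illustration and are not Scrapling's real classes) under the assumption that the text result behaves like a string and that an element exposes `.text` as a property:

```python
class FakeTextHandler(str):
    """Stand-in for a string-like text result; NOT Scrapling's real class."""

class FakeElement:
    """Stand-in element whose .text is a property, not a method."""
    def __init__(self, content):
        self._content = content

    @property
    def text(self):
        return FakeTextHandler(self._content)

def error_message(action):
    """Run action() and return the exception message, or None if it succeeds."""
    try:
        action()
    except (AttributeError, TypeError) as exc:
        return str(exc)
    return None

handler = FakeTextHandler("some quote")
element = FakeElement("some quote")

print(error_message(lambda: handler.text))    # attribute lookup on the string-like object
print(error_message(lambda: element.text()))  # property value is then called like a function
print(str(handler), "|", element.text)        # both of these work
```

If Scrapling's objects behave like these stand-ins, a text result needs no `.text` at all, and an element needs `.text` without parentheses.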

# # # Fixes for the Issues

Here’s the updated script with corrections:

```python
from scrapling import Fetcher

fetcher = Fetcher(auto_match=False)

# Fetch a web page and create an Adaptor instance
page = fetcher.get('https://quotes.toscrape.com/', stealthy_headers=True)

# Extract quotes using CSS selectors
quotes_css = []
try:
    for quote in page.css('.quote'):
        text = quote.css_first('.text::text').text if quote.css_first('.text::text') else "N/A"
        author = quote.css_first('.author::text').text if quote.css_first('.author::text') else "N/A"
        tags = [tag.text for tag in quote.css('.tag::text')] if quote.css('.tag::text') else []
        quotes_css.append({"text": text, "author": author, "tags": tags})
except Exception as e:
    print(f"Error with CSS extraction: {e}")

# Print CSS extraction
print("\nExtracted Quotes (CSS):")
for quote in quotes_css:
    print(f"- Quote: {quote['text']}")
    print(f" Author: {quote['author']}")
    print(f" Tags: {', '.join(quote['tags'])}")

# Extract quotes using XPath
quotes_xpath = []
try:
    for quote in page.xpath('//div[@class="quote"]'):
        text = quote.xpath('.//span[@class="text"]/text()')[0] if quote.xpath('.//span[@class="text"]/text()') else "N/A"
        author = quote.xpath('.//small[@class="author"]/text()')[0] if quote.xpath('.//small[@class="author"]/text()') else "N/A"
        tags = quote.xpath('.//div[@class="tags"]/a[@class="tag"]/text()') if quote.xpath('.//div[@class="tags"]/a[@class="tag"]/text()') else []
        quotes_xpath.append({"text": text, "author": author, "tags": tags})
except Exception as e:
    print(f"Error with XPath extraction: {e}")

# Print XPath extraction
print("\nExtracted Quotes (XPath):")
for quote in quotes_xpath:
    print(f"- Quote: {quote['text']}")
    print(f" Author: {quote['author']}")
    print(f" Tags: {', '.join(quote['tags'])}")

# Extract quotes using find_all
quotes_find_all = []
try:
    for quote in page.find_all('div', {'class': 'quote'}):
        text = quote.find('span', class_='text').text if quote.find('span', class_='text') else "N/A"
        author = quote.find('small', class_='author').text if quote.find('small', class_='author') else "N/A"
        tags = [tag.text for tag in quote.find_all('a', class_='tag')] if quote.find_all('a', class_='tag') else []
        quotes_find_all.append({"text": text, "author": author, "tags": tags})
except Exception as e:
    print(f"Error with find_all extraction: {e}")

# Print find_all extraction
print("\nExtracted Quotes (find_all):")
for quote in quotes_find_all:
    print(f"- Quote: {quote['text']}")
    print(f" Author: {quote['author']}")
    print(f" Tags: {', '.join(quote['tags'])}")
```

# # # Key Changes:
1. **CSS Selector `.text` Property**:
- Replaced `.text()` with `.text` in CSS extraction to access the property correctly.

2. **`find_all` Fixes**:
- Correctly used `.text` instead of treating `TextHandler` objects as callable.

# # # Next Steps:
Run the updated script and let me know if all methods (`CSS`, `XPath`, and `find_all`) now produce consistent outputs.

[2024-12-26 04:35:48] INFO: Fetched (200) (referer: https://www.google.com/search?q=toscrape)
Error with CSS extraction: 'TextHandler' object has no attribute 'text'

Extracted Quotes (CSS):

Extracted Quotes (XPath):
- Quote: “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
Author: Albert Einstein
Tags: change, deep-thoughts, thinking, world
- Quote: “It is our choices, Harry, that show what we truly are, far more than our abilities.”
Author: J.K. Rowling
Tags: abilities, choices
- Quote: “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
Author: Albert Einstein
Tags: inspirational, life, live, miracle, miracles
- Quote: “The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
Author: Jane Austen
Tags: aliteracy, books, classic, humor
- Quote: “Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
Author: Marilyn Monroe
Tags: be-yourself, inspirational
- Quote: “Try not to become a man of success. Rather become a man of value.”
Author: Albert Einstein
Tags: adulthood, success, value
- Quote: “It is better to be hated for what you are than to be loved for what you are not.”
Author: André Gide
Tags: life, love
- Quote: “I have not failed. I've just found 10,000 ways that won't work.”
Author: Thomas A. Edison
Tags: edison, failure, inspirational, paraphrased
- Quote: “A woman is like a tea bag; you never know how strong it is until it's in hot water.”
Author: Eleanor Roosevelt
Tags: misattributed-eleanor-roosevelt
- Quote: “A day without sunshine is like, you know, night.”
Author: Steve Martin
Tags: humor, obvious, simile

Extracted Quotes (find_all):
- Quote: “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
Author: Albert Einstein
Tags: change, deep-thoughts, thinking, world
- Quote: “It is our choices, Harry, that show what we truly are, far more than our abilities.”
Author: J.K. Rowling
Tags: abilities, choices
- Quote: “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
Author: Albert Einstein
Tags: inspirational, life, live, miracle, miracles
- Quote: “The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
Author: Jane Austen
Tags: aliteracy, books, classic, humor
- Quote: “Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
Author: Marilyn Monroe
Tags: be-yourself, inspirational
- Quote: “Try not to become a man of success. Rather become a man of value.”
Author: Albert Einstein
Tags: adulthood, success, value
- Quote: “It is better to be hated for what you are than to be loved for what you are not.”
Author: André Gide
Tags: life, love
- Quote: “I have not failed. I've just found 10,000 ways that won't work.”
Author: Thomas A. Edison
Tags: edison, failure, inspirational, paraphrased
- Quote: “A woman is like a tea bag; you never know how strong it is until it's in hot water.”
Author: Eleanor Roosevelt
Tags: misattributed-eleanor-roosevelt
- Quote: “A day without sunshine is like, you know, night.”
Author: Steve Martin
Tags: humor, obvious, simile

(venv) ┌──(venv)(fahm㉿MININT-FCGQJBQ)-[~/scrapling_reddit]
└─$

The updated results highlight the following:

1. **CSS Extraction Issue**:
The error `'TextHandler' object has no attribute 'text'` persists, indicating that the `.css_first` method is not returning the expected object type or the Scrapling library's behavior differs from the assumption. This needs a closer inspection of how Scrapling handles `.css_first` and what it returns.

2. **XPath and `find_all` Success**:
Both XPath and `find_all` methods are working correctly, producing consistent and accurate results.

---

# # # Debugging CSS Extraction:
Let’s investigate and address the issue with CSS extraction. Based on the Scrapling documentation and similar cases, the problem likely arises from the way `.css_first` interacts with the `::text` pseudo-element.

# # # # Suggested Fix:
Modify the CSS extraction to log the type and content of the object returned by `.css_first` for debugging. Here’s the updated section:

```python
# Extract quotes using CSS selectors
quotes_css = []
try:
    for quote in page.css('.quote'):
        # Debug: Print the type and content of each '.text::text' selection
        text_handler = quote.css_first('.text::text')
        print(f"Debug - .text::text type: {type(text_handler)}, content: {text_handler}")

        text = text_handler.text if text_handler else "N/A"
        author = quote.css_first('.author::text').text if quote.css_first('.author::text') else "N/A"
        tags = [tag.text for tag in quote.css('.tag::text')] if quote.css('.tag::text') else []
        quotes_css.append({"text": text, "author": author, "tags": tags})
except Exception as e:
    print(f"Error with CSS extraction: {e}")
```

---

# # # Next Steps:
1. **Run the updated code**: Check the debug output for the `type` and `content` of the `.css_first('.text::text')` object.
2. **Adjust the code**: Based on the debug findings, we’ll refine the extraction logic to correctly handle the returned object type.
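
When the type of a returned object is unclear, a few more checks than `type()` alone can narrow things down quickly. This is a generic sketch; `describe` is a hypothetical helper and is demonstrated on a plain string rather than on a Scrapling object:

```python
def describe(obj):
    """Summarize an unknown object: type name, str check, and whether it has .text."""
    return {
        "type": type(obj).__name__,
        "is_str": isinstance(obj, str),        # True for str and any subclass of str
        "has_text_attr": hasattr(obj, "text"),
        "value": repr(obj),
    }

print(describe("abc"))
```

Calling `describe(quote.css_first('.text::text'))` in the script would show at a glance whether the result is string-like and whether `.text` exists on it.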

Let me know the debug output, and we’ll proceed from there!

```python
from scrapling import Fetcher

fetcher = Fetcher(auto_match=False)

# Fetch a web page and create an Adaptor instance
page = fetcher.get('https://quotes.toscrape.com/', stealthy_headers=True)

# Extract quotes using CSS selectors
quotes_css = []
try:
    for quote in page.css('.quote'):
        # Debug: Print the type and content of each '.text::text' selection
        text_handler = quote.css_first('.text::text')
        print(f"Debug - .text::text type: {type(text_handler)}, content: {text_handler}")

        text = text_handler.text if text_handler else "N/A"
        author = quote.css_first('.author::text').text if quote.css_first('.author::text') else "N/A"
        tags = [tag.text for tag in quote.css('.tag::text')] if quote.css('.tag::text') else []
        quotes_css.append({"text": text, "author": author, "tags": tags})
except Exception as e:
    print(f"Error with CSS extraction: {e}")

# Print CSS extraction
print("\nExtracted Quotes (CSS):")
for quote in quotes_css:
    print(f"- Quote: {quote['text']}")
    print(f" Author: {quote['author']}")
    print(f" Tags: {', '.join(quote['tags'])}")

# Extract quotes using XPath
quotes_xpath = []
try:
    for quote in page.xpath('//div[@class="quote"]'):
        text = quote.xpath('.//span[@class="text"]/text()')[0] if quote.xpath('.//span[@class="text"]/text()') else "N/A"
        author = quote.xpath('.//small[@class="author"]/text()')[0] if quote.xpath('.//small[@class="author"]/text()') else "N/A"
        tags = quote.xpath('.//div[@class="tags"]/a[@class="tag"]/text()') if quote.xpath('.//div[@class="tags"]/a[@class="tag"]/text()') else []
        quotes_xpath.append({"text": text, "author": author, "tags": tags})
except Exception as e:
    print(f"Error with XPath extraction: {e}")

# Print XPath extraction
print("\nExtracted Quotes (XPath):")
for quote in quotes_xpath:
    print(f"- Quote: {quote['text']}")
    print(f" Author: {quote['author']}")
    print(f" Tags: {', '.join(quote['tags'])}")

# Extract quotes using find_all
quotes_find_all = []
try:
    for quote in page.find_all('div', {'class': 'quote'}):
        text = quote.find('span', class_='text').text if quote.find('span', class_='text') else "N/A"
        author = quote.find('small', class_='author').text if quote.find('small', class_='author') else "N/A"
        tags = [tag.text for tag in quote.find_all('a', class_='tag')] if quote.find_all('a', class_='tag') else []
        quotes_find_all.append({"text": text, "author": author, "tags": tags})
except Exception as e:
    print(f"Error with find_all extraction: {e}")

# Print find_all extraction
print("\nExtracted Quotes (find_all):")
for quote in quotes_find_all:
    print(f"- Quote: {quote['text']}")
    print(f" Author: {quote['author']}")
    print(f" Tags: {', '.join(quote['tags'])}")
```

(venv) ┌──(venv)(fahm㉿MININT-FCGQJBQ)-[~/scrapling_reddit]
└─$ /home/fahm/scrapling_reddit/venv/bin/python /home/fahm/scrapling_reddit/scraper.py
[2024-12-26 04:45:54] INFO: Fetched (200) (referer: https://www.google.com/search?q=toscrape)
Debug - .text::text type: , content: “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
Error with CSS extraction: 'TextHandler' object has no attribute 'text'

Extracted Quotes (CSS):

Extracted Quotes (XPath):
- Quote: “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
Author: Albert Einstein
Tags: change, deep-thoughts, thinking, world
- Quote: “It is our choices, Harry, that show what we truly are, far more than our abilities.”
Author: J.K. Rowling
Tags: abilities, choices
- Quote: “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
Author: Albert Einstein
Tags: inspirational, life, live, miracle, miracles
- Quote: “The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
Author: Jane Austen
Tags: aliteracy, books, classic, humor
- Quote: “Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
Author: Marilyn Monroe
Tags: be-yourself, inspirational
- Quote: “Try not to become a man of success. Rather become a man of value.”
Author: Albert Einstein
Tags: adulthood, success, value
- Quote: “It is better to be hated for what you are than to be loved for what you are not.”
Author: André Gide
Tags: life, love
- Quote: “I have not failed. I've just found 10,000 ways that won't work.”
Author: Thomas A. Edison
Tags: edison, failure, inspirational, paraphrased
- Quote: “A woman is like a tea bag; you never know how strong it is until it's in hot water.”
Author: Eleanor Roosevelt
Tags: misattributed-eleanor-roosevelt
- Quote: “A day without sunshine is like, you know, night.”
Author: Steve Martin
Tags: humor, obvious, simile

Extracted Quotes (find_all):
- Quote: “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
Author: Albert Einstein
Tags: change, deep-thoughts, thinking, world
- Quote: “It is our choices, Harry, that show what we truly are, far more than our abilities.”
Author: J.K. Rowling
Tags: abilities, choices
- Quote: “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
Author: Albert Einstein
Tags: inspirational, life, live, miracle, miracles
- Quote: “The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
Author: Jane Austen
Tags: aliteracy, books, classic, humor
- Quote: “Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
Author: Marilyn Monroe
Tags: be-yourself, inspirational
- Quote: “Try not to become a man of success. Rather become a man of value.”
Author: Albert Einstein
Tags: adulthood, success, value
- Quote: “It is better to be hated for what you are than to be loved for what you are not.”
Author: André Gide
Tags: life, love
- Quote: “I have not failed. I've just found 10,000 ways that won't work.”
Author: Thomas A. Edison
Tags: edison, failure, inspirational, paraphrased
- Quote: “A woman is like a tea bag; you never know how strong it is until it's in hot water.”
Author: Eleanor Roosevelt
Tags: misattributed-eleanor-roosevelt
- Quote: “A day without sunshine is like, you know, night.”
Author: Steve Martin
Tags: humor, obvious, simile

(venv) ┌──(venv)(fahm㉿MININT-FCGQJBQ)-[~/scrapling_reddit]
└─$

The debug output confirms that the `.css_first('.text::text')` method from Scrapling returns a `TextHandler` object. However, this object does not have a `.text` attribute, which is causing the error. Instead, we need to directly access the string value from the `TextHandler` object.

# # # Fix for CSS Extraction
The `TextHandler` object likely needs to be cast or handled differently to access the text content. Based on Scrapling's behavior, you can try using `str()` to convert the `TextHandler` object to a string.

Here’s the updated CSS extraction code:

```python
# Extract quotes using CSS selectors
quotes_css = []
try:
    for quote in page.css('.quote'):
        # Use str() to get the text value from TextHandler
        text_handler = quote.css_first('.text::text')
        text = str(text_handler) if text_handler else "N/A"
        author_handler = quote.css_first('.author::text')
        author = str(author_handler) if author_handler else "N/A"
        tags = [str(tag) for tag in quote.css('.tag::text')] if quote.css('.tag::text') else []

        quotes_css.append({"text": text, "author": author, "tags": tags})
except Exception as e:
    print(f"Error with CSS extraction: {e}")

# Print CSS extraction
print("\nExtracted Quotes (CSS):")
for quote in quotes_css:
    print(f"- Quote: {quote['text']}")
    print(f" Author: {quote['author']}")
    print(f" Tags: {', '.join(quote['tags'])}")
```

# # # Explanation
1. **Using `str()`**: Converting the `TextHandler` object with `str()` yields the text content directly, so the code no longer calls the non-existent `.text` attribute.
2. **Improved Robustness**: If a CSS selector matches nothing or returns `None`, the text and author fall back to `"N/A"` and the tags to an empty list, so the loop continues without raising an error.
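
The fallback logic described above can be pulled into one small helper. This is only a sketch: `to_text` and `FakeTextHandler` are hypothetical names, and `FakeTextHandler` is a stand-in that assumes the object returned by `css_first` behaves like a string subclass. It is not Scrapling's own class.

```python
class FakeTextHandler(str):
    """Hypothetical stand-in for a string-like match object (not Scrapling's class)."""


def to_text(value, default="N/A"):
    """Return a plain str for a match, or `default` when nothing usable matched."""
    # None (no match) and empty strings are both falsy, so both fall back.
    return str(value) if value else default


# A match converts to an ordinary str; a missing match falls back to the default.
matched = to_text(FakeTextHandler("Albert Einstein"))
missing = to_text(None)
print(type(matched).__name__, matched)
print(missing)
```

With a helper like this, each field in the loop becomes a single call such as `to_text(quote.css_first('.author::text'))`, assuming the selector call itself does not raise.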

---

# # # Next Steps
1. Replace the CSS extraction block in your script with the updated code above.
2. Run the script again and observe the output for the "Extracted Quotes (CSS)" section.

If the issue persists or if there’s another unexpected behavior, share the updated debug output!

from scrapling import Fetcher

fetcher = Fetcher(auto_match=False)

# Fetch a web page and create an Adaptor instance
page = fetcher.get('https://quotes.toscrape.com/', stealthy_headers=True)

# Extract quotes using CSS selectors
quotes_css = []
try:
    for quote in page.css('.quote'):
        # Use str() to get the text value from TextHandler
        text_handler = quote.css_first('.text::text')
        text = str(text_handler) if text_handler else "N/A"
        author_handler = quote.css_first('.author::text')
        author = str(author_handler) if author_handler else "N/A"
        tags = [str(tag) for tag in quote.css('.tag::text')] if quote.css('.tag::text') else []

        quotes_css.append({"text": text, "author": author, "tags": tags})
except Exception as e:
    print(f"Error with CSS extraction: {e}")

# Print CSS extraction
print("\nExtracted Quotes (CSS):")
for quote in quotes_css:
    print(f"- Quote: {quote['text']}")
    print(f" Author: {quote['author']}")
    print(f" Tags: {', '.join(quote['tags'])}")

# Print CSS extraction (this block is repeated, so the CSS results are printed twice)
print("\nExtracted Quotes (CSS):")
for quote in quotes_css:
    print(f"- Quote: {quote['text']}")
    print(f" Author: {quote['author']}")
    print(f" Tags: {', '.join(quote['tags'])}")

# Extract quotes using XPath
quotes_xpath = []
try:
    for quote in page.xpath('//div[@class="quote"]'):
        text = quote.xpath('.//span[@class="text"]/text()')[0] if quote.xpath('.//span[@class="text"]/text()') else "N/A"
        author = quote.xpath('.//small[@class="author"]/text()')[0] if quote.xpath('.//small[@class="author"]/text()') else "N/A"
        tags = quote.xpath('.//div[@class="tags"]/a[@class="tag"]/text()') if quote.xpath('.//div[@class="tags"]/a[@class="tag"]/text()') else []
        quotes_xpath.append({"text": text, "author": author, "tags": tags})
except Exception as e:
    print(f"Error with XPath extraction: {e}")

# Print XPath extraction
print("\nExtracted Quotes (XPath):")
for quote in quotes_xpath:
    print(f"- Quote: {quote['text']}")
    print(f" Author: {quote['author']}")
    print(f" Tags: {', '.join(quote['tags'])}")

# Extract quotes using find_all
quotes_find_all = []
try:
    for quote in page.find_all('div', {'class': 'quote'}):
        text = quote.find('span', class_='text').text if quote.find('span', class_='text') else "N/A"
        author = quote.find('small', class_='author').text if quote.find('small', class_='author') else "N/A"
        tags = [tag.text for tag in quote.find_all('a', class_='tag')] if quote.find_all('a', class_='tag') else []
        quotes_find_all.append({"text": text, "author": author, "tags": tags})
except Exception as e:
    print(f"Error with find_all extraction: {e}")

# Print find_all extraction
print("\nExtracted Quotes (find_all):")
for quote in quotes_find_all:
    print(f"- Quote: {quote['text']}")
    print(f" Author: {quote['author']}")
    print(f" Tags: {', '.join(quote['tags'])}")
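
The XPath variant in the script selects elements by their `class` attribute. To try that kind of predicate without hitting the network, here is a minimal sketch that uses only the standard library's `xml.etree.ElementTree` in place of Scrapling. The `SAMPLE` markup and the `extract_quotes` helper are made up for illustration, and ElementTree supports only a subset of XPath (no `text()` step), so `findtext()` is used instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup that mimics the page structure.
SAMPLE = """
<html><body>
  <div class="quote">
    <span class="text">Sample quote one.</span>
    <small class="author">Author A</small>
    <div class="tags"><a class="tag">alpha</a><a class="tag">beta</a></div>
  </div>
  <div class="quote">
    <span class="text">Sample quote two.</span>
    <div class="tags"></div>
  </div>
</body></html>
"""


def extract_quotes(markup):
    """Collect text, author and tags for every element with class="quote"."""
    root = ET.fromstring(markup.strip())
    quotes = []
    for quote in root.findall(".//div[@class='quote']"):
        # findtext() returns the default when the element is missing.
        text = quote.findtext(".//span[@class='text']", default="N/A")
        author = quote.findtext(".//small[@class='author']", default="N/A")
        tags = [a.text for a in quote.findall(".//div[@class='tags']/a[@class='tag']")]
        quotes.append({"text": text, "author": author, "tags": tags})
    return quotes


for item in extract_quotes(SAMPLE):
    print(f"- Quote: {item['text']}")
    print(f" Author: {item['author']}")
    print(f" Tags: {', '.join(item['tags'])}")
```

The second sample quote has no author element, so it exercises the same `"N/A"` fallback the script uses.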

(venv) ┌──(venv)(fahm㉿MININT-FCGQJBQ)-[~/scrapling_reddit]
└─$ /home/fahm/scrapling_reddit/venv/bin/python /home/fahm/scrapling_reddit/scraper.py
[2024-12-26 04:49:03] INFO: Fetched (200) (referer: https://www.google.com/search?q=toscrape)

Extracted Quotes (CSS):
- Quote: “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
Author: Albert Einstein
Tags: change, deep-thoughts, thinking, world
- Quote: “It is our choices, Harry, that show what we truly are, far more than our abilities.”
Author: J.K. Rowling
Tags: abilities, choices
- Quote: “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
Author: Albert Einstein
Tags: inspirational, life, live, miracle, miracles
- Quote: “The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
Author: Jane Austen
Tags: aliteracy, books, classic, humor
- Quote: “Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
Author: Marilyn Monroe
Tags: be-yourself, inspirational
- Quote: “Try not to become a man of success. Rather become a man of value.”
Author: Albert Einstein
Tags: adulthood, success, value
- Quote: “It is better to be hated for what you are than to be loved for what you are not.”
Author: André Gide
Tags: life, love
- Quote: “I have not failed. I've just found 10,000 ways that won't work.”
Author: Thomas A. Edison
Tags: edison, failure, inspirational, paraphrased
- Quote: “A woman is like a tea bag; you never know how strong it is until it's in hot water.”
Author: Eleanor Roosevelt
Tags: misattributed-eleanor-roosevelt
- Quote: “A day without sunshine is like, you know, night.”
Author: Steve Martin
Tags: humor, obvious, simile

Extracted Quotes (CSS):
- Quote: “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
Author: Albert Einstein
Tags: change, deep-thoughts, thinking, world
- Quote: “It is our choices, Harry, that show what we truly are, far more than our abilities.”
Author: J.K. Rowling
Tags: abilities, choices
- Quote: “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
Author: Albert Einstein
Tags: inspirational, life, live, miracle, miracles
- Quote: “The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
Author: Jane Austen
Tags: aliteracy, books, classic, humor
- Quote: “Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
Author: Marilyn Monroe
Tags: be-yourself, inspirational
- Quote: “Try not to become a man of success. Rather become a man of value.”
Author: Albert Einstein
Tags: adulthood, success, value
- Quote: “It is better to be hated for what you are than to be loved for what you are not.”
Author: André Gide
Tags: life, love
- Quote: “I have not failed. I've just found 10,000 ways that won't work.”
Author: Thomas A. Edison
Tags: edison, failure, inspirational, paraphrased
- Quote: “A woman is like a tea bag; you never know how strong it is until it's in hot water.”
Author: Eleanor Roosevelt
Tags: misattributed-eleanor-roosevelt
- Quote: “A day without sunshine is like, you know, night.”
Author: Steve Martin
Tags: humor, obvious, simile

Extracted Quotes (XPath):
- Quote: “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
Author: Albert Einstein
Tags: change, deep-thoughts, thinking, world
- Quote: “It is our choices, Harry, that show what we truly are, far more than our abilities.”
Author: J.K. Rowling
Tags: abilities, choices
- Quote: “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
Author: Albert Einstein
Tags: inspirational, life, live, miracle, miracles
- Quote: “The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
Author: Jane Austen
Tags: aliteracy, books, classic, humor
- Quote: “Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
Author: Marilyn Monroe
Tags: be-yourself, inspirational
- Quote: “Try not to become a man of success. Rather become a man of value.”
Author: Albert Einstein
Tags: adulthood, success, value
- Quote: “It is better to be hated for what you are than to be loved for what you are not.”
Author: André Gide
Tags: life, love
- Quote: “I have not failed. I've just found 10,000 ways that won't work.”
Author: Thomas A. Edison
Tags: edison, failure, inspirational, paraphrased
- Quote: “A woman is like a tea bag; you never know how strong it is until it's in hot water.”
Author: Eleanor Roosevelt
Tags: misattributed-eleanor-roosevelt
- Quote: “A day without sunshine is like, you know, night.”
Author: Steve Martin
Tags: humor, obvious, simile

Extracted Quotes (find_all):
- Quote: “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
Author: Albert Einstein
Tags: change, deep-thoughts, thinking, world
- Quote: “It is our choices, Harry, that show what we truly are, far more than our abilities.”
Author: J.K. Rowling
Tags: abilities, choices
- Quote: “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
Author: Albert Einstein
Tags: inspirational, life, live, miracle, miracles
- Quote: “The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
Author: Jane Austen
Tags: aliteracy, books, classic, humor
- Quote: “Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
Author: Marilyn Monroe
Tags: be-yourself, inspirational
- Quote: “Try not to become a man of success. Rather become a man of value.”
Author: Albert Einstein
Tags: adulthood, success, value
- Quote: “It is better to be hated for what you are than to be loved for what you are not.”
Author: André Gide
Tags: life, love
- Quote: “I have not failed. I've just found 10,000 ways that won't work.”
Author: Thomas A. Edison
Tags: edison, failure, inspirational, paraphrased
- Quote: “A woman is like a tea bag; you never know how strong it is until it's in hot water.”
Author: Eleanor Roosevelt
Tags: misattributed-eleanor-roosevelt
- Quote: “A day without sunshine is like, you know, night.”
Author: Steve Martin
Tags: humor, obvious, simile

(venv) ┌──(venv)(fahm㉿MININT-FCGQJBQ)-[~/scrapling_reddit]
└─$
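
The three sections above print the same ten quotes, which suggests the CSS, XPath and find_all approaches agree. Rather than comparing the output by eye, a small check could confirm it. This is a sketch under assumptions: `normalize` and `all_methods_agree` are hypothetical helpers, and the sample records below are placeholders rather than real scraped data.

```python
def normalize(records):
    """Turn a list of quote dicts into plain, comparable tuples."""
    return [
        (str(r["text"]), str(r["author"]), tuple(str(t) for t in r["tags"]))
        for r in records
    ]


def all_methods_agree(results_by_method):
    """Return True when every extraction method produced the same records."""
    normalized = [normalize(records) for records in results_by_method.values()]
    return all(n == normalized[0] for n in normalized[1:])


# Placeholder data standing in for quotes_css, quotes_xpath and quotes_find_all.
sample = [{"text": "Sample quote.", "author": "Author A", "tags": ["alpha"]}]
results = {
    "css": sample,
    "xpath": [dict(r) for r in sample],
    "find_all": [dict(r) for r in sample],
}
print(all_methods_agree(results))
```

In the real script, the dictionary would hold `quotes_css`, `quotes_xpath` and `quotes_find_all` instead of the placeholder lists.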

Address

621 Plymouth Drive NE
Keizer, OR
97303
