Image web scraping: is it cool to do?

I have been pondering the idea of making a web scraping feature for Imagine Board, so you could drop a URL into it and it would download and display the images from that page.
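For reference, the "grab the images from a URL" part could look roughly like this minimal Python sketch, which uses only the standard library (`html.parser` and `urllib.parse`). It assumes the page is plain HTML with ordinary `<img src=...>` tags, so galleries that are built by JavaScript won't show up this way. The class and function names are made up for illustration, and downloading the page itself is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collects the src of every <img> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so "IMG" also matches here.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Turn relative paths like "/a.png" into absolute URLs.
        absolute = urljoin(self.base_url, src)
        if absolute not in self.image_urls:  # skip duplicates, keep page order
            self.image_urls.append(absolute)


def extract_image_urls(html, base_url):
    """Return the absolute image URLs found in an HTML string."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.image_urls
```

Feeding it `'<img src="/a.png">'` with the base URL `https://example.com/gallery/` gives `['https://example.com/a.png']`: relative paths are resolved with `urljoin`, and repeated images are only listed once.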

But from reading a bit about it, I get the feeling sites really don’t like it?
Is it something worth pursuing, or would it be better to just let the user download the images?

Also, would it be okay to make this option depend on extra modules that have to be installed separately?

What sites generally don’t want is other sites scraping their images, because then those sites are not only stealing their content but also their bandwidth.

Scraping images for personal use is generally not a problem. Your web browser already does that anyway when it puts the images into its cache.

That said, some sites use protection techniques to make it harder for other sites to grab their images, which also affects third-party software. For example, they check the Referer header of the request or the User-Agent. Some more annoying ones (very rare) even check cookies or other request header data.
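To make the header part concrete: with Python’s `urllib.request`, the headers a site might check are just entries on the request. This is a hedged sketch that assumes a hypothetical user agent string `ImagineBoard/0.1` and uses the page the image was found on as the Referer. It only builds the request and does not send it.

```python
import urllib.request


def build_image_request(image_url, page_url, user_agent="ImagineBoard/0.1"):
    """Build (but do not send) a GET request for one image.

    Sites that block hotlinking often compare the Referer with their own
    domain, and some reject unknown User-Agent strings.
    """
    headers = {
        # Hypothetical name: identify the tool honestly instead of faking a browser.
        "User-Agent": user_agent,
        # The page the image was found on, which anti-hotlinking checks look at.
        "Referer": page_url,
    }
    return urllib.request.Request(image_url, headers=headers, method="GET")
```

Sending it would be `urllib.request.urlopen(req, timeout=10)`. Note that `urllib` stores header names via `str.capitalize()`, so reading the user agent back needs `req.get_header("User-agent")`. Sites that also check cookies will still refuse a request like this.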

Even though it’s basically the same thing @KnowZero wrote, I’ll post this as well.

Because of the many different ways image hosting sites “enable” (i.e. make difficult) access to “their” resources, I would leave this up to the users.
If you implement it, you’ll have to jump through all sorts of hoops just to provide an automatic download for your plugin. It’s a cat and mouse game where users will constantly ask you to support “their” image hoster as well.

Michelist

Depending on the jurisdiction, what you scrape and how you do it, it can also be illegal.

But wouldn’t it be considered like a web browser in nature? Would it still be illegal?

A good scraper will pretend to be a user on a browser, yes. During my time as a dev for an advertising company I also did a lot of scraping, and some of it was really borderline illegal, or at least morally questionable. Generally, if you just download a few images, nobody will care (probably), but scrapers usually collect massive amounts of data with spiders crawling through websites and do things like using proxies to hide their activities (which was already problematic in Europe at the time, and I’m sure things are stricter nowadays).

Although I would never encourage it, I think Scrapy can do all of that, unless I’m confusing it with another similar framework; it’s been a while.

What you do with the data is also important, so it really depends a bit.

This sounds awfully complicated and more time-consuming than I was hoping for the feature I wanted to make. But the legal ramifications seem like the bigger problem to me, since I don’t know how people would use it. I just wanted to make something to open a web gallery with. If people can abuse it, they will, so I guess I should just scrap the idea.

I guess focusing on displaying stuff would be best.

I think I’m looking for something similar…
I’m really looking for easy access to reference images, and at least on Unsplash the images are free for private and commercial use (see the Unsplash License), so such a plugin would be really great.

Affinity Photo has such a plugin for Pexels and Pixabay… a plugin like that for Krita, maybe with Unsplash as well, would be gold!

I doubt I will be doing this now, and even if I did, the odds of me sharing it are close to none.
I know it would be cool, but I also know people would just abuse it for whatever they please.

I have worked at a copy/print shop, and people do walk in to reproduce copyrighted material because “everything on Google is free”.

The best approach is to find a site or service that has a public API. Not only does it make things easier, you can also be sure you’re allowed to do it. Some popular image hosting services may have one.

I wanted to access Pinterest myself (it has an API), but the impression I get from reading their notes is that you have to be registered to get their OK, and you can be booted out. I have no idea what even counts as bad behavior.

My plugin would just be some random client accessing their API with no login (I don’t want it to seem like I am collecting data), probably downloading a bit too much from many places around the globe, so they would probably shut me or the users down. Imagine Board can easily handle 10k images or more if you feed it, so I can imagine people just gobbling up more and more, even with restrictions.

Also, none of the Python Pinterest API code I found seems to work, and I need to do more investigation to find something decent. The best tutorials I found were about web scraping with plain Python, not about using an API. Web scraping would also let it work on other sites instead of being tied to one website, you know. I could ask Pexels, but honestly I don’t really go there to search for reference images, so I don’t want to bother people over it, even though it would probably be the best option I have for something like this if they have an API.

I think most of you have already realized I am quite far from being a programmer. Everything I know, I started learning when I joined this forum, so for every new topic I need to go and learn, and internet stuff combined with coding is a big unknown for me. These internet wars are something I would rather avoid, really.

I’ve never looked at their API, but APIs are commonly rate limited per application, per user, or both. Hopefully there’s documentation for the endpoints with the limits included.
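If an API does publish limits, the client can enforce them on its own side so a big batch (like those 10k images) doesn’t trip them. Here is a minimal sliding-window throttle in Python, assuming you take `max_calls` and `period` from the API’s documentation; any numbers you plug in are placeholders, not Pinterest’s or anyone else’s real limits. The clock and sleep functions can be swapped out so it can be tested without actually waiting.

```python
import time


class Throttle:
    """Allow at most `max_calls` requests in any `period`-second window."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = []  # timestamps of the requests still inside the window

    def _forget_old(self, now):
        # Drop requests that are at least `period` seconds old.
        self.calls = [t for t in self.calls if now - t < self.period]

    def wait(self):
        """Block until one more request fits in the window, then record it."""
        now = self.clock()
        self._forget_old(now)
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest recorded request leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self._forget_old(now)
        self.calls.append(now)
```

You would call `throttle.wait()` before every API request. Many APIs also answer with HTTP 429 (Too Many Requests) when you go over, so backing off on that status is worth handling too.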

While scraping just the images is fairly easy, extracting them from the website can still be pretty annoying. Scrapers usually break easily when the website structure changes, which is especially annoying on very dynamic websites, or even with just regular updates.
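One way to make the extraction a little less brittle is to avoid relying on a single attribute. The sketch below reflects an assumption about common page patterns rather than a rule. It takes the `(name, value)` attribute list that `html.parser` passes to `handle_starttag` and prefers the widest `srcset` candidate, then a lazy-loading `data-src`, then plain `src`. The `data-src`/`data-srcset` attributes are conventions of some lazy-loading scripts, not a standard, and splitting `srcset` on commas is a simplification that fails on URLs containing commas.

```python
def best_image_source(attrs):
    """Pick the most likely full-size URL from an <img> tag's attributes.

    `attrs` is a list of (name, value) pairs as produced by html.parser.
    Returns None instead of raising when nothing usable is found.
    """
    # Ignore valueless attributes such as a bare `hidden`.
    values = {name: value for name, value in attrs if value}
    srcset = values.get("srcset") or values.get("data-srcset")
    if srcset:
        best_url, best_width = None, -1
        # e.g. "a.jpg 480w, b.jpg 1080w": keep the widest candidate.
        for candidate in srcset.split(","):
            parts = candidate.split()
            if not parts:
                continue
            width = -1  # no "w" descriptor (e.g. "2x" or none at all)
            if len(parts) > 1 and parts[1].endswith("w") and parts[1][:-1].isdigit():
                width = int(parts[1][:-1])
            if best_url is None or width > best_width:
                best_url, best_width = parts[0], width
        if best_url:
            return best_url
    # Fall back to lazy-loading attributes, then the plain src.
    return values.get("data-src") or values.get("src")
```

Because it returns `None` instead of raising when a tag has nothing usable, a layout change tends to mean some missing images rather than a crashed plugin.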

One reason website owners don’t like scraping (content stealing aside) is that it creates a lot of overhead compared to API calls: a website delivers a lot of additional data every time (the entire layout, for example).

Programming a scraper (spider or crawler) is a common programming exercise. You are usually on the safe side when your application doesn’t add more load to a website than a normal user does by browsing it; then nobody will probably notice or care. Programming errors could easily DoS a weak website, so be careful.
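One common convention for staying polite that wasn’t mentioned above is to honor the site’s `robots.txt` and wait between requests. Python’s standard `urllib.robotparser` can parse the file. This sketch takes already-downloaded lines so it runs offline, uses a hypothetical agent name `ImagineBoard`, and assumes a 1-second default delay when the site sets no `Crawl-delay`.

```python
import urllib.robotparser


def make_robots_checker(robots_txt_lines, user_agent="ImagineBoard"):
    """Build an `allowed(url)` check and a delay from robots.txt lines.

    `user_agent` is a hypothetical name for the plugin.
    """
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt_lines)

    def allowed(url):
        # True if robots.txt permits this agent to fetch the URL.
        return rp.can_fetch(user_agent, url)

    # Use the site's Crawl-delay if it sets one; otherwise assume 1 second.
    delay = rp.crawl_delay(user_agent) or 1.0
    return allowed, delay
```

To fetch the real file, `RobotFileParser` also has `set_url()` and `read()`. Between downloads you would call `time.sleep(delay)`, which keeps the load close to what one person browsing the site would cause.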