Channel: Kodi Community Forum - Information Providers (scrapers)

Universal Movie Scraper not fetching certification from IMDB

I noticed that all my recently scraped movies were rated NR. I checked the logs and found this:
Code:

DEBUG: scraper: GetIMDBCountryCert returned <details><url cache="tt0944835-reference.html" function="ParseIMDBCountryCert">http://www.imdb.com/title/tt0944835/reference|accept-language=en-us</url></details>
DEBUG: CurlFile::ParseAndCorrectUrl() adding custom header option 'accept-language: en-us'
DEBUG: CurlFile::Open(24399230) http://www.imdb.com/title/tt0944835/reference
DEBUG: CScraperUrl::Get: Using "UTF-8" charset for HTML "http://www.imdb.com/title/tt0944835/reference|accept-language=en-us"
ERROR: ADDON::CScraper::Run: Unable to parse web site

The cached tt0944835-reference.html file was created and contains all the expected information. I re-scraped movies that previously had a certification and watched them lose it.

I'm on metadata.universal 4.1.8 and metadata.common.imdb.com 3.0.3, with Kodi 17.6 Git:20171114-a9a7a20. My certification country setting is:
Code:

<setting id="imdbcertcountry" value="United States" />

All other URLs from IMDB work just fine:
Code:

DEBUG: scraper: GetIMDBRatingsById returned <details><url cache="tt0944835-main.html" function="ParseIMDBRatings">http://www.imdb.com/title/tt0944835/|accept-language=en-us</url></details>
DEBUG: CurlFile::ParseAndCorrectUrl() adding custom header option 'accept-language: en-us'
DEBUG: CurlFile::Open(1C3E3B68) http://www.imdb.com/title/tt0944835/
DEBUG: CScraperUrl::Get: Using "UTF-8" charset for HTML "http://www.imdb.com/title/tt0944835/|accept-language=en-us"
DEBUG: scraper: ParseIMDBRatings returned <details><ratings><rating name="imdb" default="true"><value>6.4</value><votes>265,265</votes></rating></ratings></details>
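Both logs go through the same fetch step. The scraper appends |accept-language=en-us to the URL, and CurlFile turns that suffix into a request header, as the "adding custom header option" lines show. So the request looks the same for the working and the failing URL. The split can be sketched in Python as follows. It assumes that everything after the first | is a set of &-separated name=value pairs with URL-encoded values. split_kodi_url is a hypothetical helper written for illustration, not Kodi's CurlFile code:

```python
from urllib.parse import unquote


def split_kodi_url(raw: str) -> tuple[str, dict[str, str]]:
    """Split 'url|name=value&name2=value2' into the bare URL and a header dict.

    Assumption (not Kodi's actual implementation): options after the first '|'
    are '&'-separated name=value pairs whose values are URL-encoded.
    """
    url, sep, options = raw.partition("|")
    headers: dict[str, str] = {}
    if sep:
        for pair in options.split("&"):
            if not pair:
                continue  # tolerate a trailing or doubled '&'
            name, _, value = pair.partition("=")
            headers[name] = unquote(value)
    return url, headers


if __name__ == "__main__":
    # The failing and the working URL from the two logs above.
    for raw in (
        "http://www.imdb.com/title/tt0944835/reference|accept-language=en-us",
        "http://www.imdb.com/title/tt0944835/|accept-language=en-us",
    ):
        url, headers = split_kodi_url(raw)
        print(url, headers)
```

Both URLs end up with the same single accept-language header. Going by the logs, the only difference between the two is which parse function runs on the downloaded page afterwards.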

While debugging, I found that if I change the 2 to a 5 on lines 5 and 6 of the snippet below (output="$$2" and dest="2", which are lines 350 and 351 of metadata.common.imdb.com/imdb.xml), everything works fine. What I can't figure out is why 2, or any number other than 5, throws the error.
Code:

<GetIMDBCountryCert dest="5">
  <RegExp input="$$1" output="&lt;details&gt;&lt;url cache=&quot;$$1-reference.html&quot; function=&quot;ParseIMDBCountryCert&quot;&gt;http://www.imdb.com/title/$$1/reference|accept-language=en-us&lt;/url&gt;&lt;/details&gt;" dest="5">
    <expression noclean="1" />
  </RegExp>
  <RegExp input="$INFO[imdbcertcountry]" output="$$2" dest="5">
    <RegExp input="$$1" output="&lt;details&gt;&lt;url cache=&quot;$$1-reference.html&quot; function=&quot;ParseIMDBUSACert&quot;&gt;http://www.imdb.com/title/$$1/reference|accept-language=en-us&lt;/url&gt;&lt;/details&gt;" dest="2">
      <expression noclean="1"/>
    </RegExp>
    <expression>United States</expression>
  </RegExp>
</GetIMDBCountryCert>
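The function above passes its result through numbered buffers. The function returns buffer 5, the nested RegExp writes buffer 2, and the outer conditional RegExp copies $$2 into buffer 5. The Python sketch below makes that wiring easier to see: it parses the snippet with xml.etree.ElementTree and lists each RegExp's input, dest buffer and output (or the scraper function its generated url points at). It only reads the XML. It does not emulate Kodi's scraper engine, its evaluation order or its buffer clearing rules. trace_buffers and describe_output are hypothetical helpers written for illustration:

```python
import xml.etree.ElementTree as ET

# GetIMDBCountryCert exactly as posted (original values: $$2 / dest="2").
SNIPPET = """<GetIMDBCountryCert dest="5">
<RegExp input="$$1" output="&lt;details&gt;&lt;url cache=&quot;$$1-reference.html&quot; function=&quot;ParseIMDBCountryCert&quot;&gt;http://www.imdb.com/title/$$1/reference|accept-language=en-us&lt;/url&gt;&lt;/details&gt;" dest="5">
<expression noclean="1" />
</RegExp>
<RegExp input="$INFO[imdbcertcountry]" output="$$2" dest="5">
<RegExp input="$$1" output="&lt;details&gt;&lt;url cache=&quot;$$1-reference.html&quot; function=&quot;ParseIMDBUSACert&quot;&gt;http://www.imdb.com/title/$$1/reference|accept-language=en-us&lt;/url&gt;&lt;/details&gt;" dest="2">
<expression noclean="1"/>
</RegExp>
<expression>United States</expression>
</RegExp>
</GetIMDBCountryCert>"""


def describe_output(output: str) -> str:
    """Return the scraper function named in a generated <url>, else the raw output."""
    marker = 'function="'
    start = output.find(marker)
    if start == -1:
        return output  # e.g. a plain buffer reference such as "$$2"
    start += len(marker)
    return output[start:output.index('"', start)]


def trace_buffers(xml_text: str) -> list[tuple[int, str, str, str]]:
    """List (nesting depth, input, dest, output) for every RegExp in document order."""
    root = ET.fromstring(xml_text)
    rows: list[tuple[int, str, str, str]] = []

    def walk(node: ET.Element, depth: int) -> None:
        for child in node:
            if child.tag == "RegExp":
                rows.append((depth, child.get("input", ""), child.get("dest", ""),
                             describe_output(child.get("output", ""))))
                walk(child, depth + 1)  # nested RegExps are listed under their parent

    walk(root, 0)
    return rows


if __name__ == "__main__":
    print("function result is read from buffer", ET.fromstring(SNIPPET).get("dest"))
    for depth, src, dest, out in trace_buffers(SNIPPET):
        print("  " * depth + f"input {src} -> buffer {dest}: {out}")
```

You can run it again on SNIPPET with output="$$2" and dest="2" both changed to 5. In the original, the outer RegExp copies buffer 2 into buffer 5. In the 2 to 5 variant, the nested RegExp writes the ParseIMDBUSACert URL straight into buffer 5. That variant is the one reported above as working.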
