Re: [squid-users] Youtube Issue!

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sun, 27 Nov 2011 15:02:22 +1300

On 27/11/2011 5:32 a.m., Ghassan Gharabli wrote:
> Hello Amos,
>
>
> Finally, I have captured almost all YouTube videos, except for
> something I would like your assistance with.
>
>
> As I have tested many times before, Chudy's script is outdated.
>
> After testing and logging YouTube videos, I finally found
> something that is not being fully cached. If you remember, I said
> in my earlier messages that the ID isn't being captured in all places,
> but that's okay, I have fixed that. I will post my details after I
> have completely finished them.
>
> Could you please explain to me what's happening here?
>
> If &range=13-2375679 is found in a URL, then Squid doesn't understand
> how to cache the full video: it only caches the first 13 seconds, I
> guess, and then it stops. If you try to download this finished cached
> movie, you notice its size is about 2.2 MB. If you try to remove it
> from the cache, Squid can't even find it: it claims it is not cached
> but shows TCP_HIT in access.log. STRANGE!

(NP: by remove you mean a PURGE request? HIT just means cached data was
found to service the request, which is right since purging the data
involves locating it (HITing) before erasing the cached entry. Follow-up
requests after the purge should not be HIT.)
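
A toy model of that purge behaviour, as a minimal Python sketch. The
class and method names are made up for illustration and are not Squid
internals; it only shows why a purge has to locate the entry (a HIT)
before erasing it, and why the next lookup should be a MISS.

```python
class ToyCache:
    """Illustrative in-memory cache; not how Squid stores objects."""

    def __init__(self):
        self.store = {}

    def get(self, url):
        # A lookup is a HIT only if the object is currently stored.
        return "HIT" if url in self.store else "MISS"

    def purge(self, url):
        # Purging must first find the entry, so a successful purge
        # is itself reported as a HIT.
        if url in self.store:
            del self.store[url]
            return "HIT"
        return "MISS"


cache = ToyCache()
cache.store["http://example.invalid/video"] = b"cached bytes"
purge_result = cache.purge("http://example.invalid/video")
followup_result = cache.get("http://example.invalid/video")
```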

I took a look at these "range" replies being generated by YT a while back.

What I found was that a request for the video URL would send back an FLV
object with bytes, e.g. "[SWF...]ABCDEFGH". All fine and good; this is
the cacheable video.

If the user skips around in the video the player generates a range=
request stating what timestamp or bytes they want to start at. It's not
clear which, because the reply that comes back has a *different* byte
sequence than the video at the same URL. For example, on the
"[SWF...]ABCDEFGH" video it would produce: "[SWF...]EFGH" or something
similar.

Under the HTTP rules the range object to be combined must be a snippet
portion of the base object (range 4-999, should have been just "DEFGH").
By adding the SWF headers on each reply YT are making them unique and
different objects. Combining them in the middle (i.e. by a caching app)
will cause errors in the binary object and crash the Flash player or
cause it to display an error message instead of the video.
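
To make the mismatch concrete, here is a minimal Python sketch. The
byte strings are the placeholders from the example above, not real FLV
data, and is_valid_range_slice is a hypothetical helper name. It checks
whether a reply is exactly the requested inclusive byte slice of the
base object, which is what a cache needs before it can combine them.

```python
def is_valid_range_slice(base: bytes, part: bytes, first: int, last: int) -> bool:
    """True if `part` is exactly bytes first..last (inclusive) of `base`."""
    last = min(last, len(base) - 1)  # a range may run past the end
    return part == base[first:last + 1]


HEADER = b"[SWF...]"            # stand-in for the header bytes
base = HEADER + b"ABCDEFGH"     # the full object at the video URL

# A reply that really is a slice of the base object (range 4-999).
proper = base[4:1000]

# The shape of reply described above: header re-attached to a later chunk.
observed = HEADER + b"EFGH"

proper_ok = is_valid_range_slice(base, proper, 4, 999)
observed_ok = is_valid_range_slice(base, observed, 4, 999)
```

Only the first reply could safely be merged into a cached copy of the
base object; the second is a different object that happens to share a
URL prefix.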

This range request only seems to happen if the user skips into a portion
of video the player has not yet downloaded. So sending them the whole
video, which is what we try to do with Squid, will cause a display lag
for the user but not cause problems in their player.
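
As a rough illustration of treating the range= form as a variant of
the whole-video URL, the Python sketch below separates the range
parameter from the rest of the query. The function name, the shortened
URL and the example.invalid host are assumptions for illustration; this
is not Chudy's script or anything Squid does internally, and whether
the origin accepts the URL without range= is not established here.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def split_range(url):
    """Return (url_without_range, (first, last) or None)."""
    parts = urlsplit(url)
    kept = []
    rng = None
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "range":
            first, _, last = value.partition("-")
            if first.isdigit() and last.isdigit():
                rng = (int(first), int(last))
        else:
            kept.append((key, value))
    base = urlunsplit(parts._replace(query=urlencode(kept)))
    return base, rng


# Shortened, hypothetical form of the URL quoted in this thread.
url = ("http://example.invalid/videoplayback"
       "?itag=34&id=e120643085f56831&range=13-2375679")
base_url, requested = split_range(url)
```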

>
> Now look into this URL:
> -------------------------------
>
> "http://o-o.preferred.orange-par1.v4.lscache7.c.youtube.com/videoplayback?sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Csource%2Calgorithm%2Cburst%2Cfactor%2Ccp&fexp=907605%2C912600%2C915002&algorithm=throttle-factor&itag=34&ip=84.0.0.0&burst=40&sver=3&signature=8223490C23E48CB708E04666E4
> A550422757CEC6.9D8D78E66DD14FEFC4B5F960F493ED4CDFD7C51C&source=youtube&expire=13
> 22348400&key=yt1&ipbits=8&factor=1.25&cp=U0hRR1NPVl9FSkNOMV9LSVpFOkpsV3BkS1B1ZXN
> F&id=e120643085f56831&range=13-2375679"
>
> HTTP/1.0 200 OK
> Last-Modified: Fri, 27 Nov 2009 12:44:54 GMT
> Content-Type: video/x-flv
> Date: Sat, 26 Nov 2011 16:06:29 GMT
> Expires: Sat, 26 Nov 2011 16:06:29 GMT
> Cache-Control: private, max-age=24511
> Accept-Ranges: bytes
> Content-Length: 2375667
> X-Content-Type-Options: nosniff
> Server: gvs 1.0
> X-Cache: MISS from Peer6
> X-Cache-Lookup: MISS from Peer6:3128
> Connection: close
>
> What's the job of "Accept-Ranges: bytes" here?

Accept-* means the software producing that reply or request supports a
certain HTTP feature. In this case it is Squid and maybe the server as
well supporting HTTP range requests. Not related to YT particularly.
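
For what it is worth, the header values in the quoted reply can be
checked mechanically. This is a minimal Python sketch using only the
standard library; supports_byte_ranges is a hypothetical helper name.
It also compares the inclusive span of range=13-2375679 with the quoted
Content-Length. That is only an observation about the numbers in this
one reply, not a statement about what the server means by range=.

```python
from email.parser import Parser

# Header values copied from the quoted reply.
raw_headers = (
    "Content-Type: video/x-flv\n"
    "Accept-Ranges: bytes\n"
    "Content-Length: 2375667\n"
)
headers = Parser().parsestr(raw_headers, headersonly=True)


def supports_byte_ranges(hdrs) -> bool:
    """True if the Accept-Ranges header advertises the 'bytes' unit."""
    value = hdrs.get("Accept-Ranges", "none")
    units = [unit.strip().lower() for unit in value.split(",")]
    return "bytes" in units


# Inclusive span of the range= parameter from the quoted URL.
first, last = 13, 2375679
span = last - first + 1
content_length = int(headers["Content-Length"])
```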

>
> And the real confusion again: you can see another similar URL with the
> same "/videoplayback?.*(id)", and here the ID comes at the end of the
> URL, and then the reply is just "Moved Temporarily". I must mention
> that this URL sends the FLV URL that Squid has already logged in
> access.log, and then it adds &ir=1&playretry=1 or pr=1&playretry,
> which means Squid would be confused into caching it twice (FLV).
>
> EXAMPLE:
> ---------------
>
> "http://o-o.preferred.orange-par1.v3.lscache3.c.youtube.com/videoplayback?sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Csource%2Calgorithm%2Cburst%2Cfactor%2Ccp&fexp=908525%2C910207%2C916201&algorithm=throttle
> -factor&itag=34&ip=84.0.0.0&burst=40&sver=3&signature=0489805DCC95F6EADBA9D43C3F
> D8C107FC768662.73AA6897FE78CF78BE7819E089F1A4FC47534C7D&source=youtube&expire=13
> 22344800&key=yt1&ipbits=8&factor=1.25&cp=U0hRR1NPUl9FSkNOMV9LSVZJOmdmQWdwWC01dlp
> n&id=283246f338ece5ad"
>
> HTTP/1.0 302 Moved Temporarily
> Last-Modified: Wed, 02 May 2007 10:26:10 GMT
> Date: Sat, 26 Nov 2011 15:50:47 GMT
> Expires: Sat, 26 Nov 2011 15:50:47 GMT
> Cache-Control: private, max-age=900
> Location: http://r9.orange-par2.c.youtube.com/videoplayback?sparams=id%2Cexpire%
> 2Cip%2Cipbits%2Citag%2Csource%2Calgorithm%2Cburst%2Cfactor%2Ccp&fexp=908525%2C91
> 0207%2C916201&algorithm=throttle-factor&itag=34&ip=84.0.0.0&burst=40&sver=3&sign
> ature=0489805DCC95F6EADBA9D43C3FD8C107FC768662.73AA6897FE78CF78BE7819E089F1A4FC4
> 7534C7D&source=youtube&expire=1322344800&key=yt1&ipbits=8&factor=1.25&cp=U0hRR1N
> PUl9FSkNOMV9LSVZJOmdmQWdwWC01dlpn&id=283246f338ece5ad&ir=1
> X-Content-Type-Options: nosniff
> Content-Type: text/html
> Server: gvs 1.0
> Age: 2068
> Content-Length: 0
> X-Cache: HIT from Peer6
> X-Cache-Lookup: HIT from Peer6:3128
> Connection: close

This is the 302 redirect Adrian and Chudy were discussing at the end of
the wiki page. If you cache it with storeurl_access reductions it will
redirect back to itself in an infinite loop.
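
A minimal Python sketch of why that loop happens. The store_key
reduction here is hypothetical (it keeps only id and itag) and the URLs
are shortened forms of the ones quoted above. The point is only that if
the original URL and the Location target reduce to the same key, a
cached 302 stored under that key gets served for its own target.

```python
from urllib.parse import urlsplit, parse_qs


def store_key(url):
    """Hypothetical reduction: ignore host and all parameters but id/itag."""
    query = parse_qs(urlsplit(url).query)
    return (query.get("id", [""])[0], query.get("itag", [""])[0])


def follow(url, cache, max_hops=5):
    """Follow cached 302 entries; return (hops, looped)."""
    hops = 0
    while hops < max_hops:
        entry = cache.get(store_key(url))
        if entry is None or entry[0] != 302:
            return hops, False
        url = entry[1]          # go to the Location target
        hops += 1
    return hops, True           # gave up: the redirect never resolves


# Shortened forms of the request URL and its Location target.
first_url = ("http://o-o.preferred.orange-par1.v3.lscache3.c.youtube.com"
             "/videoplayback?itag=34&id=283246f338ece5ad")
location = ("http://r9.orange-par2.c.youtube.com"
            "/videoplayback?itag=34&id=283246f338ece5ad&ir=1")

# The 302 gets cached under the reduced key of the first URL.
cache = {store_key(first_url): (302, location)}
result = follow(first_url, cache)
```

Because both URLs reduce to the same key, the lookup for the Location
target finds the same cached 302 again, and the client never reaches
the actual video.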

Amos
Received on Sun Nov 27 2011 - 02:02:30 MST
