Author Topic: 11/23/2015 16:45 Down again? UP 1322 28 Nov 2015!  (Read 190767 times)

Offline cockedandglocked

  • Full Member
  • ***
  • Posts: 152
  • Karma: +18/-48
Re: 11/23/2015 16:45 Down again?
« Reply #150 on: November 24, 2015, 08:45:19 AM »
A few questions that I know nobody here can answer:

-Why did they not have redundant hardware in place? Every self-respecting computer company has a plan (and spare hardware) in place to keep a hardware failure from taking out company operations. "they are trying to expedite replacement parts"... in other words, they only had 1 of whatever broke.

-If the storage disk array is what blew up, and they are "working to restore from backups", does that mean we lose hours, days, weeks, or months of data?

-Which IT guy there is getting canned?
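On the data-loss question: how much is lost generally comes down to when the last good backup finished, since anything written after it is gone unless it can be recovered some other way. A minimal Python sketch of that arithmetic, assuming hypothetical nightly 02:00 backups (nobody here knows the host's real schedule); the name `data_loss_window` is made up for illustration:

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def data_loss_window(backups: List[datetime],
                     failure: datetime) -> Optional[Tuple[datetime, timedelta]]:
    """Return (last backup taken before the failure, time span of writes lost).

    Returns None when no backup predates the failure, i.e. nothing to restore from.
    """
    usable = [b for b in backups if b <= failure]
    if not usable:
        return None
    last_good = max(usable)
    return last_good, failure - last_good


# Hypothetical schedule: nightly backups at 02:00 (an assumption, not the host's real setup).
nightly = [datetime(2015, 11, day, 2, 0) for day in (21, 22, 23)]
# Failure time taken from the 16:45 timestamp in the thread title.
print(data_loss_window(nightly, datetime(2015, 11, 23, 16, 45)))
```

Under those assumptions the restore point is the 02:00 backup on November 23, and about 14 hours 45 minutes of posts would be lost.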


I fully understand that hardware failures happen; I've experienced lots of them myself... but anyone who halfway knows what they're doing would never experience a catastrophic hardware failure like this. Redundant systems should be in place, so that if a hardware bank goes offline, catches fire, or whatever, the system kicks itself over to the redundant hardware and operations continue. Google, Amazon, Facebook, and many others almost never have visible downtime, and I'm pretty sure it isn't because their hardware never fails.
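The "kicks itself over to the redundant hardware" part can be sketched as a simple failover loop: try the primary, and on any error move on to the next standby. This is a minimal Python illustration with hypothetical backend names and handlers; real setups usually do this in a load balancer or cluster manager, and it only helps if the standby already holds a current copy of the data.

```python
from typing import Callable, Dict, List


class AllBackendsDown(Exception):
    """Raised when neither the primary nor any standby could serve the request."""


def serve_with_failover(order: List[str],
                        handlers: Dict[str, Callable[[str], str]],
                        request: str) -> str:
    """Try each backend in priority order and return the first successful response."""
    errors = []
    for name in order:
        try:
            return handlers[name](request)
        except Exception as exc:  # this backend failed: note why and fail over to the next
            errors.append(f"{name}: {exc}")
    raise AllBackendsDown("; ".join(errors))


# Hypothetical backends: the primary's storage is dead, the standby is healthy.
def primary(request: str) -> str:
    raise OSError("disk array offline")


def standby(request: str) -> str:
    return f"standby served {request}"


handlers = {"primary": primary, "standby": standby}
print(serve_with_failover(["primary", "standby"], handlers, "/index.php"))  # standby served /index.php
```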

Maybe time for CG to switch hosts?
« Last Edit: November 24, 2015, 08:48:02 AM by cockedandglocked »
I'm only here for the timeshare presentation

Offline Son

  • Full Member
  • ***
  • Posts: 176
  • Karma: +12/-45
  • Bite me
Re: 11/23/2015 16:45 Down again?
« Reply #151 on: November 24, 2015, 08:50:44 AM »
Quote from: cockedandglocked
A few questions that I know nobody here can answer:
<snip>
Maybe time for CG to switch hosts?
"I find your logic disturbing."  -Said the owners of the provider...


and (unrelated to this particular situation) every leftist ever.
Don't do it, you'll be sorry...ah hell, go ahead and do it, it'll be funny.
-me

http://img.izismile.com/img/img3/20100928/1000/funny_gif_collection_22.gif

Offline MZ

  • Newbie
  • *
  • Posts: 8
  • Karma: +1/-4
Re: 11/23/2015 16:45 Down again?
« Reply #152 on: November 24, 2015, 08:54:05 AM »
wow, I thought it was only my computer misbehaving again.........

Offline Piper Cub

  • Jr. Member
  • **
  • Posts: 67
  • Karma: +3/-35
  • Ye Shalt be SMITED
Re: 11/23/2015 16:45 Down again?
« Reply #153 on: November 24, 2015, 09:05:20 AM »
I'm glad it's really down and not just me! I thought it might have gotten blocked or that my IP had been banned. That happened to me on another forum: someone was spamming, and when the admin went to block the spammer, they made a typo in the IP address and I got blocked instead.

Offline grampz

  • Newbie
  • *
  • Posts: 21
  • Karma: +5/-8
Re: 11/23/2015 16:45 Down again?
« Reply #154 on: November 24, 2015, 09:16:41 AM »
Oh my. WTH... I'm IN!

Offline swalt

  • Full Member
  • ***
  • Posts: 164
  • Karma: +12/-29
Re: 11/23/2015 16:45 Down again?
« Reply #155 on: November 24, 2015, 09:18:12 AM »
Quote from: cockedandglocked
A few questions that I know nobody here can answer:
<snip>
Maybe time for CG to switch hosts?

If it's down for a few days, I can imagine their customers losing a good chunk of change.  How many lawsuits will follow?

Offline BAJ475

  • Newbie
  • *
  • Posts: 13
  • Karma: +0/-4
Re: 11/23/2015 16:45 Down again?
« Reply #156 on: November 24, 2015, 09:20:23 AM »
First, does anyone have an estimated time when .net will be back up?
Although I don't know exactly how it works, when I tried a DNS lookup on Calguns.net it could not return a numeric IP address.  So it appears to me the problem is more than just the servers Calguns.net runs on and extends to the DNS servers.  Anyone know more?
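For anyone curious how to repeat that check: a lookup that returns no address usually means either the name has no A record or the resolver could not get an answer from the domain's authoritative DNS servers. A minimal Python sketch using the standard library's `socket.gethostbyname_ex`; the function name `lookup_ipv4` and its injectable `resolver` parameter are hypothetical, added so the logic can be tried without network access:

```python
import socket
from typing import Callable, List, Tuple

# Same shape as socket.gethostbyname_ex's result: (canonical name, aliases, IPv4 addresses).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def lookup_ipv4(hostname: str, resolver: Resolver = socket.gethostbyname_ex) -> List[str]:
    """Return the IPv4 addresses `hostname` resolves to, or [] if the lookup fails."""
    try:
        _canonical, _aliases, addresses = resolver(hostname)
        return addresses
    except (socket.gaierror, socket.herror):
        # No usable answer: the name has no A record, or the resolver could not
        # reach (or got an error from) the domain's authoritative DNS servers.
        return []
```

Calling `lookup_ipv4("calguns.net")` during the outage would presumably have returned an empty list. An empty result alone does not say which cause applies; a tool such as `dig` shows the response status (for example NXDOMAIN versus SERVFAIL).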

Offline Red-Osier77

  • Sr. Member
  • ****
  • Posts: 334
  • Karma: +12/-83
  • karma:+69/-899
Re: 11/23/2015 16:45 Down again?
« Reply #157 on: November 24, 2015, 09:21:12 AM »
three days

Offline tuolumnejim

  • Jr. Member
  • **
  • Posts: 55
  • Karma: +12/-14
Re: 11/23/2015 16:45 Down again?
« Reply #158 on: November 24, 2015, 09:21:18 AM »
I bet they are waiting for Black Friday sales to pick up the new hardware at steep discount.
Newegg FTW!  ;D

Offline Librarian

  • Global Moderator
  • Jr. Member
  • *****
  • Posts: 79
  • Karma: +15/-13
  • Books!
Re: 11/23/2015 16:45 Down again?
« Reply #159 on: November 24, 2015, 09:30:10 AM »
Quote from: BAJ475
First, does anyone have an estimated time when .net will be back up?

Post 134 in this thread:
Quote
... we are expecting services to restore by roughly 6am PST, November 26th.
If I'm posting *here*, I'm not moving/locking/deleting threads *there*.

Just sayin'

Offline 1bulletBarney

  • Jr. Member
  • **
  • Posts: 59
  • Karma: +5/-39
Re: 11/23/2015 16:45 Down again?
« Reply #160 on: November 24, 2015, 09:31:49 AM »
I just checked ODOT (Oregon Dept of Transportation) and the site is down. My mom lives in Bend and I was thinking about going up there, so I wanted to check road conditions in real time. This outage may be affecting .gov sites and that would not be good...

Offline Wnick308

  • Full Member
  • ***
  • Posts: 119
  • Karma: +6/-99
  • We should all riot!
Re: 11/23/2015 16:45 Down again?
« Reply #161 on: November 24, 2015, 09:39:30 AM »
So with the main site down does that mean we should get out and go shooting? Or maybe just go outside? :o

Offline jctheguy

  • Newbie
  • *
  • Posts: 3
  • Karma: +0/-0
Re: 11/23/2015 16:45 Down again?
« Reply #162 on: November 24, 2015, 09:40:04 AM »
Frys?!?!  :-\

Offline ninety

  • Newbie
  • *
  • Posts: 11
  • Karma: +2/-4
Re: 11/23/2015 16:45 Down again?
« Reply #163 on: November 24, 2015, 09:42:02 AM »
So it's true?

I was smacking my router around, thinking it was a local issue. Then I thought to try some other sites.

Yes, I do this all day at work so my troubleshooting skills should be better, but I am getting older and forgetful.  :P

LoL Me too

Rebooted modem and router, released IP, checked settings, ran virus scan...
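The quickest way to tell "my connection" from "their server" is the one mentioned above: try some other sites. A minimal Python sketch of that reasoning, where the reachability results are assumed to come from whatever you checked by hand (browser, ping); the function name `diagnose` and the site names are hypothetical:

```python
from typing import Dict


def diagnose(reachable: Dict[str, bool], target: str) -> str:
    """Guess whether an outage is local or on the target site's side.

    `reachable` maps each site you tried to whether it loaded; every site
    other than `target` acts as a control.
    """
    if reachable.get(target, False):
        return "target is up"
    controls = [ok for site, ok in reachable.items() if site != target]
    if not controls:
        return "unknown: try a few unrelated sites first"
    if any(controls):
        return "target is down: other sites load fine"
    return "local problem: nothing else loads either"


print(diagnose({"calguns.net": False, "example.com": True, "wikipedia.org": True}, "calguns.net"))
# target is down: other sites load fine
```

Only when the control sites fail too is it worth rebooting the modem and router.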

Offline Red-Osier77

  • Sr. Member
  • ****
  • Posts: 334
  • Karma: +12/-83
  • karma:+69/-899
Re: 11/23/2015 16:45 Down again?
« Reply #164 on: November 24, 2015, 09:44:31 AM »
did obamacare sign ups start yesterday

Offline Son

  • Full Member
  • ***
  • Posts: 176
  • Karma: +12/-45
  • Bite me
Re: 11/23/2015 16:45 Down again?
« Reply #165 on: November 24, 2015, 09:55:19 AM »
Quote from: Wnick308
So with the main site down does that mean we should get out and go shooting? Or maybe just go outside? :o
Hitting the range tonight for the regular CG Santa Clara monthly meetup.  6 PM.  If you're local, c'mon down.  If you're not local, c'mon down!
:)

Offline Killer Bee

  • Newbie
  • *
  • Posts: 16
  • Karma: +1/-0
Re: 11/23/2015 16:45 Down again?
« Reply #166 on: November 24, 2015, 10:06:07 AM »
Quote from: BAJ475
First, does anyone have an estimated time when .net will be back up?

TWO WEEKS!!!
not affiliated with Killer Bee on calguns.net

Offline Killer Bee

  • Newbie
  • *
  • Posts: 16
  • Karma: +1/-0
Re: 11/23/2015 16:45 Down again?
« Reply #167 on: November 24, 2015, 10:08:10 AM »
Quote from: cockedandglocked
<snip>
-If the storage disk array is what blew up, and they are "working to restore from backups", does that mean we lose hours, days, weeks, or months of data?
<snip>

no worries, NSA has a copy of everything.. maybe they'll share?

Offline edwardm

  • Newbie
  • *
  • Posts: 1
  • Karma: +0/-0
Re: 11/23/2015 16:45 Down again?
« Reply #168 on: November 24, 2015, 10:18:52 AM »
I'm with you.  Problems and their consequences are often not entirely avoidable.  But you can mitigate outages by preparing (and spending money) for outages. 

Assuming they're using a SAN from a vendor like NetApp, IBM, EMC, or Hitachi, they should have a support contract.  FFS, when just a disk dies on one of my SANs, the vendor is willing to send a warm body to swap the SOB.  And a good support contract also means parts are on the way before you even have to ask.  If it really hit the fan, there would be a team of people driving gear to one of my facilities, pulling it out of labs at their corporate HQ, or otherwise doing whatever they had to for a return to service. 

With the nature of the outage (customer data, but also their DNS servers appear to be offline, and thus no mailhost/MX records), it looks like they stored everything on the SAN (bad move #1), had no secondary site (bad move #2), have no disaster recovery (DR) plan (stupid stupid move #1), and basically put zero thought into recovering from a worst-case scenario. 

I'd still put $5 on "everything lives in virtual machines and every virtual machine lives on one SAN".  Derp. 
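The "every virtual machine lives on one SAN" guess is the classic single point of failure. One way to spot it on paper: list what each service depends on and intersect the lists; anything common to all of them can take everything down at once. A minimal Python sketch with a hypothetical layout (nobody here knows the host's actual architecture); the function name is made up for illustration:

```python
from typing import Dict, Set


def total_outage_components(deps: Dict[str, Set[str]]) -> Set[str]:
    """Components that every service depends on: one failure there stops all services."""
    if not deps:
        return set()
    return set.intersection(*deps.values())


# Hypothetical layout matching the guess above: separate VM hosts, one shared SAN.
layout = {
    "forum": {"vm-host-1", "san-1"},
    "dns": {"vm-host-2", "san-1"},
    "mail": {"vm-host-2", "san-1"},
}
print(sorted(total_outage_components(layout)))  # ['san-1']
```

This only finds components shared by every service; power, network links, and the building itself belong in the same lists, and a secondary site is what makes the intersection come out empty.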

Quote from: cockedandglocked
A few questions that I know nobody here can answer:
<snip>
Maybe time for CG to switch hosts?

Offline FP562

  • Full Member
  • ***
  • Posts: 161
  • Karma: +17/-81
Re: 11/23/2015 16:45 Down again?
« Reply #169 on: November 24, 2015, 10:33:07 AM »
Wow, Nov 26? That is a pretty bad hardware failure.

Offline AR15barrels

  • Sr. Member
  • ****
  • Posts: 1721
  • Karma: +191/-75
  • AR-15 Guru
    • ar15barrels.com
Re: 11/23/2015 16:45 Down again?
« Reply #170 on: November 24, 2015, 10:37:36 AM »
It was all the micro aggression in Randall banned thread...I know it.

And I didn't even get to see it...
I got a couple text messages about it while sitting on the beach in Mexico (thank you, Sprint, for free coverage in Mexico now) and I responded back to my guys to go ahead and take the site down since I didn't need it for a few days.
I should have asked them to screen grab that thread and text it to me before they did...
I didn't really enjoy taking showers with other men though I have done it
I move a lot...and drool....

Offline AR15barrels

  • Sr. Member
  • ****
  • Posts: 1721
  • Karma: +191/-75
  • AR-15 Guru
    • ar15barrels.com
Re: 11/23/2015 16:45 Down again?
« Reply #171 on: November 24, 2015, 10:40:20 AM »
and already five posts! Post whore  ;D

You newbies don't know what post whoring is...

Offline AR15barrels

  • Sr. Member
  • ****
  • Posts: 1721
  • Karma: +191/-75
  • AR-15 Guru
    • ar15barrels.com
Re: 11/23/2015 16:45 Down again?
« Reply #172 on: November 24, 2015, 10:42:15 AM »
is randall banned from here


Not yet, but I can try!

Offline LowThudd

  • Jr. Member
  • **
  • Posts: 98
  • Karma: +9/-37
Re: 11/23/2015 16:45 Down again?
« Reply #173 on: November 24, 2015, 10:43:45 AM »
So Mexico is your alibi on the server going down? Good one!

Offline Red-Osier77

  • Sr. Member
  • ****
  • Posts: 334
  • Karma: +12/-83
  • karma:+69/-899
Re: 11/23/2015 16:45 Down again?
« Reply #174 on: November 24, 2015, 10:45:22 AM »

Offline Pinto

  • Full Member
  • ***
  • Posts: 112
  • Karma: +4/-26
Re: 11/23/2015 16:45 Down again?
« Reply #175 on: November 24, 2015, 10:49:18 AM »
Quote from: LowThudd
So Mexico is your alibi on the server going down? Good one!

No, I think he said being banned was the reason he ordered the site "taken down". The Mexico part is just puffing.

Offline readysetgo

  • Full Member
  • ***
  • Posts: 113
  • Karma: +6/-38
  • [applaud] [smite]
Re: 11/23/2015 16:45 Down again?
« Reply #176 on: November 24, 2015, 10:50:57 AM »
Quote from: AR15barrels
<snip>
I should have asked them to screen grab that thread and text it to me before they did...
I knew it!  Well...might as well come clean before you see it...I was advocating in favor of the ban (more or less)...catching a lot of flak...you're a popular guy, why (?), who knows... :D  :P
There is only ONE calguns.org

Offline josey wales

  • Jr. Member
  • **
  • Posts: 72
  • Karma: +1/-9
  • Here only cause .Net is down
    • View Profile
Re: 11/23/2015 16:45 Down again?
« Reply #177 on: November 24, 2015, 10:51:22 AM »
Wonder if Melvino is behind this.  His computer was always malfunctioning, my friend.

Offline skyhawk

  • Jr. Member
  • **
  • Posts: 58
  • Karma: +5/-18
Re: 11/23/2015 16:45 Down again?
« Reply #178 on: November 24, 2015, 11:07:20 AM »
Quote from: edwardm
<snip>
With the nature of the outage (customer data, but also their DNS servers appear to be offline, and thus no mailhost/MX records), it looks like they stored everything on the SAN (bad move #1), had no secondary site (bad move #2), have no disaster recovery (DR) plan (stupid stupid move #1), and basically put zero thought into recovering from a worst-case scenario. 

I'd still put $5 on "everything lives in virtual machines and every virtual machine lives on one SAN".  Derp. 

I'd agree that nearly everything is probably running from VMs and shared storage. But they alluded to a cascading hardware failure and also to having to do restores.  It could be that their chassis cooling failed, causing several disks to die off, more than the RAID level they use could tolerate, which would necessitate a complete restore after they get equipment spares. It could have been the data center cooling too.  I'm just spitballing based on the limited info we have.  If it was just a SAN switch, a restore would not be required.
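To put numbers on the "more than the RAID level could tolerate" scenario: each level is only guaranteed to survive a fixed number of simultaneous disk failures, and past that the array has to be rebuilt from backups. A minimal Python sketch of the worst-case figures, assuming RAID 10 is built from two-way mirrors; the function names are hypothetical:

```python
# Minimum disk counts for each level covered here.
MIN_DISKS = {"RAID0": 2, "RAID1": 2, "RAID5": 3, "RAID6": 4, "RAID10": 4}

# Worst-case failures survived for the fixed-tolerance levels.
FIXED_TOLERANCE = {
    "RAID0": 0,   # striping only, no redundancy
    "RAID5": 1,   # single parity
    "RAID6": 2,   # double parity
    "RAID10": 1,  # two failures in the same mirror pair lose data
}


def guaranteed_tolerance(level: str, disks: int) -> int:
    """Worst-case number of simultaneous disk failures the array is guaranteed to survive."""
    if level not in MIN_DISKS:
        raise ValueError(f"unknown RAID level: {level}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"{level} needs at least {MIN_DISKS[level]} disks")
    if level == "RAID1":
        return disks - 1  # every disk holds a full copy
    return FIXED_TOLERANCE[level]


def needs_full_restore(level: str, disks: int, failed: int) -> bool:
    """True when more disks failed than the level can be guaranteed to survive."""
    return failed > guaranteed_tolerance(level, disks)


print(needs_full_restore("RAID5", 8, 2))  # True
print(needs_full_restore("RAID6", 8, 2))  # False
```

RAID 10 can survive more failures if they land in different mirror pairs, but that is luck, not a guarantee; and a cooling failure that takes out several disks at once can exceed any of these numbers.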

I'm a believer in having your DNS hosted by a third party.  At least that way, when things go tango uniform in your data center, you can quickly point A records at a backup site to at least give clients status updates. This is a lot faster than updating your DNS servers at the registrar and waiting for propagation just to give status updates or receive mail.  It also allows you to have a lower-priority MX record always published and pointing at a backup store-and-forward mail service, to be used when you have problems like this.
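The backup-MX idea works because sending mail servers try MX hosts in order of preference value, lowest number first, and only fall back to higher numbers when the preferred host won't accept the connection. A minimal Python sketch of that selection, with hypothetical hostnames and a stand-in `try_host` check in place of a real SMTP connection; real senders also shuffle hosts that share the same preference, which this skips:

```python
from typing import Callable, List, Optional, Tuple


def pick_mx(mx_records: List[Tuple[int, str]],
            try_host: Callable[[str], bool]) -> Optional[str]:
    """Return the first MX host, in ascending preference order, that accepts the message.

    Returns None when every host refuses; a real sender would queue and retry later.
    """
    for _preference, host in sorted(mx_records):
        if try_host(host):
            return host
    return None


# Hypothetical records: primary mail host at preference 10, backup store-and-forward at 50.
records = [(50, "backup-mx.example.net"), (10, "mail.example.com")]
down = {"mail.example.com"}  # the primary data center is offline
print(pick_mx(records, lambda host: host not in down))  # backup-mx.example.net
```

The backup host just holds the queued mail and forwards it to the preference-10 host once that host is reachable again.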

They should also revive their Twitter account; it's a great way to give status updates.

Offline Ubermcoupe

  • Jr. Member
  • **
  • Posts: 70
  • Karma: +11/-22
Re: 11/23/2015 16:45 Down again?
« Reply #179 on: November 24, 2015, 11:23:31 AM »
First, does anyone have an estimated time when .net will be back up?

Post 134 in this thread:
Quote
... we are expecting services to restore by roughly 6am PST, November 26th.

Dang - seems so far off.

The Great-Great CGN outage of 2015.
The information contained in this document is CONFIDENTIAL and LEGALLY PRIVILEGED, intended only for the recipient(s) named above. If the reader of this message is not the intended recipient, you are notified that any use, copying, disclosure, retention or distribution is unlawful and illegal.