事情起因
今天下午偶然发现美国cPanel服务器无法连接数据库,登录服务器发现以下错误:
INFO: rcu_sched self-detected stall on CPU
0: (288016 ticks this GP) idle=ba5/140000000000001/0 softirq=596083517/596083517
INFO: rcu_sched self-detected stall on CPU
2: (288017 ticks this GP) idle=e99/140000000000001/0 softirq=408174935/408174935
然后设法重启后,起初发现似乎一切正常,但是晚点就发现mysql无法启动,始终返回以下错误:
Starting MySQL. ERROR! The server quit without updating PID file
处理过程
至此感觉到了问题的严重性,尝试了各种修复方案,发现仍然无法再次启动mysql,于是紧急联系了cPanel官方的技术支持团队,通过提交 Request Emergency Assistance 与对方官方技术支持工程师直接取得了电话联系,对方工程师紧急介入技术协助,通过查看给出以下解决方案,顺利解决了此次的故障:
Hello,
We just spoke on the phone a moment ago. I am Melanie and I will be assisting you with your concerns with MySQL.
The concern appears to be due to InnoDB being unable to start:
150223 17:59:16 InnoDB: highest supported file format is Barracuda.
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
150223 17:59:16 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files…
InnoDB: Error: trying to add tablespace 210 of name ‘./munin_innodb/sample_table.ibd’
InnoDB: to the tablespace memory cache, but tablespace
InnoDB: 210 of name ‘./eaqlight_com/wp_itsec_log.ibd’ already exists in the tablespace
InnoDB: memory cache!
MySQL attempts to repair the issue automatically, however is unable to do so as the table space exists in the memory cache.
I have started the MySQL service by disabling InnoDB by adding the following to /etc/my.cnf:
skip-innodb
default_storage_engine=MYISAM
This will disable databases that use InnoDB such a Roundcube and Horde until the InnoDB table structure can be repaired. While cPanel does not preform database or InnoDB recovery services, we do have a detailed guide on the forums that may help you in addressing the concern:
I hope this information helps. Please do advise if you have any further questions or concerns.
Best Regards,
—
Melanie Littleton
cPanel, Inc.
Technical Analyst
通过cPanel官方工程师的回复可以看出来此次故障源于这台服务器上mysql服务里边的InnoDB引擎Corruption导致的,mysql服务有两种常见的引擎,一种是InnoDB,一般使用在较大型的网站数据库系统,比如magento之类的,另外一种最常见的引擎是MYISAM,这种引擎通用于其他所有轻型的数据库系统,比如我们最常用的WordPress等,至此问题已经找到了,此次的InnoDB引擎Corruption是由于这台服务器上的一个Magento的网站过大的资源消耗间接导致了InnoDB引擎的Corruption,我们彻底对该站点进行了迁移处理,确保该服务器只运行MYISAM引擎,同时紧急要求技术工程师对这台服务器网站进行排查,确保所有站点一律不会采用InnoDB引擎。
针对此次故障,我们技术工程师虽然第一时间进行了处理,但仍然导致网站中断访问了4个小时左右的时间,针对此次的中断我们向广大客户表示诚挚的歉意,并且会依据公司相关SLA和TOS等协议进行相应的补偿,同时我们郑重承诺,虽然无法确保服务器的可用性达到100%(就算世界排名第一的主机商承诺的uptime也只有99.9%也允许一年内有几十个小时的中断和维护的时间),但我们可以保证在出现任何一种问题的时候能够第一时间响应,第一时间进行处理,确保所有网站数据的安全性和稳定性。
隽永东方技术支持团队敬上
2015年2月13日晚19:23