Net::Twitter 3.x and Moose performance

Posted by Dave Sherohman

Marc Mims has recently taken over maintainership of Net::Twitter and rebuilt it on top of Moose. This has been a cause of much concern for me, as it does not appear that the non-Moose Net::Twitter::Lite will meet my needs. Yesterday's Net::Twitter Roadmap post proved quite reassuring in that regard, but did not fully address my concerns. As Marc's blog does not appear to support comments, I am posting my thoughts on it here.

First off, huge kudos to Marc for doing The Right Thing by porting it to Moose while also choosing to maintain a non-Moose version for those who, for whatever reason, aren't (yet) in a position to use Moose. While it surely means additional work for the package maintainer, it seems likely to be the most effective means of promoting Modern Perl without imposing unnecessary dependencies on others.

As a side note, it's nice to see him mention that he's considering turning Twitter's responses into actual Perl objects instead of bare hashrefs. I definitely vote in favor of that idea; it's something I had been planning to write when I found the time, either as a patch to Net::Twitter itself or as a new package that would bless Net::Twitter's hashrefs into appropriate classes.
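A minimal sketch of the "bless the hashrefs" approach, assuming the status and user hashrefs carry fields named id, text, created_at, user, screen_name and name (modeled on Twitter's JSON; verify against what your version of Net::Twitter actually returns). The My::Twitter::* class names are hypothetical; this is not Net::Twitter's API.

```perl
use strict;
use warnings;

# Hypothetical wrapper classes for the plain hashrefs Net::Twitter returns.
# The field names below are assumptions based on Twitter's JSON.
package My::Twitter::User;

# Bless an existing hashref in place, so code expecting a plain hash still works.
sub wrap { my ($class, $data) = @_; return bless $data, $class }

for my $field (qw(id screen_name name)) {
    no strict 'refs';    # needed to install accessors by name
    *{ __PACKAGE__ . "::$field" } = sub { $_[0]{$field} };
}

package My::Twitter::Status;

sub wrap { my ($class, $data) = @_; return bless $data, $class }

for my $field (qw(id text created_at)) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$field" } = sub { $_[0]{$field} };
}

# The embedded user hashref is wrapped on demand.
sub user { My::Twitter::User->wrap($_[0]{user}) }

package main;
```

Blessing in place rather than copying means existing code that treats a response as a plain hash keeps working, while new code can switch to accessors gradually.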

Now, on to my concerns:

My particular situation is that I am using Net::Twitter (at 2.12 for now) in a web application that currently runs as CGI[1][2] and which, 99% of the time, does not interact with Twitter directly. The bulk of its Twitter interaction takes place through a cron job, but users will also be able to post status updates through the application (not yet implemented). Twitter authentication will be handled through OAuth, both for the users' convenience and because I don't want to take responsibility for securely storing other people's credentials for a third-party site.

This, then, is the source of my concerns about Net::Twitter 3. On the one hand, everything I've heard about Moose says it's slow to load, which is bad news for CGI performance. On the other, Net::Twitter::Lite doesn't support OAuth.

In actual practice, I suppose it doesn't matter that much: Net::Twitter will only be needed for the tiny fraction of requests in which users post status updates, so it can simply be required in the relevant section of code, avoiding the overhead of loading Moose unless it's actually going to be used. Worrying too much about it probably falls under "premature optimization".

Still, this does touch on my primary reason for avoiding Moose so far: the bulk of my Perl development tends to be CGI-based[1], and I don't want to pay the overhead of loading Moose on every request received. While I haven't benchmarked it myself, the simple fact that someone took the time to create Mouse specifically to get quicker startup seems a strong argument that it's a real issue.
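For anyone who does want numbers, a small script can repeat the kind of timing shown later in the comments (time perl -MMoose -e1), averaged over several runs. This is a sketch using only core Perl and Time::HiRes; the subroutine names are hypothetical, and results depend heavily on the machine and on whether the modules are already in the disk cache.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Average wall-clock seconds to start perl and exit immediately, optionally
# loading one module first. "-mModule" means `use Module ();`, so import()
# is not called (which also avoids Moose's "does not export its sugar" note).
sub startup_time {
    my ($module, $runs) = @_;
    $runs ||= 5;
    my @cmd = ($^X, (defined $module ? "-m$module" : ()), '-e1');
    my $total = 0;
    for (1 .. $runs) {
        my $start = time();
        system(@cmd) == 0 or die "could not run @cmd (status $?)\n";
        $total += time() - $start;
    }
    return $total / $runs;
}

# Extra seconds per process attributable to loading $module
# (may come out slightly negative for very cheap modules, due to noise).
sub load_overhead {
    my ($module, $runs) = @_;
    return startup_time($module, $runs) - startup_time(undef, $runs);
}

# Example:
#   printf "%-8s %.3fs\n", $_, load_overhead($_, 10) for qw(Moose Mouse);
```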

There's definitely a lot of goodness in Moose, though, and stvn has said on PerlMonks that improving startup time is a current focus of Moose development, even going so far as to discourage the use of Mouse unless you really need it.

What are others currently doing with respect to Moose and CGI-based[1] development? Do you avoid it to maintain acceptable performance, or is it not actually worth worrying about?


[1] "CGI-based" as in "CGI is the means of communicating with the web server process", not necessarily "use CGI;". Catalyst or Mason apps can also be "CGI-based" as I'm using the term.

[2] It currently communicates with apache via CGI for ease of development. I do have full control of the server and will move it over to mod_perl once it stabilizes a bit more, which will make Moose startup time a moot point, but I'd prefer to do that later rather than sooner.

Moose startup

Posted by Anonymous

The eventual plan is to precompile startup so that, unless you access $obj->meta at some point *after* startup, Moose doesn't have to be loaded into your process at all.

Unfortunately, paying work has kept me from finishing this so far, but I'll be at YAPC::NA next week and am hoping to sit down with nperez, hack on it a bit, and see if he fancies helping, since code very similar to what I need for the precompilation should be rather useful for some of the code he's working on.

So, we'll see ...

Posted by Anonymous

One technique for improving Moose startup time is to load Moose classes lazily.

That is, load them only in the subroutine that needs them (via require, *NOT* use). This works especially well with lazy_load 1. If a given invocation doesn't need a particular part of your application, you don't pay the loading penalty for it.
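The difference between use and require is visible in %INC: use loads a module at compile time on every run, while a require inside a subroutine loads it only when that subroutine first executes. In the sketch below, the core module Text::Wrap stands in for Net::Twitter (or any Moose-based class) so that it runs anywhere; show_page, post_status and module_loaded are hypothetical names.

```perl
use strict;
use warnings;

# Runs on every request and never touches the heavy module.
sub show_page {
    my ($name) = @_;
    return "Hello, $name";
}

# Runs rarely. The module is compiled the first time this sub is called;
# with `use Text::Wrap;` at the top of the file instead, every request
# would pay that cost whether or not it ever reached this code.
sub post_status {
    my ($text) = @_;
    require Text::Wrap;    # stand-in for `require Net::Twitter;`
    return Text::Wrap::wrap('', '', $text);
}

# %INC records every file loaded via use/require.
sub module_loaded { exists $INC{'Text/Wrap.pm'} }
```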

This is very general advice; I'm not sure whether it applies to Net::Twitter, or whether Net::Twitter even needs it.

Also, why mod_perl? FastCGI (via mod_fastcgi) looks to be the way to go these days.

Lazy loading

Posted by Dave Sherohman

Lazy loading of Net::Twitter would definitely work and is probably what I'll end up doing, as I mentioned in the "In actual practice..." paragraph of the original post. I don't see it as a general solution for using Moose as the basis of my core domain objects, though: at least one such class will be needed to process any request (if nothing else, most web apps need to get at the user's preferences by way of a User object), so the overhead of Moose itself will be there even if you minimize the number of Moose-based classes being built.

Or have I misunderstood the performance issues with Moose and the slow part is building the Moosey classes rather than loading up Moose itself?

As for why mod_perl, I've always gotten the impression that FCGI was largely just a "poor man's mod_perl" for use in shared hosting environments or other scenarios where mod_perl isn't available, so I've never looked at it that closely. Aside from simpler installation of individual programs, what are the main advantages of FCGI over mod_perl? A quick Google search didn't turn up any other major differences, but everything I found was over three years old, so things may have changed significantly since then.

Posted by Anonymous

Roughly, the difference is that mod_perl gives you full access to the Apache internals. That's awesome, but it comes at a price: being tightly bound to Apache. Your Perl application runs as part of the Apache process, which means that upgrades and restarts require bouncing Apache. FastCGI apps run as a separate daemon and thus aren't bound to Apache at all. The cost is that you don't have (much) control over the request cycle. However, you gain the ability to restart the FCGI back-end independently of the front-end. You also gain the ability to move away from Apache if you so desire; despite having used mod_perl *heavily* in the past, I've found myself using nginx and FastCGI exclusively these days. If you're developing in Catalyst, there is almost no difference in deployment between mod_perl and FastCGI.

Posted by Anonymous

I just remembered this point too: the recommended mod_perl setup is to have a small Apache on the front-end and a mod_perl-enabled Apache "app server" on the back-end. FastCGI works this way by default, because the back-end processes are separate. So rather than thinking of FastCGI as a crappy version of mod_perl, I think of it as a crappy version of running a dedicated HTTP server (think Mongrel for Rails, or a mod_perl app server).

mod_perl

Posted by Anonymous

There are modules for mod_perl that let you write your programs much like CGI ones. The advantage is that you don't need a separate FastCGI process per program. When a FastCGI program is run by Apache (the most convenient setup), restarting it still requires restarting Apache. For mod_perl there are modules that allow reloading code, or you can even tell mod_perl to remove your program from memory, so that it will be loaded again on the next request.

mod_perl

Posted by Anonymous

If you're running your application under Apache::Registry (to write it like CGI), then in my opinion you have an abstraction failure; writing new code like this is silly!

FastCGI processes run by Apache can quite happily be killed, and Apache will restart them on the next hit. Yes, if you kill an active one, you're going to send a user a broken page, but avoiding that (by using SIGUSR1 or similar to flag 'die at end of hit' versus 'die now') is fairly easy to arrange. Fully restarting mod_perl, by contrast, is going to give you user-visible downtime either way.

That said, I don't think having Apache spin up your FastCGI processes for you is 'most convenient' at all. If you control the application process lifecycle yourself, you can happily spin up a second instance of the same application against the same socket, with additional debugging turned on, NYTProf loaded, or anything else you like. This instance will get a share of your application's hits alongside the pre-existing instance. This level of control is extremely useful.

With regard to reloading your application code, I don't think any of the available mod_perl solutions will be found to work well for non-trivial applications, and using any of them means you're going to get exactly zero memory sharing: if you have 10 back-end application server processes, your memory use is going to be at least 10 times the size of a perl interpreter running your application. Even if you don't use code reloading at all and load all your code in a Perl startup handler, the tricks mod_perl uses to clone the interpreter tend to churn the heap so that you don't get anywhere near the theoretical amount of shared memory. In my experience, FastCGI, with its simpler plain-forking model, does a lot better at this.
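The "die at end of hit" arrangement mentioned above can be sketched in plain Perl: the signal handler only records the request to stop, and the request loop checks that flag between requests, so the page in flight is finished first. next_request and handle are hypothetical callbacks standing in for your FastCGI library's accept and respond calls, and USR1 is simply the signal suggested in the comment.

```perl
use strict;
use warnings;

# Serve requests until SIGUSR1 asks us to stop, without abandoning a request
# half-way: the signal handler only sets a flag, which is checked *between*
# requests. next_request/handle are hypothetical stand-ins for your FastCGI
# library's accept and respond calls.
sub serve {
    my (%args) = @_;
    my ($next_request, $handle) = @args{qw(next_request handle)};
    my $exit_after_request = 0;
    local $SIG{USR1} = sub { $exit_after_request = 1 };    # "die at end of hit"

    my $served = 0;
    while (defined(my $req = $next_request->())) {
        $handle->($req);
        $served++;
        last if $exit_after_request;    # finish this hit, then stop
    }
    return $served;    # caller exits; the process can be respawned cleanly
}
```

Assuming the caller exits once serve returns, kill -USR1 on the process lets the current request complete before it goes away, and the process manager starts a fresh one on the next hit.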

Posted by Anonymous

>Or have I misunderstood the performance issues with Moose and the slow part is building the Moosey classes rather than loading up Moose itself?

Yes: "use Moose;" on its own is negligible. That is, simply including Moose in your code won't have much of an impact. If you're doing lots of Moosey stuff for many, many classes, then you may notice an impact. But you only pay for what you use (which is why I recommended lazy loading for larger applications; I've only really noticed this problem with a command-line application I'm writing).

That being said, I don't want to dissuade anyone from using Moose. The performance stuff is manageable, and not really even an issue in a persistent environment. The "force multiplier" of Moose in regards to developer time, maintenance time, and code clarity is worth every penny.

---

As for mod_perl, I'd turn the question around and ask: why would you want to couple your web server and application so tightly? Some downsides of coupling:

* Your application now runs with the same user/group as Apache. Do you want this?
* You need to worry about @INC management... in Apache (ugh)
* Apache and application upgrades are now coupled: you can't upgrade one without interrupting the other.
* Every mod_perl application you run in Apache now shares the same perl interpreter (this may not be a big deal with a dedicated setup, but I tend to develop and play with several things simultaneously).

FastCGI keeps the applicationing in the application and the web serving in the web server. It is a step towards the ideal: dedicated services communicating over well-defined interfaces.

Posted by Dave Sherohman

Thanks for the clarification regarding Moose.

As to mod_perl, none of the potential issues you mentioned has ever given me problems in my previous mod_perl projects, aside from the need to run apache2ctl graceful before app updates will (reliably) take effect. Perhaps this is because I was writing apache modules in C long before I first touched CGI programming, so my approach to mod_perl is primarily "I'm writing an apache module in Perl" rather than "I have a CGI program that I want to make persistent". When I first started doing web work in Perl, I even began by writing for mod_perl and only later got around to trying out CGI.

To answer more directly why I'd want to use mod_perl, it's what I know and, as I mentioned earlier, I've generally seen FastCGI presented as something you can use when you want mod_perl-like performance but can't use mod_perl for whatever reason. This presentation implicitly paints it as a second choice, inferior to the real mod_perl. Since I already know mod_perl and have never run into a situation where vanilla CGI is too slow and mod_perl isn't available to me, I've never had cause to look into FastCGI.

Posted by Anonymous

In the interest of honesty, "use Moose" does incur a penalty (currently):

[23:06:02] Alice-3@~ $time perl -e1

real 0m0.408s
user 0m0.002s
sys 0m0.011s
[13:53:01] Alice-3@~ $time perl -MMoose -e1
Moose does not export its sugar to the 'main' package.

real 0m1.138s
user 0m0.255s
sys 0m0.059s

Moose does a fair amount of bootstrapping. Also, I think I should point out (since I'm doing the clarity thing here) that Mouse was written by Shawn Moore as a way to learn how a MOP was put together. It was incidentally faster because of design choices he made, and it was adopted as the official "tiny Moose" until it became obvious that people wanted a "tiny Moose" to do everything the Real Moose (tm) does, but somehow magically faster. That is a no-win game, and Shawn has gone on to recommend that people not use Mouse except in the specific situations we tend to outline (hard startup requirements, hard dependency requirements). He would rather see optimization efforts put into Moose, because he (and I, and many, many others) believe it's the better solution.

Finally, in the interests of disclosure, I am an active Moose developer and have worked on and with Mouse.

Cron? CGI?

Posted by Anonymous

I think others have addressed the "make Moose fast" question well enough.

My question is: why are you using CGI and cron jobs? It seems like the IPC and performance limitations are not only a concern, but also a lot of work.

It smells to me like you have a web handler, a message queue, and a batch message processor.

If you use an event oriented application framework (such as POE or AnyEvent) this stuff should be much simpler, and have improved performance (no startup cost, lower latency in message handling, etc).

You could easily delegate from a web server to the event application using FCGI or HTTP proxying. Personally I don't see this as a poor man's mod_perl; rather, I see mod_perl as a non-portable FCGI that comes with an extension language for Apache that most people will never need. FCGI is a much simpler and more universal architecture, and it lets you easily sandbox your applications, too.

Posted by Dave Sherohman

It's not a "pure" web application. cron is being used to keep tabs on the outside world and gather data asynchronously which will ultimately be displayed by the web interface. The cron tasks are not triggered by or directly related to user activity. I can see why the vague description in the original post could give the impression that the web interface is collecting requests which are being queued up for cron to execute them, but that's not what I'm doing.

Do you have a link to a good summary of FCGI's benefits over mod_perl? As I mentioned earlier, a quick googling only turned up rather old information which painted them as being roughly equivalent.

FastCGI vs. mod_perl

Posted by Anonymous

All three of Chris, Robert, and Yuval have given a pretty good summary of the benefits of FastCGI. Can you be more specific about what you're looking for that they didn't answer? I'm having a hard time imagining something that would be in a summary that isn't already presented above.

Posted by Dave Sherohman

Rereading their comments, the main point I see is that FCGI isn't bound to apache - you can use it with other servers (I've never personally encountered any reason to use a not-apache web server), it can run as a user other than apache's user (www-data is already about as non-privileged as you can get; why would I want to increase a web process's access to the system?), etc. While I can definitely imagine situations in which mod_perl being tied to apache would be a disadvantage, I've been using it on and off for seven years without running into any of them, so I'm not terribly worried about hitting one in the foreseeable future.

What I'm looking for but not seeing, then, would be an advantage that isn't easily reduced to "the perl interpreter exists outside of apache". Practical benefits, not just the semi-abstract "it's decoupled" - is it faster, more stable (I've never had stability problems with mod_perl, but others may have), able to handle more concurrent requests, etc.?

At this point, I'm not sold on it, but I have added "look into FCGI more closely" to my "when I have time" queue, if only for the sake of being able to update my source and immediately test it without needing to remember to take an additional step (apache2ctl graceful) first. Now it's just a question of whether FCGI should be promoted to the "make time for this" list.

mod_perl...

Posted by Anonymous

Instead of praising FCGI, here's what's wrong with mod_perl:

  • Requires linkage-level coupling with Apache, which means you either trust your vendor's Perl, or build your own Apache, or are likely to break things when you upgrade.
  • Has limitations imposed by the fact that there is just one Perl interpreter. With separate FCGI processes it's not just a different user: each app has its own loaded modules (potentially different versions of them) and its own global state, for the things that assume there is global state. Your apps don't get into each other's hair.
  • Non-standard. The standard is the implementation (except that there are two, and the fact that 1.3 even exists was still a pain for 2.0 users even recently). FCGI won't change any time soon, and you can use the same and the same setup.
  • If ithreads are involved, you take a performance hit (and other drawbacks) imposed by Apache's architecture, which should be irrelevant and indeed are irrelevant under FCGI.

The only clear advantage of mod_perl is if you need to write custom apache level authentication handlers or things like that. But really IMHO coupling app level behavior so tightly to the webserver is just... wrong. I can see how one would need it (e.g. NTLM auth for your intranet apps on a windows network), but that's still not something to aspire to.

Regarding the points that you make:

www-data is not as non-privileged as you can get for your app. It still has access to all other apps, all of their data, and all the data in the web root. If any of that data is private, an exploit in your app will allow access to that data. If your application is sandboxed from the web server and other apps, only its own data is compromised.

Just because you are used to something that works doesn't mean it can't be simpler. mod_perl is not the devil, it's simply more complicated than it needs to be.

There is also no additional step to test your app. Restarting your app has nothing to do with restarting the web server; the server can continue to serve static data and all other apps while you restart only your FCGI handler.

NTLM or web server auth is a bad example.

Posted by Anonymous

>The only clear advantage of mod_perl is if you need to write custom apache level authentication handlers or things like that. But really IMHO coupling app level behavior so tightly to the webserver is just... wrong. I can see how one would need it (e.g. NTLM auth for your intranet apps on a windows network), but that's still not something to aspire to.

There is totally some valid stuff to do in mod_perl, for example Apache-SMTP (yes, that's crazy, but a good example of what you can do when close to the metal).

NTLM auth or client SSL certs are, IMO, a bad example, as you can trivially deal with them through environment variables; these are the specific use cases for Catalyst's remote credential module.
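As a sketch of what "dealing with this through environment variables" looks like: when the web server has already authenticated the user, it hands the result to the application through the CGI/FastCGI environment. REMOTE_USER is the standard CGI variable; SSL_CLIENT_VERIFY and SSL_CLIENT_S_DN are the names Apache's mod_ssl uses when SSLOptions +StdEnvVars is enabled, so check the names your own server provides. The remote_identity function is hypothetical and is not the Catalyst module's API.

```perl
use strict;
use warnings;

# Return the identity the web server has already established, or undef.
# The application never sees passwords or certificates, only the result.
sub remote_identity {
    my ($env) = @_;
    $env ||= \%ENV;

    # Set by the server after Basic, NTLM, etc. authentication (CGI standard).
    my $user = $env->{REMOTE_USER};
    return $user if defined $user && length $user;

    # Set by mod_ssl (with SSLOptions +StdEnvVars) for client certificates.
    if (($env->{SSL_CLIENT_VERIFY} // '') eq 'SUCCESS') {
        return $env->{SSL_CLIENT_S_DN};
    }
    return;    # not authenticated by the server
}
```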

OAuth in Net::Twitter::Lite, soon

Posted by Anonymous

If OAuth support in Net::Twitter::Lite solves your dilemma, the solution shouldn't be far away. It is the very next thing I intend to implement. Some work to pay the mortgage, and a few other issues are competing for my time, but it should be ready within a few days.

OAuth in N::T::Lite

Posted by Dave Sherohman

Thanks again for that! While it would/will solve my immediate issue of Twitter/OAuth, there's still the broader question of using Moose in vanilla CGI applications which it doesn't address, but that's a little outside the scope of your work anyhow...

(Really, I nearly cancelled this post entirely when I decided that lazy loading Net::Twitter for the short term and moving from vanilla CGI to a persistent environment in the long term should be more-than-adequate solutions. Perhaps I should have re-titled it to reflect that its main point had become "CGI vs. Moose" rather than "Net::Twitter and Moose".)

Vanilla CGI, development, and deployment

Posted by Anonymous

Especially in light of all the comments, it looks to me like the problem is in your deployment stack, not the modules you want or don't want to use.

Using vanilla CGI for development is unnecessary. The argument about having to restart is simply wrong: many web frameworks ship development servers that support automatic restarting on code changes and let you run the same code under multiple deployment environments easily (choose FCGI or mod_perl without code changes), etc.

If you're developing with CGI now and worrying about load time, while deferring deployment to your chosen production environment (likely because it's very complicated to set up), then the problem is in the development/deployment environment, not in the language or the tools.
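The "automatic restarting on code changes" these development servers offer essentially comes down to noticing that a source file's modification time has changed. A minimal core-Perl sketch, assuming a hypothetical development loop that takes a snapshot at startup and checks it between requests; real restarters also handle deleted files, templates and so on.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Map every .pm/.pl file under @dirs to its modification time.
sub snapshot {
    my @dirs = @_;
    my %mtime;
    find(sub {
        return unless -f $_ && /\.p[lm]\z/;
        $mtime{$File::Find::name} = (stat _)[9];    # reuse the -f stat
    }, @dirs);
    return \%mtime;
}

# Files that are new or whose mtime changed since $before was taken
# (deleted files are ignored in this sketch). Returns the sorted list in
# list context and the number of changed files in scalar context.
sub changed_files {
    my ($before, @dirs) = @_;
    my $now = snapshot(@dirs);
    my @changed = grep {
        !exists $before->{$_} || $before->{$_} != $now->{$_}
    } keys %$now;
    return wantarray ? sort(@changed) : scalar @changed;
}

# Hypothetical use in a development server loop, between requests:
#   my $snap = snapshot('lib');
#   ...
#   exec $^X, $0, @ARGV if changed_files($snap, 'lib');
```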

That's a fair critique.

Posted by Dave Sherohman

You make a good point. I generally use vanilla CGI for initial development to avoid the hassle of having to poke apache to get it to (reliably) recognize code changes. I don't like frameworks that carry their own HTTP server along with them[1], but it sounds like FCGI doesn't need that extra step, so it would be suitable for development right from the start.

[1] I'm deploying to apache, so why would I develop against not-apache? And, with apologies to Robert K, shouldn't the framework stick to frameworking and leave web serving to the web server?

Development Servers

Posted by Anonymous

The embedded web server is for ease of development, not production. You're focusing only on deployment at launch time when it comes to choosing your platform.

If you want to use two deployment environments with different priorities, then using a web framework makes this easier.

The priorities for production deployment are good latency and throughput, and scalable, stable dispatching of requests from multiple users.

The priorities for a development deployment are ease of updating, and simplicity (ensuring that what you see is really what you're working on), with the minimum setup, possibly at the expense of features or performance.

So in a production environment, yes, the framework should stick to frameworking, but since writing a development-oriented server is a framework-specific thing, most of the servers that really help the development cycle are tailored to the framework you are using.

FWIW, yes, FCGI is easier to develop with, but still not as easy as an instant-on, zero-configuration web server.

Getting to your statement of not wanting to develop against something that isn't Apache - well isn't that why this blog post started in the first place? You were dealing with the tension between your development environment and your production environment. If your development environment was similar to your production one (which vanilla CGI isn't), then you wouldn't be thinking twice about loading Moose.

In a sense you already are doing what you don't want to do in a way that is hurting your workflow much more than a web framework's embedded server would.

If you use a lightweight framework like HTTP::Engine or Mojo then your code will be very similar in structure to a mod_perl handler, but you will be able to use FCGI, mod_perl, or the bundled development server. The code won't need to be changed to go into production, and in contrast to vanilla CGI the general performance characteristics will be a lot more like your production environment.

These micro-frameworks do everything *but* frameworking (as opposed to, say, Catalyst, Jifty or Mojolicious). The Kool-Aid is an add-on feature.
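The structural point can be sketched without any framework at all: keep the application a plain subroutine that takes the request environment and returns status, headers and body, and put each deployment target behind a thin adapter. Only a CGI adapter is shown; all names are hypothetical, and this is not the API of HTTP::Engine or Mojo.

```perl
use strict;
use warnings;

# The application: request environment in, [status, headers, body] out.
# Nothing here knows whether it runs under CGI, FastCGI or mod_perl.
sub app {
    my ($env) = @_;
    my $path = $env->{PATH_INFO} // '/';
    return [404, ['Content-Type' => 'text/plain'], "Not found\n"]
        unless $path eq '/';
    return [200, ['Content-Type' => 'text/plain'], "Hello, world\n"];
}

# CGI adapter: print the response in CGI format ("Status:" header, other
# headers, blank line, body). A FastCGI or mod_perl adapter would call
# app() the same way and only differ in how it writes the result.
sub run_cgi {
    my ($app, $env, $out) = @_;
    my ($status, $headers, $body) = @{ $app->($env) };
    print {$out} "Status: $status\r\n";
    my @pairs = @$headers;
    while (my ($name, $value) = splice @pairs, 0, 2) {
        print {$out} "$name: $value\r\n";
    }
    print {$out} "\r\n", $body;
}

# Under plain CGI:  run_cgi(\&app, \%ENV, \*STDOUT);
```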

Posted by Dave Sherohman

I have again expressed myself unclearly in the original post... My initial plan (prior to this discussion of FastCGI) was to develop in vanilla CGI, deploy the beta in vanilla CGI, work out the kinks in beta, and only then convert the development copy to mod_perl after things have more-or-less stabilized, followed by deploying the mod_perl version. The tension was between performance concerns and a desire to do The Right Thing in a case where The Right Thing (i.e., using something Moose-dependent) would harm performance; the development and deployment target environments would have remained the same throughout, aside from the late-beta transition period when development would be focused on mod_perl, while production remained vanilla CGI.

Posted by Anonymous

Why not target mod_perl/fcgi from the start? Vanilla CGI is still getting in the way of your work or Moose's load time wouldn't have been a concern at all.

FCGI is the new plan.

Posted by Dave Sherohman

Developing against mod_perl is a PITA when the code hasn't stabilized and is changing frequently and I didn't want to go FCGI because of my (incorrect) perception of it as nothing more than a half-assed substitute for mod_perl. Given the discussion here, I'm now planning to get FCGI installed on my workstation as soon as time permits and proceed with it from there. I expect it to cost a little time up front to learn the new environment, but it sounds likely to be a worthwhile investment.

Why bother with apache for development?

Posted by Anonymous

>but it sounds like FCGI doesn't need that extra step, so it would be suitable for development right from the start.

Correct: you can put a .htaccess in your ~/public_html/ if you want to, pointing at the socket for your app. If you also alias all the static content in .htaccess, you really only need one app server process, and you don't have to make the FastCGI process daemonise. So just running your app, without even having to restart Apache, works totally fine.

That said, why bother? I always design my apps so that their deployment location (i.e. root URI) can be configured (or, hopefully, autodetected), so why not use Perl as your web server too? There's less to manage in development, you gain abstraction (you're sure you'll work if you ever need to move to another platform), and you can start multiple copies of the application at the same time on different ports (really handy for side-by-side UI comparisons).

That said, I only do this on my workstation, and deploy as fcgi as soon as I put the code anywhere I'm going to point somebody to look at.
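Making the root URI configurable or autodetected can be as simple as building every link from SCRIPT_NAME, which CGI sets to the path the application is mounted at (FastCGI setups usually pass it through too, but check yours), with an optional override. APP_ROOT_URI, root_uri and uri_for are hypothetical names for this sketch.

```perl
use strict;
use warnings;

# Base URI the app is mounted at. An explicit setting wins; otherwise use
# SCRIPT_NAME, the part of the request path that mapped to the application.
sub root_uri {
    my ($env) = @_;
    my $root = $env->{APP_ROOT_URI} // $env->{SCRIPT_NAME} // '';
    $root =~ s{/+\z}{};    # normalise: no trailing slash
    return $root;
}

# Build a link inside the app without hard-coding where it is deployed.
sub uri_for {
    my ($env, $path) = @_;
    $path =~ s{\A/+}{};    # accept 'foo' or '/foo'
    return root_uri($env) . '/' . $path;
}

# Usage under CGI or FastCGI:  my $css = uri_for(\%ENV, 'static/app.css');
```

The same code then works whether the app is served by its own Perl server at the root of a port or mounted under a sub-path of Apache.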
