
I think there might be a problem with Google Scholar!

People!!!

Google Scholar seems to be having problems with papers that are originally indexed as arXiv preprints, then go on to be published elsewhere, e.g. in a journal. Google Scholar seems to ignore the new (real) citation and stay stuck with the arXiv submission. Has anyone else experienced this issue?

Case in point: A new paper I have out in PeerJ, that was originally an arXiv submission:

This is what Google Scholar shows today. Other papers, including newer ones from PeerJ, can be seen in Google Scholar, so I don’t think it’s a matter of not waiting long enough..

[Screenshot, 2013-08-05, 7:55 AM]

 

 

None of the related articles and versions that Google Scholar lists are the PeerJ paper. It’s as if the new paper is completely invisible to Google Scholar.

Searching Google Scholar is futile as well: it lists the arXiv paper, but NOT the PeerJ paper. Try it out for yourself:

http://scholar.google.com/scholar?q=Improving+transcriptome+assembly+through+error+correction+of+high-throughput+sequence+reads&btnG=&hl=en&as_sdt=0%2C5

 

Of note, this exact same thing is happening with another paper, the Assemblathon 2 paper: Google Scholar only sees the arXiv version, NOT the GigaScience version.

Google Scholar does not seem to index citations from preprints, so if the published version is lost, your h-index is not going to increase as fast unless Google Scholar fixes this. I’ve sent a message to their tech support, but am not expecting too much there..

 

 

The Tenure Track Job Search

Known to some of you, I recently accepted a tenure track position at the University of New Hampshire in their Department of Molecular, Cellular, and Biomedical Sciences.. I was helped immensely by my Twitter and blog friends (especially this set of posts), and this is my attempt at paying it forward. I’d really like to share my thoughts and experiences with those going through what I just finished..

First off, some basic info. I study evolution/adaptation/genomics.. this is an extremely hot field right now, which I’m sure was to my benefit. I was on the market only 1 year, having sent in my 1st application in October of 2012. I sent in about 35 applications, with a mixture of jobs for which I thought I was perfect, and others which were more of a stretch.. I have no glamour pubs.. in fact, a very modest publication record.. I got 1 phone interview, 2 on-campus interviews, 1 offer.. I was in year 2 of my 1st postdoc.

A few random points:

  • The job wiki was extremely helpful, though I know I wasted WAY too much time obsessing over there.. 
  • I got almost no science done during application season. Between writing and being paralyzed by anxiety, it was tough.
  • Rejection is REALLY difficult. Even from jobs that I didn’t really want.
  • Read all the posts at TT job search advice aggregator and Chronicle of Higher Ed.  
  • Use Google, talk to others who’ve recently gone through this…

Application:

Like I said above, I sent in over 35 applications over a ~6 month period. Some of these were clearly out of my league (e.g. Stanford) and a few others I applied to on a whim– places that would be cool to live, etc.. My application consisted of a 1 page cover letter, 2-3 page research statement, 1 page teaching statement, and my CV. For those of you that might benefit from seeing my package, please email.. I included links to my blog and twitter feed in the apps.. I have no idea if this helped or hurt, but I’m young and cool (don’t tell my teenagers), and wanted to demonstrate this in my package. I cited primary literature in my teaching statement- a few different people said that was really cool and unusual.

I did not do a ton of customizing each app– I figured my research statement was what it was, and though I did edit each some to speak to the individual job ad, this was actually pretty minimal in most cases. It was really helpful to have more senior people and peers read my document. I sent it to at least 5-6 people, each had slightly different opinions.

I looked at web pages of the various departments to which I was applying, and generally named a few people that I’d like to collaborate with- this list usually included somebody outside the hiring department. I was given the advice later on in my search to NOT name names in the application, but I did, and this didn’t seem to hurt me.

I sent the application package in, and generally, if I didn’t get a confirmation email response, I’d email the chair of the committee after a week or 2 and just confirm receipt, and of course introduce myself.. I emailed again at the 2 month point if I had not heard anything.

Letters: these seemed really important.. Like other people have said, big names are helpful, but better to get stronger letters from lesser known people.. Ask them early, and plan on reminding them many times over.

Interviews:

Phone interviews = bad… Bad for the candidate, I think bad for the committee.. If you have to do this, use Skype. Speaker phones are bad.. you can’t see who’s asking/answering.. no body language to read into.. bad bad bad! What I will say is that I had a list of questions ready for the panel, and that came in really handy. I must have screwed up my phone interview as I was not invited to campus..

On campus interviews = good. I was surprisingly at ease for both of my interviews. I generally can bullshit with almost anybody, and I think that helps. I think if you’re the type of person that is socially awkward, it’s probably time to practice- talk to seminar speakers, scary faculty in your current department, etc.. Main thing, just be yourself (assuming yourself is likeable, friendly). Do all the common sense things– ask good questions, no looking constantly at your watch/cell phone/the ceiling. Avoid bashing your current PI, no gossip… you know, basically, don’t give them a reason to reject you. I did a reasonable amount of homework on the people I met, and this was important.. I generally did not read papers (ok, maybe 1 or 2, and a few more abstracts), but for everybody, I knew a bit about their labs and what they did.

People will be nice to you. I didn’t meet any jerks, and had only 1 really awkward meeting- our research areas were miles apart, him being a biomechanics/physics guy. Nobody really challenged me too much about my stuff, though people were generally interested in talking about it. I did have a list of general questions that I would pull from if conversation stalled.

The Job Talk: I practiced a lot.. I mean really a lot.. Like every morning for a couple of hours alone in an empty seminar room. I gave 2-3 practice talks, including one to a group of scary faculty members here at Berkeley. This was really important. These people generally did not know my research, and identified a few areas where more background was needed. When it was time to actually give the talk for real, it was super. I was able to focus on the delivery rather than on making sure I said the stuff I was supposed to.

Unlike the application package, I did worry about customizing the talk to the department. I thought about projects that I’d like to do, for instance, in New Hampshire.. Importantly, this was not BS, I’ll actually end up doing the project I pitched in my job talk as a faculty member!

Negotiation:

This is gonna have to wait for another post.. Lots to be said here.

Bitching and moaning:

Maybe I am the most impatient person in the world, but I had a really hard time waiting for applications to be reviewed, and decisions to be made. It took forever, and most of them felt like they were being sent directly into the vortex, never to be seen again. I have no idea why this process needs to be so darn nebulous, but I vow to make it better.

I know that getting 60823428 applications for a single job means that committees can’t communicate with each individual, but c’mon.. a blog/wiki/twitter acct with regular updates could prevent nervous breakdowns and hundreds of email inquiries… For instance.. once every week, make a blog post (“we’re reviewing applications!”, next week “wow, still reading”, next “we’re narrowing the pool to 40 applicants, you’ll receive an email if you’ve been cut”, and on and on..) People are OK with waiting, it’s the black-hole nature of the process that is maddening!! The same thing goes when you’re at the short list phase.. A blog would be perfect at disseminating this info, quickly, to a large number of neurotic applicants…

samtools consensus seq generation

Instead of trying to squeeze this explanation into 140 characters, here is a short blog post about the issue. I’ve posted the question to the samtools listserv, but no response there.

I am generating consensus sequences from a series of BAM files using the standard samtools command:

samtools mpileup -AIuf my.fasta my.bam | bcftools view -cgI - | vcfutils.pl vcf2fq > my.fq

What happens when the genotype is 50/50 is that instead of calling one of the bases, it uses an ambiguity code- R, Y, M, etc.. This is problematic for me, as those polymorphisms are interesting, and the downstream work (aligning, testing for selection) cannot handle them.

So:

  1. Is there a way to force samtools to call a nucleotide rather than an ambiguous base?
  2. Is there a better way to be generating these consensus sequences from BAM files?
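
One possible workaround for question 1 is to post-process the consensus and replace each two-base IUPAC code with one of the bases it stands for. This is a sketch under assumptions, not something samtools itself offers. The toy record below is made up, and the sketch assumes the consensus has already been converted to FASTA, because in a FASTQ file the quality lines can contain the same letters. It always picks the alphabetically first base (R becomes A, Y becomes C, and so on), which is simple but biased, so it may not be appropriate ahead of tests for selection.

```shell
# Hypothetical toy FASTA record containing all six two-base IUPAC codes.
# Mapping used (first base of each pair): R=A/G->A, Y=C/T->C, S=C/G->C,
# W=A/T->A, K=G/T->G, M=A/C->A. Lower-case codes are mapped the same way.
# Header lines (starting with ">") are skipped so sequence names stay intact.
resolved=$(printf '>comp1_c0_seq1\nACGTRYSWKMN\n' |
  sed '/^>/!y/RYSWKMryswkm/ACCAGAaccaga/')

echo "$resolved"
```

On a real file the same sed expression could be pointed at the FASTA directly, for example `sed '/^>/!y/RYSWKMryswkm/ACCAGAaccaga/' my.fa > my.resolved.fa`, where both file names are placeholders.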

 

I hate it when GREP doesn’t work like I want it to!

OK, so I have been fighting with GREP all afternoon– I’m about to kick it in the teeth! Somehow, there are more lines in my output file than in my query file.. This is driving me crazy cause I can’t figure out why!!!!

Here is the simple enough command:

cat query.txt | sort -k1 | awk '{print $1}' | grep -wf - subject.txt > out.txt

>head query.txt
comp10000_c0_seq1 0
comp10002_c0_seq1 0
comp10003_c0_seq1 0
comp10004_c0_seq1 0
comp10005_c0_seq1 0
comp10007_c0_seq1 0
comp1000_c0_seq1 0
comp10011_c0_seq1 0
comp10013_c0_seq1 0
comp10014_c0_seq1 0

>head subject.txt
comp10000_c0_seq1 comp1898_c0_seq2 100.00 5407 0 0 1 5407 1 5407 0.0 9985
comp10002_c0_seq1 comp8374_c0_seq1 100.00 754 0 0 1 754 1 754 0.0 1393
comp10003_c0_seq1 comp8423_c0_seq1 100.00 4387 0 0 1 4387 1 4387 0.0 8102
comp10004_c0_seq1 comp8084_c0_seq1 100.00 3036 0 0 1 3036 1 3036 0.0 5607
comp10005_c0_seq1 comp8387_c0_seq1 100.00 2122 0 0 1 2122 1 2122 0.0 3919
comp10007_c0_seq1 comp8168_c0_seq1 100.00 1141 0 0 1 1141 1 1141 0.0 2108
comp1000_c0_seq1 comp23962_c0_seq1 100.00 326 0 0 1 326 1 326 2e-172 603
comp10011_c0_seq1 comp2125_c0_seq1 100.00 333 0 0 1 333 718 386 3e-176 616
comp10013_c0_seq1 comp8442_c0_seq1 100.00 2745 0 0 1 2745 1 2745 0.0 5070
comp10014_c0_seq1 comp8362_c0_seq1 100.00 1335 0 0 1 1335 1 1335 0.0 2466

>head out.txt
comp10000_c0_seq1 comp1898_c0_seq2 100.00 5407 0 0 1 5407 1 5407 0.0 9985
comp10002_c0_seq1 comp8374_c0_seq1 100.00 754 0 0 1 754 1 754 0.0 1393
comp10003_c0_seq1 comp8423_c0_seq1 100.00 4387 0 0 1 4387 1 4387 0.0 8102
comp10004_c0_seq1 comp8084_c0_seq1 100.00 3036 0 0 1 3036 1 3036 0.0 5607
comp10005_c0_seq1 comp8387_c0_seq1 100.00 2122 0 0 1 2122 1 2122 0.0 3919
comp10007_c0_seq1 comp8168_c0_seq1 100.00 1141 0 0 1 1141 1 1141 0.0 2108
comp1000_c0_seq1 comp23962_c0_seq1 100.00 326 0 0 1 326 1 326 2e-172 603
comp10011_c0_seq1 comp2125_c0_seq1 100.00 333 0 0 1 333 718 386 3e-176 616
comp10013_c0_seq1 comp8442_c0_seq1 100.00 2745 0 0 1 2745 1 2745 0.0 5070
comp10014_c0_seq1 comp8362_c0_seq1 100.00 1335 0 0 1 1335 1 1335 0.0 2466

>wc -l query.txt subject.txt out.txt
22885 query.txt
23560 subject.txt
23560 out.txt

So in theory, query is a subset of subject, so there should be no more than 22885 hits in the outfile.. there should be no duplicates using the -w option in GREP..

Nevertheless, I scanned these files for duplicates, and found none…


cat query.txt | sort -k1 | awk '!a[$1]++' | wc -l
22885
cat subject.txt | sort -k1 | awk '!a[$1]++' | wc -l
23560
cat out.txt | sort -k1 | awk '!a[$1]++' | wc -l
23560

No duplicates…

So I’m stumped..
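
One possible explanation, not verified against these files: `grep -w` looks for each ID as a whole word anywhere on the line, not just in the first column. The subject file has IDs of the same form in column 2, so a row can be kept because of its second column even when its first column is not in the query list. The tiny files below are invented to show the effect, next to an awk filter that compares column 1 only.

```shell
# Invented demo files in a temporary directory (not the real data).
tmp=$(mktemp -d)
printf 'comp1_c0_seq1 0\ncomp2_c0_seq1 0\n' > "$tmp/query.txt"
printf 'comp1_c0_seq1 comp9_c0_seq1 100.00\ncomp3_c0_seq1 comp2_c0_seq1 100.00\n' > "$tmp/subject.txt"

# grep -w: the second subject row is counted too, because comp2_c0_seq1
# appears in its second column.
grep_hits=$(awk '{print $1}' "$tmp/query.txt" | grep -c -wf - "$tmp/subject.txt")

# awk: load the query IDs from the first file, then keep a subject row
# only when its first column is one of those IDs.
awk_hits=$(awk 'NR==FNR { ids[$1]; next } $1 in ids' \
  "$tmp/query.txt" "$tmp/subject.txt" | wc -l | tr -d ' ')

echo "grep -w hits: $grep_hits"
echo "awk column-1 hits: $awk_hits"
rm -rf "$tmp"
```

If that is what is going on, running `awk 'NR==FNR { ids[$1]; next } $1 in ids' query.txt subject.txt > out.txt` on the real files should keep only the rows whose first column is listed in query.txt.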

cool spam!

Every once in a while I get a spam message that is cooler than average– this one is, mainly because somebody took some time to add a few more details about the ‘mystery box of money’ than usual. For instance, Mark Morgan, who is the FBI agent named in this email, is actually an FBI agent in El Paso.

I just have to wonder tho, how could you actually fall for this type of scam, even if a bit more sophisticated? Is this little old ladies, teenagers? Even with the additional details, who would think it’s plausible that they had been sent 4.1 million dollars, just out of the blue? I know spammers play the odds, and I guess it’s likely true that sending the message to a very large number of people will uncover a few suckers..

Special Agent in Charge
Federal Bureau of Investigation
Intelligence Field Unit
El Paso Federal Justice Center
660 South Mesa Hills Drive
El Paso, TX 79912 USA

URGENT ATTENTION

I am special agent Mark A. Morgan, from the Intelligence Unit of the Federal Bureau of Investigation (FBI). We just intercepted/confiscated one (1) Trunk Box at the Dallas/Fort Worth International Airport Texas. We are on the verge of moving this consignment to the bureau headquarters. However, we scanned the said box and found out that it contained a total of USD$4.1M. Investigation carried out on the Diplomat who accompanied this box into the United States, revealed that he was to make the delivery of the fund to your residence, as these fund are entitled to you, been Contract/Inheritance over due payments. The funds were from the office of the Dr. (Mrs.) Ngozi Okonjo-Iweala Minister of Finance, Federal Republic of Nigeria.

Furthermore, after cross checking all the information we found in the box backing you up as the beneficiary of the funds, it became known to us that one of the documents is missing. This document is very important and until we get the document, the box will be temporarily confiscated pending when you will provide it. The much needed document is the Diplomatic Immunity Seal of Delivery Certificate (DISDC). This document will protect you from going against the US Patriot Act Section 314a and Section 314b. This delivery will be tagged A Diplomatic Transit Payment (D.T.P) once you get the document.

You are therefore required to get back to me on this email (***@Superposta.com) within 24 hours so that I will guide you on how to get the much needed document. Failure to comply with this directive may lead to the permanent confiscation of the funds and possible arrest. We may also get the Financial Action Task Force on Money Laundering (FATF) involved if do not follow our instructions. You are also advised not to get in contact with any Bank in Africa, Europe or any other institution, as your fund are here now in the United States of America.

Agent Mark A. Morgan
Special Agent in Charge
Federal Bureau of Investigation
Intelligence Field Unit
El Paso Federal Justice Center
660 South Mesa Hills Drive
El Paso, TX 79912 USA

Email:***@Superposta.com

Confidentiality Notice: This communication and its attachments may contain non-public, confidential or legally privileged information. The unlawful interception, use or disclosure of such information is prohibited. If you are not the intended recipient, or have received this communication in error, please notify the sender immediately by reply email and delete all copies of this communication and attachments without reading or saving them.

Day 2: Field work continues

A mostly frustrating day in the field: I had more success in the mouse department, but they were VERY cold, and they didn’t want to eat/drink.. this part of the plan was critical.. Sleep– yes, eat, not so much.. To make sure the mice don’t get so cold tonight, I bought some pillow stuffing, and filled the trap with plenty.. Hopefully tomorrow I’ll find nice, toasty mice, just waiting to drink the fresh water I’ll provide! I also added a bit more food, hoping that some more carbs will keep their little mouse metabolism high. Here I am holding a mouse in each hand, trying to warm them up.


On my return to the station, I found my wife and kids doctoring a bat.. a western pipistrelle, which is the most common bat here in Deep Canyon.. They found it in the pool, hanging on for dear life. This poor guy must have swooped just a bit too low. He was cold, and had a torn wing. We turned him over to the reserve biologist for rehabilitation..

 

In general, the field is a great place to take the kids. Around every corner is an adventure.. a learning experience. This time of year is relatively safe (no venomous snakes), so the worst that can probably happen is getting ‘bit’ by a cactus, or a scraped knee, but finding old bones, looking through scat, digging up an ant hill, etc far outweigh this. Our kids have largely grown up as city kids, so getting them out into nature is great!

 

Update from the field

We drove a long 9 hour drive on Wednesday, through the LA basin, then east to the Inland Empire– the Southern California Deserts. As I mentioned in the last post, I am really excited for this trip, as I am collecting a bunch of interesting and important data.

I have been watching the weather, and it has been cold and dry, exceptionally so on both counts. Neither of these factors bodes well, nor does the fact that it is a moonless night. Because we ended up arriving somewhat later than expected, I set traps in the dark, using my headlamp.. this makes it difficult to optimize trap placement..

This morning, nothing… no mice, completely empty.. This is the 1st time this has happened in an extremely long time. Bummer! I decided to try my luck elsewhere, and move my traps to a new area. Thankfully, I work in a canyon where good rodent habitat is nearly contiguous. I moved over an arroyo, to a rock face that had plenty of fresh looking rodent signs.. We’ll see what happens tomorrow.

In the absence of mice, I decided to take the family on a long walk back into the canyon. This was great, and exhausting. Several close calls with the cholla cactus, and one more serious ‘interaction’ that required I use the needle-nose pliers to remove cactus spines. All this with a 3 year old on my shoulders for much of the walk back home..

In ending, we saw a cool bug. Anyone know what it is (I don’t)?

Heading to the field!

I’m heading to the field in a couple of days– super excited! Busy today trying to wrap up a few things on campus, and pack up supplies. This should be a great trip, and I have plans to collect some really interesting stuff.. In particular, I’m really excited about trying out the hand-held refractometer that ATAGO gave me to demo– extracting a bit of urine directly from the bladder of animals may be challenging, but this type of info on the physiologic status of individuals is exactly what I really need.

In addition to urine, I’m collecting more blood, which will be used for electrolyte analyses– again, this physiology data is super informative. I have more mouse cages too, so I should be able to scale up the water supplementation experiments as well! When I get back, I’m going to make RNAseq libraries STAT, as I’d love to include these data in an upcoming job talk.

I’ll plan on posting some from the field, so stay tuned.