I think there might be a problem with Google Scholar!


Google Scholar seems to be having problems with papers that are originally indexed as arXiv preprints and then go on to be published elsewhere, e.g. in a journal. Google Scholar seems to ignore the new (real) citation and stay stuck on the arXiv submission. Has anyone else experienced this issue?

Case in point: a new paper I have out in PeerJ that was originally an arXiv submission:

This is what Google Scholar shows, today. Other papers, including newer ones from PeerJ, can be seen in Google Scholar, so I don't think it's a matter of not waiting long enough.

[Screenshot: Google Scholar results for the paper, 2013-08-05]



The related articles and versions that Google Scholar lists: none of them are the PeerJ paper. It's as if the new paper is completely invisible to Google Scholar.

Searching Google Scholar is futile as well: it lists the arXiv paper, but NOT the PeerJ paper. Try it out for yourself:



Of note, this exact same thing is happening with another paper, the Assemblathon 2 paper: Google Scholar only sees the arXiv version, NOT the GigaScience version.

Google Scholar does not seem to index citations from preprints, and if the published version is invisible, your h-index is not going to increase as fast unless Google Scholar fixes this. I've sent a message to their tech support, but I'm not expecting too much there.



The Tenure Track Job Search

As some of you know, I recently accepted a tenure track position at the University of New Hampshire in their Department of Molecular, Cellular, and Biomedical Sciences. I was helped immensely by my Twitter and blog friends (especially this set of posts), and this is my attempt at paying it forward. I'd really like to share my thoughts and experiences with those going through what I just finished.

First off, some basic info. I study evolution/adaptation/genomics, an extremely hot field right now, which I'm sure was to my benefit. I was on the market only 1 year, having sent in my 1st application in October of 2012. I sent in about 35 applications, a mixture of jobs for which I thought I was perfect and others that were more of a stretch. I have no glamour pubs; in fact, a very modest publication record. I got 1 phone interview, 2 on-campus interviews, and 1 offer. I was in year 2 of my 1st postdoc.

A few random points:

  • The job wiki was extremely helpful, though I know I wasted WAY too much time obsessing over it.
  • I got almost no science done during application season. Between writing and being paralyzed by anxiety, it was tough.
  • Rejection is REALLY difficult. Even from jobs that I didn’t really want.
  • Read all the posts at TT job search advice aggregator and Chronicle of Higher Ed.  
  • Use Google, and talk to others who've recently gone through this.


Like I said above, I sent in over 35 applications over a ~6 month period. Some of these were clearly out of my league (e.g. Stanford), and a few others I applied to on a whim: places that would be cool to live, etc. My application consisted of a 1 page cover letter, a 2-3 page research statement, a 1 page teaching statement, and my CV. For those of you who might benefit from seeing my package, please email. I included links to my blog and twitter feed in the apps. I have no idea if this helped or hurt, but I'm young and cool (don't tell my teenagers), and wanted to demonstrate this in my package. I cited primary literature in my teaching statement; a few different people said that was really cool and unusual.

I did not do a ton of customizing for each app. I figured my research statement was what it was, and though I did edit each one some to speak to the individual job ad, this was actually pretty minimal in most cases. It was really helpful to have more senior people and peers read my documents. I sent them to at least 5-6 people, and each had slightly different opinions.

I looked at the web pages of the various departments to which I was applying, and generally named a few people that I'd like to collaborate with; this list usually included somebody outside the hiring department. Later on in my search, I was given the advice NOT to name names in the application, but I did, and this didn't seem to hurt me.

I sent the application package in, and generally, if I didn't get a confirmation email response, I'd email the chair of the committee after a week or 2 just to confirm receipt, and of course introduce myself. I emailed again at the 2 month point if I had not heard anything.

Letters: these seemed really important. Like other people have said, big names are helpful, but it's better to get stronger letters from lesser known people. Ask them early, and plan on reminding them many times over.


Phone interviews = bad. Bad for the candidate, and I think bad for the committee. If you have to do this, use Skype. Speaker phones are bad: you can't see who's asking/answering, and there's no body language to read. Bad bad bad! What I will say is that I had a list of questions ready for the panel, and that came in really handy. I must have screwed up my phone interview, as I was not invited to campus.

On campus interviews = good. I was surprisingly at ease for both of my interviews. I generally can bullshit with almost anybody, and I think that helps. If you're the type of person who is socially awkward, it's probably time to practice: talk to seminar speakers, scary faculty in your current department, etc. Main thing, just be yourself (assuming yourself is likeable and friendly). Do all the common sense things: ask good questions, no constant looking at your watch/cell phone/the ceiling. Avoid bashing your current PI, no gossip... basically, don't give them a reason to reject you. I did a reasonable amount of homework on the people I met, and this was important. I generally did not read papers (ok, maybe 1 or 2, and a few more abstracts), but for everybody, I knew a bit about their labs and what they did.

People will be nice to you. I didn't meet any jerks, and only had 1 really awkward meeting; our research areas were miles apart, him being a biomechanics/physics guy. Nobody really challenged me too much about my stuff, though people were generally interested in talking about it. I did have a list of general questions that I would pull from if conversation stalled.

The Job Talk: I practiced a lot. I mean really a lot. Like every morning for a couple of hours, alone in an empty seminar room. I gave 2-3 practice talks, including one to a group of scary faculty members here at Berkeley. This was really important. These people generally did not know my research, and they identified a few areas where more background was needed. When it was time to actually give the talk for real, it was super. I was able to focus on the delivery rather than on making sure I said the stuff I was supposed to.

Unlike the application package, I did worry about customizing the talk to the department. I thought about projects that I'd like to do, for instance, in New Hampshire. Importantly, this was not BS; I'll actually end up doing the project I pitched in my job talk as a faculty member!


This is gonna have to wait for another post.. Lots to be said here.

Bitching and moaning:

Maybe I am the most impatient person in the world, but I had a really hard time waiting for applications to be reviewed and decisions to be made. It took forever, and most of them felt like they were being sent directly into the vortex, never to be seen again. I have no idea why this process needs to be so darn nebulous, but I vow to make it better.

I know that getting 60823428 applications for a single job means that committees can't communicate with each individual, but c'mon: a blog/wiki/twitter acct with regular updates could prevent nervous breakdowns and hundreds of email inquiries. For instance, once every week, make a blog post ("we're reviewing applications!", next week "wow, still reading", next "we're narrowing the pool to 40 applicants, you'll receive an email if you've been cut", and on and on). People are OK with waiting; it's the black-hole nature of the process that is maddening!! The same thing goes when you're at the short list phase. A blog would be perfect for disseminating this info, quickly, to a large number of neurotic applicants.

sed and awk for genomics

In my continuing quest to conquer fastQ, fastA, sam & bam files, I have accumulated several useful 'tools'. Many of them are included in other software packages (e.g. SAMtools), but for some tasks, especially file management and conversion, no standard toolkit exists, and instead researchers script their own solutions.

For me, sed and awk, along with a few other standard *nix tools, have been extremely useful, and a few of them are generally useful, I think. In hopes of helping others, I'm going to start a list here. The initial list is short (these were the ones I used in the past week or so, fresh in my mind), but I plan to add and update it as I discover new, better ways of handling these files. Most of these were inspired by stuff I've seen on the internet, so if you have something you want to share, please add it in the comments!

#Create an unwrapped fasta file. You should add ambiguous nucleotides to the search if you have them. Also, a fasta defline ending in any of 'ACTGNn-' will trip the script up, so use caution...

sed -i ':begin;$!N;/[ACTGNn-]\n[ACTGNn-]/s/\n//;tbegin;P;D' test.fa
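A hedged alternative (my own sketch, not part of the original list): an awk version that keys on the '>' character, so deflines can never be mistaken for sequence, which sidesteps the caveat above entirely.

```shell
#Unwrap a fasta file by joining all sequence lines under each defline;
#keys on '>', so no defline can be mistaken for sequence
awk '/^>/{if(seq)print seq; print; seq=""; next} {seq=seq $0} END{if(seq)print seq}' test.fa > unwrapped.fa
```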

#Add /1 to the end of each fastQ defline, good for assemblers that need those to tell left and right files apart. Note that you should change [0-9] to whatever follows your @; this is machine specific.

sed -i 's_^@[0-9]:.*_&/1_g' left.fq
sed -i 's_^@[0-9]:.*_&/2_g' right.fq

#Deinterleave a shuffled fastq file

sed -n '/2:N/{N;N;N;p;}' shuff.fastq > right.fq
sed -n '/1:N/{N;N;N;p;}' shuff.fastq > left.fq
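The sed version above keys on Casava 1.8-style '1:N'/'2:N' deflines. If your headers look different, here's a header-agnostic sketch (my own, use with caution) that splits purely by position, assuming strict 4-line records with the left read first in each pair:

```shell
#Deinterleave by position: every 8 lines = left read (4 lines) then its right mate (4 lines)
paste - - - - - - - - < shuff.fastq | awk -F'\t' '{print $1"\n"$2"\n"$3"\n"$4 > "left.fq"; print $5"\n"$6"\n"$7"\n"$8 > "right.fq"}'
```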

#Get the sum of a column (column 2, $2, in this case)

awk '{add +=$2} END {print add}'

#Get the SD of a column (column 2, $2, in this case)

awk '{sum+=$2; array[NR]=$2} END {for(x=1;x<=NR;x++){sumsq+=((array[x]-(sum/NR))^2)} print sqrt(sumsq/NR)}'

#Print the mean of a column (column 2, $2, in this case)

awk '{sum+=$2} END { print "Average = ",sum/NR}'
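If you want both at once, the mean and SD can also be computed in a single streaming pass with no array (population SD, same as the array version above); a quick sketch, numerically less careful but fine for eyeballing:

```shell
#Mean and population SD of column 2 in one pass, no array needed
awk '{s+=$2; ss+=$2*$2; n++} END {m=s/n; print "Mean =", m, "SD =", sqrt(ss/n - m*m)}' table.txt
```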

#Remove duplicate entries in column 10, and print the deduplicated spreadsheet as unique10.txt

cat test.txt | sort -k10 | awk '!a[$10]++' > unique10.txt

#Retain entries in the spreadsheet where column 3 is 300 or greater (imagine culling by sequence length)

cat table.txt | awk '300>$3{next}1' > morethan300.txt

#Extract fastQ from BAM for a bunch of BAM files; requires Picard (sorry, no sed or awk in this one)

for i in `ls *bam`;
do F=`basename $i .bam`;
java -Xmx9g -jar ~/software/SamFormatConverter.jar MAX_RECORDS_IN_RAM=1000000 INPUT=$i OUTPUT=/dev/stdout | java -Xmx9g -jar ~/software/SamToFastq.jar MAX_RECORDS_IN_RAM=1000000 INPUT=/dev/stdin FASTQ=$F.1.fq SECOND_END_FASTQ=$F.2.fq;
done

samtools consensus seq generation

Instead of trying to squeeze this explanation into 140 characters, here is a short blog post about the issue. I've posted the question to the samtools mailing list, but no response there.

I am generating consensus sequences from a series of BAM files using the standard samtools command:

samtools mpileup -AIuf my.fasta my.bam | bcftools view -cgI - | vcfutils.pl vcf2fq > my.fq

What happens when the genotype is 50/50 is that instead of calling one of the bases, it uses an ambiguity code: R, Y, M, etc. This is problematic for me, as those polymorphisms are interesting, and the downstream work (aligning, testing for selection) cannot handle them.


  1. Is there a way to force samtools to call a nucleotide rather than an ambiguous base?
  2. Is there a better way to be generating these consensus sequences from BAM files?
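While waiting on the list, one brute-force workaround for question 1 (my own sketch, not a samtools feature) is to collapse the ambiguity codes after the fact, picking one base per code arbitrarily. Note this throws away the heterozygosity information, and the base choices below are arbitrary:

```shell
#Collapse common IUPAC ambiguity codes on fastQ sequence lines only (NR%4==2),
#so deflines and quality strings (which can legitimately contain these letters) are untouched
awk 'NR%4==2 {gsub(/[Rr]/,"A"); gsub(/[Yy]/,"C"); gsub(/[Ss]/,"C"); gsub(/[Ww]/,"A"); gsub(/[Kk]/,"G"); gsub(/[Mm]/,"A")} {print}' my.fq > my.resolved.fq
```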


PeerJ Preprint server for ALL of Biology!

I just got done uploading a manuscript to the PeerJ Preprint server.  It was awesome! This was a manuscript that was outside even the loosest of definitions of quantitative biology, and therefore not appropriate for arXiv.

I am really keen to see biologists using preprint servers, and up until now, where to send non-quant manuscripts was a bit of a problem. Sure, researchers could (and sometimes do) host pre-review manuscripts on their own websites, but this is problematic for a few reasons. While I'm not getting into the details here, the issue of visibility (and having a DOI) is huge: how should people find interesting manuscripts? A centralized repository like arXiv or PeerJ is crucial!

Now that the PeerJ preprint server is up and running, it's up to us to keep it going. I like arXiv for a number of reasons (also, it's more established), but the PeerJ Preprint server has a few major advantages: commenting (which may fill the role Haldane's Sieve currently fills for arXiv papers), speed (articles visible within a few hours), social media integration, and article level metrics (at least basic ones).

So, the question is this: where to send my NEXT preprint? Do I send it to arXiv, or to PeerJ? I guess I'm kind of leaning towards PeerJ, unless somebody has a good argument against it.

P.S. I guess there are 3 things that concern me:

  1. What happens to preprints if PeerJ folds– are they lost in the bowels of the internets?
  2. Will Journals consider PeerJ preprints equivalent to those posted on arXiv?
  3. Does Google Scholar index PeerJ preprints? (I think the answer is Yes)

I hate it when GREP doesn’t work like I want it to!

OK, so I have been fighting with grep all afternoon; I'm about to kick it in the teeth! Somehow, there are more lines in my output file than in my subject file. This is driving me crazy because I can't figure out why!!!!

Here is the simple enough command:

cat query.txt | sort -k1 | awk '{print $1}' | grep -wf - subject.txt > out.txt

>head query.txt
comp10000_c0_seq1 0
comp10002_c0_seq1 0
comp10003_c0_seq1 0
comp10004_c0_seq1 0
comp10005_c0_seq1 0
comp10007_c0_seq1 0
comp1000_c0_seq1 0
comp10011_c0_seq1 0
comp10013_c0_seq1 0
comp10014_c0_seq1 0

>head subject.txt
comp10000_c0_seq1 comp1898_c0_seq2 100.00 5407 0 0 1 5407 1 5407 0.0 9985
comp10002_c0_seq1 comp8374_c0_seq1 100.00 754 0 0 1 754 1 754 0.0 1393
comp10003_c0_seq1 comp8423_c0_seq1 100.00 4387 0 0 1 4387 1 4387 0.0 8102
comp10004_c0_seq1 comp8084_c0_seq1 100.00 3036 0 0 1 3036 1 3036 0.0 5607
comp10005_c0_seq1 comp8387_c0_seq1 100.00 2122 0 0 1 2122 1 2122 0.0 3919
comp10007_c0_seq1 comp8168_c0_seq1 100.00 1141 0 0 1 1141 1 1141 0.0 2108
comp1000_c0_seq1 comp23962_c0_seq1 100.00 326 0 0 1 326 1 326 2e-172 603
comp10011_c0_seq1 comp2125_c0_seq1 100.00 333 0 0 1 333 718 386 3e-176 616
comp10013_c0_seq1 comp8442_c0_seq1 100.00 2745 0 0 1 2745 1 2745 0.0 5070
comp10014_c0_seq1 comp8362_c0_seq1 100.00 1335 0 0 1 1335 1 1335 0.0 2466

>head out.txt
comp10000_c0_seq1 comp1898_c0_seq2 100.00 5407 0 0 1 5407 1 5407 0.0 9985
comp10002_c0_seq1 comp8374_c0_seq1 100.00 754 0 0 1 754 1 754 0.0 1393
comp10003_c0_seq1 comp8423_c0_seq1 100.00 4387 0 0 1 4387 1 4387 0.0 8102
comp10004_c0_seq1 comp8084_c0_seq1 100.00 3036 0 0 1 3036 1 3036 0.0 5607
comp10005_c0_seq1 comp8387_c0_seq1 100.00 2122 0 0 1 2122 1 2122 0.0 3919
comp10007_c0_seq1 comp8168_c0_seq1 100.00 1141 0 0 1 1141 1 1141 0.0 2108
comp1000_c0_seq1 comp23962_c0_seq1 100.00 326 0 0 1 326 1 326 2e-172 603
comp10011_c0_seq1 comp2125_c0_seq1 100.00 333 0 0 1 333 718 386 3e-176 616
comp10013_c0_seq1 comp8442_c0_seq1 100.00 2745 0 0 1 2745 1 2745 0.0 5070
comp10014_c0_seq1 comp8362_c0_seq1 100.00 1335 0 0 1 1335 1 1335 0.0 2466

>wc -l query.txt subject.txt out.txt
22885 query.txt
23560 subject.txt
23560 out.txt

So in theory, query is a subset of subject, so there should be no more than 22885 hits in the outfile, and there should be no duplicates given the -w option in grep.

Nevertheless, I scanned these files for duplicates, and found none…

cat query.txt | sort -k1 | awk '!a[$1]++' | wc -l
cat subject.txt | sort -k1 | awk '!a[$1]++' | wc -l
cat out.txt | sort -k1 | awk '!a[$1]++' | wc -l

No duplicates…

So I’m stumped..
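One guess worth checking (I could be wrong): grep -wf matches each pattern anywhere on the line, and subject.txt carries comp*_c0_seq* IDs in column 2 as well as column 1, so a query ID could hit a line via its column 2 even when that line's column 1 isn't in the query, inflating the output. An awk join that only looks at column 1 would rule this in or out:

```shell
#Keep only subject lines whose FIRST column appears in column 1 of query.txt
awk 'NR==FNR {ids[$1]; next} $1 in ids' query.txt subject.txt > out.txt
```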

Improving transcriptome assembly through error correction of high-throughput sequence reads

I am writing this blog post in support of a paper that I have just submitted to arXiv: Improving transcriptome assembly through error correction of high-throughput sequence reads. My goal is not to talk about the nuts and bolts of the paper so much as it is to ramble about its motivation and the writing process.

First, a little bit about me, as this is my 1st paper with my postdoctoral advisor, Mike Eisen. In short, I am an evolutionary biologist by training, having done my PhD on the relationship between mating systems and immunogenes in wild rodents. My postdoc work focuses on adaptation to desert life in rodents: I work on Peromyscus rodents in the Southern California deserts, combining field work and genomics. My overarching goals include the ability to operate in multiple domains (genomics, field biology, evolutionary biology) to better understand basic questions: the links between genotype and phenotype, adaptation, etc. OK, enough. On to the paper.


The study of functional genomics, particularly in non-model organisms, has been dramatically improved over the last few years by the use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally intensive procedure, the de novo construction of a reference transcriptome, must be completed as a prerequisite to further analyses. The reference is critically important, as all downstream steps, including estimating transcript abundance, depend on its accuracy. Though a substantial amount of research has been done on assembly, only recently have the pre-assembly procedures been studied in detail. Specifically, several stand-alone error correction modules have been reported on, and while they have been shown to be effective in reducing errors at the level of sequencing reads, how error correction impacts assembly accuracy is largely unknown. Here, we show via use of a simulated dataset that applying error correction to sequencing reads has significant positive effects on assembly accuracy, reducing assembly error by nearly 50%, and therefore should be applied to all datasets.

For the past couple of years, I have had an interest in better understanding the dynamics of de novo transcriptome assembly. I had mostly selfish/practical reasons for wanting to understand: a large amount of my work depends on getting these assemblies 'right'. It was quickly evident that much of the computational research is directed at assembly itself, and very little at the pre- and post-assembly processes. We know these things are important, but an understanding of their effects is often lacking.

How error correction of sequencing reads affects assembly accuracy is one of the specific ideas I've been interested in thinking about for the past several months. The idea of simulating RNAseq reads, applying various error corrections, then measuring their effects is logical, so much so that I was really surprised that this had not been done before. So off I went.

I wrote this paper over the course of a couple of weeks. It is a short and simple paper, and was quite easy to write. Of note, about 75% of the paper was written on the playground in the UC Berkeley University Village, while (loosely) providing supervision for my 2 youngest daughters. How is that for work-life balance!

The read data will be available on Figshare, and I owe thanks to those guys for lifting the upload limit- the read file is 2.6Gb with .bz2 compression, so not huge, but not small either. The winning (AllPathsLG corrected) assembly is there as well.

This type of work is inspired, in a very real sense, by C. Titus Brown, who is quickly becoming the go-to guy for understanding the nuts and bolts of genome assembly (and also got tenure based on his Klout score, HA!). His post and paper on the challenges of mRNAseq analysis are the type of stuff that I aspire to.

Anyway, I'd be really interested in hearing what you all think of the paper, so read, enjoy, comment, and get to error correcting those reads!


UPDATE 1: The paper has made it to Haldane’s Sieve: http://haldanessieve.org/2013/04/04/improving-transcriptome-assembly-through-error-correction-of-high-throughput-sequence-reads/ and http://haldanessieve.org/2013/04/05/our-paper-improving-transcriptome-assembly-through-error-correction-of-high-throughput-sequence-reads/

Question about structural variants: these seem to be, as always, poorly reconstructed from de novo assemblies. No worse with error corrected reads, but no better. In fact, contigs that are 'full of errors' are almost always those with complicated structural variation.

Real reads might behave differently than simulated reads. This certainly could be the case (but I don't think so). What I can say is that the 1st iteration of this experiment was done using reads from the SRA (Homo). The reason I did not use that experiment in the end is that it was hard to tell error from polymorphism relative to the reference, but the patterns were the same: fewer differences relative to the reference in error corrected reads than in raw reads. So, I do not think that the results I see are an artifact of the simulated reads.


cool spam!

Every once in a while I get a spam message that is cooler than average. This one is, mainly because somebody took some time to add a few more details about the 'mystery box of money' than usual. For instance, Mark Morgan, the FBI agent named in this email, is actually an FBI agent in El Paso.

I just have to wonder, though: how could you actually fall for this type of scam, even a slightly more sophisticated one? Is it little old ladies? Teenagers? Even with the additional details, who would think it's plausible that they had been sent 4.1 million dollars, just out of the blue? I know spammers play the odds, and I guess it's likely true that sending the message to a very large number of people will uncover a few suckers.

Special Agent in Charge
Federal Bureau of Investigation
Intelligence Field Unit
El Paso Federal Justice Center
660 South Mesa Hills Drive
El Paso, TX 79912 USA


I am special agent Mark A. Morgan, from the Intelligence Unit of the Federal Bureau of Investigation (FBI). We just intercepted/confiscated one (1) Trunk Box at the Dallas/Fort Worth International Airport Texas. We are on the verge of moving this consignment to the bureau headquarters. However, we scanned the said box and found out that it contained a total of USD$4.1M. Investigation carried out on the Diplomat who accompanied this box into the United States, revealed that he was to make the delivery of the fund to your residence, as these fund are entitled to you, been Contract/Inheritance over due payments. The funds were from the office of the Dr. (Mrs.) Ngozi Okonjo-Iweala Minister of Finance, Federal Republic of Nigeria.

Furthermore, after cross checking all the information we found in the box backing you up as the beneficiary of the funds, it became known to us that one of the documents is missing. This document is very important and until we get the document, the box will be temporarily confiscated pending when you will provide it. The much needed document is the Diplomatic Immunity Seal of Delivery Certificate (DISDC). This document will protect you from going against the US Patriot Act Section 314a and Section 314b. This delivery will be tagged A Diplomatic Transit Payment (D.T.P) once you get the document.

You are therefore required to get back to me on this email (***@Superposta.com) within 24 hours so that I will guide you on how to get the much needed document. Failure to comply with this directive may lead to the permanent confiscation of the funds and possible arrest. We may also get the Financial Action Task Force on Money Laundering (FATF) involved if do not follow our instructions. You are also advised not to get in contact with any Bank in Africa, Europe or any other institution, as your fund are here now in the United States of America.

Agent Mark A. Morgan
Special Agent in Charge
Federal Bureau of Investigation
Intelligence Field Unit
El Paso Federal Justice Center
660 South Mesa Hills Drive
El Paso, TX 79912 USA


Confidentiality Notice: This communication and its attachments may contain non-public, confidential or legally privileged information. The unlawful interception, use or disclosure of such information is prohibited. If you are not the intended recipient, or have received this communication in error, please notify the sender immediately by reply email and delete all copies of this communication and attachments without reading or saving them.

edgeR: spotty documentation

Ok, just a quick post here to verbalize something that bothers me (and everybody else, I suspect): the issue of poor documentation of software. Even 'well' documented programs have huge issues. For instance, I am working with edgeR a lot lately. The documentation is pretty good (link), but there is a lot still left up to the investigator's imagination. For instance, when using the function 'calcNormFactors', this is what the manual says:

Description: Calculate normalization factors to scale the raw library sizes.


calcNormFactors(object, method=c("TMM","RLE","upperquartile"), refColumn = NULL, logratioTrim = .3, sumTrim = 0.05, doWeighting=TRUE, Acutoff=-1e10, p=0.75)

object: either a matrix of raw (read) counts or a DGEList object
method: method to use to calculate the scale factors
refColumn: column to use as reference for method="TMM"
logratioTrim: amount of trim to use on log-ratios ("M" values) for method="TMM"
sumTrim: amount of trim to use on the combined absolute levels ("A" values) for method="TMM"
doWeighting: logical, whether to compute (asymptotic binomial precision) weights for method="TMM"
Acutoff: cutoff on "A" values to use before trimming for method="TMM"
p: percentile (between 0 and 1) of the counts that is aligned when method="upperquartile"

method="TMM" is the weighted trimmed mean of M-values (to the reference) proposed by
Robinson and Oshlack (2010), where the weights are from the delta method on Binomial data.
If refColumn is unspecified, the library whose upper quartile is closest to the mean upper quartile
is used.

Where do the default values for logratioTrim, sumTrim, etc. come from? Should I change them? To what, how, why, when? It would be really nice to have some guidance about when to consider changing these, and how to reasonably do so.

Of note, nearly every one of the functions contains poorly documented options, and edgeR is one of the most well documented packages out there.

end rant..

Day 2: Field work continues

A mostly frustrating day in the field: I had more success in the mouse department, but the mice were VERY cold, and they didn't want to eat/drink. This part of the plan was critical. Sleep: yes; eat: not so much. To make sure the mice don't get so cold tonight, I bought some pillow stuffing and filled the traps with plenty. Hopefully tomorrow I'll find nice, toasty mice, just waiting to drink the fresh water I'll provide! I also added a bit more food, hoping that some more carbs will keep their little mouse metabolisms high. Here I am holding a mouse in each hand, trying to warm them up.


On my return to the station, I found my wife and kids doctoring a bat: a western pipistrelle, the most common bat here in Deep Canyon. They found it in the pool, hanging on for dear life. This poor guy must have swooped just a bit too low. He was cold and had a torn wing. We turned him over to the reserve biologist for rehabilitation.


In general, the field is a great place to take the kids. Around every corner is an adventure, a learning experience. This time of year is relatively safe (no venomous snakes), so the worst that can probably happen is getting 'bit' by a cactus or a scraped knee, while finding old bones, looking through scat, digging up an ant hill, etc. far outweigh this. Our kids have largely grown up as city kids, so getting them out into nature is great!